Message ID | 20250326-cross-lock-dep-v1-1-3199e49e8652@bootlin.com |
---|---|
State | New |
Headers | show |
Series | Fix the ABBA locking situation between clk and runtime PM | expand |
Hello Rafael, >> The runtime PM core currently allows to runtime resume/suspend a device, >> or its suppliers. >> >> Let's make it also possible to runtime resume/suspend consumers. >> >> Consumers and suppliers are seen here through the description made by >> device_links. > > It would be good to explain why all of this is needed. > > I gather that it is used for resolving some synchronization issues in > the clk framework, but neither the cover letter nor this changelog > explains how it is used. The explanation is quite long, there have been already 3 full threads from people attempting to fix a problem that resides in the clock subsystem (but that may also be probably problematic in others, just uncovered so far). I don't know if you took the time to read the cover letter: https://lore.kernel.org/linux-clk/20250326-cross-lock-dep-v1-0-3199e49e8652@bootlin.com/ It tries to explain the problem and the approach to fix this problem, but let me try to give a runtime PM focused view of it here. [Problem] We do have an ABBA locking situation between clk and any other subsystem that might be in use during runtime_resume() operations, provided that these subsystems also make clk calls at some point. The usual suspect here are power domains. There are different approaches that can be taken but the one that felt the most promising when we discussed it during last LPC (and also the one that was partially implemented in the clk subsystem already for a tiny portion of it) is the rule that "subsystem locks should not be kept acquired while calling in some other subsystems". Typically in the clk subsystem the logic is: func() { mutex_lock(clk); runtime_resume(clk); ... } Whereas what would definitely work without locking issues is the opposite: func() { runtime_resume(clk); mutex_lock(clk); ... } Of course life is not so simple, and the clock core is highly recursive, which means inverting the two calls like I hinted above simply does not work as we go deeper in the subcalls. As a result, we need to runtime resume *all* the relevant clocks in advance, before calling functions recursively (the lock itself is allowed to re-enter and is not blocking in this case). I followed all possible paths in the clock subsystem and identified 3 main categories. The list of clocks we need to runtime resume in advance can either be: 1- the parent clocks 2- the child clocks 3- the parent and child clocks 4- all the clocks (typically for debugfs/sysfs purposes). [Solution 1: discarded] The first approach to do that was do to some guessing based on the clock tree topology. Unfortunately this approach does not stand because it is virtually unbounded. In order to know the clock topology we must acquire the clock main lock. In order to runtime resume we must release it. As a result, this logic is virtually unbounded (even though in practice we would converge at some point). So this approach was discarded by Steven. [Solution 2: this proposal] After the LPC discussion with Steven, I also discussed with Saravana about this and he pointed that since we were using fw_devlink=rpm by default now, all providers -including clock controllers of course- would already be runtime resumed the first time we would make a runtime_resume(clk), and thus all the nested calls were no longer needed. This native solution was already addressing point #1 above (and partially point #3) and all I had to do was to make a similar function for point #2. And here we are, trying to resume all consumers (from a device link perspective) which include, but is not limited to, consumer clocks. I hope this explanation will help understanding this patch and why it is needed for this series. As stated in the cover letter, I've tried to keep the changes here to their minimum. Maybe there are other/better ways to do that and we can discuss them. My priority is however to get this possible ABBA deadlock situation sorted out. I can further expand the commit log with these details if you want. Thanks, Miquèl
Hi, On Fri, Mar 28, 2025 at 10:59 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote: > > Hello Rafael, > > >> The runtime PM core currently allows to runtime resume/suspend a device, > >> or its suppliers. > >> > >> Let's make it also possible to runtime resume/suspend consumers. > >> > >> Consumers and suppliers are seen here through the description made by > >> device_links. > > > > It would be good to explain why all of this is needed. > > > > I gather that it is used for resolving some synchronization issues in > > the clk framework, but neither the cover letter nor this changelog > > explains how it is used. > > The explanation is quite long, there have been already 3 full threads > from people attempting to fix a problem that resides in the clock > subsystem (but that may also be probably problematic in others, just > uncovered so far). I don't know if you took the time to read the cover > letter: > https://lore.kernel.org/linux-clk/20250326-cross-lock-dep-v1-0-3199e49e8652@bootlin.com/ > It tries to explain the problem and the approach to fix this problem, > but let me try to give a runtime PM focused view of it here. > > [Problem] > > We do have an ABBA locking situation between clk and any other subsystem > that might be in use during runtime_resume() operations, provided that > these subsystems also make clk calls at some point. The usual suspect > here are power domains. > > There are different approaches that can be taken but the one that felt > the most promising when we discussed it during last LPC (and also the > one that was partially implemented in the clk subsystem already for a > tiny portion of it) is the rule that "subsystem locks should not be kept > acquired while calling in some other subsystems". > > Typically in the clk subsystem the logic is: > > func() { > mutex_lock(clk); > runtime_resume(clk); > ... > } > > Whereas what would definitely work without locking issues is the > opposite: > > func() { > runtime_resume(clk); > mutex_lock(clk); > ... > } > > Of course life is not so simple, and the clock core is highly > recursive, which means inverting the two calls like I hinted above > simply does not work as we go deeper in the subcalls. As a result, we > need to runtime resume *all* the relevant clocks in advance, before > calling functions recursively (the lock itself is allowed to re-enter > and is not blocking in this case). > > I followed all possible paths in the clock subsystem and identified 3 > main categories. The list of clocks we need to runtime resume in advance > can either be: > 1- the parent clocks > 2- the child clocks > 3- the parent and child clocks > 4- all the clocks (typically for debugfs/sysfs purposes). > > [Solution 1: discarded] > > The first approach to do that was do to some guessing based on the clock > tree topology. Unfortunately this approach does not stand because it is > virtually unbounded. In order to know the clock topology we must acquire > the clock main lock. In order to runtime resume we must release it. As a > result, this logic is virtually unbounded (even though in practice we > would converge at some point). So this approach was discarded by Steven. > > [Solution 2: this proposal] > > After the LPC discussion with Steven, I also discussed with Saravana > about this and he pointed that since we were using fw_devlink=rpm by > default now, all providers -including clock controllers of course- would > already be runtime resumed the first time we would make a > runtime_resume(clk), and thus all the nested calls were no longer > needed. This native solution was already addressing point #1 above (and > partially point #3) and all I had to do was to make a similar function > for point #2. So this depends on DT being used and fw_devlink=rpm being used, doesn't it? You cannot really assume in general that there will be device links between parents and children.
Hi Rafael, Thanks for taking the time to read all that. >> After the LPC discussion with Steven, I also discussed with Saravana >> about this and he pointed that since we were using fw_devlink=rpm by >> default now, all providers -including clock controllers of course- would >> already be runtime resumed the first time we would make a >> runtime_resume(clk), and thus all the nested calls were no longer >> needed. This native solution was already addressing point #1 above (and >> partially point #3) and all I had to do was to make a similar function >> for point #2. > > So this depends on DT being used and fw_devlink=rpm being used, > doesn't it? DT, not really. fw_devlink=rpm however, yes. > You cannot really assume in general that there will be device links > between parents and children. But if runtime PM already mandates fw_devlink to be the information source (which, IIRC is the case since fw_devlink=rpm), then why wouldn't this approach work? For sure there may be holes in fw_devlink, but what is the reason for it if we cannot use it? In other words, are you suggesting that this approach is invalid? If yes could you elaborate a bit? For this approach to work we do not need all the parenting to be perfectly described, just relationships between clock consumers and providers, which are in general rather basic. Thanks, Miquèl
Hi, On Thu, Apr 10, 2025 at 9:54 AM Miquel Raynal <miquel.raynal@bootlin.com> wrote: > > Hi Rafael, > > Thanks for taking the time to read all that. > > >> After the LPC discussion with Steven, I also discussed with Saravana > >> about this and he pointed that since we were using fw_devlink=rpm by > >> default now, all providers -including clock controllers of course- would > >> already be runtime resumed the first time we would make a > >> runtime_resume(clk), and thus all the nested calls were no longer > >> needed. This native solution was already addressing point #1 above (and > >> partially point #3) and all I had to do was to make a similar function > >> for point #2. > > > > So this depends on DT being used and fw_devlink=rpm being used, > > doesn't it? > > DT, not really. fw_devlink=rpm however, yes. Which means DT because fw_devlink=rpm is DT-specific. At least it is not used on systems where ACPI is the firmware interface. > > You cannot really assume in general that there will be device links > > between parents and children. > > But if runtime PM already mandates fw_devlink to be the information > source (which, IIRC is the case since fw_devlink=rpm), then why wouldn't > this approach work? For sure there may be holes in fw_devlink, but > what is the reason for it if we cannot use it? Well, see above. > In other words, are you suggesting that this approach is invalid? If yes > could you elaborate a bit? For this approach to work we do not need all > the parenting to be perfectly described, just relationships between > clock consumers and providers, which are in general rather basic. So you know which devices are parents and children without fw_devlink and this information can be used readily on all systems AFAICS. IIUC, the overall approach is to resume the entire hierarchy before making changes that may deadlock and I think that this is a good idea in general. However, you need to do it in a way that's usable on all systems and when you walk the hierarchy from top to bottom, you need to do it recursively I think because resuming a device doesn't cause its children or consumers to resume.
diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c index 2ee45841486bc73225b3e971164466647b3ce6d3..04bb66c18e3e4a45751fb3f9a6a1267d73757310 100644 --- a/drivers/base/power/runtime.c +++ b/drivers/base/power/runtime.c @@ -1841,6 +1841,60 @@ void pm_runtime_put_suppliers(struct device *dev) device_links_read_unlock(idx); } +static void __pm_runtime_get_consumers(struct device *dev) +{ + struct device_link *link; + + list_for_each_entry_rcu(link, &dev->links.consumers, s_node, + device_links_read_lock_held()) + if (link->flags & DL_FLAG_PM_RUNTIME) { + pm_runtime_get_sync(link->consumer); + __pm_runtime_get_consumers(link->consumer); + } +} + +/** + * pm_runtime_get_consumers - Resume and reference-count consumer devices. + * @dev: Supplier device. + */ +void pm_runtime_get_consumers(struct device *dev) +{ + int idx; + + idx = device_links_read_lock(); + + __pm_runtime_get_consumers(dev); + + device_links_read_unlock(idx); +} + +static void __pm_runtime_put_consumers(struct device *dev) +{ + struct device_link *link; + + list_for_each_entry_rcu(link, &dev->links.consumers, s_node, + device_links_read_lock_held()) + if (link->flags & DL_FLAG_PM_RUNTIME) { + pm_runtime_put(link->consumer); + __pm_runtime_put_consumers(link->consumer); + } +} + +/** + * pm_runtime_put_consumers - Drop references to consumer devices. + * @dev: Supplier device. + */ +void pm_runtime_put_consumers(struct device *dev) +{ + int idx; + + idx = device_links_read_lock(); + + __pm_runtime_put_consumers(dev); + + device_links_read_unlock(idx); +} + void pm_runtime_new_link(struct device *dev) { spin_lock_irq(&dev->power.lock); diff --git a/include/linux/pm_runtime.h b/include/linux/pm_runtime.h index d39dc863f612fe18dc34182117f87908d63c8e6d..151c885a3f421f09509232f144618da62297d61d 100644 --- a/include/linux/pm_runtime.h +++ b/include/linux/pm_runtime.h @@ -89,6 +89,8 @@ extern u64 pm_runtime_autosuspend_expiration(struct device *dev); extern void pm_runtime_set_memalloc_noio(struct device *dev, bool enable); extern void pm_runtime_get_suppliers(struct device *dev); extern void pm_runtime_put_suppliers(struct device *dev); +extern void pm_runtime_get_consumers(struct device *dev); +extern void pm_runtime_put_consumers(struct device *dev); extern void pm_runtime_new_link(struct device *dev); extern void pm_runtime_drop_link(struct device_link *link); extern void pm_runtime_release_supplier(struct device_link *link);
The runtime PM core currently allows to runtime resume/suspend a device, or its suppliers. Let's make it also possible to runtime resume/suspend consumers. Consumers and suppliers are seen here through the description made by device_links. Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> --- drivers/base/power/runtime.c | 54 ++++++++++++++++++++++++++++++++++++++++++++ include/linux/pm_runtime.h | 2 ++ 2 files changed, 56 insertions(+)