Message ID | 08f3bd66d7fc8e218bb6958777f342786b2c3705.1731554471.git.len.brown@intel.com |
---|---|
State | New |
Headers | show |
Series | [RFC/RFT] PM: sleep: Ignore device driver suspend() callback return values | expand |
Expanded CC list. On Thu, Nov 14, 2024 at 4:23 AM Len Brown <lenb@kernel.org> wrote: > > From: Len Brown <len.brown@intel.com> > > Drivers commonly return non-zero values from their suspend > callbacks due to transient errors, not realizing that doing so > aborts system-wide suspend. > > Log, but do not abort system suspend on non-zero return values > from driver's .suspend/.suspend_noirq/.suspend_late callbacks. > > Both before and after this patch, the correct method for a > device driver to abort system-wide suspend is to invoke > pm_system_wakeup() during the suspend flow. > > Legacy behaviour can be restored by adding this line to your .config: > CONFIG_PM_SLEEP_LEGACY_CALLBACK_ABORT=y > > Signed-off-by: Len Brown <len.brown@intel.com> > --- > Documentation/power/driver_api.rst | 25 +++++++++++++++++++++++++ > Documentation/power/index.rst | 1 + > drivers/base/power/main.c | 25 ++++++++++++++++++------- > kernel/power/Kconfig | 17 +++++++++++++++++ > 4 files changed, 61 insertions(+), 7 deletions(-) > create mode 100644 Documentation/power/driver_api.rst > > diff --git a/Documentation/power/driver_api.rst b/Documentation/power/driver_api.rst > new file mode 100644 > index 000000000000..b9a46a17f39b > --- /dev/null > +++ b/Documentation/power/driver_api.rst > @@ -0,0 +1,25 @@ > +.. SPDX-License-Identifier: GPL-2.0+ > + > +================== > +Driver Suspend API > +================== > + > + > +1. How Can A driver abort system suspend? > +----------------------------------------- > + > +Any driver can abort system-wide by invoking pm_system_wakeup() > +during the suspend flow. > + > +ie. from the drivers suspend callbacks: > + .suspend() > + .suspend_noirq() > + .suspend_late() > + > +Alternatively, if CONFIG_PM_SLEEP_LEGACY_CALLBACK_ABORT=y is present in .config, > +then any non-zero return value from any device drivers callback: > + .suspend() > + .suspend_noirq() > + .suspend_late() > +will abort the system-wide suspend flow. > +Note that CONFIG_PM_SLEEP_LEGACY_CALLBACK_ABORT=n, by default. > diff --git a/Documentation/power/index.rst b/Documentation/power/index.rst > index a0f5244fb427..f655662a9c15 100644 > --- a/Documentation/power/index.rst > +++ b/Documentation/power/index.rst > @@ -7,6 +7,7 @@ Power Management > .. toctree:: > :maxdepth: 1 > > + driver_api > apm-acpi > basic-pm-debugging > charger-manager > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c > index 4a67e83300e1..1b4ab73112e4 100644 > --- a/drivers/base/power/main.c > +++ b/drivers/base/power/main.c > @@ -1244,7 +1244,6 @@ static int device_suspend_noirq(struct device *dev, pm_message_t state, bool asy > Run: > error = dpm_run_callback(callback, dev, state, info); > if (error) { > - async_error = error; > dpm_save_failed_dev(dev_name(dev)); > pm_dev_err(dev, state, async ? " async noirq" : " noirq", error); > goto Complete; > @@ -1270,7 +1269,12 @@ static int device_suspend_noirq(struct device *dev, pm_message_t state, bool asy > Complete: > complete_all(&dev->power.completion); > TRACE_SUSPEND(error); > - return error; > + > + if (IS_ENABLED(CONFIG_PM_SLEEP_LEGACY_CALLBACK_ABORT)) { > + async_error = error; > + return error; > + } > + return 0; > } > > static void async_suspend_noirq(void *data, async_cookie_t cookie) > @@ -1424,7 +1428,6 @@ static int device_suspend_late(struct device *dev, pm_message_t state, bool asyn > Run: > error = dpm_run_callback(callback, dev, state, info); > if (error) { > - async_error = error; > dpm_save_failed_dev(dev_name(dev)); > pm_dev_err(dev, state, async ? " async late" : " late", error); > goto Complete; > @@ -1437,7 +1440,12 @@ static int device_suspend_late(struct device *dev, pm_message_t state, bool asyn > Complete: > TRACE_SUSPEND(error); > complete_all(&dev->power.completion); > - return error; > + > + if (IS_ENABLED(CONFIG_PM_SLEEP_LEGACY_CALLBACK_ABORT)) { > + async_error = error; > + return error; > + } > + return 0; > } > > static void async_suspend_late(void *data, async_cookie_t cookie) > @@ -1681,7 +1689,7 @@ static int device_suspend(struct device *dev, pm_message_t state, bool async) > error = dpm_run_callback(callback, dev, state, info); > > End: > - if (!error) { > + if (!error || !IS_ENABLED(CONFIG_PM_SLEEP_LEGACY_CALLBACK_ABORT)) { > dev->power.is_suspended = true; > if (device_may_wakeup(dev)) > dev->power.wakeup_path = true; > @@ -1695,14 +1703,17 @@ static int device_suspend(struct device *dev, pm_message_t state, bool async) > > Complete: > if (error) { > - async_error = error; > dpm_save_failed_dev(dev_name(dev)); > pm_dev_err(dev, state, async ? " async" : "", error); > } > > complete_all(&dev->power.completion); > TRACE_SUSPEND(error); > - return error; > + if (IS_ENABLED(CONFIG_PM_SLEEP_LEGACY_CALLBACK_ABORT)) { > + async_error = error; > + return error; > + } > + return 0; > } > > static void async_suspend(void *data, async_cookie_t cookie) > diff --git a/kernel/power/Kconfig b/kernel/power/Kconfig > index afce8130d8b9..db120bba0826 100644 > --- a/kernel/power/Kconfig > +++ b/kernel/power/Kconfig > @@ -141,6 +141,23 @@ config PM_SLEEP > depends on SUSPEND || HIBERNATE_CALLBACKS > select PM > > +config PM_SLEEP_LEGACY_CALLBACK_ABORT > + bool "Enable legacy callback abort via return value" > + depends on PM_SLEEP > + help > + This option enables the legacy API for device .suspend() callbacks. > + That API empowered any driver to abort system-wide suspend > + by returning any non-zero value from its suspend calbacks: > + (.suspend/.suspend_noirq/.suspend_late) > + In practice, these aborts are almost always spurious and unwanted. > + > + Disabling this option (default) ignores .suspend() callback return values, > + though they are still traced and logged. > + > + The proper way for a device driver to abort system-wide suspend is to > + invoke pm_system_wakeup() during the suspend flow. This method is > + valid, independent of this config option. > + > config PM_SLEEP_SMP > def_bool y > depends on SMP > -- I'm wondering if there are any opinions on this. IMV, drivers returning errors from their suspend callbacks without a sufficiently serious reason are kind of a problem.
On Thu, Dec 05, 2024 at 12:55:08PM +0100, Rafael J. Wysocki wrote: > Expanded CC list. > > On Thu, Nov 14, 2024 at 4:23 AM Len Brown <lenb@kernel.org> wrote: > > > > From: Len Brown <len.brown@intel.com> > > > > Drivers commonly return non-zero values from their suspend > > callbacks due to transient errors, not realizing that doing so > > aborts system-wide suspend. > > > > Log, but do not abort system suspend on non-zero return values > > from driver's .suspend/.suspend_noirq/.suspend_late callbacks. > > > > Both before and after this patch, the correct method for a > > device driver to abort system-wide suspend is to invoke > > pm_system_wakeup() during the suspend flow. > > > > Legacy behaviour can be restored by adding this line to your .config: > > CONFIG_PM_SLEEP_LEGACY_CALLBACK_ABORT=y > > > > Signed-off-by: Len Brown <len.brown@intel.com> > > --- > > I'm wondering if there are any opinions on this. > > IMV, drivers returning errors from their suspend callbacks without a > sufficiently serious reason are kind of a problem. There is a least one driver whose suspend callback returns an error if the device is enabled for wakeup and a wakeup event occurs during the suspend procedure. We don't want to ignore those races. Alan Stern
On Thu, 5 Dec 2024 at 16:09, Alan Stern <stern@rowland.harvard.edu> wrote: > > On Thu, Dec 05, 2024 at 12:55:08PM +0100, Rafael J. Wysocki wrote: > > Expanded CC list. > > > > On Thu, Nov 14, 2024 at 4:23 AM Len Brown <lenb@kernel.org> wrote: > > > > > > From: Len Brown <len.brown@intel.com> > > > > > > Drivers commonly return non-zero values from their suspend > > > callbacks due to transient errors, not realizing that doing so > > > aborts system-wide suspend. > > > > > > Log, but do not abort system suspend on non-zero return values > > > from driver's .suspend/.suspend_noirq/.suspend_late callbacks. > > > > > > Both before and after this patch, the correct method for a > > > device driver to abort system-wide suspend is to invoke > > > pm_system_wakeup() during the suspend flow. > > > > > > Legacy behaviour can be restored by adding this line to your .config: > > > CONFIG_PM_SLEEP_LEGACY_CALLBACK_ABORT=y > > > > > > Signed-off-by: Len Brown <len.brown@intel.com> > > > --- > > > > > I'm wondering if there are any opinions on this. > > > > IMV, drivers returning errors from their suspend callbacks without a > > sufficiently serious reason are kind of a problem. > > There is a least one driver whose suspend callback returns an error if > the device is enabled for wakeup and a wakeup event occurs during the > suspend procedure. We don't want to ignore those races. > > Alan Stern Right. I also think this looks a bit risky as the current behaviour has really been there for a long time. Who knows what depends on this. A way forward could be to implement the change as an opt-in thing, rather than an opt-out. That would allow us to test it and see how it plays to potentially change the default behaviour down the road. Kind regards Uffe
On Thu, Dec 5, 2024 at 10:09 AM Alan Stern <stern@rowland.harvard.edu> wrote: > > IMV, drivers returning errors from their suspend callbacks without a > > sufficiently serious reason are kind of a problem. > > There is a least one driver whose suspend callback returns an error if > the device is enabled for wakeup and a wakeup event occurs during the > suspend procedure. We don't want to ignore those races. That driver should invoke pm_system_wakeup() on that wakeup event, right? Len Brown, Intel Open Source Technology Center
On Thu, Dec 5, 2024 at 10:33 AM Ulf Hansson <ulf.hansson@linaro.org> wrote: > ...I also think this looks a bit risky as the current behaviour > has really been there for a long time. Who knows what depends on this. If everything were working 100% of the time, no risk would be justified because no improvement is possible. But we run over 1,000,000 suspend resume cycles per release in our lab, and this issue as a category, is the single most common failure. Worse, there is a huge population of drivers, and we can't possibly test them all into correctness. Every release this issue crops when another driver hiccups in response to some device specific transient issue. The current implementation is not a viable design. > A way forward could be to implement the change as an opt-in thing, > rather than an opt-out. That would allow us to test it and see how it > plays to potentially change the default behaviour down the road. The default is the only configuration that matters. Len Brown, Intel Open Source Technology Center
On Thu, Dec 05, 2024 at 12:24:29PM -0500, Len Brown wrote: > On Thu, Dec 5, 2024 at 10:09 AM Alan Stern <stern@rowland.harvard.edu> wrote: > > > > IMV, drivers returning errors from their suspend callbacks without a > > > sufficiently serious reason are kind of a problem. > > > > There is a least one driver whose suspend callback returns an error if > > the device is enabled for wakeup and a wakeup event occurs during the > > suspend procedure. We don't want to ignore those races. > > That driver should invoke pm_system_wakeup() on that wakeup event, right? Should it? When the system isn't yet suspended? Is this documented anywhere? There's still a race. Suppose a wakeup-enabled device signals an interrupt and the interrupt handler runs just before the suspend-resume transition begins. The handler might end up calling pm_system_wakeup() before the suspend starts, which won't accomplish anything, while the work queue that takes care of the device's events might get frozen (as part of the suspend transition) before it handles the new request. As a result, the wakeup event will remain in limbo, not aborting the suspend, not causing an immediate resume, and not getting handled until something else wakes up the system. Alan Stern
On 12/5/24 09:36, Len Brown wrote: > On Thu, Dec 5, 2024 at 10:33 AM Ulf Hansson <ulf.hansson@linaro.org> wrote: > >> ...I also think this looks a bit risky as the current behaviour >> has really been there for a long time. Who knows what depends on this. > > If everything were working 100% of the time, no risk would be justified > because no improvement is possible. > > But we run over 1,000,000 suspend resume cycles per release in our lab, > and this issue as a category, is the single most common failure. But you are starting to enter the big number category here, eventually something is going to fail with that many iterations. How was this 1 million iterations determined to be a good pass/fail criteria and just not an arbitrarily high number intended to shake off issues? Surely with such a big number you start getting an idea of which specific drivers within your test devices tend to fail to suspend? FWIW, with the products I work with, which are mainly set-top-box devices, we just set a pass/fail criteria at 100k which is essentially assuming there will be 27 suspend/resume cycles per day for the next 10 years, given the lifespan of the products, that seemed way overboard, realistically there is going to be more like 2-3 suspend/resume cycles per day. > > Worse, there is a huge population of drivers, and we can't possibly test > them all into correctness. Every release this issue crops when another > driver hiccups in response to some device specific transient issue. > > The current implementation is not a viable design. Neither is this approach because it assumes that drivers that need to abort the system suspend call pm_system_wakeup(), which most do not, they return -EBUSY or something like that. There is a total of 12 or so drivers calling pm_system_wakeup(), that's not the majority. How about you flipped the logic around, introduce an option that lets you ignore the suspend callback return value gated by a Kconfig option?
On Thu, Dec 5, 2024 at 9:57 AM Florian Fainelli <florian.fainelli@broadcom.com> wrote: > > On 12/5/24 09:36, Len Brown wrote: > > On Thu, Dec 5, 2024 at 10:33 AM Ulf Hansson <ulf.hansson@linaro.org> wrote: > > > >> ...I also think this looks a bit risky as the current behaviour > >> has really been there for a long time. Who knows what depends on this. > > > > If everything were working 100% of the time, no risk would be justified > > because no improvement is possible. > > > But we run over 1,000,000 suspend resume cycles per release in our lab, > > and this issue as a category, is the single most common failure. > > But you are starting to enter the big number category here, eventually > something is going to fail with that many iterations. > > How was this 1 million iterations determined to be a good pass/fail > criteria and just not an arbitrarily high number intended to shake off > issues? Surely with such a big number you start getting an idea of which > specific drivers within your test devices tend to fail to suspend? > > FWIW, with the products I work with, which are mainly set-top-box > devices, we just set a pass/fail criteria at 100k which is essentially > assuming there will be 27 suspend/resume cycles per day for the next 10 > years, given the lifespan of the products, that seemed way overboard, > realistically there is going to be more like 2-3 suspend/resume cycles > per day. > > > > > Worse, there is a huge population of drivers, and we can't possibly test > > them all into correctness. Every release this issue crops when another > > driver hiccups in response to some device specific transient issue. > > > > The current implementation is not a viable design. > > Neither is this approach because it assumes that drivers that need to > abort the system suspend call pm_system_wakeup(), which most do not, Agree completely with Ulf and Florian. This patch series is way too optimistic in thinking that all drivers are/should be calling pm_system_wakeup(). I'd argue it makes more sense and is intuitive to return an error if a device suspend fails and have the framework act on it immediately than setting some global flag and then looking at a later point in the framework. Based on the rest of the emails in this thread, this seems more like a driver problem than a framework issue. If you see a driver that's returning an error when it shouldn't, fix it. If you are testing this only on some specific set of hardware to claim it works without problems, then you can also just go fix the drivers for that specific set of hardware and call it a day. If that's "too many drivers to fix", then the amount of untested drivers surely is orders of magnitude more than that. At best this should be a driver specific flag (why not just fix the driver in that case) or a commandline arg that's default disabled. Then whoever wants to use this sledgehammer for their hardware can do so without affecting others. -Saravana > they return -EBUSY or something like that. There is a total of 12 or so > drivers calling pm_system_wakeup(), that's not the majority. > > How about you flipped the logic around, introduce an option that lets > you ignore the suspend callback return value gated by a Kconfig option? > -- > Florian >
diff --git a/Documentation/power/driver_api.rst b/Documentation/power/driver_api.rst new file mode 100644 index 000000000000..b9a46a17f39b --- /dev/null +++ b/Documentation/power/driver_api.rst @@ -0,0 +1,25 @@ +.. SPDX-License-Identifier: GPL-2.0+ + +================== +Driver Suspend API +================== + + +1. How Can A driver abort system suspend? +----------------------------------------- + +Any driver can abort system-wide by invoking pm_system_wakeup() +during the suspend flow. + +ie. from the drivers suspend callbacks: + .suspend() + .suspend_noirq() + .suspend_late() + +Alternatively, if CONFIG_PM_SLEEP_LEGACY_CALLBACK_ABORT=y is present in .config, +then any non-zero return value from any device drivers callback: + .suspend() + .suspend_noirq() + .suspend_late() +will abort the system-wide suspend flow. +Note that CONFIG_PM_SLEEP_LEGACY_CALLBACK_ABORT=n, by default. diff --git a/Documentation/power/index.rst b/Documentation/power/index.rst index a0f5244fb427..f655662a9c15 100644 --- a/Documentation/power/index.rst +++ b/Documentation/power/index.rst @@ -7,6 +7,7 @@ Power Management .. toctree:: :maxdepth: 1 + driver_api apm-acpi basic-pm-debugging charger-manager diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c index 4a67e83300e1..1b4ab73112e4 100644 --- a/drivers/base/power/main.c +++ b/drivers/base/power/main.c @@ -1244,7 +1244,6 @@ static int device_suspend_noirq(struct device *dev, pm_message_t state, bool asy Run: error = dpm_run_callback(callback, dev, state, info); if (error) { - async_error = error; dpm_save_failed_dev(dev_name(dev)); pm_dev_err(dev, state, async ? " async noirq" : " noirq", error); goto Complete; @@ -1270,7 +1269,12 @@ static int device_suspend_noirq(struct device *dev, pm_message_t state, bool asy Complete: complete_all(&dev->power.completion); TRACE_SUSPEND(error); - return error; + + if (IS_ENABLED(CONFIG_PM_SLEEP_LEGACY_CALLBACK_ABORT)) { + async_error = error; + return error; + } + return 0; } static void async_suspend_noirq(void *data, async_cookie_t cookie) @@ -1424,7 +1428,6 @@ static int device_suspend_late(struct device *dev, pm_message_t state, bool asyn Run: error = dpm_run_callback(callback, dev, state, info); if (error) { - async_error = error; dpm_save_failed_dev(dev_name(dev)); pm_dev_err(dev, state, async ? " async late" : " late", error); goto Complete; @@ -1437,7 +1440,12 @@ static int device_suspend_late(struct device *dev, pm_message_t state, bool asyn Complete: TRACE_SUSPEND(error); complete_all(&dev->power.completion); - return error; + + if (IS_ENABLED(CONFIG_PM_SLEEP_LEGACY_CALLBACK_ABORT)) { + async_error = error; + return error; + } + return 0; } static void async_suspend_late(void *data, async_cookie_t cookie) @@ -1681,7 +1689,7 @@ static int device_suspend(struct device *dev, pm_message_t state, bool async) error = dpm_run_callback(callback, dev, state, info); End: - if (!error) { + if (!error || !IS_ENABLED(CONFIG_PM_SLEEP_LEGACY_CALLBACK_ABORT)) { dev->power.is_suspended = true; if (device_may_wakeup(dev)) dev->power.wakeup_path = true; @@ -1695,14 +1703,17 @@ static int device_suspend(struct device *dev, pm_message_t state, bool async) Complete: if (error) { - async_error = error; dpm_save_failed_dev(dev_name(dev)); pm_dev_err(dev, state, async ? " async" : "", error); } complete_all(&dev->power.completion); TRACE_SUSPEND(error); - return error; + if (IS_ENABLED(CONFIG_PM_SLEEP_LEGACY_CALLBACK_ABORT)) { + async_error = error; + return error; + } + return 0; } static void async_suspend(void *data, async_cookie_t cookie) diff --git a/kernel/power/Kconfig b/kernel/power/Kconfig index afce8130d8b9..db120bba0826 100644 --- a/kernel/power/Kconfig +++ b/kernel/power/Kconfig @@ -141,6 +141,23 @@ config PM_SLEEP depends on SUSPEND || HIBERNATE_CALLBACKS select PM +config PM_SLEEP_LEGACY_CALLBACK_ABORT + bool "Enable legacy callback abort via return value" + depends on PM_SLEEP + help + This option enables the legacy API for device .suspend() callbacks. + That API empowered any driver to abort system-wide suspend + by returning any non-zero value from its suspend calbacks: + (.suspend/.suspend_noirq/.suspend_late) + In practice, these aborts are almost always spurious and unwanted. + + Disabling this option (default) ignores .suspend() callback return values, + though they are still traced and logged. + + The proper way for a device driver to abort system-wide suspend is to + invoke pm_system_wakeup() during the suspend flow. This method is + valid, independent of this config option. + config PM_SLEEP_SMP def_bool y depends on SMP