mbox series

[v1,0/2] PM: runtime: Update device status before letting suppliers suspend (another take)

Message ID 5448054.DvuYhMxLoT@kreacher
Headers show
Series PM: runtime: Update device status before letting suppliers suspend (another take) | expand

Message

Rafael J. Wysocki March 18, 2021, 6:06 p.m. UTC
Hi All,

The previous attempt to address the issue tackled by this series, commit
44cc89f76464 ("PM: runtime: Update device status before letting suppliers
suspend") was incorrect, because it introduced a rather nasty race condition
into __rpm_callback(), so let's revert it (patch [1/2]).

Instead, let's avoid suspending the suppliers immediately after dropping the
PM-runtime references to them and do that later, when the status of the
consumer has changed to RPM_SUSPENDED.

Please see the patch changelogs for details.

Elaine, please test this series on the system where you saw the original
problem.

Thanks!

Comments

Ulf Hansson March 19, 2021, 1:29 p.m. UTC | #1
On Thu, 18 Mar 2021 at 19:15, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>

> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

>

> Because the PM-runtime status of the device is not updated in

> __rpm_callback(), attempts to suspend the suppliers of the given

> device triggered by the rpm_put_suppliers() call in there may fail.

>

> To fix this (1) modify __rpm_callback() to avoid attempting to

> actually suspend the suppliers, but only decrease their PM-runtime

> usage counters and (2) make rpm_suspend() try to suspend the suppliers

> after changing the device's PM-runtime status, in analogy with the

> handling of the device's parent.

>

> Link: https://lore.kernel.org/linux-pm/CAPDyKFqm06KDw_p8WXsM4dijDbho4bb6T4k50UqqvR1_COsp8g@mail.gmail.com/

> Fixes: 21d5c57b3726 ("PM / runtime: Use device links")

> Reported-by: elaine.zhang <zhangqing@rock-chips.com>

> Diagnosed-by: Ulf Hansson <ulf.hansson@linaro.org>

> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


Just a minor nitpick, see below. In any case:

Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>


> ---

>  drivers/base/power/runtime.c |   45 +++++++++++++++++++++++++++++++++++++------

>  1 file changed, 39 insertions(+), 6 deletions(-)

>

> Index: linux-pm/drivers/base/power/runtime.c

> ===================================================================

> --- linux-pm.orig/drivers/base/power/runtime.c

> +++ linux-pm/drivers/base/power/runtime.c

> @@ -305,7 +305,7 @@ static int rpm_get_suppliers(struct devi

>         return 0;

>  }

>

> -static void rpm_put_suppliers(struct device *dev)

> +static void __rpm_put_suppliers(struct device *dev, bool try_to_suspend)

>  {

>         struct device_link *link;

>

> @@ -313,10 +313,30 @@ static void rpm_put_suppliers(struct dev

>                                 device_links_read_lock_held()) {

>

>                 while (refcount_dec_not_one(&link->rpm_active))

> -                       pm_runtime_put(link->supplier);

> +                       pm_runtime_put_noidle(link->supplier);

> +

> +               if (try_to_suspend)

> +                       pm_request_idle(link->supplier);

>         }

>  }

>

> +static void rpm_put_suppliers(struct device *dev)

> +{

> +       __rpm_put_suppliers(dev, true);

> +}

> +

> +static void rpm_try_to_suspend_suppliers(struct device *dev)


Maybe "rpm_suspend_suppliers" is sufficient for the name of the
function, but I have no strong opinion.

> +{

> +       struct device_link *link;

> +       int idx = device_links_read_lock();

> +

> +       list_for_each_entry_rcu(link, &dev->links.suppliers, c_node,

> +                               device_links_read_lock_held())

> +               pm_request_idle(link->supplier);

> +

> +       device_links_read_unlock(idx);

> +}

> +


[...]

Kind regards
Uffe
Ulf Hansson March 19, 2021, 1:29 p.m. UTC | #2
On Thu, 18 Mar 2021 at 19:15, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>

> From: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>

>

> Revert commit 44cc89f76464 ("PM: runtime: Update device status

> before letting suppliers suspend") that introduced a race condition

> into __rpm_callback() which allowed a concurrent rpm_resume() to

> run and resume the device prematurely after its status had been

> changed to RPM_SUSPENDED by __rpm_callback().


Huh, the code path is not entirely easy to follow. :-)

Did you find this by code inspection or did you get an error report?

>

> Fixes: 44cc89f76464 ("PM: runtime: Update device status before letting suppliers suspend")

> Link: https://lore.kernel.org/linux-pm/24dfb6fc-5d54-6ee2-9195-26428b7ecf8a@intel.com/

> Reported by: Adrian Hunter <adrian.hunter@intel.com>

> Cc: 4.10+ <stable@vger.kernel.org> # 4.10+

> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>


Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>


Kind regards
Uffe


> ---

>  drivers/base/power/runtime.c | 62 +++++++++++++++---------------------

>  1 file changed, 25 insertions(+), 37 deletions(-)

>

> diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c

> index 18b82427d0cb..a46a7e30881b 100644

> --- a/drivers/base/power/runtime.c

> +++ b/drivers/base/power/runtime.c

> @@ -325,22 +325,22 @@ static void rpm_put_suppliers(struct device *dev)

>  static int __rpm_callback(int (*cb)(struct device *), struct device *dev)

>         __releases(&dev->power.lock) __acquires(&dev->power.lock)

>  {

> -       bool use_links = dev->power.links_count > 0;

> -       bool get = false;

>         int retval, idx;

> -       bool put;

> +       bool use_links = dev->power.links_count > 0;

>

>         if (dev->power.irq_safe) {

>                 spin_unlock(&dev->power.lock);

> -       } else if (!use_links) {

> -               spin_unlock_irq(&dev->power.lock);

>         } else {

> -               get = dev->power.runtime_status == RPM_RESUMING;

> -

>                 spin_unlock_irq(&dev->power.lock);

>

> -               /* Resume suppliers if necessary. */

> -               if (get) {

> +               /*

> +                * Resume suppliers if necessary.

> +                *

> +                * The device's runtime PM status cannot change until this

> +                * routine returns, so it is safe to read the status outside of

> +                * the lock.

> +                */

> +               if (use_links && dev->power.runtime_status == RPM_RESUMING) {

>                         idx = device_links_read_lock();

>

>                         retval = rpm_get_suppliers(dev);

> @@ -355,36 +355,24 @@ static int __rpm_callback(int (*cb)(struct device *), struct device *dev)

>

>         if (dev->power.irq_safe) {

>                 spin_lock(&dev->power.lock);

> -               return retval;

> -       }

> -

> -       spin_lock_irq(&dev->power.lock);

> -

> -       if (!use_links)

> -               return retval;

> -

> -       /*

> -        * If the device is suspending and the callback has returned success,

> -        * drop the usage counters of the suppliers that have been reference

> -        * counted on its resume.

> -        *

> -        * Do that if the resume fails too.

> -        */

> -       put = dev->power.runtime_status == RPM_SUSPENDING && !retval;

> -       if (put)

> -               __update_runtime_status(dev, RPM_SUSPENDED);

> -       else

> -               put = get && retval;

> -

> -       if (put) {

> -               spin_unlock_irq(&dev->power.lock);

> -

> -               idx = device_links_read_lock();

> +       } else {

> +               /*

> +                * If the device is suspending and the callback has returned

> +                * success, drop the usage counters of the suppliers that have

> +                * been reference counted on its resume.

> +                *

> +                * Do that if resume fails too.

> +                */

> +               if (use_links

> +                   && ((dev->power.runtime_status == RPM_SUSPENDING && !retval)

> +                   || (dev->power.runtime_status == RPM_RESUMING && retval))) {

> +                       idx = device_links_read_lock();

>

> -fail:

> -               rpm_put_suppliers(dev);

> + fail:

> +                       rpm_put_suppliers(dev);

>

> -               device_links_read_unlock(idx);

> +                       device_links_read_unlock(idx);

> +               }

>

>                 spin_lock_irq(&dev->power.lock);

>         }

> --

> 2.26.2

>

>

>

>
Rafael J. Wysocki March 19, 2021, 1:43 p.m. UTC | #3
On Fri, Mar 19, 2021 at 2:31 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
>

> On Thu, 18 Mar 2021 at 19:15, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:

> >

> > From: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>

> >

> > Revert commit 44cc89f76464 ("PM: runtime: Update device status

> > before letting suppliers suspend") that introduced a race condition

> > into __rpm_callback() which allowed a concurrent rpm_resume() to

> > run and resume the device prematurely after its status had been

> > changed to RPM_SUSPENDED by __rpm_callback().

>

> Huh, the code path is not entirely easy to follow. :-)

>

> Did you find this by code inspection or did you get an error report?


There was a bug report that caused me to look at the code once again.

> > Fixes: 44cc89f76464 ("PM: runtime: Update device status before letting suppliers suspend")

> > Link: https://lore.kernel.org/linux-pm/24dfb6fc-5d54-6ee2-9195-26428b7ecf8a@intel.com/

> > Reported by: Adrian Hunter <adrian.hunter@intel.com>

> > Cc: 4.10+ <stable@vger.kernel.org> # 4.10+

> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

>

> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>


Thanks!
Rafael J. Wysocki March 19, 2021, 1:48 p.m. UTC | #4
On Fri, Mar 19, 2021 at 2:30 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
>

> On Thu, 18 Mar 2021 at 19:15, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:

> >

> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

> >

> > Because the PM-runtime status of the device is not updated in

> > __rpm_callback(), attempts to suspend the suppliers of the given

> > device triggered by the rpm_put_suppliers() call in there may fail.

> >

> > To fix this (1) modify __rpm_callback() to avoid attempting to

> > actually suspend the suppliers, but only decrease their PM-runtime

> > usage counters and (2) make rpm_suspend() try to suspend the suppliers

> > after changing the device's PM-runtime status, in analogy with the

> > handling of the device's parent.

> >

> > Link: https://lore.kernel.org/linux-pm/CAPDyKFqm06KDw_p8WXsM4dijDbho4bb6T4k50UqqvR1_COsp8g@mail.gmail.com/

> > Fixes: 21d5c57b3726 ("PM / runtime: Use device links")

> > Reported-by: elaine.zhang <zhangqing@rock-chips.com>

> > Diagnosed-by: Ulf Hansson <ulf.hansson@linaro.org>

> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

>

> Just a minor nitpick, see below. In any case:

>

> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>


Thanks!

>

> > ---

> >  drivers/base/power/runtime.c |   45 +++++++++++++++++++++++++++++++++++++------

> >  1 file changed, 39 insertions(+), 6 deletions(-)

> >

> > Index: linux-pm/drivers/base/power/runtime.c

> > ===================================================================

> > --- linux-pm.orig/drivers/base/power/runtime.c

> > +++ linux-pm/drivers/base/power/runtime.c

> > @@ -305,7 +305,7 @@ static int rpm_get_suppliers(struct devi

> >         return 0;

> >  }

> >

> > -static void rpm_put_suppliers(struct device *dev)

> > +static void __rpm_put_suppliers(struct device *dev, bool try_to_suspend)

> >  {

> >         struct device_link *link;

> >

> > @@ -313,10 +313,30 @@ static void rpm_put_suppliers(struct dev

> >                                 device_links_read_lock_held()) {

> >

> >                 while (refcount_dec_not_one(&link->rpm_active))

> > -                       pm_runtime_put(link->supplier);

> > +                       pm_runtime_put_noidle(link->supplier);

> > +

> > +               if (try_to_suspend)

> > +                       pm_request_idle(link->supplier);

> >         }

> >  }

> >

> > +static void rpm_put_suppliers(struct device *dev)

> > +{

> > +       __rpm_put_suppliers(dev, true);

> > +}

> > +

> > +static void rpm_try_to_suspend_suppliers(struct device *dev)

>

> Maybe "rpm_suspend_suppliers" is sufficient for the name of the

> function, but I have no strong opinion.


OK

In addition to this, spin_unlock_irq()/spin_lock_irq() need to be used
around the call to it in rpm_suspend(), so I'll send a v2.  I guess
that the R-by still applies, though. :-)

> > +{

> > +       struct device_link *link;

> > +       int idx = device_links_read_lock();

> > +

> > +       list_for_each_entry_rcu(link, &dev->links.suppliers, c_node,

> > +                               device_links_read_lock_held())

> > +               pm_request_idle(link->supplier);

> > +

> > +       device_links_read_unlock(idx);

> > +}

> > +