diff mbox series

[v1,1/5] PM: sleep: Fix runtime PM issue in dpm_resume()

Message ID 20241114220921.2529905-2-saravanak@google.com
State New
Headers show
Series Optimize async device suspend/resume | expand

Commit Message

Saravana Kannan Nov. 14, 2024, 10:09 p.m. UTC
Some devices might have their is_suspended flag set to false. In these
cases, dpm_resume() should skip doing anything for those devices.
However, runtime PM enable and a few others steps are done before
checking for this flag. Fix it so that we do things in the right order.

Signed-off-by: Saravana Kannan <saravanak@google.com>
---
 drivers/base/power/main.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Comments

Greg KH Nov. 16, 2024, 7:43 a.m. UTC | #1
On Thu, Nov 14, 2024 at 02:09:15PM -0800, Saravana Kannan wrote:
> Some devices might have their is_suspended flag set to false. In these
> cases, dpm_resume() should skip doing anything for those devices.
> However, runtime PM enable and a few others steps are done before
> checking for this flag. Fix it so that we do things in the right order.
> 
> Signed-off-by: Saravana Kannan <saravanak@google.com>

This looks like a nice generic fix as well, should it go to older
kernels?

thanks,

greg k-h
Saravana Kannan Nov. 16, 2024, 9:06 p.m. UTC | #2
On Fri, Nov 15, 2024 at 11:43 PM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> On Thu, Nov 14, 2024 at 02:09:15PM -0800, Saravana Kannan wrote:
> > Some devices might have their is_suspended flag set to false. In these
> > cases, dpm_resume() should skip doing anything for those devices.
> > However, runtime PM enable and a few others steps are done before
> > checking for this flag. Fix it so that we do things in the right order.
> >
> > Signed-off-by: Saravana Kannan <saravanak@google.com>
>
> This looks like a nice generic fix as well, should it go to older
> kernels?

Yeah, it's meant to be a generic fix. But I'll feel better about it if
someone familiar with this code can give it a Reviewed-by.

And as I look at it... I have a bug in there. I think it should be
goto Complete and not Unlock! No idea how my devices didn't get freed
after a few suspend aborts!

I can send it out as a separate patch too if you want. And depending
on when it lands, I can keep it or drop it from v2 of this series.

Thanks,
Saravana
Rafael J. Wysocki Dec. 4, 2024, 12:53 p.m. UTC | #3
Trim CC list.

On Thu, Nov 14, 2024 at 11:09 PM Saravana Kannan <saravanak@google.com> wrote:
>
> Some devices might have their is_suspended flag set to false. In these
> cases, dpm_resume() should skip doing anything for those devices.

Not really.  This is particularly untrue for devices with
power.direct_complete set that have power.is_suspended clear.

> However, runtime PM enable and a few others steps are done before
> checking for this flag. Fix it so that we do things in the right order.

I don't see the bug this is fixing, but I can see bugs introduced by it.

I think that you want power.is_suspended to be checked before waiting
for the superiors.  Fair enough, since for devices with
power.is_suspended clear, there should be no superiors to wait for, so
the two checks can be done in any order and checking
power.is_suspended first would be less overhead.  And that's it
AFAICS.

> Signed-off-by: Saravana Kannan <saravanak@google.com>
> ---
>  drivers/base/power/main.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> index 4a67e83300e1..86e51b9fefab 100644
> --- a/drivers/base/power/main.c
> +++ b/drivers/base/power/main.c
> @@ -913,6 +913,9 @@ static void device_resume(struct device *dev, pm_message_t state, bool async)
>         if (dev->power.syscore)
>                 goto Complete;
>
> +       if (!dev->power.is_suspended)
> +               goto Unlock;
> +
>         if (dev->power.direct_complete) {
>                 /* Match the pm_runtime_disable() in __device_suspend(). */
>                 pm_runtime_enable(dev);
> @@ -931,9 +934,6 @@ static void device_resume(struct device *dev, pm_message_t state, bool async)
>          */
>         dev->power.is_prepared = false;
>
> -       if (!dev->power.is_suspended)
> -               goto Unlock;
> -
>         if (dev->pm_domain) {
>                 info = "power domain ";
>                 callback = pm_op(&dev->pm_domain->ops, state);
> --
> 2.47.0.338.g60cca15819-goog
>
Rafael J. Wysocki March 11, 2025, 10:47 a.m. UTC | #4
On Wed, Dec 4, 2024 at 1:53 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> Trim CC list.
>
> On Thu, Nov 14, 2024 at 11:09 PM Saravana Kannan <saravanak@google.com> wrote:
> >
> > Some devices might have their is_suspended flag set to false. In these
> > cases, dpm_resume() should skip doing anything for those devices.
>
> Not really.  This is particularly untrue for devices with
> power.direct_complete set that have power.is_suspended clear.
>
> > However, runtime PM enable and a few others steps are done before
> > checking for this flag. Fix it so that we do things in the right order.
>
> I don't see the bug this is fixing, but I can see bugs introduced by it.

So AFAICS the bug is in the error path when dpm_suspend() fails in
which case some devices with direct_complete set may not have been
handled by device_suspend().  Since runtime PM has not been disabled
for those devices yet, it doesn't make sense to re-enable runtime PM
for them (and if they had runtime PM disabled to start with, this will
inadvertently enable runtime PM for them).

However, two changes are needed to fix this issue:
(1) power.is_suspended needs to be set for the devices with
direct_complete set in device_suspend().
(2) The power.is_suspended check needs to be moved after the
power.syscore one in device_resume().

The patch below only does (2) which is insufficient and it introduces
a functional issue for the direct_complete devices with runtime PM
disabled because it will cause runtime PM to remain disabled for them
permanently.

> I think that you want power.is_suspended to be checked before waiting
> for the superiors.  Fair enough, since for devices with
> power.is_suspended clear, there should be no superiors to wait for, so
> the two checks can be done in any order and checking
> power.is_suspended first would be less overhead.  And that's it
> AFAICS.
>
> > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > ---
> >  drivers/base/power/main.c | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> > index 4a67e83300e1..86e51b9fefab 100644
> > --- a/drivers/base/power/main.c
> > +++ b/drivers/base/power/main.c
> > @@ -913,6 +913,9 @@ static void device_resume(struct device *dev, pm_message_t state, bool async)
> >         if (dev->power.syscore)
> >                 goto Complete;
> >
> > +       if (!dev->power.is_suspended)
> > +               goto Unlock;

And this should be "goto Complete" because jumping to Unlock
introduces a device locking imbalance.

> > +
> >         if (dev->power.direct_complete) {
> >                 /* Match the pm_runtime_disable() in __device_suspend(). */
> >                 pm_runtime_enable(dev);
> > @@ -931,9 +934,6 @@ static void device_resume(struct device *dev, pm_message_t state, bool async)
> >          */
> >         dev->power.is_prepared = false;
> >
> > -       if (!dev->power.is_suspended)
> > -               goto Unlock;
> > -
> >         if (dev->pm_domain) {
> >                 info = "power domain ";
> >                 callback = pm_op(&dev->pm_domain->ops, state);
> > --

If you want to submit a new version of this patch, please do so by the
end of the week or I will send my fix because I want this issue to be
addressed in 6.15.

BTW, please note that this is orthogonal to the recent async
suspend-resume series

https://lore.kernel.org/linux-pm/13709135.uLZWGnKmhe@rjwysocki.net/

so there is no reason why it should be addressed in that series.
Saravana Kannan March 13, 2025, 1:49 a.m. UTC | #5
On Tue, Mar 11, 2025 at 3:47 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Wed, Dec 4, 2024 at 1:53 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > Trim CC list.
> >
> > On Thu, Nov 14, 2024 at 11:09 PM Saravana Kannan <saravanak@google.com> wrote:
> > >
> > > Some devices might have their is_suspended flag set to false. In these
> > > cases, dpm_resume() should skip doing anything for those devices.
> >
> > Not really.  This is particularly untrue for devices with
> > power.direct_complete set that have power.is_suspended clear.
> >
> > > However, runtime PM enable and a few others steps are done before
> > > checking for this flag. Fix it so that we do things in the right order.
> >
> > I don't see the bug this is fixing, but I can see bugs introduced by it.
>
> So AFAICS the bug is in the error path when dpm_suspend() fails in
> which case some devices with direct_complete set may not have been
> handled by device_suspend().  Since runtime PM has not been disabled
> for those devices yet, it doesn't make sense to re-enable runtime PM
> for them (and if they had runtime PM disabled to start with, this will
> inadvertently enable runtime PM for them).
>
> However, two changes are needed to fix this issue:
> (1) power.is_suspended needs to be set for the devices with
> direct_complete set in device_suspend().
> (2) The power.is_suspended check needs to be moved after the
> power.syscore one in device_resume().
>
> The patch below only does (2) which is insufficient and it introduces
> a functional issue for the direct_complete devices with runtime PM
> disabled because it will cause runtime PM to remain disabled for them
> permanently.
>
> > I think that you want power.is_suspended to be checked before waiting
> > for the superiors.  Fair enough, since for devices with
> > power.is_suspended clear, there should be no superiors to wait for, so
> > the two checks can be done in any order and checking
> > power.is_suspended first would be less overhead.  And that's it
> > AFAICS.
> >
> > > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > > ---
> > >  drivers/base/power/main.c | 6 +++---
> > >  1 file changed, 3 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> > > index 4a67e83300e1..86e51b9fefab 100644
> > > --- a/drivers/base/power/main.c
> > > +++ b/drivers/base/power/main.c
> > > @@ -913,6 +913,9 @@ static void device_resume(struct device *dev, pm_message_t state, bool async)
> > >         if (dev->power.syscore)
> > >                 goto Complete;
> > >
> > > +       if (!dev->power.is_suspended)
> > > +               goto Unlock;
>
> And this should be "goto Complete" because jumping to Unlock
> introduces a device locking imbalance.
>
> > > +
> > >         if (dev->power.direct_complete) {
> > >                 /* Match the pm_runtime_disable() in __device_suspend(). */
> > >                 pm_runtime_enable(dev);
> > > @@ -931,9 +934,6 @@ static void device_resume(struct device *dev, pm_message_t state, bool async)
> > >          */
> > >         dev->power.is_prepared = false;
> > >
> > > -       if (!dev->power.is_suspended)
> > > -               goto Unlock;
> > > -
> > >         if (dev->pm_domain) {
> > >                 info = "power domain ";
> > >                 callback = pm_op(&dev->pm_domain->ops, state);
> > > --
>
> If you want to submit a new version of this patch, please do so by the
> end of the week or I will send my fix because I want this issue to be
> addressed in 6.15.

Please do ahead with the fix for this. I'm not too comfortable with
the direct_complete logic yet.

-Saravana

>
> BTW, please note that this is orthogonal to the recent async
> suspend-resume series
>
> https://lore.kernel.org/linux-pm/13709135.uLZWGnKmhe@rjwysocki.net/
>
> so there is no reason why it should be addressed in that series.
Rafael J. Wysocki March 13, 2025, 10:58 a.m. UTC | #6
On Thu, Mar 13, 2025 at 2:50 AM Saravana Kannan <saravanak@google.com> wrote:
>
> On Tue, Mar 11, 2025 at 3:47 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > On Wed, Dec 4, 2024 at 1:53 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > >
> > > Trim CC list.
> > >
> > > On Thu, Nov 14, 2024 at 11:09 PM Saravana Kannan <saravanak@google.com> wrote:
> > > >
> > > > Some devices might have their is_suspended flag set to false. In these
> > > > cases, dpm_resume() should skip doing anything for those devices.
> > >
> > > Not really.  This is particularly untrue for devices with
> > > power.direct_complete set that have power.is_suspended clear.
> > >
> > > > However, runtime PM enable and a few others steps are done before
> > > > checking for this flag. Fix it so that we do things in the right order.
> > >
> > > I don't see the bug this is fixing, but I can see bugs introduced by it.
> >
> > So AFAICS the bug is in the error path when dpm_suspend() fails in
> > which case some devices with direct_complete set may not have been
> > handled by device_suspend().  Since runtime PM has not been disabled
> > for those devices yet, it doesn't make sense to re-enable runtime PM
> > for them (and if they had runtime PM disabled to start with, this will
> > inadvertently enable runtime PM for them).
> >
> > However, two changes are needed to fix this issue:
> > (1) power.is_suspended needs to be set for the devices with
> > direct_complete set in device_suspend().
> > (2) The power.is_suspended check needs to be moved after the
> > power.syscore one in device_resume().
> >
> > The patch below only does (2) which is insufficient and it introduces
> > a functional issue for the direct_complete devices with runtime PM
> > disabled because it will cause runtime PM to remain disabled for them
> > permanently.
> >
> > > I think that you want power.is_suspended to be checked before waiting
> > > for the superiors.  Fair enough, since for devices with
> > > power.is_suspended clear, there should be no superiors to wait for, so
> > > the two checks can be done in any order and checking
> > > power.is_suspended first would be less overhead.  And that's it
> > > AFAICS.
> > >
> > > > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > > > ---
> > > >  drivers/base/power/main.c | 6 +++---
> > > >  1 file changed, 3 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> > > > index 4a67e83300e1..86e51b9fefab 100644
> > > > --- a/drivers/base/power/main.c
> > > > +++ b/drivers/base/power/main.c
> > > > @@ -913,6 +913,9 @@ static void device_resume(struct device *dev, pm_message_t state, bool async)
> > > >         if (dev->power.syscore)
> > > >                 goto Complete;
> > > >
> > > > +       if (!dev->power.is_suspended)
> > > > +               goto Unlock;
> >
> > And this should be "goto Complete" because jumping to Unlock
> > introduces a device locking imbalance.
> >
> > > > +
> > > >         if (dev->power.direct_complete) {
> > > >                 /* Match the pm_runtime_disable() in __device_suspend(). */
> > > >                 pm_runtime_enable(dev);
> > > > @@ -931,9 +934,6 @@ static void device_resume(struct device *dev, pm_message_t state, bool async)
> > > >          */
> > > >         dev->power.is_prepared = false;
> > > >
> > > > -       if (!dev->power.is_suspended)
> > > > -               goto Unlock;
> > > > -
> > > >         if (dev->pm_domain) {
> > > >                 info = "power domain ";
> > > >                 callback = pm_op(&dev->pm_domain->ops, state);
> > > > --
> >
> > If you want to submit a new version of this patch, please do so by the
> > end of the week or I will send my fix because I want this issue to be
> > addressed in 6.15.
>
> Please do ahead with the fix for this. I'm not too comfortable with
> the direct_complete logic yet.

OK, I will, thank you!
Pavel Machek March 14, 2025, 8:47 p.m. UTC | #7
Hi!

> Some devices might have their is_suspended flag set to false. In these
> cases, dpm_resume() should skip doing anything for those devices.
> However, runtime PM enable and a few others steps are done before
> checking for this flag. Fix it so that we do things in the right order.
> 
> Signed-off-by: Saravana Kannan <saravanak@google.com>
> ---
>  drivers/base/power/main.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> index 4a67e83300e1..86e51b9fefab 100644
> --- a/drivers/base/power/main.c
> +++ b/drivers/base/power/main.c
> @@ -913,6 +913,9 @@ static void device_resume(struct device *dev, pm_message_t state, bool async)
>  	if (dev->power.syscore)
>  		goto Complete;
>  
> +	if (!dev->power.is_suspended)
> +		goto Unlock;
> +

This needs to be goto Complete, right?
								Pavel
diff mbox series

Patch

diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index 4a67e83300e1..86e51b9fefab 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -913,6 +913,9 @@  static void device_resume(struct device *dev, pm_message_t state, bool async)
 	if (dev->power.syscore)
 		goto Complete;
 
+	if (!dev->power.is_suspended)
+		goto Unlock;
+
 	if (dev->power.direct_complete) {
 		/* Match the pm_runtime_disable() in __device_suspend(). */
 		pm_runtime_enable(dev);
@@ -931,9 +934,6 @@  static void device_resume(struct device *dev, pm_message_t state, bool async)
 	 */
 	dev->power.is_prepared = false;
 
-	if (!dev->power.is_suspended)
-		goto Unlock;
-
 	if (dev->pm_domain) {
 		info = "power domain ";
 		callback = pm_op(&dev->pm_domain->ops, state);