Message ID | 20241114220921.2529905-2-saravanak@google.com |
---|---|
State | New |
Headers | show |
Series | Optimize async device suspend/resume | expand |
On Thu, Nov 14, 2024 at 02:09:15PM -0800, Saravana Kannan wrote: > Some devices might have their is_suspended flag set to false. In these > cases, dpm_resume() should skip doing anything for those devices. > However, runtime PM enable and a few others steps are done before > checking for this flag. Fix it so that we do things in the right order. > > Signed-off-by: Saravana Kannan <saravanak@google.com> This looks like a nice generic fix as well, should it go to older kernels? thanks, greg k-h
On Fri, Nov 15, 2024 at 11:43 PM Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > > On Thu, Nov 14, 2024 at 02:09:15PM -0800, Saravana Kannan wrote: > > Some devices might have their is_suspended flag set to false. In these > > cases, dpm_resume() should skip doing anything for those devices. > > However, runtime PM enable and a few others steps are done before > > checking for this flag. Fix it so that we do things in the right order. > > > > Signed-off-by: Saravana Kannan <saravanak@google.com> > > This looks like a nice generic fix as well, should it go to older > kernels? Yeah, it's meant to be a generic fix. But I'll feel better about it if someone familiar with this code can give it a Reviewed-by. And as I look at it... I have a bug in there. I think it should be goto Complete and not Unlock! No idea how my devices didn't get freed after a few suspend aborts! I can send it out as a separate patch too if you want. And depending on when it lands, I can keep it or drop it from v2 of this series. Thanks, Saravana
Trim CC list. On Thu, Nov 14, 2024 at 11:09 PM Saravana Kannan <saravanak@google.com> wrote: > > Some devices might have their is_suspended flag set to false. In these > cases, dpm_resume() should skip doing anything for those devices. Not really. This is particularly untrue for devices with power.direct_complete set that have power.is_suspended clear. > However, runtime PM enable and a few others steps are done before > checking for this flag. Fix it so that we do things in the right order. I don't see the bug this is fixing, but I can see bugs introduced by it. I think that you want power.is_suspended to be checked before waiting for the superiors. Fair enough, since for devices with power.is_suspended clear, there should be no superiors to wait for, so the two checks can be done in any order and checking power.is_suspended first would be less overhead. And that's it AFAICS. > Signed-off-by: Saravana Kannan <saravanak@google.com> > --- > drivers/base/power/main.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c > index 4a67e83300e1..86e51b9fefab 100644 > --- a/drivers/base/power/main.c > +++ b/drivers/base/power/main.c > @@ -913,6 +913,9 @@ static void device_resume(struct device *dev, pm_message_t state, bool async) > if (dev->power.syscore) > goto Complete; > > + if (!dev->power.is_suspended) > + goto Unlock; > + > if (dev->power.direct_complete) { > /* Match the pm_runtime_disable() in __device_suspend(). */ > pm_runtime_enable(dev); > @@ -931,9 +934,6 @@ static void device_resume(struct device *dev, pm_message_t state, bool async) > */ > dev->power.is_prepared = false; > > - if (!dev->power.is_suspended) > - goto Unlock; > - > if (dev->pm_domain) { > info = "power domain "; > callback = pm_op(&dev->pm_domain->ops, state); > -- > 2.47.0.338.g60cca15819-goog >
On Wed, Dec 4, 2024 at 1:53 PM Rafael J. Wysocki <rafael@kernel.org> wrote: > > Trim CC list. > > On Thu, Nov 14, 2024 at 11:09 PM Saravana Kannan <saravanak@google.com> wrote: > > > > Some devices might have their is_suspended flag set to false. In these > > cases, dpm_resume() should skip doing anything for those devices. > > Not really. This is particularly untrue for devices with > power.direct_complete set that have power.is_suspended clear. > > > However, runtime PM enable and a few others steps are done before > > checking for this flag. Fix it so that we do things in the right order. > > I don't see the bug this is fixing, but I can see bugs introduced by it. So AFAICS the bug is in the error path when dpm_suspend() fails in which case some devices with direct_complete set may not have been handled by device_suspend(). Since runtime PM has not been disabled for those devices yet, it doesn't make sense to re-enable runtime PM for them (and if they had runtime PM disabled to start with, this will inadvertently enable runtime PM for them). However, two changes are needed to fix this issue: (1) power.is_suspended needs to be set for the devices with direct_complete set in device_suspend(). (2) The power.is_suspended check needs to be moved after the power.syscore one in device_resume(). The patch below only does (2) which is insufficient and it introduces a functional issue for the direct_complete devices with runtime PM disabled because it will cause runtime PM to remain disabled for them permanently. > I think that you want power.is_suspended to be checked before waiting > for the superiors. Fair enough, since for devices with > power.is_suspended clear, there should be no superiors to wait for, so > the two checks can be done in any order and checking > power.is_suspended first would be less overhead. And that's it > AFAICS. > > > Signed-off-by: Saravana Kannan <saravanak@google.com> > > --- > > drivers/base/power/main.c | 6 +++--- > > 1 file changed, 3 insertions(+), 3 deletions(-) > > > > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c > > index 4a67e83300e1..86e51b9fefab 100644 > > --- a/drivers/base/power/main.c > > +++ b/drivers/base/power/main.c > > @@ -913,6 +913,9 @@ static void device_resume(struct device *dev, pm_message_t state, bool async) > > if (dev->power.syscore) > > goto Complete; > > > > + if (!dev->power.is_suspended) > > + goto Unlock; And this should be "goto Complete" because jumping to Unlock introduces a device locking imbalance. > > + > > if (dev->power.direct_complete) { > > /* Match the pm_runtime_disable() in __device_suspend(). */ > > pm_runtime_enable(dev); > > @@ -931,9 +934,6 @@ static void device_resume(struct device *dev, pm_message_t state, bool async) > > */ > > dev->power.is_prepared = false; > > > > - if (!dev->power.is_suspended) > > - goto Unlock; > > - > > if (dev->pm_domain) { > > info = "power domain "; > > callback = pm_op(&dev->pm_domain->ops, state); > > -- If you want to submit a new version of this patch, please do so by the end of the week or I will send my fix because I want this issue to be addressed in 6.15. BTW, please note that this is orthogonal to the recent async suspend-resume series https://lore.kernel.org/linux-pm/13709135.uLZWGnKmhe@rjwysocki.net/ so there is no reason why it should be addressed in that series.
On Tue, Mar 11, 2025 at 3:47 AM Rafael J. Wysocki <rafael@kernel.org> wrote: > > On Wed, Dec 4, 2024 at 1:53 PM Rafael J. Wysocki <rafael@kernel.org> wrote: > > > > Trim CC list. > > > > On Thu, Nov 14, 2024 at 11:09 PM Saravana Kannan <saravanak@google.com> wrote: > > > > > > Some devices might have their is_suspended flag set to false. In these > > > cases, dpm_resume() should skip doing anything for those devices. > > > > Not really. This is particularly untrue for devices with > > power.direct_complete set that have power.is_suspended clear. > > > > > However, runtime PM enable and a few others steps are done before > > > checking for this flag. Fix it so that we do things in the right order. > > > > I don't see the bug this is fixing, but I can see bugs introduced by it. > > So AFAICS the bug is in the error path when dpm_suspend() fails in > which case some devices with direct_complete set may not have been > handled by device_suspend(). Since runtime PM has not been disabled > for those devices yet, it doesn't make sense to re-enable runtime PM > for them (and if they had runtime PM disabled to start with, this will > inadvertently enable runtime PM for them). > > However, two changes are needed to fix this issue: > (1) power.is_suspended needs to be set for the devices with > direct_complete set in device_suspend(). > (2) The power.is_suspended check needs to be moved after the > power.syscore one in device_resume(). > > The patch below only does (2) which is insufficient and it introduces > a functional issue for the direct_complete devices with runtime PM > disabled because it will cause runtime PM to remain disabled for them > permanently. > > > I think that you want power.is_suspended to be checked before waiting > > for the superiors. Fair enough, since for devices with > > power.is_suspended clear, there should be no superiors to wait for, so > > the two checks can be done in any order and checking > > power.is_suspended first would be less overhead. And that's it > > AFAICS. > > > > > Signed-off-by: Saravana Kannan <saravanak@google.com> > > > --- > > > drivers/base/power/main.c | 6 +++--- > > > 1 file changed, 3 insertions(+), 3 deletions(-) > > > > > > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c > > > index 4a67e83300e1..86e51b9fefab 100644 > > > --- a/drivers/base/power/main.c > > > +++ b/drivers/base/power/main.c > > > @@ -913,6 +913,9 @@ static void device_resume(struct device *dev, pm_message_t state, bool async) > > > if (dev->power.syscore) > > > goto Complete; > > > > > > + if (!dev->power.is_suspended) > > > + goto Unlock; > > And this should be "goto Complete" because jumping to Unlock > introduces a device locking imbalance. > > > > + > > > if (dev->power.direct_complete) { > > > /* Match the pm_runtime_disable() in __device_suspend(). */ > > > pm_runtime_enable(dev); > > > @@ -931,9 +934,6 @@ static void device_resume(struct device *dev, pm_message_t state, bool async) > > > */ > > > dev->power.is_prepared = false; > > > > > > - if (!dev->power.is_suspended) > > > - goto Unlock; > > > - > > > if (dev->pm_domain) { > > > info = "power domain "; > > > callback = pm_op(&dev->pm_domain->ops, state); > > > -- > > If you want to submit a new version of this patch, please do so by the > end of the week or I will send my fix because I want this issue to be > addressed in 6.15. Please do ahead with the fix for this. I'm not too comfortable with the direct_complete logic yet. -Saravana > > BTW, please note that this is orthogonal to the recent async > suspend-resume series > > https://lore.kernel.org/linux-pm/13709135.uLZWGnKmhe@rjwysocki.net/ > > so there is no reason why it should be addressed in that series.
On Thu, Mar 13, 2025 at 2:50 AM Saravana Kannan <saravanak@google.com> wrote: > > On Tue, Mar 11, 2025 at 3:47 AM Rafael J. Wysocki <rafael@kernel.org> wrote: > > > > On Wed, Dec 4, 2024 at 1:53 PM Rafael J. Wysocki <rafael@kernel.org> wrote: > > > > > > Trim CC list. > > > > > > On Thu, Nov 14, 2024 at 11:09 PM Saravana Kannan <saravanak@google.com> wrote: > > > > > > > > Some devices might have their is_suspended flag set to false. In these > > > > cases, dpm_resume() should skip doing anything for those devices. > > > > > > Not really. This is particularly untrue for devices with > > > power.direct_complete set that have power.is_suspended clear. > > > > > > > However, runtime PM enable and a few others steps are done before > > > > checking for this flag. Fix it so that we do things in the right order. > > > > > > I don't see the bug this is fixing, but I can see bugs introduced by it. > > > > So AFAICS the bug is in the error path when dpm_suspend() fails in > > which case some devices with direct_complete set may not have been > > handled by device_suspend(). Since runtime PM has not been disabled > > for those devices yet, it doesn't make sense to re-enable runtime PM > > for them (and if they had runtime PM disabled to start with, this will > > inadvertently enable runtime PM for them). > > > > However, two changes are needed to fix this issue: > > (1) power.is_suspended needs to be set for the devices with > > direct_complete set in device_suspend(). > > (2) The power.is_suspended check needs to be moved after the > > power.syscore one in device_resume(). > > > > The patch below only does (2) which is insufficient and it introduces > > a functional issue for the direct_complete devices with runtime PM > > disabled because it will cause runtime PM to remain disabled for them > > permanently. > > > > > I think that you want power.is_suspended to be checked before waiting > > > for the superiors. Fair enough, since for devices with > > > power.is_suspended clear, there should be no superiors to wait for, so > > > the two checks can be done in any order and checking > > > power.is_suspended first would be less overhead. And that's it > > > AFAICS. > > > > > > > Signed-off-by: Saravana Kannan <saravanak@google.com> > > > > --- > > > > drivers/base/power/main.c | 6 +++--- > > > > 1 file changed, 3 insertions(+), 3 deletions(-) > > > > > > > > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c > > > > index 4a67e83300e1..86e51b9fefab 100644 > > > > --- a/drivers/base/power/main.c > > > > +++ b/drivers/base/power/main.c > > > > @@ -913,6 +913,9 @@ static void device_resume(struct device *dev, pm_message_t state, bool async) > > > > if (dev->power.syscore) > > > > goto Complete; > > > > > > > > + if (!dev->power.is_suspended) > > > > + goto Unlock; > > > > And this should be "goto Complete" because jumping to Unlock > > introduces a device locking imbalance. > > > > > > + > > > > if (dev->power.direct_complete) { > > > > /* Match the pm_runtime_disable() in __device_suspend(). */ > > > > pm_runtime_enable(dev); > > > > @@ -931,9 +934,6 @@ static void device_resume(struct device *dev, pm_message_t state, bool async) > > > > */ > > > > dev->power.is_prepared = false; > > > > > > > > - if (!dev->power.is_suspended) > > > > - goto Unlock; > > > > - > > > > if (dev->pm_domain) { > > > > info = "power domain "; > > > > callback = pm_op(&dev->pm_domain->ops, state); > > > > -- > > > > If you want to submit a new version of this patch, please do so by the > > end of the week or I will send my fix because I want this issue to be > > addressed in 6.15. > > Please do ahead with the fix for this. I'm not too comfortable with > the direct_complete logic yet. OK, I will, thank you!
Hi! > Some devices might have their is_suspended flag set to false. In these > cases, dpm_resume() should skip doing anything for those devices. > However, runtime PM enable and a few others steps are done before > checking for this flag. Fix it so that we do things in the right order. > > Signed-off-by: Saravana Kannan <saravanak@google.com> > --- > drivers/base/power/main.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c > index 4a67e83300e1..86e51b9fefab 100644 > --- a/drivers/base/power/main.c > +++ b/drivers/base/power/main.c > @@ -913,6 +913,9 @@ static void device_resume(struct device *dev, pm_message_t state, bool async) > if (dev->power.syscore) > goto Complete; > > + if (!dev->power.is_suspended) > + goto Unlock; > + This needs to be goto Complete, right? Pavel
diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c index 4a67e83300e1..86e51b9fefab 100644 --- a/drivers/base/power/main.c +++ b/drivers/base/power/main.c @@ -913,6 +913,9 @@ static void device_resume(struct device *dev, pm_message_t state, bool async) if (dev->power.syscore) goto Complete; + if (!dev->power.is_suspended) + goto Unlock; + if (dev->power.direct_complete) { /* Match the pm_runtime_disable() in __device_suspend(). */ pm_runtime_enable(dev); @@ -931,9 +934,6 @@ static void device_resume(struct device *dev, pm_message_t state, bool async) */ dev->power.is_prepared = false; - if (!dev->power.is_suspended) - goto Unlock; - if (dev->pm_domain) { info = "power domain "; callback = pm_op(&dev->pm_domain->ops, state);
Some devices might have their is_suspended flag set to false. In these cases, dpm_resume() should skip doing anything for those devices. However, runtime PM enable and a few others steps are done before checking for this flag. Fix it so that we do things in the right order. Signed-off-by: Saravana Kannan <saravanak@google.com> --- drivers/base/power/main.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)