diff mbox

PM regression with commit 5de85b9d57ab PM runtime re-init in v4.5-rc1

Message ID CAPDyKFqQb+jtTpZbq6EvbfAk28yVkgdtiOswngNH_BjCWBDxFg@mail.gmail.com
State New
Headers show

Commit Message

Ulf Hansson Feb. 2, 2016, 10:07 a.m. UTC
On 2 February 2016 at 04:05, Tony Lindgren <tony@atomide.com> wrote:
> Hi,

>

> * Alan Stern <stern@rowland.harvard.edu> [160201 15:50]:

>>

>> You get here only if runtime PM is disabled, right?  So the rpm_idle

>> call won't do anything -- "disabled" means don't make any callbacks.

>

> Hmm sorry yes I'm confused again. Yeah it looks like calling rpm_idle

> just has a side effect that makes a difference here.

>

>> Tony, exactly what are you trying to do here?  Do you want this to

>> invoke a runtime-PM callback in the subsystem, power domain, or driver?

>> (Is there even a driver bound to the device when this function runs?)

>

> I guess I need to add more printks to figure out what's going on here.

> But yeah, I'm not seeing the callback happening at the interconnect

> level so hardware and PM runtime states won't match on the following

> probe after -EPROBE_DEFER.

>

>> The function's name suggests that it merely resets the data stored in

>> dev->power without actually touching the hardware.  Is that what you

>> really want?

>

> I guess you mean pm_runtime_set_suspended() above? I'm seeing a state

> where we now set pm_runtime_set_suspended() between failed device

> probes and the device is still active in hardware.

>

> The patch below also helps with the problem and leaves out the

> rpm_suspend() call from loop so it might give more hints.

>

> The difference here from what Rafael suggested earlier is calling

> __pm_runtime_use_autosuspend() and then not calling

> pm_runtime_set_suspended().

>

> However, it seems the below patch keeps hardware active in the

> autoidle case though, so chances are there is more that needs to

> be done here. Anyways, I'll try to debug it more tomorrow.

>


Your observations is correct. The hardware will be kept active
in-between the probe attempts (and thus also if probing fails).
Although, that's not a regression as that's the behaviour you get from
runtime PM, when drivers are implemented like omap_hsmmc.

Instead of the suggested approaches, I think the regression should be
fixed at the PM domain level (omap hwmod). I have attached a patch
below, please give it try as it's untested.

To solve the other problem (allowing devices to become inactive
in-between at probe failures), I see two options (not treated as
regressions).
1)
Change the behaviour of pm_runtime_put_sync(), to *not* respect the
autosuspend mode.
I think I prefer this option, as it seems like autosuspend should be
respected only via the asynchronous runtime PM APIs.

2)
Change the failing drivers, to before calling pm_runtime_put_sync()
also invoke pm_runtime_dont_use_autosusend(). Or other similar
approach.

Kind regards
Uffe

[...]

From: Ulf Hansson <ulf.hansson@linaro.org>

Date: Tue, 2 Feb 2016 10:05:39 +0100
Subject: [PATCH] ARM: OMAP2+: omap-device: Allow devices to be pre-enabled

Commit 5de85b9d57ab ("PM / runtime: Re-init runtime PM states at probe
error and driver unbind") may re-initialize the runtime PM status of the
device to RPM_SUSPENDED at driver probe failures.

For the omap_hsmmc and likely also other omap drivers, which needs more
than one attempt to ->probe() (returning -EPROBE_DEFER), this commit
causes a regression at the PM domain level (omap hwmod).

The reason is that the drivers don't put back the device into low power
state while bailing out in ->probe to return -EPROBE_DEFER. This leads to
that pm_runtime_reinit() in driver core, is re-initializing the runtime PM
status from RPM_ACTIVE to RPM_SUSPENDED.

The next ->probe() attempt then triggers the ->runtime_resume() callback
to be invoked, which means this happens two times in a row. At the PM
domain level (omap hwmod) this is being treated as an error and thus the
runtime PM status of the device isn't correctly synchronized with the
runtime PM core.

In the end, ->probe() anyway succeeds (as the driver don't checks the
error code from the runtime PM APIs), but results in that the PM domain
always stays powered on. This because of the runtime PM core believes the
device is RPM_SUSPENDED.

Fix this regression by allowing devices to be pre-enabled when the PM
domain's (omap hwmod) ->runtime_resume() callback is requested to enable
the device. In such cases, return zero to synchronize the runtime PM
status with the runtime PM core.

Fixes: 5de85b9d57ab ("PM / runtime: Re-init runtime PM states at probe
error and driver unbind")
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

---
 arch/arm/mach-omap2/omap_device.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

-- 
1.9.1
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Ulf Hansson Feb. 3, 2016, 10:23 a.m. UTC | #1
On 3 February 2016 at 00:41, Tony Lindgren <tony@atomide.com> wrote:
> * Ulf Hansson <ulf.hansson@linaro.org> [160202 12:48]:

>> On 2 February 2016 at 17:35, Tony Lindgren <tony@atomide.com> wrote:

>> > That's a valid error though, let's not remove it. The reason why we

>> > call runtime_resume() twice is because runtime_suspend callback never

>> > gets called like I explain above.

>>

>> This isn't an error, it's just a hickup in the synchronization of the

>> runtime PM status.

>

> I'd rather not get the hardware state out of sync with PM runtime..

>

>> Very similar what happens at the first probe, when the driver core has

>> initialized the runtime PM status to RPM_SUSPENDED at the device

>> registration.

>

> Well we actually pretty much have devices in that state to start

> with.


That's not true, not even for omap.

There are definitely devices which HW state is reflected by RPM_ACTIVE
at boot. It's the responsible of the subsystem/driver (including PM
domains) to make sure the runtime PM status is updated accordingly, to
reflect the real HW state.

For omap hwmod, at device registration in omap_device_build_from_dt()
it may actually invoke pm_runtime_set_active() if the device is
enabled. To my understanding, that's done to synchronize the real HW
state with the runtime PM status, right?

>

>> To me, it's the responsible of the PM domain to *help* with the

>> synchronization, not prevent it as it currently does.

>

> The problem is that the hardware state gets out of sync with

> PM runtime. And that's going to be a pain to debug later on.


I don't see the problem, but of course you know omap for better than I do.

So if you are concerned about this, perhaps adding a dev_dbg print
when the omap hwmod's ->runtime_suspend() callback returns zero could
be a way forward?

>

>> > --- a/drivers/mmc/host/omap_hsmmc.c

>> > +++ b/drivers/mmc/host/omap_hsmmc.c

>> > @@ -2232,6 +2232,7 @@ err_irq:

>> >                 dma_release_channel(host->tx_chan);

>> >         if (host->rx_chan)

>> >                 dma_release_channel(host->rx_chan);

>> > +       pm_runtime_dont_use_autosuspend(host->dev);

>>

>> It's good know this works, although do you intend to fix this sequence

>> for all omap drivers/devices that's part of the hwmod PM domain?

>>

>> I haven't checked the number of drivers this would affect, but I

>> imagine there could be quite many with similar behaviour and thus may

>> suffer from the same issue.

>

> Yeah not sure what the right fix is. But I'd rather patch the

> few drivers using autosuspend if we come to the conclusion

> that there is no bug in PM runtime.

>

>> Could you please test my version 2 of the patch I attached earlier. I

>> still believe it's the best way to solve the regression, if it works

>> of course. :-)

>

> And I don't like it because of the reasons above :) But yeah

> gave it a quick try and that too works as expected.


Okay, thanks for testing!

Kind regards
Uffe
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ulf Hansson Feb. 3, 2016, 10:25 a.m. UTC | #2
>

> I don't see the problem, but of course you know omap for better than I do.

>

> So if you are concerned about this, perhaps adding a dev_dbg print

> when the omap hwmod's ->runtime_suspend() callback returns zero could


/s/runtime_suspend/runtime_resume

Kind regards
Uffe
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ulf Hansson Feb. 3, 2016, 6:02 p.m. UTC | #3
On 3 February 2016 at 17:27, Tony Lindgren <tony@atomide.com> wrote:
>

>> >> To me, it's the responsible of the PM domain to *help* with the

>> >> synchronization, not prevent it as it currently does.

>> >

>> > The problem is that the hardware state gets out of sync with

>> > PM runtime. And that's going to be a pain to debug later on.

>>

>> I don't see the problem, but of course you know omap for better than I do.

>

> Well there's also the long term maintenance aspect at least I

> need to consider.

>

>> So if you are concerned about this, perhaps adding a dev_dbg print

>> when the omap hwmod's ->runtime_suspend() callback returns zero could

>> be a way forward?

>

> If we downgrade it to a debug statement or a warning, we'll soon end

> up with even more driver specific warnings than we already have.

> And I don't want to be chasing people around to fix their drivers

> for eavery new driver that gets submitted.

>

> Also, without this error I would not even originally have noticed we

> have a problem :) So I suggest the following:


Well, I am actually not removing that existing warning in the
omap_device_enable(), as it's being called from other places as well.

It's just the runtime PM path that's being changed.

>

> 1. I'll do a series of patches to fix up the handful of omap

>    specific drivers with pm_runtime_use_autosuspend() that depend

>    on omap_device

>

> 2. I'll also do a patch to improve the omap_device error message

>    so new drivers are easy to fix. Something like:

>

>   "%() called from invalid state %d, use pm_runtime_put_sync_suspend()?"

>

> Does that sounds OK to you?


Sure.

>

> Regards,

>

> Tony


One more thing though. I just realized that you have yet another issue
to consider going for the approach fixing *only* drivers.

Let me summarize it here:

If userspace has prevented runtime PM (pm_runtime_forbid()) when a
driver becomes unbound, the driver will not be able to suspend the
device by using any of the pm_runtime_suspend() APIs, as the usage
count is isn't zero.

As pm_runtime_reinit() is invoked as part of the driver unbind
sequence, the runtime PM status goes out of sync. A following driver
rebind will then trigger the warning when the PM domain's
->runtime_resume() callback gets invoked. Again, forever preventing
the device from being runtime suspended.

How do you intend to solve this case?
I guess there are two options, pick up the patch I posted for omap
hwmod or make use of pm_runtime_force_suspend() in the driver.

Kind regards
Uffe
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ulf Hansson Feb. 3, 2016, 6:37 p.m. UTC | #4
On 3 February 2016 at 19:28, Tony Lindgren <tony@atomide.com> wrote:
> * Ulf Hansson <ulf.hansson@linaro.org> [160203 10:03]:

>>

>> One more thing though. I just realized that you have yet another issue

>> to consider going for the approach fixing *only* drivers.

>>

>> Let me summarize it here:

>>

>> If userspace has prevented runtime PM (pm_runtime_forbid()) when a

>> driver becomes unbound, the driver will not be able to suspend the

>> device by using any of the pm_runtime_suspend() APIs, as the usage

>> count is isn't zero.

>>

>> As pm_runtime_reinit() is invoked as part of the driver unbind

>> sequence, the runtime PM status goes out of sync. A following driver

>> rebind will then trigger the warning when the PM domain's

>> ->runtime_resume() callback gets invoked. Again, forever preventing

>> the device from being runtime suspended.

>

> Hmm yeah that's a good point.

>

>> How do you intend to solve this case?

>> I guess there are two options, pick up the patch I posted for omap

>> hwmod or make use of pm_runtime_force_suspend() in the driver.

>

> My gut feeling right now is we should just have

> BUS_NOTIFY_UNBIND_DRIVER shut down the device on the interconnect

> automatically as it's unused after the driver has unloaded :)


BUS_NOTIFY_UNBIND_DRIVER is sent prior the ->remove() callbacks is
invoked from driver core.
So if the driver requires to do a pm_runtime_get_sync() during
->remove() callback, this won't work.

BUS_NOTIFY_UNBOUND_DRIVER may work though.

Kind regards
Uffe
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/arch/arm/mach-omap2/omap_device.c
b/arch/arm/mach-omap2/omap_device.c
index 0437537..851767f 100644
--- a/arch/arm/mach-omap2/omap_device.c
+++ b/arch/arm/mach-omap2/omap_device.c
@@ -717,7 +717,7 @@  int omap_device_register(struct platform_device *pdev)
  * to be accessible and ready to operate.  This generally involves
  * enabling clocks, setting SYSCONFIG registers; and in the future may
  * involve remuxing pins.  Device drivers should call this function
- * indirectly via pm_runtime_get*().  Returns -EINVAL if called when
+ * indirectly via pm_runtime_get*().  Returns zero if called when
  * the omap_device is already enabled, or passes along the return
  * value of _omap_device_enable_hwmods().
  */
@@ -728,12 +728,13 @@  int omap_device_enable(struct platform_device *pdev)

        od = to_omap_device(pdev);

-       if (od->_state == OMAP_DEVICE_STATE_ENABLED) {
-               dev_warn(&pdev->dev,
-                        "omap_device: %s() called from invalid state %d\n",
-                        __func__, od->_state);
-               return -EINVAL;
-       }
+       /*
+        * From the PM domain perspective the device may already be enabled.
+        * In such case, let's return zero to synchronize the state with the
+        * runtime PM core.
+       */
+       if (od->_state == OMAP_DEVICE_STATE_ENABLED)
+               return 0;

        ret = _omap_device_enable_hwmods(od);