diff mbox

PM regression with commit 5de85b9d57ab PM runtime re-init in v4.5-rc1

Message ID CAPDyKFras1o13WBZzaZ_Snnj5TBQQFWKr=PtCjUvwe+yGPQn9w@mail.gmail.com
State New
Headers show

Commit Message

Ulf Hansson Feb. 2, 2016, 10:42 a.m. UTC
>

> Instead of the suggested approaches, I think the regression should be

> fixed at the PM domain level (omap hwmod). I have attached a patch

> below, please give it try as it's untested.


Realized that version 1 would actually *only* make the PM domain code
to deal with pre-enabled devices. It would still invoke the driver's
->runtime_resume() callbacks (via pm_generic_runtime_resume())  for
these scenarios.

That's not what we want. So I moved the checks to the actual
->runtime_resume() callback in the PM domain, instead of in the
omap_device_enable() function. A version 2 is attached. Please give it
at try.

[...]

From: Ulf Hansson <ulf.hansson@linaro.org>

Date: Tue, 2 Feb 2016 10:05:39 +0100
Subject: [PATCH V2] ARM: OMAP2+: omap-device: Allow devices to be pre-enabled

Commit 5de85b9d57ab ("PM / runtime: Re-init runtime PM states at probe
error and driver unbind") may re-initialize the runtime PM status of the
device to RPM_SUSPENDED at driver probe failures.

For the omap_hsmmc and likely also other omap drivers, which needs more
than one attempt to ->probe() (returning -EPROBE_DEFER), this commit
causes a regression at the PM domain level (omap hwmod).

The reason is that the drivers don't put back the device into low power
state while bailing out in ->probe to return -EPROBE_DEFER. This leads to
that pm_runtime_reinit() in driver core, is re-initializing the runtime PM
status from RPM_ACTIVE to RPM_SUSPENDED.

The next ->probe() attempt then triggers the ->runtime_resume() callback
to be invoked, which means this happens two times in a row. At the PM
domain level (omap hwmod) this is being treated as an error and thus the
runtime PM status of the device isn't correctly synchronized with the
runtime PM core.

In the end, ->probe() anyway succeeds (as the driver don't checks the
error code from the runtime PM APIs), but results in that the PM domain
always stays powered on. This because of the runtime PM core believes the
device is RPM_SUSPENDED.

Fix this regression by allowing devices to be pre-enabled when the PM
domain's (omap hwmod) ->runtime_resume() callback is requested to enable
the device. In such cases, return zero to synchronize the runtime PM
status with the runtime PM core.

Fixes: 5de85b9d57ab ("PM / runtime: Re-init runtime PM states at probe
error and driver unbind")
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

---

Changes in v2:
        -Prevent a driver's ->runtime_resume() callbacks from being invoked for
        a pre-enabled device.

---
 arch/arm/mach-omap2/omap_device.c | 9 +++++++++
 1 file changed, 9 insertions(+)

-- 
1.9.1
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Ulf Hansson Feb. 2, 2016, 8:47 p.m. UTC | #1
On 2 February 2016 at 17:35, Tony Lindgren <tony@atomide.com> wrote:
> Hi,

>

> * Ulf Hansson <ulf.hansson@linaro.org> [160202 02:43]:

>>

>> For the omap_hsmmc and likely also other omap drivers, which needs more

>> than one attempt to ->probe() (returning -EPROBE_DEFER), this commit

>> causes a regression at the PM domain level (omap hwmod).

>>

>> The reason is that the drivers don't put back the device into low power

>> state while bailing out in ->probe to return -EPROBE_DEFER. This leads to

>> that pm_runtime_reinit() in driver core, is re-initializing the runtime PM

>> status from RPM_ACTIVE to RPM_SUSPENDED.

>

> Yup, that's the bug here. It seems that we never call the runtime_suspend

> callback at the end of a first failed device driver probe if the driver

> has set pm_runtime_use_autosuspend. Only rpm_idle runtime_idle callback

> gets called. So the device stays on.

>

> This does not happen if pm_runtime_dont_use_autosuspend() is added to

> the end of the device driver probe before pm_runtime_put_sync().


Thanks! It then confirms the second option I proposed.

>

>> The next ->probe() attempt then triggers the ->runtime_resume() callback

>> to be invoked, which means this happens two times in a row. At the PM

>> domain level (omap hwmod) this is being treated as an error and thus the

>> runtime PM status of the device isn't correctly synchronized with the

>> runtime PM core.

>

> That's a valid error though, let's not remove it. The reason why we

> call runtime_resume() twice is because runtime_suspend callback never

> gets called like I explain above.


This isn't an error, it's just a hickup in the synchronization of the
runtime PM status.

Very similar what happens at the first probe, when the driver core has
initialized the runtime PM status to RPM_SUSPENDED at the device
registration.

To me, it's the responsible of the PM domain to *help* with the
synchronization, not prevent it as it currently does.

>

>> In the end, ->probe() anyway succeeds (as the driver don't checks the

>> error code from the runtime PM APIs), but results in that the PM domain

>> always stays powered on. This because of the runtime PM core believes the

>> device is RPM_SUSPENDED.

>

> FYI, the following allows runtime_suspend callback to get called at the

> end of a failed driver probe so the hardware state matches the PM runtime

> state. Need to debug more.

>

> Regards,

>

> Tony

>

> 8< ------------

> --- a/drivers/mmc/host/omap_hsmmc.c

> +++ b/drivers/mmc/host/omap_hsmmc.c

> @@ -2232,6 +2232,7 @@ err_irq:

>                 dma_release_channel(host->tx_chan);

>         if (host->rx_chan)

>                 dma_release_channel(host->rx_chan);

> +       pm_runtime_dont_use_autosuspend(host->dev);


It's good know this works, although do you intend to fix this sequence
for all omap drivers/devices that's part of the hwmod PM domain?

I haven't checked the number of drivers this would affect, but I
imagine there could be quite many with similar behaviour and thus may
suffer from the same issue.

Of course the regression is only noticed for those returning
-EPROBE_DEFER, which might not be that many, but it seems fragile to
rely on this when going forward. All related drivers then needs to be
fixed.

>         pm_runtime_put_sync(host->dev);

>         pm_runtime_disable(host->dev);

>         if (host->dbclk)


Could you please test my version 2 of the patch I attached earlier. I
still believe it's the best way to solve the regression, if it works
of course. :-)

Kind regards
Uffe
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ulf Hansson Feb. 3, 2016, 2:58 p.m. UTC | #2
On 3 February 2016 at 13:18, Rafael J. Wysocki <rafael@kernel.org> wrote:
> On Wed, Feb 3, 2016 at 11:25 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote:

>>>

>>> I don't see the problem, but of course you know omap for better than I do.

>>>

>>> So if you are concerned about this, perhaps adding a dev_dbg print

>>> when the omap hwmod's ->runtime_suspend() callback returns zero could

>>

>> /s/runtime_suspend/runtime_resume

>

> OK

>

> Let me summarize my understanding of this thread so far.

>

> It looks like the omap3 code initializes hardware in ->probe() and

> then it may return -EPROBE_DEFER due to some unmet dependencies.  In

> that case the hardware is not reset to the previous state and the

> runtime PM framework is left in the state that corresponds to the

> current hardware state.  Before we had pm_runtime_reinit(), everything

> worked as expected on the second ->probe() call, because things were

> in sync then.


Correct!

Before pm_runtime_reinit(), the failing probe case (-EPROBE_DEFER)
worked fine because the HW state and the runtime PM status was in
sync. The device was powered and the runtime PM status was RPM_ACTIVE.

>

> The introduction of pm_runtime_reinit() changed the situation and now

> effectively the hardware is required to be reset to the initial state

> if -EPROBE_DEFER is to be returned too.


Not correct. The hardware doesn't need a reset as it stays powered
after a failed probe.

Instead, only the runtime PM status needs to be synchronized with the
HW state the next probe attempt.

In other words, when the PM domain's ->runtime_resume() callbacks gets
called due to a pm_runtime_get_sync() in the second probe attempt, it
may find that the HW is already powered and can return zero instead of
resetting it.

Kind regards
Uffe
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/arch/arm/mach-omap2/omap_device.c
b/arch/arm/mach-omap2/omap_device.c
index 0437537..1ad390b 100644
--- a/arch/arm/mach-omap2/omap_device.c
+++ b/arch/arm/mach-omap2/omap_device.c
@@ -599,8 +599,17 @@  static int _od_runtime_suspend(struct device *dev)
 static int _od_runtime_resume(struct device *dev)
 {
        struct platform_device *pdev = to_platform_device(dev);
+       struct omap_device *od = to_omap_device(pdev);
        int ret;

+       /*
+        * From the PM domain perspective the device may already be enabled.
+        * In such case, let's return zero to synchronize the state with the
+        * runtime PM core.
+       */
+       if (od->_state == OMAP_DEVICE_STATE_ENABLED)
+               return 0;
+
        ret = omap_device_enable(pdev);
        if (ret)
                return ret;