diff mbox

[4/4] PM / Runtime: Don't allow to suspend a device with an active child

Message ID CAPDyKFoxSiCrfbWRGpSxpwYuL_5okARjqum25nZVe0KyOg_QYg@mail.gmail.com
State New
Headers show

Commit Message

Ulf Hansson Oct. 25, 2016, 9:31 p.m. UTC
On 25 October 2016 at 15:59, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> Hi Ulf,

>

> On Mon, Oct 17, 2016 at 8:17 PM, Ulf Hansson <ulf.hansson@linaro.org> wrote:

>> When resuming a device in __pm_runtime_set_status(), the prerequisite is

>> that its parent must already be active, else an error code is returned and

>> the device's status remains suspended.

>>

>> When suspending a device there is no similar constraints being validated.

>> Let's change this to make the behaviour consistent, by not allowing to

>> suspend a device with an active child, unless it has been explicitly set to

>> ignore its children.

>>

>> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

>

> This is now commit 8b1107b85efd78c in pm/linux-next.

>

> This breaks resume from s2ram on r8a73a4/ape6evm and sh73a0/kzm9g, where the

> smsc911x Ethernet is connected to an external bus, whose clock is controlled

> through Runtime PM and the simple-pm-bus driver.

>

> Reverting this commit fixes the issue.


Geert, thanks again for testing and reporting. I believe this problem
is directly related to what you reported for another patch [1] as
well.

Can you please try out this rather trivial patch to the Ethernet
driver (smsc911x), to see if that solves the problem(s).

From: Ulf Hansson <ulf.hansson@linaro.org>

Date: Tue, 25 Oct 2016 23:11:51 +0200
Subject: [PATCH] net: smsc911x: Synchronize the runtime PM status during
 system suspend

Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

---
 drivers/net/ethernet/smsc/smsc911x.c | 6 ++++++
 1 file changed, 6 insertions(+)

--
1.9.1

Kind regards
Uffe

[1]
https://patchwork.kernel.org/patch/9375061/

>

>> ---

>>  drivers/base/power/runtime.c | 12 +++++++++++-

>>  1 file changed, 11 insertions(+), 1 deletion(-)

>>

>> diff --git a/drivers/base/power/runtime.c b/drivers/base/power/runtime.c

>> index f47a345..6f946a3 100644

>> --- a/drivers/base/power/runtime.c

>> +++ b/drivers/base/power/runtime.c

>> @@ -1028,7 +1028,17 @@ int __pm_runtime_set_status(struct device *dev, unsigned int status)

>>                 goto out_set;

>>

>>         if (status == RPM_SUSPENDED) {

>> -               /* It always is possible to set the status to 'suspended'. */

>> +               /*

>> +                * It is invalid to suspend a device with an active child,

>> +                * unless it has been set to ignore its children.

>> +                */

>> +               if (!dev->power.ignore_children &&

>> +                       atomic_read(&dev->power.child_count)) {

>> +                       dev_err(dev, "runtime PM trying to suspend device but active child\n");

>> +                       error = -EBUSY;

>> +                       goto out;

>> +               }

>> +

>>                 if (parent) {

>>                         atomic_add_unless(&parent->power.child_count, -1, 0);

>>                         notify_parent = !parent->power.ignore_children;

>

> Kernel output difference, with some additional debug info:

>

> | --- GOOD 2016-10-25 15:46:19.669597124 +0200

> | +++ BAD  2016-10-25 15:48:15.201596869 +0200

> | @@ -5,24 +5,71 @@ Freezing remaining freezable tasks ... (

> |  PM: Suspending system (mem)

> |  cpg_div6_clock_disable: sdhi1ck

> |  cpg_div6_clock_disable: sdhi0ck

> | -PM: suspend of devices complete after 22.777 msecs

> | -PM: late suspend of devices complete after 7.302 msecs

> | +PM: suspend of devices complete after 22.932 msecs

> | +PM: late suspend of devices complete after 7.311 msecs

> |  cpg_div6_clock_disable: mmc0

> |  cpg_div6_clock_disable: zb

>

> pm_clk disables the clock of fec10000.bus.

>

> | +simple-pm-bus fec10000.bus: runtime PM trying to suspend device but

> active child

>

> Suspend is aborted with -EBUSY, but zb stays disabled.

>

> |  rmobile_pd_power_down: a3sp

> | -PM: noirq suspend of devices complete after 18.424 msecs

> | +PM: noirq suspend of devices complete after 26.937 msecs

> |  Disabling non-boot CPUs ...

> | -Suspended for 7.196 seconds

> | +Suspended for 2.579 seconds

> |  rmobile_pd_power_up: a3sp

> | -cpg_div6_clock_enable: zb

>

> pm_clk doesn't enable the clock of fec10000.bus.

> (since it thinks it's still suspended?)

>

> |  cpg_div6_clock_enable: sdhi0ck

> |  cpg_div6_clock_enable: sdhi1ck

> |  cpg_div6_clock_enable: mmc0

> | -PM: noirq resume of devices complete after 134.580 msecs

> | -PM: early resume of devices complete after 6.084 msecs

> | -PM: resume of devices complete after 21.846 msecs

> | -PM: Finishing wakeup.

> | -Restarting tasks ... cpg_div6_clock_disable: mmc0

> | -done.

> | +PM: noirq resume of devices complete after 128.014 msecs

> | +PM: early resume of devices complete after 6.121 msecs

> | +Unhandled fault: imprecise external abort (0x1406) at 0x00000000

> | +pgd = ee748000

> | +[00000000] *pgd=7fc42835

> | +Internal error: : 1406 [#1] SMP ARM

> | +CPU: 0 PID: 1087 Comm: s2ram Not tainted

> 4.9.0-rc2-ape6evm-01959-g90550a414a6a6cd3-dirty #505

> | +Hardware name: Generic R8A73A4 (Flattened Device Tree)

> | +task: ee47a340 task.stack: ee686000

> | +PC is at smsc911x_resume+0x6c/0xbc

> | +LR is at smsc911x_resume+0x6c/0xbc

>

> smsc911x_resume() tries to access the smsc911x while the bus clock zb is

> not enabled.

>

> Gr{oetje,eeting}s,

>

>                         Geert

>

> --

> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

>

> In personal conversations with technical people, I call myself a hacker. But

> when I'm talking to journalists I just say "programmer" or something like that.

>                                 -- Linus Torvalds

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Geert Uytterhoeven Oct. 26, 2016, 9:47 a.m. UTC | #1
Hi Ulf,

On Tue, Oct 25, 2016 at 11:31 PM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> On 25 October 2016 at 15:59, Geert Uytterhoeven <geert@linux-m68k.org> wrote:

>> On Mon, Oct 17, 2016 at 8:17 PM, Ulf Hansson <ulf.hansson@linaro.org> wrote:

>>> When resuming a device in __pm_runtime_set_status(), the prerequisite is

>>> that its parent must already be active, else an error code is returned and

>>> the device's status remains suspended.

>>>

>>> When suspending a device there is no similar constraints being validated.

>>> Let's change this to make the behaviour consistent, by not allowing to

>>> suspend a device with an active child, unless it has been explicitly set to

>>> ignore its children.

>>>

>>> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

>>

>> This is now commit 8b1107b85efd78c in pm/linux-next.

>>

>> This breaks resume from s2ram on r8a73a4/ape6evm and sh73a0/kzm9g, where the

>> smsc911x Ethernet is connected to an external bus, whose clock is controlled

>> through Runtime PM and the simple-pm-bus driver.

>>

>> Reverting this commit fixes the issue.

>

> Geert, thanks again for testing and reporting. I believe this problem

> is directly related to what you reported for another patch [1] as

> well.


Indeed. At first I thought that patch got applied, but I couldn't find it ;-)

> Can you please try out this rather trivial patch to the Ethernet

> driver (smsc911x), to see if that solves the problem(s).


Thanks, it does!

Difference between before and after is:

 PM: Suspending system (mem)
 cpg_div6_clock_disable: sdhi1ck
 cpg_div6_clock_disable: sdhi0ck
-PM: suspend of devices complete after 22.932 msecs
-PM: late suspend of devices complete after 7.311 msecs
+PM: suspend of devices complete after 23.131 msecs
+PM: late suspend of devices complete after 7.300 msecs
 cpg_div6_clock_disable: mmc0
 cpg_div6_clock_disable: zb
-simple-pm-bus fec10000.bus: runtime PM trying to suspend device but
active child
 rmobile_pd_power_down: a3sp
-PM: noirq suspend of devices complete after 26.937 msecs
+PM: noirq suspend of devices complete after 18.467 msecs
 Disabling non-boot CPUs ...
-Suspended for 2.579 seconds
+Suspended for 2.949 seconds
 rmobile_pd_power_up: a3sp
+cpg_div6_clock_enable: zb
 cpg_div6_clock_enable: sdhi0ck
 cpg_div6_clock_enable: sdhi1ck
 cpg_div6_clock_enable: mmc0
-PM: noirq resume of devices complete after 128.014 msecs
-PM: early resume of devices complete after 6.121 msecs
-Unhandled fault: imprecise external abort (0x1406) at 0x00000000

The zb clock is enabled again.

BTW, shouldn't "runtime PM trying to suspend device but active child"
become a WARN()? If this happens, something is really wrong, and
the device will be left in half-suspended state (clock disabled).

> From: Ulf Hansson <ulf.hansson@linaro.org>

> Date: Tue, 25 Oct 2016 23:11:51 +0200

> Subject: [PATCH] net: smsc911x: Synchronize the runtime PM status during

>  system suspend

>

> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>


Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>


Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ulf Hansson Oct. 26, 2016, 11:30 a.m. UTC | #2
On 26 October 2016 at 11:47, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> Hi Ulf,

>

> On Tue, Oct 25, 2016 at 11:31 PM, Ulf Hansson <ulf.hansson@linaro.org> wrote:

>> On 25 October 2016 at 15:59, Geert Uytterhoeven <geert@linux-m68k.org> wrote:

>>> On Mon, Oct 17, 2016 at 8:17 PM, Ulf Hansson <ulf.hansson@linaro.org> wrote:

>>>> When resuming a device in __pm_runtime_set_status(), the prerequisite is

>>>> that its parent must already be active, else an error code is returned and

>>>> the device's status remains suspended.

>>>>

>>>> When suspending a device there is no similar constraints being validated.

>>>> Let's change this to make the behaviour consistent, by not allowing to

>>>> suspend a device with an active child, unless it has been explicitly set to

>>>> ignore its children.

>>>>

>>>> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

>>>

>>> This is now commit 8b1107b85efd78c in pm/linux-next.

>>>

>>> This breaks resume from s2ram on r8a73a4/ape6evm and sh73a0/kzm9g, where the

>>> smsc911x Ethernet is connected to an external bus, whose clock is controlled

>>> through Runtime PM and the simple-pm-bus driver.

>>>

>>> Reverting this commit fixes the issue.

>>

>> Geert, thanks again for testing and reporting. I believe this problem

>> is directly related to what you reported for another patch [1] as

>> well.

>

> Indeed. At first I thought that patch got applied, but I couldn't find it ;-)

>

>> Can you please try out this rather trivial patch to the Ethernet

>> driver (smsc911x), to see if that solves the problem(s).

>

> Thanks, it does!


Great, thanks for testing!

I will re-post the patch with a proper change-log.

>

> Difference between before and after is:

>

>  PM: Suspending system (mem)

>  cpg_div6_clock_disable: sdhi1ck

>  cpg_div6_clock_disable: sdhi0ck

> -PM: suspend of devices complete after 22.932 msecs

> -PM: late suspend of devices complete after 7.311 msecs

> +PM: suspend of devices complete after 23.131 msecs

> +PM: late suspend of devices complete after 7.300 msecs

>  cpg_div6_clock_disable: mmc0

>  cpg_div6_clock_disable: zb

> -simple-pm-bus fec10000.bus: runtime PM trying to suspend device but

> active child

>  rmobile_pd_power_down: a3sp

> -PM: noirq suspend of devices complete after 26.937 msecs

> +PM: noirq suspend of devices complete after 18.467 msecs

>  Disabling non-boot CPUs ...

> -Suspended for 2.579 seconds

> +Suspended for 2.949 seconds

>  rmobile_pd_power_up: a3sp

> +cpg_div6_clock_enable: zb

>  cpg_div6_clock_enable: sdhi0ck

>  cpg_div6_clock_enable: sdhi1ck

>  cpg_div6_clock_enable: mmc0

> -PM: noirq resume of devices complete after 128.014 msecs

> -PM: early resume of devices complete after 6.121 msecs

> -Unhandled fault: imprecise external abort (0x1406) at 0x00000000

>

> The zb clock is enabled again.

>

> BTW, shouldn't "runtime PM trying to suspend device but active child"

> become a WARN()? If this happens, something is really wrong, and

> the device will be left in half-suspended state (clock disabled).


That would make sense to me!
Although if we decide to change that, we should also convert from
dev_err() to WARN() when failing to set RPM_ACTIVE in
__pm_runtime_set_status(), as to be consistent.

Do you want to send a patch?

>

>> From: Ulf Hansson <ulf.hansson@linaro.org>

>> Date: Tue, 25 Oct 2016 23:11:51 +0200

>> Subject: [PATCH] net: smsc911x: Synchronize the runtime PM status during

>>  system suspend

>>

>> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

>

> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>

>

> Gr{oetje,eeting}s,

>

>                         Geert

>


Kind regards
Uffe
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/net/ethernet/smsc/smsc911x.c
b/drivers/net/ethernet/smsc/smsc911x.c
index e9b8579..65fca9c 100644
--- a/drivers/net/ethernet/smsc/smsc911x.c
+++ b/drivers/net/ethernet/smsc/smsc911x.c
@@ -2584,6 +2584,9 @@  static int smsc911x_suspend(struct device *dev)
                PMT_CTRL_PM_MODE_D1_ | PMT_CTRL_WOL_EN_ |
                PMT_CTRL_ED_EN_ | PMT_CTRL_PME_EN_);

+       pm_runtime_disable(dev);
+       pm_runtime_set_suspended(dev);
+
        return 0;
 }

@@ -2593,6 +2596,9 @@  static int smsc911x_resume(struct device *dev)
        struct smsc911x_data *pdata = netdev_priv(ndev);
        unsigned int to = 100;

+       pm_runtime_enable(dev);
+       pm_runtime_resume(dev);
+
        /* Note 3.11 from the datasheet:
         *      "When the LAN9220 is in a power saving state, a write of any
         *       data to the BYTE_TEST register will wake-up the device."