[RESEND,2/2] PM / Domains: Fix asynchronous execution of *noirq() callbacks

Message ID 1484241463-28435-3-git-send-email-ulf.hansson@linaro.org
State Superseded
Headers show

Commit Message

Ulf Hansson Jan. 12, 2017, 5:17 p.m.
As the PM core may invoke the *noirq() callbacks asynchronously, the
current lock-less approach in genpd doesn't work. The consequence is that
we may find concurrent operations racing to power on/off the PM domain.

As of now, no immediate errors has been reported, but it's probably only a
matter time. Therefor let's fix the problem now before this becomes a real
issue, by deploying the locking scheme to the relevant functions.

Reported-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

---
 drivers/base/power/domain.c | 62 ++++++++++++++++++++++++---------------------
 1 file changed, 33 insertions(+), 29 deletions(-)

-- 
1.9.1

Comments

Geert Uytterhoeven Feb. 7, 2017, 1:23 p.m. | #1
Hi Ulf, Rafael,

On Thu, Jan 12, 2017 at 6:17 PM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> As the PM core may invoke the *noirq() callbacks asynchronously, the

> current lock-less approach in genpd doesn't work. The consequence is that

> we may find concurrent operations racing to power on/off the PM domain.

>

> As of now, no immediate errors has been reported, but it's probably only a

> matter time. Therefor let's fix the problem now before this becomes a real

> issue, by deploying the locking scheme to the relevant functions.

>

> Reported-by: Brian Norris <briannorris@chromium.org>

> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>


This is commit ef4f7e2c8335a56b in pm/linux-next.

> ---

>  drivers/base/power/domain.c | 62 ++++++++++++++++++++++++---------------------

>  1 file changed, 33 insertions(+), 29 deletions(-)

>

> diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c

> index fd2e3e1..6b23d82 100644

> --- a/drivers/base/power/domain.c

> +++ b/drivers/base/power/domain.c

> @@ -729,16 +729,17 @@ static bool genpd_dev_active_wakeup(struct generic_pm_domain *genpd,


> @@ -1070,13 +1070,17 @@ static void genpd_syscore_switch(struct device *dev, bool suspend)

>         if (!pm_genpd_present(genpd))

>                 return;

>

> +       genpd_lock(genpd);

> +

>         if (suspend) {

>                 genpd->suspended_count++;

> -               genpd_sync_power_off(genpd);

> +               genpd_sync_power_off(genpd, 0);

>         } else {

> -               genpd_sync_power_on(genpd);

> +               genpd_sync_power_on(genpd, 0);

>                 genpd->suspended_count--;

>         }

> +

> +       genpd_unlock(genpd);

>  }


This causes the following BUG on all my Renesas arm32 boards, where the
system timer is an IRQ safe device:

BUG: sleeping function called from invalid context at kernel/locking/mutex.c:232
in_atomic(): 0, irqs_disabled(): 128, pid: 1751, name: s2ram
CPU: 0 PID: 1751 Comm: s2ram Not tainted
4.10.0-rc7-koelsch-05643-g27f4c73972a614fe #3354
Hardware name: Generic R8A7791 (Flattened Device Tree)
[<c020e9c4>] (unwind_backtrace) from [<c020a40c>] (show_stack+0x10/0x14)
[<c020a40c>] (show_stack) from [<c03f9818>] (dump_stack+0x7c/0x9c)
[<c03f9818>] (dump_stack) from [<c0240020>] (___might_sleep+0x124/0x160)
[<c0240020>] (___might_sleep) from [<c06fedfc>] (mutex_lock+0x18/0x60)
[<c06fedfc>] (mutex_lock) from [<c04de340>] (genpd_syscore_switch+0x2c/0x7c)
[<c04de340>] (genpd_syscore_switch) from [<c05ec328>]
(sh_cmt_clock_event_suspend+0x18/0x28)
[<c05ec328>] (sh_cmt_clock_event_suspend) from [<c027f9a4>]
(clockevents_suspend+0x40/0x54)
[<c027f9a4>] (clockevents_suspend) from [<c0276d48>]
(timekeeping_suspend+0x23c/0x278)
[<c0276d48>] (timekeeping_suspend) from [<c04cbb88>]
(syscore_suspend+0x88/0x138)
[<c04cbb88>] (syscore_suspend) from [<c025f618>]
(suspend_devices_and_enter+0x290/0x470)
[<c025f618>] (suspend_devices_and_enter) from [<c025fa20>]
(pm_suspend+0x228/0x280)
[<c025fa20>] (pm_suspend) from [<c025e60c>] (state_store+0xac/0xcc)
[<c025e60c>] (state_store) from [<c0340c4c>] (kernfs_fop_write+0x160/0x19c)
[<c0340c4c>] (kernfs_fop_write) from [<c02e3054>] (__vfs_write+0x20/0x108)
[<c02e3054>] (__vfs_write) from [<c02e4424>] (vfs_write+0xb8/0x144)
[<c02e4424>] (vfs_write) from [<c02e5014>] (SyS_write+0x40/0x80)
[<c02e5014>] (SyS_write) from [<c0206cc0>] (ret_fast_syscall+0x0/0x34)

Reverting the commit fixes the issue.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ulf Hansson Feb. 7, 2017, 3:07 p.m. | #2
On 7 February 2017 at 14:23, Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> Hi Ulf, Rafael,

>

> On Thu, Jan 12, 2017 at 6:17 PM, Ulf Hansson <ulf.hansson@linaro.org> wrote:

>> As the PM core may invoke the *noirq() callbacks asynchronously, the

>> current lock-less approach in genpd doesn't work. The consequence is that

>> we may find concurrent operations racing to power on/off the PM domain.

>>

>> As of now, no immediate errors has been reported, but it's probably only a

>> matter time. Therefor let's fix the problem now before this becomes a real

>> issue, by deploying the locking scheme to the relevant functions.

>>

>> Reported-by: Brian Norris <briannorris@chromium.org>

>> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

>

> This is commit ef4f7e2c8335a56b in pm/linux-next.

>

>> ---

>>  drivers/base/power/domain.c | 62 ++++++++++++++++++++++++---------------------

>>  1 file changed, 33 insertions(+), 29 deletions(-)

>>

>> diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c

>> index fd2e3e1..6b23d82 100644

>> --- a/drivers/base/power/domain.c

>> +++ b/drivers/base/power/domain.c

>> @@ -729,16 +729,17 @@ static bool genpd_dev_active_wakeup(struct generic_pm_domain *genpd,

>

>> @@ -1070,13 +1070,17 @@ static void genpd_syscore_switch(struct device *dev, bool suspend)

>>         if (!pm_genpd_present(genpd))

>>                 return;

>>

>> +       genpd_lock(genpd);

>> +

>>         if (suspend) {

>>                 genpd->suspended_count++;

>> -               genpd_sync_power_off(genpd);

>> +               genpd_sync_power_off(genpd, 0);

>>         } else {

>> -               genpd_sync_power_on(genpd);

>> +               genpd_sync_power_on(genpd, 0);

>>                 genpd->suspended_count--;

>>         }

>> +

>> +       genpd_unlock(genpd);

>>  }

>

> This causes the following BUG on all my Renesas arm32 boards, where the

> system timer is an IRQ safe device:

>

> BUG: sleeping function called from invalid context at kernel/locking/mutex.c:232

> in_atomic(): 0, irqs_disabled(): 128, pid: 1751, name: s2ram

> CPU: 0 PID: 1751 Comm: s2ram Not tainted

> 4.10.0-rc7-koelsch-05643-g27f4c73972a614fe #3354

> Hardware name: Generic R8A7791 (Flattened Device Tree)

> [<c020e9c4>] (unwind_backtrace) from [<c020a40c>] (show_stack+0x10/0x14)

> [<c020a40c>] (show_stack) from [<c03f9818>] (dump_stack+0x7c/0x9c)

> [<c03f9818>] (dump_stack) from [<c0240020>] (___might_sleep+0x124/0x160)

> [<c0240020>] (___might_sleep) from [<c06fedfc>] (mutex_lock+0x18/0x60)

> [<c06fedfc>] (mutex_lock) from [<c04de340>] (genpd_syscore_switch+0x2c/0x7c)

> [<c04de340>] (genpd_syscore_switch) from [<c05ec328>]

> (sh_cmt_clock_event_suspend+0x18/0x28)

> [<c05ec328>] (sh_cmt_clock_event_suspend) from [<c027f9a4>]

> (clockevents_suspend+0x40/0x54)

> [<c027f9a4>] (clockevents_suspend) from [<c0276d48>]

> (timekeeping_suspend+0x23c/0x278)

> [<c0276d48>] (timekeeping_suspend) from [<c04cbb88>]

> (syscore_suspend+0x88/0x138)

> [<c04cbb88>] (syscore_suspend) from [<c025f618>]

> (suspend_devices_and_enter+0x290/0x470)

> [<c025f618>] (suspend_devices_and_enter) from [<c025fa20>]

> (pm_suspend+0x228/0x280)

> [<c025fa20>] (pm_suspend) from [<c025e60c>] (state_store+0xac/0xcc)

> [<c025e60c>] (state_store) from [<c0340c4c>] (kernfs_fop_write+0x160/0x19c)

> [<c0340c4c>] (kernfs_fop_write) from [<c02e3054>] (__vfs_write+0x20/0x108)

> [<c02e3054>] (__vfs_write) from [<c02e4424>] (vfs_write+0xb8/0x144)

> [<c02e4424>] (vfs_write) from [<c02e5014>] (SyS_write+0x40/0x80)

> [<c02e5014>] (SyS_write) from [<c0206cc0>] (ret_fast_syscall+0x0/0x34)

>

> Reverting the commit fixes the issue.


Geert, thanks for reporting! Instead of a revert, I believe a better
option is to only partly revert the change.

What I suggest is to keep the locks for the *noirq() callbacks, but
remove it for the pm_genpd_syscore* functions.

Actually the change-log I wrote for the related commit, don't mention
the changes it introduces to the syscore API - my bad! I apologize for
the inconvenience.

Allow me to send a fix on top asap. Whether Rafael can fold it into
the existing commit or add on top, I leave to him to decide.

Kind regards
Uffe
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
index fd2e3e1..6b23d82 100644
--- a/drivers/base/power/domain.c
+++ b/drivers/base/power/domain.c
@@ -729,16 +729,17 @@  static bool genpd_dev_active_wakeup(struct generic_pm_domain *genpd,
 /**
  * genpd_sync_power_off - Synchronously power off a PM domain and its masters.
  * @genpd: PM domain to power off, if possible.
+ * @depth: nesting count for lockdep.
  *
  * Check if the given PM domain can be powered off (during system suspend or
  * hibernation) and do that if so.  Also, in that case propagate to its masters.
  *
  * This function is only called in "noirq" and "syscore" stages of system power
- * transitions, so it need not acquire locks (all of the "noirq" callbacks are
- * executed sequentially, so it is guaranteed that it will never run twice in
- * parallel).
+ * transitions, but since the "noirq" callbacks may be executed asynchronously,
+ * the lock must be held.
  */
-static void genpd_sync_power_off(struct generic_pm_domain *genpd)
+static void genpd_sync_power_off(struct generic_pm_domain *genpd,
+				 unsigned int depth)
 {
 	struct gpd_link *link;
 
@@ -757,20 +758,24 @@  static void genpd_sync_power_off(struct generic_pm_domain *genpd)
 
 	list_for_each_entry(link, &genpd->slave_links, slave_node) {
 		genpd_sd_counter_dec(link->master);
-		genpd_sync_power_off(link->master);
+
+		genpd_lock_nested(link->master, depth + 1);
+		genpd_sync_power_off(link->master, depth + 1);
+		genpd_unlock(link->master);
 	}
 }
 
 /**
  * genpd_sync_power_on - Synchronously power on a PM domain and its masters.
  * @genpd: PM domain to power on.
+ * @depth: nesting count for lockdep.
  *
  * This function is only called in "noirq" and "syscore" stages of system power
- * transitions, so it need not acquire locks (all of the "noirq" callbacks are
- * executed sequentially, so it is guaranteed that it will never run twice in
- * parallel).
+ * transitions, but since the "noirq" callbacks may be executed asynchronously,
+ * the lock must be held.
  */
-static void genpd_sync_power_on(struct generic_pm_domain *genpd)
+static void genpd_sync_power_on(struct generic_pm_domain *genpd,
+				unsigned int depth)
 {
 	struct gpd_link *link;
 
@@ -778,8 +783,11 @@  static void genpd_sync_power_on(struct generic_pm_domain *genpd)
 		return;
 
 	list_for_each_entry(link, &genpd->slave_links, slave_node) {
-		genpd_sync_power_on(link->master);
 		genpd_sd_counter_inc(link->master);
+
+		genpd_lock_nested(link->master, depth + 1);
+		genpd_sync_power_on(link->master, depth + 1);
+		genpd_unlock(link->master);
 	}
 
 	_genpd_power_on(genpd, false);
@@ -888,13 +896,10 @@  static int pm_genpd_suspend_noirq(struct device *dev)
 			return ret;
 	}
 
-	/*
-	 * Since all of the "noirq" callbacks are executed sequentially, it is
-	 * guaranteed that this function will never run twice in parallel for
-	 * the same PM domain, so it is not necessary to use locking here.
-	 */
+	genpd_lock(genpd);
 	genpd->suspended_count++;
-	genpd_sync_power_off(genpd);
+	genpd_sync_power_off(genpd, 0);
+	genpd_unlock(genpd);
 
 	return 0;
 }
@@ -919,13 +924,10 @@  static int pm_genpd_resume_noirq(struct device *dev)
 	if (dev->power.wakeup_path && genpd_dev_active_wakeup(genpd, dev))
 		return 0;
 
-	/*
-	 * Since all of the "noirq" callbacks are executed sequentially, it is
-	 * guaranteed that this function will never run twice in parallel for
-	 * the same PM domain, so it is not necessary to use locking here.
-	 */
-	genpd_sync_power_on(genpd);
+	genpd_lock(genpd);
+	genpd_sync_power_on(genpd, 0);
 	genpd->suspended_count--;
+	genpd_unlock(genpd);
 
 	if (genpd->dev_ops.stop && genpd->dev_ops.start)
 		ret = pm_runtime_force_resume(dev);
@@ -1002,13 +1004,10 @@  static int pm_genpd_restore_noirq(struct device *dev)
 		return -EINVAL;
 
 	/*
-	 * Since all of the "noirq" callbacks are executed sequentially, it is
-	 * guaranteed that this function will never run twice in parallel for
-	 * the same PM domain, so it is not necessary to use locking here.
-	 *
 	 * At this point suspended_count == 0 means we are being run for the
 	 * first time for the given domain in the present cycle.
 	 */
+	genpd_lock(genpd);
 	if (genpd->suspended_count++ == 0)
 		/*
 		 * The boot kernel might put the domain into arbitrary state,
@@ -1017,7 +1016,8 @@  static int pm_genpd_restore_noirq(struct device *dev)
 		 */
 		genpd->status = GPD_STATE_POWER_OFF;
 
-	genpd_sync_power_on(genpd);
+	genpd_sync_power_on(genpd, 0);
+	genpd_unlock(genpd);
 
 	if (genpd->dev_ops.stop && genpd->dev_ops.start)
 		ret = pm_runtime_force_resume(dev);
@@ -1070,13 +1070,17 @@  static void genpd_syscore_switch(struct device *dev, bool suspend)
 	if (!pm_genpd_present(genpd))
 		return;
 
+	genpd_lock(genpd);
+
 	if (suspend) {
 		genpd->suspended_count++;
-		genpd_sync_power_off(genpd);
+		genpd_sync_power_off(genpd, 0);
 	} else {
-		genpd_sync_power_on(genpd);
+		genpd_sync_power_on(genpd, 0);
 		genpd->suspended_count--;
 	}
+
+	genpd_unlock(genpd);
 }
 
 void pm_genpd_syscore_poweroff(struct device *dev)