diff mbox series

cpuidle: psci: Fix suspicious RCU usage

Message ID 20200904064705.239614-1-ulf.hansson@linaro.org
State Accepted
Commit 36050d8984ab743f9990a2eb97a0062fdc3d7bbd
Headers show
Series cpuidle: psci: Fix suspicious RCU usage | expand

Commit Message

Ulf Hansson Sept. 4, 2020, 6:47 a.m. UTC
The commit eb1f00237aca ("lockdep,trace: Expose tracepoints"), started to
expose us for tracepoints. This lead to the following RCU splat on an ARM64
Qcom board.

[    5.529634] WARNING: suspicious RCU usage
[    5.537307] sdhci-pltfm: SDHCI platform and OF driver helper
[    5.541092] 5.9.0-rc3 #86 Not tainted
[    5.541098] -----------------------------
[    5.541105] ../include/trace/events/lock.h:37 suspicious rcu_dereference_check() usage!
[    5.541110]
[    5.541110] other info that might help us debug this:
[    5.541110]
[    5.541116]
[    5.541116] rcu_scheduler_active = 2, debug_locks = 1
[    5.541122] RCU used illegally from extended quiescent state!
[    5.541129] no locks held by swapper/0/0.
[    5.541134]
[    5.541134] stack backtrace:
[    5.541143] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.9.0-rc3 #86
[    5.541149] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
[    5.541157] Call trace:
[    5.568185] sdhci_msm 7864900.sdhci: Got CD GPIO
[    5.574186]  dump_backtrace+0x0/0x1c8
[    5.574206]  show_stack+0x14/0x20
[    5.574229]  dump_stack+0xe8/0x154
[    5.574250]  lockdep_rcu_suspicious+0xd4/0xf8
[    5.574269]  lock_acquire+0x3f0/0x460
[    5.574292]  _raw_spin_lock_irqsave+0x80/0xb0
[    5.574314]  __pm_runtime_suspend+0x4c/0x188
[    5.574341]  psci_enter_domain_idle_state+0x40/0xa0
[    5.574362]  cpuidle_enter_state+0xc0/0x610
[    5.646487]  cpuidle_enter+0x38/0x50
[    5.650651]  call_cpuidle+0x18/0x40
[    5.654467]  do_idle+0x228/0x278
[    5.657678]  cpu_startup_entry+0x24/0x70
[    5.661153]  rest_init+0x1a4/0x278
[    5.665061]  arch_call_rest_init+0xc/0x14
[    5.668272]  start_kernel+0x508/0x540

Following the path in pm_runtime_put_sync_suspend() from
psci_enter_domain_idle_state(), it seems like we end up using the RCU.
Therefore, let's simply silence the splat by informing the RCU about it
with RCU_NONIDLE.

Note that, this is a temporary solution. Instead we should strive to avoid
using RCU_NONIDLE (and similar), but rather push rcu_idle_enter|exit()
further down, closer to the arch specific code. However, as the CPU PM
notifiers are also using the RCU, additional rework is needed.

Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

---
 drivers/cpuidle/cpuidle-psci.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

-- 
2.25.1

Comments

Paul E. McKenney Sept. 4, 2020, 2:13 p.m. UTC | #1
On Fri, Sep 04, 2020 at 08:47:05AM +0200, Ulf Hansson wrote:
> The commit eb1f00237aca ("lockdep,trace: Expose tracepoints"), started to

> expose us for tracepoints. This lead to the following RCU splat on an ARM64

> Qcom board.

> 

> [    5.529634] WARNING: suspicious RCU usage

> [    5.537307] sdhci-pltfm: SDHCI platform and OF driver helper

> [    5.541092] 5.9.0-rc3 #86 Not tainted

> [    5.541098] -----------------------------

> [    5.541105] ../include/trace/events/lock.h:37 suspicious rcu_dereference_check() usage!

> [    5.541110]

> [    5.541110] other info that might help us debug this:

> [    5.541110]

> [    5.541116]

> [    5.541116] rcu_scheduler_active = 2, debug_locks = 1

> [    5.541122] RCU used illegally from extended quiescent state!

> [    5.541129] no locks held by swapper/0/0.

> [    5.541134]

> [    5.541134] stack backtrace:

> [    5.541143] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.9.0-rc3 #86

> [    5.541149] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)

> [    5.541157] Call trace:

> [    5.568185] sdhci_msm 7864900.sdhci: Got CD GPIO

> [    5.574186]  dump_backtrace+0x0/0x1c8

> [    5.574206]  show_stack+0x14/0x20

> [    5.574229]  dump_stack+0xe8/0x154

> [    5.574250]  lockdep_rcu_suspicious+0xd4/0xf8

> [    5.574269]  lock_acquire+0x3f0/0x460

> [    5.574292]  _raw_spin_lock_irqsave+0x80/0xb0

> [    5.574314]  __pm_runtime_suspend+0x4c/0x188

> [    5.574341]  psci_enter_domain_idle_state+0x40/0xa0

> [    5.574362]  cpuidle_enter_state+0xc0/0x610

> [    5.646487]  cpuidle_enter+0x38/0x50

> [    5.650651]  call_cpuidle+0x18/0x40

> [    5.654467]  do_idle+0x228/0x278

> [    5.657678]  cpu_startup_entry+0x24/0x70

> [    5.661153]  rest_init+0x1a4/0x278

> [    5.665061]  arch_call_rest_init+0xc/0x14

> [    5.668272]  start_kernel+0x508/0x540

> 

> Following the path in pm_runtime_put_sync_suspend() from

> psci_enter_domain_idle_state(), it seems like we end up using the RCU.

> Therefore, let's simply silence the splat by informing the RCU about it

> with RCU_NONIDLE.

> 

> Note that, this is a temporary solution. Instead we should strive to avoid

> using RCU_NONIDLE (and similar), but rather push rcu_idle_enter|exit()

> further down, closer to the arch specific code. However, as the CPU PM

> notifiers are also using the RCU, additional rework is needed.

> 

> Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>

> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>


Acked-by: Paul E. McKenney <paulmck@kernel.org>


> ---

>  drivers/cpuidle/cpuidle-psci.c | 4 ++--

>  1 file changed, 2 insertions(+), 2 deletions(-)

> 

> diff --git a/drivers/cpuidle/cpuidle-psci.c b/drivers/cpuidle/cpuidle-psci.c

> index 74463841805f..d928b37718bd 100644

> --- a/drivers/cpuidle/cpuidle-psci.c

> +++ b/drivers/cpuidle/cpuidle-psci.c

> @@ -66,7 +66,7 @@ static int psci_enter_domain_idle_state(struct cpuidle_device *dev,

>  		return -1;

>  

>  	/* Do runtime PM to manage a hierarchical CPU toplogy. */

> -	pm_runtime_put_sync_suspend(pd_dev);

> +	RCU_NONIDLE(pm_runtime_put_sync_suspend(pd_dev));

>  

>  	state = psci_get_domain_state();

>  	if (!state)

> @@ -74,7 +74,7 @@ static int psci_enter_domain_idle_state(struct cpuidle_device *dev,

>  

>  	ret = psci_cpu_suspend_enter(state) ? -1 : idx;

>  

> -	pm_runtime_get_sync(pd_dev);

> +	RCU_NONIDLE(pm_runtime_get_sync(pd_dev));

>  

>  	cpu_pm_exit();

>  

> -- 

> 2.25.1

>
Ulf Hansson Sept. 15, 2020, 9:17 p.m. UTC | #2
On Fri, 4 Sep 2020 at 16:13, Paul E. McKenney <paulmck@kernel.org> wrote:
>
> On Fri, Sep 04, 2020 at 08:47:05AM +0200, Ulf Hansson wrote:
> > The commit eb1f00237aca ("lockdep,trace: Expose tracepoints"), started to
> > expose us for tracepoints. This lead to the following RCU splat on an ARM64
> > Qcom board.
> >
> > [    5.529634] WARNING: suspicious RCU usage
> > [    5.537307] sdhci-pltfm: SDHCI platform and OF driver helper
> > [    5.541092] 5.9.0-rc3 #86 Not tainted
> > [    5.541098] -----------------------------
> > [    5.541105] ../include/trace/events/lock.h:37 suspicious rcu_dereference_check() usage!
> > [    5.541110]
> > [    5.541110] other info that might help us debug this:
> > [    5.541110]
> > [    5.541116]
> > [    5.541116] rcu_scheduler_active = 2, debug_locks = 1
> > [    5.541122] RCU used illegally from extended quiescent state!
> > [    5.541129] no locks held by swapper/0/0.
> > [    5.541134]
> > [    5.541134] stack backtrace:
> > [    5.541143] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.9.0-rc3 #86
> > [    5.541149] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
> > [    5.541157] Call trace:
> > [    5.568185] sdhci_msm 7864900.sdhci: Got CD GPIO
> > [    5.574186]  dump_backtrace+0x0/0x1c8
> > [    5.574206]  show_stack+0x14/0x20
> > [    5.574229]  dump_stack+0xe8/0x154
> > [    5.574250]  lockdep_rcu_suspicious+0xd4/0xf8
> > [    5.574269]  lock_acquire+0x3f0/0x460
> > [    5.574292]  _raw_spin_lock_irqsave+0x80/0xb0
> > [    5.574314]  __pm_runtime_suspend+0x4c/0x188
> > [    5.574341]  psci_enter_domain_idle_state+0x40/0xa0
> > [    5.574362]  cpuidle_enter_state+0xc0/0x610
> > [    5.646487]  cpuidle_enter+0x38/0x50
> > [    5.650651]  call_cpuidle+0x18/0x40
> > [    5.654467]  do_idle+0x228/0x278
> > [    5.657678]  cpu_startup_entry+0x24/0x70
> > [    5.661153]  rest_init+0x1a4/0x278
> > [    5.665061]  arch_call_rest_init+0xc/0x14
> > [    5.668272]  start_kernel+0x508/0x540
> >
> > Following the path in pm_runtime_put_sync_suspend() from
> > psci_enter_domain_idle_state(), it seems like we end up using the RCU.
> > Therefore, let's simply silence the splat by informing the RCU about it
> > with RCU_NONIDLE.
> >
> > Note that, this is a temporary solution. Instead we should strive to avoid
> > using RCU_NONIDLE (and similar), but rather push rcu_idle_enter|exit()
> > further down, closer to the arch specific code. However, as the CPU PM
> > notifiers are also using the RCU, additional rework is needed.
> >
> > Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
> > Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
>
> Acked-by: Paul E. McKenney <paulmck@kernel.org>

Peter, Rafael,

Is $subject patch okay as the short term solution? If so, I can
continue to look at the CPU PM notifiers, etc.

Kind regards
Uffe

>
> > ---
> >  drivers/cpuidle/cpuidle-psci.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/cpuidle/cpuidle-psci.c b/drivers/cpuidle/cpuidle-psci.c
> > index 74463841805f..d928b37718bd 100644
> > --- a/drivers/cpuidle/cpuidle-psci.c
> > +++ b/drivers/cpuidle/cpuidle-psci.c
> > @@ -66,7 +66,7 @@ static int psci_enter_domain_idle_state(struct cpuidle_device *dev,
> >               return -1;
> >
> >       /* Do runtime PM to manage a hierarchical CPU toplogy. */
> > -     pm_runtime_put_sync_suspend(pd_dev);
> > +     RCU_NONIDLE(pm_runtime_put_sync_suspend(pd_dev));
> >
> >       state = psci_get_domain_state();
> >       if (!state)
> > @@ -74,7 +74,7 @@ static int psci_enter_domain_idle_state(struct cpuidle_device *dev,
> >
> >       ret = psci_cpu_suspend_enter(state) ? -1 : idx;
> >
> > -     pm_runtime_get_sync(pd_dev);
> > +     RCU_NONIDLE(pm_runtime_get_sync(pd_dev));
> >
> >       cpu_pm_exit();
> >
> > --
> > 2.25.1
> >
Peter Zijlstra Sept. 16, 2020, 8:36 a.m. UTC | #3
On Tue, Sep 15, 2020 at 11:17:20PM +0200, Ulf Hansson wrote:
> Peter, Rafael,
> 
> Is $subject patch okay as the short term solution? If so, I can
> continue to look at the CPU PM notifiers, etc.

Yes, it's a proper band-aid :-)
Rafael J. Wysocki Sept. 21, 2020, 1:59 p.m. UTC | #4
On Fri, Sep 4, 2020 at 8:47 AM Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> The commit eb1f00237aca ("lockdep,trace: Expose tracepoints"), started to
> expose us for tracepoints. This lead to the following RCU splat on an ARM64
> Qcom board.
>
> [    5.529634] WARNING: suspicious RCU usage
> [    5.537307] sdhci-pltfm: SDHCI platform and OF driver helper
> [    5.541092] 5.9.0-rc3 #86 Not tainted
> [    5.541098] -----------------------------
> [    5.541105] ../include/trace/events/lock.h:37 suspicious rcu_dereference_check() usage!
> [    5.541110]
> [    5.541110] other info that might help us debug this:
> [    5.541110]
> [    5.541116]
> [    5.541116] rcu_scheduler_active = 2, debug_locks = 1
> [    5.541122] RCU used illegally from extended quiescent state!
> [    5.541129] no locks held by swapper/0/0.
> [    5.541134]
> [    5.541134] stack backtrace:
> [    5.541143] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.9.0-rc3 #86
> [    5.541149] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
> [    5.541157] Call trace:
> [    5.568185] sdhci_msm 7864900.sdhci: Got CD GPIO
> [    5.574186]  dump_backtrace+0x0/0x1c8
> [    5.574206]  show_stack+0x14/0x20
> [    5.574229]  dump_stack+0xe8/0x154
> [    5.574250]  lockdep_rcu_suspicious+0xd4/0xf8
> [    5.574269]  lock_acquire+0x3f0/0x460
> [    5.574292]  _raw_spin_lock_irqsave+0x80/0xb0
> [    5.574314]  __pm_runtime_suspend+0x4c/0x188
> [    5.574341]  psci_enter_domain_idle_state+0x40/0xa0
> [    5.574362]  cpuidle_enter_state+0xc0/0x610
> [    5.646487]  cpuidle_enter+0x38/0x50
> [    5.650651]  call_cpuidle+0x18/0x40
> [    5.654467]  do_idle+0x228/0x278
> [    5.657678]  cpu_startup_entry+0x24/0x70
> [    5.661153]  rest_init+0x1a4/0x278
> [    5.665061]  arch_call_rest_init+0xc/0x14
> [    5.668272]  start_kernel+0x508/0x540
>
> Following the path in pm_runtime_put_sync_suspend() from
> psci_enter_domain_idle_state(), it seems like we end up using the RCU.
> Therefore, let's simply silence the splat by informing the RCU about it
> with RCU_NONIDLE.
>
> Note that, this is a temporary solution. Instead we should strive to avoid
> using RCU_NONIDLE (and similar), but rather push rcu_idle_enter|exit()
> further down, closer to the arch specific code. However, as the CPU PM
> notifiers are also using the RCU, additional rework is needed.
>
> Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

I missed this one before, sorry about that.

Now applied as 5.9-rc6 material with the Paul's ACK, thanks!

> ---
>  drivers/cpuidle/cpuidle-psci.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/cpuidle/cpuidle-psci.c b/drivers/cpuidle/cpuidle-psci.c
> index 74463841805f..d928b37718bd 100644
> --- a/drivers/cpuidle/cpuidle-psci.c
> +++ b/drivers/cpuidle/cpuidle-psci.c
> @@ -66,7 +66,7 @@ static int psci_enter_domain_idle_state(struct cpuidle_device *dev,
>                 return -1;
>
>         /* Do runtime PM to manage a hierarchical CPU toplogy. */
> -       pm_runtime_put_sync_suspend(pd_dev);
> +       RCU_NONIDLE(pm_runtime_put_sync_suspend(pd_dev));
>
>         state = psci_get_domain_state();
>         if (!state)
> @@ -74,7 +74,7 @@ static int psci_enter_domain_idle_state(struct cpuidle_device *dev,
>
>         ret = psci_cpu_suspend_enter(state) ? -1 : idx;
>
> -       pm_runtime_get_sync(pd_dev);
> +       RCU_NONIDLE(pm_runtime_get_sync(pd_dev));
>
>         cpu_pm_exit();
>
> --
> 2.25.1
>
diff mbox series

Patch

diff --git a/drivers/cpuidle/cpuidle-psci.c b/drivers/cpuidle/cpuidle-psci.c
index 74463841805f..d928b37718bd 100644
--- a/drivers/cpuidle/cpuidle-psci.c
+++ b/drivers/cpuidle/cpuidle-psci.c
@@ -66,7 +66,7 @@  static int psci_enter_domain_idle_state(struct cpuidle_device *dev,
 		return -1;
 
 	/* Do runtime PM to manage a hierarchical CPU toplogy. */
-	pm_runtime_put_sync_suspend(pd_dev);
+	RCU_NONIDLE(pm_runtime_put_sync_suspend(pd_dev));
 
 	state = psci_get_domain_state();
 	if (!state)
@@ -74,7 +74,7 @@  static int psci_enter_domain_idle_state(struct cpuidle_device *dev,
 
 	ret = psci_cpu_suspend_enter(state) ? -1 : idx;
 
-	pm_runtime_get_sync(pd_dev);
+	RCU_NONIDLE(pm_runtime_get_sync(pd_dev));
 
 	cpu_pm_exit();