diff mbox series

notifier: Make atomic_notifiers use raw_spinlock

Message ID 20201122201904.30940-1-valentin.schneider@arm.com
State New
Headers show
Series notifier: Make atomic_notifiers use raw_spinlock | expand

Commit Message

Valentin Schneider Nov. 22, 2020, 8:19 p.m. UTC
Booting a recent PREEMPT_RT kernel (v5.10-rc3-rt7-rebase) on my arm64 Juno
leads to the idle task blocking on an RT sleeping spinlock down some
notifier path:

  [    1.809101] BUG: scheduling while atomic: swapper/5/0/0x00000002
  [    1.809116] Modules linked in:
  [    1.809123] Preemption disabled at:
  [    1.809125] secondary_start_kernel (arch/arm64/kernel/smp.c:227)
  [    1.809146] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G        W         5.10.0-rc3-rt7 #168
  [    1.809153] Hardware name: ARM Juno development board (r0) (DT)
  [    1.809158] Call trace:
  [    1.809160] dump_backtrace (arch/arm64/kernel/stacktrace.c:100 (discriminator 1))
  [    1.809170] show_stack (arch/arm64/kernel/stacktrace.c:198)
  [    1.809178] dump_stack (lib/dump_stack.c:122)
  [    1.809188] __schedule_bug (kernel/sched/core.c:4886)
  [    1.809197] __schedule (./arch/arm64/include/asm/preempt.h:18 kernel/sched/core.c:4913 kernel/sched/core.c:5040)
  [    1.809204] preempt_schedule_lock (kernel/sched/core.c:5365 (discriminator 1))
  [    1.809210] rt_spin_lock_slowlock_locked (kernel/locking/rtmutex.c:1072)
  [    1.809217] rt_spin_lock_slowlock (kernel/locking/rtmutex.c:1110)
  [    1.809224] rt_spin_lock (./include/linux/rcupdate.h:647 kernel/locking/rtmutex.c:1139)
  [    1.809231] atomic_notifier_call_chain_robust (kernel/notifier.c:71 kernel/notifier.c:118 kernel/notifier.c:186)
  [    1.809240] cpu_pm_enter (kernel/cpu_pm.c:39 kernel/cpu_pm.c:93)
  [    1.809249] psci_enter_idle_state (drivers/cpuidle/cpuidle-psci.c:52 drivers/cpuidle/cpuidle-psci.c:129)
  [    1.809258] cpuidle_enter_state (drivers/cpuidle/cpuidle.c:238)
  [    1.809267] cpuidle_enter (drivers/cpuidle/cpuidle.c:353)
  [    1.809275] do_idle (kernel/sched/idle.c:132 kernel/sched/idle.c:213 kernel/sched/idle.c:273)
  [    1.809282] cpu_startup_entry (kernel/sched/idle.c:368 (discriminator 1))
  [    1.809288] secondary_start_kernel (arch/arm64/kernel/smp.c:273)

Two points worth noting:

1) That this is conceptually the same issue as pointed out in:
   313c8c16ee62 ("PM / CPU: replace raw_notifier with atomic_notifier")
2) Only the _robust() variant of atomic_notifier callchains suffer from
   this

AFAICT only the cpu_pm_notifier_chain really needs to be changed, but
singling it out would mean introducing a new (truly) non-blocking API. At
the same time, callers that are fine with any blocking within the call
chain should use blocking notifiers, so patching up all atomic_notifier's
doesn't seem *too* crazy to me.

Fixes: 70d932985757 ("notifier: Fix broken error handling pattern")
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
 include/linux/notifier.h |  6 +++---
 kernel/notifier.c        | 12 ++++++------
 2 files changed, 9 insertions(+), 9 deletions(-)

Comments

Peter Zijlstra Nov. 23, 2020, 2:14 p.m. UTC | #1
On Sun, Nov 22, 2020 at 08:19:04PM +0000, Valentin Schneider wrote:
> Booting a recent PREEMPT_RT kernel (v5.10-rc3-rt7-rebase) on my arm64 Juno

> leads to the idle task blocking on an RT sleeping spinlock down some

> notifier path:

> 

>   [    1.809101] BUG: scheduling while atomic: swapper/5/0/0x00000002

>   [    1.809116] Modules linked in:

>   [    1.809123] Preemption disabled at:

>   [    1.809125] secondary_start_kernel (arch/arm64/kernel/smp.c:227)

>   [    1.809146] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G        W         5.10.0-rc3-rt7 #168

>   [    1.809153] Hardware name: ARM Juno development board (r0) (DT)

>   [    1.809158] Call trace:

>   [    1.809160] dump_backtrace (arch/arm64/kernel/stacktrace.c:100 (discriminator 1))

>   [    1.809170] show_stack (arch/arm64/kernel/stacktrace.c:198)

>   [    1.809178] dump_stack (lib/dump_stack.c:122)

>   [    1.809188] __schedule_bug (kernel/sched/core.c:4886)

>   [    1.809197] __schedule (./arch/arm64/include/asm/preempt.h:18 kernel/sched/core.c:4913 kernel/sched/core.c:5040)

>   [    1.809204] preempt_schedule_lock (kernel/sched/core.c:5365 (discriminator 1))

>   [    1.809210] rt_spin_lock_slowlock_locked (kernel/locking/rtmutex.c:1072)

>   [    1.809217] rt_spin_lock_slowlock (kernel/locking/rtmutex.c:1110)

>   [    1.809224] rt_spin_lock (./include/linux/rcupdate.h:647 kernel/locking/rtmutex.c:1139)

>   [    1.809231] atomic_notifier_call_chain_robust (kernel/notifier.c:71 kernel/notifier.c:118 kernel/notifier.c:186)

>   [    1.809240] cpu_pm_enter (kernel/cpu_pm.c:39 kernel/cpu_pm.c:93)

>   [    1.809249] psci_enter_idle_state (drivers/cpuidle/cpuidle-psci.c:52 drivers/cpuidle/cpuidle-psci.c:129)

>   [    1.809258] cpuidle_enter_state (drivers/cpuidle/cpuidle.c:238)

>   [    1.809267] cpuidle_enter (drivers/cpuidle/cpuidle.c:353)

>   [    1.809275] do_idle (kernel/sched/idle.c:132 kernel/sched/idle.c:213 kernel/sched/idle.c:273)

>   [    1.809282] cpu_startup_entry (kernel/sched/idle.c:368 (discriminator 1))

>   [    1.809288] secondary_start_kernel (arch/arm64/kernel/smp.c:273)

> 

> Two points worth noting:

> 

> 1) That this is conceptually the same issue as pointed out in:

>    313c8c16ee62 ("PM / CPU: replace raw_notifier with atomic_notifier")

> 2) Only the _robust() variant of atomic_notifier callchains suffer from

>    this

> 

> AFAICT only the cpu_pm_notifier_chain really needs to be changed, but

> singling it out would mean introducing a new (truly) non-blocking API. At

> the same time, callers that are fine with any blocking within the call

> chain should use blocking notifiers, so patching up all atomic_notifier's

> doesn't seem *too* crazy to me.


How long are these notifier chains?, and all this pcs_enter_idle_state()
is still horribly broken vs RCU, witness the RCU_NONIDLE() there and the
rcu_irq_enter_irqson() in the pm_notifier code.

That said, we're running these notifiers from the idle path with IRQs
disabled, so taking that spinlock isn't going to make it worse..
Valentin Schneider Nov. 23, 2020, 2:52 p.m. UTC | #2
On 23/11/20 14:14, Peter Zijlstra wrote:
> On Sun, Nov 22, 2020 at 08:19:04PM +0000, Valentin Schneider wrote:

[...]
>> Two points worth noting:

>>

>> 1) That this is conceptually the same issue as pointed out in:

>>    313c8c16ee62 ("PM / CPU: replace raw_notifier with atomic_notifier")

>> 2) Only the _robust() variant of atomic_notifier callchains suffer from

>>    this

>>

>> AFAICT only the cpu_pm_notifier_chain really needs to be changed, but

>> singling it out would mean introducing a new (truly) non-blocking API. At

>> the same time, callers that are fine with any blocking within the call

>> chain should use blocking notifiers, so patching up all atomic_notifier's

>> doesn't seem *too* crazy to me.

>

> How long are these notifier chains?,


On said Juno I get:

  gic_notifier()
  arch_timer_cpu_pm_notify()
  fpsimd_cpu_pm_notifier()
  cpu_pm_pmu_notify() x2
  hyp_init_cpu_pm_notifier()

(I would take a guess that there's one PMU cb per cluster due to big.LITTLE
faffery)

> and all this pcs_enter_idle_state()

> is still horribly broken vs RCU, witness the RCU_NONIDLE() there and the

> rcu_irq_enter_irqson() in the pm_notifier code.

>


Hadn't paid attention to that, that's indeed... Interesting.

> That said, we're running these notifiers from the idle path with IRQs

> disabled, so taking that spinlock isn't going to make it worse..


And it's already taken on !PREEMPT_RT.
Valentin Schneider Nov. 30, 2020, 10:09 a.m. UTC | #3
On 22/11/20 20:19, Valentin Schneider wrote:
> Booting a recent PREEMPT_RT kernel (v5.10-rc3-rt7-rebase) on my arm64 Juno

> leads to the idle task blocking on an RT sleeping spinlock down some

> notifier path:

>

>   [    1.809101] BUG: scheduling while atomic: swapper/5/0/0x00000002

>   [    1.809116] Modules linked in:

>   [    1.809123] Preemption disabled at:

>   [    1.809125] secondary_start_kernel (arch/arm64/kernel/smp.c:227)

>   [    1.809146] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G        W         5.10.0-rc3-rt7 #168

>   [    1.809153] Hardware name: ARM Juno development board (r0) (DT)

>   [    1.809158] Call trace:

>   [    1.809160] dump_backtrace (arch/arm64/kernel/stacktrace.c:100 (discriminator 1))

>   [    1.809170] show_stack (arch/arm64/kernel/stacktrace.c:198)

>   [    1.809178] dump_stack (lib/dump_stack.c:122)

>   [    1.809188] __schedule_bug (kernel/sched/core.c:4886)

>   [    1.809197] __schedule (./arch/arm64/include/asm/preempt.h:18 kernel/sched/core.c:4913 kernel/sched/core.c:5040)

>   [    1.809204] preempt_schedule_lock (kernel/sched/core.c:5365 (discriminator 1))

>   [    1.809210] rt_spin_lock_slowlock_locked (kernel/locking/rtmutex.c:1072)

>   [    1.809217] rt_spin_lock_slowlock (kernel/locking/rtmutex.c:1110)

>   [    1.809224] rt_spin_lock (./include/linux/rcupdate.h:647 kernel/locking/rtmutex.c:1139)

>   [    1.809231] atomic_notifier_call_chain_robust (kernel/notifier.c:71 kernel/notifier.c:118 kernel/notifier.c:186)

>   [    1.809240] cpu_pm_enter (kernel/cpu_pm.c:39 kernel/cpu_pm.c:93)

>   [    1.809249] psci_enter_idle_state (drivers/cpuidle/cpuidle-psci.c:52 drivers/cpuidle/cpuidle-psci.c:129)

>   [    1.809258] cpuidle_enter_state (drivers/cpuidle/cpuidle.c:238)

>   [    1.809267] cpuidle_enter (drivers/cpuidle/cpuidle.c:353)

>   [    1.809275] do_idle (kernel/sched/idle.c:132 kernel/sched/idle.c:213 kernel/sched/idle.c:273)

>   [    1.809282] cpu_startup_entry (kernel/sched/idle.c:368 (discriminator 1))

>   [    1.809288] secondary_start_kernel (arch/arm64/kernel/smp.c:273)

>


FWIW, still squealing under v5.10-rc5-rt11.
Sebastian Andrzej Siewior Nov. 30, 2020, 1:51 p.m. UTC | #4
On 2020-11-30 10:09:41 [+0000], Valentin Schneider wrote:
> FWIW, still squealing under v5.10-rc5-rt11.


I could apply this into my current RT but it would be nice to get this
applied upstream.

Sebastian
Daniel Bristot de Oliveira Nov. 30, 2020, 1:55 p.m. UTC | #5
On 11/22/20 9:19 PM, Valentin Schneider wrote:
> Booting a recent PREEMPT_RT kernel (v5.10-rc3-rt7-rebase) on my arm64 Juno

> leads to the idle task blocking on an RT sleeping spinlock down some

> notifier path:

> 

>   [    1.809101] BUG: scheduling while atomic: swapper/5/0/0x00000002

>   [    1.809116] Modules linked in:

>   [    1.809123] Preemption disabled at:

>   [    1.809125] secondary_start_kernel (arch/arm64/kernel/smp.c:227)

>   [    1.809146] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G        W         5.10.0-rc3-rt7 #168

>   [    1.809153] Hardware name: ARM Juno development board (r0) (DT)

>   [    1.809158] Call trace:

>   [    1.809160] dump_backtrace (arch/arm64/kernel/stacktrace.c:100 (discriminator 1))

>   [    1.809170] show_stack (arch/arm64/kernel/stacktrace.c:198)

>   [    1.809178] dump_stack (lib/dump_stack.c:122)

>   [    1.809188] __schedule_bug (kernel/sched/core.c:4886)

>   [    1.809197] __schedule (./arch/arm64/include/asm/preempt.h:18 kernel/sched/core.c:4913 kernel/sched/core.c:5040)

>   [    1.809204] preempt_schedule_lock (kernel/sched/core.c:5365 (discriminator 1))

>   [    1.809210] rt_spin_lock_slowlock_locked (kernel/locking/rtmutex.c:1072)

>   [    1.809217] rt_spin_lock_slowlock (kernel/locking/rtmutex.c:1110)

>   [    1.809224] rt_spin_lock (./include/linux/rcupdate.h:647 kernel/locking/rtmutex.c:1139)

>   [    1.809231] atomic_notifier_call_chain_robust (kernel/notifier.c:71 kernel/notifier.c:118 kernel/notifier.c:186)

>   [    1.809240] cpu_pm_enter (kernel/cpu_pm.c:39 kernel/cpu_pm.c:93)

>   [    1.809249] psci_enter_idle_state (drivers/cpuidle/cpuidle-psci.c:52 drivers/cpuidle/cpuidle-psci.c:129)

>   [    1.809258] cpuidle_enter_state (drivers/cpuidle/cpuidle.c:238)

>   [    1.809267] cpuidle_enter (drivers/cpuidle/cpuidle.c:353)

>   [    1.809275] do_idle (kernel/sched/idle.c:132 kernel/sched/idle.c:213 kernel/sched/idle.c:273)

>   [    1.809282] cpu_startup_entry (kernel/sched/idle.c:368 (discriminator 1))

>   [    1.809288] secondary_start_kernel (arch/arm64/kernel/smp.c:273)

> 

> Two points worth noting:

> 

> 1) That this is conceptually the same issue as pointed out in:

>    313c8c16ee62 ("PM / CPU: replace raw_notifier with atomic_notifier")

> 2) Only the _robust() variant of atomic_notifier callchains suffer from

>    this

> 

> AFAICT only the cpu_pm_notifier_chain really needs to be changed, but

> singling it out would mean introducing a new (truly) non-blocking API. At

> the same time, callers that are fine with any blocking within the call

> chain should use blocking notifiers, so patching up all atomic_notifier's

> doesn't seem *too* crazy to me.

> 

> Fixes: 70d932985757 ("notifier: Fix broken error handling pattern")

> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>


Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>


Thanks!
-- Daniel
diff mbox series

Patch

diff --git a/include/linux/notifier.h b/include/linux/notifier.h
index 2fb373a5c1ed..723bc2df6388 100644
--- a/include/linux/notifier.h
+++ b/include/linux/notifier.h
@@ -58,7 +58,7 @@  struct notifier_block {
 };
 
 struct atomic_notifier_head {
-	spinlock_t lock;
+	raw_spinlock_t lock;
 	struct notifier_block __rcu *head;
 };
 
@@ -78,7 +78,7 @@  struct srcu_notifier_head {
 };
 
 #define ATOMIC_INIT_NOTIFIER_HEAD(name) do {	\
-		spin_lock_init(&(name)->lock);	\
+		raw_spin_lock_init(&(name)->lock);	\
 		(name)->head = NULL;		\
 	} while (0)
 #define BLOCKING_INIT_NOTIFIER_HEAD(name) do {	\
@@ -95,7 +95,7 @@  extern void srcu_init_notifier_head(struct srcu_notifier_head *nh);
 		cleanup_srcu_struct(&(name)->srcu);
 
 #define ATOMIC_NOTIFIER_INIT(name) {				\
-		.lock = __SPIN_LOCK_UNLOCKED(name.lock),	\
+		.lock = __RAW_SPIN_LOCK_UNLOCKED(name.lock),	\
 		.head = NULL }
 #define BLOCKING_NOTIFIER_INIT(name) {				\
 		.rwsem = __RWSEM_INITIALIZER((name).rwsem),	\
diff --git a/kernel/notifier.c b/kernel/notifier.c
index 1b019cbca594..c20782f07643 100644
--- a/kernel/notifier.c
+++ b/kernel/notifier.c
@@ -142,9 +142,9 @@  int atomic_notifier_chain_register(struct atomic_notifier_head *nh,
 	unsigned long flags;
 	int ret;
 
-	spin_lock_irqsave(&nh->lock, flags);
+	raw_spin_lock_irqsave(&nh->lock, flags);
 	ret = notifier_chain_register(&nh->head, n);
-	spin_unlock_irqrestore(&nh->lock, flags);
+	raw_spin_unlock_irqrestore(&nh->lock, flags);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(atomic_notifier_chain_register);
@@ -164,9 +164,9 @@  int atomic_notifier_chain_unregister(struct atomic_notifier_head *nh,
 	unsigned long flags;
 	int ret;
 
-	spin_lock_irqsave(&nh->lock, flags);
+	raw_spin_lock_irqsave(&nh->lock, flags);
 	ret = notifier_chain_unregister(&nh->head, n);
-	spin_unlock_irqrestore(&nh->lock, flags);
+	raw_spin_unlock_irqrestore(&nh->lock, flags);
 	synchronize_rcu();
 	return ret;
 }
@@ -182,9 +182,9 @@  int atomic_notifier_call_chain_robust(struct atomic_notifier_head *nh,
 	 * Musn't use RCU; because then the notifier list can
 	 * change between the up and down traversal.
 	 */
-	spin_lock_irqsave(&nh->lock, flags);
+	raw_spin_lock_irqsave(&nh->lock, flags);
 	ret = notifier_call_chain_robust(&nh->head, val_up, val_down, v);
-	spin_unlock_irqrestore(&nh->lock, flags);
+	raw_spin_unlock_irqrestore(&nh->lock, flags);
 
 	return ret;
 }