
kasan: Fix sleeping function called from invalid context in PREEMPT_RT

Message ID 20220401091006.2100058-1-qiang1.zhang@intel.com
State New
Series: kasan: Fix sleeping function called from invalid context in PREEMPT_RT

Commit Message

Zhang, Qiang1 April 1, 2022, 9:10 a.m. UTC
BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
preempt_count: 1, expected: 0
...........
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.17.1-rt16-yocto-preempt-rt #22
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x60/0x8c
dump_stack+0x10/0x12
 __might_resched.cold+0x13b/0x173
rt_spin_lock+0x5b/0xf0
 ___cache_free+0xa5/0x180
qlist_free_all+0x7a/0x160
per_cpu_remove_cache+0x5f/0x70
smp_call_function_many_cond+0x4c4/0x4f0
on_each_cpu_cond_mask+0x49/0xc0
kasan_quarantine_remove_cache+0x54/0xf0
kasan_cache_shrink+0x9/0x10
kmem_cache_shrink+0x13/0x20
acpi_os_purge_cache+0xe/0x20
acpi_purge_cached_objects+0x21/0x6d
acpi_initialize_objects+0x15/0x3b
acpi_init+0x130/0x5ba
do_one_initcall+0xe5/0x5b0
kernel_init_freeable+0x34f/0x3ad
kernel_init+0x1e/0x140
ret_from_fork+0x22/0x30

When kmem_cache_shrink() is called, an IPI is triggered and
___cache_free() runs in IPI (hard interrupt) context, where a local
lock or a spinlock is acquired. On a PREEMPT_RT kernel these locks
are replaced with sleepable RT spinlocks, so the above splat is
triggered. Fix it by moving the freeing from the IPI interrupt
context to task context on RT kernels.

Signed-off-by: Zqiang <qiang1.zhang@intel.com>
---
 mm/kasan/quarantine.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)
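
For illustration, the failing pattern reduces to roughly this (a
minimal sketch with hypothetical names, not code from the patch):

	/* spinlock_t is a sleeping, rtmutex-based lock on PREEMPT_RT */
	static DEFINE_SPINLOCK(demo_lock);

	static void demo_ipi_handler(void *arg)
	{
		/*
		 * Runs in hard-IRQ (IPI) context via on_each_cpu(), with
		 * preemption and interrupts disabled.  Acquiring a sleeping
		 * lock here is what produces the "sleeping function called
		 * from invalid context" splat on PREEMPT_RT.
		 */
		spin_lock(&demo_lock);
		/* ... free objects, as qlist_free_all()/___cache_free() do ... */
		spin_unlock(&demo_lock);
	}

	/* trigger: on_each_cpu(demo_ipi_handler, NULL, 1); */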

Comments

Zhang, Qiang1 April 1, 2022, 10:10 a.m. UTC | #1
On 2022-04-01 17:10:06 [+0800], Zqiang wrote:
> BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
> in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
> preempt_count: 1, expected: 0
> ...........
> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.17.1-rt16-yocto-preempt-rt #22
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
> BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
> Call Trace:
> <TASK>
> dump_stack_lvl+0x60/0x8c
> dump_stack+0x10/0x12
>  __might_resched.cold+0x13b/0x173
> rt_spin_lock+0x5b/0xf0
>  ___cache_free+0xa5/0x180
> qlist_free_all+0x7a/0x160
> per_cpu_remove_cache+0x5f/0x70
> smp_call_function_many_cond+0x4c4/0x4f0
> on_each_cpu_cond_mask+0x49/0xc0
> kasan_quarantine_remove_cache+0x54/0xf0
> kasan_cache_shrink+0x9/0x10
> kmem_cache_shrink+0x13/0x20
> acpi_os_purge_cache+0xe/0x20
> acpi_purge_cached_objects+0x21/0x6d
> acpi_initialize_objects+0x15/0x3b
> acpi_init+0x130/0x5ba
> do_one_initcall+0xe5/0x5b0
> kernel_init_freeable+0x34f/0x3ad
> kernel_init+0x1e/0x140
> ret_from_fork+0x22/0x30
> 
> When kmem_cache_shrink() is called, an IPI is triggered and
> ___cache_free() runs in IPI (hard interrupt) context, where a local
> lock or a spinlock is acquired. On a PREEMPT_RT kernel these locks
> are replaced with sleepable RT spinlocks, so the above splat is
> triggered. Fix it by moving the freeing from the IPI interrupt
> context to task context on RT kernels.

> I haven't seen that while playing with kasan. Is this new?
> Could we fix it in a way that we don't involve freeing memory from in-IRQ?
> This could trigger a lockdep splat if the local-lock in SLUB is acquired from in-IRQ context on !PREEMPT_RT.

Hi, I will move qlist_free_all() from the IPI context to task context.
This operation is similar to how the remaining objects in the global
quarantine pool are released afterwards.

I'm not familiar with the phenomenon you described. Can you explain it
in more detail?

Thanks
Zqiang


> Signed-off-by: Zqiang <qiang1.zhang@intel.com>

Sebastian
Sebastian Andrzej Siewior April 1, 2022, 11:15 a.m. UTC | #2
On 2022-04-01 10:10:38 [+0000], Zhang, Qiang1 wrote:
> > Could we fix it in a way that we don't involve freeing memory from in-IRQ?
> > This could trigger a lockdep splat if the local-lock in SLUB is acquired from in-IRQ context on !PREEMPT_RT.
> 
> Hi, I will move qlist_free_all() from the IPI context to task context.
> This operation is similar to how the remaining objects in the global
> quarantine pool are released afterwards.
> 
> I'm not familiar with the phenomenon you described. Can you explain it
> in more detail?

If by "phenomenon" you mean my second sentence, then the kernel option
CONFIG_PROVE_RAW_LOCK_NESTING will trigger on !PREEMPT_RT in a code
sequence like
	raw_spin_lock()
	spin_lock();

which is wrong on PREEMPT_RT. So we have a warning on both
configurations.
The call chain in your case will probably not lead to a warning since
there is no raw_spinlock_t involved within the IPI call. We worked on
avoiding memory allocation and freeing from in-IRQ context, so I would
prefer to have something that works for both configurations and not
just an ifdef around the RT case.
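
For reference, the nesting CONFIG_PROVE_RAW_LOCK_NESTING flags reduces
to something like this (a minimal sketch with hypothetical locks):

	static DEFINE_RAW_SPINLOCK(outer_lock);	/* always a spinning lock */
	static DEFINE_SPINLOCK(inner_lock);	/* sleeps on PREEMPT_RT */

	static void bad_nesting(void)
	{
		raw_spin_lock(&outer_lock);
		/*
		 * Invalid: on PREEMPT_RT spin_lock() may sleep, and sleeping
		 * inside a raw_spinlock_t section is forbidden; lockdep's
		 * wait-type checks report this on !PREEMPT_RT as well.
		 */
		spin_lock(&inner_lock);
		spin_unlock(&inner_lock);
		raw_spin_unlock(&outer_lock);
	}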

> Thanks
> Zqiang

Sebastian
Dmitry Vyukov April 1, 2022, 11:39 a.m. UTC | #3
On Fri, 1 Apr 2022 at 11:09, Zqiang <qiang1.zhang@intel.com> wrote:
>
> BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
> in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
> preempt_count: 1, expected: 0
> ...........
> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.17.1-rt16-yocto-preempt-rt #22
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
> BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
> Call Trace:
> <TASK>
> dump_stack_lvl+0x60/0x8c
> dump_stack+0x10/0x12
>  __might_resched.cold+0x13b/0x173
> rt_spin_lock+0x5b/0xf0
>  ___cache_free+0xa5/0x180
> qlist_free_all+0x7a/0x160
> per_cpu_remove_cache+0x5f/0x70
> smp_call_function_many_cond+0x4c4/0x4f0
> on_each_cpu_cond_mask+0x49/0xc0
> kasan_quarantine_remove_cache+0x54/0xf0
> kasan_cache_shrink+0x9/0x10
> kmem_cache_shrink+0x13/0x20
> acpi_os_purge_cache+0xe/0x20
> acpi_purge_cached_objects+0x21/0x6d
> acpi_initialize_objects+0x15/0x3b
> acpi_init+0x130/0x5ba
> do_one_initcall+0xe5/0x5b0
> kernel_init_freeable+0x34f/0x3ad
> kernel_init+0x1e/0x140
> ret_from_fork+0x22/0x30
>
> When kmem_cache_shrink() is called, an IPI is triggered and
> ___cache_free() runs in IPI (hard interrupt) context, where a local
> lock or a spinlock is acquired. On a PREEMPT_RT kernel these locks
> are replaced with sleepable RT spinlocks, so the above splat is
> triggered. Fix it by moving the freeing from the IPI interrupt
> context to task context on RT kernels.
>
> Signed-off-by: Zqiang <qiang1.zhang@intel.com>
> ---
>  mm/kasan/quarantine.c | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/mm/kasan/quarantine.c b/mm/kasan/quarantine.c
> index 08291ed33e93..c26fa6473119 100644
> --- a/mm/kasan/quarantine.c
> +++ b/mm/kasan/quarantine.c
> @@ -90,6 +90,7 @@ static void qlist_move_all(struct qlist_head *from, struct qlist_head *to)
>   */
>  static DEFINE_PER_CPU(struct qlist_head, cpu_quarantine);
>
> +static DEFINE_PER_CPU(struct qlist_head, cpu_shrink_qlist);
>  /* Round-robin FIFO array of batches. */
>  static struct qlist_head global_quarantine[QUARANTINE_BATCHES];
>  static int quarantine_head;
> @@ -311,12 +312,14 @@ static void qlist_move_cache(struct qlist_head *from,
>  static void per_cpu_remove_cache(void *arg)
>  {
>         struct kmem_cache *cache = arg;
> -       struct qlist_head to_free = QLIST_INIT;
> +       struct qlist_head *to_free;
>         struct qlist_head *q;
>
> +       to_free = this_cpu_ptr(&cpu_shrink_qlist);
>         q = this_cpu_ptr(&cpu_quarantine);
> -       qlist_move_cache(q, &to_free, cache);
> -       qlist_free_all(&to_free, cache);
> +       qlist_move_cache(q, to_free, cache);
> +       if (!IS_ENABLED(CONFIG_PREEMPT_RT))
> +               qlist_free_all(to_free, cache);
>  }
>
>  /* Free all quarantined objects belonging to cache. */
> @@ -324,6 +327,7 @@ void kasan_quarantine_remove_cache(struct kmem_cache *cache)
>  {
>         unsigned long flags, i;
>         struct qlist_head to_free = QLIST_INIT;
> +       int cpu;
>
>         /*
>          * Must be careful to not miss any objects that are being moved from
> @@ -334,6 +338,11 @@ void kasan_quarantine_remove_cache(struct kmem_cache *cache)
>          */
>         on_each_cpu(per_cpu_remove_cache, cache, 1);
>
> +       if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
> +               for_each_possible_cpu(cpu)
> +                       qlist_free_all(per_cpu_ptr(&cpu_shrink_qlist, cpu), cache);
> +       }

Hi Zqiang,

This code is not protected by any kind of mutex, right? If so, I think
it can lead to subtle memory corruption, double-frees, and leaks when
several tasks move objects to / free objects from the cpu_shrink_qlist lists.
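
A sketch of the interleaving in question, assuming two tasks shrink
different caches concurrently (the mutex at the end is a hypothetical
way out, not necessarily the eventual fix):

	/*
	 * Nothing orders access to cpu_shrink_qlist:
	 *
	 *   task A: kasan_quarantine_remove_cache(cache_a)
	 *           on_each_cpu() moves cache_a objects onto cpu_shrink_qlist
	 *   task B: kasan_quarantine_remove_cache(cache_b)
	 *           on_each_cpu() appends cache_b objects to the same lists
	 *   task A: qlist_free_all(..., cache_a) frees B's objects as if
	 *           they belonged to cache_a, while B may free them again
	 *           -> corruption, double-frees, or leaked list entries.
	 */
	static DEFINE_MUTEX(shrink_qlist_lock);	/* hypothetical */

	void kasan_quarantine_remove_cache(struct kmem_cache *cache)
	{
		mutex_lock(&shrink_qlist_lock);	/* serialize the whole drain */
		/* ... on_each_cpu() + per-CPU qlist_free_all() as above ... */
		mutex_unlock(&shrink_qlist_lock);
	}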


>         raw_spin_lock_irqsave(&quarantine_lock, flags);
>         for (i = 0; i < QUARANTINE_BATCHES; i++) {
>                 if (qlist_empty(&global_quarantine[i]))
Mike Galbraith April 4, 2022, 12:12 p.m. UTC | #4
On Fri, 2022-04-01 at 11:27 +0200, Sebastian Andrzej Siewior wrote:
> On 2022-04-01 17:10:06 [+0800], Zqiang wrote:
> > BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
> > in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0
> > preempt_count: 1, expected: 0
> > ...........
> > CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.17.1-rt16-yocto-preempt-rt #22
> > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
> > BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
> > Call Trace:
> > <TASK>
> > dump_stack_lvl+0x60/0x8c
> > dump_stack+0x10/0x12
> >  __might_resched.cold+0x13b/0x173
> > rt_spin_lock+0x5b/0xf0
> >  ___cache_free+0xa5/0x180
> > qlist_free_all+0x7a/0x160
> > per_cpu_remove_cache+0x5f/0x70
> > smp_call_function_many_cond+0x4c4/0x4f0
> > on_each_cpu_cond_mask+0x49/0xc0
> > kasan_quarantine_remove_cache+0x54/0xf0
> > kasan_cache_shrink+0x9/0x10
> > kmem_cache_shrink+0x13/0x20
> > acpi_os_purge_cache+0xe/0x20
> > acpi_purge_cached_objects+0x21/0x6d
> > acpi_initialize_objects+0x15/0x3b
> > acpi_init+0x130/0x5ba
> > do_one_initcall+0xe5/0x5b0
> > kernel_init_freeable+0x34f/0x3ad
> > kernel_init+0x1e/0x140
> > ret_from_fork+0x22/0x30
> >
> > When kmem_cache_shrink() is called, an IPI is triggered and
> > ___cache_free() runs in IPI (hard interrupt) context, where a local
> > lock or a spinlock is acquired. On a PREEMPT_RT kernel these locks
> > are replaced with sleepable RT spinlocks, so the above splat is
> > triggered. Fix it by moving the freeing from the IPI interrupt
> > context to task context on RT kernels.
>
> I haven't seen that while playing with kasan. Is this new?

Don't think so, the rock below was apparently first tossed at 5.12.

---
 lib/stackdepot.c      |    3 +++
 mm/kasan/quarantine.c |   49 +++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 52 insertions(+)

--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -375,6 +375,9 @@ depot_stack_handle_t __stack_depot_save(
 	if (found)
 		goto exit;

+	if (IS_ENABLED(CONFIG_PREEMPT_RT) && can_alloc && !preemptible())
+		can_alloc = false;
+
 	/*
 	 * Check if the current or the next stack slab need to be initialized.
 	 * If so, allocate the memory - we won't be able to do that under the
--- a/mm/kasan/quarantine.c
+++ b/mm/kasan/quarantine.c
@@ -19,6 +19,9 @@
 #include <linux/srcu.h>
 #include <linux/string.h>
 #include <linux/types.h>
+#include <linux/cpu.h>
+#include <linux/mutex.h>
+#include <linux/workqueue.h>
 #include <linux/cpuhotplug.h>

 #include "../slab.h"
@@ -319,6 +322,48 @@ static void per_cpu_remove_cache(void *a
 	qlist_free_all(&to_free, cache);
 }

+#ifdef CONFIG_PREEMPT_RT
+struct remove_cache_work {
+	struct work_struct work;
+	struct kmem_cache *cache;
+};
+
+static DEFINE_MUTEX(remove_caches_lock);
+static DEFINE_PER_CPU(struct remove_cache_work, remove_cache_work);
+
+static void per_cpu_remove_cache_work(struct work_struct *w)
+{
+	struct remove_cache_work *rcw;
+
+	rcw = container_of(w, struct remove_cache_work, work);
+	per_cpu_remove_cache(rcw->cache);
+}
+
+static void per_cpu_remove_caches_sync(struct kmem_cache *cache)
+{
+	struct remove_cache_work *rcw;
+	unsigned int cpu;
+
+	cpus_read_lock();
+	mutex_lock(&remove_caches_lock);
+
+	for_each_online_cpu(cpu) {
+		rcw = &per_cpu(remove_cache_work, cpu);
+		INIT_WORK(&rcw->work, per_cpu_remove_cache_work);
+		rcw->cache = cache;
+		schedule_work_on(cpu, &rcw->work);
+	}
+
+	for_each_online_cpu(cpu) {
+		rcw = &per_cpu(remove_cache_work, cpu);
+		flush_work(&rcw->work);
+	}
+
+	mutex_unlock(&remove_caches_lock);
+	cpus_read_unlock();
+}
+#endif
+
 /* Free all quarantined objects belonging to cache. */
 void kasan_quarantine_remove_cache(struct kmem_cache *cache)
 {
@@ -332,7 +377,11 @@ void kasan_quarantine_remove_cache(struc
 	 * achieves the first goal, while synchronize_srcu() achieves the
 	 * second.
 	 */
+#ifndef CONFIG_PREEMPT_RT
 	on_each_cpu(per_cpu_remove_cache, cache, 1);
+#else
+	per_cpu_remove_caches_sync(cache);
+#endif

 	raw_spin_lock_irqsave(&quarantine_lock, flags);
 	for (i = 0; i < QUARANTINE_BATCHES; i++) {
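
In this approach the IPI broadcast is replaced, on RT, by per-CPU work
items: per_cpu_remove_cache() then runs in kworker task context, where
taking a sleeping lock is legal, and the flush_work() loop restores the
"wait for all CPUs" semantics of on_each_cpu(..., 1). cpus_read_lock()
keeps the set of online CPUs stable across the two loops, and the
stackdepot hunk separately avoids page allocation from non-preemptible
context on RT.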

Patch

diff --git a/mm/kasan/quarantine.c b/mm/kasan/quarantine.c
index 08291ed33e93..c26fa6473119 100644
--- a/mm/kasan/quarantine.c
+++ b/mm/kasan/quarantine.c
@@ -90,6 +90,7 @@  static void qlist_move_all(struct qlist_head *from, struct qlist_head *to)
  */
 static DEFINE_PER_CPU(struct qlist_head, cpu_quarantine);
 
+static DEFINE_PER_CPU(struct qlist_head, cpu_shrink_qlist);
 /* Round-robin FIFO array of batches. */
 static struct qlist_head global_quarantine[QUARANTINE_BATCHES];
 static int quarantine_head;
@@ -311,12 +312,14 @@  static void qlist_move_cache(struct qlist_head *from,
 static void per_cpu_remove_cache(void *arg)
 {
 	struct kmem_cache *cache = arg;
-	struct qlist_head to_free = QLIST_INIT;
+	struct qlist_head *to_free;
 	struct qlist_head *q;
 
+	to_free = this_cpu_ptr(&cpu_shrink_qlist);
 	q = this_cpu_ptr(&cpu_quarantine);
-	qlist_move_cache(q, &to_free, cache);
-	qlist_free_all(&to_free, cache);
+	qlist_move_cache(q, to_free, cache);
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+		qlist_free_all(to_free, cache);
 }
 
 /* Free all quarantined objects belonging to cache. */
@@ -324,6 +327,7 @@  void kasan_quarantine_remove_cache(struct kmem_cache *cache)
 {
 	unsigned long flags, i;
 	struct qlist_head to_free = QLIST_INIT;
+	int cpu;
 
 	/*
 	 * Must be careful to not miss any objects that are being moved from
@@ -334,6 +338,11 @@  void kasan_quarantine_remove_cache(struct kmem_cache *cache)
 	 */
 	on_each_cpu(per_cpu_remove_cache, cache, 1);
 
+	if (IS_ENABLED(CONFIG_PREEMPT_RT)) {
+		for_each_possible_cpu(cpu)
+			qlist_free_all(per_cpu_ptr(&cpu_shrink_qlist, cpu), cache);
+	}
+
 	raw_spin_lock_irqsave(&quarantine_lock, flags);
 	for (i = 0; i < QUARANTINE_BATCHES; i++) {
 		if (qlist_empty(&global_quarantine[i]))