diff mbox series

Detect early free of a live mm

Message ID 20180228121458.2230-1-mark.rutland@arm.com
State New
Headers show
Series Detect early free of a live mm | expand

Commit Message

Mark Rutland Feb. 28, 2018, 12:14 p.m. UTC
KASAN splats indicate that in some cases we free a live mm, then
continue to access it, with potentially disastrous results. This is
likely due to a mismatched mmdrop() somewhere in the kernel, but so far
the culprit remains elusive.

Let's have __mmdrop() verify that the mm isn't live for the current
task, similar to the existing check for init_mm. This way, we can catch
this class of issue earlier, and without requiring KASAN.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Peter Zijlstra <peterz@infraded.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
---
 kernel/fork.c | 2 ++
 1 file changed, 2 insertions(+)

Hi,

For context, we're seeing an intermittent use-after-free of an mm on
arm64 [1], where it looks like an mm has been freed earlier than
expected. So far KASAN has only caught legitimate mmdrop() uses, where
mm_count is presumably already bogus.

Mark.

[1] https://lkml.kernel.org/r/20180214120254.qq4w4s42ecxio7lu@lakrids.cambridge.arm.com

-- 
2.11.0

Comments

Mark Rutland Feb. 28, 2018, 12:18 p.m. UTC | #1
Ugh, I messed up Peter's email when sending this out. please
s/infraded/infradead/ if replying to the first mail.

Sorry about that.

Mark.

On Wed, Feb 28, 2018 at 12:14:58PM +0000, Mark Rutland wrote:
> KASAN splats indicate that in some cases we free a live mm, then

> continue to access it, with potentially disastrous results. This is

> likely due to a mismatched mmdrop() somewhere in the kernel, but so far

> the culprit remains elusive.

> 

> Let's have __mmdrop() verify that the mm isn't live for the current

> task, similar to the existing check for init_mm. This way, we can catch

> this class of issue earlier, and without requiring KASAN.

> 

> Signed-off-by: Mark Rutland <mark.rutland@arm.com>

> Cc: Andrew Morton <akpm@linux-foundation.org>

> Cc: Ingo Molnar <mingo@kernel.org>

> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

> Cc: Michal Hocko <mhocko@suse.com>

> Cc: Peter Zijlstra <peterz@infraded.org>

> Cc: Rik van Riel <riel@redhat.com>

> Cc: Will Deacon <will.deacon@arm.com>

> ---

>  kernel/fork.c | 2 ++

>  1 file changed, 2 insertions(+)

> 

> Hi,

> 

> For context, we're seeing an intermittent use-after-free of an mm on

> arm64 [1], where it looks like an mm has been freed earlier than

> expected. So far KASAN has only caught legitimate mmdrop() uses, where

> mm_count is presumably already bogus.

> 

> Mark.

> 

> [1] https://lkml.kernel.org/r/20180214120254.qq4w4s42ecxio7lu@lakrids.cambridge.arm.com

> 

> diff --git a/kernel/fork.c b/kernel/fork.c

> index e5d9d405ae4e..6922d93551b8 100644

> --- a/kernel/fork.c

> +++ b/kernel/fork.c

> @@ -595,6 +595,8 @@ static void check_mm(struct mm_struct *mm)

>  void __mmdrop(struct mm_struct *mm)

>  {

>  	BUG_ON(mm == &init_mm);

> +	BUG_ON(mm == current->mm);

> +	BUG_ON(mm == current->active_mm);

>  	mm_free_pgd(mm);

>  	destroy_context(mm);

>  	hmm_mm_destroy(mm);

> -- 

> 2.11.0

>
Michal Hocko March 1, 2018, 9:35 a.m. UTC | #2
On Wed 28-02-18 12:18:10, Mark Rutland wrote:
> Ugh, I messed up Peter's email when sending this out. please

> s/infraded/infradead/ if replying to the first mail.

> 

> Sorry about that.

> 

> Mark.

> 

> On Wed, Feb 28, 2018 at 12:14:58PM +0000, Mark Rutland wrote:

> > KASAN splats indicate that in some cases we free a live mm, then

> > continue to access it, with potentially disastrous results. This is

> > likely due to a mismatched mmdrop() somewhere in the kernel, but so far

> > the culprit remains elusive.

> > 

> > Let's have __mmdrop() verify that the mm isn't live for the current

> > task, similar to the existing check for init_mm. This way, we can catch

> > this class of issue earlier, and without requiring KASAN.


Wouldn't it be better to check the mm_users count instead?
{VM_}BUG_ON(atomic_read(&mm->mm_users));

Relying on current->mm resetting works currently but this is quite a
subtle dependency.
-- 
Michal Hocko
SUSE Labs
Mark Rutland March 1, 2018, 11:22 a.m. UTC | #3
On Thu, Mar 01, 2018 at 10:35:22AM +0100, Michal Hocko wrote:
> On Wed 28-02-18 12:18:10, Mark Rutland wrote:

> > Ugh, I messed up Peter's email when sending this out. please

> > s/infraded/infradead/ if replying to the first mail.

> > 

> > Sorry about that.

> > 

> > Mark.

> > 

> > On Wed, Feb 28, 2018 at 12:14:58PM +0000, Mark Rutland wrote:

> > > KASAN splats indicate that in some cases we free a live mm, then

> > > continue to access it, with potentially disastrous results. This is

> > > likely due to a mismatched mmdrop() somewhere in the kernel, but so far

> > > the culprit remains elusive.

> > > 

> > > Let's have __mmdrop() verify that the mm isn't live for the current

> > > task, similar to the existing check for init_mm. This way, we can catch

> > > this class of issue earlier, and without requiring KASAN.

> 

> Wouldn't it be better to check the mm_users count instead?

> {VM_}BUG_ON(atomic_read(&mm->mm_users));


Perhaps, but that won't catch a mismatched mmput(), which could
decrement mm_users (and mm_count) to zero early, before current->mm is
cleared in exit_mm().

Locally, I'm testing with:

  BUG_ON(mm == current->mm);
  BUG_ON(mm == current->active_mm);
  BUG_ON(refcount_read(&mm->mm_users) != 0);
  BUG_ON(refcount_read(&mm->mm_count) != 0);

... so as to also catch an early free from another thread.

> Relying on current->mm resetting works currently but this is quite a

> subtle dependency.


Do you mean because it only catches an early free in a thread using that
mm, or is there another subtlety you had in mind?

Thanks,
Mark.
Michal Hocko March 1, 2018, 12:40 p.m. UTC | #4
On Thu 01-03-18 11:22:37, Mark Rutland wrote:
> On Thu, Mar 01, 2018 at 10:35:22AM +0100, Michal Hocko wrote:

> > On Wed 28-02-18 12:18:10, Mark Rutland wrote:

> > > Ugh, I messed up Peter's email when sending this out. please

> > > s/infraded/infradead/ if replying to the first mail.

> > > 

> > > Sorry about that.

> > > 

> > > Mark.

> > > 

> > > On Wed, Feb 28, 2018 at 12:14:58PM +0000, Mark Rutland wrote:

> > > > KASAN splats indicate that in some cases we free a live mm, then

> > > > continue to access it, with potentially disastrous results. This is

> > > > likely due to a mismatched mmdrop() somewhere in the kernel, but so far

> > > > the culprit remains elusive.

> > > > 

> > > > Let's have __mmdrop() verify that the mm isn't live for the current

> > > > task, similar to the existing check for init_mm. This way, we can catch

> > > > this class of issue earlier, and without requiring KASAN.

> > 

> > Wouldn't it be better to check the mm_users count instead?

> > {VM_}BUG_ON(atomic_read(&mm->mm_users));

> 

> Perhaps, but that won't catch a mismatched mmput(), which could

> decrement mm_users (and mm_count) to zero early, before current->mm is

> cleared in exit_mm().


true

> Locally, I'm testing with:

> 

>   BUG_ON(mm == current->mm);

>   BUG_ON(mm == current->active_mm);

>   BUG_ON(refcount_read(&mm->mm_users) != 0);

>   BUG_ON(refcount_read(&mm->mm_count) != 0);

> 

> ... so as to also catch an early free from another thread.


I would be careful about active_mm. At least {un}use_mm does some tricks
with it.

> > Relying on current->mm resetting works currently but this is quite a

> > subtle dependency.

> 

> Do you mean because it only catches an early free in a thread using that

> mm, or is there another subtlety you had in mind?


No, I was more thinking about future changes when we stop clearing
tsk->mm at exit. This is rather unlikely to be honest because that
requires more changes.
-- 
Michal Hocko
SUSE Labs
Mark Rutland March 1, 2018, 12:49 p.m. UTC | #5
On Thu, Mar 01, 2018 at 01:40:24PM +0100, Michal Hocko wrote:
> On Thu 01-03-18 11:22:37, Mark Rutland wrote:

> > On Thu, Mar 01, 2018 at 10:35:22AM +0100, Michal Hocko wrote:

> > > On Wed 28-02-18 12:18:10, Mark Rutland wrote:

> > > > Ugh, I messed up Peter's email when sending this out. please

> > > > s/infraded/infradead/ if replying to the first mail.

> > > > 

> > > > Sorry about that.

> > > > 

> > > > Mark.

> > > > 

> > > > On Wed, Feb 28, 2018 at 12:14:58PM +0000, Mark Rutland wrote:

> > > > > KASAN splats indicate that in some cases we free a live mm, then

> > > > > continue to access it, with potentially disastrous results. This is

> > > > > likely due to a mismatched mmdrop() somewhere in the kernel, but so far

> > > > > the culprit remains elusive.

> > > > > 

> > > > > Let's have __mmdrop() verify that the mm isn't live for the current

> > > > > task, similar to the existing check for init_mm. This way, we can catch

> > > > > this class of issue earlier, and without requiring KASAN.

> > > 

> > > Wouldn't it be better to check the mm_users count instead?

> > > {VM_}BUG_ON(atomic_read(&mm->mm_users));

> > 

> > Perhaps, but that won't catch a mismatched mmput(), which could

> > decrement mm_users (and mm_count) to zero early, before current->mm is

> > cleared in exit_mm().

> 

> true

> 

> > Locally, I'm testing with:

> > 

> >   BUG_ON(mm == current->mm);

> >   BUG_ON(mm == current->active_mm);

> >   BUG_ON(refcount_read(&mm->mm_users) != 0);

> >   BUG_ON(refcount_read(&mm->mm_count) != 0);

> > 

> > ... so as to also catch an early free from another thread.

> 

> I would be careful about active_mm. At least {un}use_mm does some tricks

> with it.


If that can leave mm == current->active_mm during __mmdrop(), I think
that would be a bug in its own right, since that would leave
current->active_mm pointing at a freed mm.

AFAICT, use_mm() does the usual mm_count balancing when flipping
active_mm (i.e. we mmgrab() the new mm, make it active_mm, then mmdrop()
the old active_mm), so even if the mmdrop() ends up freeing the old
active_mm, that shouldn't trigger the BUG_ON(mm == current->active_mm).

In unuse_mm() we don't poke active_mm at all, and leave that to the
usual context_switch() + finish_task_switch() grab/drop.

> > > Relying on current->mm resetting works currently but this is quite a

> > > subtle dependency.

> > 

> > Do you mean because it only catches an early free in a thread using that

> > mm, or is there another subtlety you had in mind?

> 

> No, I was more thinking about future changes when we stop clearing

> tsk->mm at exit. This is rather unlikely to be honest because that

> requires more changes.


Ok.

Thanks,
Mark.
Andrew Morton March 2, 2018, 1:16 a.m. UTC | #6
On Wed, 28 Feb 2018 12:14:58 +0000 Mark Rutland <mark.rutland@arm.com> wrote:

> KASAN splats indicate that in some cases we free a live mm, then

> continue to access it, with potentially disastrous results. This is

> likely due to a mismatched mmdrop() somewhere in the kernel, but so far

> the culprit remains elusive.

> 

> Let's have __mmdrop() verify that the mm isn't live for the current

> task, similar to the existing check for init_mm. This way, we can catch

> this class of issue earlier, and without requiring KASAN.


Presumably the results usually aren't disastrous.  But they will be if
we go and add BUG_ON()s!  Can we make this WARN_ON[_ONCE]()?  We should
still get the same info from testers.
Mark Rutland March 2, 2018, 10:24 a.m. UTC | #7
On Thu, Mar 01, 2018 at 05:16:09PM -0800, Andrew Morton wrote:
> On Wed, 28 Feb 2018 12:14:58 +0000 Mark Rutland <mark.rutland@arm.com> wrote:

> 

> > KASAN splats indicate that in some cases we free a live mm, then

> > continue to access it, with potentially disastrous results. This is

> > likely due to a mismatched mmdrop() somewhere in the kernel, but so far

> > the culprit remains elusive.

> > 

> > Let's have __mmdrop() verify that the mm isn't live for the current

> > task, similar to the existing check for init_mm. This way, we can catch

> > this class of issue earlier, and without requiring KASAN.

> 

> Presumably the results usually aren't disastrous. 


It seems so.

> But they will be if we go and add BUG_ON()s!  Can we make this

> WARN_ON[_ONCE]()?  We should still get the same info from testers.


I fuzz the kernel with panic_on_warn=1, so that's practically the same
for me.

I'll respin to that effect.

Thanks,
Mark.
Mark Rutland March 12, 2018, 1:46 p.m. UTC | #8
On Sat, Mar 10, 2018 at 05:11:39AM +0800, kernel test robot wrote:
> FYI, we noticed the following commit (built with gcc-7):

> 

> commit: 94d3a254089a7cd4f11b7071b4323afd98eea0a6 ("Detect early free of a live mm")

> url: https://github.com/0day-ci/linux/commits/Mark-Rutland/Detect-early-free-of-a-live-mm/20180303-144149


> [   47.208935] kernel BUG at kernel/fork.c:599!

> [   47.210365] invalid opcode: 0000 [#1] SMP PTI

> [   47.211336] Modules linked in:

> [   47.212145] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.16.0-rc3-00001-g94d3a25 #1

> [   47.213966] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014

> [   47.215869] RIP: 0010:__mmdrop+0x136/0x170

> [   47.216866] RSP: 0018:ffffffff82803dd8 EFLAGS: 00010293

> [   47.218160] RAX: ffffffff82818500 RBX: ffff880115770000 RCX: ffffffff810ae876

> [   47.219758] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff880115770000

> [   47.221306] RBP: ffffffff82803e00 R08: 0000000000000001 R09: 0000000000000000

> [   47.223268] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff82818500

> [   47.224961] R13: ffffffff82a8ce20 R14: ffff88013ff534c0 R15: 00000000000003e7

> [   47.226716] FS:  0000000000000000(0000) GS:ffff88013b200000(0000) knlGS:0000000000000000

> [   47.228550] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033

> [   47.229884] CR2: 00007fbfc2cc0190 CR3: 0000000002812000 CR4: 00000000000006f0

> [   47.231580] Call Trace:

> [   47.232144]  idle_task_exit+0x53/0x60


Luckily this is a spurious warning. In idle_task_exit(), we switch to
the init_mm, but leave active_mm stale before calling mmdrop().

In addition to the WARN_ON[_ONCE] changes, I'll drop the following in:

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e7c535eee0a6..0ef844abc2da 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5506,6 +5506,7 @@ void idle_task_exit(void)
 
        if (mm != &init_mm) {
                switch_mm(mm, &init_mm, current);
+               current->active_mm = &init_mm;
                finish_arch_post_lock_switch();
        }
        mmdrop(mm);

Thanks,
Mark.
diff mbox series

Patch

diff --git a/kernel/fork.c b/kernel/fork.c
index e5d9d405ae4e..6922d93551b8 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -595,6 +595,8 @@  static void check_mm(struct mm_struct *mm)
 void __mmdrop(struct mm_struct *mm)
 {
 	BUG_ON(mm == &init_mm);
+	BUG_ON(mm == current->mm);
+	BUG_ON(mm == current->active_mm);
 	mm_free_pgd(mm);
 	destroy_context(mm);
 	hmm_mm_destroy(mm);