diff mbox

[RFC] uprobes: copy to user-space xol page with proper cache flushing

Message ID 534D6A1F.70102@linaro.org
State New
Headers show

Commit Message

David Long April 15, 2014, 5:19 p.m. UTC
On 04/15/14 11:46, Oleg Nesterov wrote:
> 
> But. Please do not add copy_to_user_page() into copy_to_page() (as your patch
> did). This is certainly not what uprobe_write_opcode() wants, we do not want
> or need "flush" in this case. The same for __create_xol_area().
> 

It looked me like a call to a new __copy_to_user_page(current->mm, ...) in xol_get_insn_slot()
would be in line with David Miller's suggestion and would cure the problem on ARM (and hopefuly
be more philosophically correct for all architectures):




Opinions?  It's possible this approach isn't good enough.  Cache operations and VM
are not my strong suit.

-dl

Comments

David Miller April 15, 2014, 5:38 p.m. UTC | #1
From: David Long <dave.long@linaro.org>
Date: Tue, 15 Apr 2014 13:19:27 -0400

> On 04/15/14 11:46, Oleg Nesterov wrote:
>> 
>> But. Please do not add copy_to_user_page() into copy_to_page() (as your patch
>> did). This is certainly not what uprobe_write_opcode() wants, we do not want
>> or need "flush" in this case. The same for __create_xol_area().
>> 
> 
> It looked me like a call to a new __copy_to_user_page(current->mm, ...) in xol_get_insn_slot()
> would be in line with David Miller's suggestion and would cure the problem on ARM (and hopefuly
> be more philosophically correct for all architectures):

It occurs to me that because of the specific environment in which this
executes we can make the interface take advantage of the invariants at
this call site:

1) We are copying into userspace and thus current->mm

2) We are only storing an instruction or two

So it would be just like the dynamic linker lazy resolving a PLT slot in
userland.

Furthermore, we can do the stores using something akin to put_user(),
directly store them into userspace.

This avoids completely any D-cache aliasing issues.  The kernel stores
to the same address, and therefore the same cache lines, as userspace
would.

So for example, since even on the most braindamaged sparc64 cpus the
remote cpus will have their I-cache snoop the store we do locally we
just need to do a simple flush on the local cpu where this store is
happening.  So I'd like to do this something like:

	stw	%r1, [%r2 + 0x0]
	flush	%r2

PowerPC could probably do something similar.

The copy_to_user_page() interface has to deal with storing into a non-current
'mm' and therefore via the kernel side copy of the page and that's why the
D-cache aliasing issues are so painful to deal with.

Just a thought...
Oleg Nesterov April 15, 2014, 5:43 p.m. UTC | #2
On 04/15, David Long wrote:
>
> On 04/15/14 11:46, Oleg Nesterov wrote:
> >
> > But. Please do not add copy_to_user_page() into copy_to_page() (as your patch
> > did). This is certainly not what uprobe_write_opcode() wants, we do not want
> > or need "flush" in this case. The same for __create_xol_area().
> >
>
> It looked me like a call to a new __copy_to_user_page(current->mm, ...) in xol_get_insn_slot()
> would be in line with David Miller's suggestion and would cure the problem on ARM (and hopefuly
> be more philosophically correct for all architectures):

David, let me repeat. I am not going to argue even if I obviously like
the Linus's suggestion more.

I only argued with the suggestions to follow the __access_remote_vm()
logic.

> @@ -1297,13 +1298,11 @@ static unsigned long xol_get_insn_slot(struct uprobe *uprobe)
>                 return 0;
>
>         /* Initialize the slot */
> -       copy_to_page(area->page, xol_vaddr,
> -                       &uprobe->arch.ixol, sizeof(uprobe->arch.ixol));
> -       /*
> -        * We probably need flush_icache_user_range() but it needs vma.
> -        * This should work on supported architectures too.
> -        */
> -       flush_dcache_page(area->page);
> +       kaddr = kmap_atomic(area->page);
> +       __copy_to_user_page(current->mm, area->page, xol_vaddr,
> +                       kaddr + (xol_vaddr & ~PAGE_MASK),
> +                       &uprobe->arch.ixol, sizeof(uprobe->arch.ixol), true);
> +       kunmap_atomic(kaddr);

This all is fine, but you need to introduce __copy_to_user_page() first.

And you need to do this for every arch. And you need to verify that it
does not need mmap_sem. It seems that at least alpha needs this lock.
And as Russel pointed out (again, again, I could easily misunderstood him)
you need preempt_disable() around memcpy + flush, so down_read(mmap_sem)
should probably go into __copy_to_user_page(). And I am not sure this
helper needs "struct mm_struct *mm" argument, but this is minor.

Finally, let me repeat, you should verify that this
__copy_to_user_page(page, uaddr, kaddr) will not something bad if uaddr
is not mmapped, or its mapping do not match area->page.

Good luck ;)

Oleg.
David Miller April 15, 2014, 5:46 p.m. UTC | #3
From: Oleg Nesterov <oleg@redhat.com>
Date: Tue, 15 Apr 2014 19:43:30 +0200

> Finally, let me repeat, you should verify that this
> __copy_to_user_page(page, uaddr, kaddr) will not something bad if uaddr
> is not mmapped, or its mapping do not match area->page.

Just directly access userspace with the usual exception mechanism we
use for copy_to_user(), put_user(), et al. and if it faults you'll get
-EFAULT and handle it.

This also avoids the D-cache aliasing issues entirely as I explained
in my other reply.
Oleg Nesterov April 15, 2014, 5:49 p.m. UTC | #4
On 04/15, David Miller wrote:
>
> From: David Long <dave.long@linaro.org>
> Date: Tue, 15 Apr 2014 13:19:27 -0400
>
> > On 04/15/14 11:46, Oleg Nesterov wrote:
> >>
> >> But. Please do not add copy_to_user_page() into copy_to_page() (as your patch
> >> did). This is certainly not what uprobe_write_opcode() wants, we do not want
> >> or need "flush" in this case. The same for __create_xol_area().
> >>
> >
> > It looked me like a call to a new __copy_to_user_page(current->mm, ...) in xol_get_insn_slot()
> > would be in line with David Miller's suggestion and would cure the problem on ARM (and hopefuly
> > be more philosophically correct for all architectures):
>
> It occurs to me that because of the specific environment in which this
> executes we can make the interface take advantage of the invariants at
> this call site:
>
> 1) We are copying into userspace and thus current->mm
>
> 2) We are only storing an instruction or two

Not sure this can help, but also

  3) nobody else will even read/execute this peace of memory

> Furthermore, we can do the stores using something akin to put_user(),
> directly store them into userspace.
>
> This avoids completely any D-cache aliasing issues.  The kernel stores
> to the same address, and therefore the same cache lines, as userspace
> would.

Heh. Of course I thought about this. This can not work afaics.

We do not want to write to, say, page cache if the probed application
mmaps a file at the same vaddr.

Oleg.
David Miller April 15, 2014, 5:50 p.m. UTC | #5
From: Oleg Nesterov <oleg@redhat.com>
Date: Tue, 15 Apr 2014 19:49:00 +0200

> We do not want to write to, say, page cache if the probed application
> mmaps a file at the same vaddr.

If user has write access to that page... not our problem.

And if he doesn't we'll fault.
Oleg Nesterov April 15, 2014, 6:03 p.m. UTC | #6
On 04/15, David Miller wrote:
>
> From: Oleg Nesterov <oleg@redhat.com>
> Date: Tue, 15 Apr 2014 19:43:30 +0200
>
> > Finally, let me repeat, you should verify that this
> > __copy_to_user_page(page, uaddr, kaddr) will not something bad if uaddr
> > is not mmapped, or its mapping do not match area->page.
>
> Just directly access userspace with the usual exception mechanism we
> use for copy_to_user(), put_user(), et al. and if it faults you'll get
> -EFAULT and handle it.
>
> This also avoids the D-cache aliasing issues entirely as I explained
> in my other reply.

Yes, yes, this is obvious.

But I have no idea what else we should do to take care of icache.

Oleg.
Oleg Nesterov April 15, 2014, 6:07 p.m. UTC | #7
On 04/15, David Miller wrote:
>
> From: Oleg Nesterov <oleg@redhat.com>
> Date: Tue, 15 Apr 2014 19:49:00 +0200
>
> > We do not want to write to, say, page cache if the probed application
> > mmaps a file at the same vaddr.
>
> If user has write access to that page... not our problem.

Well, I am not sure.

Yes, this won't allow you to write the exploit. But if this actually
happens because an application is buggy, this can lead to really
hard-to-debug problems.

Oleg.
David Miller April 15, 2014, 6:27 p.m. UTC | #8
From: Oleg Nesterov <oleg@redhat.com>
Date: Tue, 15 Apr 2014 20:07:00 +0200

> On 04/15, David Miller wrote:
>>
>> From: Oleg Nesterov <oleg@redhat.com>
>> Date: Tue, 15 Apr 2014 19:49:00 +0200
>>
>> > We do not want to write to, say, page cache if the probed application
>> > mmaps a file at the same vaddr.
>>
>> If user has write access to that page... not our problem.
> 
> Well, I am not sure.
> 
> Yes, this won't allow you to write the exploit. But if this actually
> happens because an application is buggy, this can lead to really
> hard-to-debug problems.

And this is different from a stray userland scribble onto such a page
how?

This is all so much simpler if you don't insist on doing all of this
hand-holding for the user, that's not the kernel's job.
David Miller April 15, 2014, 6:30 p.m. UTC | #9
From: Oleg Nesterov <oleg@redhat.com>
Date: Tue, 15 Apr 2014 20:03:11 +0200

> On 04/15, David Miller wrote:
>>
>> From: Oleg Nesterov <oleg@redhat.com>
>> Date: Tue, 15 Apr 2014 19:43:30 +0200
>>
>> > Finally, let me repeat, you should verify that this
>> > __copy_to_user_page(page, uaddr, kaddr) will not something bad if uaddr
>> > is not mmapped, or its mapping do not match area->page.
>>
>> Just directly access userspace with the usual exception mechanism we
>> use for copy_to_user(), put_user(), et al. and if it faults you'll get
>> -EFAULT and handle it.
>>
>> This also avoids the D-cache aliasing issues entirely as I explained
>> in my other reply.
> 
> Yes, yes, this is obvious.
> 
> But I have no idea what else we should do to take care of icache.

Russell, if for example userland stores into the instruction stream
to patch an instruction, what needs to be done on ARM?
Oleg Nesterov April 15, 2014, 6:46 p.m. UTC | #10
On 04/15, David Miller wrote:
>
> From: Oleg Nesterov <oleg@redhat.com>
> Date: Tue, 15 Apr 2014 20:07:00 +0200
>
> > On 04/15, David Miller wrote:
> >>
> >> From: Oleg Nesterov <oleg@redhat.com>
> >> Date: Tue, 15 Apr 2014 19:49:00 +0200
> >>
> >> > We do not want to write to, say, page cache if the probed application
> >> > mmaps a file at the same vaddr.
> >>
> >> If user has write access to that page... not our problem.
> >
> > Well, I am not sure.
> >
> > Yes, this won't allow you to write the exploit. But if this actually
> > happens because an application is buggy, this can lead to really
> > hard-to-debug problems.
>
> And this is different from a stray userland scribble onto such a page
> how?

Sure. But I think that the kernel should not write to the file just because
a buggy application did the extra munmap() or passed a wrong addres to
mmap/munmap.

> This is all so much simpler if you don't insist on doing all of this
> hand-holding for the user, that's not the kernel's job.

Well, I do not agree, but perhaps because I do not really understand
what do you mean.

Oleg.
Russell King - ARM Linux April 15, 2014, 6:47 p.m. UTC | #11
On Tue, Apr 15, 2014 at 02:30:54PM -0400, David Miller wrote:
> Russell, if for example userland stores into the instruction stream
> to patch an instruction, what needs to be done on ARM?

Hi David,

I'm afraid I won't be too responsive this week, but I'll try to provide
input where I can.  So I'll try to cover all points in the previous
discussion in this reply.

I think your suggestion that we should be writing directly to userspace
from uprobes is definitely the correct way forward, as that nicely takes
any d-cache aliasing issues completely out of the picture.

However, unlike your "most braindead sparc" CPU, the i-cache doesn't
snoop d-cache stores at all.  However, this is something that we already
deal with since self-modifying code has to work, so (from userspace) we
have a syscall that is used to sort that out.  Internally in the kernel,
this translates to:

	ret = flush_cache_user_range(start, end)

This deals with whatever the CPU requires to be able to correctly execute
code which has been previously written in the range - and only actions on
the currently mapped userspace.

I hope this helps.
David Miller April 15, 2014, 6:50 p.m. UTC | #12
From: David Miller <davem@davemloft.net>
Date: Tue, 15 Apr 2014 14:30:54 -0400 (EDT)

> Russell, if for example userland stores into the instruction stream
> to patch an instruction, what needs to be done on ARM?

Looking around I suspect something like:

	mcrne	p15, 0, INSN_ADDR, c7, c5, 1

after the instruction stores will do it.
David Miller April 15, 2014, 6:53 p.m. UTC | #13
From: Russell King - ARM Linux <linux@arm.linux.org.uk>
Date: Tue, 15 Apr 2014 19:47:26 +0100

> However, unlike your "most braindead sparc" CPU, the i-cache doesn't
> snoop d-cache stores at all.

All that matters is remote snooping on SMP, not local snooping.

> However, this is something that we already deal with since
> self-modifying code has to work, so (from userspace) we have a
> syscall that is used to sort that out.  Internally in the kernel,
> this translates to:
> 
> 	ret = flush_cache_user_range(start, end)
> 
> This deals with whatever the CPU requires to be able to correctly execute
> code which has been previously written in the range - and only actions on
> the currently mapped userspace.

Looking around it seems the I-cache line mcr operation should do the
right thing for most chips.

You could simply make a new cpuc op for writing an instruction or two
to userspace and doing the I-cache line mcr op afterwards.
Russell King - ARM Linux April 15, 2014, 7:29 p.m. UTC | #14
On Tue, Apr 15, 2014 at 02:50:06PM -0400, David Miller wrote:
> From: David Miller <davem@davemloft.net>
> Date: Tue, 15 Apr 2014 14:30:54 -0400 (EDT)
> 
> > Russell, if for example userland stores into the instruction stream
> > to patch an instruction, what needs to be done on ARM?
> 
> Looking around I suspect something like:
> 
> 	mcrne	p15, 0, INSN_ADDR, c7, c5, 1
> 
> after the instruction stores will do it.

It does still need to be pushed out of the D-cache first though.  So,
for ARMv7 for example:

	str	NEW_INSN, [INSN_ADDR]		@ store new instruction
	mcr	p15, 0, INSN_ADDR, c7, c11, 1	@ clean d line
	mcr	p15, 0, INSN_ADDR, c7, c5, 1	@ flush i line

would do it.  We of course need the user access marking on that (so that
any fault doesn't oops the kernel) - not only for the store, but also the
following two instructions which could fault (and oops unless they're
marked with a fixup) if someone were to munmap() this page in another
thread.  All those fixups can just do the "lets return -EFAULT" from
the operation.
David Long April 15, 2014, 7:39 p.m. UTC | #15
On 04/15/14 13:43, Oleg Nesterov wrote:
> On 04/15, David Long wrote:
>>
>> On 04/15/14 11:46, Oleg Nesterov wrote:
>>>
>>> But. Please do not add copy_to_user_page() into copy_to_page() (as your patch
>>> did). This is certainly not what uprobe_write_opcode() wants, we do not want
>>> or need "flush" in this case. The same for __create_xol_area().
>>>
>>
>> It looked me like a call to a new __copy_to_user_page(current->mm, ...) in xol_get_insn_slot()
>> would be in line with David Miller's suggestion and would cure the problem on ARM (and hopefuly
>> be more philosophically correct for all architectures):
>
> David, let me repeat. I am not going to argue even if I obviously like
> the Linus's suggestion more.
>
> I only argued with the suggestions to follow the __access_remote_vm()
> logic.
>
>> @@ -1297,13 +1298,11 @@ static unsigned long xol_get_insn_slot(struct uprobe *uprobe)
>>                  return 0;
>>
>>          /* Initialize the slot */
>> -       copy_to_page(area->page, xol_vaddr,
>> -                       &uprobe->arch.ixol, sizeof(uprobe->arch.ixol));
>> -       /*
>> -        * We probably need flush_icache_user_range() but it needs vma.
>> -        * This should work on supported architectures too.
>> -        */
>> -       flush_dcache_page(area->page);
>> +       kaddr = kmap_atomic(area->page);
>> +       __copy_to_user_page(current->mm, area->page, xol_vaddr,
>> +                       kaddr + (xol_vaddr & ~PAGE_MASK),
>> +                       &uprobe->arch.ixol, sizeof(uprobe->arch.ixol), true);
>> +       kunmap_atomic(kaddr);
>
> This all is fine, but you need to introduce __copy_to_user_page() first.
>

Yes, I have that coded for 14 out of 24 architectures (the easy ones). 
The remaining ones need more work.  Some may prove problematic.  The 
whole approach could yet prove impractical.  More recent suggested 
approaches may be better too (or Linus's relatively simple change).

> And you need to do this for every arch. And you need to verify that it
> does not need mmap_sem. It seems that at least alpha needs this lock.
> And as Russel pointed out (again, again, I could easily misunderstood him)
> you need preempt_disable() around memcpy + flush, so down_read(mmap_sem)
> should probably go into __copy_to_user_page(). And I am not sure this
> helper needs "struct mm_struct *mm" argument, but this is minor.
>

I will look at the locking.  I'm not sure yet if anything needs struct 
mm_struct either.

> Finally, let me repeat, you should verify that this
> __copy_to_user_page(page, uaddr, kaddr) will not something bad if uaddr
> is not mmapped, or its mapping do not match area->page.
>
> Good luck ;)
>
> Oleg.
>
David Miller April 15, 2014, 7:51 p.m. UTC | #16
From: Russell King - ARM Linux <linux@arm.linux.org.uk>
Date: Tue, 15 Apr 2014 20:29:15 +0100

> On Tue, Apr 15, 2014 at 02:50:06PM -0400, David Miller wrote:
>> From: David Miller <davem@davemloft.net>
>> Date: Tue, 15 Apr 2014 14:30:54 -0400 (EDT)
>> 
>> > Russell, if for example userland stores into the instruction stream
>> > to patch an instruction, what needs to be done on ARM?
>> 
>> Looking around I suspect something like:
>> 
>> 	mcrne	p15, 0, INSN_ADDR, c7, c5, 1
>> 
>> after the instruction stores will do it.
> 
> It does still need to be pushed out of the D-cache first though.  So,
> for ARMv7 for example:
> 
> 	str	NEW_INSN, [INSN_ADDR]		@ store new instruction
> 	mcr	p15, 0, INSN_ADDR, c7, c11, 1	@ clean d line
> 	mcr	p15, 0, INSN_ADDR, c7, c5, 1	@ flush i line
> 
> would do it.  We of course need the user access marking on that (so that
> any fault doesn't oops the kernel) - not only for the store, but also the
> following two instructions which could fault (and oops unless they're
> marked with a fixup) if someone were to munmap() this page in another
> thread.  All those fixups can just do the "lets return -EFAULT" from
> the operation.

Right, you'd need the exception table business used in put_user() et al.
David Miller April 15, 2014, 7:53 p.m. UTC | #17
From: David Long <dave.long@linaro.org>
Date: Tue, 15 Apr 2014 15:39:18 -0400

> Yes, I have that coded for 14 out of 24 architectures (the easy
> ones). The remaining ones need more work.  Some may prove problematic.
> The whole approach could yet prove impractical.  More recent suggested
> approaches may be better too (or Linus's relatively simple change).

BTW, another reason perhaps to write directly to userspace and have
a uprobes_*() specific interface is that you'll only need to do
3 architectures, the ones that support uprobes.

That way, this is just part of what every arch needs to implement in
order to support uprobes.
diff mbox

Patch

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 04709b6..b418626 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1287,6 +1287,7 @@  static unsigned long xol_get_insn_slot(struct uprobe *uprobe)
 {
        struct xol_area *area;
        unsigned long xol_vaddr;
+       void *kaddr;
 
        area = get_xol_area();
        if (!area)
@@ -1297,13 +1298,11 @@  static unsigned long xol_get_insn_slot(struct uprobe *uprobe)
                return 0;
 
        /* Initialize the slot */
-       copy_to_page(area->page, xol_vaddr,
-                       &uprobe->arch.ixol, sizeof(uprobe->arch.ixol));
-       /*
-        * We probably need flush_icache_user_range() but it needs vma.
-        * This should work on supported architectures too.
-        */
-       flush_dcache_page(area->page);
+       kaddr = kmap_atomic(area->page);
+       __copy_to_user_page(current->mm, area->page, xol_vaddr,
+                       kaddr + (xol_vaddr & ~PAGE_MASK),
+                       &uprobe->arch.ixol, sizeof(uprobe->arch.ixol), true);
+       kunmap_atomic(kaddr);
 
        return xol_vaddr;
 }