[v15,04/10] arm64: Kprobes with single stepping support

Message ID 20160728144053.GA26510@e104818-lin.cambridge.arm.com
State New

Commit Message

Catalin Marinas July 28, 2016, 2:40 p.m. UTC
On Wed, Jul 27, 2016 at 06:13:37PM -0400, David Long wrote:
> On 07/27/2016 07:50 AM, Daniel Thompson wrote:
> >On 25/07/16 23:27, David Long wrote:
> >>On 07/25/2016 01:13 PM, Catalin Marinas wrote:
> >>>The problem is that the original design was done on x86 for its PCS and
> >>>it doesn't always fit other architectures. So we could either ignore the
> >>>problem, hoping that no probed function requires argument passing on
> >>>stack or we copy all the valid data on the kernel stack:
> >>>
> >>>diff --git a/arch/arm64/include/asm/kprobes.h
> >>>b/arch/arm64/include/asm/kprobes.h
> >>>index 61b49150dfa3..157fd0d0aa08 100644
> >>>--- a/arch/arm64/include/asm/kprobes.h
> >>>+++ b/arch/arm64/include/asm/kprobes.h
> >>>@@ -22,7 +22,7 @@
> >>>
> >>>  #define __ARCH_WANT_KPROBES_INSN_SLOT
> >>>  #define MAX_INSN_SIZE            1
> >>>-#define MAX_STACK_SIZE            128
> >>>+#define MAX_STACK_SIZE            THREAD_SIZE
> >>>
> >>>  #define flush_insn_slot(p)        do { } while (0)
> >>>  #define kretprobe_blacklist_size    0
> >>
> >>I doubt the ARM PCS is unusual.  At any rate I'm certain there are other
> >>architectures that pass aggregate parameters on the stack. I suspect
> >>other RISC(-ish) architectures have similar PCS issues and I think this
> >>is at least a big part of where this simple copy with a 64/128 limit
> >>comes from, or at least why it continues to exist.  That said, I'm not
> >>enthusiastic about researching that assertion in detail as it could be
> >>time consuming.
> >
> >Given Mark shared a test program I *was* curious enough to take a look
> >at this.
> >
> >The only architecture I can find that behaves like arm64 with the
> >implicit pass-by-reference described by Catalin/Mark is sparc64.
> >
> >In contrast alpha, arm (32-bit), hppa64, mips64 and powerpc64 all use a
> >hybrid approach where the first fragments of the structure are passed in
> >registers and the remainder on the stack.
>
> That's interesting.  It also looks like sparc64 does not copy any stack for
> jprobes. I guess that approach at least makes it clear what will and won't
> work.


I suggest we do the same for arm64 - avoid the copying entirely as it's
not safe anyway. We don't know how much to copy, nor can we be sure it
is safe (see Dave's DMA to the stack example). This would need to be
documented in the kprobes.txt file and MAX_STACK_SIZE removed from the
arm64 kprobes support.

There is also the case that Daniel was talking about - passing more than
8 arguments. I don't think it's worth handling this but we should at
least add a warning and skip the probe:


Unfortunately, we don't really have a way to detect large composite
types passed as arguments, so we can only rely on the documentation.

Can you please submit a patch that removes MAX_STACK_SIZE for arm64,
documents it and includes the above hunk (once tested that it actually
does what it intends to)?

Thanks.

-- 
Catalin

Comments

Daniel Thompson July 29, 2016, 9:01 a.m. UTC | #1
On 28/07/16 15:40, Catalin Marinas wrote:
> On Wed, Jul 27, 2016 at 06:13:37PM -0400, David Long wrote:
>> On 07/27/2016 07:50 AM, Daniel Thompson wrote:
>>> On 25/07/16 23:27, David Long wrote:
>>>> On 07/25/2016 01:13 PM, Catalin Marinas wrote:
>>>>> The problem is that the original design was done on x86 for its PCS and
>>>>> it doesn't always fit other architectures. So we could either ignore the
>>>>> problem, hoping that no probed function requires argument passing on
>>>>> stack or we copy all the valid data on the kernel stack:
>>>>>
>>>>> diff --git a/arch/arm64/include/asm/kprobes.h
>>>>> b/arch/arm64/include/asm/kprobes.h
>>>>> index 61b49150dfa3..157fd0d0aa08 100644
>>>>> --- a/arch/arm64/include/asm/kprobes.h
>>>>> +++ b/arch/arm64/include/asm/kprobes.h
>>>>> @@ -22,7 +22,7 @@
>>>>>
>>>>>  #define __ARCH_WANT_KPROBES_INSN_SLOT
>>>>>  #define MAX_INSN_SIZE            1
>>>>> -#define MAX_STACK_SIZE            128
>>>>> +#define MAX_STACK_SIZE            THREAD_SIZE
>>>>>
>>>>>  #define flush_insn_slot(p)        do { } while (0)
>>>>>  #define kretprobe_blacklist_size    0
>>>>
>>>> I doubt the ARM PCS is unusual.  At any rate I'm certain there are other
>>>> architectures that pass aggregate parameters on the stack. I suspect
>>>> other RISC(-ish) architectures have similar PCS issues and I think this
>>>> is at least a big part of where this simple copy with a 64/128 limit
>>>> comes from, or at least why it continues to exist.  That said, I'm not
>>>> enthusiastic about researching that assertion in detail as it could be
>>>> time consuming.
>>>
>>> Given Mark shared a test program I *was* curious enough to take a look
>>> at this.
>>>
>>> The only architecture I can find that behaves like arm64 with the
>>> implicit pass-by-reference described by Catalin/Mark is sparc64.
>>>
>>> In contrast alpha, arm (32-bit), hppa64, mips64 and powerpc64 all use a
>>> hybrid approach where the first fragments of the structure are passed in
>>> registers and the remainder on the stack.
>>
>> That's interesting.  It also looks like sparc64 does not copy any stack for
>> jprobes. I guess that approach at least makes it clear what will and won't
>> work.
>
> I suggest we do the same for arm64 - avoid the copying entirely as it's
> not safe anyway. We don't know how much to copy, nor can we be sure it
> is safe (see Dave's DMA to the stack example). This would need to be
> documented in the kprobes.txt file and MAX_STACK_SIZE removed from the
> arm64 kprobes support.
>
> There is also the case that Daniel was talking about - passing more than
> 8 arguments. I don't think it's worth handling this


It's actually quite hard to document the (architecture specific) "no big
structures" *and* the "8 argument" limits. It ends up as something like:

   Structures/unions >16 bytes must not be passed by value and the
   size of all arguments, after padding each to an 8 byte boundary, must
   be less than 64 bytes.

We cannot avoid tackling big structures through documentation but when 
we impose additional limits like "only 8 arguments" we are swapping an 
architecture neutral "gotcha" that affects almost all jprobes uses (and 
can be inferred from the documentation) with an architecture specific one!
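
As a rough illustration of the two limits in the draft wording above, the check can be sketched in plain C. This is not kernel code: jprobe_args_fit_in_regs() and the two constants are hypothetical names. The sketch only models arguments that travel in the eight general-purpose argument registers as described in this thread; it ignores floating-point arguments and the alignment of 16-byte values to register pairs. It also assumes that exactly 64 bytes is still acceptable, since eight 8-byte registers can hold that much.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical limits taken from the wording discussed in this thread. */
#define JPROBE_MAX_ARG_BYTES	16	/* larger composites go by reference */
#define JPROBE_MAX_TOTAL_BYTES	64	/* eight 8-byte argument registers */

/* Round an argument size up to the next 8-byte boundary. */
static size_t pad8(size_t n)
{
	return (n + 7) & ~(size_t)7;
}

/*
 * Return true if every argument is at most 16 bytes and the padded
 * sizes sum to no more than 64 bytes, i.e. nothing should need to be
 * passed on the stack under the simplified model described above.
 */
static bool jprobe_args_fit_in_regs(const size_t *arg_sizes, size_t nargs)
{
	size_t total = 0;
	size_t i;

	assert(arg_sizes || nargs == 0);

	for (i = 0; i < nargs; i++) {
		if (arg_sizes[i] > JPROBE_MAX_ARG_BYTES)
			return false;
		total += pad8(arg_sizes[i]);
	}

	return total <= JPROBE_MAX_TOTAL_BYTES;
}
```

Under these assumptions a function taking eight longs passes the check, a function taking nine does not, and neither does a function taking a single 24-byte structure by value.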


> but we should at
> least add a warning and skip the probe:
>
> diff --git a/arch/arm64/kernel/probes/kprobes.c b/arch/arm64/kernel/probes/kprobes.c
> index bf9768588288..84e02606ec3d 100644
> --- a/arch/arm64/kernel/probes/kprobes.c
> +++ b/arch/arm64/kernel/probes/kprobes.c
> @@ -491,6 +491,10 @@ int __kprobes setjmp_pre_handler(struct kprobe *p, struct pt_regs *regs)
>  	struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
>  	long stack_ptr = kernel_stack_pointer(regs);
>
> +	/* do not allow arguments passed on the stack */
> +	if (WARN_ON_ONCE(regs->sp != regs->regs[29]))
> +		return 0;
> +


I don't really understand this test.

If we could reliably assume that the frame record was at the lowest 
address within a stack frame then we could exploit that to store the 
stacked arguments without risking overwriting volatile variables on the 
stack.
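
For reference, this is what the quoted test compares, written against a simplified stand-in for struct pt_regs. The struct and helper names are made up for illustration. The interpretation in the comments is only one reading of the intent: it holds only if the caller's frame record sits at the lowest address of its frame, which is exactly the assumption questioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for arm64 struct pt_regs: only the fields used. */
struct mock_pt_regs {
	uint64_t regs[31];	/* x0..x30; x29 is the frame pointer */
	uint64_t sp;		/* stack pointer at the probe point */
};

/*
 * Mirror of the condition in the hunk: at function entry, before the
 * callee's prologue has run, SP still belongs to the caller. If the
 * caller keeps its frame record at the bottom of its frame and has
 * placed nothing below it, SP equals x29. Any difference is taken
 * (under that assumption) as a hint that outgoing arguments were
 * stored on the stack.
 */
static bool sp_matches_frame_pointer(const struct mock_pt_regs *regs)
{
	assert(regs);
	return regs->sp == regs->regs[29];
}
```

If the compiler places locals or outgoing arguments below the frame record for other reasons, this comparison would report a mismatch even when no argument is passed on the stack, so it should be read as a heuristic rather than a reliable detector.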


Daniel.
Daniel Thompson Aug. 8, 2016, 11:13 a.m. UTC | #2
On 04/08/16 05:47, David Long wrote:
> From b451caa1adaf1d03e08a44b5dad3fca31cebd97a Mon Sep 17 00:00:00 2001
> From: "David A. Long" <dave.long@linaro.org>
> Date: Thu, 4 Aug 2016 00:35:33 -0400
> Subject: [PATCH] arm64: Remove stack duplicating code from jprobes
>
> Because the arm64 calling standard allows stacked function arguments to be
> anywhere in the stack frame, do not attempt to duplicate the stack frame for
> jprobes handler functions.
>
> Signed-off-by: David A. Long <dave.long@linaro.org>
> ---
>  Documentation/kprobes.txt          |  7 +++++++
>  arch/arm64/include/asm/kprobes.h   |  2 --
>  arch/arm64/kernel/probes/kprobes.c | 31 +++++--------------------------
>  3 files changed, 12 insertions(+), 28 deletions(-)
>
> diff --git a/Documentation/kprobes.txt b/Documentation/kprobes.txt
> index 1f9b3e2..bd01839 100644
> --- a/Documentation/kprobes.txt
> +++ b/Documentation/kprobes.txt
> @@ -103,6 +103,13 @@ Note that the probed function's args may be passed on the stack
>  or in registers.  The jprobe will work in either case, so long as the
>  handler's prototype matches that of the probed function.
>
> +Note that in some architectures (e.g.: arm64) the stack copy is not


Could sparc64 be added to this list?

   For the sparc folks who are new to the thread, we've previously
   established that the sparc64 ABI passes large structures by
   allocating them from the caller's stack frame and passing a pointer
   to the stack frame (i.e. arguments may not be at top of the stack).
   We also noticed that sparc code does not save/restore anything from
   the stack.


> +done, as the actual location of stacked parameters may be outside of
> +a reasonable MAX_STACK_SIZE value and because that location cannot be
> +determined by the jprobes code. In this case the jprobes user must be
> +careful to make certain the calling signature of the function does
> +not cause parameters to be passed on the stack.
> +
>  1.3 Return Probes
>
>  1.3.1 How Does a Return Probe Work?
> diff --git a/arch/arm64/include/asm/kprobes.h b/arch/arm64/include/asm/kprobes.h
> index 61b4915..1737aec 100644
> --- a/arch/arm64/include/asm/kprobes.h
> +++ b/arch/arm64/include/asm/kprobes.h
> @@ -22,7 +22,6 @@
>
>  #define __ARCH_WANT_KPROBES_INSN_SLOT
>  #define MAX_INSN_SIZE			1
> -#define MAX_STACK_SIZE			128
>
>  #define flush_insn_slot(p)		do { } while (0)
>  #define kretprobe_blacklist_size	0
> @@ -47,7 +46,6 @@ struct kprobe_ctlblk {
>  	struct prev_kprobe prev_kprobe;
>  	struct kprobe_step_ctx ss_ctx;
>  	struct pt_regs jprobe_saved_regs;
> -	char jprobes_stack[MAX_STACK_SIZE];
>  };
>
>  void arch_remove_kprobe(struct kprobe *);
> diff --git a/arch/arm64/kernel/probes/kprobes.c b/arch/arm64/kernel/probes/kprobes.c
> index bf97685..c6b0f40 100644
> --- a/arch/arm64/kernel/probes/kprobes.c
> +++ b/arch/arm64/kernel/probes/kprobes.c
> @@ -41,18 +41,6 @@ DEFINE_PER_CPU(struct kprobe_ctlblk, kprobe_ctlblk);
>  static void __kprobes
>  post_kprobe_handler(struct kprobe_ctlblk *, struct pt_regs *);
>
> -static inline unsigned long min_stack_size(unsigned long addr)
> -{
> -	unsigned long size;
> -
> -	if (on_irq_stack(addr, raw_smp_processor_id()))
> -		size = IRQ_STACK_PTR(raw_smp_processor_id()) - addr;
> -	else
> -		size = (unsigned long)current_thread_info() + THREAD_START_SP - addr;
> -
> -	return min(size, FIELD_SIZEOF(struct kprobe_ctlblk, jprobes_stack));
> -}
> -
>  static void __kprobes arch_prepare_ss_slot(struct kprobe *p)
>  {
>  	/* prepare insn slot */
> @@ -489,20 +477,15 @@ int __kprobes setjmp_pre_handler(struct kprobe *p, struct pt_regs *regs)
>  {
>  	struct jprobe *jp = container_of(p, struct jprobe, kp);
>  	struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
> -	long stack_ptr = kernel_stack_pointer(regs);
>
>  	kcb->jprobe_saved_regs = *regs;
>  	/*
> -	 * As Linus pointed out, gcc assumes that the callee
> -	 * owns the argument space and could overwrite it, e.g.
> -	 * tailcall optimization. So, to be absolutely safe
> -	 * we also save and restore enough stack bytes to cover
> -	 * the argument area.
> +	 * Since we can't be sure where in the stack frame "stacked"
> +	 * pass-by-value arguments are stored we just don't try to
> +	 * duplicate any of the stack.
> ...
>                                       Do not use jprobes on functions that
> +	 * use more than 64 bytes (after padding each to an 8 byte boundary)
> +	 * of arguments, or pass individual arguments larger than 16 bytes.


I like this wording. So much so that it really would be great to repeat 
this in the Documentation/. Could this be included in the list of 
architecture support/restrictions?


Daniel.
David Long Aug. 8, 2016, 2:29 p.m. UTC | #3
On 08/08/2016 07:13 AM, Daniel Thompson wrote:
> On 04/08/16 05:47, David Long wrote:
>> From b451caa1adaf1d03e08a44b5dad3fca31cebd97a Mon Sep 17 00:00:00 2001
>> From: "David A. Long" <dave.long@linaro.org>
>> Date: Thu, 4 Aug 2016 00:35:33 -0400
>> Subject: [PATCH] arm64: Remove stack duplicating code from jprobes
>>
>> Because the arm64 calling standard allows stacked function arguments
>> to be
>> anywhere in the stack frame, do not attempt to duplicate the stack
>> frame for
>> jprobes handler functions.
>>
>> Signed-off-by: David A. Long <dave.long@linaro.org>
>> ---
>>  Documentation/kprobes.txt          |  7 +++++++
>>  arch/arm64/include/asm/kprobes.h   |  2 --
>>  arch/arm64/kernel/probes/kprobes.c | 31 +++++--------------------------
>>  3 files changed, 12 insertions(+), 28 deletions(-)
>>
>> diff --git a/Documentation/kprobes.txt b/Documentation/kprobes.txt
>> index 1f9b3e2..bd01839 100644
>> --- a/Documentation/kprobes.txt
>> +++ b/Documentation/kprobes.txt
>> @@ -103,6 +103,13 @@ Note that the probed function's args may be
>> passed on the stack
>>  or in registers.  The jprobe will work in either case, so long as the
>>  handler's prototype matches that of the probed function.
>>
>> +Note that in some architectures (e.g.: arm64) the stack copy is not
>
> Could sparc64 be added to this list?
>
>    For the sparc folks who are new to the thread, we've previously
>    established that the sparc64 ABI passes large structures by
>    allocating them from the caller's stack frame and passing a pointer
>    to the stack frame (i.e. arguments may not be at top of the stack).
>    We also noticed that sparc code does not save/restore anything from
>    the stack.
>


I was reluctant to do that in the context of late changes to v4.8 for 
arm64 but now that any changes for this are going in as a new patch it 
would indeed be useful to get involvement from sparc maintainers.

>
>> +done, as the actual location of stacked parameters may be outside of
>> +a reasonable MAX_STACK_SIZE value and because that location cannot be
>> +determined by the jprobes code. In this case the jprobes user must be
>> +careful to make certain the calling signature of the function does
>> +not cause parameters to be passed on the stack.
>> +
>>  1.3 Return Probes
>>
>>  1.3.1 How Does a Return Probe Work?
>> diff --git a/arch/arm64/include/asm/kprobes.h
>> b/arch/arm64/include/asm/kprobes.h
>> index 61b4915..1737aec 100644
>> --- a/arch/arm64/include/asm/kprobes.h
>> +++ b/arch/arm64/include/asm/kprobes.h
>> @@ -22,7 +22,6 @@
>>
>>  #define __ARCH_WANT_KPROBES_INSN_SLOT
>>  #define MAX_INSN_SIZE            1
>> -#define MAX_STACK_SIZE            128
>>
>>  #define flush_insn_slot(p)        do { } while (0)
>>  #define kretprobe_blacklist_size    0
>> @@ -47,7 +46,6 @@ struct kprobe_ctlblk {
>>      struct prev_kprobe prev_kprobe;
>>      struct kprobe_step_ctx ss_ctx;
>>      struct pt_regs jprobe_saved_regs;
>> -    char jprobes_stack[MAX_STACK_SIZE];
>>  };
>>
>>  void arch_remove_kprobe(struct kprobe *);
>> diff --git a/arch/arm64/kernel/probes/kprobes.c
>> b/arch/arm64/kernel/probes/kprobes.c
>> index bf97685..c6b0f40 100644
>> --- a/arch/arm64/kernel/probes/kprobes.c
>> +++ b/arch/arm64/kernel/probes/kprobes.c
>> @@ -41,18 +41,6 @@ DEFINE_PER_CPU(struct kprobe_ctlblk, kprobe_ctlblk);
>>  static void __kprobes
>>  post_kprobe_handler(struct kprobe_ctlblk *, struct pt_regs *);
>>
>> -static inline unsigned long min_stack_size(unsigned long addr)
>> -{
>> -    unsigned long size;
>> -
>> -    if (on_irq_stack(addr, raw_smp_processor_id()))
>> -        size = IRQ_STACK_PTR(raw_smp_processor_id()) - addr;
>> -    else
>> -        size = (unsigned long)current_thread_info() + THREAD_START_SP
>> - addr;
>> -
>> -    return min(size, FIELD_SIZEOF(struct kprobe_ctlblk, jprobes_stack));
>> -}
>> -
>>  static void __kprobes arch_prepare_ss_slot(struct kprobe *p)
>>  {
>>      /* prepare insn slot */
>> @@ -489,20 +477,15 @@ int __kprobes setjmp_pre_handler(struct kprobe
>> *p, struct pt_regs *regs)
>>  {
>>      struct jprobe *jp = container_of(p, struct jprobe, kp);
>>      struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
>> -    long stack_ptr = kernel_stack_pointer(regs);
>>
>>      kcb->jprobe_saved_regs = *regs;
>>      /*
>> -     * As Linus pointed out, gcc assumes that the callee
>> -     * owns the argument space and could overwrite it, e.g.
>> -     * tailcall optimization. So, to be absolutely safe
>> -     * we also save and restore enough stack bytes to cover
>> -     * the argument area.
>> +     * Since we can't be sure where in the stack frame "stacked"
>> +     * pass-by-value arguments are stored we just don't try to
>> +     * duplicate any of the stack.
>  > ...
>>                                       Do not use jprobes on functions
>> that
>> +     * use more than 64 bytes (after padding each to an 8 byte boundary)
>> +     * of arguments, or pass individual arguments larger than 16 bytes.
>
> I like this wording. So much so that it really would be great to repeat
> this in the Documentation/. Could this be included in the list of
> architecture support/restrictions?
>


Are you thinking specifically of the "5. Kprobes Features and 
Limitations" section in Documentation/kprobes.txt?

>
> Daniel.
>


Thanks,
-dl
Catalin Marinas Aug. 9, 2016, 5:23 p.m. UTC | #4
On Mon, Aug 08, 2016 at 10:29:05AM -0400, David Long wrote:
> On 08/08/2016 07:13 AM, Daniel Thompson wrote:
> >On 04/08/16 05:47, David Long wrote:
> >>From b451caa1adaf1d03e08a44b5dad3fca31cebd97a Mon Sep 17 00:00:00 2001
> >>From: "David A. Long" <dave.long@linaro.org>
> >>Date: Thu, 4 Aug 2016 00:35:33 -0400
> >>Subject: [PATCH] arm64: Remove stack duplicating code from jprobes
> >>
> >>Because the arm64 calling standard allows stacked function arguments
> >>to be
> >>anywhere in the stack frame, do not attempt to duplicate the stack
> >>frame for
> >>jprobes handler functions.
> >>
> >>Signed-off-by: David A. Long <dave.long@linaro.org>
> >>---
> >> Documentation/kprobes.txt          |  7 +++++++
> >> arch/arm64/include/asm/kprobes.h   |  2 --
> >> arch/arm64/kernel/probes/kprobes.c | 31 +++++--------------------------
> >> 3 files changed, 12 insertions(+), 28 deletions(-)
> >>
> >>diff --git a/Documentation/kprobes.txt b/Documentation/kprobes.txt
> >>index 1f9b3e2..bd01839 100644
> >>--- a/Documentation/kprobes.txt
> >>+++ b/Documentation/kprobes.txt
> >>@@ -103,6 +103,13 @@ Note that the probed function's args may be
> >>passed on the stack
> >> or in registers.  The jprobe will work in either case, so long as the
> >> handler's prototype matches that of the probed function.
> >>
> >>+Note that in some architectures (e.g.: arm64) the stack copy is not
> >
> >Could sparc64 be added to this list?
> >
> >   For the sparc folks who are new to the thread, we've previously
> >   established that the sparc64 ABI passes large structures by
> >   allocating them from the caller's stack frame and passing a pointer
> >   to the stack frame (i.e. arguments may not be at top of the stack).
> >   We also noticed that sparc code does not save/restore anything from
> >   the stack.
>
> I was reluctant to do that in the context of late changes to v4.8 for arm64
> but now that any changes for this are going in as a new patch it would
> indeed be useful to get involvement from sparc maintainers.


I'm happy to take the arm64 patch for 4.8 as it's mainly a clean-up.
Whether you can mention sparc64 as well, it depends on the sparc
maintainers. You can either cc them or send the series as two patches,
one for documentation and the other for arm64.

-- 
Catalin
Patch

diff --git a/arch/arm64/kernel/probes/kprobes.c b/arch/arm64/kernel/probes/kprobes.c
index bf9768588288..84e02606ec3d 100644
--- a/arch/arm64/kernel/probes/kprobes.c
+++ b/arch/arm64/kernel/probes/kprobes.c
@@ -491,6 +491,10 @@  int __kprobes setjmp_pre_handler(struct kprobe *p, struct pt_regs *regs)
 	struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
 	long stack_ptr = kernel_stack_pointer(regs);
 
+	/* do not allow arguments passed on the stack */
+	if (WARN_ON_ONCE(regs->sp != regs->regs[29]))
+		return 0;
+
 	kcb->jprobe_saved_regs = *regs;
 	/*
 	 * As Linus pointed out, gcc assumes that the callee