Message ID | 1436765375-7119-3-git-send-email-takahiro.akashi@linaro.org |
---|---|
State | New |
Headers | show |
On 07/14/2015 10:31 PM, Steven Rostedt wrote: > On Tue, 14 Jul 2015 21:47:10 +0900 > Jungseok Lee <jungseoklee85@gmail.com> wrote: > >> Is the below example an unexpected result? >> Entry 17 and 18 are ftrace_call and ftrace_ops_no_ops, respectively. [snip] > Note, function tracing does not disable interrupts. This looks to be > that an interrupt came in while __aloc_skb() was being traced. Yeah, I think so, too. But if my insight is correct, it's not __alloc_skb() but one of functions that it calls. As I said in the commit log message of patch[1/3], the exact traced function will not be listed by save_stack_trace() because we don't create a stack frame at mcount(). I think this is a flaw in the current implementation (on x86). what do you think, Steve? -Takahiro AKASHI > > -- Steve > >> 17) 4080 168 ftrace_ops_no_ops+0xb4/0x16c >> 18) 3912 32 ftrace_call+0x0/0x4 >> 19) 3880 144 __alloc_skb+0x48/0x180 >> 20) 3736 96 alloc_skb_with_frags+0x74/0x234 >> 21) 3640 112 sock_alloc_send_pskb+0x1d0/0x294 >> 22) 3528 160 sock_alloc_send_skb+0x44/0x54 >> 23) 3368 64 __ip_append_data.isra.40+0x78c/0xb48 >> 24) 3304 224 ip_append_data.part.42+0x98/0xe8 >> 25) 3080 112 ip_append_data+0x68/0x7c >> 26) 2968 96 icmp_push_reply+0x7c/0x144 >> 27) 2872 96 icmp_send+0x3c0/0x3c8 >> 28) 2776 192 __udp4_lib_rcv+0x5b8/0x684 >> 29) 2584 96 udp_rcv+0x2c/0x3c >> 30) 2488 32 ip_local_deliver+0xa0/0x224 >> 31) 2456 48 ip_rcv+0x360/0x57c >> 32) 2408 64 __netif_receive_skb_core+0x4d0/0x80c >> 33) 2344 128 __netif_receive_skb+0x24/0x84 >> 34) 2216 32 process_backlog+0x9c/0x15c >> 35) 2184 80 net_rx_action+0x1ec/0x32c >> 36) 2104 160 __do_softirq+0x114/0x2f0 >> 37) 1944 128 do_softirq+0x60/0x68 >> 38) 1816 32 __local_bh_enable_ip+0xb0/0xd4 >> 39) 1784 32 ip_finish_output+0x1f4/0xabc >> 40) 1752 96 ip_output+0xf0/0x120 >> 41) 1656 64 ip_local_out_sk+0x44/0x54 >> 42) 1592 32 ip_send_skb+0x24/0xbc >> 43) 1560 48 udp_send_skb+0x1b4/0x2f4 >> 44) 1512 80 udp_sendmsg+0x2a8/0x7a0 >> 45) 1432 272 inet_sendmsg+0xa0/0xd0 >> 46) 1160 48 sock_sendmsg+0x30/0x78 >> 47) 1112 32 ___sys_sendmsg+0x15c/0x26c >> 48) 1080 400 __sys_sendmmsg+0x94/0x180 >> 49) 680 320 SyS_sendmmsg+0x38/0x54 >> 50) 360 360 el0_svc_naked+0x20/0x28 >> >> Best Regards >> Jungseok Lee > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Steve, On 07/15/2015 11:51 AM, Steven Rostedt wrote: > On Wed, 15 Jul 2015 09:20:42 +0900 > AKASHI Takahiro <takahiro.akashi@linaro.org> wrote: > >> On 07/14/2015 10:31 PM, Steven Rostedt wrote: >>> On Tue, 14 Jul 2015 21:47:10 +0900 >>> Jungseok Lee <jungseoklee85@gmail.com> wrote: >>> >>>> Is the below example an unexpected result? >>>> Entry 17 and 18 are ftrace_call and ftrace_ops_no_ops, respectively. >> >> [snip] >> >>> Note, function tracing does not disable interrupts. This looks to be >>> that an interrupt came in while __aloc_skb() was being traced. >> >> Yeah, I think so, too. But if my insight is correct, it's not __alloc_skb() >> but one of functions that it calls. As I said in the commit log message >> of patch[1/3], the exact traced function will not be listed by not patch[1/3], but patch[3/3] >> save_stack_trace() because we don't create a stack frame at mcount(). >> I think this is a flaw in the current implementation (on x86). >> >> what do you think, Steve? >> > > mcount (well ftrace_call actually) does indeed create a stack frame for > itself *and* for what called it. At least on x86_64. See mcount_64.S. > > With -pg -mfentry, it creates a stack frame. Without -mfentry, mcount > is called after the current function's frame is made so we don't need > to do much. Thank you for the explanation. But what I don't really understand here is why we need to add the "current function" to the stack dump list returned by save_stack_trace(): In check_stack(), > /* > * Add the passed in ip from the function tracer. > * Searching for this on the stack will skip over > * most of the overhead from the stack tracer itself. > */ > stack_dump_trace[0] = ip; > max_stack_trace.nr_entries++; I think that "ip" here is the "return address for func" in your ascii art, and it should be already in the list if a frame is made by mcount (or func_call). In fact, stack tracer on arm64 works OK even without the patch[3/3] if the code quoted above is commented out. Even on x86, the code is conditional and not activated if the kernel is compiled without -mfentry before the following commit: commit 4df297129f62 ("tracing: Remove most or all of stack tracer stack size from stack_max_size") So what do I misunderstand here? Thanks, -Takahiro AKASHI > Here's what the -mfentry version does: > > pushq %rbp > pushq 8*2(%rsp) /* this is the parent pointer */ > pushq %rbp > movq %rsp, %rbp > pushq 8*3(%rsp) /* Return address to ftrace_call */ > pushq %rbp > movq %rsp, %rbp > > > Thus the stack looks like this: > > <---+ > | | | > +------------------------------+ | > | return address for func | | > | return address for func_call | | > | original %rbp | | > +------------------------------+ | > | return address for func | | > | ptr to parent frame (%rbp) | ----+ > +------------------------------| <-----+ > | return address for func_call | | > | ptr to next frame (%rbp) | ------+ > +------------------------------+ <---+ > | > | > Current %rbp points to func_call frame -----+ > > The first box isn't used as a frame, but is used by ftrace_call to save > information to restore everything properly. > > Thus, __alloc_skb() is what is currently being traced. > > > -- Steve > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
On 07/16/2015 01:13 AM, Steven Rostedt wrote: > On Wed, 15 Jul 2015 10:55:36 -0400 > Steven Rostedt <rostedt@goodmis.org> wrote: > > >> I'll take a look at it and try to clean up the code. > > Does the following patch make sense for you? Looks nice. The patch greatly simplifies changes on arm64 side. - Takahiro AKASHI > -- Steve > > diff --git a/kernel/trace/trace_stack.c b/kernel/trace/trace_stack.c > index 3f34496244e9..9384647d07c3 100644 > --- a/kernel/trace/trace_stack.c > +++ b/kernel/trace/trace_stack.c > @@ -18,12 +18,6 @@ > > #define STACK_TRACE_ENTRIES 500 > > -#ifdef CC_USING_FENTRY > -# define fentry 1 > -#else > -# define fentry 0 > -#endif > - > static unsigned long stack_dump_trace[STACK_TRACE_ENTRIES+1] = > { [0 ... (STACK_TRACE_ENTRIES)] = ULONG_MAX }; > static unsigned stack_dump_index[STACK_TRACE_ENTRIES]; > @@ -35,7 +29,7 @@ static unsigned stack_dump_index[STACK_TRACE_ENTRIES]; > */ > static struct stack_trace max_stack_trace = { > .max_entries = STACK_TRACE_ENTRIES - 1, > - .entries = &stack_dump_trace[1], > + .entries = &stack_dump_trace[0], > }; > > static unsigned long max_stack_size; > @@ -77,7 +71,7 @@ check_stack(unsigned long ip, unsigned long *stack) > unsigned long this_size, flags; unsigned long *p, *top, *start; > static int tracer_frame; > int frame_size = ACCESS_ONCE(tracer_frame); > - int i; > + int i, x; > > this_size = ((unsigned long)stack) & (THREAD_SIZE-1); > this_size = THREAD_SIZE - this_size; > @@ -105,26 +99,20 @@ check_stack(unsigned long ip, unsigned long *stack) > max_stack_size = this_size; > > max_stack_trace.nr_entries = 0; > - > - if (using_ftrace_ops_list_func()) > - max_stack_trace.skip = 4; > - else > - max_stack_trace.skip = 3; > + max_stack_trace.skip = 0; > > save_stack_trace(&max_stack_trace); > > - /* > - * Add the passed in ip from the function tracer. > - * Searching for this on the stack will skip over > - * most of the overhead from the stack tracer itself. > - */ > - stack_dump_trace[0] = ip; > - max_stack_trace.nr_entries++; > + /* Skip over the overhead of the stack tracer itself */ > + for (i = 0; i < max_stack_trace.nr_entries; i++) { > + if (stack_dump_trace[i] == ip) > + break; > + } > > /* > * Now find where in the stack these are. > */ > - i = 0; > + x = 0; > start = stack; > top = (unsigned long *) > (((unsigned long)start & ~(THREAD_SIZE-1)) + THREAD_SIZE); > @@ -139,12 +127,13 @@ check_stack(unsigned long ip, unsigned long *stack) > while (i < max_stack_trace.nr_entries) { > int found = 0; > > - stack_dump_index[i] = this_size; > + stack_dump_index[x] = this_size; > p = start; > > for (; p < top && i < max_stack_trace.nr_entries; p++) { > if (*p == stack_dump_trace[i]) { > - this_size = stack_dump_index[i++] = > + stack_dump_trace[x] = stack_dump_trace[i++]; > + this_size = stack_dump_index[x++] = > (top - p) * sizeof(unsigned long); > found = 1; > /* Start the search from here */ > @@ -168,6 +157,9 @@ check_stack(unsigned long ip, unsigned long *stack) > i++; > } > > + for (; x < max_stack_trace.nr_entries; x++) > + stack_dump_trace[x] = ULONG_MAX; > + > if (task_stack_end_corrupted(current)) { > print_max_stack(); > BUG(); > @@ -192,24 +184,7 @@ stack_trace_call(unsigned long ip, unsigned long parent_ip, > if (per_cpu(trace_active, cpu)++ != 0) > goto out; > > - /* > - * When fentry is used, the traced function does not get > - * its stack frame set up, and we lose the parent. > - * The ip is pretty useless because the function tracer > - * was called before that function set up its stack frame. > - * In this case, we use the parent ip. > - * > - * By adding the return address of either the parent ip > - * or the current ip we can disregard most of the stack usage > - * caused by the stack tracer itself. > - * > - * The function tracer always reports the address of where the > - * mcount call was, but the stack will hold the return address. > - */ > - if (fentry) > - ip = parent_ip; > - else > - ip += MCOUNT_INSN_SIZE; > + ip += MCOUNT_INSN_SIZE; > > check_stack(ip, &stack); > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c index 407991b..978c923 100644 --- a/arch/arm64/kernel/stacktrace.c +++ b/arch/arm64/kernel/stacktrace.c @@ -97,34 +97,49 @@ static int save_trace(struct stackframe *frame, void *d) return trace->nr_entries >= trace->max_entries; } -void save_stack_trace_tsk(struct task_struct *tsk, struct stack_trace *trace) +static inline void __save_stack_trace(struct stackframe *frame, + struct stack_trace *trace, int no_sched) { struct stack_trace_data data; - struct stackframe frame; data.trace = trace; data.skip = trace->skip; + data.no_sched_functions = no_sched; + + walk_stackframe(frame, save_trace, &data); + if (trace->nr_entries < trace->max_entries) + trace->entries[trace->nr_entries++] = ULONG_MAX; +} + +void save_stack_trace_tsk(struct task_struct *tsk, struct stack_trace *trace) +{ + struct stackframe frame; + int no_sched; if (tsk != current) { - data.no_sched_functions = 1; + no_sched = 1; frame.fp = thread_saved_fp(tsk); frame.sp = thread_saved_sp(tsk); frame.pc = thread_saved_pc(tsk); } else { - data.no_sched_functions = 0; + no_sched = 0; frame.fp = (unsigned long)__builtin_frame_address(0); frame.sp = current_stack_pointer; frame.pc = (unsigned long)save_stack_trace_tsk; } - walk_stackframe(&frame, save_trace, &data); - if (trace->nr_entries < trace->max_entries) - trace->entries[trace->nr_entries++] = ULONG_MAX; + __save_stack_trace(&frame, trace, no_sched); } void save_stack_trace(struct stack_trace *trace) { - save_stack_trace_tsk(current, trace); + struct stackframe frame; + + frame.fp = (unsigned long)__builtin_frame_address(0); + frame.sp = current_stack_pointer; + frame.pc = (unsigned long)save_stack_trace_tsk; + + __save_stack_trace(&frame, trace, 0); } EXPORT_SYMBOL_GPL(save_stack_trace); #endif
Ftrace's stack tracer on arm64 returns wrong information about call stacks: Depth Size Location (50 entries) ----- ---- -------- 0) 5256 0 notifier_call_chain+0x30/0x94 1) 5256 0 ftrace_call+0x0/0x4 2) 5256 0 notifier_call_chain+0x2c/0x94 3) 5256 0 raw_notifier_call_chain+0x34/0x44 4) 5256 0 timekeeping_update.constprop.9+0xb8/0x114 5) 5256 0 update_wall_time+0x408/0x6dc One of tracer functions, ftrace_call (or mcount), is unexpectedly listed. The *bare* stack dump returned by save_stack_trace() is: save_stack_trace_tsk() save_stack_trace() stack_trace_call() ftrace_ops_no_ops() ftrace_call() notifier_call_chain() raw_notifier_call_chain() ... On arm64, save_stack_trace() calls save_stack_trace_tsk() and this will result in putting additional stack frame in the returned list. This behavior, however, conflicts with stack stracer's assumption that the number of functions to be skiped as part of tracer is 4, from save_stack_trace() to mcount(), if ftrace_ops_list_func() is used. The value is hard coded in check_patch(). This patch refactors save_stack_trace() and save_stack_trace_tsk() in order to reduce the stack depth by making the common code inlined. Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> --- arch/arm64/kernel/stacktrace.c | 31 +++++++++++++++++++++++-------- 1 file changed, 23 insertions(+), 8 deletions(-)