From patchwork Thu Jul 7 05:34:49 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Wang Nan X-Patchwork-Id: 71502 Delivered-To: patch@linaro.org Received: by 10.140.28.4 with SMTP id 4csp1219717qgy; Wed, 6 Jul 2016 22:36:16 -0700 (PDT) X-Received: by 10.98.42.76 with SMTP id q73mr2062803pfq.115.1467869775937; Wed, 06 Jul 2016 22:36:15 -0700 (PDT) Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id x6si5681258paj.136.2016.07.06.22.36.15; Wed, 06 Jul 2016 22:36:15 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756240AbcGGFgD (ORCPT + 30 others); Thu, 7 Jul 2016 01:36:03 -0400 Received: from szxga01-in.huawei.com ([58.251.152.64]:50344 "EHLO szxga01-in.huawei.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756172AbcGGFff (ORCPT ); Thu, 7 Jul 2016 01:35:35 -0400 Received: from 172.24.1.60 (EHLO szxeml428-hub.china.huawei.com) ([172.24.1.60]) by szxrg01-dlp.huawei.com (MOS 4.3.7-GA FastPath queued) with ESMTP id DNI84031; Thu, 07 Jul 2016 13:35:11 +0800 (CST) Received: from linux-4hy3.site (10.107.193.248) by szxeml428-hub.china.huawei.com (10.82.67.183) with Microsoft SMTP Server id 14.3.235.1; Thu, 7 Jul 2016 13:34:59 +0800 From: Wang Nan To: , CC: , , , Wang Nan , He Kuang , "Arnaldo Carvalho de Melo" , Jiri Olsa , "Masami Hiramatsu" , Namhyung Kim , "Nilay Vaish" Subject: [PATCH v14 8/8] perf tools: Add --tail-synthesize option Date: Thu, 7 Jul 2016 05:34:49 +0000 Message-ID: <1467869689-214015-9-git-send-email-wangnan0@huawei.com> X-Mailer: git-send-email 1.8.3.4 In-Reply-To: <1467869689-214015-1-git-send-email-wangnan0@huawei.com> References: <1467869689-214015-1-git-send-email-wangnan0@huawei.com> MIME-Version: 1.0 X-Originating-IP: [10.107.193.248] X-CFilter-Loop: Reflected X-Mirapoint-Virus-RAPID-Raw: score=unknown(0), refid=str=0001.0A020203.577DEA10.0063, ss=1, re=0.000, recu=0.000, reip=0.000, cl=1, cld=1, fgs=0, ip=0.0.0.0, so=2013-06-18 04:22:30, dmn=2013-03-21 17:37:32 X-Mirapoint-Loop-Id: eb801f9620a26f30531151d8be12db0e Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org When working with overwritable ring buffer there's a inconvenience problem: if perf dumps data after a long period after it starts, non-sample events may lost, which makes following 'perf report' unable to identify proc name and mmap layout. For example: # perf record -m 4 -e raw_syscalls:* -g --overwrite --switch-output \ dd if=/dev/zero of=/dev/null send SIGUSR2 after dd runs long enough. The resuling perf.data lost correct comm and mmap events: # perf script -i perf.data.2016061522374354 perf 24478 [004] 2581325.601789: raw_syscalls:sys_exit: NR 0 = 512 ^^^^ Should be 'dd' 27b2e8 syscall_slow_exit_work+0xfe2000e3 (/lib/modules/4.6.0-rc3+/build/vmlinux) 203cc7 do_syscall_64+0xfe200117 (/lib/modules/4.6.0-rc3+/build/vmlinux) b18d83 return_from_SYSCALL_64+0xfe200000 (/lib/modules/4.6.0-rc3+/build/vmlinux) 7f47c417edf0 [unknown] ([unknown]) ^^^^^^^^^^^^ Fail to unwind This patch provides a '--tail-synthesize' option, allows perf to collect system status when finalizing output file. In resuling output file, the non-sample events reflect system status when dumping data. After this patch: # perf record -m 4 -e raw_syscalls:* -g --overwrite --switch-output --tail-synthesize \ dd if=/dev/zero of=/dev/null # perf script -i perf.data.2016061600544998 dd 27364 [004] 2583244.994464: raw_syscalls:sys_enter: NR 1 (1, ... ^^ Correct comm 203a18 syscall_trace_enter_phase2+0xfe2001a8 ([kernel.kallsyms]) 203aa5 syscall_trace_enter+0xfe200055 ([kernel.kallsyms]) 203caa do_syscall_64+0xfe2000fa ([kernel.kallsyms]) b18d83 return_from_SYSCALL_64+0xfe200000 ([kernel.kallsyms]) d8e50 __GI___libc_write+0xffff01d9639f4010 (/tmp/oxygen_root-w00229757/lib64/libc-2.18.so) ^^^^^ Correct unwind This option doesn't aim to solve this problem completely. If a process terminates before SIGUSR2, we still lost its COMM and MMAP events. For example, we can't unwind correctly from the final perf.data we get from the previous example, because when perf collects the final output file (when we press C-c), 'dd' has been terminated so its '/proc//mmap' becomes empty. However, this is a cheaper choice. To completely solve this problem we need to continously output non-sample events. To satisify the requirement of daemonization, we need to merge them periodically. It is possible but requires much more code and cycles. Automatically select --tail-synthesize when --overwrite is provided. Signed-off-by: Wang Nan Cc: He Kuang Cc: Arnaldo Carvalho de Melo Cc: Jiri Olsa Cc: Masami Hiramatsu Cc: Namhyung Kim Cc: Zefan Li Cc: Nilay Vaish Cc: pi3orama@163.com --- tools/perf/Documentation/perf-record.txt | 8 ++++++++ tools/perf/builtin-record.c | 31 +++++++++++++++++++++++++------ tools/perf/perf.h | 1 + 3 files changed, 34 insertions(+), 6 deletions(-) -- 1.8.3.4 diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt index 384c630..69966ab 100644 --- a/tools/perf/Documentation/perf-record.txt +++ b/tools/perf/Documentation/perf-record.txt @@ -367,6 +367,12 @@ options. 'perf record --dry-run -e' can act as a BPF script compiler if llvm.dump-obj in config file is set to true. +--tail-synthesize:: +Instead of collecting non-sample events (for example, fork, comm, mmap) at +the beginning of record, collect them during finalizing an output file. +The collected non-sample events reflects the status of the system when +record is finished. + --overwrite:: Makes all events use an overwritable ring buffer. An overwritable ring buffer works like a flight recorder: when it gets full, the kernel will @@ -381,6 +387,8 @@ those fitting in the ring buffer at that moment. 'overwrite' attribute can also be set or canceled for an event using config terms. For example: 'cycles/overwrite/' and 'instructions/no-overwrite/'. +Implies --tail-synthesize. + SEE ALSO -------- linkperf:perf-stat[1], linkperf:perf-list[1] diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c index 8d4c6bb..65e4f40 100644 --- a/tools/perf/builtin-record.c +++ b/tools/perf/builtin-record.c @@ -763,13 +763,16 @@ record__finish_output(struct record *rec) return; } -static int record__synthesize_workload(struct record *rec) +static int record__synthesize_workload(struct record *rec, bool tail) { struct { struct thread_map map; struct thread_map_data map_data; } thread_map; + if (rec->opts.tail_synthesize != tail) + return 0; + thread_map.map.nr = 1; thread_map.map.map[0].pid = rec->evlist->workload.pid; thread_map.map.map[0].comm = NULL; @@ -780,7 +783,7 @@ static int record__synthesize_workload(struct record *rec) rec->opts.proc_map_timeout); } -static int record__synthesize(struct record *rec); +static int record__synthesize(struct record *rec, bool tail); static int record__switch_output(struct record *rec, bool at_exit) @@ -791,6 +794,10 @@ record__switch_output(struct record *rec, bool at_exit) /* Same Size: "2015122520103046"*/ char timestamp[] = "InvalidTimestamp"; + record__synthesize(rec, true); + if (target__none(&rec->opts.target)) + record__synthesize_workload(rec, true); + rec->samples = 0; record__finish_output(rec); err = fetch_current_timestamp(timestamp, sizeof(timestamp)); @@ -813,7 +820,7 @@ record__switch_output(struct record *rec, bool at_exit) /* Output tracking events */ if (!at_exit) { - record__synthesize(rec); + record__synthesize(rec, false); /* * In 'perf record --switch-output' without -a, @@ -825,7 +832,7 @@ record__switch_output(struct record *rec, bool at_exit) * perf_event__synthesize_thread_map() for those events. */ if (target__none(&rec->opts.target)) - record__synthesize_workload(rec); + record__synthesize_workload(rec, false); } return fd; } @@ -880,7 +887,7 @@ static const struct perf_event_mmap_page *record__pick_pc(struct record *rec) return NULL; } -static int record__synthesize(struct record *rec) +static int record__synthesize(struct record *rec, bool tail) { struct perf_session *session = rec->session; struct machine *machine = &session->machines.host; @@ -890,6 +897,9 @@ static int record__synthesize(struct record *rec) int fd = perf_data_file__fd(file); int err = 0; + if (rec->opts.tail_synthesize != tail) + return 0; + if (file->is_pipe) { err = perf_event__synthesize_attrs(tool, session, process_synthesized_event); @@ -1053,7 +1063,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv) machine = &session->machines.host; - err = record__synthesize(rec); + err = record__synthesize(rec, false); if (err < 0) goto out_child; @@ -1218,6 +1228,9 @@ static int __cmd_record(struct record *rec, int argc, const char **argv) if (!quiet) fprintf(stderr, "[ perf record: Woken up %ld times to write data ]\n", waking); + if (target__none(&rec->opts.target)) + record__synthesize_workload(rec, true); + out_child: if (forks) { int exit_status; @@ -1236,6 +1249,7 @@ out_child: } else status = err; + record__synthesize(rec, true); /* this will be recalculated during process_buildids() */ rec->samples = 0; @@ -1561,6 +1575,8 @@ struct option __record_options[] = { OPT_BOOLEAN_SET('i', "no-inherit", &record.opts.no_inherit, &record.opts.no_inherit_set, "child tasks do not inherit counters"), + OPT_BOOLEAN(0, "tail-synthesize", &record.opts.tail_synthesize, + "synthesize non-sample events at the end of output"), OPT_BOOLEAN(0, "overwrite", &record.opts.overwrite, "use overwrite mode"), OPT_UINTEGER('F', "freq", &record.opts.user_freq, "profile at this frequency"), OPT_CALLBACK('m', "mmap-pages", &record.opts, "pages[,pages]", @@ -1772,6 +1788,9 @@ int cmd_record(int argc, const char **argv, const char *prefix __maybe_unused) } } + if (record.opts.overwrite) + record.opts.tail_synthesize = true; + if (rec->evlist->nr_entries == 0 && perf_evlist__add_default(rec->evlist) < 0) { pr_err("Not enough memory for event selector list\n"); diff --git a/tools/perf/perf.h b/tools/perf/perf.h index 608b42b..a7e0f14 100644 --- a/tools/perf/perf.h +++ b/tools/perf/perf.h @@ -59,6 +59,7 @@ struct record_opts { bool record_switch_events; bool all_kernel; bool all_user; + bool tail_synthesize; bool overwrite; unsigned int freq; unsigned int mmap_pages;