From patchwork Tue Jan 19 11:16:50 2016
X-Patchwork-Submitter: Wang Nan <wangnan0@huawei.com>
X-Patchwork-Id: 59982
From: Wang Nan <wangnan0@huawei.com>
Cc: Wang Nan, He Kuang, Yunlong Song, "Arnaldo Carvalho de Melo",
    Brendan Gregg, Jiri Olsa, Masami Hiramatsu, Namhyung Kim, Zefan Li
Subject: [PATCH 6/6] perf/core: Put size of a sample at the end of it by PERF_SAMPLE_TAILSIZE
Date: Tue, 19 Jan 2016 11:16:50 +0000
Message-ID: <1453202210-134429-7-git-send-email-wangnan0@huawei.com>
In-Reply-To: <1453202210-134429-1-git-send-email-wangnan0@huawei.com>
References: <20160118120230.GP6357@twins.programming.kicks-ass.net>
 <1453202210-134429-1-git-send-email-wangnan0@huawei.com>
X-Mailing-List: linux-kernel@vger.kernel.org

This patch introduces a PERF_SAMPLE_TAILSIZE flag which causes a size
field to be attached at the end of each sample. The idea comes from [1]:
with the size stored at the tail of an event, a user program reading the
ring buffer can parse events backward. For example:

    head
     |
     V
  +--+---+-------+----------+------+---+
  |E6|...|    B 8|      C 11|   D 7|E..|
  +--+---+-------+----------+------+---+

In this case, starting from the 'head' pointer provided by the kernel, a
user program can first read '6' via (*(head - sizeof(u64))), which gives
it the start of record 'E'; it can then read the size stored just before
'E' to find the start of record D, and locate C and B in the same way.

The implementation is simple: when the PERF_SAMPLE_TAILSIZE flag is set,
perf_output_sample() writes the size at the end of each sample.
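For illustration only (this sketch is not part of the patch and not perf
code), a backward walk over such a buffer could look roughly like the
snippet below. The parse_backward() helper and its base/head parameters
are invented for this example, struct perf_event_header is declared
locally to keep the snippet self-contained (it matches the UAPI layout),
and wrap-around of the ring buffer (record 'E' above) is ignored for
brevity:

  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>

  struct perf_event_header {
          uint32_t type;
          uint16_t misc;
          uint16_t size;
  };

  /* Walk records backward from 'head' down to 'base' (no wrap handling). */
  static void parse_backward(unsigned char *base, unsigned char *head)
  {
          unsigned char *p = head;

          while (p > base) {
                  struct perf_event_header *hdr;
                  uint64_t tailsize;

                  /* PERF_SAMPLE_TAILSIZE: the last u64 of a record is its size. */
                  memcpy(&tailsize, p - sizeof(tailsize), sizeof(tailsize));
                  p -= tailsize;          /* start of this record */
                  hdr = (struct perf_event_header *)p;
                  printf("type %u, size %u\n", hdr->type, (unsigned int)hdr->size);
          }
  }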
The following things are done to ensure the ring buffer is safe for
backward parsing:

 - Don't allow two events with different PERF_SAMPLE_TAILSIZE settings to
   set their output to each other;

 - For non-sample events, also output the tail size if required.

This patch has a limitation for perf: before reading such a ring buffer,
perf must ensure that all events which may output to it are already
stopped, so that the 'head' pointer it gets is the end of the last record
(an illustrative user-space sketch follows the patch below).

[1] http://lkml.kernel.org/g/1449063499-236703-1-git-send-email-wangnan0@huawei.com

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: He Kuang
Cc: Yunlong Song
Cc: Alexei Starovoitov
Cc: Arnaldo Carvalho de Melo
Cc: Brendan Gregg
Cc: Jiri Olsa
Cc: Masami Hiramatsu
Cc: Namhyung Kim
Cc: Peter Zijlstra
Cc: Zefan Li
Cc: pi3orama@163.com
---
 include/linux/perf_event.h      | 17 ++++++---
 include/uapi/linux/perf_event.h |  3 +-
 kernel/events/core.c            | 82 +++++++++++++++++++++++++++++------------
 kernel/events/ring_buffer.c     |  7 ++--
 4 files changed, 75 insertions(+), 34 deletions(-)

-- 
1.8.3.4

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index c0335b9..7c70d4b 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -841,13 +841,13 @@ extern void perf_event_output(struct perf_event *event,
 				struct pt_regs *regs);
 
 extern void
-perf_event_header__init_id(struct perf_event_header *header,
-			   struct perf_sample_data *data,
-			   struct perf_event *event);
+perf_event_header__init_extra(struct perf_event_header *header,
+			      struct perf_sample_data *data,
+			      struct perf_event *event);
 extern void
-perf_event__output_id_sample(struct perf_event *event,
-			     struct perf_output_handle *handle,
-			     struct perf_sample_data *sample);
+perf_event__output_extra(struct perf_event *event, u64 evt_size,
+			 struct perf_output_handle *handle,
+			 struct perf_sample_data *sample);
 
 extern void
 perf_log_lost_samples(struct perf_event *event, u64 lost);
@@ -1043,6 +1043,11 @@ static inline bool is_write_backward(struct perf_event *event)
 	return !!event->attr.write_backward;
 }
 
+static inline bool has_tailsize(struct perf_event *event)
+{
+	return !!(event->attr.sample_type & PERF_SAMPLE_TAILSIZE);
+}
+
 extern int perf_output_begin(struct perf_output_handle *handle,
 			     struct perf_event *event, unsigned int size);
 extern int perf_output_begin_onward(struct perf_output_handle *handle,
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 598b9b0..f0cad26 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -139,8 +139,9 @@ enum perf_event_sample_format {
 	PERF_SAMPLE_IDENTIFIER			= 1U << 16,
 	PERF_SAMPLE_TRANSACTION			= 1U << 17,
 	PERF_SAMPLE_REGS_INTR			= 1U << 18,
+	PERF_SAMPLE_TAILSIZE			= 1U << 19,
 
-	PERF_SAMPLE_MAX = 1U << 19,		/* non-ABI */
+	PERF_SAMPLE_MAX = 1U << 20,		/* non-ABI */
 };
 
 /*
diff --git a/kernel/events/core.c b/kernel/events/core.c
index fa32d8c..d8bb92e 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5141,12 +5141,14 @@ static void __perf_event_header__init_id(struct perf_event_header *header,
 	}
 }
 
-void perf_event_header__init_id(struct perf_event_header *header,
-				struct perf_sample_data *data,
-				struct perf_event *event)
+void perf_event_header__init_extra(struct perf_event_header *header,
+				   struct perf_sample_data *data,
+				   struct perf_event *event)
 {
 	if (event->attr.sample_id_all)
 		__perf_event_header__init_id(header, data, event);
+	if (has_tailsize(event))
+		header->size += sizeof(u64);
 }
 
 static void __perf_event__output_id_sample(struct perf_output_handle *handle,
@@ -5173,12 +5175,14 @@ static void __perf_event__output_id_sample(struct perf_output_handle *handle,
 		perf_output_put(handle, data->id);
 }
 
-void perf_event__output_id_sample(struct perf_event *event,
-				  struct perf_output_handle *handle,
-				  struct perf_sample_data *sample)
+void perf_event__output_extra(struct perf_event *event, u64 evt_size,
+			      struct perf_output_handle *handle,
+			      struct perf_sample_data *sample)
 {
 	if (event->attr.sample_id_all)
 		__perf_event__output_id_sample(handle, sample);
+	if (has_tailsize(event))
+		perf_output_put(handle, evt_size);
 }
 
 static void perf_output_read_one(struct perf_output_handle *handle,
@@ -5420,6 +5424,13 @@ void perf_output_sample(struct perf_output_handle *handle,
 		}
 	}
 
+	/* Should be the last one */
+	if (sample_type & PERF_SAMPLE_TAILSIZE) {
+		u64 evt_size = header->size;
+
+		perf_output_put(handle, evt_size);
+	}
+
 	if (!event->attr.watermark) {
 		int wakeup_events = event->attr.wakeup_events;
 
@@ -5539,6 +5550,9 @@ void perf_prepare_sample(struct perf_event_header *header,
 
 		header->size += size;
 	}
+
+	if (sample_type & PERF_SAMPLE_TAILSIZE)
+		header->size += sizeof(u64);
 }
 
 static void __always_inline
@@ -5620,14 +5634,15 @@ perf_event_read_event(struct perf_event *event,
 	};
 	int ret;
 
-	perf_event_header__init_id(&read_event.header, &sample, event);
+	perf_event_header__init_extra(&read_event.header, &sample, event);
 	ret = perf_output_begin(&handle, event, read_event.header.size);
 	if (ret)
 		return;
 
 	perf_output_put(&handle, read_event);
 	perf_output_read(&handle, event);
-	perf_event__output_id_sample(event, &handle, &sample);
+	perf_event__output_extra(event, read_event.header.size,
+				 &handle, &sample);
 
 	perf_output_end(&handle);
 }
@@ -5739,7 +5754,7 @@ static void perf_event_task_output(struct perf_event *event,
 	if (!perf_event_task_match(event))
 		return;
 
-	perf_event_header__init_id(&task_event->event_id.header, &sample, event);
+	perf_event_header__init_extra(&task_event->event_id.header, &sample, event);
 
 	ret = perf_output_begin(&handle, event,
 				task_event->event_id.header.size);
@@ -5756,7 +5771,9 @@ static void perf_event_task_output(struct perf_event *event,
 
 	perf_output_put(&handle, task_event->event_id);
 
-	perf_event__output_id_sample(event, &handle, &sample);
+	perf_event__output_extra(event,
+				 task_event->event_id.header.size,
+				 &handle, &sample);
 
 	perf_output_end(&handle);
 out:
@@ -5835,7 +5852,7 @@ static void perf_event_comm_output(struct perf_event *event,
 	if (!perf_event_comm_match(event))
 		return;
 
-	perf_event_header__init_id(&comm_event->event_id.header, &sample, event);
+	perf_event_header__init_extra(&comm_event->event_id.header, &sample, event);
 	ret = perf_output_begin(&handle, event,
 				comm_event->event_id.header.size);
 
@@ -5849,7 +5866,8 @@ static void perf_event_comm_output(struct perf_event *event,
 	__output_copy(&handle, comm_event->comm,
 				   comm_event->comm_size);
 
-	perf_event__output_id_sample(event, &handle, &sample);
+	perf_event__output_extra(event, comm_event->event_id.header.size,
+				 &handle, &sample);
 
 	perf_output_end(&handle);
 out:
@@ -5958,7 +5976,7 @@ static void perf_event_mmap_output(struct perf_event *event,
 		mmap_event->event_id.header.size += sizeof(mmap_event->flags);
 	}
 
-	perf_event_header__init_id(&mmap_event->event_id.header, &sample, event);
+	perf_event_header__init_extra(&mmap_event->event_id.header, &sample, event);
 	ret = perf_output_begin(&handle, event,
 				mmap_event->event_id.header.size);
 	if (ret)
@@ -5981,7 +5999,8 @@ static void perf_event_mmap_output(struct perf_event *event,
 	__output_copy(&handle, mmap_event->file_name,
 				   mmap_event->file_size);
 
-	perf_event__output_id_sample(event, &handle, &sample);
+	perf_event__output_extra(event, mmap_event->event_id.header.size,
+				 &handle, &sample);
 
 	perf_output_end(&handle);
 out:
@@ -6164,14 +6183,15 @@ void perf_event_aux_event(struct perf_event *event, unsigned long head,
 	};
 	int ret;
 
-	perf_event_header__init_id(&rec.header, &sample, event);
+	perf_event_header__init_extra(&rec.header, &sample, event);
 	ret = perf_output_begin(&handle, event, rec.header.size);
 
 	if (ret)
 		return;
 
 	perf_output_put(&handle, rec);
-	perf_event__output_id_sample(event, &handle, &sample);
+	perf_event__output_extra(event, rec.header.size,
+				 &handle, &sample);
 
 	perf_output_end(&handle);
 }
@@ -6197,7 +6217,7 @@ void perf_log_lost_samples(struct perf_event *event, u64 lost)
 		.lost		= lost,
 	};
 
-	perf_event_header__init_id(&lost_samples_event.header, &sample, event);
+	perf_event_header__init_extra(&lost_samples_event.header, &sample, event);
 
 	ret = perf_output_begin(&handle, event,
 				lost_samples_event.header.size);
@@ -6205,7 +6225,8 @@ void perf_log_lost_samples(struct perf_event *event, u64 lost)
 		return;
 
 	perf_output_put(&handle, lost_samples_event);
-	perf_event__output_id_sample(event, &handle, &sample);
+	perf_event__output_extra(event, lost_samples_event.header.size,
+				 &handle, &sample);
 	perf_output_end(&handle);
 }
 
@@ -6252,7 +6273,7 @@ static void perf_event_switch_output(struct perf_event *event, void *data)
 					perf_event_tid(event, se->next_prev);
 	}
 
-	perf_event_header__init_id(&se->event_id.header, &sample, event);
+	perf_event_header__init_extra(&se->event_id.header, &sample, event);
 
 	ret = perf_output_begin(&handle, event, se->event_id.header.size);
 	if (ret)
@@ -6263,7 +6284,8 @@ static void perf_event_switch_output(struct perf_event *event, void *data)
 	else
 		perf_output_put(&handle, se->event_id);
 
-	perf_event__output_id_sample(event, &handle, &sample);
+	perf_event__output_extra(event, se->event_id.header.size,
+				 &handle, &sample);
 
 	perf_output_end(&handle);
 }
@@ -6323,7 +6345,7 @@ static void perf_log_throttle(struct perf_event *event, int enable)
 	if (enable)
 		throttle_event.header.type = PERF_RECORD_UNTHROTTLE;
 
-	perf_event_header__init_id(&throttle_event.header, &sample, event);
+	perf_event_header__init_extra(&throttle_event.header, &sample, event);
 
 	ret = perf_output_begin(&handle, event,
 				throttle_event.header.size);
@@ -6331,7 +6353,8 @@ static void perf_log_throttle(struct perf_event *event, int enable)
 		return;
 
 	perf_output_put(&handle, throttle_event);
-	perf_event__output_id_sample(event, &handle, &sample);
+	perf_event__output_extra(event, throttle_event.header.size,
+				 &handle, &sample);
 	perf_output_end(&handle);
 }
 
@@ -6359,14 +6382,15 @@ static void perf_log_itrace_start(struct perf_event *event)
 	rec.pid	= perf_event_pid(event, current);
 	rec.tid	= perf_event_tid(event, current);
 
-	perf_event_header__init_id(&rec.header, &sample, event);
+	perf_event_header__init_extra(&rec.header, &sample, event);
 	ret = perf_output_begin(&handle, event, rec.header.size);
 
 	if (ret)
 		return;
 
 	perf_output_put(&handle, rec);
-	perf_event__output_id_sample(event, &handle, &sample);
+	perf_event__output_extra(event, rec.header.size,
+				 &handle, &sample);
 
 	perf_output_end(&handle);
 }
@@ -8151,6 +8175,16 @@ perf_event_set_output(struct perf_event *event, struct perf_event *output_event)
 	    event->pmu != output_event->pmu)
 		goto out;
 
+	/*
+	 * Don't allow mixed tailsize settings, since the resulting
+	 * ring buffer could not be parsed backward.
+	 *
+	 * '!=' is safe because has_tailsize() returns bool; two different
+	 * non-zero values would be treated as equal (both true).
+	 */
+	if (has_tailsize(event) != has_tailsize(output_event))
+		goto out;
+
 set:
 	mutex_lock(&event->mmap_mutex);
 	/* Can't redirect output if we've got an active mmap() */
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 4b0ef33..5cb098e 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -208,10 +208,11 @@ __perf_output_begin(struct perf_output_handle *handle,
 		lost_event.id          = event->id;
 		lost_event.lost        = local_xchg(&rb->lost, 0);
 
-		perf_event_header__init_id(&lost_event.header,
-					   &sample_data, event);
+		perf_event_header__init_extra(&lost_event.header,
+					      &sample_data, event);
 		perf_output_put(handle, lost_event);
-		perf_event__output_id_sample(event, handle, &sample_data);
+		perf_event__output_extra(event, lost_event.header.size,
+					 handle, &sample_data);
 	}
 
 	return 0;
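
As a closing illustration (again, not part of this patch), user space could
request the new bit and honour the stop-before-read limitation described in
the commit message roughly as follows. The PERF_SAMPLE_TAILSIZE value
matches the UAPI change above; attr.write_backward comes from the earlier
patches of this series, so this only builds against headers with the series
applied; open_tailsize_event() and stop_event() are invented helper names:

  #include <linux/perf_event.h>
  #include <string.h>
  #include <sys/ioctl.h>
  #include <sys/syscall.h>
  #include <unistd.h>

  #ifndef PERF_SAMPLE_TAILSIZE
  #define PERF_SAMPLE_TAILSIZE (1U << 19)         /* added by this patch */
  #endif

  /* Open a per-CPU software event that asks for the trailing size word. */
  static int open_tailsize_event(int cpu)
  {
          struct perf_event_attr attr;

          memset(&attr, 0, sizeof(attr));
          attr.size = sizeof(attr);
          attr.type = PERF_TYPE_SOFTWARE;
          attr.config = PERF_COUNT_SW_CPU_CLOCK;
          attr.sample_period = 100000;
          attr.sample_type = PERF_SAMPLE_TID | PERF_SAMPLE_TIME |
                             PERF_SAMPLE_TAILSIZE;
          attr.write_backward = 1;        /* from earlier patches in this series */

          return syscall(__NR_perf_event_open, &attr,
                         -1 /* pid: all */, cpu, -1 /* group_fd */, 0);
  }

  /* Stop the event before parsing, so 'head' is the end of the last record. */
  static void stop_event(int fd)
  {
          ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
  }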