Message ID | 1415060918-19954-2-git-send-email-pawel.moll@arm.com |
---|---|
State | New |
Headers | show |
On Tue, Nov 04, 2014 at 12:28:36AM +0000, Pawel Moll wrote: > +int sysctl_perf_sample_time_clk_id = CLOCK_MONOTONIC; const ? > /* > * perf samples are done in some very critical code paths (NMIs). > * If they take too much CPU time, the system can lock up and not > @@ -324,7 +326,7 @@ extern __weak const char *perf_pmu_name(void) > > static inline u64 perf_clock(void) > { > - return local_clock(); > + return ktime_get_mono_fast_ns(); > } Do we maybe want to make it boot-time switchable back to local_clock for people with bad systems and or backwards compat issues? Everybody using Core2 and older will very much not want to have this unless they've got a very good reason for wanting it. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
On Tue, 2014-11-04 at 07:23 +0000, Peter Zijlstra wrote: > On Tue, Nov 04, 2014 at 12:28:36AM +0000, Pawel Moll wrote: > > > +int sysctl_perf_sample_time_clk_id = CLOCK_MONOTONIC; > > const ? Sure (unless we have to change it as mentioned below) > > /* > > * perf samples are done in some very critical code paths (NMIs). > > * If they take too much CPU time, the system can lock up and not > > @@ -324,7 +326,7 @@ extern __weak const char *perf_pmu_name(void) > > > > static inline u64 perf_clock(void) > > { > > - return local_clock(); > > + return ktime_get_mono_fast_ns(); > > } > > Do we maybe want to make it boot-time switchable back to local_clock for > people with bad systems and or backwards compat issues? Very good idea, should have came up with it myself :-) Does __setup("perf_use_local_clock") sound reasonable? Then we have to decide whether to hide the sysctl "perf_sample_time_clk_id" (my preferred option, will see how difficult it is) or just provide an invalid clock_id (eg. -1) in it. Cheers! Pawel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
On Tue, Nov 04, 2014 at 03:25:27PM +0000, Pawel Moll wrote: > Very good idea, should have came up with it myself :-) > > Does __setup("perf_use_local_clock") sound reasonable? Would not the 'module' already prefix a perf.? I never quite know how all that works out. But sure, that works. > Then we have to > decide whether to hide the sysctl "perf_sample_time_clk_id" (my > preferred option, will see how difficult it is) or just provide an > invalid clock_id (eg. -1) in it. invalid clock id is fine with me, although I suppose the clock people might have an oh-pinion ;-) -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 893a0d0..ba490d5 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -738,6 +738,7 @@ extern int sysctl_perf_event_paranoid; extern int sysctl_perf_event_mlock; extern int sysctl_perf_event_sample_rate; extern int sysctl_perf_cpu_time_max_percent; +extern int sysctl_perf_sample_time_clk_id; extern void perf_sample_event_took(u64 sample_len_ns); diff --git a/kernel/events/core.c b/kernel/events/core.c index 2b02c9f..ea3d6d3 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -234,6 +234,8 @@ int perf_cpu_time_max_percent_handler(struct ctl_table *table, int write, return 0; } +int sysctl_perf_sample_time_clk_id = CLOCK_MONOTONIC; + /* * perf samples are done in some very critical code paths (NMIs). * If they take too much CPU time, the system can lock up and not @@ -324,7 +326,7 @@ extern __weak const char *perf_pmu_name(void) static inline u64 perf_clock(void) { - return local_clock(); + return ktime_get_mono_fast_ns(); } static inline struct perf_cpu_context * diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 15f2511..cb75f5b 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -1094,6 +1094,13 @@ static struct ctl_table kern_table[] = { .extra1 = &zero, .extra2 = &one_hundred, }, + { + .procname = "perf_sample_time_clk_id", + .data = &sysctl_perf_sample_time_clk_id, + .maxlen = sizeof(unsigned int), + .mode = 0444, + .proc_handler = proc_dointvec, + }, #endif #ifdef CONFIG_KMEMCHECK {
Until now, perf framework never defined the meaning of the timestampt captured as PERF_SAMPLE_TIME sample type. The values were obtaining from local (sched) clock, which is unavailable in userspace. This made it impossible to correlate perf data with any other events. Other tracing solutions have the source configurable (ftrace) or just share a common time domain between kernel and userspace (LTTng). Follow the trend by using monotonic clock, which is readily available as POSIX CLOCK_MONOTONIC. Also add a sysctl "perf_sample_time_clk_id" attribute which can be used by the user to obtain the clk_id to be used with POSIX clock API (eg. clock_gettime()) to obtain a time value comparable with perf samples. Signed-off-by: Pawel Moll <pawel.moll@arm.com> --- Ingo, I remember your comments about this approach in the past, but during discussions at LPC Thomas was convinced that it's the right thing to do - see cover letter for the series... include/linux/perf_event.h | 1 + kernel/events/core.c | 4 +++- kernel/sysctl.c | 7 +++++++ 3 files changed, 11 insertions(+), 1 deletion(-)