diff mbox

[v3,1/3] perf: Use monotonic clock as a source for timestamps

Message ID 1415060918-19954-2-git-send-email-pawel.moll@arm.com
State New
Headers show

Commit Message

Pawel Moll Nov. 4, 2014, 12:28 a.m. UTC
Until now, perf framework never defined the meaning of the timestampt
captured as PERF_SAMPLE_TIME sample type. The values were obtaining
from local (sched) clock, which is unavailable in userspace. This made
it impossible to correlate perf data with any other events. Other
tracing solutions have the source configurable (ftrace) or just share
a common time domain between kernel and userspace (LTTng).

Follow the trend by using monotonic clock, which is readily available
as POSIX CLOCK_MONOTONIC.

Also add a sysctl "perf_sample_time_clk_id" attribute which can be used
by the user to obtain the clk_id to be used with POSIX clock API (eg.
clock_gettime()) to obtain a time value comparable with perf samples.

Signed-off-by: Pawel Moll <pawel.moll@arm.com>
---

Ingo, I remember your comments about this approach in the past, but
during discussions at LPC Thomas was convinced that it's the right
thing to do - see cover letter for the series...

 include/linux/perf_event.h | 1 +
 kernel/events/core.c       | 4 +++-
 kernel/sysctl.c            | 7 +++++++
 3 files changed, 11 insertions(+), 1 deletion(-)

Comments

Peter Zijlstra Nov. 4, 2014, 7:23 a.m. UTC | #1
On Tue, Nov 04, 2014 at 12:28:36AM +0000, Pawel Moll wrote:

> +int sysctl_perf_sample_time_clk_id = CLOCK_MONOTONIC;

const ?

>  /*
>   * perf samples are done in some very critical code paths (NMIs).
>   * If they take too much CPU time, the system can lock up and not
> @@ -324,7 +326,7 @@ extern __weak const char *perf_pmu_name(void)
>  
>  static inline u64 perf_clock(void)
>  {
> -	return local_clock();
> +	return ktime_get_mono_fast_ns();
>  }

Do we maybe want to make it boot-time switchable back to local_clock for
people with bad systems and or backwards compat issues?

Everybody using Core2 and older will very much not want to have this
unless they've got a very good reason for wanting it.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Pawel Moll Nov. 4, 2014, 3:25 p.m. UTC | #2
On Tue, 2014-11-04 at 07:23 +0000, Peter Zijlstra wrote:
> On Tue, Nov 04, 2014 at 12:28:36AM +0000, Pawel Moll wrote:
> 
> > +int sysctl_perf_sample_time_clk_id = CLOCK_MONOTONIC;
> 
> const ?

Sure (unless we have to change it as mentioned below)

> >  /*
> >   * perf samples are done in some very critical code paths (NMIs).
> >   * If they take too much CPU time, the system can lock up and not
> > @@ -324,7 +326,7 @@ extern __weak const char *perf_pmu_name(void)
> >  
> >  static inline u64 perf_clock(void)
> >  {
> > -	return local_clock();
> > +	return ktime_get_mono_fast_ns();
> >  }
> 
> Do we maybe want to make it boot-time switchable back to local_clock for
> people with bad systems and or backwards compat issues?

Very good idea, should have came up with it myself :-)

Does __setup("perf_use_local_clock") sound reasonable? Then we have to
decide whether to hide the sysctl "perf_sample_time_clk_id" (my
preferred option, will see how difficult it is) or just provide an
invalid clock_id (eg. -1) in it.

Cheers!

Pawel

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Peter Zijlstra Nov. 4, 2014, 3:30 p.m. UTC | #3
On Tue, Nov 04, 2014 at 03:25:27PM +0000, Pawel Moll wrote:
> Very good idea, should have came up with it myself :-)
> 
> Does __setup("perf_use_local_clock") sound reasonable?

Would not the 'module' already prefix a perf.? I never quite know how
all that works out. But sure, that works.

> Then we have to
> decide whether to hide the sysctl "perf_sample_time_clk_id" (my
> preferred option, will see how difficult it is) or just provide an
> invalid clock_id (eg. -1) in it.

invalid clock id is fine with me, although I suppose the clock people
might have an oh-pinion ;-)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
diff mbox

Patch

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 893a0d0..ba490d5 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -738,6 +738,7 @@  extern int sysctl_perf_event_paranoid;
 extern int sysctl_perf_event_mlock;
 extern int sysctl_perf_event_sample_rate;
 extern int sysctl_perf_cpu_time_max_percent;
+extern int sysctl_perf_sample_time_clk_id;
 
 extern void perf_sample_event_took(u64 sample_len_ns);
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 2b02c9f..ea3d6d3 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -234,6 +234,8 @@  int perf_cpu_time_max_percent_handler(struct ctl_table *table, int write,
 	return 0;
 }
 
+int sysctl_perf_sample_time_clk_id = CLOCK_MONOTONIC;
+
 /*
  * perf samples are done in some very critical code paths (NMIs).
  * If they take too much CPU time, the system can lock up and not
@@ -324,7 +326,7 @@  extern __weak const char *perf_pmu_name(void)
 
 static inline u64 perf_clock(void)
 {
-	return local_clock();
+	return ktime_get_mono_fast_ns();
 }
 
 static inline struct perf_cpu_context *
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 15f2511..cb75f5b 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1094,6 +1094,13 @@  static struct ctl_table kern_table[] = {
 		.extra1		= &zero,
 		.extra2		= &one_hundred,
 	},
+	{
+		.procname	= "perf_sample_time_clk_id",
+		.data		= &sysctl_perf_sample_time_clk_id,
+		.maxlen		= sizeof(unsigned int),
+		.mode		= 0444,
+		.proc_handler	= proc_dointvec,
+	},
 #endif
 #ifdef CONFIG_KMEMCHECK
 	{