diff mbox

[RFC,v7,4/5] tracing: Measure delayed hrtimer offset latency

Message ID 1474379517-7705-5-git-send-email-binoy.jayan@linaro.org
State New
Headers show

Commit Message

Binoy Jayan Sept. 20, 2016, 1:51 p.m. UTC
Measure latencies caused due to delayed timer offsets in nanoseconds.
It shows the latency captured due to a delayed timer expire event. It
happens for example when a timer misses its deadline due to disabled
interrupts. A process if scheduled as a result of the timer expiration
suffers this latency. It is used to calculate the total wakeup latency
of a process which is the sum of the delayed timer offset and the
wakeup latency.

[
Initial work and idea by Carsten
Link: https://git.kernel.org/cgit/linux/kernel/git/rt/linux-stable-rt.git/commit/?h=v3.14-rt-rebase&id=56d50cc34943bbba12b8c5942ee1ae3b29f73acb
]

Cc: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Binoy Jayan <binoy.jayan@linaro.org>

---
 include/linux/hrtimer.h |  4 ++++
 include/linux/sched.h   |  3 +++
 kernel/time/Kconfig     |  8 ++++++++
 kernel/time/hrtimer.c   | 47 +++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 62 insertions(+)

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

Comments

Thomas Gleixner Sept. 20, 2016, 2:19 p.m. UTC | #1
On Tue, 20 Sep 2016, Binoy Jayan wrote:
> +#ifdef CONFIG_TRACE_DELAYED_TIMER_OFFSETS

> +static inline void latency_hrtimer_timing_start(struct hrtimer *timer,

> +					 struct hrtimer_clock_base *new_base,

> +					 ktime_t tim)

> +{

> +	ktime_t now = new_base->get_time();

> +

> +	if (ktime_to_ns(tim) < ktime_to_ns(now))

> +		timer->tim_expiry = now;

> +	else

> +		timer->tim_expiry = ktime_set(0, 0);


You still fail to explain why this get_time() magic is required.

This is executed unconditionally when the config switch is enabled and does
not depend on whether the trace functionality is enabled or not. So you are
imposing the extra get_time() call, which can be expensive depending on the
underlying hardware, on every hrtimer start invocation.

Tracing is supposed to have ZERO impact when it is not used and even when
it's in use then the impact should be kept as low as possible. The above
does none of that.

Neither did you provide a proper rationale for this infrastructure in the
changelog.

You can repost that over and over and it will not go anywhere if you don't
start to address the review comments I give you.

Thanks,

	tglx
Binoy Jayan Sept. 21, 2016, 6:06 a.m. UTC | #2
On 20 September 2016 at 19:49, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Tue, 20 Sep 2016, Binoy Jayan wrote:

>> +#ifdef CONFIG_TRACE_DELAYED_TIMER_OFFSETS

>> +static inline void latency_hrtimer_timing_start(struct hrtimer *timer,

>> +                                      struct hrtimer_clock_base *new_base,

>> +                                      ktime_t tim)

>> +{

>> +     ktime_t now = new_base->get_time();

>> +

>> +     if (ktime_to_ns(tim) < ktime_to_ns(now))

>> +             timer->tim_expiry = now;

>> +     else

>> +             timer->tim_expiry = ktime_set(0, 0);

>

> You still fail to explain why this get_time() magic is required.

>

> This is executed unconditionally when the config switch is enabled and does

> not depend on whether the trace functionality is enabled or not. So you are

> imposing the extra get_time() call, which can be expensive depending on the

> underlying hardware, on every hrtimer start invocation.

>

> Tracing is supposed to have ZERO impact when it is not used and even when

> it's in use then the impact should be kept as low as possible. The above

> does none of that.

>

> Neither did you provide a proper rationale for this infrastructure in the

> changelog.

>

> You can repost that over and over and it will not go anywhere if you don't

> start to address the review comments I give you.


Hi Thomas,

Sorry, I missed to address this comment from you. From what I understand
why the get_time() is needed is to get the more accurate current time when
the hrtimer base is changed (from the cpu in which it was fired on, to
the current
cpu on which it is currently made to run or restarted) wherein the hrtimer base
needs to be switched to the new cpu provided that it is not running in
pinned mode.

Carsten,

Could you please comment on that?

Thanks,
Binoy
Thomas Gleixner Sept. 21, 2016, 3:58 p.m. UTC | #3
On Wed, 21 Sep 2016, Binoy Jayan wrote:
> On 20 September 2016 at 19:49, Thomas Gleixner <tglx@linutronix.de> wrote:

> > You still fail to explain why this get_time() magic is required.

> 

> Sorry, I missed to address this comment from you. From what I understand

> why the get_time() is needed is to get the more accurate current time

> when the hrtimer base is changed (from the cpu in which it was fired on,

> to the current cpu on which it is currently made to run or restarted)

> wherein the hrtimer base needs to be switched to the new cpu provided

> that it is not running in pinned mode.


Sorry. This has nothing to do with changing the hrtimer_base, simply
because the time base is the same on all cpus.

> Carsten, Could you please comment on that?


It's not Carstens repsonsibility to explain the nature of the change.

You are submitting that code and so it's your job to provide proper
explanations and justifications. If you can't do that, then how do you
think that the review process, which is a feedback loop between the
reviewer and the submitter, should work?

Answer: It cannot work that way. I hope I don't have to explain why.

Thanks,

	tglx
Binoy Jayan Sept. 22, 2016, 9:14 a.m. UTC | #4
Hi Thomas,

Thank you for the reply and sharing your insights.

On 21 September 2016 at 21:28, Thomas Gleixner <tglx@linutronix.de> wrote:

> Sorry. This has nothing to do with changing the hrtimer_base, simply

> because the time base is the same on all cpus.


The condition 'ktime_to_ns(tim) < ktime_to_ns(now)' checks if the timer
has already expired w.r.t. 'soft timeout' value as it does not include
the slack value 'delta_ns'. In that case 'tim_expiry' is normalized to
the current time. (I was under the impression that this inaccuracy
could be because timer was initially running on a different cpu. If that
is not the case, I guess we can use the code mentioned below).

Otherwise it postpones the decision of storing the expiry value
until the hrtimer interrupt. In this case, it calculates the latency
using the hard timeout (which includes the fuzz) as returned by a call
to 'hrtimer_get_expires' in 'latency_hrtimer_timing_stop'.

Since for calculating latency, we use hard timeout value which includes
the slack, and since the actual timeout might have happened in between
the soft and hard timeout, the actual expiry time could be less than
the hard expiry time. This is why latency is checked for negative value
before storing when the trace point is hit.

static inline void latency_hrtimer_timing_start(ktime_t tim)
{
     timer->tim_expiry = tim;
}

static inline void latency_hrtimer_timing_stop(struct hrtimer *timer,
                                                ktime_t basenow)
{
        long latency;
        struct task_struct *task;

        latency = ktime_to_ns(basenow) - hrtimer_get_softexpires_tv64(timer);

        task = timer->function == hrtimer_wakeup ?
                        container_of(timer, struct hrtimer_sleeper,
                                     timer)->task : NULL;
        if (task && latency > 0)   // Now the check for latency may
not be needed
                task->timer_offset = latency;
}

I am using 'hrtimer_get_softexpires_tv64' instead of 'hrtimer_get_expires'
so that 'latency' is never negative. Please let me know if this looks ok.

> It's not Carstens repsonsibility to explain the nature of the change.

> You are submitting that code and so it's your job to provide proper

> explanations and justifications. If you can't do that, then how do you

> think that the review process, which is a feedback loop between the

> reviewer and the submitter, should work?

>

> Answer: It cannot work that way. I hope I don't have to explain why.


Sure, I'll avoid this confusion in the future. I think I should have talked
to him only offline and not here.

Thanks,
Binoy
Thomas Gleixner Sept. 22, 2016, 5:28 p.m. UTC | #5
Binoy,

On Thu, 22 Sep 2016, Binoy Jayan wrote:
> The condition 'ktime_to_ns(tim) < ktime_to_ns(now)' checks if the timer

> has already expired w.r.t. 'soft timeout' value as it does not include

> the slack value 'delta_ns'. In that case 'tim_expiry' is normalized to

> the current time.


You are halfways coming close to the point why this is done, but your
conclusion is completely wrong.  The correct keyword is "expired", but
anything else is just random speculation.

> (I was under the impression that this inaccuracy

> could be because timer was initially running on a different cpu. If that

> is not the case, I guess we can use the code mentioned below).


We are not playing a guessing game here.
 
> I am using 'hrtimer_get_softexpires_tv64' instead of 'hrtimer_get_expires'

> so that 'latency' is never negative. Please let me know if this looks ok.


No it does not. And it won't look correct until you finally sit down and
decode and understand the functionality behind this code.

I'm not going to continue this, as it's not my job to explain you the code
which you are trying to submit.

I'm not blaming you, but I blame the responsible persons inside your
company who task you with that and expect that I'm going to do their work
of explaining to you how that code works.

@Arnd, @Mark: I'm starting to get seriously grumpy about that.

Thanks,

	tglx
Mark Brown Sept. 22, 2016, 7:23 p.m. UTC | #6
On Thu, Sep 22, 2016 at 07:28:01PM +0200, Thomas Gleixner wrote:

> I'm not blaming you, but I blame the responsible persons inside your

> company who task you with that and expect that I'm going to do their work

> of explaining to you how that code works.


> @Arnd, @Mark: I'm starting to get seriously grumpy about that.


Really sorry about that, obviously that's not the intention.  We'll try
to avoid this happening in future.
diff mbox

Patch

diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index 5e00f80..05d8086 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -90,6 +90,7 @@  enum hrtimer_restart {
  * @is_rel:	Set if the timer was armed relative
  * @start_pid:  timer statistics field to store the pid of the task which
  *		started the timer
+ * @tim_expiry: hrtimer expiry time or 0 in case already expired
  * @start_site:	timer statistics field to store the site where the timer
  *		was started
  * @start_comm: timer statistics field to store the name of the process which
@@ -104,6 +105,9 @@  struct hrtimer {
 	struct hrtimer_clock_base	*base;
 	u8				state;
 	u8				is_rel;
+#ifdef CONFIG_TRACE_DELAYED_TIMER_OFFSETS
+	ktime_t				tim_expiry;
+#endif
 #ifdef CONFIG_TIMER_STATS
 	int				start_pid;
 	void				*start_site;
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 62c68e5..7bf67f8 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1891,6 +1891,9 @@  struct task_struct {
 	/* bitmask and counter of trace recursion */
 	unsigned long trace_recursion;
 #endif /* CONFIG_TRACING */
+#ifdef CONFIG_TRACE_DELAYED_TIMER_OFFSETS
+	long timer_offset;
+#endif /* CONFIG_TRACE_DELAYED_TIMER_OFFSETS */
 #ifdef CONFIG_KCOV
 	/* Coverage collection mode enabled for this task (0 if disabled). */
 	enum kcov_mode kcov_mode;
diff --git a/kernel/time/Kconfig b/kernel/time/Kconfig
index 4008d9f..de4793c 100644
--- a/kernel/time/Kconfig
+++ b/kernel/time/Kconfig
@@ -193,5 +193,13 @@  config HIGH_RES_TIMERS
 	  hardware is not capable then this option only increases
 	  the size of the kernel image.
 
+config TRACE_DELAYED_TIMER_OFFSETS
+	depends on HIGH_RES_TIMERS
+	select GENERIC_TRACER
+	bool "Delayed Timer Offsets"
+	help
+	  Capture offsets of delayed hrtimer in nanoseconds. It is used
+	  to construct wakeup latency histogram.
+
 endmenu
 endif
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 9ba7c82..7048f86 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -56,6 +56,8 @@ 
 
 #include "tick-internal.h"
 
+static enum hrtimer_restart hrtimer_wakeup(struct hrtimer *timer);
+
 /*
  * The timer bases:
  *
@@ -960,6 +962,47 @@  static inline ktime_t hrtimer_update_lowres(struct hrtimer *timer, ktime_t tim,
 	return tim;
 }
 
+#ifdef CONFIG_TRACE_DELAYED_TIMER_OFFSETS
+static inline void latency_hrtimer_timing_start(struct hrtimer *timer,
+					 struct hrtimer_clock_base *new_base,
+					 ktime_t tim)
+{
+	ktime_t now = new_base->get_time();
+
+	if (ktime_to_ns(tim) < ktime_to_ns(now))
+		timer->tim_expiry = now;
+	else
+		timer->tim_expiry = ktime_set(0, 0);
+}
+
+static inline void latency_hrtimer_timing_stop(struct hrtimer *timer,
+						ktime_t basenow)
+{
+	long latency;
+	struct task_struct *task;
+
+	latency = ktime_to_ns(ktime_sub(basenow,
+			      ktime_to_ns(timer->tim_expiry) ?
+			      timer->tim_expiry : hrtimer_get_expires(timer)));
+	task = timer->function == hrtimer_wakeup ?
+			container_of(timer, struct hrtimer_sleeper,
+				     timer)->task : NULL;
+	if (task && latency > 0)
+		task->timer_offset = latency;
+}
+#else
+static inline void latency_hrtimer_timing_start(struct hrtimer *timer,
+					 struct hrtimer_clock_base *new_base,
+					 ktime_t tim)
+{
+}
+static inline void latency_hrtimer_timing_stop(struct hrtimer *timer,
+						ktime_t basenow)
+{
+}
+
+#endif
+
 /**
  * hrtimer_start_range_ns - (re)start an hrtimer on the current CPU
  * @timer:	the timer to be added
@@ -992,6 +1035,8 @@  void hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
 
 	timer_stats_hrtimer_set_start_info(timer);
 
+	latency_hrtimer_timing_start(timer, new_base, tim);
+
 	leftmost = enqueue_hrtimer(timer, new_base);
 	if (!leftmost)
 		goto unlock;
@@ -1284,6 +1329,8 @@  static void __hrtimer_run_queues(struct hrtimer_cpu_base *cpu_base, ktime_t now)
 
 			timer = container_of(node, struct hrtimer, node);
 
+			latency_hrtimer_timing_stop(timer, basenow);
+
 			/*
 			 * The immediate goal for using the softexpires is
 			 * minimizing wakeups, not running timers at the