diff mbox

[06/10] time: Cap clocksource reads to the clocksource max_cycles value

Message ID 1420850068-27828-7-git-send-email-john.stultz@linaro.org
State New
Headers show

Commit Message

John Stultz Jan. 10, 2015, 12:34 a.m. UTC
When calculating the current delta since the last tick, we
currently have no hard protections to prevent a multiplciation
overflow from ocurring.

This patch introduces such a cap that limits the read delta
value to the max_cycles value, which is where an overflow would
occur.

There was some concern that capping time like this could cause
problems as we may stop expiring timers, which could go circular
if the timer that triggers time accumulation were misscheduled
too far in the future, which would cause time to stop.

However, since the mult overflow would result in a smaller time
value, we would effectively have the same problem there.

Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 kernel/time/timekeeping.c | 9 +++++++++
 1 file changed, 9 insertions(+)

Comments

John Stultz Jan. 12, 2015, 6:54 p.m. UTC | #1
On Sun, Jan 11, 2015 at 4:41 AM, Richard Cochran
<richardcochran@gmail.com> wrote:
> On Fri, Jan 09, 2015 at 04:34:24PM -0800, John Stultz wrote:
>> When calculating the current delta since the last tick, we
>> currently have no hard protections to prevent a multiplciation
>> overflow from ocurring.
>
> This is just papering over the problem. The "hard protection" should
> be having a tick scheduled before the range of the clock source is
> exhausted.

So I disagree this is papering over the problem.

You say the tick should be scheduled before the clocksource wraps -
but we have logic to do that.

However there are many ways that can still go wrong.  Virtualization
can delay interrupts for long periods of time, the timer/irq code
isn't the simplest and there can be bugs, or timer hardware itself can
have issues. The difficulty is that when something has gone wrong, the
only thing we have to measure the problem may become corrupted.  And
worse, once the timekeeping code is having problems,  that can result
in bugs that manifest in all sorts of strange ways that are very
difficult to debug (you can't trust your log timestamps, etc).

So I think having some extra measures of protection is useful here.

I'll admit that its always difficult to manage, since we have to layer
our checks, we have circular dependencies (timer code needs
timekeeping to be correct, timekeeping code needs timer code to be
correct), and hardware problems are rampant - so we get trouble like
the clocksource watchdog which uses more trustworthy clocksources to
watch less trustworthy ones, but then hardware starts adding bugs to
the trustworthy ones which cause false positives, etc.   And these
checks make the code add complexity to the code that we'd be happier
without, but we can't throw out supporting the majority of hardware
that have some quirk and imperfection, so I'm not sure what the
alternative should be.

thanks
-john
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
John Stultz Jan. 13, 2015, 9:33 p.m. UTC | #2
On Tue, Jan 13, 2015 at 3:11 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, Jan 09, 2015 at 04:34:24PM -0800, John Stultz wrote:
>> When calculating the current delta since the last tick, we
>> currently have no hard protections to prevent a multiplciation
>> overflow from ocurring.
>>
>> This patch introduces such a cap that limits the read delta
>> value to the max_cycles value, which is where an overflow would
>> occur.
>
>> +++ b/kernel/time/timekeeping.c
>> @@ -202,6 +202,9 @@ static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
>>       /* calculate the delta since the last update_wall_time: */
>>       delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask);
>>
>> +     /* Cap delta value to the max_cycles values to avoid mult overflows */
>> +     delta = min(delta, tkr->clock->max_cycles);
>> +
>>       nsec = delta * tkr->mult + tkr->xtime_nsec;
>>       nsec >>= tkr->shift;
>>
>
> So while I appreciate stuff can be broken, should we not at least keep
> track of this brokenness? That is, we all agree bad things happened IF
> we actually hit this, right? So should we then not inform people that
> bad things did happen?

So since this is a time reading function, this could be called
anywhere. So I'm hesitant to try to printk anything in such a hot
path. Though, if we catch such a large delta during the timekeeping
update function, we will print a warning (which is done in an earlier
patch in the series).

Were you thinking of something else maybe? I guess we could set a flag
and then print later (if there is a later), but we'd lose much of the
context of what went wrong.

thanks
-john
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
John Stultz Jan. 22, 2015, 8:55 p.m. UTC | #3
On Wed, Jan 14, 2015 at 1:35 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, Jan 13, 2015 at 01:33:29PM -0800, John Stultz wrote:
>> On Tue, Jan 13, 2015 at 3:11 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>> > On Fri, Jan 09, 2015 at 04:34:24PM -0800, John Stultz wrote:
>> >> When calculating the current delta since the last tick, we
>> >> currently have no hard protections to prevent a multiplciation
>> >> overflow from ocurring.
>> >>
>> >> This patch introduces such a cap that limits the read delta
>> >> value to the max_cycles value, which is where an overflow would
>> >> occur.
>> >
>> >> +++ b/kernel/time/timekeeping.c
>> >> @@ -202,6 +202,9 @@ static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
>> >>       /* calculate the delta since the last update_wall_time: */
>> >>       delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask);
>> >>
>> >> +     /* Cap delta value to the max_cycles values to avoid mult overflows */
>> >> +     delta = min(delta, tkr->clock->max_cycles);
>> >> +
>> >>       nsec = delta * tkr->mult + tkr->xtime_nsec;
>> >>       nsec >>= tkr->shift;
>> >>
>> >
>> > So while I appreciate stuff can be broken, should we not at least keep
>> > track of this brokenness? That is, we all agree bad things happened IF
>> > we actually hit this, right? So should we then not inform people that
>> > bad things did happen?
>>
>> So since this is a time reading function, this could be called
>> anywhere. So I'm hesitant to try to printk anything in such a hot
>> path. Though, if we catch such a large delta during the timekeeping
>> update function, we will print a warning (which is done in an earlier
>> patch in the series).
>>
>> Were you thinking of something else maybe? I guess we could set a flag
>> and then print later (if there is a later), but we'd lose much of the
>> context of what went wrong.
>
> Maybe a stats counter? In any case, keeping it silent seems the wrong
> thing.

So I spent a bit of time looking into if we could track if we were
actually seeing underflows, and to do some rate limited printing in a
different context to provide some warning.

Luckily this found one bug with my underflow checking logic (falsely
triggering if delta was zero), but even then I was occasionally seeing
reports of underflows on my test systems using the hpet (as well as
other clocksources). I added a bunch of extra debug logic and started
to see really strange underflow behavior, where hardware reads were
returning values slightly before cycle_last.

Then after banging my head on it a bit with a knot in my stomach that
I've uncovered a bigger issue, I realized that we're in a seqlock
loop. Duh. So the data reads are done optimistically and we throw away
any reads that might have raced with the writer. So I was doing a
hardware read, then having an interrupt land and update the cycle_last
state with even newer time, then go back see the negative delta, flag
to report a warning, then throw all the data away because we noticed
the seqlock was updated.

So this makes it a little hard to have the protective capping /
underflow logic also trigger a warning, because they may trigger in
many cases where the data is all junk and we're going to repeat the
calculation anyway. Its also a bit nested down a few functions below
where we actually take the seqlock, so we'd probably have to leave the
capping alone and then find some way to capture and sanity check the
data points after we left the lock. That gets a bit ugly.

I'll see if I can't sort out something better, but I'll probably drop
the reporting patch I was working on before resending this series.

thanks
-john
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
diff mbox

Patch

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 0dcceba..9740fd8 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -202,6 +202,9 @@  static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
 	/* calculate the delta since the last update_wall_time: */
 	delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask);
 
+	/* Cap delta value to the max_cycles values to avoid mult overflows */
+	delta = min(delta, tkr->clock->max_cycles);
+
 	nsec = delta * tkr->mult + tkr->xtime_nsec;
 	nsec >>= tkr->shift;
 
@@ -221,6 +224,9 @@  static inline s64 timekeeping_get_ns_raw(struct timekeeper *tk)
 	/* calculate the delta since the last update_wall_time: */
 	delta = clocksource_delta(cycle_now, tk->tkr.cycle_last, tk->tkr.mask);
 
+	/* Cap delta value to the max_cycles values to avoid mult overflows */
+	delta = min(delta, tk->tkr.clock->max_cycles);
+
 	/* convert delta to nanoseconds. */
 	nsec = clocksource_cyc2ns(delta, clock->mult, clock->shift);
 
@@ -482,6 +488,9 @@  static void timekeeping_forward_now(struct timekeeper *tk)
 	delta = clocksource_delta(cycle_now, tk->tkr.cycle_last, tk->tkr.mask);
 	tk->tkr.cycle_last = cycle_now;
 
+	/* Cap delta value to the max_cycles values to avoid mult overflows */
+	delta = min(delta, tk->tkr.clock->max_cycles);
+
 	tk->tkr.xtime_nsec += delta * tk->tkr.mult;
 
 	/* If arch requires, add in get_arch_timeoffset() */