Message ID | 1420850068-27828-7-git-send-email-john.stultz@linaro.org |
---|---|
State | New |
Headers | show |
On Sun, Jan 11, 2015 at 4:41 AM, Richard Cochran <richardcochran@gmail.com> wrote: > On Fri, Jan 09, 2015 at 04:34:24PM -0800, John Stultz wrote: >> When calculating the current delta since the last tick, we >> currently have no hard protections to prevent a multiplciation >> overflow from ocurring. > > This is just papering over the problem. The "hard protection" should > be having a tick scheduled before the range of the clock source is > exhausted. So I disagree this is papering over the problem. You say the tick should be scheduled before the clocksource wraps - but we have logic to do that. However there are many ways that can still go wrong. Virtualization can delay interrupts for long periods of time, the timer/irq code isn't the simplest and there can be bugs, or timer hardware itself can have issues. The difficulty is that when something has gone wrong, the only thing we have to measure the problem may become corrupted. And worse, once the timekeeping code is having problems, that can result in bugs that manifest in all sorts of strange ways that are very difficult to debug (you can't trust your log timestamps, etc). So I think having some extra measures of protection is useful here. I'll admit that its always difficult to manage, since we have to layer our checks, we have circular dependencies (timer code needs timekeeping to be correct, timekeeping code needs timer code to be correct), and hardware problems are rampant - so we get trouble like the clocksource watchdog which uses more trustworthy clocksources to watch less trustworthy ones, but then hardware starts adding bugs to the trustworthy ones which cause false positives, etc. And these checks make the code add complexity to the code that we'd be happier without, but we can't throw out supporting the majority of hardware that have some quirk and imperfection, so I'm not sure what the alternative should be. thanks -john -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
On Tue, Jan 13, 2015 at 3:11 AM, Peter Zijlstra <peterz@infradead.org> wrote: > On Fri, Jan 09, 2015 at 04:34:24PM -0800, John Stultz wrote: >> When calculating the current delta since the last tick, we >> currently have no hard protections to prevent a multiplciation >> overflow from ocurring. >> >> This patch introduces such a cap that limits the read delta >> value to the max_cycles value, which is where an overflow would >> occur. > >> +++ b/kernel/time/timekeeping.c >> @@ -202,6 +202,9 @@ static inline s64 timekeeping_get_ns(struct tk_read_base *tkr) >> /* calculate the delta since the last update_wall_time: */ >> delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask); >> >> + /* Cap delta value to the max_cycles values to avoid mult overflows */ >> + delta = min(delta, tkr->clock->max_cycles); >> + >> nsec = delta * tkr->mult + tkr->xtime_nsec; >> nsec >>= tkr->shift; >> > > So while I appreciate stuff can be broken, should we not at least keep > track of this brokenness? That is, we all agree bad things happened IF > we actually hit this, right? So should we then not inform people that > bad things did happen? So since this is a time reading function, this could be called anywhere. So I'm hesitant to try to printk anything in such a hot path. Though, if we catch such a large delta during the timekeeping update function, we will print a warning (which is done in an earlier patch in the series). Were you thinking of something else maybe? I guess we could set a flag and then print later (if there is a later), but we'd lose much of the context of what went wrong. thanks -john -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
On Wed, Jan 14, 2015 at 1:35 AM, Peter Zijlstra <peterz@infradead.org> wrote: > On Tue, Jan 13, 2015 at 01:33:29PM -0800, John Stultz wrote: >> On Tue, Jan 13, 2015 at 3:11 AM, Peter Zijlstra <peterz@infradead.org> wrote: >> > On Fri, Jan 09, 2015 at 04:34:24PM -0800, John Stultz wrote: >> >> When calculating the current delta since the last tick, we >> >> currently have no hard protections to prevent a multiplciation >> >> overflow from ocurring. >> >> >> >> This patch introduces such a cap that limits the read delta >> >> value to the max_cycles value, which is where an overflow would >> >> occur. >> > >> >> +++ b/kernel/time/timekeeping.c >> >> @@ -202,6 +202,9 @@ static inline s64 timekeeping_get_ns(struct tk_read_base *tkr) >> >> /* calculate the delta since the last update_wall_time: */ >> >> delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask); >> >> >> >> + /* Cap delta value to the max_cycles values to avoid mult overflows */ >> >> + delta = min(delta, tkr->clock->max_cycles); >> >> + >> >> nsec = delta * tkr->mult + tkr->xtime_nsec; >> >> nsec >>= tkr->shift; >> >> >> > >> > So while I appreciate stuff can be broken, should we not at least keep >> > track of this brokenness? That is, we all agree bad things happened IF >> > we actually hit this, right? So should we then not inform people that >> > bad things did happen? >> >> So since this is a time reading function, this could be called >> anywhere. So I'm hesitant to try to printk anything in such a hot >> path. Though, if we catch such a large delta during the timekeeping >> update function, we will print a warning (which is done in an earlier >> patch in the series). >> >> Were you thinking of something else maybe? I guess we could set a flag >> and then print later (if there is a later), but we'd lose much of the >> context of what went wrong. > > Maybe a stats counter? In any case, keeping it silent seems the wrong > thing. So I spent a bit of time looking into if we could track if we were actually seeing underflows, and to do some rate limited printing in a different context to provide some warning. Luckily this found one bug with my underflow checking logic (falsely triggering if delta was zero), but even then I was occasionally seeing reports of underflows on my test systems using the hpet (as well as other clocksources). I added a bunch of extra debug logic and started to see really strange underflow behavior, where hardware reads were returning values slightly before cycle_last. Then after banging my head on it a bit with a knot in my stomach that I've uncovered a bigger issue, I realized that we're in a seqlock loop. Duh. So the data reads are done optimistically and we throw away any reads that might have raced with the writer. So I was doing a hardware read, then having an interrupt land and update the cycle_last state with even newer time, then go back see the negative delta, flag to report a warning, then throw all the data away because we noticed the seqlock was updated. So this makes it a little hard to have the protective capping / underflow logic also trigger a warning, because they may trigger in many cases where the data is all junk and we're going to repeat the calculation anyway. Its also a bit nested down a few functions below where we actually take the seqlock, so we'd probably have to leave the capping alone and then find some way to capture and sanity check the data points after we left the lock. That gets a bit ugly. I'll see if I can't sort out something better, but I'll probably drop the reporting patch I was working on before resending this series. thanks -john -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index 0dcceba..9740fd8 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -202,6 +202,9 @@ static inline s64 timekeeping_get_ns(struct tk_read_base *tkr) /* calculate the delta since the last update_wall_time: */ delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask); + /* Cap delta value to the max_cycles values to avoid mult overflows */ + delta = min(delta, tkr->clock->max_cycles); + nsec = delta * tkr->mult + tkr->xtime_nsec; nsec >>= tkr->shift; @@ -221,6 +224,9 @@ static inline s64 timekeeping_get_ns_raw(struct timekeeper *tk) /* calculate the delta since the last update_wall_time: */ delta = clocksource_delta(cycle_now, tk->tkr.cycle_last, tk->tkr.mask); + /* Cap delta value to the max_cycles values to avoid mult overflows */ + delta = min(delta, tk->tkr.clock->max_cycles); + /* convert delta to nanoseconds. */ nsec = clocksource_cyc2ns(delta, clock->mult, clock->shift); @@ -482,6 +488,9 @@ static void timekeeping_forward_now(struct timekeeper *tk) delta = clocksource_delta(cycle_now, tk->tkr.cycle_last, tk->tkr.mask); tk->tkr.cycle_last = cycle_now; + /* Cap delta value to the max_cycles values to avoid mult overflows */ + delta = min(delta, tk->tkr.clock->max_cycles); + tk->tkr.xtime_nsec += delta * tk->tkr.mult; /* If arch requires, add in get_arch_timeoffset() */
When calculating the current delta since the last tick, we currently have no hard protections to prevent a multiplciation overflow from ocurring. This patch introduces such a cap that limits the read delta value to the max_cycles value, which is where an overflow would occur. There was some concern that capping time like this could cause problems as we may stop expiring timers, which could go circular if the timer that triggers time accumulation were misscheduled too far in the future, which would cause time to stop. However, since the mult overflow would result in a smaller time value, we would effectively have the same problem there. Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Stephen Boyd <sboyd@codeaurora.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: John Stultz <john.stultz@linaro.org> --- kernel/time/timekeeping.c | 9 +++++++++ 1 file changed, 9 insertions(+)