diff mbox

timekeeping: Change type of nsec variable to unsigned in its calculation.

Message ID 1479531216-25361-1-git-send-email-john.stultz@linaro.org
State New
Headers show

Commit Message

John Stultz Nov. 19, 2016, 4:53 a.m. UTC
From: Liav Rehana <liavr@mellanox.com>


During the calculation of the nsec variable in the inline function
timekeeping_delta_to_ns, it may undergo a sign extension if its msb
is set just before the shift. The sign extension may, in some cases,
gain it a value near the maximum value of the 64-bit range. This is
bad when it is later used in a division function, such as
__iter_div_u64_rem, where the amount of loops it will go through to
calculate the division will be too large. One can encounter such a
problem, for example, when trying to connect through ftp from an
outside host to the operation system. When the OS is too overloaded,
delta will get a high enough value for the msb of the sum
delta * tkr->mult + tkr->xtime_nsec to be set, and so after the
shift the nsec variable will gain a value similar to
0xffffffffff000000. Using a variable with such a value in the
inline function __iter_div_u64_rem will take too long, making the
ftp connection attempt seem to get stuck.
The following commit fixes that chance of sign extension, while
maintaining the type of the nsec variable as signed for other
functions that use this variable, for possible legit negative
time intervals.

Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Laurent Vivier <lvivier@redhat.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Cc: "Christopher S . Hall" <christopher.s.hall@intel.com>
Cc: stable@vger.kernel.org  (4.6+)
Fixes: 6bd58f09e1d8 ("time: Add cycles to nanoseconds translation")
Also-Reported-by: Chris Metcalf <cmetcalf@mellanox.com>
Signed-off-by: Liav Rehana <liavr@mellanox.com>

Signed-off-by: John Stultz <john.stultz@linaro.org>

---
Thomas/Ingo: This is for tip:timers/urgent.

 kernel/time/timekeeping.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

-- 
2.7.4

Comments

David Gibson Dec. 3, 2016, 12:33 a.m. UTC | #1
On Fri, Dec 02, 2016 at 09:36:42AM +0100, Thomas Gleixner wrote:
> On Fri, 2 Dec 2016, David Gibson wrote:

> > On Thu, Dec 01, 2016 at 12:59:51PM +0100, Thomas Gleixner wrote:

> > > So I assume that you are talking about a VM which was not scheduled by the

> > > host due to overcommitment (who ever thought that this is a good idea) or

> > > whatever other reason (yes, people were complaining about wreckage caused

> > > by stopping kernels with debuggers) for a long enough time to trigger that

> > > overflow situation. If that's the case then the unsigned conversion will

> > > just make it more unlikely but it still will happen.

> > 

> > It was essentially the stopped by debugger case.  I forget exactly

> > why, but the guest was being explicitly stopped from outside, it

> > wasn't just scheduling lag.  I think it was something in the vicinity

> > of 10 minutes stopped.

> 

> Ok. Debuggers stopping stuff is one issue, but if I understood Liav

> correctly, then he is seing the issue on a heavy loaded machine.


Right.  I can't speak to other situations which might trigger this.

> Liav, can you please describe the scenario in detail? Are you observing

> this on bare metal or in a VM which gets scheduled out long enough or was

> there debugging/hypervisor intervention involved?

> 

> > It's long enough ago that I can't be sure, but I thought we'd tried

> > various different stoppage periods, which should have also triggered

> > the unsigned overflow you're describing, and didn't observe the crash

> > once the change was applied.  Note that there have been other changes

> > to the timekeeping code since then, which might have made a

> > difference.

> > 

> > I agree that it's not reasonable for the guest to be entirely

> > unaffected by such a large stoppage: I'd have no complaints if the

> > guest time was messed up, and/or it spewed warnings.  But complete

> > guest death seems a rather more fragile response to the situation than

> > we'd like.

> 

> Guests death? Is it really dead/crashed or just stuck in that endless loop

> trying to add that huge negative value piecewise?


Well, I don't know.  But the point was it was unusable from the
console, and didn't come back any time soon.

> That's at least what Liav was describing as he mentioned

> __iter_div_u64_rem() explicitely.

> 

> While I'm less worried about debuggers, I worry about the real thing.

> 

> I agree that we should not starve after resume from a debug stop, but in

> that case the least of my worries is time going backwards.

> 

> Though if the signed mult overrun is observable in a live system, then we

> need to worry about time going backwards even with the unsigned

> conversion. Simply because once we fixed the starvation issue people with

> insane enough setups will trigger the unsigned overrun and complain about

> time going backwards.

> 

> Thanks,

> 

> 	tglx

> 

> 


-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
diff mbox

Patch

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 37dec7e..46e312e 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -299,10 +299,10 @@  u32 (*arch_gettimeoffset)(void) = default_arch_gettimeoffset;
 static inline u32 arch_gettimeoffset(void) { return 0; }
 #endif
 
-static inline s64 timekeeping_delta_to_ns(struct tk_read_base *tkr,
+static inline u64 timekeeping_delta_to_ns(struct tk_read_base *tkr,
 					  cycle_t delta)
 {
-	s64 nsec;
+	u64 nsec;
 
 	nsec = delta * tkr->mult + tkr->xtime_nsec;
 	nsec >>= tkr->shift;