mbox series

[RFC,0/4] Fixes for two recently found timekeeping bugs

Message ID 1495856035-6622-1-git-send-email-john.stultz@linaro.org
Headers show
Series Fixes for two recently found timekeeping bugs | expand

Message

John Stultz May 27, 2017, 3:33 a.m. UTC
As part of the Linaro Linux Kernel Functional Test (LKFT)
effort, test failures from kselftest/timer's
inconsistency-check were reported connected to
CLOCK_MONOTONIC_RAW, on the HiKey platform.

Digging in I found that an old issue with how sub-ns accounting
is handled with the RAW time which was fixed long ago with the
CLOCK_MONOTONIC/REALTIME ids, but missed with RAW time, was
present.

Additionally, running further tests, I uncovered an issue with
how the clocksource read function is handled when clocksources
are changed, which can cause crashes.

Both of these issues have not been uncovered in x86 based
testing due to x86 not using vDSO to accelerate
CLOCK_MONOTONIC_RAW, combined with the HiKey's arch_timer
clocksource being fast to access but incrementing slowly enough
to get multiple reads using the same counter value (which helps
uncover time handing issues), along with the fact that none of
the x86 clocksources making use of the clocksource argument
passed to the read function.

This patchset addresses these two issues. 

Thanks so much to help from Will Deacon in getting the needed
adjustments to the arm64 vDSO in place. Also thanks to Daniel
Mentz who also properly diagnosed the MONOTONIC_RAW issue in
parallel while I was woking on this patchset.

I wanted to send these out for some initial review. I'm
unfortunately still chasing a third issue (related to
inconsistencies triggered during extreme freq adjustments) I've
uncovered on HiKey, which doesn't seem to be related to the
previous two.

As always feedback would be appreciated!

thanks
-john

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Miroslav Lichvar <mlichvar@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <stephen.boyd@linaro.org>
Cc: Daniel Mentz <danielmentz@google.com>


John Stultz (3):
  time: Fix clock->read(clock) race around clocksource changes
  time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting
  time: Clean up CLOCK_MONOTONIC_RAW time handling

Will Deacon (1):
  arm64: vdso: Fix nsec handling for CLOCK_MONOTONIC_RAW

 arch/arm64/kernel/vdso.c              |  5 +-
 arch/arm64/kernel/vdso/gettimeofday.S |  1 -
 include/linux/timekeeper_internal.h   |  8 +--
 kernel/time/timekeeping.c             | 96 ++++++++++++++++++++++-------------
 4 files changed, 66 insertions(+), 44 deletions(-)

-- 
2.7.4

Comments

Ingo Molnar May 27, 2017, 7:38 a.m. UTC | #1
* John Stultz <john.stultz@linaro.org> wrote:

> As part of the Linaro Linux Kernel Functional Test (LKFT)

> effort, test failures from kselftest/timer's

> inconsistency-check were reported connected to

> CLOCK_MONOTONIC_RAW, on the HiKey platform.

> 

> Digging in I found that an old issue with how sub-ns accounting

> is handled with the RAW time which was fixed long ago with the

> CLOCK_MONOTONIC/REALTIME ids, but missed with RAW time, was

> present.

> 

> Additionally, running further tests, I uncovered an issue with

> how the clocksource read function is handled when clocksources

> are changed, which can cause crashes.

> 

> Both of these issues have not been uncovered in x86 based

> testing due to x86 not using vDSO to accelerate

> CLOCK_MONOTONIC_RAW, combined with the HiKey's arch_timer

> clocksource being fast to access but incrementing slowly enough

> to get multiple reads using the same counter value (which helps

> uncover time handing issues), along with the fact that none of

> the x86 clocksources making use of the clocksource argument

> passed to the read function.

> 

> This patchset addresses these two issues. 


AFAICS only the first two patches are fixes, the other two patches are 
cleanups/simplifications that resulted out of the debugging effort, right?

Thanks,

	Ingo
John Stultz May 27, 2017, 4:16 p.m. UTC | #2
On Sat, May 27, 2017 at 12:38 AM, Ingo Molnar <mingo@kernel.org> wrote:
>

> * John Stultz <john.stultz@linaro.org> wrote:

>

>> As part of the Linaro Linux Kernel Functional Test (LKFT)

>> effort, test failures from kselftest/timer's

>> inconsistency-check were reported connected to

>> CLOCK_MONOTONIC_RAW, on the HiKey platform.

>>

>> Digging in I found that an old issue with how sub-ns accounting

>> is handled with the RAW time which was fixed long ago with the

>> CLOCK_MONOTONIC/REALTIME ids, but missed with RAW time, was

>> present.

>>

>> Additionally, running further tests, I uncovered an issue with

>> how the clocksource read function is handled when clocksources

>> are changed, which can cause crashes.

>>

>> Both of these issues have not been uncovered in x86 based

>> testing due to x86 not using vDSO to accelerate

>> CLOCK_MONOTONIC_RAW, combined with the HiKey's arch_timer

>> clocksource being fast to access but incrementing slowly enough

>> to get multiple reads using the same counter value (which helps

>> uncover time handing issues), along with the fact that none of

>> the x86 clocksources making use of the clocksource argument

>> passed to the read function.

>>

>> This patchset addresses these two issues.

>

> AFAICS only the first two patches are fixes, the other two patches are

> cleanups/simplifications that resulted out of the debugging effort, right?


Actually the first three are fixes (ARM64 still sees discontinuities
until the vDSO is fixed), the last one is a cleanup.

thanks
-john
Ingo Molnar May 28, 2017, 8:54 a.m. UTC | #3
* John Stultz <john.stultz@linaro.org> wrote:

> On Sat, May 27, 2017 at 12:38 AM, Ingo Molnar <mingo@kernel.org> wrote:

> >

> > * John Stultz <john.stultz@linaro.org> wrote:

> >

> >> As part of the Linaro Linux Kernel Functional Test (LKFT)

> >> effort, test failures from kselftest/timer's

> >> inconsistency-check were reported connected to

> >> CLOCK_MONOTONIC_RAW, on the HiKey platform.

> >>

> >> Digging in I found that an old issue with how sub-ns accounting

> >> is handled with the RAW time which was fixed long ago with the

> >> CLOCK_MONOTONIC/REALTIME ids, but missed with RAW time, was

> >> present.

> >>

> >> Additionally, running further tests, I uncovered an issue with

> >> how the clocksource read function is handled when clocksources

> >> are changed, which can cause crashes.

> >>

> >> Both of these issues have not been uncovered in x86 based

> >> testing due to x86 not using vDSO to accelerate

> >> CLOCK_MONOTONIC_RAW, combined with the HiKey's arch_timer

> >> clocksource being fast to access but incrementing slowly enough

> >> to get multiple reads using the same counter value (which helps

> >> uncover time handing issues), along with the fact that none of

> >> the x86 clocksources making use of the clocksource argument

> >> passed to the read function.

> >>

> >> This patchset addresses these two issues.

> >

> > AFAICS only the first two patches are fixes, the other two patches are

> > cleanups/simplifications that resulted out of the debugging effort, right?

> 

> Actually the first three are fixes (ARM64 still sees discontinuities

> until the vDSO is fixed), the last one is a cleanup.


Ok, please make it more apparent in the changelog of the third patch what user 
observable mis-behavior is actually fixed by it: corrupted time, non-monotonic 
time, or something else?

Thanks,

	Ingo
Daniel Mentz May 31, 2017, 12:11 a.m. UTC | #4
I successfully tested these four patches on a HiKey (ARMv8) board
using a 4.9 kernel. I cherry picked various commits from upstream that
touched kernel/time/timekeeping.c to ensure that John's patches apply
cleanly. The inconsistency-check from kselftest that previously failed
because of CLOCK_MONOTONIC_RAW jumping backwards by 1 ns is now
passing.

Tested-by: Daniel Mentz <danielmentz@google.com>


On Tue, May 30, 2017 at 5:10 PM, Daniel Mentz <danielmentz@google.com> wrote:
> I successfully tested these four patches on a HiKey (ARMv8) board using a

> 4.9 kernel. I cherry picked various commits from upstream that touched

> kernel/time/timekeeping.c to ensure that John's patches apply cleanly. The

> inconsistency-check from kselftest that previously failed because of

> CLOCK_MONOTONIC_RAW jumping backwards by 1 ns is now passing.

>

> Tested-by: Daniel Mentz <danielmentz@google.com>

>

> On Fri, May 26, 2017 at 8:33 PM, John Stultz <john.stultz@linaro.org> wrote:

>>

>> As part of the Linaro Linux Kernel Functional Test (LKFT)

>> effort, test failures from kselftest/timer's

>> inconsistency-check were reported connected to

>> CLOCK_MONOTONIC_RAW, on the HiKey platform.

>>

>> Digging in I found that an old issue with how sub-ns accounting

>> is handled with the RAW time which was fixed long ago with the

>> CLOCK_MONOTONIC/REALTIME ids, but missed with RAW time, was

>> present.

>>

>> Additionally, running further tests, I uncovered an issue with

>> how the clocksource read function is handled when clocksources

>> are changed, which can cause crashes.

>>

>> Both of these issues have not been uncovered in x86 based

>> testing due to x86 not using vDSO to accelerate

>> CLOCK_MONOTONIC_RAW, combined with the HiKey's arch_timer

>> clocksource being fast to access but incrementing slowly enough

>> to get multiple reads using the same counter value (which helps

>> uncover time handing issues), along with the fact that none of

>> the x86 clocksources making use of the clocksource argument

>> passed to the read function.

>>

>> This patchset addresses these two issues.

>>

>> Thanks so much to help from Will Deacon in getting the needed

>> adjustments to the arm64 vDSO in place. Also thanks to Daniel

>> Mentz who also properly diagnosed the MONOTONIC_RAW issue in

>> parallel while I was woking on this patchset.

>>

>> I wanted to send these out for some initial review. I'm

>> unfortunately still chasing a third issue (related to

>> inconsistencies triggered during extreme freq adjustments) I've

>> uncovered on HiKey, which doesn't seem to be related to the

>> previous two.

>>

>> As always feedback would be appreciated!

>>

>> thanks

>> -john

>>

>> Cc: Thomas Gleixner <tglx@linutronix.de>

>> Cc: Ingo Molnar <mingo@kernel.org>

>> Cc: Miroslav Lichvar <mlichvar@redhat.com>

>> Cc: Richard Cochran <richardcochran@gmail.com>

>> Cc: Prarit Bhargava <prarit@redhat.com>

>> Cc: Stephen Boyd <stephen.boyd@linaro.org>

>> Cc: Daniel Mentz <danielmentz@google.com>

>>

>>

>> John Stultz (3):

>>   time: Fix clock->read(clock) race around clocksource changes

>>   time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting

>>   time: Clean up CLOCK_MONOTONIC_RAW time handling

>>

>> Will Deacon (1):

>>   arm64: vdso: Fix nsec handling for CLOCK_MONOTONIC_RAW

>>

>>  arch/arm64/kernel/vdso.c              |  5 +-

>>  arch/arm64/kernel/vdso/gettimeofday.S |  1 -

>>  include/linux/timekeeper_internal.h   |  8 +--

>>  kernel/time/timekeeping.c             | 96

>> ++++++++++++++++++++++-------------

>>  4 files changed, 66 insertions(+), 44 deletions(-)

>>

>> --

>> 2.7.4

>>

>