diff mbox series

pstore: use ktime_get_real_fast_ns() instead of __getnstimeofday()

Message ID 20171108160056.3042976-1-arnd@arndb.de
State Superseded
Headers show
Series pstore: use ktime_get_real_fast_ns() instead of __getnstimeofday() | expand

Commit Message

Arnd Bergmann Nov. 8, 2017, 4 p.m. UTC
I noticed that __getnstimeofday() is a rather odd interface, with
a number of quirks:

- The caller may come from NMI context, but the implementation is not NMI safe
- The calling conventions are different from any other timekeeping functions
- The naming doesn't fit into the 'ktime_get_*()' scheme

This addresses the above issues by using a completely different
method to get the time: ktime_get_real_fast_ns() is NMI safe and
doesn't print a warning when called with the timekeeping suspended,
and we can easily transform the result into a timespec structure. Since
ktime_get_real_fast_ns() was not exported to modules, I also add the
export.

The behavior for the suspended case changes, we now return the time
if we can find it out rather than setting it to zero. We could still
change that but this would require exporting the 'timekeeping_suspended'
to modules.

I'm not trying to address y2038-safety at this point, but plan to
do that later with a more complex patch.

Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

---
It would be helpful to get this merged into v4.15-rc1, since I have
some other cleanups that depend on this. Please have a look to see if
this makes sense to you.
---
 fs/pstore/platform.c      | 5 +----
 kernel/time/timekeeping.c | 1 +
 2 files changed, 2 insertions(+), 4 deletions(-)

-- 
2.9.0

Comments

Kees Cook Nov. 9, 2017, 12:55 a.m. UTC | #1
On Wed, Nov 8, 2017 at 8:00 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> I noticed that __getnstimeofday() is a rather odd interface, with

> a number of quirks:

>

> - The caller may come from NMI context, but the implementation is not NMI safe

> - The calling conventions are different from any other timekeeping functions

> - The naming doesn't fit into the 'ktime_get_*()' scheme

>

> This addresses the above issues by using a completely different

> method to get the time: ktime_get_real_fast_ns() is NMI safe and

> doesn't print a warning when called with the timekeeping suspended,

> and we can easily transform the result into a timespec structure. Since

> ktime_get_real_fast_ns() was not exported to modules, I also add the

> export.

>

> The behavior for the suspended case changes, we now return the time

> if we can find it out rather than setting it to zero. We could still

> change that but this would require exporting the 'timekeeping_suspended'

> to modules.

>

> I'm not trying to address y2038-safety at this point, but plan to

> do that later with a more complex patch.

>

> Cc: Kees Cook <keescook@chromium.org>

> Signed-off-by: Arnd Bergmann <arnd@arndb.de>

> ---

> It would be helpful to get this merged into v4.15-rc1, since I have

> some other cleanups that depend on this. Please have a look to see if

> this makes sense to you.


As long as this is sane to call when timekeeping is not running, I'm
happy to take the patch. Would you prefer I take this via pstore or do
you have another tree it should go through?

Acked-by: Kees Cook <keescook@chromium.org>


-Kees

> ---

>  fs/pstore/platform.c      | 5 +----

>  kernel/time/timekeeping.c | 1 +

>  2 files changed, 2 insertions(+), 4 deletions(-)

>

> diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c

> index ec7199e859d2..086e491faf04 100644

> --- a/fs/pstore/platform.c

> +++ b/fs/pstore/platform.c

> @@ -482,10 +482,7 @@ void pstore_record_init(struct pstore_record *record,

>         record->psi = psinfo;

>

>         /* Report zeroed timestamp if called before timekeeping has resumed. */

> -       if (__getnstimeofday(&record->time)) {

> -               record->time.tv_sec = 0;

> -               record->time.tv_nsec = 0;

> -       }

> +       record->time = ns_to_timespec(ktime_get_real_fast_ns());

>  }

>

>  /*

> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c

> index 353f7bd1eeb0..198afa78bf69 100644

> --- a/kernel/time/timekeeping.c

> +++ b/kernel/time/timekeeping.c

> @@ -528,6 +528,7 @@ u64 ktime_get_real_fast_ns(void)

>  {

>         return __ktime_get_real_fast_ns(&tk_fast_mono);

>  }

> +EXPORT_SYMBOL_GPL(ktime_get_real_fast_ns);

>

>  /**

>   * halt_fast_timekeeper - Prevent fast timekeeper from accessing clocksource.

> --

> 2.9.0

>




-- 
Kees Cook
Pixel Security
Arnd Bergmann Nov. 9, 2017, 12:54 p.m. UTC | #2
On Thu, Nov 9, 2017 at 1:55 AM, Kees Cook <keescook@chromium.org> wrote:
> On Wed, Nov 8, 2017 at 8:00 AM, Arnd Bergmann <arnd@arndb.de> wrote:

>> I noticed that __getnstimeofday() is a rather odd interface, with

>> a number of quirks:

>>

>> - The caller may come from NMI context, but the implementation is not NMI safe

>> - The calling conventions are different from any other timekeeping functions

>> - The naming doesn't fit into the 'ktime_get_*()' scheme

>>

>> This addresses the above issues by using a completely different

>> method to get the time: ktime_get_real_fast_ns() is NMI safe and

>> doesn't print a warning when called with the timekeeping suspended,

>> and we can easily transform the result into a timespec structure. Since

>> ktime_get_real_fast_ns() was not exported to modules, I also add the

>> export.

>>

>> The behavior for the suspended case changes, we now return the time

>> if we can find it out rather than setting it to zero. We could still

>> change that but this would require exporting the 'timekeeping_suspended'

>> to modules.

>>

>> I'm not trying to address y2038-safety at this point, but plan to

>> do that later with a more complex patch.

>>

>> Cc: Kees Cook <keescook@chromium.org>

>> Signed-off-by: Arnd Bergmann <arnd@arndb.de>

>> ---

>> It would be helpful to get this merged into v4.15-rc1, since I have

>> some other cleanups that depend on this. Please have a look to see if

>> this makes sense to you.

>

> As long as this is sane to call when timekeeping is not running, I'm

> happy to take the patch.


Maybe John or Thomas can comment on this, I'm not totally sure how
reliable it is. My best guess is that it will still produce correct time stamps
a lot of the time, but it depends a bit on the clocksource hardware and
how long the timekeeping has been suspended. As far as I can tell,

- if the clocksource register contents wrap around, the reported time
  might appear to go back to the time of the last timer interrupt. The
  shortest time I could find for an overflow is a 16-bit timer running at
  32khz on i.MX1, overflowing every two seconds.
- if reading a suspended clocksource returns zero (or another incorrect
  value) the time might be in the far future
- if reading a suspended clocksource causes a hang or crash, you
  lose.

the latter two could probably be considered bugs in the respective
clocksource drivers, and it's possible that this doesn't actually happen
on any of them, but we could in theory work around it by not
reading the a suspended clocksource at all and returning the
last known time from the interrupt.

I suppose we could even make ktime_get_*_fast_ns() always
return zero when the timekeeping is suspended, but I don't
know if that would hurt the other callers.

> Would you prefer I take this via pstore or do

> you have another tree it should go through?

>

> Acked-by: Kees Cook <keescook@chromium.org>


The pstore tree is probably best, but going through -tip would work
as well I think.

     Arnd
Thomas Gleixner Nov. 9, 2017, 11 p.m. UTC | #3
On Thu, 9 Nov 2017, Arnd Bergmann wrote:
> On Thu, Nov 9, 2017 at 1:55 AM, Kees Cook <keescook@chromium.org> wrote:

> > On Wed, Nov 8, 2017 at 8:00 AM, Arnd Bergmann <arnd@arndb.de> wrote:

> >> I noticed that __getnstimeofday() is a rather odd interface, with

> >> a number of quirks:

> >>

> >> - The caller may come from NMI context, but the implementation is not NMI safe


Hmm, no. None of the regular accessor functions can be called from NMI
context safely.

> >> - The calling conventions are different from any other timekeeping functions


Yes, that was done back then to actually return the value (which is
possible with TSC) and indicate at the same time that timekeeping is
suspended.

> >> - The naming doesn't fit into the 'ktime_get_*()' scheme


Right, because it's not in the ktime_get_() family of functions.

> > As long as this is sane to call when timekeeping is not running, I'm

> > happy to take the patch.

> 

> Maybe John or Thomas can comment on this, I'm not totally sure how

> reliable it is. My best guess is that it will still produce correct time stamps

> a lot of the time, but it depends a bit on the clocksource hardware and

> how long the timekeeping has been suspended. As far as I can tell,

> 

> - if the clocksource register contents wrap around, the reported time

>   might appear to go back to the time of the last timer interrupt. The

>   shortest time I could find for an overflow is a 16-bit timer running at

>   32khz on i.MX1, overflowing every two seconds.

> - if reading a suspended clocksource returns zero (or another incorrect

>   value) the time might be in the far future

> - if reading a suspended clocksource causes a hang or crash, you

>   lose.


None of those problems exist for the fast NMI safe accessors. In
timekeeping_suspend() we store the current time and any access to the fast
timekeeper returns exactly the time at suspend up to the point where
timekeeping is resumed. See halt_fast_timekeeper().

So all you get between timekeeping_suspend() and timekeeping_resume() is
a stale timestamp. No backwards, no crash and burn.

The normal timekeeping accessor functions cannot be called between
timekeeping_suspend() and timekeeping_resume() at all. They will emit a
warning and can indeed crash and burn in one of the ways you described
above. This does not happen on x86 because the TSC will just work on
systems with pstore.

Thanks,

	tglx
Arnd Bergmann Nov. 9, 2017, 11:43 p.m. UTC | #4
On Fri, Nov 10, 2017 at 12:00 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Thu, 9 Nov 2017, Arnd Bergmann wrote:

>> On Thu, Nov 9, 2017 at 1:55 AM, Kees Cook <keescook@chromium.org> wrote:

>> > On Wed, Nov 8, 2017 at 8:00 AM, Arnd Bergmann <arnd@arndb.de> wrote:

>> >> I noticed that __getnstimeofday() is a rather odd interface, with

>> >> a number of quirks:

>> >>

>> >> - The caller may come from NMI context, but the implementation is not NMI safe

>

> Hmm, no. None of the regular accessor functions can be called from NMI

> context safely.


Right, that's what I mean: it must not get called from NMI context, but it
currently is, at least for this case:

NMI handler:
  something bad
    panic()
      kmsg_dump()
        pstore_dump()
           pstore_record_init()
             __getnstimeofday()

I should probably add that to the changelog text ;-)

>> >> - The calling conventions are different from any other timekeeping functions

>

> Yes, that was done back then to actually return the value (which is

> possible with TSC) and indicate at the same time that timekeeping is

> suspended.

>

>> >> - The naming doesn't fit into the 'ktime_get_*()' scheme

>

> Right, because it's not in the ktime_get_() family of functions.


For the background on how I got here: I'm currently trying to go through
all functions in include/linux/timekeeping32.h with the intention of
replacing them with appropriate ktime_get_*() interfaces that don't have
the y2038 overflow problem. I considered just using
the existing __getnstimeofday64() here, which would be a trivial
change, but then I noticed the NMI problem.

Also, I have a related patch series that renames getrawmonotonic64(),
current_kernel_time64() and get_monotonic_coarse64() to
ktime_get_raw_ts64(), ktime_get_coarse_real_ts64() and
ktime_get_coarse_ts64(), for consistency, but then I couldn't
come up with a good name for __getnstimeofday64(), as the
__ktime_get_*() naming is already used for a number of other
things and I did not want to overload that more. Completely
removing __getnstimeofday64() would be handier here.

>> > As long as this is sane to call when timekeeping is not running, I'm

>> > happy to take the patch.

>>

>> Maybe John or Thomas can comment on this, I'm not totally sure how

>> reliable it is. My best guess is that it will still produce correct time stamps

>> a lot of the time, but it depends a bit on the clocksource hardware and

>> how long the timekeeping has been suspended. As far as I can tell,

>>

>> - if the clocksource register contents wrap around, the reported time

>>   might appear to go back to the time of the last timer interrupt. The

>>   shortest time I could find for an overflow is a 16-bit timer running at

>>   32khz on i.MX1, overflowing every two seconds.

>> - if reading a suspended clocksource returns zero (or another incorrect

>>   value) the time might be in the far future

>> - if reading a suspended clocksource causes a hang or crash, you

>>   lose.

>

> None of those problems exist for the fast NMI safe accessors. In

> timekeeping_suspend() we store the current time and any access to the fast

> timekeeper returns exactly the time at suspend up to the point where

> timekeeping is resumed. See halt_fast_timekeeper().

>

> So all you get between timekeeping_suspend() and timekeeping_resume() is

> a stale timestamp. No backwards, no crash and burn.


Ah, perfect, then my patch should be fine.
I think I had read that part of the code before, but then forgotten about it.

> The normal timekeeping accessor functions cannot be called between

> timekeeping_suspend() and timekeeping_resume() at all. They will emit a

> warning and can indeed crash and burn in one of the ways you described

> above. This does not happen on x86 because the TSC will just work on

> systems with pstore.


Sure, except for __getnstimeofday64(), which will intentionally not warn but
could crash in the clocksource driver (on non-x86). We do ignore the result
from __getnstimeofday64() when timekeeping is suspended, but only after
we call into the clocksource driver.

       Arnd
Thomas Gleixner Nov. 10, 2017, 12:46 a.m. UTC | #5
On Fri, 10 Nov 2017, Arnd Bergmann wrote:
> On Fri, Nov 10, 2017 at 12:00 AM, Thomas Gleixner <tglx@linutronix.de> wrote:

> > Hmm, no. None of the regular accessor functions can be called from NMI

> > context safely.

> 

> Right, that's what I mean: it must not get called from NMI context, but it

> currently is, at least for this case:

> 

> NMI handler:

>   something bad

>     panic()

>       kmsg_dump()

>         pstore_dump()

>            pstore_record_init()

>              __getnstimeofday()

> 

> I should probably add that to the changelog text ;-)


Indeed.

> Also, I have a related patch series that renames getrawmonotonic64(),

> current_kernel_time64() and get_monotonic_coarse64() to

> ktime_get_raw_ts64(), ktime_get_coarse_real_ts64() and

> ktime_get_coarse_ts64(), for consistency, but then I couldn't

> come up with a good name for __getnstimeofday64(), as the

> __ktime_get_*() naming is already used for a number of other

> things and I did not want to overload that more. Completely

> removing __getnstimeofday64() would be handier here.


Oh yes, it's an abomination.

> > The normal timekeeping accessor functions cannot be called between

> > timekeeping_suspend() and timekeeping_resume() at all. They will emit a

> > warning and can indeed crash and burn in one of the ways you described

> > above. This does not happen on x86 because the TSC will just work on

> > systems with pstore.

> 

> Sure, except for __getnstimeofday64(), which will intentionally not warn but

> could crash in the clocksource driver (on non-x86). We do ignore the result

> from __getnstimeofday64() when timekeeping is suspended, but only after

> we call into the clocksource driver.


Right, let's get rid of it before it grows another user.

Thanks,

	tglx
Kees Cook Nov. 10, 2017, 1:04 a.m. UTC | #6
On Thu, Nov 9, 2017 at 4:46 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> On Fri, 10 Nov 2017, Arnd Bergmann wrote:

>> On Fri, Nov 10, 2017 at 12:00 AM, Thomas Gleixner <tglx@linutronix.de> wrote:

>> > Hmm, no. None of the regular accessor functions can be called from NMI

>> > context safely.

>>

>> Right, that's what I mean: it must not get called from NMI context, but it

>> currently is, at least for this case:

>>

>> NMI handler:

>>   something bad

>>     panic()

>>       kmsg_dump()

>>         pstore_dump()

>>            pstore_record_init()

>>              __getnstimeofday()

>>

>> I should probably add that to the changelog text ;-)

>

> Indeed.


Er, so, is this safe to call there? I've had to fix this a few times
now, so if using ktime_get_real_fast_ns() can be used here (and
doesn't return 0) then this is easily an improvement over the existing
"maybe read 0" case pstore has now.

-Kees


-- 
Kees Cook
Pixel Security
Thomas Gleixner Nov. 10, 2017, 7:04 a.m. UTC | #7
On Thu, 9 Nov 2017, Kees Cook wrote:
> On Thu, Nov 9, 2017 at 4:46 PM, Thomas Gleixner <tglx@linutronix.de> wrote:

> > On Fri, 10 Nov 2017, Arnd Bergmann wrote:

> >> On Fri, Nov 10, 2017 at 12:00 AM, Thomas Gleixner <tglx@linutronix.de> wrote:

> >> > Hmm, no. None of the regular accessor functions can be called from NMI

> >> > context safely.

> >>

> >> Right, that's what I mean: it must not get called from NMI context, but it

> >> currently is, at least for this case:

> >>

> >> NMI handler:

> >>   something bad

> >>     panic()

> >>       kmsg_dump()

> >>         pstore_dump()

> >>            pstore_record_init()

> >>              __getnstimeofday()

> >>

> >> I should probably add that to the changelog text ;-)

> >

> > Indeed.

> 

> Er, so, is this safe to call there? I've had to fix this a few times

> now, so if using ktime_get_real_fast_ns() can be used here (and

> doesn't return 0) then this is easily an improvement over the existing

> "maybe read 0" case pstore has now.


ktime_get_real_fast_ns() is NMI safe and returns

before timekeeping_suspend():	correct time

after timekeeping_suspend():	timestamp which was frozen in
      				timekeeping_suspend()

after timekeeping_resume():	correct time

Thanks,

	tglx
diff mbox series

Patch

diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c
index ec7199e859d2..086e491faf04 100644
--- a/fs/pstore/platform.c
+++ b/fs/pstore/platform.c
@@ -482,10 +482,7 @@  void pstore_record_init(struct pstore_record *record,
 	record->psi = psinfo;
 
 	/* Report zeroed timestamp if called before timekeeping has resumed. */
-	if (__getnstimeofday(&record->time)) {
-		record->time.tv_sec = 0;
-		record->time.tv_nsec = 0;
-	}
+	record->time = ns_to_timespec(ktime_get_real_fast_ns());
 }
 
 /*
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 353f7bd1eeb0..198afa78bf69 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -528,6 +528,7 @@  u64 ktime_get_real_fast_ns(void)
 {
 	return __ktime_get_real_fast_ns(&tk_fast_mono);
 }
+EXPORT_SYMBOL_GPL(ktime_get_real_fast_ns);
 
 /**
  * halt_fast_timekeeper - Prevent fast timekeeper from accessing clocksource.