[19/21] clocksource: Improve comment explaining clocks_calc_max_nsecs()'s 50% safety margin

Message ID 1427945681-29972-20-git-send-email-john.stultz@linaro.org
State New
Headers show

Commit Message

John Stultz April 2, 2015, 3:34 a.m.
Ingo noted that the description of clocks_calc_max_nsecs()'s
50% safety margin was somewhat circular. So this patch tries
to improve the comment to better explain what we mean by the
50% safety margin and why we need it.

Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 kernel/time/clocksource.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

Comments

John Stultz April 2, 2015, 5:30 p.m. | #1
On Thu, Apr 2, 2015 at 1:50 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, Apr 01, 2015 at 08:34:39PM -0700, John Stultz wrote:
>> Ingo noted that the description of clocks_calc_max_nsecs()'s
>> 50% safety margin was somewhat circular. So this patch tries
>> to improve the comment to better explain what we mean by the
>> 50% safety margin and why we need it.
>>
>> Cc: Ingo Molnar <mingo@kernel.org>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Peter Zijlstra <peterz@infradead.org>
>> Cc: Prarit Bhargava <prarit@redhat.com>
>> Cc: Richard Cochran <richardcochran@gmail.com>
>> Signed-off-by: John Stultz <john.stultz@linaro.org>
>> ---
>>  kernel/time/clocksource.c | 7 +++++--
>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
>> index c3be3c7..15facb1 100644
>> --- a/kernel/time/clocksource.c
>> +++ b/kernel/time/clocksource.c
>> @@ -472,8 +472,11 @@ static u32 clocksource_max_adjustment(struct clocksource *cs)
>>   * @max_cyc: maximum cycle value before potential overflow (does not include
>>   *           any safety margin)
>>   *
>> - * NOTE: This function includes a safety margin of 50%, so that bad clock values
>> - * can be detected.
>> + * NOTE: This function includes a safety margin of 50%, in other words, we
>> + * return half the number of nanoseconds the hardware counter can technically
>> + * cover. This is done so that we can potentially detect problems caused by
>> + * delayed timers or bad hardware, which might result in time intervals that
>> + * are larger then what the math used can handle without overflows.
>>   */
>>  u64 clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask, u64 *max_cyc)
>>  {
>
> Should we make a further note that the tk_fast things rely on this
> slack since they're not strongly serialized against this? That is, they
> can end up using an older cycle_last value and therefore end up with a
> larger delta than other code.

Though, even with the tk_fast bits, we expect the update to happen
regularly, its just that for the benefit of lock-free access we are ok
with the possible slight inconsistencies (in the mono clock) that
could happen if we use a slightly stale value mid-update. So I don't
think the tk_fast bits are actually relying on the slack any more then
the normal timekeeping code relies on the slack to handle slight
delays in processing the updates.  If we deal with time deltas large
enough to cause overflows, or time intervals larger then the hardware
can represent, we're sunk in either case. This 50% margin just makes
it easier to catch unexpected delays or issues.

thanks
-john
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
John Stultz April 2, 2015, 6:41 p.m. | #2
On Thu, Apr 2, 2015 at 11:34 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, Apr 02, 2015 at 10:30:18AM -0700, John Stultz wrote:
>> > Should we make a further note that the tk_fast things rely on this
>> > slack since they're not strongly serialized against this? That is, they
>> > can end up using an older cycle_last value and therefore end up with a
>> > larger delta than other code.
>>
>> Though, even with the tk_fast bits, we expect the update to happen
>> regularly, its just that for the benefit of lock-free access we are ok
>> with the possible slight inconsistencies (in the mono clock) that
>> could happen if we use a slightly stale value mid-update. So I don't
>> think the tk_fast bits are actually relying on the slack any more then
>> the normal timekeeping code relies on the slack to handle slight
>> delays in processing the updates.  If we deal with time deltas large
>> enough to cause overflows, or time intervals larger then the hardware
>> can represent, we're sunk in either case. This 50% margin just makes
>> it easier to catch unexpected delays or issues.
>
> Right, so you're saying that even though the fast bits will see slightly
> larger deltas than the normal code, they should still not get anywhere
> near the 50% because we update much more frequently?

Well, they may get to 50% or slightly over (since 50% is the max nohz
idle length), but that's likely rare, and we shouldn't get anywhere
close to real failure edges (100% be it the mult-overflow or hardware
mask limit).

thanks
-john
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
John Stultz April 2, 2015, 6:50 p.m. | #3
On Thu, Apr 2, 2015 at 11:43 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, Apr 02, 2015 at 11:41:15AM -0700, John Stultz wrote:
>> On Thu, Apr 2, 2015 at 11:34 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>> > On Thu, Apr 02, 2015 at 10:30:18AM -0700, John Stultz wrote:
>> >> > Should we make a further note that the tk_fast things rely on this
>> >> > slack since they're not strongly serialized against this? That is, they
>> >> > can end up using an older cycle_last value and therefore end up with a
>> >> > larger delta than other code.
>> >>
>> >> Though, even with the tk_fast bits, we expect the update to happen
>> >> regularly, its just that for the benefit of lock-free access we are ok
>> >> with the possible slight inconsistencies (in the mono clock) that
>> >> could happen if we use a slightly stale value mid-update. So I don't
>> >> think the tk_fast bits are actually relying on the slack any more then
>> >> the normal timekeeping code relies on the slack to handle slight
>> >> delays in processing the updates.  If we deal with time deltas large
>> >> enough to cause overflows, or time intervals larger then the hardware
>> >> can represent, we're sunk in either case. This 50% margin just makes
>> >> it easier to catch unexpected delays or issues.
>> >
>> > Right, so you're saying that even though the fast bits will see slightly
>> > larger deltas than the normal code, they should still not get anywhere
>> > near the 50% because we update much more frequently?
>>
>> Well, they may get to 50% or slightly over (since 50% is the max nohz
>> idle length), but that's likely rare, and we shouldn't get anywhere
>> close to real failure edges (100% be it the mult-overflow or hardware
>> mask limit).
>
> Right, so the fast thing rely on there being slack. That was my point
> rather. I know they'll not get to the end of slack.

Right, but all of the time code relies on there being slack, since we
can't be sure timers will not be slightly late.
For the fast-timekeper stuff, its just that the deltas could be ever
so slightly larger since the lock-free reads can land during an
update. Its not that its ok for the update to be delayed longer for
fast-timekeepers, or that we expect them to handle larger deltas then
the normal time read logic can handle.

thanks
-john
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Patch

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index c3be3c7..15facb1 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -472,8 +472,11 @@  static u32 clocksource_max_adjustment(struct clocksource *cs)
  * @max_cyc:	maximum cycle value before potential overflow (does not include
  *		any safety margin)
  *
- * NOTE: This function includes a safety margin of 50%, so that bad clock values
- * can be detected.
+ * NOTE: This function includes a safety margin of 50%, in other words, we
+ * return half the number of nanoseconds the hardware counter can technically
+ * cover. This is done so that we can potentially detect problems caused by
+ * delayed timers or bad hardware, which might result in time intervals that
+ * are larger then what the math used can handle without overflows.
  */
 u64 clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask, u64 *max_cyc)
 {