Regression: can't apply frequency offsets above 1000ppm.

Message ID CALAqxLUrHhcV2ksS8bH6p9rOi9rrXnGMa3VUirKJmkmf7Ft0Uw@mail.gmail.com
State New
Headers show

Commit Message

John Stultz Sept. 5, 2015, 12:57 a.m.
On Thu, Sep 3, 2015 at 4:26 AM, Miroslav Lichvar <mlichvar@redhat.com> wrote:
> On Wed, Sep 02, 2015 at 04:16:00PM -0700, John Stultz wrote:
>> On Tue, Sep 1, 2015 at 6:14 PM, Nuno Gonçalves <nunojpg@gmail.com> wrote:
>> > And just installing chrony from the feeds. With any kernel from 3.17
>> > you'll have wrong estimates at chronyc sourcestats.
>>
>> Wrong estimates? Could you be more specific about what the failure
>> you're seeing is here? The
>>
>> I installed the image above, which comes with a 4.1.6 kernel, and
>> chrony seems to have gotten my BBB into ~1ms sync w/ servers over the
>> internet fairly quickly (at least according to chronyc tracking).
>
> To see the bug with chronyd the initial offset shouldn't be very close
> to zero, so it's forced to correct the offset by adjusting the
> frequency in a larger step.
>
> I'm attaching a simple C program that prints the frequency offset
> as measured between the REALTIME and MONOTONIC_RAW clocks when the
> adjtimex tick is set to 9000. It should show values close to -100000
> ppm and I suspect on the BBB it will be much smaller.

So I spent some time on this late last night and this afternoon.

It was a little odd because things don't seem totally broken, but
something isn't quite right.

Digging around it seems the iterative logrithmic approximation done in
timekeeping_freqadjust() wasn't working right. Instead of making
smaller order alternating positive and negative adjustments, it was
doing strange growing adjustments for the same value that wern't large
enough to actually correct things very quickly. This made it much
slower to adapt to specified frequency values.

The odd bit, is it seems to come down to:
    tick_error = abs(tick_error);

Haven't chased down why yet, but apparently abs() isn't doing what one
would think when passed a s64 value.

Anyway, the attached patch seems to improve things for me. If you can
confirm it resolves things for you I'll run it through some additional
testing after the (long holiday) weekend is over and try to get the
fix pushed out.

Thanks again for the issue report!
-john

Comments

John Stultz Sept. 5, 2015, 1 a.m. | #1
On Fri, Sep 4, 2015 at 5:57 PM, John Stultz <john.stultz@linaro.org> wrote:
> On Thu, Sep 3, 2015 at 4:26 AM, Miroslav Lichvar <mlichvar@redhat.com> wrote:
>> On Wed, Sep 02, 2015 at 04:16:00PM -0700, John Stultz wrote:
>>> On Tue, Sep 1, 2015 at 6:14 PM, Nuno Gonçalves <nunojpg@gmail.com> wrote:
>>> > And just installing chrony from the feeds. With any kernel from 3.17
>>> > you'll have wrong estimates at chronyc sourcestats.
>>>
>>> Wrong estimates? Could you be more specific about what the failure
>>> you're seeing is here? The
>>>
>>> I installed the image above, which comes with a 4.1.6 kernel, and
>>> chrony seems to have gotten my BBB into ~1ms sync w/ servers over the
>>> internet fairly quickly (at least according to chronyc tracking).
>>
>> To see the bug with chronyd the initial offset shouldn't be very close
>> to zero, so it's forced to correct the offset by adjusting the
>> frequency in a larger step.
>>
>> I'm attaching a simple C program that prints the frequency offset
>> as measured between the REALTIME and MONOTONIC_RAW clocks when the
>> adjtimex tick is set to 9000. It should show values close to -100000
>> ppm and I suspect on the BBB it will be much smaller.
>
> So I spent some time on this late last night and this afternoon.
>
> It was a little odd because things don't seem totally broken, but
> something isn't quite right.
>
> Digging around it seems the iterative logrithmic approximation done in
> timekeeping_freqadjust() wasn't working right. Instead of making
> smaller order alternating positive and negative adjustments, it was
> doing strange growing adjustments for the same value that wern't large
> enough to actually correct things very quickly. This made it much
> slower to adapt to specified frequency values.
>
> The odd bit, is it seems to come down to:
>     tick_error = abs(tick_error);
>
> Haven't chased down why yet, but apparently abs() isn't doing what one
> would think when passed a s64 value.

Well.. chasing it down wasn't hard.. from include/linux/kernel.h:
/*
 * abs() handles unsigned and signed longs, ints, shorts and chars.  For all
 * input types abs() returns a signed long.
 * abs() should not be used for 64-bit types (s64, u64, long long) - use abs64()
 * for those.
 */

Ouch.
-john
--
To unsubscribe from this list: send the line "unsubscribe stable" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
John Stultz Sept. 9, 2015, 12:52 a.m. | #2
On Sat, Sep 5, 2015 at 6:41 AM, Nuno Gonçalves <nunojpg@gmail.com> wrote:
> Considering your last message I just tried to use abs64 instead. Fixes
> the problem for me.

Yea, I've since simplified my patch in the same way.

Still looking good? Can I get a Tested-by: from you?

thanks
-john
--
To unsubscribe from this list: send the line "unsubscribe stable" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nuno Gonçalves Sept. 9, 2015, 1 a.m. | #3
Yes please.

Tested-by: Nuno Goncalves <nunojpg@gmail.com>

Thanks,
Nuno

On Wed, Sep 9, 2015 at 1:52 AM, John Stultz <john.stultz@linaro.org> wrote:
> On Sat, Sep 5, 2015 at 6:41 AM, Nuno Gonçalves <nunojpg@gmail.com> wrote:
>> Considering your last message I just tried to use abs64 instead. Fixes
>> the problem for me.
>
> Yea, I've since simplified my patch in the same way.
>
> Still looking good? Can I get a Tested-by: from you?
>
> thanks
> -john
--
To unsubscribe from this list: send the line "unsubscribe stable" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index bca3667..81dc975 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1586,7 +1586,7 @@  static __always_inline void timekeeping_freqadjust(struct timekeeper *tk,
 	s64 interval = tk->cycle_interval;
 	s64 xinterval = tk->xtime_interval;
 	s64 tick_error;
-	bool negative;
+	bool negative = 0;
 	u32 adj;
 
 	/* Remove any current error adj from freq calculation */
@@ -1604,10 +1604,12 @@  static __always_inline void timekeeping_freqadjust(struct timekeeper *tk,
 		return;
 
 	/* preserve the direction of correction */
-	negative = (tick_error < 0);
+	if (tick_error < 0) {
+		tick_error = -tick_error;
+		negative = 1;
+	}
 
 	/* Sort out the magnitude of the correction */
-	tick_error = abs(tick_error);
 	for (adj = 0; tick_error > interval; adj++)
 		tick_error >>= 1;