From patchwork Tue Jan  7 03:57:03 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: John Stultz <john.stultz@linaro.org>
X-Patchwork-Id: 22907
Return-Path: <patchwork-forward+bncBCJ7RPMX4EMBBGXWVWLAKGQEQ3EKHBQ@linaro.org>
X-Original-To: linaro@patches.linaro.org
Delivered-To: linaro@patches.linaro.org
Received: from mail-ie0-f197.google.com (mail-ie0-f197.google.com
 [209.85.223.197])
 by ip-10-151-82-157.ec2.internal (Postfix) with ESMTPS id 01F7023A3F
 for <linaro@patches.linaro.org>; Tue,  7 Jan 2014 03:57:15 +0000 (UTC)
Received: by mail-ie0-f197.google.com with SMTP id e14sf98010067iej.8
 for <linaro@patches.linaro.org>; Mon, 06 Jan 2014 19:57:14 -0800 (PST)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:mime-version:delivered-to:from:to:cc:subject
 :date:message-id:x-original-sender:x-original-authentication-results
 :precedence:mailing-list:list-id:list-post:list-help:list-archive
 :list-unsubscribe;
 bh=hnI9f2NDl/l7Wv+gsxGH4VvNNmwD+AL6ZMBlILkX35E=;
 b=TS1erBbR2d8SUYrxRJNqq1q3LbG5KjYiljxOT57fEf0rkwbLqub38vMLxi3ysGEbLN
 cD5bqcoDIsNzuY0M323620Vh3idrN+wntZKKFJMVqkmHB/j2cs9XFPkWMxZE1pLQd5tZ
 ksXRFh72/vmlD5khDPljOSna7+YUZqIQfr2wqZ31j7NF1I/JyatTnoeFOdtPw5m2PH8F
 6woD6go1818WMCA0Awi5jD4muSPqnm99ViCwM5nXSXeraJOIA2a5ZosGUaQ80rLVzHpa
 cEr7KD9xpAElxelCFb06SdPVme/ymbjOxEpIaJ/wvzvjc+PRcXFS3cwFEn0fbOwP9sh/
 3lxg==
X-Gm-Message-State: ALoCoQnbFv5BRUNF6vvW+FkrWC9iR7j2FBE3SrJmsSqqZ410KsCrb/B42F/cIZUi5xrY9BRX2t5S
X-Received: by 10.43.102.136 with SMTP id de8mr40661004icc.20.1389067034585; 
 Mon, 06 Jan 2014 19:57:14 -0800 (PST)
MIME-Version: 1.0
X-BeenThere: patchwork-forward@linaro.org
Received: by 10.49.17.103 with SMTP id n7ls6261885qed.52.gmail; Mon, 06 Jan
 2014 19:57:14 -0800 (PST)
X-Received: by 10.52.185.132 with SMTP id fc4mr6255874vdc.21.1389067034451; 
 Mon, 06 Jan 2014 19:57:14 -0800 (PST)
Received: from mail-vb0-f47.google.com (mail-vb0-f47.google.com
 [209.85.212.47]) by mx.google.com with ESMTPS id
 ks3si32442428vec.89.2014.01.06.19.57.14
 for <patchwork-forward@linaro.org>
 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128);
 Mon, 06 Jan 2014 19:57:14 -0800 (PST)
Received-SPF: neutral (google.com: 209.85.212.47 is neither permitted nor
 denied by best guess record for domain of
 patch+caf_=patchwork-forward=linaro.org@linaro.org)
 client-ip=209.85.212.47; 
Received: by mail-vb0-f47.google.com with SMTP id m10so3383019vbh.6
 for <patchwork-forward@linaro.org>;
 Mon, 06 Jan 2014 19:57:14 -0800 (PST)
X-Received: by 10.220.79.199 with SMTP id q7mr6688vck.38.1389067034221;
 Mon, 06 Jan 2014 19:57:14 -0800 (PST)
X-Forwarded-To: patchwork-forward@linaro.org
X-Forwarded-For: patch@linaro.org patchwork-forward@linaro.org
Delivered-To: patches@linaro.org
Received: by 10.59.13.131 with SMTP id ey3csp134583ved;
 Mon, 6 Jan 2014 19:57:13 -0800 (PST)
X-Received: by 10.66.160.40 with SMTP id xh8mr2333576pab.150.1389067031616; 
 Mon, 06 Jan 2014 19:57:11 -0800 (PST)
Received: from mail-pd0-f181.google.com (mail-pd0-f181.google.com
 [209.85.192.181]) by mx.google.com with ESMTPS id
 ai8si5802683pad.38.2014.01.06.19.57.11 for <patches@linaro.org>
 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128);
 Mon, 06 Jan 2014 19:57:11 -0800 (PST)
Received-SPF: neutral (google.com: 209.85.192.181 is neither permitted nor
 denied by best guess record for domain of
 john.stultz@linaro.org) client-ip=209.85.192.181; 
Received: by mail-pd0-f181.google.com with SMTP id p10so19000493pdj.12
 for <patches@linaro.org>; Mon, 06 Jan 2014 19:57:11 -0800 (PST)
X-Received: by 10.66.158.132 with SMTP id wu4mr2394534pab.66.1389067031103; 
 Mon, 06 Jan 2014 19:57:11 -0800 (PST)
Received: from localhost.localdomain (c-67-170-153-23.hsd1.or.comcast.net.
 [67.170.153.23]) by mx.google.com with ESMTPSA id
 ic7sm132339854pbc.29.2014.01.06.19.57.08 for <multiple recipients>
 (version=TLSv1.1 cipher=ECDHE-RSA-RC4-SHA bits=128/128);
 Mon, 06 Jan 2014 19:57:09 -0800 (PST)
From: John Stultz <john.stultz@linaro.org>
To: LKML <linux-kernel@vger.kernel.org>
Cc: John Stultz <john.stultz@linaro.org>,
 Miroslav Lichvar <mlichvar@redhat.com>,
 Richard Cochran <richardcochran@gmail.com>,
 Prarit Bhargava <prarit@redhat.com>
Subject: [PATCH] [RFC] timekeeping: Rework frequency adjustments to work
 better w/ nohz
Date: Mon,  6 Jan 2014 19:57:03 -0800
Message-Id: <1389067023-13541-1-git-send-email-john.stultz@linaro.org>
X-Mailer: git-send-email 1.8.3.2
X-Removed-Original-Auth: Dkim didn't pass.
X-Original-Sender: john.stultz@linaro.org
X-Original-Authentication-Results: mx.google.com;       spf=neutral
 (google.com: 209.85.212.47 is neither permitted nor denied by best
 guess record for domain of
 patch+caf_=patchwork-forward=linaro.org@linaro.org)
 smtp.mail=patch+caf_=patchwork-forward=linaro.org@linaro.org
Precedence: list
Mailing-list: list patchwork-forward@linaro.org;
 contact patchwork-forward+owners@linaro.org
List-ID: <patchwork-forward.linaro.org>
X-Google-Group-Id: 836684582541
List-Post: <http://groups.google.com/a/linaro.org/group/patchwork-forward/post>, 
 <mailto:patchwork-forward@linaro.org>
List-Help: <http://support.google.com/a/linaro.org/bin/topic.py?topic=25838>, 
 <mailto:patchwork-forward+help@linaro.org>
List-Archive: <http://groups.google.com/a/linaro.org/group/patchwork-forward/>
List-Unsubscribe: <http://groups.google.com/a/linaro.org/group/patchwork-forward/subscribe>, 
 <mailto:googlegroups-manage+836684582541+unsubscribe@googlegroups.com>

Got a few cycles to take another look at this, and tried to address
Miroslav's latest comments. Please let me know if you have further
thoughts!

thanks
-john


The existing timekeeping_adjust logic has always been complicated
to understand. Further, since it was developed prior to NOHZ becoming
common, its not surprising it performs poorly when NOHZ is enabled.

Since Miroslav pointed out the problematic nature of the existing code
in the NOHZ case, I've tried to refactor the code to perform better.

The problem with the previous approach was that it tried to adjust
for the total cumulative error using a scaled dampening factor. This
resulted in large errors to be corrected slowly, while small errors
were corrected quickly. With NOHZ the timekeeping code doesn't know
how far out the next tick will be, so this results in bad
over-correction to small errors, and insufficient correction to large
errors.

Inspired by Miroslav's patch, I've refactored the code to try to
address the correction in two steps.

1) Check the future freq error for the next tick, and if the frequency
error is large, try to make sure we correct that error as best we can.

2) Then make a small single unit adjustment to correct any cumulative
error that we've collected.

This method performs fairly well in the simulator Miroslav created.

I also dropped the tk->ntp_error adjustments, as I've never quite
been able to recall how that was correct, and it doesn't seem to 
have any affect in the simulator. Will have to proceed carefully
with testing here.

I still think this is probably 3.15 or later material, but I'd be
very interested in feedback, thoughts, and testing.

Cc: Miroslav Lichvar <mlichvar@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Reported-by: Miroslav Lichvar <mlichvar@redhat.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 kernel/time/timekeeping.c | 163 +++++++++++++++++-----------------------------
 1 file changed, 61 insertions(+), 102 deletions(-)
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 87b4f00..15354d4 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1048,54 +1048,50 @@ static int __init timekeeping_init_ops(void)
 device_initcall(timekeeping_init_ops);
 
 /*
- * If the error is already larger, we look ahead even further
- * to compensate for late or lost adjustments.
+ * Calculate the future error caused by incorrect freq value
+ * and adjust the timekeeper to correct that.
  */
-static __always_inline int timekeeping_bigadjust(struct timekeeper *tk,
-						 s64 error, s64 *interval,
-						 s64 *offset)
+static __always_inline void timekeeping_freqadjust(struct timekeeper *tk,
+						 s64 interval,
+						 s64 offset)
 {
 	s64 tick_error, i;
-	u32 look_ahead, adj;
-	s32 error2, mult;
+	u32 adj;
+	s32 mult = 1;
 
-	/*
-	 * Use the current error value to determine how much to look ahead.
-	 * The larger the error the slower we adjust for it to avoid problems
-	 * with losing too many ticks, otherwise we would overadjust and
-	 * produce an even larger error.  The smaller the adjustment the
-	 * faster we try to adjust for it, as lost ticks can do less harm
-	 * here.  This is tuned so that an error of about 1 msec is adjusted
-	 * within about 1 sec (or 2^20 nsec in 2^SHIFT_HZ ticks).
-	 */
-	error2 = tk->ntp_error >> (NTP_SCALE_SHIFT + 22 - 2 * SHIFT_HZ);
-	error2 = abs(error2);
-	for (look_ahead = 0; error2 > 0; look_ahead++)
-		error2 >>= 2;
+	/* Calculate current error per tick */
+	tick_error = ntp_tick_length() >> tk->ntp_error_shift;
+	tick_error -= tk->xtime_interval;
 
-	/*
-	 * Now calculate the error in (1 << look_ahead) ticks, but first
-	 * remove the single look ahead already included in the error.
-	 */
-	tick_error = ntp_tick_length() >> (tk->ntp_error_shift + 1);
-	tick_error -= tk->xtime_interval >> 1;
-	error = ((error - tick_error) >> look_ahead) + tick_error;
+	/* Don't worry about correcting it if its small */
+	if (likely(abs(tick_error) < 2*interval))
+		return;
 
-	/* Finally calculate the adjustment shift value.  */
-	i = *interval;
-	mult = 1;
-	if (error < 0) {
-		error = -error;
-		*interval = -*interval;
-		*offset = -*offset;
-		mult = -1;
+	if (tick_error < 0) {
+		interval = -interval;
+		offset = -offset;
+		mult = -mult;
 	}
-	for (adj = 0; error > i; adj++)
-		error >>= 1;
 
-	*interval <<= adj;
-	*offset <<= adj;
-	return mult << adj;
+	/* Sort out the magnitude of the correction */
+	tick_error = abs(tick_error);
+	i = abs(interval);
+	for (adj = 0; tick_error > i; adj++)
+		tick_error >>= 1;
+
+	/* scale the corrections */
+	interval <<= adj;
+	offset <<= adj;
+	mult <<= adj;
+
+	/*
+	 * Apply the correction to the timekeeper.
+	 * See long comment in timekeeping_adjust to explain the math.
+	 */
+	tk->mult += mult;
+	tk->xtime_interval += interval;
+	tk->xtime_nsec -= offset;
+
 }
 
 /*
@@ -1108,65 +1104,23 @@ static void timekeeping_adjust(struct timekeeper *tk, s64 offset)
 	s64 error, interval = tk->cycle_interval;
 	int adj;
 
-	/*
-	 * The point of this is to check if the error is greater than half
-	 * an interval.
-	 *
-	 * First we shift it down from NTP_SHIFT to clocksource->shifted nsecs.
-	 *
-	 * Note we subtract one in the shift, so that error is really error*2.
-	 * This "saves" dividing(shifting) interval twice, but keeps the
-	 * (error > interval) comparison as still measuring if error is
-	 * larger than half an interval.
-	 *
-	 * Note: It does not "save" on aggravation when reading the code.
-	 */
-	error = tk->ntp_error >> (tk->ntp_error_shift - 1);
-	if (error > interval) {
-		/*
-		 * We now divide error by 4(via shift), which checks if
-		 * the error is greater than twice the interval.
-		 * If it is greater, we need a bigadjust, if its smaller,
-		 * we can adjust by 1.
-		 */
-		error >>= 2;
-		/*
-		 * XXX - In update_wall_time, we round up to the next
-		 * nanosecond, and store the amount rounded up into
-		 * the error. This causes the likely below to be unlikely.
-		 *
-		 * The proper fix is to avoid rounding up by using
-		 * the high precision tk->xtime_nsec instead of
-		 * xtime.tv_nsec everywhere. Fixing this will take some
-		 * time.
-		 */
-		if (likely(error <= interval))
-			adj = 1;
-		else
-			adj = timekeeping_bigadjust(tk, error, &interval, &offset);
-	} else {
-		if (error < -interval) {
-			/* See comment above, this is just switched for the negative */
-			error >>= 2;
-			if (likely(error >= -interval)) {
-				adj = -1;
-				interval = -interval;
-				offset = -offset;
-			} else {
-				adj = timekeeping_bigadjust(tk, error, &interval, &offset);
-			}
-		} else {
-			goto out_adjust;
-		}
-	}
+	/* First correct for the current frequency error */
+	timekeeping_freqadjust(tk, interval, offset);
+
+
+	/* Next make a small adjustment to fix any cumulative error */
+	error = tk->ntp_error >> tk->ntp_error_shift;
+	if (likely(abs(error) <= interval/2))
+		goto out_adjust;
+
+	if (error < 0) {
+		adj = -1;
+		interval = -interval;
+		offset = -offset;
+	} else
+		adj = 1;
+
 
-	if (unlikely(tk->clock->maxadj &&
-		(tk->mult + adj > tk->clock->mult + tk->clock->maxadj))) {
-		printk_once(KERN_WARNING
-			"Adjusting %s more than 11%% (%ld vs %ld)\n",
-			tk->clock->name, (long)tk->mult + adj,
-			(long)tk->clock->mult + tk->clock->maxadj);
-	}
 	/*
 	 * So the following can be confusing.
 	 *
@@ -1213,15 +1167,21 @@ static void timekeeping_adjust(struct timekeeper *tk, s64 offset)
 	 *	xtime_nsec_2 = xtime_nsec_1 - offset
 	 * Which simplfies to:
 	 *	xtime_nsec -= offset
-	 *
-	 * XXX - TODO: Doc ntp_error calculation.
 	 */
 	tk->mult += adj;
 	tk->xtime_interval += interval;
 	tk->xtime_nsec -= offset;
-	tk->ntp_error -= (interval - offset) << tk->ntp_error_shift;
 
 out_adjust:
+
+	if (unlikely(tk->clock->maxadj &&
+		(tk->mult > tk->clock->mult + tk->clock->maxadj))) {
+		printk_once(KERN_WARNING
+			"Adjusting %s more than 11%% (%ld vs %ld)\n",
+			tk->clock->name, (long)tk->mult,
+			(long)tk->clock->mult + tk->clock->maxadj);
+	}
+
 	/*
 	 * It may be possible that when we entered this function, xtime_nsec
 	 * was very small.  Further, if we're slightly speeding the clocksource
@@ -1241,7 +1201,6 @@ out_adjust:
 		tk->xtime_nsec = 0;
 		tk->ntp_error += neg << tk->ntp_error_shift;
 	}
-
 }
 
 /**