From patchwork Tue Jul 17 07:05:14 2012 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: john stultz X-Patchwork-Id: 10032 Return-Path: X-Original-To: patchwork@peony.canonical.com Delivered-To: patchwork@peony.canonical.com Received: from fiordland.canonical.com (fiordland.canonical.com [91.189.94.145]) by peony.canonical.com (Postfix) with ESMTP id 3038023E1B for ; Tue, 17 Jul 2012 07:05:56 +0000 (UTC) Received: from mail-yw0-f52.google.com (mail-yw0-f52.google.com [209.85.213.52]) by fiordland.canonical.com (Postfix) with ESMTP id DE9DAA18604 for ; Tue, 17 Jul 2012 07:05:55 +0000 (UTC) Received: by mail-yw0-f52.google.com with SMTP id p61so64517yhp.11 for ; Tue, 17 Jul 2012 00:05:55 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=x-forwarded-to:x-forwarded-for:delivered-to:received-spf:from:to:cc :subject:date:message-id:x-mailer:in-reply-to:references :x-content-scanned:x-cbid:x-gm-message-state; bh=xv77T3HkU+11xfpOE8zQlUZGkYri+Upkp7raRb9b1nE=; b=n/Us+zn3O6dkX/2RoBUB4GFEPpqndEg85qGLRdphKiF+4L77obdonx0jjyptbGxkvx DNgIkGSWlXJqRW71uGzeTQSm6NiwSa/fIem7iXmRQx3YK9yDBkWxZYxrDAvb+qtf/XF1 tt8vrC4JVZCXa+Bok0UcjPB7+5o6GkM/rKF8cTyOCP2CW/zwqHJ0VKVei76B6vOC2ycD F59tfVFH4SiGTD/bWYkVY/JE5vcD2XCgAi5YLJ5me6kJB101Z0cB6gFn12UlVuloDymC 9IiJCKBrIxjmLEMxPOccr6HYp4bmJh4NFIAbnQanqU63KJ8lztZQm5KIJTsPOQmZipyr auVw== Received: by 10.43.63.140 with SMTP id xe12mr632192icb.57.1342508755445; Tue, 17 Jul 2012 00:05:55 -0700 (PDT) X-Forwarded-To: linaro-patchwork@canonical.com X-Forwarded-For: patch@linaro.org linaro-patchwork@canonical.com Delivered-To: patches@linaro.org Received: by 10.231.241.2 with SMTP id lc2csp15903ibb; Tue, 17 Jul 2012 00:05:54 -0700 (PDT) Received: by 10.43.104.202 with SMTP id dn10mr670246icc.29.1342508754904; Tue, 17 Jul 2012 00:05:54 -0700 (PDT) Received: from e33.co.us.ibm.com (e33.co.us.ibm.com. [32.97.110.151]) by mx.google.com with ESMTPS id bd9si23609501icc.90.2012.07.17.00.05.54 (version=TLSv1/SSLv3 cipher=OTHER); Tue, 17 Jul 2012 00:05:54 -0700 (PDT) Received-SPF: pass (google.com: domain of johnstul@us.ibm.com designates 32.97.110.151 as permitted sender) client-ip=32.97.110.151; Authentication-Results: mx.google.com; spf=pass (google.com: domain of johnstul@us.ibm.com designates 32.97.110.151 as permitted sender) smtp.mail=johnstul@us.ibm.com Received: from /spool/local by e33.co.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Tue, 17 Jul 2012 01:05:52 -0600 Received: from d01dlp03.pok.ibm.com (9.56.224.17) by e33.co.us.ibm.com (192.168.1.133) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; Tue, 17 Jul 2012 01:05:40 -0600 Received: from d01relay01.pok.ibm.com (d01relay01.pok.ibm.com [9.56.227.233]) by d01dlp03.pok.ibm.com (Postfix) with ESMTP id 6CDF3C9005E; Tue, 17 Jul 2012 03:05:39 -0400 (EDT) Received: from d01av04.pok.ibm.com (d01av04.pok.ibm.com [9.56.224.64]) by d01relay01.pok.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id q6H75dKH373796; Tue, 17 Jul 2012 03:05:39 -0400 Received: from d01av04.pok.ibm.com (loopback [127.0.0.1]) by d01av04.pok.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id q6H75b3E016127; Tue, 17 Jul 2012 03:05:38 -0400 Received: from kernel.stglabs.ibm.com (kernel.stglabs.ibm.com [9.114.214.19]) by d01av04.pok.ibm.com (8.14.4/8.13.1/NCO v10.0 AVin) with ESMTP id q6H75aBJ015997; Tue, 17 Jul 2012 03:05:36 -0400 From: John Stultz To: stable@vger.kernel.org Cc: John Stultz , Sasha Levin , Thomas Gleixner , Prarit Bhargava , Linux Kernel Subject: [PATCH 01/11] 3.2.x: ntp: Fix leap-second hrtimer livelock Date: Tue, 17 Jul 2012 03:05:14 -0400 Message-Id: <1342508724-14527-2-git-send-email-johnstul@us.ibm.com> X-Mailer: git-send-email 1.7.9.5 In-Reply-To: <1342508724-14527-1-git-send-email-johnstul@us.ibm.com> References: <1342508724-14527-1-git-send-email-johnstul@us.ibm.com> X-Content-Scanned: Fidelis XPS MAILER x-cbid: 12071707-2398-0000-0000-0000088EFE6B X-Gm-Message-State: ALoCoQnS1raG+eyHvmwWhGmz91id1pTKlm5sfoz3GKRHfFTr9U+73vp9ic4X0sVRTBFlp5T7AFA6 From: John Stultz This is a backport of 6b43ae8a619d17c4935c3320d2ef9e92bdeed05d This should have been backported when it was commited, but I mistook the problem as requiring the ntp_lock changes that landed in 3.4 in order for it to occur. Unfortunately the same issue can happen (with only one cpu) as follows: do_adjtimex() write_seqlock_irq(&xtime_lock); process_adjtimex_modes() process_adj_status() ntp_start_leap_timer() hrtimer_start() hrtimer_reprogram() tick_program_event() clockevents_program_event() ktime_get() seq = req_seqbegin(xtime_lock); [DEADLOCK] This deadlock will no always occur, as it requires the leap_timer to force a hrtimer_reprogram which only happens if its set and there's no sooner timer to expire. NOTE: This patch, being faithful to the original commit, introduces a bug (we don't update wall_to_monotonic), which will be resovled by backporting a following fix. Original commit message below: Since commit 7dffa3c673fbcf835cd7be80bb4aec8ad3f51168 the ntp subsystem has used an hrtimer for triggering the leapsecond adjustment. However, this can cause a potential livelock. Thomas diagnosed this as the following pattern: CPU 0 CPU 1 do_adjtimex() spin_lock_irq(&ntp_lock); process_adjtimex_modes(); timer_interrupt() process_adj_status(); do_timer() ntp_start_leap_timer(); write_lock(&xtime_lock); hrtimer_start(); update_wall_time(); hrtimer_reprogram(); ntp_tick_length() tick_program_event() spin_lock(&ntp_lock); clockevents_program_event() ktime_get() seq = req_seqbegin(xtime_lock); This patch tries to avoid the problem by reverting back to not using an hrtimer to inject leapseconds, and instead we handle the leapsecond processing in the second_overflow() function. The downside to this change is that on systems that support highres timers, the leap second processing will occur on a HZ tick boundary, (ie: ~1-10ms, depending on HZ) after the leap second instead of possibly sooner (~34us in my tests w/ x86_64 lapic). This patch applies on top of tip/timers/core. CC: Sasha Levin CC: Thomas Gleixner Reported-by: Sasha Levin Diagnoised-by: Thomas Gleixner Tested-by: Sasha Levin Cc: Prarit Bhargava Cc: Thomas Gleixner Cc: Linux Kernel Signed-off-by: John Stultz --- include/linux/timex.h | 2 +- kernel/time/ntp.c | 122 +++++++++++++++------------------------------ kernel/time/timekeeping.c | 18 +++---- 3 files changed, 48 insertions(+), 94 deletions(-) diff --git a/include/linux/timex.h b/include/linux/timex.h index aa60fe7..08e90fb 100644 --- a/include/linux/timex.h +++ b/include/linux/timex.h @@ -266,7 +266,7 @@ static inline int ntp_synced(void) /* Returns how long ticks are at present, in ns / 2^NTP_SCALE_SHIFT. */ extern u64 tick_length; -extern void second_overflow(void); +extern int second_overflow(unsigned long secs); extern void update_ntp_one_tick(void); extern int do_adjtimex(struct timex *); extern void hardpps(const struct timespec *, const struct timespec *); diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c index 4b85a7a..4508f7f 100644 --- a/kernel/time/ntp.c +++ b/kernel/time/ntp.c @@ -31,8 +31,6 @@ unsigned long tick_nsec; u64 tick_length; static u64 tick_length_base; -static struct hrtimer leap_timer; - #define MAX_TICKADJ 500LL /* usecs */ #define MAX_TICKADJ_SCALED \ (((MAX_TICKADJ * NSEC_PER_USEC) << NTP_SCALE_SHIFT) / NTP_INTERVAL_FREQ) @@ -350,60 +348,60 @@ void ntp_clear(void) } /* - * Leap second processing. If in leap-insert state at the end of the - * day, the system clock is set back one second; if in leap-delete - * state, the system clock is set ahead one second. + * this routine handles the overflow of the microsecond field + * + * The tricky bits of code to handle the accurate clock support + * were provided by Dave Mills (Mills@UDEL.EDU) of NTP fame. + * They were originally developed for SUN and DEC kernels. + * All the kudos should go to Dave for this stuff. + * + * Also handles leap second processing, and returns leap offset */ -static enum hrtimer_restart ntp_leap_second(struct hrtimer *timer) +int second_overflow(unsigned long secs) { - enum hrtimer_restart res = HRTIMER_NORESTART; - - write_seqlock(&xtime_lock); + int leap = 0; + s64 delta; + /* + * Leap second processing. If in leap-insert state at the end of the + * day, the system clock is set back one second; if in leap-delete + * state, the system clock is set ahead one second. + */ switch (time_state) { case TIME_OK: + if (time_status & STA_INS) + time_state = TIME_INS; + else if (time_status & STA_DEL) + time_state = TIME_DEL; break; case TIME_INS: - timekeeping_leap_insert(-1); - time_state = TIME_OOP; - printk(KERN_NOTICE - "Clock: inserting leap second 23:59:60 UTC\n"); - hrtimer_add_expires_ns(&leap_timer, NSEC_PER_SEC); - res = HRTIMER_RESTART; + if (secs % 86400 == 0) { + leap = -1; + time_state = TIME_OOP; + printk(KERN_NOTICE + "Clock: inserting leap second 23:59:60 UTC\n"); + } break; case TIME_DEL: - timekeeping_leap_insert(1); - time_tai--; - time_state = TIME_WAIT; - printk(KERN_NOTICE - "Clock: deleting leap second 23:59:59 UTC\n"); + if ((secs + 1) % 86400 == 0) { + leap = 1; + time_tai--; + time_state = TIME_WAIT; + printk(KERN_NOTICE + "Clock: deleting leap second 23:59:59 UTC\n"); + } break; case TIME_OOP: time_tai++; time_state = TIME_WAIT; - /* fall through */ + break; + case TIME_WAIT: if (!(time_status & (STA_INS | STA_DEL))) time_state = TIME_OK; break; } - write_sequnlock(&xtime_lock); - - return res; -} - -/* - * this routine handles the overflow of the microsecond field - * - * The tricky bits of code to handle the accurate clock support - * were provided by Dave Mills (Mills@UDEL.EDU) of NTP fame. - * They were originally developed for SUN and DEC kernels. - * All the kudos should go to Dave for this stuff. - */ -void second_overflow(void) -{ - s64 delta; /* Bump the maxerror field */ time_maxerror += MAXFREQ / NSEC_PER_USEC; @@ -423,23 +421,25 @@ void second_overflow(void) pps_dec_valid(); if (!time_adjust) - return; + goto out; if (time_adjust > MAX_TICKADJ) { time_adjust -= MAX_TICKADJ; tick_length += MAX_TICKADJ_SCALED; - return; + goto out; } if (time_adjust < -MAX_TICKADJ) { time_adjust += MAX_TICKADJ; tick_length -= MAX_TICKADJ_SCALED; - return; + goto out; } tick_length += (s64)(time_adjust * NSEC_PER_USEC / NTP_INTERVAL_FREQ) << NTP_SCALE_SHIFT; time_adjust = 0; +out: + return leap; } #ifdef CONFIG_GENERIC_CMOS_UPDATE @@ -501,27 +501,6 @@ static void notify_cmos_timer(void) static inline void notify_cmos_timer(void) { } #endif -/* - * Start the leap seconds timer: - */ -static inline void ntp_start_leap_timer(struct timespec *ts) -{ - long now = ts->tv_sec; - - if (time_status & STA_INS) { - time_state = TIME_INS; - now += 86400 - now % 86400; - hrtimer_start(&leap_timer, ktime_set(now, 0), HRTIMER_MODE_ABS); - - return; - } - - if (time_status & STA_DEL) { - time_state = TIME_DEL; - now += 86400 - (now + 1) % 86400; - hrtimer_start(&leap_timer, ktime_set(now, 0), HRTIMER_MODE_ABS); - } -} /* * Propagate a new txc->status value into the NTP state: @@ -546,22 +525,6 @@ static inline void process_adj_status(struct timex *txc, struct timespec *ts) time_status &= STA_RONLY; time_status |= txc->status & ~STA_RONLY; - switch (time_state) { - case TIME_OK: - ntp_start_leap_timer(ts); - break; - case TIME_INS: - case TIME_DEL: - time_state = TIME_OK; - ntp_start_leap_timer(ts); - case TIME_WAIT: - if (!(time_status & (STA_INS | STA_DEL))) - time_state = TIME_OK; - break; - case TIME_OOP: - hrtimer_restart(&leap_timer); - break; - } } /* * Called with the xtime lock held, so we can access and modify @@ -643,9 +606,6 @@ int do_adjtimex(struct timex *txc) (txc->tick < 900000/USER_HZ || txc->tick > 1100000/USER_HZ)) return -EINVAL; - - if (txc->modes & ADJ_STATUS && time_state != TIME_OK) - hrtimer_cancel(&leap_timer); } if (txc->modes & ADJ_SETOFFSET) { @@ -967,6 +927,4 @@ __setup("ntp_tick_adj=", ntp_tick_adj_setup); void __init ntp_init(void) { ntp_clear(); - hrtimer_init(&leap_timer, CLOCK_REALTIME, HRTIMER_MODE_ABS); - leap_timer.function = ntp_leap_second; } diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index 2378413..4780a7d 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -169,15 +169,6 @@ static struct timespec raw_time; /* flag for if timekeeping is suspended */ int __read_mostly timekeeping_suspended; -/* must hold xtime_lock */ -void timekeeping_leap_insert(int leapsecond) -{ - xtime.tv_sec += leapsecond; - wall_to_monotonic.tv_sec -= leapsecond; - update_vsyscall(&xtime, &wall_to_monotonic, timekeeper.clock, - timekeeper.mult); -} - /** * timekeeping_forward_now - update clock to the current time * @@ -942,9 +933,11 @@ static cycle_t logarithmic_accumulation(cycle_t offset, int shift) timekeeper.xtime_nsec += timekeeper.xtime_interval << shift; while (timekeeper.xtime_nsec >= nsecps) { + int leap; timekeeper.xtime_nsec -= nsecps; xtime.tv_sec++; - second_overflow(); + leap = second_overflow(xtime.tv_sec); + xtime.tv_sec += leap; } /* Accumulate raw time */ @@ -1050,9 +1043,12 @@ static void update_wall_time(void) * xtime.tv_nsec isn't larger then NSEC_PER_SEC */ if (unlikely(xtime.tv_nsec >= NSEC_PER_SEC)) { + int leap; xtime.tv_nsec -= NSEC_PER_SEC; xtime.tv_sec++; - second_overflow(); + leap = second_overflow(xtime.tv_sec); + xtime.tv_sec += leap; + } /* check to see if there is a new clocksource to use */