From patchwork Mon Feb 10 21:11:53 2014
X-Patchwork-Submitter: John Stultz
X-Patchwork-Id: 24427
From: John Stultz
To: stable
Cc: John Stultz, Thomas Gleixner, Prarit Bhargava, Richard Cochran, Ingo Molnar, Sasha Levin
Subject: [PATCH 5/5] 3.10.y: timekeeping: Avoid possible deadlock from clock_was_set_delayed
Date: Mon, 10 Feb 2014 13:11:53 -0800
Message-Id: <1392066713-5025-6-git-send-email-john.stultz@linaro.org>
X-Mailer: git-send-email 1.8.3.2
In-Reply-To: <1392066713-5025-1-git-send-email-john.stultz@linaro.org>
References: <1392066713-5025-1-git-send-email-john.stultz@linaro.org>

This is a 3.10-stable backport of 6fdda9a9c5db367130cf32df5d6618d08b89f46a

As part of normal operations, the hrtimer subsystem frequently calls
into the timekeeping code, creating a locking order of
  hrtimer locks -> timekeeping locks

clock_was_set_delayed() was supposed to allow us to avoid deadlocks
between the timekeeping and hrtimer subsystems, so that we could
notify the hrtimer subsystem that the time had changed while holding
the timekeeping locks. This was done by scheduling delayed work that
would run later, once we were out of the timekeeping code.

Unfortunately, the lock chains are complex enough that in scheduling
the delayed work we end up eventually trying to grab an hrtimer lock.

Sasha Levin noticed this in testing, when the new seqlock lockdep
enablement triggered the following (somewhat abbreviated) message:

[ 251.100221] ======================================================
[ 251.100221] [ INFO: possible circular locking dependency detected ]
[ 251.100221] 3.13.0-rc2-next-20131206-sasha-00005-g8be2375-dirty #4053 Not tainted
[ 251.101967] -------------------------------------------------------
[ 251.101967] kworker/10:1/4506 is trying to acquire lock:
[ 251.101967]  (timekeeper_seq){----..}, at: [] retrigger_next_event+0x56/0x70
[ 251.101967]
[ 251.101967] but task is already holding lock:
[ 251.101967]  (hrtimer_bases.lock#11){-.-...}, at: [] retrigger_next_event+0x3c/0x70
[ 251.101967]
[ 251.101967] which lock already depends on the new lock.
[ 251.101967]
[ 251.101967]
[ 251.101967] the existing dependency chain (in reverse order) is:
[ 251.101967]
-> #5 (hrtimer_bases.lock#11){-.-...}:
[snipped]
-> #4 (&rt_b->rt_runtime_lock){-.-...}:
[snipped]
-> #3 (&rq->lock){-.-.-.}:
[snipped]
-> #2 (&p->pi_lock){-.-.-.}:
[snipped]
-> #1 (&(&pool->lock)->rlock){-.-...}:
[ 251.101967]        [] validate_chain+0x6c3/0x7b0
[ 251.101967]        [] __lock_acquire+0x4ad/0x580
[ 251.101967]        [] lock_acquire+0x182/0x1d0
[ 251.101967]        [] _raw_spin_lock+0x40/0x80
[ 251.101967]        [] __queue_work+0x1a9/0x3f0
[ 251.101967]        [] queue_work_on+0x98/0x120
[ 251.101967]        [] clock_was_set_delayed+0x21/0x30
[ 251.101967]        [] do_adjtimex+0x111/0x160
[ 251.101967]        [] compat_sys_adjtimex+0x41/0x70
[ 251.101967]        [] ia32_sysret+0x0/0x5
[ 251.101967]
[ 251.101967] -> #0 (timekeeper_seq){----..}:
[snipped]
[ 251.101967] other info that might help us debug this:
[ 251.101967]
[ 251.101967] Chain exists of:
  timekeeper_seq --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock#11
[ 251.101967]  Possible unsafe locking scenario:
[ 251.101967]
[ 251.101967]        CPU0                    CPU1
[ 251.101967]        ----                    ----
[ 251.101967]   lock(hrtimer_bases.lock#11);
[ 251.101967]                                lock(&rt_b->rt_runtime_lock);
[ 251.101967]                                lock(hrtimer_bases.lock#11);
[ 251.101967]   lock(timekeeper_seq);
[ 251.101967]
[ 251.101967]  *** DEADLOCK ***
[ 251.101967]
[ 251.101967] 3 locks held by kworker/10:1/4506:
[ 251.101967]  #0:  (events){.+.+.+}, at: [] process_one_work+0x200/0x530
[ 251.101967]  #1:  (hrtimer_work){+.+...}, at: [] process_one_work+0x200/0x530
[ 251.101967]  #2:  (hrtimer_bases.lock#11){-.-...}, at: [] retrigger_next_event+0x3c/0x70
[ 251.101967]
[ 251.101967] stack backtrace:
[ 251.101967] CPU: 10 PID: 4506 Comm: kworker/10:1 Not tainted 3.13.0-rc2-next-20131206-sasha-00005-g8be2375-dirty #4053
[ 251.101967] Workqueue: events clock_was_set_work

So the best solution is to avoid calling clock_was_set_delayed() while
holding the timekeeping lock, and instead to use a flag variable to
decide if we should call clock_was_set() once we've released the locks.

This works for the case here, where the do_adjtimex() call was the
deadlock trigger point. Unfortunately, in update_wall_time() we still
hold the jiffies lock, which would deadlock with the IPI triggered by
clock_was_set(), preventing us from calling it even after we drop the
timekeeping lock. So instead call clock_was_set_delayed() at that point.

Cc: Thomas Gleixner
Cc: Prarit Bhargava
Cc: Richard Cochran
Cc: Ingo Molnar
Cc: Sasha Levin
Cc: stable #3.10+
Reported-by: Sasha Levin
Tested-by: Sasha Levin
Signed-off-by: John Stultz
---
 kernel/time/timekeeping.c | 25 ++++++++++++++++++-------
 1 file changed, 18 insertions(+), 7 deletions(-)
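[Not part of the patch proper, just a note for review: below is a minimal userspace sketch of the locking pattern this change moves to. It is plain C with pthreads, and every name in it (tk_lock, fake_seconds, accumulate_step, notify_clock_was_set, update_time) is invented for illustration; only the shape mirrors the patched kernel code, where accumulate_nsecs_to_secs() reports the event through its return value, the caller ORs that into a local flag while the lock is held, and the notification happens only after the lock is dropped.]

/*
 * Standalone illustration only, not kernel code. It mimics the shape of
 * update_wall_time()/accumulate_nsecs_to_secs() after this patch: each
 * accumulation step reports whether a notification is needed, the caller
 * ORs those results into a local flag under the lock, and the actual
 * notification runs only once the lock has been released.
 */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t tk_lock = PTHREAD_MUTEX_INITIALIZER; /* stand-in for the timekeeping lock */
static long fake_seconds;                                   /* stand-in for timekeeper state */

/* Stand-in for accumulate_nsecs_to_secs(): returns nonzero if a
 * leap-second style event happened and a notification is needed. */
static unsigned int accumulate_step(void)
{
	fake_seconds++;
	return (fake_seconds % 3) == 0;   /* pretend every third step sets the clock */
}

/* Stand-in for clock_was_set(): must never run with tk_lock held, since in
 * the real kernel the notification path ends up taking hrtimer locks. */
static void notify_clock_was_set(void)
{
	printf("clock was set, notifying (no locks held here)\n");
}

/* Shape of update_wall_time() after the patch. */
static void update_time(int steps)
{
	unsigned int clock_set = 0;

	pthread_mutex_lock(&tk_lock);
	for (int i = 0; i < steps; i++)
		clock_set |= accumulate_step();   /* only record the event */
	pthread_mutex_unlock(&tk_lock);

	if (clock_set)
		notify_clock_was_set();           /* act only after unlock */
}

int main(void)
{
	update_time(5);
	return 0;
}

In the patch itself the post-unlock step depends on the caller: do_adjtimex() can call clock_was_set() directly once the locks are released, while update_wall_time() still runs under the jiffies lock and therefore has to fall back to clock_was_set_delayed().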
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index d81b111..76fefb1 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1239,9 +1239,10 @@ out_adjust:
  * It also calls into the NTP code to handle leapsecond processing.
  *
  */
-static inline void accumulate_nsecs_to_secs(struct timekeeper *tk)
+static inline unsigned int accumulate_nsecs_to_secs(struct timekeeper *tk)
 {
 	u64 nsecps = (u64)NSEC_PER_SEC << tk->shift;
+	unsigned int clock_set = 0;
 
 	while (tk->xtime_nsec >= nsecps) {
 		int leap;
@@ -1263,9 +1264,10 @@ static inline void accumulate_nsecs_to_secs(struct timekeeper *tk)
 
 			__timekeeping_set_tai_offset(tk, tk->tai_offset - leap);
 
-			clock_was_set_delayed();
+			clock_set = 1;
 		}
 	}
+	return clock_set;
 }
 
 /**
@@ -1278,7 +1280,8 @@ static inline void accumulate_nsecs_to_secs(struct timekeeper *tk)
  * Returns the unconsumed cycles.
  */
 static cycle_t logarithmic_accumulation(struct timekeeper *tk, cycle_t offset,
-						u32 shift)
+						u32 shift,
+						unsigned int *clock_set)
 {
 	cycle_t interval = tk->cycle_interval << shift;
 	u64 raw_nsecs;
@@ -1292,7 +1295,7 @@ static cycle_t logarithmic_accumulation(struct timekeeper *tk, cycle_t offset,
 	tk->cycle_last += interval;
 
 	tk->xtime_nsec += tk->xtime_interval << shift;
-	accumulate_nsecs_to_secs(tk);
+	*clock_set |= accumulate_nsecs_to_secs(tk);
 
 	/* Accumulate raw time */
 	raw_nsecs = (u64)tk->raw_interval << shift;
@@ -1350,6 +1353,7 @@ static void update_wall_time(void)
 	struct timekeeper *tk = &shadow_timekeeper;
 	cycle_t offset;
 	int shift = 0, maxshift;
+	unsigned int clock_set = 0;
 	unsigned long flags;
 
 	raw_spin_lock_irqsave(&timekeeper_lock, flags);
@@ -1384,7 +1388,8 @@ static void update_wall_time(void)
 	maxshift = (64 - (ilog2(ntp_tick_length())+1)) - 1;
 	shift = min(shift, maxshift);
 	while (offset >= tk->cycle_interval) {
-		offset = logarithmic_accumulation(tk, offset, shift);
+		offset = logarithmic_accumulation(tk, offset, shift,
+							&clock_set);
 		if (offset < tk->cycle_interval<<shift)
 			shift--;
 	}
@@ -1402,7 +1407,7 @@ static void update_wall_time(void)
 	 * Finally, make sure that after the rounding
 	 * xtime_nsec isn't larger than NSEC_PER_SEC
 	 */
-	accumulate_nsecs_to_secs(tk);
+	clock_set |= accumulate_nsecs_to_secs(tk);
 
 	write_seqcount_begin(&timekeeper_seq);
 	/* Update clock->cycle_last with the new value */
@@ -1422,6 +1427,10 @@ static void update_wall_time(void)
 	write_seqcount_end(&timekeeper_seq);
 out:
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
+	if (clock_set)
+		/* have to call outside the timekeeper_seq */
+		clock_was_set_delayed();
+
 }
 
 /**
@@ -1681,11 +1690,13 @@ int do_adjtimex(struct timex *txc)
 	if (tai != orig_tai) {
 		__timekeeping_set_tai_offset(tk, tai);
 		timekeeping_update(tk, false, true);
-		clock_was_set_delayed();
 	}
 	write_seqcount_end(&timekeeper_seq);
 	raw_spin_unlock_irqrestore(&timekeeper_lock, flags);
 
+	if (tai != orig_tai)
+		clock_was_set();
+
 	ntp_notify_cmos_timer();
 
 	return ret;
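[Likewise illustration only, not kernel code: a compressed pthreads analogue of the lock inversion in the lockdep report above, with invented names. lock_a stands in for hrtimer_bases.lock and lock_b for timekeeper_seq; the real report goes from timekeeper_seq through the workqueue pool lock and scheduler locks before it reaches hrtimer_bases.lock, which is collapsed here into one direct acquisition. Run sequentially, as in main(), nothing blocks; two threads taking the locks in these opposite orders can deadlock, and that is the cycle the patch breaks by issuing the notification only after the timekeeping locks are released.]

/*
 * ABBA illustration only, not kernel code. lock_a plays the role of
 * hrtimer_bases.lock, lock_b the role of timekeeper_seq. The two helpers
 * acquire them in opposite orders; run one after the other nothing
 * happens, but two threads running them concurrently can each end up
 * holding one lock while waiting forever for the other.
 */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER; /* "hrtimer_bases.lock" */
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER; /* "timekeeper_seq" */

/* Analogue of retrigger_next_event(): hrtimer lock first, then timekeeping. */
static void retrigger_path(void)
{
	pthread_mutex_lock(&lock_a);
	pthread_mutex_lock(&lock_b);
	puts("retrigger path: a -> b");
	pthread_mutex_unlock(&lock_b);
	pthread_mutex_unlock(&lock_a);
}

/* Analogue of the old do_adjtimex() path: the timekeeping lock is held while
 * the work-queueing side (which leads back to the hrtimer lock) is reached. */
static void old_adjtimex_path(void)
{
	pthread_mutex_lock(&lock_b);
	pthread_mutex_lock(&lock_a);
	puts("old adjtimex path: b -> a");
	pthread_mutex_unlock(&lock_a);
	pthread_mutex_unlock(&lock_b);
}

int main(void)
{
	retrigger_path();
	old_adjtimex_path();
	return 0;
}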