From patchwork Thu Aug 30 18:18:31 2012
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Paul E. McKenney"
X-Patchwork-Id: 11046
From: "Paul E. McKenney"
To: linux-kernel@vger.kernel.org
Cc: mingo@elte.hu, laijs@cn.fujitsu.com, dipankar@in.ibm.com,
 akpm@linux-foundation.org, mathieu.desnoyers@polymtl.ca,
 josh@joshtriplett.org, niv@us.ibm.com, tglx@linutronix.de,
 peterz@infradead.org, rostedt@goodmis.org, Valdis.Kletnieks@vt.edu,
 dhowells@redhat.com, eric.dumazet@gmail.com, darren@dvhart.com,
 fweisbec@gmail.com, sbw@mit.edu, patches@linaro.org,
 "Paul E. McKenney"
Subject: [PATCH tip/core/rcu 16/23] rcu: Prevent initialization-time quiescent-state race
Date: Thu, 30 Aug 2012 11:18:31 -0700
Message-Id: <1346350718-30937-16-git-send-email-paulmck@linux.vnet.ibm.com>
X-Mailer: git-send-email 1.7.8
In-Reply-To: <1346350718-30937-1-git-send-email-paulmck@linux.vnet.ibm.com>
References: <20120830181811.GA29154@linux.vnet.ibm.com>
 <1346350718-30937-1-git-send-email-paulmck@linux.vnet.ibm.com>

From: "Paul E. McKenney"

Now that the grace-period initialization procedure is preemptible, it
is subject to the following race on systems whose rcu_node tree contains
more than one node:

1.  CPU 31 starts initializing the grace period, including the
    first leaf rcu_node structures, and is then preempted.

2.  CPU 0 refers to the first leaf rcu_node structure, and notes
    that a new grace period has started. It passes through a
    quiescent state shortly thereafter, and informs the RCU core
    of this rite of passage.

3.  CPU 0 enters an RCU read-side critical section, acquiring
    a pointer to an RCU-protected data item.

4.  CPU 31 removes the data item referenced by CPU 0 from the
    data structure, and registers an RCU callback in order to
    free it.

5.  CPU 31 resumes initializing the grace period, including its
    own rcu_node structure.
    It invokes rcu_start_gp_per_cpu(), which advances all
    callbacks, including the one registered in #4 above, to be
    handled by the current grace period.

6.  The remaining CPUs pass through quiescent states and inform
    the RCU core, but CPU 0 remains in its RCU read-side critical
    section, still referencing the now-removed data item.

7.  The grace period completes and all the callbacks are invoked,
    including the one that frees the data item that CPU 0 is
    still referencing. Oops!!!

This commit therefore moves the callback handling to precede
initialization of any of the rcu_node structures, thus avoiding this
race.

Signed-off-by: Paul E. McKenney
---
 kernel/rcutree.c |   33 +++++++++++++++++++--------------
 1 files changed, 19 insertions(+), 14 deletions(-)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 55f20fd..d435009 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -1028,20 +1028,6 @@ rcu_start_gp_per_cpu(struct rcu_state *rsp, struct rcu_node *rnp, struct rcu_dat
 	/* Prior grace period ended, so advance callbacks for current CPU. */
 	__rcu_process_gp_end(rsp, rnp, rdp);
 
-	/*
-	 * Because this CPU just now started the new grace period, we know
-	 * that all of its callbacks will be covered by this upcoming grace
-	 * period, even the ones that were registered arbitrarily recently.
-	 * Therefore, advance all outstanding callbacks to RCU_WAIT_TAIL.
-	 *
-	 * Other CPUs cannot be sure exactly when the grace period started.
-	 * Therefore, their recently registered callbacks must pass through
-	 * an additional RCU_NEXT_READY stage, so that they will be handled
-	 * by the next RCU grace period.
-	 */
-	rdp->nxttail[RCU_NEXT_READY_TAIL] = rdp->nxttail[RCU_NEXT_TAIL];
-	rdp->nxttail[RCU_WAIT_TAIL] = rdp->nxttail[RCU_NEXT_TAIL];
-
 	/* Set state so that this CPU will detect the next quiescent state. */
 	__note_new_gpnum(rsp, rnp, rdp);
 }
@@ -1068,6 +1054,25 @@ static int rcu_gp_init(struct rcu_state *rsp)
 	rsp->gpnum++;
 	trace_rcu_grace_period(rsp->name, rsp->gpnum, "start");
 	record_gp_stall_check_time(rsp);
+
+	/*
+	 * Because this CPU just now started the new grace period, we
+	 * know that all of its callbacks will be covered by this upcoming
+	 * grace period, even the ones that were registered arbitrarily
+	 * recently. Therefore, advance all RCU_NEXT_TAIL callbacks
+	 * to RCU_NEXT_READY_TAIL. When the CPU later recognizes the
+	 * start of the new grace period, it will advance all callbacks
+	 * one position, which will cause all of its current outstanding
+	 * callbacks to be handled by the newly started grace period.
+	 *
+	 * Other CPUs cannot be sure exactly when the grace period started.
+	 * Therefore, their recently registered callbacks must pass through
+	 * an additional RCU_NEXT_READY stage, so that they will be handled
+	 * by the next RCU grace period.
+	 */
+	rdp = __this_cpu_ptr(rsp->rda);
+	rdp->nxttail[RCU_NEXT_READY_TAIL] = rdp->nxttail[RCU_NEXT_TAIL];
+
 	raw_spin_unlock_irqrestore(&rnp->lock, flags);
 
 	/* Exclude any concurrent CPU-hotplug operations. */