From patchwork Mon Apr 23 16:42:27 2012 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Paul E. McKenney" X-Patchwork-Id: 8056 Return-Path: X-Original-To: patchwork@peony.canonical.com Delivered-To: patchwork@peony.canonical.com Received: from fiordland.canonical.com (fiordland.canonical.com [91.189.94.145]) by peony.canonical.com (Postfix) with ESMTP id B9F9E23E22 for ; Mon, 23 Apr 2012 16:43:13 +0000 (UTC) Received: from mail-iy0-f180.google.com (mail-iy0-f180.google.com [209.85.210.180]) by fiordland.canonical.com (Postfix) with ESMTP id 7BAA8A181B8 for ; Mon, 23 Apr 2012 16:43:13 +0000 (UTC) Received: by mail-iy0-f180.google.com with SMTP id e36so23096175iag.11 for ; Mon, 23 Apr 2012 09:43:13 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=x-forwarded-to:x-forwarded-for:delivered-to:received-spf:from:to:cc :subject:date:message-id:x-mailer:in-reply-to:references :x-content-scanned:x-cbid:x-gm-message-state; bh=r2p+s5QIRYGHxifhQTbkTimpYy48j+UMlt1AmBsmuCs=; b=BzRfZq4prF2va2Dhfj158nHBLy4yXufEzmU45Q2V/kVazz8NHODWfQryXbFNUqILro F3iRQEPS8tcxWN6mHgQnYA+QxCIXqj+w8cs4yXwV3IAKIW6R21dciEbihU0KTkgXdDRH 103qHgKnGW+JIQzhAeWZLa8BmKRDcLbqEDnb1emDBSG/yV5grZYLVaJ7nWSwN1/+AfPV T2F17MCVai8NLYfan+7pANT347+8ZPoWPp5U1Xvbn4HSFD96XfxXXIW5tRfv3/UwmyyD q8plcc7OKufnLolZsrH0lLl9qz39E04UvXpX/E4prz2DhlEaiXCZplGt0i6PF2qUruRo V+Xw== Received: by 10.43.49.201 with SMTP id vb9mr2330194icb.35.1335199393263; Mon, 23 Apr 2012 09:43:13 -0700 (PDT) X-Forwarded-To: linaro-patchwork@canonical.com X-Forwarded-For: patch@linaro.org linaro-patchwork@canonical.com Delivered-To: patches@linaro.org Received: by 10.231.137.198 with SMTP id x6csp125386ibt; Mon, 23 Apr 2012 09:43:12 -0700 (PDT) Received: by 10.60.27.98 with SMTP id s2mr23461347oeg.48.1335199392326; Mon, 23 Apr 2012 09:43:12 -0700 (PDT) Received: from e39.co.us.ibm.com (e39.co.us.ibm.com. [32.97.110.160]) by mx.google.com with ESMTPS id sq3si8713862obb.88.2012.04.23.09.43.12 (version=TLSv1/SSLv3 cipher=OTHER); Mon, 23 Apr 2012 09:43:12 -0700 (PDT) Received-SPF: pass (google.com: domain of paulmck@linux.vnet.ibm.com designates 32.97.110.160 as permitted sender) client-ip=32.97.110.160; Authentication-Results: mx.google.com; spf=pass (google.com: domain of paulmck@linux.vnet.ibm.com designates 32.97.110.160 as permitted sender) smtp.mail=paulmck@linux.vnet.ibm.com Received: from /spool/local by e39.co.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Mon, 23 Apr 2012 10:43:10 -0600 Received: from d03dlp01.boulder.ibm.com (9.17.202.177) by e39.co.us.ibm.com (192.168.1.139) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; Mon, 23 Apr 2012 10:43:06 -0600 Received: from d03relay01.boulder.ibm.com (d03relay01.boulder.ibm.com [9.17.195.226]) by d03dlp01.boulder.ibm.com (Postfix) with ESMTP id 61851C4000D for ; Mon, 23 Apr 2012 10:43:04 -0600 (MDT) Received: from d03av01.boulder.ibm.com (d03av01.boulder.ibm.com [9.17.195.167]) by d03relay01.boulder.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id q3NGgfB5171906 for ; Mon, 23 Apr 2012 10:42:41 -0600 Received: from d03av01.boulder.ibm.com (loopback [127.0.0.1]) by d03av01.boulder.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id q3NGgU5r018683 for ; Mon, 23 Apr 2012 10:42:31 -0600 Received: from paulmck-ThinkPad-W500 ([9.47.24.152]) by d03av01.boulder.ibm.com (8.14.4/8.13.1/NCO v10.0 AVin) with ESMTP id q3NGgUj5018643; Mon, 23 Apr 2012 10:42:30 -0600 Received: by paulmck-ThinkPad-W500 (Postfix, from userid 1000) id A6930E527A; Mon, 23 Apr 2012 09:42:29 -0700 (PDT) From: "Paul E. McKenney" To: linux-kernel@vger.kernel.org Cc: mingo@elte.hu, laijs@cn.fujitsu.com, dipankar@in.ibm.com, akpm@linux-foundation.org, mathieu.desnoyers@polymtl.ca, josh@joshtriplett.org, niv@us.ibm.com, tglx@linutronix.de, peterz@infradead.org, rostedt@goodmis.org, Valdis.Kletnieks@vt.edu, dhowells@redhat.com, eric.dumazet@gmail.com, darren@dvhart.com, fweisbec@gmail.com, patches@linaro.org, "Paul E. McKenney" Subject: [PATCH RFC tip/core/rcu 6/6] rcu: Reduce cache-miss initialization latencies for large systems Date: Mon, 23 Apr 2012 09:42:27 -0700 Message-Id: <1335199347-13926-6-git-send-email-paulmck@linux.vnet.ibm.com> X-Mailer: git-send-email 1.7.8 In-Reply-To: <1335199347-13926-1-git-send-email-paulmck@linux.vnet.ibm.com> References: <20120423164159.GA13819@linux.vnet.ibm.com> <1335199347-13926-1-git-send-email-paulmck@linux.vnet.ibm.com> X-Content-Scanned: Fidelis XPS MAILER x-cbid: 12042316-4242-0000-0000-0000016DB46E X-Gm-Message-State: ALoCoQlu0df+TWg0rQoIMXOItkZTsepbHdQuD5XhR3Ek30q2ZhH0JlzG1XVD6cX/nk0ceoYXSMgg From: "Paul E. McKenney" Commit #0209f649 (rcu: limit rcu_node leaf-level fanout) set an upper limit of 16 on the leaf-level fanout for the rcu_node tree. This was needed to reduce lock contention that was induced by the synchronization of scheduling-clock interrupts, which was in turn needed to improve energy efficiency for moderate-sized lightly loaded servers. However, reducing the leaf-level fanout means that there are more leaf-level rcu_node structures in the tree, which in turn means that RCU's grace-period initialization incurs more cache misses. This is not a problem on moderate-sized servers with only a few tens of CPUs, but becomes a major source of real-time latency spikes on systems with many hundreds of CPUs. In addition, the workloads running on these large systems tend to be CPU-bound, which eliminates the energy-efficiency advantages of synchronizing scheduling-clock interrupts. Therefore, these systems need maximal values for the rcu_node leaf-level fanout. This commit addresses this problem by introducing a new kernel parameter named RCU_FANOUT_LEAF that directly controls the leaf-level fanout. This parameter defaults to 16 to handle the common case of a moderate sized lightly loaded servers, but may be set higher on larger systems. Reported-by: Mike Galbraith Reported-by: Dimitri Sivanich Signed-off-by: Paul E. McKenney --- init/Kconfig | 27 +++++++++++++++++++++++++++ kernel/rcutree.c | 2 +- kernel/rcutree.h | 10 +++------- 3 files changed, 31 insertions(+), 8 deletions(-) diff --git a/init/Kconfig b/init/Kconfig index 85c6870..6d18ef8 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -458,6 +458,33 @@ config RCU_FANOUT Select a specific number if testing RCU itself. Take the default if unsure. +config RCU_FANOUT_LEAF + int "Tree-based hierarchical RCU leaf-level fanout value" + range 2 RCU_FANOUT if 64BIT + range 2 RCU_FANOUT if !64BIT + depends on TREE_RCU || TREE_PREEMPT_RCU + default 16 + help + This option controls the leaf-level fanout of hierarchical + implementations of RCU, and allows trading off cache misses + against lock contention. Systems that synchronize their + scheduling-clock interrupts for energy-efficiency reasons will + want the default because the smaller leaf-level fanout keeps + lock contention levels acceptably low. Very large systems + (hundreds or thousands of CPUs) will instead want to set this + value to the maximum value possible in order to reduce the + number of cache misses incurred during RCU's grace-period + initialization. These systems tend to run CPU-bound, and thus + are not helped by synchronized interrupts, and thus tend to + skew them, which reduces lock contention enough that large + leaf-level fanouts work well. + + Select a specific number if testing RCU itself. + + Select the maximum permissible value for large systems. + + Take the default if unsure. + config RCU_FANOUT_EXACT bool "Disable tree-based hierarchical RCU auto-balancing" depends on TREE_RCU || TREE_PREEMPT_RCU diff --git a/kernel/rcutree.c b/kernel/rcutree.c index 1050d6d..780acf8 100644 --- a/kernel/rcutree.c +++ b/kernel/rcutree.c @@ -2418,7 +2418,7 @@ static void __init rcu_init_levelspread(struct rcu_state *rsp) for (i = NUM_RCU_LVLS - 1; i > 0; i--) rsp->levelspread[i] = CONFIG_RCU_FANOUT; - rsp->levelspread[0] = RCU_FANOUT_LEAF; + rsp->levelspread[0] = CONFIG_RCU_FANOUT_LEAF; } #else /* #ifdef CONFIG_RCU_FANOUT_EXACT */ static void __init rcu_init_levelspread(struct rcu_state *rsp) diff --git a/kernel/rcutree.h b/kernel/rcutree.h index cdd1be0..a905c20 100644 --- a/kernel/rcutree.h +++ b/kernel/rcutree.h @@ -29,18 +29,14 @@ #include /* - * Define shape of hierarchy based on NR_CPUS and CONFIG_RCU_FANOUT. + * Define shape of hierarchy based on NR_CPUS, CONFIG_RCU_FANOUT, and + * CONFIG_RCU_FANOUT_LEAF. * In theory, it should be possible to add more levels straightforwardly. * In practice, this did work well going from three levels to four. * Of course, your mileage may vary. */ #define MAX_RCU_LVLS 4 -#if CONFIG_RCU_FANOUT > 16 -#define RCU_FANOUT_LEAF 16 -#else /* #if CONFIG_RCU_FANOUT > 16 */ -#define RCU_FANOUT_LEAF (CONFIG_RCU_FANOUT) -#endif /* #else #if CONFIG_RCU_FANOUT > 16 */ -#define RCU_FANOUT_1 (RCU_FANOUT_LEAF) +#define RCU_FANOUT_1 (CONFIG_RCU_FANOUT_LEAF) #define RCU_FANOUT_2 (RCU_FANOUT_1 * CONFIG_RCU_FANOUT) #define RCU_FANOUT_3 (RCU_FANOUT_2 * CONFIG_RCU_FANOUT) #define RCU_FANOUT_4 (RCU_FANOUT_3 * CONFIG_RCU_FANOUT)