From patchwork Thu Sep 20 18:48:04 2012
X-Patchwork-Submitter: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
X-Patchwork-Id: 11588
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: linux-kernel@vger.kernel.org
Cc: mingo@elte.hu, laijs@cn.fujitsu.com, dipankar@in.ibm.com,
	akpm@linux-foundation.org, mathieu.desnoyers@polymtl.ca,
	josh@joshtriplett.org, niv@us.ibm.com, tglx@linutronix.de,
	peterz@infradead.org, rostedt@goodmis.org, Valdis.Kletnieks@vt.edu,
	dhowells@redhat.com, eric.dumazet@gmail.com, darren@dvhart.com,
	fweisbec@gmail.com, sbw@mit.edu, patches@linaro.org,
	"Paul E. McKenney" <paul.mckenney@linaro.org>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Subject: [PATCH tip/core/rcu 08/23] rcu: Provide OOM handler to motivate lazy RCU callbacks
Date: Thu, 20 Sep 2012 11:48:04 -0700
Message-Id: <1348166900-18716-8-git-send-email-paulmck@linux.vnet.ibm.com>
X-Mailer: git-send-email 1.7.8
In-Reply-To: <1348166900-18716-1-git-send-email-paulmck@linux.vnet.ibm.com>
References: <20120920184751.GA18657@linux.vnet.ibm.com>
	<1348166900-18716-1-git-send-email-paulmck@linux.vnet.ibm.com>

From: "Paul E. McKenney" <paul.mckenney@linaro.org>

In kernels built with CONFIG_RCU_FAST_NO_HZ=y, CPUs can accumulate a
large number of lazy callbacks, which, as the name implies, will be slow
to be invoked.  This can be a problem on small-memory systems, where the
default 6-second sleep for CPUs having only lazy RCU callbacks could
well be fatal.  This commit therefore installs an OOM handler that
ensures that every CPU with lazy callbacks has at least one non-lazy
callback, in turn ensuring timely advancement for these callbacks.

Updated to fix a bug that disabled OOM killing, noted by Lai Jiangshan.

Updated to push the for_each_rcu_flavor() loop into
rcu_oom_notify_cpu(), thus reducing the number of IPIs, as suggested by
Steven Rostedt.  Also updated to make the for_each_online_cpu() loop
preemptible.  (Later, it might be good to use smp_call_function(), as
suggested by Peter Zijlstra.)

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Sasha Levin <levinsasha928@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
---
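(Background note, not part of the commit: the patch hangs off the
kernel's OOM notifier chain.  Below is a minimal, self-contained sketch
of that registration pattern, written as a toy module.  The demo_* names
are invented for illustration; register_oom_notifier(),
unregister_oom_notifier(), and struct notifier_block are the actual
interfaces the patch uses.)

/*
 * Toy module sketching the OOM-notifier registration pattern; the
 * demo_* names are hypothetical.
 */
#include <linux/module.h>
#include <linux/notifier.h>
#include <linux/oom.h>

static int demo_oom_notify(struct notifier_block *self,
			   unsigned long unused, void *nfreed)
{
	/*
	 * Invoked on the OOM path.  A handler that frees pages directly
	 * would increment *(unsigned long *)nfreed; like the patch
	 * below, this one frees nothing directly, so it leaves the
	 * count untouched.
	 */
	pr_info("demo: OOM notifier invoked\n");
	return NOTIFY_OK;
}

static struct notifier_block demo_oom_nb = {
	.notifier_call = demo_oom_notify,
};

static int __init demo_init(void)
{
	return register_oom_notifier(&demo_oom_nb);
}

static void __exit demo_exit(void)
{
	unregister_oom_notifier(&demo_oom_nb);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");

(Leaving *nfreed untouched is deliberate: as the commit log says,
because an uncertain amount of memory will be freed in some uncertain
timeframe, the handler does not claim to have freed anything.)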
 kernel/rcutree.h        |    5 ++-
 kernel/rcutree_plugin.h |   83 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 87 insertions(+), 1 deletions(-)

diff --git a/kernel/rcutree.h b/kernel/rcutree.h
index 117a150..effb273 100644
--- a/kernel/rcutree.h
+++ b/kernel/rcutree.h
@@ -315,8 +315,11 @@ struct rcu_data {
 	unsigned long n_rp_need_fqs;
 	unsigned long n_rp_need_nothing;
 
-	/* 6) _rcu_barrier() callback. */
+	/* 6) _rcu_barrier() and OOM callbacks. */
 	struct rcu_head barrier_head;
+#ifdef CONFIG_RCU_FAST_NO_HZ
+	struct rcu_head oom_head;
+#endif /* #ifdef CONFIG_RCU_FAST_NO_HZ */
 
 	int cpu;
 	struct rcu_state *rsp;
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index 7f3244c..5879636 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -25,6 +25,7 @@
  */
 
 #include <linux/delay.h>
+#include <linux/oom.h>
 
 #define RCU_KTHREAD_PRIO 1
 
@@ -2112,6 +2113,88 @@ static void rcu_idle_count_callbacks_posted(void)
 	__this_cpu_add(rcu_dynticks.nonlazy_posted, 1);
 }
 
+/*
+ * Data for flushing lazy RCU callbacks at OOM time.
+ */
+static atomic_t oom_callback_count;
+static DECLARE_WAIT_QUEUE_HEAD(oom_callback_wq);
+
+/*
+ * RCU OOM callback -- decrement the outstanding count and deliver the
+ * wake-up if we are the last one.
+ */
+static void rcu_oom_callback(struct rcu_head *rhp)
+{
+	if (atomic_dec_and_test(&oom_callback_count))
+		wake_up(&oom_callback_wq);
+}
+
+/*
+ * Post an rcu_oom_notify callback on the current CPU if it has at
+ * least one lazy callback.  This will unnecessarily post callbacks
+ * to CPUs that already have a non-lazy callback at the end of their
+ * callback list, but this is an infrequent operation, so accept some
+ * extra overhead to keep things simple.
+ */
+static void rcu_oom_notify_cpu(void *unused)
+{
+	struct rcu_state *rsp;
+	struct rcu_data *rdp;
+
+	for_each_rcu_flavor(rsp) {
+		rdp = __this_cpu_ptr(rsp->rda);
+		if (rdp->qlen_lazy != 0) {
+			atomic_inc(&oom_callback_count);
+			rsp->call(&rdp->oom_head, rcu_oom_callback);
+		}
+	}
+}
+
+/*
+ * If low on memory, ensure that each CPU has a non-lazy callback.
+ * This will wake up CPUs that have only lazy callbacks, in turn
+ * ensuring that they free up the corresponding memory in a timely
+ * manner.  Because an uncertain amount of memory will be freed in
+ * some uncertain timeframe, we do not claim to have freed anything.
+ */
+static int rcu_oom_notify(struct notifier_block *self,
+			  unsigned long notused, void *nfreed)
+{
+	int cpu;
+
+	/* Wait for callbacks from earlier instance to complete. */
+	wait_event(oom_callback_wq, atomic_read(&oom_callback_count) == 0);
+
+	/*
+	 * Prevent premature wakeup: ensure that all increments happen
+	 * before there is a chance of the counter reaching zero.
+	 */
+	atomic_set(&oom_callback_count, 1);
+
+	get_online_cpus();
+	for_each_online_cpu(cpu) {
+		smp_call_function_single(cpu, rcu_oom_notify_cpu, NULL, 1);
+		cond_resched();
+	}
+	put_online_cpus();
+
+	/* Unconditionally decrement: no need to wake ourselves up. */
+	atomic_dec(&oom_callback_count);
+
+	return NOTIFY_OK;
+}
+
+static struct notifier_block rcu_oom_nb = {
+	.notifier_call = rcu_oom_notify
+};
+
+static int __init rcu_register_oom_notifier(void)
+{
+	register_oom_notifier(&rcu_oom_nb);
+	return 0;
+}
+early_initcall(rcu_register_oom_notifier);
+
 #endif /* #else #if !defined(CONFIG_RCU_FAST_NO_HZ) */
 
 #ifdef CONFIG_RCU_CPU_STALL_INFO
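(A further background note on the oom_callback_count handshake:
rcu_oom_notify() starts the counter at one so that the count cannot
reach zero, and thus cannot deliver a wakeup, until every per-CPU
increment has been posted; it then drops that initial reference at the
end.  This is the usual "bias count" completion idiom.  A standalone
userspace C11 sketch of the same idiom, with all names invented for
illustration, compiled with -pthread:)

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_int callback_count;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t done = PTHREAD_COND_INITIALIZER;

static void *worker(void *arg)	/* stands in for rcu_oom_callback() */
{
	(void)arg;
	/* Last decrementer delivers the wakeup, as in the patch. */
	if (atomic_fetch_sub(&callback_count, 1) == 1) {
		pthread_mutex_lock(&lock);
		pthread_cond_broadcast(&done);
		pthread_mutex_unlock(&lock);
	}
	return NULL;
}

int main(void)
{
	pthread_t tid[4];
	int i;

	/* Bias the count so it cannot hit zero prematurely. */
	atomic_store(&callback_count, 1);
	for (i = 0; i < 4; i++) {
		atomic_fetch_add(&callback_count, 1);
		pthread_create(&tid[i], NULL, worker, NULL);
	}
	/* Drop the bias; only now can the count reach zero. */
	atomic_fetch_sub(&callback_count, 1);

	/* Analogue of the wait_event() at rcu_oom_notify() entry. */
	pthread_mutex_lock(&lock);
	while (atomic_load(&callback_count) != 0)
		pthread_cond_wait(&done, &lock);
	pthread_mutex_unlock(&lock);

	printf("all callbacks completed\n");
	for (i = 0; i < 4; i++)
		pthread_join(tid[i], NULL);
	return 0;
}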