[RFC,tip/core/rcu,14/41] rcu: Limit lazy-callback duration

Message ID 1328125319-5205-14-git-send-email-paulmck@linux.vnet.ibm.com
State New

Commit Message

Paul E. McKenney Feb. 1, 2012, 7:41 p.m.
From: "Paul E. McKenney" <paul.mckenney@linaro.org>

Currently, a given CPU is permitted to remain in dyntick-idle mode
indefinitely if it has only lazy RCU callbacks queued.  This is vulnerable
to corner cases in NUMA systems, so limit the time to six seconds by
default.  (Currently controlled by a cpp macro.)

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcutree_plugin.h |   12 +++++++++++-
 1 files changed, 11 insertions(+), 1 deletions(-)

Comments

Josh Triplett Feb. 2, 2012, 2:03 a.m. | #1
On Wed, Feb 01, 2012 at 11:41:32AM -0800, Paul E. McKenney wrote:
> Currently, a given CPU is permitted to remain in dyntick-idle mode
> indefinitely if it has only lazy RCU callbacks queued.  This is vulnerable
> to corner cases in NUMA systems, so limit the time to six seconds by
> default.  (Currently controlled by a cpp macro.)

I wonder: should this scale with the number of callbacks, or do we not
want to make estimates about memory usage based on that?

Interestingly, with kfree_rcu, we actually know at callback queuing time
*exactly* how much memory we'll get back by calling the callback, and we
could sum up those numbers.

- Josh Triplett

Paul E. McKenney Feb. 2, 2012, 5:13 p.m. | #2
On Wed, Feb 01, 2012 at 06:03:56PM -0800, Josh Triplett wrote:
> On Wed, Feb 01, 2012 at 11:41:32AM -0800, Paul E. McKenney wrote:
> > Currently, a given CPU is permitted to remain in dyntick-idle mode
> > indefinitely if it has only lazy RCU callbacks queued.  This is vulnerable
> > to corner cases in NUMA systems, so limit the time to six seconds by
> > default.  (Currently controlled by a cpp macro.)
> 
> I wonder: should this scale with the number of callbacks, or do we not
> want to make estimates about memory usage based on that?

Interesting.  Which way would you scale it?  ;-)

> Interestingly, with kfree_rcu, we actually know at callback queuing time
> *exactly* how much memory we'll get back by calling the callback, and we
> could sum up those numbers.

We can indeed calculate for kfree_rcu(), but we won't be able to for
call_rcu_lazy(), which is my current approach for cases where you cannot
use kfree_rcu() due to (for example) freeing up a linked structure.
A very large fraction of the call_rcu()s in the kernel could become
call_rcu_lazy().

At some point in the future, it might make sense to tie into the
low-memory notifier, which could potentially allow the longer timeout
to be omitted.

My current guess is that the recent change allowing idle CPUs to
exhaust their callback lists will make this kind of fine-tuning
unnecessary, but we will see!

							Thanx, Paul

Josh Triplett Feb. 3, 2012, 4:07 a.m. | #3
On Thu, Feb 02, 2012 at 09:13:42AM -0800, Paul E. McKenney wrote:
> On Wed, Feb 01, 2012 at 06:03:56PM -0800, Josh Triplett wrote:
> > On Wed, Feb 01, 2012 at 11:41:32AM -0800, Paul E. McKenney wrote:
> > > Currently, a given CPU is permitted to remain in dyntick-idle mode
> > > indefinitely if it has only lazy RCU callbacks queued.  This is vulnerable
> > > to corner cases in NUMA systems, so limit the time to six seconds by
> > > default.  (Currently controlled by a cpp macro.)
> > 
> > I wonder: should this scale with the number of callbacks, or do we not
> > want to make estimates about memory usage based on that?
> 
> Interesting.  Which way would you scale it?  ;-)

Heh, I'd figured "don't wait too long if you have a giant pile of
callbacks", but I can see how the other direction could make sense as
well. :)

> > Interestingly, with kfree_rcu, we actually know at callback queuing time
> > *exactly* how much memory we'll get back by calling the callback, and we
> > could sum up those numbers.
> 
> We can indeed calculate for kfree_rcu(), but we won't be able to for
> call_rcu_lazy(), which is my current approach for cases where you cannot
> use kfree_rcu() due to (for example) freeing up a linked structure.
> A very large fraction of the call_rcu()s in the kernel could become
> call_rcu_lazy().

So, doing anything other than freeing memory makes a callback non-lazy?
Based on that, I'd find it at least somewhat surprising if any of the
current callers of call_rcu (other than synchronize_rcu() and similar)
had non-lazy callbacks.

> At some point in the future, it might make sense to tie into the
> low-memory notifier, which could potentially allow the longer timeout
> to be omitted.

Exactly the kind of thing that made me wonder about tracking the actual
amount of memory to free.  Still seems like a potentially useful
statistic to track on its own.

> My current guess is that the recent change allowing idle CPUs to
> exhaust their callback lists will make this kind of fine-tuning
> unnecessary, but we will see!

Good point; given that fix, idle CPUs should never need to wake up for
callbacks at all.

- Josh Triplett

Paul E. McKenney Feb. 3, 2012, 5:54 a.m. | #4
On Thu, Feb 02, 2012 at 08:07:51PM -0800, Josh Triplett wrote:
> On Thu, Feb 02, 2012 at 09:13:42AM -0800, Paul E. McKenney wrote:
> > On Wed, Feb 01, 2012 at 06:03:56PM -0800, Josh Triplett wrote:
> > > On Wed, Feb 01, 2012 at 11:41:32AM -0800, Paul E. McKenney wrote:
> > > > Currently, a given CPU is permitted to remain in dyntick-idle mode
> > > > indefinitely if it has only lazy RCU callbacks queued.  This is vulnerable
> > > > to corner cases in NUMA systems, so limit the time to six seconds by
> > > > default.  (Currently controlled by a cpp macro.)
> > > 
> > > I wonder: should this scale with the number of callbacks, or do we not
> > > want to make estimates about memory usage based on that?
> > 
> > Interesting.  Which way would you scale it?  ;-)
> 
> Heh, I'd figured "don't wait too long if you have a giant pile of
> callbacks", but I can see how the other direction could make sense as
> well. :)

;-)

> > > Interestingly, with kfree_rcu, we actually know at callback queuing time
> > > *exactly* how much memory we'll get back by calling the callback, and we
> > > could sum up those numbers.
> > 
> > We can indeed calculate for kfree_rcu(), but we won't be able to for
> > call_rcu_lazy(), which is my current approach for cases where you cannot
> > use kfree_rcu() due to (for example) freeing up a linked structure.
> > A very large fraction of the call_rcu()s in the kernel could become
> > call_rcu_lazy().
> 
> So, doing anything other than freeing memory makes a callback non-lazy?
> Based on that, I'd find it at least somewhat surprising if any of the
> current callers of call_rcu (other than synchronize_rcu() and similar)
> had non-lazy callbacks.

Yep!  But the caller has to tell me.

Something like 90% of the call_rcu()s could be call_rcu_lazy(), but there
are a significant number that wake someone up, manipulate a reference
counter that someone else is paying attention to, etc.

> > At some point in the future, it might make sense to tie into the
> > low-memory notifier, which could potentially allow the longer timeout
> > to be omitted.
> 
> Exactly the kind of thing that made me wonder about tracking the actual
> amount of memory to free.  Still seems like a potentially useful
> statistic to track on its own.

There is the qlen statistic in the debugfs tracing, tracked on a per-CPU
basis.  But unless it is kfree_rcu(), I have no way to tell how much
memory a given callback frees.
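
[Editorial sketch, not part of the original message.]  The accounting
discussed above can be illustrated with a small user-space model.  This
is only a sketch under stated assumptions: every name in it
(model_cb_stats, model_enqueue(), model_only_lazy(), the MODEL_CB_*
kinds) is hypothetical and does not exist in the kernel.  It assumes that
a kfree_rcu()-style callback can report its object size at queuing time,
while any other lazy callback has an unknown footprint.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU statistics; only qlen is mentioned in the thread. */
struct model_cb_stats {
	long qlen;		/* All queued callbacks. */
	long qlen_lazy;		/* Callbacks the caller declared lazy. */
	size_t known_bytes;	/* Bytes known to be freed (kfree-style only). */
	long unknown_size;	/* Lazy callbacks with unknown footprint. */
};

/* Hypothetical callback kinds; the caller has to say which one applies. */
enum model_cb_kind {
	MODEL_CB_NONLAZY,	/* Wakes someone up, drops a reference, etc. */
	MODEL_CB_LAZY_OPAQUE,	/* Only frees memory, but size is unknown. */
	MODEL_CB_KFREE,		/* kfree-style: size known when queued. */
};

/* Account for one newly queued callback. */
static void model_enqueue(struct model_cb_stats *s,
			  enum model_cb_kind kind, size_t obj_size)
{
	s->qlen++;
	if (kind == MODEL_CB_NONLAZY)
		return;
	s->qlen_lazy++;
	if (kind == MODEL_CB_KFREE)
		s->known_bytes += obj_size;
	else
		s->unknown_size++;
}

/* True if callbacks are queued and every one of them is lazy. */
static bool model_only_lazy(const struct model_cb_stats *s)
{
	return s->qlen > 0 && s->qlen == s->qlen_lazy;
}
```

In this model, known_bytes is a lower bound on the memory held by lazy
callbacks, and a nonzero unknown_size marks the bound as incomplete,
which is the limitation described above.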

> > My current guess is that the recent change allowing idle CPUs to
> > exhaust their callback lists will make this kind of fine-tuning
> > unnecessary, but we will see!
> 
> Good point; given that fix, idle CPUs should never need to wake up for
> callbacks at all.

Here is hoping!  ;-)

							Thanx, Paul
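
[Editorial sketch, not part of the original thread.]  The timer choice
that the patch below makes can be modeled in user space as follows.  This
is a minimal sketch under stated assumptions: the MODEL_* names and both
functions are hypothetical, and the tick rate is fixed at 1000 purely for
illustration (the kernel's HZ is a build-time setting).  With that tick
rate, six jiffies come to about 6 ms and 6 * HZ jiffies to about six
seconds.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: tick rate of 1000 per second, for illustration only. */
#define MODEL_HZ 1000U
#define MODEL_IDLE_GP_DELAY 6U				/* Jiffies. */
#define MODEL_IDLE_LAZY_GP_DELAY (6U * MODEL_HZ)	/* Jiffies. */

/* Convert jiffies to nanoseconds at the model's tick rate. */
static uint64_t model_jiffies_to_ns(unsigned int j)
{
	return (uint64_t)j * (1000000000ULL / MODEL_HZ);
}

/*
 * Pick the idle-timer interval: the short delay if any non-lazy
 * callback is queued, otherwise the long lazy delay.
 */
static uint64_t model_idle_timer_ns(bool has_nonlazy_callbacks)
{
	if (has_nonlazy_callbacks)
		return model_jiffies_to_ns(MODEL_IDLE_GP_DELAY);
	return model_jiffies_to_ns(MODEL_IDLE_LAZY_GP_DELAY);
}
```

Calling model_idle_timer_ns(true) yields the short interval and
model_idle_timer_ns(false) the long one, mirroring the if/else around
hrtimer_start() in the patch.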

Patch

diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index 2b183ab..2fde647 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -2049,6 +2049,9 @@  static void rcu_prepare_for_idle(int cpu)
  *	number, be warned: Setting RCU_IDLE_GP_DELAY too high can hang your
  *	system.  And if you are -that- concerned about energy efficiency,
  *	just power the system down and be done with it!
+ * RCU_IDLE_LAZY_GP_DELAY gives the number of jiffies that a CPU is
+ *	permitted to sleep in dyntick-idle mode with only lazy RCU
+ *	callbacks pending.  Setting this too high can OOM your system.
  *
  * The values below work well in practice.  If future workloads require
  * adjustment, they can be converted into kernel config parameters, though
@@ -2057,11 +2060,13 @@  static void rcu_prepare_for_idle(int cpu)
 #define RCU_IDLE_FLUSHES 5		/* Number of dyntick-idle tries. */
 #define RCU_IDLE_OPT_FLUSHES 3		/* Optional dyntick-idle tries. */
 #define RCU_IDLE_GP_DELAY 6		/* Roughly one grace period. */
+#define RCU_IDLE_LAZY_GP_DELAY (6 * HZ)	/* Roughly six seconds. */
 
 static DEFINE_PER_CPU(int, rcu_dyntick_drain);
 static DEFINE_PER_CPU(unsigned long, rcu_dyntick_holdoff);
 static DEFINE_PER_CPU(struct hrtimer, rcu_idle_gp_timer);
-static ktime_t rcu_idle_gp_wait;
+static ktime_t rcu_idle_gp_wait;	/* If some non-lazy callbacks. */
+static ktime_t rcu_idle_lazy_gp_wait;	/* If only lazy callbacks. */
 
 /*
  * Allow the CPU to enter dyntick-idle mode if either: (1) There are no
@@ -2150,6 +2155,8 @@  static void rcu_prepare_for_idle_init(int cpu)
 		unsigned int upj = jiffies_to_usecs(RCU_IDLE_GP_DELAY);
 
 		rcu_idle_gp_wait = ns_to_ktime(upj * (u64)1000);
+		upj = jiffies_to_usecs(RCU_IDLE_LAZY_GP_DELAY);
+		rcu_idle_lazy_gp_wait = ns_to_ktime(upj * (u64)1000);
 		firsttime = 0;
 	}
 }
@@ -2224,6 +2231,9 @@  static void rcu_prepare_for_idle(int cpu)
 		if (rcu_cpu_has_nonlazy_callbacks(cpu))
 			hrtimer_start(&per_cpu(rcu_idle_gp_timer, cpu),
 				      rcu_idle_gp_wait, HRTIMER_MODE_REL);
+		else
+			hrtimer_start(&per_cpu(rcu_idle_gp_timer, cpu),
+				      rcu_idle_lazy_gp_wait, HRTIMER_MODE_REL);
 		return; /* Nothing more to do immediately. */
 	} else if (--per_cpu(rcu_dyntick_drain, cpu) <= 0) {
 		/* We have hit the limit, so time to give up. */