[tip/core/rcu] Make RCU_FAST_NO_HZ respect nohz= boot parameter

Message ID 20120705223731.GA27608@linux.vnet.ibm.com
State Accepted
Commit 9d2ad24306f2fafc3612e5a216aab31f9e56e879

Commit Message

Paul E. McKenney July 5, 2012, 10:37 p.m. UTC
If the nohz= boot parameter disables nohz, then RCU_FAST_NO_HZ needs to
also disable itself.  This commit therefore checks for tick_nohz_enabled
being zero, disabling rcu_prepare_for_idle() if so.  This patch assumes
that tick_nohz_enabled can change at runtime: If this is not the case,
then a simpler approach suffices.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

Comments

Josh Triplett July 5, 2012, 11:02 p.m. UTC | #1
On Thu, Jul 05, 2012 at 03:37:31PM -0700, Paul E. McKenney wrote:
> If the nohz= boot parameter disables nohz, then RCU_FAST_NO_HZ needs to
> also disable itself.  This commit therefore checks for tick_nohz_enabled
> being zero, disabling rcu_prepare_for_idle() if so.  This patch assumes
> that tick_nohz_enabled can change at runtime: If this is not the case,
> then a simpler approach suffices.

Allowing nohz to change at runtime seems like an entirely unnecessary
bit of added complexity.  (So does having a boot parameter for it, but
that one at least seems easier to handle.)

What would the patch look like if you can assume nohz will never change
at runtime?  And does anyone have a use case for changing nohz at
runtime, rather than at boot time?

- Josh Triplett
Paul E. McKenney July 6, 2012, 12:29 a.m. UTC | #2
On Thu, Jul 05, 2012 at 04:02:08PM -0700, Josh Triplett wrote:
> On Thu, Jul 05, 2012 at 03:37:31PM -0700, Paul E. McKenney wrote:
> > If the nohz= boot parameter disables nohz, then RCU_FAST_NO_HZ needs to
> > also disable itself.  This commit therefore checks for tick_nohz_enabled
> > being zero, disabling rcu_prepare_for_idle() if so.  This patch assumes
> > that tick_nohz_enabled can change at runtime: If this is not the case,
> > then a simpler approach suffices.
> 
> Allowing nohz to change at runtime seems like an entirely unnecessary
> bit of added complexity.  (So does having a boot parameter for it, but
> that one at least seems easier to handle.)

I will let representatives from the various distros expound to you on
their one-binary-only strategy for kernel builds.  ;-)

> What would the patch look like if you can assume nohz will never change
> at runtime?  And does anyone have a use case for changing nohz at
> runtime, rather than at boot time?

It would be a little bit simpler, but would break in very odd and
difficult-to-debug ways if anyone ever did allow it to change at runtime,
for example, to accommodate systems subject to varying workloads.

							Thanx, Paul
Josh Triplett July 6, 2012, 3:46 a.m. UTC | #3
On Thu, Jul 05, 2012 at 05:29:44PM -0700, Paul E. McKenney wrote:
> On Thu, Jul 05, 2012 at 04:02:08PM -0700, Josh Triplett wrote:
> > On Thu, Jul 05, 2012 at 03:37:31PM -0700, Paul E. McKenney wrote:
> > > If the nohz= boot parameter disables nohz, then RCU_FAST_NO_HZ needs to
> > > also disable itself.  This commit therefore checks for tick_nohz_enabled
> > > being zero, disabling rcu_prepare_for_idle() if so.  This patch assumes
> > > that tick_nohz_enabled can change at runtime: If this is not the case,
> > > then a simpler approach suffices.
> > 
> > Allowing nohz to change at runtime seems like an entirely unnecessary
> > bit of added complexity.  (So does having a boot parameter for it, but
> > that one at least seems easier to handle.)
> 
> I will let representatives from the various distros expound to you on
> their one-binary-only strategy for kernel builds.  ;-)

I'm aware.  However, the subset of people wanting to turn off nohz seems
sufficiently small at this point that I'd *hope* distro kernels could
just always have it turned on. :)

In any case, as I said, the ability to change it at runtime seems like
the primary bit of complexity; the ability to change it at boot time
seems straightforward to handle.

> > What would the patch look like if you can assume nohz will never change
> > at runtime?  And does anyone have a use case for changing nohz at
> > runtime, rather than at boot time?
> 
> It would be a little bit simpler, but would break in very odd and
> difficult-to-debug ways if anyone ever did allow it to change at runtime,
> for example, to accommodate systems subject to varying workloads.

Granted, but it doesn't seem worth preemptively making RCU more
complicated to accommodate a use case that nobody has said they have yet.
:)

- Josh Triplett
Paul E. McKenney July 6, 2012, 12:52 p.m. UTC | #4
On Thu, Jul 05, 2012 at 08:46:48PM -0700, Josh Triplett wrote:
> On Thu, Jul 05, 2012 at 05:29:44PM -0700, Paul E. McKenney wrote:
> > On Thu, Jul 05, 2012 at 04:02:08PM -0700, Josh Triplett wrote:
> > > On Thu, Jul 05, 2012 at 03:37:31PM -0700, Paul E. McKenney wrote:
> > > > If the nohz= boot parameter disables nohz, then RCU_FAST_NO_HZ needs to
> > > > also disable itself.  This commit therefore checks for tick_nohz_enabled
> > > > being zero, disabling rcu_prepare_for_idle() if so.  This patch assumes
> > > > that tick_nohz_enabled can change at runtime: If this is not the case,
> > > > then a simpler approach suffices.
> > > 
> > > Allowing nohz to change at runtime seems like an entirely unnecessary
> > > bit of added complexity.  (So does having a boot parameter for it, but
> > > that one at least seems easier to handle.)
> > 
> > I will let representatives from the various distros expound to you on
> > their one-binary-only strategy for kernel builds.  ;-)
> 
> I'm aware.  However, the subset of people wanting to turn off nohz seems
> sufficiently small at this point that I'd *hope* distro kernels could
> just always have it turned on. :)

The people who want it turned off are of course those who care deeply
about latency from idle, which appears to be a not-unimportant group.
My best guess is that a strong desire to switch this at runtime will
come from cloud-computing people who need to support those who care
deeply about latency from idle, but who want to conserve energy when
running other workloads.

> In any case, as I said, the ability to change it at runtime seems like
> the primary bit of complexity; the ability to change it at boot time
> seems straightforward to handle.

I would need to at least have a warning, which ends up retaining most
of the code.  Otherwise, my first notification of the change is "Hey,
we have this weird and difficult-to-reproduce slowdown (or even hang)."
Of course, this will be at a time when this section of code is the
absolute last thing on my mind...

> > > What would the patch look like if you can assume nohz will never change
> > > at runtime?  And does anyone have a use case for changing nohz at
> > > runtime, rather than at boot time?
> > 
> > It would be a little bit simpler, but would break in very odd and
> > difficult-to-debug ways if anyone ever did allow it to change at runtime,
> > for example, to accommodate systems subject to varying workloads.
> 
> > Granted, but it doesn't seem worth preemptively making RCU more
> > complicated to accommodate a use case that nobody has said they have yet.
> :)

Speaking as the guy who repeatedly put off reducing RCU grace-period
initialization latency until someone actually complained about it,
despite my being quite certain from the get-go that someone was bound
to complain sooner or later, I can certainly appreciate and identify
with that sentiment.

But in the grace-period initialization latency case, the complexity
of the fix was large, and the problem-isolation procedure extremely
straightforward, courtesy of things like the latency tracer.
In contrast, in the case at hand, the incremental complexity is quite
small and the difficulty of spotting the problem is likely to be quite large.

Hence my taking a different approach to these two situations.

							Thanx, Paul
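
For comparison, here is a rough userspace sketch of the boot-time-only alternative discussed in this thread, with the warning Paul says he would want. This is not code from the thread or from the kernel: `nohz_enabled_model`, `nohz_boot_snap`, `boot_init_model()`, and `prepare_for_idle_boot_only()` are hypothetical names, and `fprintf()` stands in for a kernel warning. It is meant only to illustrate why adding the warning keeps most of the snapshot-and-compare logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for tick_nohz_enabled; nonzero means nohz is on. */
static volatile int nohz_enabled_model = 1;

static int nohz_boot_snap;   /* hypothetical: value captured once at boot */
static bool warned;          /* emit the warning only once */
static int warnings_emitted; /* lets callers observe the warning */

/* Capture the setting once, assuming it never changes afterwards. */
static void boot_init_model(void)
{
	int tne = nohz_enabled_model;

	assert(tne == 0 || tne == 1);
	nohz_boot_snap = tne;
}

/* Returns true if the RCU_FAST_NO_HZ idle preparation should proceed. */
static bool prepare_for_idle_boot_only(void)
{
	int tne = nohz_enabled_model; /* single read of the flag */

	/* The warning still needs a snapshot and a comparison. */
	if (tne != nohz_boot_snap && !warned) {
		warned = true;
		warnings_emitted++;
		fprintf(stderr, "nohz setting changed after boot\n");
	}

	/* Behavior follows the boot-time value only. */
	return nohz_boot_snap != 0;
}
```

Without the warning, the body shrinks to a single test of the flag, but then a later runtime change would go unnoticed, which is the failure mode described above.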

Patch

diff --git a/kernel/rcutree.h b/kernel/rcutree.h
index 19b61ac..e978845 100644
--- a/kernel/rcutree.h
+++ b/kernel/rcutree.h
@@ -97,6 +97,7 @@  struct rcu_dynticks {
 				    /* # times non-lazy CBs posted to CPU. */
 	unsigned long nonlazy_posted_snap;
 				    /* idle-period nonlazy_posted snapshot. */
+	int tick_nohz_enabled_snap; /* Previously seen value from sysfs. */
 #endif /* #ifdef CONFIG_RCU_FAST_NO_HZ */
 };
 
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index c28d255..3508000 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -1971,6 +1971,8 @@  static void rcu_idle_count_callbacks_posted(void)
 #define RCU_IDLE_GP_DELAY 4		/* Roughly one grace period. */
 #define RCU_IDLE_LAZY_GP_DELAY (6 * HZ)	/* Roughly six seconds. */
 
+extern int tick_nohz_enabled;
+
 /*
  * Does the specified flavor of RCU have non-lazy callbacks pending on
  * the specified CPU?  Both RCU flavor and CPU are specified by the
@@ -2112,6 +2114,7 @@  static void rcu_cleanup_after_idle(int cpu)
 
 	del_timer(&rdtp->idle_gp_timer);
 	trace_rcu_prep_idle("Cleanup after idle");
+	rdtp->tick_nohz_enabled_snap = ACCESS_ONCE(tick_nohz_enabled);
 }
 
 /*
@@ -2137,6 +2140,18 @@  static void rcu_prepare_for_idle(int cpu)
 {
 	struct timer_list *tp;
 	struct rcu_dynticks *rdtp = &per_cpu(rcu_dynticks, cpu);
+	int tne;
+
+	/* Handle nohz enablement switches conservatively. */
+	tne = ACCESS_ONCE(tick_nohz_enabled);
+	if (tne != rdtp->tick_nohz_enabled_snap) {
+		if (rcu_cpu_has_callbacks(cpu))
+			invoke_rcu_core(); /* force nohz to see update. */
+		rdtp->tick_nohz_enabled_snap = tne;
+		return;
+	}
+	if (!tne)
+		return;
 
 	/*
 	 * If this is an idle re-entry, for example, due to use of
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 8699978..66ff07f 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -105,7 +105,7 @@  static ktime_t tick_init_jiffy_update(void)
 /*
  * NO HZ enabled ?
  */
-static int tick_nohz_enabled __read_mostly  = 1;
+int tick_nohz_enabled __read_mostly  = 1;
 
 /*
  * Enable / Disable tickless mode