Message ID | 20120705223731.GA27608@linux.vnet.ibm.com |
---|---|
State | Accepted |
Commit | 9d2ad24306f2fafc3612e5a216aab31f9e56e879 |
On Thu, Jul 05, 2012 at 03:37:31PM -0700, Paul E. McKenney wrote:
> If the nohz= boot parameter disables nohz, then RCU_FAST_NO_HZ needs to
> also disable itself. This commit therefore checks for tick_nohz_enabled
> being zero, disabling rcu_prepare_for_idle() if so. This patch assumes
> that tick_nohz_enabled can change at runtime: If this is not the case,
> then a simpler approach suffices.

Allowing nohz to change at runtime seems like an entirely unnecessary
bit of added complexity. (So does having a boot parameter for it, but
that one at least seems easier to handle.)

What would the patch look like if you can assume nohz will never change
at runtime? And does anyone have a use case for changing nohz at
runtime, rather than at boot time?

- Josh Triplett

On Thu, Jul 05, 2012 at 04:02:08PM -0700, Josh Triplett wrote:
> On Thu, Jul 05, 2012 at 03:37:31PM -0700, Paul E. McKenney wrote:
> > If the nohz= boot parameter disables nohz, then RCU_FAST_NO_HZ needs to
> > also disable itself. This commit therefore checks for tick_nohz_enabled
> > being zero, disabling rcu_prepare_for_idle() if so. This patch assumes
> > that tick_nohz_enabled can change at runtime: If this is not the case,
> > then a simpler approach suffices.
>
> Allowing nohz to change at runtime seems like an entirely unnecessary
> bit of added complexity. (So does having a boot parameter for it, but
> that one at least seems easier to handle.)

I will let representatives from the various distros expound to you on
their one-binary-only strategy for kernel builds. ;-)

> What would the patch look like if you can assume nohz will never change
> at runtime? And does anyone have a use case for changing nohz at
> runtime, rather than at boot time?

It would be a little bit simpler, but would break in very odd and
difficult-to-debug ways if anyone ever did allow it to change at runtime,
for example, to accommodate systems subject to varying workloads.

Thanx, Paul

On Thu, Jul 05, 2012 at 05:29:44PM -0700, Paul E. McKenney wrote:
> On Thu, Jul 05, 2012 at 04:02:08PM -0700, Josh Triplett wrote:
> > On Thu, Jul 05, 2012 at 03:37:31PM -0700, Paul E. McKenney wrote:
> > > If the nohz= boot parameter disables nohz, then RCU_FAST_NO_HZ needs to
> > > also disable itself. This commit therefore checks for tick_nohz_enabled
> > > being zero, disabling rcu_prepare_for_idle() if so. This patch assumes
> > > that tick_nohz_enabled can change at runtime: If this is not the case,
> > > then a simpler approach suffices.
> >
> > Allowing nohz to change at runtime seems like an entirely unnecessary
> > bit of added complexity. (So does having a boot parameter for it, but
> > that one at least seems easier to handle.)
>
> I will let representatives from the various distros expound to you on
> their one-binary-only strategy for kernel builds. ;-)

I'm aware. However, the subset of people wanting to turn off nohz seems
sufficiently small at this point that I'd *hope* distro kernels could
just always have it turned on. :)

In any case, as I said, the ability to change it at runtime seems like
the primary bit of complexity; the ability to change it at boot time
seems straightforward to handle.

> > What would the patch look like if you can assume nohz will never change
> > at runtime? And does anyone have a use case for changing nohz at
> > runtime, rather than at boot time?
>
> It would be a little bit simpler, but would break in very odd and
> difficult-to-debug ways if anyone ever did allow it to change at runtime,
> for example, to accommodate systems subject to varying workloads.

Granted, but it doesn't seem worth preemptively making RCU more
complicated to accommodate a use case that nobody has said they have yet.
:)

- Josh Triplett

On Thu, Jul 05, 2012 at 08:46:48PM -0700, Josh Triplett wrote:
> On Thu, Jul 05, 2012 at 05:29:44PM -0700, Paul E. McKenney wrote:
> > On Thu, Jul 05, 2012 at 04:02:08PM -0700, Josh Triplett wrote:
> > > On Thu, Jul 05, 2012 at 03:37:31PM -0700, Paul E. McKenney wrote:
> > > > If the nohz= boot parameter disables nohz, then RCU_FAST_NO_HZ needs to
> > > > also disable itself. This commit therefore checks for tick_nohz_enabled
> > > > being zero, disabling rcu_prepare_for_idle() if so. This patch assumes
> > > > that tick_nohz_enabled can change at runtime: If this is not the case,
> > > > then a simpler approach suffices.
> > >
> > > Allowing nohz to change at runtime seems like an entirely unnecessary
> > > bit of added complexity. (So does having a boot parameter for it, but
> > > that one at least seems easier to handle.)
> >
> > I will let representatives from the various distros expound to you on
> > their one-binary-only strategy for kernel builds. ;-)
>
> I'm aware. However, the subset of people wanting to turn off nohz seems
> sufficiently small at this point that I'd *hope* distro kernels could
> just always have it turned on. :)

The people who want it turned on are of course those who care deeply
about latency from idle, which appears to be a not-unimportant group.

My best guess is that a strong desire to switch this at runtime will
come from cloud-computing people who need to support those who care
deeply about latency from idle, but who want to conserve energy when
running other workloads.

> In any case, as I said, the ability to change it at runtime seems like
> the primary bit of complexity; the ability to change it at boot time
> seems straightforward to handle.

I would need to at least have a warning, which ends up retaining most
of the code. Otherwise, my first notification of the change is "Hey, we
have this weird and difficult-to-reproduce slowdown (or even hang)."
Of course, this will be at a time when this section of code is the
absolute last thing on my mind...

> > > What would the patch look like if you can assume nohz will never change
> > > at runtime? And does anyone have a use case for changing nohz at
> > > runtime, rather than at boot time?
> >
> > It would be a little bit simpler, but would break in very odd and
> > difficult-to-debug ways if anyone ever did allow it to change at runtime,
> > for example, to accommodate systems subject to varying workloads.
>
> Granted, but it doesn't seem worth preemptively making RCU more
> complicated to accommodate a use case that nobody has said they have yet.
> :)

Speaking as the guy who repeatedly put off reducing RCU grace-period
initialization latency until someone actually complained about it,
despite my being quite certain from the get-go that someone was bound
to complain sooner or later, I can certainly appreciate and identify
with that sentiment.

But in the grace-period initialization latency case, the complexity of
the fix was large, and the problem-isolation procedure extremely
straightforward, courtesy of things like the latency tracer. In
contrast, in the case at hand, the incremental complexity is quite
small and the difficulty spotting the problem is likely to be quite
large. Hence my taking a different approach to these two situations.

Thanx, Paul

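For readers following the trade-off, here is a rough userspace sketch of the boot-time-only alternative with the warning Paul describes. It is not code from the thread: `snapshot_at_boot()`, `prepare_for_idle_boot_only()`, `nohz_boot_value`, and the `warnings` counter are hypothetical names, and the plain `volatile int` only stands in for the kernel's `tick_nohz_enabled`. It assumes the flag is latched once at boot and that a later change should be reported once rather than acted on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for the kernel flag; in this model it is just a global. */
static volatile int tick_nohz_enabled = 1;

static int nohz_boot_value;        /* hypothetical: value latched at boot */
static bool boot_snapshot_taken;   /* hypothetical: guards against early use */
static bool warned;                /* emit the warning only once */
static int warnings;               /* counts warnings, for inspection */

/* Latch the flag once, as a boot-time-only design would. */
static void snapshot_at_boot(void)
{
	nohz_boot_value = tick_nohz_enabled;
	boot_snapshot_taken = true;
	warned = false;
}

/*
 * Returns true if the fast-no-hz idle path may run.  The decision uses
 * only the boot-time value, but the flag is still read and compared on
 * every call so that an unexpected runtime change is at least reported.
 */
static bool prepare_for_idle_boot_only(void)
{
	int tne = tick_nohz_enabled; /* single read of the flag */

	assert(boot_snapshot_taken);
	if (tne != nohz_boot_value && !warned) {
		warned = true;
		warnings++;
		fprintf(stderr, "nohz changed after boot; idle logic assumes it cannot\n");
	}
	return nohz_boot_value != 0;
}
```

Even in this reduced form, the per-call read and comparison remain, which is consistent with the point that a warning "ends up retaining most of the code".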
diff --git a/kernel/rcutree.h b/kernel/rcutree.h
index 19b61ac..e978845 100644
--- a/kernel/rcutree.h
+++ b/kernel/rcutree.h
@@ -97,6 +97,7 @@ struct rcu_dynticks {
 				    /* # times non-lazy CBs posted to CPU. */
 	unsigned long nonlazy_posted_snap;
 				    /* idle-period nonlazy_posted snapshot. */
+	int tick_nohz_enabled_snap; /* Previously seen value from sysfs. */
 #endif /* #ifdef CONFIG_RCU_FAST_NO_HZ */
 };
 
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index c28d255..3508000 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -1971,6 +1971,8 @@ static void rcu_idle_count_callbacks_posted(void)
 #define RCU_IDLE_GP_DELAY 4		/* Roughly one grace period. */
 #define RCU_IDLE_LAZY_GP_DELAY (6 * HZ)	/* Roughly six seconds. */
 
+extern int tick_nohz_enabled;
+
 /*
  * Does the specified flavor of RCU have non-lazy callbacks pending on
  * the specified CPU? Both RCU flavor and CPU are specified by the
@@ -2112,6 +2114,7 @@ static void rcu_cleanup_after_idle(int cpu)
 
 	del_timer(&rdtp->idle_gp_timer);
 	trace_rcu_prep_idle("Cleanup after idle");
+	rdtp->tick_nohz_enabled_snap = ACCESS_ONCE(tick_nohz_enabled);
 }
 
 /*
@@ -2137,6 +2140,18 @@ static void rcu_prepare_for_idle(int cpu)
 {
 	struct timer_list *tp;
 	struct rcu_dynticks *rdtp = &per_cpu(rcu_dynticks, cpu);
+	int tne;
+
+	/* Handle nohz enablement switches conservatively. */
+	tne = ACCESS_ONCE(tick_nohz_enabled);
+	if (tne != rdtp->tick_nohz_enabled_snap) {
+		if (rcu_cpu_has_callbacks(cpu))
+			invoke_rcu_core(); /* force nohz to see update. */
+		rdtp->tick_nohz_enabled_snap = tne;
+		return;
+	}
+	if (!tne)
+		return;
 
 	/*
 	 * If this is an idle re-entry, for example, due to use of
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 8699978..66ff07f 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -105,7 +105,7 @@ static ktime_t tick_init_jiffy_update(void)
 /*
  * NO HZ enabled ?
  */
-static int tick_nohz_enabled __read_mostly = 1;
+int tick_nohz_enabled __read_mostly = 1;
 
 /*
  * Enable / Disable tickless mode
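
The snapshot-and-compare logic in the patch above can be modeled in userspace to see how the three outcomes arise. This is only a sketch under stated assumptions: `struct cpu_state`, `enum idle_action`, `NR_CPUS`, and the `core_kicks` counter are hypothetical stand-ins (the patch itself keeps the snapshot in `struct rcu_dynticks` and calls `invoke_rcu_core()`), and a `volatile` read approximates `ACCESS_ONCE()`.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4 /* hypothetical CPU count for this model */

/* Stand-in for the kernel's tick_nohz_enabled flag. */
static volatile int tick_nohz_enabled = 1;

/* Hypothetical per-CPU state for the model. */
struct cpu_state {
	int tick_nohz_enabled_snap; /* flag value seen at last idle entry/exit */
	bool has_callbacks;         /* stand-in for rcu_cpu_has_callbacks() */
	int core_kicks;             /* counts stand-in invoke_rcu_core() calls */
};

static struct cpu_state cpus[NR_CPUS];

enum idle_action {
	IDLE_RESYNC,    /* flag changed: resynchronize and skip this pass */
	IDLE_SKIP,      /* nohz disabled: do nothing */
	IDLE_FAST_PATH  /* nohz enabled and stable: run the fast-no-hz logic */
};

/* Models the checks added at the top of rcu_prepare_for_idle(). */
static enum idle_action prepare_for_idle(int cpu)
{
	struct cpu_state *cs;
	int tne;

	assert(cpu >= 0 && cpu < NR_CPUS);
	cs = &cpus[cpu];
	tne = tick_nohz_enabled; /* read the flag once, then use the copy */
	if (tne != cs->tick_nohz_enabled_snap) {
		if (cs->has_callbacks)
			cs->core_kicks++; /* model of invoke_rcu_core() */
		cs->tick_nohz_enabled_snap = tne;
		return IDLE_RESYNC;
	}
	if (!tne)
		return IDLE_SKIP;
	return IDLE_FAST_PATH;
}

/* Models the snapshot refresh added to rcu_cleanup_after_idle(). */
static void cleanup_after_idle(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	cpus[cpu].tick_nohz_enabled_snap = tick_nohz_enabled;
}
```

In this model, the first idle entry after the flag flips returns `IDLE_RESYNC` (kicking the core if callbacks are queued), and only later entries settle into `IDLE_SKIP` or `IDLE_FAST_PATH`, which mirrors the "conservative" handling named in the patch comment.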