
[RFC,tip/core/rcu,4/4] rcu: Ensure that RCU_FAST_NO_HZ timers expire on correct CPU

Message ID 1335197761-6577-4-git-send-email-paulmck@linux.vnet.ibm.com
State New

Commit Message

Paul E. McKenney April 23, 2012, 4:16 p.m. UTC
From: "Paul E. McKenney" <paul.mckenney@linaro.org>

Timers are subject to migration, which can lead to the following
system-hang scenario when CONFIG_RCU_FAST_NO_HZ=y:

1.	CPU 0 executes synchronize_rcu(), which posts an RCU callback.

2.	CPU 0 then goes idle.  It cannot immediately invoke the callback,
	but there is nothing RCU needs from it, so it enters dyntick-idle
	mode after posting a timer.

3.	The timer gets migrated to CPU 1.

4.	CPU 0 never wakes up, so the synchronize_rcu() never returns, and
	the system hangs.

This commit fixes this problem by using add_timer_on() to ensure that
the timer remains on the CPU that posted it.

Reported-by: Dipankar Sarma <dipankar@in.ibm.com>
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcutree_plugin.h |   15 ++++++++++-----
 1 files changed, 10 insertions(+), 5 deletions(-)
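
Reduced to a before/after sketch, the change replaces the migratable
mod_timer() call with add_timer_on(), which queues the timer on an
explicit CPU.  The per-CPU variable names below follow the ones used in
rcutree_plugin.h, but the wrapper function is hypothetical and the real
hunks appear in full under "Patch" below:

	/*
	 * Sketch only, not the applied hunks: contrasts the old mod_timer()
	 * call, whose timer may be queued on (or migrated to) another CPU,
	 * with add_timer_on(), which queues the timer on the given CPU.
	 * The timer itself is assumed to have been initialized elsewhere,
	 * e.g. with setup_timer().
	 */
	#include <linux/timer.h>
	#include <linux/percpu.h>

	static DEFINE_PER_CPU(struct timer_list, rcu_idle_gp_timer);
	static DEFINE_PER_CPU(unsigned long, rcu_idle_gp_timer_expires);

	static void arm_rcu_idle_gp_timer(int cpu)	/* hypothetical helper */
	{
		struct timer_list *tp = &per_cpu(rcu_idle_gp_timer, cpu);

		/* Before: the timer is subject to timer migration. */
		/* mod_timer(tp, per_cpu(rcu_idle_gp_timer_expires, cpu)); */

		/*
		 * After: pin the timer to @cpu.  Note that add_timer_on()
		 * requires that the timer not already be pending.
		 */
		tp->expires = per_cpu(rcu_idle_gp_timer_expires, cpu);
		add_timer_on(tp, cpu);
	}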

Comments

Peter Zijlstra April 26, 2012, 1:04 p.m. UTC | #1
On Mon, 2012-04-23 at 09:16 -0700, Paul E. McKenney wrote:
> -               mod_timer(&per_cpu(rcu_idle_gp_timer, cpu),
> -                         per_cpu(rcu_idle_gp_timer_expires, cpu));
> +               tp = &per_cpu(rcu_idle_gp_timer, cpu);
> +               tp->expires = per_cpu(rcu_idle_gp_timer_expires, cpu);
> +               add_timer_on(tp, cpu); 

The simpler change looks to use mod_timer_pinned()
Paul E. McKenney April 26, 2012, 3:54 p.m. UTC | #2
On Thu, Apr 26, 2012 at 03:04:56PM +0200, Peter Zijlstra wrote:
> On Mon, 2012-04-23 at 09:16 -0700, Paul E. McKenney wrote:
> > -               mod_timer(&per_cpu(rcu_idle_gp_timer, cpu),
> > -                         per_cpu(rcu_idle_gp_timer_expires, cpu));
> > +               tp = &per_cpu(rcu_idle_gp_timer, cpu);
> > +               tp->expires = per_cpu(rcu_idle_gp_timer_expires, cpu);
> > +               add_timer_on(tp, cpu); 
> 
> The simpler change looks to use mod_timer_pinned()

Good point!

Except...  Now that you mention it, I don't see how mod_timer_pinned()
actually helps.  It looks to me like a CPU-hotplug operation will
migrate the timers anyway.

This is actually (in theory) harmless in the RCU_FAST_NO_HZ case, because
the CPU_DYING stuff will force a wakeup of the CPU in question, which
will cancel the timer.  But still, mod_timer_pinned() has a rather
misleading name.  ;-)

But a line is a line, so I made this change.

							Thanx, Paul
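
For reference, the mod_timer_pinned() alternative discussed above would
look roughly as follows.  This is only a sketch, using a hypothetical
wrapper and assuming the caller is running on @cpu (as
rcu_prepare_for_idle() is); per Paul's point, a pending timer can still
be migrated off a CPU that goes offline:

	/*
	 * Sketch of the mod_timer_pinned() alternative (hypothetical helper).
	 * mod_timer_pinned() arms the timer on the CPU executing the call,
	 * but CPU hotplug can still move a pending timer to another CPU.
	 */
	static void arm_rcu_idle_gp_timer_pinned(int cpu)
	{
		mod_timer_pinned(&per_cpu(rcu_idle_gp_timer, cpu),
				 per_cpu(rcu_idle_gp_timer_expires, cpu));
	}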
Steven Rostedt April 26, 2012, 4:08 p.m. UTC | #3
On Thu, 2012-04-26 at 08:54 -0700, Paul E. McKenney wrote:

> > The simpler change looks to use mod_timer_pinned()
> 
> Good point!
> 
> Except...  Now that you mention it, I don't see how mod_timer_pinned()
> actually helps.  It looks to me like a CPU-hotplug operation will
> migrate the timers anyway.
> 
> This is actually (in theory) harmless in the RCU_FAST_NO_HZ case, because
> the CPU_DYING stuff will force a wakeup of the CPU in question, which
> will cancel the timer.  But still, mod_timer_pinned() has a rather
> misleading name.  ;-)
> 
> But a line is a line, so I made this change.
> 

It's expected that if you use this (or anything else pinned to a CPU),
you add the hotplug hooks to handle a CPU going down.

There are only two users of this that I see. One is
arch/x86/kernel/apic/x2apic_uv_x.c, which has the hotplug handling. The
other is drivers/net/ethernet/tile/tilepro.c, which does not have hotplug
handling, but the tile arch does not support hotplug anyway:

arch/tile/kernel/process.c: cpu_idle()

			if (cpu_is_offline(cpu))
				BUG();  /* no HOTPLUG_CPU */

-- Steve
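
To illustrate the hotplug hook Steve describes, a hypothetical driver
that arms a per-CPU pinned timer could register a CPU notifier along
these lines.  This is a sketch against the CPU-notifier API of this era,
with made-up names; it is not code from either of the users named above:

	#include <linux/cpu.h>
	#include <linux/notifier.h>
	#include <linux/timer.h>
	#include <linux/percpu.h>

	static DEFINE_PER_CPU(struct timer_list, my_pinned_timer);	/* hypothetical */

	static int my_timer_cpu_callback(struct notifier_block *nb,
					 unsigned long action, void *hcpu)
	{
		int cpu = (long)hcpu;

		switch (action) {
		case CPU_DOWN_PREPARE:
		case CPU_DOWN_PREPARE_FROZEN:
			/* The CPU is going down: make sure our pinned timer is gone. */
			del_timer_sync(&per_cpu(my_pinned_timer, cpu));
			break;
		default:
			break;
		}
		return NOTIFY_OK;
	}

	static struct notifier_block my_timer_cpu_notifier = {
		.notifier_call = my_timer_cpu_callback,
	};

	/* At driver init:  register_cpu_notifier(&my_timer_cpu_notifier); */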

Patch

diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index 50c1797..90b3de5 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -2109,6 +2109,8 @@  static void rcu_cleanup_after_idle(int cpu)
  */
 static void rcu_prepare_for_idle(int cpu)
 {
+	struct timer_list *tp;
+
 	/*
 	 * If this is an idle re-entry, for example, due to use of
 	 * RCU_NONIDLE() or the new idle-loop tracing API within the idle
@@ -2120,9 +2122,11 @@  static void rcu_prepare_for_idle(int cpu)
 	if (!per_cpu(rcu_idle_first_pass, cpu) &&
 	    (per_cpu(rcu_nonlazy_posted, cpu) ==
 	     per_cpu(rcu_nonlazy_posted_snap, cpu))) {
-		if (rcu_cpu_has_callbacks(cpu))
-			mod_timer(&per_cpu(rcu_idle_gp_timer, cpu),
-				  per_cpu(rcu_idle_gp_timer_expires, cpu));
+		if (rcu_cpu_has_callbacks(cpu)) {
+			tp = &per_cpu(rcu_idle_gp_timer, cpu);
+			tp->expires = per_cpu(rcu_idle_gp_timer_expires, cpu);
+			add_timer_on(tp, cpu);
+		}
 		return;
 	}
 	per_cpu(rcu_idle_first_pass, cpu) = 0;
@@ -2166,8 +2170,9 @@  static void rcu_prepare_for_idle(int cpu)
 		else
 			per_cpu(rcu_idle_gp_timer_expires, cpu) =
 					   jiffies + RCU_IDLE_LAZY_GP_DELAY;
-		mod_timer(&per_cpu(rcu_idle_gp_timer, cpu),
-			  per_cpu(rcu_idle_gp_timer_expires, cpu));
+		tp = &per_cpu(rcu_idle_gp_timer, cpu);
+		tp->expires = per_cpu(rcu_idle_gp_timer_expires, cpu);
+		add_timer_on(tp, cpu);
 		per_cpu(rcu_nonlazy_posted_snap, cpu) =
 			per_cpu(rcu_nonlazy_posted, cpu);
 		return; /* Nothing more to do immediately. */