[tip/core/rcu,10/15] rcu: Protect rcu_node accesses during CPU stall warnings

Message ID 1346352988-32444-10-git-send-email-paulmck@linux.vnet.ibm.com
State New
Headers show

Commit Message

Paul E. McKenney Aug. 30, 2012, 6:56 p.m.
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

The print_other_cpu_stall() function accesses a number of rcu_node
fields without protection from the ->lock.  In theory, this is not
a problem because the fields accessed are all integers, but in
practice the compiler can get nasty.  Therefore, the commit extends
the existing critical section to cover the entire loop body.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcutree.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

Comments

Josh Triplett Aug. 31, 2012, 6:23 p.m. | #1
On Thu, Aug 30, 2012 at 11:56:23AM -0700, Paul E. McKenney wrote:
> From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> 
> The print_other_cpu_stall() function accesses a number of rcu_node
> fields without protection from the ->lock.  In theory, this is not
> a problem because the fields accessed are all integers, but in
> practice the compiler can get nasty.  Therefore, the commit extends
> the existing critical section to cover the entire loop body.
> 
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> ---
>  kernel/rcutree.c |    6 ++++--
>  1 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index 9f44749..fbe43b0 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -746,14 +746,16 @@ static void print_other_cpu_stall(struct rcu_state *rsp)
>  	rcu_for_each_leaf_node(rsp, rnp) {
>  		raw_spin_lock_irqsave(&rnp->lock, flags);
>  		ndetected += rcu_print_task_stall(rnp);
> -		raw_spin_unlock_irqrestore(&rnp->lock, flags);
> -		if (rnp->qsmask == 0)
> +		if (rnp->qsmask == 0) {
> +			raw_spin_unlock_irqrestore(&rnp->lock, flags);
>  			continue;
> +		}
>  		for (cpu = 0; cpu <= rnp->grphi - rnp->grplo; cpu++)
>  			if (rnp->qsmask & (1UL << cpu)) {
>  				print_cpu_stall_info(rsp, rnp->grplo + cpu);
>  				ndetected++;
>  			}
> +		raw_spin_unlock_irqrestore(&rnp->lock, flags);
>  	}

Now that you've extended the lock over the rest of the loop body, I
think this would look much clearer if written without the continue and
duplicate lock release:

		...
		if (rnp->qsmask != 0)
			for (cpu = 0; cpu <= rnp->grphi - rnp->grplo; cpu++)
				....
		raw_spin_unlock_irqrestore(&rnp->lock, flags);
	}
Paul E. McKenney Sept. 4, 2012, 10:51 p.m. | #2
On Fri, Aug 31, 2012 at 11:23:02AM -0700, Josh Triplett wrote:
> On Thu, Aug 30, 2012 at 11:56:23AM -0700, Paul E. McKenney wrote:
> > From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> > 
> > The print_other_cpu_stall() function accesses a number of rcu_node
> > fields without protection from the ->lock.  In theory, this is not
> > a problem because the fields accessed are all integers, but in
> > practice the compiler can get nasty.  Therefore, the commit extends
> > the existing critical section to cover the entire loop body.
> > 
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > ---
> >  kernel/rcutree.c |    6 ++++--
> >  1 files changed, 4 insertions(+), 2 deletions(-)
> > 
> > diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> > index 9f44749..fbe43b0 100644
> > --- a/kernel/rcutree.c
> > +++ b/kernel/rcutree.c
> > @@ -746,14 +746,16 @@ static void print_other_cpu_stall(struct rcu_state *rsp)
> >  	rcu_for_each_leaf_node(rsp, rnp) {
> >  		raw_spin_lock_irqsave(&rnp->lock, flags);
> >  		ndetected += rcu_print_task_stall(rnp);
> > -		raw_spin_unlock_irqrestore(&rnp->lock, flags);
> > -		if (rnp->qsmask == 0)
> > +		if (rnp->qsmask == 0) {
> > +			raw_spin_unlock_irqrestore(&rnp->lock, flags);
> >  			continue;
> > +		}
> >  		for (cpu = 0; cpu <= rnp->grphi - rnp->grplo; cpu++)
> >  			if (rnp->qsmask & (1UL << cpu)) {
> >  				print_cpu_stall_info(rsp, rnp->grplo + cpu);
> >  				ndetected++;
> >  			}
> > +		raw_spin_unlock_irqrestore(&rnp->lock, flags);
> >  	}
> 
> Now that you've extended the lock over the rest of the loop body, I
> think this would look much clearer if written without the continue and
> duplicate lock release:
> 
> 		...
> 		if (rnp->qsmask != 0)
> 			for (cpu = 0; cpu <= rnp->grphi - rnp->grplo; cpu++)
> 				....
> 		raw_spin_unlock_irqrestore(&rnp->lock, flags);
> 	}

And my Hollerith experience strikes again!  ;-)

Though this one seems more worthwhile, so I am making the change, conflicts
permitting.

							Thanx, Paul
Peter Zijlstra Sept. 6, 2012, 2:51 p.m. | #3
On Thu, 2012-08-30 at 11:56 -0700, Paul E. McKenney wrote:
> From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> 
> The print_other_cpu_stall() function accesses a number of rcu_node
> fields without protection from the ->lock.  In theory, this is not
> a problem because the fields accessed are all integers, but in
> practice the compiler can get nasty.  Therefore, the commit extends
> the existing critical section to cover the entire loop body.
> 
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> ---
>  kernel/rcutree.c |    6 ++++--
>  1 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> index 9f44749..fbe43b0 100644
> --- a/kernel/rcutree.c
> +++ b/kernel/rcutree.c
> @@ -746,14 +746,16 @@ static void print_other_cpu_stall(struct rcu_state *rsp)
>  	rcu_for_each_leaf_node(rsp, rnp) {
>  		raw_spin_lock_irqsave(&rnp->lock, flags);
>  		ndetected += rcu_print_task_stall(rnp);
> -		raw_spin_unlock_irqrestore(&rnp->lock, flags);
> -		if (rnp->qsmask == 0)
> +		if (rnp->qsmask == 0) {
> +			raw_spin_unlock_irqrestore(&rnp->lock, flags);
>  			continue;
> +		}
>  		for (cpu = 0; cpu <= rnp->grphi - rnp->grplo; cpu++)
>  			if (rnp->qsmask & (1UL << cpu)) {
>  				print_cpu_stall_info(rsp, rnp->grplo + cpu);
>  				ndetected++;
>  			}
> +		raw_spin_unlock_irqrestore(&rnp->lock, flags);
>  	}

You now cover printk() and all that that can call with a RCU lock.. is
this a good thing?
Paul E. McKenney Sept. 6, 2012, 9:01 p.m. | #4
On Thu, Sep 06, 2012 at 04:51:18PM +0200, Peter Zijlstra wrote:
> On Thu, 2012-08-30 at 11:56 -0700, Paul E. McKenney wrote:
> > From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> > 
> > The print_other_cpu_stall() function accesses a number of rcu_node
> > fields without protection from the ->lock.  In theory, this is not
> > a problem because the fields accessed are all integers, but in
> > practice the compiler can get nasty.  Therefore, the commit extends
> > the existing critical section to cover the entire loop body.
> > 
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > ---
> >  kernel/rcutree.c |    6 ++++--
> >  1 files changed, 4 insertions(+), 2 deletions(-)
> > 
> > diff --git a/kernel/rcutree.c b/kernel/rcutree.c
> > index 9f44749..fbe43b0 100644
> > --- a/kernel/rcutree.c
> > +++ b/kernel/rcutree.c
> > @@ -746,14 +746,16 @@ static void print_other_cpu_stall(struct rcu_state *rsp)
> >  	rcu_for_each_leaf_node(rsp, rnp) {
> >  		raw_spin_lock_irqsave(&rnp->lock, flags);
> >  		ndetected += rcu_print_task_stall(rnp);
> > -		raw_spin_unlock_irqrestore(&rnp->lock, flags);
> > -		if (rnp->qsmask == 0)
> > +		if (rnp->qsmask == 0) {
> > +			raw_spin_unlock_irqrestore(&rnp->lock, flags);
> >  			continue;
> > +		}
> >  		for (cpu = 0; cpu <= rnp->grphi - rnp->grplo; cpu++)
> >  			if (rnp->qsmask & (1UL << cpu)) {
> >  				print_cpu_stall_info(rsp, rnp->grplo + cpu);
> >  				ndetected++;
> >  			}
> > +		raw_spin_unlock_irqrestore(&rnp->lock, flags);
> >  	}
> 
> You now cover printk() and all that that can call with a RCU lock.. is
> this a good thing?

Not particularly good.  However, this happens only if something manages to
block a grace period for 60 seconds, so it should not happen in normal
circumstances.  If that happens, holding the lock allows consistent
state to be printed, which can be helpful in tracking down the bug,
be it in RCU or elsewhere.  So the disease really is worse than the cure.

							Thanx, Paul

Patch

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 9f44749..fbe43b0 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -746,14 +746,16 @@  static void print_other_cpu_stall(struct rcu_state *rsp)
 	rcu_for_each_leaf_node(rsp, rnp) {
 		raw_spin_lock_irqsave(&rnp->lock, flags);
 		ndetected += rcu_print_task_stall(rnp);
-		raw_spin_unlock_irqrestore(&rnp->lock, flags);
-		if (rnp->qsmask == 0)
+		if (rnp->qsmask == 0) {
+			raw_spin_unlock_irqrestore(&rnp->lock, flags);
 			continue;
+		}
 		for (cpu = 0; cpu <= rnp->grphi - rnp->grplo; cpu++)
 			if (rnp->qsmask & (1UL << cpu)) {
 				print_cpu_stall_info(rsp, rnp->grplo + cpu);
 				ndetected++;
 			}
+		raw_spin_unlock_irqrestore(&rnp->lock, flags);
 	}
 
 	/*