diff mbox

[RFC,tip/core/rcu,7/7] rcu: Quiet RCU-lockdep warnings involving interrupt disabling

Message ID 1322937282-19846-7-git-send-email-paulmck@linux.vnet.ibm.com
State New
Headers show

Commit Message

Paul E. McKenney Dec. 3, 2011, 6:34 p.m. UTC
From: Yong Zhang <yong.zhang0@gmail.com>

RCU-lockdep will issue warnings given the following use pattern:

	rcu_read_lock();
	local_irq_disable();
	rcu_read_unlock();
	local_irq_enable();

However, this use pattern is legal except for the scheduler's runqueue
and priority-inheritance locks (and any other locks that the scheduler
might use during priority-inheritance operations).

Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcutree_plugin.h |    8 ++++++--
 1 files changed, 6 insertions(+), 2 deletions(-)

Comments

Yong Zhang Dec. 5, 2011, 9:19 a.m. UTC | #1
On Sat, Dec 03, 2011 at 10:34:42AM -0800, Paul E. McKenney wrote:
> From: Yong Zhang <yong.zhang0@gmail.com>
> 
> RCU-lockdep will issue warnings given the following use pattern:
> 
> 	rcu_read_lock();
> 	local_irq_disable();
> 	rcu_read_unlock();
> 	local_irq_enable();
> 
> However, this use pattern is legal except for the scheduler's runqueue
> and priority-inheritance locks (and any other locks that the scheduler
> might use during priority-inheritance operations).
> 
> Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> ---
>  kernel/rcutree_plugin.h |    8 ++++++--
>  1 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
> index 8cd9efe..2020e8a 100644
> --- a/kernel/rcutree_plugin.h
> +++ b/kernel/rcutree_plugin.h
> @@ -401,8 +401,11 @@ static noinline void rcu_read_unlock_special(struct task_struct *t)
>  
>  #ifdef CONFIG_RCU_BOOST
>  		/* Unboost if we were boosted. */
> -		if (rbmp)
> +		if (rbmp) {
> +			local_irq_save(flags);
>  			rt_mutex_unlock(rbmp);
> +			local_irq_restore(flags);
> +		}
>  #endif /* #ifdef CONFIG_RCU_BOOST */
>  
>  		/*
> @@ -1233,9 +1236,10 @@ static int rcu_boost(struct rcu_node *rnp)
>  	lockdep_set_class_and_name(&mtx.wait_lock, &rcu_boost_class,
>  				   "rcu_boost_mutex");
>  	t->rcu_boost_mutex = &mtx;
> -	raw_spin_unlock_irqrestore(&rnp->lock, flags);
> +	raw_spin_unlock(&rnp->lock); /* rrupts remain disabled. */
>  	rt_mutex_lock(&mtx);  /* Side effect: boosts task t's priority. */
>  	rt_mutex_unlock(&mtx);  /* Keep lockdep happy. */

We permit rt_mutex_unlock() to be call with irq disabled,
but rt_mutex_lock() is still not allowed. So this usage
is not legal now.

Sounds we should hold this patch on until a more suitable
way is found.

Thanks,
Yong

> +	local_irq_restore(flags);
>  
>  	return rnp->exp_tasks != NULL || rnp->boost_tasks != NULL;
>  }
> -- 
> 1.7.3.2
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
Peter Zijlstra Dec. 5, 2011, 9:41 a.m. UTC | #2
On Sat, 2011-12-03 at 10:34 -0800, Paul E. McKenney wrote:
> From: Yong Zhang <yong.zhang0@gmail.com>
> 
> RCU-lockdep will issue warnings given the following use pattern:
> 
> 	rcu_read_lock();
> 	local_irq_disable();
> 	rcu_read_unlock();
> 	local_irq_enable();
> 
> However, this use pattern is legal except for the scheduler's runqueue
> and priority-inheritance locks (and any other locks that the scheduler
> might use during priority-inheritance operations).

So what does this patch do? Make it not complain when you do the above?
How often does this pattern actually happen? Can't be that often
otherwise we'd have had more complaints, no?
Yong Zhang Dec. 5, 2011, 10:03 a.m. UTC | #3
On Mon, Dec 05, 2011 at 10:41:36AM +0100, Peter Zijlstra wrote:
> On Sat, 2011-12-03 at 10:34 -0800, Paul E. McKenney wrote:
> > From: Yong Zhang <yong.zhang0@gmail.com>
> > 
> > RCU-lockdep will issue warnings given the following use pattern:
> > 
> > 	rcu_read_lock();
> > 	local_irq_disable();
> > 	rcu_read_unlock();
> > 	local_irq_enable();
> > 
> > However, this use pattern is legal except for the scheduler's runqueue
> > and priority-inheritance locks (and any other locks that the scheduler
> > might use during priority-inheritance operations).
> 
> So what does this patch do? Make it not complain when you do the above?

It suppose to not complain but it bring other complain :(

> How often does this pattern actually happen?

IIRC, we have just one which is cured by commit [a841796: signal: align
__lock_task_sighand() irq disabling and RCU]

> Can't be that often
> otherwise we'd have had more complaints, no?

Yeah,

So that also means we don't dedicated lock_class_key for mtx.wait_lock.

Thanks,
Yong
Paul E. McKenney Dec. 5, 2011, 4:45 p.m. UTC | #4
On Mon, Dec 05, 2011 at 05:19:24PM +0800, Yong Zhang wrote:
> On Sat, Dec 03, 2011 at 10:34:42AM -0800, Paul E. McKenney wrote:
> > From: Yong Zhang <yong.zhang0@gmail.com>
> > 
> > RCU-lockdep will issue warnings given the following use pattern:
> > 
> > 	rcu_read_lock();
> > 	local_irq_disable();
> > 	rcu_read_unlock();
> > 	local_irq_enable();
> > 
> > However, this use pattern is legal except for the scheduler's runqueue
> > and priority-inheritance locks (and any other locks that the scheduler
> > might use during priority-inheritance operations).
> > 
> > Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > ---
> >  kernel/rcutree_plugin.h |    8 ++++++--
> >  1 files changed, 6 insertions(+), 2 deletions(-)
> > 
> > diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
> > index 8cd9efe..2020e8a 100644
> > --- a/kernel/rcutree_plugin.h
> > +++ b/kernel/rcutree_plugin.h
> > @@ -401,8 +401,11 @@ static noinline void rcu_read_unlock_special(struct task_struct *t)
> >  
> >  #ifdef CONFIG_RCU_BOOST
> >  		/* Unboost if we were boosted. */
> > -		if (rbmp)
> > +		if (rbmp) {
> > +			local_irq_save(flags);
> >  			rt_mutex_unlock(rbmp);
> > +			local_irq_restore(flags);
> > +		}
> >  #endif /* #ifdef CONFIG_RCU_BOOST */
> >  
> >  		/*
> > @@ -1233,9 +1236,10 @@ static int rcu_boost(struct rcu_node *rnp)
> >  	lockdep_set_class_and_name(&mtx.wait_lock, &rcu_boost_class,
> >  				   "rcu_boost_mutex");
> >  	t->rcu_boost_mutex = &mtx;
> > -	raw_spin_unlock_irqrestore(&rnp->lock, flags);
> > +	raw_spin_unlock(&rnp->lock); /* rrupts remain disabled. */
> >  	rt_mutex_lock(&mtx);  /* Side effect: boosts task t's priority. */
> >  	rt_mutex_unlock(&mtx);  /* Keep lockdep happy. */
> 
> We permit rt_mutex_unlock() to be call with irq disabled,
> but rt_mutex_lock() is still not allowed. So this usage
> is not legal now.

Even after commit #5342e269b has been applied?  The purpose of that
commit was to allow rt_mutex_lock() to be called with irqs disabled.

So, what am I missing?

						Thanx, Paul

> Sounds we should hold this patch on until a more suitable
> way is found.
> 
> Thanks,
> Yong
> 
> > +	local_irq_restore(flags);
> >  
> >  	return rnp->exp_tasks != NULL || rnp->boost_tasks != NULL;
> >  }
> > -- 
> > 1.7.3.2
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> 
> -- 
> Only stand for myself
>
Paul E. McKenney Dec. 5, 2011, 4:48 p.m. UTC | #5
On Mon, Dec 05, 2011 at 06:03:46PM +0800, Yong Zhang wrote:
> On Mon, Dec 05, 2011 at 10:41:36AM +0100, Peter Zijlstra wrote:
> > On Sat, 2011-12-03 at 10:34 -0800, Paul E. McKenney wrote:
> > > From: Yong Zhang <yong.zhang0@gmail.com>
> > > 
> > > RCU-lockdep will issue warnings given the following use pattern:
> > > 
> > > 	rcu_read_lock();
> > > 	local_irq_disable();
> > > 	rcu_read_unlock();
> > > 	local_irq_enable();
> > > 
> > > However, this use pattern is legal except for the scheduler's runqueue
> > > and priority-inheritance locks (and any other locks that the scheduler
> > > might use during priority-inheritance operations).
> > 
> > So what does this patch do? Make it not complain when you do the above?
> 
> It suppose to not complain but it bring other complain :(

Again, even with commit #5342e269b applied?

> > How often does this pattern actually happen?
> 
> IIRC, we have just one which is cured by commit [a841796: signal: align
> __lock_task_sighand() irq disabling and RCU]
> 
> > Can't be that often
> > otherwise we'd have had more complaints, no?

Maybe, maybe not.  To see the complaint, you have to have RCU_BOOST=y.
This is used heavily in -rt, but I bet that there are config options that
don't see much use in -rt.

With this one, prevention is better than after-the-fact cure.

							Thanx, Paul

> Yeah,
> 
> So that also means we don't dedicated lock_class_key for mtx.wait_lock.
> 
> Thanks,
> Yong
>
Yong Zhang Dec. 6, 2011, 1:26 a.m. UTC | #6
On Mon, Dec 05, 2011 at 08:45:05AM -0800, Paul E. McKenney wrote:
> On Mon, Dec 05, 2011 at 05:19:24PM +0800, Yong Zhang wrote:
> > On Sat, Dec 03, 2011 at 10:34:42AM -0800, Paul E. McKenney wrote:
> > > From: Yong Zhang <yong.zhang0@gmail.com>
> > > 
> > > RCU-lockdep will issue warnings given the following use pattern:
> > > 
> > > 	rcu_read_lock();
> > > 	local_irq_disable();
> > > 	rcu_read_unlock();
> > > 	local_irq_enable();
> > > 
> > > However, this use pattern is legal except for the scheduler's runqueue
> > > and priority-inheritance locks (and any other locks that the scheduler
> > > might use during priority-inheritance operations).
> > > 
> > > Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
> > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > ---
> > >  kernel/rcutree_plugin.h |    8 ++++++--
> > >  1 files changed, 6 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
> > > index 8cd9efe..2020e8a 100644
> > > --- a/kernel/rcutree_plugin.h
> > > +++ b/kernel/rcutree_plugin.h
> > > @@ -401,8 +401,11 @@ static noinline void rcu_read_unlock_special(struct task_struct *t)
> > >  
> > >  #ifdef CONFIG_RCU_BOOST
> > >  		/* Unboost if we were boosted. */
> > > -		if (rbmp)
> > > +		if (rbmp) {
> > > +			local_irq_save(flags);
> > >  			rt_mutex_unlock(rbmp);
> > > +			local_irq_restore(flags);
> > > +		}
> > >  #endif /* #ifdef CONFIG_RCU_BOOST */
> > >  
> > >  		/*
> > > @@ -1233,9 +1236,10 @@ static int rcu_boost(struct rcu_node *rnp)
> > >  	lockdep_set_class_and_name(&mtx.wait_lock, &rcu_boost_class,
> > >  				   "rcu_boost_mutex");
> > >  	t->rcu_boost_mutex = &mtx;
> > > -	raw_spin_unlock_irqrestore(&rnp->lock, flags);
> > > +	raw_spin_unlock(&rnp->lock); /* rrupts remain disabled. */
> > >  	rt_mutex_lock(&mtx);  /* Side effect: boosts task t's priority. */
> > >  	rt_mutex_unlock(&mtx);  /* Keep lockdep happy. */
> > 
> > We permit rt_mutex_unlock() to be call with irq disabled,
> > but rt_mutex_lock() is still not allowed. So this usage
> > is not legal now.
> 
> Even after commit #5342e269b has been applied?

Yeah, because we call might_sleep() in rt_mutex_lock() unconditionally.
But in this case the 'BUG: sleeping function called from invalid context
at *' is obviously false positive.

Maybe we could teach might_sleep() about this special case?

Thanks,
Yong
Paul E. McKenney Dec. 6, 2011, 2:12 a.m. UTC | #7
On Tue, Dec 06, 2011 at 09:26:35AM +0800, Yong Zhang wrote:
> On Mon, Dec 05, 2011 at 08:45:05AM -0800, Paul E. McKenney wrote:
> > On Mon, Dec 05, 2011 at 05:19:24PM +0800, Yong Zhang wrote:
> > > On Sat, Dec 03, 2011 at 10:34:42AM -0800, Paul E. McKenney wrote:
> > > > From: Yong Zhang <yong.zhang0@gmail.com>
> > > > 
> > > > RCU-lockdep will issue warnings given the following use pattern:
> > > > 
> > > > 	rcu_read_lock();
> > > > 	local_irq_disable();
> > > > 	rcu_read_unlock();
> > > > 	local_irq_enable();
> > > > 
> > > > However, this use pattern is legal except for the scheduler's runqueue
> > > > and priority-inheritance locks (and any other locks that the scheduler
> > > > might use during priority-inheritance operations).
> > > > 
> > > > Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
> > > > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > > ---
> > > >  kernel/rcutree_plugin.h |    8 ++++++--
> > > >  1 files changed, 6 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
> > > > index 8cd9efe..2020e8a 100644
> > > > --- a/kernel/rcutree_plugin.h
> > > > +++ b/kernel/rcutree_plugin.h
> > > > @@ -401,8 +401,11 @@ static noinline void rcu_read_unlock_special(struct task_struct *t)
> > > >  
> > > >  #ifdef CONFIG_RCU_BOOST
> > > >  		/* Unboost if we were boosted. */
> > > > -		if (rbmp)
> > > > +		if (rbmp) {
> > > > +			local_irq_save(flags);
> > > >  			rt_mutex_unlock(rbmp);
> > > > +			local_irq_restore(flags);
> > > > +		}
> > > >  #endif /* #ifdef CONFIG_RCU_BOOST */
> > > >  
> > > >  		/*
> > > > @@ -1233,9 +1236,10 @@ static int rcu_boost(struct rcu_node *rnp)
> > > >  	lockdep_set_class_and_name(&mtx.wait_lock, &rcu_boost_class,
> > > >  				   "rcu_boost_mutex");
> > > >  	t->rcu_boost_mutex = &mtx;
> > > > -	raw_spin_unlock_irqrestore(&rnp->lock, flags);
> > > > +	raw_spin_unlock(&rnp->lock); /* rrupts remain disabled. */
> > > >  	rt_mutex_lock(&mtx);  /* Side effect: boosts task t's priority. */
> > > >  	rt_mutex_unlock(&mtx);  /* Keep lockdep happy. */
> > > 
> > > We permit rt_mutex_unlock() to be call with irq disabled,
> > > but rt_mutex_lock() is still not allowed. So this usage
> > > is not legal now.
> > 
> > Even after commit #5342e269b has been applied?
> 
> Yeah, because we call might_sleep() in rt_mutex_lock() unconditionally.
> But in this case the 'BUG: sleeping function called from invalid context
> at *' is obviously false positive.
> 
> Maybe we could teach might_sleep() about this special case?

That sounds very good to me!

							Thanx, Paul
Peter Zijlstra Dec. 6, 2011, 9:52 a.m. UTC | #8
On Tue, 2011-12-06 at 09:26 +0800, Yong Zhang wrote:

> Yeah, because we call might_sleep() in rt_mutex_lock() unconditionally.
> But in this case the 'BUG: sleeping function called from invalid context
> at *' is obviously false positive.

Why can't this mutex acquisition not block?

> Maybe we could teach might_sleep() about this special case?

Sounds horrid.
Yong Zhang Dec. 6, 2011, 10:05 a.m. UTC | #9
On Tue, Dec 06, 2011 at 10:52:32AM +0100, Peter Zijlstra wrote:
> On Tue, 2011-12-06 at 09:26 +0800, Yong Zhang wrote:
> 
> > Yeah, because we call might_sleep() in rt_mutex_lock() unconditionally.
> > But in this case the 'BUG: sleeping function called from invalid context
> > at *' is obviously false positive.
> 
> Why can't this mutex acquisition not block?

It could block. The issue it's legal to call rt_mutex_lock() with
irqs disabled and we don't want might_sleep() bite us on this
special case. When we are going to sleep, we re-enable irq in
__rt_mutex_slowlock().

> 
> > Maybe we could teach might_sleep() about this special case?
> 
> Sounds horrid.

Maybe, any alternative?

Thanks,
Yong
Peter Zijlstra Dec. 6, 2011, 10:27 a.m. UTC | #10
On Tue, 2011-12-06 at 10:52 +0100, Peter Zijlstra wrote:
> On Tue, 2011-12-06 at 09:26 +0800, Yong Zhang wrote:
> 
> > Yeah, because we call might_sleep() in rt_mutex_lock() unconditionally.
> > But in this case the 'BUG: sleeping function called from invalid context
> > at *' is obviously false positive.
> 
> Why can't this mutex acquisition not block?

Gaah!! I see, this 5342e269 patch is revolting.. guys that's really vile
don't do that!

I tried reading the RCU code but I gave up.. rcu_boost() does:

  rt_mutex_init_proxy_locked();
  raw_spin_unlock_irqrestore();
  rt_mutex_lock();
  rt_mutex_unlock();

vs rcu_read_unlock_special()'s RCU_READ_UNLOCK_BLOCKED branch:

  rt_mutex_unlock();


The latter looks to be unbalanced because I can't actually find a
matching lock. Also, all of that is ran with IRQs enabled. So what's the
problem?
Peter Zijlstra Dec. 6, 2011, 10:32 a.m. UTC | #11
On Tue, 2011-12-06 at 18:05 +0800, Yong Zhang wrote:
> On Tue, Dec 06, 2011 at 10:52:32AM +0100, Peter Zijlstra wrote:
> > On Tue, 2011-12-06 at 09:26 +0800, Yong Zhang wrote:
> > 
> > > Yeah, because we call might_sleep() in rt_mutex_lock() unconditionally.
> > > But in this case the 'BUG: sleeping function called from invalid context
> > > at *' is obviously false positive.
> > 
> > Why can't this mutex acquisition not block?
> 
> It could block. The issue it's legal to call rt_mutex_lock() with
> irqs disabled and we don't want might_sleep() bite us on this

Of course its legal (nobody will arrest you for it), but its also bloody
horrid.

> special case. When we are going to sleep, we re-enable irq in
> __rt_mutex_slowlock().
> 
> > 
> > > Maybe we could teach might_sleep() about this special case?
> > 
> > Sounds horrid.
> 
> Maybe, any alternative?

Maybe someone explain this mess first?
Steven Rostedt Dec. 6, 2011, 12:26 p.m. UTC | #12
On Tue, 2011-12-06 at 11:32 +0100, Peter Zijlstra wrote:

> > > 
> > > > Maybe we could teach might_sleep() about this special case?
> > > 
> > > Sounds horrid.
> > 
> > Maybe, any alternative?
> 
> Maybe someone explain this mess first?

Me too, because this looks like a hack that's just like a lie. The first
lie to be said gets you out of a bit of trouble, but then something else
happens based on that lie, in which you need to make another bigger lie.
This new lie affects more people and requires new more ingenious lies to
control the chaos. But eventually the lies required to keep everything
going become so overwhelming that it all blows up in your face and you
end up looking like a jackass.

Why is rcu using rt_mutex_lock() in strange ways? It's lying about its
use. And now this patch is the bigger lie to get around the issues of
the first lie. Eventually this code will continue to expand largely
based on these lies and will explode in our faces, and I feel sorry for
the poor jackass that needs to fix it.

Perhaps the real answer is that we need to create an API for priority
inheritance, that things like RCU could use. Attach a task that another
task requires to finish something and boost the priority of that task.
Maybe even completions could use such a thing?

-- Steve
Paul E. McKenney Dec. 6, 2011, 4:01 p.m. UTC | #13
On Tue, Dec 06, 2011 at 10:52:32AM +0100, Peter Zijlstra wrote:
> On Tue, 2011-12-06 at 09:26 +0800, Yong Zhang wrote:
> 
> > Yeah, because we call might_sleep() in rt_mutex_lock() unconditionally.
> > But in this case the 'BUG: sleeping function called from invalid context
> > at *' is obviously false positive.
> 
> Why can't this mutex acquisition not block?

Because its purpose in life is to pass its priority on to the lock
holder.

							Thanx, Paul

> > Maybe we could teach might_sleep() about this special case?
> 
> Sounds horrid.
>
Paul E. McKenney Dec. 6, 2011, 4:04 p.m. UTC | #14
On Tue, Dec 06, 2011 at 07:26:45AM -0500, Steven Rostedt wrote:
> On Tue, 2011-12-06 at 11:32 +0100, Peter Zijlstra wrote:
> 
> > > > 
> > > > > Maybe we could teach might_sleep() about this special case?
> > > > 
> > > > Sounds horrid.
> > > 
> > > Maybe, any alternative?
> > 
> > Maybe someone explain this mess first?
> 
> Me too, because this looks like a hack that's just like a lie. The first
> lie to be said gets you out of a bit of trouble, but then something else
> happens based on that lie, in which you need to make another bigger lie.
> This new lie affects more people and requires new more ingenious lies to
> control the chaos. But eventually the lies required to keep everything
> going become so overwhelming that it all blows up in your face and you
> end up looking like a jackass.
> 
> Why is rcu using rt_mutex_lock() in strange ways? It's lying about its
> use. And now this patch is the bigger lie to get around the issues of
> the first lie. Eventually this code will continue to expand largely
> based on these lies and will explode in our faces, and I feel sorry for
> the poor jackass that needs to fix it.
> 
> Perhaps the real answer is that we need to create an API for priority
> inheritance, that things like RCU could use. Attach a task that another
> task requires to finish something and boost the priority of that task.
> Maybe even completions could use such a thing?

I would be OK with that -- that was in fact the approach I was taking
when I was advised to use mutexes instead.  ;-)

							Thanx, Paul
Paul E. McKenney Dec. 6, 2011, 4:11 p.m. UTC | #15
On Tue, Dec 06, 2011 at 11:27:26AM +0100, Peter Zijlstra wrote:
> On Tue, 2011-12-06 at 10:52 +0100, Peter Zijlstra wrote:
> > On Tue, 2011-12-06 at 09:26 +0800, Yong Zhang wrote:
> > 
> > > Yeah, because we call might_sleep() in rt_mutex_lock() unconditionally.
> > > But in this case the 'BUG: sleeping function called from invalid context
> > > at *' is obviously false positive.
> > 
> > Why can't this mutex acquisition not block?
> 
> Gaah!! I see, this 5342e269 patch is revolting.. guys that's really vile
> don't do that!
> 
> I tried reading the RCU code but I gave up.. rcu_boost() does:
> 
>   rt_mutex_init_proxy_locked();
>   raw_spin_unlock_irqrestore();
>   rt_mutex_lock();
>   rt_mutex_unlock();
> 
> vs rcu_read_unlock_special()'s RCU_READ_UNLOCK_BLOCKED branch:
> 
>   rt_mutex_unlock();
> 
> 
> The latter looks to be unbalanced because I can't actually find a
> matching lock. Also, all of that is ran with IRQs enabled. So what's the
> problem?

The rt_mutex_init_proxy_locked() creates the lock in held state,
held by the RCU reader who is holding up the grace period.
So rcu_read_unlock_special()'s rt_mutex_unlock() is balanced by the
rt_mutex_init_proxy_locked().

The problem with the IRQs enabled is the following sequence:

	rcu_read_lock();
	/* do stuff */
	local_irq_save(flags);
	/* do more stuff */
	rcu_read_unlock();
	/* do even more stuff */
	local_irq_restore(flags);

This has been legal in the past, and might well be used in places that
-rt does not exercise, hence the desire to explicitly legalize it.

							Thanx, Paul
Peter Zijlstra Dec. 6, 2011, 4:14 p.m. UTC | #16
On Tue, 2011-12-06 at 08:11 -0800, Paul E. McKenney wrote:

> The problem with the IRQs enabled is the following sequence:
> 
> 	rcu_read_lock();
> 	/* do stuff */
> 	local_irq_save(flags);
> 	/* do more stuff */
> 	rcu_read_unlock();
> 	/* do even more stuff */
> 	local_irq_restore(flags);
> 
> This has been legal in the past, and might well be used in places that
> -rt does not exercise, hence the desire to explicitly legalize it.

So why not make it strictly dis-allowed, even for !-rt and see what
falls over? If there's lots of fallout we might need to reconsider, but
wouldn't it be easier to all abide by the strictest rules than to try
and frob stuff like was proposed?
Paul E. McKenney Dec. 6, 2011, 4:33 p.m. UTC | #17
On Tue, Dec 06, 2011 at 08:04:55AM -0800, Paul E. McKenney wrote:
> On Tue, Dec 06, 2011 at 07:26:45AM -0500, Steven Rostedt wrote:
> > On Tue, 2011-12-06 at 11:32 +0100, Peter Zijlstra wrote:
> > 
> > > > > 
> > > > > > Maybe we could teach might_sleep() about this special case?
> > > > > 
> > > > > Sounds horrid.
> > > > 
> > > > Maybe, any alternative?
> > > 
> > > Maybe someone explain this mess first?
> > 
> > Me too, because this looks like a hack that's just like a lie. The first
> > lie to be said gets you out of a bit of trouble, but then something else
> > happens based on that lie, in which you need to make another bigger lie.
> > This new lie affects more people and requires new more ingenious lies to
> > control the chaos. But eventually the lies required to keep everything
> > going become so overwhelming that it all blows up in your face and you
> > end up looking like a jackass.
> > 
> > Why is rcu using rt_mutex_lock() in strange ways? It's lying about its
> > use. And now this patch is the bigger lie to get around the issues of
> > the first lie. Eventually this code will continue to expand largely
> > based on these lies and will explode in our faces, and I feel sorry for
> > the poor jackass that needs to fix it.
> > 
> > Perhaps the real answer is that we need to create an API for priority
> > inheritance, that things like RCU could use. Attach a task that another
> > task requires to finish something and boost the priority of that task.
> > Maybe even completions could use such a thing?
> 
> I would be OK with that -- that was in fact the approach I was taking
> when I was advised to use mutexes instead.  ;-)

But in the spirit of full disclosure -- moving from explicit priority
boosting to mutexes simplified the code greatly.  It eliminated a bunch
of races between one task boosting and the boostee exiting its RCU
read-side critical section.  The difficulty is (or maybe was) the need
to block when boosting priority, which meant that there were races
where the booster marked the task, dropped the rcu_node ->lock,
the boostee exited its RCU read-side critical section and uselessly
deboosted itself, then the booster incorrectly did the boost.

But perhaps I was just suffering a failure of imagination.

							Thanx, Paul
Steven Rostedt Dec. 6, 2011, 4:56 p.m. UTC | #18
On Tue, 2011-12-06 at 08:04 -0800, Paul E. McKenney wrote:

> > Perhaps the real answer is that we need to create an API for priority
> > inheritance, that things like RCU could use. Attach a task that another
> > task requires to finish something and boost the priority of that task.
> > Maybe even completions could use such a thing?
> 
> I would be OK with that -- that was in fact the approach I was taking
> when I was advised to use mutexes instead.  ;-)

Maybe we should rethink it. Using the makeshift mutex looks to be a
short term hack. But if we are starting to build on it, it will end up
being a horrible design, based off of a hack.

A mutex is to provide mutual exclusion. If we start bastardizing it to
do other things, it will become unmaintainable. I dare say that it's
close to unmaintainable now ;)

If we create a new API to handle inheritance, then perhaps it could be
used for other things like workqueues and completions (in -rt only).

-- Steve
Paul E. McKenney Dec. 6, 2011, 5:16 p.m. UTC | #19
On Tue, Dec 06, 2011 at 11:56:42AM -0500, Steven Rostedt wrote:
> On Tue, 2011-12-06 at 08:04 -0800, Paul E. McKenney wrote:
> 
> > > Perhaps the real answer is that we need to create an API for priority
> > > inheritance, that things like RCU could use. Attach a task that another
> > > task requires to finish something and boost the priority of that task.
> > > Maybe even completions could use such a thing?
> > 
> > I would be OK with that -- that was in fact the approach I was taking
> > when I was advised to use mutexes instead.  ;-)
> 
> Maybe we should rethink it. Using the makeshift mutex looks to be a
> short term hack. But if we are starting to build on it, it will end up
> being a horrible design, based off of a hack.
> 
> A mutex is to provide mutual exclusion. If we start bastardizing it to
> do other things, it will become unmaintainable. I dare say that it's
> close to unmaintainable now ;)
> 
> If we create a new API to handle inheritance, then perhaps it could be
> used for other things like workqueues and completions (in -rt only).

Tough choice between yours and Peter's suggestion...

1.	Re-introduce ugly races by eliminating the mutex.

2.	Possibly have to deal with a new spate of lockdep-RCU splats.

Decisions, decisions!  ;-)

I suppose that one approach is to start with Peter's approach,
possibly adapting lockdep to explicitly check -- with an exception for
srcu_read_lock_raw(), of course.  If lockdep-RCU splats rain down too
hard, then perhaps the explicit priority inheritance would be one
potential umbrella.

Thoughts?

							Thanx, Paul
diff mbox

Patch

diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index 8cd9efe..2020e8a 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -401,8 +401,11 @@  static noinline void rcu_read_unlock_special(struct task_struct *t)
 
 #ifdef CONFIG_RCU_BOOST
 		/* Unboost if we were boosted. */
-		if (rbmp)
+		if (rbmp) {
+			local_irq_save(flags);
 			rt_mutex_unlock(rbmp);
+			local_irq_restore(flags);
+		}
 #endif /* #ifdef CONFIG_RCU_BOOST */
 
 		/*
@@ -1233,9 +1236,10 @@  static int rcu_boost(struct rcu_node *rnp)
 	lockdep_set_class_and_name(&mtx.wait_lock, &rcu_boost_class,
 				   "rcu_boost_mutex");
 	t->rcu_boost_mutex = &mtx;
-	raw_spin_unlock_irqrestore(&rnp->lock, flags);
+	raw_spin_unlock(&rnp->lock); /* rrupts remain disabled. */
 	rt_mutex_lock(&mtx);  /* Side effect: boosts task t's priority. */
 	rt_mutex_unlock(&mtx);  /* Keep lockdep happy. */
+	local_irq_restore(flags);
 
 	return rnp->exp_tasks != NULL || rnp->boost_tasks != NULL;
 }