diff mbox series

[01/13] rcu/nocb: Fix potential missed nocb_timer rearm

Message ID 20210223001011.127063-2-frederic@kernel.org
State Superseded
Headers show
Series rcu/nocb updates v2 | expand

Commit Message

Frederic Weisbecker Feb. 23, 2021, 12:09 a.m. UTC
Two situations can cause a missed nocb timer rearm:

1) rdp(CPU A) queues its nocb timer. The grace period elapses before
   the timer get a chance to fire. The nocb_gp kthread is awaken by
   rdp(CPU B). The nocb_cb kthread for rdp(CPU A) is awaken and
   process the callbacks, again before the nocb_timer for CPU A get a
   chance to fire. rdp(CPU A) queues a callback and wakes up nocb_gp
   kthread, cancelling the pending nocb_timer without resetting the
   corresponding nocb_defer_wakeup.

2) The "nocb_bypass_timer" ends up calling wake_nocb_gp() which deletes
   the pending "nocb_timer" (note they are not the same timers) for the
   given rdp without resetting the matching state stored in nocb_defer
   wakeup.

On both situations, a future call_rcu() on that rdp may be fooled and
think the timer is armed when it's not, missing a deferred nocb_gp
wakeup.

Case 1) is very unlikely due to timing constraint (the timer fires after
1 jiffy) but still possible in theory. Case 2) is more likely to happen.
But in any case such scenario require the CPU to spend a long time
within a kernel thread without exiting to idle or user space, which is
a pretty exotic behaviour.

Fix this with resetting rdp->nocb_defer_wakeup everytime we disarm the
timer.

Fixes: d1b222c6be1f (rcu/nocb: Add bypass callback queueing)
Cc: Stable <stable@vger.kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/rcu/tree_plugin.h | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

Comments

Paul E. McKenney Feb. 24, 2021, 6:37 p.m. UTC | #1
On Tue, Feb 23, 2021 at 01:09:59AM +0100, Frederic Weisbecker wrote:
> Two situations can cause a missed nocb timer rearm:

> 

> 1) rdp(CPU A) queues its nocb timer. The grace period elapses before

>    the timer get a chance to fire. The nocb_gp kthread is awaken by

>    rdp(CPU B). The nocb_cb kthread for rdp(CPU A) is awaken and

>    process the callbacks, again before the nocb_timer for CPU A get a

>    chance to fire. rdp(CPU A) queues a callback and wakes up nocb_gp

>    kthread, cancelling the pending nocb_timer without resetting the

>    corresponding nocb_defer_wakeup.


As discussed offlist, expanding the above scenario results in this
sequence of steps:

1.	There are no callbacks queued for any CPU covered by CPU 0-2's
	->nocb_gp_kthread.

2.	CPU 0 enqueues its first callback with interrupts disabled, and
	thus must defer awakening its ->nocb_gp_kthread.  It therefore
	queues its rcu_data structure's ->nocb_timer.

3.	CPU 1, which shares the same ->nocb_gp_kthread, also enqueues a
	callback, but with interrupts enabled, allowing it to directly
	awaken the ->nocb_gp_kthread.

4.	The newly awakened ->nocb_gp_kthread associates both CPU 0's
	and CPU 1's callbacks with a future grace period and arranges
	for that grace period to be started.

5.	This ->nocb_gp_kthread goes to sleep waiting for the end of this
	future grace period.

6.	This grace period elapses before the CPU 0's timer fires.
	This is normally improbably given that the timer is set for only
	one jiffy, but timers can be delayed.  Besides, it is possible
	that kernel was built with CONFIG_RCU_STRICT_GRACE_PERIOD=y.

7.	The grace period ends, so rcu_gp_kthread awakens the
	->nocb_gp_kthread, which in turn awakens both CPU 0's and
	CPU 1's ->nocb_cb_kthread.

8.	CPU 0's ->nocb_cb_kthread invokes its callback.

9.	Note that neither kthread updated any ->nocb_timer state,
	so CPU 0's ->nocb_defer_wakeup is still set to either
	RCU_NOCB_WAKE or RCU_NOCB_WAKE_FORCE.

10.	CPU 0 enqueues its second callback, again with interrupts
	disabled, and thus must again defer awakening its
	->nocb_gp_kthread.  However, ->nocb_defer_wakeup prevents
	CPU 0 from queueing the timer.

So far so good, but why isn't the timer still queued from back in step 2?
What am I missing here?  Either way, could you please update the commit
logs to tell the full story?  At some later time, you might be very
happy that you did.  ;-)

							Thanx, Paul

> 2) The "nocb_bypass_timer" ends up calling wake_nocb_gp() which deletes

>    the pending "nocb_timer" (note they are not the same timers) for the

>    given rdp without resetting the matching state stored in nocb_defer

>    wakeup.

> 

> On both situations, a future call_rcu() on that rdp may be fooled and

> think the timer is armed when it's not, missing a deferred nocb_gp

> wakeup.

> 

> Case 1) is very unlikely due to timing constraint (the timer fires after

> 1 jiffy) but still possible in theory. Case 2) is more likely to happen.

> But in any case such scenario require the CPU to spend a long time

> within a kernel thread without exiting to idle or user space, which is

> a pretty exotic behaviour.

> 

> Fix this with resetting rdp->nocb_defer_wakeup everytime we disarm the

> timer.

> 

> Fixes: d1b222c6be1f (rcu/nocb: Add bypass callback queueing)

> Cc: Stable <stable@vger.kernel.org>

> Cc: Josh Triplett <josh@joshtriplett.org>

> Cc: Lai Jiangshan <jiangshanlai@gmail.com>

> Cc: Joel Fernandes <joel@joelfernandes.org>

> Cc: Neeraj Upadhyay <neeraju@codeaurora.org>

> Cc: Boqun Feng <boqun.feng@gmail.com>

> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>

> ---

>  kernel/rcu/tree_plugin.h | 7 +++++--

>  1 file changed, 5 insertions(+), 2 deletions(-)

> 

> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h

> index 2ec9d7f55f99..dd0dc66c282d 100644

> --- a/kernel/rcu/tree_plugin.h

> +++ b/kernel/rcu/tree_plugin.h

> @@ -1720,7 +1720,11 @@ static bool wake_nocb_gp(struct rcu_data *rdp, bool force,

>  		rcu_nocb_unlock_irqrestore(rdp, flags);

>  		return false;

>  	}

> -	del_timer(&rdp->nocb_timer);

> +

> +	if (READ_ONCE(rdp->nocb_defer_wakeup) > RCU_NOCB_WAKE_NOT) {

> +		WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);

> +		del_timer(&rdp->nocb_timer);

> +	}

>  	rcu_nocb_unlock_irqrestore(rdp, flags);

>  	raw_spin_lock_irqsave(&rdp_gp->nocb_gp_lock, flags);

>  	if (force || READ_ONCE(rdp_gp->nocb_gp_sleep)) {

> @@ -2349,7 +2353,6 @@ static bool do_nocb_deferred_wakeup_common(struct rcu_data *rdp)

>  		return false;

>  	}

>  	ndw = READ_ONCE(rdp->nocb_defer_wakeup);

> -	WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);

>  	ret = wake_nocb_gp(rdp, ndw == RCU_NOCB_WAKE_FORCE, flags);

>  	trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("DeferredWake"));

>  

> -- 

> 2.25.1

>
Frederic Weisbecker Feb. 24, 2021, 10:06 p.m. UTC | #2
On Wed, Feb 24, 2021 at 10:37:09AM -0800, Paul E. McKenney wrote:
> On Tue, Feb 23, 2021 at 01:09:59AM +0100, Frederic Weisbecker wrote:

> > Two situations can cause a missed nocb timer rearm:

> > 

> > 1) rdp(CPU A) queues its nocb timer. The grace period elapses before

> >    the timer get a chance to fire. The nocb_gp kthread is awaken by

> >    rdp(CPU B). The nocb_cb kthread for rdp(CPU A) is awaken and

> >    process the callbacks, again before the nocb_timer for CPU A get a

> >    chance to fire. rdp(CPU A) queues a callback and wakes up nocb_gp

> >    kthread, cancelling the pending nocb_timer without resetting the

> >    corresponding nocb_defer_wakeup.

> 

> As discussed offlist, expanding the above scenario results in this

> sequence of steps:

> 

> 1.	There are no callbacks queued for any CPU covered by CPU 0-2's

> 	->nocb_gp_kthread.

> 

> 2.	CPU 0 enqueues its first callback with interrupts disabled, and

> 	thus must defer awakening its ->nocb_gp_kthread.  It therefore

> 	queues its rcu_data structure's ->nocb_timer.

> 

> 3.	CPU 1, which shares the same ->nocb_gp_kthread, also enqueues a

> 	callback, but with interrupts enabled, allowing it to directly

> 	awaken the ->nocb_gp_kthread.

> 

> 4.	The newly awakened ->nocb_gp_kthread associates both CPU 0's

> 	and CPU 1's callbacks with a future grace period and arranges

> 	for that grace period to be started.

> 

> 5.	This ->nocb_gp_kthread goes to sleep waiting for the end of this

> 	future grace period.

> 

> 6.	This grace period elapses before the CPU 0's timer fires.

> 	This is normally improbably given that the timer is set for only

> 	one jiffy, but timers can be delayed.  Besides, it is possible

> 	that kernel was built with CONFIG_RCU_STRICT_GRACE_PERIOD=y.

> 

> 7.	The grace period ends, so rcu_gp_kthread awakens the

> 	->nocb_gp_kthread, which in turn awakens both CPU 0's and

> 	CPU 1's ->nocb_cb_kthread.

> 

> 8.	CPU 0's ->nocb_cb_kthread invokes its callback.

> 

> 9.	Note that neither kthread updated any ->nocb_timer state,

> 	so CPU 0's ->nocb_defer_wakeup is still set to either

> 	RCU_NOCB_WAKE or RCU_NOCB_WAKE_FORCE.

> 

> 10.	CPU 0 enqueues its second callback, again with interrupts

> 	disabled, and thus must again defer awakening its

> 	->nocb_gp_kthread.  However, ->nocb_defer_wakeup prevents

> 	CPU 0 from queueing the timer.


I managed to recollect some pieces of my brain. So keep the above but
let's change the point 10:

10.     CPU 0 enqueues its second callback, this time with interrupts
 	enabled so it can wake directly	->nocb_gp_kthread.
	It does so with calling __wake_nocb_gp() which also cancels the
	pending timer that got queued in step 2. But that doesn't reset
	CPU 0's ->nocb_defer_wakeup which is still set to RCU_NOCB_WAKE.
	So CPU 0's ->nocb_defer_wakeup and CPU 0's ->nocb_timer are now
	desynchronized.

11.	->nocb_gp_kthread associates the callback queued in 10 with a new
	grace period, arrange for it to start and sleeps on it.

12.     The grace period ends, ->nocb_gp_kthread awakens and wakes up
	CPU 0's ->nocb_cb_kthread which invokes the callback queued in 10.

13.	CPU 0 enqueues its third callback, this time with interrupts
	disabled so it tries to queue a deferred wakeup. However
	->nocb_defer_wakeup has a stalled RCU_NOCB_WAKE value which prevents
	the CPU 0's ->nocb_timer, that got cancelled in 10, from being armed.

14.     CPU 0 has its pending callback and it may go unnoticed until
        some other CPU ever wakes up ->nocb_gp_kthread or CPU 0 ever calls
	an explicit deferred wake up caller like idle entry.

I hope I'm not missing something this time...

Thanks.
	

> 

> So far so good, but why isn't the timer still queued from back in step 2?

> What am I missing here?  Either way, could you please update the commit

> logs to tell the full story?  At some later time, you might be very

> happy that you did.  ;-)

> 

> 							Thanx, Paul

> 

> > 2) The "nocb_bypass_timer" ends up calling wake_nocb_gp() which deletes

> >    the pending "nocb_timer" (note they are not the same timers) for the

> >    given rdp without resetting the matching state stored in nocb_defer

> >    wakeup.

> > 

> > On both situations, a future call_rcu() on that rdp may be fooled and

> > think the timer is armed when it's not, missing a deferred nocb_gp

> > wakeup.

> > 

> > Case 1) is very unlikely due to timing constraint (the timer fires after

> > 1 jiffy) but still possible in theory. Case 2) is more likely to happen.

> > But in any case such scenario require the CPU to spend a long time

> > within a kernel thread without exiting to idle or user space, which is

> > a pretty exotic behaviour.

> > 

> > Fix this with resetting rdp->nocb_defer_wakeup everytime we disarm the

> > timer.

> > 

> > Fixes: d1b222c6be1f (rcu/nocb: Add bypass callback queueing)

> > Cc: Stable <stable@vger.kernel.org>

> > Cc: Josh Triplett <josh@joshtriplett.org>

> > Cc: Lai Jiangshan <jiangshanlai@gmail.com>

> > Cc: Joel Fernandes <joel@joelfernandes.org>

> > Cc: Neeraj Upadhyay <neeraju@codeaurora.org>

> > Cc: Boqun Feng <boqun.feng@gmail.com>

> > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>

> > ---

> >  kernel/rcu/tree_plugin.h | 7 +++++--

> >  1 file changed, 5 insertions(+), 2 deletions(-)

> > 

> > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h

> > index 2ec9d7f55f99..dd0dc66c282d 100644

> > --- a/kernel/rcu/tree_plugin.h

> > +++ b/kernel/rcu/tree_plugin.h

> > @@ -1720,7 +1720,11 @@ static bool wake_nocb_gp(struct rcu_data *rdp, bool force,

> >  		rcu_nocb_unlock_irqrestore(rdp, flags);

> >  		return false;

> >  	}

> > -	del_timer(&rdp->nocb_timer);

> > +

> > +	if (READ_ONCE(rdp->nocb_defer_wakeup) > RCU_NOCB_WAKE_NOT) {

> > +		WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);

> > +		del_timer(&rdp->nocb_timer);

> > +	}

> >  	rcu_nocb_unlock_irqrestore(rdp, flags);

> >  	raw_spin_lock_irqsave(&rdp_gp->nocb_gp_lock, flags);

> >  	if (force || READ_ONCE(rdp_gp->nocb_gp_sleep)) {

> > @@ -2349,7 +2353,6 @@ static bool do_nocb_deferred_wakeup_common(struct rcu_data *rdp)

> >  		return false;

> >  	}

> >  	ndw = READ_ONCE(rdp->nocb_defer_wakeup);

> > -	WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);

> >  	ret = wake_nocb_gp(rdp, ndw == RCU_NOCB_WAKE_FORCE, flags);

> >  	trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("DeferredWake"));

> >  

> > -- 

> > 2.25.1

> >
Paul E. McKenney Feb. 25, 2021, 12:14 a.m. UTC | #3
On Wed, Feb 24, 2021 at 11:06:06PM +0100, Frederic Weisbecker wrote:
> On Wed, Feb 24, 2021 at 10:37:09AM -0800, Paul E. McKenney wrote:
> > On Tue, Feb 23, 2021 at 01:09:59AM +0100, Frederic Weisbecker wrote:
> > > Two situations can cause a missed nocb timer rearm:
> > > 
> > > 1) rdp(CPU A) queues its nocb timer. The grace period elapses before
> > >    the timer get a chance to fire. The nocb_gp kthread is awaken by
> > >    rdp(CPU B). The nocb_cb kthread for rdp(CPU A) is awaken and
> > >    process the callbacks, again before the nocb_timer for CPU A get a
> > >    chance to fire. rdp(CPU A) queues a callback and wakes up nocb_gp
> > >    kthread, cancelling the pending nocb_timer without resetting the
> > >    corresponding nocb_defer_wakeup.
> > 
> > As discussed offlist, expanding the above scenario results in this
> > sequence of steps:
> > 
> > 1.	There are no callbacks queued for any CPU covered by CPU 0-2's
> > 	->nocb_gp_kthread.
> > 
> > 2.	CPU 0 enqueues its first callback with interrupts disabled, and
> > 	thus must defer awakening its ->nocb_gp_kthread.  It therefore
> > 	queues its rcu_data structure's ->nocb_timer.
> > 
> > 3.	CPU 1, which shares the same ->nocb_gp_kthread, also enqueues a
> > 	callback, but with interrupts enabled, allowing it to directly
> > 	awaken the ->nocb_gp_kthread.
> > 
> > 4.	The newly awakened ->nocb_gp_kthread associates both CPU 0's
> > 	and CPU 1's callbacks with a future grace period and arranges
> > 	for that grace period to be started.
> > 
> > 5.	This ->nocb_gp_kthread goes to sleep waiting for the end of this
> > 	future grace period.
> > 
> > 6.	This grace period elapses before the CPU 0's timer fires.
> > 	This is normally improbably given that the timer is set for only
> > 	one jiffy, but timers can be delayed.  Besides, it is possible
> > 	that kernel was built with CONFIG_RCU_STRICT_GRACE_PERIOD=y.
> > 
> > 7.	The grace period ends, so rcu_gp_kthread awakens the
> > 	->nocb_gp_kthread, which in turn awakens both CPU 0's and
> > 	CPU 1's ->nocb_cb_kthread.
> > 
> > 8.	CPU 0's ->nocb_cb_kthread invokes its callback.
> > 
> > 9.	Note that neither kthread updated any ->nocb_timer state,
> > 	so CPU 0's ->nocb_defer_wakeup is still set to either
> > 	RCU_NOCB_WAKE or RCU_NOCB_WAKE_FORCE.
> > 
> > 10.	CPU 0 enqueues its second callback, again with interrupts
> > 	disabled, and thus must again defer awakening its
> > 	->nocb_gp_kthread.  However, ->nocb_defer_wakeup prevents
> > 	CPU 0 from queueing the timer.
> 
> I managed to recollect some pieces of my brain. So keep the above but
> let's change the point 10:
> 
> 10.     CPU 0 enqueues its second callback, this time with interrupts
>  	enabled so it can wake directly	->nocb_gp_kthread.
> 	It does so with calling __wake_nocb_gp() which also cancels the
> 	pending timer that got queued in step 2. But that doesn't reset
> 	CPU 0's ->nocb_defer_wakeup which is still set to RCU_NOCB_WAKE.
> 	So CPU 0's ->nocb_defer_wakeup and CPU 0's ->nocb_timer are now
> 	desynchronized.
> 
> 11.	->nocb_gp_kthread associates the callback queued in 10 with a new
> 	grace period, arrange for it to start and sleeps on it.
> 
> 12.     The grace period ends, ->nocb_gp_kthread awakens and wakes up
> 	CPU 0's ->nocb_cb_kthread which invokes the callback queued in 10.
> 
> 13.	CPU 0 enqueues its third callback, this time with interrupts
> 	disabled so it tries to queue a deferred wakeup. However
> 	->nocb_defer_wakeup has a stalled RCU_NOCB_WAKE value which prevents
> 	the CPU 0's ->nocb_timer, that got cancelled in 10, from being armed.
> 
> 14.     CPU 0 has its pending callback and it may go unnoticed until
>         some other CPU ever wakes up ->nocb_gp_kthread or CPU 0 ever calls
> 	an explicit deferred wake up caller like idle entry.
> 
> I hope I'm not missing something this time...

Thank you, that does sound plausible.  I guess I can see how rcutorture
might have missed this one!

							Thanx, Paul

> Thanks.
> 	
> 
> > 
> > So far so good, but why isn't the timer still queued from back in step 2?
> > What am I missing here?  Either way, could you please update the commit
> > logs to tell the full story?  At some later time, you might be very
> > happy that you did.  ;-)
> > 
> > 							Thanx, Paul
> > 
> > > 2) The "nocb_bypass_timer" ends up calling wake_nocb_gp() which deletes
> > >    the pending "nocb_timer" (note they are not the same timers) for the
> > >    given rdp without resetting the matching state stored in nocb_defer
> > >    wakeup.
> > > 
> > > On both situations, a future call_rcu() on that rdp may be fooled and
> > > think the timer is armed when it's not, missing a deferred nocb_gp
> > > wakeup.
> > > 
> > > Case 1) is very unlikely due to timing constraint (the timer fires after
> > > 1 jiffy) but still possible in theory. Case 2) is more likely to happen.
> > > But in any case such scenario require the CPU to spend a long time
> > > within a kernel thread without exiting to idle or user space, which is
> > > a pretty exotic behaviour.
> > > 
> > > Fix this with resetting rdp->nocb_defer_wakeup everytime we disarm the
> > > timer.
> > > 
> > > Fixes: d1b222c6be1f (rcu/nocb: Add bypass callback queueing)
> > > Cc: Stable <stable@vger.kernel.org>
> > > Cc: Josh Triplett <josh@joshtriplett.org>
> > > Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> > > Cc: Joel Fernandes <joel@joelfernandes.org>
> > > Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
> > > Cc: Boqun Feng <boqun.feng@gmail.com>
> > > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> > > ---
> > >  kernel/rcu/tree_plugin.h | 7 +++++--
> > >  1 file changed, 5 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > index 2ec9d7f55f99..dd0dc66c282d 100644
> > > --- a/kernel/rcu/tree_plugin.h
> > > +++ b/kernel/rcu/tree_plugin.h
> > > @@ -1720,7 +1720,11 @@ static bool wake_nocb_gp(struct rcu_data *rdp, bool force,
> > >  		rcu_nocb_unlock_irqrestore(rdp, flags);
> > >  		return false;
> > >  	}
> > > -	del_timer(&rdp->nocb_timer);
> > > +
> > > +	if (READ_ONCE(rdp->nocb_defer_wakeup) > RCU_NOCB_WAKE_NOT) {
> > > +		WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
> > > +		del_timer(&rdp->nocb_timer);
> > > +	}
> > >  	rcu_nocb_unlock_irqrestore(rdp, flags);
> > >  	raw_spin_lock_irqsave(&rdp_gp->nocb_gp_lock, flags);
> > >  	if (force || READ_ONCE(rdp_gp->nocb_gp_sleep)) {
> > > @@ -2349,7 +2353,6 @@ static bool do_nocb_deferred_wakeup_common(struct rcu_data *rdp)
> > >  		return false;
> > >  	}
> > >  	ndw = READ_ONCE(rdp->nocb_defer_wakeup);
> > > -	WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
> > >  	ret = wake_nocb_gp(rdp, ndw == RCU_NOCB_WAKE_FORCE, flags);
> > >  	trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("DeferredWake"));
> > >  
> > > -- 
> > > 2.25.1
> > >
Frederic Weisbecker Feb. 25, 2021, 12:48 a.m. UTC | #4
On Wed, Feb 24, 2021 at 04:14:25PM -0800, Paul E. McKenney wrote:
> On Wed, Feb 24, 2021 at 11:06:06PM +0100, Frederic Weisbecker wrote:

> > I managed to recollect some pieces of my brain. So keep the above but

> > let's change the point 10:

> > 

> > 10.     CPU 0 enqueues its second callback, this time with interrupts

> >  	enabled so it can wake directly	->nocb_gp_kthread.

> > 	It does so with calling __wake_nocb_gp() which also cancels the

> > 	pending timer that got queued in step 2. But that doesn't reset

> > 	CPU 0's ->nocb_defer_wakeup which is still set to RCU_NOCB_WAKE.

> > 	So CPU 0's ->nocb_defer_wakeup and CPU 0's ->nocb_timer are now

> > 	desynchronized.

> > 

> > 11.	->nocb_gp_kthread associates the callback queued in 10 with a new

> > 	grace period, arrange for it to start and sleeps on it.

> > 

> > 12.     The grace period ends, ->nocb_gp_kthread awakens and wakes up

> > 	CPU 0's ->nocb_cb_kthread which invokes the callback queued in 10.

> > 

> > 13.	CPU 0 enqueues its third callback, this time with interrupts

> > 	disabled so it tries to queue a deferred wakeup. However

> > 	->nocb_defer_wakeup has a stalled RCU_NOCB_WAKE value which prevents

> > 	the CPU 0's ->nocb_timer, that got cancelled in 10, from being armed.

> > 

> > 14.     CPU 0 has its pending callback and it may go unnoticed until

> >         some other CPU ever wakes up ->nocb_gp_kthread or CPU 0 ever calls

> > 	an explicit deferred wake up caller like idle entry.

> > 

> > I hope I'm not missing something this time...

> 

> Thank you, that does sound plausible.  I guess I can see how rcutorture

> might have missed this one!


I must admit it requires a lot of stars to be aligned :-)

Thanks.
Paul E. McKenney Feb. 25, 2021, 1:07 a.m. UTC | #5
On Thu, Feb 25, 2021 at 01:48:13AM +0100, Frederic Weisbecker wrote:
> On Wed, Feb 24, 2021 at 04:14:25PM -0800, Paul E. McKenney wrote:

> > On Wed, Feb 24, 2021 at 11:06:06PM +0100, Frederic Weisbecker wrote:

> > > I managed to recollect some pieces of my brain. So keep the above but

> > > let's change the point 10:

> > > 

> > > 10.     CPU 0 enqueues its second callback, this time with interrupts

> > >  	enabled so it can wake directly	->nocb_gp_kthread.

> > > 	It does so with calling __wake_nocb_gp() which also cancels the

> > > 	pending timer that got queued in step 2. But that doesn't reset

> > > 	CPU 0's ->nocb_defer_wakeup which is still set to RCU_NOCB_WAKE.

> > > 	So CPU 0's ->nocb_defer_wakeup and CPU 0's ->nocb_timer are now

> > > 	desynchronized.

> > > 

> > > 11.	->nocb_gp_kthread associates the callback queued in 10 with a new

> > > 	grace period, arrange for it to start and sleeps on it.

> > > 

> > > 12.     The grace period ends, ->nocb_gp_kthread awakens and wakes up

> > > 	CPU 0's ->nocb_cb_kthread which invokes the callback queued in 10.

> > > 

> > > 13.	CPU 0 enqueues its third callback, this time with interrupts

> > > 	disabled so it tries to queue a deferred wakeup. However

> > > 	->nocb_defer_wakeup has a stalled RCU_NOCB_WAKE value which prevents

> > > 	the CPU 0's ->nocb_timer, that got cancelled in 10, from being armed.

> > > 

> > > 14.     CPU 0 has its pending callback and it may go unnoticed until

> > >         some other CPU ever wakes up ->nocb_gp_kthread or CPU 0 ever calls

> > > 	an explicit deferred wake up caller like idle entry.

> > > 

> > > I hope I'm not missing something this time...

> > 

> > Thank you, that does sound plausible.  I guess I can see how rcutorture

> > might have missed this one!

> 

> I must admit it requires a lot of stars to be aligned :-)


It nevertheless constitutes a bug in rcutorture.  Or maybe an additional
challenge for the formal verification people.  ;-)

							Thanx, Paul
Paul E. McKenney March 2, 2021, 1:48 a.m. UTC | #6
On Wed, Feb 24, 2021 at 11:06:06PM +0100, Frederic Weisbecker wrote:
> On Wed, Feb 24, 2021 at 10:37:09AM -0800, Paul E. McKenney wrote:
> > On Tue, Feb 23, 2021 at 01:09:59AM +0100, Frederic Weisbecker wrote:
> > > Two situations can cause a missed nocb timer rearm:
> > > 
> > > 1) rdp(CPU A) queues its nocb timer. The grace period elapses before
> > >    the timer get a chance to fire. The nocb_gp kthread is awaken by
> > >    rdp(CPU B). The nocb_cb kthread for rdp(CPU A) is awaken and
> > >    process the callbacks, again before the nocb_timer for CPU A get a
> > >    chance to fire. rdp(CPU A) queues a callback and wakes up nocb_gp
> > >    kthread, cancelling the pending nocb_timer without resetting the
> > >    corresponding nocb_defer_wakeup.
> > 
> > As discussed offlist, expanding the above scenario results in this
> > sequence of steps:

I renumbered the CPUs, since the ->nocb_gp_kthread would normally be
associated with CPU 0.  If the first CPU to enqueue a callback was also
CPU 0, nocb_gp_wait() might clear that CPU's ->nocb_defer_wakeup, which
would prevent this scenario from playing out.  (But admittedly only if
some other CPU handled by this same ->nocb_gp_kthread used its bypass.)

> > 1.	There are no callbacks queued for any CPU covered by CPU 0-2's
> > 	->nocb_gp_kthread.

And ->nocb_gp_kthread is associated with CPU 0.

> > 2.	CPU 1 enqueues its first callback with interrupts disabled, and
> > 	thus must defer awakening its ->nocb_gp_kthread.  It therefore
> > 	queues its rcu_data structure's ->nocb_timer.

At this point, CPU 1's rdp->nocb_defer_wakeup is RCU_NOCB_WAKE.

> > 3.	CPU 2, which shares the same ->nocb_gp_kthread, also enqueues a
> > 	callback, but with interrupts enabled, allowing it to directly
> > 	awaken the ->nocb_gp_kthread.
> > 
> > 4.	The newly awakened ->nocb_gp_kthread associates both CPU 1's
> > 	and CPU 2's callbacks with a future grace period and arranges
> > 	for that grace period to be started.
> > 
> > 5.	This ->nocb_gp_kthread goes to sleep waiting for the end of this
> > 	future grace period.
> > 
> > 6.	This grace period elapses before the CPU 1's timer fires.
> > 	This is normally improbably given that the timer is set for only
> > 	one jiffy, but timers can be delayed.  Besides, it is possible
> > 	that kernel was built with CONFIG_RCU_STRICT_GRACE_PERIOD=y.
> > 
> > 7.	The grace period ends, so rcu_gp_kthread awakens the
> > 	->nocb_gp_kthread, which in turn awakens both CPU 1's and
> > 	CPU 2's ->nocb_cb_kthread.

And then ->nocb_cb_kthread sleeps waiting for more callbacks.

> > 8.	CPU 1's ->nocb_cb_kthread invokes its callback.
> > 
> > 9.	Note that neither kthread updated any ->nocb_timer state,
> > 	so CPU 1's ->nocb_defer_wakeup is still set to RCU_NOCB_WAKE.
> > 
> > 10.	CPU 1 enqueues its second callback, again with interrupts
> > 	disabled, and thus must again defer awakening its
> > 	->nocb_gp_kthread.  However, ->nocb_defer_wakeup prevents
> > 	CPU 1 from queueing the timer.
> 
> I managed to recollect some pieces of my brain. So keep the above but
> let's change the point 10:
> 
> 10.	CPU 1 enqueues its second callback, this time with interrupts
>  	enabled so it can wake directly	->nocb_gp_kthread.
> 	It does so with calling __wake_nocb_gp() which also cancels the

wake_nocb_gp() in current -rcu, correct?

> 	pending timer that got queued in step 2. But that doesn't reset
> 	CPU 1's ->nocb_defer_wakeup which is still set to RCU_NOCB_WAKE.
> 	So CPU 1's ->nocb_defer_wakeup and CPU 1's ->nocb_timer are now
> 	desynchronized.

Agreed, and agreed that this is a bug.  Thank you for persisting on
this one!

> 11.	->nocb_gp_kthread associates the callback queued in 10 with a new
> 	grace period, arrange for it to start and sleeps on it.
> 
> 12.	The grace period ends, ->nocb_gp_kthread awakens and wakes up
> 	CPU 1's ->nocb_cb_kthread which invokes the callback queued in 10.
> 
> 13.	CPU 1 enqueues its third callback, this time with interrupts
> 	disabled so it tries to queue a deferred wakeup. However
> 	->nocb_defer_wakeup has a stalled RCU_NOCB_WAKE value which prevents
> 	the CPU 1's ->nocb_timer, that got cancelled in 10, from being armed.
> 
> 14.	CPU 1 has its pending callback and it may go unnoticed until
> 	some other CPU ever wakes up ->nocb_gp_kthread or CPU 1 ever calls
> 	an explicit deferred wake up caller like idle entry.
> 
> I hope I'm not missing something this time...

If you are missing something, then so am I!  ;-)

> > So far so good, but why isn't the timer still queued from back in step 2?
> > What am I missing here?  Either way, could you please update the commit
> > logs to tell the full story?  At some later time, you might be very
> > happy that you did.  ;-)
> > 
> > > 2) The "nocb_bypass_timer" ends up calling wake_nocb_gp() which deletes
> > >    the pending "nocb_timer" (note they are not the same timers) for the
> > >    given rdp without resetting the matching state stored in nocb_defer
> > >    wakeup.

Would like to similarly expand this one, or would you prefer to rest your
case on Case 1) above?

							Thanx, Paul

> > > On both situations, a future call_rcu() on that rdp may be fooled and
> > > think the timer is armed when it's not, missing a deferred nocb_gp
> > > wakeup.
> > > 
> > > Case 1) is very unlikely due to timing constraint (the timer fires after
> > > 1 jiffy) but still possible in theory. Case 2) is more likely to happen.
> > > But in any case such scenario require the CPU to spend a long time
> > > within a kernel thread without exiting to idle or user space, which is
> > > a pretty exotic behaviour.
> > > 
> > > Fix this with resetting rdp->nocb_defer_wakeup everytime we disarm the
> > > timer.
> > > 
> > > Fixes: d1b222c6be1f (rcu/nocb: Add bypass callback queueing)
> > > Cc: Stable <stable@vger.kernel.org>
> > > Cc: Josh Triplett <josh@joshtriplett.org>
> > > Cc: Lai Jiangshan <jiangshanlai@gmail.com>
> > > Cc: Joel Fernandes <joel@joelfernandes.org>
> > > Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
> > > Cc: Boqun Feng <boqun.feng@gmail.com>
> > > Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> > > ---
> > >  kernel/rcu/tree_plugin.h | 7 +++++--
> > >  1 file changed, 5 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > index 2ec9d7f55f99..dd0dc66c282d 100644
> > > --- a/kernel/rcu/tree_plugin.h
> > > +++ b/kernel/rcu/tree_plugin.h
> > > @@ -1720,7 +1720,11 @@ static bool wake_nocb_gp(struct rcu_data *rdp, bool force,
> > >  		rcu_nocb_unlock_irqrestore(rdp, flags);
> > >  		return false;
> > >  	}
> > > -	del_timer(&rdp->nocb_timer);
> > > +
> > > +	if (READ_ONCE(rdp->nocb_defer_wakeup) > RCU_NOCB_WAKE_NOT) {
> > > +		WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
> > > +		del_timer(&rdp->nocb_timer);
> > > +	}
> > >  	rcu_nocb_unlock_irqrestore(rdp, flags);
> > >  	raw_spin_lock_irqsave(&rdp_gp->nocb_gp_lock, flags);
> > >  	if (force || READ_ONCE(rdp_gp->nocb_gp_sleep)) {
> > > @@ -2349,7 +2353,6 @@ static bool do_nocb_deferred_wakeup_common(struct rcu_data *rdp)
> > >  		return false;
> > >  	}
> > >  	ndw = READ_ONCE(rdp->nocb_defer_wakeup);
> > > -	WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
> > >  	ret = wake_nocb_gp(rdp, ndw == RCU_NOCB_WAKE_FORCE, flags);
> > >  	trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("DeferredWake"));
> > >  
> > > -- 
> > > 2.25.1
> > >
Frederic Weisbecker March 2, 2021, 12:34 p.m. UTC | #7
On Mon, Mar 01, 2021 at 05:48:29PM -0800, Paul E. McKenney wrote:
> On Wed, Feb 24, 2021 at 11:06:06PM +0100, Frederic Weisbecker wrote:

> > On Wed, Feb 24, 2021 at 10:37:09AM -0800, Paul E. McKenney wrote:

> > > On Tue, Feb 23, 2021 at 01:09:59AM +0100, Frederic Weisbecker wrote:

> > > > Two situations can cause a missed nocb timer rearm:

> > > > 

> > > > 1) rdp(CPU A) queues its nocb timer. The grace period elapses before

> > > >    the timer get a chance to fire. The nocb_gp kthread is awaken by

> > > >    rdp(CPU B). The nocb_cb kthread for rdp(CPU A) is awaken and

> > > >    process the callbacks, again before the nocb_timer for CPU A get a

> > > >    chance to fire. rdp(CPU A) queues a callback and wakes up nocb_gp

> > > >    kthread, cancelling the pending nocb_timer without resetting the

> > > >    corresponding nocb_defer_wakeup.

> > > 

> > > As discussed offlist, expanding the above scenario results in this

> > > sequence of steps:

> 

> I renumbered the CPUs, since the ->nocb_gp_kthread would normally be

> associated with CPU 0.  If the first CPU to enqueue a callback was also

> CPU 0, nocb_gp_wait() might clear that CPU's ->nocb_defer_wakeup, which

> would prevent this scenario from playing out.  (But admittedly only if

> some other CPU handled by this same ->nocb_gp_kthread used its bypass.)


Ok good point.

> 

> > > 1.	There are no callbacks queued for any CPU covered by CPU 0-2's

> > > 	->nocb_gp_kthread.

> 

> And ->nocb_gp_kthread is associated with CPU 0.

> 

> > > 2.	CPU 1 enqueues its first callback with interrupts disabled, and

> > > 	thus must defer awakening its ->nocb_gp_kthread.  It therefore

> > > 	queues its rcu_data structure's ->nocb_timer.

> 

> At this point, CPU 1's rdp->nocb_defer_wakeup is RCU_NOCB_WAKE.


Right.

> > > 7.	The grace period ends, so rcu_gp_kthread awakens the

> > > 	->nocb_gp_kthread, which in turn awakens both CPU 1's and

> > > 	CPU 2's ->nocb_cb_kthread.

> 

> And then ->nocb_cb_kthread sleeps waiting for more callbacks.


Yep

> > I managed to recollect some pieces of my brain. So keep the above but

> > let's change the point 10:

> > 

> > 10.	CPU 1 enqueues its second callback, this time with interrupts

> >  	enabled so it can wake directly	->nocb_gp_kthread.

> > 	It does so with calling __wake_nocb_gp() which also cancels the

> 

> wake_nocb_gp() in current -rcu, correct?


Heh, right.

> > > So far so good, but why isn't the timer still queued from back in step 2?

> > > What am I missing here?  Either way, could you please update the commit

> > > logs to tell the full story?  At some later time, you might be very

> > > happy that you did.  ;-)

> > > 

> > > > 2) The "nocb_bypass_timer" ends up calling wake_nocb_gp() which deletes

> > > >    the pending "nocb_timer" (note they are not the same timers) for the

> > > >    given rdp without resetting the matching state stored in nocb_defer

> > > >    wakeup.

> 

> Would like to similarly expand this one, or would you prefer to rest your

> case on Case 1) above?


I was about to say that we can skip that one, the changelog will already be
big enough but the "Fixes:" tag refers to the second scenario, since it's the
oldest vulnerable commit AFAICS.

> > > > Fixes: d1b222c6be1f (rcu/nocb: Add bypass callback queueing)


Thanks.
Paul E. McKenney March 2, 2021, 6:17 p.m. UTC | #8
On Tue, Mar 02, 2021 at 01:34:44PM +0100, Frederic Weisbecker wrote:
> On Mon, Mar 01, 2021 at 05:48:29PM -0800, Paul E. McKenney wrote:

> > On Wed, Feb 24, 2021 at 11:06:06PM +0100, Frederic Weisbecker wrote:

> > > On Wed, Feb 24, 2021 at 10:37:09AM -0800, Paul E. McKenney wrote:

> > > > On Tue, Feb 23, 2021 at 01:09:59AM +0100, Frederic Weisbecker wrote:

> > > > > Two situations can cause a missed nocb timer rearm:

> > > > > 

> > > > > 1) rdp(CPU A) queues its nocb timer. The grace period elapses before

> > > > >    the timer get a chance to fire. The nocb_gp kthread is awaken by

> > > > >    rdp(CPU B). The nocb_cb kthread for rdp(CPU A) is awaken and

> > > > >    process the callbacks, again before the nocb_timer for CPU A get a

> > > > >    chance to fire. rdp(CPU A) queues a callback and wakes up nocb_gp

> > > > >    kthread, cancelling the pending nocb_timer without resetting the

> > > > >    corresponding nocb_defer_wakeup.

> > > > 

> > > > As discussed offlist, expanding the above scenario results in this

> > > > sequence of steps:

> > 

> > I renumbered the CPUs, since the ->nocb_gp_kthread would normally be

> > associated with CPU 0.  If the first CPU to enqueue a callback was also

> > CPU 0, nocb_gp_wait() might clear that CPU's ->nocb_defer_wakeup, which

> > would prevent this scenario from playing out.  (But admittedly only if

> > some other CPU handled by this same ->nocb_gp_kthread used its bypass.)

> 

> Ok good point.

> 

> > 

> > > > 1.	There are no callbacks queued for any CPU covered by CPU 0-2's

> > > > 	->nocb_gp_kthread.

> > 

> > And ->nocb_gp_kthread is associated with CPU 0.

> > 

> > > > 2.	CPU 1 enqueues its first callback with interrupts disabled, and

> > > > 	thus must defer awakening its ->nocb_gp_kthread.  It therefore

> > > > 	queues its rcu_data structure's ->nocb_timer.

> > 

> > At this point, CPU 1's rdp->nocb_defer_wakeup is RCU_NOCB_WAKE.

> 

> Right.

> 

> > > > 7.	The grace period ends, so rcu_gp_kthread awakens the

> > > > 	->nocb_gp_kthread, which in turn awakens both CPU 1's and

> > > > 	CPU 2's ->nocb_cb_kthread.

> > 

> > And then ->nocb_cb_kthread sleeps waiting for more callbacks.

> 

> Yep

> 

> > > I managed to recollect some pieces of my brain. So keep the above but

> > > let's change the point 10:

> > > 

> > > 10.	CPU 1 enqueues its second callback, this time with interrupts

> > >  	enabled so it can wake directly	->nocb_gp_kthread.

> > > 	It does so with calling __wake_nocb_gp() which also cancels the

> > 

> > wake_nocb_gp() in current -rcu, correct?

> 

> Heh, right.

> 

> > > > So far so good, but why isn't the timer still queued from back in step 2?

> > > > What am I missing here?  Either way, could you please update the commit

> > > > logs to tell the full story?  At some later time, you might be very

> > > > happy that you did.  ;-)

> > > > 

> > > > > 2) The "nocb_bypass_timer" ends up calling wake_nocb_gp() which deletes

> > > > >    the pending "nocb_timer" (note they are not the same timers) for the

> > > > >    given rdp without resetting the matching state stored in nocb_defer

> > > > >    wakeup.

> > 

> > Would like to similarly expand this one, or would you prefer to rest your

> > case on Case 1) above?

> 

> I was about to say that we can skip that one, the changelog will already be

> big enough but the "Fixes:" tag refers to the second scenario, since it's the

> oldest vulnerable commit AFAICS.

> 

> > > > > Fixes: d1b222c6be1f (rcu/nocb: Add bypass callback queueing)


OK, how about if I queue a temporary commit (shown below) that just
calls out the first scenario so that I can start testing, and you get
me more detail on the second scenario?  I can then update the commit.

							Thanx, Paul

------------------------------------------------------------------------

commit 302fd54b9ae98f678624cbf9bf7a4ca88455a8f9
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Tue Feb 23 01:09:59 2021 +0100

    rcu/nocb: Fix missed nocb_timer requeue
    
    This sequence of events can lead to a failure to requeue a CPU's
    ->nocb_timer:
    
    1.      There are no callbacks queued for any CPU covered by CPU 0-2's
            ->nocb_gp_kthread.  Note that ->nocb_gp_kthread is associated
            with CPU 0.
    
    2.      CPU 1 enqueues its first callback with interrupts disabled, and
            thus must defer awakening its ->nocb_gp_kthread.  It therefore
            queues its rcu_data structure's ->nocb_timer.  At this point,
            CPU 1's rdp->nocb_defer_wakeup is RCU_NOCB_WAKE.
    
    3.      CPU 2, which shares the same ->nocb_gp_kthread, also enqueues a
            callback, but with interrupts enabled, allowing it to directly
            awaken the ->nocb_gp_kthread.
    
    4.      The newly awakened ->nocb_gp_kthread associates both CPU 1's
            and CPU 2's callbacks with a future grace period and arranges
            for that grace period to be started.
    
    5.      This ->nocb_gp_kthread goes to sleep waiting for the end of this
            future grace period.
    
    6.      This grace period elapses before the CPU 1's timer fires.
            This is normally improbably given that the timer is set for only
            one jiffy, but timers can be delayed.  Besides, it is possible
            that kernel was built with CONFIG_RCU_STRICT_GRACE_PERIOD=y.
    
    7.      The grace period ends, so rcu_gp_kthread awakens the
            ->nocb_gp_kthread, which in turn awakens both CPU 1's and
            CPU 2's ->nocb_cb_kthread.  Then ->nocb_gb_kthread sleeps
            waiting for more newly queued callbacks.
    
    8.      CPU 1's ->nocb_cb_kthread invokes its callback, then sleeps
            waiting for more invocable callbacks.
    
    9.      Note that neither kthread updated any ->nocb_timer state,
            so CPU 1's ->nocb_defer_wakeup is still set to RCU_NOCB_WAKE.
    
    10.     CPU 1 enqueues its second callback, this time with interrupts
            enabled so it can wake directly ->nocb_gp_kthread.
            It does so with calling wake_nocb_gp() which also cancels the
            pending timer that got queued in step 2. But that doesn't reset
            CPU 1's ->nocb_defer_wakeup which is still set to RCU_NOCB_WAKE.
            So CPU 1's ->nocb_defer_wakeup and its ->nocb_timer are now
            desynchronized.
    
    11.     ->nocb_gp_kthread associates the callback queued in 10 with a new
            grace period, arranges for that grace period to start and sleeps
            waiting for it to complete.
    
    12.     The grace period ends, rcu_gp_kthread awakens ->nocb_gp_kthread,
            which in turn wakes up CPU 1's ->nocb_cb_kthread which then
            invokes the callback queued in 10.
    
    13.     CPU 1 enqueues its third callback, this time with interrupts
            disabled so it must queue a timer for a deferred wakeup. However
            the value of its ->nocb_defer_wakeup is RCU_NOCB_WAKE which
            incorrectly indicates that a timer is already queued.  Instead,
            CPU 1's ->nocb_timer was cancelled in 10.  CPU 1 therefore fails
            to queue the ->nocb_timer.
    
    14.     CPU 1 has its pending callback and it may go unnoticed until
            some other CPU ever wakes up ->nocb_gp_kthread or CPU 1 ever
            calls an explicit deferred wakeup, for example, during idle entry.
    
    This commit fixes this bug by resetting rdp->nocb_defer_wakeup everytime
    we delete the ->nocb_timer.
    
    Fixes: d1b222c6be1f (rcu/nocb: Add bypass callback queueing)
    Cc: Stable <stable@vger.kernel.org>
    Cc: Josh Triplett <josh@joshtriplett.org>
    Cc: Lai Jiangshan <jiangshanlai@gmail.com>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>

    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>


diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 44746d8..429491d 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1721,7 +1721,11 @@ static bool wake_nocb_gp(struct rcu_data *rdp, bool force,
 		rcu_nocb_unlock_irqrestore(rdp, flags);
 		return false;
 	}
-	del_timer(&rdp->nocb_timer);
+
+	if (READ_ONCE(rdp->nocb_defer_wakeup) > RCU_NOCB_WAKE_NOT) {
+		WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
+		del_timer(&rdp->nocb_timer);
+	}
 	rcu_nocb_unlock_irqrestore(rdp, flags);
 	raw_spin_lock_irqsave(&rdp_gp->nocb_gp_lock, flags);
 	if (force || READ_ONCE(rdp_gp->nocb_gp_sleep)) {
@@ -2350,7 +2354,6 @@ static bool do_nocb_deferred_wakeup_common(struct rcu_data *rdp)
 		return false;
 	}
 	ndw = READ_ONCE(rdp->nocb_defer_wakeup);
-	WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
 	ret = wake_nocb_gp(rdp, ndw == RCU_NOCB_WAKE_FORCE, flags);
 	trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("DeferredWake"));
Frederic Weisbecker March 3, 2021, 1:35 a.m. UTC | #9
On Tue, Mar 02, 2021 at 10:17:29AM -0800, Paul E. McKenney wrote:
> On Tue, Mar 02, 2021 at 01:34:44PM +0100, Frederic Weisbecker wrote:

> 

> OK, how about if I queue a temporary commit (shown below) that just

> calls out the first scenario so that I can start testing, and you get

> me more detail on the second scenario?  I can then update the commit.


Sure, meanwhile here is an attempt for a nocb_bypass_timer based
scenario, it's overly hairy and perhaps I picture more power
in the hands of callbacks advancing on nocb_cb_wait() than it
really has:


0.          CPU 0's ->nocb_cb_kthread just called rcu_do_batch() and
            executed all the ready callbacks. Its segcblist is now
            entirely empty. It's preempted while calling local_bh_enable().

1.          A new callback is enqueued on CPU 0 with IRQs enabled. So
            the ->nocb_gp_kthread for CPU 0-2's is awaken. Then a storm
            of callbacks enqueue follows on CPU 0 and even reaches the
            bypass queue. Note that ->nocb_gp_kthread is also associated
            with CPU 0.

2.          CPU 0 queues one last bypass callback.

3.          The ->nocb_gp_kthread wakes up and associates a grace period
            with the whole queue of regular callbacks on CPU 0. It also
            tries to flush the bypass queue of CPU 0 but the bypass lock
            is contended due to the concurrent enqueuing on the previous
            step 2, so the flush fails.

4.          This ->nocb_gp_kthread arms its ->nocb_bypass_timer and goes
            to sleep waiting for the end of this future grace period.

5.          This grace period elapses before the ->nocb_bypass_timer timer
            fires. This is normally improbably given that the timer is set
            for only two jiffies, but timers can be delayed.  Besides, it
            is possible that kernel was built with CONFIG_RCU_STRICT_GRACE_PERIOD=y.

6.          The grace period ends, so rcu_gp_kthread awakens the
            ->nocb_gp_kthread but it doesn't get a chance to run on a CPU
            before a while.

7.          CPU 0's ->nocb_cb_kthread get back to the CPU after its preemption.
            As it notices the new completed grace period, it advances the callbacks
            and executes them. Then it gets preempted again on local_bh_enabled().

8.          A new callback enqueue on CPU 0 flushes itself the bypass queue
            because CPU 0's ->nocb_nobypass_count < nocb_nobypass_lim_per_jiffy.

9.          CPUs from other ->nocb_gp_kthread groups (above CPU 2) initiate and
            elapse a few grace periods. CPU 0's ->nocb_gp_kthread still hasn't
            got an opportunity to run on a CPU and its ->nocb_bypass_timer still
            hasn't fired.

10.         CPU 0's ->nocb_cb_kthread wakes up from preemption. It notices the
            new grace periods that have elapsed, advance all the callbacks and
            executes them. Then it goes to sleep waiting for invocable callbacks.

11.         CPU 0 enqueues a new callback with interrupts disabled, and
            defers awakening its ->nocb_gp_kthread even though ->nocb_gp_sleep
            is actually false. It therefore queues its rcu_data structure's
            ->nocb_timer. At this point, CPU 0's rdp->nocb_defer_wakeup is
            RCU_NOCB_WAKE.

12.         The ->nocb_bypass_timer finally fires! It doesn't wake up
            ->nocb_gp_kthread because it's actually awaken already.
            But it cancels CPU 0's ->nocb_timer armed at 11. Yet it doesn't
            re-initialize CPU 0's ->nocb_defer_wakeup which stays with the
            stale RCU_NOCB_WAKE value. So CPU 0's->nocb_defer_wakeup and
            its ->nocb_timer are now desynchronized.
            
13.         The ->nocb_gp_kthread finally runs. It cancels the ->nocb_bypass_timer
            which has already fired. It sees the new callback on CPU 0 and
            associate it with a new grace period then sleep on it.
            
14.         The grace period elapses, rcu_gp_kthread wakes up ->nocb_gb_kthread
            which wakes up CPU 0's->nocb_cb_kthread which runs the callback.
            Both ->nocb_gp_kthread and CPU 0's->nocb_cb_kthread now wait for new
            callbacks.
            
15.         CPU 0 enqueues another callback, again with interrupts
            disabled so it must queue a timer for a deferred wakeup. However
            the value of its ->nocb_defer_wakeup is RCU_NOCB_WAKE which
            incorrectly indicates that a timer is already queued.  Instead,
            CPU 0's ->nocb_timer was cancelled in 12.  CPU 0 therefore fails
            to queue the ->nocb_timer.

16.         CPU 0 has its pending callback and it may go unnoticed until
            some other CPU ever wakes up ->nocb_gp_kthread or CPU 0 ever
            calls an explicit deferred wakeup, for example, during idle entry.


Thanks.
Paul E. McKenney March 3, 2021, 2:06 a.m. UTC | #10
On Wed, Mar 03, 2021 at 02:35:33AM +0100, Frederic Weisbecker wrote:
> On Tue, Mar 02, 2021 at 10:17:29AM -0800, Paul E. McKenney wrote:

> > On Tue, Mar 02, 2021 at 01:34:44PM +0100, Frederic Weisbecker wrote:

> > 

> > OK, how about if I queue a temporary commit (shown below) that just

> > calls out the first scenario so that I can start testing, and you get

> > me more detail on the second scenario?  I can then update the commit.

> 

> Sure, meanwhile here is an attempt for a nocb_bypass_timer based

> scenario, it's overly hairy and perhaps I picture more power

> in the hands of callbacks advancing on nocb_cb_wait() than it

> really has:


Thank you very much!

I must defer looking through this in detail until I am more awake,
but I do very much like the fine-grained exposition.

							Thanx, Paul

> 0.          CPU 0's ->nocb_cb_kthread just called rcu_do_batch() and

>             executed all the ready callbacks. Its segcblist is now

>             entirely empty. It's preempted while calling local_bh_enable().

> 

> 1.          A new callback is enqueued on CPU 0 with IRQs enabled. So

>             the ->nocb_gp_kthread for CPU 0-2's is awaken. Then a storm

>             of callbacks enqueue follows on CPU 0 and even reaches the

>             bypass queue. Note that ->nocb_gp_kthread is also associated

>             with CPU 0.

> 

> 2.          CPU 0 queues one last bypass callback.

> 

> 3.          The ->nocb_gp_kthread wakes up and associates a grace period

>             with the whole queue of regular callbacks on CPU 0. It also

>             tries to flush the bypass queue of CPU 0 but the bypass lock

>             is contended due to the concurrent enqueuing on the previous

>             step 2, so the flush fails.

> 

> 4.          This ->nocb_gp_kthread arms its ->nocb_bypass_timer and goes

>             to sleep waiting for the end of this future grace period.

> 

> 5.          This grace period elapses before the ->nocb_bypass_timer timer

>             fires. This is normally improbably given that the timer is set

>             for only two jiffies, but timers can be delayed.  Besides, it

>             is possible that kernel was built with CONFIG_RCU_STRICT_GRACE_PERIOD=y.

> 

> 6.          The grace period ends, so rcu_gp_kthread awakens the

>             ->nocb_gp_kthread but it doesn't get a chance to run on a CPU

>             before a while.

> 

> 7.          CPU 0's ->nocb_cb_kthread get back to the CPU after its preemption.

>             As it notices the new completed grace period, it advances the callbacks

>             and executes them. Then it gets preempted again on local_bh_enabled().

> 

> 8.          A new callback enqueue on CPU 0 flushes itself the bypass queue

>             because CPU 0's ->nocb_nobypass_count < nocb_nobypass_lim_per_jiffy.

> 

> 9.          CPUs from other ->nocb_gp_kthread groups (above CPU 2) initiate and

>             elapse a few grace periods. CPU 0's ->nocb_gp_kthread still hasn't

>             got an opportunity to run on a CPU and its ->nocb_bypass_timer still

>             hasn't fired.

> 

> 10.         CPU 0's ->nocb_cb_kthread wakes up from preemption. It notices the

>             new grace periods that have elapsed, advance all the callbacks and

>             executes them. Then it goes to sleep waiting for invocable callbacks.

> 

> 11.         CPU 0 enqueues a new callback with interrupts disabled, and

>             defers awakening its ->nocb_gp_kthread even though ->nocb_gp_sleep

>             is actually false. It therefore queues its rcu_data structure's

>             ->nocb_timer. At this point, CPU 0's rdp->nocb_defer_wakeup is

>             RCU_NOCB_WAKE.

> 

> 12.         The ->nocb_bypass_timer finally fires! It doesn't wake up

>             ->nocb_gp_kthread because it's actually awaken already.

>             But it cancels CPU 0's ->nocb_timer armed at 11. Yet it doesn't

>             re-initialize CPU 0's ->nocb_defer_wakeup which stays with the

>             stale RCU_NOCB_WAKE value. So CPU 0's->nocb_defer_wakeup and

>             its ->nocb_timer are now desynchronized.

>             

> 13.         The ->nocb_gp_kthread finally runs. It cancels the ->nocb_bypass_timer

>             which has already fired. It sees the new callback on CPU 0 and

>             associate it with a new grace period then sleep on it.

>             

> 14.         The grace period elapses, rcu_gp_kthread wakes up ->nocb_gb_kthread

>             which wakes up CPU 0's->nocb_cb_kthread which runs the callback.

>             Both ->nocb_gp_kthread and CPU 0's->nocb_cb_kthread now wait for new

>             callbacks.

>             

> 15.         CPU 0 enqueues another callback, again with interrupts

>             disabled so it must queue a timer for a deferred wakeup. However

>             the value of its ->nocb_defer_wakeup is RCU_NOCB_WAKE which

>             incorrectly indicates that a timer is already queued.  Instead,

>             CPU 0's ->nocb_timer was cancelled in 12.  CPU 0 therefore fails

>             to queue the ->nocb_timer.

> 

> 16.         CPU 0 has its pending callback and it may go unnoticed until

>             some other CPU ever wakes up ->nocb_gp_kthread or CPU 0 ever

>             calls an explicit deferred wakeup, for example, during idle entry.

> 

> 

> Thanks.
Frederic Weisbecker March 3, 2021, 2:17 a.m. UTC | #11
On Tue, Mar 02, 2021 at 06:06:43PM -0800, Paul E. McKenney wrote:
> On Wed, Mar 03, 2021 at 02:35:33AM +0100, Frederic Weisbecker wrote:
> > On Tue, Mar 02, 2021 at 10:17:29AM -0800, Paul E. McKenney wrote:
> > > On Tue, Mar 02, 2021 at 01:34:44PM +0100, Frederic Weisbecker wrote:
> > > 
> > > OK, how about if I queue a temporary commit (shown below) that just
> > > calls out the first scenario so that I can start testing, and you get
> > > me more detail on the second scenario?  I can then update the commit.
> > 
> > Sure, meanwhile here is an attempt for a nocb_bypass_timer based
> > scenario, it's overly hairy and perhaps I picture more power
> > in the hands of callbacks advancing on nocb_cb_wait() than it
> > really has:
> 
> Thank you very much!
> 
> I must defer looking through this in detail until I am more awake,
> but I do very much like the fine-grained exposition.
> 
> 							Thanx, Paul
> 
> > 0.          CPU 0's ->nocb_cb_kthread just called rcu_do_batch() and
> >             executed all the ready callbacks. Its segcblist is now
> >             entirely empty. It's preempted while calling local_bh_enable().
> > 
> > 1.          A new callback is enqueued on CPU 0 with IRQs enabled. So
> >             the ->nocb_gp_kthread for CPU 0-2's is awaken. Then a storm
> >             of callbacks enqueue follows on CPU 0 and even reaches the
> >             bypass queue. Note that ->nocb_gp_kthread is also associated
> >             with CPU 0.
> > 
> > 2.          CPU 0 queues one last bypass callback.
> > 
> > 3.          The ->nocb_gp_kthread wakes up and associates a grace period
> >             with the whole queue of regular callbacks on CPU 0. It also
> >             tries to flush the bypass queue of CPU 0 but the bypass lock
> >             is contended due to the concurrent enqueuing on the previous
> >             step 2, so the flush fails.
> > 
> > 4.          This ->nocb_gp_kthread arms its ->nocb_bypass_timer and goes
> >             to sleep waiting for the end of this future grace period.
> > 
> > 5.          This grace period elapses before the ->nocb_bypass_timer timer
> >             fires. This is normally improbably given that the timer is set
> >             for only two jiffies, but timers can be delayed.  Besides, it
> >             is possible that kernel was built with CONFIG_RCU_STRICT_GRACE_PERIOD=y.
> > 
> > 6.          The grace period ends, so rcu_gp_kthread awakens the
> >             ->nocb_gp_kthread but it doesn't get a chance to run on a CPU
> >             before a while.
> > 
> > 7.          CPU 0's ->nocb_cb_kthread get back to the CPU after its preemption.
> >             As it notices the new completed grace period, it advances the callbacks
> >             and executes them. Then it gets preempted again on local_bh_enabled().
> > 
> > 8.          A new callback enqueue on CPU 0 flushes itself the bypass queue
> >             because CPU 0's ->nocb_nobypass_count < nocb_nobypass_lim_per_jiffy.
> > 
> > 9.          CPUs from other ->nocb_gp_kthread groups (above CPU 2) initiate and
> >             elapse a few grace periods. CPU 0's ->nocb_gp_kthread still hasn't
> >             got an opportunity to run on a CPU and its ->nocb_bypass_timer still
> >             hasn't fired.
> > 
> > 10.         CPU 0's ->nocb_cb_kthread wakes up from preemption. It notices the
> >             new grace periods that have elapsed, advance all the callbacks and
> >             executes them. Then it goes to sleep waiting for invocable
> >             callbacks.

I'm just not so sure about the above point 10. Even though a few grace periods
have elapsed, the callback queued in 8 is in RCU_NEXT_TAIL at this
point. Perhaps one more grace period is necessary after that.

Anyway, I need to be more awake as well before checking that again.
Neeraj Upadhyay March 3, 2021, 11:15 a.m. UTC | #12
Hi,

On 3/2/2021 11:47 PM, Paul E. McKenney wrote:
> On Tue, Mar 02, 2021 at 01:34:44PM +0100, Frederic Weisbecker wrote:

>> On Mon, Mar 01, 2021 at 05:48:29PM -0800, Paul E. McKenney wrote:

>>> On Wed, Feb 24, 2021 at 11:06:06PM +0100, Frederic Weisbecker wrote:

>>>> On Wed, Feb 24, 2021 at 10:37:09AM -0800, Paul E. McKenney wrote:

>>>>> On Tue, Feb 23, 2021 at 01:09:59AM +0100, Frederic Weisbecker wrote:

>>>>>> Two situations can cause a missed nocb timer rearm:

>>>>>>

>>>>>> 1) rdp(CPU A) queues its nocb timer. The grace period elapses before

>>>>>>     the timer get a chance to fire. The nocb_gp kthread is awaken by

>>>>>>     rdp(CPU B). The nocb_cb kthread for rdp(CPU A) is awaken and

>>>>>>     process the callbacks, again before the nocb_timer for CPU A get a

>>>>>>     chance to fire. rdp(CPU A) queues a callback and wakes up nocb_gp

>>>>>>     kthread, cancelling the pending nocb_timer without resetting the

>>>>>>     corresponding nocb_defer_wakeup.

>>>>>

>>>>> As discussed offlist, expanding the above scenario results in this

>>>>> sequence of steps:

>>>

>>> I renumbered the CPUs, since the ->nocb_gp_kthread would normally be

>>> associated with CPU 0.  If the first CPU to enqueue a callback was also

>>> CPU 0, nocb_gp_wait() might clear that CPU's ->nocb_defer_wakeup, which

>>> would prevent this scenario from playing out.  (But admittedly only if

>>> some other CPU handled by this same ->nocb_gp_kthread used its bypass.)

>>

>> Ok good point.

>>

>>>

>>>>> 1.	There are no callbacks queued for any CPU covered by CPU 0-2's

>>>>> 	->nocb_gp_kthread.

>>>

>>> And ->nocb_gp_kthread is associated with CPU 0.

>>>

>>>>> 2.	CPU 1 enqueues its first callback with interrupts disabled, and

>>>>> 	thus must defer awakening its ->nocb_gp_kthread.  It therefore

>>>>> 	queues its rcu_data structure's ->nocb_timer.

>>>

>>> At this point, CPU 1's rdp->nocb_defer_wakeup is RCU_NOCB_WAKE.

>>

>> Right.

>>

>>>>> 7.	The grace period ends, so rcu_gp_kthread awakens the

>>>>> 	->nocb_gp_kthread, which in turn awakens both CPU 1's and

>>>>> 	CPU 2's ->nocb_cb_kthread.

>>>

>>> And then ->nocb_cb_kthread sleeps waiting for more callbacks.

>>

>> Yep

>>

>>>> I managed to recollect some pieces of my brain. So keep the above but

>>>> let's change the point 10:

>>>>

>>>> 10.	CPU 1 enqueues its second callback, this time with interrupts

>>>>   	enabled so it can wake directly	->nocb_gp_kthread.

>>>> 	It does so with calling __wake_nocb_gp() which also cancels the

>>>

>>> wake_nocb_gp() in current -rcu, correct?

>>

>> Heh, right.

>>

>>>>> So far so good, but why isn't the timer still queued from back in step 2?

>>>>> What am I missing here?  Either way, could you please update the commit

>>>>> logs to tell the full story?  At some later time, you might be very

>>>>> happy that you did.  ;-)

>>>>>

>>>>>> 2) The "nocb_bypass_timer" ends up calling wake_nocb_gp() which deletes

>>>>>>     the pending "nocb_timer" (note they are not the same timers) for the

>>>>>>     given rdp without resetting the matching state stored in nocb_defer

>>>>>>     wakeup.

>>>

>>> Would like to similarly expand this one, or would you prefer to rest your

>>> case on Case 1) above?

>>

>> I was about to say that we can skip that one, the changelog will already be

>> big enough but the "Fixes:" tag refers to the second scenario, since it's the

>> oldest vulnerable commit AFAICS.

>>

>>>>>> Fixes: d1b222c6be1f (rcu/nocb: Add bypass callback queueing)

> 

> OK, how about if I queue a temporary commit (shown below) that just

> calls out the first scenario so that I can start testing, and you get

> me more detail on the second scenario?  I can then update the commit.

> 

> 							Thanx, Paul

> 

> ------------------------------------------------------------------------

> 

> commit 302fd54b9ae98f678624cbf9bf7a4ca88455a8f9

> Author: Frederic Weisbecker <frederic@kernel.org>

> Date:   Tue Feb 23 01:09:59 2021 +0100

> 

>      rcu/nocb: Fix missed nocb_timer requeue

>      

>      This sequence of events can lead to a failure to requeue a CPU's

>      ->nocb_timer:

>      

>      1.      There are no callbacks queued for any CPU covered by CPU 0-2's

>              ->nocb_gp_kthread.  Note that ->nocb_gp_kthread is associated

>              with CPU 0.

>      

>      2.      CPU 1 enqueues its first callback with interrupts disabled, and

>              thus must defer awakening its ->nocb_gp_kthread.  It therefore

>              queues its rcu_data structure's ->nocb_timer.  At this point,

>              CPU 1's rdp->nocb_defer_wakeup is RCU_NOCB_WAKE.

>      

>      3.      CPU 2, which shares the same ->nocb_gp_kthread, also enqueues a

>              callback, but with interrupts enabled, allowing it to directly

>              awaken the ->nocb_gp_kthread.

>      

>      4.      The newly awakened ->nocb_gp_kthread associates both CPU 1's

>              and CPU 2's callbacks with a future grace period and arranges

>              for that grace period to be started.

>      

>      5.      This ->nocb_gp_kthread goes to sleep waiting for the end of this

>              future grace period.

>      

>      6.      This grace period elapses before the CPU 1's timer fires.

>              This is normally improbably given that the timer is set for only

>              one jiffy, but timers can be delayed.  Besides, it is possible

>              that kernel was built with CONFIG_RCU_STRICT_GRACE_PERIOD=y.

>      

>      7.      The grace period ends, so rcu_gp_kthread awakens the

>              ->nocb_gp_kthread, which in turn awakens both CPU 1's and

>              CPU 2's ->nocb_cb_kthread.  Then ->nocb_gb_kthread sleeps

>              waiting for more newly queued callbacks.

>      

>      8.      CPU 1's ->nocb_cb_kthread invokes its callback, then sleeps

>              waiting for more invocable callbacks.

>      

>      9.      Note that neither kthread updated any ->nocb_timer state,

>              so CPU 1's ->nocb_defer_wakeup is still set to RCU_NOCB_WAKE.

>      

>      10.     CPU 1 enqueues its second callback, this time with interrupts

>              enabled so it can wake directly ->nocb_gp_kthread.

>              It does so with calling wake_nocb_gp() which also cancels the

>              pending timer that got queued in step 2. But that doesn't reset

>              CPU 1's ->nocb_defer_wakeup which is still set to RCU_NOCB_WAKE.

>              So CPU 1's ->nocb_defer_wakeup and its ->nocb_timer are now

>              desynchronized.

>      

>      11.     ->nocb_gp_kthread associates the callback queued in 10 with a new

>              grace period, arranges for that grace period to start and sleeps

>              waiting for it to complete.

>      

>      12.     The grace period ends, rcu_gp_kthread awakens ->nocb_gp_kthread,

>              which in turn wakes up CPU 1's ->nocb_cb_kthread which then

>              invokes the callback queued in 10.

>      

>      13.     CPU 1 enqueues its third callback, this time with interrupts

>              disabled so it must queue a timer for a deferred wakeup. However

>              the value of its ->nocb_defer_wakeup is RCU_NOCB_WAKE which

>              incorrectly indicates that a timer is already queued.  Instead,

>              CPU 1's ->nocb_timer was cancelled in 10.  CPU 1 therefore fails

>              to queue the ->nocb_timer.

>      

>      14.     CPU 1 has its pending callback and it may go unnoticed until

>              some other CPU ever wakes up ->nocb_gp_kthread or CPU 1 ever

>              calls an explicit deferred wakeup, for example, during idle entry.

>      

>      This commit fixes this bug by resetting rdp->nocb_defer_wakeup everytime

>      we delete the ->nocb_timer.

>      

>      Fixes: d1b222c6be1f (rcu/nocb: Add bypass callback queueing)

>      Cc: Stable <stable@vger.kernel.org>

>      Cc: Josh Triplett <josh@joshtriplett.org>

>      Cc: Lai Jiangshan <jiangshanlai@gmail.com>

>      Cc: Joel Fernandes <joel@joelfernandes.org>

>      Cc: Neeraj Upadhyay <neeraju@codeaurora.org>

>      Cc: Boqun Feng <boqun.feng@gmail.com>

>      Signed-off-by: Frederic Weisbecker <frederic@kernel.org>

>      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

> 

> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h

> index 44746d8..429491d 100644

> --- a/kernel/rcu/tree_plugin.h

> +++ b/kernel/rcu/tree_plugin.h

> @@ -1721,7 +1721,11 @@ static bool wake_nocb_gp(struct rcu_data *rdp, bool force,

>   		rcu_nocb_unlock_irqrestore(rdp, flags);

>   		return false;

>   	}

> -	del_timer(&rdp->nocb_timer);

> +

> +	if (READ_ONCE(rdp->nocb_defer_wakeup) > RCU_NOCB_WAKE_NOT) {

> +		WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);

> +		del_timer(&rdp->nocb_timer);

> +	}

>   	rcu_nocb_unlock_irqrestore(rdp, flags);

>   	raw_spin_lock_irqsave(&rdp_gp->nocb_gp_lock, flags);

>   	if (force || READ_ONCE(rdp_gp->nocb_gp_sleep)) {

> @@ -2350,7 +2354,6 @@ static bool do_nocb_deferred_wakeup_common(struct rcu_data *rdp)

>   		return false;

>   	}

>   	ndw = READ_ONCE(rdp->nocb_defer_wakeup);

> -	WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);

>   	ret = wake_nocb_gp(rdp, ndw == RCU_NOCB_WAKE_FORCE, flags);

>   	trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("DeferredWake"));

>   

> 


Reviewed-by: Neeraj Upadhyay <neeraju@codeaurora.org>



Thanks
Neeraj

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member of the Code Aurora Forum, hosted by The Linux Foundation
diff mbox series

Patch

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 2ec9d7f55f99..dd0dc66c282d 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1720,7 +1720,11 @@  static bool wake_nocb_gp(struct rcu_data *rdp, bool force,
 		rcu_nocb_unlock_irqrestore(rdp, flags);
 		return false;
 	}
-	del_timer(&rdp->nocb_timer);
+
+	if (READ_ONCE(rdp->nocb_defer_wakeup) > RCU_NOCB_WAKE_NOT) {
+		WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
+		del_timer(&rdp->nocb_timer);
+	}
 	rcu_nocb_unlock_irqrestore(rdp, flags);
 	raw_spin_lock_irqsave(&rdp_gp->nocb_gp_lock, flags);
 	if (force || READ_ONCE(rdp_gp->nocb_gp_sleep)) {
@@ -2349,7 +2353,6 @@  static bool do_nocb_deferred_wakeup_common(struct rcu_data *rdp)
 		return false;
 	}
 	ndw = READ_ONCE(rdp->nocb_defer_wakeup);
-	WRITE_ONCE(rdp->nocb_defer_wakeup, RCU_NOCB_WAKE_NOT);
 	ret = wake_nocb_gp(rdp, ndw == RCU_NOCB_WAKE_FORCE, flags);
 	trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("DeferredWake"));