diff mbox

[RFC,tip/core/rcu,19/28] nohz: Allow rcu extended quiescent state handling seperately from tick stop

Message ID 1320265849-5744-19-git-send-email-paulmck@linux.vnet.ibm.com
State Accepted
Commit 2bbb6817c0ac1b5f2a68d720f364f98eeb1ac4fd
Headers show

Commit Message

Paul E. McKenney Nov. 2, 2011, 8:30 p.m. UTC
From: Frederic Weisbecker <fweisbec@gmail.com>

It is assumed that rcu won't be used once we switch to tickless
mode and until we restart the tick. However this is not always
true, as in x86-64 where we dereference the idle notifiers after
the tick is stopped.

To prepare for fixing this, add two new APIs:
tick_nohz_idle_enter_norcu() and tick_nohz_idle_exit_norcu().

If no use of RCU is made in the idle loop between
tick_nohz_enter_idle() and tick_nohz_exit_idle() calls, the arch
must instead call the new *_norcu() version such that the arch doesn't
need to call rcu_idle_enter() and rcu_idle_exit().

Otherwise the arch must call tick_nohz_enter_idle() and
tick_nohz_exit_idle() and also call explicitly:

- rcu_idle_enter() after its last use of RCU before the CPU is put
to sleep.
- rcu_idle_exit() before the first use of RCU after the CPU is woken
up.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: David Miller <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 arch/arm/kernel/process.c              |    4 +-
 arch/avr32/kernel/process.c            |    4 +-
 arch/blackfin/kernel/process.c         |    4 +-
 arch/microblaze/kernel/process.c       |    4 +-
 arch/mips/kernel/process.c             |    4 +-
 arch/openrisc/kernel/idle.c            |    4 +-
 arch/powerpc/kernel/idle.c             |    4 +-
 arch/powerpc/platforms/iseries/setup.c |    8 +++---
 arch/s390/kernel/process.c             |    4 +-
 arch/sh/kernel/idle.c                  |    4 +-
 arch/sparc/kernel/process_64.c         |    4 +-
 arch/tile/kernel/process.c             |    4 +-
 arch/um/kernel/process.c               |    4 +-
 arch/unicore32/kernel/process.c        |    4 +-
 arch/x86/kernel/process_32.c           |    4 +-
 arch/x86/kernel/process_64.c           |    4 +-
 include/linux/tick.h                   |   46 +++++++++++++++++++++++++++++--
 kernel/time/tick-sched.c               |   25 +++++++++--------
 18 files changed, 90 insertions(+), 49 deletions(-)

Comments

Josh Triplett Nov. 3, 2011, 4 a.m. UTC | #1
On Wed, Nov 02, 2011 at 01:30:40PM -0700, Paul E. McKenney wrote:
> From: Frederic Weisbecker <fweisbec@gmail.com>
> 
> It is assumed that rcu won't be used once we switch to tickless
> mode and until we restart the tick. However this is not always
> true, as in x86-64 where we dereference the idle notifiers after
> the tick is stopped.
> 
> To prepare for fixing this, add two new APIs:
> tick_nohz_idle_enter_norcu() and tick_nohz_idle_exit_norcu().
> 
> If no use of RCU is made in the idle loop between
> tick_nohz_enter_idle() and tick_nohz_exit_idle() calls, the arch
> must instead call the new *_norcu() version such that the arch doesn't
> need to call rcu_idle_enter() and rcu_idle_exit().

The _norcu names confused me a bit.  At first, I thought they meant
"idle but not RCU idle, so you can use RCU", but from re-reading the
commit message, apparently they mean "idle and RCU idle, so don't use
RCU".  What about something like _forbid_rcu instead?  Or,
alternatively, why not just go ahead and separate the two types of idle
entirely rather than introducing the _norcu variants first?

- Josh Triplett
Frederic Weisbecker Nov. 3, 2011, 11:54 a.m. UTC | #2
On Wed, Nov 02, 2011 at 09:00:03PM -0700, Josh Triplett wrote:
> On Wed, Nov 02, 2011 at 01:30:40PM -0700, Paul E. McKenney wrote:
> > From: Frederic Weisbecker <fweisbec@gmail.com>
> > 
> > It is assumed that rcu won't be used once we switch to tickless
> > mode and until we restart the tick. However this is not always
> > true, as in x86-64 where we dereference the idle notifiers after
> > the tick is stopped.
> > 
> > To prepare for fixing this, add two new APIs:
> > tick_nohz_idle_enter_norcu() and tick_nohz_idle_exit_norcu().
> > 
> > If no use of RCU is made in the idle loop between
> > tick_nohz_enter_idle() and tick_nohz_exit_idle() calls, the arch
> > must instead call the new *_norcu() version such that the arch doesn't
> > need to call rcu_idle_enter() and rcu_idle_exit().
> 
> The _norcu names confused me a bit.  At first, I thought they meant
> "idle but not RCU idle, so you can use RCU", but from re-reading the
> commit message, apparently they mean "idle and RCU idle, so don't use
> RCU".  What about something like _forbid_rcu instead?  Or,
> alternatively, why not just go ahead and separate the two types of idle
> entirely rather than introducing the _norcu variants first?

Or tick_nohz_idle_enter_rcu_stop() and tick_nohz_idle_exit_rcu_restart()?

Sounds clear but too long. May be we can shorten the tick_nohz thing in the
beginning.
Paul E. McKenney Nov. 3, 2011, 1:32 p.m. UTC | #3
On Thu, Nov 03, 2011 at 12:54:33PM +0100, Frederic Weisbecker wrote:
> On Wed, Nov 02, 2011 at 09:00:03PM -0700, Josh Triplett wrote:
> > On Wed, Nov 02, 2011 at 01:30:40PM -0700, Paul E. McKenney wrote:
> > > From: Frederic Weisbecker <fweisbec@gmail.com>
> > > 
> > > It is assumed that rcu won't be used once we switch to tickless
> > > mode and until we restart the tick. However this is not always
> > > true, as in x86-64 where we dereference the idle notifiers after
> > > the tick is stopped.
> > > 
> > > To prepare for fixing this, add two new APIs:
> > > tick_nohz_idle_enter_norcu() and tick_nohz_idle_exit_norcu().
> > > 
> > > If no use of RCU is made in the idle loop between
> > > tick_nohz_enter_idle() and tick_nohz_exit_idle() calls, the arch
> > > must instead call the new *_norcu() version such that the arch doesn't
> > > need to call rcu_idle_enter() and rcu_idle_exit().
> > 
> > The _norcu names confused me a bit.  At first, I thought they meant
> > "idle but not RCU idle, so you can use RCU", but from re-reading the
> > commit message, apparently they mean "idle and RCU idle, so don't use
> > RCU".  What about something like _forbid_rcu instead?  Or,
> > alternatively, why not just go ahead and separate the two types of idle
> > entirely rather than introducing the _norcu variants first?
> 
> Or tick_nohz_idle_enter_rcu_stop() and tick_nohz_idle_exit_rcu_restart()?
> 
> Sounds clear but too long. May be we can shorten the tick_nohz thing in the
> beginning.

How about tick_nohz_rcu_idle_enter() vs. tick_nohz_idle_enter() on
entry to the idle loop and tick_nohz_rcu_idle_exit() vs
tick_nohz_idle_exit() on exit?

That said, I don't feel all that strongly on this naming topic.

								Thanx, Paul
Josh Triplett Nov. 3, 2011, 3:31 p.m. UTC | #4
On Thu, Nov 03, 2011 at 06:32:31AM -0700, Paul E. McKenney wrote:
> On Thu, Nov 03, 2011 at 12:54:33PM +0100, Frederic Weisbecker wrote:
> > On Wed, Nov 02, 2011 at 09:00:03PM -0700, Josh Triplett wrote:
> > > On Wed, Nov 02, 2011 at 01:30:40PM -0700, Paul E. McKenney wrote:
> > > > From: Frederic Weisbecker <fweisbec@gmail.com>
> > > > 
> > > > It is assumed that rcu won't be used once we switch to tickless
> > > > mode and until we restart the tick. However this is not always
> > > > true, as in x86-64 where we dereference the idle notifiers after
> > > > the tick is stopped.
> > > > 
> > > > To prepare for fixing this, add two new APIs:
> > > > tick_nohz_idle_enter_norcu() and tick_nohz_idle_exit_norcu().
> > > > 
> > > > If no use of RCU is made in the idle loop between
> > > > tick_nohz_enter_idle() and tick_nohz_exit_idle() calls, the arch
> > > > must instead call the new *_norcu() version such that the arch doesn't
> > > > need to call rcu_idle_enter() and rcu_idle_exit().
> > > 
> > > The _norcu names confused me a bit.  At first, I thought they meant
> > > "idle but not RCU idle, so you can use RCU", but from re-reading the
> > > commit message, apparently they mean "idle and RCU idle, so don't use
> > > RCU".  What about something like _forbid_rcu instead?  Or,
> > > alternatively, why not just go ahead and separate the two types of idle
> > > entirely rather than introducing the _norcu variants first?
> > 
> > Or tick_nohz_idle_enter_rcu_stop() and tick_nohz_idle_exit_rcu_restart()?
> > 
> > Sounds clear but too long. May be we can shorten the tick_nohz thing in the
> > beginning.
> 
> How about tick_nohz_rcu_idle_enter() vs. tick_nohz_idle_enter() on
> entry to the idle loop and tick_nohz_rcu_idle_exit() vs
> tick_nohz_idle_exit() on exit?
> 
> That said, I don't feel all that strongly on this naming topic.

Mostly I think that since this series tries to separate the concepts of
"idle nohz" and "rcu extended quiescent state", we should end up with
two entirely separate functions delimiting those two, without any
functions that poke both with correspondingly complex compound names.

- Josh Triplett
Paul E. McKenney Nov. 3, 2011, 4:06 p.m. UTC | #5
On Thu, Nov 03, 2011 at 08:31:02AM -0700, Josh Triplett wrote:
> On Thu, Nov 03, 2011 at 06:32:31AM -0700, Paul E. McKenney wrote:
> > On Thu, Nov 03, 2011 at 12:54:33PM +0100, Frederic Weisbecker wrote:
> > > On Wed, Nov 02, 2011 at 09:00:03PM -0700, Josh Triplett wrote:
> > > > On Wed, Nov 02, 2011 at 01:30:40PM -0700, Paul E. McKenney wrote:
> > > > > From: Frederic Weisbecker <fweisbec@gmail.com>
> > > > > 
> > > > > It is assumed that rcu won't be used once we switch to tickless
> > > > > mode and until we restart the tick. However this is not always
> > > > > true, as in x86-64 where we dereference the idle notifiers after
> > > > > the tick is stopped.
> > > > > 
> > > > > To prepare for fixing this, add two new APIs:
> > > > > tick_nohz_idle_enter_norcu() and tick_nohz_idle_exit_norcu().
> > > > > 
> > > > > If no use of RCU is made in the idle loop between
> > > > > tick_nohz_enter_idle() and tick_nohz_exit_idle() calls, the arch
> > > > > must instead call the new *_norcu() version such that the arch doesn't
> > > > > need to call rcu_idle_enter() and rcu_idle_exit().
> > > > 
> > > > The _norcu names confused me a bit.  At first, I thought they meant
> > > > "idle but not RCU idle, so you can use RCU", but from re-reading the
> > > > commit message, apparently they mean "idle and RCU idle, so don't use
> > > > RCU".  What about something like _forbid_rcu instead?  Or,
> > > > alternatively, why not just go ahead and separate the two types of idle
> > > > entirely rather than introducing the _norcu variants first?
> > > 
> > > Or tick_nohz_idle_enter_rcu_stop() and tick_nohz_idle_exit_rcu_restart()?
> > > 
> > > Sounds clear but too long. May be we can shorten the tick_nohz thing in the
> > > beginning.
> > 
> > How about tick_nohz_rcu_idle_enter() vs. tick_nohz_idle_enter() on
> > entry to the idle loop and tick_nohz_rcu_idle_exit() vs
> > tick_nohz_idle_exit() on exit?
> > 
> > That said, I don't feel all that strongly on this naming topic.
> 
> Mostly I think that since this series tries to separate the concepts of
> "idle nohz" and "rcu extended quiescent state", we should end up with
> two entirely separate functions delimiting those two, without any
> functions that poke both with correspondingly complex compound names.

Having four API members rather than the current six does seem quite
attractive to me.  Frederic, any reason why this approach won't work?

						Thanx, Paul
Peter Zijlstra Nov. 9, 2011, 2:28 p.m. UTC | #6
On Thu, 2011-11-03 at 09:06 -0700, Paul E. McKenney wrote:
> > Mostly I think that since this series tries to separate the concepts of
> > "idle nohz" and "rcu extended quiescent state", we should end up with
> > two entirely separate functions delimiting those two, without any
> > functions that poke both with correspondingly complex compound names.
> 
> Having four API members rather than the current six does seem quite
> attractive to me.  Frederic, any reason why this approach won't work? 

Quite agreed. And since you seem to be touching most archs anyway,
touching them all isn't much more extra work.
Frederic Weisbecker Nov. 9, 2011, 4:48 p.m. UTC | #7
On Thu, Nov 03, 2011 at 09:06:56AM -0700, Paul E. McKenney wrote:
> On Thu, Nov 03, 2011 at 08:31:02AM -0700, Josh Triplett wrote:
> > On Thu, Nov 03, 2011 at 06:32:31AM -0700, Paul E. McKenney wrote:
> > > On Thu, Nov 03, 2011 at 12:54:33PM +0100, Frederic Weisbecker wrote:
> > > > On Wed, Nov 02, 2011 at 09:00:03PM -0700, Josh Triplett wrote:
> > > > > On Wed, Nov 02, 2011 at 01:30:40PM -0700, Paul E. McKenney wrote:
> > > > > > From: Frederic Weisbecker <fweisbec@gmail.com>
> > > > > > 
> > > > > > It is assumed that rcu won't be used once we switch to tickless
> > > > > > mode and until we restart the tick. However this is not always
> > > > > > true, as in x86-64 where we dereference the idle notifiers after
> > > > > > the tick is stopped.
> > > > > > 
> > > > > > To prepare for fixing this, add two new APIs:
> > > > > > tick_nohz_idle_enter_norcu() and tick_nohz_idle_exit_norcu().
> > > > > > 
> > > > > > If no use of RCU is made in the idle loop between
> > > > > > tick_nohz_enter_idle() and tick_nohz_exit_idle() calls, the arch
> > > > > > must instead call the new *_norcu() version such that the arch doesn't
> > > > > > need to call rcu_idle_enter() and rcu_idle_exit().
> > > > > 
> > > > > The _norcu names confused me a bit.  At first, I thought they meant
> > > > > "idle but not RCU idle, so you can use RCU", but from re-reading the
> > > > > commit message, apparently they mean "idle and RCU idle, so don't use
> > > > > RCU".  What about something like _forbid_rcu instead?  Or,
> > > > > alternatively, why not just go ahead and separate the two types of idle
> > > > > entirely rather than introducing the _norcu variants first?
> > > > 
> > > > Or tick_nohz_idle_enter_rcu_stop() and tick_nohz_idle_exit_rcu_restart()?
> > > > 
> > > > Sounds clear but too long. May be we can shorten the tick_nohz thing in the
> > > > beginning.
> > > 
> > > How about tick_nohz_rcu_idle_enter() vs. tick_nohz_idle_enter() on
> > > entry to the idle loop and tick_nohz_rcu_idle_exit() vs
> > > tick_nohz_idle_exit() on exit?
> > > 
> > > That said, I don't feel all that strongly on this naming topic.
> > 
> > Mostly I think that since this series tries to separate the concepts of
> > "idle nohz" and "rcu extended quiescent state", we should end up with
> > two entirely separate functions delimiting those two, without any
> > functions that poke both with correspondingly complex compound names.
> 
> Having four API members rather than the current six does seem quite
> attractive to me.  Frederic, any reason why this approach won't work?

The approach I took might sound silly but it's mostly an optimization:

I did the *_norcu() variant mostly to be able to keep rcu_idle_enter()
call under the same local_irq_disable() section.

This way we can't have an interrupt in between that can needlessly perform
RCU work (and trigger the softirq in the worst case), delaying the point
where we actually put the CPU to sleep.
Peter Zijlstra Nov. 10, 2011, 10:52 a.m. UTC | #8
On Wed, 2011-11-09 at 17:48 +0100, Frederic Weisbecker wrote:
> > Having four API members rather than the current six does seem quite
> > attractive to me.  Frederic, any reason why this approach won't work?
> 
> The approach I took might sound silly but it's mostly an optimization:
> 
> I did the *_norcu() variant mostly to be able to keep rcu_idle_enter()
> call under the same local_irq_disable() section.
> 
> This way we can't have an interrupt in between that can needlessly perform
> RCU work (and trigger the softirq in the worst case), delaying the point
> where we actually put the CPU to sleep. 

I'm not sure I get what you're saying. A fully decoupled RCU/NO_HZ API
looks like:

  rcu_idle_enter();
  rcu_idle_exit();

  tick_nohz_idle_enter();
  tick_nohz_idle_exit();

And done you are, no funny interactions, 4 functions.

There is no _norcu variant simply because nohz will never touch rcu. If
you want the old coupled behaviour simply call both
tick_nohz_idle_enter() and rcu_idle_enter().
Paul E. McKenney Nov. 10, 2011, 5:22 p.m. UTC | #9
On Wed, Nov 09, 2011 at 05:48:11PM +0100, Frederic Weisbecker wrote:
> On Thu, Nov 03, 2011 at 09:06:56AM -0700, Paul E. McKenney wrote:
> > On Thu, Nov 03, 2011 at 08:31:02AM -0700, Josh Triplett wrote:
> > > On Thu, Nov 03, 2011 at 06:32:31AM -0700, Paul E. McKenney wrote:
> > > > On Thu, Nov 03, 2011 at 12:54:33PM +0100, Frederic Weisbecker wrote:
> > > > > On Wed, Nov 02, 2011 at 09:00:03PM -0700, Josh Triplett wrote:
> > > > > > On Wed, Nov 02, 2011 at 01:30:40PM -0700, Paul E. McKenney wrote:
> > > > > > > From: Frederic Weisbecker <fweisbec@gmail.com>
> > > > > > > 
> > > > > > > It is assumed that rcu won't be used once we switch to tickless
> > > > > > > mode and until we restart the tick. However this is not always
> > > > > > > true, as in x86-64 where we dereference the idle notifiers after
> > > > > > > the tick is stopped.
> > > > > > > 
> > > > > > > To prepare for fixing this, add two new APIs:
> > > > > > > tick_nohz_idle_enter_norcu() and tick_nohz_idle_exit_norcu().
> > > > > > > 
> > > > > > > If no use of RCU is made in the idle loop between
> > > > > > > tick_nohz_enter_idle() and tick_nohz_exit_idle() calls, the arch
> > > > > > > must instead call the new *_norcu() version such that the arch doesn't
> > > > > > > need to call rcu_idle_enter() and rcu_idle_exit().
> > > > > > 
> > > > > > The _norcu names confused me a bit.  At first, I thought they meant
> > > > > > "idle but not RCU idle, so you can use RCU", but from re-reading the
> > > > > > commit message, apparently they mean "idle and RCU idle, so don't use
> > > > > > RCU".  What about something like _forbid_rcu instead?  Or,
> > > > > > alternatively, why not just go ahead and separate the two types of idle
> > > > > > entirely rather than introducing the _norcu variants first?
> > > > > 
> > > > > Or tick_nohz_idle_enter_rcu_stop() and tick_nohz_idle_exit_rcu_restart()?
> > > > > 
> > > > > Sounds clear but too long. May be we can shorten the tick_nohz thing in the
> > > > > beginning.
> > > > 
> > > > How about tick_nohz_rcu_idle_enter() vs. tick_nohz_idle_enter() on
> > > > entry to the idle loop and tick_nohz_rcu_idle_exit() vs
> > > > tick_nohz_idle_exit() on exit?
> > > > 
> > > > That said, I don't feel all that strongly on this naming topic.
> > > 
> > > Mostly I think that since this series tries to separate the concepts of
> > > "idle nohz" and "rcu extended quiescent state", we should end up with
> > > two entirely separate functions delimiting those two, without any
> > > functions that poke both with correspondingly complex compound names.
> > 
> > Having four API members rather than the current six does seem quite
> > attractive to me.  Frederic, any reason why this approach won't work?
> 
> The approach I took might sound silly but it's mostly an optimization:
> 
> I did the *_norcu() variant mostly to be able to keep rcu_idle_enter()
> call under the same local_irq_disable() section.
> 
> This way we can't have an interrupt in between that can needlessly perform
> RCU work (and trigger the softirq in the worst case), delaying the point
> where we actually put the CPU to sleep.

But we have to tolerate this sort of thing on some architectures (x86
and Power) in order to allow idle-task use of RCU read-side primitives,
right?

So consolidating from six to four APIs doesn't expand the overall state
space.

							Thanx, Paul
Frederic Weisbecker Nov. 15, 2011, 6:30 p.m. UTC | #10
On Thu, Nov 10, 2011 at 09:22:19AM -0800, Paul E. McKenney wrote:
> On Wed, Nov 09, 2011 at 05:48:11PM +0100, Frederic Weisbecker wrote:
> > On Thu, Nov 03, 2011 at 09:06:56AM -0700, Paul E. McKenney wrote:
> > > On Thu, Nov 03, 2011 at 08:31:02AM -0700, Josh Triplett wrote:
> > > > On Thu, Nov 03, 2011 at 06:32:31AM -0700, Paul E. McKenney wrote:
> > > > > On Thu, Nov 03, 2011 at 12:54:33PM +0100, Frederic Weisbecker wrote:
> > > > > > On Wed, Nov 02, 2011 at 09:00:03PM -0700, Josh Triplett wrote:
> > > > > > > On Wed, Nov 02, 2011 at 01:30:40PM -0700, Paul E. McKenney wrote:
> > > > > > > > From: Frederic Weisbecker <fweisbec@gmail.com>
> > > > > > > > 
> > > > > > > > It is assumed that rcu won't be used once we switch to tickless
> > > > > > > > mode and until we restart the tick. However this is not always
> > > > > > > > true, as in x86-64 where we dereference the idle notifiers after
> > > > > > > > the tick is stopped.
> > > > > > > > 
> > > > > > > > To prepare for fixing this, add two new APIs:
> > > > > > > > tick_nohz_idle_enter_norcu() and tick_nohz_idle_exit_norcu().
> > > > > > > > 
> > > > > > > > If no use of RCU is made in the idle loop between
> > > > > > > > tick_nohz_enter_idle() and tick_nohz_exit_idle() calls, the arch
> > > > > > > > must instead call the new *_norcu() version such that the arch doesn't
> > > > > > > > need to call rcu_idle_enter() and rcu_idle_exit().
> > > > > > > 
> > > > > > > The _norcu names confused me a bit.  At first, I thought they meant
> > > > > > > "idle but not RCU idle, so you can use RCU", but from re-reading the
> > > > > > > commit message, apparently they mean "idle and RCU idle, so don't use
> > > > > > > RCU".  What about something like _forbid_rcu instead?  Or,
> > > > > > > alternatively, why not just go ahead and separate the two types of idle
> > > > > > > entirely rather than introducing the _norcu variants first?
> > > > > > 
> > > > > > Or tick_nohz_idle_enter_rcu_stop() and tick_nohz_idle_exit_rcu_restart()?
> > > > > > 
> > > > > > Sounds clear but too long. May be we can shorten the tick_nohz thing in the
> > > > > > beginning.
> > > > > 
> > > > > How about tick_nohz_rcu_idle_enter() vs. tick_nohz_idle_enter() on
> > > > > entry to the idle loop and tick_nohz_rcu_idle_exit() vs
> > > > > tick_nohz_idle_exit() on exit?
> > > > > 
> > > > > That said, I don't feel all that strongly on this naming topic.
> > > > 
> > > > Mostly I think that since this series tries to separate the concepts of
> > > > "idle nohz" and "rcu extended quiescent state", we should end up with
> > > > two entirely separate functions delimiting those two, without any
> > > > functions that poke both with correspondingly complex compound names.
> > > 
> > > Having four API members rather than the current six does seem quite
> > > attractive to me.  Frederic, any reason why this approach won't work?
> > 
> > The approach I took might sound silly but it's mostly an optimization:
> > 
> > I did the *_norcu() variant mostly to be able to keep rcu_idle_enter()
> > call under the same local_irq_disable() section.
> > 
> > This way we can't have an interrupt in between that can needlessly perform
> > RCU work (and trigger the softirq in the worst case), delaying the point
> > where we actually put the CPU to sleep.
> 
> But we have to tolerate this sort of thing on some architectures (x86
> and Power) in order to allow idle-task use of RCU read-side primitives,
> right?
> 
> So consolidating from six to four APIs doesn't expand the overall state
> space.

Well, we tolerate that, the two more APIs are there for optimization, not
to provide correctness.
But if you want me to remove the optimization and keep only the four APIs I can do it.
Paul E. McKenney Nov. 16, 2011, 7:41 p.m. UTC | #11
On Tue, Nov 15, 2011 at 07:30:29PM +0100, Frederic Weisbecker wrote:
> On Thu, Nov 10, 2011 at 09:22:19AM -0800, Paul E. McKenney wrote:
> > On Wed, Nov 09, 2011 at 05:48:11PM +0100, Frederic Weisbecker wrote:
> > > On Thu, Nov 03, 2011 at 09:06:56AM -0700, Paul E. McKenney wrote:
> > > > On Thu, Nov 03, 2011 at 08:31:02AM -0700, Josh Triplett wrote:
> > > > > On Thu, Nov 03, 2011 at 06:32:31AM -0700, Paul E. McKenney wrote:
> > > > > > On Thu, Nov 03, 2011 at 12:54:33PM +0100, Frederic Weisbecker wrote:
> > > > > > > On Wed, Nov 02, 2011 at 09:00:03PM -0700, Josh Triplett wrote:
> > > > > > > > On Wed, Nov 02, 2011 at 01:30:40PM -0700, Paul E. McKenney wrote:
> > > > > > > > > From: Frederic Weisbecker <fweisbec@gmail.com>
> > > > > > > > > 
> > > > > > > > > It is assumed that rcu won't be used once we switch to tickless
> > > > > > > > > mode and until we restart the tick. However this is not always
> > > > > > > > > true, as in x86-64 where we dereference the idle notifiers after
> > > > > > > > > the tick is stopped.
> > > > > > > > > 
> > > > > > > > > To prepare for fixing this, add two new APIs:
> > > > > > > > > tick_nohz_idle_enter_norcu() and tick_nohz_idle_exit_norcu().
> > > > > > > > > 
> > > > > > > > > If no use of RCU is made in the idle loop between
> > > > > > > > > tick_nohz_enter_idle() and tick_nohz_exit_idle() calls, the arch
> > > > > > > > > must instead call the new *_norcu() version such that the arch doesn't
> > > > > > > > > need to call rcu_idle_enter() and rcu_idle_exit().
> > > > > > > > 
> > > > > > > > The _norcu names confused me a bit.  At first, I thought they meant
> > > > > > > > "idle but not RCU idle, so you can use RCU", but from re-reading the
> > > > > > > > commit message, apparently they mean "idle and RCU idle, so don't use
> > > > > > > > RCU".  What about something like _forbid_rcu instead?  Or,
> > > > > > > > alternatively, why not just go ahead and separate the two types of idle
> > > > > > > > entirely rather than introducing the _norcu variants first?
> > > > > > > 
> > > > > > > Or tick_nohz_idle_enter_rcu_stop() and tick_nohz_idle_exit_rcu_restart()?
> > > > > > > 
> > > > > > > Sounds clear but too long. May be we can shorten the tick_nohz thing in the
> > > > > > > beginning.
> > > > > > 
> > > > > > How about tick_nohz_rcu_idle_enter() vs. tick_nohz_idle_enter() on
> > > > > > entry to the idle loop and tick_nohz_rcu_idle_exit() vs
> > > > > > tick_nohz_idle_exit() on exit?
> > > > > > 
> > > > > > That said, I don't feel all that strongly on this naming topic.
> > > > > 
> > > > > Mostly I think that since this series tries to separate the concepts of
> > > > > "idle nohz" and "rcu extended quiescent state", we should end up with
> > > > > two entirely separate functions delimiting those two, without any
> > > > > functions that poke both with correspondingly complex compound names.
> > > > 
> > > > Having four API members rather than the current six does seem quite
> > > > attractive to me.  Frederic, any reason why this approach won't work?
> > > 
> > > The approach I took might sound silly but it's mostly an optimization:
> > > 
> > > I did the *_norcu() variant mostly to be able to keep rcu_idle_enter()
> > > call under the same local_irq_disable() section.
> > > 
> > > This way we can't have an interrupt in between that can needlessly perform
> > > RCU work (and trigger the softirq in the worst case), delaying the point
> > > where we actually put the CPU to sleep.
> > 
> > But we have to tolerate this sort of thing on some architectures (x86
> > and Power) in order to allow idle-task use of RCU read-side primitives,
> > right?
> > 
> > So consolidating from six to four APIs doesn't expand the overall state
> > space.
> 
> Well, we tolerate that, the two more APIs are there for optimization, not
> to provide correctness.
> But if you want me to remove the optimization and keep only the four APIs I can do it.

Probably best to start with the simpler API and expand it if performance
considerations suggest that this is approapriate.

							Thanx, Paul
diff mbox

Patch

diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
index f9261d0..4f83362 100644
--- a/arch/arm/kernel/process.c
+++ b/arch/arm/kernel/process.c
@@ -183,7 +183,7 @@  void cpu_idle(void)
 
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_idle_enter();
+		tick_nohz_idle_enter_norcu();
 		leds_event(led_idle_start);
 		while (!need_resched()) {
 #ifdef CONFIG_HOTPLUG_CPU
@@ -210,7 +210,7 @@  void cpu_idle(void)
 			}
 		}
 		leds_event(led_idle_end);
-		tick_nohz_idle_exit();
+		tick_nohz_idle_exit_norcu();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/avr32/kernel/process.c b/arch/avr32/kernel/process.c
index 6ee7952..34c8c70 100644
--- a/arch/avr32/kernel/process.c
+++ b/arch/avr32/kernel/process.c
@@ -34,10 +34,10 @@  void cpu_idle(void)
 {
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_idle_enter();
+		tick_nohz_idle_enter_norcu();
 		while (!need_resched())
 			cpu_idle_sleep();
-		tick_nohz_idle_exit();
+		tick_nohz_idle_exit_norcu();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/blackfin/kernel/process.c b/arch/blackfin/kernel/process.c
index 7b141b5..57e0749 100644
--- a/arch/blackfin/kernel/process.c
+++ b/arch/blackfin/kernel/process.c
@@ -88,10 +88,10 @@  void cpu_idle(void)
 #endif
 		if (!idle)
 			idle = default_idle;
-		tick_nohz_idle_enter();
+		tick_nohz_idle_enter_norcu();
 		while (!need_resched())
 			idle();
-		tick_nohz_idle_exit();
+		tick_nohz_idle_exit_norcu();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/microblaze/kernel/process.c b/arch/microblaze/kernel/process.c
index 6dc123e..c6ece38 100644
--- a/arch/microblaze/kernel/process.c
+++ b/arch/microblaze/kernel/process.c
@@ -103,10 +103,10 @@  void cpu_idle(void)
 		if (!idle)
 			idle = default_idle;
 
-		tick_nohz_idle_enter();
+		tick_nohz_idle_enter_norcu();
 		while (!need_resched())
 			idle();
-		tick_nohz_idle_exit();
+		tick_nohz_idle_exit_norcu();
 
 		preempt_enable_no_resched();
 		schedule();
diff --git a/arch/mips/kernel/process.c b/arch/mips/kernel/process.c
index d50a005..7df2ffc 100644
--- a/arch/mips/kernel/process.c
+++ b/arch/mips/kernel/process.c
@@ -56,7 +56,7 @@  void __noreturn cpu_idle(void)
 
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_idle_enter();
+		tick_nohz_idle_enter_norcu();
 		while (!need_resched() && cpu_online(cpu)) {
 #ifdef CONFIG_MIPS_MT_SMTC
 			extern void smtc_idle_loop_hook(void);
@@ -77,7 +77,7 @@  void __noreturn cpu_idle(void)
 		     system_state == SYSTEM_BOOTING))
 			play_dead();
 #endif
-		tick_nohz_idle_exit();
+		tick_nohz_idle_exit_norcu();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/openrisc/kernel/idle.c b/arch/openrisc/kernel/idle.c
index fb6a9bf..2e82cd0 100644
--- a/arch/openrisc/kernel/idle.c
+++ b/arch/openrisc/kernel/idle.c
@@ -51,7 +51,7 @@  void cpu_idle(void)
 
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_idle_enter();
+		tick_nohz_idle_enter_norcu();
 
 		while (!need_resched()) {
 			check_pgt_cache();
@@ -69,7 +69,7 @@  void cpu_idle(void)
 			set_thread_flag(TIF_POLLING_NRFLAG);
 		}
 
-		tick_nohz_idle_exit();
+		tick_nohz_idle_exit_norcu();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/powerpc/kernel/idle.c b/arch/powerpc/kernel/idle.c
index 878572f..2e782a3 100644
--- a/arch/powerpc/kernel/idle.c
+++ b/arch/powerpc/kernel/idle.c
@@ -56,7 +56,7 @@  void cpu_idle(void)
 
 	set_thread_flag(TIF_POLLING_NRFLAG);
 	while (1) {
-		tick_nohz_idle_enter();
+		tick_nohz_idle_enter_norcu();
 		while (!need_resched() && !cpu_should_die()) {
 			ppc64_runlatch_off();
 
@@ -93,7 +93,7 @@  void cpu_idle(void)
 
 		HMT_medium();
 		ppc64_runlatch_on();
-		tick_nohz_idle_exit();
+		tick_nohz_idle_exit_norcu();
 		preempt_enable_no_resched();
 		if (cpu_should_die())
 			cpu_die();
diff --git a/arch/powerpc/platforms/iseries/setup.c b/arch/powerpc/platforms/iseries/setup.c
index e2f5fad..77ff6eb 100644
--- a/arch/powerpc/platforms/iseries/setup.c
+++ b/arch/powerpc/platforms/iseries/setup.c
@@ -562,7 +562,7 @@  static void yield_shared_processor(void)
 static void iseries_shared_idle(void)
 {
 	while (1) {
-		tick_nohz_idle_enter();
+		tick_nohz_idle_enter_norcu();
 		while (!need_resched() && !hvlpevent_is_pending()) {
 			local_irq_disable();
 			ppc64_runlatch_off();
@@ -576,7 +576,7 @@  static void iseries_shared_idle(void)
 		}
 
 		ppc64_runlatch_on();
-		tick_nohz_idle_exit();
+		tick_nohz_idle_exit_norcu();
 
 		if (hvlpevent_is_pending())
 			process_iSeries_events();
@@ -592,7 +592,7 @@  static void iseries_dedicated_idle(void)
 	set_thread_flag(TIF_POLLING_NRFLAG);
 
 	while (1) {
-		tick_nohz_idle_enter();
+		tick_nohz_idle_enter_norcu();
 		if (!need_resched()) {
 			while (!need_resched()) {
 				ppc64_runlatch_off();
@@ -609,7 +609,7 @@  static void iseries_dedicated_idle(void)
 		}
 
 		ppc64_runlatch_on();
-		tick_nohz_idle_exit();
+		tick_nohz_idle_exit_norcu();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/s390/kernel/process.c b/arch/s390/kernel/process.c
index db3e930..44028ae 100644
--- a/arch/s390/kernel/process.c
+++ b/arch/s390/kernel/process.c
@@ -90,10 +90,10 @@  static void default_idle(void)
 void cpu_idle(void)
 {
 	for (;;) {
-		tick_nohz_idle_enter();
+		tick_nohz_idle_enter_norcu();
 		while (!need_resched())
 			default_idle();
-		tick_nohz_idle_exit();
+		tick_nohz_idle_exit_norcu();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/sh/kernel/idle.c b/arch/sh/kernel/idle.c
index 6015743..ad58e75 100644
--- a/arch/sh/kernel/idle.c
+++ b/arch/sh/kernel/idle.c
@@ -89,7 +89,7 @@  void cpu_idle(void)
 
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_idle_enter();
+		tick_nohz_idle_enter_norcu();
 
 		while (!need_resched()) {
 			check_pgt_cache();
@@ -111,7 +111,7 @@  void cpu_idle(void)
 			start_critical_timings();
 		}
 
-		tick_nohz_idle_exit();
+		tick_nohz_idle_exit_norcu();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/sparc/kernel/process_64.c b/arch/sparc/kernel/process_64.c
index 1235f63..78b1bc0 100644
--- a/arch/sparc/kernel/process_64.c
+++ b/arch/sparc/kernel/process_64.c
@@ -95,12 +95,12 @@  void cpu_idle(void)
 	set_thread_flag(TIF_POLLING_NRFLAG);
 
 	while(1) {
-		tick_nohz_idle_enter();
+		tick_nohz_idle_enter_norcu();
 
 		while (!need_resched() && !cpu_is_offline(cpu))
 			sparc64_yield(cpu);
 
-		tick_nohz_idle_exit();
+		tick_nohz_idle_exit_norcu();
 
 		preempt_enable_no_resched();
 
diff --git a/arch/tile/kernel/process.c b/arch/tile/kernel/process.c
index 920e674..53ac895 100644
--- a/arch/tile/kernel/process.c
+++ b/arch/tile/kernel/process.c
@@ -85,7 +85,7 @@  void cpu_idle(void)
 
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_idle_enter();
+		tick_nohz_idle_enter_norcu();
 		while (!need_resched()) {
 			if (cpu_is_offline(cpu))
 				BUG();  /* no HOTPLUG_CPU */
@@ -105,7 +105,7 @@  void cpu_idle(void)
 				local_irq_enable();
 			current_thread_info()->status |= TS_POLLING;
 		}
-		tick_nohz_idle_exit();
+		tick_nohz_idle_exit_norcu();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/um/kernel/process.c b/arch/um/kernel/process.c
index 41acf59..9e7176b 100644
--- a/arch/um/kernel/process.c
+++ b/arch/um/kernel/process.c
@@ -245,10 +245,10 @@  void default_idle(void)
 		if (need_resched())
 			schedule();
 
-		tick_nohz_idle_enter();
+		tick_nohz_idle_enter_norcu();
 		nsecs = disable_timer();
 		idle_sleep(nsecs);
-		tick_nohz_idle_exit();
+		tick_nohz_idle_exit_norcu();
 	}
 }
 
diff --git a/arch/unicore32/kernel/process.c b/arch/unicore32/kernel/process.c
index 9999b9a..095ff5a 100644
--- a/arch/unicore32/kernel/process.c
+++ b/arch/unicore32/kernel/process.c
@@ -55,7 +55,7 @@  void cpu_idle(void)
 {
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_idle_enter();
+		tick_nohz_idle_enter_norcu();
 		while (!need_resched()) {
 			local_irq_disable();
 			stop_critical_timings();
@@ -63,7 +63,7 @@  void cpu_idle(void)
 			local_irq_enable();
 			start_critical_timings();
 		}
-		tick_nohz_idle_exit();
+		tick_nohz_idle_exit_norcu();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index ad93205..f311d096 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -98,7 +98,7 @@  void cpu_idle(void)
 
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_idle_enter();
+		tick_nohz_idle_enter_norcu();
 		while (!need_resched()) {
 
 			check_pgt_cache();
@@ -114,7 +114,7 @@  void cpu_idle(void)
 				pm_idle();
 			start_critical_timings();
 		}
-		tick_nohz_idle_exit();
+		tick_nohz_idle_exit_norcu();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index 9ca714e..e72daf9 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -121,7 +121,7 @@  void cpu_idle(void)
 
 	/* endless idle loop with no priority at all */
 	while (1) {
-		tick_nohz_idle_enter();
+		tick_nohz_idle_enter_norcu();
 		while (!need_resched()) {
 
 			rmb();
@@ -147,7 +147,7 @@  void cpu_idle(void)
 			__exit_idle();
 		}
 
-		tick_nohz_idle_exit();
+		tick_nohz_idle_exit_norcu();
 		preempt_enable_no_resched();
 		schedule();
 		preempt_disable();
diff --git a/include/linux/tick.h b/include/linux/tick.h
index 0df1d50..327434a 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -7,6 +7,7 @@ 
 #define _LINUX_TICK_H
 
 #include <linux/clockchips.h>
+#include <linux/irqflags.h>
 
 #ifdef CONFIG_GENERIC_CLOCKEVENTS
 
@@ -121,18 +122,57 @@  static inline int tick_oneshot_mode_active(void) { return 0; }
 #endif /* !CONFIG_GENERIC_CLOCKEVENTS */
 
 # ifdef CONFIG_NO_HZ
-extern void tick_nohz_idle_enter(void);
+extern void __tick_nohz_idle_enter(void);
+static inline void tick_nohz_idle_enter(void)
+{
+	local_irq_disable();
+	__tick_nohz_idle_enter();
+	local_irq_enable();
+}
 extern void tick_nohz_idle_exit(void);
+
+/*
+ * Call this pair of function if the arch doesn't make any use
+ * of RCU in-between. You won't need to call rcu_idle_enter() and
+ * rcu_idle_exit().
+ * Otherwise you need to call tick_nohz_idle_enter() and tick_nohz_idle_exit()
+ * and explicitly tell RCU about the window around the place the CPU enters low
+ * power mode where no RCU use is made. This is done by calling rcu_idle_enter()
+ * after the last use of RCU before the CPU is put to sleep and by calling
+ * rcu_idle_exit() before the first use of RCU after the CPU woke up.
+ */
+static inline void tick_nohz_idle_enter_norcu(void)
+{
+	/*
+	 * Also call rcu_idle_enter() in the irq disabled section even
+	 * if it disables irq itself.
+	 * Just an optimization that prevents from an interrupt happening
+	 * between it and __tick_nohz_idle_enter() to lose time to help
+	 * completing a grace period while we could be in extended grace
+	 * period already.
+	 */
+	local_irq_disable();
+	__tick_nohz_idle_enter();
+	rcu_idle_enter();
+	local_irq_enable();
+}
+static inline void tick_nohz_idle_exit_norcu(void)
+{
+	rcu_idle_exit();
+	tick_nohz_idle_exit();
+}
 extern void tick_nohz_irq_exit(void);
 extern ktime_t tick_nohz_get_sleep_length(void);
 extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
 extern u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time);
 # else
-static inline void tick_nohz_idle_enter(void)
+static inline void tick_nohz_idle_enter(void) { }
+static inline void tick_nohz_idle_exit(void) { }
+static inline void tick_nohz_idle_enter_norcu(void)
 {
 	rcu_idle_enter();
 }
-static inline void tick_nohz_idle_exit(void)
+static inline void tick_nohz_idle_exit_norcu(void)
 {
 	rcu_idle_exit();
 }
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 52b7ace..360d028 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -424,18 +424,22 @@  out:
  *
  * When the next event is more than a tick into the future, stop the idle tick
  * Called when we start the idle loop.
- * This also enters into RCU extended quiescent state so that this CPU doesn't
- * need anymore to be part of any global grace period completion. This way
- * the tick can be stopped safely as we don't need to report quiescent states.
+ *
+ * If no use of RCU is made in the idle loop between
+ * tick_nohz_idle_enter() and tick_nohz_idle_exit() calls, then
+ * tick_nohz_idle_enter_norcu() should be called instead and the arch
+ * doesn't need to call rcu_idle_enter() and rcu_idle_exit() explicitly.
+ *
+ * Otherwise the arch is responsible of calling:
+ *
+ * - rcu_idle_enter() after its last use of RCU before the CPU is put
+ *  to sleep.
+ * - rcu_idle_exit() before the first use of RCU after the CPU is woken up.
  */
-void tick_nohz_idle_enter(void)
+void __tick_nohz_idle_enter(void)
 {
 	struct tick_sched *ts;
 
-	WARN_ON_ONCE(irqs_disabled());
-
-	local_irq_disable();
-
 	ts = &__get_cpu_var(tick_cpu_sched);
 	/*
 	 * set ts->inidle unconditionally. even if the system did not
@@ -444,9 +448,6 @@  void tick_nohz_idle_enter(void)
 	 */
 	ts->inidle = 1;
 	tick_nohz_stop_sched_tick(ts);
-	rcu_idle_enter();
-
-	local_irq_enable();
 }
 
 /**
@@ -522,7 +523,7 @@  void tick_nohz_idle_exit(void)
 	ktime_t now;
 
 	local_irq_disable();
-	rcu_idle_exit();
+
 	if (ts->idle_active || (ts->inidle && ts->tick_stopped))
 		now = ktime_get();