[RFC,tip/core/rcu,17/41] rcu: Remove single-rcu_node optimization in rcu_start_gp()

Message ID 1328125319-5205-17-git-send-email-paulmck@linux.vnet.ibm.com
State Superseded

Commit Message

Paul E. McKenney Feb. 1, 2012, 7:41 p.m. UTC
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>

The grace-period initialization sequence in rcu_start_gp() has a special
case for systems where the rcu_node tree is a single rcu_node structure.
This made sense some years ago when systems were smaller and up to 64
CPUs could share a single rcu_node structure, but now that large systems
are common and a given leaf rcu_node structure can support only 16 CPUs
(due to lock contention on the rcu_node's ->lock field), this optimization
is almost never taken.  And even the small mobile platforms that might
make use of it might rather have the kernel text reduction.

Therefore, this commit removes the check for single-rcu_node trees.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 kernel/rcutree.c |   18 ------------------
 1 files changed, 0 insertions(+), 18 deletions(-)
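
The commit message's arithmetic can be made concrete with a small
user-space sketch. It counts the rcu_node structures needed for a given
CPU count and leaf fanout. The function name, its parameters, and the
interior fanout of 64 used in the usage note are assumptions made for
illustration; this is not the kernel's actual geometry code.

```c
#include <assert.h>

/* Ceiling division: smallest q such that q * d >= n. */
static unsigned int sketch_div_round_up(unsigned int n, unsigned int d)
{
	return (n + d - 1) / d;
}

/*
 * Hypothetical helper: count the nodes in a tree in which each leaf
 * covers up to leaf_fanout CPUs and each interior node covers up to
 * inner_fanout children.  Both fanouts must be at least 2.
 */
unsigned int sketch_num_nodes(unsigned int nr_cpus,
			      unsigned int leaf_fanout,
			      unsigned int inner_fanout)
{
	unsigned int level;
	unsigned int total;

	assert(leaf_fanout > 1 && inner_fanout > 1);

	level = sketch_div_round_up(nr_cpus, leaf_fanout); /* leaf count */
	total = level;
	while (level > 1) {
		/* Add the next level up until a single root remains. */
		level = sketch_div_round_up(level, inner_fanout);
		total += level;
	}
	return total;
}
```

Under these assumptions, 64 CPUs with a leaf fanout of 64 fit in one
node, whereas a leaf fanout of 16 gives four leaves plus a root, so the
single-node special case applies only to systems with at most 16 CPUs.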

Comments

Josh Triplett Feb. 2, 2012, 2:13 a.m. UTC | #1
On Wed, Feb 01, 2012 at 11:41:35AM -0800, Paul E. McKenney wrote:
> The grace-period initialization sequence in rcu_start_gp() has a special
> case for systems where the rcu_node tree is a single rcu_node structure.
> This made sense some years ago when systems were smaller and up to 64
> CPUs could share a single rcu_node structure, but now that large systems
> are common and a given leaf rcu_node structure can support only 16 CPUs
> (due to lock contention on the rcu_node's ->lock field), this optimization
> is almost never taken.  And even the small mobile platforms that might
> make use of it might rather have the kernel text reduction.
> 
> Therefore, this commit removes the check for single-rcu_node trees.

This optimization would continue to work on laptops for a while longer.
:)

That said, I do agree that reducing code size and complexity seems
preferable.  If someone wants an optimization like this, they'd probably
do better to compile RCU with a low compile-time limit on the number of
CPUs, which would at least theoretically allow the compiler to get
similar results through optimization.  (I don't know if that works in
practice with the current code structure and the current intelligence of
GCC.)

Reviewed-by: Josh Triplett <josh@joshtriplett.org>
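
The compile-time-limit idea above can be sketched in user-space C as
follows. All SKETCH_* names and struct sketch_node are hypothetical and
only loosely modeled on the fields touched by the removed block; the
node-count macro assumes at most two tree levels. Whether GCC actually
produces code comparable to the hand-written special case is not
verified here.

```c
#include <assert.h>

/* Hypothetical build-time CPU limit, overridable with -DSKETCH_NR_CPUS=n. */
#ifndef SKETCH_NR_CPUS
#define SKETCH_NR_CPUS 8
#endif
#define SKETCH_LEAF_FANOUT 16

/* Leaves needed, plus one root when there is more than one leaf. */
#define SKETCH_NUM_LEAVES \
	((SKETCH_NR_CPUS + SKETCH_LEAF_FANOUT - 1) / SKETCH_LEAF_FANOUT)
#define SKETCH_NUM_NODES \
	(SKETCH_NUM_LEAVES == 1 ? 1 : SKETCH_NUM_LEAVES + 1)

static_assert(SKETCH_NR_CPUS >= 1, "need at least one CPU");

struct sketch_node {
	unsigned long qsmask;     /* CPUs still owing a quiescent state */
	unsigned long qsmaskinit; /* per-grace-period initial qsmask */
	unsigned long gpnum;      /* current grace-period number */
};

/*
 * General path only: visit every node.  The trip count is a
 * compile-time constant, so when SKETCH_NUM_NODES is 1 an optimizing
 * compiler is free to reduce the loop to straight-line code.
 */
void sketch_gp_init(struct sketch_node *nodes, unsigned long gpnum)
{
	int i;

	for (i = 0; i < SKETCH_NUM_NODES; i++) {
		nodes[i].qsmask = nodes[i].qsmaskinit;
		nodes[i].gpnum = gpnum;
	}
}
```

With the default limit of 8 CPUs, SKETCH_NUM_NODES evaluates to 1, so
the source carries no explicit single-node branch and any special-casing
is left to the optimizer.
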
Paul E. McKenney Feb. 2, 2012, 5:16 p.m. UTC | #2
On Wed, Feb 01, 2012 at 06:13:14PM -0800, Josh Triplett wrote:
> On Wed, Feb 01, 2012 at 11:41:35AM -0800, Paul E. McKenney wrote:
> > The grace-period initialization sequence in rcu_start_gp() has a special
> > case for systems where the rcu_node tree is a single rcu_node structure.
> > This made sense some years ago when systems were smaller and up to 64
> > CPUs could share a single rcu_node structure, but now that large systems
> > are common and a given leaf rcu_node structure can support only 16 CPUs
> > (due to lock contention on the rcu_node's ->lock field), this optimization
> > is almost never taken.  And even the small mobile platforms that might
> > make use of it might rather have the kernel text reduction.
> > 
> > Therefore, this commit removes the check for single-rcu_node trees.
> 
> This optimization would continue to work on laptops for a while longer.
> :)

How many more months?  ;-)

> That said, I do agree that reducing code size and complexity seems
> preferable.  If someone wants an optimization like this, they'd probably
> do better to compile RCU with a low compile-time limit on the number of
> CPUs, which would at least theoretically allow the compiler to get
> similar results through optimization.  (I don't know if that works in
> practice with the current code structure and the current intelligence of
> GCC.)
> 
> Reviewed-by: Josh Triplett <josh@joshtriplett.org>

Thank you for all your reviews -- as always, very helpful!!!

							Thanx, Paul

Patch

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index ee2009d..38d143b 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -984,26 +984,8 @@  rcu_start_gp(struct rcu_state *rsp, unsigned long flags)
 	rsp->fqs_state = RCU_GP_INIT; /* Hold off force_quiescent_state. */
 	rsp->jiffies_force_qs = jiffies + RCU_JIFFIES_TILL_FORCE_QS;
 	record_gp_stall_check_time(rsp);
-
-	/* Special-case the common single-level case. */
-	if (NUM_RCU_NODES == 1) {
-		rcu_preempt_check_blocked_tasks(rnp);
-		rnp->qsmask = rnp->qsmaskinit;
-		rnp->gpnum = rsp->gpnum;
-		rnp->completed = rsp->completed;
-		rsp->fqs_state = RCU_SIGNAL_INIT; /* force_quiescent_state OK */
-		rcu_start_gp_per_cpu(rsp, rnp, rdp);
-		rcu_preempt_boost_start_gp(rnp);
-		trace_rcu_grace_period_init(rsp->name, rnp->gpnum,
-					    rnp->level, rnp->grplo,
-					    rnp->grphi, rnp->qsmask);
-		raw_spin_unlock_irqrestore(&rnp->lock, flags);
-		return;
-	}
-
 	raw_spin_unlock(&rnp->lock);  /* leave irqs disabled. */
 
-
 	/* Exclude any concurrent CPU-hotplug operations. */
 	raw_spin_lock(&rsp->onofflock);  /* irqs already disabled. */