diff mbox

rcu: tree: correctly handle sparse possible CPUs

Message ID 1463417306-3826-1-git-send-email-mark.rutland@arm.com
State New

Commit Message

Mark Rutland May 16, 2016, 4:48 p.m. UTC
In many cases in the RCU tree code, we iterate over the set of CPUs for
a leaf node described by rcu_node::grplo and rcu_node::grphi, checking
per-cpu data for each CPU in this range. However, if the set of possible
CPUs is sparse, some CPUs described in this range are not possible, and
thus no per-cpu region will have been allocated (or initialised) for
them by the generic percpu code.

Erroneous accesses to a per-cpu area for these !possible CPUs may fault
or may hit other data, depending on the address generated when the
erroneous per-cpu offset is applied. In practice, both cases have been
observed on arm64 hardware (the former being silent, but detectable with
additional patches).

To avoid issues resulting from this, we must iterate over the set of
*possible* CPUs for a given leaf node. This patch adds new helpers to
enable this (also unifying and simplifying some related bitmask
manipulation logic), and moves the RCU tree code over to them.

Without this patch, running reboot at a shell can result in an oops
like:

[ 3369.075979] Unable to handle kernel paging request at virtual address ffffff8008b21b4c
[ 3369.083881] pgd = ffffffc3ecdda000
[ 3369.087270] [ffffff8008b21b4c] *pgd=00000083eca48003, *pud=00000083eca48003, *pmd=0000000000000000
[ 3369.096222] Internal error: Oops: 96000007 [#1] PREEMPT SMP
[ 3369.101781] Modules linked in:
[ 3369.104825] CPU: 2 PID: 1817 Comm: NetworkManager Tainted: G        W       4.6.0+ #3
[ 3369.121239] task: ffffffc0fa13e000 ti: ffffffc3eb940000 task.ti: ffffffc3eb940000
[ 3369.128708] PC is at sync_rcu_exp_select_cpus+0x188/0x510
[ 3369.134094] LR is at sync_rcu_exp_select_cpus+0x104/0x510
[ 3369.139479] pc : [<ffffff80081109a8>] lr : [<ffffff8008110924>] pstate: 200001c5
[ 3369.146860] sp : ffffffc3eb9435a0
[ 3369.150162] x29: ffffffc3eb9435a0 x28: ffffff8008be4f88
[ 3369.155465] x27: ffffff8008b66c80 x26: ffffffc3eceb2600
[ 3369.160767] x25: 0000000000000001 x24: ffffff8008be4f88
[ 3369.166070] x23: ffffff8008b51c3c x22: ffffff8008b66c80
[ 3369.171371] x21: 0000000000000001 x20: ffffff8008b21b40
[ 3369.176673] x19: ffffff8008b66c80 x18: 0000000000000000
[ 3369.181975] x17: 0000007fa951a010 x16: ffffff80086a30f0
[ 3369.187278] x15: 0000007fa9505590 x14: 0000000000000000
[ 3369.192580] x13: ffffff8008b51000 x12: ffffffc3eb940000
[ 3369.197882] x11: 0000000000000006 x10: ffffff8008b51b78
[ 3369.203184] x9 : 0000000000000001 x8 : ffffff8008be4000
[ 3369.208486] x7 : ffffff8008b21b40 x6 : 0000000000001003
[ 3369.213788] x5 : 0000000000000000 x4 : ffffff8008b27280
[ 3369.219090] x3 : ffffff8008b21b4c x2 : 0000000000000001
[ 3369.224406] x1 : 0000000000000001 x0 : 0000000000000140
...
[ 3369.972257] [<ffffff80081109a8>] sync_rcu_exp_select_cpus+0x188/0x510
[ 3369.978685] [<ffffff80081128b4>] synchronize_rcu_expedited+0x64/0xa8
[ 3369.985026] [<ffffff80086b987c>] synchronize_net+0x24/0x30
[ 3369.990499] [<ffffff80086ddb54>] dev_deactivate_many+0x28c/0x298
[ 3369.996493] [<ffffff80086b6bb8>] __dev_close_many+0x60/0xd0
[ 3370.002052] [<ffffff80086b6d48>] __dev_close+0x28/0x40
[ 3370.007178] [<ffffff80086bf62c>] __dev_change_flags+0x8c/0x158
[ 3370.012999] [<ffffff80086bf718>] dev_change_flags+0x20/0x60
[ 3370.018558] [<ffffff80086cf7f0>] do_setlink+0x288/0x918
[ 3370.023771] [<ffffff80086d0798>] rtnl_newlink+0x398/0x6a8
[ 3370.029158] [<ffffff80086cee84>] rtnetlink_rcv_msg+0xe4/0x220
[ 3370.034891] [<ffffff80086e274c>] netlink_rcv_skb+0xc4/0xf8
[ 3370.040364] [<ffffff80086ced8c>] rtnetlink_rcv+0x2c/0x40
[ 3370.045663] [<ffffff80086e1fe8>] netlink_unicast+0x160/0x238
[ 3370.051309] [<ffffff80086e24b8>] netlink_sendmsg+0x2f0/0x358
[ 3370.056956] [<ffffff80086a0070>] sock_sendmsg+0x18/0x30
[ 3370.062168] [<ffffff80086a21cc>] ___sys_sendmsg+0x26c/0x280
[ 3370.067728] [<ffffff80086a30ac>] __sys_sendmsg+0x44/0x88
[ 3370.073027] [<ffffff80086a3100>] SyS_sendmsg+0x10/0x20
[ 3370.078153] [<ffffff8008085e70>] el0_svc_naked+0x24/0x28

Signed-off-by: Mark Rutland <mark.rutland@arm.com>

Reported-by: Dennis Chen <dennis.chen@arm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-kernel@vger.kernel.org
---
 kernel/rcu/tree.c | 30 +++++++++++++-----------------
 kernel/rcu/tree.h | 18 ++++++++++++++++++
 2 files changed, 31 insertions(+), 17 deletions(-)

-- 
1.9.1

Comments

Paul E. McKenney May 16, 2016, 7:19 p.m. UTC | #1
On Mon, May 16, 2016 at 05:48:26PM +0100, Mark Rutland wrote:
> In many cases in the RCU tree code, we iterate over the set of CPUs for
> a leaf node described by rcu_node::grplo and rcu_node::grphi, checking
> per-cpu data for each CPU in this range. However, if the set of possible
> CPUs is sparse, some CPUs described in this range are not possible, and
> thus no per-cpu region will have been allocated (or initialised) for
> them by the generic percpu code.
> 
> Erroneous accesses to a per-cpu area for these !possible CPUs may fault
> or may hit other data, depending on the address generated when the
> erroneous per-cpu offset is applied. In practice, both cases have been
> observed on arm64 hardware (the former being silent, but detectable with
> additional patches).
> 
> To avoid issues resulting from this, we must iterate over the set of
> *possible* CPUs for a given leaf node. This patch adds new helpers to
> enable this (also unifying and simplifying some related bitmask
> manipulation logic), and moves the RCU tree code over to them.
> 
> Without this patch, running reboot at a shell can result in an oops
> like:
> 
> [ 3369.075979] Unable to handle kernel paging request at virtual address ffffff8008b21b4c
> [...]


Nice fix and simplification!

Could you please forward-port this to branch rcu/dev of -rcu?

	git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git

							Thanx, Paul


> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> [...]
Andrey Ryabinin May 18, 2016, 3:15 p.m. UTC | #2
2016-05-16 19:48 GMT+03:00 Mark Rutland <mark.rutland@arm.com>:

>  /*
> + * Iterate over all possible CPUs in a leaf RCU node.
> + */
> +#define for_each_leaf_node_possible_cpu(rnp, cpu) \
> +       for ((cpu) = rnp->grplo; \
> +            cpu <= rnp->grphi; \
> +            cpu = cpumask_next((cpu), cpu_possible_mask))
> +
> +/*
> + * Iterate over all possible CPUs in a leaf RCU node, at each step providing a
> + * bit for comparison against rcu_node bitmasks.
> + */
> +#define for_each_leaf_node_possible_cpu_bit(rnp, cpu, bit) \
> +       for ((cpu) = rnp->grplo, (bit) = 1; \
> +            cpu <= rnp->grphi; \
> +            cpu = cpumask_next((cpu), cpu_possible_mask), \
> +                  (bit) = 1UL << (cpu - rnp->grplo))
> +

[    0.163652] UBSAN: Undefined behaviour in ../kernel/rcu/tree.c:2912:3
[    0.164000] shift exponent 64 is too large for 64-bit type 'long unsigned int'
Paul E. McKenney May 18, 2016, 6:01 p.m. UTC | #3
On Wed, May 18, 2016 at 06:15:23PM +0300, Andrey Ryabinin wrote:
> 2016-05-16 19:48 GMT+03:00 Mark Rutland <mark.rutland@arm.com>:
> 
> > [...]
> > +#define for_each_leaf_node_possible_cpu_bit(rnp, cpu, bit) \
> > +       for ((cpu) = rnp->grplo, (bit) = 1; \
> > +            cpu <= rnp->grphi; \
> > +            cpu = cpumask_next((cpu), cpu_possible_mask), \
> > +                  (bit) = 1UL << (cpu - rnp->grplo))
> > +
> 
> [    0.163652] UBSAN: Undefined behaviour in ../kernel/rcu/tree.c:2912:3
> [    0.164000] shift exponent 64 is too large for 64-bit type 'long unsigned int'

Ah, dead value, but can happen nevertheless.  One fix is to prevent the
assignment to bit when cpu > rnp->grphi.

Any ideas for a better fix?  And isn't there some combination of
signedness that makes shifting all the bits out of the value defined
to zero?  Or is that only for right shifts?

							Thanx, Paul
Mark Rutland May 18, 2016, 6:30 p.m. UTC | #4
On Wed, May 18, 2016 at 11:01:53AM -0700, Paul E. McKenney wrote:
> On Wed, May 18, 2016 at 06:15:23PM +0300, Andrey Ryabinin wrote:
> > [...]
> > [    0.163652] UBSAN: Undefined behaviour in ../kernel/rcu/tree.c:2912:3
> > [    0.164000] shift exponent 64 is too large for 64-bit type 'long unsigned int'
> 
> Ah, dead value, but can happen nevertheless.  One fix is to prevent the
> assignment to bit when cpu > rnp->grphi.
> 
> Any ideas for a better fix?  And isn't there some combination of
> signedness that makes shifting all the bits out of the value defined
> to zero?  Or is that only for right shifts?

We could add a (leaf/rcu)_node_cpu_mask(rnp, cpu) macro, and only use that in
the body of the loop. That would avoid the stale value and would be useful in a
couple of additional places.

If that makes sense to you, I can respin the patch with that.

Thanks,
Mark.
Paul E. McKenney May 18, 2016, 6:44 p.m. UTC | #5
On Wed, May 18, 2016 at 07:30:41PM +0100, Mark Rutland wrote:
> On Wed, May 18, 2016 at 11:01:53AM -0700, Paul E. McKenney wrote:
> > [...]
> > Any ideas for a better fix?  And isn't there some combination of
> > signedness that makes shifting all the bits out of the value defined
> > to zero?  Or is that only for right shifts?
> 
> We could add a (leaf/rcu)_node_cpu_mask(rnp, cpu) macro, and only use that in
> the body of the loop. That would avoid the stale value and would be useful in a
> couple of additional places.
> 
> If that makes sense to you, I can respin the patch with that.

Please try it and then let's see what it looks like.

							Thanx, Paul

Patch

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 9a535a8..2923df3 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1235,15 +1235,16 @@  static void rcu_check_gp_kthread_starvation(struct rcu_state *rsp)
 static void rcu_dump_cpu_stacks(struct rcu_state *rsp)
 {
 	int cpu;
+	unsigned long bit;
 	unsigned long flags;
 	struct rcu_node *rnp;
 
 	rcu_for_each_leaf_node(rsp, rnp) {
 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 		if (rnp->qsmask != 0) {
-			for (cpu = 0; cpu <= rnp->grphi - rnp->grplo; cpu++)
-				if (rnp->qsmask & (1UL << cpu))
-					dump_cpu_task(rnp->grplo + cpu);
+			for_each_leaf_node_possible_cpu_bit(rnp, cpu, bit)
+				if (rnp->qsmask & bit)
+					dump_cpu_task(cpu);
 		}
 		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 	}
@@ -1252,6 +1253,7 @@  static void rcu_dump_cpu_stacks(struct rcu_state *rsp)
 static void print_other_cpu_stall(struct rcu_state *rsp, unsigned long gpnum)
 {
 	int cpu;
+	unsigned long bit;
 	long delta;
 	unsigned long flags;
 	unsigned long gpa;
@@ -1284,10 +1286,9 @@  static void print_other_cpu_stall(struct rcu_state *rsp, unsigned long gpnum)
 		raw_spin_lock_irqsave_rcu_node(rnp, flags);
 		ndetected += rcu_print_task_stall(rnp);
 		if (rnp->qsmask != 0) {
-			for (cpu = 0; cpu <= rnp->grphi - rnp->grplo; cpu++)
-				if (rnp->qsmask & (1UL << cpu)) {
-					print_cpu_stall_info(rsp,
-							     rnp->grplo + cpu);
+			for_each_leaf_node_possible_cpu_bit(rnp, cpu, bit)
+				if (rnp->qsmask & bit) {
+					print_cpu_stall_info(rsp, cpu);
 					ndetected++;
 				}
 		}
@@ -2823,9 +2824,7 @@  static void force_qs_rnp(struct rcu_state *rsp,
 				continue;
 			}
 		}
-		cpu = rnp->grplo;
-		bit = 1;
-		for (; cpu <= rnp->grphi; cpu++, bit <<= 1) {
+		for_each_leaf_node_possible_cpu_bit(rnp, cpu, bit) {
 			if ((rnp->qsmask & bit) != 0) {
 				if (f(per_cpu_ptr(rsp->rda, cpu), isidle, maxj))
 					mask |= bit;
@@ -3690,7 +3689,7 @@  static void sync_rcu_exp_select_cpus(struct rcu_state *rsp,
 
 		/* Each pass checks a CPU for identity, offline, and idle. */
 		mask_ofl_test = 0;
-		for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++) {
+		for_each_leaf_node_possible_cpu(rnp, cpu) {
 			struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu);
 			struct rcu_dynticks *rdtp = &per_cpu(rcu_dynticks, cpu);
 
@@ -3710,8 +3709,7 @@  static void sync_rcu_exp_select_cpus(struct rcu_state *rsp,
 		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 
 		/* IPI the remaining CPUs for expedited quiescent state. */
-		mask = 1;
-		for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++, mask <<= 1) {
+		for_each_leaf_node_possible_cpu_bit(rnp, cpu, mask) {
 			if (!(mask_ofl_ipi & mask))
 				continue;
 retry_ipi:
@@ -3774,8 +3772,7 @@  static void synchronize_sched_expedited_wait(struct rcu_state *rsp)
 		ndetected = 0;
 		rcu_for_each_leaf_node(rsp, rnp) {
 			ndetected = rcu_print_task_exp_stall(rnp);
-			mask = 1;
-			for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++, mask <<= 1) {
+			for_each_leaf_node_possible_cpu_bit(rnp, cpu, mask) {
 				struct rcu_data *rdp;
 
 				if (!(rnp->expmask & mask))
@@ -3807,8 +3804,7 @@  static void synchronize_sched_expedited_wait(struct rcu_state *rsp)
 			pr_cont("\n");
 		}
 		rcu_for_each_leaf_node(rsp, rnp) {
-			mask = 1;
-			for (cpu = rnp->grplo; cpu <= rnp->grphi; cpu++, mask <<= 1) {
+			for_each_leaf_node_possible_cpu_bit(rnp, cpu, mask) {
 				if (!(rnp->expmask & mask))
 					continue;
 				dump_cpu_task(cpu);
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index df668c0..d823b8e 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -283,6 +283,24 @@  struct rcu_node {
 	     (rnp) < &(rsp)->node[rcu_num_nodes]; (rnp)++)
 
 /*
+ * Iterate over all possible CPUs in a leaf RCU node.
+ */
+#define for_each_leaf_node_possible_cpu(rnp, cpu) \
+	for ((cpu) = rnp->grplo; \
+	     cpu <= rnp->grphi; \
+	     cpu = cpumask_next((cpu), cpu_possible_mask))
+
+/*
+ * Iterate over all possible CPUs in a leaf RCU node, at each step providing a
+ * bit for comparison against rcu_node bitmasks.
+ */
+#define for_each_leaf_node_possible_cpu_bit(rnp, cpu, bit) \
+	for ((cpu) = rnp->grplo, (bit) = 1; \
+	     cpu <= rnp->grphi; \
+	     cpu = cpumask_next((cpu), cpu_possible_mask), \
+		   (bit) = 1UL << (cpu - rnp->grplo))
+
+/*
  * Union to allow "aggregate OR" operation on the need for a quiescent
  * state by the normal and expedited grace periods.
  */