
[RT,5/8] sched: Fix affine_move_task() self-concurrency

Message ID 20210709220018.003428207@goodmis.org
State: New
Series: Linux 5.10.47-rt46-rc1

Commit Message

Steven Rostedt July 9, 2021, 9:59 p.m. UTC
5.10.47-rt46-rc1 stable review patch.
If anyone has any objections, please let me know.

------------------

From: Peter Zijlstra <peterz@infradead.org>

commit 9e81889c7648d48dd5fe13f41cbc99f3c362484a upstream.

Consider:

   sched_setaffinity(p, X);		sched_setaffinity(p, Y);

Then the first will install p->migration_pending = &my_pending; and
issue stop_one_cpu_nowait(pending); and the second one will read
p->migration_pending and _also_ issue: stop_one_cpu_nowait(pending),
the _SAME_ @pending.

This causes stopper list corruption.

Add set_affinity_pending::stop_pending, to indicate if a stopper is in
progress.

Fixes: 6d337eab041d ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
Cc: stable@kernel.org
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210224131355.649146419@infradead.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
---
 kernel/sched/core.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

Comments

Pavel Machek July 25, 2021, 5:03 a.m. UTC | #1
Hi!

> 5.10.47-rt46-rc1 stable review patch.
> If anyone has any objections, please let me know.
>
> Add set_affinity_pending::stop_pending, to indicate if a stopper is in
> progress.

> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 9cbe12d8c5bd..20588a59300d 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1900,6 +1900,7 @@ struct migration_arg {
>  
>  struct set_affinity_pending {
>  	refcount_t		refs;
> +	unsigned int		stop_pending;
>  	struct completion	done;
>  	struct cpu_stop_work	stop_work;
>  	struct migration_arg	arg;

For better readability, this should be bool, AFAICT.

>  		 * and have the stopper function handle it all race-free.
>  		 */
> +		stop_pending = pending->stop_pending;
> +		if (!stop_pending)
> +			pending->stop_pending = true;
>  

...because it is used as bool.

									Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Valentin Schneider July 26, 2021, 1:39 p.m. UTC | #2
On 25/07/21 07:03, Pavel Machek wrote:
> Hi!
>
>> 5.10.47-rt46-rc1 stable review patch.
>> If anyone has any objections, please let me know.
>>
>> Add set_affinity_pending::stop_pending, to indicate if a stopper is in
>> progress.

>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 9cbe12d8c5bd..20588a59300d 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -1900,6 +1900,7 @@ struct migration_arg {
>>
>>  struct set_affinity_pending {
>>      refcount_t		refs;
>> +	unsigned int		stop_pending;
>>      struct completion	done;
>>      struct cpu_stop_work	stop_work;
>>      struct migration_arg	arg;
>
> For better readability, this should be bool, AFAICT.


It's intentionally declared as an int. sizeof(_Bool) is implementation-defined,
so you can't sanely reason about struct layout.

There have been quite a few threads about this already; a quick search on lore
gave me:

https://lore.kernel.org/lkml/20180411081502.GJ4082@hirez.programming.kicks-ass.net/
Paul Gortmaker July 26, 2021, 4:07 p.m. UTC | #3
[Re: [PATCH RT 5/8] sched: Fix affine_move_task() self-concurrency] On 25/07/2021 (Sun 07:03) Pavel Machek wrote:

> Hi!
> 
> > 5.10.47-rt46-rc1 stable review patch.
> > If anyone has any objections, please let me know.
> > 
> > Add set_affinity_pending::stop_pending, to indicate if a stopper is in
> > progress.

> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 9cbe12d8c5bd..20588a59300d 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -1900,6 +1900,7 @@ struct migration_arg {
> >  
> >  struct set_affinity_pending {
> >  	refcount_t		refs;
> > +	unsigned int		stop_pending;
> >  	struct completion	done;
> >  	struct cpu_stop_work	stop_work;
> >  	struct migration_arg	arg;
> 
> For better readability, this should be bool, AFAICT.


Maybe you missed it in the context you deleted, but this is a mainline
backport to stable-rt, and hence is not the time or place to be
injecting stylistic comments.  Just like gregKH's stable tree, backports
are kept as "faithful" to the original as possible unless the older
surrounding code base forces some kind of alteration out of necessity.

Thanks,
Paul.
--

> 
> >  		 * and have the stopper function handle it all race-free.
> >  		 */
> > +		stop_pending = pending->stop_pending;
> > +		if (!stop_pending)
> > +			pending->stop_pending = true;
> >  
> 
> ...because it is used as bool.
> 
> 									Pavel
> 
> -- 
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Patch

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 9cbe12d8c5bd..20588a59300d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1900,6 +1900,7 @@ struct migration_arg {
 
 struct set_affinity_pending {
 	refcount_t		refs;
+	unsigned int		stop_pending;
 	struct completion	done;
 	struct cpu_stop_work	stop_work;
 	struct migration_arg	arg;
@@ -2018,12 +2019,15 @@ static int migration_cpu_stop(void *data)
 		 * determine is_migration_disabled() and so have to chase after
 		 * it.
 		 */
+		WARN_ON_ONCE(!pending->stop_pending);
 		task_rq_unlock(rq, p, &rf);
 		stop_one_cpu_nowait(task_cpu(p), migration_cpu_stop,
 				    &pending->arg, &pending->stop_work);
 		return 0;
 	}
 out:
+	if (pending)
+		pending->stop_pending = false;
 	task_rq_unlock(rq, p, &rf);
 
 	if (complete)
@@ -2219,7 +2223,7 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flag
 			    int dest_cpu, unsigned int flags)
 {
 	struct set_affinity_pending my_pending = { }, *pending = NULL;
-	bool complete = false;
+	bool stop_pending, complete = false;
 
 	/* Can the task run on the task's current CPU? If so, we're done */
 	if (cpumask_test_cpu(task_cpu(p), &p->cpus_mask)) {
@@ -2292,14 +2296,19 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flag
 		 * anything else we cannot do is_migration_disabled(), punt
 		 * and have the stopper function handle it all race-free.
 		 */
+		stop_pending = pending->stop_pending;
+		if (!stop_pending)
+			pending->stop_pending = true;
 
 		refcount_inc(&pending->refs); /* pending->{arg,stop_work} */
 		if (flags & SCA_MIGRATE_ENABLE)
 			p->migration_flags &= ~MDF_PUSH;
 		task_rq_unlock(rq, p, rf);
 
-		stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
-				    &pending->arg, &pending->stop_work);
+		if (!stop_pending) {
+			stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
+					    &pending->arg, &pending->stop_work);
+		}
 
 		if (flags & SCA_MIGRATE_ENABLE)
 			return 0;