diff mbox series

[v2,2/4] sched: Introduce is_pcpu_safe()

Message ID 20210807005807.1083943-3-valentin.schneider@arm.com
State New
Headers show
Series rcu, arm64: PREEMPT_RT fixlets | expand

Commit Message

Valentin Schneider Aug. 7, 2021, 12:58 a.m. UTC
Some areas use preempt_disable() + preempt_enable() to safely access
per-CPU data. The PREEMPT_RT folks have shown this can also be done by
keeping preemption enabled and instead disabling migration (and acquiring a
sleepable lock, if relevant).

Introduce a helper which checks whether the current task can safely access
per-CPU data, IOW if the task's context guarantees the accesses will target
a single CPU. This accounts for preemption, CPU affinity, and migrate
disable - note that the CPU affinity check also mandates the presence of
PF_NO_SETAFFINITY, as otherwise userspace could concurrently render the
upcoming per-CPU access(es) unsafe.

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
 include/linux/sched.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

Comments

Mike Galbraith Aug. 7, 2021, 1:42 a.m. UTC | #1
On Sat, 2021-08-07 at 01:58 +0100, Valentin Schneider wrote:
>
> +static inline bool is_pcpu_safe(void)

Nit: seems odd to avoid spelling it out to save two characters, percpu
is word like, rolls off the ole tongue better than p-c-p-u.

	-Mike
Valentin Schneider Aug. 8, 2021, 4:15 p.m. UTC | #2
On 07/08/21 03:42, Mike Galbraith wrote:
> On Sat, 2021-08-07 at 01:58 +0100, Valentin Schneider wrote:

>>

>> +static inline bool is_pcpu_safe(void)

>

> Nit: seems odd to avoid spelling it out to save two characters, percpu

> is word like, rolls off the ole tongue better than p-c-p-u.

>

>       -Mike


True. A quick grep says both versions are used, though "percpu" wins by
about a factor of 2. I'll tweak that for a v3.
Boqun Feng Aug. 10, 2021, 2:42 a.m. UTC | #3
Hi,

On Sat, Aug 07, 2021 at 01:58:05AM +0100, Valentin Schneider wrote:
> Some areas use preempt_disable() + preempt_enable() to safely access

> per-CPU data. The PREEMPT_RT folks have shown this can also be done by

> keeping preemption enabled and instead disabling migration (and acquiring a

> sleepable lock, if relevant).

> 

> Introduce a helper which checks whether the current task can safely access

> per-CPU data, IOW if the task's context guarantees the accesses will target

> a single CPU. This accounts for preemption, CPU affinity, and migrate

> disable - note that the CPU affinity check also mandates the presence of

> PF_NO_SETAFFINITY, as otherwise userspace could concurrently render the

> upcoming per-CPU access(es) unsafe.

> 

> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>

> ---

>  include/linux/sched.h | 10 ++++++++++

>  1 file changed, 10 insertions(+)

> 

> diff --git a/include/linux/sched.h b/include/linux/sched.h

> index debc960f41e3..b77d65f677f6 100644

> --- a/include/linux/sched.h

> +++ b/include/linux/sched.h

> @@ -1715,6 +1715,16 @@ static inline bool is_percpu_thread(void)

>  #endif

>  }

>  

> +/* Is the current task guaranteed not to be migrated elsewhere? */

> +static inline bool is_pcpu_safe(void)

> +{

> +#ifdef CONFIG_SMP

> +	return !preemptible() || is_percpu_thread() || current->migration_disabled;

> +#else

> +	return true;

> +#endif

> +}


I wonder whether the following can happen, say thread A is a worker
thread for CPU 1, so it has the flag PF_NO_SETAFFINITY set.

	{ percpu variable X on CPU 2 is initially 0 }

	thread A
	========

	<preemption enabled>
	if (is_pcpu_safe()) { // nr_cpus_allowed == 1, so return true.
		<preempted>
		<hot unplug CPU 1>
			unbinder_workers(1); // A->cpus_mask becomes cpu_possible_mask
		<back to run on CPU 2>
		__this_cpu_inc(X);
		  tmp = X; // tmp == 0
		  <preempted>
		  <in thread B>
		  this_cpu_inc(X); // X becomes 1
		  <back to run A on CPU 2>
		  X = tmp + 1; // race!
	}

if so, then is_percpu_thread() doesn't indicate is_pcpu_safe()?

Regards,
Boqun

> +

>  /* Per-process atomic flags. */

>  #define PFA_NO_NEW_PRIVS		0	/* May not gain new privileges. */

>  #define PFA_SPREAD_PAGE			1	/* Spread page cache over cpuset */

> -- 

> 2.25.1

>
Valentin Schneider Aug. 10, 2021, 9:26 a.m. UTC | #4
Hi,

On 10/08/21 10:42, Boqun Feng wrote:
> Hi,

>

> On Sat, Aug 07, 2021 at 01:58:05AM +0100, Valentin Schneider wrote:

>> Some areas use preempt_disable() + preempt_enable() to safely access

>> per-CPU data. The PREEMPT_RT folks have shown this can also be done by

>> keeping preemption enabled and instead disabling migration (and acquiring a

>> sleepable lock, if relevant).

>> 

>> Introduce a helper which checks whether the current task can safely access

>> per-CPU data, IOW if the task's context guarantees the accesses will target

>> a single CPU. This accounts for preemption, CPU affinity, and migrate

>> disable - note that the CPU affinity check also mandates the presence of

>> PF_NO_SETAFFINITY, as otherwise userspace could concurrently render the

>> upcoming per-CPU access(es) unsafe.

>> 

>> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>

>> ---

>>  include/linux/sched.h | 10 ++++++++++

>>  1 file changed, 10 insertions(+)

>> 

>> diff --git a/include/linux/sched.h b/include/linux/sched.h

>> index debc960f41e3..b77d65f677f6 100644

>> --- a/include/linux/sched.h

>> +++ b/include/linux/sched.h

>> @@ -1715,6 +1715,16 @@ static inline bool is_percpu_thread(void)

>>  #endif

>>  }

>>  

>> +/* Is the current task guaranteed not to be migrated elsewhere? */

>> +static inline bool is_pcpu_safe(void)

>> +{

>> +#ifdef CONFIG_SMP

>> +	return !preemptible() || is_percpu_thread() || current->migration_disabled;

>> +#else

>> +	return true;

>> +#endif

>> +}

>

> I wonder whether the following can happen, say thread A is a worker

> thread for CPU 1, so it has the flag PF_NO_SETAFFINITY set.

>

> 	{ percpu variable X on CPU 2 is initially 0 }

>

> 	thread A

> 	========

>

> 	<preemption enabled>

> 	if (is_pcpu_safe()) { // nr_cpus_allowed == 1, so return true.

> 		<preempted>

> 		<hot unplug CPU 1>

> 			unbinder_workers(1); // A->cpus_mask becomes cpu_possible_mask

> 		<back to run on CPU 2>

> 		__this_cpu_inc(X);

> 		  tmp = X; // tmp == 0

> 		  <preempted>

> 		  <in thread B>

> 		  this_cpu_inc(X); // X becomes 1

> 		  <back to run A on CPU 2>

> 		  X = tmp + 1; // race!

> 	}

>

> if so, then is_percpu_thread() doesn't indicate is_pcpu_safe()?

>


You're absolutely right.

migrate_disable() protects the thread against being migrated due to
hotplug, but pure CPU affinity doesn't at all. kthread_is_per_cpu() doesn't
work either, because parking is not the only approach to hotplug for those
(e.g. per-CPU workqueue threads unbind themselves on hotplug, as in your
example).

One could hold cpus_read_lock(), but I don't see much point here. So that
has to be 

  return !preemptible() || current->migration_disabled;

Thanks!
Boqun Feng Aug. 10, 2021, 12:49 p.m. UTC | #5
On Sun, Aug 08, 2021 at 05:15:20PM +0100, Valentin Schneider wrote:
> On 07/08/21 03:42, Mike Galbraith wrote:

> > On Sat, 2021-08-07 at 01:58 +0100, Valentin Schneider wrote:

> >>

> >> +static inline bool is_pcpu_safe(void)

> >

> > Nit: seems odd to avoid spelling it out to save two characters, percpu

> > is word like, rolls off the ole tongue better than p-c-p-u.

> >

> >       -Mike

> 

> True. A quick grep says both versions are used, though "percpu" wins by

> about a factor of 2. I'll tweak that for a v3.


I wonder why is_percpu_safe() is the correct name. The safety of
accesses to percpu variables means two things to me:

a)	The thread cannot migrate to other CPU in the middle of
	accessing a percpu variable, in other words, the following
	cannot happen:

	{ percpu variable X is 0 on CPU 0 and 2 on CPU 1
	CPU 0				CPU 1
	========			=========
	<in thread A>
	__this_cpu_inc(X);
	  tmp = X; // tmp is 0
	  <preempted>
	  <migrate to CPU 1>
	  				// continue __this_cpu_inc(X);
					X = tmp + 1; // CPU 0 miss this
						     // increment (this
						     // may be OK), and
						     // CPU 1's X got
						     // corrupted.

b)	The accesses to a percpu variable are exclusive, i.e. no
	interrupt or preemption can happen in the middle of accessing,
	in other words, the following cannot happen:

	{ percpu variable X is 0 on CPU 0 }
	CPU 0
	========
	<in thread A>
	__this_cpu_inc(X);
	  tmp = X; // tmp is 0
	  <preempted>
	  <in other thread>
	  this_cpu_inc(X); // X is 1 afterwards.
	  <back to thread A>
	  X = tmp + 1; // X is 1, and we have a race condition.

And the is_p{er}cpu_safe() only detects the first, and it doesn't mean
totally safe for percpu accesses.

Maybe we can implement a migratable()? Although not sure it's a English
word.

Regards,
Boqun
Valentin Schneider Aug. 10, 2021, 1:04 p.m. UTC | #6
On 10/08/21 20:49, Boqun Feng wrote:
> On Sun, Aug 08, 2021 at 05:15:20PM +0100, Valentin Schneider wrote:

>> On 07/08/21 03:42, Mike Galbraith wrote:

>> > On Sat, 2021-08-07 at 01:58 +0100, Valentin Schneider wrote:

>> >>

>> >> +static inline bool is_pcpu_safe(void)

>> >

>> > Nit: seems odd to avoid spelling it out to save two characters, percpu

>> > is word like, rolls off the ole tongue better than p-c-p-u.

>> >

>> >       -Mike

>> 

>> True. A quick grep says both versions are used, though "percpu" wins by

>> about a factor of 2. I'll tweak that for a v3.

>

> I wonder why is_percpu_safe() is the correct name. The safety of

> accesses to percpu variables means two things to me:

>

> a)	The thread cannot migrate to other CPU in the middle of

> 	accessing a percpu variable, in other words, the following

> 	cannot happen:

>

> 	{ percpu variable X is 0 on CPU 0 and 2 on CPU 1

> 	CPU 0				CPU 1

> 	========			=========

> 	<in thread A>

> 	__this_cpu_inc(X);

> 	  tmp = X; // tmp is 0

> 	  <preempted>

> 	  <migrate to CPU 1>

> 	  				// continue __this_cpu_inc(X);

> 					X = tmp + 1; // CPU 0 miss this

> 						     // increment (this

> 						     // may be OK), and

> 						     // CPU 1's X got

> 						     // corrupted.

>

> b)	The accesses to a percpu variable are exclusive, i.e. no

> 	interrupt or preemption can happen in the middle of accessing,

> 	in other words, the following cannot happen:

>

> 	{ percpu variable X is 0 on CPU 0 }

> 	CPU 0

> 	========

> 	<in thread A>

> 	__this_cpu_inc(X);

> 	  tmp = X; // tmp is 0

> 	  <preempted>

> 	  <in other thread>

> 	  this_cpu_inc(X); // X is 1 afterwards.

> 	  <back to thread A>

> 	  X = tmp + 1; // X is 1, and we have a race condition.

>

> And the is_p{er}cpu_safe() only detects the first, and it doesn't mean

> totally safe for percpu accesses.

>


Right. I do briefly point this out in the changelog (the bit about
"acquiring a sleepable lock if relevant"), but that doesn't do much to
clarify the helper name itself.

> Maybe we can implement a migratable()? Although not sure it's a English

> word.

>


Funnily enough that is exactly how I named the thing in my initial draft,
but then I somehow convinced myself that tailoring the name to per-CPU
accesses would make its intent clearer.

I think you're right that "migratable()" is less confusing at the end of
the day. Oh well, so much for overthinking the naming problem :-)

> Regards,

> Boqun
diff mbox series

Patch

diff --git a/include/linux/sched.h b/include/linux/sched.h
index debc960f41e3..b77d65f677f6 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1715,6 +1715,16 @@  static inline bool is_percpu_thread(void)
 #endif
 }
 
+/* Is the current task guaranteed not to be migrated elsewhere? */
+static inline bool is_pcpu_safe(void)
+{
+#ifdef CONFIG_SMP
+	return !preemptible() || is_percpu_thread() || current->migration_disabled;
+#else
+	return true;
+#endif
+}
+
 /* Per-process atomic flags. */
 #define PFA_NO_NEW_PRIVS		0	/* May not gain new privileges. */
 #define PFA_SPREAD_PAGE			1	/* Spread page cache over cpuset */