[V5,1/2] base/drivers/arch_topology: Replace mutex with READ_ONCE / WRITE_ONCE

Message ID 1543325060-1599-1-git-send-email-daniel.lezcano@linaro.org
State New
Series
  • [V5,1/2] base/drivers/arch_topology: Replace mutex with READ_ONCE / WRITE_ONCE

Commit Message

Daniel Lezcano Nov. 27, 2018, 1:24 p.m.
The mutex protects a per_cpu variable access. The potential race can
happen only when the cpufreq governor module is loaded and, at the same
time, the cpu capacity is changed via sysfs.

There is no real point in using a mutex to protect a plain variable
assignment when there is no situation where a task can take the lock
and block.

Replace the mutex with READ_ONCE / WRITE_ONCE.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>

Cc: Sudeep Holla <sudeep.holla@arm.com>
Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>

---
 drivers/base/arch_topology.c  | 7 +------
 include/linux/arch_topology.h | 2 +-
 2 files changed, 2 insertions(+), 7 deletions(-)

-- 
2.7.4

Comments

Juri Lelli Nov. 28, 2018, 11:44 a.m. | #1
Hi Daniel,

On 27/11/18 14:24, Daniel Lezcano wrote:

[ ... ]

> -	mutex_lock(&cpu_scale_mutex);
>  	for_each_cpu(i, &cpu_topology[this_cpu].core_sibling)
>  		topology_set_cpu_scale(i, new_capacity);
> -	mutex_unlock(&cpu_scale_mutex);

IIRC this was meant to ensure atomic updates of all siblings with the new
capacity value. I actually now wonder if readers should not grab the
mutex as well (cpu_capacity_show()). Can't we get into a situation where
a reader might see siblings with intermediate values (while the loop
above is performing an update)?

BTW, please update my email address. :-)

Best,

- Juri
Daniel Lezcano Nov. 28, 2018, 5:54 p.m. | #2
On 28/11/2018 12:44, Juri Lelli wrote:
> Hi Daniel,

[ ... ]

>> -	mutex_lock(&cpu_scale_mutex);
>>  	for_each_cpu(i, &cpu_topology[this_cpu].core_sibling)
>>  		topology_set_cpu_scale(i, new_capacity);
>> -	mutex_unlock(&cpu_scale_mutex);
>
> IIRC this was meant to ensure atomic updates of all siblings with the new
> capacity value. I actually now wonder if readers should not grab the
> mutex as well (cpu_capacity_show()). Can't we get into a situation where
> a reader might see siblings with intermediate values (while the loop
> above is performing an update)?

With or without this patch, that is the case:

                task1                      task2
                  |                          |
  read("/sys/.../cpu1/cpu_capacity")         |
                  |                  write("/sys/.../cpu1/cpu_capacity")
  read("/sys/.../cpu2/cpu_capacity")         |


There is no guarantee userspace can have a consistent view of the
capacity. As soon as it reads a capacity, it can be changed behind its
back.


> BTW, please update my email address. :-)

Sure.


-- 
 <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog
Juri Lelli Nov. 29, 2018, 7:04 a.m. | #3
On 28/11/18 18:54, Daniel Lezcano wrote:
> On 28/11/2018 12:44, Juri Lelli wrote:

[ ... ]

> With or without this patch, that is the case:
>
>                 task1                      task2
>                   |                          |
>   read("/sys/.../cpu1/cpu_capacity")         |
>                   |                  write("/sys/.../cpu1/cpu_capacity")
>   read("/sys/.../cpu2/cpu_capacity")         |
>
> There is no guarantee userspace can have a consistent view of the
> capacity. As soon as it reads a capacity, it can be changed behind its back.

True, but w/o the mutex task1 could read different cpu_capacity values
for a cluster (it actually can with the current implementation as well;
we should grab the mutex in the read path too if we want to avoid this).
Just pointing this out; not sure it is a problem, though, as even the
notion of clusters with strictly equal capacities is going to disappear,
AFAIK.
Daniel Lezcano Nov. 29, 2018, 9:18 a.m. | #4
On 29/11/2018 08:04, Juri Lelli wrote:

[ ... ]

>> With or without this patch, that is the case:
>>
>>                 task1                      task2
>>                   |                          |
>>   read("/sys/.../cpu1/cpu_capacity")         |
>>                   |                  write("/sys/.../cpu1/cpu_capacity")
>>   read("/sys/.../cpu2/cpu_capacity")         |
>>
>> There is no guarantee userspace can have a consistent view of the
>> capacity. As soon as it reads a capacity, it can be changed behind its back.
>
> True, but w/o the mutex task1 could read different cpu_capacity values
> for a cluster (it actually can with the current implementation as well;
> we should grab the mutex in the read path too if we want to avoid this).

Even with the mutex on the read path, userspace can see different
capacities, because it reads cpu_capacity from each per-cpu directory.

The mutex would be taken when reading cpu0/cpu_capacity, not across all
of cpu[0-9]/cpu_capacity. Between two reads, a write can happen because
the lock is released in between.

Do you agree with the patch? Or do you want me to drop it?

Juri Lelli Nov. 29, 2018, 9:58 a.m. | #5
On 29/11/18 10:18, Daniel Lezcano wrote:
> On 29/11/2018 08:04, Juri Lelli wrote:

[ ... ]

> Even with the mutex on the read path, userspace can see different
> capacities, because it reads cpu_capacity from each per-cpu directory.
>
> The mutex would be taken when reading cpu0/cpu_capacity, not across all
> of cpu[0-9]/cpu_capacity. Between two reads, a write can happen because
> the lock is released in between.
>
> Do you agree with the patch? Or do you want me to drop it?

I don't actually have cases at hand that show a regression with it; I
was just trying to understand whether we might potentially hit problems
in the future. So, I'm not against this patch. :-)
Daniel Lezcano Nov. 29, 2018, 10:02 a.m. | #6
On 29/11/2018 10:58, Juri Lelli wrote:
> On 29/11/18 10:18, Daniel Lezcano wrote:

[ ... ]

>> Do you agree with the patch? Or do you want me to drop it?
>
> I don't actually have cases at hand that show a regression with it; I
> was just trying to understand whether we might potentially hit problems
> in the future. So, I'm not against this patch. :-)

not-not-acked-by? :)

Juri Lelli Nov. 29, 2018, 12:40 p.m. | #7
On 29/11/18 11:02, Daniel Lezcano wrote:
> On 29/11/2018 10:58, Juri Lelli wrote:

[ ... ]

> > I don't actually have cases at hand that show a regression with it; I
> > was just trying to understand whether we might potentially hit problems
> > in the future. So, I'm not against this patch. :-)
>
> not-not-acked-by? :)

:-)

I'm not maintaining this, so,

Reviewed-by: Juri Lelli <juri.lelli@redhat.com>

Patch

diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index edfcf8d..fd5325b 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -31,12 +31,11 @@ void arch_set_freq_scale(struct cpumask *cpus, unsigned long cur_freq,
 		per_cpu(freq_scale, i) = scale;
 }
 
-static DEFINE_MUTEX(cpu_scale_mutex);
 DEFINE_PER_CPU(unsigned long, cpu_scale) = SCHED_CAPACITY_SCALE;
 
 void topology_set_cpu_scale(unsigned int cpu, unsigned long capacity)
 {
-	per_cpu(cpu_scale, cpu) = capacity;
+	WRITE_ONCE(per_cpu(cpu_scale, cpu), capacity);
 }
 
 static ssize_t cpu_capacity_show(struct device *dev,
@@ -71,10 +70,8 @@ static ssize_t cpu_capacity_store(struct device *dev,
 	if (new_capacity > SCHED_CAPACITY_SCALE)
 		return -EINVAL;
 
-	mutex_lock(&cpu_scale_mutex);
 	for_each_cpu(i, &cpu_topology[this_cpu].core_sibling)
 		topology_set_cpu_scale(i, new_capacity);
-	mutex_unlock(&cpu_scale_mutex);
 
 	schedule_work(&update_topology_flags_work);
 
@@ -141,7 +138,6 @@ void topology_normalize_cpu_scale(void)
 		return;
 
 	pr_debug("cpu_capacity: capacity_scale=%u\n", capacity_scale);
-	mutex_lock(&cpu_scale_mutex);
 	for_each_possible_cpu(cpu) {
 		pr_debug("cpu_capacity: cpu=%d raw_capacity=%u\n",
 			 cpu, raw_capacity[cpu]);
@@ -151,7 +147,6 @@ void topology_normalize_cpu_scale(void)
 		pr_debug("cpu_capacity: CPU%d cpu_capacity=%lu\n",
 			cpu, topology_get_cpu_scale(NULL, cpu));
 	}
-	mutex_unlock(&cpu_scale_mutex);
 }
 
 bool __init topology_parse_cpu_capacity(struct device_node *cpu_node, int cpu)
diff --git a/include/linux/arch_topology.h b/include/linux/arch_topology.h
index d9bdc1a..12c439f 100644
--- a/include/linux/arch_topology.h
+++ b/include/linux/arch_topology.h
@@ -20,7 +20,7 @@ struct sched_domain;
 static inline
 unsigned long topology_get_cpu_scale(struct sched_domain *sd, int cpu)
 {
-	return per_cpu(cpu_scale, cpu);
+	return READ_ONCE(per_cpu(cpu_scale, cpu));
 }
 
 void topology_set_cpu_scale(unsigned int cpu, unsigned long capacity);