[2/4] base/drivers/arch_topology: Replace mutex with READ_ONCE / WRITE_ONCE

Message ID 1540830201-2947-2-git-send-email-daniel.lezcano@linaro.org
State Superseded
Headers show
Series
  • [1/4] base/drivers/arch_topology: Remove useless check
Related show

Commit Message

Daniel Lezcano Oct. 29, 2018, 4:23 p.m.
The mutex protects a per_cpu variable access. The potential race can
happen only when the cpufreq governor module is loaded and at the same
time the cpu capacity is changed in the sysfs.

There is no real interest of using a mutex to protect a variable
assignation when there is no situation where a task can take the lock
and block.

Replace the mutex by READ_ONCE / WRITE_ONCE.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>

---
 drivers/base/arch_topology.c  | 7 +------
 include/linux/arch_topology.h | 2 +-
 2 files changed, 2 insertions(+), 7 deletions(-)

-- 
2.7.4

Comments

Viresh Kumar Oct. 30, 2018, 5:57 a.m. | #1
On Mon, Oct 29, 2018 at 9:54 PM Daniel Lezcano
<daniel.lezcano@linaro.org> wrote:
>

> The mutex protects a per_cpu variable access. The potential race can

> happen only when the cpufreq governor module is loaded and at the same

> time the cpu capacity is changed in the sysfs.

>

> There is no real interest of using a mutex to protect a variable

> assignation when there is no situation where a task can take the lock

> and block.

>

> Replace the mutex by READ_ONCE / WRITE_ONCE.

>

> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>

> ---

>  drivers/base/arch_topology.c  | 7 +------

>  include/linux/arch_topology.h | 2 +-

>  2 files changed, 2 insertions(+), 7 deletions(-)


Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
Sudeep Holla Nov. 23, 2018, 1:58 p.m. | #2
On Mon, Oct 29, 2018 at 05:23:18PM +0100, Daniel Lezcano wrote:
> The mutex protects a per_cpu variable access. The potential race can

> happen only when the cpufreq governor module is loaded and at the same

> time the cpu capacity is changed in the sysfs.

>


I wonder if we really need that sysfs entry to be writable. For some
reason, I had assumed it's read only, obviously it's not. I prefer to
make it RO if there's no strong reason other than debug purposes.

--
Regards,
Sudeep
Daniel Lezcano Nov. 23, 2018, 4:04 p.m. | #3
On 23/11/2018 14:58, Sudeep Holla wrote:
> On Mon, Oct 29, 2018 at 05:23:18PM +0100, Daniel Lezcano wrote:

>> The mutex protects a per_cpu variable access. The potential race can

>> happen only when the cpufreq governor module is loaded and at the same

>> time the cpu capacity is changed in the sysfs.

>>

> 

> I wonder if we really need that sysfs entry to be writable. For some

> reason, I had assumed it's read only, obviously it's not. I prefer to

> make it RO if there's no strong reason other than debug purposes.


Are you suggesting to remove the READ_ONCE/WRITE_ONCE patch and set the
sysfs file read-only ?


-- 
 <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog
Sudeep Holla Nov. 23, 2018, 4:20 p.m. | #4
On Fri, Nov 23, 2018 at 05:04:18PM +0100, Daniel Lezcano wrote:
> On 23/11/2018 14:58, Sudeep Holla wrote:

> > On Mon, Oct 29, 2018 at 05:23:18PM +0100, Daniel Lezcano wrote:

> >> The mutex protects a per_cpu variable access. The potential race can

> >> happen only when the cpufreq governor module is loaded and at the same

> >> time the cpu capacity is changed in the sysfs.

> >>

> >

> > I wonder if we really need that sysfs entry to be writable. For some

> > reason, I had assumed it's read only, obviously it's not. I prefer to

> > make it RO if there's no strong reason other than debug purposes.

>

> Are you suggesting to remove the READ_ONCE/WRITE_ONCE patch and set the

> sysfs file read-only ?

>


Just to be sure, if we retain RW capability we still need to fix the
race you are pointing out.

However I just don't see the need for RW cpu_capacity sysfs and hence
asking the reason here. IIRC I had pointed this out long back(not sure
internally or externally) but seemed to have missed the version that got
merged. So I am just asking if we really need write capability given that
it has known issues.

If user-space starts writing the value to influence the scheduler, then
it makes it difficult for kernel to change the way it uses the
cpu_capacity in future.

Sorry if there's valid usecase and I am just making noise here.

--
Regards,
Sudeep
Daniel Lezcano Nov. 23, 2018, 4:54 p.m. | #5
On 23/11/2018 17:20, Sudeep Holla wrote:
> On Fri, Nov 23, 2018 at 05:04:18PM +0100, Daniel Lezcano wrote:

>> On 23/11/2018 14:58, Sudeep Holla wrote:

>>> On Mon, Oct 29, 2018 at 05:23:18PM +0100, Daniel Lezcano wrote:

>>>> The mutex protects a per_cpu variable access. The potential race can

>>>> happen only when the cpufreq governor module is loaded and at the same

>>>> time the cpu capacity is changed in the sysfs.

>>>>

>>>

>>> I wonder if we really need that sysfs entry to be writable. For some

>>> reason, I had assumed it's read only, obviously it's not. I prefer to

>>> make it RO if there's no strong reason other than debug purposes.

>>

>> Are you suggesting to remove the READ_ONCE/WRITE_ONCE patch and set the

>> sysfs file read-only ?

>>

> 

> Just to be sure, if we retain RW capability we still need to fix the

> race you are pointing out.

> 

> However I just don't see the need for RW cpu_capacity sysfs and hence

> asking the reason here. IIRC I had pointed this out long back(not sure

> internally or externally) but seemed to have missed the version that got

> merged. So I am just asking if we really need write capability given that

> it has known issues.

> 

> If user-space starts writing the value to influence the scheduler, then

> it makes it difficult for kernel to change the way it uses the

> cpu_capacity in future.

> 

> Sorry if there's valid usecase and I am just making noise here.


It's ok [added Juri Lelli]

I've been through the history:

commit be8f185d8af4dbd450023a42a48c6faa8cbcdfe6
Author: Juri Lelli <juri.lelli@arm.com>
Date:   Thu Nov 3 05:40:18 2016 +0000

    arm64: add sysfs cpu_capacity attribute

    Add a sysfs cpu_capacity attribute with which it is possible to read and
    write (thus over-writing default values) CPUs capacity. This might be
    useful in situations where values needs changing after boot.

    The new attribute shows up as:

     /sys/devices/system/cpu/cpu*/cpu_capacity

    Cc: Will Deacon <will.deacon@arm.com>
    Cc: Mark Brown <broonie@kernel.org>
    Cc: Sudeep Holla <sudeep.holla@arm.com>
    Signed-off-by: Juri Lelli <juri.lelli@arm.com>

    Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>


Juri do you have a use case where we want to override the capacity?

Shall we switch the sysfs attribute to read-only?


-- 
 <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog
Juri Lelli Nov. 26, 2018, 8:27 a.m. | #6
Hi,

On 23/11/18 17:54, Daniel Lezcano wrote:
> On 23/11/2018 17:20, Sudeep Holla wrote:

> > On Fri, Nov 23, 2018 at 05:04:18PM +0100, Daniel Lezcano wrote:

> >> On 23/11/2018 14:58, Sudeep Holla wrote:

> >>> On Mon, Oct 29, 2018 at 05:23:18PM +0100, Daniel Lezcano wrote:

> >>>> The mutex protects a per_cpu variable access. The potential race can

> >>>> happen only when the cpufreq governor module is loaded and at the same

> >>>> time the cpu capacity is changed in the sysfs.

> >>>>

> >>>

> >>> I wonder if we really need that sysfs entry to be writable. For some

> >>> reason, I had assumed it's read only, obviously it's not. I prefer to

> >>> make it RO if there's no strong reason other than debug purposes.

> >>

> >> Are you suggesting to remove the READ_ONCE/WRITE_ONCE patch and set the

> >> sysfs file read-only ?

> >>

> > 

> > Just to be sure, if we retain RW capability we still need to fix the

> > race you are pointing out.

> > 

> > However I just don't see the need for RW cpu_capacity sysfs and hence

> > asking the reason here. IIRC I had pointed this out long back(not sure

> > internally or externally) but seemed to have missed the version that got

> > merged. So I am just asking if we really need write capability given that

> > it has known issues.

> > 

> > If user-space starts writing the value to influence the scheduler, then

> > it makes it difficult for kernel to change the way it uses the

> > cpu_capacity in future.

> > 

> > Sorry if there's valid usecase and I am just making noise here.

> 

> It's ok [added Juri Lelli]

> 

> I've been through the history:

> 

> commit be8f185d8af4dbd450023a42a48c6faa8cbcdfe6

> Author: Juri Lelli <juri.lelli@arm.com>

> Date:   Thu Nov 3 05:40:18 2016 +0000

> 

>     arm64: add sysfs cpu_capacity attribute

> 

>     Add a sysfs cpu_capacity attribute with which it is possible to read and

>     write (thus over-writing default values) CPUs capacity. This might be

>     useful in situations where values needs changing after boot.

> 

>     The new attribute shows up as:

> 

>      /sys/devices/system/cpu/cpu*/cpu_capacity

> 

>     Cc: Will Deacon <will.deacon@arm.com>

>     Cc: Mark Brown <broonie@kernel.org>

>     Cc: Sudeep Holla <sudeep.holla@arm.com>

>     Signed-off-by: Juri Lelli <juri.lelli@arm.com>

>     Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

> 

> Juri do you have a use case where we want to override the capacity?

> 

> Shall we switch the sysfs attribute to read-only?


So, I spent a bit of time researching patchset history and IIRC the
point of having a RW cpu_capacity was to help in situations where one
wants to change values after boot, because she/he now has "better"
numbers (remember we advocate to use Dhrystone to populate DTs, but that
is highly debatable). I also seem to remember that there might also be
cases where DT values cannot be changed at all for a (new?) platform
that happens to be using DTs shipped with an old revision; something
along these lines was mentioned (by Mark?) during the review process,
but exact details escape my mind ATM, apologies.

Best,

- Juri
Daniel Lezcano Nov. 26, 2018, 8:39 a.m. | #7
Hi Juri,

On 26/11/2018 09:27, Juri Lelli wrote:
> Hi,

> 

> On 23/11/18 17:54, Daniel Lezcano wrote:

>> On 23/11/2018 17:20, Sudeep Holla wrote:

>>> On Fri, Nov 23, 2018 at 05:04:18PM +0100, Daniel Lezcano wrote:

>>>> On 23/11/2018 14:58, Sudeep Holla wrote:

>>>>> On Mon, Oct 29, 2018 at 05:23:18PM +0100, Daniel Lezcano wrote:

>>>>>> The mutex protects a per_cpu variable access. The potential race can

>>>>>> happen only when the cpufreq governor module is loaded and at the same

>>>>>> time the cpu capacity is changed in the sysfs.

>>>>>>

>>>>>

>>>>> I wonder if we really need that sysfs entry to be writable. For some

>>>>> reason, I had assumed it's read only, obviously it's not. I prefer to

>>>>> make it RO if there's no strong reason other than debug purposes.

>>>>

>>>> Are you suggesting to remove the READ_ONCE/WRITE_ONCE patch and set the

>>>> sysfs file read-only ?

>>>>

>>>

>>> Just to be sure, if we retain RW capability we still need to fix the

>>> race you are pointing out.

>>>

>>> However I just don't see the need for RW cpu_capacity sysfs and hence

>>> asking the reason here. IIRC I had pointed this out long back(not sure

>>> internally or externally) but seemed to have missed the version that got

>>> merged. So I am just asking if we really need write capability given that

>>> it has known issues.

>>>

>>> If user-space starts writing the value to influence the scheduler, then

>>> it makes it difficult for kernel to change the way it uses the

>>> cpu_capacity in future.

>>>

>>> Sorry if there's valid usecase and I am just making noise here.

>>

>> It's ok [added Juri Lelli]

>>

>> I've been through the history:

>>

>> commit be8f185d8af4dbd450023a42a48c6faa8cbcdfe6

>> Author: Juri Lelli <juri.lelli@arm.com>

>> Date:   Thu Nov 3 05:40:18 2016 +0000

>>

>>     arm64: add sysfs cpu_capacity attribute

>>

>>     Add a sysfs cpu_capacity attribute with which it is possible to read and

>>     write (thus over-writing default values) CPUs capacity. This might be

>>     useful in situations where values needs changing after boot.

>>

>>     The new attribute shows up as:

>>

>>      /sys/devices/system/cpu/cpu*/cpu_capacity

>>

>>     Cc: Will Deacon <will.deacon@arm.com>

>>     Cc: Mark Brown <broonie@kernel.org>

>>     Cc: Sudeep Holla <sudeep.holla@arm.com>

>>     Signed-off-by: Juri Lelli <juri.lelli@arm.com>

>>     Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

>>

>> Juri do you have a use case where we want to override the capacity?

>>

>> Shall we switch the sysfs attribute to read-only?

> 

> So, I spent a bit of time researching patchset history and IIRC the

> point of having a RW cpu_capacity was to help in situations where one

> wants to change values after boot, because she/he now has "better"

> numbers (remember we advocate to use Dhrystone to populate DTs, but that

> is highly debatable). I also seem to remember that there might also be

> cases where DT values cannot be changed at all for a (new?) platform

> that happens to be using DTs shipped with an old revision; something

> along these lines was mentioned (by Mark?) during the review process,

> but exact details escape my mind ATM, apologies.


Ok, so I guess it makes sense to keep it RW then.

Thanks for the feedback.

  -- Daniel



-- 
 <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog
Mark Brown Nov. 26, 2018, 11:33 a.m. | #8
On Mon, Nov 26, 2018 at 09:27:02AM +0100, Juri Lelli wrote:

> is highly debatable). I also seem to remember that there might also be

> cases where DT values cannot be changed at all for a (new?) platform

> that happens to be using DTs shipped with an old revision; something

> along these lines was mentioned (by Mark?) during the review process,

> but exact details escape my mind ATM, apologies.


Yeah, DTs might be in firmware which is either non-updatable or which
people are reluctant to update due to it not being recoverable if the
update goes wrong.

Patch

diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index 204ed10..b19d6d4 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -30,12 +30,11 @@  void arch_set_freq_scale(struct cpumask *cpus, unsigned long cur_freq,
 		per_cpu(freq_scale, i) = scale;
 }
 
-static DEFINE_MUTEX(cpu_scale_mutex);
 DEFINE_PER_CPU(unsigned long, cpu_scale) = SCHED_CAPACITY_SCALE;
 
 void topology_set_cpu_scale(unsigned int cpu, unsigned long capacity)
 {
-	per_cpu(cpu_scale, cpu) = capacity;
+	WRITE_ONCE(per_cpu(cpu_scale, cpu), capacity);
 }
 
 static ssize_t cpu_capacity_show(struct device *dev,
@@ -67,10 +66,8 @@  static ssize_t cpu_capacity_store(struct device *dev,
 	if (new_capacity > SCHED_CAPACITY_SCALE)
 		return -EINVAL;
 
-	mutex_lock(&cpu_scale_mutex);
 	for_each_cpu(i, &cpu_topology[this_cpu].core_sibling)
 		topology_set_cpu_scale(i, new_capacity);
-	mutex_unlock(&cpu_scale_mutex);
 
 	return count;
 }
@@ -116,7 +113,6 @@  void topology_normalize_cpu_scale(void)
 		return;
 
 	pr_debug("cpu_capacity: capacity_scale=%u\n", capacity_scale);
-	mutex_lock(&cpu_scale_mutex);
 	for_each_possible_cpu(cpu) {
 		pr_debug("cpu_capacity: cpu=%d raw_capacity=%u\n",
 			 cpu, raw_capacity[cpu]);
@@ -126,7 +122,6 @@  void topology_normalize_cpu_scale(void)
 		pr_debug("cpu_capacity: CPU%d cpu_capacity=%lu\n",
 			cpu, topology_get_cpu_scale(NULL, cpu));
 	}
-	mutex_unlock(&cpu_scale_mutex);
 }
 
 bool __init topology_parse_cpu_capacity(struct device_node *cpu_node, int cpu)
diff --git a/include/linux/arch_topology.h b/include/linux/arch_topology.h
index 2b70941..7c0aaa9 100644
--- a/include/linux/arch_topology.h
+++ b/include/linux/arch_topology.h
@@ -19,7 +19,7 @@  struct sched_domain;
 static inline
 unsigned long topology_get_cpu_scale(struct sched_domain *sd, int cpu)
 {
-	return per_cpu(cpu_scale, cpu);
+	return READ_ONCE(per_cpu(cpu_scale, cpu));
 }
 
 void topology_set_cpu_scale(unsigned int cpu, unsigned long capacity);