diff mbox

[1/2] cpufreq: serialize calls to __cpufreq_governor()

Message ID efa1d8ae3e7b227b73a951d6072fa211d3cfe593.1410235439.git.viresh.kumar@linaro.org
State New
Headers show

Commit Message

Viresh Kumar Sept. 9, 2014, 4:16 a.m. UTC
This commit was earlier commited in kernel as:
19c7630 cpufreq: serialize calls to __cpufreq_governor()

and was later reverted by Srivatsa:
56d07db cpufreq: Remove temporary fix for race between CPU hotplug and sysfs-writes

When this commit was first introduced it was written for races during hotplug
and because we got some other solution to take care of the races with hotplug we
reverted it.

But (as I also said in the revert patch: https://lkml.org/lkml/2013/9/10/61)
there are more cases where this is required.

Recently Robert shown an instance where changing governors with multiple threads
leads to following warnings:

------------[ cut here ]------------
WARNING: CPU: 1 PID: 2458 at drivers/cpufreq/cpufreq_governor.c:261 cpufreq_governor_dbs+0x6d2/0x740()
CPU: 1 PID: 2458 Comm: tee Tainted: G           OE 3.16.0-rc6+ #1
Hardware name: FUJITSU ESPRIMO P700/D3061-A1, BIOS V4.6.4.0 R1.12.0 for D3061-A1x 07/04/2011
 0000000000000009 ffff8800ae403b78 ffffffff8173b0bf 0000000000000000
 ffff8800ae403bb0 ffffffff8106c82d 0000000000000000 ffff88022fa27000
 0000000000000005 0000000000000002 ffffffff81cd5d00 ffff8800ae403bc0
Call Trace:
 [<ffffffff8173b0bf>] dump_stack+0x45/0x56
 [<ffffffff8106c82d>] warn_slowpath_common+0x7d/0xa0
 [<ffffffff8106c90a>] warn_slowpath_null+0x1a/0x20
 [<ffffffff815e4a12>] cpufreq_governor_dbs+0x6d2/0x740
 [<ffffffff810941fc>] ? notifier_call_chain+0x4c/0x70
 [<ffffffff815e2757>] od_cpufreq_governor_dbs+0x17/0x20
 [<ffffffff815dea50>] __cpufreq_governor+0xb0/0x2a0
 [<ffffffff815ded8c>] cpufreq_set_policy+0x14c/0x2f0
 [<ffffffff815df796>] store_scaling_governor+0x96/0xf0
 [<ffffffff815df100>] ? cpufreq_update_policy+0x1d0/0x1d0
 [<ffffffff815de3c9>] store+0x79/0xc0
 [<ffffffff81245bed>] sysfs_kf_write+0x3d/0x50
 [<ffffffff81245120>] kernfs_fop_write+0xe0/0x160
 [<ffffffff811d00d7>] vfs_write+0xb7/0x1f0
 [<ffffffff811d0c76>] SyS_write+0x46/0xb0
 [<ffffffff817439ff>] tracesys+0xe1/0xe6
---[ end trace a2dad7e42b22c796 ]---
BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: [<ffffffff815e4395>] cpufreq_governor_dbs+0x55/0x740
PGD 36a05067 PUD b47df067 PMD 0
Oops: 0000 [#1] SMP

Robert also provided a small script to reproduce it:
crash_governor.sh:
for I in `seq 1000`
do
        echo ondemand | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
        echo userspace | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
done

runme.sh:
for I in `seq 8`
do
        ./crash_governor.sh &
done

Just run runme.sh to crash your system :)

Introduce an additional variable which would guarantee serialization of
__cpufreq_governor() routine.

Reported-and-tested-by: Robert Schöne <robert.schoene@tu-dresden.de>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
Hi Rafael,

These fixes the issues reported by Robert. There is slight change after Robert
tested my initial patch, 'bool' is replaced by 'int' for 'governor_state'.

Regardingn stable trees, I am not too sure. The first patch of this series was
earlier applied on 3.12 and then was reverted quickly in the same release.

So, the best we can do is 3.12+.

 drivers/cpufreq/cpufreq.c | 7 ++++++-
 include/linux/cpufreq.h   | 1 +
 2 files changed, 7 insertions(+), 1 deletion(-)

Comments

Robert Schöne Sept. 9, 2014, 7:29 a.m. UTC | #1
Even though the test fixed my initital problem, my test system (which
runs the kernel with this patch applied) just crashed when changing
frequencies. This might be connected to the patch.

I'll have another look into it and let you know.


Am Dienstag, den 09.09.2014, 09:46 +0530 schrieb Viresh Kumar:
> This commit was earlier commited in kernel as:
> 19c7630 cpufreq: serialize calls to __cpufreq_governor()
> 
> and was later reverted by Srivatsa:
> 56d07db cpufreq: Remove temporary fix for race between CPU hotplug and sysfs-writes
> 
> When this commit was first introduced it was written for races during hotplug
> and because we got some other solution to take care of the races with hotplug we
> reverted it.
> 
> But (as I also said in the revert patch: https://lkml.org/lkml/2013/9/10/61)
> there are more cases where this is required.


--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Viresh Kumar Sept. 9, 2014, 7:35 a.m. UTC | #2
On 9 September 2014 12:59, Robert Schöne <robert.schoene@tu-dresden.de> wrote:
> Even though the test fixed my initital problem, my test system (which
> runs the kernel with this patch applied) just crashed when changing

Only this patch ? or both 1/2 & 2/2 ??

> frequencies. This might be connected to the patch.
>
> I'll have another look into it and let you know.

Hmm, that's strange. Probably some other (existing) bug woken up with
this change.. Also this patch will come into picture while changing governor
tunables and not during frequency switches.. So they looked a bit unrelated
to me..

Yeah, please let us know if this breaks anything.
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Viresh Kumar Sept. 9, 2014, 2:45 p.m. UTC | #3
Empty body?

On 9 September 2014 17:25, Prarit Bhargava <prarit@redhat.com> wrote:
>
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rafael J. Wysocki Sept. 24, 2014, 11:46 p.m. UTC | #4
On Tuesday, September 09, 2014 09:46:39 AM Viresh Kumar wrote:
> This commit was earlier commited in kernel as:
> 19c7630 cpufreq: serialize calls to __cpufreq_governor()
> 
> and was later reverted by Srivatsa:
> 56d07db cpufreq: Remove temporary fix for race between CPU hotplug and sysfs-writes
> 
> When this commit was first introduced it was written for races during hotplug
> and because we got some other solution to take care of the races with hotplug we
> reverted it.
> 
> But (as I also said in the revert patch: https://lkml.org/lkml/2013/9/10/61)
> there are more cases where this is required.
> 
> Recently Robert shown an instance where changing governors with multiple threads
> leads to following warnings:
> 
> ------------[ cut here ]------------
> WARNING: CPU: 1 PID: 2458 at drivers/cpufreq/cpufreq_governor.c:261 cpufreq_governor_dbs+0x6d2/0x740()
> CPU: 1 PID: 2458 Comm: tee Tainted: G           OE 3.16.0-rc6+ #1
> Hardware name: FUJITSU ESPRIMO P700/D3061-A1, BIOS V4.6.4.0 R1.12.0 for D3061-A1x 07/04/2011
>  0000000000000009 ffff8800ae403b78 ffffffff8173b0bf 0000000000000000
>  ffff8800ae403bb0 ffffffff8106c82d 0000000000000000 ffff88022fa27000
>  0000000000000005 0000000000000002 ffffffff81cd5d00 ffff8800ae403bc0
> Call Trace:
>  [<ffffffff8173b0bf>] dump_stack+0x45/0x56
>  [<ffffffff8106c82d>] warn_slowpath_common+0x7d/0xa0
>  [<ffffffff8106c90a>] warn_slowpath_null+0x1a/0x20
>  [<ffffffff815e4a12>] cpufreq_governor_dbs+0x6d2/0x740
>  [<ffffffff810941fc>] ? notifier_call_chain+0x4c/0x70
>  [<ffffffff815e2757>] od_cpufreq_governor_dbs+0x17/0x20
>  [<ffffffff815dea50>] __cpufreq_governor+0xb0/0x2a0
>  [<ffffffff815ded8c>] cpufreq_set_policy+0x14c/0x2f0
>  [<ffffffff815df796>] store_scaling_governor+0x96/0xf0
>  [<ffffffff815df100>] ? cpufreq_update_policy+0x1d0/0x1d0
>  [<ffffffff815de3c9>] store+0x79/0xc0
>  [<ffffffff81245bed>] sysfs_kf_write+0x3d/0x50
>  [<ffffffff81245120>] kernfs_fop_write+0xe0/0x160
>  [<ffffffff811d00d7>] vfs_write+0xb7/0x1f0
>  [<ffffffff811d0c76>] SyS_write+0x46/0xb0
>  [<ffffffff817439ff>] tracesys+0xe1/0xe6
> ---[ end trace a2dad7e42b22c796 ]---
> BUG: unable to handle kernel NULL pointer dereference at           (null)
> IP: [<ffffffff815e4395>] cpufreq_governor_dbs+0x55/0x740
> PGD 36a05067 PUD b47df067 PMD 0
> Oops: 0000 [#1] SMP
> 
> Robert also provided a small script to reproduce it:
> crash_governor.sh:
> for I in `seq 1000`
> do
>         echo ondemand | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
>         echo userspace | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
> done
> 
> runme.sh:
> for I in `seq 8`
> do
>         ./crash_governor.sh &
> done
> 
> Just run runme.sh to crash your system :)
> 
> Introduce an additional variable which would guarantee serialization of
> __cpufreq_governor() routine.
> 
> Reported-and-tested-by: Robert Schöne <robert.schoene@tu-dresden.de>
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> ---
> Hi Rafael,
> 
> These fixes the issues reported by Robert. There is slight change after Robert
> tested my initial patch, 'bool' is replaced by 'int' for 'governor_state'.
> 
> Regardingn stable trees, I am not too sure. The first patch of this series was
> earlier applied on 3.12 and then was reverted quickly in the same release.
> 
> So, the best we can do is 3.12+.

What's the current status of this series?
Robert Schöne Sept. 25, 2014, 6:07 a.m. UTC | #5
> What's the current status of this series?
> 
We had some iterations of patches, but the only solution that works for
me is the patch with the coarse-grained lock that I sent at Mon, 08 Sep
2014 10:16:48 CEST [1]
Viresh is pretty occupied lately, but he told me that he might do the
tests himself when the current period of busyness is over as he is
supplied with a test system. I'm not sure about his current status (busy
or testing). 

[1] http://permalink.gmane.org/gmane.linux.power-management.general/49272
Viresh Kumar Sept. 29, 2014, 9:50 a.m. UTC | #6
On 24 September 2014 16:46, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> What's the current status of this series?

Started working again after two weeks today only and so it might
take some days for things to get stable at my end. Should be able
to test things after that..
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Prarit Bhargava Sept. 29, 2014, 11:29 a.m. UTC | #7
On 09/09/2014 12:16 AM, Viresh Kumar wrote:
> This commit was earlier commited in kernel as:
> 19c7630 cpufreq: serialize calls to __cpufreq_governor()
> 
> and was later reverted by Srivatsa:
> 56d07db cpufreq: Remove temporary fix for race between CPU hotplug and sysfs-writes
> 
> When this commit was first introduced it was written for races during hotplug
> and because we got some other solution to take care of the races with hotplug we
> reverted it.
> 
> But (as I also said in the revert patch: https://lkml.org/lkml/2013/9/10/61)
> there are more cases where this is required.
> 
> Recently Robert shown an instance where changing governors with multiple threads
> leads to following warnings:
> 
> ------------[ cut here ]------------
> WARNING: CPU: 1 PID: 2458 at drivers/cpufreq/cpufreq_governor.c:261 cpufreq_governor_dbs+0x6d2/0x740()
> CPU: 1 PID: 2458 Comm: tee Tainted: G           OE 3.16.0-rc6+ #1
> Hardware name: FUJITSU ESPRIMO P700/D3061-A1, BIOS V4.6.4.0 R1.12.0 for D3061-A1x 07/04/2011
>  0000000000000009 ffff8800ae403b78 ffffffff8173b0bf 0000000000000000
>  ffff8800ae403bb0 ffffffff8106c82d 0000000000000000 ffff88022fa27000
>  0000000000000005 0000000000000002 ffffffff81cd5d00 ffff8800ae403bc0
> Call Trace:
>  [<ffffffff8173b0bf>] dump_stack+0x45/0x56
>  [<ffffffff8106c82d>] warn_slowpath_common+0x7d/0xa0
>  [<ffffffff8106c90a>] warn_slowpath_null+0x1a/0x20
>  [<ffffffff815e4a12>] cpufreq_governor_dbs+0x6d2/0x740
>  [<ffffffff810941fc>] ? notifier_call_chain+0x4c/0x70
>  [<ffffffff815e2757>] od_cpufreq_governor_dbs+0x17/0x20
>  [<ffffffff815dea50>] __cpufreq_governor+0xb0/0x2a0
>  [<ffffffff815ded8c>] cpufreq_set_policy+0x14c/0x2f0
>  [<ffffffff815df796>] store_scaling_governor+0x96/0xf0
>  [<ffffffff815df100>] ? cpufreq_update_policy+0x1d0/0x1d0
>  [<ffffffff815de3c9>] store+0x79/0xc0
>  [<ffffffff81245bed>] sysfs_kf_write+0x3d/0x50
>  [<ffffffff81245120>] kernfs_fop_write+0xe0/0x160
>  [<ffffffff811d00d7>] vfs_write+0xb7/0x1f0
>  [<ffffffff811d0c76>] SyS_write+0x46/0xb0
>  [<ffffffff817439ff>] tracesys+0xe1/0xe6
> ---[ end trace a2dad7e42b22c796 ]---
> BUG: unable to handle kernel NULL pointer dereference at           (null)
> IP: [<ffffffff815e4395>] cpufreq_governor_dbs+0x55/0x740
> PGD 36a05067 PUD b47df067 PMD 0
> Oops: 0000 [#1] SMP
> 
> Robert also provided a small script to reproduce it:
> crash_governor.sh:
> for I in `seq 1000`
> do
>         echo ondemand | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
>         echo userspace | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
> done
> 
> runme.sh:
> for I in `seq 8`
> do
>         ./crash_governor.sh &
> done
> 
> Just run runme.sh to crash your system :)
> 

This is exactly the same issue I mentioned a few weeks ago and traced back to
955ef4833574636819cd269cfbae12f79cbde63a which drops the lock around the
CPUFREQ_GOV_POLICY_EXIT __cpufreq_governor() call.

Just my two cents -- I don't think that adding a new lock/locking scheme is the
way to fix this.

P.

> Introduce an additional variable which would guarantee serialization of
> __cpufreq_governor() routine.
> 
> Reported-and-tested-by: Robert Schöne <robert.schoene@tu-dresden.de>
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> ---
> Hi Rafael,
> 
> These fixes the issues reported by Robert. There is slight change after Robert
> tested my initial patch, 'bool' is replaced by 'int' for 'governor_state'.
> 
> Regardingn stable trees, I am not too sure. The first patch of this series was
> earlier applied on 3.12 and then was reverted quickly in the same release.
> 
> So, the best we can do is 3.12+.
> 
>  drivers/cpufreq/cpufreq.c | 7 ++++++-
>  include/linux/cpufreq.h   | 1 +
>  2 files changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index d9fdedd..a7ceae3 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -2012,13 +2012,15 @@ static int __cpufreq_governor(struct cpufreq_policy *policy,
>  		 policy->cpu, event);
>  
>  	mutex_lock(&cpufreq_governor_lock);
> -	if ((policy->governor_enabled && event == CPUFREQ_GOV_START)
> +	if (policy->governor_busy
> +	    || (policy->governor_enabled && event == CPUFREQ_GOV_START)
>  	    || (!policy->governor_enabled
>  	    && (event == CPUFREQ_GOV_LIMITS || event == CPUFREQ_GOV_STOP))) {
>  		mutex_unlock(&cpufreq_governor_lock);
>  		return -EBUSY;
>  	}
>  
> +	policy->governor_busy = true;
>  	if (event == CPUFREQ_GOV_STOP)
>  		policy->governor_enabled = false;
>  	else if (event == CPUFREQ_GOV_START)
> @@ -2047,6 +2049,9 @@ static int __cpufreq_governor(struct cpufreq_policy *policy,
>  			((event == CPUFREQ_GOV_POLICY_EXIT) && !ret))
>  		module_put(policy->governor->owner);
>  
> +	mutex_lock(&cpufreq_governor_lock);
> +	policy->governor_busy = false;
> +	mutex_unlock(&cpufreq_governor_lock);
>  	return ret;
>  }
>  
> diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> index 7d1955a..c7aa96b 100644
> --- a/include/linux/cpufreq.h
> +++ b/include/linux/cpufreq.h
> @@ -82,6 +82,7 @@ struct cpufreq_policy {
>  	struct cpufreq_governor	*governor; /* see below */
>  	void			*governor_data;
>  	bool			governor_enabled; /* governor start/stop flag */
> +	bool			governor_busy;
>  
>  	struct work_struct	update; /* if update_policy() needs to be
>  					 * called, but you're in IRQ context */
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Viresh Kumar Sept. 29, 2014, 11:38 a.m. UTC | #8
On 29 September 2014 16:59, Prarit Bhargava <prarit@redhat.com> wrote:

> This is exactly the same issue I mentioned a few weeks ago and traced back to
> 955ef4833574636819cd269cfbae12f79cbde63a which drops the lock around the
> CPUFREQ_GOV_POLICY_EXIT __cpufreq_governor() call.
>
> Just my two cents -- I don't think that adding a new lock/locking scheme is the
> way to fix this.

Me and Robert are just inches away from fixing it. Just that the
remote testing by
Robert and patches from me aren't working well together.. I need to do
this myself
and have a board to reproduce it now.. But would take some time to get going...

And yes, I am also against another lock here :)
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Prarit Bhargava Sept. 29, 2014, 11:50 a.m. UTC | #9
On 09/29/2014 07:38 AM, Viresh Kumar wrote:
> On 29 September 2014 16:59, Prarit Bhargava <prarit@redhat.com> wrote:
> 
>> This is exactly the same issue I mentioned a few weeks ago and traced back to
>> 955ef4833574636819cd269cfbae12f79cbde63a which drops the lock around the
>> CPUFREQ_GOV_POLICY_EXIT __cpufreq_governor() call.
>>
>> Just my two cents -- I don't think that adding a new lock/locking scheme is the
>> way to fix this.
> 
> Me and Robert are just inches away from fixing it. Just that the
> remote testing by
> Robert and patches from me aren't working well together.. I need to do
> this myself
> and have a board to reproduce it now.. But would take some time to get going...
> 
> And yes, I am also against another lock here :)

Send me what you have in mind -- I can always take a look and put it through
tests as well.

P.

> 
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Viresh Kumar Sept. 29, 2014, 11:55 a.m. UTC | #10
On 29 September 2014 17:20, Prarit Bhargava <prarit@redhat.com> wrote:
> Send me what you have in mind -- I can always take a look and put it through
> tests as well.

The last set of changes I gave to Robert:

git://git.linaro.org/people/viresh.kumar/linux.git /cpufreq/governor-fixes
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index d9fdedd..a7ceae3 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -2012,13 +2012,15 @@  static int __cpufreq_governor(struct cpufreq_policy *policy,
 		 policy->cpu, event);
 
 	mutex_lock(&cpufreq_governor_lock);
-	if ((policy->governor_enabled && event == CPUFREQ_GOV_START)
+	if (policy->governor_busy
+	    || (policy->governor_enabled && event == CPUFREQ_GOV_START)
 	    || (!policy->governor_enabled
 	    && (event == CPUFREQ_GOV_LIMITS || event == CPUFREQ_GOV_STOP))) {
 		mutex_unlock(&cpufreq_governor_lock);
 		return -EBUSY;
 	}
 
+	policy->governor_busy = true;
 	if (event == CPUFREQ_GOV_STOP)
 		policy->governor_enabled = false;
 	else if (event == CPUFREQ_GOV_START)
@@ -2047,6 +2049,9 @@  static int __cpufreq_governor(struct cpufreq_policy *policy,
 			((event == CPUFREQ_GOV_POLICY_EXIT) && !ret))
 		module_put(policy->governor->owner);
 
+	mutex_lock(&cpufreq_governor_lock);
+	policy->governor_busy = false;
+	mutex_unlock(&cpufreq_governor_lock);
 	return ret;
 }
 
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 7d1955a..c7aa96b 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -82,6 +82,7 @@  struct cpufreq_policy {
 	struct cpufreq_governor	*governor; /* see below */
 	void			*governor_data;
 	bool			governor_enabled; /* governor start/stop flag */
+	bool			governor_busy;
 
 	struct work_struct	update; /* if update_policy() needs to be
 					 * called, but you're in IRQ context */