[V3,3/3] thermal/cpu-cooling: Update thermal pressure in case of a maximum frequency capping

Message ID 1555443521-579-4-git-send-email-thara.gopinath@linaro.org
State New
Series Introduce Thermal Pressure

Commit Message

Thara Gopinath April 16, 2019, 7:38 p.m. UTC
Enable cpufreq cooling device to update the thermal pressure in
event of a capped maximum frequency or removal of capped maximum
frequency.

Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org>

---
 drivers/thermal/cpu_cooling.c | 4 ++++
 1 file changed, 4 insertions(+)

-- 
2.1.4

Comments

Quentin Perret April 18, 2019, 9:48 a.m. UTC | #1
On Tuesday 16 Apr 2019 at 15:38:41 (-0400), Thara Gopinath wrote:
> diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
> @@ -177,6 +178,9 @@ static int cpufreq_thermal_notifier(struct notifier_block *nb,
>  
>  		if (policy->max > clipped_freq)
>  			cpufreq_verify_within_limits(policy, 0, clipped_freq);
> +
> +		sched_update_thermal_pressure(policy->cpus,
> +				policy->max, policy->cpuinfo.max_freq);

Is this something we could do in CPUFreq ? Directly in
cpufreq_verify_within_limits() perhaps ?

That would re-define the 'thermal pressure' framework in a more abstract
way and make the scheduler look at 'frequency capping' events,
regardless of the reason for capping.

That would reflect user-defined frequency constraint into cpu_capacity,
in addition to the thermal stuff. I'm not sure if there is another use
case for frequency capping ?

Perhaps the Intel boost stuff could be factored in there ? That is,
at times when the boost freq is not reachable capacity_of() would appear
smaller ... Unless this wants to be reflected instantaneously ?

Thoughts ?
Quentin
Thara Gopinath April 23, 2019, 10:38 p.m. UTC | #2
On 04/18/2019 05:48 AM, Quentin Perret wrote:
> On Tuesday 16 Apr 2019 at 15:38:41 (-0400), Thara Gopinath wrote:
>> diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
>> @@ -177,6 +178,9 @@ static int cpufreq_thermal_notifier(struct notifier_block *nb,
>>  
>>  		if (policy->max > clipped_freq)
>>  			cpufreq_verify_within_limits(policy, 0, clipped_freq);
>> +
>> +		sched_update_thermal_pressure(policy->cpus,
>> +				policy->max, policy->cpuinfo.max_freq);
>
> Is this something we could do this CPUFreq ? Directly in
> cpufreq_verify_within_limits() perhaps ?
>
> That would re-define the 'thermal pressure' framework in a more abstract
> way and make the scheduler look at 'frequency capping' events,
> regardless of the reason for capping.
>
> That would reflect user-defined frequency constraint into cpu_capacity,
> in addition to the thermal stuff. I'm not sure if there is another use
> case for frequency capping ?

Hi Quentin,
Thanks for the review. Sorry for the delay in response as I was on
vacation for the past few days.
I think there is one major difference between user-defined frequency
constraints and frequency constraints due to thermal events: the length
of time the system spends in the constrained state.
Typically, a user constraint lasts for seconds if not minutes, and I
think in this case cpu_capacity_orig should reflect the constraint
rather than cpu_capacity, which is what this patch set updates. Also,
for a user constraint there is possibly no need to accumulate and
average the capacity constraints; instantaneous values can be applied
directly to cpu_capacity_orig. Thermal pressure, on the other hand, is
more spiky, sometimes on the order of ms or us, which requires the
accumulating and averaging.
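
To make the accumulate-and-average idea concrete, here is a minimal
user-space sketch of a decaying average in integer arithmetic. It is
not the code from this series: the function name, the shift-based decay
factor and the signed fixed-point step are all assumptions chosen for
illustration.

```c
#include <assert.h>

/*
 * Hypothetical helper, not from the patch set: move the running
 * average toward the newest instantaneous sample by 1/2^shift of
 * the difference. A larger shift means slower decay.
 */
unsigned long pressure_avg_update(unsigned long avg, unsigned long sample,
				  unsigned int shift)
{
	/* Keep the shift well inside the width of a signed long. */
	assert(shift < 8 * sizeof(long) - 1);

	/* Signed difference so the average can fall as well as rise. */
	long delta = (long)sample - (long)avg;

	return (unsigned long)((long)avg + delta / (1L << shift));
}
```

With shift = 1, a spike of 512 raises an average of 0 to 256 after one
update, and the average then falls back gradually once the cap is
removed, which is the smoothing described above. An instantaneous
signal would instead jump straight to 512 and straight back to 0.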
>
> Perhaps the Intel boost stuff could be factored in there ? That is,
> at times when the boost freq is not reachable capacity_of() would appear
> smaller ... Unless this wants to be reflected instantaneously ?

Again, do you think Intel boost would be better reflected in
cpu_capacity_orig rather than cpu_capacity?
>
> Thoughts ?
> Quentin
>

-- 
Regards
Thara
Ionela Voinescu April 24, 2019, 3:56 p.m. UTC | #3
Hi guys,

On 23/04/2019 23:38, Thara Gopinath wrote:
> On 04/18/2019 05:48 AM, Quentin Perret wrote:
>> On Tuesday 16 Apr 2019 at 15:38:41 (-0400), Thara Gopinath wrote:
>>> diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
>>> @@ -177,6 +178,9 @@ static int cpufreq_thermal_notifier(struct notifier_block *nb,
>>>  
>>>  		if (policy->max > clipped_freq)
>>>  			cpufreq_verify_within_limits(policy, 0, clipped_freq);
>>> +
>>> +		sched_update_thermal_pressure(policy->cpus,
>>> +				policy->max, policy->cpuinfo.max_freq);
>>
>> Is this something we could do this CPUFreq ? Directly in
>> cpufreq_verify_within_limits() perhaps ?
>>
>> That would re-define the 'thermal pressure' framework in a more abstract
>> way and make the scheduler look at 'frequency capping' events,
>> regardless of the reason for capping.
>>
>> That would reflect user-defined frequency constraint into cpu_capacity,
>> in addition to the thermal stuff. I'm not sure if there is another use
>> case for frequency capping ?
> Hi Quentin,
> Thanks for the review. Sorry for the delay in response as I was on
> vacation for the past few days.
> I think there is one major difference between user-defined frequency
> constraints and frequency constraints due to thermal events in terms of
> the time period the system spends in the the constraint state.
> Typically, a user constraint lasts for seconds if not minutes and I
> think in this case cpu_capacity_orig should reflect this constraint and
> not cpu_capacity like this patch set. Also, in case of the user
> constraint, there is possibly no need to accumulate and average the
> capacity constraints and instantaneous values can be directly applied to
> cpu_capacity_orig. On the other hand thermal pressure is more spiky and
> sometimes in the order of ms and us requiring the accumulating and
> averaging.

I think we can't make any assumptions about the intentions of
the user when restricting the OPP range through the cpufreq interface,
but it would still be nice to do something, and reflecting it as thermal
pressure would be a good start. It might not be due to thermal, but it
is a capacity restriction that would have the same result. Also, if the
user has the ability to tune the decay period, they have control over
the behavior of the signal. Given that there currently isn't a smarter
mechanism (modifying capacity_orig, re-normalising the capacity range)
for long-term capping, even treating it as short-term capping is a good
start. But this is a bigger exercise and it needs thorough
consideration, so it could be skipped, in my opinion, for now.

Also, if we want to stick with the "definition", userspace would still
be able to reflect thermal pressure through the thermal limits interface
by setting the cooling device state, which will be reflected in this
update as well. So userspace would have a mechanism to reflect thermal
pressure.

One addition: I like that the thermal pressure framework is not tied to
cpufreq. There are firmware solutions that do not bother informing
cpufreq of limits being changed, and therefore all of this could be
skipped. But any firmware driver could call sched_update_thermal_pressure
on notifications for limits changing from firmware, which is an
important feature.
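
As a rough illustration of that last point, the sketch below shows a
firmware limits notification being forwarded to the scheduler hook
without going through cpufreq. It is a user-space mock-up under stated
assumptions: the event structure, the handler name and the `_stub`
function are hypothetical, and the stub takes a plain bitmask where the
hook in this series takes policy->cpus.

```c
#include <assert.h>

/* Last values seen by the stub, so the forwarding can be observed. */
unsigned long last_cpus, last_cap_khz, last_max_khz;

/*
 * Stand-in for sched_update_thermal_pressure() from this series.
 * Assumption: same argument order (cpus, capped freq, max freq).
 */
void sched_update_thermal_pressure_stub(unsigned long cpus,
					unsigned long cap_khz,
					unsigned long max_khz)
{
	last_cpus = cpus;
	last_cap_khz = cap_khz;
	last_max_khz = max_khz;
}

/* Hypothetical payload of a firmware "limits changed" notification. */
struct fw_limit_event {
	unsigned long cpus;		/* bitmask of affected CPUs */
	unsigned long new_limit_khz;	/* limit now enforced by firmware */
	unsigned long hw_max_khz;	/* highest frequency of the CPUs */
};

/* Hypothetical handler a firmware driver could register. */
void fw_limits_changed(const struct fw_limit_event *ev)
{
	assert(ev && ev->hw_max_khz > 0);

	/* Never report a cap above the hardware maximum. */
	unsigned long cap = ev->new_limit_khz < ev->hw_max_khz ?
			    ev->new_limit_khz : ev->hw_max_khz;

	sched_update_thermal_pressure_stub(ev->cpus, cap, ev->hw_max_khz);
}
```

The handler never touches a cpufreq policy, so a platform whose
firmware changes limits behind cpufreq's back could still report them.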

>>
>> Perhaps the Intel boost stuff could be factored in there ? That is,
>> at times when the boost freq is not reachable capacity_of() would appear
>> smaller ... Unless this wants to be reflected instantaneously ?
> Again, do you think intel boost is more applicable to be reflected in
> cpu_capacity_orig and not cpu_capacity?
>>
>> Thoughts ?
>> Quentin
>>
>

The changes here would happen even faster than thermal capping, same as
other restrictions imposed by firmware, so it would not seem right to me
to reflect it in capacity_orig. Reflecting it as thermal pressure is
another matter, which I'd say should be up to the client. The big
disadvantage I'd see for this is coping with decisions made while being
capped, when you're not capped any longer, and the other way around. I
believe these changes would happen too often and they will not follow
the ramp-up/ramp-down behavior that we expect from thermal mitigation.
That's why I believe averaging/regulation of the signal works well in
the thermal case, and it might not for power-related fast restrictions.

But given these three cases above, it might be that the ideal solution
is for this framework to be made more generic and for each client to be
able to obtain and configure a pressure signal to be reflected
separately in the capacity of each CPU.

My two pennies' worth,
Ionela.
Peter Zijlstra April 24, 2019, 4:47 p.m. UTC | #4
On Tue, Apr 16, 2019 at 03:38:41PM -0400, Thara Gopinath wrote:
> Enable cpufreq cooling device to update the thermal pressure in
> event of a capped maximum frequency or removal of capped maximum
> frequency.
>
> Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org>
> ---
>  drivers/thermal/cpu_cooling.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
> index 6fff161..d5cc3c3 100644
> --- a/drivers/thermal/cpu_cooling.c
> +++ b/drivers/thermal/cpu_cooling.c
> @@ -31,6 +31,7 @@
>  #include <linux/slab.h>
>  #include <linux/cpu.h>
>  #include <linux/cpu_cooling.h>
> +#include <linux/sched/thermal.h>
>  
>  #include <trace/events/thermal.h>
>  
> @@ -177,6 +178,9 @@ static int cpufreq_thermal_notifier(struct notifier_block *nb,
>  
>  		if (policy->max > clipped_freq)
>  			cpufreq_verify_within_limits(policy, 0, clipped_freq);
> +
> +		sched_update_thermal_pressure(policy->cpus,
> +				policy->max, policy->cpuinfo.max_freq);

If it's already telling the cpufreq thing, why not get it from sugov
instead?
Quentin Perret April 25, 2019, 10:45 a.m. UTC | #5
On Tuesday 23 Apr 2019 at 18:38:46 (-0400), Thara Gopinath wrote:
> I think there is one major difference between user-defined frequency
> constraints and frequency constraints due to thermal events in terms of
> the time period the system spends in the the constraint state.
> Typically, a user constraint lasts for seconds if not minutes and I
> think in this case cpu_capacity_orig should reflect this constraint and
> not cpu_capacity like this patch set.

That might not always be true I think. There's tons of userspace thermal
daemons out there, and I wouldn't be surprised if they were writing into
the cpufreq sysfs files, although I'm not sure.

Another thing is, if you want to change the capacity_orig value, you'll
need to rebuild the sched domains and all I believe. Otherwise there is
a risk to 'break' the sd_asym flags. So we need to make sure we're happy
to pay that price.

> Also, in case of the user
> constraint, there is possibly no need to accumulate and average the
> capacity constraints and instantaneous values can be directly applied to
> cpu_capacity_orig. On the other hand thermal pressure is more spiky and
> sometimes in the order of ms and us requiring the accumulating and
> averaging.
> >
> > Perhaps the Intel boost stuff could be factored in there ? That is,
> > at times when the boost freq is not reachable capacity_of() would appear
> > smaller ... Unless this wants to be reflected instantaneously ?
> Again, do you think intel boost is more applicable to be reflected in
> cpu_capacity_orig and not cpu_capacity?

I'm not even sure if we want to reflect it at all TBH, but I'd be
interested to see what Intel folks think :-)

Thanks,
Quentin
Vincent Guittot April 25, 2019, 12:04 p.m. UTC | #6
On Thu, 25 Apr 2019 at 12:45, Quentin Perret <quentin.perret@arm.com> wrote:
>
> On Tuesday 23 Apr 2019 at 18:38:46 (-0400), Thara Gopinath wrote:
> > I think there is one major difference between user-defined frequency
> > constraints and frequency constraints due to thermal events in terms of
> > the time period the system spends in the the constraint state.
> > Typically, a user constraint lasts for seconds if not minutes and I
> > think in this case cpu_capacity_orig should reflect this constraint and
> > not cpu_capacity like this patch set.
>
> That might not always be true I think. There's tons of userspace thermal
> deamons out there, and I wouldn't be suprised if they were writing into
> the cpufreq sysfs files, although I'm not sure.

They would be better off using the sysfs set_target interface of the
cpu_cooling device in this case.

>
> Another thing is, if you want to change the capacity_orig value, you'll
> need to rebuild the sched domains and all I believe. Otherwise there is
> a risk to 'break' the sd_asym flags. So we need to make sure we're happy
> to pay that price.

That would be the goal. If userspace uses the sysfs interface of
cpufreq to set a new max frequency, it should be considered a long-lived
change relative to the scheduling rate, and in this case it would be
worthwhile to update capacity_orig and rebuild the sched domains.

>
> > Also, in case of the user
> > constraint, there is possibly no need to accumulate and average the
> > capacity constraints and instantaneous values can be directly applied to
> > cpu_capacity_orig. On the other hand thermal pressure is more spiky and
> > sometimes in the order of ms and us requiring the accumulating and
> > averaging.
> > >
> > > Perhaps the Intel boost stuff could be factored in there ? That is,
> > > at times when the boost freq is not reachable capacity_of() would appear
> > > smaller ... Unless this wants to be reflected instantaneously ?
> > Again, do you think intel boost is more applicable to be reflected in
> > cpu_capacity_orig and not cpu_capacity?
>
> I'm not even sure if we want to reflect it at all TBH, but I'd be
> interested to see what Intel folks think :-)
>
> Thanks,
> Quentin
Quentin Perret April 25, 2019, 12:50 p.m. UTC | #7
On Thursday 25 Apr 2019 at 14:04:10 (+0200), Vincent Guittot wrote:
> On Thu, 25 Apr 2019 at 12:45, Quentin Perret <quentin.perret@arm.com> wrote:
> >
> > On Tuesday 23 Apr 2019 at 18:38:46 (-0400), Thara Gopinath wrote:
> > > I think there is one major difference between user-defined frequency
> > > constraints and frequency constraints due to thermal events in terms of
> > > the time period the system spends in the the constraint state.
> > > Typically, a user constraint lasts for seconds if not minutes and I
> > > think in this case cpu_capacity_orig should reflect this constraint and
> > > not cpu_capacity like this patch set.
> >
> > That might not always be true I think. There's tons of userspace thermal
> > deamons out there, and I wouldn't be suprised if they were writing into
> > the cpufreq sysfs files, although I'm not sure.
>
> They would better use the sysfs set_target interface of cpu_cooling
> device in this case.

Right

> > Another thing is, if you want to change the capacity_orig value, you'll
> > need to rebuild the sched domains and all I believe. Otherwise there is
> > a risk to 'break' the sd_asym flags. So we need to make sure we're happy
> > to pay that price.
>
> That would be the goal, if userspace uses the sysfs interface of
> cpufreq to set a new max frequency, it should be considered as a long
> change in regards to the scheduling rate and in this case it should be
> interesting to update cpacity_orig and rebuild sched_domain.

I guess as long as we don't rebuild too frequently that could work.
Perhaps we could put some rate limiting in there to enforce that. Though
we don't do it for hotplug so ... :/
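
One possible shape for such rate limiting, as a user-space sketch only:
a rebuild request is honoured if at least a minimum interval has passed
since the last accepted one. The structure, the function name and the
millisecond timestamps are assumptions for illustration; nothing like
this exists in the series.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical limiter state for sched domain rebuild requests. */
struct rebuild_limiter {
	unsigned long min_interval_ms;	/* minimum gap between rebuilds */
	unsigned long last_ms;		/* time of the last accepted rebuild */
	bool armed;			/* false until the first rebuild */
};

/* Return true if a rebuild may run now, and record it if so. */
bool rebuild_allowed(struct rebuild_limiter *rl, unsigned long now_ms)
{
	assert(rl);

	/* Reject requests that arrive inside the minimum interval. */
	if (rl->armed && now_ms - rl->last_ms < rl->min_interval_ms)
		return false;

	rl->last_ms = now_ms;
	rl->armed = true;
	return true;
}
```

A rejected request is simply dropped here. A real implementation would
more likely defer it, so that the final max frequency is eventually
reflected in capacity_orig.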
Thara Gopinath April 26, 2019, 10:24 a.m. UTC | #8
On 04/24/2019 11:56 AM, Ionela Voinescu wrote:
> Hi guys,
>
> On 23/04/2019 23:38, Thara Gopinath wrote:
>> On 04/18/2019 05:48 AM, Quentin Perret wrote:
>>> On Tuesday 16 Apr 2019 at 15:38:41 (-0400), Thara Gopinath wrote:
>>>> diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
>>>> @@ -177,6 +178,9 @@ static int cpufreq_thermal_notifier(struct notifier_block *nb,
>>>>  
>>>>  		if (policy->max > clipped_freq)
>>>>  			cpufreq_verify_within_limits(policy, 0, clipped_freq);
>>>> +
>>>> +		sched_update_thermal_pressure(policy->cpus,
>>>> +				policy->max, policy->cpuinfo.max_freq);
>>>
>>> Is this something we could do this CPUFreq ? Directly in
>>> cpufreq_verify_within_limits() perhaps ?
>>>
>>> That would re-define the 'thermal pressure' framework in a more abstract
>>> way and make the scheduler look at 'frequency capping' events,
>>> regardless of the reason for capping.
>>>
>>> That would reflect user-defined frequency constraint into cpu_capacity,
>>> in addition to the thermal stuff. I'm not sure if there is another use
>>> case for frequency capping ?
>> Hi Quentin,
>> Thanks for the review. Sorry for the delay in response as I was on
>> vacation for the past few days.
>> I think there is one major difference between user-defined frequency
>> constraints and frequency constraints due to thermal events in terms of
>> the time period the system spends in the the constraint state.
>> Typically, a user constraint lasts for seconds if not minutes and I
>> think in this case cpu_capacity_orig should reflect this constraint and
>> not cpu_capacity like this patch set. Also, in case of the user
>> constraint, there is possibly no need to accumulate and average the
>> capacity constraints and instantaneous values can be directly applied to
>> cpu_capacity_orig. On the other hand thermal pressure is more spiky and
>> sometimes in the order of ms and us requiring the accumulating and
>> averaging.
>
> I think we can't make any assumptions in regards to the intentions of
> the user when restricting the OPP range though the cpufreq interface,
> but it would still be nice to do something and reflecting it as thermal
> pressure would be a good start. It might not be due to thermal, but it
> is a capacity restriction that would have the same result. Also, if the
> user has the ability to tune the decay period he has the control over
> the behavior of the signal. Given that currently there isn't a smarter
> mechanism (modifying capacity orig, re-normalising the capacity range)
> for long-term capping, even treating it as short-term capping is a good
> start. But this is a bigger exercise and it needs thorough
> consideration, so it could be skipped, in my opinion, for now..
>
> Also, if we want to stick with the "definition", userspace would still
> be able to reflect thermal pressure though the thermal limits interface
> by setting the cooling device state, which will be reflected in this
> update as well. So userspace would have a mechanism to reflect thermal
> pressure.

Yes, target_state under cooling devices can be set and this will be
reflected as thermal pressure.

>
> One addition.. I like that the thermal pressure framework is not tied to
> cpufreq. There are firmware solutions that do not bother informing
> cpufreq of limits being changed, and therefore all of this could be
> skipped. But any firmware driver could call sched_update_thermal_pressure
> on notifications for limits changing from firmware, which is an
> important feature.

I am open to discussion on the best place to call
sched_update_thermal_pressure from. Seeing the discussion and the
different opinions, I am wondering whether there should be a SoC or
platform specific hook provided for better abstraction.

Regards
Thara

>
>>>
>>> Perhaps the Intel boost stuff could be factored in there ? That is,
>>> at times when the boost freq is not reachable capacity_of() would appear
>>> smaller ... Unless this wants to be reflected instantaneously ?
>> Again, do you think intel boost is more applicable to be reflected in
>> cpu_capacity_orig and not cpu_capacity?
>>>
>>> Thoughts ?
>>> Quentin
>>>
>>
>
> The changes here would happen even faster than thermal capping, same as
> other restrictions imposed by firmware, so it would not seem right to me
> to reflect it in capacity_orig. Reflecting it as thermal pressure is
> another matter, which I'd say it should be up to the client. The big
> disadvantage I'd see for this is coping with decisions made while being
> capped, when you're not capped any longer, and the other way around. I
> believe these changes would happen too often and they will not happen in
> a ramp-up/ramp-down behavior that we expect from thermal mitigation.
> That's why I believe averaging/regulation of the signal works well in
> this case, and it might not for power related fast restrictions.
>
> But given these three cases above, it might be that the ideal solution
> is for this framework to be made more generic and for each client to be
> able to obtain and configure a pressure signal to be reflected
> separately in the capacity of each CPU.
>
> My two pennies' worth,
> Ionela.
>


-- 
Regards
Thara
Thara Gopinath April 26, 2019, 1:47 p.m. UTC | #9
On 04/25/2019 06:45 AM, Quentin Perret wrote:
> On Tuesday 23 Apr 2019 at 18:38:46 (-0400), Thara Gopinath wrote:
>> I think there is one major difference between user-defined frequency
>> constraints and frequency constraints due to thermal events in terms of
>> the time period the system spends in the the constraint state.
>> Typically, a user constraint lasts for seconds if not minutes and I
>> think in this case cpu_capacity_orig should reflect this constraint and
>> not cpu_capacity like this patch set.
>
> That might not always be true I think. There's tons of userspace thermal
> deamons out there, and I wouldn't be suprised if they were writing into
> the cpufreq sysfs files, although I'm not sure.
>
> Another thing is, if you want to change the capacity_orig value, you'll
> need to rebuild the sched domains and all I believe. Otherwise there is
> a risk to 'break' the sd_asym flags. So we need to make sure we're happy
> to pay that price.

Hi Quentin,

I saw Vincent's reply on this and my answer is similar. I completely
agree that this will involve a rebuild of sched domains. My thought on
capping the cpufreq max frequency from user space is that the capping
is on a long-term basis, and hence we could live with rebuilding sched
domains. If user space wants to control the max frequency of a cpu for
thermal reasons, then the cooling device sysfs interface should be used.
In practical scenarios, I am interested in knowing why thermal daemons
control cpufreq sysfs files instead of cooling device files.

Regards
Thara

>
>> Also, in case of the user
>> constraint, there is possibly no need to accumulate and average the
>> capacity constraints and instantaneous values can be directly applied to
>> cpu_capacity_orig. On the other hand thermal pressure is more spiky and
>> sometimes in the order of ms and us requiring the accumulating and
>> averaging.
>>>
>>> Perhaps the Intel boost stuff could be factored in there ? That is,
>>> at times when the boost freq is not reachable capacity_of() would appear
>>> smaller ... Unless this wants to be reflected instantaneously ?
>> Again, do you think intel boost is more applicable to be reflected in
>> cpu_capacity_orig and not cpu_capacity?
>
> I'm not even sure if we want to reflect it at all TBH, but I'd be
> interested to see what Intel folks think :-)
>
> Thanks,
> Quentin
>


-- 
Regards
Thara
Patch

diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
index 6fff161..d5cc3c3 100644
--- a/drivers/thermal/cpu_cooling.c
+++ b/drivers/thermal/cpu_cooling.c
@@ -31,6 +31,7 @@ 
 #include <linux/slab.h>
 #include <linux/cpu.h>
 #include <linux/cpu_cooling.h>
+#include <linux/sched/thermal.h>
 
 #include <trace/events/thermal.h>
 
@@ -177,6 +178,9 @@  static int cpufreq_thermal_notifier(struct notifier_block *nb,
 
 		if (policy->max > clipped_freq)
 			cpufreq_verify_within_limits(policy, 0, clipped_freq);
+
+		sched_update_thermal_pressure(policy->cpus,
+				policy->max, policy->cpuinfo.max_freq);
 		break;
 	}
 	mutex_unlock(&cooling_list_lock);
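
The hunk above hands the scheduler a (capped frequency, maximum
frequency) pair. How that pair becomes a capacity reduction is defined
elsewhere in the series and is not shown on this page; the sketch below
is only one plausible mapping, assuming linear scaling against a
capacity scale of 1024. The macro and function names are hypothetical.

```c
#include <assert.h>

/* Assumed full-capacity value for the illustration. */
#define CAPACITY_SCALE 1024UL

/*
 * Hypothetical helper: capacity lost when the maximum frequency is
 * capped to cap_freq out of max_freq (same unit for both, e.g. kHz).
 */
unsigned long capped_pressure(unsigned long cap_freq, unsigned long max_freq)
{
	assert(max_freq > 0);

	/* No cap, or a cap at or above the maximum: no pressure. */
	if (cap_freq >= max_freq)
		return 0;

	return CAPACITY_SCALE * (max_freq - cap_freq) / max_freq;
}
```

Under these assumptions, capping a 2 GHz CPU to 1 GHz removes half of
its capacity, and removing the cap brings the pressure back to zero,
which matches the two events named in the commit message.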