[v4,6/6] sched: thermal: Enable tuning of decay period

Message ID 1571776465-29763-7-git-send-email-thara.gopinath@linaro.org
State New
Headers show
Series
  • Introduce Thermal Pressure
Related show

Commit Message

Thara Gopinath Oct. 22, 2019, 8:34 p.m.
Thermal pressure follows pelt signas which means the
decay period for thermal pressure is the default pelt
decay period. Depending on soc charecteristics and thermal
activity, it might be beneficial to decay thermal pressure
slower, but still in-tune with the pelt signals.
One way to achieve this is to provide a command line parameter
to set the decay coefficient to an integer between 0 and 10.

Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org>

---
v3->v4:
	- Removed the sysctl setting to tune decay period and instead
	  introduced a command line parameter to control it. The rationale
	  here being changing decay period of a PELT signal runtime can
	  result in a skewed average value for atleast some cycles.

 Documentation/admin-guide/kernel-parameters.txt |  5 +++++
 kernel/sched/thermal.c                          | 25 ++++++++++++++++++++++++-
 2 files changed, 29 insertions(+), 1 deletion(-)

-- 
2.1.4

Comments

Peter Zijlstra Oct. 28, 2019, 3:42 p.m. | #1
On Tue, Oct 22, 2019 at 04:34:25PM -0400, Thara Gopinath wrote:
> Thermal pressure follows pelt signas which means the

> decay period for thermal pressure is the default pelt

> decay period. Depending on soc charecteristics and thermal

> activity, it might be beneficial to decay thermal pressure

> slower, but still in-tune with the pelt signals.

> One way to achieve this is to provide a command line parameter

> to set the decay coefficient to an integer between 0 and 10.

> 

> Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org>

> ---

> v3->v4:

> 	- Removed the sysctl setting to tune decay period and instead

> 	  introduced a command line parameter to control it. The rationale

> 	  here being changing decay period of a PELT signal runtime can

> 	  result in a skewed average value for atleast some cycles.


TBH, I don't care too much about that. If you touch a knob, you'd better
know what it does anyway.

>  void trigger_thermal_pressure_average(struct rq *rq)

>  {

> -	update_thermal_load_avg(rq_clock_task(rq), rq,

> +	update_thermal_load_avg(rq_clock_task(rq) >>

> +				sched_thermal_decay_coeff, rq,


You see, 'coefficient' means 'multiplicative factor', but what we have
here is a negative exponent. More specifically it is a power-of-2
exponent, and we typically call them '_shift', given how we use them
with 'shift' operators.
Ionela Voinescu Nov. 4, 2019, 4:12 p.m. | #2
Hi Thara,

On Tuesday 22 Oct 2019 at 16:34:25 (-0400), Thara Gopinath wrote:
> Thermal pressure follows pelt signas which means the

> decay period for thermal pressure is the default pelt

> decay period. Depending on soc charecteristics and thermal

> activity, it might be beneficial to decay thermal pressure

> slower, but still in-tune with the pelt signals.


I wonder if it can be beneficial to decay thermal pressure faster as
well.

This implementation makes 32 (LOAD_AVG_PERIOD) the minimum half-life
of the thermal pressure samples. This results in more than 100ms for a
sample to decay significantly and therefore let's say it can take more
than 100ms for capacity to return to (close to) max when the CPU is no
longer capped. This value seems high to me considering that a minimum
value should result in close to 'instantaneous' behaviour, when there
are thermal capping mechanisms that can react in ~20ms (hikey960 has a
polling delay of 25ms, if I'm remembering correctly).

I agree 32ms seems like a good default but given that you've made this
configurable as to give users options, I'm wondering if it would be
better to cover a wider range.

> One way to achieve this is to provide a command line parameter

> to set the decay coefficient to an integer between 0 and 10.

> 

> Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org>

> ---

> v3->v4:

> 	- Removed the sysctl setting to tune decay period and instead

> 	  introduced a command line parameter to control it. The rationale

> 	  here being changing decay period of a PELT signal runtime can

> 	  result in a skewed average value for atleast some cycles.

> 

>  Documentation/admin-guide/kernel-parameters.txt |  5 +++++

>  kernel/sched/thermal.c                          | 25 ++++++++++++++++++++++++-

>  2 files changed, 29 insertions(+), 1 deletion(-)

> 

> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt

> index a84a83f..61d7baa 100644

> --- a/Documentation/admin-guide/kernel-parameters.txt

> +++ b/Documentation/admin-guide/kernel-parameters.txt

> @@ -4273,6 +4273,11 @@

>  			incurs a small amount of overhead in the scheduler

>  			but is useful for debugging and performance tuning.

>  

> +	sched_thermal_decay_coeff=

> +			[KNL, SMP] Set decay coefficient for thermal pressure signal.

> +			Format: integer betweer 0 and 10

> +			Default is 0.

> +

>  	skew_tick=	[KNL] Offset the periodic timer tick per cpu to mitigate

>  			xtime_lock contention on larger systems, and/or RCU lock

>  			contention on all systems with CONFIG_MAXSMP set.

> diff --git a/kernel/sched/thermal.c b/kernel/sched/thermal.c

> index 0c84960..0da31e1 100644

> --- a/kernel/sched/thermal.c

> +++ b/kernel/sched/thermal.c

> @@ -10,6 +10,28 @@

>  #include "pelt.h"

>  #include "thermal.h"

>  

> +/**

> + * By default the decay is the default pelt decay period.

> + * The decay coefficient can change is decay period in

> + * multiples of 32.


This description has to be corrected as well, as per Peter's comment.

Also, it might be good not to use the value 32 directly but to mention
that the decay period is a shift of LOAD_AVG_PERIOD. If that changes,
the translation from decay shift to decay period below will change as
well.

Thank you,
Ionela.

> + *   Decay coefficient    Decay period(ms)

> + *	0			32

> + *	1			64

> + *	2			128

> + *	3			256

> + *	4			512

> + */

> +static int sched_thermal_decay_coeff;

> +

> +static int __init setup_sched_thermal_decay_coeff(char *str)

> +{

> +	if (kstrtoint(str, 0, &sched_thermal_decay_coeff))

> +		pr_warn("Unable to set scheduler thermal pressure decay coefficient\n");

> +

> +	return 1;

> +}

> +__setup("sched_thermal_decay_coeff=", setup_sched_thermal_decay_coeff);

> +

>  static DEFINE_PER_CPU(unsigned long, delta_capacity);

>  

>  /**

> @@ -40,6 +62,7 @@ void update_thermal_pressure(int cpu, u64 capped_freq_ratio)

>   */

>  void trigger_thermal_pressure_average(struct rq *rq)

>  {

> -	update_thermal_load_avg(rq_clock_task(rq), rq,

> +	update_thermal_load_avg(rq_clock_task(rq) >>

> +				sched_thermal_decay_coeff, rq,

>  				per_cpu(delta_capacity, cpu_of(rq)));

>  }

> -- 

> 2.1.4

>
Thara Gopinath Nov. 5, 2019, 8:26 p.m. | #3
On 11/04/2019 11:12 AM, Ionela Voinescu wrote:
> Hi Thara,

> 

> On Tuesday 22 Oct 2019 at 16:34:25 (-0400), Thara Gopinath wrote:

>> Thermal pressure follows pelt signas which means the

>> decay period for thermal pressure is the default pelt

>> decay period. Depending on soc charecteristics and thermal

>> activity, it might be beneficial to decay thermal pressure

>> slower, but still in-tune with the pelt signals.

> 

> I wonder if it can be beneficial to decay thermal pressure faster as

> well.

> 

> This implementation makes 32 (LOAD_AVG_PERIOD) the minimum half-life

> of the thermal pressure samples. This results in more than 100ms for a

> sample to decay significantly and therefore let's say it can take more

> than 100ms for capacity to return to (close to) max when the CPU is no

> longer capped. This value seems high to me considering that a minimum

> value should result in close to 'instantaneous' behaviour, when there

> are thermal capping mechanisms that can react in ~20ms (hikey960 has a

> polling delay of 25ms, if I'm remembering correctly).

> 

> I agree 32ms seems like a good default but given that you've made this

> configurable as to give users options, I'm wondering if it would be

> better to cover a wider range.

> 

>> One way to achieve this is to provide a command line parameter

>> to set the decay coefficient to an integer between 0 and 10.

>>

>> Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org>

>> ---

>> v3->v4:

>> 	- Removed the sysctl setting to tune decay period and instead

>> 	  introduced a command line parameter to control it. The rationale

>> 	  here being changing decay period of a PELT signal runtime can

>> 	  result in a skewed average value for atleast some cycles.

>>

>>  Documentation/admin-guide/kernel-parameters.txt |  5 +++++

>>  kernel/sched/thermal.c                          | 25 ++++++++++++++++++++++++-

>>  2 files changed, 29 insertions(+), 1 deletion(-)

>>

>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt

>> index a84a83f..61d7baa 100644

>> --- a/Documentation/admin-guide/kernel-parameters.txt

>> +++ b/Documentation/admin-guide/kernel-parameters.txt

>> @@ -4273,6 +4273,11 @@

>>  			incurs a small amount of overhead in the scheduler

>>  			but is useful for debugging and performance tuning.

>>  

>> +	sched_thermal_decay_coeff=

>> +			[KNL, SMP] Set decay coefficient for thermal pressure signal.

>> +			Format: integer betweer 0 and 10

>> +			Default is 0.

>> +

>>  	skew_tick=	[KNL] Offset the periodic timer tick per cpu to mitigate

>>  			xtime_lock contention on larger systems, and/or RCU lock

>>  			contention on all systems with CONFIG_MAXSMP set.

>> diff --git a/kernel/sched/thermal.c b/kernel/sched/thermal.c

>> index 0c84960..0da31e1 100644

>> --- a/kernel/sched/thermal.c

>> +++ b/kernel/sched/thermal.c

>> @@ -10,6 +10,28 @@

>>  #include "pelt.h"

>>  #include "thermal.h"

>>  

>> +/**

>> + * By default the decay is the default pelt decay period.

>> + * The decay coefficient can change is decay period in

>> + * multiples of 32.

> 

> This description has to be corrected as well, as per Peter's comment.

> 

> Also, it might be good not to use the value 32 directly but to mention

> that the decay period is a shift of LOAD_AVG_PERIOD. If that changes,

> the translation from decay shift to decay period below will change as

> well.


Hi Ionela,

I sent out the v5 without fixing this. Even if there are no other
comments on v5 I will send out a v6 fixing this.

Regarding a slower decay, we need a strong case for it.



-- 
Warm Regards
Thara
Ionela Voinescu Nov. 5, 2019, 9:29 p.m. | #4
On Tuesday 05 Nov 2019 at 15:26:06 (-0500), Thara Gopinath wrote:
> On 11/04/2019 11:12 AM, Ionela Voinescu wrote:

> > Hi Thara,

> > 

> > On Tuesday 22 Oct 2019 at 16:34:25 (-0400), Thara Gopinath wrote:

> >> Thermal pressure follows pelt signas which means the

> >> decay period for thermal pressure is the default pelt

> >> decay period. Depending on soc charecteristics and thermal

> >> activity, it might be beneficial to decay thermal pressure

> >> slower, but still in-tune with the pelt signals.

> > 

> > I wonder if it can be beneficial to decay thermal pressure faster as

> > well.

> > 

> > This implementation makes 32 (LOAD_AVG_PERIOD) the minimum half-life

> > of the thermal pressure samples. This results in more than 100ms for a

> > sample to decay significantly and therefore let's say it can take more

> > than 100ms for capacity to return to (close to) max when the CPU is no

> > longer capped. This value seems high to me considering that a minimum

> > value should result in close to 'instantaneous' behaviour, when there

> > are thermal capping mechanisms that can react in ~20ms (hikey960 has a

> > polling delay of 25ms, if I'm remembering correctly).

> > 

> > I agree 32ms seems like a good default but given that you've made this

> > configurable as to give users options, I'm wondering if it would be

> > better to cover a wider range.

> >


[...]

> 

> Hi Ionela,

>


[...]

> 

> Regarding a slower decay, we need a strong case for it.

> 

>


I think you mean faster decay, if you refer to my comment above.

To be blunt, I'm not sure there is a strong case for either kind of
dacay, if we look at the test results only. There is a theoretical case
for both, in my opinion and given that the purpose of this patch is to
give options to platform with different thermal characteristics, I do
believe it's worth providing a good range of options for the decay
period.

Thanks,
Ionela.

> 

> -- 

> Warm Regards

> Thara

Patch

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a84a83f..61d7baa 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4273,6 +4273,11 @@ 
 			incurs a small amount of overhead in the scheduler
 			but is useful for debugging and performance tuning.
 
+	sched_thermal_decay_coeff=
+			[KNL, SMP] Set decay coefficient for thermal pressure signal.
+			Format: integer betweer 0 and 10
+			Default is 0.
+
 	skew_tick=	[KNL] Offset the periodic timer tick per cpu to mitigate
 			xtime_lock contention on larger systems, and/or RCU lock
 			contention on all systems with CONFIG_MAXSMP set.
diff --git a/kernel/sched/thermal.c b/kernel/sched/thermal.c
index 0c84960..0da31e1 100644
--- a/kernel/sched/thermal.c
+++ b/kernel/sched/thermal.c
@@ -10,6 +10,28 @@ 
 #include "pelt.h"
 #include "thermal.h"
 
+/**
+ * By default the decay is the default pelt decay period.
+ * The decay coefficient can change is decay period in
+ * multiples of 32.
+ *   Decay coefficient    Decay period(ms)
+ *	0			32
+ *	1			64
+ *	2			128
+ *	3			256
+ *	4			512
+ */
+static int sched_thermal_decay_coeff;
+
+static int __init setup_sched_thermal_decay_coeff(char *str)
+{
+	if (kstrtoint(str, 0, &sched_thermal_decay_coeff))
+		pr_warn("Unable to set scheduler thermal pressure decay coefficient\n");
+
+	return 1;
+}
+__setup("sched_thermal_decay_coeff=", setup_sched_thermal_decay_coeff);
+
 static DEFINE_PER_CPU(unsigned long, delta_capacity);
 
 /**
@@ -40,6 +62,7 @@  void update_thermal_pressure(int cpu, u64 capped_freq_ratio)
  */
 void trigger_thermal_pressure_average(struct rq *rq)
 {
-	update_thermal_load_avg(rq_clock_task(rq), rq,
+	update_thermal_load_avg(rq_clock_task(rq) >>
+				sched_thermal_decay_coeff, rq,
 				per_cpu(delta_capacity, cpu_of(rq)));
 }