[v4,2/6] sched: Add infrastructure to store and update instantaneous thermal pressure

Message ID 1571776465-29763-3-git-send-email-thara.gopinath@linaro.org
State New
Headers show
Series
  • Introduce Thermal Pressure
Related show

Commit Message

Thara Gopinath Oct. 22, 2019, 8:34 p.m.
Add thermal.c and thermal.h files that provides interface
APIs to initialize, update/average, track, accumulate and decay
thermal pressure per cpu basis. A per cpu variable delta_capacity is
introduced to keep track of instantaneous per cpu thermal pressure.
Thermal pressure is the delta between maximum capacity and capped
capacity due to a thermal event.
API trigger_thermal_pressure_average is called for periodic accumulate
and decay of the thermal pressure. It is to to be called from a
periodic tick function. This API passes on the instantaneous delta
capacity of a cpu to update_thermal_load_avg to do the necessary
accumulate, decay and average. 
API update_thermal_pressure is for the system to update the thermal
pressure by providing a capped frequency ratio.
Considering, trigger_thermal_pressure_average reads delta_capacity and
update_thermal_pressure writes into delta_capacity, one can argue for
some sort of locking mechanism to avoid a stale value.
But considering trigger_thermal_pressure_average can be called from a
system critical path like scheduler tick function, a locking mechanism
is not ideal. This means that it is possible the delta_capacity value
used to calculate average thermal pressure for a cpu can be
stale for upto 1 tick period.

Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org>

---
v3->v4:
	- Dropped per cpu max_capacity_info struct and instead added a per
	  delta_capacity variable to store the delta between maximum
	  capacity and capped capacity. The delta is now calculated when
	  thermal pressure is updated and not every tick.
	- Dropped populate_max_capacity_info api as only per cpu delta
	  capacity is stored.
	- Renamed update_periodic_maxcap to
	  trigger_thermal_pressure_average and update_maxcap_capacity to
	  update_thermal_pressure.

 include/linux/sched.h  |  8 ++++++++
 kernel/sched/Makefile  |  2 +-
 kernel/sched/thermal.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
 kernel/sched/thermal.h | 13 +++++++++++++
 4 files changed, 67 insertions(+), 1 deletion(-)
 create mode 100644 kernel/sched/thermal.c
 create mode 100644 kernel/sched/thermal.h

-- 
2.1.4

Comments

Peter Zijlstra Oct. 28, 2019, 3:21 p.m. | #1
On Tue, Oct 22, 2019 at 04:34:21PM -0400, Thara Gopinath wrote:
> Add thermal.c and thermal.h files that provides interface

> APIs to initialize, update/average, track, accumulate and decay

> thermal pressure per cpu basis. A per cpu variable delta_capacity is

> introduced to keep track of instantaneous per cpu thermal pressure.

> Thermal pressure is the delta between maximum capacity and capped

> capacity due to a thermal event.


> API trigger_thermal_pressure_average is called for periodic accumulate

> and decay of the thermal pressure. It is to to be called from a

> periodic tick function. This API passes on the instantaneous delta

> capacity of a cpu to update_thermal_load_avg to do the necessary

> accumulate, decay and average.


> API update_thermal_pressure is for the system to update the thermal

> pressure by providing a capped frequency ratio.


> Considering, trigger_thermal_pressure_average reads delta_capacity and

> update_thermal_pressure writes into delta_capacity, one can argue for

> some sort of locking mechanism to avoid a stale value.


> But considering trigger_thermal_pressure_average can be called from a

> system critical path like scheduler tick function, a locking mechanism

> is not ideal. This means that it is possible the delta_capacity value

> used to calculate average thermal pressure for a cpu can be

> stale for upto 1 tick period.


Please use a blank line at the end of a paragraph.

> Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org>

> ---


>  include/linux/sched.h  |  8 ++++++++

>  kernel/sched/Makefile  |  2 +-

>  kernel/sched/thermal.c | 45 +++++++++++++++++++++++++++++++++++++++++++++

>  kernel/sched/thermal.h | 13 +++++++++++++

>  4 files changed, 67 insertions(+), 1 deletion(-)

>  create mode 100644 kernel/sched/thermal.c

>  create mode 100644 kernel/sched/thermal.h


These are some tiny files, do these functions really need their own
little files?


> diff --git a/kernel/sched/thermal.c b/kernel/sched/thermal.c

> new file mode 100644

> index 0000000..0c84960

> --- /dev/null

> +++ b/kernel/sched/thermal.c

> @@ -0,0 +1,45 @@

> +// SPDX-License-Identifier: GPL-2.0

> +/*

> + * Scheduler Thermal Interactions

> + *

> + *  Copyright (C) 2018 Linaro, Inc., Thara Gopinath <thara.gopinath@linaro.org>

> + */

> +

> +#include <linux/sched.h>

> +#include "sched.h"

> +#include "pelt.h"

> +#include "thermal.h"

> +

> +static DEFINE_PER_CPU(unsigned long, delta_capacity);

> +

> +/**

> + * update_thermal_pressure: Update thermal pressure

> + * @cpu: the cpu for which thermal pressure is to be updated for

> + * @capped_freq_ratio: capped max frequency << SCHED_CAPACITY_SHIFT / max freq

> + *

> + * capped_freq_ratio is normalized into capped capacity and the delta between

> + * the arch_scale_cpu_capacity and capped capacity is stored in per cpu

> + * delta_capacity.

> + */

> +void update_thermal_pressure(int cpu, u64 capped_freq_ratio)

> +{

> +	unsigned long __capacity, delta;

> +

> +	/* Normalize the capped freq ratio */

> +	__capacity = (capped_freq_ratio * arch_scale_cpu_capacity(cpu)) >>

> +							SCHED_CAPACITY_SHIFT;

> +	delta = arch_scale_cpu_capacity(cpu) -  __capacity;

> +	pr_debug("updating cpu%d thermal pressure to %lu\n", cpu, delta);


Surely we can do without the pr_debug() here?

> +	per_cpu(delta_capacity, cpu) = delta;

> +}
Thara Gopinath Oct. 30, 2019, 9:37 p.m. | #2
Hello Peter,
Thanks for the review.


On 10/28/2019 11:21 AM, Peter Zijlstra wrote:
> On Tue, Oct 22, 2019 at 04:34:21PM -0400, Thara Gopinath wrote:

>> Add thermal.c and thermal.h files that provides interface

>> APIs to initialize, update/average, track, accumulate and decay

>> thermal pressure per cpu basis. A per cpu variable delta_capacity is

>> introduced to keep track of instantaneous per cpu thermal pressure.

>> Thermal pressure is the delta between maximum capacity and capped

>> capacity due to a thermal event.

> 

>> API trigger_thermal_pressure_average is called for periodic accumulate

>> and decay of the thermal pressure. It is to to be called from a

>> periodic tick function. This API passes on the instantaneous delta

>> capacity of a cpu to update_thermal_load_avg to do the necessary

>> accumulate, decay and average.

> 

>> API update_thermal_pressure is for the system to update the thermal

>> pressure by providing a capped frequency ratio.

> 

>> Considering, trigger_thermal_pressure_average reads delta_capacity and

>> update_thermal_pressure writes into delta_capacity, one can argue for

>> some sort of locking mechanism to avoid a stale value.

> 

>> But considering trigger_thermal_pressure_average can be called from a

>> system critical path like scheduler tick function, a locking mechanism

>> is not ideal. This means that it is possible the delta_capacity value

>> used to calculate average thermal pressure for a cpu can be

>> stale for upto 1 tick period.

> 

> Please use a blank line at the end of a paragraph.


Will do

> 

>> Signed-off-by: Thara Gopinath <thara.gopinath@linaro.org>

>> ---

> 

>>  include/linux/sched.h  |  8 ++++++++

>>  kernel/sched/Makefile  |  2 +-

>>  kernel/sched/thermal.c | 45 +++++++++++++++++++++++++++++++++++++++++++++

>>  kernel/sched/thermal.h | 13 +++++++++++++

>>  4 files changed, 67 insertions(+), 1 deletion(-)

>>  create mode 100644 kernel/sched/thermal.c

>>  create mode 100644 kernel/sched/thermal.h

> 

> These are some tiny files, do these functions really need their own

> little files?


I will merge them  into fair.c. There will be update_thermal_pressure
that will be called from cpu_cooling or other relevant framework.
> 

> 

>> diff --git a/kernel/sched/thermal.c b/kernel/sched/thermal.c

>> new file mode 100644

>> index 0000000..0c84960

>> --- /dev/null

>> +++ b/kernel/sched/thermal.c

>> @@ -0,0 +1,45 @@

>> +// SPDX-License-Identifier: GPL-2.0

>> +/*

>> + * Scheduler Thermal Interactions

>> + *

>> + *  Copyright (C) 2018 Linaro, Inc., Thara Gopinath <thara.gopinath@linaro.org>

>> + */

>> +

>> +#include <linux/sched.h>

>> +#include "sched.h"

>> +#include "pelt.h"

>> +#include "thermal.h"

>> +

>> +static DEFINE_PER_CPU(unsigned long, delta_capacity);

>> +

>> +/**

>> + * update_thermal_pressure: Update thermal pressure

>> + * @cpu: the cpu for which thermal pressure is to be updated for

>> + * @capped_freq_ratio: capped max frequency << SCHED_CAPACITY_SHIFT / max freq

>> + *

>> + * capped_freq_ratio is normalized into capped capacity and the delta between

>> + * the arch_scale_cpu_capacity and capped capacity is stored in per cpu

>> + * delta_capacity.

>> + */

>> +void update_thermal_pressure(int cpu, u64 capped_freq_ratio)

>> +{

>> +	unsigned long __capacity, delta;

>> +

>> +	/* Normalize the capped freq ratio */

>> +	__capacity = (capped_freq_ratio * arch_scale_cpu_capacity(cpu)) >>

>> +							SCHED_CAPACITY_SHIFT;

>> +	delta = arch_scale_cpu_capacity(cpu) -  __capacity;

>> +	pr_debug("updating cpu%d thermal pressure to %lu\n", cpu, delta);

> 

> Surely we can do without the pr_debug() here?


Will remove. I had it while developing to check if the thermal pressure
is calculated correct.
> 

>> +	per_cpu(delta_capacity, cpu) = delta;

>> +}



-- 
Warm Regards
Thara
Dietmar Eggemann Nov. 1, 2019, 12:17 p.m. | #3
On 22.10.19 22:34, Thara Gopinath wrote:

[...]

> +/**

> + * trigger_thermal_pressure_average: Trigger the thermal pressure accumulate

> + *				     and average algorithm

> + */

> +void trigger_thermal_pressure_average(struct rq *rq)

> +{

> +	update_thermal_load_avg(rq_clock_task(rq), rq,

> +				per_cpu(delta_capacity, cpu_of(rq)));

> +}


Why not call update_thermal_load_avg() directly in fair.c? We do this for all
the other update_foo_load_avg() functions (foo eq. irq, rt_rq, dl_rq ...)

You don't have to pass 'u64 now', so you can hide it plus the 
sched_thermal_decay_coeff add-on within update_thermal_load_avg().
(Similar to update_irq_load_avg()).

You could even hide 'u64 capacity' in it.

So we save one function layer (trigger_thermal_pressure_average()), thermal becomes
more aligned with the other PELT users when it comes to call-sites and we have less
code for this feature.

Something like this (only for 'u64 now' and only compile tested on arm64:

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index be3e802a2dc5..ac3ec3a04469 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7576,7 +7576,7 @@ static void update_blocked_averages(int cpu)
 
        update_blocked_load_status(rq, !done);
 
-       trigger_thermal_pressure_average(rq);
+       update_thermal_load_avg(rq, per_cpu(delta_capacity, cpu_of(rq)));
        rq_unlock_irqrestore(rq, &rf);
 }
 
@@ -9938,7 +9938,7 @@ static void task_tick_fair(struct rq *rq, struct task_struct *curr, int queued)
        update_misfit_status(curr, rq);
        update_overutilized_status(task_rq(curr));
 
-       trigger_thermal_pressure_average(rq);
+       update_thermal_load_avg(rq, per_cpu(delta_capacity, cpu_of(rq)));
 }
 
 /*
diff --git a/kernel/sched/pelt.c b/kernel/sched/pelt.c
index 38210691c615..7dd0d7e43854 100644
--- a/kernel/sched/pelt.c
+++ b/kernel/sched/pelt.c
@@ -353,8 +353,12 @@ int update_dl_rq_load_avg(u64 now, struct rq *rq, int running)
        return 0;
 }
 
-int update_thermal_load_avg(u64 now, struct rq *rq, u64 capacity)
+extern int sched_thermal_decay_coeff;
+
+int update_thermal_load_avg(struct rq *rq, u64 capacity)
 {
+       u64 now = rq_clock_task(rq) >> sched_thermal_decay_coeff;
+
        if (___update_load_sum(now, &rq->avg_thermal,
                               capacity,
                               capacity,
diff --git a/kernel/sched/pelt.h b/kernel/sched/pelt.h
index c74226de716e..91483c957b6c 100644
--- a/kernel/sched/pelt.h
+++ b/kernel/sched/pelt.h
@@ -6,7 +6,7 @@ int __update_load_avg_se(u64 now, struct cfs_rq *cfs_rq, struct sched_entity *se
 int __update_load_avg_cfs_rq(u64 now, struct cfs_rq *cfs_rq);
 int update_rt_rq_load_avg(u64 now, struct rq *rq, int running);
 int update_dl_rq_load_avg(u64 now, struct rq *rq, int running);
-int update_thermal_load_avg(u64 now, struct rq *rq, u64 capacity);
+int update_thermal_load_avg(struct rq *rq, u64 capacity);
 
 #ifdef CONFIG_HAVE_SCHED_AVG_IRQ
 int update_irq_load_avg(struct rq *rq, u64 running);
diff --git a/kernel/sched/thermal.c b/kernel/sched/thermal.c
index 0da31e12a5ff..7b8c4e35c28d 100644
--- a/kernel/sched/thermal.c
+++ b/kernel/sched/thermal.c
@@ -21,7 +21,7 @@
  *     3                       256
  *     4                       512
  */
-static int sched_thermal_decay_coeff;
+int sched_thermal_decay_coeff;
 
 static int __init setup_sched_thermal_decay_coeff(char *str)
 {
@@ -32,7 +32,7 @@ static int __init setup_sched_thermal_decay_coeff(char *str)
 }
 __setup("sched_thermal_decay_coeff=", setup_sched_thermal_decay_coeff);
 
-static DEFINE_PER_CPU(unsigned long, delta_capacity);
+DEFINE_PER_CPU(unsigned long, delta_capacity);
 
 /**
  * update_thermal_pressure: Update thermal pressure
@@ -55,14 +55,3 @@ void update_thermal_pressure(int cpu, u64 capped_freq_ratio)
 
        per_cpu(delta_capacity, cpu) = delta;
 }
-
-/**
- * trigger_thermal_pressure_average: Trigger the thermal pressure accumulate
- *                                  and average algorithm
- */
-void trigger_thermal_pressure_average(struct rq *rq)
-{
-       update_thermal_load_avg(rq_clock_task(rq) >>
-                               sched_thermal_decay_coeff, rq,
-                               per_cpu(delta_capacity, cpu_of(rq)));
-}
diff --git a/kernel/sched/thermal.h b/kernel/sched/thermal.h
index 26e5b07e9c29..a6ee973db41b 100644
--- a/kernel/sched/thermal.h
+++ b/kernel/sched/thermal.h
@@ -2,12 +2,4 @@
 /*
  * Scheduler thermal interaction internal methods.
  */
-
-#ifdef CONFIG_SMP
-void trigger_thermal_pressure_average(struct rq *rq);
-
-#else
-static inline void trigger_thermal_pressure_average(struct rq *rq)
-{
-}
-#endif
+DECLARE_PER_CPU(unsigned long , delta_capacity);
Thara Gopinath Nov. 1, 2019, 8:57 p.m. | #4
On 11/01/2019 08:17 AM, Dietmar Eggemann wrote:
> On 22.10.19 22:34, Thara Gopinath wrote:

> 

> [...]

> 

>> +/**

>> + * trigger_thermal_pressure_average: Trigger the thermal pressure accumulate

>> + *				     and average algorithm

>> + */

>> +void trigger_thermal_pressure_average(struct rq *rq)

>> +{

>> +	update_thermal_load_avg(rq_clock_task(rq), rq,

>> +				per_cpu(delta_capacity, cpu_of(rq)));

>> +}

> 

> Why not call update_thermal_load_avg() directly in fair.c? We do this for all

> the other update_foo_load_avg() functions (foo eq. irq, rt_rq, dl_rq ...)

thermal.c is going away in next version and I am moving everything to
fair.c. So this is taken care of

> 

> You don't have to pass 'u64 now', so you can hide it plus the 


You still need now.All the update_*_avg apis take now as a parameter.


-- 
Warm Regards
Thara
Dietmar Eggemann Nov. 4, 2019, 5:29 p.m. | #5
On 01/11/2019 21:57, Thara Gopinath wrote:
> On 11/01/2019 08:17 AM, Dietmar Eggemann wrote:

>> On 22.10.19 22:34, Thara Gopinath wrote:

>>

>> [...]

>>

>>> +/**

>>> + * trigger_thermal_pressure_average: Trigger the thermal pressure accumulate

>>> + *				     and average algorithm

>>> + */

>>> +void trigger_thermal_pressure_average(struct rq *rq)

>>> +{

>>> +	update_thermal_load_avg(rq_clock_task(rq), rq,

>>> +				per_cpu(delta_capacity, cpu_of(rq)));

>>> +}

>>

>> Why not call update_thermal_load_avg() directly in fair.c? We do this for all

>> the other update_foo_load_avg() functions (foo eq. irq, rt_rq, dl_rq ...)

> thermal.c is going away in next version and I am moving everything to

> fair.c. So this is taken care of

> 

>>

>> You don't have to pass 'u64 now', so you can hide it plus the 

> 

> You still need now.All the update_*_avg apis take now as a parameter.


You do need it for the ___update_load_sum() call inside the
foo_load_avg() functions. But that doesn't mean you have to pass it into
foo_load_avg(). Look at update_irq_load_avg() for example. We don't pass
rq->clock as now in there.

-int update_thermal_load_avg(u64 now, struct rq *rq, u64 capacity)
+extern int sched_thermal_decay_coeff;
+
+int update_thermal_load_avg(struct rq *rq, u64 capacity)
 {
+       u64 now = rq_clock_task(rq) >> sched_thermal_decay_coeff;
+
        if (___update_load_sum(now, &rq->avg_thermal,
                               capacity,
                               capacity,
Vincent Guittot Nov. 4, 2019, 5:34 p.m. | #6
On Mon, 4 Nov 2019 at 18:29, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
>

> On 01/11/2019 21:57, Thara Gopinath wrote:

> > On 11/01/2019 08:17 AM, Dietmar Eggemann wrote:

> >> On 22.10.19 22:34, Thara Gopinath wrote:

> >>

> >> [...]

> >>

> >>> +/**

> >>> + * trigger_thermal_pressure_average: Trigger the thermal pressure accumulate

> >>> + *                              and average algorithm

> >>> + */

> >>> +void trigger_thermal_pressure_average(struct rq *rq)

> >>> +{

> >>> +   update_thermal_load_avg(rq_clock_task(rq), rq,

> >>> +                           per_cpu(delta_capacity, cpu_of(rq)));

> >>> +}

> >>

> >> Why not call update_thermal_load_avg() directly in fair.c? We do this for all

> >> the other update_foo_load_avg() functions (foo eq. irq, rt_rq, dl_rq ...)

> > thermal.c is going away in next version and I am moving everything to

> > fair.c. So this is taken care of

> >

> >>

> >> You don't have to pass 'u64 now', so you can hide it plus the

> >

> > You still need now.All the update_*_avg apis take now as a parameter.

>

> You do need it for the ___update_load_sum() call inside the

> foo_load_avg() functions. But that doesn't mean you have to pass it into

> foo_load_avg(). Look at update_irq_load_avg() for example. We don't pass

> rq->clock as now in there.


update_irq_load_avg is the exception but having now as a parameter is
the default behavior that update_thermal_load_avg have to follow

>

> -int update_thermal_load_avg(u64 now, struct rq *rq, u64 capacity)

> +extern int sched_thermal_decay_coeff;

> +

> +int update_thermal_load_avg(struct rq *rq, u64 capacity)

>  {

> +       u64 now = rq_clock_task(rq) >> sched_thermal_decay_coeff;

> +

>         if (___update_load_sum(now, &rq->avg_thermal,

>                                capacity,

>                                capacity,
Dietmar Eggemann Nov. 4, 2019, 5:41 p.m. | #7
On 04/11/2019 18:34, Vincent Guittot wrote:
> On Mon, 4 Nov 2019 at 18:29, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:

>>

>> On 01/11/2019 21:57, Thara Gopinath wrote:

>>> On 11/01/2019 08:17 AM, Dietmar Eggemann wrote:

>>>> On 22.10.19 22:34, Thara Gopinath wrote:


[...]

>>> You still need now.All the update_*_avg apis take now as a parameter.

>>

>> You do need it for the ___update_load_sum() call inside the

>> foo_load_avg() functions. But that doesn't mean you have to pass it into

>> foo_load_avg(). Look at update_irq_load_avg() for example. We don't pass

>> rq->clock as now in there.

> 

> update_irq_load_avg is the exception but having now as a parameter is

> the default behavior that update_thermal_load_avg have to follow


Why would this be? Just so the functions have the the same parameters?

In this case you could argue that update_irq_load_avg() has to pass in
rq->clock as now.

>> -int update_thermal_load_avg(u64 now, struct rq *rq, u64 capacity)

>> +extern int sched_thermal_decay_coeff;

>> +

>> +int update_thermal_load_avg(struct rq *rq, u64 capacity)

>>  {

>> +       u64 now = rq_clock_task(rq) >> sched_thermal_decay_coeff;

>> +

>>         if (___update_load_sum(now, &rq->avg_thermal,

>>                                capacity,

>>                                capacity,
Vincent Guittot Nov. 4, 2019, 5:48 p.m. | #8
On Mon, 4 Nov 2019 at 18:42, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
>

> On 04/11/2019 18:34, Vincent Guittot wrote:

> > On Mon, 4 Nov 2019 at 18:29, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:

> >>

> >> On 01/11/2019 21:57, Thara Gopinath wrote:

> >>> On 11/01/2019 08:17 AM, Dietmar Eggemann wrote:

> >>>> On 22.10.19 22:34, Thara Gopinath wrote:

>

> [...]

>

> >>> You still need now.All the update_*_avg apis take now as a parameter.

> >>

> >> You do need it for the ___update_load_sum() call inside the

> >> foo_load_avg() functions. But that doesn't mean you have to pass it into

> >> foo_load_avg(). Look at update_irq_load_avg() for example. We don't pass

> >> rq->clock as now in there.

> >

> > update_irq_load_avg is the exception but having now as a parameter is

> > the default behavior that update_thermal_load_avg have to follow

>

> Why would this be? Just so the functions have the the same parameters?


That's the default behavior to keep all pelt function to behave
similarly and keep outside what is not strictly related to pelt so it
will ease any further modification
sched_thermal_decay_coeff is not a pelt parameter but a thermal one
irq_avg is an exception not the default behavior to follow

>

> In this case you could argue that update_irq_load_avg() has to pass in

> rq->clock as now.

>

> >> -int update_thermal_load_avg(u64 now, struct rq *rq, u64 capacity)

> >> +extern int sched_thermal_decay_coeff;

> >> +

> >> +int update_thermal_load_avg(struct rq *rq, u64 capacity)

> >>  {

> >> +       u64 now = rq_clock_task(rq) >> sched_thermal_decay_coeff;

> >> +

> >>         if (___update_load_sum(now, &rq->avg_thermal,

> >>                                capacity,

> >>                                capacity,

Patch

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 2c2e56b..d7ef543 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1983,6 +1983,14 @@  static inline void rseq_syscall(struct pt_regs *regs)
 
 #endif
 
+#ifdef CONFIG_SMP
+void update_thermal_pressure(int cpu, u64 capacity);
+#else
+static inline void update_thermal_pressure(int cpu, u64 capacity)
+{
+}
+#endif
+
 const struct sched_avg *sched_trace_cfs_rq_avg(struct cfs_rq *cfs_rq);
 char *sched_trace_cfs_rq_path(struct cfs_rq *cfs_rq, char *str, int len);
 int sched_trace_cfs_rq_cpu(struct cfs_rq *cfs_rq);
diff --git a/kernel/sched/Makefile b/kernel/sched/Makefile
index 21fb5a5..4d3b820 100644
--- a/kernel/sched/Makefile
+++ b/kernel/sched/Makefile
@@ -20,7 +20,7 @@  obj-y += core.o loadavg.o clock.o cputime.o
 obj-y += idle.o fair.o rt.o deadline.o
 obj-y += wait.o wait_bit.o swait.o completion.o
 
-obj-$(CONFIG_SMP) += cpupri.o cpudeadline.o topology.o stop_task.o pelt.o
+obj-$(CONFIG_SMP) += cpupri.o cpudeadline.o topology.o stop_task.o pelt.o thermal.o
 obj-$(CONFIG_SCHED_AUTOGROUP) += autogroup.o
 obj-$(CONFIG_SCHEDSTATS) += stats.o
 obj-$(CONFIG_SCHED_DEBUG) += debug.o
diff --git a/kernel/sched/thermal.c b/kernel/sched/thermal.c
new file mode 100644
index 0000000..0c84960
--- /dev/null
+++ b/kernel/sched/thermal.c
@@ -0,0 +1,45 @@ 
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Scheduler Thermal Interactions
+ *
+ *  Copyright (C) 2018 Linaro, Inc., Thara Gopinath <thara.gopinath@linaro.org>
+ */
+
+#include <linux/sched.h>
+#include "sched.h"
+#include "pelt.h"
+#include "thermal.h"
+
+static DEFINE_PER_CPU(unsigned long, delta_capacity);
+
+/**
+ * update_thermal_pressure: Update thermal pressure
+ * @cpu: the cpu for which thermal pressure is to be updated for
+ * @capped_freq_ratio: capped max frequency << SCHED_CAPACITY_SHIFT / max freq
+ *
+ * capped_freq_ratio is normalized into capped capacity and the delta between
+ * the arch_scale_cpu_capacity and capped capacity is stored in per cpu
+ * delta_capacity.
+ */
+void update_thermal_pressure(int cpu, u64 capped_freq_ratio)
+{
+	unsigned long __capacity, delta;
+
+	/* Normalize the capped freq ratio */
+	__capacity = (capped_freq_ratio * arch_scale_cpu_capacity(cpu)) >>
+							SCHED_CAPACITY_SHIFT;
+	delta = arch_scale_cpu_capacity(cpu) -  __capacity;
+	pr_debug("updating cpu%d thermal pressure to %lu\n", cpu, delta);
+
+	per_cpu(delta_capacity, cpu) = delta;
+}
+
+/**
+ * trigger_thermal_pressure_average: Trigger the thermal pressure accumulate
+ *				     and average algorithm
+ */
+void trigger_thermal_pressure_average(struct rq *rq)
+{
+	update_thermal_load_avg(rq_clock_task(rq), rq,
+				per_cpu(delta_capacity, cpu_of(rq)));
+}
diff --git a/kernel/sched/thermal.h b/kernel/sched/thermal.h
new file mode 100644
index 0000000..26e5b07
--- /dev/null
+++ b/kernel/sched/thermal.h
@@ -0,0 +1,13 @@ 
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Scheduler thermal interaction internal methods.
+ */
+
+#ifdef CONFIG_SMP
+void trigger_thermal_pressure_average(struct rq *rq);
+
+#else
+static inline void trigger_thermal_pressure_average(struct rq *rq)
+{
+}
+#endif