
[v10,04/11] sched: Make sched entity usage tracking scale-invariant

Message ID 1425052454-25797-5-git-send-email-vincent.guittot@linaro.org
State New

Commit Message

Vincent Guittot Feb. 27, 2015, 3:54 p.m. UTC
From: Morten Rasmussen <morten.rasmussen@arm.com>

Apply frequency scale-invariance correction factor to usage tracking.
Each segment of the running_load_avg geometric series is now scaled by the
current frequency, so the utilization_avg_contrib of each entity will be
invariant with frequency scaling. As a result, utilization_load_avg, which is
the sum of utilization_avg_contrib, becomes invariant too. So the usage level
that is returned by get_cpu_usage stays relative to the max frequency, like
the cpu_capacity it is compared against.
Then, we want to keep the load tracking values in a 32-bit type, which implies
that the max value of {runnable|running}_avg_sum must be lower than
2^32/88761 = 48388 (88761 is the max weight of a task). As LOAD_AVG_MAX = 47742,
arch_scale_freq_capacity must return a value less than
(48388/47742) << SCHED_CAPACITY_SHIFT = 1037 (SCHED_CAPACITY_SCALE = 1024).
So we define the range as [0..SCHED_CAPACITY_SCALE] in order to avoid overflow.
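
A minimal, self-contained sketch of that bound (illustrative only, not part of
the patch; scale_segment() is a made-up helper standing in for the accumulation
done in __update_entity_runnable_avg()):

#include <stdint.h>

#define SCHED_CAPACITY_SHIFT	10
#define SCHED_CAPACITY_SCALE	(1UL << SCHED_CAPACITY_SHIFT)	/* 1024 */
#define LOAD_AVG_MAX		47742	/* max value of the geometric series */

/*
 * {runnable|running}_avg_sum is later multiplied by a task weight of up to
 * 88761, so it must stay below 2^32 / 88761 = 48388 for the product to fit
 * in 32 bits.
 */
static uint32_t scale_segment(uint32_t delta, unsigned long scale_freq)
{
	/* each accumulated segment is scaled by the current frequency */
	return (uint32_t)(((uint64_t)delta * scale_freq) >> SCHED_CAPACITY_SHIFT);
}

/*
 * With scale_freq capped at SCHED_CAPACITY_SCALE, the scaled sum can never
 * exceed LOAD_AVG_MAX = 47742 < 48388; a scale factor above roughly
 * (48388 * 1024) / 47742 = 1037 could overflow the 32-bit accumulator.
 */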

cc: Paul Turner <pjt@google.com>
cc: Ben Segall <bsegall@google.com>

Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

Comments

Vincent Guittot March 4, 2015, 7:54 a.m. UTC | #1
On 3 March 2015 at 13:51, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
> On 27/02/15 15:54, Vincent Guittot wrote:
>> From: Morten Rasmussen <morten.rasmussen@arm.com>
>>
>> Apply frequency scale-invariance correction factor to usage tracking.
>> Each segment of the running_load_avg geometric series is now scaled by the
>
> The same comment I sent out on [PATCH v10 07/11]:
>
> The use of underscores in running_load_avg implies to me that this is a
> data member of struct sched_avg or something similar. But there is no
> running_load_avg in the current code. However, I can see that
> sched_avg::*running_avg_sum* (and therefore
> cfs_rq::*utilization_load_avg*) are frequency scale invariant.

I have resent the patch with the typo corrected.

>
> -- Dietmar
>
>> current frequency so the utilization_avg_contrib of each entity will be
>> invariant with frequency scaling. As a result, utilization_load_avg which is
>> the sum of utilization_avg_contrib, becomes invariant too. So the usage level
>> that is returned by get_cpu_usage, stays relative to the max frequency as the
>> cpu_capacity which is is compared against.
>> Then, we want the keep the load tracking values in a 32bits type, which implies
>> that the max value of {runnable|running}_avg_sum must be lower than
>> 2^32/88761=48388 (88761 is the max weigth of a task). As LOAD_AVG_MAX = 47742,
>> arch_scale_freq_capacity must return a value less than
>> (48388/47742) << SCHED_CAPACITY_SHIFT = 1037 (SCHED_SCALE_CAPACITY = 1024).
>> So we define the range to [0..SCHED_SCALE_CAPACITY] in order to avoid overflow.
>
>
Vincent Guittot March 24, 2015, 10 a.m. UTC | #2
On 23 March 2015 at 14:19, Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, Feb 27, 2015 at 04:54:07PM +0100, Vincent Guittot wrote:
>
>> +     unsigned long scale_freq = arch_scale_freq_capacity(NULL, cpu);
>
>> +                     sa->running_avg_sum += delta_w * scale_freq
>> +                             >> SCHED_CAPACITY_SHIFT;
>
> so the only thing that could be improved is somehow making this
> multiplication go away when the arch doesn't implement the function.
>
> But I'm not sure how to do that without #ifdef.
>
> Maybe a little something like so then... that should make the compiler
> get rid of those multiplications unless the arch needs them.

Yes, it removes the useless multiplication when it is not used by an arch.
It also adds a constraint on the arch side, which has to define
arch_scale_freq_capacity like below:

#define arch_scale_freq_capacity xxx_arch_scale_freq_capacity
with xxx_arch_scale_freq_capacity an architecture-specific function
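
For illustration, a hypothetical architecture would wire that up roughly as
follows (the xxx names and the freq_scale per-cpu variable are placeholders,
not existing code):

/* arch/xxx/include/asm/topology.h (hypothetical) */
#define arch_scale_freq_capacity xxx_arch_scale_freq_capacity
unsigned long xxx_arch_scale_freq_capacity(struct sched_domain *sd, int cpu);

/* arch/xxx/kernel/topology.c (hypothetical) */
unsigned long xxx_arch_scale_freq_capacity(struct sched_domain *sd, int cpu)
{
	/*
	 * e.g. curr_freq(cpu) * SCHED_CAPACITY_SCALE / max_freq(cpu),
	 * precomputed from cpufreq notifications
	 */
	return per_cpu(freq_scale, cpu);
}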

If that sounds acceptable, I can update the patch with your proposal.

Vincent
>
>
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2484,8 +2484,6 @@ static u32 __compute_runnable_contrib(u6
>         return contrib + runnable_avg_yN_sum[n];
>  }
>
> -unsigned long __weak arch_scale_freq_capacity(struct sched_domain *sd, int cpu);
> -
>  /*
>   * We can represent the historical contribution to runnable average as the
>   * coefficients of a geometric series.  To do this we sub-divide our runnable
> @@ -6015,11 +6013,6 @@ static unsigned long default_scale_capac
>         return SCHED_CAPACITY_SCALE;
>  }
>
> -unsigned long __weak arch_scale_freq_capacity(struct sched_domain *sd, int cpu)
> -{
> -       return default_scale_capacity(sd, cpu);
> -}
> -
>  static unsigned long default_scale_cpu_capacity(struct sched_domain *sd, int cpu)
>  {
>         if ((sd->flags & SD_SHARE_CPUCAPACITY) && (sd->span_weight > 1))
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -1387,7 +1387,14 @@ static inline int hrtick_enabled(struct
>
>  #ifdef CONFIG_SMP
>  extern void sched_avg_update(struct rq *rq);
> -extern unsigned long arch_scale_freq_capacity(struct sched_domain *sd, int cpu);
> +
> +#ifndef arch_scale_freq_capacity
> +static __always_inline
> +unsigned long arch_scale_freq_capacity(struct sched_domain *sd, int cpu)
> +{
> +       return SCHED_CAPACITY_SCALE;
> +}
> +#endif
>
>  static inline void sched_rt_avg_update(struct rq *rq, u64 rt_delta)
>  {
Vincent Guittot March 25, 2015, 6:08 p.m. UTC | #3
On 25 March 2015 at 18:33, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, Mar 24, 2015 at 11:00:57AM +0100, Vincent Guittot wrote:
>> On 23 March 2015 at 14:19, Peter Zijlstra <peterz@infradead.org> wrote:
>> > On Fri, Feb 27, 2015 at 04:54:07PM +0100, Vincent Guittot wrote:
>> >
>> >> +     unsigned long scale_freq = arch_scale_freq_capacity(NULL, cpu);
>> >
>> >> +                     sa->running_avg_sum += delta_w * scale_freq
>> >> +                             >> SCHED_CAPACITY_SHIFT;
>> >
>> > so the only thing that could be improved is somehow making this
>> > multiplication go away when the arch doesn't implement the function.
>> >
>> > But I'm not sure how to do that without #ifdef.
>> >
>> > Maybe a little something like so then... that should make the compiler
>> > get rid of those multiplications unless the arch needs them.
>>
>> yes, it removes useless multiplication when not used by an arch.
>> It also adds a constraint on the arch side which have to define
>> arch_scale_freq_capacity like below:
>>
>> #define arch_scale_freq_capacity xxx_arch_scale_freq_capacity
>> with xxx_arch_scale_freq_capacity an architecture specific function
>
> Yeah, but it not being weak should make that a compile time warn/fail,
> which should be pretty easy to deal with.
>
>> If it sounds acceptable i can update the patch with your proposal ?
>
> I'll stick it to the end, I just wanted to float to patch to see if
> people had better solutions.

OK. All the other methods that I have tried were removing the optimization
when the default arch_scale_freq_capacity was used.
Vincent Guittot March 27, 2015, 8:17 a.m. UTC | #4
On 26 March 2015 at 18:38, Morten Rasmussen <morten.rasmussen@arm.com> wrote:
> On Wed, Mar 25, 2015 at 06:08:42PM +0000, Vincent Guittot wrote:
>> On 25 March 2015 at 18:33, Peter Zijlstra <peterz@infradead.org> wrote:
>> > On Tue, Mar 24, 2015 at 11:00:57AM +0100, Vincent Guittot wrote:
>> >> On 23 March 2015 at 14:19, Peter Zijlstra <peterz@infradead.org> wrote:
>> >> > On Fri, Feb 27, 2015 at 04:54:07PM +0100, Vincent Guittot wrote:
>> >> >
>> >> >> +     unsigned long scale_freq = arch_scale_freq_capacity(NULL, cpu);
>> >> >
>> >> >> +                     sa->running_avg_sum += delta_w * scale_freq
>> >> >> +                             >> SCHED_CAPACITY_SHIFT;
>> >> >
>> >> > so the only thing that could be improved is somehow making this
>> >> > multiplication go away when the arch doesn't implement the function.
>> >> >
>> >> > But I'm not sure how to do that without #ifdef.
>> >> >
>> >> > Maybe a little something like so then... that should make the compiler
>> >> > get rid of those multiplications unless the arch needs them.
>> >>
>> >> yes, it removes useless multiplication when not used by an arch.
>> >> It also adds a constraint on the arch side which have to define
>> >> arch_scale_freq_capacity like below:
>> >>
>> >> #define arch_scale_freq_capacity xxx_arch_scale_freq_capacity
>> >> with xxx_arch_scale_freq_capacity an architecture specific function
>> >
>> > Yeah, but it not being weak should make that a compile time warn/fail,
>> > which should be pretty easy to deal with.
>> >
>> >> If it sounds acceptable i can update the patch with your proposal ?
>> >
>> > I'll stick it to the end, I just wanted to float to patch to see if
>> > people had better solutions.
>>
>> ok. all other methods that i have tried, was removing the optimization
>> when default arch_scale_freq_capacity was used
>
> Another potential solution is to stay with weak functions but move the
> multiplication and shift into the arch_scale_*() functions by passing
> the value we want to scale into the arch_scale_*() function. That way we
> can completely avoid multiplication and shift in the default case (no
> arch_scale*() implementations, which is better than what we have today.

sched_rt_avg_update() only uses the multiplication with
arch_scale_freq_capacity() because the shift by SCHED_CAPACITY_SHIFT has
been factored out into scale_rt_capacity().

>
> The only downside is that for frequency invariance we need three
> arch_scale_freq_capacity() calls instead of two.
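
A rough sketch of that alternative with hypothetical names (xxx_freq_scale()
is a placeholder for whatever the arch uses to track the current frequency):

/* default: stays a weak function and just passes the value through, so the
 * default case has no multiplication and no shift at all */
unsigned long __weak arch_scale_freq_capacity(struct sched_domain *sd, int cpu,
					      unsigned long val)
{
	return val;
}

/* an arch that implements it does the scaling itself */
unsigned long arch_scale_freq_capacity(struct sched_domain *sd, int cpu,
				       unsigned long val)
{
	return val * xxx_freq_scale(cpu) >> SCHED_CAPACITY_SHIFT;
}

/* caller side in __update_entity_runnable_avg() */
if (running)
	sa->running_avg_sum += arch_scale_freq_capacity(NULL, cpu, delta_w);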
Vincent Guittot March 27, 2015, 9:05 a.m. UTC | #5
On 27 March 2015 at 09:17, Vincent Guittot <vincent.guittot@linaro.org> wrote:
> On 26 March 2015 at 18:38, Morten Rasmussen <morten.rasmussen@arm.com> wrote:
>> On Wed, Mar 25, 2015 at 06:08:42PM +0000, Vincent Guittot wrote:
>>> On 25 March 2015 at 18:33, Peter Zijlstra <peterz@infradead.org> wrote:
>>> > On Tue, Mar 24, 2015 at 11:00:57AM +0100, Vincent Guittot wrote:
>>> >> On 23 March 2015 at 14:19, Peter Zijlstra <peterz@infradead.org> wrote:
>>> >> > On Fri, Feb 27, 2015 at 04:54:07PM +0100, Vincent Guittot wrote:
>>> >> >
>>> >> >> +     unsigned long scale_freq = arch_scale_freq_capacity(NULL, cpu);
>>> >> >
>>> >> >> +                     sa->running_avg_sum += delta_w * scale_freq
>>> >> >> +                             >> SCHED_CAPACITY_SHIFT;
>>> >> >
>>> >> > so the only thing that could be improved is somehow making this
>>> >> > multiplication go away when the arch doesn't implement the function.
>>> >> >
>>> >> > But I'm not sure how to do that without #ifdef.
>>> >> >
>>> >> > Maybe a little something like so then... that should make the compiler
>>> >> > get rid of those multiplications unless the arch needs them.
>>> >>
>>> >> yes, it removes useless multiplication when not used by an arch.
>>> >> It also adds a constraint on the arch side which have to define
>>> >> arch_scale_freq_capacity like below:
>>> >>
>>> >> #define arch_scale_freq_capacity xxx_arch_scale_freq_capacity
>>> >> with xxx_arch_scale_freq_capacity an architecture specific function
>>> >
>>> > Yeah, but it not being weak should make that a compile time warn/fail,
>>> > which should be pretty easy to deal with.
>>> >
>>> >> If it sounds acceptable i can update the patch with your proposal ?
>>> >
>>> > I'll stick it to the end, I just wanted to float to patch to see if
>>> > people had better solutions.
>>>
>>> ok. all other methods that i have tried, was removing the optimization
>>> when default arch_scale_freq_capacity was used
>>
>> Another potential solution is to stay with weak functions but move the
>> multiplication and shift into the arch_scale_*() functions by passing
>> the value we want to scale into the arch_scale_*() function. That way we
>> can completely avoid multiplication and shift in the default case (no
>> arch_scale*() implementations, which is better than what we have today.
>
> the sched_rt_avg_update only uses the mul with
> arch_scale_freq_capacity because the shift by SCHED_CAPACITY_SHIFT has
> been factorized in scale_rt_capacity

When arch_scale_freq_capacity is not defined by an arch, the multiplication is
optimized into an lsl #10 (a logical left shift by 10).
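
Concretely, a sketch of what the compiler sees with the #ifndef default
(constant SCHED_CAPACITY_SCALE = 1024):

/* sched_rt_avg_update(): the multiply becomes rt_delta << 10 (lsl #10);
 * the compensating shift only happens later, in scale_rt_capacity() */
rq->rt_avg += rt_delta * arch_scale_freq_capacity(NULL, cpu_of(rq));

/* __update_entity_runnable_avg(): delta_w * 1024 >> 10 folds back to just
 * delta_w, so the scaling disappears entirely */
sa->running_avg_sum += delta_w * scale_freq >> SCHED_CAPACITY_SHIFT;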

>
>>
>> The only downside is that for frequency invariance we need three
>> arch_scale_freq_capacity() calls instead of two.

Patch

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e54231f..7f031e4 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2484,6 +2484,8 @@  static u32 __compute_runnable_contrib(u64 n)
 	return contrib + runnable_avg_yN_sum[n];
 }
 
+unsigned long __weak arch_scale_freq_capacity(struct sched_domain *sd, int cpu);
+
 /*
  * We can represent the historical contribution to runnable average as the
  * coefficients of a geometric series.  To do this we sub-divide our runnable
@@ -2512,7 +2514,7 @@  static u32 __compute_runnable_contrib(u64 n)
  *   load_avg = u_0` + y*(u_0 + u_1*y + u_2*y^2 + ... )
  *            = u_0 + u_1*y + u_2*y^2 + ... [re-labeling u_i --> u_{i+1}]
  */
-static __always_inline int __update_entity_runnable_avg(u64 now,
+static __always_inline int __update_entity_runnable_avg(u64 now, int cpu,
 							struct sched_avg *sa,
 							int runnable,
 							int running)
@@ -2520,6 +2522,7 @@  static __always_inline int __update_entity_runnable_avg(u64 now,
 	u64 delta, periods;
 	u32 runnable_contrib;
 	int delta_w, decayed = 0;
+	unsigned long scale_freq = arch_scale_freq_capacity(NULL, cpu);
 
 	delta = now - sa->last_runnable_update;
 	/*
@@ -2555,7 +2558,8 @@  static __always_inline int __update_entity_runnable_avg(u64 now,
 		if (runnable)
 			sa->runnable_avg_sum += delta_w;
 		if (running)
-			sa->running_avg_sum += delta_w;
+			sa->running_avg_sum += delta_w * scale_freq
+				>> SCHED_CAPACITY_SHIFT;
 		sa->avg_period += delta_w;
 
 		delta -= delta_w;
@@ -2576,7 +2580,8 @@  static __always_inline int __update_entity_runnable_avg(u64 now,
 		if (runnable)
 			sa->runnable_avg_sum += runnable_contrib;
 		if (running)
-			sa->running_avg_sum += runnable_contrib;
+			sa->running_avg_sum += runnable_contrib * scale_freq
+				>> SCHED_CAPACITY_SHIFT;
 		sa->avg_period += runnable_contrib;
 	}
 
@@ -2584,7 +2589,8 @@  static __always_inline int __update_entity_runnable_avg(u64 now,
 	if (runnable)
 		sa->runnable_avg_sum += delta;
 	if (running)
-		sa->running_avg_sum += delta;
+		sa->running_avg_sum += delta * scale_freq
+			>> SCHED_CAPACITY_SHIFT;
 	sa->avg_period += delta;
 
 	return decayed;
@@ -2692,8 +2698,8 @@  static inline void __update_group_entity_contrib(struct sched_entity *se)
 
 static inline void update_rq_runnable_avg(struct rq *rq, int runnable)
 {
-	__update_entity_runnable_avg(rq_clock_task(rq), &rq->avg, runnable,
-			runnable);
+	__update_entity_runnable_avg(rq_clock_task(rq), cpu_of(rq), &rq->avg,
+			runnable, runnable);
 	__update_tg_runnable_avg(&rq->avg, &rq->cfs);
 }
 #else /* CONFIG_FAIR_GROUP_SCHED */
@@ -2771,6 +2777,7 @@  static inline void update_entity_load_avg(struct sched_entity *se,
 {
 	struct cfs_rq *cfs_rq = cfs_rq_of(se);
 	long contrib_delta, utilization_delta;
+	int cpu = cpu_of(rq_of(cfs_rq));
 	u64 now;
 
 	/*
@@ -2782,7 +2789,7 @@  static inline void update_entity_load_avg(struct sched_entity *se,
 	else
 		now = cfs_rq_clock_task(group_cfs_rq(se));
 
-	if (!__update_entity_runnable_avg(now, &se->avg, se->on_rq,
+	if (!__update_entity_runnable_avg(now, cpu, &se->avg, se->on_rq,
 					cfs_rq->curr == se))
 		return;