[v6,4/6] sched: get CPU's usage statistic

Message ID 1411488485-10025-5-git-send-email-vincent.guittot@linaro.org
State New

Commit Message

Vincent Guittot Sept. 23, 2014, 4:08 p.m.
Monitor the usage level of each group of each sched_domain level. The usage is
the amount of cpu_capacity that is currently used on a CPU or group of CPUs.
We use the utilization_load_avg to evaluate the usage level of each group.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

Comments

Dietmar Eggemann Sept. 25, 2014, 7:05 p.m. | #1
On 23/09/14 17:08, Vincent Guittot wrote:
> Monitor the usage level of each group of each sched_domain level. The usage is
> the amount of cpu_capacity that is currently used on a CPU or group of CPUs.
> We use the utilization_load_avg to evaluate the usage level of each group.
> 
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
>  kernel/sched/fair.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 2cf153d..4097e3f 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4523,6 +4523,17 @@ static int select_idle_sibling(struct task_struct *p, int target)
>  	return target;
>  }
>  
> +static int get_cpu_usage(int cpu)
> +{
> +	unsigned long usage = cpu_rq(cpu)->cfs.utilization_load_avg;
> +	unsigned long capacity = capacity_orig_of(cpu);
> +
> +	if (usage >= SCHED_LOAD_SCALE)
> +		return capacity + 1;

Why are you returning rq->cpu_capacity_orig + 1 (1025) in case
utilization_load_avg is greater than or equal to 1024, and not usage or
(usage * capacity) >> SCHED_LOAD_SHIFT?

In case the weight of a sched group is greater than 1, you might lose
the information that the whole sched group is over-utilized too.

You add up the individual cpu usage values for a group by
sgs->group_usage += get_cpu_usage(i) in update_sg_lb_stats and later use
sgs->group_usage in group_is_overloaded to compare it against
sgs->group_capacity (taking imbalance_pct into consideration).

> +
> +	return (usage * capacity) >> SCHED_LOAD_SHIFT;

Nit-pick: Since you're multiplying by a capacity value
(rq->cpu_capacity_orig) you should shift by SCHED_CAPACITY_SHIFT.

Just to make sure: You do this scaling of usage by cpu_capacity_orig
here only to cater for the fact that cpu_capacity_orig might be uarch
scaled (by arch_scale_cpu_capacity, !SMT) in update_cpu_capacity while
utilization_load_avg is currently not.
We don't even uArch scale on ARM TC2 big.LITTLE platform in mainline
today due to the missing clock-frequency property in the device tree.

I think it's hard for people to grasp that your patch-set takes uArch
scaling of capacity into consideration but not frequency scaling of
capacity (via arch_scale_freq_capacity, not used at the moment).

> +}
> +
>  /*
>   * select_task_rq_fair: Select target runqueue for the waking task in domains
>   * that have the 'sd_flag' flag set. In practice, this is SD_BALANCE_WAKE,
> @@ -5663,6 +5674,7 @@ struct sg_lb_stats {
>  	unsigned long sum_weighted_load; /* Weighted load of group's tasks */
>  	unsigned long load_per_task;
>  	unsigned long group_capacity;
> +	unsigned long group_usage; /* Total usage of the group */
>  	unsigned int sum_nr_running; /* Nr tasks running in the group */
>  	unsigned int group_capacity_factor;
>  	unsigned int idle_cpus;
> @@ -6037,6 +6049,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
>  			load = source_load(i, load_idx);
>  
>  		sgs->group_load += load;
> +		sgs->group_usage += get_cpu_usage(i);
>  		sgs->sum_nr_running += rq->cfs.h_nr_running;
>  
>  		if (rq->nr_running > 1)
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Vincent Guittot Sept. 26, 2014, 12:17 p.m. | #2
On 25 September 2014 21:05, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
> On 23/09/14 17:08, Vincent Guittot wrote:
>> Monitor the usage level of each group of each sched_domain level. The usage is
>> the amount of cpu_capacity that is currently used on a CPU or group of CPUs.
>> We use the utilization_load_avg to evaluate the usage level of each group.
>>
>> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
>> ---
>>  kernel/sched/fair.c | 13 +++++++++++++
>>  1 file changed, 13 insertions(+)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 2cf153d..4097e3f 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -4523,6 +4523,17 @@ static int select_idle_sibling(struct task_struct *p, int target)
>>       return target;
>>  }
>>
>> +static int get_cpu_usage(int cpu)
>> +{
>> +     unsigned long usage = cpu_rq(cpu)->cfs.utilization_load_avg;
>> +     unsigned long capacity = capacity_orig_of(cpu);
>> +
>> +     if (usage >= SCHED_LOAD_SCALE)
>> +             return capacity + 1;
>
> Why are you returning rq->cpu_capacity_orig + 1 (1025) in case
> utilization_load_avg is greater than or equal to 1024, and not usage or
> (usage * capacity) >> SCHED_LOAD_SHIFT?

The usage can't be higher than the full capacity of the CPU because
it's about the running time on this CPU. Nevertheless, usage can be
higher than SCHED_LOAD_SCALE because of unfortunate rounding in
avg_period and running_load_avg, or just after migrating tasks, until
the average stabilizes with the new running time.

>
> In case the weight of a sched group is greater than 1, you might lose
> the information that the whole sched group is over-utilized too.

That's exactly why, for a sched_group with more than 1 CPU, we need to
cap the usage of a CPU at 100%. Otherwise, the group could be seen as
overloaded (CPU0 usage at 121% + CPU1 usage at 80%) whereas CPU1 still
has 20% of available capacity.

>
> You add up the individual cpu usage values for a group by
> sgs->group_usage += get_cpu_usage(i) in update_sg_lb_stats and later use
> sgs->group_usage in group_is_overloaded to compare it against
> sgs->group_capacity (taking imbalance_pct into consideration).
>
>> +
>> +     return (usage * capacity) >> SCHED_LOAD_SHIFT;
>
> Nit-pick: Since you're multiplying by a capacity value
> (rq->cpu_capacity_orig) you should shift by SCHED_CAPACITY_SHIFT.

We want to compare the output of the function with some capacity
figures, so I think that >> SCHED_LOAD_SHIFT is the right operation.

>
> Just to make sure: You do this scaling of usage by cpu_capacity_orig
> here only to cater for the fact that cpu_capacity_orig might be uarch
> scaled (by arch_scale_cpu_capacity, !SMT) in update_cpu_capacity while

I do this for any system with CPUs that have an original capacity that
is different from SCHED_CAPACITY_SCALE so it's for both uArch and SMT.

> utilization_load_avg is currently not.
> We don't even uArch scale on ARM TC2 big.LITTLE platform in mainline
> today due to the missing clock-frequency property in the device tree.

Sorry, I don't get your point.

>
> I think it's hard for people to grasp that your patch-set takes uArch
> scaling of capacity into consideration but not frequency scaling of
> capacity (via arch_scale_freq_capacity, not used at the moment).
>
>> +}
>> +
>>  /*
>>   * select_task_rq_fair: Select target runqueue for the waking task in domains
>>   * that have the 'sd_flag' flag set. In practice, this is SD_BALANCE_WAKE,
>> @@ -5663,6 +5674,7 @@ struct sg_lb_stats {
>>       unsigned long sum_weighted_load; /* Weighted load of group's tasks */
>>       unsigned long load_per_task;
>>       unsigned long group_capacity;
>> +     unsigned long group_usage; /* Total usage of the group */
>>       unsigned int sum_nr_running; /* Nr tasks running in the group */
>>       unsigned int group_capacity_factor;
>>       unsigned int idle_cpus;
>> @@ -6037,6 +6049,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
>>                       load = source_load(i, load_idx);
>>
>>               sgs->group_load += load;
>> +             sgs->group_usage += get_cpu_usage(i);
>>               sgs->sum_nr_running += rq->cfs.h_nr_running;
>>
>>               if (rq->nr_running > 1)
>>
>
>
Morten Rasmussen Sept. 26, 2014, 3:58 p.m. | #3
On Fri, Sep 26, 2014 at 01:17:43PM +0100, Vincent Guittot wrote:
> On 25 September 2014 21:05, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
> > On 23/09/14 17:08, Vincent Guittot wrote:
> >> Monitor the usage level of each group of each sched_domain level. The usage is
> >> the amount of cpu_capacity that is currently used on a CPU or group of CPUs.
> >> We use the utilization_load_avg to evaluate the usage level of each group.
> >>
> >> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> >> ---
> >>  kernel/sched/fair.c | 13 +++++++++++++
> >>  1 file changed, 13 insertions(+)
> >>
> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >> index 2cf153d..4097e3f 100644
> >> --- a/kernel/sched/fair.c
> >> +++ b/kernel/sched/fair.c
> >> @@ -4523,6 +4523,17 @@ static int select_idle_sibling(struct task_struct *p, int target)
> >>       return target;
> >>  }
> >>
> >> +static int get_cpu_usage(int cpu)
> >> +{
> >> +     unsigned long usage = cpu_rq(cpu)->cfs.utilization_load_avg;
> >> +     unsigned long capacity = capacity_orig_of(cpu);
> >> +
> >> +     if (usage >= SCHED_LOAD_SCALE)
> >> +             return capacity + 1;
> >
> > Why are you returning rq->cpu_capacity_orig + 1 (1025) in case
> > utilization_load_avg is greater than or equal to 1024, and not usage or
> > (usage * capacity) >> SCHED_LOAD_SHIFT?
> 
> The usage can't be higher than the full capacity of the CPU because
> it's about the running time on this CPU. Nevertheless, usage can be
> higher than SCHED_LOAD_SCALE because of unfortunate rounding in
> avg_period and running_load_avg or just after migrating tasks until
> the average stabilizes with the new running time.

I fully agree that the cpu usage should be capped to capacity, but why
do you return capacity + 1? I would just return capacity, no?

Now that you have gotten rid of 'usage' everywhere else, shouldn't this
function be renamed to get_cpu_utilization()?
Dietmar Eggemann Sept. 26, 2014, 7:57 p.m. | #4
On 26/09/14 13:17, Vincent Guittot wrote:
> On 25 September 2014 21:05, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
>> On 23/09/14 17:08, Vincent Guittot wrote:

[...]

>>>
>>> +static int get_cpu_usage(int cpu)
>>> +{
>>> +     unsigned long usage = cpu_rq(cpu)->cfs.utilization_load_avg;
>>> +     unsigned long capacity = capacity_orig_of(cpu);
>>> +
>>> +     if (usage >= SCHED_LOAD_SCALE)
>>> +             return capacity + 1;
>>
>> Why are you returning rq->cpu_capacity_orig + 1 (1025) in case
>> utilization_load_avg is greater than or equal to 1024, and not usage or
>> (usage * capacity) >> SCHED_LOAD_SHIFT?
> 
> The usage can't be higher than the full capacity of the CPU because
> it's about the running time on this CPU. Nevertheless, usage can be
> higher than SCHED_LOAD_SCALE because of unfortunate rounding in
> avg_period and running_load_avg or just after migrating tasks until
> the average stabilizes with the new running time.

Ok, I got it now, thanks!


When running 'hackbench -p -T -s 10 -l 1' on TC2, the usage for a cpu
also occasionally goes much higher than SCHED_LOAD_SCALE. After all,
p->se.avg.running_avg_sum is initialized to slice in
init_task_runnable_average.

> 
>>
>> In case the weight of a sched group is greater than 1, you might lose
>> the information that the whole sched group is over-utilized too.
> 
> That's exactly why, for a sched_group with more than 1 CPU, we need to
> cap the usage of a CPU at 100%. Otherwise, the group could be seen as
> overloaded (CPU0 usage at 121% + CPU1 usage at 80%) whereas CPU1 still
> has 20% of available capacity.

Makes sense: we don't want to do anything in this case at a sched level
(e.g. DIE); the appropriate level below (e.g. MC) should balance this
out first. Got it!

> 
>>
>> You add up the individual cpu usage values for a group by
>> sgs->group_usage += get_cpu_usage(i) in update_sg_lb_stats and later use
>> sgs->group_usage in group_is_overloaded to compare it against
>> sgs->group_capacity (taking imbalance_pct into consideration).
>>
>>> +
>>> +     return (usage * capacity) >> SCHED_LOAD_SHIFT;
>>
>> Nit-pick: Since you're multiplying by a capacity value
>> (rq->cpu_capacity_orig) you should shift by SCHED_CAPACITY_SHIFT.
> 
> We want to compare the output of the function with some capacity
> figures, so I think that >> SCHED_LOAD_SHIFT is the right operation.
> 
>>
>> Just to make sure: You do this scaling of usage by cpu_capacity_orig
>> here only to cater for the fact that cpu_capacity_orig might be uarch
>> scaled (by arch_scale_cpu_capacity, !SMT) in update_cpu_capacity while
> 
> I do this for any system with CPUs that have an original capacity that
> is different from SCHED_CAPACITY_SCALE so it's for both uArch and SMT.

Understood. So your current patch-set is doing uArch scaling for capacity,
and since you're not doing uArch scaling for utilization, you do this
'* capacity) >> SCHED_LOAD_SHIFT' thing. Correct?

> 
>> utilization_load_avg is currently not.
>> We don't even uArch scale on ARM TC2 big.LITTLE platform in mainline
>> today due to the missing clock-frequency property in the device tree.
> 
> Sorry, I don't get your point.

With the mainline dts file for ARM TC2, rq->cpu_capacity_orig is 1024
for all 5 cpus (A15's and A7's). The arm topology shim layer emits a

  /cpus/cpu@x missing clock-frequency property

warning per cpu in this case and doesn't scale the capacity. Only when I add

 clock-frequency = <xxxxxxxxx>;

per cpuX node into the dts file do I get a system with asymmetric
rq->cpu_capacity_orig values (606 for an A7 and 1441 for an A15).
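
For illustration only, such a per-cpu node might look like the following; the compatible string, reg value, and frequency are made up and platform-specific, not taken from the TC2 dts:

```
cpu1: cpu@1 {
	device_type = "cpu";
	compatible = "arm,cortex-a7";
	reg = <0x100>;
	clock-frequency = <800000000>;	/* illustrative value */
};
```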

> 
>>
>> I think it's hard for people to grasp that your patch-set takes uArch
>> scaling of capacity into consideration but not frequency scaling of
>> capacity (via arch_scale_freq_capacity, not used at the moment).

[...]

Wanpeng Li Nov. 21, 2014, 5:36 a.m. | #5
Hi Vincent,
On 9/26/14, 8:17 PM, Vincent Guittot wrote:
> On 25 September 2014 21:05, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
>> On 23/09/14 17:08, Vincent Guittot wrote:
>>> Monitor the usage level of each group of each sched_domain level. The usage is
>>> the amount of cpu_capacity that is currently used on a CPU or group of CPUs.
>>> We use the utilization_load_avg to evaluate the usage level of each group.
>>>
>>> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
>>> ---
>>>   kernel/sched/fair.c | 13 +++++++++++++
>>>   1 file changed, 13 insertions(+)
>>>
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index 2cf153d..4097e3f 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -4523,6 +4523,17 @@ static int select_idle_sibling(struct task_struct *p, int target)
>>>        return target;
>>>   }
>>>
>>> +static int get_cpu_usage(int cpu)
>>> +{
>>> +     unsigned long usage = cpu_rq(cpu)->cfs.utilization_load_avg;
>>> +     unsigned long capacity = capacity_orig_of(cpu);
>>> +
>>> +     if (usage >= SCHED_LOAD_SCALE)
>>> +             return capacity + 1;
>> Why are you returning rq->cpu_capacity_orig + 1 (1025) in case
>> utilization_load_avg is greater than or equal to 1024, and not usage or
>> (usage * capacity) >> SCHED_LOAD_SHIFT?
> The usage can't be higher than the full capacity of the CPU because
> it's about the running time on this CPU. Nevertheless, usage can be
> higher than SCHED_LOAD_SCALE because of unfortunate rounding in
> avg_period and running_load_avg or just after migrating tasks until
> the average stabilizes with the new running time.
>
>> In case the weight of a sched group is greater than 1, you might lose
>> the information that the whole sched group is over-utilized too.
> That's exactly why, for a sched_group with more than 1 CPU, we need to
> cap the usage of a CPU at 100%. Otherwise, the group could be seen as
> overloaded (CPU0 usage at 121% + CPU1 usage at 80%) whereas CPU1 still
> has 20% of available capacity.
>
>> You add up the individual cpu usage values for a group by
>> sgs->group_usage += get_cpu_usage(i) in update_sg_lb_stats and later use
>> sgs->group_usage in group_is_overloaded to compare it against
>> sgs->group_capacity (taking imbalance_pct into consideration).
>>
>>> +
>>> +     return (usage * capacity) >> SCHED_LOAD_SHIFT;
>> Nit-pick: Since you're multiplying by a capacity value
>> (rq->cpu_capacity_orig) you should shift by SCHED_CAPACITY_SHIFT.
> We want to compare the output of the function with some capacity
> figures, so I think that >> SCHED_LOAD_SHIFT is the right operation.

Could you explain in more detail why it's '>> SCHED_LOAD_SHIFT' instead of
'>> SCHED_CAPACITY_SHIFT'?

Regards,
Wanpeng Li

>
>> Just to make sure: You do this scaling of usage by cpu_capacity_orig
>> here only to cater for the fact that cpu_capacity_orig might be uarch
>> scaled (by arch_scale_cpu_capacity, !SMT) in update_cpu_capacity while
> I do this for any system with CPUs that have an original capacity that
> is different from SCHED_CAPACITY_SCALE so it's for both uArch and SMT.
>
>> utilization_load_avg is currently not.
>> We don't even uArch scale on ARM TC2 big.LITTLE platform in mainline
>> today due to the missing clock-frequency property in the device tree.
> Sorry, I don't get your point.
>
>> I think it's hard for people to grasp that your patch-set takes uArch
>> scaling of capacity into consideration but not frequency scaling of
>> capacity (via arch_scale_freq_capacity, not used at the moment).
>>
>>> +}
>>> +
>>>   /*
>>>    * select_task_rq_fair: Select target runqueue for the waking task in domains
>>>    * that have the 'sd_flag' flag set. In practice, this is SD_BALANCE_WAKE,
>>> @@ -5663,6 +5674,7 @@ struct sg_lb_stats {
>>>        unsigned long sum_weighted_load; /* Weighted load of group's tasks */
>>>        unsigned long load_per_task;
>>>        unsigned long group_capacity;
>>> +     unsigned long group_usage; /* Total usage of the group */
>>>        unsigned int sum_nr_running; /* Nr tasks running in the group */
>>>        unsigned int group_capacity_factor;
>>>        unsigned int idle_cpus;
>>> @@ -6037,6 +6049,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
>>>                        load = source_load(i, load_idx);
>>>
>>>                sgs->group_load += load;
>>> +             sgs->group_usage += get_cpu_usage(i);
>>>                sgs->sum_nr_running += rq->cfs.h_nr_running;
>>>
>>>                if (rq->nr_running > 1)
>>>
>>

Vincent Guittot Nov. 21, 2014, 12:17 p.m. | #6
On 21 November 2014 06:36, Wanpeng Li <kernellwp@gmail.com> wrote:
> Hi Vincent,
>
> On 9/26/14, 8:17 PM, Vincent Guittot wrote:

 [snip]

>>
>>> You add up the individual cpu usage values for a group by
>>> sgs->group_usage += get_cpu_usage(i) in update_sg_lb_stats and later use
>>> sgs->group_usage in group_is_overloaded to compare it against
>>> sgs->group_capacity (taking imbalance_pct into consideration).
>>>
>>>> +
>>>> +     return (usage * capacity) >> SCHED_LOAD_SHIFT;
>>>
>>> Nit-pick: Since you're multiplying by a capacity value
>>> (rq->cpu_capacity_orig) you should shift by SCHED_CAPACITY_SHIFT.
>>
>> We want to compare the output of the function with some capacity
>> figures, so I think that >> SCHED_LOAD_SHIFT is the right operation.
>
>
> Could you explain in more detail why it's '>> SCHED_LOAD_SHIFT' instead of
> '>> SCHED_CAPACITY_SHIFT'?

The return value of get_cpu_usage() is going to be compared with capacity,
so we need to return a value in CAPACITY units.

usage is in LOAD units and capacity is in CAPACITY units, so we have to
divide by the LOAD scale to get a result in CAPACITY units:

LOAD unit * CAPACITY unit / LOAD unit -> CAPACITY unit

Regards,
Vincent

>
> Regards,
> Wanpeng Li
>
>>
>>> Just to make sure: You do this scaling of usage by cpu_capacity_orig
>>> here only to cater for the fact that cpu_capacity_orig might be uarch
>>> scaled (by arch_scale_cpu_capacity, !SMT) in update_cpu_capacity while
>>
>> I do this for any system with CPUs that have an original capacity that
>> is different from SCHED_CAPACITY_SCALE so it's for both uArch and SMT.
>>
>>> utilization_load_avg is currently not.
>>> We don't even uArch scale on ARM TC2 big.LITTLE platform in mainline
>>> today due to the missing clock-frequency property in the device tree.
>>
>> Sorry, I don't get your point.
>>
>>> I think it's hard for people to grasp that your patch-set takes uArch
>>> scaling of capacity into consideration but not frequency scaling of
>>> capacity (via arch_scale_freq_capacity, not used at the moment).
>>>
>>>> +}
>>>> +
>>>>   /*
>>>>    * select_task_rq_fair: Select target runqueue for the waking task in
>>>> domains
>>>>    * that have the 'sd_flag' flag set. In practice, this is
>>>> SD_BALANCE_WAKE,
>>>> @@ -5663,6 +5674,7 @@ struct sg_lb_stats {
>>>>        unsigned long sum_weighted_load; /* Weighted load of group's
>>>> tasks */
>>>>        unsigned long load_per_task;
>>>>        unsigned long group_capacity;
>>>> +     unsigned long group_usage; /* Total usage of the group */
>>>>        unsigned int sum_nr_running; /* Nr tasks running in the group */
>>>>        unsigned int group_capacity_factor;
>>>>        unsigned int idle_cpus;
>>>> @@ -6037,6 +6049,7 @@ static inline void update_sg_lb_stats(struct
>>>> lb_env *env,
>>>>                        load = source_load(i, load_idx);
>>>>
>>>>                sgs->group_load += load;
>>>> +             sgs->group_usage += get_cpu_usage(i);
>>>>                sgs->sum_nr_running += rq->cfs.h_nr_running;
>>>>
>>>>                if (rq->nr_running > 1)
>>>>
>>>
>
>

Patch

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2cf153d..4097e3f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4523,6 +4523,17 @@  static int select_idle_sibling(struct task_struct *p, int target)
 	return target;
 }
 
+static int get_cpu_usage(int cpu)
+{
+	unsigned long usage = cpu_rq(cpu)->cfs.utilization_load_avg;
+	unsigned long capacity = capacity_orig_of(cpu);
+
+	if (usage >= SCHED_LOAD_SCALE)
+		return capacity + 1;
+
+	return (usage * capacity) >> SCHED_LOAD_SHIFT;
+}
+
 /*
  * select_task_rq_fair: Select target runqueue for the waking task in domains
  * that have the 'sd_flag' flag set. In practice, this is SD_BALANCE_WAKE,
@@ -5663,6 +5674,7 @@  struct sg_lb_stats {
 	unsigned long sum_weighted_load; /* Weighted load of group's tasks */
 	unsigned long load_per_task;
 	unsigned long group_capacity;
+	unsigned long group_usage; /* Total usage of the group */
 	unsigned int sum_nr_running; /* Nr tasks running in the group */
 	unsigned int group_capacity_factor;
 	unsigned int idle_cpus;
@@ -6037,6 +6049,7 @@  static inline void update_sg_lb_stats(struct lb_env *env,
 			load = source_load(i, load_idx);
 
 		sgs->group_load += load;
+		sgs->group_usage += get_cpu_usage(i);
 		sgs->sum_nr_running += rq->cfs.h_nr_running;
 
 		if (rq->nr_running > 1)