[v7,5/7] sched: get CPU's usage statistic

Message ID 1412684017-16595-6-git-send-email-vincent.guittot@linaro.org
State New

Commit Message

Vincent Guittot Oct. 7, 2014, 12:13 p.m.
Monitor the usage level of each group of each sched_domain level. The usage is
the amount of cpu_capacity that is currently used on a CPU or group of CPUs.
We use the utilization_load_avg to evaluate the usage level of each group.

The utilization_avg_contrib only takes into account the running time but not
the uArch so the utilization_load_avg is in the range [0..SCHED_LOAD_SCALE]
to reflect the running load on the CPU. We have to scale the utilization with
the capacity of the CPU to get the usage of the latter. The usage can then be
compared with the available capacity.

The frequency scaling invariance is not taken into account in this patchset;
it will be solved in another patchset.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

Comments

Peter Zijlstra Oct. 9, 2014, 11:36 a.m. | #1
On Tue, Oct 07, 2014 at 02:13:35PM +0200, Vincent Guittot wrote:
> Monitor the usage level of each group of each sched_domain level. The usage is
> the amount of cpu_capacity that is currently used on a CPU or group of CPUs.
> We use the utilization_load_avg to evaluate the usage level of each group.
> 
> The utilization_avg_contrib only takes into account the running time but not
> the uArch so the utilization_load_avg is in the range [0..SCHED_LOAD_SCALE]
> to reflect the running load on the CPU. We have to scale the utilization with
> the capacity of the CPU to get the usage of the latter. The usage can then be
> compared with the available capacity.

You say cpu_capacity, but in actual fact you use capacity_orig and fail
to justify/clarify this.

> The frequency scaling invariance is not taken into account in this patchset,
> it will be solved in another patchset

Maybe explain what the specific invariance issue is that is skipped over
for now.

> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
>  kernel/sched/fair.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index d3e9067..7364ed4 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4551,6 +4551,17 @@ static int select_idle_sibling(struct task_struct *p, int target)
>  	return target;
>  }
>  
> +static int get_cpu_usage(int cpu)
> +{
> +	unsigned long usage = cpu_rq(cpu)->cfs.utilization_load_avg;
> +	unsigned long capacity = capacity_orig_of(cpu);
> +
> +	if (usage >= SCHED_LOAD_SCALE)
> +		return capacity + 1;

Like Morten I'm confused by that +1 thing.

> +
> +	return (usage * capacity) >> SCHED_LOAD_SHIFT;
> +}

A comment with that function that it returns capacity units might
clarify shift confusion Morten raised the other day.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Vincent Guittot Oct. 9, 2014, 1:57 p.m. | #2
On 9 October 2014 13:36, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, Oct 07, 2014 at 02:13:35PM +0200, Vincent Guittot wrote:
>> Monitor the usage level of each group of each sched_domain level. The usage is
>> the amount of cpu_capacity that is currently used on a CPU or group of CPUs.
>> We use the utilization_load_avg to evaluate the usage level of each group.
>>
>> The utilization_avg_contrib only takes into account the running time but not
>> the uArch so the utilization_load_avg is in the range [0..SCHED_LOAD_SCALE]
>> to reflect the running load on the CPU. We have to scale the utilization with
>> the capacity of the CPU to get the usage of the latter. The usage can then be
>> compared with the available capacity.
>
> You say cpu_capacity, but in actual fact you use capacity_orig and fail
> to justify/clarify this.

You're right, it's cpu_capacity_orig, not cpu_capacity.

cpu_capacity is the compute capacity available for CFS tasks once we
have removed the capacity that is used by RT tasks.

We want to compare the utilization of the CPU (utilization_avg_contrib,
which is in the range [0..SCHED_LOAD_SCALE]) with the available capacity
(cpu_capacity, which is in the range [0..cpu_capacity_orig]).
A utilization_avg_contrib equal to SCHED_LOAD_SCALE means that the
CPU is fully utilized, so all of cpu_capacity_orig is used. So we scale
the utilization_avg_contrib from [0..SCHED_LOAD_SCALE] into cpu_usage
in the range [0..cpu_capacity_orig].
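
The scaling described above can be sketched in user space as follows. This is only an illustration, not kernel code: it assumes SCHED_LOAD_SHIFT is 10, and scale_usage() is a hypothetical helper name that takes the utilization and capacity_orig as plain parameters instead of reading them from the runqueue.

```c
/* Assumed values for illustration; the kernel defines these itself. */
#define SCHED_LOAD_SHIFT 10
#define SCHED_LOAD_SCALE (1UL << SCHED_LOAD_SHIFT)

/*
 * Hypothetical helper: map a utilization in [0..SCHED_LOAD_SCALE]
 * onto capacity units in [0..capacity_orig].
 */
static unsigned long scale_usage(unsigned long utilization,
				 unsigned long capacity_orig)
{
	return (utilization * capacity_orig) >> SCHED_LOAD_SHIFT;
}
```

With these assumptions, a CPU that is busy half of the time (utilization 512) and has a capacity_orig of 430 reports a usage of 215 capacity units.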


>
>> The frequency scaling invariance is not taken into account in this patchset,
>> it will be solved in another patchset
>
> Maybe explain what the specific invariance issue is that is skipped over
> for now.

OK. I can add a description of the fact that if the core runs slower, the
tasks will use more running time of the CPU for the same job, so the
usage of the CPU, which is based on the amount of running time, will increase.
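
A rough way to picture the frequency effect is the toy model below. It is a deliberately simplified sketch: it ignores the geometric decay of the real load tracking and just computes the fraction of a period spent running. The helper name time_based_util(), its parameters, and the SCHED_LOAD_SHIFT value of 10 are assumptions made for this example.

```c
/* Assumed values for illustration; the kernel defines these itself. */
#define SCHED_LOAD_SHIFT 10
#define SCHED_LOAD_SCALE (1UL << SCHED_LOAD_SHIFT)

/*
 * Hypothetical toy model: a job needs work_us_at_max microseconds per
 * period when the core runs at max_khz. Return the time-based
 * utilization in [0..SCHED_LOAD_SCALE] when the core runs at cur_khz.
 */
static unsigned long time_based_util(unsigned long work_us_at_max,
				     unsigned long cur_khz,
				     unsigned long max_khz,
				     unsigned long period_us)
{
	/* The same job takes proportionally longer at a lower frequency. */
	unsigned long running_us = work_us_at_max * max_khz / cur_khz;
	unsigned long util = running_us * SCHED_LOAD_SCALE / period_us;

	/* Clamp: a CPU cannot be busy for more than the whole period. */
	return util > SCHED_LOAD_SCALE ? SCHED_LOAD_SCALE : util;
}
```

In this model a job that needs 250 us out of every 1000 us at full speed yields a utilization of 256, while the same job at half the frequency yields 512, although the amount of work has not changed.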

>
>> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
>> ---
>>  kernel/sched/fair.c | 13 +++++++++++++
>>  1 file changed, 13 insertions(+)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index d3e9067..7364ed4 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -4551,6 +4551,17 @@ static int select_idle_sibling(struct task_struct *p, int target)
>>       return target;
>>  }
>>
>> +static int get_cpu_usage(int cpu)
>> +{
>> +     unsigned long usage = cpu_rq(cpu)->cfs.utilization_load_avg;
>> +     unsigned long capacity = capacity_orig_of(cpu);
>> +
>> +     if (usage >= SCHED_LOAD_SCALE)
>> +             return capacity + 1;
>
> Like Morten I'm confused by that +1 thing.

OK. The goal was to point out the erroneous case where usage is out of
range, but if it generates confusion, I can remove it.

>
>> +
>> +     return (usage * capacity) >> SCHED_LOAD_SHIFT;
>> +}
>
> A comment with that function that it returns capacity units might
> clarify shift confusion Morten raised the other day.

OK, I will add a comment.
--
Peter Zijlstra Oct. 9, 2014, 3:12 p.m. | #3
On Thu, Oct 09, 2014 at 03:57:28PM +0200, Vincent Guittot wrote:
> On 9 October 2014 13:36, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Tue, Oct 07, 2014 at 02:13:35PM +0200, Vincent Guittot wrote:
> >> Monitor the usage level of each group of each sched_domain level. The usage is
> >> the amount of cpu_capacity that is currently used on a CPU or group of CPUs.
> >> We use the utilization_load_avg to evaluate the usage level of each group.
> >>
> >> The utilization_avg_contrib only takes into account the running time but not
> >> the uArch so the utilization_load_avg is in the range [0..SCHED_LOAD_SCALE]
> >> to reflect the running load on the CPU. We have to scale the utilization with
> >> the capacity of the CPU to get the usage of the latter. The usage can then be
> >> compared with the available capacity.
> >
> > You say cpu_capacity, but in actual fact you use capacity_orig and fail
> > to justify/clarify this.
> 
> You're right, it's cpu_capacity_orig, not cpu_capacity.
> 
> cpu_capacity is the compute capacity available for CFS task once we
> have removed the capacity that is used by RT tasks.

But why? When you compare the sum of usage against the capacity, you want
matching units. Otherwise your usage will far exceed capacity in the
presence of RT tasks; that doesn't seem to make sense to me.

> We want to compare the utilization of the CPU (utilization_avg_contrib
> which is in the range [0..SCHED_LOAD_SCALE]) with available capacity
> (cpu_capacity which is in the range [0..cpu_capacity_orig])
> An utilization_avg_contrib equals to SCHED_LOAD_SCALE means that the
> CPU is fully utilized so all cpu_capacity_orig are used. so we scale
> the utilization_avg_contrib from [0..SCHED_LOAD_SCALE] into cpu_usage
> in the range [0..cpu_capacity_orig]

Right, I got that, the usage thing converts from 'utilization' to
fraction of capacity, so that we can then compare it against the total
capacity.

But if, as per the above, we were to use the same capacity for both
sides, it's a pointless factor and we could've immediately compared the
'utilization' number against its unit.
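
The cancellation argument above can be checked with a small user-space sketch. Both function names are hypothetical and SCHED_LOAD_SHIFT is assumed to be 10; the point is only that, for any non-zero capacity, scaling the utilization by a capacity and then comparing against that same capacity gives the same answer as comparing the raw utilization against SCHED_LOAD_SCALE.

```c
/* Assumed values for illustration; the kernel defines these itself. */
#define SCHED_LOAD_SHIFT 10
#define SCHED_LOAD_SCALE (1UL << SCHED_LOAD_SHIFT)

/* Hypothetical: scale into capacity units, then compare with capacity. */
static int full_after_scaling(unsigned long utilization,
			      unsigned long capacity)
{
	return ((utilization * capacity) >> SCHED_LOAD_SHIFT) >= capacity;
}

/* Hypothetical: compare the raw utilization against its own unit. */
static int full_direct(unsigned long utilization)
{
	return utilization >= SCHED_LOAD_SCALE;
}
```

The factor only matters when the two sides use different capacities, for example usage scaled by capacity_orig compared against a cpu_capacity that has been reduced by RT activity.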

> >> +static int get_cpu_usage(int cpu)
> >> +{
> >> +     unsigned long usage = cpu_rq(cpu)->cfs.utilization_load_avg;
> >> +     unsigned long capacity = capacity_orig_of(cpu);
> >> +
> >> +     if (usage >= SCHED_LOAD_SCALE)
> >> +             return capacity + 1;
> >
> > Like Morten I'm confused by that +1 thing.
> 
> OK. The goal was to point out the erroneous case where usage is out of
> range, but if it generates confusion, I can remove it.

Well, the fact that you clip makes that point, returning a value outside
of the specified range doesn't.
--
Vincent Guittot Oct. 10, 2014, 2:38 p.m. | #4
On 9 October 2014 17:12, Peter Zijlstra <peterz@infradead.org> wrote:

>> >> +static int get_cpu_usage(int cpu)
>> >> +{
>> >> +     unsigned long usage = cpu_rq(cpu)->cfs.utilization_load_avg;
>> >> +     unsigned long capacity = capacity_orig_of(cpu);
>> >> +
>> >> +     if (usage >= SCHED_LOAD_SCALE)
>> >> +             return capacity + 1;
>> >
>> > Like Morten I'm confused by that +1 thing.
>>
>> OK. The goal was to point out the erroneous case where usage is out of
>> range, but if it generates confusion, I can remove it.
>
> Well, the fact that you clip makes that point, returning a value outside
> of the specified range doesn't.

I meant removing the +1.
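
For reference, one possible shape of the function once the +1 is dropped and the units are documented is sketched below. This is not the follow-up patch, only a user-space illustration of what the review suggests: cpu_usage_clipped() is a hypothetical name, the inputs are passed as parameters instead of being read from the runqueue, and SCHED_LOAD_SHIFT is assumed to be 10.

```c
/* Assumed values for illustration; the kernel defines these itself. */
#define SCHED_LOAD_SHIFT 10
#define SCHED_LOAD_SCALE (1UL << SCHED_LOAD_SHIFT)

/*
 * Hypothetical variant of get_cpu_usage(): returns the usage in
 * capacity units, always within [0..capacity_orig]. An out-of-range
 * utilization is clipped to capacity_orig rather than reported as
 * capacity_orig + 1.
 */
static unsigned long cpu_usage_clipped(unsigned long utilization,
				       unsigned long capacity_orig)
{
	if (utilization >= SCHED_LOAD_SCALE)
		return capacity_orig;

	return (utilization * capacity_orig) >> SCHED_LOAD_SHIFT;
}
```

Clipping keeps the return value inside the documented range, so callers that sum the per-CPU values into a group total never see more than the group's original capacity.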
--

Patch

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d3e9067..7364ed4 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4551,6 +4551,17 @@  static int select_idle_sibling(struct task_struct *p, int target)
 	return target;
 }
 
+static int get_cpu_usage(int cpu)
+{
+	unsigned long usage = cpu_rq(cpu)->cfs.utilization_load_avg;
+	unsigned long capacity = capacity_orig_of(cpu);
+
+	if (usage >= SCHED_LOAD_SCALE)
+		return capacity + 1;
+
+	return (usage * capacity) >> SCHED_LOAD_SHIFT;
+}
+
 /*
  * select_task_rq_fair: Select target runqueue for the waking task in domains
  * that have the 'sd_flag' flag set. In practice, this is SD_BALANCE_WAKE,
@@ -5679,6 +5690,7 @@  struct sg_lb_stats {
 	unsigned long sum_weighted_load; /* Weighted load of group's tasks */
 	unsigned long load_per_task;
 	unsigned long group_capacity;
+	unsigned long group_usage; /* Total usage of the group */
 	unsigned int sum_nr_running; /* Nr tasks running in the group */
 	unsigned int group_capacity_factor;
 	unsigned int idle_cpus;
@@ -6053,6 +6065,7 @@  static inline void update_sg_lb_stats(struct lb_env *env,
 			load = source_load(i, load_idx);
 
 		sgs->group_load += load;
+		sgs->group_usage += get_cpu_usage(i);
 		sgs->sum_nr_running += rq->cfs.h_nr_running;
 
 		if (rq->nr_running > 1)