[v10,07/11] sched: get CPU's usage statistic

Message ID	1425052454-25797-8-git-send-email-vincent.guittot@linaro.org
State	New

Headers

MIME-Version: 1.0
Received-SPF: pass (google.com: domain of
	patch+caf_=patchwork-forward=linaro.org@linaro.org designates
	209.85.217.180 as permitted sender) client-ip=209.85.217.180; 
Received-SPF: none (google.com: linux-kernel-owner@vger.kernel.org does not
	designate permitted sender hosts) client-ip=209.132.180.67; 
From: Vincent Guittot <vincent.guittot@linaro.org>
To: peterz@infradead.org, mingo@kernel.org,
	linux-kernel@vger.kernel.org, preeti@linux.vnet.ibm.com,
	Morten.Rasmussen@arm.com, kamalesh@linux.vnet.ibm.com
Cc: riel@redhat.com, efault@gmx.de, nicolas.pitre@linaro.org,
	dietmar.eggemann@arm.com, linaro-kernel@lists.linaro.org,
	Vincent Guittot <vincent.guittot@linaro.org>
Subject: [PATCH v10 07/11] sched: get CPU's usage statistic
Date: Fri, 27 Feb 2015 16:54:10 +0100
Message-Id: <1425052454-25797-8-git-send-email-vincent.guittot@linaro.org>
In-Reply-To: <1425052454-25797-1-git-send-email-vincent.guittot@linaro.org>
References: <1425052454-25797-1-git-send-email-vincent.guittot@linaro.org>
Sender: linux-kernel-owner@vger.kernel.org
Precedence: list
Mailing-list: list patchwork-forward@linaro.org;
	contact patchwork-forward+owners@linaro.org

Commit Message

Vincent Guittot Feb. 27, 2015, 3:54 p.m. UTC

Monitor the usage level of each group of each sched_domain level. The usage is
the portion of cpu_capacity_orig that is currently used on a CPU or group of
CPUs. We use the utilization_load_avg to evaluate the usage level of each
group.

The utilization_load_avg only takes into account the running time of the CFS
tasks on a CPU, with a maximum value of SCHED_LOAD_SCALE when the CPU is fully
utilized. Nevertheless, we must cap utilization_load_avg, which can be
temporarily greater than SCHED_LOAD_SCALE after the migration of a task to this
CPU, until the metrics have stabilized.

The utilization_load_avg is in the range [0..SCHED_LOAD_SCALE] to reflect the
running load on the CPU, whereas the available capacity for CFS tasks is in
the range [0..cpu_capacity_orig]. In order to test if a CPU is fully utilized
by CFS tasks, we have to scale the utilization into the cpu_capacity_orig range
of the CPU to get its usage. The usage can then be compared with the available
capacity (i.e. cpu_capacity) to deduce the usage level of a CPU.

The frequency scaling invariance of the usage is not taken into account in this
patch; it will be addressed in another patch which will deal with frequency
scaling invariance on the running_load_avg.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Acked-by: Morten Rasmussen <morten.rasmussen@arm.com>
---
 kernel/sched/fair.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

Comments

Vincent Guittot March 4, 2015, 7:53 a.m. UTC | #1

On 3 March 2015 at 13:47, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
> On 27/02/15 15:54, Vincent Guittot wrote:
>> Monitor the usage level of each group of each sched_domain level. The usage is
>> the portion of cpu_capacity_orig that is currently used on a CPU or group of
>> CPUs. We use the utilization_load_avg to evaluate the usage level of each
>> group.
>>
>> The utilization_load_avg only takes into account the running time of the CFS
>> tasks on a CPU with a maximum value of SCHED_LOAD_SCALE when the CPU is fully
>> utilized. Nevertheless, we must cap utilization_load_avg which can be temporaly
>
> s/temporaly/temporally
>
>> greater than SCHED_LOAD_SCALE after the migration of a task on this CPU and
>> until the metrics are stabilized.
>>
>> The utilization_load_avg is in the range [0..SCHED_LOAD_SCALE] to reflect the
>> running load on the CPU whereas the available capacity for the CFS task is in
>> the range [0..cpu_capacity_orig]. In order to test if a CPU is fully utilized
>> by CFS tasks, we have to scale the utilization in the cpu_capacity_orig range
>> of the CPU to get the usage of the latter. The usage can then be compared with
>> the available capacity (ie cpu_capacity) to deduct the usage level of a CPU.
>>
>> The frequency scaling invariance of the usage is not taken into account in this
>> patch, it will be solved in another patch which will deal with frequency
>> scaling invariance on the running_load_avg.
>
> The use of underscores in running_load_avg implies to me that this is a
> data member of struct sched_avg or something similar. But there is no
> running_load_avg in the current code. However, I can see that
> sched_avg::*running_avg_sum* (and therefore
> cfs_rq::*utilization_load_avg*) are frequency scale invariant.
>
>>
>> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
>> Acked-by: Morten Rasmussen <morten.rasmussen@arm.com>
>> ---
>>  kernel/sched/fair.c | 29 +++++++++++++++++++++++++++++
>>  1 file changed, 29 insertions(+)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 10f84c3..faf61a2 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -4781,6 +4781,33 @@ static int select_idle_sibling(struct task_struct *p, int target)
>>  done:
>>       return target;
>>  }
>> +/*
>> + * get_cpu_usage returns the amount of capacity of a CPU that is used by CFS
>> + * tasks. The unit of the return value must capacity so we can compare the
>
> s/must capacity/must be the one of capacity
>
>> + * usage with the capacity of the CPU that is available for CFS task (ie
>> + * cpu_capacity).
>> + * cfs.utilization_load_avg is the sum of running time of runnable tasks on a
>> + * CPU. It represents the amount of utilization of a CPU in the range
>> + * [0..SCHED_LOAD_SCALE].  The usage of a CPU can't be higher than the full
>> + * capacity of the CPU because it's about the running time on this CPU.
>> + * Nevertheless, cfs.utilization_load_avg can be higher than SCHED_LOAD_SCALE
>> + * because of unfortunate rounding in avg_period and running_load_avg or just
>> + * after migrating tasks until the average stabilizes with the new running
>> + * time. So we need to check that the usage stays into the range
>> + * [0..cpu_capacity_orig] and cap if necessary.
>> + * Without capping the usage, a group could be seen as overloaded (CPU0 usage
>> + * at 121% + CPU1 usage at 80%) whereas CPU1 has 20% of available capacity/
>
> s/capacity\//capacity.

I have resent the patch with the typos corrected.

>
> [...]
>
> -- Dietmar
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

pang.xunlei March 27, 2015, 3:12 p.m. UTC | #2

Hi Vincent,

On 27 February 2015 at 23:54, Vincent Guittot
<vincent.guittot@linaro.org> wrote:
> Monitor the usage level of each group of each sched_domain level. The usage is
> the portion of cpu_capacity_orig that is currently used on a CPU or group of
> CPUs. We use the utilization_load_avg to evaluate the usage level of each
> group.
>
> The utilization_load_avg only takes into account the running time of the CFS
> tasks on a CPU with a maximum value of SCHED_LOAD_SCALE when the CPU is fully
> utilized. Nevertheless, we must cap utilization_load_avg which can be temporaly
> greater than SCHED_LOAD_SCALE after the migration of a task on this CPU and
> until the metrics are stabilized.
>
> + * at 121% + CPU1 usage at 80%) whereas CPU1 has 20% of available capacity/
> + */
> +static int get_cpu_usage(int cpu)
> +{
> +       unsigned long usage = cpu_rq(cpu)->cfs.utilization_load_avg;
> +       unsigned long capacity = capacity_orig_of(cpu);
> +
> +       if (usage >= SCHED_LOAD_SCALE)
> +               return capacity;

Can "capacity" be greater than SCHED_LOAD_SCALE?
Why use SCHED_LOAD_SCALE instead of "capacity" in this judgement?

-Xunlei

> +
> +       return (usage * capacity) >> SCHED_LOAD_SHIFT;
> +}

Vincent Guittot March 27, 2015, 3:37 p.m. UTC | #3

On 27 March 2015 at 16:12, Xunlei Pang <pang.xunlei@linaro.org> wrote:
> Hi Vincent,
>
> On 27 February 2015 at 23:54, Vincent Guittot
> <vincent.guittot@linaro.org> wrote:
>> Monitor the usage level of each group of each sched_domain level. The usage is
>> the portion of cpu_capacity_orig that is currently used on a CPU or group of
>> CPUs. We use the utilization_load_avg to evaluate the usage level of each
>> group.
>>
>> The utilization_load_avg only takes into account the running time of the CFS
>> tasks on a CPU with a maximum value of SCHED_LOAD_SCALE when the CPU is fully
>> utilized. Nevertheless, we must cap utilization_load_avg which can be temporaly
>> greater than SCHED_LOAD_SCALE after the migration of a task on this CPU and
>> until the metrics are stabilized.
>>
>> + * at 121% + CPU1 usage at 80%) whereas CPU1 has 20% of available capacity/
>> + */
>> +static int get_cpu_usage(int cpu)
>> +{
>> +       unsigned long usage = cpu_rq(cpu)->cfs.utilization_load_avg;
>> +       unsigned long capacity = capacity_orig_of(cpu);
>> +
>> +       if (usage >= SCHED_LOAD_SCALE)
>> +               return capacity;
>
> Can "capacity" be greater than SCHED_LOAD_SCALE?
> Why use SCHED_LOAD_SCALE instead of "capacity" in this judgement?

Yes, SCHED_LOAD_SCALE is the default value but the capacity can be in
the range [1536:512] for arm as an example

>
> -Xunlei
>
>> +
>> +       return (usage * capacity) >> SCHED_LOAD_SHIFT;
>> +}

pang.xunlei April 1, 2015, 3:22 a.m. UTC | #4

Vincent,
On 27 March 2015 at 23:37, Vincent Guittot <vincent.guittot@linaro.org> wrote:
> On 27 March 2015 at 16:12, Xunlei Pang <pang.xunlei@linaro.org> wrote:
>>> +static int get_cpu_usage(int cpu)
>>> +{
>>> +       unsigned long usage = cpu_rq(cpu)->cfs.utilization_load_avg;
>>> +       unsigned long capacity = capacity_orig_of(cpu);
>>> +
>>> +       if (usage >= SCHED_LOAD_SCALE)
>>> +               return capacity;
>>
>> Can "capacity" be greater than SCHED_LOAD_SCALE?
>> Why use SCHED_LOAD_SCALE instead of "capacity" in this judgement?
>
> Yes, SCHED_LOAD_SCALE is the default value but the capacity can be in
> the range [1536:512] for arm as an example

Right, I was confused between cpu capacity and
arch_scale_freq_capacity() in "Patch 04"  then. Thanks.

-Xunlei

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 10f84c3..faf61a2 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4781,6 +4781,33 @@  static int select_idle_sibling(struct task_struct *p, int target)
 done:
 	return target;
 }
+/*
+ * get_cpu_usage returns the amount of capacity of a CPU that is used by CFS
+ * tasks. The unit of the return value must be the one of capacity so we can
+ * compare the usage with the capacity of the CPU that is available for CFS
+ * tasks (i.e. cpu_capacity).
+ * cfs.utilization_load_avg is the sum of running time of runnable tasks on a
+ * CPU. It represents the amount of utilization of a CPU in the range
+ * [0..SCHED_LOAD_SCALE].  The usage of a CPU can't be higher than the full
+ * capacity of the CPU because it's about the running time on this CPU.
+ * Nevertheless, cfs.utilization_load_avg can be higher than SCHED_LOAD_SCALE
+ * because of unfortunate rounding in avg_period and running_load_avg or just
+ * after migrating tasks until the average stabilizes with the new running
+ * time. So we need to check that the usage stays within the range
+ * [0..cpu_capacity_orig] and cap if necessary.
+ * Without capping the usage, a group could be seen as overloaded (CPU0 usage
+ * at 121% + CPU1 usage at 80%) whereas CPU1 has 20% of available capacity.
+ */
+static int get_cpu_usage(int cpu)
+{
+	unsigned long usage = cpu_rq(cpu)->cfs.utilization_load_avg;
+	unsigned long capacity = capacity_orig_of(cpu);
+
+	if (usage >= SCHED_LOAD_SCALE)
+		return capacity;
+
+	return (usage * capacity) >> SCHED_LOAD_SHIFT;
+}
 
 /*
  * select_task_rq_fair: Select target runqueue for the waking task in domains
@@ -5907,6 +5934,7 @@  struct sg_lb_stats {
 	unsigned long sum_weighted_load; /* Weighted load of group's tasks */
 	unsigned long load_per_task;
 	unsigned long group_capacity;
+	unsigned long group_usage; /* Total usage of the group */
 	unsigned int sum_nr_running; /* Nr tasks running in the group */
 	unsigned int group_capacity_factor;
 	unsigned int idle_cpus;
@@ -6255,6 +6283,7 @@  static inline void update_sg_lb_stats(struct lb_env *env,
 			load = source_load(i, load_idx);
 
 		sgs->group_load += load;
+		sgs->group_usage += get_cpu_usage(i);
 		sgs->sum_nr_running += rq->cfs.h_nr_running;
 
 		if (rq->nr_running > 1)