Message ID | 1425052454-25797-8-git-send-email-vincent.guittot@linaro.org |
---|---|
State | New |
On 3 March 2015 at 13:47, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
> On 27/02/15 15:54, Vincent Guittot wrote:
>> Monitor the usage level of each group of each sched_domain level. The usage is
>> the portion of cpu_capacity_orig that is currently used on a CPU or group of
>> CPUs. We use the utilization_load_avg to evaluate the usage level of each
>> group.
>>
>> The utilization_load_avg only takes into account the running time of the CFS
>> tasks on a CPU with a maximum value of SCHED_LOAD_SCALE when the CPU is fully
>> utilized. Nevertheless, we must cap utilization_load_avg which can be temporaly
>
> s/temporaly/temporally
>
>> greater than SCHED_LOAD_SCALE after the migration of a task on this CPU and
>> until the metrics are stabilized.
>>
>> The utilization_load_avg is in the range [0..SCHED_LOAD_SCALE] to reflect the
>> running load on the CPU whereas the available capacity for the CFS task is in
>> the range [0..cpu_capacity_orig]. In order to test if a CPU is fully utilized
>> by CFS tasks, we have to scale the utilization in the cpu_capacity_orig range
>> of the CPU to get the usage of the latter. The usage can then be compared with
>> the available capacity (ie cpu_capacity) to deduct the usage level of a CPU.
>>
>> The frequency scaling invariance of the usage is not taken into account in this
>> patch, it will be solved in another patch which will deal with frequency
>> scaling invariance on the running_load_avg.
>
> The use of underscores in running_load_avg implies to me that this is a
> data member of struct sched_avg or something similar. But there is no
> running_load_avg in the current code. However, I can see that
> sched_avg::*running_avg_sum* (and therefore
> cfs_rq::*utilization_load_avg*) are frequency scale invariant.
>
>>
>> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
>> Acked-by: Morten Rasmussen <morten.rasmussen@arm.com>
>> ---
>>  kernel/sched/fair.c | 29 +++++++++++++++++++++++++++++
>>  1 file changed, 29 insertions(+)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 10f84c3..faf61a2 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -4781,6 +4781,33 @@ static int select_idle_sibling(struct task_struct *p, int target)
>>  done:
>>  	return target;
>>  }
>> +/*
>> + * get_cpu_usage returns the amount of capacity of a CPU that is used by CFS
>> + * tasks. The unit of the return value must capacity so we can compare the
>
> s/must capacity/must be the one of capacity
>
>> + * usage with the capacity of the CPU that is available for CFS task (ie
>> + * cpu_capacity).
>> + * cfs.utilization_load_avg is the sum of running time of runnable tasks on a
>> + * CPU. It represents the amount of utilization of a CPU in the range
>> + * [0..SCHED_LOAD_SCALE]. The usage of a CPU can't be higher than the full
>> + * capacity of the CPU because it's about the running time on this CPU.
>> + * Nevertheless, cfs.utilization_load_avg can be higher than SCHED_LOAD_SCALE
>> + * because of unfortunate rounding in avg_period and running_load_avg or just
>> + * after migrating tasks until the average stabilizes with the new running
>> + * time. So we need to check that the usage stays into the range
>> + * [0..cpu_capacity_orig] and cap if necessary.
>> + * Without capping the usage, a group could be seen as overloaded (CPU0 usage
>> + * at 121% + CPU1 usage at 80%) whereas CPU1 has 20% of available capacity/
>
> s/capacity\//capacity.

I have resent the patch with typo correction

>
> [...]
>
> -- Dietmar
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Hi Vincent,

On 27 February 2015 at 23:54, Vincent Guittot <vincent.guittot@linaro.org> wrote:
> Monitor the usage level of each group of each sched_domain level. The usage is
> the portion of cpu_capacity_orig that is currently used on a CPU or group of
> CPUs. We use the utilization_load_avg to evaluate the usage level of each
> group.
>
> The utilization_load_avg only takes into account the running time of the CFS
> tasks on a CPU with a maximum value of SCHED_LOAD_SCALE when the CPU is fully
> utilized. Nevertheless, we must cap utilization_load_avg which can be temporaly
> greater than SCHED_LOAD_SCALE after the migration of a task on this CPU and
> until the metrics are stabilized.
>
> + * at 121% + CPU1 usage at 80%) whereas CPU1 has 20% of available capacity/
> + */
> +static int get_cpu_usage(int cpu)
> +{
> +	unsigned long usage = cpu_rq(cpu)->cfs.utilization_load_avg;
> +	unsigned long capacity = capacity_orig_of(cpu);
> +
> +	if (usage >= SCHED_LOAD_SCALE)
> +		return capacity;

Can "capacity" be greater than SCHED_LOAD_SCALE?
Why use SCHED_LOAD_SCALE instead of "capacity" in this judgement?

-Xunlei

> +
> +	return (usage * capacity) >> SCHED_LOAD_SHIFT;
> +}

On 27 March 2015 at 16:12, Xunlei Pang <pang.xunlei@linaro.org> wrote:
> Hi Vincent,
>
> On 27 February 2015 at 23:54, Vincent Guittot
> <vincent.guittot@linaro.org> wrote:
>> Monitor the usage level of each group of each sched_domain level. The usage is
>> the portion of cpu_capacity_orig that is currently used on a CPU or group of
>> CPUs. We use the utilization_load_avg to evaluate the usage level of each
>> group.
>>
>> The utilization_load_avg only takes into account the running time of the CFS
>> tasks on a CPU with a maximum value of SCHED_LOAD_SCALE when the CPU is fully
>> utilized. Nevertheless, we must cap utilization_load_avg which can be temporaly
>> greater than SCHED_LOAD_SCALE after the migration of a task on this CPU and
>> until the metrics are stabilized.
>>
>> + * at 121% + CPU1 usage at 80%) whereas CPU1 has 20% of available capacity/
>> + */
>> +static int get_cpu_usage(int cpu)
>> +{
>> +	unsigned long usage = cpu_rq(cpu)->cfs.utilization_load_avg;
>> +	unsigned long capacity = capacity_orig_of(cpu);
>> +
>> +	if (usage >= SCHED_LOAD_SCALE)
>> +		return capacity;
>
> Can "capacity" be greater than SCHED_LOAD_SCALE?
> Why use SCHED_LOAD_SCALE instead of "capacity" in this judgement?

Yes, SCHED_LOAD_SCALE is the default value but the capacity can be in
the range [1536:512] for arm as an example

>
> -Xunlei
>
>> +
>> +	return (usage * capacity) >> SCHED_LOAD_SHIFT;
>> +}

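The scaling discussed in this exchange can be illustrated with a small userspace sketch. It is not the kernel code: `cpu_usage_sketch()` is a hypothetical stand-in that takes the utilization and the original capacity as plain arguments instead of reading them from the runqueue, it returns `unsigned long` rather than `int`, and it assumes SCHED_LOAD_SHIFT is 10 (so SCHED_LOAD_SCALE is 1024). The capacities 1536 and 512 used in the note below are just the example bounds quoted in the reply.

```c
/* Userspace sketch only; names prefixed SKETCH_ are assumptions, not kernel symbols. */
#define SKETCH_LOAD_SHIFT 10UL                        /* assumed SCHED_LOAD_SHIFT */
#define SKETCH_LOAD_SCALE (1UL << SKETCH_LOAD_SHIFT)  /* assumed SCHED_LOAD_SCALE */

/*
 * Map a utilization in [0..SKETCH_LOAD_SCALE] onto [0..capacity_orig].
 * Utilization is compared against the scale, not against the capacity,
 * because the two values live in different ranges.
 */
static unsigned long cpu_usage_sketch(unsigned long util,
				      unsigned long capacity_orig)
{
	/* Utilization may transiently exceed the scale: cap at full capacity. */
	if (util >= SKETCH_LOAD_SCALE)
		return capacity_orig;

	return (util * capacity_orig) >> SKETCH_LOAD_SHIFT;
}
```

Under these assumptions, a utilization of 512 (half of the scale) maps to 768 on a CPU whose original capacity is 1536 and to 256 on a CPU whose original capacity is 512, while any utilization at or above 1024 returns the full original capacity.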
Vincent,

On 27 March 2015 at 23:37, Vincent Guittot <vincent.guittot@linaro.org> wrote:
> On 27 March 2015 at 16:12, Xunlei Pang <pang.xunlei@linaro.org> wrote:
>>> +static int get_cpu_usage(int cpu)
>>> +{
>>> +	unsigned long usage = cpu_rq(cpu)->cfs.utilization_load_avg;
>>> +	unsigned long capacity = capacity_orig_of(cpu);
>>> +
>>> +	if (usage >= SCHED_LOAD_SCALE)
>>> +		return capacity;
>>
>> Can "capacity" be greater than SCHED_LOAD_SCALE?
>> Why use SCHED_LOAD_SCALE instead of "capacity" in this judgement?
>
> Yes, SCHED_LOAD_SCALE is the default value but the capacity can be in
> the range [1536:512] for arm as an example

Right, I was confused between cpu capacity and
arch_scale_freq_capacity() in "Patch 04" then. Thanks.

-Xunlei

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 10f84c3..faf61a2 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4781,6 +4781,33 @@ static int select_idle_sibling(struct task_struct *p, int target)
 done:
 	return target;
 }
+/*
+ * get_cpu_usage returns the amount of capacity of a CPU that is used by CFS
+ * tasks. The unit of the return value must capacity so we can compare the
+ * usage with the capacity of the CPU that is available for CFS task (ie
+ * cpu_capacity).
+ * cfs.utilization_load_avg is the sum of running time of runnable tasks on a
+ * CPU. It represents the amount of utilization of a CPU in the range
+ * [0..SCHED_LOAD_SCALE]. The usage of a CPU can't be higher than the full
+ * capacity of the CPU because it's about the running time on this CPU.
+ * Nevertheless, cfs.utilization_load_avg can be higher than SCHED_LOAD_SCALE
+ * because of unfortunate rounding in avg_period and running_load_avg or just
+ * after migrating tasks until the average stabilizes with the new running
+ * time. So we need to check that the usage stays into the range
+ * [0..cpu_capacity_orig] and cap if necessary.
+ * Without capping the usage, a group could be seen as overloaded (CPU0 usage
+ * at 121% + CPU1 usage at 80%) whereas CPU1 has 20% of available capacity/
+ */
+static int get_cpu_usage(int cpu)
+{
+	unsigned long usage = cpu_rq(cpu)->cfs.utilization_load_avg;
+	unsigned long capacity = capacity_orig_of(cpu);
+
+	if (usage >= SCHED_LOAD_SCALE)
+		return capacity;
+
+	return (usage * capacity) >> SCHED_LOAD_SHIFT;
+}
 
 /*
  * select_task_rq_fair: Select target runqueue for the waking task in domains
@@ -5907,6 +5934,7 @@ struct sg_lb_stats {
 	unsigned long sum_weighted_load; /* Weighted load of group's tasks */
 	unsigned long load_per_task;
 	unsigned long group_capacity;
+	unsigned long group_usage; /* Total usage of the group */
 	unsigned int sum_nr_running; /* Nr tasks running in the group */
 	unsigned int group_capacity_factor;
 	unsigned int idle_cpus;
@@ -6255,6 +6283,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 			load = source_load(i, load_idx);
 
 		sgs->group_load += load;
+		sgs->group_usage += get_cpu_usage(i);
 		sgs->sum_nr_running += rq->cfs.h_nr_running;
 
 		if (rq->nr_running > 1)
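The "121% + 80%" case from the comment in the patch can be worked through with a rough userspace sketch of the per-group accumulation. Everything here is illustrative: `struct cpu_sample`, `scaled_usage()` and `group_usage_sketch()` are hypothetical names, the `cap` flag exists only to compare the capped and uncapped sums, SCHED_LOAD_SHIFT is assumed to be 10, and both CPUs are assumed to have an original capacity of 1024.

```c
#include <stddef.h>

/* Userspace sketch only; names are assumptions, not kernel symbols. */
#define SKETCH_LOAD_SHIFT 10UL                        /* assumed SCHED_LOAD_SHIFT */
#define SKETCH_LOAD_SCALE (1UL << SKETCH_LOAD_SHIFT)  /* assumed SCHED_LOAD_SCALE */

/* Hypothetical per-CPU snapshot; the kernel reads these from the runqueue. */
struct cpu_sample {
	unsigned long util;           /* stands in for cfs.utilization_load_avg */
	unsigned long capacity_orig;  /* stands in for capacity_orig_of(cpu) */
};

/* Scale one CPU's utilization into capacity units, optionally capping it. */
static unsigned long scaled_usage(const struct cpu_sample *c, int cap)
{
	if (cap && c->util >= SKETCH_LOAD_SCALE)
		return c->capacity_orig;

	return (c->util * c->capacity_orig) >> SKETCH_LOAD_SHIFT;
}

/* Sum the usage of every CPU in a group, as group_usage does in the patch. */
static unsigned long group_usage_sketch(const struct cpu_sample *cpus,
					size_t n, int cap)
{
	unsigned long sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += scaled_usage(&cpus[i], cap);

	return sum;
}
```

With CPU0 at a utilization of 1239 (about 121% of 1024) and CPU1 at 819 (about 80%), the uncapped sum is 2058, which exceeds the combined original capacity of 2048, whereas the capped sum is 1843 and leaves CPU1's spare capacity visible.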