
[v2,6/8] sched/fair: use load instead of runnable load

Message ID 1564670424-26023-7-git-send-email-vincent.guittot@linaro.org
State New
Series sched/fair: rework the CFS load balance

Commit Message

Vincent Guittot Aug. 1, 2019, 2:40 p.m. UTC
Runnable load has been introduced to take into account the case where
blocked load biases the load balance decision, which was selecting an
underutilized group with a huge blocked load whereas other groups were
overloaded.

The load is now only used when groups are overloaded. In this case,
it's worth being conservative and taking into account the sleeping
tasks that might wake up on the CPU.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>

---
 kernel/sched/fair.c | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

-- 
2.7.4
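
For readers less familiar with the two metrics, here is a conceptual
sketch of how the two averages diverge when a task sleeps. This is not
the kernel's PELT code: the toy_* names are made up and decay/scaling
are ignored entirely; it only illustrates why load_avg is the more
conservative metric when sleeping tasks may wake up on the CPU.

/*
 * Conceptual sketch only -- not the PELT implementation.  A task that
 * goes to sleep keeps contributing to load_avg but drops out of
 * runnable_load_avg.
 */
struct toy_cfs_rq {
	unsigned long load_avg;			/* runnable + blocked contributions */
	unsigned long runnable_load_avg;	/* enqueued tasks only */
};

static void toy_enqueue(struct toy_cfs_rq *cfs_rq, unsigned long task_load)
{
	cfs_rq->load_avg += task_load;
	cfs_rq->runnable_load_avg += task_load;
}

static void toy_dequeue_sleep(struct toy_cfs_rq *cfs_rq, unsigned long task_load)
{
	/*
	 * The sleeping task leaves the runnable sum only; its (decaying)
	 * contribution would remain in load_avg.
	 */
	cfs_rq->runnable_load_avg -= task_load;
}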

Comments

Peter Zijlstra Aug. 6, 2019, 4:07 p.m. UTC | #1
On Thu, Aug 01, 2019 at 04:40:22PM +0200, Vincent Guittot wrote:
> Runnable load has been introduced to take into account the case where
> blocked load biases the load balance decision, which was selecting an
> underutilized group with a huge blocked load whereas other groups were
> overloaded.
>
> The load is now only used when groups are overloaded. In this case,
> it's worth being conservative and taking into account the sleeping
> tasks that might wake up on the CPU.

This one scares me a little. I have the feeling I'm missing/forgetting
something.

Also; while the regular load-balance (find-busiest) stuff is now all
aware of idle, this change also impacts wake_affine and find_idlest, and
they've not changed.

Vincent Guittot Aug. 26, 2019, 3:45 p.m. UTC | #2
On Tue, 6 Aug 2019 at 18:07, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Aug 01, 2019 at 04:40:22PM +0200, Vincent Guittot wrote:
> > Runnable load has been introduced to take into account the case where
> > blocked load biases the load balance decision, which was selecting an
> > underutilized group with a huge blocked load whereas other groups were
> > overloaded.
> >
> > The load is now only used when groups are overloaded. In this case,
> > it's worth being conservative and taking into account the sleeping
> > tasks that might wake up on the CPU.
>
> This one scares me a little. I have the feeling I'm missing/forgetting
> something.
>
> Also; while the regular load-balance (find-busiest) stuff is now all
> aware of idle, this change also impacts wake_affine and find_idlest, and
> they've not changed.

Yes. I thought about this a bit before applying this change to all
cpu_runnable_load() users.
-For wake_affine, it starts by looking for an idle CPU with
wake_affine_idle(), which is similar in some way to the new load balance
approach, even if it might need more changes to align both paths more
closely.
-For find_idlest, it looks first at the group with the most spare
capacity and falls back to load (see the simplified sketch below). That
being said, I have overlooked that both load and runnable load are
already used there, so the cpu load is now accumulated twice instead of
once.
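
For illustration, a simplified sketch of that ordering follows. The
toy_group struct and helpers are hypothetical and omit the kernel's
imbalance, NUMA and local-group handling; the point is only that spare
capacity is compared first, and load is used as the fallback when no
group has spare capacity left.

#include <stddef.h>

/* Hypothetical per-group summary, not the kernel's sg_lb_stats. */
struct toy_group {
	unsigned long spare_cap;	/* capacity minus utilization */
	unsigned long load;		/* sum of the cpu loads in the group */
};

static struct toy_group *toy_find_idlest(struct toy_group *groups, int nr)
{
	struct toy_group *best_cap = NULL, *least_loaded = NULL;
	int i;

	for (i = 0; i < nr; i++) {
		struct toy_group *g = &groups[i];

		if (!best_cap || g->spare_cap > best_cap->spare_cap)
			best_cap = g;
		if (!least_loaded || g->load < least_loaded->load)
			least_loaded = g;
	}

	/* Spare capacity has priority; load is only the fallback. */
	if (best_cap && best_cap->spare_cap > 0)
		return best_cap;

	return least_loaded;
}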

I can't remember the detailed results, but this patch is responsible for
part of the performance improvements described in the cover letter.

Also, I haven't touched the NUMA stats, which still use runnable_load.
But this should be addressed too.

Patch

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f05f1ad..dfaf0b8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5445,6 +5445,11 @@  static unsigned long cpu_runnable_load(struct rq *rq)
 	return cfs_rq_runnable_load_avg(&rq->cfs);
 }
 
+static unsigned long cpu_load(struct rq *rq)
+{
+	return cfs_rq_load_avg(&rq->cfs);
+}
+
 static unsigned long capacity_of(int cpu)
 {
 	return cpu_rq(cpu)->cpu_capacity;
@@ -5540,7 +5545,7 @@  wake_affine_weight(struct sched_domain *sd, struct task_struct *p,
 	s64 this_eff_load, prev_eff_load;
 	unsigned long task_load;
 
-	this_eff_load = cpu_runnable_load(cpu_rq(this_cpu));
+	this_eff_load = cpu_load(cpu_rq(this_cpu));
 
 	if (sync) {
 		unsigned long current_load = task_h_load(current);
@@ -5558,7 +5563,7 @@  wake_affine_weight(struct sched_domain *sd, struct task_struct *p,
 		this_eff_load *= 100;
 	this_eff_load *= capacity_of(prev_cpu);
 
-	prev_eff_load = cpu_runnable_load(cpu_rq(prev_cpu));
+	prev_eff_load = cpu_load(cpu_rq(prev_cpu));
 	prev_eff_load -= task_load;
 	if (sched_feat(WA_BIAS))
 		prev_eff_load *= 100 + (sd->imbalance_pct - 100) / 2;
@@ -5646,7 +5651,7 @@  find_idlest_group(struct sched_domain *sd, struct task_struct *p,
 		max_spare_cap = 0;
 
 		for_each_cpu(i, sched_group_span(group)) {
-			load = cpu_runnable_load(cpu_rq(i));
+			load = cpu_load(cpu_rq(i));
 			runnable_load += load;
 
 			avg_load += cfs_rq_load_avg(&cpu_rq(i)->cfs);
@@ -5787,7 +5792,7 @@  find_idlest_group_cpu(struct sched_group *group, struct task_struct *p, int this
 				continue;
 			}
 
-			load = cpu_runnable_load(cpu_rq(i));
+			load = cpu_load(cpu_rq(i));
 			if (load < min_load) {
 				min_load = load;
 				least_loaded_cpu = i;
@@ -8128,7 +8133,7 @@  static inline void update_sg_lb_stats(struct lb_env *env,
 		if ((env->flags & LBF_NOHZ_STATS) && update_nohz_stats(rq, false))
 			env->flags |= LBF_NOHZ_AGAIN;
 
-		sgs->group_load += cpu_runnable_load(rq);
+		sgs->group_load += cpu_load(rq);
 		sgs->group_util += cpu_util(i);
 		sgs->sum_h_nr_running += rq->cfs.h_nr_running;
 
@@ -8569,7 +8574,7 @@  static struct sched_group *find_busiest_group(struct lb_env *env)
 	init_sd_lb_stats(&sds);
 
 	/*
-	 * Compute the various statistics relavent for load balancing at
+	 * Compute the various statistics relevant for load balancing at
 	 * this level.
 	 */
 	update_sd_lb_stats(env, &sds);
@@ -8748,10 +8753,10 @@  static struct rq *find_busiest_queue(struct lb_env *env,
 
 		case migrate_load:
 			/*
-			 * When comparing with load imbalance, use cpu_runnable_load()
+			 * When comparing with load imbalance, use cpu_load()
 			 * which is not scaled with the CPU capacity.
 			 */
-			load = cpu_runnable_load(rq);
+			load = cpu_load(rq);
 
 			if (nr_running == 1 && load > env->imbalance &&
 			    !check_cpu_capacity(rq, env->sd))
@@ -8759,7 +8764,7 @@  static struct rq *find_busiest_queue(struct lb_env *env,
 
 			/*
 			 * For the load comparisons with the other CPU's, consider
-			 * the cpu_runnable_load() scaled with the CPU capacity, so
+			 * the cpu_load() scaled with the CPU capacity, so
 			 * that the load can be moved away from the CPU that is
 			 * potentially running at a lower capacity.
 			 *