diff mbox series

sched/fair: clean up asym packing

Message ID 1559571436-29091-1-git-send-email-vincent.guittot@linaro.org
State Superseded
Headers show
Series sched/fair: clean up asym packing | expand

Commit Message

Vincent Guittot June 3, 2019, 2:17 p.m. UTC
Clean up asym packing to follow the default load balance behavior:
- classify the group by creating a group_asym_capacity field.
- calculate the imbalance in calculate_imbalance() instead of bypassing it.

We don't need to test twice same conditions anymore to detect asym packing
and we consolidate the calculation of imbalance in calculate_imbalance().

There is no functional changes.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>

---
 kernel/sched/fair.c | 63 ++++++++++++++---------------------------------------
 1 file changed, 16 insertions(+), 47 deletions(-)

-- 
2.7.4

Comments

Valentin Schneider June 3, 2019, 6:15 p.m. UTC | #1
Hi,

On 03/06/2019 15:17, Vincent Guittot wrote:
> Clean up asym packing to follow the default load balance behavior:

> - classify the group by creating a group_asym_capacity field.


Being nitpicky here, this doesn't classify the group in the usual way
- it doesn't get a specific group_type value (group_classify()). So maybe
"classify" isn't the best term here.

Also, why tag this group in update_sd_pick_busiest()? It would make more
sense to do so in update_sg_lb_stats() like with the other sg_lb_stats fields:

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 93c24473c8a0..537710026c3a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8298,6 +8298,10 @@ static inline void update_sg_lb_stats(struct lb_env *env,
                }
        }
 
+       if (sgs->sum_nr_running &&
+           sched_asym_prefer(env->dst_cpu, group->asym_prefer_cpu))
+               sgs->group_asym_capacity = 1;
+
        /* Adjust by relative CPU capacity of the group */
        sgs->group_capacity = group->sgc->capacity;
        sgs->avg_load = (sgs->group_load*SCHED_CAPACITY_SCALE) / sgs->group_capacity;
@@ -8391,9 +8395,7 @@ static bool update_sd_pick_busiest(struct lb_env *env,
         * perform better since they share less core resources.  Hence when we
         * have idle threads, we want them to be the higher ones.
         */
-       if (sgs->sum_nr_running &&
-           sched_asym_prefer(env->dst_cpu, sg->asym_prefer_cpu)) {
-               sgs->group_asym_capacity = 1;
+       if (sgs->group_asym_capacity) {
                if (!sds->busiest)
                        return true;
Vincent Guittot June 3, 2019, 6:32 p.m. UTC | #2
On Mon, 3 Jun 2019 at 20:15, Valentin Schneider
<valentin.schneider@arm.com> wrote:
>

> Hi,

>

> On 03/06/2019 15:17, Vincent Guittot wrote:

> > Clean up asym packing to follow the default load balance behavior:

> > - classify the group by creating a group_asym_capacity field.

>

> Being nitpicky here, this doesn't classify the group in the usual way

> - it doesn't get a specific group_type value (group_classify()). So maybe

> "classify" isn't the best term here.


My original goal was to add a group type to classify the group but
this would have broken the current behavior whereas I only want to
move code

>

> Also, why tag this group in update_sd_pick_busiest()? It would make more

> sense to do so in update_sg_lb_stats() like with the other sg_lb_stats fields:


With your proposal below, the test is called for every groups'
statistic update whereas it is only done lastly after checking other
rules in the current code and I don't want to modify the current
behavior but only move code to set imbalance in calculate imbalance.

A bigger cleanup will come in next steps

>

> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c

> index 93c24473c8a0..537710026c3a 100644

> --- a/kernel/sched/fair.c

> +++ b/kernel/sched/fair.c

> @@ -8298,6 +8298,10 @@ static inline void update_sg_lb_stats(struct lb_env *env,

>                 }

>         }

>

> +       if (sgs->sum_nr_running &&

> +           sched_asym_prefer(env->dst_cpu, group->asym_prefer_cpu))

> +               sgs->group_asym_capacity = 1;

> +

>         /* Adjust by relative CPU capacity of the group */

>         sgs->group_capacity = group->sgc->capacity;

>         sgs->avg_load = (sgs->group_load*SCHED_CAPACITY_SCALE) / sgs->group_capacity;

> @@ -8391,9 +8395,7 @@ static bool update_sd_pick_busiest(struct lb_env *env,

>          * perform better since they share less core resources.  Hence when we

>          * have idle threads, we want them to be the higher ones.

>          */

> -       if (sgs->sum_nr_running &&

> -           sched_asym_prefer(env->dst_cpu, sg->asym_prefer_cpu)) {

> -               sgs->group_asym_capacity = 1;

> +       if (sgs->group_asym_capacity) {

>                 if (!sds->busiest)

>                         return true;

>
Valentin Schneider June 4, 2019, 5:24 p.m. UTC | #3
On 03/06/2019 19:32, Vincent Guittot wrote:
> On Mon, 3 Jun 2019 at 20:15, Valentin Schneider

[...]
> My original goal was to add a group type to classify the group but

> this would have broken the current behavior whereas I only want to

> move code

> 

>>

>> Also, why tag this group in update_sd_pick_busiest()? It would make more

>> sense to do so in update_sg_lb_stats() like with the other sg_lb_stats fields:

> 

> With your proposal below, the test is called for every groups'

> statistic update whereas it is only done lastly after checking other

> rules in the current code and I don't want to modify the current

> behavior but only move code to set imbalance in calculate imbalance.

> 


Adding a new group_type would make sense. From a behavioral point of view
your change is fine, but from a logical one it sits halfway between being
a new stat and being a new group_type. I'd rather see a new group_type,
though as you said that's a different topic than cleaning up duplicate
operations.

> A bigger cleanup will come in next steps

> 

[...]
diff mbox series

Patch

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f35930f..93c2447 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7888,6 +7888,7 @@  struct sg_lb_stats {
 	unsigned int group_weight;
 	enum group_type group_type;
 	int group_no_capacity;
+	int group_asym_capacity;
 	unsigned long group_misfit_task_load; /* A CPU has a task too big for its capacity */
 #ifdef CONFIG_NUMA_BALANCING
 	unsigned int nr_numa_running;
@@ -8382,9 +8383,17 @@  static bool update_sd_pick_busiest(struct lb_env *env,
 	 * ASYM_PACKING needs to move all the work to the highest
 	 * prority CPUs in the group, therefore mark all groups
 	 * of lower priority than ourself as busy.
+	 *
+	 * This is primarily intended to used at the sibling level.  Some
+	 * cores like POWER7 prefer to use lower numbered SMT threads.  In the
+	 * case of POWER7, it can move to lower SMT modes only when higher
+	 * threads are idle.  When in lower SMT modes, the threads will
+	 * perform better since they share less core resources.  Hence when we
+	 * have idle threads, we want them to be the higher ones.
 	 */
 	if (sgs->sum_nr_running &&
 	    sched_asym_prefer(env->dst_cpu, sg->asym_prefer_cpu)) {
+		sgs->group_asym_capacity = 1;
 		if (!sds->busiest)
 			return true;
 
@@ -8522,51 +8531,6 @@  static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
 }
 
 /**
- * check_asym_packing - Check to see if the group is packed into the
- *			sched domain.
- *
- * This is primarily intended to used at the sibling level.  Some
- * cores like POWER7 prefer to use lower numbered SMT threads.  In the
- * case of POWER7, it can move to lower SMT modes only when higher
- * threads are idle.  When in lower SMT modes, the threads will
- * perform better since they share less core resources.  Hence when we
- * have idle threads, we want them to be the higher ones.
- *
- * This packing function is run on idle threads.  It checks to see if
- * the busiest CPU in this domain (core in the P7 case) has a higher
- * CPU number than the packing function is being run on.  Here we are
- * assuming lower CPU number will be equivalent to lower a SMT thread
- * number.
- *
- * Return: 1 when packing is required and a task should be moved to
- * this CPU.  The amount of the imbalance is returned in env->imbalance.
- *
- * @env: The load balancing environment.
- * @sds: Statistics of the sched_domain which is to be packed
- */
-static int check_asym_packing(struct lb_env *env, struct sd_lb_stats *sds)
-{
-	int busiest_cpu;
-
-	if (!(env->sd->flags & SD_ASYM_PACKING))
-		return 0;
-
-	if (env->idle == CPU_NOT_IDLE)
-		return 0;
-
-	if (!sds->busiest)
-		return 0;
-
-	busiest_cpu = sds->busiest->asym_prefer_cpu;
-	if (sched_asym_prefer(busiest_cpu, env->dst_cpu))
-		return 0;
-
-	env->imbalance = sds->busiest_stat.group_load;
-
-	return 1;
-}
-
-/**
  * fix_small_imbalance - Calculate the minor imbalance that exists
  *			amongst the groups of a sched_domain, during
  *			load balancing.
@@ -8650,6 +8614,11 @@  static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
 	local = &sds->local_stat;
 	busiest = &sds->busiest_stat;
 
+	if (busiest->group_asym_capacity) {
+		env->imbalance = busiest->group_load;
+		return;
+	}
+
 	if (busiest->group_type == group_imbalanced) {
 		/*
 		 * In the group_imb case we cannot rely on group-wide averages
@@ -8754,8 +8723,8 @@  static struct sched_group *find_busiest_group(struct lb_env *env)
 	busiest = &sds.busiest_stat;
 
 	/* ASYM feature bypasses nice load balance check */
-	if (check_asym_packing(env, &sds))
-		return sds.busiest;
+	if (busiest->group_asym_capacity)
+		goto force_balance;
 
 	/* There is no busy sibling group to pull tasks from */
 	if (!sds.busiest || busiest->sum_nr_running == 0)