[v2,3/3] sched/fair: fix unnecessary increase of balance interval

Message ID: 1544803317-7610-4-git-send-email-vincent.guittot@linaro.org
State: New
Series: sched/fair: some fixes for asym_packing

Commit Message

Vincent Guittot Dec. 14, 2018, 4:01 p.m. UTC
In case of active balance, we increase the balance interval to cover
pinned tasks cases not covered by all_pinned logic. Nevertheless, the
active migration triggered by asym packing should be treated as the
normal unbalanced case and reset the interval to the default value,
otherwise active migration for asym_packing can easily be delayed for
hundreds of ms because of this all_pinned detection mechanism.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>

---
 kernel/sched/fair.c | 27 +++++++++++++++------------
 1 file changed, 15 insertions(+), 12 deletions(-)

-- 
2.7.4

Comments

Valentin Schneider Dec. 17, 2018, 4:56 p.m. UTC | #1
Hi Vincent,

About time I had a look at this one...

On 14/12/2018 16:01, Vincent Guittot wrote:
> In case of active balance, we increase the balance interval to cover
> pinned tasks cases not covered by all_pinned logic.

AFAIUI the balance increase is there to have plenty of time to
stop the task before going through another load_balance().

Seeing as there is a cpus_allowed check that leads to

    env.flags |= LBF_ALL_PINNED;
    goto out_one_pinned;

in the active balancing part of load_balance(), the condition you're
changing should never be hit when we have pinned tasks. So you may want
to rephrase that bit.
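
For reference, the check in question sits in the active balancing part
of load_balance() and looks roughly like this (paraphrased sketch, not
the verbatim code):

    /* before kicking the stopper, check the running task can move */
    if (!cpumask_test_cpu(this_cpu, &busiest->curr->cpus_allowed)) {
            raw_spin_unlock_irqrestore(&busiest->lock, flags);
            env.flags |= LBF_ALL_PINNED;
            goto out_one_pinned;
    }

i.e. if the running task can't go to this_cpu, we bail out before ever
kicking the cpu stopper, and the back-off is handled at out_one_pinned.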

> Nevertheless, the active
> migration triggered by asym packing should be treated as the normal
> unbalanced case and reset the interval to the default value, otherwise
> active migration for asym_packing can easily be delayed for hundreds of
> ms because of this all_pinned detection mechanism.


Mmm so it's not exactly clear why we need this change. If we double the
interval because of a pinned task we wanted to active balance, well it's
just regular pinned task issues and we can't do much about it.

The only scenario I can think of is if you had a workload where you wanted
to do an active balance in several successive load_balance(), in which case
you would keep increasing the interval even though you do migrate a task
every time (which would harm every subsequent active_balance).

In that case every active_balance "user" (pressured CPU, misfit) is
affected, so maybe what we want is something like this:

-----8<-----
@@ -9136,13 +9149,11 @@ static int load_balance(int this_cpu, struct rq *this_rq,
                sd->balance_interval = sd->min_interval;
        } else {
                /*
-                * If we've begun active balancing, start to back off. This
-                * case may not be covered by the all_pinned logic if there
-                * is only 1 task on the busy runqueue (because we don't call
-                * detach_tasks).
+                * If we've begun active balancing, start to back off.
+                * Don't increase too much in case we have more active balances
+                * coming up.
                 */
-               if (sd->balance_interval < sd->max_interval)
-                       sd->balance_interval *= 2;
+               sd->balance_interval = 2 * sd->min_interval;
        }
 
        goto out;
----->8-----

Maybe we want a larger threshold - truth be told, it all depends on how
long the cpu stopper can take and if that delay increase is still relevant
nowadays.

> 
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
>  kernel/sched/fair.c | 27 +++++++++++++++------------
>  1 file changed, 15 insertions(+), 12 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 9591e7a..487287e 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8857,21 +8857,24 @@ static struct rq *find_busiest_queue(struct lb_env *env,
>   */
>  #define MAX_PINNED_INTERVAL	512
>  
> +static inline bool
> +asym_active_balance(struct lb_env *env)
> +{
> +	/*
> +	 * ASYM_PACKING needs to force migrate tasks from busy but
> +	 * lower priority CPUs in order to pack all tasks in the
> +	 * highest priority CPUs.
> +	 */
> +	return env->idle != CPU_NOT_IDLE && (env->sd->flags & SD_ASYM_PACKING) &&
> +	       sched_asym_prefer(env->dst_cpu, env->src_cpu);
> +}
> +
>  static int need_active_balance(struct lb_env *env)
>  {
>  	struct sched_domain *sd = env->sd;
>  
> -	if (env->idle != CPU_NOT_IDLE) {
> -
> -		/*
> -		 * ASYM_PACKING needs to force migrate tasks from busy but
> -		 * lower priority CPUs in order to pack all tasks in the
> -		 * highest priority CPUs.
> -		 */
> -		if ((sd->flags & SD_ASYM_PACKING) &&
> -		    sched_asym_prefer(env->dst_cpu, env->src_cpu))
> -			return 1;
> -	}
> +	if (asym_active_balance(env))
> +		return 1;
>  
>  	/*
>  	 * The dst_cpu is idle and the src_cpu CPU has only 1 CFS task.
> @@ -9150,7 +9153,7 @@ static int load_balance(int this_cpu, struct rq *this_rq,
>  	} else
>  		sd->nr_balance_failed = 0;
>  
> -	if (likely(!active_balance)) {
> +	if (likely(!active_balance) || asym_active_balance(&env)) {


AFAICT this makes us reset the interval whenever we do an asym packing
active balance (if the task is pinned we would goto out_one_pinned).
This goes against the logic I had understood so far and that I explained
higher up.

>  		/* We were unbalanced, so reset the balancing interval */
>  		sd->balance_interval = sd->min_interval;
>  	} else {
> 
Vincent Guittot Dec. 18, 2018, 9:32 a.m. UTC | #2
Hi Valentin,

On Mon, 17 Dec 2018 at 17:56, Valentin Schneider
<valentin.schneider@arm.com> wrote:
> 
> Hi Vincent,
> 
> About time I had a look at this one...
> 
> On 14/12/2018 16:01, Vincent Guittot wrote:
> > In case of active balance, we increase the balance interval to cover
> > pinned tasks cases not covered by all_pinned logic.
> 
> AFAIUI the balance increase is there to have plenty of time to
> stop the task before going through another load_balance().
> 
> Seeing as there is a cpus_allowed check that leads to
> 
>     env.flags |= LBF_ALL_PINNED;
>     goto out_one_pinned;
> 
> in the active balancing part of load_balance(), the condition you're
> changing should never be hit when we have pinned tasks. So you may want
> to rephrase that bit.


In this asym packing case, it has nothing to do with pinned tasks, and
that's the root cause of the problem: the active balance triggered by
asym packing is wrongly assumed to be an active balance due to pinned
task(s), and the load balance interval is increased without any reason.

> 
> > Nevertheless, the active
> > migration triggered by asym packing should be treated as the normal
> > unbalanced case and reset the interval to the default value, otherwise
> > active migration for asym_packing can easily be delayed for hundreds of
> > ms because of this all_pinned detection mechanism.
> 
> Mmm so it's not exactly clear why we need this change. If we double the
> interval because of a pinned task we wanted to active balance, well it's
> just regular pinned task issues and we can't do much about it.


As explained above, it's not a pinned task case.

> 
> The only scenario I can think of is if you had a workload where you wanted
> to do an active balance in several successive load_balance(), in which case
> you would keep increasing the interval even though you do migrate a task
> every time (which would harm every subsequent active_balance).
> 
> In that case every active_balance "user" (pressured CPU, misfit) is
> affected, so maybe what we want is something like this:
> 
> -----8<-----
> @@ -9136,13 +9149,11 @@ static int load_balance(int this_cpu, struct rq *this_rq,
>                 sd->balance_interval = sd->min_interval;
>         } else {
>                 /*
> -                * If we've begun active balancing, start to back off. This
> -                * case may not be covered by the all_pinned logic if there
> -                * is only 1 task on the busy runqueue (because we don't call
> -                * detach_tasks).
> +                * If we've begun active balancing, start to back off.
> +                * Don't increase too much in case we have more active balances
> +                * coming up.
>                  */
> -               if (sd->balance_interval < sd->max_interval)
> -                       sd->balance_interval *= 2;
> +               sd->balance_interval = 2 * sd->min_interval;
>         }
> 
>         goto out;
> ----->8-----
> 
> Maybe we want a larger threshold - truth be told, it all depends on how
> long the cpu stopper can take and if that delay increase is still relevant
> nowadays.


Hmm, the increase of the balance interval is not linked to the cpu
stopper; it's there to increase the load balance interval when we know
that there is no possible load balance to perform.

Regards,
Vincent
> 
> >
> > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> > ---
> >  kernel/sched/fair.c | 27 +++++++++++++++------------
> >  1 file changed, 15 insertions(+), 12 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 9591e7a..487287e 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -8857,21 +8857,24 @@ static struct rq *find_busiest_queue(struct lb_env *env,
> >   */
> >  #define MAX_PINNED_INTERVAL  512
> >
> > +static inline bool
> > +asym_active_balance(struct lb_env *env)
> > +{
> > +     /*
> > +      * ASYM_PACKING needs to force migrate tasks from busy but
> > +      * lower priority CPUs in order to pack all tasks in the
> > +      * highest priority CPUs.
> > +      */
> > +     return env->idle != CPU_NOT_IDLE && (env->sd->flags & SD_ASYM_PACKING) &&
> > +            sched_asym_prefer(env->dst_cpu, env->src_cpu);
> > +}
> > +
> >  static int need_active_balance(struct lb_env *env)
> >  {
> >       struct sched_domain *sd = env->sd;
> >
> > -     if (env->idle != CPU_NOT_IDLE) {
> > -
> > -             /*
> > -              * ASYM_PACKING needs to force migrate tasks from busy but
> > -              * lower priority CPUs in order to pack all tasks in the
> > -              * highest priority CPUs.
> > -              */
> > -             if ((sd->flags & SD_ASYM_PACKING) &&
> > -                 sched_asym_prefer(env->dst_cpu, env->src_cpu))
> > -                     return 1;
> > -     }
> > +     if (asym_active_balance(env))
> > +             return 1;
> >
> >       /*
> >        * The dst_cpu is idle and the src_cpu CPU has only 1 CFS task.
> > @@ -9150,7 +9153,7 @@ static int load_balance(int this_cpu, struct rq *this_rq,
> >       } else
> >               sd->nr_balance_failed = 0;
> >
> > -     if (likely(!active_balance)) {
> > +     if (likely(!active_balance) || asym_active_balance(&env)) {
> 
> AFAICT this makes us reset the interval whenever we do an asym packing
> active balance (if the task is pinned we would goto out_one_pinned).
> This goes against the logic I had understood so far and that I explained
> higher up.
> 
> >               /* We were unbalanced, so reset the balancing interval */
> >               sd->balance_interval = sd->min_interval;
> >       } else {
> >
Valentin Schneider Dec. 18, 2018, 11:46 a.m. UTC | #3
On 18/12/2018 09:32, Vincent Guittot wrote:
[...]
> In this asym packing case, it has nothing to do with pinned tasks, and
> that's the root cause of the problem: the active balance triggered by
> asym packing is wrongly assumed to be an active balance due to pinned
> task(s), and the load balance interval is increased without any reason.
> 
[...]
> Hmm, the increase of the balance interval is not linked to the cpu
> stopper; it's there to increase the load balance interval when we know
> that there is no possible load balance to perform.
> 
> Regards,
> Vincent


Ah, I think I get it: you're saying that this balance_interval increase
is done because it is always assumed we do an active balance with
busiest->nr_running > 1 && pinned tasks, and that all that is left to
migrate after the active_balance is pinned tasks that we can't actually
migrate.

Maybe that's what we should target (as always, totally untested):

-----8<-----
@@ -9131,18 +9144,16 @@ static int load_balance(int this_cpu, struct rq *this_rq,
        } else
                sd->nr_balance_failed = 0;
 
-       if (likely(!active_balance)) {
-               /* We were unbalanced, so reset the balancing interval */
-               sd->balance_interval = sd->min_interval;
-       } else {
+       if (unlikely(active_balance && (env.flags & LBF_ALL_PINNED))) {
                /*
-                * If we've begun active balancing, start to back off. This
-                * case may not be covered by the all_pinned logic if there
-                * is only 1 task on the busy runqueue (because we don't call
-                * detach_tasks).
+                * We did an active balance as a last resort (all other tasks
+                * were pinned), start to back off.
                 */
                if (sd->balance_interval < sd->max_interval)
                        sd->balance_interval *= 2;
+       } else {
+               /* We were unbalanced, so reset the balancing interval */
+               sd->balance_interval = sd->min_interval;
        }
 
        goto out;
----->8-----

can_migrate_task() called by detach_tasks() in the
'if (busiest->nr_running > 1)' block should clear that LBF_ALL_PINNED flag
if there is any migratable task (even if we don't end up migrating it),
and it's not set if (busiest->nr_running == 1), so that *should* work.
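
To make that concrete, the affinity handling in can_migrate_task() goes
roughly like this (abridged sketch, not the verbatim code):

    static int can_migrate_task(struct task_struct *p, struct lb_env *env)
    {
            /* ... */
            if (!cpumask_test_cpu(env->dst_cpu, &p->cpus_allowed)) {
                    /* this task is affine-rejected for dst_cpu */
                    env->flags |= LBF_SOME_PINNED;
                    /* ... may record another dst_cpu via LBF_DST_PINNED ... */
                    return 0;
            }

            /* we found at least one task that could go to dst_cpu */
            env->flags &= ~LBF_ALL_PINNED;

            if (task_running(env->src_rq, p))
                    return 0;

            /* ... cache hotness may still reject the task below ... */
    }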

I believe the argument that this applies to all kinds of active balances
is still valid - this shouldn't be changed just for asym packing active
balance.
Vincent Guittot Dec. 18, 2018, 1:23 p.m. UTC | #4
On Tue, 18 Dec 2018 at 12:46, Valentin Schneider
<valentin.schneider@arm.com> wrote:
> 
> On 18/12/2018 09:32, Vincent Guittot wrote:
> [...]
> > In this asym packing case, it has nothing to do with pinned tasks, and
> > that's the root cause of the problem: the active balance triggered by
> > asym packing is wrongly assumed to be an active balance due to pinned
> > task(s), and the load balance interval is increased without any reason.
> >
> [...]
> > Hmm, the increase of the balance interval is not linked to the cpu
> > stopper; it's there to increase the load balance interval when we know
> > that there is no possible load balance to perform.
> >
> > Regards,
> > Vincent
> 
> Ah, I think I get it: you're saying that this balance_interval increase
> is done because it is always assumed we do an active balance with
> busiest->nr_running > 1 && pinned tasks, and that all that is left to
> migrate after the active_balance is pinned tasks that we can't actually
> migrate.
> 
> Maybe that's what we should target (as always, totally untested):
> 
> -----8<-----
> @@ -9131,18 +9144,16 @@ static int load_balance(int this_cpu, struct rq *this_rq,
>         } else
>                 sd->nr_balance_failed = 0;
> 
> -       if (likely(!active_balance)) {
> -               /* We were unbalanced, so reset the balancing interval */
> -               sd->balance_interval = sd->min_interval;
> -       } else {
> +       if (unlikely(active_balance && (env.flags & LBF_ALL_PINNED))) {


So it's not exactly LBF_ALL_PINNED, but also some pinned tasks.

But IIUC, you would like to extend the reset of the balance interval to
all the "good" reasons to trigger active load balance. In fact, those
are the conditions used in need_active_balance() except the last
remaining one. Good reasons are:
- asym packing
- /*
* The dst_cpu is idle and the src_cpu CPU has only 1 CFS task.
* It's worth migrating the task if the src_cpu's capacity is reduced
* because of other sched_class or IRQs if more capacity stays
* available on dst_cpu.
*/
- misfit task

I can extend the test for these conditions (roughly as sketched below).
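
For context, the resulting need_active_balance() would then read roughly
like this (abridged sketch on top of this patch, not the verbatim code):

    static int need_active_balance(struct lb_env *env)
    {
            struct sched_domain *sd = env->sd;

            /* good reason: asym packing */
            if (asym_active_balance(env))
                    return 1;

            /* good reason: idle dst_cpu and a single CFS task on a
             * src_cpu whose capacity is reduced by other classes/IRQs */
            if ((env->idle != CPU_NOT_IDLE) &&
                (env->src_rq->cfs.h_nr_running == 1)) {
                    if ((check_cpu_capacity(env->src_rq, sd)) &&
                        (capacity_of(env->src_cpu)*sd->imbalance_pct <
                         capacity_of(env->dst_cpu)*100))
                            return 1;
            }

            /* good reason: misfit task */
            if (env->src_grp_type == group_misfit_task)
                    return 1;

            /* last resort, after repeated balance failures */
            return unlikely(sd->nr_balance_failed > sd->cache_nice_tries+2);
    }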


>                 /*
> -                * If we've begun active balancing, start to back off. This
> -                * case may not be covered by the all_pinned logic if there
> -                * is only 1 task on the busy runqueue (because we don't call
> -                * detach_tasks).
> +                * We did an active balance as a last resort (all other tasks
> +                * were pinned), start to back off.
>                  */
>                 if (sd->balance_interval < sd->max_interval)
>                         sd->balance_interval *= 2;
> +       } else {
> +               /* We were unbalanced, so reset the balancing interval */
> +               sd->balance_interval = sd->min_interval;
>         }
> 
>         goto out;
> ----->8-----
> 
> can_migrate_task() called by detach_tasks() in the
> 'if (busiest->nr_running > 1)' block should clear that LBF_ALL_PINNED flag
> if there is any migratable task (even if we don't end up migrating it),
> and it's not set if (busiest->nr_running == 1), so that *should* work.
> 
> I believe the argument that this applies to all kinds of active balances
> is still valid - this shouldn't be changed just for asym packing active
> balance.
Valentin Schneider Dec. 18, 2018, 2:09 p.m. UTC | #5
On 18/12/2018 13:23, Vincent Guittot wrote:
[...]
>> Ah, I think I get it: you're saying that this balance_interval increase
>> is done because it is always assumed we do an active balance with
>> busiest->nr_running > 1 && pinned tasks, and that all that is left to
>> migrate after the active_balance is pinned tasks that we can't actually
>> migrate.
>>
>> Maybe that's what we should target (as always, totally untested):
>>
>> -----8<-----
>> @@ -9131,18 +9144,16 @@ static int load_balance(int this_cpu, struct rq *this_rq,
>>         } else
>>                 sd->nr_balance_failed = 0;
>>
>> -       if (likely(!active_balance)) {
>> -               /* We were unbalanced, so reset the balancing interval */
>> -               sd->balance_interval = sd->min_interval;
>> -       } else {
>> +       if (unlikely(active_balance && (env.flags & LBF_ALL_PINNED))) {
> 
> So it's not exactly LBF_ALL_PINNED, but also some pinned tasks.
> 


Wouldn't LBF_ALL_PINNED cover all relevant cases? It's set at the very top
of the 'if (busiest->nr_running > 1)' block and cleared whenever we find
at least one task we could pull, even if we don't pull it because of 
other reasons in can_migrate_task() (e.g. cache hotness).

If we have LBF_SOME_PINNED but not LBF_ALL_PINNED, we currently don't
increase the balance_interval, which is what we would want to maintain.
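
For reference, those flags are defined in kernel/sched/fair.c as:

    #define LBF_ALL_PINNED  0x01
    #define LBF_NEED_BREAK  0x02
    #define LBF_DST_PINNED  0x04
    #define LBF_SOME_PINNED 0x08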

> But IIUC, you would like to extend the reset of the balance interval to
> all the "good" reasons to trigger active load balance.

Right

> In fact, those are the conditions used in need_active_balance() except
> the last remaining one. Good reasons are:
> - asym packing
> - /*
> * The dst_cpu is idle and the src_cpu CPU has only 1 CFS task.
> * It's worth migrating the task if the src_cpu's capacity is reduced
> * because of other sched_class or IRQs if more capacity stays
> * available on dst_cpu.
> */
> - misfit task
> 
> I can extend the test for these conditions.


So that's all the need_active_balance() cases except the last
sd->nr_balance_failed one. I'd argue this could also be counted as a
"good" reason to active balance which shouldn't lead to a balance_interval
increase. Plus, it keeps to the logic of increasing the balance_interval
only when task affinity gets in the way.
Vincent Guittot Dec. 19, 2018, 8:27 a.m. UTC | #6
On Tue, 18 Dec 2018 at 15:09, Valentin Schneider
<valentin.schneider@arm.com> wrote:
> 
> On 18/12/2018 13:23, Vincent Guittot wrote:
> [...]
> >> Ah, I think I get it: you're saying that this balance_interval increase
> >> is done because it is always assumed we do an active balance with
> >> busiest->nr_running > 1 && pinned tasks, and that all that is left to
> >> migrate after the active_balance is pinned tasks that we can't actually
> >> migrate.
> >>
> >> Maybe that's what we should target (as always, totally untested):
> >>
> >> -----8<-----
> >> @@ -9131,18 +9144,16 @@ static int load_balance(int this_cpu, struct rq *this_rq,
> >>         } else
> >>                 sd->nr_balance_failed = 0;
> >>
> >> -       if (likely(!active_balance)) {
> >> -               /* We were unbalanced, so reset the balancing interval */
> >> -               sd->balance_interval = sd->min_interval;
> >> -       } else {
> >> +       if (unlikely(active_balance && (env.flags & LBF_ALL_PINNED))) {
> >
> > So it's not exactly LBF_ALL_PINNED, but also some pinned tasks.
> >
> 
> Wouldn't LBF_ALL_PINNED cover all relevant cases? It's set at the very top
> of the 'if (busiest->nr_running > 1)' block and cleared whenever we find
> at least one task we could pull, even if we don't pull it because of
> other reasons in can_migrate_task() (e.g. cache hotness).
> 
> If we have LBF_SOME_PINNED but not LBF_ALL_PINNED, we currently don't
> increase the balance_interval, which is what we would want to maintain.


But there are several other use cases that do active migration and
increase the interval, like when all tasks except the running one are
pinned.

> 
> > But IIUC, you would like to extend the reset of the balance interval to
> > all the "good" reasons to trigger active load balance.
> 
> Right
> 
> > In fact, those are the conditions used in need_active_balance() except
> > the last remaining one. Good reasons are:
> > - asym packing
> > - /*
> > * The dst_cpu is idle and the src_cpu CPU has only 1 CFS task.
> > * It's worth migrating the task if the src_cpu's capacity is reduced
> > * because of other sched_class or IRQs if more capacity stays
> > * available on dst_cpu.
> > */
> > - misfit task
> >
> > I can extend the test for these conditions.
> 
> So that's all the need_active_balance() cases except the last
> sd->nr_balance_failed one. I'd argue this could also be counted as a
> "good" reason to active balance which shouldn't lead to a balance_interval
> increase. Plus, it keeps to the logic of increasing the balance_interval
> only when task affinity gets in the way.


But this is some kind of affinity: the hotness is a way for the
scheduler to temporarily pin the task on a cpu to take advantage of
cache hotness.

I would prefer to be conservative and only reset the interval when we
are sure that active load balance is really what we want to do.
Asym packing is one; we can add the misfit case and moving a task to a
cpu with more available capacity as well. For the other case, it's
less obvious and I would prefer to keep the current behavior.
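
Concretely, that could look something like this (totally untested, and
the helper name is only illustrative):

    static inline bool
    voluntary_active_balance(struct lb_env *env)
    {
            struct sched_domain *sd = env->sd;

            if (asym_active_balance(env))
                    return 1;

            /*
             * The dst_cpu is idle and the src_cpu CPU has only 1 CFS task.
             * It's worth migrating the task if the src_cpu's capacity is
             * reduced because of other sched_class or IRQs if more capacity
             * stays available on dst_cpu.
             */
            if ((env->idle != CPU_NOT_IDLE) &&
                (env->src_rq->cfs.h_nr_running == 1)) {
                    if ((check_cpu_capacity(env->src_rq, sd)) &&
                        (capacity_of(env->src_cpu)*sd->imbalance_pct <
                         capacity_of(env->dst_cpu)*100))
                            return 1;
            }

            if (env->src_grp_type == group_misfit_task)
                    return 1;

            return 0;
    }

with load_balance() resetting the interval when voluntary_active_balance(&env)
is true, instead of only for asym packing.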
Valentin Schneider Dec. 19, 2018, 11:16 a.m. UTC | #7
On 19/12/2018 08:27, Vincent Guittot wrote:
[...]
>> Wouldn't LBF_ALL_PINNED cover all relevant cases? It's set at the very top
>> of the 'if (busiest->nr_running > 1)' block and cleared whenever we find
>> at least one task we could pull, even if we don't pull it because of
>> other reasons in can_migrate_task() (e.g. cache hotness).
>>
>> If we have LBF_SOME_PINNED but not LBF_ALL_PINNED, we currently don't
>> increase the balance_interval, which is what we would want to maintain.
> 
> But there are several other use cases that do active migration and
> increase the interval, like when all tasks except the running one are
> pinned.
> 


My point is that AFAICT the LBF_ALL_PINNED flag would cover all the cases
we care about, although the one you're mentioning is the only one I can 
think of. In that case LBF_ALL_PINNED would never be cleared, so when we do
the active balance we'd know it's because all other tasks were pinned so
we should probably increase the interval (see last snippet I sent).

[...]
>> So that's all the need_active_balance() cases except the last
>> sd->nr_balance_failed one. I'd argue this could also be counted as a
>> "good" reason to active balance which shouldn't lead to a balance_interval
>> increase. Plus, it keeps to the logic of increasing the balance_interval
>> only when task affinity gets in the way.
> 
> But this is some kind of affinity: the hotness is a way for the
> scheduler to temporarily pin the task on a cpu to take advantage of
> cache hotness.
> 
> I would prefer to be conservative and only reset the interval when we
> are sure that active load balance is really what we want to do.
> Asym packing is one; we can add the misfit case and moving a task to a
> cpu with more available capacity as well. For the other case, it's
> less obvious and I would prefer to keep the current behavior.
> 


Mmm, ok, so this one is kinda about semantics, on what we really
consider as "pinning".

If we look at the regular load_balance() path, if all tasks are cache hot
(so we clear LBF_ALL_PINNED but don't pass can_migrate_task()) we won't
increase the balance_interval. Actually, if we have !active_balance we'll
reset it.

Seeing as the duration of a task's cache hotness (default .5ms) is small
compared to any balance_interval (1ms * sd_weight), IMO it would make sense
to reset the interval for all active balance cases. On top of that, we
would keep to the current logic of increasing the balance_interval *only*
when task->cpus_allowed gets in the way.
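
For reference, the numbers behind that claim (roughly, from the current
sources):

    /* kernel/sched/fair.c: task_hot() threshold, in ns (i.e. 0.5ms) */
    const_debug unsigned int sysctl_sched_migration_cost = 500000UL;

    /* kernel/sched/topology.c, sd_init(): intervals are in ms */
    .min_interval           = sd_weight,
    .max_interval           = 2*sd_weight,
    .balance_interval       = sd_weight,

so even the smallest balance_interval is at least twice the default
hotness window.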
Vincent Guittot Dec. 19, 2018, 1:29 p.m. UTC | #8
On Wed, 19 Dec 2018 at 12:16, Valentin Schneider
<valentin.schneider@arm.com> wrote:
> 
> On 19/12/2018 08:27, Vincent Guittot wrote:
> [...]
> >> Wouldn't LBF_ALL_PINNED cover all relevant cases? It's set at the very top
> >> of the 'if (busiest->nr_running > 1)' block and cleared whenever we find
> >> at least one task we could pull, even if we don't pull it because of
> >> other reasons in can_migrate_task() (e.g. cache hotness).
> >>
> >> If we have LBF_SOME_PINNED but not LBF_ALL_PINNED, we currently don't
> >> increase the balance_interval, which is what we would want to maintain.
> >
> > But there are several other use cases that do active migration and
> > increase the interval, like when all tasks except the running one are
> > pinned.
> >
> 
> My point is that AFAICT the LBF_ALL_PINNED flag would cover all the cases
> we care about, although the one you're mentioning is the only one I can
> think of. In that case LBF_ALL_PINNED would never be cleared, so when we do
> the active balance we'd know it's because all other tasks were pinned so
> we should probably increase the interval (see last snippet I sent).


There are probably several other use cases than the one mentioned below,
as tasks can be discarded for several reasons. So instead of changing
all of them by default, I would prefer to only change those for which
we know it's safe.

> 
> [...]
> >> So that's all the need_active_balance() cases except the last
> >> sd->nr_balance_failed one. I'd argue this could also be counted as a
> >> "good" reason to active balance which shouldn't lead to a balance_interval
> >> increase. Plus, it keeps to the logic of increasing the balance_interval
> >> only when task affinity gets in the way.
> >
> > But this is some kind of affinity: the hotness is a way for the
> > scheduler to temporarily pin the task on a cpu to take advantage of
> > cache hotness.
> >
> > I would prefer to be conservative and only reset the interval when we
> > are sure that active load balance is really what we want to do.
> > Asym packing is one; we can add the misfit case and moving a task to a
> > cpu with more available capacity as well. For the other case, it's
> > less obvious and I would prefer to keep the current behavior.
> >
> 
> Mmm, ok, so this one is kinda about semantics, on what we really
> consider as "pinning".
> 
> If we look at the regular load_balance() path, if all tasks are cache hot
> (so we clear LBF_ALL_PINNED but don't pass can_migrate_task()) we won't
> increase the balance_interval. Actually, if we have !active_balance we'll
> reset it.
> 
> Seeing as the duration of a task's cache hotness (default .5ms) is small
> compared to any balance_interval (1ms * sd_weight), IMO it would make sense
> to reset the interval for all active balance cases. On top of that, we
> would keep to the current logic of increasing the balance_interval *only*
> when task->cpus_allowed gets in the way.
Valentin Schneider Dec. 19, 2018, 3:54 p.m. UTC | #9
On 19/12/2018 13:29, Vincent Guittot wrote:
[...]
>> My point is that AFAICT the LBF_ALL_PINNED flag would cover all the cases
>> we care about, although the one you're mentioning is the only one I can
>> think of. In that case LBF_ALL_PINNED would never be cleared, so when we do
>> the active balance we'd know it's because all other tasks were pinned so
>> we should probably increase the interval (see last snippet I sent).
> 
> There are probably several other use cases than the one mentioned below,
> as tasks can be discarded for several reasons. So instead of changing
> all of them by default, I would prefer to only change those for which
> we know it's safe.


I get your point. Thing is, I've stared at the code for a while and
couldn't find any other use case where checking LBF_ALL_PINNED wouldn't
suffice.

It would be nice to convince ourselves it is indeed enough (or not, but
then we should be sure of it rather than base ourselves on assumptions),
because then we can have just a simple condition rather than something
that introduces active balance categories.
Vincent Guittot Dec. 19, 2018, 4:54 p.m. UTC | #10
On Wed, 19 Dec 2018 at 16:54, Valentin Schneider
<valentin.schneider@arm.com> wrote:
> 
> On 19/12/2018 13:29, Vincent Guittot wrote:
> [...]
> >> My point is that AFAICT the LBF_ALL_PINNED flag would cover all the cases
> >> we care about, although the one you're mentioning is the only one I can
> >> think of. In that case LBF_ALL_PINNED would never be cleared, so when we do
> >> the active balance we'd know it's because all other tasks were pinned so
> >> we should probably increase the interval (see last snippet I sent).
> >
> > There are probably several other use cases than the one mentioned below,
> > as tasks can be discarded for several reasons. So instead of changing
> > all of them by default, I would prefer to only change those for which
> > we know it's safe.
> 
> I get your point. Thing is, I've stared at the code for a while and
> couldn't find any other use case where checking LBF_ALL_PINNED wouldn't
> suffice.


The point is that the LBF_ALL_PINNED flag is not set, otherwise we would
have jumped to out_*_pinned. But the conditions are similar.

> 
> It would be nice to convince ourselves it is indeed enough (or not, but
> then we should be sure of it rather than base ourselves on assumptions),
> because then we can have just a simple condition rather than something
> that introduces active balance categories.


This can be part of the larger rework that Peter asked for a few days ago.

Patch

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9591e7a..487287e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8857,21 +8857,24 @@ static struct rq *find_busiest_queue(struct lb_env *env,
  */
 #define MAX_PINNED_INTERVAL	512
 
+static inline bool
+asym_active_balance(struct lb_env *env)
+{
+	/*
+	 * ASYM_PACKING needs to force migrate tasks from busy but
+	 * lower priority CPUs in order to pack all tasks in the
+	 * highest priority CPUs.
+	 */
+	return env->idle != CPU_NOT_IDLE && (env->sd->flags & SD_ASYM_PACKING) &&
+	       sched_asym_prefer(env->dst_cpu, env->src_cpu);
+}
+
 static int need_active_balance(struct lb_env *env)
 {
 	struct sched_domain *sd = env->sd;
 
-	if (env->idle != CPU_NOT_IDLE) {
-
-		/*
-		 * ASYM_PACKING needs to force migrate tasks from busy but
-		 * lower priority CPUs in order to pack all tasks in the
-		 * highest priority CPUs.
-		 */
-		if ((sd->flags & SD_ASYM_PACKING) &&
-		    sched_asym_prefer(env->dst_cpu, env->src_cpu))
-			return 1;
-	}
+	if (asym_active_balance(env))
+		return 1;
 
 	/*
 	 * The dst_cpu is idle and the src_cpu CPU has only 1 CFS task.
@@ -9150,7 +9153,7 @@ static int load_balance(int this_cpu, struct rq *this_rq,
 	} else
 		sd->nr_balance_failed = 0;
 
-	if (likely(!active_balance)) {
+	if (likely(!active_balance) || asym_active_balance(&env)) {
 		/* We were unbalanced, so reset the balancing interval */
 		sd->balance_interval = sd->min_interval;
 	} else {