Message ID | 1536590589-437-1-git-send-email-vincent.guittot@linaro.org |
---|---|
State | New |
Series | [v2] sched/fair: fix 1 task per CPU |
Hi,

On 10/09/18 07:43, Vincent Guittot wrote:
> When CPUs have different capacity because of RT/DL tasks or
> micro-architecture or max frequency differences, there are situation where
> the imbalance is not correctly set to migrate waiting task on the idle CPU.
>
> The UC uses the force_balance case :
> 	if (env->idle != CPU_NOT_IDLE && group_has_capacity(env, local) &&
> 	    busiest->group_no_capacity)
> 		goto force_balance;
>
> But calculate_imbalance fails to set the right amount of load to migrate
> a task because of the special condition:
> 	busiest->avg_load <= sds->avg_load || local->avg_load >= sds->avg_load)
>
> Add in fix_small_imbalance, this special case that triggered the force
> balance in order to make sure that the amount of load to migrate will be
> enough.
>
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>

Other than the commit nit, LGTM. Out of curiosity I ran some kernel compile
on my HiKey960 (-j8) but didn't see much change - something along the lines
of ~1% speedup, and although it was consistent over a few iterations, I'd
need a whole lot more of them to back this up.

I kind of expected it because some sporadic task can show up and tip the
scale in the right direction, so even without the patch the situation can
"fix itself" eventually, and it becomes less noticeable on really long
workloads.

I do see a difference by looking at the trace of a simple 8 100% tasks rt-app
workload though, as I no longer see that idling LITTLE I sometimes get
without the patch, which is what we expect, so:

Tested-by: Valentin Schneider <valentin.schneider@arm.com>

> ---

Again, I'd argue for a slightly more explicit header. As you pointed out in
v1, it's not just long running tasks, so maybe just "fix 1 *running* task per
CPU"? Otherwise I feel it's a tad obscure.

>  kernel/sched/fair.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 309c93f..72bc5e8 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8048,6 +8048,20 @@ void fix_small_imbalance(struct lb_env *env, struct sd_lb_stats *sds)
>  	local = &sds->local_stat;
>  	busiest = &sds->busiest_stat;
>
> +	/*
> +	 * There is available capacity in local group and busiest group is
> +	 * overloaded but calculate_imbalance can't compute the amount of load
> +	 * to migrate because load_avg became meaningless due to asymetric
> +	 * capacity between groups.

Could you add something along the lines of "(see similar condition in
find_busiest_group())"?

In such case, we only want to migrate at
> +	 * least one tasks of the busiest group and rely of the average load
> +	 * per task to ensure the migration.
> +	 */
> +	if (env->idle != CPU_NOT_IDLE && group_has_capacity(env, local) &&
> +	    busiest->group_no_capacity) {
> +		env->imbalance = busiest->load_per_task;
> +		return;
> +	}
> +
>  	if (!local->sum_nr_running)
>  		local->load_per_task = cpu_avg_load_per_task(env->dst_cpu);
>  	else if (busiest->load_per_task > local->load_per_task)
>
On Fri, 14 Sep 2018 at 05:22, Valentin Schneider
<valentin.schneider@arm.com> wrote:
>
> Hi,
>
> On 10/09/18 07:43, Vincent Guittot wrote:
> > When CPUs have different capacity because of RT/DL tasks or
> > micro-architecture or max frequency differences, there are situation where
> > the imbalance is not correctly set to migrate waiting task on the idle CPU.

[...]

> Other than the commit nit, LGTM. Out of curiosity I ran some kernel compile
> on my HiKey960 (-j8) but didn't see much change - something along the lines
> of ~1% speedup, and although it was consistent over a few iterations, I'd
> need a whole lot more of them to back this up.
>
> I kind of expected it because some sporadic task can show up and tip the
> scale in the right direction, so even without the patch the situation can
> "fix itself" eventually, and it becomes less noticeable on really long
> workloads.

I have seen a better stdev and shorter duration for the tests that you
used for the misfit patch.
The tests have been done with asym packing and the few fixes that I
sent in another patchset for asym packing.

> I do see a difference by looking at the trace of a simple 8 100% tasks rt-app
> workload though, as I no longer see that idling LITTLE I sometimes get
> without the patch, which is what we expect, so:
>
> Tested-by: Valentin Schneider <valentin.schneider@arm.com>

Thanks

> > ---
>
> Again, I'd argue for a slightly more explicit header. As you pointed out in
> v1, it's not just long running tasks, so maybe just "fix 1 *running* task per
> CPU"? Otherwise I feel it's a tad obscure.

To be honest I don't mind about the header but I don't see the benefit of
adding *running*.
So I let Peter or Ingo decide what they prefer.

[...]
Hi Peter, Ingo,

On Fri, 14 Sep 2018 at 18:26, Vincent Guittot <vincent.guittot@linaro.org> wrote:
>
> On Fri, 14 Sep 2018 at 05:22, Valentin Schneider
> <valentin.schneider@arm.com> wrote:
> >
> > Hi,
> >
> > On 10/09/18 07:43, Vincent Guittot wrote:
> > > When CPUs have different capacity because of RT/DL tasks or
> > > micro-architecture or max frequency differences, there are situation where
> > > the imbalance is not correctly set to migrate waiting task on the idle CPU.

[...]

> > Again, I'd argue for a slightly more explicit header. As you pointed out in
> > v1, it's not just long running tasks, so maybe just "fix 1 *running* task per
> > CPU"? Otherwise I feel it's a tad obscure.
>
> To be honest I don't mind about the header but I don't see the benefit of
> adding *running*.
> So I let Peter or Ingo decide what they prefer.

This patch has been in the list for a while.
What should I do to move forward with it ?

Thanks
Vincent

[...]
On Mon, Sep 10, 2018 at 04:43:09PM +0200, Vincent Guittot wrote:
> When CPUs have different capacity because of RT/DL tasks or
> micro-architecture or max frequency differences, there are situation where
> the imbalance is not correctly set to migrate waiting task on the idle CPU.
>
> The UC uses the force_balance case:
>
> 	if (env->idle != CPU_NOT_IDLE && group_has_capacity(env, local) &&
> 	    busiest->group_no_capacity)
> 		goto force_balance;
>
> But calculate_imbalance fails to set the right amount of load to migrate
> a task because of the special condition:
>
> 	busiest->avg_load <= sds->avg_load || local->avg_load >= sds->avg_load)
>
> Add in fix_small_imbalance, this special case that triggered the force
> balance in order to make sure that the amount of load to migrate will be
> enough.

So I think this patch is going in the wrong direction for a number of
reasons:

 - we'd like to get rid of fix_small_imbalance(), and this adds to it;

 - the whole load_per_task stuff is terminally broken, it _cannot_ work
   right.

What I've suggested in the past is parameterizing the load balancer and
picking different criteria to balance on:

 - nr_running   ; if there are idle CPUs around
 - utilization  ; if there's idle time (u<1)
 - weight

And I suppose you're hitting one of the nr_running cases here.

[...]
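(For illustration only: a minimal sketch of what such a parameterized choice of
balancing criterion could look like. None of the names below exist in the kernel;
a real implementation would work on struct sg_lb_stats inside
find_busiest_group()/calculate_imbalance().)

/*
 * Illustrative sketch, not kernel code: pick a balancing criterion
 * depending on how busy the two groups are, along the lines suggested
 * above. Structure and names are invented for clarity.
 */
enum balance_criterion {
	BALANCE_NR_RUNNING,	/* idle CPUs around: even out task counts */
	BALANCE_UTILIZATION,	/* idle time left (u < 1): even out utilization */
	BALANCE_WEIGHT,		/* everything saturated: fall back to load weight */
};

struct group_view {
	unsigned int idle_cpus;		/* nr of idle CPUs in the group */
	unsigned long util;		/* summed utilization of the group */
	unsigned long capacity;		/* summed capacity of the group */
};

static enum balance_criterion pick_balance_criterion(const struct group_view *local,
						      const struct group_view *busiest)
{
	/* Some CPU is idle somewhere: balance on the number of running tasks. */
	if (local->idle_cpus || busiest->idle_cpus)
		return BALANCE_NR_RUNNING;

	/* No idle CPU but idle time remains: balance on utilization. */
	if (local->util < local->capacity || busiest->util < busiest->capacity)
		return BALANCE_UTILIZATION;

	/* Fully busy: only then does load weight carry useful information. */
	return BALANCE_WEIGHT;
}

The "one waiting task per CPU" situation described in the patch corresponds to the
first case: an idle CPU exists, so task count rather than load should drive the pull.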
On Thu, 13 Dec 2018 at 11:44, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Mon, Sep 10, 2018 at 04:43:09PM +0200, Vincent Guittot wrote:
> > When CPUs have different capacity because of RT/DL tasks or
> > micro-architecture or max frequency differences, there are situation where
> > the imbalance is not correctly set to migrate waiting task on the idle CPU.

[...]

> So I think this patch is going in the wrong direction for a number of
> reasons:
>
>  - we'd like to get rid of fix_small_imbalance(), and this adds to it;
>
>  - the whole load_per_task stuff is terminally broken, it _cannot_ work
>    right.
>
> What I've suggested in the past is parameterizing the load balancer and
> picking different criteria to balance on:

This patch is clearly a fix of the current implementation.
What you suggest below makes sense but implies a significant rework of
calculate_imbalance and the load balancer in general, and will need
more time to reach a stable state.
Nevertheless, I will have a look at that.

I imagine that your feedback for https://lkml.org/lkml/2018/10/2/283
will be the same ?

Vincent

>  - nr_running   ; if there are idle CPUs around
>  - utilization  ; if there's idle time (u<1)
>  - weight
>
> And I suppose you're hitting one of the nr_running cases here.

[...]
On Thu, Dec 13, 2018 at 12:04:20PM +0100, Vincent Guittot wrote:
> On Thu, 13 Dec 2018 at 11:44, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Mon, Sep 10, 2018 at 04:43:09PM +0200, Vincent Guittot wrote:
> > > When CPUs have different capacity because of RT/DL tasks or
> > > micro-architecture or max frequency differences, there are situation where
> > > the imbalance is not correctly set to migrate waiting task on the idle CPU.

[...]

> > So I think this patch is going in the wrong direction for a number of
> > reasons:
> >
> >  - we'd like to get rid of fix_small_imbalance(), and this adds to it;
> >
> >  - the whole load_per_task stuff is terminally broken, it _cannot_ work
> >    right.
> >
> > What I've suggested in the past is parameterizing the load balancer and
> > picking different criteria to balance on:
>
> This patch is clearly a fix of the current implementation.
> What you suggest below makes sense but implies a significant rework of
> calculate_imbalance and the load balancer in general, and will need
> more time to reach a stable state.
> Nevertheless, I will have a look at that.
>
> I imagine that your feedback for https://lkml.org/lkml/2018/10/2/283
> will be the same ?

No; those actually look ok. It is mostly that I really don't think
load_per_task makes any kind of sense.

It sorta works when all tasks are of the same weight, but if you start
using nice -- or way worse, cgroups -- then the number is complete
bollocks.
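(To make that concrete, a stand-alone illustration follows. The two weight values
are the nice-0 and nice-19 entries of the kernel's sched_prio_to_weight[] table;
everything else is invented for the example and is not kernel code.)

#include <stdio.h>

int main(void)
{
	/* From kernel/sched/core.c sched_prio_to_weight[]. */
	unsigned long w_nice0 = 1024;	/* load weight of a nice-0 task */
	unsigned long w_nice19 = 15;	/* load weight of a nice-19 task */

	/* A group running one task of each: the "load per task" average. */
	unsigned long load_per_task = (w_nice0 + w_nice19) / 2;	/* 519 */

	/*
	 * Migrating either task actually moves 1024 or 15 worth of load,
	 * never ~519, so an imbalance computed from this average matches
	 * neither real choice. With cgroups the per-task share can be
	 * scaled almost arbitrarily, making the average even less useful.
	 */
	printf("avg load_per_task = %lu (actual task loads: %lu and %lu)\n",
	       load_per_task, w_nice0, w_nice19);
	return 0;
}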
On Thu, 13 Dec 2018 at 14:12, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Dec 13, 2018 at 12:04:20PM +0100, Vincent Guittot wrote:

[...]

> > This patch is clearly a fix of the current implementation.
> > What you suggest below makes sense but implies a significant rework of
> > calculate_imbalance and the load balancer in general, and will need
> > more time to reach a stable state.
> > Nevertheless, I will have a look at that.
> >
> > I imagine that your feedback for https://lkml.org/lkml/2018/10/2/283
> > will be the same ?
>
> No; those actually look ok. It is mostly that I really don't think
> load_per_task makes any kind of sense.
>
> It sorta works when all tasks are of the same weight, but if you start
> using nice -- or way worse, cgroups -- then the number is complete
> bollocks.

Yes. As soon as we go out of a simple load-balance use case, load_per_task
becomes a kind of default random choice.
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 309c93f..72bc5e8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8048,6 +8048,20 @@ void fix_small_imbalance(struct lb_env *env, struct sd_lb_stats *sds)
 	local = &sds->local_stat;
 	busiest = &sds->busiest_stat;
 
+	/*
+	 * There is available capacity in local group and busiest group is
+	 * overloaded but calculate_imbalance can't compute the amount of load
+	 * to migrate because load_avg became meaningless due to asymmetric
+	 * capacity between groups. In such case, we only want to migrate at
+	 * least one task of the busiest group and rely on the average load
+	 * per task to ensure the migration.
+	 */
+	if (env->idle != CPU_NOT_IDLE && group_has_capacity(env, local) &&
+	    busiest->group_no_capacity) {
+		env->imbalance = busiest->load_per_task;
+		return;
+	}
+
 	if (!local->sum_nr_running)
 		local->load_per_task = cpu_avg_load_per_task(env->dst_cpu);
 	else if (busiest->load_per_task > local->load_per_task)
When CPUs have different capacity because of RT/DL tasks or
micro-architecture or max frequency differences, there are situations where
the imbalance is not correctly set to migrate a waiting task onto the idle CPU.

The use case hits the force_balance case:

	if (env->idle != CPU_NOT_IDLE && group_has_capacity(env, local) &&
	    busiest->group_no_capacity)
		goto force_balance;

But calculate_imbalance fails to set the right amount of load to migrate
a task because of the special condition:

	busiest->avg_load <= sds->avg_load || local->avg_load >= sds->avg_load

Add this special case, which triggered the force balance, to
fix_small_imbalance() in order to make sure that the amount of load to
migrate will be enough.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

-- 
2.7.4
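(For context, here is an approximate reconstruction of the calculate_imbalance()
path that the special condition above refers to, based on the kernel/sched/fair.c
of that era; treat it as a sketch rather than an exact copy of the source.)

	/*
	 * Sketch of the relevant part of calculate_imbalance(): when the
	 * avg_load comparison is inconclusive, the imbalance is zeroed and
	 * the decision is punted to fix_small_imbalance(), which is where
	 * the special case added by this patch takes effect.
	 */
	if (busiest->avg_load <= sds->avg_load ||
	    local->avg_load >= sds->avg_load) {
		env->imbalance = 0;
		return fix_small_imbalance(env, sds);
	}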