Field | Value
---|---
Message ID | 1474892393-5095-6-git-send-email-vincent.guittot@linaro.org
State | Superseded
On 26/09/16 13:19, Vincent Guittot wrote:
> A task can be asynchronously detached from cfs_rq when migrating
> between CPUs. The load of the migrated task is then removed from
> source cfs_rq during its next update. We use this event to set the
> propagation flag.
>
> During the load balance, we take advantage of the update of blocked
> load to propagate any pending changes.

IMHO, it would be a good idea to mention that '2/7 sched: fix
hierarchical order in rq->leaf_cfs_rq_list' is a hard requirement for
this to work. The functionality relies on the order of cfs_rq's (top to
root) in the rq->leaf_cfs_rq_list list.

> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
>  kernel/sched/fair.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 8ba500f..bd3b6b9 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3221,6 +3221,7 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq, bool update_freq)
>  		sub_positive(&sa->load_avg, r);
>  		sub_positive(&sa->load_sum, r * LOAD_AVG_MAX);
>  		removed_load = 1;
> +		set_tg_cfs_propagate(cfs_rq);
>  	}
>
>  	if (atomic_long_read(&cfs_rq->removed_util_avg)) {
> @@ -3228,6 +3229,7 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq, bool update_freq)
>  		sub_positive(&sa->util_avg, r);
>  		sub_positive(&sa->util_sum, r * LOAD_AVG_MAX);
>  		removed_util = 1;
> +		set_tg_cfs_propagate(cfs_rq);
>  	}
>
>  	decayed = __update_load_avg(now, cpu_of(rq_of(cfs_rq)), sa,
> @@ -6607,6 +6609,10 @@ static void update_blocked_averages(int cpu)
>
>  		if (update_cfs_rq_load_avg(cfs_rq_clock_task(cfs_rq), cfs_rq, true))
>  			update_tg_load_avg(cfs_rq, 0);
> +
> +		/* Propagate pending load changes to the parent */
> +		if (cfs_rq->tg->se[cpu])
> +			update_load_avg(cfs_rq->tg->se[cpu], 0);

In my test (1 task (run/period: 8ms/16ms) in tg_root->tg_x->tg_y->*tg_z*,
oscillating between cpu1 and cpu2), the cfs_rq-related signals nicely go
down to 0 after the task has left the cpu, but this doesn't seem to be
the case for the corresponding se (cfs_rq->tg->se[cpu])?

It should actually work correctly because of the
update_tg_cfs_util/load() calls in
update_load_avg(cfs_rq->tg->se[cpu], 0)->propagate_entity_load_avg().

[...]
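Dietmar's ordering remark is the crux of the mechanism: the propagation only converges in a single pass if every child cfs_rq is visited before its parent. Below is a minimal, self-contained sketch (hypothetical types and names, not the kernel's actual structures) of why a child-before-parent traversal carries a pending change all the way to the root:

```c
/*
 * Simplified sketch, NOT kernel code: each iteration consumes the
 * pending delta of one level and hands it to the next level up, so a
 * single traversal of a child-before-parent list reaches the root.
 */
struct cfs_rq_sketch {
	long load_avg;
	long pending_delta;           /* set_tg_cfs_propagate() analogue */
	struct cfs_rq_sketch *parent; /* NULL for the root cfs_rq */
};

static void propagate_all(struct cfs_rq_sketch **leaf_list, int nr)
{
	int i;

	/* child-before-parent order is what makes one pass sufficient */
	for (i = 0; i < nr; i++) {
		struct cfs_rq_sketch *cfs_rq = leaf_list[i];

		if (!cfs_rq->pending_delta)
			continue;

		cfs_rq->load_avg += cfs_rq->pending_delta;
		if (cfs_rq->parent) /* hand the delta to the parent */
			cfs_rq->parent->pending_delta += cfs_rq->pending_delta;
		cfs_rq->pending_delta = 0;
	}
}
```

If the list were ordered the other way around, a delta flagged on a parent during the pass would only be consumed at the next update, delaying full propagation by one traversal per hierarchy level.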
On 12 October 2016 at 17:03, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
> On 26/09/16 13:19, Vincent Guittot wrote:
>> A task can be asynchronously detached from cfs_rq when migrating
>> between CPUs. The load of the migrated task is then removed from
>> source cfs_rq during its next update. We use this event to set the
>> propagation flag.
>>
>> During the load balance, we take advantage of the update of blocked
>> load to propagate any pending changes.
>
> IMHO, it would be a good idea to mention that '2/7 sched: fix
> hierarchical order in rq->leaf_cfs_rq_list' is a hard requirement for
> this to work. The functionality relies on the order of cfs_rq's (top to
> root) in the rq->leaf_cfs_rq_list list.

Yes, I will add a comment.

>> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
>> ---
>>  kernel/sched/fair.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 8ba500f..bd3b6b9 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -3221,6 +3221,7 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq, bool update_freq)
>>  		sub_positive(&sa->load_avg, r);
>>  		sub_positive(&sa->load_sum, r * LOAD_AVG_MAX);
>>  		removed_load = 1;
>> +		set_tg_cfs_propagate(cfs_rq);
>>  	}
>>
>>  	if (atomic_long_read(&cfs_rq->removed_util_avg)) {
>> @@ -3228,6 +3229,7 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq, bool update_freq)
>>  		sub_positive(&sa->util_avg, r);
>>  		sub_positive(&sa->util_sum, r * LOAD_AVG_MAX);
>>  		removed_util = 1;
>> +		set_tg_cfs_propagate(cfs_rq);
>>  	}
>>
>>  	decayed = __update_load_avg(now, cpu_of(rq_of(cfs_rq)), sa,
>> @@ -6607,6 +6609,10 @@ static void update_blocked_averages(int cpu)
>>
>>  		if (update_cfs_rq_load_avg(cfs_rq_clock_task(cfs_rq), cfs_rq, true))
>>  			update_tg_load_avg(cfs_rq, 0);
>> +
>> +		/* Propagate pending load changes to the parent */
>> +		if (cfs_rq->tg->se[cpu])
>> +			update_load_avg(cfs_rq->tg->se[cpu], 0);
>
> In my test (1 task (run/period: 8ms/16ms) in tg_root->tg_x->tg_y->*tg_z*,
> oscillating between cpu1 and cpu2), the cfs_rq-related signals nicely go
> down to 0 after the task has left the cpu, but this doesn't seem to be
> the case for the corresponding se (cfs_rq->tg->se[cpu])?

Strange, because this use case is part of the functional tests that I
run, and it was working fine according to the last test that I did.

>
> It should actually work correctly because of the
> update_tg_cfs_util/load() calls in
> update_load_avg(cfs_rq->tg->se[cpu], 0)->propagate_entity_load_avg().

Furthermore, the update of the parent cfs_rq tg_x->cfs_rq[cpu] uses the
delta between the previous and new value of the child tg_y->se[cpu]. So
if tg_x->cfs_rq[cpu]->avg.load_avg goes down to 0,
tg_y->se[cpu]->avg.load_avg has at least changed and most probably goes
down to 0 too.

Can't it be a misplaced tracepoint?

>
> [...]
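Vincent's delta argument can be made concrete with a small sketch (a hypothetical helper, loosely modelled on what update_tg_cfs_util()/update_tg_cfs_load() do; not the actual kernel code): the parent is never recomputed from scratch, it is adjusted by the difference between the child se's old and new contribution, so a parent decaying to 0 implies the child se's value moved as well:

```c
/*
 * Hypothetical sketch of delta-based propagation: the parent cfs_rq
 * only ever sees (new - old) of the child se, never absolute values.
 */
static void propagate_delta(long *parent_load_avg,
			    long *se_load_avg, long se_new_load)
{
	long delta = se_new_load - *se_load_avg;

	*se_load_avg = se_new_load;  /* child se takes its new value  */
	*parent_load_avg += delta;   /* parent only sees the delta    */
	if (*parent_load_avg < 0)    /* clamp, like sub_positive()    */
		*parent_load_avg = 0;
}
```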
On 12/10/16 16:45, Vincent Guittot wrote:
> On 12 October 2016 at 17:03, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
>> On 26/09/16 13:19, Vincent Guittot wrote:

[...]

>>> @@ -6607,6 +6609,10 @@ static void update_blocked_averages(int cpu)
>>>
>>>  		if (update_cfs_rq_load_avg(cfs_rq_clock_task(cfs_rq), cfs_rq, true))
>>>  			update_tg_load_avg(cfs_rq, 0);
>>> +
>>> +		/* Propagate pending load changes to the parent */
>>> +		if (cfs_rq->tg->se[cpu])
>>> +			update_load_avg(cfs_rq->tg->se[cpu], 0);
>>
>> In my test (1 task (run/period: 8ms/16ms) in tg_root->tg_x->tg_y->*tg_z*,
>> oscillating between cpu1 and cpu2), the cfs_rq-related signals nicely go
>> down to 0 after the task has left the cpu, but this doesn't seem to be
>> the case for the corresponding se (cfs_rq->tg->se[cpu])?
>
> Strange, because this use case is part of the functional tests that I
> run, and it was working fine according to the last test that I did.
>
>> It should actually work correctly because of the
>> update_tg_cfs_util/load() calls in
>> update_load_avg(cfs_rq->tg->se[cpu], 0)->propagate_entity_load_avg().
>
> Furthermore, the update of the parent cfs_rq tg_x->cfs_rq[cpu] uses the
> delta between the previous and new value of the child tg_y->se[cpu]. So
> if tg_x->cfs_rq[cpu]->avg.load_avg goes down to 0,
> tg_y->se[cpu]->avg.load_avg has at least changed and most probably goes
> down to 0 too.

Makes sense.

> Can't it be a misplaced tracepoint?

Yes, you're right, it was a missing tracepoint. I only had se and
cfs_rq PELT tracepoints in __update_load_avg() and
attach/detach_entity_load_avg(). I've now also added them to
propagate_entity_load_avg(), after the update_tg_cfs_load() call, and
now it makes sense. Thanks!

[...]
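For illustration, the tracepoint placement Dietmar describes would look roughly like the hunk below. trace_sched_pelt_se()/trace_sched_pelt_cfs_rq() are made-up, out-of-tree tracepoint names standing in for whatever debug tracepoints were actually used; the only grounded detail is their position after update_tg_cfs_load():

```diff
@@ static inline int propagate_entity_load_avg(struct sched_entity *se)
 	update_tg_cfs_util(cfs_rq, se);
 	update_tg_cfs_load(cfs_rq, se);
 
+	/*
+	 * Hypothetical debug tracepoints: emitting the se and cfs_rq
+	 * PELT signals here, after the propagation, is what makes the
+	 * decay of cfs_rq->tg->se[cpu] visible in the trace.
+	 */
+	trace_sched_pelt_se(se);
+	trace_sched_pelt_cfs_rq(cfs_rq);
```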
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 8ba500f..bd3b6b9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3221,6 +3221,7 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq, bool update_freq)
 		sub_positive(&sa->load_avg, r);
 		sub_positive(&sa->load_sum, r * LOAD_AVG_MAX);
 		removed_load = 1;
+		set_tg_cfs_propagate(cfs_rq);
 	}
 
 	if (atomic_long_read(&cfs_rq->removed_util_avg)) {
@@ -3228,6 +3229,7 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq, bool update_freq)
 		sub_positive(&sa->util_avg, r);
 		sub_positive(&sa->util_sum, r * LOAD_AVG_MAX);
 		removed_util = 1;
+		set_tg_cfs_propagate(cfs_rq);
 	}
 
 	decayed = __update_load_avg(now, cpu_of(rq_of(cfs_rq)), sa,
@@ -6607,6 +6609,10 @@ static void update_blocked_averages(int cpu)
 
 		if (update_cfs_rq_load_avg(cfs_rq_clock_task(cfs_rq), cfs_rq, true))
 			update_tg_load_avg(cfs_rq, 0);
+
+		/* Propagate pending load changes to the parent */
+		if (cfs_rq->tg->se[cpu])
+			update_load_avg(cfs_rq->tg->se[cpu], 0);
 	}
 	raw_spin_unlock_irqrestore(&rq->lock, flags);
 }
A task can be asynchronously detached from cfs_rq when migrating
between CPUs. The load of the migrated task is then removed from
source cfs_rq during its next update. We use this event to set the
propagation flag.

During the load balance, we take advantage of the update of blocked
load to propagate any pending changes.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 6 ++++++
 1 file changed, 6 insertions(+)

--
1.9.1
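To make "asynchronously detached" concrete: the migrating side does not update the source cfs_rq directly, it only accumulates the departing load atomically, and the source CPU subtracts it at its next update, which is the event this patch hooks to set the propagation flag. A simplified user-space model of that protocol (hypothetical names, mirroring the removed_load_avg handling visible in the hunk above) could look like this:

```c
#include <stdatomic.h>

/*
 * Simplified model, NOT kernel code: remove_load() runs on migration,
 * possibly from another CPU, lock-free; consume_removed() runs on the
 * source CPU during its next cfs_rq update.
 */
struct cfs_rq_model {
	long load_avg;
	atomic_long removed_load_avg;
	int propagate;                /* set_tg_cfs_propagate() analogue */
};

static void remove_load(struct cfs_rq_model *cfs_rq, long task_load)
{
	/* accumulate the departing load without the source rq lock */
	atomic_fetch_add(&cfs_rq->removed_load_avg, task_load);
}

static void consume_removed(struct cfs_rq_model *cfs_rq)
{
	long r = atomic_exchange(&cfs_rq->removed_load_avg, 0);

	if (r) {
		/* clamp at 0, like sub_positive() in the patch */
		cfs_rq->load_avg = cfs_rq->load_avg > r ?
				   cfs_rq->load_avg - r : 0;
		cfs_rq->propagate = 1; /* pending change for the parent */
	}
}
```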