Message ID | 1528972380-16268-1-git-send-email-vincent.guittot@linaro.org |
---|---|
State | Accepted |
Commit | 3482d98bbc730758b63a5d1cf41d05ea17481412 |
Series | sched/util_est: fix util_est_dequeue() for throttled cfs rq |
On 14-Jun 12:33, Vincent Guittot wrote:
> When a cfs_rq is throttled, parent cfs_rq->nr_running is decreased and
> everything happens at cfs_rq level. Currently util_est stays unchanged
> in such case and it keeps accounting the utilization of throttled tasks.
> This can somewhat make sense as we don't dequeue tasks but only throttled
> cfs_rq.

I think the idea here was that, if tasks are throttled, this should
manifest in a reduction of their utilization... and thus the estimated
utilization should still represent the amount of bandwidth required by
those tasks.

Although one could argue that, while a TG is throttled, we would like to
be able to drop the frequency if possible. This has not been implemented
that way so far because the attach/detach of TGs would require either
walking them to account for all child tasks' util_est or, otherwise,
aggregating util_est across TGs.

> If a task of another group is enqueued/dequeued and root cfs_rq becomes
> idle during the dequeue, util_est will be cleared whereas it was
> accounting util_est of throttled tasks before.

Yep :/

> So the behavior of util_est
> is not always the same regarding throttled tasks and depends of side
> activity. Furthermore, util_est will not be updated when the cfs_rq is
> unthrottled

Right... that happens because (un)throttling does not involve (en/de)queue.

> as everything happens at cfs rq level. Main results is that
> util_est will stay null whereas we now have running tasks. We have to wait
> for the next dequeue/enqueue of the previously throttled tasks to get an
> up to date util_est.
>
> Remove the assumption that cfs_rq's estimated utilization of a CPU is 0
> if there is no running task so the util_est of a task remains until the
> latter is dequeued even if its cfs_rq has been throttled.

Right...

> Fixes: 7f65ea42eb00 ("sched/fair: Add util_est on top of PELT")
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
>  kernel/sched/fair.c | 16 ++++------------
>  1 file changed, 4 insertions(+), 12 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index e497c05..d3121fc 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3982,18 +3982,10 @@ util_est_dequeue(struct cfs_rq *cfs_rq, struct task_struct *p, bool task_sleep)
>  	if (!sched_feat(UTIL_EST))
>  		return;
>
> -	/*
> -	 * Update root cfs_rq's estimated utilization
> -	 *
> -	 * If *p is the last task then the root cfs_rq's estimated utilization
> -	 * of a CPU is 0 by definition.
> -	 */
> -	ue.enqueued = 0;

... AFAIR, this reset was there since one of the first posts as an
"optimization". But actually I was not considering the scenario you
describe.

> -	if (cfs_rq->nr_running) {
> -		ue.enqueued = cfs_rq->avg.util_est.enqueued;
> -		ue.enqueued -= min_t(unsigned int, ue.enqueued,
> -				     (_task_util_est(p) | UTIL_AVG_UNCHANGED));
> -	}
> +	/* Update root cfs_rq's estimated utilization */
> +	ue.enqueued = cfs_rq->avg.util_est.enqueued;
> +	ue.enqueued -= min_t(unsigned int, ue.enqueued,
> +			     (_task_util_est(p) | UTIL_AVG_UNCHANGED));

So, this should still be bound-safe thanks to the min() clamp on the
subtraction.

>  	WRITE_ONCE(cfs_rq->avg.util_est.enqueued, ue.enqueued);
>
>  	/*
> --
> 2.7.4

LGTM:

Reviewed-by: Patrick Bellasi <patrick.bellasi@arm.com>

--
#include <best/regards.h>
Patrick Bellasi
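The bound-safety point Patrick makes about the remaining subtraction can be illustrated in isolation. The snippet below is a minimal user-space sketch, not kernel code: the example values and the `min_uint()` helper are hypothetical stand-ins for `min_t()` and the real util_est fields.

```c
#include <stdio.h>

/* Hypothetical stand-in for the kernel's min_t(unsigned int, ...). */
static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

int main(void)
{
	/* Assumed example values: the task's estimated utilization is stale
	 * and larger than what the root cfs_rq currently accounts.
	 */
	unsigned int cfs_enqueued = 100;	/* cfs_rq->avg.util_est.enqueued */
	unsigned int task_est = 140;		/* the task's contribution */

	/* Clamping the amount removed keeps the unsigned subtraction from
	 * wrapping: the result saturates at 0 instead of underflowing.
	 */
	cfs_enqueued -= min_uint(cfs_enqueued, task_est);

	printf("enqueued after dequeue: %u\n", cfs_enqueued);	/* prints 0 */
	return 0;
}
```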
On Thu, Jun 14, 2018 at 12:32:32PM +0100, Patrick Bellasi wrote:
> On 14-Jun 12:33, Vincent Guittot wrote:
> > When a cfs_rq is throttled, parent cfs_rq->nr_running is decreased and
> > everything happens at cfs_rq level. Currently util_est stays unchanged
> > in such case and it keeps accounting the utilization of throttled tasks.
> > This can somewhat make sense as we don't dequeue tasks but only throttled
> > cfs_rq.
> > If a task of another group is enqueued/dequeued and root cfs_rq becomes
> > idle during the dequeue, util_est will be cleared whereas it was
> > accounting util_est of throttled tasks before.
> > So the behavior of util_est
> > is not always the same regarding throttled tasks and depends of side
> > activity. Furthermore, util_est will not be updated when the cfs_rq is
> > unthrottled
> > as everything happens at cfs rq level. Main results is that
> > util_est will stay null whereas we now have running tasks. We have to wait
> > for the next dequeue/enqueue of the previously throttled tasks to get an
> > up to date util_est.
> >
> > Remove the assumption that cfs_rq's estimated utilization of a CPU is 0
> > if there is no running task so the util_est of a task remains until the
> > latter is dequeued even if its cfs_rq has been throttled.
> > Fixes: 7f65ea42eb00 ("sched/fair: Add util_est on top of PELT")
> > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> LGTM:
>
> Reviewed-by: Patrick Bellasi <patrick.bellasi@arm.com>

Thanks guys!
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e497c05..d3121fc 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3982,18 +3982,10 @@ util_est_dequeue(struct cfs_rq *cfs_rq, struct task_struct *p, bool task_sleep)
 	if (!sched_feat(UTIL_EST))
 		return;

-	/*
-	 * Update root cfs_rq's estimated utilization
-	 *
-	 * If *p is the last task then the root cfs_rq's estimated utilization
-	 * of a CPU is 0 by definition.
-	 */
-	ue.enqueued = 0;
-	if (cfs_rq->nr_running) {
-		ue.enqueued = cfs_rq->avg.util_est.enqueued;
-		ue.enqueued -= min_t(unsigned int, ue.enqueued,
-				     (_task_util_est(p) | UTIL_AVG_UNCHANGED));
-	}
+	/* Update root cfs_rq's estimated utilization */
+	ue.enqueued = cfs_rq->avg.util_est.enqueued;
+	ue.enqueued -= min_t(unsigned int, ue.enqueued,
+			     (_task_util_est(p) | UTIL_AVG_UNCHANGED));
 	WRITE_ONCE(cfs_rq->avg.util_est.enqueued, ue.enqueued);

 	/*
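To make the effect of the hunk easier to see outside the kernel tree, here is a rough user-space simulation of the dequeue path after the patch. The struct layout, the `util_est_dequeue_task()` name and the numbers are assumptions for illustration only, and the `UTIL_AVG_UNCHANGED` flag handling is omitted for clarity.

```c
#include <stdio.h>

/* Simplified stand-ins for the kernel structures; field names are only
 * loosely modelled on kernel/sched/fair.c.
 */
struct util_est { unsigned int enqueued; };
struct cfs_rq   { struct util_est util_est; unsigned int nr_running; };

static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* After the patch: the task's contribution is always subtracted (clamped),
 * and there is no longer a "reset to 0 when nr_running is 0" shortcut.
 */
static void util_est_dequeue_task(struct cfs_rq *cfs_rq, unsigned int task_est)
{
	unsigned int enqueued = cfs_rq->util_est.enqueued;

	enqueued -= min_uint(enqueued, task_est);
	cfs_rq->util_est.enqueued = enqueued;
}

int main(void)
{
	/* Assumed scenario: 300 units of util_est belong to throttled tasks,
	 * plus 100 units for one still-runnable task of another group.
	 */
	struct cfs_rq rq = { .util_est = { .enqueued = 400 }, .nr_running = 1 };

	rq.nr_running = 0;		/* that runnable task is dequeued last */
	util_est_dequeue_task(&rq, 100);

	/* The throttled tasks' 300 units survive instead of being cleared. */
	printf("util_est.enqueued = %u\n", rq.util_est.enqueued);
	return 0;
}
```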
When a cfs_rq is throttled, parent cfs_rq->nr_running is decreased and
everything happens at cfs_rq level. Currently util_est stays unchanged
in such a case and keeps accounting the utilization of throttled tasks.
This can somewhat make sense, as we don't dequeue the tasks but only the
throttled cfs_rq.

If a task of another group is enqueued/dequeued and the root cfs_rq
becomes idle during the dequeue, util_est will be cleared, whereas it was
accounting the util_est of throttled tasks before. So the behavior of
util_est is not always the same regarding throttled tasks and depends on
side activity. Furthermore, util_est will not be updated when the cfs_rq
is unthrottled, as everything happens at cfs_rq level. The main result is
that util_est will stay null whereas we now have running tasks. We have
to wait for the next dequeue/enqueue of the previously throttled tasks to
get an up-to-date util_est.

Remove the assumption that the cfs_rq's estimated utilization of a CPU is
0 if there is no running task, so the util_est of a task remains until
the latter is dequeued, even if its cfs_rq has been throttled.

Fixes: 7f65ea42eb00 ("sched/fair: Add util_est on top of PELT")
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 16 ++++------------
 1 file changed, 4 insertions(+), 12 deletions(-)
--
2.7.4
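For contrast, the pre-patch behaviour described in this changelog can be sketched with the same assumed numbers as the simulation after the diff: when the dequeue leaves the root cfs_rq looking idle, the old code reset the estimate to 0 and the throttled tasks' contribution was lost. Again, types, names and values are illustrative, not the actual kernel code.

```c
#include <stdio.h>

struct util_est { unsigned int enqueued; };
struct cfs_rq   { struct util_est util_est; unsigned int nr_running; };

static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Pre-patch logic (simplified): if the cfs_rq looks idle, assume the
 * estimated utilization is 0 "by definition" -- which discards whatever
 * the throttled tasks were still accounting.
 */
static void util_est_dequeue_old(struct cfs_rq *cfs_rq, unsigned int task_est)
{
	unsigned int enqueued = 0;

	if (cfs_rq->nr_running) {
		enqueued = cfs_rq->util_est.enqueued;
		enqueued -= min_uint(enqueued, task_est);
	}
	cfs_rq->util_est.enqueued = enqueued;
}

int main(void)
{
	/* Same assumed scenario: 300 units from throttled tasks, 100 from the
	 * unrelated task being dequeued.
	 */
	struct cfs_rq rq = { .util_est = { .enqueued = 400 }, .nr_running = 1 };

	rq.nr_running = 0;		/* root cfs_rq becomes idle on this dequeue */
	util_est_dequeue_old(&rq, 100);

	printf("util_est.enqueued = %u\n", rq.util_est.enqueued);	/* 0: the 300 are gone */
	return 0;
}
```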