sched/util_est: fix util_est_dequeue() for throttled cfs rq

Message ID 1528972380-16268-1-git-send-email-vincent.guittot@linaro.org
State Accepted
Commit 3482d98bbc730758b63a5d1cf41d05ea17481412
Series sched/util_est: fix util_est_dequeue() for throttled cfs rq

Commit Message

Vincent Guittot June 14, 2018, 10:33 a.m. UTC
When a cfs_rq is throttled, the parent cfs_rq->nr_running is decreased and
everything happens at the cfs_rq level. Currently util_est stays unchanged
in such a case and keeps accounting the utilization of the throttled tasks.
This can somewhat make sense, as we don't dequeue the tasks but only the
throttled cfs_rq.
If a task of another group is enqueued/dequeued and the root cfs_rq becomes
idle during the dequeue, util_est will be cleared, whereas it was
accounting the util_est of the throttled tasks before. So the behavior of
util_est is not always the same regarding throttled tasks and depends on
unrelated activity. Furthermore, util_est will not be updated when the
cfs_rq is unthrottled as everything happens at the cfs_rq level. The main
result is that util_est will stay null whereas we now have running tasks.
We have to wait for the next dequeue/enqueue of the previously throttled
tasks to get an up-to-date util_est.

Remove the assumption that a CPU's cfs_rq estimated utilization is 0 when
there is no running task, so that the util_est of a task remains until
the task is dequeued, even if its cfs_rq has been throttled.

Fixes: 7f65ea42eb00 ("sched/fair: Add util_est on top of PELT")
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>

---
 kernel/sched/fair.c | 16 ++++------------
 1 file changed, 4 insertions(+), 12 deletions(-)

-- 
2.7.4
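
For illustration only, a small self-contained toy model (plain C, not kernel
code; all names below are made up for this sketch) of the scenario described
in the commit message: part of the root cfs_rq's util_est comes from throttled
tasks, and the last runnable task p is dequeued. The old nr_running check
wipes the whole estimate, while the clamped subtraction only removes p's
contribution:

#include <stdio.h>

/* 'enqueued' mimics cfs_rq->avg.util_est.enqueued. */
static unsigned int clamp_sub(unsigned int a, unsigned int b)
{
	return a - (a < b ? a : b);	/* min_t()-style bounded subtraction */
}

/* Old behaviour: reset to 0 when the cfs_rq has no running task left. */
static unsigned int toy_dequeue_old(unsigned int enqueued,
				    unsigned int task_est,
				    unsigned int nr_running)
{
	if (!nr_running)
		return 0;
	return clamp_sub(enqueued, task_est);
}

/* New behaviour: always subtract only the dequeued task's contribution. */
static unsigned int toy_dequeue_new(unsigned int enqueued,
				    unsigned int task_est)
{
	return clamp_sub(enqueued, task_est);
}

int main(void)
{
	/* 300 of util_est comes from throttled tasks, 100 from task p. */
	unsigned int enqueued = 400, p_est = 100;
	/* p was the last runnable task: nr_running is 0 after its dequeue. */
	unsigned int nr_running = 0;

	printf("old: %u\n", toy_dequeue_old(enqueued, p_est, nr_running)); /* 0   */
	printf("new: %u\n", toy_dequeue_new(enqueued, p_est));             /* 300 */
	return 0;
}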

Comments

Patrick Bellasi June 14, 2018, 11:32 a.m. UTC | #1
On 14-Jun 12:33, Vincent Guittot wrote:
> When a cfs_rq is throttled, the parent cfs_rq->nr_running is decreased and
> everything happens at the cfs_rq level. Currently util_est stays unchanged
> in such a case and keeps accounting the utilization of the throttled tasks.
> This can somewhat make sense, as we don't dequeue the tasks but only the
> throttled cfs_rq.

I think the idea here was that, if tasks are throttled, this should
manifest in a reduction of their utilization... and thus the estimated
utilization should still represent the amount of bandwidth required by
those tasks. Although one could argue that, while a TG is throttled, we
would like to be able to drop the frequency if possible.

This has not been implemented that way so far because attaching/detaching
TGs would require walking them to account for every child task's util_est
or, otherwise, aggregating util_est across TGs.
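
(A purely hypothetical sketch of such a walk, using simplified toy
structures rather than the kernel's, only to illustrate why handling
util_est at (un)throttle time would have to visit every child task:)

struct toy_task  { unsigned int util_est; };
struct toy_group { struct toy_task *tasks; int nr_tasks; };
struct toy_rq    { unsigned int util_est_enqueued; };

/* On throttle, drop every child task's util_est from the root estimate;
 * an unthrottle would need the same walk to add it all back. */
static void toy_throttle_group(struct toy_rq *root, const struct toy_group *tg)
{
	for (int i = 0; i < tg->nr_tasks; i++) {
		unsigned int e = tg->tasks[i].util_est;

		root->util_est_enqueued -=
			(root->util_est_enqueued < e) ? root->util_est_enqueued : e;
	}
}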

> If a task of another group is enqueued/dequeued and the root cfs_rq becomes
> idle during the dequeue, util_est will be cleared, whereas it was
> accounting the util_est of the throttled tasks before.

Yep :/

> So the behavior of util_est is not always the same regarding throttled
> tasks and depends on unrelated activity. Furthermore, util_est will not
> be updated when the cfs_rq is unthrottled

right... that happens because (un)throttling does not involve (en/de)queue.

> as everything happens at the cfs_rq level. The main result is that
> util_est will stay null whereas we now have running tasks. We have to
> wait for the next dequeue/enqueue of the previously throttled tasks to
> get an up-to-date util_est.
> 
> Remove the assumption that a CPU's cfs_rq estimated utilization is 0 when
> there is no running task, so that the util_est of a task remains until
> the task is dequeued, even if its cfs_rq has been throttled.

Right...

> Fixes: 7f65ea42eb00 ("sched/fair: Add util_est on top of PELT")
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
>  kernel/sched/fair.c | 16 ++++------------
>  1 file changed, 4 insertions(+), 12 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index e497c05..d3121fc 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3982,18 +3982,10 @@ util_est_dequeue(struct cfs_rq *cfs_rq, struct task_struct *p, bool task_sleep)
>  	if (!sched_feat(UTIL_EST))
>  		return;
>  
> -	/*
> -	 * Update root cfs_rq's estimated utilization
> -	 *
> -	 * If *p is the last task then the root cfs_rq's estimated utilization
> -	 * of a CPU is 0 by definition.
> -	 */
> -	ue.enqueued = 0;

... AFAIR, this reset has been there since one of the first posts as an
"optimization". But actually I was not considering the scenario you
describe.

> -	if (cfs_rq->nr_running) {
> -		ue.enqueued  = cfs_rq->avg.util_est.enqueued;
> -		ue.enqueued -= min_t(unsigned int, ue.enqueued,
> -				     (_task_util_est(p) | UTIL_AVG_UNCHANGED));
> -	}
> +	/* Update root cfs_rq's estimated utilization */
> +	ue.enqueued  = cfs_rq->avg.util_est.enqueued;
> +	ue.enqueued -= min_t(unsigned int, ue.enqueued,
> +			     (_task_util_est(p) | UTIL_AVG_UNCHANGED));

So, this should still be bound-safe thanks to the min() for the
subtraction.
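
(As a quick standalone check of that bound, with made-up numbers:)

#include <stdio.h>

#define min_uint(a, b)	((a) < (b) ? (a) : (b))

int main(void)
{
	/* Even if the task's estimate exceeds what is currently accounted
	 * on the cfs_rq, the clamp keeps the result at 0 instead of
	 * wrapping the unsigned subtraction around. */
	unsigned int enqueued = 50, task_est = 120;

	enqueued -= min_uint(enqueued, task_est);
	printf("%u\n", enqueued);	/* 0, no underflow */
	return 0;
}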


>  	WRITE_ONCE(cfs_rq->avg.util_est.enqueued, ue.enqueued);
>  
>  	/*
> -- 
> 2.7.4

LGTM:

Reviewed-by: Patrick Bellasi <patrick.bellasi@arm.com>


-- 
#include <best/regards.h>

Patrick Bellasi

Peter Zijlstra June 21, 2018, 7:08 p.m. UTC | #2
On Thu, Jun 14, 2018 at 12:32:32PM +0100, Patrick Bellasi wrote:
> On 14-Jun 12:33, Vincent Guittot wrote:
> > When a cfs_rq is throttled, the parent cfs_rq->nr_running is decreased and
> > everything happens at the cfs_rq level. Currently util_est stays unchanged
> > in such a case and keeps accounting the utilization of the throttled tasks.
> > This can somewhat make sense, as we don't dequeue the tasks but only the
> > throttled cfs_rq.
> 
> > If a task of another group is enqueued/dequeued and the root cfs_rq becomes
> > idle during the dequeue, util_est will be cleared, whereas it was
> > accounting the util_est of the throttled tasks before.
> 
> > So the behavior of util_est is not always the same regarding throttled
> > tasks and depends on unrelated activity. Furthermore, util_est will not
> > be updated when the cfs_rq is unthrottled
> 
> > as everything happens at the cfs_rq level. The main result is that
> > util_est will stay null whereas we now have running tasks. We have to
> > wait for the next dequeue/enqueue of the previously throttled tasks to
> > get an up-to-date util_est.
> > 
> > Remove the assumption that a CPU's cfs_rq estimated utilization is 0 when
> > there is no running task, so that the util_est of a task remains until
> > the task is dequeued, even if its cfs_rq has been throttled.
> 
> > Fixes: 7f65ea42eb00 ("sched/fair: Add util_est on top of PELT")
> > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> 
> LGTM:
> 
> Reviewed-by: Patrick Bellasi <patrick.bellasi@arm.com>

Thanks guys!

Patch

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e497c05..d3121fc 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3982,18 +3982,10 @@  util_est_dequeue(struct cfs_rq *cfs_rq, struct task_struct *p, bool task_sleep)
 	if (!sched_feat(UTIL_EST))
 		return;
 
-	/*
-	 * Update root cfs_rq's estimated utilization
-	 *
-	 * If *p is the last task then the root cfs_rq's estimated utilization
-	 * of a CPU is 0 by definition.
-	 */
-	ue.enqueued = 0;
-	if (cfs_rq->nr_running) {
-		ue.enqueued  = cfs_rq->avg.util_est.enqueued;
-		ue.enqueued -= min_t(unsigned int, ue.enqueued,
-				     (_task_util_est(p) | UTIL_AVG_UNCHANGED));
-	}
+	/* Update root cfs_rq's estimated utilization */
+	ue.enqueued  = cfs_rq->avg.util_est.enqueued;
+	ue.enqueued -= min_t(unsigned int, ue.enqueued,
+			     (_task_util_est(p) | UTIL_AVG_UNCHANGED));
 	WRITE_ONCE(cfs_rq->avg.util_est.enqueued, ue.enqueued);
 
 	/*