
[v3] sched: fix first task of a task group is attached twice

Message ID: 1464623541-25429-1-git-send-email-vincent.guittot@linaro.org
State: New

Commit Message

Vincent Guittot May 30, 2016, 3:52 p.m. UTC
The cfs_rq->avg.last_update_time is initialized to 0, with the main effect
that the 1st sched_entity to be attached will keep its
last_update_time set to 0 and will be attached once again during the
enqueue.
Initialize cfs_rq->avg.last_update_time to 1 instead.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>

---

v3:
- add initialization of load_last_update_time_copy for non-64-bit systems
- move init into init_cfs_rq

v2:
- rq_clock_task(rq_of(cfs_rq)) can't be used because the lock is not held

 kernel/sched/fair.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

-- 
1.9.1

Comments

Vincent Guittot May 31, 2016, 7:28 a.m. UTC | #1
Hi Yuyang,

On 30 May 2016 at 21:48, Yuyang Du <yuyang.du@intel.com> wrote:
> On Mon, May 30, 2016 at 05:52:20PM +0200, Vincent Guittot wrote:
>> The cfs_rq->avg.last_update_time is initialized to 0, with the main effect
>> that the 1st sched_entity to be attached will keep its
>> last_update_time set to 0 and will be attached once again during the
>> enqueue.
>> Initialize cfs_rq->avg.last_update_time to 1 instead.
>>
>> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
>> ---
>>
>> v3:
>> - add initialization of load_last_update_time_copy for non-64-bit systems
>> - move init into init_cfs_rq
>>
>> v2:
>> - rq_clock_task(rq_of(cfs_rq)) can't be used because the lock is not held
>>
>>  kernel/sched/fair.c | 10 ++++++++++
>>  1 file changed, 10 insertions(+)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 218f8e8..86be9c1 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -8459,6 +8459,16 @@ void init_cfs_rq(struct cfs_rq *cfs_rq)
>>       cfs_rq->min_vruntime_copy = cfs_rq->min_vruntime;
>>  #endif
>>  #ifdef CONFIG_SMP
>> +     /*
>> +      * Set last_update_time to something different from 0 to make
>> +      * sure the 1st sched_entity will not be attached twice: once
>> +      * when attaching the task to the group and one more time when
>> +      * enqueueing the task.
>> +      */
>
> The first time: "once when attaching the task to the group".
>
> That attaching is purely wrong, but will not have any effect (at least
> load/util wise), because the task will later be inited in
> init_entity_runnable_average().


This patch is not related to the init of a task but to the init of the
cfs_rq and to what happens with the 1st task that is enqueued on it.

Let's take a task A that has already been scheduled on another cfs_rq,
so its se->avg.last_update_time is different from 0.

Create a new task group TGB.
At creation, the cfs_rq->avg.last_update_time of this TGB is set to 0.

Now move task A to TGB.
A is attached to TGB, so se->avg.last_update_time =
cfs_rq->avg.last_update_time, which is 0.
A is then enqueued on the cfs_rq and, because se->avg.last_update_time
== 0, A will be attached one more time on the cfs_rq.

This patch sets cfs_rq->avg.last_update_time to 1 at creation, so the
1st time that A is attached to TGB, se->avg.last_update_time =
cfs_rq->avg.last_update_time = 1 and A will not be attached one more
time during the enqueue.
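
To make the sequence concrete, here is a tiny userspace model of the
scenario above (illustrative only: made-up structs that mimic just the
last_update_time flag logic, not the real kernel code):

#include <stdio.h>

struct avg { unsigned long long last_update_time; int attach_count; };

static void attach(struct avg *se, const struct avg *cfs_rq)
{
        se->last_update_time = cfs_rq->last_update_time; /* inherit timestamp */
        se->attach_count++;
}

static int move_and_enqueue(struct avg *se, const struct avg *cfs_rq)
{
        attach(se, cfs_rq);           /* task_move_group: attach A to TGB */
        if (!se->last_update_time)    /* enqueue: the "migrated" flag check */
                attach(se, cfs_rq);   /* double attach when group time was 0 */
        return se->attach_count;
}

int main(void)
{
        struct avg tgb_before = { .last_update_time = 0 }; /* before the patch */
        struct avg tgb_after  = { .last_update_time = 1 }; /* after the patch  */
        struct avg a = { .last_update_time = 12345 };  /* A ran elsewhere before */

        printf("init to 0: %d attaches\n", move_and_enqueue(&a, &tgb_before));
        a = (struct avg){ .last_update_time = 12345 };
        printf("init to 1: %d attaches\n", move_and_enqueue(&a, &tgb_after));
        return 0;
}

The model reports two attaches with the group timestamp initialized to 0
and a single attach once it starts at 1.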

>
> The second time: "one more time when enqueueing the task".
>
> In enqueue_entity_load_avg(), your patch will not have any effect either,
> because of the above overwriting the new task's load in
> init_entity_runnable_average(), no?
Dietmar Eggemann June 1, 2016, 3:31 p.m. UTC | #2
On 30/05/16 16:52, Vincent Guittot wrote:
> The cfs_rq->avg.last_update_time is initialized to 0, with the main effect
> that the 1st sched_entity to be attached will keep its

attached in .task_move_group?

I'm not sure if we can have a .switched_to() directly followed by a
.enqueue_task() into a cfs_rq with avg.last_update_time = 0.

> last_update_time set to 0 and will be attached once again during the
> enqueue.
> Initialize cfs_rq->avg.last_update_time to 1 instead.


Maybe worth mentioning in the header:

This double se attaching for the first task which moves into the task
group owning this cfs_rq (.task_move_group() and .enqueue_task()) can
(obviously) only happen if CONFIG_FAIR_GROUP_SCHED is set.

The reason for this is that we set 'se->avg.last_update_time =
cfs_rq->avg.last_update_time' in attach_entity_load_avg() and use
'migrated = !sa->last_update_time' as a flag in
enqueue_entity_load_avg() to decide if we call attach_entity_load_avg()
(again) or only update se->avg.
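
In sketch form (simplified from the description above, not the verbatim
fair.c code; the structs are reduced to the one field that matters here):

struct sched_avg    { unsigned long long last_update_time; /* load/util elided */ };
struct sched_entity { struct sched_avg avg; };
struct cfs_rq       { struct sched_avg avg; };

static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
        /* the se inherits the group's timestamp: 0 for a fresh cfs_rq */
        se->avg.last_update_time = cfs_rq->avg.last_update_time;
        /* ... se's load/util contributions are then added to cfs_rq->avg ... */
}

static void enqueue_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
        /* last_update_time == 0 doubles as the "needs attach" flag */
        int migrated = !se->avg.last_update_time;

        /* ... decay/update se->avg against the cfs_rq clock ... */
        if (migrated)
                attach_entity_load_avg(cfs_rq, se); /* re-attach when it read 0 */
}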

Tested-by: Dietmar Eggemann <Dietmar.Eggemann@arm.com>


> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
>
> v3:
> - add initialization of load_last_update_time_copy for non-64-bit systems
> - move init into init_cfs_rq
>
> v2:
> - rq_clock_task(rq_of(cfs_rq)) can't be used because the lock is not held
>
>  kernel/sched/fair.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 218f8e8..86be9c1 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8459,6 +8459,16 @@ void init_cfs_rq(struct cfs_rq *cfs_rq)
>  	cfs_rq->min_vruntime_copy = cfs_rq->min_vruntime;
>  #endif
>  #ifdef CONFIG_SMP
> +	/*
> +	 * Set last_update_time to something different from 0 to make
> +	 * sure the 1st sched_entity will not be attached twice: once
> +	 * when attaching the task to the group and one more time when
> +	 * enqueueing the task.
> +	 */
> +	cfs_rq->avg.last_update_time = 1;
> +#ifndef CONFIG_64BIT
> +	cfs_rq->load_last_update_time_copy = 1;
> +#endif
>  	atomic_long_set(&cfs_rq->removed_load_avg, 0);
>  	atomic_long_set(&cfs_rq->removed_util_avg, 0);
>  #endif
Vincent Guittot June 1, 2016, 3:54 p.m. UTC | #3
On 1 June 2016 at 17:31, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
> On 30/05/16 16:52, Vincent Guittot wrote:
>> The cfs_rq->avg.last_update_time is initialized to 0, with the main effect
>> that the 1st sched_entity to be attached will keep its
>
> attached in .task_move_group?
>
> I'm not sure if we can have a .switched_to() directly followed by a
> .enqueue_task() into a cfs_rq with avg.last_update_time = 0.


I think it can happen as well if the task is not queued during the
change of scheduling class because when we attach the task in
switched_to, the task->se.avg.last_update_time will stay to 0. So when
the task will be enqueued, it will be attached one more time
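
In sketch form (assumed shape of the class-switch path at the time,
abbreviated; the two helpers are declared extern only to keep the sketch
self-contained):

struct rq;
struct task_struct;

extern void attach_task_cfs_rq(struct task_struct *p); /* -> attach_entity_load_avg() */
extern int  task_on_rq_queued(struct task_struct *p);

static void switched_to_fair(struct rq *rq, struct task_struct *p)
{
        /* runs even for a sleeping task: the se inherits the cfs_rq's
         * last_update_time, which is 0 if no fair task ran there yet */
        attach_task_cfs_rq(p);

        if (task_on_rq_queued(p)) {
                /* resched handling for a queued task -- not the case here;
                 * a sleeping task keeps last_update_time == 0 and gets
                 * attached a second time at its next enqueue */
        }
}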

>
>> last_update_time set to 0 and will be attached once again during the
>> enqueue.
>> Initialize cfs_rq->avg.last_update_time to 1 instead.
>
> Maybe worth mentioning in the header:
>
> This double se attaching for the first task which moves into the task
> group owning this cfs_rq (.task_move_group() and .enqueue_task()) can
> (obviously) only happen if CONFIG_FAIR_GROUP_SCHED is set.
>
> The reason for this is that we set 'se->avg.last_update_time =
> cfs_rq->avg.last_update_time' in attach_entity_load_avg() and use
> 'migrated = !sa->last_update_time' as a flag in
> enqueue_entity_load_avg() to decide if we call attach_entity_load_avg()
> (again) or only update se->avg.
>
> Tested-by: Dietmar Eggemann <Dietmar.Eggemann@arm.com>


Thanks

>
>> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
>> ---
>>
>> v3:
>> - add initialization of load_last_update_time_copy for non-64-bit systems
>> - move init into init_cfs_rq
>>
>> v2:
>> - rq_clock_task(rq_of(cfs_rq)) can't be used because the lock is not held
>>
>>  kernel/sched/fair.c | 10 ++++++++++
>>  1 file changed, 10 insertions(+)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 218f8e8..86be9c1 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -8459,6 +8459,16 @@ void init_cfs_rq(struct cfs_rq *cfs_rq)
>>       cfs_rq->min_vruntime_copy = cfs_rq->min_vruntime;
>>  #endif
>>  #ifdef CONFIG_SMP
>> +     /*
>> +      * Set last_update_time to something different from 0 to make
>> +      * sure the 1st sched_entity will not be attached twice: once
>> +      * when attaching the task to the group and one more time when
>> +      * enqueueing the task.
>> +      */
>> +     cfs_rq->avg.last_update_time = 1;
>> +#ifndef CONFIG_64BIT
>> +     cfs_rq->load_last_update_time_copy = 1;
>> +#endif
>>       atomic_long_set(&cfs_rq->removed_load_avg, 0);
>>       atomic_long_set(&cfs_rq->removed_util_avg, 0);
>>  #endif
Dietmar Eggemann June 6, 2016, 7:32 p.m. UTC | #4
On 01/06/16 16:54, Vincent Guittot wrote:
> On 1 June 2016 at 17:31, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
>> On 30/05/16 16:52, Vincent Guittot wrote:
>>> The cfs_rq->avg.last_update_time is initialized to 0, with the main effect
>>> that the 1st sched_entity to be attached will keep its
>>
>> attached in .task_move_group?
>>
>> I'm not sure if we can have a .switched_to() directly followed by a
>> .enqueue_task() into a cfs_rq with avg.last_update_time = 0.
>
> I think it can happen as well if the task is not queued during the
> change of scheduling class, because when we attach the task in
> switched_to, the task->se.avg.last_update_time will stay at 0. So when
> the task is enqueued, it will be attached one more time.


You're right. 

So I started msc_test as an rt task in task group tg_1 (css.id=2) (1) and when it slept 
I changed it to become a cfs task (2).


(1) # chrt -r 99 ./cg.sh /tg_1 ./msc_test

(2) # chrt -o -p 0 2093

...
---> busy
[   84.336570] dequeue_task_rt: cpu=2 comm=msc_test pid=2093 tg_css_id=2
[   84.342948] enqueue_task_rt: cpu=1 comm=msc_test pid=2093 tg_css_id=2
---> sleep
[   86.548655] dequeue_task_rt: cpu=1 comm=msc_test pid=2093 tg_css_id=2
[   91.008457] switched_from_rt: cpu=1 comm=msc_test pid=2093
[   91.013896] switched_to_fair: cpu=1 comm=msc_test pid=2093
[   91.019336] attach_entity_load_avg: cpu=1 comm=msc_test pid=2093 last_update_time=0 tg->css.id=2
[   91.028499] do_sched_setscheduler comm=msc_test pid=2093 policy=0 rv=0
[   91.548807] enqueue_task_fair: cpu=1 comm=msc_test pid=2093 tg_css_id=2
[   91.555363] attach_entity_load_avg: cpu=1 comm=msc_test pid=2093 last_update_time=91548795220 tg->css.id=2
---> busy
...
Vincent Guittot June 7, 2016, 7:35 a.m. UTC | #5
Hi Dietmar,

On 6 June 2016 at 21:32, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
> On 01/06/16 16:54, Vincent Guittot wrote:
>> On 1 June 2016 at 17:31, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
>>> On 30/05/16 16:52, Vincent Guittot wrote:
>>>> The cfs_rq->avg.last_update_time is initialized to 0, with the main effect
>>>> that the 1st sched_entity to be attached will keep its
>>>
>>> attached in .task_move_group?
>>>
>>> I'm not sure if we can have a .switched_to() directly followed by a
>>> .enqueue_task() into a cfs_rq with avg.last_update_time = 0.
>>
>> I think it can happen as well if the task is not queued during the
>> change of scheduling class, because when we attach the task in
>> switched_to, the task->se.avg.last_update_time will stay at 0. So when
>> the task is enqueued, it will be attached one more time.
>
> You're right.
>
> So I started msc_test as an rt task in task group tg_1 (css.id=2) (1) and when it slept
> I changed it to become a cfs task (2).
>
> (1) # chrt -r 99 ./cg.sh /tg_1 ./msc_test
>
> (2) # chrt -o -p 0 2093
>
> ...
> ---> busy
> [   84.336570] dequeue_task_rt: cpu=2 comm=msc_test pid=2093 tg_css_id=2
> [   84.342948] enqueue_task_rt: cpu=1 comm=msc_test pid=2093 tg_css_id=2
> ---> sleep
> [   86.548655] dequeue_task_rt: cpu=1 comm=msc_test pid=2093 tg_css_id=2
> [   91.008457] switched_from_rt: cpu=1 comm=msc_test pid=2093
> [   91.013896] switched_to_fair: cpu=1 comm=msc_test pid=2093
> [   91.019336] attach_entity_load_avg: cpu=1 comm=msc_test pid=2093 last_update_time=0 tg->css.id=2
> [   91.028499] do_sched_setscheduler comm=msc_test pid=2093 policy=0 rv=0
> [   91.548807] enqueue_task_fair: cpu=1 comm=msc_test pid=2093 tg_css_id=2
> [   91.555363] attach_entity_load_avg: cpu=1 comm=msc_test pid=2093 last_update_time=91548795220 tg->css.id=2
> ---> busy
> ...


Thanks for testing the sequence.
Vincent Guittot June 16, 2016, 7:12 a.m. UTC | #6
On 15 June 2016 at 21:19, Yuyang Du <yuyang.du@intel.com> wrote:
> On Mon, May 30, 2016 at 05:52:20PM +0200, Vincent Guittot wrote:
>> The cfs_rq->avg.last_update_time is initialized to 0, with the main effect
>> that the 1st sched_entity to be attached will keep its
>> last_update_time set to 0 and will be attached once again during the
>> enqueue.
>> Initialize cfs_rq->avg.last_update_time to 1 instead.
>>
>> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
>> ---
>>
>> v3:
>> - add initialization of load_last_update_time_copy for non-64-bit systems
>> - move init into init_cfs_rq
>>
>> v2:
>> - rq_clock_task(rq_of(cfs_rq)) can't be used because the lock is not held
>>
>>  kernel/sched/fair.c | 10 ++++++++++
>>  1 file changed, 10 insertions(+)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 218f8e8..86be9c1 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -8459,6 +8459,16 @@ void init_cfs_rq(struct cfs_rq *cfs_rq)
>>       cfs_rq->min_vruntime_copy = cfs_rq->min_vruntime;
>>  #endif
>>  #ifdef CONFIG_SMP
>> +     /*
>> +      * Set last_update_time to something different from 0 to make
>> +      * sure the 1st sched_entity will not be attached twice: once
>> +      * when attaching the task to the group and one more time when
>> +      * enqueueing the task.
>> +      */
>> +     cfs_rq->avg.last_update_time = 1;
>> +#ifndef CONFIG_64BIT
>> +     cfs_rq->load_last_update_time_copy = 1;
>> +#endif
>>       atomic_long_set(&cfs_rq->removed_load_avg, 0);
>>       atomic_long_set(&cfs_rq->removed_util_avg, 0);
>>  #endif
>
> Then, when enqueued, both cfs_rq and task will be decayed to 0, due to
> a large gap between 1 and now, no?


Yes, like it is done currently (but 1 ns later).
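
For intuition, a toy calculation (assuming the standard PELT half-life of
~32 ms; the kernel uses fixed-point lookup tables rather than floating
point):

#include <math.h>
#include <stdio.h>

int main(void)
{
        double half_life_ms = 32.0;  /* PELT halves a signal every ~32 ms */
        double gap_ms = 90000.0;     /* e.g. first enqueue ~90 s after time 1 */
        double attached = 1024.0;    /* contribution attached at time 1 */

        /* geometric decay over the 1..now gap: 0.5^(90000/32) ~ 0 */
        printf("%g\n", attached * pow(0.5, gap_ms / half_life_ms));
        return 0;
}

Anything attached at time 1 has decayed to nothing by the time of the
first real update, which is the behavior described in the exchange above.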
Vincent Guittot June 16, 2016, 9:42 a.m. UTC | #7
On 16 June 2016 at 01:24, Yuyang Du <yuyang.du@intel.com> wrote:
> On Thu, Jun 16, 2016 at 09:12:58AM +0200, Vincent Guittot wrote:
>>> Then, when enqueued, both cfs_rq and task will be decayed to 0, due to
>>> a large gap between 1 and now, no?
>>
>> Yes, like it is done currently (but 1 ns later).
>
> Well, currently, cfs_rq will be decayed to 0, but will then add the task.
> So it turns out the current result is right. Attached twice, but result
> is right. Correct?


So the load looks accidentally correct, but not the utilization, which
will be overestimated by up to twice the max.

With this change, the behavior of the 1st task becomes the same as what
happens to the other tasks that are attached to a task group that has
been idle for a while and has an old last_update_time. I have another
pending patch to fix this behavior, as mentioned in a previous email.

Patch

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 218f8e8..86be9c1 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8459,6 +8459,16 @@ void init_cfs_rq(struct cfs_rq *cfs_rq)
 	cfs_rq->min_vruntime_copy = cfs_rq->min_vruntime;
 #endif
 #ifdef CONFIG_SMP
+	/*
+	 * Set last_update_time to something different from 0 to make
+	 * sure the 1st sched_entity will not be attached twice: once
+	 * when attaching the task to the group and one more time when
+	 * enqueueing the task.
+	 */
+	cfs_rq->avg.last_update_time = 1;
+#ifndef CONFIG_64BIT
+	cfs_rq->load_last_update_time_copy = 1;
+#endif
 	atomic_long_set(&cfs_rq->removed_load_avg, 0);
 	atomic_long_set(&cfs_rq->removed_util_avg, 0);
 #endif