
[v2] sched: fix first task of a task group is attached twice

Message ID 1464188472-30086-1-git-send-email-vincent.guittot@linaro.org
State New

Commit Message

Vincent Guittot May 25, 2016, 3:01 p.m. UTC
cfs_rq->avg.last_update_time is initialized to 0, with the main effect
that the 1st sched_entity to be attached keeps its last_update_time set
to 0 and is attached once again during the enqueue.
Initialize cfs_rq->avg.last_update_time to 1 instead.
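
To illustrate the effect, here is a small standalone model of the double
attach (illustrative only, not kernel code; it just mimics the
last_update_time/load_avg bookkeeping described above):

/*
 * Standalone model of the double attach: last_update_time == 0 doubles
 * as the "entity not attached yet / just migrated" marker, so a cfs_rq
 * whose own last_update_time is still 0 hands that marker back to the
 * first entity attached to it.
 */
#include <stdio.h>

struct avg { unsigned long long last_update_time; long load_avg; };

static void attach(struct avg *cfs_rq, struct avg *se)
{
	se->last_update_time = cfs_rq->last_update_time; /* stays 0 if cfs_rq was never updated */
	cfs_rq->load_avg += se->load_avg;
}

int main(void)
{
	struct avg cfs_rq = { 0, 0 };		/* last_update_time left at 0: the buggy init */
	struct avg se = { 0, 1024 };

	attach(&cfs_rq, &se);			/* 1st attach, when the task enters the group */

	if (!se.last_update_time)		/* enqueue treats 0 as "needs attach"... */
		attach(&cfs_rq, &se);		/* ...so the load is added a second time */

	printf("cfs_rq load_avg = %ld (2048 here; 1024 once last_update_time starts at 1)\n",
	       cfs_rq.load_avg);
	return 0;
}

With last_update_time initialized to 1, the first attach already hands
the entity a non-zero timestamp, so the enqueue does not attach it again.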

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>

---

v2:
- rq_clock_task(rq_of(cfs_rq)) can't be used because lock is not held

 kernel/sched/fair.c | 8 ++++++++
 1 file changed, 8 insertions(+)

-- 
1.9.1

Comments

Vincent Guittot May 26, 2016, 8:26 a.m. UTC | #1
On 26 May 2016 at 00:38, Yuyang Du <yuyang.du@intel.com> wrote:
> On Wed, May 25, 2016 at 05:01:11PM +0200, Vincent Guittot wrote:

>> The cfs_rq->avg.last_update_time is initialize to 0 with the main effect

>> that the 1st sched_entity that will be attached, will keep its

>> last_update_time set to 0 and will attached once again during the

>> enqueue.

>> Initialize cfs_rq->avg.last_update_time to 1 instead.

>

> Actually, the first time (attach_task_cfs_rq()) is called even way

> before init_entity_runnable_average(), no?


Maybe there is an issue during the fork of a task too, but this patch
is about the init of the sched_avg of a task group. The problem happens
when the 1st task is moved to the group; it's not about the fork of a
task.
Vincent Guittot May 26, 2016, 8:51 a.m. UTC | #2
On 26 May 2016 at 02:40, Yuyang Du <yuyang.du@intel.com> wrote:
> On Thu, May 26, 2016 at 10:26:54AM +0200, Vincent Guittot wrote:

>> On 26 May 2016 at 00:38, Yuyang Du <yuyang.du@intel.com> wrote:

>> > On Wed, May 25, 2016 at 05:01:11PM +0200, Vincent Guittot wrote:

>> >> The cfs_rq->avg.last_update_time is initialize to 0 with the main effect

>> >> that the 1st sched_entity that will be attached, will keep its

>> >> last_update_time set to 0 and will attached once again during the

>> >> enqueue.

>> >> Initialize cfs_rq->avg.last_update_time to 1 instead.

>> >

>> > Actually, the first time (attach_task_cfs_rq()) is called even way

>> > before init_entity_runnable_average(), no?

>>

>> maybe there is an issue during the fork of a task too

>> This patch is about the init of the sched_avg of a task group. The

>> problem happens when the 1st task is moved the group but it's not

>> about the fork of a task

>

> Right, but the root cause is not addressed, which is when forked, the task

> should not be touched on sched avgs before init_entity_runnable_average()

> in wake_up_new_task(). You think?


So yes, this patch doesn't address the fact that sched avgs should
not be touched before init_entity_runnable_average(). This patch is only
about the init of the sched_avg of a task group.
Dietmar Eggemann May 27, 2016, 3:48 p.m. UTC | #3
On 25/05/16 16:01, Vincent Guittot wrote:
> The cfs_rq->avg.last_update_time is initialize to 0 with the main effect

> that the 1st sched_entity that will be attached, will keep its

> last_update_time set to 0 and will attached once again during the

> enqueue.

> Initialize cfs_rq->avg.last_update_time to 1 instead.

> 

> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>

> ---

> 

> v2:

> - rq_clock_task(rq_of(cfs_rq)) can't be used because lock is not held

> 

>  kernel/sched/fair.c | 8 ++++++++

>  1 file changed, 8 insertions(+)

> 

> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c

> index 218f8e8..3724656 100644

> --- a/kernel/sched/fair.c

> +++ b/kernel/sched/fair.c

> @@ -8586,6 +8586,14 @@ void init_tg_cfs_entry(struct task_group *tg, struct cfs_rq *cfs_rq,

>  		se->depth = parent->depth + 1;

>  	}

>  

> +	/*

> +	 * Set last_update_time to something different from 0 to make

> +	 * sure the 1st sched_entity will not be attached twice: once

> +	 * when attaching the task to the group and one more time when

> +	 * enqueueing the task.

> +	 */

> +	tg->cfs_rq[cpu]->avg.last_update_time = 1;

> +

>  	se->my_q = cfs_rq;

>  	/* guarantee group entities always have weight */

>  	update_load_set(&se->load, NICE_0_LOAD);


So why not set the last_update_time value for those cfs_rq's when
we have the lock? E.g. in task_move_group_fair() or attach_task_cfs_rq().

@@ -8490,12 +8493,20 @@ void init_cfs_rq(struct cfs_rq *cfs_rq)
 #ifdef CONFIG_FAIR_GROUP_SCHED
 static void task_move_group_fair(struct task_struct *p)
 {
+#ifdef CONFIG_SMP
+       struct cfs_rq *cfs_rq = NULL;
+#endif
+
        detach_task_cfs_rq(p);
        set_task_rq(p, task_cpu(p));
 
 #ifdef CONFIG_SMP
        /* Tell se's cfs_rq has been changed -- migrated */
        p->se.avg.last_update_time = 0;
+
+       cfs_rq = cfs_rq_of(&p->se);
+       if (!cfs_rq->avg.last_update_time)
+               cfs_rq->avg.last_update_time = rq_clock_task(rq_of(cfs_rq));
 #endif

or

@@ -8423,6 +8423,9 @@ static void attach_task_cfs_rq(struct task_struct *p)
        se->depth = se->parent ? se->parent->depth + 1 : 0;
 #endif
 
+       if (!cfs_rq->avg.last_update_time)
+               cfs_rq->avg.last_update_time = rq_clock_task(rq_of(cfs_rq));
+
        /* Synchronize task with its cfs_rq */
        attach_entity_load_avg(cfs_rq, se);
Vincent Guittot May 27, 2016, 5:16 p.m. UTC | #4
On 27 May 2016 at 17:48, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
> On 25/05/16 16:01, Vincent Guittot wrote:

>> The cfs_rq->avg.last_update_time is initialize to 0 with the main effect

>> that the 1st sched_entity that will be attached, will keep its

>> last_update_time set to 0 and will attached once again during the

>> enqueue.

>> Initialize cfs_rq->avg.last_update_time to 1 instead.

>>

>> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>

>> ---

>>

>> v2:

>> - rq_clock_task(rq_of(cfs_rq)) can't be used because lock is not held

>>

>>  kernel/sched/fair.c | 8 ++++++++

>>  1 file changed, 8 insertions(+)

>>

>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c

>> index 218f8e8..3724656 100644

>> --- a/kernel/sched/fair.c

>> +++ b/kernel/sched/fair.c

>> @@ -8586,6 +8586,14 @@ void init_tg_cfs_entry(struct task_group *tg, struct cfs_rq *cfs_rq,

>>               se->depth = parent->depth + 1;

>>       }

>>

>> +     /*

>> +      * Set last_update_time to something different from 0 to make

>> +      * sure the 1st sched_entity will not be attached twice: once

>> +      * when attaching the task to the group and one more time when

>> +      * enqueueing the task.

>> +      */

>> +     tg->cfs_rq[cpu]->avg.last_update_time = 1;

>> +

>>       se->my_q = cfs_rq;

>>       /* guarantee group entities always have weight */

>>       update_load_set(&se->load, NICE_0_LOAD);

>

> So why not setting the last_update_time value for those cfs_rq's when

> we have the lock? E.g. in task_move_group_fair() or attach_task_cfs_rq().


I'm not sure it's worth adding this init to functions that are then
called often, only for the sake of the init.
If you are concerned about the update of the load of the 1st task that
will be attached, a long time can still have elapsed between the
creation of the group and the 1st enqueue of a task. This was the case
for the test I did when I found this issue.

Besides this point, I have to send a new version to set
load_last_update_time_copy for non-64-bit systems. Fengguang pointed
me to the issue.
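
The reason the copy matters on 32-bit: a 64-bit load/store is not atomic
there, so fair.c reads the cfs_rq timestamp through a retry loop against
load_last_update_time_copy. Roughly the pattern below (a simplified
sketch from memory, not the exact source; cfs_rq_last_update_time() and
smp_rmb() are the reader side I'm assuming here):

#ifndef CONFIG_64BIT
static u64 cfs_rq_last_update_time(struct cfs_rq *cfs_rq)
{
	u64 last_update_time_copy;
	u64 last_update_time;

	/*
	 * Retry until the main field and its copy agree, i.e. until no
	 * 64-bit write was observed torn into its two 32-bit halves.
	 */
	do {
		last_update_time_copy = cfs_rq->load_last_update_time_copy;
		smp_rmb();
		last_update_time = cfs_rq->avg.last_update_time;
	} while (last_update_time != last_update_time_copy);

	return last_update_time;
}
#endif

So on !CONFIG_64BIT the init presumably has to set both
cfs_rq->avg.last_update_time and cfs_rq->load_last_update_time_copy to 1,
otherwise a reader keeps looping until the next real update brings the
two fields back in sync.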

Regards,
Vincent

>

> @@ -8490,12 +8493,20 @@ void init_cfs_rq(struct cfs_rq *cfs_rq)

>  #ifdef CONFIG_FAIR_GROUP_SCHED

>  static void task_move_group_fair(struct task_struct *p)

>  {

> +#ifdef CONFIG_SMP

> +       struct cfs_rq *cfs_rq = NULL;

> +#endif

> +

>         detach_task_cfs_rq(p);

>         set_task_rq(p, task_cpu(p));

>

>  #ifdef CONFIG_SMP

>         /* Tell se's cfs_rq has been changed -- migrated */

>         p->se.avg.last_update_time = 0;

> +

> +       cfs_rq = cfs_rq_of(&p->se);

> +       if (!cfs_rq->avg.last_update_time)

> +               cfs_rq->avg.last_update_time = rq_clock_task(rq_of(cfs_rq));

>  #endif

>

> or

>

> @@ -8423,6 +8423,9 @@ static void attach_task_cfs_rq(struct task_struct *p)

>         se->depth = se->parent ? se->parent->depth + 1 : 0;

>  #endif

>

> +       if (!cfs_rq->avg.last_update_time)

> +               cfs_rq->avg.last_update_time = rq_clock_task(rq_of(cfs_rq));

> +

>         /* Synchronize task with its cfs_rq */

>         attach_entity_load_avg(cfs_rq, se);
Dietmar Eggemann May 27, 2016, 8:38 p.m. UTC | #5
On 27/05/16 18:16, Vincent Guittot wrote:
> On 27 May 2016 at 17:48, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:

>> On 25/05/16 16:01, Vincent Guittot wrote:

>>> The cfs_rq->avg.last_update_time is initialize to 0 with the main effect

>>> that the 1st sched_entity that will be attached, will keep its

>>> last_update_time set to 0 and will attached once again during the

>>> enqueue.

>>> Initialize cfs_rq->avg.last_update_time to 1 instead.

>>>

>>> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>

>>> ---

>>>

>>> v2:

>>> - rq_clock_task(rq_of(cfs_rq)) can't be used because lock is not held

>>>

>>>  kernel/sched/fair.c | 8 ++++++++

>>>  1 file changed, 8 insertions(+)

>>>

>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c

>>> index 218f8e8..3724656 100644

>>> --- a/kernel/sched/fair.c

>>> +++ b/kernel/sched/fair.c

>>> @@ -8586,6 +8586,14 @@ void init_tg_cfs_entry(struct task_group *tg, struct cfs_rq *cfs_rq,

>>>               se->depth = parent->depth + 1;

>>>       }

>>>

>>> +     /*

>>> +      * Set last_update_time to something different from 0 to make

>>> +      * sure the 1st sched_entity will not be attached twice: once

>>> +      * when attaching the task to the group and one more time when

>>> +      * enqueueing the task.

>>> +      */

>>> +     tg->cfs_rq[cpu]->avg.last_update_time = 1;

>>> +


Couldn't you just set the value in init_cfs_rq():

@@ -8482,6 +8482,7 @@ void init_cfs_rq(struct cfs_rq *cfs_rq)
        cfs_rq->min_vruntime_copy = cfs_rq->min_vruntime;
 #endif
 #ifdef CONFIG_SMP
+       cfs_rq->avg.last_update_time = 1;
        atomic_long_set(&cfs_rq->removed_load_avg, 0);
        atomic_long_set(&cfs_rq->removed_util_avg, 0);
 #endif

>>>       se->my_q = cfs_rq;

>>>       /* guarantee group entities always have weight */

>>>       update_load_set(&se->load, NICE_0_LOAD);

>>

>> So why not setting the last_update_time value for those cfs_rq's when

>> we have the lock? E.g. in task_move_group_fair() or attach_task_cfs_rq().

> 

> I'm not sure that it's worth adding this init in functions that are

> then used often only for the init of it.


Yeah, there would be the overhead of this if condition.

> If you are concerned by the update of the load of the 1st task that

> will be attached, it can still have elapsed  a long time between the

> creation of the group and the 1st enqueue of a task. This was the case

> for the test i did when i found this issue.


Understood, but for me, the creation of the task group is
cpu_cgroup_css_alloc() -> sched_create_group() -> ... -> init_cfs_rq(),
init_tg_cfs_entry(), ...

and the functions which are called when the first task is put into the
task group are cpu_cgroup_attach() and cpu_cgroup_fork(), and they would
trigger the initial setup of cfs_rq->avg.last_update_time.

> 

> Beside this point, I have to send a new version to set

> load_last_update_time_copy for not 64 bits system. Fengguang points me

> the issue


OK.

[...]
>>

>> +       if (!cfs_rq->avg.last_update_time)

>> +               cfs_rq->avg.last_update_time = rq_clock_task(rq_of(cfs_rq));

>> +

>>         /* Synchronize task with its cfs_rq */

>>         attach_entity_load_avg(cfs_rq, se);

>
Vincent Guittot May 30, 2016, 7:04 a.m. UTC | #6
On 27 May 2016 at 22:38, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
> On 27/05/16 18:16, Vincent Guittot wrote:

>> On 27 May 2016 at 17:48, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:

>>> On 25/05/16 16:01, Vincent Guittot wrote:

>>>> The cfs_rq->avg.last_update_time is initialize to 0 with the main effect

>>>> that the 1st sched_entity that will be attached, will keep its

>>>> last_update_time set to 0 and will attached once again during the

>>>> enqueue.

>>>> Initialize cfs_rq->avg.last_update_time to 1 instead.

>>>>

>>>> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>

>>>> ---

>>>>

>>>> v2:

>>>> - rq_clock_task(rq_of(cfs_rq)) can't be used because lock is not held

>>>>

>>>>  kernel/sched/fair.c | 8 ++++++++

>>>>  1 file changed, 8 insertions(+)

>>>>

>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c

>>>> index 218f8e8..3724656 100644

>>>> --- a/kernel/sched/fair.c

>>>> +++ b/kernel/sched/fair.c

>>>> @@ -8586,6 +8586,14 @@ void init_tg_cfs_entry(struct task_group *tg, struct cfs_rq *cfs_rq,

>>>>               se->depth = parent->depth + 1;

>>>>       }

>>>>

>>>> +     /*

>>>> +      * Set last_update_time to something different from 0 to make

>>>> +      * sure the 1st sched_entity will not be attached twice: once

>>>> +      * when attaching the task to the group and one more time when

>>>> +      * enqueueing the task.

>>>> +      */

>>>> +     tg->cfs_rq[cpu]->avg.last_update_time = 1;

>>>> +

>

> Couldn't you not just set the value in init_cfs_rq():


Yes, there is no good reason to use init_tg_cfs_entry() instead of init_cfs_rq().

>

> @@ -8482,6 +8482,7 @@ void init_cfs_rq(struct cfs_rq *cfs_rq)

>         cfs_rq->min_vruntime_copy = cfs_rq->min_vruntime;

>  #endif

>  #ifdef CONFIG_SMP

> +       cfs_rq->avg.last_update_time = 1;

>         atomic_long_set(&cfs_rq->removed_load_avg, 0);

>         atomic_long_set(&cfs_rq->removed_util_avg, 0);

>  #endif

>

>>>>       se->my_q = cfs_rq;

>>>>       /* guarantee group entities always have weight */

>>>>       update_load_set(&se->load, NICE_0_LOAD);

>>>

>>> So why not setting the last_update_time value for those cfs_rq's when

>>> we have the lock? E.g. in task_move_group_fair() or attach_task_cfs_rq().

>>

>> I'm not sure that it's worth adding this init in functions that are

>> then used often only for the init of it.

>

> Yeah, there will be this if condition overhead.

>

>> If you are concerned by the update of the load of the 1st task that

>> will be attached, it can still have elapsed  a long time between the

>> creation of the group and the 1st enqueue of a task. This was the case

>> for the test i did when i found this issue.

>

> Understood, but for me, creation of the task group is

> cpu_cgroup_css_alloc ->  sched_create_group() -> ... -> init_cfs_rq(),

> init_tg_cfs_entry(), ...

>

> and the functions which are called when the first task is put into the

> task group are cpu_cgroup_attach() and cpu_cgroup_fork() and they whould

> trigger the initial setup of the cfs_rq->avg.last_update_time.

>

>>

>> Beside this point, I have to send a new version to set

>> load_last_update_time_copy for not 64 bits system. Fengguang points me

>> the issue

>

> OK.

>

> [...]

>>>

>>> +       if (!cfs_rq->avg.last_update_time)

>>> +               cfs_rq->avg.last_update_time = rq_clock_task(rq_of(cfs_rq));

>>> +

>>>         /* Synchronize task with its cfs_rq */

>>>         attach_entity_load_avg(cfs_rq, se);

>>
Vincent Guittot May 30, 2016, 3:54 p.m. UTC | #7
On 27 May 2016 at 22:38, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
> On 27/05/16 18:16, Vincent Guittot wrote:

>> On 27 May 2016 at 17:48, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:

>>> On 25/05/16 16:01, Vincent Guittot wrote:

>>>> The cfs_rq->avg.last_update_time is initialize to 0 with the main effect

>>>> that the 1st sched_entity that will be attached, will keep its

>>>> last_update_time set to 0 and will attached once again during the

>>>> enqueue.

>>>> Initialize cfs_rq->avg.last_update_time to 1 instead.

>>>>

>>>> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>

>>>> ---

>>>>

>>>> v2:

>>>> - rq_clock_task(rq_of(cfs_rq)) can't be used because lock is not held

>>>>

>>>>  kernel/sched/fair.c | 8 ++++++++

>>>>  1 file changed, 8 insertions(+)

>>>>

>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c

>>>> index 218f8e8..3724656 100644

>>>> --- a/kernel/sched/fair.c

>>>> +++ b/kernel/sched/fair.c

>>>> @@ -8586,6 +8586,14 @@ void init_tg_cfs_entry(struct task_group *tg, struct cfs_rq *cfs_rq,

>>>>               se->depth = parent->depth + 1;

>>>>       }

>>>>

>>>> +     /*

>>>> +      * Set last_update_time to something different from 0 to make

>>>> +      * sure the 1st sched_entity will not be attached twice: once

>>>> +      * when attaching the task to the group and one more time when

>>>> +      * enqueueing the task.

>>>> +      */

>>>> +     tg->cfs_rq[cpu]->avg.last_update_time = 1;

>>>> +

>

> Couldn't you not just set the value in init_cfs_rq():

>

> @@ -8482,6 +8482,7 @@ void init_cfs_rq(struct cfs_rq *cfs_rq)

>         cfs_rq->min_vruntime_copy = cfs_rq->min_vruntime;

>  #endif

>  #ifdef CONFIG_SMP

> +       cfs_rq->avg.last_update_time = 1;

>         atomic_long_set(&cfs_rq->removed_load_avg, 0);

>         atomic_long_set(&cfs_rq->removed_util_avg, 0);

>  #endif

>

>>>>       se->my_q = cfs_rq;

>>>>       /* guarantee group entities always have weight */

>>>>       update_load_set(&se->load, NICE_0_LOAD);

>>>

>>> So why not setting the last_update_time value for those cfs_rq's when

>>> we have the lock? E.g. in task_move_group_fair() or attach_task_cfs_rq().

>>

>> I'm not sure that it's worth adding this init in functions that are

>> then used often only for the init of it.

>

> Yeah, there will be this if condition overhead.

>

>> If you are concerned by the update of the load of the 1st task that

>> will be attached, it can still have elapsed  a long time between the

>> creation of the group and the 1st enqueue of a task. This was the case

>> for the test i did when i found this issue.

>

> Understood, but for me, creation of the task group is

> cpu_cgroup_css_alloc ->  sched_create_group() -> ... -> init_cfs_rq(),

> init_tg_cfs_entry(), ...

>

> and the functions which are called when the first task is put into the

> task group are cpu_cgroup_attach() and cpu_cgroup_fork() and they whould

> trigger the initial setup of the cfs_rq->avg.last_update_time.


Adding a test and the init of cfs_rq->avg.last_update_time in
cpu_cgroup_attach() and cpu_cgroup_fork(), in order to have an almost
up-to-date cfs_rq->avg.last_update_time at that point, would only solve
part of a wider issue that happens when moving a task to a cfs_rq that
has not been updated for a while (not only just after its creation, but
also after the last update of a blocked cfs_rq).

I have another pending patch for this kind of issue that I haven't sent yet.

>

>>

>> Beside this point, I have to send a new version to set

>> load_last_update_time_copy for not 64 bits system. Fengguang points me

>> the issue

>

> OK.

>

> [...]

>>>

>>> +       if (!cfs_rq->avg.last_update_time)

>>> +               cfs_rq->avg.last_update_time = rq_clock_task(rq_of(cfs_rq));

>>> +

>>>         /* Synchronize task with its cfs_rq */

>>>         attach_entity_load_avg(cfs_rq, se);

>>

Patch

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 218f8e8..3724656 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8586,6 +8586,14 @@  void init_tg_cfs_entry(struct task_group *tg, struct cfs_rq *cfs_rq,
 		se->depth = parent->depth + 1;
 	}
 
+	/*
+	 * Set last_update_time to something different from 0 to make
+	 * sure the 1st sched_entity will not be attached twice: once
+	 * when attaching the task to the group and one more time when
+	 * enqueueing the task.
+	 */
+	tg->cfs_rq[cpu]->avg.last_update_time = 1;
+
 	se->my_q = cfs_rq;
 	/* guarantee group entities always have weight */
 	update_load_set(&se->load, NICE_0_LOAD);