[5/7,v4] sched: propagate asynchronous detach

Message ID 1474892393-5095-6-git-send-email-vincent.guittot@linaro.org
State Superseded

Commit Message

Vincent Guittot Sept. 26, 2016, 12:19 p.m. UTC
A task can be asynchronously detached from a cfs_rq when migrating
between CPUs. The load of the migrated task is then removed from the
source cfs_rq during its next update. We use this event to set the
propagation flag.

During load balance, we take advantage of the update of blocked load
to propagate any pending changes.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>

---
 kernel/sched/fair.c | 6 ++++++
 1 file changed, 6 insertions(+)

-- 
1.9.1
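
[Editor's note] set_tg_cfs_propagate(), called by this patch, is introduced
earlier in the series and is not shown here. As a minimal sketch of the
mechanism this patch relies on, assuming a per-cfs_rq propagate_avg flag
(an assumption about code from an earlier patch, not part of this one),
it amounts to:

/*
 * Sketch only, not part of this patch: mark a cfs_rq as having a pending
 * change (e.g. load removed by an asynchronous detach) that still has to
 * be pushed up to the parent on the next update of its group entity.
 * The propagate_avg field is assumed from an earlier patch in the series.
 */
static inline void set_tg_cfs_propagate(struct cfs_rq *cfs_rq)
{
	cfs_rq->propagate_avg = 1;
}

With that, update_cfs_rq_load_avg() below sets the flag whenever it consumes
removed load/util, and update_blocked_averages() pushes the change upward by
calling update_load_avg() on the owning group entity.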

Comments

Dietmar Eggemann Oct. 12, 2016, 3:03 p.m. UTC | #1
On 26/09/16 13:19, Vincent Guittot wrote:
> A task can be asynchronously detached from a cfs_rq when migrating
> between CPUs. The load of the migrated task is then removed from the
> source cfs_rq during its next update. We use this event to set the
> propagation flag.
>
> During load balance, we take advantage of the update of blocked load
> to propagate any pending changes.


IMHO, it would be a good idea to mention that '2/7 sched: fix
hierarchical order in rq->leaf_cfs_rq_list' is a hard requirement for
this to work. The functionality relies on the order of cfs_rq's (top to
root) in the rq->leaf_cfs_rq_list list.

> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
>  kernel/sched/fair.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 8ba500f..bd3b6b9 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3221,6 +3221,7 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq, bool update_freq)
>  		sub_positive(&sa->load_avg, r);
>  		sub_positive(&sa->load_sum, r * LOAD_AVG_MAX);
>  		removed_load = 1;
> +		set_tg_cfs_propagate(cfs_rq);
>  	}
>  
>  	if (atomic_long_read(&cfs_rq->removed_util_avg)) {
> @@ -3228,6 +3229,7 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq, bool update_freq)
>  		sub_positive(&sa->util_avg, r);
>  		sub_positive(&sa->util_sum, r * LOAD_AVG_MAX);
>  		removed_util = 1;
> +		set_tg_cfs_propagate(cfs_rq);
>  	}
>  
>  	decayed = __update_load_avg(now, cpu_of(rq_of(cfs_rq)), sa,
> @@ -6607,6 +6609,10 @@ static void update_blocked_averages(int cpu)
>  
>  		if (update_cfs_rq_load_avg(cfs_rq_clock_task(cfs_rq), cfs_rq, true))
>  			update_tg_load_avg(cfs_rq, 0);
> +
> +		/* Propagate pending load changes to the parent */
> +		if (cfs_rq->tg->se[cpu])
> +			update_load_avg(cfs_rq->tg->se[cpu], 0);


In my test (1 task (run/period: 8ms/16ms) in tg_root->tg_x->tg_y->*tg_z*,
oscillating between cpu1 and cpu2) the cfs_rq-related signals nicely go
down to 0 after the task has left the cpu, but this doesn't seem to be
the case for the corresponding se (cfs_rq->tg->se[cpu])?

It should actually work correctly because of the
update_tg_cfs_util/load() calls in update_load_avg(cfs_rq->tg->se[cpu],
0)->propagate_entity_load_avg().

[...]
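
[Editor's note] To make Dietmar's point above about patch 2/7 concrete, here
is a small standalone toy program (an illustration with made-up names, not
kernel code) showing why the child-before-parent order of
rq->leaf_cfs_rq_list lets a single update_blocked_averages() pass carry a
pending change from a deep group all the way to the root:

#include <stdio.h>

struct toy_cfs_rq {
	const char *name;
	long load_avg;		/* this group's contribution on the cpu */
	long pending_delta;	/* change not yet applied, like removed load */
	struct toy_cfs_rq *parent;
};

/*
 * One "blocked load update" of a single cfs_rq: apply the pending change
 * locally and hand it to the parent, as the new
 * update_load_avg(cfs_rq->tg->se[cpu], 0) call does via propagation.
 */
static void toy_update(struct toy_cfs_rq *cfs_rq)
{
	long delta = cfs_rq->pending_delta;

	if (!delta)
		return;
	cfs_rq->pending_delta = 0;
	cfs_rq->load_avg += delta;
	if (cfs_rq->parent)
		cfs_rq->parent->pending_delta += delta; /* handled when parent is visited */
}

int main(void)
{
	struct toy_cfs_rq root = { "root", 1024, 0, NULL };
	struct toy_cfs_rq tg_x = { "tg_x", 1024, 0, &root };
	struct toy_cfs_rq tg_y = { "tg_y", 1024, 0, &tg_x };

	/* a task of weight 1024 was asynchronously detached from tg_y */
	tg_y.pending_delta = -1024;

	/* child-before-parent order, as guaranteed by patch 2/7 */
	toy_update(&tg_y);
	toy_update(&tg_x);
	toy_update(&root);

	/* prints tg_y=0 tg_x=0 root=0; visiting root first would have left
	 * root at 1024 until the next pass. */
	printf("%s=%ld %s=%ld %s=%ld\n", tg_y.name, tg_y.load_avg,
	       tg_x.name, tg_x.load_avg, root.name, root.load_avg);
	return 0;
}
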
Vincent Guittot Oct. 12, 2016, 3:45 p.m. UTC | #2
On 12 October 2016 at 17:03, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
> On 26/09/16 13:19, Vincent Guittot wrote:
>> A task can be asynchronously detached from a cfs_rq when migrating
>> between CPUs. The load of the migrated task is then removed from the
>> source cfs_rq during its next update. We use this event to set the
>> propagation flag.
>>
>> During load balance, we take advantage of the update of blocked load
>> to propagate any pending changes.
>
> IMHO, it would be a good idea to mention that '2/7 sched: fix
> hierarchical order in rq->leaf_cfs_rq_list' is a hard requirement for
> this to work. The functionality relies on the order of cfs_rq's (top to
> root) in the rq->leaf_cfs_rq_list list.


Yes, I will add a comment.

>
>> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
>> ---
>>  kernel/sched/fair.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 8ba500f..bd3b6b9 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -3221,6 +3221,7 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq, bool update_freq)
>>               sub_positive(&sa->load_avg, r);
>>               sub_positive(&sa->load_sum, r * LOAD_AVG_MAX);
>>               removed_load = 1;
>> +             set_tg_cfs_propagate(cfs_rq);
>>       }
>>
>>       if (atomic_long_read(&cfs_rq->removed_util_avg)) {
>> @@ -3228,6 +3229,7 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq, bool update_freq)
>>               sub_positive(&sa->util_avg, r);
>>               sub_positive(&sa->util_sum, r * LOAD_AVG_MAX);
>>               removed_util = 1;
>> +             set_tg_cfs_propagate(cfs_rq);
>>       }
>>
>>       decayed = __update_load_avg(now, cpu_of(rq_of(cfs_rq)), sa,
>> @@ -6607,6 +6609,10 @@ static void update_blocked_averages(int cpu)
>>
>>               if (update_cfs_rq_load_avg(cfs_rq_clock_task(cfs_rq), cfs_rq, true))
>>                       update_tg_load_avg(cfs_rq, 0);
>> +
>> +             /* Propagate pending load changes to the parent */
>> +             if (cfs_rq->tg->se[cpu])
>> +                     update_load_avg(cfs_rq->tg->se[cpu], 0);

>
> In my test (1 task (run/period: 8ms/16ms) in tg_root->tg_x->tg_y->*tg_z*,
> oscillating between cpu1 and cpu2) the cfs_rq-related signals nicely go
> down to 0 after the task has left the cpu, but this doesn't seem to be
> the case for the corresponding se (cfs_rq->tg->se[cpu])?

Strange, because this use case is part of the functional tests that I
run, and it was working fine in the last test that I did.

>
> It should actually work correctly because of the
> update_tg_cfs_util/load() calls in update_load_avg(cfs_rq->tg->se[cpu],
> 0)->propagate_entity_load_avg()

Furthermore, the update of the parent cfs_rq tg_x->cfs_rq[cpu] uses the
delta between the previous and new value of the child tg_y->se[cpu]. So
if tg_x->cfs_rq[cpu]->avg.load_avg goes down to 0,
tg_y->se[cpu]->avg.load_avg has at least changed and most probably goes
down to 0 too.

Could it be a misplaced trace point?

> [...]
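
[Editor's note] A rough sketch of the delta-based update Vincent describes
above; the helper name and parameters are illustrative only, not the exact
code of the series:

/*
 * The parent cfs_rq is corrected by the *change* of the child group
 * entity, so once the child se drops to 0 the parent's contribution
 * from it drops as well.
 */
static void toy_propagate_load(long *parent_load_avg, long *se_load_avg,
			       long new_child_load_avg)
{
	long delta = new_child_load_avg - *se_load_avg;	/* new - previous */

	*se_load_avg = new_child_load_avg;	/* child se now mirrors its cfs_rq */
	*parent_load_avg += delta;		/* parent only sees the delta */
	if (*parent_load_avg < 0)		/* clamp, in the spirit of sub_positive() */
		*parent_load_avg = 0;
}
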
Dietmar Eggemann Oct. 12, 2016, 4:18 p.m. UTC | #3
On 12/10/16 16:45, Vincent Guittot wrote:
> On 12 October 2016 at 17:03, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
>> On 26/09/16 13:19, Vincent Guittot wrote:

[...]

>>> @@ -6607,6 +6609,10 @@ static void update_blocked_averages(int cpu)
>>>
>>>               if (update_cfs_rq_load_avg(cfs_rq_clock_task(cfs_rq), cfs_rq, true))
>>>                       update_tg_load_avg(cfs_rq, 0);
>>> +
>>> +             /* Propagate pending load changes to the parent */
>>> +             if (cfs_rq->tg->se[cpu])
>>> +                     update_load_avg(cfs_rq->tg->se[cpu], 0);
>>
>> In my test (1 task (run/period: 8ms/16ms) in tg_root->tg_x->tg_y->*tg_z*,
>> oscillating between cpu1 and cpu2) the cfs_rq-related signals nicely go
>> down to 0 after the task has left the cpu, but this doesn't seem to be
>> the case for the corresponding se (cfs_rq->tg->se[cpu])?

>
> Strange, because this use case is part of the functional tests that I
> run, and it was working fine in the last test that I did.
>
>>
>> It should actually work correctly because of the
>> update_tg_cfs_util/load() calls in update_load_avg(cfs_rq->tg->se[cpu],
>> 0)->propagate_entity_load_avg()
>
> Furthermore, the update of the parent cfs_rq tg_x->cfs_rq[cpu] uses the
> delta between the previous and new value of the child tg_y->se[cpu]. So
> if tg_x->cfs_rq[cpu]->avg.load_avg goes down to 0,
> tg_y->se[cpu]->avg.load_avg has at least changed and most probably goes
> down to 0 too.


Makes sense.

>
> Could it be a misplaced trace point?


Yes, you're right, it was a missing tracepoint. I only had se and cfs_rq
PELT tracepoints in __update_load_avg() and attach/detach_entity_load_avg().
I've added them to propagate_entity_load_avg() as well, after the
update_tg_cfs_load() call, and now it makes sense. Thanks!

[...]

Patch

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 8ba500f..bd3b6b9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3221,6 +3221,7 @@  update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq, bool update_freq)
 		sub_positive(&sa->load_avg, r);
 		sub_positive(&sa->load_sum, r * LOAD_AVG_MAX);
 		removed_load = 1;
+		set_tg_cfs_propagate(cfs_rq);
 	}
 
 	if (atomic_long_read(&cfs_rq->removed_util_avg)) {
@@ -3228,6 +3229,7 @@  update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq, bool update_freq)
 		sub_positive(&sa->util_avg, r);
 		sub_positive(&sa->util_sum, r * LOAD_AVG_MAX);
 		removed_util = 1;
+		set_tg_cfs_propagate(cfs_rq);
 	}
 
 	decayed = __update_load_avg(now, cpu_of(rq_of(cfs_rq)), sa,
@@ -6607,6 +6609,10 @@  static void update_blocked_averages(int cpu)
 
 		if (update_cfs_rq_load_avg(cfs_rq_clock_task(cfs_rq), cfs_rq, true))
 			update_tg_load_avg(cfs_rq, 0);
+
+		/* Propagate pending load changes to the parent */
+		if (cfs_rq->tg->se[cpu])
+			update_load_avg(cfs_rq->tg->se[cpu], 0);
 	}
 	raw_spin_unlock_irqrestore(&rq->lock, flags);
 }
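
[Editor's note] For context on how the new update_load_avg(cfs_rq->tg->se[cpu], 0)
call above ends up doing the propagation discussed in the thread, here is a
rough, simplified sketch of the propagate_entity_load_avg() path added by the
rest of the series. It is a reconstruction for illustration; the propagate_avg
field and the exact structure are assumptions, not the patch text:

static inline int propagate_entity_load_avg(struct sched_entity *se)
{
	struct cfs_rq *cfs_rq, *gcfs_rq;

	/* tasks carry no group below them, nothing to propagate */
	if (entity_is_task(se))
		return 0;

	gcfs_rq = group_cfs_rq(se);		/* the child cfs_rq this se represents */
	if (!gcfs_rq->propagate_avg)		/* flag assumed from an earlier patch */
		return 0;
	gcfs_rq->propagate_avg = 0;

	cfs_rq = cfs_rq_of(se);
	set_tg_cfs_propagate(cfs_rq);		/* keep pushing up on the parent's own update */

	update_tg_cfs_util(cfs_rq, se);		/* apply the util delta to the parent */
	update_tg_cfs_load(cfs_rq, se);		/* apply the load delta to the parent */

	return 1;
}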