[RFC] sched: Ensure cpu_power periodic update

Message ID	1323717668-2143-1-git-send-email-vincent.guittot@linaro.org
State	Accepted
Headers	show Return-Path: <patch+caf_=linaro-patchwork=canonical.com@linaro.org> Received-SPF: neutral (google.com: 209.85.215.178 is neither permitted nor denied by best guess record for domain of vincent.guittot@linaro.org) client-ip=209.85.215.178; From: Vincent Guittot <vincent.guittot@linaro.org> To: linux-kernel@vger.kernel.org, linaro-dev@lists.linaro.org, a.p.zijlstra@chello.nl Cc: patches@linaro.org, Vincent Guittot <vincent.guittot@linaro.org> Subject: [RFC] sched: Ensure cpu_power periodic update Date: Mon, 12 Dec 2011 20:21:08 +0100 Message-Id: <1323717668-2143-1-git-send-email-vincent.guittot@linaro.org>

Vincent Guittot Dec. 12, 2011, 7:21 p.m. UTC

With a lot of small tasks, the softirq sched is nearly never called
when no_hz is enable. In this case the load_balance is mainly called with
the newly_idle mode which doesn't update the cpu_power.
Add a next_update field which ensure a maximum update period when
there is short activity

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 include/linux/sched.h |    1 +
 kernel/sched/fair.c   |   24 ++++++++++++++++--------
 2 files changed, 17 insertions(+), 8 deletions(-)

Peter Zijlstra Dec. 15, 2011, 10:08 a.m. UTC | #1

On Mon, 2011-12-12 at 20:21 +0100, Vincent Guittot wrote:
> With a lot of small tasks, the softirq sched is nearly never called
> when no_hz is enable. In this case the load_balance is mainly called with
> the newly_idle mode which doesn't update the cpu_power.
> Add a next_update field which ensure a maximum update period when
> there is short activity

> +	if (local_group) {
> +		if (idle != CPU_NEWLY_IDLE) {
> +			if (balance_cpu != this_cpu) {
> +				*balance = 0;
> +				return;
> +			}
> +			update_group_power(sd, this_cpu);
> +		} else if (time_after_eq(jiffies, group->sgp->next_update))
> +			update_group_power(sd, this_cpu);
>  	}

Hmm, I would have expected it to be called from the NOHZ balancing path
instead of the new_idle path. Your changelog fails to mentions any
considerations on this..

Then again, its probably easier to keep update_group_power on this_cpu
than to allow a remote update of your cpu_power.

So I'm not opposed to this patch, I'd just like a little extra
clarification.

Vincent Guittot Dec. 15, 2011, 1:36 p.m. UTC | #2

On 15 December 2011 11:08, Peter Zijlstra <a.p.zijlstra@chello.nl> wrote:
> On Mon, 2011-12-12 at 20:21 +0100, Vincent Guittot wrote:
>> With a lot of small tasks, the softirq sched is nearly never called
>> when no_hz is enable. In this case the load_balance is mainly called with
>> the newly_idle mode which doesn't update the cpu_power.
>> Add a next_update field which ensure a maximum update period when
>> there is short activity
>
>> +     if (local_group) {
>> +             if (idle != CPU_NEWLY_IDLE) {
>> +                     if (balance_cpu != this_cpu) {
>> +                             *balance = 0;
>> +                             return;
>> +                     }
>> +                     update_group_power(sd, this_cpu);
>> +             } else if (time_after_eq(jiffies, group->sgp->next_update))
>> +                     update_group_power(sd, this_cpu);
>>       }
>
> Hmm, I would have expected it to be called from the NOHZ balancing path
> instead of the new_idle path. Your changelog fails to mentions any
> considerations on this..
>

As we are not lucky, the small tasks are mainly running between ticks
and  the timer interrupt doesn't fire which implies that both
rebalance_domain of the cpu and nohz_balancer_kick are not called. We
have a lot of call to idle_balance() when cpus become idle and very
few calls to rebalance or nohz_idle_balance. If some tasks are rt
tasks, the cpu_power should be updated regularly to reflect current
use of the cpu by rt scheduler.

I'm using cyclictest to easily reproduce the problem on my dual cortex-A9

> Then again, its probably easier to keep update_group_power on this_cpu
> than to allow a remote update of your cpu_power.
>

This additional path for updating the cpu_power will only be used by
this_cpu because it is called by idle_balance. But we still have a
call to update_group_power by a remote cpu when nohz_idle_balance is
called.

> So I'm not opposed to this patch, I'd just like a little extra
> clarification.

Vincent

Suresh Siddha Dec. 16, 2011, 12:58 a.m. UTC | #3

On Thu, 2011-12-15 at 05:36 -0800, Vincent Guittot wrote:
> I'm using cyclictest to easily reproduce the problem on my dual cortex-A9

So does the cyclictest itself exhibit the problem or running cyclictest
with another workload showed the problem? In other words, what numbers
of the workload did you see change with this patch?

> 
> > Then again, its probably easier to keep update_group_power on this_cpu
> > than to allow a remote update of your cpu_power.
> >
> 
> This additional path for updating the cpu_power will only be used by
> this_cpu because it is called by idle_balance. But we still have a
> call to update_group_power by a remote cpu when nohz_idle_balance is
> called.

As Vincent mentioned, the current mainline kernel already updates the
remote cpu's group_power in the nohz idle load balancing patch.

Also with all the recent nohz idle load balancing using kick, on a
dual-core system there may not be any nohz idle load balancing if
multiple tasks wakeup, run for short time and go back to idle before the
next tick. We rely on the wakeup balance to get it right in this case.

thanks,
suresh

Vincent Guittot Dec. 16, 2011, 8:23 a.m. UTC | #4

On 16 December 2011 01:58, Suresh Siddha <suresh.b.siddha@intel.com> wrote:
> On Thu, 2011-12-15 at 05:36 -0800, Vincent Guittot wrote:
>> I'm using cyclictest to easily reproduce the problem on my dual cortex-A9
>
> So does the cyclictest itself exhibit the problem or running cyclictest
> with another workload showed the problem? In other words, what numbers
> of the workload did you see change with this patch?
>

Using a cyclictest -q -t 5 -D 4 on my dual cortex-A9 shows the fact
that the softirqs timer and sched are not called very often and the
cpu_power is nearly never updated.

I have also used the following sequence :

cyclictest -q -t 5 -D 4 &
sleep 2
cyclictest -q -t 3 --affinity=0 -p 99 -D 2

The cpu_power of cpu0 should start to decrease when the rt threads are
started. Without the patch, we must wait for the next sched softirq
for starting to update the cpu_power and we have no guarantee of the
maximum interval. With the patch, the cpu_power is updated regularly
using the balance_interval value.

Vincent

>>
>> > Then again, its probably easier to keep update_group_power on this_cpu
>> > than to allow a remote update of your cpu_power.
>> >
>>
>> This additional path for updating the cpu_power will only be used by
>> this_cpu because it is called by idle_balance. But we still have a
>> call to update_group_power by a remote cpu when nohz_idle_balance is
>> called.
>
> As Vincent mentioned, the current mainline kernel already updates the
> remote cpu's group_power in the nohz idle load balancing patch.
>
> Also with all the recent nohz idle load balancing using kick, on a
> dual-core system there may not be any nohz idle load balancing if
> multiple tasks wakeup, run for short time and go back to idle before the
> next tick. We rely on the wakeup balance to get it right in this case.
>
> thanks,
> suresh
>

Vincent Guittot Jan. 4, 2012, 8:22 a.m. UTC | #5

Hi Peter,

Does it sound good for you if I update the comment of this patch with
the explanation of the previous mails or do you need more information
?

Thanks,
Vincent

On 16 December 2011 09:23, Vincent Guittot <vincent.guittot@linaro.org> wrote:
> On 16 December 2011 01:58, Suresh Siddha <suresh.b.siddha@intel.com> wrote:
>> On Thu, 2011-12-15 at 05:36 -0800, Vincent Guittot wrote:
>>> I'm using cyclictest to easily reproduce the problem on my dual cortex-A9
>>
>> So does the cyclictest itself exhibit the problem or running cyclictest
>> with another workload showed the problem? In other words, what numbers
>> of the workload did you see change with this patch?
>>
>
> Using a cyclictest -q -t 5 -D 4 on my dual cortex-A9 shows the fact
> that the softirqs timer and sched are not called very often and the
> cpu_power is nearly never updated.
>
> I have also used the following sequence :
>
> cyclictest -q -t 5 -D 4 &
> sleep 2
> cyclictest -q -t 3 --affinity=0 -p 99 -D 2
>
> The cpu_power of cpu0 should start to decrease when the rt threads are
> started. Without the patch, we must wait for the next sched softirq
> for starting to update the cpu_power and we have no guarantee of the
> maximum interval. With the patch, the cpu_power is updated regularly
> using the balance_interval value.
>
> Vincent
>
>>>
>>> > Then again, its probably easier to keep update_group_power on this_cpu
>>> > than to allow a remote update of your cpu_power.
>>> >
>>>
>>> This additional path for updating the cpu_power will only be used by
>>> this_cpu because it is called by idle_balance. But we still have a
>>> call to update_group_power by a remote cpu when nohz_idle_balance is
>>> called.
>>
>> As Vincent mentioned, the current mainline kernel already updates the
>> remote cpu's group_power in the nohz idle load balancing patch.
>>
>> Also with all the recent nohz idle load balancing using kick, on a
>> dual-core system there may not be any nohz idle load balancing if
>> multiple tasks wakeup, run for short time and go back to idle before the
>> next tick. We rely on the wakeup balance to get it right in this case.
>>
>> thanks,
>> suresh
>>

Peter Zijlstra Jan. 6, 2012, 12:10 p.m. UTC | #6

On Wed, 2012-01-04 at 09:22 +0100, Vincent Guittot wrote:
> Does it sound good for you if I update the comment of this patch with
> the explanation of the previous mails or do you need more information
> ?
> 
No, looks ok. Just got stuck in the mailbox + extra holidays delay.

Got it queued now, Thanks!

[RFC] sched: Ensure cpu_power periodic update

Commit Message

Comments

Patch