diff mbox

[06/16] sched: Disable WAKE_AFFINE for asymmetric configurations

Message ID 1464001138-25063-7-git-send-email-morten.rasmussen@arm.com
State New
Headers show

Commit Message

Morten Rasmussen May 23, 2016, 10:58 a.m. UTC
If the system has cpu of different compute capacities (e.g. big.LITTLE)
let affine wakeups be constrained to cpus of the same type.

cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>

Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>

---
 kernel/sched/core.c | 3 +++
 1 file changed, 3 insertions(+)

-- 
1.9.1

Comments

Vincent Guittot May 24, 2016, 9:10 a.m. UTC | #1
On 23 May 2016 at 12:58, Morten Rasmussen <morten.rasmussen@arm.com> wrote:
> If the system has cpu of different compute capacities (e.g. big.LITTLE)

> let affine wakeups be constrained to cpus of the same type.


Can you explain why you don't want wake affine with cpus with
different compute capacity ?

>

> cc: Ingo Molnar <mingo@redhat.com>

> cc: Peter Zijlstra <peterz@infradead.org>

>

> Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>

> ---

>  kernel/sched/core.c | 3 +++

>  1 file changed, 3 insertions(+)

>

> diff --git a/kernel/sched/core.c b/kernel/sched/core.c

> index d9619a3..558ec4a 100644

> --- a/kernel/sched/core.c

> +++ b/kernel/sched/core.c

> @@ -6410,6 +6410,9 @@ sd_init(struct sched_domain_topology_level *tl, int cpu)

>                 sd->idle_idx = 1;

>         }

>

> +       if (sd->flags & SD_ASYM_CPUCAPACITY)

> +               sd->flags &= ~SD_WAKE_AFFINE;

> +

>         sd->private = &tl->data;

>

>         return sd;

> --

> 1.9.1

>
Morten Rasmussen May 24, 2016, 10:29 a.m. UTC | #2
On Tue, May 24, 2016 at 11:10:28AM +0200, Vincent Guittot wrote:
> On 23 May 2016 at 12:58, Morten Rasmussen <morten.rasmussen@arm.com> wrote:

> > If the system has cpu of different compute capacities (e.g. big.LITTLE)

> > let affine wakeups be constrained to cpus of the same type.

> 

> Can you explain why you don't want wake affine with cpus with

> different compute capacity ?


I should have made the overall idea a bit more clear. The idea is to
deal with cross-capacity migrations in the find_idlest_{group, cpu}{}
path so we don't have to touch select_idle_sibling().
select_idle_sibling() is critical for wake-up latency, and I'm assumed
that people wouldn't like adding extra overhead in there to deal with
capacity and utilization.

So the overall idea is that symmetric capacity systems, everything
should work as normal. For asymmetric capacity systems, we restrict
select_idle_sibling() to only look among same-capacity cpus and then use
wake_cap() to use find_idlest_{group, cpu}() to look wider if we think
should look for cpu with higher capacity than the previous one. So, for
assymmetric cpus we take one of the two routes depending on whether a
cpu of the same capacity as the previous one is okay.

Do that make any sense?

> 

> >

> > cc: Ingo Molnar <mingo@redhat.com>

> > cc: Peter Zijlstra <peterz@infradead.org>

> >

> > Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>

> > ---

> >  kernel/sched/core.c | 3 +++

> >  1 file changed, 3 insertions(+)

> >

> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c

> > index d9619a3..558ec4a 100644

> > --- a/kernel/sched/core.c

> > +++ b/kernel/sched/core.c

> > @@ -6410,6 +6410,9 @@ sd_init(struct sched_domain_topology_level *tl, int cpu)

> >                 sd->idle_idx = 1;

> >         }

> >

> > +       if (sd->flags & SD_ASYM_CPUCAPACITY)

> > +               sd->flags &= ~SD_WAKE_AFFINE;

> > +

> >         sd->private = &tl->data;

> >

> >         return sd;

> > --

> > 1.9.1

> >
Vincent Guittot May 24, 2016, 12:12 p.m. UTC | #3
On 24 May 2016 at 12:29, Morten Rasmussen <morten.rasmussen@arm.com> wrote:
> On Tue, May 24, 2016 at 11:10:28AM +0200, Vincent Guittot wrote:

>> On 23 May 2016 at 12:58, Morten Rasmussen <morten.rasmussen@arm.com> wrote:

>> > If the system has cpu of different compute capacities (e.g. big.LITTLE)

>> > let affine wakeups be constrained to cpus of the same type.

>>

>> Can you explain why you don't want wake affine with cpus with

>> different compute capacity ?

>

> I should have made the overall idea a bit more clear. The idea is to

> deal with cross-capacity migrations in the find_idlest_{group, cpu}{}

> path so we don't have to touch select_idle_sibling().

> select_idle_sibling() is critical for wake-up latency, and I'm assumed

> that people wouldn't like adding extra overhead in there to deal with

> capacity and utilization.


So this means that we will never use the quick path of
select_idle_sibling for cross capacity migration but always the one
with extra overhead?
Patch 9 adds more tests for enabling wake_affine path. Can't it also
be used for cross capacity migration ? so we can use wake_affine if
the task or the cpus (even with different capacity) doesn't need this
extra overhead

>

> So the overall idea is that symmetric capacity systems, everything

> should work as normal. For asymmetric capacity systems, we restrict

> select_idle_sibling() to only look among same-capacity cpus and then use

> wake_cap() to use find_idlest_{group, cpu}() to look wider if we think

> should look for cpu with higher capacity than the previous one. So, for

> assymmetric cpus we take one of the two routes depending on whether a

> cpu of the same capacity as the previous one is okay.

>

> Do that make any sense?

>

>>

>> >

>> > cc: Ingo Molnar <mingo@redhat.com>

>> > cc: Peter Zijlstra <peterz@infradead.org>

>> >

>> > Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>

>> > ---

>> >  kernel/sched/core.c | 3 +++

>> >  1 file changed, 3 insertions(+)

>> >

>> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c

>> > index d9619a3..558ec4a 100644

>> > --- a/kernel/sched/core.c

>> > +++ b/kernel/sched/core.c

>> > @@ -6410,6 +6410,9 @@ sd_init(struct sched_domain_topology_level *tl, int cpu)

>> >                 sd->idle_idx = 1;

>> >         }

>> >

>> > +       if (sd->flags & SD_ASYM_CPUCAPACITY)

>> > +               sd->flags &= ~SD_WAKE_AFFINE;

>> > +

>> >         sd->private = &tl->data;

>> >

>> >         return sd;

>> > --

>> > 1.9.1

>> >
Morten Rasmussen May 24, 2016, 1:16 p.m. UTC | #4
On Tue, May 24, 2016 at 02:12:38PM +0200, Vincent Guittot wrote:
> On 24 May 2016 at 12:29, Morten Rasmussen <morten.rasmussen@arm.com> wrote:

> > On Tue, May 24, 2016 at 11:10:28AM +0200, Vincent Guittot wrote:

> >> On 23 May 2016 at 12:58, Morten Rasmussen <morten.rasmussen@arm.com> wrote:

> >> > If the system has cpu of different compute capacities (e.g. big.LITTLE)

> >> > let affine wakeups be constrained to cpus of the same type.

> >>

> >> Can you explain why you don't want wake affine with cpus with

> >> different compute capacity ?

> >

> > I should have made the overall idea a bit more clear. The idea is to

> > deal with cross-capacity migrations in the find_idlest_{group, cpu}{}

> > path so we don't have to touch select_idle_sibling().

> > select_idle_sibling() is critical for wake-up latency, and I'm assumed

> > that people wouldn't like adding extra overhead in there to deal with

> > capacity and utilization.

> 

> So this means that we will never use the quick path of

> select_idle_sibling for cross capacity migration but always the one

> with extra overhead?


Yes. select_idle_sibling() is only used to choose among equal capacity
cpus (capacity_orig).

> Patch 9 adds more tests for enabling wake_affine path. Can't it also

> be used for cross capacity migration ? so we can use wake_affine if

> the task or the cpus (even with different capacity) doesn't need this

> extra overhead


The test in patch 9 is to determine whether we are happy with the
capacity of the previous cpu, or we should go look for one with more
capacity. I don't see how we can use select_idle_sibling() unmodified
for sched domains containing cpus of different capacity to select an
appropriate cpu. It is just picking an idle cpu, it might have high
capacity or low, it wouldn't care.

How would you avoid the overhead of checking capacity and utilization of
the cpus and still pick an appropriate cpu?
Vincent Guittot May 24, 2016, 1:27 p.m. UTC | #5
On 24 May 2016 at 15:16, Morten Rasmussen <morten.rasmussen@arm.com> wrote:
> On Tue, May 24, 2016 at 02:12:38PM +0200, Vincent Guittot wrote:

>> On 24 May 2016 at 12:29, Morten Rasmussen <morten.rasmussen@arm.com> wrote:

>> > On Tue, May 24, 2016 at 11:10:28AM +0200, Vincent Guittot wrote:

>> >> On 23 May 2016 at 12:58, Morten Rasmussen <morten.rasmussen@arm.com> wrote:

>> >> > If the system has cpu of different compute capacities (e.g. big.LITTLE)

>> >> > let affine wakeups be constrained to cpus of the same type.

>> >>

>> >> Can you explain why you don't want wake affine with cpus with

>> >> different compute capacity ?

>> >

>> > I should have made the overall idea a bit more clear. The idea is to

>> > deal with cross-capacity migrations in the find_idlest_{group, cpu}{}

>> > path so we don't have to touch select_idle_sibling().

>> > select_idle_sibling() is critical for wake-up latency, and I'm assumed

>> > that people wouldn't like adding extra overhead in there to deal with

>> > capacity and utilization.

>>

>> So this means that we will never use the quick path of

>> select_idle_sibling for cross capacity migration but always the one

>> with extra overhead?

>

> Yes. select_idle_sibling() is only used to choose among equal capacity

> cpus (capacity_orig).

>

>> Patch 9 adds more tests for enabling wake_affine path. Can't it also

>> be used for cross capacity migration ? so we can use wake_affine if

>> the task or the cpus (even with different capacity) doesn't need this

>> extra overhead

>

> The test in patch 9 is to determine whether we are happy with the

> capacity of the previous cpu, or we should go look for one with more

> capacity. I don't see how we can use select_idle_sibling() unmodified

> for sched domains containing cpus of different capacity to select an

> appropriate cpu. It is just picking an idle cpu, it might have high

> capacity or low, it wouldn't care.

>

> How would you avoid the overhead of checking capacity and utilization of

> the cpus and still pick an appropriate cpu?


My point is that there is some wake up case where we don't care about
the capacity and utilization of cpus even for cross capacity migration
and we will never take benefit of this fast path.
You have added an extra check for setting want_affine in patch 9 which
uses capacity and utilization of cpu to disable this fast path when a
task needs more capacity than available. Can't you use this function
to disable the want_affine for cross-capacity migration situation that
cares of the capacity and need the full scan of sched_domain but keep
it enable for other cases ?
Morten Rasmussen May 24, 2016, 1:36 p.m. UTC | #6
On Tue, May 24, 2016 at 03:27:05PM +0200, Vincent Guittot wrote:
> On 24 May 2016 at 15:16, Morten Rasmussen <morten.rasmussen@arm.com> wrote:

> > On Tue, May 24, 2016 at 02:12:38PM +0200, Vincent Guittot wrote:

> >> On 24 May 2016 at 12:29, Morten Rasmussen <morten.rasmussen@arm.com> wrote:

> >> > On Tue, May 24, 2016 at 11:10:28AM +0200, Vincent Guittot wrote:

> >> >> On 23 May 2016 at 12:58, Morten Rasmussen <morten.rasmussen@arm.com> wrote:

> >> >> > If the system has cpu of different compute capacities (e.g. big.LITTLE)

> >> >> > let affine wakeups be constrained to cpus of the same type.

> >> >>

> >> >> Can you explain why you don't want wake affine with cpus with

> >> >> different compute capacity ?

> >> >

> >> > I should have made the overall idea a bit more clear. The idea is to

> >> > deal with cross-capacity migrations in the find_idlest_{group, cpu}{}

> >> > path so we don't have to touch select_idle_sibling().

> >> > select_idle_sibling() is critical for wake-up latency, and I'm assumed

> >> > that people wouldn't like adding extra overhead in there to deal with

> >> > capacity and utilization.

> >>

> >> So this means that we will never use the quick path of

> >> select_idle_sibling for cross capacity migration but always the one

> >> with extra overhead?

> >

> > Yes. select_idle_sibling() is only used to choose among equal capacity

> > cpus (capacity_orig).

> >

> >> Patch 9 adds more tests for enabling wake_affine path. Can't it also

> >> be used for cross capacity migration ? so we can use wake_affine if

> >> the task or the cpus (even with different capacity) doesn't need this

> >> extra overhead

> >

> > The test in patch 9 is to determine whether we are happy with the

> > capacity of the previous cpu, or we should go look for one with more

> > capacity. I don't see how we can use select_idle_sibling() unmodified

> > for sched domains containing cpus of different capacity to select an

> > appropriate cpu. It is just picking an idle cpu, it might have high

> > capacity or low, it wouldn't care.

> >

> > How would you avoid the overhead of checking capacity and utilization of

> > the cpus and still pick an appropriate cpu?

> 

> My point is that there is some wake up case where we don't care about

> the capacity and utilization of cpus even for cross capacity migration

> and we will never take benefit of this fast path.

> You have added an extra check for setting want_affine in patch 9 which

> uses capacity and utilization of cpu to disable this fast path when a

> task needs more capacity than available. Can't you use this function

> to disable the want_affine for cross-capacity migration situation that

> cares of the capacity and need the full scan of sched_domain but keep

> it enable for other cases ?


It is not clear to me what the other cases are. What kind of cases do
you have in mind?
Vincent Guittot May 24, 2016, 1:52 p.m. UTC | #7
On 24 May 2016 at 15:36, Morten Rasmussen <morten.rasmussen@arm.com> wrote:
> On Tue, May 24, 2016 at 03:27:05PM +0200, Vincent Guittot wrote:

>> On 24 May 2016 at 15:16, Morten Rasmussen <morten.rasmussen@arm.com> wrote:

>> > On Tue, May 24, 2016 at 02:12:38PM +0200, Vincent Guittot wrote:

>> >> On 24 May 2016 at 12:29, Morten Rasmussen <morten.rasmussen@arm.com> wrote:

>> >> > On Tue, May 24, 2016 at 11:10:28AM +0200, Vincent Guittot wrote:

>> >> >> On 23 May 2016 at 12:58, Morten Rasmussen <morten.rasmussen@arm.com> wrote:

>> >> >> > If the system has cpu of different compute capacities (e.g. big.LITTLE)

>> >> >> > let affine wakeups be constrained to cpus of the same type.

>> >> >>

>> >> >> Can you explain why you don't want wake affine with cpus with

>> >> >> different compute capacity ?

>> >> >

>> >> > I should have made the overall idea a bit more clear. The idea is to

>> >> > deal with cross-capacity migrations in the find_idlest_{group, cpu}{}

>> >> > path so we don't have to touch select_idle_sibling().

>> >> > select_idle_sibling() is critical for wake-up latency, and I'm assumed

>> >> > that people wouldn't like adding extra overhead in there to deal with

>> >> > capacity and utilization.

>> >>

>> >> So this means that we will never use the quick path of

>> >> select_idle_sibling for cross capacity migration but always the one

>> >> with extra overhead?

>> >

>> > Yes. select_idle_sibling() is only used to choose among equal capacity

>> > cpus (capacity_orig).

>> >

>> >> Patch 9 adds more tests for enabling wake_affine path. Can't it also

>> >> be used for cross capacity migration ? so we can use wake_affine if

>> >> the task or the cpus (even with different capacity) doesn't need this

>> >> extra overhead

>> >

>> > The test in patch 9 is to determine whether we are happy with the

>> > capacity of the previous cpu, or we should go look for one with more

>> > capacity. I don't see how we can use select_idle_sibling() unmodified

>> > for sched domains containing cpus of different capacity to select an

>> > appropriate cpu. It is just picking an idle cpu, it might have high

>> > capacity or low, it wouldn't care.

>> >

>> > How would you avoid the overhead of checking capacity and utilization of

>> > the cpus and still pick an appropriate cpu?

>>

>> My point is that there is some wake up case where we don't care about

>> the capacity and utilization of cpus even for cross capacity migration

>> and we will never take benefit of this fast path.

>> You have added an extra check for setting want_affine in patch 9 which

>> uses capacity and utilization of cpu to disable this fast path when a

>> task needs more capacity than available. Can't you use this function

>> to disable the want_affine for cross-capacity migration situation that

>> cares of the capacity and need the full scan of sched_domain but keep

>> it enable for other cases ?

>

> It is not clear to me what the other cases are. What kind of cases do

> you have in mind?


As an example, you have a task A that have to be on a big CPU because
of the requirement of compute capacity, that wakes up a task B that
can run on any cpu according to its utilization. The fast wake up path
is fine for task B whatever prev cpu is.
Morten Rasmussen May 24, 2016, 3:02 p.m. UTC | #8
On Tue, May 24, 2016 at 03:52:00PM +0200, Vincent Guittot wrote:
> On 24 May 2016 at 15:36, Morten Rasmussen <morten.rasmussen@arm.com> wrote:

> > On Tue, May 24, 2016 at 03:27:05PM +0200, Vincent Guittot wrote:

> >> On 24 May 2016 at 15:16, Morten Rasmussen <morten.rasmussen@arm.com> wrote:

> >> > On Tue, May 24, 2016 at 02:12:38PM +0200, Vincent Guittot wrote:

> >> >> On 24 May 2016 at 12:29, Morten Rasmussen <morten.rasmussen@arm.com> wrote:

> >> >> > On Tue, May 24, 2016 at 11:10:28AM +0200, Vincent Guittot wrote:

> >> >> >> On 23 May 2016 at 12:58, Morten Rasmussen <morten.rasmussen@arm.com> wrote:

> >> >> >> > If the system has cpu of different compute capacities (e.g. big.LITTLE)

> >> >> >> > let affine wakeups be constrained to cpus of the same type.

> >> >> >>

> >> >> >> Can you explain why you don't want wake affine with cpus with

> >> >> >> different compute capacity ?

> >> >> >

> >> >> > I should have made the overall idea a bit more clear. The idea is to

> >> >> > deal with cross-capacity migrations in the find_idlest_{group, cpu}{}

> >> >> > path so we don't have to touch select_idle_sibling().

> >> >> > select_idle_sibling() is critical for wake-up latency, and I'm assumed

> >> >> > that people wouldn't like adding extra overhead in there to deal with

> >> >> > capacity and utilization.

> >> >>

> >> >> So this means that we will never use the quick path of

> >> >> select_idle_sibling for cross capacity migration but always the one

> >> >> with extra overhead?

> >> >

> >> > Yes. select_idle_sibling() is only used to choose among equal capacity

> >> > cpus (capacity_orig).

> >> >

> >> >> Patch 9 adds more tests for enabling wake_affine path. Can't it also

> >> >> be used for cross capacity migration ? so we can use wake_affine if

> >> >> the task or the cpus (even with different capacity) doesn't need this

> >> >> extra overhead

> >> >

> >> > The test in patch 9 is to determine whether we are happy with the

> >> > capacity of the previous cpu, or we should go look for one with more

> >> > capacity. I don't see how we can use select_idle_sibling() unmodified

> >> > for sched domains containing cpus of different capacity to select an

> >> > appropriate cpu. It is just picking an idle cpu, it might have high

> >> > capacity or low, it wouldn't care.

> >> >

> >> > How would you avoid the overhead of checking capacity and utilization of

> >> > the cpus and still pick an appropriate cpu?

> >>

> >> My point is that there is some wake up case where we don't care about

> >> the capacity and utilization of cpus even for cross capacity migration

> >> and we will never take benefit of this fast path.

> >> You have added an extra check for setting want_affine in patch 9 which

> >> uses capacity and utilization of cpu to disable this fast path when a

> >> task needs more capacity than available. Can't you use this function

> >> to disable the want_affine for cross-capacity migration situation that

> >> cares of the capacity and need the full scan of sched_domain but keep

> >> it enable for other cases ?

> >

> > It is not clear to me what the other cases are. What kind of cases do

> > you have in mind?

> 

> As an example, you have a task A that have to be on a big CPU because

> of the requirement of compute capacity, that wakes up a task B that

> can run on any cpu according to its utilization. The fast wake up path

> is fine for task B whatever prev cpu is.


In that case, we will take always take fast path (select_idle_sibling())
for task B if wake_wide() allows it, which should be fine.

wake_cap() will return true as the B's prev_cpu is either a big cpu
(first criteria) or have sufficient capacity for B (second criteria).
Given that wake_wide() allows returns false as well and there are no
restrictions, want_affine will be true. Depending on where wake_affine()
sends us, we will use select_idle_sibling() to search around B's
prev_cpu or this cpu (where task A is running).

We avoid the overhead of looking for cpu capacity and utilization, but
we have restricted the search space for select_idle_sibling(). In case
B's prev_cpu is a little cpu, the choice whether we looks for little or
big capacity cpus depends on the wake_affine()'s decision. So the search
space isn't as wide as it could be.

To expand the search space we would have be able to adjust the
sched_domain level at which select_idle_sibling() is operating, so we
can look at same-capacity cpus only in the fast path for tasks like A,
and look at all cpus for tasks like B. It could possibly be done, if we
dare touching select_idle_sibling() ;-) I still have to look at those
patches PeterZ posted a while back.

TLDR; The fast path should already be used for task B, but the cpu
search space is restricted to a specific subset of cpus selected by
wake_affine() which isn't ideal, but much less invasive in terms of code
changes.
Vincent Guittot May 24, 2016, 3:53 p.m. UTC | #9
On 24 May 2016 at 17:02, Morten Rasmussen <morten.rasmussen@arm.com> wrote:
> On Tue, May 24, 2016 at 03:52:00PM +0200, Vincent Guittot wrote:

>> On 24 May 2016 at 15:36, Morten Rasmussen <morten.rasmussen@arm.com> wrote:

>> > On Tue, May 24, 2016 at 03:27:05PM +0200, Vincent Guittot wrote:

>> >> On 24 May 2016 at 15:16, Morten Rasmussen <morten.rasmussen@arm.com> wrote:

>> >> > On Tue, May 24, 2016 at 02:12:38PM +0200, Vincent Guittot wrote:

>> >> >> On 24 May 2016 at 12:29, Morten Rasmussen <morten.rasmussen@arm.com> wrote:

>> >> >> > On Tue, May 24, 2016 at 11:10:28AM +0200, Vincent Guittot wrote:

>> >> >> >> On 23 May 2016 at 12:58, Morten Rasmussen <morten.rasmussen@arm.com> wrote:

>> >> >> >> > If the system has cpu of different compute capacities (e.g. big.LITTLE)

>> >> >> >> > let affine wakeups be constrained to cpus of the same type.

>> >> >> >>

>> >> >> >> Can you explain why you don't want wake affine with cpus with

>> >> >> >> different compute capacity ?

>> >> >> >

>> >> >> > I should have made the overall idea a bit more clear. The idea is to

>> >> >> > deal with cross-capacity migrations in the find_idlest_{group, cpu}{}

>> >> >> > path so we don't have to touch select_idle_sibling().

>> >> >> > select_idle_sibling() is critical for wake-up latency, and I'm assumed

>> >> >> > that people wouldn't like adding extra overhead in there to deal with

>> >> >> > capacity and utilization.

>> >> >>

>> >> >> So this means that we will never use the quick path of

>> >> >> select_idle_sibling for cross capacity migration but always the one

>> >> >> with extra overhead?

>> >> >

>> >> > Yes. select_idle_sibling() is only used to choose among equal capacity

>> >> > cpus (capacity_orig).

>> >> >

>> >> >> Patch 9 adds more tests for enabling wake_affine path. Can't it also

>> >> >> be used for cross capacity migration ? so we can use wake_affine if

>> >> >> the task or the cpus (even with different capacity) doesn't need this

>> >> >> extra overhead

>> >> >

>> >> > The test in patch 9 is to determine whether we are happy with the

>> >> > capacity of the previous cpu, or we should go look for one with more

>> >> > capacity. I don't see how we can use select_idle_sibling() unmodified

>> >> > for sched domains containing cpus of different capacity to select an

>> >> > appropriate cpu. It is just picking an idle cpu, it might have high

>> >> > capacity or low, it wouldn't care.

>> >> >

>> >> > How would you avoid the overhead of checking capacity and utilization of

>> >> > the cpus and still pick an appropriate cpu?

>> >>

>> >> My point is that there is some wake up case where we don't care about

>> >> the capacity and utilization of cpus even for cross capacity migration

>> >> and we will never take benefit of this fast path.

>> >> You have added an extra check for setting want_affine in patch 9 which

>> >> uses capacity and utilization of cpu to disable this fast path when a

>> >> task needs more capacity than available. Can't you use this function

>> >> to disable the want_affine for cross-capacity migration situation that

>> >> cares of the capacity and need the full scan of sched_domain but keep

>> >> it enable for other cases ?

>> >

>> > It is not clear to me what the other cases are. What kind of cases do

>> > you have in mind?

>>

>> As an example, you have a task A that have to be on a big CPU because

>> of the requirement of compute capacity, that wakes up a task B that

>> can run on any cpu according to its utilization. The fast wake up path

>> is fine for task B whatever prev cpu is.

>

> In that case, we will take always take fast path (select_idle_sibling())

> for task B if wake_wide() allows it, which should be fine.


Even if want_affine is set, the wake up of task B will not use the fast path.
The affine_sd will not be set because the sched_domain, which have
both cpus, will not have the SD_WAKE_AFFINE flag according to this
patch, isn't it ?
So task B can't use the fast path whereas nothing prevent him to take
benefit of it

Am I missing something ?

>

> wake_cap() will return true as the B's prev_cpu is either a big cpu

> (first criteria) or have sufficient capacity for B (second criteria).

> Given that wake_wide() allows returns false as well and there are no

> restrictions, want_affine will be true. Depending on where wake_affine()

> sends us, we will use select_idle_sibling() to search around B's

> prev_cpu or this cpu (where task A is running).

>

> We avoid the overhead of looking for cpu capacity and utilization, but

> we have restricted the search space for select_idle_sibling(). In case

> B's prev_cpu is a little cpu, the choice whether we looks for little or

> big capacity cpus depends on the wake_affine()'s decision. So the search

> space isn't as wide as it could be.

>

> To expand the search space we would have be able to adjust the

> sched_domain level at which select_idle_sibling() is operating, so we

> can look at same-capacity cpus only in the fast path for tasks like A,

> and look at all cpus for tasks like B. It could possibly be done, if we

> dare touching select_idle_sibling() ;-) I still have to look at those

> patches PeterZ posted a while back.

>

> TLDR; The fast path should already be used for task B, but the cpu

> search space is restricted to a specific subset of cpus selected by

> wake_affine() which isn't ideal, but much less invasive in terms of code

> changes.
Morten Rasmussen May 25, 2016, 9:12 a.m. UTC | #10
On Tue, May 24, 2016 at 05:53:27PM +0200, Vincent Guittot wrote:
> On 24 May 2016 at 17:02, Morten Rasmussen <morten.rasmussen@arm.com> wrote:

> > On Tue, May 24, 2016 at 03:52:00PM +0200, Vincent Guittot wrote:

> >> On 24 May 2016 at 15:36, Morten Rasmussen <morten.rasmussen@arm.com> wrote:

> >> > On Tue, May 24, 2016 at 03:27:05PM +0200, Vincent Guittot wrote:

> >> >> On 24 May 2016 at 15:16, Morten Rasmussen <morten.rasmussen@arm.com> wrote:

> >> >> > On Tue, May 24, 2016 at 02:12:38PM +0200, Vincent Guittot wrote:

> >> >> >> On 24 May 2016 at 12:29, Morten Rasmussen <morten.rasmussen@arm.com> wrote:

> >> >> >> > On Tue, May 24, 2016 at 11:10:28AM +0200, Vincent Guittot wrote:

> >> >> >> >> On 23 May 2016 at 12:58, Morten Rasmussen <morten.rasmussen@arm.com> wrote:

> >> >> >> >> > If the system has cpu of different compute capacities (e.g. big.LITTLE)

> >> >> >> >> > let affine wakeups be constrained to cpus of the same type.

> >> >> >> >>

> >> >> >> >> Can you explain why you don't want wake affine with cpus with

> >> >> >> >> different compute capacity ?

> >> >> >> >

> >> >> >> > I should have made the overall idea a bit more clear. The idea is to

> >> >> >> > deal with cross-capacity migrations in the find_idlest_{group, cpu}{}

> >> >> >> > path so we don't have to touch select_idle_sibling().

> >> >> >> > select_idle_sibling() is critical for wake-up latency, and I'm assumed

> >> >> >> > that people wouldn't like adding extra overhead in there to deal with

> >> >> >> > capacity and utilization.

> >> >> >>

> >> >> >> So this means that we will never use the quick path of

> >> >> >> select_idle_sibling for cross capacity migration but always the one

> >> >> >> with extra overhead?

> >> >> >

> >> >> > Yes. select_idle_sibling() is only used to choose among equal capacity

> >> >> > cpus (capacity_orig).

> >> >> >

> >> >> >> Patch 9 adds more tests for enabling wake_affine path. Can't it also

> >> >> >> be used for cross capacity migration ? so we can use wake_affine if

> >> >> >> the task or the cpus (even with different capacity) doesn't need this

> >> >> >> extra overhead

> >> >> >

> >> >> > The test in patch 9 is to determine whether we are happy with the

> >> >> > capacity of the previous cpu, or we should go look for one with more

> >> >> > capacity. I don't see how we can use select_idle_sibling() unmodified

> >> >> > for sched domains containing cpus of different capacity to select an

> >> >> > appropriate cpu. It is just picking an idle cpu, it might have high

> >> >> > capacity or low, it wouldn't care.

> >> >> >

> >> >> > How would you avoid the overhead of checking capacity and utilization of

> >> >> > the cpus and still pick an appropriate cpu?

> >> >>

> >> >> My point is that there is some wake up case where we don't care about

> >> >> the capacity and utilization of cpus even for cross capacity migration

> >> >> and we will never take benefit of this fast path.

> >> >> You have added an extra check for setting want_affine in patch 9 which

> >> >> uses capacity and utilization of cpu to disable this fast path when a

> >> >> task needs more capacity than available. Can't you use this function

> >> >> to disable the want_affine for cross-capacity migration situation that

> >> >> cares of the capacity and need the full scan of sched_domain but keep

> >> >> it enable for other cases ?

> >> >

> >> > It is not clear to me what the other cases are. What kind of cases do

> >> > you have in mind?

> >>

> >> As an example, you have a task A that have to be on a big CPU because

> >> of the requirement of compute capacity, that wakes up a task B that

> >> can run on any cpu according to its utilization. The fast wake up path

> >> is fine for task B whatever prev cpu is.

> >

> > In that case, we will take always take fast path (select_idle_sibling())

> > for task B if wake_wide() allows it, which should be fine.

> 

> Even if want_affine is set, the wake up of task B will not use the fast path.

> The affine_sd will not be set because the sched_domain, which have

> both cpus, will not have the SD_WAKE_AFFINE flag according to this

> patch, isn't it ?

> So task B can't use the fast path whereas nothing prevent him to take

> benefit of it

> 

> Am I missing something ?


No, I think you are right. Very good point. The cpumask test with
sched_domain_span() will of cause return false. So yes, in this case the
slow path is taken. It isn't wrong as such, just slower for asymmetric
capacity systems :-)

It is clearly not as optimized for asymmetric capacity systems as it
could be, but my focus was to not ruin existing behaviour and minimize
overhead for others. There are a lot of different routes through those
conditions in the first half of select_task_rq_fair() that aren't
obvious. I worry that some users depend on them and that I don't
see/understand all of them.

If people agree on changing things, it is fine with me. I just tried to
avoid getting the patches shot down on that account ;-)
Vincent Guittot May 26, 2016, 6:45 a.m. UTC | #11
On 25 May 2016 at 11:12, Morten Rasmussen <morten.rasmussen@arm.com> wrote:
> On Tue, May 24, 2016 at 05:53:27PM +0200, Vincent Guittot wrote:

>> On 24 May 2016 at 17:02, Morten Rasmussen <morten.rasmussen@arm.com> wrote:

>> > On Tue, May 24, 2016 at 03:52:00PM +0200, Vincent Guittot wrote:

>> >> On 24 May 2016 at 15:36, Morten Rasmussen <morten.rasmussen@arm.com> wrote:

>> >> > On Tue, May 24, 2016 at 03:27:05PM +0200, Vincent Guittot wrote:

>> >> >> On 24 May 2016 at 15:16, Morten Rasmussen <morten.rasmussen@arm.com> wrote:

>> >> >> > On Tue, May 24, 2016 at 02:12:38PM +0200, Vincent Guittot wrote:

>> >> >> >> On 24 May 2016 at 12:29, Morten Rasmussen <morten.rasmussen@arm.com> wrote:


[snip]

>> >> > It is not clear to me what the other cases are. What kind of cases do

>> >> > you have in mind?

>> >>

>> >> As an example, you have a task A that have to be on a big CPU because

>> >> of the requirement of compute capacity, that wakes up a task B that

>> >> can run on any cpu according to its utilization. The fast wake up path

>> >> is fine for task B whatever prev cpu is.

>> >

>> > In that case, we will take always take fast path (select_idle_sibling())

>> > for task B if wake_wide() allows it, which should be fine.

>>

>> Even if want_affine is set, the wake up of task B will not use the fast path.

>> The affine_sd will not be set because the sched_domain, which have

>> both cpus, will not have the SD_WAKE_AFFINE flag according to this

>> patch, isn't it ?

>> So task B can't use the fast path whereas nothing prevent him to take

>> benefit of it

>>

>> Am I missing something ?

>

> No, I think you are right. Very good point. The cpumask test with

> sched_domain_span() will of cause return false. So yes, in this case the

> slow path is taken. It isn't wrong as such, just slower for asymmetric

> capacity systems :-)

>


So, I still don't see why the function wake_cap that is introduce by
patch 9 can't be used for testing cross capacity migration at wake up
?
The only reason for which we would like to skip fast wake up path for
cross capacity migration, is whether the task needs more capacity than
the capacity of cpu or prev_cpu. You already do that for prev_cpu,
can't you add same test for cpu ?

> It is clearly not as optimized for asymmetric capacity systems as it

> could be, but my focus was to not ruin existing behaviour and minimize

> overhead for others. There are a lot of different routes through those

> conditions in the first half of select_task_rq_fair() that aren't

> obvious. I worry that some users depend on them and that I don't

> see/understand all of them.

>

> If people agree on changing things, it is fine with me. I just tried to

> avoid getting the patches shot down on that account ;-)
Morten Rasmussen June 7, 2016, 4:50 p.m. UTC | #12
On Thu, May 26, 2016 at 08:45:03AM +0200, Vincent Guittot wrote:
> On 25 May 2016 at 11:12, Morten Rasmussen <morten.rasmussen@arm.com> wrote:

> > On Tue, May 24, 2016 at 05:53:27PM +0200, Vincent Guittot wrote:

> >> On 24 May 2016 at 17:02, Morten Rasmussen <morten.rasmussen@arm.com> wrote:

> >> > On Tue, May 24, 2016 at 03:52:00PM +0200, Vincent Guittot wrote:

> >> >> On 24 May 2016 at 15:36, Morten Rasmussen <morten.rasmussen@arm.com> wrote:

> >> >> > On Tue, May 24, 2016 at 03:27:05PM +0200, Vincent Guittot wrote:

> >> >> >> On 24 May 2016 at 15:16, Morten Rasmussen <morten.rasmussen@arm.com> wrote:

> >> >> >> > On Tue, May 24, 2016 at 02:12:38PM +0200, Vincent Guittot wrote:

> >> >> >> >> On 24 May 2016 at 12:29, Morten Rasmussen <morten.rasmussen@arm.com> wrote:

> 

> [snip]

> 

> >> >> > It is not clear to me what the other cases are. What kind of cases do

> >> >> > you have in mind?

> >> >>

> >> >> As an example, you have a task A that have to be on a big CPU because

> >> >> of the requirement of compute capacity, that wakes up a task B that

> >> >> can run on any cpu according to its utilization. The fast wake up path

> >> >> is fine for task B whatever prev cpu is.

> >> >

> >> > In that case, we will take always take fast path (select_idle_sibling())

> >> > for task B if wake_wide() allows it, which should be fine.

> >>

> >> Even if want_affine is set, the wake up of task B will not use the fast path.

> >> The affine_sd will not be set because the sched_domain, which have

> >> both cpus, will not have the SD_WAKE_AFFINE flag according to this

> >> patch, isn't it ?

> >> So task B can't use the fast path whereas nothing prevent him to take

> >> benefit of it

> >>

> >> Am I missing something ?

> >

> > No, I think you are right. Very good point. The cpumask test with

> > sched_domain_span() will of cause return false. So yes, in this case the

> > slow path is taken. It isn't wrong as such, just slower for asymmetric

> > capacity systems :-)

> >

> 

> So, I still don't see why the function wake_cap that is introduce by

> patch 9 can't be used for testing cross capacity migration at wake up

> ?

> The only reason for which we would like to skip fast wake up path for

> cross capacity migration, is whether the task needs more capacity than

> the capacity of cpu or prev_cpu. You already do that for prev_cpu,

> can't you add same test for cpu ?


I think I finally see what you mean. Thanks for your patience :-)

It should be safe to let wake_affine be cross-capacity as long as we
only take the fast path when both prev_cpu and this_cpu have sufficient
capacity for the task. select_idle_sibling() can never end up looking at
cpus with different capacities than those of this_cpu and prev_cpu.

I will give that a try. I have a few ideas on how it can extended to use
the fast path in a few additional cases as well.
diff mbox

Patch

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d9619a3..558ec4a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6410,6 +6410,9 @@  sd_init(struct sched_domain_topology_level *tl, int cpu)
 		sd->idle_idx = 1;
 	}
 
+	if (sd->flags & SD_ASYM_CPUCAPACITY)
+		sd->flags &= ~SD_WAKE_AFFINE;
+
 	sd->private = &tl->data;
 
 	return sd;