
[v2,07/13] sched/fair: Let asymmetric cpu configurations balance at wake-up

Message ID 1466615004-3503-8-git-send-email-morten.rasmussen@arm.com
State Superseded

Commit Message

Morten Rasmussen June 22, 2016, 5:03 p.m. UTC
Currently, SD_WAKE_AFFINE always takes priority over wakeup balancing if
SD_BALANCE_WAKE is set on the sched_domains. For asymmetric
configurations SD_WAKE_AFFINE is only desirable if the waking task's
compute demand (utilization) is suitable for all the cpu capacities
available within the SD_WAKE_AFFINE sched_domain. If not, let wakeup
balancing take over (find_idlest_{group, cpu}()).

This patch makes affine wake-ups conditional on whether both the waker
cpu and prev_cpu have sufficient capacity for the waking task.

It is assumed that the sched_group(s) containing the waker cpu and
prev_cpu only contain cpus with the same capacity (homogeneous).

Ideally, we shouldn't set 'want_affine' in the first place, but we don't
know if SD_BALANCE_WAKE is enabled on the sched_domain(s) until we start
traversing them.

cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>

Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>

---
 kernel/sched/fair.c | 28 +++++++++++++++++++++++++++-
 1 file changed, 27 insertions(+), 1 deletion(-)

-- 
1.9.1

Comments

Morten Rasmussen July 11, 2016, 12:32 p.m. UTC | #1
On Mon, Jul 11, 2016 at 01:13:44PM +0200, Peter Zijlstra wrote:
> On Wed, Jun 22, 2016 at 06:03:18PM +0100, Morten Rasmussen wrote:
> > Currently, SD_WAKE_AFFINE always takes priority over wakeup balancing if
> > SD_BALANCE_WAKE is set on the sched_domains. For asymmetric
> > configurations SD_WAKE_AFFINE is only desirable if the waking task's
> > compute demand (utilization) is suitable for all the cpu capacities
> > available within the SD_WAKE_AFFINE sched_domain. If not, let wakeup
> > balancing take over (find_idlest_{group, cpu}()).
>
> I think I tripped over this one the last time around, and I'm not sure
> this Changelog is any clearer.
>
> This is about the case where the waking cpu and prev_cpu are both in the
> 'wrong' cluster, right?


Almost :-) It is the cases where the waking cpu _or_ prev_cpu (or both)
don't 'fit' the task (the 'wrong' cluster). select_idle_sibling() doesn't
consider capacity, so we can't let it choose between the waking cpu and
prev_cpu if one of them isn't a good choice.

Bringing in the table from the cover letter:

tu   = Task utilization big/little
pcpu = Previous cpu big/little
tcpu = This (waker) cpu big/little
dl   = New cpu is little
db   = New cpu is big
sis  = New cpu chosen by select_idle_sibling()
figc = New cpu chosen by find_idlest_*()
ww   = wake_wide(task) count for figc wakeups
bw   = sd_flag & SD_BALANCE_WAKE (non-fork/exec wake)
       for figc wakeups

case tu   pcpu tcpu   dl   db  sis figc   ww   bw
1    l    l    l     122   68   28  162  161  161
2    l    l    b      11    4    0   15   15   15
3    l    b    l       0  252    8  244  244  244
4    l    b    b      36 1928  711 1253 1016 1016
5    b    l    l       5   19    0   24   22   24
6    b    l    b       5    1    0    6    0    6
7    b    b    l       0   31    0   31   31   31
8    b    b    b       1  194  109   86   59   59
--------------------------------------------------
                     180 2497  856 1821

In cases 5, 6, and 7 prev_cpu or the waking cpu is little and the task
only fits on big. This is why we have zeros in the 'sis' column for those
three cases. In all other cases we don't care, from a capacity point of
view, whether select_idle_sibling() chooses prev_cpu or the waking cpu.
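
To map the table onto the code paths: the wake-up decision after this
patch boils down to roughly the following (a simplified sketch of
select_task_rq_fair(), not the exact kernel code):

	/* Fast path ('sis' column) vs. slow path ('figc' column): */
	want_affine = !wake_wide(p) && !wake_cap(p, cpu, prev_cpu) &&
		      cpumask_test_cpu(cpu, tsk_cpus_allowed(p));

	if (want_affine && affine_sd) {
		/* Both cpus have enough capacity: take the fast path. */
		new_cpu = select_idle_sibling(p, new_cpu);
	} else {
		/*
		 * Otherwise walk the domains with
		 * find_idlest_group()/find_idlest_cpu().
		 */
	}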

I will try to rewrite the commit message to make this clearer.

> > This patch makes affine wake-ups conditional on whether both the waker
> > cpu and prev_cpu have sufficient capacity for the waking task.
> >
> > It is assumed that the sched_group(s) containing the waker cpu and
> > prev_cpu only contain cpus with the same capacity (homogeneous).
> >
> > Ideally, we shouldn't set 'want_affine' in the first place, but we don't
> > know if SD_BALANCE_WAKE is enabled on the sched_domain(s) until we start
> > traversing them.
>
> Is this again more fallout from that weird ASYM_CAP thing?


I think this is a left-over comment from v1 that shouldn't have
survived into v2. The flag game was more complicated in v1, as it
required disabling SD_WAKE_AFFINE. That is all gone now thanks to
Vincent's suggestions in the v1 review.

> > +static int wake_cap(struct task_struct *p, int cpu, int prev_cpu)
> > +{
> > +	long min_cap, max_cap;
> > +
> > +	min_cap = min(capacity_orig_of(prev_cpu), capacity_orig_of(cpu));
> > +	max_cap = cpu_rq(cpu)->rd->max_cpu_capacity;
> > +
> > +	/* Minimum capacity is close to max, no need to abort wake_affine */
> > +	if (max_cap - min_cap < max_cap >> 3)
> > +		return 0;
> > +
> > +	return min_cap * 1024 < task_util(p) * capacity_margin;
> > +}
>
> I'm most puzzled by these inequalities, how, why ?
>
> I would figure you'd compare task_util to the current remaining util of
> the small group, and if it fits, place it there. This seems to do
> something entirely different.


Right, I only compare task utilization to max capacity here and
completely ignore whether any of those cpus are already fully or
partially utilized. Available (remaining) capacity is considered in
find_idlest_group() when that route is taken.

wake_cap() is meant as a first check to see if we should worry about cpu
types (max capacities) at all: if not, go to select_idle_sibling(); if we
should, go to find_idlest_{group,cpu}() and look more into the details.
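
To make the two inequalities concrete, here is a small worked example
(the capacity values are illustrative assumptions, not measured platform
numbers):

	/* Assume little capacity 430 and big capacity 1024: */
	long min_cap = 430;	/* min of prev_cpu and waker cpu capacity */
	long max_cap = 1024;	/* rd->max_cpu_capacity */

	/*
	 * First inequality: max_cap - min_cap = 594, max_cap >> 3 = 128.
	 * 594 < 128 is false, so the system is considered genuinely
	 * asymmetric and we fall through to the margin check.
	 *
	 * Second inequality, with task_util(p) = 400 and
	 * capacity_margin = 1280:
	 *   min_cap * 1024 = 440320
	 *   task_util(p) * capacity_margin = 512000
	 * 440320 < 512000 is true, so wake_cap() returns 1 and the
	 * affine fast path is skipped for this task.
	 */

In other words, the task is sent down the find_idlest_*() path whenever
it would occupy more than roughly 80% (1024/1280) of the smaller cpu's
capacity, and the check is bypassed entirely on (nearly) symmetric
systems.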

Finding the best target cpu quickly becomes messy. Comparing group
utilization to group capacity can be very misleading. The ARM Juno
platform has four little cpus and two big cpus, so the cluster group
capacities are roughly comparable. We have to iterate over all the cpus
to find the one with the most spare capacity to find the best fit, so I
thought that would fit better with the existing code in find_idlest_*().
Vincent Guittot July 13, 2016, 12:56 p.m. UTC | #2
On 22 June 2016 at 19:03, Morten Rasmussen <morten.rasmussen@arm.com> wrote:
> Currently, SD_WAKE_AFFINE always takes priority over wakeup balancing if
> SD_BALANCE_WAKE is set on the sched_domains. For asymmetric
> configurations SD_WAKE_AFFINE is only desirable if the waking task's
> compute demand (utilization) is suitable for all the cpu capacities
> available within the SD_WAKE_AFFINE sched_domain. If not, let wakeup

Instead of "suitable for all the cpu capacities available within the
SD_WAKE_AFFINE sched_domain", should it be "suitable for the local cpu
and prev cpu", because you only check the capacity of these 2 CPUs?

Other than this comment on the commit message, the patch looks good to me.
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>


> balancing take over (find_idlest_{group, cpu}()).
>
> [...]
Morten Rasmussen July 13, 2016, 4:14 p.m. UTC | #3
On Wed, Jul 13, 2016 at 02:56:41PM +0200, Vincent Guittot wrote:
> On 22 June 2016 at 19:03, Morten Rasmussen <morten.rasmussen@arm.com> wrote:
> > Currently, SD_WAKE_AFFINE always takes priority over wakeup balancing if
> > SD_BALANCE_WAKE is set on the sched_domains. For asymmetric
> > configurations SD_WAKE_AFFINE is only desirable if the waking task's
> > compute demand (utilization) is suitable for all the cpu capacities
> > available within the SD_WAKE_AFFINE sched_domain. If not, let wakeup
>
> instead of "suitable for all the cpu capacities available within the
> SD_WAKE_AFFINE sched_domain", should it be "suitable for the local cpu
> and prev cpu", because you only check the capacity of these 2 CPUs?


Good point. I currently make the implicit assumption that the capacity
of the local cpu and prev cpu is representative of all the cpus in their
SD_WAKE_AFFINE domains. It breaks if you should choose to have
SD_WAKE_AFFINE on a domain that spans both little and big cpus: if the
local and prev cpus both happen to be big, we assume that all cpus in
their domains are big and let select_idle_sibling() handle the task
placement even for big tasks.

I don't see why anybody would want that kind of setup, but I think the
assumption should still be written down somewhere, either here, in a
comment in wake_cap(), or both.

The next paragraph in the commit message mentions that we actually only
check the waker cpu and prev_cpu capacity. Would it be clearer if we
extend that to something like:

    This patch makes affine wake-ups conditional on whether both the waker
    cpu and prev_cpu have sufficient capacity for the waking task,
    assuming that the cpu capacities within an SD_WAKE_AFFINE
    domain are homogeneous.

Thoughts?

> Other than this comment on the commit message, the patch looks good to me.
> Acked-by: Vincent Guittot <vincent.guittot@linaro.org>


Thanks,
Morten


> > [...]
Vincent Guittot July 14, 2016, 1:45 p.m. UTC | #4
On 13 July 2016 at 18:14, Morten Rasmussen <morten.rasmussen@arm.com> wrote:
> On Wed, Jul 13, 2016 at 02:56:41PM +0200, Vincent Guittot wrote:
>> On 22 June 2016 at 19:03, Morten Rasmussen <morten.rasmussen@arm.com> wrote:
>> > Currently, SD_WAKE_AFFINE always takes priority over wakeup balancing if
>> > SD_BALANCE_WAKE is set on the sched_domains. For asymmetric
>> > configurations SD_WAKE_AFFINE is only desirable if the waking task's
>> > compute demand (utilization) is suitable for all the cpu capacities
>> > available within the SD_WAKE_AFFINE sched_domain. If not, let wakeup
>>
>> instead of "suitable for all the cpu capacities available within the
>> SD_WAKE_AFFINE sched_domain", should it be "suitable for the local cpu
>> and prev cpu", because you only check the capacity of these 2 CPUs?
>
> Good point. I currently make the implicit assumption that the capacity
> of the local cpu and prev cpu is representative of all the cpus in their
> SD_WAKE_AFFINE domains. It breaks if you should choose to have
> SD_WAKE_AFFINE on a domain that spans both little and big cpus: if the
> local and prev cpus both happen to be big, we assume that all cpus in
> their domains are big and let select_idle_sibling() handle the task
> placement even for big tasks.


Isn't it sd_llc that is used in select_idle_sibling(), and not the
SD_WAKE_AFFINE domain? So if the CPUs in sd_llc are homogeneous, we
are safe.

> [...]
Morten Rasmussen July 15, 2016, 8:37 a.m. UTC | #5
On Thu, Jul 14, 2016 at 03:45:17PM +0200, Vincent Guittot wrote:
> On 13 July 2016 at 18:14, Morten Rasmussen <morten.rasmussen@arm.com> wrote:
> > [...]
>
> Isn't it sd_llc that is used in select_idle_sibling(), and not the
> SD_WAKE_AFFINE domain? So if the CPUs in sd_llc are homogeneous, we
> are safe.


Yes, I confused myself (again) with SD_WAKE_AFFINE and sd_llc in the
above. It should have been sd_llc instead of SD_WAKE_AFFINE. I will fix
the commit message to be correct.
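
The assumption being relied on could be spelled out in code roughly like
this (a purely hypothetical sanity check for illustration; the helper
name is made up and nothing like it is added by the patch):

	/*
	 * wake_cap() implicitly relies on all cpus that share a
	 * last-level cache (sd_llc) having the same original capacity,
	 * since select_idle_sibling() only searches within sd_llc.
	 */
	static void check_llc_capacity_homogeneous(int cpu)
	{
		struct sched_domain *sd;
		int i;

		rcu_read_lock();
		sd = rcu_dereference(per_cpu(sd_llc, cpu));
		if (sd) {
			for_each_cpu(i, sched_domain_span(sd))
				WARN_ON_ONCE(capacity_orig_of(i) !=
					     capacity_orig_of(cpu));
		}
		rcu_read_unlock();
	}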

Patch

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 216db302e87d..dba02c7b57b3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -114,6 +114,12 @@  unsigned int __read_mostly sysctl_sched_shares_window = 10000000UL;
 unsigned int sysctl_sched_cfs_bandwidth_slice = 5000UL;
 #endif
 
+/*
+ * The margin used when comparing utilization with cpu capacity:
+ * util * 1024 < capacity * margin
+ */
+unsigned int capacity_margin = 1280; /* ~20% */
+
 static inline void update_load_add(struct load_weight *lw, unsigned long inc)
 {
 	lw->weight += inc;
@@ -5260,6 +5266,25 @@  static int cpu_util(int cpu)
 	return (util >= capacity) ? capacity : util;
 }
 
+static inline int task_util(struct task_struct *p)
+{
+	return p->se.avg.util_avg;
+}
+
+static int wake_cap(struct task_struct *p, int cpu, int prev_cpu)
+{
+	long min_cap, max_cap;
+
+	min_cap = min(capacity_orig_of(prev_cpu), capacity_orig_of(cpu));
+	max_cap = cpu_rq(cpu)->rd->max_cpu_capacity;
+
+	/* Minimum capacity is close to max, no need to abort wake_affine */
+	if (max_cap - min_cap < max_cap >> 3)
+		return 0;
+
+	return min_cap * 1024 < task_util(p) * capacity_margin;
+}
+
 /*
  * select_task_rq_fair: Select target runqueue for the waking task in domains
  * that have the 'sd_flag' flag set. In practice, this is SD_BALANCE_WAKE,
@@ -5283,7 +5308,8 @@  select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
 
 	if (sd_flag & SD_BALANCE_WAKE) {
 		record_wakee(p);
-		want_affine = !wake_wide(p) && cpumask_test_cpu(cpu, tsk_cpus_allowed(p));
+		want_affine = !wake_wide(p) && !wake_cap(p, cpu, prev_cpu)
+			      && cpumask_test_cpu(cpu, tsk_cpus_allowed(p));
 	}
 
 	rcu_read_lock();