[v2,05/13] sched: Enable SD_BALANCE_WAKE for asymmetric capacity systems

Message ID 1466615004-3503-6-git-send-email-morten.rasmussen@arm.com
State New

Commit Message

Morten Rasmussen June 22, 2016, 5:03 p.m. UTC
Systems with the SD_ASYM_CPUCAPACITY flag set indicate that sched_groups
at this level or below do not include cpus of all capacities available
(e.g. group containing little-only or big-only cpus in big.LITTLE
systems). It is therefore necessary to put in more effort in finding an
appropriate cpu at task wake-up by enabling balancing at wake-up
(SD_BALANCE_WAKE).

cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>

Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>

---
 kernel/sched/core.c | 3 +++
 1 file changed, 3 insertions(+)

-- 
1.9.1

Comments

Morten Rasmussen July 11, 2016, 10:37 a.m. UTC | #1
On Mon, Jul 11, 2016 at 12:04:49PM +0200, Peter Zijlstra wrote:
> On Wed, Jun 22, 2016 at 06:03:16PM +0100, Morten Rasmussen wrote:
> > Systems with the SD_ASYM_CPUCAPACITY flag set indicate that sched_groups
> > at this level or below do not include cpus of all capacities available
> > (e.g. group containing little-only or big-only cpus in big.LITTLE
> > systems). It is therefore necessary to put in more effort in finding an
> > appropriate cpu at task wake-up by enabling balancing at wake-up
> > (SD_BALANCE_WAKE).
>
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -6397,6 +6397,9 @@ sd_init(struct sched_domain_topology_level *tl, int cpu)
> >  	 * Convert topological properties into behaviour.
> >  	 */
> >  
> > +	if (sd->flags & SD_ASYM_CPUCAPACITY)
> > +		sd->flags |= SD_BALANCE_WAKE;
> > +
>
> So I'm a bit confused on the exact requirements for this; as also per
> the previous patch.
>
> Should all sched domains get BALANCE_WAKE if one (typically the top)
> domain has ASYM_CAP set?
>
> The previous patch set it on the actual asym one and one below that, but
> what if there's more levels below that? Imagine ARM gaining SMT or
> somesuch. Should not then that level also get BALANCE_WAKE in order to
> 'correctly' place light/heavy tasks?
>
> IOW, are you trying to fudge the behaviour semantics by creating 'weird'
> ASYM_CAP rules instead of having a more complex behaviour rule here?

That is one possible way of describing it :-)

The proposed semantic is to set ASYM_CAP at all levels starting from the
bottom up until you have sched_groups containing all types of cpus
available in the system, or reach the top level.

The fundamental reason for this weird semantics is that we somehow need
to know at the lower levels, which may be capacity symmetric, if we need
to consider balancing at a higher level to see the asymmetry or not.

If the flag isn't set bottom up we need some other way of knowing if the
system is asymmetric, or we would have to go look for the flag further
up the sched_domain hierarchy each time.

I'm not saying this is the perfect solution, I'm happy to discuss
alternatives.

The example in the previous patch has the flag set on both levels, as we
have two clusters of different cpus and therefore have to go to the top
to 'see' all the types of cpus we have in the system.

If you add SMT, you would add a third level at the bottom with
ASYM_CAP set as well as you still have to balance at top level to have
the full range of choice of cpu type.

Should someone build a system with multiple big.LITTLE cluster pairs and
essentially add another sched_domain level on top, then that level
should _not_ have the ASYM_CAP flag set. The sched_groups at this level
would span both big and little cpus of the cluster pair so there is
little reason to expand the search scope at wake-up further.

I hope that makes sense.
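
To make the proposed bottom-up rule concrete, here is a minimal user-space sketch. It is not kernel code: struct level_model, mark_asym_bottom_up() and the flag value are hypothetical names chosen for illustration, and it assumes each level can simply be told whether its sched_groups span every cpu type in the system.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative value only; not the kernel's definition. */
#define SD_ASYM_CPUCAPACITY 0x2u

/* Hypothetical per-level description; index 0 is the lowest level. */
struct level_model {
	bool groups_span_all_capacities; /* groups contain every cpu type? */
	unsigned int flags;
};

/*
 * Set ASYM_CAP from the bottom up, stopping at the first level whose
 * groups already contain all cpu types (or at the top level).
 */
void mark_asym_bottom_up(struct level_model *lv, size_t n)
{
	assert(lv != NULL);

	for (size_t i = 0; i < n; i++) {
		if (lv[i].groups_span_all_capacities)
			break;
		lv[i].flags |= SD_ASYM_CPUCAPACITY;
	}
}
```

With three levels whose groups never span both cpu types (SMT, cluster, system), every level ends up flagged. Adding a fourth level on top that spans whole big.LITTLE cluster pairs leaves that level unflagged, which matches the multi-pair case described above.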
Morten Rasmussen July 11, 2016, 11:04 a.m. UTC | #2
On Mon, Jul 11, 2016 at 11:37:18AM +0100, Morten Rasmussen wrote:
> On Mon, Jul 11, 2016 at 12:04:49PM +0200, Peter Zijlstra wrote:
> > On Wed, Jun 22, 2016 at 06:03:16PM +0100, Morten Rasmussen wrote:
> > > Systems with the SD_ASYM_CPUCAPACITY flag set indicate that sched_groups
> > > at this level or below do not include cpus of all capacities available
> > > (e.g. group containing little-only or big-only cpus in big.LITTLE
> > > systems). It is therefore necessary to put in more effort in finding an
> > > appropriate cpu at task wake-up by enabling balancing at wake-up
> > > (SD_BALANCE_WAKE).
> >
> > > --- a/kernel/sched/core.c
> > > +++ b/kernel/sched/core.c
> > > @@ -6397,6 +6397,9 @@ sd_init(struct sched_domain_topology_level *tl, int cpu)
> > >  	 * Convert topological properties into behaviour.
> > >  	 */
> > >  
> > > +	if (sd->flags & SD_ASYM_CPUCAPACITY)
> > > +		sd->flags |= SD_BALANCE_WAKE;
> > > +
> >
> > So I'm a bit confused on the exact requirements for this; as also per
> > the previous patch.
> >
> > Should all sched domains get BALANCE_WAKE if one (typically the top)
> > domain has ASYM_CAP set?
> >
> > The previous patch set it on the actual asym one and one below that, but
> > what if there's more levels below that? Imagine ARM gaining SMT or
> > somesuch. Should not then that level also get BALANCE_WAKE in order to
> > 'correctly' place light/heavy tasks?
> >
> > IOW, are you trying to fudge the behaviour semantics by creating 'weird'
> > ASYM_CAP rules instead of having a more complex behaviour rule here?
>
> That is one possible way of describing it :-)
>
> The proposed semantic is to set ASYM_CAP at all levels starting from the
> bottom up until you have sched_groups containing all types of cpus
> available in the system, or reach the top level.
>
> The fundamental reason for this weird semantics is that we somehow need
> to know at the lower levels, which may be capacity symmetric, if we need
> to consider balancing at a higher level to see the asymmetry or not.
>
> If the flag isn't set bottom up we need some other way of knowing if the
> system is asymmetric, or we would have to go look for the flag further
> up the sched_domain hierarchy each time.
>
> I'm not saying this is the perfect solution, I'm happy to discuss
> alternatives.

One alternative to setting ASYM_CAP bottom up would be to set it only
where the asymmetry can be observed, and instead come up with a more
complicated way of setting BALANCE_WAKE bottom up until and including
the first level having the ASYM_CAP.

I looked at it briefly and realized that I couldn't find a clean way of
implementing it, as I don't think we have visibility of which flags
will be set at higher levels in the sched_domain hierarchy when the
lower levels are initialized. IOW, behavioural flag settings would
depend on topology flag settings at a different level.
Morten Rasmussen July 12, 2016, 2:26 p.m. UTC | #3
On Mon, Jul 11, 2016 at 01:24:04PM +0200, Peter Zijlstra wrote:
> On Mon, Jul 11, 2016 at 12:04:58PM +0100, Morten Rasmussen wrote:
>
> > One alternative to setting ASYM_CAP bottom up would be to set it only
> > where the asymmetry can be observed, and instead come up with a more
> > complicated way of setting BALANCE_WAKE bottom up until and including
> > the first level having the ASYM_CAP.
>
> Right, that is what I was thinking.
>
> > I looked at it briefly and realized that I couldn't find a clean way of
> > implementing it, as I don't think we have visibility of which flags
> > will be set at higher levels in the sched_domain hierarchy when the
> > lower levels are initialized. IOW, behavioural flag settings would
> > depend on topology flag settings at a different level.
>
> Looks doable if we pass @child into sd_init() in build_sched_domain().
> Then we could simply do:
>
> 	*sd = (struct sched_domain){
> 		/* ... */
> 		.child = child,
> 	};
>
> 	if (sd->flags & ASYM_CAP) {
> 		struct sched_domain *t = sd;
> 		while (t) {
> 			t->flags |= BALANCE_WAKE;
> 			t = t->child;
> 		}
> 	}
>
> Or something like that.

It appears to be working fine. I will roll it into v3 along with the
simpler and more sane ASYM_CAP semantics :)
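
For readers following along, the approach suggested above can be modelled in user space roughly as follows. This is a sketch under stated assumptions, not the v3 patch: struct sd_model, propagate_balance_wake() and build_levels() are hypothetical names, the flag values are illustrative, and a single flags field stands in for the one in struct sched_domain.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative values only; not the kernel's definitions. */
#define SD_BALANCE_WAKE     0x1u
#define SD_ASYM_CPUCAPACITY 0x2u

/* Hypothetical stand-in for struct sched_domain. */
struct sd_model {
	unsigned int flags;
	struct sd_model *child; /* next lower level, NULL at the bottom */
};

/*
 * If this level can observe the capacity asymmetry, enable wake-up
 * balancing here and on every level below it.
 */
void propagate_balance_wake(struct sd_model *sd)
{
	assert(sd != NULL);

	if (!(sd->flags & SD_ASYM_CPUCAPACITY))
		return;

	for (struct sd_model *t = sd; t; t = t->child)
		t->flags |= SD_BALANCE_WAKE;
}

/*
 * Initialise levels bottom-up, handing each new level its child, which
 * mirrors passing @child into sd_init().
 */
void build_levels(struct sd_model *levels, const unsigned int *topo_flags,
		  size_t n)
{
	struct sd_model *child = NULL;

	for (size_t i = 0; i < n; i++) {
		levels[i].flags = topo_flags[i];
		levels[i].child = child;
		propagate_balance_wake(&levels[i]);
		child = &levels[i];
	}
}
```

Because the lower levels already exist when a higher level is initialised, the higher level can reach down through its child pointers. ASYM_CAP then only needs to be set where the asymmetry is actually visible, and a level above that one is left without BALANCE_WAKE.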

Patch

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 351609279341..fe39118ffdfb 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6397,6 +6397,9 @@  sd_init(struct sched_domain_topology_level *tl, int cpu)
 	 * Convert topological properties into behaviour.
 	 */
 
+	if (sd->flags & SD_ASYM_CPUCAPACITY)
+		sd->flags |= SD_BALANCE_WAKE;
+
 	if (sd->flags & SD_SHARE_CPUCAPACITY) {
 		sd->flags |= SD_PREFER_SIBLING;
 		sd->imbalance_pct = 110;