[v2,11/13] sched/fair: Avoid pulling tasks from non-overloaded higher capacity groups

Message ID 1466615004-3503-12-git-send-email-morten.rasmussen@arm.com
State Superseded

Commit Message

Morten Rasmussen June 22, 2016, 5:03 p.m. UTC
For asymmetric cpu capacity systems it is counter-productive for
throughput if low capacity cpus are pulling tasks from non-overloaded
cpus with higher capacity. The assumption is that running on a higher
capacity cpu is preferable to running alone on a cpu in a lower
capacity group.

This patch rejects higher cpu capacity groups with one or fewer tasks
per cpu as potential busiest groups, which could otherwise lead to a
series of failing load-balancing attempts ending in a force-migration.
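
To make the rejection threshold concrete, here is a minimal standalone
sketch of the capacity comparison the patch adds, assuming
capacity_margin is 1280 (~25% margin) as introduced earlier in this
series; the capacities are hypothetical big.LITTLE values, not taken
from the patch:

#include <stdbool.h>
#include <stdio.h>

/* Assumption: matches the capacity_margin used elsewhere in the
 * series, i.e. 1280/1024 ~= 1.25, a ~25% margin.
 */
static unsigned int capacity_margin = 1280;

/* Standalone stand-in for group_smaller_cpu_capacity(); takes raw
 * per-cpu capacities instead of struct sched_group pointers.
 */
static bool smaller_capacity(unsigned long sg_cap, unsigned long ref_cap)
{
	return sg_cap * capacity_margin < ref_cap * 1024;
}

int main(void)
{
	printf("%d\n", smaller_capacity(446, 1024));  /* 1: little vs big */
	printf("%d\n", smaller_capacity(1024, 446));  /* 0: big vs little */
	printf("%d\n", smaller_capacity(900, 1024));  /* 0: within the margin */
	return 0;
}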

cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>

Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>

---
 kernel/sched/fair.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

-- 
1.9.1

Comments

Morten Rasmussen June 30, 2016, 7:49 a.m. UTC | #1
On Thu, Jun 23, 2016 at 02:20:48PM -0700, Sai Gurrappadi wrote:
> Hi Morten,
>
> On 06/22/2016 10:03 AM, Morten Rasmussen wrote:
>
> [...]
>
> >  
> > +/*
> > + * group_smaller_cpu_capacity: Returns true if sched_group sg has smaller
> > + * per-cpu capacity than sched_group ref.
> > + */
> > +static inline bool
> > +group_smaller_cpu_capacity(struct sched_group *sg, struct sched_group *ref)
> > +{
> > +	return sg->sgc->max_capacity * capacity_margin <
> > +						ref->sgc->max_capacity * 1024;
> > +}
> > +
> >  static inline enum
> >  group_type group_classify(struct sched_group *group,
> >  			  struct sg_lb_stats *sgs)
> > @@ -6892,6 +6903,19 @@ static bool update_sd_pick_busiest(struct lb_env *env,
> >  	if (sgs->avg_load <= busiest->avg_load)
> >  		return false;
> >  
> > +	if (!(env->sd->flags & SD_ASYM_CPUCAPACITY))
> > +		goto asym_packing;
> > +
> > +	/* Candidate sg has no more than one task per cpu and has
> > +	 * higher per-cpu capacity. Migrating tasks to less capable
> > +	 * cpus may harm throughput. Maximize throughput,
> > +	 * power/energy consequences are not considered.
> > +	 */
> > +	if (sgs->sum_nr_running <= sgs->group_weight &&
> > +	    group_smaller_cpu_capacity(sds->local, sg))
> > +		return false;
> > +
> > +asym_packing:
>
> What about the case where IRQ/RT work reduces the capacity of some of
> these bigger CPUs? sgc->max_capacity might not necessarily capture
> that case.

Right, we could possibly improve this by using min_capacity instead, but
we could end up allowing tasks to be pulled to lower capacity cpus just
because one big cpu has reduced capacity due to RT/IRQ pressure and
therefore has lowered the group's min_capacity.

Ideally we should check all the capacities, but that complicates things
a lot.

Would you prefer min_capacity instead, or attempts to consider all the
cpu capacities available in both groups?
Morten Rasmussen July 12, 2016, 2:34 p.m. UTC | #2
On Tue, Jul 12, 2016 at 02:59:53PM +0200, Peter Zijlstra wrote:
> On Wed, Jun 22, 2016 at 06:03:22PM +0100, Morten Rasmussen wrote:
> > @@ -6892,6 +6903,19 @@ static bool update_sd_pick_busiest(struct lb_env *env,
> >  	if (sgs->avg_load <= busiest->avg_load)
> >  		return false;
> >  
> > +	if (!(env->sd->flags & SD_ASYM_CPUCAPACITY))
> > +		goto asym_packing;
>
> Does this rely on the 'funny' ASYM_CAP semantics?

No, I would actually prefer 'sane' ASYM_CAP semantics. With 'funny'
semantics we ended up doing capacity checks inside domains spanning
similar cpus at the lower domain levels, which is pointless and pure
overhead. With 'sane' semantics, we only do the check at the domain
level where asymmetric capacities are actually observed.

>
> > +
> > +	/* Candidate sg has no more than one task per cpu and has
>
> Tssk, borken comment style.

Yes. I will fix that in v3.
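
For reference, the canonical kernel block comment style the v3 fix
would presumably adopt looks like this (a sketch of the style change
only, not the actual v3 hunk):

	/*
	 * Candidate sg has no more than one task per cpu and has
	 * higher per-cpu capacity. Migrating tasks to less capable
	 * cpus may harm throughput. Maximize throughput,
	 * power/energy consequences are not considered.
	 */
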
Morten Rasmussen July 15, 2016, 8:39 a.m. UTC | #3
On Thu, Jul 14, 2016 at 09:39:23AM -0700, Sai Gurrappadi wrote:
> On 06/30/2016 12:49 AM, Morten Rasmussen wrote:
> > On Thu, Jun 23, 2016 at 02:20:48PM -0700, Sai Gurrappadi wrote:
> >> Hi Morten,
> >>
> >> On 06/22/2016 10:03 AM, Morten Rasmussen wrote:
> >>
> >> [...]
> >>
> >>>  
> >>> +/*
> >>> + * group_smaller_cpu_capacity: Returns true if sched_group sg has smaller
> >>> + * per-cpu capacity than sched_group ref.
> >>> + */
> >>> +static inline bool
> >>> +group_smaller_cpu_capacity(struct sched_group *sg, struct sched_group *ref)
> >>> +{
> >>> +	return sg->sgc->max_capacity * capacity_margin <
> >>> +						ref->sgc->max_capacity * 1024;
> >>> +}
> >>> +
> >>>  static inline enum
> >>>  group_type group_classify(struct sched_group *group,
> >>>  			  struct sg_lb_stats *sgs)
> >>> @@ -6892,6 +6903,19 @@ static bool update_sd_pick_busiest(struct lb_env *env,
> >>>  	if (sgs->avg_load <= busiest->avg_load)
> >>>  		return false;
> >>>  
> >>> +	if (!(env->sd->flags & SD_ASYM_CPUCAPACITY))
> >>> +		goto asym_packing;
> >>> +
> >>> +	/* Candidate sg has no more than one task per cpu and has
> >>> +	 * higher per-cpu capacity. Migrating tasks to less capable
> >>> +	 * cpus may harm throughput. Maximize throughput,
> >>> +	 * power/energy consequences are not considered.
> >>> +	 */
> >>> +	if (sgs->sum_nr_running <= sgs->group_weight &&
> >>> +	    group_smaller_cpu_capacity(sds->local, sg))
> >>> +		return false;
> >>> +
> >>> +asym_packing:
> >>
> >> What about the case where IRQ/RT work reduces the capacity of some of
> >> these bigger CPUs? sgc->max_capacity might not necessarily capture
> >> that case.
> >
> > Right, we could possibly improve this by using min_capacity instead, but
> > we could end up allowing tasks to be pulled to lower capacity cpus just
> > because one big cpu has reduced capacity due to RT/IRQ pressure and
> > therefore has lowered the groups min_capacity.
> >
> > Ideally we should check all the capacities, but that complicates things
> > a lot.
> >
> > Would you prefer min_capacity instead, or attempts to consider all the
> > cpu capacities available in both groups?
>
> min_capacity as a start works I think given that we are only trying to
> make existing LB better, not necessarily optimizing for every case.
> Might have to revisit this anyways for thermals etc.

Agreed, I will make it min_capacity instead of max_capacity in v3.

Thanks,
Morten
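
Putting the agreement above together, the v3 helper would presumably
become the following, simply swapping the field (a sketch, assuming
sgc->min_capacity is maintained alongside max_capacity as elsewhere in
this series):

/*
 * group_smaller_cpu_capacity: Returns true if sched_group sg has smaller
 * per-cpu capacity than sched_group ref. Comparing min_capacity lets
 * RT/IRQ pressure on any cpu in ref lower the bar, so the protection
 * against pulling from big groups is relaxed when the big cpus are
 * already squeezed.
 */
static inline bool
group_smaller_cpu_capacity(struct sched_group *sg, struct sched_group *ref)
{
	return sg->sgc->min_capacity * capacity_margin <
						ref->sgc->min_capacity * 1024;
}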

Patch

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4d10d022006d..ca0048d95b3f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6789,6 +6789,17 @@  group_is_overloaded(struct lb_env *env, struct sg_lb_stats *sgs)
 	return false;
 }
 
+/*
+ * group_smaller_cpu_capacity: Returns true if sched_group sg has smaller
+ * per-cpu capacity than sched_group ref.
+ */
+static inline bool
+group_smaller_cpu_capacity(struct sched_group *sg, struct sched_group *ref)
+{
+	return sg->sgc->max_capacity * capacity_margin <
+						ref->sgc->max_capacity * 1024;
+}
+
 static inline enum
 group_type group_classify(struct sched_group *group,
 			  struct sg_lb_stats *sgs)
@@ -6892,6 +6903,19 @@  static bool update_sd_pick_busiest(struct lb_env *env,
 	if (sgs->avg_load <= busiest->avg_load)
 		return false;
 
+	if (!(env->sd->flags & SD_ASYM_CPUCAPACITY))
+		goto asym_packing;
+
+	/* Candidate sg has no more than one task per cpu and has
+	 * higher per-cpu capacity. Migrating tasks to less capable
+	 * cpus may harm throughput. Maximize throughput,
+	 * power/energy consequences are not considered.
+	 */
+	if (sgs->sum_nr_running <= sgs->group_weight &&
+	    group_smaller_cpu_capacity(sds->local, sg))
+		return false;
+
+asym_packing:
 	/* This is the busiest node in its class. */
 	if (!(env->sd->flags & SD_ASYM_PACKING))
 		return true;
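
To see how the new bail-out interacts with the rest of
update_sd_pick_busiest(), consider a hypothetical big.LITTLE scenario:
sds->local spans two little cpus (max_capacity 446), the candidate sg
spans two big cpus (max_capacity 1024) running one task each, and
capacity_margin is 1280 as assumed above. Then:

	sgs->sum_nr_running = 2 <= sgs->group_weight = 2
	group_smaller_cpu_capacity(sds->local, sg):
		446 * 1280 = 570880 < 1024 * 1024 = 1048576 -> true

Both conditions hold, so the function returns false and the big group
is not picked as busiest regardless of how its avg_load compares. With
a third task on the big cpus, sum_nr_running = 3 > group_weight = 2,
the first condition fails and the group can be selected again, letting
load spill over to the little cpus.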