Message ID | 1466615004-3503-12-git-send-email-morten.rasmussen@arm.com |
---|---|
State | Superseded |
On Thu, Jun 23, 2016 at 02:20:48PM -0700, Sai Gurrappadi wrote:
> Hi Morten,
>
> On 06/22/2016 10:03 AM, Morten Rasmussen wrote:
>
> [...]
>
> >
> > +/*
> > + * group_smaller_cpu_capacity: Returns true if sched_group sg has smaller
> > + * per-cpu capacity than sched_group ref.
> > + */
> > +static inline bool
> > +group_smaller_cpu_capacity(struct sched_group *sg, struct sched_group *ref)
> > +{
> > +	return sg->sgc->max_capacity * capacity_margin <
> > +				ref->sgc->max_capacity * 1024;
> > +}
> > +
> >  static inline enum
> >  group_type group_classify(struct sched_group *group,
> >  			  struct sg_lb_stats *sgs)
> > @@ -6892,6 +6903,19 @@ static bool update_sd_pick_busiest(struct lb_env *env,
> >  	if (sgs->avg_load <= busiest->avg_load)
> >  		return false;
> >
> > +	if (!(env->sd->flags & SD_ASYM_CPUCAPACITY))
> > +		goto asym_packing;
> > +
> > +	/* Candidate sg has no more than one task per cpu and has
> > +	 * higher per-cpu capacity. Migrating tasks to less capable
> > +	 * cpus may harm throughput. Maximize throughput,
> > +	 * power/energy consequences are not considered.
> > +	 */
> > +	if (sgs->sum_nr_running <= sgs->group_weight &&
> > +	    group_smaller_cpu_capacity(sds->local, sg))
> > +		return false;
> > +
> > +asym_packing:
>
> What about the case where IRQ/RT work reduces the capacity of some of
> these bigger CPUs? sgc->max_capacity might not necessarily capture
> that case.

Right, we could possibly improve this by using min_capacity instead, but
we could end up allowing tasks to be pulled to lower capacity cpus just
because one big cpu has reduced capacity due to RT/IRQ pressure and
therefore has lowered the group's min_capacity.

Ideally we should check all the capacities, but that complicates things
a lot.

Would you prefer min_capacity instead, or attempts to consider all the
cpu capacities available in both groups?
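To make the min_capacity option discussed above concrete, here is a
minimal sketch of how a group-wide minimum could be aggregated next to
max_capacity in update_group_capacity(). The loop is simplified and the
min_capacity field is an assumption, not part of this patch:

	unsigned long capacity = 0, max_capacity = 0;
	unsigned long min_capacity = ULONG_MAX;
	int cpu;

	/* Any one RT/IRQ-pressured cpu drags the group minimum down. */
	for_each_cpu(cpu, sched_group_cpus(sdg)) {
		unsigned long cap = capacity_of(cpu);

		capacity += cap;
		max_capacity = max(cap, max_capacity);
		min_capacity = min(cap, min_capacity);	/* assumed addition */
	}

	sdg->sgc->capacity = capacity;
	sdg->sgc->max_capacity = max_capacity;
	sdg->sgc->min_capacity = min_capacity;		/* assumed field */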
On Tue, Jul 12, 2016 at 02:59:53PM +0200, Peter Zijlstra wrote:
> On Wed, Jun 22, 2016 at 06:03:22PM +0100, Morten Rasmussen wrote:
> > @@ -6892,6 +6903,19 @@ static bool update_sd_pick_busiest(struct lb_env *env,
> >  	if (sgs->avg_load <= busiest->avg_load)
> >  		return false;
> >
> > +	if (!(env->sd->flags & SD_ASYM_CPUCAPACITY))
> > +		goto asym_packing;
>
> Does this rely on the 'funny' ASYM_CAP semantics?

No, I would actually prefer 'sane' ASYM_CAP semantics. With 'funny'
semantics we ended up doing capacity checks inside domains with similar
cpus at the lower domain levels, which is pointless and pure overhead.
With 'sane' semantics, we only do the check for the domain level where
asymmetric capacities are actually observed.

> > +
> > +	/* Candidate sg has no more than one task per cpu and has
>
> Tssk, borken comment style.

Yes. I will fix that in v3.
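To illustrate the 'sane' semantics, a sketch of hypothetical arch
topology flag functions for a two-cluster big.LITTLE system; the
function names follow the usual topology-flag convention but are
assumptions here, not part of this patch:

/*
 * SD_ASYM_CPUCAPACITY is only set at the lowest level spanning cpus of
 * different capacity (DIE here), so the early bail-out in
 * update_sd_pick_busiest() skips the capacity checks entirely at the
 * symmetric MC level.
 */
static int cpu_core_flags(void)		/* MC: cpus of equal capacity */
{
	return SD_SHARE_PKG_RESOURCES;
}

static int cpu_cpu_flags(void)		/* DIE: spans big and LITTLE */
{
	return SD_ASYM_CPUCAPACITY;
}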
On Thu, Jul 14, 2016 at 09:39:23AM -0700, Sai Gurrappadi wrote:
> On 06/30/2016 12:49 AM, Morten Rasmussen wrote:
> > On Thu, Jun 23, 2016 at 02:20:48PM -0700, Sai Gurrappadi wrote:
> >> Hi Morten,
> >>
> >> On 06/22/2016 10:03 AM, Morten Rasmussen wrote:
> >>
> >> [...]
> >>
> >>>
> >>> +/*
> >>> + * group_smaller_cpu_capacity: Returns true if sched_group sg has smaller
> >>> + * per-cpu capacity than sched_group ref.
> >>> + */
> >>> +static inline bool
> >>> +group_smaller_cpu_capacity(struct sched_group *sg, struct sched_group *ref)
> >>> +{
> >>> +	return sg->sgc->max_capacity * capacity_margin <
> >>> +				ref->sgc->max_capacity * 1024;
> >>> +}
> >>> +
> >>>  static inline enum
> >>>  group_type group_classify(struct sched_group *group,
> >>>  			  struct sg_lb_stats *sgs)
> >>> @@ -6892,6 +6903,19 @@ static bool update_sd_pick_busiest(struct lb_env *env,
> >>>  	if (sgs->avg_load <= busiest->avg_load)
> >>>  		return false;
> >>>
> >>> +	if (!(env->sd->flags & SD_ASYM_CPUCAPACITY))
> >>> +		goto asym_packing;
> >>> +
> >>> +	/* Candidate sg has no more than one task per cpu and has
> >>> +	 * higher per-cpu capacity. Migrating tasks to less capable
> >>> +	 * cpus may harm throughput. Maximize throughput,
> >>> +	 * power/energy consequences are not considered.
> >>> +	 */
> >>> +	if (sgs->sum_nr_running <= sgs->group_weight &&
> >>> +	    group_smaller_cpu_capacity(sds->local, sg))
> >>> +		return false;
> >>> +
> >>> +asym_packing:
> >>
> >> What about the case where IRQ/RT work reduces the capacity of some of
> >> these bigger CPUs? sgc->max_capacity might not necessarily capture
> >> that case.
> >
> > Right, we could possibly improve this by using min_capacity instead, but
> > we could end up allowing tasks to be pulled to lower capacity cpus just
> > because one big cpu has reduced capacity due to RT/IRQ pressure and
> > therefore has lowered the group's min_capacity.
> >
> > Ideally we should check all the capacities, but that complicates things
> > a lot.
> >
> > Would you prefer min_capacity instead, or attempts to consider all the
> > cpu capacities available in both groups?
>
> min_capacity as a start works I think given that we are only trying to
> make existing LB better, not necessarily optimizing for every case.
> Might have to revisit this anyways for thermals etc.

Agreed, I will make it min_capacity instead of max_capacity in v3.

Thanks,
Morten
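For reference, the change agreed on for v3 amounts to swapping the field
used by the helper; a sketch, assuming sgc->min_capacity is maintained
alongside max_capacity:

/*
 * Sketch of the v3 direction: compare the groups' minimum per-cpu
 * capacities so that RT/IRQ-pressured big cpus are taken into account.
 */
static inline bool
group_smaller_cpu_capacity(struct sched_group *sg, struct sched_group *ref)
{
	return sg->sgc->min_capacity * capacity_margin <
					ref->sgc->min_capacity * 1024;
}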
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4d10d022006d..ca0048d95b3f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6789,6 +6789,17 @@ group_is_overloaded(struct lb_env *env, struct sg_lb_stats *sgs)
 	return false;
 }
 
+/*
+ * group_smaller_cpu_capacity: Returns true if sched_group sg has smaller
+ * per-cpu capacity than sched_group ref.
+ */
+static inline bool
+group_smaller_cpu_capacity(struct sched_group *sg, struct sched_group *ref)
+{
+	return sg->sgc->max_capacity * capacity_margin <
+				ref->sgc->max_capacity * 1024;
+}
+
 static inline enum
 group_type group_classify(struct sched_group *group,
 			  struct sg_lb_stats *sgs)
@@ -6892,6 +6903,19 @@ static bool update_sd_pick_busiest(struct lb_env *env,
 	if (sgs->avg_load <= busiest->avg_load)
 		return false;
 
+	if (!(env->sd->flags & SD_ASYM_CPUCAPACITY))
+		goto asym_packing;
+
+	/* Candidate sg has no more than one task per cpu and has
+	 * higher per-cpu capacity. Migrating tasks to less capable
+	 * cpus may harm throughput. Maximize throughput,
+	 * power/energy consequences are not considered.
+	 */
+	if (sgs->sum_nr_running <= sgs->group_weight &&
+	    group_smaller_cpu_capacity(sds->local, sg))
+		return false;
+
+asym_packing:
 	/* This is the busiest node in its class. */
 	if (!(env->sd->flags & SD_ASYM_PACKING))
 		return true;
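The helper's fixed-point comparison can be checked by hand. A standalone
example follows; the capacity values are illustrative, and the
capacity_margin value of 1280 (a 25% margin) is an assumption taken from
elsewhere in this series, so a group only counts as smaller if its
per-cpu capacity is below 80% of the reference group's:

#include <stdbool.h>
#include <stdio.h>

#define CAPACITY_MARGIN	1280	/* assumed value of capacity_margin */

/* Same comparison as group_smaller_cpu_capacity(), on raw values. */
static bool smaller_capacity(unsigned long sg, unsigned long ref)
{
	return sg * CAPACITY_MARGIN < ref * 1024;
}

int main(void)
{
	/* LITTLE (430) vs big (1024): well below 80% -> prints 1 */
	printf("%d\n", smaller_capacity(430, 1024));
	/* 870 vs 1024: about 85%, within the margin -> prints 0 */
	printf("%d\n", smaller_capacity(870, 1024));
	return 0;
}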
For asymmetric cpu capacity systems it is counter-productive for
throughput if low capacity cpus are pulling tasks from non-overloaded
cpus with higher capacity. The assumption is that higher cpu capacity is
preferred over running alone in a group with lower cpu capacity.

This patch rejects higher cpu capacity groups with one or fewer tasks
per cpu as the potential busiest group, which could otherwise lead to a
series of failing load-balancing attempts ending in a forced migration.

cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>

Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
---
 kernel/sched/fair.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

-- 
1.9.1