diff mbox

[07/16] sched: Make SD_BALANCE_WAKE a topology flag

Message ID 1464001138-25063-8-git-send-email-morten.rasmussen@arm.com
State New
Headers show

Commit Message

Morten Rasmussen May 23, 2016, 10:58 a.m. UTC
For systems with the SD_ASYM_CPUCAPACITY flag set on higher level in the
sched_domain hierarchy we need a way to enable wake-up balancing for the
lower levels as well as we may want to balance tasks that don't fit the
capacity of the previous cpu.

We have the option of introducing a new topology flag to express this
requirement, or let the existing SD_BALANCE_WAKE flag be set by the
architecture as a topology flag. The former means introducing yet
another flag, the latter breaks the current meaning of topology flags.
None of the options are really desirable.

cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>

Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>

---
 kernel/sched/core.c | 4 ++++
 1 file changed, 4 insertions(+)

-- 
1.9.1

Comments

Morten Rasmussen May 25, 2016, 9:27 a.m. UTC | #1
On Wed, May 25, 2016 at 07:52:49AM +0800, Yuyang Du wrote:
> On Mon, May 23, 2016 at 11:58:49AM +0100, Morten Rasmussen wrote:

> > For systems with the SD_ASYM_CPUCAPACITY flag set on higher level in the

> > sched_domain hierarchy we need a way to enable wake-up balancing for the

> > lower levels as well as we may want to balance tasks that don't fit the

> > capacity of the previous cpu.

> > 

> > We have the option of introducing a new topology flag to express this

> > requirement, or let the existing SD_BALANCE_WAKE flag be set by the

> > architecture as a topology flag. The former means introducing yet

> > another flag, the latter breaks the current meaning of topology flags.

> > None of the options are really desirable.

>  

> I'd propose to replace SD_WAKE_AFFINE with SD_BALANCE_WAKE. And the

> SD_WAKE_AFFINE semantic is simply "waker allowed":

> 

> waker_allowed = cpumask_test_cpu(cpu, tsk_cpus_allowed(p));

> 

> This can be implemented without current functionality change.

> 

> From there, the choice between waker and wakee, and fast path

> select_idle_sibling() and the rest slow path should be reworked, which

> I am thinking about.


I don't really understand how that would work. If you change the
semantics of the flags you don't preserve current behaviour. To me it
sounds like at total rewrite of everything.

SD_BALANCE_WAKE controls whether we go slow path or not in case
want_affine is false. SD_WAKE_AFFINE controls whether we should consider
waking up near the waker instead of always waking up near the previous
cpu.
Morten Rasmussen June 8, 2016, 8:45 a.m. UTC | #2
On Wed, Jun 01, 2016 at 10:18:17PM +0200, Peter Zijlstra wrote:
> On Mon, May 23, 2016 at 11:58:49AM +0100, Morten Rasmussen wrote:

> > For systems with the SD_ASYM_CPUCAPACITY flag set on higher level in the

> > sched_domain hierarchy we need a way to enable wake-up balancing for the

> > lower levels as well as we may want to balance tasks that don't fit the

> > capacity of the previous cpu.

> > 

> > We have the option of introducing a new topology flag to express this

> > requirement, or let the existing SD_BALANCE_WAKE flag be set by the

> > architecture as a topology flag. The former means introducing yet

> > another flag, the latter breaks the current meaning of topology flags.

> > None of the options are really desirable.

> 

> So why can't you couple this to ASYM_CAPACITY? If that's set anywhere,

> add BALANCE_WAKE as appropriate?


That should be possible. It is set at the sched_domain levels that span
cpus of different capacity, but we need to set BALANCE_WAKE on the level
below as well. So we would have to introduce a dependency between flags
set at different levels.

However, following the discussion with Vincent on enabling WAKE_AFFINE
across cpus of different capacity, I might be able to repurpose
ASYM_CPUCAPACITY to set BALANCE_WAKE instead which would be simpler I
think.
diff mbox

Patch

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 558ec4a..8014b4a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5658,6 +5658,7 @@  static int sd_degenerate(struct sched_domain *sd)
 			 SD_BALANCE_NEWIDLE |
 			 SD_BALANCE_FORK |
 			 SD_BALANCE_EXEC |
+			 SD_BALANCE_WAKE |
 			 SD_SHARE_CPUCAPACITY |
 			 SD_ASYM_CPUCAPACITY |
 			 SD_SHARE_PKG_RESOURCES |
@@ -5690,6 +5691,7 @@  sd_parent_degenerate(struct sched_domain *sd, struct sched_domain *parent)
 				SD_BALANCE_NEWIDLE |
 				SD_BALANCE_FORK |
 				SD_BALANCE_EXEC |
+				SD_BALANCE_WAKE |
 				SD_ASYM_CPUCAPACITY |
 				SD_SHARE_CPUCAPACITY |
 				SD_SHARE_PKG_RESOURCES |
@@ -6308,6 +6310,7 @@  static int sched_domains_curr_level;
  * Odd ones out:
  * SD_ASYM_PACKING        - describes SMT quirks
  * SD_ASYM_CPUCAPACITY    - describes mixed capacity topologies
+ * SD_BALANCE_WAKE	  - controls wake-up balancing (expensive)
  */
 #define TOPOLOGY_SD_FLAGS		\
 	(SD_SHARE_CPUCAPACITY |		\
@@ -6315,6 +6318,7 @@  static int sched_domains_curr_level;
 	 SD_NUMA |			\
 	 SD_ASYM_PACKING |		\
 	 SD_ASYM_CPUCAPACITY |		\
+	 SD_BALANCE_WAKE |		\
 	 SD_SHARE_POWERDOMAIN)
 
 static struct sched_domain *