[V3,2/2] sched/fair: Fallback to sched-idle CPU if idle CPU isn't found

Message ID eeafa25fdeb6f6edd5b2da716bc8f0ba7708cbcf.1561523542.git.viresh.kumar@linaro.org
State New
Headers show
Series
  • sched/fair: Fallback to sched-idle CPU in absence of idle CPUs
Related show

Commit Message

Viresh Kumar June 26, 2019, 5:06 a.m.
We try to find an idle CPU to run the next task, but in case we don't
find an idle CPU it is better to pick a CPU which will run the task the
soonest, for performance reason.

A CPU which isn't idle but has only SCHED_IDLE activity queued on it
should be a good target based on this criteria as any normal fair task
will most likely preempt the currently running SCHED_IDLE task
immediately. In fact, choosing a SCHED_IDLE CPU over a fully idle one
shall give better results as it should be able to run the task sooner
than an idle CPU (which requires to be woken up from an idle state).

This patch updates both fast and slow paths with this optimization.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

---
 kernel/sched/fair.c | 43 +++++++++++++++++++++++++++++++++----------
 1 file changed, 33 insertions(+), 10 deletions(-)

-- 
2.21.0.rc0.269.g1a574e7a288b

Comments

Subhra Mazumdar June 29, 2019, 1:16 a.m. | #1
On 6/25/19 10:06 PM, Viresh Kumar wrote:
> We try to find an idle CPU to run the next task, but in case we don't

> find an idle CPU it is better to pick a CPU which will run the task the

> soonest, for performance reason.

>

> A CPU which isn't idle but has only SCHED_IDLE activity queued on it

> should be a good target based on this criteria as any normal fair task

> will most likely preempt the currently running SCHED_IDLE task

> immediately. In fact, choosing a SCHED_IDLE CPU over a fully idle one

> shall give better results as it should be able to run the task sooner

> than an idle CPU (which requires to be woken up from an idle state).

>

> This patch updates both fast and slow paths with this optimization.

>

> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

> ---

>   kernel/sched/fair.c | 43 +++++++++++++++++++++++++++++++++----------

>   1 file changed, 33 insertions(+), 10 deletions(-)

>

> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c

> index 1277adc3e7ed..2e0527fd468c 100644

> --- a/kernel/sched/fair.c

> +++ b/kernel/sched/fair.c

> @@ -5376,6 +5376,15 @@ static struct {

>   

>   #endif /* CONFIG_NO_HZ_COMMON */

>   

> +/* CPU only has SCHED_IDLE tasks enqueued */

> +static int sched_idle_cpu(int cpu)

> +{

> +	struct rq *rq = cpu_rq(cpu);

> +

> +	return unlikely(rq->nr_running == rq->cfs.idle_h_nr_running &&

> +			rq->nr_running);

> +}

> +

Shouldn't this check if rq->curr is also sched idle? And why not drop the
rq->nr_running non zero check?
Viresh Kumar July 1, 2019, 8:03 a.m. | #2
On 28-06-19, 18:16, Subhra Mazumdar wrote:
> 

> On 6/25/19 10:06 PM, Viresh Kumar wrote:

> > We try to find an idle CPU to run the next task, but in case we don't

> > find an idle CPU it is better to pick a CPU which will run the task the

> > soonest, for performance reason.

> > 

> > A CPU which isn't idle but has only SCHED_IDLE activity queued on it

> > should be a good target based on this criteria as any normal fair task

> > will most likely preempt the currently running SCHED_IDLE task

> > immediately. In fact, choosing a SCHED_IDLE CPU over a fully idle one

> > shall give better results as it should be able to run the task sooner

> > than an idle CPU (which requires to be woken up from an idle state).

> > 

> > This patch updates both fast and slow paths with this optimization.

> > 

> > Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

> > ---

> >   kernel/sched/fair.c | 43 +++++++++++++++++++++++++++++++++----------

> >   1 file changed, 33 insertions(+), 10 deletions(-)

> > 

> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c

> > index 1277adc3e7ed..2e0527fd468c 100644

> > --- a/kernel/sched/fair.c

> > +++ b/kernel/sched/fair.c

> > @@ -5376,6 +5376,15 @@ static struct {

> >   #endif /* CONFIG_NO_HZ_COMMON */

> > +/* CPU only has SCHED_IDLE tasks enqueued */

> > +static int sched_idle_cpu(int cpu)

> > +{

> > +	struct rq *rq = cpu_rq(cpu);

> > +

> > +	return unlikely(rq->nr_running == rq->cfs.idle_h_nr_running &&

> > +			rq->nr_running);

> > +}

> > +

> Shouldn't this check if rq->curr is also sched idle?


Why wouldn't the current set of checks be enough to guarantee that ?

> And why not drop the rq->nr_running non zero check?


Because CPU isn't sched-idle if nr_running and idle_h_nr_running are both 0,
i.e. it is an IDLE cpu in that case. And so I thought it is important to have
this check as well.

-- 
viresh
Subhra Mazumdar July 1, 2019, 10:08 p.m. | #3
On 7/1/19 1:03 AM, Viresh Kumar wrote:
> On 28-06-19, 18:16, Subhra Mazumdar wrote:

>> On 6/25/19 10:06 PM, Viresh Kumar wrote:

>>> We try to find an idle CPU to run the next task, but in case we don't

>>> find an idle CPU it is better to pick a CPU which will run the task the

>>> soonest, for performance reason.

>>>

>>> A CPU which isn't idle but has only SCHED_IDLE activity queued on it

>>> should be a good target based on this criteria as any normal fair task

>>> will most likely preempt the currently running SCHED_IDLE task

>>> immediately. In fact, choosing a SCHED_IDLE CPU over a fully idle one

>>> shall give better results as it should be able to run the task sooner

>>> than an idle CPU (which requires to be woken up from an idle state).

>>>

>>> This patch updates both fast and slow paths with this optimization.

>>>

>>> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

>>> ---

>>>    kernel/sched/fair.c | 43 +++++++++++++++++++++++++++++++++----------

>>>    1 file changed, 33 insertions(+), 10 deletions(-)

>>>

>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c

>>> index 1277adc3e7ed..2e0527fd468c 100644

>>> --- a/kernel/sched/fair.c

>>> +++ b/kernel/sched/fair.c

>>> @@ -5376,6 +5376,15 @@ static struct {

>>>    #endif /* CONFIG_NO_HZ_COMMON */

>>> +/* CPU only has SCHED_IDLE tasks enqueued */

>>> +static int sched_idle_cpu(int cpu)

>>> +{

>>> +	struct rq *rq = cpu_rq(cpu);

>>> +

>>> +	return unlikely(rq->nr_running == rq->cfs.idle_h_nr_running &&

>>> +			rq->nr_running);

>>> +}

>>> +

>> Shouldn't this check if rq->curr is also sched idle?

> Why wouldn't the current set of checks be enough to guarantee that ?

I thought nr_running does not include the on-cpu thread.
>

>> And why not drop the rq->nr_running non zero check?

> Because CPU isn't sched-idle if nr_running and idle_h_nr_running are both 0,

> i.e. it is an IDLE cpu in that case. And so I thought it is important to have

> this check as well.

>

idle_cpu() not only checks nr_running is 0 but also rq->curr == rq->idle
Peter Zijlstra July 2, 2019, 8:35 a.m. | #4
On Mon, Jul 01, 2019 at 03:08:41PM -0700, Subhra Mazumdar wrote:
> On 7/1/19 1:03 AM, Viresh Kumar wrote:

> > On 28-06-19, 18:16, Subhra Mazumdar wrote:

> > > On 6/25/19 10:06 PM, Viresh Kumar wrote:


> > > > @@ -5376,6 +5376,15 @@ static struct {

> > > >    #endif /* CONFIG_NO_HZ_COMMON */

> > > > +/* CPU only has SCHED_IDLE tasks enqueued */

> > > > +static int sched_idle_cpu(int cpu)

> > > > +{

> > > > +	struct rq *rq = cpu_rq(cpu);

> > > > +

> > > > +	return unlikely(rq->nr_running == rq->cfs.idle_h_nr_running &&

> > > > +			rq->nr_running);

> > > > +}

> > > > +

> > > Shouldn't this check if rq->curr is also sched idle?

> > Why wouldn't the current set of checks be enough to guarantee that ?

> I thought nr_running does not include the on-cpu thread.


It very much does.

> > > And why not drop the rq->nr_running non zero check?

> > Because CPU isn't sched-idle if nr_running and idle_h_nr_running are both 0,

> > i.e. it is an IDLE cpu in that case. And so I thought it is important to have

> > this check as well.

> > 

> idle_cpu() not only checks nr_running is 0 but also rq->curr == rq->idle


idle_cpu() will try very hard to declare a CPU !idle. But I don't see
how that it relevant. sched_idle_cpu() will only return true if there
are only SCHED_IDLE tasks on the CPU. Viresh's test is simple and
straight forward.
Subhra Mazumdar July 2, 2019, 4:32 p.m. | #5
On 7/2/19 1:35 AM, Peter Zijlstra wrote:
> On Mon, Jul 01, 2019 at 03:08:41PM -0700, Subhra Mazumdar wrote:

>> On 7/1/19 1:03 AM, Viresh Kumar wrote:

>>> On 28-06-19, 18:16, Subhra Mazumdar wrote:

>>>> On 6/25/19 10:06 PM, Viresh Kumar wrote:

>>>>> @@ -5376,6 +5376,15 @@ static struct {

>>>>>     #endif /* CONFIG_NO_HZ_COMMON */

>>>>> +/* CPU only has SCHED_IDLE tasks enqueued */

>>>>> +static int sched_idle_cpu(int cpu)

>>>>> +{

>>>>> +	struct rq *rq = cpu_rq(cpu);

>>>>> +

>>>>> +	return unlikely(rq->nr_running == rq->cfs.idle_h_nr_running &&

>>>>> +			rq->nr_running);

>>>>> +}

>>>>> +

>>>> Shouldn't this check if rq->curr is also sched idle?

>>> Why wouldn't the current set of checks be enough to guarantee that ?

>> I thought nr_running does not include the on-cpu thread.

> It very much does.

>

>>>> And why not drop the rq->nr_running non zero check?

>>> Because CPU isn't sched-idle if nr_running and idle_h_nr_running are both 0,

>>> i.e. it is an IDLE cpu in that case. And so I thought it is important to have

>>> this check as well.

>>>

>> idle_cpu() not only checks nr_running is 0 but also rq->curr == rq->idle

> idle_cpu() will try very hard to declare a CPU !idle. But I don't see

> how that it relevant. sched_idle_cpu() will only return true if there

> are only SCHED_IDLE tasks on the CPU. Viresh's test is simple and

> straight forward.


OK makes sense.

Thanks,
Subhra

Patch

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1277adc3e7ed..2e0527fd468c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5376,6 +5376,15 @@  static struct {
 
 #endif /* CONFIG_NO_HZ_COMMON */
 
+/* CPU only has SCHED_IDLE tasks enqueued */
+static int sched_idle_cpu(int cpu)
+{
+	struct rq *rq = cpu_rq(cpu);
+
+	return unlikely(rq->nr_running == rq->cfs.idle_h_nr_running &&
+			rq->nr_running);
+}
+
 static unsigned long cpu_runnable_load(struct rq *rq)
 {
 	return cfs_rq_runnable_load_avg(&rq->cfs);
@@ -5698,7 +5707,7 @@  find_idlest_group_cpu(struct sched_group *group, struct task_struct *p, int this
 	unsigned int min_exit_latency = UINT_MAX;
 	u64 latest_idle_timestamp = 0;
 	int least_loaded_cpu = this_cpu;
-	int shallowest_idle_cpu = -1;
+	int shallowest_idle_cpu = -1, si_cpu = -1;
 	int i;
 
 	/* Check if we have any choice: */
@@ -5729,7 +5738,12 @@  find_idlest_group_cpu(struct sched_group *group, struct task_struct *p, int this
 				latest_idle_timestamp = rq->idle_stamp;
 				shallowest_idle_cpu = i;
 			}
-		} else if (shallowest_idle_cpu == -1) {
+		} else if (shallowest_idle_cpu == -1 && si_cpu == -1) {
+			if (sched_idle_cpu(i)) {
+				si_cpu = i;
+				continue;
+			}
+
 			load = cpu_runnable_load(cpu_rq(i));
 			if (load < min_load) {
 				min_load = load;
@@ -5738,7 +5752,11 @@  find_idlest_group_cpu(struct sched_group *group, struct task_struct *p, int this
 		}
 	}
 
-	return shallowest_idle_cpu != -1 ? shallowest_idle_cpu : least_loaded_cpu;
+	if (shallowest_idle_cpu != -1)
+		return shallowest_idle_cpu;
+	if (si_cpu != -1)
+		return si_cpu;
+	return least_loaded_cpu;
 }
 
 static inline int find_idlest_cpu(struct sched_domain *sd, struct task_struct *p,
@@ -5891,7 +5909,7 @@  static int select_idle_core(struct task_struct *p, struct sched_domain *sd, int
  */
 static int select_idle_smt(struct task_struct *p, int target)
 {
-	int cpu;
+	int cpu, si_cpu = -1;
 
 	if (!static_branch_likely(&sched_smt_present))
 		return -1;
@@ -5901,9 +5919,11 @@  static int select_idle_smt(struct task_struct *p, int target)
 			continue;
 		if (available_idle_cpu(cpu))
 			return cpu;
+		if (si_cpu == -1 && sched_idle_cpu(cpu))
+			si_cpu = cpu;
 	}
 
-	return -1;
+	return si_cpu;
 }
 
 #else /* CONFIG_SCHED_SMT */
@@ -5931,7 +5951,7 @@  static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 	u64 avg_cost, avg_idle;
 	u64 time, cost;
 	s64 delta;
-	int cpu, nr = INT_MAX;
+	int cpu, nr = INT_MAX, si_cpu = -1;
 	int this = smp_processor_id();
 
 	this_sd = rcu_dereference(*this_cpu_ptr(&sd_llc));
@@ -5960,11 +5980,13 @@  static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
 
 	for_each_cpu_wrap(cpu, sched_domain_span(sd), target) {
 		if (!--nr)
-			return -1;
+			return si_cpu;
 		if (!cpumask_test_cpu(cpu, p->cpus_ptr))
 			continue;
 		if (available_idle_cpu(cpu))
 			break;
+		if (si_cpu == -1 && sched_idle_cpu(cpu))
+			si_cpu = cpu;
 	}
 
 	time = cpu_clock(this) - time;
@@ -5983,13 +6005,14 @@  static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	struct sched_domain *sd;
 	int i, recent_used_cpu;
 
-	if (available_idle_cpu(target))
+	if (available_idle_cpu(target) || sched_idle_cpu(target))
 		return target;
 
 	/*
 	 * If the previous CPU is cache affine and idle, don't be stupid:
 	 */
-	if (prev != target && cpus_share_cache(prev, target) && available_idle_cpu(prev))
+	if (prev != target && cpus_share_cache(prev, target) &&
+	    (available_idle_cpu(prev) || sched_idle_cpu(prev)))
 		return prev;
 
 	/* Check a recently used CPU as a potential idle candidate: */
@@ -5997,7 +6020,7 @@  static int select_idle_sibling(struct task_struct *p, int prev, int target)
 	if (recent_used_cpu != prev &&
 	    recent_used_cpu != target &&
 	    cpus_share_cache(recent_used_cpu, target) &&
-	    available_idle_cpu(recent_used_cpu) &&
+	    (available_idle_cpu(recent_used_cpu) || sched_idle_cpu(recent_used_cpu)) &&
 	    cpumask_test_cpu(p->recent_used_cpu, p->cpus_ptr)) {
 		/*
 		 * Replace recent_used_cpu with prev as it is a potential