diff mbox

sched: fix nohz.next_balance update

Message ID 1438336149-25809-1-git-send-email-vincent.guittot@linaro.org
State New
Headers show

Commit Message

Vincent Guittot July 31, 2015, 9:49 a.m. UTC
Since commit d4573c3e1c99 ("sched: Improve load balancing in the presence
of idle CPUs"), the ILB CPU starts with the idle load balancing of other
idle CPUs and finishes with itself in order to speed up the spread of tasks
in all idle CPUs.

The this_rq->next_balance is still used in nohz_idle_balance as an
intermediate step to gather the shortest next balance before updating
nohz.next_balance. But the former has not been updated yet and is likely to
be set with the current jiffies. As a result, the nohz.next_balance will be
set with current jiffies instead of the real next balance date. This
generates spurious kicks of nohz ilde balance.

nohz_idle_balance must set the nohz.next_balance without taking into
account this_rq->next_balance which is not updated yet. Then, this_rq will
update nohz.next_update with its next_balance once updated and if necessary.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 33 +++++++++++++++++++++++++++++----
 1 file changed, 29 insertions(+), 4 deletions(-)

Comments

Vincent Guittot Aug. 3, 2015, 8:44 a.m. UTC | #1
Hi Jason,

On 1 August 2015 at 00:12, Jason Low <jason.low2@hp.com> wrote:
> Hi Vincent,
>
> On Fri, 2015-07-31 at 11:49 +0200, Vincent Guittot wrote:
>> Since commit d4573c3e1c99 ("sched: Improve load balancing in the presence
>> of idle CPUs"), the ILB CPU starts with the idle load balancing of other
>> idle CPUs and finishes with itself in order to speed up the spread of tasks
>> in all idle CPUs.
>>
>> The this_rq->next_balance is still used in nohz_idle_balance as an
>> intermediate step to gather the shortest next balance before updating
>> nohz.next_balance. But the former has not been updated yet and is likely to
>> be set with the current jiffies. As a result, the nohz.next_balance will be
>> set with current jiffies instead of the real next balance date. This
>> generates spurious kicks of nohz ilde balance.
>>
>> nohz_idle_balance must set the nohz.next_balance without taking into
>> account this_rq->next_balance which is not updated yet. Then, this_rq will
>> update nohz.next_update with its next_balance once updated and if necessary.
>>
>> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
>> ---
>>  kernel/sched/fair.c | 33 +++++++++++++++++++++++++++++----
>>  1 file changed, 29 insertions(+), 4 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 587a2f6..2b02089 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -7779,8 +7779,21 @@ static void rebalance_domains(struct rq *rq, enum cpu_idle_type idle)
>>        * When the cpu is attached to null domain for ex, it will not be
>>        * updated.
>>        */
>> -     if (likely(update_next_balance))
>> +     if (likely(update_next_balance)) {
>>               rq->next_balance = next_balance;
>> +
>> +             /*
>> +              * If this cpu has been elected to perform the nohz idle
>> +              * balance. Other idle cpus have already rebalance with
>> +              * nohz_idle_balance and the nohz.next_balaance has been
>> +              * updated accordingly. This cpu has now run the idle load
>> +              * balance for itself and we need to update the
>> +              * nohz.next_balance accordingly.
>> +              */
>> +             if ((idle == CPU_IDLE) &&
>> +                     time_after(nohz.next_balance, rq->next_balance))
>> +                             nohz.next_balance = rq->next_balance;
>
> The code accessing the "nohz" structure may also need to be put in an:
>
>     #ifdef CONFIG_NO_HZ_COMMON
>
> in order for it to compile when !CONFIG_NO_HZ_COMMON.

You're right. I'm going to fix that right now


Vincent

>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
diff mbox

Patch

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 587a2f6..2b02089 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7779,8 +7779,21 @@  static void rebalance_domains(struct rq *rq, enum cpu_idle_type idle)
 	 * When the cpu is attached to null domain for ex, it will not be
 	 * updated.
 	 */
-	if (likely(update_next_balance))
+	if (likely(update_next_balance)) {
 		rq->next_balance = next_balance;
+
+		/*
+		 * If this cpu has been elected to perform the nohz idle
+		 * balance. Other idle cpus have already rebalance with
+		 * nohz_idle_balance and the nohz.next_balaance has been
+		 * updated accordingly. This cpu has now run the idle load
+		 * balance for itself and we need to update the
+		 * nohz.next_balance accordingly.
+		 */
+		if ((idle == CPU_IDLE) &&
+			time_after(nohz.next_balance, rq->next_balance))
+				nohz.next_balance = rq->next_balance;
+	}
 }
 
 #ifdef CONFIG_NO_HZ_COMMON
@@ -7793,6 +7806,9 @@  static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
 	int this_cpu = this_rq->cpu;
 	struct rq *rq;
 	int balance_cpu;
+	/* Earliest time when we have to do rebalance again */
+	unsigned long next_balance = jiffies + 60*HZ;
+	int update_next_balance = 0;
 
 	if (idle != CPU_IDLE ||
 	    !test_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu)))
@@ -7824,10 +7840,19 @@  static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
 			rebalance_domains(rq, CPU_IDLE);
 		}
 
-		if (time_after(this_rq->next_balance, rq->next_balance))
-			this_rq->next_balance = rq->next_balance;
+		if (time_after(next_balance, rq->next_balance)) {
+			next_balance = rq->next_balance;
+			update_next_balance = 1;
+		}
 	}
-	nohz.next_balance = this_rq->next_balance;
+
+	/*
+	 * next_balance will be updated only when there is a need.
+	 * When the cpu is attached to null domain for ex, it will not be
+	 * updated.
+	 */
+	if (likely(update_next_balance))
+		nohz.next_balance = next_balance;
 end:
 	clear_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu));
 }