From patchwork Wed Feb 14 15:26:44 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Vincent Guittot <vincent.guittot@linaro.org>
X-Patchwork-Id: 128364
Delivered-To: patch@linaro.org
Received: by 10.46.124.24 with SMTP id x24csp727691ljc;
 Wed, 14 Feb 2018 07:27:43 -0800 (PST)
X-Google-Smtp-Source: AH8x226XVKe1W+6H/jdQOx0auP9EYFxaS224ugoSib5tZyxJ6jJj0KqlSqRcy/D9Uhjtxiby2g2F
X-Received: by 10.101.92.66 with SMTP id v2mr4206797pgr.341.1518622063514;
 Wed, 14 Feb 2018 07:27:43 -0800 (PST)
ARC-Seal: i=1; a=rsa-sha256; t=1518622063; cv=none;
 d=google.com; s=arc-20160816;
 b=SyZDhjnrdOYJwdlViR+UEWXrhTDUlwqis0X8NrlJlDlcA71jeyCyWiBsXjOmFTwi7c
 N+gpoCn1Zc5ZfIMehVtux3sP2KC6DnsoXEkK1CXqWnyKSO0RybAqq0SFroICklk7s5bm
 iyQQQj9OQUr0iCXtY98ytazX60U4hKAfBxpXiL4MpMeTo2Do4BjHpgd2Yy/4JawvYozX
 pOBqfCMtFQvj5Exsgr+QacleyZ51Gvz2+9yAmRxJA32lxNXhriDWgA6JCsw2Qki97dZC
 PaAEkPvkbuaL8sPs8DXhx3z8TCPPksAOMK6smqlALIkEmhgSWHDKLT36g9nogKrAY1WB
 w60w==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
 s=arc-20160816; 
 h=list-id:precedence:sender:references:in-reply-to:message-id:date
 :subject:cc:to:from:dkim-signature:arc-authentication-results;
 bh=ychFB2x+Ocf7MMtzUI1dfCI/hUxDhn5G7pssdtHsoo4=;
 b=kyhTREmqkVlmrcbgJ13zlxsmtJQkl3HM/jE/fCxwnIO57j/8oeDBNr+BeJIG1U9MHQ
 MxhO6wqxS38NgElqU2ZUWMtp6eH+T7bsQY0bHc2dOMNMVGdwmAPgYEMP7wvgJKIuMiGg
 C4m39guO4md7uFLkHuV1QV7JLg4J+YR6B1Wg6ic1PFoNntX2OM0P8/igYuADoeF5XoxD
 DjyuxUPu1syXJLx+rp5tlOoKZbK1KnbFpuyYbb73nXIcIPUi1yuNPuQyFpTSBIkVdfo9
 I+AkYqNV0JEGOh2lfIdbJAAE0oQ1pONrOxP+ZhVnWqoUIZPPFv5mPWJaBXMTWet00hvy
 U3vg==
ARC-Authentication-Results: i=1; mx.google.com;
 dkim=pass header.i=@linaro.org header.s=google header.b=js1/XRBP;
 spf=pass (google.com: best guess record for domain of
 linux-kernel-owner@vger.kernel.org designates 209.132.180.67
 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org; 
 dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67])
 by mx.google.com with ESMTP id
 h1-v6si9153681pld.637.2018.02.14.07.27.43; 
 Wed, 14 Feb 2018 07:27:43 -0800 (PST)
Received-SPF: pass (google.com: best guess record for domain of
 linux-kernel-owner@vger.kernel.org designates 209.132.180.67
 as permitted sender) client-ip=209.132.180.67; 
Authentication-Results: mx.google.com;
 dkim=pass header.i=@linaro.org header.s=google header.b=js1/XRBP;
 spf=pass (google.com: best guess record for domain of
 linux-kernel-owner@vger.kernel.org designates 209.132.180.67
 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org; 
 dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S1031351AbeBNP1h (ORCPT <rfc822; dan.rue@linaro.org> + 28 others); 
 Wed, 14 Feb 2018 10:27:37 -0500
Received: from mail-wm0-f65.google.com ([74.125.82.65]:56028 "EHLO
 mail-wm0-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1031296AbeBNP0x (ORCPT
 <rfc822;linux-kernel@vger.kernel.org>);
 Wed, 14 Feb 2018 10:26:53 -0500
Received: by mail-wm0-f65.google.com with SMTP id a84so12095020wmi.5
 for <linux-kernel@vger.kernel.org>;
 Wed, 14 Feb 2018 07:26:53 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; 
 h=from:to:cc:subject:date:message-id:in-reply-to:references;
 bh=ychFB2x+Ocf7MMtzUI1dfCI/hUxDhn5G7pssdtHsoo4=;
 b=js1/XRBP1KEJOvGPx4HNw9DA6wRrDaA2sa06LRtYGrUC7eyBiFxp/21R85DWM/GjEm
 reVoVzQJKNSJ5qHNbB8j5ntS5WBvaxPoOXEFw/ImAJ2fOw2TRaD1FtKlbsgqxm4K3SPp
 DW0l34xEpsLS4NYkbXjUtS2xYG4jcnVoNdH4o=
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to
 :references;
 bh=ychFB2x+Ocf7MMtzUI1dfCI/hUxDhn5G7pssdtHsoo4=;
 b=WoVUq6ezjytBGIM5BYKcr1+BdaUCn/0VXxBDcyoRaYrl/MT4FEVMf5RlElO0UYqHUy
 p3ln7WHqVqmmdk8XfrLtYaqWD0wAkY2voRksT+T+wPg6AoBWs5YZ7PqLDybCguBcNW9/
 Yz2u9kGE/CocU4zUd4q8tkc6ry3nCHRyX3KqwKwGi3/mF8rwZuDVupcaA76jh+jfGLIS
 LF9Ks0zHOhR+KMiAgsDnQLg227tjtbfpCfWjvhkKL24rPVYS5F+fEcFK+IauBwH5QhqU
 WpBqFwIoR315RNtuGzL4uaKcTy98GQ0PNY0dXLXAXB3Scyys0CNrLpQv3hcw0TyKdWq8
 nfMw==
X-Gm-Message-State: APf1xPDE5xv+1DERaDa4mS8ctIC49WmK2jbCemhCWi6FaKPTQuv8cEnO
 v/SQq+DwjcpgCyK/X5CLZhJ6TQ==
X-Received: by 10.80.186.142 with SMTP id x14mr7649797ede.210.1518622011969; 
 Wed, 14 Feb 2018 07:26:51 -0800 (PST)
Received: from localhost.localdomain ([2a01:e0a:f:6020:f0d0:a4bc:1106:ff2c])
 by smtp.gmail.com with ESMTPSA id
 p93sm3361152edp.63.2018.02.14.07.26.50
 (version=TLS1_2 cipher=ECDHE-RSA-AES128-SHA bits=128/128);
 Wed, 14 Feb 2018 07:26:51 -0800 (PST)
From: Vincent Guittot <vincent.guittot@linaro.org>
To: peterz@infradead.org, mingo@kernel.org,
 linux-kernel@vger.kernel.org, valentin.schneider@arm.com
Cc: morten.rasmussen@foss.arm.com, brendan.jackman@arm.com,
 dietmar.eggemann@arm.com, Vincent Guittot <vincent.guittot@linaro.org>
Subject: [PATCH v5 1/3] sched: Stop nohz stats when decayed
Date: Wed, 14 Feb 2018 16:26:44 +0100
Message-Id: <1518622006-16089-2-git-send-email-vincent.guittot@linaro.org>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1518622006-16089-1-git-send-email-vincent.guittot@linaro.org>
References: <1518622006-16089-1-git-send-email-vincent.guittot@linaro.org>
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Stop the periodic update of blocked load once the load of all idle CPUs has
fully decayed. We introduce a new nohz.has_blocked flag that reflects whether
some idle CPUs have blocked load that has to be periodically updated.
nohz.has_blocked is set every time an idle CPU can have blocked load and is
cleared when no more blocked load has been detected during an update. We
don't need atomic operations, only to make sure of the right ordering when
updating nohz.idle_cpus_mask and nohz.has_blocked.

Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c  | 122 ++++++++++++++++++++++++++++++++++++++++++---------
 kernel/sched/sched.h |   1 +
 2 files changed, 102 insertions(+), 21 deletions(-)
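
The set/clear protocol for nohz.has_blocked can be sketched outside the
kernel. What follows is a minimal single-threaded user-space model, not the
kernel code: struct nohz_model, enter_idle() and update_idle_cpus() are
hypothetical names, C11 seq_cst atomics stand in for WRITE_ONCE() and the
memory barriers, and halving a counter stands in for the real load decay.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Hypothetical stand-in for the nohz state plus per-rq fields. */
struct nohz_model {
	atomic_uint idle_mask;		/* bit n set: CPU n is idle */
	atomic_int has_blocked;		/* some idle CPU may have blocked load */
	int cpu_blocked[NR_CPUS];	/* models rq->has_blocked_load */
	unsigned long load[NR_CPUS];	/* stand-in for a decaying average */
};

/* Idle entry: publish the CPU in the mask first, then raise the flag. */
static void enter_idle(struct nohz_model *nz, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);

	nz->cpu_blocked[cpu] = 1;
	/* seq_cst read-modify-write: orders the mask store before the flag */
	atomic_fetch_or(&nz->idle_mask, 1u << cpu);
	atomic_store(&nz->has_blocked, 1);
}

/* One update pass. Returns true while some idle CPU still has load. */
static bool update_idle_cpus(struct nohz_model *nz)
{
	bool still_blocked = false;
	unsigned int mask;

	/* Optimistically clear the flag, then read the mask. */
	atomic_store(&nz->has_blocked, 0);
	mask = atomic_load(&nz->idle_mask);

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!(mask & (1u << cpu)))
			continue;

		nz->load[cpu] /= 2;		/* toy decay step */
		if (nz->load[cpu] == 0)
			nz->cpu_blocked[cpu] = 0;
		still_blocked |= nz->cpu_blocked[cpu];
	}

	/* Load remains somewhere: keep the periodic update enabled. */
	if (still_blocked)
		atomic_store(&nz->has_blocked, 1);

	return still_blocked;
}
```

In this model a CPU that enters idle with a load of 4 keeps the flag set for
two passes and lets it drop on the third. A CPU that enters idle while a pass
is running either shows up in the mask read by that pass or sets the flag
after the pass cleared it, so its load is not silently skipped.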

-- 
2.7.4

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7af1fa9..5a6835e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5383,8 +5383,9 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx)
 static struct {
 	cpumask_var_t idle_cpus_mask;
 	atomic_t nr_cpus;
+	int has_blocked;		/* Idle CPUs have blocked load */
 	unsigned long next_balance;     /* in jiffy units */
-	unsigned long next_stats;
+	unsigned long next_blocked;	/* Next update of blocked load in jiffies */
 } nohz ____cacheline_aligned;
 
 #endif /* CONFIG_NO_HZ_COMMON */
@@ -6951,6 +6952,7 @@ enum fbq_type { regular, remote, all };
 #define LBF_DST_PINNED  0x04
 #define LBF_SOME_PINNED	0x08
 #define LBF_NOHZ_STATS	0x10
+#define LBF_NOHZ_AGAIN	0x20
 
 struct lb_env {
 	struct sched_domain	*sd;
@@ -7335,8 +7337,6 @@ static void attach_tasks(struct lb_env *env)
 	rq_unlock(env->dst_rq, &rf);
 }
 
-#ifdef CONFIG_FAIR_GROUP_SCHED
-
 static inline bool cfs_rq_is_decayed(struct cfs_rq *cfs_rq)
 {
 	if (cfs_rq->load.weight)
@@ -7354,11 +7354,14 @@ static inline bool cfs_rq_is_decayed(struct cfs_rq *cfs_rq)
 	return true;
 }
 
+#ifdef CONFIG_FAIR_GROUP_SCHED
+
 static void update_blocked_averages(int cpu)
 {
 	struct rq *rq = cpu_rq(cpu);
 	struct cfs_rq *cfs_rq, *pos;
 	struct rq_flags rf;
+	bool done = true;
 
 	rq_lock_irqsave(rq, &rf);
 	update_rq_clock(rq);
@@ -7388,10 +7391,14 @@ static void update_blocked_averages(int cpu)
 		 */
 		if (cfs_rq_is_decayed(cfs_rq))
 			list_del_leaf_cfs_rq(cfs_rq);
+		else
+			done = false;
 	}
 
 #ifdef CONFIG_NO_HZ_COMMON
 	rq->last_blocked_load_update_tick = jiffies;
+	if (done)
+		rq->has_blocked_load = 0;
 #endif
 	rq_unlock_irqrestore(rq, &rf);
 }
@@ -7454,6 +7461,8 @@ static inline void update_blocked_averages(int cpu)
 	update_cfs_rq_load_avg(cfs_rq_clock_task(cfs_rq), cfs_rq);
 #ifdef CONFIG_NO_HZ_COMMON
 	rq->last_blocked_load_update_tick = jiffies;
+	if (cfs_rq_is_decayed(cfs_rq))
+		rq->has_blocked_load = 0;
 #endif
 	rq_unlock_irqrestore(rq, &rf);
 }
@@ -7789,18 +7798,25 @@ group_type group_classify(struct sched_group *group,
 	return group_other;
 }
 
-static void update_nohz_stats(struct rq *rq)
+static bool update_nohz_stats(struct rq *rq)
 {
 #ifdef CONFIG_NO_HZ_COMMON
 	unsigned int cpu = rq->cpu;
 
+	if (!rq->has_blocked_load)
+		return false;
+
 	if (!cpumask_test_cpu(cpu, nohz.idle_cpus_mask))
-		return;
+		return false;
 
 	if (!time_after(jiffies, rq->last_blocked_load_update_tick))
-		return;
+		return true;
 
 	update_blocked_averages(cpu);
+
+	return rq->has_blocked_load;
+#else
+	return false;
 #endif
 }
 
@@ -7826,8 +7842,8 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 	for_each_cpu_and(i, sched_group_span(group), env->cpus) {
 		struct rq *rq = cpu_rq(i);
 
-		if (env->flags & LBF_NOHZ_STATS)
-			update_nohz_stats(rq);
+		if ((env->flags & LBF_NOHZ_STATS) && update_nohz_stats(rq))
+			env->flags |= LBF_NOHZ_AGAIN;
 
 		/* Bias balancing toward cpus of our domain */
 		if (local_group)
@@ -7979,18 +7995,17 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
 	struct sg_lb_stats *local = &sds->local_stat;
 	struct sg_lb_stats tmp_sgs;
 	int load_idx, prefer_sibling = 0;
+#ifdef CONFIG_NO_HZ_COMMON
+	int has_blocked = READ_ONCE(nohz.has_blocked);
+#endif
 	bool overload = false;
 
 	if (child && child->flags & SD_PREFER_SIBLING)
 		prefer_sibling = 1;
 
 #ifdef CONFIG_NO_HZ_COMMON
-	if (env->idle == CPU_NEWLY_IDLE) {
+	if (env->idle == CPU_NEWLY_IDLE && has_blocked)
 		env->flags |= LBF_NOHZ_STATS;
-
-		if (cpumask_subset(nohz.idle_cpus_mask, sched_domain_span(env->sd)))
-			nohz.next_stats = jiffies + msecs_to_jiffies(LOAD_AVG_PERIOD);
-	}
 #endif
 
 	load_idx = get_sd_load_idx(env->sd, env->idle);
@@ -8046,6 +8061,15 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
 		sg = sg->next;
 	} while (sg != env->sd->groups);
 
+#ifdef CONFIG_NO_HZ_COMMON
+	if ((env->flags & LBF_NOHZ_AGAIN) &&
+	    cpumask_subset(nohz.idle_cpus_mask, sched_domain_span(env->sd))) {
+
+		WRITE_ONCE(nohz.next_blocked,
+				jiffies + msecs_to_jiffies(LOAD_AVG_PERIOD));
+	}
+#endif
+
 	if (env->sd->flags & SD_NUMA)
 		env->fbq_type = fbq_classify_group(&sds->busiest_stat);
 
@@ -9069,6 +9093,8 @@ static void nohz_balancer_kick(struct rq *rq)
 	struct sched_domain *sd;
 	int nr_busy, i, cpu = rq->cpu;
 	unsigned int flags = 0;
+	unsigned long has_blocked = READ_ONCE(nohz.has_blocked);
+	unsigned long next_blocked = READ_ONCE(nohz.next_blocked);
 
 	if (unlikely(rq->idle_balance))
 		return;
@@ -9086,7 +9112,7 @@ static void nohz_balancer_kick(struct rq *rq)
 	if (likely(!atomic_read(&nohz.nr_cpus)))
 		return;
 
-	if (time_after(now, nohz.next_stats))
+	if (time_after(now, next_blocked) && has_blocked)
 		flags = NOHZ_STATS_KICK;
 
 	if (time_before(now, nohz.next_balance))
@@ -9207,13 +9233,26 @@ void nohz_balance_enter_idle(int cpu)
 	if (!housekeeping_cpu(cpu, HK_FLAG_SCHED))
 		return;
 
+	/*
+	 * Can be set safely without rq->lock held
+	 * If a clear happens, it will have evaluated last additions because
+	 * rq->lock is held during the check and the clear
+	 */
+	rq->has_blocked_load = 1;
+
+	/*
+	 * The tick is still stopped but load could have been added in the
+	 * meantime. We set the nohz.has_blocked flag to trigger a check of the
+	 * *_avg. The CPU is already part of nohz.idle_cpus_mask so the clear
+	 * of nohz.has_blocked can only happen after checking the new load
+	 */
 	if (rq->nohz_tick_stopped)
-		return;
+		goto out;
 
 	/*
 	 * If we're a completely isolated CPU, we don't play.
 	 */
-	if (on_null_domain(cpu_rq(cpu)))
+	if (on_null_domain(rq))
 		return;
 
 	rq->nohz_tick_stopped = 1;
@@ -9221,7 +9260,21 @@ void nohz_balance_enter_idle(int cpu)
 	cpumask_set_cpu(cpu, nohz.idle_cpus_mask);
 	atomic_inc(&nohz.nr_cpus);
 
+	/*
+	 * Ensures that if nohz_idle_balance() fails to observe our
+	 * @idle_cpus_mask store, it must observe the @has_blocked
+	 * store.
+	 */
+	smp_mb__after_atomic();
+
 	set_cpu_sd_state_idle(cpu);
+
+out:
+	/*
+	 * Each time a cpu enters idle, we assume that it has blocked load and
+	 * enable the periodic update of the load of idle cpus
+	 */
+	WRITE_ONCE(nohz.has_blocked, 1);
 }
 #else
 static inline void nohz_balancer_kick(struct rq *rq) { }
@@ -9355,7 +9408,7 @@ static bool nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
 	/* Earliest time when we have to do rebalance again */
 	unsigned long now = jiffies;
 	unsigned long next_balance = now + 60*HZ;
-	unsigned long next_stats = now + msecs_to_jiffies(LOAD_AVG_PERIOD);
+	bool has_blocked_load = false;
 	int update_next_balance = 0;
 	int this_cpu = this_rq->cpu;
 	unsigned int flags;
@@ -9374,6 +9427,22 @@ static bool nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
 
 	SCHED_WARN_ON((flags & NOHZ_KICK_MASK) == NOHZ_BALANCE_KICK);
 
+	/*
+	 * We assume there will be no idle load after this update and clear
+	 * the has_blocked flag. If a cpu enters idle in the meantime, it will
+	 * set the has_blocked flag and trigger another update of idle load.
+	 * Because a cpu that becomes idle is added to idle_cpus_mask before
+	 * setting the flag, we are sure to not clear the state and not
+	 * check the load of an idle cpu.
+	 */
+	WRITE_ONCE(nohz.has_blocked, 0);
+
+	/*
+	 * Ensures that if we miss the CPU, we must see the has_blocked
+	 * store from nohz_balance_enter_idle().
+	 */
+	smp_mb();
+
 	for_each_cpu(balance_cpu, nohz.idle_cpus_mask) {
 		if (balance_cpu == this_cpu || !idle_cpu(balance_cpu))
 			continue;
@@ -9383,11 +9452,16 @@ static bool nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
 		 * work being done for other cpus. Next load
 		 * balancing owner will pick it up.
 		 */
-		if (need_resched())
-			break;
+		if (need_resched()) {
+			has_blocked_load = true;
+			goto abort;
+		}
 
 		rq = cpu_rq(balance_cpu);
 
+		update_blocked_averages(rq->cpu);
+		has_blocked_load |= rq->has_blocked_load;
+
 		/*
 		 * If time for next balance is due,
 		 * do the balance.
@@ -9400,7 +9474,6 @@ static bool nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
 			cpu_load_update_idle(rq);
 			rq_unlock_irq(rq, &rf);
 
-			update_blocked_averages(rq->cpu);
 			if (flags & NOHZ_BALANCE_KICK)
 				rebalance_domains(rq, CPU_IDLE);
 		}
@@ -9415,7 +9488,13 @@ static bool nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
 	if (flags & NOHZ_BALANCE_KICK)
 		rebalance_domains(this_rq, CPU_IDLE);
 
-	nohz.next_stats = next_stats;
+	WRITE_ONCE(nohz.next_blocked,
+		now + msecs_to_jiffies(LOAD_AVG_PERIOD));
+
+abort:
+	/* There is still blocked load, enable periodic update */
+	if (has_blocked_load)
+		WRITE_ONCE(nohz.has_blocked, 1);
 
 	/*
 	 * next_balance will be updated only when there is a need.
@@ -10046,6 +10125,7 @@ __init void init_sched_fair_class(void)
 
 #ifdef CONFIG_NO_HZ_COMMON
 	nohz.next_balance = jiffies;
+	nohz.next_blocked = jiffies;
 	zalloc_cpumask_var(&nohz.idle_cpus_mask, GFP_NOWAIT);
 #endif
 #endif /* SMP */
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index e200045..ad9b929 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -723,6 +723,7 @@ struct rq {
 #ifdef CONFIG_SMP
 	unsigned long last_load_update_tick;
 	unsigned long last_blocked_load_update_tick;
+	unsigned int has_blocked_load;
 #endif /* CONFIG_SMP */
 	unsigned int nohz_tick_stopped;
 	atomic_t nohz_flags;

From patchwork Wed Feb 14 15:26:45 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Vincent Guittot <vincent.guittot@linaro.org>
X-Patchwork-Id: 128362
Delivered-To: patch@linaro.org
Received: by 10.46.124.24 with SMTP id x24csp727135ljc;
 Wed, 14 Feb 2018 07:27:05 -0800 (PST)
X-Google-Smtp-Source: AH8x227g3ynKJGZRdZSGP6s13OeRRVT+LQXuIKlAF84gvvGgqM4T/ZTnMwJbTqsPR7kvsjEQFYme
X-Received: by 2002:a17:902:7182:: with SMTP id
 b2-v6mr4762888pll.331.1518622025609; 
 Wed, 14 Feb 2018 07:27:05 -0800 (PST)
ARC-Seal: i=1; a=rsa-sha256; t=1518622025; cv=none;
 d=google.com; s=arc-20160816;
 b=AHtFZyAs6bM8naP+eP0rTv+kaJY6uqOdB9ZpI0ztvz9Uk6Cvrfnd8Hm8WjR9gqau3B
 Ag7X/Hm/1pKrioUkfKTZggaBA5burJ6OgtZMq4uZmf3yXb3/RKWoqK821EO8o8gGOu49
 WbdGW24RMYKMA0+f4uWE+XdB6zObn/AlYaoNNJsDKeKoLqTlNGU6DOnA1QcK9tshR2sn
 v9z2ck5DP6gbG7CMo9Qx1c057cPbjSdGFdG+ijPO3oeTszrtaQOT5QOIuryp6VmQKq0B
 pfLdM9Uyb5oOnKJ1tsUmg5hHpyh0oNgcQuM2ceUFdi2z5g32vbtJH2QKafZqNraKVlH/
 d8Fw==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
 s=arc-20160816; 
 h=list-id:precedence:sender:references:in-reply-to:message-id:date
 :subject:cc:to:from:dkim-signature:arc-authentication-results;
 bh=0awVy9Eht9OU1vgZB1RJ/d7ScAU0JQVdWyx7O31zaGc=;
 b=xtjxrpFEBnSOmgHQGExKPZhqEUm1+aiq9ZLWl1cEpT12Bw8V1Auf7jV/IivfztwVUc
 xxqIP4K89psgHEE7KwKvzuzM0ZmL3ILDoOUG9heNzjkyO9u+PufbfbamdXIRGqC2yEmb
 Z8wfM+JRe3EJ6522/zZXcjM2ujq4EvxvaBh0fWzmJUAZhfgMQsgaRJT3n3jZ0tIe3MhM
 PvULr6+DEcZSKAhYd/yUlGGLFwiZgCB1hMyd+eGHXBlZ/RT7wWqeyJOynn+bcgBPwrZG
 hCPibWEPMinpbkp/dU4fWO8G9qevL2xZP0ZI1WxEQlXQ1c4jFcXDdnE57bbZ3Gga3M+B
 kS1A==
ARC-Authentication-Results: i=1; mx.google.com;
 dkim=pass header.i=@linaro.org header.s=google header.b=WRwHiAxs;
 spf=pass (google.com: best guess record for domain of
 linux-kernel-owner@vger.kernel.org designates 209.132.180.67
 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org; 
 dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67])
 by mx.google.com with ESMTP id i2si1140262pgq.601.2018.02.14.07.27.05;
 Wed, 14 Feb 2018 07:27:05 -0800 (PST)
Received-SPF: pass (google.com: best guess record for domain of
 linux-kernel-owner@vger.kernel.org designates 209.132.180.67
 as permitted sender) client-ip=209.132.180.67; 
Authentication-Results: mx.google.com;
 dkim=pass header.i=@linaro.org header.s=google header.b=WRwHiAxs;
 spf=pass (google.com: best guess record for domain of
 linux-kernel-owner@vger.kernel.org designates 209.132.180.67
 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org; 
 dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S1031341AbeBNP1C (ORCPT <rfc822; dan.rue@linaro.org> + 28 others); 
 Wed, 14 Feb 2018 10:27:02 -0500
Received: from mail-wm0-f67.google.com ([74.125.82.67]:38832 "EHLO
 mail-wm0-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1031300AbeBNP0y (ORCPT
 <rfc822;linux-kernel@vger.kernel.org>);
 Wed, 14 Feb 2018 10:26:54 -0500
Received: by mail-wm0-f67.google.com with SMTP id 141so22559095wme.3
 for <linux-kernel@vger.kernel.org>;
 Wed, 14 Feb 2018 07:26:54 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; 
 h=from:to:cc:subject:date:message-id:in-reply-to:references;
 bh=0awVy9Eht9OU1vgZB1RJ/d7ScAU0JQVdWyx7O31zaGc=;
 b=WRwHiAxsHGpu8DujO7SGUQOaywRgT3NrYx9QeJL/KpvFSdYCyjx5ZYs32B4Ty8OwEU
 Rq8SzehNpCk2KSOymWKHLfZQiK/Rnpt//MoY/AC8xIRWRbOY/5gBse83VXzazIW7EbQc
 9nlZ7DDpXV+YOe8lpKPuLEHaTIS2jqiFd1EYA=
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to
 :references;
 bh=0awVy9Eht9OU1vgZB1RJ/d7ScAU0JQVdWyx7O31zaGc=;
 b=q6QHCZJtw/6Ix5XzJLf2z2j/1PMJi+QzMflGDl53V+Oc6H3BRJXHp9/1RMUX/J/2AX
 hKh/3OoWqDloMKMCmhGSNLCR/ui5RkT1702CfK0glkLKoBgjB5SDGCdJdln88yB25Y/m
 1tQTHi8PNt+IiA/Na6ZxslgVaJqGiQA+qE2izBTnhaN3iYbYrr7lD4j03+Bfs/1v9ctl
 Bj0YNiIVBWWnAvnPXMxrzQxSQeOS3giGnHAWMA6d6SmaPBhj8sbkPeCHoknuJvdOKoD7
 QKfBDa3JHD4SV/Q0ADFjbTg6FLqHEFIdHev0/I3gKxT9VRktjRxP6TKY0gCKH8RAYaU1
 pfOA==
X-Gm-Message-State: APf1xPA4SaiAyNH8zNsmyxCRegRZCZfTOGaM6gGNGExYo/sF/3SlEP3a
 stR9wVTwqSGVwY/oJqNyMuCFyw==
X-Received: by 10.80.175.162 with SMTP id h31mr7639312edd.48.1518622013522; 
 Wed, 14 Feb 2018 07:26:53 -0800 (PST)
Received: from localhost.localdomain ([2a01:e0a:f:6020:f0d0:a4bc:1106:ff2c])
 by smtp.gmail.com with ESMTPSA id
 p93sm3361152edp.63.2018.02.14.07.26.52
 (version=TLS1_2 cipher=ECDHE-RSA-AES128-SHA bits=128/128);
 Wed, 14 Feb 2018 07:26:52 -0800 (PST)
From: Vincent Guittot <vincent.guittot@linaro.org>
To: peterz@infradead.org, mingo@kernel.org,
 linux-kernel@vger.kernel.org, valentin.schneider@arm.com
Cc: morten.rasmussen@foss.arm.com, brendan.jackman@arm.com,
 dietmar.eggemann@arm.com, Vincent Guittot <vincent.guittot@linaro.org>
Subject: [PATCH v5 2/3] sched: reduce the periodic update duration
Date: Wed, 14 Feb 2018 16:26:45 +0100
Message-Id: <1518622006-16089-3-git-send-email-vincent.guittot@linaro.org>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1518622006-16089-1-git-send-email-vincent.guittot@linaro.org>
References: <1518622006-16089-1-git-send-email-vincent.guittot@linaro.org>
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Instead of using cfs_rq_is_decayed(), which monitors all *_avg
and *_sum, we create cfs_rq_has_blocked(), which only takes care of
util_avg and load_avg. We are only interested in these two values, which
decay faster than the *_sum, so we can stop the periodic update earlier.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)
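
Why the *_avg values reach zero before the *_sum values can be illustrated
with a toy model. This is a sketch under simplifying assumptions, not the
kernel's PELT code: the sum is halved once per half-life (LOAD_AVG_PERIOD,
32 ms) and the average is taken as the integer quotient sum / LOAD_AVG_MAX.
struct decay_steps and half_lives_to_zero() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define LOAD_AVG_MAX 47742	/* maximum value a PELT sum converges to */

/* Number of half-lives until the average, then the sum, reads zero. */
struct decay_steps {
	unsigned int avg_zero;
	unsigned int sum_zero;
};

static struct decay_steps half_lives_to_zero(uint64_t sum)
{
	struct decay_steps s = { 0, 0 };
	unsigned int n = 0;
	int avg_done = 0;

	while (sum) {
		/* Integer division truncates: avg is 0 once sum < LOAD_AVG_MAX */
		if (!avg_done && sum / LOAD_AVG_MAX == 0) {
			s.avg_zero = n;
			avg_done = 1;
		}
		sum >>= 1;	/* one half-life of decay */
		n++;
	}

	assert(avg_done || n == 0);	/* only a zero input skips the loop */
	s.sum_zero = n;

	return s;
}
```

Under these assumptions, a sum of 1024 * LOAD_AVG_MAX (an average of 1024)
gives an average of zero after 11 half-lives, while the sum itself needs 26.
A check based only on the averages can therefore stop the periodic update
well before a check that also waits for the sums.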

-- 
2.7.4

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 5a6835e..9183fee 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7337,6 +7337,19 @@ static void attach_tasks(struct lb_env *env)
 	rq_unlock(env->dst_rq, &rf);
 }
 
+static inline bool cfs_rq_has_blocked(struct cfs_rq *cfs_rq)
+{
+	if (cfs_rq->avg.load_avg)
+		return true;
+
+	if (cfs_rq->avg.util_avg)
+		return true;
+
+	return false;
+}
+
+#ifdef CONFIG_FAIR_GROUP_SCHED
+
 static inline bool cfs_rq_is_decayed(struct cfs_rq *cfs_rq)
 {
 	if (cfs_rq->load.weight)
@@ -7354,8 +7367,6 @@ static inline bool cfs_rq_is_decayed(struct cfs_rq *cfs_rq)
 	return true;
 }
 
-#ifdef CONFIG_FAIR_GROUP_SCHED
-
 static void update_blocked_averages(int cpu)
 {
 	struct rq *rq = cpu_rq(cpu);
@@ -7391,7 +7402,9 @@ static void update_blocked_averages(int cpu)
 		 */
 		if (cfs_rq_is_decayed(cfs_rq))
 			list_del_leaf_cfs_rq(cfs_rq);
-		else
+
+		/* Don't need periodic decay once load/util_avg are null */
+		if (cfs_rq_has_blocked(cfs_rq))
 			done = false;
 	}
 
@@ -7461,7 +7474,7 @@ static inline void update_blocked_averages(int cpu)
 	update_cfs_rq_load_avg(cfs_rq_clock_task(cfs_rq), cfs_rq);
 #ifdef CONFIG_NO_HZ_COMMON
 	rq->last_blocked_load_update_tick = jiffies;
-	if (cfs_rq_is_decayed(cfs_rq))
+	if (!cfs_rq_has_blocked(cfs_rq))
 		rq->has_blocked_load = 0;
 #endif
 	rq_unlock_irqrestore(rq, &rf);

From patchwork Wed Feb 14 15:26:46 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Vincent Guittot <vincent.guittot@linaro.org>
X-Patchwork-Id: 128363
Delivered-To: patch@linaro.org
Received: by 10.46.124.24 with SMTP id x24csp727286ljc;
 Wed, 14 Feb 2018 07:27:17 -0800 (PST)
X-Google-Smtp-Source: AH8x224BAA29WoD7CwMf0Be5WIF9ZfU6sDHcuzzXbaUC4aBVf85JMsh3txOPrfbxchUWCsES7u+U
X-Received: by 2002:a17:902:6e4:: with SMTP id
 91-v6mr4735828plh.26.1518622037778; 
 Wed, 14 Feb 2018 07:27:17 -0800 (PST)
ARC-Seal: i=1; a=rsa-sha256; t=1518622037; cv=none;
 d=google.com; s=arc-20160816;
 b=SvFZqTwoo121TYfLxE2kGp0EacRuL/SGuKqXT7g3HV++d4Zq+pN7gIGTj03v1zMgN3
 YfFLqgQd8uVQjajfzGTXdeIG6Kfk5lzqj1FwkelS474CVTahCMup49S4ygInsOvEZCO8
 Xcumg4M/zc+FPOaL+sa8iqTjNCP2ly8M/cYI1k41ZeY5wwtw8PcDte+GQnYz+a5N6+aX
 BFxhysJAFuEtf0E+Ltk22JgM1s8I4dQ0njj0ZF2aVHYVSdQRsXQRFFoSzIKUH/Ej40lq
 lhHVHhWZ6TZnho2jWH68eyN+N5nslseoKFLBxshYOHAnrG2wEz4mipFJByhMDM3pUCcv
 XtIA==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
 s=arc-20160816; 
 h=list-id:precedence:sender:references:in-reply-to:message-id:date
 :subject:cc:to:from:dkim-signature:arc-authentication-results;
 bh=J6Rou7/dt12U6O4O0gp1klkZktOFTVpbAWJWJZ0Wf0w=;
 b=NaOrgF9w0mRSUwFoN+RGqp7UfrQhuUqqI/M+U09Byal747jfix9ucKS7fbX13KJNjg
 SXb0PqrPEt1REyMsCOmAAHNg4evYpZV023YnoshrluXf/mjh5L6zV62PVSS6tP1fH/jP
 1dDs8aO9QYoRy/2CLrs/C9VCSH1js4R+15jM1ToM0U/ZEHakNQApOeVGMoueu2DsVySl
 +5sVJZKn4qLu+0UBG/iajBhObRNsF74xi9LkDAc2NZsyBo5oXOICgNpg/pgQWYZJLSQb
 JSP5VrPfFXsElTWZPaFxoy9RibDp0hnFj3HwN3Sq8yGV7OK5BGcfDoKSZu0NrgI+jPab
 Wgfg==
ARC-Authentication-Results: i=1; mx.google.com;
 dkim=pass header.i=@linaro.org header.s=google header.b=P+2d02bR;
 spf=pass (google.com: best guess record for domain of
 linux-kernel-owner@vger.kernel.org designates 209.132.180.67
 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org; 
 dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67])
 by mx.google.com with ESMTP id p8si943697pgq.51.2018.02.14.07.27.17; 
 Wed, 14 Feb 2018 07:27:17 -0800 (PST)
Received-SPF: pass (google.com: best guess record for domain of
 linux-kernel-owner@vger.kernel.org designates 209.132.180.67
 as permitted sender) client-ip=209.132.180.67; 
Authentication-Results: mx.google.com;
 dkim=pass header.i=@linaro.org header.s=google header.b=P+2d02bR;
 spf=pass (google.com: best guess record for domain of
 linux-kernel-owner@vger.kernel.org designates 209.132.180.67
 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org; 
 dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S1031327AbeBNP1A (ORCPT <rfc822; dan.rue@linaro.org> + 28 others); 
 Wed, 14 Feb 2018 10:27:00 -0500
Received: from mail-wm0-f67.google.com ([74.125.82.67]:53484 "EHLO
 mail-wm0-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1031243AbeBNP04 (ORCPT
 <rfc822;linux-kernel@vger.kernel.org>);
 Wed, 14 Feb 2018 10:26:56 -0500
Received: by mail-wm0-f67.google.com with SMTP id t74so23447814wme.3
 for <linux-kernel@vger.kernel.org>;
 Wed, 14 Feb 2018 07:26:56 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; 
 h=from:to:cc:subject:date:message-id:in-reply-to:references;
 bh=J6Rou7/dt12U6O4O0gp1klkZktOFTVpbAWJWJZ0Wf0w=;
 b=P+2d02bRmhm3w2hKBKvNNMiOPNYVH1p3edXdyx+2taQuDktkI7VH+5Z7dxGzcTQrqd
 KL9Ab27BaElNi2QoKpurV4W7SODJS6lzS9TPYWEvl14rb9ua77l+UljY0esnPHwhvjV0
 mBV8nBevC2jA9diGJqFOtxtYKbOZzlhF04zSo=
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to
 :references;
 bh=J6Rou7/dt12U6O4O0gp1klkZktOFTVpbAWJWJZ0Wf0w=;
 b=MDF5GceGv/a/4mNmcq+wAe2LTaN2w0gXWWw1uQ9OjJza6BaHYv2HooKvs+qrjeeoLR
 vTddEZ6/3zANNUUKdPq5fqlnGvkB2v9IzKOcO6C1afZeF69pxvPg/g4tY0sRaU4/WBbg
 KNlasgNfx1kwxIiXHhhFTO6Kw4vW82Iz5/u17ncnQRQ2sSXsggfMcDVz+P5Fe+HwmGdO
 hipipJs9kWEchDn/de3sVVZJj59d0VMyBd/fuOhVwo5ixqKCEa3r8tQrcI1zjKzMC87o
 3m9a8HrAt8bHiC/koYKdoSpxXHtv5r4pGnLWEaDd0iJNXp19G9Cn+2ov0ET0iJd6Thqn
 UZYg==
X-Gm-Message-State: APf1xPAkLdeD82PH0THEbTh7clXVK2qW8TBYmwRkBuf5ONy9dMGnte2A
 918VpkTvx6N7AprliKNXAbXdEw==
X-Received: by 10.80.177.9 with SMTP id k9mr7385859edd.154.1518622015133;
 Wed, 14 Feb 2018 07:26:55 -0800 (PST)
Received: from localhost.localdomain ([2a01:e0a:f:6020:f0d0:a4bc:1106:ff2c])
 by smtp.gmail.com with ESMTPSA id
 p93sm3361152edp.63.2018.02.14.07.26.53
 (version=TLS1_2 cipher=ECDHE-RSA-AES128-SHA bits=128/128);
 Wed, 14 Feb 2018 07:26:54 -0800 (PST)
From: Vincent Guittot <vincent.guittot@linaro.org>
To: peterz@infradead.org, mingo@kernel.org,
 linux-kernel@vger.kernel.org, valentin.schneider@arm.com
Cc: morten.rasmussen@foss.arm.com, brendan.jackman@arm.com,
 dietmar.eggemann@arm.com, Vincent Guittot <vincent.guittot@linaro.org>
Subject: [PATCH v5 3/3] sched: update blocked load when newly idle
Date: Wed, 14 Feb 2018 16:26:46 +0100
Message-Id: <1518622006-16089-4-git-send-email-vincent.guittot@linaro.org>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1518622006-16089-1-git-send-email-vincent.guittot@linaro.org>
References: <1518622006-16089-1-git-send-email-vincent.guittot@linaro.org>
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

When NEWLY_IDLE load balance is not triggered, we might need to update the
blocked load anyway. We can kick an ilb so that an idle CPU takes care of
updating the blocked load, or we can try to update it locally before entering
idle. In the latter case, we reuse part of nohz_idle_balance().

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 324 +++++++++++++++++++++++++++++++---------------------
 1 file changed, 193 insertions(+), 131 deletions(-)
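
The two options named above can be sketched as a small decision helper. This
is a hypothetical user-space illustration, not the code of this patch: the
enum, newidle_update_blocked() and the callback type are invented names, and
trying the local update first with the ilb kick as a fallback is an
assumption about how the two options could be combined. The callback returns
false when the walk over the idle CPUs stopped early, matching the return
convention documented for _nohz_idle_balance().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome of a newly-idle blocked load update. */
enum blocked_update {
	UPDATE_NONE,		/* nothing to decay */
	UPDATE_LOCAL,		/* this CPU updated all idle CPUs itself */
	UPDATE_KICK_ILB,	/* local walk aborted, defer to an idle CPU */
};

/* Hypothetical local update: false if the walk stopped before the end. */
typedef bool (*local_update_fn)(void *ctx);

static enum blocked_update
newidle_update_blocked(bool has_blocked, local_update_fn try_local, void *ctx)
{
	assert(try_local != NULL);

	/* No idle CPU has blocked load: skip the update entirely. */
	if (!has_blocked)
		return UPDATE_NONE;

	/* Try to update locally before entering idle. */
	if (try_local(ctx))
		return UPDATE_LOCAL;

	/* The walk was interrupted: let an idle CPU finish the job. */
	return UPDATE_KICK_ILB;
}

/* Stub walks for demonstration only. */
static bool walk_completes(void *ctx) { (void)ctx; return true; }
static bool walk_aborts(void *ctx) { (void)ctx; return false; }
```

With walk_completes() the helper reports a local update, and with
walk_aborts() it falls back to the kick.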

-- 
2.7.4
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9183fee..4db5de76 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8832,120 +8832,6 @@ update_next_balance(struct sched_domain *sd, unsigned long *next_balance)
 }
 
 /*
- * idle_balance is called by schedule() if this_cpu is about to become
- * idle. Attempts to pull tasks from other CPUs.
- */
-static int idle_balance(struct rq *this_rq, struct rq_flags *rf)
-{
-	unsigned long next_balance = jiffies + HZ;
-	int this_cpu = this_rq->cpu;
-	struct sched_domain *sd;
-	int pulled_task = 0;
-	u64 curr_cost = 0;
-
-	/*
-	 * We must set idle_stamp _before_ calling idle_balance(), such that we
-	 * measure the duration of idle_balance() as idle time.
-	 */
-	this_rq->idle_stamp = rq_clock(this_rq);
-
-	/*
-	 * Do not pull tasks towards !active CPUs...
-	 */
-	if (!cpu_active(this_cpu))
-		return 0;
-
-	/*
-	 * This is OK, because current is on_cpu, which avoids it being picked
-	 * for load-balance and preemption/IRQs are still disabled avoiding
-	 * further scheduler activity on it and we're being very careful to
-	 * re-start the picking loop.
-	 */
-	rq_unpin_lock(this_rq, rf);
-
-	if (this_rq->avg_idle < sysctl_sched_migration_cost ||
-	    !this_rq->rd->overload) {
-		rcu_read_lock();
-		sd = rcu_dereference_check_sched_domain(this_rq->sd);
-		if (sd)
-			update_next_balance(sd, &next_balance);
-		rcu_read_unlock();
-
-		goto out;
-	}
-
-	raw_spin_unlock(&this_rq->lock);
-
-	update_blocked_averages(this_cpu);
-	rcu_read_lock();
-	for_each_domain(this_cpu, sd) {
-		int continue_balancing = 1;
-		u64 t0, domain_cost;
-
-		if (!(sd->flags & SD_LOAD_BALANCE))
-			continue;
-
-		if (this_rq->avg_idle < curr_cost + sd->max_newidle_lb_cost) {
-			update_next_balance(sd, &next_balance);
-			break;
-		}
-
-		if (sd->flags & SD_BALANCE_NEWIDLE) {
-			t0 = sched_clock_cpu(this_cpu);
-
-			pulled_task = load_balance(this_cpu, this_rq,
-						   sd, CPU_NEWLY_IDLE,
-						   &continue_balancing);
-
-			domain_cost = sched_clock_cpu(this_cpu) - t0;
-			if (domain_cost > sd->max_newidle_lb_cost)
-				sd->max_newidle_lb_cost = domain_cost;
-
-			curr_cost += domain_cost;
-		}
-
-		update_next_balance(sd, &next_balance);
-
-		/*
-		 * Stop searching for tasks to pull if there are
-		 * now runnable tasks on this rq.
-		 */
-		if (pulled_task || this_rq->nr_running > 0)
-			break;
-	}
-	rcu_read_unlock();
-
-	raw_spin_lock(&this_rq->lock);
-
-	if (curr_cost > this_rq->max_idle_balance_cost)
-		this_rq->max_idle_balance_cost = curr_cost;
-
-	/*
-	 * While browsing the domains, we released the rq lock, a task could
-	 * have been enqueued in the meantime. Since we're not going idle,
-	 * pretend we pulled a task.
-	 */
-	if (this_rq->cfs.h_nr_running && !pulled_task)
-		pulled_task = 1;
-
-out:
-	/* Move the next balance forward */
-	if (time_after(this_rq->next_balance, next_balance))
-		this_rq->next_balance = next_balance;
-
-	/* Is there a task of a high priority class? */
-	if (this_rq->nr_running != this_rq->cfs.h_nr_running)
-		pulled_task = -1;
-
-	if (pulled_task)
-		this_rq->idle_stamp = 0;
-
-	rq_repin_lock(this_rq, rf);
-
-	return pulled_task;
-}
-
-/*
  * active_load_balance_cpu_stop is run by cpu stopper. It pushes
  * running tasks off the busiest CPU onto idle CPUs. It requires at
  * least 1 task to be running on each physical CPU where possible, and
@@ -9413,10 +9299,14 @@ static void rebalance_domains(struct rq *rq, enum cpu_idle_type idle)
 
 #ifdef CONFIG_NO_HZ_COMMON
 /*
- * In CONFIG_NO_HZ_COMMON case, the idle balance kickee will do the
- * rebalancing for all the cpus for whom scheduler ticks are stopped.
+ * Internal function that runs load balance for all idle CPUs. The load balance
+ * can be a simple update of blocked load or a complete load balance with
+ * task movement, depending on flags.
+ * The function returns false if the loop has stopped before running
+ * through all idle CPUs.
  */
-static bool nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
+static bool _nohz_idle_balance(struct rq *this_rq, unsigned int flags,
+			       enum cpu_idle_type idle)
 {
 	/* Earliest time when we have to do rebalance again */
 	unsigned long now = jiffies;
@@ -9424,20 +9314,10 @@ static bool nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
 	bool has_blocked_load = false;
 	int update_next_balance = 0;
 	int this_cpu = this_rq->cpu;
-	unsigned int flags;
 	int balance_cpu;
+	bool ret = false;
 	struct rq *rq;
 
-	if (!(atomic_read(nohz_flags(this_cpu)) & NOHZ_KICK_MASK))
-		return false;
-
-	if (idle != CPU_IDLE) {
-		atomic_andnot(NOHZ_KICK_MASK, nohz_flags(this_cpu));
-		return false;
-	}
-
-	flags = atomic_fetch_andnot(NOHZ_KICK_MASK, nohz_flags(this_cpu));
-
 	SCHED_WARN_ON((flags & NOHZ_KICK_MASK) == NOHZ_BALANCE_KICK);
 
 	/*
@@ -9482,10 +9362,10 @@ static bool nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
 		if (time_after_eq(jiffies, rq->next_balance)) {
 			struct rq_flags rf;
 
-			rq_lock_irq(rq, &rf);
+			rq_lock_irqsave(rq, &rf);
 			update_rq_clock(rq);
 			cpu_load_update_idle(rq);
-			rq_unlock_irq(rq, &rf);
+			rq_unlock_irqrestore(rq, &rf);
 
 			if (flags & NOHZ_BALANCE_KICK)
 				rebalance_domains(rq, CPU_IDLE);
@@ -9497,13 +9377,21 @@ static bool nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
 		}
 	}
 
-	update_blocked_averages(this_cpu);
+	/* Newly idle CPU doesn't need an update */
+	if (idle != CPU_NEWLY_IDLE) {
+		update_blocked_averages(this_cpu);
+		has_blocked_load |= this_rq->has_blocked_load;
+	}
+
 	if (flags & NOHZ_BALANCE_KICK)
 		rebalance_domains(this_rq, CPU_IDLE);
 
 	WRITE_ONCE(nohz.next_blocked,
 		now + msecs_to_jiffies(LOAD_AVG_PERIOD));
 
+	/* The full idle balance loop has been done */
+	ret = true;
+
 abort:
 	/* There is still blocked load, enable periodic update */
 	if (has_blocked_load)
@@ -9517,6 +9405,35 @@ static bool nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
 	if (likely(update_next_balance))
 		nohz.next_balance = next_balance;
 
+	return ret;
+}
+
+/*
+ * In CONFIG_NO_HZ_COMMON case, the idle balance kickee will do the
+ * rebalancing for all the cpus for whom scheduler ticks are stopped.
+ */
+static bool nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
+{
+	int this_cpu = this_rq->cpu;
+	unsigned int flags;
+
+	if (!(atomic_read(nohz_flags(this_cpu)) & NOHZ_KICK_MASK))
+		return false;
+
+	if (idle != CPU_IDLE) {
+		atomic_andnot(NOHZ_KICK_MASK, nohz_flags(this_cpu));
+		return false;
+	}
+
+	/*
+	 * barrier, pairs with nohz_balance_enter_idle(), ensures ...
+	 */
+	flags = atomic_fetch_andnot(NOHZ_KICK_MASK, nohz_flags(this_cpu));
+	if (!(flags & NOHZ_KICK_MASK))
+		return false;
+
+	_nohz_idle_balance(this_rq, flags, idle);
+
 	return true;
 }
 #else
@@ -9527,6 +9444,151 @@ static bool nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
 #endif
 
 /*
+ * idle_balance is called by schedule() if this_cpu is about to become
+ * idle. Attempts to pull tasks from other CPUs.
+ */
+static int idle_balance(struct rq *this_rq, struct rq_flags *rf)
+{
+	unsigned long next_balance = jiffies + HZ;
+	int this_cpu = this_rq->cpu;
+	struct sched_domain *sd;
+	int pulled_task = 0;
+	u64 curr_cost = 0;
+
+	/*
+	 * We must set idle_stamp _before_ calling idle_balance(), such that we
+	 * measure the duration of idle_balance() as idle time.
+	 */
+	this_rq->idle_stamp = rq_clock(this_rq);
+
+	/*
+	 * Do not pull tasks towards !active CPUs...
+	 */
+	if (!cpu_active(this_cpu))
+		return 0;
+
+	/*
+	 * This is OK, because current is on_cpu, which avoids it being picked
+	 * for load-balance and preemption/IRQs are still disabled avoiding
+	 * further scheduler activity on it and we're being very careful to
+	 * re-start the picking loop.
+	 */
+	rq_unpin_lock(this_rq, rf);
+
+	if (this_rq->avg_idle < sysctl_sched_migration_cost ||
+	    !this_rq->rd->overload) {
+#ifdef CONFIG_NO_HZ_COMMON
+		unsigned long has_blocked = READ_ONCE(nohz.has_blocked);
+		unsigned long next_blocked = READ_ONCE(nohz.next_blocked);
+#endif
+		rcu_read_lock();
+		sd = rcu_dereference_check_sched_domain(this_rq->sd);
+		if (sd)
+			update_next_balance(sd, &next_balance);
+		rcu_read_unlock();
+
+#ifdef CONFIG_NO_HZ_COMMON
+		/*
+		 * This CPU doesn't want to be disturbed by scheduler
+		 * housekeeping
+		 */
+		if (!housekeeping_cpu(this_cpu, HK_FLAG_SCHED))
+			goto out;
+
+		/* Will wake up very soon. No time for doing anything else */
+		if (this_rq->avg_idle < sysctl_sched_migration_cost)
+			goto out;
+
+		/* Don't need to update blocked load of idle CPUs */
+		if (!has_blocked || time_before(jiffies, next_blocked))
+			goto out;
+
+		raw_spin_unlock(&this_rq->lock);
+		/*
+		 * This CPU is going to be idle and blocked load of idle CPUs
+		 * needs to be updated. Run the ilb locally as it is a good
+		 * candidate for ilb instead of waking up another idle CPU.
+		 * Kick a normal ilb if we failed to do the update.
+		 */
+		if (!_nohz_idle_balance(this_rq, NOHZ_STATS_KICK, CPU_NEWLY_IDLE))
+			kick_ilb(NOHZ_STATS_KICK);
+		raw_spin_lock(&this_rq->lock);
+#endif
+		goto out;
+	}
+
+	raw_spin_unlock(&this_rq->lock);
+
+	update_blocked_averages(this_cpu);
+	rcu_read_lock();
+	for_each_domain(this_cpu, sd) {
+		int continue_balancing = 1;
+		u64 t0, domain_cost;
+
+		if (!(sd->flags & SD_LOAD_BALANCE))
+			continue;
+
+		if (this_rq->avg_idle < curr_cost + sd->max_newidle_lb_cost) {
+			update_next_balance(sd, &next_balance);
+			break;
+		}
+
+		if (sd->flags & SD_BALANCE_NEWIDLE) {
+			t0 = sched_clock_cpu(this_cpu);
+
+			pulled_task = load_balance(this_cpu, this_rq,
+						   sd, CPU_NEWLY_IDLE,
+						   &continue_balancing);
+
+			domain_cost = sched_clock_cpu(this_cpu) - t0;
+			if (domain_cost > sd->max_newidle_lb_cost)
+				sd->max_newidle_lb_cost = domain_cost;
+
+			curr_cost += domain_cost;
+		}
+
+		update_next_balance(sd, &next_balance);
+
+		/*
+		 * Stop searching for tasks to pull if there are
+		 * now runnable tasks on this rq.
+		 */
+		if (pulled_task || this_rq->nr_running > 0)
+			break;
+	}
+	rcu_read_unlock();
+
+	raw_spin_lock(&this_rq->lock);
+
+	if (curr_cost > this_rq->max_idle_balance_cost)
+		this_rq->max_idle_balance_cost = curr_cost;
+
+	/*
+	 * While browsing the domains, we released the rq lock, a task could
+	 * have been enqueued in the meantime. Since we're not going idle,
+	 * pretend we pulled a task.
+	 */
+	if (this_rq->cfs.h_nr_running && !pulled_task)
+		pulled_task = 1;
+
+out:
+	/* Move the next balance forward */
+	if (time_after(this_rq->next_balance, next_balance))
+		this_rq->next_balance = next_balance;
+
+	/* Is there a task of a high priority class? */
+	if (this_rq->nr_running != this_rq->cfs.h_nr_running)
+		pulled_task = -1;
+
+	if (pulled_task)
+		this_rq->idle_stamp = 0;
+
+	rq_repin_lock(this_rq, rf);
+
+	return pulled_task;
+}
+
+/*
  * run_rebalance_domains is triggered when needed from the scheduler tick.
  * Also triggered for nohz idle balancing (with nohz_balancing_kick set).
  */