From patchwork Tue Feb  6 14:41:30 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Patrick Bellasi <patrick.bellasi@arm.com>
X-Patchwork-Id: 127016
From: Patrick Bellasi <patrick.bellasi@arm.com>
To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org
Cc: Ingo Molnar <mingo@redhat.com>, Peter Zijlstra <peterz@infradead.org>,
 "Rafael J . Wysocki" <rafael.j.wysocki@intel.com>,
 Viresh Kumar <viresh.kumar@linaro.org>,
 Vincent Guittot <vincent.guittot@linaro.org>, Paul Turner <pjt@google.com>,
 Dietmar Eggemann <dietmar.eggemann@arm.com>,
 Morten Rasmussen <morten.rasmussen@arm.com>,
 Juri Lelli <juri.lelli@redhat.com>, Todd Kjos <tkjos@android.com>,
 Joel Fernandes <joelaf@google.com>, Steve Muckle <smuckle@google.com>
Subject: [PATCH v4 2/3] sched/fair: use util_est in LB and WU paths
Date: Tue,  6 Feb 2018 14:41:30 +0000
Message-Id: <20180206144131.31233-3-patrick.bellasi@arm.com>
X-Mailer: git-send-email 2.15.1
In-Reply-To: <20180206144131.31233-1-patrick.bellasi@arm.com>
References: <20180206144131.31233-1-patrick.bellasi@arm.com>
Sender: linux-pm-owner@vger.kernel.org
Precedence: bulk
List-ID: <linux-pm.vger.kernel.org>
X-Mailing-List: linux-pm@vger.kernel.org

When the scheduler looks at the CPU utilization, the current PELT value
for a CPU is returned straight away. In certain scenarios this can have
undesired side effects on task placement.

For example, since a task's utilization is decayed at wakeup time to
account for its sleep, a big task that is enqueued after a long sleep
does not immediately add a significant contribution to the target CPU.
As a result, there is a race window in which other tasks can be placed
on the same CPU while it is still considered relatively empty.
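
As a rough illustration (a simplified model, not kernel code), assume
PELT utilization simply halves for every full 32 ms of sleep, which
approximates PELT's 32 ms half-life, and assume a hypothetical placement
check of the form "util + task <= capacity". Both decay_util() and
fits() are made up for this sketch:

```c
#include <limits.h>
#include <stdbool.h>

/* Assumed PELT half-life in ms; real PELT decays geometrically every ~1 ms */
#define PELT_HALFLIFE_MS 32

/*
 * Hypothetical, coarse model of PELT decay: halve the utilization once
 * per full half-life spent sleeping.
 */
unsigned long decay_util(unsigned long util, unsigned int sleep_ms)
{
	unsigned int periods = sleep_ms / PELT_HALFLIFE_MS;

	/* Shifting by the full width of the type is undefined in C */
	if (periods >= sizeof(util) * CHAR_BIT)
		return 0;
	return util >> periods;
}

/* Hypothetical placement check: does a task fit in the CPU's spare capacity? */
bool fits(unsigned long cpu_util, unsigned long task_util,
	  unsigned long capacity)
{
	return cpu_util + task_util <= capacity;
}
```

With a big task of utilization 800 that slept for 128 ms,
decay_util(800, 128) returns 50, so a CPU of capacity 1024 appears able
to host another task of utilization 600 (fits(50, 600, 1024) is true),
while accounting for the task's pre-sleep size it would not
(fits(800, 600, 1024) is false).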

To reduce this kind of race condition, this patch integrates the CPU's
estimated utilization into cpu_util_wake() as well as into
update_sg_lb_stats().
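
The wakeup-path logic can be sketched as a userspace model. The struct
wake_model and model_cpu_util_wake() are hypothetical stand-ins, not
kernel APIs: cpu_util is assumed to already be the PELT utilization
clamped to capacity (as cpu_util() returns), and util_est_enqueued
stands for the sum of the estimated utilization of the RUNNABLE tasks.
The model follows the cases described in the patch comment: discount the
waking task's blocked utilization, take the maximum with util_est, then
clamp to the CPU's original capacity.

```c
#include <stdbool.h>

/* Hypothetical per-CPU inputs; field names mirror the patch, not kernel types */
struct wake_model {
	unsigned long cpu_util;          /* PELT util clamped to capacity */
	unsigned long util_est_enqueued; /* sum of util_est of RUNNABLE tasks */
	unsigned long capacity_orig;     /* original (max) CPU capacity */
	unsigned long task_util;         /* waking task's blocked util */
	bool task_contributes;           /* cpu == task_cpu(p) and p not new */
};

static unsigned long ul_min(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

static unsigned long ul_max(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

unsigned long model_cpu_util_wake(const struct wake_model *m,
				  bool util_est_enabled)
{
	unsigned long util;

	/* Task has no contribution or is new: use the CPU's estimated util */
	if (!m->task_contributes) {
		if (!util_est_enabled)
			return m->cpu_util;
		return ul_min(ul_max(m->cpu_util, m->util_est_enqueued),
			      m->capacity_orig);
	}

	/* Discount the task's blocked util, saturating at zero */
	util = m->cpu_util - ul_min(m->cpu_util, m->task_util);

	if (!util_est_enabled)
		return util;

	/* RUNNABLE tasks' util_est may give a more restrictive estimate */
	util = ul_max(util, m->util_est_enqueued);

	/* util_est can exceed capacity: clamp for consistency */
	return ul_min(util, m->capacity_orig);
}
```

For example, with cpu_util = 600, task_util = 400 and
util_est_enqueued = 500, the task's blocked utilization is first
discounted (600 - 400 = 200), but the util_est of the RUNNABLE tasks
(500) is the more restrictive estimate, so the model returns 500.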

The estimated utilization of a CPU is defined to be the maximum of its
PELT utilization and the sum of the estimated utilization of the tasks
currently RUNNABLE on that CPU, clamped to the CPU's original capacity.
This properly represents the spare capacity of a CPU which, for example,
has just started running a big task after a long sleep period.
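
A minimal userspace sketch of this definition, assuming a hypothetical
struct cpu_model whose fields mirror cfs_rq->avg.util_avg,
cfs_rq->avg.util_est.enqueued and capacity_orig_of(); the
util_est_enabled flag stands in for sched_feat(UTIL_EST). The
hypothetical model_group_util() mimics how update_sg_lb_stats()
accumulates group_util over the CPUs of a sched group:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a CPU's CFS signals; not the kernel's struct cfs_rq */
struct cpu_model {
	unsigned long util_avg;          /* PELT utilization, incl. blocked */
	unsigned long util_est_enqueued; /* sum of util_est of RUNNABLE tasks */
	unsigned long capacity_orig;     /* original (max) CPU capacity */
};

/* Stand-in for sched_feat(UTIL_EST) */
static bool util_est_enabled = true;

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/* Model of cpu_util(): PELT utilization clamped to capacity */
unsigned long model_cpu_util(const struct cpu_model *c)
{
	return min_ul(c->util_avg, c->capacity_orig);
}

/* Model of cpu_util_est(): max(PELT, util_est), clamped to capacity */
unsigned long model_cpu_util_est(const struct cpu_model *c)
{
	if (!util_est_enabled)
		return model_cpu_util(c);
	return min_ul(max_ul(c->util_avg, c->util_est_enqueued),
		      c->capacity_orig);
}

/* Model of the group_util accumulation in update_sg_lb_stats() */
unsigned long model_group_util(const struct cpu_model *cpus, size_t nr)
{
	unsigned long sum = 0;

	for (size_t i = 0; i < nr; i++)
		sum += model_cpu_util_est(&cpus[i]);
	return sum;
}
```

For a CPU where a big task just woke up (util_avg = 50,
util_est_enqueued = 800, capacity_orig = 1024) the model reports 800
rather than 50, and an estimate above capacity_orig is clamped to it.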

Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Paul Turner <pjt@google.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-pm@vger.kernel.org

---
Changes in v4:
 - rebased on today's tip/sched/core (commit 460e8c3340a2)
 - ensure cpu_util_wake() is clamped to the CPU's original capacity (Pavan)

Changes in v3:
 - rebased on today's tip/sched/core (commit 07881166a892)

Changes in v2:
 - rebased on top of v4.15-rc2
 - tested that the overhauled PELT code does not affect util_est

Change-Id: Id5a38d0e41aae7ca89f021f277851ee4e6ba5112
---
 kernel/sched/fair.c | 81 +++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 76 insertions(+), 5 deletions(-)

-- 
2.15.1

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 118f49c39b60..2a2e88bced87 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6347,6 +6347,41 @@ static unsigned long cpu_util(int cpu)
 	return (util >= capacity) ? capacity : util;
 }
 
+/**
+ * cpu_util_est - estimated utilization for the specified CPU
+ * @cpu: the CPU to get the estimated utilization for
+ *
+ * The estimated utilization of a CPU is defined to be the maximum of its
+ * PELT utilization and the sum of the estimated utilization of the tasks
+ * currently RUNNABLE on that CPU.
+ *
+ * This properly represents the expected utilization of a CPU which has
+ * just started running a big task after a long sleep period. At the same
+ * time, it preserves the benefits of the "blocked utilization" in
+ * describing the potential for other tasks waking up on the same CPU.
+ *
+ * Return: the estimated utilization for the specified CPU
+ */
+static inline unsigned long cpu_util_est(int cpu)
+{
+	unsigned long util, util_est;
+	unsigned long capacity;
+	struct cfs_rq *cfs_rq;
+
+	if (!sched_feat(UTIL_EST))
+		return cpu_util(cpu);
+
+	cfs_rq = &cpu_rq(cpu)->cfs;
+	util = cfs_rq->avg.util_avg;
+	util_est = cfs_rq->avg.util_est.enqueued;
+	util_est = max(util, util_est);
+
+	capacity = capacity_orig_of(cpu);
+	util_est = min(util_est, capacity);
+
+	return util_est;
+}
+
 static inline unsigned long task_util(struct task_struct *p)
 {
 	return p->se.avg.util_avg;
@@ -6364,16 +6399,52 @@ static inline unsigned long _task_util_est(struct task_struct *p)
  */
 static unsigned long cpu_util_wake(int cpu, struct task_struct *p)
 {
-	unsigned long util, capacity;
+	unsigned long capacity;
+	long util, util_est;
 
 	/* Task has no contribution or is new */
 	if (cpu != task_cpu(p) || !p->se.avg.last_update_time)
-		return cpu_util(cpu);
+		return cpu_util_est(cpu);
 
+	/* Discount task's blocked util from CPU's util */
+	util = cpu_util(cpu) - task_util(p);
+	util = max(util, 0L);
+
+	if (!sched_feat(UTIL_EST))
+		return util;
+
+	/*
+	 * Covered cases:
+	 * - if *p is the only task sleeping on this CPU, then:
+	 *      cpu_util (== task_util) > util_est (== 0)
+	 *   and thus we return:
+	 *      cpu_util_wake = (cpu_util - task_util) = 0
+	 *
+	 * - if other tasks are SLEEPING on the same CPU, which is just waking
+	 *   up, then:
+	 *      cpu_util >= task_util
+	 *      cpu_util > util_est (== 0)
+	 *   and thus we discount *p's blocked utilization to return:
+	 *      cpu_util_wake = (cpu_util - task_util) >= 0
+	 *
+	 * - if other tasks are RUNNABLE on that CPU and
+	 *      util_est > cpu_util
+	 *   then we use util_est since it returns a more restrictive
+	 *   estimation of the spare capacity on that CPU, by just considering
+	 *   the expected utilization of tasks already runnable on that CPU.
+	 */
+	util_est = cpu_rq(cpu)->cfs.avg.util_est.enqueued;
+	util = max(util, util_est);
+
+	/*
+	 * Estimated utilization can exceed the CPU capacity, thus let's clamp
+	 * to the maximum CPU capacity to ensure consistency with other
+	 * cpu_util[_est] calls.
+	 */
 	capacity = capacity_orig_of(cpu);
-	util = max_t(long, cpu_rq(cpu)->cfs.avg.util_avg - task_util(p), 0);
+	util = min_t(unsigned long, util, capacity);
 
-	return (util >= capacity) ? capacity : util;
+	return util;
 }
 
 /*
@@ -7898,7 +7969,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 			load = source_load(i, load_idx);
 
 		sgs->group_load += load;
-		sgs->group_util += cpu_util(i);
+		sgs->group_util += cpu_util_est(i);
 		sgs->sum_nr_running += rq->cfs.h_nr_running;
 
 		nr_running = rq->nr_running;