From patchwork Tue Dec 5 17:10:15 2017
X-Patchwork-Submitter: Patrick Bellasi
X-Patchwork-Id: 120719
From: Patrick Bellasi
To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org
Cc: Ingo Molnar, Peter Zijlstra,
Wysocki" , Viresh Kumar , Vincent Guittot , Paul Turner , Dietmar Eggemann , Morten Rasmussen , Juri Lelli , Todd Kjos , Joel Fernandes Subject: [PATCH v2 1/4] sched/fair: always used unsigned long for utilization Date: Tue, 5 Dec 2017 17:10:15 +0000 Message-Id: <20171205171018.9203-2-patrick.bellasi@arm.com> X-Mailer: git-send-email 2.14.1 In-Reply-To: <20171205171018.9203-1-patrick.bellasi@arm.com> References: <20171205171018.9203-1-patrick.bellasi@arm.com> Sender: linux-pm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org Utilization and capacity are tracked as unsigned long, however some functions using them return an int which is ultimately assigned back to unsigned long variables. Since there is not scope on using a different and signed type, this consolidate the signature of functions returning utilization to always use the native type. As well as improving code consistency this is expected also benefits code paths where utilizations should be clamped by avoiding further type conversions or ugly type casts. Signed-off-by: Patrick Bellasi Reviewed-by: Chris Redpath Reviewed-by: Brendan Jackman Reviewed-by: Dietmar Eggemann Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Vincent Guittot Cc: Morten Rasmussen Cc: Dietmar Eggemann Cc: linux-kernel@vger.kernel.org --- Changes v1->v2: - rebase on top of v4.15-rc2 - tested that overhauled PELT code does not affect the util_est --- kernel/sched/fair.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) -- 2.14.1 Acked-by: Vincent Guittot diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 4037e19bbca2..ad21550d008c 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -5721,8 +5721,8 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p, return affine; } -static inline int task_util(struct task_struct *p); -static int cpu_util_wake(int cpu, struct task_struct *p); +static inline unsigned long task_util(struct task_struct *p); +static unsigned long cpu_util_wake(int cpu, struct task_struct *p); static unsigned long capacity_spare_wake(int cpu, struct task_struct *p) { @@ -6203,7 +6203,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target) * capacity_orig) as it useful for predicting the capacity required after task * migrations (scheduler-driven DVFS). */ -static int cpu_util(int cpu) +static unsigned long cpu_util(int cpu) { unsigned long util = cpu_rq(cpu)->cfs.avg.util_avg; unsigned long capacity = capacity_orig_of(cpu); @@ -6211,7 +6211,7 @@ static int cpu_util(int cpu) return (util >= capacity) ? capacity : util; } -static inline int task_util(struct task_struct *p) +static inline unsigned long task_util(struct task_struct *p) { return p->se.avg.util_avg; } @@ -6220,7 +6220,7 @@ static inline int task_util(struct task_struct *p) * cpu_util_wake: Compute cpu utilization with any contributions from * the waking task p removed. 
From patchwork Tue Dec 5 17:10:16 2017
X-Patchwork-Submitter: Patrick Bellasi
X-Patchwork-Id: 120716
From: Patrick Bellasi
To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org
Cc: Ingo Molnar, Peter Zijlstra,
Wysocki" , Viresh Kumar , Vincent Guittot , Paul Turner , Dietmar Eggemann , Morten Rasmussen , Juri Lelli , Todd Kjos , Joel Fernandes Subject: [PATCH v2 2/4] sched/fair: add util_est on top of PELT Date: Tue, 5 Dec 2017 17:10:16 +0000 Message-Id: <20171205171018.9203-3-patrick.bellasi@arm.com> X-Mailer: git-send-email 2.14.1 In-Reply-To: <20171205171018.9203-1-patrick.bellasi@arm.com> References: <20171205171018.9203-1-patrick.bellasi@arm.com> Sender: linux-pm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org The util_avg signal computed by PELT is too variable for some use-cases. For example, a big task waking up after a long sleep period will have its utilization almost completely decayed. This introduces some latency before schedutil will be able to pick the best frequency to run a task. The same issue can affect task placement. Indeed, since the task utilization is already decayed at wakeup, when the task is enqueued in a CPU, this can result in a CPU running a big task as being temporarily represented as being almost empty. This leads to a race condition where other tasks can be potentially allocated on a CPU which just started to run a big task which slept for a relatively long period. Moreover, the PELT utilization of a task can be updated every [ms], thus making it a continuously changing value for certain longer running tasks. This means that the instantaneous PELT utilization of a RUNNING task is not really meaningful to properly support scheduler decisions. For all these reasons, a more stable signal can do a better job of representing the expected/estimated utilization of a task/cfs_rq. Such a signal can be easily created on top of PELT by still using it as an estimator which produces values to be aggregated on meaningful events. This patch adds a simple implementation of util_est, a new signal built on top of PELT's util_avg where: util_est(task) = max(task::util_avg, f(task::util_avg@dequeue_times)) This allows to remember how big a task has been reported by PELT in its previous activations via the function: f(task::util_avg@dequeue_times). If a task should change its behavior and it runs even longer in a new activation, after a certain time its util_est will just track the original PELT signal (i.e. task::util_avg). The estimated utilization of cfs_rq is defined only for root ones. That's because the only sensible consumer of this signal are the scheduler and schedutil when looking for the overall CPU utilization due to FAIR tasks. For this reason, the estimated utilization of a root cfs_rq is simply defined as: util_est(cfs_rq) = max(cfs_rq::util_avg, cfs_rq::util_est_runnable) where: cfs_rq::util_est_runnable = sum(util_est(task)) for each RUNNABLE task on that root cfs_rq It's worth to note that the estimated utilization is tracked only for entities of interests, specifically: - Tasks: to better support tasks placement decisions - root cfs_rqs: to better support both tasks placement decisions as well as frequencies selection Signed-off-by: Patrick Bellasi Reviewed-by: Brendan Jackman Reviewed-by: Dietmar Eggemann Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Rafael J. 
---
Changes v1->v2:
 - rebased on top of v4.15-rc2
 - tested that the overhauled PELT code does not affect util_est
---
 include/linux/sched.h   |  21 ++++++++++
 kernel/sched/debug.c    |   4 ++
 kernel/sched/fair.c     | 102 +++++++++++++++++++++++++++++++++++++++++++++++-
 kernel/sched/features.h |   5 +++
 kernel/sched/sched.h    |   1 +
 5 files changed, 132 insertions(+), 1 deletion(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 21991d668d35..b01c0dc75ef5 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -338,6 +338,21 @@ struct sched_avg {
         unsigned long util_avg;
 };
 
+/**
+ * Estimated utilization for FAIR tasks.
+ *
+ * Support data structure to track an Exponential Weighted Moving Average
+ * (EWMA) of a FAIR task's utilization. New samples are added to the moving
+ * average each time a task completes an activation. The sample weight is
+ * chosen so that the EWMA is relatively insensitive to transient changes
+ * in the task's workload.
+ */
+struct util_est {
+        unsigned long last;
+        unsigned long ewma;
+#define UTIL_EST_WEIGHT_SHIFT 2
+};
+
 struct sched_statistics {
 #ifdef CONFIG_SCHEDSTATS
         u64 wait_start;
@@ -562,6 +577,12 @@ struct task_struct {
         const struct sched_class *sched_class;
         struct sched_entity se;
+        /*
+         * Since we use se.avg.util_avg to update the util_est fields,
+         * util_est can benefit from being close to se, which also
+         * defines se.avg as cache aligned.
+         */
+        struct util_est util_est;
         struct sched_rt_entity rt;
 #ifdef CONFIG_CGROUP_SCHED
         struct task_group *sched_task_group;
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 1ca0130ed4f9..5ffa8234524a 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -567,6 +567,8 @@ void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
                         cfs_rq->avg.runnable_load_avg);
         SEQ_printf(m, "  .%-30s: %lu\n", "util_avg",
                         cfs_rq->avg.util_avg);
+        SEQ_printf(m, "  .%-30s: %lu\n", "util_est_runnable",
+                        cfs_rq->util_est_runnable);
         SEQ_printf(m, "  .%-30s: %ld\n", "removed.load_avg",
                         cfs_rq->removed.load_avg);
         SEQ_printf(m, "  .%-30s: %ld\n", "removed.util_avg",
@@ -1018,6 +1020,8 @@ void proc_sched_show_task(struct task_struct *p, struct pid_namespace *ns,
         P(se.avg.runnable_load_avg);
         P(se.avg.util_avg);
         P(se.avg.last_update_time);
+        P(util_est.ewma);
+        P(util_est.last);
 #endif
         P(policy);
         P(prio);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ad21550d008c..d8f3ed71010b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -732,6 +732,12 @@ void init_entity_runnable_average(struct sched_entity *se)
                 se->runnable_weight = se->load.weight;
 
         /* when this task enqueue'ed, it will contribute to its cfs_rq's load_avg */
+
+        /* Utilization estimation */
+        if (entity_is_task(se)) {
+                task_of(se)->util_est.ewma = 0;
+                task_of(se)->util_est.last = 0;
+        }
 }
 
 static inline u64 cfs_rq_clock_task(struct cfs_rq *cfs_rq);
@@ -5153,6 +5159,20 @@ static inline void hrtick_update(struct rq *rq)
 }
 #endif
 
+static inline unsigned long task_util(struct task_struct *p);
+static inline unsigned long task_util_est(struct task_struct *p);
+
+static inline void util_est_enqueue(struct task_struct *p)
+{
+        struct cfs_rq *cfs_rq = &task_rq(p)->cfs;
+
+        if (!sched_feat(UTIL_EST))
+                return;
+
+        /* Update root cfs_rq's estimated utilization */
+        cfs_rq->util_est_runnable += task_util_est(p);
+}
+
 /*
  * The enqueue_task method is called before nr_running is
  * increased. Here we update the fair scheduling stats and
@@ -5205,9 +5225,84 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
         if (!se)
                 add_nr_running(rq, 1);
 
+        util_est_enqueue(p);
         hrtick_update(rq);
 }
 
+static inline void util_est_dequeue(struct task_struct *p, int flags)
+{
+        struct cfs_rq *cfs_rq = &task_rq(p)->cfs;
+        unsigned long util_last = task_util(p);
+        bool sleep = flags & DEQUEUE_SLEEP;
+        unsigned long ewma;
+        long util_est;
+
+        if (!sched_feat(UTIL_EST))
+                return;
+
+        /*
+         * Update root cfs_rq's estimated utilization
+         *
+         * If *p is the last task, then the root cfs_rq's estimated
+         * utilization of the CPU is 0 by definition.
+         *
+         * Otherwise, in removing *p's util_est from its cfs_rq's
+         * util_est_runnable, we should account for cases where this last
+         * activation of *p was longer than the previous ones, which can
+         * make the result negative; in these cases too, the CPU's
+         * estimated utilization is clamped to 0.
+         */
+        if (cfs_rq->nr_running > 0) {
+                util_est  = cfs_rq->util_est_runnable;
+                util_est -= task_util_est(p);
+                if (util_est < 0)
+                        util_est = 0;
+                cfs_rq->util_est_runnable = util_est;
+        } else {
+                cfs_rq->util_est_runnable = 0;
+        }
+
+        /*
+         * Skip update of task's estimated utilization when the task has not
+         * yet completed an activation, e.g. being migrated.
+         */
+        if (!sleep)
+                return;
+
+        /*
+         * Skip update of task's estimated utilization when its EWMA is
+         * already ~1% close to its last activation value.
+         */
+        util_est = p->util_est.ewma;
+        if (abs(util_est - util_last) <= (SCHED_CAPACITY_SCALE / 100))
+                return;
+
+        /*
+         * Update Task's estimated utilization
+         *
+         * When *p completes an activation we can consolidate another sample
+         * about the task size. This is done by storing the last PELT value
+         * for this task and using this value to load another sample in the
+         * exponential weighted moving average:
+         *
+         *   ewma(t) = w *  task_util(p) + (1 - w) * ewma(t-1)
+         *           = w *  task_util(p) + ewma(t-1) - w * ewma(t-1)
+         *           = w * (task_util(p) + ewma(t-1) / w - ewma(t-1))
+         *
+         * Where 'w' is the weight of new samples, which is configured to be
+         * 0.25, thus making w = 1/4.
+         */
+        p->util_est.last = util_last;
+        ewma = p->util_est.ewma;
+        if (likely(ewma != 0)) {
+                ewma   = util_last + (ewma << UTIL_EST_WEIGHT_SHIFT) - ewma;
+                ewma >>= UTIL_EST_WEIGHT_SHIFT;
+        } else {
+                ewma = util_last;
+        }
+        p->util_est.ewma = ewma;
+}
+
 static void set_next_buddy(struct sched_entity *se);
 
 /*
@@ -5264,6 +5359,7 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
         if (!se)
                 sub_nr_running(rq, 1);
 
+        util_est_dequeue(p, flags);
         hrtick_update(rq);
 }
 
@@ -5721,7 +5817,6 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p,
         return affine;
 }
 
-static inline unsigned long task_util(struct task_struct *p);
 static unsigned long cpu_util_wake(int cpu, struct task_struct *p);
 
 static unsigned long capacity_spare_wake(int cpu, struct task_struct *p)
@@ -6216,6 +6311,11 @@ static inline unsigned long task_util(struct task_struct *p)
         return p->se.avg.util_avg;
 }
 
+static inline unsigned long task_util_est(struct task_struct *p)
+{
+        return max(p->util_est.ewma, p->util_est.last);
+}
+
 /*
  * cpu_util_wake: Compute cpu utilization with any contributions from
  * the waking task p removed.
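The shift-based update and the ~1% skip threshold above can be
sanity-checked with a few lines of stand-alone C (illustrative arithmetic
only, not kernel code):

  #include <assert.h>

  #define SCHED_CAPACITY_SCALE    1024
  #define UTIL_EST_WEIGHT_SHIFT   2

  int main(void)
  {
      unsigned long util_last = 600, ewma = 200;

      /* The shift form equals the plain w = 1/4 weighted average. */
      unsigned long shifted = (util_last + (ewma << UTIL_EST_WEIGHT_SHIFT)
                               - ewma) >> UTIL_EST_WEIGHT_SHIFT;
      assert(shifted == (util_last + 3 * ewma) / 4);   /* both are 300 */

      /*
       * The "~1% close" check: with SCHED_CAPACITY_SCALE = 1024, the EWMA
       * update is skipped when |ewma - util_last| <= 1024 / 100 = 10.
       */
      assert(SCHED_CAPACITY_SCALE / 100 == 10);
      return 0;
  }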
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 9552fd5854bf..e9f312acc0d3 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -85,3 +85,8 @@ SCHED_FEAT(ATTACH_AGE_LOAD, true)
 SCHED_FEAT(WA_IDLE, true)
 SCHED_FEAT(WA_WEIGHT, true)
 SCHED_FEAT(WA_BIAS, true)
+
+/*
+ * UtilEstimation. Use estimated CPU utilization.
+ */
+SCHED_FEAT(UTIL_EST, false)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index b19552a212de..8371839075fa 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -444,6 +444,7 @@ struct cfs_rq {
          * CFS load tracking
          */
         struct sched_avg avg;
+        unsigned long util_est_runnable;
 #ifndef CONFIG_64BIT
         u64 load_last_update_time_copy;
 #endif

From patchwork Tue Dec 5 17:10:17 2017
X-Patchwork-Submitter: Patrick Bellasi
X-Patchwork-Id: 120718
From: Patrick Bellasi
To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org
Cc: Ingo Molnar, Peter Zijlstra, "Rafael J. Wysocki", Viresh Kumar,
 Vincent Guittot, Paul Turner, Dietmar Eggemann, Morten Rasmussen,
 Juri Lelli, Todd Kjos, Joel Fernandes
Subject: [PATCH v2 3/4] sched/fair: use util_est in LB and WU paths
Date: Tue, 5 Dec 2017 17:10:17 +0000
Message-Id: <20171205171018.9203-4-patrick.bellasi@arm.com>
In-Reply-To: <20171205171018.9203-1-patrick.bellasi@arm.com>
References: <20171205171018.9203-1-patrick.bellasi@arm.com>

When the scheduler looks at CPU utilization, the current PELT value for
the CPU is returned straight away. In certain scenarios this can have
undesired side effects on task placement.

For example, since a task's utilization is decayed at wakeup time, when a
long-sleeping big task is enqueued it does not immediately add a
significant contribution to the target CPU. As a result, we generate a
race condition where other tasks can be placed on the same CPU while it
is still considered relatively empty.

In order to reduce this kind of race condition, this patch introduces the
required support to integrate the usage of the CPU's estimated utilization
in cpu_util_wake as well as in update_sg_lb_stats.

The estimated utilization of a CPU is defined to be the maximum between
its PELT utilization and the sum of the estimated utilization of the
tasks currently RUNNABLE on that CPU. This allows us to properly
represent the expected utilization of a CPU which, for example, has just
started running a big task after a long sleep period.

Signed-off-by: Patrick Bellasi
Reviewed-by: Brendan Jackman
Reviewed-by: Dietmar Eggemann
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Rafael J. Wysocki
Cc: Viresh Kumar
Cc: Paul Turner
Cc: Vincent Guittot
Cc: Morten Rasmussen
Cc: Dietmar Eggemann
Cc: linux-kernel@vger.kernel.org
Cc: linux-pm@vger.kernel.org
---
Changes v1->v2:
 - rebased on top of v4.15-rc2
 - tested that the overhauled PELT code does not affect util_est
---
 kernel/sched/fair.c | 74 ++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 68 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d8f3ed71010b..373d631efa91 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6306,6 +6306,41 @@ static unsigned long cpu_util(int cpu)
         return (util >= capacity) ? capacity : util;
 }
 
+/**
+ * cpu_util_est: estimated utilization for the specified CPU
+ * @cpu: the CPU to get the estimated utilization for
+ *
+ * The estimated utilization of a CPU is defined to be the maximum between
+ * its PELT utilization and the sum of the estimated utilization of the
+ * tasks currently RUNNABLE on that CPU.
+ *
+ * This allows us to properly represent the expected utilization of a CPU
+ * which has just started running a big task after a long sleep period. At
+ * the same time it preserves the benefits of the "blocked load" in
+ * describing the potential for other tasks waking up on the same CPU.
+ *
+ * Return: the estimated utilization for the specified CPU
+ */
+static inline unsigned long cpu_util_est(int cpu)
+{
+        unsigned long util, util_est;
+        unsigned long capacity;
+        struct cfs_rq *cfs_rq;
+
+        if (!sched_feat(UTIL_EST))
+                return cpu_util(cpu);
+
+        cfs_rq = &cpu_rq(cpu)->cfs;
+        util = cfs_rq->avg.util_avg;
+        util_est = cfs_rq->util_est_runnable;
+        util_est = max(util, util_est);
+
+        capacity = capacity_orig_of(cpu);
+        util_est = min(util_est, capacity);
+
+        return util_est;
+}
+
 static inline unsigned long task_util(struct task_struct *p)
 {
         return p->se.avg.util_avg;
@@ -6322,16 +6357,43 @@ static inline unsigned long task_util_est(struct task_struct *p)
  */
 static unsigned long cpu_util_wake(int cpu, struct task_struct *p)
 {
-        unsigned long util, capacity;
+        long util, util_est;
 
         /* Task has no contribution or is new */
         if (cpu != task_cpu(p) || !p->se.avg.last_update_time)
-                return cpu_util(cpu);
+                return cpu_util_est(cpu);
 
-        capacity = capacity_orig_of(cpu);
-        util = max_t(long, cpu_rq(cpu)->cfs.avg.util_avg - task_util(p), 0);
+        /* Discount task's blocked util from CPU's util */
+        util = cpu_util(cpu) - task_util(p);
+        util = max(util, 0L);
 
-        return (util >= capacity) ? capacity : util;
+        if (!sched_feat(UTIL_EST))
+                return util;
+
+        /*
+         * These are the main cases covered:
+         * - if *p is the only task sleeping on this CPU, then:
+         *      cpu_util (== task_util) > util_est (== 0)
+         *   and thus we return:
+         *      cpu_util_wake = (cpu_util - task_util) = 0
+         *
+         * - if other tasks are SLEEPING on the same CPU, which is just
+         *   waking up, then:
+         *      cpu_util >= task_util
+         *      cpu_util > util_est (== 0)
+         *   and thus we discount *p's blocked utilization to return:
+         *      cpu_util_wake = (cpu_util - task_util) >= 0
+         *
+         * - if other tasks are RUNNABLE on that CPU and
+         *      util_est > cpu_util
+         *   then we use util_est, since it returns a more restrictive
+         *   estimation of the spare capacity on that CPU, by just
+         *   considering the expected utilization of tasks already
+         *   runnable on that CPU.
+         */
+        util_est = cpu_rq(cpu)->cfs.util_est_runnable;
+        util = max(util, util_est);
+
+        return util;
 }
 
 /*
@@ -7857,7 +7919,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
                 load = source_load(i, load_idx);
 
                 sgs->group_load += load;
-                sgs->group_util += cpu_util(i);
+                sgs->group_util += cpu_util_est(i);
 
                 sgs->sum_nr_running += rq->cfs.h_nr_running;
 
                 nr_running = rq->nr_running;
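For reference, the rule this patch applies in both paths condenses to a
few lines. Here is a stand-alone toy model (invented names, not the
kernel implementation) of the max-then-clamp computation:

  #include <stdio.h>

  /*
   * Toy model of cpu_util_est(): the maximum of the CPU's PELT average
   * and the aggregated util_est of its RUNNABLE tasks, clamped by the
   * CPU's original capacity.
   */
  static unsigned long toy_cpu_util_est(unsigned long util_avg,
                                        unsigned long util_est_runnable,
                                        unsigned long capacity_orig)
  {
      unsigned long util_est = util_avg > util_est_runnable ?
                               util_avg : util_est_runnable;

      return util_est < capacity_orig ? util_est : capacity_orig;
  }

  int main(void)
  {
      /*
       * A big task just woke on this CPU: its PELT contribution has
       * decayed to 120, but the runnable util_est sum is still 800,
       * so the load balancer sees the CPU as far from empty.
       */
      printf("%lu\n", toy_cpu_util_est(120, 800, 1024));   /* prints 800 */
      return 0;
  }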