From patchwork Tue Feb 6 14:41:29 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Patrick Bellasi
X-Patchwork-Id: 127015
From: Patrick Bellasi
To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org
Cc: Ingo Molnar, Peter Zijlstra, "Rafael J. Wysocki",
Wysocki" , Viresh Kumar , Vincent Guittot , Paul Turner , Dietmar Eggemann , Morten Rasmussen , Juri Lelli , Todd Kjos , Joel Fernandes , Steve Muckle Subject: [PATCH v4 1/3] sched/fair: add util_est on top of PELT Date: Tue, 6 Feb 2018 14:41:29 +0000 Message-Id: <20180206144131.31233-2-patrick.bellasi@arm.com> X-Mailer: git-send-email 2.15.1 In-Reply-To: <20180206144131.31233-1-patrick.bellasi@arm.com> References: <20180206144131.31233-1-patrick.bellasi@arm.com> Sender: linux-pm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org The util_avg signal computed by PELT is too variable for some use-cases. For example, a big task waking up after a long sleep period will have its utilization almost completely decayed. This introduces some latency before schedutil will be able to pick the best frequency to run a task. The same issue can affect task placement. Indeed, since the task utilization is already decayed at wakeup, when the task is enqueued in a CPU, this can result in a CPU running a big task as being temporarily represented as being almost empty. This leads to a race condition where other tasks can be potentially allocated on a CPU which just started to run a big task which slept for a relatively long period. Moreover, the PELT utilization of a task can be updated every [ms], thus making it a continuously changing value for certain longer running tasks. This means that the instantaneous PELT utilization of a RUNNING task is not really meaningful to properly support scheduler decisions. For all these reasons, a more stable signal can do a better job of representing the expected/estimated utilization of a task/cfs_rq. Such a signal can be easily created on top of PELT by still using it as an estimator which produces values to be aggregated on meaningful events. This patch adds a simple implementation of util_est, a new signal built on top of PELT's util_avg where: util_est(task) = max(task::util_avg, f(task::util_avg@dequeue_times)) This allows to remember how big a task has been reported by PELT in its previous activations via the function: f(task::util_avg@dequeue_times). If a task should change its behavior and it runs even longer in a new activation, after a certain time its util_est will just track the original PELT signal (i.e. task::util_avg). The estimated utilization of cfs_rq is defined only for root ones. That's because the only sensible consumer of this signal are the scheduler and schedutil when looking for the overall CPU utilization due to FAIR tasks. For this reason, the estimated utilization of a root cfs_rq is simply defined as: util_est(cfs_rq) = max(cfs_rq::util_avg, cfs_rq::util_est_runnable) where: cfs_rq::util_est_runnable = sum(util_est(task)) for each RUNNABLE task on that root cfs_rq It's worth to note that the estimated utilization is tracked only for objects of interests, specifically: - Tasks: to better support tasks placement decisions - root cfs_rqs: to better support both tasks placement decisions as well as frequencies selection Signed-off-by: Patrick Bellasi Reviewed-by: Dietmar Eggemann Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Rafael J. 
Signed-off-by: Patrick Bellasi
Reviewed-by: Dietmar Eggemann
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Rafael J. Wysocki
Cc: Viresh Kumar
Cc: Paul Turner
Cc: Vincent Guittot
Cc: Morten Rasmussen
Cc: Dietmar Eggemann
Cc: linux-kernel@vger.kernel.org
Cc: linux-pm@vger.kernel.org
---
Changes in v4:
 - rebased on today's tip/sched/core (commit 460e8c3340a2)
 - renamed util_est's "last" into "enqueued"
 - using util_est's "enqueued" for both se and cfs_rqs (Joel)
 - update margin check to use more ASM friendly code (Peter)
 - optimize EWMA updates (Peter)
Changes in v3:
 - rebased on today's tip/sched/core (commit 07881166a892)
 - moved util_est into sched_avg (Peter)
 - use {READ,WRITE}_ONCE() for EWMA updates (Peter)
 - using unsigned int to fit all sched_avg into a single 64B cache line
Changes in v2:
 - rebase on top of v4.15-rc2
 - tested that overhauled PELT code does not affect the util_est
Change-Id: If5690c05b54bc24e1bcbaad85212656f71ab68a3
---
 include/linux/sched.h   | 16 ++++++++
 kernel/sched/debug.c    |  4 ++
 kernel/sched/fair.c     | 98 ++++++++++++++++++++++++++++++++++++++++++++++++-
 kernel/sched/features.h |  5 +++
 4 files changed, 122 insertions(+), 1 deletion(-)

--
2.15.1

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 166144c04ef6..0e374d69e431 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -275,6 +275,21 @@ struct load_weight {
         u32 inv_weight;
 };

+/**
+ * Estimated utilization for FAIR tasks.
+ *
+ * Support data structure to track an Exponential Weighted Moving Average
+ * (EWMA) of a FAIR task's utilization. New samples are added to the moving
+ * average each time a task completes an activation. The sample's weight is
+ * chosen so that the EWMA is relatively insensitive to transient changes
+ * in the task's workload.
+ */
+struct util_est {
+        unsigned int                    enqueued;
+        unsigned int                    ewma;
+#define UTIL_EST_WEIGHT_SHIFT           2
+};
+
 /*
  * The load_avg/util_avg accumulates an infinite geometric series
  * (see __update_load_avg() in kernel/sched/fair.c).
@@ -336,6 +351,7 @@ struct sched_avg {
         unsigned long                   load_avg;
         unsigned long                   runnable_load_avg;
         unsigned long                   util_avg;
+        struct util_est                 util_est;
 };

 struct sched_statistics {
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 1ca0130ed4f9..d4eb5532ea6b 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -567,6 +567,8 @@ void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
                         cfs_rq->avg.runnable_load_avg);
         SEQ_printf(m, "  .%-30s: %lu\n", "util_avg",
                         cfs_rq->avg.util_avg);
+        SEQ_printf(m, "  .%-30s: %u\n", "util_est_enqueued",
+                        cfs_rq->avg.util_est.enqueued);
         SEQ_printf(m, "  .%-30s: %ld\n", "removed.load_avg",
                         cfs_rq->removed.load_avg);
         SEQ_printf(m, "  .%-30s: %ld\n", "removed.util_avg",
@@ -1018,6 +1020,8 @@ void proc_sched_show_task(struct task_struct *p, struct pid_namespace *ns,
         P(se.avg.runnable_load_avg);
         P(se.avg.util_avg);
         P(se.avg.last_update_time);
+        P(se.avg.util_est.ewma);
+        P(se.avg.util_est.enqueued);
 #endif
         P(policy);
         P(prio);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7b6535987500..118f49c39b60 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5193,6 +5193,20 @@ static inline void hrtick_update(struct rq *rq)
 }
 #endif

+static inline unsigned long task_util(struct task_struct *p);
+static inline unsigned long _task_util_est(struct task_struct *p);
+
+static inline void util_est_enqueue(struct task_struct *p)
+{
+        struct cfs_rq *cfs_rq = &task_rq(p)->cfs;
+
+        if (!sched_feat(UTIL_EST))
+                return;
+
+        /* Update root cfs_rq's estimated utilization */
+        cfs_rq->avg.util_est.enqueued += _task_util_est(p);
+}
+
 /*
  * The enqueue_task method is called before nr_running is
  * increased. Here we update the fair scheduling stats and
@@ -5245,9 +5259,85 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
         if (!se)
                 add_nr_running(rq, 1);

+        util_est_enqueue(p);
         hrtick_update(rq);
 }

+/*
+ * Check if the specified (signed) value is within a specified margin,
+ * based on the observation that:
+ *      abs(x) < y := (unsigned)(x + y - 1) < (2 * y - 1)
+ */
+static inline bool within_margin(long value, unsigned int margin)
+{
+        return ((unsigned int)(value + margin - 1) < (2 * margin - 1));
+}
+
+static inline void util_est_dequeue(struct task_struct *p, int flags)
+{
+        struct cfs_rq *cfs_rq = &task_rq(p)->cfs;
+        unsigned long util_last;
+        long last_ewma_diff;
+        unsigned long ewma;
+        long util_est = 0;
+
+        if (!sched_feat(UTIL_EST))
+                return;
+
+        /*
+         * Update root cfs_rq's estimated utilization
+         *
+         * If *p is the last task, then the root cfs_rq's estimated
+         * utilization of a CPU is 0 by definition.
+         */
+        if (cfs_rq->nr_running) {
+                util_est  = READ_ONCE(cfs_rq->avg.util_est.enqueued);
+                util_est -= min_t(long, util_est, _task_util_est(p));
+        }
+        WRITE_ONCE(cfs_rq->avg.util_est.enqueued, util_est);
+
+        /*
+         * Skip update of task's estimated utilization when the task has not
+         * yet completed an activation, e.g. being migrated.
+         */
+        if (!(flags & DEQUEUE_SLEEP))
+                return;
+
+        ewma = READ_ONCE(p->se.avg.util_est.ewma);
+        util_last = task_util(p);
+
+        /*
+         * Skip update of task's estimated utilization when its EWMA is
+         * already ~1% close to its last activation value.
+         */
+        last_ewma_diff = util_last - ewma;
+        if (within_margin(last_ewma_diff, (SCHED_CAPACITY_SCALE / 100)))
+                return;
+
+        /*
+         * Update Task's estimated utilization
+         *
+         * When *p completes an activation we can consolidate another sample
+         * about the task size.
         * This is done by storing the last PELT value
+         * for this task and using this value to load another sample in the
+         * exponential weighted moving average:
+         *
+         *      ewma(t) = w *  task_util(p) + (1-w) * ewma(t-1)
+         *              = w *  task_util(p) +         ewma(t-1)  - w * ewma(t-1)
+         *              = w * (task_util(p) -         ewma(t-1)) +     ewma(t-1)
+         *              = w * (      last_ewma_diff            ) +     ewma(t-1)
+         *              = w * (last_ewma_diff  +  ewma(t-1) / w)
+         *
+         * Where 'w' is the weight of new samples, which is configured to be
+         * 0.25, thus making w=1/4 ( >>= UTIL_EST_WEIGHT_SHIFT)
+         */
+        ewma   = last_ewma_diff + (ewma << UTIL_EST_WEIGHT_SHIFT);
+        ewma >>= UTIL_EST_WEIGHT_SHIFT;
+
+        WRITE_ONCE(p->se.avg.util_est.ewma, ewma);
+        WRITE_ONCE(p->se.avg.util_est.enqueued, util_last);
+}
+
 static void set_next_buddy(struct sched_entity *se);

 /*
@@ -5304,6 +5394,7 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)

         if (!se)
                 sub_nr_running(rq, 1);

+        util_est_dequeue(p, flags);
         hrtick_update(rq);
 }

@@ -5767,7 +5858,6 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p,
         return affine;
 }

-static inline unsigned long task_util(struct task_struct *p);
 static unsigned long cpu_util_wake(int cpu, struct task_struct *p);

 static unsigned long capacity_spare_wake(int cpu, struct task_struct *p)
@@ -6262,6 +6352,12 @@ static inline unsigned long task_util(struct task_struct *p)
         return p->se.avg.util_avg;
 }

+static inline unsigned long _task_util_est(struct task_struct *p)
+{
+        return max(p->se.avg.util_est.ewma, p->se.avg.util_est.enqueued);
+}
+
 /*
  * cpu_util_wake: Compute cpu utilization with any contributions from
  * the waking task p removed.
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 9552fd5854bf..c459a4b61544 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -85,3 +85,8 @@ SCHED_FEAT(ATTACH_AGE_LOAD, true)
 SCHED_FEAT(WA_IDLE, true)
 SCHED_FEAT(WA_WEIGHT, true)
 SCHED_FEAT(WA_BIAS, true)
+
+/*
+ * UtilEstimation. Use estimated CPU utilization.
+ */
+SCHED_FEAT(UTIL_EST, false)
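As a side note, the within_margin() helper introduced by this patch folds
the two-sided check abs(x) < y into a single unsigned comparison. A
throwaway user-space check of that identity, as a sketch; the margin value
mirrors SCHED_CAPACITY_SCALE / 100 = 10 from the patch:

#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Same trick as the patch's within_margin(): one unsigned compare
 * implements abs(value) < margin for values that fit in an int. */
static bool within_margin(long value, unsigned int margin)
{
        return ((unsigned int)(value + margin - 1) < (2 * margin - 1));
}

int main(void)
{
        const unsigned int margin = 1024 / 100; /* SCHED_CAPACITY_SCALE / 100 */
        long x;

        /* Exhaustively compare against the naive abs() check. */
        for (x = -2048; x <= 2048; x++)
                assert(within_margin(x, margin) == (labs(x) < (long)margin));
        return 0;
}

The shift works because x + margin - 1 is non-negative and below
2 * margin - 1 exactly when -margin < x < margin; anything outside that
range either wraps around to a huge unsigned value or lands above the
bound.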
From patchwork Tue Feb 6 14:41:30 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Patrick Bellasi
X-Patchwork-Id: 127016
From: Patrick Bellasi
To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org
Cc: Ingo Molnar, Peter Zijlstra, "Rafael J. Wysocki",
Wysocki" , Viresh Kumar , Vincent Guittot , Paul Turner , Dietmar Eggemann , Morten Rasmussen , Juri Lelli , Todd Kjos , Joel Fernandes , Steve Muckle Subject: [PATCH v4 2/3] sched/fair: use util_est in LB and WU paths Date: Tue, 6 Feb 2018 14:41:30 +0000 Message-Id: <20180206144131.31233-3-patrick.bellasi@arm.com> X-Mailer: git-send-email 2.15.1 In-Reply-To: <20180206144131.31233-1-patrick.bellasi@arm.com> References: <20180206144131.31233-1-patrick.bellasi@arm.com> Sender: linux-pm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org When the scheduler looks at the CPU utilization, the current PELT value for a CPU is returned straight away. In certain scenarios this can have undesired side effects on task placement. For example, since the task utilization is decayed at wakeup time, when a long sleeping big task is enqueued it does not add immediately a significant contribution to the target CPU. As a result we generate a race condition where other tasks can be placed on the same CPU while it is still considered relatively empty. In order to reduce this kind of race conditions, this patch introduces the required support to integrate the usage of the CPU's estimated utilization in cpu_util_wake as well as in update_sg_lb_stats. The estimated utilization of a CPU is defined to be the maximum between its PELT's utilization and the sum of the estimated utilization of the tasks currently RUNNABLE on that CPU. This allows to properly represent the spare capacity of a CPU which, for example, has just got a big task running since a long sleep period. Signed-off-by: Patrick Bellasi Reviewed-by: Dietmar Eggemann Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Rafael J. Wysocki Cc: Viresh Kumar Cc: Paul Turner Cc: Vincent Guittot Cc: Morten Rasmussen Cc: Dietmar Eggemann Cc: linux-kernel@vger.kernel.org Cc: linux-pm@vger.kernel.org --- Changes in v4: - rebased on today's tip/sched/core (commit 460e8c3340a2) - ensure cpu_util_wake() is cpu_capacity_orig()'s clamped (Pavan) Changes in v3: - rebased on today's tip/sched/core (commit 07881166a892) Changes in v2: - rebase on top of v4.15-rc2 - tested that overhauled PELT code does not affect the util_est Change-Id: Id5a38d0e41aae7ca89f021f277851ee4e6ba5112 --- kernel/sched/fair.c | 81 +++++++++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 76 insertions(+), 5 deletions(-) -- 2.15.1 diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 118f49c39b60..2a2e88bced87 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -6347,6 +6347,41 @@ static unsigned long cpu_util(int cpu) return (util >= capacity) ? capacity : util; } +/** + * cpu_util_est: estimated utilization for the specified CPU + * @cpu: the CPU to get the estimated utilization for + * + * The estimated utilization of a CPU is defined to be the maximum between its + * PELT's utilization and the sum of the estimated utilization of the tasks + * currently RUNNABLE on that CPU. + * + * This allows to properly represent the expected utilization of a CPU which + * has just got a big task running since a long sleep period. At the same time + * however it preserves the benefits of the "blocked utilization" in + * describing the potential for other tasks waking up on the same CPU. 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 118f49c39b60..2a2e88bced87 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6347,6 +6347,41 @@ static unsigned long cpu_util(int cpu)
         return (util >= capacity) ? capacity : util;
 }

+/**
+ * cpu_util_est: estimated utilization for the specified CPU
+ * @cpu: the CPU to get the estimated utilization for
+ *
+ * The estimated utilization of a CPU is defined to be the maximum between
+ * its PELT utilization and the sum of the estimated utilization of the
+ * tasks currently RUNNABLE on that CPU.
+ *
+ * This allows us to properly represent the expected utilization of a CPU
+ * which has just got a big task running after a long sleep period. At the
+ * same time however it preserves the benefits of the "blocked utilization"
+ * in describing the potential for other tasks waking up on the same CPU.
+ *
+ * Return: the estimated utilization for the specified CPU
+ */
+static inline unsigned long cpu_util_est(int cpu)
+{
+        unsigned long util, util_est;
+        unsigned long capacity;
+        struct cfs_rq *cfs_rq;
+
+        if (!sched_feat(UTIL_EST))
+                return cpu_util(cpu);
+
+        cfs_rq = &cpu_rq(cpu)->cfs;
+        util = cfs_rq->avg.util_avg;
+        util_est = cfs_rq->avg.util_est.enqueued;
+        util_est = max(util, util_est);
+
+        capacity = capacity_orig_of(cpu);
+        util_est = min(util_est, capacity);
+
+        return util_est;
+}
+
 static inline unsigned long task_util(struct task_struct *p)
 {
         return p->se.avg.util_avg;
@@ -6364,16 +6399,52 @@ static inline unsigned long _task_util_est(struct task_struct *p)
  */
 static unsigned long cpu_util_wake(int cpu, struct task_struct *p)
 {
-        unsigned long util, capacity;
+        unsigned long capacity;
+        long util, util_est;

         /* Task has no contribution or is new */
         if (cpu != task_cpu(p) || !p->se.avg.last_update_time)
-                return cpu_util(cpu);
+                return cpu_util_est(cpu);

+        /* Discount task's blocked util from CPU's util */
+        util = cpu_util(cpu) - task_util(p);
+        util = max(util, 0L);
+
+        if (!sched_feat(UTIL_EST))
+                return util;
+
+        /*
+         * Covered cases:
+         * - if *p is the only task sleeping on this CPU, then:
+         *      cpu_util (== task_util) > util_est (== 0)
+         *   and thus we return:
+         *      cpu_util_wake = (cpu_util - task_util) = 0
+         *
+         * - if other tasks are SLEEPING on the same CPU, which is just
+         *   waking up, then:
+         *      cpu_util >= task_util
+         *      cpu_util > util_est (== 0)
+         *   and thus we discount *p's blocked utilization to return:
+         *      cpu_util_wake = (cpu_util - task_util) >= 0
+         *
+         * - if other tasks are RUNNABLE on that CPU and
+         *      util_est > cpu_util
+         *   then we use util_est since it returns a more restrictive
+         *   estimation of the spare capacity on that CPU, by just
+         *   considering the expected utilization of tasks already
+         *   runnable on that CPU.
+         */
+        util_est = cpu_rq(cpu)->cfs.avg.util_est.enqueued;
+        util = max(util, util_est);
+
+        /*
+         * Estimated utilization can exceed the CPU capacity, thus let's
+         * clamp to the maximum CPU capacity to ensure consistency with
+         * other cpu_util[_est] calls.
+         */
         capacity = capacity_orig_of(cpu);
-        util = max_t(long, cpu_rq(cpu)->cfs.avg.util_avg - task_util(p), 0);
+        util = min_t(unsigned long, util, capacity);

-        return (util >= capacity) ? capacity : util;
+        return util;
 }
 /*
@@ -7898,7 +7969,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,

                 load = source_load(i, load_idx);
                 sgs->group_load += load;
-                sgs->group_util += cpu_util(i);
+                sgs->group_util += cpu_util_est(i);

                 sgs->sum_nr_running += rq->cfs.h_nr_running;

                 nr_running = rq->nr_running;
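To make the three covered cases in cpu_util_wake() easier to follow, here
is a minimal user-space model of its logic, a sketch with made-up numbers;
cpu_util_wake_model and its parameters are hypothetical stand-ins for the
kernel helpers:

#include <stdio.h>

/* Toy model of cpu_util_wake(): p's blocked contribution is discounted
 * from the PELT utilization, then the result is lower-bounded by the
 * estimated utilization of tasks still RUNNABLE on that CPU, and
 * finally clamped to the CPU capacity. */
static long cpu_util_wake_model(long cpu_util, long task_util,
                                long util_est_enqueued, long capacity)
{
        long util = cpu_util - task_util;

        if (util < 0)
                util = 0;
        if (util_est_enqueued > util)
                util = util_est_enqueued;
        return util < capacity ? util : capacity;
}

int main(void)
{
        /* Case 1: *p is the only (sleeping) task: estimate goes to 0. */
        printf("%ld\n", cpu_util_wake_model(400, 400, 0, 1024));   /* 0 */

        /* Case 2: other tasks are sleeping too: only *p is discounted. */
        printf("%ld\n", cpu_util_wake_model(700, 400, 0, 1024));   /* 300 */

        /* Case 3: other tasks are RUNNABLE and util_est dominates. */
        printf("%ld\n", cpu_util_wake_model(300, 100, 500, 1024)); /* 500 */
        return 0;
}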
Wysocki" , Viresh Kumar , Vincent Guittot , Paul Turner , Dietmar Eggemann , Morten Rasmussen , Juri Lelli , Todd Kjos , Joel Fernandes , Steve Muckle Subject: [PATCH v4 3/3] sched/cpufreq_schedutil: use util_est for OPP selection Date: Tue, 6 Feb 2018 14:41:31 +0000 Message-Id: <20180206144131.31233-4-patrick.bellasi@arm.com> X-Mailer: git-send-email 2.15.1 In-Reply-To: <20180206144131.31233-1-patrick.bellasi@arm.com> References: <20180206144131.31233-1-patrick.bellasi@arm.com> Sender: linux-pm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org When schedutil looks at the CPU utilization, the current PELT value for that CPU is returned straight away. In certain scenarios this can have undesired side effects and delays on frequency selection. For example, since the task utilization is decayed at wakeup time, a long sleeping big task newly enqueued does not add immediately a significant contribution to the target CPU. This introduces some latency before schedutil will be able to detect the best frequency required by that task. Moreover, the PELT signal build-up time is a function of the current frequency, because of the scale invariant load tracking support. Thus, starting from a lower frequency, the utilization build-up time will increase even more and further delays the selection of the actual frequency which better serves the task requirements. In order to reduce this kind of latencies, we integrate the usage of the CPU's estimated utilization in the sugov_get_util function. This allows to properly consider the expected utilization of a CPU which, for example, has just got a big task running after a long sleep period. Ultimately this allows to select the best frequency to run a task right after its wake-up. Signed-off-by: Patrick Bellasi Reviewed-by: Dietmar Eggemann Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Rafael J. Wysocki Cc: Viresh Kumar Cc: Paul Turner Cc: Vincent Guittot Cc: Morten Rasmussen Cc: Dietmar Eggemann Cc: linux-kernel@vger.kernel.org Cc: linux-pm@vger.kernel.org --- Changes in v4: - rebased on today's tip/sched/core (commit 460e8c3340a2) - use util_est.enqueued for cfs_rq's util_est (Joel) - simplify cpu_util_cfs() integration (Dietmar) Changes in v3: - rebase on today's tip/sched/core (commit 07881166a892) - moved into Juri's cpu_util_cfs(), which should also address Rafael's suggestion to use a local variable. Changes in v2: - rebase on top of v4.15-rc2 - tested that overhauled PELT code does not affect the util_est Change-Id: I62c01ed90d8ad45b06383be03d39fcf8c9041646 --- kernel/sched/sched.h | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) -- 2.15.1 Acked-by: Rafael J. Wysocki diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 2e95505e23c6..f3c7b6a83ef4 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -2127,7 +2127,12 @@ static inline unsigned long cpu_util_dl(struct rq *rq) static inline unsigned long cpu_util_cfs(struct rq *rq) { - return rq->cfs.avg.util_avg; + if (!sched_feat(UTIL_EST)) + return rq->cfs.avg.util_avg; + + return max_t(unsigned long, + rq->cfs.avg.util_avg, + rq->cfs.avg.util_est.enqueued); } #endif