From patchwork Thu Aug 24 18:08:52 2017
X-Patchwork-Submitter: Patrick Bellasi
X-Patchwork-Id: 110940
From: Patrick Bellasi
To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org
Cc: Ingo Molnar, Peter Zijlstra, Tejun Heo, "Rafael J. Wysocki", Paul Turner, Vincent Guittot, John Stultz, Morten Rasmussen, Dietmar Eggemann, Juri Lelli, Tim Murray, Todd Kjos, Andres Oportus, Joel Fernandes, Viresh Kumar
Subject: [RFCv4 1/6] sched/core: add utilization clamping to CPU controller
Date: Thu, 24 Aug 2017 19:08:52 +0100
Message-Id: <20170824180857.32103-2-patrick.bellasi@arm.com>
In-Reply-To: <20170824180857.32103-1-patrick.bellasi@arm.com>
References: <20170824180857.32103-1-patrick.bellasi@arm.com>

The cgroup CPU controller allows a specified (maximum) bandwidth to be assigned to the tasks of a group. However, this bandwidth is defined and enforced only on a temporal basis, without considering the actual frequency the CPU is running at. Thus, the amount of computation completed by a task within an allocated bandwidth can vary widely depending on the frequency the CPU runs that task at.

With the availability of schedutil, the scheduler is now able to drive frequency selection based on actual task utilization. Thus, it is now possible to extend the CPU controller to specify the minimum (or maximum) utilization which a task is allowed to generate. By adding new constraints on the minimum and maximum utilization allowed for tasks in a CPU control group, it becomes possible to better control the actual amount of CPU bandwidth consumed by those tasks.

The ultimate goal of this new pair of constraints is to enable:

- boosting: selecting a higher execution frequency for small tasks which affect the user's interactive experience

- capping: enforcing a lower execution frequency (which usually improves energy efficiency) for big tasks which are mainly related to background activities and have no direct impact on the user experience.

This patch extends the CPU controller by adding two new attributes, util_min and util_max, which can be used to enforce frequency boosting and capping. Specifically:

- util_min: defines the minimum CPU utilization which should be considered, e.g. when schedutil selects the frequency for a CPU while a task in this group is RUNNABLE; i.e. the task will run at least at the minimum frequency corresponding to the util_min utilization

- util_max: defines the maximum CPU utilization which should be considered, e.g. when schedutil selects the frequency for a CPU while a task in this group is RUNNABLE; i.e. the task will run at most at the maximum frequency corresponding to the util_max utilization

These attributes:

a) are tunable at all hierarchy levels, i.e. at the root group level too, thus allowing minimum and maximum frequency constraints to be defined for all otherwise non-classified tasks (e.g. autogroups)

b) allow the creation of subgroups of tasks which do not violate the utilization constraints defined by the parent group. Tasks in a subgroup can only be further boosted and/or capped, which matches the "limits" schema proposed by the "Resource Distribution Model (RDM)" in the cgroups v2 documentation: Documentation/cgroup-v2.txt

This patch provides the basic support to expose the two new attributes and to validate their run-time updates based on the "limits" of the aforementioned RDM schema.
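For illustration only (not part of the patch): assuming the cpu controller is mounted at /sys/fs/cgroup/cpu and that "interactive" and "background" groups already exist, the new attributes could be driven from user space as sketched below. Only the util_min/util_max attribute names and the 0..SCHED_CAPACITY_SCALE (1024) value range come from this patch; the paths and group names are hypothetical.

/*
 * Illustrative user-space sketch: boost an "interactive" group and cap a
 * "background" group through the new attributes.
 */
#include <stdio.h>

static int cg_write(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	fprintf(f, "%s\n", val);
	return fclose(f);
}

int main(void)
{
	/* Interactive tasks: request at least ~50% of CPU capacity */
	cg_write("/sys/fs/cgroup/cpu/interactive/cpu.util_min", "512");

	/* Background tasks: never request more than ~25% of CPU capacity */
	cg_write("/sys/fs/cgroup/cpu/background/cpu.util_max", "256");

	return 0;
}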
Signed-off-by: Patrick Bellasi
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Tejun Heo
Cc: linux-kernel@vger.kernel.org
Cc: linux-pm@vger.kernel.org
--- include/linux/sched.h | 7 ++ init/Kconfig | 17 +++++ kernel/sched/core.c | 180 ++++++++++++++++++++++++++++++++++++++++++++++++++ kernel/sched/sched.h | 22 ++++++ 4 files changed, 226 insertions(+) -- 2.14.1 diff --git a/include/linux/sched.h b/include/linux/sched.h index c28b182c9833..265ac0898f9e 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -241,6 +241,13 @@ struct vtime { u64 gtime; }; +enum uclamp_id { + UCLAMP_MIN = 0, /* Minimum utilization */ + UCLAMP_MAX, /* Maximum utilization */ + /* Utilization clamping constraints count */ + UCLAMP_CNT +}; + struct sched_info { #ifdef CONFIG_SCHED_INFO /* Cumulative counters: */ diff --git a/init/Kconfig b/init/Kconfig index 8514b25db21c..db736529f08b 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -754,6 +754,23 @@ config RT_GROUP_SCHED endif #CGROUP_SCHED +config UTIL_CLAMP + bool "Utilization clamping per group of tasks" + depends on CPU_FREQ_GOV_SCHEDUTIL + depends on CGROUP_SCHED + default n + help + This feature enables the scheduler to track the clamped utilization + of each CPU based on RUNNABLE tasks currently scheduled on that CPU. + + When this option is enabled, the user can specify a min and max + CPU bandwidth which is allowed for each single task in a group. + The max bandwidth allows clamping of the maximum frequency a task + can use, while the min bandwidth allows defining a minimum + frequency a task will always use. + + If in doubt, say N. + config CGROUP_PIDS bool "PIDs controller" help diff --git a/kernel/sched/core.c b/kernel/sched/core.c index f9f9948e2470..20b5a11d64ab 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -751,6 +751,48 @@ static void set_load_weight(struct task_struct *p) load->inv_weight = sched_prio_to_wmult[prio]; } +#ifdef CONFIG_UTIL_CLAMP +/** + * uclamp_mutex: serialize updates of TG's utilization clamp values + */ +static DEFINE_MUTEX(uclamp_mutex); + +/** + * alloc_uclamp_sched_group: initialize a new TG for utilization clamping + * @tg: the newly created task group + * @parent: its parent task group + * + * A newly created task group inherits its utilization clamp values, for all + * clamp indexes, from its parent task group.
+ */ +static inline void alloc_uclamp_sched_group(struct task_group *tg, + struct task_group *parent) +{ + int clamp_id; + + for (clamp_id = 0; clamp_id < UCLAMP_CNT; ++clamp_id) + tg->uclamp[clamp_id] = parent->uclamp[clamp_id]; +} + +/** + * init_uclamp: initialize data structures required for utilization clamping + */ +static inline void init_uclamp(void) +{ + int clamp_id; + + mutex_init(&uclamp_mutex); + + /* Initialize root TG's to default (none) clamp values */ + for (clamp_id = 0; clamp_id < UCLAMP_CNT; ++clamp_id) + root_task_group.uclamp[clamp_id] = uclamp_none(clamp_id); +} +#else +static inline void alloc_uclamp_sched_group(struct task_group *tg, + struct task_group *parent) { } +static inline void init_uclamp(void) { } +#endif /* CONFIG_UTIL_CLAMP */ + static inline void enqueue_task(struct rq *rq, struct task_struct *p, int flags) { if (!(flags & ENQUEUE_NOCLOCK)) @@ -5907,6 +5949,8 @@ void __init sched_init(void) init_schedstats(); + init_uclamp(); + scheduler_running = 1; } @@ -6099,6 +6143,8 @@ struct task_group *sched_create_group(struct task_group *parent) if (!alloc_rt_sched_group(tg, parent)) goto err; + alloc_uclamp_sched_group(tg, parent); + return tg; err: @@ -6319,6 +6365,128 @@ static void cpu_cgroup_attach(struct cgroup_taskset *tset) sched_move_task(task); } +#ifdef CONFIG_UTIL_CLAMP +static int cpu_util_min_write_u64(struct cgroup_subsys_state *css, + struct cftype *cftype, u64 min_value) +{ + struct cgroup_subsys_state *pos; + struct task_group *tg; + int ret = -EINVAL; + + if (min_value > SCHED_CAPACITY_SCALE) + return ret; + + mutex_lock(&uclamp_mutex); + rcu_read_lock(); + + tg = css_tg(css); + + /* Already at the required value */ + if (tg->uclamp[UCLAMP_MIN] == min_value) { + ret = 0; + goto out; + } + + /* Ensure to not exceed the maximum clamp value */ + if (tg->uclamp[UCLAMP_MAX] < min_value) + goto out; + + /* Ensure min clamp fits within parent's clamp value */ + if (tg->parent && + tg->parent->uclamp[UCLAMP_MIN] > min_value) + goto out; + + /* Ensure each child is a restriction of this TG */ + css_for_each_child(pos, css) { + if (css_tg(pos)->uclamp[UCLAMP_MIN] < min_value) + goto out; + } + + /* Update TG's utilization clamp */ + tg->uclamp[UCLAMP_MIN] = min_value; + ret = 0; + +out: + rcu_read_unlock(); + mutex_unlock(&uclamp_mutex); + + return ret; +} + +static int cpu_util_max_write_u64(struct cgroup_subsys_state *css, + struct cftype *cftype, u64 max_value) +{ + struct cgroup_subsys_state *pos; + struct task_group *tg; + int ret = -EINVAL; + + if (max_value > SCHED_CAPACITY_SCALE) + return ret; + + mutex_lock(&uclamp_mutex); + rcu_read_lock(); + + tg = css_tg(css); + + /* Already at the required value */ + if (tg->uclamp[UCLAMP_MAX] == max_value) { + ret = 0; + goto out; + } + + /* Ensure to not go below the minimum clamp value */ + if (tg->uclamp[UCLAMP_MIN] > max_value) + goto out; + + /* Ensure max clamp fits within parent's clamp value */ + if (tg->parent && + tg->parent->uclamp[UCLAMP_MAX] < max_value) + goto out; + + /* Ensure each child is a restriction of this TG */ + css_for_each_child(pos, css) { + if (css_tg(pos)->uclamp[UCLAMP_MAX] > max_value) + goto out; + } + + /* Update TG's utilization clamp */ + tg->uclamp[UCLAMP_MAX] = max_value; + ret = 0; + +out: + rcu_read_unlock(); + mutex_unlock(&uclamp_mutex); + + return ret; +} + +static inline u64 cpu_uclamp_read(struct cgroup_subsys_state *css, + enum uclamp_id clamp_id) +{ + struct task_group *tg; + u64 util_clamp; + + rcu_read_lock(); + tg = css_tg(css); + util_clamp = 
tg->uclamp[clamp_id]; + rcu_read_unlock(); + + return util_clamp; +} + +static u64 cpu_util_min_read_u64(struct cgroup_subsys_state *css, + struct cftype *cft) +{ + return cpu_uclamp_read(css, UCLAMP_MIN); +} + +static u64 cpu_util_max_read_u64(struct cgroup_subsys_state *css, + struct cftype *cft) +{ + return cpu_uclamp_read(css, UCLAMP_MAX); +} +#endif /* CONFIG_UTIL_CLAMP */ + #ifdef CONFIG_FAIR_GROUP_SCHED static int cpu_shares_write_u64(struct cgroup_subsys_state *css, struct cftype *cftype, u64 shareval) @@ -6641,6 +6809,18 @@ static struct cftype cpu_files[] = { .read_u64 = cpu_rt_period_read_uint, .write_u64 = cpu_rt_period_write_uint, }, +#endif +#ifdef CONFIG_UTIL_CLAMP + { + .name = "util_min", + .read_u64 = cpu_util_min_read_u64, + .write_u64 = cpu_util_min_write_u64, + }, + { + .name = "util_max", + .read_u64 = cpu_util_max_read_u64, + .write_u64 = cpu_util_max_write_u64, + }, #endif { } /* Terminate */ }; diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index eeef1a3086d1..982340b8870b 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -330,6 +330,10 @@ struct task_group { #endif struct cfs_bandwidth cfs_bandwidth; + +#ifdef CONFIG_UTIL_CLAMP + unsigned int uclamp[UCLAMP_CNT]; +#endif }; #ifdef CONFIG_FAIR_GROUP_SCHED @@ -365,6 +369,24 @@ static inline int walk_tg_tree(tg_visitor down, tg_visitor up, void *data) extern int tg_nop(struct task_group *tg, void *data); +#ifdef CONFIG_UTIL_CLAMP +/** + * uclamp_none: default value for a clamp + * + * This returns the default value for each clamp + * - 0 for a min utilization clamp + * - SCHED_CAPACITY_SCALE for a max utilization clamp + * + * Return: the default value for a given utilization clamp + */ +static inline unsigned int uclamp_none(int clamp_id) +{ + if (clamp_id == UCLAMP_MIN) + return 0; + return SCHED_CAPACITY_SCALE; +} +#endif /* CONFIG_UTIL_CLAMP */ + extern void free_fair_sched_group(struct task_group *tg); extern int alloc_fair_sched_group(struct task_group *tg, struct task_group *parent); extern void online_fair_sched_group(struct task_group *tg);
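As an aside on the validation done above by cpu_util_min_write_u64(): the hierarchical "limits" checks can be restated compactly as a stand-alone predicate. This is an illustrative sketch with hypothetical types and names, no locking or cgroup plumbing; it is not code from the patch. The util_max handler applies the mirrored checks and is not repeated here.

/*
 * A new util_min is accepted only if it does not exceed the group's own
 * util_max, is not smaller than the parent's util_min, and is not larger
 * than any child's util_min (children may only be more restricted).
 */
struct group_clamps {
	unsigned int util_min;
	unsigned int util_max;
};

static int util_min_is_valid(unsigned int new_min,
			     const struct group_clamps *tg,
			     const struct group_clamps *parent,
			     const struct group_clamps *children,
			     int nr_children)
{
	if (new_min > tg->util_max)
		return 0;			/* would cross our own max */
	if (parent && new_min < parent->util_min)
		return 0;			/* would be less boosted than the parent */
	for (int i = 0; i < nr_children; i++) {
		if (children[i].util_min < new_min)
			return 0;		/* a child would end up less boosted */
	}
	return 1;
}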
From patchwork Thu Aug 24 18:08:53 2017
X-Patchwork-Submitter: Patrick Bellasi
X-Patchwork-Id: 110941
From: Patrick Bellasi
To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org
Cc: Ingo Molnar, Peter Zijlstra, Tejun Heo, "Rafael J. Wysocki", Paul Turner, Vincent Guittot, John Stultz, Morten Rasmussen, Dietmar Eggemann, Juri Lelli, Tim Murray, Todd Kjos, Andres Oportus, Joel Fernandes, Viresh Kumar
Subject: [RFCv4 2/6] sched/core: map cpu's task groups to clamp groups
Date: Thu, 24 Aug 2017 19:08:53 +0100
Message-Id: <20170824180857.32103-3-patrick.bellasi@arm.com>
In-Reply-To: <20170824180857.32103-1-patrick.bellasi@arm.com>
References: <20170824180857.32103-1-patrick.bellasi@arm.com>

To properly support per-task utilization clamping, each CPU needs to know which clamp values are required by tasks currently RUNNABLE on that CPU. However, multiple task groups can define the same clamp value for a given clamp index (i.e. util_{min,max}). Thus, a mechanism is required to map clamp values to a properly defined data structure which allows fast and efficient aggregation of the clamp values coming from tasks belonging to different task_groups.

Such a data structure can be an array of reference counters, where each slot counts how many tasks requiring a certain clamp value are currently enqueued. Each clamp value can then be mapped to a "clamp index" which identifies its position within the reference counters array.
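As a rough illustration of this value-to-group mapping, here is a user-space sketch of the "reuse a slot tracking the same value, otherwise claim a free one, otherwise fail" idea. The names (slot, group_find_or_get, slots_init) and sizes are made up; this is not the kernel code.

/*
 * Minimal sketch of the clamp value -> clamp group mapping described above.
 */
#include <errno.h>

#define GROUPS_COUNT	3	/* stand-in for CONFIG_UCLAMP_GROUPS_COUNT */
#define VALUE_NONE	-1

struct slot {
	int value;		/* clamp value tracked by this group */
	int refcount;		/* how many users reference this value */
};

static struct slot slots[GROUPS_COUNT + 1];	/* one extra slot, mirroring the patch's sizing */

static void slots_init(void)
{
	for (int i = 0; i <= GROUPS_COUNT; i++)
		slots[i] = (struct slot){ .value = VALUE_NONE, .refcount = 0 };
}

static int group_find_or_get(int clamp_value)
{
	int free_idx = -1;

	for (int i = 0; i <= GROUPS_COUNT; i++) {
		if (slots[i].value == clamp_value) {
			slots[i].refcount++;	/* reuse the existing group */
			return i;
		}
		if (slots[i].value == VALUE_NONE && free_idx < 0)
			free_idx = i;		/* remember the first free slot */
	}
	if (free_idx < 0)
		return -ENOSPC;			/* no group left for a new value */
	slots[free_idx].value = clamp_value;
	slots[free_idx].refcount = 1;
	return free_idx;
}

The diagram below, from the original changelog, shows where this mapping sits between the slow path (task_group attribute writes) and the fast path (enqueue/dequeue and schedutil).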
            SLOW PATH             :                                  :   FAST PATH
       task_group::write          :   sched/core::enqueue/dequeue   :   cpufreq_schedutil
                                  :                                  :
  +----------------+         +--------------------+         +-------------------+
  |   TASK GROUP   |         |    CLAMP GROUP     |         |    CPU CLAMPS     |
  +----------------+         +--------------------+         +-------------------+
  |                |         |  clamp_{min,max}   |         |  clamp_{min,max}  |
  | util_{min,max} |         |      tg_count      |         |    tasks count    |
  +----------------+         +--------------------+         +-------------------+
                                  :                                  :
          +------------------>    :                                  :
        map(clamp_value, clamp_group)                                :
                                  :          +------------------->   :
                                  :        ref_count(clamp_group)    :
                                  :                                  :

This patch introduces the support to map task groups onto "clamp groups". Specifically, it introduces the functions required to translate a clamp value into a clamp group index.

Only a limited number of (different) clamp values are supported since:
1. there are usually only a few classes of workloads for which it makes sense to boost/cap to different frequencies, e.g. background vs foreground, interactive vs low-priority
2. it allows a simpler and more memory/time efficient tracking of the per-CPU clamp values in the fast path

The number of possible different clamp values is currently defined at compile time. It's worth noting that this does not impose a limitation on the number of task groups that can be created. Indeed, each new task group always maps to the clamp groups of its parent. Changing the clamp value for a TG, however, can result in a -ENOSPC error if doing so would exceed the maximum number of different clamp values supported.

Signed-off-by: Patrick Bellasi
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Tejun Heo
Cc: linux-kernel@vger.kernel.org
Cc: linux-pm@vger.kernel.org
--- init/Kconfig | 19 +++ kernel/sched/core.c | 348 +++++++++++++++++++++++++++++++++++++++++++++++++---- kernel/sched/sched.h | 21 +++- 3 files changed, 363 insertions(+), 25 deletions(-) -- 2.14.1 diff --git a/init/Kconfig b/init/Kconfig index db736529f08b..5f0c246f2a3a 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -771,6 +771,25 @@ config UTIL_CLAMP If in doubt, say N. +config UCLAMP_GROUPS_COUNT + int "Number of utilization clamp values supported" + range 1 127 + depends on UTIL_CLAMP + default 3 + help + This defines the maximum number of different utilization clamp + values which can be concurrently enforced for each utilization + clamp index (i.e. minimum and maximum). + + Only a limited number of clamp values are supported because: + 1. there are usually only few classes of workloads for which it + makes sense to boost/cap to different frequencies + e.g. background vs foreground, interactive vs low-priority + 2. it allows a simpler and more memory/time efficient tracking of + the per-CPU clamp values. + + If in doubt, use the default value. + config CGROUP_PIDS bool "PIDs controller" help diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 20b5a11d64ab..0d39766f2b03 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -757,6 +757,232 @@ static void set_load_weight(struct task_struct *p) */ static DEFINE_MUTEX(uclamp_mutex); +/** + * uclamp_map: a clamp group representing a clamp value + * + * Since only a limited number of different clamp values are supported, this + * map allows tracking how many TG's use the same clamp value and it defines + * the clamp group index used by the per-CPU accounting in the fast-path + * (i.e. tasks enqueuing/dequeuing) + * + * Since we support both max and min utilization clamp value, a matrix is used + * to map clamp values to group indexes: + * - rows map the different clamp indexes + * i.e.
minimum or maximum utilization + * - columns map the different clamp groups + * i.e. TG's with similar clamp value for a given clamp index + * + * NOTE: clamp group 0 is reserved for the tracking of non clamped tasks. + * Thus we allocate one more slot than the value of + * CONFIG_UCLAMP_GROUPS_COUNT. + */ +struct uclamp_map { + int value; + int tg_count; + raw_spinlock_t tg_lock; +}; + +/** + * uclamp_maps: map TGs into per-CPU utilization clamp values + * + * This is a matrix where: + * - rows are indexed by clamp_id, and collects the clamp group for a given + * clamp index (i.e. minimum or maximum utilization) + * - cols are indexed by group_id, and represents an actual clamp group (i.e. + * a uclamp_map instance) + * + * Here is the map layout and, right below, how entries are accessed by the + * following code. + * + * uclamp_maps is a matrix of + * +------- UCLAMP_CNT by CONFIG_UCLAMP_GROUPS_COUNT+1 entries + * | | + * | /---------------+---------------\ + * | +------------+ +------------+ + * | / UCLAMP_MIN | value | | value | + * | | | tg_count |...... | tg_count | + * | | +------------+ +------------+ + * +--+ +------------+ +------------+ + * | | value | | value | + * \ UCLAMP_MAX | tg_count |...... | tg_count | + * +-----^------+ +----^-------+ + * | | + * uc_map = + | + * &uclamp_maps[clamp_id][0] + + * clamp_value = + * uc_map[group_id].value + */ +static struct uclamp_map uclamp_maps[UCLAMP_CNT][CONFIG_UCLAMP_GROUPS_COUNT + 1]; + +/** + * uclamp_group_available: check if a clamp group is available + * @clamp_id: the utilization clamp index (i.e. min or max clamp) + * @group_id: the group index in the given clamp_id + * + * A clamp group is not free if there is at least one TG's using a clamp value + * mapped on the specified clamp_id. These TG's are reference counted by the + * tg_count of a uclamp_map entry. + * + * Return: true if there are no TG's mapped on the specified clamp + * index and group + */ +static inline bool uclamp_group_available(int clamp_id, int group_id) +{ + struct uclamp_map *uc_map = &uclamp_maps[clamp_id][0]; + + return (uc_map[group_id].value == UCLAMP_NONE); +} + +/** + * uclamp_group_init: map a clamp value on a specified clamp group + * @clamp_id: the utilization clamp index (i.e. min or max clamp) + * @group_id: the group index to map a given clamp_value + * @clamp_value: the utilization clamp value to map + * + * Each different clamp value, for a given clamp index (i.e. min/max + * utilization clamp), is mapped by a clamp group which index is use by the + * fast-path code to keep track of active tasks requiring a certain clamp + * value. + * + * This function initializes a clamp group to track tasks from the fast-path. + */ +static inline void uclamp_group_init(int clamp_id, int group_id, + unsigned int clamp_value) +{ + struct uclamp_map *uc_map = &uclamp_maps[clamp_id][0]; + + uc_map[group_id].value = clamp_value; + uc_map[group_id].tg_count = 0; +} + +/** + * uclamp_group_reset: reset a specified clamp group + * @clamp_id: the utilization clamp index (i.e. min or max clamping) + * @group_id: the group index to release + * + * A clamp group can be reset every time there are no more TGs using the + * clamp value it maps for a given clamp index. + */ +static inline void uclamp_group_reset(int clamp_id, int group_id) +{ + uclamp_group_init(clamp_id, group_id, UCLAMP_NONE); +} + +/** + * uclamp_group_find: find the group index of a utilization clamp group + * @clamp_id: the utilization clamp index (i.e. 
min or max clamping) + * @clamp_value: the utilization clamping value lookup for + * + * Verify if a group has been assigned to a certain clamp value and return + * its index to be used for accounting. + * + * Since only a limited number of utilization clamp groups are allowed, if no + * groups have been assigned for the specified value, a new group is assigned + * if possible. Otherwise an error is returned, meaning that a different clamp + * value is not (currently) supported. + */ +static int +uclamp_group_find(int clamp_id, unsigned int clamp_value) +{ + struct uclamp_map *uc_map = &uclamp_maps[clamp_id][0]; + int free_group_id = UCLAMP_NONE; + unsigned int group_id = 0; + + for ( ; group_id <= CONFIG_UCLAMP_GROUPS_COUNT; ++group_id) { + /* Keep track of first free clamp group */ + if (uclamp_group_available(clamp_id, group_id)) { + if (free_group_id == UCLAMP_NONE) + free_group_id = group_id; + continue; + } + /* Return index of first group with same clamp value */ + if (uc_map[group_id].value == clamp_value) + return group_id; + } + /* Default to first free clamp group */ + if (group_id > CONFIG_UCLAMP_GROUPS_COUNT) + group_id = free_group_id; + /* All clamp group already tracking different clamp values */ + if (group_id == UCLAMP_NONE) + return -ENOSPC; + return group_id; +} + +/** + * uclamp_group_put: decrease the reference count for a clamp group + * @clamp_id: the clamp index which was affected by a task group + * @uc_tg: the utilization clamp data for that task group + * + * When the clamp value for a task group is changed we decrease the reference + * count for the clamp group mapping its current clamp value. A clamp group is + * released when there are no more task groups referencing its clamp value. + */ +static inline void uclamp_group_put(int clamp_id, int group_id) +{ + struct uclamp_map *uc_map = &uclamp_maps[clamp_id][0]; + unsigned long flags; + + /* Ignore TG's not yet attached */ + if (group_id == UCLAMP_NONE) + return; + + /* Remove TG from this clamp group */ + raw_spin_lock_irqsave(&uc_map[group_id].tg_lock, flags); + uc_map[group_id].tg_count -= 1; + if (uc_map[group_id].tg_count == 0) + uclamp_group_reset(clamp_id, group_id); + raw_spin_unlock_irqrestore(&uc_map[group_id].tg_lock, flags); +} + +/** + * uclamp_group_get: increase the reference count for a clamp group + * @css: reference to the task group to account + * @clamp_id: the clamp index affected by the task group + * @uc_tg: the utilization clamp data for the task group + * @clamp_value: the new clamp value for the task group + * + * Each time a task group changes the utilization clamp value, for a specified + * clamp index, we need to find an available clamp group which can be used + * to track its new clamp value. The corresponding clamp group index will be + * used by tasks in this task group to reference count the clamp value on CPUs + * where they are enqueued. + * + * Return: -ENOSPC if there are not available clamp groups, 0 on success. 
+ */ +static inline int uclamp_group_get(struct cgroup_subsys_state *css, + int clamp_id, struct uclamp_tg *uc_tg, + unsigned int clamp_value) +{ + struct uclamp_map *uc_map = &uclamp_maps[clamp_id][0]; + int prev_group_id = uc_tg->group_id; + int next_group_id = UCLAMP_NONE; + unsigned long flags; + + /* Lookup for a usable utilization clamp group */ + next_group_id = uclamp_group_find(clamp_id, clamp_value); + if (next_group_id < 0) { + pr_err("Cannot allocate more than %d utilization clamp groups\n", + CONFIG_UCLAMP_GROUPS_COUNT); + return -ENOSPC; + } + + /* Allocate new clamp group for this clamp value */ + if (uclamp_group_available(clamp_id, next_group_id)) + uclamp_group_init(clamp_id, next_group_id, clamp_value); + + /* Update TG's clamp values and attach it to new clamp group */ + raw_spin_lock_irqsave(&uc_map[next_group_id].tg_lock, flags); + uc_tg->value = clamp_value; + uc_tg->group_id = next_group_id; + uc_map[next_group_id].tg_count += 1; + raw_spin_unlock_irqrestore(&uc_map[next_group_id].tg_lock, flags); + + /* Release the previous clamp group */ + uclamp_group_put(clamp_id, prev_group_id); + + return 0; +} + /** * alloc_uclamp_sched_group: initialize a new TG's for utilization clamping * @tg: the newly created task group @@ -764,14 +990,52 @@ static DEFINE_MUTEX(uclamp_mutex); * * A newly created task group inherits its utilization clamp values, for all * clamp indexes, from its parent task group. + * This ensures that its values are properly initialized and that the task + * group is accounted in the same parent's group index. + * + * Return: !0 on error + */ +static inline int alloc_uclamp_sched_group(struct task_group *tg, + struct task_group *parent) +{ + struct uclamp_tg *uc_tg; + int clamp_id; + int ret = 1; + + for (clamp_id = 0; clamp_id < UCLAMP_CNT; ++clamp_id) { + uc_tg = &tg->uclamp[clamp_id]; + + uc_tg->value = parent->uclamp[clamp_id].value; + uc_tg->group_id = UCLAMP_NONE; + + if (uclamp_group_get(NULL, clamp_id, uc_tg, + parent->uclamp[clamp_id].value)) { + ret = 0; + goto out; + } + } + +out: + return ret; +} + +/** + * release_uclamp_sched_group: release utilization clamp references of a TG + * @tg: the task group being removed + * + * An empty task group can be removed only when it has no more tasks or child + * groups. This means that we can also safely release all the reference + * counting to clamp groups. 
*/ -static inline void alloc_uclamp_sched_group(struct task_group *tg, - struct task_group *parent) +static inline void free_uclamp_sched_group(struct task_group *tg) { + struct uclamp_tg *uc_tg; int clamp_id; - for (clamp_id = 0; clamp_id < UCLAMP_CNT; ++clamp_id) - tg->uclamp[clamp_id] = parent->uclamp[clamp_id]; + for (clamp_id = 0; clamp_id < UCLAMP_CNT; ++clamp_id) { + uc_tg = &tg->uclamp[clamp_id]; + uclamp_group_put(clamp_id, uc_tg->group_id); + } } /** @@ -779,17 +1043,49 @@ static inline void alloc_uclamp_sched_group(struct task_group *tg, */ static inline void init_uclamp(void) { + struct uclamp_map *uc_map; + struct uclamp_tg *uc_tg; + int group_id; int clamp_id; mutex_init(&uclamp_mutex); + for (clamp_id = 0; clamp_id < UCLAMP_CNT; ++clamp_id) { + uc_map = &uclamp_maps[clamp_id][0]; + /* Init TG's clamp map */ + group_id = 0; + for ( ; group_id <= CONFIG_UCLAMP_GROUPS_COUNT; ++group_id) { + uc_map[group_id].value = UCLAMP_NONE; + raw_spin_lock_init(&uc_map[group_id].tg_lock); + } + } + + /* Root TG's are initialized to the first clamp group */ + group_id = 0; + /* Initialize root TG's to default (none) clamp values */ - for (clamp_id = 0; clamp_id < UCLAMP_CNT; ++clamp_id) - root_task_group.uclamp[clamp_id] = uclamp_none(clamp_id); + for (clamp_id = 0; clamp_id < UCLAMP_CNT; ++clamp_id) { + uc_map = &uclamp_maps[clamp_id][0]; + + /* Map root TG's clamp value */ + uclamp_group_init(clamp_id, group_id, uclamp_none(clamp_id)); + + /* Init root TG's clamp group */ + uc_tg = &root_task_group.uclamp[clamp_id]; + uc_tg->value = uclamp_none(clamp_id); + uc_tg->group_id = group_id; + + /* Attach root TG's clamp group */ + uc_map[group_id].tg_count = 1; + } } #else -static inline void alloc_uclamp_sched_group(struct task_group *tg, - struct task_group *parent) { } +static inline int alloc_uclamp_sched_group(struct task_group *tg, + struct task_group *parent) +{ + return 1; +} +static inline void free_uclamp_sched_group(struct task_group *tg) { } static inline void init_uclamp(void) { } #endif /* CONFIG_UTIL_CLAMP */ @@ -6122,6 +6418,7 @@ static DEFINE_SPINLOCK(task_group_lock); static void sched_free_group(struct task_group *tg) { + free_uclamp_sched_group(tg); free_fair_sched_group(tg); free_rt_sched_group(tg); autogroup_free(tg); @@ -6143,7 +6440,8 @@ struct task_group *sched_create_group(struct task_group *parent) if (!alloc_rt_sched_group(tg, parent)) goto err; - alloc_uclamp_sched_group(tg, parent); + if (!alloc_uclamp_sched_group(tg, parent)) + goto err; return tg; @@ -6370,6 +6668,7 @@ static int cpu_util_min_write_u64(struct cgroup_subsys_state *css, struct cftype *cftype, u64 min_value) { struct cgroup_subsys_state *pos; + struct uclamp_tg *uc_tg; struct task_group *tg; int ret = -EINVAL; @@ -6382,29 +6681,29 @@ static int cpu_util_min_write_u64(struct cgroup_subsys_state *css, tg = css_tg(css); /* Already at the required value */ - if (tg->uclamp[UCLAMP_MIN] == min_value) { + if (tg->uclamp[UCLAMP_MIN].value == min_value) { ret = 0; goto out; } /* Ensure to not exceed the maximum clamp value */ - if (tg->uclamp[UCLAMP_MAX] < min_value) + if (tg->uclamp[UCLAMP_MAX].value < min_value) goto out; /* Ensure min clamp fits within parent's clamp value */ if (tg->parent && - tg->parent->uclamp[UCLAMP_MIN] > min_value) + tg->parent->uclamp[UCLAMP_MIN].value > min_value) goto out; /* Ensure each child is a restriction of this TG */ css_for_each_child(pos, css) { - if (css_tg(pos)->uclamp[UCLAMP_MIN] < min_value) + if (css_tg(pos)->uclamp[UCLAMP_MIN].value < min_value) goto out; } - /* 
Update TG's utilization clamp */ - tg->uclamp[UCLAMP_MIN] = min_value; - ret = 0; + /* Update TG's reference count */ + uc_tg = &tg->uclamp[UCLAMP_MIN]; + ret = uclamp_group_get(css, UCLAMP_MIN, uc_tg, min_value); out: rcu_read_unlock(); @@ -6417,6 +6716,7 @@ static int cpu_util_max_write_u64(struct cgroup_subsys_state *css, struct cftype *cftype, u64 max_value) { struct cgroup_subsys_state *pos; + struct uclamp_tg *uc_tg; struct task_group *tg; int ret = -EINVAL; @@ -6429,29 +6729,29 @@ static int cpu_util_max_write_u64(struct cgroup_subsys_state *css, tg = css_tg(css); /* Already at the required value */ - if (tg->uclamp[UCLAMP_MAX] == max_value) { + if (tg->uclamp[UCLAMP_MAX].value == max_value) { ret = 0; goto out; } /* Ensure to not go below the minimum clamp value */ - if (tg->uclamp[UCLAMP_MIN] > max_value) + if (tg->uclamp[UCLAMP_MIN].value > max_value) goto out; /* Ensure max clamp fits within parent's clamp value */ if (tg->parent && - tg->parent->uclamp[UCLAMP_MAX] < max_value) + tg->parent->uclamp[UCLAMP_MAX].value < max_value) goto out; /* Ensure each child is a restriction of this TG */ css_for_each_child(pos, css) { - if (css_tg(pos)->uclamp[UCLAMP_MAX] > max_value) + if (css_tg(pos)->uclamp[UCLAMP_MAX].value > max_value) goto out; } - /* Update TG's utilization clamp */ - tg->uclamp[UCLAMP_MAX] = max_value; - ret = 0; + /* Update TG's reference count */ + uc_tg = &tg->uclamp[UCLAMP_MAX]; + ret = uclamp_group_get(css, UCLAMP_MAX, uc_tg, max_value); out: rcu_read_unlock(); @@ -6468,7 +6768,7 @@ static inline u64 cpu_uclamp_read(struct cgroup_subsys_state *css, rcu_read_lock(); tg = css_tg(css); - util_clamp = tg->uclamp[clamp_id]; + util_clamp = tg->uclamp[clamp_id].value; rcu_read_unlock(); return util_clamp; diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 982340b8870b..869344de0396 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -290,6 +290,24 @@ struct cfs_bandwidth { #endif }; +/** + * Utilization's clamp group + * + * A utilization clamp group maps a "clamp value" (value), i.e. + * util_{min,max}, to a "clamp group index" (group_id). + * + * Thus, the same "group_id" is used by all the TG's which enforce the same + * clamp "value" for a given clamp index. 
+ */ +struct uclamp_tg { + /* Utilization constraint for tasks in this group */ + unsigned int value; + /* Utilization clamp group for this constraint */ + unsigned int group_id; + /* No utilization clamp group assigned */ +#define UCLAMP_NONE -1 +}; + /* task group related information */ struct task_group { struct cgroup_subsys_state css; @@ -332,8 +350,9 @@ struct task_group { struct cfs_bandwidth cfs_bandwidth; #ifdef CONFIG_UTIL_CLAMP - unsigned int uclamp[UCLAMP_CNT]; + struct uclamp_tg uclamp[UCLAMP_CNT]; #endif + }; #ifdef CONFIG_FAIR_GROUP_SCHED
From patchwork Thu Aug 24 18:08:54 2017
X-Patchwork-Submitter: Patrick Bellasi
X-Patchwork-Id: 110943
From: Patrick Bellasi
To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org
Cc: Ingo Molnar, Peter Zijlstra, Tejun Heo, "Rafael J. Wysocki", Paul Turner, Vincent Guittot, John Stultz, Morten Rasmussen, Dietmar Eggemann, Juri Lelli, Tim Murray, Todd Kjos, Andres Oportus, Joel Fernandes, Viresh Kumar
Subject: [RFCv4 3/6] sched/core: reference count active tasks's clamp groups
Date: Thu, 24 Aug 2017 19:08:54 +0100
Message-Id: <20170824180857.32103-4-patrick.bellasi@arm.com>
In-Reply-To: <20170824180857.32103-1-patrick.bellasi@arm.com>
References: <20170824180857.32103-1-patrick.bellasi@arm.com>

When tasks are enqueued/dequeued on/from a CPU, the set of clamp groups active on that CPU can change. Indeed, the clamp value mapped by a clamp group applies to a CPU only when there is at least one task active in that clamp group. Since each clamp group enforces a different utilization clamp value, whenever the set of active groups changes it may be necessary to re-compute the new "aggregated" clamp value to apply to that CPU.

Clamp values are always MAX aggregated for both util_min and util_max. This ensures that no task can affect the performance of other co-scheduled tasks which are either more boosted (i.e. have a higher util_min clamp) or less capped (i.e. have a higher util_max clamp).

This patch introduces the required support to properly reference count clamp groups at each task enqueue/dequeue time. The MAX aggregation of the currently active clamp groups is implemented so as to minimize the number of times we need to scan the complete (unordered) clamp group array to figure out the new max value. This scan happens only when we dequeue the last task of the clamp group defining the current max clamp, i.e. when the CPU is either entering IDLE or going to schedule a less boosted or more clamped task.
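To illustrate the dequeue-time bookkeeping described above, here is a small stand-alone sketch (made-up names, no locking; not the kernel code): the per-CPU clamp is only rescanned when the departing task was the last one in the group that defined the current maximum.

/*
 * Illustrative sketch of the MAX aggregation on the dequeue path.
 * GROUPS, cpu_clamp and group are invented names, not the kernel's.
 */
#define GROUPS 3

struct group {
	int value;	/* clamp value tracked by this group */
	int tasks;	/* RUNNABLE tasks currently in this group */
};

struct cpu_clamp {
	int value;	/* current effective clamp for this CPU, -1 if none */
	struct group group[GROUPS + 1];
};

/* Recompute the effective clamp: MAX over the groups with active tasks */
static void cpu_clamp_update(struct cpu_clamp *cc)
{
	int max_value = -1;

	for (int i = 0; i <= GROUPS; i++) {
		if (cc->group[i].tasks <= 0)
			continue;		/* inactive group: ignore it */
		if (cc->group[i].value > max_value)
			max_value = cc->group[i].value;
	}
	cc->value = max_value;
}

/* Dequeue path: rescan only if the departing group defined the current max */
static void task_dequeued(struct cpu_clamp *cc, int group_id)
{
	cc->group[group_id].tasks--;
	if (cc->group[group_id].tasks > 0)
		return;
	if (cc->group[group_id].value >= cc->value)
		cpu_clamp_update(cc);
}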
Signed-off-by: Patrick Bellasi Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Tejun Heo Cc: linux-kernel@vger.kernel.org Cc: linux-pm@vger.kernel.org --- include/linux/sched.h | 5 ++ kernel/sched/core.c | 160 ++++++++++++++++++++++++++++++++++++++++++++++++++ kernel/sched/sched.h | 77 ++++++++++++++++++++++++ 3 files changed, 242 insertions(+) -- 2.14.1 diff --git a/include/linux/sched.h b/include/linux/sched.h index 265ac0898f9e..5cf0ee6a1aee 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -574,6 +574,11 @@ struct task_struct { #endif struct sched_dl_entity dl; +#ifdef CONFIG_UTIL_CLAMP + /* Index of clamp group the task has been accounted into */ + int uclamp_group_id[UCLAMP_CNT]; +#endif + #ifdef CONFIG_PREEMPT_NOTIFIERS /* List of struct preempt_notifier: */ struct hlist_head preempt_notifiers; diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 0d39766f2b03..ba31bb4e14c7 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -850,9 +850,19 @@ static inline void uclamp_group_init(int clamp_id, int group_id, unsigned int clamp_value) { struct uclamp_map *uc_map = &uclamp_maps[clamp_id][0]; + struct uclamp_cpu *uc_cpu; + int cpu; + /* Set clamp group map */ uc_map[group_id].value = clamp_value; uc_map[group_id].tg_count = 0; + + /* Set clamp groups on all CPUs */ + for_each_possible_cpu(cpu) { + uc_cpu = &cpu_rq(cpu)->uclamp[clamp_id]; + uc_cpu->group[group_id].value = clamp_value; + uc_cpu->group[group_id].tasks = 0; + } } /** @@ -908,6 +918,110 @@ uclamp_group_find(int clamp_id, unsigned int clamp_value) return group_id; } +/** + * uclamp_cpu_update: update the utilization clamp of a CPU + * @cpu: the CPU which utilization clamp has to be updated + * @clamp_id: the clamp index to update + * + * When tasks are enqueued/dequeued on/from a CPU, the set of currently active + * clamp groups is subject to change. Since each clamp group enforces a + * different utilization clamp value, once the set of these groups change it + * can be required to re-compute what is the new clamp value to apply for that + * CPU. + * + * For the specified clamp index, this method computes the new CPU utilization + * clamp to use until the next change on the set of tasks active on that CPU. + */ +static inline void uclamp_cpu_update(int cpu, int clamp_id) +{ + struct uclamp_cpu *uc_cpu = &cpu_rq(cpu)->uclamp[clamp_id]; + int max_value = UCLAMP_NONE; + unsigned int group_id; + + for (group_id = 0; group_id <= CONFIG_UCLAMP_GROUPS_COUNT; ++group_id) { + + /* Ignore inactive clamp groups, i.e. no RUNNABLE tasks */ + if (!uclamp_group_active(uc_cpu, group_id)) + continue; + + /* Both min and max clamp are MAX aggregated */ + max_value = max(max_value, uc_cpu->group[group_id].value); + + /* Stop if we reach the max possible clamp */ + if (max_value >= SCHED_CAPACITY_SCALE) + break; + } + uc_cpu->value = max_value; +} + +/** + * uclamp_cpu_get(): increase reference count for a clamp group on a CPU + * @p: the task being enqueued on a CPU + * @cpu: the CPU where the clamp group has to be reference counted + * @clamp_id: the utilization clamp (e.g. min or max utilization) to reference + * + * Once a task is enqueued on a CPU's RQ, the clamp group currently defined by + * the task's TG::uclamp.group_id is reference counted on that CPU. + * We keep track of the reference counted clamp group by storing its index + * (group_id) into the task's task_struct::uclamp_group_id, which will then be + * used at task's dequeue time to release the reference count. 
+ */ +static inline void uclamp_cpu_get(struct task_struct *p, int cpu, int clamp_id) +{ + struct uclamp_cpu *uc_cpu = &cpu_rq(cpu)->uclamp[clamp_id]; + int clamp_value = task_group(p)->uclamp[clamp_id].value; + int group_id; + + /* Increment the current TG's group_id */ + group_id = task_group(p)->uclamp[clamp_id].group_id; + uc_cpu->group[group_id].tasks += 1; + + /* Mark task as enqueued for this clamp IDX */ + p->uclamp_group_id[clamp_id] = group_id; + + /* + * If this is the new max utilization clamp value, then + * we can update straight away the CPU clamp value. + */ + if (uc_cpu->value < clamp_value) + uc_cpu->value = clamp_value; +} + +/** + * uclamp_cpu_put(): decrease reference count for a clamp groups on a CPU + * @p: the task being dequeued from a CPU + * @cpu: the CPU from where the clamp group has to be released + * @clamp_id: the utilization clamp (e.g. min or max utilization) to release + * + * When a task is dequeued from a CPU's RQ, the clamp group reference counted + * by the task's task_struct::uclamp_group_id is decrease for that CPU. + */ +static inline void uclamp_cpu_put(struct task_struct *p, int cpu, int clamp_id) +{ + struct uclamp_cpu *uc_cpu = &cpu_rq(cpu)->uclamp[clamp_id]; + unsigned int clamp_value; + int group_id; + + /* Decrement the task's reference counted group index */ + group_id = p->uclamp_group_id[clamp_id]; + uc_cpu->group[group_id].tasks -= 1; + + /* Mark task as dequeued for this clamp IDX */ + p->uclamp_group_id[clamp_id] = UCLAMP_NONE; + + /* If this is not the last task, no updates are required */ + if (uc_cpu->group[group_id].tasks > 0) + return; + + /* + * Update the CPU only if this was the last task of the group + * defining the current clamp value. + */ + clamp_value = uc_cpu->group[group_id].value; + if (clamp_value >= uc_cpu->value) + uclamp_cpu_update(cpu, clamp_id); +} + /** * uclamp_group_put: decrease the reference count for a clamp group * @clamp_id: the clamp index which was affected by a task group @@ -983,6 +1097,38 @@ static inline int uclamp_group_get(struct cgroup_subsys_state *css, return 0; } +/** + * uclamp_task_update: update clamp group referenced by a task + * @rq: the RQ the task is going to be enqueued/dequeued to/from + * @p: the task being enqueued/dequeued + * + * Utilization clamp constraints for a CPU depend on tasks which are active + * (i.e. RUNNABLE or RUNNING) on that CPU. To keep track of tasks + * requirements, each active task reference counts a clamp group in the CPU + * they are currently queued for execution. + * + * This method updates the utilization clamp constraints considering the + * requirements for the specified task. Thus, this update must be done before + * calling into the scheduling classes, which will eventually update schedutil + * considering the new task requirements. 
+ */ +static inline void uclamp_task_update(struct rq *rq, struct task_struct *p) +{ + int cpu = cpu_of(rq); + int clamp_id; + + /* The idle task is never clamped */ + if (unlikely(p->sched_class == &idle_sched_class)) + return; + + for (clamp_id = 0; clamp_id < UCLAMP_CNT; ++clamp_id) { + if (uclamp_task_affects(p, clamp_id)) + uclamp_cpu_put(p, cpu, clamp_id); + else + uclamp_cpu_get(p, cpu, clamp_id); + } +} + /** * alloc_uclamp_sched_group: initialize a new TG's for utilization clamping * @tg: the newly created task group @@ -1043,10 +1189,12 @@ static inline void free_uclamp_sched_group(struct task_group *tg) */ static inline void init_uclamp(void) { + struct uclamp_cpu *uc_cpu; struct uclamp_map *uc_map; struct uclamp_tg *uc_tg; int group_id; int clamp_id; + int cpu; mutex_init(&uclamp_mutex); @@ -1058,6 +1206,11 @@ static inline void init_uclamp(void) uc_map[group_id].value = UCLAMP_NONE; raw_spin_lock_init(&uc_map[group_id].tg_lock); } + /* Init CPU's clamp groups */ + for_each_possible_cpu(cpu) { + uc_cpu = &cpu_rq(cpu)->uclamp[clamp_id]; + memset(uc_cpu, UCLAMP_NONE, sizeof(struct uclamp_cpu)); + } } /* Root TG's are initialized to the first clamp group */ @@ -1080,6 +1233,7 @@ static inline void init_uclamp(void) } } #else +static inline void uclamp_task_update(struct rq *rq, struct task_struct *p) { } static inline int alloc_uclamp_sched_group(struct task_group *tg, struct task_group *parent) { @@ -1097,6 +1251,7 @@ static inline void enqueue_task(struct rq *rq, struct task_struct *p, int flags) if (!(flags & ENQUEUE_RESTORE)) sched_info_queued(rq, p); + uclamp_task_update(rq, p); p->sched_class->enqueue_task(rq, p, flags); } @@ -1108,6 +1263,7 @@ static inline void dequeue_task(struct rq *rq, struct task_struct *p, int flags) if (!(flags & DEQUEUE_SAVE)) sched_info_dequeued(rq, p); + uclamp_task_update(rq, p); p->sched_class->dequeue_task(rq, p, flags); } @@ -2499,6 +2655,10 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p) p->se.cfs_rq = NULL; #endif +#ifdef CONFIG_UTIL_CLAMP + memset(&p->uclamp_group_id, UCLAMP_NONE, sizeof(p->uclamp_group_id)); +#endif + #ifdef CONFIG_SCHEDSTATS /* Even if schedstat is disabled, there should not be garbage */ memset(&p->se.statistics, 0, sizeof(p->se.statistics)); diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 869344de0396..b0f17c19c0f6 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -389,6 +389,42 @@ static inline int walk_tg_tree(tg_visitor down, tg_visitor up, void *data) extern int tg_nop(struct task_group *tg, void *data); #ifdef CONFIG_UTIL_CLAMP +/** + * Utilization clamp Group + * + * Keep track of how many tasks are RUNNABLE for a given utilization + * clamp value. + */ +struct uclamp_group { + /* Utilization clamp value for tasks on this clamp group */ + int value; + /* Number of RUNNABLE tasks on this clamp group */ + int tasks; +}; + +/** + * CPU's utilization clamp + * + * Keep track of active tasks on a CPUs to aggregate their clamp values. A + * clamp value is affecting a CPU where there is at least one task RUNNABLE + * (or actually running) with that value. + * All utilization clamping values are MAX aggregated, since: + * - for util_min: we wanna run the CPU at least at the max of the minimum + * utilization required by its currently active tasks. + * - for util_max: we wanna allow the CPU to run up to the max of the + * maximum utilization allowed by its currently active tasks. 
+ * + * Since on each system we expect only a limited number of utilization clamp + * values, we can use a simple array to track the metrics required to compute + * all the per-CPU utilization clamp values. + */ +struct uclamp_cpu { + /* Utilization clamp value for a CPU */ + int value; + /* Utilization clamp groups affecting this CPU */ + struct uclamp_group group[CONFIG_UCLAMP_GROUPS_COUNT + 1]; +}; + /** * uclamp_none: default value for a clamp * @@ -404,6 +440,44 @@ static inline unsigned int uclamp_none(int clamp_id) return 0; return SCHED_CAPACITY_SCALE; } + +/** + * uclamp_task_affects: check if a task affects a utilization clamp + * @p: the task to consider + * @clamp_id: the utilization clamp to check + * + * A task affects a clamp index if its task_struct::uclamp_group_id is a + * valid clamp group index for the specified clamp index. + * Once a task is dequeued from a CPU, its clamp group indexes are reset to + * UCLAMP_NONE. A valid clamp group index is assigned to a task only when it + * is RUNNABLE on a CPU and it represents the clamp group which is currently + * reference counted by that task. + * + * Return: true if p currently affects the specified clamp_id + */ +static inline bool uclamp_task_affects(struct task_struct *p, int clamp_id) +{ + int task_group_id = p->uclamp_group_id[clamp_id]; + + return (task_group_id != UCLAMP_NONE); +} + +/** + * uclamp_group_active: check if a clamp group is active on a CPU + * @uc_cpu: the array of clamp groups for a CPU + * @group_id: the clamp group to check + * + * A clamp group affects a CPU if it has at least one "active" task. + * + * Return: true if the specified CPU has at least one active task for + * the specified clamp group. + */ +static inline bool uclamp_group_active(struct uclamp_cpu *uc_cpu, int group_id) +{ + return uc_cpu->group[group_id].tasks > 0; +} +#else +struct uclamp_cpu { }; #endif /* CONFIG_UTIL_CLAMP */ extern void free_fair_sched_group(struct task_group *tg); @@ -771,6 +845,9 @@ struct rq { unsigned long cpu_capacity; unsigned long cpu_capacity_orig; + /* util_{min,max} clamp values based on CPU's active tasks */ + struct uclamp_cpu uclamp[UCLAMP_CNT]; + struct callback_head *balance_callback; unsigned char idle_balance;
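Before moving on to the schedutil integration, the enqueue-side counterpart of the earlier dequeue sketch can be summarized as follows. Again the names are made up and no locking is shown; the corresponding kernel code is uclamp_cpu_get() in the patch above.

/*
 * Illustrative enqueue-side bookkeeping: bump the group's task count,
 * remember the group in the task so dequeue can drop the reference, and
 * raise the CPU clamp immediately if the new value is higher (no rescan).
 */
#define GROUPS 3

struct group { int value; int tasks; };
struct cpu_clamp { int value; struct group group[GROUPS + 1]; };

static void task_enqueued(struct cpu_clamp *cc, int *task_group_id,
			  int group_id, int clamp_value)
{
	/* Reference count the task's clamp group on this CPU ... */
	cc->group[group_id].tasks++;

	/* ... and record it so the dequeue path can release the reference */
	*task_group_id = group_id;

	/* A higher clamp value can be applied straight away */
	if (clamp_value > cc->value)
		cc->value = clamp_value;
}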
From patchwork Thu Aug 24 18:08:56 2017
X-Patchwork-Submitter: Patrick Bellasi
X-Patchwork-Id: 110944
From: Patrick Bellasi
To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org
Cc: Ingo Molnar, Peter Zijlstra, Tejun Heo, "Rafael J. Wysocki", Paul Turner, Vincent Guittot, John Stultz, Morten Rasmussen, Dietmar Eggemann, Juri Lelli, Tim Murray, Todd Kjos, Andres Oportus, Joel Fernandes, Viresh Kumar
Subject: [RFCv4 5/6] cpufreq: schedutil: add util clamp for FAIR tasks
Date: Thu, 24 Aug 2017 19:08:56 +0100
Message-Id: <20170824180857.32103-6-patrick.bellasi@arm.com>
In-Reply-To: <20170824180857.32103-1-patrick.bellasi@arm.com>
References: <20170824180857.32103-1-patrick.bellasi@arm.com>

Each time a frequency update is required via schedutil, we must enforce the util_{min,max} constraints defined on the current CPU by its set of currently RUNNABLE tasks.

This patch adds the required support to clamp the utilization generated by FAIR tasks within the boundaries defined by their aggregated utilization clamp constraints. The clamped utilization is then used to select the frequency, thus allowing, for example, to:

- boost tasks which directly affect the user experience by running them at least at a minimum "required" frequency

- cap low priority tasks which do not directly affect the user experience by running them only up to a maximum "allowed" frequency

The default values for boosting and capping are defined to be:

- util_min: 0
- util_max: SCHED_CAPACITY_SCALE

which means that by default no boosting/capping is enforced.

Signed-off-by: Patrick Bellasi
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Rafael J. Wysocki
Cc: linux-kernel@vger.kernel.org
Cc: linux-pm@vger.kernel.org
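The effect on frequency selection can be sketched as follows. This is illustrative only: freq_for_util() and clamp_val() are made-up helpers, and the simple proportional formula stands in for schedutil's real get_next_freq(), which additionally applies a margin.

/*
 * Sketch: clamp the aggregated utilization before turning it into a
 * frequency request, as the patch does just before get_next_freq().
 */
static unsigned int clamp_val(unsigned int v, unsigned int lo, unsigned int hi)
{
	if (v < lo)
		return lo;
	if (v > hi)
		return hi;
	return v;
}

static unsigned int freq_for_util(unsigned int util, unsigned int max,
				  unsigned int min_util, unsigned int max_util,
				  unsigned int max_freq)
{
	/* Apply the CPU's aggregated util_{min,max} constraints */
	util = clamp_val(util, min_util, max_util);

	/* Simple proportional selection: higher clamped util, higher freq */
	return (unsigned int)((unsigned long)max_freq * util / max);
}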
---
 kernel/sched/cpufreq_schedutil.c | 33 ++++++++++++++++++++++
 kernel/sched/sched.h             | 60 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 93 insertions(+)

-- 
2.14.1

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 29a397067ffa..f67c26bbade4 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -231,6 +231,7 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
 	} else {
 		sugov_get_util(&util, &max);
 		sugov_iowait_boost(sg_cpu, &util, &max);
+		util = uclamp_util(smp_processor_id(), util);
 		next_f = get_next_freq(sg_policy, util, max);
 		/*
 		 * Do not reduce the frequency if the CPU has not been idle
@@ -246,9 +247,18 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
 {
 	struct sugov_policy *sg_policy = sg_cpu->sg_policy;
 	struct cpufreq_policy *policy = sg_policy->policy;
+	unsigned long max_util, min_util;
 	unsigned long util = 0, max = 1;
 	unsigned int j;
 
+	/* Initialize clamp values based on caller CPU constraints */
+	if (uclamp_enabled) {
+		int cpu = smp_processor_id();
+
+		max_util = uclamp_value(cpu, UCLAMP_MAX);
+		min_util = uclamp_value(cpu, UCLAMP_MIN);
+	}
+
 	for_each_cpu(j, policy->cpus) {
 		struct sugov_cpu *j_sg_cpu = &per_cpu(sugov_cpu, j);
 		unsigned long j_util, j_max;
@@ -277,8 +287,31 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
 		}
 
 		sugov_iowait_boost(j_sg_cpu, &util, &max);
+
+		/*
+		 * Update clamping range based on j-CPUs constraints, but only
+		 * if active. Idle CPUs do not enforce constraints in a shared
+		 * frequency domain.
+		 */
+		if (uclamp_enabled && !idle_cpu(j)) {
+			unsigned long j_max_util, j_min_util;
+
+			j_max_util = uclamp_value(j, UCLAMP_MAX);
+			j_min_util = uclamp_value(j, UCLAMP_MIN);
+
+			/*
+			 * Clamp values are MAX aggregated among all the
+			 * different CPUs in the shared frequency domain.
+			 */
+			max_util = max(max_util, j_max_util);
+			min_util = max(min_util, j_min_util);
+		}
 	}
 
+	/* Clamp utilization based on aggregated uclamp constraints */
+	if (uclamp_enabled)
+		util = clamp(util, min_util, max_util);
+
 	return get_next_freq(sg_policy, util, max);
 }
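The shared-policy path above max-aggregates both clamps across the non-idle CPUs
of the frequency domain, so the most boosted and the least capped active CPU
dictate the final range. A stand-alone sketch of that aggregation in isolation
(the per-CPU values are hypothetical):

/* Hypothetical per-CPU clamps for one frequency domain; idle CPUs are
 * skipped and both clamps are max-aggregated, as in the hunk above. */
#include <stdio.h>

struct cpu_clamps {
	unsigned int min_util;
	unsigned int max_util;
	int idle;
};

int main(void)
{
	struct cpu_clamps cpus[] = {
		{ .min_util = 0,   .max_util = 600,  .idle = 0 },	/* caller CPU */
		{ .min_util = 512, .max_util = 1024, .idle = 0 },
		{ .min_util = 900, .max_util = 1024, .idle = 1 },	/* idle: skipped */
	};
	unsigned int min_util = cpus[0].min_util;
	unsigned int max_util = cpus[0].max_util;
	unsigned int i;

	for (i = 1; i < sizeof(cpus) / sizeof(cpus[0]); i++) {
		if (cpus[i].idle)
			continue;
		if (cpus[i].min_util > min_util)
			min_util = cpus[i].min_util;
		if (cpus[i].max_util > max_util)
			max_util = cpus[i].max_util;
	}

	/* prints: domain clamps: min_util=512 max_util=1024 */
	printf("domain clamps: min_util=%u max_util=%u\n", min_util, max_util);
	return 0;
}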
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 164a8ac152b3..4a235c4a0762 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2224,6 +2224,66 @@ static inline void cpufreq_update_util(struct rq *rq, unsigned int flags) {}
 static inline void cpufreq_update_this_cpu(struct rq *rq, unsigned int flags) {}
 #endif /* CONFIG_CPU_FREQ */
 
+#ifdef CONFIG_UTIL_CLAMP
+/* Enable clamping code at compile time by constant propagation */
+#define uclamp_enabled true
+
+/**
+ * uclamp_value: get the current CPU's utilization clamp value
+ * @cpu: the CPU to consider
+ * @clamp_id: the utilization clamp index (i.e. min or max utilization)
+ *
+ * The utilization clamp value for a CPU depends on its set of currently
+ * active tasks and their specific util_{min,max} constraints.
+ * A max aggregated value is tracked for each CPU and returned by this
+ * function. An IDLE CPU never enforces a clamp value.
+ *
+ * Return: the current value for the specified CPU and clamp index
+ */
+static inline unsigned int uclamp_value(unsigned int cpu, int clamp_id)
+{
+	struct uclamp_cpu *uc_cpu = &cpu_rq(cpu)->uclamp[clamp_id];
+	int clamp_value = uclamp_none(clamp_id);
+
+	/* Update min utilization clamp */
+	if (uc_cpu->value != UCLAMP_NONE)
+		clamp_value = uc_cpu->value;
+
+	return clamp_value;
+}
+
+/**
+ * uclamp_util: clamp a utilization value for a specified CPU
+ * @cpu: the CPU to get the clamp values from
+ * @util: the utilization signal to clamp
+ *
+ * Each CPU tracks util_{min,max} clamp values depending on the set of its
+ * currently active tasks. Given a utilization signal, i.e. a signal in the
+ * [0..SCHED_CAPACITY_SCALE] range, this function returns a clamped
+ * utilization signal considering the current clamp values for the
+ * specified CPU.
+ *
+ * Return: a clamped utilization signal for a given CPU.
+ */
+static inline int uclamp_util(unsigned int cpu, unsigned int util)
+{
+	unsigned int min_util = uclamp_value(cpu, UCLAMP_MIN);
+	unsigned int max_util = uclamp_value(cpu, UCLAMP_MAX);
+
+	return clamp(util, min_util, max_util);
+}
+#else
+/* Disable clamping code at compile time by constant propagation */
+#define uclamp_enabled false
+#define uclamp_util(cpu, util) util
+static inline unsigned int uclamp_value(unsigned int cpu, int clamp_id)
+{
+	if (clamp_id == UCLAMP_MIN)
+		return 0;
+	return SCHED_CAPACITY_SCALE;
+}
+#endif /* CONFIG_UTIL_CLAMP */
+
 #ifdef arch_scale_freq_capacity
 #ifndef arch_scale_freq_invariant
 #define arch_scale_freq_invariant() (true)
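The !CONFIG_UTIL_CLAMP stubs above rely on constant propagation: with
uclamp_enabled defined to false, the compiler can drop the clamping branches
entirely, so the disabled case adds no run-time cost. A minimal, generic
illustration of the pattern (the names below are made up, not the kernel's):

/* Generic sketch of compile-time constant propagation: when the guard is a
 * compile-time zero, the guarded code is eliminated as dead code. */
#include <stdio.h>

#define feature_enabled 0		/* like uclamp_enabled in the !CONFIG case */
#define feature_apply(v) (v)		/* like uclamp_util(): identity stub */

static unsigned int compute(unsigned int util)
{
	if (feature_enabled)
		util = feature_apply(util);	/* dropped when disabled */
	return util;
}

int main(void)
{
	printf("%u\n", compute(300));	/* prints 300: the stub is a no-op */
	return 0;
}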
From patchwork Thu Aug 24 18:08:57 2017
X-Patchwork-Submitter: Patrick Bellasi
X-Patchwork-Id: 110945
From: Patrick Bellasi
To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org
Cc: Ingo Molnar, Peter Zijlstra, Tejun Heo, "Rafael J. Wysocki", Paul Turner,
    Vincent Guittot, John Stultz, Morten Rasmussen, Dietmar Eggemann,
    Juri Lelli, Tim Murray, Todd Kjos, Andres Oportus, Joel Fernandes,
    Viresh Kumar
Subject: [RFCv4 6/6] cpufreq: schedutil: add util clamp for RT/DL tasks
Date: Thu, 24 Aug 2017 19:08:57 +0100
Message-Id: <20170824180857.32103-7-patrick.bellasi@arm.com>
In-Reply-To: <20170824180857.32103-1-patrick.bellasi@arm.com>
References: <20170824180857.32103-1-patrick.bellasi@arm.com>

Currently schedutil enforces a maximum frequency whenever RT/DL tasks are
RUNNABLE. Such a mandatory policy can be made more tunable from userspace,
thus allowing, for example, to define a maximum frequency which is still
reasonable for the execution of a specific RT/DL workload. This will
contribute to making the RT class more friendly to power/energy-sensitive
use-cases.

This patch extends the usage of util_{min,max} to the RT/DL classes.
Whenever a task in these classes is RUNNABLE, the utilization required is
defined by the constraints of the CPU control group the task belongs to.

Signed-off-by: Patrick Bellasi
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Rafael J. Wysocki
Cc: linux-kernel@vger.kernel.org
Cc: linux-pm@vger.kernel.org
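A hedged, worked example of the capping this enables for RT/DL tasks (all
values are hypothetical, and the linear utilization-to-frequency mapping is
only an approximation of get_next_freq()):

/* Hypothetical numbers: an RT task whose group sets util_max = 640 no
 * longer pins the CPU to cpuinfo.max_freq. */
#include <stdio.h>

#define SCHED_CAPACITY_SCALE	1024

int main(void)
{
	unsigned int max_freq = 1800000;		/* kHz, hypothetical */
	unsigned int util = SCHED_CAPACITY_SCALE;	/* RT/DL default request */
	unsigned int max_util = 640;			/* util_max of the group */

	if (util > max_util)
		util = max_util;	/* effect of uclamp_util() in this path */

	printf("RT/DL request: %u kHz (instead of %u kHz)\n",
	       (unsigned int)((unsigned long long)max_freq * util / SCHED_CAPACITY_SCALE),
	       max_freq);
	return 0;
}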
---
 kernel/sched/cpufreq_schedutil.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

-- 
2.14.1

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index f67c26bbade4..feca60c107bc 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -227,7 +227,10 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
 	busy = sugov_cpu_is_busy(sg_cpu);
 
 	if (flags & SCHED_CPUFREQ_RT_DL) {
-		next_f = policy->cpuinfo.max_freq;
+		util = uclamp_util(smp_processor_id(), SCHED_CAPACITY_SCALE);
+		next_f = (uclamp_enabled && util < SCHED_CAPACITY_SCALE)
+			? get_next_freq(sg_policy, util, policy->cpuinfo.max_freq)
+			: policy->cpuinfo.max_freq;
 	} else {
 		sugov_get_util(&util, &max);
 		sugov_iowait_boost(sg_cpu, &util, &max);
@@ -276,10 +279,15 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
 			j_sg_cpu->iowait_boost = 0;
 			continue;
 		}
-		if (j_sg_cpu->flags & SCHED_CPUFREQ_RT_DL)
-			return policy->cpuinfo.max_freq;
 
-		j_util = j_sg_cpu->util;
+		if (j_sg_cpu->flags & SCHED_CPUFREQ_RT_DL) {
+			if (!uclamp_enabled)
+				return policy->cpuinfo.max_freq;
+			j_util = uclamp_util(j, SCHED_CAPACITY_SCALE);
+		} else {
+			j_util = j_sg_cpu->util;
+		}
+
 		j_max = j_sg_cpu->max;
 		if (j_util * max > j_max * util) {
 			util = j_util;