From patchwork Fri Jul 21 15:50:11 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Lukasz Luba
X-Patchwork-Id: 705998
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 45E61EB64DD
	for ; Fri, 21 Jul 2023 15:50:07 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S232267AbjGUPuF (ORCPT );
	Fri, 21 Jul 2023 11:50:05 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49944 "EHLO
	lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S232272AbjGUPuE (ORCPT );
	Fri, 21 Jul 2023 11:50:04 -0400
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
	by lindbergh.monkeyblade.net (Postfix) with ESMTP id 75848E6F;
	Fri, 21 Jul 2023 08:50:03 -0700 (PDT)
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
	by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id B0D12D75;
	Fri, 21 Jul 2023 08:50:46 -0700 (PDT)
Received: from e129166.arm.com (unknown [10.57.0.79])
	by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 853DA3F738;
	Fri, 21 Jul 2023 08:50:00 -0700 (PDT)
From: Lukasz Luba
To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, rafael@kernel.org
Cc: lukasz.luba@arm.com, dietmar.eggemann@arm.com, rui.zhang@intel.com,
	amit.kucheria@verdurent.com, amit.kachhap@gmail.com,
	daniel.lezcano@linaro.org, viresh.kumar@linaro.org, len.brown@intel.com,
	pavel@ucw.cz, Pierre.Gondois@arm.com, ionela.voinescu@arm.com,
	mhiramat@kernel.org
Subject: [PATCH v3 01/12] PM: EM: Refactor em_cpufreq_update_efficiencies() arguments
Date: Fri, 21 Jul 2023 16:50:11 +0100
Message-Id: <20230721155022.2339982-2-lukasz.luba@arm.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20230721155022.2339982-1-lukasz.luba@arm.com>
References:
	<20230721155022.2339982-1-lukasz.luba@arm.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-pm@vger.kernel.org

In order to prepare the code for the modifiable EM perf_state table,
refactor the existing function em_cpufreq_update_efficiencies().

Signed-off-by: Lukasz Luba
---
 kernel/power/energy_model.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index 7b44f5b89fa1..0d037f3c4e58 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -237,10 +237,10 @@ static int em_create_pd(struct device *dev, int nr_states,
 	return 0;
 }
 
-static void em_cpufreq_update_efficiencies(struct device *dev)
+static void
+em_cpufreq_update_efficiencies(struct device *dev, struct em_perf_state *table)
 {
 	struct em_perf_domain *pd = dev->em_pd;
-	struct em_perf_state *table;
 	struct cpufreq_policy *policy;
 	int found = 0;
 	int i;
@@ -254,8 +254,6 @@ static void em_cpufreq_update_efficiencies(struct device *dev)
 		return;
 	}
 
-	table = pd->table;
-
 	for (i = 0; i < pd->nr_perf_states; i++) {
 		if (!(table[i].flags & EM_PERF_STATE_INEFFICIENT))
 			continue;
@@ -397,7 +395,7 @@ int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
 
 	dev->em_pd->flags |= flags;
 
-	em_cpufreq_update_efficiencies(dev);
+	em_cpufreq_update_efficiencies(dev, dev->em_pd->table);
 
 	em_debug_create_pd(dev);
 	dev_info(dev, "EM: created perf domain\n");

From patchwork Fri Jul 21 15:50:12 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Lukasz Luba
X-Patchwork-Id: 705154
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 0F065EB64DC
	for ; Fri, 21 Jul 2023 15:50:17 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id
	S231524AbjGUPuP (ORCPT );
	Fri, 21 Jul 2023 11:50:15 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49978 "EHLO
	lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S232282AbjGUPuI (ORCPT );
	Fri, 21 Jul 2023 11:50:08 -0400
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
	by lindbergh.monkeyblade.net (Postfix) with ESMTP id 88D7219A1;
	Fri, 21 Jul 2023 08:50:06 -0700 (PDT)
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
	by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id BD2A114BF;
	Fri, 21 Jul 2023 08:50:49 -0700 (PDT)
Received: from e129166.arm.com (unknown [10.57.0.79])
	by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 8AA573F738;
	Fri, 21 Jul 2023 08:50:03 -0700 (PDT)
From: Lukasz Luba
To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, rafael@kernel.org
Cc: lukasz.luba@arm.com, dietmar.eggemann@arm.com, rui.zhang@intel.com,
	amit.kucheria@verdurent.com, amit.kachhap@gmail.com,
	daniel.lezcano@linaro.org, viresh.kumar@linaro.org, len.brown@intel.com,
	pavel@ucw.cz, Pierre.Gondois@arm.com, ionela.voinescu@arm.com,
	mhiramat@kernel.org
Subject: [PATCH v3 02/12] PM: EM: Find first CPU online while updating OPP efficiency
Date: Fri, 21 Jul 2023 16:50:12 +0100
Message-Id: <20230721155022.2339982-3-lukasz.luba@arm.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20230721155022.2339982-1-lukasz.luba@arm.com>
References: <20230721155022.2339982-1-lukasz.luba@arm.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-pm@vger.kernel.org

The Energy Model might be updated at runtime and the energy efficiency
for each OPP may change. Thus, the cpufreq framework also needs to be
updated and aligned with the new values. To do that, use the first
online CPU from the Performance Domain.

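The CPU selection described above can be sketched in user space as
follows. This is only an illustration, not the kernel implementation:
demo_first_cpu_and() is a hypothetical stand-in for cpumask_first_and()
and assumes that each mask fits in a single 64-bit word.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical user-space stand-in for cpumask_first_and(): each mask is a
 * 64-bit word with one bit per CPU. Returns the lowest CPU set in both
 * masks, or nr_cpu_ids when the intersection is empty.
 */
static int demo_first_cpu_and(uint64_t span, uint64_t active, int nr_cpu_ids)
{
	uint64_t both = span & active;
	int cpu;

	assert(nr_cpu_ids > 0 && nr_cpu_ids <= 64);

	for (cpu = 0; cpu < nr_cpu_ids; cpu++) {
		if (both & (UINT64_C(1) << cpu))
			return cpu;
	}

	/* No CPU is in both masks: the caller checks for ">= nr_cpu_ids". */
	return nr_cpu_ids;
}
```

With a domain spanning CPUs 4-7 (0xF0) of which only CPUs 6-7 are active
(0xC0), the sketch returns 6. With no active CPU in the domain it returns
nr_cpu_ids, which is the case where the patch warns and returns early.
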
Signed-off-by: Lukasz Luba
---
 kernel/power/energy_model.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index 0d037f3c4e58..85a70b7da023 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -243,12 +243,19 @@ em_cpufreq_update_efficiencies(struct device *dev, struct em_perf_state *table)
 	struct em_perf_domain *pd = dev->em_pd;
 	struct cpufreq_policy *policy;
 	int found = 0;
-	int i;
+	int i, cpu;
 
 	if (!_is_cpu_device(dev) || !pd)
 		return;
 
-	policy = cpufreq_cpu_get(cpumask_first(em_span_cpus(pd)));
+	/* Try to get a CPU which is online and in this PD */
+	cpu = cpumask_first_and(em_span_cpus(pd), cpu_active_mask);
+	if (cpu >= nr_cpu_ids) {
+		dev_warn(dev, "EM: No online CPU for CPUFreq policy\n");
+		return;
+	}
+
+	policy = cpufreq_cpu_get(cpu);
 	if (!policy) {
 		dev_warn(dev, "EM: Access to CPUFreq policy failed");
 		return;

From patchwork Fri Jul 21 15:50:13 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Lukasz Luba
X-Patchwork-Id: 705997
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id B52F2EB64DC
	for ; Fri, 21 Jul 2023 15:50:19 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S232276AbjGUPuS (ORCPT );
	Fri, 21 Jul 2023 11:50:18 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50106 "EHLO
	lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S232290AbjGUPuP (ORCPT );
	Fri, 21 Jul 2023 11:50:15 -0400
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
	by lindbergh.monkeyblade.net (Postfix) with ESMTP id 910CD2D7E;
	Fri, 21 Jul 2023 08:50:09 -0700 (PDT)
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown
	[10.121.207.14])
	by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id C48E12F4;
	Fri, 21 Jul 2023 08:50:52 -0700 (PDT)
Received: from e129166.arm.com (unknown [10.57.0.79])
	by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 994C83F738;
	Fri, 21 Jul 2023 08:50:06 -0700 (PDT)
From: Lukasz Luba
To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, rafael@kernel.org
Cc: lukasz.luba@arm.com, dietmar.eggemann@arm.com, rui.zhang@intel.com,
	amit.kucheria@verdurent.com, amit.kachhap@gmail.com,
	daniel.lezcano@linaro.org, viresh.kumar@linaro.org, len.brown@intel.com,
	pavel@ucw.cz, Pierre.Gondois@arm.com, ionela.voinescu@arm.com,
	mhiramat@kernel.org
Subject: [PATCH v3 03/12] PM: EM: Refactor em_pd_get_efficient_state() to be more flexible
Date: Fri, 21 Jul 2023 16:50:13 +0100
Message-Id: <20230721155022.2339982-4-lukasz.luba@arm.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20230721155022.2339982-1-lukasz.luba@arm.com>
References: <20230721155022.2339982-1-lukasz.luba@arm.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-pm@vger.kernel.org

The Energy Model (EM) is going to support runtime modification. There
are going to be 2 EM tables which store information. This patch aims
to prepare the code to be generic and use one of the tables. The function
will no longer get a pointer to 'struct em_perf_domain' (the EM) but
instead a pointer to 'struct em_perf_state' (which is one of the EM's
tables).

Prepare em_pd_get_efficient_state() for the upcoming changes and
make it possible to re-use. Return an index for the best performance
state for a given EM table. The newly introduced function arguments
allow it to work on different performance state arrays. The caller of
em_pd_get_efficient_state() should be able to use the index either
on the default or the modifiable EM table.

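A minimal user-space sketch of the index-returning lookup described
above. The DEMO_* names, the flag values and demo_table are assumptions
made for illustration only; they are not the kernel definitions.

```c
#include <assert.h>

/* Hypothetical flag values, chosen only for this sketch. */
#define DEMO_PS_INEFFICIENT		0x1UL
#define DEMO_PD_SKIP_INEFFICIENCIES	0x1UL

struct demo_perf_state {
	unsigned long frequency;
	unsigned long flags;
};

/* Sample table in ascending frequency order; the middle state is inefficient. */
static const struct demo_perf_state demo_table[] = {
	{ 100, 0 },
	{ 200, DEMO_PS_INEFFICIENT },
	{ 300, 0 },
};

/*
 * Return the index of the first state whose frequency is >= freq, skipping
 * inefficient states when the domain flags ask for it. Fall back to the
 * last index when no state qualifies.
 */
static int demo_get_efficient_state(const struct demo_perf_state *table,
				    int nr_perf_states, unsigned long freq,
				    unsigned long pd_flags)
{
	int i;

	assert(nr_perf_states > 0);

	for (i = 0; i < nr_perf_states; i++) {
		if (table[i].frequency < freq)
			continue;
		if ((pd_flags & DEMO_PD_SKIP_INEFFICIENCIES) &&
		    (table[i].flags & DEMO_PS_INEFFICIENT))
			continue;
		return i;
	}

	return nr_perf_states - 1;
}
```

Because the function returns an index instead of a pointer, the caller
can apply the same index to whichever table it holds, which is the
property the commit message relies on.
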
Signed-off-by: Lukasz Luba
---
 include/linux/energy_model.h | 30 +++++++++++++++++-------------
 1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
index b9caa01dfac4..8069f526c9d8 100644
--- a/include/linux/energy_model.h
+++ b/include/linux/energy_model.h
@@ -175,33 +175,35 @@ void em_dev_unregister_perf_domain(struct device *dev);
 
 /**
  * em_pd_get_efficient_state() - Get an efficient performance state from the EM
- * @pd   : Performance domain for which we want an efficient frequency
- * @freq : Frequency to map with the EM
+ * @table:		List of performance states, in ascending order
+ * @nr_perf_states:	Number of performance states
+ * @freq:		Frequency to map with the EM
+ * @pd_flags:		Performance Domain flags
  *
  * It is called from the scheduler code quite frequently and as a consequence
  * doesn't implement any check.
  *
- * Return: An efficient performance state, high enough to meet @freq
+ * Return: An efficient performance state id, high enough to meet @freq
  * requirement.
  */
-static inline
-struct em_perf_state *em_pd_get_efficient_state(struct em_perf_domain *pd,
-						unsigned long freq)
+static inline int
+em_pd_get_efficient_state(struct em_perf_state *table, int nr_perf_states,
+			  unsigned long freq, unsigned long pd_flags)
 {
 	struct em_perf_state *ps;
 	int i;
 
-	for (i = 0; i < pd->nr_perf_states; i++) {
-		ps = &pd->table[i];
+	for (i = 0; i < nr_perf_states; i++) {
+		ps = &table[i];
 		if (ps->frequency >= freq) {
-			if (pd->flags & EM_PERF_DOMAIN_SKIP_INEFFICIENCIES &&
+			if (pd_flags & EM_PERF_DOMAIN_SKIP_INEFFICIENCIES &&
 			    ps->flags & EM_PERF_STATE_INEFFICIENT)
 				continue;
-			break;
+			return i;
 		}
 	}
 
-	return ps;
+	return nr_perf_states - 1;
 }
 
 /**
@@ -226,7 +228,7 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
 {
 	unsigned long freq, scale_cpu;
 	struct em_perf_state *ps;
-	int cpu;
+	int cpu, i;
 
 	if (!sum_util)
 		return 0;
@@ -251,7 +253,9 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
 	 * Find the lowest performance state of the Energy Model above the
 	 * requested frequency.
 	 */
-	ps = em_pd_get_efficient_state(pd, freq);
+	i = em_pd_get_efficient_state(pd->table, pd->nr_perf_states, freq,
+				      pd->flags);
+	ps = &pd->table[i];
 
 	/*
 	 * The capacity of a CPU in the domain at the performance state (ps)

From patchwork Fri Jul 21 15:50:14 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Lukasz Luba
X-Patchwork-Id: 705996
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 9D319EB64DC
	for ; Fri, 21 Jul 2023 15:50:34 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S231417AbjGUPud (ORCPT );
	Fri, 21 Jul 2023 11:50:33 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50516 "EHLO
	lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S232308AbjGUPu2 (ORCPT );
	Fri, 21 Jul 2023 11:50:28 -0400
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
	by lindbergh.monkeyblade.net (Postfix) with ESMTP id 7D1E63583;
	Fri, 21 Jul 2023 08:50:18 -0700 (PDT)
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
	by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id CBEF1D75;
	Fri, 21 Jul 2023 08:51:00 -0700 (PDT)
Received: from e129166.arm.com (unknown [10.57.0.79])
	by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 9FB213F738;
	Fri, 21 Jul 2023 08:50:09 -0700 (PDT)
From: Lukasz Luba
To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, rafael@kernel.org
Cc: lukasz.luba@arm.com, dietmar.eggemann@arm.com, rui.zhang@intel.com,
	amit.kucheria@verdurent.com, amit.kachhap@gmail.com,
	daniel.lezcano@linaro.org, viresh.kumar@linaro.org, len.brown@intel.com,
	pavel@ucw.cz, Pierre.Gondois@arm.com, ionela.voinescu@arm.com,
	mhiramat@kernel.org
Subject: [PATCH v3 04/12] PM: EM: Refactor a new function
 em_compute_costs()
Date: Fri, 21 Jul 2023 16:50:14 +0100
Message-Id: <20230721155022.2339982-5-lukasz.luba@arm.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20230721155022.2339982-1-lukasz.luba@arm.com>
References: <20230721155022.2339982-1-lukasz.luba@arm.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-pm@vger.kernel.org

Move the cost computation into a dedicated function, which will be
easier to maintain and re-use in the future. The upcoming changes for
the modifiable EM perf_state table will use it (instead of duplicating
the code).

Signed-off-by: Lukasz Luba
---
 kernel/power/energy_model.c | 72 ++++++++++++++++++++++---------------
 1 file changed, 43 insertions(+), 29 deletions(-)

diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index 85a70b7da023..fd1066dcf38b 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -103,14 +103,52 @@ static void em_debug_create_pd(struct device *dev) {}
 static void em_debug_remove_pd(struct device *dev) {}
 #endif
 
+static int em_compute_costs(struct device *dev, struct em_perf_state *table,
+			    struct em_data_callback *cb, int nr_states,
+			    unsigned long flags)
+{
+	unsigned long prev_cost = ULONG_MAX;
+	u64 fmax;
+	int i, ret;
+
+	/* Compute the cost of each performance state. */
+	fmax = (u64) table[nr_states - 1].frequency;
+	for (i = nr_states - 1; i >= 0; i--) {
+		unsigned long power_res, cost;
+
+		if (flags & EM_PERF_DOMAIN_ARTIFICIAL) {
+			ret = cb->get_cost(dev, table[i].frequency, &cost);
+			if (ret || !cost || cost > EM_MAX_POWER) {
+				dev_err(dev, "EM: invalid cost %lu %d\n",
+					cost, ret);
+				return -EINVAL;
+			}
+		} else {
+			power_res = table[i].power;
+			cost = div64_u64(fmax * power_res, table[i].frequency);
+		}
+
+		table[i].cost = cost;
+
+		if (table[i].cost >= prev_cost) {
+			table[i].flags = EM_PERF_STATE_INEFFICIENT;
+			dev_dbg(dev, "EM: OPP:%lu is inefficient\n",
+				table[i].frequency);
+		} else {
+			prev_cost = table[i].cost;
+		}
+	}
+
+	return 0;
+}
+
 static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd,
 				int nr_states, struct em_data_callback *cb,
 				unsigned long flags)
 {
-	unsigned long power, freq, prev_freq = 0, prev_cost = ULONG_MAX;
+	unsigned long power, freq, prev_freq = 0;
 	struct em_perf_state *table;
 	int i, ret;
-	u64 fmax;
 
 	table = kcalloc(nr_states, sizeof(*table), GFP_KERNEL);
 	if (!table)
@@ -154,33 +192,9 @@ static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd,
 		table[i].frequency = prev_freq = freq;
 	}
 
-	/* Compute the cost of each performance state. */
-	fmax = (u64) table[nr_states - 1].frequency;
-	for (i = nr_states - 1; i >= 0; i--) {
-		unsigned long power_res, cost;
-
-		if (flags & EM_PERF_DOMAIN_ARTIFICIAL) {
-			ret = cb->get_cost(dev, table[i].frequency, &cost);
-			if (ret || !cost || cost > EM_MAX_POWER) {
-				dev_err(dev, "EM: invalid cost %lu %d\n",
-					cost, ret);
-				goto free_ps_table;
-			}
-		} else {
-			power_res = table[i].power;
-			cost = div64_u64(fmax * power_res, table[i].frequency);
-		}
-
-		table[i].cost = cost;
-
-		if (table[i].cost >= prev_cost) {
-			table[i].flags = EM_PERF_STATE_INEFFICIENT;
-			dev_dbg(dev, "EM: OPP:%lu is inefficient\n",
-				table[i].frequency);
-		} else {
-			prev_cost = table[i].cost;
-		}
-	}
+	ret = em_compute_costs(dev, table, cb, nr_states, flags);
+	if (ret)
+		goto free_ps_table;
 
 	pd->table = table;
 	pd->nr_perf_states = nr_states;

From patchwork Fri Jul 21 15:50:15 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Lukasz Luba
X-Patchwork-Id: 705153
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id C599CEB64DC
	for ; Fri, 21 Jul 2023 15:50:31 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S232289AbjGUPua (ORCPT );
	Fri, 21 Jul 2023 11:50:30 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50108 "EHLO
	lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S232298AbjGUPuU (ORCPT );
	Fri, 21 Jul 2023 11:50:20 -0400
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
	by lindbergh.monkeyblade.net (Postfix) with ESMTP id 0A82C2D47;
	Fri, 21 Jul 2023 08:50:15 -0700 (PDT)
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
	by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id DD68D2F4;
	Fri, 21 Jul 2023 08:50:58 -0700 (PDT)
Received:
	from e129166.arm.com (unknown [10.57.0.79])
	by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id A69503F844;
	Fri, 21 Jul 2023 08:50:12 -0700 (PDT)
From: Lukasz Luba
To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, rafael@kernel.org
Cc: lukasz.luba@arm.com, dietmar.eggemann@arm.com, rui.zhang@intel.com,
	amit.kucheria@verdurent.com, amit.kachhap@gmail.com,
	daniel.lezcano@linaro.org, viresh.kumar@linaro.org, len.brown@intel.com,
	pavel@ucw.cz, Pierre.Gondois@arm.com, ionela.voinescu@arm.com,
	mhiramat@kernel.org
Subject: [PATCH v3 05/12] PM: EM: Check if the get_cost() callback is present in em_compute_costs()
Date: Fri, 21 Jul 2023 16:50:15 +0100
Message-Id: <20230721155022.2339982-6-lukasz.luba@arm.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20230721155022.2339982-1-lukasz.luba@arm.com>
References: <20230721155022.2339982-1-lukasz.luba@arm.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-pm@vger.kernel.org

em_compute_costs() is going to be re-used in the runtime-modified EM
code path. Thus, make sure that this common code is safe and won't try
to use a NULL pointer. The former em_compute_costs() didn't have to care
about the runtime modification code path. The upcoming changes introduce
such an option, but with a different callback. Those two paths, which use
get_cost() (during first EM registration) or update_power() (during
runtime modification), need to be safely handled in em_compute_costs().

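The guarded callback can be illustrated with a small user-space sketch.
All demo_* names are hypothetical, and the sketch simplifies the kernel
logic: it only checks whether the callback pointer is set (the patch
also tests EM_PERF_DOMAIN_ARTIFICIAL) and it omits the EM_MAX_POWER
bound. Without a callback, the cost is derived as fmax * power /
frequency, and a state is marked inefficient when its cost is not lower
than that of the closest higher efficient state.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PS_INEFFICIENT	0x1UL	/* hypothetical flag value */

struct demo_state {
	unsigned long frequency;
	unsigned long power;
	unsigned long cost;
	unsigned long flags;
};

/* Hypothetical callback type standing in for get_cost(). */
typedef int (*demo_get_cost_t)(unsigned long freq, unsigned long *cost);

/* Sample table in ascending frequency order. */
static struct demo_state demo_table[] = {
	{ 100,  40, 0, 0 },
	{ 200, 150, 0, 0 },
	{ 400, 200, 0, 0 },
};

static int demo_compute_costs(struct demo_state *table, int nr_states,
			      demo_get_cost_t get_cost)
{
	unsigned long prev_cost = ULONG_MAX;
	uint64_t fmax;
	int i;

	assert(nr_states > 0);

	fmax = table[nr_states - 1].frequency;
	for (i = nr_states - 1; i >= 0; i--) {
		unsigned long cost;

		if (get_cost) {
			/* Only call the callback when it is present. */
			if (get_cost(table[i].frequency, &cost) || !cost)
				return -1;
		} else {
			cost = (unsigned long)(fmax * table[i].power /
					       table[i].frequency);
		}

		table[i].cost = cost;

		if (cost >= prev_cost)
			table[i].flags |= DEMO_PS_INEFFICIENT;
		else
			prev_cost = cost;
	}

	return 0;
}
```

Calling demo_compute_costs(demo_table, 3, NULL) takes the power-based
branch for every state, which mirrors what the runtime modification path
needs when no get_cost() callback is supplied.
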
Signed-off-by: Lukasz Luba
---
 kernel/power/energy_model.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index fd1066dcf38b..5ecb73b36995 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -116,7 +116,7 @@ static int em_compute_costs(struct device *dev, struct em_perf_state *table,
 	for (i = nr_states - 1; i >= 0; i--) {
 		unsigned long power_res, cost;
 
-		if (flags & EM_PERF_DOMAIN_ARTIFICIAL) {
+		if (flags & EM_PERF_DOMAIN_ARTIFICIAL && cb->get_cost) {
 			ret = cb->get_cost(dev, table[i].frequency, &cost);
 			if (ret || !cost || cost > EM_MAX_POWER) {
 				dev_err(dev, "EM: invalid cost %lu %d\n",

From patchwork Fri Jul 21 15:50:16 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Lukasz Luba
X-Patchwork-Id: 705152
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 4ED52EB64DC
	for ; Fri, 21 Jul 2023 15:50:41 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S232310AbjGUPuk (ORCPT );
	Fri, 21 Jul 2023 11:50:40 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50114 "EHLO
	lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S232314AbjGUPu2 (ORCPT );
	Fri, 21 Jul 2023 11:50:28 -0400
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
	by lindbergh.monkeyblade.net (Postfix) with ESMTP id 7A27A359D;
	Fri, 21 Jul 2023 08:50:19 -0700 (PDT)
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
	by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 1883914BF;
	Fri, 21 Jul 2023 08:51:02 -0700 (PDT)
Received: from e129166.arm.com (unknown [10.57.0.79])
	by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id B9A423F844;
	Fri, 21 Jul
	2023 08:50:15 -0700 (PDT)
From: Lukasz Luba
To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, rafael@kernel.org
Cc: lukasz.luba@arm.com, dietmar.eggemann@arm.com, rui.zhang@intel.com,
	amit.kucheria@verdurent.com, amit.kachhap@gmail.com,
	daniel.lezcano@linaro.org, viresh.kumar@linaro.org, len.brown@intel.com,
	pavel@ucw.cz, Pierre.Gondois@arm.com, ionela.voinescu@arm.com,
	mhiramat@kernel.org
Subject: [PATCH v3 06/12] PM: EM: Refactor struct em_perf_domain and add default_table
Date: Fri, 21 Jul 2023 16:50:16 +0100
Message-Id: <20230721155022.2339982-7-lukasz.luba@arm.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20230721155022.2339982-1-lukasz.luba@arm.com>
References: <20230721155022.2339982-1-lukasz.luba@arm.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-pm@vger.kernel.org

The Energy Model is going to support runtime modifications. Refactor the
old implementation, which accessed struct em_perf_state directly, and
introduce em_perf_domain::default_table to clean up the design. This new
field helps to better distinguish the 2 performance state tables.

Update all drivers or frameworks which used the old field
em_perf_domain::table so that they now use em_perf_domain::default_table.

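The extra level of indirection can be sketched in user space as follows.
The demo_* types and data are assumptions that only mirror the shape of
the new structures; the rcu member of struct em_perf_table is left out
because it has no user-space equivalent here.

```c
#include <assert.h>

struct demo_perf_state {
	unsigned long frequency;
	unsigned long power;
};

/* Wrapper around the state array, mirroring the new table structure. */
struct demo_perf_table {
	struct demo_perf_state *state;
};

struct demo_perf_domain {
	struct demo_perf_table *default_table;
	int nr_perf_states;
};

/* Sample data, in ascending frequency order. */
static struct demo_perf_state demo_states[] = {
	{ 500000, 100 },
	{ 1000000, 350 },
	{ 1500000, 900 },
};
static struct demo_perf_table demo_default = { demo_states };
static struct demo_perf_domain demo_pd = { &demo_default, 3 };

/*
 * Users resolve the state array once through default_table->state and then
 * index it, instead of dereferencing a table pointer stored directly in the
 * domain.
 */
static unsigned long demo_max_power(const struct demo_perf_domain *pd)
{
	const struct demo_perf_state *table = pd->default_table->state;

	assert(pd->nr_perf_states > 0);

	return table[pd->nr_perf_states - 1].power;
}
```

Caching the pointer in a local 'table' variable, as the driver changes in
this patch do, keeps each access site short and leaves one place per
function to switch over if a different table has to be used later.
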
Signed-off-by: Lukasz Luba
---
 drivers/powercap/dtpm_cpu.c       | 27 +++++++++++++++++++--------
 drivers/powercap/dtpm_devfreq.c   | 23 ++++++++++++++++-------
 drivers/thermal/cpufreq_cooling.c | 23 +++++++++++++++--------
 drivers/thermal/devfreq_cooling.c | 23 +++++++++++++++++------
 include/linux/energy_model.h      | 14 ++++++++++++--
 kernel/power/energy_model.c       | 22 ++++++++++++++++++----
 6 files changed, 97 insertions(+), 35 deletions(-)

diff --git a/drivers/powercap/dtpm_cpu.c b/drivers/powercap/dtpm_cpu.c
index 2ff7717530bf..743a0ac8ecdf 100644
--- a/drivers/powercap/dtpm_cpu.c
+++ b/drivers/powercap/dtpm_cpu.c
@@ -43,6 +43,7 @@ static u64 set_pd_power_limit(struct dtpm *dtpm, u64 power_limit)
 {
 	struct dtpm_cpu *dtpm_cpu = to_dtpm_cpu(dtpm);
 	struct em_perf_domain *pd = em_cpu_get(dtpm_cpu->cpu);
+	struct em_perf_state *table;
 	struct cpumask cpus;
 	unsigned long freq;
 	u64 power;
@@ -51,19 +52,21 @@ static u64 set_pd_power_limit(struct dtpm *dtpm, u64 power_limit)
 	cpumask_and(&cpus, cpu_online_mask, to_cpumask(pd->cpus));
 	nr_cpus = cpumask_weight(&cpus);
 
+	table = pd->default_table->state;
+
 	for (i = 0; i < pd->nr_perf_states; i++) {
 
-		power = pd->table[i].power * nr_cpus;
+		power = table[i].power * nr_cpus;
 
 		if (power > power_limit)
 			break;
 	}
 
-	freq = pd->table[i - 1].frequency;
+	freq = table[i - 1].frequency;
 
 	freq_qos_update_request(&dtpm_cpu->qos_req, freq);
 
-	power_limit = pd->table[i - 1].power * nr_cpus;
+	power_limit = table[i - 1].power * nr_cpus;
 
 	return power_limit;
 }
@@ -88,12 +91,14 @@ static u64 scale_pd_power_uw(struct cpumask *pd_mask, u64 power)
 static u64 get_pd_power_uw(struct dtpm *dtpm)
 {
 	struct dtpm_cpu *dtpm_cpu = to_dtpm_cpu(dtpm);
+	struct em_perf_state *table;
 	struct em_perf_domain *pd;
 	struct cpumask *pd_mask;
 	unsigned long freq;
 	int i;
 
 	pd = em_cpu_get(dtpm_cpu->cpu);
+	table = pd->default_table->state;
 
 	pd_mask = em_span_cpus(pd);
 
@@ -101,10 +106,10 @@ static u64 get_pd_power_uw(struct dtpm *dtpm)
 
 	for (i = 0; i < pd->nr_perf_states; i++) {
 
-		if (pd->table[i].frequency < freq)
+		if (table[i].frequency < freq)
 			continue;
 
-		return scale_pd_power_uw(pd_mask, pd->table[i].power *
+		return scale_pd_power_uw(pd_mask, table[i].power *
 					 MICROWATT_PER_MILLIWATT);
 	}
 
@@ -115,17 +120,20 @@ static int update_pd_power_uw(struct dtpm *dtpm)
 {
 	struct dtpm_cpu *dtpm_cpu = to_dtpm_cpu(dtpm);
 	struct em_perf_domain *em = em_cpu_get(dtpm_cpu->cpu);
+	struct em_perf_state *table;
 	struct cpumask cpus;
 	int nr_cpus;
 
 	cpumask_and(&cpus, cpu_online_mask, to_cpumask(em->cpus));
 	nr_cpus = cpumask_weight(&cpus);
 
-	dtpm->power_min = em->table[0].power;
+	table = em->default_table->state;
+
+	dtpm->power_min = table[0].power;
 	dtpm->power_min *= MICROWATT_PER_MILLIWATT;
 	dtpm->power_min *= nr_cpus;
 
-	dtpm->power_max = em->table[em->nr_perf_states - 1].power;
+	dtpm->power_max = table[em->nr_perf_states - 1].power;
 	dtpm->power_max *= MICROWATT_PER_MILLIWATT;
 	dtpm->power_max *= nr_cpus;
 
@@ -182,6 +190,7 @@ static int __dtpm_cpu_setup(int cpu, struct dtpm *parent)
 {
 	struct dtpm_cpu *dtpm_cpu;
 	struct cpufreq_policy *policy;
+	struct em_perf_state *table;
 	struct em_perf_domain *pd;
 	char name[CPUFREQ_NAME_LEN];
 	int ret = -ENOMEM;
@@ -198,6 +207,8 @@ static int __dtpm_cpu_setup(int cpu, struct dtpm *parent)
 	if (!pd || em_is_artificial(pd))
 		return -EINVAL;
 
+	table = pd->default_table->state;
+
 	dtpm_cpu = kzalloc(sizeof(*dtpm_cpu), GFP_KERNEL);
 	if (!dtpm_cpu)
 		return -ENOMEM;
@@ -216,7 +227,7 @@ static int __dtpm_cpu_setup(int cpu, struct dtpm *parent)
 
 	ret = freq_qos_add_request(&policy->constraints,
 				   &dtpm_cpu->qos_req, FREQ_QOS_MAX,
-				   pd->table[pd->nr_perf_states - 1].frequency);
+				   table[pd->nr_perf_states - 1].frequency);
 	if (ret)
 		goto out_dtpm_unregister;
 
diff --git a/drivers/powercap/dtpm_devfreq.c b/drivers/powercap/dtpm_devfreq.c
index 91276761a31d..6ef0f2b4a683 100644
--- a/drivers/powercap/dtpm_devfreq.c
+++ b/drivers/powercap/dtpm_devfreq.c
@@ -37,11 +37,14 @@ static int update_pd_power_uw(struct dtpm *dtpm)
 	struct devfreq *devfreq = dtpm_devfreq->devfreq;
 	struct device *dev = devfreq->dev.parent;
 	struct em_perf_domain *pd = em_pd_get(dev);
+	struct em_perf_state *table;
 
-	dtpm->power_min = pd->table[0].power;
+	table = pd->default_table->state;
+
+	dtpm->power_min = table[0].power;
 	dtpm->power_min *= MICROWATT_PER_MILLIWATT;
 
-	dtpm->power_max = pd->table[pd->nr_perf_states - 1].power;
+	dtpm->power_max = table[pd->nr_perf_states - 1].power;
 	dtpm->power_max *= MICROWATT_PER_MILLIWATT;
 
 	return 0;
@@ -53,22 +56,25 @@ static u64 set_pd_power_limit(struct dtpm *dtpm, u64 power_limit)
 	struct devfreq *devfreq = dtpm_devfreq->devfreq;
 	struct device *dev = devfreq->dev.parent;
 	struct em_perf_domain *pd = em_pd_get(dev);
+	struct em_perf_state *table;
 	unsigned long freq;
 	u64 power;
 	int i;
 
+	table = pd->default_table->state;
+
 	for (i = 0; i < pd->nr_perf_states; i++) {
 
-		power = pd->table[i].power * MICROWATT_PER_MILLIWATT;
+		power = table[i].power * MICROWATT_PER_MILLIWATT;
 		if (power > power_limit)
 			break;
 	}
 
-	freq = pd->table[i - 1].frequency;
+	freq = table[i - 1].frequency;
 
 	dev_pm_qos_update_request(&dtpm_devfreq->qos_req, freq);
 
-	power_limit = pd->table[i - 1].power * MICROWATT_PER_MILLIWATT;
+	power_limit = table[i - 1].power * MICROWATT_PER_MILLIWATT;
 
 	return power_limit;
 }
@@ -94,6 +100,7 @@ static u64 get_pd_power_uw(struct dtpm *dtpm)
 	struct device *dev = devfreq->dev.parent;
 	struct em_perf_domain *pd = em_pd_get(dev);
 	struct devfreq_dev_status status;
+	struct em_perf_state *table;
 	unsigned long freq;
 	u64 power;
 	int i;
@@ -102,15 +109,17 @@ static u64 get_pd_power_uw(struct dtpm *dtpm)
 	status = devfreq->last_status;
 	mutex_unlock(&devfreq->lock);
 
+	table = pd->default_table->state;
+
 	freq = DIV_ROUND_UP(status.current_frequency, HZ_PER_KHZ);
 	_normalize_load(&status);
 
 	for (i = 0; i < pd->nr_perf_states; i++) {
 
-		if (pd->table[i].frequency < freq)
+		if (table[i].frequency < freq)
 			continue;
 
-		power = pd->table[i].power * MICROWATT_PER_MILLIWATT;
+		power = table[i].power * MICROWATT_PER_MILLIWATT;
 		power *= status.busy_time;
 		power >>= 10;
 
diff --git a/drivers/thermal/cpufreq_cooling.c b/drivers/thermal/cpufreq_cooling.c
index e2cc7bd30862..1d979c5e05ed 100644
--- a/drivers/thermal/cpufreq_cooling.c
+++ b/drivers/thermal/cpufreq_cooling.c
@@ -91,10 +91,11 @@ struct cpufreq_cooling_device {
 static unsigned long get_level(struct cpufreq_cooling_device *cpufreq_cdev,
 			       unsigned int freq)
 {
+	struct em_perf_state *table = cpufreq_cdev->em->default_table->state;
 	int i;
 
 	for (i = cpufreq_cdev->max_level - 1; i >= 0; i--) {
-		if (freq > cpufreq_cdev->em->table[i].frequency)
+		if (freq > table[i].frequency)
 			break;
 	}
 
@@ -104,15 +105,16 @@ static unsigned long get_level(struct cpufreq_cooling_device *cpufreq_cdev,
 static u32 cpu_freq_to_power(struct cpufreq_cooling_device *cpufreq_cdev,
 			     u32 freq)
 {
+	struct em_perf_state *table = cpufreq_cdev->em->default_table->state;
 	unsigned long power_mw;
 	int i;
 
 	for (i = cpufreq_cdev->max_level - 1; i >= 0; i--) {
-		if (freq > cpufreq_cdev->em->table[i].frequency)
+		if (freq > table[i].frequency)
 			break;
 	}
 
-	power_mw = cpufreq_cdev->em->table[i + 1].power;
+	power_mw = table[i + 1].power;
 	power_mw /= MICROWATT_PER_MILLIWATT;
 
 	return power_mw;
@@ -121,18 +123,19 @@ static u32 cpu_freq_to_power(struct cpufreq_cooling_device *cpufreq_cdev,
 static u32 cpu_power_to_freq(struct cpufreq_cooling_device *cpufreq_cdev,
 			     u32 power)
 {
+	struct em_perf_state *table = cpufreq_cdev->em->default_table->state;
 	unsigned long em_power_mw;
 	int i;
 
 	for (i = cpufreq_cdev->max_level; i > 0; i--) {
 		/* Convert EM power to milli-Watts to make safe comparison */
-		em_power_mw = cpufreq_cdev->em->table[i].power;
+		em_power_mw = table[i].power;
 		em_power_mw /= MICROWATT_PER_MILLIWATT;
 		if (power >= em_power_mw)
 			break;
 	}
 
-	return cpufreq_cdev->em->table[i].frequency;
+	return table[i].frequency;
 }
 
 /**
@@ -262,8 +265,9 @@ static int cpufreq_get_requested_power(struct thermal_cooling_device *cdev,
 static int cpufreq_state2power(struct thermal_cooling_device *cdev,
 			       unsigned long state, u32 *power)
 {
-	unsigned int freq, num_cpus, idx;
 	struct cpufreq_cooling_device *cpufreq_cdev = cdev->devdata;
+	unsigned int freq, num_cpus, idx;
+	struct em_perf_state *table;
 
 	/* Request state should be less than max_level */
 	if (state > cpufreq_cdev->max_level)
@@ -271,8 +275,9 @@ static int cpufreq_state2power(struct thermal_cooling_device *cdev,
 
 	num_cpus = cpumask_weight(cpufreq_cdev->policy->cpus);
 
+	table = cpufreq_cdev->em->default_table->state;
 	idx = cpufreq_cdev->max_level - state;
-	freq = cpufreq_cdev->em->table[idx].frequency;
+	freq = table[idx].frequency;
 	*power = cpu_freq_to_power(cpufreq_cdev, freq) * num_cpus;
 
 	return 0;
@@ -378,8 +383,10 @@ static unsigned int get_state_freq(struct cpufreq_cooling_device *cpufreq_cdev,
 #ifdef CONFIG_THERMAL_GOV_POWER_ALLOCATOR
 	/* Use the Energy Model table if available */
 	if (cpufreq_cdev->em) {
+		struct em_perf_state *table;
+		table = cpufreq_cdev->em->default_table->state;
 		idx = cpufreq_cdev->max_level - state;
-		return cpufreq_cdev->em->table[idx].frequency;
+		return table[idx].frequency;
 	}
 #endif
 
diff --git a/drivers/thermal/devfreq_cooling.c b/drivers/thermal/devfreq_cooling.c
index 262e62ab6cf2..4207ef850582 100644
--- a/drivers/thermal/devfreq_cooling.c
+++ b/drivers/thermal/devfreq_cooling.c
@@ -87,6 +87,7 @@ static int devfreq_cooling_set_cur_state(struct thermal_cooling_device *cdev,
 	struct devfreq_cooling_device *dfc = cdev->devdata;
 	struct devfreq *df = dfc->devfreq;
 	struct device *dev = df->dev.parent;
+	struct em_perf_state *table;
 	unsigned long freq;
 	int perf_idx;
 
@@ -99,8 +100,9 @@ static int devfreq_cooling_set_cur_state(struct thermal_cooling_device *cdev,
 		return -EINVAL;
 
 	if (dfc->em_pd) {
+		table = dfc->em_pd->default_table->state;
 		perf_idx = dfc->max_state - state;
-		freq = dfc->em_pd->table[perf_idx].frequency * 1000;
+		freq = table[perf_idx].frequency * 1000;
 	} else {
 		freq = dfc->freq_table[state];
 	}
@@ -123,10 +125,11 @@ static int devfreq_cooling_set_cur_state(struct thermal_cooling_device *cdev,
  */
 static int get_perf_idx(struct em_perf_domain *em_pd, unsigned long freq)
 {
+	struct em_perf_state *table = em_pd->default_table->state;
 	int i;
 
 	for (i = 0; i < em_pd->nr_perf_states; i++) {
-		if (em_pd->table[i].frequency == freq)
+		if (table[i].frequency == freq)
 			return i;
 	}
 
@@ -181,6 +184,7 @@ static int devfreq_cooling_get_requested_power(struct thermal_cooling_device *cd
 	struct devfreq_cooling_device *dfc = cdev->devdata;
 	struct devfreq *df = dfc->devfreq;
 	struct devfreq_dev_status status;
+	struct em_perf_state *table;
 	unsigned long state;
 	unsigned long freq;
 	unsigned long voltage;
@@ -192,6 +196,8 @@ static int devfreq_cooling_get_requested_power(struct thermal_cooling_device *cd
 
 	freq = status.current_frequency;
 
+	table = dfc->em_pd->default_table->state;
+
 	if (dfc->power_ops && dfc->power_ops->get_real_power) {
 		voltage = get_voltage(df, freq);
 		if (voltage == 0) {
@@ -204,7 +210,7 @@ static int devfreq_cooling_get_requested_power(struct thermal_cooling_device *cd
 			state = dfc->capped_state;
 
 			/* Convert EM power into milli-Watts first */
-			dfc->res_util = dfc->em_pd->table[state].power;
+			dfc->res_util = table[state].power;
 			dfc->res_util /= MICROWATT_PER_MILLIWATT;
 
 			dfc->res_util *= SCALE_ERROR_MITIGATION;
@@ -225,7 +231,7 @@ static int devfreq_cooling_get_requested_power(struct thermal_cooling_device *cd
 		_normalize_load(&status);
 
 		/* Convert EM power into milli-Watts first */
-		*power = dfc->em_pd->table[perf_idx].power;
+		*power = table[perf_idx].power;
 		*power /= MICROWATT_PER_MILLIWATT;
 		/* Scale power for utilization */
 		*power *= status.busy_time;
@@ -245,13 +251,15 @@ static int devfreq_cooling_state2power(struct thermal_cooling_device *cdev,
 				       unsigned long state, u32 *power)
 {
 	struct devfreq_cooling_device *dfc = cdev->devdata;
+	struct em_perf_state *table;
 	int perf_idx;
 
 	if (state > dfc->max_state)
 		return -EINVAL;
 
+	table = dfc->em_pd->default_table->state;
 	perf_idx = dfc->max_state - state;
-	*power = dfc->em_pd->table[perf_idx].power;
+	*power = table[perf_idx].power;
 	*power /= MICROWATT_PER_MILLIWATT;
 
 	return 0;
@@ -264,6 +272,7 @@ static int devfreq_cooling_power2state(struct thermal_cooling_device *cdev,
 	struct devfreq *df = dfc->devfreq;
 	struct devfreq_dev_status status;
 	unsigned long freq, em_power_mw;
+	struct em_perf_state *table;
 	s32 est_power;
 	int i;
 
@@ -273,6 +282,8 @@ static int devfreq_cooling_power2state(struct thermal_cooling_device *cdev,
 
 	freq = status.current_frequency;
 
+	table = dfc->em_pd->default_table->state;
+
 	if (dfc->power_ops && dfc->power_ops->get_real_power) {
 		/* Scale for resource utilization */
 		est_power = power * dfc->res_util;
@@ -290,7 +301,7 @@ static int devfreq_cooling_power2state(struct thermal_cooling_device *cdev,
 	 */
 	for (i = dfc->max_state; i > 0; i--) {
 		/* Convert EM power to milli-Watts to make safe comparison */
-		em_power_mw = dfc->em_pd->table[i].power;
+		em_power_mw = table[i].power;
 		em_power_mw /= MICROWATT_PER_MILLIWATT;
 		if (est_power >= em_power_mw)
 			break;
diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
index 8069f526c9d8..90c0822b664b 100644
--- a/include/linux/energy_model.h
+++ b/include/linux/energy_model.h
@@ -36,9 +36,19 @@ struct em_perf_state {
  */
 #define EM_PERF_STATE_INEFFICIENT BIT(0)
 
+/**
+ * struct em_perf_table - Performance states table
+ * @state:	List of performance states, in ascending order
+ * @rcu:	RCU used for safe access and destruction
+ */
+struct em_perf_table {
+	struct em_perf_state *state;
+	struct rcu_head rcu;
+};
+
 /**
  * struct em_perf_domain - Performance domain
- * @table:		List of performance states, in ascending order
+ * @default_table:	Pointer to the default em_perf_table
  * @nr_perf_states:	Number of performance states
  * @flags:		See "em_perf_domain flags"
  * @cpus:		Cpumask covering the CPUs of the domain. It's here
@@ -53,7 +63,7 @@ struct em_perf_state {
  * field is unused.
*/ struct em_perf_domain { - struct em_perf_state *table; + struct em_perf_table *default_table; int nr_perf_states; unsigned long flags; unsigned long cpus[]; diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c index 5ecb73b36995..6cd94f92701d 100644 --- a/kernel/power/energy_model.c +++ b/kernel/power/energy_model.c @@ -66,6 +66,7 @@ DEFINE_SHOW_ATTRIBUTE(em_debug_flags); static void em_debug_create_pd(struct device *dev) { + struct em_perf_table *table = dev->em_pd->default_table; struct dentry *d; int i; @@ -81,7 +82,7 @@ static void em_debug_create_pd(struct device *dev) /* Create a sub-directory for each performance state */ for (i = 0; i < dev->em_pd->nr_perf_states; i++) - em_debug_create_ps(&dev->em_pd->table[i], d); + em_debug_create_ps(&table->state[i], d); } @@ -196,7 +197,7 @@ static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd, if (ret) goto free_ps_table; - pd->table = table; + pd->default_table->state = table; pd->nr_perf_states = nr_states; return 0; @@ -210,6 +211,7 @@ static int em_create_pd(struct device *dev, int nr_states, struct em_data_callback *cb, cpumask_t *cpus, unsigned long flags) { + struct em_perf_table *default_table; struct em_perf_domain *pd; struct device *cpu_dev; int cpu, ret, num_cpus; @@ -234,8 +236,17 @@ static int em_create_pd(struct device *dev, int nr_states, return -ENOMEM; } + default_table = kzalloc(sizeof(*default_table), GFP_KERNEL); + if (!default_table) { + kfree(pd); + return -ENOMEM; + } + + pd->default_table = default_table; + ret = em_create_perf_table(dev, pd, nr_states, cb, flags); if (ret) { + kfree(default_table); kfree(pd); return ret; } @@ -358,6 +369,7 @@ int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states, bool microwatts) { unsigned long cap, prev_cap = 0; + struct em_perf_state *table; unsigned long flags = 0; int cpu, ret; @@ -416,7 +428,8 @@ int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states, 
dev->em_pd->flags |= flags; - em_cpufreq_update_efficiencies(dev, dev->em_pd->table); + table = dev->em_pd->default_table->state; + em_cpufreq_update_efficiencies(dev, table); em_debug_create_pd(dev); dev_info(dev, "EM: created perf domain\n"); @@ -449,7 +462,8 @@ void em_dev_unregister_perf_domain(struct device *dev) mutex_lock(&em_pd_mutex); em_debug_remove_pd(dev); - kfree(dev->em_pd->table); + kfree(pd->default_table->state); + kfree(pd->default_table); kfree(dev->em_pd); dev->em_pd = NULL; mutex_unlock(&em_pd_mutex); From patchwork Fri Jul 21 15:50:17 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lukasz Luba X-Patchwork-Id: 705995 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E9C2AEB64DC for ; Fri, 21 Jul 2023 15:50:52 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232014AbjGUPuv (ORCPT ); Fri, 21 Jul 2023 11:50:51 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50724 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232338AbjGUPug (ORCPT ); Fri, 21 Jul 2023 11:50:36 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 59E8E3AB5; Fri, 21 Jul 2023 08:50:22 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 1FB562F4; Fri, 21 Jul 2023 08:51:05 -0700 (PDT) Received: from e129166.arm.com (unknown [10.57.0.79]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id E903E3F738; Fri, 21 Jul 2023 08:50:18 -0700 (PDT) From: Lukasz Luba To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, rafael@kernel.org Cc: lukasz.luba@arm.com, dietmar.eggemann@arm.com, 
rui.zhang@intel.com, amit.kucheria@verdurent.com, amit.kachhap@gmail.com, daniel.lezcano@linaro.org, viresh.kumar@linaro.org, len.brown@intel.com, pavel@ucw.cz, Pierre.Gondois@arm.com, ionela.voinescu@arm.com, mhiramat@kernel.org Subject: [PATCH v3 07/12] PM: EM: Add update_power() callback for runtime modifications Date: Fri, 21 Jul 2023 16:50:17 +0100 Message-Id: <20230721155022.2339982-8-lukasz.luba@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230721155022.2339982-1-lukasz.luba@arm.com> References: <20230721155022.2339982-1-lukasz.luba@arm.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org The Energy Model (EM) is going to support runtime modifications. This new callback would be used in the upcoming EM changes. The drivers or frameworks which want to modify the EM have to implement the update_power() callback and provide it via EM API em_dev_update_perf_domain(). The callback is then used by the EM framework to get new power values for each frequency in existing EM. Signed-off-by: Lukasz Luba --- include/linux/energy_model.h | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h index 90c0822b664b..9b67f54ddcf0 100644 --- a/include/linux/energy_model.h +++ b/include/linux/energy_model.h @@ -168,6 +168,26 @@ struct em_data_callback { */ int (*get_cost)(struct device *dev, unsigned long freq, unsigned long *cost); + + /** + * update_power() - Provide new power at the given performance state of + * a device + * @dev : Device for which we do this operation (can be a CPU) + * @freq : Frequency at the performance state in kHz + * @power : New power value at the performance state + * (modified) + * @priv : Pointer to private data useful for tracking context + * during run-time modifications of EM. + * + * The update_power() is used by run-time modifiable EM. 
It aims to + * provide updated power value for a given frequency, which is stored + * in the performance state. The power value provided by this callback + * should fit in the [0, EM_MAX_POWER] range. + * + * Return 0 on success, or appropriate error value in case of failure. + */ + int (*update_power)(struct device *dev, unsigned long freq, + unsigned long *power, void *priv); }; #define EM_SET_ACTIVE_POWER_CB(em_cb, cb) ((em_cb).active_power = cb) #define EM_ADV_DATA_CB(_active_power_cb, _cost_cb) \ @@ -175,6 +195,7 @@ struct em_data_callback { .get_cost = _cost_cb } #define EM_DATA_CB(_active_power_cb) \ EM_ADV_DATA_CB(_active_power_cb, NULL) +#define EM_UPDATE_CB(_update_power_cb) { .update_power = &_update_power_cb } struct em_perf_domain *em_cpu_get(int cpu); struct em_perf_domain *em_pd_get(struct device *dev); From patchwork Fri Jul 21 15:50:18 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lukasz Luba X-Patchwork-Id: 705151 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E34C3EB64DD for ; Fri, 21 Jul 2023 15:51:00 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232388AbjGUPu7 (ORCPT ); Fri, 21 Jul 2023 11:50:59 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50550 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232329AbjGUPul (ORCPT ); Fri, 21 Jul 2023 11:50:41 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 430463C10; Fri, 21 Jul 2023 08:50:25 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 2482DD75; Fri, 21 Jul 2023 08:51:08 -0700 (PDT) 
Received: from e129166.arm.com (unknown [10.57.0.79]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id EEF093F738; Fri, 21 Jul 2023 08:50:21 -0700 (PDT) From: Lukasz Luba To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, rafael@kernel.org Cc: lukasz.luba@arm.com, dietmar.eggemann@arm.com, rui.zhang@intel.com, amit.kucheria@verdurent.com, amit.kachhap@gmail.com, daniel.lezcano@linaro.org, viresh.kumar@linaro.org, len.brown@intel.com, pavel@ucw.cz, Pierre.Gondois@arm.com, ionela.voinescu@arm.com, mhiramat@kernel.org Subject: [PATCH v3 08/12] PM: EM: Introduce runtime modifiable table Date: Fri, 21 Jul 2023 16:50:18 +0100 Message-Id: <20230721155022.2339982-9-lukasz.luba@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230721155022.2339982-1-lukasz.luba@arm.com> References: <20230721155022.2339982-1-lukasz.luba@arm.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org This patch introduces a new feature: a modifiable EM perf_state table. The new runtime table is populated with new power data to better reflect the actual power. The power can vary over time, e.g. due to SoC temperature changes. Higher temperature can increase power values. In longer-running scenarios, such as gaming or camera use, when other devices (e.g. GPU, ISP) are also active, the CPU power can change. The new EM framework is able to address this issue and change the data at runtime safely. The runtime modifiable EM data is used by the Energy Aware Scheduler (EAS) for the task placement. The EAS is the only user of the 'runtime modifiable EM'. All the other users (thermal, etc.) are still using the default (basic) EM. This fact drove the design of this feature.
Signed-off-by: Lukasz Luba --- include/linux/energy_model.h | 4 +++- kernel/power/energy_model.c | 26 ++++++++++++++++++++++++++ 2 files changed, 29 insertions(+), 1 deletion(-) diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h index 9b67f54ddcf0..cfb1759ffd45 100644 --- a/include/linux/energy_model.h +++ b/include/linux/energy_model.h @@ -39,7 +39,7 @@ struct em_perf_state { /** * struct em_perf_table - Performance states table * @state: List of performance states, in ascending order - * @rcu: RCU used for safe access and destruction + * @rcu: RCU used only for runtime modifiable table */ struct em_perf_table { struct em_perf_state *state; @@ -49,6 +49,7 @@ struct em_perf_table { /** * struct em_perf_domain - Performance domain * @default_table: Pointer to the default em_perf_table + * @runtime_table: Pointer to the runtime modifiable em_perf_table * @nr_perf_states: Number of performance states * @flags: See "em_perf_domain flags" * @cpus: Cpumask covering the CPUs of the domain. 
It's here @@ -64,6 +65,7 @@ struct em_perf_table { */ struct em_perf_domain { struct em_perf_table *default_table; + struct em_perf_table __rcu *runtime_table; int nr_perf_states; unsigned long flags; unsigned long cpus[]; diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c index 6cd94f92701d..c2f8a0046f8a 100644 --- a/kernel/power/energy_model.c +++ b/kernel/power/energy_model.c @@ -212,6 +212,7 @@ static int em_create_pd(struct device *dev, int nr_states, unsigned long flags) { struct em_perf_table *default_table; + struct em_perf_table *runtime_table; struct em_perf_domain *pd; struct device *cpu_dev; int cpu, ret, num_cpus; @@ -244,13 +245,25 @@ static int em_create_pd(struct device *dev, int nr_states, pd->default_table = default_table; + runtime_table = kzalloc(sizeof(*runtime_table), GFP_KERNEL); + if (!runtime_table) { + kfree(default_table); + kfree(pd); + return -ENOMEM; + } + ret = em_create_perf_table(dev, pd, nr_states, cb, flags); if (ret) { kfree(default_table); + kfree(runtime_table); kfree(pd); return ret; } + /* Re-use temporarily (till 1st modification) the memory */ + runtime_table->state = default_table->state; + rcu_assign_pointer(pd->runtime_table, runtime_table); + if (_is_cpu_device(dev)) for_each_cpu(cpu, cpus) { cpu_dev = get_cpu_device(cpu); @@ -448,23 +461,36 @@ EXPORT_SYMBOL_GPL(em_dev_register_perf_domain); */ void em_dev_unregister_perf_domain(struct device *dev) { + struct em_perf_table __rcu *runtime_table; + struct em_perf_domain *pd; + if (IS_ERR_OR_NULL(dev) || !dev->em_pd) return; if (_is_cpu_device(dev)) return; + pd = dev->em_pd; /* * The mutex separates all register/unregister requests and protects * from potential clean-up/setup issues in the debugfs directories. * The debugfs directory name is the same as device's name.
*/ mutex_lock(&em_pd_mutex); + em_debug_remove_pd(dev); + runtime_table = pd->runtime_table; + + rcu_assign_pointer(pd->runtime_table, NULL); + synchronize_rcu(); + + kfree(runtime_table); + kfree(pd->default_table->state); kfree(pd->default_table); kfree(dev->em_pd); + dev->em_pd = NULL; mutex_unlock(&em_pd_mutex); } From patchwork Fri Jul 21 15:50:19 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lukasz Luba X-Patchwork-Id: 705994 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 95552EB64DD for ; Fri, 21 Jul 2023 15:51:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231199AbjGUPvD (ORCPT ); Fri, 21 Jul 2023 11:51:03 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50978 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231775AbjGUPut (ORCPT ); Fri, 21 Jul 2023 11:50:49 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 557683C35; Fri, 21 Jul 2023 08:50:28 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 2998014BF; Fri, 21 Jul 2023 08:51:11 -0700 (PDT) Received: from e129166.arm.com (unknown [10.57.0.79]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 011F43F738; Fri, 21 Jul 2023 08:50:24 -0700 (PDT) From: Lukasz Luba To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, rafael@kernel.org Cc: lukasz.luba@arm.com, dietmar.eggemann@arm.com, rui.zhang@intel.com, amit.kucheria@verdurent.com, amit.kachhap@gmail.com, daniel.lezcano@linaro.org, viresh.kumar@linaro.org, len.brown@intel.com, pavel@ucw.cz, Pierre.Gondois@arm.com, 
ionela.voinescu@arm.com, mhiramat@kernel.org Subject: [PATCH v3 09/12] PM: EM: Add RCU mechanism which safely cleans the old data Date: Fri, 21 Jul 2023 16:50:19 +0100 Message-Id: <20230721155022.2339982-10-lukasz.luba@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230721155022.2339982-1-lukasz.luba@arm.com> References: <20230721155022.2339982-1-lukasz.luba@arm.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org The EM is going to support runtime modifications of the power data. Introduce RCU safe mechanism to clean up the old allocated EM data. It also adds a mutex for the EM structure to serialize the modifiers. Signed-off-by: Lukasz Luba --- kernel/power/energy_model.c | 42 +++++++++++++++++++++++++++++++++++++ 1 file changed, 42 insertions(+) diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c index c2f8a0046f8a..4596bfe7398e 100644 --- a/kernel/power/energy_model.c +++ b/kernel/power/energy_model.c @@ -23,6 +23,9 @@ */ static DEFINE_MUTEX(em_pd_mutex); +static void em_cpufreq_update_efficiencies(struct device *dev, + struct em_perf_state *table); + static bool _is_cpu_device(struct device *dev) { return (dev->bus == &cpu_subsys); @@ -104,6 +107,45 @@ static void em_debug_create_pd(struct device *dev) {} static void em_debug_remove_pd(struct device *dev) {} #endif +static void em_destroy_rt_table_rcu(struct rcu_head *rp) +{ + struct em_perf_table *runtime_table; + + runtime_table = container_of(rp, struct em_perf_table, rcu); + kfree(runtime_table->state); + kfree(runtime_table); +} + +static void em_destroy_tmp_setup_rcu(struct rcu_head *rp) +{ + struct em_perf_table *runtime_table; + + runtime_table = container_of(rp, struct em_perf_table, rcu); + kfree(runtime_table); +} + +static void em_perf_runtime_table_set(struct device *dev, + struct em_perf_table *runtime_table) +{ + struct em_perf_domain *pd = dev->em_pd; + struct em_perf_table *tmp; + + tmp = pd->runtime_table; + + 
rcu_assign_pointer(pd->runtime_table, runtime_table); + + em_cpufreq_update_efficiencies(dev, runtime_table->state); + + /* + * Check if the 'state' array is not actually the one from setup. + * If it is then don't free it. + */ + if (tmp->state == pd->default_table->state) + call_rcu(&tmp->rcu, em_destroy_tmp_setup_rcu); + else + call_rcu(&tmp->rcu, em_destroy_rt_table_rcu); +} + static int em_compute_costs(struct device *dev, struct em_perf_state *table, struct em_data_callback *cb, int nr_states, unsigned long flags) From patchwork Fri Jul 21 15:50:20 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lukasz Luba X-Patchwork-Id: 705150 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0F1FDEB64DC for ; Fri, 21 Jul 2023 15:51:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232312AbjGUPvU (ORCPT ); Fri, 21 Jul 2023 11:51:20 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50160 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232386AbjGUPu7 (ORCPT ); Fri, 21 Jul 2023 11:50:59 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 819543A84; Fri, 21 Jul 2023 08:50:36 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 2F287150C; Fri, 21 Jul 2023 08:51:14 -0700 (PDT) Received: from e129166.arm.com (unknown [10.57.0.79]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 049663F738; Fri, 21 Jul 2023 08:50:27 -0700 (PDT) From: Lukasz Luba To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, rafael@kernel.org Cc: lukasz.luba@arm.com, dietmar.eggemann@arm.com, 
rui.zhang@intel.com, amit.kucheria@verdurent.com, amit.kachhap@gmail.com, daniel.lezcano@linaro.org, viresh.kumar@linaro.org, len.brown@intel.com, pavel@ucw.cz, Pierre.Gondois@arm.com, ionela.voinescu@arm.com, mhiramat@kernel.org Subject: [PATCH v3 10/12] PM: EM: Add runtime update interface to modify EM power Date: Fri, 21 Jul 2023 16:50:20 +0100 Message-Id: <20230721155022.2339982-11-lukasz.luba@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230721155022.2339982-1-lukasz.luba@arm.com> References: <20230721155022.2339982-1-lukasz.luba@arm.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org Add an interface which allows to modify EM power data at runtime. The new power information is populated by the provided callback, which is called for each performance state. The CPU frequencies' efficiency is re-calculated since that might be affected as well. The old EM memory is going to be freed later using RCU mechanism. Signed-off-by: Lukasz Luba --- include/linux/energy_model.h | 8 +++ kernel/power/energy_model.c | 109 +++++++++++++++++++++++++++++++++++ 2 files changed, 117 insertions(+) diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h index cfb1759ffd45..fd4110166e97 100644 --- a/include/linux/energy_model.h +++ b/include/linux/energy_model.h @@ -201,6 +201,8 @@ struct em_data_callback { struct em_perf_domain *em_cpu_get(int cpu); struct em_perf_domain *em_pd_get(struct device *dev); +int em_dev_update_perf_domain(struct device *dev, struct em_data_callback *cb, + void *priv); int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states, struct em_data_callback *cb, cpumask_t *span, bool microwatts); @@ -381,6 +383,12 @@ static inline int em_pd_nr_perf_states(struct em_perf_domain *pd) { return 0; } +static inline +int em_dev_update_perf_domain(struct device *dev, struct em_data_callback *cb, + void *priv) +{ + return -EINVAL; +} #endif #endif diff --git a/kernel/power/energy_model.c 
b/kernel/power/energy_model.c index 4596bfe7398e..10180c776c5b 100644 --- a/kernel/power/energy_model.c +++ b/kernel/power/energy_model.c @@ -185,6 +185,101 @@ static int em_compute_costs(struct device *dev, struct em_perf_state *table, return 0; } +/** + * em_dev_update_perf_domain() - Update run-time EM table for a device + * @dev : Device for which the EM is to be updated + * @cb : Callback function providing the power data for the EM + * @priv : Pointer to private data useful for passing context + * which might be required while calling @cb + * + * Update EM run-time modifiable table for a @dev using the callback + * defined in @cb. The EM new power values are then used for calculating + * the em_perf_state::cost for associated performance state. + * + * This function uses mutex to serialize writers, so it must not be called + * from non-sleeping context. + * + * Return 0 on success or a proper error in case of failure. + */ +int em_dev_update_perf_domain(struct device *dev, struct em_data_callback *cb, + void *priv) +{ + struct em_perf_table *runtime_table; + unsigned long power, freq; + struct em_perf_domain *pd; + int ret, i; + + if (!cb || !cb->update_power) + return -EINVAL; + + /* + * The lock serializes update and unregister code paths. When the + * EM has been unregistered in the meantime, we should capture that + * when entering this critical section. It also makes sure that + * two concurrent updates will be serialized. 
+ */ + mutex_lock(&em_pd_mutex); + + if (!dev || !dev->em_pd) { + ret = -EINVAL; + goto unlock_em; + } + + pd = dev->em_pd; + + runtime_table = kzalloc(sizeof(*runtime_table), GFP_KERNEL); + if (!runtime_table) { + ret = -ENOMEM; + goto unlock_em; + } + + runtime_table->state = kcalloc(pd->nr_perf_states, + sizeof(struct em_perf_state), + GFP_KERNEL); + if (!runtime_table->state) { + ret = -ENOMEM; + goto free_runtime_table; + } + + /* Populate runtime table with updated values using driver callback */ + for (i = 0; i < pd->nr_perf_states; i++) { + freq = pd->default_table->state[i].frequency; + runtime_table->state[i].frequency = freq; + + /* + * Call driver callback to get a new power value for + * a given frequency. + */ + ret = cb->update_power(dev, freq, &power, priv); + if (ret) { + dev_dbg(dev, "EM: run-time update error: %d\n", ret); + goto free_runtime_state_table; + } + + runtime_table->state[i].power = power; + } + + ret = em_compute_costs(dev, runtime_table->state, cb, + pd->nr_perf_states, pd->flags); + if (ret) + goto free_runtime_state_table; + + em_perf_runtime_table_set(dev, runtime_table); + + mutex_unlock(&em_pd_mutex); + return 0; + +free_runtime_state_table: + kfree(runtime_table->state); +free_runtime_table: + kfree(runtime_table); +unlock_em: + mutex_unlock(&em_pd_mutex); + + return ret; +} +EXPORT_SYMBOL_GPL(em_dev_update_perf_domain); + static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd, int nr_states, struct em_data_callback *cb, unsigned long flags) @@ -517,6 +612,8 @@ void em_dev_unregister_perf_domain(struct device *dev) * The mutex separates all register/unregister requests and protects * from potential clean-up/setup issues in the debugfs directories. * The debugfs directory name is the same as device's name. + * The lock also protects the updater of the runtime modifiable + * EM and this remover.
*/ mutex_lock(&em_pd_mutex); @@ -524,9 +621,21 @@ void em_dev_unregister_perf_domain(struct device *dev) runtime_table = pd->runtime_table; + /* + * Safely destroy runtime modifiable EM. By using the call + * synchronize_rcu() we make sure we don't progress till last user + * finished the RCU section and our update got applied. + */ rcu_assign_pointer(pd->runtime_table, NULL); synchronize_rcu(); + /* + * After the sync no updates will be in-flight, so free the old + * memory allocated for runtime EM. + */ + if (runtime_table->state != pd->default_table->state) + kfree(runtime_table->state); + kfree(runtime_table); kfree(pd->default_table->state); From patchwork Fri Jul 21 15:50:21 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lukasz Luba X-Patchwork-Id: 705149 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0EBADEB64DC for ; Fri, 21 Jul 2023 15:51:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231204AbjGUPv2 (ORCPT ); Fri, 21 Jul 2023 11:51:28 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50672 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231775AbjGUPvK (ORCPT ); Fri, 21 Jul 2023 11:51:10 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id E115C3AAF; Fri, 21 Jul 2023 08:50:44 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 35FB51515; Fri, 21 Jul 2023 08:51:17 -0700 (PDT) Received: from e129166.arm.com (unknown [10.57.0.79]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 0B7513F738; Fri, 21 Jul 2023 08:50:30 -0700 (PDT) From: Lukasz Luba 
To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, rafael@kernel.org Cc: lukasz.luba@arm.com, dietmar.eggemann@arm.com, rui.zhang@intel.com, amit.kucheria@verdurent.com, amit.kachhap@gmail.com, daniel.lezcano@linaro.org, viresh.kumar@linaro.org, len.brown@intel.com, pavel@ucw.cz, Pierre.Gondois@arm.com, ionela.voinescu@arm.com, mhiramat@kernel.org Subject: [PATCH v3 11/12] PM: EM: Use runtime modified EM for CPUs energy estimation in EAS Date: Fri, 21 Jul 2023 16:50:21 +0100 Message-Id: <20230721155022.2339982-12-lukasz.luba@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230721155022.2339982-1-lukasz.luba@arm.com> References: <20230721155022.2339982-1-lukasz.luba@arm.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org The new Energy Model (EM) supports runtime modification of the performance state table to better model the power used by the SoC. Use this new feature to improve energy estimation and therefore task placement in Energy Aware Scheduler (EAS). Signed-off-by: Lukasz Luba --- include/linux/energy_model.h | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h index fd4110166e97..62744547b8be 100644 --- a/include/linux/energy_model.h +++ b/include/linux/energy_model.h @@ -261,6 +261,7 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd, unsigned long max_util, unsigned long sum_util, unsigned long allowed_cpu_cap) { + struct em_perf_table *runtime_table; unsigned long freq, scale_cpu; struct em_perf_state *ps; int cpu, i; @@ -278,7 +279,14 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd, */ cpu = cpumask_first(to_cpumask(pd->cpus)); scale_cpu = arch_scale_cpu_capacity(cpu); - ps = &pd->table[pd->nr_perf_states - 1]; + + /* + * No rcu_read_lock() since it's already called by task scheduler. + * The runtime_table is always there for CPUs, so we don't check. 
+ */ + runtime_table = rcu_dereference(pd->runtime_table); + + ps = &runtime_table->state[pd->nr_perf_states - 1]; max_util = map_util_perf(max_util); max_util = min(max_util, allowed_cpu_cap); @@ -288,9 +296,9 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd, * Find the lowest performance state of the Energy Model above the * requested frequency. */ - i = em_pd_get_efficient_state(pd->table, pd->nr_perf_states, freq, - pd->flags); - ps = &pd->table[i]; + i = em_pd_get_efficient_state(runtime_table->state, pd->nr_perf_states, + freq, pd->flags); + ps = &runtime_table->state[i]; /* * The capacity of a CPU in the domain at the performance state (ps) From patchwork Fri Jul 21 15:50:22 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lukasz Luba X-Patchwork-Id: 705993 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 36961EB64DC for ; Fri, 21 Jul 2023 15:51:24 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232404AbjGUPvU (ORCPT ); Fri, 21 Jul 2023 11:51:20 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50550 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232387AbjGUPu7 (ORCPT ); Fri, 21 Jul 2023 11:50:59 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id E5CD24204; Fri, 21 Jul 2023 08:50:36 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 3A3F22F4; Fri, 21 Jul 2023 08:51:20 -0700 (PDT) Received: from e129166.arm.com (unknown [10.57.0.79]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 0FDF03F738; Fri, 21 Jul 2023 08:50:33 -0700 
(PDT) From: Lukasz Luba To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, rafael@kernel.org Cc: lukasz.luba@arm.com, dietmar.eggemann@arm.com, rui.zhang@intel.com, amit.kucheria@verdurent.com, amit.kachhap@gmail.com, daniel.lezcano@linaro.org, viresh.kumar@linaro.org, len.brown@intel.com, pavel@ucw.cz, Pierre.Gondois@arm.com, ionela.voinescu@arm.com, mhiramat@kernel.org Subject: [PATCH v3 12/12] Documentation: EM: Update with runtime modification design Date: Fri, 21 Jul 2023 16:50:22 +0100 Message-Id: <20230721155022.2339982-13-lukasz.luba@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230721155022.2339982-1-lukasz.luba@arm.com> References: <20230721155022.2339982-1-lukasz.luba@arm.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org Add a new section 'Design' which covers the information about Energy Model. It contains the design decisions, describes models and how they reflect the reality. Add description of the basic const. EM. Change the other section IDs. Add documentation bit for the new feature which allows modifying the EM at runtime. Signed-off-by: Lukasz Luba --- Documentation/power/energy-model.rst | 150 +++++++++++++++++++++++++-- 1 file changed, 140 insertions(+), 10 deletions(-) diff --git a/Documentation/power/energy-model.rst b/Documentation/power/energy-model.rst index ef341be2882b..01d4d806a123 100644 --- a/Documentation/power/energy-model.rst +++ b/Documentation/power/energy-model.rst @@ -72,16 +72,70 @@ required to have the same micro-architecture. CPUs in different performance domains can have different micro-architectures. -2. Core APIs +2. Design +----------------- + +2.1 Basic EM +^^^^^^^^^^^^ + +The basic EM is built around constant power information for each performance +state, which is accessible in: 'dev->em_pd->default_table->state'. This model +can be derived based on power measurements of the device e.g. CPU while +running some benchmark.
The benchmark might be integer heavy or floating point +computation with a data set fitting into the CPU cache or registers. Bear in +mind that this model might not cover all possible workloads running on CPUs. +Thus, please run a few different benchmarks and verify your power model +values with some real workloads. The power variation due to the workload +instruction mix and data set is not modeled. The static power, which can +change during runtime due to variation of SoC temperature, is not modeled in +this basic EM. + +2.2 Runtime modifiable EM +^^^^^^^^^^^^^^^^^^^^^^^^^ + +To better reflect power variation due to static power (leakage) the EM +supports runtime modifications of the power values. The mechanism relies on +RCU to free the modifiable EM perf_state table memory. Its user, the task +scheduler, also uses RCU to access this memory. The EM framework is +responsible for allocating the new memory for the modifiable EM perf_state +table. The old memory is freed automatically using the RCU callback mechanism. +This design was chosen because the task scheduler uses that data, and +to prevent misuse by kernel modules if they were responsible for the +memory management. + +There are two structures with the performance state tables in the EM: +a) dev->em_pd->default_table +b) dev->em_pd->runtime_table +They both point to the same memory location via: +'em_perf_table::state' pointer, until the first modification of the values. +This should save memory on platforms which would never modify the EM. When +the first modification is made the 'default_table' (a) contains the old +EM which was created during the setup. The modified EM is available in the +'runtime_table' (b). + +Only EAS uses the 'runtime_table' and benefits from the updates to the +EM values. Other sub-systems (thermal, powercap) use the 'default_table' (a) +since they don't need such optimization.
+ +The drivers which want to modify the EM values are protected from concurrent +access using a mutex. Therefore, the drivers must run in a context that can +sleep when they want to modify the EM. The runtime modifiable EM might also be used for +better reflecting real workload scenarios, e.g. when they pop up on the screen +and will run for a longer period, such as: games, video recording or playing, +video calls, etc. It is up to the platform engineers to experiment and choose +the right approach for their device. + + +3. Core APIs ------------ -2.1 Config options +3.1 Config options ^^^^^^^^^^^^^^^^^^ CONFIG_ENERGY_MODEL must be enabled to use the EM framework. -2.2 Registration of performance domains +3.2 Registration of performance domains ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Registration of 'advanced' EM @@ -110,8 +164,8 @@ The last argument 'microwatts' is important to set with correct value. Kernel subsystems which use EM might rely on this flag to check if all EM devices use the same scale. If there are different scales, these subsystems might decide to return warning/error, stop working or panic. -See Section 3. for an example of driver implementing this -callback, or Section 2.4 for further documentation on this API +See Section 4. for an example of driver implementing this +callback, or Section 3.5 for further documentation on this API Registration of EM using DT ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -156,7 +210,7 @@ The EM which is registered using this method might not reflect correctly the physics of a real device, e.g. when static power (leakage) is important. -2.3 Accessing performance domains +3.3 Accessing performance domains ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ There are two API functions which provide the access to the energy model: @@ -175,10 +229,37 @@ CPUfreq governor is in use in case of CPU device. Currently this calculation is not provided for other type of devices.
More details about the above APIs can be found in ```` -or in Section 2.4 +or in Section 3.5 + + +3.4 Runtime modifications +^^^^^^^^^^^^^^^^^^^^^^^^^ + +Drivers willing to modify the EM at runtime should use the following API:: + + int em_dev_update_perf_domain(struct device *dev, + struct em_data_callback *cb, void *priv); -2.4 Description details of this API +Drivers must provide a callback .update_power() returning the power value for each +performance state. The callback function provided by the driver is free +to fetch data from any relevant location (DT, firmware, ...) or sensor. +The .update_power() callback is called by the EM for each performance state to +provide a new power value. Section 4.2 contains an example driver +which shows a simple implementation of this mechanism. The callback can be +declared with the EM_UPDATE_CB() macro. The caller of that callback also passes +a private void pointer back to the driver which tries to update the EM. +This helps to maintain a consistent context for all performance +state calls for a given EM. +The artificial EM also supports runtime modifications. For this type of EM +there is a need to provide one more callback: .get_cost(). The .get_cost() +returns the cost value for each performance state, which better reflects the +efficiency of the CPUs which use artificial EM. Those two callbacks: +.update_power() and .get_cost() can be declared with one macro +EM_ADV_UPDATE_CB() and then passed to the em_dev_update_perf_domain(). + + +3.5 Description details of this API ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. kernel-doc:: include/linux/energy_model.h :internal: @@ -187,8 +268,11 @@ or in Section 2.4 :export: -3. Example driver ------------------ +4. Examples +----------- + +4.1 Example driver with EM registration +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The CPUFreq framework supports dedicated callback for registering the EM for a given CPU(s) 'policy' object: cpufreq_driver::register_em().
@@ -242,3 +326,49 @@ EM framework:: 39 static struct cpufreq_driver foo_cpufreq_driver = { 40 .register_em = foo_cpufreq_register_em, 41 }; + + +4.2 Example driver with EM modification +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +This section provides a simple example of a thermal driver modifying the EM. +The driver implements a foo_mod_power() function to be provided to the +EM framework. The driver is woken up periodically to check the temperature +and modify the EM data if needed:: + + -> drivers/thermal/foo_thermal.c + + 01 static int foo_mod_power(struct device *dev, unsigned long freq, + 02 unsigned long *power, void *priv) + 03 { + 04 struct foo_context *ctx = priv; + 05 + 06 /* Estimate power for the given frequency and temperature */ + 07 *power = foo_estimate_power(dev, freq, ctx->temperature); + 08 if (*power >= EM_MAX_POWER) + 09 return -EINVAL; + 10 + 11 return 0; + 12 } + 13 + 14 /* + 15 * Function called periodically to check the temperature and + 16 * update the EM if needed + 17 */ + 18 static void foo_thermal_em_update(struct foo_context *ctx) + 19 { + 20 struct em_data_callback em_cb = EM_UPDATE_CB(foo_mod_power); + 21 struct cpufreq_policy *policy = ctx->policy; + 22 struct device *cpu_dev; + 23 int ret; + 24 + 25 cpu_dev = get_cpu_device(cpumask_first(policy->cpus)); + 26 ctx->temperature = foo_get_temp(cpu_dev, ctx); + 27 if (ctx->temperature < FOO_EM_UPDATE_TEMP_THRESHOLD) + 28 return; + 29 + 30 /* Update EM for the CPUs' performance domain */ + 31 ret = em_dev_update_perf_domain(cpu_dev, &em_cb, ctx); + 32 if (ret) + 33 pr_warn("foo_thermal: EM update failed\n"); + 34 }
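The callback flow described in section 4.2 can also be tried outside the kernel. The following is only a user-space sketch, not kernel code: every mock_* name, MOCK_EM_MAX_POWER, MOCK_MAX_STATES and the power formula are hypothetical stand-ins invented for illustration. It assumes the framework side calls the driver callback once per performance state and commits the new values only if every call succeeds, which is one plausible reading of the design above rather than a statement of what em_dev_update_perf_domain() does internally.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel definitions; values are illustrative only. */
#define MOCK_EM_MAX_POWER	0xFFFFUL
#define MOCK_MAX_STATES		16

struct mock_perf_state {
	unsigned long frequency;	/* kHz */
	unsigned long power;		/* arbitrary power units */
};

struct mock_context {
	int temperature;		/* degrees Celsius */
};

typedef int (*mock_update_power_t)(unsigned long freq, unsigned long *power,
				   void *priv);

/* Driver side: invented model, dynamic part from frequency plus a leakage term. */
static int mock_mod_power(unsigned long freq, unsigned long *power, void *priv)
{
	struct mock_context *ctx = priv;
	unsigned long dynamic = freq / 1000;
	unsigned long leakage = 0;

	if (ctx->temperature > 0)
		leakage = (unsigned long)ctx->temperature * 10;

	*power = dynamic + leakage;
	if (*power >= MOCK_EM_MAX_POWER)
		return -EINVAL;

	return 0;
}

/*
 * Framework side (assumed behaviour): call the callback for each performance
 * state, collect results in a scratch array, and write them to the table only
 * when all calls succeeded, so a failed update leaves the old values intact.
 */
static int mock_update_table(struct mock_perf_state *table, size_t nr_states,
			     mock_update_power_t cb, void *priv)
{
	unsigned long new_power[MOCK_MAX_STATES];
	size_t i;
	int ret;

	if (!table || !cb || nr_states == 0 || nr_states > MOCK_MAX_STATES)
		return -EINVAL;

	for (i = 0; i < nr_states; i++) {
		ret = cb(table[i].frequency, &new_power[i], priv);
		if (ret)
			return ret;
	}

	for (i = 0; i < nr_states; i++)
		table[i].power = new_power[i];

	return 0;
}
```

The private pointer plays the same role as 'priv' in the documentation text: the same context (here a temperature reading) is seen by every per-state call of one update, so all new power values are computed from one consistent snapshot.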