From patchwork Mon Sep 25 08:11:23 2023
From: Lukasz Luba
To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, rafael@kernel.org
Cc: lukasz.luba@arm.com, dietmar.eggemann@arm.com, rui.zhang@intel.com, amit.kucheria@verdurent.com, amit.kachhap@gmail.com, daniel.lezcano@linaro.org, viresh.kumar@linaro.org, len.brown@intel.com, pavel@ucw.cz, mhiramat@kernel.org, qyousef@layalina.io, wvw@google.com
Subject: [PATCH v4 02/18] PM: EM: Refactor em_cpufreq_update_efficiencies() arguments
Date: Mon, 25 Sep 2023 09:11:23 +0100
Message-Id: <20230925081139.1305766-3-lukasz.luba@arm.com>
In-Reply-To: <20230925081139.1305766-1-lukasz.luba@arm.com>

In order to prepare the code for the modifiable EM perf_state table,
refactor the existing function em_cpufreq_update_efficiencies() so that
the table it operates on is passed in as an argument.

Signed-off-by: Lukasz Luba
---
 kernel/power/energy_model.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index 8b9dd4a39f63..42486674b834 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -237,10 +237,10 @@ static int em_create_pd(struct device *dev, int nr_states,
 	return 0;
 }
 
-static void em_cpufreq_update_efficiencies(struct device *dev)
+static void
+em_cpufreq_update_efficiencies(struct device *dev, struct em_perf_state *table)
 {
 	struct em_perf_domain *pd = dev->em_pd;
-	struct em_perf_state *table;
 	struct cpufreq_policy *policy;
 	int found = 0;
 	int i;
@@ -254,8 +254,6 @@ static void em_cpufreq_update_efficiencies(struct device *dev)
 		return;
 	}
 
-	table = pd->table;
-
 	for (i = 0; i < pd->nr_perf_states; i++) {
 		if (!(table[i].flags & EM_PERF_STATE_INEFFICIENT))
 			continue;
@@ -397,7 +395,7 @@ int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
 
 	dev->em_pd->flags |= flags;
 
-	em_cpufreq_update_efficiencies(dev);
+	em_cpufreq_update_efficiencies(dev, dev->em_pd->table);
 
 	em_debug_create_pd(dev);
 	dev_info(dev, "EM: created perf domain\n");

From patchwork Mon Sep 25 08:11:24 2023
From: Lukasz Luba
Subject: [PATCH v4 03/18] PM: EM: Find first CPU online while updating OPP efficiency
Date: Mon, 25 Sep 2023 09:11:24 +0100
Message-Id: <20230925081139.1305766-4-lukasz.luba@arm.com>
In-Reply-To: <20230925081139.1305766-1-lukasz.luba@arm.com>

The Energy Model might be updated at runtime, and the energy efficiency
of each OPP may change. Thus, the cpufreq framework also needs to be
updated so that it stays aligned with the new values. To do that, use
the first online CPU from the Performance Domain.
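To illustrate the CPU selection, here is a small userspace model of the cpumask_first_and() lookup introduced by this patch. It is a sketch under stated assumptions: masks are modelled as a single 64-bit word (so at most 64 CPUs), and model_first_and(), model_pick_policy_cpu() and MODEL_NR_CPU_IDS are hypothetical names, not kernel APIs. The patch itself intersects the domain span with cpu_active_mask; the model simply takes that second mask as a parameter.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical limit for this model: one 64-bit word holds all CPU bits. */
#define MODEL_NR_CPU_IDS 64u

/*
 * Model of cpumask_first_and(): lowest CPU set in both masks, or
 * MODEL_NR_CPU_IDS when the intersection is empty (the kernel helper
 * likewise returns a value >= nr_cpu_ids in that case).
 */
static unsigned int model_first_and(uint64_t span, uint64_t active)
{
	uint64_t both = span & active;
	unsigned int cpu;

	for (cpu = 0; cpu < MODEL_NR_CPU_IDS; cpu++) {
		if (both & (UINT64_C(1) << cpu))
			return cpu;
	}
	return MODEL_NR_CPU_IDS;
}

/*
 * Model of the selection in em_cpufreq_update_efficiencies(): return the
 * CPU whose cpufreq policy would be looked up, or -1 where the patch
 * warns "No online CPU for CPUFreq policy" and returns early.
 */
static int model_pick_policy_cpu(uint64_t pd_span, uint64_t active_mask)
{
	unsigned int cpu = model_first_and(pd_span, active_mask);

	if (cpu >= MODEL_NR_CPU_IDS)
		return -1;
	return (int)cpu;
}
```

For example, with a domain spanning CPUs 4-7 and only CPUs 0-3, 6 and 7 active, the model picks CPU 6; if none of the domain's CPUs is active it returns -1, which corresponds to the dev_warn() and early return in the patch.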
Signed-off-by: Lukasz Luba
---
 kernel/power/energy_model.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index 42486674b834..3dafdd7731c4 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -243,12 +243,19 @@ em_cpufreq_update_efficiencies(struct device *dev, struct em_perf_state *table)
 	struct em_perf_domain *pd = dev->em_pd;
 	struct cpufreq_policy *policy;
 	int found = 0;
-	int i;
+	int i, cpu;
 
 	if (!_is_cpu_device(dev) || !pd)
 		return;
 
-	policy = cpufreq_cpu_get(cpumask_first(em_span_cpus(pd)));
+	/* Try to get a CPU which is online and in this PD */
+	cpu = cpumask_first_and(em_span_cpus(pd), cpu_active_mask);
+	if (cpu >= nr_cpu_ids) {
+		dev_warn(dev, "EM: No online CPU for CPUFreq policy\n");
+		return;
+	}
+
+	policy = cpufreq_cpu_get(cpu);
 	if (!policy) {
 		dev_warn(dev, "EM: Access to CPUFreq policy failed\n");
 		return;

From patchwork Mon Sep 25 08:11:27 2023
From: Lukasz Luba
Subject: [PATCH v4 06/18] PM: EM: Check if the get_cost() callback is present in em_compute_costs()
Date: Mon, 25 Sep 2023 09:11:27 +0100
Message-Id: <20230925081139.1305766-7-lukasz.luba@arm.com>
In-Reply-To: <20230925081139.1305766-1-lukasz.luba@arm.com>

em_compute_costs() is going to be reused in the runtime-modified EM code
path. Thus, make sure that this common code is safe and won't try to
call through a NULL callback pointer. Previously, em_compute_costs() did
not have to care about the runtime modification path. The upcoming
changes introduce such an option, but with a different callback. Both
paths, the one using get_cost() (during the first EM registration) and
the one using update_power() (during runtime modification), need to be
handled safely in em_compute_costs().
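A minimal userspace model of the guarded branch in em_compute_costs() after this change. Assumptions: the struct layouts, MODEL_PD_ARTIFICIAL and the model_* helpers are hypothetical stand-ins; the dev argument, the EM_MAX_POWER check and the inefficiency flagging are omitted; and the fallback formula cost = fmax * power / freq is assumed to match how costs were computed for non-artificial domains at the time. What it shows is that an artificial domain without a get_cost() callback now takes the power-based branch instead of calling through a NULL pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for EM_PERF_DOMAIN_ARTIFICIAL. */
#define MODEL_PD_ARTIFICIAL (1UL << 0)

struct model_perf_state {
	unsigned long frequency;	/* kHz */
	unsigned long power;
	unsigned long cost;
};

struct model_em_callbacks {
	/* May be NULL, e.g. on a path that only supplies updated power. */
	int (*get_cost)(unsigned long freq, unsigned long *cost);
};

/* Example callback for an artificial EM: every state reports cost 42. */
static int model_fixed_cost(unsigned long freq, unsigned long *cost)
{
	(void)freq;
	*cost = 42;
	return 0;
}

/*
 * Walk the states from the highest frequency down, as em_compute_costs()
 * does. The callback is used only if the domain is artificial AND the
 * callback exists; otherwise fall back to the power-based formula
 * (assumed here to be cost = fmax * power / freq).
 */
static int model_compute_costs(struct model_perf_state *table, int nr_states,
			       const struct model_em_callbacks *cb,
			       unsigned long flags)
{
	unsigned long long fmax = table[nr_states - 1].frequency;
	int i;

	for (i = nr_states - 1; i >= 0; i--) {
		unsigned long cost;

		if ((flags & MODEL_PD_ARTIFICIAL) && cb->get_cost) {
			if (cb->get_cost(table[i].frequency, &cost) || !cost)
				return -1;	/* the kernel returns -EINVAL */
		} else {
			cost = (unsigned long)(fmax * table[i].power /
					       table[i].frequency);
		}
		table[i].cost = cost;
	}
	return 0;
}
```

Note that in the patch itself `flags & EM_PERF_DOMAIN_ARTIFICIAL && cb->get_cost` is already correct without extra parentheses, because `&` binds more tightly than `&&`; the model adds them only for readability.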
Signed-off-by: Lukasz Luba
---
 kernel/power/energy_model.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index 7ea882401833..35e07933b34a 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -116,7 +116,7 @@ static int em_compute_costs(struct device *dev, struct em_perf_state *table,
 	for (i = nr_states - 1; i >= 0; i--) {
 		unsigned long power_res, cost;
 
-		if (flags & EM_PERF_DOMAIN_ARTIFICIAL) {
+		if (flags & EM_PERF_DOMAIN_ARTIFICIAL && cb->get_cost) {
 			ret = cb->get_cost(dev, table[i].frequency, &cost);
 			if (ret || !cost || cost > EM_MAX_POWER) {
 				dev_err(dev, "EM: invalid cost %lu %d\n",

From patchwork Mon Sep 25 08:11:28 2023
From: Lukasz Luba
Subject: [PATCH v4 07/18] PM: EM: Refactor struct em_perf_domain and add default_table
Date: Mon, 25 Sep 2023 09:11:28 +0100
Message-Id: <20230925081139.1305766-8-lukasz.luba@arm.com>
In-Reply-To: <20230925081139.1305766-1-lukasz.luba@arm.com>

The Energy Model is going to support runtime modifications. Refactor the
old implementation, which accessed struct em_perf_state directly, and
introduce em_perf_domain::default_table to clean up the design. This new
field will help to distinguish the two performance state tables.

Update all drivers and frameworks which used the old field
em_perf_domain::table; they now use em_perf_domain::default_table.
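The new indirection can be pictured with a simplified userspace model: the domain points to a table wrapper, which in turn points to the state array, and every lookup goes through pd->default_table->state. The model_* names are hypothetical, the rcu_head member is left out, and the allocation and teardown order only mirrors what em_create_pd() and em_dev_unregister_perf_domain() do in the diff. The lookup helper follows the pattern of set_pd_power_limit() in dtpm_cpu.c with the nr_cpus scaling omitted; unlike the patch, it returns 0 when even the lowest state exceeds the limit instead of indexing table[i - 1] with i == 0.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified, hypothetical stand-ins; the rcu_head member is omitted. */
struct model_perf_state {
	unsigned long frequency;
	unsigned long power;
};

struct model_perf_table {
	struct model_perf_state *state;
};

struct model_perf_domain {
	struct model_perf_table *default_table;
	int nr_perf_states;
};

/* Allocate domain, then table wrapper, then state array, unwinding on error. */
static struct model_perf_domain *
model_pd_create(const struct model_perf_state *src, int nr_states)
{
	struct model_perf_domain *pd;
	int i;

	pd = calloc(1, sizeof(*pd));
	if (!pd)
		return NULL;

	pd->default_table = calloc(1, sizeof(*pd->default_table));
	if (!pd->default_table)
		goto free_pd;

	pd->default_table->state = calloc(nr_states, sizeof(*src));
	if (!pd->default_table->state)
		goto free_table;

	for (i = 0; i < nr_states; i++)
		pd->default_table->state[i] = src[i];
	pd->nr_perf_states = nr_states;
	return pd;

free_table:
	free(pd->default_table);
free_pd:
	free(pd);
	return NULL;
}

/* Free innermost first: state array, table wrapper, domain. */
static void model_pd_destroy(struct model_perf_domain *pd)
{
	free(pd->default_table->state);
	free(pd->default_table);
	free(pd);
}

/*
 * Frequency of the highest state whose power fits in the limit, reading
 * through pd->default_table->state. Returns 0 if no state fits.
 */
static unsigned long model_freq_for_limit(const struct model_perf_domain *pd,
					  unsigned long power_limit)
{
	const struct model_perf_state *table = pd->default_table->state;
	int i;

	for (i = 0; i < pd->nr_perf_states; i++) {
		if (table[i].power > power_limit)
			break;
	}
	return i ? table[i - 1].frequency : 0;
}
```

With states of 100, 250 and 600 power units, a limit of 300 selects the middle state's frequency, while a limit of 50 fits no state and yields 0.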
Signed-off-by: Lukasz Luba
---
 drivers/powercap/dtpm_cpu.c       | 27 +++++++++++++++++++--------
 drivers/powercap/dtpm_devfreq.c   | 23 ++++++++++++++++-------
 drivers/thermal/cpufreq_cooling.c | 24 ++++++++++++++++--------
 drivers/thermal/devfreq_cooling.c | 23 +++++++++++++++++------
 include/linux/energy_model.h      | 24 ++++++++++++++++++------
 kernel/power/energy_model.c       | 26 ++++++++++++++++++++++----
 6 files changed, 108 insertions(+), 39 deletions(-)

diff --git a/drivers/powercap/dtpm_cpu.c b/drivers/powercap/dtpm_cpu.c
index 2ff7717530bf..743a0ac8ecdf 100644
--- a/drivers/powercap/dtpm_cpu.c
+++ b/drivers/powercap/dtpm_cpu.c
@@ -43,6 +43,7 @@ static u64 set_pd_power_limit(struct dtpm *dtpm, u64 power_limit)
 {
 	struct dtpm_cpu *dtpm_cpu = to_dtpm_cpu(dtpm);
 	struct em_perf_domain *pd = em_cpu_get(dtpm_cpu->cpu);
+	struct em_perf_state *table;
 	struct cpumask cpus;
 	unsigned long freq;
 	u64 power;
@@ -51,19 +52,21 @@ static u64 set_pd_power_limit(struct dtpm *dtpm, u64 power_limit)
 	cpumask_and(&cpus, cpu_online_mask, to_cpumask(pd->cpus));
 	nr_cpus = cpumask_weight(&cpus);
 
+	table = pd->default_table->state;
+
 	for (i = 0; i < pd->nr_perf_states; i++) {
 
-		power = pd->table[i].power * nr_cpus;
+		power = table[i].power * nr_cpus;
 
 		if (power > power_limit)
 			break;
 	}
 
-	freq = pd->table[i - 1].frequency;
+	freq = table[i - 1].frequency;
 
 	freq_qos_update_request(&dtpm_cpu->qos_req, freq);
 
-	power_limit = pd->table[i - 1].power * nr_cpus;
+	power_limit = table[i - 1].power * nr_cpus;
 
 	return power_limit;
 }
@@ -88,12 +91,14 @@ static u64 scale_pd_power_uw(struct cpumask *pd_mask, u64 power)
 static u64 get_pd_power_uw(struct dtpm *dtpm)
 {
 	struct dtpm_cpu *dtpm_cpu = to_dtpm_cpu(dtpm);
+	struct em_perf_state *table;
 	struct em_perf_domain *pd;
 	struct cpumask *pd_mask;
 	unsigned long freq;
 	int i;
 
 	pd = em_cpu_get(dtpm_cpu->cpu);
+	table = pd->default_table->state;
 
 	pd_mask = em_span_cpus(pd);
 
@@ -101,10 +106,10 @@ static u64 get_pd_power_uw(struct dtpm *dtpm)
 
 	for (i = 0; i < pd->nr_perf_states; i++) {
 
-		if (pd->table[i].frequency < freq)
+		if (table[i].frequency < freq)
 			continue;
 
-		return scale_pd_power_uw(pd_mask, pd->table[i].power *
+		return scale_pd_power_uw(pd_mask, table[i].power *
 					 MICROWATT_PER_MILLIWATT);
 	}
 
@@ -115,17 +120,20 @@ static int update_pd_power_uw(struct dtpm *dtpm)
 {
 	struct dtpm_cpu *dtpm_cpu = to_dtpm_cpu(dtpm);
 	struct em_perf_domain *em = em_cpu_get(dtpm_cpu->cpu);
+	struct em_perf_state *table;
 	struct cpumask cpus;
 	int nr_cpus;
 
 	cpumask_and(&cpus, cpu_online_mask, to_cpumask(em->cpus));
 	nr_cpus = cpumask_weight(&cpus);
 
-	dtpm->power_min = em->table[0].power;
+	table = em->default_table->state;
+
+	dtpm->power_min = table[0].power;
 	dtpm->power_min *= MICROWATT_PER_MILLIWATT;
 	dtpm->power_min *= nr_cpus;
 
-	dtpm->power_max = em->table[em->nr_perf_states - 1].power;
+	dtpm->power_max = table[em->nr_perf_states - 1].power;
 	dtpm->power_max *= MICROWATT_PER_MILLIWATT;
 	dtpm->power_max *= nr_cpus;
 
@@ -182,6 +190,7 @@ static int __dtpm_cpu_setup(int cpu, struct dtpm *parent)
 {
 	struct dtpm_cpu *dtpm_cpu;
 	struct cpufreq_policy *policy;
+	struct em_perf_state *table;
 	struct em_perf_domain *pd;
 	char name[CPUFREQ_NAME_LEN];
 	int ret = -ENOMEM;
@@ -198,6 +207,8 @@ static int __dtpm_cpu_setup(int cpu, struct dtpm *parent)
 	if (!pd || em_is_artificial(pd))
 		return -EINVAL;
 
+	table = pd->default_table->state;
+
 	dtpm_cpu = kzalloc(sizeof(*dtpm_cpu), GFP_KERNEL);
 	if (!dtpm_cpu)
 		return -ENOMEM;
@@ -216,7 +227,7 @@ static int __dtpm_cpu_setup(int cpu, struct dtpm *parent)
 
 	ret = freq_qos_add_request(&policy->constraints,
 				   &dtpm_cpu->qos_req, FREQ_QOS_MAX,
-				   pd->table[pd->nr_perf_states - 1].frequency);
+				   table[pd->nr_perf_states - 1].frequency);
 	if (ret)
 		goto out_dtpm_unregister;
 
diff --git a/drivers/powercap/dtpm_devfreq.c b/drivers/powercap/dtpm_devfreq.c
index 91276761a31d..6ef0f2b4a683 100644
--- a/drivers/powercap/dtpm_devfreq.c
+++ b/drivers/powercap/dtpm_devfreq.c
@@ -37,11 +37,14 @@ static int update_pd_power_uw(struct dtpm *dtpm)
 	struct devfreq *devfreq = dtpm_devfreq->devfreq;
 	struct device *dev = devfreq->dev.parent;
 	struct em_perf_domain *pd = em_pd_get(dev);
+	struct em_perf_state *table;
 
-	dtpm->power_min = pd->table[0].power;
+	table = pd->default_table->state;
+
+	dtpm->power_min = table[0].power;
 	dtpm->power_min *= MICROWATT_PER_MILLIWATT;
 
-	dtpm->power_max = pd->table[pd->nr_perf_states - 1].power;
+	dtpm->power_max = table[pd->nr_perf_states - 1].power;
 	dtpm->power_max *= MICROWATT_PER_MILLIWATT;
 
 	return 0;
@@ -53,22 +56,25 @@ static u64 set_pd_power_limit(struct dtpm *dtpm, u64 power_limit)
 	struct devfreq *devfreq = dtpm_devfreq->devfreq;
 	struct device *dev = devfreq->dev.parent;
 	struct em_perf_domain *pd = em_pd_get(dev);
+	struct em_perf_state *table;
 	unsigned long freq;
 	u64 power;
 	int i;
 
+	table = pd->default_table->state;
+
 	for (i = 0; i < pd->nr_perf_states; i++) {
 
-		power = pd->table[i].power * MICROWATT_PER_MILLIWATT;
+		power = table[i].power * MICROWATT_PER_MILLIWATT;
 		if (power > power_limit)
 			break;
 	}
 
-	freq = pd->table[i - 1].frequency;
+	freq = table[i - 1].frequency;
 
 	dev_pm_qos_update_request(&dtpm_devfreq->qos_req, freq);
 
-	power_limit = pd->table[i - 1].power * MICROWATT_PER_MILLIWATT;
+	power_limit = table[i - 1].power * MICROWATT_PER_MILLIWATT;
 
 	return power_limit;
 }
@@ -94,6 +100,7 @@ static u64 get_pd_power_uw(struct dtpm *dtpm)
 	struct device *dev = devfreq->dev.parent;
 	struct em_perf_domain *pd = em_pd_get(dev);
 	struct devfreq_dev_status status;
+	struct em_perf_state *table;
 	unsigned long freq;
 	u64 power;
 	int i;
@@ -102,15 +109,17 @@ static u64 get_pd_power_uw(struct dtpm *dtpm)
 	status = devfreq->last_status;
 	mutex_unlock(&devfreq->lock);
 
+	table = pd->default_table->state;
+
 	freq = DIV_ROUND_UP(status.current_frequency, HZ_PER_KHZ);
 	_normalize_load(&status);
 
 	for (i = 0; i < pd->nr_perf_states; i++) {
 
-		if (pd->table[i].frequency < freq)
+		if (table[i].frequency < freq)
 			continue;
 
-		power = pd->table[i].power * MICROWATT_PER_MILLIWATT;
+		power = table[i].power * MICROWATT_PER_MILLIWATT;
 		power *= status.busy_time;
 		power >>= 10;
 
diff --git a/drivers/thermal/cpufreq_cooling.c b/drivers/thermal/cpufreq_cooling.c
index e2cc7bd30862..d468aca241e2 100644
--- a/drivers/thermal/cpufreq_cooling.c
+++ b/drivers/thermal/cpufreq_cooling.c
@@ -91,10 +91,11 @@ struct cpufreq_cooling_device {
 static unsigned long get_level(struct cpufreq_cooling_device *cpufreq_cdev,
 			       unsigned int freq)
 {
+	struct em_perf_state *table = cpufreq_cdev->em->default_table->state;
 	int i;
 
 	for (i = cpufreq_cdev->max_level - 1; i >= 0; i--) {
-		if (freq > cpufreq_cdev->em->table[i].frequency)
+		if (freq > table[i].frequency)
 			break;
 	}
 
@@ -104,15 +105,16 @@ static unsigned long get_level(struct cpufreq_cooling_device *cpufreq_cdev,
 static u32 cpu_freq_to_power(struct cpufreq_cooling_device *cpufreq_cdev,
 			     u32 freq)
 {
+	struct em_perf_state *table = cpufreq_cdev->em->default_table->state;
 	unsigned long power_mw;
 	int i;
 
 	for (i = cpufreq_cdev->max_level - 1; i >= 0; i--) {
-		if (freq > cpufreq_cdev->em->table[i].frequency)
+		if (freq > table[i].frequency)
 			break;
 	}
 
-	power_mw = cpufreq_cdev->em->table[i + 1].power;
+	power_mw = table[i + 1].power;
 	power_mw /= MICROWATT_PER_MILLIWATT;
 
 	return power_mw;
@@ -121,18 +123,19 @@ static u32 cpu_freq_to_power(struct cpufreq_cooling_device *cpufreq_cdev,
 static u32 cpu_power_to_freq(struct cpufreq_cooling_device *cpufreq_cdev,
 			     u32 power)
 {
+	struct em_perf_state *table = cpufreq_cdev->em->default_table->state;
 	unsigned long em_power_mw;
 	int i;
 
 	for (i = cpufreq_cdev->max_level; i > 0; i--) {
 		/* Convert EM power to milli-Watts to make safe comparison */
-		em_power_mw = cpufreq_cdev->em->table[i].power;
+		em_power_mw = table[i].power;
 		em_power_mw /= MICROWATT_PER_MILLIWATT;
 		if (power >= em_power_mw)
 			break;
 	}
 
-	return cpufreq_cdev->em->table[i].frequency;
+	return table[i].frequency;
 }
 
 /**
@@ -262,8 +265,9 @@ static int cpufreq_get_requested_power(struct thermal_cooling_device *cdev,
 static int cpufreq_state2power(struct thermal_cooling_device *cdev,
 			       unsigned long state, u32 *power)
 {
-	unsigned int freq, num_cpus, idx;
 	struct cpufreq_cooling_device *cpufreq_cdev = cdev->devdata;
+	unsigned int freq, num_cpus, idx;
+	struct em_perf_state *table;
 
 	/* Request state should be less than max_level */
 	if (state > cpufreq_cdev->max_level)
@@ -271,8 +275,9 @@ static int cpufreq_state2power(struct thermal_cooling_device *cdev,
 
 	num_cpus = cpumask_weight(cpufreq_cdev->policy->cpus);
 
+	table = cpufreq_cdev->em->default_table->state;
 	idx = cpufreq_cdev->max_level - state;
-	freq = cpufreq_cdev->em->table[idx].frequency;
+	freq = table[idx].frequency;
 	*power = cpu_freq_to_power(cpufreq_cdev, freq) * num_cpus;
 
 	return 0;
@@ -378,8 +383,11 @@ static unsigned int get_state_freq(struct cpufreq_cooling_device *cpufreq_cdev,
 #ifdef CONFIG_THERMAL_GOV_POWER_ALLOCATOR
 	/* Use the Energy Model table if available */
 	if (cpufreq_cdev->em) {
+		struct em_perf_state *table;
+
+		table = cpufreq_cdev->em->default_table->state;
 		idx = cpufreq_cdev->max_level - state;
-		return cpufreq_cdev->em->table[idx].frequency;
+		return table[idx].frequency;
 	}
 #endif
 
diff --git a/drivers/thermal/devfreq_cooling.c b/drivers/thermal/devfreq_cooling.c
index 262e62ab6cf2..4207ef850582 100644
--- a/drivers/thermal/devfreq_cooling.c
+++ b/drivers/thermal/devfreq_cooling.c
@@ -87,6 +87,7 @@ static int devfreq_cooling_set_cur_state(struct thermal_cooling_device *cdev,
 	struct devfreq_cooling_device *dfc = cdev->devdata;
 	struct devfreq *df = dfc->devfreq;
 	struct device *dev = df->dev.parent;
+	struct em_perf_state *table;
 	unsigned long freq;
 	int perf_idx;
 
@@ -99,8 +100,9 @@ static int devfreq_cooling_set_cur_state(struct thermal_cooling_device *cdev,
 		return -EINVAL;
 
 	if (dfc->em_pd) {
+		table = dfc->em_pd->default_table->state;
 		perf_idx = dfc->max_state - state;
-		freq = dfc->em_pd->table[perf_idx].frequency * 1000;
+		freq = table[perf_idx].frequency * 1000;
 	} else {
 		freq = dfc->freq_table[state];
 	}
@@ -123,10 +125,11 @@ static int devfreq_cooling_set_cur_state(struct thermal_cooling_device *cdev,
  */
 static int get_perf_idx(struct em_perf_domain *em_pd, unsigned long freq)
 {
+	struct em_perf_state *table = em_pd->default_table->state;
 	int i;
 
 	for (i = 0; i < em_pd->nr_perf_states; i++) {
-		if (em_pd->table[i].frequency == freq)
+		if (table[i].frequency == freq)
 			return i;
 	}
 
@@ -181,6 +184,7 @@ static int devfreq_cooling_get_requested_power(struct thermal_cooling_device *cd
 	struct devfreq_cooling_device *dfc = cdev->devdata;
 	struct devfreq *df = dfc->devfreq;
 	struct devfreq_dev_status status;
+	struct em_perf_state *table;
 	unsigned long state;
 	unsigned long freq;
 	unsigned long voltage;
@@ -192,6 +196,8 @@ static int devfreq_cooling_get_requested_power(struct thermal_cooling_device *cd
 
 	freq = status.current_frequency;
 
+	table = dfc->em_pd->default_table->state;
+
 	if (dfc->power_ops && dfc->power_ops->get_real_power) {
 		voltage = get_voltage(df, freq);
 		if (voltage == 0) {
@@ -204,7 +210,7 @@ static int devfreq_cooling_get_requested_power(struct thermal_cooling_device *cd
 		state = dfc->capped_state;
 
 		/* Convert EM power into milli-Watts first */
-		dfc->res_util = dfc->em_pd->table[state].power;
+		dfc->res_util = table[state].power;
 		dfc->res_util /= MICROWATT_PER_MILLIWATT;
 
 		dfc->res_util *= SCALE_ERROR_MITIGATION;
@@ -225,7 +231,7 @@ static int devfreq_cooling_get_requested_power(struct thermal_cooling_device *cd
 		_normalize_load(&status);
 
 		/* Convert EM power into milli-Watts first */
-		*power = dfc->em_pd->table[perf_idx].power;
+		*power = table[perf_idx].power;
 		*power /= MICROWATT_PER_MILLIWATT;
 		/* Scale power for utilization */
 		*power *= status.busy_time;
@@ -245,13 +251,15 @@ static int devfreq_cooling_state2power(struct thermal_cooling_device *cdev,
 				       unsigned long state, u32 *power)
 {
 	struct devfreq_cooling_device *dfc = cdev->devdata;
+	struct em_perf_state *table;
 	int perf_idx;
 
 	if (state > dfc->max_state)
 		return -EINVAL;
 
+	table = dfc->em_pd->default_table->state;
 	perf_idx = dfc->max_state - state;
-	*power = dfc->em_pd->table[perf_idx].power;
+	*power = table[perf_idx].power;
 	*power /= MICROWATT_PER_MILLIWATT;
 
 	return 0;
@@ -264,6 +272,7 @@ static int devfreq_cooling_power2state(struct thermal_cooling_device *cdev,
 	struct devfreq *df = dfc->devfreq;
 	struct devfreq_dev_status status;
 	unsigned long freq, em_power_mw;
+	struct em_perf_state *table;
 	s32 est_power;
 	int i;
 
@@ -273,6 +282,8 @@ static int devfreq_cooling_power2state(struct thermal_cooling_device *cdev,
 
 	freq = status.current_frequency;
 
+	table = dfc->em_pd->default_table->state;
+
 	if (dfc->power_ops && dfc->power_ops->get_real_power) {
 		/* Scale for resource utilization */
 		est_power = power * dfc->res_util;
@@ -290,7 +301,7 @@ static int devfreq_cooling_power2state(struct thermal_cooling_device *cdev,
 	 */
 	for (i = dfc->max_state; i > 0; i--) {
 		/* Convert EM power to milli-Watts to make safe comparison */
-		em_power_mw = dfc->em_pd->table[i].power;
+		em_power_mw = table[i].power;
 		em_power_mw /= MICROWATT_PER_MILLIWATT;
 		if (est_power >= em_power_mw)
 			break;
diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
index 8069f526c9d8..d236e08e80dc 100644
--- a/include/linux/energy_model.h
+++ b/include/linux/energy_model.h
@@ -36,9 +36,19 @@ struct em_perf_state {
  */
 #define EM_PERF_STATE_INEFFICIENT BIT(0)
 
+/**
+ * struct em_perf_table - Performance states table
+ * @state:	List of performance states, in ascending order
+ * @rcu:	RCU used for safe access and destruction
+ */
+struct em_perf_table {
+	struct em_perf_state *state;
+	struct rcu_head rcu;
+};
+
 /**
  * struct em_perf_domain - Performance domain
- * @table:		List of performance states, in ascending order
+ * @default_table:	Pointer to the default em_perf_table
  * @nr_perf_states:	Number of performance states
  * @flags:		See "em_perf_domain flags"
  * @cpus:		Cpumask covering the CPUs of the domain. It's here
@@ -53,7 +63,7 @@ struct em_perf_state {
  *		field is unused.
  */
 struct em_perf_domain {
-	struct em_perf_state *table;
+	struct em_perf_table *default_table;
 	int nr_perf_states;
 	unsigned long flags;
 	unsigned long cpus[];
@@ -227,12 +237,14 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
 				unsigned long allowed_cpu_cap)
 {
 	unsigned long freq, scale_cpu;
-	struct em_perf_state *ps;
+	struct em_perf_state *table, *ps;
 	int cpu, i;
 
 	if (!sum_util)
 		return 0;
 
+	table = pd->default_table->state;
+
 	/*
 	 * In order to predict the performance state, map the utilization of
 	 * the most utilized CPU of the performance domain to a requested
@@ -243,7 +255,7 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
 	 */
 	cpu = cpumask_first(to_cpumask(pd->cpus));
 	scale_cpu = arch_scale_cpu_capacity(cpu);
-	ps = &pd->table[pd->nr_perf_states - 1];
+	ps = &table[pd->nr_perf_states - 1];
 
 	max_util = map_util_perf(max_util);
 	max_util = min(max_util, allowed_cpu_cap);
@@ -253,9 +265,9 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
 	 * Find the lowest performance state of the Energy Model above the
 	 * requested frequency.
 	 */
-	i = em_pd_get_efficient_state(pd->table, pd->nr_perf_states, freq,
+	i = em_pd_get_efficient_state(table, pd->nr_perf_states, freq,
 				      pd->flags);
-	ps = &pd->table[i];
+	ps = &table[i];
 
 	/*
 	 * The capacity of a CPU in the domain at the performance state (ps)
diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index 35e07933b34a..797141638b29 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -66,6 +66,7 @@ DEFINE_SHOW_ATTRIBUTE(em_debug_flags);
 
 static void em_debug_create_pd(struct device *dev)
 {
+	struct em_perf_table *table = dev->em_pd->default_table;
 	struct dentry *d;
 	int i;
 
@@ -81,7 +82,7 @@ static void em_debug_create_pd(struct device *dev)
 
 	/* Create a sub-directory for each performance state */
 	for (i = 0; i < dev->em_pd->nr_perf_states; i++)
-		em_debug_create_ps(&dev->em_pd->table[i], d);
+		em_debug_create_ps(&table->state[i], d);
 
 }
 
@@ -196,7 +197,7 @@ static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd,
 	if (ret)
 		goto free_ps_table;
 
-	pd->table = table;
+	pd->default_table->state = table;
 	pd->nr_perf_states = nr_states;
 
 	return 0;
@@ -210,6 +211,7 @@ static int em_create_pd(struct device *dev, int nr_states,
 			struct em_data_callback *cb, cpumask_t *cpus,
 			unsigned long flags)
 {
+	struct em_perf_table *default_table;
 	struct em_perf_domain *pd;
 	struct device *cpu_dev;
 	int cpu, ret, num_cpus;
@@ -234,8 +236,17 @@ static int em_create_pd(struct device *dev, int nr_states,
 			return -ENOMEM;
 	}
 
+	default_table = kzalloc(sizeof(*default_table), GFP_KERNEL);
+	if (!default_table) {
+		kfree(pd);
+		return -ENOMEM;
+	}
+
+	pd->default_table = default_table;
+
 	ret = em_create_perf_table(dev, pd, nr_states, cb, flags);
 	if (ret) {
+		kfree(default_table);
 		kfree(pd);
 		return ret;
 	}
@@ -358,6 +369,7 @@ int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
 				bool microwatts)
 {
 	unsigned long cap, prev_cap = 0;
+	struct em_perf_state *table;
 	unsigned long flags = 0;
 	int cpu, ret;
 
@@ -416,7 +428,8 @@ int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
 
 	dev->em_pd->flags |= flags;
 
-	em_cpufreq_update_efficiencies(dev, dev->em_pd->table);
+	table = dev->em_pd->default_table->state;
+	em_cpufreq_update_efficiencies(dev, table);
 
 	em_debug_create_pd(dev);
 	dev_info(dev, "EM: created perf domain\n");
@@ -435,12 +448,16 @@ EXPORT_SYMBOL_GPL(em_dev_register_perf_domain);
  */
 void em_dev_unregister_perf_domain(struct device *dev)
 {
+	struct em_perf_domain *pd;
+
 	if (IS_ERR_OR_NULL(dev) || !dev->em_pd)
 		return;
 
 	if (_is_cpu_device(dev))
 		return;
 
+	pd = dev->em_pd;
+
 	/*
 	 * The mutex separates all register/unregister requests and protects
 	 * from potential clean-up/setup issues in the debugfs directories.
@@ -449,7 +466,8 @@ void em_dev_unregister_perf_domain(struct device *dev)
 	mutex_lock(&em_pd_mutex);
 	em_debug_remove_pd(dev);
 
-	kfree(dev->em_pd->table);
+	kfree(pd->default_table->state);
+	kfree(pd->default_table);
 	kfree(dev->em_pd);
 	dev->em_pd = NULL;
 	mutex_unlock(&em_pd_mutex);

From patchwork Mon Sep 25 08:11:31 2023
From: Lukasz Luba
Subject: [PATCH v4 10/18] PM: EM: Add RCU mechanism which safely cleans the old data
Date: Mon, 25 Sep 2023 09:11:31 +0100
Message-Id: <20230925081139.1305766-11-lukasz.luba@arm.com>
In-Reply-To: <20230925081139.1305766-1-lukasz.luba@arm.com>

The EM is going to support runtime modifications of the power data.
Introduce an RCU-safe mechanism to clean up the old allocated EM data.
It also adds a mutex for the EM structure to serialize the modifiers.
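The publish-then-defer-free pattern of em_perf_runtime_table_set() can be modelled in userspace as below. This is not RCU: C11 release/acquire atomics stand in for rcu_assign_pointer() and rcu_dereference(), a simple list stands in for call_rcu(), and model_grace_period_end() is a hypothetical hook that frees queued tables once no reader can still hold them. All model_* names are hypothetical; the model assumes writers are already serialized and omits the em_cpufreq_update_efficiencies() call. It shows the key rule from the patch: the previous table is queued for freeing unless it is the default table, which other frameworks keep using.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct em_perf_table. */
struct model_table {
	int id;				/* lets the test tell tables apart */
	struct model_table *next_free;	/* link used while queued for freeing */
};

struct model_pd {
	struct model_table *default_table;
	_Atomic(struct model_table *) runtime_table;
};

/* Stand-in for call_rcu(): queue now, free after a "grace period". */
static struct model_table *model_free_queue;
static int model_freed;

static void model_defer_free(struct model_table *t)
{
	t->next_free = model_free_queue;
	model_free_queue = t;
}

/* Hypothetical hook: called once no reader can hold an old pointer. */
static void model_grace_period_end(void)
{
	while (model_free_queue) {
		struct model_table *t = model_free_queue;

		model_free_queue = t->next_free;
		free(t);
		model_freed++;
	}
}

/*
 * Same shape as em_perf_runtime_table_set(): publish the new table with
 * release ordering (the role of rcu_assign_pointer()), then queue the old
 * one for freeing unless it is the default table. Assumes one writer.
 */
static void model_runtime_table_set(struct model_pd *pd,
				    struct model_table *new_table)
{
	struct model_table *old =
		atomic_load_explicit(&pd->runtime_table, memory_order_relaxed);

	atomic_store_explicit(&pd->runtime_table, new_table,
			      memory_order_release);

	if (old != pd->default_table)
		model_defer_free(old);
}

/* Reader side: an acquire load plays the role of rcu_dereference(). */
static int model_read_id(struct model_pd *pd)
{
	struct model_table *cur =
		atomic_load_explicit(&pd->runtime_table, memory_order_acquire);

	return cur->id;
}
```

Deferring the free, rather than calling free() right after the pointer swap, is what lets lockless readers that already loaded the old pointer finish safely.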
Signed-off-by: Lukasz Luba --- kernel/power/energy_model.c | 29 +++++++++++++++++++++++++++++ 1 file changed, 29 insertions(+) diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c index 5b40db38b745..2345837bfd2c 100644 --- a/kernel/power/energy_model.c +++ b/kernel/power/energy_model.c @@ -23,6 +23,9 @@ */ static DEFINE_MUTEX(em_pd_mutex); +static void em_cpufreq_update_efficiencies(struct device *dev, + struct em_perf_state *table); + static bool _is_cpu_device(struct device *dev) { return (dev->bus == &cpu_subsys); @@ -104,6 +107,32 @@ static void em_debug_create_pd(struct device *dev) {} static void em_debug_remove_pd(struct device *dev) {} #endif +static void em_destroy_rt_table_rcu(struct rcu_head *rp) +{ + struct em_perf_table *runtime_table; + + runtime_table = container_of(rp, struct em_perf_table, rcu); + kfree(runtime_table->state); + kfree(runtime_table); +} + +static void em_perf_runtime_table_set(struct device *dev, + struct em_perf_table *runtime_table) +{ + struct em_perf_domain *pd = dev->em_pd; + struct em_perf_table *tmp; + + tmp = pd->runtime_table; + + rcu_assign_pointer(pd->runtime_table, runtime_table); + + em_cpufreq_update_efficiencies(dev, runtime_table->state); + + /* Don't free default table since it's used by other frameworks. 
*/ + if (tmp != pd->default_table) + call_rcu(&tmp->rcu, em_destroy_rt_table_rcu); +} + static int em_compute_costs(struct device *dev, struct em_perf_state *table, struct em_data_callback *cb, int nr_states, unsigned long flags) From patchwork Mon Sep 25 08:11:33 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lukasz Luba X-Patchwork-Id: 726291 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1177ECE7A81 for ; Mon, 25 Sep 2023 08:12:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232709AbjIYIMS (ORCPT ); Mon, 25 Sep 2023 04:12:18 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42564 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232810AbjIYIL5 (ORCPT ); Mon, 25 Sep 2023 04:11:57 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 761D7CCF; Mon, 25 Sep 2023 01:11:45 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 5F6251424; Mon, 25 Sep 2023 01:12:23 -0700 (PDT) Received: from e129166.arm.com (unknown [10.57.93.139]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id B74273F5A1; Mon, 25 Sep 2023 01:11:42 -0700 (PDT) From: Lukasz Luba To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, rafael@kernel.org Cc: lukasz.luba@arm.com, dietmar.eggemann@arm.com, rui.zhang@intel.com, amit.kucheria@verdurent.com, amit.kachhap@gmail.com, daniel.lezcano@linaro.org, viresh.kumar@linaro.org, len.brown@intel.com, pavel@ucw.cz, mhiramat@kernel.org, qyousef@layalina.io, wvw@google.com Subject: [PATCH v4 12/18] PM: EM: Use runtime modified EM for CPUs energy 
estimation in EAS Date: Mon, 25 Sep 2023 09:11:33 +0100 Message-Id: <20230925081139.1305766-13-lukasz.luba@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230925081139.1305766-1-lukasz.luba@arm.com> References: <20230925081139.1305766-1-lukasz.luba@arm.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org The new Energy Model (EM) supports runtime modification of the performance state table to better model the power used by the SoC. Use this new feature to improve energy estimation and therefore task placement in Energy Aware Scheduler (EAS). Signed-off-by: Lukasz Luba --- include/linux/energy_model.h | 20 +++++++++++++------- 1 file changed, 13 insertions(+), 7 deletions(-) diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h index 8f055ab356ed..41290ee2cdd0 100644 --- a/include/linux/energy_model.h +++ b/include/linux/energy_model.h @@ -261,15 +261,14 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd, unsigned long max_util, unsigned long sum_util, unsigned long allowed_cpu_cap) { + struct em_perf_table *runtime_table; unsigned long freq, scale_cpu; - struct em_perf_state *table, *ps; + struct em_perf_state *ps; int cpu, i; if (!sum_util) return 0; - table = pd->default_table->state; - /* * In order to predict the performance state, map the utilization of * the most utilized CPU of the performance domain to a requested @@ -280,7 +279,14 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd, */ cpu = cpumask_first(to_cpumask(pd->cpus)); scale_cpu = arch_scale_cpu_capacity(cpu); - ps = &table[pd->nr_perf_states - 1]; + + /* + * No rcu_read_lock() since it's already called by task scheduler. + * The runtime_table is always there for CPUs, so we don't check. 
+ */ + runtime_table = rcu_dereference(pd->runtime_table); + + ps = &runtime_table->state[pd->nr_perf_states - 1]; max_util = map_util_perf(max_util); max_util = min(max_util, allowed_cpu_cap); @@ -290,9 +296,9 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd, * Find the lowest performance state of the Energy Model above the * requested frequency. */ - i = em_pd_get_efficient_state(table, pd->nr_perf_states, freq, - pd->flags); - ps = &table[i]; + i = em_pd_get_efficient_state(runtime_table->state, pd->nr_perf_states, + freq, pd->flags); + ps = &runtime_table->state[i]; /* * The capacity of a CPU in the domain at the performance state (ps) From patchwork Mon Sep 25 08:11:35 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lukasz Luba X-Patchwork-Id: 726290 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id DFA5FCE7A81 for ; Mon, 25 Sep 2023 08:12:24 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232662AbjIYIM3 (ORCPT ); Mon, 25 Sep 2023 04:12:29 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42652 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232721AbjIYIL6 (ORCPT ); Mon, 25 Sep 2023 04:11:58 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id E0DD4112; Mon, 25 Sep 2023 01:11:50 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id D917BDA7; Mon, 25 Sep 2023 01:12:28 -0700 (PDT) Received: from e129166.arm.com (unknown [10.57.93.139]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 3BAB73F5A1; Mon, 25 Sep 2023 01:11:48 -0700 (PDT) 
From: Lukasz Luba To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, rafael@kernel.org Cc: lukasz.luba@arm.com, dietmar.eggemann@arm.com, rui.zhang@intel.com, amit.kucheria@verdurent.com, amit.kachhap@gmail.com, daniel.lezcano@linaro.org, viresh.kumar@linaro.org, len.brown@intel.com, pavel@ucw.cz, mhiramat@kernel.org, qyousef@layalina.io, wvw@google.com Subject: [PATCH v4 14/18] PM: EM: Add performance field to struct em_perf_state Date: Mon, 25 Sep 2023 09:11:35 +0100 Message-Id: <20230925081139.1305766-15-lukasz.luba@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230925081139.1305766-1-lukasz.luba@arm.com> References: <20230925081139.1305766-1-lukasz.luba@arm.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org Performance does not scale linearly with frequency, and it can also differ between workloads. Some CPUs are designed to be particularly good at certain applications (e.g. image or video processing), while other CPUs are better at different ones. When such different types of CPUs are combined in one SoC, they should be modeled properly so that the Energy Aware Scheduler (EAS) can get the most out of the hardware. The Energy Model (EM) provides the power vs. performance curves to EAS, but it assumes that the CPU capacity is fixed and scales linearly with frequency. This patch allows the curve to be adjusted on the 'performance' axis as well.
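The two pieces of logic this patch introduces can be modeled in a few lines of userspace C: the initial linear mapping computed by em_init_performance() (performance = max_cap * frequency / fmax, in 64-bit arithmetic), and the new lookup in em_pd_get_efficient_state(), which picks the lowest state whose performance is at least max_util instead of comparing frequencies. This is a sketch; struct perf_state, init_linear_performance() and find_state() are hypothetical names, and the skipping of inefficient states is omitted.

```c
#include <stdint.h>

/* Hypothetical reduced form of struct em_perf_state (only the fields used). */
struct perf_state {
	unsigned long frequency;   /* kHz */
	unsigned long performance; /* capacity units */
};

/*
 * Linear initial mapping, as in em_init_performance():
 * performance = max_cap * frequency / fmax, where fmax is the top state.
 */
static void init_linear_performance(struct perf_state *table, int nr_states,
				    unsigned long max_cap)
{
	uint64_t fmax = table[nr_states - 1].frequency;

	for (int i = 0; i < nr_states; i++)
		table[i].performance = (unsigned long)
			((uint64_t)max_cap * table[i].frequency / fmax);
}

/*
 * Lowest state whose performance covers max_util, falling back to the
 * highest state, as in the updated em_pd_get_efficient_state().
 */
static int find_state(const struct perf_state *table, int nr_states,
		      unsigned long max_util)
{
	for (int i = 0; i < nr_states; i++)
		if (table[i].performance >= max_util)
			return i;
	return nr_states - 1;
}
```

With states at 500, 1000, 1500 and 2000 MHz and a capacity of 1024, the linear values are 256, 512, 768 and 1024, so a max_util of 600 selects the third state. If the curve is later edited so that the third state only delivers 550, the same utilization selects the fourth state, which is exactly the non-linear behavior a frequency comparison could not express.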
Signed-off-by: Lukasz Luba --- include/linux/energy_model.h | 11 ++++++----- kernel/power/energy_model.c | 27 +++++++++++++++++++++++++++ 2 files changed, 33 insertions(+), 5 deletions(-) diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h index 41290ee2cdd0..37fc8490709d 100644 --- a/include/linux/energy_model.h +++ b/include/linux/energy_model.h @@ -12,6 +12,7 @@ /** * struct em_perf_state - Performance state of a performance domain + * @performance: Non-linear CPU performance at a given frequency * @frequency: The frequency in KHz, for consistency with CPUFreq * @power: The power consumed at this level (by 1 CPU or by a registered * device). It can be a total power: static and dynamic. @@ -20,6 +21,7 @@ * @flags: see "em_perf_state flags" description below. */ struct em_perf_state { + unsigned long performance; unsigned long frequency; unsigned long power; unsigned long cost; @@ -223,14 +225,14 @@ void em_dev_unregister_perf_domain(struct device *dev); */ static inline int em_pd_get_efficient_state(struct em_perf_state *table, int nr_perf_states, - unsigned long freq, unsigned long pd_flags) + unsigned long max_util, unsigned long pd_flags) { struct em_perf_state *ps; int i; for (i = 0; i < nr_perf_states; i++) { ps = &table[i]; - if (ps->frequency >= freq) { + if (ps->performance >= max_util) { if (pd_flags & EM_PERF_DOMAIN_SKIP_INEFFICIENCIES && ps->flags & EM_PERF_STATE_INEFFICIENT) continue; @@ -262,8 +264,8 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd, unsigned long allowed_cpu_cap) { struct em_perf_table *runtime_table; - unsigned long freq, scale_cpu; struct em_perf_state *ps; + unsigned long scale_cpu; int cpu, i; if (!sum_util) @@ -290,14 +292,13 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd, max_util = map_util_perf(max_util); max_util = min(max_util, allowed_cpu_cap); - freq = map_util_freq(max_util, ps->frequency, scale_cpu); /* * Find the lowest performance state of the Energy 
Model above the * requested frequency. */ i = em_pd_get_efficient_state(runtime_table->state, pd->nr_perf_states, - freq, pd->flags); + max_util, pd->flags); ps = &runtime_table->state[i]; /* diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c index 78e1495dc87e..c7ad42b42c46 100644 --- a/kernel/power/energy_model.c +++ b/kernel/power/energy_model.c @@ -46,6 +46,7 @@ static void em_debug_create_ps(struct em_perf_state *ps, struct dentry *pd) debugfs_create_ulong("frequency", 0444, d, &ps->frequency); debugfs_create_ulong("power", 0444, d, &ps->power); debugfs_create_ulong("cost", 0444, d, &ps->cost); + debugfs_create_ulong("performance", 0444, d, &ps->performance); debugfs_create_ulong("inefficient", 0444, d, &ps->flags); } @@ -133,6 +134,30 @@ static void em_perf_runtime_table_set(struct device *dev, call_rcu(&tmp->rcu, em_destroy_rt_table_rcu); } +static void em_init_performance(struct device *dev, struct em_perf_domain *pd, + struct em_perf_state *table, int nr_states) +{ + u64 fmax, max_cap; + int i, cpu; + + /* This is needed only for CPUs and EAS skip other devices */ + if (!_is_cpu_device(dev)) + return; + + cpu = cpumask_first(em_span_cpus(pd)); + + /* + * Calculate the performance value for each frequency with + * linear relationship. The final CPU capacity might not be ready at + * boot time, but the EM will be updated a bit later with correct one. 
+ */ + fmax = (u64) table[nr_states - 1].frequency; + max_cap = (u64) arch_scale_cpu_capacity(cpu); + for (i = 0; i < nr_states; i++) + table[i].performance = div64_u64(max_cap * table[i].frequency, + fmax); +} + static int em_compute_costs(struct device *dev, struct em_perf_state *table, struct em_data_callback *cb, int nr_states, unsigned long flags) @@ -317,6 +342,8 @@ static int em_create_perf_table(struct device *dev, struct em_perf_domain *pd, table[i].frequency = prev_freq = freq; } + em_init_performance(dev, pd, table, nr_states); + ret = em_compute_costs(dev, table, cb, nr_states, flags); if (ret) goto free_ps_table; From patchwork Mon Sep 25 08:11:37 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lukasz Luba X-Patchwork-Id: 726289 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 53AB5CE7A97 for ; Mon, 25 Sep 2023 08:12:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232654AbjIYIMc (ORCPT ); Mon, 25 Sep 2023 04:12:32 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42472 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232665AbjIYIMD (ORCPT ); Mon, 25 Sep 2023 04:12:03 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 74AF419C; Mon, 25 Sep 2023 01:11:56 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 5B4B4DA7; Mon, 25 Sep 2023 01:12:34 -0700 (PDT) Received: from e129166.arm.com (unknown [10.57.93.139]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id B44523F5A1; Mon, 25 Sep 2023 01:11:53 -0700 (PDT) From: Lukasz Luba To: 
linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, rafael@kernel.org Cc: lukasz.luba@arm.com, dietmar.eggemann@arm.com, rui.zhang@intel.com, amit.kucheria@verdurent.com, amit.kachhap@gmail.com, daniel.lezcano@linaro.org, viresh.kumar@linaro.org, len.brown@intel.com, pavel@ucw.cz, mhiramat@kernel.org, qyousef@layalina.io, wvw@google.com Subject: [PATCH v4 16/18] PM: EM: Support late CPUs booting and capacity adjustment Date: Mon, 25 Sep 2023 09:11:37 +0100 Message-Id: <20230925081139.1305766-17-lukasz.luba@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230925081139.1305766-1-lukasz.luba@arm.com> References: <20230925081139.1305766-1-lukasz.luba@arm.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org Add the infrastructure needed to handle late-booting CPUs, which might change the capacity values of CPUs that booted earlier. With these changes, each newly booted CPU that registers its EM triggers the required re-calculation of the other CPUs' EMs. As a result, the em_perf_state::performance values will be aligned with the CPU capacity information once all CPUs have finished booting.
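The core of em_check_capacity_update() is a walk over all CPUs that visits each performance domain once, compares the top state's performance with the CPU's current capacity, and rescales the domain linearly (as get_updated_perf() does) when they differ. The sketch below models only that decision logic in userspace C under simplifying assumptions: the CPU span is a hypothetical 32-bit mask instead of a cpumask, the table is rescaled in place rather than through em_dev_update_perf_domain() and a new runtime table, and the cpufreq policy check with its delayed-work retry is left out. All names are hypothetical.

```c
#include <stdint.h>

struct perf_state {
	unsigned long frequency;   /* kHz */
	unsigned long performance; /* capacity units */
};

/* Hypothetical reduced perf domain: bit n of span set => CPU n (n < 32). */
struct perf_domain {
	unsigned int span;
	int nr_states;
	struct perf_state *table;
};

/* Recompute performance linearly against the new capacity. */
static void rescale(struct perf_domain *pd, unsigned long capacity)
{
	uint64_t fmax = pd->table[pd->nr_states - 1].frequency;

	for (int i = 0; i < pd->nr_states; i++)
		pd->table[i].performance = (unsigned long)
			((uint64_t)capacity * pd->table[i].frequency / fmax);
}

/*
 * Visit each domain once (tracked in 'done', like cpu_done_mask) and
 * rescale it if its top performance no longer matches the CPU capacity.
 * Returns the number of domains that were updated.
 */
static int check_capacity_update(struct perf_domain **cpu_pd,
				 const unsigned long *cpu_capacity, int nr_cpus)
{
	unsigned int done = 0;
	int updated = 0;

	for (int cpu = 0; cpu < nr_cpus; cpu++) {
		struct perf_domain *pd;

		if (done & (1u << cpu))
			continue;

		pd = cpu_pd[cpu];
		if (!pd)
			continue;

		if (pd->table[pd->nr_states - 1].performance != cpu_capacity[cpu]) {
			rescale(pd, cpu_capacity[cpu]);
			updated++;
		}
		done |= pd->span;
	}
	return updated;
}
```

For example, if a little cluster's table was initialized against a provisional capacity of 1024 and the final capacity turns out to be 512, the first call rescales only that cluster, and a second call finds nothing left to update.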
Signed-off-by: Lukasz Luba --- kernel/power/energy_model.c | 108 ++++++++++++++++++++++++++++++++++++ 1 file changed, 108 insertions(+) diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c index 17a59a7717f7..6bfd33c2e48c 100644 --- a/kernel/power/energy_model.c +++ b/kernel/power/energy_model.c @@ -25,6 +25,9 @@ static DEFINE_MUTEX(em_pd_mutex); static void em_cpufreq_update_efficiencies(struct device *dev, struct em_perf_state *table); +static void em_check_capacity_update(void); +static void em_update_workfn(struct work_struct *work); +static DECLARE_DELAYED_WORK(em_update_work, em_update_workfn); static bool _is_cpu_device(struct device *dev) { @@ -591,6 +594,10 @@ int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states, unlock: mutex_unlock(&em_pd_mutex); + + if (_is_cpu_device(dev)) + em_check_capacity_update(); + return ret; } EXPORT_SYMBOL_GPL(em_dev_register_perf_domain); @@ -651,3 +658,104 @@ void em_dev_unregister_perf_domain(struct device *dev) mutex_unlock(&em_pd_mutex); } EXPORT_SYMBOL_GPL(em_dev_unregister_perf_domain); + +/* + * Adjustment of CPU performance values after boot, when all CPUs capacites + * are correctly calculated. 
+ */ +static int get_updated_perf(struct device *dev, unsigned long freq, + unsigned long *power, unsigned long *perf, + void *priv) +{ + struct em_perf_state *table = priv; + int i, cpu, nr_states; + u64 fmax, max_cap; + + nr_states = dev->em_pd->nr_perf_states; + + cpu = cpumask_first(em_span_cpus(dev->em_pd)); + + fmax = (u64) table[nr_states - 1].frequency; + max_cap = (u64) arch_scale_cpu_capacity(cpu); + + for (i = 0; i < nr_states; i++) { + if (freq != table[i].frequency) + continue; + + *power = table[i].power; + *perf = div64_u64(max_cap * freq, fmax); + break; + } + + return 0; +} + +static void em_check_capacity_update(void) +{ + struct em_data_callback em_cb = EM_UPDATE_CB(get_updated_perf); + struct em_perf_table *runtime_table; + struct em_perf_domain *em_pd; + cpumask_var_t cpu_done_mask; + unsigned long cpu_capacity; + struct em_perf_state *ps; + struct device *dev; + int cpu, ret; + + if (!zalloc_cpumask_var(&cpu_done_mask, GFP_KERNEL)) { + pr_warn("EM: no free memory\n"); + return; + } + + /* Loop over all EMs and check if the CPU capacity has changed. */ + for_each_possible_cpu(cpu) { + unsigned long em_max_performance; + struct cpufreq_policy *policy; + + if (cpumask_test_cpu(cpu, cpu_done_mask)) + continue; + + policy = cpufreq_cpu_get(cpu); + if (!policy) { + pr_debug("EM: Accessing cpu%d policy failed\n", cpu); + schedule_delayed_work(&em_update_work, + msecs_to_jiffies(1000)); + break; + } + + em_pd = em_cpu_get(cpu); + if (!em_pd || em_is_artificial(em_pd)) + continue; + + cpu_capacity = arch_scale_cpu_capacity(cpu); + + rcu_read_lock(); + runtime_table = rcu_dereference(em_pd->runtime_table); + ps = &runtime_table->state[em_pd->nr_perf_states - 1]; + em_max_performance = ps->performance; + rcu_read_unlock(); + + /* + * Check if the CPU capacity has been adjusted during boot + * and trigger the update for new performance values. 
+ */ + if (em_max_performance != cpu_capacity) { + dev = get_cpu_device(cpu); + ret = em_dev_update_perf_domain(dev, &em_cb, + em_pd->default_table->state); + if (ret) + dev_warn(dev, "EM: update failed %d\n", ret); + else + dev_info(dev, "EM: updated\n"); + } + + cpumask_or(cpu_done_mask, cpu_done_mask, + em_span_cpus(em_pd)); + } + + free_cpumask_var(cpu_done_mask); +} + +static void em_update_workfn(struct work_struct *work) +{ + em_check_capacity_update(); +} From patchwork Mon Sep 25 08:11:39 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lukasz Luba X-Patchwork-Id: 726288 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6C36ECE7A81 for ; Mon, 25 Sep 2023 08:12:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232806AbjIYIM7 (ORCPT ); Mon, 25 Sep 2023 04:12:59 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56426 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232808AbjIYIMS (ORCPT ); Mon, 25 Sep 2023 04:12:18 -0400 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 0D483115; Mon, 25 Sep 2023 01:12:02 -0700 (PDT) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id D510CDA7; Mon, 25 Sep 2023 01:12:39 -0700 (PDT) Received: from e129166.arm.com (unknown [10.57.93.139]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 39C503F5A1; Mon, 25 Sep 2023 01:11:59 -0700 (PDT) From: Lukasz Luba To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, rafael@kernel.org Cc: lukasz.luba@arm.com, dietmar.eggemann@arm.com, rui.zhang@intel.com, amit.kucheria@verdurent.com, 
amit.kachhap@gmail.com, daniel.lezcano@linaro.org, viresh.kumar@linaro.org, len.brown@intel.com, pavel@ucw.cz, mhiramat@kernel.org, qyousef@layalina.io, wvw@google.com Subject: [PATCH v4 18/18] Documentation: EM: Update information about performance field Date: Mon, 25 Sep 2023 09:11:39 +0100 Message-Id: <20230925081139.1305766-19-lukasz.luba@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20230925081139.1305766-1-lukasz.luba@arm.com> References: <20230925081139.1305766-1-lukasz.luba@arm.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org The Energy Model supports the new information: performance for each performance state. Update the needed documentation part accordingly. Signed-off-by: Lukasz Luba --- Documentation/power/energy-model.rst | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/Documentation/power/energy-model.rst b/Documentation/power/energy-model.rst index 3115411f9839..da3619c0b98a 100644 --- a/Documentation/power/energy-model.rst +++ b/Documentation/power/energy-model.rst @@ -125,6 +125,11 @@ runtime static EM' (system property) design to a 'single EM which can be changed during runtime according e.g. to the workload' (system and workload property) design. +It is possible also to modify the CPU performance values for each EM's +performance state. Thus, the full power and performance profile (which +is an exponential curve) can be changed according e.g. to the workload +or system property. + 3. Core APIs ------------ @@ -326,18 +331,18 @@ EM framework:: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ This section provides a simple example of a thermal driver modifying the EM. -The driver implements a foo_mod_power() function to be provided to the +The driver implements a mod_power_perf() function to be provided to the EM framework. 
The driver is woken up periodically to check the temperature and modify the EM data if needed:: -> drivers/thermal/foo_thermal.c - 01 static int foo_mod_power(struct device *dev, unsigned long freq, - 02 unsigned long *power, void *priv) + 01 static int mod_power_perf(struct device *dev, unsigned long freq, + 02 unsigned long *power, unsigned long *perf, void *priv) 03 { 04 struct foo_context *ctx = priv; 05 - 06 /* Estimate power for the given frequency and temperature */ + 06 *perf = foo_estimate_performance(dev, freq); 07 *power = foo_estimate_power(dev, freq, ctx->temperature); 08 if (*power >= EM_MAX_POWER); 09 return -EINVAL;