[V11,0/7] amd-pstate preferred core

Message ID 20231129065437.290183-1-li.meng@amd.com

Message

Meng, Li (Jassmine) Nov. 29, 2023, 6:54 a.m. UTC
Hi all:

Core frequency is subject to semiconductor process variation. Not all cores
can reach the maximum frequency within the infrastructure limits.
Consequently, AMD has redefined the concept of the maximum frequency of a
part: only a fraction of the cores can reach it. To find the best process
scheduling policy for a given scenario, the OS needs to know the core
ordering, which the platform reports through the highest performance
capability register of the CPPC interface.

Earlier implementations of amd-pstate preferred core supported only a static
core ranking and targeted performance. The driver can now change the
preferred core dynamically, based on workload and platform conditions and
accounting for thermals and aging.

The amd-pstate driver uses the functions and data structures provided by
the ITMT architecture to let the scheduler favor cores that can reach a
higher frequency at a lower voltage.
We call this amd-pstate preferred core.

Here sched_set_itmt_core_prio() is called to set the priorities and
sched_set_itmt_support() is called to enable the ITMT feature.
The amd-pstate driver uses the highest performance value to indicate
the priority of a CPU: a higher value means a higher priority.

The amd-pstate driver provides an initial core ordering at boot time.
It relies on the CPPC interface to communicate the core ranking to the
operating system and scheduler, so that the OS schedules processes on the
highest-performance cores first. When the amd-pstate driver receives a
notification that the highest performance has changed, it updates the
core ranking.
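
As an illustration of the ranking rule described above (not code from this
series), the following user-space sketch picks the CPU that would get the
highest priority from a table of highest performance values. The helper
name, the MODEL_MAX_PERF constant and the sample values are hypothetical;
the threshold of 255 is taken from the V9->V10 changelog note below, and
treating 255 as "no ranking" is an assumption of this sketch. The real
scheduler does not pick a single CPU; it consumes the per-CPU priorities
set through sched_set_itmt_core_prio().

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed ceiling: a value of 255 is treated here as "no ranking reported". */
#define MODEL_MAX_PERF 255u

/*
 * Hypothetical helper: return the index of the CPU with the highest
 * ranking, or -1 if no CPU reports a value below MODEL_MAX_PERF.
 * On a tie, the lowest-numbered CPU wins.
 */
static int model_pick_preferred_cpu(const uint32_t *highest_perf, size_t ncpus)
{
	int best = -1;

	for (size_t cpu = 0; cpu < ncpus; cpu++) {
		if (highest_perf[cpu] >= MODEL_MAX_PERF)
			continue;	/* no usable ranking for this CPU */
		if (best < 0 || highest_perf[cpu] > highest_perf[best])
			best = (int)cpu;
	}

	return best;
}
```

With sample values {196, 236, 231, 236} the helper returns CPU 1, the first
CPU carrying the largest value.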

Changes from V10->V11:
- cpufreq: amd-pstate:
- - per Perry's comments, replace the string with str_enabled_disabled().

Changes from V9->V10:
- cpufreq: amd-pstate:
- - add a check on highest_perf: when it is less than 255, the
  preferred core feature is enabled and the priority is set.
- - delete "static u32 max_highest_perf" etc., because amd-pstate
  preferred core does not require special handling for hotplug.

Changes from V8->V9:
- all:
- - pick up Tested-by tag added by Oleksandr.
- cpufreq: amd-pstate:
- - pick up Reviewed-by tag added by Wyes.
- - ignore modification of bug.
- - add a prefcore_ranking attribute.
- - modify data type conversion from u32 to int.
- Documentation: amd-pstate:
- - pick up Reviewed-by tag added by Wyes.

Changes from V7->V8:
- all:
- - pick up Reviewed-by tags added by Mario and Ray.
- cpufreq: amd-pstate:
- - use hw_prefcore embeds into cpudata structure.
- - delete preferred core init from cpu online/off.

Changes from V6->V7:
- x86:
- - Modify kconfig about X86_AMD_PSTATE.
- cpufreq: amd-pstate:
- - modify incorrect comments about scheduler_work().
- - convert highest_perf data type.
- - modify preferred core init when cpu init and online.
- acpi: cppc:
- - modify link of CPPC highest performance.
- cpufreq:
- - modify link of CPPC highest performance changed.

Changes from V5->V6:
- cpufreq: amd-pstate:
- - modify the wrong tag order.
- - modify warning about hw_prefcore sysfs attribute.
- - delete duplicate comments.
- - modify the variable name cppc_highest_perf to prefcore_ranking.
- - modify judgment conditions for setting highest_perf.
- - modify sysfs attribute for CPPC highest perf to pr_debug message.
- Documentation: amd-pstate:
- - modify warning: title underline too short.

Changes from V4->V5:
- cpufreq: amd-pstate:
- - modify sysfs attribute for CPPC highest perf.
- - modify warning about comments.
- - rebase onto linux-next.
- cpufreq:
- - Modify warning about function declarations.
- Documentation: amd-pstate:
- - align with ``amd-pstat``

Changes from V3->V4:
- Documentation: amd-pstate:
- - Modify inappropriate descriptions.

Changes from V2->V3:
- x86:
- - Modify kconfig and description.
- cpufreq: amd-pstate: 
- - Add Co-developed-by tag in commit message.
- cpufreq:
- - Modify commit message.
- Documentation: amd-pstate:
- - Modify inappropriate descriptions.

Changes from V1->V2:
- acpi: cppc:
- - Add reference link.
- cpufreq:
- - Modify link error.
- cpufreq: amd-pstate: 
- - Init the priorities of all online CPUs
- - Use a single variable to represent the status of preferred core.
- Documentation:
- - Default enabled preferred core.
- Documentation: amd-pstate: 
- - Modify inappropriate descriptions.
- - Default enabled preferred core.
- - Use a single variable to represent the status of preferred core.

Meng Li (7):
  x86: Drop CPU_SUP_INTEL from SCHED_MC_PRIO for the expansion.
  acpi: cppc: Add get the highest performance cppc control
  cpufreq: amd-pstate: Enable amd-pstate preferred core supporting.
  cpufreq: Add a notification message that the highest perf has changed
  cpufreq: amd-pstate: Update amd-pstate preferred core ranking
    dynamically
  Documentation: amd-pstate: introduce amd-pstate preferred core
  Documentation: introduce amd-pstate preferrd core mode kernel command
    line options

 .../admin-guide/kernel-parameters.txt         |   5 +
 Documentation/admin-guide/pm/amd-pstate.rst   |  59 +++++-
 arch/x86/Kconfig                              |   5 +-
 drivers/acpi/cppc_acpi.c                      |  13 ++
 drivers/acpi/processor_driver.c               |   6 +
 drivers/cpufreq/amd-pstate.c                  | 187 ++++++++++++++++--
 drivers/cpufreq/cpufreq.c                     |  13 ++
 include/acpi/cppc_acpi.h                      |   5 +
 include/linux/amd-pstate.h                    |  10 +
 include/linux/cpufreq.h                       |   5 +
 10 files changed, 288 insertions(+), 20 deletions(-)

Comments

Yuan, Perry Nov. 30, 2023, 5:49 a.m. UTC | #1
[AMD Official Use Only - General]

> -----Original Message-----
> From: Meng, Li (Jassmine) <Li.Meng@amd.com>
> Sent: Wednesday, November 29, 2023 2:55 PM
> To: Rafael J . Wysocki <rafael.j.wysocki@intel.com>; Huang, Ray
> <Ray.Huang@amd.com>
> Cc: linux-pm@vger.kernel.org; linux-kernel@vger.kernel.org; x86@kernel.org;
> linux-acpi@vger.kernel.org; Shuah Khan <skhan@linuxfoundation.org>; linux-
> kselftest@vger.kernel.org; Fontenot, Nathan <Nathan.Fontenot@amd.com>;
> Sharma, Deepak <Deepak.Sharma@amd.com>; Deucher, Alexander
> <Alexander.Deucher@amd.com>; Limonciello, Mario
> <Mario.Limonciello@amd.com>; Huang, Shimmer
> <Shimmer.Huang@amd.com>; Yuan, Perry <Perry.Yuan@amd.com>; Du,
> Xiaojian <Xiaojian.Du@amd.com>; Viresh Kumar <viresh.kumar@linaro.org>;
> Borislav Petkov <bp@alien8.de>; Oleksandr Natalenko
> <oleksandr@natalenko.name>; Meng, Li (Jassmine) <Li.Meng@amd.com>;
> Karny, Wyes <Wyes.Karny@amd.com>
> Subject: [PATCH V11 5/7] cpufreq: amd-pstate: Update amd-pstate preferred
> core ranking dynamically
>
> Preferred core rankings can be changed dynamically by the platform based on
> the workload and platform conditions and accounting for thermals and aging.
> When this occurs, cpu priority need to be set.
>
> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
> Reviewed-by: Wyes Karny <wyes.karny@amd.com>
> Reviewed-by: Huang Rui <ray.huang@amd.com>
> Signed-off-by: Meng Li <li.meng@amd.com>
> ---
>  drivers/cpufreq/amd-pstate.c | 46 ++++++++++++++++++++++++++++++++++++
>  include/linux/amd-pstate.h   |  6 +++++
>  2 files changed, 52 insertions(+)
>
> diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
> index 74dcf63d75f9..88df6510dcc0 100644
> --- a/drivers/cpufreq/amd-pstate.c
> +++ b/drivers/cpufreq/amd-pstate.c
> @@ -312,6 +312,7 @@ static int pstate_init_perf(struct amd_cpudata *cpudata)
>       WRITE_ONCE(cpudata->nominal_perf, AMD_CPPC_NOMINAL_PERF(cap1));
>       WRITE_ONCE(cpudata->lowest_nonlinear_perf, AMD_CPPC_LOWNONLIN_PERF(cap1));
>       WRITE_ONCE(cpudata->lowest_perf, AMD_CPPC_LOWEST_PERF(cap1));
> +     WRITE_ONCE(cpudata->prefcore_ranking, AMD_CPPC_HIGHEST_PERF(cap1));
>
>       return 0;
>  }
> @@ -333,6 +334,7 @@ static int cppc_init_perf(struct amd_cpudata *cpudata)
>       WRITE_ONCE(cpudata->lowest_nonlinear_perf,
>                  cppc_perf.lowest_nonlinear_perf);
>       WRITE_ONCE(cpudata->lowest_perf, cppc_perf.lowest_perf);
> +     WRITE_ONCE(cpudata->prefcore_ranking, cppc_perf.highest_perf);
>
>       if (cppc_state == AMD_PSTATE_ACTIVE)
>               return 0;
> @@ -749,6 +751,34 @@ static void amd_pstate_init_prefcore(struct amd_cpudata *cpudata)
>       schedule_work(&sched_prefcore_work);
>  }
>
> +static void amd_pstate_update_highest_perf(unsigned int cpu)
> +{
> +     struct cpufreq_policy *policy;
> +     struct amd_cpudata *cpudata;
> +     u32 prev_high = 0, cur_high = 0;
> +     int ret;
> +
> +     if ((!amd_pstate_prefcore) || (!cpudata->hw_prefcore))
> +             return;
> +
> +     ret = amd_pstate_get_highest_perf(cpu, &cur_high);
> +     if (ret)
> +             return;
> +
> +     policy = cpufreq_cpu_get(cpu);
> +     cpudata = policy->driver_data;
> +     prev_high = READ_ONCE(cpudata->prefcore_ranking);
> +
> +     if (prev_high != cur_high) {
> +             WRITE_ONCE(cpudata->prefcore_ranking, cur_high);
> +
> +             if (cur_high < CPPC_MAX_PERF)
> +                     sched_set_itmt_core_prio((int)cur_high, cpu);
> +     }
> +
> +     cpufreq_cpu_put(policy);
> +}
> +
>  static int amd_pstate_cpu_init(struct cpufreq_policy *policy)
>  {
>       int min_freq, max_freq, nominal_freq, lowest_nonlinear_freq, ret;
> @@ -920,6 +950,17 @@ static ssize_t show_amd_pstate_highest_perf(struct cpufreq_policy *policy,
>       return sysfs_emit(buf, "%u\n", perf);
>  }
>
> +static ssize_t show_amd_pstate_prefcore_ranking(struct cpufreq_policy *policy,
> +                                             char *buf)
> +{
> +     u32 perf;
> +     struct amd_cpudata *cpudata = policy->driver_data;
> +
> +     perf = READ_ONCE(cpudata->prefcore_ranking);
> +
> +     return sysfs_emit(buf, "%u\n", perf);
> +}
> +
>  static ssize_t show_amd_pstate_hw_prefcore(struct cpufreq_policy *policy,
>                                          char *buf)
>  {
> @@ -1133,6 +1174,7 @@ cpufreq_freq_attr_ro(amd_pstate_max_freq);
>  cpufreq_freq_attr_ro(amd_pstate_lowest_nonlinear_freq);
>
>  cpufreq_freq_attr_ro(amd_pstate_highest_perf);
> +cpufreq_freq_attr_ro(amd_pstate_prefcore_ranking);
>  cpufreq_freq_attr_ro(amd_pstate_hw_prefcore);
>  cpufreq_freq_attr_rw(energy_performance_preference);
>  cpufreq_freq_attr_ro(energy_performance_available_preferences);
> @@ -1143,6 +1185,7 @@ static struct freq_attr *amd_pstate_attr[] = {
>       &amd_pstate_max_freq,
>       &amd_pstate_lowest_nonlinear_freq,
>       &amd_pstate_highest_perf,
> +     &amd_pstate_prefcore_ranking,
>       &amd_pstate_hw_prefcore,
>       NULL,
>  };
> @@ -1151,6 +1194,7 @@ static struct freq_attr *amd_pstate_epp_attr[] = {
>       &amd_pstate_max_freq,
>       &amd_pstate_lowest_nonlinear_freq,
>       &amd_pstate_highest_perf,
> +     &amd_pstate_prefcore_ranking,
>       &amd_pstate_hw_prefcore,
>       &energy_performance_preference,
>       &energy_performance_available_preferences,
> @@ -1491,6 +1535,7 @@ static struct cpufreq_driver amd_pstate_driver = {
>       .suspend        = amd_pstate_cpu_suspend,
>       .resume         = amd_pstate_cpu_resume,
>       .set_boost      = amd_pstate_set_boost,
> +     .update_highest_perf    = amd_pstate_update_highest_perf,
>       .name           = "amd-pstate",
>       .attr           = amd_pstate_attr,
>  };
> @@ -1505,6 +1550,7 @@ static struct cpufreq_driver amd_pstate_epp_driver = {
>       .online         = amd_pstate_epp_cpu_online,
>       .suspend        = amd_pstate_epp_suspend,
>       .resume         = amd_pstate_epp_resume,
> +     .update_highest_perf    = amd_pstate_update_highest_perf,
>       .name           = "amd-pstate-epp",
>       .attr           = amd_pstate_epp_attr,
>  };
> diff --git a/include/linux/amd-pstate.h b/include/linux/amd-pstate.h
> index 87e140e9e6db..426822612373 100644
> --- a/include/linux/amd-pstate.h
> +++ b/include/linux/amd-pstate.h
> @@ -39,11 +39,16 @@ struct amd_aperf_mperf {
>   * @cppc_req_cached: cached performance request hints
>   * @highest_perf: the maximum performance an individual processor may reach,
>   *             assuming ideal conditions
> + *             For platforms that do not support the preferred core feature, the
> + *             highest_pef may be configured with 166 or 255, to avoid max frequency
> + *             calculated wrongly. we take the fixed value as the highest_perf.
>   * @nominal_perf: the maximum sustained performance level of the processor,
>   *             assuming ideal operating conditions
>   * @lowest_nonlinear_perf: the lowest performance level at which nonlinear power
>   *                      savings are achieved
>   * @lowest_perf: the absolute lowest performance level of the processor
> + * @prefcore_ranking: the preferred core ranking, the higher value indicates a higher
> + *             priority.
>   * @max_freq: the frequency that mapped to highest_perf
>   * @min_freq: the frequency that mapped to lowest_perf
>   * @nominal_freq: the frequency that mapped to nominal_perf
> @@ -73,6 +78,7 @@ struct amd_cpudata {
>       u32     nominal_perf;
>       u32     lowest_nonlinear_perf;
>       u32     lowest_perf;
> +     u32     prefcore_ranking;
>
>       u32     max_freq;
>       u32     min_freq;
> --
> 2.34.1

Reviewed-by: Perry Yuan <perry.yuan@amd.com>

Yuan, Perry Nov. 30, 2023, 5:52 a.m. UTC | #2

> -----Original Message-----
> From: Meng, Li (Jassmine) <Li.Meng@amd.com>
> Sent: Wednesday, November 29, 2023 2:55 PM
> To: Rafael J . Wysocki <rafael.j.wysocki@intel.com>; Huang, Ray
> <Ray.Huang@amd.com>
> Cc: linux-pm@vger.kernel.org; linux-kernel@vger.kernel.org; x86@kernel.org;
> linux-acpi@vger.kernel.org; Shuah Khan <skhan@linuxfoundation.org>; linux-
> kselftest@vger.kernel.org; Fontenot, Nathan <Nathan.Fontenot@amd.com>;
> Sharma, Deepak <Deepak.Sharma@amd.com>; Deucher, Alexander
> <Alexander.Deucher@amd.com>; Limonciello, Mario
> <Mario.Limonciello@amd.com>; Huang, Shimmer
> <Shimmer.Huang@amd.com>; Yuan, Perry <Perry.Yuan@amd.com>; Du,
> Xiaojian <Xiaojian.Du@amd.com>; Viresh Kumar <viresh.kumar@linaro.org>;
> Borislav Petkov <bp@alien8.de>; Oleksandr Natalenko
> <oleksandr@natalenko.name>; Meng, Li (Jassmine) <Li.Meng@amd.com>;
> Karny, Wyes <Wyes.Karny@amd.com>
> Subject: [PATCH V11 7/7] Documentation: introduce amd-pstate preferrd core
> mode kernel command line options
>
> amd-pstate driver support enable/disable preferred core.
> Default enabled on platforms supporting amd-pstate preferred core.
> Disable amd-pstate preferred core with
> "amd_prefcore=disable" added to the kernel command line.
>
> Signed-off-by: Meng Li <li.meng@amd.com>
> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
> Reviewed-by: Wyes Karny <wyes.karny@amd.com>
> Reviewed-by: Huang Rui <ray.huang@amd.com>
> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
> ---
>  Documentation/admin-guide/kernel-parameters.txt | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 758bb25ea3e6..008bdfd63c22 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -363,6 +363,11 @@
>                         selects a performance level in this range and appropriate
>                         to the current workload.
>
> +     amd_prefcore=
> +                     [X86]
> +                     disable
> +                       Disable amd-pstate preferred core.
> +
>       amijoy.map=     [HW,JOY] Amiga joystick support
>                       Map of devices attached to JOY0DAT and JOY1DAT
>                       Format: <a>,<b>
> --
> 2.34.1

Reviewed-by: Perry Yuan <perry.yuan@amd.com>

Yuan, Perry Nov. 30, 2023, 6 a.m. UTC | #3

> -----Original Message-----
> From: Meng, Li (Jassmine) <Li.Meng@amd.com>
> Sent: Wednesday, November 29, 2023 2:55 PM
> To: Rafael J . Wysocki <rafael.j.wysocki@intel.com>; Huang, Ray
> <Ray.Huang@amd.com>
> Cc: linux-pm@vger.kernel.org; linux-kernel@vger.kernel.org; x86@kernel.org;
> linux-acpi@vger.kernel.org; Shuah Khan <skhan@linuxfoundation.org>; linux-
> kselftest@vger.kernel.org; Fontenot, Nathan <Nathan.Fontenot@amd.com>;
> Sharma, Deepak <Deepak.Sharma@amd.com>; Deucher, Alexander
> <Alexander.Deucher@amd.com>; Limonciello, Mario
> <Mario.Limonciello@amd.com>; Huang, Shimmer <Shimmer.Huang@amd.com>;
> Yuan, Perry <Perry.Yuan@amd.com>; Du, Xiaojian <Xiaojian.Du@amd.com>;
> Viresh Kumar <viresh.kumar@linaro.org>; Borislav Petkov <bp@alien8.de>;
> Oleksandr Natalenko <oleksandr@natalenko.name>; Meng, Li (Jassmine)
> <Li.Meng@amd.com>
> Subject: [PATCH V11 1/7] x86: Drop CPU_SUP_INTEL from SCHED_MC_PRIO for
> the expansion.
>
> amd-pstate driver also uses SCHED_MC_PRIO, so decouple the requirement of
> CPU_SUP_INTEL from the dependencies to allow compilation in kernels without
> Intel CPU support.
>
> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
> Reviewed-by: Huang Rui <ray.huang@amd.com>
> Signed-off-by: Meng Li <li.meng@amd.com>
> ---
>  arch/x86/Kconfig | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index ad478a2b49e2..77b1af90f7a2 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1054,8 +1054,9 @@ config SCHED_MC
>
>  config SCHED_MC_PRIO
>       bool "CPU core priorities scheduler support"
> -     depends on SCHED_MC && CPU_SUP_INTEL
> -     select X86_INTEL_PSTATE
> +     depends on SCHED_MC
> +     select X86_INTEL_PSTATE if CPU_SUP_INTEL
> +     select X86_AMD_PSTATE if CPU_SUP_AMD && ACPI
>       select CPU_FREQ
>       default y
>       help
> --
> 2.34.1

Reviewed-by: Perry Yuan <perry.yuan@amd.com>

Tor Vic Nov. 30, 2023, 2:35 p.m. UTC | #4
On 11/29/23 07:54, Meng Li wrote:
> Hi all:
> 
> The core frequency is subjected to the process variation in semiconductors.
> Not all cores are able to reach the maximum frequency respecting the
> infrastructure limits. Consequently, AMD has redefined the concept of
> maximum frequency of a part. This means that a fraction of cores can reach
> maximum frequency. To find the best process scheduling policy for a given
> scenario, OS needs to know the core ordering informed by the platform through
> highest performance capability register of the CPPC interface.
> 
> Earlier implementations of amd-pstate preferred core only support a static
> core ranking and targeted performance. Now it has the ability to dynamically
> change the preferred core based on the workload and platform conditions and
> accounting for thermals and aging.
> 
> Amd-pstate driver utilizes the functions and data structures provided by
> the ITMT architecture to enable the scheduler to favor scheduling on cores
> which can be get a higher frequency with lower voltage.
> We call it amd-pstate preferred core.
> 
> Here sched_set_itmt_core_prio() is called to set priorities and
> sched_set_itmt_support() is called to enable ITMT feature.
> Amd-pstate driver uses the highest performance value to indicate
> the priority of CPU. The higher value has a higher priority.
> 
> Amd-pstate driver will provide an initial core ordering at boot time.
> It relies on the CPPC interface to communicate the core ranking to the
> operating system and scheduler to make sure that OS is choosing the cores
> with highest performance firstly for scheduling the process. When amd-pstate
> driver receives a message with the highest performance change, it will
> update the core ranking.
> 
> Changes from V10->V11:
> - cpufreq: amd-pstate:
> - - according Perry's commnts, I replace the string with str_enabled_disable().
> 

Hi,

Using clang-17, I get the following warning:

----
   drivers/cpufreq/amd-pstate.c:793:34: warning: variable 'cpudata' is uninitialized when used here [-Wuninitialized]
     793 |         if ((!amd_pstate_prefcore) || (!cpudata->hw_prefcore))
         |                                         ^~~~~~~
   drivers/cpufreq/amd-pstate.c:789:29: note: initialize the variable 'cpudata' to silence this warning
     789 |         struct amd_cpudata *cpudata;
         |                                    ^
         |                                     = NULL
   1 warning generated.
----

Cheers,
Tor
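
The warning above points at the ordering in amd_pstate_update_highest_perf()
from patch 5/7: cpudata->hw_prefcore is read before cpudata is assigned from
policy->driver_data. One possible reordering is sketched below as a
user-space model. It is not the fix that was posted or merged; the model_*
types, the global flag and the recorded priority are hypothetical stand-ins
for the kernel objects, and the NULL check on the policy is an added
assumption about what cpufreq_cpu_get() may return.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_CPPC_MAX_PERF 255u	/* assumed value of CPPC_MAX_PERF */

/* Hypothetical stand-ins for struct amd_cpudata and struct cpufreq_policy. */
struct model_cpudata {
	bool hw_prefcore;
	uint32_t prefcore_ranking;
};

struct model_policy {
	struct model_cpudata *driver_data;
};

static bool model_prefcore_enabled = true;	/* models amd_pstate_prefcore */
static int model_last_prio = -1;		/* models the value passed to
						 * sched_set_itmt_core_prio() */

/* Returns true when the stored ranking was updated. */
static bool model_update_highest_perf(struct model_policy *policy,
				      uint32_t cur_high)
{
	struct model_cpudata *cpudata;
	uint32_t prev_high;

	if (policy == NULL)
		return false;

	/* Assign cpudata first, so the hw_prefcore test reads valid memory. */
	cpudata = policy->driver_data;
	if (!model_prefcore_enabled || !cpudata->hw_prefcore)
		return false;

	prev_high = cpudata->prefcore_ranking;
	if (prev_high == cur_high)
		return false;

	cpudata->prefcore_ranking = cur_high;
	if (cur_high < MODEL_CPPC_MAX_PERF)
		model_last_prio = (int)cur_high;

	return true;
}
```

The only behavioral difference from the posted hunk is the order of
operations: the policy lookup and cpudata assignment happen before the
hw_prefcore check, which is what the clang diagnostic asks for.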