[V8,0/7] amd-pstate preferred core

Message ID 20231009024932.2563622-1-li.meng@amd.com

Message

Meng, Li (Jassmine) Oct. 9, 2023, 2:49 a.m. UTC
Hi all:

The core frequency is subject to process variation in semiconductors.
Not all cores are able to reach the maximum frequency while respecting the
infrastructure limits. Consequently, AMD has redefined the concept of the
maximum frequency of a part: only a fraction of the cores can reach the
maximum frequency. To find the best process scheduling policy for a given
scenario, the OS needs to know the core ordering, which the platform reports
through the highest performance capability register of the CPPC interface.

Earlier implementations of amd-pstate preferred core only supported a static
core ranking and targeted performance. The preferred core can now be changed
dynamically, based on the workload and platform conditions and accounting for
thermals and aging.

The amd-pstate driver uses the functions and data structures provided by
the ITMT architecture to enable the scheduler to favor scheduling on cores
which can reach a higher frequency at a lower voltage.
We call this amd-pstate preferred core.

Here sched_set_itmt_core_prio() is called to set priorities and
sched_set_itmt_support() is called to enable the ITMT feature.
The amd-pstate driver uses the highest performance value to indicate
the priority of a CPU. A higher value means a higher priority.

The amd-pstate driver provides an initial core ordering at boot time.
It relies on the CPPC interface to communicate the core ranking to the
operating system and scheduler, so that the OS chooses the cores with the
highest performance first when scheduling a process. When the amd-pstate
driver receives a message that the highest performance has changed, it
updates the core ranking.
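
As a rough illustration of the ranking rule described above (a user-space
sketch, not driver code): each CPU's highest performance value is treated as
its priority, and the core with the largest value is preferred. The struct
`core_rank` and the helper `pick_preferred_core()` are hypothetical names made
up for this example, and the tie-break on the lowest index is an assumption;
in the driver the priorities are handed to the scheduler through
sched_set_itmt_core_prio().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space model: one entry per CPU. */
struct core_rank {
	unsigned int cpu;
	uint32_t highest_perf;	/* value reported via CPPC, used as priority */
};

/*
 * Return the index of the entry with the largest highest_perf.
 * Ties go to the lowest index (an assumption of this sketch).
 */
static size_t pick_preferred_core(const struct core_rank *cores, size_t n)
{
	size_t best = 0;

	assert(cores != NULL && n > 0);

	for (size_t i = 1; i < n; i++) {
		if (cores[i].highest_perf > cores[best].highest_perf)
			best = i;
	}

	return best;
}
```

With made-up values 166, 196, 181 and 196 for CPUs 0 to 3, the helper returns
index 1. If the platform later raises CPU 2 to 201, calling it again returns
index 2, which mirrors the dynamic re-ranking this series adds.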

Changes from V7->V8:
- all:
- - pick up Reviewed-by tags added by Mario and Ray.
- cpufreq: amd-pstate:
- - embed hw_prefcore into the cpudata structure.
- - delete preferred core init from cpu online/off.

Changes from V6->V7:
- x86:
- - Modify kconfig about X86_AMD_PSTATE.
- cpufreq: amd-pstate:
- - modify incorrect comments about scheduler_work().
- - convert highest_perf data type.
- - modify preferred core init when cpu init and online.
- acpi: cppc:
- - modify link of CPPC highest performance.
- cpufreq:
- - modify link of CPPC highest performance changed.

Changes from V5->V6:
- cpufreq: amd-pstate:
- - modify the wrong tag order.
- - modify warning about hw_prefcore sysfs attribute.
- - delete duplicate comments.
- - modify the variable name cppc_highest_perf to prefcore_ranking.
- - modify judgment conditions for setting highest_perf.
- - modify sysfs attribute for CPPC highest perf to pr_debug message.
- Documentation: amd-pstate:
- - modify warning: title underline too short.

Changes from V4->V5:
- cpufreq: amd-pstate:
- - modify sysfs attribute for CPPC highest perf.
- - modify warning about comments
- - rebase linux-next
- cpufreq: 
- - Modify warning about function declarations.
- Documentation: amd-pstate:
- - align with ``amd-pstat``

Changes from V3->V4:
- Documentation: amd-pstate:
- - Modify inappropriate descriptions.

Changes from V2->V3:
- x86:
- - Modify kconfig and description.
- cpufreq: amd-pstate: 
- - Add Co-developed-by tag in commit message.
- cpufreq:
- - Modify commit message.
- Documentation: amd-pstate:
- - Modify inappropriate descriptions.

Changes from V1->V2:
- acpi: cppc:
- - Add reference link.
- cpufreq:
- - Modify link error.
- cpufreq: amd-pstate: 
- - Init the priorities of all online CPUs
- - Use a single variable to represent the status of preferred core.
- Documentation:
- - Default enabled preferred core.
- Documentation: amd-pstate: 
- - Modify inappropriate descriptions.
- - Default enabled preferred core.
- - Use a single variable to represent the status of preferred core.

Meng Li (7):
  x86: Drop CPU_SUP_INTEL from SCHED_MC_PRIO for the expansion.
  acpi: cppc: Add get the highest performance cppc control
  cpufreq: amd-pstate: Enable amd-pstate preferred core supporting.
  cpufreq: Add a notification message that the highest perf has changed
  cpufreq: amd-pstate: Update amd-pstate preferred core ranking
    dynamically
  Documentation: amd-pstate: introduce amd-pstate preferred core
  Documentation: introduce amd-pstate preferrd core mode kernel command
    line options

 .../admin-guide/kernel-parameters.txt         |   5 +
 Documentation/admin-guide/pm/amd-pstate.rst   |  59 +++++-
 arch/x86/Kconfig                              |   5 +-
 drivers/acpi/cppc_acpi.c                      |  13 ++
 drivers/acpi/processor_driver.c               |   6 +
 drivers/cpufreq/amd-pstate.c                  | 186 ++++++++++++++++--
 drivers/cpufreq/cpufreq.c                     |  13 ++
 include/acpi/cppc_acpi.h                      |   5 +
 include/linux/amd-pstate.h                    |  10 +
 include/linux/cpufreq.h                       |   5 +
 10 files changed, 285 insertions(+), 22 deletions(-)

Comments

Wyes Karny Oct. 9, 2023, 6:19 a.m. UTC | #1
Hi Meng Li,

On 09 Oct 10:49, Meng Li wrote:
> Preferred core rankings can be changed dynamically by the
> platform based on the workload and platform conditions and
> accounting for thermals and aging.
> When this occurs, the CPU priority needs to be set.
> 
> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
> Reviewed-by: Wyes Karny <wyes.karny@amd.com>
> Reviewed-by: Huang Rui <ray.huang@amd.com>
> Signed-off-by: Meng Li <li.meng@amd.com>
> ---
>  drivers/cpufreq/amd-pstate.c | 34 ++++++++++++++++++++++++++++++++--
>  include/linux/amd-pstate.h   |  6 ++++++
>  2 files changed, 38 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
> index 6ac8939fce5a..d3369247c6c9 100644
> --- a/drivers/cpufreq/amd-pstate.c
> +++ b/drivers/cpufreq/amd-pstate.c
> @@ -313,6 +313,7 @@ static int pstate_init_perf(struct amd_cpudata *cpudata)
>  	WRITE_ONCE(cpudata->nominal_perf, AMD_CPPC_NOMINAL_PERF(cap1));
>  	WRITE_ONCE(cpudata->lowest_nonlinear_perf, AMD_CPPC_LOWNONLIN_PERF(cap1));
>  	WRITE_ONCE(cpudata->lowest_perf, AMD_CPPC_LOWEST_PERF(cap1));
> +	WRITE_ONCE(cpudata->prefcore_ranking, AMD_CPPC_HIGHEST_PERF(cap1));
>  
>  	return 0;
>  }
> @@ -334,6 +335,7 @@ static int cppc_init_perf(struct amd_cpudata *cpudata)
>  	WRITE_ONCE(cpudata->lowest_nonlinear_perf,
>  		   cppc_perf.lowest_nonlinear_perf);
>  	WRITE_ONCE(cpudata->lowest_perf, cppc_perf.lowest_perf);
> +	WRITE_ONCE(cpudata->prefcore_ranking, cppc_perf.highest_perf);
>  
>  	if (cppc_state == AMD_PSTATE_ACTIVE)
>  		return 0;
> @@ -540,7 +542,7 @@ static void amd_pstate_adjust_perf(unsigned int cpu,
>  	if (target_perf < capacity)
>  		des_perf = DIV_ROUND_UP(cap_perf * target_perf, capacity);
>  
> -	min_perf = READ_ONCE(cpudata->highest_perf);
> +	min_perf = READ_ONCE(cpudata->lowest_perf);

This seems to be a fix. So, this could be a separate patch.

>  	if (_min_perf < capacity)
>  		min_perf = DIV_ROUND_UP(cap_perf * _min_perf, capacity);
>  
> @@ -760,6 +762,32 @@ static void amd_pstate_init_prefcore(struct amd_cpudata *cpudata)
>  	}
>  }
>  
> +static void amd_pstate_update_highest_perf(unsigned int cpu)
> +{
> +	struct cpufreq_policy *policy;
> +	struct amd_cpudata *cpudata;
> +	u32 prev_high = 0, cur_high = 0;
> +	int ret;
> +
> +	if ((!amd_pstate_prefcore) || (!cpudata->hw_prefcore))
> +		return;
> +
> +	ret = amd_pstate_get_highest_perf(cpu, &cur_high);
> +	if (ret)
> +		return;
> +
> +	policy = cpufreq_cpu_get(cpu);
> +	cpudata = policy->driver_data;
> +	prev_high = READ_ONCE(cpudata->prefcore_ranking);
> +
> +	if (prev_high != cur_high) {
> +		WRITE_ONCE(cpudata->prefcore_ranking, cur_high);
> +		sched_set_itmt_core_prio(cur_high, cpu);
> +	}
> +
> +	cpufreq_cpu_put(policy);
> +}
> +
>  static int amd_pstate_cpu_init(struct cpufreq_policy *policy)
>  {
>  	int min_freq, max_freq, nominal_freq, lowest_nonlinear_freq, ret;
> @@ -926,7 +954,7 @@ static ssize_t show_amd_pstate_highest_perf(struct cpufreq_policy *policy,
>  	u32 perf;
>  	struct amd_cpudata *cpudata = policy->driver_data;
>  
> -	perf = READ_ONCE(cpudata->highest_perf);
> +	perf = READ_ONCE(cpudata->prefcore_ranking);

I think this should show cpudata->highest_perf.

Thanks,
Wyes
>  
>  	return sysfs_emit(buf, "%u\n", perf);
>  }
> @@ -1502,6 +1530,7 @@ static struct cpufreq_driver amd_pstate_driver = {
>  	.suspend	= amd_pstate_cpu_suspend,
>  	.resume		= amd_pstate_cpu_resume,
>  	.set_boost	= amd_pstate_set_boost,
> +	.update_highest_perf	= amd_pstate_update_highest_perf,
>  	.name		= "amd-pstate",
>  	.attr		= amd_pstate_attr,
>  };
> @@ -1516,6 +1545,7 @@ static struct cpufreq_driver amd_pstate_epp_driver = {
>  	.online		= amd_pstate_epp_cpu_online,
>  	.suspend	= amd_pstate_epp_suspend,
>  	.resume		= amd_pstate_epp_resume,
> +	.update_highest_perf	= amd_pstate_update_highest_perf,
>  	.name		= "amd-pstate-epp",
>  	.attr		= amd_pstate_epp_attr,
>  };
> diff --git a/include/linux/amd-pstate.h b/include/linux/amd-pstate.h
> index 87e140e9e6db..426822612373 100644
> --- a/include/linux/amd-pstate.h
> +++ b/include/linux/amd-pstate.h
> @@ -39,11 +39,16 @@ struct amd_aperf_mperf {
>   * @cppc_req_cached: cached performance request hints
>   * @highest_perf: the maximum performance an individual processor may reach,
>   *		  assuming ideal conditions
> + *		  For platforms that do not support the preferred core feature, the
> + *		  highest_perf may be configured as 166 or 255. To avoid calculating
> + *		  the max frequency wrongly, we take the fixed value as the highest_perf.
>   * @nominal_perf: the maximum sustained performance level of the processor,
>   *		  assuming ideal operating conditions
>   * @lowest_nonlinear_perf: the lowest performance level at which nonlinear power
>   *			   savings are achieved
>   * @lowest_perf: the absolute lowest performance level of the processor
> + * @prefcore_ranking: the preferred core ranking, the higher value indicates a higher
> + * 		  priority.
>   * @max_freq: the frequency that mapped to highest_perf
>   * @min_freq: the frequency that mapped to lowest_perf
>   * @nominal_freq: the frequency that mapped to nominal_perf
> @@ -73,6 +78,7 @@ struct amd_cpudata {
>  	u32	nominal_perf;
>  	u32	lowest_nonlinear_perf;
>  	u32	lowest_perf;
> +	u32     prefcore_ranking;
>  
>  	u32	max_freq;
>  	u32	min_freq;
> -- 
> 2.34.1
>
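
For reference, the update path in the quoted hunk can be modelled in user
space roughly as follows. This is only a sketch under stated assumptions:
`struct cpu_state`, `set_core_prio()` and `update_ranking()` are hypothetical
stand-ins for the driver's cpudata, sched_set_itmt_core_prio() and
amd_pstate_update_highest_perf(). Note that in the hunk as posted the
`hw_prefcore` check appears before `cpudata` is assigned from
`policy->driver_data`; the sketch takes the per-CPU state as an argument so
the check always sees a valid pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU state, modelled on the fields used in the patch. */
struct cpu_state {
	bool hw_prefcore;		/* hardware reports preferred core support */
	uint32_t prefcore_ranking;	/* last ranking handed to the scheduler */
};

/* Hypothetical stand-in for sched_set_itmt_core_prio(); only counts calls. */
static unsigned int prio_updates;

static void set_core_prio(uint32_t prio, unsigned int cpu)
{
	(void)prio;
	(void)cpu;
	prio_updates++;
}

/*
 * Push a new ranking to the scheduler only when the feature is active and
 * the value actually changed. Returns true if an update was made.
 */
static bool update_ranking(struct cpu_state *st, unsigned int cpu,
			   bool prefcore_enabled, uint32_t cur_high)
{
	assert(st != NULL);

	if (!prefcore_enabled || !st->hw_prefcore)
		return false;

	if (st->prefcore_ranking == cur_high)
		return false;

	st->prefcore_ranking = cur_high;
	set_core_prio(cur_high, cpu);
	return true;
}
```

Comparing against the cached ranking first means a notification that carries
an unchanged value does not touch the scheduler priorities at all.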
Wyes Karny Oct. 9, 2023, 6:19 a.m. UTC | #2
On 09 Oct 10:49, Meng Li wrote:
> The amd-pstate driver uses the functions and data structures
> provided by the ITMT architecture to enable the scheduler to
> favor scheduling on cores which can reach a higher frequency
> at a lower voltage. We call this amd-pstate preferred core.
> 
> Here sched_set_itmt_core_prio() is called to set priorities and
> sched_set_itmt_support() is called to enable ITMT feature.
> amd-pstate driver uses the highest performance value to indicate
> the priority of CPU. The higher value has a higher priority.
> 
> The initial core rankings are set up by amd-pstate when the
> system boots.
> 
> Add a variable hw_prefcore to the cpudata structure. It records
> whether the processor and power firmware support the preferred
> core feature.
> 
> Add one new early parameter `disable` to allow the user to disable
> the preferred core.
> 
> Only when the hardware supports preferred core and the user has not
> disabled it via the early parameter does the amd-pstate driver
> support the preferred core feature.
> 
> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
> Co-developed-by: Perry Yuan <Perry.Yuan@amd.com>
> Signed-off-by: Perry Yuan <Perry.Yuan@amd.com>
> Signed-off-by: Meng Li <li.meng@amd.com>
> ---
>  drivers/cpufreq/amd-pstate.c | 152 +++++++++++++++++++++++++++++++----
>  include/linux/amd-pstate.h   |   4 +
>  2 files changed, 140 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
> index 9a1e194d5cf8..6ac8939fce5a 100644
> --- a/drivers/cpufreq/amd-pstate.c
> +++ b/drivers/cpufreq/amd-pstate.c
> @@ -37,6 +37,7 @@
>  #include <linux/uaccess.h>
>  #include <linux/static_call.h>
>  #include <linux/amd-pstate.h>
> +#include <linux/topology.h>
>  
>  #include <acpi/processor.h>
>  #include <acpi/cppc_acpi.h>
> @@ -49,6 +50,8 @@
>  
>  #define AMD_PSTATE_TRANSITION_LATENCY	20000
>  #define AMD_PSTATE_TRANSITION_DELAY	1000
> +#define AMD_PSTATE_PREFCORE_THRESHOLD	166
> +#define AMD_PSTATE_MAX_CPPC_PERF	255
>  
>  /*
>   * TODO: We need more time to fine tune processors with shared memory solution
> @@ -64,6 +67,7 @@ static struct cpufreq_driver amd_pstate_driver;
>  static struct cpufreq_driver amd_pstate_epp_driver;
>  static int cppc_state = AMD_PSTATE_UNDEFINED;
>  static bool cppc_enabled;
> +static bool amd_pstate_prefcore = true;
>  
>  /*
>   * AMD Energy Preference Performance (EPP)
> @@ -290,23 +294,21 @@ static inline int amd_pstate_enable(bool enable)
>  static int pstate_init_perf(struct amd_cpudata *cpudata)
>  {
>  	u64 cap1;
> -	u32 highest_perf;
>  
>  	int ret = rdmsrl_safe_on_cpu(cpudata->cpu, MSR_AMD_CPPC_CAP1,
>  				     &cap1);
>  	if (ret)
>  		return ret;
>  
> -	/*
> -	 * TODO: Introduce AMD specific power feature.
> -	 *
> -	 * CPPC entry doesn't indicate the highest performance in some ASICs.
> +	/* For platforms that do not support the preferred core feature, the
> +	 * highest_perf may be configured as 166 or 255. To avoid calculating
> +	 * the max frequency wrongly, we take the AMD_CPPC_HIGHEST_PERF(cap1)
> +	 * value as the default max perf.
>  	 */
> -	highest_perf = amd_get_highest_perf();
> -	if (highest_perf > AMD_CPPC_HIGHEST_PERF(cap1))
> -		highest_perf = AMD_CPPC_HIGHEST_PERF(cap1);
> -
> -	WRITE_ONCE(cpudata->highest_perf, highest_perf);
> +	if (cpudata->hw_prefcore)
> +		WRITE_ONCE(cpudata->highest_perf, AMD_PSTATE_PREFCORE_THRESHOLD);
> +	else
> +		WRITE_ONCE(cpudata->highest_perf, AMD_CPPC_HIGHEST_PERF(cap1));
>  
>  	WRITE_ONCE(cpudata->nominal_perf, AMD_CPPC_NOMINAL_PERF(cap1));
>  	WRITE_ONCE(cpudata->lowest_nonlinear_perf, AMD_CPPC_LOWNONLIN_PERF(cap1));
> @@ -318,17 +320,15 @@ static int pstate_init_perf(struct amd_cpudata *cpudata)
>  static int cppc_init_perf(struct amd_cpudata *cpudata)
>  {
>  	struct cppc_perf_caps cppc_perf;
> -	u32 highest_perf;
>  
>  	int ret = cppc_get_perf_caps(cpudata->cpu, &cppc_perf);
>  	if (ret)
>  		return ret;
>  
> -	highest_perf = amd_get_highest_perf();
> -	if (highest_perf > cppc_perf.highest_perf)
> -		highest_perf = cppc_perf.highest_perf;
> -
> -	WRITE_ONCE(cpudata->highest_perf, highest_perf);
> +	if (cpudata->hw_prefcore)
> +		WRITE_ONCE(cpudata->highest_perf, AMD_PSTATE_PREFCORE_THRESHOLD);
> +	else
> +		WRITE_ONCE(cpudata->highest_perf, cppc_perf.highest_perf);
>  
>  	WRITE_ONCE(cpudata->nominal_perf, cppc_perf.nominal_perf);
>  	WRITE_ONCE(cpudata->lowest_nonlinear_perf,
> @@ -676,6 +676,90 @@ static void amd_perf_ctl_reset(unsigned int cpu)
>  	wrmsrl_on_cpu(cpu, MSR_AMD_PERF_CTL, 0);
>  }
>  
> +/*
> + * Set amd-pstate preferred core enable can't be done directly from cpufreq callbacks
> + * due to locking, so queue the work for later.
> + */
> +static void amd_pstste_sched_prefcore_workfn(struct work_struct *work)
> +{
> +	sched_set_itmt_support();
> +}
> +static DECLARE_WORK(sched_prefcore_work, amd_pstste_sched_prefcore_workfn);
> +
> +/*
> + * Get the highest performance register value.
> + * @cpu: CPU from which to get highest performance.
> + * @highest_perf: Return address.
> + *
> + * Return: 0 for success, -EIO otherwise.
> + */
> +static int amd_pstate_get_highest_perf(int cpu, u32 *highest_perf)
> +{
> +	int ret;
> +	u64 cppc_highest_perf;
> +
> +	if (boot_cpu_has(X86_FEATURE_CPPC)) {
> +		u64 cap1;
> +
> +		ret = rdmsrl_safe_on_cpu(cpu, MSR_AMD_CPPC_CAP1, &cap1);
> +		if (ret)
> +			return ret;
> +		WRITE_ONCE(*highest_perf, AMD_CPPC_HIGHEST_PERF(cap1));
> +	} else {
> +		ret = cppc_get_highest_perf(cpu, &cppc_highest_perf);
> +		*highest_perf = (u32)(cppc_highest_perf & 0xFFFF);
> +	}
> +
> +	return (ret);
> +}
> +
> +static void amd_pstate_init_prefcore(struct amd_cpudata *cpudata)
> +{
> +	int ret;
> +	u32 highest_perf;
> +	static u32 max_highest_perf = 0, min_highest_perf = U32_MAX;
> +
> +	ret = amd_pstate_get_highest_perf(cpudata->cpu, &highest_perf);
> +	if (ret)
> +		return;
> +
> +	cpudata->hw_prefcore = true;
> +	/* check if CPPC preferred core feature is enabled*/
> +	if (highest_perf == AMD_PSTATE_MAX_CPPC_PERF) {
> +		pr_debug("AMD CPPC preferred core is unsupported!\n");
> +		cpudata->hw_prefcore = false;
> +		return;
> +	}
> +
> +	if (!amd_pstate_prefcore)
> +		return;
> +
> +	/*
> +	 * The priorities can be set regardless of whether or not
> +	 * sched_set_itmt_support(true) has been called and it is valid to
> +	 * update them at any time after it has been called.
> +	 */
> +	sched_set_itmt_core_prio(highest_perf, cpudata->cpu);
> +
> +	if (max_highest_perf <= min_highest_perf) {
> +		if (highest_perf > max_highest_perf)
> +			max_highest_perf = highest_perf;
> +
> +		if (highest_perf < min_highest_perf)
> +			min_highest_perf = highest_perf;
> +
> +		if (max_highest_perf > min_highest_perf) {
> +			/*
> +			 * This code can be run during CPU online under the
> +			 * CPU hotplug locks, so sched_set_itmt_support()
> +			 * cannot be called from here.  Queue up a work item
> +			 * to invoke it.
> +			 */
> +			schedule_work(&sched_prefcore_work);
> +		}
> +	}
> +}
> +
>  static int amd_pstate_cpu_init(struct cpufreq_policy *policy)
>  {
>  	int min_freq, max_freq, nominal_freq, lowest_nonlinear_freq, ret;
> @@ -697,6 +781,8 @@ static int amd_pstate_cpu_init(struct cpufreq_policy *policy)
>  
>  	cpudata->cpu = policy->cpu;
>  
> +	amd_pstate_init_prefcore(cpudata);
> +
>  	ret = amd_pstate_init_perf(cpudata);
>  	if (ret)
>  		goto free_cpudata1;
> @@ -845,6 +931,17 @@ static ssize_t show_amd_pstate_highest_perf(struct cpufreq_policy *policy,
>  	return sysfs_emit(buf, "%u\n", perf);
>  }
>  
> +static ssize_t show_amd_pstate_hw_prefcore(struct cpufreq_policy *policy,
> +					   char *buf)
> +{
> +	bool hw_prefcore;
> +	struct amd_cpudata *cpudata = policy->driver_data;
> +
> +	hw_prefcore = READ_ONCE(cpudata->hw_prefcore);
> +
> +	return sysfs_emit(buf, "%s\n", hw_prefcore ? "supported" : "unsupported");
> +}
> +
>  static ssize_t show_energy_performance_available_preferences(
>  				struct cpufreq_policy *policy, char *buf)
>  {
> @@ -1037,18 +1134,27 @@ static ssize_t status_store(struct device *a, struct device_attribute *b,
>  	return ret < 0 ? ret : count;
>  }
>  
> +static ssize_t prefcore_show(struct device *dev,
> +			     struct device_attribute *attr, char *buf)
> +{
> +	return sysfs_emit(buf, "%s\n", amd_pstate_prefcore ? "enabled" : "disabled");
> +}
> +
>  cpufreq_freq_attr_ro(amd_pstate_max_freq);
>  cpufreq_freq_attr_ro(amd_pstate_lowest_nonlinear_freq);
>  
>  cpufreq_freq_attr_ro(amd_pstate_highest_perf);
> +cpufreq_freq_attr_ro(amd_pstate_hw_prefcore);
>  cpufreq_freq_attr_rw(energy_performance_preference);
>  cpufreq_freq_attr_ro(energy_performance_available_preferences);
>  static DEVICE_ATTR_RW(status);
> +static DEVICE_ATTR_RO(prefcore);
>  
>  static struct freq_attr *amd_pstate_attr[] = {
>  	&amd_pstate_max_freq,
>  	&amd_pstate_lowest_nonlinear_freq,
>  	&amd_pstate_highest_perf,
> +	&amd_pstate_hw_prefcore,
>  	NULL,
>  };
>  
> @@ -1056,6 +1162,7 @@ static struct freq_attr *amd_pstate_epp_attr[] = {
>  	&amd_pstate_max_freq,
>  	&amd_pstate_lowest_nonlinear_freq,
>  	&amd_pstate_highest_perf,
> +	&amd_pstate_hw_prefcore,
>  	&energy_performance_preference,
>  	&energy_performance_available_preferences,
>  	NULL,
> @@ -1063,6 +1170,7 @@ static struct freq_attr *amd_pstate_epp_attr[] = {
>  
>  static struct attribute *pstate_global_attributes[] = {
>  	&dev_attr_status.attr,
> +	&dev_attr_prefcore.attr,
>  	NULL
>  };
>  
> @@ -1114,6 +1222,8 @@ static int amd_pstate_epp_cpu_init(struct cpufreq_policy *policy)
>  	cpudata->cpu = policy->cpu;
>  	cpudata->epp_policy = 0;
>  
> +	amd_pstate_init_prefcore(cpudata);
> +
>  	ret = amd_pstate_init_perf(cpudata);
>  	if (ret)
>  		goto free_cpudata1;
> @@ -1527,7 +1637,17 @@ static int __init amd_pstate_param(char *str)
>  
>  	return amd_pstate_set_driver(mode_idx);
>  }
> +
> +static int __init amd_prefcore_param(char *str)
> +{
> +	if (!strcmp(str, "disable"))
> +		amd_pstate_prefcore = false;
> +
> +	return 0;
> +}
> +
>  early_param("amd_pstate", amd_pstate_param);
> +early_param("amd_prefcore", amd_prefcore_param);
>  
>  MODULE_AUTHOR("Huang Rui <ray.huang@amd.com>");
>  MODULE_DESCRIPTION("AMD Processor P-state Frequency Driver");
> diff --git a/include/linux/amd-pstate.h b/include/linux/amd-pstate.h
> index 446394f84606..87e140e9e6db 100644
> --- a/include/linux/amd-pstate.h
> +++ b/include/linux/amd-pstate.h
> @@ -52,6 +52,9 @@ struct amd_aperf_mperf {
>   * @prev: Last Aperf/Mperf/tsc count value read from register
>   * @freq: current cpu frequency value
>   * @boost_supported: check whether the Processor or SBIOS supports boost mode
> + * @hw_prefcore: check whether HW supports preferred core feature.
> + * 		  Only when hw_prefcore and early prefcore param are true,
> + * 		  AMD P-State driver supports preferred core feature.
>   * @epp_policy: Last saved policy used to set energy-performance preference
>   * @epp_cached: Cached CPPC energy-performance preference value
>   * @policy: Cpufreq policy value
> @@ -81,6 +84,7 @@ struct amd_cpudata {
>  
>  	u64	freq;
>  	bool	boost_supported;
> +	bool	hw_prefcore;
>  
>  	/* EPP feature related attributes*/
>  	s16	epp_policy;
> -- 
> 2.34.1
>
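
The detection logic in amd_pstate_init_prefcore() above can be summarised
with a small user-space model. It is a sketch, not the driver code: the names
`prefcore_tracker`, `tracker_init()`, `hw_supports_prefcore()` and
`tracker_observe()` are hypothetical. The model keeps two rules from the
patch: a highest performance value of 255 is treated as "preferred core
unsupported", and ITMT support is requested once, the first time two CPUs
report different values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CPPC_PERF	255u	/* mirrors AMD_PSTATE_MAX_CPPC_PERF in the patch */

/* Hypothetical tracker for the static max/min pair kept by the patch. */
struct prefcore_tracker {
	uint32_t max_perf;
	uint32_t min_perf;
};

static void tracker_init(struct prefcore_tracker *t)
{
	assert(t != NULL);
	t->max_perf = 0;
	t->min_perf = UINT32_MAX;
}

/* A value of 255 means the platform does not rank its cores. */
static bool hw_supports_prefcore(uint32_t highest_perf)
{
	return highest_perf != MAX_CPPC_PERF;
}

/*
 * Record one CPU's highest_perf. Returns true exactly once: when the
 * observed values first become asymmetric (max > min). That is the point
 * where the patch queues the work item calling sched_set_itmt_support().
 */
static bool tracker_observe(struct prefcore_tracker *t, uint32_t highest_perf)
{
	assert(t != NULL);

	if (t->max_perf > t->min_perf)
		return false;	/* asymmetry already seen earlier */

	if (highest_perf > t->max_perf)
		t->max_perf = highest_perf;
	if (highest_perf < t->min_perf)
		t->min_perf = highest_perf;

	return t->max_perf > t->min_perf;
}
```

On a system where every CPU reports the same value, tracker_observe() never
returns true, which corresponds to ITMT never being enabled because there is
no ordering for the scheduler to use.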
Meng, Li (Jassmine) Oct. 9, 2023, 7:23 a.m. UTC | #3
Hi Oleksandr:

> -----Original Message-----
> From: Oleksandr Natalenko <oleksandr@natalenko.name>
> Sent: Monday, October 9, 2023 2:55 PM
> To: Rafael J . Wysocki <rafael.j.wysocki@intel.com>; Huang, Ray
> <Ray.Huang@amd.com>; Meng, Li (Jassmine) <Li.Meng@amd.com>
> Cc: linux-pm@vger.kernel.org; linux-kernel@vger.kernel.org;
> x86@kernel.org; linux-acpi@vger.kernel.org; Shuah Khan
> <skhan@linuxfoundation.org>; linux-kselftest@vger.kernel.org; Fontenot,
> Nathan <Nathan.Fontenot@amd.com>; Sharma, Deepak
> <Deepak.Sharma@amd.com>; Deucher, Alexander
> <Alexander.Deucher@amd.com>; Limonciello, Mario
> <Mario.Limonciello@amd.com>; Huang, Shimmer
> <Shimmer.Huang@amd.com>; Yuan, Perry <Perry.Yuan@amd.com>; Du,
> Xiaojian <Xiaojian.Du@amd.com>; Viresh Kumar <viresh.kumar@linaro.org>;
> Borislav Petkov <bp@alien8.de>; Meng, Li (Jassmine) <Li.Meng@amd.com>
> Subject: Re: [PATCH V8 0/7] amd-pstate preferred core
>
> Hello.
>
> On Monday, 9 October 2023 4:49:25 CEST Meng Li wrote:
> > Changes from V7->V8:
> > - all:
> > - - pick up Reviewed-by tags added by Mario and Ray.
> > - cpufreq: amd-pstate:
> > - - embed hw_prefcore into the cpudata structure.
> > - - delete preferred core init from cpu online/off.
>
> Could you please let me know if this change means a fix for the report I've
> sent previously? [1]
>
[Meng, Li (Jassmine)] Yes.
I have deleted the online handler function of the amd-pstate driver,
so the preferred core is no longer re-initialized on CPU online.
That online function set an incorrect des_perf value.

> Would you also be able to Cc me on the next iteration of this patchset?
[Meng, Li (Jassmine)] OK.
>
> Thank you!
>
> [1] https://lore.kernel.org/lkml/5973628.lOV4Wx5bFT@natalenko.name/
>
Oleksandr Natalenko Oct. 9, 2023, 12:59 p.m. UTC | #4
Hello.

On Monday, 9 October 2023 9:23:29 CEST Meng, Li (Jassmine) wrote:
> Hi Oleksandr:
> 
> > -----Original Message-----
> > From: Oleksandr Natalenko <oleksandr@natalenko.name>
> > Sent: Monday, October 9, 2023 2:55 PM
> > To: Rafael J . Wysocki <rafael.j.wysocki@intel.com>; Huang, Ray
> > <Ray.Huang@amd.com>; Meng, Li (Jassmine) <Li.Meng@amd.com>
> > Cc: linux-pm@vger.kernel.org; linux-kernel@vger.kernel.org;
> > x86@kernel.org; linux-acpi@vger.kernel.org; Shuah Khan
> > <skhan@linuxfoundation.org>; linux-kselftest@vger.kernel.org; Fontenot,
> > Nathan <Nathan.Fontenot@amd.com>; Sharma, Deepak
> > <Deepak.Sharma@amd.com>; Deucher, Alexander
> > <Alexander.Deucher@amd.com>; Limonciello, Mario
> > <Mario.Limonciello@amd.com>; Huang, Shimmer
> > <Shimmer.Huang@amd.com>; Yuan, Perry <Perry.Yuan@amd.com>; Du,
> > Xiaojian <Xiaojian.Du@amd.com>; Viresh Kumar <viresh.kumar@linaro.org>;
> > Borislav Petkov <bp@alien8.de>; Meng, Li (Jassmine) <Li.Meng@amd.com>
> > Subject: Re: [PATCH V8 0/7] amd-pstate preferred core
> >
> > Hello.
> >
> > On Monday, 9 October 2023 4:49:25 CEST Meng Li wrote:
> > > Changes from V7->V8:
> > > - all:
> > > - - pick up Reviewed-by tags added by Mario and Ray.
> > > - cpufreq: amd-pstate:
> > > - - embed hw_prefcore into the cpudata structure.
> > > - - delete preferred core init from cpu online/off.
> >
> > Could you please let me know if this change means a fix for the report I've
> > sent previously? [1]
> >
> [Meng, Li (Jassmine)] Yes.
> I have deleted online handle function of amd pstate driver.
> It doesn't re-initialize preferred core.
> This online function will set incorrect des perf value.

Thank you for the confirmation. I've built v6.5.5 with this patchset applied, and now the frequency is as expected after the suspend-resume cycle.

I've also added the following modification to accommodate recent feedback:

```
commit 1450ac395434c532f995521e1a2497d09ddf106c
Author: Oleksandr Natalenko <oleksandr@natalenko.name>
Date:   Mon Oct 9 11:19:50 2023 +0200

    cpufreq/amd-pstate: show prefcore_ranking separately
    
    Signed-off-by: Oleksandr Natalenko <oleksandr@natalenko.name>

diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
index d3369247c6c9c..86999d861e87b 100644
--- a/drivers/cpufreq/amd-pstate.c
+++ b/drivers/cpufreq/amd-pstate.c
@@ -954,6 +954,17 @@ static ssize_t show_amd_pstate_highest_perf(struct cpufreq_policy *policy,
 	u32 perf;
 	struct amd_cpudata *cpudata = policy->driver_data;
 
+	perf = READ_ONCE(cpudata->highest_perf);
+
+	return sysfs_emit(buf, "%u\n", perf);
+}
+
+static ssize_t show_amd_pstate_prefcore_ranking(struct cpufreq_policy *policy,
+						char *buf)
+{
+	u32 perf;
+	struct amd_cpudata *cpudata = policy->driver_data;
+
 	perf = READ_ONCE(cpudata->prefcore_ranking);
 
 	return sysfs_emit(buf, "%u\n", perf);
@@ -1172,6 +1183,7 @@ cpufreq_freq_attr_ro(amd_pstate_max_freq);
 cpufreq_freq_attr_ro(amd_pstate_lowest_nonlinear_freq);
 
 cpufreq_freq_attr_ro(amd_pstate_highest_perf);
+cpufreq_freq_attr_ro(amd_pstate_prefcore_ranking);
 cpufreq_freq_attr_ro(amd_pstate_hw_prefcore);
 cpufreq_freq_attr_rw(energy_performance_preference);
 cpufreq_freq_attr_ro(energy_performance_available_preferences);
@@ -1182,6 +1194,7 @@ static struct freq_attr *amd_pstate_attr[] = {
 	&amd_pstate_max_freq,
 	&amd_pstate_lowest_nonlinear_freq,
 	&amd_pstate_highest_perf,
+	&amd_pstate_prefcore_ranking,
 	&amd_pstate_hw_prefcore,
 	NULL,
 };
@@ -1190,6 +1203,7 @@ static struct freq_attr *amd_pstate_epp_attr[] = {
 	&amd_pstate_max_freq,
 	&amd_pstate_lowest_nonlinear_freq,
 	&amd_pstate_highest_perf,
+	&amd_pstate_prefcore_ranking,
 	&amd_pstate_hw_prefcore,
 	&energy_performance_preference,
 	&energy_performance_available_preferences,
```

with the following output as a result:

```
[~]> grep . /sys/devices/system/cpu*/cpufreq/policy*/amd_pstate_highest_perf
/sys/devices/system/cpu/cpufreq/policy0/amd_pstate_highest_perf:166
/sys/devices/system/cpu/cpufreq/policy1/amd_pstate_highest_perf:166
/sys/devices/system/cpu/cpufreq/policy2/amd_pstate_highest_perf:166
/sys/devices/system/cpu/cpufreq/policy3/amd_pstate_highest_perf:166
/sys/devices/system/cpu/cpufreq/policy4/amd_pstate_highest_perf:166
/sys/devices/system/cpu/cpufreq/policy5/amd_pstate_highest_perf:166
/sys/devices/system/cpu/cpufreq/policy6/amd_pstate_highest_perf:166
/sys/devices/system/cpu/cpufreq/policy7/amd_pstate_highest_perf:166
/sys/devices/system/cpu/cpufreq/policy8/amd_pstate_highest_perf:166
/sys/devices/system/cpu/cpufreq/policy9/amd_pstate_highest_perf:166
/sys/devices/system/cpu/cpufreq/policy10/amd_pstate_highest_perf:166
/sys/devices/system/cpu/cpufreq/policy11/amd_pstate_highest_perf:166
/sys/devices/system/cpu/cpufreq/policy12/amd_pstate_highest_perf:166
/sys/devices/system/cpu/cpufreq/policy13/amd_pstate_highest_perf:166
/sys/devices/system/cpu/cpufreq/policy14/amd_pstate_highest_perf:166
/sys/devices/system/cpu/cpufreq/policy15/amd_pstate_highest_perf:166
/sys/devices/system/cpu/cpufreq/policy16/amd_pstate_highest_perf:166
/sys/devices/system/cpu/cpufreq/policy17/amd_pstate_highest_perf:166
/sys/devices/system/cpu/cpufreq/policy18/amd_pstate_highest_perf:166
/sys/devices/system/cpu/cpufreq/policy19/amd_pstate_highest_perf:166
/sys/devices/system/cpu/cpufreq/policy20/amd_pstate_highest_perf:166
/sys/devices/system/cpu/cpufreq/policy21/amd_pstate_highest_perf:166
/sys/devices/system/cpu/cpufreq/policy22/amd_pstate_highest_perf:166
/sys/devices/system/cpu/cpufreq/policy23/amd_pstate_highest_perf:166
/sys/devices/system/cpu/cpufreq/policy24/amd_pstate_highest_perf:166
/sys/devices/system/cpu/cpufreq/policy25/amd_pstate_highest_perf:166
/sys/devices/system/cpu/cpufreq/policy26/amd_pstate_highest_perf:166
/sys/devices/system/cpu/cpufreq/policy27/amd_pstate_highest_perf:166
/sys/devices/system/cpu/cpufreq/policy28/amd_pstate_highest_perf:166
/sys/devices/system/cpu/cpufreq/policy29/amd_pstate_highest_perf:166
/sys/devices/system/cpu/cpufreq/policy30/amd_pstate_highest_perf:166
/sys/devices/system/cpu/cpufreq/policy31/amd_pstate_highest_perf:166

[~]> grep . /sys/devices/system/cpu*/cpufreq/policy*/amd_pstate_hw_prefcore
/sys/devices/system/cpu/cpufreq/policy0/amd_pstate_hw_prefcore:supported
/sys/devices/system/cpu/cpufreq/policy1/amd_pstate_hw_prefcore:supported
/sys/devices/system/cpu/cpufreq/policy2/amd_pstate_hw_prefcore:supported
/sys/devices/system/cpu/cpufreq/policy3/amd_pstate_hw_prefcore:supported
/sys/devices/system/cpu/cpufreq/policy4/amd_pstate_hw_prefcore:supported
/sys/devices/system/cpu/cpufreq/policy5/amd_pstate_hw_prefcore:supported
/sys/devices/system/cpu/cpufreq/policy6/amd_pstate_hw_prefcore:supported
/sys/devices/system/cpu/cpufreq/policy7/amd_pstate_hw_prefcore:supported
/sys/devices/system/cpu/cpufreq/policy8/amd_pstate_hw_prefcore:supported
/sys/devices/system/cpu/cpufreq/policy9/amd_pstate_hw_prefcore:supported
/sys/devices/system/cpu/cpufreq/policy10/amd_pstate_hw_prefcore:supported
/sys/devices/system/cpu/cpufreq/policy11/amd_pstate_hw_prefcore:supported
/sys/devices/system/cpu/cpufreq/policy12/amd_pstate_hw_prefcore:supported
/sys/devices/system/cpu/cpufreq/policy13/amd_pstate_hw_prefcore:supported
/sys/devices/system/cpu/cpufreq/policy14/amd_pstate_hw_prefcore:supported
/sys/devices/system/cpu/cpufreq/policy15/amd_pstate_hw_prefcore:supported
/sys/devices/system/cpu/cpufreq/policy16/amd_pstate_hw_prefcore:supported
/sys/devices/system/cpu/cpufreq/policy17/amd_pstate_hw_prefcore:supported
/sys/devices/system/cpu/cpufreq/policy18/amd_pstate_hw_prefcore:supported
/sys/devices/system/cpu/cpufreq/policy19/amd_pstate_hw_prefcore:supported
/sys/devices/system/cpu/cpufreq/policy20/amd_pstate_hw_prefcore:supported
/sys/devices/system/cpu/cpufreq/policy21/amd_pstate_hw_prefcore:supported
/sys/devices/system/cpu/cpufreq/policy22/amd_pstate_hw_prefcore:supported
/sys/devices/system/cpu/cpufreq/policy23/amd_pstate_hw_prefcore:supported
/sys/devices/system/cpu/cpufreq/policy24/amd_pstate_hw_prefcore:supported
/sys/devices/system/cpu/cpufreq/policy25/amd_pstate_hw_prefcore:supported
/sys/devices/system/cpu/cpufreq/policy26/amd_pstate_hw_prefcore:supported
/sys/devices/system/cpu/cpufreq/policy27/amd_pstate_hw_prefcore:supported
/sys/devices/system/cpu/cpufreq/policy28/amd_pstate_hw_prefcore:supported
/sys/devices/system/cpu/cpufreq/policy29/amd_pstate_hw_prefcore:supported
/sys/devices/system/cpu/cpufreq/policy30/amd_pstate_hw_prefcore:supported
/sys/devices/system/cpu/cpufreq/policy31/amd_pstate_hw_prefcore:supported

[~]> grep . /sys/devices/system/cpu*/cpufreq/policy*/amd_pstate_prefcore_ranking
/sys/devices/system/cpu/cpufreq/policy0/amd_pstate_prefcore_ranking:226
/sys/devices/system/cpu/cpufreq/policy1/amd_pstate_prefcore_ranking:231
/sys/devices/system/cpu/cpufreq/policy2/amd_pstate_prefcore_ranking:211
/sys/devices/system/cpu/cpufreq/policy3/amd_pstate_prefcore_ranking:236
/sys/devices/system/cpu/cpufreq/policy4/amd_pstate_prefcore_ranking:216
/sys/devices/system/cpu/cpufreq/policy5/amd_pstate_prefcore_ranking:236
/sys/devices/system/cpu/cpufreq/policy6/amd_pstate_prefcore_ranking:206
/sys/devices/system/cpu/cpufreq/policy7/amd_pstate_prefcore_ranking:221
/sys/devices/system/cpu/cpufreq/policy8/amd_pstate_prefcore_ranking:191
/sys/devices/system/cpu/cpufreq/policy9/amd_pstate_prefcore_ranking:201
/sys/devices/system/cpu/cpufreq/policy10/amd_pstate_prefcore_ranking:186
/sys/devices/system/cpu/cpufreq/policy11/amd_pstate_prefcore_ranking:196
/sys/devices/system/cpu/cpufreq/policy12/amd_pstate_prefcore_ranking:171
/sys/devices/system/cpu/cpufreq/policy13/amd_pstate_prefcore_ranking:166
/sys/devices/system/cpu/cpufreq/policy14/amd_pstate_prefcore_ranking:176
/sys/devices/system/cpu/cpufreq/policy15/amd_pstate_prefcore_ranking:181
/sys/devices/system/cpu/cpufreq/policy16/amd_pstate_prefcore_ranking:226
/sys/devices/system/cpu/cpufreq/policy17/amd_pstate_prefcore_ranking:231
/sys/devices/system/cpu/cpufreq/policy18/amd_pstate_prefcore_ranking:211
/sys/devices/system/cpu/cpufreq/policy19/amd_pstate_prefcore_ranking:236
/sys/devices/system/cpu/cpufreq/policy20/amd_pstate_prefcore_ranking:216
/sys/devices/system/cpu/cpufreq/policy21/amd_pstate_prefcore_ranking:236
/sys/devices/system/cpu/cpufreq/policy22/amd_pstate_prefcore_ranking:206
/sys/devices/system/cpu/cpufreq/policy23/amd_pstate_prefcore_ranking:221
/sys/devices/system/cpu/cpufreq/policy24/amd_pstate_prefcore_ranking:191
/sys/devices/system/cpu/cpufreq/policy25/amd_pstate_prefcore_ranking:201
/sys/devices/system/cpu/cpufreq/policy26/amd_pstate_prefcore_ranking:186
/sys/devices/system/cpu/cpufreq/policy27/amd_pstate_prefcore_ranking:196
/sys/devices/system/cpu/cpufreq/policy28/amd_pstate_prefcore_ranking:171
/sys/devices/system/cpu/cpufreq/policy29/amd_pstate_prefcore_ranking:166
/sys/devices/system/cpu/cpufreq/policy30/amd_pstate_prefcore_ranking:176
/sys/devices/system/cpu/cpufreq/policy31/amd_pstate_prefcore_ranking:181
```

When I run `dd if=/dev/zero of=/dev/null` with the `schedutil` governor in use, the load lands on CPUs 3, 5, 19 or 21, in other words, on those with the highest `amd_pstate_prefcore_ranking` value (236).
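
For reference, here is a minimal sketch of how the top-ranked policies can be picked out of the output above. It assumes a shell with `grep -H`, `awk` and `sort`, and it relies on the `amd_pstate_prefcore_ranking` attribute, which only exists with the local modification shown earlier. The function name `top_ranked_policies` and its optional root-directory argument are made up for illustration and are not part of the patchset:

```shell
#!/bin/sh
# Print "<policy number> <ranking>" for every cpufreq policy whose
# amd_pstate_prefcore_ranking equals the highest value found.
# $1 (optional, hypothetical): cpufreq root directory; defaults to sysfs.
top_ranked_policies() {
    root="${1:-/sys/devices/system/cpu/cpufreq}"
    # grep -H prints "path:value" even when only one file matches.
    grep -H . "$root"/policy*/amd_pstate_prefcore_ranking 2>/dev/null |
        awk -F: '
            {
                # The directory name before the attribute is "policyN".
                n = split($1, parts, "/")
                name = parts[n - 1]
                sub(/^policy/, "", name)
                rank[name] = $2 + 0
                if (rank[name] > max)
                    max = rank[name]
            }
            END {
                for (p in rank)
                    if (rank[p] == max)
                        print p, max
            }
        ' | sort -n
}
```

Called without arguments, it reads the real sysfs tree. With the values listed above it should print policies 3, 5, 19 and 21, each with the value 236; in this output every policy number matches a single CPU number.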

If all of the above is as expected, please add:

Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>

> > Would you also be able to Cc me on the next iteration of this patchset?
> [Meng, Li (Jassmine)] OK.

Thanks.

> >
> > Thank you!
> >
> > [1] https://lore.kernel.org/lkml/5973628.lOV4Wx5bFT@natalenko.name/
> >
Meng, Li (Jassmine) Oct. 10, 2023, 2:15 a.m. UTC | #5
Hi Oleksandr:

> When I run `dd if=/dev/zero of=/dev/null`, the load lands onto cores 3, 5, 19
> or 21, IOW, those that have the highest `amd_pstate_prefcore_ranking`
> value given `schedutil` is in use.
>
> If all of the above is as expected, please add:
>
> Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
>
> > > Would you also be able to Cc me on the next iteration of this patchset?
> > [Meng, Li (Jassmine)] OK.
>
> Thanks.
>
[Meng, Li (Jassmine)]
Thanks a lot.
Following Wyes's suggestion, I have also made a similar modification in the next patches.
All of the log output above is as expected.

> > >
> > > Thank you!
> > >
> > > [1] https://lore.kernel.org/lkml/5973628.lOV4Wx5bFT@natalenko.name/
> > >
Huang Rui Oct. 10, 2023, 3:01 a.m. UTC | #6
On Mon, Oct 09, 2023 at 10:49:25AM +0800, Meng Li wrote:
> Hi all:
> 
> The core frequency is subjected to the process variation in semiconductors.
> Not all cores are able to reach the maximum frequency respecting the
> infrastructure limits. Consequently, AMD has redefined the concept of
> maximum frequency of a part. This means that a fraction of cores can reach
> maximum frequency. To find the best process scheduling policy for a given
> scenario, OS needs to know the core ordering informed by the platform through
> highest performance capability register of the CPPC interface.
> 
> Earlier implementations of amd-pstate preferred core only support a static
> core ranking and targeted performance. Now it has the ability to dynamically
> change the preferred core based on the workload and platform conditions and
> accounting for thermals and aging.
> 
> Amd-pstate driver utilizes the functions and data structures provided by
> the ITMT architecture to enable the scheduler to favor scheduling on cores
> which can be get a higher frequency with lower voltage.
> We call it amd-pstate preferred core.
> 
> Here sched_set_itmt_core_prio() is called to set priorities and
> sched_set_itmt_support() is called to enable ITMT feature.
> Amd-pstate driver uses the highest performance value to indicate
> the priority of CPU. The higher value has a higher priority.
> 
> Amd-pstate driver will provide an initial core ordering at boot time.
> It relies on the CPPC interface to communicate the core ranking to the
> operating system and scheduler to make sure that OS is choosing the cores
> with highest performance firstly for scheduling the process. When amd-pstate
> driver receives a message with the highest performance change, it will
> update the core ranking.
> 
> Changes from V7->V8:
> - all:
> - - pick up Reviewed-by tags added by Mario and Ray.
> - cpufreq: amd-pstate:
> - - embed hw_prefcore into the cpudata structure.
> - - delete preferred core init from cpu online/off.

Thanks!

Series are Reviewed-by: Huang Rui <ray.huang@amd.com>

> 
> Changes from V6->V7:
> - x86:
> - - Modify kconfig about X86_AMD_PSTATE.
> - cpufreq: amd-pstate:
> - - modify incorrect comments about scheduler_work().
> - - convert highest_perf data type.
> - - modify preferred core init when cpu init and online.
> - acpi: cppc:
> - - modify link of CPPC highest performance.
> - cpufreq:
> - - modify link of CPPC highest performance changed.
> 
> Changes from V5->V6:
> - cpufreq: amd-pstate:
> - - modify the wrong tag order.
> - - modify warning about hw_prefcore sysfs attribute.
> - - delete duplicate comments.
> - - modify the variable name cppc_highest_perf to prefcore_ranking.
> - - modify judgment conditions for setting highest_perf.
> - - modify sysfs attribute for CPPC highest perf to pr_debug message.
> - Documentation: amd-pstate:
> - - modify warning: title underline too short.
> 
> Changes from V4->V5:
> - cpufreq: amd-pstate:
> - - modify sysfs attribute for CPPC highest perf.
> - - modify warning about comments
> - - rebase linux-next
> - cpufreq: 
> - - Modify warning about function declarations.
> - Documentation: amd-pstate:
> - - align with ``amd-pstat``
> 
> Changes from V3->V4:
> - Documentation: amd-pstate:
> - - Modify inappropriate descriptions.
> 
> Changes from V2->V3:
> - x86:
> - - Modify kconfig and description.
> - cpufreq: amd-pstate: 
> - - Add Co-developed-by tag in commit message.
> - cpufreq:
> - - Modify commit message.
> - Documentation: amd-pstate:
> - - Modify inappropriate descriptions.
> 
> Changes from V1->V2:
> - acpi: cppc:
> - - Add reference link.
> - cpufreq:
> - - Modify link error.
> - cpufreq: amd-pstate: 
> - - Init the priorities of all online CPUs
> - - Use a single variable to represent the status of preferred core.
> - Documentation:
> - - Default enabled preferred core.
> - Documentation: amd-pstate: 
> - - Modify inappropriate descriptions.
> - - Default enabled preferred core.
> - - Use a single variable to represent the status of preferred core.
> 
> Meng Li (7):
>   x86: Drop CPU_SUP_INTEL from SCHED_MC_PRIO for the expansion.
>   acpi: cppc: Add get the highest performance cppc control
>   cpufreq: amd-pstate: Enable amd-pstate preferred core supporting.
>   cpufreq: Add a notification message that the highest perf has changed
>   cpufreq: amd-pstate: Update amd-pstate preferred core ranking
>     dynamically
>   Documentation: amd-pstate: introduce amd-pstate preferred core
>   Documentation: introduce amd-pstate preferred core mode kernel command
>     line options
> 
>  .../admin-guide/kernel-parameters.txt         |   5 +
>  Documentation/admin-guide/pm/amd-pstate.rst   |  59 +++++-
>  arch/x86/Kconfig                              |   5 +-
>  drivers/acpi/cppc_acpi.c                      |  13 ++
>  drivers/acpi/processor_driver.c               |   6 +
>  drivers/cpufreq/amd-pstate.c                  | 186 ++++++++++++++++--
>  drivers/cpufreq/cpufreq.c                     |  13 ++
>  include/acpi/cppc_acpi.h                      |   5 +
>  include/linux/amd-pstate.h                    |  10 +
>  include/linux/cpufreq.h                       |   5 +
>  10 files changed, 285 insertions(+), 22 deletions(-)
> 
> -- 
> 2.34.1
>
Peter Zijlstra Oct. 10, 2023, 10:30 a.m. UTC | #7
On Mon, Oct 09, 2023 at 10:49:26AM +0800, Meng Li wrote:
> amd-pstate driver also uses SCHED_MC_PRIO, so decouple the requirement
> of CPU_SUP_INTEL from the dependencies to allow compilation in kernels
> without Intel CPU support.
> 
> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
> Reviewed-by: Huang Rui <ray.huang@amd.com>
> Signed-off-by: Meng Li <li.meng@amd.com>
> ---
>  arch/x86/Kconfig | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 66bfabae8814..a2e163acf623 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1054,8 +1054,9 @@ config SCHED_MC
>  
>  config SCHED_MC_PRIO
>  	bool "CPU core priorities scheduler support"
> -	depends on SCHED_MC && CPU_SUP_INTEL
> -	select X86_INTEL_PSTATE
> +	depends on SCHED_MC
> +	select X86_INTEL_PSTATE if CPU_SUP_INTEL
> +	select X86_AMD_PSTATE if CPU_SUP_AMD && ACPI
>  	select CPU_FREQ
>  	default y
>  	help

The pedantic side of me wants to point out that:

	depends on SCHED_MC
	depends on CPU_SUP_INTEL || CPU_SUP_AMD

would be more accurate, as we still have a pile of other SUPs.

Anyway, no real objection, distros will have them all set anyway.
Peter Zijlstra Oct. 10, 2023, 10:36 a.m. UTC | #8
On Mon, Oct 09, 2023 at 10:49:28AM +0800, Meng Li wrote:

> +static void amd_pstate_init_prefcore(struct amd_cpudata *cpudata)
> +{
> +	int ret;
> +	u32 highest_perf;
> +	static u32 max_highest_perf = 0, min_highest_perf = U32_MAX;
> +
> +	ret = amd_pstate_get_highest_perf(cpudata->cpu, &highest_perf);
> +	if (ret)
> +		return;
> +
> +	cpudata->hw_prefcore = true;
> +	/* check if CPPC preferred core feature is enabled*/
> +	if (highest_perf == AMD_PSTATE_MAX_CPPC_PERF) {
> +		pr_debug("AMD CPPC preferred core is unsupported!\n");
> +		cpudata->hw_prefcore = false;
> +		return;
> +	}
> +
> +	if (!amd_pstate_prefcore)
> +		return;
> +
> +	/*
> +	 * The priorities can be set regardless of whether or not
> +	 * sched_set_itmt_support(true) has been called and it is valid to
> +	 * update them at any time after it has been called.
> +	 */
> +	sched_set_itmt_core_prio(highest_perf, cpudata->cpu);

You still got the whole u32 vs int thing confused, I've only pointed
that out *TWICE* before.

Boris, can you pull out the clue hammer please?

> +
> +	if (max_highest_perf <= min_highest_perf) {
> +		if (highest_perf > max_highest_perf)
> +			max_highest_perf = highest_perf;
> +
> +		if (highest_perf < min_highest_perf)
> +			min_highest_perf = highest_perf;
> +
> +		if (max_highest_perf > min_highest_perf) {
> +			/*
> +			 * This code can be run during CPU online under the
> +			 * CPU hotplug locks, so sched_set_itmt_support()
> +			 * cannot be called from here.  Queue up a work item
> +			 * to invoke it.
> +			 */
> +			schedule_work(&sched_prefcore_work);
> +		}
> +	}
> +}
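
For readers following the type remark above: the patch keeps highest_perf as a u32, while the mainline prototype of sched_set_itmt_core_prio() takes (int prio, int cpu), so the value is converted implicitly at the call. One plausible reading of the complaint is that this mix of types should be made consistent. The following is a minimal userspace sketch of an explicit, clamped conversion, assuming a 32-bit int. The names set_core_prio(), perf_to_prio(), record_perf() and the 8-entry table are hypothetical stand-ins for illustration, not kernel code and not necessarily the fix the reviewer has in mind.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical stand-in for a scheduler hook that takes a signed priority. */
static int recorded_prio[8];

static void set_core_prio(int prio, int cpu)
{
	assert(cpu >= 0 && cpu < 8);
	recorded_prio[cpu] = prio;
}

/*
 * Convert an unsigned perf value to a signed priority explicitly,
 * clamping instead of relying on the implicit u32 -> int conversion.
 */
static int perf_to_prio(uint32_t perf)
{
	return perf > (uint32_t)INT_MAX ? INT_MAX : (int)perf;
}

/* Record the priority for one CPU and return what was stored. */
static int record_perf(uint32_t highest_perf, int cpu)
{
	set_core_prio(perf_to_prio(highest_perf), cpu);
	return recorded_prio[cpu];
}
```

Small values such as 166 pass through unchanged, so the practical risk here is low; the point is only that the signedness is decided in one visible place rather than at each call site.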
Meng, Li (Jassmine) Oct. 11, 2023, 1:01 a.m. UTC | #9
Hi Peter:

> -----Original Message-----
> From: Peter Zijlstra <peterz@infradead.org>
> Sent: Tuesday, October 10, 2023 6:36 PM
> To: Meng, Li (Jassmine) <Li.Meng@amd.com>
> Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>; Huang, Ray
> <Ray.Huang@amd.com>; linux-pm@vger.kernel.org;
> linux-kernel@vger.kernel.org; x86@kernel.org; linux-acpi@vger.kernel.org; Shuah
> Khan <skhan@linuxfoundation.org>; linux-kselftest@vger.kernel.org;
> Fontenot, Nathan <Nathan.Fontenot@amd.com>; Sharma, Deepak
> <Deepak.Sharma@amd.com>; Deucher, Alexander
> <Alexander.Deucher@amd.com>; Limonciello, Mario
> <Mario.Limonciello@amd.com>; Huang, Shimmer
> <Shimmer.Huang@amd.com>; Yuan, Perry <Perry.Yuan@amd.com>; Du,
> Xiaojian <Xiaojian.Du@amd.com>; Viresh Kumar <viresh.kumar@linaro.org>;
> Borislav Petkov <bp@alien8.de>
> Subject: Re: [PATCH V8 3/7] cpufreq: amd-pstate: Enable amd-pstate
> preferred core supporting.
>
> On Mon, Oct 09, 2023 at 10:49:28AM +0800, Meng Li wrote:
>
> > +static void amd_pstate_init_prefcore(struct amd_cpudata *cpudata)
> > +{
> > +     int ret;
> > +     u32 highest_perf;
> > +     static u32 max_highest_perf = 0, min_highest_perf = U32_MAX;
> > +
> > +     ret = amd_pstate_get_highest_perf(cpudata->cpu, &highest_perf);
> > +     if (ret)
> > +             return;
> > +
> > +     cpudata->hw_prefcore = true;
> > +     /* check if CPPC preferred core feature is enabled*/
> > +     if (highest_perf == AMD_PSTATE_MAX_CPPC_PERF) {
> > +             pr_debug("AMD CPPC preferred core is unsupported!\n");
> > +             cpudata->hw_prefcore = false;
> > +             return;
> > +     }
> > +
> > +     if (!amd_pstate_prefcore)
> > +             return;
> > +
> > +     /*
> > +      * The priorities can be set regardless of whether or not
> > +      * sched_set_itmt_support(true) has been called and it is valid to
> > +      * update them at any time after it has been called.
> > +      */
> > +     sched_set_itmt_core_prio(highest_perf, cpudata->cpu);
>
> You still got the whole u32 vs int thing confused, I've only pointed that out
> *TWICE* before.
>
> Boris, can you pull out the clue hammer please?
>
[Meng, Li (Jassmine)]
I am very sorry. I will immediately double-check and correct all the modified data types.
Thanks a lot.

> > +
> > +     if (max_highest_perf <= min_highest_perf) {
> > +             if (highest_perf > max_highest_perf)
> > +                     max_highest_perf = highest_perf;
> > +
> > +             if (highest_perf < min_highest_perf)
> > +                     min_highest_perf = highest_perf;
> > +
> > +             if (max_highest_perf > min_highest_perf) {
> > +                     /*
> > +                      * This code can be run during CPU online under the
> > +                      * CPU hotplug locks, so sched_set_itmt_support()
> > +                      * cannot be called from here.  Queue up a work item
> > +                      * to invoke it.
> > +                      */
> > +                     schedule_work(&sched_prefcore_work);
> > +             }
> > +     }
> > +}