[v2,1/2] perf/x86/rapl: Fix the energy-pkg event for AMD CPUs

Message ID 20240730044917.4680-2-Dhananjay.Ugwekar@amd.com
State Accepted
Commit 8d72eba1cf8cecd76a2b4c1dd7673c2dc775f514
Series RAPL driver fixes for AMD CPUs

Commit Message

Dhananjay Ugwekar July 30, 2024, 4:49 a.m. UTC
After commit 63edbaa48a57 ("x86/cpu/topology: Add support for the AMD 0x80000026 leaf"),
on AMD processors that support extended CPUID leaf 0x80000026, the
topology_die_cpumask() and topology_logical_die_id() macros no longer
return the package cpumask and package id; instead they return the CCD
(Core Complex Die) mask and id respectively. This changes the scope of the
energy-pkg event from package to CCD.

For more historical context, please refer to commit 32fb480e0a2c
("powercap/intel_rapl: Support multi-die/package"), which initially changed
the RAPL scope from package to die for all systems, as Intel systems
with die enumeration have RAPL scope as die, and those without die
enumeration were not affected by it. So all systems (Intel, AMD, Hygon)
worked correctly with topology_logical_die_id() until recently, but this
changed after the "0x80000026 leaf" commit mentioned above.

Replacing topology_logical_die_id() with topology_logical_package_id()
(and topology_die_cpumask() with topology_core_cpumask()) conditionally,
only for AMD and Hygon, fixes the energy-pkg event.

On a 12 CCD 1 Package AMD Zen4 Genoa machine:

Before:
$ cat /sys/devices/power/cpumask
0,8,16,24,32,40,48,56,64,72,80,88

The expected cpumask here is just "0": as this is a package-scope event,
only one CPU collects the event for all the CPUs in the package.

After:
$ cat /sys/devices/power/cpumask
0

Fixes: 63edbaa48a57 ("x86/cpu/topology: Add support for the AMD 0x80000026 leaf")
Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
---
Changes in v2:
* Updated the scope description comment
* Don't create rapl_pmu_cpumask and rapl_pmu_idx local variables, as they're
  used only once; instead call the get_* functions directly where needed
* Check topology_logical_(die/package)_id return value
---
 arch/x86/events/rapl.c | 47 +++++++++++++++++++++++++++++++++++++-----
 1 file changed, 42 insertions(+), 5 deletions(-)
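
To make the before/after cpumask in the commit message concrete, here is a
small user-space sketch (not kernel code). It assumes a hypothetical CPU
numbering of 1 package, 12 CCDs and 8 CPUs per CCD with SMT siblings
ignored; the names domain_of(), pick_collectors() and print_collectors()
are made up for illustration. It picks the lowest-numbered CPU of each
RAPL domain as the collector, which roughly mimics what the online
callback ends up doing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Assumed model topology (illustrative constants, not kernel values):
 * 1 package, 12 CCDs, 8 CPUs per CCD, SMT siblings ignored.
 */
#define NR_CPUS_MODEL	96
#define CPUS_PER_DIE	8
#define DIES_PER_PKG	12

enum rapl_scope_model { SCOPE_DIE, SCOPE_PKG };

/* Index of the RAPL domain a CPU belongs to under the chosen scope. */
static int domain_of(int cpu, enum rapl_scope_model scope)
{
	int die = cpu / CPUS_PER_DIE;

	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	return scope == SCOPE_PKG ? die / DIES_PER_PKG : die;
}

/*
 * Mark the lowest-numbered CPU of every domain as its collector.
 * Returns the number of collectors, i.e. the number of domains.
 */
static int pick_collectors(enum rapl_scope_model scope,
			   bool collector[NR_CPUS_MODEL])
{
	bool seen[NR_CPUS_MODEL] = { false };	/* at most one domain per CPU */
	int count = 0;

	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++) {
		int d = domain_of(cpu, scope);

		collector[cpu] = !seen[d];
		if (!seen[d]) {
			seen[d] = true;
			count++;
		}
	}
	return count;
}

/* Print the collectors as a comma-separated list, like the sysfs file. */
static void print_collectors(const bool collector[NR_CPUS_MODEL])
{
	const char *sep = "";

	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++) {
		if (collector[cpu]) {
			printf("%s%d", sep, cpu);
			sep = ",";
		}
	}
	printf("\n");
}
```

Under these assumptions, SCOPE_DIE yields twelve collectors (CPUs 0, 8,
..., 88) and SCOPE_PKG yields only CPU 0, matching the two cpumask
readings quoted above.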

Comments

Liang, Kan Aug. 2, 2024, 3:26 p.m. UTC | #1
Hi Dhananjay,

I just posted a patch set to clean up the hotplug code in perf.
https://lore.kernel.org/lkml/20240802151643.1691631-1-kan.liang@linux.intel.com/

With the cleanup patch set, the fix may be simplified as below.

diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index b70ad880c5bc..801697be4118 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -646,6 +646,10 @@ static int __init init_rapl_pmus(void)
 	rapl_pmus->pmu.module           = THIS_MODULE;
 	rapl_pmus->pmu.scope            = PERF_PMU_SCOPE_DIE;
 	rapl_pmus->pmu.capabilities     = PERF_PMU_CAP_NO_EXCLUDE;
+
+	if (rapl_pmu_is_pkg_scope())
+		rapl_pmus->pmu.scope = PERF_PMU_SCOPE_PKG;
+
 	return 0;
 }

Could you please take a look at the cleanup patch and give it a try?

Thanks,
Kan

Dhananjay Ugwekar Aug. 8, 2024, 12:10 p.m. UTC | #2
Hello Kan,

Sorry for the late reply; I was out sick for the last few days.

I tried out my (modified) fix on top of your patchset, and the energy-pkg event
seems to be working correctly.

On a Zen4 Genoa EPYC server:

When idle,

~$ sudo perf stat -a -e power/energy-pkg/ sleep 1

 Performance counter stats for 'system wide':

             33.93 Joules power/energy-pkg/

       1.001969742 seconds time elapsed

After running stress-ng:

~$ sudo perf stat -a -e power/energy-pkg/ sleep 1

 Performance counter stats for 'system wide':

            376.16 Joules power/energy-pkg/

       1.003985155 seconds time elapsed
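
As a quick sanity check on the two readings, energy divided by elapsed
time gives the average package power. The helper below is only a sketch;
avg_power_watts() is a hypothetical name, not something perf provides.

```c
#include <assert.h>

/* Average power in watts from an energy reading (J) and elapsed time (s). */
static double avg_power_watts(double joules, double seconds)
{
	assert(seconds > 0.0);
	return joules / seconds;
}
```

With the numbers above this works out to roughly 33.9 W when idle and
roughly 374.7 W under stress-ng.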

Please feel free to add my tested-by on your hotplug cleanup patchset.

Thanks,
Dhananjay

Patch

diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index b985ca79cf97..7097c0f6a71f 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -103,6 +103,19 @@  static struct perf_pmu_events_attr event_attr_##v = {				\
 	.event_str	= str,							\
 };
 
+/*
+ * RAPL Package energy counter scope:
+ * 1. AMD/HYGON platforms have a per-PKG package energy counter
+ * 2. For Intel platforms
+ *	2.1. CLX-AP is multi-die and its RAPL MSRs are die-scope
+ *	2.2. Other Intel platforms are single die systems so the scope can be
+ *	     considered as either pkg-scope or die-scope, and we are considering
+ *	     them as die-scope.
+ */
+#define rapl_pmu_is_pkg_scope()				\
+	(boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||	\
+	 boot_cpu_data.x86_vendor == X86_VENDOR_HYGON)
+
 struct rapl_pmu {
 	raw_spinlock_t		lock;
 	int			n_active;
@@ -140,9 +153,25 @@  static unsigned int rapl_cntr_mask;
 static u64 rapl_timer_ms;
 static struct perf_msr *rapl_msrs;
 
+/*
+ * Helper functions to get the correct topology macros according to the
+ * RAPL PMU scope.
+ */
+static inline unsigned int get_rapl_pmu_idx(int cpu)
+{
+	return rapl_pmu_is_pkg_scope() ? topology_logical_package_id(cpu) :
+					 topology_logical_die_id(cpu);
+}
+
+static inline const struct cpumask *get_rapl_pmu_cpumask(int cpu)
+{
+	return rapl_pmu_is_pkg_scope() ? topology_core_cpumask(cpu) :
+					 topology_die_cpumask(cpu);
+}
+
 static inline struct rapl_pmu *cpu_to_rapl_pmu(unsigned int cpu)
 {
-	unsigned int rapl_pmu_idx = topology_logical_die_id(cpu);
+	unsigned int rapl_pmu_idx = get_rapl_pmu_idx(cpu);
 
 	/*
 	 * The unsigned check also catches the '-1' return value for non
@@ -552,7 +581,7 @@  static int rapl_cpu_offline(unsigned int cpu)
 
 	pmu->cpu = -1;
 	/* Find a new cpu to collect rapl events */
-	target = cpumask_any_but(topology_die_cpumask(cpu), cpu);
+	target = cpumask_any_but(get_rapl_pmu_cpumask(cpu), cpu);
 
 	/* Migrate rapl events to the new target */
 	if (target < nr_cpu_ids) {
@@ -565,6 +594,11 @@  static int rapl_cpu_offline(unsigned int cpu)
 
 static int rapl_cpu_online(unsigned int cpu)
 {
+	s32 rapl_pmu_idx = get_rapl_pmu_idx(cpu);
+	if (rapl_pmu_idx < 0) {
+		pr_err("topology_logical_(package/die)_id() returned a negative value");
+		return -EINVAL;
+	}
 	struct rapl_pmu *pmu = cpu_to_rapl_pmu(cpu);
 	int target;
 
@@ -579,14 +613,14 @@  static int rapl_cpu_online(unsigned int cpu)
 		pmu->timer_interval = ms_to_ktime(rapl_timer_ms);
 		rapl_hrtimer_init(pmu);
 
-		rapl_pmus->pmus[topology_logical_die_id(cpu)] = pmu;
+		rapl_pmus->pmus[rapl_pmu_idx] = pmu;
 	}
 
 	/*
 	 * Check if there is an online cpu in the package which collects rapl
 	 * events already.
 	 */
-	target = cpumask_any_and(&rapl_cpu_mask, topology_die_cpumask(cpu));
+	target = cpumask_any_and(&rapl_cpu_mask, get_rapl_pmu_cpumask(cpu));
 	if (target < nr_cpu_ids)
 		return 0;
 
@@ -675,7 +709,10 @@  static const struct attribute_group *rapl_attr_update[] = {
 
 static int __init init_rapl_pmus(void)
 {
-	int nr_rapl_pmu = topology_max_packages() * topology_max_dies_per_package();
+	int nr_rapl_pmu = topology_max_packages();
+
+	if (!rapl_pmu_is_pkg_scope())
+		nr_rapl_pmu *= topology_max_dies_per_package();
 
 	rapl_pmus = kzalloc(struct_size(rapl_pmus, pmus, nr_rapl_pmu), GFP_KERNEL);
 	if (!rapl_pmus)
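
The sizing logic in the init_rapl_pmus() hunk and the "unsigned check"
mentioned in cpu_to_rapl_pmu() can be modelled in user space as follows.
This is a sketch under stated assumptions, not the driver code:
nr_rapl_pmu_model() and idx_in_range() are hypothetical names, and the
topology values are passed in as plain integers.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Number of PMU slots to allocate: one per package for package scope,
 * one per die (packages * dies per package) for die scope.
 */
static int nr_rapl_pmu_model(int max_packages, int max_dies_per_package,
			     bool pkg_scope)
{
	int nr = max_packages;

	assert(max_packages > 0 && max_dies_per_package > 0);
	if (!pkg_scope)
		nr *= max_dies_per_package;
	return nr;
}

/*
 * Bounds check on a topology id. Converting to unsigned turns a negative
 * id such as -1 into a very large value, so one comparison rejects both
 * negative and too-large ids.
 */
static bool idx_in_range(int id, int nr)
{
	unsigned int idx = (unsigned int)id;

	return idx < (unsigned int)nr;
}
```

For a 1-package, 12-CCD system this gives 12 slots under die scope and 1
slot under package scope. It also shows why the index helper must follow
the same scope as the allocation: a die id of 5 is out of range for a
package-scoped array of size 1.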