
[v3,0/4] powercap: Enable RAPL for AMD Fam17h and Fam19h

Message ID 20201027072358.13725-1-victording@google.com
Series powercap: Enable RAPL for AMD Fam17h and Fam19h

Message

Victor Ding Oct. 27, 2020, 7:23 a.m. UTC
This patch series adds support for AMD Fam17h RAPL counters. As per the
AMD PPR, Fam17h and Fam19h support RAPL counters to monitor power
usage. The RAPL counters operate in the same way as Intel RAPL, so it is
beneficial to reuse the existing Intel framework, especially to
allow existing tools to run seamlessly on AMD.
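
For readers who want to repeat the kind of check described in the patch
(comparing the powercap sysfs values with raw MSR reads), here is a minimal
sketch, not the author's test program. It assumes the Linux msr driver is
loaded and exposes /dev/cpu/N/msr with the MSR address used as the file
offset, and that the energy status counters are 32 bits wide and wrap around.
The helper names read_msr and energy_delta are hypothetical; the MSR addresses
are the ones defined in the patch.

```c
#define _FILE_OFFSET_BITS 64 /* MSR addresses above 2^31 need a 64-bit off_t */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* MSR addresses as defined in the patch's msr-index.h hunk. */
#define MSR_AMD_RAPL_POWER_UNIT    0xc0010299u
#define MSR_AMD_CORE_ENERGY_STATUS 0xc001029au
#define MSR_AMD_PKG_ENERGY_STATUS  0xc001029bu

/*
 * Hypothetical helper: read one 64-bit MSR through the msr character device
 * at `path` (for example "/dev/cpu/0/msr"). The MSR address is the offset.
 * Returns 0 on success and -1 on any failure.
 */
int read_msr(const char *path, uint32_t msr, uint64_t *value)
{
	ssize_t n;
	int fd;

	assert(path != NULL && value != NULL);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	n = pread(fd, value, sizeof(*value), (off_t)msr);
	close(fd);
	return n == (ssize_t)sizeof(*value) ? 0 : -1;
}

/*
 * Hypothetical helper: energy consumed between two raw readings, assuming
 * only the low 32 bits count and that the counter wrapped at most once.
 */
uint32_t energy_delta(uint64_t before, uint64_t after)
{
	return (uint32_t)after - (uint32_t)before;
}
```

Reading the MSR device normally requires root. The raw delta is still in
hardware energy units and has to be scaled with the unit MSR before it can
be compared with energy_uj.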

Comments

Zhang Rui Nov. 2, 2020, 1:38 a.m. UTC | #1
On Tue, 2020-10-27 at 07:23 +0000, Victor Ding wrote:
> This patch enables AMD Fam17h RAPL support for the power capping
> framework. The support is as per AMD Fam17h Model31h (Zen2) and
> model 00-ffh (Zen1) PPR.
>
> Tested by comparing the results of following two sysfs entries and the
> values directly read from corresponding MSRs via /dev/cpu/[x]/msr:
>   /sys/class/powercap/intel-rapl/intel-rapl:0/energy_uj
>   /sys/class/powercap/intel-rapl/intel-rapl:0/intel-rapl:0:0/energy_uj
>
> Signed-off-by: Victor Ding <victording@google.com>
> Acked-by: Kim Phillips <kim.phillips@amd.com>
>
>
> ---
>
> Changes in v3:
> By Victor Ding <victording@google.com>
>  - Rebased to the latest code.
>  - Created a new rapl_defaults for AMD CPUs.
>  - Removed redundant setting to zeros.
>  - Stopped using the fake power limit domain 1.
>
> Changes in v2:
> By Kim Phillips <kim.phillips@amd.com>:
>  - Added Kim's Acked-by.
>  - Added Daniel Lezcano to Cc.
>  - (No code change).
>
>  arch/x86/include/asm/msr-index.h     |  1 +
>  drivers/powercap/intel_rapl_common.c |  6 ++++++
>  drivers/powercap/intel_rapl_msr.c    | 20 +++++++++++++++++++-
>  3 files changed, 26 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index 21917e134ad4..c36a083c8ec0 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -327,6 +327,7 @@
>  #define MSR_PP1_POLICY			0x00000642
>  
>  #define MSR_AMD_RAPL_POWER_UNIT		0xc0010299
> +#define MSR_AMD_CORE_ENERGY_STATUS		0xc001029a
>  #define MSR_AMD_PKG_ENERGY_STATUS	0xc001029b
>  
>  /* Config TDP MSRs */
> diff --git a/drivers/powercap/intel_rapl_common.c b/drivers/powercap/intel_rapl_common.c
> index 0b2830efc574..bedd780bed12 100644
> --- a/drivers/powercap/intel_rapl_common.c
> +++ b/drivers/powercap/intel_rapl_common.c
> @@ -1011,6 +1011,10 @@ static const struct rapl_defaults rapl_defaults_cht = {
>  	.compute_time_window = rapl_compute_time_window_atom,
>  };
>  
> +static const struct rapl_defaults rapl_defaults_amd = {
> +	.check_unit = rapl_check_unit_core,
> +};
> +

why do we need power_unit and time_unit if we only want to expose the
energy counter?

Plus, in rapl_init_domains(), PL1 is enabled for every RAPL Domain
blindly, I'm not sure how this is handled on the AMD CPUs.
Is PL1 invalidated by rapl_detect_powerlimit()? or is it still
registered as a valid constraint into powercap sysfs I/F?

Currently, the code makes the assumption that there is only one power
limit if priv->limits[domain_id] is not set; we probably need to change
this if we want to support RAPL domains with no power limit.

thanks,
rui
>  static const struct x86_cpu_id rapl_ids[] __initconst = {
>  	X86_MATCH_INTEL_FAM6_MODEL(SANDYBRIDGE,		&rapl_defaults_core),
>  	X86_MATCH_INTEL_FAM6_MODEL(SANDYBRIDGE_X,	&rapl_defaults_core),
> @@ -1061,6 +1065,8 @@ static const struct x86_cpu_id rapl_ids[] __initconst = {
>  
>  	X86_MATCH_INTEL_FAM6_MODEL(XEON_PHI_KNL,	&rapl_defaults_hsw_server),
>  	X86_MATCH_INTEL_FAM6_MODEL(XEON_PHI_KNM,	&rapl_defaults_hsw_server),
> +
> +	X86_MATCH_VENDOR_FAM(AMD, 0x17, &rapl_defaults_amd),
>  	{}
>  };
>  MODULE_DEVICE_TABLE(x86cpu, rapl_ids);
> diff --git a/drivers/powercap/intel_rapl_msr.c b/drivers/powercap/intel_rapl_msr.c
> index a819b3b89b2f..78213d4b5b16 100644
> --- a/drivers/powercap/intel_rapl_msr.c
> +++ b/drivers/powercap/intel_rapl_msr.c
> @@ -49,6 +49,14 @@ static struct rapl_if_priv rapl_msr_priv_intel = {
>  	.limits[RAPL_DOMAIN_PLATFORM] = 2,
>  };
>  
> +static struct rapl_if_priv rapl_msr_priv_amd = {
> +	.reg_unit = MSR_AMD_RAPL_POWER_UNIT,
> +	.regs[RAPL_DOMAIN_PACKAGE] = {
> +		0, MSR_AMD_PKG_ENERGY_STATUS, 0, 0, 0 },
> +	.regs[RAPL_DOMAIN_PP0] = {
> +		0, MSR_AMD_CORE_ENERGY_STATUS, 0, 0, 0 },
> +};
> +
>  /* Handles CPU hotplug on multi-socket systems.
>   * If a CPU goes online as the first CPU of the physical package
>   * we add the RAPL package to the system. Similarly, when the last
> @@ -138,7 +146,17 @@ static int rapl_msr_probe(struct platform_device *pdev)
>  	const struct x86_cpu_id *id = x86_match_cpu(pl4_support_ids);
>  	int ret;
>  
> -	rapl_msr_priv = &rapl_msr_priv_intel;
> +	switch (boot_cpu_data.x86_vendor) {
> +	case X86_VENDOR_INTEL:
> +		rapl_msr_priv = &rapl_msr_priv_intel;
> +		break;
> +	case X86_VENDOR_AMD:
> +		rapl_msr_priv = &rapl_msr_priv_amd;
> +		break;
> +	default:
> +		pr_err("intel-rapl does not support CPU vendor %d\n", boot_cpu_data.x86_vendor);
> +		return -ENODEV;
> +	}
>  	rapl_msr_priv->read_raw = rapl_msr_read_raw;
>  	rapl_msr_priv->write_raw = rapl_msr_write_raw;
>
Victor Ding Nov. 3, 2020, 6:10 a.m. UTC | #2
On Mon, Nov 2, 2020 at 12:39 PM Zhang Rui <rui.zhang@intel.com> wrote:
>
> On Tue, 2020-10-27 at 07:23 +0000, Victor Ding wrote:
> > This patch enables AMD Fam17h RAPL support for the power capping
> > framework. The support is as per AMD Fam17h Model31h (Zen2) and
> > model 00-ffh (Zen1) PPR.
> >
> > Tested by comparing the results of following two sysfs entries and
> > the
> > values directly read from corresponding MSRs via /dev/cpu/[x]/msr:
> >   /sys/class/powercap/intel-rapl/intel-rapl:0/energy_uj
> >   /sys/class/powercap/intel-rapl/intel-rapl:0/intel-
> > rapl:0:0/energy_uj
> >
> > Signed-off-by: Victor Ding <victording@google.com>
> > Acked-by: Kim Phillips <kim.phillips@amd.com>
> >
> >
> > ---
> >
> > Changes in v3:
> > By Victor Ding <victording@google.com>
> >  - Rebased to the latest code.
> >  - Created a new rapl_defaults for AMD CPUs.
> >  - Removed redundant setting to zeros.
> >  - Stopped using the fake power limit domain 1.
> >
> > Changes in v2:
> > By Kim Phillips <kim.phillips@amd.com>:
> >  - Added Kim's Acked-by.
> >  - Added Daniel Lezcano to Cc.
> >  - (No code change).
> >
> >  arch/x86/include/asm/msr-index.h     |  1 +
> >  drivers/powercap/intel_rapl_common.c |  6 ++++++
> >  drivers/powercap/intel_rapl_msr.c    | 20 +++++++++++++++++++-
> >  3 files changed, 26 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/include/asm/msr-index.h
> > b/arch/x86/include/asm/msr-index.h
> > index 21917e134ad4..c36a083c8ec0 100644
> > --- a/arch/x86/include/asm/msr-index.h
> > +++ b/arch/x86/include/asm/msr-index.h
> > @@ -327,6 +327,7 @@
> >  #define MSR_PP1_POLICY                       0x00000642
> >
> >  #define MSR_AMD_RAPL_POWER_UNIT              0xc0010299
> > +#define MSR_AMD_CORE_ENERGY_STATUS           0xc001029a
> >  #define MSR_AMD_PKG_ENERGY_STATUS    0xc001029b
> >
> >  /* Config TDP MSRs */
> > diff --git a/drivers/powercap/intel_rapl_common.c
> > b/drivers/powercap/intel_rapl_common.c
> > index 0b2830efc574..bedd780bed12 100644
> > --- a/drivers/powercap/intel_rapl_common.c
> > +++ b/drivers/powercap/intel_rapl_common.c
> > @@ -1011,6 +1011,10 @@ static const struct rapl_defaults
> > rapl_defaults_cht = {
> >       .compute_time_window = rapl_compute_time_window_atom,
> >  };
> >
> > +static const struct rapl_defaults rapl_defaults_amd = {
> > +     .check_unit = rapl_check_unit_core,
> > +};
> > +
>
> why do we need power_unit and time_unit if we only want to expose the
> energy counter?
AMD's Power Unit MSR provides the same information as Intel's, including
time units, power units, and energy status units. By reusing the check unit
method, we avoid code duplication and make future enhancement easier if
AMD starts to support power limits.
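
To illustrate what "time units, power units, and energy status units" means
in practice, here is a rough sketch of decoding a unit register. It is not
the kernel's rapl_check_unit_core(); it assumes the Intel-style layout
(power exponent in bits 3:0, energy exponent in bits 12:8, time exponent in
bits 19:16, each unit being 1/2^N of a watt, joule, or second), which the
reply above says AMD shares. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Exponents N read from the unit register; each unit is 1 / 2^N. */
struct rapl_unit_shifts {
	unsigned int power;  /* bits 3:0,   unit = 1/2^N watt   */
	unsigned int energy; /* bits 12:8,  unit = 1/2^N joule  */
	unsigned int time;   /* bits 19:16, unit = 1/2^N second */
};

/* Split a raw unit-register value into its three exponents. */
struct rapl_unit_shifts decode_rapl_units(uint64_t raw)
{
	struct rapl_unit_shifts u = {
		.power  = (unsigned int)(raw & 0xf),
		.energy = (unsigned int)((raw >> 8) & 0x1f),
		.time   = (unsigned int)((raw >> 16) & 0xf),
	};
	return u;
}

/* Convert a raw 32-bit energy count to microjoules (rounds down). */
uint64_t energy_raw_to_uj(uint32_t raw_energy, unsigned int energy_shift)
{
	assert(energy_shift < 32);
	return ((uint64_t)raw_energy * 1000000ULL) >> energy_shift;
}
```

Even a monitoring-only domain needs the energy exponent to turn counter
values into the microjoules reported in energy_uj; the power and time
exponents only matter once power limits are involved.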
>
> Plus, in rapl_init_domains(), PL1 is enabled for every RAPL Domain
> blindly, I'm not sure how this is handled on the AMD CPUs.
> Is PL1 invalidated by rapl_detect_powerlimit()? or is it still
> registered as a valid constraint into powercap sysfs I/F?
AMD's CORE_ENERGY_STAT MSR is like Intel's PP0_ENERGY_STATUS;
therefore, PL1 also always exists on AMD. rapl_detect_powerlimit() correctly
marks the domain as monitoring-only after finding that the power limit MSRs
do not exist.
>
> Currently, the code makes the assumption that there is only one power
> limit if priv->limits[domain_id] is not set; we probably need to change
> this if we want to support RAPL domains with no power limit.
The existing code already supports RAPL domains with no power limit: PL1 is
enabled when there is zero or one power limit, and rapl_detect_powerlimit()
then marks PL1 as monitoring-only if the power limit MSRs do not exist.
Both of AMD's RAPL domains are monitoring-only and are correctly marked and
handled.
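
The behaviour described above can be modelled roughly as follows. This is a
simplified illustration, not the kernel's rapl_detect_powerlimit(): the real
code probes by trying to read the limit register, whereas this sketch treats
a zero register address as "register absent". The slot order
(limit, status, perf, policy, info) is assumed from the patch's .regs
initialisers, and all names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed order of the per-domain register slots. */
enum reg_slot { REG_LIMIT, REG_STATUS, REG_PERF, REG_POLICY, REG_INFO, REG_MAX };

struct domain_model {
	uint32_t regs[REG_MAX]; /* MSR address per slot; 0 means "absent" */
	bool monitoring_only;   /* true when no power limit can be set */
};

/*
 * Simplified stand-in for power limit detection: a domain without a power
 * limit register can only report energy, so mark it monitoring-only.
 */
void detect_power_limit(struct domain_model *d)
{
	assert(d != NULL);
	d->monitoring_only = (d->regs[REG_LIMIT] == 0);
}
```

With the AMD initialiser from the patch, { 0, MSR_AMD_PKG_ENERGY_STATUS, 0, 0, 0 },
the limit slot is empty, so the domain ends up monitoring-only in this model.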
>
> thanks,
> rui
>
Best regards,
Victor Ding
Srinivas Pandruvada Nov. 3, 2020, 5:09 p.m. UTC | #3
On Tue, 2020-11-03 at 17:10 +1100, Victor Ding wrote:
> On Mon, Nov 2, 2020 at 12:39 PM Zhang Rui <rui.zhang@intel.com>
> wrote:
> > On Tue, 2020-10-27 at 07:23 +0000, Victor Ding wrote:
> > > This patch enables AMD Fam17h RAPL support for the power capping
> > > framework. The support is as per AMD Fam17h Model31h (Zen2) and
> > > model 00-ffh (Zen1) PPR.
> > > 
> > > Tested by comparing the results of following two sysfs entries
> > > and
> > > the
> > > values directly read from corresponding MSRs via
> > > /dev/cpu/[x]/msr:
> > >   /sys/class/powercap/intel-rapl/intel-rapl:0/energy_uj
> > >   /sys/class/powercap/intel-rapl/intel-rapl:0/intel-
> > > rapl:0:0/energy_uj

Is this for just energy reporting? No capping of power?

Thanks,
Srinivas


> > > 
> > > Signed-off-by: Victor Ding <victording@google.com>
> > > Acked-by: Kim Phillips <kim.phillips@amd.com>
> > > 
> > > 
> > > ---
> > > 
> > > Changes in v3:
> > > By Victor Ding <victording@google.com>
> > >  - Rebased to the latest code.
> > >  - Created a new rapl_defaults for AMD CPUs.
> > >  - Removed redundant setting to zeros.
> > >  - Stopped using the fake power limit domain 1.
> > > 
> > > Changes in v2:
> > > By Kim Phillips <kim.phillips@amd.com>:
> > >  - Added Kim's Acked-by.
> > >  - Added Daniel Lezcano to Cc.
> > >  - (No code change).
> > > 
> > >  arch/x86/include/asm/msr-index.h     |  1 +
> > >  drivers/powercap/intel_rapl_common.c |  6 ++++++
> > >  drivers/powercap/intel_rapl_msr.c    | 20 +++++++++++++++++++-
> > >  3 files changed, 26 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/arch/x86/include/asm/msr-index.h
> > > b/arch/x86/include/asm/msr-index.h
> > > index 21917e134ad4..c36a083c8ec0 100644
> > > --- a/arch/x86/include/asm/msr-index.h
> > > +++ b/arch/x86/include/asm/msr-index.h
> > > @@ -327,6 +327,7 @@
> > >  #define MSR_PP1_POLICY                       0x00000642
> > > 
> > >  #define MSR_AMD_RAPL_POWER_UNIT              0xc0010299
> > > +#define MSR_AMD_CORE_ENERGY_STATUS           0xc001029a
> > >  #define MSR_AMD_PKG_ENERGY_STATUS    0xc001029b
> > > 
> > >  /* Config TDP MSRs */
> > > diff --git a/drivers/powercap/intel_rapl_common.c
> > > b/drivers/powercap/intel_rapl_common.c
> > > index 0b2830efc574..bedd780bed12 100644
> > > --- a/drivers/powercap/intel_rapl_common.c
> > > +++ b/drivers/powercap/intel_rapl_common.c
> > > @@ -1011,6 +1011,10 @@ static const struct rapl_defaults
> > > rapl_defaults_cht = {
> > >       .compute_time_window = rapl_compute_time_window_atom,
> > >  };
> > > 
> > > +static const struct rapl_defaults rapl_defaults_amd = {
> > > +     .check_unit = rapl_check_unit_core,
> > > +};
> > > +
> > 
> > why do we need power_unit and time_unit if we only want to expose
> > the
> > energy counter?
> AMD's Power Unit MSR provides identical information as Intel's,
> including
> time units, power units, and energy status units. By reusing the
> check unit
> method, we could avoid code duplication as well as easing future
> enhance-
> ment when AMD starts to support power limits.
> > Plus, in rapl_init_domains(), PL1 is enabled for every RAPL Domain
> > blindly, I'm not sure how this is handled on the AMD CPUs.
> > Is PL1 invalidated by rapl_detect_powerlimit()? or is it still
> > registered as a valid constraint into powercap sysfs I/F?
> AMD's CORE_ENERGY_STAT MSR is like Intel's PP0_ENERGY_STATUS;
> therefore, PL1 also always exists on AMD. rapl_detect_powerlimit()
> correctly
> markes the domain as monitoring-only after finding power limit MSRs
> do not
> exist.
> > Currently, the code makes the assumption that there is only on
> > power
> > limit if priv->limits[domain_id] not set, we probably need to
> > change
> > this if we want to support RAPL domains with no power limit.
> The existing code already supports RAPL domains with no power limit:
> PL1 is
> enabled when there is zero or one power limit,
> rapl_detect_powerlimit() will then
> mark if PL1 is monitoring-only if power limit MSRs do not exist. Both
> AMD's RAPL
> domains are monitoring-only and are correctly marked and handled.
> > thanks,
> > rui
> > >  static const struct x86_cpu_id rapl_ids[] __initconst = {
> > >       X86_MATCH_INTEL_FAM6_MODEL(SANDYBRIDGE,         &rapl_defau
> > > lt
> > > s_core),
> > >       X86_MATCH_INTEL_FAM6_MODEL(SANDYBRIDGE_X,       &rapl_defau
> > > lts_core),
> > > @@ -1061,6 +1065,8 @@ static const struct x86_cpu_id rapl_ids[]
> > > __initconst = {
> > > 
> > >       X86_MATCH_INTEL_FAM6_MODEL(XEON_PHI_KNL,        &rapl_defau
> > > lts_hsw_se
> > > rver),
> > >       X86_MATCH_INTEL_FAM6_MODEL(XEON_PHI_KNM,        &rapl_defau
> > > lts_hsw_se
> > > rver),
> > > +
> > > +     X86_MATCH_VENDOR_FAM(AMD, 0x17, &rapl_defaults_amd),
> > >       {}
> > >  };
> > >  MODULE_DEVICE_TABLE(x86cpu, rapl_ids);
> > > diff --git a/drivers/powercap/intel_rapl_msr.c
> > > b/drivers/powercap/intel_rapl_msr.c
> > > index a819b3b89b2f..78213d4b5b16 100644
> > > --- a/drivers/powercap/intel_rapl_msr.c
> > > +++ b/drivers/powercap/intel_rapl_msr.c
> > > @@ -49,6 +49,14 @@ static struct rapl_if_priv rapl_msr_priv_intel
> > > = {
> > >       .limits[RAPL_DOMAIN_PLATFORM] = 2,
> > >  };
> > > 
> > > +static struct rapl_if_priv rapl_msr_priv_amd = {
> > > +     .reg_unit = MSR_AMD_RAPL_POWER_UNIT,
> > > +     .regs[RAPL_DOMAIN_PACKAGE] = {
> > > +             0, MSR_AMD_PKG_ENERGY_STATUS, 0, 0, 0 },
> > > +     .regs[RAPL_DOMAIN_PP0] = {
> > > +             0, MSR_AMD_CORE_ENERGY_STATUS, 0, 0, 0 },
> > > +};
> > > +
> > >  /* Handles CPU hotplug on multi-socket systems.
> > >   * If a CPU goes online as the first CPU of the physical package
> > >   * we add the RAPL package to the system. Similarly, when the
> > > last
> > > @@ -138,7 +146,17 @@ static int rapl_msr_probe(struct
> > > platform_device
> > > *pdev)
> > >       const struct x86_cpu_id *id =
> > > x86_match_cpu(pl4_support_ids);
> > >       int ret;
> > > 
> > > -     rapl_msr_priv = &rapl_msr_priv_intel;
> > > +     switch (boot_cpu_data.x86_vendor) {
> > > +     case X86_VENDOR_INTEL:
> > > +             rapl_msr_priv = &rapl_msr_priv_intel;
> > > +             break;
> > > +     case X86_VENDOR_AMD:
> > > +             rapl_msr_priv = &rapl_msr_priv_amd;
> > > +             break;
> > > +     default:
> > > +             pr_err("intel-rapl does not support CPU vendor
> > > %d\n",
> > > boot_cpu_data.x86_vendor);
> > > +             return -ENODEV;
> > > +     }
> > >       rapl_msr_priv->read_raw = rapl_msr_read_raw;
> > >       rapl_msr_priv->write_raw = rapl_msr_write_raw;
> > > 
> Best regards,
> Victor Ding
Victor Ding Nov. 4, 2020, 1:43 a.m. UTC | #4
On Wed, Nov 4, 2020 at 4:09 AM Srinivas Pandruvada
<srinivas.pandruvada@linux.intel.com> wrote:
>
> On Tue, 2020-11-03 at 17:10 +1100, Victor Ding wrote:
> > On Mon, Nov 2, 2020 at 12:39 PM Zhang Rui <rui.zhang@intel.com>
> > wrote:
> > > On Tue, 2020-10-27 at 07:23 +0000, Victor Ding wrote:
> > > > This patch enables AMD Fam17h RAPL support for the power capping
> > > > framework. The support is as per AMD Fam17h Model31h (Zen2) and
> > > > model 00-ffh (Zen1) PPR.
> > > >
> > > > Tested by comparing the results of following two sysfs entries
> > > > and
> > > > the
> > > > values directly read from corresponding MSRs via
> > > > /dev/cpu/[x]/msr:
> > > >   /sys/class/powercap/intel-rapl/intel-rapl:0/energy_uj
> > > >   /sys/class/powercap/intel-rapl/intel-rapl:0/intel-
> > > > rapl:0:0/energy_uj
>
> Is this for just energy reporting? No capping of power?
Correct, the hardware does not support capping of power.
>
> Thanks,
> Srinivas
Best regards,
Victor Ding
Srinivas Pandruvada Nov. 4, 2020, 2:16 a.m. UTC | #5
On Wed, 2020-11-04 at 12:43 +1100, Victor Ding wrote:
> On Wed, Nov 4, 2020 at 4:09 AM Srinivas Pandruvada
> <srinivas.pandruvada@linux.intel.com> wrote:
> > On Tue, 2020-11-03 at 17:10 +1100, Victor Ding wrote:
> > > On Mon, Nov 2, 2020 at 12:39 PM Zhang Rui <rui.zhang@intel.com>
> > > wrote:
> > > > On Tue, 2020-10-27 at 07:23 +0000, Victor Ding wrote:
> > > > > This patch enables AMD Fam17h RAPL support for the power
> > > > > capping
> > > > > framework. The support is as per AMD Fam17h Model31h (Zen2)
> > > > > and
> > > > > model 00-ffh (Zen1) PPR.
> > > > > 
> > > > > Tested by comparing the results of following two sysfs
> > > > > entries
> > > > > and
> > > > > the
> > > > > values directly read from corresponding MSRs via
> > > > > /dev/cpu/[x]/msr:
> > > > >   /sys/class/powercap/intel-rapl/intel-rapl:0/energy_uj
> > > > >   /sys/class/powercap/intel-rapl/intel-rapl:0/intel-
> > > > > rapl:0:0/energy_uj
> > 
> > Is this for just energy reporting? No capping of power?
> Correct, the hardware does not support capping of power.
If there is no capping, I wonder whether this is the right interface.
Do you have a specific user-space consumer that cares about this?

I think these counters are already exposed via hwmon sysfs.

Thanks,
Srinivas

Victor Ding Nov. 5, 2020, 3:53 a.m. UTC | #6
On Wed, Nov 4, 2020 at 1:17 PM Srinivas Pandruvada
<srinivas.pandruvada@linux.intel.com> wrote:
>
> On Wed, 2020-11-04 at 12:43 +1100, Victor Ding wrote:
> > On Wed, Nov 4, 2020 at 4:09 AM Srinivas Pandruvada
> > <srinivas.pandruvada@linux.intel.com> wrote:
> > > On Tue, 2020-11-03 at 17:10 +1100, Victor Ding wrote:
> > > > On Mon, Nov 2, 2020 at 12:39 PM Zhang Rui <rui.zhang@intel.com>
> > > > wrote:
> > > > > On Tue, 2020-10-27 at 07:23 +0000, Victor Ding wrote:
> > > > > > This patch enables AMD Fam17h RAPL support for the power
> > > > > > capping
> > > > > > framework. The support is as per AMD Fam17h Model31h (Zen2)
> > > > > > and
> > > > > > model 00-ffh (Zen1) PPR.
> > > > > >
> > > > > > Tested by comparing the results of following two sysfs
> > > > > > entries
> > > > > > and
> > > > > > the
> > > > > > values directly read from corresponding MSRs via
> > > > > > /dev/cpu/[x]/msr:
> > > > > >   /sys/class/powercap/intel-rapl/intel-rapl:0/energy_uj
> > > > > >   /sys/class/powercap/intel-rapl/intel-rapl:0/intel-
> > > > > > rapl:0:0/energy_uj
> > >
> > > Is this for just energy reporting? No capping of power?
> > Correct, the hardware does not support capping of power.
> I wonder if there is no capping, is this the right interface?
> Do you have specific user space, which cares about this?
We have tools that were previously developed to measure energy status
on Intel via the powercap interface. Powercap is the only interface
that allows reading RAPL energy counters without requiring MSR access
privileges. We want to use these tools on AMD with minimal modifications.
I believe the powercap interface should support these counters,
regardless of the use cases, mainly for two reasons:
1. The powercap interface already supports monitoring-only power domains,
e.g. when the power limit is locked by BIOS or the (Intel) CPU does not
expose an MSR for certain power domains. The latter is the exact situation
on AMD;
2. As AMD has partially introduced the equivalent of Intel's RAPL, we
should leverage this opportunity to reduce the divergence in the APIs, i.e.
the OS, as a hardware abstraction layer, should allow users to use the same
set of APIs to access RAPL features if they are supported on both Intel and
AMD. In this specific case, if users can query Intel's RAPL counters via
powercap, they should be able to do so for AMD's as well.
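
For illustration only (this is not part of the patch), a user-space tool
of the kind described above might consume the powercap counters roughly
as sketched below. The helper names read_u64(), energy_delta_uj() and
average_power_w() are hypothetical. The sketch assumes the tool samples
energy_uj twice, that the counter wraps at most once between samples,
and that the wrap point is taken from the max_energy_range_uj attribute
of the same powercap zone.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Read one unsigned decimal value from a sysfs attribute; 0 on success. */
static int read_u64(const char *path, uint64_t *out)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = fscanf(f, "%" SCNu64, out) == 1;
	fclose(f);
	return ok ? 0 : -1;
}

/*
 * Energy consumed between two energy_uj samples, in microjoules.
 * Assumption: the counter wrapped at most once between the samples and
 * max_range_uj was read from max_energy_range_uj in the same zone.
 */
static uint64_t energy_delta_uj(uint64_t prev, uint64_t cur,
				uint64_t max_range_uj)
{
	if (cur >= prev)
		return cur - prev;
	return max_range_uj - prev + cur;
}

/* Average power in watts: microjoules divided by microseconds. */
static double average_power_w(uint64_t delta_uj, uint64_t interval_us)
{
	return (double)delta_uj / (double)interval_us;
}
```

A caller would read /sys/class/powercap/intel-rapl/intel-rapl:0/energy_uj
with read_u64() at two points in time and pass both samples to
energy_delta_uj(). The same code would then work unchanged on Intel and
AMD, which is the point of exposing the counters through one interface.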
>
> I think these counters are already exposed via hwmon sysfs.
Yes, they were introduced early this year. However, they are not the same as
the counters exposed via the powercap interface: powercap exposes the
actual value of the energy counters, while hwmon adds an accumulation
layer on top.
In addition, I don't think Intel's RAPL counters are exposed via hwmon;
therefore: 1. existing fine-grained power monitoring tools are not based on
hwmon; 2. new tools cannot query the same set of counters via the same
API, so they would have to actively maintain two sets of logic.
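
To make the distinction concrete, an accumulation layer in general looks
something like the sketch below. This is a hypothetical illustration, not
code from the hwmon driver: struct energy_accum and its helpers are made
up, and the sketch assumes a 32-bit raw counter that is sampled often
enough to wrap at most once between updates.

```c
#include <stdint.h>

/* Hypothetical accumulator; not taken from any kernel driver. */
struct energy_accum {
	uint32_t last_raw;	/* previous raw 32-bit counter sample */
	uint64_t total_raw;	/* running 64-bit total, in raw counter units */
};

static void energy_accum_init(struct energy_accum *acc, uint32_t raw)
{
	acc->last_raw = raw;
	acc->total_raw = 0;
}

/*
 * Fold a new raw sample into the running total. Unsigned 32-bit
 * subtraction yields the correct delta across a single wraparound.
 */
static uint64_t energy_accum_update(struct energy_accum *acc, uint32_t raw)
{
	acc->total_raw += (uint32_t)(raw - acc->last_raw);
	acc->last_raw = raw;
	return acc->total_raw;
}
```

A reader of such a layer sees the running total rather than the hardware
counter itself, so the value no longer matches what a direct MSR read
would return.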
>
> Thanks,
> Srinivas
>
Best regards,
Victor Ding
Srinivas Pandruvada Nov. 5, 2020, 5:14 p.m. UTC | #7
On Thu, 2020-11-05 at 14:53 +1100, Victor Ding wrote:
> On Wed, Nov 4, 2020 at 1:17 PM Srinivas Pandruvada
> <srinivas.pandruvada@linux.intel.com> wrote:
> > On Wed, 2020-11-04 at 12:43 +1100, Victor Ding wrote:
> > > On Wed, Nov 4, 2020 at 4:09 AM Srinivas Pandruvada
> > > <srinivas.pandruvada@linux.intel.com> wrote:
> > > > On Tue, 2020-11-03 at 17:10 +1100, Victor Ding wrote:
> > > > > On Mon, Nov 2, 2020 at 12:39 PM Zhang Rui <
> > > > > rui.zhang@intel.com>
> > > > > wrote:
> > > > > > On Tue, 2020-10-27 at 07:23 +0000, Victor Ding wrote:
> > > > > > > This patch enables AMD Fam17h RAPL support for the power
> > > > > > > capping
> > > > > > > framework. The support is as per AMD Fam17h Model31h
> > > > > > > (Zen2)
> > > > > > > and
> > > > > > > model 00-ffh (Zen1) PPR.
> > > > > > > 
> > > > > > > Tested by comparing the results of following two sysfs
> > > > > > > entries
> > > > > > > and
> > > > > > > the
> > > > > > > values directly read from corresponding MSRs via
> > > > > > > /dev/cpu/[x]/msr:
> > > > > > >   /sys/class/powercap/intel-rapl/intel-rapl:0/energy_uj
> > > > > > >   /sys/class/powercap/intel-rapl/intel-rapl:0/intel-
> > > > > > > rapl:0:0/energy_uj
> > > > 
> > > > Is this for just energy reporting? No capping of power?
> > > Correct, the hardware does not support capping of power.
> > I wonder if there is no capping, is this the right interface?
> > Do you have specific user space, which cares about this?
> We have tools that were previously developed to measure energy status
> on Intel via the powercap interface. Powercap is the only interface
> allowing reading RAPL energy counters without requiring MSR access
> privileges. We want to use these tools on AMD with minimal
> modifications.
> I believe the powercap interface should support these counters,
> regardless of the use cases, mainly for two reasons:
> 1. Powercap interface already supports monitoring-only power domains,
> e.g. power limit is locked by BIOS or the (Intel) CPU does not expose
> an
> MSR for certain power domains. The latter is the exact situation on
> AMD;
> 2. As AMD has partially introduced the equivalent of Intel's RAPL, we
> should leverage this opportunity to reduce the divergence in the
> APIs. i.e.
> OS as a hardware abstraction layer should allow users to use the same
> set of APIs to access RAPL features if it is supported on both Intel
> and AMD.
> In this specific case, if users can query for Intel's RAPL counters
> via
> powercap, they should be able to do so as well for AMD's.
> > I think these counters are already exposed via hwmon sysfs.
> Yes, they were introduced early this year. However, it is not the
> same as
> the counters exposed via powercap interface: powercap exposes the
> actual value of the energy counters while hwmon adds an accumulation
> layer on top.
> In addition, I don't think Intel's RAPL counters are exposed via
> hwmon;
> therefore: 1. existing fine-grained power monitoring tools are not
> based on
> hwmon; 2. new tools cannot query the same set of counters via the
> same
> API so that they have to actively maintain two sets of logic.

Fine with me. I think eventually the power capping interface will be
supported.

Thanks,
Srinivas

Rafael J. Wysocki Nov. 5, 2020, 6:04 p.m. UTC | #8
On Thursday, November 5, 2020 6:14:01 PM CET Srinivas Pandruvada wrote:
> On Thu, 2020-11-05 at 14:53 +1100, Victor Ding wrote:
> > On Wed, Nov 4, 2020 at 1:17 PM Srinivas Pandruvada
> > <srinivas.pandruvada@linux.intel.com> wrote:
> > > On Wed, 2020-11-04 at 12:43 +1100, Victor Ding wrote:
> > > > On Wed, Nov 4, 2020 at 4:09 AM Srinivas Pandruvada
> > > > <srinivas.pandruvada@linux.intel.com> wrote:
> > > > > On Tue, 2020-11-03 at 17:10 +1100, Victor Ding wrote:
> > > > > > On Mon, Nov 2, 2020 at 12:39 PM Zhang Rui <rui.zhang@intel.com>
> > > > > > wrote:
> > > > > > > On Tue, 2020-10-27 at 07:23 +0000, Victor Ding wrote:
> > > > > > > > This patch enables AMD Fam17h RAPL support for the power
> > > > > > > > capping framework. The support is as per AMD Fam17h Model31h
> > > > > > > > (Zen2) and model 00-ffh (Zen1) PPR.
> > > > > > > >
> > > > > > > > Tested by comparing the results of following two sysfs
> > > > > > > > entries and the values directly read from corresponding MSRs
> > > > > > > > via /dev/cpu/[x]/msr:
> > > > > > > >   /sys/class/powercap/intel-rapl/intel-rapl:0/energy_uj
> > > > > > > >   /sys/class/powercap/intel-rapl/intel-rapl:0/intel-rapl:0:0/energy_uj
> > > > >
> > > > > Is this for just energy reporting? No capping of power?
> > > > Correct, the hardware does not support capping of power.
> > > I wonder if there is no capping, is this the right interface?
> > > Do you have specific user space, which cares about this?
> > We have tools that were previously developed to measure energy status
> > on Intel via the powercap interface. Powercap is the only interface
> > allowing reading RAPL energy counters without requiring MSR access
> > privileges. We want to use these tools on AMD with minimal
> > modifications.
> > I believe the powercap interface should support these counters,
> > regardless of the use cases, mainly for two reasons:
> > 1. Powercap interface already supports monitoring-only power domains,
> > e.g. power limit is locked by BIOS or the (Intel) CPU does not expose
> > an MSR for certain power domains. The latter is the exact situation on
> > AMD;
> > 2. As AMD has partially introduced the equivalent of Intel's RAPL, we
> > should leverage this opportunity to reduce the divergence in the
> > APIs. i.e.
> > OS as a hardware abstraction layer should allow users to use the same
> > set of APIs to access RAPL features if it is supported on both Intel
> > and AMD.
> > In this specific case, if users can query for Intel's RAPL counters
> > via powercap, they should be able to do so as well for AMD's.
> > > I think these counters are already exposed via hwmon sysfs.
> > Yes, they were introduced early this year. However, it is not the
> > same as the counters exposed via powercap interface: powercap exposes
> > the actual value of the energy counters while hwmon adds an
> > accumulation layer on top.
> > In addition, I don't think Intel's RAPL counters are exposed via
> > hwmon; therefore: 1. existing fine-grained power monitoring tools are
> > not based on hwmon; 2. new tools cannot query the same set of counters
> > via the same API so that they have to actively maintain two sets of
> > logic.
>
> Fine with me.

OK, I'll queue up the series for 5.11 then if there are no other concerns.

Thanks!
Rafael J. Wysocki Nov. 10, 2020, 7:24 p.m. UTC | #9
On Tue, Oct 27, 2020 at 8:24 AM Victor Ding <victording@google.com> wrote:
>
> This patch series adds support for AMD Fam17h RAPL counters. As per
> AMD PPR, Fam17h and Fam19h support RAPL counters to monitor power
> usage. The RAPL counter operates as with Intel RAPL. Therefore, it is
> beneficial to re-use existing framework for Intel, especially to
> allow existing tools to seamlessly run on AMD.
>
> From the user's point of view, this series enables the following two sysfs
> entries on AMD Fam17h or Fam19h:
>   /sys/class/powercap/intel-rapl/intel-rapl:0/energy_uj
>   /sys/class/powercap/intel-rapl/intel-rapl:0/intel-rapl:0:0/energy_uj
>
> Changes in v3:
> By Victor Ding <victording@google.com>
>  - Rebased to the latest code.
>  - Created a new rapl_defaults for AMD CPUs.
>  - Removed redundant setting to zeros.
>  - Stopped using the fake power limit domain 1.
>
> Changes in v2:
> By Kim Phillips <kim.phillips@amd.com>
> - Added the Fam19h patch to the end of the series
> - Added my Acked-by
> - Added Daniel Lezcano to Cc
> - (linux-pm was already on Cc)
> - (No code changes)
>
> Kim Phillips (1):
>   powercap: Add AMD Fam19h RAPL support
>
> Victor Ding (3):
>   x86/msr-index: sort AMD RAPL MSRs by address
>   powercap/intel_rapl_msr: Convert rapl_msr_priv into pointer
>   powercap: Add AMD Fam17h RAPL support
>
>  arch/x86/include/asm/msr-index.h     |  3 +-
>  drivers/powercap/intel_rapl_common.c |  7 ++++
>  drivers/powercap/intel_rapl_msr.c    | 51 ++++++++++++++++++++--------
>  3 files changed, 45 insertions(+), 16 deletions(-)
>
> --

All patches applied as 5.11 material (with some minor edits in the
changelogs), thanks!