mbox series

[v2,0/15] powercap/intel_rapl: Introduce RAPL TPMI support

Message ID 20230419024419.324436-1-rui.zhang@intel.com
Headers show
Series powercap/intel_rapl: Introduce RAPL TPMI support | expand

Message

Zhang Rui April 19, 2023, 2:44 a.m. UTC
The TPMI (Topology Aware Register and PM Capsule Interface) provides a
flexible, extendable and PCIe enumerable MMIO interface for PM features.

The TPMI documentation can be downloaded from:
https://github.com/intel/tpmi_power_management

Intel RAPL (Running Average Power Limit) is one of the features that
benefit from this. Using TPMI Interface has advantage over traditional MSR
(Model Specific Register) interface, where a thread needs to be scheduled
on the target CPU to read or write. Also the RAPL features vary between
CPU models, and hence lot of model specific code. Here TPMI provides an
architectural interface by providing hierarchical tables and fields,
which will not need any model specific implementation.

Given that there are some differences between RAPL TPMI Interface and the
existing RAPL MSR/MMIO Interface, this patch series improves the RAPL
common code to satisfy the new requirements from TPMI interface, and then
introduces the RAPL TPMI Interface driver.

Patch 1-4	cleanups and preparation work.
Patch 5		adds support for per Domain Unit register.
Patch 6-10	improves Power Limits handling, and provides support
		for per Power Limit register, and per Power Limit Lock.
Patch 11-12	support rapl_package without online CPUs. So that TPMI
		rapl_package still works with whole package offlined.
Patch 13-15	introduces RAPL Core support for TPMI Interface and the
		RAPL TPMI Interface driver.

This series depends on the TPMI base driver which has been merged in 6.3-rc1.

thanks,
rui
---
Changes since v1:
   - use set_defaults() and variable name 'defaults' for rapl_defaults
     structure.
   - use 'rpi_default' instead of 'rpis' for the default rapl primitive
     information of MSR/MMIO Interface.
   - rephase the changelog of patch 7/15.
   - change the subject of patch 10/15 and use a helper for getting the
     primitive for power limit LOCK bit control.

----------------------------------------------------------------
Zhang Rui (15):
      powercap/intel_rapl: Remove unused field in struct rapl_if_priv
      powercap/intel_rapl: Allow probing without CPUID match
      powercap/intel_rapl: Support per Interface rapl_defaults
      powercap/intel_rapl: Support per Interface primitive information
      powercap/intel_rapl: Support per domain energy/power/time unit
      powercap/intel_rapl: Use index to initialize primitive information
      powercap/intel_rapl: Change primitive order
      powercap/intel_rapl: Use bitmap for Power Limits
      powercap/intel_rapl: Cleanup Power Limits support
      powercap/intel_rapl: Introduce per Power Limit lock
      powercap/intel_rapl: Remove redundant cpu parameter
      powercap/intel_rapl: Make cpu optional for rapl_package
      powercap/intel_rapl: Introduce RAPL I/F type
      powercap/intel_rapl: Introduce core support for TPMI interface
      powercap/intel_rapl_tpmi: Introduce RAPL TPMI interface driver

 drivers/powercap/Kconfig                           |  14 +
 drivers/powercap/Makefile                          |   1 +
 drivers/powercap/intel_rapl_common.c               | 868 ++++++++++++---------
 drivers/powercap/intel_rapl_msr.c                  |  14 +-
 drivers/powercap/intel_rapl_tpmi.c                 | 325 ++++++++
 .../intel/int340x_thermal/processor_thermal_rapl.c |  11 +-
 include/linux/intel_rapl.h                         |  40 +-
 7 files changed, 875 insertions(+), 398 deletions(-)
 create mode 100644 drivers/powercap/intel_rapl_tpmi.c

Comments

Zhang Rui Sept. 6, 2023, 3:14 a.m. UTC | #1
Hi, Ville,

On Tue, 2023-09-05 at 09:21 +0300, Ville Syrjälä wrote:
> On Wed, Apr 19, 2023 at 10:44:13AM +0800, Zhang Rui wrote:
> > The same set of operations are shared by different Powert Limits,
> > including Power Limit get/set, Power Limit enable/disable, clamping
> > enable/disable, time window get/set, and max power get/set, etc.
> > 
> > But the same operation for different Power Limit has different
> > primitives because they use different registers/register bits.
> > 
> > A lot of dirty/duplicate code was introduced to handle this
> > difference.
> > 
> > Introduce a universal way to issue Power Limit operations.
> > Instead of using hardcoded primitive name directly, use Power Limit
> > id
> > + operation type, and hide all the Power Limit difference details
> > in a
> > central place, get_pl_prim(). Two helpers, rapl_read_pl_data() and
> > rapl_write_pl_data(), are introduced at the same time to simplify
> > the
> > code for issuing Power Limit operations.
> > 
> > Signed-off-by: Zhang Rui <rui.zhang@intel.com>
> > Tested-by: Wang Wendy <wendy.wang@intel.com>
> > ---
> >  drivers/powercap/intel_rapl_common.c | 343 ++++++++++++-----------
> > ----
> >  include/linux/intel_rapl.h           |   1 -
> >  2 files changed, 146 insertions(+), 198 deletions(-)
> > 
> > diff --git a/drivers/powercap/intel_rapl_common.c
> > b/drivers/powercap/intel_rapl_common.c
> > index 8e77df42257a..7f80c35e5c86 100644
> > --- a/drivers/powercap/intel_rapl_common.c
> > +++ b/drivers/powercap/intel_rapl_common.c
> <snip>
> > @@ -818,6 +778,33 @@ static int rapl_write_data_raw(struct
> > rapl_domain *rd,
> >         return ret;
> >  }
> >  
> > +static int rapl_read_pl_data(struct rapl_domain *rd, int pl,
> > +                             enum pl_prims pl_prim, bool xlate,
> > u64 *data)
> > +{
> > +       enum rapl_primitives prim = get_pl_prim(pl, pl_prim);
> > +
> > +       if (!is_pl_valid(rd, pl))
> > +               return -EINVAL;
> > +
> > +       return rapl_read_data_raw(rd, prim, xlate, data);
> > +}
> > +
> > +static int rapl_write_pl_data(struct rapl_domain *rd, int pl,
> > +                              enum pl_prims pl_prim,
> > +                              unsigned long long value)
> > +{
> > +       enum rapl_primitives prim = get_pl_prim(pl, pl_prim);
> > +
> > +       if (!is_pl_valid(rd, pl))
> > +               return -EINVAL;
> > +
> > +       if (rd->state & DOMAIN_STATE_BIOS_LOCKED) {
> > +               pr_warn("%s:%s:%s locked by BIOS\n", rd->rp->name,
> > rd->name, pl_names[pl]);
> > +               return -EACCES;
> 
> This seems to be causing a lot of WARN level dmesg spam [1] during
> suspend/resume on several machines. I suppose previously the
> warning was only printed when trying to change the limits explicitly,

Before this patch, when enabling a power limit, it doesn't give a
warning but returns -EACCES directly.

After this patch, it gives a warning when any change is made to a
locked power limit, and this includes enabling/setting the power limit,
changing the time window, etc.

So I think in your case, the RAPL Package domain is enabled upon resume
for some reason, maybe requested by thermald, and that's why we see
this new warning.

The below change keeps the previous logic, can you confirm this?

IMO, the new logic is right because making any change to a 
locked power limit is meaningless.

Srinivas,

Do we check if a domain/power_limit is locked before we enabling it in
thermald?

thanks,
rui

diff --git a/drivers/powercap/intel_rapl_common.c
b/drivers/powercap/intel_rapl_common.c
index 5c2e6d5eea2a..f6816a91d027 100644
--- a/drivers/powercap/intel_rapl_common.c
+++ b/drivers/powercap/intel_rapl_common.c
@@ -893,7 +893,7 @@ static int rapl_write_pl_data(struct rapl_domain
*rd, int pl,
 	if (!is_pl_valid(rd, pl))
 		return -EINVAL;
 
-	if (rd->rpl[pl].locked) {
+	if (rd->rpl[pl].locked && pl_prim == PL_LIMIT) {
 		pr_warn("%s:%s:%s locked by BIOS\n", rd->rp->name, rd-
>name, pl_names[pl]);
 		return -EACCES;
 	}
Pandruvada, Srinivas Sept. 6, 2023, 3:32 p.m. UTC | #2
On Wed, 2023-09-06 at 03:14 +0000, Zhang, Rui wrote:
> Hi, Ville,
> 
> 
[...]

> The below change keeps the previous logic, can you confirm this?
> 
> IMO, the new logic is right because making any change to a 
> locked power limit is meaningless.
> 
> Srinivas,
> 
> Do we check if a domain/power_limit is locked before we enabling it
> in
> thermald?
There is no way to check locked bit from user space.

This was an issue several years back and thermald added logic to avoid
trying.
"
             if (ret == -ENODATA) {
                        thd_log_info("powercap RAPL is BIOS locked,
cannot update\n");
                        bios_locked = true;
}
"

But here it seems that issue with suspend/resume. thermald doesn't do
anything during suspend/resume.

Thanks,
Srinivas

> 
> thanks,
> rui
> 
> diff --git a/drivers/powercap/intel_rapl_common.c
> b/drivers/powercap/intel_rapl_common.c
> index 5c2e6d5eea2a..f6816a91d027 100644
> --- a/drivers/powercap/intel_rapl_common.c
> +++ b/drivers/powercap/intel_rapl_common.c
> @@ -893,7 +893,7 @@ static int rapl_write_pl_data(struct rapl_domain
> *rd, int pl,
>         if (!is_pl_valid(rd, pl))
>                 return -EINVAL;
>  
> -       if (rd->rpl[pl].locked) {
> +       if (rd->rpl[pl].locked && pl_prim == PL_LIMIT) {
>                 pr_warn("%s:%s:%s locked by BIOS\n", rd->rp->name,
> rd-
> > name, pl_names[pl]);
>                 return -EACCES;
>         }
> 
> 
> 
> 
>