Message ID | 20230419024419.324436-1-rui.zhang@intel.com |
---|---|
Headers | show |
Series | powercap/intel_rapl: Introduce RAPL TPMI support | expand |
Hi, Ville, On Tue, 2023-09-05 at 09:21 +0300, Ville Syrjälä wrote: > On Wed, Apr 19, 2023 at 10:44:13AM +0800, Zhang Rui wrote: > > The same set of operations are shared by different Powert Limits, > > including Power Limit get/set, Power Limit enable/disable, clamping > > enable/disable, time window get/set, and max power get/set, etc. > > > > But the same operation for different Power Limit has different > > primitives because they use different registers/register bits. > > > > A lot of dirty/duplicate code was introduced to handle this > > difference. > > > > Introduce a universal way to issue Power Limit operations. > > Instead of using hardcoded primitive name directly, use Power Limit > > id > > + operation type, and hide all the Power Limit difference details > > in a > > central place, get_pl_prim(). Two helpers, rapl_read_pl_data() and > > rapl_write_pl_data(), are introduced at the same time to simplify > > the > > code for issuing Power Limit operations. > > > > Signed-off-by: Zhang Rui <rui.zhang@intel.com> > > Tested-by: Wang Wendy <wendy.wang@intel.com> > > --- > > drivers/powercap/intel_rapl_common.c | 343 ++++++++++++----------- > > ---- > > include/linux/intel_rapl.h | 1 - > > 2 files changed, 146 insertions(+), 198 deletions(-) > > > > diff --git a/drivers/powercap/intel_rapl_common.c > > b/drivers/powercap/intel_rapl_common.c > > index 8e77df42257a..7f80c35e5c86 100644 > > --- a/drivers/powercap/intel_rapl_common.c > > +++ b/drivers/powercap/intel_rapl_common.c > <snip> > > @@ -818,6 +778,33 @@ static int rapl_write_data_raw(struct > > rapl_domain *rd, > > return ret; > > } > > > > +static int rapl_read_pl_data(struct rapl_domain *rd, int pl, > > + enum pl_prims pl_prim, bool xlate, > > u64 *data) > > +{ > > + enum rapl_primitives prim = get_pl_prim(pl, pl_prim); > > + > > + if (!is_pl_valid(rd, pl)) > > + return -EINVAL; > > + > > + return rapl_read_data_raw(rd, prim, xlate, data); > > +} > > + > > +static int rapl_write_pl_data(struct rapl_domain *rd, int pl, > > + enum pl_prims pl_prim, > > + unsigned long long value) > > +{ > > + enum rapl_primitives prim = get_pl_prim(pl, pl_prim); > > + > > + if (!is_pl_valid(rd, pl)) > > + return -EINVAL; > > + > > + if (rd->state & DOMAIN_STATE_BIOS_LOCKED) { > > + pr_warn("%s:%s:%s locked by BIOS\n", rd->rp->name, > > rd->name, pl_names[pl]); > > + return -EACCES; > > This seems to be causing a lot of WARN level dmesg spam [1] during > suspend/resume on several machines. I suppose previously the > warning was only printed when trying to change the limits explicitly, Before this patch, when enabling a power limit, it doesn't give a warning but returns -EACCES directly. After this patch, it gives a warning when any change is made to a locked power limit, and this includes enabling/setting the power limit, changing the time window, etc. So I think in your case, the RAPL Package domain is enabled upon resume for some reason, maybe requested by thermald, and that's why we see this new warning. The below change keeps the previous logic, can you confirm this? IMO, the new logic is right because making any change to a locked power limit is meaningless. Srinivas, Do we check if a domain/power_limit is locked before we enabling it in thermald? thanks, rui diff --git a/drivers/powercap/intel_rapl_common.c b/drivers/powercap/intel_rapl_common.c index 5c2e6d5eea2a..f6816a91d027 100644 --- a/drivers/powercap/intel_rapl_common.c +++ b/drivers/powercap/intel_rapl_common.c @@ -893,7 +893,7 @@ static int rapl_write_pl_data(struct rapl_domain *rd, int pl, if (!is_pl_valid(rd, pl)) return -EINVAL; - if (rd->rpl[pl].locked) { + if (rd->rpl[pl].locked && pl_prim == PL_LIMIT) { pr_warn("%s:%s:%s locked by BIOS\n", rd->rp->name, rd- >name, pl_names[pl]); return -EACCES; }
On Wed, 2023-09-06 at 03:14 +0000, Zhang, Rui wrote: > Hi, Ville, > > [...] > The below change keeps the previous logic, can you confirm this? > > IMO, the new logic is right because making any change to a > locked power limit is meaningless. > > Srinivas, > > Do we check if a domain/power_limit is locked before we enabling it > in > thermald? There is no way to check locked bit from user space. This was an issue several years back and thermald added logic to avoid trying. " if (ret == -ENODATA) { thd_log_info("powercap RAPL is BIOS locked, cannot update\n"); bios_locked = true; } " But here it seems that issue with suspend/resume. thermald doesn't do anything during suspend/resume. Thanks, Srinivas > > thanks, > rui > > diff --git a/drivers/powercap/intel_rapl_common.c > b/drivers/powercap/intel_rapl_common.c > index 5c2e6d5eea2a..f6816a91d027 100644 > --- a/drivers/powercap/intel_rapl_common.c > +++ b/drivers/powercap/intel_rapl_common.c > @@ -893,7 +893,7 @@ static int rapl_write_pl_data(struct rapl_domain > *rd, int pl, > if (!is_pl_valid(rd, pl)) > return -EINVAL; > > - if (rd->rpl[pl].locked) { > + if (rd->rpl[pl].locked && pl_prim == PL_LIMIT) { > pr_warn("%s:%s:%s locked by BIOS\n", rd->rp->name, > rd- > > name, pl_names[pl]); > return -EACCES; > } > > > > >