diff mbox series

powercap: intel_rapl: Fix setting of Power Limit 4 to 0

Message ID 20230906000640.2919607-1-srinivas.pandruvada@linux.intel.com
State New
Headers show
Series powercap: intel_rapl: Fix setting of Power Limit 4 to 0 | expand

Commit Message

Srinivas Pandruvada Sept. 6, 2023, 12:06 a.m. UTC
System runs at minimum performance, once powercap RAPL package domain
"enabled" flag is toggled.

Setting RAPL package domain enabled flag to 0, results in setting of
power limit 4 (PL4) MSR 0x601 to 0. This implies disabling PL4 limit.
The PL4 limit controls the peak power. This can significantly change
the performance. Even worse, when the enabled flag is set to 1 again.
This will set PL4 MSR value to 0x01, which means reduce peak power to
0.125W. This will force the system to run at the lowest possible
performance.

This is caused by a change which assumes that there is an enable bit
in the PL4 MSR like other power limits.

In functions set_floor_freq_default() and rapl_remove_package(), call
rapl_write_pl_data with PL_ENABLE and PL_CLAMP for only power limit 1
and 2. Similarly don't read PL_ENABLE for PL4 to check the presence of
power limit 4. Power limit 4 support is based on CPU model in this
driver. No additional checks can be done.

Fixes: 9050a9cd5e4c ("powercap: intel_rapl: Cleanup Power Limits support")
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: stable@vger.kernel.org # v6.5+
---
 drivers/powercap/intel_rapl_common.c | 14 ++++----------
 1 file changed, 4 insertions(+), 10 deletions(-)

Comments

Zhang, Rui Sept. 6, 2023, 2:20 a.m. UTC | #1
Hi, Srinivas,

Thanks for the patch.

On Tue, 2023-09-05 at 17:06 -0700, Srinivas Pandruvada wrote:
> System runs at minimum performance, once powercap RAPL package domain
> "enabled" flag is toggled.
> 
> Setting RAPL package domain enabled flag to 0, results in setting of
> power limit 4 (PL4) MSR 0x601 to 0.

This is the bug. Setting enabled flag should only affect the Enable bit
but not the other bits.

>  This implies disabling PL4 limit.
> The PL4 limit controls the peak power. This can significantly change
> the performance. Even worse, when the enabled flag is set to 1 again.
> This will set PL4 MSR value to 0x01, which means reduce peak power to
> 0.125W. This will force the system to run at the lowest possible
> performance.
> 
> This is caused by a change which assumes that there is an enable bit
> in the PL4 MSR like other power limits.

MSR RAPL doesn't have PL4 enable bit, but TPMI RAPL does have per power
limit enable bit.

> 
> In functions set_floor_freq_default() and rapl_remove_package(), call
> rapl_write_pl_data with PL_ENABLE and PL_CLAMP for only power limit 1
> and 2. Similarly don't read PL_ENABLE for PL4 to check the presence
> of
> power limit 4.

IMO, the rootcause is that we expose an non-exits PL4_ENABLE primitive
for MSR Interface, and even worse we expose it wrongly that the
primitive uses the power limit bits.

Commit 9050a9cd5e4c ("powercap: intel_rapl: Cleanup Power Limits
support") pokes the MSR interface PL4_ENABLE primitive and exposes this
bug.

Sumeet is testing the below fix right now, and I suppose he will give
some update soon.

thanks,
rui

diff --git a/drivers/powercap/intel_rapl_common.c b/drivers/powercap/intel_rapl_common.c
index 8fac57b28f8a..d407f918876f 100644
--- a/drivers/powercap/intel_rapl_common.c
+++ b/drivers/powercap/intel_rapl_common.c
@@ -658,8 +658,6 @@ static struct rapl_primitive_info rpi_msr[NR_RAPL_PRIMITIVES] = {
 			    RAPL_DOMAIN_REG_LIMIT, ARBITRARY_UNIT, 0),
 	[PL2_CLAMP] = PRIMITIVE_INFO_INIT(PL2_CLAMP, POWER_LIMIT2_CLAMP, 48,
 			    RAPL_DOMAIN_REG_LIMIT, ARBITRARY_UNIT, 0),
-	[PL4_ENABLE] = PRIMITIVE_INFO_INIT(PL4_ENABLE, POWER_LIMIT4_MASK, 0,
-				RAPL_DOMAIN_REG_PL4, ARBITRARY_UNIT, 0),
 	[TIME_WINDOW1] = PRIMITIVE_INFO_INIT(TIME_WINDOW1, TIME_WINDOW1_MASK, 17,
 			    RAPL_DOMAIN_REG_LIMIT, TIME_UNIT, 0),
 	[TIME_WINDOW2] = PRIMITIVE_INFO_INIT(TIME_WINDOW2, TIME_WINDOW2_MASK, 49,
Srinivas Pandruvada Sept. 6, 2023, 2:36 p.m. UTC | #2
Hi Rui,

On Wed, 2023-09-06 at 02:20 +0000, Zhang, Rui wrote:
> Hi, Srinivas,
> 
> Thanks for the patch.
> 
> On Tue, 2023-09-05 at 17:06 -0700, Srinivas Pandruvada wrote:
> > System runs at minimum performance, once powercap RAPL package
> > domain
> > "enabled" flag is toggled.
> > 
> > Setting RAPL package domain enabled flag to 0, results in setting
> > of
> > power limit 4 (PL4) MSR 0x601 to 0.
> 
> This is the bug. Setting enabled flag should only affect the Enable
> bit
> but not the other bits.
> 
> >  This implies disabling PL4 limit.
> > The PL4 limit controls the peak power. This can significantly
> > change
> > the performance. Even worse, when the enabled flag is set to 1
> > again.
> > This will set PL4 MSR value to 0x01, which means reduce peak power
> > to
> > 0.125W. This will force the system to run at the lowest possible
> > performance.
> > 
> > This is caused by a change which assumes that there is an enable
> > bit
> > in the PL4 MSR like other power limits.
> 
> MSR RAPL doesn't have PL4 enable bit, but TPMI RAPL does have per
> power
> limit enable bit.
That is correct, but not sure that in practice used or not.

> 
> > 
> > In functions set_floor_freq_default() and rapl_remove_package(),
> > call
> > rapl_write_pl_data with PL_ENABLE and PL_CLAMP for only power limit
> > 1
> > and 2. Similarly don't read PL_ENABLE for PL4 to check the presence
> > of
> > power limit 4.
> 
> IMO, the rootcause is that we expose an non-exits PL4_ENABLE
> primitive
> for MSR Interface, and even worse we expose it wrongly that the
> primitive uses the power limit bits.
> 
> Commit 9050a9cd5e4c ("powercap: intel_rapl: Cleanup Power Limits
> support") pokes the MSR interface PL4_ENABLE primitive and exposes
> this
> bug.
> 
> Sumeet is testing the below fix right now, and I suppose he will give
> some update soon.
> 
> thanks,
> rui
> 
> diff --git a/drivers/powercap/intel_rapl_common.c
> b/drivers/powercap/intel_rapl_common.c
> index 8fac57b28f8a..d407f918876f 100644
> --- a/drivers/powercap/intel_rapl_common.c
> +++ b/drivers/powercap/intel_rapl_common.c
> @@ -658,8 +658,6 @@ static struct rapl_primitive_info
> rpi_msr[NR_RAPL_PRIMITIVES] = {
>                             RAPL_DOMAIN_REG_LIMIT, ARBITRARY_UNIT,
> 0),
>         [PL2_CLAMP] = PRIMITIVE_INFO_INIT(PL2_CLAMP,
> POWER_LIMIT2_CLAMP, 48,
>                             RAPL_DOMAIN_REG_LIMIT, ARBITRARY_UNIT,
> 0),
> -       [PL4_ENABLE] = PRIMITIVE_INFO_INIT(PL4_ENABLE,
> POWER_LIMIT4_MASK, 0,
> -                               RAPL_DOMAIN_REG_PL4, ARBITRARY_UNIT,
> 0),
>         [TIME_WINDOW1] = PRIMITIVE_INFO_INIT(TIME_WINDOW1,
> TIME_WINDOW1_MASK, 17,
>                             RAPL_DOMAIN_REG_LIMIT, TIME_UNIT, 0),
>         [TIME_WINDOW2] = PRIMITIVE_INFO_INIT(TIME_WINDOW2,
> TIME_WINDOW2_MASK, 49,
> 
> 
Let me try this, but this is not enough. The enable/disable is also
gets checked for presence of PL4.

Thanks,
Srinivas

>
Sumeet Pawnikar Sept. 6, 2023, 7:07 p.m. UTC | #3
> -----Original Message-----
> From: srinivas pandruvada <srinivas.pandruvada@linux.intel.com>
> Sent: Wednesday, September 6, 2023 8:06 PM
> To: Zhang, Rui <rui.zhang@intel.com>; rafael@kernel.org
> Cc: linux-pm@vger.kernel.org; Pawnikar, Sumeet R
> <sumeet.r.pawnikar@intel.com>; linux-kernel@vger.kernel.org;
> stable@vger.kernel.org
> Subject: Re: [PATCH] powercap: intel_rapl: Fix setting of Power Limit 4 to 0
> 
> Hi Rui,
> 
> On Wed, 2023-09-06 at 02:20 +0000, Zhang, Rui wrote:
> > Hi, Srinivas,
> >
> > Thanks for the patch.
> >
> > On Tue, 2023-09-05 at 17:06 -0700, Srinivas Pandruvada wrote:
> > > System runs at minimum performance, once powercap RAPL package
> > > domain "enabled" flag is toggled.
> > >
> > > Setting RAPL package domain enabled flag to 0, results in setting of
> > > power limit 4 (PL4) MSR 0x601 to 0.
> >
> > This is the bug. Setting enabled flag should only affect the Enable
> > bit but not the other bits.
> >
> > >  This implies disabling PL4 limit.
> > > The PL4 limit controls the peak power. This can significantly change
> > > the performance. Even worse, when the enabled flag is set to 1
> > > again.
> > > This will set PL4 MSR value to 0x01, which means reduce peak power
> > > to 0.125W. This will force the system to run at the lowest possible
> > > performance.
> > >
> > > This is caused by a change which assumes that there is an enable bit
> > > in the PL4 MSR like other power limits.
> >
> > MSR RAPL doesn't have PL4 enable bit, but TPMI RAPL does have per
> > power limit enable bit.
> That is correct, but not sure that in practice used or not.
> 
> >
> > >
> > > In functions set_floor_freq_default() and rapl_remove_package(),
> > > call
> > > rapl_write_pl_data with PL_ENABLE and PL_CLAMP for only power limit
> > > 1
> > > and 2. Similarly don't read PL_ENABLE for PL4 to check the presence
> > > of
> > > power limit 4.
> >
> > IMO, the rootcause is that we expose an non-exits PL4_ENABLE
> > primitive
> > for MSR Interface, and even worse we expose it wrongly that the
> > primitive uses the power limit bits.
> >
> > Commit 9050a9cd5e4c ("powercap: intel_rapl: Cleanup Power Limits
> > support") pokes the MSR interface PL4_ENABLE primitive and exposes
> > this
> > bug.
> >
> > Sumeet is testing the below fix right now, and I suppose he will give
> > some update soon.
> > 

Testing still going on. 

> > thanks,
> > rui
> >
> > diff --git a/drivers/powercap/intel_rapl_common.c
> > b/drivers/powercap/intel_rapl_common.c
> > index 8fac57b28f8a..d407f918876f 100644
> > --- a/drivers/powercap/intel_rapl_common.c
> > +++ b/drivers/powercap/intel_rapl_common.c
> > @@ -658,8 +658,6 @@ static struct rapl_primitive_info
> > rpi_msr[NR_RAPL_PRIMITIVES] = {
> >                             RAPL_DOMAIN_REG_LIMIT, ARBITRARY_UNIT,
> > 0),
> >         [PL2_CLAMP] = PRIMITIVE_INFO_INIT(PL2_CLAMP,
> > POWER_LIMIT2_CLAMP, 48,
> >                             RAPL_DOMAIN_REG_LIMIT, ARBITRARY_UNIT,
> > 0),
> > -       [PL4_ENABLE] = PRIMITIVE_INFO_INIT(PL4_ENABLE,
> > POWER_LIMIT4_MASK, 0,
> > -                               RAPL_DOMAIN_REG_PL4, ARBITRARY_UNIT,
> > 0),
> >         [TIME_WINDOW1] = PRIMITIVE_INFO_INIT(TIME_WINDOW1,
> > TIME_WINDOW1_MASK, 17,
> >                             RAPL_DOMAIN_REG_LIMIT, TIME_UNIT, 0),
> >         [TIME_WINDOW2] = PRIMITIVE_INFO_INIT(TIME_WINDOW2,
> > TIME_WINDOW2_MASK, 49,
> >
> >
> Let me try this, but this is not enough. The enable/disable is also
> gets checked for presence of PL4.
> 

Yes, we need the check as well.

Thanks,
Sumeet. 

> Thanks,
> Srinivas
> 
> >
diff mbox series

Patch

diff --git a/drivers/powercap/intel_rapl_common.c b/drivers/powercap/intel_rapl_common.c
index 5c2e6d5eea2a..0afedf7ad872 100644
--- a/drivers/powercap/intel_rapl_common.c
+++ b/drivers/powercap/intel_rapl_common.c
@@ -184,8 +184,6 @@  static int get_pl_prim(struct rapl_domain *rd, int pl, enum pl_prims prim)
 	case POWER_LIMIT4:
 		if (prim == PL_LIMIT)
 			return POWER_LIMIT4;
-		if (prim == PL_ENABLE)
-			return PL4_ENABLE;
 		/* PL4 would be around two times PL2, use same prim as PL2. */
 		if (prim == PL_MAX_POWER)
 			return MAX_POWER;
@@ -1033,17 +1031,13 @@  static void package_power_limit_irq_restore(struct rapl_package *rp)
 
 static void set_floor_freq_default(struct rapl_domain *rd, bool mode)
 {
-	int i;
-
 	/* always enable clamp such that p-state can go below OS requested
 	 * range. power capping priority over guranteed frequency.
 	 */
 	rapl_write_pl_data(rd, POWER_LIMIT1, PL_CLAMP, mode);
 
-	for (i = POWER_LIMIT2; i < NR_POWER_LIMITS; i++) {
-		rapl_write_pl_data(rd, i, PL_ENABLE, mode);
-		rapl_write_pl_data(rd, i, PL_CLAMP, mode);
-	}
+	rapl_write_pl_data(rd, POWER_LIMIT2, PL_ENABLE, mode);
+	rapl_write_pl_data(rd, POWER_LIMIT2, PL_CLAMP, mode);
 }
 
 static void set_floor_freq_atom(struct rapl_domain *rd, bool enable)
@@ -1458,7 +1452,7 @@  static void rapl_detect_powerlimit(struct rapl_domain *rd)
 			}
 		}
 
-		if (rapl_read_pl_data(rd, i, PL_ENABLE, false, &val64))
+		if (i != POWER_LIMIT4 && rapl_read_pl_data(rd, i, PL_ENABLE, false, &val64))
 			rd->rpl[i].name = NULL;
 	}
 }
@@ -1510,7 +1504,7 @@  void rapl_remove_package(struct rapl_package *rp)
 	for (rd = rp->domains; rd < rp->domains + rp->nr_domains; rd++) {
 		int i;
 
-		for (i = POWER_LIMIT1; i < NR_POWER_LIMITS; i++) {
+		for (i = POWER_LIMIT1; i <= POWER_LIMIT2; i++) {
 			rapl_write_pl_data(rd, i, PL_ENABLE, 0);
 			rapl_write_pl_data(rd, i, PL_CLAMP, 0);
 		}