[v2] x86,acpi: Limit "Dummy wait" workaround to older AMD and Intel processors

Message ID 20220923153801.9167-1-kprateek.nayak@amd.com
State New

Commit Message

K Prateek Nayak Sept. 23, 2022, 3:38 p.m. UTC
Processors based on the Zen microarchitecture support IOPORT based deeper
C-states. The ACPI idle driver reads the
acpi_gbl_FADT.xpm_timer_block.address in the IOPORT based C-state exit
path which is claimed to be a "Dummy wait op" and has been around since
ACPI's introduction to Linux dating back to Andy Grover's Mar 14, 2002
posting [1].

Old, circa 2002 chipsets have a bug which was elaborated by Andreas Mohr
back in 2006 in commit b488f02156d3d ("ACPI: restore comment justifying
'extra' P_LVLx access") where the commit log claims:
"this dummy read was about: STPCLK# doesn't get asserted in time on
(some) chipsets, which is why we need to have a dummy I/O read to delay
further instruction processing until the CPU is fully stopped."

This workaround is very painful on modern systems with a large number of
cores. The "inl()" can take thousands of cycles. Sampling certain
workloads with IBS on an AMD Zen3 system shows that a significant amount of
time is spent in the dummy op, which incorrectly gets accounted as
C-State residency. A large C-State residency value can prime the cpuidle
governor to recommend a deeper C-State during the subsequent idle
instances, starting a vicious cycle, leading to performance degradation
on workloads that rapidly switch between busy and idle phases.
(For the extent of the performance degradation, refer to link [2].)

The dummy wait is unnecessary on processors based on the Zen
microarchitecture (AMD family 17h+ and HYGON). Skip it to prevent
polluting the C-state residency information. Among the pre-family 17h
AMD processors, there has been at least one report of an AMD Athlon on a
VIA chipset (circa 2006) where this problem was seen (see [3] for
report by Andreas Mohr).

Modern Intel processors use MWAIT based C-States in the intel_idle driver
and are not impacted by this code path. For older Intel processors that
use the acpi_idle driver, a workaround was suggested by Dave Hansen and
Rafael J. Wysocki to regard all Intel chipsets using the IOPORT based
C-state management as being affected by this problem (see [4] for
the proposed workaround).

For these reasons, mark all the Intel processors and pre-family 17h
AMD processors with X86_BUG_STPCLK. In the acpi_idle driver, restrict the
dummy wait during IOPORT based C-state transitions to only these
processors.

Link: https://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux-fullhistory.git/commit/?id=972c16130d9dc182cedcdd408408d9eacc7d6a2d [1]
Link: https://lore.kernel.org/lkml/20220921063638.2489-1-kprateek.nayak@amd.com/ [2]
Link: https://lore.kernel.org/lkml/Yyy6l94G0O2B7Yh1@rhlx01.hs-esslingen.de/ [3]
Link: https://lore.kernel.org/lkml/88c17568-8694-940a-0f1f-9d345e8dcbdb@intel.com/ [4]

Suggested-by: Calvin Ong <calvin.ong@amd.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Pu Wen <puwen@hygon.cn>
Cc: stable@vger.kernel.org
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
v1->v2:
o Introduce X86_BUG_STPCLK to mark chipsets as being affected by the
  STPCLK# signal assertion issue.
o Mark all Intel chipsets and pre fam-17h AMD chipsets as being affected
  by the X86_BUG_STPCLK.
o Skip dummy xpm_timer_block read in IOPORT based C-state exit path in
  ACPI processor_idle if chipset is not affected by X86_BUG_STPCLK.
---
 arch/x86/include/asm/cpufeatures.h |  1 +
 arch/x86/kernel/cpu/amd.c          | 12 ++++++++++++
 arch/x86/kernel/cpu/intel.c        | 12 ++++++++++++
 drivers/acpi/processor_idle.c      |  8 ++++++++
 4 files changed, 33 insertions(+)

Comments

Peter Zijlstra Sept. 26, 2022, 12:07 p.m. UTC | #1
On Fri, Sep 23, 2022 at 09:08:01PM +0530, K Prateek Nayak wrote:
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index ef4775c6db01..fcd3617ed315 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -460,5 +460,6 @@
>  #define X86_BUG_MMIO_UNKNOWN		X86_BUG(26) /* CPU is too old and its MMIO Stale Data status is unknown */
>  #define X86_BUG_RETBLEED		X86_BUG(27) /* CPU is affected by RETBleed */
>  #define X86_BUG_EIBRS_PBRSB		X86_BUG(28) /* EIBRS is vulnerable to Post Barrier RSB Predictions */
> +#define X86_BUG_STPCLK			X86_BUG(29) /* STPCLK# signal does not get asserted in time during IOPORT based C-state entry */
>  
>  #endif /* _ASM_X86_CPUFEATURES_H */
> diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
> index 48276c0e479d..8cb5887a53a3 100644
> --- a/arch/x86/kernel/cpu/amd.c
> +++ b/arch/x86/kernel/cpu/amd.c
> @@ -988,6 +988,18 @@ static void init_amd(struct cpuinfo_x86 *c)
>  	if (!cpu_has(c, X86_FEATURE_XENPV))
>  		set_cpu_bug(c, X86_BUG_SYSRET_SS_ATTRS);
>  
> +	/*
> +	 * CPUs based on the Zen microarchitecture (Fam 17h onward) can
> +	 * guarantee that STPCLK# signal is asserted in time after the
> +	 * P_LVL2 read to freeze execution after an IOPORT based C-state
> +	 * entry. Among the older AMD processors, there has been at least
> +	 * one report of an AMD Athlon processor on a VIA chipset
> +	 * (circa 2006) having this issue. Mark all these older AMD
> +	 * processor families as being affected.
> +	 */
> +	if (c->x86 < 0x17)
> +		set_cpu_bug(c, X86_BUG_STPCLK);
> +
>  	/*
>  	 * Turn on the Instructions Retired free counter on machines not
>  	 * susceptible to erratum #1054 "Instructions Retired Performance
> diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
> index 2d7ea5480ec3..96fe1320c238 100644
> --- a/arch/x86/kernel/cpu/intel.c
> +++ b/arch/x86/kernel/cpu/intel.c
> @@ -696,6 +696,18 @@ static void init_intel(struct cpuinfo_x86 *c)
>  		((c->x86_model == INTEL_FAM6_ATOM_GOLDMONT)))
>  		set_cpu_bug(c, X86_BUG_MONITOR);
>  
> +	/*
> +	 * Intel chipsets prior to Nehalem used the ACPI processor_idle
> +	 * driver for C-state management. Some of these processors that
> +	 * used IOPORT based C-states could not guarantee that STPCLK#
> +	 * signal gets asserted in time after P_LVL2 read to freeze
> +	 * execution properly. Since a clear cut-off point is not known
> +	 * as to when this bug was solved, mark all the chipsets as
> +	 * being affected. Only the ones that use IOPORT based C-state
> +	 * transitions via the acpi_idle driver will be impacted.
> +	 */
> +	set_cpu_bug(c, X86_BUG_STPCLK);
> +
>  #ifdef CONFIG_X86_64
>  	if (c->x86 == 15)
>  		c->x86_cache_alignment = c->x86_clflush_size * 2;

Quiz time:

  #define X86_VENDOR_INTEL       0
  #define X86_VENDOR_CYRIX       1
  #define X86_VENDOR_AMD         2
  #define X86_VENDOR_UMC         3
  #define X86_VENDOR_CENTAUR     5
  #define X86_VENDOR_TRANSMETA   7
  #define X86_VENDOR_NSC         8
  #define X86_VENDOR_HYGON       9
  #define X86_VENDOR_ZHAOXIN     10
  #define X86_VENDOR_VORTEX      11
  #define X86_VENDOR_NUM         12
  #define X86_VENDOR_UNKNOWN     0xff

For how many of the above have you changed behaviour?

Not to mention that this is the gazillion-th time AMD has failed to
change HYGON in lock-step. That's Zen too -- deal with it.
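
One way to work through the quiz is a small user-space model of the
v2 logic. This is only a sketch, not kernel code: struct cpu_model,
has_stpclk_bug_v2(), dummy_wait_before(), dummy_wait_after(),
behaviour_changed() and count_changed_vendors() are hypothetical names
introduced here. It assumes that, before the patch, wait_for_freeze()
did the dummy read on every bare-metal x86 CPU, and that after the
patch it does so only when X86_BUG_STPCLK is set, which v2 sets only in
init_intel() and init_amd().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Vendor IDs as quoted in the review above (X86_VENDOR_NUM is a count, not a vendor). */
#define X86_VENDOR_INTEL       0
#define X86_VENDOR_CYRIX       1
#define X86_VENDOR_AMD         2
#define X86_VENDOR_UMC         3
#define X86_VENDOR_CENTAUR     5
#define X86_VENDOR_TRANSMETA   7
#define X86_VENDOR_NSC         8
#define X86_VENDOR_HYGON       9
#define X86_VENDOR_ZHAOXIN     10
#define X86_VENDOR_VORTEX      11
#define X86_VENDOR_UNKNOWN     0xff

/* Hypothetical stand-in for the two cpuinfo_x86 fields the patch looks at. */
struct cpu_model {
	unsigned int vendor;
	unsigned int family;
};

/* Models v2: only the Intel and AMD init paths ever set X86_BUG_STPCLK. */
static bool has_stpclk_bug_v2(const struct cpu_model *c)
{
	assert(c != NULL);
	if (c->vendor == X86_VENDOR_INTEL)
		return true;
	if (c->vendor == X86_VENDOR_AMD)
		return c->family < 0x17;
	return false;	/* every other vendor is left unmarked */
}

/* Assumption: pre-patch, the dummy read ran on all bare-metal x86 CPUs. */
static bool dummy_wait_before(const struct cpu_model *c)
{
	(void)c;
	return true;
}

/* Post-patch, the dummy read runs only when the bug bit is set. */
static bool dummy_wait_after(const struct cpu_model *c)
{
	return has_stpclk_bug_v2(c);
}

static bool behaviour_changed(const struct cpu_model *c)
{
	return dummy_wait_before(c) != dummy_wait_after(c);
}

/*
 * Counts the vendors from the quoted list whose behaviour changes.
 * Simplification: the same family value is applied to every vendor.
 */
static size_t count_changed_vendors(unsigned int family)
{
	static const unsigned int vendors[] = {
		X86_VENDOR_INTEL, X86_VENDOR_CYRIX, X86_VENDOR_AMD,
		X86_VENDOR_UMC, X86_VENDOR_CENTAUR, X86_VENDOR_TRANSMETA,
		X86_VENDOR_NSC, X86_VENDOR_HYGON, X86_VENDOR_ZHAOXIN,
		X86_VENDOR_VORTEX, X86_VENDOR_UNKNOWN,
	};
	size_t i, changed = 0;

	for (i = 0; i < sizeof(vendors) / sizeof(vendors[0]); i++) {
		struct cpu_model c = { vendors[i], family };

		if (behaviour_changed(&c))
			changed++;
	}
	return changed;
}
```

Under these assumptions, count_changed_vendors(0x10) yields 9 and
count_changed_vendors(0x19) yields 10: every vendor other than Intel
and pre-family 17h AMD silently loses the dummy wait, not just the Zen
based parts the commit message talks about.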

Patch

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index ef4775c6db01..fcd3617ed315 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -460,5 +460,6 @@ 
 #define X86_BUG_MMIO_UNKNOWN		X86_BUG(26) /* CPU is too old and its MMIO Stale Data status is unknown */
 #define X86_BUG_RETBLEED		X86_BUG(27) /* CPU is affected by RETBleed */
 #define X86_BUG_EIBRS_PBRSB		X86_BUG(28) /* EIBRS is vulnerable to Post Barrier RSB Predictions */
+#define X86_BUG_STPCLK			X86_BUG(29) /* STPCLK# signal does not get asserted in time during IOPORT based C-state entry */
 
 #endif /* _ASM_X86_CPUFEATURES_H */
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 48276c0e479d..8cb5887a53a3 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -988,6 +988,18 @@  static void init_amd(struct cpuinfo_x86 *c)
 	if (!cpu_has(c, X86_FEATURE_XENPV))
 		set_cpu_bug(c, X86_BUG_SYSRET_SS_ATTRS);
 
+	/*
+	 * CPUs based on the Zen microarchitecture (Fam 17h onward) can
+	 * guarantee that STPCLK# signal is asserted in time after the
+	 * P_LVL2 read to freeze execution after an IOPORT based C-state
+	 * entry. Among the older AMD processors, there has been at least
+	 * one report of an AMD Athlon processor on a VIA chipset
+	 * (circa 2006) having this issue. Mark all these older AMD
+	 * processor families as being affected.
+	 */
+	if (c->x86 < 0x17)
+		set_cpu_bug(c, X86_BUG_STPCLK);
+
 	/*
 	 * Turn on the Instructions Retired free counter on machines not
 	 * susceptible to erratum #1054 "Instructions Retired Performance
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 2d7ea5480ec3..96fe1320c238 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -696,6 +696,18 @@  static void init_intel(struct cpuinfo_x86 *c)
 		((c->x86_model == INTEL_FAM6_ATOM_GOLDMONT)))
 		set_cpu_bug(c, X86_BUG_MONITOR);
 
+	/*
+	 * Intel chipsets prior to Nehalem used the ACPI processor_idle
+	 * driver for C-state management. Some of these processors that
+	 * used IOPORT based C-states could not guarantee that STPCLK#
+	 * signal gets asserted in time after P_LVL2 read to freeze
+	 * execution properly. Since a clear cut-off point is not known
+	 * as to when this bug was solved, mark all the chipsets as
+	 * being affected. Only the ones that use IOPORT based C-state
+	 * transitions via the acpi_idle driver will be impacted.
+	 */
+	set_cpu_bug(c, X86_BUG_STPCLK);
+
 #ifdef CONFIG_X86_64
 	if (c->x86 == 15)
 		c->x86_cache_alignment = c->x86_clflush_size * 2;
diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index 16a1663d02d4..493f9ccdb72d 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -528,6 +528,14 @@  static int acpi_idle_bm_check(void)
 static void wait_for_freeze(void)
 {
 #ifdef	CONFIG_X86
+	/*
+	 * A dummy wait operation is only required for those chipsets
+	 * that cannot assert STPCLK# signal in time after P_LVL2 read.
+	 * If a chipset is not affected by this problem, skip it.
+	 */
+	if (!static_cpu_has_bug(X86_BUG_STPCLK))
+		return;
+
 	/* No delay is needed if we are in guest */
 	if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
 		return;