Message ID | 20241108122909.763663-3-patryk.wlazlyn@linux.intel.com |
---|---|
State | New |
Headers | show |
Series | SRF: Fix offline CPU preventing pc6 entry | expand |
On Tue, Nov 12, 2024 at 03:56:18PM +0100, Peter Zijlstra wrote: > On Tue, Nov 12, 2024 at 02:23:14PM +0100, Rafael J. Wysocki wrote: > > On Tue, Nov 12, 2024 at 12:47 PM Peter Zijlstra <peterz@infradead.org> wrote: > > > > > > On Fri, Nov 08, 2024 at 01:29:08PM +0100, Patryk Wlazlyn wrote: > > > > The generic implementation, based on cpuid leaf 0x5, for looking up the > > > > mwait hint for the deepest cstate, depends on them to be continuous in > > > > range [0, NUM_SUBSTATES-1]. While that is correct on most Intel x86 > > > > platforms, it is not architectural and may not result in reaching the > > > > most optimized idle state on some of them. > > > > > > > > Prefer cpuidle_play_dead() over the generic mwait_play_dead() loop and > > > > fallback to the later in case of missing enter_dead() handler. > > > > > > > > Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com> > > > > --- > > > > arch/x86/kernel/smpboot.c | 4 ++-- > > > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > > > > > diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c > > > > index 44c40781bad6..721bb931181c 100644 > > > > --- a/arch/x86/kernel/smpboot.c > > > > +++ b/arch/x86/kernel/smpboot.c > > > > @@ -1416,9 +1416,9 @@ void native_play_dead(void) > > > > play_dead_common(); > > > > tboot_shutdown(TB_SHUTDOWN_WFS); > > > > > > > > - mwait_play_dead(); > > > > if (cpuidle_play_dead()) > > > > - hlt_play_dead(); > > > > + mwait_play_dead(); > > > > + hlt_play_dead(); > > > > } > > > > > > Yeah, I don't think so. we don't want to accidentally hit > > > acpi_idle_play_dead(). > > > > Having inspected the code once again, I'm not sure what your concern is. > > > > :enter.dead() is set to acpi_idle_play_dead() for all states in ACPI > > idle - see acpi_processor_setup_cstates() and the role of the type > > check is to filter out bogus table entries (the "type" must be 1, 2, > > or 3 as per the spec). > > > > Then cpuidle_play_dead() calls drv->states[i].enter_dead() for the > > deepest state where it is set and if this is FFH, > > acpi_idle_play_dead() will return an error. So after the change, the > > code above will fall back to mwait_play_dead() then. > > > > Or am I missing anything? > > So it relies on there being a C2/C3 state enumerated and that being FFh. > Otherwise it will find a 'working' state and we're up a creek. > > Typically I expect C2/C3 FFh states will be there on Intel stuff, but it > seems awefully random to rely on this hole. AMD might unwittinly change > the ACPI driver (they're the main user) and then we'd be up a creek. AMD platforms won't be using FFH based states for offlined CPUs. We prefer IO based states when available, and HLT otherwise. > > Robustly we'd teach the ACPI driver about FFh and set enter_dead on > every state -- but we'd have to double check that with AMD. Works for us as long as those FFh states aren't used for play_dead on AMD platforms. How about something like this (completely untested) ---------------------x8---------------------------------------------------- diff --git a/arch/x86/kernel/acpi/cstate.c b/arch/x86/kernel/acpi/cstate.c index f3ffd0a3a012..bd611771fa6c 100644 --- a/arch/x86/kernel/acpi/cstate.c +++ b/arch/x86/kernel/acpi/cstate.c @@ -215,6 +215,24 @@ void __cpuidle acpi_processor_ffh_cstate_enter(struct acpi_processor_cx *cx) } EXPORT_SYMBOL_GPL(acpi_processor_ffh_cstate_enter); +static int acpi_processor_ffh_play_dead(struct acpi_processor_cx *cx) +{ + unsigned int cpu = smp_processor_id(); + struct cstate_entry *percpu_entry; + + /* + * This is ugly. But AMD processors don't prefer MWAIT based + * C-states when processors are offlined. + */ + if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD || + boot_cpu_data.x86_vendor == X86_VENDOR_HYGON) + return -ENODEV; + + percpu_entry = per_cpu_ptr(cpu_cstate_entry, cpu); + return mwait_play_dead_with_hints(percpu_entry->states[cx->index].eax); +} +EXPORT_SYMBOL_GPL(acpi_processor_ffh_play_dead); + static int __init ffh_cstate_init(void) { struct cpuinfo_x86 *c = &boot_cpu_data; diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c index 831fa4a12159..c535f5df9081 100644 --- a/drivers/acpi/processor_idle.c +++ b/drivers/acpi/processor_idle.c @@ -590,6 +590,8 @@ static int acpi_idle_play_dead(struct cpuidle_device *dev, int index) raw_safe_halt(); else if (cx->entry_method == ACPI_CSTATE_SYSTEMIO) { io_idle(cx->address); + } else if (cx->entry_method == ACPI_CSTATE_FFH) { + return acpi_procesor_ffh_play_dead(cx); } else return -ENODEV; } diff --git a/include/acpi/processor.h b/include/acpi/processor.h index e6f6074eadbf..38329dcdd2b9 100644 --- a/include/acpi/processor.h +++ b/include/acpi/processor.h @@ -280,6 +280,7 @@ int acpi_processor_ffh_cstate_probe(unsigned int cpu, struct acpi_processor_cx *cx, struct acpi_power_register *reg); void acpi_processor_ffh_cstate_enter(struct acpi_processor_cx *cstate); +int acpi_processor_ffh_dead(struct acpi_processor_cx *cstate); #else static inline void acpi_processor_power_init_bm_check(struct acpi_processor_flags @@ -300,6 +301,11 @@ static inline void acpi_processor_ffh_cstate_enter(struct acpi_processor_cx { return; } + +static inline acpi_processor_ffh_dead(struct acpi_processor_cx *cstate) +{ + return -ENODEV; +} #endif static inline int call_on_cpu(int cpu, long (*fn)(void *), void *arg, ---------------------x8-------------------------------------------------------- -- Thanks and Regards gautham.
On 11/13/24 03:41, Gautham R. Shenoy wrote: > + /* > + * This is ugly. But AMD processors don't prefer MWAIT based > + * C-states when processors are offlined. > + */ > + if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD || > + boot_cpu_data.x86_vendor == X86_VENDOR_HYGON) > + return -ENODEV; Can we get an X86_FEATURE for this, please? Either a positive one: X86_FEATURE_MWAIT_OK_FOR_OFFLINE or a negative one: X86_FEATURE_MWAIT_BUSTED_FOR_OFFLINE ... with better names. Or even a helper. Because if you add this AMD||HYGON check, it'll be at _least_ the second one of these for the same logical reason.
On 11/13/2024 5:22 PM, Peter Zijlstra wrote: > On Wed, Nov 13, 2024 at 05:11:38PM +0530, Gautham R. Shenoy wrote: > >> How about something like this (completely untested) >> >> ---------------------x8---------------------------------------------------- >> >> diff --git a/arch/x86/kernel/acpi/cstate.c b/arch/x86/kernel/acpi/cstate.c >> index f3ffd0a3a012..bd611771fa6c 100644 >> --- a/arch/x86/kernel/acpi/cstate.c >> +++ b/arch/x86/kernel/acpi/cstate.c >> @@ -215,6 +215,24 @@ void __cpuidle acpi_processor_ffh_cstate_enter(struct acpi_processor_cx *cx) >> } >> EXPORT_SYMBOL_GPL(acpi_processor_ffh_cstate_enter); >> >> +static int acpi_processor_ffh_play_dead(struct acpi_processor_cx *cx) >> +{ >> + unsigned int cpu = smp_processor_id(); >> + struct cstate_entry *percpu_entry; >> + >> + /* >> + * This is ugly. But AMD processors don't prefer MWAIT based >> + * C-states when processors are offlined. >> + */ >> + if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD || >> + boot_cpu_data.x86_vendor == X86_VENDOR_HYGON) >> + return -ENODEV; > Are there AMD systems with FFh idle states at all? I don't know. > Also, I don't think this works right, the loop in cpuidle_play_dead() > will terminate on this, and not try a shallower state which might have > IO/HLT on. I think you are right. >> + >> + percpu_entry = per_cpu_ptr(cpu_cstate_entry, cpu); >> + return mwait_play_dead_with_hints(percpu_entry->states[cx->index].eax); >> +} >> +EXPORT_SYMBOL_GPL(acpi_processor_ffh_play_dead);
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index 44c40781bad6..721bb931181c 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -1416,9 +1416,9 @@ void native_play_dead(void) play_dead_common(); tboot_shutdown(TB_SHUTDOWN_WFS); - mwait_play_dead(); if (cpuidle_play_dead()) - hlt_play_dead(); + mwait_play_dead(); + hlt_play_dead(); } #else /* ... !CONFIG_HOTPLUG_CPU */
The generic implementation, based on cpuid leaf 0x5, for looking up the mwait hint for the deepest cstate, depends on them to be continuous in range [0, NUM_SUBSTATES-1]. While that is correct on most Intel x86 platforms, it is not architectural and may not result in reaching the most optimized idle state on some of them. Prefer cpuidle_play_dead() over the generic mwait_play_dead() loop and fallback to the later in case of missing enter_dead() handler. Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com> --- arch/x86/kernel/smpboot.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)