diff mbox series

[v3,2/3] x86/smp native_play_dead: Prefer cpuidle_play_dead() over mwait_play_dead()

Message ID 20241108122909.763663-3-patryk.wlazlyn@linux.intel.com
State New
Headers show
Series SRF: Fix offline CPU preventing pc6 entry | expand

Commit Message

Patryk Wlazlyn Nov. 8, 2024, 12:29 p.m. UTC
The generic implementation, based on cpuid leaf 0x5, for looking up the
mwait hint for the deepest cstate, depends on them to be continuous in
range [0, NUM_SUBSTATES-1]. While that is correct on most Intel x86
platforms, it is not architectural and may not result in reaching the
most optimized idle state on some of them.

Prefer cpuidle_play_dead() over the generic mwait_play_dead() loop and
fallback to the later in case of missing enter_dead() handler.

Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
---
 arch/x86/kernel/smpboot.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Gautham R. Shenoy Nov. 13, 2024, 11:41 a.m. UTC | #1
On Tue, Nov 12, 2024 at 03:56:18PM +0100, Peter Zijlstra wrote:
> On Tue, Nov 12, 2024 at 02:23:14PM +0100, Rafael J. Wysocki wrote:
> > On Tue, Nov 12, 2024 at 12:47 PM Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > On Fri, Nov 08, 2024 at 01:29:08PM +0100, Patryk Wlazlyn wrote:
> > > > The generic implementation, based on cpuid leaf 0x5, for looking up the
> > > > mwait hint for the deepest cstate, depends on them to be continuous in
> > > > range [0, NUM_SUBSTATES-1]. While that is correct on most Intel x86
> > > > platforms, it is not architectural and may not result in reaching the
> > > > most optimized idle state on some of them.
> > > >
> > > > Prefer cpuidle_play_dead() over the generic mwait_play_dead() loop and
> > > > fallback to the later in case of missing enter_dead() handler.
> > > >
> > > > Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
> > > > ---
> > > >  arch/x86/kernel/smpboot.c | 4 ++--
> > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> > > > index 44c40781bad6..721bb931181c 100644
> > > > --- a/arch/x86/kernel/smpboot.c
> > > > +++ b/arch/x86/kernel/smpboot.c
> > > > @@ -1416,9 +1416,9 @@ void native_play_dead(void)
> > > >       play_dead_common();
> > > >       tboot_shutdown(TB_SHUTDOWN_WFS);
> > > >
> > > > -     mwait_play_dead();
> > > >       if (cpuidle_play_dead())
> > > > -             hlt_play_dead();
> > > > +             mwait_play_dead();
> > > > +     hlt_play_dead();
> > > >  }
> > >
> > > Yeah, I don't think so. we don't want to accidentally hit
> > > acpi_idle_play_dead().
> > 
> > Having inspected the code once again, I'm not sure what your concern is.
> > 
> > :enter.dead() is set to acpi_idle_play_dead() for all states in ACPI
> > idle - see acpi_processor_setup_cstates() and the role of the type
> > check is to filter out bogus table entries (the "type" must be 1, 2,
> > or 3 as per the spec).
> > 
> > Then cpuidle_play_dead() calls drv->states[i].enter_dead() for the
> > deepest state where it is set and if this is FFH,
> > acpi_idle_play_dead() will return an error.  So after the change, the
> > code above will fall back to mwait_play_dead() then.
> > 
> > Or am I missing anything?
> 
> So it relies on there being a C2/C3 state enumerated and that being FFh.
> Otherwise it will find a 'working' state and we're up a creek.
> 
> Typically I expect C2/C3 FFh states will be there on Intel stuff, but it
> seems awefully random to rely on this hole. AMD might unwittinly change
> the ACPI driver (they're the main user) and then we'd be up a creek.

AMD platforms won't be using FFH based states for offlined CPUs. We
prefer IO based states when available, and HLT otherwise.

> 
> Robustly we'd teach the ACPI driver about FFh and set enter_dead on
> every state -- but we'd have to double check that with AMD.

Works for us as long as those FFh states aren't used for play_dead on
AMD platforms.

How about something like this (completely untested)

---------------------x8----------------------------------------------------

diff --git a/arch/x86/kernel/acpi/cstate.c b/arch/x86/kernel/acpi/cstate.c
index f3ffd0a3a012..bd611771fa6c 100644
--- a/arch/x86/kernel/acpi/cstate.c
+++ b/arch/x86/kernel/acpi/cstate.c
@@ -215,6 +215,24 @@ void __cpuidle acpi_processor_ffh_cstate_enter(struct acpi_processor_cx *cx)
 }
 EXPORT_SYMBOL_GPL(acpi_processor_ffh_cstate_enter);
 
+static int acpi_processor_ffh_play_dead(struct acpi_processor_cx *cx)
+{
+       unsigned int cpu = smp_processor_id();
+       struct cstate_entry *percpu_entry;
+
+       /*
+        * This is ugly. But AMD processors don't prefer MWAIT based
+        * C-states when processors are offlined.
+        */
+       if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||
+           boot_cpu_data.x86_vendor == X86_VENDOR_HYGON)
+               return -ENODEV;
+
+       percpu_entry = per_cpu_ptr(cpu_cstate_entry, cpu);
+       return mwait_play_dead_with_hints(percpu_entry->states[cx->index].eax);
+}
+EXPORT_SYMBOL_GPL(acpi_processor_ffh_play_dead);
+
 static int __init ffh_cstate_init(void)
 {
        struct cpuinfo_x86 *c = &boot_cpu_data;
diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index 831fa4a12159..c535f5df9081 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -590,6 +590,8 @@ static int acpi_idle_play_dead(struct cpuidle_device *dev, int index)
                        raw_safe_halt();
                else if (cx->entry_method == ACPI_CSTATE_SYSTEMIO) {
                        io_idle(cx->address);
+               } else if (cx->entry_method == ACPI_CSTATE_FFH) {
+                       return acpi_procesor_ffh_play_dead(cx);
                } else
                        return -ENODEV;
        }
diff --git a/include/acpi/processor.h b/include/acpi/processor.h
index e6f6074eadbf..38329dcdd2b9 100644
--- a/include/acpi/processor.h
+++ b/include/acpi/processor.h
@@ -280,6 +280,7 @@ int acpi_processor_ffh_cstate_probe(unsigned int cpu,
                                    struct acpi_processor_cx *cx,
                                    struct acpi_power_register *reg);
 void acpi_processor_ffh_cstate_enter(struct acpi_processor_cx *cstate);
+int acpi_processor_ffh_dead(struct acpi_processor_cx *cstate);
 #else
 static inline void acpi_processor_power_init_bm_check(struct
                                                      acpi_processor_flags
@@ -300,6 +301,11 @@ static inline void acpi_processor_ffh_cstate_enter(struct acpi_processor_cx
 {
        return;
 }
+
+static inline acpi_processor_ffh_dead(struct acpi_processor_cx *cstate)
+{
+       return -ENODEV;
+}
 #endif
 
 static inline int call_on_cpu(int cpu, long (*fn)(void *), void *arg,

---------------------x8--------------------------------------------------------

--
Thanks and Regards
gautham.
Dave Hansen Nov. 13, 2024, 4:14 p.m. UTC | #2
On 11/13/24 03:41, Gautham R. Shenoy wrote:
> +       /*
> +        * This is ugly. But AMD processors don't prefer MWAIT based
> +        * C-states when processors are offlined.
> +        */
> +       if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||
> +           boot_cpu_data.x86_vendor == X86_VENDOR_HYGON)
> +               return -ENODEV;

Can we get an X86_FEATURE for this, please?  Either a positive one:

	X86_FEATURE_MWAIT_OK_FOR_OFFLINE

or a negative one:

	X86_FEATURE_MWAIT_BUSTED_FOR_OFFLINE

... with better names.

Or even a helper.  Because if you add this AMD||HYGON check, it'll be at
_least_ the second one of these for the same logical reason.
Wysocki, Rafael J Nov. 13, 2024, 4:27 p.m. UTC | #3
On 11/13/2024 5:22 PM, Peter Zijlstra wrote:
> On Wed, Nov 13, 2024 at 05:11:38PM +0530, Gautham R. Shenoy wrote:
>
>> How about something like this (completely untested)
>>
>> ---------------------x8----------------------------------------------------
>>
>> diff --git a/arch/x86/kernel/acpi/cstate.c b/arch/x86/kernel/acpi/cstate.c
>> index f3ffd0a3a012..bd611771fa6c 100644
>> --- a/arch/x86/kernel/acpi/cstate.c
>> +++ b/arch/x86/kernel/acpi/cstate.c
>> @@ -215,6 +215,24 @@ void __cpuidle acpi_processor_ffh_cstate_enter(struct acpi_processor_cx *cx)
>>   }
>>   EXPORT_SYMBOL_GPL(acpi_processor_ffh_cstate_enter);
>>   
>> +static int acpi_processor_ffh_play_dead(struct acpi_processor_cx *cx)
>> +{
>> +       unsigned int cpu = smp_processor_id();
>> +       struct cstate_entry *percpu_entry;
>> +
>> +       /*
>> +        * This is ugly. But AMD processors don't prefer MWAIT based
>> +        * C-states when processors are offlined.
>> +        */
>> +       if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||
>> +           boot_cpu_data.x86_vendor == X86_VENDOR_HYGON)
>> +               return -ENODEV;
> Are there AMD systems with FFh idle states at all?

I don't know.


> Also, I don't think this works right, the loop in cpuidle_play_dead()
> will terminate on this, and not try a shallower state which might have
> IO/HLT on.

I think you are right.


>> +
>> +       percpu_entry = per_cpu_ptr(cpu_cstate_entry, cpu);
>> +       return mwait_play_dead_with_hints(percpu_entry->states[cx->index].eax);
>> +}
>> +EXPORT_SYMBOL_GPL(acpi_processor_ffh_play_dead);
diff mbox series

Patch

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 44c40781bad6..721bb931181c 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1416,9 +1416,9 @@  void native_play_dead(void)
 	play_dead_common();
 	tboot_shutdown(TB_SHUTDOWN_WFS);
 
-	mwait_play_dead();
 	if (cpuidle_play_dead())
-		hlt_play_dead();
+		mwait_play_dead();
+	hlt_play_dead();
 }
 
 #else /* ... !CONFIG_HOTPLUG_CPU */