mbox series

[v3,0/3] SRF: Fix offline CPU preventing pc6 entry

Message ID 20241108122909.763663-1-patryk.wlazlyn@linux.intel.com
Headers show
Series SRF: Fix offline CPU preventing pc6 entry | expand

Message

Patryk Wlazlyn Nov. 8, 2024, 12:29 p.m. UTC
Applied suggestions from Dave and Rafael.

Because we now call a smp function, we have to have ifdefs for
CONFIG_SMP or rely on mwait_play_dead_with_hint to return immediatelly
with an error on non-smp builds. I decided to do the later, but maybe we
should return -ENODEV (or other error constant) instead of 1. I am open
for suggestions.

Removing the existing "kexec hack" by bringing all offlined CPUs back
online before proceeding with the kexec would make it even simpler, but
I am not sure we can do that. It looks kind of obvious to me, but for
some reason the hack exist.

Changes since v2:
  The whole approach changed, but there main things are below.

  * Split mwait_play_dead (old code) into two parts:
    1. Computing mwait hint based on cpuid leaf 0x5
    2. Entering while(1) mwait loop with "kexec hack" handling

  * Prefer cpuidle_play_dead over mwait_play_dead by reordering calls in
    native_play_dead.

  * Add implementation for enter_dead() handler in intel_idle that calls
    executes old mwait_play_dead code, but with the mwait hint from the
    cpuidle state table.


Patryk Wlazlyn (3):
  x86/smp: Allow calling mwait_play_dead with arbitrary hint
  x86/smp native_play_dead: Prefer cpuidle_play_dead() over
    mwait_play_dead()
  intel_idle: Provide enter_dead() handler for SRF

 arch/x86/include/asm/smp.h |  6 ++++++
 arch/x86/kernel/smpboot.c  | 29 ++++++++++++++++++++---------
 drivers/idle/intel_idle.c  | 16 ++++++++++++++++
 3 files changed, 42 insertions(+), 9 deletions(-)

Comments

Thomas Gleixner Nov. 12, 2024, 7:17 p.m. UTC | #1
On Tue, Nov 12 2024 at 12:28, Rafael J. Wysocki wrote:
> On Tue, Nov 12, 2024 at 12:18 PM Patryk Wlazlyn
> <patryk.wlazlyn@linux.intel.com> wrote:
>> I don't think so. The old algorithm gives fairly good heuristic for computing
>> the mwait hint for the deepest cstate. Even though it's not guaranteed to work,
>> it does work on most of the platforms that don't early return. I think we should
>> leave it, but prefer idle_driver.
>
> IOW, as a fallback mechanism, it is as good as it gets.
>
> As the primary source of information though, not quite.

So we have at least 5 places in the kernel which evaluate CPUID leaf 0x5
in different ways.

Can we please have _ONE_ function which evaluates the leaf correctly
once and provides the required information for all places ready to use?

Thanks,

        tglx
Patryk Wlazlyn Nov. 14, 2024, 5:13 p.m. UTC | #2
> That's not really working:
>
>   1) Regular kexec offlines them again.

But then, they execute hlt loop in the reboot vector right?
I think that's fine. We just don't want to be woken up from the mwait
when the RIP points to garbage.

>   2) Kexec in panic can't do any of that.

Does it make sense to change the kexec, so that every CPU is forced
into the reboot vector, which I believe happens for online CPUs?
We don't have to online them all the way. Maybe minimal bringup just
to be able to send them the IPI?

As a side note, It's not that important for this patchset. I think I
can make it work without simplifying the existing hack, but crossed my
mind at some point that maybe we could do that.