
[3/3] x86/fpu: Don't support kernel-mode FPU when irqs_disabled()

Message ID 20250516231858.27899-4-ebiggers@kernel.org
State New
Series x86: Don't support kernel-mode FPU with hardirqs disabled

Commit Message

Eric Biggers May 16, 2025, 11:18 p.m. UTC
From: Eric Biggers <ebiggers@google.com>

Make irq_fpu_usable() return false when irqs_disabled().  That makes the
irqs_disabled() checks in kernel_fpu_begin_mask() and kernel_fpu_end()
unnecessary, so also remove those.

Rationale:

- There's no known use case for kernel-mode FPU when irqs_disabled().
  arm64 and riscv already disallow kernel-mode FPU when irqs_disabled().
  __save_processor_state() previously did expect kernel_fpu_begin() and
  kernel_fpu_end() to work when irqs_disabled(), but this was a
  different use case and not actual kernel-mode FPU use.

- This is more efficient, since one call to irqs_disabled() replaces two
  irqs_disabled() and one in_hardirq().

- This fixes irq_fpu_usable() to correctly return false during CPU
  initialization.  Incorrectly returning true caused the SHA-256 library
  code, which is called when loading AMD microcode, to take a
  SIMD-optimized code path too early, causing a crash.  By correctly
  returning false from irq_fpu_usable(), the generic SHA-256 code
  correctly gets used instead.  (Note: SIMD-optimized SHA-256 doesn't
  get enabled until subsys_initcall, but CPU hotplug can happen later.)
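
  For illustration, the SHA-256 library dispatch has roughly this shape
  (simplified; the exact function and static-key names differ):

	static void sha256_blocks(u32 state[8], const u8 *data, size_t nblocks)
	{
		if (static_branch_likely(&have_sha_simd) && irq_fpu_usable()) {
			kernel_fpu_begin();
			sha256_blocks_simd(state, data, nblocks);	/* SIMD path */
			kernel_fpu_end();
		} else {
			sha256_blocks_generic(state, data, nblocks);	/* always safe */
		}
	}

  With irq_fpu_usable() correctly returning false while hardirqs are still
  disabled during CPU initialization, the microcode-loading caller takes the
  generic branch instead of crashing in the SIMD one.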

Fixes: 11d7956d526f ("crypto: x86/sha256 - implement library instead of shash")
Reported-by: Ayush Jain <Ayush.Jain3@amd.com>
Closes: https://lore.kernel.org/r/20250516112217.GBaCcf6Yoc6LkIIryP@fat_crate.local
Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 arch/x86/kernel/fpu/core.c | 49 ++++++++++++++------------------------
 1 file changed, 18 insertions(+), 31 deletions(-)

Comments

Ingo Molnar May 17, 2025, 7:09 a.m. UTC | #1
* Eric Biggers <ebiggers@kernel.org> wrote:

> From: Eric Biggers <ebiggers@google.com>
> 
> Make irq_fpu_usable() return false when irqs_disabled().  That makes the
> irqs_disabled() checks in kernel_fpu_begin_mask() and kernel_fpu_end()
> unnecessary, so also remove those.
> 
> Rationale:
> 
> - There's no known use case for kernel-mode FPU when irqs_disabled().

Except EFI?

>   arm64 and riscv already disallow kernel-mode FPU when irqs_disabled().
>   __save_processor_state() previously did expect kernel_fpu_begin() and
>   kernel_fpu_end() to work when irqs_disabled(), but this was a
>   different use case and not actual kernel-mode FPU use.
> 
> - This is more efficient, since one call to irqs_disabled() replaces two
>   irqs_disabled() and one in_hardirq().

This is noise compared to the overhead of saving/restoring vector CPU 
context ...

> - This fixes irq_fpu_usable() to correctly return false during CPU
>   initialization.  Incorrectly returning true caused the SHA-256 library
>   code, which is called when loading AMD microcode, to take a
>   SIMD-optimized code path too early, causing a crash.  By correctly
>   returning false from irq_fpu_usable(), the generic SHA-256 code
>   correctly gets used instead.  (Note: SIMD-optimized SHA-256 doesn't
>   get enabled until subsys_initcall, but CPU hotplug can happen later.)

Alternatively we could set in_kernel_fpu during CPU bootstrap, and 
clear it once we know the FPU is usable? This is only a relatively 
short early boot period, with no scheduling, right?

Thanks,

	Ingo
Eric Biggers May 17, 2025, 6:39 p.m. UTC | #2
On Sat, May 17, 2025 at 09:09:01AM +0200, Ingo Molnar wrote:
> 
> * Eric Biggers <ebiggers@kernel.org> wrote:
> 
> > From: Eric Biggers <ebiggers@google.com>
> > 
> > Make irq_fpu_usable() return false when irqs_disabled().  That makes the
> > irqs_disabled() checks in kernel_fpu_begin_mask() and kernel_fpu_end()
> > unnecessary, so also remove those.
> > 
> > Rationale:
> > 
> > - There's no known use case for kernel-mode FPU when irqs_disabled().
> 
> Except EFI?

Yes, I remembered that just after sending this...  And EFI does want the ldmxcsr
and fninit, which makes it like actual kernel-mode FPU.  That implies we at
least need to disable BHs (and preemption) if they aren't already disabled.  But
if hardirqs may or may not already be disabled, that means we either need to
conditionally use local_bh_disable()/local_bh_enable() (or preempt_disable()/
preempt_enable() on PREEMPT_RT) as the current code does, or use
local_irq_save()/local_irq_restore().

If we did the latter, then all EFI calls would run with hardirqs disabled.  It
looks like hardirqs are currently intentionally disabled before some of the EFI
calls, but not all of them.  I'm not sure what the logic is there, and whether
it would be okay to just always disable them.
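
For illustration, the local_irq_save()/local_irq_restore() option would look
roughly like this (the per-CPU slot for the saved flags is made up, just to
sketch the idea):

	static DEFINE_PER_CPU(unsigned long, kfpu_saved_flags);	/* hypothetical */

	void kernel_fpu_begin_mask(unsigned int kfpu_mask)
	{
		unsigned long flags;

		local_irq_save(flags);
		this_cpu_write(kfpu_saved_flags, flags);
		/* ... rest of the existing begin sequence ... */
	}

	void kernel_fpu_end(void)
	{
		/* ... rest of the existing end sequence ... */
		local_irq_restore(this_cpu_read(kfpu_saved_flags));
	}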

> 
> >   arm64 and riscv already disallow kernel-mode FPU when irqs_disabled().
> >   __save_processor_state() previously did expect kernel_fpu_begin() and
> >   kernel_fpu_end() to work when irqs_disabled(), but this was a
> >   different use case and not actual kernel-mode FPU use.
> > 
> > - This is more efficient, since one call to irqs_disabled() replaces two
> >   irqs_disabled() and one in_hardirq().
> 
> This is noise compared to the overhead of saving/restoring vector CPU 
> context ...

In practice most calls to kernel_fpu_begin() don't actually do the
save_fpregs_to_fpstate(), since either TIF_NEED_FPU_LOAD is already set or it's
a kthread.  So the overhead from the other parts, like the EFLAGS checks and
ldmxcsr, is measurable, especially when processing small amounts of data.
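
For reference, the skip condition in current mainline looks roughly like this
(simplified):

	/* Only save the registers if the task's state is still live in them. */
	if (!(current->flags & (PF_KTHREAD | PF_USER_WORKER)) &&
	    !test_thread_flag(TIF_NEED_FPU_LOAD)) {
		set_thread_flag(TIF_NEED_FPU_LOAD);
		save_fpregs_to_fpstate(x86_task_fpu(current));
	}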

> > - This fixes irq_fpu_usable() to correctly return false during CPU
> >   initialization.  Incorrectly returning true caused the SHA-256 library
> >   code, which is called when loading AMD microcode, to take a
> >   SIMD-optimized code path too early, causing a crash.  By correctly
> >   returning false from irq_fpu_usable(), the generic SHA-256 code
> >   correctly gets used instead.  (Note: SIMD-optimized SHA-256 doesn't
> >   get enabled until subsys_initcall, but CPU hotplug can happen later.)
> 
> Alternatively we could set in_kernel_fpu during CPU bootstrap, and 
> clear it once we know the FPU is usable? This is only a relatively 
> short early boot period, with no scheduling, right?

Yes, if there isn't agreement on this approach we can do that instead.  Say:

- Replace in_kernel_fpu with kernel_fpu_supported, with the opposite meaning
  (so that the initial value of false means "unsupported")
- fpu__init_cpu() sets it to true
- cpu_disable_common() sets it to false
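
A minimal sketch of that alternative (kernel_fpu_begin()/kernel_fpu_end() would
then clear and set the flag in place of in_kernel_fpu; hook points as listed
above):

	/* defaults to false, i.e. "kernel-mode FPU not usable on this CPU yet" */
	static DEFINE_PER_CPU(bool, kernel_fpu_supported);

	bool irq_fpu_usable(void)
	{
		if (!this_cpu_read(kernel_fpu_supported))
			return false;	/* early boot, offlined CPU, or nested use */
		...			/* existing NMI/hardirq/softirq checks */
	}

	/* fpu__init_cpu():      this_cpu_write(kernel_fpu_supported, true);  */
	/* cpu_disable_common(): this_cpu_write(kernel_fpu_supported, false); */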
Ingo Molnar May 18, 2025, 6:34 a.m. UTC | #3
* Eric Biggers <ebiggers@kernel.org> wrote:

> > Alternatively we could set in_kernel_fpu during CPU bootstrap, and 
> > clear it once we know the FPU is usable? This is only a relatively 
> > short early boot period, with no scheduling, right?
> 
> Yes, if there isn't agreement on this approach we can do that 
> instead.  Say:
> 
> - Replace in_kernel_fpu with kernel_fpu_supported, with the opposite 
>   meaning (so that the initial value of false means "unsupported")

I'm not against simplifying the x86 FPU model to exclude IRQs-off
context (especially if it also micro-optimizes some of the key runtime
kernel-FPU primitives), but it has to be a full solution and we'll have
to see how complicated the EFI changes get.

Ie. without seeing the full cost-benefit balance it's hard to call this
in advance. Mind sending a full series that addresses the EFI case too?

Thanks,

	Ingo
Eric Biggers May 18, 2025, 8:01 p.m. UTC | #4
On Sun, May 18, 2025 at 03:18:58PM +0200, Ard Biesheuvel wrote:
> On Sun, 18 May 2025 at 08:34, Ingo Molnar <mingo@kernel.org> wrote:
> >
> >
> > * Eric Biggers <ebiggers@kernel.org> wrote:
> >
> > > > Alternatively we could set in_kernel_fpu during CPU bootstrap, and
> > > > clear it once we know the FPU is usable? This is only a relatively
> > > > short early boot period, with no scheduling, right?
> > >
> > > Yes, if there isn't agreement on this approach we can do that
> > > instead.  Say:
> > >
> > > - Replace in_kernel_fpu with kernel_fpu_supported, with the opposite
> > >   meaning (so that the initial value of false means "unsupported")
> >
> > I'm not against simplifying the x86 FPU model to exclude IRQs-off
> > context (especially if it also micro-optimizes some of the key runtime
> > kernel-FPU primitives), but it has to be a full solution and we'll have
> > to see how complicated the EFI changes get.
> >
> > Ie. without seeing the full cost-benefit balance it's hard to call this
> > in advance. Mind sending a full series that addresses the EFI case too?
> >
> 
> EFI services are only called with IRQs disabled in exceptional cases,
> so it would be unfortunate if it prevented us from making meaningful
> improvements here. In ordinary cases, they are called from a
> workqueue, and I'd prefer it if we can address this without calling
> all EFI services with interrupts disabled either.
> 
> AIUI, the reason we cannot tolerate IRQs being disabled is because
> re-enabling softirqs will complain if IRQs are disabled, due to the
> fact that handling softirqs should not be attempted at that point?
> 
> I don't know the history here, but I wonder if that isn't overly
> pedantic? Disabling softirqs could be avoided entirely when IRQs are
> off, given that they are disabled implicitly already. But why then is
> it not permitted to disable and re-enable softirqs under this
> condition, given that it makes no difference? Or perhaps I'm missing
> something here.
> 
> A good way to trigger such an exceptional case is running a kernel
> with efi-pstore and lkdtm built-in under QEMU with OVMF, and do
> 
> # echo PANIC > /sys/kernel/debug/provoke-crash/DIRECT
> 
> Another case that likely executes with IRQs disabled (but I haven't
> double checked) is reset_system(), which may return with an error, or
> reboot/poweroff the machine and never return.

That makes sense to me.  preempt_disable() and preempt_enable() are already
allowed when IRQs are disabled, and I'm not sure why local_bh_disable() and
local_bh_enable() are different.  local_bh_enable() already uses
local_irq_save(flags) instead of local_irq_disable(), so it seems it's sort of
intended to work when IRQs are disabled, despite the
lockdep_assert_irqs_enabled().

Anyway, that would point to continuing to support kernel-mode FPU when IRQs are
disabled.  But also EFI needs it anyway, unless we refactor it to use
kernel_fpu_begin() and kernel_fpu_end() only when irq_fpu_usable() and otherwise
use different code, analogous to what arm64 does.
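
For illustration, such a refactor might look roughly like this at the EFI call
sites (the explicit save/restore helpers are hypothetical, not existing code):

	if (irq_fpu_usable()) {
		kernel_fpu_begin();
		status = efi_call(...);
		kernel_fpu_end();
	} else {
		efi_fpu_state_save(&buf);	/* hypothetical explicit save */
		status = efi_call(...);
		efi_fpu_state_restore(&buf);	/* hypothetical explicit restore */
	}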

So for now I've sent
https://lore.kernel.org/lkml/20250518193212.1822-1-ebiggers@kernel.org which
implements the other possible fix, where we just start keeping track of whether
the FPU has been initialized or not.

- Eric
Ingo Molnar May 19, 2025, 8:05 a.m. UTC | #5
* Eric Biggers <ebiggers@kernel.org> wrote:

> > # echo PANIC > /sys/kernel/debug/provoke-crash/DIRECT
> > 
> > Another case that likely executes with IRQs disabled (but I haven't
> > double checked) is reset_system(), which may return with an error, or
> > reboot/poweroff the machine and never return.
> 
> That makes sense to me.  preempt_disable() and preempt_enable() are already
> allowed when IRQs are disabled, and I'm not sure why local_bh_disable() and
> local_bh_enable() are different.

Because local_bh_enable() may run softirq handlers immediately if 
there's pending softirqs, which shouldn't be done in hardirq context.

This is a key optimization of the Linux networking code, which uses 
BH-off/BH-on sections instead of IRQS-off/IRQS-on critical sections, 
for performance reasons.

Thanks,

	Ingo
Ard Biesheuvel May 19, 2025, 9:49 a.m. UTC | #6
On Mon, 19 May 2025 at 10:06, Ingo Molnar <mingo@kernel.org> wrote:
>
>
> * Eric Biggers <ebiggers@kernel.org> wrote:
>
> > > # echo PANIC > /sys/kernel/debug/provoke-crash/DIRECT
> > >
> > > Another case that likely executes with IRQs disabled (but I haven't
> > > double checked) is reset_system(), which may return with an error, or
> > > reboot/poweroff the machine and never return.
> >
> > That makes sense to me.  preempt_disable() and preempt_enable() are already
> > allowed when IRQs are disabled, and I'm not sure why local_bh_disable() and
> > local_bh_enable() are different.
>
> Because local_bh_enable() may run softirq handlers immediately if
> there's pending softirqs, which shouldn't be done in hardirq context.
>

Sure, but why is that mandatory?

preempt_disable() has preempt_enable() and preempt_enable_no_resched()
counterparts. Could we have a local_bh_enable_no_xxx() version that
re-enables async softirq processing on the current CPU but does not
kick off a synchronous processing run?
Ingo Molnar May 20, 2025, 7:42 a.m. UTC | #7
* Ard Biesheuvel <ardb@kernel.org> wrote:

> On Mon, 19 May 2025 at 14:57, Ingo Molnar <mingo@kernel.org> wrote:
> >
> >
> > * Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> > > On Mon, 19 May 2025 at 10:06, Ingo Molnar <mingo@kernel.org> wrote:
> > > >
> > > >
> > > > * Eric Biggers <ebiggers@kernel.org> wrote:
> > > >
> > > > > > # echo PANIC > /sys/kernel/debug/provoke-crash/DIRECT
> > > > > >
> > > > > > Another case that likely executes with IRQs disabled (but I 
> > > > > > haven't double checked) is reset_system(), which may return 
> > > > > > with an error, or reboot/poweroff the machine and never 
> > > > > > return.
> > > > >
> > > > > That makes sense to me.  preempt_disable() and 
> > > > > preempt_enable() are already allowed when IRQs are disabled, 
> > > > > and I'm not sure why local_bh_disable() and local_bh_enable() 
> > > > > are different.
> > > >
> > > > Because local_bh_enable() may run softirq handlers immediately 
> > > > if there's pending softirqs, which shouldn't be done in hardirq 
> > > > context.
> > >
> > > Sure, but why is that mandatory?
> > >
> > >
> > > preempt_disable() has preempt_enable() and 
> > > preempt_enable_no_resched() counterparts.
> >
> > > [...] Could we have a local_bh_enable_no_xxx() version that
> > > re-enables async softirq processing on the current CPU but does not
> > > kick off a synchronous processing run?
> >
> > Yes, that's what __local_bh_enable() does, but if we used it for
> > kernel_fpu_end() we'd be introducing random softirq processing
> > latencies. The softirq execution model is for softirqs to be
> > immediately executed after local_bh_enable(), and various networking
> > code is tuned to that behavior.
> >
> 
> All of that only applies when re-enabling softirqs with IRQs enabled.

Yes, of course. BHs in the networking code are typically 
disabled/enabled when IRQs are on. It's the whole *point* of the 
facility: it was written as a 'lightweight' IRQs-on/off facility 
originally, back in the days when local_irq_save() was very expensive, 
especially on non-x86 platforms.

> I don't see why we'd need all of that.
> 
> Conceptually, kernel_fpu_end() would do
> 
> if (irqs_disabled())
>    local_bh_enable_no_xxx();
> else
>    local_bh_enable();

In normal kernel code local_bh_enable() obviously cannot be done with 
IRQs off, and local_bh_disable()/__local_bh_enable() is highly frowned 
upon because it's generally pointless: if irqs are off to begin with, 
why disable any BHs?

What you probably mean is to only disable BHs if IRQs are not off 
(because hardirqs-off state already disables BHs):

  kernel_fpu_begin()
	if (!irqs_disabled)
		local_bh_disable();

  kernel_fpu_end()
	if (!irqs_disabled())
		local_bh_enable();

... which is basically what the current code does:

        if (!irqs_disabled())
                fpregs_lock();

	...

        if (!irqs_disabled())
                fpregs_unlock();

BTW., maybe we should add a lockdep check to make sure we never enable 
hardirqs while kernel FPU handling is ongoing. It should be relatively 
straightforward and cheap.
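
One possible shape of that check (hypothetical sketch; it only validates the
state at the section boundaries, a complete version would need a hook in the
hardirq-enable path):

	static DEFINE_PER_CPU(bool, kfpu_started_irqs_off);	/* hypothetical */

	/* in kernel_fpu_begin_mask(): */
	this_cpu_write(kfpu_started_irqs_off, irqs_disabled());

	/* in kernel_fpu_end(): */
	lockdep_assert_once(!this_cpu_read(kfpu_started_irqs_off) ||
			    irqs_disabled());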

But that brings us far away from the original question:

> > > > > preempt_disable() and preempt_enable() are already allowed 
> > > > > when IRQs are disabled, and I'm not sure why 
> > > > > local_bh_disable() and local_bh_enable() are different.
> > > >
> > > > Because local_bh_enable() may run softirq handlers immediately 
> > > > if there's pending softirqs, which shouldn't be done in hardirq 
> > > > context.

To rephrase my answer: local_bh_disable()/enable() are part of the
softirq execution mechanism whose primary historical purpose was to be
a lighter-weight replacement for hardirq disable/enable critical
sections, combined with a relaxation of how long a softirq could run
versus a hardirq. It makes little sense to try to nest BH handling
primitives within hardirq critical sections.

Thanks,

	Ingo

Patch

diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 476393b1d5e8f..9b3c5e17f86cd 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -66,42 +66,31 @@  struct fpu *x86_task_fpu(struct task_struct *task)
  * Can we use the FPU in kernel mode with the
  * whole "kernel_fpu_begin/end()" sequence?
  */
 bool irq_fpu_usable(void)
 {
-	if (WARN_ON_ONCE(in_nmi()))
-		return false;
-
 	/*
-	 * In kernel FPU usage already active?  This detects any explicitly
-	 * nested usage in task or softirq context, which is unsupported.  It
-	 * also detects attempted usage in a hardirq that has interrupted a
-	 * kernel-mode FPU section.
+	 * To ensure that (non-explicitly-nested) kernel-mode FPU is always
+	 * usable in task and softirq contexts, kernel_fpu_begin() disables
+	 * preemption and softirqs, and kernel_fpu_end() re-enables them.  That
+	 * is not compatible with hardirqs being disabled (including hardirq
+	 * context), or with NMI context.  Support for kernel-mode FPU is not
+	 * needed in those contexts anyway.  Return false in those contexts.
+	 *
+	 * Returning false when irqs_disabled() also eliminates the need to
+	 * explicitly check whether the FPU has been initialized yet during CPU
+	 * initialization.  Before then, hardirqs are still disabled.
 	 */
+	if (irqs_disabled() || WARN_ON_ONCE(in_nmi()))
+		return false;
+
+	/* Catch any explicitly nested usage, which should never happen. */
 	if (this_cpu_read(in_kernel_fpu)) {
-		WARN_ON_FPU(!in_hardirq());
+		WARN_ON_FPU(1);
 		return false;
 	}
-
-	/*
-	 * When not in NMI or hard interrupt context, FPU can be used in:
-	 *
-	 * - Task context except from within fpregs_lock()'ed critical
-	 *   regions.
-	 *
-	 * - Soft interrupt processing context which cannot happen
-	 *   while in a fpregs_lock()'ed critical region.
-	 */
-	if (!in_hardirq())
-		return true;
-
-	/*
-	 * In hard interrupt context it's safe when soft interrupts
-	 * are enabled, which means the interrupt did not hit in
-	 * a fpregs_lock()'ed critical region.
-	 */
-	return !softirq_count();
+	return true;
 }
 EXPORT_SYMBOL(irq_fpu_usable);
 
 /*
  * Track AVX512 state use because it is known to slow the max clock
@@ -443,12 +432,11 @@  static __always_inline void __fpu_save_state(void)
 	__cpu_invalidate_fpregs_state();
 }
 
 void kernel_fpu_begin_mask(unsigned int kfpu_mask)
 {
-	if (!irqs_disabled())
-		fpregs_lock();
+	fpregs_lock();
 
 	WARN_ON_FPU(!irq_fpu_usable());
 	WARN_ON_FPU(this_cpu_read(in_kernel_fpu));
 
 	this_cpu_write(in_kernel_fpu, true);
@@ -467,12 +455,11 @@  EXPORT_SYMBOL_GPL(kernel_fpu_begin_mask);
 void kernel_fpu_end(void)
 {
 	WARN_ON_FPU(!this_cpu_read(in_kernel_fpu));
 
 	this_cpu_write(in_kernel_fpu, false);
-	if (!irqs_disabled())
-		fpregs_unlock();
+	fpregs_unlock();
 }
 EXPORT_SYMBOL_GPL(kernel_fpu_end);
 
 #ifdef CONFIG_PM_SLEEP
 /*