diff mbox series

[v3,2/2] arm64: efi: Account for the EFI runtime stack in stack unwinder

Message ID 20230106174703.1883495-3-ardb@kernel.org
State Accepted
Commit 7ea55715c421d22c1b63f7129cae6a654091b695
Headers show
Series efi: Follow-up fixes for EFI runtime stack | expand

Commit Message

Ard Biesheuvel Jan. 6, 2023, 5:47 p.m. UTC
The EFI runtime services run from a dedicated stack now, and so the
stack unwinder needs to be informed about this.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/stacktrace.h | 15 +++++++++++++++
 arch/arm64/kernel/stacktrace.c      | 12 ++++++++++++
 2 files changed, 27 insertions(+)

Comments

Nathan Chancellor Jan. 10, 2023, 8:48 p.m. UTC | #1
Hi Ard,

On Fri, Jan 06, 2023 at 06:47:03PM +0100, Ard Biesheuvel wrote:
> The EFI runtime services run from a dedicated stack now, and so the
> stack unwinder needs to be informed about this.
> 
> Acked-by: Mark Rutland <mark.rutland@arm.com>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

Apologies if this has been reported and/or fixed already, I searched
lore and did not find anything but I just bisected a QEMU boot hang [1]
that we see in the ClangBuiltLinux CI with Fedora's configuration [2] to
this change in next-20220110 as commit a7334dc70496 ("arm64: efi:
Account for the EFI runtime stack in stack unwinder").

I do not think our QEMU command is very exotic but we are not booting
via EFI (although I do see it even if we do boot via EFI):

$ qemu-system-aarch64 \
-cpu max,pauth-impdef=true \
-machine virt,gic-version=max,virtualization=true \
-kernel arch/arm64/boot/Image.gz
-append "console=ttyAMA0 earlycon" \
-display none \
-initrd images/arm64/rootfs.cpio
-m 512m \
-nodefaults \
-no-reboot \
-serial mon:stdio
[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x000f0510]
[    0.000000] Linux version 6.2.0-rc2+ (tuxmake@tuxmake) (aarch64-linux-gcc (GCC) 12.2.0, GNU ld (GNU Binutils) 2.39) #1 SMP PREEMPT_DYNAMIC @1673275520
[    0.000000] random: crng init done
[    0.000000] Machine model: linux,dummy-virt
[    0.000000] efi: UEFI not found.
[    0.000000] earlycon: pl11 at MMIO 0x0000000009000000 (options '')
[    0.000000] printk: bootconsole [pl11] enabled
[    0.000000] NUMA: No NUMA configuration found
[    0.000000] NUMA: Faking a node at [mem 0x0000000040000000-0x000000005fffffff]
[    0.000000] NUMA: NODE_DATA [mem 0x5fe9e3c0-0x5feb4fff]
[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x0000000040000000-0x000000005fffffff]
[    0.000000]   DMA32    empty
[    0.000000]   Normal   empty
[    0.000000]   Device   empty
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000040000000-0x000000005fffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x000000005fffffff]
[    0.000000] cma: Reserved 64 MiB at 0x000000005b600000
[    0.000000] psci: probing for conduit method from DT.
[    0.000000] psci: PSCIv1.1 detected in firmware.
[    0.000000] psci: Using standard PSCI v0.2 function IDs
[    0.000000] psci: Trusted OS migration not required
[    0.000000] psci: SMC Calling Convention v1.0
[    0.000000] percpu: Embedded 32 pages/cpu s90408 r8192 d32472 u131072
[    0.000000] Detected PIPT I-cache on CPU0
[    0.000000] CPU features: detected: Address authentication (IMP DEF algorithm)
[    0.000000] CPU features: detected: GIC system register CPU interface
[    0.000000] CPU features: detected: Virtualization Host Extensions
[    0.000000] CPU features: detected: Hardware dirty bit management
[    0.000000] CPU features: detected: Spectre-v4
[    0.000000] alternatives: applying boot alternatives
[    0.000000] Fallback order for Node 0: 0
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 129024
[    0.000000] Policy zone: DMA
[    0.000000] Kernel command line: console=ttyAMA0 earlycon
[    0.000000] Dentry cache hash table entries: 65536 (order: 7, 524288 bytes, linear)
[    0.000000] Inode-cache hash table entries: 32768 (order: 6, 262144 bytes, linear)
[    0.000000] mem auto-init: stack:all(zero), heap alloc:off, heap free:off
[    0.000000] Memory: 384052K/524288K available (17088K kernel code, 4208K rwdata, 15032K rodata, 9536K init, 10717K bss, 74700K reserved, 65536K cma-reserved)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[    0.000000] ftrace: allocating 58614 entries in 229 pages
[    0.000000] ftrace: allocated 229 pages with 5 groups
[    0.000000] trace event string verifier disabled
[    0.000000] Dynamic Preempt: voluntary
[    0.000000] rcu: Preemptible hierarchical RCU implementation.
[    0.000000] rcu:     RCU restricting CPUs from NR_CPUS=4096 to nr_cpu_ids=1.
[    0.000000]  Trampoline variant of Tasks RCU enabled.
[    0.000000]  Rude variant of Tasks RCU enabled.
[    0.000000]  Tracing variant of Tasks RCU enabled.
[    0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 10 jiffies.
[    0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=1
[    0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[    0.000000] GICv3: GIC: Using split EOI/Deactivate mode
[    0.000000] GICv3: 256 SPIs implemented
[    0.000000] GICv3: 0 Extended SPIs implemented
[    0.000000] Root IRQ handler: gic_handle_irq
[    0.000000] GICv3: GICv3 features: 16 PPIs
[    0.000000] GICv3: GICv4 features:
[    0.000000] GICv3: CPU0: found redistributor 0 region 0:0x00000000080a0000
[    0.000000] ITS [mem 0x08080000-0x0809ffff]
[    0.000000] ITS@0x0000000008080000: Single VMOVP capable
[    0.000000] ITS@0x0000000008080000: allocated 8192 Devices @43de0000 (indirect, esz 8, psz 64K, shr 1)
[    0.000000] ITS@0x0000000008080000: allocated 8192 Interrupt Collections @43df0000 (flat, esz 8, psz 64K, shr 1)
[    0.000000] ITS@0x0000000008080000: allocated 8192 Virtual CPUs @43e00000 (indirect, esz 8, psz 64K, shr 1)
[    0.000000] GICv3: using LPI property table @0x0000000043e10000
[    0.000000] ITS: Allocated DevID ffff as GICv4 proxy device (2 slots)
[    0.000000] ITS: Enabling GICv4 support
[    0.000000] GICv3: CPU0: using allocated LPI pending table @0x0000000043e20000
[    0.000000] rcu: srcu_init: Setting srcu_struct sizes based on contention.
[    0.000000] arch_timer: cp15 timer(s) running at 62.50MHz (phys).
[    0.000000] clocksource: arch_sys_counter: mask: 0x1ffffffffffffff max_cycles: 0x1cd42e208c, max_idle_ns: 881590405314 ns
[    0.000069] sched_clock: 57 bits at 63MHz, resolution 16ns, wraps every 4398046511096ns
[    0.002785] kfence: initialized - using 2097152 bytes for 255 objects at 0x(____ptrval____)-0x(____ptrval____)
[    0.014680] Console: colour dummy device 80x25
[    0.022555] Calibrating delay loop (skipped), value calculated using timer frequency.. 125.00 BogoMIPS (lpj=625000)
[    0.023292] pid_max: default: 32768 minimum: 301
[    0.025346] LSM: initializing lsm=lockdown,capability,yama,integrity,selinux,bpf,landlock
[    0.026087] Yama: becoming mindful.
[    0.027053] SELinux:  Initializing.
[    0.027935] LSM support for eBPF active
[    0.028089] landlock: Up and running.
[    0.030514] Mount-cache hash table entries: 1024 (order: 1, 8192 bytes, linear)
[    0.030706] Mountpoint-cache hash table entries: 1024 (order: 1, 8192 bytes, linear)
qemu-system-aarch64: terminating on signal 15 from pid 4112158 (timeout)

This does not appear to be clang specific, as I can reproduce it with
the same configuration using kernel.org's GCC 12.2.0 (as you'll see
above). I unfortunately do not have time to bisect configurations to see
if there is a more trivial reproducer on top of defconfig, so just this
report.

If there is any additional information I can provide, please let me
know.

Cheers,
Nathan

[1]: https://github.com/ClangBuiltLinux/continuous-integration2/actions/runs/3883388450/jobs/6628748170
[2]: https://src.fedoraproject.org/rpms/kernel/raw/rawhide/f/kernel-aarch64-fedora.config

# bad: [435bf71af3a0aa8067f3b87ff9febf68b564dbb6] Add linux-next specific files for 20230110
# good: [1fe4fd6f5cad346e598593af36caeadc4f5d4fa9] Merge tag 'xfs-6.2-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
git bisect start '435bf71af3a0aa8067f3b87ff9febf68b564dbb6' '1fe4fd6f5cad346e598593af36caeadc4f5d4fa9'
# bad: [57aac56e8af1628ef96055820f88ca547233b310] Merge branch 'drm-next' of git://git.freedesktop.org/git/drm/drm.git
git bisect bad 57aac56e8af1628ef96055820f88ca547233b310
# bad: [68efa45676e8f0b73c896ee4f82996ac396455af] Merge branch 'riscv-soc-for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/conor/linux.git
git bisect bad 68efa45676e8f0b73c896ee4f82996ac396455af
# bad: [154e09c58f5dcd853bd54ac8b725132126d0b640] Merge branch 'for-next' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux-mem-ctrl.git
git bisect bad 154e09c58f5dcd853bd54ac8b725132126d0b640
# good: [0680aa5d9dee0e17d169c022f29c53ec6ad3f69d] Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs.git
git bisect good 0680aa5d9dee0e17d169c022f29c53ec6ad3f69d
# good: [76c76819b36d2433e2637c91dbdc4ec345bf7d92] ARM: remove CONFIG_UNUSED_BOARD_FILES
git bisect good 76c76819b36d2433e2637c91dbdc4ec345bf7d92
# bad: [d08fc57162414439a50e36eb5c4554a56eb8e121] Merge branch 'for-next' of git://git.armlinux.org.uk/~rmk/linux-arm.git
git bisect bad d08fc57162414439a50e36eb5c4554a56eb8e121
# bad: [3145d9dfed323d5a82d40670e333de58fb1a9e65] Merge branch 'for-linux-next-fixes' of git://anongit.freedesktop.org/drm/drm-misc
git bisect bad 3145d9dfed323d5a82d40670e333de58fb1a9e65
# good: [3b3d45c3b305edeb93053c05993bd7c58b9ceb94] Merge branch 'for-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
git bisect good 3b3d45c3b305edeb93053c05993bd7c58b9ceb94
# bad: [33b73800589ae5a93e3b2faf2fbac7c30d6fdd1a] Merge branch 'urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi.git
git bisect bad 33b73800589ae5a93e3b2faf2fbac7c30d6fdd1a
# good: [18bba1843fc7f264f58c9345d00827d082f9c558] efi: rt-wrapper: Add missing include
git bisect good 18bba1843fc7f264f58c9345d00827d082f9c558
# bad: [a7334dc70496bb0cec7f1d3b1f148855951f41c5] arm64: efi: Account for the EFI runtime stack in stack unwinder
git bisect bad a7334dc70496bb0cec7f1d3b1f148855951f41c5
# good: [4a70773264fd7f00b8401b5f9addb72b58678fdf] arm64: efi: Avoid workqueue to check whether EFI runtime is live
git bisect good 4a70773264fd7f00b8401b5f9addb72b58678fdf
# first bad commit: [a7334dc70496bb0cec7f1d3b1f148855951f41c5] arm64: efi: Account for the EFI runtime stack in stack unwinder
Ard Biesheuvel Jan. 11, 2023, 8:45 a.m. UTC | #2
On Tue, 10 Jan 2023 at 21:48, Nathan Chancellor <nathan@kernel.org> wrote:
>
> Hi Ard,
>
> On Fri, Jan 06, 2023 at 06:47:03PM +0100, Ard Biesheuvel wrote:
> > The EFI runtime services run from a dedicated stack now, and so the
> > stack unwinder needs to be informed about this.
> >
> > Acked-by: Mark Rutland <mark.rutland@arm.com>
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
>
> Apologies if this has been reported and/or fixed already, I searched
> lore and did not find anything but I just bisected a QEMU boot hang [1]
> that we see in the ClangBuiltLinux CI with Fedora's configuration [2] to
> this change in next-20220110 as commit a7334dc70496 ("arm64: efi:
> Account for the EFI runtime stack in stack unwinder").
>

Thanks for the report. This is due to an oversight on my part: we
removed a spin_is_locked() check, and the lock in question can only be
in the locked state when EFI runtime services are enabled to begin
with.

Without the lock check, we may end up dereferencing the uninitialized
efi_rt_stack_top on non-EFI boots.

I've fixed this up in the EFI fixes tree, so the issue should
disappear once -next is updated. (We just missed 20230111
unfortunately)
Nathan Chancellor Jan. 11, 2023, 9:18 p.m. UTC | #3
Hi Ard,

On Wed, Jan 11, 2023 at 09:45:32AM +0100, Ard Biesheuvel wrote:
> On Tue, 10 Jan 2023 at 21:48, Nathan Chancellor <nathan@kernel.org> wrote:
> > On Fri, Jan 06, 2023 at 06:47:03PM +0100, Ard Biesheuvel wrote:
> > > The EFI runtime services run from a dedicated stack now, and so the
> > > stack unwinder needs to be informed about this.
> > >
> > > Acked-by: Mark Rutland <mark.rutland@arm.com>
> > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> >
> > Apologies if this has been reported and/or fixed already, I searched
> > lore and did not find anything but I just bisected a QEMU boot hang [1]
> > that we see in the ClangBuiltLinux CI with Fedora's configuration [2] to
> > this change in next-20220110 as commit a7334dc70496 ("arm64: efi:
> > Account for the EFI runtime stack in stack unwinder").
> >
> 
> Thanks for the report. This is due to an oversight on my part: we
> removed a spin_is_locked() check, and the lock in question can only be
> in the locked state when EFI runtime services are enabled to begin
> with.
> 
> Without the lock check, we may end up dereferencing the uninitialized
> efi_rt_stack_top on non-EFI boots.
> 
> I've fixed this up in the EFI fixes tree, so the issue should
> disappear once -next is updated. (We just missed 20230111
> unfortunately)

Thank you for the quick response! That issue appears to be fixed.

Unfortunately, I am still seeing a hang while booting via EFI on either
bare metal or KVM when CONFIG_DEBUG_PREEMPT is enabled (Fedora's rawhide
config appears to enable several debugging options), so it appears I was
seeing two distinct issues :/ defconfig + CONFIG_DEBUG_PREEMPT=y is
enough for me to reproduce this problem.

I see

  [    0.015382] Remapping and enabling EFI services.

as the last line in the console (via earlycon) with the bad kernel and
nothing after it (I assume we deadlock somewhere or hit a BUG_ON()?), vs

  [    0.015191] Remapping and enabling EFI services.
  [    0.016725] smp: Bringing up secondary CPUs ...

on the good kernel, followed by a normal boot.

Sorry for not noticing this sooner! It should be pretty easy to
reproduce but if there is any other information I can provide, I am more
than happy to do so.

Cheers,
Nathan
Ard Biesheuvel Jan. 11, 2023, 10:53 p.m. UTC | #4
On Wed, 11 Jan 2023 at 22:18, Nathan Chancellor <nathan@kernel.org> wrote:
>
> Hi Ard,
>
> On Wed, Jan 11, 2023 at 09:45:32AM +0100, Ard Biesheuvel wrote:
> > On Tue, 10 Jan 2023 at 21:48, Nathan Chancellor <nathan@kernel.org> wrote:
> > > On Fri, Jan 06, 2023 at 06:47:03PM +0100, Ard Biesheuvel wrote:
> > > > The EFI runtime services run from a dedicated stack now, and so the
> > > > stack unwinder needs to be informed about this.
> > > >
> > > > Acked-by: Mark Rutland <mark.rutland@arm.com>
> > > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > >
> > > Apologies if this has been reported and/or fixed already, I searched
> > > lore and did not find anything but I just bisected a QEMU boot hang [1]
> > > that we see in the ClangBuiltLinux CI with Fedora's configuration [2] to
> > > this change in next-20220110 as commit a7334dc70496 ("arm64: efi:
> > > Account for the EFI runtime stack in stack unwinder").
> > >
> >
> > Thanks for the report. This is due to an oversight on my part: we
> > removed a spin_is_locked() check, and the lock in question can only be
> > in the locked state when EFI runtime services are enabled to begin
> > with.
> >
> > Without the lock check, we may end up dereferencing the uninitialized
> > efi_rt_stack_top on non-EFI boots.
> >
> > I've fixed this up in the EFI fixes tree, so the issue should
> > disappear once -next is updated. (We just missed 20230111
> > unfortunately)
>
> Thank you for the quick response! That issue appears to be fixed.
>
> Unfortunately, I am still seeing a hang while booting via EFI on either
> bare metal or KVM when CONFIG_DEBUG_PREEMPT is enabled (Fedora's rawhide
> config appears to enable several debugging options), so it appears I was
> seeing two distinct issues :/ defconfig + CONFIG_DEBUG_PREEMPT=y is
> enough for me to reproduce this problem.
>
> I see
>
>   [    0.015382] Remapping and enabling EFI services.
>
> as the last line in the console (via earlycon) with the bad kernel and
> nothing after it (I assume we deadlock somewhere or hit a BUG_ON()?), vs
>
>   [    0.015191] Remapping and enabling EFI services.
>   [    0.016725] smp: Bringing up secondary CPUs ...
>
> on the good kernel, followed by a normal boot.
>

Yeah, this is the same issue, essentially.

I have added back the spin_is_locked() check, which is a better
indicator of whether the EFI runtime stack is actually in use or not.
diff mbox series

Patch

diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
index 4e5354beafb01bac..66ec8caa6ac07fa0 100644
--- a/arch/arm64/include/asm/stacktrace.h
+++ b/arch/arm64/include/asm/stacktrace.h
@@ -106,4 +106,19 @@  static inline struct stack_info stackinfo_get_sdei_critical(void)
 #define stackinfo_get_sdei_critical()	stackinfo_get_unknown()
 #endif
 
+#ifdef CONFIG_EFI
+extern u64 *efi_rt_stack_top;
+
+static inline struct stack_info stackinfo_get_efi(void)
+{
+	unsigned long high = (u64)efi_rt_stack_top;
+	unsigned long low = high - THREAD_SIZE;
+
+	return (struct stack_info) {
+		.low = low,
+		.high = high,
+	};
+}
+#endif
+
 #endif	/* __ASM_STACKTRACE_H */
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index 117e2c180f3c77d8..83154303e682c8b6 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -5,6 +5,7 @@ 
  * Copyright (C) 2012 ARM Ltd.
  */
 #include <linux/kernel.h>
+#include <linux/efi.h>
 #include <linux/export.h>
 #include <linux/ftrace.h>
 #include <linux/sched.h>
@@ -12,6 +13,7 @@ 
 #include <linux/sched/task_stack.h>
 #include <linux/stacktrace.h>
 
+#include <asm/efi.h>
 #include <asm/irq.h>
 #include <asm/stack_pointer.h>
 #include <asm/stacktrace.h>
@@ -186,6 +188,13 @@  void show_stack(struct task_struct *tsk, unsigned long *sp, const char *loglvl)
 			: stackinfo_get_unknown();		\
 	})
 
+#define STACKINFO_EFI						\
+	({							\
+		((task == current) && current_in_efi())		\
+			? stackinfo_get_efi()			\
+			: stackinfo_get_unknown();		\
+	})
+
 noinline noinstr void arch_stack_walk(stack_trace_consume_fn consume_entry,
 			      void *cookie, struct task_struct *task,
 			      struct pt_regs *regs)
@@ -199,6 +208,9 @@  noinline noinstr void arch_stack_walk(stack_trace_consume_fn consume_entry,
 #if defined(CONFIG_VMAP_STACK) && defined(CONFIG_ARM_SDE_INTERFACE)
 		STACKINFO_SDEI(normal),
 		STACKINFO_SDEI(critical),
+#endif
+#ifdef CONFIG_EFI
+		STACKINFO_EFI,
 #endif
 	};
 	struct unwind_state state = {