Message ID | 20200730180340.1724137-1-ndesaulniers@google.com |
---|---|
State | New |
Headers | show |
Series | [4.14.y] ARM: 8702/1: head-common.S: Clear lr before jumping to start_kernel() | expand |
On Thu, Jul 30, 2020 at 11:03:40AM -0700, Nick Desaulniers wrote: > From: Geert Uytterhoeven <geert@linux-m68k.org> > > commit 59b6359dd92d18f5dc04b14a4c926fa08ab66f7c upstream. > > If CONFIG_DEBUG_LOCK_ALLOC=y, the kernel log is spammed with a few > hundred identical messages: > > unwind: Unknown symbol address c0800300 > unwind: Index not found c0800300 > > c0800300 is the return address from the last subroutine call (to > __memzero()) in __mmap_switched(). Apparently having this address in > the link register confuses the unwinder. > > To fix this, reset the link register to zero before jumping to > start_kernel(). > > Fixes: 9520b1a1b5f7a348 ("ARM: head-common.S: speed up startup code") > Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> > Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> > Acked-by: Nicolas Pitre <nico@linaro.org> > Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> > Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> > --- > Looks like this first landed in v4.15-rc1. Without this, we can't tell > during an unwind initiated from start_kernel() when to stop unwinding, > which for the clang specific implementation of the arm frame pointer > unwinder leads to dereferencing a garbage value, triggering an exception > which has no fixup, triggering a panic, triggering an unwind, triggering > an infinite loop that prevents booting. I have more patches to send > upstream to make the unwinder more resilient, but it's ambiguous as to > when to stop unwinding without this patch. Note, the "Fixes:" tag points at something in 4.15, not 4.14, so are you _SURE_ this is needed in 4.14.y? thanks, greg k-h
On Sat, Aug 1, 2020 at 3:38 AM Greg KH <gregkh@linuxfoundation.org> wrote: > > On Thu, Jul 30, 2020 at 11:03:40AM -0700, Nick Desaulniers wrote: > > From: Geert Uytterhoeven <geert@linux-m68k.org> > > > > commit 59b6359dd92d18f5dc04b14a4c926fa08ab66f7c upstream. > > > > If CONFIG_DEBUG_LOCK_ALLOC=y, the kernel log is spammed with a few > > hundred identical messages: > > > > unwind: Unknown symbol address c0800300 > > unwind: Index not found c0800300 > > > > c0800300 is the return address from the last subroutine call (to > > __memzero()) in __mmap_switched(). Apparently having this address in > > the link register confuses the unwinder. > > > > To fix this, reset the link register to zero before jumping to > > start_kernel(). > > > > Fixes: 9520b1a1b5f7a348 ("ARM: head-common.S: speed up startup code") > > Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> > > Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> > > Acked-by: Nicolas Pitre <nico@linaro.org> > > Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> > > Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> > > --- > > Looks like this first landed in v4.15-rc1. Without this, we can't tell > > during an unwind initiated from start_kernel() when to stop unwinding, > > which for the clang specific implementation of the arm frame pointer > > unwinder leads to dereferencing a garbage value, triggering an exception > > which has no fixup, triggering a panic, triggering an unwind, triggering > > an infinite loop that prevents booting. I have more patches to send > > upstream to make the unwinder more resilient, but it's ambiguous as to > > when to stop unwinding without this patch. > > Note, the "Fixes:" tag points at something in 4.15, not 4.14, so are you > _SURE_ this is needed in 4.14.y? It's a fair question, sorry I didn't notice and pre-emptively address. Yes. Prior to 59b6359dd92d, the value in the link register (LR) was garbage left over from the calls to __memzero added in 9520b1a1b5f7. I suspect that after a `ret` instruction, the value in LR really shouldn't be used again. Having garbage in LR when chasing frame pointers from an unwind started in start_kernel() makes it appear that there are further frames to unwind, hence the error noted in the commit message of 59b6359dd92d. Prior to 9520b1a1b5f7, the value in the LR was still garbage (so the Fixes tag referencing 9520b1a1b5f7 isn't super precise; it references the latest change that noticeably changed the value of LR, but it was still previously undefined what its last value was set to). In fact, digging up the original suggestion from Ard regarding 59b6359dd92d: https://lore.kernel.org/linux-arm-kernel/CAKv+Gu8BSnn3XhUALM-CfPqw2zNxovvup4uHf1F4qYZZ5oVUaA@mail.gmail.com/ > I don't think the patch itself is to blame here, it simply triggers an > existing issue in the unwinder (if my analysis is correct, of course) and yet 9520b1a1b5f7 was still cited in the Fixes tag of 59b6359dd92d. (I agree with Ard's analysis). Yes, "c0800300 is the return address from the last subroutine call (to __memzero()) in __mmap_switched()" is correct, but I'd have argued this was broken even before 59b6359dd92d (which is Ard's point). Forgive me if I'm misinterpreting your analysis, Ard. Maybe that Fixes tag was the simplest to avoid backports to stable which would have had conflicts due to 9520b1a1b5f7 being a dependency? Looking at the callers of __mmap_switched, it's hard to tell who would have set the last value of LR, as there's divergent implementations based on whether or not there's MMU support. I don't think it matters though, and that unwinding via frame pointer on ARM out of a path in start_kernel() was broken until 59b6359dd92d. I suspect a combination of the frame pointer unwinder not being as popular on ARM (vs CONFIG_UNWINDER_ARM) and the lack of needing to unwind from start_kernel() (since unwinding occurs for exceptional cases like WARN_ON or panics) as sources that may have helped to keep this bug latent for a while. Here's the lore thread on 9520b1a1b5f7 FWIW, but there's nothing of interest there IMO. https://lore.kernel.org/linux-arm-kernel/1507044184-27152-1-git-send-email-geert+renesas@glider.be/ -- Thanks, ~Nick Desaulniers
On Mon, Aug 3, 2020 at 2:54 PM Nick Desaulniers <ndesaulniers@google.com> wrote: > > On Sat, Aug 1, 2020 at 3:38 AM Greg KH <gregkh@linuxfoundation.org> wrote: > > > > On Thu, Jul 30, 2020 at 11:03:40AM -0700, Nick Desaulniers wrote: > > > From: Geert Uytterhoeven <geert@linux-m68k.org> > > > > > > commit 59b6359dd92d18f5dc04b14a4c926fa08ab66f7c upstream. > > > > > > If CONFIG_DEBUG_LOCK_ALLOC=y, the kernel log is spammed with a few > > > hundred identical messages: > > > > > > unwind: Unknown symbol address c0800300 > > > unwind: Index not found c0800300 > > > > > > c0800300 is the return address from the last subroutine call (to > > > __memzero()) in __mmap_switched(). Apparently having this address in > > > the link register confuses the unwinder. > > > > > > To fix this, reset the link register to zero before jumping to > > > start_kernel(). > > > > > > Fixes: 9520b1a1b5f7a348 ("ARM: head-common.S: speed up startup code") > > > Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> > > > Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> > > > Acked-by: Nicolas Pitre <nico@linaro.org> > > > Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> > > > Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> > > > --- > > > Looks like this first landed in v4.15-rc1. Without this, we can't tell > > > during an unwind initiated from start_kernel() when to stop unwinding, > > > which for the clang specific implementation of the arm frame pointer > > > unwinder leads to dereferencing a garbage value, triggering an exception > > > which has no fixup, triggering a panic, triggering an unwind, triggering > > > an infinite loop that prevents booting. I have more patches to send > > > upstream to make the unwinder more resilient, but it's ambiguous as to > > > when to stop unwinding without this patch. > > > > Note, the "Fixes:" tag points at something in 4.15, not 4.14, so are you > > _SURE_ this is needed in 4.14.y? > > It's a fair question, sorry I didn't notice and pre-emptively address. > > Yes. Prior to 59b6359dd92d, the value in the link register (LR) was > garbage left over from the calls to __memzero added in 9520b1a1b5f7. > I suspect that after a `ret` instruction, the value in LR really Sorry, rereading that `ret` is not an instruction on ARM; I may have confused the assembler macro defined in arch/arm/include/asm/assembler.h used in __turn_mmu_on to call into __mmap_switched with being an actual instruction. Maybe `I suspect that after returning from a called frame, the value in LR really shouldn't be used again.` would have been more precise. > shouldn't be used again. > > Having garbage in LR when chasing frame pointers from an unwind > started in start_kernel() makes it appear that there are further > frames to unwind, hence the error noted in the commit message of > 59b6359dd92d. > > Prior to 9520b1a1b5f7, the value in the LR was still garbage (so the > Fixes tag referencing 9520b1a1b5f7 isn't super precise; it references > the latest change that noticeably changed the value of LR, but it was > still previously undefined what its last value was set to). In fact, > digging up the original suggestion from Ard regarding 59b6359dd92d: > https://lore.kernel.org/linux-arm-kernel/CAKv+Gu8BSnn3XhUALM-CfPqw2zNxovvup4uHf1F4qYZZ5oVUaA@mail.gmail.com/ > > > I don't think the patch itself is to blame here, it simply triggers an > > existing issue in the unwinder (if my analysis is correct, of course) > > and yet 9520b1a1b5f7 was still cited in the Fixes tag of 59b6359dd92d. > (I agree with Ard's analysis). Yes, "c0800300 is the return address > from the last subroutine call (to __memzero()) in __mmap_switched()" > is correct, but I'd have argued this was broken even before > 59b6359dd92d (which is Ard's point). Forgive me if I'm > misinterpreting your analysis, Ard. Maybe that Fixes tag was the > simplest to avoid backports to stable which would have had conflicts > due to 9520b1a1b5f7 being a dependency? > > Looking at the callers of __mmap_switched, it's hard to tell who would > have set the last value of LR, as there's divergent implementations > based on whether or not there's MMU support. I don't think it matters > though, and that unwinding via frame pointer on ARM out of a path in > start_kernel() was broken until 59b6359dd92d. I suspect a combination > of the frame pointer unwinder not being as popular on ARM (vs > CONFIG_UNWINDER_ARM) and the lack of needing to unwind from > start_kernel() (since unwinding occurs for exceptional cases like > WARN_ON or panics) as sources that may have helped to keep this bug > latent for a while. > > Here's the lore thread on 9520b1a1b5f7 FWIW, but there's nothing of > interest there IMO. > https://lore.kernel.org/linux-arm-kernel/1507044184-27152-1-git-send-email-geert+renesas@glider.be/ > -- > Thanks, > ~Nick Desaulniers -- Thanks, ~Nick Desaulniers
diff --git a/arch/arm/kernel/head-common.S b/arch/arm/kernel/head-common.S index 7e662bdd5cb3..932b2244e709 100644 --- a/arch/arm/kernel/head-common.S +++ b/arch/arm/kernel/head-common.S @@ -101,6 +101,7 @@ __mmap_switched: str r2, [r6] @ Save atags pointer cmp r7, #0 strne r0, [r7] @ Save control register values + mov lr, #0 b start_kernel ENDPROC(__mmap_switched)