diff mbox series

[4.14.y] ARM: 8702/1: head-common.S: Clear lr before jumping to start_kernel()

Message ID 20200730180340.1724137-1-ndesaulniers@google.com
State New
Headers show
Series [4.14.y] ARM: 8702/1: head-common.S: Clear lr before jumping to start_kernel() | expand

Commit Message

Nick Desaulniers July 30, 2020, 6:03 p.m. UTC
From: Geert Uytterhoeven <geert@linux-m68k.org>


commit 59b6359dd92d18f5dc04b14a4c926fa08ab66f7c upstream.

If CONFIG_DEBUG_LOCK_ALLOC=y, the kernel log is spammed with a few
hundred identical messages:

    unwind: Unknown symbol address c0800300
    unwind: Index not found c0800300

c0800300 is the return address from the last subroutine call (to
__memzero()) in __mmap_switched().  Apparently having this address in
the link register confuses the unwinder.

To fix this, reset the link register to zero before jumping to
start_kernel().

Fixes: 9520b1a1b5f7a348 ("ARM: head-common.S: speed up startup code")
Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>

Acked-by: Nicolas Pitre <nico@linaro.org>

Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>

Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>

---
Looks like this first landed in v4.15-rc1.  Without this, we can't tell
during an unwind initiated from start_kernel() when to stop unwinding,
which for the clang specific implementation of the arm frame pointer
unwinder leads to dereferencing a garbage value, triggering an exception
which has no fixup, triggering a panic, triggering an unwind, triggering
an infinite loop that prevents booting. I have more patches to send
upstream to make the unwinder more resilient, but it's ambiguous as to
when to stop unwinding without this patch.

 arch/arm/kernel/head-common.S | 1 +
 1 file changed, 1 insertion(+)

-- 
2.28.0.rc0.142.g3c755180ce-goog

Comments

Greg KH Aug. 1, 2020, 10:38 a.m. UTC | #1
On Thu, Jul 30, 2020 at 11:03:40AM -0700, Nick Desaulniers wrote:
> From: Geert Uytterhoeven <geert@linux-m68k.org>

> 

> commit 59b6359dd92d18f5dc04b14a4c926fa08ab66f7c upstream.

> 

> If CONFIG_DEBUG_LOCK_ALLOC=y, the kernel log is spammed with a few

> hundred identical messages:

> 

>     unwind: Unknown symbol address c0800300

>     unwind: Index not found c0800300

> 

> c0800300 is the return address from the last subroutine call (to

> __memzero()) in __mmap_switched().  Apparently having this address in

> the link register confuses the unwinder.

> 

> To fix this, reset the link register to zero before jumping to

> start_kernel().

> 

> Fixes: 9520b1a1b5f7a348 ("ARM: head-common.S: speed up startup code")

> Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>

> Acked-by: Nicolas Pitre <nico@linaro.org>

> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>

> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>

> ---

> Looks like this first landed in v4.15-rc1.  Without this, we can't tell

> during an unwind initiated from start_kernel() when to stop unwinding,

> which for the clang specific implementation of the arm frame pointer

> unwinder leads to dereferencing a garbage value, triggering an exception

> which has no fixup, triggering a panic, triggering an unwind, triggering

> an infinite loop that prevents booting. I have more patches to send

> upstream to make the unwinder more resilient, but it's ambiguous as to

> when to stop unwinding without this patch.


Note, the "Fixes:" tag points at something in 4.15, not 4.14, so are you
_SURE_ this is needed in 4.14.y?

thanks,

greg k-h
Nick Desaulniers Aug. 3, 2020, 9:54 p.m. UTC | #2
On Sat, Aug 1, 2020 at 3:38 AM Greg KH <gregkh@linuxfoundation.org> wrote:
>

> On Thu, Jul 30, 2020 at 11:03:40AM -0700, Nick Desaulniers wrote:

> > From: Geert Uytterhoeven <geert@linux-m68k.org>

> >

> > commit 59b6359dd92d18f5dc04b14a4c926fa08ab66f7c upstream.

> >

> > If CONFIG_DEBUG_LOCK_ALLOC=y, the kernel log is spammed with a few

> > hundred identical messages:

> >

> >     unwind: Unknown symbol address c0800300

> >     unwind: Index not found c0800300

> >

> > c0800300 is the return address from the last subroutine call (to

> > __memzero()) in __mmap_switched().  Apparently having this address in

> > the link register confuses the unwinder.

> >

> > To fix this, reset the link register to zero before jumping to

> > start_kernel().

> >

> > Fixes: 9520b1a1b5f7a348 ("ARM: head-common.S: speed up startup code")

> > Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

> > Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>

> > Acked-by: Nicolas Pitre <nico@linaro.org>

> > Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>

> > Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>

> > ---

> > Looks like this first landed in v4.15-rc1.  Without this, we can't tell

> > during an unwind initiated from start_kernel() when to stop unwinding,

> > which for the clang specific implementation of the arm frame pointer

> > unwinder leads to dereferencing a garbage value, triggering an exception

> > which has no fixup, triggering a panic, triggering an unwind, triggering

> > an infinite loop that prevents booting. I have more patches to send

> > upstream to make the unwinder more resilient, but it's ambiguous as to

> > when to stop unwinding without this patch.

>

> Note, the "Fixes:" tag points at something in 4.15, not 4.14, so are you

> _SURE_ this is needed in 4.14.y?


It's a fair question, sorry I didn't notice and pre-emptively address.

Yes.  Prior to 59b6359dd92d, the value in the link register (LR) was
garbage left over from the calls to __memzero added in 9520b1a1b5f7.
I suspect that after a `ret` instruction, the value in LR really
shouldn't be used again.

Having garbage in LR when chasing frame pointers from an unwind
started in start_kernel() makes it appear that there are further
frames to unwind, hence the error noted in the commit message of
59b6359dd92d.

Prior to 9520b1a1b5f7, the value in the LR was still garbage (so the
Fixes tag referencing 9520b1a1b5f7 isn't super precise; it references
the latest change that noticeably changed the value of LR, but it was
still previously undefined what its last value was set to).  In fact,
digging up the original suggestion from Ard regarding 59b6359dd92d:
https://lore.kernel.org/linux-arm-kernel/CAKv+Gu8BSnn3XhUALM-CfPqw2zNxovvup4uHf1F4qYZZ5oVUaA@mail.gmail.com/

> I don't think the patch itself is to blame here, it simply triggers an

> existing issue in the unwinder (if my analysis is correct, of course)


and yet 9520b1a1b5f7 was still cited in the Fixes tag of 59b6359dd92d.
(I agree with Ard's analysis).  Yes, "c0800300 is the return address
from the last subroutine call (to __memzero()) in __mmap_switched()"
is correct, but I'd have argued this was broken even before
59b6359dd92d (which is Ard's point).  Forgive me if I'm
misinterpreting your analysis, Ard.  Maybe that Fixes tag was the
simplest to avoid backports to stable which would have had conflicts
due to 9520b1a1b5f7 being a dependency?

Looking at the callers of __mmap_switched, it's hard to tell who would
have set the last value of LR, as there's divergent implementations
based on whether or not there's MMU support.  I don't think it matters
though, and that unwinding via frame pointer on ARM out of a path in
start_kernel() was broken until 59b6359dd92d.  I suspect a combination
of the frame pointer unwinder not being as popular on ARM (vs
CONFIG_UNWINDER_ARM) and the lack of needing to unwind from
start_kernel() (since unwinding occurs for exceptional cases like
WARN_ON or panics) as sources that may have helped to keep this bug
latent for a while.

Here's the lore thread on 9520b1a1b5f7 FWIW, but there's nothing of
interest there IMO.
https://lore.kernel.org/linux-arm-kernel/1507044184-27152-1-git-send-email-geert+renesas@glider.be/
-- 
Thanks,
~Nick Desaulniers
Nick Desaulniers Aug. 3, 2020, 10:06 p.m. UTC | #3
On Mon, Aug 3, 2020 at 2:54 PM Nick Desaulniers <ndesaulniers@google.com> wrote:
>

> On Sat, Aug 1, 2020 at 3:38 AM Greg KH <gregkh@linuxfoundation.org> wrote:

> >

> > On Thu, Jul 30, 2020 at 11:03:40AM -0700, Nick Desaulniers wrote:

> > > From: Geert Uytterhoeven <geert@linux-m68k.org>

> > >

> > > commit 59b6359dd92d18f5dc04b14a4c926fa08ab66f7c upstream.

> > >

> > > If CONFIG_DEBUG_LOCK_ALLOC=y, the kernel log is spammed with a few

> > > hundred identical messages:

> > >

> > >     unwind: Unknown symbol address c0800300

> > >     unwind: Index not found c0800300

> > >

> > > c0800300 is the return address from the last subroutine call (to

> > > __memzero()) in __mmap_switched().  Apparently having this address in

> > > the link register confuses the unwinder.

> > >

> > > To fix this, reset the link register to zero before jumping to

> > > start_kernel().

> > >

> > > Fixes: 9520b1a1b5f7a348 ("ARM: head-common.S: speed up startup code")

> > > Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

> > > Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>

> > > Acked-by: Nicolas Pitre <nico@linaro.org>

> > > Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>

> > > Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>

> > > ---

> > > Looks like this first landed in v4.15-rc1.  Without this, we can't tell

> > > during an unwind initiated from start_kernel() when to stop unwinding,

> > > which for the clang specific implementation of the arm frame pointer

> > > unwinder leads to dereferencing a garbage value, triggering an exception

> > > which has no fixup, triggering a panic, triggering an unwind, triggering

> > > an infinite loop that prevents booting. I have more patches to send

> > > upstream to make the unwinder more resilient, but it's ambiguous as to

> > > when to stop unwinding without this patch.

> >

> > Note, the "Fixes:" tag points at something in 4.15, not 4.14, so are you

> > _SURE_ this is needed in 4.14.y?

>

> It's a fair question, sorry I didn't notice and pre-emptively address.

>

> Yes.  Prior to 59b6359dd92d, the value in the link register (LR) was

> garbage left over from the calls to __memzero added in 9520b1a1b5f7.

> I suspect that after a `ret` instruction, the value in LR really


Sorry, rereading that `ret` is not an instruction on ARM; I may have
confused the assembler macro defined in
arch/arm/include/asm/assembler.h used in __turn_mmu_on to call into
__mmap_switched with being an actual instruction.  Maybe `I suspect
that after returning from a called frame, the value in LR really
shouldn't be used again.` would have been more precise.

> shouldn't be used again.

>

> Having garbage in LR when chasing frame pointers from an unwind

> started in start_kernel() makes it appear that there are further

> frames to unwind, hence the error noted in the commit message of

> 59b6359dd92d.

>

> Prior to 9520b1a1b5f7, the value in the LR was still garbage (so the

> Fixes tag referencing 9520b1a1b5f7 isn't super precise; it references

> the latest change that noticeably changed the value of LR, but it was

> still previously undefined what its last value was set to).  In fact,

> digging up the original suggestion from Ard regarding 59b6359dd92d:

> https://lore.kernel.org/linux-arm-kernel/CAKv+Gu8BSnn3XhUALM-CfPqw2zNxovvup4uHf1F4qYZZ5oVUaA@mail.gmail.com/

>

> > I don't think the patch itself is to blame here, it simply triggers an

> > existing issue in the unwinder (if my analysis is correct, of course)

>

> and yet 9520b1a1b5f7 was still cited in the Fixes tag of 59b6359dd92d.

> (I agree with Ard's analysis).  Yes, "c0800300 is the return address

> from the last subroutine call (to __memzero()) in __mmap_switched()"

> is correct, but I'd have argued this was broken even before

> 59b6359dd92d (which is Ard's point).  Forgive me if I'm

> misinterpreting your analysis, Ard.  Maybe that Fixes tag was the

> simplest to avoid backports to stable which would have had conflicts

> due to 9520b1a1b5f7 being a dependency?

>

> Looking at the callers of __mmap_switched, it's hard to tell who would

> have set the last value of LR, as there's divergent implementations

> based on whether or not there's MMU support.  I don't think it matters

> though, and that unwinding via frame pointer on ARM out of a path in

> start_kernel() was broken until 59b6359dd92d.  I suspect a combination

> of the frame pointer unwinder not being as popular on ARM (vs

> CONFIG_UNWINDER_ARM) and the lack of needing to unwind from

> start_kernel() (since unwinding occurs for exceptional cases like

> WARN_ON or panics) as sources that may have helped to keep this bug

> latent for a while.

>

> Here's the lore thread on 9520b1a1b5f7 FWIW, but there's nothing of

> interest there IMO.

> https://lore.kernel.org/linux-arm-kernel/1507044184-27152-1-git-send-email-geert+renesas@glider.be/

> --

> Thanks,

> ~Nick Desaulniers




-- 
Thanks,
~Nick Desaulniers
diff mbox series

Patch

diff --git a/arch/arm/kernel/head-common.S b/arch/arm/kernel/head-common.S
index 7e662bdd5cb3..932b2244e709 100644
--- a/arch/arm/kernel/head-common.S
+++ b/arch/arm/kernel/head-common.S
@@ -101,6 +101,7 @@  __mmap_switched:
 	str	r2, [r6]			@ Save atags pointer
 	cmp	r7, #0
 	strne	r0, [r7]			@ Save control register values
+	mov	lr, #0
 	b	start_kernel
 ENDPROC(__mmap_switched)