diff mbox series

rmk/for-next bisection: boot on ox820-cloudengines-pogoplug-series-3

Message ID 5dcf8f19.1c69fb81.c02f3.91f2@mx.google.com
State Superseded
Headers show
Series rmk/for-next bisection: boot on ox820-cloudengines-pogoplug-series-3 | expand

Commit Message

kernelci.org bot Nov. 16, 2019, 5:54 a.m. UTC
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* This automated bisection report was sent to you on the basis  *
* that you may be involved with the breaking commit it has      *
* found.  No manual investigation has been done to verify it,   *
* and the root cause of the problem may be somewhere else.      *
*                                                               *
* If you do send a fix, please include this trailer:            *
*   Reported-by: "kernelci.org bot" <bot@kernelci.org>          *
*                                                               *
* Hope this helps!                                              *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

rmk/for-next bisection: boot on ox820-cloudengines-pogoplug-series-3

Summary:
  Start:      b6c3c42cfda0 ARM: 8938/1: kernel: initialize broadcast hrtimer based clock event device
  Details:    https://kernelci.org/boot/id/5dcf3f0359b514dc84cf54c8
  Plain log:  https://storage.kernelci.org//rmk/for-next/v5.4-rc5-26-gb6c3c42cfda0/arm/oxnas_v6_defconfig/gcc-8/lab-baylibre/boot-ox820-cloudengines-pogoplug-series-3.txt
  HTML log:   https://storage.kernelci.org//rmk/for-next/v5.4-rc5-26-gb6c3c42cfda0/arm/oxnas_v6_defconfig/gcc-8/lab-baylibre/boot-ox820-cloudengines-pogoplug-series-3.html
  Result:     ea70bf6e92c5 ARM: 8935/1: decompressor: avoid CP15 barrier instructions in v7 cache setup code

Checks:
  revert:     PASS
  verify:     PASS

Parameters:
  Tree:       rmk
  URL:        git://git.armlinux.org.uk/~rmk/linux-arm.git
  Branch:     for-next
  Target:     ox820-cloudengines-pogoplug-series-3
  CPU arch:   arm
  Lab:        lab-baylibre
  Compiler:   gcc-8
  Config:     oxnas_v6_defconfig
  Test suite: boot

Breaking commit found:

-------------------------------------------------------------------------------
commit ea70bf6e92c5d8cf38c8a077e0eded091c275899
Author: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Date:   Fri Nov 8 13:46:50 2019 +0100

    ARM: 8935/1: decompressor: avoid CP15 barrier instructions in v7 cache setup code
    
    Commit e17b1af96b2afc38e684aa2f1033387e2ed10029
    
      "ARM: 8857/1: efi: enable CP15 DMB instructions before cleaning the cache"
    
    added some explicit handling of the CP15BEN bit in the SCTLR system
    register, to ensure that CP15 barrier instructions are enabled, even
    if we enter the decompressor via the EFI stub.
    
    However, as it turns out, there are other ways in which we may end up
    using CP15 barrier instructions without them being enabled. I.e., when
    the decompressor startup code skips the cache_on() initially, we end
    up calling cache_clean_flush() with the caches and MMU off, in which
    case the CP15BEN bit in SCTLR may not be programmed either. And in
    fact, cache_on() itself issues CP15 barrier instructions before actually
    enabling them by programming the new SCTLR value (and issuing an ISB)
    
    Since all these routines are specific to v7, let's clean this up by
    using the ordinary v7 barrier instructions in the v7 specific cache
    handling routines, so that we never rely on the CP15 ones. This also
    avoids the issue where a barrier is required between programming SCTLR
    and using the CP15 barrier instructions, which would result in two
    different kinds of barriers being used in the same function.
    
    Acked-by: Marc Zyngier <maz@kernel.org>

    Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

    Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>


-------------------------------------------------------------------------------


Git bisection log:

-------------------------------------------------------------------------------
git bisect start
# good: [cb73737ea1d27181f5c4bfb1288e97f3e8a4abc7] ARM: 8928/1: ARM_ERRATA_775420: Spelling s/date/data/
git bisect good cb73737ea1d27181f5c4bfb1288e97f3e8a4abc7
# bad: [b6c3c42cfda04b0119a0ac46c2a06006f38522d7] ARM: 8938/1: kernel: initialize broadcast hrtimer based clock event device
git bisect bad b6c3c42cfda04b0119a0ac46c2a06006f38522d7
# good: [052e76a31b4a64d7678e270d498e1bc36c342f88] ARM: 8931/1: Add clock_getres entry point
git bisect good 052e76a31b4a64d7678e270d498e1bc36c342f88
# good: [44700c1ea9afeb9c5093dba7794117fda7c5c955] ARM: 8934/1: Revert "efi: enable CP15 DMB instructions before cleaning the cache"
git bisect good 44700c1ea9afeb9c5093dba7794117fda7c5c955
# bad: [7f586a0a683ec37ac25bee24381e24c66dfe32b8] ARM: 8937/1: spectre-v2: remove Brahma-B53 from hardening
git bisect bad 7f586a0a683ec37ac25bee24381e24c66dfe32b8
# bad: [ea70bf6e92c5d8cf38c8a077e0eded091c275899] ARM: 8935/1: decompressor: avoid CP15 barrier instructions in v7 cache setup code
git bisect bad ea70bf6e92c5d8cf38c8a077e0eded091c275899
# first bad commit: [ea70bf6e92c5d8cf38c8a077e0eded091c275899] ARM: 8935/1: decompressor: avoid CP15 barrier instructions in v7 cache setup code
-------------------------------------------------------------------------------

Comments

Ard Biesheuvel Nov. 16, 2019, 10:26 a.m. UTC | #1
(+ Arnd)

On Sat, 16 Nov 2019 at 05:54, kernelci.org bot <bot@kernelci.org> wrote:
>

> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

> * This automated bisection report was sent to you on the basis  *

> * that you may be involved with the breaking commit it has      *

> * found.  No manual investigation has been done to verify it,   *

> * and the root cause of the problem may be somewhere else.      *

> *                                                               *

> * If you do send a fix, please include this trailer:            *

> *   Reported-by: "kernelci.org bot" <bot@kernelci.org>          *

> *                                                               *

> * Hope this helps!                                              *

> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

>

> rmk/for-next bisection: boot on ox820-cloudengines-pogoplug-series-3

>

> Summary:

>   Start:      b6c3c42cfda0 ARM: 8938/1: kernel: initialize broadcast hrtimer based clock event device

>   Details:    https://kernelci.org/boot/id/5dcf3f0359b514dc84cf54c8

>   Plain log:  https://storage.kernelci.org//rmk/for-next/v5.4-rc5-26-gb6c3c42cfda0/arm/oxnas_v6_defconfig/gcc-8/lab-baylibre/boot-ox820-cloudengines-pogoplug-series-3.txt

>   HTML log:   https://storage.kernelci.org//rmk/for-next/v5.4-rc5-26-gb6c3c42cfda0/arm/oxnas_v6_defconfig/gcc-8/lab-baylibre/boot-ox820-cloudengines-pogoplug-series-3.html

>   Result:     ea70bf6e92c5 ARM: 8935/1: decompressor: avoid CP15 barrier instructions in v7 cache setup code

>


OK, so this regression is caused by the fact that the 'armv7' cache
maintenance routines in the decompressor are also used for ARMv6 cores
if they implement the CPUID extension, which I failed to realise when
I sent this patch.

There are roughly three ways to deal with this:
1) add a mask/val match pair for ARM11MPcore and ARM1176 that hardwire
them to the ARMv6 routines, even though they implement the CPUID
extension. This would be very easy, but assumes that those two cores
are the only ones that are affected by this.
2) modify the v7 routines to check for the L1Hvd MMFR1 attribute (in
the flush routine) and for the CP15BEN SCTLR bit (in the on/off
routines), and jump to the respective v6 variants if the CPU turns out
not to support the v7 one.
3) revert the patch, and just enable the CP15 barriers (and issue a v7
barrier) in the v7 on() and flush() routines.

I am leaning towards the latter, since it is the most straightforward,
even though it mixes v7 and cp15 barriers in the same function, but
that was mostly a cosmetic concern anyway.



> Checks:

>   revert:     PASS

>   verify:     PASS

>

> Parameters:

>   Tree:       rmk

>   URL:        git://git.armlinux.org.uk/~rmk/linux-arm.git

>   Branch:     for-next

>   Target:     ox820-cloudengines-pogoplug-series-3

>   CPU arch:   arm

>   Lab:        lab-baylibre

>   Compiler:   gcc-8

>   Config:     oxnas_v6_defconfig

>   Test suite: boot

>

> Breaking commit found:

>

> -------------------------------------------------------------------------------

> commit ea70bf6e92c5d8cf38c8a077e0eded091c275899

> Author: Ard Biesheuvel <ard.biesheuvel@linaro.org>

> Date:   Fri Nov 8 13:46:50 2019 +0100

>

>     ARM: 8935/1: decompressor: avoid CP15 barrier instructions in v7 cache setup code

>

>     Commit e17b1af96b2afc38e684aa2f1033387e2ed10029

>

>       "ARM: 8857/1: efi: enable CP15 DMB instructions before cleaning the cache"

>

>     added some explicit handling of the CP15BEN bit in the SCTLR system

>     register, to ensure that CP15 barrier instructions are enabled, even

>     if we enter the decompressor via the EFI stub.

>

>     However, as it turns out, there are other ways in which we may end up

>     using CP15 barrier instructions without them being enabled. I.e., when

>     the decompressor startup code skips the cache_on() initially, we end

>     up calling cache_clean_flush() with the caches and MMU off, in which

>     case the CP15BEN bit in SCTLR may not be programmed either. And in

>     fact, cache_on() itself issues CP15 barrier instructions before actually

>     enabling them by programming the new SCTLR value (and issuing an ISB)

>

>     Since all these routines are specific to v7, let's clean this up by

>     using the ordinary v7 barrier instructions in the v7 specific cache

>     handling routines, so that we never rely on the CP15 ones. This also

>     avoids the issue where a barrier is required between programming SCTLR

>     and using the CP15 barrier instructions, which would result in two

>     different kinds of barriers being used in the same function.

>

>     Acked-by: Marc Zyngier <maz@kernel.org>

>     Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

>     Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>

>

> diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S

> index 95238146b7f2..fe279816b298 100644

> --- a/arch/arm/boot/compressed/head.S

> +++ b/arch/arm/boot/compressed/head.S

> @@ -656,6 +656,21 @@ params:            ldr     r0, =0x10000100         @ params_phys for RPC

>                 .align

>  #endif

>

> +               .macro  v7dsb

> + ARM(          .inst   0xf57ff04f              @ v7+ dsb       )

> + THUMB(                dsb                                             )

> +               .endm

> +

> +               .macro  v7dmb

> + ARM(          .inst   0xf57ff05f              @ v7+ dmb       )

> + THUMB(                dmb                                             )

> +               .endm

> +

> +               .macro  v7isb

> + ARM(          .inst   0xf57ff06f              @ v7+ isb       )

> + THUMB(                isb                                             )

> +               .endm

> +

>  /*

>   * Turn on the cache.  We need to setup some page tables so that we

>   * can have both the I and D caches on.

> @@ -827,7 +842,7 @@ __armv7_mmu_cache_on:

>                 movne   r6, #CB_BITS | 0x02     @ !XN

>                 blne    __setup_mmu

>                 mov     r0, #0

> -               mcr     p15, 0, r0, c7, c10, 4  @ drain write buffer

> +               v7dsb                           @ drain write buffer

>                 tst     r11, #0xf               @ VMSA

>                 mcrne   p15, 0, r0, c8, c7, 0   @ flush I,D TLBs

>  #endif

> @@ -849,11 +864,11 @@ __armv7_mmu_cache_on:

>                 mcrne   p15, 0, r1, c3, c0, 0   @ load domain access control

>                 mcrne   p15, 0, r6, c2, c0, 2   @ load ttb control

>  #endif

> -               mcr     p15, 0, r0, c7, c5, 4   @ ISB

> +               v7isb

>                 mcr     p15, 0, r0, c1, c0, 0   @ load control register

>                 mrc     p15, 0, r0, c1, c0, 0   @ and read it back

>                 mov     r0, #0

> -               mcr     p15, 0, r0, c7, c5, 4   @ ISB

> +               v7isb

>                 mov     pc, r12

>

>  __fa526_cache_on:

> @@ -1154,8 +1169,8 @@ __armv7_mmu_cache_off:

>                 mcr     p15, 0, r0, c8, c7, 0   @ invalidate whole TLB

>  #endif

>                 mcr     p15, 0, r0, c7, c5, 6   @ invalidate BTC

> -               mcr     p15, 0, r0, c7, c10, 4  @ DSB

> -               mcr     p15, 0, r0, c7, c5, 4   @ ISB

> +               v7dsb

> +               v7isb

>                 mov     pc, r12

>

>  /*

> @@ -1218,7 +1233,7 @@ __armv7_mmu_cache_flush:

>                 mcr     p15, 0, r10, c7, c14, 0 @ clean+invalidate D

>                 b       iflush

>  hierarchical:

> -               mcr     p15, 0, r10, c7, c10, 5 @ DMB

> +               v7dmb

>                 stmfd   sp!, {r0-r7, r9-r11}

>                 mrc     p15, 1, r0, c0, c0, 1   @ read clidr

>                 ands    r3, r0, #0x7000000      @ extract loc from clidr

> @@ -1232,7 +1247,7 @@ loop1:

>                 cmp     r1, #2                  @ see what cache we have at this level

>                 blt     skip                    @ skip if no cache, or just i-cache

>                 mcr     p15, 2, r10, c0, c0, 0  @ select current cache level in cssr

> -               mcr     p15, 0, r10, c7, c5, 4  @ isb to sych the new cssr&csidr

> +               v7isb                           @ isb to sych the new cssr&csidr

>                 mrc     p15, 1, r1, c0, c0, 0   @ read the new csidr

>                 and     r2, r1, #7              @ extract the length of the cache lines

>                 add     r2, r2, #4              @ add 4 (line length offset)

> @@ -1264,10 +1279,10 @@ finished:

>                 mov     r10, #0                 @ switch back to cache level 0

>                 mcr     p15, 2, r10, c0, c0, 0  @ select current cache level in cssr

>  iflush:

> -               mcr     p15, 0, r10, c7, c10, 4 @ DSB

> +               v7dsb

>                 mcr     p15, 0, r10, c7, c5, 0  @ invalidate I+BTB

> -               mcr     p15, 0, r10, c7, c10, 4 @ DSB

> -               mcr     p15, 0, r10, c7, c5, 4  @ ISB

> +               v7dsb

> +               v7isb

>                 mov     pc, lr

>

>  __armv5tej_mmu_cache_flush:

> -------------------------------------------------------------------------------

>

>

> Git bisection log:

>

> -------------------------------------------------------------------------------

> git bisect start

> # good: [cb73737ea1d27181f5c4bfb1288e97f3e8a4abc7] ARM: 8928/1: ARM_ERRATA_775420: Spelling s/date/data/

> git bisect good cb73737ea1d27181f5c4bfb1288e97f3e8a4abc7

> # bad: [b6c3c42cfda04b0119a0ac46c2a06006f38522d7] ARM: 8938/1: kernel: initialize broadcast hrtimer based clock event device

> git bisect bad b6c3c42cfda04b0119a0ac46c2a06006f38522d7

> # good: [052e76a31b4a64d7678e270d498e1bc36c342f88] ARM: 8931/1: Add clock_getres entry point

> git bisect good 052e76a31b4a64d7678e270d498e1bc36c342f88

> # good: [44700c1ea9afeb9c5093dba7794117fda7c5c955] ARM: 8934/1: Revert "efi: enable CP15 DMB instructions before cleaning the cache"

> git bisect good 44700c1ea9afeb9c5093dba7794117fda7c5c955

> # bad: [7f586a0a683ec37ac25bee24381e24c66dfe32b8] ARM: 8937/1: spectre-v2: remove Brahma-B53 from hardening

> git bisect bad 7f586a0a683ec37ac25bee24381e24c66dfe32b8

> # bad: [ea70bf6e92c5d8cf38c8a077e0eded091c275899] ARM: 8935/1: decompressor: avoid CP15 barrier instructions in v7 cache setup code

> git bisect bad ea70bf6e92c5d8cf38c8a077e0eded091c275899

> # first bad commit: [ea70bf6e92c5d8cf38c8a077e0eded091c275899] ARM: 8935/1: decompressor: avoid CP15 barrier instructions in v7 cache setup code

> -------------------------------------------------------------------------------
Russell King (Oracle) Nov. 16, 2019, 10:49 a.m. UTC | #2
On Sat, Nov 16, 2019 at 10:26:27AM +0000, Ard Biesheuvel wrote:
> (+ Arnd)

> 

> On Sat, 16 Nov 2019 at 05:54, kernelci.org bot <bot@kernelci.org> wrote:

> >

> > * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

> > * This automated bisection report was sent to you on the basis  *

> > * that you may be involved with the breaking commit it has      *

> > * found.  No manual investigation has been done to verify it,   *

> > * and the root cause of the problem may be somewhere else.      *

> > *                                                               *

> > * If you do send a fix, please include this trailer:            *

> > *   Reported-by: "kernelci.org bot" <bot@kernelci.org>          *

> > *                                                               *

> > * Hope this helps!                                              *

> > * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

> >

> > rmk/for-next bisection: boot on ox820-cloudengines-pogoplug-series-3

> >

> > Summary:

> >   Start:      b6c3c42cfda0 ARM: 8938/1: kernel: initialize broadcast hrtimer based clock event device

> >   Details:    https://kernelci.org/boot/id/5dcf3f0359b514dc84cf54c8

> >   Plain log:  https://storage.kernelci.org//rmk/for-next/v5.4-rc5-26-gb6c3c42cfda0/arm/oxnas_v6_defconfig/gcc-8/lab-baylibre/boot-ox820-cloudengines-pogoplug-series-3.txt

> >   HTML log:   https://storage.kernelci.org//rmk/for-next/v5.4-rc5-26-gb6c3c42cfda0/arm/oxnas_v6_defconfig/gcc-8/lab-baylibre/boot-ox820-cloudengines-pogoplug-series-3.html

> >   Result:     ea70bf6e92c5 ARM: 8935/1: decompressor: avoid CP15 barrier instructions in v7 cache setup code

> >

> 

> OK, so this regression is caused by the fact that the 'armv7' cache

> maintenance routines in the decompressor are also used for ARMv6 cores

> if they implement the CPUID extension, which I failed to realise when

> I sent this patch.

> 

> There are roughly three ways to deal with this:

> 1) add a mask/val match pair for ARM11MPcore and ARM1176 that hardwire

> them to the ARMv6 routines, even though they implement the CPUID

> extension. This would be very easy, but assumes that those two cores

> are the only ones that are affected by this.

> 2) modify the v7 routines to check for the L1Hvd MMFR1 attribute (in

> the flush routine) and for the CP15BEN SCTLR bit (in the on/off

> routines), and jump to the respective v6 variants if the CPU turns out

> not to support the v7 one.

> 3) revert the patch, and just enable the CP15 barriers (and issue a v7

> barrier) in the v7 on() and flush() routines.

> 

> I am leaning towards the latter, since it is the most straightforward,

> even though it mixes v7 and cp15 barriers in the same function, but

> that was mostly a cosmetic concern anyway.


I'm going to drop the patches - if -rc8 is released tomorrow maybe we
can have a go at solving it next week.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
Marc Zyngier Nov. 16, 2019, 12:32 p.m. UTC | #3
On Sat, 16 Nov 2019 10:26:27 +0000
Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:

> (+ Arnd)

> 

> On Sat, 16 Nov 2019 at 05:54, kernelci.org bot <bot@kernelci.org> wrote:

> >

> > * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

> > * This automated bisection report was sent to you on the basis  *

> > * that you may be involved with the breaking commit it has      *

> > * found.  No manual investigation has been done to verify it,   *

> > * and the root cause of the problem may be somewhere else.      *

> > *                                                               *

> > * If you do send a fix, please include this trailer:            *

> > *   Reported-by: "kernelci.org bot" <bot@kernelci.org>          *

> > *                                                               *

> > * Hope this helps!                                              *

> > * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

> >

> > rmk/for-next bisection: boot on ox820-cloudengines-pogoplug-series-3

> >

> > Summary:

> >   Start:      b6c3c42cfda0 ARM: 8938/1: kernel: initialize broadcast hrtimer based clock event device

> >   Details:    https://kernelci.org/boot/id/5dcf3f0359b514dc84cf54c8

> >   Plain log:  https://storage.kernelci.org//rmk/for-next/v5.4-rc5-26-gb6c3c42cfda0/arm/oxnas_v6_defconfig/gcc-8/lab-baylibre/boot-ox820-cloudengines-pogoplug-series-3.txt

> >   HTML log:   https://storage.kernelci.org//rmk/for-next/v5.4-rc5-26-gb6c3c42cfda0/arm/oxnas_v6_defconfig/gcc-8/lab-baylibre/boot-ox820-cloudengines-pogoplug-series-3.html

> >   Result:     ea70bf6e92c5 ARM: 8935/1: decompressor: avoid CP15 barrier instructions in v7 cache setup code

> >  

> 

> OK, so this regression is caused by the fact that the 'armv7' cache

> maintenance routines in the decompressor are also used for ARMv6 cores

> if they implement the CPUID extension, which I failed to realise when

> I sent this patch.


Nobody expects the 11MPcore... :-(.

> There are roughly three ways to deal with this:

> 1) add a mask/val match pair for ARM11MPcore and ARM1176 that hardwire

> them to the ARMv6 routines, even though they implement the CPUID

> extension. This would be very easy, but assumes that those two cores

> are the only ones that are affected by this.

> 2) modify the v7 routines to check for the L1Hvd MMFR1 attribute (in

> the flush routine) and for the CP15BEN SCTLR bit (in the on/off

> routines), and jump to the respective v6 variants if the CPU turns out

> not to support the v7 one.

> 3) revert the patch, and just enable the CP15 barriers (and issue a v7

> barrier) in the v7 on() and flush() routines.

> 

> I am leaning towards the latter, since it is the most straightforward,

> even though it mixes v7 and cp15 barriers in the same function, but

> that was mostly a cosmetic concern anyway.


A potential alternative is to check for the presence of architected
barriers in a macro (see the hack below). I've given it a go as a 32bit
guest on an A72 box, both as ARM and Thumb, and nothing caught fire. Of
course, it remains to be seen whether it works on a v6 machine (I don't
think I have any left in my zoo -- please don't send me any), and more
importantly whether we want to carry this kind of horror...

	M.

diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
index ec14687aea3c..144de4b08547 100644
--- a/arch/arm/boot/compressed/head.S
+++ b/arch/arm/boot/compressed/head.S
@@ -656,19 +656,40 @@ params:		ldr	r0, =0x10000100		@ params_phys for RPC
 		.align
 #endif
 
-		.macro	v7dsb
+		@ Check for architected barrier instructions
+		@ Branch to the tgt label if v7 barriers are
+		@ not available. Corrupts the tmp register
+		@ as well as the flags.
+		.macro	no_v7_barrier, tmp, tgt
+		mrc	p15, 0, \tmp, c0, c2, 4 @ ID_ISAR4
+		tst	\tmp, #0xf << 16
+		beq	\tgt
+		.endm
+
+		@ The following macros will use zreg as a temp
+		@ register, and will zero it after use.
+		.macro	__dsb, zreg
+		no_v7_barrier	\zreg, .L__dsb\@
  ARM(		.inst	0xf57ff04f		@ v7+ dsb	)
  THUMB(		dsb						)
+		mov	\zreg, #0
+.L__dsb\@:	mcreq	p15, 0, \zreg, c7, c10, 4 @ dsb
 		.endm
 
-		.macro	v7dmb
+		.macro	__dmb, zreg
+		no_v7_barrier	\zreg, .L__dmb\@
  ARM(		.inst	0xf57ff05f		@ v7+ dmb	)
  THUMB(		dmb						)
+		mov	\zreg, #0
+.L__dmb\@:	mcreq	p15, 0, \zreg, c7, c10, 5 @ dmb
 		.endm
 
-		.macro	v7isb
+		.macro	__isb, zreg
+		no_v7_barrier	\zreg, .L__isb\@
  ARM(		.inst	0xf57ff06f		@ v7+ isb	)
  THUMB(		isb						)
+		mov	\zreg, #0
+.L__isb\@:	mcreq	p15, 0, \zreg, c7, c5, 4 @ isb
 		.endm
 
 /*
@@ -841,8 +862,7 @@ __armv7_mmu_cache_on:
 		tst	r11, #0xf		@ VMSA
 		movne	r6, #CB_BITS | 0x02	@ !XN
 		blne	__setup_mmu
-		mov	r0, #0
-		v7dsb				@ drain write buffer
+		__dsb	r0			@ drain write buffer
 		tst	r11, #0xf		@ VMSA
 		mcrne	p15, 0, r0, c8, c7, 0	@ flush I,D TLBs
 #endif
@@ -864,11 +884,10 @@ __armv7_mmu_cache_on:
 		mcrne	p15, 0, r1, c3, c0, 0	@ load domain access control
 		mcrne   p15, 0, r6, c2, c0, 2   @ load ttb control
 #endif
-		v7isb
+		__isb	lr
 		mcr	p15, 0, r0, c1, c0, 0	@ load control register
 		mrc	p15, 0, r0, c1, c0, 0	@ and read it back
-		mov	r0, #0
-		v7isb
+		__isb	r0
 		mov	pc, r12
 
 __fa526_cache_on:
@@ -1169,8 +1188,8 @@ __armv7_mmu_cache_off:
 		mcr	p15, 0, r0, c8, c7, 0	@ invalidate whole TLB
 #endif
 		mcr	p15, 0, r0, c7, c5, 6	@ invalidate BTC
-		v7dsb
-		v7isb
+		__dsb	r0
+		__isb	r0
 		mov	pc, r12
 
 /*
@@ -1233,7 +1252,7 @@ __armv7_mmu_cache_flush:
 		mcr	p15, 0, r10, c7, c14, 0	@ clean+invalidate D
 		b	iflush
 hierarchical:
-		v7dmb
+		__dmb	r10
 		stmfd	sp!, {r0-r7, r9-r11}
 		mrc	p15, 1, r0, c0, c0, 1	@ read clidr
 		ands	r3, r0, #0x7000000	@ extract loc from clidr
@@ -1247,7 +1266,7 @@ loop1:
 		cmp	r1, #2			@ see what cache we have at this level
 		blt	skip			@ skip if no cache, or just i-cache
 		mcr	p15, 2, r10, c0, c0, 0	@ select current cache level in cssr
-		v7isb				@ isb to sych the new cssr&csidr
+		__isb	r1			@ isb to sych the new cssr&csidr
 		mrc	p15, 1, r1, c0, c0, 0	@ read the new csidr
 		and	r2, r1, #7		@ extract the length of the cache lines
 		add	r2, r2, #4		@ add 4 (line length offset)
@@ -1279,10 +1298,10 @@ finished:
 		mov	r10, #0			@ switch back to cache level 0
 		mcr	p15, 2, r10, c0, c0, 0	@ select current cache level in cssr
 iflush:
-		v7dsb
+		__dsb	r10
 		mcr	p15, 0, r10, c7, c5, 0	@ invalidate I+BTB
-		v7dsb
-		v7isb
+		__dsb	r10
+		__isb	r10
 		mov	pc, lr
 
 __armv5tej_mmu_cache_flush:

-- 
Jazz is not dead. It just smells funny...
Ard Biesheuvel Nov. 16, 2019, 3:35 p.m. UTC | #4
On Sat, 16 Nov 2019 at 13:32, Marc Zyngier <maz@kernel.org> wrote:
>

> On Sat, 16 Nov 2019 10:26:27 +0000

> Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:

>

> > (+ Arnd)

> >

> > On Sat, 16 Nov 2019 at 05:54, kernelci.org bot <bot@kernelci.org> wrote:

> > >

> > > * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

> > > * This automated bisection report was sent to you on the basis  *

> > > * that you may be involved with the breaking commit it has      *

> > > * found.  No manual investigation has been done to verify it,   *

> > > * and the root cause of the problem may be somewhere else.      *

> > > *                                                               *

> > > * If you do send a fix, please include this trailer:            *

> > > *   Reported-by: "kernelci.org bot" <bot@kernelci.org>          *

> > > *                                                               *

> > > * Hope this helps!                                              *

> > > * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

> > >

> > > rmk/for-next bisection: boot on ox820-cloudengines-pogoplug-series-3

> > >

> > > Summary:

> > >   Start:      b6c3c42cfda0 ARM: 8938/1: kernel: initialize broadcast hrtimer based clock event device

> > >   Details:    https://kernelci.org/boot/id/5dcf3f0359b514dc84cf54c8

> > >   Plain log:  https://storage.kernelci.org//rmk/for-next/v5.4-rc5-26-gb6c3c42cfda0/arm/oxnas_v6_defconfig/gcc-8/lab-baylibre/boot-ox820-cloudengines-pogoplug-series-3.txt

> > >   HTML log:   https://storage.kernelci.org//rmk/for-next/v5.4-rc5-26-gb6c3c42cfda0/arm/oxnas_v6_defconfig/gcc-8/lab-baylibre/boot-ox820-cloudengines-pogoplug-series-3.html

> > >   Result:     ea70bf6e92c5 ARM: 8935/1: decompressor: avoid CP15 barrier instructions in v7 cache setup code

> > >

> >

> > OK, so this regression is caused by the fact that the 'armv7' cache

> > maintenance routines in the decompressor are also used for ARMv6 cores

> > if they implement the CPUID extension, which I failed to realise when

> > I sent this patch.

>

> Nobody expects the 11MPcore... :-(.

>


Yeah :-)

> > There are roughly three ways to deal with this:

> > 1) add a mask/val match pair for ARM11MPcore and ARM1176 that hardwire

> > them to the ARMv6 routines, even though they implement the CPUID

> > extension. This would be very easy, but assumes that those two cores

> > are the only ones that are affected by this.

> > 2) modify the v7 routines to check for the L1Hvd MMFR1 attribute (in

> > the flush routine) and for the CP15BEN SCTLR bit (in the on/off

> > routines), and jump to the respective v6 variants if the CPU turns out

> > not to support the v7 one.

> > 3) revert the patch, and just enable the CP15 barriers (and issue a v7

> > barrier) in the v7 on() and flush() routines.

> >

> > I am leaning towards the latter, since it is the most straightforward,

> > even though it mixes v7 and cp15 barriers in the same function, but

> > that was mostly a cosmetic concern anyway.

>

> A potential alternative is to check for the presence of architected

> barriers in a macro (see the hack below). I've given it a go as a 32bit

> guest on an A72 box, both as ARM and Thumb, and nothing caught fire. Of

> course, it remains to be seen whether it works on a v6 machine (I don't

> think I have any left in my zoo -- please don't send me any), and more

> importantly whether we want to carry this kind of horror...

>


Thanks.

But as I said, it might be better just to enable the CP15 barriers.
They haven't been removed from the architecture yet, so they are
always supported - we just have to enable them.



>

> diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S

> index ec14687aea3c..144de4b08547 100644

> --- a/arch/arm/boot/compressed/head.S

> +++ b/arch/arm/boot/compressed/head.S

> @@ -656,19 +656,40 @@ params:           ldr     r0, =0x10000100         @ params_phys for RPC

>                 .align

>  #endif

>

> -               .macro  v7dsb

> +               @ Check for architected barrier instructions

> +               @ Branch to the tgt label if v7 barriers are

> +               @ not available. Corrupts the tmp register

> +               @ as well as the flags.

> +               .macro  no_v7_barrier, tmp, tgt

> +               mrc     p15, 0, \tmp, c0, c2, 4 @ ID_ISAR4

> +               tst     \tmp, #0xf << 16

> +               beq     \tgt

> +               .endm

> +

> +               @ The following macros will use zreg as a temp

> +               @ register, and will zero it after use.

> +               .macro  __dsb, zreg

> +               no_v7_barrier   \zreg, .L__dsb\@

>   ARM(          .inst   0xf57ff04f              @ v7+ dsb       )

>   THUMB(                dsb                                             )

> +               mov     \zreg, #0

> +.L__dsb\@:     mcreq   p15, 0, \zreg, c7, c10, 4 @ dsb

>                 .endm

>

> -               .macro  v7dmb

> +               .macro  __dmb, zreg

> +               no_v7_barrier   \zreg, .L__dmb\@

>   ARM(          .inst   0xf57ff05f              @ v7+ dmb       )

>   THUMB(                dmb                                             )

> +               mov     \zreg, #0

> +.L__dmb\@:     mcreq   p15, 0, \zreg, c7, c10, 5 @ dmb

>                 .endm

>

> -               .macro  v7isb

> +               .macro  __isb, zreg

> +               no_v7_barrier   \zreg, .L__isb\@

>   ARM(          .inst   0xf57ff06f              @ v7+ isb       )

>   THUMB(                isb                                             )

> +               mov     \zreg, #0

> +.L__isb\@:     mcreq   p15, 0, \zreg, c7, c5, 4 @ isb

>                 .endm

>

>  /*

> @@ -841,8 +862,7 @@ __armv7_mmu_cache_on:

>                 tst     r11, #0xf               @ VMSA

>                 movne   r6, #CB_BITS | 0x02     @ !XN

>                 blne    __setup_mmu

> -               mov     r0, #0

> -               v7dsb                           @ drain write buffer

> +               __dsb   r0                      @ drain write buffer

>                 tst     r11, #0xf               @ VMSA

>                 mcrne   p15, 0, r0, c8, c7, 0   @ flush I,D TLBs

>  #endif

> @@ -864,11 +884,10 @@ __armv7_mmu_cache_on:

>                 mcrne   p15, 0, r1, c3, c0, 0   @ load domain access control

>                 mcrne   p15, 0, r6, c2, c0, 2   @ load ttb control

>  #endif

> -               v7isb

> +               __isb   lr

>                 mcr     p15, 0, r0, c1, c0, 0   @ load control register

>                 mrc     p15, 0, r0, c1, c0, 0   @ and read it back

> -               mov     r0, #0

> -               v7isb

> +               __isb   r0

>                 mov     pc, r12

>

>  __fa526_cache_on:

> @@ -1169,8 +1188,8 @@ __armv7_mmu_cache_off:

>                 mcr     p15, 0, r0, c8, c7, 0   @ invalidate whole TLB

>  #endif

>                 mcr     p15, 0, r0, c7, c5, 6   @ invalidate BTC

> -               v7dsb

> -               v7isb

> +               __dsb   r0

> +               __isb   r0

>                 mov     pc, r12

>

>  /*

> @@ -1233,7 +1252,7 @@ __armv7_mmu_cache_flush:

>                 mcr     p15, 0, r10, c7, c14, 0 @ clean+invalidate D

>                 b       iflush

>  hierarchical:

> -               v7dmb

> +               __dmb   r10

>                 stmfd   sp!, {r0-r7, r9-r11}

>                 mrc     p15, 1, r0, c0, c0, 1   @ read clidr

>                 ands    r3, r0, #0x7000000      @ extract loc from clidr

> @@ -1247,7 +1266,7 @@ loop1:

>                 cmp     r1, #2                  @ see what cache we have at this level

>                 blt     skip                    @ skip if no cache, or just i-cache

>                 mcr     p15, 2, r10, c0, c0, 0  @ select current cache level in cssr

> -               v7isb                           @ isb to sych the new cssr&csidr

> +               __isb   r1                      @ isb to sych the new cssr&csidr

>                 mrc     p15, 1, r1, c0, c0, 0   @ read the new csidr

>                 and     r2, r1, #7              @ extract the length of the cache lines

>                 add     r2, r2, #4              @ add 4 (line length offset)

> @@ -1279,10 +1298,10 @@ finished:

>                 mov     r10, #0                 @ switch back to cache level 0

>                 mcr     p15, 2, r10, c0, c0, 0  @ select current cache level in cssr

>  iflush:

> -               v7dsb

> +               __dsb   r10

>                 mcr     p15, 0, r10, c7, c5, 0  @ invalidate I+BTB

> -               v7dsb

> -               v7isb

> +               __dsb   r10

> +               __isb   r10

>                 mov     pc, lr

>

>  __armv5tej_mmu_cache_flush:

>

> --

> Jazz is not dead. It just smells funny...
Ard Biesheuvel Nov. 16, 2019, 3:35 p.m. UTC | #5
On Sat, 16 Nov 2019 at 11:49, Russell King - ARM Linux admin
<linux@armlinux.org.uk> wrote:
>

> On Sat, Nov 16, 2019 at 10:26:27AM +0000, Ard Biesheuvel wrote:

> > (+ Arnd)

> >

> > On Sat, 16 Nov 2019 at 05:54, kernelci.org bot <bot@kernelci.org> wrote:

> > >

> > > * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

> > > * This automated bisection report was sent to you on the basis  *

> > > * that you may be involved with the breaking commit it has      *

> > > * found.  No manual investigation has been done to verify it,   *

> > > * and the root cause of the problem may be somewhere else.      *

> > > *                                                               *

> > > * If you do send a fix, please include this trailer:            *

> > > *   Reported-by: "kernelci.org bot" <bot@kernelci.org>          *

> > > *                                                               *

> > > * Hope this helps!                                              *

> > > * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

> > >

> > > rmk/for-next bisection: boot on ox820-cloudengines-pogoplug-series-3

> > >

> > > Summary:

> > >   Start:      b6c3c42cfda0 ARM: 8938/1: kernel: initialize broadcast hrtimer based clock event device

> > >   Details:    https://kernelci.org/boot/id/5dcf3f0359b514dc84cf54c8

> > >   Plain log:  https://storage.kernelci.org//rmk/for-next/v5.4-rc5-26-gb6c3c42cfda0/arm/oxnas_v6_defconfig/gcc-8/lab-baylibre/boot-ox820-cloudengines-pogoplug-series-3.txt

> > >   HTML log:   https://storage.kernelci.org//rmk/for-next/v5.4-rc5-26-gb6c3c42cfda0/arm/oxnas_v6_defconfig/gcc-8/lab-baylibre/boot-ox820-cloudengines-pogoplug-series-3.html

> > >   Result:     ea70bf6e92c5 ARM: 8935/1: decompressor: avoid CP15 barrier instructions in v7 cache setup code

> > >

> >

> > OK, so this regression is caused by the fact that the 'armv7' cache

> > maintenance routines in the decompressor are also used for ARMv6 cores

> > if they implement the CPUID extension, which I failed to realise when

> > I sent this patch.

> >

> > There are roughly three ways to deal with this:

> > 1) add a mask/val match pair for ARM11MPcore and ARM1176 that hardwire

> > them to the ARMv6 routines, even though they implement the CPUID

> > extension. This would be very easy, but assumes that those two cores

> > are the only ones that are affected by this.

> > 2) modify the v7 routines to check for the L1Hvd MMFR1 attribute (in

> > the flush routine) and for the CP15BEN SCTLR bit (in the on/off

> > routines), and jump to the respective v6 variants if the CPU turns out

> > not to support the v7 one.

> > 3) revert the patch, and just enable the CP15 barriers (and issue a v7

> > barrier) in the v7 on() and flush() routines.

> >

> > I am leaning towards the latter, since it is the most straightforward,

> > even though it mixes v7 and cp15 barriers in the same function, but

> > that was mostly a cosmetic concern anyway.

>

> I'm going to drop the patches - if -rc8 is released tomorrow maybe we

> can have a go at solving it next week.

>


Fair enough.
diff mbox series

Patch

diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
index 95238146b7f2..fe279816b298 100644
--- a/arch/arm/boot/compressed/head.S
+++ b/arch/arm/boot/compressed/head.S
@@ -656,6 +656,21 @@  params:		ldr	r0, =0x10000100		@ params_phys for RPC
 		.align
 #endif
 
+		.macro	v7dsb
+ ARM(		.inst	0xf57ff04f		@ v7+ dsb	)
+ THUMB(		dsb						)
+		.endm
+
+		.macro	v7dmb
+ ARM(		.inst	0xf57ff05f		@ v7+ dmb	)
+ THUMB(		dmb						)
+		.endm
+
+		.macro	v7isb
+ ARM(		.inst	0xf57ff06f		@ v7+ isb	)
+ THUMB(		isb						)
+		.endm
+
 /*
  * Turn on the cache.  We need to setup some page tables so that we
  * can have both the I and D caches on.
@@ -827,7 +842,7 @@  __armv7_mmu_cache_on:
 		movne	r6, #CB_BITS | 0x02	@ !XN
 		blne	__setup_mmu
 		mov	r0, #0
-		mcr	p15, 0, r0, c7, c10, 4	@ drain write buffer
+		v7dsb				@ drain write buffer
 		tst	r11, #0xf		@ VMSA
 		mcrne	p15, 0, r0, c8, c7, 0	@ flush I,D TLBs
 #endif
@@ -849,11 +864,11 @@  __armv7_mmu_cache_on:
 		mcrne	p15, 0, r1, c3, c0, 0	@ load domain access control
 		mcrne   p15, 0, r6, c2, c0, 2   @ load ttb control
 #endif
-		mcr	p15, 0, r0, c7, c5, 4	@ ISB
+		v7isb
 		mcr	p15, 0, r0, c1, c0, 0	@ load control register
 		mrc	p15, 0, r0, c1, c0, 0	@ and read it back
 		mov	r0, #0
-		mcr	p15, 0, r0, c7, c5, 4	@ ISB
+		v7isb
 		mov	pc, r12
 
 __fa526_cache_on:
@@ -1154,8 +1169,8 @@  __armv7_mmu_cache_off:
 		mcr	p15, 0, r0, c8, c7, 0	@ invalidate whole TLB
 #endif
 		mcr	p15, 0, r0, c7, c5, 6	@ invalidate BTC
-		mcr	p15, 0, r0, c7, c10, 4	@ DSB
-		mcr	p15, 0, r0, c7, c5, 4	@ ISB
+		v7dsb
+		v7isb
 		mov	pc, r12
 
 /*
@@ -1218,7 +1233,7 @@  __armv7_mmu_cache_flush:
 		mcr	p15, 0, r10, c7, c14, 0	@ clean+invalidate D
 		b	iflush
 hierarchical:
-		mcr	p15, 0, r10, c7, c10, 5	@ DMB
+		v7dmb
 		stmfd	sp!, {r0-r7, r9-r11}
 		mrc	p15, 1, r0, c0, c0, 1	@ read clidr
 		ands	r3, r0, #0x7000000	@ extract loc from clidr
@@ -1232,7 +1247,7 @@  loop1:
 		cmp	r1, #2			@ see what cache we have at this level
 		blt	skip			@ skip if no cache, or just i-cache
 		mcr	p15, 2, r10, c0, c0, 0	@ select current cache level in cssr
-		mcr	p15, 0, r10, c7, c5, 4	@ isb to sych the new cssr&csidr
+		v7isb				@ isb to sych the new cssr&csidr
 		mrc	p15, 1, r1, c0, c0, 0	@ read the new csidr
 		and	r2, r1, #7		@ extract the length of the cache lines
 		add	r2, r2, #4		@ add 4 (line length offset)
@@ -1264,10 +1279,10 @@  finished:
 		mov	r10, #0			@ switch back to cache level 0
 		mcr	p15, 2, r10, c0, c0, 0	@ select current cache level in cssr
 iflush:
-		mcr	p15, 0, r10, c7, c10, 4	@ DSB
+		v7dsb
 		mcr	p15, 0, r10, c7, c5, 0	@ invalidate I+BTB
-		mcr	p15, 0, r10, c7, c10, 4	@ DSB
-		mcr	p15, 0, r10, c7, c5, 4	@ ISB
+		v7dsb
+		v7isb
 		mov	pc, lr
 
 __armv5tej_mmu_cache_flush: