[Xen-devel,v2,5/7] xen/arm32: Invalidate BTB on guest exit for Cortex A17 and 12

Message ID 20180131165334.23175-6-julien.grall@arm.com
State New
Headers show
Series
  • xen/arm32: Branch predictor hardening (XSA-254 variant 2)
Related show

Commit Message

Julien Grall Jan. 31, 2018, 4:53 p.m.
From: Julien Grall <julien.grall@linaro.org>

In order to avoid aliasing attackes agains the branch predictor, let's
invalidate the BTB on guest exist. This is made complicated by the fact
that we cannot take a branch invalidating the BTB.

This is based on the first version posrted by Marc Zyngier on Linux-arm
mailing list (see [1]).

This is part of XSA-254.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

[1] https://www.spinics.net/lists/arm-kernel/msg627032.html

---
    Changes in v2:
        - Add Stefano's reviewed-by
---
 xen/arch/arm/arm32/entry.S | 55 ++++++++++++++++++++++++++++++++++++++++++++++
 xen/arch/arm/cpuerrata.c   | 19 ++++++++++++++++
 2 files changed, 74 insertions(+)

Comments

Julien Grall Jan. 31, 2018, 5:34 p.m. | #1
Hi,

On 31/01/18 16:53, Julien Grall wrote:
> +GLOBAL(hyp_traps_vector_bp_inv)
> +        /*
> +         * We encode the exception entry in the bottom 3 bits of
> +         * SP, and we have to guarantee to be 8 bytes aligned.
> +         */
> +        add sp, sp, #1                  /* Reset            7 */
> +        add sp, sp, #1                  /* Undef            6 */
> +        add sp, sp, #1                  /* Hypervisor Call  5 */
> +        add sp, sp, #1                  /* Prefetch abort   4 */
> +        add sp, sp, #1                  /* Data abort       3 */
> +        add sp, sp, #1                  /* Hypervisor       2 */
> +        add sp, sp, #1                  /* IRQ              1 */
> +        nop                             /* FIQ              0 */
> +
> +        mcr	p15, 0, r0, c7, c5, 6	    /* BPIALL */
> +        isb
> +
> +        /*
> +         * As we cannot use any temporary registers and cannot
> +         * clobber SP, we can decode the exception entry using
> +         * an unrolled binary search.
> +         */
> +        tst sp, #4
> +        bne 1f
> +
> +        tst sp, #2
> +        bne 3f
> +
> +        tst sp, #1
> +        bic sp, sp, #0x7
> +        bne trap_irq
> +        b   trap_fiq
> +
> +1:
> +        tst sp, #2
> +        bne 2f
> +
> +        tst sp, #1
> +        bic sp, sp, #0x7
> +        bne trap_hypervisor_call
> +        b   trap_prefetch_abort
> +
> +2:
> +        tst sp, #1
> +        bic sp, sp, #0x7
> +        bne trap_reset
> +        b   trap_undefined_instruction
> +
> +3:
> +        tst sp, #1
> +        bic sp, sp, #0x7
> +        bne trap_data_abort
> +        b   trap_guest_sync

Just after I hit the sent button Marc pointed me to a discussion on 
Linux ML [1].

They found a solution which would streamline the code, improve the 
readability and prioritize exception.

I would be tempt to use a similar solution on Xen. Stefano, do you have 
any opinions? Shall I respin the series or do a follow-up?

Cheers,
Stefano Stabellini Jan. 31, 2018, 5:39 p.m. | #2
On Wed, 31 Jan 2018, Julien Grall wrote:
> Hi,
> 
> On 31/01/18 16:53, Julien Grall wrote:
> > +GLOBAL(hyp_traps_vector_bp_inv)
> > +        /*
> > +         * We encode the exception entry in the bottom 3 bits of
> > +         * SP, and we have to guarantee to be 8 bytes aligned.
> > +         */
> > +        add sp, sp, #1                  /* Reset            7 */
> > +        add sp, sp, #1                  /* Undef            6 */
> > +        add sp, sp, #1                  /* Hypervisor Call  5 */
> > +        add sp, sp, #1                  /* Prefetch abort   4 */
> > +        add sp, sp, #1                  /* Data abort       3 */
> > +        add sp, sp, #1                  /* Hypervisor       2 */
> > +        add sp, sp, #1                  /* IRQ              1 */
> > +        nop                             /* FIQ              0 */
> > +
> > +        mcr	p15, 0, r0, c7, c5, 6	    /* BPIALL */
> > +        isb
> > +
> > +        /*
> > +         * As we cannot use any temporary registers and cannot
> > +         * clobber SP, we can decode the exception entry using
> > +         * an unrolled binary search.
> > +         */
> > +        tst sp, #4
> > +        bne 1f
> > +
> > +        tst sp, #2
> > +        bne 3f
> > +
> > +        tst sp, #1
> > +        bic sp, sp, #0x7
> > +        bne trap_irq
> > +        b   trap_fiq
> > +
> > +1:
> > +        tst sp, #2
> > +        bne 2f
> > +
> > +        tst sp, #1
> > +        bic sp, sp, #0x7
> > +        bne trap_hypervisor_call
> > +        b   trap_prefetch_abort
> > +
> > +2:
> > +        tst sp, #1
> > +        bic sp, sp, #0x7
> > +        bne trap_reset
> > +        b   trap_undefined_instruction
> > +
> > +3:
> > +        tst sp, #1
> > +        bic sp, sp, #0x7
> > +        bne trap_data_abort
> > +        b   trap_guest_sync
> 
> Just after I hit the sent button Marc pointed me to a discussion on Linux ML
> [1].
> 
> They found a solution which would streamline the code, improve the readability
> and prioritize exception.
> 
> I would be tempt to use a similar solution on Xen. Stefano, do you have any
> opinions? Shall I respin the series or do a follow-up?
 
You forgot the link :-)
I prefer if you respin, because it is going to make the backports
slightly easier.
Julien Grall Jan. 31, 2018, 5:39 p.m. | #3
On 31/01/18 17:39, Stefano Stabellini wrote:
> On Wed, 31 Jan 2018, Julien Grall wrote:
>> Hi,
>>
>> On 31/01/18 16:53, Julien Grall wrote:
>>> +GLOBAL(hyp_traps_vector_bp_inv)
>>> +        /*
>>> +         * We encode the exception entry in the bottom 3 bits of
>>> +         * SP, and we have to guarantee to be 8 bytes aligned.
>>> +         */
>>> +        add sp, sp, #1                  /* Reset            7 */
>>> +        add sp, sp, #1                  /* Undef            6 */
>>> +        add sp, sp, #1                  /* Hypervisor Call  5 */
>>> +        add sp, sp, #1                  /* Prefetch abort   4 */
>>> +        add sp, sp, #1                  /* Data abort       3 */
>>> +        add sp, sp, #1                  /* Hypervisor       2 */
>>> +        add sp, sp, #1                  /* IRQ              1 */
>>> +        nop                             /* FIQ              0 */
>>> +
>>> +        mcr	p15, 0, r0, c7, c5, 6	    /* BPIALL */
>>> +        isb
>>> +
>>> +        /*
>>> +         * As we cannot use any temporary registers and cannot
>>> +         * clobber SP, we can decode the exception entry using
>>> +         * an unrolled binary search.
>>> +         */
>>> +        tst sp, #4
>>> +        bne 1f
>>> +
>>> +        tst sp, #2
>>> +        bne 3f
>>> +
>>> +        tst sp, #1
>>> +        bic sp, sp, #0x7
>>> +        bne trap_irq
>>> +        b   trap_fiq
>>> +
>>> +1:
>>> +        tst sp, #2
>>> +        bne 2f
>>> +
>>> +        tst sp, #1
>>> +        bic sp, sp, #0x7
>>> +        bne trap_hypervisor_call
>>> +        b   trap_prefetch_abort
>>> +
>>> +2:
>>> +        tst sp, #1
>>> +        bic sp, sp, #0x7
>>> +        bne trap_reset
>>> +        b   trap_undefined_instruction
>>> +
>>> +3:
>>> +        tst sp, #1
>>> +        bic sp, sp, #0x7
>>> +        bne trap_data_abort
>>> +        b   trap_guest_sync
>>
>> Just after I hit the sent button Marc pointed me to a discussion on Linux ML
>> [1].
>>
>> They found a solution which would streamline the code, improve the readability
>> and prioritize exception.
>>
>> I would be tempt to use a similar solution on Xen. Stefano, do you have any
>> opinions? Shall I respin the series or do a follow-up?
>   
> You forgot the link :-)
> I prefer if you respin, because it is going to make the backports
> slightly easier.

Here the link:

https://www.spinics.net/lists/arm-kernel/msg631860.html

Cheers,

Patch

diff --git a/xen/arch/arm/arm32/entry.S b/xen/arch/arm/arm32/entry.S
index 828e52c25c..a295f3ad67 100644
--- a/xen/arch/arm/arm32/entry.S
+++ b/xen/arch/arm/arm32/entry.S
@@ -160,6 +160,61 @@  GLOBAL(hyp_traps_vector)
         b trap_irq                      /* 0x18 - IRQ */
         b trap_fiq                      /* 0x1c - FIQ */
 
+        .align 5
+GLOBAL(hyp_traps_vector_bp_inv)
+        /*
+         * We encode the exception entry in the bottom 3 bits of
+         * SP, and we have to guarantee to be 8 bytes aligned.
+         */
+        add sp, sp, #1                  /* Reset            7 */
+        add sp, sp, #1                  /* Undef            6 */
+        add sp, sp, #1                  /* Hypervisor Call  5 */
+        add sp, sp, #1                  /* Prefetch abort   4 */
+        add sp, sp, #1                  /* Data abort       3 */
+        add sp, sp, #1                  /* Hypervisor       2 */
+        add sp, sp, #1                  /* IRQ              1 */
+        nop                             /* FIQ              0 */
+
+        mcr	p15, 0, r0, c7, c5, 6	    /* BPIALL */
+        isb
+
+        /*
+         * As we cannot use any temporary registers and cannot
+         * clobber SP, we can decode the exception entry using
+         * an unrolled binary search.
+         */
+        tst sp, #4
+        bne 1f
+
+        tst sp, #2
+        bne 3f
+
+        tst sp, #1
+        bic sp, sp, #0x7
+        bne trap_irq
+        b   trap_fiq
+
+1:
+        tst sp, #2
+        bne 2f
+
+        tst sp, #1
+        bic sp, sp, #0x7
+        bne trap_hypervisor_call
+        b   trap_prefetch_abort
+
+2:
+        tst sp, #1
+        bic sp, sp, #0x7
+        bne trap_reset
+        b   trap_undefined_instruction
+
+3:
+        tst sp, #1
+        bic sp, sp, #0x7
+        bne trap_data_abort
+        b   trap_guest_sync
+
 DEFINE_TRAP_ENTRY(reset)
 DEFINE_TRAP_ENTRY(undefined_instruction)
 DEFINE_TRAP_ENTRY(hypervisor_call)
diff --git a/xen/arch/arm/cpuerrata.c b/xen/arch/arm/cpuerrata.c
index 0a138fa735..c79e6d65d3 100644
--- a/xen/arch/arm/cpuerrata.c
+++ b/xen/arch/arm/cpuerrata.c
@@ -198,6 +198,13 @@  install_bp_hardening_vecs(const struct arm_cpu_capabilities *entry,
     this_cpu(bp_harden_vecs) = hyp_vecs;
 }
 
+static int enable_bp_inv_hardening(void *data)
+{
+    install_bp_hardening_vecs(data, hyp_traps_vector_bp_inv,
+                              "execute BPIALL");
+    return 0;
+}
+
 #endif
 
 #define MIDR_RANGE(model, min, max)     \
@@ -284,6 +291,18 @@  static const struct arm_cpu_capabilities arm_errata[] = {
         .enable = enable_psci_bp_hardening,
     },
 #endif
+#ifdef CONFIG_ARM32_HARDEN_BRANCH_PREDICTOR
+    {
+        .capability = ARM_HARDEN_BRANCH_PREDICTOR,
+        MIDR_ALL_VERSIONS(MIDR_CORTEX_A12),
+        .enable = enable_bp_inv_hardening,
+    },
+    {
+        .capability = ARM_HARDEN_BRANCH_PREDICTOR,
+        MIDR_ALL_VERSIONS(MIDR_CORTEX_A17),
+        .enable = enable_bp_inv_hardening,
+    },
+#endif
     {},
 };