[Xen-devel,for-4.7,v2] xen/arm: Force broadcast of TLB and instruction cache maintenance instructions

Message ID 1461756173-10300-1-git-send-email-julien.grall@arm.com
State New

Commit Message

Julien Grall April 27, 2016, 11:22 a.m.
A UP guest may use TLB instructions that flush only the local CPU.
Therefore, the TLB flush will not be broadcast to all the CPUs within
the same innershareable domain.

When the vCPU is migrated between different CPUs, it may be rescheduled
on a previous CPU whose TLB has not been flushed. That TLB may contain
stale entries, which will result in a VA being translated incorrectly
to an IPA or even cause TLB conflicts.

To avoid such a situation, it is possible to set HCR_EL2.FB, which will
force the broadcast of TLB and instruction cache maintenance instructions.
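
The scenario above can be illustrated with a small, self-contained toy
model. This is only a sketch under stated assumptions: it tracks a single
guest translation on two physical CPUs, and every name in it (sk_machine,
sk_guest_local_flush, and so on) is hypothetical and does not exist in
Xen. The force_broadcast flag merely stands in for HCR_EL2.FB.

```c
#include <assert.h>
#include <stdbool.h>

#define SK_NR_PCPUS 2  /* assumption: two physical CPUs */

/* Toy machine state; none of these names exist in Xen. */
struct sk_machine {
    bool force_broadcast;             /* stands in for HCR_EL2.FB */
    bool tlb_has_entry[SK_NR_PCPUS];  /* pCPU caches the guest's old translation */
};

/* The guest touches the VA while running on pcpu, filling that TLB. */
static void sk_guest_access(struct sk_machine *m, int pcpu)
{
    assert(pcpu >= 0 && pcpu < SK_NR_PCPUS);
    m->tlb_has_entry[pcpu] = true;
}

/* The guest issues a local (non-broadcast) TLB flush while on pcpu. */
static void sk_guest_local_flush(struct sk_machine *m, int pcpu)
{
    assert(pcpu >= 0 && pcpu < SK_NR_PCPUS);
    if (m->force_broadcast) {
        /* With the flag set, the local flush reaches every pCPU. */
        for (int i = 0; i < SK_NR_PCPUS; i++)
            m->tlb_has_entry[i] = false;
    } else {
        /* Without it, only the current pCPU is flushed. */
        m->tlb_has_entry[pcpu] = false;
    }
}

/* Returns true if pCPU0 still holds the old entry when the vCPU returns. */
static bool sk_stale_after_migration(bool force_broadcast)
{
    struct sk_machine m = { .force_broadcast = force_broadcast };

    sk_guest_access(&m, 0);       /* vCPU runs on pCPU0 */
    sk_guest_access(&m, 1);       /* vCPU is migrated to pCPU1 */
    sk_guest_local_flush(&m, 1);  /* guest remaps the VA and flushes locally */

    /* vCPU is migrated back to pCPU0: is the old entry still cached? */
    return m.tlb_has_entry[0];
}
```

In this model the stale entry survives on pCPU0 only when the flag is
clear, which is the situation the patch is meant to rule out.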

The performance impact of setting HCR_EL2.FB will depend on how often
a guest makes use of local flush instructions.

The ARM64 Linux kernel is SMP-aware (it cannot be built for UP only).
Most of the flush instructions are innershareable. The local flushes are
limited to boot (1 per CPU) and to when a task gets a new ASID.
Therefore the impact of setting HCR.FB for those guests is very limited.

The ARM32 Linux kernel can be built either for SMP or for UP. The number
of local flushes is very limited in the former kernel, whilst the latter
will only issue local flushes. Therefore setting HCR.FB will have an
impact on guest kernels built only for UP.

Note that an SMP kernel can run in a domain using 1 vCPU and it
will still make use of innershareable flush instructions.

Other OSes, such as FreeBSD, are very similar to the ARM32 Linux
kernel (i.e. they offer two configurations: SMP and UP).

However, nothing prevents an SMP-aware kernel from making more frequent
use of local flush instructions.

In the case that HCR_EL2.FB is not set, Xen would need to:
    * Flush all the TLBs for the VMID associated with this domain
    * Invalidate all the entries from the branch predictor
    * Invalidate all the entries from the instruction cache
Those actions would only be needed when the vCPU is migrating between 2
physical CPUs.
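
As a rough illustration of that alternative, the decision could be
sketched as below. This is not an existing Xen code path: sk_vcpu, its
last_pcpu field, the SK_* flags and sk_maintenance_on_schedule are all
hypothetical names, and the sketch assumes that the first time a vCPU is
scheduled does not count as a migration. It only computes which of the
three actions listed above would be required, without performing them.

```c
#include <assert.h>

#define SK_NO_PCPU (-1)  /* hypothetical marker: vCPU has never run */

/* One flag per maintenance action from the list above. */
#define SK_FLUSH_TLB_VMID   (1u << 0)
#define SK_INV_BRANCH_PRED  (1u << 1)
#define SK_INV_ICACHE       (1u << 2)
#define SK_ALL_MAINTENANCE \
    (SK_FLUSH_TLB_VMID | SK_INV_BRANCH_PRED | SK_INV_ICACHE)

/* Hypothetical per-vCPU bookkeeping; not the Xen struct vcpu. */
struct sk_vcpu {
    int last_pcpu;  /* physical CPU the vCPU last ran on */
};

/*
 * Called when the vCPU is about to run on pcpu. Returns the set of
 * maintenance actions required, which is non-empty only when the vCPU
 * has moved from one physical CPU to another.
 */
static unsigned int sk_maintenance_on_schedule(struct sk_vcpu *v, int pcpu)
{
    unsigned int ops = 0;

    assert(pcpu >= 0);
    if (v->last_pcpu != SK_NO_PCPU && v->last_pcpu != pcpu)
        ops = SK_ALL_MAINTENANCE;

    v->last_pcpu = pcpu;
    return ops;
}
```

A vCPU that stays on the same physical CPU pays nothing in this sketch,
while every migration pays for all three actions.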

Whilst this solution would have a negative performance impact on kernels
which do not heavily use local flush instructions, it may improve
performance for kernels built only for UP systems.

For now, implement the easiest solution (i.e. setting HCR_EL2.FB). We can
revisit it if the performance impact is too high for UP kernels.

Signed-off-by: Julien Grall <julien.grall@arm.com>
---

This is a bug fix for Xen 4.7 and should be backported up to Xen 4.4
(first official release for ARM). Without this patch, a UP guest will
crash if it gets migrated to a physical CPU with stale TLB entries for
this guest.

    Changes in v2:
        - Rework the commit message to include the possible performance
        impact of setting HCR_EL2.FB.
---
 xen/arch/arm/traps.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Patch

diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index 5e865cf..9926a57 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -124,7 +124,8 @@  void init_traps(void)
 
     /* Setup hypervisor traps */
     WRITE_SYSREG(HCR_PTW|HCR_BSU_INNER|HCR_AMO|HCR_IMO|HCR_FMO|HCR_VM|
-                 HCR_TWE|HCR_TWI|HCR_TSC|HCR_TAC|HCR_SWIO|HCR_TIDCP, HCR_EL2);
+                 HCR_TWE|HCR_TWI|HCR_TSC|HCR_TAC|HCR_SWIO|HCR_TIDCP|HCR_FB,
+                 HCR_EL2);
     isb();
 }