[Xen-devel,RFC,15/16] xen/arm: Implement Set/Way operations

Message ID 20181008183352.16291-16-julien.grall@arm.com
State New
Headers show
Series
  • xen/arm: Implement Set/Way operations
Related show

Commit Message

Julien Grall Oct. 8, 2018, 6:33 p.m.
Set/Way operations are used to perform maintenance on a given cache.
At the moment, Set/Way operations are not trapped and therefore a guest
OS will directly act on the local cache. However, a vCPU may migrate to
another pCPU in the middle of the processor. This will result to have
cache with stall data (Set/Way are not propagated) potentially causing
crash. This may be the cause of heisenbug noticed in Osstest [1].

Furthermore, Set/Way operations are not available on system cache. This
means that OS, such as Linux 32-bit, relying on those operations to
fully clean the cache before disabling MMU may break because data may
sits in system caches and not in RAM.

For more details about Set/Way, see the talk "The Art of Virtualizing
Cache Maintenance" given at Xen Summit 2018 [2].

In the context of Xen, we need to trap Set/Way operations and emulate
them. From the Arm Arm (B1.14.4 in DDI 046C.c), Set/Way operations are
difficult to virtualized. So we can assume that a guest OS using them will
suffer the consequence (i.e slowness) until developer removes all the usage
of Set/Way.

As the software is not allowed to infer the Set/Way to Physical Address
mapping, Xen will need to go through the guest P2M and clean &
invalidate all the entries mapped.

Because Set/Way happen in batch (a loop on all Set/Way of a cache), Xen
would need to go through the P2M for every instructions. This is quite
expensive and would severely impact the guest OS. The implementation is
re-using the KVM policy to limit the number of flush:
    - If we trap a Set/Way operations, we enable VM trapping (i.e
      HVC_EL2.TVM) to detect cache being turned on/off, and do a full
    clean.
    - We clean the caches when turning on and off
    - Once the caches are enabled, we stop trapping VM instructions

[1] https://lists.xenproject.org/archives/html/xen-devel/2017-09/msg03191.html
[2] https://fr.slideshare.net/xen_com_mgr/virtualizing-cache

Signed-off-by: Julien Grall <julien.grall@arm.com>
---
 xen/arch/arm/arm64/vsysreg.c | 27 +++++++++++++++++-
 xen/arch/arm/p2m.c           | 68 ++++++++++++++++++++++++++++++++++++++++++++
 xen/arch/arm/traps.c         |  2 +-
 xen/arch/arm/vcpreg.c        | 23 +++++++++++++++
 xen/include/asm-arm/p2m.h    | 16 +++++++++++
 5 files changed, 134 insertions(+), 2 deletions(-)

Comments

Stefano Stabellini Nov. 5, 2018, 9:10 p.m. | #1
On Mon, 8 Oct 2018, Julien Grall wrote:
> Set/Way operations are used to perform maintenance on a given cache.
> At the moment, Set/Way operations are not trapped and therefore a guest
> OS will directly act on the local cache. However, a vCPU may migrate to
> another pCPU in the middle of the processor. This will result to have
> cache with stall data (Set/Way are not propagated) potentially causing
> crash. This may be the cause of heisenbug noticed in Osstest [1].
> 
> Furthermore, Set/Way operations are not available on system cache. This
> means that OS, such as Linux 32-bit, relying on those operations to
> fully clean the cache before disabling MMU may break because data may
> sits in system caches and not in RAM.
> 
> For more details about Set/Way, see the talk "The Art of Virtualizing
> Cache Maintenance" given at Xen Summit 2018 [2].
> 
> In the context of Xen, we need to trap Set/Way operations and emulate
> them. From the Arm Arm (B1.14.4 in DDI 046C.c), Set/Way operations are
> difficult to virtualized. So we can assume that a guest OS using them will
> suffer the consequence (i.e slowness) until developer removes all the usage
> of Set/Way.
> 
> As the software is not allowed to infer the Set/Way to Physical Address
> mapping, Xen will need to go through the guest P2M and clean &
> invalidate all the entries mapped.
> 
> Because Set/Way happen in batch (a loop on all Set/Way of a cache), Xen
> would need to go through the P2M for every instructions. This is quite
> expensive and would severely impact the guest OS. The implementation is
> re-using the KVM policy to limit the number of flush:
>     - If we trap a Set/Way operations, we enable VM trapping (i.e
                   ^ remove 'a'

>       HVC_EL2.TVM) to detect cache being turned on/off, and do a full
>     clean.

"do a full clean" straight away, right? May I suggest a rewording of
this item:

- as soon as we trap a set/way operation, we enable VM trapping (i.e.
  HVC_EL2.TVM, it ll allow us to detect cache being turned on/off),
  then we do a full clean


>     - We clean the caches when turning on and off

"We clean the caches when the guest turns caches on or off"


>     - Once the caches are enabled, we stop trapping VM instructions
> 
> [1] https://lists.xenproject.org/archives/html/xen-devel/2017-09/msg03191.html
> [2] https://fr.slideshare.net/xen_com_mgr/virtualizing-cache
> 
> Signed-off-by: Julien Grall <julien.grall@arm.com>
> ---
>  xen/arch/arm/arm64/vsysreg.c | 27 +++++++++++++++++-
>  xen/arch/arm/p2m.c           | 68 ++++++++++++++++++++++++++++++++++++++++++++
>  xen/arch/arm/traps.c         |  2 +-
>  xen/arch/arm/vcpreg.c        | 23 +++++++++++++++
>  xen/include/asm-arm/p2m.h    | 16 +++++++++++
>  5 files changed, 134 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/arch/arm/arm64/vsysreg.c b/xen/arch/arm/arm64/vsysreg.c
> index 1517879697..43c6c3e30d 100644
> --- a/xen/arch/arm/arm64/vsysreg.c
> +++ b/xen/arch/arm/arm64/vsysreg.c
> @@ -40,7 +40,20 @@ static bool vreg_emulate_##reg(struct cpu_user_regs *regs,          \
>  }
>  
>  /* Defining helpers for emulating sysreg registers. */
> -TVM_REG(SCTLR_EL1)
> +static bool vreg_emulate_SCTLR_EL1(struct cpu_user_regs *regs, uint64_t *r,
> +                                   bool read)
> +{
> +    struct vcpu *v = current;
> +    bool cache_enabled = vcpu_has_cache_enabled(v);
> +
> +    GUEST_BUG_ON(read);
> +    WRITE_SYSREG(*r, SCTLR_EL1);
> +
> +    p2m_toggle_cache(v, cache_enabled);
> +
> +    return true;
> +}
> +
>  TVM_REG(TTBR0_EL1)
>  TVM_REG(TTBR1_EL1)
>  TVM_REG(TCR_EL1)
> @@ -84,6 +97,18 @@ void do_sysreg(struct cpu_user_regs *regs,
>          break;
>  
>      /*
> +     * HCR_EL2.TSW
> +     *
> +     * ARMv8 (DDI 0487B.b): Table D1-42
> +     */
> +    case HSR_SYSREG_DCISW:
> +    case HSR_SYSREG_DCCSW:
> +    case HSR_SYSREG_DCCISW:
> +        if ( hsr.sysreg.read )

Shouldn't it be !hsr.sysreg.read ?


> +            p2m_set_way_flush(current);
> +        break;
> +
> +    /*
>       * HCR_EL2.TVM
>       *
>       * ARMv8 (DDI 0487B.b): Table D1-37
> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
> index df6b48d73b..a3d4c563b1 100644
> --- a/xen/arch/arm/p2m.c
> +++ b/xen/arch/arm/p2m.c
> @@ -1564,6 +1564,74 @@ int p2m_cache_flush_range(struct domain *d, gfn_t start, gfn_t end)
>      return 0;
>  }
>  
> +static void p2m_flush_vm(struct vcpu *v)
> +{
> +    struct p2m_domain *p2m = p2m_get_hostp2m(v->domain);
> +
> +    /* XXX: Handle preemption */

Yes, good to have this reminder. Maybe add "we'd want to break the
operation down when it takes too long".


> +    p2m_cache_flush_range(v->domain, p2m->lowest_mapped_gfn,
> +                          p2m->max_mapped_gfn);
> +}
> +
> +/*
> + * See note at ARMv7 ARM B1.14.4 (DDI 0406C.c) (TL;DR: S/W ops are not
> + * easily virtualized).
> + *
> + * Main problems:
> + *  - S/W ops are local to a CPU (not broadcast)
> + *  - We have line migration behind our back (speculation)
> + *  - System caches don't support S/W at all (damn!)
> + *
> + * In the face of the above, the best we can do is to try and convert
> + * S/W ops to VA ops. Because the guest is not allowed to infer the S/W
> + * to PA mapping, it can only use S/W to nuke the whole cache, which is
> + * rather a good thing for us.
> + *
> + * Also, it is only used when turning caches on/off ("The expected
> + * usage of the cache maintenance instructions that operate by set/way
> + * is associated with the powerdown and powerup of caches, if this is
> + * required by the implementation.").
> + *
> + * We use the following policy:
> + *  - If we trap a S/W operation, we enabled VM trapping to detect
> + *  caches being turned on/off, and do a full clean.
> + *
> + *  - We flush the caches on both caches being turned on and off.
> + *
> + *  - Once the caches are enabled, we stop trapping VM ops.
> + */
> +void p2m_set_way_flush(struct vcpu *v)
> +{
> +    /* This function can only work with the current vCPU. */
> +    ASSERT(v == current);

NIT: if it can only operate on current, it makes sense to remove the
struct vcpu* parameter


> +    if ( !(v->arch.hcr_el2 & HCR_TVM) )
> +    {
> +        p2m_flush_vm(v);
> +        vcpu_hcr_set_flags(v, HCR_TVM);
> +    }
> +}
> +
> +void p2m_toggle_cache(struct vcpu *v, bool was_enabled)
> +{
> +    bool now_enabled = vcpu_has_cache_enabled(v);
> +
> +    /* This function can only work with the current vCPU. */
> +    ASSERT(v == current);

NIT: same about struct vcpu* as parameter when only current can be used


> +    /*
> +     * If switching the MMU+caches on, need to invalidate the caches.
> +     * If switching it off, need to clean the caches.
> +     * Clean + invalidate does the trick always.
> +     */
> +    if ( was_enabled != now_enabled )
> +        p2m_flush_vm(v);
> +
> +    /* Caches are now on, stop trapping VM ops (until a S/W op) */
> +    if ( now_enabled )
> +        vcpu_hcr_clear_flags(v, HCR_TVM);
> +}
> +
>  mfn_t gfn_to_mfn(struct domain *d, gfn_t gfn)
>  {
>      return p2m_lookup(d, gfn, NULL);
> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
> index 169b57cb6b..cdc10eee5a 100644
> --- a/xen/arch/arm/traps.c
> +++ b/xen/arch/arm/traps.c
> @@ -98,7 +98,7 @@ register_t get_default_hcr_flags(void)
>  {
>      return  (HCR_PTW|HCR_BSU_INNER|HCR_AMO|HCR_IMO|HCR_FMO|HCR_VM|
>               (vwfi != NATIVE ? (HCR_TWI|HCR_TWE) : 0) |
> -             HCR_TSC|HCR_TAC|HCR_SWIO|HCR_TIDCP|HCR_FB);
> +             HCR_TSC|HCR_TAC|HCR_SWIO|HCR_TIDCP|HCR_FB|HCR_TSW);
>  }
>  
>  static enum {
> diff --git a/xen/arch/arm/vcpreg.c b/xen/arch/arm/vcpreg.c
> index 49529b97cd..dc46d9d0d7 100644
> --- a/xen/arch/arm/vcpreg.c
> +++ b/xen/arch/arm/vcpreg.c
> @@ -45,9 +45,14 @@
>  #define TVM_REG(sz, func, reg...)                                           \
>  static bool func(struct cpu_user_regs *regs, uint##sz##_t *r, bool read)    \
>  {                                                                           \
> +    struct vcpu *v = current;                                               \
> +    bool cache_enabled = vcpu_has_cache_enabled(v);                         \
> +                                                                            \
>      GUEST_BUG_ON(read);                                                     \
>      WRITE_SYSREG##sz(*r, reg);                                              \
>                                                                              \
> +    p2m_toggle_cache(v, cache_enabled);                                     \

This will affect all the registers trapped with TVM. Shouldn't we only
call p2m_toggle_cache when relevant? i.e. when changing SCTLR?
I think it would be better to only modify the SCTLR emulation handler.


>      return true;                                                            \
>  }
>  
> @@ -65,6 +70,8 @@ static bool func(struct cpu_user_regs *regs, uint##sz##_t *r, bool read)    \
>  static bool vreg_emulate_##xreg(struct cpu_user_regs *regs, uint32_t *r,    \
>                                  bool read, bool hi)                         \
>  {                                                                           \
> +    struct vcpu *v = current;                                               \
> +    bool cache_enabled = vcpu_has_cache_enabled(v);                         \
>      register_t reg = READ_SYSREG(xreg);                                     \
>                                                                              \
>      GUEST_BUG_ON(read);                                                     \
> @@ -80,6 +87,8 @@ static bool vreg_emulate_##xreg(struct cpu_user_regs *regs, uint32_t *r,    \
>      }                                                                       \
>      WRITE_SYSREG(reg, xreg);                                                \
>                                                                              \
> +    p2m_toggle_cache(v, cache_enabled);                                     \
> +                                                                            \
>      return true;                                                            \
>  }                                                                           \
>                                                                              \
> @@ -98,6 +107,7 @@ static bool vreg_emulate_##hireg(struct cpu_user_regs *regs, uint32_t *r,   \
>  
>  /* Defining helpers for emulating co-processor registers. */
>  TVM_REG32(SCTLR, SCTLR_EL1)
> +

Spurious change. Should be in a previous patch?

>  /*
>   * AArch32 provides two way to access TTBR* depending on the access
>   * size, whilst AArch64 provides one way.
> @@ -180,6 +190,19 @@ void do_cp15_32(struct cpu_user_regs *regs, const union hsr hsr)
>          break;
>  
>      /*
> +     * HCR_EL2.TSW
> +     *
> +     * ARMv7 (DDI 0406C.b): B1.14.6
> +     * ARMv8 (DDI 0487B.b): Table D1-42
> +     */
> +    case HSR_CPREG32(DCISW):
> +    case HSR_CPREG32(DCCSW):
> +    case HSR_CPREG32(DCCISW):
> +        if ( !cp32.read )
> +            p2m_set_way_flush(current);
> +        break;
> +
> +    /*
>       * HCR_EL2.TVM
>       *
>       * ARMv8 (DDI 0487B.b): Table D1-37
> diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
> index 92213dd1ab..c470f062db 100644
> --- a/xen/include/asm-arm/p2m.h
> +++ b/xen/include/asm-arm/p2m.h
> @@ -231,6 +231,10 @@ int p2m_set_entry(struct p2m_domain *p2m,
>   */
>  int p2m_cache_flush_range(struct domain *d, gfn_t start, gfn_t end);
>  
> +void p2m_set_way_flush(struct vcpu *v);
> +
> +void p2m_toggle_cache(struct vcpu *v, bool was_enabled);
> +
>  /*
>   * Map a region in the guest p2m with a specific p2m type.
>   * The memory attributes will be derived from the p2m type.
> @@ -358,6 +362,18 @@ static inline int set_foreign_p2m_entry(struct domain *d, unsigned long gfn,
>      return -EOPNOTSUPP;
>  }
>  
> +/*
> + * A vCPU has cache enabled only when the MMU is enabled and data cache
> + * is enabled.
> + */
> +static inline bool vcpu_has_cache_enabled(struct vcpu *v)
> +{
> +    /* Only works with the current vCPU */
> +    ASSERT(current == v);

NIT: same about struct vcpu *v as parameter when only current makes
sense


> +    return (READ_SYSREG32(SCTLR_EL1) & (SCTLR_C|SCTLR_M)) == (SCTLR_C|SCTLR_M);
> +}
> +
>  #endif /* _XEN_P2M_H */
>  
>  /*
Julien Grall Nov. 5, 2018, 11:38 p.m. | #2
Hi Stefano,

On 11/5/18 9:10 PM, Stefano Stabellini wrote:
> On Mon, 8 Oct 2018, Julien Grall wrote:
>> Set/Way operations are used to perform maintenance on a given cache.
>> At the moment, Set/Way operations are not trapped and therefore a guest
>> OS will directly act on the local cache. However, a vCPU may migrate to
>> another pCPU in the middle of the processor. This will result to have
>> cache with stall data (Set/Way are not propagated) potentially causing
>> crash. This may be the cause of heisenbug noticed in Osstest [1].
>>
>> Furthermore, Set/Way operations are not available on system cache. This
>> means that OS, such as Linux 32-bit, relying on those operations to
>> fully clean the cache before disabling MMU may break because data may
>> sits in system caches and not in RAM.
>>
>> For more details about Set/Way, see the talk "The Art of Virtualizing
>> Cache Maintenance" given at Xen Summit 2018 [2].
>>
>> In the context of Xen, we need to trap Set/Way operations and emulate
>> them. From the Arm Arm (B1.14.4 in DDI 046C.c), Set/Way operations are
>> difficult to virtualized. So we can assume that a guest OS using them will
>> suffer the consequence (i.e slowness) until developer removes all the usage
>> of Set/Way.
>>
>> As the software is not allowed to infer the Set/Way to Physical Address
>> mapping, Xen will need to go through the guest P2M and clean &
>> invalidate all the entries mapped.
>>
>> Because Set/Way happen in batch (a loop on all Set/Way of a cache), Xen
>> would need to go through the P2M for every instructions. This is quite
>> expensive and would severely impact the guest OS. The implementation is
>> re-using the KVM policy to limit the number of flush:
>>      - If we trap a Set/Way operations, we enable VM trapping (i.e
>                     ^ remove 'a'
> 
>>        HVC_EL2.TVM) to detect cache being turned on/off, and do a full
>>      clean.
> 
> "do a full clean" straight away, right? May I suggest a rewording of
> this item:
> 
> - as soon as we trap a set/way operation, we enable VM trapping (i.e.
>    HVC_EL2.TVM, it ll allow us to detect cache being turned on/off),
>    then we do a full clean

Sure.

> 
> 
>>      - We clean the caches when turning on and off
> 
> "We clean the caches when the guest turns caches on or off"

Sure.

> 
> 
>>      - Once the caches are enabled, we stop trapping VM instructions
>>
>> [1] https://lists.xenproject.org/archives/html/xen-devel/2017-09/msg03191.html
>> [2] https://fr.slideshare.net/xen_com_mgr/virtualizing-cache
>>
>> Signed-off-by: Julien Grall <julien.grall@arm.com>
>> ---
>>   xen/arch/arm/arm64/vsysreg.c | 27 +++++++++++++++++-
>>   xen/arch/arm/p2m.c           | 68 ++++++++++++++++++++++++++++++++++++++++++++
>>   xen/arch/arm/traps.c         |  2 +-
>>   xen/arch/arm/vcpreg.c        | 23 +++++++++++++++
>>   xen/include/asm-arm/p2m.h    | 16 +++++++++++
>>   5 files changed, 134 insertions(+), 2 deletions(-)
>>
>> diff --git a/xen/arch/arm/arm64/vsysreg.c b/xen/arch/arm/arm64/vsysreg.c
>> index 1517879697..43c6c3e30d 100644
>> --- a/xen/arch/arm/arm64/vsysreg.c
>> +++ b/xen/arch/arm/arm64/vsysreg.c
>> @@ -40,7 +40,20 @@ static bool vreg_emulate_##reg(struct cpu_user_regs *regs,          \
>>   }
>>   
>>   /* Defining helpers for emulating sysreg registers. */
>> -TVM_REG(SCTLR_EL1)
>> +static bool vreg_emulate_SCTLR_EL1(struct cpu_user_regs *regs, uint64_t *r,
>> +                                   bool read)
>> +{
>> +    struct vcpu *v = current;
>> +    bool cache_enabled = vcpu_has_cache_enabled(v);
>> +
>> +    GUEST_BUG_ON(read);
>> +    WRITE_SYSREG(*r, SCTLR_EL1);
>> +
>> +    p2m_toggle_cache(v, cache_enabled);
>> +
>> +    return true;
>> +}
>> +
>>   TVM_REG(TTBR0_EL1)
>>   TVM_REG(TTBR1_EL1)
>>   TVM_REG(TCR_EL1)
>> @@ -84,6 +97,18 @@ void do_sysreg(struct cpu_user_regs *regs,
>>           break;
>>   
>>       /*
>> +     * HCR_EL2.TSW
>> +     *
>> +     * ARMv8 (DDI 0487B.b): Table D1-42
>> +     */
>> +    case HSR_SYSREG_DCISW:
>> +    case HSR_SYSREG_DCCSW:
>> +    case HSR_SYSREG_DCCISW:
>> +        if ( hsr.sysreg.read )
> 
> Shouldn't it be !hsr.sysreg.read ?

Hmmm yes.

> 
> 
>> +            p2m_set_way_flush(current);
>> +        break;
>> +
>> +    /*
>>        * HCR_EL2.TVM
>>        *
>>        * ARMv8 (DDI 0487B.b): Table D1-37
>> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
>> index df6b48d73b..a3d4c563b1 100644
>> --- a/xen/arch/arm/p2m.c
>> +++ b/xen/arch/arm/p2m.c
>> @@ -1564,6 +1564,74 @@ int p2m_cache_flush_range(struct domain *d, gfn_t start, gfn_t end)
>>       return 0;
>>   }
>>   
>> +static void p2m_flush_vm(struct vcpu *v)
>> +{
>> +    struct p2m_domain *p2m = p2m_get_hostp2m(v->domain);
>> +
>> +    /* XXX: Handle preemption */
> 
> Yes, good to have this reminder. Maybe add "we'd want to break the
> operation down when it takes too long".

I am planning to handle this before the series is actually merged. This 
is a massive security hole and easy to exploit.

> 
> 
>> +    p2m_cache_flush_range(v->domain, p2m->lowest_mapped_gfn,
>> +                          p2m->max_mapped_gfn);
>> +}
>> +
>> +/*
>> + * See note at ARMv7 ARM B1.14.4 (DDI 0406C.c) (TL;DR: S/W ops are not
>> + * easily virtualized).
>> + *
>> + * Main problems:
>> + *  - S/W ops are local to a CPU (not broadcast)
>> + *  - We have line migration behind our back (speculation)
>> + *  - System caches don't support S/W at all (damn!)
>> + *
>> + * In the face of the above, the best we can do is to try and convert
>> + * S/W ops to VA ops. Because the guest is not allowed to infer the S/W
>> + * to PA mapping, it can only use S/W to nuke the whole cache, which is
>> + * rather a good thing for us.
>> + *
>> + * Also, it is only used when turning caches on/off ("The expected
>> + * usage of the cache maintenance instructions that operate by set/way
>> + * is associated with the powerdown and powerup of caches, if this is
>> + * required by the implementation.").
>> + *
>> + * We use the following policy:
>> + *  - If we trap a S/W operation, we enabled VM trapping to detect
>> + *  caches being turned on/off, and do a full clean.
>> + *
>> + *  - We flush the caches on both caches being turned on and off.
>> + *
>> + *  - Once the caches are enabled, we stop trapping VM ops.
>> + */
>> +void p2m_set_way_flush(struct vcpu *v)
>> +{
>> +    /* This function can only work with the current vCPU. */
>> +    ASSERT(v == current);
> 
> NIT: if it can only operate on current, it makes sense to remove the
> struct vcpu* parameter

I prefer to keep struct vcpu *v here. This makes more straightforward to 
know on what the function is working on.

Furthermore, current is actually quite expensive to use in some 
circumstance.

For instance, in nested case, TPIDR_EL2 will get trapped to the host 
hypervisor.

So it would be best if we start avoid current whenever it is possible.


> 
> 
>> +    if ( !(v->arch.hcr_el2 & HCR_TVM) )
>> +    {
>> +        p2m_flush_vm(v);
>> +        vcpu_hcr_set_flags(v, HCR_TVM);
>> +    }
>> +}
>> +
>> +void p2m_toggle_cache(struct vcpu *v, bool was_enabled)
>> +{
>> +    bool now_enabled = vcpu_has_cache_enabled(v);
>> +
>> +    /* This function can only work with the current vCPU. */
>> +    ASSERT(v == current);
> 
> NIT: same about struct vcpu* as parameter when only current can be used
> 
> 
>> +    /*
>> +     * If switching the MMU+caches on, need to invalidate the caches.
>> +     * If switching it off, need to clean the caches.
>> +     * Clean + invalidate does the trick always.
>> +     */
>> +    if ( was_enabled != now_enabled )
>> +        p2m_flush_vm(v);
>> +
>> +    /* Caches are now on, stop trapping VM ops (until a S/W op) */
>> +    if ( now_enabled )
>> +        vcpu_hcr_clear_flags(v, HCR_TVM);
>> +}
>> +
>>   mfn_t gfn_to_mfn(struct domain *d, gfn_t gfn)
>>   {
>>       return p2m_lookup(d, gfn, NULL);
>> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
>> index 169b57cb6b..cdc10eee5a 100644
>> --- a/xen/arch/arm/traps.c
>> +++ b/xen/arch/arm/traps.c
>> @@ -98,7 +98,7 @@ register_t get_default_hcr_flags(void)
>>   {
>>       return  (HCR_PTW|HCR_BSU_INNER|HCR_AMO|HCR_IMO|HCR_FMO|HCR_VM|
>>                (vwfi != NATIVE ? (HCR_TWI|HCR_TWE) : 0) |
>> -             HCR_TSC|HCR_TAC|HCR_SWIO|HCR_TIDCP|HCR_FB);
>> +             HCR_TSC|HCR_TAC|HCR_SWIO|HCR_TIDCP|HCR_FB|HCR_TSW);
>>   }
>>   
>>   static enum {
>> diff --git a/xen/arch/arm/vcpreg.c b/xen/arch/arm/vcpreg.c
>> index 49529b97cd..dc46d9d0d7 100644
>> --- a/xen/arch/arm/vcpreg.c
>> +++ b/xen/arch/arm/vcpreg.c
>> @@ -45,9 +45,14 @@
>>   #define TVM_REG(sz, func, reg...)                                           \
>>   static bool func(struct cpu_user_regs *regs, uint##sz##_t *r, bool read)    \
>>   {                                                                           \
>> +    struct vcpu *v = current;                                               \
>> +    bool cache_enabled = vcpu_has_cache_enabled(v);                         \
>> +                                                                            \
>>       GUEST_BUG_ON(read);                                                     \
>>       WRITE_SYSREG##sz(*r, reg);                                              \
>>                                                                               \
>> +    p2m_toggle_cache(v, cache_enabled);                                     \
> 
> This will affect all the registers trapped with TVM. Shouldn't we only
> call p2m_toggle_cache when relevant? i.e. when changing SCTLR?
> I think it would be better to only modify the SCTLR emulation handler.

This is made on purpose, you increase the chance to disable TVM as soon 
as possible. If you only rely on SCTLR, you may end up to trap a lot of 
registers for a long time.

FWIW, as I already wrote in the commit message, this is based on what 
KVM does.

> 
> 
>>       return true;                                                            \
>>   }
>>   
>> @@ -65,6 +70,8 @@ static bool func(struct cpu_user_regs *regs, uint##sz##_t *r, bool read)    \
>>   static bool vreg_emulate_##xreg(struct cpu_user_regs *regs, uint32_t *r,    \
>>                                   bool read, bool hi)                         \
>>   {                                                                           \
>> +    struct vcpu *v = current;                                               \
>> +    bool cache_enabled = vcpu_has_cache_enabled(v);                         \
>>       register_t reg = READ_SYSREG(xreg);                                     \
>>                                                                               \
>>       GUEST_BUG_ON(read);                                                     \
>> @@ -80,6 +87,8 @@ static bool vreg_emulate_##xreg(struct cpu_user_regs *regs, uint32_t *r,    \
>>       }                                                                       \
>>       WRITE_SYSREG(reg, xreg);                                                \
>>                                                                               \
>> +    p2m_toggle_cache(v, cache_enabled);                                     \
>> +                                                                            \
>>       return true;                                                            \
>>   }                                                                           \
>>                                                                               \
>> @@ -98,6 +107,7 @@ static bool vreg_emulate_##hireg(struct cpu_user_regs *regs, uint32_t *r,   \
>>   
>>   /* Defining helpers for emulating co-processor registers. */
>>   TVM_REG32(SCTLR, SCTLR_EL1)
>> +
> 
> Spurious change. Should be in a previous patch?

Likely.

> 
>>   /*
>>    * AArch32 provides two way to access TTBR* depending on the access
>>    * size, whilst AArch64 provides one way.
>> @@ -180,6 +190,19 @@ void do_cp15_32(struct cpu_user_regs *regs, const union hsr hsr)
>>           break;
>>   
>>       /*
>> +     * HCR_EL2.TSW
>> +     *
>> +     * ARMv7 (DDI 0406C.b): B1.14.6
>> +     * ARMv8 (DDI 0487B.b): Table D1-42
>> +     */
>> +    case HSR_CPREG32(DCISW):
>> +    case HSR_CPREG32(DCCSW):
>> +    case HSR_CPREG32(DCCISW):
>> +        if ( !cp32.read )
>> +            p2m_set_way_flush(current);
>> +        break;
>> +
>> +    /*
>>        * HCR_EL2.TVM
>>        *
>>        * ARMv8 (DDI 0487B.b): Table D1-37
>> diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
>> index 92213dd1ab..c470f062db 100644
>> --- a/xen/include/asm-arm/p2m.h
>> +++ b/xen/include/asm-arm/p2m.h
>> @@ -231,6 +231,10 @@ int p2m_set_entry(struct p2m_domain *p2m,
>>    */
>>   int p2m_cache_flush_range(struct domain *d, gfn_t start, gfn_t end);
>>   
>> +void p2m_set_way_flush(struct vcpu *v);
>> +
>> +void p2m_toggle_cache(struct vcpu *v, bool was_enabled);
>> +
>>   /*
>>    * Map a region in the guest p2m with a specific p2m type.
>>    * The memory attributes will be derived from the p2m type.
>> @@ -358,6 +362,18 @@ static inline int set_foreign_p2m_entry(struct domain *d, unsigned long gfn,
>>       return -EOPNOTSUPP;
>>   }
>>   
>> +/*
>> + * A vCPU has cache enabled only when the MMU is enabled and data cache
>> + * is enabled.
>> + */
>> +static inline bool vcpu_has_cache_enabled(struct vcpu *v)
>> +{
>> +    /* Only works with the current vCPU */
>> +    ASSERT(current == v);
> 
> NIT: same about struct vcpu *v as parameter when only current makes
> sense
> 
> 
>> +    return (READ_SYSREG32(SCTLR_EL1) & (SCTLR_C|SCTLR_M)) == (SCTLR_C|SCTLR_M);
>> +}
>> +
>>   #endif /* _XEN_P2M_H */
>>   
>>   /*

Cheers,
Stefano Stabellini Nov. 6, 2018, 5:31 p.m. | #3
On Mon, 5 Nov 2018, Julien Grall wrote:
> > > +void p2m_set_way_flush(struct vcpu *v)
> > > +{
> > > +    /* This function can only work with the current vCPU. */
> > > +    ASSERT(v == current);
> > 
> > NIT: if it can only operate on current, it makes sense to remove the
> > struct vcpu* parameter
> 
> I prefer to keep struct vcpu *v here. This makes more straightforward to know
> on what the function is working on.
> 
> Furthermore, current is actually quite expensive to use in some circumstance.
> 
> For instance, in nested case, TPIDR_EL2 will get trapped to the host
> hypervisor.
> 
> So it would be best if we start avoid current whenever it is possible.

That's OK


> > 
> > 
> > > +    if ( !(v->arch.hcr_el2 & HCR_TVM) )
> > > +    {
> > > +        p2m_flush_vm(v);
> > > +        vcpu_hcr_set_flags(v, HCR_TVM);
> > > +    }
> > > +}
> > > +
> > > +void p2m_toggle_cache(struct vcpu *v, bool was_enabled)
> > > +{
> > > +    bool now_enabled = vcpu_has_cache_enabled(v);
> > > +
> > > +    /* This function can only work with the current vCPU. */
> > > +    ASSERT(v == current);
> > 
> > NIT: same about struct vcpu* as parameter when only current can be used
> > 
> > 
> > > +    /*
> > > +     * If switching the MMU+caches on, need to invalidate the caches.
> > > +     * If switching it off, need to clean the caches.
> > > +     * Clean + invalidate does the trick always.
> > > +     */
> > > +    if ( was_enabled != now_enabled )
> > > +        p2m_flush_vm(v);
> > > +
> > > +    /* Caches are now on, stop trapping VM ops (until a S/W op) */
> > > +    if ( now_enabled )
> > > +        vcpu_hcr_clear_flags(v, HCR_TVM);
> > > +}
> > > +
> > >   mfn_t gfn_to_mfn(struct domain *d, gfn_t gfn)
> > >   {
> > >       return p2m_lookup(d, gfn, NULL);
> > > diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
> > > index 169b57cb6b..cdc10eee5a 100644
> > > --- a/xen/arch/arm/traps.c
> > > +++ b/xen/arch/arm/traps.c
> > > @@ -98,7 +98,7 @@ register_t get_default_hcr_flags(void)
> > >   {
> > >       return  (HCR_PTW|HCR_BSU_INNER|HCR_AMO|HCR_IMO|HCR_FMO|HCR_VM|
> > >                (vwfi != NATIVE ? (HCR_TWI|HCR_TWE) : 0) |
> > > -             HCR_TSC|HCR_TAC|HCR_SWIO|HCR_TIDCP|HCR_FB);
> > > +             HCR_TSC|HCR_TAC|HCR_SWIO|HCR_TIDCP|HCR_FB|HCR_TSW);
> > >   }
> > >     static enum {
> > > diff --git a/xen/arch/arm/vcpreg.c b/xen/arch/arm/vcpreg.c
> > > index 49529b97cd..dc46d9d0d7 100644
> > > --- a/xen/arch/arm/vcpreg.c
> > > +++ b/xen/arch/arm/vcpreg.c
> > > @@ -45,9 +45,14 @@
> > >   #define TVM_REG(sz, func, reg...)
> > > \
> > >   static bool func(struct cpu_user_regs *regs, uint##sz##_t *r, bool read)
> > > \
> > >   {
> > > \
> > > +    struct vcpu *v = current;
> > > \
> > > +    bool cache_enabled = vcpu_has_cache_enabled(v);
> > > \
> > > +
> > > \
> > >       GUEST_BUG_ON(read);
> > > \
> > >       WRITE_SYSREG##sz(*r, reg);
> > > \
> > >                                                                               \
> > > +    p2m_toggle_cache(v, cache_enabled);
> > > \
> > 
> > This will affect all the registers trapped with TVM. Shouldn't we only
> > call p2m_toggle_cache when relevant? i.e. when changing SCTLR?
> > I think it would be better to only modify the SCTLR emulation handler.
> 
> This is made on purpose, you increase the chance to disable TVM as soon as
> possible. If you only rely on SCTLR, you may end up to trap a lot of registers
> for a long time.
> 
> FWIW, as I already wrote in the commit message, this is based on what KVM
> does.

I missed that. As you explain it, it makes sense. Maybe worth adding an
explicit statement about it: "On ARM64, we call p2m_toggle_cache from on
the TVM-trapped register handlers to increase the chances of disabling
TVM as soon as possible."
Julien Grall Nov. 6, 2018, 5:34 p.m. | #4
Hi Stefano,

On 06/11/2018 17:31, Stefano Stabellini wrote:
> On Mon, 5 Nov 2018, Julien Grall wrote:
>>> This will affect all the registers trapped with TVM. Shouldn't we only
>>> call p2m_toggle_cache when relevant? i.e. when changing SCTLR?
>>> I think it would be better to only modify the SCTLR emulation handler.
>>
>> This is made on purpose, you increase the chance to disable TVM as soon as
>> possible. If you only rely on SCTLR, you may end up to trap a lot of registers
>> for a long time.
>>
>> FWIW, as I already wrote in the commit message, this is based on what KVM
>> does.
> 
> I missed that. As you explain it, it makes sense. Maybe worth adding an
> explicit statement about it: "On ARM64, we call p2m_toggle_cache from on
> the TVM-trapped register handlers to increase the chances of disabling
> TVM as soon as possible."

I am assuming you meant arm32 here? Looking at the code, it looks like I 
implemented the arm64 differently. But we probably want apply to call 
p2m_toggle_cache in all TVM trapped registers.

This would keep the logic everywhere the same.

Cheers,
Stefano Stabellini Nov. 6, 2018, 5:38 p.m. | #5
On Tue, 6 Nov 2018, Julien Grall wrote:
> Hi Stefano,
> 
> On 06/11/2018 17:31, Stefano Stabellini wrote:
> > On Mon, 5 Nov 2018, Julien Grall wrote:
> > > > This will affect all the registers trapped with TVM. Shouldn't we only
> > > > call p2m_toggle_cache when relevant? i.e. when changing SCTLR?
> > > > I think it would be better to only modify the SCTLR emulation handler.
> > > 
> > > This is made on purpose, you increase the chance to disable TVM as soon as
> > > possible. If you only rely on SCTLR, you may end up to trap a lot of
> > > registers
> > > for a long time.
> > > 
> > > FWIW, as I already wrote in the commit message, this is based on what KVM
> > > does.
> > 
> > I missed that. As you explain it, it makes sense. Maybe worth adding an
> > explicit statement about it: "On ARM64, we call p2m_toggle_cache from on
> > the TVM-trapped register handlers to increase the chances of disabling
> > TVM as soon as possible."
> 
> I am assuming you meant arm32 here? 

Yes, I did


> Looking at the code, it looks like I
> implemented the arm64 differently. But we probably want apply to call
> p2m_toggle_cache in all TVM trapped registers.
> 
> This would keep the logic everywhere the same.

Right, good idea

Patch

diff --git a/xen/arch/arm/arm64/vsysreg.c b/xen/arch/arm/arm64/vsysreg.c
index 1517879697..43c6c3e30d 100644
--- a/xen/arch/arm/arm64/vsysreg.c
+++ b/xen/arch/arm/arm64/vsysreg.c
@@ -40,7 +40,20 @@  static bool vreg_emulate_##reg(struct cpu_user_regs *regs,          \
 }
 
 /* Defining helpers for emulating sysreg registers. */
-TVM_REG(SCTLR_EL1)
+static bool vreg_emulate_SCTLR_EL1(struct cpu_user_regs *regs, uint64_t *r,
+                                   bool read)
+{
+    struct vcpu *v = current;
+    bool cache_enabled = vcpu_has_cache_enabled(v);
+
+    GUEST_BUG_ON(read);
+    WRITE_SYSREG(*r, SCTLR_EL1);
+
+    p2m_toggle_cache(v, cache_enabled);
+
+    return true;
+}
+
 TVM_REG(TTBR0_EL1)
 TVM_REG(TTBR1_EL1)
 TVM_REG(TCR_EL1)
@@ -84,6 +97,18 @@  void do_sysreg(struct cpu_user_regs *regs,
         break;
 
     /*
+     * HCR_EL2.TSW
+     *
+     * ARMv8 (DDI 0487B.b): Table D1-42
+     */
+    case HSR_SYSREG_DCISW:
+    case HSR_SYSREG_DCCSW:
+    case HSR_SYSREG_DCCISW:
+        if ( hsr.sysreg.read )
+            p2m_set_way_flush(current);
+        break;
+
+    /*
      * HCR_EL2.TVM
      *
      * ARMv8 (DDI 0487B.b): Table D1-37
diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index df6b48d73b..a3d4c563b1 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -1564,6 +1564,74 @@  int p2m_cache_flush_range(struct domain *d, gfn_t start, gfn_t end)
     return 0;
 }
 
+static void p2m_flush_vm(struct vcpu *v)
+{
+    struct p2m_domain *p2m = p2m_get_hostp2m(v->domain);
+
+    /* XXX: Handle preemption */
+    p2m_cache_flush_range(v->domain, p2m->lowest_mapped_gfn,
+                          p2m->max_mapped_gfn);
+}
+
+/*
+ * See note at ARMv7 ARM B1.14.4 (DDI 0406C.c) (TL;DR: S/W ops are not
+ * easily virtualized).
+ *
+ * Main problems:
+ *  - S/W ops are local to a CPU (not broadcast)
+ *  - We have line migration behind our back (speculation)
+ *  - System caches don't support S/W at all (damn!)
+ *
+ * In the face of the above, the best we can do is to try and convert
+ * S/W ops to VA ops. Because the guest is not allowed to infer the S/W
+ * to PA mapping, it can only use S/W to nuke the whole cache, which is
+ * rather a good thing for us.
+ *
+ * Also, it is only used when turning caches on/off ("The expected
+ * usage of the cache maintenance instructions that operate by set/way
+ * is associated with the powerdown and powerup of caches, if this is
+ * required by the implementation.").
+ *
+ * We use the following policy:
+ *  - If we trap a S/W operation, we enabled VM trapping to detect
+ *  caches being turned on/off, and do a full clean.
+ *
+ *  - We flush the caches on both caches being turned on and off.
+ *
+ *  - Once the caches are enabled, we stop trapping VM ops.
+ */
+void p2m_set_way_flush(struct vcpu *v)
+{
+    /* This function can only work with the current vCPU. */
+    ASSERT(v == current);
+
+    if ( !(v->arch.hcr_el2 & HCR_TVM) )
+    {
+        p2m_flush_vm(v);
+        vcpu_hcr_set_flags(v, HCR_TVM);
+    }
+}
+
+void p2m_toggle_cache(struct vcpu *v, bool was_enabled)
+{
+    bool now_enabled = vcpu_has_cache_enabled(v);
+
+    /* This function can only work with the current vCPU. */
+    ASSERT(v == current);
+
+    /*
+     * If switching the MMU+caches on, need to invalidate the caches.
+     * If switching it off, need to clean the caches.
+     * Clean + invalidate does the trick always.
+     */
+    if ( was_enabled != now_enabled )
+        p2m_flush_vm(v);
+
+    /* Caches are now on, stop trapping VM ops (until a S/W op) */
+    if ( now_enabled )
+        vcpu_hcr_clear_flags(v, HCR_TVM);
+}
+
 mfn_t gfn_to_mfn(struct domain *d, gfn_t gfn)
 {
     return p2m_lookup(d, gfn, NULL);
diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index 169b57cb6b..cdc10eee5a 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -98,7 +98,7 @@  register_t get_default_hcr_flags(void)
 {
     return  (HCR_PTW|HCR_BSU_INNER|HCR_AMO|HCR_IMO|HCR_FMO|HCR_VM|
              (vwfi != NATIVE ? (HCR_TWI|HCR_TWE) : 0) |
-             HCR_TSC|HCR_TAC|HCR_SWIO|HCR_TIDCP|HCR_FB);
+             HCR_TSC|HCR_TAC|HCR_SWIO|HCR_TIDCP|HCR_FB|HCR_TSW);
 }
 
 static enum {
diff --git a/xen/arch/arm/vcpreg.c b/xen/arch/arm/vcpreg.c
index 49529b97cd..dc46d9d0d7 100644
--- a/xen/arch/arm/vcpreg.c
+++ b/xen/arch/arm/vcpreg.c
@@ -45,9 +45,14 @@ 
 #define TVM_REG(sz, func, reg...)                                           \
 static bool func(struct cpu_user_regs *regs, uint##sz##_t *r, bool read)    \
 {                                                                           \
+    struct vcpu *v = current;                                               \
+    bool cache_enabled = vcpu_has_cache_enabled(v);                         \
+                                                                            \
     GUEST_BUG_ON(read);                                                     \
     WRITE_SYSREG##sz(*r, reg);                                              \
                                                                             \
+    p2m_toggle_cache(v, cache_enabled);                                     \
+                                                                            \
     return true;                                                            \
 }
 
@@ -65,6 +70,8 @@  static bool func(struct cpu_user_regs *regs, uint##sz##_t *r, bool read)    \
 static bool vreg_emulate_##xreg(struct cpu_user_regs *regs, uint32_t *r,    \
                                 bool read, bool hi)                         \
 {                                                                           \
+    struct vcpu *v = current;                                               \
+    bool cache_enabled = vcpu_has_cache_enabled(v);                         \
     register_t reg = READ_SYSREG(xreg);                                     \
                                                                             \
     GUEST_BUG_ON(read);                                                     \
@@ -80,6 +87,8 @@  static bool vreg_emulate_##xreg(struct cpu_user_regs *regs, uint32_t *r,    \
     }                                                                       \
     WRITE_SYSREG(reg, xreg);                                                \
                                                                             \
+    p2m_toggle_cache(v, cache_enabled);                                     \
+                                                                            \
     return true;                                                            \
 }                                                                           \
                                                                             \
@@ -98,6 +107,7 @@  static bool vreg_emulate_##hireg(struct cpu_user_regs *regs, uint32_t *r,   \
 
 /* Defining helpers for emulating co-processor registers. */
 TVM_REG32(SCTLR, SCTLR_EL1)
+
 /*
  * AArch32 provides two way to access TTBR* depending on the access
  * size, whilst AArch64 provides one way.
@@ -180,6 +190,19 @@  void do_cp15_32(struct cpu_user_regs *regs, const union hsr hsr)
         break;
 
     /*
+     * HCR_EL2.TSW
+     *
+     * ARMv7 (DDI 0406C.b): B1.14.6
+     * ARMv8 (DDI 0487B.b): Table D1-42
+     */
+    case HSR_CPREG32(DCISW):
+    case HSR_CPREG32(DCCSW):
+    case HSR_CPREG32(DCCISW):
+        if ( !cp32.read )
+            p2m_set_way_flush(current);
+        break;
+
+    /*
      * HCR_EL2.TVM
      *
      * ARMv8 (DDI 0487B.b): Table D1-37
diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
index 92213dd1ab..c470f062db 100644
--- a/xen/include/asm-arm/p2m.h
+++ b/xen/include/asm-arm/p2m.h
@@ -231,6 +231,10 @@  int p2m_set_entry(struct p2m_domain *p2m,
  */
 int p2m_cache_flush_range(struct domain *d, gfn_t start, gfn_t end);
 
+void p2m_set_way_flush(struct vcpu *v);
+
+void p2m_toggle_cache(struct vcpu *v, bool was_enabled);
+
 /*
  * Map a region in the guest p2m with a specific p2m type.
  * The memory attributes will be derived from the p2m type.
@@ -358,6 +362,18 @@  static inline int set_foreign_p2m_entry(struct domain *d, unsigned long gfn,
     return -EOPNOTSUPP;
 }
 
+/*
+ * A vCPU has cache enabled only when the MMU is enabled and data cache
+ * is enabled.
+ */
+static inline bool vcpu_has_cache_enabled(struct vcpu *v)
+{
+    /* Only works with the current vCPU */
+    ASSERT(current == v);
+
+    return (READ_SYSREG32(SCTLR_EL1) & (SCTLR_C|SCTLR_M)) == (SCTLR_C|SCTLR_M);
+}
+
 #endif /* _XEN_P2M_H */
 
 /*