[Xen-devel,13/13] xen/arm: Avoid to use current everywhere in enter_hypervisor_head

Message ID 20180522174254.27551-14-julien.grall@arm.com
State Superseded
Headers show
Series
  • xen/arm: SSBD (aka Spectre-v4) mitigation (XSA-263)
Related show

Commit Message

Julien Grall May 22, 2018, 5:42 p.m.
Using current is fairly expensive, so save up into a variable.

Signed-off-by: Julien Grall <julien.grall@arm.com>
---
 xen/arch/arm/traps.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

Comments

Stefano Stabellini May 23, 2018, 11:47 p.m. | #1
On Tue, 22 May 2018, Julien Grall wrote:
> Using current is fairly expensive, so save up into a variable.
> 
> Signed-off-by: Julien Grall <julien.grall@arm.com>

Good idea. I am curious to know actually how much this patch would save
but I am not going to ask you run the tests.

Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

> ---
>  xen/arch/arm/traps.c | 14 ++++++++------
>  1 file changed, 8 insertions(+), 6 deletions(-)
> 
> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
> index 020b0b8eef..b1546f6907 100644
> --- a/xen/arch/arm/traps.c
> +++ b/xen/arch/arm/traps.c
> @@ -2024,8 +2024,10 @@ static void enter_hypervisor_head(struct cpu_user_regs *regs)
>  {
>      if ( guest_mode(regs) )
>      {
> +        struct vcpu *v = current;
> +
>          /* If the guest has disabled the workaround, bring it back on. */
> -        if ( needs_ssbd_flip(current) )
> +        if ( needs_ssbd_flip(v) )
>              arm_smccc_1_1_smc(ARM_SMCCC_ARCH_WORKAROUND_2_FID, 1, NULL);
>  
>          /*
> @@ -2034,8 +2036,8 @@ static void enter_hypervisor_head(struct cpu_user_regs *regs)
>           * but the crucial bit is "On taking a vSError interrupt, HCR_EL2.VSE
>           * (alias of HCR.VA) is cleared to 0."
>           */
> -        if ( current->arch.hcr_el2 & HCR_VA )
> -            current->arch.hcr_el2 = READ_SYSREG(HCR_EL2);
> +        if ( v->arch.hcr_el2 & HCR_VA )
> +            v->arch.hcr_el2 = READ_SYSREG(HCR_EL2);
>  
>  #ifdef CONFIG_NEW_VGIC
>          /*
> @@ -2045,11 +2047,11 @@ static void enter_hypervisor_head(struct cpu_user_regs *regs)
>           * TODO: Investigate whether this is necessary to do on every
>           * trap and how it can be optimised.
>           */
> -        vtimer_update_irqs(current);
> -        vcpu_update_evtchn_irq(current);
> +        vtimer_update_irqs(v);
> +        vcpu_update_evtchn_irq(v);
>  #endif
>  
> -        vgic_sync_from_lrs(current);
> +        vgic_sync_from_lrs(v);
>      }
>  }
>  
> -- 
> 2.11.0
>
Julien Grall May 24, 2018, 10:29 a.m. | #2
Hi Stefano,

On 24/05/18 00:47, Stefano Stabellini wrote:
> On Tue, 22 May 2018, Julien Grall wrote:
>> Using current is fairly expensive, so save up into a variable.
>>
>> Signed-off-by: Julien Grall <julien.grall@arm.com>
> 
> Good idea. I am curious to know actually how much this patch would save
> but I am not going to ask you run the tests.

I haven't benchmark it but looked at the resulting assembly code. This 
reduces by about ~20% the number of instructions in the function.

AFAIU, this is because of the way per-cpu access have been implemented. 
The per-cpu offset is stored in a system register (TPIDR_EL2), all the 
read to it cannot be optimized (access using volatile).

So every direct use of "current" will require at least a system register 
access and then a load from memory.

Cheers,

> 
> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
> 
>> ---
>>   xen/arch/arm/traps.c | 14 ++++++++------
>>   1 file changed, 8 insertions(+), 6 deletions(-)
>>
>> diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
>> index 020b0b8eef..b1546f6907 100644
>> --- a/xen/arch/arm/traps.c
>> +++ b/xen/arch/arm/traps.c
>> @@ -2024,8 +2024,10 @@ static void enter_hypervisor_head(struct cpu_user_regs *regs)
>>   {
>>       if ( guest_mode(regs) )
>>       {
>> +        struct vcpu *v = current;
>> +
>>           /* If the guest has disabled the workaround, bring it back on. */
>> -        if ( needs_ssbd_flip(current) )
>> +        if ( needs_ssbd_flip(v) )
>>               arm_smccc_1_1_smc(ARM_SMCCC_ARCH_WORKAROUND_2_FID, 1, NULL);
>>   
>>           /*
>> @@ -2034,8 +2036,8 @@ static void enter_hypervisor_head(struct cpu_user_regs *regs)
>>            * but the crucial bit is "On taking a vSError interrupt, HCR_EL2.VSE
>>            * (alias of HCR.VA) is cleared to 0."
>>            */
>> -        if ( current->arch.hcr_el2 & HCR_VA )
>> -            current->arch.hcr_el2 = READ_SYSREG(HCR_EL2);
>> +        if ( v->arch.hcr_el2 & HCR_VA )
>> +            v->arch.hcr_el2 = READ_SYSREG(HCR_EL2);
>>   
>>   #ifdef CONFIG_NEW_VGIC
>>           /*
>> @@ -2045,11 +2047,11 @@ static void enter_hypervisor_head(struct cpu_user_regs *regs)
>>            * TODO: Investigate whether this is necessary to do on every
>>            * trap and how it can be optimised.
>>            */
>> -        vtimer_update_irqs(current);
>> -        vcpu_update_evtchn_irq(current);
>> +        vtimer_update_irqs(v);
>> +        vcpu_update_evtchn_irq(v);
>>   #endif
>>   
>> -        vgic_sync_from_lrs(current);
>> +        vgic_sync_from_lrs(v);
>>       }
>>   }
>>   
>> -- 
>> 2.11.0
>>
Stefano Stabellini May 24, 2018, 6:46 p.m. | #3
On Thu, 24 May 2018, Julien Grall wrote:
> Hi Stefano,
> 
> On 24/05/18 00:47, Stefano Stabellini wrote:
> > On Tue, 22 May 2018, Julien Grall wrote:
> > > Using current is fairly expensive, so save up into a variable.
> > > 
> > > Signed-off-by: Julien Grall <julien.grall@arm.com>
> > 
> > Good idea. I am curious to know actually how much this patch would save
> > but I am not going to ask you run the tests.
> 
> I haven't benchmark it but looked at the resulting assembly code. This reduces
> by about ~20% the number of instructions in the function.
> 
> AFAIU, this is because of the way per-cpu access have been implemented. The
> per-cpu offset is stored in a system register (TPIDR_EL2), all the read to it
> cannot be optimized (access using volatile).
> 
> So every direct use of "current" will require at least a system register
> access and then a load from memory.

Very nice, thank you!


> 
> > 
> > Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
> > 
> > > ---
> > >   xen/arch/arm/traps.c | 14 ++++++++------
> > >   1 file changed, 8 insertions(+), 6 deletions(-)
> > > 
> > > diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
> > > index 020b0b8eef..b1546f6907 100644
> > > --- a/xen/arch/arm/traps.c
> > > +++ b/xen/arch/arm/traps.c
> > > @@ -2024,8 +2024,10 @@ static void enter_hypervisor_head(struct
> > > cpu_user_regs *regs)
> > >   {
> > >       if ( guest_mode(regs) )
> > >       {
> > > +        struct vcpu *v = current;
> > > +
> > >           /* If the guest has disabled the workaround, bring it back on.
> > > */
> > > -        if ( needs_ssbd_flip(current) )
> > > +        if ( needs_ssbd_flip(v) )
> > >               arm_smccc_1_1_smc(ARM_SMCCC_ARCH_WORKAROUND_2_FID, 1, NULL);
> > >             /*
> > > @@ -2034,8 +2036,8 @@ static void enter_hypervisor_head(struct
> > > cpu_user_regs *regs)
> > >            * but the crucial bit is "On taking a vSError interrupt,
> > > HCR_EL2.VSE
> > >            * (alias of HCR.VA) is cleared to 0."
> > >            */
> > > -        if ( current->arch.hcr_el2 & HCR_VA )
> > > -            current->arch.hcr_el2 = READ_SYSREG(HCR_EL2);
> > > +        if ( v->arch.hcr_el2 & HCR_VA )
> > > +            v->arch.hcr_el2 = READ_SYSREG(HCR_EL2);
> > >     #ifdef CONFIG_NEW_VGIC
> > >           /*
> > > @@ -2045,11 +2047,11 @@ static void enter_hypervisor_head(struct
> > > cpu_user_regs *regs)
> > >            * TODO: Investigate whether this is necessary to do on every
> > >            * trap and how it can be optimised.
> > >            */
> > > -        vtimer_update_irqs(current);
> > > -        vcpu_update_evtchn_irq(current);
> > > +        vtimer_update_irqs(v);
> > > +        vcpu_update_evtchn_irq(v);
> > >   #endif
> > >   -        vgic_sync_from_lrs(current);
> > > +        vgic_sync_from_lrs(v);
> > >       }
> > >   }
> > >   
> > > -- 
> > > 2.11.0
> > > 
> 
> -- 
> Julien Grall
>

Patch

diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index 020b0b8eef..b1546f6907 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -2024,8 +2024,10 @@  static void enter_hypervisor_head(struct cpu_user_regs *regs)
 {
     if ( guest_mode(regs) )
     {
+        struct vcpu *v = current;
+
         /* If the guest has disabled the workaround, bring it back on. */
-        if ( needs_ssbd_flip(current) )
+        if ( needs_ssbd_flip(v) )
             arm_smccc_1_1_smc(ARM_SMCCC_ARCH_WORKAROUND_2_FID, 1, NULL);
 
         /*
@@ -2034,8 +2036,8 @@  static void enter_hypervisor_head(struct cpu_user_regs *regs)
          * but the crucial bit is "On taking a vSError interrupt, HCR_EL2.VSE
          * (alias of HCR.VA) is cleared to 0."
          */
-        if ( current->arch.hcr_el2 & HCR_VA )
-            current->arch.hcr_el2 = READ_SYSREG(HCR_EL2);
+        if ( v->arch.hcr_el2 & HCR_VA )
+            v->arch.hcr_el2 = READ_SYSREG(HCR_EL2);
 
 #ifdef CONFIG_NEW_VGIC
         /*
@@ -2045,11 +2047,11 @@  static void enter_hypervisor_head(struct cpu_user_regs *regs)
          * TODO: Investigate whether this is necessary to do on every
          * trap and how it can be optimised.
          */
-        vtimer_update_irqs(current);
-        vcpu_update_evtchn_irq(current);
+        vtimer_update_irqs(v);
+        vcpu_update_evtchn_irq(v);
 #endif
 
-        vgic_sync_from_lrs(current);
+        vgic_sync_from_lrs(v);
     }
 }