diff mbox

[Xen-devel,18/22] xen/arm: p2m: Rework the context switch to another VTTBR in flush_tlb_domain

Message ID 1469031064-23344-19-git-send-email-julien.grall@arm.com
State Superseded
Headers show

Commit Message

Julien Grall July 20, 2016, 4:11 p.m. UTC
The current implementation of flush_tlb_domain is relying on the domain
to have a single p2m. With the upcoming feature altp2m, a single domain
may have different p2m. So we would need to switch to the correct p2m in
order to flush the TLBs.

Rather than checking whether the domain is not the current domain, check
whether the VTTBR is different. The resulting assembly code is much
smaller: from 38 instructions (+ 2 functions call) to 22 instructions.

Signed-off-by: Julien Grall <julien.grall@arm.com>
---
 xen/arch/arm/p2m.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

Comments

Julien Grall July 27, 2016, 10:22 a.m. UTC | #1
Hi Stefano,

On 27/07/16 02:12, Stefano Stabellini wrote:
> On Wed, 20 Jul 2016, Julien Grall wrote:
>> The current implementation of flush_tlb_domain is relying on the domain
>> to have a single p2m. With the upcoming feature altp2m, a single domain
>> may have different p2m. So we would need to switch to the correct p2m in
>> order to flush the TLBs.
>>
>> Rather than checking whether the domain is not the current domain, check
>> whether the VTTBR is different. The resulting assembly code is much
>> smaller: from 38 instructions (+ 2 functions call) to 22 instructions.
>
> That's true but SYSREG reads are more expensive than regular
> instructions.

This argument is not really true. The ARM ARM (D7-1879 in ARM DDI 
0487A.j) says: "Reads of the System registers can occur out of order 
with respect to earlier instructions executed on the same PE, provided 
that any data dependencies between the instructions are respected". So 
It will depend on how the micro-architecture implemented access to SYSREG.

However, the current code already contains plenty of SYSREG read access 
(via the macro current using TPIDR_EL2). So the number of SYSREG 
accesses stay exactly the same.

I also forgot to mention that the number of instructions in the function 
call (10 instructions). So we are down from 58 instructions to 22 
instructions.

Therefore, smaller code and likely better performance.

>
>
>> Signed-off-by: Julien Grall <julien.grall@arm.com>
>> ---
>>  xen/arch/arm/p2m.c | 18 +++++++++++-------
>>  1 file changed, 11 insertions(+), 7 deletions(-)
>>
>> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
>> index d1b6009..015c1e8 100644
>> --- a/xen/arch/arm/p2m.c
>> +++ b/xen/arch/arm/p2m.c
>> @@ -151,24 +151,28 @@ void p2m_restore_state(struct vcpu *n)
>>
>>  void flush_tlb_domain(struct domain *d)
>>  {
>> +    struct p2m_domain *p2m = &d->arch.p2m;
>>      unsigned long flags = 0;
>> +    uint64_t ovttbr;
>>
>>      /*
>> -     * Update the VTTBR if necessary with the domain d. In this case,
>> -     * it's only necessary to flush TLBs on every CPUs with the current VMID
>> -     * (our domain).
>> +     * ARM only provides an instruction to flush TLBs for the current
>> +     * VMID. So switch to the VTTBR of a given P2M if different.
>>       */
>> -    if ( d != current->domain )
>> +    ovttbr = READ_SYSREG64(VTTBR_EL2);
>> +    if ( ovttbr != p2m->vttbr )
>>      {
>>          local_irq_save(flags);
>> -        p2m_load_VTTBR(d);
>> +        WRITE_SYSREG64(p2m->vttbr, VTTBR_EL2);
>> +        isb();
>>      }
>>
>>      flush_tlb();
>>
>> -    if ( d != current->domain )
>> +    if ( ovttbr != READ_SYSREG64(VTTBR_EL2) )
>
> You should be able to remove this second SYSREG read and optimize the
> code further.

I should be able, however I think it will not bring much more 
optimization here but obfuscating a bit more the code.

Regards,
diff mbox

Patch

diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index d1b6009..015c1e8 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -151,24 +151,28 @@  void p2m_restore_state(struct vcpu *n)
 
 void flush_tlb_domain(struct domain *d)
 {
+    struct p2m_domain *p2m = &d->arch.p2m;
     unsigned long flags = 0;
+    uint64_t ovttbr;
 
     /*
-     * Update the VTTBR if necessary with the domain d. In this case,
-     * it's only necessary to flush TLBs on every CPUs with the current VMID
-     * (our domain).
+     * ARM only provides an instruction to flush TLBs for the current
+     * VMID. So switch to the VTTBR of a given P2M if different.
      */
-    if ( d != current->domain )
+    ovttbr = READ_SYSREG64(VTTBR_EL2);
+    if ( ovttbr != p2m->vttbr )
     {
         local_irq_save(flags);
-        p2m_load_VTTBR(d);
+        WRITE_SYSREG64(p2m->vttbr, VTTBR_EL2);
+        isb();
     }
 
     flush_tlb();
 
-    if ( d != current->domain )
+    if ( ovttbr != READ_SYSREG64(VTTBR_EL2) )
     {
-        p2m_load_VTTBR(current->domain);
+        WRITE_SYSREG64(ovttbr, VTTBR_EL2);
+        isb();
         local_irq_restore(flags);
     }
 }