Message ID | 1469031064-23344-19-git-send-email-julien.grall@arm.com |
---|---|
State | Superseded |
Headers | show |
Hi Stefano, On 27/07/16 02:12, Stefano Stabellini wrote: > On Wed, 20 Jul 2016, Julien Grall wrote: >> The current implementation of flush_tlb_domain is relying on the domain >> to have a single p2m. With the upcoming feature altp2m, a single domain >> may have different p2m. So we would need to switch to the correct p2m in >> order to flush the TLBs. >> >> Rather than checking whether the domain is not the current domain, check >> whether the VTTBR is different. The resulting assembly code is much >> smaller: from 38 instructions (+ 2 functions call) to 22 instructions. > > That's true but SYSREG reads are more expensive than regular > instructions. This argument is not really true. The ARM ARM (D7-1879 in ARM DDI 0487A.j) says: "Reads of the System registers can occur out of order with respect to earlier instructions executed on the same PE, provided that any data dependencies between the instructions are respected". So It will depend on how the micro-architecture implemented access to SYSREG. However, the current code already contains plenty of SYSREG read access (via the macro current using TPIDR_EL2). So the number of SYSREG accesses stay exactly the same. I also forgot to mention that the number of instructions in the function call (10 instructions). So we are down from 58 instructions to 22 instructions. Therefore, smaller code and likely better performance. > > >> Signed-off-by: Julien Grall <julien.grall@arm.com> >> --- >> xen/arch/arm/p2m.c | 18 +++++++++++------- >> 1 file changed, 11 insertions(+), 7 deletions(-) >> >> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c >> index d1b6009..015c1e8 100644 >> --- a/xen/arch/arm/p2m.c >> +++ b/xen/arch/arm/p2m.c >> @@ -151,24 +151,28 @@ void p2m_restore_state(struct vcpu *n) >> >> void flush_tlb_domain(struct domain *d) >> { >> + struct p2m_domain *p2m = &d->arch.p2m; >> unsigned long flags = 0; >> + uint64_t ovttbr; >> >> /* >> - * Update the VTTBR if necessary with the domain d. In this case, >> - * it's only necessary to flush TLBs on every CPUs with the current VMID >> - * (our domain). >> + * ARM only provides an instruction to flush TLBs for the current >> + * VMID. So switch to the VTTBR of a given P2M if different. >> */ >> - if ( d != current->domain ) >> + ovttbr = READ_SYSREG64(VTTBR_EL2); >> + if ( ovttbr != p2m->vttbr ) >> { >> local_irq_save(flags); >> - p2m_load_VTTBR(d); >> + WRITE_SYSREG64(p2m->vttbr, VTTBR_EL2); >> + isb(); >> } >> >> flush_tlb(); >> >> - if ( d != current->domain ) >> + if ( ovttbr != READ_SYSREG64(VTTBR_EL2) ) > > You should be able to remove this second SYSREG read and optimize the > code further. I should be able, however I think it will not bring much more optimization here but obfuscating a bit more the code. Regards,
diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c index d1b6009..015c1e8 100644 --- a/xen/arch/arm/p2m.c +++ b/xen/arch/arm/p2m.c @@ -151,24 +151,28 @@ void p2m_restore_state(struct vcpu *n) void flush_tlb_domain(struct domain *d) { + struct p2m_domain *p2m = &d->arch.p2m; unsigned long flags = 0; + uint64_t ovttbr; /* - * Update the VTTBR if necessary with the domain d. In this case, - * it's only necessary to flush TLBs on every CPUs with the current VMID - * (our domain). + * ARM only provides an instruction to flush TLBs for the current + * VMID. So switch to the VTTBR of a given P2M if different. */ - if ( d != current->domain ) + ovttbr = READ_SYSREG64(VTTBR_EL2); + if ( ovttbr != p2m->vttbr ) { local_irq_save(flags); - p2m_load_VTTBR(d); + WRITE_SYSREG64(p2m->vttbr, VTTBR_EL2); + isb(); } flush_tlb(); - if ( d != current->domain ) + if ( ovttbr != READ_SYSREG64(VTTBR_EL2) ) { - p2m_load_VTTBR(current->domain); + WRITE_SYSREG64(ovttbr, VTTBR_EL2); + isb(); local_irq_restore(flags); } }
The current implementation of flush_tlb_domain is relying on the domain to have a single p2m. With the upcoming feature altp2m, a single domain may have different p2m. So we would need to switch to the correct p2m in order to flush the TLBs. Rather than checking whether the domain is not the current domain, check whether the VTTBR is different. The resulting assembly code is much smaller: from 38 instructions (+ 2 functions call) to 22 instructions. Signed-off-by: Julien Grall <julien.grall@arm.com> --- xen/arch/arm/p2m.c | 18 +++++++++++------- 1 file changed, 11 insertions(+), 7 deletions(-)