Message ID: 1469717505-8026-22-git-send-email-julien.grall@arm.com
State: Superseded
On 28/07/16 16:04, Razvan Cojocaru wrote:
> On 07/28/2016 05:51 PM, Julien Grall wrote:
>> The function p2m_set_mem_access can be re-implemented using the generic
>> functions p2m_get_entry and __p2m_set_entry.
>>
>> Note that because of the implementation of p2m_get_entry, a TLB
>> invalidation instruction will be issued for each 4KB page. Therefore the
>> performance of memaccess will be impacted, however the function is now
>> safe on all the processors.
>>
>> Also the function apply_p2m_changes is dropped completely as it is not
>> unused anymore.
>>
>> Signed-off-by: Julien Grall <julien.grall@arm.com>
>> Cc: Razvan Cojocaru <rcojocaru@bitdefender.com>
>> Cc: Tamas K Lengyel <tamas@tklengyel.com>
>>
>> ---
>>  I have not ran any performance test with memaccess for now, but I
>>  expect an important and unavoidable impact because of how memaccess
>>  has been designed to workaround hardware limitation. Note that might
>>  be possible to re-work memaccess work on superpage but this should
>>  be done in a separate patch.
>> ---
>>  xen/arch/arm/p2m.c | 329 +++++++----------------------------------------------
>>  1 file changed, 38 insertions(+), 291 deletions(-)
>
> Thanks for the CC!

Hi Razvan,

> This seems to only impact ARM, are there any planned changes for x86
> along these lines as well?

The break-before-make sequence is required by the ARM architecture. I
don't know the x86 architecture well; you can ask the x86 maintainers
whether the same is necessary there.

Regards,
Hi Razvan,

On 28/07/16 16:04, Razvan Cojocaru wrote:
> On 07/28/2016 05:51 PM, Julien Grall wrote:
>> The function p2m_set_mem_access can be re-implemented using the generic
>> functions p2m_get_entry and __p2m_set_entry.
>>
>> Note that because of the implementation of p2m_get_entry, a TLB
>> invalidation instruction will be issued for each 4KB page. Therefore the
>> performance of memaccess will be impacted, however the function is now
>> safe on all the processors.
>>
>> Also the function apply_p2m_changes is dropped completely as it is not
>> unused anymore.
>>
>> Signed-off-by: Julien Grall <julien.grall@arm.com>
>> Cc: Razvan Cojocaru <rcojocaru@bitdefender.com>
>> Cc: Tamas K Lengyel <tamas@tklengyel.com>
>>
>> ---
>>  I have not ran any performance test with memaccess for now, but I
>>  expect an important and unavoidable impact because of how memaccess
>>  has been designed to workaround hardware limitation. Note that might
>>  be possible to re-work memaccess work on superpage but this should
>>  be done in a separate patch.
>> ---
>>  xen/arch/arm/p2m.c | 329 +++++++----------------------------------------------
>>  1 file changed, 38 insertions(+), 291 deletions(-)
>
> Thanks for the CC!
>
> This seems to only impact ARM, are there any planned changes for x86
> along these lines as well?

Actually, it might be possible to remove the TLB for each 4KB entry
in the memaccess case.

After I read again multiple time the ARM ARM (D4-1732 in ARM DDI
0487A.i) and spoke with various ARM folks, changing the permission
(i.e read, write, execute) does not require the break-before-make
sequence.

However, I noticed a latent bug in the memaccess code when the
permission restriction are removed/changed.

In the current implementation (i.e without this series), the TLB
invalidation is deferred until the p2m is released. Until that time,
a vCPU may still run with the previous permission and trap into the
hypervisor the permission fault. However, as the permission as
changed, p2m_memaccess_check may fail and we will inject an abort to
the guest.

The problem is very similar to [1]. This will still be true with
this series applied if I relaxed the use of the break-before-make
sequence. The two ways I can see to fix this are either try again
the instruction (we would trap again if the permission was not
correct) or keep the break-before-make. The former will be cleaner
given than stage-2 permission fault can only happen because of
memaccess for now.

Any opinions?

Regards,

[1] https://lists.xenproject.org/archives/html/xen-devel/2016-07/msg03133.html
On 01/08/16 16:59, Tamas K Lengyel wrote:
> On Mon, Aug 1, 2016 at 9:40 AM, Julien Grall <julien.grall@arm.com> wrote:
> IMHO we should just pause the domain while mem_access permissions are
> being changed. On x86 it is not necessary as the mem_access
> bookkeeping is kept in the ept pte entries but since on ARM the two
> things are disjoint it is required. Even on x86 though - while it's
> not strictly necessary - I think it would be a good idea to just pause
> the domain as from a user perspective it would be more intuitive then
> keeping the vCPUs running.

The problem is not because of the bookkeeping, but how the TLBs work. I
never mentioned the radix tree in my previous mail because storing the
access in the entry will not help here. The entry may have been updated but
the TLBs not yet invalidated.

x86 does not seem to be affected because if the mem access check fails, the
instruction is replayed (see hvm_hap_nested_page_fault). This is exactly the
solution I suggested, which is IMHO the best.

I don't think pausing all the vCPUs of a domain is a good solution, the
overhead will be too high for modifying only one page (see
p2m_mem_access_check for instance).

Regards,
On 01/08/16 17:27, Tamas K Lengyel wrote:
> On Mon, Aug 1, 2016 at 10:15 AM, Julien Grall <julien.grall@arm.com> wrote:
>>
>>
>> On 01/08/16 16:59, Tamas K Lengyel wrote:
>>>
>>> On Mon, Aug 1, 2016 at 9:40 AM, Julien Grall <julien.grall@arm.com> wrote:
>>> IMHO we should just pause the domain while mem_access permissions are
>>> being changed. On x86 it is not necessary as the mem_access
>>> bookkeeping is kept in the ept pte entries but since on ARM the two
>>> things are disjoint it is required. Even on x86 though - while it's
>>> not strictly necessary - I think it would be a good idea to just pause
>>> the domain as from a user perspective it would be more intuitive then
>>> keeping the vCPUs running.
>>
>>
>> The problem is not because of the bookkeeping, but how the TLBs work. I
>> never mentioned the radix tree in my previous mail because storing the
>> access in the entry will not help here. The entry may have been updated but
>> the TLBs not yet invalidated.
>>
>> x86 does not seem to be affected because if the mem access check fails, the
>> instruction is replayed (see hvm_hap_nested_page_fault). This is exactly the
>> solution I suggested, which is IMHO the best.
>
> I see. Yes, indeed that sounds like the route to take here. I am
> looking forward to see a patch.
>
>>
>> I don't think pausing all the vCPUs of a domain is a good solution, the
>> overhead will be too high for modifying only one page (see
>> p2m_mem_access_check for instance).
>>
>
> IMHO the overhead of pausing the domain for setting mem_access
> permissions is not critical as it is done only on occasion. We can
> certainly leave the default behavior like this but I'll probably also
> introduce a new XENMEM_access_op_set_access_sync op that will pause
> the domain for the duration of the op.

Oh, you meant for a user app to set the memaccess. I guess it is fine.

I mentioned p2m_mem_access_check because this function is supposed to be
called often in the data/instruction abort handlers, so pausing a domain
here would be a big overhead.

Regards,
Hi Julien,

On Mon, Aug 01, 2016 at 04:40:50PM +0100, Julien Grall wrote:
> Hi Razvan,
>
> On 28/07/16 16:04, Razvan Cojocaru wrote:
> >On 07/28/2016 05:51 PM, Julien Grall wrote:
> >>The function p2m_set_mem_access can be re-implemented using the generic
> >>functions p2m_get_entry and __p2m_set_entry.
> >>
> >>Note that because of the implementation of p2m_get_entry, a TLB
> >>invalidation instruction will be issued for each 4KB page. Therefore the
> >>performance of memaccess will be impacted, however the function is now
> >>safe on all the processors.
> >>
> >>Also the function apply_p2m_changes is dropped completely as it is not
> >>unused anymore.
> >>
> >>Signed-off-by: Julien Grall <julien.grall@arm.com>
> >>Cc: Razvan Cojocaru <rcojocaru@bitdefender.com>
> >>Cc: Tamas K Lengyel <tamas@tklengyel.com>
> >>
> >>---
> >> I have not ran any performance test with memaccess for now, but I
> >> expect an important and unavoidable impact because of how memaccess
> >> has been designed to workaround hardware limitation. Note that might
> >> be possible to re-work memaccess work on superpage but this should
> >> be done in a separate patch.
> >>---
> >> xen/arch/arm/p2m.c | 329 +++++++----------------------------------------------
> >> 1 file changed, 38 insertions(+), 291 deletions(-)
> >
> >Thanks for the CC!
> >
> >This seems to only impact ARM, are there any planned changes for x86
> >along these lines as well?
>
> Actually, it might be possible to remove the TLB for each 4KB entry
> in the memaccess case.

Could you clarify what you mean by "remove the TLB"? Do you mean a TLBI
instruction?

> After I read again multiple time the ARM ARM (D4-1732 in ARM DDI
> 0487A.i) and spoke with various ARM folks, changing the permission
> (i.e read, write, execute) does not require the break-before-make
> sequence.

Regardless of whether Break-Before-Make (BBM) is required, TLB
invalidation is still required to ensure the new permissions have taken
effect after *any* modification to page tables, unless:

* The prior entry was not permitted to be held in a TLB, and:
* No TLB held an entry for the address range.

> However, I noticed a latent bug in the memaccess code when the
> permission restriction are removed/changed.
>
> In the current implementation (i.e without this series), the TLB
> invalidation is deferred until the p2m is released. Until that time,
> a vCPU may still run with the previous permission and trap into the
> hypervisor the permission fault. However, as the permission as
> changed, p2m_memaccess_check may fail and we will inject an abort to
> the guest.
>
> The problem is very similar to [1]. This will still be true with
> this series applied if I relaxed the use of the break-before-make
> sequence. The two ways I can see to fix this are either try again
> the instruction (we would trap again if the permission was not
> correct) or keep the break-before-make.

Why can you not have TLB invalidation without the full BBM sequence?

Thanks,
Mark.
On 01/08/16 17:34, Mark Rutland wrote:
> Hi Julien,

Hi Mark,

> On Mon, Aug 01, 2016 at 04:40:50PM +0100, Julien Grall wrote:
>> Hi Razvan,
>>
>> On 28/07/16 16:04, Razvan Cojocaru wrote:
>>> On 07/28/2016 05:51 PM, Julien Grall wrote:
>>>> The function p2m_set_mem_access can be re-implemented using the generic
>>>> functions p2m_get_entry and __p2m_set_entry.
>>>>
>>>> Note that because of the implementation of p2m_get_entry, a TLB
>>>> invalidation instruction will be issued for each 4KB page. Therefore the
>>>> performance of memaccess will be impacted, however the function is now
>>>> safe on all the processors.
>>>>
>>>> Also the function apply_p2m_changes is dropped completely as it is not
>>>> unused anymore.
>>>>
>>>> Signed-off-by: Julien Grall <julien.grall@arm.com>
>>>> Cc: Razvan Cojocaru <rcojocaru@bitdefender.com>
>>>> Cc: Tamas K Lengyel <tamas@tklengyel.com>
>>>>
>>>> ---
>>>>  I have not ran any performance test with memaccess for now, but I
>>>>  expect an important and unavoidable impact because of how memaccess
>>>>  has been designed to workaround hardware limitation. Note that might
>>>>  be possible to re-work memaccess work on superpage but this should
>>>>  be done in a separate patch.
>>>> ---
>>>>  xen/arch/arm/p2m.c | 329 +++++++----------------------------------------------
>>>>  1 file changed, 38 insertions(+), 291 deletions(-)
>>>
>>> Thanks for the CC!
>>>
>>> This seems to only impact ARM, are there any planned changes for x86
>>> along these lines as well?
>>
>> Actually, it might be possible to remove the TLB for each 4KB entry
>> in the memaccess case.
>
> Could you clarify what you mean by "remove the TLB"? Do you mean a TLBI
> instruction?

I meant a TLBI instruction, sorry for the confusion.

>> After I read again multiple time the ARM ARM (D4-1732 in ARM DDI
>> 0487A.i) and spoke with various ARM folks, changing the permission
>> (i.e read, write, execute) does not require the break-before-make
>> sequence.
>
> Regardless of whether Break-Before-Make (BBM) is required, TLB
> invalidation is still required to ensure the new permissions have taken
> effect after *any* modification to page tables, unless:
>
> * The prior entry was not permitted to be held in a TLB, and:
> * No TLB held an entry for the address range.

Agreed, however we only need one TLBI instruction (assuming there is
no superpage shattering) per-batch rather than one per-entry in this
case.

>
>> However, I noticed a latent bug in the memaccess code when the
>> permission restriction are removed/changed.
>>
>> In the current implementation (i.e without this series), the TLB
>> invalidation is deferred until the p2m is released. Until that time,
>> a vCPU may still run with the previous permission and trap into the
>> hypervisor the permission fault. However, as the permission as
>> changed, p2m_memaccess_check may fail and we will inject an abort to
>> the guest.
>>
>> The problem is very similar to [1]. This will still be true with
>> this series applied if I relaxed the use of the break-before-make
>> sequence. The two ways I can see to fix this are either try again
>> the instruction (we would trap again if the permission was not
>> correct) or keep the break-before-make.
>
> Why can you not have TLB invalidation without the full BBM sequence?

I agree that in general TLBI instruction does not require the full
BBM sequence. If we ever need the TLB invalidation per entry, it
will be better to keep BBM to keep the code streamlined.

However, in this case I think it will be better to return to the
guest and replay the instruction if a data/instruction abort has
been received.

Regards,
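[Editorial note: to make the "return to the guest and replay the
instruction" idea above concrete, here is a minimal, self-contained C
sketch of the decision a stage-2 permission-fault handler would make.
All types and names are illustrative assumptions; this is not the
actual Xen code path (which in this thread is p2m_mem_access_check and
the abort handlers).]

    #include <stdbool.h>

    /* Toy model of the race discussed above; all names are illustrative. */
    typedef enum { ACCESS_N, ACCESS_R, ACCESS_RW, ACCESS_RWX } access_t;

    typedef struct {
        access_t access;   /* permission currently stored in the p2m entry */
    } p2m_entry_t;

    typedef enum { FAULT_RETRY, FAULT_FORWARD } fault_action_t;

    /*
     * Called on a stage-2 permission fault. If the p2m entry has already
     * been relaxed, the fault is stale (the vCPU faulted before the TLB
     * invalidation completed): return to the guest and let it replay the
     * instruction instead of injecting an abort. Otherwise treat it as a
     * genuine memaccess event.
     */
    fault_action_t handle_perm_fault(const p2m_entry_t *e, bool is_write)
    {
        bool allowed = is_write ? (e->access >= ACCESS_RW)
                                : (e->access >= ACCESS_R);

        return allowed ? FAULT_RETRY    /* stale fault: replay the instruction */
                       : FAULT_FORWARD; /* real violation: notify the listener */
    }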
On Mon, Aug 01, 2016 at 05:57:50PM +0100, Julien Grall wrote:
> On 01/08/16 17:34, Mark Rutland wrote:
> >On Mon, Aug 01, 2016 at 04:40:50PM +0100, Julien Grall wrote:
> >>After I read again multiple time the ARM ARM (D4-1732 in ARM DDI
> >>0487A.i) and spoke with various ARM folks, changing the permission
> >>(i.e read, write, execute) does not require the break-before-make
> >>sequence.
> >
> >Regardless of whether Break-Before-Make (BBM) is required, TLB
> >invalidation is still required to ensure the new permissions have taken
> >effect after *any* modification to page tables, unless:
> >
> >* The prior entry was not permitted to be held in a TLB, and:
> >* No TLB held an entry for the address range.
>
> Agreed, however we only need one TLBI instruction (assuming there is
> no superpage shattering) per-batch rather than one per-entry in this
> case.

I got Cc'd to a reply without the original patch context, so I'm not
sure what the case is here. I'm not exactly sure what you mean by
"per-batch".

Assuming that you've (only) changed the permissions (i.e. the AP bits
and XN bits) of a number of stage-2 leaf entries, you need either:

* A single TLBI VMALLE1IS

  Note that this removes *all* stage-2 or combined stage-1&2 entries
  from TLBs, and thus is stronger than necessary.

* Per entry, a TLBI IPAS2LE1IS

  e.g.

  for_each_entry(x)
      modify_ap_bits(x);
  dsb(ishst);
  tlbi(ipas2le1is);
  dsb(ish);

> >>In the current implementation (i.e without this series), the TLB
> >>invalidation is deferred until the p2m is released. Until that time,
> >>a vCPU may still run with the previous permission and trap into the
> >>hypervisor the permission fault. However, as the permission as
> >>changed, p2m_memaccess_check may fail and we will inject an abort to
> >>the guest.
> >>
> >>The problem is very similar to [1]. This will still be true with
> >>this series applied if I relaxed the use of the break-before-make
> >>sequence. The two ways I can see to fix this are either try again
> >>the instruction (we would trap again if the permission was not
> >>correct) or keep the break-before-make.
> >
> >Why can you not have TLB invalidation without the full BBM sequence?
>
> I agree that in general TLBI instruction does not require the full
> BBM sequence. If we ever need the TLB invalidation per entry, it
> will be better to keep BBM to keep the code streamlined.

If this is not performance-critical, this sounds fine. This does
unnecessarily penalise some cases, though. e.g. a guest only performing
reads if write permission is removed from pages.

> However, in this case I think it will be better to return to the
> guest and replay the instruction if a data/instruction abort has
> been received.

That sounds like what we do in Linux. On a fault, if the page tables are
such that the fault should not occur, we assume we raced with concurrent
modification, and return to the faulting instruction. Eventually (once
the TLB maintenance is completed), the guest will make forward progress.

We have locking around page table manipulation, but only have to take
them in the (hopefully) unlikely case of a race on the affected memory
area.

Thanks,
Mark.
Hi,

I realised I made a confusing mistake in my last reply; clarification below.

On Mon, Aug 01, 2016 at 06:26:54PM +0100, Mark Rutland wrote:
> On Mon, Aug 01, 2016 at 05:57:50PM +0100, Julien Grall wrote:
> > however we only need one TLBI instruction (assuming there is
> > no superpage shattering) per-batch rather than one per-entry in this
> > case.
>
> I got Cc'd to a reply without the original patch context, so I'm not
> sure what the case is here. I'm not exactly sure what you mean by
> "per-batch".
>
> Assuming that you've (only) changed the permissions (i.e. the AP bits
> and XN bits) of a number of stage-2 leaf entries, you need either:

[...]

> * Per entry, a TLBI IPAS2LE1IS
>
>   e.g.
>
>   for_each_entry(x)
>       modify_ap_bits(x);
>   dsb(ishst);
>   tlbi(ipas2le1is);
>   dsb(ish);

Here I was trying to have the bare minimum barriers necessary, but in focussing
on that I failed to add the required loop to have a TLBI per entry.

The above should have been:

  for_each_entry(x)
      modify_ap_bits(x);
  dsb(ishst);
  for_each_entry(x)
      tlbi(ipas2le1is, x);
  dsb(ish);

Assuming the necessary bit fiddling for the TLBI to affect the IPA of x, with
the right VMID, etc.

Thanks,
Mark.
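[Editorial note: rendered as C, the corrected sequence above looks like
the sketch below. The barrier and TLBI helpers are stand-ins (the real
arm64 operations are only shown as comments), and the S2AP bit layout is
used purely for illustration; the point is that every entry is written
before a single dsb(ishst), followed by one TLBI per modified IPA and a
final dsb(ish).]

    #include <stddef.h>
    #include <stdint.h>

    /* Stand-ins for the real operations; illustrative only. */
    static void set_s2ap(uint64_t *pte, uint64_t ap)
    {
        *pte = (*pte & ~(3ULL << 6)) | (ap << 6);   /* S2AP lives in bits [7:6] */
    }
    static void dsb_ishst(void) { /* asm volatile("dsb ishst" ::: "memory"); */ }
    static void dsb_ish(void)   { /* asm volatile("dsb ish" ::: "memory");   */ }
    static void tlbi_ipas2le1is(uint64_t ipa) { (void)ipa; /* per-IPA stage-2 TLBI */ }

    /* Batch permission change: write every entry first, then invalidate each IPA. */
    void change_batch_permissions(uint64_t *ptes, const uint64_t *ipas,
                                  size_t n, uint64_t ap)
    {
        size_t i;

        for ( i = 0; i < n; i++ )       /* 1. update all leaf entries */
            set_s2ap(&ptes[i], ap);

        dsb_ishst();                    /* 2. order the PTE writes before the TLBIs */

        for ( i = 0; i < n; i++ )       /* 3. one TLBI per modified IPA */
            tlbi_ipas2le1is(ipas[i]);

        dsb_ish();                      /* 4. wait for the invalidations to complete */
    }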
On 01/08/2016 19:22, Mark Rutland wrote:
> Hi,

Hi Mark,

> I realised I made a confusing mistake in my last reply; clarification below.
>
> On Mon, Aug 01, 2016 at 06:26:54PM +0100, Mark Rutland wrote:
>> On Mon, Aug 01, 2016 at 05:57:50PM +0100, Julien Grall wrote:
>>> however we only need one TLBI instruction (assuming there is
>>> no superpage shattering) per-batch rather than one per-entry in this
>>> case.
>>
>> I got Cc'd to a reply without the original patch context, so I'm not
>> sure what the case is here. I'm not exactly sure what you mean by
>> "per-batch".

Sorry for that. I CCed in case I did not summarize correctly the
conversation we had.

The page table handling code can be found in patch #18 [1].

>>
>> Assuming that you've (only) changed the permissions (i.e. the AP bits
>> and XN bits) of a number of stage-2 leaf entries, you need either:
>
> [...]
>
>> * Per entry, a TLBI IPAS2LE1IS
>>
>>   e.g.
>>
>>   for_each_entry(x)
>>       modify_ap_bits(x);
>>   dsb(ishst);
>>   tlbi(ipas2le1is);
>>   dsb(ish);
>
> Here I was trying to have the bare minimum barriers necessary, but in focussing
> on that I failed to add the required loop to have a TLBI per entry.
>
> The above should have been:
>
>   for_each_entry(x)
>       modify_ap_bits(x);
>   dsb(ishst);
>   for_each_entry(x)
>       tlbi(ipas2le1is, x);
>   dsb(ish);

I have a question related to this example. Is there a threshold
where invalidate all the TLB entry for a given VMID/ASID is worth?

>
> Assuming the necessary bit fiddling for the TLBI to affect the IPA of x, with
> the right VMID, etc.

Regards,

[1] https://lists.xen.org/archives/html/xen-devel/2016-07/msg02966.html
On Tue, Aug 02, 2016 at 10:58:00AM +0100, Julien Grall wrote:
> On 01/08/2016 19:22, Mark Rutland wrote:
> >On Mon, Aug 01, 2016 at 06:26:54PM +0100, Mark Rutland wrote:
> >>On Mon, Aug 01, 2016 at 05:57:50PM +0100, Julien Grall wrote:
> >>>however we only need one TLBI instruction (assuming there is
> >>>no superpage shattering) per-batch rather than one per-entry in this
> >>>case.
> >>
> >>I got Cc'd to a reply without the original patch context, so I'm not
> >>sure what the case is here. I'm not exactly sure what you mean by
> >>"per-batch".
>
> Sorry for that. I CCed in case I did not summarize correctly the
> conversation we had.
>
> The page table handling code can be found in patch #18 [1].

If I've understood, you're asking if you can do a TLBI VMALLE1IS per
batch of leaf entry updates in p2m_set_entry?

As below, if only the AP and/or XN bits are changing, that should be
safe. If any other fields are being altered (inc. the output address,
even for intermediate entries), that may not be safe.

> >>Assuming that you've (only) changed the permissions (i.e. the AP bits
> >>and XN bits) of a number of stage-2 leaf entries, you need either:
> >>* Per entry, a TLBI IPAS2LE1IS
> >>
> >>  e.g.
> >  for_each_entry(x)
> >      modify_ap_bits(x);
> >  dsb(ishst);
> >  for_each_entry(x)
> >      tlbi(ipas2le1is, x);
> >  dsb(ish);
>
> I have a question related to this example. Is there a threshold
> where invalidate all the TLB entry for a given VMID/ASID is worth?

Strictly speaking, "yes", but the value is going to depend on
implementation and workload, so there's no "good" value as such provided
by the architecture.

In Linux, we end up doing so in some cases to avoid softlockups. Look
for MAX_TLB_RANGE in arch/arm64/include/asm/tlbflush.h. There are some
more details in commit 05ac65305437e8ef ("arm64: fix soft lockup due to
large tlb flush range").

Thanks,
Mark.

> [1] https://lists.xen.org/archives/html/xen-devel/2016-07/msg02966.html
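[Editorial note: the threshold heuristic Mark describes boils down to
the small sketch below. The constant and the two flush helpers are
assumptions made up for the illustration, not real Xen or Linux
interfaces, and the threshold value is an arbitrary placeholder.]

    #include <stddef.h>

    /*
     * Beyond some implementation- and workload-dependent number of pages it
     * is cheaper to invalidate all stage-2 entries for the VMID than to
     * issue one TLBI per IPA.
     */
    #define FLUSH_ALL_THRESHOLD 512UL   /* pages; placeholder value */

    static void flush_one_ipa(unsigned long ipa) { (void)ipa; /* TLBI IPAS2LE1IS */ }
    static void flush_whole_vmid(void)           { /* e.g. TLBI VMALLE1IS, as above */ }

    void flush_after_permission_change(unsigned long first_ipa, size_t nr_pages)
    {
        if ( nr_pages > FLUSH_ALL_THRESHOLD )
        {
            flush_whole_vmid();                  /* one, stronger, invalidation */
            return;
        }

        for ( size_t i = 0; i < nr_pages; i++ )  /* per-page invalidation */
            flush_one_ipa(first_ipa + i * 4096UL);
    }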
Hi Razvan,

On 06/09/2016 20:16, Razvan Cojocaru wrote:
> On 09/06/16 22:06, Stefano Stabellini wrote:
>> On Thu, 28 Jul 2016, Julien Grall wrote:
>>>> The function p2m_set_mem_access can be re-implemented using the generic
>>>> functions p2m_get_entry and __p2m_set_entry.
>>>>
>>>> Note that because of the implementation of p2m_get_entry, a TLB
>>>> invalidation instruction will be issued for each 4KB page. Therefore the
>>>> performance of memaccess will be impacted, however the function is now
>>>> safe on all the processors.
>>>>
>>>> Also the function apply_p2m_changes is dropped completely as it is not
>>>> unused anymore.
>>>>
>>>> Signed-off-by: Julien Grall <julien.grall@arm.com>
>>>> Cc: Razvan Cojocaru <rcojocaru@bitdefender.com>
>>>> Cc: Tamas K Lengyel <tamas@tklengyel.com>
>>
>> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
>
> How far is this patch from landing into staging? Considering the recent
> discussion about the patch I'm working on, this would certainly impact
> the upcoming ARM part of it.

I expect this to be in Xen 4.8.

I also realized that without this patch it will be harder to implement
the ARM side. Given that, I would be fine with writing the ARM side once
this series is in and your patch is ready.

Regards,
diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c index 707c7be..16ed393 100644 --- a/xen/arch/arm/p2m.c +++ b/xen/arch/arm/p2m.c @@ -1081,295 +1081,6 @@ static int p2m_set_entry(struct p2m_domain *p2m, return rc; } -#define P2M_ONE_DESCEND 0 -#define P2M_ONE_PROGRESS_NOP 0x1 -#define P2M_ONE_PROGRESS 0x10 - -static int p2m_shatter_page(struct p2m_domain *p2m, - lpae_t *entry, - unsigned int level) -{ - const paddr_t level_shift = level_shifts[level]; - int rc = p2m_create_table(p2m, entry, level_shift - PAGE_SHIFT); - - if ( !rc ) - { - p2m->stats.shattered[level]++; - p2m->stats.mappings[level]--; - p2m->stats.mappings[level+1] += LPAE_ENTRIES; - } - - return rc; -} - -/* - * 0 == (P2M_ONE_DESCEND) continue to descend the tree - * +ve == (P2M_ONE_PROGRESS_*) handled at this level, continue, flush, - * entry, addr and maddr updated. Return value is an - * indication of the amount of work done (for preemption). - * -ve == (-Exxx) error. - */ -static int apply_one_level(struct domain *d, - lpae_t *entry, - unsigned int level, - enum p2m_operation op, - paddr_t start_gpaddr, - paddr_t end_gpaddr, - paddr_t *addr, - paddr_t *maddr, - bool_t *flush, - p2m_type_t t, - p2m_access_t a) -{ - const paddr_t level_size = level_sizes[level]; - - struct p2m_domain *p2m = &d->arch.p2m; - lpae_t pte; - const lpae_t orig_pte = *entry; - int rc; - - BUG_ON(level > 3); - - switch ( op ) - { - case MEMACCESS: - if ( level < 3 ) - { - if ( !p2m_valid(orig_pte) ) - { - *addr += level_size; - return P2M_ONE_PROGRESS_NOP; - } - - /* Shatter large pages as we descend */ - if ( p2m_mapping(orig_pte) ) - { - rc = p2m_shatter_page(p2m, entry, level); - if ( rc < 0 ) - return rc; - } /* else: an existing table mapping -> descend */ - - return P2M_ONE_DESCEND; - } - else - { - pte = orig_pte; - - if ( p2m_valid(pte) ) - { - rc = p2m_mem_access_radix_set(p2m, _gfn(paddr_to_pfn(*addr)), - a); - if ( rc < 0 ) - return rc; - - p2m_set_permission(&pte, pte.p2m.type, a); - p2m_write_pte(entry, pte, p2m->clean_pte); - } - - *addr += level_size; - *flush = true; - return P2M_ONE_PROGRESS; - } - } - - BUG(); /* Should never get here */ -} - -/* - * The page is only used by the P2M code which is protected by the p2m->lock. - * So we can avoid to use atomic helpers. - */ -static void update_reference_mapping(struct page_info *page, - lpae_t old_entry, - lpae_t new_entry) -{ - if ( p2m_valid(old_entry) && !p2m_valid(new_entry) ) - page->u.inuse.p2m_refcount--; - else if ( !p2m_valid(old_entry) && p2m_valid(new_entry) ) - page->u.inuse.p2m_refcount++; -} - -static int apply_p2m_changes(struct domain *d, - enum p2m_operation op, - gfn_t sgfn, - unsigned long nr, - mfn_t smfn, - uint32_t mask, - p2m_type_t t, - p2m_access_t a) -{ - paddr_t start_gpaddr = pfn_to_paddr(gfn_x(sgfn)); - paddr_t end_gpaddr = pfn_to_paddr(gfn_x(sgfn) + nr); - paddr_t maddr = pfn_to_paddr(mfn_x(smfn)); - int rc, ret; - struct p2m_domain *p2m = &d->arch.p2m; - lpae_t *mappings[4] = { NULL, NULL, NULL, NULL }; - struct page_info *pages[4] = { NULL, NULL, NULL, NULL }; - paddr_t addr; - unsigned int level = 0; - unsigned int cur_root_table = ~0; - unsigned int cur_offset[4] = { ~0, ~0, ~0, ~0 }; - unsigned int count = 0; - const unsigned int preempt_count_limit = (op == MEMACCESS) ? 1 : 0x2000; - const bool_t preempt = !is_idle_vcpu(current); - bool_t flush = false; - PAGE_LIST_HEAD(free_pages); - struct page_info *pg; - - p2m_write_lock(p2m); - - /* Static mapping. 
P2M_ROOT_PAGES > 1 are handled below */ - if ( P2M_ROOT_PAGES == 1 ) - { - mappings[P2M_ROOT_LEVEL] = __map_domain_page(p2m->root); - pages[P2M_ROOT_LEVEL] = p2m->root; - } - - addr = start_gpaddr; - while ( addr < end_gpaddr ) - { - int root_table; - const unsigned int offsets[4] = { - zeroeth_table_offset(addr), - first_table_offset(addr), - second_table_offset(addr), - third_table_offset(addr) - }; - - /* - * Check if current iteration should be possibly preempted. - * Since count is initialised to 0 above we are guaranteed to - * always make at least one pass as long as preempt_count_limit is - * initialized with a value >= 1. - */ - if ( preempt && count >= preempt_count_limit - && hypercall_preempt_check() ) - { - switch ( op ) - { - case MEMACCESS: - { - /* - * Preempt setting mem_access permissions as required by XSA-89, - * if it's not the last iteration. - */ - uint32_t progress = paddr_to_pfn(addr) - gfn_x(sgfn) + 1; - - if ( nr > progress && !(progress & mask) ) - { - rc = progress; - goto out; - } - break; - } - - default: - break; - }; - - /* - * Reset current iteration counter. - */ - count = 0; - } - - if ( P2M_ROOT_PAGES > 1 ) - { - int i; - /* - * Concatenated root-level tables. The table number will be the - * offset at the previous level. It is not possible to concatenate - * a level-0 root. - */ - ASSERT(P2M_ROOT_LEVEL > 0); - root_table = offsets[P2M_ROOT_LEVEL - 1]; - if ( root_table >= P2M_ROOT_PAGES ) - { - rc = -EINVAL; - goto out; - } - - if ( cur_root_table != root_table ) - { - if ( mappings[P2M_ROOT_LEVEL] ) - unmap_domain_page(mappings[P2M_ROOT_LEVEL]); - mappings[P2M_ROOT_LEVEL] = - __map_domain_page(p2m->root + root_table); - pages[P2M_ROOT_LEVEL] = p2m->root + root_table; - cur_root_table = root_table; - /* Any mapping further down is now invalid */ - for ( i = P2M_ROOT_LEVEL; i < 4; i++ ) - cur_offset[i] = ~0; - } - } - - for ( level = P2M_ROOT_LEVEL; level < 4; level++ ) - { - unsigned offset = offsets[level]; - lpae_t *entry = &mappings[level][offset]; - lpae_t old_entry = *entry; - - ret = apply_one_level(d, entry, - level, op, - start_gpaddr, end_gpaddr, - &addr, &maddr, &flush, - t, a); - if ( ret < 0 ) { rc = ret ; goto out; } - count += ret; - - if ( ret != P2M_ONE_PROGRESS_NOP ) - update_reference_mapping(pages[level], old_entry, *entry); - - /* L3 had better have done something! 
We cannot descend any further */ - BUG_ON(level == 3 && ret == P2M_ONE_DESCEND); - if ( ret != P2M_ONE_DESCEND ) break; - - BUG_ON(!p2m_valid(*entry)); - - if ( cur_offset[level] != offset ) - { - /* Update mapping for next level */ - int i; - if ( mappings[level+1] ) - unmap_domain_page(mappings[level+1]); - mappings[level+1] = map_domain_page(_mfn(entry->p2m.base)); - pages[level+1] = mfn_to_page(entry->p2m.base); - cur_offset[level] = offset; - /* Any mapping further down is now invalid */ - for ( i = level+1; i < 4; i++ ) - cur_offset[i] = ~0; - } - /* else: next level already valid */ - } - - BUG_ON(level > 3); - } - - rc = 0; - -out: - if ( flush ) - { - p2m_flush_tlb_sync(&d->arch.p2m); - ret = iommu_iotlb_flush(d, gfn_x(sgfn), nr); - if ( !rc ) - rc = ret; - } - - while ( (pg = page_list_remove_head(&free_pages)) ) - free_domheap_page(pg); - - for ( level = P2M_ROOT_LEVEL; level < 4; level ++ ) - { - if ( mappings[level] ) - unmap_domain_page(mappings[level]); - } - - p2m_write_unlock(p2m); - - return rc; -} - static inline int p2m_insert_mapping(struct domain *d, gfn_t start_gfn, unsigned long nr, @@ -2069,6 +1780,7 @@ long p2m_set_mem_access(struct domain *d, gfn_t gfn, uint32_t nr, { struct p2m_domain *p2m = p2m_get_hostp2m(d); p2m_access_t a; + unsigned int order; long rc = 0; static const p2m_access_t memaccess[] = { @@ -2111,8 +1823,43 @@ long p2m_set_mem_access(struct domain *d, gfn_t gfn, uint32_t nr, return 0; } - rc = apply_p2m_changes(d, MEMACCESS, gfn_add(gfn, start), - (nr - start), INVALID_MFN, mask, 0, a); + p2m_write_lock(p2m); + + for ( gfn = gfn_add(gfn, start); nr > start; gfn = gfn_add(gfn, 1UL << order) ) + { + p2m_type_t t; + mfn_t mfn = p2m_get_entry(p2m, gfn, &t, NULL, &order); + + /* Skip hole */ + if ( mfn_eq(mfn, INVALID_MFN) ) + { + /* + * the order corresponds to the order of the mapping in the + * page table. so we need to align the gfn before + * incrementing. + */ + gfn = _gfn(gfn_x(gfn) & ~((1UL << order) - 1)); + continue; + } + else + { + order = 0; + rc = __p2m_set_entry(p2m, gfn, 0, mfn, t, a); + if ( rc ) + break; + } + + start += (1UL << order); + /* Check for continuation if it is not the last iteration */ + if ( nr > start && !(start & mask) && hypercall_preempt_check() ) + { + rc = start; + break; + } + } + + p2m_write_unlock(p2m); + if ( rc < 0 ) return rc; else if ( rc > 0 )
The function p2m_set_mem_access can be re-implemented using the generic
functions p2m_get_entry and __p2m_set_entry.

Note that because of the implementation of p2m_get_entry, a TLB
invalidation instruction will be issued for each 4KB page. Therefore the
performance of memaccess will be impacted; however, the function is now
safe on all processors.

Also, the function apply_p2m_changes is dropped completely as it is not
used anymore.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Cc: Razvan Cojocaru <rcojocaru@bitdefender.com>
Cc: Tamas K Lengyel <tamas@tklengyel.com>

---
    I have not run any performance tests with memaccess for now, but I
    expect an important and unavoidable impact because of how memaccess
    has been designed to work around hardware limitations. Note that it
    might be possible to re-work memaccess to operate on superpages, but
    this should be done in a separate patch.
---
 xen/arch/arm/p2m.c | 329 +++++++----------------------------------------------
 1 file changed, 38 insertions(+), 291 deletions(-)