Message ID | 20181008183352.16291-17-julien.grall@arm.com |
---|---|
State | New |
Headers | show |
Series | xen/arm: Implement Set/Way operations | expand |
>>> On 08.10.18 at 20:33, <julien.grall@arm.com> wrote: > At the moment, the implementation of Set/Way operations will go through > all the entries of the guest P2M and flush them. However, this is very > expensive and may render unusable a guest OS using them. > > For instance, Linux 32-bit will use Set/Way operations during secondary > CPU bring-up. As the implementation is really expensive, it may be possible > to hit the CPU bring-up timeout. > > To limit the Set/Way impact, we track what pages has been of the guest > has been accessed between batch of Set/Way operations. This is done > using bit[0] (aka valid bit) of the P2M entry. > > This patch adds a new per-arch helper is introduced to perform actions just > before the guest is first unpaused. This will be used to invalidate the > P2M to track access from the start of the guest. While I'm not opposed to the new arch hook, why don't you create the p2m entries in their intended state right away? At the very least this would have the benefit of confining the entire change to Arm code. Jan
Hi Jan, On 09/10/2018 08:04, Jan Beulich wrote: >>>> On 08.10.18 at 20:33, <julien.grall@arm.com> wrote: >> At the moment, the implementation of Set/Way operations will go through >> all the entries of the guest P2M and flush them. However, this is very >> expensive and may render unusable a guest OS using them. >> >> For instance, Linux 32-bit will use Set/Way operations during secondary >> CPU bring-up. As the implementation is really expensive, it may be possible >> to hit the CPU bring-up timeout. >> >> To limit the Set/Way impact, we track what pages has been of the guest >> has been accessed between batch of Set/Way operations. This is done >> using bit[0] (aka valid bit) of the P2M entry. >> >> This patch adds a new per-arch helper is introduced to perform actions just >> before the guest is first unpaused. This will be used to invalidate the >> P2M to track access from the start of the guest. > > While I'm not opposed to the new arch hook, why don't you create the > p2m entries in their intended state right away? At the very least this > would have the benefit of confining the entire change to Arm code. Let me start by saying I think having a hook to perform an action once the VM has been fully created is quite useful. For instance, this could be used on Arm to limit the invalidation of the icache. At the moment, we invalidate the icache for every populate memory hypercall. This is quite a waste of cycle. In this particular circumstance, I would still like to use the hardware for walking the page-tables during the domain creation (i.e when copy binary over). This would not be possible if we create the entry with valid bit unset. Furthermore, we don't need to create entry with valid bit unset once the guest is running. So we would need to check in the P2M code whether the guest is running and whether IOMMU is enabled. So overall, I fell this is the best way to keep it simple for Arm and open door to speed/streamline a bit more the domain creation. Cheers,
>>> On 09.10.18 at 12:16, <julien.grall@arm.com> wrote: > On 09/10/2018 08:04, Jan Beulich wrote: >>>>> On 08.10.18 at 20:33, <julien.grall@arm.com> wrote: >>> This patch adds a new per-arch helper is introduced to perform actions just >>> before the guest is first unpaused. This will be used to invalidate the >>> P2M to track access from the start of the guest. >> >> While I'm not opposed to the new arch hook, why don't you create the >> p2m entries in their intended state right away? At the very least this >> would have the benefit of confining the entire change to Arm code. > > Let me start by saying I think having a hook to perform an action once > the VM has been fully created is quite useful. For instance, this could > be used on Arm to limit the invalidation of the icache. At the moment, > we invalidate the icache for every populate memory hypercall. This is > quite a waste of cycle. As said - I'm not opposed to such a hook in principle, but I'd like to understand the reasons (and in particular whether there's an alternative without introducing such a hook). > In this particular circumstance, I would still like to use the hardware > for walking the page-tables during the domain creation (i.e when copy > binary over). This would not be possible if we create the entry with > valid bit unset. This must be something Arm specific, since afaiu we're talking about arbitrary domain creation here, not just Dom0. On x86 it would be basically impossible to re-use the page tables created for the guest to access guest memory from the control domain. (It could be made work by inserting sub-trees into the control domain's page tables, but obviously there would be a fair chance of conflict between the virtual addresses the control domain uses for its own purposes and the ones where the destination range in the domain being created sits). > Furthermore, we don't need to create entry with valid bit unset once the > guest is running. So we would need to check in the P2M code whether the > guest is running and whether IOMMU is enabled. Well, looking at the patch context of your change, it is quite clear that this would be pretty easy - simply taking d->creation_finished into account. Jan
Hi Jan, On 09/10/2018 12:43, Jan Beulich wrote: >>>> On 09.10.18 at 12:16, <julien.grall@arm.com> wrote: >> On 09/10/2018 08:04, Jan Beulich wrote: >>>>>> On 08.10.18 at 20:33, <julien.grall@arm.com> wrote: >>>> This patch adds a new per-arch helper is introduced to perform actions just >>>> before the guest is first unpaused. This will be used to invalidate the >>>> P2M to track access from the start of the guest. >>> >>> While I'm not opposed to the new arch hook, why don't you create the >>> p2m entries in their intended state right away? At the very least this >>> would have the benefit of confining the entire change to Arm code. >> >> Let me start by saying I think having a hook to perform an action once >> the VM has been fully created is quite useful. For instance, this could >> be used on Arm to limit the invalidation of the icache. At the moment, >> we invalidate the icache for every populate memory hypercall. This is >> quite a waste of cycle. > > As said - I'm not opposed to such a hook in principle, but I'd like > to understand the reasons (and in particular whether there's an > alternative without introducing such a hook). > >> In this particular circumstance, I would still like to use the hardware >> for walking the page-tables during the domain creation (i.e when copy >> binary over). This would not be possible if we create the entry with >> valid bit unset. > > This must be something Arm specific, since afaiu we're talking about > arbitrary domain creation here, not just Dom0. On x86 it would be > basically impossible to re-use the page tables created for the guest > to access guest memory from the control domain. (It could be made > work by inserting sub-trees into the control domain's page tables, > but obviously there would be a fair chance of conflict between the > virtual addresses the control domain uses for its own purposes and > the ones where the destination range in the domain being created > sits). Well we don't share sub-trees on Arm, yet have the valid set from the starting is still useful if you want to use the hardware for translate a guest address (e.g by switch between page-tables). This avoid the software lookup. > >> Furthermore, we don't need to create entry with valid bit unset once the >> guest is running. So we would need to check in the P2M code whether the >> guest is running and whether IOMMU is enabled. > > Well, looking at the patch context of your change, it is quite clear > that this would be pretty easy - simply taking d->creation_finished > into account. I never said I couldn't use d->creation_finished. It is possible to spread it everywhere if we want to. But what's the point when it can be done in a single place? Furthermore, as I suggested at the beginning of my previous answer, I can see other usage for this new hook. I am quite surprised you don't see any benefits on x86 too. For instance looking at the memory subsystem, it would be possible to defer the TLB flush until the domain actually first run. Cheers,
On Mon, 8 Oct 2018, Julien Grall wrote: > At the moment, the implementation of Set/Way operations will go through > all the entries of the guest P2M and flush them. However, this is very > expensive and may render unusable a guest OS using them. > > For instance, Linux 32-bit will use Set/Way operations during secondary > CPU bring-up. As the implementation is really expensive, it may be possible > to hit the CPU bring-up timeout. > > To limit the Set/Way impact, we track what pages has been of the guest > has been accessed between batch of Set/Way operations. This is done > using bit[0] (aka valid bit) of the P2M entry. This is going to improve performance of ill-mannered guests at the cost of hurting performance of well-mannered guests. Is it really a good trade-off? Should this behavior at least be configurable with a Xen command line? > This patch adds a new per-arch helper is introduced to perform actions just > before the guest is first unpaused. This will be used to invalidate the > P2M to track access from the start of the guest. > > Signed-off-by: Julien Grall <julien.grall@arm.com> > > --- > > Cc: Stefano Stabellini <sstabellini@kernel.org> > Cc: Julien Grall <julien.grall@arm.com> > Cc: Andrew Cooper <andrew.cooper3@citrix.com> > Cc: George Dunlap <George.Dunlap@eu.citrix.com> > Cc: Ian Jackson <ian.jackson@eu.citrix.com> > Cc: Jan Beulich <jbeulich@suse.com> > Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> > Cc: Tim Deegan <tim@xen.org> > Cc: Wei Liu <wei.liu2@citrix.com> > --- > xen/arch/arm/domain.c | 14 ++++++++++++++ > xen/arch/arm/domain_build.c | 7 +++++++ > xen/arch/arm/p2m.c | 32 +++++++++++++++++++++++++++++++- > xen/arch/x86/domain.c | 4 ++++ > xen/common/domain.c | 5 ++++- > xen/include/asm-arm/p2m.h | 2 ++ > xen/include/xen/domain.h | 2 ++ > 7 files changed, 64 insertions(+), 2 deletions(-) > > diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c > index feebbf5a92..f439f4657a 100644 > --- a/xen/arch/arm/domain.c > +++ b/xen/arch/arm/domain.c > @@ -738,6 +738,20 @@ int arch_domain_soft_reset(struct domain *d) > return -ENOSYS; > } > > +void arch_domain_creation_finished(struct domain *d) > +{ > + /* > + * To avoid flushing the whole guest RAM on the first Set/Way, we > + * invalidate the P2M to track what has been accessed. > + * > + * This is only turned when IOMMU is not used or the page-table are > + * not shared because bit[0] (e.g valid bit) unset will result > + * IOMMU fault that could be not fixed-up. > + */ > + if ( !iommu_use_hap_pt(d) ) > + p2m_invalidate_root(p2m_get_hostp2m(d)); > +} > + > static int is_guest_pv32_psr(uint32_t psr) > { > switch (psr & PSR_MODE_MASK) > diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c > index f552154e93..de96516faa 100644 > --- a/xen/arch/arm/domain_build.c > +++ b/xen/arch/arm/domain_build.c > @@ -2249,6 +2249,13 @@ int __init construct_dom0(struct domain *d) > v->is_initialised = 1; > clear_bit(_VPF_down, &v->pause_flags); > > + /* > + * XXX: We probably want a command line option to invalidate or not > + * the P2M. This is because invalidating the P2M will not work with > + * IOMMU, however if this is not done there will be an impact. Why can't we check on iommu_use_hap_pt(d) like in arch_domain_creation_finished? In any case, I agree it is a good idea to introduce a command line parameter to toggle the p2m_invalidate_root call at domain creation on/off. There are cases where it should be off even if an IOMMU is present. Aside from these two questions, the rest of the patch looks correct. > + */ > + p2m_invalidate_root(p2m_get_hostp2m(d)); > > return 0; > } > > diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c > index a3d4c563b1..8e0c31d7ac 100644 > --- a/xen/arch/arm/p2m.c > +++ b/xen/arch/arm/p2m.c > @@ -1079,6 +1079,22 @@ static void p2m_invalidate_table(struct p2m_domain *p2m, mfn_t mfn) > p2m->need_flush = true; > } > > +/* > + * Invalidate all entries in the root page-tables. This is > + * useful to get fault on entry and do an action. > + */ > +void p2m_invalidate_root(struct p2m_domain *p2m) > +{ > + unsigned int i; > + > + p2m_write_lock(p2m); > + > + for ( i = 0; i < P2M_ROOT_LEVEL; i++ ) > + p2m_invalidate_table(p2m, page_to_mfn(p2m->root + i)); > + > + p2m_write_unlock(p2m); > +} > + > bool p2m_resolve_translation_fault(struct domain *d, gfn_t gfn) > { > struct p2m_domain *p2m = p2m_get_hostp2m(d); > @@ -1539,7 +1555,8 @@ int p2m_cache_flush_range(struct domain *d, gfn_t start, gfn_t end) > > for ( ; gfn_x(start) < gfn_x(end); start = next_gfn ) > { > - mfn_t mfn = p2m_get_entry(p2m, start, &t, NULL, &order, NULL); > + bool valid; > + mfn_t mfn = p2m_get_entry(p2m, start, &t, NULL, &order, &valid); > > next_gfn = gfn_next_boundary(start, order); > > @@ -1547,6 +1564,13 @@ int p2m_cache_flush_range(struct domain *d, gfn_t start, gfn_t end) > if ( mfn_eq(mfn, INVALID_MFN) || !p2m_is_any_ram(t) ) > continue; > > + /* > + * Page with valid bit (bit [0]) unset does not need to be > + * cleaned > + */ > + if ( !valid ) > + continue; > + > /* XXX: Implement preemption */ > while ( gfn_x(start) < gfn_x(next_gfn) ) > { > @@ -1571,6 +1595,12 @@ static void p2m_flush_vm(struct vcpu *v) > /* XXX: Handle preemption */ > p2m_cache_flush_range(v->domain, p2m->lowest_mapped_gfn, > p2m->max_mapped_gfn); > + > + /* > + * Invalidate the p2m to track which page was modified by the guest > + * between call of p2m_flush_vm(). > + */ > + p2m_invalidate_root(p2m); > } > > /* > diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c > index 9371efc8c7..2b6d1c01a1 100644 > --- a/xen/arch/x86/domain.c > +++ b/xen/arch/x86/domain.c > @@ -723,6 +723,10 @@ int arch_domain_soft_reset(struct domain *d) > return ret; > } > > +void arch_domain_creation_finished(struct domain *d) > +{ > +} > + > /* > * These are the masks of CR4 bits (subject to hardware availability) which a > * PV guest may not legitimiately attempt to modify. > diff --git a/xen/common/domain.c b/xen/common/domain.c > index 65151e2ac4..b402c635f9 100644 > --- a/xen/common/domain.c > +++ b/xen/common/domain.c > @@ -1100,8 +1100,11 @@ int domain_unpause_by_systemcontroller(struct domain *d) > * Creation is considered finished when the controller reference count > * first drops to 0. > */ > - if ( new == 0 ) > + if ( new == 0 && !d->creation_finished ) > + { > d->creation_finished = true; > + arch_domain_creation_finished(d); > + } > > domain_unpause(d); > > diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h > index c470f062db..2a4652e7f4 100644 > --- a/xen/include/asm-arm/p2m.h > +++ b/xen/include/asm-arm/p2m.h > @@ -225,6 +225,8 @@ int p2m_set_entry(struct p2m_domain *p2m, > p2m_type_t t, > p2m_access_t a); > > +void p2m_invalidate_root(struct p2m_domain *p2m); > + > /* > * Clean & invalidate caches corresponding to a region [start,end) of guest > * address space. > diff --git a/xen/include/xen/domain.h b/xen/include/xen/domain.h > index 5e393fd7f2..8d95ad4bf1 100644 > --- a/xen/include/xen/domain.h > +++ b/xen/include/xen/domain.h > @@ -70,6 +70,8 @@ void arch_domain_unpause(struct domain *d); > > int arch_domain_soft_reset(struct domain *d); > > +void arch_domain_creation_finished(struct domain *d); > + > void arch_p2m_set_access_required(struct domain *d, bool access_required); > > int arch_set_info_guest(struct vcpu *, vcpu_guest_context_u); > -- > 2.11.0 >
Hi Stefano, On 11/5/18 9:35 PM, Stefano Stabellini wrote: > On Mon, 8 Oct 2018, Julien Grall wrote: >> At the moment, the implementation of Set/Way operations will go through >> all the entries of the guest P2M and flush them. However, this is very >> expensive and may render unusable a guest OS using them. >> >> For instance, Linux 32-bit will use Set/Way operations during secondary >> CPU bring-up. As the implementation is really expensive, it may be possible >> to hit the CPU bring-up timeout. >> >> To limit the Set/Way impact, we track what pages has been of the guest >> has been accessed between batch of Set/Way operations. This is done >> using bit[0] (aka valid bit) of the P2M entry. > > This is going to improve performance of ill-mannered guests at the cost > of hurting performance of well-mannered guests. Is it really a good > trade-off? Should this behavior at least be configurable with a Xen > command line? Well, we have the choice between not been able to boot Linux 32-bit anymore or have a slight impact at the boot time for all guests. As you may have noticed the command line is been suggested below. I didn't yet implemented as we agreed at Connect it would be good to start getting feedback on it. > > >> This patch adds a new per-arch helper is introduced to perform actions just >> before the guest is first unpaused. This will be used to invalidate the >> P2M to track access from the start of the guest. >> >> Signed-off-by: Julien Grall <julien.grall@arm.com> >> >> --- >> >> Cc: Stefano Stabellini <sstabellini@kernel.org> >> Cc: Julien Grall <julien.grall@arm.com> >> Cc: Andrew Cooper <andrew.cooper3@citrix.com> >> Cc: George Dunlap <George.Dunlap@eu.citrix.com> >> Cc: Ian Jackson <ian.jackson@eu.citrix.com> >> Cc: Jan Beulich <jbeulich@suse.com> >> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> >> Cc: Tim Deegan <tim@xen.org> >> Cc: Wei Liu <wei.liu2@citrix.com> >> --- >> xen/arch/arm/domain.c | 14 ++++++++++++++ >> xen/arch/arm/domain_build.c | 7 +++++++ >> xen/arch/arm/p2m.c | 32 +++++++++++++++++++++++++++++++- >> xen/arch/x86/domain.c | 4 ++++ >> xen/common/domain.c | 5 ++++- >> xen/include/asm-arm/p2m.h | 2 ++ >> xen/include/xen/domain.h | 2 ++ >> 7 files changed, 64 insertions(+), 2 deletions(-) >> >> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c >> index feebbf5a92..f439f4657a 100644 >> --- a/xen/arch/arm/domain.c >> +++ b/xen/arch/arm/domain.c >> @@ -738,6 +738,20 @@ int arch_domain_soft_reset(struct domain *d) >> return -ENOSYS; >> } >> >> +void arch_domain_creation_finished(struct domain *d) >> +{ >> + /* >> + * To avoid flushing the whole guest RAM on the first Set/Way, we >> + * invalidate the P2M to track what has been accessed. >> + * >> + * This is only turned when IOMMU is not used or the page-table are >> + * not shared because bit[0] (e.g valid bit) unset will result >> + * IOMMU fault that could be not fixed-up. >> + */ >> + if ( !iommu_use_hap_pt(d) ) >> + p2m_invalidate_root(p2m_get_hostp2m(d)); >> +} >> + >> static int is_guest_pv32_psr(uint32_t psr) >> { >> switch (psr & PSR_MODE_MASK) >> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c >> index f552154e93..de96516faa 100644 >> --- a/xen/arch/arm/domain_build.c >> +++ b/xen/arch/arm/domain_build.c >> @@ -2249,6 +2249,13 @@ int __init construct_dom0(struct domain *d) >> v->is_initialised = 1; >> clear_bit(_VPF_down, &v->pause_flags); >> >> + /* >> + * XXX: We probably want a command line option to invalidate or not >> + * the P2M. This is because invalidating the P2M will not work with >> + * IOMMU, however if this is not done there will be an impact. > > Why can't we check on iommu_use_hap_pt(d) like in > arch_domain_creation_finished? > > In any case, I agree it is a good idea to introduce a command line > parameter to toggle the p2m_invalidate_root call at domain creation > on/off. There are cases where it should be off even if an IOMMU is > present. I actually forgot to remove that code as Dom0 should be covered by the change below. I am not entirely to understand your last sentence, this feature is turned off when an IOMMU is present. So what is your use case here? Cheers,
On Mon, 5 Nov 2018, Julien Grall wrote: > Hi Stefano, > > On 11/5/18 9:35 PM, Stefano Stabellini wrote: > > On Mon, 8 Oct 2018, Julien Grall wrote: > > > At the moment, the implementation of Set/Way operations will go through > > > all the entries of the guest P2M and flush them. However, this is very > > > expensive and may render unusable a guest OS using them. > > > > > > For instance, Linux 32-bit will use Set/Way operations during secondary > > > CPU bring-up. As the implementation is really expensive, it may be > > > possible > > > to hit the CPU bring-up timeout. > > > > > > To limit the Set/Way impact, we track what pages has been of the guest > > > has been accessed between batch of Set/Way operations. This is done > > > using bit[0] (aka valid bit) of the P2M entry. > > > > This is going to improve performance of ill-mannered guests at the cost > > of hurting performance of well-mannered guests. Is it really a good > > trade-off? Should this behavior at least be configurable with a Xen > > command line? > > Well, we have the choice between not been able to boot Linux 32-bit anymore or > have a slight impact at the boot time for all guests. Wait -- I thought that with the set/way emulation introduced by patch #15 we would be able to boot Linux 32-bit already. This patch is a performance improvement. Or is it actually needed to boot Linux 32-bit? > As you may have noticed the command line is been suggested below. I didn't yet > implemented as we agreed at Connect it would be good to start getting feedback > on it. Sure. I was thinking about this -- does it make sense to have different defaults for 32bit and 64bit guests? 32bit -> default p2m_invalidate_root 64bit -> default not > > > > > This patch adds a new per-arch helper is introduced to perform actions > > > just > > > before the guest is first unpaused. This will be used to invalidate the > > > P2M to track access from the start of the guest. > > > > > > Signed-off-by: Julien Grall <julien.grall@arm.com> > > > > > > --- > > > > > > Cc: Stefano Stabellini <sstabellini@kernel.org> > > > Cc: Julien Grall <julien.grall@arm.com> > > > Cc: Andrew Cooper <andrew.cooper3@citrix.com> > > > Cc: George Dunlap <George.Dunlap@eu.citrix.com> > > > Cc: Ian Jackson <ian.jackson@eu.citrix.com> > > > Cc: Jan Beulich <jbeulich@suse.com> > > > Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> > > > Cc: Tim Deegan <tim@xen.org> > > > Cc: Wei Liu <wei.liu2@citrix.com> > > > --- > > > xen/arch/arm/domain.c | 14 ++++++++++++++ > > > xen/arch/arm/domain_build.c | 7 +++++++ > > > xen/arch/arm/p2m.c | 32 +++++++++++++++++++++++++++++++- > > > xen/arch/x86/domain.c | 4 ++++ > > > xen/common/domain.c | 5 ++++- > > > xen/include/asm-arm/p2m.h | 2 ++ > > > xen/include/xen/domain.h | 2 ++ > > > 7 files changed, 64 insertions(+), 2 deletions(-) > > > > > > diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c > > > index feebbf5a92..f439f4657a 100644 > > > --- a/xen/arch/arm/domain.c > > > +++ b/xen/arch/arm/domain.c > > > @@ -738,6 +738,20 @@ int arch_domain_soft_reset(struct domain *d) > > > return -ENOSYS; > > > } > > > +void arch_domain_creation_finished(struct domain *d) > > > +{ > > > + /* > > > + * To avoid flushing the whole guest RAM on the first Set/Way, we > > > + * invalidate the P2M to track what has been accessed. > > > + * > > > + * This is only turned when IOMMU is not used or the page-table are > > > + * not shared because bit[0] (e.g valid bit) unset will result > > > + * IOMMU fault that could be not fixed-up. > > > + */ > > > + if ( !iommu_use_hap_pt(d) ) > > > + p2m_invalidate_root(p2m_get_hostp2m(d)); > > > +} > > > + > > > static int is_guest_pv32_psr(uint32_t psr) > > > { > > > switch (psr & PSR_MODE_MASK) > > > diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c > > > index f552154e93..de96516faa 100644 > > > --- a/xen/arch/arm/domain_build.c > > > +++ b/xen/arch/arm/domain_build.c > > > @@ -2249,6 +2249,13 @@ int __init construct_dom0(struct domain *d) > > > v->is_initialised = 1; > > > clear_bit(_VPF_down, &v->pause_flags); > > > + /* > > > + * XXX: We probably want a command line option to invalidate or not > > > + * the P2M. This is because invalidating the P2M will not work with > > > + * IOMMU, however if this is not done there will be an impact. > > > > Why can't we check on iommu_use_hap_pt(d) like in > > arch_domain_creation_finished? > > > > In any case, I agree it is a good idea to introduce a command line > > parameter to toggle the p2m_invalidate_root call at domain creation > > on/off. There are cases where it should be off even if an IOMMU is > > present. > > I actually forgot to remove that code as Dom0 should be covered by the change > below. Makes sense now > I am not entirely to understand your last sentence, this feature is turned off > when an IOMMU is present. So what is your use case here? My sentence was badly written, sorry. I meant to say that even when an IOMMU is NOT present, there are cases where we might not want to call p2m_invalidate_root.
On 06/11/2018 17:43, Stefano Stabellini wrote: > On Mon, 5 Nov 2018, Julien Grall wrote: >> Hi Stefano, >> >> On 11/5/18 9:35 PM, Stefano Stabellini wrote: >>> On Mon, 8 Oct 2018, Julien Grall wrote: >>>> At the moment, the implementation of Set/Way operations will go through >>>> all the entries of the guest P2M and flush them. However, this is very >>>> expensive and may render unusable a guest OS using them. >>>> >>>> For instance, Linux 32-bit will use Set/Way operations during secondary >>>> CPU bring-up. As the implementation is really expensive, it may be >>>> possible >>>> to hit the CPU bring-up timeout. >>>> >>>> To limit the Set/Way impact, we track what pages has been of the guest >>>> has been accessed between batch of Set/Way operations. This is done >>>> using bit[0] (aka valid bit) of the P2M entry. >>> >>> This is going to improve performance of ill-mannered guests at the cost >>> of hurting performance of well-mannered guests. Is it really a good >>> trade-off? Should this behavior at least be configurable with a Xen >>> command line? >> >> Well, we have the choice between not been able to boot Linux 32-bit anymore or >> have a slight impact at the boot time for all guests. > > Wait -- I thought that with the set/way emulation introduced by patch > #15 we would be able to boot Linux 32-bit already. This patch is a > performance improvement. Or is it actually needed to boot Linux 32-bit? The problem is Linux 32-bit calls a few time set/way during secondary CPU bring up. It also has a timeout of 1s to fully boot that CPU. In my testing, I can easily hit the timeout even with a small amount of memory. If we don't start tracking the page from the beginning, then you would need to clean the full RAM the first time. If you start to track from the beginning, you may just have to clean a couple of MB. > > >> As you may have noticed the command line is been suggested below. I didn't yet >> implemented as we agreed at Connect it would be good to start getting feedback >> on it. > > Sure. I was thinking about this -- does it make sense to have different > defaults for 32bit and 64bit guests? > > 32bit -> default p2m_invalidate_root > 64bit -> default not The decision is not that easy. While Linux arm64 does not contain set/way anymore, UEFI still use them (I checked it a couple of months ago). I haven't done any benchmark yet. That's my next step. >> I am not entirely to understand your last sentence, this feature is turned off >> when an IOMMU is present. So what is your use case here? > > My sentence was badly written, sorry. I meant to say that even when an > IOMMU is NOT present, there are cases where we might not want to call > p2m_invalidate_root. Before implementing a command line option (or even plumbing that to the guest configuration), I want to understand what is the real impact on the boot time for "normal" guest. Cheers,
On Tue, 6 Nov 2018, Julien Grall wrote: > On 06/11/2018 17:43, Stefano Stabellini wrote: > > On Mon, 5 Nov 2018, Julien Grall wrote: > > > Hi Stefano, > > > > > > On 11/5/18 9:35 PM, Stefano Stabellini wrote: > > > > On Mon, 8 Oct 2018, Julien Grall wrote: > > > > > At the moment, the implementation of Set/Way operations will go > > > > > through > > > > > all the entries of the guest P2M and flush them. However, this is very > > > > > expensive and may render unusable a guest OS using them. > > > > > > > > > > For instance, Linux 32-bit will use Set/Way operations during > > > > > secondary > > > > > CPU bring-up. As the implementation is really expensive, it may be > > > > > possible > > > > > to hit the CPU bring-up timeout. > > > > > > > > > > To limit the Set/Way impact, we track what pages has been of the guest > > > > > has been accessed between batch of Set/Way operations. This is done > > > > > using bit[0] (aka valid bit) of the P2M entry. > > > > > > > > This is going to improve performance of ill-mannered guests at the cost > > > > of hurting performance of well-mannered guests. Is it really a good > > > > trade-off? Should this behavior at least be configurable with a Xen > > > > command line? > > > > > > Well, we have the choice between not been able to boot Linux 32-bit > > > anymore or > > > have a slight impact at the boot time for all guests. > > > > Wait -- I thought that with the set/way emulation introduced by patch > > #15 we would be able to boot Linux 32-bit already. This patch is a > > performance improvement. Or is it actually needed to boot Linux 32-bit? > > The problem is Linux 32-bit calls a few time set/way during secondary CPU > bring up. It also has a timeout of 1s to fully boot that CPU. In my testing, I > can easily hit the timeout even with a small amount of memory. Damn! It is worst than I thought. > If we don't start tracking the page from the beginning, then you would need to > clean the full RAM the first time. If you start to track from the beginning, > you may just have to clean a couple of MB. > > > > > > As you may have noticed the command line is been suggested below. I didn't > > > yet > > > implemented as we agreed at Connect it would be good to start getting > > > feedback > > > on it. > > > > Sure. I was thinking about this -- does it make sense to have different > > defaults for 32bit and 64bit guests? > > > > 32bit -> default p2m_invalidate_root > > 64bit -> default not > > The decision is not that easy. While Linux arm64 does not contain set/way > anymore, UEFI still use them (I checked it a couple of months ago). > > I haven't done any benchmark yet. That's my next step. OK, makes sense > > > I am not entirely to understand your last sentence, this feature is turned > > > off > > > when an IOMMU is present. So what is your use case here? > > My sentence was badly written, sorry. I meant to say that even when an > > IOMMU is NOT present, there are cases where we might not want to call > > p2m_invalidate_root. > > Before implementing a command line option (or even plumbing that to the guest > configuration), I want to understand what is the real impact on the boot time > for "normal" guest. yes, makes sense
diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c index feebbf5a92..f439f4657a 100644 --- a/xen/arch/arm/domain.c +++ b/xen/arch/arm/domain.c @@ -738,6 +738,20 @@ int arch_domain_soft_reset(struct domain *d) return -ENOSYS; } +void arch_domain_creation_finished(struct domain *d) +{ + /* + * To avoid flushing the whole guest RAM on the first Set/Way, we + * invalidate the P2M to track what has been accessed. + * + * This is only turned when IOMMU is not used or the page-table are + * not shared because bit[0] (e.g valid bit) unset will result + * IOMMU fault that could be not fixed-up. + */ + if ( !iommu_use_hap_pt(d) ) + p2m_invalidate_root(p2m_get_hostp2m(d)); +} + static int is_guest_pv32_psr(uint32_t psr) { switch (psr & PSR_MODE_MASK) diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c index f552154e93..de96516faa 100644 --- a/xen/arch/arm/domain_build.c +++ b/xen/arch/arm/domain_build.c @@ -2249,6 +2249,13 @@ int __init construct_dom0(struct domain *d) v->is_initialised = 1; clear_bit(_VPF_down, &v->pause_flags); + /* + * XXX: We probably want a command line option to invalidate or not + * the P2M. This is because invalidating the P2M will not work with + * IOMMU, however if this is not done there will be an impact. + */ + p2m_invalidate_root(p2m_get_hostp2m(d)); + return 0; } diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c index a3d4c563b1..8e0c31d7ac 100644 --- a/xen/arch/arm/p2m.c +++ b/xen/arch/arm/p2m.c @@ -1079,6 +1079,22 @@ static void p2m_invalidate_table(struct p2m_domain *p2m, mfn_t mfn) p2m->need_flush = true; } +/* + * Invalidate all entries in the root page-tables. This is + * useful to get fault on entry and do an action. + */ +void p2m_invalidate_root(struct p2m_domain *p2m) +{ + unsigned int i; + + p2m_write_lock(p2m); + + for ( i = 0; i < P2M_ROOT_LEVEL; i++ ) + p2m_invalidate_table(p2m, page_to_mfn(p2m->root + i)); + + p2m_write_unlock(p2m); +} + bool p2m_resolve_translation_fault(struct domain *d, gfn_t gfn) { struct p2m_domain *p2m = p2m_get_hostp2m(d); @@ -1539,7 +1555,8 @@ int p2m_cache_flush_range(struct domain *d, gfn_t start, gfn_t end) for ( ; gfn_x(start) < gfn_x(end); start = next_gfn ) { - mfn_t mfn = p2m_get_entry(p2m, start, &t, NULL, &order, NULL); + bool valid; + mfn_t mfn = p2m_get_entry(p2m, start, &t, NULL, &order, &valid); next_gfn = gfn_next_boundary(start, order); @@ -1547,6 +1564,13 @@ int p2m_cache_flush_range(struct domain *d, gfn_t start, gfn_t end) if ( mfn_eq(mfn, INVALID_MFN) || !p2m_is_any_ram(t) ) continue; + /* + * Page with valid bit (bit [0]) unset does not need to be + * cleaned + */ + if ( !valid ) + continue; + /* XXX: Implement preemption */ while ( gfn_x(start) < gfn_x(next_gfn) ) { @@ -1571,6 +1595,12 @@ static void p2m_flush_vm(struct vcpu *v) /* XXX: Handle preemption */ p2m_cache_flush_range(v->domain, p2m->lowest_mapped_gfn, p2m->max_mapped_gfn); + + /* + * Invalidate the p2m to track which page was modified by the guest + * between call of p2m_flush_vm(). + */ + p2m_invalidate_root(p2m); } /* diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index 9371efc8c7..2b6d1c01a1 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -723,6 +723,10 @@ int arch_domain_soft_reset(struct domain *d) return ret; } +void arch_domain_creation_finished(struct domain *d) +{ +} + /* * These are the masks of CR4 bits (subject to hardware availability) which a * PV guest may not legitimiately attempt to modify. diff --git a/xen/common/domain.c b/xen/common/domain.c index 65151e2ac4..b402c635f9 100644 --- a/xen/common/domain.c +++ b/xen/common/domain.c @@ -1100,8 +1100,11 @@ int domain_unpause_by_systemcontroller(struct domain *d) * Creation is considered finished when the controller reference count * first drops to 0. */ - if ( new == 0 ) + if ( new == 0 && !d->creation_finished ) + { d->creation_finished = true; + arch_domain_creation_finished(d); + } domain_unpause(d); diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h index c470f062db..2a4652e7f4 100644 --- a/xen/include/asm-arm/p2m.h +++ b/xen/include/asm-arm/p2m.h @@ -225,6 +225,8 @@ int p2m_set_entry(struct p2m_domain *p2m, p2m_type_t t, p2m_access_t a); +void p2m_invalidate_root(struct p2m_domain *p2m); + /* * Clean & invalidate caches corresponding to a region [start,end) of guest * address space. diff --git a/xen/include/xen/domain.h b/xen/include/xen/domain.h index 5e393fd7f2..8d95ad4bf1 100644 --- a/xen/include/xen/domain.h +++ b/xen/include/xen/domain.h @@ -70,6 +70,8 @@ void arch_domain_unpause(struct domain *d); int arch_domain_soft_reset(struct domain *d); +void arch_domain_creation_finished(struct domain *d); + void arch_p2m_set_access_required(struct domain *d, bool access_required); int arch_set_info_guest(struct vcpu *, vcpu_guest_context_u);
At the moment, the implementation of Set/Way operations will go through all the entries of the guest P2M and flush them. However, this is very expensive and may render unusable a guest OS using them. For instance, Linux 32-bit will use Set/Way operations during secondary CPU bring-up. As the implementation is really expensive, it may be possible to hit the CPU bring-up timeout. To limit the Set/Way impact, we track what pages has been of the guest has been accessed between batch of Set/Way operations. This is done using bit[0] (aka valid bit) of the P2M entry. This patch adds a new per-arch helper is introduced to perform actions just before the guest is first unpaused. This will be used to invalidate the P2M to track access from the start of the guest. Signed-off-by: Julien Grall <julien.grall@arm.com> --- Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Julien Grall <julien.grall@arm.com> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: George Dunlap <George.Dunlap@eu.citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Jan Beulich <jbeulich@suse.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Tim Deegan <tim@xen.org> Cc: Wei Liu <wei.liu2@citrix.com> --- xen/arch/arm/domain.c | 14 ++++++++++++++ xen/arch/arm/domain_build.c | 7 +++++++ xen/arch/arm/p2m.c | 32 +++++++++++++++++++++++++++++++- xen/arch/x86/domain.c | 4 ++++ xen/common/domain.c | 5 ++++- xen/include/asm-arm/p2m.h | 2 ++ xen/include/xen/domain.h | 2 ++ 7 files changed, 64 insertions(+), 2 deletions(-)