[Xen-devel,RFC,16/16] xen/arm: Track page accessed between batch of Set/Way operations

Message ID 20181008183352.16291-17-julien.grall@arm.com
State New
Headers show
Series
  • xen/arm: Implement Set/Way operations
Related show

Commit Message

Julien Grall Oct. 8, 2018, 6:33 p.m.
At the moment, the implementation of Set/Way operations will go through
all the entries of the guest P2M and flush them. However, this is very
expensive and may render unusable a guest OS using them.

For instance, Linux 32-bit will use Set/Way operations during secondary
CPU bring-up. As the implementation is really expensive, it may be possible
to hit the CPU bring-up timeout.

To limit the Set/Way impact, we track what pages has been of the guest
has been accessed between batch of Set/Way operations. This is done
using bit[0] (aka valid bit) of the P2M entry.

This patch adds a new per-arch helper is introduced to perform actions just
before the guest is first unpaused. This will be used to invalidate the
P2M to track access from the start of the guest.

Signed-off-by: Julien Grall <julien.grall@arm.com>

---

Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Tim Deegan <tim@xen.org>
Cc: Wei Liu <wei.liu2@citrix.com>
---
 xen/arch/arm/domain.c       | 14 ++++++++++++++
 xen/arch/arm/domain_build.c |  7 +++++++
 xen/arch/arm/p2m.c          | 32 +++++++++++++++++++++++++++++++-
 xen/arch/x86/domain.c       |  4 ++++
 xen/common/domain.c         |  5 ++++-
 xen/include/asm-arm/p2m.h   |  2 ++
 xen/include/xen/domain.h    |  2 ++
 7 files changed, 64 insertions(+), 2 deletions(-)

Comments

Jan Beulich Oct. 9, 2018, 7:04 a.m. | #1
>>> On 08.10.18 at 20:33, <julien.grall@arm.com> wrote:
> At the moment, the implementation of Set/Way operations will go through
> all the entries of the guest P2M and flush them. However, this is very
> expensive and may render unusable a guest OS using them.
> 
> For instance, Linux 32-bit will use Set/Way operations during secondary
> CPU bring-up. As the implementation is really expensive, it may be possible
> to hit the CPU bring-up timeout.
> 
> To limit the Set/Way impact, we track what pages has been of the guest
> has been accessed between batch of Set/Way operations. This is done
> using bit[0] (aka valid bit) of the P2M entry.
> 
> This patch adds a new per-arch helper is introduced to perform actions just
> before the guest is first unpaused. This will be used to invalidate the
> P2M to track access from the start of the guest.

While I'm not opposed to the new arch hook, why don't you create the
p2m entries in their intended state right away? At the very least this
would have the benefit of confining the entire change to Arm code.

Jan
Julien Grall Oct. 9, 2018, 10:16 a.m. | #2
Hi Jan,

On 09/10/2018 08:04, Jan Beulich wrote:
>>>> On 08.10.18 at 20:33, <julien.grall@arm.com> wrote:
>> At the moment, the implementation of Set/Way operations will go through
>> all the entries of the guest P2M and flush them. However, this is very
>> expensive and may render unusable a guest OS using them.
>>
>> For instance, Linux 32-bit will use Set/Way operations during secondary
>> CPU bring-up. As the implementation is really expensive, it may be possible
>> to hit the CPU bring-up timeout.
>>
>> To limit the Set/Way impact, we track what pages has been of the guest
>> has been accessed between batch of Set/Way operations. This is done
>> using bit[0] (aka valid bit) of the P2M entry.
>>
>> This patch adds a new per-arch helper is introduced to perform actions just
>> before the guest is first unpaused. This will be used to invalidate the
>> P2M to track access from the start of the guest.
> 
> While I'm not opposed to the new arch hook, why don't you create the
> p2m entries in their intended state right away? At the very least this
> would have the benefit of confining the entire change to Arm code.

Let me start by saying I think having a hook to perform an action once 
the VM has been fully created is quite useful. For instance, this could 
be used on Arm to limit the invalidation of the icache. At the moment, 
we invalidate the icache for every populate memory hypercall. This is 
quite a waste of cycle.

In this particular circumstance, I would still like to use the hardware 
for walking the page-tables during the domain creation (i.e when copy 
binary over). This would not be possible if we create the entry with 
valid bit unset.

Furthermore, we don't need to create entry with valid bit unset once the 
guest is running. So we would need to check in the P2M code whether the 
guest is running and whether IOMMU is enabled.

So overall, I fell this is the best way to keep it simple for Arm and 
open door to speed/streamline a bit more the domain creation.

Cheers,
Jan Beulich Oct. 9, 2018, 11:43 a.m. | #3
>>> On 09.10.18 at 12:16, <julien.grall@arm.com> wrote:
> On 09/10/2018 08:04, Jan Beulich wrote:
>>>>> On 08.10.18 at 20:33, <julien.grall@arm.com> wrote:
>>> This patch adds a new per-arch helper is introduced to perform actions just
>>> before the guest is first unpaused. This will be used to invalidate the
>>> P2M to track access from the start of the guest.
>> 
>> While I'm not opposed to the new arch hook, why don't you create the
>> p2m entries in their intended state right away? At the very least this
>> would have the benefit of confining the entire change to Arm code.
> 
> Let me start by saying I think having a hook to perform an action once 
> the VM has been fully created is quite useful. For instance, this could 
> be used on Arm to limit the invalidation of the icache. At the moment, 
> we invalidate the icache for every populate memory hypercall. This is 
> quite a waste of cycle.

As said - I'm not opposed to such a hook in principle, but I'd like
to understand the reasons (and in particular whether there's an
alternative without introducing such a hook).

> In this particular circumstance, I would still like to use the hardware 
> for walking the page-tables during the domain creation (i.e when copy 
> binary over). This would not be possible if we create the entry with 
> valid bit unset.

This must be something Arm specific, since afaiu we're talking about
arbitrary domain creation here, not just Dom0. On x86 it would be
basically impossible to re-use the page tables created for the guest
to access guest memory from the control domain. (It could be made
work by inserting sub-trees into the control domain's page tables,
but obviously there would be a fair chance of conflict between the
virtual addresses the control domain uses for its own purposes and
the ones where the destination range in the domain being created
sits).

> Furthermore, we don't need to create entry with valid bit unset once the 
> guest is running. So we would need to check in the P2M code whether the 
> guest is running and whether IOMMU is enabled.

Well, looking at the patch context of your change, it is quite clear
that this would be pretty easy - simply taking d->creation_finished
into account.

Jan
Julien Grall Oct. 9, 2018, 12:24 p.m. | #4
Hi Jan,

On 09/10/2018 12:43, Jan Beulich wrote:
>>>> On 09.10.18 at 12:16, <julien.grall@arm.com> wrote:
>> On 09/10/2018 08:04, Jan Beulich wrote:
>>>>>> On 08.10.18 at 20:33, <julien.grall@arm.com> wrote:
>>>> This patch adds a new per-arch helper is introduced to perform actions just
>>>> before the guest is first unpaused. This will be used to invalidate the
>>>> P2M to track access from the start of the guest.
>>>
>>> While I'm not opposed to the new arch hook, why don't you create the
>>> p2m entries in their intended state right away? At the very least this
>>> would have the benefit of confining the entire change to Arm code.
>>
>> Let me start by saying I think having a hook to perform an action once
>> the VM has been fully created is quite useful. For instance, this could
>> be used on Arm to limit the invalidation of the icache. At the moment,
>> we invalidate the icache for every populate memory hypercall. This is
>> quite a waste of cycle.
> 
> As said - I'm not opposed to such a hook in principle, but I'd like
> to understand the reasons (and in particular whether there's an
> alternative without introducing such a hook).
> 
>> In this particular circumstance, I would still like to use the hardware
>> for walking the page-tables during the domain creation (i.e when copy
>> binary over). This would not be possible if we create the entry with
>> valid bit unset.
> 
> This must be something Arm specific, since afaiu we're talking about
> arbitrary domain creation here, not just Dom0. On x86 it would be
> basically impossible to re-use the page tables created for the guest
> to access guest memory from the control domain. (It could be made
> work by inserting sub-trees into the control domain's page tables,
> but obviously there would be a fair chance of conflict between the
> virtual addresses the control domain uses for its own purposes and
> the ones where the destination range in the domain being created
> sits).

Well we don't share sub-trees on Arm, yet have the valid set from the 
starting is still useful if you want to use the hardware for translate a 
guest address (e.g by switch between page-tables). This avoid the 
software lookup.

> 
>> Furthermore, we don't need to create entry with valid bit unset once the
>> guest is running. So we would need to check in the P2M code whether the
>> guest is running and whether IOMMU is enabled.
> 
> Well, looking at the patch context of your change, it is quite clear
> that this would be pretty easy - simply taking d->creation_finished
> into account.

I never said I couldn't use d->creation_finished. It is possible to 
spread it everywhere if we want to. But what's the point when it can be 
done in a single place?

Furthermore, as I suggested at the beginning of my previous answer, I 
can see other usage for this new hook. I am quite surprised you don't 
see any benefits on x86 too.

For instance looking at the memory subsystem, it would be possible to 
defer the TLB flush until the domain actually first run.

Cheers,
Stefano Stabellini Nov. 5, 2018, 9:35 p.m. | #5
On Mon, 8 Oct 2018, Julien Grall wrote:
> At the moment, the implementation of Set/Way operations will go through
> all the entries of the guest P2M and flush them. However, this is very
> expensive and may render unusable a guest OS using them.
> 
> For instance, Linux 32-bit will use Set/Way operations during secondary
> CPU bring-up. As the implementation is really expensive, it may be possible
> to hit the CPU bring-up timeout.
> 
> To limit the Set/Way impact, we track what pages has been of the guest
> has been accessed between batch of Set/Way operations. This is done
> using bit[0] (aka valid bit) of the P2M entry.

This is going to improve performance of ill-mannered guests at the cost
of hurting performance of well-mannered guests. Is it really a good
trade-off? Should this behavior at least be configurable with a Xen
command line?


> This patch adds a new per-arch helper is introduced to perform actions just
> before the guest is first unpaused. This will be used to invalidate the
> P2M to track access from the start of the guest.
> 
> Signed-off-by: Julien Grall <julien.grall@arm.com>
>
> ---
> 
> Cc: Stefano Stabellini <sstabellini@kernel.org>
> Cc: Julien Grall <julien.grall@arm.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> Cc: George Dunlap <George.Dunlap@eu.citrix.com>
> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Cc: Tim Deegan <tim@xen.org>
> Cc: Wei Liu <wei.liu2@citrix.com>
> ---
>  xen/arch/arm/domain.c       | 14 ++++++++++++++
>  xen/arch/arm/domain_build.c |  7 +++++++
>  xen/arch/arm/p2m.c          | 32 +++++++++++++++++++++++++++++++-
>  xen/arch/x86/domain.c       |  4 ++++
>  xen/common/domain.c         |  5 ++++-
>  xen/include/asm-arm/p2m.h   |  2 ++
>  xen/include/xen/domain.h    |  2 ++
>  7 files changed, 64 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
> index feebbf5a92..f439f4657a 100644
> --- a/xen/arch/arm/domain.c
> +++ b/xen/arch/arm/domain.c
> @@ -738,6 +738,20 @@ int arch_domain_soft_reset(struct domain *d)
>      return -ENOSYS;
>  }
>  
> +void arch_domain_creation_finished(struct domain *d)
> +{
> +    /*
> +     * To avoid flushing the whole guest RAM on the first Set/Way, we
> +     * invalidate the P2M to track what has been accessed.
> +     *
> +     * This is only turned when IOMMU is not used or the page-table are
> +     * not shared because bit[0] (e.g valid bit) unset will result
> +     * IOMMU fault that could be not fixed-up.
> +     */
> +    if ( !iommu_use_hap_pt(d) )
> +        p2m_invalidate_root(p2m_get_hostp2m(d));
> +}
> +
>  static int is_guest_pv32_psr(uint32_t psr)
>  {
>      switch (psr & PSR_MODE_MASK)
> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
> index f552154e93..de96516faa 100644
> --- a/xen/arch/arm/domain_build.c
> +++ b/xen/arch/arm/domain_build.c
> @@ -2249,6 +2249,13 @@ int __init construct_dom0(struct domain *d)
>      v->is_initialised = 1;
>      clear_bit(_VPF_down, &v->pause_flags);
>  
> +    /*
> +     * XXX: We probably want a command line option to invalidate or not
> +     * the P2M. This is because invalidating the P2M will not work with
> +     * IOMMU, however if this is not done there will be an impact.

Why can't we check on iommu_use_hap_pt(d) like in
arch_domain_creation_finished?

In any case, I agree it is a good idea to introduce a command line
parameter to toggle the p2m_invalidate_root call at domain creation
on/off. There are cases where it should be off even if an IOMMU is
present.

Aside from these two questions, the rest of the patch looks correct.


> +     */
> +    p2m_invalidate_root(p2m_get_hostp2m(d));
>
>      return 0;
>  }
>  
> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
> index a3d4c563b1..8e0c31d7ac 100644
> --- a/xen/arch/arm/p2m.c
> +++ b/xen/arch/arm/p2m.c
> @@ -1079,6 +1079,22 @@ static void p2m_invalidate_table(struct p2m_domain *p2m, mfn_t mfn)
>      p2m->need_flush = true;
>  }
>  
> +/*
> + * Invalidate all entries in the root page-tables. This is
> + * useful to get fault on entry and do an action.
> + */
> +void p2m_invalidate_root(struct p2m_domain *p2m)
> +{
> +    unsigned int i;
> +
> +    p2m_write_lock(p2m);
> +
> +    for ( i = 0; i < P2M_ROOT_LEVEL; i++ )
> +        p2m_invalidate_table(p2m, page_to_mfn(p2m->root + i));
> +
> +    p2m_write_unlock(p2m);
> +}
> +
>  bool p2m_resolve_translation_fault(struct domain *d, gfn_t gfn)
>  {
>      struct p2m_domain *p2m = p2m_get_hostp2m(d);
> @@ -1539,7 +1555,8 @@ int p2m_cache_flush_range(struct domain *d, gfn_t start, gfn_t end)
>  
>      for ( ; gfn_x(start) < gfn_x(end); start = next_gfn )
>      {
> -        mfn_t mfn = p2m_get_entry(p2m, start, &t, NULL, &order, NULL);
> +        bool valid;
> +        mfn_t mfn = p2m_get_entry(p2m, start, &t, NULL, &order, &valid);
>  
>          next_gfn = gfn_next_boundary(start, order);
>  
> @@ -1547,6 +1564,13 @@ int p2m_cache_flush_range(struct domain *d, gfn_t start, gfn_t end)
>          if ( mfn_eq(mfn, INVALID_MFN) || !p2m_is_any_ram(t) )
>              continue;
>  
> +        /*
> +         * Page with valid bit (bit [0]) unset does not need to be
> +         * cleaned
> +         */
> +        if ( !valid )
> +            continue;
> +
>          /* XXX: Implement preemption */
>          while ( gfn_x(start) < gfn_x(next_gfn) )
>          {
> @@ -1571,6 +1595,12 @@ static void p2m_flush_vm(struct vcpu *v)
>      /* XXX: Handle preemption */
>      p2m_cache_flush_range(v->domain, p2m->lowest_mapped_gfn,
>                            p2m->max_mapped_gfn);
> +
> +    /*
> +     * Invalidate the p2m to track which page was modified by the guest
> +     * between call of p2m_flush_vm().
> +     */
> +    p2m_invalidate_root(p2m);
>  }
>  
>  /*
> diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> index 9371efc8c7..2b6d1c01a1 100644
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -723,6 +723,10 @@ int arch_domain_soft_reset(struct domain *d)
>      return ret;
>  }
>  
> +void arch_domain_creation_finished(struct domain *d)
> +{
> +}
> +
>  /*
>   * These are the masks of CR4 bits (subject to hardware availability) which a
>   * PV guest may not legitimiately attempt to modify.
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index 65151e2ac4..b402c635f9 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -1100,8 +1100,11 @@ int domain_unpause_by_systemcontroller(struct domain *d)
>       * Creation is considered finished when the controller reference count
>       * first drops to 0.
>       */
> -    if ( new == 0 )
> +    if ( new == 0 && !d->creation_finished )
> +    {
>          d->creation_finished = true;
> +        arch_domain_creation_finished(d);
> +    }
>  
>      domain_unpause(d);
>  
> diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
> index c470f062db..2a4652e7f4 100644
> --- a/xen/include/asm-arm/p2m.h
> +++ b/xen/include/asm-arm/p2m.h
> @@ -225,6 +225,8 @@ int p2m_set_entry(struct p2m_domain *p2m,
>                    p2m_type_t t,
>                    p2m_access_t a);
>  
> +void p2m_invalidate_root(struct p2m_domain *p2m);
> +
>  /*
>   * Clean & invalidate caches corresponding to a region [start,end) of guest
>   * address space.
> diff --git a/xen/include/xen/domain.h b/xen/include/xen/domain.h
> index 5e393fd7f2..8d95ad4bf1 100644
> --- a/xen/include/xen/domain.h
> +++ b/xen/include/xen/domain.h
> @@ -70,6 +70,8 @@ void arch_domain_unpause(struct domain *d);
>  
>  int arch_domain_soft_reset(struct domain *d);
>  
> +void arch_domain_creation_finished(struct domain *d);
> +
>  void arch_p2m_set_access_required(struct domain *d, bool access_required);
>  
>  int arch_set_info_guest(struct vcpu *, vcpu_guest_context_u);
> -- 
> 2.11.0
>
Julien Grall Nov. 5, 2018, 11:28 p.m. | #6
Hi Stefano,

On 11/5/18 9:35 PM, Stefano Stabellini wrote:
> On Mon, 8 Oct 2018, Julien Grall wrote:
>> At the moment, the implementation of Set/Way operations will go through
>> all the entries of the guest P2M and flush them. However, this is very
>> expensive and may render unusable a guest OS using them.
>>
>> For instance, Linux 32-bit will use Set/Way operations during secondary
>> CPU bring-up. As the implementation is really expensive, it may be possible
>> to hit the CPU bring-up timeout.
>>
>> To limit the Set/Way impact, we track what pages has been of the guest
>> has been accessed between batch of Set/Way operations. This is done
>> using bit[0] (aka valid bit) of the P2M entry.
> 
> This is going to improve performance of ill-mannered guests at the cost
> of hurting performance of well-mannered guests. Is it really a good
> trade-off? Should this behavior at least be configurable with a Xen
> command line?

Well, we have the choice between not been able to boot Linux 32-bit 
anymore or have a slight impact at the boot time for all guests.

As you may have noticed the command line is been suggested below. I 
didn't yet implemented as we agreed at Connect it would be good to start 
getting feedback on it.

> 
> 
>> This patch adds a new per-arch helper is introduced to perform actions just
>> before the guest is first unpaused. This will be used to invalidate the
>> P2M to track access from the start of the guest.
>>
>> Signed-off-by: Julien Grall <julien.grall@arm.com>
>>
>> ---
>>
>> Cc: Stefano Stabellini <sstabellini@kernel.org>
>> Cc: Julien Grall <julien.grall@arm.com>
>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>> Cc: George Dunlap <George.Dunlap@eu.citrix.com>
>> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
>> Cc: Jan Beulich <jbeulich@suse.com>
>> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>> Cc: Tim Deegan <tim@xen.org>
>> Cc: Wei Liu <wei.liu2@citrix.com>
>> ---
>>   xen/arch/arm/domain.c       | 14 ++++++++++++++
>>   xen/arch/arm/domain_build.c |  7 +++++++
>>   xen/arch/arm/p2m.c          | 32 +++++++++++++++++++++++++++++++-
>>   xen/arch/x86/domain.c       |  4 ++++
>>   xen/common/domain.c         |  5 ++++-
>>   xen/include/asm-arm/p2m.h   |  2 ++
>>   xen/include/xen/domain.h    |  2 ++
>>   7 files changed, 64 insertions(+), 2 deletions(-)
>>
>> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
>> index feebbf5a92..f439f4657a 100644
>> --- a/xen/arch/arm/domain.c
>> +++ b/xen/arch/arm/domain.c
>> @@ -738,6 +738,20 @@ int arch_domain_soft_reset(struct domain *d)
>>       return -ENOSYS;
>>   }
>>   
>> +void arch_domain_creation_finished(struct domain *d)
>> +{
>> +    /*
>> +     * To avoid flushing the whole guest RAM on the first Set/Way, we
>> +     * invalidate the P2M to track what has been accessed.
>> +     *
>> +     * This is only turned when IOMMU is not used or the page-table are
>> +     * not shared because bit[0] (e.g valid bit) unset will result
>> +     * IOMMU fault that could be not fixed-up.
>> +     */
>> +    if ( !iommu_use_hap_pt(d) )
>> +        p2m_invalidate_root(p2m_get_hostp2m(d));
>> +}
>> +
>>   static int is_guest_pv32_psr(uint32_t psr)
>>   {
>>       switch (psr & PSR_MODE_MASK)
>> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
>> index f552154e93..de96516faa 100644
>> --- a/xen/arch/arm/domain_build.c
>> +++ b/xen/arch/arm/domain_build.c
>> @@ -2249,6 +2249,13 @@ int __init construct_dom0(struct domain *d)
>>       v->is_initialised = 1;
>>       clear_bit(_VPF_down, &v->pause_flags);
>>   
>> +    /*
>> +     * XXX: We probably want a command line option to invalidate or not
>> +     * the P2M. This is because invalidating the P2M will not work with
>> +     * IOMMU, however if this is not done there will be an impact.
> 
> Why can't we check on iommu_use_hap_pt(d) like in
> arch_domain_creation_finished?
> 
> In any case, I agree it is a good idea to introduce a command line
> parameter to toggle the p2m_invalidate_root call at domain creation
> on/off. There are cases where it should be off even if an IOMMU is
> present.

I actually forgot to remove that code as Dom0 should be covered by the 
change below.

I am not entirely to understand your last sentence, this feature is 
turned off when an IOMMU is present. So what is your use case here?

Cheers,
Stefano Stabellini Nov. 6, 2018, 5:43 p.m. | #7
On Mon, 5 Nov 2018, Julien Grall wrote:
> Hi Stefano,
> 
> On 11/5/18 9:35 PM, Stefano Stabellini wrote:
> > On Mon, 8 Oct 2018, Julien Grall wrote:
> > > At the moment, the implementation of Set/Way operations will go through
> > > all the entries of the guest P2M and flush them. However, this is very
> > > expensive and may render unusable a guest OS using them.
> > > 
> > > For instance, Linux 32-bit will use Set/Way operations during secondary
> > > CPU bring-up. As the implementation is really expensive, it may be
> > > possible
> > > to hit the CPU bring-up timeout.
> > > 
> > > To limit the Set/Way impact, we track what pages has been of the guest
> > > has been accessed between batch of Set/Way operations. This is done
> > > using bit[0] (aka valid bit) of the P2M entry.
> > 
> > This is going to improve performance of ill-mannered guests at the cost
> > of hurting performance of well-mannered guests. Is it really a good
> > trade-off? Should this behavior at least be configurable with a Xen
> > command line?
> 
> Well, we have the choice between not been able to boot Linux 32-bit anymore or
> have a slight impact at the boot time for all guests.

Wait -- I thought that with the set/way emulation introduced by patch
#15 we would be able to boot Linux 32-bit already. This patch is a
performance improvement. Or is it actually needed to boot Linux 32-bit?


> As you may have noticed the command line is been suggested below. I didn't yet
> implemented as we agreed at Connect it would be good to start getting feedback
> on it.

Sure. I was thinking about this -- does it make sense to have different
defaults for 32bit and 64bit guests?

32bit -> default p2m_invalidate_root
64bit -> default not


> > 
> > > This patch adds a new per-arch helper is introduced to perform actions
> > > just
> > > before the guest is first unpaused. This will be used to invalidate the
> > > P2M to track access from the start of the guest.
> > > 
> > > Signed-off-by: Julien Grall <julien.grall@arm.com>
> > > 
> > > ---
> > > 
> > > Cc: Stefano Stabellini <sstabellini@kernel.org>
> > > Cc: Julien Grall <julien.grall@arm.com>
> > > Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> > > Cc: George Dunlap <George.Dunlap@eu.citrix.com>
> > > Cc: Ian Jackson <ian.jackson@eu.citrix.com>
> > > Cc: Jan Beulich <jbeulich@suse.com>
> > > Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > > Cc: Tim Deegan <tim@xen.org>
> > > Cc: Wei Liu <wei.liu2@citrix.com>
> > > ---
> > >   xen/arch/arm/domain.c       | 14 ++++++++++++++
> > >   xen/arch/arm/domain_build.c |  7 +++++++
> > >   xen/arch/arm/p2m.c          | 32 +++++++++++++++++++++++++++++++-
> > >   xen/arch/x86/domain.c       |  4 ++++
> > >   xen/common/domain.c         |  5 ++++-
> > >   xen/include/asm-arm/p2m.h   |  2 ++
> > >   xen/include/xen/domain.h    |  2 ++
> > >   7 files changed, 64 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
> > > index feebbf5a92..f439f4657a 100644
> > > --- a/xen/arch/arm/domain.c
> > > +++ b/xen/arch/arm/domain.c
> > > @@ -738,6 +738,20 @@ int arch_domain_soft_reset(struct domain *d)
> > >       return -ENOSYS;
> > >   }
> > >   +void arch_domain_creation_finished(struct domain *d)
> > > +{
> > > +    /*
> > > +     * To avoid flushing the whole guest RAM on the first Set/Way, we
> > > +     * invalidate the P2M to track what has been accessed.
> > > +     *
> > > +     * This is only turned when IOMMU is not used or the page-table are
> > > +     * not shared because bit[0] (e.g valid bit) unset will result
> > > +     * IOMMU fault that could be not fixed-up.
> > > +     */
> > > +    if ( !iommu_use_hap_pt(d) )
> > > +        p2m_invalidate_root(p2m_get_hostp2m(d));
> > > +}
> > > +
> > >   static int is_guest_pv32_psr(uint32_t psr)
> > >   {
> > >       switch (psr & PSR_MODE_MASK)
> > > diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
> > > index f552154e93..de96516faa 100644
> > > --- a/xen/arch/arm/domain_build.c
> > > +++ b/xen/arch/arm/domain_build.c
> > > @@ -2249,6 +2249,13 @@ int __init construct_dom0(struct domain *d)
> > >       v->is_initialised = 1;
> > >       clear_bit(_VPF_down, &v->pause_flags);
> > >   +    /*
> > > +     * XXX: We probably want a command line option to invalidate or not
> > > +     * the P2M. This is because invalidating the P2M will not work with
> > > +     * IOMMU, however if this is not done there will be an impact.
> > 
> > Why can't we check on iommu_use_hap_pt(d) like in
> > arch_domain_creation_finished?
> > 
> > In any case, I agree it is a good idea to introduce a command line
> > parameter to toggle the p2m_invalidate_root call at domain creation
> > on/off. There are cases where it should be off even if an IOMMU is
> > present.
> 
> I actually forgot to remove that code as Dom0 should be covered by the change
> below.

Makes sense now


> I am not entirely to understand your last sentence, this feature is turned off
> when an IOMMU is present. So what is your use case here?
 
My sentence was badly written, sorry. I meant to say that even when an
IOMMU is NOT present, there are cases where we might not want to call
p2m_invalidate_root.
Julien Grall Nov. 6, 2018, 6:10 p.m. | #8
On 06/11/2018 17:43, Stefano Stabellini wrote:
> On Mon, 5 Nov 2018, Julien Grall wrote:
>> Hi Stefano,
>>
>> On 11/5/18 9:35 PM, Stefano Stabellini wrote:
>>> On Mon, 8 Oct 2018, Julien Grall wrote:
>>>> At the moment, the implementation of Set/Way operations will go through
>>>> all the entries of the guest P2M and flush them. However, this is very
>>>> expensive and may render unusable a guest OS using them.
>>>>
>>>> For instance, Linux 32-bit will use Set/Way operations during secondary
>>>> CPU bring-up. As the implementation is really expensive, it may be
>>>> possible
>>>> to hit the CPU bring-up timeout.
>>>>
>>>> To limit the Set/Way impact, we track what pages has been of the guest
>>>> has been accessed between batch of Set/Way operations. This is done
>>>> using bit[0] (aka valid bit) of the P2M entry.
>>>
>>> This is going to improve performance of ill-mannered guests at the cost
>>> of hurting performance of well-mannered guests. Is it really a good
>>> trade-off? Should this behavior at least be configurable with a Xen
>>> command line?
>>
>> Well, we have the choice between not been able to boot Linux 32-bit anymore or
>> have a slight impact at the boot time for all guests.
> 
> Wait -- I thought that with the set/way emulation introduced by patch
> #15 we would be able to boot Linux 32-bit already. This patch is a
> performance improvement. Or is it actually needed to boot Linux 32-bit?

The problem is Linux 32-bit calls a few time set/way during secondary CPU bring 
up. It also has a timeout of 1s to fully boot that CPU. In my testing, I can 
easily hit the timeout even with a small amount of memory.

If we don't start tracking the page from the beginning, then you would need to 
clean the full RAM the first time. If you start to track from the beginning, you 
may just have to clean a couple of MB.

> 
> 
>> As you may have noticed the command line is been suggested below. I didn't yet
>> implemented as we agreed at Connect it would be good to start getting feedback
>> on it.
> 
> Sure. I was thinking about this -- does it make sense to have different
> defaults for 32bit and 64bit guests?
> 
> 32bit -> default p2m_invalidate_root
> 64bit -> default not

The decision is not that easy. While Linux arm64 does not contain set/way 
anymore, UEFI still use them (I checked it a couple of months ago).

I haven't done any benchmark yet. That's my next step.

>> I am not entirely to understand your last sentence, this feature is turned off
>> when an IOMMU is present. So what is your use case here?
>   
> My sentence was badly written, sorry. I meant to say that even when an
> IOMMU is NOT present, there are cases where we might not want to call
> p2m_invalidate_root.

Before implementing a command line option (or even plumbing that to the guest 
configuration), I want to understand what is the real impact on the boot time 
for "normal" guest.

Cheers,
Stefano Stabellini Nov. 6, 2018, 6:41 p.m. | #9
On Tue, 6 Nov 2018, Julien Grall wrote:
> On 06/11/2018 17:43, Stefano Stabellini wrote:
> > On Mon, 5 Nov 2018, Julien Grall wrote:
> > > Hi Stefano,
> > > 
> > > On 11/5/18 9:35 PM, Stefano Stabellini wrote:
> > > > On Mon, 8 Oct 2018, Julien Grall wrote:
> > > > > At the moment, the implementation of Set/Way operations will go
> > > > > through
> > > > > all the entries of the guest P2M and flush them. However, this is very
> > > > > expensive and may render unusable a guest OS using them.
> > > > > 
> > > > > For instance, Linux 32-bit will use Set/Way operations during
> > > > > secondary
> > > > > CPU bring-up. As the implementation is really expensive, it may be
> > > > > possible
> > > > > to hit the CPU bring-up timeout.
> > > > > 
> > > > > To limit the Set/Way impact, we track what pages has been of the guest
> > > > > has been accessed between batch of Set/Way operations. This is done
> > > > > using bit[0] (aka valid bit) of the P2M entry.
> > > > 
> > > > This is going to improve performance of ill-mannered guests at the cost
> > > > of hurting performance of well-mannered guests. Is it really a good
> > > > trade-off? Should this behavior at least be configurable with a Xen
> > > > command line?
> > > 
> > > Well, we have the choice between not been able to boot Linux 32-bit
> > > anymore or
> > > have a slight impact at the boot time for all guests.
> > 
> > Wait -- I thought that with the set/way emulation introduced by patch
> > #15 we would be able to boot Linux 32-bit already. This patch is a
> > performance improvement. Or is it actually needed to boot Linux 32-bit?
> 
> The problem is Linux 32-bit calls a few time set/way during secondary CPU
> bring up. It also has a timeout of 1s to fully boot that CPU. In my testing, I
> can easily hit the timeout even with a small amount of memory.
 
Damn! It is worst than I thought.


> If we don't start tracking the page from the beginning, then you would need to
> clean the full RAM the first time. If you start to track from the beginning,
> you may just have to clean a couple of MB.
>
> > 
> > > As you may have noticed the command line is been suggested below. I didn't
> > > yet
> > > implemented as we agreed at Connect it would be good to start getting
> > > feedback
> > > on it.
> > 
> > Sure. I was thinking about this -- does it make sense to have different
> > defaults for 32bit and 64bit guests?
> > 
> > 32bit -> default p2m_invalidate_root
> > 64bit -> default not
> 
> The decision is not that easy. While Linux arm64 does not contain set/way
> anymore, UEFI still use them (I checked it a couple of months ago).
> 
> I haven't done any benchmark yet. That's my next step.

OK, makes sense


> > > I am not entirely to understand your last sentence, this feature is turned
> > > off
> > > when an IOMMU is present. So what is your use case here?
> >   My sentence was badly written, sorry. I meant to say that even when an
> > IOMMU is NOT present, there are cases where we might not want to call
> > p2m_invalidate_root.
> 
> Before implementing a command line option (or even plumbing that to the guest
> configuration), I want to understand what is the real impact on the boot time
> for "normal" guest.

yes, makes sense

Patch

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index feebbf5a92..f439f4657a 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -738,6 +738,20 @@  int arch_domain_soft_reset(struct domain *d)
     return -ENOSYS;
 }
 
+void arch_domain_creation_finished(struct domain *d)
+{
+    /*
+     * To avoid flushing the whole guest RAM on the first Set/Way, we
+     * invalidate the P2M to track what has been accessed.
+     *
+     * This is only turned when IOMMU is not used or the page-table are
+     * not shared because bit[0] (e.g valid bit) unset will result
+     * IOMMU fault that could be not fixed-up.
+     */
+    if ( !iommu_use_hap_pt(d) )
+        p2m_invalidate_root(p2m_get_hostp2m(d));
+}
+
 static int is_guest_pv32_psr(uint32_t psr)
 {
     switch (psr & PSR_MODE_MASK)
diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index f552154e93..de96516faa 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -2249,6 +2249,13 @@  int __init construct_dom0(struct domain *d)
     v->is_initialised = 1;
     clear_bit(_VPF_down, &v->pause_flags);
 
+    /*
+     * XXX: We probably want a command line option to invalidate or not
+     * the P2M. This is because invalidating the P2M will not work with
+     * IOMMU, however if this is not done there will be an impact.
+     */
+    p2m_invalidate_root(p2m_get_hostp2m(d));
+
     return 0;
 }
 
diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index a3d4c563b1..8e0c31d7ac 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -1079,6 +1079,22 @@  static void p2m_invalidate_table(struct p2m_domain *p2m, mfn_t mfn)
     p2m->need_flush = true;
 }
 
+/*
+ * Invalidate all entries in the root page-tables. This is
+ * useful to get fault on entry and do an action.
+ */
+void p2m_invalidate_root(struct p2m_domain *p2m)
+{
+    unsigned int i;
+
+    p2m_write_lock(p2m);
+
+    for ( i = 0; i < P2M_ROOT_LEVEL; i++ )
+        p2m_invalidate_table(p2m, page_to_mfn(p2m->root + i));
+
+    p2m_write_unlock(p2m);
+}
+
 bool p2m_resolve_translation_fault(struct domain *d, gfn_t gfn)
 {
     struct p2m_domain *p2m = p2m_get_hostp2m(d);
@@ -1539,7 +1555,8 @@  int p2m_cache_flush_range(struct domain *d, gfn_t start, gfn_t end)
 
     for ( ; gfn_x(start) < gfn_x(end); start = next_gfn )
     {
-        mfn_t mfn = p2m_get_entry(p2m, start, &t, NULL, &order, NULL);
+        bool valid;
+        mfn_t mfn = p2m_get_entry(p2m, start, &t, NULL, &order, &valid);
 
         next_gfn = gfn_next_boundary(start, order);
 
@@ -1547,6 +1564,13 @@  int p2m_cache_flush_range(struct domain *d, gfn_t start, gfn_t end)
         if ( mfn_eq(mfn, INVALID_MFN) || !p2m_is_any_ram(t) )
             continue;
 
+        /*
+         * Page with valid bit (bit [0]) unset does not need to be
+         * cleaned
+         */
+        if ( !valid )
+            continue;
+
         /* XXX: Implement preemption */
         while ( gfn_x(start) < gfn_x(next_gfn) )
         {
@@ -1571,6 +1595,12 @@  static void p2m_flush_vm(struct vcpu *v)
     /* XXX: Handle preemption */
     p2m_cache_flush_range(v->domain, p2m->lowest_mapped_gfn,
                           p2m->max_mapped_gfn);
+
+    /*
+     * Invalidate the p2m to track which page was modified by the guest
+     * between call of p2m_flush_vm().
+     */
+    p2m_invalidate_root(p2m);
 }
 
 /*
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 9371efc8c7..2b6d1c01a1 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -723,6 +723,10 @@  int arch_domain_soft_reset(struct domain *d)
     return ret;
 }
 
+void arch_domain_creation_finished(struct domain *d)
+{
+}
+
 /*
  * These are the masks of CR4 bits (subject to hardware availability) which a
  * PV guest may not legitimiately attempt to modify.
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 65151e2ac4..b402c635f9 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1100,8 +1100,11 @@  int domain_unpause_by_systemcontroller(struct domain *d)
      * Creation is considered finished when the controller reference count
      * first drops to 0.
      */
-    if ( new == 0 )
+    if ( new == 0 && !d->creation_finished )
+    {
         d->creation_finished = true;
+        arch_domain_creation_finished(d);
+    }
 
     domain_unpause(d);
 
diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
index c470f062db..2a4652e7f4 100644
--- a/xen/include/asm-arm/p2m.h
+++ b/xen/include/asm-arm/p2m.h
@@ -225,6 +225,8 @@  int p2m_set_entry(struct p2m_domain *p2m,
                   p2m_type_t t,
                   p2m_access_t a);
 
+void p2m_invalidate_root(struct p2m_domain *p2m);
+
 /*
  * Clean & invalidate caches corresponding to a region [start,end) of guest
  * address space.
diff --git a/xen/include/xen/domain.h b/xen/include/xen/domain.h
index 5e393fd7f2..8d95ad4bf1 100644
--- a/xen/include/xen/domain.h
+++ b/xen/include/xen/domain.h
@@ -70,6 +70,8 @@  void arch_domain_unpause(struct domain *d);
 
 int arch_domain_soft_reset(struct domain *d);
 
+void arch_domain_creation_finished(struct domain *d);
+
 void arch_p2m_set_access_required(struct domain *d, bool access_required);
 
 int arch_set_info_guest(struct vcpu *, vcpu_guest_context_u);