[v2,07/10] xen/arm: Introduce relinquish_p2m_mapping to remove refcount every mapped page

Message ID 1386560047-17500-8-git-send-email-julien.grall@linaro.org
State Superseded
Headers show

Commit Message

Julien Grall Dec. 9, 2013, 3:34 a.m.
This function will be called when the domain relinquishes its memory.
It removes refcount on every mapped page to a valid MFN.

Currently, Xen doesn't take refcount on every new mapping but only for foreign
mapping. Restrict the function only on foreign mapping.

Signed-off-by: Julien Grall <julien.grall@linaro.org>

---
    Changes in v2:
        - Introduce the patch
---
 xen/arch/arm/domain.c        |    5 +++++
 xen/arch/arm/p2m.c           |   47 ++++++++++++++++++++++++++++++++++++++++++
 xen/include/asm-arm/domain.h |    1 +
 xen/include/asm-arm/p2m.h    |   15 ++++++++++++++
 4 files changed, 68 insertions(+)

Comments

Ian Campbell Dec. 9, 2013, 4:28 p.m. | #1
On Mon, 2013-12-09 at 03:34 +0000, Julien Grall wrote:
> This function will be called when the domain relinquishes its memory.
> It removes refcount on every mapped page to a valid MFN.
> 
> Currently, Xen doesn't take refcount on every new mapping but only for foreign
> mapping. Restrict the function only on foreign mapping.

Skimming the remainder of the patch's titles and recalling a previous
conversation the intention is not to extend this for 4.4, correct?

> 
> Signed-off-by: Julien Grall <julien.grall@linaro.org>
> 
> ---
>     Changes in v2:
>         - Introduce the patch
> ---
>  xen/arch/arm/domain.c        |    5 +++++
>  xen/arch/arm/p2m.c           |   47 ++++++++++++++++++++++++++++++++++++++++++
>  xen/include/asm-arm/domain.h |    1 +
>  xen/include/asm-arm/p2m.h    |   15 ++++++++++++++
>  4 files changed, 68 insertions(+)
> 
> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
> index 1590708..e7c2f67 100644
> --- a/xen/arch/arm/domain.c
> +++ b/xen/arch/arm/domain.c
> @@ -717,6 +717,11 @@ int domain_relinquish_resources(struct domain *d)
>          if ( ret )
>              return ret;
>  
> +    case RELMEM_mapping:

Something somewhere should be setting d->arch.relmem = RELMEM_mapping at
the appropriate time. (immediately above I think?)

You also want a "Fallthrough" comment just above.

> +        ret = relinquish_p2m_mapping(d);
> +        if ( ret )
> +            return ret;
> +
>          d->arch.relmem = RELMEM_done;
>          /* Fallthrough */
>  
> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
> index f0bbaca..dbd6a06 100644
> --- a/xen/arch/arm/p2m.c
> +++ b/xen/arch/arm/p2m.c
> @@ -6,6 +6,7 @@
>  #include <xen/bitops.h>
>  #include <asm/flushtlb.h>
>  #include <asm/gic.h>
> +#include <asm/event.h>
>  
>  /* First level P2M is 2 consecutive pages */
>  #define P2M_FIRST_ORDER 1
> @@ -320,6 +321,16 @@ static int create_p2m_entries(struct domain *d,
>              flush_tlb_all_local();
>      }
>  
> +    if ( (t == p2m_ram_rw) || (t == p2m_ram_ro) || (t == p2m_map_foreign))
> +    {
> +        unsigned long sgfn = paddr_to_pfn(start_gpaddr);
> +        unsigned long egfn = paddr_to_pfn(end_gpaddr);
> +
> +        p2m->max_mapped_gfn = MAX(p2m->max_mapped_gfn, egfn);
> +        /* Use next_gfn_to_relinquish to store the lowest gfn mapped */
> +        p2m->next_gfn_to_relinquish = MIN(p2m->next_gfn_to_relinquish, sgfn);
> +    }
> +
>      rc = 0;
>  
>  out:
> @@ -503,12 +514,48 @@ int p2m_init(struct domain *d)
>  
>      p2m->first_level = NULL;
>  
> +    p2m->max_mapped_gfn = 0;
> +    p2m->next_gfn_to_relinquish = ULONG_MAX;
> +
>  err:
>      spin_unlock(&p2m->lock);
>  
>      return rc;
>  }
>  
> +int relinquish_p2m_mapping(struct domain *d)
> +{
> +    struct p2m_domain *p2m = &d->arch.p2m;
> +    unsigned long gfn, count = 0;
> +    int rc = 0;
> +
> +    for ( gfn = p2m->next_gfn_to_relinquish;
> +          gfn < p2m->max_mapped_gfn; gfn++ )

I know that Tim has been keen to get rid of these sorts of loops on x86,
and with good reason I think.

> +    {
> +        p2m_type_t t;
> +        paddr_t p = p2m_lookup(d, gfn, &t);

This does the full walk for each address, even though 2/3 of the levels
are more than likely identical to the previous gfn.

It would be better to do a full walk, which sadly will look a lot like
p2m_lookup, no avoiding that I think.

You can still resume the walk based on next_gfn_to_relinquish and bound
it on max_mapped_gfn, although I don't think it is strictly necessary.

> +        unsigned long mfn = p >> PAGE_SHIFT;
> +
> +        if ( mfn_valid(mfn) && p2m_is_foreign(t) )

I think it would be worth reiterating in a comment that we only take a
ref for foreign mappings right now.
> +        {
> +            put_page(mfn_to_page(mfn));
> +            guest_physmap_remove_page(d, gfn, mfn, 0);

You should unmap it and then put it I think.

Is this going to do yet another lookup/walk?

The REMOVE case of create_p2m_entries is so trivial you could open code
it here, or if you wanted to you could refactor it into a helper.

I am wondering if the conditional put page ought to be at the point of
removal (i.e. in the helper) rather than here. (I think Tim made a
similar comment on the x86 version of the remove_from_physmap pvh
patches, you probably need to match the generic change which that
implies)

BTW, if you do the clear (but not the put_page) for every entry then the
present bit (or pte.bits == 0) might be a useful proxy for
next_lN_entry? I suppose even walking, say, 4GB of mostly empty P2M is
not going to be super cheap.

> +        }
> +
> +        count++;
> +
> +        /* Preempt every 2MiB. Arbitrary */
> +        if ( (count == 512) && hypercall_preempt_check() )
> +        {
> +            p2m->next_gfn_to_relinquish = gfn + 1;
> +            rc = -EAGAIN;

I think I'm just failing to find it, but where is the call to
hypercall_create_continuation? I suppose it is somewhere way up the
stack?

I'm not sure the count == 512 is needed -- hypercall_preempt_check
should be sufficient?

> +            break;
> +        }
> +    }
> +
> +    return rc;
> +}
> +
>  unsigned long gmfn_to_mfn(struct domain *d, unsigned long gpfn)
>  {
>      paddr_t p = p2m_lookup(d, pfn_to_paddr(gpfn), NULL);
> diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
> index 922eda3..4a4c018 100644
> --- a/xen/include/asm-arm/domain.h
> +++ b/xen/include/asm-arm/domain.h
> @@ -75,6 +75,7 @@ struct arch_domain
>          RELMEM_not_started,
>          RELMEM_xen,
>          RELMEM_page,
> +        RELMEM_mapping,
>          RELMEM_done,
>      } relmem;
>  
> diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
> index b63204d..b0d3aea 100644
> --- a/xen/include/asm-arm/p2m.h
> +++ b/xen/include/asm-arm/p2m.h
> @@ -18,6 +18,15 @@ struct p2m_domain {
>  
>      /* Current VMID in use */
>      uint8_t vmid;
> +
> +    /* Highest guest frame that's ever been mapped in the p2m
> +     * Take only into account ram and foreign mapping
> +     */
> +    unsigned long max_mapped_gfn;
> +
> +    /* When releasing mapped gfn's in a preemptible manner, recall where
> +     * to resume the search */
> +    unsigned long next_gfn_to_relinquish;
>  };
>  
>  /* List of possible type for each page in the p2m
> @@ -48,6 +57,12 @@ int p2m_init(struct domain *d);
>  /* Return all the p2m resources to Xen. */
>  void p2m_teardown(struct domain *d);
>  
> +/* Remove mapping refcount on each mapping page in the p2m
> + *
> + * TODO: For the moment only foreign mapping is handled
> + */
> +int relinquish_p2m_mapping(struct domain *d);
> +
>  /* Allocate a new p2m table for a domain.
>   *
>   * Returns 0 for success or -errno.
Julien Grall Dec. 10, 2013, 1:31 a.m. | #2
On 12/09/2013 04:28 PM, Ian Campbell wrote:
> On Mon, 2013-12-09 at 03:34 +0000, Julien Grall wrote:
>> This function will be called when the domain relinquishes its memory.
>> It removes refcount on every mapped page to a valid MFN.
>>
>> Currently, Xen doesn't take refcount on every new mapping but only for foreign
>> mapping. Restrict the function only on foreign mapping.
>
> Skimming the remainder of the patch's titles and recalling a previous
> conversation the intention is not to extend this for 4.4, correct?

Right, it's too big for Xen 4.4.

>>
>> Signed-off-by: Julien Grall <julien.grall@linaro.org>
>>
>> ---
>>      Changes in v2:
>>          - Introduce the patch
>> ---
>>   xen/arch/arm/domain.c        |    5 +++++
>>   xen/arch/arm/p2m.c           |   47 ++++++++++++++++++++++++++++++++++++++++++
>>   xen/include/asm-arm/domain.h |    1 +
>>   xen/include/asm-arm/p2m.h    |   15 ++++++++++++++
>>   4 files changed, 68 insertions(+)
>>
>> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
>> index 1590708..e7c2f67 100644
>> --- a/xen/arch/arm/domain.c
>> +++ b/xen/arch/arm/domain.c
>> @@ -717,6 +717,11 @@ int domain_relinquish_resources(struct domain *d)
>>           if ( ret )
>>               return ret;
>>
>> +    case RELMEM_mapping:
>
> Something somewhere should be setting d->arch.relmem = RELMEM_mapping at
> the appropriate time. (immediately above I think?)
>
> You also want a "Fallthrough" comment just above.

Oops, I will update the patch for the next version.

>> +        ret = relinquish_p2m_mapping(d);
>> +        if ( ret )
>> +            return ret;
>> +
>>           d->arch.relmem = RELMEM_done;
>>           /* Fallthrough */
>>
>> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
>> index f0bbaca..dbd6a06 100644
>> --- a/xen/arch/arm/p2m.c
>> +++ b/xen/arch/arm/p2m.c
>> @@ -6,6 +6,7 @@
>>   #include <xen/bitops.h>
>>   #include <asm/flushtlb.h>
>>   #include <asm/gic.h>
>> +#include <asm/event.h>
>>
>>   /* First level P2M is 2 consecutive pages */
>>   #define P2M_FIRST_ORDER 1
>> @@ -320,6 +321,16 @@ static int create_p2m_entries(struct domain *d,
>>               flush_tlb_all_local();
>>       }
>>
>> +    if ( (t == p2m_ram_rw) || (t == p2m_ram_ro) || (t == p2m_map_foreign))
>> +    {
>> +        unsigned long sgfn = paddr_to_pfn(start_gpaddr);
>> +        unsigned long egfn = paddr_to_pfn(end_gpaddr);
>> +
>> +        p2m->max_mapped_gfn = MAX(p2m->max_mapped_gfn, egfn);
>> +        /* Use next_gfn_to_relinquish to store the lowest gfn mapped */
>> +        p2m->next_gfn_to_relinquish = MIN(p2m->next_gfn_to_relinquish, sgfn);
>> +    }
>> +
>>       rc = 0;
>>
>>   out:
>> @@ -503,12 +514,48 @@ int p2m_init(struct domain *d)
>>
>>       p2m->first_level = NULL;
>>
>> +    p2m->max_mapped_gfn = 0;
>> +    p2m->next_gfn_to_relinquish = ULONG_MAX;
>> +
>>   err:
>>       spin_unlock(&p2m->lock);
>>
>>       return rc;
>>   }
>>
>> +int relinquish_p2m_mapping(struct domain *d)
>> +{
>> +    struct p2m_domain *p2m = &d->arch.p2m;
>> +    unsigned long gfn, count = 0;
>> +    int rc = 0;
>> +
>> +    for ( gfn = p2m->next_gfn_to_relinquish;
>> +          gfn < p2m->max_mapped_gfn; gfn++ )
>
> I know that Tim has been keen to get rid of these sorts of loops on x86,
> and with good reason I think.
>
>> +    {
>> +        p2m_type_t t;
>> +        paddr_t p = p2m_lookup(d, gfn, &t);
>
> This does the full walk for each address, even though 2/3 of the levels
> are more than likely identical to the previous gfn.
>
> It would be better to do a full walk, which sadly will look a lot like
> p2m_lookup, no avoiding that I think.
>
> You can still resume the walk based on next_gfn_to_relinquish and bound
> it on max_mapped_gfn, although I don't think it is strictly necessary.
>> +        unsigned long mfn = p >> PAGE_SHIFT;
>> +
>> +        if ( mfn_valid(mfn) && p2m_is_foreign(t) )
>
> I think it would be worth reiterating in a comment that we only take a
> ref for foreign mappings right now.

Will do.

>> +        {
>> +            put_page(mfn_to_page(mfn));
>> +            guest_physmap_remove_page(d, gfn, mfn, 0);
>
> You should unmap it and then put it I think.
>
> Is this going to do yet another lookup/walk?
>
> The REMOVE case of create_p2m_entries is so trivial you could open code
> it here, or if you wanted to you could refactor it into a helper.
>
> I am wondering if the conditional put page ought to be at the point of
> removal (i.e. in the helper) rather than here. (I think Tim made a
> similar comment on the x86 version of the remove_from_physmap pvh
> patches, you probably need to match the generic change which that
> implies)
>
> BTW, if you do the clear (but not the put_page) for every entry then the
> present bit (or pte.bits == 0) might be a useful proxy for
> next_lN_entry? I suppose even walking, say, 4GB of mostly empty P2M is
> not going to be super cheap.

Following the discution we had IRL, I will try to extend 
create_p2m_entries by adding RELINQUISH option.

>> +        }
>> +
>> +        count++;
>> +
>> +        /* Preempt every 2MiB. Arbitrary */
>> +        if ( (count == 512) && hypercall_preempt_check() )
>> +        {
>> +            p2m->next_gfn_to_relinquish = gfn + 1;
>> +            rc = -EAGAIN;
>
> I think I'm just failing to find it, but where is the call to
> hypercall_create_continuation? I suppose it is somewhere way up the
> stack?

Actually rc = -EAGAIN will be catched by xc_domain_destroy in libxc. The 
function will call again the hypercall if needed.

I'm not sure why... do you have any idea why x86 also uses this trick?

> I'm not sure the count == 512 is needed -- hypercall_preempt_check
> should be sufficient?

On ARM hypercall_preempt_check is a bit complex. Checking every loop 
seems a bit overkill.
Julien Grall Dec. 10, 2013, 2:36 a.m. | #3
On 12/10/2013 01:31 AM, Julien Grall wrote:
>
>
> On 12/09/2013 04:28 PM, Ian Campbell wrote:
>> On Mon, 2013-12-09 at 03:34 +0000, Julien Grall wrote:
>>> This function will be called when the domain relinquishes its memory.
>>> It removes refcount on every mapped page to a valid MFN.
>>>
>>> Currently, Xen doesn't take refcount on every new mapping but only
>>> for foreign
>>> mapping. Restrict the function only on foreign mapping.
>>
>> Skimming the remainder of the patch's titles and recalling a previous
>> conversation the intention is not to extend this for 4.4, correct?
>
> Right, it's too big for Xen 4.4.
>
>>>
>>> Signed-off-by: Julien Grall <julien.grall@linaro.org>
>>>
>>> ---
>>>      Changes in v2:
>>>          - Introduce the patch
>>> ---
>>>   xen/arch/arm/domain.c        |    5 +++++
>>>   xen/arch/arm/p2m.c           |   47
>>> ++++++++++++++++++++++++++++++++++++++++++
>>>   xen/include/asm-arm/domain.h |    1 +
>>>   xen/include/asm-arm/p2m.h    |   15 ++++++++++++++
>>>   4 files changed, 68 insertions(+)
>>>
>>> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
>>> index 1590708..e7c2f67 100644
>>> --- a/xen/arch/arm/domain.c
>>> +++ b/xen/arch/arm/domain.c
>>> @@ -717,6 +717,11 @@ int domain_relinquish_resources(struct domain *d)
>>>           if ( ret )
>>>               return ret;
>>>
>>> +    case RELMEM_mapping:
>>
>> Something somewhere should be setting d->arch.relmem = RELMEM_mapping at
>> the appropriate time. (immediately above I think?)
>>
>> You also want a "Fallthrough" comment just above.
>
> Oops, I will update the patch for the next version.
>
>>> +        ret = relinquish_p2m_mapping(d);
>>> +        if ( ret )
>>> +            return ret;
>>> +
>>>           d->arch.relmem = RELMEM_done;
>>>           /* Fallthrough */
>>>
>>> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
>>> index f0bbaca..dbd6a06 100644
>>> --- a/xen/arch/arm/p2m.c
>>> +++ b/xen/arch/arm/p2m.c
>>> @@ -6,6 +6,7 @@
>>>   #include <xen/bitops.h>
>>>   #include <asm/flushtlb.h>
>>>   #include <asm/gic.h>
>>> +#include <asm/event.h>
>>>
>>>   /* First level P2M is 2 consecutive pages */
>>>   #define P2M_FIRST_ORDER 1
>>> @@ -320,6 +321,16 @@ static int create_p2m_entries(struct domain *d,
>>>               flush_tlb_all_local();
>>>       }
>>>
>>> +    if ( (t == p2m_ram_rw) || (t == p2m_ram_ro) || (t ==
>>> p2m_map_foreign))
>>> +    {
>>> +        unsigned long sgfn = paddr_to_pfn(start_gpaddr);
>>> +        unsigned long egfn = paddr_to_pfn(end_gpaddr);
>>> +
>>> +        p2m->max_mapped_gfn = MAX(p2m->max_mapped_gfn, egfn);
>>> +        /* Use next_gfn_to_relinquish to store the lowest gfn mapped */
>>> +        p2m->next_gfn_to_relinquish =
>>> MIN(p2m->next_gfn_to_relinquish, sgfn);
>>> +    }
>>> +
>>>       rc = 0;
>>>
>>>   out:
>>> @@ -503,12 +514,48 @@ int p2m_init(struct domain *d)
>>>
>>>       p2m->first_level = NULL;
>>>
>>> +    p2m->max_mapped_gfn = 0;
>>> +    p2m->next_gfn_to_relinquish = ULONG_MAX;
>>> +
>>>   err:
>>>       spin_unlock(&p2m->lock);
>>>
>>>       return rc;
>>>   }
>>>
>>> +int relinquish_p2m_mapping(struct domain *d)
>>> +{
>>> +    struct p2m_domain *p2m = &d->arch.p2m;
>>> +    unsigned long gfn, count = 0;
>>> +    int rc = 0;
>>> +
>>> +    for ( gfn = p2m->next_gfn_to_relinquish;
>>> +          gfn < p2m->max_mapped_gfn; gfn++ )
>>
>> I know that Tim has been keen to get rid of these sorts of loops on x86,
>> and with good reason I think.
>>
>>> +    {
>>> +        p2m_type_t t;
>>> +        paddr_t p = p2m_lookup(d, gfn, &t);
>>
>> This does the full walk for each address, even though 2/3 of the levels
>> are more than likely identical to the previous gfn.
>>
>> It would be better to do a full walk, which sadly will look a lot like
>> p2m_lookup, no avoiding that I think.
>>
>> You can still resume the walk based on next_gfn_to_relinquish and bound
>> it on max_mapped_gfn, although I don't think it is strictly necessary.
>>> +        unsigned long mfn = p >> PAGE_SHIFT;
>>> +
>>> +        if ( mfn_valid(mfn) && p2m_is_foreign(t) )
>>
>> I think it would be worth reiterating in a comment that we only take a
>> ref for foreign mappings right now.
>
> Will do.
>
>>> +        {
>>> +            put_page(mfn_to_page(mfn));
>>> +            guest_physmap_remove_page(d, gfn, mfn, 0);
>>
>> You should unmap it and then put it I think.
>>
>> Is this going to do yet another lookup/walk?
>>
>> The REMOVE case of create_p2m_entries is so trivial you could open code
>> it here, or if you wanted to you could refactor it into a helper.
>>
>> I am wondering if the conditional put page ought to be at the point of
>> removal (i.e. in the helper) rather than here. (I think Tim made a
>> similar comment on the x86 version of the remove_from_physmap pvh
>> patches, you probably need to match the generic change which that
>> implies)
>>
>> BTW, if you do the clear (but not the put_page) for every entry then the
>> present bit (or pte.bits == 0) might be a useful proxy for
>> next_lN_entry? I suppose even walking, say, 4GB of mostly empty P2M is
>> not going to be super cheap.
>
> Following the discution we had IRL, I will try to extend
> create_p2m_entries by adding RELINQUISH option.
>
>>> +        }
>>> +
>>> +        count++;
>>> +
>>> +        /* Preempt every 2MiB. Arbitrary */
>>> +        if ( (count == 512) && hypercall_preempt_check() )
>>> +        {
>>> +            p2m->next_gfn_to_relinquish = gfn + 1;
>>> +            rc = -EAGAIN;
>>
>> I think I'm just failing to find it, but where is the call to
>> hypercall_create_continuation? I suppose it is somewhere way up the
>> stack?
>
> Actually rc = -EAGAIN will be catched by xc_domain_destroy in libxc. The
> function will call again the hypercall if needed.
>
> I'm not sure why... do you have any idea why x86 also uses this trick?
>
>> I'm not sure the count == 512 is needed -- hypercall_preempt_check
>> should be sufficient?
>
> On ARM hypercall_preempt_check is a bit complex. Checking every loop
> seems a bit overkill.
>

So I gave a try to remove the (count == 512) and it's very very slow 
(more than 10 times slower). So I will keep it.
Ian Campbell Dec. 10, 2013, 9:34 a.m. | #4
On Tue, 2013-12-10 at 01:31 +0000, Julien Grall wrote:
> 
> On 12/09/2013 04:28 PM, Ian Campbell wrote:
> > On Mon, 2013-12-09 at 03:34 +0000, Julien Grall wrote:
> >> This function will be called when the domain relinquishes its memory.
> >> It removes refcount on every mapped page to a valid MFN.
> >>
> >> Currently, Xen doesn't take refcount on every new mapping but only for foreign
> >> mapping. Restrict the function only on foreign mapping.
> >
> > Skimming the remainder of the patch's titles and recalling a previous
> > conversation the intention is not to extend this for 4.4, correct?
> 
> Right, it's too big for Xen 4.4.
> 
> >>
> >> Signed-off-by: Julien Grall <julien.grall@linaro.org>
> >>
> >> ---
> >>      Changes in v2:
> >>          - Introduce the patch
> >> ---
> >>   xen/arch/arm/domain.c        |    5 +++++
> >>   xen/arch/arm/p2m.c           |   47 ++++++++++++++++++++++++++++++++++++++++++
> >>   xen/include/asm-arm/domain.h |    1 +
> >>   xen/include/asm-arm/p2m.h    |   15 ++++++++++++++
> >>   4 files changed, 68 insertions(+)
> >>
> >> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
> >> index 1590708..e7c2f67 100644
> >> --- a/xen/arch/arm/domain.c
> >> +++ b/xen/arch/arm/domain.c
> >> @@ -717,6 +717,11 @@ int domain_relinquish_resources(struct domain *d)
> >>           if ( ret )
> >>               return ret;
> >>
> >> +    case RELMEM_mapping:
> >
> > Something somewhere should be setting d->arch.relmem = RELMEM_mapping at
> > the appropriate time. (immediately above I think?)
> >
> > You also want a "Fallthrough" comment just above.
> 
> Oops, I will update the patch for the next version.
> 
> >> +        ret = relinquish_p2m_mapping(d);
> >> +        if ( ret )
> >> +            return ret;
> >> +
> >>           d->arch.relmem = RELMEM_done;
> >>           /* Fallthrough */
> >>
> >> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
> >> index f0bbaca..dbd6a06 100644
> >> --- a/xen/arch/arm/p2m.c
> >> +++ b/xen/arch/arm/p2m.c
> >> @@ -6,6 +6,7 @@
> >>   #include <xen/bitops.h>
> >>   #include <asm/flushtlb.h>
> >>   #include <asm/gic.h>
> >> +#include <asm/event.h>
> >>
> >>   /* First level P2M is 2 consecutive pages */
> >>   #define P2M_FIRST_ORDER 1
> >> @@ -320,6 +321,16 @@ static int create_p2m_entries(struct domain *d,
> >>               flush_tlb_all_local();
> >>       }
> >>
> >> +    if ( (t == p2m_ram_rw) || (t == p2m_ram_ro) || (t == p2m_map_foreign))
> >> +    {
> >> +        unsigned long sgfn = paddr_to_pfn(start_gpaddr);
> >> +        unsigned long egfn = paddr_to_pfn(end_gpaddr);
> >> +
> >> +        p2m->max_mapped_gfn = MAX(p2m->max_mapped_gfn, egfn);
> >> +        /* Use next_gfn_to_relinquish to store the lowest gfn mapped */
> >> +        p2m->next_gfn_to_relinquish = MIN(p2m->next_gfn_to_relinquish, sgfn);
> >> +    }
> >> +
> >>       rc = 0;
> >>
> >>   out:
> >> @@ -503,12 +514,48 @@ int p2m_init(struct domain *d)
> >>
> >>       p2m->first_level = NULL;
> >>
> >> +    p2m->max_mapped_gfn = 0;
> >> +    p2m->next_gfn_to_relinquish = ULONG_MAX;
> >> +
> >>   err:
> >>       spin_unlock(&p2m->lock);
> >>
> >>       return rc;
> >>   }
> >>
> >> +int relinquish_p2m_mapping(struct domain *d)
> >> +{
> >> +    struct p2m_domain *p2m = &d->arch.p2m;
> >> +    unsigned long gfn, count = 0;
> >> +    int rc = 0;
> >> +
> >> +    for ( gfn = p2m->next_gfn_to_relinquish;
> >> +          gfn < p2m->max_mapped_gfn; gfn++ )
> >
> > I know that Tim has been keen to get rid of these sorts of loops on x86,
> > and with good reason I think.
> >
> >> +    {
> >> +        p2m_type_t t;
> >> +        paddr_t p = p2m_lookup(d, gfn, &t);
> >
> > This does the full walk for each address, even though 2/3 of the levels
> > are more than likely identical to the previous gfn.
> >
> > It would be better to do a full walk, which sadly will look a lot like
> > p2m_lookup, no avoiding that I think.
> >
> > You can still resume the walk based on next_gfn_to_relinquish and bound
> > it on max_mapped_gfn, although I don't think it is strictly necessary.
> >> +        unsigned long mfn = p >> PAGE_SHIFT;
> >> +
> >> +        if ( mfn_valid(mfn) && p2m_is_foreign(t) )
> >
> > I think it would be worth reiterating in a comment that we only take a
> > ref for foreign mappings right now.
> 
> Will do.
> 
> >> +        {
> >> +            put_page(mfn_to_page(mfn));
> >> +            guest_physmap_remove_page(d, gfn, mfn, 0);
> >
> > You should unmap it and then put it I think.
> >
> > Is this going to do yet another lookup/walk?
> >
> > The REMOVE case of create_p2m_entries is so trivial you could open code
> > it here, or if you wanted to you could refactor it into a helper.
> >
> > I am wondering if the conditional put page ought to be at the point of
> > removal (i.e. in the helper) rather than here. (I think Tim made a
> > similar comment on the x86 version of the remove_from_physmap pvh
> > patches, you probably need to match the generic change which that
> > implies)
> >
> > BTW, if you do the clear (but not the put_page) for every entry then the
> > present bit (or pte.bits == 0) might be a useful proxy for
> > next_lN_entry? I suppose even walking, say, 4GB of mostly empty P2M is
> > not going to be super cheap.
> 
> Following the discution we had IRL, I will try to extend 
> create_p2m_entries by adding RELINQUISH option.

Looking at it again this morning the main change in behaviour will be to
continue on non-present entries instead of either filling them in or
aborting. It'll be interesting to see how this new mode of operation
fits into the function...

> >> +        }
> >> +
> >> +        count++;
> >> +
> >> +        /* Preempt every 2MiB. Arbitrary */
> >> +        if ( (count == 512) && hypercall_preempt_check() )
> >> +        {
> >> +            p2m->next_gfn_to_relinquish = gfn + 1;
> >> +            rc = -EAGAIN;
> >
> > I think I'm just failing to find it, but where is the call to
> > hypercall_create_continuation? I suppose it is somewhere way up the
> > stack?
> 
> Actually rc = -EAGAIN will be catched by xc_domain_destroy in libxc. The 
> function will call again the hypercall if needed.

Ah, I only followed the call chain on the hypervisor side.

> I'm not sure why... do you have any idea why x86 also uses this trick?

I suspect that this operation is so long running that it is beneficial
to return all the way to userspace context in order to allow other
processes to be scheduled. Avoiding the kernel's softlockup timer etc.

A hypercall continuation only give the opportunity for the guest kernel
to process interrupts IIRC.

> > I'm not sure the count == 512 is needed -- hypercall_preempt_check
> > should be sufficient?
> 
> On ARM hypercall_preempt_check is a bit complex. Checking every loop 
> seems a bit overkill.

OK. I see now that existing code has a mixture, every 512 is fine.
Perhaps using LPAE_ENTRIES is a little bit less arbitrary (although it
remains almost completely so...)

Ian.

Patch

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 1590708..e7c2f67 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -717,6 +717,11 @@  int domain_relinquish_resources(struct domain *d)
         if ( ret )
             return ret;
 
+    case RELMEM_mapping:
+        ret = relinquish_p2m_mapping(d);
+        if ( ret )
+            return ret;
+
         d->arch.relmem = RELMEM_done;
         /* Fallthrough */
 
diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index f0bbaca..dbd6a06 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -6,6 +6,7 @@ 
 #include <xen/bitops.h>
 #include <asm/flushtlb.h>
 #include <asm/gic.h>
+#include <asm/event.h>
 
 /* First level P2M is 2 consecutive pages */
 #define P2M_FIRST_ORDER 1
@@ -320,6 +321,16 @@  static int create_p2m_entries(struct domain *d,
             flush_tlb_all_local();
     }
 
+    if ( (t == p2m_ram_rw) || (t == p2m_ram_ro) || (t == p2m_map_foreign))
+    {
+        unsigned long sgfn = paddr_to_pfn(start_gpaddr);
+        unsigned long egfn = paddr_to_pfn(end_gpaddr);
+
+        p2m->max_mapped_gfn = MAX(p2m->max_mapped_gfn, egfn);
+        /* Use next_gfn_to_relinquish to store the lowest gfn mapped */
+        p2m->next_gfn_to_relinquish = MIN(p2m->next_gfn_to_relinquish, sgfn);
+    }
+
     rc = 0;
 
 out:
@@ -503,12 +514,48 @@  int p2m_init(struct domain *d)
 
     p2m->first_level = NULL;
 
+    p2m->max_mapped_gfn = 0;
+    p2m->next_gfn_to_relinquish = ULONG_MAX;
+
 err:
     spin_unlock(&p2m->lock);
 
     return rc;
 }
 
+int relinquish_p2m_mapping(struct domain *d)
+{
+    struct p2m_domain *p2m = &d->arch.p2m;
+    unsigned long gfn, count = 0;
+    int rc = 0;
+
+    for ( gfn = p2m->next_gfn_to_relinquish;
+          gfn < p2m->max_mapped_gfn; gfn++ )
+    {
+        p2m_type_t t;
+        paddr_t p = p2m_lookup(d, gfn, &t);
+        unsigned long mfn = p >> PAGE_SHIFT;
+
+        if ( mfn_valid(mfn) && p2m_is_foreign(t) )
+        {
+            put_page(mfn_to_page(mfn));
+            guest_physmap_remove_page(d, gfn, mfn, 0);
+        }
+
+        count++;
+
+        /* Preempt every 2MiB. Arbitrary */
+        if ( (count == 512) && hypercall_preempt_check() )
+        {
+            p2m->next_gfn_to_relinquish = gfn + 1;
+            rc = -EAGAIN;
+            break;
+        }
+    }
+
+    return rc;
+}
+
 unsigned long gmfn_to_mfn(struct domain *d, unsigned long gpfn)
 {
     paddr_t p = p2m_lookup(d, pfn_to_paddr(gpfn), NULL);
diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
index 922eda3..4a4c018 100644
--- a/xen/include/asm-arm/domain.h
+++ b/xen/include/asm-arm/domain.h
@@ -75,6 +75,7 @@  struct arch_domain
         RELMEM_not_started,
         RELMEM_xen,
         RELMEM_page,
+        RELMEM_mapping,
         RELMEM_done,
     } relmem;
 
diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
index b63204d..b0d3aea 100644
--- a/xen/include/asm-arm/p2m.h
+++ b/xen/include/asm-arm/p2m.h
@@ -18,6 +18,15 @@  struct p2m_domain {
 
     /* Current VMID in use */
     uint8_t vmid;
+
+    /* Highest guest frame that's ever been mapped in the p2m
+     * Take only into account ram and foreign mapping
+     */
+    unsigned long max_mapped_gfn;
+
+    /* When releasing mapped gfn's in a preemptible manner, recall where
+     * to resume the search */
+    unsigned long next_gfn_to_relinquish;
 };
 
 /* List of possible type for each page in the p2m
@@ -48,6 +57,12 @@  int p2m_init(struct domain *d);
 /* Return all the p2m resources to Xen. */
 void p2m_teardown(struct domain *d);
 
+/* Remove mapping refcount on each mapping page in the p2m
+ *
+ * TODO: For the moment only foreign mapping is handled
+ */
+int relinquish_p2m_mapping(struct domain *d);
+
 /* Allocate a new p2m table for a domain.
  *
  * Returns 0 for success or -errno.