[Xen-devel,RFC,12/22] xen/arm: p2m: Introduce p2m_get_entry and use it to implement __p2m_lookup

Message ID 1469717505-8026-13-git-send-email-julien.grall@arm.com
State New

Commit Message

Julien Grall July 28, 2016, 2:51 p.m.
Currently, for a given GFN, the function __p2m_lookup will only return
the associated MFN and the p2m type of the mapping.

In some cases we need the order of the mapping and the memaccess
permission. Rather than providing a separate function for this purpose,
it is better to implement a generic function to return all the
information.

To avoid passing dummy parameters, a caller that does not need a specific
piece of information can pass NULL instead.

The list of information retrieved is based on the x86 version. All
of it will be used in follow-up patches.

It might have been possible to extend __p2m_lookup, however I chose to
reimplement it from scratch to allow sharing some helpers with the
function that will update the P2M (will be added in a follow-up patch).

Signed-off-by: Julien Grall <julien.grall@arm.com>
---
 xen/arch/arm/p2m.c         | 188 ++++++++++++++++++++++++++++++++++-----------
 xen/include/asm-arm/page.h |   4 +
 2 files changed, 149 insertions(+), 43 deletions(-)

Comments

Julien Grall Aug. 31, 2016, 12:25 p.m. | #1
Hi Stefano,

On 31/08/16 01:30, Stefano Stabellini wrote:
> On Thu, 28 Jul 2016, Julien Grall wrote:
>> Currently, for a given GFN, the function __p2m_lookup will only return
>> the associated MFN and the p2m type of the mapping.
>>
>> In some case we need the order of the mapping and the memaccess
>> permission. Rather than providing separate function for this purpose,
>                                     ^ a separate
>
>> it is better to implement a generic function to return all the
>> information.
>>
>> To avoid passing dummy parameter, a caller that does need a specific
>                                                       ^ not need?

Yes, I will fix it in the next version.

>
>
>> information can use NULL instead.
>>
>> The list of the informations retrieved is based on the x86 version. All
>> of them will be used in follow-up patches.
>>
>> It might have been possible to extend __p2m_lookup, however I choose to
>> reimplement it from scratch to allow sharing some helpers with the
>> function that will update the P2M (will be added in a follow-up patch).
>>
>> Signed-off-by: Julien Grall <julien.grall@arm.com>
>> ---
>>  xen/arch/arm/p2m.c         | 188 ++++++++++++++++++++++++++++++++++-----------
>>  xen/include/asm-arm/page.h |   4 +
>>  2 files changed, 149 insertions(+), 43 deletions(-)
>>
>> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
>> index d4a4b62..8676b9d 100644
>> --- a/xen/arch/arm/p2m.c
>> +++ b/xen/arch/arm/p2m.c
>> @@ -36,6 +36,8 @@ static const paddr_t level_masks[] =
>>      { ZEROETH_MASK, FIRST_MASK, SECOND_MASK, THIRD_MASK };
>>  static const unsigned int level_shifts[] =
>>      { ZEROETH_SHIFT, FIRST_SHIFT, SECOND_SHIFT, THIRD_SHIFT };
>> +static const unsigned int level_orders[] =
>> +    { ZEROETH_ORDER, FIRST_ORDER, SECOND_ORDER, THIRD_ORDER };
>>
>>  static bool_t p2m_valid(lpae_t pte)
>>  {
>> @@ -236,28 +238,99 @@ static lpae_t *p2m_get_root_pointer(struct p2m_domain *p2m,
>>
>>  /*
>>   * Lookup the MFN corresponding to a domain's GFN.
>> + * Lookup mem access in the ratrix tree.
>> + * The entries associated to the GFN is considered valid.
>> + */
>> +static p2m_access_t p2m_mem_access_radix_get(struct p2m_domain *p2m, gfn_t gfn)
>> +{
>> +    void *ptr;
>> +
>> +    if ( !p2m->mem_access_enabled )
>> +        return p2m_access_rwx;
>
> Shouldn't this be p2m->default_access?

default_access will always be p2m_access_rwx when memaccess is disabled.
It will lead to a crash if you try to restrict permissions without memaccess.

Note that this matches the behavior of __p2m_get_mem_access.

>
>
>> +    ptr = radix_tree_lookup(&p2m->mem_access_settings, gfn_x(gfn));
>> +    if ( !ptr )
>> +        return p2m_access_rwx;
>
> Same here?

The radix tree will contain every permission restriction except
p2m_access_rwx. This is because you may change the default_access whilst
memaccess is enabled, and you don't know which pages were restricted with
the default access.

Note that this matches the behavior of p2m_mem_access_radix_set.

>
>
>> +    else
>> +        return radix_tree_ptr_to_int(ptr);
>> +}
>> +
>> +#define GUEST_TABLE_MAP_FAILED 0
>> +#define GUEST_TABLE_SUPER_PAGE 1
>> +#define GUEST_TABLE_NORMAL_PAGE 2
>> +
>> +static int p2m_create_table(struct p2m_domain *p2m, lpae_t *entry,
>> +                            int level_shift);
>> +
>> +/*
>> + * Take the currently mapped table, find the corresponding GFN entry,
>> + * and map the next table, if available.
>
> It is important to write down that the function also unmaps the previous
> table.

Will do.

>
>>   *
>> - * There are no processor functions to do a stage 2 only lookup therefore we
>> - * do a a software walk.
>> + * Return values:
>> + *  GUEST_TABLE_MAP_FAILED: Either read_only was set and the entry
>> + *  was empty, or allocating a new page failed.
>> + *  GUEST_TABLE_NORMAL_PAGE: next level mapped normally
>> + *  GUEST_TABLE_SUPER_PAGE: The next entry points to a superpage.
>>   */
>> -static mfn_t __p2m_lookup(struct domain *d, gfn_t gfn, p2m_type_t *t)
>> +static int p2m_next_level(struct p2m_domain *p2m, bool read_only,
>> +                          lpae_t **table, unsigned int offset)
>>  {
>> -    struct p2m_domain *p2m = &d->arch.p2m;
>> -    const paddr_t paddr = pfn_to_paddr(gfn_x(gfn));
>> -    const unsigned int offsets[4] = {
>> -        zeroeth_table_offset(paddr),
>> -        first_table_offset(paddr),
>> -        second_table_offset(paddr),
>> -        third_table_offset(paddr)
>> -    };
>> -    const paddr_t masks[4] = {
>> -        ZEROETH_MASK, FIRST_MASK, SECOND_MASK, THIRD_MASK
>> -    };
>> -    lpae_t pte, *map;
>> +    lpae_t *entry;
>> +    int ret;
>> +    mfn_t mfn;
>> +
>> +    entry = *table + offset;
>> +
>> +    if ( !p2m_valid(*entry) )
>> +    {
>> +        if ( read_only )
>> +            return GUEST_TABLE_MAP_FAILED;
>> +
>> +        ret = p2m_create_table(p2m, entry, /* not used */ ~0);
>> +        if ( ret )
>> +            return GUEST_TABLE_MAP_FAILED;
>> +    }
>> +
>> +    /* The function p2m_next_level is never called at the 3rd level */
>> +    if ( p2m_mapping(*entry) )
>> +        return GUEST_TABLE_SUPER_PAGE;
>> +
>> +    mfn = _mfn(entry->p2m.base);
>> +
>> +    unmap_domain_page(*table);
>> +    *table = map_domain_page(mfn);
>> +
>> +    return GUEST_TABLE_NORMAL_PAGE;
>
> I am a bit worried about having the same function doing the lookup and
> creating new tables, especially given it doesn't tell you whether the
> entry was already there or it was created: the return value is the same
> in both cases. At the very least the return values should be different.

I don't understand your worry here. Why would you care whether the table
has just been allocated or already existed?

If the caller does not want to allocate a table, it can request to browse
the entry in read-only mode (see read_only).

>
>
>> +}
>> +
>> +/*
>> + * Get the details of a given gfn.
>> + *
>> + * If the entry is present, the associated MFN will be returned and the
>> + * access and type filled up. The page_order will correspond to the
>> + * order of the mapping in the page table (i.e it could be a superpage).
>> + *
>> + * If the entry is not present, INVALID_MFN will be returned and the
>> + * page_order will be set according to the order of the invalid range.
>> + */
>> +static mfn_t p2m_get_entry(struct p2m_domain *p2m, gfn_t gfn,
>> +                           p2m_type_t *t, p2m_access_t *a,
>> +                           unsigned int *page_order)
>> +{
>> +    paddr_t addr = pfn_to_paddr(gfn_x(gfn));
>> +    unsigned int level = 0;
>> +    lpae_t entry, *table;
>> +    int rc;
>>      mfn_t mfn = INVALID_MFN;
>> -    paddr_t mask = 0;
>>      p2m_type_t _t;
>> -    unsigned int level;
>> +
>> +    /* Convenience aliases */
>> +    const unsigned int offsets[4] = {
>> +        zeroeth_table_offset(addr),
>> +        first_table_offset(addr),
>> +        second_table_offset(addr),
>> +        third_table_offset(addr)
>> +    };
>>
>>      ASSERT(p2m_is_locked(p2m));
>>      BUILD_BUG_ON(THIRD_MASK != PAGE_MASK);
>> @@ -267,46 +340,75 @@ static mfn_t __p2m_lookup(struct domain *d, gfn_t gfn, p2m_type_t *t)
>>
>>      *t = p2m_invalid;
>>
>> -    map = p2m_get_root_pointer(p2m, gfn);
>> -    if ( !map )
>> -        return INVALID_MFN;
>> +    /* XXX: Check if the mapping is lower than the mapped gfn */
>>
>> -    ASSERT(P2M_ROOT_LEVEL < 4);
>> -
>> -    for ( level = P2M_ROOT_LEVEL ; level < 4 ; level++ )
>> +    /* This gfn is higher than the highest the p2m map currently holds */
>> +    if ( gfn_x(gfn) > gfn_x(p2m->max_mapped_gfn) )
>>      {
>> -        mask = masks[level];
>> -
>> -        pte = map[offsets[level]];
>> +        for ( level = P2M_ROOT_LEVEL; level < 3; level++ )
>> +        {
>> +            if ( (gfn_x(gfn) & (level_masks[level] >> PAGE_SHIFT)) >
>> +                 gfn_x(p2m->max_mapped_gfn) )
>> +                break;
>> +            goto out;
>
> I am not sure what this loop is for, but it looks wrong.

As mentioned in the description of the function, if the entry is not
present, the function will return the order of the invalid range.

The loop finds the highest possible order by checking whether the base of
the block mapping is greater than the max mapped gfn.

>
>
>> +        }
>> +    }
>>
>> -        if ( level == 3 && !p2m_table(pte) )
>> -            /* Invalid, clobber the pte */
>> -            pte.bits = 0;
>> -        if ( level == 3 || !p2m_table(pte) )
>> -            /* Done */
>> -            break;
>> +    table = p2m_get_root_pointer(p2m, gfn);
>>
>> -        ASSERT(level < 3);
>> +    /*
>> +     * the table should always be non-NULL because the gfn is below
>> +     * p2m->max_mapped_gfn and the root table pages are always present.
>> +     */
>> +    BUG_ON(table == NULL);
>>
>> -        /* Map for next level */
>> -        unmap_domain_page(map);
>> -        map = map_domain_page(_mfn(pte.p2m.base));
>> +    for ( level = P2M_ROOT_LEVEL; level < 3; level++ )
>> +    {
>> +        rc = p2m_next_level(p2m, true, &table, offsets[level]);
>> +        if ( rc == GUEST_TABLE_MAP_FAILED )
>> +            goto out_unmap;
>> +        else if ( rc != GUEST_TABLE_NORMAL_PAGE )
>> +            break;
>>      }
>>
>> -    unmap_domain_page(map);
>> +    entry = table[offsets[level]];
>>
>> -    if ( p2m_valid(pte) )
>> +    if ( p2m_valid(entry) )
>>      {
>> -        ASSERT(mask);
>> -        ASSERT(pte.p2m.type != p2m_invalid);
>> -        mfn = _mfn(paddr_to_pfn((pte.bits & PADDR_MASK & mask) |
>> -                                (paddr & ~mask)));
>> -        *t = pte.p2m.type;
>> +        *t = entry.p2m.type;
>
> Why don't you have a check for ( t ) like you have done in the case of
> ( a )? It would be more consistent.

This is done at the beginning of the function:

/* Allow t to be NULL */
t = t ?: &_t;

*t = p2m_invalid;

This was kept from the implementation of __p2m_lookup because some
callers check the type before checking whether the MFN is invalid (e.g.
get_page_from_gfn).

I didn't follow the same principle for the access because going through
the radix tree is expensive and most of the hot paths do not care about
the access.

>
>
>> +
>> +        if ( a )
>> +            *a = p2m_mem_access_radix_get(p2m, gfn);
>> +
>> +        mfn = _mfn(entry.p2m.base);
>> +        /*
>> +         * The entry may point to a superpage. Find the MFN associated
>> +         * to the GFN.
>> +         */
>> +        mfn = mfn_add(mfn, gfn_x(gfn) & ((1UL << level_orders[level]) - 1));
>>      }
>>
>> +out_unmap:
>> +    unmap_domain_page(table);
>> +
>> +out:
>> +    if ( page_order )
>> +        *page_order = level_shifts[level] - PAGE_SHIFT;
>> +
>>      return mfn;
>>  }
>>
>> +/*
>> + * Lookup the MFN corresponding to a domain's GFN.
>> + *
>> + * There are no processor functions to do a stage 2 only lookup therefore we
>> + * do a a software walk.
>> + */
>> +static mfn_t __p2m_lookup(struct domain *d, gfn_t gfn, p2m_type_t *t)
>> +{
>> +    return p2m_get_entry(&d->arch.p2m, gfn, t, NULL, NULL);
>> +}
>> +
>>  mfn_t p2m_lookup(struct domain *d, gfn_t gfn, p2m_type_t *t)
>>  {
>>      mfn_t ret;
>> diff --git a/xen/include/asm-arm/page.h b/xen/include/asm-arm/page.h
>> index 05d9f82..1c5bd8b 100644
>> --- a/xen/include/asm-arm/page.h
>> +++ b/xen/include/asm-arm/page.h
>> @@ -457,15 +457,19 @@ static inline int gva_to_ipa(vaddr_t va, paddr_t *paddr, unsigned int flags)
>>  #define LPAE_ENTRY_MASK (LPAE_ENTRIES - 1)
>>
>>  #define THIRD_SHIFT    (PAGE_SHIFT)
>> +#define THIRD_ORDER    0
>>  #define THIRD_SIZE     ((paddr_t)1 << THIRD_SHIFT)
>>  #define THIRD_MASK     (~(THIRD_SIZE - 1))
>>  #define SECOND_SHIFT   (THIRD_SHIFT + LPAE_SHIFT)
>> +#define SECOND_ORDER   (THIRD_ORDER + LPAE_SHIFT)
>>  #define SECOND_SIZE    ((paddr_t)1 << SECOND_SHIFT)
>>  #define SECOND_MASK    (~(SECOND_SIZE - 1))
>>  #define FIRST_SHIFT    (SECOND_SHIFT + LPAE_SHIFT)
>> +#define FIRST_ORDER    (SECOND_ORDER + LPAE_SHIFT)
>>  #define FIRST_SIZE     ((paddr_t)1 << FIRST_SHIFT)
>>  #define FIRST_MASK     (~(FIRST_SIZE - 1))
>>  #define ZEROETH_SHIFT  (FIRST_SHIFT + LPAE_SHIFT)
>> +#define ZEROETH_ORDER  (FIRST_ORDER + LPAE_SHIFT)
>>  #define ZEROETH_SIZE   ((paddr_t)1 << ZEROETH_SHIFT)
>>  #define ZEROETH_MASK   (~(ZEROETH_SIZE - 1))
>
> It might be clearer to define them by SHIFT:
>
>     #define THIRD_ORDER     (THIRD_SHIFT - PAGE_SHIFT)
>     #define SECOND_ORDER    (SECOND_SHIFT - PAGE_SHIFT)
>     #define FIRST_ORDER     (FIRST_SHIFT - PAGE_SHIFT)
>     #define ZEROETH_ORDER   (ZEROETH_SHIFT - PAGE_SHIFT)

I was following the way the other constants have been defined. The order
of the 2nd level can be expressed in terms of the order of the 3rd level...

I don't think this is clearer, but I don't mind using them.

>
> or avoid them and just use level_shifts which is already defined.  I
> don't think they add much value.

Really? It avoids spreading - {PAGE,LPAE}_SHIFT everywhere in the code,
so the code will be cleaner.

Note that I have noticed there are still a few places where
(level_shifts - PAGE_SHIFT) is in use. However, we can replace them with
level_orders.

Regards,

Julien Grall Sept. 1, 2016, 11:37 a.m. | #2
Hi Stefano,

On 31/08/16 20:33, Stefano Stabellini wrote:
> On Wed, 31 Aug 2016, Julien Grall wrote:
>>>> @@ -236,28 +238,99 @@ static lpae_t *p2m_get_root_pointer(struct
>>>> p2m_domain *p2m,
>>>>
>>>>  /*
>>>>   * Lookup the MFN corresponding to a domain's GFN.
>>>> + * Lookup mem access in the ratrix tree.
>>>> + * The entries associated to the GFN is considered valid.
>>>> + */
>>>> +static p2m_access_t p2m_mem_access_radix_get(struct p2m_domain *p2m,
>>>> gfn_t gfn)
>>>> +{
>>>> +    void *ptr;
>>>> +
>>>> +    if ( !p2m->mem_access_enabled )
>>>> +        return p2m_access_rwx;
>>>
>>> Shouldn't this be p2m->default_access?
>>
>> default_access will always be p2m_access_rwx when memaccess is disabled. It
>> will lead to crash a if you try to restrict permission without memaccess.
>>
>> Note that, this is matching the behavior of __p2m_get_mem_access.
>
> I would like to avoid some places to use p2m_access_rwx and others to
> use p2m->default_access. But it is true that in this context both are
> fine. This was just a suggestion.

Thinking a bit more, this will allow us to catch any error where
mem_access is not enabled but default_access is not p2m_access_rwx.

I will use p2m->default_access for this case (the one below should
stay p2m_access_rwx).

>
>>>> +    ptr = radix_tree_lookup(&p2m->mem_access_settings, gfn_x(gfn));
>>>> +    if ( !ptr )
>>>> +        return p2m_access_rwx;
>>>
>>> Same here?
>>
>> The radix tree will contain all the permission restriction but p2m_access_rwx.
>> This is because you may change the default_access whilst memaccess is enabled
>> and you don't know what page was restricted with the default access.
>>
>> Note that this is matching the behavior of p2m_mem_access_radix_set.

[...]

>>>
>>>>   *
>>>> - * There are no processor functions to do a stage 2 only lookup therefore
>>>> we
>>>> - * do a a software walk.
>>>> + * Return values:
>>>> + *  GUEST_TABLE_MAP_FAILED: Either read_only was set and the entry
>>>> + *  was empty, or allocating a new page failed.
>>>> + *  GUEST_TABLE_NORMAL_PAGE: next level mapped normally
>>>> + *  GUEST_TABLE_SUPER_PAGE: The next entry points to a superpage.
>>>>   */
>>>> -static mfn_t __p2m_lookup(struct domain *d, gfn_t gfn, p2m_type_t *t)
>>>> +static int p2m_next_level(struct p2m_domain *p2m, bool read_only,
>>>> +                          lpae_t **table, unsigned int offset)
>>>>  {
>>>> -    struct p2m_domain *p2m = &d->arch.p2m;
>>>> -    const paddr_t paddr = pfn_to_paddr(gfn_x(gfn));
>>>> -    const unsigned int offsets[4] = {
>>>> -        zeroeth_table_offset(paddr),
>>>> -        first_table_offset(paddr),
>>>> -        second_table_offset(paddr),
>>>> -        third_table_offset(paddr)
>>>> -    };
>>>> -    const paddr_t masks[4] = {
>>>> -        ZEROETH_MASK, FIRST_MASK, SECOND_MASK, THIRD_MASK
>>>> -    };
>>>> -    lpae_t pte, *map;
>>>> +    lpae_t *entry;
>>>> +    int ret;
>>>> +    mfn_t mfn;
>>>> +
>>>> +    entry = *table + offset;
>>>> +
>>>> +    if ( !p2m_valid(*entry) )
>>>> +    {
>>>> +        if ( read_only )
>>>> +            return GUEST_TABLE_MAP_FAILED;
>>>> +
>>>> +        ret = p2m_create_table(p2m, entry, /* not used */ ~0);
>>>> +        if ( ret )
>>>> +            return GUEST_TABLE_MAP_FAILED;
>>>> +    }
>>>> +
>>>> +    /* The function p2m_next_level is never called at the 3rd level */
>>>> +    if ( p2m_mapping(*entry) )
>>>> +        return GUEST_TABLE_SUPER_PAGE;
>>>> +
>>>> +    mfn = _mfn(entry->p2m.base);
>>>> +
>>>> +    unmap_domain_page(*table);
>>>> +    *table = map_domain_page(mfn);
>>>> +
>>>> +    return GUEST_TABLE_NORMAL_PAGE;
>>>
>>> I am a bit worried about having the same function doing the lookup and
>>> creating new tables, especially given it doesn't tell you whether the
>>> entry was already there or it was created: the return value is the same
>>> in both cases. At the very least the return values should be different.
>>
>> I don't understand your worry here. Why would you care that the table has been
>> allocated or was already existing?
>>
>> If the caller does not want to allocate table, it can request to browse the
>> entry in a read-only mode (see read_only).
>
> To avoid unintentional side effects, such as calling this function for a
> lookup but passing read-only as false by mistake. But maybe we just need
> to be careful enough in the code reviews, after all this function won't
> be called that many times.

The same problem would arise if the caller did not check the return value
properly, thinking the page table would not be allocated.

I prefer to keep the interface like that for now and rely on review.
I will also document the behavior of the read_only parameter in the next
version.

>
>
>
>>>> +}
>>>> +
>>>> +/*
>>>> + * Get the details of a given gfn.
>>>> + *
>>>> + * If the entry is present, the associated MFN will be returned and the
>>>> + * access and type filled up. The page_order will correspond to the
>>>> + * order of the mapping in the page table (i.e it could be a superpage).
>>>> + *
>>>> + * If the entry is not present, INVALID_MFN will be returned and the
>>>> + * page_order will be set according to the order of the invalid range.
>>>> + */
>>>> +static mfn_t p2m_get_entry(struct p2m_domain *p2m, gfn_t gfn,
>>>> +                           p2m_type_t *t, p2m_access_t *a,
>>>> +                           unsigned int *page_order)
>>>> +{
>>>> +    paddr_t addr = pfn_to_paddr(gfn_x(gfn));
>>>> +    unsigned int level = 0;
>>>> +    lpae_t entry, *table;
>>>> +    int rc;
>>>>      mfn_t mfn = INVALID_MFN;
>>>> -    paddr_t mask = 0;
>>>>      p2m_type_t _t;
>>>> -    unsigned int level;
>>>> +
>>>> +    /* Convenience aliases */
>>>> +    const unsigned int offsets[4] = {
>>>> +        zeroeth_table_offset(addr),
>>>> +        first_table_offset(addr),
>>>> +        second_table_offset(addr),
>>>> +        third_table_offset(addr)
>>>> +    };
>>>>
>>>>      ASSERT(p2m_is_locked(p2m));
>>>>      BUILD_BUG_ON(THIRD_MASK != PAGE_MASK);
>>>> @@ -267,46 +340,75 @@ static mfn_t __p2m_lookup(struct domain *d, gfn_t
>>>> gfn, p2m_type_t *t)
>>>>
>>>>      *t = p2m_invalid;
>>>>
>>>> -    map = p2m_get_root_pointer(p2m, gfn);
>>>> -    if ( !map )
>>>> -        return INVALID_MFN;
>>>> +    /* XXX: Check if the mapping is lower than the mapped gfn */
>>>>
>>>> -    ASSERT(P2M_ROOT_LEVEL < 4);
>>>> -
>>>> -    for ( level = P2M_ROOT_LEVEL ; level < 4 ; level++ )
>>>> +    /* This gfn is higher than the highest the p2m map currently holds */
>>>> +    if ( gfn_x(gfn) > gfn_x(p2m->max_mapped_gfn) )
>>>>      {
>>>> -        mask = masks[level];
>>>> -
>>>> -        pte = map[offsets[level]];
>>>> +        for ( level = P2M_ROOT_LEVEL; level < 3; level++ )
>>>> +        {
>>>> +            if ( (gfn_x(gfn) & (level_masks[level] >> PAGE_SHIFT)) >
>>>> +                 gfn_x(p2m->max_mapped_gfn) )
>>>> +                break;
>>>> +            goto out;
>>>
>>> I am not sure what this loop is for, but it looks wrong.
>>
>> As mentioned in the description of the function, if the entry is not present,
>> the function will return the order of the invalid range.
>>
>> The loop will find the highest possible order by checking the base of the
>> block mapping is greater than the max mapped gfn.
>
> All right, but shouldn't the `goto out` be right after the loop?
> Otherwise it is not really a loop :-)

Oh, right. Wei Chen pointed this issue out to me a couple of weeks ago and
I fixed it in my development branch:

         for ( level = P2M_ROOT_LEVEL; level < 3; level++ )
             if ( (gfn_x(gfn) & (level_masks[level] >> PAGE_SHIFT)) >
                  gfn_x(p2m->max_mapped_gfn) )
                 break;

         goto out;

>
>
>>>> +        }
>>>> +    }

[..]

>>>> +
>>>> +        if ( a )
>>>> +            *a = p2m_mem_access_radix_get(p2m, gfn);
>>>> +
>>>> +        mfn = _mfn(entry.p2m.base);
>>>> +        /*
>>>> +         * The entry may point to a superpage. Find the MFN associated
>>>> +         * to the GFN.
>>>> +         */
>>>> +        mfn = mfn_add(mfn, gfn_x(gfn) & ((1UL << level_orders[level]) -
>>>> 1));
>>>>      }
>>>>
>>>> +out_unmap:
>>>> +    unmap_domain_page(table);
>>>> +
>>>> +out:
>>>> +    if ( page_order )
>>>> +        *page_order = level_shifts[level] - PAGE_SHIFT;
>>>> +
>>>>      return mfn;
>>>>  }
>>>>
>>>> +/*
>>>> + * Lookup the MFN corresponding to a domain's GFN.
>>>> + *
>>>> + * There are no processor functions to do a stage 2 only lookup therefore
>>>> we
>>>> + * do a a software walk.
>>>> + */
>>>> +static mfn_t __p2m_lookup(struct domain *d, gfn_t gfn, p2m_type_t *t)
>>>> +{
>>>> +    return p2m_get_entry(&d->arch.p2m, gfn, t, NULL, NULL);
>>>> +}
>>>> +
>>>>  mfn_t p2m_lookup(struct domain *d, gfn_t gfn, p2m_type_t *t)
>>>>  {
>>>>      mfn_t ret;
>>>> diff --git a/xen/include/asm-arm/page.h b/xen/include/asm-arm/page.h
>>>> index 05d9f82..1c5bd8b 100644
>>>> --- a/xen/include/asm-arm/page.h
>>>> +++ b/xen/include/asm-arm/page.h
>>>> @@ -457,15 +457,19 @@ static inline int gva_to_ipa(vaddr_t va, paddr_t
>>>> *paddr, unsigned int flags)
>>>>  #define LPAE_ENTRY_MASK (LPAE_ENTRIES - 1)
>>>>
>>>>  #define THIRD_SHIFT    (PAGE_SHIFT)
>>>> +#define THIRD_ORDER    0
>>>>  #define THIRD_SIZE     ((paddr_t)1 << THIRD_SHIFT)
>>>>  #define THIRD_MASK     (~(THIRD_SIZE - 1))
>>>>  #define SECOND_SHIFT   (THIRD_SHIFT + LPAE_SHIFT)
>>>> +#define SECOND_ORDER   (THIRD_ORDER + LPAE_SHIFT)
>>>>  #define SECOND_SIZE    ((paddr_t)1 << SECOND_SHIFT)
>>>>  #define SECOND_MASK    (~(SECOND_SIZE - 1))
>>>>  #define FIRST_SHIFT    (SECOND_SHIFT + LPAE_SHIFT)
>>>> +#define FIRST_ORDER    (SECOND_ORDER + LPAE_SHIFT)
>>>>  #define FIRST_SIZE     ((paddr_t)1 << FIRST_SHIFT)
>>>>  #define FIRST_MASK     (~(FIRST_SIZE - 1))
>>>>  #define ZEROETH_SHIFT  (FIRST_SHIFT + LPAE_SHIFT)
>>>> +#define ZEROETH_ORDER  (FIRST_ORDER + LPAE_SHIFT)
>>>>  #define ZEROETH_SIZE   ((paddr_t)1 << ZEROETH_SHIFT)
>>>>  #define ZEROETH_MASK   (~(ZEROETH_SIZE - 1))
>>>
>>> It might be clearer to define them by SHIFT:
>>>
>>>     #define THIRD_ORDER     (THIRD_SHIFT - PAGE_SHIFT)
>>>     #define SECOND_ORDER    (SECOND_SHIFT - PAGE_SHIFT)
>>>     #define FIRST_ORDER     (FIRST_SHIFT - PAGE_SHIFT)
>>>     #define ZEROETH_ORDER   (ZEROETH_SHIFT - PAGE_SHIFT)
>>
>> I was following the way the other constant has been defined. The order of the
>> 2nd level can be expressed in term of the order of the 3rd level...
>>
>> I don't think this is clearer, but I don't mind to use them.
>>
>>> or avoid them and just use level_shifts which is already defined.  I
>>> don't think they add much value.
>>
>> Really? It avoids to spread - {PAGE,LPAE}_SHIFT everywhere in the code so the
>> code will be cleaner.
>>
>> Note that I have noticed that there are still few places where (level_shifts -
>> PAGE_SHIFT) is still in use. However, we can replace by level_orders.
>
> The reason why I suggested the alternative implementation is that
> "order" is not commonly used when dealing with pagetable levels (while
> "mask" and "shift" are). I had a guess about what it meant, but wasn't
> sure. To be sure I had to read the implementation. I think this version
> is more obvious about what it does. That said, this is just a
> suggestion, not a requirement.

Most of the generic Xen functions deal with orders and frame numbers
(see guest_physmap_*). Those defines avoid having to move back and
forth between addresses and frame numbers. This helps to keep the typesafe
gfn/mfn as far as possible.

I am happy to use the definitions you suggested and document the code a
bit more (though I am not sure what to say here).

Regards,

Patch

diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index d4a4b62..8676b9d 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -36,6 +36,8 @@  static const paddr_t level_masks[] =
     { ZEROETH_MASK, FIRST_MASK, SECOND_MASK, THIRD_MASK };
 static const unsigned int level_shifts[] =
     { ZEROETH_SHIFT, FIRST_SHIFT, SECOND_SHIFT, THIRD_SHIFT };
+static const unsigned int level_orders[] =
+    { ZEROETH_ORDER, FIRST_ORDER, SECOND_ORDER, THIRD_ORDER };
 
 static bool_t p2m_valid(lpae_t pte)
 {
@@ -236,28 +238,99 @@  static lpae_t *p2m_get_root_pointer(struct p2m_domain *p2m,
 
 /*
  * Lookup the MFN corresponding to a domain's GFN.
+ * Lookup mem access in the ratrix tree.
+ * The entries associated to the GFN is considered valid.
+ */
+static p2m_access_t p2m_mem_access_radix_get(struct p2m_domain *p2m, gfn_t gfn)
+{
+    void *ptr;
+
+    if ( !p2m->mem_access_enabled )
+        return p2m_access_rwx;
+
+    ptr = radix_tree_lookup(&p2m->mem_access_settings, gfn_x(gfn));
+    if ( !ptr )
+        return p2m_access_rwx;
+    else
+        return radix_tree_ptr_to_int(ptr);
+}
+
+#define GUEST_TABLE_MAP_FAILED 0
+#define GUEST_TABLE_SUPER_PAGE 1
+#define GUEST_TABLE_NORMAL_PAGE 2
+
+static int p2m_create_table(struct p2m_domain *p2m, lpae_t *entry,
+                            int level_shift);
+
+/*
+ * Take the currently mapped table, find the corresponding GFN entry,
+ * and map the next table, if available.
  *
- * There are no processor functions to do a stage 2 only lookup therefore we
- * do a a software walk.
+ * Return values:
+ *  GUEST_TABLE_MAP_FAILED: Either read_only was set and the entry
+ *  was empty, or allocating a new page failed.
+ *  GUEST_TABLE_NORMAL_PAGE: next level mapped normally
+ *  GUEST_TABLE_SUPER_PAGE: The next entry points to a superpage.
  */
-static mfn_t __p2m_lookup(struct domain *d, gfn_t gfn, p2m_type_t *t)
+static int p2m_next_level(struct p2m_domain *p2m, bool read_only,
+                          lpae_t **table, unsigned int offset)
 {
-    struct p2m_domain *p2m = &d->arch.p2m;
-    const paddr_t paddr = pfn_to_paddr(gfn_x(gfn));
-    const unsigned int offsets[4] = {
-        zeroeth_table_offset(paddr),
-        first_table_offset(paddr),
-        second_table_offset(paddr),
-        third_table_offset(paddr)
-    };
-    const paddr_t masks[4] = {
-        ZEROETH_MASK, FIRST_MASK, SECOND_MASK, THIRD_MASK
-    };
-    lpae_t pte, *map;
+    lpae_t *entry;
+    int ret;
+    mfn_t mfn;
+
+    entry = *table + offset;
+
+    if ( !p2m_valid(*entry) )
+    {
+        if ( read_only )
+            return GUEST_TABLE_MAP_FAILED;
+
+        ret = p2m_create_table(p2m, entry, /* not used */ ~0);
+        if ( ret )
+            return GUEST_TABLE_MAP_FAILED;
+    }
+
+    /* The function p2m_next_level is never called at the 3rd level */
+    if ( p2m_mapping(*entry) )
+        return GUEST_TABLE_SUPER_PAGE;
+
+    mfn = _mfn(entry->p2m.base);
+
+    unmap_domain_page(*table);
+    *table = map_domain_page(mfn);
+
+    return GUEST_TABLE_NORMAL_PAGE;
+}
+
+/*
+ * Get the details of a given gfn.
+ *
+ * If the entry is present, the associated MFN will be returned and the
+ * access and type filled up. The page_order will correspond to the
+ * order of the mapping in the page table (i.e it could be a superpage).
+ *
+ * If the entry is not present, INVALID_MFN will be returned and the
+ * page_order will be set according to the order of the invalid range.
+ */
+static mfn_t p2m_get_entry(struct p2m_domain *p2m, gfn_t gfn,
+                           p2m_type_t *t, p2m_access_t *a,
+                           unsigned int *page_order)
+{
+    paddr_t addr = pfn_to_paddr(gfn_x(gfn));
+    unsigned int level = 0;
+    lpae_t entry, *table;
+    int rc;
     mfn_t mfn = INVALID_MFN;
-    paddr_t mask = 0;
     p2m_type_t _t;
-    unsigned int level;
+
+    /* Convenience aliases */
+    const unsigned int offsets[4] = {
+        zeroeth_table_offset(addr),
+        first_table_offset(addr),
+        second_table_offset(addr),
+        third_table_offset(addr)
+    };
 
     ASSERT(p2m_is_locked(p2m));
     BUILD_BUG_ON(THIRD_MASK != PAGE_MASK);
@@ -267,46 +340,75 @@  static mfn_t __p2m_lookup(struct domain *d, gfn_t gfn, p2m_type_t *t)
 
     *t = p2m_invalid;
 
-    map = p2m_get_root_pointer(p2m, gfn);
-    if ( !map )
-        return INVALID_MFN;
+    /* XXX: Check if the mapping is lower than the mapped gfn */
 
-    ASSERT(P2M_ROOT_LEVEL < 4);
-
-    for ( level = P2M_ROOT_LEVEL ; level < 4 ; level++ )
+    /* This gfn is higher than the highest the p2m map currently holds */
+    if ( gfn_x(gfn) > gfn_x(p2m->max_mapped_gfn) )
     {
-        mask = masks[level];
-
-        pte = map[offsets[level]];
+        for ( level = P2M_ROOT_LEVEL; level < 3; level++ )
+        {
+            if ( (gfn_x(gfn) & (level_masks[level] >> PAGE_SHIFT)) >
+                 gfn_x(p2m->max_mapped_gfn) )
+                break;
+            goto out;
+        }
+    }
 
-        if ( level == 3 && !p2m_table(pte) )
-            /* Invalid, clobber the pte */
-            pte.bits = 0;
-        if ( level == 3 || !p2m_table(pte) )
-            /* Done */
-            break;
+    table = p2m_get_root_pointer(p2m, gfn);
 
-        ASSERT(level < 3);
+    /*
+     * the table should always be non-NULL because the gfn is below
+     * p2m->max_mapped_gfn and the root table pages are always present.
+     */
+    BUG_ON(table == NULL);
 
-        /* Map for next level */
-        unmap_domain_page(map);
-        map = map_domain_page(_mfn(pte.p2m.base));
+    for ( level = P2M_ROOT_LEVEL; level < 3; level++ )
+    {
+        rc = p2m_next_level(p2m, true, &table, offsets[level]);
+        if ( rc == GUEST_TABLE_MAP_FAILED )
+            goto out_unmap;
+        else if ( rc != GUEST_TABLE_NORMAL_PAGE )
+            break;
     }
 
-    unmap_domain_page(map);
+    entry = table[offsets[level]];
 
-    if ( p2m_valid(pte) )
+    if ( p2m_valid(entry) )
     {
-        ASSERT(mask);
-        ASSERT(pte.p2m.type != p2m_invalid);
-        mfn = _mfn(paddr_to_pfn((pte.bits & PADDR_MASK & mask) |
-                                (paddr & ~mask)));
-        *t = pte.p2m.type;
+        *t = entry.p2m.type;
+
+        if ( a )
+            *a = p2m_mem_access_radix_get(p2m, gfn);
+
+        mfn = _mfn(entry.p2m.base);
+        /*
+         * The entry may point to a superpage. Find the MFN associated
+         * to the GFN.
+         */
+        mfn = mfn_add(mfn, gfn_x(gfn) & ((1UL << level_orders[level]) - 1));
     }
 
+out_unmap:
+    unmap_domain_page(table);
+
+out:
+    if ( page_order )
+        *page_order = level_shifts[level] - PAGE_SHIFT;
+
     return mfn;
 }
 
+/*
+ * Lookup the MFN corresponding to a domain's GFN.
+ *
+ * There are no processor functions to do a stage 2 only lookup therefore we
+ * do a a software walk.
+ */
+static mfn_t __p2m_lookup(struct domain *d, gfn_t gfn, p2m_type_t *t)
+{
+    return p2m_get_entry(&d->arch.p2m, gfn, t, NULL, NULL);
+}
+
 mfn_t p2m_lookup(struct domain *d, gfn_t gfn, p2m_type_t *t)
 {
     mfn_t ret;
diff --git a/xen/include/asm-arm/page.h b/xen/include/asm-arm/page.h
index 05d9f82..1c5bd8b 100644
--- a/xen/include/asm-arm/page.h
+++ b/xen/include/asm-arm/page.h
@@ -457,15 +457,19 @@  static inline int gva_to_ipa(vaddr_t va, paddr_t *paddr, unsigned int flags)
 #define LPAE_ENTRY_MASK (LPAE_ENTRIES - 1)
 
 #define THIRD_SHIFT    (PAGE_SHIFT)
+#define THIRD_ORDER    0
 #define THIRD_SIZE     ((paddr_t)1 << THIRD_SHIFT)
 #define THIRD_MASK     (~(THIRD_SIZE - 1))
 #define SECOND_SHIFT   (THIRD_SHIFT + LPAE_SHIFT)
+#define SECOND_ORDER   (THIRD_ORDER + LPAE_SHIFT)
 #define SECOND_SIZE    ((paddr_t)1 << SECOND_SHIFT)
 #define SECOND_MASK    (~(SECOND_SIZE - 1))
 #define FIRST_SHIFT    (SECOND_SHIFT + LPAE_SHIFT)
+#define FIRST_ORDER    (SECOND_ORDER + LPAE_SHIFT)
 #define FIRST_SIZE     ((paddr_t)1 << FIRST_SHIFT)
 #define FIRST_MASK     (~(FIRST_SIZE - 1))
 #define ZEROETH_SHIFT  (FIRST_SHIFT + LPAE_SHIFT)
+#define ZEROETH_ORDER  (FIRST_ORDER + LPAE_SHIFT)
 #define ZEROETH_SIZE   ((paddr_t)1 << ZEROETH_SHIFT)
 #define ZEROETH_MASK   (~(ZEROETH_SIZE - 1))