
[Xen-devel,MM-PART3,v2,10/12] xen/arm: mm: Rework Xen page-tables walk during update

Message ID 20190514123125.29086-11-julien.grall@arm.com
State New
Series xen/arm: Provide a generic function to update Xen PT

Commit Message

Julien Grall May 14, 2019, 12:31 p.m. UTC
Currently, xen_pt_update_entry() is only able to update the region covered
by xen_second (i.e. 0 to 0x7fffffff).

Because of this restriction, we have ended up with multiple functions in
mm.c modifying the page-tables differently.

Furthermore, we never walked the page-tables fully. This means that any
change in the layout may require a major rewrite of the page-tables code.

Lastly, we have been quite lucky that no one ever tried to pass an address
outside this range, because it would have blown up.

xen_pt_update_entry() is reworked to walk over the page-tables every
time. The logic has been borrowed from arch/arm/p2m.c and contains some
limitations for the time being:
    - Superpages cannot be shattered
    - Only level-3 (i.e. 4KB) mappings can be done

Note that the parameter 'addr' has been renamed to 'virt' to make it clear
that we are dealing with a virtual address.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andrii Anisov <andrii_anisov@epam.com>

---
    Changes in v2:
        - Add Andrii's reviewed-by
---
 xen/arch/arm/mm.c | 121 +++++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 106 insertions(+), 15 deletions(-)
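
The reworked walk indexes one page-table per level using
DECLARE_OFFSETS(offsets, virt) (see the hunk below). As a rough, runnable
illustration of that decomposition for a 4KB granule with 9 index bits per
level, here is a minimal sketch; pt_offset() and the constants are
stand-ins for the real macros in Xen's lpae.h and may differ in detail:

/*
 * Illustrative only: how a virtual address splits into per-level table
 * indexes for a 4KB granule, as consumed by the page-table walk.
 * Xen's real DECLARE_OFFSETS()/lpae.h macros may differ in detail.
 */
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT      12                          /* 4KB pages */
#define LPAE_SHIFT      9                           /* 512 entries per table */
#define LPAE_ENTRY_MASK ((1u << LPAE_SHIFT) - 1)

/* level 0 = zeroeth, 1 = first, 2 = second, 3 = third (4KB leaf). */
static unsigned int pt_offset(uint64_t va, unsigned int level)
{
    return (va >> (PAGE_SHIFT + LPAE_SHIFT * (3 - level))) & LPAE_ENTRY_MASK;
}

int main(void)
{
    uint64_t va = 0x40201234ULL;
    unsigned int level;

    for ( level = 0; level < 4; level++ )
        printf("level %u index: %u\n", level, pt_offset(va, level));

    return 0;
}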

Comments

Stefano Stabellini June 12, 2019, 10:52 p.m. UTC | #1
On Tue, 14 May 2019, Julien Grall wrote:
> Currently, xen_pt_update_entry() is only able to update the region covered
> by xen_second (i.e 0 to 0x7fffffff).
> 
> Because of the restriction we end to have multiple functions in mm.c
> modifying the page-tables differently.
> 
> Furthermore, we never walked the page-tables fully. This means that any
> change in the layout may requires major rewrite of the page-tables code.
> 
> Lastly, we have been quite lucky that no one ever tried to pass an address
> outside this range because it would have blown-up.
> 
> xen_pt_update_entry() is reworked to walk over the page-tables every
> time. The logic has been borrowed from arch/arm/p2m.c and contain some
> limitations for the time being:
>     - Superpage cannot be shattered
>     - Only level 3 (i.e 4KB) can be done
> 
> Note that the parameter 'addr' has been renamed to 'virt' to make clear
> we are dealing with a virtual address.
> 
> Signed-off-by: Julien Grall <julien.grall@arm.com>
> Reviewed-by: Andrii Anisov <andrii_anisov@epam.com>
> 
> ---
>     Changes in v2:
>         - Add Andrii's reviewed-by
> ---
>  xen/arch/arm/mm.c | 121 +++++++++++++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 106 insertions(+), 15 deletions(-)
> 
> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> index f5979f549b..9a40754f44 100644
> --- a/xen/arch/arm/mm.c
> +++ b/xen/arch/arm/mm.c
> @@ -984,6 +984,53 @@ static void xen_unmap_table(const lpae_t *table)
>      unmap_domain_page(table);
>  }
>  
> +#define XEN_TABLE_MAP_FAILED 0
> +#define XEN_TABLE_SUPER_PAGE 1
> +#define XEN_TABLE_NORMAL_PAGE 2

Minor NIT: do we want to have XEN_TABLE_MAP_FAILED be -1 to follow the
pattern that errors are < 0? Not important though.


> +/*
> + * Take the currently mapped table, find the corresponding entry,
> + * and map the next table, if available.
> + *
> + * The read_only parameters indicates whether intermediate tables should
> + * be allocated when not present.

I wonder if it would be a good idea to rename read_only to something
more obviously connected to the idea that tables get created. Maybe
create_missing? It would have to match the variable and comment added
below in xen_pt_update_entry. I don't have a strong opinion on this.


> + * Return values:
> + *  XEN_TABLE_MAP_FAILED: Either read_only was set and the entry
> + *  was empty, or allocating a new page failed.
> + *  XEN_TABLE_NORMAL_PAGE: next level mapped normally
> + *  XEN_TABLE_SUPER_PAGE: The next entry points to a superpage.
> + */
> +static int xen_pt_next_level(bool read_only, unsigned int level,
> +                             lpae_t **table, unsigned int offset)
> +{
> +    lpae_t *entry;
> +    int ret;
> +
> +    entry = *table + offset;
> +
> +    if ( !lpae_is_valid(*entry) )
> +    {
> +        if ( read_only )
> +            return XEN_TABLE_MAP_FAILED;
> +
> +        ret = create_xen_table(entry);
> +        if ( ret )
> +            return XEN_TABLE_MAP_FAILED;
> +    }
> +
> +    ASSERT(lpae_is_valid(*entry));

Why the ASSERT just after the lpae_is_valid check above?


> +    /* The function xen_pt_next_level is never called at the 3rd level */
> +    if ( lpae_is_mapping(*entry, level) )
> +        return XEN_TABLE_SUPER_PAGE;
> +
> +    xen_unmap_table(*table);
> +    *table = xen_map_table(lpae_get_mfn(*entry));
> +
> +    return XEN_TABLE_NORMAL_PAGE;
> +}
> +
>  /* Sanity check of the entry */
>  static bool xen_pt_check_entry(lpae_t entry, mfn_t mfn, unsigned int flags)
>  {
> @@ -1043,30 +1090,65 @@ static bool xen_pt_check_entry(lpae_t entry, mfn_t mfn, unsigned int flags)
>      return true;
>  }
>  
> -static int xen_pt_update_entry(unsigned long addr, mfn_t mfn,
> -                               unsigned int flags)
> +static int xen_pt_update_entry(mfn_t root, unsigned long virt,
> +                               mfn_t mfn, unsigned int flags)
>  {
>      int rc;
> +    unsigned int level;
> +    /* We only support 4KB mapping (i.e level 3) for now */
> +    unsigned int target = 3;
> +    lpae_t *table;
> +    /*
> +     * The intermediate page tables are read-only when the MFN is not valid
> +     * and we are not populating page table.
> +     * This means we either modify permissions or remove an entry.
> +     */
> +    bool read_only = mfn_eq(mfn, INVALID_MFN) && !(flags & _PAGE_POPULATE);
>      lpae_t pte, *entry;
> -    lpae_t *third = NULL;
> +
> +    /* convenience aliases */
> +    DECLARE_OFFSETS(offsets, (paddr_t)virt);
>  
>      /* _PAGE_POPULATE and _PAGE_PRESENT should never be set together. */
>      ASSERT((flags & (_PAGE_POPULATE|_PAGE_PRESENT)) != (_PAGE_POPULATE|_PAGE_PRESENT));
>  
> -    entry = &xen_second[second_linear_offset(addr)];
> -    if ( !lpae_is_valid(*entry) || !lpae_is_table(*entry, 2) )
> +    table = xen_map_table(root);
> +    for ( level = HYP_PT_ROOT_LEVEL; level < target; level++ )
>      {
> -        int rc = create_xen_table(entry);
> -        if ( rc < 0 ) {
> -            printk("%s: L2 failed\n", __func__);
> -            return rc;
> +        rc = xen_pt_next_level(read_only, level, &table, offsets[level]);
> +        if ( rc == XEN_TABLE_MAP_FAILED )
> +        {
> +            /*
> +             * We are here because xen_pt_next_level has failed to map
> +             * the intermediate page table (e.g the table does not exist
> +             * and the pt is read-only). It is a valid case when
> +             * removing a mapping as it may not exist in the page table.
> +             * In this case, just ignore it.
> +             */
> +            if ( flags & (_PAGE_PRESENT|_PAGE_POPULATE) )
> +            {
> +                mm_printk("%s: Unable to map level %u\n", __func__, level);
> +                rc = -ENOENT;
> +                goto out;
> +            }
> +            else
> +            {
> +                rc = 0;
> +                goto out;
> +            }
>          }
> +        else if ( rc != XEN_TABLE_NORMAL_PAGE )
> +            break;
>      }
>  
> -    BUG_ON(!lpae_is_valid(*entry));
> +    if ( level != target )
> +    {
> +        mm_printk("%s: Shattering superpage is not supported\n", __func__);
> +        rc = -EOPNOTSUPP;
> +        goto out;
> +    }
>  
> -    third = xen_map_table(lpae_get_mfn(*entry));
> -    entry = &third[third_table_offset(addr)];
> +    entry = table + offsets[level];
>  
>      rc = -EINVAL;
>      if ( !xen_pt_check_entry(*entry, mfn, flags) )
> @@ -1103,7 +1185,7 @@ static int xen_pt_update_entry(unsigned long addr, mfn_t mfn,
>      rc = 0;
>  
>  out:
> -    xen_unmap_table(third);
> +    xen_unmap_table(table);
>  
>      return rc;
>  }
> @@ -1119,6 +1201,15 @@ static int xen_pt_update(unsigned long virt,
>      unsigned long addr = virt, addr_end = addr + nr_mfns * PAGE_SIZE;
>  
>      /*
> +     * For arm32, page-tables are different on each CPUs. Yet, they share
> +     * some common mappings. It is assumed that only common mappings
> +     * will be modified with this function.
> +     *
> +     * XXX: Add a check.
> +     */
> +    const mfn_t root = virt_to_mfn(THIS_CPU_PGTABLE);
> +
> +    /*
>       * The hardware was configured to forbid mapping both writeable and
>       * executable.
>       * When modifying/creating mapping (i.e _PAGE_PRESENT is set),
> @@ -1139,9 +1230,9 @@ static int xen_pt_update(unsigned long virt,
>  
>      spin_lock(&xen_pt_lock);
>  
> -    for( ; addr < addr_end; addr += PAGE_SIZE )
> +    for ( ; addr < addr_end; addr += PAGE_SIZE )
>      {
> -        rc = xen_pt_update_entry(addr, mfn, flags);
> +        rc = xen_pt_update_entry(root, addr, mfn, flags);
>          if ( rc )
>              break;
>  
> -- 
> 2.11.0
>
Julien Grall June 13, 2019, 8:20 a.m. UTC | #2
Hi Stefano,

On 6/12/19 11:52 PM, Stefano Stabellini wrote:
> On Tue, 14 May 2019, Julien Grall wrote:
>> Currently, xen_pt_update_entry() is only able to update the region covered
>> by xen_second (i.e 0 to 0x7fffffff).
>>
>> Because of the restriction we end to have multiple functions in mm.c
>> modifying the page-tables differently.
>>
>> Furthermore, we never walked the page-tables fully. This means that any
>> change in the layout may requires major rewrite of the page-tables code.
>>
>> Lastly, we have been quite lucky that no one ever tried to pass an address
>> outside this range because it would have blown-up.
>>
>> xen_pt_update_entry() is reworked to walk over the page-tables every
>> time. The logic has been borrowed from arch/arm/p2m.c and contain some
>> limitations for the time being:
>>      - Superpage cannot be shattered
>>      - Only level 3 (i.e 4KB) can be done
>>
>> Note that the parameter 'addr' has been renamed to 'virt' to make clear
>> we are dealing with a virtual address.
>>
>> Signed-off-by: Julien Grall <julien.grall@arm.com>
>> Reviewed-by: Andrii Anisov <andrii_anisov@epam.com>
>>
>> ---
>>      Changes in v2:
>>          - Add Andrii's reviewed-by
>> ---
>>   xen/arch/arm/mm.c | 121 +++++++++++++++++++++++++++++++++++++++++++++++-------
>>   1 file changed, 106 insertions(+), 15 deletions(-)
>>
>> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
>> index f5979f549b..9a40754f44 100644
>> --- a/xen/arch/arm/mm.c
>> +++ b/xen/arch/arm/mm.c
>> @@ -984,6 +984,53 @@ static void xen_unmap_table(const lpae_t *table)
>>       unmap_domain_page(table);
>>   }
>>   
>> +#define XEN_TABLE_MAP_FAILED 0
>> +#define XEN_TABLE_SUPER_PAGE 1
>> +#define XEN_TABLE_NORMAL_PAGE 2
> 
> Minor NIT: do we want to have XEN_TABLE_MAP_FAILED be -1 to follow the
> pattern that errors are < 0 ? Not important though.

The value of XEN_TABLE_* here does not matter; you can see it as an
open-coded enum. This was borrowed from arm/p2m.c (which was based on
x86/mm/p2m-pt.c).

For the time being, I would prefer to keep it as is because it makes it
easier to spot the differences with the p2m code. I can consider
switching the two to an enum afterwards.
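
For reference, a minimal sketch of what that possible enum conversion
could look like; the names simply mirror the open-coded values in the
patch, and nothing in this series actually makes the switch:

/* Illustrative only: an enum form of the open-coded XEN_TABLE_* values. */
enum xen_pt_next_level_ret {
    XEN_TABLE_MAP_FAILED,   /* entry missing (read_only) or allocation failed */
    XEN_TABLE_SUPER_PAGE,   /* the entry points to a superpage */
    XEN_TABLE_NORMAL_PAGE,  /* the next level was mapped normally */
};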

> 
> 
>> +/*
>> + * Take the currently mapped table, find the corresponding entry,
>> + * and map the next table, if available.
>> + *
>> + * The read_only parameters indicates whether intermediate tables should
>> + * be allocated when not present.
> 
> I wonder if it would be a good idea to rename read_only to something
> more obviously connected to the idea that tables get created. Maybe
> create_missing? It would have to match the variable and comment added
> below in xen_pt_update_entry. I don't have a strong opinion on this.

Same as above: the comment here is replicated from p2m.c.

> 
> 
>> + * Return values:
>> + *  XEN_TABLE_MAP_FAILED: Either read_only was set and the entry
>> + *  was empty, or allocating a new page failed.
>> + *  XEN_TABLE_NORMAL_PAGE: next level mapped normally
>> + *  XEN_TABLE_SUPER_PAGE: The next entry points to a superpage.
>> + */
>> +static int xen_pt_next_level(bool read_only, unsigned int level,
>> +                             lpae_t **table, unsigned int offset)
>> +{
>> +    lpae_t *entry;
>> +    int ret;
>> +
>> +    entry = *table + offset;
>> +
>> +    if ( !lpae_is_valid(*entry) )
>> +    {
>> +        if ( read_only )
>> +            return XEN_TABLE_MAP_FAILED;
>> +
>> +        ret = create_xen_table(entry);
>> +        if ( ret )
>> +            return XEN_TABLE_MAP_FAILED;
>> +    }
>> +
>> +    ASSERT(lpae_is_valid(*entry));
> 
> Why the ASSERT just after the lpae_is_valid check above?

When the entry is invalid, a new page table will be allocated and the
entry will be generated. The rest of the function will then be executed.
The ASSERT() here confirms that the entry we have in hand is valid in
all cases.

Cheers,
Stefano Stabellini June 13, 2019, 5:59 p.m. UTC | #3
On Thu, 13 Jun 2019, Julien Grall wrote:
> Hi Stefano,
> 
> On 6/12/19 11:52 PM, Stefano Stabellini wrote:
> > On Tue, 14 May 2019, Julien Grall wrote:
> > > Currently, xen_pt_update_entry() is only able to update the region covered
> > > by xen_second (i.e 0 to 0x7fffffff).
> > > 
> > > Because of the restriction we end to have multiple functions in mm.c
> > > modifying the page-tables differently.
> > > 
> > > Furthermore, we never walked the page-tables fully. This means that any
> > > change in the layout may requires major rewrite of the page-tables code.
> > > 
> > > Lastly, we have been quite lucky that no one ever tried to pass an address
> > > outside this range because it would have blown-up.
> > > 
> > > xen_pt_update_entry() is reworked to walk over the page-tables every
> > > time. The logic has been borrowed from arch/arm/p2m.c and contain some
> > > limitations for the time being:
> > >      - Superpage cannot be shattered
> > >      - Only level 3 (i.e 4KB) can be done
> > > 
> > > Note that the parameter 'addr' has been renamed to 'virt' to make clear
> > > we are dealing with a virtual address.
> > > 
> > > Signed-off-by: Julien Grall <julien.grall@arm.com>
> > > Reviewed-by: Andrii Anisov <andrii_anisov@epam.com>
> > > 
> > > ---
> > >      Changes in v2:
> > >          - Add Andrii's reviewed-by
> > > ---
> > >   xen/arch/arm/mm.c | 121
> > > +++++++++++++++++++++++++++++++++++++++++++++++-------
> > >   1 file changed, 106 insertions(+), 15 deletions(-)
> > > 
> > > diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> > > index f5979f549b..9a40754f44 100644
> > > --- a/xen/arch/arm/mm.c
> > > +++ b/xen/arch/arm/mm.c
> > > @@ -984,6 +984,53 @@ static void xen_unmap_table(const lpae_t *table)
> > >       unmap_domain_page(table);
> > >   }
> > >   +#define XEN_TABLE_MAP_FAILED 0
> > > +#define XEN_TABLE_SUPER_PAGE 1
> > > +#define XEN_TABLE_NORMAL_PAGE 2
> > 
> > Minor NIT: do we want to have XEN_TABLE_MAP_FAILED be -1 to follow the
> > pattern that errors are < 0 ? Not important though.
> 
> The value of XEN_TABLE_* here does not matter, you can see it as an open-coded
> enum. This was borrowed from arm/p2m.c (which was based on x86/mm/p2m-pt.c).
> 
> For the time being, I would prefer to keep it as is because it makes easier to
> spot the difference with the p2m code. I can consider switching the two to
> enum afterwards.

OK


> > 
> > > +/*
> > > + * Take the currently mapped table, find the corresponding entry,
> > > + * and map the next table, if available.
> > > + *
> > > + * The read_only parameters indicates whether intermediate tables should
> > > + * be allocated when not present.
> > 
> > I wonder if it would be a good idea to rename read_only to something
> > more obviously connected to the idea that tables get created. Maybe
> > create_missing? It would have to match the variable and comment added
> > below in xen_pt_update_entry. I don't have a strong opinion on this.
> 
> Same as above here, the comment is a replicate of p2m.c
 
OK


> > > + * Return values:
> > > + *  XEN_TABLE_MAP_FAILED: Either read_only was set and the entry
> > > + *  was empty, or allocating a new page failed.
> > > + *  XEN_TABLE_NORMAL_PAGE: next level mapped normally
> > > + *  XEN_TABLE_SUPER_PAGE: The next entry points to a superpage.
> > > + */
> > > +static int xen_pt_next_level(bool read_only, unsigned int level,
> > > +                             lpae_t **table, unsigned int offset)
> > > +{
> > > +    lpae_t *entry;
> > > +    int ret;
> > > +
> > > +    entry = *table + offset;
> > > +
> > > +    if ( !lpae_is_valid(*entry) )
> > > +    {
> > > +        if ( read_only )
> > > +            return XEN_TABLE_MAP_FAILED;
> > > +
> > > +        ret = create_xen_table(entry);
> > > +        if ( ret )
> > > +            return XEN_TABLE_MAP_FAILED;
> > > +    }
> > > +
> > > +    ASSERT(lpae_is_valid(*entry));
> > 
> > Why the ASSERT just after the lpae_is_valid check above?
> 
> When the entry is invalid, the new page table will be allocated and the entry
> will be generated. The rest of the function will then be executed. The
> ASSERT() here confirms the entry we have in hand is valid in all the cases.

So it's to double-check that, after getting through the `if' statement,
the entry has become valid. That is somewhat redundant given the two
error checks above, but it is still a valid check. OK.
Julien Grall June 13, 2019, 9:32 p.m. UTC | #4
Hi Stefano,

On 13/06/2019 18:59, Stefano Stabellini wrote:
> On Thu, 13 Jun 2019, Julien Grall wrote:
>>>> + * Return values:
>>>> + *  XEN_TABLE_MAP_FAILED: Either read_only was set and the entry
>>>> + *  was empty, or allocating a new page failed.
>>>> + *  XEN_TABLE_NORMAL_PAGE: next level mapped normally
>>>> + *  XEN_TABLE_SUPER_PAGE: The next entry points to a superpage.
>>>> + */
>>>> +static int xen_pt_next_level(bool read_only, unsigned int level,
>>>> +                             lpae_t **table, unsigned int offset)
>>>> +{
>>>> +    lpae_t *entry;
>>>> +    int ret;
>>>> +
>>>> +    entry = *table + offset;
>>>> +
>>>> +    if ( !lpae_is_valid(*entry) )
>>>> +    {
>>>> +        if ( read_only )
>>>> +            return XEN_TABLE_MAP_FAILED;
>>>> +
>>>> +        ret = create_xen_table(entry);
>>>> +        if ( ret )
>>>> +            return XEN_TABLE_MAP_FAILED;
>>>> +    }
>>>> +
>>>> +    ASSERT(lpae_is_valid(*entry));
>>>
>>> Why the ASSERT just after the lpae_is_valid check above?
>>
>> When the entry is invalid, the new page table will be allocated and the entry
>> will be generated. The rest of the function will then be executed. The
>> ASSERT() here confirms the entry we have in hand is valid in all the cases.
> 
> So it's to double-check that after getting into the `if' statement, the
> entry becomes valid, which is kind of redundant due to the two errors
> check above but it is still valid. OK.


While I agree that we have 2 ifs above, we only check "rc" and not "entry".

I would like to think that I write perfect code, but sadly this is not
always the case ;).

Here, it would catch any mistake where "rc" is zero but "entry" is still
invalid. The risk is that "entry" would be invalid but the mistake might
only be spotted a long time afterwards (i.e. when an access to the
mapping faults). This could cost a lot of debugging time.

I agree this is probably overcautious; I can't remember whether I have
hit the problem before. Anyway, I am happy to drop the ASSERT() if you
think it is too redundant.

Regardless of that, are you happy with the rest of the patch? If so, can
I get your acked-by/reviewed-by?

Cheers,

-- 
Julien Grall
Stefano Stabellini June 13, 2019, 10:57 p.m. UTC | #5
On Thu, 13 Jun 2019, Julien Grall wrote:
> Hi Stefano,
> 
> On 13/06/2019 18:59, Stefano Stabellini wrote:
> > On Thu, 13 Jun 2019, Julien Grall wrote: >>>> + * Return values:
> >>>> + *  XEN_TABLE_MAP_FAILED: Either read_only was set and the entry
> >>>> + *  was empty, or allocating a new page failed.
> >>>> + *  XEN_TABLE_NORMAL_PAGE: next level mapped normally
> >>>> + *  XEN_TABLE_SUPER_PAGE: The next entry points to a superpage.
> >>>> + */
> >>>> +static int xen_pt_next_level(bool read_only, unsigned int level,
> >>>> +                             lpae_t **table, unsigned int offset)
> >>>> +{
> >>>> +    lpae_t *entry;
> >>>> +    int ret;
> >>>> +
> >>>> +    entry = *table + offset;
> >>>> +
> >>>> +    if ( !lpae_is_valid(*entry) )
> >>>> +    {
> >>>> +        if ( read_only )
> >>>> +            return XEN_TABLE_MAP_FAILED;
> >>>> +
> >>>> +        ret = create_xen_table(entry);
> >>>> +        if ( ret )
> >>>> +            return XEN_TABLE_MAP_FAILED;
> >>>> +    }
> >>>> +
> >>>> +    ASSERT(lpae_is_valid(*entry));
> >>>
> >>> Why the ASSERT just after the lpae_is_valid check above?
> >>
> >> When the entry is invalid, the new page table will be allocated and the entry
> >> will be generated. The rest of the function will then be executed. The
> >> ASSERT() here confirms the entry we have in hand is valid in all the cases.
> > 
> > So it's to double-check that after getting into the `if' statement, the
> > entry becomes valid, which is kind of redundant due to the two errors
> > check above but it is still valid. OK.
> 
> While I agree that we have 2 ifs above, we only check "rc" and not "entry".
> 
> I ought to think I wrote perfect code, sadly this is not always the case ;).
> 
> Here, it would catch any mistake if "rc" is zero but "entry" is still 
> invalid. The risk here is the "entry" would be invalid but the mistake 
> may be spotted a long time after (i.e any access to the mapping will 
> fault). This would potentially cost a lot of debug.
> 
> I agree this is probably over cautious, I can't remember if I hit the 
> problem before. Anyway, I am happy to drop the ASSERT() if you think it 
> is too redundant.

I would drop it, but I don't care much about it.

> Regardless that, are you happy with the rest of the patch? If so, can I 
> get your acked-by/reviewed-by?

Yes, either way
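
To make the ASSERT() discussion above concrete, here is a small,
self-contained sketch of the pattern in plain C; entry_t,
allocate_table() and next_level() are hypothetical stand-ins, not the
Xen code:

#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the real Xen types and helpers. */
typedef struct { bool valid; } entry_t;

static int allocate_table(entry_t *entry)
{
    /* On success the entry is expected to become valid. */
    entry->valid = true;
    return 0;
}

/*
 * The two early returns catch the failures that are reported via a
 * return code. The assert() additionally catches the case where
 * allocate_table() returns 0 but the entry is somehow still invalid,
 * so such a bug surfaces here rather than as a fault on a much later
 * access to the mapping.
 */
static int next_level(bool read_only, entry_t *entry)
{
    if ( !entry->valid )
    {
        if ( read_only )
            return -1;

        if ( allocate_table(entry) )
            return -1;
    }

    assert(entry->valid);

    return 0;
}

int main(void)
{
    entry_t e = { .valid = false };

    printf("next_level returned %d\n", next_level(false, &e));

    return 0;
}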

Patch

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index f5979f549b..9a40754f44 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -984,6 +984,53 @@  static void xen_unmap_table(const lpae_t *table)
     unmap_domain_page(table);
 }
 
+#define XEN_TABLE_MAP_FAILED 0
+#define XEN_TABLE_SUPER_PAGE 1
+#define XEN_TABLE_NORMAL_PAGE 2
+
+/*
+ * Take the currently mapped table, find the corresponding entry,
+ * and map the next table, if available.
+ *
+ * The read_only parameters indicates whether intermediate tables should
+ * be allocated when not present.
+ *
+ * Return values:
+ *  XEN_TABLE_MAP_FAILED: Either read_only was set and the entry
+ *  was empty, or allocating a new page failed.
+ *  XEN_TABLE_NORMAL_PAGE: next level mapped normally
+ *  XEN_TABLE_SUPER_PAGE: The next entry points to a superpage.
+ */
+static int xen_pt_next_level(bool read_only, unsigned int level,
+                             lpae_t **table, unsigned int offset)
+{
+    lpae_t *entry;
+    int ret;
+
+    entry = *table + offset;
+
+    if ( !lpae_is_valid(*entry) )
+    {
+        if ( read_only )
+            return XEN_TABLE_MAP_FAILED;
+
+        ret = create_xen_table(entry);
+        if ( ret )
+            return XEN_TABLE_MAP_FAILED;
+    }
+
+    ASSERT(lpae_is_valid(*entry));
+
+    /* The function xen_pt_next_level is never called at the 3rd level */
+    if ( lpae_is_mapping(*entry, level) )
+        return XEN_TABLE_SUPER_PAGE;
+
+    xen_unmap_table(*table);
+    *table = xen_map_table(lpae_get_mfn(*entry));
+
+    return XEN_TABLE_NORMAL_PAGE;
+}
+
 /* Sanity check of the entry */
 static bool xen_pt_check_entry(lpae_t entry, mfn_t mfn, unsigned int flags)
 {
@@ -1043,30 +1090,65 @@  static bool xen_pt_check_entry(lpae_t entry, mfn_t mfn, unsigned int flags)
     return true;
 }
 
-static int xen_pt_update_entry(unsigned long addr, mfn_t mfn,
-                               unsigned int flags)
+static int xen_pt_update_entry(mfn_t root, unsigned long virt,
+                               mfn_t mfn, unsigned int flags)
 {
     int rc;
+    unsigned int level;
+    /* We only support 4KB mapping (i.e level 3) for now */
+    unsigned int target = 3;
+    lpae_t *table;
+    /*
+     * The intermediate page tables are read-only when the MFN is not valid
+     * and we are not populating page table.
+     * This means we either modify permissions or remove an entry.
+     */
+    bool read_only = mfn_eq(mfn, INVALID_MFN) && !(flags & _PAGE_POPULATE);
     lpae_t pte, *entry;
-    lpae_t *third = NULL;
+
+    /* convenience aliases */
+    DECLARE_OFFSETS(offsets, (paddr_t)virt);
 
     /* _PAGE_POPULATE and _PAGE_PRESENT should never be set together. */
     ASSERT((flags & (_PAGE_POPULATE|_PAGE_PRESENT)) != (_PAGE_POPULATE|_PAGE_PRESENT));
 
-    entry = &xen_second[second_linear_offset(addr)];
-    if ( !lpae_is_valid(*entry) || !lpae_is_table(*entry, 2) )
+    table = xen_map_table(root);
+    for ( level = HYP_PT_ROOT_LEVEL; level < target; level++ )
     {
-        int rc = create_xen_table(entry);
-        if ( rc < 0 ) {
-            printk("%s: L2 failed\n", __func__);
-            return rc;
+        rc = xen_pt_next_level(read_only, level, &table, offsets[level]);
+        if ( rc == XEN_TABLE_MAP_FAILED )
+        {
+            /*
+             * We are here because xen_pt_next_level has failed to map
+             * the intermediate page table (e.g the table does not exist
+             * and the pt is read-only). It is a valid case when
+             * removing a mapping as it may not exist in the page table.
+             * In this case, just ignore it.
+             */
+            if ( flags & (_PAGE_PRESENT|_PAGE_POPULATE) )
+            {
+                mm_printk("%s: Unable to map level %u\n", __func__, level);
+                rc = -ENOENT;
+                goto out;
+            }
+            else
+            {
+                rc = 0;
+                goto out;
+            }
         }
+        else if ( rc != XEN_TABLE_NORMAL_PAGE )
+            break;
     }
 
-    BUG_ON(!lpae_is_valid(*entry));
+    if ( level != target )
+    {
+        mm_printk("%s: Shattering superpage is not supported\n", __func__);
+        rc = -EOPNOTSUPP;
+        goto out;
+    }
 
-    third = xen_map_table(lpae_get_mfn(*entry));
-    entry = &third[third_table_offset(addr)];
+    entry = table + offsets[level];
 
     rc = -EINVAL;
     if ( !xen_pt_check_entry(*entry, mfn, flags) )
@@ -1103,7 +1185,7 @@  static int xen_pt_update_entry(unsigned long addr, mfn_t mfn,
     rc = 0;
 
 out:
-    xen_unmap_table(third);
+    xen_unmap_table(table);
 
     return rc;
 }
@@ -1119,6 +1201,15 @@  static int xen_pt_update(unsigned long virt,
     unsigned long addr = virt, addr_end = addr + nr_mfns * PAGE_SIZE;
 
     /*
+     * For arm32, page-tables are different on each CPUs. Yet, they share
+     * some common mappings. It is assumed that only common mappings
+     * will be modified with this function.
+     *
+     * XXX: Add a check.
+     */
+    const mfn_t root = virt_to_mfn(THIS_CPU_PGTABLE);
+
+    /*
      * The hardware was configured to forbid mapping both writeable and
      * executable.
      * When modifying/creating mapping (i.e _PAGE_PRESENT is set),
@@ -1139,9 +1230,9 @@  static int xen_pt_update(unsigned long virt,
 
     spin_lock(&xen_pt_lock);
 
-    for( ; addr < addr_end; addr += PAGE_SIZE )
+    for ( ; addr < addr_end; addr += PAGE_SIZE )
     {
-        rc = xen_pt_update_entry(addr, mfn, flags);
+        rc = xen_pt_update_entry(root, addr, mfn, flags);
         if ( rc )
             break;