diff mbox

[Xen-devel,6/6] xen: arm: use superpages in p2m when pages are suitably aligned

Message ID 1402394278-9850-6-git-send-email-ian.campbell@citrix.com
State New
Headers show

Commit Message

Ian Campbell June 10, 2014, 9:57 a.m. UTC
This creates superpage (1G and 2M) mappings in the p2m for both domU and dom0
when the guest and machine addresses are suitably aligned. This relies on the
domain builder to allocate suitably sized regions, this is trivially true for
1:1 mapped dom0's and was arranged for guests in a previous patch. A non-1:1
dom0's memory is not currently deliberately aligned.

Since ARM pagetables are (mostly) consistent at each level this is implemented
by refactoring the handling of a single level of pagetable into a common
function. The two inconsistencies are that there are no superpage mappings at
level zero and that level three entry mappings must set the table bit.

When inserting new mappings the code shatters superpage mappings as necessary,
but currently makes no attempt to coalesce anything again. In particular when
inserting a mapping which could be a superpage over an existing table mapping
we do not attempt to free that lower page table and instead descend into it.

Some p2m statistics are added to keep track of the number of each level of
mapping and the number of times we've had to shatter an existing mapping.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
 xen/arch/arm/domain.c     |    1 +
 xen/arch/arm/p2m.c        |  497 +++++++++++++++++++++++++++++++++------------
 xen/include/asm-arm/p2m.h |    9 +
 3 files changed, 373 insertions(+), 134 deletions(-)

Comments

Julien Grall June 10, 2014, 12:10 p.m. UTC | #1
Hi Ian,

This patch looks good in general. I've only few comments, see them below.

On 06/10/2014 10:57 AM, Ian Campbell wrote:
> +void p2m_dump_info(struct domain *d)
> +{
> +    struct p2m_domain *p2m = &d->arch.p2m;
> +
> +    spin_lock(&p2m->lock);
> +    printk("p2m mappings for domain %d (vmid %d):\n",
> +           d->domain_id, p2m->vmid);
> +    BUG_ON(p2m->stats.mappings[0] || p2m->stats.shattered[0]);
> +    printk("  1G mappings: %d (shattered %d)\n",
> +           p2m->stats.mappings[1], p2m->stats.shattered[1]);
> +    printk("  2M mappings: %d (shattered %d)\n",
> +           p2m->stats.mappings[2], p2m->stats.shattered[2]);
> +    printk("  4K mappings: %d\n", p2m->stats.mappings[3]);

I wondering if we can have more than 2^32-1 4K mapping sometimes...

[..]

> +    else
> +    {
> +        clear_page(p);
> +    }
> +

Spurious {}.

[..]

> +/*
> + * 0   == (P2M_ONE_DESCEND) continue to decend the tree

s/decend/descend/

[..]

> @@ -574,7 +803,7 @@ int guest_physmap_add_entry(struct domain *d,
>  {
>      return apply_p2m_changes(d, INSERT,
>                               pfn_to_paddr(gpfn),
> -                             pfn_to_paddr(gpfn + (1 << page_order)),
> +                             pfn_to_paddr(gpfn + (1 << page_order)) - 1,

> @@ -584,7 +813,7 @@ void guest_physmap_remove_page(struct domain *d,
>  {
>      apply_p2m_changes(d, REMOVE,
>                        pfn_to_paddr(gpfn),
> -                      pfn_to_paddr(gpfn + (1<<page_order)),
> +                      pfn_to_paddr(gpfn + (1<<page_order)) - 1,

Why did you add -1 on both function? This is inconsistent with
p2m_populate_ram which use the range [start, end[.

I think we should document how we pass the argument to apply_p2m_changes
somewhere. Even better, use gmfn and nr_gfn, as x86 does, for
apply_p2m_changes and handle internally the pfn_to_paddr stuff. This
would avoid this kind of problem.
	
Regards,
Ian Campbell June 10, 2014, 3:23 p.m. UTC | #2
On Tue, 2014-06-10 at 13:10 +0100, Julien Grall wrote:
> Hi Ian,
> 
> This patch looks good in general. I've only few comments, see them below.
> 
> On 06/10/2014 10:57 AM, Ian Campbell wrote:
> > +void p2m_dump_info(struct domain *d)
> > +{
> > +    struct p2m_domain *p2m = &d->arch.p2m;
> > +
> > +    spin_lock(&p2m->lock);
> > +    printk("p2m mappings for domain %d (vmid %d):\n",
> > +           d->domain_id, p2m->vmid);
> > +    BUG_ON(p2m->stats.mappings[0] || p2m->stats.shattered[0]);
> > +    printk("  1G mappings: %d (shattered %d)\n",
> > +           p2m->stats.mappings[1], p2m->stats.shattered[1]);
> > +    printk("  2M mappings: %d (shattered %d)\n",
> > +           p2m->stats.mappings[2], p2m->stats.shattered[2]);
> > +    printk("  4K mappings: %d\n", p2m->stats.mappings[3]);
> 
> I wondering if we can have more than 2^32-1 4K mapping sometimes...

That would be 16TB of memory, which isn't out of the question but we
don't currently support it. Still, not much harm in increase to a 64-bit
field.

> [..]
> 
> > +    else
> > +    {
> > +        clear_page(p);
> > +    }
> > +
> 
> Spurious {}.

I prefer them when on both halves of an else when one of them needs
them, but I guess I'm in a minority so I'll remove.

> > @@ -574,7 +803,7 @@ int guest_physmap_add_entry(struct domain *d,
> >  {
> >      return apply_p2m_changes(d, INSERT,
> >                               pfn_to_paddr(gpfn),
> > -                             pfn_to_paddr(gpfn + (1 << page_order)),
> > +                             pfn_to_paddr(gpfn + (1 << page_order)) - 1,
> 
> > @@ -584,7 +813,7 @@ void guest_physmap_remove_page(struct domain *d,
> >  {
> >      apply_p2m_changes(d, REMOVE,
> >                        pfn_to_paddr(gpfn),
> > -                      pfn_to_paddr(gpfn + (1<<page_order)),
> > +                      pfn_to_paddr(gpfn + (1<<page_order)) - 1,
> 
> Why did you add -1 on both function? This is inconsistent with
> p2m_populate_ram which use the range [start, end[.

I thihk I thought the that the callers of apply_p2m_changes were
inconsistent about this, but then I changed my mind and somehow failed
to remove this change.

> I think we should document how we pass the argument to apply_p2m_changes
> somewhere. Even better, use gmfn and nr_gfn, as x86 does, for
> apply_p2m_changes and handle internally the pfn_to_paddr stuff. This
> would avoid this kind of problem.

Arianna is sorting that all out in her mmio mapping series.

Ian.
Julien Grall June 10, 2014, 7:11 p.m. UTC | #3
On 10/06/14 16:23, Ian Campbell wrote:
> On Tue, 2014-06-10 at 13:10 +0100, Julien Grall wrote:
>> Hi Ian,
>>
>> This patch looks good in general. I've only few comments, see them below.
>>
>> On 06/10/2014 10:57 AM, Ian Campbell wrote:
>>> +void p2m_dump_info(struct domain *d)
>>> +{
>>> +    struct p2m_domain *p2m = &d->arch.p2m;
>>> +
>>> +    spin_lock(&p2m->lock);
>>> +    printk("p2m mappings for domain %d (vmid %d):\n",
>>> +           d->domain_id, p2m->vmid);
>>> +    BUG_ON(p2m->stats.mappings[0] || p2m->stats.shattered[0]);
>>> +    printk("  1G mappings: %d (shattered %d)\n",
>>> +           p2m->stats.mappings[1], p2m->stats.shattered[1]);
>>> +    printk("  2M mappings: %d (shattered %d)\n",
>>> +           p2m->stats.mappings[2], p2m->stats.shattered[2]);
>>> +    printk("  4K mappings: %d\n", p2m->stats.mappings[3]);
>>
>> I wondering if we can have more than 2^32-1 4K mapping sometimes...
>
> That would be 16TB of memory, which isn't out of the question but we
> don't currently support it. Still, not much harm in increase to a 64-bit
> field.

Sounds good.

>> I think we should document how we pass the argument to apply_p2m_changes
>> somewhere. Even better, use gmfn and nr_gfn, as x86 does, for
>> apply_p2m_changes and handle internally the pfn_to_paddr stuff. This
>> would avoid this kind of problem.
>
> Arianna is sorting that all out in her mmio mapping series.

Oh right. Some of you will have to rebase his/her series on top of the 
other one :).

Regards,
Ian Campbell June 11, 2014, 8:29 a.m. UTC | #4
On Tue, 2014-06-10 at 20:11 +0100, Julien Grall wrote:
> 
> On 10/06/14 16:23, Ian Campbell wrote:
> > On Tue, 2014-06-10 at 13:10 +0100, Julien Grall wrote:
> >> Hi Ian,
> >>
> >> This patch looks good in general. I've only few comments, see them below.
> >>
> >> On 06/10/2014 10:57 AM, Ian Campbell wrote:
> >>> +void p2m_dump_info(struct domain *d)
> >>> +{
> >>> +    struct p2m_domain *p2m = &d->arch.p2m;
> >>> +
> >>> +    spin_lock(&p2m->lock);
> >>> +    printk("p2m mappings for domain %d (vmid %d):\n",
> >>> +           d->domain_id, p2m->vmid);
> >>> +    BUG_ON(p2m->stats.mappings[0] || p2m->stats.shattered[0]);
> >>> +    printk("  1G mappings: %d (shattered %d)\n",
> >>> +           p2m->stats.mappings[1], p2m->stats.shattered[1]);
> >>> +    printk("  2M mappings: %d (shattered %d)\n",
> >>> +           p2m->stats.mappings[2], p2m->stats.shattered[2]);
> >>> +    printk("  4K mappings: %d\n", p2m->stats.mappings[3]);
> >>
> >> I wondering if we can have more than 2^32-1 4K mapping sometimes...
> >
> > That would be 16TB of memory, which isn't out of the question but we
> > don't currently support it. Still, not much harm in increase to a 64-bit
> > field.
> 
> Sounds good.

In the end I went with unsigned long which is the size of a pfn for
arm32 and arm64 and therefore not wasteful on arm32.

> >> I think we should document how we pass the argument to apply_p2m_changes
> >> somewhere. Even better, use gmfn and nr_gfn, as x86 does, for
> >> apply_p2m_changes and handle internally the pfn_to_paddr stuff. This
> >> would avoid this kind of problem.
> >
> > Arianna is sorting that all out in her mmio mapping series.
> 
> Oh right. Some of you will have to rebase his/her series on top of the 
> other one :).

Yes.

Ian.
diff mbox

Patch

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index e494112..45b7cbd 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -742,6 +742,7 @@  int domain_relinquish_resources(struct domain *d)
 
 void arch_dump_domain_info(struct domain *d)
 {
+    p2m_dump_info(d);
 }
 
 
diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index 233df72..1c2c724 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -1,6 +1,7 @@ 
 #include <xen/config.h>
 #include <xen/sched.h>
 #include <xen/lib.h>
+#include <xen/stdbool.h>
 #include <xen/errno.h>
 #include <xen/domain_page.h>
 #include <xen/bitops.h>
@@ -19,6 +20,22 @@ 
 #define p2m_table(pte) (p2m_valid(pte) && (pte).p2m.table)
 #define p2m_entry(pte) (p2m_valid(pte) && !(pte).p2m.table)
 
+void p2m_dump_info(struct domain *d)
+{
+    struct p2m_domain *p2m = &d->arch.p2m;
+
+    spin_lock(&p2m->lock);
+    printk("p2m mappings for domain %d (vmid %d):\n",
+           d->domain_id, p2m->vmid);
+    BUG_ON(p2m->stats.mappings[0] || p2m->stats.shattered[0]);
+    printk("  1G mappings: %d (shattered %d)\n",
+           p2m->stats.mappings[1], p2m->stats.shattered[1]);
+    printk("  2M mappings: %d (shattered %d)\n",
+           p2m->stats.mappings[2], p2m->stats.shattered[2]);
+    printk("  4K mappings: %d\n", p2m->stats.mappings[3]);
+    spin_unlock(&p2m->lock);
+}
+
 void dump_p2m_lookup(struct domain *d, paddr_t addr)
 {
     struct p2m_domain *p2m = &d->arch.p2m;
@@ -274,15 +291,26 @@  static inline void p2m_write_pte(lpae_t *p, lpae_t pte, bool_t flush_cache)
         clean_xen_dcache(*p);
 }
 
-/* Allocate a new page table page and hook it in via the given entry */
-static int p2m_create_table(struct domain *d, lpae_t *entry, bool_t flush_cache)
+/*
+ * Allocate a new page table page and hook it in via the given entry.
+ * apply_one_level relies on this returning 0 on success
+ * and -ve on failure.
+ *
+ * If the exist entry is present then it must be a mapping and not a
+ * table and it will be shattered into the next level down.
+ *
+ * level_shift is the number of bits at the level we want to create.
+ */
+static int p2m_create_table(struct domain *d, lpae_t *entry,
+                            int level_shift, bool_t flush_cache)
 {
     struct p2m_domain *p2m = &d->arch.p2m;
     struct page_info *page;
-    void *p;
+    lpae_t *p;
     lpae_t pte;
+    int splitting = entry->p2m.valid;
 
-    BUG_ON(entry->p2m.valid);
+    BUG_ON(entry->p2m.table);
 
     page = alloc_domheap_page(NULL, 0);
     if ( page == NULL )
@@ -291,9 +319,39 @@  static int p2m_create_table(struct domain *d, lpae_t *entry, bool_t flush_cache)
     page_list_add(page, &p2m->pages);
 
     p = __map_domain_page(page);
-    clear_page(p);
+    if ( splitting )
+    {
+        p2m_type_t t = entry->p2m.type;
+        unsigned long base_pfn = entry->p2m.base;
+        int i;
+
+        /*
+         * We are either splitting a first level 1G page into 512 second level
+         * 2M pages, or a second level 2M page into 512 third level 4K pages.
+         */
+         for ( i=0 ; i < LPAE_ENTRIES; i++ )
+         {
+             pte = mfn_to_p2m_entry(base_pfn + (i<<(level_shift-LPAE_SHIFT)),
+                                    MATTR_MEM, t);
+
+             /*
+              * First and second level super pages set p2m.table = 0, but
+              * third level entries set table = 1.
+              */
+             if ( level_shift - LPAE_SHIFT )
+                 pte.p2m.table = 0;
+
+             write_pte(&p[i], pte);
+         }
+    }
+    else
+    {
+        clear_page(p);
+    }
+
     if ( flush_cache )
         clean_xen_dcache_va_range(p, PAGE_SIZE);
+
     unmap_domain_page(p);
 
     pte = mfn_to_p2m_entry(page_to_mfn(page), MATTR_MEM, p2m_invalid);
@@ -311,8 +369,14 @@  enum p2m_operation {
     CACHEFLUSH,
 };
 
-static void p2m_put_page(const lpae_t pte)
+/* Put any references on the single 4K page referenced by pte.  TODO:
+ * Handle superpages, for now we only take special references for leaf
+ * pages (specifically foreign ones, which can't be super mapped today).
+ */
+static void p2m_put_l3_page(const lpae_t pte)
 {
+    ASSERT(p2m_valid(pte));
+
     /* TODO: Handle other p2m types
      *
      * It's safe to do the put_page here because page_alloc will
@@ -328,6 +392,248 @@  static void p2m_put_page(const lpae_t pte)
     }
 }
 
+/*
+ * Returns true if start_gpaddr..end_gpaddr contains at least one
+ * suitably aligned level_size mappping of maddr.
+ *
+ * So long as the range is large enough the end_gpaddr need not be
+ * aligned (callers should create one superpage mapping based on this
+ * result and then call this again on the new range, eventually the
+ * slop at the end will cause this function to return false).
+ */
+static bool_t is_mapping_aligned(const paddr_t start_gpaddr,
+                                 const paddr_t end_gpaddr,
+                                 const paddr_t maddr,
+                                 const paddr_t level_size)
+{
+    const paddr_t level_mask = level_size - 1;
+
+    /* No hardware superpages at level 0 */
+    if ( level_size == ZEROETH_SIZE )
+        return false;
+
+    /*
+     * A range smaller than the size of a superpage at this level
+     * cannot be superpage aligned.
+     */
+    if ( ( end_gpaddr - start_gpaddr ) < level_size - 1 )
+        return false;
+
+    /* Both the gpaddr and maddr must be aligned */
+    if ( start_gpaddr & level_mask )
+        return false;
+    if ( maddr & level_mask )
+        return false;
+    return true;
+}
+
+#define P2M_ONE_DESCEND        0
+#define P2M_ONE_PROGRESS_NOP   0x1
+#define P2M_ONE_PROGRESS       0x10
+
+/*
+ * 0   == (P2M_ONE_DESCEND) continue to decend the tree
+ * +ve == (P2M_ONE_PROGRESS_*) handled at this level, continue, flush,
+ *        entry, addr and maddr updated.  Return value is an
+ *        indication of the amount of work done (for preemption).
+ * -ve == (-Exxx) error.
+ */
+static int apply_one_level(struct domain *d,
+                           lpae_t *entry,
+                           unsigned int level,
+                           bool_t flush_cache,
+                           enum p2m_operation op,
+                           paddr_t start_gpaddr,
+                           paddr_t end_gpaddr,
+                           paddr_t *addr,
+                           paddr_t *maddr,
+                           bool_t *flush,
+                           int mattr,
+                           p2m_type_t t)
+{
+    /* Helpers to lookup the properties of each level */
+    const paddr_t level_sizes[] =
+        { ZEROETH_SIZE, FIRST_SIZE, SECOND_SIZE, THIRD_SIZE };
+    const paddr_t level_masks[] =
+        { ZEROETH_MASK, FIRST_MASK, SECOND_MASK, THIRD_MASK };
+    const paddr_t level_shifts[] =
+        { ZEROETH_SHIFT, FIRST_SHIFT, SECOND_SHIFT, THIRD_SHIFT };
+    const paddr_t level_size = level_sizes[level];
+    const paddr_t level_mask = level_masks[level];
+    const paddr_t level_shift = level_shifts[level];
+
+    struct p2m_domain *p2m = &d->arch.p2m;
+    lpae_t pte;
+    const lpae_t orig_pte = *entry;
+    int rc;
+
+    BUG_ON(level > 3);
+
+    switch ( op )
+    {
+    case ALLOCATE:
+        ASSERT(level < 3 || !p2m_valid(orig_pte));
+        ASSERT(*maddr == 0);
+
+        if ( p2m_valid(orig_pte) )
+            return P2M_ONE_DESCEND;
+
+        if ( is_mapping_aligned(*addr, end_gpaddr, 0, level_size) )
+        {
+            struct page_info *page;
+
+            page = alloc_domheap_pages(d, level_shift - PAGE_SHIFT, 0);
+            if ( page )
+            {
+                pte = mfn_to_p2m_entry(page_to_mfn(page), mattr, t);
+                if ( level != 3 )
+                    pte.p2m.table = 0;
+                p2m_write_pte(entry, pte, flush_cache);
+                p2m->stats.mappings[level]++;
+                return P2M_ONE_PROGRESS;
+            }
+            else if ( level == 3 )
+                return -ENOMEM;
+        }
+
+        BUG_ON(level == 3); /* L3 is always superpage aligned */
+
+        /*
+         * If we get here then we failed to allocate a sufficiently
+         * large contiguous region for this level (which can't be
+         * L3). Create a page table and continue to descend so we try
+         * smaller allocations.
+         */
+        rc = p2m_create_table(d, entry, 0, flush_cache);
+        if ( rc < 0 )
+            return rc;
+
+        return P2M_ONE_DESCEND;
+
+    case INSERT:
+        if ( is_mapping_aligned(*addr, end_gpaddr, *maddr, level_size) &&
+           /* We do not handle replacing an existing table with a superpage */
+             (level == 3 || !p2m_table(orig_pte)) )
+        {
+            /* New mapping is superpage aligned, make it */
+            pte = mfn_to_p2m_entry(*maddr >> PAGE_SHIFT, mattr, t);
+            if ( level < 3 )
+                pte.p2m.table = 0; /* Superpage entry */
+
+            p2m_write_pte(entry, pte, flush_cache);
+
+            *flush |= p2m_valid(orig_pte);
+
+            *addr += level_size;
+            *maddr += level_size;
+
+            if ( p2m_valid(orig_pte) )
+            {
+                /*
+                 * We can't currently get here for an existing table
+                 * mapping, since we don't handle replacing an
+                 * existing table with a superpage. If we did we would
+                 * need to handle freeing (and accounting) for the bit
+                 * of the p2m tree which we would be about to lop off.
+                 */
+                BUG_ON(level < 3 && p2m_table(orig_pte));
+                if ( level == 3 )
+                    p2m_put_l3_page(orig_pte);
+            }
+            else /* New mapping */
+                p2m->stats.mappings[level]++;
+
+            return P2M_ONE_PROGRESS;
+        }
+        else
+        {
+            /* New mapping is not superpage aligned, create a new table entry */
+            BUG_ON(level == 3); /* L3 is always superpage aligned */
+
+            /* Not present -> create table entry and descend */
+            if ( !p2m_valid(orig_pte) )
+            {
+                rc = p2m_create_table(d, entry, 0, flush_cache);
+                if ( rc < 0 )
+                    return rc;
+                return P2M_ONE_DESCEND;
+            }
+
+            /* Existing superpage mapping -> shatter and descend */
+            if ( p2m_entry(orig_pte) )
+            {
+                *flush = true;
+                rc = p2m_create_table(d, entry,
+                                      level_shift - PAGE_SHIFT, flush_cache);
+                if ( rc < 0 )
+                    return rc;
+
+                p2m->stats.shattered[level]++;
+                p2m->stats.mappings[level]--;
+                p2m->stats.mappings[level+1] += LPAE_ENTRIES;
+            } /* else: an existing table mapping -> descend */
+
+            BUG_ON(!entry->p2m.table);
+
+            return P2M_ONE_DESCEND;
+        }
+
+        break;
+
+    case RELINQUISH:
+    case REMOVE:
+        if ( !p2m_valid(orig_pte) )
+        {
+            /* Progress up to next boundary */
+            *addr = (*addr + level_size) & level_mask;
+            return P2M_ONE_PROGRESS_NOP;
+        }
+
+        if ( level < 3 && p2m_table(orig_pte) )
+            return P2M_ONE_DESCEND;
+
+        *flush = true;
+
+        memset(&pte, 0x00, sizeof(pte));
+        p2m_write_pte(entry, pte, flush_cache);
+
+        *addr += level_size;
+
+        p2m->stats.mappings[level]--;
+
+        if ( level == 3 )
+            p2m_put_l3_page(orig_pte);
+
+        /*
+         * This is still a single pte write, no matter the level, so no need to
+         * scale.
+         */
+        return P2M_ONE_PROGRESS;
+
+    case CACHEFLUSH:
+        if ( !p2m_valid(orig_pte) )
+        {
+            *addr = (*addr + level_size) & level_mask;
+            return P2M_ONE_PROGRESS_NOP;
+        }
+
+        /*
+         * could flush up to the next boundary, but would need to be
+         * careful about preemption, so just do one page now and loop.
+         */
+        *addr += PAGE_SIZE;
+        if ( p2m_is_ram(orig_pte.p2m.type) )
+        {
+            flush_page_to_ram(orig_pte.p2m.base + third_table_offset(*addr));
+            return P2M_ONE_PROGRESS;
+        }
+        else
+            return P2M_ONE_PROGRESS_NOP;
+    }
+
+    BUG(); /* Should never get here */
+}
+
 static int apply_p2m_changes(struct domain *d,
                      enum p2m_operation op,
                      paddr_t start_gpaddr,
@@ -344,9 +650,7 @@  static int apply_p2m_changes(struct domain *d,
                   cur_first_offset = ~0,
                   cur_second_offset = ~0;
     unsigned long count = 0;
-    unsigned int flush = 0;
-    bool_t populate = (op == INSERT || op == ALLOCATE);
-    lpae_t pte;
+    bool_t flush = false;
     bool_t flush_pt;
 
     /* Some IOMMU don't support coherent PT walk. When the p2m is
@@ -360,6 +664,25 @@  static int apply_p2m_changes(struct domain *d,
     addr = start_gpaddr;
     while ( addr < end_gpaddr )
     {
+        /*
+         * Arbitrarily, preempt every 512 operations or 8192 nops.
+         * 512*P2M_ONE_PROGRESS == 8192*P2M_ONE_PROGRESS_NOP == 0x2000
+         *
+         * count is initialised to 0 above, so we are guaranteed to
+         * always make at least one pass.
+         */
+
+        if ( op == RELINQUISH && count >= 0x2000 )
+        {
+            if ( hypercall_preempt_check() )
+            {
+                p2m->lowest_mapped_gfn = addr >> PAGE_SHIFT;
+                rc = -ERESTART;
+                goto out;
+            }
+            count = 0;
+        }
+
         if ( cur_first_page != p2m_first_level_index(addr) )
         {
             if ( first ) unmap_domain_page(first);
@@ -372,22 +695,18 @@  static int apply_p2m_changes(struct domain *d,
             cur_first_page = p2m_first_level_index(addr);
         }
 
-        if ( !p2m_valid(first[first_table_offset(addr)]) )
-        {
-            if ( !populate )
-            {
-                addr = (addr + FIRST_SIZE) & FIRST_MASK;
-                continue;
-            }
+        /* We only use a 3 level p2m at the moment, so no level 0,
+         * current hardware doesn't support super page mappings at
+         * level 0 anyway */
 
-            rc = p2m_create_table(d, &first[first_table_offset(addr)],
-                                  flush_pt);
-            if ( rc < 0 )
-            {
-                printk("p2m_populate_ram: L1 failed\n");
-                goto out;
-            }
-        }
+        rc = apply_one_level(d, &first[first_table_offset(addr)],
+                             1, flush_pt, op,
+                             start_gpaddr, end_gpaddr,
+                             &addr, &maddr, &flush,
+                             mattr, t);
+        if ( rc < 0 ) goto out;
+        count += rc;
+        if ( rc != P2M_ONE_DESCEND ) continue;
 
         BUG_ON(!p2m_valid(first[first_table_offset(addr)]));
 
@@ -399,23 +718,16 @@  static int apply_p2m_changes(struct domain *d,
         }
         /* else: second already valid */
 
-        if ( !p2m_valid(second[second_table_offset(addr)]) )
-        {
-            if ( !populate )
-            {
-                addr = (addr + SECOND_SIZE) & SECOND_MASK;
-                continue;
-            }
+        rc = apply_one_level(d,&second[second_table_offset(addr)],
+                             2, flush_pt, op,
+                             start_gpaddr, end_gpaddr,
+                             &addr, &maddr, &flush,
+                             mattr, t);
+        if ( rc < 0 ) goto out;
+        count += rc;
+        if ( rc != P2M_ONE_DESCEND ) continue;
 
-            rc = p2m_create_table(d, &second[second_table_offset(addr)],
-                                  flush_pt);
-            if ( rc < 0 ) {
-                printk("p2m_populate_ram: L2 failed\n");
-                goto out;
-            }
-        }
-
-        BUG_ON(!second[second_table_offset(addr)].p2m.valid);
+        BUG_ON(!p2m_valid(second[second_table_offset(addr)]));
 
         if ( cur_second_offset != second_table_offset(addr) )
         {
@@ -425,98 +737,15 @@  static int apply_p2m_changes(struct domain *d,
             cur_second_offset = second_table_offset(addr);
         }
 
-        pte = third[third_table_offset(addr)];
-
-        flush |= pte.p2m.valid;
-
-        /* TODO: Handle other p2m type
-         *
-         * It's safe to do the put_page here because page_alloc will
-         * flush the TLBs if the page is reallocated before the end of
-         * this loop.
-         */
-        if ( pte.p2m.valid && p2m_is_foreign(pte.p2m.type) )
-        {
-            unsigned long mfn = pte.p2m.base;
-
-            ASSERT(mfn_valid(mfn));
-            put_page(mfn_to_page(mfn));
-        }
-
-        switch (op) {
-            case ALLOCATE:
-                {
-                    /* Allocate a new RAM page and attach */
-                    struct page_info *page;
-
-                    ASSERT(!pte.p2m.valid);
-                    rc = -ENOMEM;
-                    page = alloc_domheap_page(d, 0);
-                    if ( page == NULL ) {
-                        printk("p2m_populate_ram: failed to allocate page\n");
-                        goto out;
-                    }
-
-                    pte = mfn_to_p2m_entry(page_to_mfn(page), mattr, t);
-
-                    p2m_write_pte(&third[third_table_offset(addr)],
-                                  pte, flush_pt);
-                }
-                break;
-            case INSERT:
-                {
-                    if ( pte.p2m.valid )
-                        p2m_put_page(pte);
-                    pte = mfn_to_p2m_entry(maddr >> PAGE_SHIFT, mattr, t);
-                    p2m_write_pte(&third[third_table_offset(addr)],
-                                  pte, flush_pt);
-                    maddr += PAGE_SIZE;
-                }
-                break;
-            case RELINQUISH:
-            case REMOVE:
-                {
-                    if ( !pte.p2m.valid )
-                    {
-                        count++;
-                        break;
-                    }
-
-                    p2m_put_page(pte);
-
-                    count += 0x10;
-
-                    memset(&pte, 0x00, sizeof(pte));
-                    p2m_write_pte(&third[third_table_offset(addr)],
-                                  pte, flush_pt);
-                    count++;
-                }
-                break;
-
-            case CACHEFLUSH:
-                {
-                    if ( !pte.p2m.valid || !p2m_is_ram(pte.p2m.type) )
-                        break;
-
-                    flush_page_to_ram(pte.p2m.base);
-                }
-                break;
-        }
-
-        /* Preempt every 2MiB (mapped) or 32 MiB (unmapped) - arbitrary */
-        if ( op == RELINQUISH && count >= 0x2000 )
-        {
-            if ( hypercall_preempt_check() )
-            {
-                p2m->lowest_mapped_gfn = addr >> PAGE_SHIFT;
-                rc = -ERESTART;
-                goto out;
-            }
-            count = 0;
-        }
-
-        /* Got the next page */
-        addr += PAGE_SIZE;
+        rc = apply_one_level(d, &third[third_table_offset(addr)],
+                             3, flush_pt, op,
+                             start_gpaddr, end_gpaddr,
+                             &addr, &maddr, &flush,
+                             mattr, t);
+        if ( rc < 0 ) goto out;
+        /* L3 had better have done something! We cannot descend any further */
+        BUG_ON(rc == P2M_ONE_DESCEND);
+        count += rc;
     }
 
     if ( flush )
@@ -574,7 +803,7 @@  int guest_physmap_add_entry(struct domain *d,
 {
     return apply_p2m_changes(d, INSERT,
                              pfn_to_paddr(gpfn),
-                             pfn_to_paddr(gpfn + (1 << page_order)),
+                             pfn_to_paddr(gpfn + (1 << page_order)) - 1,
                              pfn_to_paddr(mfn), MATTR_MEM, t);
 }
 
@@ -584,7 +813,7 @@  void guest_physmap_remove_page(struct domain *d,
 {
     apply_p2m_changes(d, REMOVE,
                       pfn_to_paddr(gpfn),
-                      pfn_to_paddr(gpfn + (1<<page_order)),
+                      pfn_to_paddr(gpfn + (1<<page_order)) - 1,
                       pfn_to_paddr(mfn), MATTR_MEM, p2m_invalid);
 }
 
diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
index 911d32d..0cfbb09 100644
--- a/xen/include/asm-arm/p2m.h
+++ b/xen/include/asm-arm/p2m.h
@@ -29,6 +29,12 @@  struct p2m_domain {
      * resume the search. Apart from during teardown this can only
      * decrease. */
     unsigned long lowest_mapped_gfn;
+
+    struct {
+        uint32_t mappings[4]; /* Number of mappings at each p2m tree level */
+        uint32_t shattered[4]; /* Number of times we have shattered a mapping
+                                * at each p2m tree level. */
+    } stats;
 };
 
 /* List of possible type for each page in the p2m entry.
@@ -79,6 +85,9 @@  int p2m_alloc_table(struct domain *d);
 void p2m_save_state(struct vcpu *p);
 void p2m_restore_state(struct vcpu *n);
 
+/* */
+void p2m_dump_info(struct domain *d);
+
 /* Look up the MFN corresponding to a domain's PFN. */
 paddr_t p2m_lookup(struct domain *d, paddr_t gpfn, p2m_type_t *t);