diff mbox

[Xen-devel,v2,8/8] xen: arm: allocate more than one bank for 1:1 domain 0 if needed

Message ID 1402504804-29173-8-git-send-email-ian.campbell@citrix.com
State New
Headers show

Commit Message

Ian Campbell June 11, 2014, 4:40 p.m. UTC
Depending on where Xen and the initial modules were loaded into RAM we may not
be able to allocate a suitably contiguous block of memory for dom0. Especially
since the allocations made by alloc_domheap_pages are naturally aligned (e.g. a
1GB allocation must be at a 1GB boundary).

The alignment requirement in particular is a huge limitation, meaning that even
dom0_mem0=1G on a 2GB system is pretty likely to fail unless you are very
careful with your load addresses. People were also having trouble with dom0 >
128MB on the 1GB cubieboard2 when using fairly standard load addresses for
things.

This patch tries to allocate multiple banks of memory in order to try and
satisfy the entire requested domain 0 allocation. Sadly this turns out to be
pretty tricky to arrange (see the large comment in the code).

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Karim Raslan <karim.allah.ahmed@gmail.com>
---
v2: New patch
---
 xen/arch/arm/domain_build.c |  278 +++++++++++++++++++++++++++++++++++++++----
 1 file changed, 256 insertions(+), 22 deletions(-)

Comments

Julien Grall June 11, 2014, 10:47 p.m. UTC | #1
Hi Ian,

On 11/06/14 17:40, Ian Campbell wrote:
> +        bank->start = start;
> +        bank->size = size;
> +        kinfo->mem.nr_banks++;
> +        return true;
> +    }
> +
> +    panic("Failed add pages to DOM0 memory bank");

This can happen when there is no more bank, right? How likely will Xen 
fill the 8 banks?

Shall we just continue with only the current number of bank?

> +}
> +
> +/*
> + * This is all pretty horrible.
> + *
> + * Requirements:
> + *
> + * 1. The dom0 kernel should be loaded within the first 128MB of RAM. This
> + *    is necessary at least for Linux zImage kernels, which are all we
> + *    support today.
> + * 2. We want to put the dom0 kernel, ramdisk and DTB in the same
> + *    bank. Partly this is just easier for us to deal with, but also
> + *    the ramdisk and DTB must be placed within a certain proximity of
> + *    the kernel within RAM.
> + * 3. For 32-bit dom0 we want to place as much of the RAM as we
> + *    reasonably can below 4GB, so that it can be used by non-LPAE
> + *    enabled kernels.
> + * 4. For 32-bit dom0 the kernel must be located below 4GB.
> + * 5. We want to have a few largers banks rather than many smaller ones.
> + *
> + * For the first two requirements we need to make sure that the lowest
> + * bank is sufficiently large.
> + *
> + * For convenience we also sort the banks by physical address.

IIRC, some kernel version (< 3.15) is assuming that the memory bank are 
ordered.

Regards,
Ian Campbell June 12, 2014, 7:31 a.m. UTC | #2
On Wed, 2014-06-11 at 23:47 +0100, Julien Grall wrote:
> Hi Ian,
> 
> On 11/06/14 17:40, Ian Campbell wrote:
> > +        bank->start = start;
> > +        bank->size = size;
> > +        kinfo->mem.nr_banks++;
> > +        return true;
> > +    }
> > +
> > +    panic("Failed add pages to DOM0 memory bank");
> 
> This can happen when there is no more bank, right? How likely will Xen 
> fill the 8 banks?
> 
> Shall we just continue with only the current number of bank?

That was my intention, I think I just got confused about the
circumstances which could lead to particular failure.

Ian.
Julien Grall June 17, 2014, 5:58 p.m. UTC | #3
Hi Ian,

On 06/11/2014 05:40 PM, Ian Campbell wrote:
> +    /*
> +     * First try and allocate the largest thing we can as low as
> +     * possible to be bank 0.
> +     */
> +    while ( order > min_low_order )
> +    {
> +        for ( bits = order ; bits < (lowmem ? 32 : PADDR_BITS); bits++ )
> +        {
> +            pg = alloc_domheap_pages(d, order, MEMF_bits(bits));
> +            if ( pg != NULL )
> +                goto got_bank0;
> +        }
> +        order--;
> +    }
> +
> +    panic("Unable to allocate first memory bank");

I gave a try to this patch in stand-alone on the versatile express and I
hit this panic.

Xen is trying to allocate 128Mb for the first bank. It was working
without this patch.

Regards,
Ian Campbell June 18, 2014, 8:27 a.m. UTC | #4
On Tue, 2014-06-17 at 18:58 +0100, Julien Grall wrote:
> Hi Ian,
> 
> On 06/11/2014 05:40 PM, Ian Campbell wrote:
> > +    /*
> > +     * First try and allocate the largest thing we can as low as
> > +     * possible to be bank 0.
> > +     */
> > +    while ( order > min_low_order )
> > +    {
> > +        for ( bits = order ; bits < (lowmem ? 32 : PADDR_BITS); bits++ )
> > +        {
> > +            pg = alloc_domheap_pages(d, order, MEMF_bits(bits));
> > +            if ( pg != NULL )
> > +                goto got_bank0;
> > +        }
> > +        order--;
> > +    }
> > +
> > +    panic("Unable to allocate first memory bank");
> 
> I gave a try to this patch in stand-alone on the versatile express and I
> hit this panic.
> 
> Xen is trying to allocate 128Mb for the first bank. It was working
> without this patch.

What is your dom0_mem and how much ram does the system have?
Julien Grall June 18, 2014, 9:10 a.m. UTC | #5
On 18/06/14 09:27, Ian Campbell wrote:
> On Tue, 2014-06-17 at 18:58 +0100, Julien Grall wrote:
>> Hi Ian,
>>
>> On 06/11/2014 05:40 PM, Ian Campbell wrote:
>>> +    /*
>>> +     * First try and allocate the largest thing we can as low as
>>> +     * possible to be bank 0.
>>> +     */
>>> +    while ( order > min_low_order )
>>> +    {
>>> +        for ( bits = order ; bits < (lowmem ? 32 : PADDR_BITS); bits++ )
>>> +        {
>>> +            pg = alloc_domheap_pages(d, order, MEMF_bits(bits));
>>> +            if ( pg != NULL )
>>> +                goto got_bank0;
>>> +        }
>>> +        order--;
>>> +    }
>>> +
>>> +    panic("Unable to allocate first memory bank");
>>
>> I gave a try to this patch in stand-alone on the versatile express and I
>> hit this panic.
>>
>> Xen is trying to allocate 128Mb for the first bank. It was working
>> without this patch.
> 
> What is your dom0_mem and how much ram does the system have?

I use the default value from Xen i.e 128MB. The platform has 1GB of RAM
(see below Xen log with early printk enabled).

- UART enabled -
- CPU 00000000 booting -
- Xen starting in Hyp mode -
- Zero BSS -
- Setting up control registers -
- Turning on paging -
- Ready -
(XEN) Checking for initrd in /chosen
(XEN) RAM: 0000000080000000 - 000000009fffffff
(XEN) RAM: 00000000a0000000 - 00000000bfffffff
(XEN)
(XEN) MODULE[1]: 000000009fee6000 - 000000009feea000
(XEN) MODULE[2]: 00000000a0008000 - 00000000a033f458
(XEN)  RESVD[0]: 0000000081f00000 - 0000000081f04000
(XEN)  RESVD[1]: 000000009fee6000 - 000000009feea000
(XEN)
(XEN) Command line: noreboot sync_console console=dtuart dtuart=serial0
(XEN) Placing Xen at 0x00000000bfe00000-0x00000000c0000000
(XEN) Xen heap: 00000000b6000000-00000000be000000 (32768 pages)
(XEN) Dom heap: 229376 pages
(XEN) Domain heap initialised
(XEN) Looking for UART console serial0
Xen 4.5-unstable
(XEN) Xen version 4.5-unstable (julien@cam.xci-test.com) (arm-linux-gnueabihf-gcc (crosstool-NG linaro-1.13.1-4.8-2014.01 - Linaro GCC 2013.11) 4.8.3 20140106 (prerelease)) debug=y Wed Jun 18 09:52:52 BST 2014
(XEN) Latest ChangeSet: Wed Jun 11 17:40:04 2014 +0100 git:bdee6c6
(XEN) Console output is synchronous.
(XEN) Processor: 412fc0f1: "ARM Limited", variant: 0x2, part 0xc0f, rev 0x1
(XEN) 32-bit Execution:
(XEN)   Processor Features: 00001131:00011011
(XEN)     Instruction Sets: AArch32 Thumb Thumb-2 ThumbEE Jazelle
(XEN)     Extensions: GenericTimer Security
(XEN)   Debug Features: 02010555
(XEN)   Auxiliary Features: 00000000
(XEN)   Memory Model Features: 10201105 20000000 01240000 02102211
(XEN)  ISA Features: 02101110 13112111 21232041 11112131 10011142 00000000
(XEN) Platform: VERSATILE EXPRESS
(XEN) Set SYS_FLAGS to 00000000bfe0004c (0020004c)
(XEN) Generic Timer IRQ: phys=30 hyp=26 virt=27
(XEN) Using generic timer at 24000 KHz
(XEN) GIC initialization:
(XEN)         gic_dist_addr=000000002c001000
(XEN)         gic_cpu_addr=000000002c002000
(XEN)         gic_hyp_addr=000000002c004000
(XEN)         gic_vcpu_addr=000000002c006000
(XEN)         gic_maintenance_irq=25
(XEN) GIC: 192 lines, 5 cpus, secure (IID 0200043b).
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) I/O virtualisation disabled
(XEN) Allocated console ring of 16 KiB.
(XEN) VFP implementer 0x41 architecture 4 part 0x30 variant 0xf rev 0x0
(XEN) Bringing up CPU1
- CPU 00000001 booting -
- Xen starting in Hyp mode -
- Setting up control registers -
- Turning on paging -
- Ready -
(XEN) CPU 1 booted.
(XEN) Brought up 2 CPUs
(XEN) *** LOADING DOMAIN 0 ***
(XEN) Loading kernel from boot module 2
(XEN) Allocating 1:1 mappings totalling 128MB for dom0:
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Unable to allocate first memory bank
(XEN) ****************************************
(XEN)
(XEN) Manual reset required ('noreboot' specified)

Regards,
Ian Campbell June 18, 2014, 9:36 a.m. UTC | #6
On Wed, 2014-06-18 at 10:10 +0100, Julien Grall wrote:
> 
> On 18/06/14 09:27, Ian Campbell wrote:
> > On Tue, 2014-06-17 at 18:58 +0100, Julien Grall wrote:
> >> Hi Ian,
> >>
> >> On 06/11/2014 05:40 PM, Ian Campbell wrote:
> >>> +    /*
> >>> +     * First try and allocate the largest thing we can as low as
> >>> +     * possible to be bank 0.
> >>> +     */
> >>> +    while ( order > min_low_order )
> >>> +    {
> >>> +        for ( bits = order ; bits < (lowmem ? 32 : PADDR_BITS); bits++ )
> >>> +        {
> >>> +            pg = alloc_domheap_pages(d, order, MEMF_bits(bits));
> >>> +            if ( pg != NULL )
> >>> +                goto got_bank0;
> >>> +        }
> >>> +        order--;
> >>> +    }
> >>> +
> >>> +    panic("Unable to allocate first memory bank");
> >>
> >> I gave a try to this patch in stand-alone on the versatile express and I
> >> hit this panic.
> >>
> >> Xen is trying to allocate 128Mb for the first bank. It was working
> >> without this patch.
> > 
> > What is your dom0_mem and how much ram does the system have?
> 
> I use the default value from Xen i.e 128MB. The platform has 1GB of RAM
> (see below Xen log with early printk enabled).

Thanks, how strange. Perhaps I got min_low_order wrong, or that loop is
knackered in some way. I'll check.
diff mbox

Patch

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 46a3619..f37127f 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -44,6 +44,13 @@  custom_param("dom0_mem", parse_dom0_mem);
 # define DPRINT(fmt, args...) do {} while ( 0 )
 #endif
 
+//#define DEBUG_11_ALLOCATION
+#ifdef DEBUG_11_ALLOCATION
+# define D11PRINT(fmt, args...) printk(XENLOG_DEBUG fmt, ##args)
+#else
+# define D11PRINT(fmt, args...) do {} while ( 0 )
+#endif
+
 /*
  * Amount of extra space required to dom0's device tree.  No new nodes
  * are added (yet) but one terminating reserve map entry (16 bytes) is
@@ -66,39 +73,266 @@  struct vcpu *__init alloc_dom0_vcpu0(struct domain *dom0)
     return alloc_vcpu(dom0, 0, 0);
 }
 
-static void allocate_memory_11(struct domain *d, struct kernel_info *kinfo)
+static unsigned int get_11_allocation_size(paddr_t size)
 {
-    paddr_t start;
-    paddr_t size;
-    struct page_info *pg;
-    unsigned int order = get_order_from_bytes(dom0_mem);
-    int res;
-    paddr_t spfn;
+    /*
+     * get_order_from_bytes returns the order greater than or equal to
+     * the given size, but we need less than or equal. Adding one to
+     * the size pushes an evenly aligned size into the next order, so
+     * we can then unconditionally subtract 1 from the order which is
+     * returned.
+     */
+    return get_order_from_bytes(size + 1) - 1;
+}
 
-    if ( is_32bit_domain(d) )
-        pg = alloc_domheap_pages(d, order, MEMF_bits(32));
-    else
-        pg = alloc_domheap_pages(d, order, 0);
-    if ( !pg )
-        panic("Failed to allocate contiguous memory for dom0");
+/*
+ * Insert the given pages into a memory bank, banks are ordered by address.
+ *
+ * Returns false if the memory would be below bank 0.
+ */
+static bool_t insert_11_bank(struct domain *d,
+                             struct kernel_info *kinfo,
+                             struct page_info *pg,
+                             unsigned int order)
+{
+    int res, i;
+    paddr_t spfn;
+    paddr_t start, size;
 
     spfn = page_to_mfn(pg);
     start = pfn_to_paddr(spfn);
     size = pfn_to_paddr((1 << order));
 
-    // 1:1 mapping
-    printk("Populate P2M %#"PRIx64"->%#"PRIx64" (1:1 mapping for dom0)\n",
-           start, start + size);
-    res = guest_physmap_add_page(d, spfn, spfn, order);
+    D11PRINT("Allocated %#"PRIpaddr"-%#"PRIpaddr" (%ldMB/%ldMB, order %d)\n",
+             start, start + size,
+             1UL << (order+PAGE_SHIFT-20),
+             /* Don't want format this as PRIpaddr (16 digit hex) */
+             (unsigned long)(kinfo->unassigned_mem >> 20),
+             order);
 
-    if ( res )
-        panic("Unable to add pages in DOM0: %d", res);
+    if ( kinfo->mem.nr_banks > 0 && start + size < kinfo->mem.bank[0].start )
+    {
+        D11PRINT("Allocation is below bank 0, not using\n");
+        free_domheap_pages(pg, order);
+        return false;
+    }
 
-    kinfo->mem.bank[0].start = start;
-    kinfo->mem.bank[0].size = size;
-    kinfo->mem.nr_banks = 1;
+    res = guest_physmap_add_page(d, spfn, spfn, order);
+    if ( res )
+        panic("Failed map pages to DOM0: %d", res);
 
     kinfo->unassigned_mem -= size;
+
+    if ( kinfo->mem.nr_banks == 0 )
+    {
+        kinfo->mem.bank[0].start = start;
+        kinfo->mem.bank[0].size = size;
+        kinfo->mem.nr_banks = 1;
+        return true;
+    }
+
+    for( i = 0; i < kinfo->mem.nr_banks; i++ )
+    {
+        struct membank *bank = &kinfo->mem.bank[i];
+
+        /* If possible merge new memory into the start of the bank */
+        if ( bank->start == start+size )
+        {
+            bank->start = start;
+            bank->size += size;
+            return true;
+        }
+
+        /* If possible merge new memory onto the end of the bank */
+        if ( start == bank->start + bank->size )
+        {
+            bank->size += size;
+            return true;
+        }
+
+        /*
+         * Otherwise if it is below this bank insert new memory in a
+         * new bank before this one. If there was a lower bank we
+         * could have inserted the memory into/before we would already
+         * have done so, so this must be the right place.
+         */
+        if ( start + size < bank->start && kinfo->mem.nr_banks < NR_MEM_BANKS )
+        {
+            memmove(bank + 1, bank, sizeof(*bank)*(kinfo->mem.nr_banks - i));
+            kinfo->mem.nr_banks++;
+            bank->start = start;
+            bank->size = size;
+            return true;
+        }
+    }
+
+    if ( i == kinfo->mem.nr_banks && kinfo->mem.nr_banks < NR_MEM_BANKS )
+    {
+        struct membank *bank = &kinfo->mem.bank[kinfo->mem.nr_banks];
+
+        bank->start = start;
+        bank->size = size;
+        kinfo->mem.nr_banks++;
+        return true;
+    }
+
+    panic("Failed add pages to DOM0 memory bank");
+}
+
+/*
+ * This is all pretty horrible.
+ *
+ * Requirements:
+ *
+ * 1. The dom0 kernel should be loaded within the first 128MB of RAM. This
+ *    is necessary at least for Linux zImage kernels, which are all we
+ *    support today.
+ * 2. We want to put the dom0 kernel, ramdisk and DTB in the same
+ *    bank. Partly this is just easier for us to deal with, but also
+ *    the ramdisk and DTB must be placed within a certain proximity of
+ *    the kernel within RAM.
+ * 3. For 32-bit dom0 we want to place as much of the RAM as we
+ *    reasonably can below 4GB, so that it can be used by non-LPAE
+ *    enabled kernels.
+ * 4. For 32-bit dom0 the kernel must be located below 4GB.
+ * 5. We want to have a few largers banks rather than many smaller ones.
+ *
+ * For the first two requirements we need to make sure that the lowest
+ * bank is sufficiently large.
+ *
+ * For convenience we also sort the banks by physical address.
+ *
+ * The memory allocator does not really give us the flexibility to
+ * meet these requirements directly. So instead of proceed as follows:
+ *
+ * We first allocate the largest allocation we can as low as we
+ * can. This then becomes the first bank. This bank must be at least
+ * 128MB (or dom0_mem if that is smaller).
+ *
+ * Then we start allocating more memory, trying to allocate the
+ * largest possible size and trying smaller sizes until we
+ * successfully allocate something.
+ *
+ * We then try and insert this memory in to the list of banks. If it
+ * can be merged into an existing bank then this is trivial.
+ *
+ * If the new memory is before the first bank (and cannot be merged
+ * into it) then we give up. Since the allocator prefers to allocate
+ * high addresses first and the first bank has already been allocated
+ * to be as low as possible this likely means we wouldn't have been
+ * able to allocate much more memory anyway.
+ *
+ * Otherwise we insert a new bank. If we've reached MAX_NR_BANKS then
+ * we give up.
+ *
+ * For 32-bit domain we require that the initial allocation for the
+ * first bank is under 4G. Then for the subsequent allocations we
+ * initially allocate memory only from below 4GB. Once that runs out
+ * (as described above) we allow higher allocations and continue until
+ * that runs out (or we have allocated sufficient dom0 memory).
+ */
+static void allocate_memory_11(struct domain *d, struct kernel_info *kinfo)
+{
+    const unsigned int min_low_order =
+        get_order_from_bytes(min_t(paddr_t, dom0_mem, MB(128)));
+    const unsigned int min_order = get_order_from_bytes(MB(4));
+    struct page_info *pg;
+    unsigned int order = get_11_allocation_size(kinfo->unassigned_mem);
+    int i;
+
+    bool_t lowmem = is_32bit_domain(d);
+    unsigned int bits;
+
+    printk("Allocating 1:1 mappings totalling %ldMB for dom0:\n",
+           /* Don't want format this as PRIpaddr (16 digit hex) */
+           (unsigned long)(kinfo->unassigned_mem >> 20));
+
+    kinfo->mem.nr_banks = 0;
+
+    /*
+     * First try and allocate the largest thing we can as low as
+     * possible to be bank 0.
+     */
+    while ( order > min_low_order )
+    {
+        for ( bits = order ; bits < (lowmem ? 32 : PADDR_BITS); bits++ )
+        {
+            pg = alloc_domheap_pages(d, order, MEMF_bits(bits));
+            if ( pg != NULL )
+                goto got_bank0;
+        }
+        order--;
+    }
+
+    panic("Unable to allocate first memory bank");
+
+ got_bank0:
+
+    if ( !insert_11_bank(d, kinfo, pg, order) )
+        BUG(); /* Cannot fail for first bank */
+
+    /* Now allocate more memory and fill in additional banks */
+
+    order = get_11_allocation_size(kinfo->unassigned_mem);
+    while ( kinfo->unassigned_mem && kinfo->mem.nr_banks < NR_MEM_BANKS )
+    {
+        pg = alloc_domheap_pages(d, order, lowmem ? MEMF_bits(32) : 0);
+        if ( !pg )
+        {
+            order --;
+
+            if ( lowmem && order < min_low_order)
+            {
+                D11PRINT("Failed at min_low_order, allow high allocations\n");
+                order = get_11_allocation_size(kinfo->unassigned_mem);
+                lowmem = false;
+                continue;
+            }
+            if ( order >= min_order )
+                continue;
+
+            /* No more we can do */
+            break;
+        }
+
+        if ( !insert_11_bank(d, kinfo, pg, order) )
+        {
+            if ( lowmem )
+            {
+                D11PRINT("Allocation below bank 0, allow high allocations\n");
+                order = get_11_allocation_size(kinfo->unassigned_mem);
+                lowmem = false;
+                continue;
+            }
+            else
+            {
+                D11PRINT("Allocation below bank 0\n");
+                break;
+            }
+        }
+
+        /*
+         * Success, next time around try again to get the largest order
+         * allocation possible.
+         */
+        order = get_11_allocation_size(kinfo->unassigned_mem);
+    }
+
+    if ( kinfo->unassigned_mem )
+        printk("WARNING: Failed to allocate requested dom0 memory."
+               /* Don't want format this as PRIpaddr (16 digit hex) */
+               " %ldMB unallocated\n",
+               (unsigned long)kinfo->unassigned_mem >> 20);
+
+    for( i = 0; i < kinfo->mem.nr_banks; i++ )
+    {
+        printk("BANK[%d] %#"PRIpaddr"-%#"PRIpaddr" (%ldMB)\n",
+               i,
+               kinfo->mem.bank[i].start,
+               kinfo->mem.bank[i].start + kinfo->mem.bank[i].size,
+               /* Don't want format this as PRIpaddr (16 digit hex) */
+               (unsigned long)(kinfo->mem.bank[i].size >> 20));
+    }
 }
 
 static void allocate_memory(struct domain *d, struct kernel_info *kinfo)