[Xen-devel,v3,8/8] xen: arm: allocate more than one bank for 1:1 domain 0 if needed

Message ID 1403777837-16779-8-git-send-email-ian.campbell@citrix.com
State: New

Commit Message

Ian Campbell June 26, 2014, 10:17 a.m. UTC
Depending on where Xen and the initial modules were loaded into RAM we may not
be able to allocate a suitably contiguous block of memory for dom0, especially
since the allocations made by alloc_domheap_pages are naturally aligned (e.g. a
1GB allocation must be at a 1GB boundary).

The alignment requirement in particular is a huge limitation, meaning that even
dom0_mem=1G on a 2GB system is pretty likely to fail unless you are very
careful with your load addresses. People were also having trouble with dom0 >
128MB on the 1GB cubieboard2 when using fairly standard load addresses for
things.

This patch tries to allocate multiple banks of memory in order to try and
satisfy the entire requested domain 0 allocation. Sadly this turns out to be
pretty tricky to arrange (see the large comment in the code).

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Karim Raslan <karim.allah.ahmed@gmail.com>
---
v3:
  Handle running out of banks more gracefully
  Allow order >= min_low_order, not strictly greater. Otherwise
  dom0_mem=128M doesn't work.
v2: New patch
---
 xen/arch/arm/domain_build.c |  286 +++++++++++++++++++++++++++++++++++++++----
 1 file changed, 264 insertions(+), 22 deletions(-)
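
To make the natural-alignment constraint described above concrete, here is a
minimal standalone sketch (hypothetical numbers, not part of the patch):

#include <stdio.h>

#define PAGE_SHIFT 12

/* A buddy allocator hands out naturally aligned blocks: an order-n
 * allocation (2^n pages) must start on a 2^n-page boundary. */
int main(void)
{
    unsigned int order = 18;                 /* 1GB in 4KB pages */
    unsigned long align = 1UL << (order + PAGE_SHIFT);

    /* On a 2GB board spanning 0x80000000-0xffffffff there are only two
     * 1GB-aligned candidate starts, 0x80000000 and 0xc0000000; if Xen or
     * a boot module occupies a page in each, an order-18 allocation
     * fails even though nearly 2GB is free in total. */
    printf("order-%u blocks align to %#lx (%luMB)\n",
           order, align, align >> 20);
    return 0;
}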

Comments

Julien Grall June 26, 2014, 6:20 p.m. UTC | #1
Hi Ian,

On 06/26/2014 11:17 AM, Ian Campbell wrote:
> [...]
> v3:
>   Handle running out of banks more gracefully
>   Allow order >= min_low_order, not strictly greater. Otherwise
>   dom0_mem=128M doesn't work.

dom0_mem is still not working on the versatile express (see my explanation
below).

> +static void allocate_memory_11(struct domain *d, struct kernel_info *kinfo)
> [...]
> +    while ( order >= min_low_order )
> +    {
> +        for ( bits = order ; bits < (lowmem ? 32 : PADDR_BITS); bits++ )

This has to be: bits <= (...). Indeed, we want to try all possible zones.

Which is annoying on the versatile express, because the first bank will
be allocated in "high memory" from the Linux POV, which makes DOM0 fail
to boot under certain circumstances (see
http://lists.xen.org/archives/html/xen-users/2014-06/msg00142.html).

I'm looking into why Xen doesn't try to allocate the first bank in lower memory.

Regards,
Ian Campbell June 27, 2014, 9:24 a.m. UTC | #2
On Thu, 2014-06-26 at 19:20 +0100, Julien Grall wrote:
> On 06/26/2014 11:17 AM, Ian Campbell wrote:
> > [...]
> > v3:
> >   Handle running out of banks more gracefully
> >   Allow order >= min_low_order, not strictly greater. Otherwise
> >   dom0_mem=128M doesn't work.
> 
> dom0_mem is still not working on the versatile express (see my explanation
> below).

> 
> > +static void allocate_memory_11(struct domain *d, struct kernel_info *kinfo)
> > [...]
> > +    while ( order >= min_low_order )
> > +    {
> > +        for ( bits = order ; bits < (lowmem ? 32 : PADDR_BITS); bits++ )
> 
> This has to be: bits <= (...). Indeed, we want to try all possible zones.

Damn. I tested this on midway with "dom0_mem=128M mem=1G" and saw a
failure (which is fixed in this iteration). But obviously midway and
vexpress differ in their physical RAM addresses which caused me to miss
this second failure.

Have you tried that change? bits < 32 gives you 32-bits (i.e. 0..31)
which should cover everything up to 4G. Or else I'm misremembering some
quirk of the allocation interface.

Does vexpress really have memory high enough up that excluding bits==32
excludes it? I thought it was down in the 1-3GB range.

> Which is annoying on the versatile express, because the first bank will
> be allocated in "high memory" from the Linux POV, which makes DOM0 fail
> to boot under certain circumstances (see
> http://lists.xen.org/archives/html/xen-users/2014-06/msg00142.html).
> 
> I'm looking into why Xen doesn't try to allocate the first bank in lower memory.

Usually it means that at least a single page is allocated in each 128MB
block lower down, which can happen depending on how your boot modules
are laid out.

I did consider making 
min_low_order = get_order_from_bytes(min_t(paddr_t, dom0_mem/4, MB(128)));

i.e. for a dom0_mem <= 128MB allowing it to be split into 4x32MB banks if
necessary. Or perhaps /2 would be better, since 32MB is small enough that
we might have trouble placing the kernel and modules etc. in a
single bank.
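
For concreteness, a standalone sketch of what those thresholds work out to
(get_order_from_bytes() below is a simplified model that rounds the size up
to the covering order, as Xen's does; not the real implementation):

#include <stdio.h>

#define PAGE_SHIFT 12
#define MB(x) ((unsigned long)(x) << 20)

/* Smallest order such that (1UL << (order + PAGE_SHIFT)) >= size. */
static unsigned int get_order_from_bytes(unsigned long size)
{
    unsigned int order = 0;
    while ((1UL << (order + PAGE_SHIFT)) < size)
        order++;
    return order;
}

int main(void)
{
    unsigned long dom0_mem = MB(128);

    /* Current rule: bank 0 must hold min(dom0_mem, 128MB) in one piece. */
    printf("min_low_order      = %u (128MB)\n",
           get_order_from_bytes(MB(128)));
    /* Mooted relaxation: dom0_mem/4, i.e. four 32MB banks would do. */
    printf("min_low_order (/4) = %u (32MB)\n",
           get_order_from_bytes(dom0_mem / 4));
    return 0;
}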

Ian.
Julien Grall June 27, 2014, 10:31 a.m. UTC | #3
Hi Ian,

On 27/06/14 10:24, Ian Campbell wrote:
>> This has to be: bits <= (...). Indeed, we want to try all possible zones.
>
> Damn. I tested this on midway with "dom0_mem=128M mem=1G" and saw a
> failure (which is fixed in this iteration). But obviously midway and
> vexpress differ in their physical RAM addresses which caused me to miss
> this second failure.
>
> Have you tried that change? bits < 32 gives you 32-bits (i.e. 0..31)
> which should cover everything up to 4G. Or else I'm misremembering some
> quirk of the allocation interface.

Yes, without the change "<=" I was unable to allocate the first bank.

The memory layout is
(XEN) Checking for initrd in /chosen
(XEN) RAM: 0000000080000000 - 000000009fffffff
(XEN) RAM: 00000000a0000000 - 00000000bfffffff
(XEN)
(XEN) MODULE[1]: 000000009fee6000 - 000000009feea000
(XEN) MODULE[2]: 00000000a0008000 - 00000000a033f458
(XEN)  RESVD[0]: 0000000081f00000 - 0000000081f04000
(XEN)  RESVD[1]: 000000009fee6000 - 000000009feea000

> Does vexpress really have memory high enough up that excluding bits==32
> excludes it? I thought it was down in the 1-3GB range.

The first allocated bank is situated just above 2GB:
BANK[0] 0x000000a8000000-0x000000b0000000

But I'm unable to find enough memory unless I pass bits=32 to the
allocator.

>> I'm looking into why Xen doesn't try to allocate the first bank in lower memory.
>
> Usually it means that at least a single page is allocated in each 128MB
> block lower down, which can happen depending on how your boot modules
> are laid out.

With my current layout (see above), I should have space in the first 
bank (around 0x80000000). I don't really understand why Xen is not 
trying to use it.

> I did consider making
> min_low_order = get_order_from_bytes(min_t(paddr_t, dom0_mem/4, MB(128)));
>
> i.e. for a dom0_mem <= 128MB allowing it to be split into 4x32MB banks if
> necessary. Or perhaps /2 would be better, since 32MB is small enough that
> we might have trouble placing the kernel and modules etc. in a
> single bank.

Requiring a 128MB hole in memory doesn't seem too mad. I would keep
this minimum value.

Regards,
Julien Grall June 27, 2014, 11:54 a.m. UTC | #4
On 27/06/14 11:31, Julien Grall wrote:
> Hi Ian,
>> [...]
>> Does vexpress really have memory high enough up that excluding bits==32
>> excludes it? I thought it was down in the 1-3GB range.
>
> The first allocated bank is situated just above 2GB:
> BANK[0] 0x000000a8000000-0x000000b0000000

I've dumped my heap just after end_boot_allocator, and it doesn't look right:

(XEN) heap[node=0][zone=0] -> 32763 pages
(XEN) heap[node=0][zone=1] -> 0 pages
(XEN) heap[node=0][zone=2] -> 0 pages
(XEN) heap[node=0][zone=3] -> 0 pages
(XEN) heap[node=0][zone=4] -> 0 pages
(XEN) heap[node=0][zone=5] -> 0 pages
(XEN) heap[node=0][zone=6] -> 0 pages
(XEN) heap[node=0][zone=7] -> 0 pages
(XEN) heap[node=0][zone=8] -> 0 pages
(XEN) heap[node=0][zone=9] -> 0 pages
(XEN) heap[node=0][zone=10] -> 0 pages
(XEN) heap[node=0][zone=11] -> 0 pages
(XEN) heap[node=0][zone=12] -> 0 pages
(XEN) heap[node=0][zone=13] -> 0 pages
(XEN) heap[node=0][zone=14] -> 0 pages
(XEN) heap[node=0][zone=15] -> 0 pages
(XEN) heap[node=0][zone=16] -> 0 pages
(XEN) heap[node=0][zone=17] -> 0 pages
(XEN) heap[node=0][zone=18] -> 0 pages
(XEN) heap[node=0][zone=19] -> 0 pages
(XEN) heap[node=0][zone=20] -> 219840 pages
(XEN) heap[node=0][zone=21] -> 0 pages
(XEN) heap[node=0][zone=22] -> 0 pages
(XEN) heap[node=0][zone=23] -> 0 pages
(XEN) heap[node=0][zone=24] -> 0 pages
(XEN) heap[node=0][zone=25] -> 0 pages
(XEN) heap[node=0][zone=26] -> 0 pages
(XEN) heap[node=0][zone=27] -> 0 pages
(XEN) heap[node=0][zone=28] -> 0 pages

For comparison, midway:
(XEN) RAM: 0000000000000000 - 00000000ff7fffff
(XEN) RAM: 0000000200000000 - 00000002ffffffff
(XEN)
(XEN) MODULE[1]: 0000000000001000 - 0000000000006000
(XEN) MODULE[2]: 0000000001000000 - 00000000012c31a8
(XEN)  RESVD[0]: 0000000000000000 - 0000000000001000
(XEN)  RESVD[1]: 0000000000001000 - 0000000000003000

(XEN) heap[node=0][zone=0] -> 262138 pages
(XEN) heap[node=0][zone=1] -> 0 pages
(XEN) heap[node=0][zone=2] -> 0 pages
(XEN) heap[node=0][zone=3] -> 2 pages
(XEN) heap[node=0][zone=4] -> 8 pages
(XEN) heap[node=0][zone=5] -> 16 pages
(XEN) heap[node=0][zone=6] -> 32 pages
(XEN) heap[node=0][zone=7] -> 64 pages
(XEN) heap[node=0][zone=8] -> 128 pages
(XEN) heap[node=0][zone=9] -> 256 pages
(XEN) heap[node=0][zone=10] -> 512 pages
(XEN) heap[node=0][zone=11] -> 1024 pages
(XEN) heap[node=0][zone=12] -> 2048 pages
(XEN) heap[node=0][zone=13] -> 3388 pages
(XEN) heap[node=0][zone=14] -> 8192 pages
(XEN) heap[node=0][zone=15] -> 16384 pages
(XEN) heap[node=0][zone=16] -> 32768 pages
(XEN) heap[node=0][zone=17] -> 65536 pages
(XEN) heap[node=0][zone=18] -> 131072 pages
(XEN) heap[node=0][zone=19] -> 262144 pages
(XEN) heap[node=0][zone=20] -> 259584 pages
(XEN) heap[node=0][zone=21] -> 0 pages
(XEN) heap[node=0][zone=22] -> 1024000 pages
(XEN) heap[node=0][zone=23] -> 0 pages
(XEN) heap[node=0][zone=24] -> 0 pages
(XEN) heap[node=0][zone=25] -> 0 pages
(XEN) heap[node=0][zone=26] -> 0 pages
(XEN) heap[node=0][zone=27] -> 0 pages
(XEN) heap[node=0][zone=28] -> 0 pages

Arndale:
(XEN) RAM: 0000000040000000 - 000000004fffffff
(XEN) RAM: 0000000050000000 - 000000005fffffff
(XEN) RAM: 0000000060000000 - 000000006fffffff
(XEN) RAM: 0000000070000000 - 000000007fffffff
(XEN) RAM: 0000000080000000 - 000000008fffffff
(XEN) RAM: 0000000090000000 - 000000009fffffff
(XEN) RAM: 00000000a0000000 - 00000000afffffff
(XEN) RAM: 00000000b0000000 - 00000000bfffffff
(XEN)
(XEN) MODULE[1]: 000000004fff3000 - 0000000050000000
(XEN) MODULE[2]: 0000000040f00000 - 0000000041900000
(XEN)  RESVD[0]: 0000000042000000 - 000000004200a000

(XEN) heap[node=0][zone=0] -> 65522 pages
(XEN) heap[node=0][zone=1] -> 0 pages
(XEN) heap[node=0][zone=2] -> 0 pages
(XEN) heap[node=0][zone=3] -> 0 pages
(XEN) heap[node=0][zone=4] -> 0 pages
(XEN) heap[node=0][zone=5] -> 0 pages
(XEN) heap[node=0][zone=6] -> 0 pages
(XEN) heap[node=0][zone=7] -> 0 pages
(XEN) heap[node=0][zone=8] -> 0 pages
(XEN) heap[node=0][zone=9] -> 0 pages
(XEN) heap[node=0][zone=10] -> 0 pages
(XEN) heap[node=0][zone=11] -> 0 pages
(XEN) heap[node=0][zone=12] -> 0 pages
(XEN) heap[node=0][zone=13] -> 0 pages
(XEN) heap[node=0][zone=14] -> 0 pages
(XEN) heap[node=0][zone=15] -> 0 pages
(XEN) heap[node=0][zone=16] -> 0 pages
(XEN) heap[node=0][zone=17] -> 0 pages
(XEN) heap[node=0][zone=18] -> 0 pages
(XEN) heap[node=0][zone=19] -> 259561 pages
(XEN) heap[node=0][zone=20] -> 187904 pages
(XEN) heap[node=0][zone=21] -> 0 pages
(XEN) heap[node=0][zone=22] -> 0 pages
(XEN) heap[node=0][zone=23] -> 0 pages
(XEN) heap[node=0][zone=24] -> 0 pages
(XEN) heap[node=0][zone=25] -> 0 pages
(XEN) heap[node=0][zone=26] -> 0 pages
(XEN) heap[node=0][zone=27] -> 0 pages
(XEN) heap[node=0][zone=28] -> 0 pages
Julien Grall June 27, 2014, 12:52 p.m. UTC | #5
On 27/06/14 10:24, Ian Campbell wrote:
> On Thu, 2014-06-26 at 19:20 +0100, Julien Grall wrote:
>>> [...]
>>> +    while ( order >= min_low_order )
>>> +    {
>>> +        for ( bits = order ; bits < (lowmem ? 32 : PADDR_BITS); bits++ )
>>
>> This has to be: bits <= (...). Indeed, we want to try all possible zones.

After digging into the code, I confirm that we need the "<=".

The zone is retrieved with this formula: fls(page_to_mfn(pg)). In the
case of the versatile express the bank start mfn is 0x80000, so every
page lands in zone 20; and zone 20 equals bits=32.
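
For illustration, a minimal standalone model of that zone computation (fls()
below mimics Xen's 1-based find-last-set; PAGE_SHIFT assumed to be 12):

#include <stdio.h>

#define PAGE_SHIFT 12

/* 1-based find-last-set, mimicking Xen's fls(): fls(1) == 1. */
static int fls(unsigned long x)
{
    int r = 0;
    while (x) { x >>= 1; r++; }
    return r;
}

int main(void)
{
    unsigned long bank_start = 0x80000000UL;      /* vexpress first RAM bank */
    unsigned long mfn = bank_start >> PAGE_SHIFT; /* 0x80000 */
    int zone = fls(mfn);                          /* 20 */

    /* Zone 20 holds frames up to 2^20, i.e. addresses up to
     * 2^(20 + PAGE_SHIFT) = 4GB, so reaching it needs MEMF_bits(32). */
    printf("mfn = %#lx -> zone %d\n", mfn, zone);
    return 0;
}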

Furthermore, it looks like we always allocate memory from the top of this
zone. Is that normal?

Regards,
Ian Campbell June 27, 2014, 1:01 p.m. UTC | #6
On Fri, 2014-06-27 at 13:52 +0100, Julien Grall wrote:
> 
> On 27/06/14 10:24, Ian Campbell wrote:
> >>> [...]
> >>> +    while ( order >= min_low_order )
> >>> +    {
> >>> +        for ( bits = order ; bits < (lowmem ? 32 : PADDR_BITS); bits++ )
> >>
> >> This has to be: bits <= (...). Indeed, we want to try all possible zones.
> 
> After digging into the code, I confirm that we need the "<=".
> 
> The zone is retrieved with this formula: fls(page_to_mfn(pg)). In the
> case of the versatile express the bank start mfn is 0x80000, so every
> page lands in zone 20; and zone 20 equals bits=32.


OK, thanks for digging.

> Furthermore, it looks like we always allocate memory from the top of this
> zone. Is that normal?

Normal behaviour is to allocate from as high as possible, so I'm not
surprised that it applies within a zone as well.

Ian.
Julien Grall June 27, 2014, 1:04 p.m. UTC | #7
On 27/06/14 14:01, Ian Campbell wrote:
>> Furthermore, it looks like we always allocate memory from the top of this
>> zone. Is that normal?
>
> Normal behaviour is to allocate from as high as possible, so I'm not
> surprised that it applies within a zone as well.

For DOM0 bank allocation, shouldn't we try to allocate as low as
possible? If not, we may not be able to create a second bank for dom0.
Ian Campbell June 27, 2014, 1:09 p.m. UTC | #8
On Fri, 2014-06-27 at 14:04 +0100, Julien Grall wrote:
> 
> [...]
> 
> For DOM0 bank allocation, shouldn't we try to allocate as low as
> possible? If not, we may not be able to create a second bank for dom0.

The loop over bits is supposed to help there, but it's not as effective
with higher addresses I suppose.

We'd need a new flag for the allocator, which is doable I guess. I'll
give it a try.

Ian.
Ian Campbell June 30, 2014, 3:58 p.m. UTC | #9
On Fri, 2014-06-27 at 14:09 +0100, Ian Campbell wrote:
> [...]
> 
> We'd need a new flag for the allocator, which is doable I guess. I'll
> give it a try.

It's a bit tricky and involves messing with the guts of the page
allocator, which I'd rather avoid...

We do allow allocations to be merged onto the start of bank 0, which in
the normal case is what I would expect to happen, although it isn't
guaranteed. In that case we would try higher addresses and might find
enough RAM there.

I suppose I could relax the constraint a bit and allow new banks in
front of bank zero if they were >= 128MB (say).
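
For what it's worth, a minimal standalone sketch of that insertion scheme
(simplified types and a hypothetical insert_bank() helper; the real logic is
insert_11_bank() in the patch below):

#include <string.h>

#define NR_MEM_BANKS 8

struct membank { unsigned long start, size; };
struct meminfo { int nr_banks; struct membank bank[NR_MEM_BANKS]; };

/* Insert [start, start+size) keeping the banks sorted by address, merging
 * with an adjacent bank where possible. Returns 0 on success, -1 once all
 * NR_MEM_BANKS slots are used. */
static int insert_bank(struct meminfo *mem, unsigned long start,
                       unsigned long size)
{
    int i;

    for ( i = 0; i < mem->nr_banks; i++ )
    {
        struct membank *bank = &mem->bank[i];

        if ( start + size == bank->start )       /* merge onto the front */
        {
            bank->start = start;
            bank->size += size;
            return 0;
        }
        if ( start == bank->start + bank->size ) /* merge onto the end */
        {
            bank->size += size;
            return 0;
        }
        if ( start + size < bank->start )        /* new bank before this one */
        {
            if ( mem->nr_banks == NR_MEM_BANKS )
                return -1;
            memmove(bank + 1, bank, sizeof(*bank) * (mem->nr_banks - i));
            mem->nr_banks++;
            bank->start = start;
            bank->size = size;
            return 0;
        }
    }

    if ( mem->nr_banks == NR_MEM_BANKS )         /* no slot left to append */
        return -1;
    mem->bank[mem->nr_banks].start = start;
    mem->bank[mem->nr_banks].size = size;
    mem->nr_banks++;
    return 0;
}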

Ian.
Julien Grall June 30, 2014, 4:10 p.m. UTC | #10
Hi Ian,

On 06/30/2014 04:58 PM, Ian Campbell wrote:
> [...]
> 
> I suppose I could relax the constraint a bit and allow new banks in
> front of bank zero if they were >= 128MB (say).

So, replacing bank zero by the new bank, right? IIRC, some Linux
versions require the banks to be ordered.

Regards,
Ian Campbell June 30, 2014, 4:14 p.m. UTC | #11
On Mon, 2014-06-30 at 17:10 +0100, Julien Grall wrote:
> > [...]
> > I suppose I could relax the constraint a bit and allow new banks in
> > front of bank zero if they were >= 128MB (say).
> 
> So, replacing bank zero by the new bank, right? IIRC, some Linux
> versions require the banks to be ordered.

Inserting the new bank before, retaining the ordering.

Ian.
Julien Grall June 30, 2014, 4:20 p.m. UTC | #12
On 06/30/2014 05:14 PM, Ian Campbell wrote:
> [...]
> 
> Inserting the new bank before, retaining the ordering.

Thanks, this plan sounds good!
Ian Campbell June 30, 2014, 4:27 p.m. UTC | #13
On Mon, 2014-06-30 at 17:20 +0100, Julien Grall wrote:
> > [...]
> > 
> > Inserting the new bank before, retaining the ordering.
> 
> Thanks, this plan sounds good!

I've implemented it in the obvious way, but I can't for the life of me
get the scenario to actually occur despite playing with module load
location, fdt rsvmem, mem= and dom0_mem=.

I suppose that gives us some confidence that this is not going to happen
very often...

Ian.
Julien Grall June 30, 2014, 4:29 p.m. UTC | #14
On 06/30/2014 05:27 PM, Ian Campbell wrote:
> [...]
> 
> I've implemented it in the obvious way, but I can't for the life of me
> get the scenario to actually occur despite playing with module load
> location, fdt rsvmem, mem= and dom0_mem=.
> 
> I suppose that gives us some confidence that this is not going to happen
> very often...

If you send me the patch, I can give it a try on the versatile express to
see what happens.
Ian Campbell June 30, 2014, 4:31 p.m. UTC | #15
On Mon, 2014-06-30 at 17:29 +0100, Julien Grall wrote:
> If you send me the patch, I can give it a try on the versatile express
> to see what happens.

Thanks, I'm heading home now but I'll post v4 in the morning...

Patch

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 9d9cba9..3fdb20e 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -45,6 +45,13 @@  custom_param("dom0_mem", parse_dom0_mem);
 # define DPRINT(fmt, args...) do {} while ( 0 )
 #endif
 
+//#define DEBUG_11_ALLOCATION
+#ifdef DEBUG_11_ALLOCATION
+# define D11PRINT(fmt, args...) printk(XENLOG_DEBUG fmt, ##args)
+#else
+# define D11PRINT(fmt, args...) do {} while ( 0 )
+#endif
+
 /*
  * Amount of extra space required to dom0's device tree.  No new nodes
  * are added (yet) but one terminating reserve map entry (16 bytes) is
@@ -67,39 +74,274 @@  struct vcpu *__init alloc_dom0_vcpu0(struct domain *dom0)
     return alloc_vcpu(dom0, 0, 0);
 }
 
-static void allocate_memory_11(struct domain *d, struct kernel_info *kinfo)
+static unsigned int get_11_allocation_size(paddr_t size)
 {
-    paddr_t start;
-    paddr_t size;
-    struct page_info *pg;
-    unsigned int order = get_order_from_bytes(dom0_mem);
-    int res;
-    paddr_t spfn;
+    /*
+     * get_order_from_bytes returns the order greater than or equal to
+     * the given size, but we need less than or equal. Adding one to
+     * the size pushes an evenly aligned size into the next order, so
+     * we can then unconditionally subtract 1 from the order which is
+     * returned.
+     */
+    return get_order_from_bytes(size + 1) - 1;
+}
 
-    if ( is_32bit_domain(d) )
-        pg = alloc_domheap_pages(d, order, MEMF_bits(32));
-    else
-        pg = alloc_domheap_pages(d, order, 0);
-    if ( !pg )
-        panic("Failed to allocate contiguous memory for dom0");
+/*
+ * Insert the given pages into a memory bank; banks are ordered by address.
+ *
+ * Returns false if the memory would be below bank 0 or we have run
+ * out of banks. In this case it will free the pages.
+ */
+static bool_t insert_11_bank(struct domain *d,
+                             struct kernel_info *kinfo,
+                             struct page_info *pg,
+                             unsigned int order)
+{
+    int res, i;
+    paddr_t spfn;
+    paddr_t start, size;
 
     spfn = page_to_mfn(pg);
     start = pfn_to_paddr(spfn);
     size = pfn_to_paddr((1 << order));
 
-    // 1:1 mapping
-    printk("Populate P2M %#"PRIx64"->%#"PRIx64" (1:1 mapping for dom0)\n",
-           start, start + size);
-    res = guest_physmap_add_page(d, spfn, spfn, order);
+    D11PRINT("Allocated %#"PRIpaddr"-%#"PRIpaddr" (%ldMB/%ldMB, order %d)\n",
+             start, start + size,
+             1UL << (order+PAGE_SHIFT-20),
+             /* Don't want to format this as PRIpaddr (16 digit hex) */
+             (unsigned long)(kinfo->unassigned_mem >> 20),
+             order);
 
-    if ( res )
-        panic("Unable to add pages in DOM0: %d", res);
+    if ( kinfo->mem.nr_banks > 0 && start + size < kinfo->mem.bank[0].start )
+    {
+        D11PRINT("Allocation is below bank 0, not using\n");
+        goto fail;
+    }
 
-    kinfo->mem.bank[0].start = start;
-    kinfo->mem.bank[0].size = size;
-    kinfo->mem.nr_banks = 1;
+    res = guest_physmap_add_page(d, spfn, spfn, order);
+    if ( res )
+        panic("Failed map pages to DOM0: %d", res);
 
     kinfo->unassigned_mem -= size;
+
+    if ( kinfo->mem.nr_banks == 0 )
+    {
+        kinfo->mem.bank[0].start = start;
+        kinfo->mem.bank[0].size = size;
+        kinfo->mem.nr_banks = 1;
+        return true;
+    }
+
+    for( i = 0; i < kinfo->mem.nr_banks; i++ )
+    {
+        struct membank *bank = &kinfo->mem.bank[i];
+
+        /* If possible merge new memory into the start of the bank */
+        if ( bank->start == start+size )
+        {
+            bank->start = start;
+            bank->size += size;
+            return true;
+        }
+
+        /* If possible merge new memory onto the end of the bank */
+        if ( start == bank->start + bank->size )
+        {
+            bank->size += size;
+            return true;
+        }
+
+        /*
+         * Otherwise, if it is below this bank, insert the new memory in
+         * a new bank before this one. If there were a lower bank we could
+         * have inserted the memory into or before, we would already have
+         * done so, so this must be the right place.
+         */
+        if ( start + size < bank->start && kinfo->mem.nr_banks < NR_MEM_BANKS )
+        {
+            memmove(bank + 1, bank, sizeof(*bank)*(kinfo->mem.nr_banks - i));
+            kinfo->mem.nr_banks++;
+            bank->start = start;
+            bank->size = size;
+            return true;
+        }
+    }
+
+    if ( i == kinfo->mem.nr_banks && kinfo->mem.nr_banks < NR_MEM_BANKS )
+    {
+        struct membank *bank = &kinfo->mem.bank[kinfo->mem.nr_banks];
+
+        bank->start = start;
+        bank->size = size;
+        kinfo->mem.nr_banks++;
+        return true;
+    }
+
+    /* If we get here then there are no more banks to fill. */
+
+fail:
+    free_domheap_pages(pg, order);
+    return false;
+}
+
+/*
+ * This is all pretty horrible.
+ *
+ * Requirements:
+ *
+ * 1. The dom0 kernel should be loaded within the first 128MB of RAM. This
+ *    is necessary at least for Linux zImage kernels, which are all we
+ *    support today.
+ * 2. We want to put the dom0 kernel, ramdisk and DTB in the same
+ *    bank. Partly this is just easier for us to deal with, but also
+ *    the ramdisk and DTB must be placed within a certain proximity of
+ *    the kernel within RAM.
+ * 3. For 32-bit dom0 we want to place as much of the RAM as we
+ *    reasonably can below 4GB, so that it can be used by non-LPAE
+ *    enabled kernels.
+ * 4. For 32-bit dom0 the kernel must be located below 4GB.
+ * 5. We want to have a few larger banks rather than many smaller ones.
+ *
+ * For the first two requirements we need to make sure that the lowest
+ * bank is sufficiently large.
+ *
+ * For convenience we also sort the banks by physical address.
+ *
+ * The memory allocator does not really give us the flexibility to
+ * meet these requirements directly. So instead we proceed as follows:
+ *
+ * We first allocate the largest allocation we can as low as we
+ * can. This then becomes the first bank. This bank must be at least
+ * 128MB (or dom0_mem if that is smaller).
+ *
+ * Then we start allocating more memory, trying to allocate the
+ * largest possible size and trying smaller sizes until we
+ * successfully allocate something.
+ *
+ * We then try and insert this memory in to the list of banks. If it
+ * can be merged into an existing bank then this is trivial.
+ *
+ * If the new memory is before the first bank (and cannot be merged
+ * into it) then we give up. Since the allocator prefers to allocate
+ * high addresses first and the first bank has already been allocated
+ * to be as low as possible this likely means we wouldn't have been
+ * able to allocate much more memory anyway.
+ *
+ * Otherwise we insert a new bank. If we've reached MAX_NR_BANKS then
+ * we give up.
+ *
+ * For 32-bit domain we require that the initial allocation for the
+ * first bank is under 4G. Then for the subsequent allocations we
+ * initially allocate memory only from below 4GB. Once that runs out
+ * (as described above) we allow higher allocations and continue until
+ * that runs out (or we have allocated sufficient dom0 memory).
+ */
+static void allocate_memory_11(struct domain *d, struct kernel_info *kinfo)
+{
+    const unsigned int min_low_order =
+        get_order_from_bytes(min_t(paddr_t, dom0_mem, MB(128)));
+    const unsigned int min_order = get_order_from_bytes(MB(4));
+    struct page_info *pg;
+    unsigned int order = get_11_allocation_size(kinfo->unassigned_mem);
+    int i;
+
+    bool_t lowmem = is_32bit_domain(d);
+    unsigned int bits;
+
+    printk("Allocating 1:1 mappings totalling %ldMB for dom0:\n",
+           /* Don't want to format this as PRIpaddr (16 digit hex) */
+           (unsigned long)(kinfo->unassigned_mem >> 20));
+
+    kinfo->mem.nr_banks = 0;
+
+    /*
+     * First try and allocate the largest thing we can as low as
+     * possible to be bank 0.
+     */
+    while ( order >= min_low_order )
+    {
+        for ( bits = order ; bits < (lowmem ? 32 : PADDR_BITS); bits++ )
+        {
+            pg = alloc_domheap_pages(d, order, MEMF_bits(bits));
+            if ( pg != NULL )
+                goto got_bank0;
+        }
+        order--;
+    }
+
+    panic("Unable to allocate first memory bank");
+
+ got_bank0:
+
+    if ( !insert_11_bank(d, kinfo, pg, order) )
+        BUG(); /* Cannot fail for first bank */
+
+    /* Now allocate more memory and fill in additional banks */
+
+    order = get_11_allocation_size(kinfo->unassigned_mem);
+    while ( kinfo->unassigned_mem && kinfo->mem.nr_banks < NR_MEM_BANKS )
+    {
+        pg = alloc_domheap_pages(d, order, lowmem ? MEMF_bits(32) : 0);
+        if ( !pg )
+        {
+            order --;
+
+            if ( lowmem && order < min_low_order)
+            {
+                D11PRINT("Failed at min_low_order, allow high allocations\n");
+                order = get_11_allocation_size(kinfo->unassigned_mem);
+                lowmem = false;
+                continue;
+            }
+            if ( order >= min_order )
+                continue;
+
+            /* No more we can do */
+            break;
+        }
+
+        if ( !insert_11_bank(d, kinfo, pg, order) )
+        {
+            if ( kinfo->mem.nr_banks == NR_MEM_BANKS )
+                /* Nothing more we can do. */
+                break;
+
+            if ( lowmem )
+            {
+                D11PRINT("Allocation below bank 0, allow high allocations\n");
+                order = get_11_allocation_size(kinfo->unassigned_mem);
+                lowmem = false;
+                continue;
+            }
+            else
+            {
+                D11PRINT("Allocation below bank 0\n");
+                break;
+            }
+        }
+
+        /*
+         * Success, next time around try again to get the largest order
+         * allocation possible.
+         */
+        order = get_11_allocation_size(kinfo->unassigned_mem);
+    }
+
+    if ( kinfo->unassigned_mem )
+        printk("WARNING: Failed to allocate requested dom0 memory."
+               /* Don't want to format this as PRIpaddr (16 digit hex) */
+               " %ldMB unallocated\n",
+               (unsigned long)kinfo->unassigned_mem >> 20);
+
+    for( i = 0; i < kinfo->mem.nr_banks; i++ )
+    {
+        printk("BANK[%d] %#"PRIpaddr"-%#"PRIpaddr" (%ldMB)\n",
+               i,
+               kinfo->mem.bank[i].start,
+               kinfo->mem.bank[i].start + kinfo->mem.bank[i].size,
+               /* Don't want to format this as PRIpaddr (16 digit hex) */
+               (unsigned long)(kinfo->mem.bank[i].size >> 20));
+    }
 }
 
 static void allocate_memory(struct domain *d, struct kernel_info *kinfo)
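
As a footnote, a quick standalone check of the rounding trick used by
get_11_allocation_size() above (again with a simplified model of
get_order_from_bytes() that rounds up to the covering order, as Xen's does;
not the real implementation):

#include <stdio.h>

#define PAGE_SHIFT 12
#define MB(x) ((unsigned long)(x) << 20)

/* Smallest order such that (1UL << (order + PAGE_SHIFT)) >= size. */
static unsigned int get_order_from_bytes(unsigned long size)
{
    unsigned int order = 0;
    while ((1UL << (order + PAGE_SHIFT)) < size)
        order++;
    return order;
}

/* Largest order that fits inside size: adding 1 pushes an exactly aligned
 * size into the next order, so subtracting 1 always rounds down. */
static unsigned int get_11_allocation_size(unsigned long size)
{
    return get_order_from_bytes(size + 1) - 1;
}

int main(void)
{
    printf("%u\n", get_11_allocation_size(MB(512))); /* 17: exactly 512MB */
    printf("%u\n", get_11_allocation_size(MB(513))); /* 17: rounds down   */
    printf("%u\n", get_11_allocation_size(MB(511))); /* 16: rounds down   */
    return 0;
}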