libxc/arm: Correctly handle the difference between virtual and physical address

Message ID 1386696982-23129-1-git-send-email-julien.grall@linaro.org
State New
Headers show

Commit Message

Julien Grall Dec. 10, 2013, 5:36 p.m.
xc_dom_alloc_page deals with virtual address not physical address. When
an ELF is loaded, virtual address and physical address may be different.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
---
 tools/libxc/xc_dom_arm.c |   16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

Comments

Ian Campbell Dec. 10, 2013, 5:44 p.m. | #1
On Tue, 2013-12-10 at 17:36 +0000, Julien Grall wrote:
> xc_dom_alloc_page deals with virtual address not physical address. When
> an ELF is loaded, virtual address and physical address may be different.

Can you give an example of the program headers of an ELF file (readelf
-l) which causes this? How was it constructed?

When we are building the guest the MMU is disabled so virt == phys.

If the ELF has virt != phys then how do we know whether it expects to be
launched with the MMU on or off? We can only really launch with the MMU
off, so do we require that the initial ELF entry point be PIC and know
how to enable the MMU?

Ian.

> 
> Signed-off-by: Julien Grall <julien.grall@linaro.org>
> ---
>  tools/libxc/xc_dom_arm.c |   16 +++++++++-------
>  1 file changed, 9 insertions(+), 7 deletions(-)
> 
> diff --git a/tools/libxc/xc_dom_arm.c b/tools/libxc/xc_dom_arm.c
> index a40e04d..75a6f1c 100644
> --- a/tools/libxc/xc_dom_arm.c
> +++ b/tools/libxc/xc_dom_arm.c
> @@ -108,13 +108,15 @@ static int shared_info_arm(struct xc_dom_image *dom, void *ptr)
>  static int vcpu_arm32(struct xc_dom_image *dom, void *ptr)
>  {
>      vcpu_guest_context_t *ctxt = ptr;
> +    uint32_t rambase = dom->rambase_pfn << XC_PAGE_SHIFT;
> +    uint32_t offset = dom->parms.virt_base - rambase;
>  
>      DOMPRINTF_CALLED(dom->xch);
>  
>      /* clear everything */
>      memset(ctxt, 0, sizeof(*ctxt));
>  
> -    ctxt->user_regs.pc32 = dom->parms.virt_entry;
> +    ctxt->user_regs.pc32 = dom->parms.virt_entry - offset;
>  
>      /* Linux boot protocol. See linux.Documentation/arm/Booting. */
>      ctxt->user_regs.r0_usr = 0; /* SBZ */
> @@ -125,7 +127,7 @@ static int vcpu_arm32(struct xc_dom_image *dom, void *ptr)
>       * like a valid pointer to a set of ATAGS or a DTB.
>       */
>      ctxt->user_regs.r2_usr = dom->devicetree_blob ?
> -        dom->devicetree_seg.vstart : 0xffffffff;
> +        (dom->devicetree_seg.vstart - offset) : 0xffffffff;
>  
>      ctxt->sctlr = SCTLR_GUEST_INIT;
>  
> @@ -280,15 +282,15 @@ int arch_setup_meminit(struct xc_dom_image *dom)
>  
>      if ( dom->devicetree_blob )
>      {
> -        const uint64_t rambase = dom->rambase_pfn << XC_PAGE_SHIFT;
> -        const uint64_t ramend = rambase + ( dom->total_pages << XC_PAGE_SHIFT );
> +        const uint64_t virtbase = dom->parms.virt_base;
> +        const uint64_t virtend = virtbase + ( dom->total_pages << XC_PAGE_SHIFT );
>          const uint64_t dtbsize = ROUNDUP(dom->devicetree_size, XC_PAGE_SHIFT);
>  
>          /* Place at 128MB if there is sufficient RAM */
> -        if ( ramend >= rambase + 128*1024*1024 + dtbsize )
> -            dom->devicetree_seg.vstart = rambase + 128*1024*1024;
> +        if ( virtend >= virtbase + 128*1024*1024 + dtbsize )
> +            dom->devicetree_seg.vstart = virtbase + 128*1024*1024;
>          else /* otherwise at top of RAM */
> -            dom->devicetree_seg.vstart = ramend - dtbsize;
> +            dom->devicetree_seg.vstart = virtend - dtbsize;
>  
>          dom->devicetree_seg.vend =
>              dom->devicetree_seg.vstart + dom->devicetree_size;
Julien Grall Dec. 10, 2013, 6:16 p.m. | #2
On 12/10/2013 05:44 PM, Ian Campbell wrote:
> On Tue, 2013-12-10 at 17:36 +0000, Julien Grall wrote:
>> xc_dom_alloc_page deals with virtual address not physical address. When
>> an ELF is loaded, virtual address and physical address may be different.
> 
> Can you give an example of the program headers of an ELF file (readelf
> -l) which causes this? How was it constructed?

This is the beginning of readelf -l for a FreeBSD ARM binary:

Elf file type is EXEC (Executable file)
Entry point 0xc0100100
There are 5 program headers, starting at offset 52

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  EXIDX          0x2f6e54 0xc03f6e54 0xc03f6e54 0x0df90 0x0df90 R   0x4
  PHDR           0x000034 0xc0100034 0xc0100034 0x000a0 0x000a0 R E 0x4
  INTERP         0x2b67f0 0xc03b67f0 0xc03b67f0 0x0000d 0x0000d R   0x1
      [Requesting program interpreter: /red/herring]
  LOAD           0x000000 0xc0100000 0xc0100000 0x32dadc 0x3538d8 RWE 0x8000
  DYNAMIC        0x32da74 0xc042da74 0xc042da74 0x00068 0x00068 RW  0x4

FreeBSD will build the kernel with:
   - kernel physical address: 0x80100000
   - kernel virtual address:  0xC0100000
And the ELF will be linked with the virtual address.

> When we are building the guest the MMU is disabled so virt == phys.

Right, but ELF loader is based on virtual address not physical address.
Guest creation will fail when libxc is trying to allocate segment for the device
tree (see xc_dom_alloc_segment).
As it's allocate after the kernel, virt_alloc_end will contains a virtual address.
But device tree address is generated with a physical address...
Therefore, the function will fail with "segment start too low ...".

> If the ELF has virt != phys then how do we know whether it expects to be
> launched with the MMU on or off? We can only really launch with the MMU
> off, so do we require that the initial ELF entry point be PIC and know
> how to enable the MMU?

This issue is when as virt == phys, build ELF with virt != phys is very difficult.
I took a couple of hours without any success.
In any case, FreeBSD is building its ELF with virt = phys.

When the guest is creating, the ELF should loaded like zImage at the specific
physical address. Then the guest will start will MMU turn off, and during the
first instructions it will use fixup to get the right address.
Ian Campbell Dec. 11, 2013, 10:02 a.m. | #3
On Tue, 2013-12-10 at 18:16 +0000, Julien Grall wrote:
> This issue is when as virt == phys, build ELF with virt != phys is very difficult.

OK, so I think I was mislead by your commit message, this is not
actually about virt vs phys as such (or as represented in the ELF
header) but is actually more about link address vs. load address. Where
link address == ELF vaddr (==paddr). And because the MMU is disable load
address == some vaddr (==paddr), which may differ from the ELF vaddr.

The terminology in xc_dom, which uses vaddr a lot because of PV x86,
isn't helpful here but is that an accurate summary?

[...]
> When the guest is creating, the ELF should loaded like zImage at the specific
> physical address.

So that would be my next question -- where does this load address come
from?

I suppose it has to be the 0x80100000 kernel physical address you gave
earlier?

I think this means "where is dom->parms.virt_base" initialised. Have you
added some ELF notes to your BSD kernel image?

>  Then the guest will start will MMU turn off, and during the
> first instructions it will use fixup to get the right address.

"fixup" == enable the MMU, or "fixup" == some sort of relocation/PIC?

Ian.
Julien Grall Dec. 11, 2013, 7:26 p.m. | #4
On 12/11/2013 10:02 AM, Ian Campbell wrote:
> On Tue, 2013-12-10 at 18:16 +0000, Julien Grall wrote:
>> This issue is when as virt == phys, build ELF with virt != phys is very difficult.
> 
> OK, so I think I was mislead by your commit message, this is not
> actually about virt vs phys as such (or as represented in the ELF
> header) but is actually more about link address vs. load address. Where
> link address == ELF vaddr (==paddr). And because the MMU is disable load
> address == some vaddr (==paddr), which may differ from the ELF vaddr.
> 
> The terminology in xc_dom, which uses vaddr a lot because of PV x86,
> isn't helpful here but is that an accurate summary?

Right, and this address is actually equal to ELF vaddr. But the device
tree base address will be computed with the physical address.

> [...]
>> When the guest is creating, the ELF should loaded like zImage at the specific
>> physical address.
> 
> So that would be my next question -- where does this load address come
> from?

Load address ? Virtual ? Physical ?

> I suppose it has to be the 0x80100000 kernel physical address you gave
> earlier?
> 
> I think this means "where is dom->parms.virt_base" initialised. Have you
> added some ELF notes to your BSD kernel image?

I have tried to play with the ELF notes. My current configuration is:

XEN_ELFNOTE_VIRT_BASE    vaddr   	(0xC0000000)
XEN_ELFNOTE_PADDR_OFFSET vaddr		(0xC0000000)

It's copied from Linux.

If I try to modify PADDR_OFFSET with (KERNBASE - PHYSADDR) or whatever
value it doesn't work. I got
"segment kernel too large (0x355 > 0x4000 - 0x80100 pages): Out of memory".

>>  Then the guest will start will MMU turn off, and during the
>> first instructions it will use fixup to get the right address.
> 
> "fixup" == enable the MMU, or "fixup" == some sort of relocation/PIC?

Some sort of relocation. When MMU is turn off, every time the kernel
needs to call asm callback, it will add an offset.
Ian Campbell Dec. 12, 2013, 10:35 a.m. | #5
On Wed, 2013-12-11 at 19:26 +0000, Julien Grall wrote:
> On 12/11/2013 10:02 AM, Ian Campbell wrote:
> > On Tue, 2013-12-10 at 18:16 +0000, Julien Grall wrote:
> >> This issue is when as virt == phys, build ELF with virt != phys is very difficult.
> > 
> > OK, so I think I was mislead by your commit message, this is not
> > actually about virt vs phys as such (or as represented in the ELF
> > header) but is actually more about link address vs. load address. Where
> > link address == ELF vaddr (==paddr). And because the MMU is disable load
> > address == some vaddr (==paddr), which may differ from the ELF vaddr.
> > 
> > The terminology in xc_dom, which uses vaddr a lot because of PV x86,
> > isn't helpful here but is that an accurate summary?
> 
> Right, and this address is actually equal to ELF vaddr. But the device
> tree base address will be computed with the physical address.
> 
> > [...]
> >> When the guest is creating, the ELF should loaded like zImage at the specific
> >> physical address.
> > 
> > So that would be my next question -- where does this load address come
> > from?
> 
> Load address ? Virtual ? Physical ?

Load address == The literal address where the kernel image is loaded,
e.g. on native it would be the physical address.

> > I suppose it has to be the 0x80100000 kernel physical address you gave
> > earlier?
> > 
> > I think this means "where is dom->parms.virt_base" initialised. Have you
> > added some ELF notes to your BSD kernel image?
> 
> I have tried to play with the ELF notes. My current configuration is:
> 
> XEN_ELFNOTE_VIRT_BASE    vaddr   	(0xC0000000)
> XEN_ELFNOTE_PADDR_OFFSET vaddr		(0xC0000000)
> 
> It's copied from Linux.

XEN_ELFNOTE_PADDR_OFFSET is 0 for x86 Linux though.

It's possible that for ARM it should be RAMBASE or something? Or perhaps
it should be 0 and the RAMBASE should be added by the tools. Or maybe it
should be omitted and only VIRT_BASE is needed?

Does NetBSD absolutely require that it is loaded at precisely address
0x80100000 or is the requirement rambase + 0x00100000.

If NetBSD has requirements about where RAM is the we may need a new note
to designate the required RAM base address.

> If I try to modify PADDR_OFFSET with (KERNBASE - PHYSADDR) or whatever
> value it doesn't work. I got
> "segment kernel too large (0x355 > 0x4000 - 0x80100 pages): Out of memory".

It is very possible that the existing code buggily assumes in various
places that RAM starts at address 0, because it has only ever been run
on x86. You will probably need to fix this by using rambase_pfn in
various places.

> >>  Then the guest will start will MMU turn off, and during the
> >> first instructions it will use fixup to get the right address.
> > 
> > "fixup" == enable the MMU, or "fixup" == some sort of relocation/PIC?
> 
> Some sort of relocation. When MMU is turn off, every time the kernel
> needs to call asm callback, it will add an offset.

Sounds a bit mad to me compared with writing PIC, but OK.

Ian.
Julien Grall Dec. 12, 2013, 1:55 p.m. | #6
On 12/12/2013 10:35 AM, Ian Campbell wrote:
> On Wed, 2013-12-11 at 19:26 +0000, Julien Grall wrote:
>> On 12/11/2013 10:02 AM, Ian Campbell wrote:
>>> On Tue, 2013-12-10 at 18:16 +0000, Julien Grall wrote:
>>>> This issue is when as virt == phys, build ELF with virt != phys is very difficult.
>>>
>>> OK, so I think I was mislead by your commit message, this is not
>>> actually about virt vs phys as such (or as represented in the ELF
>>> header) but is actually more about link address vs. load address. Where
>>> link address == ELF vaddr (==paddr). And because the MMU is disable load
>>> address == some vaddr (==paddr), which may differ from the ELF vaddr.
>>>
>>> The terminology in xc_dom, which uses vaddr a lot because of PV x86,
>>> isn't helpful here but is that an accurate summary?
>>
>> Right, and this address is actually equal to ELF vaddr. But the device
>> tree base address will be computed with the physical address.
>>
>>> [...]
>>>> When the guest is creating, the ELF should loaded like zImage at the specific
>>>> physical address.
>>>
>>> So that would be my next question -- where does this load address come
>>> from?
>>
>> Load address ? Virtual ? Physical ?
>
> Load address == The literal address where the kernel image is loaded,
> e.g. on native it would be the physical address.
>
>>> I suppose it has to be the 0x80100000 kernel physical address you gave
>>> earlier?
>>>
>>> I think this means "where is dom->parms.virt_base" initialised. Have you
>>> added some ELF notes to your BSD kernel image?
>>
>> I have tried to play with the ELF notes. My current configuration is:
>>
>> XEN_ELFNOTE_VIRT_BASE    vaddr   	(0xC0000000)
>> XEN_ELFNOTE_PADDR_OFFSET vaddr		(0xC0000000)
>>
>> It's copied from Linux.
>
> XEN_ELFNOTE_PADDR_OFFSET is 0 for x86 Linux though.

Sorry I meant FreeBSD.

>
> It's possible that for ARM it should be RAMBASE or something? Or perhaps
> it should be 0 and the RAMBASE should be added by the tools. Or maybe it
> should be omitted and only VIRT_BASE is needed?

It's not clear to me what is the usage of XEN_ELFNOTE_PADDR_OFFSET. Is 
it the physical address? An offset?

> Does NetBSD absolutely require that it is loaded at precisely address
> 0x80100000 or is the requirement rambase + 0x00100000.
>
> If NetBSD has requirements about where RAM is the we may need a new note
> to designate the required RAM base address.

FreeBSD requires to know at compile time the RAM base address. I think 
it's fine now to hardcode to GUEST_RAMBASE. We can modify it later.

>> If I try to modify PADDR_OFFSET with (KERNBASE - PHYSADDR) or whatever
>> value it doesn't work. I got
>> "segment kernel too large (0x355 > 0x4000 - 0x80100 pages): Out of memory".
>
> It is very possible that the existing code buggily assumes in various
> places that RAM starts at address 0, because it has only ever been run
> on x86. You will probably need to fix this by using rambase_pfn in
> various places.

That would mean we need to fix up every seg->vstart/vend. The main issue 
is, at the end of xc_dom_alloc_segment, we will store vend to 
virt_alloc_end. This value we be retrieved to the next allocation..

I wrote the patch in this way because I think it's fine to do all the 
allocation in the virtual address space (even if MMU is turn off during 
at boot time) and then, thanks to xc_dom_alloc_segment, translate to 
physical address. It's the currently behaviour. But it works fine 
because for zImage, we made the assumption that virt_base == rambase_pfn 
<< PAGE_SHIFT.

The main change is using virt_base to allocate the device tree and 
substract different offset when the guest will start.

>>>>   Then the guest will start will MMU turn off, and during the
>>>> first instructions it will use fixup to get the right address.
>>>
>>> "fixup" == enable the MMU, or "fixup" == some sort of relocation/PIC?
>>
>> Some sort of relocation. When MMU is turn off, every time the kernel
>> needs to call asm callback, it will add an offset.
>
> Sounds a bit mad to me compared with writing PIC, but OK.

Most of the code is writing PIC. But as in Linux, for assembly code you 
can have something like that:

.word my_func

my_func will be replaced by the virtual address not the physical 
address. So when the kernel want to call the function with MMU disabled, 
it will need to add an offset. Thankfully, you don't need it everywhere.
Ian Campbell Dec. 12, 2013, 2:36 p.m. | #7
On Thu, 2013-12-12 at 13:55 +0000, Julien Grall wrote:
> > It's possible that for ARM it should be RAMBASE or something? Or perhaps
> > it should be 0 and the RAMBASE should be added by the tools. Or maybe it
> > should be omitted and only VIRT_BASE is needed?
> 
> It's not clear to me what is the usage of XEN_ELFNOTE_PADDR_OFFSET. Is 
> it the physical address? An offset?

/*
 * The offset of the ELF paddr field from the actual required
 * pseudo-physical address (numeric).
 *
 * This is used to maintain backwards compatibility with older kernels
 * which wrote __PAGE_OFFSET into that field. This field defaults to 0
 * if not present.
 *
 * LEGACY: ELF_PADDR_OFFSET. (n.b. legacy default is VIRT_BASE)
 */
#define XEN_ELFNOTE_PADDR_OFFSET   4

So apparently it is the offset between the paddr field in the ELF file
and the real load address. Which seems like it should be
0xc0100000-0x80100000, but you'll have to UTSL to be sure.

> > Does NetBSD absolutely require that it is loaded at precisely address
> > 0x80100000 or is the requirement rambase + 0x00100000.
> >
> > If NetBSD has requirements about where RAM is the we may need a new note
> > to designate the required RAM base address.
> 
> FreeBSD requires to know at compile time the RAM base address. I think 
> it's fine now to hardcode to GUEST_RAMBASE. We can modify it later.

OK, that makes it easier for these headers to fit I think.

> >> If I try to modify PADDR_OFFSET with (KERNBASE - PHYSADDR) or whatever
> >> value it doesn't work. I got
> >> "segment kernel too large (0x355 > 0x4000 - 0x80100 pages): Out of memory".
> >
> > It is very possible that the existing code buggily assumes in various
> > places that RAM starts at address 0, because it has only ever been run
> > on x86. You will probably need to fix this by using rambase_pfn in
> > various places.
> 
> That would mean we need to fix up every seg->vstart/vend.

Quite possible, yes.

This code was written to deal with a PV x86 guest, which has a different
set of requirements to what is needed on ARM.

>  The main issue 
> is, at the end of xc_dom_alloc_segment, we will store vend to 
> virt_alloc_end. This value we be retrieved to the next allocation..
> 
> I wrote the patch in this way because I think it's fine to do all the 
> allocation in the virtual address space (even if MMU is turn off during 
> at boot time) and then, thanks to xc_dom_alloc_segment, translate to 
> physical address. It's the currently behaviour. But it works fine 
> because for zImage, we made the assumption that virt_base == rambase_pfn 
> << PAGE_SHIFT.

The fact that the fields are called virt_* is irrelevant, these are
physical addresses on ARM, the naming is from x86 where they are
physical addresses.

If you are trying to somehow shoehorn the kernels virtual addresses into
these then things will get confusing fast. Everything should be thought
of in terms of physical addresses and the *load* address that BSD
expects. 

> The main change is using virt_base to allocate the device tree and 
> substract different offset when the guest will start.

If this is done correctly then virt_base will contain physical addresses
and everything works without offsets.

Ignore the field names and think only about physical addresses.

> >>>>   Then the guest will start will MMU turn off, and during the
> >>>> first instructions it will use fixup to get the right address.
> >>>
> >>> "fixup" == enable the MMU, or "fixup" == some sort of relocation/PIC?
> >>
> >> Some sort of relocation. When MMU is turn off, every time the kernel
> >> needs to call asm callback, it will add an offset.
> >
> > Sounds a bit mad to me compared with writing PIC, but OK.
> 
> Most of the code is writing PIC. But as in Linux, for assembly code you 
> can have something like that:
> 
> .word my_func
> 
> my_func will be replaced by the virtual address not the physical 
> address. So when the kernel want to call the function with MMU disabled, 
> it will need to add an offset. Thankfully, you don't need it everywhere.

That's not PIC code then. PIC code would have an offset not an address
and would call the function as the offset from current PC or something
similar, but we are digressing here.

Ian.

Patch

diff --git a/tools/libxc/xc_dom_arm.c b/tools/libxc/xc_dom_arm.c
index a40e04d..75a6f1c 100644
--- a/tools/libxc/xc_dom_arm.c
+++ b/tools/libxc/xc_dom_arm.c
@@ -108,13 +108,15 @@  static int shared_info_arm(struct xc_dom_image *dom, void *ptr)
 static int vcpu_arm32(struct xc_dom_image *dom, void *ptr)
 {
     vcpu_guest_context_t *ctxt = ptr;
+    uint32_t rambase = dom->rambase_pfn << XC_PAGE_SHIFT;
+    uint32_t offset = dom->parms.virt_base - rambase;
 
     DOMPRINTF_CALLED(dom->xch);
 
     /* clear everything */
     memset(ctxt, 0, sizeof(*ctxt));
 
-    ctxt->user_regs.pc32 = dom->parms.virt_entry;
+    ctxt->user_regs.pc32 = dom->parms.virt_entry - offset;
 
     /* Linux boot protocol. See linux.Documentation/arm/Booting. */
     ctxt->user_regs.r0_usr = 0; /* SBZ */
@@ -125,7 +127,7 @@  static int vcpu_arm32(struct xc_dom_image *dom, void *ptr)
      * like a valid pointer to a set of ATAGS or a DTB.
      */
     ctxt->user_regs.r2_usr = dom->devicetree_blob ?
-        dom->devicetree_seg.vstart : 0xffffffff;
+        (dom->devicetree_seg.vstart - offset) : 0xffffffff;
 
     ctxt->sctlr = SCTLR_GUEST_INIT;
 
@@ -280,15 +282,15 @@  int arch_setup_meminit(struct xc_dom_image *dom)
 
     if ( dom->devicetree_blob )
     {
-        const uint64_t rambase = dom->rambase_pfn << XC_PAGE_SHIFT;
-        const uint64_t ramend = rambase + ( dom->total_pages << XC_PAGE_SHIFT );
+        const uint64_t virtbase = dom->parms.virt_base;
+        const uint64_t virtend = virtbase + ( dom->total_pages << XC_PAGE_SHIFT );
         const uint64_t dtbsize = ROUNDUP(dom->devicetree_size, XC_PAGE_SHIFT);
 
         /* Place at 128MB if there is sufficient RAM */
-        if ( ramend >= rambase + 128*1024*1024 + dtbsize )
-            dom->devicetree_seg.vstart = rambase + 128*1024*1024;
+        if ( virtend >= virtbase + 128*1024*1024 + dtbsize )
+            dom->devicetree_seg.vstart = virtbase + 128*1024*1024;
         else /* otherwise at top of RAM */
-            dom->devicetree_seg.vstart = ramend - dtbsize;
+            dom->devicetree_seg.vstart = virtend - dtbsize;
 
         dom->devicetree_seg.vend =
             dom->devicetree_seg.vstart + dom->devicetree_size;