diff mbox series

[Xen-devel,for-4.12,v2,2/2] xen/arm: Stop relocating Xen

Message ID 20181214114455.5841-3-julien.grall@arm.com
State New
Headers show
Series xen/arm: mm: Boot fixes | expand

Commit Message

Julien Grall Dec. 14, 2018, 11:44 a.m. UTC
At the moment, Xen is relocated towards the end of the memory. While
this has the advantage to free space in low memory, the code is not
compliant with the break-before-make because it requires to switch
between two sets of page-table. This is not entirely trivial to fix as
it would require us to go through an identity mapping and disabling MMU.

Furthermore, it looks like that some platform (such as the Hikey960)
may not be able to bring-up secondary CPUs if the entry is too high.

I don't believe the low memory is an issue because Xen is quite tiny
(< 2MB). So the best solution is to stop relocating Xen. This has the
advantage to simplify the code and should speed-up the boot as relocation
is not necessary anymore.

Note that the break-before-make issue is not fixed by this patch.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reported-by: Matthew Daley <mattd@bugfuzz.com>
Tested-by: Matthew Daley <mattd@bugfuzz.com>

---
    Changes in v2:
        - Add Matthew's tested-by
---
 xen/arch/arm/arm32/head.S | 54 +++------------------------------------
 xen/arch/arm/arm64/head.S | 50 +++++-------------------------------
 xen/arch/arm/mm.c         | 18 +++----------
 xen/arch/arm/setup.c      | 65 +++--------------------------------------------
 xen/include/asm-arm/mm.h  |  2 +-
 5 files changed, 17 insertions(+), 172 deletions(-)

Comments

Andrii Anisov Dec. 14, 2018, 3:52 p.m. UTC | #1
Hello Julien,

Let me speculate a bit about the topic.

On 14.12.18 13:44, Julien Grall wrote:
> At the moment, Xen is relocated towards the end of the memory.
This statement is not really true. Some time ago, XEN was relocated toward the end of
the low memory (under 4 GB). Currently, on my board I see some kind of mess:

     (XEN) RAM: 0000000048000000 - 00000000bfffffff
     (XEN) RAM: 0000000500000000 - 000000057fffffff
     (XEN) RAM: 0000000600000000 - 000000067fffffff
     (XEN) RAM: 0000000700000000 - 000000077fffffff
     (XEN)
     (XEN) MODULE[0]: 0000000048000000 - 0000000048013000 Device Tree
     (XEN) MODULE[1]: 000000007a000000 - 000000007c000000 Kernel
     (XEN) MODULE[2]: 000000007c000000 - 000000007c010000 XSM
     (XEN)  RESVD[0]: 0000000048000000 - 0000000048013000
     (XEN)
     (XEN)
     (XEN) Command line: dom0_mem=3G console=dtuart dtuart=serial0 dom0_max_vcpus=2 bootscrub=0 loglvl=all cpufreq=none tbuf_size=8192 loglvl=all/none guest_loglvl=all/none
     (XEN) parameter "cpufreq" unknown!
     (XEN) Placing Xen at 0x000000077fe00000-0x0000000780000000
     (XEN) Update BOOTMOD_XEN from 0000000078080000-0000000078188d81 => 000000077fe00000-000000077ff08d81

As you can see XEN is moved towards the end of the first GB of the low memory instead of the end of under 4GB RAM.


> While
> this has the advantage to free space in low memory, the code is not
> compliant with the break-before-make because it requires to switch
> between two sets of page-table. This is not entirely trivial to fix as
> it would require us to go through an identity mapping and disabling MMU.
I understand this motivation though.

> Furthermore, it looks like that some platform (such as the Hikey960)
> may not be able to bring-up secondary CPUs if the entry is too high.Just a reminder that long time ago we implemented a move of XEN toward the real end of the memory, over 4GB.
As long as CPUs were not able to start to the code placed over 4 GB, we set secondary CPUs to be brought up to a XEN instance under 4GB, then jump to a copy over 4GB, following CPU0.

> I don't believe the low memory is an issue because Xen is quite tiny
> (< 2MB).
It is really tiny, but the problem is that Dom0 low memory (lower than 4 GB) RAM banks
start and end are aligned by 128MB. So existing of a single 1MB XEN cuts out 128MB from low memory for Dom0.
On my current setup I have two 128MB chunks stolen: one by relocated XEN, kernel module an XSM module, other another by device tree. So Dom0 gets 1664MB of low RAM, instead of physically available 1920MB.

     (XEN) Loading Domd0 kernel from boot module @ 000000007a000000
     (XEN) Allocating 1:1 mappings totalling 3072MB for dom0:
     (XEN) BANK[0] 0x00000050000000-0x00000078000000 (640MB)
     (XEN) BANK[1] 0x00000080000000-0x000000c0000000 (1024MB)
     (XEN) BANK[2] 0x00000540000000-0x00000580000000 (1024MB)
     (XEN) BANK[3] 0x00000748000000-0x00000760000000 (384MB)
     (XEN) Grant table range: 0x0000077fe00000-0x0000077fe40000

Such loss might be painful for those who are targeting low memory eager use-cases (i.e. multimedia) and lack of IOMMU on a SoC.

> So the best solution is to stop relocating Xen.
And those who cares about XEN placement should configure their bootloader to put XEN (and other boot modules) to a proper place right away.

> This has the advantage to simplify the code and should speed-up the boot as relocation
> is not necessary anymore.
Boot time improvements always make me glad :)

Please also note, that all above are kind of generic ideas. They are not linked to our target setup, we do not care about Dom0 and its 1:1 mapped memory. And, all in all, we have an IOMMU.
Andrii Anisov Dec. 14, 2018, 4:09 p.m. UTC | #2
On 14.12.18 17:52, Andrii Anisov wrote:
> Hello Julien,
> 
> Let me speculate a bit about the topic.
> 
> On 14.12.18 13:44, Julien Grall wrote:
>> At the moment, Xen is relocated towards the end of the memory.
> This statement is not really true. Some time ago, XEN was relocated toward the end of
> the low memory (under 4 GB). Currently, on my board I see some kind of mess:
> 
>      (XEN) RAM: 0000000048000000 - 00000000bfffffff
>      (XEN) RAM: 0000000500000000 - 000000057fffffff
>      (XEN) RAM: 0000000600000000 - 000000067fffffff
>      (XEN) RAM: 0000000700000000 - 000000077fffffff
>      (XEN)
>      (XEN) MODULE[0]: 0000000048000000 - 0000000048013000 Device Tree
>      (XEN) MODULE[1]: 000000007a000000 - 000000007c000000 Kernel
>      (XEN) MODULE[2]: 000000007c000000 - 000000007c010000 XSM
>      (XEN)  RESVD[0]: 0000000048000000 - 0000000048013000
>      (XEN)
>      (XEN)
>      (XEN) Command line: dom0_mem=3G console=dtuart dtuart=serial0 dom0_max_vcpus=2 bootscrub=0 loglvl=all cpufreq=none tbuf_size=8192 loglvl=all/none guest_loglvl=all/none
>      (XEN) parameter "cpufreq" unknown!
>      (XEN) Placing Xen at 0x000000077fe00000-0x0000000780000000
>      (XEN) Update BOOTMOD_XEN from 0000000078080000-0000000078188d81 => 000000077fe00000-000000077ff08d81
> 
> As you can see XEN is moved towards the end of the first GB of the low memory instead of the end of under 4GB RAM.

Sorry, I have to recall the piece above. I miscounted zeros in the relocation address. XEN is actually set by the end of the RAM above 4GB.


> On my current setup I have two 128MB chunks stolen: one by relocated XEN, kernel module an XSM module, 
Not by relocated XEN, but still by kernel and XSM
Andrii Anisov Dec. 14, 2018, 4:13 p.m. UTC | #3
On 14.12.18 13:44, Julien Grall wrote:
> At the moment, Xen is relocated towards the end of the memory. While
> this has the advantage to free space in low memory, the code is not
> compliant with the break-before-make because it requires to switch
> between two sets of page-table. This is not entirely trivial to fix as
> it would require us to go through an identity mapping and disabling MMU.
> 
> Furthermore, it looks like that some platform (such as the Hikey960)
> may not be able to bring-up secondary CPUs if the entry is too high.
> 
> I don't believe the low memory is an issue because Xen is quite tiny
> (< 2MB). So the best solution is to stop relocating Xen. This has the
> advantage to simplify the code and should speed-up the boot as relocation
> is not necessary anymore.
> 
> Note that the break-before-make issue is not fixed by this patch.
> 
> Signed-off-by: Julien Grall <julien.grall@arm.com>
> Reported-by: Matthew Daley <mattd@bugfuzz.com>
> Tested-by: Matthew Daley <mattd@bugfuzz.com>

Tested-by: Andrii Anisov <andrii_anisov@epam.com>
Reviewed-by: Andrii Anisov <andrii_anisov@epam.com>
Julien Grall Dec. 14, 2018, 4:26 p.m. UTC | #4
On 12/14/18 3:52 PM, Andrii Anisov wrote:
> Hello Julien,


Hi,

>

> Let me speculate a bit about the topic.

>

> On 14.12.18 13:44, Julien Grall wrote:

>> At the moment, Xen is relocated towards the end of the memory.

> This statement is not really true. Some time ago, XEN was relocated

> toward the end of

> the low memory (under 4 GB).

Are you using arm64 or arm32? Arm64 does not have the 4GB limit while Arm32 has.
I could make the difference in the commit message but this would add more
confusion below.

> Currently, on my board I see some kind of

> mess:

>

>      (XEN) RAM: 0000000048000000 - 00000000bfffffff

>      (XEN) RAM: 0000000500000000 - 000000057fffffff

>      (XEN) RAM: 0000000600000000 - 000000067fffffff

>      (XEN) RAM: 0000000700000000 - 000000077fffffff

>      (XEN)

>      (XEN) MODULE[0]: 0000000048000000 - 0000000048013000 Device Tree

>      (XEN) MODULE[1]: 000000007a000000 - 000000007c000000 Kernel

>      (XEN) MODULE[2]: 000000007c000000 - 000000007c010000 XSM

>      (XEN)  RESVD[0]: 0000000048000000 - 0000000048013000

>      (XEN)

>      (XEN)

>      (XEN) Command line: dom0_mem=3G console=dtuart dtuart=serial0

> dom0_max_vcpus=2 bootscrub=0 loglvl=all cpufreq=none tbuf_size=8192

> loglvl=all/none guest_loglvl=all/none

>      (XEN) parameter "cpufreq" unknown!

>      (XEN) Placing Xen at 0x000000077fe00000-0x0000000780000000

>      (XEN) Update BOOTMOD_XEN from 0000000078080000-0000000078188d81 =>

> 000000077fe00000-000000077ff08d81

>

> As you can see XEN is moved towards the end of the first GB of the low

> memory instead of the end of under 4GB RAM.


I don't understand what you mean. Looking at your log, Xen is relocated at the
end of the last bank. This is the expected behavior in unstable.
>> While

>> this has the advantage to free space in low memory, the code is not

>> compliant with the break-before-make because it requires to switch

>> between two sets of page-table. This is not entirely trivial to fix as

>> it would require us to go through an identity mapping and disabling MMU.

> I understand this motivation though.

>

>> Furthermore, it looks like that some platform (such as the Hikey960)

>> may not be able to bring-up secondary CPUs if the entry is too

>> high.Just a reminder that long time ago we implemented a move of XEN

>> toward the real end of the memory, over 4GB.


This not the first time part of answer is mangled with my e-mail. This is making
really difficult to follow the conversion. Can you please configure your e-mail
client to do proper quote/reply?

> As long as CPUs were not able to start to the code placed over 4 GB, we

> set secondary CPUs to be brought up to a XEN instance under 4GB, then

> jump to a copy over 4GB, following CPU0.


How were you switching between the page-tables? A proper solution would require
to switch between page-tables using an identify mappings. This is far more
complicate than what is worth here.

>

>> I don't believe the low memory is an issue because Xen is quite tiny

>> (< 2MB).

> It is really tiny, but the problem is that Dom0 low memory (lower than 4

> GB) RAM banks

> start and end are aligned by 128MB. So existing of a single 1MB XEN cuts

> out 128MB from low memory for Dom0.


I don't understand how you came up with the conclusion that 128MB will be
removed from Dom0. We only have the requirement that the first bank is at least
128MB. So can you expand it?

> On my current setup I have two 128MB chunks stolen: one by relocated

> XEN, kernel module an XSM module, other another by device tree.


Xen is free to allocate anything below 4GB. This is nothing new. But you should
not have two 128MB chunks stolen because of modules here. If that's the case
then there is a bug in Xen that should be fixed.

Cheers,

--
Julien Grall
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
Andrii Anisov Dec. 14, 2018, 5:24 p.m. UTC | #5
On 14.12.18 18:26, Julien Grall wrote:
> I don't understand how you came up with the conclusion that 128MB will be
> removed from Dom0. We only have the requirement that the first bank is at least
> 128MB. So can you expand it?
IIRC Linux kernel requires that the machine RAM start must be 128MB aligned.
Look at `allocate_memory_11()`, `min_low_order` variable usage. It affects all low memory 1:1 allocation and makes all low memory banks 128MB aligned both start and end.
So that having a module in a low memory poisons the whole 128MB region.
Andrii Anisov Dec. 14, 2018, 5:46 p.m. UTC | #6
On 14.12.18 18:26, Julien Grall wrote:
> Are you using arm64 or arm32?

I'm using arm64.
But also speak about arm32.

> I don't understand what you mean. Looking at your log, Xen is relocated at the
> end of the last bank. This is the expected behavior in unstable.

Yes, I miscounted zeroes, and recalled that part before your answer  ;)

> This not the first time part of answer is mangled with my e-mail. This is making
> really difficult to follow the conversion. Can you please configure your e-mail
> client to do proper quote/reply?

OK. Will check it.

> How were you switching between the page-tables? A proper solution would require
> to switch between page-tables using an identify mappings. This is far more
> complicate than what is worth here.

Sorry for my ignorance, what "identify mappings" stands for? Any links for the problem description?
I don't remember such details of implementation but pretty sure we didn't mean that thing. It was about 3 years ago in a different company on a arm32 SoC.

> Xen is free to allocate anything below 4GB. This is nothing new. But you should
> not have two 128MB chunks stolen because of modules here. If that's the case
> then there is a bug in Xen that should be fixed.

They are stolen from 1:1 memory allocation because of `allocate_memory_11()` design and implementation.
Julien Grall Dec. 14, 2018, 5:48 p.m. UTC | #7
Hi,

On 14/12/2018 17:24, Andrii Anisov wrote:
> 
> 
> On 14.12.18 18:26, Julien Grall wrote:
>> I don't understand how you came up with the conclusion that 128MB will be
>> removed from Dom0. We only have the requirement that the first bank is at least
>> 128MB. So can you expand it?
> IIRC Linux kernel requires that the machine RAM start must be 128MB aligned.

Please try to reference the documentation (or code if lack of documentation) 
when making such statement.

AFAICT, Linux 32-bit [1] imposes the kernel to be loaded in the first 128MB of 
RAM. Nothing about the 128MB aligned RAM. Linux 64-bit [1] requires to be loaded 
at a 2MB aligned address.

So technically allocating the RAM using a 2MB alignment should be enough. Yet we 
need to make sure the first bank is at least 128MB.

> Look at `allocate_memory_11()`, `min_low_order` variable usage. It affects all 
> low memory 1:1 allocation and makes all low memory banks 128MB aligned both 
> start and end.
> So that having a module in a low memory poisons the whole 128MB region.
> 
That's definitely an unwanted behavior, but this is not related to the patch 
itself. As soon as you hand memory to the allocator, memory can be allocated at 
any place in the memory. I am still unsure whether the alignment is due to the 
algorithm in allocate_memory_11() or because of the order we pass to the allocator.

Until we fix it, the best recommendation is to keep all the modules close 
together at the beginning of the RAM. So you only "waste" 128MB region. I can 
add this recommendation in the commit message and potentially documentation.

Cheers,

[1] Documentation/arm/Booting
[2] Documentation/arm64/booting.txt
Julien Grall Dec. 14, 2018, 6:04 p.m. UTC | #8
Hi,

On 14/12/2018 17:46, Andrii Anisov wrote:
>> How were you switching between the page-tables? A proper solution would require
>> to switch between page-tables using an identify mappings. This is far more
>> complicate than what is worth here.
> 
> Sorry for my ignorance, what "identify mappings" stands for?

I meant identity mapping sorry.

> Any links for the 
> problem description?

Have a look at "Using break-before-make when updating translation table entries" 
D5-2516 in ARM DDI 0487D.a. The underlying issue is you can't change the mapping 
with going through an invalid state. So if you switch between 2 page-tables you 
need roughly to:
	- Disable MMU
	- Flush TLBs
	- Switch page-tables
	- Enable MMU

The new set of page-tables needs to contain an identity mapping corresponding to 
where Xen is loaded in memory. Such mapping is quite difficult to have on Xen 
because the memory layout is static.

Just in case, I know that Xen does not respect that today. We are quite lucky so 
far that this didn't result to weird behavior when running Xen. Although, I am 
starting to see more report of interesting crash with newer processor. Removing 
the relocation is a first step towards avoiding to switch between page-tables.

>> Xen is free to allocate anything below 4GB. This is nothing new. But you should
>> not have two 128MB chunks stolen because of modules here. If that's the case
>> then there is a bug in Xen that should be fixed.
> 
> They are stolen from 1:1 memory allocation because of `allocate_memory_11()` 
> design and implementation.

Then the code needs to be fixed... It would be nice to get some helps here as I 
can't scale.

Cheers,
Julien Grall Dec. 17, 2018, 11:11 a.m. UTC | #9
Hi,

On 14/12/2018 17:48, Julien Grall wrote:
> On 14/12/2018 17:24, Andrii Anisov wrote:
>> On 14.12.18 18:26, Julien Grall wrote:
>>> I don't understand how you came up with the conclusion that 128MB will be
>>> removed from Dom0. We only have the requirement that the first bank is at least
>>> 128MB. So can you expand it?
>> IIRC Linux kernel requires that the machine RAM start must be 128MB aligned.
> 
> Please try to reference the documentation (or code if lack of documentation) 
> when making such statement.
> 
> AFAICT, Linux 32-bit [1] imposes the kernel to be loaded in the first 128MB of 
> RAM. Nothing about the 128MB aligned RAM. Linux 64-bit [1] requires to be loaded 
> at a 2MB aligned address.
> 
> So technically allocating the RAM using a 2MB alignment should be enough. Yet we 
> need to make sure the first bank is at least 128MB.
> 
>> Look at `allocate_memory_11()`, `min_low_order` variable usage. It affects all 
>> low memory 1:1 allocation and makes all low memory banks 128MB aligned both 
>> start and end.
>> So that having a module in a low memory poisons the whole 128MB region.
>>
> That's definitely an unwanted behavior, but this is not related to the patch 
> itself. As soon as you hand memory to the allocator, memory can be allocated at 
> any place in the memory. I am still unsure whether the alignment is due to the 
> algorithm in allocate_memory_11() or because of the order we pass to the allocator.
> 
> Until we fix it, the best recommendation is to keep all the modules close 
> together at the beginning of the RAM. So you only "waste" 128MB region. I can 
> add this recommendation in the commit message and potentially documentation.

Answering to myself. Stefano pointed out on IRC that grub/UEFI users are not in 
control of the memory layout so this might be an issue for them.

Looking at my GRUB setup, all the modules are loaded together:

(XEN) MODULE[0]: 00000000f2afb000 - 00000000f2b02000 Device Tree
(XEN) MODULE[1]: 00000000f695b000 - 00000000f7f6fa00 Kernel
(XEN) MODULE[2]: 00000000f2c23000 - 00000000f6959200 Ramdisk

[...]

(XEN) Placing Xen at 0x000000099be00000-0x000000099c000000
(XEN) Update BOOTMOD_XEN from 00000000f2b02000-00000000f2c22d81 => 000000099be00

So whether Xen is going to be relocated or not is not going to make much difference.

Now, let's image the bootloader decides to load the modules in different places 
in the memory. Then you will have 4 slots (potential 5 slots) of 128MB used. 
That's up to 640MB of low memory not available for Dom0. Relocating Xen may or 
may not make available more low memory for Dom0. For instance, in my use case 
above, this does not make any change.

This is obviously the worst case scenario. I am pretty sure people would have 
seen report if 640MB of low memory was not available for Dom0 and that was a 
concern.

So, to be honest, I think this is a non-issue. I am not saying this should not 
be fixed. I am saying that the price is minimal compare to allow Xen booting on 
platform such as the Hikey and bringing more compliance with the Arm Arm.

Cheers,

> 
> Cheers,
> 
> [1] Documentation/arm/Booting
> [2] Documentation/arm64/booting.txt
>
Andrii Anisov Dec. 17, 2018, 1:14 p.m. UTC | #10
On 14.12.18 19:48, Julien Grall wrote:
> Please try to reference the documentation (or code if lack of documentation) when making such statement.

Ah, yes, sure.

> AFAICT, Linux 32-bit [1] imposes the kernel to be loaded in the first 128MB of RAM. Nothing about the 128MB aligned RAM.

Right, the documentation sets a restriction for a compressed image to be loaded in the first 128MB of RAM. But the comment in a decompressor code (and the code itself) explicitly refer alignment [11].
Calculating a RAM start address with this mask [22] give us a wrong value if the real physical address is 64MB aligned, not 128MB. Then crash on decompressing. You know, I stepped into that with J6 (arm32) while setting up thin Dom0 with only 64MB RAM.

> Linux 64-bit [1] requires to be loaded at a 2MB aligned address.

Great, 64-bit Linux documentation directly refers the alignment :)

> So technically allocating the RAM using a 2MB alignment should be enough.

For 64-bit and, maybe, raw 32-bit Linux kernel images.
For 32-bit compressed Linux kernel - still 128MB aligned first bank is required. It can be changed on kernel side by setting ZRELADDR, but we can't designate that from XEN runtime.

> Yet we need to make sure the first bank is at least 128MB.

Well, I'm not sure the ARM64 documentation [33] or implementation mention the size of the first bank. Except it should be enough to hold the kernel image [44].
Also I would not treat [55] as a strict requirement to have 128MB in the first bank. But we can stick at that to make things easier.

>> Look at `allocate_memory_11()`, `min_low_order` variable usage. It affects all low memory 1:1 allocation and makes all low memory banks 128MB aligned both start and end.
>> So that having a module in a low memory poisons the whole 128MB region.
>>
> That's definitely an unwanted behavior, but this is not related to the patch itself.

I'm totally agree that it is not related to the code changes done by the patch. But leads to the patch message change.

> As soon as you hand memory to the allocator, memory can be allocated at any place in the memory. I am still unsure whether the alignment is due to the algorithm in allocate_memory_11() or because of the order we pass to the allocator.
> 
> Until we fix it, the best recommendation is to keep all the modules close together at the beginning of the RAM. So you only "waste" 128MB region. 

It might work for the current code.

> I can add this recommendation in the commit message and potentially documentation.

I vote for this.

[11] https://elixir.bootlin.com/linux/v4.20-rc7/source/arch/arm/boot/compressed/head.S#L217
[22] https://elixir.bootlin.com/linux/v4.20-rc7/source/arch/arm/boot/compressed/head.S#L236
[33] https://elixir.bootlin.com/linux/v4.20-rc7/source/Documentation/arm64/booting.txt
[44] https://elixir.bootlin.com/linux/v4.20-rc7/source/Documentation/arm64/booting.txt#L129
[55] https://elixir.bootlin.com/linux/v4.20-rc7/source/Documentation/arm/Booting#L160
> [1] Documentation/arm/Booting
> [2] Documentation/arm64/booting.txt
Julien Grall Dec. 17, 2018, 1:38 p.m. UTC | #11
Hi,

On 17/12/2018 13:14, Andrii Anisov wrote:
> 
> 
> On 14.12.18 19:48, Julien Grall wrote:
>> Please try to reference the documentation (or code if lack of documentation) 
>> when making such statement.
> 
> Ah, yes, sure.
> 
>> AFAICT, Linux 32-bit [1] imposes the kernel to be loaded in the first 128MB of 
>> RAM. Nothing about the 128MB aligned RAM.
> 
> Right, the documentation sets a restriction for a compressed image to be loaded 
> in the first 128MB of RAM. But the comment in a decompressor code (and the code 
> itself) explicitly refer alignment [11].
> Calculating a RAM start address with this mask [22] give us a wrong value if the 
> real physical address is 64MB aligned, not 128MB. Then crash on decompressing. 
> You know, I stepped into that with J6 (arm32) while setting up thin Dom0 with 
> only 64MB RAM.

Thank you for the pointer.

> 
>> Linux 64-bit [1] requires to be loaded at a 2MB aligned address.
> 
> Great, 64-bit Linux documentation directly refers the alignment :)
> 
>> So technically allocating the RAM using a 2MB alignment should be enough.
> 
> For 64-bit and, maybe, raw 32-bit Linux kernel images.
> For 32-bit compressed Linux kernel - still 128MB aligned first bank is required.
> It can be changed on kernel side by setting ZRELADDR, but we can't designate 
> that from XEN runtime.

I also don't think this would be a good approach as we still want to keep the 
kernel as much as possible position independent.

> 
>> Yet we need to make sure the first bank is at least 128MB.
> 
> Well, I'm not sure the ARM64 documentation [33] or implementation mention the 
> size of the first bank. Except it should be enough to hold the kernel image [44].
> Also I would not treat [55] as a strict requirement to have 128MB in the first 
> bank. But we can stick at that to make things easier.

The size of the first bank is implicit on arm32. If you look at the 
Documentation/arm32/Booting.txt, the DTB/initramfs should be loaded just above 
128MB to avoid the decompressor to avoid overwrite them. So technically, we 
should allow more than 128MB for the first bank.

At the moment, the algorithm to load 64-bit and 32-bit Image are the same. Hence 
why I said we need at least 128MB in the first bank. I am open for using 
different algorithm if Dom0 should be smaller.

Cheers,
Andrii Anisov Dec. 17, 2018, 3:54 p.m. UTC | #12
On 14.12.18 20:04, Julien Grall wrote:
> Then the code needs to be fixed... It would be nice to get some helps here as I can't scale.

I can take this.
But I would like to align on the algorithm first.
Andrii Anisov Dec. 17, 2018, 3:55 p.m. UTC | #13
On 17.12.18 15:38, Julien Grall wrote:
>>> So technically allocating the RAM using a 2MB alignment should be enough.
>>
>> For 64-bit and, maybe, raw 32-bit Linux kernel images.
>> For 32-bit compressed Linux kernel - still 128MB aligned first bank is required.
>> It can be changed on kernel side by setting ZRELADDR, but we can't designate that from XEN runtime.
> 
> I also don't think this would be a good approach as we still want to keep the kernel as much as possible position independent.

I suppose you get me wrong.
I'm saying that 128MB alignment requirement for RAM start might be relaxed by kernel itself, if it has set ZRELADDR. But we, from XEN side, can not rely on that, nor detect that.
So we must follow RAM start alignment for compressed Linux kernel images (32 bit).

>>> Yet we need to make sure the first bank is at least 128MB.
>>
>> Well, I'm not sure the ARM64 documentation [33] or implementation mention the size of the first bank. Except it should be enough to hold the kernel image [44].
>> Also I would not treat [55] as a strict requirement to have 128MB in the first bank. But we can stick at that to make things easier.
> 
> The size of the first bank is implicit on arm32. If you look at the Documentation/arm32/Booting.txt, the DTB/initramfs should be loaded just above 128MB to avoid the decompressor to avoid overwrite them.

It is rather a recommendation. A good way to choose the place for initramfs and dtb. Because decompressor is pretty dumb and limited by 128MB addresses range.
And on XEN side we follow this recommendation whenever it is possible.

> So technically, we should allow more than 128MB for the first bank

Although, it is not said that DTB or initramfs should reside in the first RAM bank ;)

> At the moment, the algorithm to load 64-bit and 32-bit Image are the same. Hence why I said we need at least 128MB in the first bank. I am open for using different algorithm if Dom0 should be smaller.

The algorithm is the same, but it is buggy right now for smaller Dom0 sizes. The lines

     const unsigned int min_low_order =
         get_order_from_bytes(min_t(paddr_t, dom0_mem, MB(128)));

in `allocate_memory_11()` prone to end up with a wrong RAM start alignment.
Andrii Anisov Dec. 17, 2018, 4:02 p.m. UTC | #14
On 17.12.18 13:11, Julien Grall wrote:
> So, to be honest, I think this is a non-issue. I am not saying this should not be fixed. I am saying that the price is minimal compare to allow Xen booting on platform such as the Hikey and bringing more compliance with the Arm Arm.

BTW, I hope you already noticed I've passed my RB and TB for the patch.
I would prefer you reflect last findings in the commit message, but if you don't like - let it be as is.
Julien Grall Dec. 17, 2018, 4:56 p.m. UTC | #15
Hi Andrii,

On 17/12/2018 16:02, Andrii Anisov wrote:
> 
> 
> On 17.12.18 13:11, Julien Grall wrote:
>> So, to be honest, I think this is a non-issue. I am not saying this should not 
>> be fixed. I am saying that the price is minimal compare to allow Xen booting 
>> on platform such as the Hikey and bringing more compliance with the Arm Arm.
> 
> BTW, I hope you already noticed I've passed my RB and TB for the patch.

I have seen them. Thank you for the review and tested-by.

> I would prefer you reflect last findings in the commit message, but if you don't 
> like - let it be as is.

I am planning to reflect this in the commit message. I am just waiting on the 
discussion to come to a conclusion before resending it.

Cheers,
Julien Grall Dec. 17, 2018, 5:02 p.m. UTC | #16
Hi,

On 17/12/2018 15:55, Andrii Anisov wrote:
> 
> 
> On 17.12.18 15:38, Julien Grall wrote:
>>>> So technically allocating the RAM using a 2MB alignment should be enough.
>>>
>>> For 64-bit and, maybe, raw 32-bit Linux kernel images.
>>> For 32-bit compressed Linux kernel - still 128MB aligned first bank is required.
>>> It can be changed on kernel side by setting ZRELADDR, but we can't designate 
>>> that from XEN runtime.
>>
>> I also don't think this would be a good approach as we still want to keep the 
>> kernel as much as possible position independent.
> 
> I suppose you get me wrong.
> I'm saying that 128MB alignment requirement for RAM start might be relaxed by 
> kernel itself, if it has set ZRELADDR. But we, from XEN side, can not rely on 
> that, nor detect that.
> So we must follow RAM start alignment for compressed Linux kernel images (32 bit).

I didn't get you wrong. We are saying the same things :).

> 
>>>> Yet we need to make sure the first bank is at least 128MB.
>>>
>>> Well, I'm not sure the ARM64 documentation [33] or implementation mention the 
>>> size of the first bank. Except it should be enough to hold the kernel image 
>>> [44].
>>> Also I would not treat [55] as a strict requirement to have 128MB in the 
>>> first bank. But we can stick at that to make things easier.
>>
>> The size of the first bank is implicit on arm32. If you look at the 
>> Documentation/arm32/Booting.txt, the DTB/initramfs should be loaded just above 
>> 128MB to avoid the decompressor to avoid overwrite them.
> 
> It is rather a recommendation. A good way to choose the place for initramfs and 
> dtb. Because decompressor is pretty dumb and limited by 128MB addresses range.
> And on XEN side we follow this recommendation whenever it is possible.
> 
>> So technically, we should allow more than 128MB for the first bank
> 
> Although, it is not said that DTB or initramfs should reside in the first RAM 
> bank ;)

The documentation suggest to load the DTB and initramfs just above 128MB. I am 
not entirely how the kernel would behave if you put it in a separate bank far after.

Similarly, some version on Linux (i.e prior to 4.2) requires the DTB to within 
512MB from the kernel.

> 
>> At the moment, the algorithm to load 64-bit and 32-bit Image are the same. 
>> Hence why I said we need at least 128MB in the first bank. I am open for using 
>> different algorithm if Dom0 should be smaller.
> 
> The algorithm is the same, but it is buggy right now for smaller Dom0 sizes. The 
> lines
> 
>      const unsigned int min_low_order =
>          get_order_from_bytes(min_t(paddr_t, dom0_mem, MB(128)));
> 
> in `allocate_memory_11()` prone to end up with a wrong RAM start alignment.

Patches are welcomed.

Cheers,
Julien Grall Dec. 17, 2018, 5:03 p.m. UTC | #17
Hi Andrii,

On 17/12/2018 15:54, Andrii Anisov wrote:
> On 14.12.18 20:04, Julien Grall wrote:
>> Then the code needs to be fixed... It would be nice to get some helps here as 
>> I can't scale.
> 
> I can take this.

Thank you.

> But I would like to align on the algorithm first.

It is probably worth to start a separate discussion with your thoughts.

Cheers,
Andrii Anisov Dec. 17, 2018, 5:34 p.m. UTC | #18
On 17.12.18 19:02, Julien Grall wrote:
> I didn't get you wrong. We are saying the same things :).

Great!

> Similarly, some version on Linux (i.e prior to 4.2) requires the DTB to within 512MB from the kernel.

I've seen that restriction in the Linux for ARM64 documentation.

>> in `allocate_memory_11()` prone to end up with a wrong RAM start alignment.
> 
> Patches are welcomed.
I see something like following as a quick WA (not even build tested):

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index d2c63a8..bf72ba9 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -223,8 +223,9 @@ fail:
   * meet these requirements directly. So instead of proceed as follows:
   *
   * We first allocate the largest allocation we can as low as we
- * can. This then becomes the first bank. This bank must be at least
- * 128MB (or dom0_mem if that is smaller).
+ * can. This then becomes the first bank. This bank is at least 128MB even if
+ * dom0 is configured for less. It is the way to get that bank 128MB aligned,
+ * what is required for 32-bit zImage.
   *
   * Then we start allocating more memory, trying to allocate the
   * largest possible size and trying smaller sizes until we
@@ -253,7 +254,7 @@ static void __init allocate_memory_11(struct domain *d,
                                        struct kernel_info *kinfo)
  {
      const unsigned int min_low_order =
-        get_order_from_bytes(min_t(paddr_t, dom0_mem, MB(128)));
+        get_order_from_bytes(MB(128));
      const unsigned int min_order = get_order_from_bytes(MB(4));
      struct page_info *pg;
      unsigned int order = get_allocation_size(kinfo->unassigned_mem);
@@ -268,6 +269,10 @@ static void __init allocate_memory_11(struct domain *d,
       */
      BUG_ON(!is_domain_direct_mapped(d));
  
+    if ( dom0_mem < MB(128))
+        printk(XENLOG_WARNING "Allocating 128MB for Domain0 with %"PRIu64"MB\n",
+               dom0_mem/MB(1));
+
      printk("Allocating 1:1 mappings totalling %ldMB for dom0:\n",
             /* Don't want format this as PRIpaddr (16 digit hex) */
             (unsigned long)(kinfo->unassigned_mem >> 20));


But I'm not sure if it worth to be sent, because I'm going to rewrite it soon.
Stefano Stabellini Dec. 17, 2018, 5:57 p.m. UTC | #19
On Mon, 17 Dec 2018, Julien Grall wrote:
> Hi,
> 
> On 14/12/2018 17:48, Julien Grall wrote:
> > On 14/12/2018 17:24, Andrii Anisov wrote:
> > > On 14.12.18 18:26, Julien Grall wrote:
> > > > I don't understand how you came up with the conclusion that 128MB will
> > > > be
> > > > removed from Dom0. We only have the requirement that the first bank is
> > > > at least
> > > > 128MB. So can you expand it?
> > > IIRC Linux kernel requires that the machine RAM start must be 128MB
> > > aligned.
> > 
> > Please try to reference the documentation (or code if lack of documentation)
> > when making such statement.
> > 
> > AFAICT, Linux 32-bit [1] imposes the kernel to be loaded in the first 128MB
> > of RAM. Nothing about the 128MB aligned RAM. Linux 64-bit [1] requires to be
> > loaded at a 2MB aligned address.
> > 
> > So technically allocating the RAM using a 2MB alignment should be enough.
> > Yet we need to make sure the first bank is at least 128MB.
> > 
> > > Look at `allocate_memory_11()`, `min_low_order` variable usage. It affects
> > > all low memory 1:1 allocation and makes all low memory banks 128MB aligned
> > > both start and end.
> > > So that having a module in a low memory poisons the whole 128MB region.
> > > 
> > That's definitely an unwanted behavior, but this is not related to the patch
> > itself. As soon as you hand memory to the allocator, memory can be allocated
> > at any place in the memory. I am still unsure whether the alignment is due
> > to the algorithm in allocate_memory_11() or because of the order we pass to
> > the allocator.
> > 
> > Until we fix it, the best recommendation is to keep all the modules close
> > together at the beginning of the RAM. So you only "waste" 128MB region. I
> > can add this recommendation in the commit message and potentially
> > documentation.
> 
> Answering to myself. Stefano pointed out on IRC that grub/UEFI users are not
> in control of the memory layout so this might be an issue for them.
> 
> Looking at my GRUB setup, all the modules are loaded together:
> 
> (XEN) MODULE[0]: 00000000f2afb000 - 00000000f2b02000 Device Tree
> (XEN) MODULE[1]: 00000000f695b000 - 00000000f7f6fa00 Kernel
> (XEN) MODULE[2]: 00000000f2c23000 - 00000000f6959200 Ramdisk
> 
> [...]
> 
> (XEN) Placing Xen at 0x000000099be00000-0x000000099c000000
> (XEN) Update BOOTMOD_XEN from 00000000f2b02000-00000000f2c22d81 =>
> 000000099be00
> 
> So whether Xen is going to be relocated or not is not going to make much
> difference.
> 
> Now, let's image the bootloader decides to load the modules in different
> places in the memory. Then you will have 4 slots (potential 5 slots) of 128MB
> used. That's up to 640MB of low memory not available for Dom0. Relocating Xen
> may or may not make available more low memory for Dom0. For instance, in my
> use case above, this does not make any change.
> 
> This is obviously the worst case scenario. I am pretty sure people would have
> seen report if 640MB of low memory was not available for Dom0 and that was a
> concern.
> 
> So, to be honest, I think this is a non-issue. I am not saying this should not
> be fixed. I am saying that the price is minimal compare to allow Xen booting
> on platform such as the Hikey and bringing more compliance with the Arm Arm.

Make sense. Add some of these thoughts to the commit message so that
this time we remember.


> > Cheers,
> > 
> > [1] Documentation/arm/Booting
> > [2] Documentation/arm64/booting.txt
Julien Grall Dec. 17, 2018, 6:03 p.m. UTC | #20
Hi,

On 17/12/2018 17:57, Stefano Stabellini wrote:
> On Mon, 17 Dec 2018, Julien Grall wrote:
>> Hi,
>>
>> On 14/12/2018 17:48, Julien Grall wrote:
>>> On 14/12/2018 17:24, Andrii Anisov wrote:
>>>> On 14.12.18 18:26, Julien Grall wrote:
>>>>> I don't understand how you came up with the conclusion that 128MB will
>>>>> be
>>>>> removed from Dom0. We only have the requirement that the first bank is
>>>>> at least
>>>>> 128MB. So can you expand it?
>>>> IIRC Linux kernel requires that the machine RAM start must be 128MB
>>>> aligned.
>>>
>>> Please try to reference the documentation (or code if lack of documentation)
>>> when making such statement.
>>>
>>> AFAICT, Linux 32-bit [1] imposes the kernel to be loaded in the first 128MB
>>> of RAM. Nothing about the 128MB aligned RAM. Linux 64-bit [1] requires to be
>>> loaded at a 2MB aligned address.
>>>
>>> So technically allocating the RAM using a 2MB alignment should be enough.
>>> Yet we need to make sure the first bank is at least 128MB.
>>>
>>>> Look at `allocate_memory_11()`, `min_low_order` variable usage. It affects
>>>> all low memory 1:1 allocation and makes all low memory banks 128MB aligned
>>>> both start and end.
>>>> So that having a module in a low memory poisons the whole 128MB region.
>>>>
>>> That's definitely an unwanted behavior, but this is not related to the patch
>>> itself. As soon as you hand memory to the allocator, memory can be allocated
>>> at any place in the memory. I am still unsure whether the alignment is due
>>> to the algorithm in allocate_memory_11() or because of the order we pass to
>>> the allocator.
>>>
>>> Until we fix it, the best recommendation is to keep all the modules close
>>> together at the beginning of the RAM. So you only "waste" 128MB region. I
>>> can add this recommendation in the commit message and potentially
>>> documentation.
>>
>> Answering to myself. Stefano pointed out on IRC that grub/UEFI users are not
>> in control of the memory layout so this might be an issue for them.
>>
>> Looking at my GRUB setup, all the modules are loaded together:
>>
>> (XEN) MODULE[0]: 00000000f2afb000 - 00000000f2b02000 Device Tree
>> (XEN) MODULE[1]: 00000000f695b000 - 00000000f7f6fa00 Kernel
>> (XEN) MODULE[2]: 00000000f2c23000 - 00000000f6959200 Ramdisk
>>
>> [...]
>>
>> (XEN) Placing Xen at 0x000000099be00000-0x000000099c000000
>> (XEN) Update BOOTMOD_XEN from 00000000f2b02000-00000000f2c22d81 =>
>> 000000099be00
>>
>> So whether Xen is going to be relocated or not is not going to make much
>> difference.
>>
>> Now, let's image the bootloader decides to load the modules in different
>> places in the memory. Then you will have 4 slots (potential 5 slots) of 128MB
>> used. That's up to 640MB of low memory not available for Dom0. Relocating Xen
>> may or may not make available more low memory for Dom0. For instance, in my
>> use case above, this does not make any change.
>>
>> This is obviously the worst case scenario. I am pretty sure people would have
>> seen report if 640MB of low memory was not available for Dom0 and that was a
>> concern.
>>
>> So, to be honest, I think this is a non-issue. I am not saying this should not
>> be fixed. I am saying that the price is minimal compare to allow Xen booting
>> on platform such as the Hikey and bringing more compliance with the Arm Arm.
> 
> Make sense. Add some of these thoughts to the commit message so that
> this time we remember.

I will do.

Cheers,
Andrii Anisov Dec. 18, 2018, 12:09 p.m. UTC | #21
Hello Julien,

On 17.12.18 19:34, Andrii Anisov wrote:
> I see something like following as a quick WA (not even build tested):
> 
> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
> index d2c63a8..bf72ba9 100644
> --- a/xen/arch/arm/domain_build.c
> +++ b/xen/arch/arm/domain_build.c
> @@ -223,8 +223,9 @@ fail:
>    * meet these requirements directly. So instead of proceed as follows:
>    *
>    * We first allocate the largest allocation we can as low as we
> - * can. This then becomes the first bank. This bank must be at least
> - * 128MB (or dom0_mem if that is smaller).
> + * can. This then becomes the first bank. This bank is at least 128MB even if
> + * dom0 is configured for less. It is the way to get that bank 128MB aligned,
> + * what is required for 32-bit zImage.
>    *
>    * Then we start allocating more memory, trying to allocate the
>    * largest possible size and trying smaller sizes until we
> @@ -253,7 +254,7 @@ static void __init allocate_memory_11(struct domain *d,
>                                         struct kernel_info *kinfo)
>   {
>       const unsigned int min_low_order =
> -        get_order_from_bytes(min_t(paddr_t, dom0_mem, MB(128)));
> +        get_order_from_bytes(MB(128));
>       const unsigned int min_order = get_order_from_bytes(MB(4));
>       struct page_info *pg;
>       unsigned int order = get_allocation_size(kinfo->unassigned_mem);
> @@ -268,6 +269,10 @@ static void __init allocate_memory_11(struct domain *d,
>        */
>       BUG_ON(!is_domain_direct_mapped(d));
> 
> +    if ( dom0_mem < MB(128))
> +        printk(XENLOG_WARNING "Allocating 128MB for Domain0 with %"PRIu64"MB\n",
> +               dom0_mem/MB(1));
> +
>       printk("Allocating 1:1 mappings totalling %ldMB for dom0:\n",
>              /* Don't want format this as PRIpaddr (16 digit hex) */
>              (unsigned long)(kinfo->unassigned_mem >> 20));
> 
> 
> But I'm not sure if it worth to be sent, because I'm going to rewrite it soon.

 From the second thought, we have last posting date for 4.12 already in the past. So redesign will not go to 4.12.
Maybe it worth to have the thing above as a bugfix for 4.12?
What do you think?
Julien Grall Dec. 18, 2018, 1:13 p.m. UTC | #22
On 12/18/18 12:09 PM, Andrii Anisov wrote:
> Hello Julien,

Hi,

> On 17.12.18 19:34, Andrii Anisov wrote:
>> I see something like following as a quick WA (not even build tested):
>>
>> diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
>> index d2c63a8..bf72ba9 100644
>> --- a/xen/arch/arm/domain_build.c
>> +++ b/xen/arch/arm/domain_build.c
>> @@ -223,8 +223,9 @@ fail:
>>    * meet these requirements directly. So instead of proceed as follows:
>>    *
>>    * We first allocate the largest allocation we can as low as we
>> - * can. This then becomes the first bank. This bank must be at least
>> - * 128MB (or dom0_mem if that is smaller).
>> + * can. This then becomes the first bank. This bank is at least 128MB 
>> even if
>> + * dom0 is configured for less. It is the way to get that bank 128MB 
>> aligned,
>> + * what is required for 32-bit zImage.
>>    *
>>    * Then we start allocating more memory, trying to allocate the
>>    * largest possible size and trying smaller sizes until we
>> @@ -253,7 +254,7 @@ static void __init allocate_memory_11(struct 
>> domain *d,
>>                                         struct kernel_info *kinfo)
>>   {
>>       const unsigned int min_low_order =
>> -        get_order_from_bytes(min_t(paddr_t, dom0_mem, MB(128)));
>> +        get_order_from_bytes(MB(128));
>>       const unsigned int min_order = get_order_from_bytes(MB(4));
>>       struct page_info *pg;
>>       unsigned int order = get_allocation_size(kinfo->unassigned_mem);
>> @@ -268,6 +269,10 @@ static void __init allocate_memory_11(struct 
>> domain *d,
>>        */
>>       BUG_ON(!is_domain_direct_mapped(d));
>>
>> +    if ( dom0_mem < MB(128))
>> +        printk(XENLOG_WARNING "Allocating 128MB for Domain0 with 
>> %"PRIu64"MB\n",
>> +               dom0_mem/MB(1));
>> +
>>       printk("Allocating 1:1 mappings totalling %ldMB for dom0:\n",
>>              /* Don't want format this as PRIpaddr (16 digit hex) */
>>              (unsigned long)(kinfo->unassigned_mem >> 20));
>>
>>
>> But I'm not sure if it worth to be sent, because I'm going to rewrite 
>> it soon.
> 
>  From the second thought, we have last posting date for 4.12 already in 
> the past. So redesign will not go to 4.12.
> Maybe it worth to have the thing above as a bugfix for 4.12?
> What do you think?

AFAICT, it was possible to boot a Dom0 with only 64MB on Arm64. So I am 
not entirely why we would also limit the size there.

For Arm32, I think we should just return an error and fail the domain 
build. If you ask 64MB and we give you 128MB then something is already 
really wrong.

Anyway, I think this should be submitted properly to discuss for Xen 
4.12 inclusion.

Cheers,
diff mbox series

Patch

diff --git a/xen/arch/arm/arm32/head.S b/xen/arch/arm/arm32/head.S
index 93b51e9ef2..390a505e05 100644
--- a/xen/arch/arm/arm32/head.S
+++ b/xen/arch/arm/arm32/head.S
@@ -469,58 +469,12 @@  fail:   PRINT("- Boot failed -\r\n")
 GLOBAL(_end_boot)
 
 /*
- * Copy Xen to new location and switch TTBR
+ * Switch TTBR
  * r1:r0       ttbr
- * r2          source address
- * r3          destination address
- * [sp]=>r4    length
  *
- * Source and destination must be word aligned, length is rounded up
- * to a 16 byte boundary.
- *
- * MUST BE VERY CAREFUL when saving things to RAM over the copy
+ * TODO: This code does not comply with break-before-make.
  */
-ENTRY(relocate_xen)
-        push {r4,r5,r6,r7,r8,r9,r10,r11}
-
-        ldr   r4, [sp, #8*4]                /* Get 4th argument from stack */
-
-        /* Copy 16 bytes at a time using:
-         * r5:  counter
-         * r6:  data
-         * r7:  data
-         * r8:  data
-         * r9:  data
-         * r10: source
-         * r11: destination
-         */
-        mov   r5, r4
-        mov   r10, r2
-        mov   r11, r3
-1:      ldmia r10!, {r6, r7, r8, r9}
-        stmia r11!, {r6, r7, r8, r9}
-
-        subs  r5, r5, #16
-        bgt   1b
-
-        /* Flush destination from dcache using:
-         * r5: counter
-         * r6: step
-         * r7: vaddr
-         */
-        dsb        /* So the CPU issues all writes to the range */
-
-        mov   r5, r4
-        ldr   r6, =dcache_line_bytes /* r6 := step */
-        ldr   r6, [r6]
-        mov   r7, r3
-
-1:      mcr   CP32(r7, DCCMVAC)
-
-        add   r7, r7, r6
-        subs  r5, r5, r6
-        bgt   1b
-
+ENTRY(switch_ttbr)
         dsb                            /* Ensure the flushes happen before
                                         * continuing */
         isb                            /* Ensure synchronization with previous
@@ -543,8 +497,6 @@  ENTRY(relocate_xen)
         dsb                            /* Ensure completion of TLB+BP flush */
         isb
 
-        pop {r4, r5,r6,r7,r8,r9,r10,r11}
-
         mov pc, lr
 
 #ifdef CONFIG_EARLY_PRINTK
diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
index ef87b5c254..0b7f6e7f92 100644
--- a/xen/arch/arm/arm64/head.S
+++ b/xen/arch/arm/arm64/head.S
@@ -609,52 +609,14 @@  fail:   PRINT("- Boot failed -\r\n")
 
 GLOBAL(_end_boot)
 
-/* Copy Xen to new location and switch TTBR
- * x0    ttbr
- * x1    source address
- * x2    destination address
- * x3    length
+/*
+ * Switch TTBR
  *
- * Source and destination must be word aligned, length is rounded up
- * to a 16 byte boundary.
+ * x0    ttbr
  *
- * MUST BE VERY CAREFUL when saving things to RAM over the copy */
-ENTRY(relocate_xen)
-        /* Copy 16 bytes at a time using:
-         *   x9: counter
-         *   x10: data
-         *   x11: data
-         *   x12: source
-         *   x13: destination
-         */
-        mov     x9, x3
-        mov     x12, x1
-        mov     x13, x2
-
-1:      ldp     x10, x11, [x12], #16
-        stp     x10, x11, [x13], #16
-
-        subs    x9, x9, #16
-        bgt     1b
-
-        /* Flush destination from dcache using:
-         * x9: counter
-         * x10: step
-         * x11: vaddr
-         */
-        dsb   sy        /* So the CPU issues all writes to the range */
-
-        mov   x9, x3
-        ldr   x10, =dcache_line_bytes /* x10 := step */
-        ldr   x10, [x10]
-        mov   x11, x2
-
-1:      dc    cvac, x11
-
-        add   x11, x11, x10
-        subs  x9, x9, x10
-        bgt   1b
-
+ * TODO: This code does not comply with break-before-make.
+ */
+ENTRY(switch_ttbr)
         dsb   sy                     /* Ensure the flushes happen before
                                       * continuing */
         isb                          /* Ensure synchronization with previous
diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index 91f3aef93c..d96a6655ee 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -601,7 +601,7 @@  void __init remove_early_mappings(void)
     flush_xen_data_tlb_range_va(BOOT_FDT_VIRT_START, BOOT_FDT_SLOT_SIZE);
 }
 
-extern void relocate_xen(uint64_t ttbr, void *src, void *dst, size_t len);
+extern void switch_ttbr(uint64_t ttbr);
 
 /* Clear a translation table and clean & invalidate the cache */
 static void clear_table(void *table)
@@ -612,15 +612,13 @@  static void clear_table(void *table)
 
 /* Boot-time pagetable setup.
  * Changes here may need matching changes in head.S */
-void __init setup_pagetables(unsigned long boot_phys_offset, paddr_t xen_paddr)
+void __init setup_pagetables(unsigned long boot_phys_offset)
 {
     uint64_t ttbr;
-    unsigned long dest_va;
     lpae_t pte, *p;
     int i;
 
-    /* Calculate virt-to-phys offset for the new location */
-    phys_offset = xen_paddr - (unsigned long) _start;
+    phys_offset = boot_phys_offset;
 
 #ifdef CONFIG_ARM_64
     p = (void *) xen_pgtable;
@@ -686,21 +684,13 @@  void __init setup_pagetables(unsigned long boot_phys_offset, paddr_t xen_paddr)
     pte = boot_second[second_table_offset(BOOT_FDT_VIRT_START + SZ_2M)];
     xen_second[second_table_offset(BOOT_FDT_VIRT_START + SZ_2M)] = pte;
 
-    /* ... Boot Misc area for xen relocation */
-    dest_va = BOOT_RELOC_VIRT_START;
-    pte = mfn_to_xen_entry(maddr_to_mfn(xen_paddr), MT_NORMAL);
-    /* Map the destination in xen_second. */
-    xen_second[second_table_offset(dest_va)] = pte;
-    /* Map the destination in boot_second. */
-    write_pte(boot_second + second_table_offset(dest_va), pte);
-    flush_xen_data_tlb_range_va_local(dest_va, SECOND_SIZE);
 #ifdef CONFIG_ARM_64
     ttbr = (uintptr_t) xen_pgtable + phys_offset;
 #else
     ttbr = (uintptr_t) cpu0_pgtable + phys_offset;
 #endif
 
-    relocate_xen(ttbr, _start, (void*)dest_va, _end - _start);
+    switch_ttbr(ttbr);
 
     /* Clear the copy of the boot pagetables. Each secondary CPU
      * rebuilds these itself (see head.S) */
diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
index e83221ab79..fb923cdf67 100644
--- a/xen/arch/arm/setup.c
+++ b/xen/arch/arm/setup.c
@@ -374,6 +374,7 @@  void __init discard_initial_modules(void)
     remove_early_mappings();
 }
 
+#ifdef CONFIG_ARM_32
 /*
  * Returns the end address of the highest region in the range s..e
  * with required size and alignment that does not conflict with the
@@ -440,6 +441,7 @@  static paddr_t __init consider_modules(paddr_t s, paddr_t e,
     }
     return e;
 }
+#endif
 
 /*
  * Return the end of the non-module region starting at s. In other
@@ -475,59 +477,6 @@  static paddr_t __init next_module(paddr_t s, paddr_t *end)
     return lowest;
 }
 
-
-/**
- * get_xen_paddr - get physical address to relocate Xen to
- *
- * Xen is relocated to as near to the top of RAM as possible and
- * aligned to a XEN_PADDR_ALIGN boundary.
- */
-static paddr_t __init get_xen_paddr(void)
-{
-    struct meminfo *mi = &bootinfo.mem;
-    paddr_t min_size;
-    paddr_t paddr = 0;
-    int i;
-
-    min_size = (_end - _start + (XEN_PADDR_ALIGN-1)) & ~(XEN_PADDR_ALIGN-1);
-
-    /* Find the highest bank with enough space. */
-    for ( i = 0; i < mi->nr_banks; i++ )
-    {
-        const struct membank *bank = &mi->bank[i];
-        paddr_t s, e;
-
-        if ( bank->size >= min_size )
-        {
-            e = consider_modules(bank->start, bank->start + bank->size,
-                                 min_size, XEN_PADDR_ALIGN, 0);
-            if ( !e )
-                continue;
-
-#ifdef CONFIG_ARM_32
-            /* Xen must be under 4GB */
-            if ( e > 0x100000000ULL )
-                e = 0x100000000ULL;
-            if ( e < bank->start )
-                continue;
-#endif
-
-            s = e - min_size;
-
-            if ( s > paddr )
-                paddr = s;
-        }
-    }
-
-    if ( !paddr )
-        panic("Not enough memory to relocate Xen\n");
-
-    printk("Placing Xen at 0x%"PRIpaddr"-0x%"PRIpaddr"\n",
-           paddr, paddr + min_size);
-
-    return paddr;
-}
-
 static void __init init_pdx(void)
 {
     paddr_t bank_start, bank_size, bank_end;
@@ -783,7 +732,6 @@  void __init start_xen(unsigned long boot_phys_offset,
 {
     size_t fdt_size;
     int cpus, i;
-    paddr_t xen_paddr;
     const char *cmdline;
     struct bootmodule *xen_bootmodule;
     struct domain *dom0;
@@ -827,14 +775,7 @@  void __init start_xen(unsigned long boot_phys_offset,
                              (paddr_t)(uintptr_t)(_end - _start + 1), false);
     BUG_ON(!xen_bootmodule);
 
-    xen_paddr = get_xen_paddr();
-    setup_pagetables(boot_phys_offset, xen_paddr);
-
-    /* Update Xen's address now that we have relocated. */
-    printk("Update BOOTMOD_XEN from %"PRIpaddr"-%"PRIpaddr" => %"PRIpaddr"-%"PRIpaddr"\n",
-           xen_bootmodule->start, xen_bootmodule->start + xen_bootmodule->size,
-           xen_paddr, xen_paddr + xen_bootmodule->size);
-    xen_bootmodule->start = xen_paddr;
+    setup_pagetables(boot_phys_offset);
 
     setup_mm(fdt_paddr, fdt_size);
 
diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
index b2f6104a7f..eafa26f56e 100644
--- a/xen/include/asm-arm/mm.h
+++ b/xen/include/asm-arm/mm.h
@@ -169,7 +169,7 @@  extern unsigned long total_pages;
 #define PDX_GROUP_SHIFT SECOND_SHIFT
 
 /* Boot-time pagetable setup */
-extern void setup_pagetables(unsigned long boot_phys_offset, paddr_t xen_paddr);
+extern void setup_pagetables(unsigned long boot_phys_offset);
 /* Map FDT in boot pagetable */
 extern void *early_fdt_map(paddr_t fdt_paddr);
 /* Remove early mappings */