
[Xen-devel] Arm boot regression with Xen 4.12

Message ID a1f50f9c-1f14-9816-88f2-2a11dffdf6db@arm.com
State New

Commit Message

Julien Grall March 19, 2019, 10:33 a.m. UTC
(+ Juergen)

Hi Amit,

On 3/18/19 3:12 PM, Amit Tomer wrote:
>> It will be difficult to help without any log. You probably want to try with
>> Stefano series instead. However ...
> 
> If we comment out the GPU node (gpu@38000000), we don't see this issue and
> the Dom0 kernel is loaded into memory, but we see the following crash:
> 
> Starting kernel ...
> 
> - UART enabled -
> - CPU 00000000 booting -
> - Current EL 00000008 -
> - Xen starting at EL2 -
> - Zero BSS -
> - Setting up control registers -
> - Turning on paging -
> - Ready -
> (XEN) Checking for initrd in /chosen
> (XEN) RAM: 0000000040000000 - 00000000bfffffff
> (XEN)
> (XEN) MODULE[0]: 00000000be511000 - 00000000be51d000 Device Tree
> (XEN) MODULE[1]: 0000000040480000 - 0000000042680000 Kernel
> (XEN)  RESVD[0]: 0000000043000000 - 000000004300c000
> (XEN)  RESVD[1]: 00000000be511000 - 00000000be51d000

[...]

> (XEN) *** Serial input to DOM0 (type 'CTRL-a' three times to switch input)
> (XEN) Data Abort Trap. Syndrome=0x6
> (XEN) Walking Hypervisor VA 0x8 on CPU0 via TTBR 0x0000000042114000
> (XEN) 0TH[0x0] = 0x0000000042113f7f
> (XEN) 1ST[0x0] = 0x0000000042110f7f
> (XEN) 2ND[0x0] = 0x0000000000000000
> (XEN) CPU0: Unexpected Trap: Data Abort
> (XEN) ----[ Xen-4.12.0-rc  arm64  debug=y   Not tainted ]----
> (XEN) CPU:    0
> (XEN) PC:     000000000021c220 page_alloc.c#free_heap_pages+0x3b0/0x58c

[...]

> (XEN) Xen call trace:
> (XEN)    [<000000000021c220>] page_alloc.c#free_heap_pages+0x3b0/0x58c (PC)
> (XEN)    [<000000000021c20c>] page_alloc.c#free_heap_pages+0x39c/0x58c (LR)
> (XEN)    [<000000000021e5f4>] page_alloc.c#init_heap_pages+0x334/0x4ec
> (XEN)    [<000000000021e840>] init_domheap_pages+0x94/0x9c
> (XEN)    [<000000000024e178>] free_init_memory+0xac/0xe0
> (XEN)    [<0000000000252580>] setup.c#init_done+0x14/0x20
> (XEN)    [<000000000029daa8>] 000000000029daa8
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 0:
> (XEN) CPU0: Unexpected Trap: Data Abort
> (XEN) ****************************************
> (XEN)
> (XEN) Reboot in five seconds...

Could you give the patch below a try?


Now the long answer.

Unfortunately, in a recent patch, I removed the log message telling where
Xen lives in memory, so I am not 100% sure this is your problem.

From my own testing, I think the problem is that Xen will try to hand reserved
memory (the old-fashioned /memreserve/ and not /reserved-memory) to the
allocator. This happens when freeing the init regions (see free_init_memory).

We do correctly handle all the other modules (see discard_initial_modules).

On my setup this does not crash Xen; instead it happily hands the pages to
the allocator, which is not good. The difference in behavior may be due to
how the PDX is set up (I need to investigate that). So by luck, I have
a struct page_info backing the reserved-memory region. This does not
mean it is better :).
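
To illustrate what "not handing reserved memory to the allocator" means, here is a minimal standalone sketch. It is not the Xen implementation: the names for_each_unreserved, struct resv and collect are hypothetical, and it only models the general idea of splitting a physical range around reserved entries and invoking a callback on what is left.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t paddr_t;

/* Hypothetical reserved range [start, end), standing in for a /memreserve/ entry. */
struct resv { paddr_t start, end; };

typedef void (*range_cb)(paddr_t s, paddr_t e, void *ctx);

/* Call cb on every part of [s, e) that does not intersect resv[first..n). */
static void for_each_unreserved(paddr_t s, paddr_t e,
                                const struct resv *resv, size_t n,
                                size_t first, range_cb cb, void *ctx)
{
    for (size_t i = first; i < n; i++) {
        paddr_t r_s = resv[i].start, r_e = resv[i].end;

        if (r_s >= e || r_e <= s)
            continue;                 /* no overlap with this entry */

        /* Recurse on the pieces to the left and right of the entry. */
        if (r_s > s)
            for_each_unreserved(s, r_s, resv, n, i + 1, cb, ctx);
        if (r_e < e)
            for_each_unreserved(r_e, e, resv, n, i + 1, cb, ctx);
        return;
    }

    if (s < e)
        cb(s, e, ctx);                /* nothing reserved in [s, e) */
}

/* Example callback: record each range that would go to the allocator. */
struct collected { paddr_t s[8], e[8]; size_t count; };

static void collect(paddr_t s, paddr_t e, void *ctx)
{
    struct collected *c = ctx;

    if (c->count < 8) {
        c->s[c->count] = s;
        c->e[c->count] = e;
        c->count++;
    }
}
```

With a single reserved entry at 0x43000000 - 0x4300c000 (RESVD[0] in the log above) and a made-up range straddling it, the callback is invoked only for the parts before and after the reserved entry.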

This regression was introduced by commit f60658c6ae "xen/arm: Stop
relocating Xen". Beforehand, Xen was always relocated, so the original
copy of Xen was left untouched. The relocated version would always live in
a non-reserved area.

On my setup, Xen was not in the reserved region by default. I had
to modify the Device-Tree. I don't know how many platforms put
Xen in a /memreserve/ region. Amit, assuming the patch works for you,
could you tell me who created the /memreserve/?

Cheers,

Comments

Amit Tomer March 19, 2019, 2:01 p.m. UTC | #1
Hi,

> Could you give a try to the below patch?
>
> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> index 01ae2cccc0..2c34138bbd 100644
> --- a/xen/arch/arm/mm.c
> +++ b/xen/arch/arm/mm.c
> @@ -1139,7 +1139,7 @@ void free_init_memory(void)
>          *(p + i) = insn;
>
>      set_pte_flags_on_range(__init_begin, len, mg_clear);
> -    init_domheap_pages(pa, pa + len);
> +    dt_unreserved_regions(pa, pa + len, init_domheap_pages, 0);
>      printk("Freed %ldkB init memory.\n", (long)(__init_end-__init_begin)>>10);
>  }
>
> diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
> index 444857a967..8dbc4f819b 100644
> --- a/xen/arch/arm/setup.c
> +++ b/xen/arch/arm/setup.c
> @@ -764,18 +764,18 @@ void __init start_xen(unsigned long boot_phys_offset,
>                "Please check your bootloader.\n",
>                fdt_paddr);
>
> -    fdt_size = boot_fdt_info(device_tree_flattened, fdt_paddr);
> -
> -    cmdline = boot_fdt_cmdline(device_tree_flattened);
> -    printk("Command line: %s\n", cmdline);
> -    cmdline_parse(cmdline);
> -
>      /* Register Xen's load address as a boot module. */
>      xen_bootmodule = add_boot_module(BOOTMOD_XEN,
>                               (paddr_t)(uintptr_t)(_start + boot_phys_offset),
>                               (paddr_t)(uintptr_t)(_end - _start + 1), false);
>      BUG_ON(!xen_bootmodule);
>
> +    fdt_size = boot_fdt_info(device_tree_flattened, fdt_paddr);
> +
> +    cmdline = boot_fdt_cmdline(device_tree_flattened);
> +    printk("Command line: %s\n", cmdline);
> +    cmdline_parse(cmdline);
> +
>      setup_pagetables(boot_phys_offset);
>
>      setup_mm(fdt_paddr, fdt_size);
>


I tried the above patch but still see the same crash:

Starting kernel ...

- UART enabled -
- CPU 00000000 booting -
- Current EL 00000008 -
- Xen starting at EL2 -
- Zero BSS -
- Setting up control registers -
- Turning on paging -
- Ready -
(XEN) Checking for initrd in /chosen
(XEN) RAM: 0000000040000000 - 00000000bfffffff
(XEN)
(XEN) MODULE[0]: 0000000042000000 - 0000000042120d81 Xen
(XEN) MODULE[1]: 00000000be510000 - 00000000be51d000 Device Tree
(XEN) MODULE[2]: 0000000040480000 - 0000000042680000 Kernel
(XEN)  RESVD[0]: 0000000043000000 - 000000004300d000
(XEN)  RESVD[1]: 00000000be510000 - 00000000be51d000
(XEN)
(XEN)
(XEN) Command line: console=dtuart dom0_mem=1024M
(XEN) Domain heap initialised
(XEN) Booting using Device Tree
(XEN) Platform: Generic System
(XEN) Taking dtuart configuration from /chosen/stdout-path
(XEN) Looking for dtuart at "/serial@30860000", options ""
(XEN) Unable to initialize dtuart: -9
(XEN) Bad console= option 'dtuart'
 Xen 4.13-unstable
(XEN) Xen version 4.13-unstable (atomar@) (aarch64-linux-gnu-gcc
(Linaro GCC 7.3-2018.05) 7.3.1 20180425 [linaro-7.3-2018.05 revision
d29120a424ecfbc167ef90065c0eeb7f91977701]) debug=y  Tue Mar 19 19:14:9
(XEN) Latest ChangeSet: Tue Feb 12 18:33:30 2019 +0000 git:1e780ef-dirty
(XEN) Processor: 410fd034: "ARM Limited", variant: 0x0, part 0xd03, rev 0x4
(XEN) 64-bit Execution:
(XEN)   Processor Features: 0000000001002222 0000000000000000
(XEN)     Exception Levels: EL3:64+32 EL2:64+32 EL1:64+32 EL0:64+32
(XEN)     Extensions: FloatingPoint AdvancedSIMD GICv3-SysReg
(XEN)   Debug Features: 0000000010305106 0000000000000000
(XEN)   Auxiliary Features: 0000000000000000 0000000000000000
(XEN)   Memory Model Features: 0000000000001122 0000000000000000
(XEN)   ISA Features:  0000000000011120 0000000000000000
(XEN) 32-bit Execution:
(XEN)   Processor Features: 00000131:10011011
(XEN)     Instruction Sets: AArch32 A32 Thumb Thumb-2 Jazelle
(XEN)     Extensions: GenericTimer Security
(XEN)   Debug Features: 03010066
(XEN)   Auxiliary Features: 00000000
(XEN)   Memory Model Features: 10201105 40000000 01260000 02102211
(XEN)  ISA Features: 02101110 13112111 21232042 01112131 00011142 00011121
(XEN) Using SMC Calling Convention v1.0
(XEN) Using PSCI v1.0
(XEN) SMP: Allowing 4 CPUs
(XEN) Generic Timer IRQ: phys=30 hyp=26 virt=27 Freq: 8333 KHz
(XEN) GICv3 initialization:
(XEN)       gic_dist_addr=0x00000038800000
(XEN)       gic_maintenance_irq=25
(XEN)       gic_rdist_stride=0
(XEN)       gic_rdist_regions=1
(XEN)       redistributor regions:
(XEN)         - region 0: 0x00000038880000 - 0x00000038940000
(XEN) GICv3: 160 lines, (IID 0001143b).
(XEN) GICv3: CPU0: Found redistributor in region 0 @000000004001a000
(XEN) Using scheduler: SMP Credit Scheduler rev2 (credit2)
(XEN) Initializing Credit2 scheduler
(XEN)  load_precision_shift: 18
(XEN)  load_window_shift: 30
(XEN)  underload_balance_tolerance: 0
(XEN)  overload_balance_tolerance: -3
(XEN)  runqueues arrangement: socket
(XEN)  cap enforcement granularity: 10ms
(XEN) load tracking window length 1073741824 ns
(XEN) Adding cpu 0 to runqueue 0
(XEN)  First cpu on runqueue, activating
(XEN) Allocated console ring of 32 KiB.
(XEN) Bringing up CPU1
- CPU 00000001 booting -
- Current EL 00000008 -
- Xen starting at EL2 -
- Setting up control registers -
- Turning on paging -
- Ready -
(XEN) GICv3: CPU1: Found redistributor in region 0 @000000004003a000
(XEN) Adding cpu 1 to runqueue 0
(XEN) CPU 1 booted.
(XEN) Bringing up CPU2
- CPU 00000002 booting -
- Current EL 00000008 -
- Xen starting at EL2 -
- Setting up control registers -
- Turning on paging -
- Ready -
(XEN) GICv3: CPU2: Found redistributor in region 0 @000000004005a000
(XEN) Adding cpu 2 to runqueue 0
(XEN) CPU 2 booted.
(XEN) Bringing up CPU3
- CPU 00000003 booting -
- Current EL 00000008 -
- Xen starting at EL2 -
- Setting up control registers -
- Turning on paging -
- Ready -
(XEN) GICv3: CPU3: Found redistributor in region 0 @000000004007a000
(XEN) Adding cpu 3 to runqueue 0
(XEN) CPU 3 booted.
(XEN) Brought up 4 CPUs
(XEN) P2M: 40-bit IPA with 40-bit PA and 8-bit VMID
(XEN) P2M: 3 levels with order-1 root, VTCR 0x80023558
(XEN) I/O virtualisation disabled
(XEN) build-id: 0e92db8580ada91ca578019e95e002da80d47cdb
(XEN) alternatives: Patching with alt table 00000000002abbf0 -> 00000000002ac238
(XEN) *** LOADING DOMAIN 0 ***
(XEN) Loading Domd0 kernel from boot module @ 0000000040480000
(XEN) Allocating 1:1 mappings totalling 1024MB for dom0:
(XEN) BANK[0] 0x00000060000000-0x000000a0000000 (1024MB)
(XEN) Grant table range: 0x00000042000000-0x00000042040000
(XEN) Allocating PPI 16 for event channel interrupt
(XEN) Loading zImage from 0000000040480000 to 0000000060080000-0000000062280000
(XEN) Loading dom0 DTB to 0x0000000068000000-0x000000006800bab2
(XEN) Initial low memory virq threshold set at 0x4000 pages.
(XEN) Scrubbing Free RAM in background
(XEN) Std. Loglevel: All
(XEN) Guest Loglevel: All
(XEN) *** Serial input to DOM0 (type 'CTRL-a' three times to switch input)
(XEN) Data Abort Trap. Syndrome=0x6
(XEN) Walking Hypervisor VA 0x8 on CPU0 via TTBR 0x0000000042114000
(XEN) 0TH[0x0] = 0x0000000042113f7f
(XEN) 1ST[0x0] = 0x0000000042110f7f
(XEN) 2ND[0x0] = 0x0000000000000000
(XEN) CPU0: Unexpected Trap: Data Abort
(XEN) ----[ Xen-4.13-unstable  arm64  debug=y   Not tainted ]----
(XEN) CPU:    0
(XEN) PC:     000000000021c30c page_alloc.c#free_heap_pages+0x4ac/0x58c
(XEN) LR:     000000000021c308
(XEN) SP:     000080007ffefd30
(XEN) CPSR:   80000249 MODE:64-bit EL2h (Hypervisor, handler)
(XEN)      X0: 0000000000000000  X1: 0000000000000000  X2: 0000000000288db0
(XEN)      X3: ffffffffffffffff  X4: 0000000000040000  X5: 0000000000000000
(XEN)      X6: 0000000000000001  X7: 0180000000000000  X8: 0080000000000000
(XEN)      X9: 0000000000288da0 X10: 0000000000000000 X11: 0000000047ffffff
(XEN)     X12: 0000000000288000 X13: 0000000000288000 X14: 0000000000289000
(XEN)     X15: 6db6db6db6db6db7 X16: fffffff800000000 X17: 3d3d3d3d3d3d3d3d
(XEN)     X18: 0000000000289000 X19: 0000000000000000 X20: 0000000800073f38
(XEN)     X21: 0000000000000000 X22: 6db6db6db6db6db7 X23: 0000000000000013
(XEN)     X24: 0000000000288da0 X25: 0000000000289000 X26: 0000000000200200
(XEN)     X27: 0000000000100100 X28: 0000000800073f00  FP: 000080007ffefd30
(XEN)
(XEN)   VTCR_EL2: 80023558
(XEN)  VTTBR_EL2: 0000000000000000
(XEN)
(XEN)  SCTLR_EL2: 30cd183d
(XEN)    HCR_EL2: 000000000000003a
(XEN)  TTBR0_EL2: 0000000042114000
(XEN)
(XEN)    ESR_EL2: 96000006
(XEN)  HPFAR_EL2: 00000000fdce5ac0
(XEN)    FAR_EL2: 0000000000000008
(XEN)
(XEN) Xen stack trace from sp=000080007ffefd30:
(XEN)    000080007ffefd90 000000000021e5e4 00000000002e0d40 0000000800073f38
(XEN)    0000000000001ca1 0000000000002200 0000000000288d78 000000080000fc00
(XEN)    0000000000000000 0000000040000000 00000000c0000000 6db6db6db6db6db7
(XEN)    000080007ffefe00 000000000021e830 0000000000000002 00000000002b83c0
(XEN)    0000000000288da0 0000000040480000 0000000042680000 000000000021e79c
(XEN)    00000000002b83d0 0000000040000000 00000000c0000000 0000000000000001
(XEN)    0000000000000001 00000000002e0d90 000080007ffefe10 000000000029c938
(XEN)    000080007ffefe60 000000000029cef4 0000000000000002 00000000002b83c0
(XEN)    0000000000288da0 000000000021e79c 0000000042680000 0000000040480000
(XEN)    00000000be510000 00000000be51d000 000080007ffefea0 0000000000252588
(XEN)    0000000000287000 0000000000000004 0000000000287c80 000000000031a430
(XEN)    0000000000000004 00000000002836a0 00000000002d7de0 000000000029daac
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000
(XEN) Xen call trace:
(XEN)    [<000000000021c30c>] page_alloc.c#free_heap_pages+0x4ac/0x58c (PC)
(XEN)    [<000000000021c308>] page_alloc.c#free_heap_pages+0x4a8/0x58c (LR)
(XEN)    [<000000000021e5e4>] page_alloc.c#init_heap_pages+0x334/0x4ec
(XEN)    [<000000000021e830>] init_domheap_pages+0x94/0x9c
(XEN)    [<000000000029c938>] 000000000029c938
(XEN)    [<000000000029cef4>] 000000000029cef4
(XEN)    [<0000000000252588>] setup.c#init_done+0x10/0x20
(XEN)    [<000000000029daac>] 000000000029daac
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) CPU0: Unexpected Trap: Data Abort
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...


On the other hand, this crash does not occur on 4.11.

Thanks
-Amit
Julien Grall March 19, 2019, 3:05 p.m. UTC | #2
On 19/03/2019 14:01, Amit Tomer wrote:
> I tried the above patch but still see the same crash:

[..]

> 
> On the other hand, this crash does not occur on 4.11.

That's good to know. Can you bisect Xen and see if you can pinpoint a specific
commit?

Cheers,
Julien Grall March 22, 2019, 10:48 a.m. UTC | #3
Hi Amit,

On 19/03/2019 15:05, Julien Grall wrote:
> 
> 
> On 19/03/2019 14:01, Amit Tomer wrote:
>> I tried the above patch but still see the same crash:
> 
> [..]
> 
>>
>> On the other hand, this crash does not occur on 4.11.
> 
> That's good to know. Can you bisect Xen and see if you can pinpoint a specific
> commit?

I am wondering if you had time to bisect the issue?

We are releasing Xen 4.12 soon so we need to make the decision whether this 
should be fixed after (and backport it) or before.

Cheers,
Amit Tomer March 22, 2019, 11:35 a.m. UTC | #4
Hi,


> I am wondering if you had time to bisect the issue?
>
> We are releasing Xen 4.12 soon so we need to make the decision whether this
> should be fixed after (and backport it) or before.

I am planning to look at it this weekend and will let you know if I find
something.

Thanks
-Amit
Amit Tomer April 17, 2019, 8:02 a.m. UTC | #5
Hello,

On Fri, Mar 22, 2019 at 5:05 PM Amit Tomer <amittomer25@gmail.com> wrote:
>
> Hi,
>
>
> > I am wondering if you had time to bisect the issue?
> >
> > We are releasing Xen 4.12 soon so we need to make the decision whether this
> > should be fixed after (and backport it) or before.

I don't see this crash if I choose different load addresses (loading the
kernel at a 2 MB aligned address).
For instance, using

"setenv xen_addr_r 0x40480000 ;setenv fdt_addr_r 0x43000000;setenv
kernel_addr_r 0x41000000"
 instead of
"setenv xen_addr_r 0x42000000 ;setenv fdt_addr_r 0x43000000;setenv
kernel_addr_r 0x40480000"

allows it to boot properly (tested with 4.13).
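
As a quick sanity check on the two address sets, the sketch below tests 2 MB alignment and range overlap. The helper names are hypothetical, and the module ranges are taken from the MODULE[] lines of the failing boot log. Whether the overlap it reports is what triggers the crash is not established in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

#define SZ_2M (UINT64_C(2) << 20)

/* True when addr is a multiple of 2 MB. */
static bool is_2m_aligned(uint64_t addr)
{
    return (addr & (SZ_2M - 1)) == 0;
}

/* True when the half-open ranges [a_s, a_e) and [b_s, b_e) share any byte. */
static bool ranges_overlap(uint64_t a_s, uint64_t a_e,
                           uint64_t b_s, uint64_t b_e)
{
    return a_s < b_e && b_s < a_e;
}
```

By this check, 0x40480000 is not 2 MB aligned while 0x41000000 and 0x42000000 are. In the failing log the Kernel module (0x40480000 - 0x42680000) also covers the Xen module (0x42000000 - 0x42120d81).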

Thanks
-Amit
Julien Grall April 17, 2019, 9:22 a.m. UTC | #6
On 17/04/2019 09:02, Amit Tomer wrote:
> Hello,

Hi,

> On Fri, Mar 22, 2019 at 5:05 PM Amit Tomer <amittomer25@gmail.com> wrote:
>>
>> Hi,
>>
>>
>>> I am wondering if you had time to bisect the issue?
>>>
>>> We are releasing Xen 4.12 soon so we need to make the decision whether this
>>> should be fixed after (and backport it) or before.
> 
> I don't see this crash if I choose different load addresses (loading the
> kernel at a 2 MB aligned address).
> For instance, using
> 
> "setenv xen_addr_r 0x40480000 ;setenv fdt_addr_r 0x43000000;setenv
> kernel_addr_r 0x41000000"
>   instead of
> "setenv xen_addr_r 0x42000000 ;setenv fdt_addr_r 0x43000000;setenv
> kernel_addr_r 0x40480000"

Glad to hear that you managed to boot Xen with a different load address. I would
still like to understand what the exact issue is, as this may be a bug in Xen.

On IRC, you mentioned that the offending commit is f60658c6ae "xen/arm: Stop
relocating Xen". You also pointed out that init_heap_pages was called with a
NULL page.

Did you find out why it is NULL? Which MFN was looked up?
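
For reference, the register dump is consistent with a NULL dereference: FAR_EL2 is 0x8, which is what a read of a field at offset 8 from a NULL pointer would produce (an inference, not something confirmed in the thread). The sketch below decodes the syndrome and the table indices from the log. The helper names are made up for illustration; the bit positions follow the Armv8-A layout of ESR_ELx and a 4KB translation granule.

```c
#include <stdint.h>

/* Exception class: ESR_ELx bits [31:26]. */
static unsigned int esr_ec(uint32_t esr)
{
    return (esr >> 26) & 0x3f;
}

/* Data fault status code: ISS bits [5:0] of a data abort. */
static unsigned int esr_dfsc(uint32_t esr)
{
    return esr & 0x3f;
}

/*
 * Index into the translation table at the given level (0..3) for a
 * 4KB granule: 9 bits per level, level 3 starting at bit 12.
 */
static unsigned int pt_index(uint64_t va, unsigned int level)
{
    return (va >> (39 - 9 * level)) & 0x1ff;
}
```

For ESR_EL2 = 0x96000006 this gives EC = 0x25 (data abort taken without a change in exception level) and DFSC = 0x06 (translation fault, level 2), matching the zero 2ND[0x0] entry in the walk. VA 0x8 indexes entry 0 at every level.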

Cheers,

Patch

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index 01ae2cccc0..2c34138bbd 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -1139,7 +1139,7 @@  void free_init_memory(void)
         *(p + i) = insn;
 
     set_pte_flags_on_range(__init_begin, len, mg_clear);
-    init_domheap_pages(pa, pa + len);
+    dt_unreserved_regions(pa, pa + len, init_domheap_pages, 0);
     printk("Freed %ldkB init memory.\n", (long)(__init_end-__init_begin)>>10);
 }

diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
index 444857a967..8dbc4f819b 100644
--- a/xen/arch/arm/setup.c
+++ b/xen/arch/arm/setup.c
@@ -764,18 +764,18 @@  void __init start_xen(unsigned long boot_phys_offset,
               "Please check your bootloader.\n",
               fdt_paddr);
 
-    fdt_size = boot_fdt_info(device_tree_flattened, fdt_paddr);
-
-    cmdline = boot_fdt_cmdline(device_tree_flattened);
-    printk("Command line: %s\n", cmdline);
-    cmdline_parse(cmdline);
-
     /* Register Xen's load address as a boot module. */
     xen_bootmodule = add_boot_module(BOOTMOD_XEN,
                              (paddr_t)(uintptr_t)(_start + boot_phys_offset),
                              (paddr_t)(uintptr_t)(_end - _start + 1), false);
     BUG_ON(!xen_bootmodule);
 
+    fdt_size = boot_fdt_info(device_tree_flattened, fdt_paddr);
+
+    cmdline = boot_fdt_cmdline(device_tree_flattened);
+    printk("Command line: %s\n", cmdline);
+    cmdline_parse(cmdline);
+
     setup_pagetables(boot_phys_offset);
 
     setup_mm(fdt_paddr, fdt_size);