Message ID | f8e67f82-103d-156c-deb0-d6d6e2756f5e@microchip.com |
---|---|
State | New |
Headers | show |
Series | RISC-V reserved memory problems | expand |
Hi Conor, Sorry for the delay, somehow this slipped between the cracks. On Thu, Feb 02, 2023 at 10:01:26PM +0000, Conor Dooley wrote: > Hullo Palmer, Mike & whoever else may read this, > > Just reviving this thread from a little while ago as I have been in the > area again recently... TBH, I didn't really dig deep into the issues, but the thought I had was what if DT was mapped via fixmap until the setup_vm_final() and then it would be possible to call DT methods early. Could be I'm shooting in the dark :) > On Tue, Aug 16, 2022 at 08:41:05PM +0000, Conor.Dooley@microchip.com wrote: > > Hey all, > > We've run into a bit of a problem with reserved memory on PolarFire, or > > more accurately a pair of problems that seem to have opposite fixes. > > > > The first of these problems is triggered when trying to implement a > > remoteproc driver. To get the reserved memory buffer, remoteproc > > does an of_reserved_mem_lookup(), something like: > > > > np = of_parse_phandle(pdev->of_node, "memory-region", 0); > > if (!np) > > return -EINVAL; > > > > rmem = of_reserved_mem_lookup(np); > > if (!rmem) > > return -EINVAL; > > > > of_reserved_mem_lookup() then uses reserved_mem[i].name to try and find > > a match - but this was triggering kernel panics for us. We did some > > debugging and found that the name string's pointer was pointing to an > > address in the 0x4000_0000 range. The minimum reproduction for this > > crash is attached - it hacks in some print_reserved_mem()s into > > setup_vm_final() around a tlb flush so you can see the before/after. > > (You'll need a reserved memory node in your dts to replicate) > > > > The output is like so, with the same crash as in the remoteproc driver: > > > > [ 0.000000] Linux version 6.0.0-rc1-00001-g0d9d6953d834 (conor@wendy) (riscv64-unknown-linux-gnu-gcc (g5964b5cd727) 11.1.0, GNU ld (GNU Binutils) 2.37) #1 SMP Tue Aug 16 13:42:09 IST 2022 > > [...] > > > [ 0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]--- > > > > We traced this back to early_init_fdt_scan_reserved_mem() in > > setup_bootmem() - moving it later back up the boot sequence to > > after the dt has been remapped etc has fixed the problem for us. > > > > The least movement to get it working is attached, and also pushed > > here: git.kernel.org/conor/c/1735589baefc > > This one is fixed now, as of commit 50e63dd8ed92 ("riscv: fix reserved > memory setup"). > > > The second problem is a bit more complicated to explain - but we > > found the solution conflicted with the remoteproc fix as we had > > to move early_init_fdt_scan_reserved_mem() _earlier_ in the boot > > process to solve this one. > > > > We want to have a node in our devicetree that contains some memory > > that is non-cached & marked as reserved-memory. Maybe we have just > > missed something, but from what we've seen: > > - the really early setup looks at the dtb, picks the highest bit > > of memory and puts the dtb etc there so it can start using it > > - early_init_fdt_scan_reserved_mem() is then called, which figures > > out if memory is reserved or not. > > > > Unfortunately, the highest bit of memory is the non-cached bit so > > everything falls over, but we can avoid this by moving the call to > > early_init_fdt_scan_reserved_mem() above the dtb memblock alloc that > > takes place right before it in setup_bootmem(). > > > > Obviously, both of these changes are moving the function call in > > opposite directions and we can only really do one of them. We are not > > sure if what we are doing with the non-cached reserved-memory section > > is just not permitted & cannot work - or if this is something that > > was overlooked for RISC-V specifically and works for other archs. > > We ended up working around this one by making sure that U-Boot loaded > the dtb to somewhere that would be inside the kernel's memory map, thus > avoiding the remapping in the first place. > > We did run into another problem recently though, and 50e63dd8ed92 is > kinda at fault for it. > This particular issue was encountered with a devicetree where the > top-most memory region was entirely reserved & was not observed prior > to my fix for the first issue. > > On RISC-V, the boot sequence is something like: > setup_bootmem(); > setup_vm_final(); > unflatten_device_tree(); > early_init_fdt_scan_reserved_mem(); > > Whereas, before my patch it used to be (give-or-take): > setup_bootmem(); > early_init_fdt_scan_reserved_mem(); > setup_vm_final(); > unflatten_device_tree(); > > The difference being that we used to have scanned the reserved memory > regions before calling setup_vm_final() & therefore know which regions > we cannot use. As a reminder, calling early_init_fdt_scan_reserved_mem() > before we've got the dt in a proper virtual memory address will cause > the kernel to panic if it tries to read a reserved memory node's label. > > As we are now calling setup_vm_final() *before* we know what the > reserved memory regions are & as RISC-V allocates memblocks from the top > down, the allocations in setup_vm_final() will be done in the highest > memory region. > When early_init_fdt_scan_reserved_mem() then tries to reserve the > entirety of that top-most memory region, the reservation fails as part > of this region has already been allocated. > In the scenario where I found this bug, that top-most region is non- > cached memory & the kernel ends up panicking. > The memblock debug code made this pretty easy to spot, otherwise I'd > probably have spent more than just a few hours trying to figure out why > it was panicking! > > My "this needs to be fixed today" solution for this problem was calling > memblock_set_bottom_up(true) in setup_bootmem() & that's what we are > going to carry downstream for now. > > I haven't tested it (yet) but I suspect that it would also fix our > problem of the dtb being remapped into a non-cached region of memory > that we would later go on to reserve too. Non-cached being an issue > mainly due to the panicking, but failing to reserve (and using!) memory > regions that are meant to be reserved is very far from ideal even when > they are memory that the kernel can actually use. > > I have no idea if that is an acceptable solution for upstream though, so > I guess this is me putting out feelers as to whether this is something I > should send a patch to do *OR* if this is another sign of the issues > that you (Mike, Palmer) mentioned in the past. > If it isn't an acceptable solution, I'm not really too sure how to > proceed! > > Cheers, > Conor. >
On Thu, Mar 09, 2023 at 04:12:57PM +0100, Alexandre Ghiti wrote: > On 3/9/23 13:51, Conor Dooley wrote: > > On Thu, Mar 09, 2023 at 01:45:05PM +0100, Alexandre Ghiti wrote: > > > On 3/7/23 12:35, Mike Rapoport wrote: > > > > Hi Conor, > > > > > > > > Sorry for the delay, somehow this slipped between the cracks. > > > > > > > > On Thu, Feb 02, 2023 at 10:01:26PM +0000, Conor Dooley wrote: > > > > > Hullo Palmer, Mike & whoever else may read this, > > > > > > > > > > Just reviving this thread from a little while ago as I have been in the > > > > > area again recently... > > > > TBH, I didn't really dig deep into the issues, but the thought I had was > > > > what if DT was mapped via fixmap until the setup_vm_final() and then it > > > > would be possible to call DT methods early. > > > > > > > > Could be I'm shooting in the dark :) > > > > > > I think I understand the issue now, it's because In riscv, we establish 2 > > > different virtual mappings and we map the device tree at 2 different virtual > > > addresses, which is the problem. > > > > > > So to me, the solution is: > > > > > > - to revert your previous fix, that is calling > > > early_init_fdt_scan_reserved_mem() before any call to memblock_alloc() > > > (which could result in an allocation in the area you want to reserve) > > > > > > - to map the device tree at the same virtual address, because > > > early_init_fdt_scan_reserved_mem() initializes reserved_mem with the dtb > > > mapping established in setup_vm() and uses reserved_mem with the new mapping > > > from setup_vm_final (which is what Mike proposes, we should use the fixmap > > > region to have the same virtual addresses) > > > > > > Hope that makes sense: I'll come up with something this afternoon for you to > > > test! > > Sounds good. Please give me some ELI5 commit messages if you can, > > explanations for this stuff (which I found took a lot of archaeology to > > understand) would be very welcome next time we need to go back looking > > at this stuff. > > > Can you give it a try here: > https://github.com/AlexGhiti/riscv-linux/commits/dev/alex/conor_dtb_fixmap_v1 > ? > > That works for me but I need to carefully explain and check that's correct > though, not upstreamable as is. Hey Alex, So I ended up being pretty sick & had to take a week off. I gave this an initial spin today & it appears to work. I'll take it for a longer test-drive when you send a "real" patch for it, but I tested both the lookup by name & the situation that was allocating in reserved memory and both were not an issue. Thanks for working on this, Conor.
From 14447dc618a3007bf17dd27c03f7fec095efbc6f Mon Sep 17 00:00:00 2001 From: Valentina Fernandez <valentina.fernandezalanis@microchip.com> Date: Wed, 13 Jul 2022 10:56:47 +0100 Subject: [PATCH] Debug tlb_flush with reserved memory --- arch/riscv/boot/dts/microchip/Makefile | 1 + .../microchip/mpfs-icicle-kit-context-a.dts | 203 ++++++++++++++++++ arch/riscv/mm/init.c | 9 +- drivers/of/of_reserved_mem.c | 6 + include/linux/of_reserved_mem.h | 2 + 5 files changed, 220 insertions(+), 1 deletion(-) create mode 100644 arch/riscv/boot/dts/microchip/mpfs-icicle-kit-context-a.dts diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c index d466ec670e1f..5458519c3eb4 100644 --- a/arch/riscv/mm/init.c +++ b/arch/riscv/mm/init.c @@ -20,7 +20,7 @@ #include <linux/dma-map-ops.h> #include <linux/crash_dump.h> #include <linux/hugetlb.h> - +#include <linux/of_reserved_mem.h> #include <asm/fixmap.h> #include <asm/tlbflush.h> #include <asm/sections.h> @@ -1112,8 +1112,15 @@ static void __init setup_vm_final(void) /* Move to swapper page table */ csr_write(CSR_SATP, PFN_DOWN(__pa_symbol(swapper_pg_dir)) | satp_mode); + + pr_err("before flush\n"); + print_reserved_mem(); + local_flush_tlb_all(); + pr_err("after flush\n"); + print_reserved_mem(); + pt_ops_set_late(); } #else diff --git a/drivers/of/of_reserved_mem.c b/drivers/of/of_reserved_mem.c index 75caa6f5d36f..6aaebb05730d 100644 --- a/drivers/of/of_reserved_mem.c +++ b/drivers/of/of_reserved_mem.c @@ -445,3 +445,9 @@ struct reserved_mem *of_reserved_mem_lookup(struct device_node *np) return NULL; } EXPORT_SYMBOL_GPL(of_reserved_mem_lookup); + +void print_reserved_mem(void) +{ + pr_err("debug name is %s\n", reserved_mem[0].name); +} +EXPORT_SYMBOL_GPL(print_reserved_mem); \ No newline at end of file diff --git a/include/linux/of_reserved_mem.h b/include/linux/of_reserved_mem.h index 4de2a24cadc9..1fe504af3944 100644 --- a/include/linux/of_reserved_mem.h +++ b/include/linux/of_reserved_mem.h @@ -40,6 +40,8 @@ int of_reserved_mem_device_init_by_name(struct device *dev, void of_reserved_mem_device_release(struct device *dev); struct reserved_mem *of_reserved_mem_lookup(struct device_node *np); + +void print_reserved_mem(void); #else #define RESERVEDMEM_OF_DECLARE(name, compat, init) \ -- 2.25.1