Message ID | 20250506092401.646595-1-sughosh.ganu@linaro.org |
---|---|
State | New |
Headers | show |
Series | riscv: set the width of the physical address/size data type based on arch | expand |
On 5/6/25 11:24, Sughosh Ganu wrote: > U-Boot has support for both the 32-bit and 64-bit RiscV platforms. Set > the width of the phys_{addr,size}_t data types based on the register > size of the architecture. > > Currently, even the 32-bit RiscV platforms have a 64-bit > phys_{addr,size}_t data types. This causes issues on the 32-bit > platforms, where the upper 32-bits of the variables of these types > can have junk data, and that can cause all kinds of side-effects. How could it be that the upper 32-bit have junk data? When we convert from a shorter variable the compiler should fill the upper bits with zero. > > This was discovered on the qemu Riscv 32-bit platform when the return > value of an LMB API was checked, and some LMB allocation that ought > not to have failed, was failing. The upper 32-bits of the address > variable contained garbage, resulting in failures. > > Signed-off-by: Sughosh Ganu <sughosh.ganu@linaro.org> > --- > > Note: > Although the LMB API cleanup series depends on this patch, I am > sending it separately so that it gets noticed by the RiscV > maintainers. Sometimes a patch may not get the required attention when > sent as part of another seemingly unrelated series. > > > arch/riscv/include/asm/types.h | 9 +++++++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/arch/riscv/include/asm/types.h b/arch/riscv/include/asm/types.h > index 49f7a5d6b3a..45d806c83eb 100644 > --- a/arch/riscv/include/asm/types.h > +++ b/arch/riscv/include/asm/types.h > @@ -35,8 +35,13 @@ typedef u64 dma_addr_t; > typedef u32 dma_addr_t; > #endif > > -typedef unsigned long long phys_addr_t; > -typedef unsigned long long phys_size_t; > +#ifdef CONFIG_PHYS_64BIT > +typedef u64 phys_addr_t; > +typedef u64 phys_size_t; > +#else > +typedef u32 phys_addr_t; > +typedef u32 phys_size_t; This matches what has been done for ARM. 86c915628d58 ("riscv: Change phys_addr_t and phys_size_t to 64-bit") changed the definition to 64bit phys addr_t to support the 34bit physical addresses (similar to LPAE on arm32). I don't think that we need 34bit support currently. But I would prefer if we could find the root cause why the upper 32bit gets messed up as this might point to a generic problem. Best regards Heinrich > +#endif > > #endif /* __KERNEL__ */ >
On Tue, 6 May 2025 at 15:19, Heinrich Schuchardt <heinrich.schuchardt@canonical.com> wrote: > > On 5/6/25 11:24, Sughosh Ganu wrote: > > U-Boot has support for both the 32-bit and 64-bit RiscV platforms. Set > > the width of the phys_{addr,size}_t data types based on the register > > size of the architecture. > > > > Currently, even the 32-bit RiscV platforms have a 64-bit > > phys_{addr,size}_t data types. This causes issues on the 32-bit > > platforms, where the upper 32-bits of the variables of these types > > can have junk data, and that can cause all kinds of side-effects. > > How could it be that the upper 32-bit have junk data? > > When we convert from a shorter variable the compiler should fill the > upper bits with zero. That does not seem to be happening. The efi_fit test fails on the qemu-riscv32 platform, when attempting to boot the OS from the FIT image. These are the values of the base address that I see in the _lmb_alloc_addr() function. _lmb_alloc_addr: 755, rgn => -1, base => 0x1a1c0e00802000bc, size => 0x50b1 The actual value that gets passed is 0x802000bc. The upper 32-bits do not get zeroed out. Which causes the _lmb_alloc_addr() to return an error. You can check my branch [1], where I have put a temporary commit to print the values that cause the issue. -sughosh [1] - https://source.denx.de/u-boot/contributors/sng/u-boot/-/commits/lmb_apis_cleanup_bisectable_v2 > > > > > This was discovered on the qemu Riscv 32-bit platform when the return > > value of an LMB API was checked, and some LMB allocation that ought > > not to have failed, was failing. The upper 32-bits of the address > > variable contained garbage, resulting in failures. > > > > Signed-off-by: Sughosh Ganu <sughosh.ganu@linaro.org> > > --- > > > > Note: > > Although the LMB API cleanup series depends on this patch, I am > > sending it separately so that it gets noticed by the RiscV > > maintainers. Sometimes a patch may not get the required attention when > > sent as part of another seemingly unrelated series. > > > > > > arch/riscv/include/asm/types.h | 9 +++++++-- > > 1 file changed, 7 insertions(+), 2 deletions(-) > > > > diff --git a/arch/riscv/include/asm/types.h b/arch/riscv/include/asm/types.h > > index 49f7a5d6b3a..45d806c83eb 100644 > > --- a/arch/riscv/include/asm/types.h > > +++ b/arch/riscv/include/asm/types.h > > @@ -35,8 +35,13 @@ typedef u64 dma_addr_t; > > typedef u32 dma_addr_t; > > #endif > > > > -typedef unsigned long long phys_addr_t; > > -typedef unsigned long long phys_size_t; > > +#ifdef CONFIG_PHYS_64BIT > > +typedef u64 phys_addr_t; > > +typedef u64 phys_size_t; > > +#else > > +typedef u32 phys_addr_t; > > +typedef u32 phys_size_t; > > This matches what has been done for ARM. > > 86c915628d58 ("riscv: Change phys_addr_t and phys_size_t to 64-bit") > changed the definition to 64bit phys addr_t to support the 34bit > physical addresses (similar to LPAE on arm32). > > I don't think that we need 34bit support currently. But I would prefer > if we could find the root cause why the upper 32bit gets messed up as > this might point to a generic problem. > > Best regards > > Heinrich > > > +#endif > > > > #endif /* __KERNEL__ */ > > >
Sughosh Ganu <sughosh.ganu@linaro.org> schrieb am Di., 6. Mai 2025, 12:50: > On Tue, 6 May 2025 at 15:19, Heinrich Schuchardt > <heinrich.schuchardt@canonical.com> wrote: > > > > On 5/6/25 11:24, Sughosh Ganu wrote: > > > U-Boot has support for both the 32-bit and 64-bit RiscV platforms. Set > > > the width of the phys_{addr,size}_t data types based on the register > > > size of the architecture. > > > > > > Currently, even the 32-bit RiscV platforms have a 64-bit > > > phys_{addr,size}_t data types. This causes issues on the 32-bit > > > platforms, where the upper 32-bits of the variables of these types > > > can have junk data, and that can cause all kinds of side-effects. > > > > How could it be that the upper 32-bit have junk data? > > > > When we convert from a shorter variable the compiler should fill the > > upper bits with zero. > > That does not seem to be happening. The efi_fit test fails on the > qemu-riscv32 platform, when attempting to boot the OS from the FIT > image. > > These are the values of the base address that I see in the > _lmb_alloc_addr() function. > > _lmb_alloc_addr: 755, rgn => -1, base => 0x1a1c0e00802000bc, size => 0x50b1 > As you are running on QEMU you should be able to track down where the value is actually assigned with gdb. This could for instance be a buffer overrun. If we don't correct the root cause, it will hit us again. Best regards Heinrich > The actual value that gets passed is 0x802000bc. The upper 32-bits do > not get zeroed out. Which causes the _lmb_alloc_addr() to return an > error. You can check my branch [1], where I have put a temporary > commit to print the values that cause the issue. > > -sughosh > > [1] - > https://source.denx.de/u-boot/contributors/sng/u-boot/-/commits/lmb_apis_cleanup_bisectable_v2 > > > > > > > > > This was discovered on the qemu Riscv 32-bit platform when the return > > > value of an LMB API was checked, and some LMB allocation that ought > > > not to have failed, was failing. The upper 32-bits of the address > > > variable contained garbage, resulting in failures. > > > > > > Signed-off-by: Sughosh Ganu <sughosh.ganu@linaro.org> > > > --- > > > > > > Note: > > > Although the LMB API cleanup series depends on this patch, I am > > > sending it separately so that it gets noticed by the RiscV > > > maintainers. Sometimes a patch may not get the required attention when > > > sent as part of another seemingly unrelated series. > > > > > > > > > arch/riscv/include/asm/types.h | 9 +++++++-- > > > 1 file changed, 7 insertions(+), 2 deletions(-) > > > > > > diff --git a/arch/riscv/include/asm/types.h > b/arch/riscv/include/asm/types.h > > > index 49f7a5d6b3a..45d806c83eb 100644 > > > --- a/arch/riscv/include/asm/types.h > > > +++ b/arch/riscv/include/asm/types.h > > > @@ -35,8 +35,13 @@ typedef u64 dma_addr_t; > > > typedef u32 dma_addr_t; > > > #endif > > > > > > -typedef unsigned long long phys_addr_t; > > > -typedef unsigned long long phys_size_t; > > > +#ifdef CONFIG_PHYS_64BIT > > > +typedef u64 phys_addr_t; > > > +typedef u64 phys_size_t; > > > +#else > > > +typedef u32 phys_addr_t; > > > +typedef u32 phys_size_t; > > > > This matches what has been done for ARM. > > > > 86c915628d58 ("riscv: Change phys_addr_t and phys_size_t to 64-bit") > > changed the definition to 64bit phys addr_t to support the 34bit > > physical addresses (similar to LPAE on arm32). > > > > I don't think that we need 34bit support currently. But I would prefer > > if we could find the root cause why the upper 32bit gets messed up as > > this might point to a generic problem. > > > > Best regards > > > > Heinrich > > > > > +#endif > > > > > > #endif /* __KERNEL__ */ > > > > > >
On Tue, 6 May 2025 at 16:35, Heinrich Schuchardt <heinrich.schuchardt@canonical.com> wrote: > > > > Sughosh Ganu <sughosh.ganu@linaro.org> schrieb am Di., 6. Mai 2025, 12:50: >> >> On Tue, 6 May 2025 at 15:19, Heinrich Schuchardt >> <heinrich.schuchardt@canonical.com> wrote: >> > >> > On 5/6/25 11:24, Sughosh Ganu wrote: >> > > U-Boot has support for both the 32-bit and 64-bit RiscV platforms. Set >> > > the width of the phys_{addr,size}_t data types based on the register >> > > size of the architecture. >> > > >> > > Currently, even the 32-bit RiscV platforms have a 64-bit >> > > phys_{addr,size}_t data types. This causes issues on the 32-bit >> > > platforms, where the upper 32-bits of the variables of these types >> > > can have junk data, and that can cause all kinds of side-effects. >> > >> > How could it be that the upper 32-bit have junk data? >> > >> > When we convert from a shorter variable the compiler should fill the >> > upper bits with zero. >> >> That does not seem to be happening. The efi_fit test fails on the >> qemu-riscv32 platform, when attempting to boot the OS from the FIT >> image. >> >> These are the values of the base address that I see in the >> _lmb_alloc_addr() function. >> >> _lmb_alloc_addr: 755, rgn => -1, base => 0x1a1c0e00802000bc, size => 0x50b1 > > > As you are running on QEMU you should be able to track down where the value is actually assigned with gdb. This could for instance be a buffer overrun. > > If we don't correct the root cause, it will hit us again. How will we hit this issue if the phys_addr_t is set to a 32-bit value? So, this can be investigated as to why the compiler does not set the upper 32-bits to 0. But given your earlier comment on the fact that LPAE is not being used, the change proposed by this patch should work? What issue do you see with a 32-bit address space. -sughosh > > Best regards > > Heinrich > >> >> The actual value that gets passed is 0x802000bc. The upper 32-bits do >> not get zeroed out. Which causes the _lmb_alloc_addr() to return an >> error. You can check my branch [1], where I have put a temporary >> commit to print the values that cause the issue. >> >> -sughosh >> >> [1] - https://source.denx.de/u-boot/contributors/sng/u-boot/-/commits/lmb_apis_cleanup_bisectable_v2 >> >> > >> > > >> > > This was discovered on the qemu Riscv 32-bit platform when the return >> > > value of an LMB API was checked, and some LMB allocation that ought >> > > not to have failed, was failing. The upper 32-bits of the address >> > > variable contained garbage, resulting in failures. >> > > >> > > Signed-off-by: Sughosh Ganu <sughosh.ganu@linaro.org> >> > > --- >> > > >> > > Note: >> > > Although the LMB API cleanup series depends on this patch, I am >> > > sending it separately so that it gets noticed by the RiscV >> > > maintainers. Sometimes a patch may not get the required attention when >> > > sent as part of another seemingly unrelated series. >> > > >> > > >> > > arch/riscv/include/asm/types.h | 9 +++++++-- >> > > 1 file changed, 7 insertions(+), 2 deletions(-) >> > > >> > > diff --git a/arch/riscv/include/asm/types.h b/arch/riscv/include/asm/types.h >> > > index 49f7a5d6b3a..45d806c83eb 100644 >> > > --- a/arch/riscv/include/asm/types.h >> > > +++ b/arch/riscv/include/asm/types.h >> > > @@ -35,8 +35,13 @@ typedef u64 dma_addr_t; >> > > typedef u32 dma_addr_t; >> > > #endif >> > > >> > > -typedef unsigned long long phys_addr_t; >> > > -typedef unsigned long long phys_size_t; >> > > +#ifdef CONFIG_PHYS_64BIT >> > > +typedef u64 phys_addr_t; >> > > +typedef u64 phys_size_t; >> > > +#else >> > > +typedef u32 phys_addr_t; >> > > +typedef u32 phys_size_t; >> > >> > This matches what has been done for ARM. >> > >> > 86c915628d58 ("riscv: Change phys_addr_t and phys_size_t to 64-bit") >> > changed the definition to 64bit phys addr_t to support the 34bit >> > physical addresses (similar to LPAE on arm32). >> > >> > I don't think that we need 34bit support currently. But I would prefer >> > if we could find the root cause why the upper 32bit gets messed up as >> > this might point to a generic problem. >> > >> > Best regards >> > >> > Heinrich >> > >> > > +#endif >> > > >> > > #endif /* __KERNEL__ */ >> > > >> >
On Tue, 6 May 2025 at 16:35, Heinrich Schuchardt <heinrich.schuchardt@canonical.com> wrote: > > > > Sughosh Ganu <sughosh.ganu@linaro.org> schrieb am Di., 6. Mai 2025, 12:50: >> >> On Tue, 6 May 2025 at 15:19, Heinrich Schuchardt >> <heinrich.schuchardt@canonical.com> wrote: >> > >> > On 5/6/25 11:24, Sughosh Ganu wrote: >> > > U-Boot has support for both the 32-bit and 64-bit RiscV platforms. Set >> > > the width of the phys_{addr,size}_t data types based on the register >> > > size of the architecture. >> > > >> > > Currently, even the 32-bit RiscV platforms have a 64-bit >> > > phys_{addr,size}_t data types. This causes issues on the 32-bit >> > > platforms, where the upper 32-bits of the variables of these types >> > > can have junk data, and that can cause all kinds of side-effects. >> > >> > How could it be that the upper 32-bit have junk data? >> > >> > When we convert from a shorter variable the compiler should fill the >> > upper bits with zero. >> >> That does not seem to be happening. The efi_fit test fails on the >> qemu-riscv32 platform, when attempting to boot the OS from the FIT >> image. >> >> These are the values of the base address that I see in the >> _lmb_alloc_addr() function. >> >> _lmb_alloc_addr: 755, rgn => -1, base => 0x1a1c0e00802000bc, size => 0x50b1 > > > As you are running on QEMU you should be able to track down where the value is actually assigned with gdb. This could for instance be a buffer overrun. I was able to hook up gdb and re-create the issue. What I observe is that when the lmb_allocate_mem() function is called, the base address parameter, which is 64-bits, shows a value with the upper 32-bits not zeroed out. So, this looks like a compiler issue, where the upper 32-bits are not being zeroed out. Fwiw, this shows up with the compiler being used in the CI environment, as well as the one that I am using. /toolchains/riscv32-linux/bin/riscv32-linux-gcc --version riscv32-linux-gcc (GCC) 13.2.0 /toolchains/riscv32-linux/bin/riscv32-linux-ld --version GNU ld (GNU Binutils) 2.41 Given that LPAE is not being used on riscv32 platforms in U-Boot, I think this patch should be considered. Thanks. -sughosh > > If we don't correct the root cause, it will hit us again. > > Best regards > > Heinrich > >> >> The actual value that gets passed is 0x802000bc. The upper 32-bits do >> not get zeroed out. Which causes the _lmb_alloc_addr() to return an >> error. You can check my branch [1], where I have put a temporary >> commit to print the values that cause the issue. >> >> -sughosh >> >> [1] - https://source.denx.de/u-boot/contributors/sng/u-boot/-/commits/lmb_apis_cleanup_bisectable_v2 >> >> > >> > > >> > > This was discovered on the qemu Riscv 32-bit platform when the return >> > > value of an LMB API was checked, and some LMB allocation that ought >> > > not to have failed, was failing. The upper 32-bits of the address >> > > variable contained garbage, resulting in failures. >> > > >> > > Signed-off-by: Sughosh Ganu <sughosh.ganu@linaro.org> >> > > --- >> > > >> > > Note: >> > > Although the LMB API cleanup series depends on this patch, I am >> > > sending it separately so that it gets noticed by the RiscV >> > > maintainers. Sometimes a patch may not get the required attention when >> > > sent as part of another seemingly unrelated series. >> > > >> > > >> > > arch/riscv/include/asm/types.h | 9 +++++++-- >> > > 1 file changed, 7 insertions(+), 2 deletions(-) >> > > >> > > diff --git a/arch/riscv/include/asm/types.h b/arch/riscv/include/asm/types.h >> > > index 49f7a5d6b3a..45d806c83eb 100644 >> > > --- a/arch/riscv/include/asm/types.h >> > > +++ b/arch/riscv/include/asm/types.h >> > > @@ -35,8 +35,13 @@ typedef u64 dma_addr_t; >> > > typedef u32 dma_addr_t; >> > > #endif >> > > >> > > -typedef unsigned long long phys_addr_t; >> > > -typedef unsigned long long phys_size_t; >> > > +#ifdef CONFIG_PHYS_64BIT >> > > +typedef u64 phys_addr_t; >> > > +typedef u64 phys_size_t; >> > > +#else >> > > +typedef u32 phys_addr_t; >> > > +typedef u32 phys_size_t; >> > >> > This matches what has been done for ARM. >> > >> > 86c915628d58 ("riscv: Change phys_addr_t and phys_size_t to 64-bit") >> > changed the definition to 64bit phys addr_t to support the 34bit >> > physical addresses (similar to LPAE on arm32). >> > >> > I don't think that we need 34bit support currently. But I would prefer >> > if we could find the root cause why the upper 32bit gets messed up as >> > this might point to a generic problem. >> > >> > Best regards >> > >> > Heinrich >> > >> > > +#endif >> > > >> > > #endif /* __KERNEL__ */ >> > > >> >
On Wed, 7 May 2025 at 13:19, Sughosh Ganu <sughosh.ganu@linaro.org> wrote: > > On Tue, 6 May 2025 at 16:35, Heinrich Schuchardt > <heinrich.schuchardt@canonical.com> wrote: > > > > > > > > Sughosh Ganu <sughosh.ganu@linaro.org> schrieb am Di., 6. Mai 2025, 12:50: > >> > >> On Tue, 6 May 2025 at 15:19, Heinrich Schuchardt > >> <heinrich.schuchardt@canonical.com> wrote: > >> > > >> > On 5/6/25 11:24, Sughosh Ganu wrote: > >> > > U-Boot has support for both the 32-bit and 64-bit RiscV platforms. Set > >> > > the width of the phys_{addr,size}_t data types based on the register > >> > > size of the architecture. > >> > > > >> > > Currently, even the 32-bit RiscV platforms have a 64-bit > >> > > phys_{addr,size}_t data types. This causes issues on the 32-bit > >> > > platforms, where the upper 32-bits of the variables of these types > >> > > can have junk data, and that can cause all kinds of side-effects. > >> > > >> > How could it be that the upper 32-bit have junk data? > >> > > >> > When we convert from a shorter variable the compiler should fill the > >> > upper bits with zero. > >> > >> That does not seem to be happening. The efi_fit test fails on the > >> qemu-riscv32 platform, when attempting to boot the OS from the FIT > >> image. > >> > >> These are the values of the base address that I see in the > >> _lmb_alloc_addr() function. > >> > >> _lmb_alloc_addr: 755, rgn => -1, base => 0x1a1c0e00802000bc, size => 0x50b1 > > > > > > As you are running on QEMU you should be able to track down where the value is actually assigned with gdb. This could for instance be a buffer overrun. > > I was able to hook up gdb and re-create the issue. What I observe is > that when the lmb_allocate_mem() function is called, the base address > parameter, which is 64-bits, shows a value with the upper 32-bits not > zeroed out. So, this looks like a compiler issue, where the upper > 32-bits are not being zeroed out. Fwiw, this shows up with the > compiler being used in the CI environment, as well as the one that I > am using. Thinking a bit on this, I don't think this is a compiler issue. The problem is that we are using the ulong type in some places(especially in the boot* commands) for storing the address values, while we use phys_addr_t in other places. And because this is a pointer being passed across functions, when the data-type that the pointer is pointing to changes from a 32-bit to 64-bit value, the upper 32-bits get considered. So the issue is that we use ulong in some places, and phys_addr_t in others for storing the addresses. But I think that the solution for this(at least for now) is to set phys_addr_t based on the underlying architecture. In the long run, there needs to be an audit of the usage of ulong for storing addresses, and that needs to be changed to phys_addr_t. -sughosh > > /toolchains/riscv32-linux/bin/riscv32-linux-gcc --version > riscv32-linux-gcc (GCC) 13.2.0 > > /toolchains/riscv32-linux/bin/riscv32-linux-ld --version > GNU ld (GNU Binutils) 2.41 > > Given that LPAE is not being used on riscv32 platforms in U-Boot, I > think this patch should be considered. Thanks. > > -sughosh > > > > > If we don't correct the root cause, it will hit us again. > > > > Best regards > > > > Heinrich > > > >> > >> The actual value that gets passed is 0x802000bc. The upper 32-bits do > >> not get zeroed out. Which causes the _lmb_alloc_addr() to return an > >> error. You can check my branch [1], where I have put a temporary > >> commit to print the values that cause the issue. > >> > >> -sughosh > >> > >> [1] - https://source.denx.de/u-boot/contributors/sng/u-boot/-/commits/lmb_apis_cleanup_bisectable_v2 > >> > >> > > >> > > > >> > > This was discovered on the qemu Riscv 32-bit platform when the return > >> > > value of an LMB API was checked, and some LMB allocation that ought > >> > > not to have failed, was failing. The upper 32-bits of the address > >> > > variable contained garbage, resulting in failures. > >> > > > >> > > Signed-off-by: Sughosh Ganu <sughosh.ganu@linaro.org> > >> > > --- > >> > > > >> > > Note: > >> > > Although the LMB API cleanup series depends on this patch, I am > >> > > sending it separately so that it gets noticed by the RiscV > >> > > maintainers. Sometimes a patch may not get the required attention when > >> > > sent as part of another seemingly unrelated series. > >> > > > >> > > > >> > > arch/riscv/include/asm/types.h | 9 +++++++-- > >> > > 1 file changed, 7 insertions(+), 2 deletions(-) > >> > > > >> > > diff --git a/arch/riscv/include/asm/types.h b/arch/riscv/include/asm/types.h > >> > > index 49f7a5d6b3a..45d806c83eb 100644 > >> > > --- a/arch/riscv/include/asm/types.h > >> > > +++ b/arch/riscv/include/asm/types.h > >> > > @@ -35,8 +35,13 @@ typedef u64 dma_addr_t; > >> > > typedef u32 dma_addr_t; > >> > > #endif > >> > > > >> > > -typedef unsigned long long phys_addr_t; > >> > > -typedef unsigned long long phys_size_t; > >> > > +#ifdef CONFIG_PHYS_64BIT > >> > > +typedef u64 phys_addr_t; > >> > > +typedef u64 phys_size_t; > >> > > +#else > >> > > +typedef u32 phys_addr_t; > >> > > +typedef u32 phys_size_t; > >> > > >> > This matches what has been done for ARM. > >> > > >> > 86c915628d58 ("riscv: Change phys_addr_t and phys_size_t to 64-bit") > >> > changed the definition to 64bit phys addr_t to support the 34bit > >> > physical addresses (similar to LPAE on arm32). > >> > > >> > I don't think that we need 34bit support currently. But I would prefer > >> > if we could find the root cause why the upper 32bit gets messed up as > >> > this might point to a generic problem. > >> > > >> > Best regards > >> > > >> > Heinrich > >> > > >> > > +#endif > >> > > > >> > > #endif /* __KERNEL__ */ > >> > > > >> >
On Wed, May 07, 2025 at 03:11:38PM +0530, Sughosh Ganu wrote: > On Wed, 7 May 2025 at 13:19, Sughosh Ganu <sughosh.ganu@linaro.org> wrote: > > > > On Tue, 6 May 2025 at 16:35, Heinrich Schuchardt > > <heinrich.schuchardt@canonical.com> wrote: > > > > > > > > > > > > Sughosh Ganu <sughosh.ganu@linaro.org> schrieb am Di., 6. Mai 2025, 12:50: > > >> > > >> On Tue, 6 May 2025 at 15:19, Heinrich Schuchardt > > >> <heinrich.schuchardt@canonical.com> wrote: > > >> > > > >> > On 5/6/25 11:24, Sughosh Ganu wrote: > > >> > > U-Boot has support for both the 32-bit and 64-bit RiscV platforms. Set > > >> > > the width of the phys_{addr,size}_t data types based on the register > > >> > > size of the architecture. > > >> > > > > >> > > Currently, even the 32-bit RiscV platforms have a 64-bit > > >> > > phys_{addr,size}_t data types. This causes issues on the 32-bit > > >> > > platforms, where the upper 32-bits of the variables of these types > > >> > > can have junk data, and that can cause all kinds of side-effects. > > >> > > > >> > How could it be that the upper 32-bit have junk data? > > >> > > > >> > When we convert from a shorter variable the compiler should fill the > > >> > upper bits with zero. > > >> > > >> That does not seem to be happening. The efi_fit test fails on the > > >> qemu-riscv32 platform, when attempting to boot the OS from the FIT > > >> image. > > >> > > >> These are the values of the base address that I see in the > > >> _lmb_alloc_addr() function. > > >> > > >> _lmb_alloc_addr: 755, rgn => -1, base => 0x1a1c0e00802000bc, size => 0x50b1 > > > > > > > > > As you are running on QEMU you should be able to track down where the value is actually assigned with gdb. This could for instance be a buffer overrun. > > > > I was able to hook up gdb and re-create the issue. What I observe is > > that when the lmb_allocate_mem() function is called, the base address > > parameter, which is 64-bits, shows a value with the upper 32-bits not > > zeroed out. So, this looks like a compiler issue, where the upper > > 32-bits are not being zeroed out. Fwiw, this shows up with the > > compiler being used in the CI environment, as well as the one that I > > am using. > > Thinking a bit on this, I don't think this is a compiler issue. The > problem is that we are using the ulong type in some places(especially > in the boot* commands) for storing the address values, while we use > phys_addr_t in other places. And because this is a pointer being > passed across functions, when the data-type that the pointer is > pointing to changes from a 32-bit to 64-bit value, the upper 32-bits > get considered. So the issue is that we use ulong in some places, and > phys_addr_t in others for storing the addresses. > > But I think that the solution for this(at least for now) is to set > phys_addr_t based on the underlying architecture. In the long run, > there needs to be an audit of the usage of ulong for storing > addresses, and that needs to be changed to phys_addr_t. Thanks for digging in to this more. I agree with what you're saying here for both the short and long term.
On Wed, 7 May 2025 at 21:18, Tom Rini <trini@konsulko.com> wrote: > > On Wed, May 07, 2025 at 03:11:38PM +0530, Sughosh Ganu wrote: > > On Wed, 7 May 2025 at 13:19, Sughosh Ganu <sughosh.ganu@linaro.org> wrote: > > > > > > On Tue, 6 May 2025 at 16:35, Heinrich Schuchardt > > > <heinrich.schuchardt@canonical.com> wrote: > > > > > > > > > > > > > > > > Sughosh Ganu <sughosh.ganu@linaro.org> schrieb am Di., 6. Mai 2025, 12:50: > > > >> > > > >> On Tue, 6 May 2025 at 15:19, Heinrich Schuchardt > > > >> <heinrich.schuchardt@canonical.com> wrote: > > > >> > > > > >> > On 5/6/25 11:24, Sughosh Ganu wrote: > > > >> > > U-Boot has support for both the 32-bit and 64-bit RiscV platforms. Set > > > >> > > the width of the phys_{addr,size}_t data types based on the register > > > >> > > size of the architecture. > > > >> > > > > > >> > > Currently, even the 32-bit RiscV platforms have a 64-bit > > > >> > > phys_{addr,size}_t data types. This causes issues on the 32-bit > > > >> > > platforms, where the upper 32-bits of the variables of these types > > > >> > > can have junk data, and that can cause all kinds of side-effects. > > > >> > > > > >> > How could it be that the upper 32-bit have junk data? > > > >> > > > > >> > When we convert from a shorter variable the compiler should fill the > > > >> > upper bits with zero. > > > >> > > > >> That does not seem to be happening. The efi_fit test fails on the > > > >> qemu-riscv32 platform, when attempting to boot the OS from the FIT > > > >> image. > > > >> > > > >> These are the values of the base address that I see in the > > > >> _lmb_alloc_addr() function. > > > >> > > > >> _lmb_alloc_addr: 755, rgn => -1, base => 0x1a1c0e00802000bc, size => 0x50b1 > > > > > > > > > > > > As you are running on QEMU you should be able to track down where the value is actually assigned with gdb. This could for instance be a buffer overrun. > > > > > > I was able to hook up gdb and re-create the issue. What I observe is > > > that when the lmb_allocate_mem() function is called, the base address > > > parameter, which is 64-bits, shows a value with the upper 32-bits not > > > zeroed out. So, this looks like a compiler issue, where the upper > > > 32-bits are not being zeroed out. Fwiw, this shows up with the > > > compiler being used in the CI environment, as well as the one that I > > > am using. > > > > Thinking a bit on this, I don't think this is a compiler issue. The > > problem is that we are using the ulong type in some places(especially > > in the boot* commands) for storing the address values, while we use > > phys_addr_t in other places. And because this is a pointer being > > passed across functions, when the data-type that the pointer is > > pointing to changes from a 32-bit to 64-bit value, the upper 32-bits > > get considered. So the issue is that we use ulong in some places, and > > phys_addr_t in others for storing the addresses. > > > > But I think that the solution for this(at least for now) is to set > > phys_addr_t based on the underlying architecture. In the long run, > > there needs to be an audit of the usage of ulong for storing > > addresses, and that needs to be changed to phys_addr_t. > > Thanks for digging in to this more. I agree with what you're saying here > for both the short and long term. Heinrich and I had a discussion on IRC on this, and for the short term, it was decided to instead have the ulong values copied into a local variable of type phys_addr_t before calling the lmb API. This approach too will work for now. Heinrich is of the opinion that it would be better not to make the change to the riscv32 file as the maintainers think it appropriate to use u64 for phys_addr_t. I will be making this change as part of my upcoming version of the lmb API series. I will be on leave for the next week, and will send the v2 once back. Thanks. -sughosh > > -- > Tom
diff --git a/arch/riscv/include/asm/types.h b/arch/riscv/include/asm/types.h index 49f7a5d6b3a..45d806c83eb 100644 --- a/arch/riscv/include/asm/types.h +++ b/arch/riscv/include/asm/types.h @@ -35,8 +35,13 @@ typedef u64 dma_addr_t; typedef u32 dma_addr_t; #endif -typedef unsigned long long phys_addr_t; -typedef unsigned long long phys_size_t; +#ifdef CONFIG_PHYS_64BIT +typedef u64 phys_addr_t; +typedef u64 phys_size_t; +#else +typedef u32 phys_addr_t; +typedef u32 phys_size_t; +#endif #endif /* __KERNEL__ */
U-Boot has support for both the 32-bit and 64-bit RiscV platforms. Set the width of the phys_{addr,size}_t data types based on the register size of the architecture. Currently, even the 32-bit RiscV platforms have a 64-bit phys_{addr,size}_t data types. This causes issues on the 32-bit platforms, where the upper 32-bits of the variables of these types can have junk data, and that can cause all kinds of side-effects. This was discovered on the qemu Riscv 32-bit platform when the return value of an LMB API was checked, and some LMB allocation that ought not to have failed, was failing. The upper 32-bits of the address variable contained garbage, resulting in failures. Signed-off-by: Sughosh Ganu <sughosh.ganu@linaro.org> --- Note: Although the LMB API cleanup series depends on this patch, I am sending it separately so that it gets noticed by the RiscV maintainers. Sometimes a patch may not get the required attention when sent as part of another seemingly unrelated series. arch/riscv/include/asm/types.h | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)