[RFC,V3,00/43] rv64ilp32_abi: Build CONFIG_64BIT kernel-self with ILP32 ABI

Message ID 20250325121624.523258-1-guoren@kernel.org

Message

Guo Ren March 25, 2025, 12:15 p.m. UTC
From: "Guo Ren (Alibaba DAMO Academy)" <guoren@kernel.org>

Since 2001, the CONFIG_64BIT kernel has been built with the LP64 ABI.
This patchset allows a CONFIG_64BIT kernel to be built with an ILP32
ABI instead, reducing its cache and memory footprint. (Compared to the
lp64-abi kernel, the rv64ilp32-abi kernel reduces used memory by more
than 20%, as shown by "free -h" in the demo below.)

Note: this patchset does not introduce any new userspace ABI; it only
changes how the kernel itself is built.

The patchset targets RISC-V and is built on the RV64ILP32 ABI, which
was introduced into RISC-V's psABI in January 2025 [1]. This patchset
equips an rv64ilp32-abi kernel with all the functionality of a
traditional lp64-abi kernel, yet restricts the kernel and user address
spaces to 2GiB each.
Hence, the rv64ilp32-abi kernel simultaneously supports lp64-abi
userspace and ilp32-abi (compat) userspace, the same as the
traditional lp64-abi kernel.

  +--------------------------------+
  | +-------------+--------------+ |
  | |             |   (compat)   | |
  | |  lp64-abi   |   ilp32-abi  | | User
  | +-------------+--------------+ |
  +--------------------------------+-------
  | +----------------------------+ |
  | |  rv64ilp32-abi / lp64-abi  | | Kernel
  | |  ^^^^^^^^^^^^^             | |
  | +----------------------------+ |
  +--------------lp64-sbi----------+-------
  | +----------------------------+ |
  | |            lp64-abi        | | OpenSBI
  | +----------------------------+ |
  +--------------------------------+-------
  | +----------------------------+ |
  | |  rv64gcbvh (RISC-V 64-bit) | | ISA
  | +----------------------------+ |
  +--------------------------------+

Note: The rv64ilp32-abi and lp64-abi kernels are functionally
equivalent and can be used interchangeably. The only difference is that
the rv64ilp32-abi kernel limits kernel space and user space to separate
2GiB address spaces.


Motivation
==========
Because all RISC-V RVA(B) Profiles are based on the 64-bit ISA, the market
has experienced a significant rise in RISC-V 64-bit ISA SoCs and CPU cores
for resource-constrained scenarios, such as:

 - allwinner/sun20i-d1-lichee
 - allwinner/sun20i-d1s-mangopi
 - bouffalo/bl808
 - canaan/k230d
 - microchip/mpfs-beaglev-fire
 - renesas/rzfive-smarc
 - sophgo/cv1800b
 - sophgo/cv1812h
 - sophgo/sg2002

The listed RV64 ISA-based SoCs with limited memory (less than 1GiB) can
benefit from this patchset. Besides reducing the memory footprint, the
patchset can also improve performance through higher cache density, so
all RVA(B) Profile hardware can benefit from it.


Patchset Organization
=====================
This patchset is now in its third version. The major update is the
shift to CONFIG_64BIT with support for both lp64-abi and ilp32-abi
userspace. The prior versions (v1, v2) were based on CONFIG_32BIT and
only supported ilp32-abi userspace.

The key change in v3 is support for lp64-abi userspace, obtained by
building on CONFIG_64BIT.

This patchset comprises 43 patches touching more than 20 subsystems.
Most modifications ensure that BITS_PER_LONG and CONFIG_64BIT are used
correctly. Some Linux code treats the two as interchangeable because,
until now, CONFIG_64BIT always implied BITS_PER_LONG == 64.

 - PATCH[1]    : The rv64ilp32-abi kernel reuses lp64-abi uapi.
 - PATCH[2~17] : The riscv subsystem-related modifications.
 - PATCH[18~43]: Other subsystem-related modifications.

The first patch, titled "uapi: Reuse lp64 ABI interface", needs
discussion. The open question is how to define a unified set of
lp64-abi uapi header files that can be used by both the lp64-abi kernel
and the rv64ilp32-abi kernel.

To get started with the patchset quickly, see the following demo.


Demo Introduction
=================
To test the patchset, use a riscv64 toolchain with rv64ilp32-abi
support. The rv64ilp32-abi is provided as the -mabi=ilp32 option of
the standard rv64 toolchain. We've built a multilib riscv64-elf
toolchain [2] that includes rv64ilp32-abi, and we also provide
pre-compiled demo materials (qemu, kernel, and rootfs binaries) for a
quick start.

After downloading the tarball for your host (ubuntu-20.04 or
ubuntu-22.04) from [2]:
$ tar zxvf riscv64-elf-ubuntu-20.04-gcc-nightly-2025.03.24-nightly.tar.gz
$ cd riscv/qemu-linux
The directory contains:
 - Image_rv64ilp32		rv64ilp32-abi kernel
 - Image_rv64lp64		lp64-abi kernel
 - u64lp64_rootfs.ext2		lp64-abi userspace rootfs
 - start-qemu-rv64.sh		qemu running wrapper script

Compile Image_rv64ilp32:
$ make ARCH=riscv CROSS_COMPILE=<download path>/riscv/bin/riscv64-unknown-elf- rv64ilp32_defconfig all

Compile Image_rv64lp64:
$ make ARCH=riscv CROSS_COMPILE=<download path>/riscv/bin/riscv64-unknown-elf- defconfig all

Quick Start:
$ ./start-qemu-rv64.sh Image_rv64ilp32 u64lp64_rootfs.ext2
vs.
$ ./start-qemu-rv64.sh Image_rv64lp64  u64lp64_rootfs.ext2

Used Memory Comparison
======================
With the same configuration (QEMU, firmware, and lp64-abi user rootfs
unchanged), replacing Image_rv64lp64 with Image_rv64ilp32 reduces used
memory from 10.8M to 8.2M, a decrease of about 24%.

$ ./start-qemu-rv64.sh Image_rv64ilp32 u64lp64_rootfs.ext2
$ free -h
       total  used   free  shared  buff/cache   available
Mem:  105.4M  8.2M  93.9M   44.0K        3.3M       93.6M
              ^^^^

$ ./start-qemu-rv64.sh Image_rv64lp64  u64lp64_rootfs.ext2
$ free -h
       total  used   free  shared  buff/cache   available
Mem:   89.3M 10.8M  74.8M   44.0K        3.7M       74.9M
             ^^^^^

$ cat start-qemu-rv64.sh
exec qemu-system-riscv64 -cpu rv64 -M virt -m 128m -nographic -kernel $1 -drive file=$2,format=raw,id=hd0 -device virtio-blk-device,drive=hd0 -append "rootwait root=/dev/vda ro console=ttyS0 earlycon=sbi norandmaps no5lvl no4lvl"


User Virtual Memory Layout
==========================
Here is a comparison of /proc/1/maps for the lp64-abi userspace rootfs
running on the rv64ilp32-abi kernel and on the lp64-abi kernel:

(rv64ilp32-abi kernel + lp64-abi user rootfs)
$ cat /proc/1/maps
55555000-5560c000 r-xp 00000000 fe:00 17         /bin/busybox
5560c000-5560f000 r--p 000b7000 fe:00 17         /bin/busybox
5560f000-55610000 rw-p 000ba000 fe:00 17         /bin/busybox
55610000-55631000 rw-p 00000000 00:00 0          [heap]
77e69000-77e6b000 rw-p 00000000 00:00 0
77e6b000-77fba000 r-xp 00000000 fe:00 140        /lib/libc.so.6
77fba000-77fbd000 r--p 0014f000 fe:00 140        /lib/libc.so.6
77fbd000-77fbf000 rw-p 00152000 fe:00 140        /lib/libc.so.6
77fbf000-77fcb000 rw-p 00000000 00:00 0
77fcb000-77fd5000 r-xp 00000000 fe:00 148        /lib/libresolv.so.2
77fd5000-77fd6000 r--p 0000a000 fe:00 148        /lib/libresolv.so.2
77fd6000-77fd7000 rw-p 0000b000 fe:00 148        /lib/libresolv.so.2
77fd7000-77fd9000 rw-p 00000000 00:00 0
77fd9000-77fdb000 r--p 00000000 00:00 0          [vvar]
77fdb000-77fdc000 r-xp 00000000 00:00 0          [vdso]
77fdc000-77ffc000 r-xp 00000000 fe:00 135        /lib/ld-linux-riscv64-lp64d.so.1
77ffc000-77ffe000 r--p 0001f000 fe:00 135        /lib/ld-linux-riscv64-lp64d.so.1
77ffe000-78000000 rw-p 00021000 fe:00 135        /lib/ld-linux-riscv64-lp64d.so.1
7ffdf000-80000000 rw-p 00000000 00:00 0          [stack]

(lp64-abi kernel + lp64-abi user rootfs)
$ cat /proc/1/maps
2aaaaaa000-2aaab61000 r-xp 00000000 fe:00 17     /bin/busybox
2aaab61000-2aaab64000 r--p 000b7000 fe:00 17     /bin/busybox
2aaab64000-2aaab65000 rw-p 000ba000 fe:00 17     /bin/busybox
2aaab65000-2aaab86000 rw-p 00000000 00:00 0      [heap]
3ff7e69000-3ff7e6b000 rw-p 00000000 00:00 0
3ff7e6b000-3ff7fba000 r-xp 00000000 fe:00 140    /lib/libc.so.6
3ff7fba000-3ff7fbd000 r--p 0014f000 fe:00 140    /lib/libc.so.6
3ff7fbd000-3ff7fbf000 rw-p 00152000 fe:00 140    /lib/libc.so.6
3ff7fbf000-3ff7fcb000 rw-p 00000000 00:00 0
3ff7fcb000-3ff7fd5000 r-xp 00000000 fe:00 148    /lib/libresolv.so.2
3ff7fd5000-3ff7fd6000 r--p 0000a000 fe:00 148    /lib/libresolv.so.2
3ff7fd6000-3ff7fd7000 rw-p 0000b000 fe:00 148    /lib/libresolv.so.2
3ff7fd7000-3ff7fd9000 rw-p 00000000 00:00 0
3ff7fd9000-3ff7fdb000 r--p 00000000 00:00 0      [vvar]
3ff7fdb000-3ff7fdc000 r-xp 00000000 00:00 0      [vdso]
3ff7fdc000-3ff7ffc000 r-xp 00000000 fe:00 135    /lib/ld-linux-riscv64-lp64d.so.1
3ff7ffc000-3ff7ffe000 r--p 0001f000 fe:00 135    /lib/ld-linux-riscv64-lp64d.so.1
3ff7ffe000-3ff8000000 rw-p 00021000 fe:00 135    /lib/ld-linux-riscv64-lp64d.so.1
3ffffdf000-4000000000 rw-p 00000000 00:00 0      [stack]

~~~~~~~~~~~~~~~~~~~~~~
For an ilp32-abi userspace rootfs, the virtual memory layout is the
same on both kernels:
(lp64-abi/rv64ilp32-abi kernel + ilp32-abi user rootfs)
$ cat /proc/1/maps
55555000-55637000 r-xp 00000000 fe:00 17         /bin/busybox
55637000-55639000 r--p 000e1000 fe:00 17         /bin/busybox
55639000-5563a000 rw-p 000e3000 fe:00 17         /bin/busybox
5563a000-5565c000 rw-p 00000000 00:00 0          [heap]
77e63000-77fbe000 r-xp 00000000 fe:00 145        /lib/libc.so.6
77fbe000-77fc0000 r--p 0015a000 fe:00 145        /lib/libc.so.6
77fc0000-77fc1000 rw-p 0015c000 fe:00 145        /lib/libc.so.6
77fc1000-77fcb000 rw-p 00000000 00:00 0
77fcb000-77fd5000 r-xp 00000000 fe:00 154        /lib/libresolv.so.2
77fd5000-77fd6000 r--p 0000a000 fe:00 154        /lib/libresolv.so.2
77fd6000-77fd7000 rw-p 0000b000 fe:00 154        /lib/libresolv.so.2
77fd7000-77fd9000 rw-p 00000000 00:00 0
77fd9000-77fdb000 r--p 00000000 00:00 0          [vvar]
77fdb000-77fdc000 r-xp 00000000 00:00 0          [vdso]
77fdc000-77ffe000 r-xp 00000000 fe:00 138        /lib/ld-linux-riscv32-ilp32d.so.1
77ffe000-77fff000 r--p 00022000 fe:00 138        /lib/ld-linux-riscv32-ilp32d.so.1
77fff000-78000000 rw-p 00023000 fe:00 138        /lib/ld-linux-riscv32-ilp32d.so.1
7ffdf000-80000000 rw-p 00000000 00:00 0          [stack]


Kernel Virtual Memory Layout
============================
Here is a comparison of the rv64ilp32-abi and lp64-abi kernel layouts:

Virtual kernel memory layout (rv64ilp32-abi kernel):
   fixmap : 0x94a00000 - 0x94ffffff   (6144 kB)
   pci io : 0x95000000 - 0x95ffffff   (  16 MB)
  vmemmap : 0x96000000 - 0x97ffffff   (  32 MB)
  vmalloc : 0x98000000 - 0xb7ffffff   ( 512 MB)
  modules : 0xb8000000 - 0xbbffffff   (  64 MB)
   lowmem : 0xc0000000 - 0xc7ffffff   ( 128 MB)
    kasan : 0x80000000 - 0x8fffffff   ( 256 MB)
   kernel : 0xbc000000 - 0xbfffffff   (  64 MB)

Virtual kernel memory layout (lp64-abi kernel):
   fixmap : 0xffffffc4fea00000 - 0xffffffc4feffffff   (6144 kB)
   pci io : 0xffffffc4ff000000 - 0xffffffc4ffffffff   (  16 MB)
  vmemmap : 0xffffffc500000000 - 0xffffffc5ffffffff   (4096 MB)
  vmalloc : 0xffffffc600000000 - 0xffffffd5ffffffff   (  64 GB)
  modules : 0xffffffff01591000 - 0xffffffff7fffffff   (2026 MB)
   lowmem : 0xffffffd600000000 - 0xffffffd607ffffff   ( 128 MB)
    kasan : 0xfffffff700000000 - 0xfffffffeffffffff   (  32 GB)
   kernel : 0xffffffff80000000 - 0xfffffffffffffffe   (2047 MB)


Memory Info Comparison
======================

$ ./start-qemu-rv64.sh Image_rv64ilp32 u64lp64_rootfs.ext2
$ cat /proc/meminfo
MemTotal:         107916 kB
MemFree:           96200 kB
MemAvailable:      95932 kB
Buffers:             448 kB
Cached:             2268 kB
SwapCached:            0 kB
Active:             2432 kB
Inactive:            832 kB
Active(anon):         44 kB
Inactive(anon):      548 kB
Active(file):       2388 kB
Inactive(file):      284 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:               108 kB
Writeback:             0 kB
AnonPages:           648 kB
Mapped:             1672 kB
Shmem:                44 kB
KReclaimable:        688 kB
Slab:               4996 kB
SReclaimable:        688 kB
SUnreclaim:         4308 kB
KernelStack:         768 kB
PageTables:          164 kB
SecPageTables:         0 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:       53956 kB
Committed_AS:       2240 kB
VmallocTotal:     524288 kB
VmallocUsed:         924 kB
VmallocChunk:          0 kB
Percpu:               76 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB

$ ./start-qemu-rv64.sh Image_rv64lp64 u64lp64_rootfs.ext2
$ cat /proc/meminfo
MemTotal:          91428 kB
MemFree:           77048 kB
MemAvailable:      77172 kB
Buffers:             448 kB
Cached:             2268 kB
SwapCached:            0 kB
Active:             2492 kB
Inactive:            768 kB
Active(anon):         48 kB
Inactive(anon):      540 kB
Active(file):       2444 kB
Inactive(file):      228 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:                32 kB
Writeback:             0 kB
AnonPages:           648 kB
Mapped:             1768 kB
Shmem:                44 kB
KReclaimable:       1140 kB
Slab:               7220 kB
SReclaimable:       1140 kB
SUnreclaim:         6080 kB
KernelStack:         768 kB
PageTables:          408 kB
SecPageTables:         0 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:       45712 kB
Committed_AS:       2240 kB
VmallocTotal:   67108864 kB
VmallocUsed:         864 kB
VmallocChunk:          0 kB
Percpu:               88 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB


CPU Info Comparison
===================
With sv48 and sv57 disabled on the lp64-abi kernel (the "no5lvl no4lvl"
parameters in start-qemu-rv64.sh), /proc/cpuinfo is identical for both
kernels.

$ ./start-qemu-rv64.sh Image_rv64lp64 u64lp64_rootfs.ext2
$ cat /proc/cpuinfo
processor       : 0
hart            : 0
isa             : rv64imafdch_zicbom_zicboz_zicntr_zicsr_zifencei_zihintpause_zihpm_zawrs_zfa_zca_zcd_zba_zbb_zbc_zbs_sstc_svadu
mmu             : sv39
mvendorid       : 0x0
marchid         : 0x0
mimpid          : 0x0
hart isa        : rv64imafdch_zicbom_zicboz_zicntr_zicsr_zifencei_zihintpause_zihpm_zawrs_zfa_zca_zcd_zba_zbb_zbc_zbs_sstc_svadu

$ ./start-qemu-rv64.sh Image_rv64ilp32 u64lp64_rootfs.ext2
$ cat /proc/cpuinfo
processor       : 0
hart            : 0
isa             : rv64imafdch_zicbom_zicboz_zicntr_zicsr_zifencei_zihintpause_zihpm_zawrs_zfa_zca_zcd_zba_zbb_zbc_zbs_sstc_svadu
mmu             : sv39
mvendorid       : 0x0
marchid         : 0x0
mimpid          : 0x0
hart isa        : rv64imafdch_zicbom_zicboz_zicntr_zicsr_zifencei_zihintpause_zihpm_zawrs_zfa_zca_zcd_zba_zbb_zbc_zbs_sstc_svadu


.config Difference
==================
The patchset adds CONFIG_ABI_RV64ILP32 to Kconfig. The option depends
on CONFIG_64BIT and switches the compile options from "-mabi=lp64
-melf64lriscv" to "-mabi=ilp32 -melf32lriscv". As a result, the Kconfig
differences between builds with and without ABI_RV64ILP32 are few:

 - CONFIG_PAGE_OFFSET		Changed to 0xc0000000.
 - CONFIG_ILLEGAL_POINTER_VALUE	Changed to 0x0.
 - CONFIG_ABI_RV64ILP32		New option; depends on CONFIG_64BIT.
 - CONFIG_HAVE_CMPXCHG_DOUBLE	Newly supported by the rv64ilp32-abi kernel.
 - CONFIG_ZONE_DMA32		Not needed by the rv64ilp32-abi kernel.
 - CONFIG_CSD_LOCK_WAIT_DEBUG	Not supported by the rv64ilp32-abi kernel
				because BITS_PER_LONG = 32.

$ diff build-rv64lp64/.config build-rv64ilp32/.config
296c296
< CONFIG_PAGE_OFFSET=0xff60000000000000
---
> CONFIG_PAGE_OFFSET=0xc0000000
308c308
< CONFIG_ILLEGAL_POINTER_VALUE=0xdead000000000000
---
> CONFIG_ILLEGAL_POINTER_VALUE=0
352c352
< # CONFIG_ABI_RV64ILP32 is not set
---
> CONFIG_ABI_RV64ILP32=y
609a610
> CONFIG_HAVE_CMPXCHG_DOUBLE=y
837d837
< CONFIG_ZONE_DMA32=y
7240d7239
< # CONFIG_CSD_LOCK_WAIT_DEBUG is not set

6 differences

Use "zcat /proc/config.gz" to get the .config file in our qemu demo.


Dmesg Difference
================

$ diff rv64lp64.log rv64ilp32.log
1c1
< Linux version 6.14.0-rc1-00041-g804ac3b4d679 (ren.guo@ea134-sw12.eng.xrvm.cn) (riscv64-unknown-elf-gcc (gf9ffd92f861-dirty) 13.2.0, GNU ld (GNU Binutils) 2.42) #1 SMP Sat Mar 15 11:57:20 CST 2025
---
> Linux version 6.14.0-rc1-00041-g804ac3b4d679 (ren.guo@ea134-sw12.eng.xrvm.cn) (riscv64-unknown-elf-gcc (gf9ffd92f861-dirty) 13.2.0, GNU ld (GNU Binutils) 2.42) #1 SMP Sat Mar 15 11:55:33 CST 2025
13,14d12
< Disabled 5-level paging
< Disabled 4-level and 5-level paging
19,20c17
<   DMA32    [mem 0x0000000060000000-0x0000000067ffffff]
<   Normal   empty
---
>   Normal   [mem 0x0000000060000000-0x0000000067ffffff]
31c28
< percpu: Embedded 22 pages/cpu s49384 r8192 d32536 u90112
---
> percpu: Embedded 16 pages/cpu s34264 r8192 d23080 u65536
33,35c30,33
< printk: log buffer data + meta data: 131072 + 458752 = 589824 bytes
< Dentry cache hash table entries: 16384 (order: 5, 131072 bytes, linear)
< Inode-cache hash table entries: 8192 (order: 4, 65536 bytes, linear)
---
> Unknown kernel command line parameters "no5lvl no4lvl", will be passed to user space.
> printk: log buffer data + meta data: 131072 + 409600 = 540672 bytes
> Dentry cache hash table entries: 16384 (order: 4, 65536 bytes, linear)
> Inode-cache hash table entries: 8192 (order: 3, 32768 bytes, linear)
40c38
< software IO TLB: mapped [mem 0x0000000067f6b000-0x0000000067fab000] (0MB)
---
> software IO TLB: mapped [mem 0x0000000067f8e000-0x0000000067fce000] (0MB)
42,48c40,46
<       fixmap : 0xffffffc4fea00000 - 0xffffffc4feffffff   (6144 kB)
<       pci io : 0xffffffc4ff000000 - 0xffffffc4ffffffff   (  16 MB)
<      vmemmap : 0xffffffc500000000 - 0xffffffc5ffffffff   (4096 MB)
<      vmalloc : 0xffffffc600000000 - 0xffffffd5ffffffff   (  64 GB)
<      modules : 0xffffffff0158d000 - 0xffffffff7fffffff   (2026 MB)
<       lowmem : 0xffffffd600000000 - 0xffffffd607ffffff   ( 128 MB)
<       kernel : 0xffffffff80000000 - 0xfffffffffffffffe   (2047 MB)
---
>       fixmap : 0x94a00000 - 0x94ffffff   (6144 kB)
>       pci io : 0x95000000 - 0x95ffffff   (  16 MB)
>      vmemmap : 0x96000000 - 0x97ffffff   (  32 MB)
>      vmalloc : 0x98000000 - 0xb7ffffff   ( 512 MB)
>      modules : 0xb8000000 - 0xbbffffff   (  64 MB)
>       lowmem : 0xc0000000 - 0xc7ffffff   ( 128 MB)
>       kernel : 0xbc000000 - 0xbfffffff   (  64 MB)
68,69c66,67
< Mount-cache hash table entries: 512 (order: 0, 4096 bytes, linear)
< Mountpoint-cache hash table entries: 512 (order: 0, 4096 bytes, linear)
---
> Mount-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
> Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
77c75
< Memory: 87584K/131072K available (9731K kernel code, 4933K rwdata, 4096K rodata, 2307K init, 484K bss, 41948K reserved, 0K cma-reserved)
---
> Memory: 104768K/131072K available (10043K kernel code, 4722K rwdata, 4096K rodata, 2265K init, 371K bss, 25420K reserved, 0K cma-reserved)
85d82
< DMA: preallocated 128 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations
88c85
< audit: type=2000 audit(0.104:1): state=initialized audit_enabled=0 res=1
---
> audit: type=2000 audit(0.152:1): state=initialized audit_enabled=0 res=1
92c89
< HugeTLB: 28 KiB vmemmap can be freed for a 2.00 MiB page
---
> HugeTLB: 0 KiB vmemmap can be freed for a 2.00 MiB page
106c103
< tcp_listen_portaddr_hash hash table entries: 128 (order: 0, 4096 bytes, linear)
---
> tcp_listen_portaddr_hash hash table entries: 256 (order: 0, 5120 bytes, linear)
108,109c105,106
< TCP established hash table entries: 1024 (order: 1, 8192 bytes, linear)
< TCP bind hash table entries: 1024 (order: 4, 65536 bytes, linear)
---
> TCP established hash table entries: 1024 (order: 0, 4096 bytes, linear)
> TCP bind hash table entries: 1024 (order: 3, 40960 bytes, linear)
111,112c108,109
< UDP hash table entries: 256 (order: 3, 40960 bytes, linear)
< UDP-Lite hash table entries: 256 (order: 3, 40960 bytes, linear)
---
> UDP hash table entries: 256 (order: 2, 20480 bytes, linear)
> UDP-Lite hash table entries: 256 (order: 2, 20480 bytes, linear)
120c117
< workingset: timestamp_bits=46 max_order=15 bucket_order=0
---
> workingset: timestamp_bits=14 max_order=15 bucket_order=1
165c162
< goldfish_rtc 101000.rtc: setting system clock to 2025-03-15T08:48:58 UTC (1742028538)
---
> goldfish_rtc 101000.rtc: setting system clock to 2025-03-15T08:51:36 UTC (1742028696)
191c188
< Freeing unused kernel image (initmem) memory: 2304K
---
> Freeing unused kernel image (initmem) memory: 2264K

18 differences


References
==========
[1] https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/381
[2] https://github.com/ruyisdk/riscv-gnu-toolchain-rv64ilp32/releases/tag/2025.03.24


Changelog
=========
v3:
 - Base on CONFIG_64BIT instead of CONFIG_32BIT
 - Add lp64-abi userspace support
 - Remove rv64ilp32-abi userspace support
 - Rebase on v6.14-rc1

v2:
https://lore.kernel.org/linux-riscv/20231112061514.2306187-1-guoren@kernel.org/
 - Add u64ilp32 support
 - Rebase on v6.5-rc1
 - Enable 64ilp32 vgettimeofday for benchmarking

v1:
https://lore.kernel.org/linux-riscv/20230518131013.3366406-1-guoren@kernel.org/

Guo Ren (Alibaba DAMO Academy) (43):
  rv64ilp32_abi: uapi: Reuse lp64 ABI interface
  rv64ilp32_abi: riscv: Adapt Makefile and Kconfig
  rv64ilp32_abi: riscv: Adapt ULL & UL definition
  rv64ilp32_abi: riscv: Introduce xlen_t to adapt __riscv_xlen !=
    BITS_PER_LONG
  rv64ilp32_abi: riscv: crc32: Utilize 64-bit width to improve the
    performance
  rv64ilp32_abi: riscv: csum: Utilize 64-bit width to improve the
    performance
  rv64ilp32_abi: riscv: arch_hweight: Adapt cpopw & cpop of zbb
    extension
  rv64ilp32_abi: riscv: bitops: Adapt ctzw & clzw of zbb extension
  rv64ilp32_abi: riscv: Reuse LP64 SBI interface
  rv64ilp32_abi: riscv: Update SATP.MODE.ASID width
  rv64ilp32_abi: riscv: Introduce PTR_L and PTR_S
  rv64ilp32_abi: riscv: Introduce cmpxchg_double
  rv64ilp32_abi: riscv: Correct stackframe layout
  rv64ilp32_abi: riscv: Adapt kernel module code
  rv64ilp32_abi: riscv: mm: Adapt MMU_SV39 for 2GiB address space
  rv64ilp32_abi: riscv: Support physical addresses >= 0x80000000
  rv64ilp32_abi: riscv: Adapt kasan memory layout
  rv64ilp32_abi: riscv: kvm: Initial support
  rv64ilp32_abi: irqchip: irq-riscv-intc: Use xlen_t instead of ulong
  rv64ilp32_abi: drivers/perf: Adapt xlen_t of sbiret
  rv64ilp32_abi: asm-generic: Add custom BITS_PER_LONG definition
  rv64ilp32_abi: bpf: Change KERN_ARENA_SZ to 256MiB
  rv64ilp32_abi: compat: Correct compat_ulong_t cast
  rv64ilp32_abi: compiler_types: Add "long long" into __native_word()
  rv64ilp32_abi: exec: Adapt 64lp64 env and argv
  rv64ilp32_abi: file_ref: Use 32-bit width for refcnt
  rv64ilp32_abi: input: Adapt BITS_PER_LONG to dword
  rv64ilp32_abi: iov_iter: Resize kvec to match iov_iter's size
  rv64ilp32_abi: locking/atomic: Use BITS_PER_LONG for scripts
  rv64ilp32_abi: kernel/smp: Disable CSD_LOCK_WAIT_DEBUG
  rv64ilp32_abi: maple_tree: Use BITS_PER_LONG instead of CONFIG_64BIT
  rv64ilp32_abi: mm: Remove _folio_nr_pages
  rv64ilp32_abi: mm/auxvec: Adapt mm->saved_auxv[] to Elf64
  rv64ilp32_abi: mm: Adapt vm_flags_t struct
  rv64ilp32_abi: net: Use BITS_PER_LONG in struct dst_entry
  rv64ilp32_abi: printf: Use BITS_PER_LONG instead of CONFIG_64BIT
  rv64ilp32_abi: random: Adapt fast_pool struct
  rv64ilp32_abi: syscall: Use CONFIG_64BIT instead of BITS_PER_LONG
  rv64ilp32_abi: sysinfo: Adapt sysinfo structure to lp64 uapi
  rv64ilp32_abi: tracepoint-defs: Using u64 for trace_print_flags.mask
  rv64ilp32_abi: tty: Adapt ptr_to_compat
  rv64ilp32_abi: memfd: Use vm_flag_t
  riscv: Fixup address space overlay of print_mlk

 arch/riscv/Kconfig                            |  15 +-
 arch/riscv/Makefile                           |  17 ++
 arch/riscv/configs/rv64ilp32.config           |   1 +
 arch/riscv/include/asm/arch_hweight.h         |   8 +-
 arch/riscv/include/asm/asm.h                  |  13 +-
 arch/riscv/include/asm/bitops.h               |  21 +-
 arch/riscv/include/asm/checksum.h             |   4 +
 arch/riscv/include/asm/cmpxchg.h              |  57 ++++-
 arch/riscv/include/asm/cpu_ops_sbi.h          |   4 +-
 arch/riscv/include/asm/csr.h                  | 227 +++++++++---------
 arch/riscv/include/asm/kasan.h                |   6 +-
 arch/riscv/include/asm/kvm_aia.h              |  32 +--
 arch/riscv/include/asm/kvm_host.h             | 192 +++++++--------
 arch/riscv/include/asm/kvm_nacl.h             |  26 +-
 arch/riscv/include/asm/kvm_vcpu_insn.h        |   4 +-
 arch/riscv/include/asm/kvm_vcpu_pmu.h         |   8 +-
 arch/riscv/include/asm/kvm_vcpu_sbi.h         |   4 +-
 arch/riscv/include/asm/page.h                 |  23 +-
 arch/riscv/include/asm/pgtable-64.h           |  55 +++--
 arch/riscv/include/asm/pgtable.h              |  60 ++++-
 arch/riscv/include/asm/processor.h            |  12 +-
 arch/riscv/include/asm/ptrace.h               |  92 +++----
 arch/riscv/include/asm/sbi.h                  |  32 +--
 arch/riscv/include/asm/scs.h                  |   4 +-
 arch/riscv/include/asm/sparsemem.h            |   2 +-
 arch/riscv/include/asm/stacktrace.h           |   6 +
 arch/riscv/include/asm/switch_to.h            |   4 +-
 arch/riscv/include/asm/syscall_table.h        |   2 +-
 arch/riscv/include/asm/thread_info.h          |   2 +-
 arch/riscv/include/asm/timex.h                |   4 +-
 arch/riscv/include/asm/unistd.h               |   4 +-
 arch/riscv/include/uapi/asm/bitsperlong.h     |   6 +
 arch/riscv/include/uapi/asm/elf.h             |   4 +-
 arch/riscv/include/uapi/asm/kvm.h             |  56 ++---
 arch/riscv/include/uapi/asm/ptrace.h          |  97 ++++----
 arch/riscv/include/uapi/asm/ucontext.h        |   7 +-
 arch/riscv/include/uapi/asm/unistd.h          |   2 +-
 arch/riscv/kernel/compat_signal.c             |   4 +-
 arch/riscv/kernel/cpu.c                       |   4 +-
 arch/riscv/kernel/cpu_ops_sbi.c               |   4 +-
 arch/riscv/kernel/entry.S                     |  32 +--
 arch/riscv/kernel/head.S                      | 120 ++++++++-
 arch/riscv/kernel/module.c                    |   2 +-
 arch/riscv/kernel/process.c                   |   8 +-
 arch/riscv/kernel/sbi_ecall.c                 |  22 +-
 arch/riscv/kernel/signal.c                    |   4 +-
 arch/riscv/kernel/traps.c                     |   4 +-
 arch/riscv/kernel/vector.c                    |   2 +-
 arch/riscv/kvm/aia.c                          |  26 +-
 arch/riscv/kvm/aia_imsic.c                    |   6 +-
 arch/riscv/kvm/main.c                         |   2 +-
 arch/riscv/kvm/mmu.c                          |  10 +-
 arch/riscv/kvm/tlb.c                          |  76 +++---
 arch/riscv/kvm/vcpu.c                         |  10 +-
 arch/riscv/kvm/vcpu_exit.c                    |   4 +-
 arch/riscv/kvm/vcpu_insn.c                    |  12 +-
 arch/riscv/kvm/vcpu_onereg.c                  |  18 +-
 arch/riscv/kvm/vcpu_pmu.c                     |   8 +-
 arch/riscv/kvm/vcpu_sbi_base.c                |   2 +-
 arch/riscv/kvm/vmid.c                         |   4 +-
 arch/riscv/lib/crc32-riscv.c                  |  35 +--
 arch/riscv/lib/csum.c                         |  48 ++--
 arch/riscv/mm/context.c                       |  12 +-
 arch/riscv/mm/fault.c                         |  12 +-
 arch/riscv/mm/init.c                          |  63 +++--
 arch/riscv/mm/kasan_init.c                    |   2 +-
 arch/riscv/mm/pageattr.c                      |   4 +-
 arch/riscv/mm/pgtable.c                       |   2 +-
 arch/riscv/net/bpf_jit_comp64.c               |   6 +-
 drivers/char/random.c                         |   8 +
 drivers/input/input.c                         |   4 +
 drivers/irqchip/irq-riscv-intc.c              |   9 +-
 drivers/perf/riscv_pmu_sbi.c                  |   4 +-
 drivers/tty/tty_io.c                          |   4 +
 fs/exec.c                                     |   4 +
 fs/proc/loadavg.c                             |  10 +-
 fs/proc/task_mmu.c                            |   9 +-
 include/asm-generic/bitsperlong.h             |   2 +
 include/asm-generic/module.h                  |   2 +-
 include/linux/atomic/atomic-long.h            | 174 +++++++-------
 include/linux/compiler_types.h                |   7 +
 include/linux/file_ref.h                      |   4 +-
 include/linux/maple_tree.h                    |   2 +-
 include/linux/memfd.h                         |   4 +-
 include/linux/mm.h                            |  14 +-
 include/linux/mm_types.h                      |  10 +-
 include/linux/sched/loadavg.h                 |   4 +
 include/linux/smp_types.h                     |   2 +-
 include/linux/socket.h                        |  35 +++
 include/linux/tracepoint-defs.h               |   4 +
 include/linux/uio.h                           |   6 +
 include/net/dst.h                             |   6 +-
 include/uapi/asm-generic/siginfo.h            |  50 ++++
 include/uapi/asm-generic/signal.h             |  35 +++
 include/uapi/asm-generic/stat.h               |  25 ++
 include/uapi/linux/atm.h                      |   7 +
 include/uapi/linux/atmdev.h                   |   7 +
 include/uapi/linux/auto_fs.h                  |   6 +
 include/uapi/linux/blkpg.h                    |   7 +
 include/uapi/linux/btrfs.h                    |  19 ++
 include/uapi/linux/capi.h                     |  11 +
 include/uapi/linux/fs.h                       |  12 +
 include/uapi/linux/futex.h                    |  18 ++
 include/uapi/linux/if.h                       |   6 +
 include/uapi/linux/netfilter/x_tables.h       |   8 +
 include/uapi/linux/netfilter_ipv4/ip_tables.h |   7 +
 include/uapi/linux/nfs4_mount.h               |  14 ++
 include/uapi/linux/ppp-ioctl.h                |   7 +
 include/uapi/linux/sctp.h                     |   3 +
 include/uapi/linux/sem.h                      |  38 +++
 include/uapi/linux/socket.h                   |   7 +
 include/uapi/linux/sysctl.h                   |  32 +++
 include/uapi/linux/sysinfo.h                  |  20 ++
 include/uapi/linux/uhid.h                     |   7 +
 include/uapi/linux/uio.h                      |  11 +
 include/uapi/linux/usb/tmc.h                  |  14 ++
 include/uapi/linux/usbdevice_fs.h             |  50 ++++
 include/uapi/linux/uvcvideo.h                 |  14 ++
 include/uapi/linux/vfio.h                     |   7 +
 include/uapi/linux/videodev2.h                |   7 +
 kernel/bpf/arena.c                            |  19 +-
 kernel/compat.c                               |  15 +-
 kernel/sched/loadavg.c                        |   4 +
 kernel/sys.c                                  |   8 +
 lib/Kconfig.debug                             |   1 +
 lib/vsprintf.c                                |   2 +-
 mm/debug.c                                    |   4 +
 mm/internal.h                                 |   2 +-
 mm/memfd.c                                    |   8 +-
 mm/memory.c                                   |   4 +
 scripts/atomic/gen-atomic-long.sh             |   4 +-
 scripts/checksyscalls.sh                      |   2 +-
 132 files changed, 1728 insertions(+), 803 deletions(-)
 create mode 100644 arch/riscv/configs/rv64ilp32.config

Comments

Guo Ren March 25, 2025, 1:13 p.m. UTC | #1
On Tue, Mar 25, 2025 at 8:27 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Tue, Mar 25, 2025 at 08:15:41AM -0400, guoren@kernel.org wrote:
> > From: "Guo Ren (Alibaba DAMO Academy)" <guoren@kernel.org>
> >
> > Since 2001, the CONFIG_64BIT kernel has been built with the LP64 ABI,
> > but this patchset allows the CONFIG_64BIT kernel to use an ILP32 ABI
>
> I'm thinking you're going to be finding a metric ton of assumptions
> about 'unsigned long' being 64bit when 64BIT=y throughout the kernel.
Fewer than you imagine. Most code is already compatible with the ILP32
ABI because of CONFIG_32BIT. In my experience, it's acceptable.

>
> I know of a couple of places where 64BIT will result in different math
> such that a 32bit 'unsigned long' will trivially overflow.
I would be grateful if you could share some with me.

>
> Please, don't do this. This adds a significant maintenance burden on all
> of us.
The 64ILP32 ABI would bear the maintenance burden, not traditional
64-bit or 32-bit ABIs. The patch set won't impact other CONFIG_64BIT
or CONFIG_32BIT. Numerous RV64 chips require the RV64ILP32 ABI to
reduce the memory and cache footprint; we will bear the burden. The
core code maintainers would receive patches that would make them use
BITS_PER_LONG and CONFIG_64BIT more accurately.
Arnd Bergmann March 25, 2025, 1:17 p.m. UTC | #2
On Tue, Mar 25, 2025, at 13:26, Peter Zijlstra wrote:
> On Tue, Mar 25, 2025 at 08:15:41AM -0400, guoren@kernel.org wrote:
>> From: "Guo Ren (Alibaba DAMO Academy)" <guoren@kernel.org>
>> 
>> Since 2001, the CONFIG_64BIT kernel has been built with the LP64 ABI,
>> but this patchset allows the CONFIG_64BIT kernel to use an ILP32 ABI
>
> Please, don't do this. This adds a significant maintenance burden on all
> of us.

It would be easier to do this with CONFIG_64BIT disabled and continue
treating CONFIG_64BIT to be the same as BITS_PER_LONG=64, but I still
think it's fundamentally a bad idea to support this in mainline
kernels in any variation, other than supporting regular 32-bit
compat mode tasks on a regular 64-bit kernel.

>> The patchset targets RISC-V and is built on the RV64ILP32 ABI, which
>> was introduced into RISC-V's psABI in January 2025 [1]. This patchset
>> equips an rv64ilp32-abi kernel with all the functionalities of a
>> traditional lp64-abi kernel, yet restricts the address space to 2GiB.
>> Hence, the rv64ilp32-abi kernel simultaneously supports lp64-abi
>> userspace and ilp32-abi (compat) userspace, the same as the
>> traditional lp64-abi kernel.

You declare the syscall ABI to be the native 64-bit ABI, but this
is fundamentally not true because many uapi structures are
defined in terms of 'long' or pointer values, in particular in
the ioctl call. This might work for an rv64ilp32 userspace that
uses the same headers and the same types, but you explicitly
say that the goal is to run native rv64 or compat rv32 tasks,
not rv64ilp32 (thanks!).

As far as I can tell, there is no way to rectify this design flaw
other than to drop support for 64-bit userspace and only support
regular rv32 userspace. I'm also skeptical that supporting rv64
userspace helps in practice other than for testing, since
generally most memory overhead is in userspace rather than the
kernel, and there is much more to gain from shrinking the larger
userspace by running rv32 compat mode binaries on a 64-bit kernel
than the other way round.

If you remove the CONFIG_64BIT changes that Peter mentioned and
the support for lp64 userland from your series, you end up
with a kernel that is very similar to a native rv32 kernel
but executes as rv64ilp32 and runs rv32 userspace. I don't have
any objections to that approach, and the same thing has come
up on arm64 as a possible idea as well, but I don't know if
that actually brings any notable advantage over an rv32 kernel.

Are there CPUs that can run rv64 kernels and rv32 userspace
but not rv32 kernels, similar to what we have on Arm Cortex-A76
and Cortex-A510?

       Arnd
Sergey Shtylyov March 25, 2025, 5:19 p.m. UTC | #3
On 3/25/25 3:16 PM, guoren@kernel.org wrote:

> From: "Guo Ren (Alibaba DAMO Academy)" <guoren@kernel.org>
> 
> The rv64ilp32 abi reuses the env and argv memory layout of the
> lp64 abi, so leave the space to fit the lp64 struct layout.
> 
> Signed-off-by: Guo Ren (Alibaba DAMO Academy) <guoren@kernel.org>
> ---
>  fs/exec.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/fs/exec.c b/fs/exec.c
> index 506cd411f4ac..548d18b7ae92 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -424,6 +424,10 @@ static const char __user *get_user_arg_ptr(struct user_arg_ptr argv, int nr)
>  	}
>  #endif
>  
> +#if defined(CONFIG_64BIT) && (BITS_PER_LONG == 32)

   Parens don't seem necessary...

> +	nr = nr * 2;

   Why not nr *= 2?

[...]

MBR, Sergey
David Hildenbrand March 25, 2025, 6:51 p.m. UTC | #4
On 25.03.25 13:26, Peter Zijlstra wrote:
> On Tue, Mar 25, 2025 at 08:15:41AM -0400, guoren@kernel.org wrote:
>> From: "Guo Ren (Alibaba DAMO Academy)" <guoren@kernel.org>
>>
>> Since 2001, the CONFIG_64BIT kernel has been built with the LP64 ABI,
>> but this patchset allows the CONFIG_64BIT kernel to use an ILP32 ABI
> 
> I'm thinking you're going to be finding a metric ton of assumptions
> about 'unsigned long' being 64bit when 64BIT=y throughout the kernel.
> 
> I know of a couple of places where 64BIT will result in different math
> such that a 32bit 'unsigned long' will trivially overflow.
> 
> Please, don't do this. This adds a significant maintenance burden on all
> of us.
> 

Fully agreed.
Liam R. Howlett March 25, 2025, 7:09 p.m. UTC | #5
* guoren@kernel.org <guoren@kernel.org> [250325 08:24]:
> From: "Guo Ren (Alibaba DAMO Academy)" <guoren@kernel.org>
> 
> The Maple tree algorithm uses ulong type for each element. The
> number of slots is based on BITS_PER_LONG for RV64ILP32 ABI, so
> use BITS_PER_LONG instead of CONFIG_64BIT.
> 
> Signed-off-by: Guo Ren (Alibaba DAMO Academy) <guoren@kernel.org>
> ---
>  include/linux/maple_tree.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h
> index cbbcd18d4186..ff6265b6468b 100644
> --- a/include/linux/maple_tree.h
> +++ b/include/linux/maple_tree.h
> @@ -24,7 +24,7 @@
>   *
>   * Nodes in the tree point to their parent unless bit 0 is set.
>   */
> -#if defined(CONFIG_64BIT) || defined(BUILD_VDSO32_64)
> +#if (BITS_PER_LONG == 64) || defined(BUILD_VDSO32_64)

This will break my userspace testing, if you do not update the testing as
well.  This can be found in tools/testing/radix-tree.  Please also look
at the Makefile as well since it will generate a build flag for the
userspace.

This raises other concerns: the code is easy to find with grep, so
I'm not sure why it was missed, or whether anything else was missed.

If you consider this email to be the (unasked) question about what to do
here, then please CC me, the maintainer of the files including the one
you are updating here.

Thank you,
Liam
Liam R. Howlett March 25, 2025, 7:23 p.m. UTC | #6
* David Hildenbrand <david@redhat.com> [250325 14:52]:
> On 25.03.25 13:26, Peter Zijlstra wrote:
> > On Tue, Mar 25, 2025 at 08:15:41AM -0400, guoren@kernel.org wrote:
> > > From: "Guo Ren (Alibaba DAMO Academy)" <guoren@kernel.org>
> > > 
> > > Since 2001, the CONFIG_64BIT kernel has been built with the LP64 ABI,
> > > but this patchset allows the CONFIG_64BIT kernel to use an ILP32 ABI
> > 
> > I'm thinking you're going to be finding a metric ton of assumptions
> > about 'unsigned long' being 64bit when 64BIT=y throughout the kernel.
> > 
> > I know of a couple of places where 64BIT will result in different math
> > such that a 32bit 'unsigned long' will trivially overflow.
> > 
> > Please, don't do this. This adds a significant maintenance burden on all
> > of us.
> > 
> 
> Fully agreed.

I would go further and say I do not want this to go in.

The open ended maintenance burden is not worth extending hardware life
of a board with 16mb of ram (If I understand your 2023 LPC slides
correctly).

Thank you,
Liam
Jan Engelhardt March 25, 2025, 8:30 p.m. UTC | #7
On Tuesday 2025-03-25 13:15, guoren@kernel.org wrote:

>diff --git a/include/uapi/linux/netfilter/x_tables.h b/include/uapi/linux/netfilter/x_tables.h
>index 796af83a963a..7e02e34c6fad 100644
>--- a/include/uapi/linux/netfilter/x_tables.h
>+++ b/include/uapi/linux/netfilter/x_tables.h
>@@ -18,7 +18,11 @@ struct xt_entry_match {
> 			__u8 revision;
> 		} user;
> 		struct {
>+#if __riscv_xlen == 64
>+			__u64 match_size;
>+#else
> 			__u16 match_size;
>+#endif
> 
> 			/* Used inside the kernel */
> 			struct xt_match *match;

The __u16 is the common prefix of the union which is exposed to userspace.
If anything, you need to use __attribute__((aligned(8))) to move
`match` to a fixed location.

However, that sub-struct is only used inside the kernel and never exposed,
so the alignment of `match` should not play a role.

Moreover, changing from u16 to u64 would break RISC-V big-endian. Even if there
is currently no big-endian variant, let's not introduce such breakage.


>--- a/include/uapi/linux/netfilter_ipv4/ip_tables.h
>+++ b/include/uapi/linux/netfilter_ipv4/ip_tables.h
>@@ -200,7 +200,14 @@ struct ipt_replace {
> 	/* Number of counters (must be equal to current number of entries). */
> 	unsigned int num_counters;
> 	/* The old entries' counters. */
>+#if __riscv_xlen == 64
>+	union {
>+		struct xt_counters __user *counters;
>+		__u64 __counters;
>+	};
>+#else
> 	struct xt_counters __user *counters;
>+#endif
> 
> 	/* The entries (hang off end: not really an array). */
> 	struct ipt_entry entries[];

This seems ok, but perhaps there is a better name for __riscv_xlen (ifdef
CONFIG_????ilp32), so it is not strictly tied to riscv,
in case another platform wants to try ilp32-self mode.

>+#if __riscv_xlen == 64
>+	union {
>+		int __user *auth_flavours;		/* 1 */
>+		__u64 __auth_flavours;
>+	};
>+#else
> 	int __user *auth_flavours;		/* 1 */
>+#endif
> };
> 
> /* bits in the flags field */
>diff --git a/include/uapi/linux/ppp-ioctl.h b/include/uapi/linux/ppp-ioctl.h
>index 1cc5ce0ae062..8d48eab430c1 100644
>--- a/include/uapi/linux/ppp-ioctl.h
>+++ b/include/uapi/linux/ppp-ioctl.h
>@@ -59,7 +59,14 @@ struct npioctl {
> 
> /* Structure describing a CCP configuration option, for PPPIOCSCOMPRESS */
> struct ppp_option_data {
>+#if __riscv_xlen == 64
>+	union {
>+		__u8	__user *ptr;
>+		__u64	__ptr;
>+	};
>+#else
> 	__u8	__user *ptr;
>+#endif
> 	__u32	length;
> 	int	transmit;
> };
>diff --git a/include/uapi/linux/sctp.h b/include/uapi/linux/sctp.h
>index b7d91d4cf0db..46a06fddcd2f 100644
>--- a/include/uapi/linux/sctp.h
>+++ b/include/uapi/linux/sctp.h
>@@ -1024,6 +1024,9 @@ struct sctp_getaddrs_old {
> #else
> 	struct sockaddr		*addrs;
> #endif
>+#if (__riscv_xlen == 64) && (__SIZEOF_LONG__ == 4)
>+	__u32			unused;
>+#endif
> };


> 
> struct sctp_getaddrs {
>diff --git a/include/uapi/linux/sem.h b/include/uapi/linux/sem.h
>index 75aa3b273cd9..de9f441913cd 100644
>--- a/include/uapi/linux/sem.h
>+++ b/include/uapi/linux/sem.h
>@@ -26,10 +26,29 @@ struct semid_ds {
> 	struct ipc_perm	sem_perm;		/* permissions .. see ipc.h */
> 	__kernel_old_time_t sem_otime;		/* last semop time */
> 	__kernel_old_time_t sem_ctime;		/* create/last semctl() time */
>+#if __riscv_xlen == 64
>+	union {
>+		struct sem	*sem_base;		/* ptr to first semaphore in array */
>+		__u64 __sem_base;
>+	};
>+	union {
>+		struct sem_queue *sem_pending;		/* pending operations to be processed */
>+		__u64 __sem_pending;
>+	};
>+	union {
>+		struct sem_queue **sem_pending_last;	/* last pending operation */
>+		__u64 __sem_pending_last;
>+	};
>+	union {
>+		struct sem_undo	*undo;			/* undo requests on this array */
>+		__u64 __undo;
>+	};
>+#else
> 	struct sem	*sem_base;		/* ptr to first semaphore in array */
> 	struct sem_queue *sem_pending;		/* pending operations to be processed */
> 	struct sem_queue **sem_pending_last;	/* last pending operation */
> 	struct sem_undo	*undo;			/* undo requests on this array */
>+#endif
> 	unsigned short	sem_nsems;		/* no. of semaphores in array */
> };
> 
>@@ -46,10 +65,29 @@ struct sembuf {
> /* arg for semctl system calls. */
> union semun {
> 	int val;			/* value for SETVAL */
>+#if __riscv_xlen == 64
>+	union {
>+		struct semid_ds __user *buf;	/* buffer for IPC_STAT & IPC_SET */
>+		__u64 ___buf;
>+	};
>+	union {
>+		unsigned short __user *array;	/* array for GETALL & SETALL */
>+		__u64 __array;
>+	};
>+	union {
>+		struct seminfo __user *__buf;	/* buffer for IPC_INFO */
>+		__u64 ____buf;
>+	};
>+	union {
>+		void __user *__pad;
>+		__u64 ____pad;
>+	};
>+#else
> 	struct semid_ds __user *buf;	/* buffer for IPC_STAT & IPC_SET */
> 	unsigned short __user *array;	/* array for GETALL & SETALL */
> 	struct seminfo __user *__buf;	/* buffer for IPC_INFO */
> 	void __user *__pad;
>+#endif
> };
> 
> struct  seminfo {
>diff --git a/include/uapi/linux/socket.h b/include/uapi/linux/socket.h
>index d3fcd3b5ec53..5f7a83649395 100644
>--- a/include/uapi/linux/socket.h
>+++ b/include/uapi/linux/socket.h
>@@ -22,7 +22,14 @@ struct __kernel_sockaddr_storage {
> 				/* space to achieve desired size, */
> 				/* _SS_MAXSIZE value minus size of ss_family */
> 		};
>+#if __riscv_xlen == 64
>+		union {
>+			void *__align; /* implementation specific desired alignment */
>+			u64 ___align;
>+		};
>+#else
> 		void *__align; /* implementation specific desired alignment */
>+#endif
> 	};
> };
> 
>diff --git a/include/uapi/linux/sysctl.h b/include/uapi/linux/sysctl.h
>index 8981f00204db..8ed7b29897f9 100644
>--- a/include/uapi/linux/sysctl.h
>+++ b/include/uapi/linux/sysctl.h
>@@ -33,13 +33,45 @@
> 				   member of a struct __sysctl_args to have? */
> 
> struct __sysctl_args {
>+#if __riscv_xlen == 64
>+	union {
>+		int __user *name;
>+		__u64 __name;
>+	};
>+#else
> 	int __user *name;
>+#endif
> 	int nlen;
>+#if __riscv_xlen == 64
>+	union {
>+		void __user *oldval;
>+		__u64 __oldval;
>+	};
>+#else
> 	void __user *oldval;
>+#endif
>+#if __riscv_xlen == 64
>+	union {
>+		size_t __user *oldlenp;
>+		__u64 __oldlenp;
>+	};
>+#else
> 	size_t __user *oldlenp;
>+#endif
>+#if __riscv_xlen == 64
>+	union {
>+		void __user *newval;
>+		__u64 __newval;
>+	};
>+#else
> 	void __user *newval;
>+#endif
> 	size_t newlen;
>+#if __riscv_xlen == 64
>+	unsigned long long __unused[4];
>+#else
> 	unsigned long __unused[4];
>+#endif
> };
> 
> /* Define sysctl names first */
>diff --git a/include/uapi/linux/uhid.h b/include/uapi/linux/uhid.h
>index cef7534d2d19..4a774dbd3de8 100644
>--- a/include/uapi/linux/uhid.h
>+++ b/include/uapi/linux/uhid.h
>@@ -130,7 +130,14 @@ struct uhid_create_req {
> 	__u8 name[128];
> 	__u8 phys[64];
> 	__u8 uniq[64];
>+#if __riscv_xlen == 64
>+	union {
>+		__u8 __user *rd_data;
>+		__u64 __rd_data;
>+	};
>+#else
> 	__u8 __user *rd_data;
>+#endif
> 	__u16 rd_size;
> 
> 	__u16 bus;
>diff --git a/include/uapi/linux/uio.h b/include/uapi/linux/uio.h
>index 649739e0c404..27dfd6032dc6 100644
>--- a/include/uapi/linux/uio.h
>+++ b/include/uapi/linux/uio.h
>@@ -16,8 +16,19 @@
> 
> struct iovec
> {
>+#if __riscv_xlen == 64
>+	union {
>+		void __user *iov_base;	/* BSD uses caddr_t (1003.1g requires void *) */
>+		__u64 __iov_base;
>+	};
>+	union {
>+		__kernel_size_t iov_len; /* Must be size_t (1003.1g) */
>+		__u64 __iov_len;
>+	};
>+#else
> 	void __user *iov_base;	/* BSD uses caddr_t (1003.1g requires void *) */
> 	__kernel_size_t iov_len; /* Must be size_t (1003.1g) */
>+#endif
> };
> 
> struct dmabuf_cmsg {
>diff --git a/include/uapi/linux/usb/tmc.h b/include/uapi/linux/usb/tmc.h
>index d791cc58a7f0..443ec5356caf 100644
>--- a/include/uapi/linux/usb/tmc.h
>+++ b/include/uapi/linux/usb/tmc.h
>@@ -51,7 +51,14 @@ struct usbtmc_request {
> 
> struct usbtmc_ctrlrequest {
> 	struct usbtmc_request req;
>+#if __riscv_xlen == 64
>+	union {
>+		void __user *data; /* pointer to user space */
>+		__u64 __data; /* pointer to user space */
>+	};
>+#else
> 	void __user *data; /* pointer to user space */
>+#endif
> } __attribute__ ((packed));
> 
> struct usbtmc_termchar {
>@@ -70,7 +77,14 @@ struct usbtmc_message {
> 	__u32 transfer_size; /* size of bytes to transfer */
> 	__u32 transferred; /* size of received/written bytes */
> 	__u32 flags; /* bit 0: 0 = synchronous; 1 = asynchronous */
>+#if __riscv_xlen == 64
>+	union {
>+		void __user *message; /* pointer to header and data in user space */
>+		__u64 __message;
>+	};
>+#else
> 	void __user *message; /* pointer to header and data in user space */
>+#endif
> } __attribute__ ((packed));
> 
> /* Request values for USBTMC driver's ioctl entry point */
>diff --git a/include/uapi/linux/usbdevice_fs.h b/include/uapi/linux/usbdevice_fs.h
>index 74a84e02422a..8c8efef74c3c 100644
>--- a/include/uapi/linux/usbdevice_fs.h
>+++ b/include/uapi/linux/usbdevice_fs.h
>@@ -44,14 +44,28 @@ struct usbdevfs_ctrltransfer {
> 	__u16 wIndex;
> 	__u16 wLength;
> 	__u32 timeout;  /* in milliseconds */
>+#if __riscv_xlen == 64
>+	union {
>+		void __user *data;
>+		__u64 __data;
>+	};
>+#else
>  	void __user *data;
>+#endif
> };
> 
> struct usbdevfs_bulktransfer {
> 	unsigned int ep;
> 	unsigned int len;
> 	unsigned int timeout; /* in milliseconds */
>+#if __riscv_xlen == 64
>+	union {
>+		void __user *data;
>+		__u64 __data;
>+	};
>+#else
> 	void __user *data;
>+#endif
> };
> 
> struct usbdevfs_setinterface {
>@@ -61,7 +75,14 @@ struct usbdevfs_setinterface {
> 
> struct usbdevfs_disconnectsignal {
> 	unsigned int signr;
>+#if __riscv_xlen == 64
>+	union {
>+		void __user *context;
>+		__u64 __context;
>+	};
>+#else
> 	void __user *context;
>+#endif
> };
> 
> #define USBDEVFS_MAXDRIVERNAME 255
>@@ -119,7 +140,14 @@ struct usbdevfs_urb {
> 	unsigned char endpoint;
> 	int status;
> 	unsigned int flags;
>+#if __riscv_xlen == 64
>+	union {
>+		void __user *buffer;
>+		__u64 __buffer;
>+	};
>+#else
> 	void __user *buffer;
>+#endif
> 	int buffer_length;
> 	int actual_length;
> 	int start_frame;
>@@ -130,7 +158,14 @@ struct usbdevfs_urb {
> 	int error_count;
> 	unsigned int signr;	/* signal to be sent on completion,
> 				  or 0 if none should be sent. */
>+#if __riscv_xlen == 64
>+	union {
>+		void __user *usercontext;
>+		__u64 __usercontext;
>+	};
>+#else
> 	void __user *usercontext;
>+#endif
> 	struct usbdevfs_iso_packet_desc iso_frame_desc[];
> };
> 
>@@ -139,7 +174,14 @@ struct usbdevfs_ioctl {
> 	int	ifno;		/* interface 0..N ; negative numbers reserved */
> 	int	ioctl_code;	/* MUST encode size + direction of data so the
> 				 * macros in <asm/ioctl.h> give correct values */
>+#if __riscv_xlen == 64
>+	union {
>+		void __user *data;	/* param buffer (in, or out) */
>+		__u64 __pad;
>+	};
>+#else
> 	void __user *data;	/* param buffer (in, or out) */
>+#endif
> };
> 
> /* You can do most things with hubs just through control messages,
>@@ -195,9 +237,17 @@ struct usbdevfs_streams {
> #define USBDEVFS_SUBMITURB         _IOR('U', 10, struct usbdevfs_urb)
> #define USBDEVFS_SUBMITURB32       _IOR('U', 10, struct usbdevfs_urb32)
> #define USBDEVFS_DISCARDURB        _IO('U', 11)
>+#if __riscv_xlen == 64
>+#define USBDEVFS_REAPURB           _IOW('U', 12, __u64)
>+#else
> #define USBDEVFS_REAPURB           _IOW('U', 12, void *)
>+#endif
> #define USBDEVFS_REAPURB32         _IOW('U', 12, __u32)
>+#if __riscv_xlen == 64
>+#define USBDEVFS_REAPURBNDELAY     _IOW('U', 13, __u64)
>+#else
> #define USBDEVFS_REAPURBNDELAY     _IOW('U', 13, void *)
>+#endif
> #define USBDEVFS_REAPURBNDELAY32   _IOW('U', 13, __u32)
> #define USBDEVFS_DISCSIGNAL        _IOR('U', 14, struct usbdevfs_disconnectsignal)
> #define USBDEVFS_DISCSIGNAL32      _IOR('U', 14, struct usbdevfs_disconnectsignal32)
>diff --git a/include/uapi/linux/uvcvideo.h b/include/uapi/linux/uvcvideo.h
>index f86185456dc5..3ccb99039a43 100644
>--- a/include/uapi/linux/uvcvideo.h
>+++ b/include/uapi/linux/uvcvideo.h
>@@ -54,7 +54,14 @@ struct uvc_xu_control_mapping {
> 	__u32 v4l2_type;
> 	__u32 data_type;
> 
>+#if __riscv_xlen == 64
>+	union {
>+		struct uvc_menu_info __user *menu_info;
>+		__u64 __menu_info;
>+	};
>+#else
> 	struct uvc_menu_info __user *menu_info;
>+#endif
> 	__u32 menu_count;
> 
> 	__u32 reserved[4];
>@@ -66,7 +73,14 @@ struct uvc_xu_control_query {
> 	__u8 query;		/* Video Class-Specific Request Code, */
> 				/* defined in linux/usb/video.h A.8.  */
> 	__u16 size;
>+#if __riscv_xlen == 64
>+	union {
>+		__u8 __user *data;
>+		__u64 __data;
>+	};
>+#else
> 	__u8 __user *data;
>+#endif
> };
> 
> #define UVCIOC_CTRL_MAP		_IOWR('u', 0x20, struct uvc_xu_control_mapping)
>diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>index c8dbf8219c4f..0a1dc2a780fb 100644
>--- a/include/uapi/linux/vfio.h
>+++ b/include/uapi/linux/vfio.h
>@@ -1570,7 +1570,14 @@ struct vfio_iommu_type1_dma_map {
> struct vfio_bitmap {
> 	__u64        pgsize;	/* page size for bitmap in bytes */
> 	__u64        size;	/* in bytes */
>+	#if __riscv_xlen == 64
>+	union {
>+		__u64 __user *data;	/* one bit per page */
>+		__u64 __data;
>+	};
>+	#else
> 	__u64 __user *data;	/* one bit per page */
>+	#endif
> };
> 
> /**
>diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h
>index e7c4dce39007..8e5391f07626 100644
>--- a/include/uapi/linux/videodev2.h
>+++ b/include/uapi/linux/videodev2.h
>@@ -1898,7 +1898,14 @@ struct v4l2_ext_controls {
> 	__u32 error_idx;
> 	__s32 request_fd;
> 	__u32 reserved[1];
>+#if __riscv_xlen == 64
>+	union {
>+		struct v4l2_ext_control *controls;
>+		__u64 __controls;
>+	};
>+#else
> 	struct v4l2_ext_control *controls;
>+#endif
> };
> 
> #define V4L2_CTRL_ID_MASK	  (0x0fffffff)
>-- 
>2.40.1
>
>
Linus Torvalds March 25, 2025, 8:41 p.m. UTC | #8
On Tue, 25 Mar 2025 at 05:17, <guoren@kernel.org> wrote:
>
> The rv64ilp32 abi kernel accommodates the lp64 abi userspace and
> leverages the lp64 abi Linux interface. Hence, unify the
> BITS_PER_LONG = 32 memory layout to match BITS_PER_LONG = 64.

No.

This isn't happening.

You can't do crazy things in the RISC-V code and then expect the rest
of the kernel to just go "ok, we'll do crazy things".

We're not doing crazy __riscv_xlen hackery with random structures
containing 64-bit values that the kernel then only looks at the low 32
bits. That's wrong on *so* many levels.

I'm willing to say "big-endian is dead", but I'm not willing to accept
this kind of crazy hackery.

Not today, not ever.

If you want to run an ilp32 kernel on 64-bit hardware (and support
64-bit ABI just in a 32-bit virtual memory size), I would suggest you

 (a) treat the kernel as natively 32-bit (obviously you can then tell
the compiler to use the rv64 instructions, which I presume you're
already doing - I didn't look)

 (b) look at making the compat stuff do the conversion the "wrong way".

And btw, that (b) implies *not* just ignoring the high bits. If
user-space gives 64-bit pointer, you don't just treat it as a 32-bit
one by dropping the high bits. You add some logic to convert it to an
invalid pointer so that user space gets -EFAULT.

            Linus
Guo Ren March 26, 2025, 3:35 a.m. UTC | #9
On Wed, Mar 26, 2025 at 4:31 AM Jan Engelhardt <ej@inai.de> wrote:
>
>
> On Tuesday 2025-03-25 13:15, guoren@kernel.org wrote:
>
> >diff --git a/include/uapi/linux/netfilter/x_tables.h b/include/uapi/linux/netfilter/x_tables.h
> >index 796af83a963a..7e02e34c6fad 100644
> >--- a/include/uapi/linux/netfilter/x_tables.h
> >+++ b/include/uapi/linux/netfilter/x_tables.h
> >@@ -18,7 +18,11 @@ struct xt_entry_match {
> >                       __u8 revision;
> >               } user;
> >               struct {
> >+#if __riscv_xlen == 64
> >+                      __u64 match_size;
> >+#else
> >                       __u16 match_size;
> >+#endif
> >
> >                       /* Used inside the kernel */
> >                       struct xt_match *match;
>
> The __u16 is the common prefix of the union which is exposed to userspace.
> If anything, you need to use __attribute__((aligned(8))) to move
> `match` to a fixed location.
>
> However, that sub-struct is only used inside the kernel and never exposed,
> so the alignment of `match` should not play a role.
>
> Moreover, changing from u16 to u64 would break RISC-V big-endian. Even if there
> is currently no big-endian variant, let's not introduce such breakage.
You're correct. The __u64 change is a leftover from the raw proof of
concept and is wrong, so I will take your advice.

>
>
> >--- a/include/uapi/linux/netfilter_ipv4/ip_tables.h
> >+++ b/include/uapi/linux/netfilter_ipv4/ip_tables.h
> >@@ -200,7 +200,14 @@ struct ipt_replace {
> >       /* Number of counters (must be equal to current number of entries). */
> >       unsigned int num_counters;
> >       /* The old entries' counters. */
> >+#if __riscv_xlen == 64
> >+      union {
> >+              struct xt_counters __user *counters;
> >+              __u64 __counters;
> >+      };
> >+#else
> >       struct xt_counters __user *counters;
> >+#endif
> >
> >       /* The entries (hang off end: not really an array). */
> >       struct ipt_entry entries[];
>
> This seems ok, but perhaps there is a better name for __riscv_xlen (ifdef
> CONFIG_????ilp32), so it is not strictly tied to riscv,
> in case other platform wants to try ilp32-self mode.
Yes, I would like such a macro, but Linus has suggested the "compat
stuff" approach instead, so I will have to try that.

Thanks for the review!

>
> >+#if __riscv_xlen == 64
> >+      union {
> >+              int __user *auth_flavours;              /* 1 */
> >+              __u64 __auth_flavours;
> >+      };
> >+#else
> >       int __user *auth_flavours;              /* 1 */
> >+#endif
> > };
> >
> > /* bits in the flags field */
> >diff --git a/include/uapi/linux/ppp-ioctl.h b/include/uapi/linux/ppp-ioctl.h
> >index 1cc5ce0ae062..8d48eab430c1 100644
> >--- a/include/uapi/linux/ppp-ioctl.h
> >+++ b/include/uapi/linux/ppp-ioctl.h
> >@@ -59,7 +59,14 @@ struct npioctl {
> >
> > /* Structure describing a CCP configuration option, for PPPIOCSCOMPRESS */
> > struct ppp_option_data {
> >+#if __riscv_xlen == 64
> >+      union {
> >+              __u8    __user *ptr;
> >+              __u64   __ptr;
> >+      };
> >+#else
> >       __u8    __user *ptr;
> >+#endif
> >       __u32   length;
> >       int     transmit;
> > };
> >diff --git a/include/uapi/linux/sctp.h b/include/uapi/linux/sctp.h
> >index b7d91d4cf0db..46a06fddcd2f 100644
> >--- a/include/uapi/linux/sctp.h
> >+++ b/include/uapi/linux/sctp.h
> >@@ -1024,6 +1024,9 @@ struct sctp_getaddrs_old {
> > #else
> >       struct sockaddr         *addrs;
> > #endif
> >+#if (__riscv_xlen == 64) && (__SIZEOF_LONG__ == 4)
> >+      __u32                   unused;
> >+#endif
> > };
>
>
> >
> > struct sctp_getaddrs {
> >diff --git a/include/uapi/linux/sem.h b/include/uapi/linux/sem.h
> >index 75aa3b273cd9..de9f441913cd 100644
> >--- a/include/uapi/linux/sem.h
> >+++ b/include/uapi/linux/sem.h
> >@@ -26,10 +26,29 @@ struct semid_ds {
> >       struct ipc_perm sem_perm;               /* permissions .. see ipc.h */
> >       __kernel_old_time_t sem_otime;          /* last semop time */
> >       __kernel_old_time_t sem_ctime;          /* create/last semctl() time */
> >+#if __riscv_xlen == 64
> >+      union {
> >+              struct sem      *sem_base;              /* ptr to first semaphore in array */
> >+              __u64 __sem_base;
> >+      };
> >+      union {
> >+              struct sem_queue *sem_pending;          /* pending operations to be processed */
> >+              __u64 __sem_pending;
> >+      };
> >+      union {
> >+              struct sem_queue **sem_pending_last;    /* last pending operation */
> >+              __u64 __sem_pending_last;
> >+      };
> >+      union {
> >+              struct sem_undo *undo;                  /* undo requests on this array */
> >+              __u64 __undo;
> >+      };
> >+#else
> >       struct sem      *sem_base;              /* ptr to first semaphore in array */
> >       struct sem_queue *sem_pending;          /* pending operations to be processed */
> >       struct sem_queue **sem_pending_last;    /* last pending operation */
> >       struct sem_undo *undo;                  /* undo requests on this array */
> >+#endif
> >       unsigned short  sem_nsems;              /* no. of semaphores in array */
> > };
> >
> >@@ -46,10 +65,29 @@ struct sembuf {
> > /* arg for semctl system calls. */
> > union semun {
> >       int val;                        /* value for SETVAL */
> >+#if __riscv_xlen == 64
> >+      union {
> >+              struct semid_ds __user *buf;    /* buffer for IPC_STAT & IPC_SET */
> >+              __u64 ___buf;
> >+      };
> >+      union {
> >+              unsigned short __user *array;   /* array for GETALL & SETALL */
> >+              __u64 __array;
> >+      };
> >+      union {
> >+              struct seminfo __user *__buf;   /* buffer for IPC_INFO */
> >+              __u64 ____buf;
> >+      };
> >+      union {
> >+              void __user *__pad;
> >+              __u64 ____pad;
> >+      };
> >+#else
> >       struct semid_ds __user *buf;    /* buffer for IPC_STAT & IPC_SET */
> >       unsigned short __user *array;   /* array for GETALL & SETALL */
> >       struct seminfo __user *__buf;   /* buffer for IPC_INFO */
> >       void __user *__pad;
> >+#endif
> > };
> >
> > struct  seminfo {
> >diff --git a/include/uapi/linux/socket.h b/include/uapi/linux/socket.h
> >index d3fcd3b5ec53..5f7a83649395 100644
> >--- a/include/uapi/linux/socket.h
> >+++ b/include/uapi/linux/socket.h
> >@@ -22,7 +22,14 @@ struct __kernel_sockaddr_storage {
> >                               /* space to achieve desired size, */
> >                               /* _SS_MAXSIZE value minus size of ss_family */
> >               };
> >+#if __riscv_xlen == 64
> >+              union {
> >+                      void *__align; /* implementation specific desired alignment */
> >+                      u64 ___align;
> >+              };
> >+#else
> >               void *__align; /* implementation specific desired alignment */
> >+#endif
> >       };
> > };
> >
> >diff --git a/include/uapi/linux/sysctl.h b/include/uapi/linux/sysctl.h
> >index 8981f00204db..8ed7b29897f9 100644
> >--- a/include/uapi/linux/sysctl.h
> >+++ b/include/uapi/linux/sysctl.h
> >@@ -33,13 +33,45 @@
> >                                  member of a struct __sysctl_args to have? */
> >
> > struct __sysctl_args {
> >+#if __riscv_xlen == 64
> >+      union {
> >+              int __user *name;
> >+              __u64 __name;
> >+      };
> >+#else
> >       int __user *name;
> >+#endif
> >       int nlen;
> >+#if __riscv_xlen == 64
> >+      union {
> >+              void __user *oldval;
> >+              __u64 __oldval;
> >+      };
> >+#else
> >       void __user *oldval;
> >+#endif
> >+#if __riscv_xlen == 64
> >+      union {
> >+              size_t __user *oldlenp;
> >+              __u64 __oldlenp;
> >+      };
> >+#else
> >       size_t __user *oldlenp;
> >+#endif
> >+#if __riscv_xlen == 64
> >+      union {
> >+              void __user *newval;
> >+              __u64 __newval;
> >+      };
> >+#else
> >       void __user *newval;
> >+#endif
> >       size_t newlen;
> >+#if __riscv_xlen == 64
> >+      unsigned long long __unused[4];
> >+#else
> >       unsigned long __unused[4];
> >+#endif
> > };
> >
> > /* Define sysctl names first */
> >diff --git a/include/uapi/linux/uhid.h b/include/uapi/linux/uhid.h
> >index cef7534d2d19..4a774dbd3de8 100644
> >--- a/include/uapi/linux/uhid.h
> >+++ b/include/uapi/linux/uhid.h
> >@@ -130,7 +130,14 @@ struct uhid_create_req {
> >       __u8 name[128];
> >       __u8 phys[64];
> >       __u8 uniq[64];
> >+#if __riscv_xlen == 64
> >+      union {
> >+              __u8 __user *rd_data;
> >+              __u64 __rd_data;
> >+      };
> >+#else
> >       __u8 __user *rd_data;
> >+#endif
> >       __u16 rd_size;
> >
> >       __u16 bus;
> >diff --git a/include/uapi/linux/uio.h b/include/uapi/linux/uio.h
> >index 649739e0c404..27dfd6032dc6 100644
> >--- a/include/uapi/linux/uio.h
> >+++ b/include/uapi/linux/uio.h
> >@@ -16,8 +16,19 @@
> >
> > struct iovec
> > {
> >+#if __riscv_xlen == 64
> >+      union {
> >+              void __user *iov_base;  /* BSD uses caddr_t (1003.1g requires void *) */
> >+              __u64 __iov_base;
> >+      };
> >+      union {
> >+              __kernel_size_t iov_len; /* Must be size_t (1003.1g) */
> >+              __u64 __iov_len;
> >+      };
> >+#else
> >       void __user *iov_base;  /* BSD uses caddr_t (1003.1g requires void *) */
> >       __kernel_size_t iov_len; /* Must be size_t (1003.1g) */
> >+#endif
> > };
> >
> > struct dmabuf_cmsg {
> >diff --git a/include/uapi/linux/usb/tmc.h b/include/uapi/linux/usb/tmc.h
> >index d791cc58a7f0..443ec5356caf 100644
> >--- a/include/uapi/linux/usb/tmc.h
> >+++ b/include/uapi/linux/usb/tmc.h
> >@@ -51,7 +51,14 @@ struct usbtmc_request {
> >
> > struct usbtmc_ctrlrequest {
> >       struct usbtmc_request req;
> >+#if __riscv_xlen == 64
> >+      union {
> >+              void __user *data; /* pointer to user space */
> >+              __u64 __data; /* pointer to user space */
> >+      };
> >+#else
> >       void __user *data; /* pointer to user space */
> >+#endif
> > } __attribute__ ((packed));
> >
> > struct usbtmc_termchar {
> >@@ -70,7 +77,14 @@ struct usbtmc_message {
> >       __u32 transfer_size; /* size of bytes to transfer */
> >       __u32 transferred; /* size of received/written bytes */
> >       __u32 flags; /* bit 0: 0 = synchronous; 1 = asynchronous */
> >+#if __riscv_xlen == 64
> >+      union {
> >+              void __user *message; /* pointer to header and data in user space */
> >+              __u64 __message;
> >+      };
> >+#else
> >       void __user *message; /* pointer to header and data in user space */
> >+#endif
> > } __attribute__ ((packed));
> >
> > /* Request values for USBTMC driver's ioctl entry point */
> >diff --git a/include/uapi/linux/usbdevice_fs.h b/include/uapi/linux/usbdevice_fs.h
> >index 74a84e02422a..8c8efef74c3c 100644
> >--- a/include/uapi/linux/usbdevice_fs.h
> >+++ b/include/uapi/linux/usbdevice_fs.h
> >@@ -44,14 +44,28 @@ struct usbdevfs_ctrltransfer {
> >       __u16 wIndex;
> >       __u16 wLength;
> >       __u32 timeout;  /* in milliseconds */
> >+#if __riscv_xlen == 64
> >+      union {
> >+              void __user *data;
> >+              __u64 __data;
> >+      };
> >+#else
> >       void __user *data;
> >+#endif
> > };
> >
> > struct usbdevfs_bulktransfer {
> >       unsigned int ep;
> >       unsigned int len;
> >       unsigned int timeout; /* in milliseconds */
> >+#if __riscv_xlen == 64
> >+      union {
> >+              void __user *data;
> >+              __u64 __data;
> >+      };
> >+#else
> >       void __user *data;
> >+#endif
> > };
> >
> > struct usbdevfs_setinterface {
> >@@ -61,7 +75,14 @@ struct usbdevfs_setinterface {
> >
> > struct usbdevfs_disconnectsignal {
> >       unsigned int signr;
> >+#if __riscv_xlen == 64
> >+      union {
> >+              void __user *context;
> >+              __u64 __context;
> >+      };
> >+#else
> >       void __user *context;
> >+#endif
> > };
> >
> > #define USBDEVFS_MAXDRIVERNAME 255
> >@@ -119,7 +140,14 @@ struct usbdevfs_urb {
> >       unsigned char endpoint;
> >       int status;
> >       unsigned int flags;
> >+#if __riscv_xlen == 64
> >+      union {
> >+              void __user *buffer;
> >+              __u64 __buffer;
> >+      };
> >+#else
> >       void __user *buffer;
> >+#endif
> >       int buffer_length;
> >       int actual_length;
> >       int start_frame;
> >@@ -130,7 +158,14 @@ struct usbdevfs_urb {
> >       int error_count;
> >       unsigned int signr;     /* signal to be sent on completion,
> >                                 or 0 if none should be sent. */
> >+#if __riscv_xlen == 64
> >+      union {
> >+              void __user *usercontext;
> >+              __u64 __usercontext;
> >+      };
> >+#else
> >       void __user *usercontext;
> >+#endif
> >       struct usbdevfs_iso_packet_desc iso_frame_desc[];
> > };
> >
> >@@ -139,7 +174,14 @@ struct usbdevfs_ioctl {
> >       int     ifno;           /* interface 0..N ; negative numbers reserved */
> >       int     ioctl_code;     /* MUST encode size + direction of data so the
> >                                * macros in <asm/ioctl.h> give correct values */
> >+#if __riscv_xlen == 64
> >+      union {
> >+              void __user *data;      /* param buffer (in, or out) */
> >+              __u64 __pad;
> >+      };
> >+#else
> >       void __user *data;      /* param buffer (in, or out) */
> >+#endif
> > };
> >
> > /* You can do most things with hubs just through control messages,
> >@@ -195,9 +237,17 @@ struct usbdevfs_streams {
> > #define USBDEVFS_SUBMITURB         _IOR('U', 10, struct usbdevfs_urb)
> > #define USBDEVFS_SUBMITURB32       _IOR('U', 10, struct usbdevfs_urb32)
> > #define USBDEVFS_DISCARDURB        _IO('U', 11)
> >+#if __riscv_xlen == 64
> >+#define USBDEVFS_REAPURB           _IOW('U', 12, __u64)
> >+#else
> > #define USBDEVFS_REAPURB           _IOW('U', 12, void *)
> >+#endif
> > #define USBDEVFS_REAPURB32         _IOW('U', 12, __u32)
> >+#if __riscv_xlen == 64
> >+#define USBDEVFS_REAPURBNDELAY     _IOW('U', 13, __u64)
> >+#else
> > #define USBDEVFS_REAPURBNDELAY     _IOW('U', 13, void *)
> >+#endif
> > #define USBDEVFS_REAPURBNDELAY32   _IOW('U', 13, __u32)
> > #define USBDEVFS_DISCSIGNAL        _IOR('U', 14, struct usbdevfs_disconnectsignal)
> > #define USBDEVFS_DISCSIGNAL32      _IOR('U', 14, struct usbdevfs_disconnectsignal32)
> >diff --git a/include/uapi/linux/uvcvideo.h b/include/uapi/linux/uvcvideo.h
> >index f86185456dc5..3ccb99039a43 100644
> >--- a/include/uapi/linux/uvcvideo.h
> >+++ b/include/uapi/linux/uvcvideo.h
> >@@ -54,7 +54,14 @@ struct uvc_xu_control_mapping {
> >       __u32 v4l2_type;
> >       __u32 data_type;
> >
> >+#if __riscv_xlen == 64
> >+      union {
> >+              struct uvc_menu_info __user *menu_info;
> >+              __u64 __menu_info;
> >+      };
> >+#else
> >       struct uvc_menu_info __user *menu_info;
> >+#endif
> >       __u32 menu_count;
> >
> >       __u32 reserved[4];
> >@@ -66,7 +73,14 @@ struct uvc_xu_control_query {
> >       __u8 query;             /* Video Class-Specific Request Code, */
> >                               /* defined in linux/usb/video.h A.8.  */
> >       __u16 size;
> >+#if __riscv_xlen == 64
> >+      union {
> >+              __u8 __user *data;
> >+              __u64 __data;
> >+      };
> >+#else
> >       __u8 __user *data;
> >+#endif
> > };
> >
> > #define UVCIOC_CTRL_MAP               _IOWR('u', 0x20, struct uvc_xu_control_mapping)
> >diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> >index c8dbf8219c4f..0a1dc2a780fb 100644
> >--- a/include/uapi/linux/vfio.h
> >+++ b/include/uapi/linux/vfio.h
> >@@ -1570,7 +1570,14 @@ struct vfio_iommu_type1_dma_map {
> > struct vfio_bitmap {
> >       __u64        pgsize;    /* page size for bitmap in bytes */
> >       __u64        size;      /* in bytes */
> >+      #if __riscv_xlen == 64
> >+      union {
> >+              __u64 __user *data;     /* one bit per page */
> >+              __u64 __data;
> >+      };
> >+      #else
> >       __u64 __user *data;     /* one bit per page */
> >+      #endif
> > };
> >
> > /**
> >diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h
> >index e7c4dce39007..8e5391f07626 100644
> >--- a/include/uapi/linux/videodev2.h
> >+++ b/include/uapi/linux/videodev2.h
> >@@ -1898,7 +1898,14 @@ struct v4l2_ext_controls {
> >       __u32 error_idx;
> >       __s32 request_fd;
> >       __u32 reserved[1];
> >+#if __riscv_xlen == 64
> >+      union {
> >+              struct v4l2_ext_control *controls;
> >+              __u64 __controls;
> >+      };
> >+#else
> >       struct v4l2_ext_control *controls;
> >+#endif
> > };
> >
> > #define V4L2_CTRL_ID_MASK       (0x0fffffff)
> >--
> >2.40.1
> >
> >
Guo Ren March 26, 2025, 6:07 a.m. UTC | #10
On Tue, Mar 25, 2025 at 9:18 PM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Tue, Mar 25, 2025, at 13:26, Peter Zijlstra wrote:
> > On Tue, Mar 25, 2025 at 08:15:41AM -0400, guoren@kernel.org wrote:
> >> From: "Guo Ren (Alibaba DAMO Academy)" <guoren@kernel.org>
> >>
> >> Since 2001, the CONFIG_64BIT kernel has been built with the LP64 ABI,
> >> but this patchset allows the CONFIG_64BIT kernel to use an ILP32 ABI
> >
> > Please, don't do this. This adds a significant maintenance burden on all
> > of us.
>
> It would be easier to do this with CONFIG_64BIT disabled and continue
> treating CONFIG_64BIT to be the same as BITS_PER_LONG=64, but I still
> think it's fundamentally a bad idea to support this in mainline
> kernels in any variation, other than supporting regular 32-bit
> compat mode tasks on a regular 64-bit kernel.
>
> >> The patchset targets RISC-V and is built on the RV64ILP32 ABI, which
> >> was introduced into RISC-V's psABI in January 2025 [1]. This patchset
> >> equips an rv64ilp32-abi kernel with all the functionalities of a
> >> traditional lp64-abi kernel, yet restricts the address space to 2GiB.
> >> Hence, the rv64ilp32-abi kernel simultaneously supports lp64-abi
> >> userspace and ilp32-abi (compat) userspace, the same as the
> >> traditional lp64-abi kernel.
>
> You declare the syscall ABI to be the native 64-bit ABI, but this
> is fundamentally not true because many uapi structures are
> defined in terms of 'long' or pointer values, in particular in
> the ioctl call.

I modified uapi with
void __user *msg_name;
->
union {void __user *msg_name; u64 __msg_name;};
to make native 64-bit ABI.

I would look at compat stuff instead of using __riscv_xlen macro.

> This might work for an rv64ilp32 userspace that
> uses the same headers and the same types, but you explicitly
> say that the goal is to run native rv64 or compat rv32 tasks,
> not rv64ilp32 (thanks!).

It's not for rv64ilp32-abi userspace; no rv64ilp32-abi userspace is
introduced in the patch set. It's for native lp64-abi userspace.

Let's discuss this in the first patch thread:
uapi: Reuse lp64 ABI interface

>
> As far as I can tell, there is no way to rectify this design flaw
> other than to drop support for 64-bit userspace and only support
> regular rv32 userspace. I'm also skeptical that supporting rv64
> userspace helps in practice other than for testing, since
> generally most memory overhead is in userspace rather than the
> kernel, and there is much more to gain from shrinking the larger
> userspace by running rv32 compat mode binaries on a 64-bit kernel
> than the other way round.

The lp64-abi userspace rootfs works fine in this patch set, which
proves the technique is valid. But the modification on uapi is raw,
and I'm looking at compat stuff.

Supporting lp64-abi userspace is essential because riscv lp64-abi and
ilp32-abi userspace are hybrid deployments when the target is
ilp32-abi userspace. The lp64-abi provides a good supplement to
ilp32-abi which eases the development.

>
> If you remove the CONFIG_64BIT changes that Peter mentioned and
> the support for ilp64 userland from your series, you end up
> with a kernel that is very similar to a native rv32 kernel
> but executes as rv64ilp32 and runs rv32 userspace. I don't have
> any objections to that approach, and the same thing has come
> up on arm64 as a possible idea as well, but I don't know if
> that actually brings any notable advantage over an rv32 kernel.
>
> Are there CPUs that can run rv64 kernels and rv32 userspace
> but not rv32 kernels, similar to what we have on Arm Cortex-A76
> and Cortex-A510?

Yes, there is, and it only supports rv32 userspace, not rv32 kernel.
https://www.xrvm.com/product/xuantie/C908

Here are the products:
https://developer.canaan-creative.com/k230_canmv/en/dev/userguide/boards/canmv_k230d.html
http://riscv.org/ecosystem-news/2024/07/unpacking-the-canmv-k230-risc-v-board/
Guo Ren March 26, 2025, 6:34 a.m. UTC | #11
On Wed, Mar 26, 2025 at 4:41 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Tue, 25 Mar 2025 at 05:17, <guoren@kernel.org> wrote:
> >
> > The rv64ilp32 abi kernel accommodates the lp64 abi userspace and
> > leverages the lp64 abi Linux interface. Hence, unify the
> > BITS_PER_LONG = 32 memory layout to match BITS_PER_LONG = 64.
>
> No.
>
> This isn't happening.
>
> You can't do crazy things in the RISC-V code and then expect the rest
> of the kernel to just go "ok, we'll do crazy things".
>
> We're not doing crazy __riscv_xlen hackery with random structures
> containing 64-bit values that the kernel then only looks at the low 32
> bits. That's wrong on *so* many levels.
>
> I'm willing to say "big-endian is dead", but I'm not willing to accept
> this kind of crazy hackery.
>
> Not today, not ever.
>
> If you want to run a ilp32 kernel on 64-bit hardware (and support
> 64-bit ABI just in a 32-bit virtual memory size), I would suggest you
>
>  (a) treat the kernel as natively 32-bit (obviously you can then tell
> the compiler to use the rv64 instructions, which I presume you're
> already doing - I didn't look)

I used CONFIG_32BIT in v1 and v2, but I've abandoned that approach
because building on CONFIG_64BIT lets me inherit more functionality
from the lp64-abi kernel. I want the full functionality of the
CONFIG_64BIT Linux kernel, so that the two kernels are equivalent and
can be used interchangeably and seamlessly.

>
>  (b) look at making the compat stuff do the conversion the "wrong way".
>
> And btw, that (b) implies *not* just ignoring the high bits. If
> user-space gives 64-bit pointer, you don't just treat it as a 32-bit
> one by dropping the high bits. You add some logic to convert it to an
> invalid pointer so that user space gets -EFAULT.

Thanks for the advice. I'm looking into how to implement this with the compat code.
Arnd Bergmann March 26, 2025, 6:55 a.m. UTC | #12
On Wed, Mar 26, 2025, at 07:07, Guo Ren wrote:
> On Tue, Mar 25, 2025 at 9:18 PM Arnd Bergmann <arnd@arndb.de> wrote:
>> On Tue, Mar 25, 2025, at 13:26, Peter Zijlstra wrote:
>> > On Tue, Mar 25, 2025 at 08:15:41AM -0400, guoren@kernel.org wrote:
>>
>> You declare the syscall ABI to be the native 64-bit ABI, but this
>> is fundamentally not true because many uapi structures are
>> defined in terms of 'long' or pointer values, in particular in
>> the ioctl call.
>
> I modified uapi with
> void __user *msg_name;
> ->
> union {void __user *msg_name; u64 __msg_name;};
> to make native 64-bit ABI.
>
> I would look at compat stuff instead of using __riscv_xlen macro.

The problem I see here is that there are many more drivers
that you did not modify than drivers that you did change this
way.  The union is particularly ugly, but even if you find
a nicer method of doing this, you now also put the burden
on future driver writers to do this right for your platform.

>> As far as I can tell, there is no way to rectify this design flaw
>> other than to drop support for 64-bit userspace and only support
>> regular rv32 userspace. I'm also skeptical that supporting rv64
>> userspace helps in practice other than for testing, since
>> generally most memory overhead is in userspace rather than the
>> kernel, and there is much more to gain from shrinking the larger
>> userspace by running rv32 compat mode binaries on a 64-bit kernel
>> than the other way round.
>
> The lp64-abi userspace rootfs works fine in this patch set, which
> proves the technique is valid. But the modification on uapi is raw,
> and I'm looking at compat stuff.

There is a big difference between making it work for a particular
set of userspace binaries and making it correct for the entire
kernel ABI.

I agree that limiting the hacks to the compat side while keeping
the native ABI as ilp32 as in your previous versions is better,
but I also don't think this can be easily done without major
changes to how compat mode works in general, and that still
seems like a show-stopper for two reasons:

- it still puts the burden on driver writers to get it right
  for your platform. The scope is a bit smaller than in the
  current version because that would be limited to the compat
  handlers and not change the native codepath, but that's
  still a lot of drivers.

- the way that I would imagine this to be implemented in
  practice would require changing the compat code in a way that
  allows multiple compat ABIs, so drivers can separate the
  normal 32-on-64 handling from the 64-on-32 version you need.
  We have discussed something like this in the past, but Linus
  has already made it very clear that he doesn't want it done
  that way. Whichever way you do it, this is unlikely to
  find consensus.  

> Supporting lp64-abi userspace is essential because riscv lp64-abi and
> ilp32-abi userspace are hybrid deployments when the target is
> ilp32-abi userspace. The lp64-abi provides a good supplement to
> ilp32-abi which eases the development.

I'm not following here, please clarify. I do understand that
having a mixed 32/64 userspace can help for development, but
that can already be done on a 64-bit kernel and it doesn't
seem to be useful for deployment because having two sets of
support libraries makes this counterproductive for the goal
of saving RAM.

>> If you remove the CONFIG_64BIT changes that Peter mentioned and
>> the support for ilp64 userland from your series, you end up
>> with a kernel that is very similar to a native rv32 kernel
>> but executes as rv64ilp32 and runs rv32 userspace. I don't have
>> any objections to that approach, and the same thing has come
>> up on arm64 as a possible idea as well, but I don't know if
>> that actually brings any notable advantage over an rv32 kernel.
>>
>> Are there CPUs that can run rv64 kernels and rv32 userspace
>> but not rv32 kernels, similar to what we have on Arm Cortex-A76
>> and Cortex-A510?
>
> Yes, there is, and it only supports rv32 userspace, not rv32 kernel.
> https://www.xrvm.com/product/xuantie/C908

Ok, thanks for the link.

       Arnd
Guo Ren March 26, 2025, 9:22 a.m. UTC | #13
On Wed, Mar 26, 2025 at 1:19 AM Sergey Shtylyov <s.shtylyov@omp.ru> wrote:
>
> On 3/25/25 3:16 PM, guoren@kernel.org wrote:
>
> > From: "Guo Ren (Alibaba DAMO Academy)" <guoren@kernel.org>
> >
> > The rv64ilp32 abi reuses the env and argv memory layout of the
> > lp64 abi, so leave the space to fit the lp64 struct layout.
> >
> > Signed-off-by: Guo Ren (Alibaba DAMO Academy) <guoren@kernel.org>
> > ---
> >  fs/exec.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/fs/exec.c b/fs/exec.c
> > index 506cd411f4ac..548d18b7ae92 100644
> > --- a/fs/exec.c
> > +++ b/fs/exec.c
> > @@ -424,6 +424,10 @@ static const char __user *get_user_arg_ptr(struct user_arg_ptr argv, int nr)
> >       }
> >  #endif
> >
> > +#if defined(CONFIG_64BIT) && (BITS_PER_LONG == 32)
okay, #if defined(CONFIG_64BIT) && BITS_PER_LONG == 32

>
>    Parens don't seem necessary...
>
> > +     nr = nr * 2;
>
>    Why not nr *= 2?
okay, nr *= 2;
Guo Ren March 27, 2025, 12:47 p.m. UTC | #14
On Wed, Mar 26, 2025 at 3:10 AM Liam R. Howlett <Liam.Howlett@oracle.com> wrote:
>
> * guoren@kernel.org <guoren@kernel.org> [250325 08:24]:
> > From: "Guo Ren (Alibaba DAMO Academy)" <guoren@kernel.org>
> >
> > The Maple tree algorithm uses ulong type for each element. The
> > number of slots is based on BITS_PER_LONG for RV64ILP32 ABI, so
> > use BITS_PER_LONG instead of CONFIG_64BIT.
> >
> > Signed-off-by: Guo Ren (Alibaba DAMO Academy) <guoren@kernel.org>
> > ---
> >  include/linux/maple_tree.h | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h
> > index cbbcd18d4186..ff6265b6468b 100644
> > --- a/include/linux/maple_tree.h
> > +++ b/include/linux/maple_tree.h
> > @@ -24,7 +24,7 @@
> >   *
> >   * Nodes in the tree point to their parent unless bit 0 is set.
> >   */
> > -#if defined(CONFIG_64BIT) || defined(BUILD_VDSO32_64)
> > +#if (BITS_PER_LONG == 64) || defined(BUILD_VDSO32_64)
>
> This will break my userspace testing, if you do not update the testing as
> well.  This can be found in tools/testing/radix-tree.  Please also look
> at the Makefile as well since it will generate a build flag for the
> userspace.
I think you are talking about the following:
============================================================
../shared/shared.mk:
ifndef LONG_BIT
LONG_BIT := $(shell getconf LONG_BIT)
endif

generated/bit-length.h: FORCE
        @mkdir -p generated
        @if ! grep -qws CONFIG_$(LONG_BIT)BIT generated/bit-length.h; then   \
                echo "Generating $@";                                        \
                echo "#define CONFIG_$(LONG_BIT)BIT 1" > $@;                 \
                echo "#define CONFIG_PHYS_ADDR_T_$(LONG_BIT)BIT 1" >> $@;    \
        fi

$ grep CONFIG_64BIT * -r -A 2
generated/bit-length.h:#define CONFIG_64BIT 1
generated/bit-length.h-#define CONFIG_PHYS_ADDR_T_64BIT 1
--
maple.c:#if defined(CONFIG_64BIT)
maple.c-static noinline void __init check_erase2_testset(struct maple_tree *mt,
maple.c-                const unsigned long *set, unsigned long size)
--
maple.c:#if CONFIG_64BIT
maple.c-        MT_BUG_ON(mt, data_end != mas_data_end(&mas));
maple.c-#endif
--
maple.c:#if CONFIG_64BIT
maple.c-        MT_BUG_ON(mt, data_end - 2 != mas_data_end(&mas));
maple.c-#endif
--
maple.c:#if CONFIG_64BIT
maple.c-        MT_BUG_ON(mt, data_end - 4 != mas_data_end(&mas));
maple.c-#endif
--
maple.c:#if defined(CONFIG_64BIT)
maple.c-        /* Captures from VMs that found previous errors */
maple.c-        mt_init_flags(&tree, 0);
============================================================

First, this series doesn't introduce an rv64ilp32-abi userspace, so this
test code can't currently run on rv64ilp32-abi userspace, and the
problem you mention doesn't arise.

Second, the test harness derives CONFIG_32BIT or CONFIG_64BIT from
LONG_BIT, so a future rv64ilp32-abi userspace would get CONFIG_32BIT
and test the radix tree with the 32-bit paths in maple.c, which is fine.

>
> This raises other concerns as the code is found with a grep command, so
> I'm not sure why it was missed and if anything else is missed?
>
> If you consider this email to be the (unasked) question about what to do
> here, then please CC me, the maintainer of the files including the one
> you are updating here.
>
> Thank you,
> Liam
>
Guo Ren March 27, 2025, 1:13 p.m. UTC | #15
On Wed, Mar 26, 2025 at 2:56 PM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Wed, Mar 26, 2025, at 07:07, Guo Ren wrote:
> > On Tue, Mar 25, 2025 at 9:18 PM Arnd Bergmann <arnd@arndb.de> wrote:
> >> On Tue, Mar 25, 2025, at 13:26, Peter Zijlstra wrote:
> >> > On Tue, Mar 25, 2025 at 08:15:41AM -0400, guoren@kernel.org wrote:
> >>
> >> You declare the syscall ABI to be the native 64-bit ABI, but this
> >> is fundamentally not true because many uapi structures are
> >> defined in terms of 'long' or pointer values, in particular in
> >> the ioctl call.
> >
> > I modified uapi with
> > void __user *msg_name;
> > ->
> > union {void __user *msg_name; u64 __msg_name;};
> > to make native 64-bit ABI.
> >
> > I would look at compat stuff instead of using __riscv_xlen macro.
>
> The problem I see here is that there are many more drivers
> that you did not modify than drivers that you did change this
> way.  The union is particularly ugly, but even if you find
> a nicer method of doing this, you now also put the burden
> on future driver writers to do this right for your platform.
Got it.

>
> >> As far as I can tell, there is no way to rectify this design flaw
> >> other than to drop support for 64-bit userspace and only support
> >> regular rv32 userspace. I'm also skeptical that supporting rv64
> >> userspace helps in practice other than for testing, since
> >> generally most memory overhead is in userspace rather than the
> >> kernel, and there is much more to gain from shrinking the larger
> >> userspace by running rv32 compat mode binaries on a 64-bit kernel
> >> than the other way round.
> >
> > The lp64-abi userspace rootfs works fine in this patch set, which
> > proves the technique is valid. But the modification on uapi is raw,
> > and I'm looking at compat stuff.
>
> There is a big difference between making it work for a particular
> set of userspace binaries and making it correct for the entire
> kernel ABI.
>
> I agree that limiting the hacks to the compat side while keeping
> the native ABI as ilp32 as in your previous versions is better,
> but I also don't think this can be easily done without major
> changes to how compat mode works in general, and that still
> seems like a show-stopper for two reasons:
>
> - it still puts the burden on driver writers to get it right
>   for your platform. The scope is a bit smaller than in the
>   current version because that would be limited to the compat
>   handlers and not change the native codepath, but that's
>   still a lot of drivers.
>
> - the way that I would imagine this to be implemented in
>   practice would require changing the compat code in a way that
>   allows multiple compat ABIs, so drivers can separate the
>   normal 32-on-64 handling from the 64-on-32 version you need.
>   We have discussed something like this in the past, but Linus
>   has already made it very clear that he doesn't want it done
>   that way. Whichever way you do it, this is unlikely to
>   find consensus.
Got it, thanks for analysing.

>
> > Supporting lp64-abi userspace is essential because riscv lp64-abi and
> > ilp32-abi userspace are hybrid deployments when the target is
> > ilp32-abi userspace. The lp64-abi provides a good supplement to
> > ilp32-abi which eases the development.
>
> I'm not following here, please clarify. I do understand that
> having a mixed 32/64 userspace can help for development, but
> that can already be done on a 64-bit kernel and it doesn't
> seem to be useful for deployment because having two sets of
> support libraries makes this counterproductive for the goal
> of saving RAM.
In my case, most binaries and libraries are 32-bit, but a small part
remains 64-bit, possibly statically linked. For RISC-V, the rv64
ecosystem is more complete than rv32's, so rv64-abi is always
necessary, and rv32-abi is a supplement.

>
> >> If you remove the CONFIG_64BIT changes that Peter mentioned and
> >> the support for ilp64 userland from your series, you end up
> >> with a kernel that is very similar to a native rv32 kernel
> >> but executes as rv64ilp32 and runs rv32 userspace. I don't have
> >> any objections to that approach, and the same thing has come
> >> up on arm64 as a possible idea as well, but I don't know if
> >> that actually brings any notable advantage over an rv32 kernel.
> >>
> >> Are there CPUs that can run rv64 kernels and rv32 userspace
> >> but not rv32 kernels, similar to what we have on Arm Cortex-A76
> >> and Cortex-A510?
> >
> > Yes, there is, and it only supports rv32 userspace, not rv32 kernel.
> > https://www.xrvm.com/product/xuantie/C908
>
> Ok, thanks for the link.
>
>        Arnd
>
Palmer Dabbelt March 27, 2025, 4:20 p.m. UTC | #16
On Tue, 25 Mar 2025 13:41:30 PDT (-0700), Linus Torvalds wrote:
> On Tue, 25 Mar 2025 at 05:17, <guoren@kernel.org> wrote:
>>
>> The rv64ilp32 abi kernel accommodates the lp64 abi userspace and
>> leverages the lp64 abi Linux interface. Hence, unify the
>> BITS_PER_LONG = 32 memory layout to match BITS_PER_LONG = 64.
>
> No.
>
> This isn't happening.
>
> You can't do crazy things in the RISC-V code and then expect the rest
> of the kernel to just go "ok, we'll do crazy things".
>
> We're not doing crazy __riscv_xlen hackery with random structures
> containing 64-bit values that the kernel then only looks at the low 32
> bits. That's wrong on *so* many levels.

FWIW: this has come up a few times and we've generally said "nobody 
wants this", but that doesn't seem to stick...

> I'm willing to say "big-endian is dead", but I'm not willing to accept
> this kind of crazy hackery.
>
> Not today, not ever.

OK, maybe that will stick ;)

> If you want to run a ilp32 kernel on 64-bit hardware (and support
> 64-bit ABI just in a 32-bit virtual memory size), I would suggest you
>
>  (a) treat the kernel as natively 32-bit (obviously you can then tell
> the compiler to use the rv64 instructions, which I presume you're
> already doing - I didn't look)
>
>  (b) look at making the compat stuff do the conversion the "wrong way".
>
> And btw, that (b) implies *not* just ignoring the high bits. If
> user-space gives 64-bit pointer, you don't just treat it as a 32-bit
> one by dropping the high bits. You add some logic to convert it to an
> invalid pointer so that user space gets -EFAULT.
>
>             Linus
David Laight March 27, 2025, 9:06 p.m. UTC | #17
On Tue, 25 Mar 2025 08:15:41 -0400
guoren@kernel.org wrote:

> From: "Guo Ren (Alibaba DAMO Academy)" <guoren@kernel.org>
> 
> Since 2001, the CONFIG_64BIT kernel has been built with the LP64 ABI,
> but this patchset allows the CONFIG_64BIT kernel to use an ILP32 ABI
> for construction to reduce cache & memory footprint (Compared to
> kernel-lp64-abi, kernel-rv64ilp32-abi decreased the used memory by
> about 20%, as shown in "free -h" in the following demo.)
...

Why on earth would you want to run a 64bit application on a 32bit kernel?
IIRC the main justification for 64bit was to get a larger address space.

Now you might want to compile a 32bit (ILP32) system that actually
runs in 64bit mode (cf. x32) so that 64bit maths (long long) is
more efficient - but that is a different issue.
(I suspect you'd need to change the process switch code to save
all 64 bits of the registers - but maybe not much else??)

	David