Message ID | 20250325121624.523258-1-guoren@kernel.org |
---|---|
Headers | show |
Series | rv64ilp32_abi: Build CONFIG_64BIT kernel-self with ILP32 ABI | expand |
On Tue, Mar 25, 2025 at 8:27 PM Peter Zijlstra <peterz@infradead.org> wrote: > > On Tue, Mar 25, 2025 at 08:15:41AM -0400, guoren@kernel.org wrote: > > From: "Guo Ren (Alibaba DAMO Academy)" <guoren@kernel.org> > > > > Since 2001, the CONFIG_64BIT kernel has been built with the LP64 ABI, > > but this patchset allows the CONFIG_64BIT kernel to use an ILP32 ABI > > I'm thinking you're going to be finding a metric ton of assumptions > about 'unsigned long' being 64bit when 64BIT=y throughout the kernel. Less than you imagined. Most code is compatible with ILP32 ABI due to the CONFIG_32BIT. In my practice, it's deemed acceptable. > > I know of a couple of places where 64BIT will result in different math > such that a 32bit 'unsigned long' will trivially overflow. I would be grateful if you could share some with me. > > Please, don't do this. This adds a significant maintenance burden on all > of us. The 64ILP32 ABI would bear the maintenance burden, not traditional 64-bit or 32-bit ABIs. The patch set won't impact other CONFIG_64BIT or CONFIG_32BIT. Numerous RV64 chips require the RV64ILP32 ABI to reduce the memory and cache footprint; we will bear the burden. The core code maintainers would receive patches that would make them use BITS_PER_LONG and CONFIG_64BIT more accurately.
On Tue, Mar 25, 2025, at 13:26, Peter Zijlstra wrote: > On Tue, Mar 25, 2025 at 08:15:41AM -0400, guoren@kernel.org wrote: >> From: "Guo Ren (Alibaba DAMO Academy)" <guoren@kernel.org> >> >> Since 2001, the CONFIG_64BIT kernel has been built with the LP64 ABI, >> but this patchset allows the CONFIG_64BIT kernel to use an ILP32 ABI > > Please, don't do this. This adds a significant maintenance burden on all > of us. It would be easier to this with CONFIG_64BIT disabled and continue treating CONFIG_64BIT to be the same as BITS_PER_LONG=64, but I still think it's fundamentally a bad idea to support this in mainline kernels in any variation, other than supporting regular 32-bit compat mode tasks on a regular 64-bit kernel. >> The patchset targets RISC-V and is built on the RV64ILP32 ABI, which >> was introduced into RISC-V's psABI in January 2025 [1]. This patchset >> equips an rv64ilp32-abi kernel with all the functionalities of a >> traditional lp64-abi kernel, yet restricts the address space to 2GiB. >> Hence, the rv64ilp32-abi kernel simultaneously supports lp64-abi >> userspace and ilp32-abi (compat) userspace, the same as the >> traditional lp64-abi kernel. You declare the syscall ABI to be the native 64-bit ABI, but this is fundamentally not true because a many uapi structures are defined in terms of 'long' or pointer values, in particular in the ioctl call. This might work for an rv64ilp32 userspace that uses the same headers and the same types, but you explicitly say that the goal is to run native rv64 or compat rv32 tasks, not rv64ilp32 (thanks!). As far as I can tell, there is no way to rectify this design flaw other than to drop support for 64-bit userspace and only support regular rv32 userspace. I'm also skeptical that supporting rv64 userspace helps in practice other than for testing, since generally most memory overhead is in userspace rather than the kernel, and there is much more to gain from shrinking the larger userspace by running rv32 compat mode binaries on a 64-bit kernel than the other way round. If you remove the CONFIG_64BIT changes that Peter mentioned and the support for ilp64 userland from your series, you end up with a kernel that is very similar to a native rv32 kernel but executes as rv64ilp32 and runs rv32 userspace. I don't have any objections to that approach, and the same thing has come up on arm64 as a possible idea as well, but I don't know if that actually brings any notable advantage over an rv32 kernel. Are there CPUs that can run rv64 kernels and rv32 userspace but not rv32 kernels, similar to what we have on Arm Cortex-A76 and Cortex-A510? Arnd
On 3/25/25 3:16 PM, guoren@kernel.org wrote: > From: "Guo Ren (Alibaba DAMO Academy)" <guoren@kernel.org> > > The rv64ilp32 abi reuses the env and argv memory layout of the > lp64 abi, so leave the space to fit the lp64 struct layout. > > Signed-off-by: Guo Ren (Alibaba DAMO Academy) <guoren@kernel.org> > --- > fs/exec.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/fs/exec.c b/fs/exec.c > index 506cd411f4ac..548d18b7ae92 100644 > --- a/fs/exec.c > +++ b/fs/exec.c > @@ -424,6 +424,10 @@ static const char __user *get_user_arg_ptr(struct user_arg_ptr argv, int nr) > } > #endif > > +#if defined(CONFIG_64BIT) && (BITS_PER_LONG == 32) Parens don't seem necessary... > + nr = nr * 2; Why not nr *= 2? [...] MBR, Sergey
On 25.03.25 13:26, Peter Zijlstra wrote: > On Tue, Mar 25, 2025 at 08:15:41AM -0400, guoren@kernel.org wrote: >> From: "Guo Ren (Alibaba DAMO Academy)" <guoren@kernel.org> >> >> Since 2001, the CONFIG_64BIT kernel has been built with the LP64 ABI, >> but this patchset allows the CONFIG_64BIT kernel to use an ILP32 ABI > > I'm thinking you're going to be finding a metric ton of assumptions > about 'unsigned long' being 64bit when 64BIT=y throughout the kernel. > > I know of a couple of places where 64BIT will result in different math > such that a 32bit 'unsigned long' will trivially overflow. > > Please, don't do this. This adds a significant maintenance burden on all > of us. > Fully agreed.
* guoren@kernel.org <guoren@kernel.org> [250325 08:24]: > From: "Guo Ren (Alibaba DAMO Academy)" <guoren@kernel.org> > > The Maple tree algorithm uses ulong type for each element. The > number of slots is based on BITS_PER_LONG for RV64ILP32 ABI, so > use BITS_PER_LONG instead of CONFIG_64BIT. > > Signed-off-by: Guo Ren (Alibaba DAMO Academy) <guoren@kernel.org> > --- > include/linux/maple_tree.h | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h > index cbbcd18d4186..ff6265b6468b 100644 > --- a/include/linux/maple_tree.h > +++ b/include/linux/maple_tree.h > @@ -24,7 +24,7 @@ > * > * Nodes in the tree point to their parent unless bit 0 is set. > */ > -#if defined(CONFIG_64BIT) || defined(BUILD_VDSO32_64) > +#if (BITS_PER_LONG == 64) || defined(BUILD_VDSO32_64) This will break my userspace testing, if you do not update the testing as well. This can be found in tools/testing/radix-tree. Please also look at the Makefile as well since it will generate a build flag for the userspace. This raises other concerns as the code is found with a grep command, so I'm not sure why it was missed and if anything else is missed? If you consider this email to be the (unasked) question about what to do here, then please CC me, the maintainer of the files including the one you are updating here. Thank you, Liam
* David Hildenbrand <david@redhat.com> [250325 14:52]: > On 25.03.25 13:26, Peter Zijlstra wrote: > > On Tue, Mar 25, 2025 at 08:15:41AM -0400, guoren@kernel.org wrote: > > > From: "Guo Ren (Alibaba DAMO Academy)" <guoren@kernel.org> > > > > > > Since 2001, the CONFIG_64BIT kernel has been built with the LP64 ABI, > > > but this patchset allows the CONFIG_64BIT kernel to use an ILP32 ABI > > > > I'm thinking you're going to be finding a metric ton of assumptions > > about 'unsigned long' being 64bit when 64BIT=y throughout the kernel. > > > > I know of a couple of places where 64BIT will result in different math > > such that a 32bit 'unsigned long' will trivially overflow. > > > > Please, don't do this. This adds a significant maintenance burden on all > > of us. > > > > Fully agreed. I would go further and say I do not want this to go in. The open ended maintenance burden is not worth extending hardware life of a board with 16mb of ram (If I understand your 2023 LPC slides correctly). Thank you, Liam
On Tuesday 2025-03-25 13:15, guoren@kernel.org wrote: >diff --git a/include/uapi/linux/netfilter/x_tables.h b/include/uapi/linux/netfilter/x_tables.h >index 796af83a963a..7e02e34c6fad 100644 >--- a/include/uapi/linux/netfilter/x_tables.h >+++ b/include/uapi/linux/netfilter/x_tables.h >@@ -18,7 +18,11 @@ struct xt_entry_match { > __u8 revision; > } user; > struct { >+#if __riscv_xlen == 64 >+ __u64 match_size; >+#else > __u16 match_size; >+#endif > > /* Used inside the kernel */ > struct xt_match *match; The __u16 is the common prefix of the union which is exposed to userspace. If anything, you need to use __attribute__((aligned(8))) to move `match` to a fixed location. However, that sub-struct is only used inside the kernel and never exposed, so the alignment of `match` should not play a role. Moreover, change from u16 to u64 would break RISC-V Big-Endian. Even if there currently is no big-endian variant, let's not introduce such breakage. >--- a/include/uapi/linux/netfilter_ipv4/ip_tables.h >+++ b/include/uapi/linux/netfilter_ipv4/ip_tables.h >@@ -200,7 +200,14 @@ struct ipt_replace { > /* Number of counters (must be equal to current number of entries). */ > unsigned int num_counters; > /* The old entries' counters. */ >+#if __riscv_xlen == 64 >+ union { >+ struct xt_counters __user *counters; >+ __u64 __counters; >+ }; >+#else > struct xt_counters __user *counters; >+#endif > > /* The entries (hang off end: not really an array). */ > struct ipt_entry entries[]; This seems ok, but perhaps there is a better name for __riscv_xlen (ifdef CONFIG_????ilp32), so it is not strictly tied to riscv, in case other platform wants to try ilp32-self mode. >+#if __riscv_xlen == 64 >+ union { >+ int __user *auth_flavours; /* 1 */ >+ __u64 __auth_flavours; >+ }; >+#else > int __user *auth_flavours; /* 1 */ >+#endif > }; > > /* bits in the flags field */ >diff --git a/include/uapi/linux/ppp-ioctl.h b/include/uapi/linux/ppp-ioctl.h >index 1cc5ce0ae062..8d48eab430c1 100644 >--- a/include/uapi/linux/ppp-ioctl.h >+++ b/include/uapi/linux/ppp-ioctl.h >@@ -59,7 +59,14 @@ struct npioctl { > > /* Structure describing a CCP configuration option, for PPPIOCSCOMPRESS */ > struct ppp_option_data { >+#if __riscv_xlen == 64 >+ union { >+ __u8 __user *ptr; >+ __u64 __ptr; >+ }; >+#else > __u8 __user *ptr; >+#endif > __u32 length; > int transmit; > }; >diff --git a/include/uapi/linux/sctp.h b/include/uapi/linux/sctp.h >index b7d91d4cf0db..46a06fddcd2f 100644 >--- a/include/uapi/linux/sctp.h >+++ b/include/uapi/linux/sctp.h >@@ -1024,6 +1024,9 @@ struct sctp_getaddrs_old { > #else > struct sockaddr *addrs; > #endif >+#if (__riscv_xlen == 64) && (__SIZEOF_LONG__ == 4) >+ __u32 unused; >+#endif > }; > > struct sctp_getaddrs { >diff --git a/include/uapi/linux/sem.h b/include/uapi/linux/sem.h >index 75aa3b273cd9..de9f441913cd 100644 >--- a/include/uapi/linux/sem.h >+++ b/include/uapi/linux/sem.h >@@ -26,10 +26,29 @@ struct semid_ds { > struct ipc_perm sem_perm; /* permissions .. see ipc.h */ > __kernel_old_time_t sem_otime; /* last semop time */ > __kernel_old_time_t sem_ctime; /* create/last semctl() time */ >+#if __riscv_xlen == 64 >+ union { >+ struct sem *sem_base; /* ptr to first semaphore in array */ >+ __u64 __sem_base; >+ }; >+ union { >+ struct sem_queue *sem_pending; /* pending operations to be processed */ >+ __u64 __sem_pending; >+ }; >+ union { >+ struct sem_queue **sem_pending_last; /* last pending operation */ >+ __u64 __sem_pending_last; >+ }; >+ union { >+ struct sem_undo *undo; /* undo requests on this array */ >+ __u64 __undo; >+ }; >+#else > struct sem *sem_base; /* ptr to first semaphore in array */ > struct sem_queue *sem_pending; /* pending operations to be processed */ > struct sem_queue **sem_pending_last; /* last pending operation */ > struct sem_undo *undo; /* undo requests on this array */ >+#endif > unsigned short sem_nsems; /* no. of semaphores in array */ > }; > >@@ -46,10 +65,29 @@ struct sembuf { > /* arg for semctl system calls. */ > union semun { > int val; /* value for SETVAL */ >+#if __riscv_xlen == 64 >+ union { >+ struct semid_ds __user *buf; /* buffer for IPC_STAT & IPC_SET */ >+ __u64 ___buf; >+ }; >+ union { >+ unsigned short __user *array; /* array for GETALL & SETALL */ >+ __u64 __array; >+ }; >+ union { >+ struct seminfo __user *__buf; /* buffer for IPC_INFO */ >+ __u64 ____buf; >+ }; >+ union { >+ void __user *__pad; >+ __u64 ____pad; >+ }; >+#else > struct semid_ds __user *buf; /* buffer for IPC_STAT & IPC_SET */ > unsigned short __user *array; /* array for GETALL & SETALL */ > struct seminfo __user *__buf; /* buffer for IPC_INFO */ > void __user *__pad; >+#endif > }; > > struct seminfo { >diff --git a/include/uapi/linux/socket.h b/include/uapi/linux/socket.h >index d3fcd3b5ec53..5f7a83649395 100644 >--- a/include/uapi/linux/socket.h >+++ b/include/uapi/linux/socket.h >@@ -22,7 +22,14 @@ struct __kernel_sockaddr_storage { > /* space to achieve desired size, */ > /* _SS_MAXSIZE value minus size of ss_family */ > }; >+#if __riscv_xlen == 64 >+ union { >+ void *__align; /* implementation specific desired alignment */ >+ u64 ___align; >+ }; >+#else > void *__align; /* implementation specific desired alignment */ >+#endif > }; > }; > >diff --git a/include/uapi/linux/sysctl.h b/include/uapi/linux/sysctl.h >index 8981f00204db..8ed7b29897f9 100644 >--- a/include/uapi/linux/sysctl.h >+++ b/include/uapi/linux/sysctl.h >@@ -33,13 +33,45 @@ > member of a struct __sysctl_args to have? */ > > struct __sysctl_args { >+#if __riscv_xlen == 64 >+ union { >+ int __user *name; >+ __u64 __name; >+ }; >+#else > int __user *name; >+#endif > int nlen; >+#if __riscv_xlen == 64 >+ union { >+ void __user *oldval; >+ __u64 __oldval; >+ }; >+#else > void __user *oldval; >+#endif >+#if __riscv_xlen == 64 >+ union { >+ size_t __user *oldlenp; >+ __u64 __oldlenp; >+ }; >+#else > size_t __user *oldlenp; >+#endif >+#if __riscv_xlen == 64 >+ union { >+ void __user *newval; >+ __u64 __newval; >+ }; >+#else > void __user *newval; >+#endif > size_t newlen; >+#if __riscv_xlen == 64 >+ unsigned long long __unused[4]; >+#else > unsigned long __unused[4]; >+#endif > }; > > /* Define sysctl names first */ >diff --git a/include/uapi/linux/uhid.h b/include/uapi/linux/uhid.h >index cef7534d2d19..4a774dbd3de8 100644 >--- a/include/uapi/linux/uhid.h >+++ b/include/uapi/linux/uhid.h >@@ -130,7 +130,14 @@ struct uhid_create_req { > __u8 name[128]; > __u8 phys[64]; > __u8 uniq[64]; >+#if __riscv_xlen == 64 >+ union { >+ __u8 __user *rd_data; >+ __u64 __rd_data; >+ }; >+#else > __u8 __user *rd_data; >+#endif > __u16 rd_size; > > __u16 bus; >diff --git a/include/uapi/linux/uio.h b/include/uapi/linux/uio.h >index 649739e0c404..27dfd6032dc6 100644 >--- a/include/uapi/linux/uio.h >+++ b/include/uapi/linux/uio.h >@@ -16,8 +16,19 @@ > > struct iovec > { >+#if __riscv_xlen == 64 >+ union { >+ void __user *iov_base; /* BSD uses caddr_t (1003.1g requires void *) */ >+ __u64 __iov_base; >+ }; >+ union { >+ __kernel_size_t iov_len; /* Must be size_t (1003.1g) */ >+ __u64 __iov_len; >+ }; >+#else > void __user *iov_base; /* BSD uses caddr_t (1003.1g requires void *) */ > __kernel_size_t iov_len; /* Must be size_t (1003.1g) */ >+#endif > }; > > struct dmabuf_cmsg { >diff --git a/include/uapi/linux/usb/tmc.h b/include/uapi/linux/usb/tmc.h >index d791cc58a7f0..443ec5356caf 100644 >--- a/include/uapi/linux/usb/tmc.h >+++ b/include/uapi/linux/usb/tmc.h >@@ -51,7 +51,14 @@ struct usbtmc_request { > > struct usbtmc_ctrlrequest { > struct usbtmc_request req; >+#if __riscv_xlen == 64 >+ union { >+ void __user *data; /* pointer to user space */ >+ __u64 __data; /* pointer to user space */ >+ }; >+#else > void __user *data; /* pointer to user space */ >+#endif > } __attribute__ ((packed)); > > struct usbtmc_termchar { >@@ -70,7 +77,14 @@ struct usbtmc_message { > __u32 transfer_size; /* size of bytes to transfer */ > __u32 transferred; /* size of received/written bytes */ > __u32 flags; /* bit 0: 0 = synchronous; 1 = asynchronous */ >+#if __riscv_xlen == 64 >+ union { >+ void __user *message; /* pointer to header and data in user space */ >+ __u64 __message; >+ }; >+#else > void __user *message; /* pointer to header and data in user space */ >+#endif > } __attribute__ ((packed)); > > /* Request values for USBTMC driver's ioctl entry point */ >diff --git a/include/uapi/linux/usbdevice_fs.h b/include/uapi/linux/usbdevice_fs.h >index 74a84e02422a..8c8efef74c3c 100644 >--- a/include/uapi/linux/usbdevice_fs.h >+++ b/include/uapi/linux/usbdevice_fs.h >@@ -44,14 +44,28 @@ struct usbdevfs_ctrltransfer { > __u16 wIndex; > __u16 wLength; > __u32 timeout; /* in milliseconds */ >+#if __riscv_xlen == 64 >+ union { >+ void __user *data; >+ __u64 __data; >+ }; >+#else > void __user *data; >+#endif > }; > > struct usbdevfs_bulktransfer { > unsigned int ep; > unsigned int len; > unsigned int timeout; /* in milliseconds */ >+#if __riscv_xlen == 64 >+ union { >+ void __user *data; >+ __u64 __data; >+ }; >+#else > void __user *data; >+#endif > }; > > struct usbdevfs_setinterface { >@@ -61,7 +75,14 @@ struct usbdevfs_setinterface { > > struct usbdevfs_disconnectsignal { > unsigned int signr; >+#if __riscv_xlen == 64 >+ union { >+ void __user *context; >+ __u64 __context; >+ }; >+#else > void __user *context; >+#endif > }; > > #define USBDEVFS_MAXDRIVERNAME 255 >@@ -119,7 +140,14 @@ struct usbdevfs_urb { > unsigned char endpoint; > int status; > unsigned int flags; >+#if __riscv_xlen == 64 >+ union { >+ void __user *buffer; >+ __u64 __buffer; >+ }; >+#else > void __user *buffer; >+#endif > int buffer_length; > int actual_length; > int start_frame; >@@ -130,7 +158,14 @@ struct usbdevfs_urb { > int error_count; > unsigned int signr; /* signal to be sent on completion, > or 0 if none should be sent. */ >+#if __riscv_xlen == 64 >+ union { >+ void __user *usercontext; >+ __u64 __usercontext; >+ }; >+#else > void __user *usercontext; >+#endif > struct usbdevfs_iso_packet_desc iso_frame_desc[]; > }; > >@@ -139,7 +174,14 @@ struct usbdevfs_ioctl { > int ifno; /* interface 0..N ; negative numbers reserved */ > int ioctl_code; /* MUST encode size + direction of data so the > * macros in <asm/ioctl.h> give correct values */ >+#if __riscv_xlen == 64 >+ union { >+ void __user *data; /* param buffer (in, or out) */ >+ __u64 __pad; >+ }; >+#else > void __user *data; /* param buffer (in, or out) */ >+#endif > }; > > /* You can do most things with hubs just through control messages, >@@ -195,9 +237,17 @@ struct usbdevfs_streams { > #define USBDEVFS_SUBMITURB _IOR('U', 10, struct usbdevfs_urb) > #define USBDEVFS_SUBMITURB32 _IOR('U', 10, struct usbdevfs_urb32) > #define USBDEVFS_DISCARDURB _IO('U', 11) >+#if __riscv_xlen == 64 >+#define USBDEVFS_REAPURB _IOW('U', 12, __u64) >+#else > #define USBDEVFS_REAPURB _IOW('U', 12, void *) >+#endif > #define USBDEVFS_REAPURB32 _IOW('U', 12, __u32) >+#if __riscv_xlen == 64 >+#define USBDEVFS_REAPURBNDELAY _IOW('U', 13, __u64) >+#else > #define USBDEVFS_REAPURBNDELAY _IOW('U', 13, void *) >+#endif > #define USBDEVFS_REAPURBNDELAY32 _IOW('U', 13, __u32) > #define USBDEVFS_DISCSIGNAL _IOR('U', 14, struct usbdevfs_disconnectsignal) > #define USBDEVFS_DISCSIGNAL32 _IOR('U', 14, struct usbdevfs_disconnectsignal32) >diff --git a/include/uapi/linux/uvcvideo.h b/include/uapi/linux/uvcvideo.h >index f86185456dc5..3ccb99039a43 100644 >--- a/include/uapi/linux/uvcvideo.h >+++ b/include/uapi/linux/uvcvideo.h >@@ -54,7 +54,14 @@ struct uvc_xu_control_mapping { > __u32 v4l2_type; > __u32 data_type; > >+#if __riscv_xlen == 64 >+ union { >+ struct uvc_menu_info __user *menu_info; >+ __u64 __menu_info; >+ }; >+#else > struct uvc_menu_info __user *menu_info; >+#endif > __u32 menu_count; > > __u32 reserved[4]; >@@ -66,7 +73,14 @@ struct uvc_xu_control_query { > __u8 query; /* Video Class-Specific Request Code, */ > /* defined in linux/usb/video.h A.8. */ > __u16 size; >+#if __riscv_xlen == 64 >+ union { >+ __u8 __user *data; >+ __u64 __data; >+ }; >+#else > __u8 __user *data; >+#endif > }; > > #define UVCIOC_CTRL_MAP _IOWR('u', 0x20, struct uvc_xu_control_mapping) >diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h >index c8dbf8219c4f..0a1dc2a780fb 100644 >--- a/include/uapi/linux/vfio.h >+++ b/include/uapi/linux/vfio.h >@@ -1570,7 +1570,14 @@ struct vfio_iommu_type1_dma_map { > struct vfio_bitmap { > __u64 pgsize; /* page size for bitmap in bytes */ > __u64 size; /* in bytes */ >+ #if __riscv_xlen == 64 >+ union { >+ __u64 __user *data; /* one bit per page */ >+ __u64 __data; >+ }; >+ #else > __u64 __user *data; /* one bit per page */ >+ #endif > }; > > /** >diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h >index e7c4dce39007..8e5391f07626 100644 >--- a/include/uapi/linux/videodev2.h >+++ b/include/uapi/linux/videodev2.h >@@ -1898,7 +1898,14 @@ struct v4l2_ext_controls { > __u32 error_idx; > __s32 request_fd; > __u32 reserved[1]; >+#if __riscv_xlen == 64 >+ union { >+ struct v4l2_ext_control *controls; >+ __u64 __controls; >+ }; >+#else > struct v4l2_ext_control *controls; >+#endif > }; > > #define V4L2_CTRL_ID_MASK (0x0fffffff) >-- >2.40.1 > >
On Tue, 25 Mar 2025 at 05:17, <guoren@kernel.org> wrote: > > The rv64ilp32 abi kernel accommodates the lp64 abi userspace and > leverages the lp64 abi Linux interface. Hence, unify the > BITS_PER_LONG = 32 memory layout to match BITS_PER_LONG = 64. No. This isn't happening. You can't do crazy things in the RISC-V code and then expect the rest of the kernel to just go "ok, we'll do crazy things". We're not doing crazy __riscv_xlen hackery with random structures containing 64-bit values that the kernel then only looks at the low 32 bits. That's wrong on *so* many levels. I'm willing to say "big-endian is dead", but I'm not willing to accept this kind of crazy hackery. Not today, not ever. If you want to run a ilp32 kernel on 64-bit hardware (and support 64-bit ABI just in a 32-bit virtual memory size), I would suggest you (a) treat the kernel as natively 32-bit (obviously you can then tell the compiler to use the rv64 instructions, which I presume you're already doing - I didn't look) (b) look at making the compat stuff do the conversion the "wrong way". And btw, that (b) implies *not* just ignoring the high bits. If user-space gives 64-bit pointer, you don't just treat it as a 32-bit one by dropping the high bits. You add some logic to convert it to an invalid pointer so that user space gets -EFAULT. Linus
On Wed, Mar 26, 2025 at 4:31 AM Jan Engelhardt <ej@inai.de> wrote: > > > On Tuesday 2025-03-25 13:15, guoren@kernel.org wrote: > > >diff --git a/include/uapi/linux/netfilter/x_tables.h b/include/uapi/linux/netfilter/x_tables.h > >index 796af83a963a..7e02e34c6fad 100644 > >--- a/include/uapi/linux/netfilter/x_tables.h > >+++ b/include/uapi/linux/netfilter/x_tables.h > >@@ -18,7 +18,11 @@ struct xt_entry_match { > > __u8 revision; > > } user; > > struct { > >+#if __riscv_xlen == 64 > >+ __u64 match_size; > >+#else > > __u16 match_size; > >+#endif > > > > /* Used inside the kernel */ > > struct xt_match *match; > > The __u16 is the common prefix of the union which is exposed to userspace. > If anything, you need to use __attribute__((aligned(8))) to move > `match` to a fixed location. > > However, that sub-struct is only used inside the kernel and never exposed, > so the alignment of `match` should not play a role. > > Moreover, change from u16 to u64 would break RISC-V Big-Endian. Even if there > currently is no big-endian variant, let's not introduce such breakage. You're correct. The __u64 modification is too raw from the proof of concept. It's not correct, so I would accept your advice. > > > >--- a/include/uapi/linux/netfilter_ipv4/ip_tables.h > >+++ b/include/uapi/linux/netfilter_ipv4/ip_tables.h > >@@ -200,7 +200,14 @@ struct ipt_replace { > > /* Number of counters (must be equal to current number of entries). */ > > unsigned int num_counters; > > /* The old entries' counters. */ > >+#if __riscv_xlen == 64 > >+ union { > >+ struct xt_counters __user *counters; > >+ __u64 __counters; > >+ }; > >+#else > > struct xt_counters __user *counters; > >+#endif > > > > /* The entries (hang off end: not really an array). */ > > struct ipt_entry entries[]; > > This seems ok, but perhaps there is a better name for __riscv_xlen (ifdef > CONFIG_????ilp32), so it is not strictly tied to riscv, > in case other platform wants to try ilp32-self mode. Yes, I want that macro, but Linus has suggested "compat stuff". I would have to try. Thx for the reviewing! > > >+#if __riscv_xlen == 64 > >+ union { > >+ int __user *auth_flavours; /* 1 */ > >+ __u64 __auth_flavours; > >+ }; > >+#else > > int __user *auth_flavours; /* 1 */ > >+#endif > > }; > > > > /* bits in the flags field */ > >diff --git a/include/uapi/linux/ppp-ioctl.h b/include/uapi/linux/ppp-ioctl.h > >index 1cc5ce0ae062..8d48eab430c1 100644 > >--- a/include/uapi/linux/ppp-ioctl.h > >+++ b/include/uapi/linux/ppp-ioctl.h > >@@ -59,7 +59,14 @@ struct npioctl { > > > > /* Structure describing a CCP configuration option, for PPPIOCSCOMPRESS */ > > struct ppp_option_data { > >+#if __riscv_xlen == 64 > >+ union { > >+ __u8 __user *ptr; > >+ __u64 __ptr; > >+ }; > >+#else > > __u8 __user *ptr; > >+#endif > > __u32 length; > > int transmit; > > }; > >diff --git a/include/uapi/linux/sctp.h b/include/uapi/linux/sctp.h > >index b7d91d4cf0db..46a06fddcd2f 100644 > >--- a/include/uapi/linux/sctp.h > >+++ b/include/uapi/linux/sctp.h > >@@ -1024,6 +1024,9 @@ struct sctp_getaddrs_old { > > #else > > struct sockaddr *addrs; > > #endif > >+#if (__riscv_xlen == 64) && (__SIZEOF_LONG__ == 4) > >+ __u32 unused; > >+#endif > > }; > > > > > > struct sctp_getaddrs { > >diff --git a/include/uapi/linux/sem.h b/include/uapi/linux/sem.h > >index 75aa3b273cd9..de9f441913cd 100644 > >--- a/include/uapi/linux/sem.h > >+++ b/include/uapi/linux/sem.h > >@@ -26,10 +26,29 @@ struct semid_ds { > > struct ipc_perm sem_perm; /* permissions .. see ipc.h */ > > __kernel_old_time_t sem_otime; /* last semop time */ > > __kernel_old_time_t sem_ctime; /* create/last semctl() time */ > >+#if __riscv_xlen == 64 > >+ union { > >+ struct sem *sem_base; /* ptr to first semaphore in array */ > >+ __u64 __sem_base; > >+ }; > >+ union { > >+ struct sem_queue *sem_pending; /* pending operations to be processed */ > >+ __u64 __sem_pending; > >+ }; > >+ union { > >+ struct sem_queue **sem_pending_last; /* last pending operation */ > >+ __u64 __sem_pending_last; > >+ }; > >+ union { > >+ struct sem_undo *undo; /* undo requests on this array */ > >+ __u64 __undo; > >+ }; > >+#else > > struct sem *sem_base; /* ptr to first semaphore in array */ > > struct sem_queue *sem_pending; /* pending operations to be processed */ > > struct sem_queue **sem_pending_last; /* last pending operation */ > > struct sem_undo *undo; /* undo requests on this array */ > >+#endif > > unsigned short sem_nsems; /* no. of semaphores in array */ > > }; > > > >@@ -46,10 +65,29 @@ struct sembuf { > > /* arg for semctl system calls. */ > > union semun { > > int val; /* value for SETVAL */ > >+#if __riscv_xlen == 64 > >+ union { > >+ struct semid_ds __user *buf; /* buffer for IPC_STAT & IPC_SET */ > >+ __u64 ___buf; > >+ }; > >+ union { > >+ unsigned short __user *array; /* array for GETALL & SETALL */ > >+ __u64 __array; > >+ }; > >+ union { > >+ struct seminfo __user *__buf; /* buffer for IPC_INFO */ > >+ __u64 ____buf; > >+ }; > >+ union { > >+ void __user *__pad; > >+ __u64 ____pad; > >+ }; > >+#else > > struct semid_ds __user *buf; /* buffer for IPC_STAT & IPC_SET */ > > unsigned short __user *array; /* array for GETALL & SETALL */ > > struct seminfo __user *__buf; /* buffer for IPC_INFO */ > > void __user *__pad; > >+#endif > > }; > > > > struct seminfo { > >diff --git a/include/uapi/linux/socket.h b/include/uapi/linux/socket.h > >index d3fcd3b5ec53..5f7a83649395 100644 > >--- a/include/uapi/linux/socket.h > >+++ b/include/uapi/linux/socket.h > >@@ -22,7 +22,14 @@ struct __kernel_sockaddr_storage { > > /* space to achieve desired size, */ > > /* _SS_MAXSIZE value minus size of ss_family */ > > }; > >+#if __riscv_xlen == 64 > >+ union { > >+ void *__align; /* implementation specific desired alignment */ > >+ u64 ___align; > >+ }; > >+#else > > void *__align; /* implementation specific desired alignment */ > >+#endif > > }; > > }; > > > >diff --git a/include/uapi/linux/sysctl.h b/include/uapi/linux/sysctl.h > >index 8981f00204db..8ed7b29897f9 100644 > >--- a/include/uapi/linux/sysctl.h > >+++ b/include/uapi/linux/sysctl.h > >@@ -33,13 +33,45 @@ > > member of a struct __sysctl_args to have? */ > > > > struct __sysctl_args { > >+#if __riscv_xlen == 64 > >+ union { > >+ int __user *name; > >+ __u64 __name; > >+ }; > >+#else > > int __user *name; > >+#endif > > int nlen; > >+#if __riscv_xlen == 64 > >+ union { > >+ void __user *oldval; > >+ __u64 __oldval; > >+ }; > >+#else > > void __user *oldval; > >+#endif > >+#if __riscv_xlen == 64 > >+ union { > >+ size_t __user *oldlenp; > >+ __u64 __oldlenp; > >+ }; > >+#else > > size_t __user *oldlenp; > >+#endif > >+#if __riscv_xlen == 64 > >+ union { > >+ void __user *newval; > >+ __u64 __newval; > >+ }; > >+#else > > void __user *newval; > >+#endif > > size_t newlen; > >+#if __riscv_xlen == 64 > >+ unsigned long long __unused[4]; > >+#else > > unsigned long __unused[4]; > >+#endif > > }; > > > > /* Define sysctl names first */ > >diff --git a/include/uapi/linux/uhid.h b/include/uapi/linux/uhid.h > >index cef7534d2d19..4a774dbd3de8 100644 > >--- a/include/uapi/linux/uhid.h > >+++ b/include/uapi/linux/uhid.h > >@@ -130,7 +130,14 @@ struct uhid_create_req { > > __u8 name[128]; > > __u8 phys[64]; > > __u8 uniq[64]; > >+#if __riscv_xlen == 64 > >+ union { > >+ __u8 __user *rd_data; > >+ __u64 __rd_data; > >+ }; > >+#else > > __u8 __user *rd_data; > >+#endif > > __u16 rd_size; > > > > __u16 bus; > >diff --git a/include/uapi/linux/uio.h b/include/uapi/linux/uio.h > >index 649739e0c404..27dfd6032dc6 100644 > >--- a/include/uapi/linux/uio.h > >+++ b/include/uapi/linux/uio.h > >@@ -16,8 +16,19 @@ > > > > struct iovec > > { > >+#if __riscv_xlen == 64 > >+ union { > >+ void __user *iov_base; /* BSD uses caddr_t (1003.1g requires void *) */ > >+ __u64 __iov_base; > >+ }; > >+ union { > >+ __kernel_size_t iov_len; /* Must be size_t (1003.1g) */ > >+ __u64 __iov_len; > >+ }; > >+#else > > void __user *iov_base; /* BSD uses caddr_t (1003.1g requires void *) */ > > __kernel_size_t iov_len; /* Must be size_t (1003.1g) */ > >+#endif > > }; > > > > struct dmabuf_cmsg { > >diff --git a/include/uapi/linux/usb/tmc.h b/include/uapi/linux/usb/tmc.h > >index d791cc58a7f0..443ec5356caf 100644 > >--- a/include/uapi/linux/usb/tmc.h > >+++ b/include/uapi/linux/usb/tmc.h > >@@ -51,7 +51,14 @@ struct usbtmc_request { > > > > struct usbtmc_ctrlrequest { > > struct usbtmc_request req; > >+#if __riscv_xlen == 64 > >+ union { > >+ void __user *data; /* pointer to user space */ > >+ __u64 __data; /* pointer to user space */ > >+ }; > >+#else > > void __user *data; /* pointer to user space */ > >+#endif > > } __attribute__ ((packed)); > > > > struct usbtmc_termchar { > >@@ -70,7 +77,14 @@ struct usbtmc_message { > > __u32 transfer_size; /* size of bytes to transfer */ > > __u32 transferred; /* size of received/written bytes */ > > __u32 flags; /* bit 0: 0 = synchronous; 1 = asynchronous */ > >+#if __riscv_xlen == 64 > >+ union { > >+ void __user *message; /* pointer to header and data in user space */ > >+ __u64 __message; > >+ }; > >+#else > > void __user *message; /* pointer to header and data in user space */ > >+#endif > > } __attribute__ ((packed)); > > > > /* Request values for USBTMC driver's ioctl entry point */ > >diff --git a/include/uapi/linux/usbdevice_fs.h b/include/uapi/linux/usbdevice_fs.h > >index 74a84e02422a..8c8efef74c3c 100644 > >--- a/include/uapi/linux/usbdevice_fs.h > >+++ b/include/uapi/linux/usbdevice_fs.h > >@@ -44,14 +44,28 @@ struct usbdevfs_ctrltransfer { > > __u16 wIndex; > > __u16 wLength; > > __u32 timeout; /* in milliseconds */ > >+#if __riscv_xlen == 64 > >+ union { > >+ void __user *data; > >+ __u64 __data; > >+ }; > >+#else > > void __user *data; > >+#endif > > }; > > > > struct usbdevfs_bulktransfer { > > unsigned int ep; > > unsigned int len; > > unsigned int timeout; /* in milliseconds */ > >+#if __riscv_xlen == 64 > >+ union { > >+ void __user *data; > >+ __u64 __data; > >+ }; > >+#else > > void __user *data; > >+#endif > > }; > > > > struct usbdevfs_setinterface { > >@@ -61,7 +75,14 @@ struct usbdevfs_setinterface { > > > > struct usbdevfs_disconnectsignal { > > unsigned int signr; > >+#if __riscv_xlen == 64 > >+ union { > >+ void __user *context; > >+ __u64 __context; > >+ }; > >+#else > > void __user *context; > >+#endif > > }; > > > > #define USBDEVFS_MAXDRIVERNAME 255 > >@@ -119,7 +140,14 @@ struct usbdevfs_urb { > > unsigned char endpoint; > > int status; > > unsigned int flags; > >+#if __riscv_xlen == 64 > >+ union { > >+ void __user *buffer; > >+ __u64 __buffer; > >+ }; > >+#else > > void __user *buffer; > >+#endif > > int buffer_length; > > int actual_length; > > int start_frame; > >@@ -130,7 +158,14 @@ struct usbdevfs_urb { > > int error_count; > > unsigned int signr; /* signal to be sent on completion, > > or 0 if none should be sent. */ > >+#if __riscv_xlen == 64 > >+ union { > >+ void __user *usercontext; > >+ __u64 __usercontext; > >+ }; > >+#else > > void __user *usercontext; > >+#endif > > struct usbdevfs_iso_packet_desc iso_frame_desc[]; > > }; > > > >@@ -139,7 +174,14 @@ struct usbdevfs_ioctl { > > int ifno; /* interface 0..N ; negative numbers reserved */ > > int ioctl_code; /* MUST encode size + direction of data so the > > * macros in <asm/ioctl.h> give correct values */ > >+#if __riscv_xlen == 64 > >+ union { > >+ void __user *data; /* param buffer (in, or out) */ > >+ __u64 __pad; > >+ }; > >+#else > > void __user *data; /* param buffer (in, or out) */ > >+#endif > > }; > > > > /* You can do most things with hubs just through control messages, > >@@ -195,9 +237,17 @@ struct usbdevfs_streams { > > #define USBDEVFS_SUBMITURB _IOR('U', 10, struct usbdevfs_urb) > > #define USBDEVFS_SUBMITURB32 _IOR('U', 10, struct usbdevfs_urb32) > > #define USBDEVFS_DISCARDURB _IO('U', 11) > >+#if __riscv_xlen == 64 > >+#define USBDEVFS_REAPURB _IOW('U', 12, __u64) > >+#else > > #define USBDEVFS_REAPURB _IOW('U', 12, void *) > >+#endif > > #define USBDEVFS_REAPURB32 _IOW('U', 12, __u32) > >+#if __riscv_xlen == 64 > >+#define USBDEVFS_REAPURBNDELAY _IOW('U', 13, __u64) > >+#else > > #define USBDEVFS_REAPURBNDELAY _IOW('U', 13, void *) > >+#endif > > #define USBDEVFS_REAPURBNDELAY32 _IOW('U', 13, __u32) > > #define USBDEVFS_DISCSIGNAL _IOR('U', 14, struct usbdevfs_disconnectsignal) > > #define USBDEVFS_DISCSIGNAL32 _IOR('U', 14, struct usbdevfs_disconnectsignal32) > >diff --git a/include/uapi/linux/uvcvideo.h b/include/uapi/linux/uvcvideo.h > >index f86185456dc5..3ccb99039a43 100644 > >--- a/include/uapi/linux/uvcvideo.h > >+++ b/include/uapi/linux/uvcvideo.h > >@@ -54,7 +54,14 @@ struct uvc_xu_control_mapping { > > __u32 v4l2_type; > > __u32 data_type; > > > >+#if __riscv_xlen == 64 > >+ union { > >+ struct uvc_menu_info __user *menu_info; > >+ __u64 __menu_info; > >+ }; > >+#else > > struct uvc_menu_info __user *menu_info; > >+#endif > > __u32 menu_count; > > > > __u32 reserved[4]; > >@@ -66,7 +73,14 @@ struct uvc_xu_control_query { > > __u8 query; /* Video Class-Specific Request Code, */ > > /* defined in linux/usb/video.h A.8. */ > > __u16 size; > >+#if __riscv_xlen == 64 > >+ union { > >+ __u8 __user *data; > >+ __u64 __data; > >+ }; > >+#else > > __u8 __user *data; > >+#endif > > }; > > > > #define UVCIOC_CTRL_MAP _IOWR('u', 0x20, struct uvc_xu_control_mapping) > >diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h > >index c8dbf8219c4f..0a1dc2a780fb 100644 > >--- a/include/uapi/linux/vfio.h > >+++ b/include/uapi/linux/vfio.h > >@@ -1570,7 +1570,14 @@ struct vfio_iommu_type1_dma_map { > > struct vfio_bitmap { > > __u64 pgsize; /* page size for bitmap in bytes */ > > __u64 size; /* in bytes */ > >+ #if __riscv_xlen == 64 > >+ union { > >+ __u64 __user *data; /* one bit per page */ > >+ __u64 __data; > >+ }; > >+ #else > > __u64 __user *data; /* one bit per page */ > >+ #endif > > }; > > > > /** > >diff --git a/include/uapi/linux/videodev2.h b/include/uapi/linux/videodev2.h > >index e7c4dce39007..8e5391f07626 100644 > >--- a/include/uapi/linux/videodev2.h > >+++ b/include/uapi/linux/videodev2.h > >@@ -1898,7 +1898,14 @@ struct v4l2_ext_controls { > > __u32 error_idx; > > __s32 request_fd; > > __u32 reserved[1]; > >+#if __riscv_xlen == 64 > >+ union { > >+ struct v4l2_ext_control *controls; > >+ __u64 __controls; > >+ }; > >+#else > > struct v4l2_ext_control *controls; > >+#endif > > }; > > > > #define V4L2_CTRL_ID_MASK (0x0fffffff) > >-- > >2.40.1 > > > >
On Tue, Mar 25, 2025 at 9:18 PM Arnd Bergmann <arnd@arndb.de> wrote: > > On Tue, Mar 25, 2025, at 13:26, Peter Zijlstra wrote: > > On Tue, Mar 25, 2025 at 08:15:41AM -0400, guoren@kernel.org wrote: > >> From: "Guo Ren (Alibaba DAMO Academy)" <guoren@kernel.org> > >> > >> Since 2001, the CONFIG_64BIT kernel has been built with the LP64 ABI, > >> but this patchset allows the CONFIG_64BIT kernel to use an ILP32 ABI > > > > Please, don't do this. This adds a significant maintenance burden on all > > of us. > > It would be easier to this with CONFIG_64BIT disabled and continue > treating CONFIG_64BIT to be the same as BITS_PER_LONG=64, but I still > think it's fundamentally a bad idea to support this in mainline > kernels in any variation, other than supporting regular 32-bit > compat mode tasks on a regular 64-bit kernel. > > >> The patchset targets RISC-V and is built on the RV64ILP32 ABI, which > >> was introduced into RISC-V's psABI in January 2025 [1]. This patchset > >> equips an rv64ilp32-abi kernel with all the functionalities of a > >> traditional lp64-abi kernel, yet restricts the address space to 2GiB. > >> Hence, the rv64ilp32-abi kernel simultaneously supports lp64-abi > >> userspace and ilp32-abi (compat) userspace, the same as the > >> traditional lp64-abi kernel. > > You declare the syscall ABI to be the native 64-bit ABI, but this > is fundamentally not true because a many uapi structures are > defined in terms of 'long' or pointer values, in particular in > the ioctl call. I modified uapi with void __user *msg_name; -> union {void __user *msg_name; u64 __msg_name;}; to make native 64-bit ABI. I would look at compat stuff instead of using __riscv_xlen macro. > This might work for an rv64ilp32 userspace that > uses the same headers and the same types, but you explicitly > say that the goal is to run native rv64 or compat rv32 tasks, > not rv64ilp32 (thanks!). It's not for rv64ilp32-abi userspace, no rv64ilp32-abi userspace introduced in the patch set. It's for native lp64-abi. Let's discuss this in the first patch thread: uapi: Reuse lp64 ABI interface > > As far as I can tell, there is no way to rectify this design flaw > other than to drop support for 64-bit userspace and only support > regular rv32 userspace. I'm also skeptical that supporting rv64 > userspace helps in practice other than for testing, since > generally most memory overhead is in userspace rather than the > kernel, and there is much more to gain from shrinking the larger > userspace by running rv32 compat mode binaries on a 64-bit kernel > than the other way round. The lp64-abi userspace rootfs works fine in this patch set, which proves the technique is valid. But the modification on uapi is raw, and I'm looking at compat stuff. Supporting lp64-abi userspace is essential because riscv lp64-abi and ilp32-abi userspace are hybrid deployments when the target is ilp32-abi userspace. The lp64-abi provides a good supplement to ilp32-abi which eases the development. > > If you remove the CONFIG_64BIT changes that Peter mentioned and > the support for ilp64 userland from your series, you end up > with a kernel that is very similar to a native rv32 kernel > but executes as rv64ilp32 and runs rv32 userspace. I don't have > any objections to that approach, and the same thing has come > up on arm64 as a possible idea as well, but I don't know if > that actually brings any notable advantage over an rv32 kernel. > > Are there CPUs that can run rv64 kernels and rv32 userspace > but not rv32 kernels, similar to what we have on Arm Cortex-A76 > and Cortex-A510? Yes, there is, and it only supports rv32 userspace, not rv32 kernel. https://www.xrvm.com/product/xuantie/C908 Here are the products: https://developer.canaan-creative.com/k230_canmv/en/dev/userguide/boards/canmv_k230d.html http://riscv.org/ecosystem-news/2024/07/unpacking-the-canmv-k230-risc-v-board/
On Wed, Mar 26, 2025 at 4:41 AM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > On Tue, 25 Mar 2025 at 05:17, <guoren@kernel.org> wrote: > > > > The rv64ilp32 abi kernel accommodates the lp64 abi userspace and > > leverages the lp64 abi Linux interface. Hence, unify the > > BITS_PER_LONG = 32 memory layout to match BITS_PER_LONG = 64. > > No. > > This isn't happening. > > You can't do crazy things in the RISC-V code and then expect the rest > of the kernel to just go "ok, we'll do crazy things". > > We're not doing crazy __riscv_xlen hackery with random structures > containing 64-bit values that the kernel then only looks at the low 32 > bits. That's wrong on *so* many levels. > > I'm willing to say "big-endian is dead", but I'm not willing to accept > this kind of crazy hackery. > > Not today, not ever. > > If you want to run a ilp32 kernel on 64-bit hardware (and support > 64-bit ABI just in a 32-bit virtual memory size), I would suggest you > > (a) treat the kernel as natively 32-bit (obviously you can then tell > the compiler to use the rv64 instructions, which I presume you're > already doing - I didn't look) I used CONFIG_32BIT in v1 and v2, but I've abandoned them because, based on CONFIG_64BIT, I gain more functionality by inheriting the lp64-abi kernel. I want the full functionality of the CONFIG_64BIT Linux kernel, which can be equivalent, used interchangeably, and seamlessly. > > (b) look at making the compat stuff do the conversion the "wrong way". > > And btw, that (b) implies *not* just ignoring the high bits. If > user-space gives 64-bit pointer, you don't just treat it as a 32-bit > one by dropping the high bits. You add some logic to convert it to an > invalid pointer so that user space gets -EFAULT. Thanks for the advice. I'm looking at how to make the compat stuff.
On Wed, Mar 26, 2025, at 07:07, Guo Ren wrote: > On Tue, Mar 25, 2025 at 9:18 PM Arnd Bergmann <arnd@arndb.de> wrote: >> On Tue, Mar 25, 2025, at 13:26, Peter Zijlstra wrote: >> > On Tue, Mar 25, 2025 at 08:15:41AM -0400, guoren@kernel.org wrote: >> >> You declare the syscall ABI to be the native 64-bit ABI, but this >> is fundamentally not true because a many uapi structures are >> defined in terms of 'long' or pointer values, in particular in >> the ioctl call. > > I modified uapi with > void __user *msg_name; > -> > union {void __user *msg_name; u64 __msg_name;}; > to make native 64-bit ABI. > > I would look at compat stuff instead of using __riscv_xlen macro. The problem I see here is that there are many more drivers that you did not modify than drivers that you did change this way. The union is particularly ugly, but even if you find a nicer method of doing this, you now also put the burden on future driver writers to do this right for your platform. >> As far as I can tell, there is no way to rectify this design flaw >> other than to drop support for 64-bit userspace and only support >> regular rv32 userspace. I'm also skeptical that supporting rv64 >> userspace helps in practice other than for testing, since >> generally most memory overhead is in userspace rather than the >> kernel, and there is much more to gain from shrinking the larger >> userspace by running rv32 compat mode binaries on a 64-bit kernel >> than the other way round. > > The lp64-abi userspace rootfs works fine in this patch set, which > proves the technique is valid. But the modification on uapi is raw, > and I'm looking at compat stuff. There is a big difference between making it work for a particular set of userspace binaries and making it correct for the entire kernel ABI. I agree that limiting the hacks to the compat side while keeping the native ABI as ilp32 as in your previous versions is better, but I also don't think this can be easily done without major changes to how compat mode works in general, and that still seems like a show-stopper for two reasons: - it still puts the burden on driver writers to get it right for your platform. The scope is a bit smaller than in the current version because that would be limited to the compat handlers and not change the native codepath, but that's still a lot of drivers. - the way that I would imagine this to be implemented in practice would require changing the compat code in a way that allows multiple compat ABIs, so drivers can separate the normal 32-on-64 handling from the 64-on-32 version you need. We have discussed something like this in the past, but Linus has already made it very clear that he doesn't want it done that way. Whichever way you do it, this is unlikely to find consensus. > Supporting lp64-abi userspace is essential because riscv lp64-abi and > ilp32-abi userspace are hybrid deployments when the target is > ilp32-abi userspace. The lp64-abi provides a good supplement to > ilp32-abi which eases the development. I'm not following here, please clarify. I do understand that having a mixed 32/64 userspace can help for development, but that can already be done on a 64-bit kernel and it doesn't seem to be useful for deployment because having two sets of support libraries makes this counterproductive for the goal of saving RAM. >> If you remove the CONFIG_64BIT changes that Peter mentioned and >> the support for ilp64 userland from your series, you end up >> with a kernel that is very similar to a native rv32 kernel >> but executes as rv64ilp32 and runs rv32 userspace. I don't have >> any objections to that approach, and the same thing has come >> up on arm64 as a possible idea as well, but I don't know if >> that actually brings any notable advantage over an rv32 kernel. >> >> Are there CPUs that can run rv64 kernels and rv32 userspace >> but not rv32 kernels, similar to what we have on Arm Cortex-A76 >> and Cortex-A510? > > Yes, there is, and it only supports rv32 userspace, not rv32 kernel. > https://www.xrvm.com/product/xuantie/C908 Ok, thanks for the link. Arnd
On Wed, Mar 26, 2025 at 1:19 AM Sergey Shtylyov <s.shtylyov@omp.ru> wrote: > > On 3/25/25 3:16 PM, guoren@kernel.org wrote: > > > From: "Guo Ren (Alibaba DAMO Academy)" <guoren@kernel.org> > > > > The rv64ilp32 abi reuses the env and argv memory layout of the > > lp64 abi, so leave the space to fit the lp64 struct layout. > > > > Signed-off-by: Guo Ren (Alibaba DAMO Academy) <guoren@kernel.org> > > --- > > fs/exec.c | 4 ++++ > > 1 file changed, 4 insertions(+) > > > > diff --git a/fs/exec.c b/fs/exec.c > > index 506cd411f4ac..548d18b7ae92 100644 > > --- a/fs/exec.c > > +++ b/fs/exec.c > > @@ -424,6 +424,10 @@ static const char __user *get_user_arg_ptr(struct user_arg_ptr argv, int nr) > > } > > #endif > > > > +#if defined(CONFIG_64BIT) && (BITS_PER_LONG == 32) okay, #if defined(CONFIG_64BIT) && BITS_PER_LONG == 32 > > Parens don't seem necessary... > > > + nr = nr * 2; > > Why not nr *= 2? okay, nr *= 2;
On Wed, Mar 26, 2025 at 3:10 AM Liam R. Howlett <Liam.Howlett@oracle.com> wrote: > > * guoren@kernel.org <guoren@kernel.org> [250325 08:24]: > > From: "Guo Ren (Alibaba DAMO Academy)" <guoren@kernel.org> > > > > The Maple tree algorithm uses ulong type for each element. The > > number of slots is based on BITS_PER_LONG for RV64ILP32 ABI, so > > use BITS_PER_LONG instead of CONFIG_64BIT. > > > > Signed-off-by: Guo Ren (Alibaba DAMO Academy) <guoren@kernel.org> > > --- > > include/linux/maple_tree.h | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h > > index cbbcd18d4186..ff6265b6468b 100644 > > --- a/include/linux/maple_tree.h > > +++ b/include/linux/maple_tree.h > > @@ -24,7 +24,7 @@ > > * > > * Nodes in the tree point to their parent unless bit 0 is set. > > */ > > -#if defined(CONFIG_64BIT) || defined(BUILD_VDSO32_64) > > +#if (BITS_PER_LONG == 64) || defined(BUILD_VDSO32_64) > > This will break my userspace testing, if you do not update the testing as > well. This can be found in tools/testing/radix-tree. Please also look > at the Makefile as well since it will generate a build flag for the > userspace. I think you are talking about the following: ============================================================ ../shared/shared.mk: ifndef LONG_BIT LONG_BIT := $(shell getconf LONG_BIT) endif generated/bit-length.h: FORCE @mkdir -p generated @if ! grep -qws CONFIG_$(LONG_BIT)BIT generated/bit-length.h; then \ echo "Generating $@"; \ echo "#define CONFIG_$(LONG_BIT)BIT 1" > $@; \ echo "#define CONFIG_PHYS_ADDR_T_$(LONG_BIT)BIT 1" >> $@; \ fi $ grep CONFIG_64BIT * -r -A 2 generated/bit-length.h:#define CONFIG_64BIT 1 generated/bit-length.h-#define CONFIG_PHYS_ADDR_T_64BIT 1 -- maple.c:#if defined(CONFIG_64BIT) maple.c-static noinline void __init check_erase2_testset(struct maple_tree *mt, maple.c- const unsigned long *set, unsigned long size) -- maple.c:#if CONFIG_64BIT maple.c- MT_BUG_ON(mt, data_end != mas_data_end(&mas)); maple.c-#endif -- maple.c:#if CONFIG_64BIT maple.c- MT_BUG_ON(mt, data_end - 2 != mas_data_end(&mas)); maple.c-#endif -- maple.c:#if CONFIG_64BIT maple.c- MT_BUG_ON(mt, data_end - 4 != mas_data_end(&mas)); maple.c-#endif -- maple.c:#if defined(CONFIG_64BIT) maple.c- /* Captures from VMs that found previous errors */ maple.c- mt_init_flags(&tree, 0); ============================================================ First, we don't introduce rv64ilp32-abi user space, which means these testing codes can't run on rv64ilp32-abi userspace currently. So, the problem you mentioned doesn't exist. Second, CONFIG_32BIT is determined by LONG_BIT, so there's no issue in maple.c with future rv64ilp32-abi userspace. That means rv64ilp32-abi userspace would use CONFIG_32BIT to test radix-tree. It's okay. > > This raises other concerns as the code is found with a grep command, so > I'm not sure why it was missed and if anything else is missed? > > If you consider this email to be the (unasked) question about what to do > here, then please CC me, the maintainer of the files including the one > you are updating here. > > Thank you, > Liam >
On Wed, Mar 26, 2025 at 2:56 PM Arnd Bergmann <arnd@arndb.de> wrote: > > On Wed, Mar 26, 2025, at 07:07, Guo Ren wrote: > > On Tue, Mar 25, 2025 at 9:18 PM Arnd Bergmann <arnd@arndb.de> wrote: > >> On Tue, Mar 25, 2025, at 13:26, Peter Zijlstra wrote: > >> > On Tue, Mar 25, 2025 at 08:15:41AM -0400, guoren@kernel.org wrote: > >> > >> You declare the syscall ABI to be the native 64-bit ABI, but this > >> is fundamentally not true because a many uapi structures are > >> defined in terms of 'long' or pointer values, in particular in > >> the ioctl call. > > > > I modified uapi with > > void __user *msg_name; > > -> > > union {void __user *msg_name; u64 __msg_name;}; > > to make native 64-bit ABI. > > > > I would look at compat stuff instead of using __riscv_xlen macro. > > The problem I see here is that there are many more drivers > that you did not modify than drivers that you did change this > way. The union is particularly ugly, but even if you find > a nicer method of doing this, you now also put the burden > on future driver writers to do this right for your platform. Got it. > > >> As far as I can tell, there is no way to rectify this design flaw > >> other than to drop support for 64-bit userspace and only support > >> regular rv32 userspace. I'm also skeptical that supporting rv64 > >> userspace helps in practice other than for testing, since > >> generally most memory overhead is in userspace rather than the > >> kernel, and there is much more to gain from shrinking the larger > >> userspace by running rv32 compat mode binaries on a 64-bit kernel > >> than the other way round. > > > > The lp64-abi userspace rootfs works fine in this patch set, which > > proves the technique is valid. But the modification on uapi is raw, > > and I'm looking at compat stuff. > > There is a big difference between making it work for a particular > set of userspace binaries and making it correct for the entire > kernel ABI. > > I agree that limiting the hacks to the compat side while keeping > the native ABI as ilp32 as in your previous versions is better, > but I also don't think this can be easily done without major > changes to how compat mode works in general, and that still > seems like a show-stopper for two reasons: > > - it still puts the burden on driver writers to get it right > for your platform. The scope is a bit smaller than in the > current version because that would be limited to the compat > handlers and not change the native codepath, but that's > still a lot of drivers. > > - the way that I would imagine this to be implemented in > practice would require changing the compat code in a way that > allows multiple compat ABIs, so drivers can separate the > normal 32-on-64 handling from the 64-on-32 version you need. > We have discussed something like this in the past, but Linus > has already made it very clear that he doesn't want it done > that way. Whichever way you do it, this is unlikely to > find consensus. Got it, thanks for analysing. > > > Supporting lp64-abi userspace is essential because riscv lp64-abi and > > ilp32-abi userspace are hybrid deployments when the target is > > ilp32-abi userspace. The lp64-abi provides a good supplement to > > ilp32-abi which eases the development. > > I'm not following here, please clarify. I do understand that > having a mixed 32/64 userspace can help for development, but > that can already be done on a 64-bit kernel and it doesn't > seem to be useful for deployment because having two sets of > support libraries makes this counterproductive for the goal > of saving RAM. In my case, most binaries and libraries are based on 32-bit, but a small part would remain on 64-bit, which may be statically linked. For RISC-V, the rv64 ecosystem is more complete than the rv32's. So, rv64-abi is always necessary, and rv32-abi is a supplement. > > >> If you remove the CONFIG_64BIT changes that Peter mentioned and > >> the support for ilp64 userland from your series, you end up > >> with a kernel that is very similar to a native rv32 kernel > >> but executes as rv64ilp32 and runs rv32 userspace. I don't have > >> any objections to that approach, and the same thing has come > >> up on arm64 as a possible idea as well, but I don't know if > >> that actually brings any notable advantage over an rv32 kernel. > >> > >> Are there CPUs that can run rv64 kernels and rv32 userspace > >> but not rv32 kernels, similar to what we have on Arm Cortex-A76 > >> and Cortex-A510? > > > > Yes, there is, and it only supports rv32 userspace, not rv32 kernel. > > https://www.xrvm.com/product/xuantie/C908 > > Ok, thanks for the link. > > Arnd >
On Tue, 25 Mar 2025 13:41:30 PDT (-0700), Linus Torvalds wrote: > On Tue, 25 Mar 2025 at 05:17, <guoren@kernel.org> wrote: >> >> The rv64ilp32 abi kernel accommodates the lp64 abi userspace and >> leverages the lp64 abi Linux interface. Hence, unify the >> BITS_PER_LONG = 32 memory layout to match BITS_PER_LONG = 64. > > No. > > This isn't happening. > > You can't do crazy things in the RISC-V code and then expect the rest > of the kernel to just go "ok, we'll do crazy things". > > We're not doing crazy __riscv_xlen hackery with random structures > containing 64-bit values that the kernel then only looks at the low 32 > bits. That's wrong on *so* many levels. FWIW: this has come up a few times and we've generally said "nobody wants this", but that doesn't seem to stick... > I'm willing to say "big-endian is dead", but I'm not willing to accept > this kind of crazy hackery. > > Not today, not ever. OK, maybe that will stick ;) > If you want to run a ilp32 kernel on 64-bit hardware (and support > 64-bit ABI just in a 32-bit virtual memory size), I would suggest you > > (a) treat the kernel as natively 32-bit (obviously you can then tell > the compiler to use the rv64 instructions, which I presume you're > already doing - I didn't look) > > (b) look at making the compat stuff do the conversion the "wrong way". > > And btw, that (b) implies *not* just ignoring the high bits. If > user-space gives 64-bit pointer, you don't just treat it as a 32-bit > one by dropping the high bits. You add some logic to convert it to an > invalid pointer so that user space gets -EFAULT. > > Linus
On Tue, 25 Mar 2025 08:15:41 -0400 guoren@kernel.org wrote: > From: "Guo Ren (Alibaba DAMO Academy)" <guoren@kernel.org> > > Since 2001, the CONFIG_64BIT kernel has been built with the LP64 ABI, > but this patchset allows the CONFIG_64BIT kernel to use an ILP32 ABI > for construction to reduce cache & memory footprint (Compared to > kernel-lp64-abi, kernel-rv64ilp32-abi decreased the used memory by > about 20%, as shown in "free -h" in the following demo.) ... Why on earth would you want to run a 64bit application on a 32bit kernel. IIRC the main justification for 64bit was to get a larger address space. Now you might want to compile a 32bit (ILP32) system that actually runs in 64bit mode (c/f x32) so that 64bit maths (long long) is more efficient - but that is a different issue. (I suspect you'd need to change the process switch code to save all 64bits of the registers - but maybe not much else??) David