Message ID | 1450215766-14765-13-git-send-email-ynorov@caviumnetworks.com |
---|---|
State | New |
Headers | show |
On Wednesday 16 December 2015 00:42:38 Yury Norov wrote: > +/* Using non-compat syscalls where necessary */ > +#define compat_sys_fadvise64_64 sys_fadvise64_64 > +#define compat_sys_fallocate sys_fallocate > +#define compat_sys_ftruncate64 sys_ftruncate > +#define compat_sys_lookup_dcookie sys_lookup_dcookie > +#define compat_sys_pread64 sys_pread64 > +#define compat_sys_pwrite64 sys_pwrite64 > +#define compat_sys_readahead sys_readahead > +#define compat_sys_shmat sys_shmat > +#define compat_sys_sigaltstack sys_sigaltstack > +#define compat_sys_sync_file_range sys_sync_file_range > +#define compat_sys_truncate64 sys_truncate > +#define sys_llseek sys_lseek > + > +#define compat_sys_open_by_handle_at sys_open_by_handle_at > +#define compat_sys_openat sys_openat Very nice and short list now! I've double-checked all calls and the only one I'm not sure about is sys_sigaltstack. Did we discuss that one earlier? My first guess would be that it's easier to use the compat version of that. I would probably also group sys_shmat with sys_openat, because it's different from the other ones in the first block as this is not about passing 64-bit arguments as registers, but instead it's about the behavior of the system call. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
On Wed, Dec 16, 2015 at 12:42:38AM +0300, Yury Norov wrote: > +/* Using non-compat syscalls where necessary */ > +#define compat_sys_fadvise64_64 sys_fadvise64_64 > +#define compat_sys_fallocate sys_fallocate > +#define compat_sys_ftruncate64 sys_ftruncate I initially thought this should be sys_ftruncate64 (or a wrapper to pass small == 0) but we rely on sys_openat to set O_LARGEFILE. arch/arm has ftruncate and ftruncate64, but it looks like we route both via sys_ftruncate(). The difference is the "small" argument which imposes a limit on the length without O_LARGEFILE, so we may have a bug here. > +#define compat_sys_lookup_dcookie sys_lookup_dcookie > +#define compat_sys_pread64 sys_pread64 > +#define compat_sys_pwrite64 sys_pwrite64 > +#define compat_sys_readahead sys_readahead > +#define compat_sys_shmat sys_shmat I wonder whether we need wrappers (actually, not only for these but sys_read etc.). These functions take either a pointer or a size_t argument which are 32-bit with ILP32 but treated as 64-bit by an LP64 kernel. Can we guarantee that user space zeros the top 32-bit of the arguments passed here? With compat/AArch32, this is guaranteed by the kernel since EL0 won't be able to touch the top part but here I'm not entirely sure. As long as user space used Wn registers for 32-bit types, we are probably fine (the architecture guarantees the top 32-bit zeroing following a MOV, LDR etc. instruction into a Wn register). We just need to mention this in the ABI document (ilp32.txt). > +#define compat_sys_sigaltstack sys_sigaltstack I think Arnd is right here in using the compat function. The stack_t would differ between LP64 and ILP32. compat_sys_sigaltstack() uses compat_user_stack_pointer() but this should work correctly as it checks pt_regs for the right mode. > +#define compat_sys_sync_file_range sys_sync_file_range > +#define compat_sys_truncate64 sys_truncate > +#define sys_llseek sys_lseek I think this makes sense since we have 64-bit registers. > + > +#define compat_sys_open_by_handle_at sys_open_by_handle_at > +#define compat_sys_openat sys_openat So using sys_openat() forces O_LARGEFILE and we don't have a problem with (f)truncate. We may have an issue with AArch32 compat though. -- Catalin -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
On Thursday 17 December 2015 18:27:53 Catalin Marinas wrote: > On Wed, Dec 16, 2015 at 12:42:38AM +0300, Yury Norov wrote: > > +#define compat_sys_lookup_dcookie sys_lookup_dcookie > > +#define compat_sys_pread64 sys_pread64 > > +#define compat_sys_pwrite64 sys_pwrite64 > > +#define compat_sys_readahead sys_readahead > > +#define compat_sys_shmat sys_shmat > > I wonder whether we need wrappers (actually, not only for these but > sys_read etc.). These functions take either a pointer or a size_t > argument which are 32-bit with ILP32 but treated as 64-bit by an LP64 > kernel. Can we guarantee that user space zeros the top 32-bit of the > arguments passed here? I'm pretty sure that is safe. I haven't read the calling conventions specification for arm64 ilp32, but usually all function arguments are passed as 64-bit registers with proper sign-extend or zero-extend. Most other syscalls rely on this behavior too, not just the ones that are being modified here. > With compat/AArch32, this is guaranteed by the kernel since EL0 won't be > able to touch the top part but here I'm not entirely sure. As long as > user space used Wn registers for 32-bit types, we are probably fine (the > architecture guarantees the top 32-bit zeroing following a MOV, LDR etc. > instruction into a Wn register). We just need to mention this in the ABI > document (ilp32.txt). I think the aarch32 case is actually the hard one, because it has to worry about explicitly sign-extending 32-bit arguments (signed int or signed long) that might be negative, e.g. user space passes -1 as 0xffffffff, which the kernel entry turns into 0x00000000ffffffff when it should use 0xffffffffffffffff. The COMPAT_SYSCALL_DEFINEx macros take care of this. > > + > > +#define compat_sys_open_by_handle_at sys_open_by_handle_at > > +#define compat_sys_openat sys_openat > > So using sys_openat() forces O_LARGEFILE and we don't have a problem > with (f)truncate. We may have an issue with AArch32 compat though. aarch32 uses the correct compat functions in asm/unistd32.h Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
On Thu, Dec 17, 2015 at 12:10 PM, Arnd Bergmann <arnd@arndb.de> wrote: > On Thursday 17 December 2015 18:27:53 Catalin Marinas wrote: >> On Wed, Dec 16, 2015 at 12:42:38AM +0300, Yury Norov wrote: > >> > +#define compat_sys_lookup_dcookie sys_lookup_dcookie >> > +#define compat_sys_pread64 sys_pread64 >> > +#define compat_sys_pwrite64 sys_pwrite64 >> > +#define compat_sys_readahead sys_readahead >> > +#define compat_sys_shmat sys_shmat >> >> I wonder whether we need wrappers (actually, not only for these but >> sys_read etc.). These functions take either a pointer or a size_t >> argument which are 32-bit with ILP32 but treated as 64-bit by an LP64 >> kernel. Can we guarantee that user space zeros the top 32-bit of the >> arguments passed here? > > I'm pretty sure that is safe. I haven't read the calling conventions > specification for arm64 ilp32, but usually all function arguments are > passed as 64-bit registers with proper sign-extend or zero-extend. Well (just like LP64 on AARCH64), when passing a 32bit value to a function, the upper 32bits are undefined. I ran into this when I was debugging the GCC go library on ILP32 (though reproduced with pure C code) and the assembly functions inside glibc where pointers are passed with the upper 32bits as undefined. So we have an issue if called with syscall function or using pure assembly to create the syscall functions (which glibc does). Thanks, Andrew > > Most other syscalls rely on this behavior too, not just the ones that > are being modified here. > >> With compat/AArch32, this is guaranteed by the kernel since EL0 won't be >> able to touch the top part but here I'm not entirely sure. As long as >> user space used Wn registers for 32-bit types, we are probably fine (the >> architecture guarantees the top 32-bit zeroing following a MOV, LDR etc. >> instruction into a Wn register). We just need to mention this in the ABI >> document (ilp32.txt). > > I think the aarch32 case is actually the hard one, because it has to > worry about explicitly sign-extending 32-bit arguments (signed int or > signed long) that might be negative, e.g. user space passes -1 > as 0xffffffff, which the kernel entry turns into 0x00000000ffffffff > when it should use 0xffffffffffffffff. The COMPAT_SYSCALL_DEFINEx > macros take care of this. > >> > + >> > +#define compat_sys_open_by_handle_at sys_open_by_handle_at >> > +#define compat_sys_openat sys_openat >> >> So using sys_openat() forces O_LARGEFILE and we don't have a problem >> with (f)truncate. We may have an issue with AArch32 compat though. > > aarch32 uses the correct compat functions in asm/unistd32.h > > Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
On Thursday 17 December 2015 12:14:20 Andrew Pinski wrote: > On Thu, Dec 17, 2015 at 12:10 PM, Arnd Bergmann <arnd@arndb.de> wrote: > > On Thursday 17 December 2015 18:27:53 Catalin Marinas wrote: > >> On Wed, Dec 16, 2015 at 12:42:38AM +0300, Yury Norov wrote: > > > >> > +#define compat_sys_lookup_dcookie sys_lookup_dcookie > >> > +#define compat_sys_pread64 sys_pread64 > >> > +#define compat_sys_pwrite64 sys_pwrite64 > >> > +#define compat_sys_readahead sys_readahead > >> > +#define compat_sys_shmat sys_shmat > >> > >> I wonder whether we need wrappers (actually, not only for these but > >> sys_read etc.). These functions take either a pointer or a size_t > >> argument which are 32-bit with ILP32 but treated as 64-bit by an LP64 > >> kernel. Can we guarantee that user space zeros the top 32-bit of the > >> arguments passed here? > > > > I'm pretty sure that is safe. I haven't read the calling conventions > > specification for arm64 ilp32, but usually all function arguments are > > passed as 64-bit registers with proper sign-extend or zero-extend. > > Well (just like LP64 on AARCH64), when passing a 32bit value to a > function, the upper 32bits are undefined. I ran into this when I was > debugging the GCC go library on ILP32 (though reproduced with pure C > code) and the assembly functions inside glibc where pointers are > passed with the upper 32bits as undefined. > So we have an issue if called with syscall function or using pure > assembly to create the syscall functions (which glibc does). Ok, I see :-( So the calling conventions avoid the problem of being able to set the upper bits from malicious user space when the kernel assumes they are zeroed out (we had security bugs in this area, before we introduced SYSCALL_DEFINEx()), but it means that we need wrappers around each syscall that takes an argument that is different length between user and kernel space (as Catalin guessed). arch/s390 has the same problem and works around it with code in arch/s390/kernel/compat_wrapper.c, while other architectures (at least powerpc, x86 and tile IIRC, don't know much about mips, parisc and sparc) don't have the problem because of their calling conventions. This also means that we cannot work around it in glibc at all, because we have to be able to handle malicious user space, so it has to be done in the kernel using something similar to what s390 does. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
(cc'ing Marcus for more insight on the tools side) On Thu, Dec 17, 2015 at 12:14:20PM -0800, Andrew Pinski wrote: > On Thu, Dec 17, 2015 at 12:10 PM, Arnd Bergmann <arnd@arndb.de> wrote: > > On Thursday 17 December 2015 18:27:53 Catalin Marinas wrote: > >> On Wed, Dec 16, 2015 at 12:42:38AM +0300, Yury Norov wrote: > > > >> > +#define compat_sys_lookup_dcookie sys_lookup_dcookie > >> > +#define compat_sys_pread64 sys_pread64 > >> > +#define compat_sys_pwrite64 sys_pwrite64 > >> > +#define compat_sys_readahead sys_readahead > >> > +#define compat_sys_shmat sys_shmat > >> > >> I wonder whether we need wrappers (actually, not only for these but > >> sys_read etc.). These functions take either a pointer or a size_t > >> argument which are 32-bit with ILP32 but treated as 64-bit by an LP64 > >> kernel. Can we guarantee that user space zeros the top 32-bit of the > >> arguments passed here? > > > > I'm pretty sure that is safe. I haven't read the calling conventions > > specification for arm64 ilp32, but usually all function arguments are > > passed as 64-bit registers with proper sign-extend or zero-extend. > > Well (just like LP64 on AARCH64), when passing a 32bit value to a > function, the upper 32bits are undefined. I ran into this when I was > debugging the GCC go library on ILP32 (though reproduced with pure C > code) and the assembly functions inside glibc where pointers are > passed with the upper 32bits as undefined. > So we have an issue if called with syscall function or using pure > assembly to create the syscall functions (which glibc does). I think the ILP32 syscall ABI should follow the PCS convention where the top 32-bit of a register is not guaranteed 0 when the size of the argument is 32-bit. So take the read(2) syscall: ssize_t read(int fd, void *buf, size_t count); From the ILP32 code perspective, void * and size_t are both 32-bit. It would call into the kernel leaving the top 32-bit as undefined (if we follow the PCS). Normally, calling a function with the same size arguments is not a problem since the compiler generates the callee code accordingly. However, we route the syscall directly into the native sys_read() where void * and size_t are 64-bit with the top 32-bit left undefined. We have three options here: 1. Always follow PCS convention across user/kernel call and add wrappers in the kernel (preferred) 2. Follow the PCS up to glibc and get glibc to zero the top part (not always safe with hand-written assembly, though we already do this for AArch32 where the PCS only specifies 4 arguments in registers, the rest go on the stack) 3. Follow the PCS up to glibc but always pass syscall arguments in W registers, like AArch32 compat support (the least preferred option, the only advantage is a single wrapper for all syscalls but it would be doing unnecessary zeroing even for syscalls where it isn't needed) My preference, as stated above, is (1). You can write the wrappers in C directly and let the compiler upgrade the types when calling the native syscall. But any other option would be fine (take some inspiration from other architectures). Unfortunately we don't have COMPAT_SYSCALL_DEFINE for all functions that we need to wrap, it would have been easier (so we need to add them but probably in the arch/arm64 code). -- Catalin -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
On Friday 18 December 2015 11:42:19 Catalin Marinas wrote: > On Thu, Dec 17, 2015 at 12:14:20PM -0800, Andrew Pinski wrote: > > Well (just like LP64 on AARCH64), when passing a 32bit value to a > > function, the upper 32bits are undefined. I ran into this when I was > > debugging the GCC go library on ILP32 (though reproduced with pure C > > code) and the assembly functions inside glibc where pointers are > > passed with the upper 32bits as undefined. > > So we have an issue if called with syscall function or using pure > > assembly to create the syscall functions (which glibc does). > > I think the ILP32 syscall ABI should follow the PCS convention where the > top 32-bit of a register is not guaranteed 0 when the size of the > argument is 32-bit. So take the read(2) syscall: > > ssize_t read(int fd, void *buf, size_t count); > > From the ILP32 code perspective, void * and size_t are both 32-bit. It > would call into the kernel leaving the top 32-bit as undefined (if we > follow the PCS). Normally, calling a function with the same size > arguments is not a problem since the compiler generates the callee code > accordingly. However, we route the syscall directly into the native > sys_read() where void * and size_t are 64-bit with the top 32-bit left > undefined. > > We have three options here: > > 1. Always follow PCS convention across user/kernel call and add wrappers > in the kernel (preferred) Yes, I also think this is best. > 2. Follow the PCS up to glibc and get glibc to zero the top part (not > always safe with hand-written assembly, though we already do this for > AArch32 where the PCS only specifies 4 arguments in registers, the > rest go on the stack) I assume this needs special handling for syscalls with 64-bit arguments in both glibc and kernel. > 3. Follow the PCS up to glibc but always pass syscall arguments in W > registers, like AArch32 compat support (the least preferred option, > the only advantage is a single wrapper for all syscalls but it would > be doing unnecessary zeroing even for syscalls where it isn't needed) This would mean we cannot pass 64-bit arguments in registers, right? > My preference, as stated above, is (1). You can write the wrappers in C > directly and let the compiler upgrade the types when calling the native > syscall. But any other option would be fine (take some inspiration from > other architectures). Unfortunately we don't have COMPAT_SYSCALL_DEFINE > for all functions that we need to wrap, it would have been easier (so we > need to add them but probably in the arch/arm64 code). It would be nice to have that code architecture-independent, so we can share it with s390 and only need to update one place when new syscalls get added. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
On Fri, Dec 18, 2015 at 01:47:55PM +0100, Arnd Bergmann wrote: > On Friday 18 December 2015 11:42:19 Catalin Marinas wrote: > > On Thu, Dec 17, 2015 at 12:14:20PM -0800, Andrew Pinski wrote: > > > Well (just like LP64 on AARCH64), when passing a 32bit value to a > > > function, the upper 32bits are undefined. I ran into this when I was > > > debugging the GCC go library on ILP32 (though reproduced with pure C > > > code) and the assembly functions inside glibc where pointers are > > > passed with the upper 32bits as undefined. > > > So we have an issue if called with syscall function or using pure > > > assembly to create the syscall functions (which glibc does). > > > > I think the ILP32 syscall ABI should follow the PCS convention where the > > top 32-bit of a register is not guaranteed 0 when the size of the > > argument is 32-bit. So take the read(2) syscall: > > > > ssize_t read(int fd, void *buf, size_t count); > > > > From the ILP32 code perspective, void * and size_t are both 32-bit. It > > would call into the kernel leaving the top 32-bit as undefined (if we > > follow the PCS). Normally, calling a function with the same size > > arguments is not a problem since the compiler generates the callee code > > accordingly. However, we route the syscall directly into the native > > sys_read() where void * and size_t are 64-bit with the top 32-bit left > > undefined. > > > > We have three options here: > > > > 1. Always follow PCS convention across user/kernel call and add wrappers > > in the kernel (preferred) > > Yes, I also think this is best. > > > 2. Follow the PCS up to glibc and get glibc to zero the top part (not > > always safe with hand-written assembly, though we already do this for > > AArch32 where the PCS only specifies 4 arguments in registers, the > > rest go on the stack) > > I assume this needs special handling for syscalls with 64-bit arguments > in both glibc and kernel. I think glibc only should suffice, if it is its responsibility to zero the top 32-bit part. > > 3. Follow the PCS up to glibc but always pass syscall arguments in W > > registers, like AArch32 compat support (the least preferred option, > > the only advantage is a single wrapper for all syscalls but it would > > be doing unnecessary zeroing even for syscalls where it isn't needed) > > This would mean we cannot pass 64-bit arguments in registers, right? Not in a single register but two (like we do on AArch32). > > My preference, as stated above, is (1). You can write the wrappers in C > > directly and let the compiler upgrade the types when calling the native > > syscall. But any other option would be fine (take some inspiration from > > other architectures). Unfortunately we don't have COMPAT_SYSCALL_DEFINE > > for all functions that we need to wrap, it would have been easier (so we > > need to add them but probably in the arch/arm64 code). > > It would be nice to have that code architecture-independent, so we can > share it with s390 and only need to update one place when new syscalls > get added. We could indeed move things like: COMPAT_SYSCALL_DEFINE3(s390_read, unsigned int, fd, char __user *, buf, compat_size_t, count) to the core code and share them between s390 and arm64/ILP32. So let's stick to option 1. -- Catalin -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
> On 18 Dec 2015, at 13:47, Arnd Bergmann <arnd@arndb.de> wrote: > >> 3. Follow the PCS up to glibc but always pass syscall arguments in W >> registers, like AArch32 compat support (the least preferred option, >> the only advantage is a single wrapper for all syscalls but it would >> be doing unnecessary zeroing even for syscalls where it isn't needed) > > This would mean we cannot pass 64-bit arguments in registers, right? Note that there’s no 32bit registers (the ‘w’-form always refers to the lower 32bits of a 64bit register, with implicit zero-extension)… and load/store instructions always use the full base-register (‘x’-form) for address calculation. I.e. a load/store would inadvertently pickup “random garbage” in the upper 32bits, if no explicit zero-extension is applied. In other words: all zero-extensions for 32bit arguments should be explicit on the kernel side. Regards, Philipp. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
On Monday 21 December 2015, Dr. Philipp Tomsich wrote: > > > On 18 Dec 2015, at 13:47, Arnd Bergmann <arnd@arndb.de> wrote: > > > >> 3. Follow the PCS up to glibc but always pass syscall arguments in W > >> registers, like AArch32 compat support (the least preferred option, > >> the only advantage is a single wrapper for all syscalls but it would > >> be doing unnecessary zeroing even for syscalls where it isn't needed) > > > > This would mean we cannot pass 64-bit arguments in registers, right? > > Note that there’s no 32bit registers (the ‘w’-form always refers to the lower > 32bits of a 64bit register, with implicit zero-extension)… and load/store > instructions always use the full base-register (‘x’-form) for address calculation. > I.e. a load/store would inadvertently pickup “random garbage” in the upper > 32bits, if no explicit zero-extension is applied. > > In other words: all zero-extensions for 32bit arguments should be explicit > on the kernel side. I think that is what Catalin meant with the single wrapper in the description: the kernel always zeroes out the upper 32 bits in the initial trap, and then we only need to do sign-extensions for the few cases that need it, and pass 64-bit arguments in two lower halves. In contrast, approach 1 adds a separate wrapper for each syscall that takes care of both sign-extending where necessary and zero-extending all othe 32-bit arguments but not the 64-bit arguments. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
On Monday 21 December 2015, Catalin Marinas wrote: > On Fri, Dec 18, 2015 at 01:47:55PM +0100, Arnd Bergmann wrote: > > On Friday 18 December 2015 11:42:19 Catalin Marinas wrote: > > > 2. Follow the PCS up to glibc and get glibc to zero the top part (not > > > always safe with hand-written assembly, though we already do this for > > > AArch32 where the PCS only specifies 4 arguments in registers, the > > > rest go on the stack) > > > > I assume this needs special handling for syscalls with 64-bit arguments > > in both glibc and kernel. > > I think glibc only should suffice, if it is its responsibility to zero > the top 32-bit part. The kernel still needs to know about whether to call e.g. sys_llseek or sys_lseek. The default syscall table contains llseek for 32-bit architectures, but the current patch set uses lseek because that makes more sense when you have 64-bit registers. > > > 3. Follow the PCS up to glibc but always pass syscall arguments in W > > > registers, like AArch32 compat support (the least preferred option, > > > the only advantage is a single wrapper for all syscalls but it would > > > be doing unnecessary zeroing even for syscalls where it isn't needed) > > > > This would mean we cannot pass 64-bit arguments in registers, right? > > Not in a single register but two (like we do on AArch32). Yes, that's what I mean. Essentially we'd use the unmodified 32-bit API here. > > > My preference, as stated above, is (1). You can write the wrappers in C > > > directly and let the compiler upgrade the types when calling the native > > > syscall. But any other option would be fine (take some inspiration from > > > other architectures). Unfortunately we don't have COMPAT_SYSCALL_DEFINE > > > for all functions that we need to wrap, it would have been easier (so we > > > need to add them but probably in the arch/arm64 code). > > > > It would be nice to have that code architecture-independent, so we can > > share it with s390 and only need to update one place when new syscalls > > get added. > > We could indeed move things like: > > COMPAT_SYSCALL_DEFINE3(s390_read, unsigned int, fd, char __user *, buf, compat_size_t, count) > > to the core code and share them between s390 and arm64/ILP32. So let's > stick to option 1. Ok. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
On Tuesday 15 December 2015, Yury Norov wrote: > + > +#define compat_sys_open_by_handle_at sys_open_by_handle_at > +#define compat_sys_openat sys_openat > + One more thing I just remembered: I think we want this behavior for all new 32-bit architectures, it was a bug to call compat_sys_openat for the generic syscall table, as we don't support 32-bit off_t. Could you split this out into a separate patch that does these changes: - change the default asm-generic/unistd.h to use sys_openat/sys_open_by_handle_at - change tile to override those two to keep the current (suboptimal) behavior - change the force_o_largefile() definition so it defaults to true for all future architectures. The easiest way is probably to add a Kconfig symbol for this that gets selected by all 32-bit architectures, so we can use #define force_o_largefile() ((BITS_PER_LONG != 32) || !IS_ENABLED(CONFIG_ARCH_32BIT_OFF_T)) Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
On Mon, Dec 21, 2015 at 11:31:57PM +0100, Arnd Bergmann wrote: > On Tuesday 15 December 2015, Yury Norov wrote: > > + > > +#define compat_sys_open_by_handle_at sys_open_by_handle_at > > +#define compat_sys_openat sys_openat > > + > > One more thing I just remembered: I think we want this behavior for all new > 32-bit architectures, it was a bug to call compat_sys_openat for the generic > syscall table, as we don't support 32-bit off_t. > > Could you split this out into a separate patch that does these changes: > > - change the default asm-generic/unistd.h to use sys_openat/sys_open_by_handle_at > - change tile to override those two to keep the current (suboptimal) behavior > - change the force_o_largefile() definition so it defaults to true for all future > architectures. The easiest way is probably to add a Kconfig symbol for this > that gets selected by all 32-bit architectures, so we can use > > #define force_o_largefile() ((BITS_PER_LONG != 32) || !IS_ENABLED(CONFIG_ARCH_32BIT_OFF_T)) > Hi Arnd, First two items are OK. The last one... Do you need it to remove compat_sys_openat and compat_sys_open_by_handle_at? The patch that introduces CONFIG_ARCH_32BIT_OFF_T will affect all architectures, and so I need to get ack from each maintainer. I will also have to ask them to test that change, because I have no access to all the hardware. (Even with QEMU, I cannot test them all.) I think the only man who is able to success with it is Linus :)... Some arches has more than one compat mode. For example ARM64: aarch32 requires this config enabled, but ilp32 needs it disabled. Now we can enable both features, but this will make them mutual exclusive. We can instead (un)define __ARCH_WANT_32BIT_OFF_T for each ABI of each platform, but it's even more work. And we'd think twice how to do it because it's not mechanical work. For example, on aarch64 it will look like: #define __ARCH_WANT_32BIT_OFF_T is_a32_compat_task() I have no idea how it will look on x86 or ppc. In your previous email (Nov 18) you write that tile is the only user of asm-generic/unistd.h that needs compat behaviour. If so, why not just to turn it around, as you initially suggested, and fix tile. And do nothing with force_o_largefile()? By the way, is there a comprehensive list of linux platforms/abis, or at least ones that use asm-generic/unistd.h? BR, Yury. > Arnd > > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-kernel@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
On Wednesday 23 December 2015, Yury Norov wrote: > On Mon, Dec 21, 2015 at 11:31:57PM +0100, Arnd Bergmann wrote: > > On Tuesday 15 December 2015, Yury Norov wrote: > > > + > > > +#define compat_sys_open_by_handle_at sys_open_by_handle_at > > > +#define compat_sys_openat sys_openat > > > + > > > > One more thing I just remembered: I think we want this behavior for all new > > 32-bit architectures, it was a bug to call compat_sys_openat for the generic > > syscall table, as we don't support 32-bit off_t. > > > > Could you split this out into a separate patch that does these changes: > > > > - change the default asm-generic/unistd.h to use sys_openat/sys_open_by_handle_at > > - change tile to override those two to keep the current (suboptimal) behavior > > - change the force_o_largefile() definition so it defaults to true for all future > > architectures. The easiest way is probably to add a Kconfig symbol for this > > that gets selected by all 32-bit architectures, so we can use > > > > #define force_o_largefile() ((BITS_PER_LONG != 32) || !IS_ENABLED(CONFIG_ARCH_32BIT_OFF_T)) > > > > Hi Arnd, > > First two items are OK. The last one... Do you need it to remove > compat_sys_openat and compat_sys_open_by_handle_at? It's needed so all future architectures get it right. > The patch that introduces CONFIG_ARCH_32BIT_OFF_T will affect all > architectures, and so I need to get ack from each maintainer. Just post it to the linux-arch and linux-kernel mailing lists. You are not actually changing behavior, so my Ack should be sufficient here, as long as we make sure we handle all architectures the same way. > I will also have to ask them to test that change, because I have no > access to all the hardware. (Even with QEMU, I cannot test them all.) > I think the only man who is able to success with it is Linus :)... > > Some arches has more than one compat mode. For example ARM64: aarch32 > requires this config enabled, but ilp32 needs it disabled. Now we can > enable both features, but this will make them mutual exclusive. We can > instead (un)define __ARCH_WANT_32BIT_OFF_T for each ABI of each platform, > but it's even more work. And we'd think twice how to do it because > it's not mechanical work. For example, on aarch64 it will look like: > > #define __ARCH_WANT_32BIT_OFF_T is_a32_compat_task() > > I have no idea how it will look on x86 or ppc. arm64 is not affected at all, because it's a 64-bit-only architecture and force_o_largefile() already returns true here. > In your previous email (Nov 18) you write that tile is the only user > of asm-generic/unistd.h that needs compat behaviour. If so, why not > just to turn it around, as you initially suggested, and fix tile. And > do nothing with force_o_largefile()? The first two of the three changes I listed above are for the compat ABI, and there it is enough to change tile. The third change has nothing to do with compat mode, but is about native 32-bit architectures. However, we should change both in sync, so the next architecture that gets added with both native 32-bit mode and compat 32-bit mode on a 64-bit kernel behaves the same way for all 32-bit user space, independent of what the kernel does. We can have two separate patches to clarify that these don't have to be done atomically, but we need to do both for consistency. > By the way, is there a comprehensive list of linux platforms/abis, or > at least ones that use asm-generic/unistd.h? Just the source code. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
On Thu, Dec 17, 2015 at 09:50:52PM +0100, Arnd Bergmann wrote: > On Thursday 17 December 2015 12:14:20 Andrew Pinski wrote: > > On Thu, Dec 17, 2015 at 12:10 PM, Arnd Bergmann <arnd@arndb.de> wrote: > > > On Thursday 17 December 2015 18:27:53 Catalin Marinas wrote: > > >> On Wed, Dec 16, 2015 at 12:42:38AM +0300, Yury Norov wrote: > > > > > >> > +#define compat_sys_lookup_dcookie sys_lookup_dcookie > > >> > +#define compat_sys_pread64 sys_pread64 > > >> > +#define compat_sys_pwrite64 sys_pwrite64 > > >> > +#define compat_sys_readahead sys_readahead > > >> > +#define compat_sys_shmat sys_shmat > > >> > > >> I wonder whether we need wrappers (actually, not only for these but > > >> sys_read etc.). These functions take either a pointer or a size_t > > >> argument which are 32-bit with ILP32 but treated as 64-bit by an LP64 > > >> kernel. Can we guarantee that user space zeros the top 32-bit of the > > >> arguments passed here? > > > > > > I'm pretty sure that is safe. I haven't read the calling conventions > > > specification for arm64 ilp32, but usually all function arguments are > > > passed as 64-bit registers with proper sign-extend or zero-extend. > > > > Well (just like LP64 on AARCH64), when passing a 32bit value to a > > function, the upper 32bits are undefined. I ran into this when I was > > debugging the GCC go library on ILP32 (though reproduced with pure C > > code) and the assembly functions inside glibc where pointers are > > passed with the upper 32bits as undefined. > > So we have an issue if called with syscall function or using pure > > assembly to create the syscall functions (which glibc does). > > Ok, I see :-( > > So the calling conventions avoid the problem of being able to set > the upper bits from malicious user space when the kernel assumes they > are zeroed out (we had security bugs in this area, before we introduced > SYSCALL_DEFINEx()), but it means that we need wrappers around each > syscall that takes an argument that is different length between user > and kernel space (as Catalin guessed). arch/s390 has the same problem and > works around it with code in arch/s390/kernel/compat_wrapper.c, while > other architectures (at least powerpc, x86 and tile IIRC, don't know much > about mips, parisc and sparc) don't have the problem because of their > calling conventions. > > This also means that we cannot work around it in glibc at all, because > we have to be able to handle malicious user space, so it has to be > done in the kernel using something similar to what s390 does. > > Arnd So it seems like we (should) have 2 compat modes - with and without access to upper half of register. I'm thinking now on how put it in generic unistd.h less painfull way. Beside of that, I think I almost finished with all current comments. As this issue is not related to ILP32 directly, I think, it's better to show it now, as there is pretty massive rework. What do you think? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
On Tuesday 05 January 2016 18:26:57 Yury Norov wrote: > > So the calling conventions avoid the problem of being able to set > > the upper bits from malicious user space when the kernel assumes they > > are zeroed out (we had security bugs in this area, before we introduced > > SYSCALL_DEFINEx()), but it means that we need wrappers around each > > syscall that takes an argument that is different length between user > > and kernel space (as Catalin guessed). arch/s390 has the same problem and > > works around it with code in arch/s390/kernel/compat_wrapper.c, while > > other architectures (at least powerpc, x86 and tile IIRC, don't know much > > about mips, parisc and sparc) don't have the problem because of their > > calling conventions. > > > > This also means that we cannot work around it in glibc at all, because > > we have to be able to handle malicious user space, so it has to be > > done in the kernel using something similar to what s390 does. > > > > Arnd > > So it seems like we (should) have 2 compat modes - with and without access > to upper half of register. I'm thinking now on how put it in generic > unistd.h less painfull way. I think we can do that by slightly modifying the existing __SYSCALL/__SC_3264/ __SC_COMP/__SC_COMP_3264 macros: The first two need extra wrappers for arm64-ilp32 and s390, the other two don't. We can use some clever string concatenation to add a ##_wrapper to the name of the handler where needed and then just have a file that implements the wrappers, copied from s390. Unfortunately, we can't just zero out all the upper halves and be done with it: even if we went back to passing 64-bit arguments as separate 32-bit registers, we'd still need to deal with sign-extending negative 32-bit numbers. > Beside of that, I think I almost finished with all current comments. As > this issue is not related to ILP32 directly, I think, it's better to show > it now, as there is pretty massive rework. What do you think? Good idea, yes. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
On Tue, Jan 05, 2016 at 10:12:20PM +0100, Arnd Bergmann wrote: > On Tuesday 05 January 2016 18:26:57 Yury Norov wrote: > > > So the calling conventions avoid the problem of being able to set > > > the upper bits from malicious user space when the kernel assumes they > > > are zeroed out (we had security bugs in this area, before we introduced > > > SYSCALL_DEFINEx()), but it means that we need wrappers around each > > > syscall that takes an argument that is different length between user > > > and kernel space (as Catalin guessed). arch/s390 has the same problem and > > > works around it with code in arch/s390/kernel/compat_wrapper.c, while > > > other architectures (at least powerpc, x86 and tile IIRC, don't know much > > > about mips, parisc and sparc) don't have the problem because of their > > > calling conventions. > > > > > > This also means that we cannot work around it in glibc at all, because > > > we have to be able to handle malicious user space, so it has to be > > > done in the kernel using something similar to what s390 does. > > > > So it seems like we (should) have 2 compat modes - with and without access > > to upper half of register. I'm thinking now on how put it in generic > > unistd.h less painfull way. > > I think we can do that by slightly modifying the existing __SYSCALL/__SC_3264/ > __SC_COMP/__SC_COMP_3264 macros: The first two need extra wrappers for > arm64-ilp32 and s390, the other two don't. > > We can use some clever string concatenation to add a ##_wrapper to the name > of the handler where needed and then just have a file that implements > the wrappers, copied from s390. > > Unfortunately, we can't just zero out all the upper halves and be done with > it: even if we went back to passing 64-bit arguments as separate 32-bit > registers, we'd still need to deal with sign-extending negative 32-bit > numbers. How many syscalls would we need sign-extension for? Most are probably already handled by specific compat_sys_* functions, otherwise A32 compat wouldn't work properly. Anyway, I think we can get away with not modifying the generic __SYSCALL definition and only use something like arch/s390/kernel/compat_wrapper.c. In sys_ilp32.c, we would make __SYSCALL expand the function name with some ilp32_ prefix. For existing compat_* syscalls, we only need to handle the pointer types (something like the s390's __TYPE_IS_PTR). I think other types are already handled by defining the prototype with compat_ulong_t etc. For native syscalls like sys_read, apart from pointers we also need to handle size_t. The wrapper would need to be defined using compat types: ILP32_SYSCALL_DEFINE3(read, unsigned int, fd, char __user *, buf, compat_size_t, count) and let the compiler handle the conversion to size_t automatically when calling sys_read from the wrapper. > > Beside of that, I think I almost finished with all current comments. As > > this issue is not related to ILP32 directly, I think, it's better to show > > it now, as there is pretty massive rework. What do you think? > > Good idea, yes. Note that we still need to sort the 0/sign extension out before we "declare" the ILP32 ABI stable. -- Catalin -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
On Wednesday 06 January 2016 17:10:47 Catalin Marinas wrote: > On Tue, Jan 05, 2016 at 10:12:20PM +0100, Arnd Bergmann wrote: > > On Tuesday 05 January 2016 18:26:57 Yury Norov wrote: > > > > So the calling conventions avoid the problem of being able to set > > > > the upper bits from malicious user space when the kernel assumes they > > > > are zeroed out (we had security bugs in this area, before we introduced > > > > SYSCALL_DEFINEx()), but it means that we need wrappers around each > > > > syscall that takes an argument that is different length between user > > > > and kernel space (as Catalin guessed). arch/s390 has the same problem and > > > > works around it with code in arch/s390/kernel/compat_wrapper.c, while > > > > other architectures (at least powerpc, x86 and tile IIRC, don't know much > > > > about mips, parisc and sparc) don't have the problem because of their > > > > calling conventions. > > > > > > > > This also means that we cannot work around it in glibc at all, because > > > > we have to be able to handle malicious user space, so it has to be > > > > done in the kernel using something similar to what s390 does. > > > > > > So it seems like we (should) have 2 compat modes - with and without access > > > to upper half of register. I'm thinking now on how put it in generic > > > unistd.h less painfull way. > > > > I think we can do that by slightly modifying the existing __SYSCALL/__SC_3264/ > > __SC_COMP/__SC_COMP_3264 macros: The first two need extra wrappers for > > arm64-ilp32 and s390, the other two don't. > > > > We can use some clever string concatenation to add a ##_wrapper to the name > > of the handler where needed and then just have a file that implements > > the wrappers, copied from s390. > > > > Unfortunately, we can't just zero out all the upper halves and be done with > > it: even if we went back to passing 64-bit arguments as separate 32-bit > > registers, we'd still need to deal with sign-extending negative 32-bit > > numbers. > > How many syscalls would we need sign-extension for? Most are probably > already handled by specific compat_sys_* functions, otherwise A32 compat > wouldn't work properly. Good point. I suppose any system call that expects a negative argument may run into this on all architectures and require a COMPAT_SYSCALL handler, but only s390 cares about doing the extension for the entire set of syscalls. This may be to work around a peculiarity of s390, which has now two but three possible 32-to-64 extension modes: signed int, unsigned int and pointer. The third one sets the top 33 bits to zero, clearing the top bit of the 31-bit pointer value in the process. Nothing else needs this, so if we just clear the upper bits on all system calls and go back to passing 64-bit arguments as pairs, we are fine and have a much simpler solution. > Anyway, I think we can get away with not modifying the generic __SYSCALL > definition and only use something like > arch/s390/kernel/compat_wrapper.c. In sys_ilp32.c, we would make > __SYSCALL expand the function name with some ilp32_ prefix. I couldn't think of a way, but I'm gladly proven wrong here. > For existing compat_* syscalls, we only need to handle the pointer types > (something like the s390's __TYPE_IS_PTR). I think other types are > already handled by defining the prototype with compat_ulong_t etc. Right. > For native syscalls like sys_read, apart from pointers we also need to > handle size_t. The wrapper would need to be defined using compat types: > > ILP32_SYSCALL_DEFINE3(read, unsigned int, fd, char __user *, buf, compat_size_t, count) > > and let the compiler handle the conversion to size_t automatically when > calling sys_read from the wrapper. Correct. I don't think we need an ILP32_SYSCALL_DEFINEx set of macros though, the existing COMPAT_SYSCALL_DEFINEx ones should get this right already. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
On Thu, Jan 07, 2016 at 03:13:37PM +0100, Arnd Bergmann wrote: > On Wednesday 06 January 2016 17:10:47 Catalin Marinas wrote: > > On Tue, Jan 05, 2016 at 10:12:20PM +0100, Arnd Bergmann wrote: > > > On Tuesday 05 January 2016 18:26:57 Yury Norov wrote: > > > > > So the calling conventions avoid the problem of being able to set > > > > > the upper bits from malicious user space when the kernel assumes they > > > > > are zeroed out (we had security bugs in this area, before we introduced > > > > > SYSCALL_DEFINEx()), but it means that we need wrappers around each > > > > > syscall that takes an argument that is different length between user > > > > > and kernel space (as Catalin guessed). arch/s390 has the same problem and > > > > > works around it with code in arch/s390/kernel/compat_wrapper.c, while > > > > > other architectures (at least powerpc, x86 and tile IIRC, don't know much > > > > > about mips, parisc and sparc) don't have the problem because of their > > > > > calling conventions. > > > > > > > > > > This also means that we cannot work around it in glibc at all, because > > > > > we have to be able to handle malicious user space, so it has to be > > > > > done in the kernel using something similar to what s390 does. > > > > > > > > So it seems like we (should) have 2 compat modes - with and without access > > > > to upper half of register. I'm thinking now on how put it in generic > > > > unistd.h less painfull way. > > > > > > I think we can do that by slightly modifying the existing __SYSCALL/__SC_3264/ > > > __SC_COMP/__SC_COMP_3264 macros: The first two need extra wrappers for > > > arm64-ilp32 and s390, the other two don't. > > > > > > We can use some clever string concatenation to add a ##_wrapper to the name > > > of the handler where needed and then just have a file that implements > > > the wrappers, copied from s390. > > > > > > Unfortunately, we can't just zero out all the upper halves and be done with > > > it: even if we went back to passing 64-bit arguments as separate 32-bit > > > registers, we'd still need to deal with sign-extending negative 32-bit > > > numbers. > > > > How many syscalls would we need sign-extension for? Most are probably > > already handled by specific compat_sys_* functions, otherwise A32 compat > > wouldn't work properly. > > Good point. I suppose any system call that expects a negative argument > may run into this on all architectures and require a COMPAT_SYSCALL handler, > but only s390 cares about doing the extension for the entire set of syscalls. > > This may be to work around a peculiarity of s390, which has now two > but three possible 32-to-64 extension modes: signed int, unsigned int > and pointer. The third one sets the top 33 bits to zero, clearing the > top bit of the 31-bit pointer value in the process. Nothing else needs > this, so if we just clear the upper bits on all system calls and go > back to passing 64-bit arguments as pairs, we are fine and have a much > simpler solution. Wrappers will not add extra complexity because we already have it in s390 port. In other hand, splitting and then assembling 64-bit values affects performance so small, that we may not care about it much. So, both approaches are acceptable for me. But I'd choose wrappers, because it looks more generic, and more fun. :) This way we can make something like ILP16 easily (can't imagine what for, though). > > > Anyway, I think we can get away with not modifying the generic __SYSCALL > > definition and only use something like > > arch/s390/kernel/compat_wrapper.c. In sys_ilp32.c, we would make > > __SYSCALL expand the function name with some ilp32_ prefix. > > I couldn't think of a way, but I'm gladly proven wrong here. > > > For existing compat_* syscalls, we only need to handle the pointer types > > (something like the s390's __TYPE_IS_PTR). I think other types are > > already handled by defining the prototype with compat_ulong_t etc. > > Right. > > > For native syscalls like sys_read, apart from pointers we also need to > > handle size_t. The wrapper would need to be defined using compat types: > > > > ILP32_SYSCALL_DEFINE3(read, unsigned int, fd, char __user *, buf, compat_size_t, count) > > > > and let the compiler handle the conversion to size_t automatically when > > calling sys_read from the wrapper. > > Correct. I don't think we need an ILP32_SYSCALL_DEFINEx set of macros > though, the existing COMPAT_SYSCALL_DEFINEx ones should get this right > already. > > Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
On Thu, Jan 07, 2016 at 03:13:37PM +0100, Arnd Bergmann wrote: > On Wednesday 06 January 2016 17:10:47 Catalin Marinas wrote: > > On Tue, Jan 05, 2016 at 10:12:20PM +0100, Arnd Bergmann wrote: > > > On Tuesday 05 January 2016 18:26:57 Yury Norov wrote: > > > > > So the calling conventions avoid the problem of being able to set > > > > > the upper bits from malicious user space when the kernel assumes they > > > > > are zeroed out (we had security bugs in this area, before we introduced > > > > > SYSCALL_DEFINEx()), but it means that we need wrappers around each > > > > > syscall that takes an argument that is different length between user > > > > > and kernel space (as Catalin guessed). arch/s390 has the same problem and > > > > > works around it with code in arch/s390/kernel/compat_wrapper.c, while > > > > > other architectures (at least powerpc, x86 and tile IIRC, don't know much > > > > > about mips, parisc and sparc) don't have the problem because of their > > > > > calling conventions. > > > > > > > > > > This also means that we cannot work around it in glibc at all, because > > > > > we have to be able to handle malicious user space, so it has to be > > > > > done in the kernel using something similar to what s390 does. > > > > > > > > So it seems like we (should) have 2 compat modes - with and without access > > > > to upper half of register. I'm thinking now on how put it in generic > > > > unistd.h less painfull way. > > > > > > I think we can do that by slightly modifying the existing __SYSCALL/__SC_3264/ > > > __SC_COMP/__SC_COMP_3264 macros: The first two need extra wrappers for > > > arm64-ilp32 and s390, the other two don't. > > > > > > We can use some clever string concatenation to add a ##_wrapper to the name > > > of the handler where needed and then just have a file that implements > > > the wrappers, copied from s390. > > > > > > Unfortunately, we can't just zero out all the upper halves and be done with > > > it: even if we went back to passing 64-bit arguments as separate 32-bit > > > registers, we'd still need to deal with sign-extending negative 32-bit > > > numbers. > > > > How many syscalls would we need sign-extension for? Most are probably > > already handled by specific compat_sys_* functions, otherwise A32 compat > > wouldn't work properly. > > Good point. I suppose any system call that expects a negative argument > may run into this on all architectures and require a COMPAT_SYSCALL handler, > but only s390 cares about doing the extension for the entire set of syscalls. > > This may be to work around a peculiarity of s390, which has now two > but three possible 32-to-64 extension modes: signed int, unsigned int > and pointer. The third one sets the top 33 bits to zero, clearing the > top bit of the 31-bit pointer value in the process. Nothing else needs > this, so if we just clear the upper bits on all system calls and go > back to passing 64-bit arguments as pairs, we are fine and have a much > simpler solution. It would be indeed simpler from a kernel perspective. I'm not sure about the performance impact (a wrapper which does "mov wn, wn" for the first 6 registers, on top of existing wrappers). OTOH, with explicit wrappers we have the overhead of an additional function call, so we may be better off with the former. > > For native syscalls like sys_read, apart from pointers we also need to > > handle size_t. The wrapper would need to be defined using compat types: > > > > ILP32_SYSCALL_DEFINE3(read, unsigned int, fd, char __user *, buf, compat_size_t, count) > > > > and let the compiler handle the conversion to size_t automatically when > > calling sys_read from the wrapper. > > Correct. I don't think we need an ILP32_SYSCALL_DEFINEx set of macros > though, the existing COMPAT_SYSCALL_DEFINEx ones should get this right > already. The existing COMPAT_SYSCALL_DEFINEx macros generate the wrapper and definition for the compat_sys_* functions. What I meant by an ILP32_SYSCALL_DEFINEx is a macro which only generates a wrapper that calls into the native syscall (after sanitizing the arguments). -- Catalin -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h index e6216ef..230db54 100644 --- a/arch/arm64/include/asm/unistd.h +++ b/arch/arm64/include/asm/unistd.h @@ -13,6 +13,12 @@ * You should have received a copy of the GNU General Public License * along with this program. If not, see <http://www.gnu.org/licenses/>. */ + +#ifdef CONFIG_ARM64_ILP32 +#define __ARCH_WANT_COMPAT_SYS_PREADV64 +#define __ARCH_WANT_COMPAT_SYS_PWRITEV64 +#endif + #ifdef CONFIG_AARCH32_EL0 #define __ARCH_WANT_COMPAT_SYS_GETDENTS64 #define __ARCH_WANT_SYS_GETHOSTNAME diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile index 1470332..8787347 100644 --- a/arch/arm64/kernel/Makefile +++ b/arch/arm64/kernel/Makefile @@ -24,6 +24,7 @@ arm64-obj-$(CONFIG_AARCH32_EL0) += sys32.o kuser32.o signal32.o \ sys_compat.o entry32.o \ ../../arm/kernel/opcodes.o arm64-obj-$(CONFIG_FUNCTION_TRACER) += ftrace.o entry-ftrace.o +arm64-obj-$(CONFIG_ARM64_ILP32) += sys_ilp32.o arm64-obj-$(CONFIG_MODULES) += arm64ksyms.o module.o arm64-obj-$(CONFIG_PERF_EVENTS) += perf_regs.o perf_callchain.o arm64-obj-$(CONFIG_HW_PERF_EVENTS) += perf_event.o diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S index 52be5c8..bcd921a 100644 --- a/arch/arm64/kernel/entry.S +++ b/arch/arm64/kernel/entry.S @@ -664,9 +664,13 @@ ENDPROC(ret_from_fork) */ .align 6 el0_svc: - adrp stbl, sys_call_table // load syscall table pointer uxtw scno, w8 // syscall number in w8 mov sc_nr, #__NR_syscalls +#ifdef CONFIG_ARM64_ILP32 + ldr x16, [tsk, #TI_FLAGS] + tbnz x16, #TIF_32BIT_AARCH64, el0_ilp32_svc // We are using ILP32 +#endif + adrp stbl, sys_call_table // load syscall table pointer el0_svc_naked: // compat entry point stp x0, scno, [sp, #S_ORIG_X0] // save the original x0 and syscall number enable_dbg_and_irq @@ -686,6 +690,12 @@ ni_sys: b ret_fast_syscall ENDPROC(el0_svc) +#ifdef CONFIG_ARM64_ILP32 +el0_ilp32_svc: + adrp stbl, sys_call_ilp32_table // load syscall table pointer + b el0_svc_naked +#endif + /* * This is the really slow path. We're going to be doing context * switches, and waiting for our parent to respond. diff --git a/arch/arm64/kernel/sys_ilp32.c b/arch/arm64/kernel/sys_ilp32.c new file mode 100644 index 0000000..8ce79db --- /dev/null +++ b/arch/arm64/kernel/sys_ilp32.c @@ -0,0 +1,65 @@ +/* + * AArch64- ILP32 specific system calls implementation + * + * Copyright (C) 2015 Cavium Inc. + * Author: Andrew Pinski <apinski@cavium.com> + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see <http://www.gnu.org/licenses/>. + */ + +#include <linux/compiler.h> +#include <linux/errno.h> +#include <linux/fs.h> +#include <linux/mm.h> +#include <linux/msg.h> +#include <linux/export.h> +#include <linux/sched.h> +#include <linux/slab.h> +#include <linux/syscalls.h> +#include <linux/compat.h> + +/* Using non-compat syscalls where necessary */ +#define compat_sys_fadvise64_64 sys_fadvise64_64 +#define compat_sys_fallocate sys_fallocate +#define compat_sys_ftruncate64 sys_ftruncate +#define compat_sys_lookup_dcookie sys_lookup_dcookie +#define compat_sys_pread64 sys_pread64 +#define compat_sys_pwrite64 sys_pwrite64 +#define compat_sys_readahead sys_readahead +#define compat_sys_shmat sys_shmat +#define compat_sys_sigaltstack sys_sigaltstack +#define compat_sys_sync_file_range sys_sync_file_range +#define compat_sys_truncate64 sys_truncate +#define sys_llseek sys_lseek + +#define compat_sys_open_by_handle_at sys_open_by_handle_at +#define compat_sys_openat sys_openat + +#include <asm/syscall.h> + +#undef __SYSCALL +#undef __SC_COMP +#undef __SC_3264 +#undef __SC_COMP_3264 + +#define __SYSCALL_COMPAT +#define __SYSCALL(nr, sym) [nr] = sym, + +/* + * The sys_call_ilp32_table array must be 4K aligned to be accessible from + * kernel/entry.S. + */ +void *sys_call_ilp32_table[__NR_syscalls] __aligned(4096) = { + [0 ... __NR_syscalls - 1] = sys_ni_syscall, +#include <asm/unistd.h> +};