Message ID | 20171127170121.634826-3-arnd@arndb.de |
---|---|
State | Superseded |
Series | [1/3] y2038: introduce struct __kernel_old_timeval |
Paul Eggert <eggert@cs.ucla.edu> writes:

> On 11/27/2017 09:00 AM, Arnd Bergmann wrote:
>> b) Extend the approach taken by the x32 ABI, and use the 64-bit
>> native structure layout for rusage on all architectures with new
>> system calls that is otherwise compatible. A possible problem here
>> is that we end up with incompatible definitions of rusage between
>> /usr/include/linux/resource.h and /usr/include/bits/resource.h
>>
>> c) Change the definition of struct rusage to be independent of
>> time_t. This is the easiest change, as it does not involve new system
>> call entry points, but it has the risk of introducing compile-time
>> incompatibilities with user space sources that rely on the type
>> of ru_utime and ru_stime.
>>
>> I'm picking approach c) for its simplicity, but I'd like to hear from
>> others whether they would prefer a different approach.
>
> (c) would break programs like GNU Emacs, which copy ru_utime and ru_stime
> members into struct timeval variables.
>
> All in all, (b) sounds like it would be better for programs using glibc, as it's
> more compatible with what POSIX apps expect. Though I'm not sure what problems
> are meant by "possible ... incompatible definitions"; perhaps you could
> elaborate.

getrusage is POSIX, and I believe the use of struct timeval is POSIX as
well.

So getrusage(3) is the libc definition, and that definition must use
struct timeval, or the implementation will be non-conforming and it
won't be just Emacs we need to worry about.

The practical question is: what do we provide to userspace so that it
can implement a conforming getrusage?

A 32bit time_t based struct timeval is good for durations up to 136 years
or so. Which strongly suggests the range is large enough, except for
some crazy massively multi-threaded application. And anything off the
charts cpu hungry at this point I expect will be 64bit.

It is possible to get a 128 way system with one thread on each core and
consume 100% of the core for a bit over a year to max out getrusage. So
I do think in the long run we care about increasing the size of time_t
here. Last I checked, applications doing things like that were 64bit in
the year 2000.

Given that userspace is going to be seeing the larger struct rusage in
any event, my inclination for long term maintainability would be to
introduce the new syscall and have the current one called oldgetrusage
on 32bit architectures. Then we won't have to worry about what weird
things glibc will do when translating the data, and we can handle
applications with crazy (but possible) runtimes. Which inclines me to
(b) as well.

As for (a), does anyone have a need for process accounting at nsec
granularity? Unless we can get that for free, that just seems like
overpromising and a waste to have so much fine granularity.

Eric
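The two figures above can be checked with a short calculation. This is only a sketch: it assumes a 365.25-day year and treats the seconds field as a full unsigned 32-bit counter, which is what the 136-year figure corresponds to (a signed 32-bit counter would give half of that). The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Seconds in one year, assuming a 365.25-day year. */
#define SECONDS_PER_YEAR (365.25 * 24.0 * 60.0 * 60.0)

/* Years of CPU time that fit in an unsigned 32-bit seconds counter. */
double years_until_32bit_wrap(void)
{
    return (double)UINT32_MAX / SECONDS_PER_YEAR;
}

/*
 * Wall-clock years needed for `ncpus` fully busy CPUs to accumulate
 * that much CPU time in a single process.
 */
double wall_years_to_wrap(unsigned int ncpus)
{
    assert(ncpus > 0);
    return years_until_32bit_wrap() / (double)ncpus;
}
```

With these assumptions, `years_until_32bit_wrap()` comes out slightly above 136, and `wall_years_to_wrap(128)` slightly above 1, matching "a bit over a year" on a 128-way system.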
On Mon, Nov 27, 2017 at 7:49 PM, Eric W. Biederman <ebiederm@xmission.com> wrote:
> Paul Eggert <eggert@cs.ucla.edu> writes:
>
>> On 11/27/2017 09:00 AM, Arnd Bergmann wrote:
>>> b) Extend the approach taken by the x32 ABI, and use the 64-bit
>>> native structure layout for rusage on all architectures with new
>>> system calls that is otherwise compatible. A possible problem here
>>> is that we end up with incompatible definitions of rusage between
>>> /usr/include/linux/resource.h and /usr/include/bits/resource.h
>>>
>>> c) Change the definition of struct rusage to be independent of
>>> time_t. This is the easiest change, as it does not involve new system
>>> call entry points, but it has the risk of introducing compile-time
>>> incompatibilities with user space sources that rely on the type
>>> of ru_utime and ru_stime.
>>>
>>> I'm picking approach c) for its simplicity, but I'd like to hear from
>>> others whether they would prefer a different approach.
>>
>> (c) would break programs like GNU Emacs, which copy ru_utime and ru_stime
>> members into struct timeval variables.

Right. I think I originally had in mind the workaround of having glibc
convert between its own structure and the kernel structure, but then
ended up not including that in the text above. I was going back and
forth on whether it would be needed or not.

>> All in all, (b) sounds like it would be better for programs using glibc, as it's
>> more compatible with what POSIX apps expect. Though I'm not sure what problems
>> are meant by "possible ... incompatible definitions"; perhaps you could
>> elaborate.

I meant that you might have an application that includes
linux/resource.h instead of sys/resource.h but calls the glibc
function, or one that includes sys/resource.h and invokes the system
call directly.

> getrusage is posix and I believe the use of struct timeval is posix as
> well.
>
> So getrusage(3) the libc definition and that definition must struct
> timeval or the implementation will be non-conforming and it won't be
> just emacs we need to worry about.
>
> The practical question is what do we provide to userspace so that it can
> implement a conforming getrusage?
>
> A 32bit time_t based struct timeval is good for durations up to 136 years
> or so. Which strongly suggests the range is large enough, except for
> some crazy massively multi-threaded application. And anything off the
> charts cpu hungry at this point I expect will be 64bit.
>
> It is possible to get a 128 way system with one thread on each core and
> consume 100% of the core for a bit over a year to max out getrusage. So
> I do think in the long run we care about increasing the size of time_t
> here. Last I checked applications doing things like that were 64bit in
> the year 2000.

Agreed, this was also a calculation I did.

> Given that userspace is going to be seeing the larger struct rusage in
> any event my inclination for long term maintainability would be to
> introduce the new syscall and have the current one called oldgetrusage
> on 32bit architectures. Then we won't have to worry about what weird
> things glibc will do when translating the data, and we can handle
> applications with crazy (but possible) runtimes. Which inclines me to
> (b) as well.

This would actually be the same thing we do for most other syscalls.
Regarding the naming, the current one would become
compat_sys_getrusage() and share the implementation between native
32-bit mode and compat mode on 64-bit architectures, while
sys_getrusage becomes the function that deals with the 64-bit layout,
and would have the same binary format on both 32-bit and 64-bit native
ABIs.

Unfortunately, this opens a new question, as the structure is currently
defined by glibc as:

    /* Structure which says how much of each resource has been used. */

    /* The purpose of all the unions is to have the kernel-compatible layout
       while keeping the API type as 'long int', and among machines where
       __syscall_slong_t is not 'long int', this only does the right thing
       for little-endian ones, like x32. */
    struct rusage
      {
        /* Total amount of user time used. */
        struct timeval ru_utime;
        /* Total amount of system time used. */
        struct timeval ru_stime;
        /* Maximum resident set size (in kilobytes). */
        __extension__ union
          {
            long int ru_maxrss;
            __syscall_slong_t __ru_maxrss_word;
          };
        /* Amount of sharing of text segment memory
           with other processes (kilobyte-seconds). */
        /* Maximum resident set size (in kilobytes). */
        __extension__ union
          {
            long int ru_ixrss;
            __syscall_slong_t __ru_ixrss_word;
          };
        ...
      };

Here, I guess we have to replace __syscall_slong_t with an
'rusage'-specific type that has the same length as time_t but is
independent of __syscall_slong_t, which is still 32-bit for most
32-bit architectures. How would we do the big-endian version of that,
though?

One argument for using c) plus the emulation in glibc is that glibc
has to do emulation anyway, to allow running user space with 64-bit
time_t on older kernels that don't have the new getrusage system call.

> As for (a) does anyone have a need for process accounting at nsec
> granularity? Unless we can get that for free that just seems like
> overpromising and a waste to have so much fine granularity.

The kernel does everything in nanoseconds, so we always spend a few
cycles (a lot of cycles on some of the very low-end architectures) on
dividing it by 1000. Moving the division operation to user space is
essentially free, and using nanoseconds instead of microseconds
might be slightly cheaper. I don't think anyone really needs it, though.

Arnd
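As an illustration of the glibc-side emulation mentioned above for option c), here is a rough sketch of copying a kernel-layout rusage into a user-space structure built on a 64-bit time_t. All type and function names are hypothetical, only the first three members are modelled, and the field widths are assumptions for a 32-bit ABI; this is not taken from glibc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel-side timeval on a 32-bit ABI: two 32-bit fields. */
struct sketch_old_timeval {
    int32_t tv_sec;
    int32_t tv_usec;
};

/* Hypothetical user-space timeval built on a 64-bit time_t. */
struct sketch_timeval64 {
    int64_t tv_sec;
    int64_t tv_usec;
};

/* Kernel layout; only the first three rusage members are modelled. */
struct sketch_kernel_rusage {
    struct sketch_old_timeval ru_utime;
    struct sketch_old_timeval ru_stime;
    int32_t ru_maxrss;
};

/* User-space layout with the wider timeval members. */
struct sketch_user_rusage {
    struct sketch_timeval64 ru_utime;
    struct sketch_timeval64 ru_stime;
    long ru_maxrss;
};

/* Widen each member individually; the layouts differ, so no memcpy(). */
void sketch_rusage_to_user(const struct sketch_kernel_rusage *k,
                           struct sketch_user_rusage *u)
{
    assert(k != NULL && u != NULL);
    u->ru_utime.tv_sec = k->ru_utime.tv_sec;
    u->ru_utime.tv_usec = k->ru_utime.tv_usec;
    u->ru_stime.tv_sec = k->ru_stime.tv_sec;
    u->ru_stime.tv_usec = k->ru_stime.tv_usec;
    u->ru_maxrss = k->ru_maxrss;
}
```

A member-by-member copy like this would keep `struct timeval` as the user-visible type of ru_utime and ru_stime, which is the property the Emacs example depends on.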
diff --git a/arch/alpha/kernel/osf_sys.c b/arch/alpha/kernel/osf_sys.c
index fa1a392ca9a2..445ded2ea471 100644
--- a/arch/alpha/kernel/osf_sys.c
+++ b/arch/alpha/kernel/osf_sys.c
@@ -970,7 +970,7 @@ put_tv32(struct timeval32 __user *o, struct timespec64 *i)
 }
 
 static inline long
-put_tv_to_tv32(struct timeval32 __user *o, struct timeval *i)
+put_tv_to_tv32(struct timeval32 __user *o, struct __kernel_old_timeval *i)
 {
 	return copy_to_user(o, &(struct timeval32){
 				.tv_sec = i->tv_sec,
diff --git a/include/uapi/linux/resource.h b/include/uapi/linux/resource.h
index cc00fd079631..74ef57b38f9f 100644
--- a/include/uapi/linux/resource.h
+++ b/include/uapi/linux/resource.h
@@ -22,8 +22,8 @@
 #define	RUSAGE_THREAD	1		/* only the calling thread */
 
 struct	rusage {
-	struct timeval ru_utime;	/* user time used */
-	struct timeval ru_stime;	/* system time used */
+	struct __kernel_old_timeval ru_utime;	/* user time used */
+	struct __kernel_old_timeval ru_stime;	/* system time used */
 	__kernel_long_t	ru_maxrss;	/* maximum resident set size */
 	__kernel_long_t	ru_ixrss;	/* integral shared memory size */
 	__kernel_long_t	ru_idrss;	/* integral unshared data size */
diff --git a/kernel/sys.c b/kernel/sys.c
index 83ffd7dccf23..c459e294aa9e 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1717,8 +1717,8 @@ void getrusage(struct task_struct *p, int who, struct rusage *r)
 	unlock_task_sighand(p, &flags);
 
 out:
-	r->ru_utime = ns_to_timeval(utime);
-	r->ru_stime = ns_to_timeval(stime);
+	r->ru_utime = ns_to_kernel_old_timeval(utime);
+	r->ru_stime = ns_to_kernel_old_timeval(stime);
 
 	if (who != RUSAGE_CHILDREN) {
 		struct mm_struct *mm = get_task_mm(p);
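For readers unfamiliar with the conversion used in the kernel/sys.c hunk, the following user-space sketch shows the kind of split a nanosecond count needs to become a seconds/microseconds pair. It is not the kernel's implementation of ns_to_kernel_old_timeval(); the struct and function names are hypothetical, and only non-negative (unsigned) inputs are handled.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NSEC_PER_SEC  1000000000ULL
#define SKETCH_NSEC_PER_USEC 1000ULL

/* Hypothetical stand-in for a timeval whose fields are plain 'long'. */
struct sketch_timeval {
    long tv_sec;
    long tv_usec;
};

/* Split a nanosecond count into whole seconds plus leftover microseconds. */
struct sketch_timeval sketch_ns_to_timeval(uint64_t nsec)
{
    struct sketch_timeval tv;

    tv.tv_sec = (long)(nsec / SKETCH_NSEC_PER_SEC);
    /* Sub-microsecond precision is dropped by the division by 1000. */
    tv.tv_usec = (long)((nsec % SKETCH_NSEC_PER_SEC) / SKETCH_NSEC_PER_USEC);
    assert(tv.tv_usec >= 0 && tv.tv_usec < 1000000);
    return tv;
}
```

For example, `sketch_ns_to_timeval(1500000000ULL)` yields one second and 500000 microseconds.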
'struct rusage' contains the run times of a process in 'timeval' format
and is accessed through the wait4() and getrusage() system calls. This
is not a problem for y2038 safety by itself, but causes an issue when
the C library starts using 64-bit time_t on 32-bit architectures because
the structure layout becomes incompatible.

There are three possible ways of dealing with this:

a) deprecate the wait4() and getrusage() system calls, and create
   a set of kernel interfaces based around a newly defined structure that
   could solve multiple problems at once, e.g. provide more fine-grained
   timestamps. The C library could then implement the POSIX interfaces
   on top of the new system calls.

b) Extend the approach taken by the x32 ABI, and use the 64-bit
   native structure layout for rusage on all architectures with new
   system calls that is otherwise compatible. A possible problem here
   is that we end up with incompatible definitions of rusage between
   /usr/include/linux/resource.h and /usr/include/bits/resource.h

c) Change the definition of struct rusage to be independent of
   time_t. This is the easiest change, as it does not involve new system
   call entry points, but it has the risk of introducing compile-time
   incompatibilities with user space sources that rely on the type
   of ru_utime and ru_stime.

I'm picking approach c) for its simplicity, but I'd like to hear from
others whether they would prefer a different approach.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 arch/alpha/kernel/osf_sys.c   | 2 +-
 include/uapi/linux/resource.h | 4 ++--
 kernel/sys.c                  | 4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)

-- 
2.9.0
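To make the layout incompatibility described in the commit message concrete, here is a small sketch using fixed-width types. The struct names are hypothetical, and it assumes that a 64-bit time_t widens both timeval fields to 64 bits; only the first three rusage members are modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* timeval as a 32-bit ABI lays it out today: two 32-bit fields. */
struct sketch_timeval32 {
    int32_t tv_sec;
    int32_t tv_usec;
};

/* timeval with a 64-bit time_t (assumption: tv_usec widens as well). */
struct sketch_timeval64 {
    int64_t tv_sec;
    int64_t tv_usec;
};

/* Start of rusage as the kernel writes it on a 32-bit architecture. */
struct sketch_rusage32 {
    struct sketch_timeval32 ru_utime;
    struct sketch_timeval32 ru_stime;
    int32_t ru_maxrss;
};

/* Start of rusage as user space with a 64-bit time_t would read it. */
struct sketch_rusage64 {
    struct sketch_timeval64 ru_utime;
    struct sketch_timeval64 ru_stime;
    int32_t ru_maxrss;
};

/* The members after the two timevals sit at different offsets. */
static_assert(offsetof(struct sketch_rusage32, ru_maxrss) == 16,
              "kernel layout: ru_maxrss follows two 8-byte timevals");
static_assert(offsetof(struct sketch_rusage64, ru_maxrss) == 32,
              "64-bit time_t layout: ru_maxrss follows two 16-byte timevals");
```

Because every member after ru_stime shifts, a C library using the wider layout cannot pass its structure straight to the existing system call, which is why each of the three options either converts the data or pins the kernel-side type.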