[11/23] y2038: rusage: use __kernel_old_timeval

Message ID 20191108211323.1806194-2-arnd@arndb.de
State Accepted
Commit bdd565f817a74b9e30edec108f7cb1dbc762b8a6
Headers show
Series
  • y2038 cleanups
Related show

Commit Message

Arnd Bergmann Nov. 8, 2019, 9:12 p.m.
There are two 'struct timeval' fields in 'struct rusage'.

Unfortunately the definition of timeval is now ambiguous when used in
user space with a libc that has a 64-bit time_t, and this also changes
the 'rusage' definition in user space in a way that is incompatible with
the system call interface.

While there is no good solution to avoid all ambiguity here, change
the definition in the kernel headers to be compatible with the kernel
ABI, using __kernel_old_timeval as an unambiguous base type.

In previous discussions, there was also a plan to add a replacement
for rusage based on 64-bit timestamps and nanosecond resolution,
i.e. 'struct __kernel_timespec'. I have patches for that as well,
if anyone thinks we should do that.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>

---
Question: should we also rename 'struct rusage' into 'struct __kernel_rusage'
here, to make them completely unambiguous?
---
 arch/alpha/kernel/osf_sys.c   | 2 +-
 include/uapi/linux/resource.h | 4 ++--
 kernel/sys.c                  | 4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)

-- 
2.20.0

Comments

Cyrill Gorcunov Nov. 12, 2019, 9:09 p.m. | #1
On Fri, Nov 08, 2019 at 10:12:10PM +0100, Arnd Bergmann wrote:
> There are two 'struct timeval' fields in 'struct rusage'.

> 

> Unfortunately the definition of timeval is now ambiguous when used in

> user space with a libc that has a 64-bit time_t, and this also changes

> the 'rusage' definition in user space in a way that is incompatible with

> the system call interface.

> 

> While there is no good solution to avoid all ambiguity here, change

> the definition in the kernel headers to be compatible with the kernel

> ABI, using __kernel_old_timeval as an unambiguous base type.

> 

> In previous discussions, there was also a plan to add a replacement

> for rusage based on 64-bit timestamps and nanosecond resolution,

> i.e. 'struct __kernel_timespec'. I have patches for that as well,

> if anyone thinks we should do that.

> 

> Signed-off-by: Arnd Bergmann <arnd@arndb.de>

> ---

> Question: should we also rename 'struct rusage' into 'struct __kernel_rusage'

> here, to make them completely unambiguous?


The patch looks ok to me. I must confess I looked into rusage long ago
so __kernel_timespec type used in uapi made me nervious at first,
but then i found that we've this type defined in time_types.h uapi
so userspace should be safe. I also like the idea of __kernel_rusage
but definitely on top of the series.

Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>
Arnd Bergmann Nov. 13, 2019, 10:02 a.m. | #2
On Tue, Nov 12, 2019 at 10:09 PM Cyrill Gorcunov <gorcunov@gmail.com> wrote:
>

> On Fri, Nov 08, 2019 at 10:12:10PM +0100, Arnd Bergmann wrote:


> > ---

> > Question: should we also rename 'struct rusage' into 'struct __kernel_rusage'

> > here, to make them completely unambiguous?

>

> The patch looks ok to me. I must confess I looked into rusage long ago

> so __kernel_timespec type used in uapi made me nervious at first,

> but then i found that we've this type defined in time_types.h uapi

> so userspace should be safe. I also like the idea of __kernel_rusage

> but definitely on top of the series.


There are clearly too many time types at the moment, but I'm in the
process of throwing out the ones we no longer need now.

I do have a number patches implementing other variants for the syscall,
and I suppose that if we end up adding __kernel_rusage, that would
have to go with a set of syscalls using 64-bit seconds/nanoseconds
rather than the old 32/64 microseconds. I don't know what other
changes remain that anyone would want from sys_waitid() now that
it does support pidfd.

If there is still a need for a new waitid() replacement, that should take
that new __kernel_rusage I think, but until then I hope we are fine
with today's getrusage+waitid based on the current struct rusage.

BSD has wait6() to return separate rusage structures for 'self' and
'children', but I could not find any application (using the freebsd
sources and debian code search) that actually uses that information,
so there might not be any demand for that.

> Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com>


Thanks,

      Arnd
Cyrill Gorcunov Nov. 13, 2019, 5:22 p.m. | #3
On Wed, Nov 13, 2019 at 11:02:12AM +0100, Arnd Bergmann wrote:
...
> 

> There are clearly too many time types at the moment, but I'm in the

> process of throwing out the ones we no longer need now.


Cool!

> I do have a number patches implementing other variants for the syscall,

> and I suppose that if we end up adding __kernel_rusage, that would

> have to go with a set of syscalls using 64-bit seconds/nanoseconds

> rather than the old 32/64 microseconds. I don't know what other

> changes remain that anyone would want from sys_waitid() now that

> it does support pidfd.

> 

> If there is still a need for a new waitid() replacement, that should take

> that new __kernel_rusage I think, but until then I hope we are fine

> with today's getrusage+waitid based on the current struct rusage.


Definitely.

> 

> BSD has wait6() to return separate rusage structures for 'self' and

> 'children', but I could not find any application (using the freebsd

> sources and debian code search) that actually uses that information,

> so there might not be any demand for that.


Thanks for detailed info Arnd!
Christian Brauner Nov. 14, 2019, 12:38 a.m. | #4
On Wed, Nov 13, 2019 at 11:02:12AM +0100, Arnd Bergmann wrote:
> On Tue, Nov 12, 2019 at 10:09 PM Cyrill Gorcunov <gorcunov@gmail.com> wrote:

> >

> > On Fri, Nov 08, 2019 at 10:12:10PM +0100, Arnd Bergmann wrote:

> 

> > > ---

> > > Question: should we also rename 'struct rusage' into 'struct __kernel_rusage'

> > > here, to make them completely unambiguous?

> >

> > The patch looks ok to me. I must confess I looked into rusage long ago

> > so __kernel_timespec type used in uapi made me nervious at first,

> > but then i found that we've this type defined in time_types.h uapi

> > so userspace should be safe. I also like the idea of __kernel_rusage

> > but definitely on top of the series.

> 

> There are clearly too many time types at the moment, but I'm in the

> process of throwing out the ones we no longer need now.

> 

> I do have a number patches implementing other variants for the syscall,

> and I suppose that if we end up adding __kernel_rusage, that would

> have to go with a set of syscalls using 64-bit seconds/nanoseconds

> rather than the old 32/64 microseconds. I don't know what other

> changes remain that anyone would want from sys_waitid() now that

> it does support pidfd.

> 

> If there is still a need for a new waitid() replacement, that should take

> that new __kernel_rusage I think, but until then I hope we are fine

> with today's getrusage+waitid based on the current struct rusage.


Note, that glibc does _not_ expose the rusage argument, i.e. most of
userspace is unaware that waitid() does allow you to get rusage
information. So users first need to know that waitid() has an rusage
argument and then need to call the waitid() syscall directly.

> 

> BSD has wait6() to return separate rusage structures for 'self' and

> 'children', but I could not find any application (using the freebsd

> sources and debian code search) that actually uses that information,

> so there might not be any demand for that.


Speaking specifically for Linux now, I think that rusage does not
actually expose the information most relevant users are interested in.
On Linux nowadays it is _way_ more interesting to retrieve stats
relative to the cgroup the task lived in etc.
Doing a git grep -i rusage in the systemd source code shows that rusage
is used _nowhere_. And I consider an init system to be the most likely
candidate to be interested in rusage.

	Christian
Arnd Bergmann Nov. 14, 2019, 10:18 a.m. | #5
On Thu, Nov 14, 2019 at 1:38 AM Christian Brauner
<christian.brauner@ubuntu.com> wrote:
> On Wed, Nov 13, 2019 at 11:02:12AM +0100, Arnd Bergmann wrote:

> > On Tue, Nov 12, 2019 at 10:09 PM Cyrill Gorcunov <gorcunov@gmail.com> wrote:

> > >

> > > On Fri, Nov 08, 2019 at 10:12:10PM +0100, Arnd Bergmann wrote:

> >

> > > > ---

> > > > Question: should we also rename 'struct rusage' into 'struct __kernel_rusage'

> > > > here, to make them completely unambiguous?

> > >

> > > The patch looks ok to me. I must confess I looked into rusage long ago

> > > so __kernel_timespec type used in uapi made me nervious at first,

> > > but then i found that we've this type defined in time_types.h uapi

> > > so userspace should be safe. I also like the idea of __kernel_rusage

> > > but definitely on top of the series.

> >

> > There are clearly too many time types at the moment, but I'm in the

> > process of throwing out the ones we no longer need now.

> >

> > I do have a number patches implementing other variants for the syscall,

> > and I suppose that if we end up adding __kernel_rusage, that would

> > have to go with a set of syscalls using 64-bit seconds/nanoseconds

> > rather than the old 32/64 microseconds. I don't know what other

> > changes remain that anyone would want from sys_waitid() now that

> > it does support pidfd.

> >

> > If there is still a need for a new waitid() replacement, that should take

> > that new __kernel_rusage I think, but until then I hope we are fine

> > with today's getrusage+waitid based on the current struct rusage.

>

> Note, that glibc does _not_ expose the rusage argument, i.e. most of

> userspace is unaware that waitid() does allow you to get rusage

> information. So users first need to know that waitid() has an rusage

> argument and then need to call the waitid() syscall directly.


On architectures that don't have a wait4 syscall (riscv32 for now),
glibc uses waitid to implement wait4 and wait3.

> > BSD has wait6() to return separate rusage structures for 'self' and

> > 'children', but I could not find any application (using the freebsd

> > sources and debian code search) that actually uses that information,

> > so there might not be any demand for that.

>

> Speaking specifically for Linux now, I think that rusage does not

> actually expose the information most relevant users are interested in.

> On Linux nowadays it is _way_ more interesting to retrieve stats

> relative to the cgroup the task lived in etc.

> Doing a git grep -i rusage in the systemd source code shows that rusage

> is used _nowhere_. And I consider an init system to be the most likely

> candidate to be interested in rusage.


I looked at a couple of implementations of time(1), this is one example
that sometimes uses wait3(), though other implementations just call
getrusage() in the parent process before the fork/exec. None of them
actually seem to report better than millisecond resolution, so there is
not a strict reason to do a timespec replacement for these.

       Arnd
Christian Brauner Nov. 14, 2019, 10:23 a.m. | #6
[+Cc Florian, libc-alpha]

On Thu, Nov 14, 2019 at 11:18:15AM +0100, Arnd Bergmann wrote:
> On Thu, Nov 14, 2019 at 1:38 AM Christian Brauner

> <christian.brauner@ubuntu.com> wrote:

> > On Wed, Nov 13, 2019 at 11:02:12AM +0100, Arnd Bergmann wrote:

> > > On Tue, Nov 12, 2019 at 10:09 PM Cyrill Gorcunov <gorcunov@gmail.com> wrote:

> > > >

> > > > On Fri, Nov 08, 2019 at 10:12:10PM +0100, Arnd Bergmann wrote:

> > >

> > > > > ---

> > > > > Question: should we also rename 'struct rusage' into 'struct __kernel_rusage'

> > > > > here, to make them completely unambiguous?

> > > >

> > > > The patch looks ok to me. I must confess I looked into rusage long ago

> > > > so __kernel_timespec type used in uapi made me nervious at first,

> > > > but then i found that we've this type defined in time_types.h uapi

> > > > so userspace should be safe. I also like the idea of __kernel_rusage

> > > > but definitely on top of the series.

> > >

> > > There are clearly too many time types at the moment, but I'm in the

> > > process of throwing out the ones we no longer need now.

> > >

> > > I do have a number patches implementing other variants for the syscall,

> > > and I suppose that if we end up adding __kernel_rusage, that would

> > > have to go with a set of syscalls using 64-bit seconds/nanoseconds

> > > rather than the old 32/64 microseconds. I don't know what other

> > > changes remain that anyone would want from sys_waitid() now that

> > > it does support pidfd.

> > >

> > > If there is still a need for a new waitid() replacement, that should take

> > > that new __kernel_rusage I think, but until then I hope we are fine

> > > with today's getrusage+waitid based on the current struct rusage.

> >

> > Note, that glibc does _not_ expose the rusage argument, i.e. most of

> > userspace is unaware that waitid() does allow you to get rusage

> > information. So users first need to know that waitid() has an rusage

> > argument and then need to call the waitid() syscall directly.

> 

> On architectures that don't have a wait4 syscall (riscv32 for now),

> glibc uses waitid to implement wait4 and wait3.


Yes, and there's an ongoing discussion to implement wait4() on all
arches through waitid(), I think. I haven't followed it too closely.

> 

> > > BSD has wait6() to return separate rusage structures for 'self' and

> > > 'children', but I could not find any application (using the freebsd

> > > sources and debian code search) that actually uses that information,

> > > so there might not be any demand for that.

> >

> > Speaking specifically for Linux now, I think that rusage does not

> > actually expose the information most relevant users are interested in.

> > On Linux nowadays it is _way_ more interesting to retrieve stats

> > relative to the cgroup the task lived in etc.

> > Doing a git grep -i rusage in the systemd source code shows that rusage

> > is used _nowhere_. And I consider an init system to be the most likely

> > candidate to be interested in rusage.

> 

> I looked at a couple of implementations of time(1), this is one example

> that sometimes uses wait3(), though other implementations just call

> getrusage() in the parent process before the fork/exec. None of them

> actually seem to report better than millisecond resolution, so there is

> not a strict reason to do a timespec replacement for these.


Right, though I have to say that for the sake of consistency I'd much
rather have a replacement. We're doing all this work right now so we
might as well. But I get the point.

	Christian

Patch

diff --git a/arch/alpha/kernel/osf_sys.c b/arch/alpha/kernel/osf_sys.c
index bf497b8b0ec6..bbe7a0da6264 100644
--- a/arch/alpha/kernel/osf_sys.c
+++ b/arch/alpha/kernel/osf_sys.c
@@ -963,7 +963,7 @@  put_tv32(struct timeval32 __user *o, struct timespec64 *i)
 }
 
 static inline long
-put_tv_to_tv32(struct timeval32 __user *o, struct timeval *i)
+put_tv_to_tv32(struct timeval32 __user *o, struct __kernel_old_timeval *i)
 {
 	return copy_to_user(o, &(struct timeval32){
 				.tv_sec = i->tv_sec,
diff --git a/include/uapi/linux/resource.h b/include/uapi/linux/resource.h
index cc00fd079631..74ef57b38f9f 100644
--- a/include/uapi/linux/resource.h
+++ b/include/uapi/linux/resource.h
@@ -22,8 +22,8 @@ 
 #define	RUSAGE_THREAD	1		/* only the calling thread */
 
 struct	rusage {
-	struct timeval ru_utime;	/* user time used */
-	struct timeval ru_stime;	/* system time used */
+	struct __kernel_old_timeval ru_utime;	/* user time used */
+	struct __kernel_old_timeval ru_stime;	/* system time used */
 	__kernel_long_t	ru_maxrss;	/* maximum resident set size */
 	__kernel_long_t	ru_ixrss;	/* integral shared memory size */
 	__kernel_long_t	ru_idrss;	/* integral unshared data size */
diff --git a/kernel/sys.c b/kernel/sys.c
index a611d1d58c7d..d3aef31e24dc 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1763,8 +1763,8 @@  void getrusage(struct task_struct *p, int who, struct rusage *r)
 	unlock_task_sighand(p, &flags);
 
 out:
-	r->ru_utime = ns_to_timeval(utime);
-	r->ru_stime = ns_to_timeval(stime);
+	r->ru_utime = ns_to_kernel_old_timeval(utime);
+	r->ru_stime = ns_to_kernel_old_timeval(stime);
 
 	if (who != RUSAGE_CHILDREN) {
 		struct mm_struct *mm = get_task_mm(p);