diff mbox series

[net-next,v2] tcp: Explicitly mark reserved field in tcp_zerocopy_receive args.

Message ID 20210206203648.609650-1-arjunroy.kdev@gmail.com
State Superseded
Headers show
Series [net-next,v2] tcp: Explicitly mark reserved field in tcp_zerocopy_receive args. | expand

Commit Message

Arjun Roy Feb. 6, 2021, 8:36 p.m. UTC
From: Arjun Roy <arjunroy@google.com>

Explicitly define reserved field and require it to be 0-valued.

Fixes: 7eeba1706eba ("tcp: Add receive timestamp support for receive zerocopy.")
Signed-off-by: Arjun Roy <arjunroy@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Suggested-by: David Ahern <dsahern@gmail.com>
Suggested-by: Leon Romanovsky <leon@kernel.org>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
---
 include/uapi/linux/tcp.h | 2 +-
 net/ipv4/tcp.c           | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

Comments

David Ahern Feb. 7, 2021, 5:49 p.m. UTC | #1
On 2/6/21 1:36 PM, Arjun Roy wrote:
> From: Arjun Roy <arjunroy@google.com>
> 
> Explicitly define reserved field and require it to be 0-valued.
> 
> Fixes: 7eeba1706eba ("tcp: Add receive timestamp support for receive zerocopy.")
> Signed-off-by: Arjun Roy <arjunroy@google.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
> Suggested-by: David Ahern <dsahern@gmail.com>
> Suggested-by: Leon Romanovsky <leon@kernel.org>
> Suggested-by: Jakub Kicinski <kuba@kernel.org>
> ---
>  include/uapi/linux/tcp.h | 2 +-
>  net/ipv4/tcp.c           | 2 ++
>  2 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
> index 42fc5a640df4..8fc09e8638b3 100644
> --- a/include/uapi/linux/tcp.h
> +++ b/include/uapi/linux/tcp.h
> @@ -357,6 +357,6 @@ struct tcp_zerocopy_receive {
>  	__u64 msg_control; /* ancillary data */
>  	__u64 msg_controllen;
>  	__u32 msg_flags;
> -	/* __u32 hole;  Next we must add >1 u32 otherwise length checks fail. */
> +	__u32 reserved; /* set to 0 for now */
>  };
>  #endif /* _UAPI_LINUX_TCP_H */
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index e1a17c6b473c..c8469c579ed8 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -4159,6 +4159,8 @@ static int do_tcp_getsockopt(struct sock *sk, int level,
>  		}
>  		if (copy_from_user(&zc, optval, len))
>  			return -EFAULT;
> +		if (zc.reserved)
> +			return -EINVAL;
>  		lock_sock(sk);
>  		err = tcp_zerocopy_receive(sk, &zc, &tss);
>  		release_sock(sk);
> 


The 'switch (len)' statement needs to be updated now that 'len' is not
going to end on the 'msg_flags' boundary? But then, how did that work
before if there was 4 byte padding?

Maybe I am missing something here. You currently have:

	switch (len) {
	case offsetofend(struct tcp_zerocopy_receive, msg_flags):

which should == 60, yet the struct size is 64 with 4-bytes of padding. A
user doing

	int optlen = sizeof(struct tcp_zerocopy_receive);

	getsockopt(...., &optlen)

would pass in a value of 64, right?
Jakub Kicinski Feb. 8, 2021, 6:41 p.m. UTC | #2
On Sun, 7 Feb 2021 10:26:54 +0200 Leon Romanovsky wrote:
> On Sat, Feb 06, 2021 at 03:28:28PM -0800, Jakub Kicinski wrote:

> > On Sat,  6 Feb 2021 12:36:48 -0800 Arjun Roy wrote:  

> > > From: Arjun Roy <arjunroy@google.com>

> > >

> > > Explicitly define reserved field and require it to be 0-valued.  

> >  

> > > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c

> > > index e1a17c6b473c..c8469c579ed8 100644

> > > --- a/net/ipv4/tcp.c

> > > +++ b/net/ipv4/tcp.c

> > > @@ -4159,6 +4159,8 @@ static int do_tcp_getsockopt(struct sock *sk, int level,

> > >  		}

> > >  		if (copy_from_user(&zc, optval, len))

> > >  			return -EFAULT;

> > > +		if (zc.reserved)

> > > +			return -EINVAL;

> > >  		lock_sock(sk);

> > >  		err = tcp_zerocopy_receive(sk, &zc, &tss);

> > >  		release_sock(sk);  

> >

> > I was expecting we'd also throw in a check_zeroed_user().

> > Either we can check if the buffer is zeroed all the way,

> > or we can't and we shouldn't validate reserved either

> >

> > 	check_zeroed_user(optval + offsetof(reserved),

> > 			  len - offsetof(reserved))

> > ?  

> 

> There is a check that len is not larger than zs and users can't give

> large buffer.

> 

> I would say that is pretty safe to write "if (zc.reserved)".


Which check? There's a check which truncates (writes back to user space
len = min(len, sizeof(zc)). Application can still pass garbage beyond
sizeof(zc) and syscall may start failing in the future if sizeof(zc)
changes.
David Ahern Feb. 9, 2021, 2:24 a.m. UTC | #3
On 2/8/21 11:41 AM, Jakub Kicinski wrote:
> On Sun, 7 Feb 2021 10:26:54 +0200 Leon Romanovsky wrote:

>> On Sat, Feb 06, 2021 at 03:28:28PM -0800, Jakub Kicinski wrote:

>>> On Sat,  6 Feb 2021 12:36:48 -0800 Arjun Roy wrote:  

>>>> From: Arjun Roy <arjunroy@google.com>

>>>>

>>>> Explicitly define reserved field and require it to be 0-valued.  

>>>  

>>>> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c

>>>> index e1a17c6b473c..c8469c579ed8 100644

>>>> --- a/net/ipv4/tcp.c

>>>> +++ b/net/ipv4/tcp.c

>>>> @@ -4159,6 +4159,8 @@ static int do_tcp_getsockopt(struct sock *sk, int level,

>>>>  		}

>>>>  		if (copy_from_user(&zc, optval, len))

>>>>  			return -EFAULT;

>>>> +		if (zc.reserved)

>>>> +			return -EINVAL;

>>>>  		lock_sock(sk);

>>>>  		err = tcp_zerocopy_receive(sk, &zc, &tss);

>>>>  		release_sock(sk);  

>>>

>>> I was expecting we'd also throw in a check_zeroed_user().

>>> Either we can check if the buffer is zeroed all the way,

>>> or we can't and we shouldn't validate reserved either

>>>

>>> 	check_zeroed_user(optval + offsetof(reserved),

>>> 			  len - offsetof(reserved))

>>> ?  

>>

>> There is a check that len is not larger than zs and users can't give

>> large buffer.

>>

>> I would say that is pretty safe to write "if (zc.reserved)".

> 

> Which check? There's a check which truncates (writes back to user space

> len = min(len, sizeof(zc)). Application can still pass garbage beyond

> sizeof(zc) and syscall may start failing in the future if sizeof(zc)

> changes.

> 


That would be the case for new userspace on old kernel. Extending the
check to the end of the struct would guarantee new userspace can not ask
for something that the running kernel does not understand.
Jakub Kicinski Feb. 9, 2021, 2:53 a.m. UTC | #4
On Mon, 8 Feb 2021 19:24:05 -0700 David Ahern wrote:
> On 2/8/21 11:41 AM, Jakub Kicinski wrote:

> > On Sun, 7 Feb 2021 10:26:54 +0200 Leon Romanovsky wrote:  

> >> There is a check that len is not larger than zs and users can't give

> >> large buffer.

> >>

> >> I would say that is pretty safe to write "if (zc.reserved)".  

> > 

> > Which check? There's a check which truncates (writes back to user space

> > len = min(len, sizeof(zc)). Application can still pass garbage beyond

> > sizeof(zc) and syscall may start failing in the future if sizeof(zc)

> > changes.

> 

> That would be the case for new userspace on old kernel. Extending the

> check to the end of the struct would guarantee new userspace can not ask

> for something that the running kernel does not understand.


Indeed, so we're agreeing that check_zeroed_user() is needed before
original optlen from user space gets truncated?
David Ahern Feb. 9, 2021, 3:20 a.m. UTC | #5
On 2/8/21 7:53 PM, Jakub Kicinski wrote:
> On Mon, 8 Feb 2021 19:24:05 -0700 David Ahern wrote:

>> On 2/8/21 11:41 AM, Jakub Kicinski wrote:

>>> On Sun, 7 Feb 2021 10:26:54 +0200 Leon Romanovsky wrote:  

>>>> There is a check that len is not larger than zs and users can't give

>>>> large buffer.

>>>>

>>>> I would say that is pretty safe to write "if (zc.reserved)".  

>>>

>>> Which check? There's a check which truncates (writes back to user space

>>> len = min(len, sizeof(zc)). Application can still pass garbage beyond

>>> sizeof(zc) and syscall may start failing in the future if sizeof(zc)

>>> changes.

>>

>> That would be the case for new userspace on old kernel. Extending the

>> check to the end of the struct would guarantee new userspace can not ask

>> for something that the running kernel does not understand.

> 

> Indeed, so we're agreeing that check_zeroed_user() is needed before

> original optlen from user space gets truncated?

> 


I thought so, but maybe not. To think through this ...

If current kernel understands a struct of size N, it can only copy that
amount from user to kernel. Anything beyond is ignored in these
multiplexed uAPIs, and that is where the new userspace on old kernel falls.

Known value checks can only be done up to size N. In this case, the
reserved field is at the end of the known struct size, so checking just
the field is fine. Going beyond the reserved field has implications for
extensions to the API which should be handled when those extensions are
added.

So, in short I think the "if (zc.reserved)" is correct as Leon noted.
Leon Romanovsky Feb. 9, 2021, 6:15 a.m. UTC | #6
On Mon, Feb 08, 2021 at 10:41:43AM -0800, Jakub Kicinski wrote:
> On Sun, 7 Feb 2021 10:26:54 +0200 Leon Romanovsky wrote:

> > On Sat, Feb 06, 2021 at 03:28:28PM -0800, Jakub Kicinski wrote:

> > > On Sat,  6 Feb 2021 12:36:48 -0800 Arjun Roy wrote:

> > > > From: Arjun Roy <arjunroy@google.com>

> > > >

> > > > Explicitly define reserved field and require it to be 0-valued.

> > >

> > > > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c

> > > > index e1a17c6b473c..c8469c579ed8 100644

> > > > --- a/net/ipv4/tcp.c

> > > > +++ b/net/ipv4/tcp.c

> > > > @@ -4159,6 +4159,8 @@ static int do_tcp_getsockopt(struct sock *sk, int level,

> > > >  		}

> > > >  		if (copy_from_user(&zc, optval, len))

> > > >  			return -EFAULT;

> > > > +		if (zc.reserved)

> > > > +			return -EINVAL;

> > > >  		lock_sock(sk);

> > > >  		err = tcp_zerocopy_receive(sk, &zc, &tss);

> > > >  		release_sock(sk);

> > >

> > > I was expecting we'd also throw in a check_zeroed_user().

> > > Either we can check if the buffer is zeroed all the way,

> > > or we can't and we shouldn't validate reserved either

> > >

> > > 	check_zeroed_user(optval + offsetof(reserved),

> > > 			  len - offsetof(reserved))

> > > ?

> >

> > There is a check that len is not larger than zs and users can't give

> > large buffer.

> >

> > I would say that is pretty safe to write "if (zc.reserved)".

>

> Which check? There's a check which truncates (writes back to user space

> len = min(len, sizeof(zc)). Application can still pass garbage beyond

> sizeof(zc) and syscall may start failing in the future if sizeof(zc)

> changes.


At least in my tree, we have the length check:
  4155                 if (len > sizeof(zc)) {
  4156                         len = sizeof(zc);
  4157                         if (put_user(len, optlen))
  4158                                 return -EFAULT;
  4159                 }


Ad David wrote below, the "if (zc.reserved)" is enough.

We have following options:
1. Old kernel that have sizeof(sz) upto .reserved and old userspace
-> len <= sizeof(sz) - works correctly.
2. Old kernel that have sizeof(sz) upto .reserved and new userspace that
sends larger struct -> "f (len > sizeof(zc))" will return -EFAULT
3. New kernel that have sizeof(sz) beyond reserved and old userspace
-> any new added field to struct sz should be checked and anyway it is the same as item 1.
4. New kernel and new userspace
-> standard flow.

Thanks
Leon Romanovsky Feb. 9, 2021, 6:29 a.m. UTC | #7
On Mon, Feb 08, 2021 at 08:20:29PM -0700, David Ahern wrote:
> On 2/8/21 7:53 PM, Jakub Kicinski wrote:

> > On Mon, 8 Feb 2021 19:24:05 -0700 David Ahern wrote:

> >> On 2/8/21 11:41 AM, Jakub Kicinski wrote:

> >>> On Sun, 7 Feb 2021 10:26:54 +0200 Leon Romanovsky wrote:

> >>>> There is a check that len is not larger than zs and users can't give

> >>>> large buffer.

> >>>>

> >>>> I would say that is pretty safe to write "if (zc.reserved)".

> >>>

> >>> Which check? There's a check which truncates (writes back to user space

> >>> len = min(len, sizeof(zc)). Application can still pass garbage beyond

> >>> sizeof(zc) and syscall may start failing in the future if sizeof(zc)

> >>> changes.

> >>

> >> That would be the case for new userspace on old kernel. Extending the

> >> check to the end of the struct would guarantee new userspace can not ask

> >> for something that the running kernel does not understand.

> >

> > Indeed, so we're agreeing that check_zeroed_user() is needed before

> > original optlen from user space gets truncated?

> >

>

> I thought so, but maybe not. To think through this ...

>

> If current kernel understands a struct of size N, it can only copy that

> amount from user to kernel. Anything beyond is ignored in these

> multiplexed uAPIs, and that is where the new userspace on old kernel falls.

>

> Known value checks can only be done up to size N. In this case, the

> reserved field is at the end of the known struct size, so checking just

> the field is fine. Going beyond the reserved field has implications for

> extensions to the API which should be handled when those extensions are

> added.


It is handled.
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/tree/net/ipv4/tcp.c#n4155
		if (len > sizeof(zc)) {
			len = sizeof(zc);
			if (put_user(len, optlen))
				return -EFAULT;
		}

Thanks

>

> So, in short I think the "if (zc.reserved)" is correct as Leon noted.
Jakub Kicinski Feb. 9, 2021, 4:59 p.m. UTC | #8
On Mon, 8 Feb 2021 20:20:29 -0700 David Ahern wrote:
> On 2/8/21 7:53 PM, Jakub Kicinski wrote:

> > On Mon, 8 Feb 2021 19:24:05 -0700 David Ahern wrote:  

> >> That would be the case for new userspace on old kernel. Extending the

> >> check to the end of the struct would guarantee new userspace can not ask

> >> for something that the running kernel does not understand.  

> > 

> > Indeed, so we're agreeing that check_zeroed_user() is needed before

> > original optlen from user space gets truncated?

> 

> I thought so, but maybe not. To think through this ...

> 

> If current kernel understands a struct of size N, it can only copy that

> amount from user to kernel. Anything beyond is ignored in these

> multiplexed uAPIs, and that is where the new userspace on old kernel falls.

> 

> Known value checks can only be done up to size N. In this case, the

> reserved field is at the end of the known struct size, so checking just

> the field is fine. Going beyond the reserved field has implications for

> extensions to the API which should be handled when those extensions are

> added.


Let me try one last time.

There is no check in the kernels that len <= N. User can pass any
length _already_. check_zeroed_user() forces the values beyond the
structure length to be known (0) rather than anything. It can only 
avoid breakages in the future.

> So, in short I think the "if (zc.reserved)" is correct as Leon noted.


If it's correct to check some arbitrary part of the buffer is zeroed 
it should be correct to check the entire tail is zeroed.
Jakub Kicinski Feb. 9, 2021, 4:59 p.m. UTC | #9
On Tue, 9 Feb 2021 08:15:11 +0200 Leon Romanovsky wrote:
> At least in my tree, we have the length check:

>   4155                 if (len > sizeof(zc)) {

>   4156                         len = sizeof(zc);

>   4157                         if (put_user(len, optlen))

>   4158                                 return -EFAULT;

>   4159                 }

> 

> 

> Ad David wrote below, the "if (zc.reserved)" is enough.

> 

> We have following options:

> 1. Old kernel that have sizeof(sz) upto .reserved and old userspace

> -> len <= sizeof(sz) - works correctly.  

> 2. Old kernel that have sizeof(sz) upto .reserved and new userspace that

> sends larger struct -> "f (len > sizeof(zc))" will return -EFAULT


Based on the code you quoted? I don't see how. Maybe I need a vacation.
put_user() just copies len back to user space after truncation.

> 3. New kernel that have sizeof(sz) beyond reserved and old userspace

> -> any new added field to struct sz should be checked and anyway it is the same as item 1.  

> 4. New kernel and new userspace

> -> standard flow.
Leon Romanovsky Feb. 9, 2021, 7:01 p.m. UTC | #10
On Tue, Feb 09, 2021 at 08:59:38AM -0800, Jakub Kicinski wrote:
> On Tue, 9 Feb 2021 08:15:11 +0200 Leon Romanovsky wrote:

> > At least in my tree, we have the length check:

> >   4155                 if (len > sizeof(zc)) {

> >   4156                         len = sizeof(zc);

> >   4157                         if (put_user(len, optlen))

> >   4158                                 return -EFAULT;

> >   4159                 }

> >

> >

> > Ad David wrote below, the "if (zc.reserved)" is enough.

> >

> > We have following options:

> > 1. Old kernel that have sizeof(sz) upto .reserved and old userspace

> > -> len <= sizeof(sz) - works correctly.

> > 2. Old kernel that have sizeof(sz) upto .reserved and new userspace that

> > sends larger struct -> "f (len > sizeof(zc))" will return -EFAULT

>

> Based on the code you quoted? I don't see how. Maybe I need a vacation.

> put_user() just copies len back to user space after truncation.


It is me who needs to go to vacation, you are right it doesn't return
for lengths larger than sizeof(zc).

>

> > 3. New kernel that have sizeof(sz) beyond reserved and old userspace

> > -> any new added field to struct sz should be checked and anyway it is the same as item 1.

> > 4. New kernel and new userspace

> > -> standard flow.
Arjun Roy Feb. 9, 2021, 11:46 p.m. UTC | #11
On Tue, Feb 9, 2021 at 8:59 AM Jakub Kicinski <kuba@kernel.org> wrote:
>

> On Mon, 8 Feb 2021 20:20:29 -0700 David Ahern wrote:

> > On 2/8/21 7:53 PM, Jakub Kicinski wrote:

> > > On Mon, 8 Feb 2021 19:24:05 -0700 David Ahern wrote:

> > >> That would be the case for new userspace on old kernel. Extending the

> > >> check to the end of the struct would guarantee new userspace can not ask

> > >> for something that the running kernel does not understand.

> > >

> > > Indeed, so we're agreeing that check_zeroed_user() is needed before

> > > original optlen from user space gets truncated?

> >

> > I thought so, but maybe not. To think through this ...

> >

> > If current kernel understands a struct of size N, it can only copy that

> > amount from user to kernel. Anything beyond is ignored in these

> > multiplexed uAPIs, and that is where the new userspace on old kernel falls.

> >

> > Known value checks can only be done up to size N. In this case, the

> > reserved field is at the end of the known struct size, so checking just

> > the field is fine. Going beyond the reserved field has implications for

> > extensions to the API which should be handled when those extensions are

> > added.

>

> Let me try one last time.

>

> There is no check in the kernels that len <= N. User can pass any

> length _already_. check_zeroed_user() forces the values beyond the

> structure length to be known (0) rather than anything. It can only

> avoid breakages in the future.

>

> > So, in short I think the "if (zc.reserved)" is correct as Leon noted.

>

> If it's correct to check some arbitrary part of the buffer is zeroed

> it should be correct to check the entire tail is zeroed.


So, coming back to the thread, I think the following appears to be the
current thoughts:

1. It is requested that, on the kernel as it stands today, fields
beyond zc.msg_flags (including zc.reserved, the only such field as of
this patch) are zero'd out. So a new userspace asking to do specific
things would fail on this old kernel with EINVAL. Old userspace would
work on old or new kernels. New of course works on new kernels.
2. If it's correct to check some arbitrary field (zc.reserved) to be
0, then it should be fine to check this for all future fields >=
reserved in the struct. So some advanced userspace down the line
doesn't get confused.

Strictly speaking, I'm not convinced this is necessary - eg. 64 bytes
struct right now, suppose userspace of the future gives us 96 bytes of
which the last 32 are non-zero for some feature or the other. We, in
the here and now kernel, truncate that length to 64 (as in we only
copy to kernel those first 64 bytes) and set the returned length to
64. The understanding being, any (future, past or present) userspace
consults the output value; and considers anything byte >= the returned
len to be untouched by the kernel executing the call (ie. garbage,
unacted upon).

So, how would this work for old+new userspace on old+new kernel?

A) old+old, new+new: sizes match, no issue
B) new kernel, old userspace: That's not an issue. We have the
switch(len) statement for that.
C) old kernel, new userspace: that's the 96 vs. 64 B example above -
new userspace would see that the kernel only operated on 64 B and
treat the last 32 B as garbage/unacted on.

In this case, we would not give EINVAL on case C, as we would if we
returned EINVAL on a check_zeroed_user() case for fields past
zc.reserved. We'd do a zerocopy operating on just the features we know
about, and communicate to the user that we only acted on features up
until this byte offset.

Now, given this is the case, we still have the padding confusion with
zc.reserved and the current struct size, so we have to force it to 0
as we are doing. But I think we don't need to go beyond this so far.

Thus, my personal preference is to not have the check_zeroed_user()
check. But if the consensus demands it, then it's an easy enough fix.
What are your thoughts?

Thanks,
-Arjun
David Ahern Feb. 10, 2021, 4:35 a.m. UTC | #12
On 2/9/21 4:46 PM, Arjun Roy wrote:
> On Tue, Feb 9, 2021 at 8:59 AM Jakub Kicinski <kuba@kernel.org> wrote:

>>

>> On Mon, 8 Feb 2021 20:20:29 -0700 David Ahern wrote:

>>> On 2/8/21 7:53 PM, Jakub Kicinski wrote:

>>>> On Mon, 8 Feb 2021 19:24:05 -0700 David Ahern wrote:

>>>>> That would be the case for new userspace on old kernel. Extending the

>>>>> check to the end of the struct would guarantee new userspace can not ask

>>>>> for something that the running kernel does not understand.

>>>>

>>>> Indeed, so we're agreeing that check_zeroed_user() is needed before

>>>> original optlen from user space gets truncated?

>>>

>>> I thought so, but maybe not. To think through this ...

>>>

>>> If current kernel understands a struct of size N, it can only copy that

>>> amount from user to kernel. Anything beyond is ignored in these

>>> multiplexed uAPIs, and that is where the new userspace on old kernel falls.

>>>

>>> Known value checks can only be done up to size N. In this case, the

>>> reserved field is at the end of the known struct size, so checking just

>>> the field is fine. Going beyond the reserved field has implications for

>>> extensions to the API which should be handled when those extensions are

>>> added.

>>

>> Let me try one last time.

>>

>> There is no check in the kernels that len <= N. User can pass any

>> length _already_. check_zeroed_user() forces the values beyond the

>> structure length to be known (0) rather than anything. It can only

>> avoid breakages in the future.

>>

>>> So, in short I think the "if (zc.reserved)" is correct as Leon noted.

>>

>> If it's correct to check some arbitrary part of the buffer is zeroed

>> it should be correct to check the entire tail is zeroed.

> 

> So, coming back to the thread, I think the following appears to be the

> current thoughts:

> 

> 1. It is requested that, on the kernel as it stands today, fields

> beyond zc.msg_flags (including zc.reserved, the only such field as of

> this patch) are zero'd out. So a new userspace asking to do specific

> things would fail on this old kernel with EINVAL. Old userspace would

> work on old or new kernels. New of course works on new kernels.

> 2. If it's correct to check some arbitrary field (zc.reserved) to be

> 0, then it should be fine to check this for all future fields >=

> reserved in the struct. So some advanced userspace down the line

> doesn't get confused.

> 

> Strictly speaking, I'm not convinced this is necessary - eg. 64 bytes

> struct right now, suppose userspace of the future gives us 96 bytes of

> which the last 32 are non-zero for some feature or the other. We, in

> the here and now kernel, truncate that length to 64 (as in we only

> copy to kernel those first 64 bytes) and set the returned length to

> 64. The understanding being, any (future, past or present) userspace

> consults the output value; and considers anything byte >= the returned

> len to be untouched by the kernel executing the call (ie. garbage,

> unacted upon).

> 

> So, how would this work for old+new userspace on old+new kernel?

> 

> A) old+old, new+new: sizes match, no issue

> B) new kernel, old userspace: That's not an issue. We have the

> switch(len) statement for that.

> C) old kernel, new userspace: that's the 96 vs. 64 B example above -

> new userspace would see that the kernel only operated on 64 B and

> treat the last 32 B as garbage/unacted on.

> 

> In this case, we would not give EINVAL on case C, as we would if we

> returned EINVAL on a check_zeroed_user() case for fields past

> zc.reserved. We'd do a zerocopy operating on just the features we know

> about, and communicate to the user that we only acted on features up

> until this byte offset.

> 

> Now, given this is the case, we still have the padding confusion with

> zc.reserved and the current struct size, so we have to force it to 0

> as we are doing. But I think we don't need to go beyond this so far.

> 

> Thus, my personal preference is to not have the check_zeroed_user()

> check. But if the consensus demands it, then it's an easy enough fix.

> What are your thoughts?

> 


bpf uses check_zeroed_user to make sure extensions to its structs are
compatible, so yes, this is required.

Also, you need to address legitimate msg_flags as I mentioned in another
response.
Arjun Roy Feb. 10, 2021, 7:23 p.m. UTC | #13
On Tue, Feb 9, 2021 at 8:35 PM David Ahern <dsahern@gmail.com> wrote:
>

> On 2/9/21 4:46 PM, Arjun Roy wrote:

> > On Tue, Feb 9, 2021 at 8:59 AM Jakub Kicinski <kuba@kernel.org> wrote:

> >>

> >> On Mon, 8 Feb 2021 20:20:29 -0700 David Ahern wrote:

> >>> On 2/8/21 7:53 PM, Jakub Kicinski wrote:

> >>>> On Mon, 8 Feb 2021 19:24:05 -0700 David Ahern wrote:

> >>>>> That would be the case for new userspace on old kernel. Extending the

> >>>>> check to the end of the struct would guarantee new userspace can not ask

> >>>>> for something that the running kernel does not understand.

> >>>>

> >>>> Indeed, so we're agreeing that check_zeroed_user() is needed before

> >>>> original optlen from user space gets truncated?

> >>>

> >>> I thought so, but maybe not. To think through this ...

> >>>

> >>> If current kernel understands a struct of size N, it can only copy that

> >>> amount from user to kernel. Anything beyond is ignored in these

> >>> multiplexed uAPIs, and that is where the new userspace on old kernel falls.

> >>>

> >>> Known value checks can only be done up to size N. In this case, the

> >>> reserved field is at the end of the known struct size, so checking just

> >>> the field is fine. Going beyond the reserved field has implications for

> >>> extensions to the API which should be handled when those extensions are

> >>> added.

> >>

> >> Let me try one last time.

> >>

> >> There is no check in the kernels that len <= N. User can pass any

> >> length _already_. check_zeroed_user() forces the values beyond the

> >> structure length to be known (0) rather than anything. It can only

> >> avoid breakages in the future.

> >>

> >>> So, in short I think the "if (zc.reserved)" is correct as Leon noted.

> >>

> >> If it's correct to check some arbitrary part of the buffer is zeroed

> >> it should be correct to check the entire tail is zeroed.

> >

> > So, coming back to the thread, I think the following appears to be the

> > current thoughts:

> >

> > 1. It is requested that, on the kernel as it stands today, fields

> > beyond zc.msg_flags (including zc.reserved, the only such field as of

> > this patch) are zero'd out. So a new userspace asking to do specific

> > things would fail on this old kernel with EINVAL. Old userspace would

> > work on old or new kernels. New of course works on new kernels.

> > 2. If it's correct to check some arbitrary field (zc.reserved) to be

> > 0, then it should be fine to check this for all future fields >=

> > reserved in the struct. So some advanced userspace down the line

> > doesn't get confused.

> >

> > Strictly speaking, I'm not convinced this is necessary - eg. 64 bytes

> > struct right now, suppose userspace of the future gives us 96 bytes of

> > which the last 32 are non-zero for some feature or the other. We, in

> > the here and now kernel, truncate that length to 64 (as in we only

> > copy to kernel those first 64 bytes) and set the returned length to

> > 64. The understanding being, any (future, past or present) userspace

> > consults the output value; and considers anything byte >= the returned

> > len to be untouched by the kernel executing the call (ie. garbage,

> > unacted upon).

> >

> > So, how would this work for old+new userspace on old+new kernel?

> >

> > A) old+old, new+new: sizes match, no issue

> > B) new kernel, old userspace: That's not an issue. We have the

> > switch(len) statement for that.

> > C) old kernel, new userspace: that's the 96 vs. 64 B example above -

> > new userspace would see that the kernel only operated on 64 B and

> > treat the last 32 B as garbage/unacted on.

> >

> > In this case, we would not give EINVAL on case C, as we would if we

> > returned EINVAL on a check_zeroed_user() case for fields past

> > zc.reserved. We'd do a zerocopy operating on just the features we know

> > about, and communicate to the user that we only acted on features up

> > until this byte offset.

> >

> > Now, given this is the case, we still have the padding confusion with

> > zc.reserved and the current struct size, so we have to force it to 0

> > as we are doing. But I think we don't need to go beyond this so far.

> >

> > Thus, my personal preference is to not have the check_zeroed_user()

> > check. But if the consensus demands it, then it's an easy enough fix.

> > What are your thoughts?

> >

>

> bpf uses check_zeroed_user to make sure extensions to its structs are

> compatible, so yes, this is required.


Very well; I shall send out a v3 patch with this.

>

> Also, you need to address legitimate msg_flags as I mentioned in another

> response.


I meant to respond to this earlier but forgot. v3 will address this as well.

-Arjun
diff mbox series

Patch

diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index 42fc5a640df4..8fc09e8638b3 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -357,6 +357,6 @@  struct tcp_zerocopy_receive {
 	__u64 msg_control; /* ancillary data */
 	__u64 msg_controllen;
 	__u32 msg_flags;
-	/* __u32 hole;  Next we must add >1 u32 otherwise length checks fail. */
+	__u32 reserved; /* set to 0 for now */
 };
 #endif /* _UAPI_LINUX_TCP_H */
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index e1a17c6b473c..c8469c579ed8 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -4159,6 +4159,8 @@  static int do_tcp_getsockopt(struct sock *sk, int level,
 		}
 		if (copy_from_user(&zc, optval, len))
 			return -EFAULT;
+		if (zc.reserved)
+			return -EINVAL;
 		lock_sock(sk);
 		err = tcp_zerocopy_receive(sk, &zc, &tss);
 		release_sock(sk);