diff mbox series

[2/9] iov_iter: move rw_copy_check_uvector() into lib/iov_iter.c

Message ID 20200925045146.1283714-3-hch@lst.de
State Superseded
Headers show
Series [1/9] compat.h: fix a spelling error in <linux/compat.h> | expand

Commit Message

Christoph Hellwig Sept. 25, 2020, 4:51 a.m. UTC
From: David Laight <David.Laight@ACULAB.COM>

This lets the compiler inline it into import_iovec() generating
much better code.

Signed-off-by: David Laight <david.laight@aculab.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/read_write.c | 179 ------------------------------------------------
 lib/iov_iter.c  | 176 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 176 insertions(+), 179 deletions(-)

Comments

Greg Kroah-Hartman Oct. 21, 2020, 4:13 p.m. UTC | #1
On Fri, Sep 25, 2020 at 06:51:39AM +0200, Christoph Hellwig wrote:
> From: David Laight <David.Laight@ACULAB.COM>
> 
> This lets the compiler inline it into import_iovec() generating
> much better code.
> 
> Signed-off-by: David Laight <david.laight@aculab.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/read_write.c | 179 ------------------------------------------------
>  lib/iov_iter.c  | 176 +++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 176 insertions(+), 179 deletions(-)

Strangely, this commit causes a regression in Linus's tree right now.

I can't really figure out what the regression is, only that this commit
triggers a "large Android system binary" from working properly.  There's
no kernel log messages anywhere, and I don't have any way to strace the
thing in the testing framework, so any hints that people can provide
would be most appreciated.

thanks,

greg k-h
Al Viro Oct. 21, 2020, 11:39 p.m. UTC | #2
On Wed, Oct 21, 2020 at 06:13:01PM +0200, Greg KH wrote:
> On Fri, Sep 25, 2020 at 06:51:39AM +0200, Christoph Hellwig wrote:
> > From: David Laight <David.Laight@ACULAB.COM>
> > 
> > This lets the compiler inline it into import_iovec() generating
> > much better code.
> > 
> > Signed-off-by: David Laight <david.laight@aculab.com>
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > ---
> >  fs/read_write.c | 179 ------------------------------------------------
> >  lib/iov_iter.c  | 176 +++++++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 176 insertions(+), 179 deletions(-)
> 
> Strangely, this commit causes a regression in Linus's tree right now.
> 
> I can't really figure out what the regression is, only that this commit
> triggers a "large Android system binary" from working properly.  There's
> no kernel log messages anywhere, and I don't have any way to strace the
> thing in the testing framework, so any hints that people can provide
> would be most appreciated.

It's a pure move - modulo changed line breaks in the argument lists
the functions involved are identical before and after that (just checked
that directly, by checking out the trees before and after, extracting two
functions in question from fs/read_write.c and lib/iov_iter.c (before and
after, resp.) and checking the diff between those.

How certain is your bisection?
Greg Kroah-Hartman Oct. 22, 2020, 8:26 a.m. UTC | #3
On Thu, Oct 22, 2020 at 12:39:14AM +0100, Al Viro wrote:
> On Wed, Oct 21, 2020 at 06:13:01PM +0200, Greg KH wrote:
> > On Fri, Sep 25, 2020 at 06:51:39AM +0200, Christoph Hellwig wrote:
> > > From: David Laight <David.Laight@ACULAB.COM>
> > > 
> > > This lets the compiler inline it into import_iovec() generating
> > > much better code.
> > > 
> > > Signed-off-by: David Laight <david.laight@aculab.com>
> > > Signed-off-by: Christoph Hellwig <hch@lst.de>
> > > ---
> > >  fs/read_write.c | 179 ------------------------------------------------
> > >  lib/iov_iter.c  | 176 +++++++++++++++++++++++++++++++++++++++++++++++
> > >  2 files changed, 176 insertions(+), 179 deletions(-)
> > 
> > Strangely, this commit causes a regression in Linus's tree right now.
> > 
> > I can't really figure out what the regression is, only that this commit
> > triggers a "large Android system binary" from working properly.  There's
> > no kernel log messages anywhere, and I don't have any way to strace the
> > thing in the testing framework, so any hints that people can provide
> > would be most appreciated.
> 
> It's a pure move - modulo changed line breaks in the argument lists
> the functions involved are identical before and after that (just checked
> that directly, by checking out the trees before and after, extracting two
> functions in question from fs/read_write.c and lib/iov_iter.c (before and
> after, resp.) and checking the diff between those.
> 
> How certain is your bisection?

The bisection is very reproducable.

But, this looks now to be a compiler bug.  I'm using the latest version
of clang and if I put "noinline" at the front of the function,
everything works.

Nick, any ideas here as to who I should report this to?

I'll work on a fixup patch for the Android kernel tree to see if I can
work around it there, but others will hit this in Linus's tree sooner or
later...

thanks,

greg k-h
David Hildenbrand Oct. 22, 2020, 8:35 a.m. UTC | #4
On 22.10.20 10:26, Greg KH wrote:
> On Thu, Oct 22, 2020 at 12:39:14AM +0100, Al Viro wrote:

>> On Wed, Oct 21, 2020 at 06:13:01PM +0200, Greg KH wrote:

>>> On Fri, Sep 25, 2020 at 06:51:39AM +0200, Christoph Hellwig wrote:

>>>> From: David Laight <David.Laight@ACULAB.COM>

>>>>

>>>> This lets the compiler inline it into import_iovec() generating

>>>> much better code.

>>>>

>>>> Signed-off-by: David Laight <david.laight@aculab.com>

>>>> Signed-off-by: Christoph Hellwig <hch@lst.de>

>>>> ---

>>>>  fs/read_write.c | 179 ------------------------------------------------

>>>>  lib/iov_iter.c  | 176 +++++++++++++++++++++++++++++++++++++++++++++++

>>>>  2 files changed, 176 insertions(+), 179 deletions(-)

>>>

>>> Strangely, this commit causes a regression in Linus's tree right now.

>>>

>>> I can't really figure out what the regression is, only that this commit

>>> triggers a "large Android system binary" from working properly.  There's

>>> no kernel log messages anywhere, and I don't have any way to strace the

>>> thing in the testing framework, so any hints that people can provide

>>> would be most appreciated.

>>

>> It's a pure move - modulo changed line breaks in the argument lists

>> the functions involved are identical before and after that (just checked

>> that directly, by checking out the trees before and after, extracting two

>> functions in question from fs/read_write.c and lib/iov_iter.c (before and

>> after, resp.) and checking the diff between those.

>>

>> How certain is your bisection?

> 

> The bisection is very reproducable.

> 

> But, this looks now to be a compiler bug.  I'm using the latest version

> of clang and if I put "noinline" at the front of the function,

> everything works.


Well, the compiler can do more invasive optimizations when inlining. If
you have buggy code that relies on some unspecified behavior, inlining
can change the behavior ... but going over that code, there isn't too
much action going on. At least nothing screamed at me.

-- 
Thanks,

David / dhildenb
David Laight Oct. 22, 2020, 8:40 a.m. UTC | #5
From: David Hildenbrand

> Sent: 22 October 2020 09:35

> 

> On 22.10.20 10:26, Greg KH wrote:

> > On Thu, Oct 22, 2020 at 12:39:14AM +0100, Al Viro wrote:

> >> On Wed, Oct 21, 2020 at 06:13:01PM +0200, Greg KH wrote:

> >>> On Fri, Sep 25, 2020 at 06:51:39AM +0200, Christoph Hellwig wrote:

> >>>> From: David Laight <David.Laight@ACULAB.COM>

> >>>>

> >>>> This lets the compiler inline it into import_iovec() generating

> >>>> much better code.

> >>>>

> >>>> Signed-off-by: David Laight <david.laight@aculab.com>

> >>>> Signed-off-by: Christoph Hellwig <hch@lst.de>

> >>>> ---

> >>>>  fs/read_write.c | 179 ------------------------------------------------

> >>>>  lib/iov_iter.c  | 176 +++++++++++++++++++++++++++++++++++++++++++++++

> >>>>  2 files changed, 176 insertions(+), 179 deletions(-)

> >>>

> >>> Strangely, this commit causes a regression in Linus's tree right now.

> >>>

> >>> I can't really figure out what the regression is, only that this commit

> >>> triggers a "large Android system binary" from working properly.  There's

> >>> no kernel log messages anywhere, and I don't have any way to strace the

> >>> thing in the testing framework, so any hints that people can provide

> >>> would be most appreciated.

> >>

> >> It's a pure move - modulo changed line breaks in the argument lists

> >> the functions involved are identical before and after that (just checked

> >> that directly, by checking out the trees before and after, extracting two

> >> functions in question from fs/read_write.c and lib/iov_iter.c (before and

> >> after, resp.) and checking the diff between those.

> >>

> >> How certain is your bisection?

> >

> > The bisection is very reproducable.

> >

> > But, this looks now to be a compiler bug.  I'm using the latest version

> > of clang and if I put "noinline" at the front of the function,

> > everything works.

> 

> Well, the compiler can do more invasive optimizations when inlining. If

> you have buggy code that relies on some unspecified behavior, inlining

> can change the behavior ... but going over that code, there isn't too

> much action going on. At least nothing screamed at me.


Apart from all the optimisations that get rid off the 'pass be reference'
parameters and strange conditional tests.
Plenty of scope for the compiler getting it wrong.
But nothing even vaguely illegal.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
David Hildenbrand Oct. 22, 2020, 8:48 a.m. UTC | #6
On 22.10.20 10:40, David Laight wrote:
> From: David Hildenbrand

>> Sent: 22 October 2020 09:35

>>

>> On 22.10.20 10:26, Greg KH wrote:

>>> On Thu, Oct 22, 2020 at 12:39:14AM +0100, Al Viro wrote:

>>>> On Wed, Oct 21, 2020 at 06:13:01PM +0200, Greg KH wrote:

>>>>> On Fri, Sep 25, 2020 at 06:51:39AM +0200, Christoph Hellwig wrote:

>>>>>> From: David Laight <David.Laight@ACULAB.COM>

>>>>>>

>>>>>> This lets the compiler inline it into import_iovec() generating

>>>>>> much better code.

>>>>>>

>>>>>> Signed-off-by: David Laight <david.laight@aculab.com>

>>>>>> Signed-off-by: Christoph Hellwig <hch@lst.de>

>>>>>> ---

>>>>>>  fs/read_write.c | 179 ------------------------------------------------

>>>>>>  lib/iov_iter.c  | 176 +++++++++++++++++++++++++++++++++++++++++++++++

>>>>>>  2 files changed, 176 insertions(+), 179 deletions(-)

>>>>>

>>>>> Strangely, this commit causes a regression in Linus's tree right now.

>>>>>

>>>>> I can't really figure out what the regression is, only that this commit

>>>>> triggers a "large Android system binary" from working properly.  There's

>>>>> no kernel log messages anywhere, and I don't have any way to strace the

>>>>> thing in the testing framework, so any hints that people can provide

>>>>> would be most appreciated.

>>>>

>>>> It's a pure move - modulo changed line breaks in the argument lists

>>>> the functions involved are identical before and after that (just checked

>>>> that directly, by checking out the trees before and after, extracting two

>>>> functions in question from fs/read_write.c and lib/iov_iter.c (before and

>>>> after, resp.) and checking the diff between those.

>>>>

>>>> How certain is your bisection?

>>>

>>> The bisection is very reproducable.

>>>

>>> But, this looks now to be a compiler bug.  I'm using the latest version

>>> of clang and if I put "noinline" at the front of the function,

>>> everything works.

>>

>> Well, the compiler can do more invasive optimizations when inlining. If

>> you have buggy code that relies on some unspecified behavior, inlining

>> can change the behavior ... but going over that code, there isn't too

>> much action going on. At least nothing screamed at me.

> 

> Apart from all the optimisations that get rid off the 'pass be reference'

> parameters and strange conditional tests.

> Plenty of scope for the compiler getting it wrong.

> But nothing even vaguely illegal.


Not the first time that people blame the compiler to then figure out
that something else is wrong ... but maybe this time is different :)

-- 
Thanks,

David / dhildenb
David Laight Oct. 22, 2020, 9:02 a.m. UTC | #7
From: David Hildenbrand

> Sent: 22 October 2020 09:49

...
> >>> But, this looks now to be a compiler bug.  I'm using the latest version

> >>> of clang and if I put "noinline" at the front of the function,

> >>> everything works.

> >>

> >> Well, the compiler can do more invasive optimizations when inlining. If

> >> you have buggy code that relies on some unspecified behavior, inlining

> >> can change the behavior ... but going over that code, there isn't too

> >> much action going on. At least nothing screamed at me.

> >

> > Apart from all the optimisations that get rid off the 'pass be reference'

> > parameters and strange conditional tests.

> > Plenty of scope for the compiler getting it wrong.

> > But nothing even vaguely illegal.

> 

> Not the first time that people blame the compiler to then figure out

> that something else is wrong ... but maybe this time is different :)


Usually down to missing asm 'memory' constraints...

Need to read the obj file to see what the compiler did.

The code must be 'approximately right' or nothing would run.
So I'd guess it has to do with > 8 fragments.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
David Laight Oct. 22, 2020, 9:06 a.m. UTC | #8
From: Greg KH

> Sent: 22 October 2020 10:02

...
> I'm running some more tests, trying to narrow things down as just adding

> a "noinline" to the function that got moved here doesn't work on Linus's

> tree at the moment because the function was split into multiple

> functions.


I was going to look at that once rc2 is in - and the kernel is
likely to work.

I suspect the split version doesn't get inlined the same way?
Which leaves the horrid argument conversion the inline got
rid of back there again.

Which all rather begs the question of why the compiler doesn't
generate the expected code.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
David Hildenbrand Oct. 22, 2020, 9:19 a.m. UTC | #9
On 22.10.20 11:01, Greg KH wrote:
> On Thu, Oct 22, 2020 at 10:48:59AM +0200, David Hildenbrand wrote:
>> On 22.10.20 10:40, David Laight wrote:
>>> From: David Hildenbrand
>>>> Sent: 22 October 2020 09:35
>>>>
>>>> On 22.10.20 10:26, Greg KH wrote:
>>>>> On Thu, Oct 22, 2020 at 12:39:14AM +0100, Al Viro wrote:
>>>>>> On Wed, Oct 21, 2020 at 06:13:01PM +0200, Greg KH wrote:
>>>>>>> On Fri, Sep 25, 2020 at 06:51:39AM +0200, Christoph Hellwig wrote:
>>>>>>>> From: David Laight <David.Laight@ACULAB.COM>
>>>>>>>>
>>>>>>>> This lets the compiler inline it into import_iovec() generating
>>>>>>>> much better code.
>>>>>>>>
>>>>>>>> Signed-off-by: David Laight <david.laight@aculab.com>
>>>>>>>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>>>>>>>> ---
>>>>>>>>  fs/read_write.c | 179 ------------------------------------------------
>>>>>>>>  lib/iov_iter.c  | 176 +++++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>>  2 files changed, 176 insertions(+), 179 deletions(-)
>>>>>>>
>>>>>>> Strangely, this commit causes a regression in Linus's tree right now.
>>>>>>>
>>>>>>> I can't really figure out what the regression is, only that this commit
>>>>>>> triggers a "large Android system binary" from working properly.  There's
>>>>>>> no kernel log messages anywhere, and I don't have any way to strace the
>>>>>>> thing in the testing framework, so any hints that people can provide
>>>>>>> would be most appreciated.
>>>>>>
>>>>>> It's a pure move - modulo changed line breaks in the argument lists
>>>>>> the functions involved are identical before and after that (just checked
>>>>>> that directly, by checking out the trees before and after, extracting two
>>>>>> functions in question from fs/read_write.c and lib/iov_iter.c (before and
>>>>>> after, resp.) and checking the diff between those.
>>>>>>
>>>>>> How certain is your bisection?
>>>>>
>>>>> The bisection is very reproducable.
>>>>>
>>>>> But, this looks now to be a compiler bug.  I'm using the latest version
>>>>> of clang and if I put "noinline" at the front of the function,
>>>>> everything works.
>>>>
>>>> Well, the compiler can do more invasive optimizations when inlining. If
>>>> you have buggy code that relies on some unspecified behavior, inlining
>>>> can change the behavior ... but going over that code, there isn't too
>>>> much action going on. At least nothing screamed at me.
>>>
>>> Apart from all the optimisations that get rid off the 'pass be reference'
>>> parameters and strange conditional tests.
>>> Plenty of scope for the compiler getting it wrong.
>>> But nothing even vaguely illegal.
>>
>> Not the first time that people blame the compiler to then figure out
>> that something else is wrong ... but maybe this time is different :)
> 
> I agree, I hate to blame the compiler, that's almost never the real
> problem, but this one sure "feels" like it.
> 
> I'm running some more tests, trying to narrow things down as just adding
> a "noinline" to the function that got moved here doesn't work on Linus's
> tree at the moment because the function was split into multiple
> functions.
> 
> Give me a few hours...

I might be wrong but

a) import_iovec() uses:
- unsigned nr_segs -> int
- unsigned fast_segs -> int
b) rw_copy_check_uvector() uses:
- unsigned long nr_segs -> long
- unsigned long fast_seg -> long

So when calling rw_copy_check_uvector(), we have to zero-extend the
registers used for passing the arguments. That's definitely done when
calling the function explicitly. Maybe when inlining something is messed up?

Just a thought ...
David Hildenbrand Oct. 22, 2020, 9:25 a.m. UTC | #10
On 22.10.20 11:19, David Hildenbrand wrote:
> On 22.10.20 11:01, Greg KH wrote:

>> On Thu, Oct 22, 2020 at 10:48:59AM +0200, David Hildenbrand wrote:

>>> On 22.10.20 10:40, David Laight wrote:

>>>> From: David Hildenbrand

>>>>> Sent: 22 October 2020 09:35

>>>>>

>>>>> On 22.10.20 10:26, Greg KH wrote:

>>>>>> On Thu, Oct 22, 2020 at 12:39:14AM +0100, Al Viro wrote:

>>>>>>> On Wed, Oct 21, 2020 at 06:13:01PM +0200, Greg KH wrote:

>>>>>>>> On Fri, Sep 25, 2020 at 06:51:39AM +0200, Christoph Hellwig wrote:

>>>>>>>>> From: David Laight <David.Laight@ACULAB.COM>

>>>>>>>>>

>>>>>>>>> This lets the compiler inline it into import_iovec() generating

>>>>>>>>> much better code.

>>>>>>>>>

>>>>>>>>> Signed-off-by: David Laight <david.laight@aculab.com>

>>>>>>>>> Signed-off-by: Christoph Hellwig <hch@lst.de>

>>>>>>>>> ---

>>>>>>>>>  fs/read_write.c | 179 ------------------------------------------------

>>>>>>>>>  lib/iov_iter.c  | 176 +++++++++++++++++++++++++++++++++++++++++++++++

>>>>>>>>>  2 files changed, 176 insertions(+), 179 deletions(-)

>>>>>>>>

>>>>>>>> Strangely, this commit causes a regression in Linus's tree right now.

>>>>>>>>

>>>>>>>> I can't really figure out what the regression is, only that this commit

>>>>>>>> triggers a "large Android system binary" from working properly.  There's

>>>>>>>> no kernel log messages anywhere, and I don't have any way to strace the

>>>>>>>> thing in the testing framework, so any hints that people can provide

>>>>>>>> would be most appreciated.

>>>>>>>

>>>>>>> It's a pure move - modulo changed line breaks in the argument lists

>>>>>>> the functions involved are identical before and after that (just checked

>>>>>>> that directly, by checking out the trees before and after, extracting two

>>>>>>> functions in question from fs/read_write.c and lib/iov_iter.c (before and

>>>>>>> after, resp.) and checking the diff between those.

>>>>>>>

>>>>>>> How certain is your bisection?

>>>>>>

>>>>>> The bisection is very reproducable.

>>>>>>

>>>>>> But, this looks now to be a compiler bug.  I'm using the latest version

>>>>>> of clang and if I put "noinline" at the front of the function,

>>>>>> everything works.

>>>>>

>>>>> Well, the compiler can do more invasive optimizations when inlining. If

>>>>> you have buggy code that relies on some unspecified behavior, inlining

>>>>> can change the behavior ... but going over that code, there isn't too

>>>>> much action going on. At least nothing screamed at me.

>>>>

>>>> Apart from all the optimisations that get rid off the 'pass be reference'

>>>> parameters and strange conditional tests.

>>>> Plenty of scope for the compiler getting it wrong.

>>>> But nothing even vaguely illegal.

>>>

>>> Not the first time that people blame the compiler to then figure out

>>> that something else is wrong ... but maybe this time is different :)

>>

>> I agree, I hate to blame the compiler, that's almost never the real

>> problem, but this one sure "feels" like it.

>>

>> I'm running some more tests, trying to narrow things down as just adding

>> a "noinline" to the function that got moved here doesn't work on Linus's

>> tree at the moment because the function was split into multiple

>> functions.

>>

>> Give me a few hours...

> 

> I might be wrong but

> 

> a) import_iovec() uses:

> - unsigned nr_segs -> int

> - unsigned fast_segs -> int

> b) rw_copy_check_uvector() uses:

> - unsigned long nr_segs -> long

> - unsigned long fast_seg -> long

> 

> So when calling rw_copy_check_uvector(), we have to zero-extend the

> registers used for passing the arguments. That's definitely done when

> calling the function explicitly. Maybe when inlining something is messed up?

> 

> Just a thought ...

> 


... especially because I recall that clang and gcc behave slightly
differently:

https://github.com/hjl-tools/x86-psABI/issues/2

"Function args are different: narrow types are sign or zero extended to
32 bits, depending on their type. clang depends on this for incoming
args, but gcc doesn't make that assumption. But both compilers do it
when calling, so gcc code can call clang code.

The upper 32 bits of registers are always undefined garbage for types
smaller than 64 bits."

Again, just a thought.

-- 
Thanks,

David / dhildenb
David Laight Oct. 22, 2020, 9:28 a.m. UTC | #11
From: David Hildenbrand

> Sent: 22 October 2020 10:19

> 

> On 22.10.20 11:01, Greg KH wrote:

> > On Thu, Oct 22, 2020 at 10:48:59AM +0200, David Hildenbrand wrote:

> >> On 22.10.20 10:40, David Laight wrote:

> >>> From: David Hildenbrand

> >>>> Sent: 22 October 2020 09:35

> >>>>

> >>>> On 22.10.20 10:26, Greg KH wrote:

> >>>>> On Thu, Oct 22, 2020 at 12:39:14AM +0100, Al Viro wrote:

> >>>>>> On Wed, Oct 21, 2020 at 06:13:01PM +0200, Greg KH wrote:

> >>>>>>> On Fri, Sep 25, 2020 at 06:51:39AM +0200, Christoph Hellwig wrote:

> >>>>>>>> From: David Laight <David.Laight@ACULAB.COM>

> >>>>>>>>

> >>>>>>>> This lets the compiler inline it into import_iovec() generating

> >>>>>>>> much better code.

> >>>>>>>>

> >>>>>>>> Signed-off-by: David Laight <david.laight@aculab.com>

> >>>>>>>> Signed-off-by: Christoph Hellwig <hch@lst.de>

> >>>>>>>> ---

> >>>>>>>>  fs/read_write.c | 179 ------------------------------------------------

> >>>>>>>>  lib/iov_iter.c  | 176 +++++++++++++++++++++++++++++++++++++++++++++++

> >>>>>>>>  2 files changed, 176 insertions(+), 179 deletions(-)

> >>>>>>>

> >>>>>>> Strangely, this commit causes a regression in Linus's tree right now.

> >>>>>>>

> >>>>>>> I can't really figure out what the regression is, only that this commit

> >>>>>>> triggers a "large Android system binary" from working properly.  There's

> >>>>>>> no kernel log messages anywhere, and I don't have any way to strace the

> >>>>>>> thing in the testing framework, so any hints that people can provide

> >>>>>>> would be most appreciated.

> >>>>>>

> >>>>>> It's a pure move - modulo changed line breaks in the argument lists

> >>>>>> the functions involved are identical before and after that (just checked

> >>>>>> that directly, by checking out the trees before and after, extracting two

> >>>>>> functions in question from fs/read_write.c and lib/iov_iter.c (before and

> >>>>>> after, resp.) and checking the diff between those.

> >>>>>>

> >>>>>> How certain is your bisection?

> >>>>>

> >>>>> The bisection is very reproducable.

> >>>>>

> >>>>> But, this looks now to be a compiler bug.  I'm using the latest version

> >>>>> of clang and if I put "noinline" at the front of the function,

> >>>>> everything works.

> >>>>

> >>>> Well, the compiler can do more invasive optimizations when inlining. If

> >>>> you have buggy code that relies on some unspecified behavior, inlining

> >>>> can change the behavior ... but going over that code, there isn't too

> >>>> much action going on. At least nothing screamed at me.

> >>>

> >>> Apart from all the optimisations that get rid off the 'pass be reference'

> >>> parameters and strange conditional tests.

> >>> Plenty of scope for the compiler getting it wrong.

> >>> But nothing even vaguely illegal.

> >>

> >> Not the first time that people blame the compiler to then figure out

> >> that something else is wrong ... but maybe this time is different :)

> >

> > I agree, I hate to blame the compiler, that's almost never the real

> > problem, but this one sure "feels" like it.

> >

> > I'm running some more tests, trying to narrow things down as just adding

> > a "noinline" to the function that got moved here doesn't work on Linus's

> > tree at the moment because the function was split into multiple

> > functions.

> >

> > Give me a few hours...

> 

> I might be wrong but

> 

> a) import_iovec() uses:

> - unsigned nr_segs -> int

> - unsigned fast_segs -> int

> b) rw_copy_check_uvector() uses:

> - unsigned long nr_segs -> long

> - unsigned long fast_seg -> long

> 

> So when calling rw_copy_check_uvector(), we have to zero-extend the

> registers used for passing the arguments. That's definitely done when

> calling the function explicitly. Maybe when inlining something is messed up?


That's also not needed on x86-64 - the high bits get cleared by 32bit writes.
But, IIRC, arm64 leaves them unchanged or undefined.

I guessing that every array access uses a *(Rx + Ry) addressing
mode. So indexing an array even with 'unsigned int' requires
an explicit zero-extend on arm64?
(x86-64 ends up with an explicit sign extend when indexing an
array with 'signed int'.)

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
David Laight Oct. 22, 2020, 9:32 a.m. UTC | #12
From: David Hildenbrand

> Sent: 22 October 2020 10:25

...
> ... especially because I recall that clang and gcc behave slightly

> differently:

> 

> https://github.com/hjl-tools/x86-psABI/issues/2

> 

> "Function args are different: narrow types are sign or zero extended to

> 32 bits, depending on their type. clang depends on this for incoming

> args, but gcc doesn't make that assumption. But both compilers do it

> when calling, so gcc code can call clang code.


It really is best to use 'int' (or even 'long') for all numeric
arguments (and results) regardless of the domain of the value.

Related, I've always worried about 'bool'....

> The upper 32 bits of registers are always undefined garbage for types

> smaller than 64 bits."


On x86-64 the high bits are zeroed by all 32bit loads.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
David Hildenbrand Oct. 22, 2020, 9:36 a.m. UTC | #13
On 22.10.20 11:32, David Laight wrote:
> From: David Hildenbrand
>> Sent: 22 October 2020 10:25
> ...
>> ... especially because I recall that clang and gcc behave slightly
>> differently:
>>
>> https://github.com/hjl-tools/x86-psABI/issues/2
>>
>> "Function args are different: narrow types are sign or zero extended to
>> 32 bits, depending on their type. clang depends on this for incoming
>> args, but gcc doesn't make that assumption. But both compilers do it
>> when calling, so gcc code can call clang code.
> 
> It really is best to use 'int' (or even 'long') for all numeric
> arguments (and results) regardless of the domain of the value.
> 
> Related, I've always worried about 'bool'....
> 
>> The upper 32 bits of registers are always undefined garbage for types
>> smaller than 64 bits."
> 
> On x86-64 the high bits are zeroed by all 32bit loads.

Yeah, but does not help here.


My thinking: if the compiler that calls import_iovec() has garbage in
the upper 32 bit

a) gcc will zero it out and not rely on it being zero.
b) clang will not zero it out, assuming it is zero.

But

a) will zero it out when calling the !inlined variant
b) clang will zero it out when calling the !inlined variant

When inlining, b) strikes. We access garbage. That would mean that we
have calling code that's not generated by clang/gcc IIUC.

We can test easily by changing the parameters instead of adding an "inline".
Greg Kroah-Hartman Oct. 22, 2020, 10:48 a.m. UTC | #14
On Thu, Oct 22, 2020 at 11:36:40AM +0200, David Hildenbrand wrote:
> On 22.10.20 11:32, David Laight wrote:
> > From: David Hildenbrand
> >> Sent: 22 October 2020 10:25
> > ...
> >> ... especially because I recall that clang and gcc behave slightly
> >> differently:
> >>
> >> https://github.com/hjl-tools/x86-psABI/issues/2
> >>
> >> "Function args are different: narrow types are sign or zero extended to
> >> 32 bits, depending on their type. clang depends on this for incoming
> >> args, but gcc doesn't make that assumption. But both compilers do it
> >> when calling, so gcc code can call clang code.
> > 
> > It really is best to use 'int' (or even 'long') for all numeric
> > arguments (and results) regardless of the domain of the value.
> > 
> > Related, I've always worried about 'bool'....
> > 
> >> The upper 32 bits of registers are always undefined garbage for types
> >> smaller than 64 bits."
> > 
> > On x86-64 the high bits are zeroed by all 32bit loads.
> 
> Yeah, but does not help here.
> 
> 
> My thinking: if the compiler that calls import_iovec() has garbage in
> the upper 32 bit
> 
> a) gcc will zero it out and not rely on it being zero.
> b) clang will not zero it out, assuming it is zero.
> 
> But
> 
> a) will zero it out when calling the !inlined variant
> b) clang will zero it out when calling the !inlined variant
> 
> When inlining, b) strikes. We access garbage. That would mean that we
> have calling code that's not generated by clang/gcc IIUC.
> 
> We can test easily by changing the parameters instead of adding an "inline".

Let me try that as well, as I seem to have a good reproducer, but it
takes a while to run...

greg k-h
Greg Kroah-Hartman Oct. 22, 2020, 12:57 p.m. UTC | #15
On Thu, Oct 22, 2020 at 02:42:24PM +0200, David Hildenbrand wrote:
> On 22.10.20 14:18, Greg KH wrote:
> > On Thu, Oct 22, 2020 at 12:48:05PM +0200, Greg KH wrote:
> >> On Thu, Oct 22, 2020 at 11:36:40AM +0200, David Hildenbrand wrote:
> >>> On 22.10.20 11:32, David Laight wrote:
> >>>> From: David Hildenbrand
> >>>>> Sent: 22 October 2020 10:25
> >>>> ...
> >>>>> ... especially because I recall that clang and gcc behave slightly
> >>>>> differently:
> >>>>>
> >>>>> https://github.com/hjl-tools/x86-psABI/issues/2
> >>>>>
> >>>>> "Function args are different: narrow types are sign or zero extended to
> >>>>> 32 bits, depending on their type. clang depends on this for incoming
> >>>>> args, but gcc doesn't make that assumption. But both compilers do it
> >>>>> when calling, so gcc code can call clang code.
> >>>>
> >>>> It really is best to use 'int' (or even 'long') for all numeric
> >>>> arguments (and results) regardless of the domain of the value.
> >>>>
> >>>> Related, I've always worried about 'bool'....
> >>>>
> >>>>> The upper 32 bits of registers are always undefined garbage for types
> >>>>> smaller than 64 bits."
> >>>>
> >>>> On x86-64 the high bits are zeroed by all 32bit loads.
> >>>
> >>> Yeah, but does not help here.
> >>>
> >>>
> >>> My thinking: if the compiler that calls import_iovec() has garbage in
> >>> the upper 32 bit
> >>>
> >>> a) gcc will zero it out and not rely on it being zero.
> >>> b) clang will not zero it out, assuming it is zero.
> >>>
> >>> But
> >>>
> >>> a) will zero it out when calling the !inlined variant
> >>> b) clang will zero it out when calling the !inlined variant
> >>>
> >>> When inlining, b) strikes. We access garbage. That would mean that we
> >>> have calling code that's not generated by clang/gcc IIUC.
> >>>
> >>> We can test easily by changing the parameters instead of adding an "inline".
> >>
> >> Let me try that as well, as I seem to have a good reproducer, but it
> >> takes a while to run...
> > 
> > Ok, that didn't work.
> > 
> > And I can't seem to "fix" this by adding noinline to patches further
> > along in the patch series (because this commit's function is no longer
> > present due to later patches.)
> 
> We might have the same issues with iovec_from_user() and friends now.
> 
> > 
> > Will keep digging...
> > 
> > greg k-h
> > 
> 
> 
> Might be worth to give this a try, just to see if it's related to
> garbage in upper 32 bit and the way clang is handling it (might be a BUG
> in clang, though):
> 
> 
> diff --git a/include/linux/uio.h b/include/linux/uio.h
> index 72d88566694e..7527298c6b56 100644
> --- a/include/linux/uio.h
> +++ b/include/linux/uio.h
> @@ -267,7 +267,7 @@ size_t hash_and_copy_to_iter(const void *addr,
> size_t bytes, void *hashp,
>                 struct iov_iter *i);
> 
>  struct iovec *iovec_from_user(const struct iovec __user *uvector,
> -               unsigned long nr_segs, unsigned long fast_segs,
> +               unsigned nr_segs, unsigned fast_segs,
>                 struct iovec *fast_iov, bool compat);
>  ssize_t import_iovec(int type, const struct iovec __user *uvec,
>                  unsigned nr_segs, unsigned fast_segs, struct iovec **iovp,
> diff --git a/lib/iov_iter.c b/lib/iov_iter.c
> index 1635111c5bd2..58417f1916dc 100644
> --- a/lib/iov_iter.c
> +++ b/lib/iov_iter.c
> @@ -1652,7 +1652,7 @@ const void *dup_iter(struct iov_iter *new, struct
> iov_iter *old, gfp_t flags)
>  EXPORT_SYMBOL(dup_iter);
> 
>  static int copy_compat_iovec_from_user(struct iovec *iov,
> -               const struct iovec __user *uvec, unsigned long nr_segs)
> +               const struct iovec __user *uvec, unsigned nr_segs)
>  {
>         const struct compat_iovec __user *uiov =
>                 (const struct compat_iovec __user *)uvec;
> @@ -1684,7 +1684,7 @@ static int copy_compat_iovec_from_user(struct
> iovec *iov,
>  }
> 
>  static int copy_iovec_from_user(struct iovec *iov,
> -               const struct iovec __user *uvec, unsigned long nr_segs)
> +               const struct iovec __user *uvec, unsigned nr_segs)
>  {
>         unsigned long seg;
> 
> @@ -1699,7 +1699,7 @@ static int copy_iovec_from_user(struct iovec *iov,
>  }
> 
>  struct iovec *iovec_from_user(const struct iovec __user *uvec,
> -               unsigned long nr_segs, unsigned long fast_segs,
> +               unsigned nr_segs, unsigned fast_segs,
>                 struct iovec *fast_iov, bool compat)
>  {
>         struct iovec *iov = fast_iov;
> @@ -1738,7 +1738,7 @@ ssize_t __import_iovec(int type, const struct
> iovec __user *uvec,
>                  struct iov_iter *i, bool compat)
>  {
>         ssize_t total_len = 0;
> -       unsigned long seg;
> +       unsigned seg;
>         struct iovec *iov;
> 
>         iov = iovec_from_user(uvec, nr_segs, fast_segs, *iovp, compat);
> 

Ah, I tested the other way around, making everything "unsigned long"
instead.  Will go try this too, as other tests are still running...

thanks,

greg k-h
Christoph Hellwig Oct. 22, 2020, 1:23 p.m. UTC | #16
On Thu, Oct 22, 2020 at 11:36:40AM +0200, David Hildenbrand wrote:
> My thinking: if the compiler that calls import_iovec() has garbage in

> the upper 32 bit

> 

> a) gcc will zero it out and not rely on it being zero.

> b) clang will not zero it out, assuming it is zero.

> 

> But

> 

> a) will zero it out when calling the !inlined variant

> b) clang will zero it out when calling the !inlined variant

> 

> When inlining, b) strikes. We access garbage. That would mean that we

> have calling code that's not generated by clang/gcc IIUC.


Most callchains of import_iovec start with the assembly syscall wrappers.
Greg Kroah-Hartman Oct. 22, 2020, 1:50 p.m. UTC | #17
On Thu, Oct 22, 2020 at 02:57:59PM +0200, Greg KH wrote:
> On Thu, Oct 22, 2020 at 02:42:24PM +0200, David Hildenbrand wrote:
> > On 22.10.20 14:18, Greg KH wrote:
> > > On Thu, Oct 22, 2020 at 12:48:05PM +0200, Greg KH wrote:
> > >> On Thu, Oct 22, 2020 at 11:36:40AM +0200, David Hildenbrand wrote:
> > >>> On 22.10.20 11:32, David Laight wrote:
> > >>>> From: David Hildenbrand
> > >>>>> Sent: 22 October 2020 10:25
> > >>>> ...
> > >>>>> ... especially because I recall that clang and gcc behave slightly
> > >>>>> differently:
> > >>>>>
> > >>>>> https://github.com/hjl-tools/x86-psABI/issues/2
> > >>>>>
> > >>>>> "Function args are different: narrow types are sign or zero extended to
> > >>>>> 32 bits, depending on their type. clang depends on this for incoming
> > >>>>> args, but gcc doesn't make that assumption. But both compilers do it
> > >>>>> when calling, so gcc code can call clang code.
> > >>>>
> > >>>> It really is best to use 'int' (or even 'long') for all numeric
> > >>>> arguments (and results) regardless of the domain of the value.
> > >>>>
> > >>>> Related, I've always worried about 'bool'....
> > >>>>
> > >>>>> The upper 32 bits of registers are always undefined garbage for types
> > >>>>> smaller than 64 bits."
> > >>>>
> > >>>> On x86-64 the high bits are zeroed by all 32bit loads.
> > >>>
> > >>> Yeah, but does not help here.
> > >>>
> > >>>
> > >>> My thinking: if the compiler that calls import_iovec() has garbage in
> > >>> the upper 32 bit
> > >>>
> > >>> a) gcc will zero it out and not rely on it being zero.
> > >>> b) clang will not zero it out, assuming it is zero.
> > >>>
> > >>> But
> > >>>
> > >>> a) will zero it out when calling the !inlined variant
> > >>> b) clang will zero it out when calling the !inlined variant
> > >>>
> > >>> When inlining, b) strikes. We access garbage. That would mean that we
> > >>> have calling code that's not generated by clang/gcc IIUC.
> > >>>
> > >>> We can test easily by changing the parameters instead of adding an "inline".
> > >>
> > >> Let me try that as well, as I seem to have a good reproducer, but it
> > >> takes a while to run...
> > > 
> > > Ok, that didn't work.
> > > 
> > > And I can't seem to "fix" this by adding noinline to patches further
> > > along in the patch series (because this commit's function is no longer
> > > present due to later patches.)
> > 
> > We might have the same issues with iovec_from_user() and friends now.
> > 
> > > 
> > > Will keep digging...
> > > 
> > > greg k-h
> > > 
> > 
> > 
> > Might be worth to give this a try, just to see if it's related to
> > garbage in upper 32 bit and the way clang is handling it (might be a BUG
> > in clang, though):
> > 
> > 
> > diff --git a/include/linux/uio.h b/include/linux/uio.h
> > index 72d88566694e..7527298c6b56 100644
> > --- a/include/linux/uio.h
> > +++ b/include/linux/uio.h
> > @@ -267,7 +267,7 @@ size_t hash_and_copy_to_iter(const void *addr,
> > size_t bytes, void *hashp,
> >                 struct iov_iter *i);
> > 
> >  struct iovec *iovec_from_user(const struct iovec __user *uvector,
> > -               unsigned long nr_segs, unsigned long fast_segs,
> > +               unsigned nr_segs, unsigned fast_segs,
> >                 struct iovec *fast_iov, bool compat);
> >  ssize_t import_iovec(int type, const struct iovec __user *uvec,
> >                  unsigned nr_segs, unsigned fast_segs, struct iovec **iovp,
> > diff --git a/lib/iov_iter.c b/lib/iov_iter.c
> > index 1635111c5bd2..58417f1916dc 100644
> > --- a/lib/iov_iter.c
> > +++ b/lib/iov_iter.c
> > @@ -1652,7 +1652,7 @@ const void *dup_iter(struct iov_iter *new, struct
> > iov_iter *old, gfp_t flags)
> >  EXPORT_SYMBOL(dup_iter);
> > 
> >  static int copy_compat_iovec_from_user(struct iovec *iov,
> > -               const struct iovec __user *uvec, unsigned long nr_segs)
> > +               const struct iovec __user *uvec, unsigned nr_segs)
> >  {
> >         const struct compat_iovec __user *uiov =
> >                 (const struct compat_iovec __user *)uvec;
> > @@ -1684,7 +1684,7 @@ static int copy_compat_iovec_from_user(struct
> > iovec *iov,
> >  }
> > 
> >  static int copy_iovec_from_user(struct iovec *iov,
> > -               const struct iovec __user *uvec, unsigned long nr_segs)
> > +               const struct iovec __user *uvec, unsigned nr_segs)
> >  {
> >         unsigned long seg;
> > 
> > @@ -1699,7 +1699,7 @@ static int copy_iovec_from_user(struct iovec *iov,
> >  }
> > 
> >  struct iovec *iovec_from_user(const struct iovec __user *uvec,
> > -               unsigned long nr_segs, unsigned long fast_segs,
> > +               unsigned nr_segs, unsigned fast_segs,
> >                 struct iovec *fast_iov, bool compat)
> >  {
> >         struct iovec *iov = fast_iov;
> > @@ -1738,7 +1738,7 @@ ssize_t __import_iovec(int type, const struct
> > iovec __user *uvec,
> >                  struct iov_iter *i, bool compat)
> >  {
> >         ssize_t total_len = 0;
> > -       unsigned long seg;
> > +       unsigned seg;
> >         struct iovec *iov;
> > 
> >         iov = iovec_from_user(uvec, nr_segs, fast_segs, *iovp, compat);
> > 
> 
> Ah, I tested the other way around, making everything "unsigned long"
> instead.  Will go try this too, as other tests are still running...

Ok, no, this didn't work either.

Nick, I think I need some compiler help here.  Any ideas?

thanks,

greg k-h
Greg Kroah-Hartman Oct. 22, 2020, 2:40 p.m. UTC | #18
On Thu, Oct 22, 2020 at 04:28:20PM +0200, Arnd Bergmann wrote:
> On Thu, Oct 22, 2020 at 3:50 PM Greg KH <gregkh@linuxfoundation.org> wrote:
> > On Thu, Oct 22, 2020 at 02:57:59PM +0200, Greg KH wrote:
> > > On Thu, Oct 22, 2020 at 02:42:24PM +0200, David Hildenbrand wrote:
> 
> > > >  struct iovec *iovec_from_user(const struct iovec __user *uvec,
> > > > -               unsigned long nr_segs, unsigned long fast_segs,
> > > > +               unsigned nr_segs, unsigned fast_segs,
> > > >                 struct iovec *fast_iov, bool compat)
> > > >  {
> > > >         struct iovec *iov = fast_iov;
> > > > @@ -1738,7 +1738,7 @@ ssize_t __import_iovec(int type, const struct
> > > > iovec __user *uvec,
> > > >                  struct iov_iter *i, bool compat)
> > > >  {
> > > >         ssize_t total_len = 0;
> > > > -       unsigned long seg;
> > > > +       unsigned seg;
> > > >         struct iovec *iov;
> > > >
> > > >         iov = iovec_from_user(uvec, nr_segs, fast_segs, *iovp, compat);
> > > >
> > >
> > > Ah, I tested the other way around, making everything "unsigned long"
> > > instead.  Will go try this too, as other tests are still running...
> >
> > Ok, no, this didn't work either.
> >
> > Nick, I think I need some compiler help here.  Any ideas?
> 
> I don't think the patch above would reliably clear the upper bits if they
> contain garbage.
> 
> If the integer extension is the problem, the way I'd try it is to make the
> function take an 'unsigned long' and then explictly mask the upper
> bits with
> 
>      seg = lower_32_bits(seg);
> 
> Can you attach the iov_iter.s files from the broken build, plus the
> one with 'noinline' for comparison? Maybe something can be seen
> in there.

I don't know how to extract the .s files easily from the AOSP build
system, I'll look into that.  I'm also now testing by downgrading to an
older version of clang (10 instead of 11), to see if that matters at all
or not...

thanks,

greg k-h
David Laight Oct. 22, 2020, 4:15 p.m. UTC | #19
From: Greg KH

> Sent: 22 October 2020 15:40

> 

> On Thu, Oct 22, 2020 at 04:28:20PM +0200, Arnd Bergmann wrote:

...
> > Can you attach the iov_iter.s files from the broken build, plus the

> > one with 'noinline' for comparison? Maybe something can be seen

> > in there.

> 

> I don't know how to extract the .s files easily from the AOSP build

> system, I'll look into that.  I'm also now testing by downgrading to an

> older version of clang (10 instead of 11), to see if that matters at all

> or not...


Back from a day out - after it stopped raining.
Trying to use up leave before the end of the year.

Can you use objdump on the kernel binary itself and cut out
the single function?

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
David Laight Oct. 22, 2020, 4:35 p.m. UTC | #20
From: Christoph Hellwig
> Sent: 22 October 2020 14:24
> 
> On Thu, Oct 22, 2020 at 11:36:40AM +0200, David Hildenbrand wrote:
> > My thinking: if the compiler that calls import_iovec() has garbage in
> > the upper 32 bit
> >
> > a) gcc will zero it out and not rely on it being zero.
> > b) clang will not zero it out, assuming it is zero.
> >
> > But
> >
> > a) will zero it out when calling the !inlined variant
> > b) clang will zero it out when calling the !inlined variant
> >
> > When inlining, b) strikes. We access garbage. That would mean that we
> > have calling code that's not generated by clang/gcc IIUC.
> 
> Most callchains of import_iovec start with the assembly syscall wrappers.

Wait...
readv(2) defines:
	ssize_t readv(int fd, const struct iovec *iov, int iovcnt);

But the syscall is defined as:

SYSCALL_DEFINE3(readv, unsigned long, fd, const struct iovec __user *, vec,
                unsigned long, vlen)
{
        return do_readv(fd, vec, vlen, 0);
}

I'm guessing that nothing actually masks the high bits that come
from an application that is compiled with clang?

The vlen is 'unsigned long' through the first few calls.
So unless there is a non-inlined function than takes vlen
as 'int' the high garbage bits from userspace are kept.

Which makes it a bug in the kernel C syscall wrappers.
They need to explicitly mask the high bits of 32bit
arguments on arm64 but not x86-64.

What does the ARM EABI say about register parameters?

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
David Laight Oct. 22, 2020, 4:50 p.m. UTC | #21
From: Matthew Wilcox <willy@infradead.org>
> Sent: 22 October 2020 17:41
> 
> On Thu, Oct 22, 2020 at 04:35:17PM +0000, David Laight wrote:
> > Wait...
> > readv(2) defines:
> > 	ssize_t readv(int fd, const struct iovec *iov, int iovcnt);
> 
> It doesn't really matter what the manpage says.  What does the AOSP
> libc header say?

The only copy I can find is:

/usr/include/x86_64-linux-gnu/sys/uio.h:extern ssize_t readv (int __fd, const struct iovec *__iovec, int __count)
/usr/include/x86_64-linux-gnu/sys/uio.h-  __wur;

and not surprisingly agrees.
POSIX and/or TOG will (more or less) mandate the prototype.

> > But the syscall is defined as:
> >
> > SYSCALL_DEFINE3(readv, unsigned long, fd, const struct iovec __user *, vec,
> >                 unsigned long, vlen)
> > {
> >         return do_readv(fd, vec, vlen, 0);
> > }

I wonder when the high bits of 'fd' get zapped?

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Nick Desaulniers Oct. 22, 2020, 5 p.m. UTC | #22
On Thu, Oct 22, 2020 at 9:40 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Thu, Oct 22, 2020 at 04:35:17PM +0000, David Laight wrote:
> > Wait...
> > readv(2) defines:
> >       ssize_t readv(int fd, const struct iovec *iov, int iovcnt);
>
> It doesn't really matter what the manpage says.  What does the AOSP
> libc header say?

Same: https://android.googlesource.com/platform/bionic/+/refs/heads/master/libc/include/sys/uio.h#38

Theoretically someone could bypass libc to make a system call, right?

>
> > But the syscall is defined as:
> >
> > SYSCALL_DEFINE3(readv, unsigned long, fd, const struct iovec __user *, vec,
> >                 unsigned long, vlen)
> > {
> >         return do_readv(fd, vec, vlen, 0);
> > }
>
Nick Desaulniers Oct. 22, 2020, 5:54 p.m. UTC | #23
On Thu, Oct 22, 2020 at 9:35 AM David Laight <David.Laight@aculab.com> wrote:
>

> From: Christoph Hellwig

> > Sent: 22 October 2020 14:24

> >

> > On Thu, Oct 22, 2020 at 11:36:40AM +0200, David Hildenbrand wrote:

> > > My thinking: if the compiler that calls import_iovec() has garbage in

> > > the upper 32 bit

> > >

> > > a) gcc will zero it out and not rely on it being zero.

> > > b) clang will not zero it out, assuming it is zero.

> > >

> > > But

> > >

> > > a) will zero it out when calling the !inlined variant

> > > b) clang will zero it out when calling the !inlined variant

> > >

> > > When inlining, b) strikes. We access garbage. That would mean that we

> > > have calling code that's not generated by clang/gcc IIUC.

> >

> > Most callchains of import_iovec start with the assembly syscall wrappers.

>

> Wait...

> readv(2) defines:

>         ssize_t readv(int fd, const struct iovec *iov, int iovcnt);

>

> But the syscall is defined as:

>

> SYSCALL_DEFINE3(readv, unsigned long, fd, const struct iovec __user *, vec,

>                 unsigned long, vlen)

> {

>         return do_readv(fd, vec, vlen, 0);

> }

>

> I'm guessing that nothing actually masks the high bits that come

> from an application that is compiled with clang?

>

> The vlen is 'unsigned long' through the first few calls.

> So unless there is a non-inlined function than takes vlen

> as 'int' the high garbage bits from userspace are kept.


Yeah, that's likely a bug: https://godbolt.org/z/KfsPKs

>

> Which makes it a bug in the kernel C syscall wrappers.

> They need to explicitly mask the high bits of 32bit

> arguments on arm64 but not x86-64.


Why not x86-64? Wouldn't it be *any* LP64 ISA?

Attaching a patch that uses the proper width, but I'm pretty sure
there's still a signedness issue .  Greg, would you mind running this
through the wringer?

>

> What does the ARM EABI say about register parameters?


AAPCS is the ABI for 64b ARM, IIUC, which is the ISA GKH is reporting
the problem against. IIUC, EABI is one of the 32b ABIs.  aarch64 is
LP64 just like x86_64.

--
Thanks,
~Nick Desaulniers
Al Viro Oct. 22, 2020, 6:19 p.m. UTC | #24
On Thu, Oct 22, 2020 at 05:40:40PM +0100, Matthew Wilcox wrote:
> On Thu, Oct 22, 2020 at 04:35:17PM +0000, David Laight wrote:
> > Wait...
> > readv(2) defines:
> > 	ssize_t readv(int fd, const struct iovec *iov, int iovcnt);
> 
> It doesn't really matter what the manpage says.  What does the AOSP
> libc header say?

FWIW, see https://www.spinics.net/lists/linux-scsi/msg147836.html and
subthread from there on...
Nick Desaulniers Oct. 22, 2020, 7:04 p.m. UTC | #25
On Thu, Oct 22, 2020 at 11:13 AM Arnd Bergmann <arnd@arndb.de> wrote:
>

> On Thu, Oct 22, 2020 at 7:54 PM Nick Desaulniers

> <ndesaulniers@google.com> wrote:

> > On Thu, Oct 22, 2020 at 9:35 AM David Laight <David.Laight@aculab.com> wrote:

> > >

> > > Which makes it a bug in the kernel C syscall wrappers.

> > > They need to explicitly mask the high bits of 32bit

> > > arguments on arm64 but not x86-64.

> >

> > Why not x86-64? Wouldn't it be *any* LP64 ISA?

>

> x86-64 is slightly special because most instructions on a 32-bit

> argument clear the upper 32 bits, while on most architectures

> the same instruction would leave the upper bits unchanged.


Oh interesting, depends on the operations too on x86_64 IIUC?

>

> > Attaching a patch that uses the proper width, but I'm pretty sure

> > there's still a signedness issue .  Greg, would you mind running this

> > through the wringer?

>

> I would not expect this to change anything for the bug that Greg

> is chasing, unless there is also a bug in clang.

>

> In the version before the patch, we get a 64-bit argument from

> user space, which may consist of the intended value in the lower

> bits plus garbage in the upper bits. However, vlen only gets

> passed down  into import_iovec() without any other operations

> on it, and since import_iovec takes a 32-bit argument, this is

> where it finally gets narrowed.


Passing an `unsigned long` as an `unsigned int` does no such
narrowing: https://godbolt.org/z/TvfMxe (same vice-versa, just tail
calls, no masking instructions).
So if rw_copy_check_uvector() is inlined into import_iovec() (looking
at the mainline@1028ae406999), then children calls of
`rw_copy_check_uvector()` will be interpreting the `nr_segs` register
unmodified, ie. garbage in the upper 32b.

>

> After your patch, the SYSCALL_DEFINE3() does the narrowing

> conversion with the same clearing of the upper bits.

>

> If there is a problem somewhere leading up to import_iovec(),

> it would have to in some code that expects to get a 32-bit

> register argument but gets called with a register that has

> garbage in the upper bits /without/ going through a correct

> sanitizing function like SYSCALL_DEFINE3().

>

>       Arnd




-- 
Thanks,
~Nick Desaulniers
Al Viro Oct. 22, 2020, 7:24 p.m. UTC | #26
On Thu, Oct 22, 2020 at 12:04:52PM -0700, Nick Desaulniers wrote:

> Passing an `unsigned long` as an `unsigned int` does no such
> narrowing: https://godbolt.org/z/TvfMxe (same vice-versa, just tail
> calls, no masking instructions).
> So if rw_copy_check_uvector() is inlined into import_iovec() (looking
> at the mainline@1028ae406999), then children calls of
> `rw_copy_check_uvector()` will be interpreting the `nr_segs` register
> unmodified, ie. garbage in the upper 32b.

FWIW,

void f(unsinged long v)
{
	if (v != 1)
		printf("failed\n");
}

void g(unsigned int v)
{
	f(v);
}

void h(unsigned long v)
{
	g(v);
}

main()
{
	h(0x100000001);
}

must not produce any output on a host with 32bit int and 64bit long, regardless of
the inlining, having functions live in different compilation units, etc.

Depending upon the calling conventions, compiler might do truncation in caller or
in a callee, but it must be done _somewhere_.
Al Viro Oct. 22, 2020, 8:09 p.m. UTC | #27
On Thu, Oct 22, 2020 at 09:06:29PM +0100, Al Viro wrote:
> On Thu, Oct 22, 2020 at 08:24:58PM +0100, Al Viro wrote:
> 
> > Depending upon the calling conventions, compiler might do truncation in caller or
> > in a callee, but it must be done _somewhere_.
> 
> Unless I'm misreading AAPCS64,
> 	"Unlike in the 32-bit AAPCS, named integral values must be narrowed by the callee
> 	 rather than the caller"
> in 6.4.2 means that callee must not _not_ expect the upper 32 bits of %x0..%x7 to contain

Sorry, artefact of editing - that's

"in 6.4.2 means that callee must _not_ expect the upper 32 bits of %x0..%x7 to contain"

obviously.
Nick Desaulniers Oct. 22, 2020, 8:11 p.m. UTC | #28
On Thu, Oct 22, 2020 at 12:25 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>

> On Thu, Oct 22, 2020 at 12:04:52PM -0700, Nick Desaulniers wrote:

>

> > Passing an `unsigned long` as an `unsigned int` does no such

> > narrowing: https://godbolt.org/z/TvfMxe (same vice-versa, just tail

> > calls, no masking instructions).

> > So if rw_copy_check_uvector() is inlined into import_iovec() (looking

> > at the mainline@1028ae406999), then children calls of

> > `rw_copy_check_uvector()` will be interpreting the `nr_segs` register

> > unmodified, ie. garbage in the upper 32b.

>

> FWIW,

>

> void f(unsinged long v)

> {

>         if (v != 1)

>                 printf("failed\n");

> }

>

> void g(unsigned int v)

> {

>         f(v);

> }

>

> void h(unsigned long v)

> {

>         g(v);

> }

>

> main()

> {

>         h(0x100000001);

> }


A good/analogous example, but things get weird when the leaf node in
the call chain is inline asm: https://godbolt.org/z/s19TY5

(I'm not sure that's precisely what's going on here; I'll need to dive
more into the calls rw_copy_check_uvector() makes to see if there's
inline asm somewhere, pretty sure calls to get_user with `nr_regs`
exist).

>

> must not produce any output on a host with 32bit int and 64bit long, regardless of

> the inlining, having functions live in different compilation units, etc.

>

> Depending upon the calling conventions, compiler might do truncation in caller or

> in a callee, but it must be done _somewhere_.




-- 
Thanks,
~Nick Desaulniers
Eric Biggers Oct. 22, 2020, 8:59 p.m. UTC | #29
On Thu, Oct 22, 2020 at 10:00:44AM -0700, Nick Desaulniers wrote:
> On Thu, Oct 22, 2020 at 9:40 AM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Thu, Oct 22, 2020 at 04:35:17PM +0000, David Laight wrote:
> > > Wait...
> > > readv(2) defines:
> > >       ssize_t readv(int fd, const struct iovec *iov, int iovcnt);
> >
> > It doesn't really matter what the manpage says.  What does the AOSP
> > libc header say?
> 
> Same: https://android.googlesource.com/platform/bionic/+/refs/heads/master/libc/include/sys/uio.h#38
> 
> Theoretically someone could bypass libc to make a system call, right?
> 
> >
> > > But the syscall is defined as:
> > >
> > > SYSCALL_DEFINE3(readv, unsigned long, fd, const struct iovec __user *, vec,
> > >                 unsigned long, vlen)
> > > {
> > >         return do_readv(fd, vec, vlen, 0);
> > > }
> >
> 

FWIW, glibc makes the readv() syscall assuming that fd and vlen are 'int' as
well.  So this problem isn't specific to Android's libc.
David Laight Oct. 22, 2020, 10:04 p.m. UTC | #30
From: Nick Desaulniers

> Sent: 22 October 2020 20:05

> 

...
> Passing an `unsigned long` as an `unsigned int` does no such

> narrowing: https://godbolt.org/z/TvfMxe (same vice-versa, just tail

> calls, no masking instructions).


Right but is the called function going to use 32bit ops
and/or mask the register?
Certainly that is likely on x86-64.

I've rather lost track of where the masking is expected
to happen.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
David Laight Oct. 23, 2020, 12:46 p.m. UTC | #31
From: Greg KH <gregkh@linuxfoundation.org>

> Sent: 22 October 2020 14:51


I've rammed the code into godbolt.

https://godbolt.org/z/9v5PPW

Definitely a clang bug.

Search for [wx]24 in the clang output.
nr_segs comes in as w2 and the initial bound checks are done on w2.
w24 is loaded from w2 - I don't believe this changes the high bits.
There are no references to w24, just x24.
So the kmalloc_array() is passed 'huge' and will fail.
The iov_iter_init also gets the 64bit value.

Note that the gcc code has a sign-extend copy of w2.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
David Hildenbrand Oct. 23, 2020, 1:09 p.m. UTC | #32
On 23.10.20 14:46, David Laight wrote:
> From: Greg KH <gregkh@linuxfoundation.org>
>> Sent: 22 October 2020 14:51
> 
> I've rammed the code into godbolt.
> 
> https://godbolt.org/z/9v5PPW
> 
> Definitely a clang bug.
> 
> Search for [wx]24 in the clang output.
> nr_segs comes in as w2 and the initial bound checks are done on w2.
> w24 is loaded from w2 - I don't believe this changes the high bits.
> There are no references to w24, just x24.
> So the kmalloc_array() is passed 'huge' and will fail.
> The iov_iter_init also gets the 64bit value.
> 
> Note that the gcc code has a sign-extend copy of w2.

Do we have a result from using "unsigned long" in the base function and
explicitly masking of the high bits? That should definitely work.

Now, I am not a compiler expert, but as I already cited, at least on
x86-64 clang expects that the high bits were cleared by the caller - in
contrast to gcc. I suspect it's the same on arm64, but again, I am no
compiler expert.

If what I said and cites for x86-64 is correct, if the function expects
an "unsigned int", it will happily use 64bit operations without further
checks where valid when assuming high bits are zero. That's why even
converting everything to "unsigned int" as proposed by me won't work on
clang - it assumes high bits are zero (as indicated by Nick).

As I am neither a compiler experts (did I mention that already? ;) ) nor
an arm64 experts, I can't tell if this is a compiler BUG or not.

Main issue seems to be garbage in high bits as originally suggested by me.
David Laight Oct. 23, 2020, 1:28 p.m. UTC | #33
From: Arnd Bergmann

> Sent: 23 October 2020 14:23

> 

> On Fri, Oct 23, 2020 at 2:46 PM David Laight <David.Laight@aculab.com> wrote:

> >

> > From: Greg KH <gregkh@linuxfoundation.org>

> > > Sent: 22 October 2020 14:51

> >

> > I've rammed the code into godbolt.

> >

> > https://godbolt.org/z/9v5PPW

> >

> > Definitely a clang bug.

> >

> > Search for [wx]24 in the clang output.

> > nr_segs comes in as w2 and the initial bound checks are done on w2.

> > w24 is loaded from w2 - I don't believe this changes the high bits.

> 

> You believe wrong, "mov w24, w2" is a zero-extending operation.


Ah ok, but gcc uses utxw for the same task.
I guess they could be the same opcode.

Last time I wrote ARM thumb didn't really exist - never mind 64bit

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
David Laight Oct. 23, 2020, 2:39 p.m. UTC | #34
From: David Hildenbrand

> Sent: 23 October 2020 15:33

...
> I just checked against upstream code generated by clang 10 and it

> properly discards the upper 32bit via a mov w23 w2.

> 

> So at least clang 10 indeed properly assumes we could have garbage and

> masks it off.

> 

> Maybe the issue is somewhere else, unrelated to nr_pages ... or clang 11

> behaves differently.


We'll need the disassembly from a failing kernel image.
It isn't that big to hand annotate.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Segher Boessenkool Oct. 23, 2020, 6:27 p.m. UTC | #35
On Fri, Oct 23, 2020 at 06:58:57PM +0100, Al Viro wrote:
> On Fri, Oct 23, 2020 at 03:09:30PM +0200, David Hildenbrand wrote:
> 
> > Now, I am not a compiler expert, but as I already cited, at least on
> > x86-64 clang expects that the high bits were cleared by the caller - in
> > contrast to gcc. I suspect it's the same on arm64, but again, I am no
> > compiler expert.
> > 
> > If what I said and cites for x86-64 is correct, if the function expects
> > an "unsigned int", it will happily use 64bit operations without further
> > checks where valid when assuming high bits are zero. That's why even
> > converting everything to "unsigned int" as proposed by me won't work on
> > clang - it assumes high bits are zero (as indicated by Nick).
> > 
> > As I am neither a compiler experts (did I mention that already? ;) ) nor
> > an arm64 experts, I can't tell if this is a compiler BUG or not.
> 
> On arm64 when callee expects a 32bit argument, the caller is *not* responsible
> for clearing the upper half of 64bit register used to pass the value - it only
> needs to store the actual value into the lower half.  The callee must consider
> the contents of the upper half of that register as undefined.  See AAPCS64 (e.g.
> https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst#parameter-passing-rules
> ); AFAICS, the relevant bit is
> 	"Unlike in the 32-bit AAPCS, named integral values must be narrowed by
> the callee rather than the caller."

Or the formal rule:

C.9 	If the argument is an Integral or Pointer Type, the size of the
	argument is less than or equal to 8 bytes and the NGRN is less
	than 8, the argument is copied to the least significant bits in
	x[NGRN]. The NGRN is incremented by one. The argument has now
	been allocated.


Segher
David Laight Oct. 24, 2020, 9:12 p.m. UTC | #36
From: Segher Boessenkool

> Sent: 24 October 2020 18:29

> 

> On Fri, Oct 23, 2020 at 09:28:59PM +0000, David Laight wrote:

> > From: Segher Boessenkool

> > > Sent: 23 October 2020 19:27

> > > On Fri, Oct 23, 2020 at 06:58:57PM +0100, Al Viro wrote:

> > > > On Fri, Oct 23, 2020 at 03:09:30PM +0200, David Hildenbrand wrote:

> > > > On arm64 when callee expects a 32bit argument, the caller is *not* responsible

> > > > for clearing the upper half of 64bit register used to pass the value - it only

> > > > needs to store the actual value into the lower half.  The callee must consider

> > > > the contents of the upper half of that register as undefined.  See AAPCS64 (e.g.

> > > > https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst#parameter-passing-rules

> > > > ); AFAICS, the relevant bit is

> > > > 	"Unlike in the 32-bit AAPCS, named integral values must be narrowed by

> > > > the callee rather than the caller."

> > >

> > > Or the formal rule:

> > >

> > > C.9 	If the argument is an Integral or Pointer Type, the size of the

> > > 	argument is less than or equal to 8 bytes and the NGRN is less

> > > 	than 8, the argument is copied to the least significant bits in

> > > 	x[NGRN]. The NGRN is incremented by one. The argument has now

> > > 	been allocated.

> >

> > So, in essence, if the value is in a 64bit register the calling

> > code is independent of the actual type of the formal parameter.

> > Clearly a value might need explicit widening.

> 

> No, this says that if you pass a 32-bit integer in a 64-bit register,

> then the top 32 bits of that register hold an undefined value.


That's sort of what I meant.
The 'normal' junk in the hight bits will there because the variable
in the calling code is wider.

> > I've found a copy of the 64 bit arm instruction set.

> > Unfortunately it is alpha sorted and repetitive so shows none

> > of the symmetry and makes things difficult to find.

> 

> All of this is ABI, not ISA.  Look at the AAPCS64 pointed to above.

> 

> > But, contrary to what someone suggested most register writes

> > (eg from arithmetic) seem to zero/extend the high bits.

> 

> Everything that writes a "w" does, yes.  But that has nothing to do with

> the parameter passing rules, that is ABI.  It just means that very often

> a 32-bit integer will be passed zero-extended in a 64-bit register, but

> that is just luck (or not, it makes finding bugs harder ;-) )


Working out why the code is wrong is more of an ISA issue than an ABI one.
It may be an ABI one, but the analysis is ISA.

I've written a lot of asm over the years - decoding compiler generated
asm isn't that hard.
At least ARM doesn't have annulled delay slots.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
David Laight Nov. 2, 2020, 9:06 a.m. UTC | #37
From: 'Greg KH'
> Sent: 23 October 2020 15:47
> 
> On Fri, Oct 23, 2020 at 02:39:24PM +0000, David Laight wrote:
> > From: David Hildenbrand
> > > Sent: 23 October 2020 15:33
> > ...
> > > I just checked against upstream code generated by clang 10 and it
> > > properly discards the upper 32bit via a mov w23 w2.
> > >
> > > So at least clang 10 indeed properly assumes we could have garbage and
> > > masks it off.
> > >
> > > Maybe the issue is somewhere else, unrelated to nr_pages ... or clang 11
> > > behaves differently.
> >
> > We'll need the disassembly from a failing kernel image.
> > It isn't that big to hand annotate.
> 
> I've worked around the merge at the moment in the android tree, but it
> is still quite reproducable, and will try to get a .o file to
> disassemble on Monday or so...

Did this get properly resolved?

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
diff mbox series

Patch

diff --git a/fs/read_write.c b/fs/read_write.c
index 5db58b8c78d0dd..e5e891a88442ef 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -752,185 +752,6 @@  static ssize_t do_loop_readv_writev(struct file *filp, struct iov_iter *iter,
 	return ret;
 }
 
-/**
- * rw_copy_check_uvector() - Copy an array of &struct iovec from userspace
- *     into the kernel and check that it is valid.
- *
- * @type: One of %CHECK_IOVEC_ONLY, %READ, or %WRITE.
- * @uvector: Pointer to the userspace array.
- * @nr_segs: Number of elements in userspace array.
- * @fast_segs: Number of elements in @fast_pointer.
- * @fast_pointer: Pointer to (usually small on-stack) kernel array.
- * @ret_pointer: (output parameter) Pointer to a variable that will point to
- *     either @fast_pointer, a newly allocated kernel array, or NULL,
- *     depending on which array was used.
- *
- * This function copies an array of &struct iovec of @nr_segs from
- * userspace into the kernel and checks that each element is valid (e.g.
- * it does not point to a kernel address or cause overflow by being too
- * large, etc.).
- *
- * As an optimization, the caller may provide a pointer to a small
- * on-stack array in @fast_pointer, typically %UIO_FASTIOV elements long
- * (the size of this array, or 0 if unused, should be given in @fast_segs).
- *
- * @ret_pointer will always point to the array that was used, so the
- * caller must take care not to call kfree() on it e.g. in case the
- * @fast_pointer array was used and it was allocated on the stack.
- *
- * Return: The total number of bytes covered by the iovec array on success
- *   or a negative error code on error.
- */
-ssize_t rw_copy_check_uvector(int type, const struct iovec __user * uvector,
-			      unsigned long nr_segs, unsigned long fast_segs,
-			      struct iovec *fast_pointer,
-			      struct iovec **ret_pointer)
-{
-	unsigned long seg;
-	ssize_t ret;
-	struct iovec *iov = fast_pointer;
-
-	/*
-	 * SuS says "The readv() function *may* fail if the iovcnt argument
-	 * was less than or equal to 0, or greater than {IOV_MAX}.  Linux has
-	 * traditionally returned zero for zero segments, so...
-	 */
-	if (nr_segs == 0) {
-		ret = 0;
-		goto out;
-	}
-
-	/*
-	 * First get the "struct iovec" from user memory and
-	 * verify all the pointers
-	 */
-	if (nr_segs > UIO_MAXIOV) {
-		ret = -EINVAL;
-		goto out;
-	}
-	if (nr_segs > fast_segs) {
-		iov = kmalloc_array(nr_segs, sizeof(struct iovec), GFP_KERNEL);
-		if (iov == NULL) {
-			ret = -ENOMEM;
-			goto out;
-		}
-	}
-	if (copy_from_user(iov, uvector, nr_segs*sizeof(*uvector))) {
-		ret = -EFAULT;
-		goto out;
-	}
-
-	/*
-	 * According to the Single Unix Specification we should return EINVAL
-	 * if an element length is < 0 when cast to ssize_t or if the
-	 * total length would overflow the ssize_t return value of the
-	 * system call.
-	 *
-	 * Linux caps all read/write calls to MAX_RW_COUNT, and avoids the
-	 * overflow case.
-	 */
-	ret = 0;
-	for (seg = 0; seg < nr_segs; seg++) {
-		void __user *buf = iov[seg].iov_base;
-		ssize_t len = (ssize_t)iov[seg].iov_len;
-
-		/* see if we we're about to use an invalid len or if
-		 * it's about to overflow ssize_t */
-		if (len < 0) {
-			ret = -EINVAL;
-			goto out;
-		}
-		if (type >= 0
-		    && unlikely(!access_ok(buf, len))) {
-			ret = -EFAULT;
-			goto out;
-		}
-		if (len > MAX_RW_COUNT - ret) {
-			len = MAX_RW_COUNT - ret;
-			iov[seg].iov_len = len;
-		}
-		ret += len;
-	}
-out:
-	*ret_pointer = iov;
-	return ret;
-}
-
-#ifdef CONFIG_COMPAT
-ssize_t compat_rw_copy_check_uvector(int type,
-		const struct compat_iovec __user *uvector, unsigned long nr_segs,
-		unsigned long fast_segs, struct iovec *fast_pointer,
-		struct iovec **ret_pointer)
-{
-	compat_ssize_t tot_len;
-	struct iovec *iov = *ret_pointer = fast_pointer;
-	ssize_t ret = 0;
-	int seg;
-
-	/*
-	 * SuS says "The readv() function *may* fail if the iovcnt argument
-	 * was less than or equal to 0, or greater than {IOV_MAX}.  Linux has
-	 * traditionally returned zero for zero segments, so...
-	 */
-	if (nr_segs == 0)
-		goto out;
-
-	ret = -EINVAL;
-	if (nr_segs > UIO_MAXIOV)
-		goto out;
-	if (nr_segs > fast_segs) {
-		ret = -ENOMEM;
-		iov = kmalloc_array(nr_segs, sizeof(struct iovec), GFP_KERNEL);
-		if (iov == NULL)
-			goto out;
-	}
-	*ret_pointer = iov;
-
-	ret = -EFAULT;
-	if (!access_ok(uvector, nr_segs*sizeof(*uvector)))
-		goto out;
-
-	/*
-	 * Single unix specification:
-	 * We should -EINVAL if an element length is not >= 0 and fitting an
-	 * ssize_t.
-	 *
-	 * In Linux, the total length is limited to MAX_RW_COUNT, there is
-	 * no overflow possibility.
-	 */
-	tot_len = 0;
-	ret = -EINVAL;
-	for (seg = 0; seg < nr_segs; seg++) {
-		compat_uptr_t buf;
-		compat_ssize_t len;
-
-		if (__get_user(len, &uvector->iov_len) ||
-		   __get_user(buf, &uvector->iov_base)) {
-			ret = -EFAULT;
-			goto out;
-		}
-		if (len < 0)	/* size_t not fitting in compat_ssize_t .. */
-			goto out;
-		if (type >= 0 &&
-		    !access_ok(compat_ptr(buf), len)) {
-			ret = -EFAULT;
-			goto out;
-		}
-		if (len > MAX_RW_COUNT - tot_len)
-			len = MAX_RW_COUNT - tot_len;
-		tot_len += len;
-		iov->iov_base = compat_ptr(buf);
-		iov->iov_len = (compat_size_t) len;
-		uvector++;
-		iov++;
-	}
-	ret = tot_len;
-
-out:
-	return ret;
-}
-#endif
-
 static ssize_t do_iter_read(struct file *file, struct iov_iter *iter,
 		loff_t *pos, rwf_t flags)
 {
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 5e40786c8f1232..ccea9db3f72be8 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1650,6 +1650,109 @@  const void *dup_iter(struct iov_iter *new, struct iov_iter *old, gfp_t flags)
 }
 EXPORT_SYMBOL(dup_iter);
 
+/**
+ * rw_copy_check_uvector() - Copy an array of &struct iovec from userspace
+ *     into the kernel and check that it is valid.
+ *
+ * @type: One of %CHECK_IOVEC_ONLY, %READ, or %WRITE.
+ * @uvector: Pointer to the userspace array.
+ * @nr_segs: Number of elements in userspace array.
+ * @fast_segs: Number of elements in @fast_pointer.
+ * @fast_pointer: Pointer to (usually small on-stack) kernel array.
+ * @ret_pointer: (output parameter) Pointer to a variable that will point to
+ *     either @fast_pointer, a newly allocated kernel array, or NULL,
+ *     depending on which array was used.
+ *
+ * This function copies an array of &struct iovec of @nr_segs from
+ * userspace into the kernel and checks that each element is valid (e.g.
+ * it does not point to a kernel address or cause overflow by being too
+ * large, etc.).
+ *
+ * As an optimization, the caller may provide a pointer to a small
+ * on-stack array in @fast_pointer, typically %UIO_FASTIOV elements long
+ * (the size of this array, or 0 if unused, should be given in @fast_segs).
+ *
+ * @ret_pointer will always point to the array that was used, so the
+ * caller must take care not to call kfree() on it e.g. in case the
+ * @fast_pointer array was used and it was allocated on the stack.
+ *
+ * Return: The total number of bytes covered by the iovec array on success
+ *   or a negative error code on error.
+ */
+ssize_t rw_copy_check_uvector(int type, const struct iovec __user *uvector,
+		unsigned long nr_segs, unsigned long fast_segs,
+		struct iovec *fast_pointer, struct iovec **ret_pointer)
+{
+	unsigned long seg;
+	ssize_t ret;
+	struct iovec *iov = fast_pointer;
+
+	/*
+	 * SuS says "The readv() function *may* fail if the iovcnt argument
+	 * was less than or equal to 0, or greater than {IOV_MAX}.  Linux has
+	 * traditionally returned zero for zero segments, so...
+	 */
+	if (nr_segs == 0) {
+		ret = 0;
+		goto out;
+	}
+
+	/*
+	 * First get the "struct iovec" from user memory and
+	 * verify all the pointers
+	 */
+	if (nr_segs > UIO_MAXIOV) {
+		ret = -EINVAL;
+		goto out;
+	}
+	if (nr_segs > fast_segs) {
+		iov = kmalloc_array(nr_segs, sizeof(struct iovec), GFP_KERNEL);
+		if (iov == NULL) {
+			ret = -ENOMEM;
+			goto out;
+		}
+	}
+	if (copy_from_user(iov, uvector, nr_segs*sizeof(*uvector))) {
+		ret = -EFAULT;
+		goto out;
+	}
+
+	/*
+	 * According to the Single Unix Specification we should return EINVAL
+	 * if an element length is < 0 when cast to ssize_t or if the
+	 * total length would overflow the ssize_t return value of the
+	 * system call.
+	 *
+	 * Linux caps all read/write calls to MAX_RW_COUNT, and avoids the
+	 * overflow case.
+	 */
+	ret = 0;
+	for (seg = 0; seg < nr_segs; seg++) {
+		void __user *buf = iov[seg].iov_base;
+		ssize_t len = (ssize_t)iov[seg].iov_len;
+
+		/* see if we we're about to use an invalid len or if
+		 * it's about to overflow ssize_t */
+		if (len < 0) {
+			ret = -EINVAL;
+			goto out;
+		}
+		if (type >= 0
+		    && unlikely(!access_ok(buf, len))) {
+			ret = -EFAULT;
+			goto out;
+		}
+		if (len > MAX_RW_COUNT - ret) {
+			len = MAX_RW_COUNT - ret;
+			iov[seg].iov_len = len;
+		}
+		ret += len;
+	}
+out:
+	*ret_pointer = iov;
+	return ret;
+}
+
 /**
  * import_iovec() - Copy an array of &struct iovec from userspace
  *     into the kernel, check that it is valid, and initialize a new
@@ -1695,6 +1798,79 @@  EXPORT_SYMBOL(import_iovec);
 #ifdef CONFIG_COMPAT
 #include <linux/compat.h>
 
+ssize_t compat_rw_copy_check_uvector(int type,
+		const struct compat_iovec __user *uvector,
+		unsigned long nr_segs, unsigned long fast_segs,
+		struct iovec *fast_pointer, struct iovec **ret_pointer)
+{
+	compat_ssize_t tot_len;
+	struct iovec *iov = *ret_pointer = fast_pointer;
+	ssize_t ret = 0;
+	int seg;
+
+	/*
+	 * SuS says "The readv() function *may* fail if the iovcnt argument
+	 * was less than or equal to 0, or greater than {IOV_MAX}.  Linux has
+	 * traditionally returned zero for zero segments, so...
+	 */
+	if (nr_segs == 0)
+		goto out;
+
+	ret = -EINVAL;
+	if (nr_segs > UIO_MAXIOV)
+		goto out;
+	if (nr_segs > fast_segs) {
+		ret = -ENOMEM;
+		iov = kmalloc_array(nr_segs, sizeof(struct iovec), GFP_KERNEL);
+		if (iov == NULL)
+			goto out;
+	}
+	*ret_pointer = iov;
+
+	ret = -EFAULT;
+	if (!access_ok(uvector, nr_segs*sizeof(*uvector)))
+		goto out;
+
+	/*
+	 * Single unix specification:
+	 * We should -EINVAL if an element length is not >= 0 and fitting an
+	 * ssize_t.
+	 *
+	 * In Linux, the total length is limited to MAX_RW_COUNT, there is
+	 * no overflow possibility.
+	 */
+	tot_len = 0;
+	ret = -EINVAL;
+	for (seg = 0; seg < nr_segs; seg++) {
+		compat_uptr_t buf;
+		compat_ssize_t len;
+
+		if (__get_user(len, &uvector->iov_len) ||
+		   __get_user(buf, &uvector->iov_base)) {
+			ret = -EFAULT;
+			goto out;
+		}
+		if (len < 0)	/* size_t not fitting in compat_ssize_t .. */
+			goto out;
+		if (type >= 0 &&
+		    !access_ok(compat_ptr(buf), len)) {
+			ret = -EFAULT;
+			goto out;
+		}
+		if (len > MAX_RW_COUNT - tot_len)
+			len = MAX_RW_COUNT - tot_len;
+		tot_len += len;
+		iov->iov_base = compat_ptr(buf);
+		iov->iov_len = (compat_size_t) len;
+		uvector++;
+		iov++;
+	}
+	ret = tot_len;
+
+out:
+	return ret;
+}
+
 ssize_t compat_import_iovec(int type,
 		const struct compat_iovec __user * uvector,
 		unsigned nr_segs, unsigned fast_segs,