diff mbox

Fix p{readv,writev}{64} consolidation implementation

Message ID 1465941275-3459-1-git-send-email-adhemerval.zanella@linaro.org
State Superseded
Headers show

Commit Message

Adhemerval Zanella June 14, 2016, 9:54 p.m. UTC
This patch fixes the p{readv,writev}{64} consolidation implementation
from commits 4e77815 and af5fdf5.  Different from pread/pwrite
implementation, preadv/pwritev implementation does not require
__ALIGNMENT_ARG because kernel syscall prototypes define
the high and low part of the off_t, if it is the case, directly
(different from pread/pwrite where the architecture ABI for passing
64-bit values must be in consideration for passsing the arguments).

It also adds some basic tests for preadv/pwritev.

Tested on x86_64, i686, and armhf.

	* misc/Makefile (tests): Add tst-preadvwritev and tst-preadvwritev64.
	* misc/tst-preadvwritev.c: New file.
	* misc/tst-preadvwritev64.c: Likewise.
	* sysdeps/unix/sysv/linux/preadv.c (preadv): Remove SYSCALL_LL{64}
	usage.
	* sysdeps/unix/sysv/linux/preadv64.c (preadv64): Likewise.
	* sysdeps/unix/sysv/linux/pwritev.c (pwritev): Likewise.
	* sysdeps/unix/sysv/linux/pwritev64.c (pwritev64): Likewise.
---
 ChangeLog                           |  11 ++++
 misc/Makefile                       |   3 +-
 misc/tst-preadvwritev.c             | 111 ++++++++++++++++++++++++++++++++++++
 misc/tst-preadvwritev64.c           |  22 +++++++
 sysdeps/unix/sysv/linux/preadv.c    |   5 +-
 sysdeps/unix/sysv/linux/preadv64.c  |   5 +-
 sysdeps/unix/sysv/linux/pwritev.c   |   5 +-
 sysdeps/unix/sysv/linux/pwritev64.c |   4 +-
 8 files changed, 154 insertions(+), 12 deletions(-)
 create mode 100644 misc/tst-preadvwritev.c
 create mode 100644 misc/tst-preadvwritev64.c

-- 
2.7.4

Comments

Adhemerval Zanella June 15, 2016, 1:32 p.m. UTC | #1
On 15/06/2016 09:59, Pedro Alves wrote:
> On 06/14/2016 10:54 PM, Adhemerval Zanella wrote:

>> +

>> +  char buf1[32];

>> +  char buf2[64];

>> +

>> +  memset (buf1, 0x00, sizeof buf1);

>> +  memset (buf1, 0xca, sizeof buf1);

>> +

> 

> Did you mean to memset buf2 in the second memset call?

> 

> Thanks,

> Pedro Alves

> 


Oops yes, I will change that.
Adhemerval Zanella June 15, 2016, 1:45 p.m. UTC | #2
On 15/06/2016 02:37, Mike Frysinger wrote:
> On 14 Jun 2016 18:54, Adhemerval Zanella wrote:

>> This patch fixes the p{readv,writev}{64} consolidation implementation

>> from commits 4e77815 and af5fdf5.  Different from pread/pwrite

>> implementation, preadv/pwritev implementation does not require

>> __ALIGNMENT_ARG because kernel syscall prototypes define

>> the high and low part of the off_t, if it is the case, directly

>> (different from pread/pwrite where the architecture ABI for passing

>> 64-bit values must be in consideration for passsing the arguments).

> 

> i had looked at that specifically but thought it ok because the old code

> was using the alignment arg.  was the old code broken too ?

> 

> this is what the preadv code looked like:

> -ssize_t

> -__libc_preadv (int fd, const struct iovec *vector, int count, off_t offset)

> -{

> -  assert (sizeof (offset) == 4);

> -  return SYSCALL_CANCEL (preadv, fd,

> -                         vector, count, __ALIGNMENT_ARG

> -                         __LONG_LONG_PAIR (offset >> 31, offset));

> -}

> 

> although i guess this isn't too surprising as this code was in the

> generic sysdeps dir which currently doesn't have as many users as

> we wish it did :).


I though it too, but before the consolidation patch only nios2 and tile 32-bits
used the generic preadv.c implementation.  And only tile 32-bits defines 
__ASSUME_ALIGNED_REGISTER_PAIRS (so __ALIGNMENT_ARG will pad the argument with 0).
However since the syscall is defined as in linux source:

fs/read_write.c:

 991 SYSCALL_DEFINE5(preadv, unsigned long, fd, const struct iovec __user *, vec,
 992                 unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h)
 993 {
 994         loff_t pos = pos_from_hilo(pos_h, pos_l);
 995 
 996         return do_preadv(fd, vec, vlen, pos, 0);
 997 }


The idea is not really to align the argument to zero pass, but rather to split
the possible 64-bits argument in high and low as required (as the default 
implementation was doing [2]). On tile, it is working because the preadv.c
offset is 32-bits and thus the high word is indeed zero, but it is passing
one superfluous argument.  Also my understanding is this generic implementation
does work correctly in every architecture because __LONG_LONG_PAIR relies
on endianess. 


[1] sysdeps/unix/sysv/linux/generic/wordsize-32/preadv.c
[2] sysdeps/unix/sysv/linux/preadv.c

> 

>> +#define FAIL() \

>> +  do { printf ("error: failure on line %d\n", __LINE__); return 1; } while (0)

>> ...

>> +  ret = pwritev (temp_fd, iov, 2, 0);

>> +  if (ret == -1)

>> +    FAIL ();

> 

> as a personal stance, never been a big fan of messages that just have a

> line number.  when you get reports/logs, you end having to chase down the

> exact same source tree as the reporter.

> 


Right, I will change to a more descriptive error message.

> why can't we just assert() everywhere ?  we rely on that in tests already

> and we wouldn't have to do any checking otherwise.

>   ret = pwritev (temp_fd, iov, 2, 0);

>   assert (ret == sizeof buf1 + sizeof buf2);

> -mike

> 


As Florian has stated assert writes on stderr :(
Adhemerval Zanella June 15, 2016, 1:48 p.m. UTC | #3
On 15/06/2016 03:50, Florian Weimer wrote:
> On 06/15/2016 07:37 AM, Mike Frysinger wrote:

> 

>> although i guess this isn't too surprising as this code was in the

>> generic sysdeps dir which currently doesn't have as many users as

>> we wish it did :).

> 

> Yeah, I think we have to rely on the new test to catch any outliers.

> 

>>> +#define FAIL() \

>>> +  do { printf ("error: failure on line %d\n", __LINE__); return 1; } while (0)

>>> ...

>>> +  ret = pwritev (temp_fd, iov, 2, 0);

>>> +  if (ret == -1)

>>> +    FAIL ();

>>

>> as a personal stance, never been a big fan of messages that just have a

>> line number.  when you get reports/logs, you end having to chase down the

>> exact same source tree as the reporter.

> 

> At least there is a line number.  Many existing tests do not even have that.

> 

>> why can't we just assert() everywhere ?  we rely on that in tests already

>> and we wouldn't have to do any checking otherwise.

>>   ret = pwritev (temp_fd, iov, 2, 0);

>>   assert (ret == sizeof buf1 + sizeof buf2);

> 

> assert writes to standard error, not standard output, where it will be captured.  I don't know why we don't capture both.

> 

> There seems to be a policy against tests aborting on failure, too, but we are not very consistent about that.  In my experience, people who build and test glibc are more likely to shrug off tests with non-zero exit status than to ignore tests with aborts, although they really should treat them the same.

> 

> I think we should really have some sort of libtest.a, which provides test helpers, error-checking functions such as xmalloc, and general helpers for setting up special test environments.  I'm a bit worried about figuring out the proper dependencies, so that libtest.a is built before all the tests are linked.

> 

> Florian

> 


libtest.a is indeed good idea and there is lot of tests that reimplement the same
thing over and over.
Adhemerval Zanella June 15, 2016, 8:17 p.m. UTC | #4
On 15/06/2016 15:21, Chris Metcalf wrote:
> On 6/15/2016 9:45 AM, Adhemerval Zanella wrote:

>> On 15/06/2016 02:37, Mike Frysinger wrote:

>>> On 14 Jun 2016 18:54, Adhemerval Zanella wrote:

>>>> This patch fixes the p{readv,writev}{64} consolidation implementation

>>>> from commits 4e77815 and af5fdf5.  Different from pread/pwrite

>>>> implementation, preadv/pwritev implementation does not require

>>>> __ALIGNMENT_ARG because kernel syscall prototypes define

>>>> the high and low part of the off_t, if it is the case, directly

>>>> (different from pread/pwrite where the architecture ABI for passing

>>>> 64-bit values must be in consideration for passsing the arguments).

>>> i had looked at that specifically but thought it ok because the old code

>>> was using the alignment arg.  was the old code broken too ?

>>>

>>> this is what the preadv code looked like:

>>> -ssize_t

>>> -__libc_preadv (int fd, const struct iovec *vector, int count, off_t offset)

>>> -{

>>> -  assert (sizeof (offset) == 4);

>>> -  return SYSCALL_CANCEL (preadv, fd,

>>> -                         vector, count, __ALIGNMENT_ARG

>>> -                         __LONG_LONG_PAIR (offset >> 31, offset));

>>> -}

>>>

>>> although i guess this isn't too surprising as this code was in the

>>> generic sysdeps dir which currently doesn't have as many users as

>>> we wish it did :).

>> I though it too, but before the consolidation patch only nios2 and tile 32-bits

>> used the generic preadv.c implementation.  And only tile 32-bits defines

>> __ASSUME_ALIGNED_REGISTER_PAIRS (so __ALIGNMENT_ARG will pad the argument with 0).

>> However since the syscall is defined as in linux source:

>>

>> fs/read_write.c:

>>

>>   991 SYSCALL_DEFINE5(preadv, unsigned long, fd, const struct iovec __user *, vec,

>>   992                 unsigned long, vlen, unsigned long, pos_l, unsigned long, pos_h)

>>   993 {

>>   994         loff_t pos = pos_from_hilo(pos_h, pos_l);

>>   995

>>   996         return do_preadv(fd, vec, vlen, pos, 0);

>>   997 }

> 

> Actually, the relevant code at fs/read_write.c:1162 is:

> 

> COMPAT_SYSCALL_DEFINE5(preadv, compat_ulong_t, fd,

>         const struct compat_iovec __user *,vec,

>         compat_ulong_t, vlen, u32, pos_low, u32, pos_high)

> {

>     loff_t pos = ((loff_t)pos_high << 32) | pos_low;

> 

>     return do_compat_preadv64(fd, vec, vlen, pos, 0);

> }

> 

> but it amounts to the same thing as far the arguments are concerned.


This is for 64-bits kernel that provides the 32-bit compat syscalls.
For 32-bits kernels only it the code I referenced above.

> 

>> The idea is not really to align the argument to zero pass, but rather to split

>> the possible 64-bits argument in high and low as required (as the default

>> implementation was doing [2]). On tile, it is working because the preadv.c

>> offset is 32-bits and thus the high word is indeed zero, but it is passing

>> one superfluous argument.

> 

> No, what happens is that instead of passing r3=pos_low=lo, r4=pos_high=0, we

> are passing r3=pos_low=0, r4=pos_high=lo, r5=0.  This means that we get a crazy

> high offset and either fail or (for a sufficiently large file) get the wrong data.

> Filed as bug 20261.


I was referring to *old* behaviour (pre-consolidation) implementation 
(the sysdeps/unix/sysv/linux/generic/wordsize-32/preadv.c), which did:

--
  return SYSCALL_CANCEL (preadv, fd,
                         vector, count, __ALIGNMENT_ARG
                         __LONG_LONG_PAIR (offset >> 31, offset));
--

It has 2 issue:

 1. It passed one superfluous argument to preadv. On tilepro build
    I noted that is using internal_syscall6, which means it is passing
    { fd, vector, cound, 0, offset, offset>>31 }.  It is not wrong,
    since for this code off_t will be the old 32-bit value, but the
    semantic is wrong.

 2. __LONG_LONG_PAIR is not correct for big-endian.


Now for BZ#20261 I do not think it applicable since this is a fix for
consolidation done in development phase, it does not appear in any
released version.

Now, your analysis is correct for *current* code and it is contains the
__LONG_LONG_PAIR issue due the SYSCALL_LL usage.  I will change it
and post a second version.

> 

> It appears to be true for both preadv and pwritev, though I only tested preadv.

> 

>> Also my understanding is this generic implementation

>> does work correctly in every architecture because __LONG_LONG_PAIR relies

>> on endianess.

> 

> In fact that's another bug; __LONG_LONG_PAIR is intended only to

> be used if you are simulating passing a 64-bit value in 32-bit registers,

> where the ABI naturally splits the 64 bits into a high and low part

> according to endianness.

> 

> In this example we are calling into the kernel where it expects a pos_low

> and a pos_high in a particular order.  On a big-endian machine, the

> __LONG_LONG_PAIR will put the high part first and the kernel will

> see the wrong offset.

>
Adhemerval Zanella June 16, 2016, 3:49 p.m. UTC | #5
On 16/06/2016 12:25, Chris Metcalf wrote:
> On 6/15/2016 4:17 PM, Adhemerval Zanella wrote:

>> On 15/06/2016 15:21, Chris Metcalf wrote:

>>> On 6/15/2016 9:45 AM, Adhemerval Zanella wrote:

>>>> On 15/06/2016 02:37, Mike Frysinger wrote:

>>>>> On 14 Jun 2016 18:54, Adhemerval Zanella wrote:

>>>>>> This patch fixes the p{readv,writev}{64} consolidation implementation

>>>>>> from commits 4e77815 and af5fdf5.  Different from pread/pwrite

>>>>>> implementation, preadv/pwritev implementation does not require

>>>>>> __ALIGNMENT_ARG because kernel syscall prototypes define

>>>>>> the high and low part of the off_t, if it is the case, directly

>>>>>> (different from pread/pwrite where the architecture ABI for passing

>>>>>> 64-bit values must be in consideration for passsing the arguments).

>>>>> i had looked at that specifically but thought it ok because the old code

>>>>> was using the alignment arg.  was the old code broken too ?

>>>>>

>>>>> this is what the preadv code looked like:

>>>>> -ssize_t

>>>>> -__libc_preadv (int fd, const struct iovec *vector, int count, off_t offset)

>>>>> -{

>>>>> -  assert (sizeof (offset) == 4);

>>>>> -  return SYSCALL_CANCEL (preadv, fd,

>>>>> -                         vector, count, __ALIGNMENT_ARG

>>>>> -                         __LONG_LONG_PAIR (offset >> 31, offset));

>>>>> -}

>>>>>

>>>>> although i guess this isn't too surprising as this code was in the

>>>>> generic sysdeps dir which currently doesn't have as many users as

>>>>> we wish it did :).

>>>>>

>>>> The idea is not really to align the argument to zero pass, but rather to split

>>>> the possible 64-bits argument in high and low as required (as the default

>>>> implementation was doing [2]). On tile, it is working because the preadv.c

>>>> offset is 32-bits and thus the high word is indeed zero, but it is passing

>>>> one superfluous argument.

>>> No, what happens is that instead of passing r3=pos_low=lo, r4=pos_high=0, we

>>> are passing r3=pos_low=0, r4=pos_high=lo, r5=0.  This means that we get a crazy

>>> high offset and either fail or (for a sufficiently large file) get the wrong data.

>>> Filed as bug 20261.

>> I was referring to *old* behaviour (pre-consolidation) implementation

>> (the sysdeps/unix/sysv/linux/generic/wordsize-32/preadv.c), which did:

>>

>> -- 

>>    return SYSCALL_CANCEL (preadv, fd,

>>                           vector, count, __ALIGNMENT_ARG

>>                           __LONG_LONG_PAIR (offset >> 31, offset));

>> -- 

>>

>> It has 2 issue:

>>

>>   1. It passed one superfluous argument to preadv. On tilepro build

>>      I noted that is using internal_syscall6, which means it is passing

>>      { fd, vector, cound, 0, offset, offset>>31 }.  It is not wrong,

>>      since for this code off_t will be the old 32-bit value, but the

>>      semantic is wrong.

> 

> No, it's definitely wrong :-)

> 

> We pass "offset" as the high part, and "0" as the low part. Accordingly, what

> the kernel sees is a request for "offset << 32" rather than "offset".  The bug

> report I filed contains a repro that does fail on tilegx32.


I am not following because I am referring to the old code that was *suppose*
to work on tile (either on 32-bits kernel or 64-bits kernel with compat
enabled). I rechecked the compilation against GLIBC 2.23 (which does not 
contain this consolidation) and I am seeing tilepro passing:

"R00" (fd), "R01" (vector), "R02" (count), "R03" (0), "R04" (offset), "R05" (offset >> 31)

Which indeed seems wrong with the issue you pointed out.  So how was this
suppose to work on previous GLIBC?

The new consolidation implementation indeed contains a bug and that's what
my v2 [1] is trying to fix.  With the v2 I am not seeing:

"R00" (fd), "R01" (vector), "R02" (count), "R03" ((long) (offset)), "R04" ((long) (((uint64_t) (offset)) >> 32))

[1] https://sourceware.org/ml/libc-alpha/2016-06/msg00607.html

> 

>>   2. __LONG_LONG_PAIR is not correct for big-endian.

> 

> Yes, this needs to be fixed on the kernel side for consistency. Many other

> syscalls pass their arguments with __LONG_LONG_PAIR.  Handling it this way

> in the kernel seems pretty standard at this point, e.g. regs_to_64 in

> arch/arm64/include/asm/assembler.h, or merge_64 in

> arch/mips/kernel/linux32.c.  Even the fixed-endian platforms do it

> consistently, e.g. "hi, lo" on parisc/s390/sparc and "lo, hi" on metag/x86.

> 

> It's possible little-endian compat powerpc shares this bug since it seems to

> take fixed "hi, lo" arguments, e.g. in compat_sys_readahead().  I'll

> forward this email on to the powerpc kernel maintainers to make sure

> they have a chance to think about it.

> 

> See https://lkml.kernel.org/g/1466019219-10462-1-git-send-email-cmetcalf@mellanox.com


I *really* do not think this is correct approach mainly because it break an
already defined kernel ABI and old BE preadv call on GLIBC was not done
by using __LONG_LONG_PAIR.  Old GLIBC will just fail in this new kernel
interface.  Unfortunately this is something we will need to carry on IMHO.

That's why I just removed the wrong assumption in old
sysdeps/unix/sysv/linux/generic/wordsize-32/preadv{64} about __LONG_LONG_PAIR.
 

>> Now for BZ#20261 I do not think it applicable since this is a fix for

>> consolidation done in development phase, it does not appear in any

>> released version.

> 

> It breaks on all versions, as far as I can tell.  I tested it on our RHEL 6 platform

> (based on backported glibc 2.12) and it fails there.


What exactly breaks in all version? What I am referring is that not the 
bug is 'invalid', but rather it does not apply to a *user visible* one
because the consolidation code is only in 2.24 master branch (it is not
in any *released* version).

Now, if it fails on older releases it is not due this change, but rather
due old sysdeps/unix/sysv/linux/generic/wordsize-32/preadv{64}.  Now, if
it is the case it is indeed a user visible bug.

> 

>> Now, your analysis is correct for *current* code and it is contains the

>> __LONG_LONG_PAIR issue due the SYSCALL_LL usage.  I will change it

>> and post a second version.

> 

> I think you should revert that part of the change and stick with __LONG_LONG_PAIR.

>
Adhemerval Zanella June 21, 2016, 1:17 p.m. UTC | #6
On 20/06/2016 16:26, Chris Metcalf wrote:
> (+Chung-Lin Tang for possible nios2 bug.)

> 

> On 6/16/2016 11:49 AM, Adhemerval Zanella wrote:

>> On 16/06/2016 12:25, Chris Metcalf wrote:

>>> On 6/15/2016 4:17 PM, Adhemerval Zanella wrote:

>>>> On 15/06/2016 15:21, Chris Metcalf wrote:

>>>>> On 6/15/2016 9:45 AM, Adhemerval Zanella wrote:

>>>>>> On 15/06/2016 02:37, Mike Frysinger wrote:

>>>>>>> On 14 Jun 2016 18:54, Adhemerval Zanella wrote:

>>>>>>>> This patch fixes the p{readv,writev}{64} consolidation implementation

>>>>>>>> from commits 4e77815 and af5fdf5.  Different from pread/pwrite

>>>>>>>> implementation, preadv/pwritev implementation does not require

>>>>>>>> __ALIGNMENT_ARG because kernel syscall prototypes define

>>>>>>>> the high and low part of the off_t, if it is the case, directly

>>>>>>>> (different from pread/pwrite where the architecture ABI for passing

>>>>>>>> 64-bit values must be in consideration for passsing the arguments).

>>>>>>> i had looked at that specifically but thought it ok because the old code

>>>>>>> was using the alignment arg.  was the old code broken too ?

>>>>>>>

>>>>>>> this is what the preadv code looked like:

>>>>>>> -ssize_t

>>>>>>> -__libc_preadv (int fd, const struct iovec *vector, int count, off_t offset)

>>>>>>> -{

>>>>>>> -  assert (sizeof (offset) == 4);

>>>>>>> -  return SYSCALL_CANCEL (preadv, fd,

>>>>>>> -                         vector, count, __ALIGNMENT_ARG

>>>>>>> -                         __LONG_LONG_PAIR (offset >> 31, offset));

>>>>>>> -}

>>>>>>>

>>>>>>> although i guess this isn't too surprising as this code was in the

>>>>>>> generic sysdeps dir which currently doesn't have as many users as

>>>>>>> we wish it did :).

>>>>>>>

>>>>>> The idea is not really to align the argument to zero pass, but rather to split

>>>>>> the possible 64-bits argument in high and low as required (as the default

>>>>>> implementation was doing [2]). On tile, it is working because the preadv.c

>>>>>> offset is 32-bits and thus the high word is indeed zero, but it is passing

>>>>>> one superfluous argument.

>>>>> No, what happens is that instead of passing r3=pos_low=lo, r4=pos_high=0, we

>>>>> are passing r3=pos_low=0, r4=pos_high=lo, r5=0.  This means that we get a crazy

>>>>> high offset and either fail or (for a sufficiently large file) get the wrong data.

>>>>> Filed as bug 20261.

>>>> I was referring to *old* behaviour (pre-consolidation) implementation

>>>> (the sysdeps/unix/sysv/linux/generic/wordsize-32/preadv.c), which did:

>>>>

>>>> -- 

>>>>     return SYSCALL_CANCEL (preadv, fd,

>>>>                            vector, count, __ALIGNMENT_ARG

>>>>                            __LONG_LONG_PAIR (offset >> 31, offset));

>>>> -- 

>>>>

>>>> It has 2 issue:

>>>>

>>>>    1. It passed one superfluous argument to preadv. On tilepro build

>>>>       I noted that is using internal_syscall6, which means it is passing

>>>>       { fd, vector, cound, 0, offset, offset>>31 }.  It is not wrong,

>>>>       since for this code off_t will be the old 32-bit value, but the

>>>>       semantic is wrong.

>>> No, it's definitely wrong :-)

>>>

>>> We pass "offset" as the high part, and "0" as the low part. Accordingly, what

>>> the kernel sees is a request for "offset << 32" rather than "offset".  The bug

>>> report I filed contains a repro that does fail on tilegx32.

>> I am not following because I am referring to the old code that was *suppose*

>> to work on tile (either on 32-bits kernel or 64-bits kernel with compat

>> enabled). I rechecked the compilation against GLIBC 2.23 (which does not

>> contain this consolidation) and I am seeing tilepro passing:

>>

>> "R00" (fd), "R01" (vector), "R02" (count), "R03" (0), "R04" (offset), "R05" (offset >> 31)

> 

> This is wrong, and this was indeed a bug prior to your consolidation patch.

> This is a generic/wordsize-32 specific bug which apparently was not caught

> during testing of tilepro or tilegx32.  In addition to the erroneous __ALIGNMENT_ARG,

> we should be using LO_HI_LONG() here rather than __LONG_LONG_PAIR.

> I suspect this is also a bug in the nios2 glibc.


Ah right then.

> 

>> Which indeed seems wrong with the issue you pointed out.  So how was this

>> suppose to work on previous GLIBC?

> 

> It doesn't.  That's why I filed the bug.


Ack.

> 

>> The new consolidation implementation indeed contains a bug and that's what

>> my v2 [1] is trying to fix.  With the v2 I am not seeing:

>>

>> "R00" (fd), "R01" (vector), "R02" (count), "R03" ((long) (offset)), "R04" ((long) (((uint64_t) (offset)) >> 32))

>>

>> [1]https://sourceware.org/ml/libc-alpha/2016-06/msg00607.html

> 

> When you say "not seeing", I'm assuming you mean "now seeing".

> 

> In any case, "vector, count, lo, hi" is what we should see here.


Yes, s/not/now.  Right, that is my understanding as well.

> 

>>>>    2. __LONG_LONG_PAIR is not correct for big-endian.

>>> Yes, this needs to be fixed on the kernel side for consistency. Many other

>>> syscalls pass their arguments with __LONG_LONG_PAIR.  Handling it this way

>>> in the kernel seems pretty standard at this point, e.g. regs_to_64 in

>>> arch/arm64/include/asm/assembler.h, or merge_64 in

>>> arch/mips/kernel/linux32.c.  Even the fixed-endian platforms do it

>>> consistently, e.g. "hi, lo" on parisc/s390/sparc and "lo, hi" on metag/x86.

>>>

>>> It's possible little-endian compat powerpc shares this bug since it seems to

>>> take fixed "hi, lo" arguments, e.g. in compat_sys_readahead().  I'll

>>> forward this email on to the powerpc kernel maintainers to make sure

>>> they have a chance to think about it.

>>>

>>> Seehttps://lkml.kernel.org/g/1466019219-10462-1-git-send-email-cmetcalf@mellanox.com

>> I *really* do not think this is correct approach mainly because it break an

>> already defined kernel ABI and old BE preadv call on GLIBC was not done

>> by using __LONG_LONG_PAIR.  Old GLIBC will just fail in this new kernel

>> interface.  Unfortunately this is something we will need to carry on IMHO.

>>

>> That's why I just removed the wrong assumption in old

>> sysdeps/unix/sysv/linux/generic/wordsize-32/preadv{64} about __LONG_LONG_PAIR.

> 

> Yes, there are two issues here that I am conflating - sorry about that.

> 

> The preadv/pwritev stuff should be using LO_HI_LONG.  I think your

> v2 patch is correct.  No kernel change is required here.


Nice, I was with the wrong assumption you had concerns regarding preadv/writev
patch.

> 

> When I went and reviewed this stuff for tile, it led me to the tile kernel ABI

> code that was not properly handling __LONG_LONG_PAIR in the kernel

> for APIs that expect 64-bit values like truncate/ftruncate, pread64/pwrite64,

> sync_file_range, and fallocate.

> 

> I think our installed base of "tilepro/tilegx32 big-endian" is so small as to

> be effectively zero.  We had one or two customers interested in big-endian 64-bit,

> and a couple of customers interested in tilegx compat 32-bit, but I don't think we

> ever had customers interested in the cross product of those two features.

> 

> Generally speaking I would otherwise agree and say we should provide

> tile-specific variants of all those broken syscalls in glibc to support the existing ABI.


Right, I was assuming a more conservative approach regarding kernel/libc
API.  But I see your point regarding the base installation.

> 

>>>> Now for BZ#20261 I do not think it applicable since this is a fix for

>>>> consolidation done in development phase, it does not appear in any

>>>> released version.

>>> It breaks on all versions, as far as I can tell.  I tested it on our RHEL 6 platform

>>> (based on backported glibc 2.12) and it fails there.

>> What exactly breaks in all version? What I am referring is that not the

>> bug is 'invalid', but rather it does not apply to a *user visible* one

>> because the consolidation code is only in 2.24 master branch (it is not

>> in any *released* version).

>>

>> Now, if it fails on older releases it is not due this change, but rather

>> due old sysdeps/unix/sysv/linux/generic/wordsize-32/preadv{64}.  Now, if

>> it is the case it is indeed a user visible bug.

> 

> Yes, that's exactly the case for preadv/pwritev, due to the bogus _ALIGNMENT_ARG;

> it breaks on all 32-bit tile platforms with all public releases.

> 


Right, I think we now aligned about pread/pwritev and I will commit it soon.  Thanks
for the reply!
diff mbox

Patch

diff --git a/misc/Makefile b/misc/Makefile
index 6498adc..56e20ce 100644
--- a/misc/Makefile
+++ b/misc/Makefile
@@ -77,7 +77,8 @@  gpl2lgpl := error.c error.h
 
 tests := tst-dirname tst-tsearch tst-fdset tst-efgcvt tst-mntent tst-hsearch \
 	 tst-error1 tst-pselect tst-insremque tst-mntent2 bug-hsearch1 \
-	 tst-mntent-blank-corrupt tst-mntent-blank-passno bug18240
+	 tst-mntent-blank-corrupt tst-mntent-blank-passno bug18240 \
+	 tst-preadvwritev tst-preadvwritev64
 ifeq ($(run-built-tests),yes)
 tests-special += $(objpfx)tst-error1-mem.out
 endif
diff --git a/misc/tst-preadvwritev.c b/misc/tst-preadvwritev.c
new file mode 100644
index 0000000..60c7c84
--- /dev/null
+++ b/misc/tst-preadvwritev.c
@@ -0,0 +1,111 @@ 
+/* Tests for preadv and pwritev.
+   Copyright (C) 2016 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sys/uio.h>
+
+/* Allow testing of the 64-bit versions as well.  */
+#ifndef PREAD
+# define PREAD pread
+# define PWRITE pwrite
+#endif
+
+static void do_prepare (void);
+static int do_test (void);
+#define PREPARE(argc, argv)     do_prepare ()
+#define TEST_FUNCTION           do_test ()
+#include "../test-skeleton.c"
+
+static char *temp_filename;
+static int temp_fd;
+
+void
+do_prepare (void)
+{
+  temp_fd = create_temp_file ("tst-preadvwritev.", &temp_filename);
+  if (temp_fd == -1)
+    {
+      printf ("cannot create temporary file: %m\n");
+      exit (1);
+    }
+}
+
+#define FAIL() \
+  do { printf ("error: failure on line %d\n", __LINE__); return 1; } while (0)
+
+int
+do_test (void)
+{
+  struct iovec iov[2];
+  ssize_t ret;
+
+  char buf1[32];
+  char buf2[64];
+
+  memset (buf1, 0x00, sizeof buf1);
+  memset (buf1, 0xca, sizeof buf1);
+
+  memset (iov, 0, sizeof iov);
+  iov[0].iov_base = buf1;
+  iov[0].iov_len = sizeof buf1;
+  iov[1].iov_base = buf2;
+  iov[1].iov_len = sizeof buf2;
+
+  ret = pwritev (temp_fd, iov, 2, 0);
+  if (ret == -1)
+    FAIL ();
+  if (ret != (sizeof buf1 + sizeof buf2))
+    FAIL ();
+
+  ret = pwritev (temp_fd, iov, 2, sizeof buf1 + sizeof buf2);
+  if (ret == -1)
+    FAIL ();
+  if (ret != (sizeof buf1 + sizeof buf2))
+    FAIL ();
+  
+  char buf3[32];
+  char buf4[64];
+
+  iov[0].iov_base = buf3;
+  iov[0].iov_len = sizeof buf3;
+  iov[1].iov_base = buf4;
+  iov[1].iov_len = sizeof buf4;
+
+  ret = preadv (temp_fd, iov, 2, 0);
+  if (ret == -1)
+    FAIL ();
+  if (ret != (sizeof buf3 + sizeof buf4))
+    FAIL ();
+
+  if (memcmp (buf1, buf3, sizeof buf1) != 0)
+    FAIL ();
+  if (memcmp (buf2, buf4, sizeof buf2) != 0)
+    FAIL ();
+
+  ret = preadv (temp_fd, iov, 2, sizeof buf3 + sizeof buf4);
+  if (ret == -1)
+    FAIL ();
+  if (ret != (sizeof buf3 + sizeof buf4))
+    FAIL ();
+
+  if (memcmp (buf1, buf3, sizeof buf1) != 0)
+    FAIL ();
+  if (memcmp (buf2, buf4, sizeof buf2) != 0)
+    FAIL ();
+
+  return 0;
+}
diff --git a/misc/tst-preadvwritev64.c b/misc/tst-preadvwritev64.c
new file mode 100644
index 0000000..ff6e134
--- /dev/null
+++ b/misc/tst-preadvwritev64.c
@@ -0,0 +1,22 @@ 
+/* Tests for pread64 and pwrite64.
+   Copyright (C) 2016 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#define PREADV  preadv64
+#define PWRITEV pwritev64
+
+#include "tst-preadvwritev.c"
diff --git a/sysdeps/unix/sysv/linux/preadv.c b/sysdeps/unix/sysv/linux/preadv.c
index f6958c3..1dbaa35 100644
--- a/sysdeps/unix/sysv/linux/preadv.c
+++ b/sysdeps/unix/sysv/linux/preadv.c
@@ -29,8 +29,7 @@ 
 ssize_t
 preadv (int fd, const struct iovec *vector, int count, off_t offset)
 {
-  return SYSCALL_CANCEL (preadv, fd, vector, count,
-			 __ALIGNMENT_ARG SYSCALL_LL (offset));
+  return SYSCALL_CANCEL (preadv, fd, vector, count, SYSCALL_LL (offset));
 }
 # else
 static ssize_t __atomic_preadv_replacement (int, const struct iovec *,
@@ -40,7 +39,7 @@  preadv (int fd, const struct iovec *vector, int count, off_t offset)
 {
 #  ifdef __NR_preadv
   ssize_t result = SYSCALL_CANCEL (preadv, fd, vector, count,
-				   __ALIGNMENT_ARG SYSCALL_LL (offset));
+				   SYSCALL_LL (offset));
   if (result >= 0 || errno != ENOSYS)
     return result;
 #  endif
diff --git a/sysdeps/unix/sysv/linux/preadv64.c b/sysdeps/unix/sysv/linux/preadv64.c
index 18f5550..7c7910a 100644
--- a/sysdeps/unix/sysv/linux/preadv64.c
+++ b/sysdeps/unix/sysv/linux/preadv64.c
@@ -27,8 +27,7 @@ 
 ssize_t
 preadv64 (int fd, const struct iovec *vector, int count, off64_t offset)
 {
-  return SYSCALL_CANCEL (preadv64, fd, vector, count,
-			 __ALIGNMENT_ARG SYSCALL_LL64 (offset));
+  return SYSCALL_CANCEL (preadv64, fd, vector, count, SYSCALL_LL64 (offset));
 }
 #else
 static ssize_t __atomic_preadv64_replacement (int, const struct iovec *,
@@ -38,7 +37,7 @@  preadv64 (int fd, const struct iovec *vector, int count, off64_t offset)
 {
 #ifdef __NR_preadv64
   ssize_t result = SYSCALL_CANCEL (preadv64, fd, vector, count,
-				   __ALIGNMENT_ARG SYSCALL_LL64 (offset));
+				   SYSCALL_LL64 (offset));
   if (result >= 0 || errno != ENOSYS)
     return result;
 #endif
diff --git a/sysdeps/unix/sysv/linux/pwritev.c b/sysdeps/unix/sysv/linux/pwritev.c
index b11606c..57e579c 100644
--- a/sysdeps/unix/sysv/linux/pwritev.c
+++ b/sysdeps/unix/sysv/linux/pwritev.c
@@ -29,8 +29,7 @@ 
 ssize_t
 pwritev (int fd, const struct iovec *vector, int count, off_t offset)
 {
-  return SYSCALL_CANCEL (pwritev, fd, vector, count,
-			 __ALIGNMENT_ARG SYSCALL_LL (offset));
+  return SYSCALL_CANCEL (pwritev, fd, vector, count, SYSCALL_LL (offset));
 }
 # else
 static ssize_t __atomic_pwritev_replacement (int, const struct iovec *,
@@ -40,7 +39,7 @@  pwritev (int fd, const struct iovec *vector, int count, off_t offset)
 {
 #  ifdef __NR_pwritev
   ssize_t result = SYSCALL_CANCEL (pwritev, fd, vector, count,
-				   __ALIGNMENT_ARG SYSCALL_LL (offset));
+				   SYSCALL_LL (offset));
   if (result >= 0 || errno != ENOSYS)
     return result;
 #  endif
diff --git a/sysdeps/unix/sysv/linux/pwritev64.c b/sysdeps/unix/sysv/linux/pwritev64.c
index bed79b7..39af02f 100644
--- a/sysdeps/unix/sysv/linux/pwritev64.c
+++ b/sysdeps/unix/sysv/linux/pwritev64.c
@@ -28,7 +28,7 @@  ssize_t
 pwritev64 (int fd, const struct iovec *vector, int count, off64_t offset)
 {
   return SYSCALL_CANCEL (pwritev64, fd, vector, count,
-			 __ALIGNMENT_ARG SYSCALL_LL64 (offset));
+			 SYSCALL_LL64 (offset));
 }
 #else
 static ssize_t __atomic_pwritev64_replacement (int, const struct iovec *,
@@ -38,7 +38,7 @@  pwritev64 (int fd, const struct iovec *vector, int count, off64_t offset)
 {
 #ifdef __NR_pwrite64v
   ssize_t result = SYSCALL_CANCEL (pwritev64, fd, vector, count,
-				   __ALIGNMENT_ARG SYSCALL_LL64 (offset));
+				   SYSCALL_LL64 (offset));
   if (result >= 0 || errno != ENOSYS)
     return result;
 #endif