diff mbox series

[RFC] pseudo: intercept syscall() and return ENOTSUP for renameat2

Message ID 20180327154802.14611-1-ross.burton@intel.com
State New
Headers show
Series [RFC] pseudo: intercept syscall() and return ENOTSUP for renameat2 | expand

Commit Message

Ross Burton March 27, 2018, 3:48 p.m. UTC
coreutils is now using renameat2() in mv(1) but as this syscall isn't in most
glibc headers yet it falls back to directly calling syscall(), which pseudo
doesn't intercept.  This results in permission problems as files mysteriously
move without pseudo knowing.

This patch intercepts syscall() and returns ENOTSUP if renameat2() is being
called.  Thanks to Andre McCurdy for the proof-of-concept that this patch is
based on.

Signed-off-by: Ross Burton <ross.burton@intel.com>

---
 meta/recipes-devtools/pseudo/files/renameat2.patch | 63 ++++++++++++++++++++++
 meta/recipes-devtools/pseudo/pseudo_git.bb         |  1 +
 2 files changed, 64 insertions(+)
 create mode 100644 meta/recipes-devtools/pseudo/files/renameat2.patch

-- 
2.11.0

-- 
_______________________________________________
Openembedded-core mailing list
Openembedded-core@lists.openembedded.org
http://lists.openembedded.org/mailman/listinfo/openembedded-core

Comments

Seebs March 27, 2018, 3:53 p.m. UTC | #1
On Tue, 27 Mar 2018 16:48:02 +0100
Ross Burton <ross.burton@intel.com> wrote:

> This patch intercepts syscall() and returns ENOTSUP if renameat2() is

> being called.


I am inclined to NAK this until we have a clearer understanding of the
mechanics observed in glibc's syscall implementation; it's doing magic
that this will not do, and will in fact undo, and we don't know *why*
it does that magic. It seems dangerous to break a thing without first
knowing why it was there in the first place.

If we want to wrap this, at a bare minimum, we should be using a
custom wrapper which doesn't use any of the standard wrapper magic. At
that point, we can *probably* pass args along and just return
immediately after the real_syscall call and get usable results? (And
bail prematurely if it's a call we need to prevent.)

-s
-- 
_______________________________________________
Openembedded-core mailing list
Openembedded-core@lists.openembedded.org
http://lists.openembedded.org/mailman/listinfo/openembedded-core
Andre McCurdy March 27, 2018, 10:43 p.m. UTC | #2
On Tue, Mar 27, 2018 at 8:48 AM, Ross Burton <ross.burton@intel.com> wrote:
> coreutils is now using renameat2() in mv(1) but as this syscall isn't in most

> glibc headers yet it falls back to directly calling syscall(), which pseudo

> doesn't intercept.  This results in permission problems as files mysteriously

> move without pseudo knowing.

>

> This patch intercepts syscall() and returns ENOTSUP if renameat2() is being

> called.  Thanks to Andre McCurdy for the proof-of-concept that this patch is

> based on.


Tested-by: Andre McCurdy <armccurdy@gmail.com>


> Signed-off-by: Ross Burton <ross.burton@intel.com>

> ---

>  meta/recipes-devtools/pseudo/files/renameat2.patch | 63 ++++++++++++++++++++++

>  meta/recipes-devtools/pseudo/pseudo_git.bb         |  1 +

>  2 files changed, 64 insertions(+)

>  create mode 100644 meta/recipes-devtools/pseudo/files/renameat2.patch

>

> diff --git a/meta/recipes-devtools/pseudo/files/renameat2.patch b/meta/recipes-devtools/pseudo/files/renameat2.patch

> new file mode 100644

> index 00000000000..467b0b3e79f

> --- /dev/null

> +++ b/meta/recipes-devtools/pseudo/files/renameat2.patch

> @@ -0,0 +1,63 @@

> +commit 3a4c536817dce4d0cbaa8f4efe30e722108357dd

> +Author: Ross Burton <ross.burton@intel.com>

> +Date:   Tue Mar 27 14:02:10 2018 +0100

> +

> +    HACK syscall

> +

> +diff --git a/ports/linux/guts/syscall.c b/ports/linux/guts/syscall.c

> +new file mode 100644

> +index 0000000..4ed38ed

> +--- /dev/null

> ++++ b/ports/linux/guts/syscall.c

> +@@ -0,0 +1,30 @@

> ++/*

> ++ * Copyright (c) 2018 Wind River Systems; see

> ++ * guts/COPYRIGHT for information.

> ++ *

> ++ * long syscall(long number, ...)

> ++ *    long rc = -1;

> ++ */

> ++      typedef long syscall_arg_t;

> ++      syscall_arg_t a,b,c,d,e,f;

> ++

> ++      //va_start (ap, number);


Without knowing anything about pseudo, commenting this out looks odd?

> ++      a = va_arg (ap, syscall_arg_t);

> ++      b = va_arg (ap, syscall_arg_t);

> ++      c = va_arg (ap, syscall_arg_t);

> ++      d = va_arg (ap, syscall_arg_t);

> ++      e = va_arg (ap, syscall_arg_t);

> ++      f = va_arg (ap, syscall_arg_t);

> ++      va_end (ap);

> ++

> ++      if ((number == SYS_renameat2)) {

> ++                      errno = ENOTSUP;


I think ENOTSUP is intended to indicate that the failing syscall may
succeed in another context (e.g. fails on a file but may succeed on a
socket, etc ?).

In this case, it may be better to return ENOSYS, which indicates the
syscall is never expected to work on this system. It's the error
returned by a kernel which is too old to support renameat2 and so
anything trying to use renameat2 is likely to check for it.

gnulib checks for both though, so either is going to work for the
current problem with coreutils mv.

> ++                      rc = -1;

> ++      }

> ++      else {

> ++                      rc = real_syscall (number, a, b, c, d, e, f);

> ++      }

> ++

> ++/*    return rc;

> ++ * }

> ++ */

> +diff --git a/ports/linux/wrapfuncs.in b/ports/linux/wrapfuncs.in

> +index fca5b50..137612c 100644

> +--- a/ports/linux/wrapfuncs.in

> ++++ b/ports/linux/wrapfuncs.in

> +@@ -54,3 +54,4 @@ int getpw(uid_t uid, char *buf);

> + int getpwent_r(struct passwd *pwbuf, char *buf, size_t buflen, struct passwd **pwbufp);

> + int getgrent_r(struct group *gbuf, char *buf, size_t buflen, struct group **gbufp);

> + int capset(cap_user_header_t hdrp, const cap_user_data_t datap); /* real_func=pseudo_capset */

> ++long syscall(long number, ...);

> +diff --git a/pseudo_wrappers.c b/pseudo_wrappers.c

> +index e05f73a..b7225d7 100644

> +--- a/pseudo_wrappers.c

> ++++ b/pseudo_wrappers.c

> +@@ -36,6 +36,7 @@

> + #include <sys/time.h>

> + #include <sys/wait.h>

> + #include <dlfcn.h>

> ++#include <sys/syscall.h>

> +

> + /* include this to get PSEUDO_PORT_* definitions */

> + #include "pseudo.h"

> diff --git a/meta/recipes-devtools/pseudo/pseudo_git.bb b/meta/recipes-devtools/pseudo/pseudo_git.bb

> index 66da1cc53b8..44343c3bc57 100644

> --- a/meta/recipes-devtools/pseudo/pseudo_git.bb

> +++ b/meta/recipes-devtools/pseudo/pseudo_git.bb

> @@ -6,6 +6,7 @@ SRC_URI = "git://git.yoctoproject.org/pseudo \

>             file://fallback-group \

>             file://moreretries.patch \

>             file://toomanyfiles.patch \

> +           file://renameat2.patch \

>             "

>

>  SRCREV = "d7c31a25e4b02af0c64e6be0b4b0a9ac4ffc9da2"

> --

> 2.11.0

>

> --

> _______________________________________________

> Openembedded-core mailing list

> Openembedded-core@lists.openembedded.org

> http://lists.openembedded.org/mailman/listinfo/openembedded-core

-- 
_______________________________________________
Openembedded-core mailing list
Openembedded-core@lists.openembedded.org
http://lists.openembedded.org/mailman/listinfo/openembedded-core
Ross Burton March 28, 2018, 9:15 a.m. UTC | #3
On 27 March 2018 at 23:43, Andre McCurdy <armccurdy@gmail.com> wrote:
>> ++      //va_start (ap, number);

>

> Without knowing anything about pseudo, commenting this out looks odd?


It appears that pseudo's wrappers pass a va_list for you.

Then again I know little about pseudo and just hacked until something worked.

With this patch I was getting occasional failures of the pseudo-using
bitbake-worker, so its not quite ready, but Peter is working on a
better form anyway.

Ross
-- 
_______________________________________________
Openembedded-core mailing list
Openembedded-core@lists.openembedded.org
http://lists.openembedded.org/mailman/listinfo/openembedded-core
Andre McCurdy March 31, 2018, 4:15 a.m. UTC | #4
On Wed, Mar 28, 2018 at 2:15 AM, Burton, Ross <ross.burton@intel.com> wrote:
> With this patch I was getting occasional failures of the pseudo-using

> bitbake-worker, so its not quite ready, but Peter is working on a

> better form anyway.


The "better form" seems to have been committed to the pseudo master
branch. Very interesting...

+long
+syscall(long number, ...) {
+       /* In a fit of optimism, I imagine that if we didn't get at least 7
+        * arguments, reading past the ones we did get will read into this
+        * space and maybe not clash with or overlap with any later-declared
+        * values. This isn't really a guarantee, and is probably just
+        * superstition.
+        */
+       unsigned long long padding[7];
+       (void) padding;

Arguments passed by the caller will be put on the stack before any
stack frame is created by the callee. You can argue about which way a
stack grows (up or down) but however you define it, reading past the
end of the arguments passed on the stack by the caller is never going
to read into the stack frame created by the callee, so this can't have
the intended affect.

Also... any compiler from at least the past 20 years or so is going to
optimise away unused variables, so this does precisely nothing anyway.

+       /* gcc magic to attempt to just pass these args to syscall. we have to
+        * guess about the number of args; the docs discuss calling conventions
+        * up to 7, so let's try that?
+        */
+       void *res = __builtin_apply((void (*)()) real_syscall,
__builtin_apply_args(), sizeof(long long) * 7);
+       __builtin_return(res);

This is probably going to work, but if the goal is to avoid reading
more from the stack than the generic C code would do, it doesn't
succeed. The "size" parameter to __builtin_apply() seems to simply
specify how much argument data to copy from the stack frame passed by
the caller. Setting it to sizeof(long long) * 7 is safe (ie it will
copy at least enough data from the stack frame passed by the caller,
never less) as it covers both the corner cases where registers are
long long (such as x32) and where _no_ arguments are passed in
registers and everything needs to be copied from the stack. However,
on 32bit targets (where registers are smaller than long long) and on
any target where _some_ arguments are passed via registers, it will
cause more data to be read from the stack than the generic C code
would.

e.g. on 32bit ARM where the first 4 integer arguments are passed via
registers, the optimum value for __builtin_apply() "size" in order to
pass through 1 syscall number and 6 additional register sized
arguments would be sizeof(long) * 3.

A simple test built for 32bit ARM seems to confirm that. The generic
code unconditionally reads 12 bytes from the stack frame passed by the
caller. The code now in pseudo master unconditionally reads 56 bytes.

$ cat tst.c
#include <stdarg.h>

extern int real_syscall();

typedef long syscall_arg_t; /* fixme: wrong for x32 */

int wrapper_generic (long int n, ...)
{
    va_list ap;
    syscall_arg_t a,b,c,d,e,f;
    va_start(ap, n);
    a=va_arg(ap, syscall_arg_t);
    b=va_arg(ap, syscall_arg_t);
    c=va_arg(ap, syscall_arg_t);
    d=va_arg(ap, syscall_arg_t);
    e=va_arg(ap, syscall_arg_t);
    f=va_arg(ap, syscall_arg_t);
    va_end(ap);
    return real_syscall(n,a,b,c,d,e,f);
}

int wrapper_gcc_specific (long int n, ...)
{
    void *res = __builtin_apply((void (*)()) real_syscall,
__builtin_apply_args(), sizeof(long long) * 7);
    __builtin_return(res);
}

$ arm-linux-gnueabi-objdump -d tst.o

tst.o:     file format elf32-littlearm

Disassembly of section .text:

00000000 <wrapper_generic>:
   0:    e92d000f     push    {r0, r1, r2, r3}
   4:    e92d407f     push    {r0, r1, r2, r3, r4, r5, r6, lr}
   8:    e28d0020     add    r0, sp, #32
   c:    e59d3038     ldr    r3, [sp, #56]    ; 0x38
  10:    e28d2024     add    r2, sp, #36    ; 0x24
  14:    e58d2014     str    r2, [sp, #20]
  18:    e58d3008     str    r3, [sp, #8]
  1c:    e59d3034     ldr    r3, [sp, #52]    ; 0x34
  20:    e58d3004     str    r3, [sp, #4]
  24:    e59d3030     ldr    r3, [sp, #48]    ; 0x30
  28:    e58d3000     str    r3, [sp]
  2c:    e890000f     ldm    r0, {r0, r1, r2, r3}
  30:    ebfffffe     bl    0 <real_syscall>
  34:    e28dd01c     add    sp, sp, #28
  38:    e49de004     pop    {lr}        ; (ldr lr, [sp], #4)
  3c:    e28dd010     add    sp, sp, #16
  40:    e12fff1e     bx    lr

00000044 <wrapper_gcc_specific>:
  44:    e92d000f     push    {r0, r1, r2, r3}
  48:    e92d4830     push    {r4, r5, fp, lr}
  4c:    e28db00c     add    fp, sp, #12
  50:    e24b400c     sub    r4, fp, #12
  54:    e28bc014     add    ip, fp, #20
  58:    e24dd028     sub    sp, sp, #40    ; 0x28
  5c:    e50b0020     str    r0, [fp, #-32]    ; 0xffffffe0
  60:    e1a0500d     mov    r5, sp
  64:    e50b101c     str    r1, [fp, #-28]    ; 0xffffffe4
  68:    e24dd040     sub    sp, sp, #64    ; 0x40
  6c:    e50b2018     str    r2, [fp, #-24]    ; 0xffffffe8
  70:    e1a0e00d     mov    lr, sp
  74:    e50b3014     str    r3, [fp, #-20]    ; 0xffffffec
  78:    e524c018     str    ip, [r4, #-24]!    ; 0xffffffe8
  7c:    e8bc000f     ldm    ip!, {r0, r1, r2, r3}
  80:    e8ae000f     stmia    lr!, {r0, r1, r2, r3}
  84:    e8bc000f     ldm    ip!, {r0, r1, r2, r3}
  88:    e8ae000f     stmia    lr!, {r0, r1, r2, r3}
  8c:    e8bc000f     ldm    ip!, {r0, r1, r2, r3}
  90:    e8ae000f     stmia    lr!, {r0, r1, r2, r3}
  94:    e89c0003     ldm    ip, {r0, r1}
  98:    e88e0003     stm    lr, {r0, r1}
  9c:    e994000f     ldmib    r4, {r0, r1, r2, r3}
  a0:    e24b4034     sub    r4, fp, #52    ; 0x34
  a4:    ebfffffe     bl    0 <real_syscall>
  a8:    e884000f     stm    r4, {r0, r1, r2, r3}
  ac:    e1a0d005     mov    sp, r5
  b0:    e894000f     ldm    r4, {r0, r1, r2, r3}
  b4:    e24bd00c     sub    sp, fp, #12
  b8:    e8bd4830     pop    {r4, r5, fp, lr}
  bc:    e28dd010     add    sp, sp, #16
  c0:    e12fff1e     bx    lr


(Note in the code was compiled with -mfloat-abi=soft to avoid
__builtin_apply() needing to save and restore all floating point
registers - which doesn't affect the amount of data read from the
stack, but makes the assembler more than twice as long...).
-- 
_______________________________________________
Openembedded-core mailing list
Openembedded-core@lists.openembedded.org
http://lists.openembedded.org/mailman/listinfo/openembedded-core
Seebs March 31, 2018, 5:06 a.m. UTC | #5
On Fri, 30 Mar 2018 21:15:18 -0700
Andre McCurdy <armccurdy@gmail.com> wrote:
> Arguments passed by the caller will be put on the stack before any

> stack frame is created by the callee. You can argue about which way a

> stack grows (up or down) but however you define it, reading past the

> end of the arguments passed on the stack by the caller is never going

> to read into the stack frame created by the callee, so this can't have

> the intended affect.


I definitely think it's Probably Purely Superstitious, but I'm not sure
about this. I seem to recall at least one environment in which
consecutive parameters had increasing addresses, and local variables
had addresses higher still, so that somewhere past the address of the
Nth argument would be the address of a local variable. Given how many
insane calling conventions there are, I can't rule it out. (I do think
you're right about the compiler probably optimizing it away, although
they are not always as aggressive about that as you might expect them
to be.)

> This is probably going to work, but if the goal is to avoid reading

> more from the stack than the generic C code would do, it doesn't

> succeed.


I'm aware. The purpose is to (1) use the thing most expressive of
intent, (2) make sure that people know that this is not expected to be
portable. "__builtin_apply()" is fairly clear as to its *intent*, and
is unlikely to exist in a compatible calling convention in other
compilers.

> The "size" parameter to __builtin_apply() seems to simply

> specify how much argument data to copy from the stack frame passed

> byIt is not always simple to compute the proper value for size. the

> caller. Setting it to sizeof(long long) * 7 is safe (ie it will copy

> at least enough data from the stack frame passed by the caller, never

> less) as it covers both the corner cases where registers are long

> long (such as x32) and where _no_ arguments are passed in registers

> and everything needs to be copied from the stack. However, on 32bit

> targets (where registers are smaller than long long) and on any

> target where _some_ arguments are passed via registers, it will cause

> more data to be read from the stack than the generic C code would.


Yes, it will. As the documentation says: "It is not always simple to
compute the proper value for size."

I do think this is currently too large; specifically, it looks as
though right now there's a hard limit of 6 register-sized things, and
anything that takes off_t or similar types has fewer than 6 arguments.

So 6 * sizeof(off_t) or so is probably actually sufficient, if that
stays true, but I don't see a feature test macro for it...

> e.g. on 32bit ARM where the first 4 integer arguments are passed via

> registers, the optimum value for __builtin_apply() "size" in order to

> pass through 1 syscall number and 6 additional register sized

> arguments would be sizeof(long) * 3.


That seems probable, yes.

> typedef long syscall_arg_t; /* fixme: wrong for x32 */


Yes.

That's the problem: There's no sane way to express "the size that you
would have gotten for these arguments of unknown types", so I
intentionally went with something that may well be too long, but will
not be too short.

> (Note in the code was compiled with -mfloat-abi=soft to avoid

> __builtin_apply() needing to save and restore all floating point

> registers - which doesn't affect the amount of data read from the

> stack, but makes the assembler more than twice as long...).


And I don't *think* any syscalls take float arguments, but I don't
know that this is not only true now, but going to stay true.

-s
-- 
_______________________________________________
Openembedded-core mailing list
Openembedded-core@lists.openembedded.org
http://lists.openembedded.org/mailman/listinfo/openembedded-core
Andre McCurdy March 31, 2018, 6:02 a.m. UTC | #6
On Fri, Mar 30, 2018 at 10:06 PM, Seebs <seebs@seebs.net> wrote:
> That's the problem: There's no sane way to express "the size that you

> would have gotten for these arguments of unknown types"


And there I think lies the key point that you still haven't grasped.
The arguments are not of unknown types - they are all of types which
are the same size as integer registers. The syscall manpage very
clearly documents the fact that all arguments to Linux syscalls are
passed to the kernel in registers. I think you even asked me why that
was important or useful.

If there's any user space code out there which calls libc syscall()
and does not pass variables which libc syscall() (or a wrapper for it)
can unconditionally treat as a type which fits into integer registers
than it's a bug the caller. End of story.
-- 
_______________________________________________
Openembedded-core mailing list
Openembedded-core@lists.openembedded.org
http://lists.openembedded.org/mailman/listinfo/openembedded-core
Richard Purdie March 31, 2018, 1:12 p.m. UTC | #7
Just to report back, I've tried testing an earlier version of pseudo
master with the path changes reverted and current master. With both I'm
seeing librsvg fail during do_install with a segfault (you have to
remove the 2> /dev/null to see it).

I'm seeing multiple entries in the host's dmesg:

[180364.269879] glib-compile-re[38258]: segfault at 0 ip           (null) sp 00007ffffaafbf78 error 14 in glib-compile-resources[55abc201b000+9000]
[180376.499904] glib-compile-re[46860]: segfault at 0 ip           (null) sp 00007ffd3b10f2c8 error 14 in glib-compile-resources[55b2cb528000+9000]
[180376.612404] glib-compile-re[46862]: segfault at 0 ip           (null) sp 00007fff612977f8 error 14 in glib-compile-resources[55ad08797000+9000]
[180376.726317] glib-compile-re[46864]: segfault at 0 ip           (null) sp 00007ffdce018928 error 14 in glib-compile-resources[5610851ec000+9000]
[180376.836705] glib-compile-re[46866]: segfault at 0 ip           (null) sp 00007ffc586fdda8 error 14 in glib-compile-resources[562f38dfc000+9000]
[180603.294668] gdk-pixbuf-quer[51620]: segfault at 0 ip           (null) sp 00007ffce7947fe8 error 14 in gdk-pixbuf-query-loaders.real[55b88bb7f000+2000]
[185206.227285] gdk-pixbuf-quer[51885]: segfault at 0 ip           (null) sp 00007ffe6393ae28 error 14 in gdk-pixbuf-query-loaders.real[556db9ae3000+2000]
[186523.735264] gdk-pixbuf-quer[70272]: segfault at 0 ip           (null) sp 00007fff6a855808 error 14 in gdk-pixbuf-query-loaders.real[55ec4e8fd000+2000]

I believe that libglib-2.0 does use syscall() for something and that
the above programs are calling into it and faulting.

Its likely possible to reproduce this outside of a build by running
make install from librsvg under pseudo.

If I take master and revert 778abd3686fb7c419e9016fdd9e93819405d52aa fr
om pseudo, it no longer segfaults.

So something is not working correctly with the intercept sadly.

Cheers,

Richard
Seebs March 31, 2018, 3:35 p.m. UTC | #8
On Fri, 30 Mar 2018 23:02:29 -0700
Andre McCurdy <armccurdy@gmail.com> wrote:

> On Fri, Mar 30, 2018 at 10:06 PM, Seebs <seebs@seebs.net> wrote:

> > That's the problem: There's no sane way to express "the size that

> > you would have gotten for these arguments of unknown types"

> 

> And there I think lies the key point that you still haven't grasped.


I think you're right, but also you're being sorta rude about it. I
think you're missing the distinction between "broad enough experience
with weird edge cases to be possibly-unduly cautious" and "has no idea
which end of a compiler is up".

> The arguments are not of unknown types - they are all of types which

> are the same size as integer registers.


It turns out, in C, "the same size" is *not enough information* to
determine where an argument goes. It is probably enough information for
the subset of arguments that syscall is using, on these architectures,
but I say "probably" because I have no spec to point to that says it's
necessarily the case.

> The syscall manpage very

> clearly documents the fact that all arguments to Linux syscalls are

> passed to the kernel in registers. I think you even asked me why that

> was important or useful.


I'm still honestly not totally sure why it has to say *which*
registers, specifically. Under what circumstances will my invocation of
syscall(2) be changed by the knowledge that argument 5 to an OABI ARM
system gets passed in v1? The information being present implies that I
might need to know this to invoke syscall(2) correctly, but the only
examples given are not actually specific to that information.

> If there's any user space code out there which calls libc syscall()

> and does not pass variables which libc syscall() (or a wrapper for it)

> can unconditionally treat as a type which fits into integer registers

> than it's a bug the caller. End of story.


This raises an interesting point. The man page says that the dummy 0
argument is needed for SYS_readahead because of an alignment issue, to
force the value into r2/r3. It does not say that the manual split of the
64-bit value into two halves is needed, it just does that. Is that
actually required anyway, or only because of the need to pad the
argument list? (For instance, look at sync_file_range2(), which
reorders the arguments to sync_file_range() for efficiency.

... Oh, wow. I went and checked on readahead():

>> On some 32-bit architectures, the calling signature for this

>> system call differs, for the reasons described in syscall(2).


Note how they don't specify what the alternative calling signature is.

-s
-- 
_______________________________________________
Openembedded-core mailing list
Openembedded-core@lists.openembedded.org
http://lists.openembedded.org/mailman/listinfo/openembedded-core
Seebs March 31, 2018, 3:39 p.m. UTC | #9
On Sat, 31 Mar 2018 14:12:55 +0100
Richard Purdie <richard.purdie@linuxfoundation.org> wrote:

> I believe that libglib-2.0 does use syscall() for something and that

> the above programs are calling into it and faulting.


Interesting! I'll poke around and see what I can find.

-s
-- 
_______________________________________________
Openembedded-core mailing list
Openembedded-core@lists.openembedded.org
http://lists.openembedded.org/mailman/listinfo/openembedded-core
Bruce Ashfield March 31, 2018, 3:58 p.m. UTC | #10
On Sat, Mar 31, 2018 at 11:35 AM, Seebs <seebs@seebs.net> wrote:
> On Fri, 30 Mar 2018 23:02:29 -0700

> Andre McCurdy <armccurdy@gmail.com> wrote:

>

>> On Fri, Mar 30, 2018 at 10:06 PM, Seebs <seebs@seebs.net> wrote:

>> > That's the problem: There's no sane way to express "the size that

>> > you would have gotten for these arguments of unknown types"

>>

>> And there I think lies the key point that you still haven't grasped.

>

> I think you're right, but also you're being sorta rude about it. I


FWIW, I was thinking the same thing.

seebs is quite capable of working through this .. with all the bike shedding
here, it is taking up time that could actually be spent on the solution.

Cheers,

Bruce

> think you're missing the distinction between "broad enough experience

> with weird edge cases to be possibly-unduly cautious" and "has no idea

> which end of a compiler is up".

>

>> The arguments are not of unknown types - they are all of types which

>> are the same size as integer registers.

>

> It turns out, in C, "the same size" is *not enough information* to

> determine where an argument goes. It is probably enough information for

> the subset of arguments that syscall is using, on these architectures,

> but I say "probably" because I have no spec to point to that says it's

> necessarily the case.

>

>> The syscall manpage very

>> clearly documents the fact that all arguments to Linux syscalls are

>> passed to the kernel in registers. I think you even asked me why that

>> was important or useful.

>

> I'm still honestly not totally sure why it has to say *which*

> registers, specifically. Under what circumstances will my invocation of

> syscall(2) be changed by the knowledge that argument 5 to an OABI ARM

> system gets passed in v1? The information being present implies that I

> might need to know this to invoke syscall(2) correctly, but the only

> examples given are not actually specific to that information.

>

>> If there's any user space code out there which calls libc syscall()

>> and does not pass variables which libc syscall() (or a wrapper for it)

>> can unconditionally treat as a type which fits into integer registers

>> than it's a bug the caller. End of story.

>

> This raises an interesting point. The man page says that the dummy 0

> argument is needed for SYS_readahead because of an alignment issue, to

> force the value into r2/r3. It does not say that the manual split of the

> 64-bit value into two halves is needed, it just does that. Is that

> actually required anyway, or only because of the need to pad the

> argument list? (For instance, look at sync_file_range2(), which

> reorders the arguments to sync_file_range() for efficiency.

>

> ... Oh, wow. I went and checked on readahead():

>

>>> On some 32-bit architectures, the calling signature for this

>>> system call differs, for the reasons described in syscall(2).

>

> Note how they don't specify what the alternative calling signature is.

>

> -s

> --

> _______________________________________________

> Openembedded-core mailing list

> Openembedded-core@lists.openembedded.org

> http://lists.openembedded.org/mailman/listinfo/openembedded-core




-- 
"Thou shalt not follow the NULL pointer, for chaos and madness await
thee at its end"
-- 
_______________________________________________
Openembedded-core mailing list
Openembedded-core@lists.openembedded.org
http://lists.openembedded.org/mailman/listinfo/openembedded-core
Joshua Watt March 31, 2018, 9:03 p.m. UTC | #11
On Sat, Mar 31, 2018 at 8:12 AM, Richard Purdie
<richard.purdie@linuxfoundation.org> wrote:
> Just to report back, I've tried testing an earlier version of pseudo

> master with the path changes reverted and current master. With both I'm

> seeing librsvg fail during do_install with a segfault (you have to

> remove the 2> /dev/null to see it).

>

> I'm seeing multiple entries in the host's dmesg:

>

> [180364.269879] glib-compile-re[38258]: segfault at 0 ip           (null) sp 00007ffffaafbf78 error 14 in glib-compile-resources[55abc201b000+9000]

> [180376.499904] glib-compile-re[46860]: segfault at 0 ip           (null) sp 00007ffd3b10f2c8 error 14 in glib-compile-resources[55b2cb528000+9000]

> [180376.612404] glib-compile-re[46862]: segfault at 0 ip           (null) sp 00007fff612977f8 error 14 in glib-compile-resources[55ad08797000+9000]

> [180376.726317] glib-compile-re[46864]: segfault at 0 ip           (null) sp 00007ffdce018928 error 14 in glib-compile-resources[5610851ec000+9000]

> [180376.836705] glib-compile-re[46866]: segfault at 0 ip           (null) sp 00007ffc586fdda8 error 14 in glib-compile-resources[562f38dfc000+9000]

> [180603.294668] gdk-pixbuf-quer[51620]: segfault at 0 ip           (null) sp 00007ffce7947fe8 error 14 in gdk-pixbuf-query-loaders.real[55b88bb7f000+2000]

> [185206.227285] gdk-pixbuf-quer[51885]: segfault at 0 ip           (null) sp 00007ffe6393ae28 error 14 in gdk-pixbuf-query-loaders.real[556db9ae3000+2000]

> [186523.735264] gdk-pixbuf-quer[70272]: segfault at 0 ip           (null) sp 00007fff6a855808 error 14 in gdk-pixbuf-query-loaders.real[55ec4e8fd000+2000]

>

> I believe that libglib-2.0 does use syscall() for something and that

> the above programs are calling into it and faulting.

>

> Its likely possible to reproduce this outside of a build by running

> make install from librsvg under pseudo.

>

> If I take master and revert 778abd3686fb7c419e9016fdd9e93819405d52aa fr

> om pseudo, it no longer segfaults.

>

> So something is not working correctly with the intercept sadly.


I had a little time and it was easy for me to reproduce this. It looks
like real_syscall in pseudo is still NULL meaning it wasn't
initialized:

$ coredumpctl gdb -1
...
Core was generated by
`/home/jwatt/Development/poky/build/tmp/work/i586-poky-linux/librsvg/2.40.20-r0/'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000000000000000 in ?? ()
(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x00007f1c013debfb in syscall (number=140726086677920) at
ports/linux/pseudo_wrappers.c:71
#2  0x00007f1c0071737f in g_once_init_leave ()
   from /home/jwatt/Development/poky/build/tmp/work/i586-poky-linux/librsvg/2.40.20-r0/recipe-sysroot-native/usr/lib/gdk-pixbuf-2.0/../libglib-2.0.so.0
#3  0x00007f1c009c615d in g_value_array_get_type ()
   from /home/jwatt/Development/poky/build/tmp/work/i586-poky-linux/librsvg/2.40.20-r0/recipe-sysroot-native/usr/lib/gdk-pixbuf-2.0/../libgobject-2.0.so.0
#4  0x00007f1c009d7478 in ?? () from
/home/jwatt/Development/poky/build/tmp/work/i586-poky-linux/librsvg/2.40.20-r0/recipe-sysroot-native/usr/lib/gdk-pixbuf-2.0/../libgobject-2.0.so.0
#5  0x00007f1c009c41ac in ?? () from
/home/jwatt/Development/poky/build/tmp/work/i586-poky-linux/librsvg/2.40.20-r0/recipe-sysroot-native/usr/lib/gdk-pixbuf-2.0/../libgobject-2.0.so.0
#6  0x00007f1c01628f8a in ?? () from
/home/jwatt/Development/poky/build/tmp/sysroots-uninative/x86_64-linux/lib/ld-linux-x86-64.so.2
#7  0x00007f1c01629096 in ?? () from
/home/jwatt/Development/poky/build/tmp/sysroots-uninative/x86_64-linux/lib/ld-linux-x86-64.so.2
#8  0x00007f1c0161b06a in ?? () from
/home/jwatt/Development/poky/build/tmp/sysroots-uninative/x86_64-linux/lib/ld-linux-x86-64.so.2
#9  0x0000000000000002 in ?? ()
#10 0x00007ffd58684fd4 in ?? ()
#11 0x00007ffd58685069 in ?? ()
#12 0x0000000000000000 in ?? ()
(gdb) p real_syscall
$1 = (long (*)(long, ...)) 0x0

I'm not very familiar with the internal of how pseudo works, but
adding this to the beginning of the syscall wrapper function in
ports/linux/pseudo_wrappers.c fixed it for me (no idea if this is the
"right" fix):

    if (!pseudo_check_wrappers() || !real_syscall) {
        /* rc was initialized to the "failure" value */
        pseudo_enosys("syscall");
        PROFILE_DONE;
        errno = ENOSYS;
        return -1;
    }

Looks like maybe gdk-pixbuf-query-loaders has the unlucky honour of
being one of the few programs that invokes the pseudo syscall()
wrapper before any other functions that pseudo wraps and the missing
initializer made it explode?

>

> Cheers,

>

> Richard

> --

> _______________________________________________

> Openembedded-core mailing list

> Openembedded-core@lists.openembedded.org

> http://lists.openembedded.org/mailman/listinfo/openembedded-core

-- 
_______________________________________________
Openembedded-core mailing list
Openembedded-core@lists.openembedded.org
http://lists.openembedded.org/mailman/listinfo/openembedded-core
Seebs March 31, 2018, 9:16 p.m. UTC | #12
On Sat, 31 Mar 2018 16:03:01 -0500
Joshua Watt <jpewhacker@gmail.com> wrote:

> Looks like maybe gdk-pixbuf-query-loaders has the unlucky honour of

> being one of the few programs that invokes the pseudo syscall()

> wrapper before any other functions that pseudo wraps and the missing

> initializer made it explode?


That seems exceedingly likely, because indeed, that's supposed to be at
the top of the wrapper. Thanks.

-s
-- 
_______________________________________________
Openembedded-core mailing list
Openembedded-core@lists.openembedded.org
http://lists.openembedded.org/mailman/listinfo/openembedded-core
Richard Purdie April 2, 2018, 10:20 p.m. UTC | #13
And on a happier note this time, pseudo master appears much happier and
19f18124f16c4c85405b140a1fb8cb3b31d865bf seems to pass on the
autobuilders. I'll do some specific checks on f27 but things are
looking good, thanks everyone who's helped with this.

Cheers,

Richard


-- 
_______________________________________________
Openembedded-core mailing list
Openembedded-core@lists.openembedded.org
http://lists.openembedded.org/mailman/listinfo/openembedded-core
Khem Raj April 4, 2018, 9:28 p.m. UTC | #14
On 3/27/18 8:48 AM, Ross Burton wrote:
> coreutils is now using renameat2() in mv(1) but as this syscall isn't in most

> glibc headers yet it falls back to directly calling syscall(), which pseudo

> doesn't intercept.  This results in permission problems as files mysteriously

> move without pseudo knowing.

> 

> This patch intercepts syscall() and returns ENOTSUP if renameat2() is being

> called.  Thanks to Andre McCurdy for the proof-of-concept that this patch is

> based on.


what is the performance impact of adding another stack frame and
function call in the chain here. Do we have data ?

> 

> Signed-off-by: Ross Burton <ross.burton@intel.com>

> ---

>  meta/recipes-devtools/pseudo/files/renameat2.patch | 63 ++++++++++++++++++++++

>  meta/recipes-devtools/pseudo/pseudo_git.bb         |  1 +

>  2 files changed, 64 insertions(+)

>  create mode 100644 meta/recipes-devtools/pseudo/files/renameat2.patch

> 

> diff --git a/meta/recipes-devtools/pseudo/files/renameat2.patch b/meta/recipes-devtools/pseudo/files/renameat2.patch

> new file mode 100644

> index 00000000000..467b0b3e79f

> --- /dev/null

> +++ b/meta/recipes-devtools/pseudo/files/renameat2.patch

> @@ -0,0 +1,63 @@

> +commit 3a4c536817dce4d0cbaa8f4efe30e722108357dd

> +Author: Ross Burton <ross.burton@intel.com>

> +Date:   Tue Mar 27 14:02:10 2018 +0100

> +

> +    HACK syscall

> +

> +diff --git a/ports/linux/guts/syscall.c b/ports/linux/guts/syscall.c

> +new file mode 100644

> +index 0000000..4ed38ed

> +--- /dev/null

> ++++ b/ports/linux/guts/syscall.c

> +@@ -0,0 +1,30 @@

> ++/*

> ++ * Copyright (c) 2018 Wind River Systems; see

> ++ * guts/COPYRIGHT for information.

> ++ *

> ++ * long syscall(long number, ...)

> ++ *	long rc = -1;

> ++ */

> ++	typedef long syscall_arg_t;

> ++	syscall_arg_t a,b,c,d,e,f;

> ++

> ++	//va_start (ap, number);

> ++	a = va_arg (ap, syscall_arg_t);

> ++	b = va_arg (ap, syscall_arg_t);

> ++	c = va_arg (ap, syscall_arg_t);

> ++	d = va_arg (ap, syscall_arg_t);

> ++	e = va_arg (ap, syscall_arg_t);

> ++	f = va_arg (ap, syscall_arg_t);

> ++	va_end (ap);

> ++

> ++	if ((number == SYS_renameat2)) {

> ++			errno = ENOTSUP;

> ++			rc = -1;

> ++	}

> ++	else {

> ++			rc = real_syscall (number, a, b, c, d, e, f);

> ++	}

> ++

> ++/*	return rc;

> ++ * }

> ++ */

> +diff --git a/ports/linux/wrapfuncs.in b/ports/linux/wrapfuncs.in

> +index fca5b50..137612c 100644

> +--- a/ports/linux/wrapfuncs.in

> ++++ b/ports/linux/wrapfuncs.in

> +@@ -54,3 +54,4 @@ int getpw(uid_t uid, char *buf);

> + int getpwent_r(struct passwd *pwbuf, char *buf, size_t buflen, struct passwd **pwbufp);

> + int getgrent_r(struct group *gbuf, char *buf, size_t buflen, struct group **gbufp);

> + int capset(cap_user_header_t hdrp, const cap_user_data_t datap); /* real_func=pseudo_capset */

> ++long syscall(long number, ...);

> +diff --git a/pseudo_wrappers.c b/pseudo_wrappers.c

> +index e05f73a..b7225d7 100644

> +--- a/pseudo_wrappers.c

> ++++ b/pseudo_wrappers.c

> +@@ -36,6 +36,7 @@

> + #include <sys/time.h>

> + #include <sys/wait.h>

> + #include <dlfcn.h>

> ++#include <sys/syscall.h>

> + 

> + /* include this to get PSEUDO_PORT_* definitions */

> + #include "pseudo.h"

> diff --git a/meta/recipes-devtools/pseudo/pseudo_git.bb b/meta/recipes-devtools/pseudo/pseudo_git.bb

> index 66da1cc53b8..44343c3bc57 100644

> --- a/meta/recipes-devtools/pseudo/pseudo_git.bb

> +++ b/meta/recipes-devtools/pseudo/pseudo_git.bb

> @@ -6,6 +6,7 @@ SRC_URI = "git://git.yoctoproject.org/pseudo \

>             file://fallback-group \

>             file://moreretries.patch \

>             file://toomanyfiles.patch \

> +           file://renameat2.patch \

>             "

>  

>  SRCREV = "d7c31a25e4b02af0c64e6be0b4b0a9ac4ffc9da2"

> 


-- 
_______________________________________________
Openembedded-core mailing list
Openembedded-core@lists.openembedded.org
http://lists.openembedded.org/mailman/listinfo/openembedded-core
Seebs April 4, 2018, 9:45 p.m. UTC | #15
On Wed, 4 Apr 2018 14:28:11 -0700
Khem Raj <raj.khem@gmail.com> wrote:

> what is the performance impact of adding another stack frame and

> function call in the chain here. Do we have data ?


Very close to unmeasurable, because *almost nothing ever uses syscall*.

This is used only for the case where someone is explicitly calling
syscall(), not for any other system call use case. And my
implementation (which is not the same as this one) also overrides the
wrapper generation, so there's no standard pseudo wrapper overhead
(which is several times larger and involves mutexes and signal mask
changing), it's just passing the call on unless it's SYS_renameat2.

-s
-- 
_______________________________________________
Openembedded-core mailing list
Openembedded-core@lists.openembedded.org
http://lists.openembedded.org/mailman/listinfo/openembedded-core
Andre McCurdy April 4, 2018, 9:51 p.m. UTC | #16
On Wed, Apr 4, 2018 at 2:28 PM, Khem Raj <raj.khem@gmail.com> wrote:
> On 3/27/18 8:48 AM, Ross Burton wrote:

>> coreutils is now using renameat2() in mv(1) but as this syscall isn't in most

>> glibc headers yet it falls back to directly calling syscall(), which pseudo

>> doesn't intercept.  This results in permission problems as files mysteriously

>> move without pseudo knowing.

>>

>> This patch intercepts syscall() and returns ENOTSUP if renameat2() is being

>> called.  Thanks to Andre McCurdy for the proof-of-concept that this patch is

>> based on.

>

> what is the performance impact of adding another stack frame and

> function call in the chain here. Do we have data ?


I'm not sure if anyone has done any formal measurements, but the
overhead of wrapping libc APIs is certainly noticeable in some cases.
For example running "git status" in a devshell under pseudo is
noticeably slower than from a regular shell. e.g.

Within a devshell for glibc:

  # time git status
  real    0m1.552s
  user    0m0.235s
  sys    0m0.870s

From a regular shell in the glibc workdir:

  $ time git status
  real    0m0.067s
  user    0m0.034s
  sys    0m0.033s

Of course most the overhead here comes from what pseudo does inside
the wrapper, not from the wrapper itself. For tasks which are CPU
bound (e.g. compiling) and calling into the libc functions which
pseudo wraps less often than "git status" does, the overhead will be
much less.
-- 
_______________________________________________
Openembedded-core mailing list
Openembedded-core@lists.openembedded.org
http://lists.openembedded.org/mailman/listinfo/openembedded-core
Khem Raj April 5, 2018, 12:29 a.m. UTC | #17
On 4/4/18 2:45 PM, Seebs wrote:
> On Wed, 4 Apr 2018 14:28:11 -0700

> Khem Raj <raj.khem@gmail.com> wrote:

> 

>> what is the performance impact of adding another stack frame and

>> function call in the chain here. Do we have data ?

> 

> Very close to unmeasurable, because *almost nothing ever uses syscall*.

> 

> This is used only for the case where someone is explicitly calling

> syscall(), not for any other system call use case. And my

> implementation (which is not the same as this one) also overrides the

> wrapper generation, so there's no standard pseudo wrapper overhead

> (which is several times larger and involves mutexes and signal mask

> changing), it's just passing the call on unless it's SYS_renameat2.

> 


Thanks for this

> -s

> 


-- 
_______________________________________________
Openembedded-core mailing list
Openembedded-core@lists.openembedded.org
http://lists.openembedded.org/mailman/listinfo/openembedded-core
diff mbox series

Patch

diff --git a/meta/recipes-devtools/pseudo/files/renameat2.patch b/meta/recipes-devtools/pseudo/files/renameat2.patch
new file mode 100644
index 00000000000..467b0b3e79f
--- /dev/null
+++ b/meta/recipes-devtools/pseudo/files/renameat2.patch
@@ -0,0 +1,63 @@ 
+commit 3a4c536817dce4d0cbaa8f4efe30e722108357dd
+Author: Ross Burton <ross.burton@intel.com>
+Date:   Tue Mar 27 14:02:10 2018 +0100
+
+    HACK syscall
+
+diff --git a/ports/linux/guts/syscall.c b/ports/linux/guts/syscall.c
+new file mode 100644
+index 0000000..4ed38ed
+--- /dev/null
++++ b/ports/linux/guts/syscall.c
+@@ -0,0 +1,30 @@
++/*
++ * Copyright (c) 2018 Wind River Systems; see
++ * guts/COPYRIGHT for information.
++ *
++ * long syscall(long number, ...)
++ *	long rc = -1;
++ */
++	typedef long syscall_arg_t;
++	syscall_arg_t a,b,c,d,e,f;
++
++	//va_start (ap, number);
++	a = va_arg (ap, syscall_arg_t);
++	b = va_arg (ap, syscall_arg_t);
++	c = va_arg (ap, syscall_arg_t);
++	d = va_arg (ap, syscall_arg_t);
++	e = va_arg (ap, syscall_arg_t);
++	f = va_arg (ap, syscall_arg_t);
++	va_end (ap);
++
++	if ((number == SYS_renameat2)) {
++			errno = ENOTSUP;
++			rc = -1;
++	}
++	else {
++			rc = real_syscall (number, a, b, c, d, e, f);
++	}
++
++/*	return rc;
++ * }
++ */
+diff --git a/ports/linux/wrapfuncs.in b/ports/linux/wrapfuncs.in
+index fca5b50..137612c 100644
+--- a/ports/linux/wrapfuncs.in
++++ b/ports/linux/wrapfuncs.in
+@@ -54,3 +54,4 @@ int getpw(uid_t uid, char *buf);
+ int getpwent_r(struct passwd *pwbuf, char *buf, size_t buflen, struct passwd **pwbufp);
+ int getgrent_r(struct group *gbuf, char *buf, size_t buflen, struct group **gbufp);
+ int capset(cap_user_header_t hdrp, const cap_user_data_t datap); /* real_func=pseudo_capset */
++long syscall(long number, ...);
+diff --git a/pseudo_wrappers.c b/pseudo_wrappers.c
+index e05f73a..b7225d7 100644
+--- a/pseudo_wrappers.c
++++ b/pseudo_wrappers.c
+@@ -36,6 +36,7 @@
+ #include <sys/time.h>
+ #include <sys/wait.h>
+ #include <dlfcn.h>
++#include <sys/syscall.h>
+ 
+ /* include this to get PSEUDO_PORT_* definitions */
+ #include "pseudo.h"
diff --git a/meta/recipes-devtools/pseudo/pseudo_git.bb b/meta/recipes-devtools/pseudo/pseudo_git.bb
index 66da1cc53b8..44343c3bc57 100644
--- a/meta/recipes-devtools/pseudo/pseudo_git.bb
+++ b/meta/recipes-devtools/pseudo/pseudo_git.bb
@@ -6,6 +6,7 @@  SRC_URI = "git://git.yoctoproject.org/pseudo \
            file://fallback-group \
            file://moreretries.patch \
            file://toomanyfiles.patch \
+           file://renameat2.patch \
            "
 
 SRCREV = "d7c31a25e4b02af0c64e6be0b4b0a9ac4ffc9da2"