Message ID | 20200211173510.16347-1-alex.bennee@linaro.org |
---|---|
State | New |
Series | linux-user: un-register threads from RCU before exit |
On Tue, 11 Feb 2020 at 17:36, Alex Bennée <alex.bennee@linaro.org> wrote:
>
> Through a mechanism I don't quite yet understand we can find ourselves
> with a left over RCU thread when we exit group. This is a racy failure
> that occurs for example with:
>
>   alpha-linux-user running testthread
>   with libhowvec.so plugin
>   but only when run from make
>
> This may not be the correct fix but it seems to alleviate the
> symptoms.

This is weird. The only time we call preexit_cleanup()
is when the next thing we do is to terminate the entire
process all at once. (For some reason in one place
we do that by calling _exit() and in another place
by calling exit_group() -- I don't see why we need that
inconsistency).

I'm pretty sure the system emulation threads don't
call rcu_unregister_thread() for the "whole process
is going away" case, so something odd is happening here...

thanks
-- PMM
On 11/02/20 18:35, Alex Bennée wrote:
> Through a mechanism I don't quite yet understand we can find ourselves
> with a left over RCU thread when we exit group. This is a racy failure
> that occurs for example with:
>
>   alpha-linux-user running testthread
>   with libhowvec.so plugin
>   but only when run from make
>
> This may not be the correct fix but it seems to alleviate the
> symptoms.

Can you explain what is the effect of this left-over thread? All
threads should be terminated when the process exits and I'm not sure why
the user-mode emulation is special.

Paolo
Peter Maydell <peter.maydell@linaro.org> writes:

> On Tue, 11 Feb 2020 at 17:36, Alex Bennée <alex.bennee@linaro.org> wrote:
>>
>> Through a mechanism I don't quite yet understand we can find ourselves
>> with a left over RCU thread when we exit group. This is a racy failure
>> that occurs for example with:
>>
>>   alpha-linux-user running testthread
>>   with libhowvec.so plugin
>>   but only when run from make
>>
>> This may not be the correct fix but it seems to alleviate the
>> symptoms.
>
> This is weird. The only time we call preexit_cleanup()
> is when the next thing we do is to terminate the entire
> process all at once. (For some reason in one place
> we do that by calling _exit() and in another place
> by calling exit_group() -- I don't see why we need that
> inconsistency).
>
> I'm pretty sure the system emulation threads don't
> call rcu_unregister_thread() for the "whole process
> is going away" case, so something odd is happening here...

So what I see is (although possibly confused further by rr's capture):

  End of pthread test.
  [New Thread 7966.7967]

  Thread 3 received signal SIGKILL, Killed.
  [Switching to Thread 7966.7967]
  0x0000000070000002 in ?? ()
  (rr) bt
  #0  0x0000000070000002 in ?? ()
  #1  0x00007f36981a490e in _raw_syscall () at /build/rr-79viaC/rr-5.2.0/src/preload/raw_syscall.S:120
  #2  0x00007f36981a13fe in traced_raw_syscall (call=call@entry=0x7f369656ffa0) at ./src/preload/syscallbuf.c:222
  #3  0x00007f36981a271a in sys_xstat64 (call=<optimized out>) at ./src/preload/syscallbuf.c:2439
  #4  syscall_hook_internal (call=0x7f369656ffa0) at ./src/preload/syscallbuf.c:2651
  #5  syscall_hook (call=0x7f369656ffa0) at ./src/preload/syscallbuf.c:2687
  #6  0x00007f36981a12da in _syscall_hook_trampoline () at /build/rr-79viaC/rr-5.2.0/src/preload/syscall_hook.S:282
  #7  0x00007f36981a130a in __morestack () at /build/rr-79viaC/rr-5.2.0/src/preload/syscall_hook.S:417
  #8  0x00007f36981a1310 in _syscall_hook_trampoline_48_3d_01_f0_ff_ff () at /build/rr-79viaC/rr-5.2.0/src/preload/syscall_hook.S:423
  #9  0x00007f369758bf5f in syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
  #10 0x0000556b768b764b in qemu_futex_wait (val=<optimized out>, f=<optimized out>) at /home/alex/lsrc/qemu.git/util/qemu-thread-posix.c:455
  #11 qemu_event_wait (ev=ev@entry=0x556b7897a608 <rcu_call_ready_event>) at /home/alex/lsrc/qemu.git/util/qemu-thread-posix.c:459
  #12 0x0000556b768be29a in call_rcu_thread (opaque=opaque@entry=0x0) at /home/alex/lsrc/qemu.git/util/rcu.c:260
  #13 0x0000556b768b689a in qemu_thread_start (args=<optimized out>) at /home/alex/lsrc/qemu.git/util/qemu-thread-posix.c:519
  #14 0x00007f3697660fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
  #15 0x00007f36975914cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
  (rr) info threads
    Id   Target Id                                     Frame
  * 3    Thread 7966.7967 (mmap_hardlink_3_qemu-alpha) 0x0000000070000002 in ?? ()
  (rr)

Although possibly it hasn't moved on from where it was during exit:

  (rr) b preexit_cleanup
  Breakpoint 1 at 0x556b768646d0: file /home/alex/lsrc/qemu.git/linux-user/exit.c, line 31.
  (rr) rc
  Continuing.
  [New Thread 7966.7966]
  [Switching to Thread 7966.7966]

  Thread 4 hit Breakpoint 1, preexit_cleanup (env=0x556b797aac40, code=code@entry=0) at /home/alex/lsrc/qemu.git/linux-user/exit.c:31
  31          rcu_unregister_thread();
  (rr) bt
  #0  preexit_cleanup (env=0x556b797aac40, code=code@entry=0) at /home/alex/lsrc/qemu.git/linux-user/exit.c:31
  #1  0x0000556b76850a63 in do_syscall1 (cpu_env=cpu_env@entry=0x556b797aac40, num=num@entry=405, arg1=arg1@entry=0, arg2=arg2@entry=0, arg3=arg3@entry=4832687680, arg4=arg4@entry=0, arg5=4095, arg6=4832686256, arg8=0, arg7=0) at /home/alex/lsrc/qemu.git/linux-user/syscall.c:9373
  #2  0x0000556b76859b88 in do_syscall (cpu_env=cpu_env@entry=0x556b797aac40, num=405, arg1=0, arg2=0, arg3=<optimized out>, arg4=<optimized out>, arg5=4095, arg6=4832686256, arg7=0, arg8=0) at /home/alex/lsrc/qemu.git/linux-user/syscall.c:12110
  #3  0x0000556b768645c6 in cpu_loop (env=0x556b797aac40) at /home/alex/lsrc/qemu.git/linux-user/alpha/cpu_loop.c:109
  #4  0x0000556b767e13de in main (argc=<optimized out>, argv=0x7ffe9d8f5ca8, envp=<optimized out>) at /home/alex/lsrc/qemu.git/linux-user/main.c:865
  (rr) info threads
    Id   Target Id                          Frame
    3    Thread 7966.7967 (mmap_hardlink_3) 0x0000000070000002 in ?? ()
  * 4    Thread 7966.7966 (mmap_hardlink_3) preexit_cleanup (env=0x556b797aac40, code=code@entry=0) at /home/alex/lsrc/qemu.git/linux-user/exit.c:31
  (rr) thread 3
  [Switching to thread 3 (Thread 7966.7967)]
  #0  0x0000000070000002 in ?? ()
  (rr) bt
  #0  0x0000000070000002 in ?? ()
  #1  0x00007f36981a490e in _raw_syscall () at /build/rr-79viaC/rr-5.2.0/src/preload/raw_syscall.S:120
  #2  0x00007f36981a13fe in traced_raw_syscall (call=call@entry=0x7f369656ffa0) at ./src/preload/syscallbuf.c:222
  #3  0x00007f36981a271a in sys_xstat64 (call=<optimized out>) at ./src/preload/syscallbuf.c:2439
  #4  syscall_hook_internal (call=0x7f369656ffa0) at ./src/preload/syscallbuf.c:2651
  #5  syscall_hook (call=0x7f369656ffa0) at ./src/preload/syscallbuf.c:2687
  #6  0x00007f36981a12da in _syscall_hook_trampoline () at /build/rr-79viaC/rr-5.2.0/src/preload/syscall_hook.S:282
  #7  0x00007f36981a130a in __morestack () at /build/rr-79viaC/rr-5.2.0/src/preload/syscall_hook.S:417
  #8  0x00007f36981a1310 in _syscall_hook_trampoline_48_3d_01_f0_ff_ff () at /build/rr-79viaC/rr-5.2.0/src/preload/syscall_hook.S:423
  #9  0x00007f369758bf5f in syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
  #10 0x0000556b768b764b in qemu_futex_wait (val=<optimized out>, f=<optimized out>) at /home/alex/lsrc/qemu.git/util/qemu-thread-posix.c:455
  #11 qemu_event_wait (ev=ev@entry=0x556b7897a608 <rcu_call_ready_event>) at /home/alex/lsrc/qemu.git/util/qemu-thread-posix.c:459
  #12 0x0000556b768be29a in call_rcu_thread (opaque=opaque@entry=0x0) at /home/alex/lsrc/qemu.git/util/rcu.c:260
  #13 0x0000556b768b689a in qemu_thread_start (args=<optimized out>) at /home/alex/lsrc/qemu.git/util/qemu-thread-posix.c:519
  #14 0x00007f3697660fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
  #15 0x00007f36975914cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
  (rr)

Of course it is occurring on my patched tree so I guess the unregister
approach doesn't actually help.

>
> thanks
> -- PMM

--
Alex Bennée
Through a mechanism I don't quite yet understand we can find ourselves
with a left over RCU thread when we exit group. This is a racy failure
that occurs for example with:

  alpha-linux-user running testthread
  with libhowvec.so plugin
  but only when run from make

This may not be the correct fix but it seems to alleviate the
symptoms.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
---
 linux-user/exit.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/linux-user/exit.c b/linux-user/exit.c
index a362ef67d2c..1c7ce347324 100644
--- a/linux-user/exit.c
+++ b/linux-user/exit.c
@@ -28,12 +28,13 @@ extern void __gcov_dump(void);
 
 void preexit_cleanup(CPUArchState *env, int code)
 {
+    rcu_unregister_thread();
 #ifdef TARGET_GPROF
-        _mcleanup();
+    _mcleanup();
 #endif
 #ifdef CONFIG_GCOV
-        __gcov_dump();
+    __gcov_dump();
 #endif
-        gdb_exit(env, code);
-        qemu_plugin_atexit_cb();
+    gdb_exit(env, code);
+    qemu_plugin_atexit_cb();
 }
-- 
2.20.1