tests: Disable test-bdrv-drain

Message ID 20181005143802.18412-1-peter.maydell@linaro.org
State Rejected
Series tests: Disable test-bdrv-drain

Commit Message

Peter Maydell Oct. 5, 2018, 2:38 p.m. UTC
The test-bdrv-drain test fails at least 50% of the time
on my OS build system. Disable the test until we can figure
out what's going on, as this makes pull request processing
very difficult.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

---
 tests/Makefile.include | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-- 
2.19.0

Comments

Peter Maydell Oct. 5, 2018, 2:41 p.m. UTC | #1
On 5 October 2018 at 15:38, Peter Maydell <peter.maydell@linaro.org> wrote:
> The test-bdrv-drain test fails at least 50% of the time
> on my OS build system. Disable the test until we can figure

This is a typo: I meant "OSX build system".

> out what's going on, as this makes pull request processing
> very difficult.
>
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM
Kevin Wolf Oct. 5, 2018, 4:17 p.m. UTC | #2
Am 05.10.2018 um 16:41 hat Peter Maydell geschrieben:
> On 5 October 2018 at 15:38, Peter Maydell <peter.maydell@linaro.org> wrote:
> > The test-bdrv-drain test fails at least 50% of the time
> > on my OS build system. Disable the test until we can figure
>
> This is a typo: I meant "OSX build system".
>
> > out what's going on, as this makes pull request processing
> > very difficult.
> >
> > Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Can we disable it conditionally only on OS X instead?

I'd hate to lose this test case, and without OS X I can't really do
anything to fix the problem (or to even find out if it's a test case
problem or a bug that the test case reveals).

Kevin
Peter Maydell Oct. 5, 2018, 4:54 p.m. UTC | #3
On 5 October 2018 at 17:17, Kevin Wolf <kwolf@redhat.com> wrote:
> Am 05.10.2018 um 16:41 hat Peter Maydell geschrieben:
>> On 5 October 2018 at 15:38, Peter Maydell <peter.maydell@linaro.org> wrote:
>> > The test-bdrv-drain test fails at least 50% of the time
>> > on my OS build system. Disable the test until we can figure
>>
>> This is a typo: I meant "OSX build system".
>>
>> > out what's going on, as this makes pull request processing
>> > very difficult.
>> >
>> > Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
>
> Can we disable it conditionally only on OS X instead?
>
> I'd hate to lose this test case, and without OS X I can't really do
> anything to fix the problem (or to even find out if it's a test case
> problem or a bug that the test case reveals).

If we disable it for OSX only then nobody has any incentive
to investigate and fix it...

thanks
-- PMM
Kevin Wolf Oct. 5, 2018, 6:09 p.m. UTC | #4
Am 05.10.2018 um 18:54 hat Peter Maydell geschrieben:
> On 5 October 2018 at 17:17, Kevin Wolf <kwolf@redhat.com> wrote:
> > Am 05.10.2018 um 16:41 hat Peter Maydell geschrieben:
> >> On 5 October 2018 at 15:38, Peter Maydell <peter.maydell@linaro.org> wrote:
> >> > The test-bdrv-drain test fails at least 50% of the time
> >> > on my OS build system. Disable the test until we can figure
> >>
> >> This is a typo: I meant "OSX build system".
> >>
> >> > out what's going on, as this makes pull request processing
> >> > very difficult.
> >> >
> >> > Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> >
> > Can we disable it conditionally only on OS X instead?
> >
> > I'd hate to lose this test case, and without OS X I can't really do
> > anything to fix the problem (or to even find out if it's a test case
> > problem or a bug that the test case reveals).
>
> If we disable it for OSX only then nobody has any incentive
> to investigate and fix it...

And if we disable it wholesale, then nobody has any incentive to fix any
bug that the test case could have uncovered.

Look, if this were on BSD or something, I'd even set up a BSD VM and try
to investigate. With OS X, that's not an option. If OS X users care
about the bug, they need to fix it. If you want to give them an
incentive, then the test case needs to stay enabled. If they don't care,
we can disable the test case for OS X (and leave QEMU broken if it's a
real bug, but eventually someone will certainly report a bug in a real
life scenario in that case).

Anyway, killing tests for Linux users because they can't fix OS X bugs
doesn't sound like a very useful policy to me.

Kevin
Peter Maydell Oct. 8, 2018, 9:12 a.m. UTC | #5
On 5 October 2018 at 19:09, Kevin Wolf <kwolf@redhat.com> wrote:
> And if we disable it wholesale, then nobody has any incentive to fix any
> bug that the test case could have uncovered.

Yes, that's fair. I'm sorry; I was a bit grumpy when I wrote
that email because my test runs had been bumping into it all day.

> Look, if this were on BSD or something, I'd even set up a BSD VM and try
> to investigate. With OS X, that's not an option. If OS X users care
> about the bug, they need to fix it. If you want to give them an
> incentive, then the test case needs to stay enabled. If they don't care,
> we can disable the test case for OS X (and leave QEMU broken if it's a
> real bug, but eventually someone will certainly report a bug in a real
> life scenario in that case).

I looked back at the backtrace/etc that I posted earlier in this
thread, and it looked to me like maybe a memory corruption issue.
So I tried running the test under valgrind on Linux, and:
$ valgrind ./build/x86/tests/test-bdrv-drain
[...]
/bdrv-drain/iothread/drain_all: OK
/bdrv-drain/iothread/drain: ==7972== Thread 4:
==7972== Invalid read of size 1
==7972==    at 0x24BCBE: aio_notify (async.c:352)
==7972==    by 0x24B6FB: qemu_bh_schedule (async.c:168)
==7972==    by 0x24C0FE: aio_co_schedule (async.c:465)
==7972==    by 0x24C174: aio_co_enter (async.c:484)
==7972==    by 0x24C143: aio_co_wake (async.c:478)
==7972==    by 0x129558: co_reenter_bh (test-bdrv-drain.c:63)
==7972==    by 0x24B525: aio_bh_call (async.c:90)
==7972==    by 0x24B5BD: aio_bh_poll (async.c:118)
==7972==    by 0x250C6D: aio_poll (aio-posix.c:690)
==7972==    by 0x20BB32: iothread_run (iothread.c:51)
==7972==    by 0x253888: qemu_thread_start (qemu-thread-posix.c:504)
==7972==    by 0x19D5C6B9: start_thread (pthread_create.c:333)
==7972==  Address 0x20c0e3b0 is 176 bytes inside a block of size 312 free'd
==7972==    at 0x4C2EDEB: free (in
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==7972==    by 0x18DFE289: g_source_unref_internal (in
/lib/x86_64-linux-gnu/libglib-2.0.so.0.4800.2)
==7972==    by 0x24C241: aio_context_unref (async.c:506)
==7972==    by 0x20BBB2: iothread_join (iothread.c:65)
==7972==    by 0x12DD5E: test_iothread_common (test-bdrv-drain.c:727)
==7972==    by 0x12DDCB: test_iothread_drain (test-bdrv-drain.c:740)
==7972==    by 0x18E2687A: g_test_run_suite_internal (in
/lib/x86_64-linux-gnu/libglib-2.0.so.0.4800.2)
==7972==    by 0x18E26A42: g_test_run_suite_internal (in
/lib/x86_64-linux-gnu/libglib-2.0.so.0.4800.2)
==7972==    by 0x18E26A42: g_test_run_suite_internal (in
/lib/x86_64-linux-gnu/libglib-2.0.so.0.4800.2)
==7972==    by 0x18E26C4D: g_test_run_suite (in
/lib/x86_64-linux-gnu/libglib-2.0.so.0.4800.2)
==7972==    by 0x18E26C70: g_test_run (in
/lib/x86_64-linux-gnu/libglib-2.0.so.0.4800.2)
==7972==    by 0x13073A: main (test-bdrv-drain.c:1350)
==7972==  Block was alloc'd at
==7972==    at 0x4C2FB55: calloc (in
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==7972==    by 0x18E06810: g_malloc0 (in
/lib/x86_64-linux-gnu/libglib-2.0.so.0.4800.2)
==7972==    by 0x18DFEC29: g_source_new (in
/lib/x86_64-linux-gnu/libglib-2.0.so.0.4800.2)
==7972==    by 0x24BE6B: aio_context_new (async.c:413)
==7972==    by 0x20BAE6: iothread_run (iothread.c:46)
==7972==    by 0x253888: qemu_thread_start (qemu-thread-posix.c:504)
==7972==    by 0x19D5C6B9: start_thread (pthread_create.c:333)
==7972==    by 0x1A07941C: clone (clone.S:109)
==7972==
OK
/bdrv-drain/iothread/drain_subtree: OK
[...]

which is indeed a use-after-free in the test immediately before
the one that crashes on OSX. Could I ask you to have a look at
that error, please? Hopefully it will turn out to be the underlying
cause of the OSX problems.

thanks
-- PMM
Peter Maydell Oct. 8, 2018, 3:43 p.m. UTC | #6
On 8 October 2018 at 10:12, Peter Maydell <peter.maydell@linaro.org> wrote:
> I looked back at the backtrace/etc that I posted earlier in this
> thread, and it looked to me like maybe a memory corruption issue.
> So I tried running the test under valgrind on Linux, and:

...which goes away if I do a complete build from clean, so
presumably is the result of a stale .o file?

The OSX version I'm running doesn't support valgrind, but
the C compiler does have the clang sanitizers. Here's a
log from a build with -fsanitize=address -fsanitize=undefined
of commit df51a005192ee40b:

$ ./tests/test-bdrv-drain
/bdrv-drain/nested: ==60415==WARNING: ASan is ignoring requested
__asan_handle_no_return: stack top: 0x7ffee500e000; bottom
0x00010fa0d000; size: 0x7ffdd5601000 (140728183296000)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
OK
/bdrv-drain/multiparent: OK
/bdrv-drain/driver-cb/drain_all: OK
/bdrv-drain/driver-cb/drain: OK
/bdrv-drain/driver-cb/drain_subtree: OK
/bdrv-drain/driver-cb/co/drain_all: OK
/bdrv-drain/driver-cb/co/drain: OK
/bdrv-drain/driver-cb/co/drain_subtree: OK
/bdrv-drain/quiesce/drain_all: OK
/bdrv-drain/quiesce/drain: OK
/bdrv-drain/quiesce/drain_subtree: OK
/bdrv-drain/quiesce/co/drain_all: OK
/bdrv-drain/quiesce/co/drain: OK
/bdrv-drain/quiesce/co/drain_subtree: OK
/bdrv-drain/graph-change/drain_subtree: OK
/bdrv-drain/graph-change/drain_all: OK
/bdrv-drain/iothread/drain_all:
=================================================================
==60415==ERROR: AddressSanitizer: heap-use-after-free on address
0x60d000010060 at pc 0x00010b329270 bp 0x7000036c9d10 sp
0x7000036c9d08
READ of size 8 at 0x60d000010060 thread T3
    #0 0x10b32926f in notifier_list_notify notify.c:39
    #1 0x10b2b8622 in qemu_thread_atexit_run qemu-thread-posix.c:473
    #2 0x7fff5a0e1162 in _pthread_tsd_cleanup
(libsystem_pthread.dylib:x86_64+0x5162)
    #3 0x7fff5a0e0ee8 in _pthread_exit (libsystem_pthread.dylib:x86_64+0x4ee8)
    #4 0x7fff5a0df66b in _pthread_body (libsystem_pthread.dylib:x86_64+0x366b)
    #5 0x7fff5a0df50c in _pthread_start (libsystem_pthread.dylib:x86_64+0x350c)
    #6 0x7fff5a0debf8 in thread_start (libsystem_pthread.dylib:x86_64+0x2bf8)

0x60d000010060 is located 48 bytes inside of 144-byte region
[0x60d000010030,0x60d0000100c0)
freed by thread T3 here:
    #0 0x10bcc51bd in wrap_free
(libclang_rt.asan_osx_dynamic.dylib:x86_64+0x551bd)
    #1 0x7fff5a0e1162 in _pthread_tsd_cleanup
(libsystem_pthread.dylib:x86_64+0x5162)
    #2 0x7fff5a0e0ee8 in _pthread_exit (libsystem_pthread.dylib:x86_64+0x4ee8)
    #3 0x7fff5a0df66b in _pthread_body (libsystem_pthread.dylib:x86_64+0x366b)
    #4 0x7fff5a0df50c in _pthread_start (libsystem_pthread.dylib:x86_64+0x350c)
    #5 0x7fff5a0debf8 in thread_start (libsystem_pthread.dylib:x86_64+0x2bf8)

previously allocated by thread T3 here:
    #0 0x10bcc5003 in wrap_malloc
(libclang_rt.asan_osx_dynamic.dylib:x86_64+0x55003)
    #1 0x7fff59dc9969 in tlv_allocate_and_initialize_for_key
(libdyld.dylib:x86_64+0x3969)
    #2 0x7fff59dca0eb in tlv_get_addr (libdyld.dylib:x86_64+0x40eb)
    #3 0x10b3558d6 in rcu_register_thread rcu.c:301
    #4 0x10b131cb7 in iothread_run iothread.c:42
    #5 0x10b2b8eff in qemu_thread_start qemu-thread-posix.c:504
    #6 0x7fff5a0df660 in _pthread_body (libsystem_pthread.dylib:x86_64+0x3660)
    #7 0x7fff5a0df50c in _pthread_start (libsystem_pthread.dylib:x86_64+0x350c)
    #8 0x7fff5a0debf8 in thread_start (libsystem_pthread.dylib:x86_64+0x2bf8)

Thread T3 created by T0 here:
    #0 0x10bcbd00d in wrap_pthread_create
(libclang_rt.asan_osx_dynamic.dylib:x86_64+0x4d00d)
    #1 0x10b2b8bb5 in qemu_thread_create qemu-thread-posix.c:534
    #2 0x10b131720 in iothread_new iothread.c:75
    #3 0x10ac04edc in test_iothread_common test-bdrv-drain.c:668
    #4 0x10abff44e in test_iothread_drain_all test-bdrv-drain.c:768
    #5 0x10ba45b2b in g_test_run_suite_internal
(libglib-2.0.0.dylib:x86_64+0x4fb2b)
    #6 0x10ba45cec in g_test_run_suite_internal
(libglib-2.0.0.dylib:x86_64+0x4fcec)
    #7 0x10ba45cec in g_test_run_suite_internal
(libglib-2.0.0.dylib:x86_64+0x4fcec)
    #8 0x10ba450fb in g_test_run_suite (libglib-2.0.0.dylib:x86_64+0x4f0fb)
    #9 0x10ba4504e in g_test_run (libglib-2.0.0.dylib:x86_64+0x4f04e)
    #10 0x10abf4515 in main test-bdrv-drain.c:1606
    #11 0x7fff59dc7014 in start (libdyld.dylib:x86_64+0x1014)

SUMMARY: AddressSanitizer: heap-use-after-free notify.c:39 in
notifier_list_notify
Shadow bytes around the buggy address:
  0x1c1a00001fb0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x1c1a00001fc0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x1c1a00001fd0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x1c1a00001fe0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x1c1a00001ff0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x1c1a00002000: fa fa fa fa fa fa fd fd fd fd fd fd[fd]fd fd fd
  0x1c1a00002010: fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa
  0x1c1a00002020: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x1c1a00002030: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x1c1a00002040: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x1c1a00002050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==60415==ABORTING
Illegal instruction: 4

Looking at the backtraces I'm wondering if this is the result of
an implicit reliance on the order in which per-thread destructors
are called (which is left unspecified by POSIX) -- the destructor
function qemu_thread_atexit_run() is called after some other
destructor, but accesses its memory.

Specifically, the memory it's trying to read looks like
the __thread local variable pollfds_cleanup_notifier in
util/aio-posix.c. So I think what is happening is:
 * util/aio-posix.c calls qemu_thread_atexit_add(), passing
   it a pointer to a thread-local variable pollfds_cleanup_notifier
 * qemu_thread_atexit_add() works by arranging to run the
   notifiers when its 'exit_key' variable's destructor is called
 * the destructor for pollfds_cleanup_notifier runs before that
   for exit_key, and so the qemu_thread_atexit_run() function
   ends up touching freed memory

I'm pretty confident this analysis of the problem is correct:
unfortunately I have no idea what the right way to fix it is...
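
To make the failure mode concrete, here's a minimal standalone sketch of
the pattern (the names and the hand-rolled list here are invented for
illustration; the real code uses QEMU's NotifierList in
util/qemu-thread-posix.c and util/aio-posix.c):

#include <pthread.h>
#include <stdio.h>

typedef struct Notifier Notifier;
struct Notifier {
    void (*notify)(Notifier *n);
    Notifier *next;                  /* simplified notifier-list link */
};

/* Per-thread list head stored in a pthread key, as qemu_thread_atexit_add()
 * does with exit_key; the key's destructor walks the list at thread exit. */
static pthread_key_t exit_key;

static void thread_atexit_run(void *arg)
{
    /* Runs at some unspecified point among the per-thread destructors; any
     * Notifier that lives in __thread storage may already be gone here. */
    for (Notifier *n = arg; n; n = n->next) {
        n->notify(n);
    }
}

static void thread_atexit_add(Notifier *n)
{
    n->next = pthread_getspecific(exit_key);
    pthread_setspecific(exit_key, n);
}

/* Analogue of pollfds_cleanup_notifier: the Notifier itself is __thread. */
static __thread Notifier cleanup_notifier;

static void cleanup(Notifier *n)
{
    puts("per-thread cleanup");
}

static void *worker(void *opaque)
{
    cleanup_notifier.notify = cleanup;
    thread_atexit_add(&cleanup_notifier);   /* stores a pointer into TLS */
    return NULL;
}

int main(void)
{
    pthread_t t;

    pthread_key_create(&exit_key, thread_atexit_run);
    pthread_create(&t, NULL, worker, NULL);
    pthread_join(t, NULL);
    return 0;
}

Whether this misbehaves depends entirely on whether the runtime reclaims the
thread's __thread storage before the exit_key destructor runs, which is
exactly the ordering POSIX leaves unspecified.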

thanks
-- PMM
Kevin Wolf Oct. 8, 2018, 4:40 p.m. UTC | #7
Am 08.10.2018 um 17:43 hat Peter Maydell geschrieben:
> Looking at the backtraces I'm wondering if this is the result of
> an implicit reliance on the order in which per-thread destructors
> are called (which is left unspecified by POSIX) -- the destructor
> function qemu_thread_atexit_run() is called after some other
> destructor, but accesses its memory.
>
> Specifically, the memory it's trying to read looks like
> the __thread local variable pollfds_cleanup_notifier in
> util/aio-posix.c. So I think what is happening is:
>  * util/aio-posix.c calls qemu_thread_atexit_add(), passing
>    it a pointer to a thread-local variable pollfds_cleanup_notifier
>  * qemu_thread_atexit_add() works by arranging to run the
>    notifiers when its 'exit_key' variable's destructor is called
>  * the destructor for pollfds_cleanup_notifier runs before that
>    for exit_key, and so the qemu_thread_atexit_run() function
>    ends up touching freed memory
>
> I'm pretty confident this analysis of the problem is correct:
> unfortunately I have no idea what the right way to fix it is...

Yes, I agree with your analysis. If __thread variables can be destructed
before pthread_key_create() destructors are called (and in particular if
the former are implemented in terms of the latter), this implies at
least two rules:

1. The Notifier itself can't be a TLS variable

2. The notifier callback can't access any TLS variables

Of course, with these restrictions, qemu_thread_atexit_*() with its
existing API is as useless as it could be.

The best I can think of at the moment would be to use a separate
pthread_key_create() (and therefore a separate destructor) for
registering each TLS variable, so that the destructor always gets a
valid pointer. Maybe move all __thread variables of a file into a single
malloced struct to make it more manageable (we could then keep a __thread
pointer to it for convenience, but only free the struct with the pointer
passed by the pthread_key destructor so that we don't have to access
__thread variables in the destructor).
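
As a rough illustration of that direction (names invented here, not existing
QEMU code):

#include <pthread.h>
#include <glib.h>

/* All of a file's former __thread variables move into one malloced struct,
 * registered with its own pthread_key_create() destructor, so the destructor
 * is always handed a pointer that is still valid regardless of the order in
 * which the thread's __thread storage is torn down. */
typedef struct AioThreadState {
    GPollFD *pollfds;
    int npfd;
    int nalloc;
} AioThreadState;

static pthread_key_t aio_state_key;
static __thread AioThreadState *aio_state;   /* convenience pointer only */

static void aio_state_destroy(void *opaque)
{
    AioThreadState *s = opaque;   /* never touch __thread variables here */

    g_free(s->pollfds);
    g_free(s);
}

static void aio_state_key_init(void)
{
    pthread_key_create(&aio_state_key, aio_state_destroy);
}

static AioThreadState *aio_state_get(void)
{
    static pthread_once_t once = PTHREAD_ONCE_INIT;

    pthread_once(&once, aio_state_key_init);
    if (!aio_state) {
        aio_state = g_new0(AioThreadState, 1);
        pthread_setspecific(aio_state_key, aio_state);
    }
    return aio_state;
}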

By the way, can you reproduce this with virtio-blk/scsi and an iothread
in a real QEMU or is it only the test case that fails? In theory, I
don't see what would prevent QEMU from hanging at shutdown.

Kevin
Peter Maydell Oct. 8, 2018, 5:07 p.m. UTC | #8
On 8 October 2018 at 17:40, Kevin Wolf <kwolf@redhat.com> wrote:
> By the way, can you reproduce this with virtio-blk/scsi and an iothread
> in a real QEMU or is it only the test case that fails? In theory, I
> don't see what would prevent QEMU from hanging at shutdown.

I haven't tested, but I suspect this is less likely to
cause a problem, because in a real QEMU run we'll only
kill these threads and cause the memory corruption at
the end of the run when QEMU is exiting anyway. The problem
in the test case is that we kill threads and corrupt memory,
and then continue to use the same process for the next test
in the set, which then crashes later as a result of the
memory corruption.

thanks
-- PMM
Eric Blake Oct. 8, 2018, 7:53 p.m. UTC | #9
On 10/8/18 11:40 AM, Kevin Wolf wrote:
> Am 08.10.2018 um 17:43 hat Peter Maydell geschrieben:
>> Looking at the backtraces I'm wondering if this is the result of
>> an implicit reliance on the order in which per-thread destructors
>> are called (which is left unspecified by POSIX) -- the destructor
>> function qemu_thread_atexit_run() is called after some other
>> destructor, but accesses its memory.
>>
>> Specifically, the memory it's trying to read looks like
>> the __thread local variable pollfds_cleanup_notifier in
>> util/aio-posix.c. So I think what is happening is:
>>   * util/aio-posix.c calls qemu_thread_atexit_add(), passing
>>     it a pointer to a thread-local variable pollfds_cleanup_notifier
>>   * qemu_thread_atexit_add() works by arranging to run the
>>     notifiers when its 'exit_key' variable's destructor is called
>>   * the destructor for pollfds_cleanup_notifier runs before that
>>     for exit_key, and so the qemu_thread_atexit_run() function
>>     ends up touching freed memory
>>
>> I'm pretty confident this analysis of the problem is correct:
>> unfortunately I have no idea what the right way to fix it is...
>
> Yes, I agree with your analysis. If __thread variables can be destructed
> before pthread_key_create() destructors are called (and in particular if
> the former are implemented in terms of the latter), this implies at
> least two rules:
>
> 1. The Notifier itself can't be a TLS variable
>
> 2. The notifier callback can't access any TLS variables
>
> Of course, with these restrictions, qemu_thread_atexit_*() with its
> existing API is as useless as it could be.
>
> The best I can think of at the moment would be to use a separate
> pthread_key_create() (and therefore a separate destructor) for
> registering each TLS variable, so that the destructor always gets a
> valid pointer. Maybe move all __thread variables of a file into a single
> malloced struct to make it more manageable (we could then keep a __thread
> pointer to it for convenience, but only free the struct with the pointer
> passed by the pthread_key destructor so that we don't have to access
> __thread variables in the destructor).

pthread_key_create() says that when a destructor is triggered, the value of
the key is set to NULL; but you can once again set the key back to a
non-NULL value, and the implementation will loop at least
PTHREAD_DESTRUCTOR_ITERATIONS times over all destructors, or until it has
convinced the destructors to leave the values at NULL.  Thus, while you
cannot guarantee ordering between destructors within a single iteration
of the cleanup loop, you CAN do some sort of witness locking or
down-counter where a destructor purposefully calls pthread_setspecific()
to revive its value so that it survives into the next iteration of
destructor calls, for variables which are known to be referenced by other
destructors while the witness count is still high enough, as a way of
imposing order between loops.
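
A toy example of that revival trick, purely to illustrate the POSIX
behaviour rather than as a proposal for QEMU:

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

static pthread_key_t late_key;

static void late_destructor(void *value)
{
    int *rounds = value;

    if (*rounds > 0) {
        /* Not done yet: re-set the key so this destructor runs again in the
         * next pass (the implementation makes at least
         * PTHREAD_DESTRUCTOR_ITERATIONS passes). */
        (*rounds)--;
        pthread_setspecific(late_key, value);
        return;
    }
    printf("running after the other destructors have had a full pass\n");
    free(value);
}

static void *worker(void *opaque)
{
    int *rounds = malloc(sizeof(*rounds));

    *rounds = 1;   /* survive one extra iteration of the destructor loop */
    pthread_setspecific(late_key, rounds);
    return NULL;
}

int main(void)
{
    pthread_t t;

    pthread_key_create(&late_key, late_destructor);
    pthread_create(&t, NULL, worker, NULL);
    pthread_join(t, NULL);
    return 0;
}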

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
Kevin Wolf Oct. 9, 2018, 9:48 a.m. UTC | #10
Am 08.10.2018 um 21:53 hat Eric Blake geschrieben:
> On 10/8/18 11:40 AM, Kevin Wolf wrote:
> > Am 08.10.2018 um 17:43 hat Peter Maydell geschrieben:
> > > Looking at the backtraces I'm wondering if this is the result of
> > > an implicit reliance on the order in which per-thread destructors
> > > are called (which is left unspecified by POSIX) -- the destructor
> > > function qemu_thread_atexit_run() is called after some other
> > > destructor, but accesses its memory.
> > >
> > > Specifically, the memory it's trying to read looks like
> > > the __thread local variable pollfds_cleanup_notifier in
> > > util/aio-posix.c. So I think what is happening is:
> > >   * util/aio-posix.c calls qemu_thread_atexit_add(), passing
> > >     it a pointer to a thread-local variable pollfds_cleanup_notifier
> > >   * qemu_thread_atexit_add() works by arranging to run the
> > >     notifiers when its 'exit_key' variable's destructor is called
> > >   * the destructor for pollfds_cleanup_notifier runs before that
> > >     for exit_key, and so the qemu_thread_atexit_run() function
> > >     ends up touching freed memory
> > >
> > > I'm pretty confident this analysis of the problem is correct:
> > > unfortunately I have no idea what the right way to fix it is...
> >
> > Yes, I agree with your analysis. If __thread variables can be destructed
> > before pthread_key_create() destructors are called (and in particular if
> > the former are implemented in terms of the latter), this implies at
> > least two rules:
> >
> > 1. The Notifier itself can't be a TLS variable
> >
> > 2. The notifier callback can't access any TLS variables
> >
> > Of course, with these restrictions, qemu_thread_atexit_*() with its
> > existing API is as useless as it could be.
> >
> > The best I can think of at the moment would be to use a separate
> > pthread_key_create() (and therefore a separate destructor) for
> > registering each TLS variable, so that the destructor always gets a
> > valid pointer. Maybe move all __thread variables of a file into a single
> > malloced struct to make it more manageable (we could then keep a __thread
> > pointer to it for convenience, but only free the struct with the pointer
> > passed by the pthread_key destructor so that we don't have to access
> > __thread variables in the destructor).
>
> pthread_key_create() says that when a destructor is triggered, the value
> of the key is set to NULL; but you can once again set the key back to a
> non-NULL value, and the implementation will loop at least
> PTHREAD_DESTRUCTOR_ITERATIONS times over all destructors, or until it has
> convinced the destructors to leave the values at NULL.  Thus, while you
> cannot guarantee ordering between destructors within a single iteration
> of the cleanup loop, you CAN do some sort of witness locking or
> down-counter where a destructor purposefully calls pthread_setspecific()
> to revive its value so that it survives into the next iteration of
> destructor calls, for variables which are known to be referenced by other
> destructors while the witness count is still high enough, as a way of
> imposing order between loops.

Yes, everything that is explicitly managed with pthread_key_create() is
fine. The point is that the destructors for __thread variables aren't
controlled by QEMU, but by the system libraries, so their memory can go
away earlier than other destructors are called.

Kevin
Paolo Bonzini Oct. 9, 2018, 11:16 a.m. UTC | #11
On 08/10/2018 18:40, Kevin Wolf wrote:
>>
>> I'm pretty confident this analysis of the problem is correct:
>> unfortunately I have no idea what the right way to fix it is...
> Yes, I agree with your analysis. If __thread variables can be destructed
> before pthread_key_create() destructors are called (and in particular if
> the former are implemented in terms of the latter), this implies at
> least two rules:
>
> 1. The Notifier itself can't be a TLS variable
>
> 2. The notifier callback can't access any TLS variables
>
> Of course, with these restrictions, qemu_thread_atexit_*() with its
> existing API is as useless as it could be.

Yup, we have to stop using pthread_key_create.  Luckily, these days
there is always qemu_thread_start that wraps the thread, so we can call
qemu_thread_atexit_run from there, and change exit_key to a thread-local
NotifierList.
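
Presumably something along these lines (an untested sketch of the direction
against util/qemu-thread-posix.c; the QemuThreadArgs field names are
approximate and the existing setup in qemu_thread_start() is abbreviated):

/* The per-thread notifier list becomes an ordinary thread-local list. */
static __thread NotifierList thread_exit_notifiers =
    NOTIFIER_LIST_INITIALIZER(thread_exit_notifiers);

void qemu_thread_atexit_add(Notifier *notifier)
{
    notifier_list_add(&thread_exit_notifiers, notifier);
}

static void qemu_thread_atexit_run(void)
{
    /* Called from qemu_thread_start() while the thread's __thread storage
     * is still fully alive, so notifiers may safely touch TLS variables. */
    notifier_list_notify(&thread_exit_notifiers, NULL);
}

static void *qemu_thread_start(void *args)
{
    QemuThreadArgs *qemu_thread_args = args;
    void *(*start_routine)(void *) = qemu_thread_args->start_routine;
    void *arg = qemu_thread_args->arg;
    void *r;

    /* existing thread-name/rcu setup unchanged */
    r = start_routine(arg);
    qemu_thread_atexit_run();
    return r;
}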

Paolo
Peter Maydell Nov. 2, 2018, 1:33 p.m. UTC | #12
On 9 October 2018 at 12:16, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 08/10/2018 18:40, Kevin Wolf wrote:
>>>
>>> I'm pretty confident this analysis of the problem is correct:
>>> unfortunately I have no idea what the right way to fix it is...
>> Yes, I agree with your analysis. If __thread variables can be destructed
>> before pthread_key_create() destructors are called (and in particular if
>> the former are implemented in terms of the latter), this implies at
>> least two rules:
>>
>> 1. The Notifier itself can't be a TLS variable
>>
>> 2. The notifier callback can't access any TLS variables
>>
>> Of course, with these restrictions, qemu_thread_atexit_*() with its
>> existing API is as useless as it could be.
>
> Yup, we have to stop using pthread_key_create.  Luckily, these days
> there is always qemu_thread_start that wraps the thread, so we can call
> qemu_thread_atexit_run from there, and change exit_key to a thread-local
> NotifierList.

We would also need to catch exits via qemu_thread_exit(), right?
We probably also need to handle the main thread specially, via
atexit(). This seems to be pretty much what we already do in
util/qemu-thread-win32.c...

thanks
-- PMM
Paolo Bonzini Nov. 4, 2018, 10:03 p.m. UTC | #13
On 02/11/2018 14:33, Peter Maydell wrote:
> On 9 October 2018 at 12:16, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> On 08/10/2018 18:40, Kevin Wolf wrote:
>>>>
>>>> I'm pretty confident this analysis of the problem is correct:
>>>> unfortunately I have no idea what the right way to fix it is...
>>> Yes, I agree with your analysis. If __thread variables can be destructed
>>> before pthread_key_create() destructors are called (and in particular if
>>> the former are implemented in terms of the latter), this implies at
>>> least two rules:
>>>
>>> 1. The Notifier itself can't be a TLS variable
>>>
>>> 2. The notifier callback can't access any TLS variables
>>>
>>> Of course, with these restrictions, qemu_thread_atexit_*() with its
>>> existing API is as useless as it could be.
>>
>> Yup, we have to stop using pthread_key_create.  Luckily, these days
>> there is always qemu_thread_start that wraps the thread, so we can call
>> qemu_thread_atexit_run from there, and change exit_key to a thread-local
>> NotifierList.
>
> We would also need to catch exits via qemu_thread_exit(), right?
> We probably also need to handle the main thread specially, via
> atexit(). This seems to be pretty much what we already do in
> util/qemu-thread-win32.c...

There's only one caller of qemu_thread_exit and it can be removed easily.

However, an improvement on the idea is to use pthread_cleanup_push/pop
in qemu_thread_start.  This does handle qemu_thread_exit.
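
Something like this, then (untested sketch, reusing the thread-local
thread_exit_notifiers list from the earlier sketch): POSIX runs cancellation
cleanup handlers before the thread-specific data destructors, and they also
run on pthread_exit(), which is what qemu_thread_exit() calls.

static void qemu_thread_atexit_notify(void *arg)
{
    notifier_list_notify(&thread_exit_notifiers, NULL);
}

static void *qemu_thread_start(void *args)
{
    QemuThreadArgs *qemu_thread_args = args;
    void *(*start_routine)(void *) = qemu_thread_args->start_routine;
    void *arg = qemu_thread_args->arg;
    void *r;

    pthread_cleanup_push(qemu_thread_atexit_notify, NULL);
    r = start_routine(arg);
    pthread_cleanup_pop(1);   /* 1: run the handler on normal return too */
    return r;
}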

Thanks,

Paolo
Peter Maydell Nov. 5, 2018, 10:34 a.m. UTC | #14
On 4 November 2018 at 22:03, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 02/11/2018 14:33, Peter Maydell wrote:
>> On 9 October 2018 at 12:16, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>> Yup, we have to stop using pthread_key_create.  Luckily, these days
>>> there is always qemu_thread_start that wraps the thread, so we can call
>>> qemu_thread_atexit_run from there, and change exit_key to a thread-local
>>> NotifierList.
>>
>> We would also need to catch exits via qemu_thread_exit(), right?
>> We probably also need to handle the main thread specially, via
>> atexit(). This seems to be pretty much what we already do in
>> util/qemu-thread-win32.c...
>
> There's only one caller of qemu_thread_exit and it can be removed easily.
>
> However, an improvement on the idea is to use pthread_cleanup_push/pop
> in qemu_thread_start.  This does handle qemu_thread_exit.

Interesting. (I have a patch I put together on Friday and haven't
tested yet that just uses the win32-style "explicitly run notifiers
in exit and at end of thread_start".)

thanks
-- PMM

Patch

diff --git a/tests/Makefile.include b/tests/Makefile.include
index 175d013f4af..4aae761e769 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -91,7 +91,7 @@  gcov-files-test-thread-pool-y = thread-pool.c
 gcov-files-test-hbitmap-y = util/hbitmap.c
 check-unit-y += tests/test-hbitmap$(EXESUF)
 gcov-files-test-hbitmap-y = blockjob.c
-check-unit-y += tests/test-bdrv-drain$(EXESUF)
+#check-unit-y += tests/test-bdrv-drain$(EXESUF)
 check-unit-y += tests/test-blockjob$(EXESUF)
 check-unit-y += tests/test-blockjob-txn$(EXESUF)
 check-unit-y += tests/test-block-backend$(EXESUF)