Message ID | 20190111143815.26107-1-alex.bennee@linaro.org |
---|---|
State | New |
Headers | show |
Series | [RFC] tests: replace rem = sleep(time) with g_timer | expand |
On 11/01/19 15:38, Alex Bennée wrote: > Relying on sleep to always return having slept isn't safe as a signal > may have occurred. If signals are constantly incoming the program will > never reach it's termination condition. This is believed to be the > mechanism causing time outs for qht-test in Travis. > > Instead we use a g_timer to determine if the duration of the test has > passed and sleep for a second at a time. This may bias benchmark > results for short runs. Why not g_usleep? It already does a while loop around nanosleep (which returns the remaining time in the wait, like select but unlike sleep and poll). Thanks, Paolo
Paolo Bonzini <pbonzini@redhat.com> writes: > On 11/01/19 15:38, Alex Bennée wrote: >> Relying on sleep to always return having slept isn't safe as a signal >> may have occurred. If signals are constantly incoming the program will >> never reach it's termination condition. This is believed to be the >> mechanism causing time outs for qht-test in Travis. >> >> Instead we use a g_timer to determine if the duration of the test has >> passed and sleep for a second at a time. This may bias benchmark >> results for short runs. > > Why not g_usleep? It already does a while loop around nanosleep (which > returns the remaining time in the wait, like select but unlike sleep and > poll). Yeah I'm testing that now. However I have managed to trigger: ERROR:tests/test-qht-par.c:20:test_qht: assertion failed (rc == 0): (35584 == 0) but I'm not sure if this is some other side-effect of the test-qht-par/qht-bench invocation dance. -- Alex Bennée
On 11/01/19 16:28, Alex Bennée wrote: >> Why not g_usleep? It already does a while loop around nanosleep (which >> returns the remaining time in the wait, like select but unlike sleep and >> poll). > Yeah I'm testing that now. However I have managed to trigger: > > ERROR:tests/test-qht-par.c:20:test_qht: assertion failed (rc == 0): (35584 == 0) I think that's a good old SIGSEGV (0x8B00). Paolo
Paolo Bonzini <pbonzini@redhat.com> writes: > On 11/01/19 16:28, Alex Bennée wrote: >>> Why not g_usleep? It already does a while loop around nanosleep (which >>> returns the remaining time in the wait, like select but unlike sleep and >>> poll). >> Yeah I'm testing that now. However I have managed to trigger: >> >> ERROR:tests/test-qht-par.c:20:test_qht: assertion failed (rc == 0): (35584 == 0) > > I think that's a good old SIGSEGV (0x8B00). Hmmm, but I haven't been able to trigger it running it directly: retry.py -n 30 -c -- ./tests/qht-bench 1>/dev/null 2>&1 -R -S0.1 -D10000 -N1 -n 4 -u 20 -d 1 Could this be some sort of weird interaction caused by using system()? -- Alex Bennée
Paolo Bonzini <pbonzini@redhat.com> writes: > On 11/01/19 16:28, Alex Bennée wrote: >>> Why not g_usleep? It already does a while loop around nanosleep (which >>> returns the remaining time in the wait, like select but unlike sleep and >>> poll). >> Yeah I'm testing that now. However I have managed to trigger: >> >> ERROR:tests/test-qht-par.c:20:test_qht: assertion failed (rc == 0): (35584 == 0) > > I think that's a good old SIGSEGV (0x8B00). According to the PC in the logs: Line 98 of "mcount.c" starts at address 0x7ffff6e15145 <__mcount_internal+69> and ends at 0x7ffff6e15148 <__mcount_internal+72>. > > Paolo -- Alex Bennée
On Fri, 11 Jan 2019 16:41:41 +0100 Paolo Bonzini <pbonzini@redhat.com> wrote: > On 11/01/19 16:28, Alex Bennée wrote: > >> Why not g_usleep? It already does a while loop around nanosleep (which > >> returns the remaining time in the wait, like select but unlike sleep and > >> poll). > > Yeah I'm testing that now. However I have managed to trigger: > > > > ERROR:tests/test-qht-par.c:20:test_qht: assertion failed (rc == 0): (35584 == 0) > > I think that's a good old SIGSEGV (0x8B00). > Hmmm... system() returns a "wait status" that can be examined using the macros described in waitpid(2), and we have: /* If WIFEXITED(STATUS), the low-order 8 bits of the status. */ #define __WEXITSTATUS(status) (((status) & 0xff00) >> 8) So this rather looks like a 139 exit status to me... Not sure how this can happen though. > Paolo >
On Fri, Jan 11, 2019 at 04:06:54PM +0000, Alex Bennée wrote: > > Paolo Bonzini <pbonzini@redhat.com> writes: > > > On 11/01/19 16:28, Alex Bennée wrote: > >>> Why not g_usleep? It already does a while loop around nanosleep (which > >>> returns the remaining time in the wait, like select but unlike sleep and > >>> poll). > >> Yeah I'm testing that now. However I have managed to trigger: > >> > >> ERROR:tests/test-qht-par.c:20:test_qht: assertion failed (rc == 0): (35584 == 0) > > > > I think that's a good old SIGSEGV (0x8B00). > > According to the PC in the logs: > > Line 98 of "mcount.c" starts at address 0x7ffff6e15145 <__mcount_internal+69> and ends at 0x7ffff6e15148 <__mcount_internal+72>. Was this on Travis? Which architecture? -- Eduardo
Greg Kurz <groug@kaod.org> writes: > On Fri, 11 Jan 2019 16:41:41 +0100 > Paolo Bonzini <pbonzini@redhat.com> wrote: > >> On 11/01/19 16:28, Alex Bennée wrote: >> >> Why not g_usleep? It already does a while loop around nanosleep (which >> >> returns the remaining time in the wait, like select but unlike sleep and >> >> poll). >> > Yeah I'm testing that now. However I have managed to trigger: >> > >> > ERROR:tests/test-qht-par.c:20:test_qht: assertion failed (rc == 0): (35584 == 0) >> >> I think that's a good old SIGSEGV (0x8B00). >> > > Hmmm... system() returns a "wait status" that can be examined using the > macros described in waitpid(2), and we have: > > /* If WIFEXITED(STATUS), the low-order 8 bits of the status. */ > #define __WEXITSTATUS(status) (((status) & 0xff00) >> 8) > > So this rather looks like a 139 exit status to me... Not sure how > this can happen though. Yeah the child segfaulted in mcount while closing down. I've started a new thread with the details of the remaining failure modes: Subject: Remaining CI failures Date: Fri, 11 Jan 2019 19:10:07 +0000 Message-ID: <87lg3rui28.fsf@linaro.org> -- Alex Bennée
diff --git a/tests/atomic64-bench.c b/tests/atomic64-bench.c index 71692560ed..f087ca5bc2 100644 --- a/tests/atomic64-bench.c +++ b/tests/atomic64-bench.c @@ -74,18 +74,23 @@ static void *thread_func(void *arg) static void run_test(void) { - unsigned int remaining; + GTimer *timer; unsigned int i; while (atomic_read(&n_ready_threads) != n_threads) { cpu_relax(); } + + timer = g_timer_new(); + atomic_set(&test_start, true); do { - remaining = sleep(duration); - } while (remaining); + sleep(1); + } while (g_timer_elapsed(timer, NULL) < duration); atomic_set(&test_stop, true); + g_timer_destroy(timer); + for (i = 0; i < n_threads; i++) { qemu_thread_join(&threads[i]); } diff --git a/tests/atomic_add-bench.c b/tests/atomic_add-bench.c index 2f6c72f63a..5fdf901a71 100644 --- a/tests/atomic_add-bench.c +++ b/tests/atomic_add-bench.c @@ -76,18 +76,23 @@ static void *thread_func(void *arg) static void run_test(void) { - unsigned int remaining; + GTimer *timer; unsigned int i; while (atomic_read(&n_ready_threads) != n_threads) { cpu_relax(); } + + timer = g_timer_new(); + atomic_set(&test_start, true); do { - remaining = sleep(duration); - } while (remaining); + sleep(1); + } while (g_timer_elapsed(timer, NULL) < duration); atomic_set(&test_stop, true); + g_timer_destroy(timer); + for (i = 0; i < n_threads; i++) { qemu_thread_join(&threads[i]); } diff --git a/tests/qht-bench.c b/tests/qht-bench.c index ab4e708180..7473fcb60a 100644 --- a/tests/qht-bench.c +++ b/tests/qht-bench.c @@ -398,18 +398,23 @@ static void pr_stats(void) static void run_test(void) { - unsigned int remaining; + GTimer *timer; int i; while (atomic_read(&n_ready_threads) != n_rw_threads + n_rz_threads) { cpu_relax(); } + + timer = g_timer_new(); + atomic_set(&test_start, true); do { - remaining = sleep(duration); - } while (remaining); + sleep(1); + } while (g_timer_elapsed(timer, NULL) < duration); atomic_set(&test_stop, true); + g_timer_destroy(timer); + for (i = 0; i < n_rw_threads; i++) { qemu_thread_join(&rw_threads[i]); }
Relying on sleep to always return having slept isn't safe as a signal may have occurred. If signals are constantly incoming the program will never reach it's termination condition. This is believed to be the mechanism causing time outs for qht-test in Travis. Instead we use a g_timer to determine if the duration of the test has passed and sleep for a second at a time. This may bias benchmark results for short runs. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> --- tests/atomic64-bench.c | 11 ++++++++--- tests/atomic_add-bench.c | 11 ++++++++--- tests/qht-bench.c | 11 ++++++++--- 3 files changed, 24 insertions(+), 9 deletions(-) -- 2.17.1