diff mbox series

[RFC] tests: replace rem = sleep(time) with g_timer

Message ID 20190111143815.26107-1-alex.bennee@linaro.org
State New
Headers show
Series [RFC] tests: replace rem = sleep(time) with g_timer | expand

Commit Message

Alex Bennée Jan. 11, 2019, 2:38 p.m. UTC
Relying on sleep to always return having slept isn't safe as a signal
may have occurred. If signals are constantly incoming the program will
never reach it's termination condition. This is believed to be the
mechanism causing time outs for qht-test in Travis.

Instead we use a g_timer to determine if the duration of the test has
passed and sleep for a second at a time. This may bias benchmark
results for short runs.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
 tests/atomic64-bench.c   | 11 ++++++++---
 tests/atomic_add-bench.c | 11 ++++++++---
 tests/qht-bench.c        | 11 ++++++++---
 3 files changed, 24 insertions(+), 9 deletions(-)

-- 
2.17.1

Comments

Paolo Bonzini Jan. 11, 2019, 2:43 p.m. UTC | #1
On 11/01/19 15:38, Alex Bennée wrote:
> Relying on sleep to always return having slept isn't safe as a signal

> may have occurred. If signals are constantly incoming the program will

> never reach it's termination condition. This is believed to be the

> mechanism causing time outs for qht-test in Travis.

> 

> Instead we use a g_timer to determine if the duration of the test has

> passed and sleep for a second at a time. This may bias benchmark

> results for short runs.


Why not g_usleep?  It already does a while loop around nanosleep (which
returns the remaining time in the wait, like select but unlike sleep and
poll).

Thanks,

Paolo
Alex Bennée Jan. 11, 2019, 3:28 p.m. UTC | #2
Paolo Bonzini <pbonzini@redhat.com> writes:

> On 11/01/19 15:38, Alex Bennée wrote:

>> Relying on sleep to always return having slept isn't safe as a signal

>> may have occurred. If signals are constantly incoming the program will

>> never reach it's termination condition. This is believed to be the

>> mechanism causing time outs for qht-test in Travis.

>>

>> Instead we use a g_timer to determine if the duration of the test has

>> passed and sleep for a second at a time. This may bias benchmark

>> results for short runs.

>

> Why not g_usleep?  It already does a while loop around nanosleep (which

> returns the remaining time in the wait, like select but unlike sleep and

> poll).


Yeah I'm testing that now. However I have managed to trigger:

  ERROR:tests/test-qht-par.c:20:test_qht: assertion failed (rc == 0): (35584 == 0)

but I'm not sure if this is some other side-effect of the
test-qht-par/qht-bench invocation dance.

--
Alex Bennée
Paolo Bonzini Jan. 11, 2019, 3:41 p.m. UTC | #3
On 11/01/19 16:28, Alex Bennée wrote:
>> Why not g_usleep?  It already does a while loop around nanosleep (which

>> returns the remaining time in the wait, like select but unlike sleep and

>> poll).

> Yeah I'm testing that now. However I have managed to trigger:

> 

>   ERROR:tests/test-qht-par.c:20:test_qht: assertion failed (rc == 0): (35584 == 0)


I think that's a good old SIGSEGV (0x8B00).

Paolo
Alex Bennée Jan. 11, 2019, 4:05 p.m. UTC | #4
Paolo Bonzini <pbonzini@redhat.com> writes:

> On 11/01/19 16:28, Alex Bennée wrote:

>>> Why not g_usleep?  It already does a while loop around nanosleep (which

>>> returns the remaining time in the wait, like select but unlike sleep and

>>> poll).

>> Yeah I'm testing that now. However I have managed to trigger:

>>

>>   ERROR:tests/test-qht-par.c:20:test_qht: assertion failed (rc == 0): (35584 == 0)

>

> I think that's a good old SIGSEGV (0x8B00).


Hmmm, but I haven't been able to trigger it running it directly:

  retry.py -n 30 -c -- ./tests/qht-bench 1>/dev/null 2>&1 -R -S0.1 -D10000 -N1 -n 4 -u 20 -d 1

Could this be some sort of weird interaction caused by using system()?

--
Alex Bennée
Alex Bennée Jan. 11, 2019, 4:06 p.m. UTC | #5
Paolo Bonzini <pbonzini@redhat.com> writes:

> On 11/01/19 16:28, Alex Bennée wrote:

>>> Why not g_usleep?  It already does a while loop around nanosleep (which

>>> returns the remaining time in the wait, like select but unlike sleep and

>>> poll).

>> Yeah I'm testing that now. However I have managed to trigger:

>>

>>   ERROR:tests/test-qht-par.c:20:test_qht: assertion failed (rc == 0): (35584 == 0)

>

> I think that's a good old SIGSEGV (0x8B00).


According to the PC in the logs:

  Line 98 of "mcount.c" starts at address 0x7ffff6e15145 <__mcount_internal+69> and ends at 0x7ffff6e15148 <__mcount_internal+72>.

>

> Paolo



--
Alex Bennée
Greg Kurz Jan. 11, 2019, 6:32 p.m. UTC | #6
On Fri, 11 Jan 2019 16:41:41 +0100
Paolo Bonzini <pbonzini@redhat.com> wrote:

> On 11/01/19 16:28, Alex Bennée wrote:

> >> Why not g_usleep?  It already does a while loop around nanosleep (which

> >> returns the remaining time in the wait, like select but unlike sleep and

> >> poll).  

> > Yeah I'm testing that now. However I have managed to trigger:

> > 

> >   ERROR:tests/test-qht-par.c:20:test_qht: assertion failed (rc == 0): (35584 == 0)  

> 

> I think that's a good old SIGSEGV (0x8B00).

> 


Hmmm... system() returns a "wait status" that can  be examined using the
macros described in waitpid(2), and we have:

/* If WIFEXITED(STATUS), the low-order 8 bits of the status.  */
#define	__WEXITSTATUS(status)	(((status) & 0xff00) >> 8)

So this rather looks like a 139 exit status to me... Not sure how
this can happen though.

> Paolo

>
Eduardo Habkost Jan. 11, 2019, 7:05 p.m. UTC | #7
On Fri, Jan 11, 2019 at 04:06:54PM +0000, Alex Bennée wrote:
> 

> Paolo Bonzini <pbonzini@redhat.com> writes:

> 

> > On 11/01/19 16:28, Alex Bennée wrote:

> >>> Why not g_usleep?  It already does a while loop around nanosleep (which

> >>> returns the remaining time in the wait, like select but unlike sleep and

> >>> poll).

> >> Yeah I'm testing that now. However I have managed to trigger:

> >>

> >>   ERROR:tests/test-qht-par.c:20:test_qht: assertion failed (rc == 0): (35584 == 0)

> >

> > I think that's a good old SIGSEGV (0x8B00).

> 

> According to the PC in the logs:

> 

>   Line 98 of "mcount.c" starts at address 0x7ffff6e15145 <__mcount_internal+69> and ends at 0x7ffff6e15148 <__mcount_internal+72>.


Was this on Travis?  Which architecture?

-- 
Eduardo
Alex Bennée Jan. 11, 2019, 7:25 p.m. UTC | #8
Greg Kurz <groug@kaod.org> writes:

> On Fri, 11 Jan 2019 16:41:41 +0100

> Paolo Bonzini <pbonzini@redhat.com> wrote:

>

>> On 11/01/19 16:28, Alex Bennée wrote:

>> >> Why not g_usleep?  It already does a while loop around nanosleep (which

>> >> returns the remaining time in the wait, like select but unlike sleep and

>> >> poll).

>> > Yeah I'm testing that now. However I have managed to trigger:

>> >

>> >   ERROR:tests/test-qht-par.c:20:test_qht: assertion failed (rc == 0): (35584 == 0)

>>

>> I think that's a good old SIGSEGV (0x8B00).

>>

>

> Hmmm... system() returns a "wait status" that can  be examined using the

> macros described in waitpid(2), and we have:

>

> /* If WIFEXITED(STATUS), the low-order 8 bits of the status.  */

> #define	__WEXITSTATUS(status)	(((status) & 0xff00) >> 8)

>

> So this rather looks like a 139 exit status to me... Not sure how

> this can happen though.


Yeah the child segfaulted in mcount while closing down. I've started a
new thread with the details of the remaining failure modes:

  Subject: Remaining CI failures
  Date: Fri, 11 Jan 2019 19:10:07 +0000
  Message-ID: <87lg3rui28.fsf@linaro.org>


--
Alex Bennée
diff mbox series

Patch

diff --git a/tests/atomic64-bench.c b/tests/atomic64-bench.c
index 71692560ed..f087ca5bc2 100644
--- a/tests/atomic64-bench.c
+++ b/tests/atomic64-bench.c
@@ -74,18 +74,23 @@  static void *thread_func(void *arg)
 
 static void run_test(void)
 {
-    unsigned int remaining;
+    GTimer *timer;
     unsigned int i;
 
     while (atomic_read(&n_ready_threads) != n_threads) {
         cpu_relax();
     }
+
+    timer = g_timer_new();
+
     atomic_set(&test_start, true);
     do {
-        remaining = sleep(duration);
-    } while (remaining);
+        sleep(1);
+    } while (g_timer_elapsed(timer, NULL) < duration);
     atomic_set(&test_stop, true);
 
+    g_timer_destroy(timer);
+
     for (i = 0; i < n_threads; i++) {
         qemu_thread_join(&threads[i]);
     }
diff --git a/tests/atomic_add-bench.c b/tests/atomic_add-bench.c
index 2f6c72f63a..5fdf901a71 100644
--- a/tests/atomic_add-bench.c
+++ b/tests/atomic_add-bench.c
@@ -76,18 +76,23 @@  static void *thread_func(void *arg)
 
 static void run_test(void)
 {
-    unsigned int remaining;
+    GTimer *timer;
     unsigned int i;
 
     while (atomic_read(&n_ready_threads) != n_threads) {
         cpu_relax();
     }
+
+    timer = g_timer_new();
+
     atomic_set(&test_start, true);
     do {
-        remaining = sleep(duration);
-    } while (remaining);
+        sleep(1);
+    } while (g_timer_elapsed(timer, NULL) < duration);
     atomic_set(&test_stop, true);
 
+    g_timer_destroy(timer);
+
     for (i = 0; i < n_threads; i++) {
         qemu_thread_join(&threads[i]);
     }
diff --git a/tests/qht-bench.c b/tests/qht-bench.c
index ab4e708180..7473fcb60a 100644
--- a/tests/qht-bench.c
+++ b/tests/qht-bench.c
@@ -398,18 +398,23 @@  static void pr_stats(void)
 
 static void run_test(void)
 {
-    unsigned int remaining;
+    GTimer *timer;
     int i;
 
     while (atomic_read(&n_ready_threads) != n_rw_threads + n_rz_threads) {
         cpu_relax();
     }
+
+    timer = g_timer_new();
+
     atomic_set(&test_start, true);
     do {
-        remaining = sleep(duration);
-    } while (remaining);
+        sleep(1);
+    } while (g_timer_elapsed(timer, NULL) < duration);
     atomic_set(&test_stop, true);
 
+    g_timer_destroy(timer);
+
     for (i = 0; i < n_rw_threads; i++) {
         qemu_thread_join(&rw_threads[i]);
     }