tests/migration: Allow longer timeouts

Message ID 20201008160330.130431-1-dgilbert@redhat.com
State New

Commit Message

Dr. David Alan Gilbert Oct. 8, 2020, 4:03 p.m. UTC
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

In travis, with gcov and gprof we're seeing timeouts; hopefully fix
this by increasing the test timeouts a bit, but for xbzrle ensure it
really does get a couple of cycles through to test the cache.

I think the problem in travis is we have about 2 host CPU threads,
in the test we have at least 3:
   a) The vCPU thread (100% flat out)
   b) The source migration thread
   c) The destination migration thread

if (b) & (c) are slow for any reason - gcov+gprof or a slow host -
then they're sharing one host CPU thread so limit the migration
bandwidth.

Tested on my laptop with:
   taskset -c 0,1 ./tests/qtest/migration-test -p /x86_64/migration

Reported-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 tests/qtest/migration-test.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)
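
The reasoning in the commit message can be sketched as simple arithmetic. This is a minimal sketch, assuming precopy is allowed to finish once the remaining dirty data fits into what the link can send within downtime-limit; the function name and the numbers are hypothetical and not taken from QEMU or from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not QEMU code: precopy can complete once the
 * remaining dirty data can be transferred within the allowed downtime.
 */
bool can_converge(uint64_t pending_bytes,
                  uint64_t bandwidth_bytes_per_ms,
                  uint64_t downtime_limit_ms)
{
    /* Bytes that can be sent while the guest is paused. */
    uint64_t budget_bytes = bandwidth_bytes_per_ms * downtime_limit_ms;

    return pending_bytes <= budget_bytes;
}
```

With an assumed 30 MiB of dirty data and an assumed 51200 bytes/ms of effective bandwidth, can_converge(31457280, 51200, 300) is false while can_converge(31457280, 51200, 1000) is true. When the migration threads are starved of CPU the effective bandwidth drops, so a larger downtime-limit gives the test more room to converge.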

Comments

Thomas Huth Oct. 12, 2020, 1:13 p.m. UTC | #1
On 08/10/2020 18.03, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>
> In travis, with gcov and gprof we're seeing timeouts; hopefully fix
> this by increasing the test timeouts a bit, but for xbzrle ensure it
> really does get a couple of cycles through to test the cache.
>
> I think the problem in travis is we have about 2 host CPU threads,
> in the test we have at least 3:
>    a) The vCPU thread (100% flat out)
>    b) The source migration thread
>    c) The destination migration thread
>
> if (b) & (c) are slow for any reason - gcov+gprof or a slow host -
> then they're sharing one host CPU thread so limit the migration
> bandwidth.
>
> Tested on my laptop with:
>    taskset -c 0,1 ./tests/qtest/migration-test -p /x86_64/migration
>
> Reported-by: Alex Bennée <alex.bennee@linaro.org>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  tests/qtest/migration-test.c | 21 +++++++++++----------
>  1 file changed, 11 insertions(+), 10 deletions(-)


This seems to fix the gcov/gprof test indeed:

 https://travis-ci.com/github/huth/qemu/jobs/398270396

Thus:

Tested-by: Thomas Huth <thuth@redhat.com>


I'm also queuing this to my qtest-next branch (in case you don't plan a
migration pull request within the next few days):

 https://gitlab.com/huth/qemu/-/commits/qtest-next/

 Thomas
Thomas Huth Oct. 13, 2020, 6:06 a.m. UTC | #2
On 12/10/2020 15.13, Thomas Huth wrote:
> On 08/10/2020 18.03, Dr. David Alan Gilbert (git) wrote:
>> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>>
>> In travis, with gcov and gprof we're seeing timeouts; hopefully fix
>> this by increasing the test timeouts a bit, but for xbzrle ensure it
>> really does get a couple of cycles through to test the cache.
>>
>> I think the problem in travis is we have about 2 host CPU threads,
>> in the test we have at least 3:
>>    a) The vCPU thread (100% flat out)
>>    b) The source migration thread
>>    c) The destination migration thread
>>
>> if (b) & (c) are slow for any reason - gcov+gprof or a slow host -
>> then they're sharing one host CPU thread so limit the migration
>> bandwidth.
>>
>> Tested on my laptop with:
>>    taskset -c 0,1 ./tests/qtest/migration-test -p /x86_64/migration
>>
>> Reported-by: Alex Bennée <alex.bennee@linaro.org>
>> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>> ---
>>  tests/qtest/migration-test.c | 21 +++++++++++----------
>>  1 file changed, 11 insertions(+), 10 deletions(-)
>
> This seems to fix the gcov/gprof test indeed:
>
>  https://travis-ci.com/github/huth/qemu/jobs/398270396
>
> Thus:
>
> Tested-by: Thomas Huth <thuth@redhat.com>
>
> I'm also queuing this to my qtest-next branch (in case you don't plan a
> migration pull request within the next few days):
>
>  https://gitlab.com/huth/qemu/-/commits/qtest-next/


FYI, this patch fails to build on non-Linux systems:

https://cirrus-ci.com/task/5951706225704960?command=main#L6076

The #define needs to be moved out of the #if defined(__linux__) block. I can
fix up the patch here locally, but if you want to include it in your next
migration pull request instead, you should make the same fix there.

 Cheers,
  Thomas
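
The build failure described in comment #2 comes down to where the macro is defined. A minimal sketch of the suggested placement, assuming a simplified file layout: CONVERGE_DOWNTIME and its value come from the patch, while HAVE_LINUX_ONLY_HELPERS and converge_downtime_ms() are hypothetical names used only to keep the example self-contained.

```c
/*
 * Sketch only, not the fixed-up patch: the layout is simplified from
 * what the diff context and the review comment suggest.
 */

/* Used by the precopy tests on every host, so define it unconditionally. */
#define CONVERGE_DOWNTIME 1000

#if defined(__linux__)
/* Linux-only userfaultfd includes and helpers would stay in this block. */
#define HAVE_LINUX_ONLY_HELPERS 1   /* hypothetical marker for this sketch */
#else
#define HAVE_LINUX_ONLY_HELPERS 0   /* hypothetical marker for this sketch */
#endif

/* Builds on any host because the macro no longer depends on __linux__. */
int converge_downtime_ms(void)
{
    return CONVERGE_DOWNTIME;
}
```

With the macro inside the #if defined(__linux__) block, the tests that use CONVERGE_DOWNTIME would reference an undefined identifier on non-Linux hosts.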

Patch

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 00a233cd8c..481db4e929 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -44,6 +44,9 @@  static bool uffd_feature_thread_id;
 #include <sys/ioctl.h>
 #include <linux/userfaultfd.h>
 
+/* A downtime where the test really should converge */
+#define CONVERGE_DOWNTIME 1000
+
 static bool ufd_version_check(void)
 {
     struct uffdio_api api_struct;
@@ -864,8 +867,7 @@  static void test_precopy_unix(void)
 
     wait_for_migration_pass(from);
 
-    /* 300 ms should converge */
-    migrate_set_parameter_int(from, "downtime-limit", 300);
+    migrate_set_parameter_int(from, "downtime-limit", CONVERGE_DOWNTIME);
 
     if (!got_stop) {
         qtest_qmp_eventwait(from, "STOP");
@@ -946,10 +948,12 @@  static void test_xbzrle(const char *uri)
 
     migrate_qmp(from, uri, "{}");
 
+    wait_for_migration_pass(from);
+    /* Make sure we have 2 passes, so the xbzrle cache gets a workout */
     wait_for_migration_pass(from);
 
-    /* 300ms should converge */
-    migrate_set_parameter_int(from, "downtime-limit", 300);
+    /* 1000ms should converge */
+    migrate_set_parameter_int(from, "downtime-limit", 1000);
 
     if (!got_stop) {
         qtest_qmp_eventwait(from, "STOP");
@@ -999,8 +1003,7 @@  static void test_precopy_tcp(void)
 
     wait_for_migration_pass(from);
 
-    /* 300ms should converge */
-    migrate_set_parameter_int(from, "downtime-limit", 300);
+    migrate_set_parameter_int(from, "downtime-limit", CONVERGE_DOWNTIME);
 
     if (!got_stop) {
         qtest_qmp_eventwait(from, "STOP");
@@ -1068,8 +1071,7 @@  static void test_migrate_fd_proto(void)
 
     wait_for_migration_pass(from);
 
-    /* 300ms should converge */
-    migrate_set_parameter_int(from, "downtime-limit", 300);
+    migrate_set_parameter_int(from, "downtime-limit", CONVERGE_DOWNTIME);
 
     if (!got_stop) {
         qtest_qmp_eventwait(from, "STOP");
@@ -1304,8 +1306,7 @@  static void test_multifd_tcp(const char *method)
 
     wait_for_migration_pass(from);
 
-    /* 300ms it should converge */
-    migrate_set_parameter_int(from, "downtime-limit", 300);
+    migrate_set_parameter_int(from, "downtime-limit", CONVERGE_DOWNTIME);
 
     if (!got_stop) {
         qtest_qmp_eventwait(from, "STOP");
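
The extra wait_for_migration_pass() added to test_xbzrle is described in the patch as giving the xbzrle cache "a workout". A toy model of why a single pass cannot exercise a cold cache is sketched below. It is not QEMU's XBZRLE implementation; the structure, the function names, and the slot count are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_CACHE_SLOTS 8   /* hypothetical size, chosen for the example */

/* Toy direct-mapped cache keyed by page address. */
struct toy_cache {
    uint64_t page_addr[TOY_CACHE_SLOTS];
    bool valid[TOY_CACHE_SLOTS];
};

/* Returns true on a hit; on a miss the page is inserted for later passes. */
bool toy_cache_lookup_or_insert(struct toy_cache *c, uint64_t addr)
{
    size_t slot = (size_t)(addr % TOY_CACHE_SLOTS);

    if (c->valid[slot] && c->page_addr[slot] == addr) {
        return true;
    }
    c->page_addr[slot] = addr;
    c->valid[slot] = true;
    return false;
}

/* Walks one pass over the given dirty pages and counts cache hits. */
unsigned toy_pass_hits(struct toy_cache *c, const uint64_t *pages, size_t n)
{
    unsigned hits = 0;

    for (size_t i = 0; i < n; i++) {
        if (toy_cache_lookup_or_insert(c, pages[i])) {
            hits++;
        }
    }
    return hits;
}
```

Starting from a zero-initialised cache, a first pass over a set of pages only fills the cache, and hits can only show up from the second pass onward. Waiting for more than one pass before lowering the downtime limit is what lets the cached path run at all in this model.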