[v1,0/8] testing/next

Message ID 20190220172811.10588-1-alex.bennee@linaro.org

Alex Bennée Feb. 20, 2019, 5:28 p.m. UTC
Hi,

Here is the current status of testing/next which fixes some of the
current instability as well as some totally broken builds. However we
have two failures I'm still trying to track down:

  tests/boot-sector.c:161:boot_sector_test:
    assertion failed (signature == SIGNATURE): (0x0000face == 0x0000dead)

I have seen this locally and got a core dump but it doesn't show much.
I'm going to see if I can get more out of a debug build. It is
generated by:

  tests/cdrom-test -m=quick -k --tap

I think the subtest is:

  /x86_64/cdrom/boot/isapc

Attaching to the child QEMU, it looks like it is eternally returning
-EINTR and looping around, with the occasional kvm_handle_io to port
146 and port 112. I guess under test conditions this eventually times
out and dies.
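
For reference, the two port numbers are easier to recognise in hex. The
sketch below converts them and looks them up in a tiny table. The names
come from the conventional legacy PC I/O port layout, not from the trace
itself, and the table and helper are purely illustrative:

```python
# Conventional legacy PC I/O port assignments (assumption: the guest is
# using the standard layout). Only the two ports from the trace are listed.
KNOWN_PORTS = {
    0x70: "CMOS/RTC index register",
    0x92: "System Control Port A (fast A20 gate)",
}


def describe_port(port: int) -> str:
    """Format a decimal port number as hex with its conventional name."""
    name = KNOWN_PORTS.get(port, "unknown")
    return f"port {port} = {port:#06x}: {name}"


for p in (146, 112):
    print(describe_port(p))
```

Both are ports that firmware typically touches early in boot, which would
fit the guest still being in the BIOS, though that is only a guess.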

The other is the failure in the gprof build:

  PASS 55 ahci-test /x86_64/ahci/flush/migrate
  tests/libqtest.c:143: kill_qemu() detected QEMU death from signal 11
      (Segmentation fault) (core dumped)
  ERROR - too few tests run (expected 66, got 55)
  Aborted (core dumped)
  /home/travis/build/stsquad/qemu/tests/Makefile.include:857: recipe
      for target 'check-qtest-x86_64' failed

So far attempts to re-create this locally have failed. It may be a
Travis related environment thing.

The following patches need review
: patch 0002/.travis.yml split debug builds.patch
: patch 0005/tests docker squash initial update and install st.patch
: patch 0006/tests docker peg netmap code to a specific versio.patch
: patch 0008/tests softfloat always do quick softfloat tests.patch

Alex Bennée (4):
  .travis.yml: split debug builds
  tests/docker: squash initial update and install step for docker9
  tests/docker: peg netmap code to a specific version
  tests/softfloat: always do quick softfloat tests

Dr. David Alan Gilbert (2):
  .travis.yml: Test with disable-replication
  .travis.yml: Remove disable-uuid

Paolo Bonzini (1):
  .travis.yml: the xcode10 image seems to be hosed

Thomas Huth (1):
  Add a gitlab-ci file for Continuous Integration testing on Gitlab

 .gitlab-ci.yml                               | 73 ++++++++++++++++++++
 .travis.yml                                  | 16 ++---
 MAINTAINERS                                  |  5 ++
 tests/Makefile.include                       |  6 +-
 tests/docker/dockerfiles/debian-amd64.docker |  1 +
 tests/docker/dockerfiles/debian9.docker      |  4 +-
 6 files changed, 90 insertions(+), 15 deletions(-)
 create mode 100644 .gitlab-ci.yml

-- 
2.20.1

Comments

Philippe Mathieu-Daudé Feb. 20, 2019, 7:07 p.m. UTC | #1
On 2/20/19 6:28 PM, Alex Bennée wrote:
> Hi,
>
> Here is the current status of testing/next which fixes some of the
> current instability as well as some totally broken builds. However we
> have two failures I'm still trying to track down:
>
>   tests/boot-sector.c:161:boot_sector_test:
>     assertion failed (signature == SIGNATURE): (0x0000face == 0x0000dead)
>
> I have seen this locally and got a core dump but it doesn't show much.
> I'm going to see if I can get more out of a debug build. It is
> generated by:
>
>   tests/cdrom-test -m=quick -k --tap
>
> I think the subtest is:
>
>   /x86_64/cdrom/boot/isapc
>
> Attaching to the child QEMU looks like it is eternally returning
> -EINTR and looping around with the occasionally kvm_handle_io to port
> 146 and port 112. I guess under test conditions this eventually times
> out and dies.
>
> The other is the failure in the gprof build:
>
>   PASS 55 ahci-test /x86_64/ahci/flush/migrate
>   tests/libqtest.c:143: kill_qemu() detected QEMU death from signal 11
>       (Segmentation fault) (core dumped)
>   ERROR - too few tests run (expected 66, got 55)
>   Aborted (core dumped)
>   /home/travis/build/stsquad/qemu/tests/Makefile.include:857: recipe
>       for target 'check-qtest-x86_64' failed
>
> So far attempts to re-create this locally have failed. It may be a
> Travis related environment thing.

On OSX (Xcode9.4) I still have:

ERROR:tests/test-aio.c:501:test_timer_schedule: assertion failed:
(aio_poll(ctx, true))
ERROR - Bail out! ERROR:tests/test-aio.c:501:test_timer_schedule:
assertion failed: (aio_poll(ctx, true))
make: *** [check-unit] Error 1
make: *** Waiting for unfinished jobs....

https://travis-ci.org/philmd/qemu/jobs/495761523

Alex Bennée Feb. 20, 2019, 8:52 p.m. UTC | #2
Alex Bennée <alex.bennee@linaro.org> writes:

> Hi,
>
>   tests/boot-sector.c:161:boot_sector_test:
>     assertion failed (signature == SIGNATURE): (0x0000face == 0x0000dead)
>
> I have seen this locally and got a core dump but it doesn't show much.
> I'm going to see if I can get more out of a debug build. It is
> generated by:
>
>   tests/cdrom-test -m=quick -k --tap
>
> I think the subtest is:
>
>   /x86_64/cdrom/boot/isapc
>
> Attaching to the child QEMU looks like it is eternally returning
> -EINTR and looping around with the occasionally kvm_handle_io to port
> 146 and port 112. I guess under test conditions this eventually times
> out and dies.
<snip>

This is load related. I can run:

  retry.py --timeout 600 -n 500 -c -- \
    env QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 \
        QTEST_QEMU_IMG=qemu-img ./tests/cdrom-test \
        -m=quick -p /x86_64/cdrom/boot/isapc

and then all 500 runs pass. If I do the same while running
"make check check-tcg -j9" in another build directory, it hangs within
20-odd attempts.
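
For anyone without a retry script to hand, the idea is roughly the loop
below. This is a minimal sketch, not the retry.py used above (whose
options are not shown here): run_repeatedly is a hypothetical helper, and
the command at the bottom is a stand-in so the sketch runs anywhere.

```python
import subprocess
import sys


def run_repeatedly(cmd, attempts, timeout):
    """Run cmd `attempts` times; return (passes, failures, hangs).

    A run counts as a hang when it exceeds `timeout` seconds, in which
    case subprocess.run() kills the child before raising TimeoutExpired.
    """
    passes = failures = hangs = 0
    for _ in range(attempts):
        try:
            result = subprocess.run(cmd, timeout=timeout,
                                    stdout=subprocess.DEVNULL,
                                    stderr=subprocess.DEVNULL)
        except subprocess.TimeoutExpired:
            hangs += 1
            continue
        if result.returncode == 0:
            passes += 1
        else:
            failures += 1
    return passes, failures, hangs


# Stand-in command; substitute the cdrom-test invocation (with the
# QTEST_QEMU_BINARY and QTEST_QEMU_IMG environment set) to reproduce.
cmd = [sys.executable, "-c", "pass"]
print(run_repeatedly(cmd, attempts=5, timeout=60))
```

Counting hangs separately from ordinary failures matters here because the
symptom under load is a timeout rather than an immediate non-zero exit.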

Can anyone else reproduce this?


--
Alex Bennée

Thomas Huth Feb. 21, 2019, 8:51 a.m. UTC | #3
On 20/02/2019 21.52, Alex Bennée wrote:
>
> Alex Bennée <alex.bennee@linaro.org> writes:
>
>> Hi,
>>
>>   tests/boot-sector.c:161:boot_sector_test:
>>     assertion failed (signature == SIGNATURE): (0x0000face == 0x0000dead)
>>
>> I have seen this locally and got a core dump but it doesn't show much.
>> I'm going to see if I can get more out of a debug build. It is
>> generated by:
>>
>>   tests/cdrom-test -m=quick -k --tap
>>
>> I think the subtest is:
>>
>>   /x86_64/cdrom/boot/isapc
>>
>> Attaching to the child QEMU looks like it is eternally returning
>> -EINTR and looping around with the occasionally kvm_handle_io to port
>> 146 and port 112. I guess under test conditions this eventually times
>> out and dies.
> <snip>
>
> This is load related. I can run:
>
>   retry.py --timeout 600 -n 500 -c -- \
>     env QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 \
>         QTEST_QEMU_IMG=qemu-img ./tests/cdrom-test \
>         -m=quick -p /x86_64/cdrom/boot/isapc
>
> and then all 500 run fine. If I do the same while running a make check
> check-tcg -j9 in another build directory it hangs within 20 odd
> attempts.
>
> Can anyone else reproduce this?


I can't reproduce it here. Might be worth a try to check the BIOS output
in that case. Add this patch:

diff --git a/tests/cdrom-test.c b/tests/cdrom-test.c
index 14bd981..c38e016 100644
--- a/tests/cdrom-test.c
+++ b/tests/cdrom-test.c
@@ -132,7 +132,7 @@ static void add_x86_tests(void)
     qtest_add_data_func("cdrom/boot/virtio-scsi",
                         "-device virtio-scsi -device scsi-cd,drive=cdr "
                         "-blockdev file,node-name=cdr,filename=", test_cdboot);
-    qtest_add_data_func("cdrom/boot/isapc", "-M isapc "
+    qtest_add_data_func("cdrom/boot/isapc", "-M isapc -vga none -device sga -serial file:/tmp/stdio "
                         "-drive if=ide,media=cdrom,file=", test_cdboot);
     qtest_add_data_func("cdrom/boot/am53c974",
                         "-device am53c974 -device scsi-cd,drive=cd1 "

... then check /tmp/stdio when it hangs.

 Thomas

Alex Bennée Feb. 21, 2019, 1 p.m. UTC | #4
Thomas Huth <thuth@redhat.com> writes:

> On 20/02/2019 21.52, Alex Bennée wrote:
>>
>> Alex Bennée <alex.bennee@linaro.org> writes:
>>
>>> Hi,
>>>
>>>   tests/boot-sector.c:161:boot_sector_test:
>>>     assertion failed (signature == SIGNATURE): (0x0000face == 0x0000dead)
>>>
>>> I have seen this locally and got a core dump but it doesn't show much.
>>> I'm going to see if I can get more out of a debug build. It is
>>> generated by:
>>>
>>>   tests/cdrom-test -m=quick -k --tap
>>>
>>> I think the subtest is:
>>>
>>>   /x86_64/cdrom/boot/isapc
>>>
>>> Attaching to the child QEMU looks like it is eternally returning
>>> -EINTR and looping around with the occasionally kvm_handle_io to port
>>> 146 and port 112. I guess under test conditions this eventually times
>>> out and dies.
>> <snip>
>>
>> This is load related. I can run:
>>
>>   retry.py --timeout 600 -n 500 -c -- \
>>     env QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 \
>>         QTEST_QEMU_IMG=qemu-img ./tests/cdrom-test \
>>         -m=quick -p /x86_64/cdrom/boot/isapc
>>
>> and then all 500 run fine. If I do the same while running a make check
>> check-tcg -j9 in another build directory it hangs within 20 odd
>> attempts.
>>
>> Can anyone else reproduce this?


I should also mention my current config for the reproducer test is:

  '../../configure' '--python=python3' '--disable-docs' '--disable-tools' '--disable-libusb' '--disable-vte' '--disable-xen' '--enable-debug' '--extra-cflags=-O0 -g3' '--target-list=x86_64-softmmu'

>
> I can't reproduce it here. Might be worth a try to check the BIOS output
> in that case. Add this patch:
>
> diff --git a/tests/cdrom-test.c b/tests/cdrom-test.c
> index 14bd981..c38e016 100644
> --- a/tests/cdrom-test.c
> +++ b/tests/cdrom-test.c
> @@ -132,7 +132,7 @@ static void add_x86_tests(void)
>      qtest_add_data_func("cdrom/boot/virtio-scsi",
>                          "-device virtio-scsi -device scsi-cd,drive=cdr "
>                          "-blockdev file,node-name=cdr,filename=", test_cdboot);
> -    qtest_add_data_func("cdrom/boot/isapc", "-M isapc "
> +    qtest_add_data_func("cdrom/boot/isapc", "-M isapc -vga none -device sga -serial file:/tmp/stdio "
>                          "-drive if=ide,media=cdrom,file=", test_cdboot);
>      qtest_add_data_func("cdrom/boot/am53c974",
>                          "-device am53c974 -device scsi-cd,drive=cd1 "
>
> ... then check /tmp/stdio when it hangs.


This gets us:

  SeaBIOS (version rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org)
  Booting from Floppy...
  Boot failed: could not read the boot disk

  Booting from DVD/CD...
  Boot failed: Could not read from CDROM (code 000c)
  Booting from DVD/CD...
  Boot failed: Could not read from CDROM (code 0003)
  Booting from Hard Disk...
  Boot failed: could not read the boot disk

  No bootable device.

I tried to bisect but this occurs even in v3.0.0

  SeaBIOS (version rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org)
  Booting from Floppy...
  Boot failed: could not read the boot disk

  Booting from DVD/CD...
  Boot failed: Could not read from CDROM (code 000c)
  Booting from DVD/CD...
  Boot failed: Could not read from CDROM (code 0003)
  Booting from Hard Disk...
  Boot failed: could not read the boot disk

  No bootable device.
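
To compare logs from many runs, something like the following could pull
the failures out of the captured serial file. It is only a sketch:
boot_failures is a hypothetical helper, it assumes the log is plain text
in the format shown above (no terminal escape sequences), and the sample
is copied from the first log.

```python
import re

# Patterns for the two SeaBIOS message shapes seen in the logs above.
BOOT_RE = re.compile(r"^\s*Booting from (?P<device>.+?)\.\.\.\s*$")
FAIL_RE = re.compile(r"^\s*Boot failed: (?P<reason>.+?)\s*$")


def boot_failures(log_text):
    """Return (device, reason) pairs, one per failed boot attempt."""
    results = []
    device = None
    for line in log_text.splitlines():
        m = BOOT_RE.match(line)
        if m:
            # Remember which device the next "Boot failed" belongs to.
            device = m.group("device")
            continue
        m = FAIL_RE.match(line)
        if m and device is not None:
            results.append((device, m.group("reason")))
            device = None
    return results


SAMPLE = """\
SeaBIOS (version rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org)
Booting from Floppy...
Boot failed: could not read the boot disk

Booting from DVD/CD...
Boot failed: Could not read from CDROM (code 000c)
Booting from DVD/CD...
Boot failed: Could not read from CDROM (code 0003)
Booting from Hard Disk...
Boot failed: could not read the boot disk

No bootable device.
"""

for device, reason in boot_failures(SAMPLE):
    print(f"{device}: {reason}")
```

In a real run the text would come from reading /tmp/stdio rather than
from the embedded sample.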

Can I pass this over to the block/BIOS people to look at? At this point
I think the test has always been unstable but tends to trigger the
fault only on Travis because it is highly loaded.

In the meantime I'll see if I can disable the test for the CI runs.

--
Alex Bennée

Thomas Huth Feb. 21, 2019, 4:21 p.m. UTC | #5
On 21/02/2019 14.00, Alex Bennée wrote:
>
> Thomas Huth <thuth@redhat.com> writes:
>
>> On 20/02/2019 21.52, Alex Bennée wrote:
>>>
>>> Alex Bennée <alex.bennee@linaro.org> writes:
>>>
>>>> Hi,
>>>>
>>>>   tests/boot-sector.c:161:boot_sector_test:
>>>>     assertion failed (signature == SIGNATURE): (0x0000face == 0x0000dead)
>>>>
>>>> I have seen this locally and got a core dump but it doesn't show much.
>>>> I'm going to see if I can get more out of a debug build. It is
>>>> generated by:
>>>>
>>>>   tests/cdrom-test -m=quick -k --tap
>>>>
>>>> I think the subtest is:
>>>>
>>>>   /x86_64/cdrom/boot/isapc
>>>>
>>>> Attaching to the child QEMU looks like it is eternally returning
>>>> -EINTR and looping around with the occasionally kvm_handle_io to port
>>>> 146 and port 112. I guess under test conditions this eventually times
>>>> out and dies.
>>> <snip>
>>>
>>> This is load related. I can run:
>>>
>>>   retry.py --timeout 600 -n 500 -c -- \
>>>     env QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 \
>>>         QTEST_QEMU_IMG=qemu-img ./tests/cdrom-test \
>>>         -m=quick -p /x86_64/cdrom/boot/isapc
>>>
>>> and then all 500 run fine. If I do the same while running a make check
>>> check-tcg -j9 in another build directory it hangs within 20 odd
>>> attempts.
>>>
>>> Can anyone else reproduce this?
>
> I should also mention my current config for the reproducer test is:
>
>   '../../configure' '--python=python3' '--disable-docs' '--disable-tools' '--disable-libusb' '--disable-vte' '--disable-xen' '--enable-debug' '--extra-cflags=-O0 -g3' '--target-list=x86_64-softmmu'
>
>>
>> I can't reproduce it here. Might be worth a try to check the BIOS output
>> in that case. Add this patch:
>>
>> diff --git a/tests/cdrom-test.c b/tests/cdrom-test.c
>> index 14bd981..c38e016 100644
>> --- a/tests/cdrom-test.c
>> +++ b/tests/cdrom-test.c
>> @@ -132,7 +132,7 @@ static void add_x86_tests(void)
>>      qtest_add_data_func("cdrom/boot/virtio-scsi",
>>                          "-device virtio-scsi -device scsi-cd,drive=cdr "
>>                          "-blockdev file,node-name=cdr,filename=", test_cdboot);
>> -    qtest_add_data_func("cdrom/boot/isapc", "-M isapc "
>> +    qtest_add_data_func("cdrom/boot/isapc", "-M isapc -vga none -device sga -serial file:/tmp/stdio "
>>                          "-drive if=ide,media=cdrom,file=", test_cdboot);
>>      qtest_add_data_func("cdrom/boot/am53c974",
>>                          "-device am53c974 -device scsi-cd,drive=cd1 "
>>
>> ... then check /tmp/stdio when it hangs.
>
> This gets us:
>
>   SeaBIOS (version rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org)
>   Booting from Floppy...
>   Boot failed: could not read the boot disk
>
>   Booting from DVD/CD...
>   Boot failed: Could not read from CDROM (code 000c)
>   Booting from DVD/CD...
>   Boot failed: Could not read from CDROM (code 0003)
>   Booting from Hard Disk...
>   Boot failed: could not read the boot disk
>
>   No bootable device.
>
> I tried to bisect but this occurs even in v3.0.0

Weird, if it also occurs with 3.0 already, why didn't we notice this
earlier?

> In the meantime I'll see if I can disable the test for the CI runs.


Ok, but that likely means nobody is going to fix it anymore ... anyway,
if we still keep it for the SPEED=slow mode, we will at least still
notice if it breaks completely one day...

 Thomas