[PULL,for,7.2,00/10] testing and doc updates

Message ID 20221115133439.2348929-1-alex.bennee@linaro.org
State New

Pull-request

https://gitlab.com/stsquad/qemu.git tags/pull-misc-for-7.2-151122-2

Message

Alex Bennée Nov. 15, 2022, 1:34 p.m. UTC
The following changes since commit 98f10f0e2613ba1ac2ad3f57a5174014f6dcb03d:

  Merge tag 'pull-target-arm-20221114' of https://git.linaro.org/people/pmaydell/qemu-arm into staging (2022-11-14 13:31:17 -0500)

are available in the Git repository at:

  https://gitlab.com/stsquad/qemu.git tags/pull-misc-for-7.2-151122-2

for you to fetch changes up to 6bac1087ef4c6b190c865384dd69cde683b977bf:

  gitlab: integrate coverage report (2022-11-15 12:21:34 +0000)

----------------------------------------------------------------
Testing and doc updates:

  - Only probe if docker or podman binaries in path
  - tweak avocado console to better find login prompts
  - reduce console noise for aspeed avocado tests
  - update documents on maintainer roles and process
  - raise timeout for ppc64 avocado tests
  - integrate coverage reports into gitlab

----------------------------------------------------------------
Alex Bennée (7):
      tests/avocado: improve behaviour waiting for login prompts
      tests/docker: allow user to override check target
      docs/devel: add a maintainers section to development process
      docs/devel: make language a little less code centric
      docs/devel: simplify the minimal checklist
      docs/devel: try and improve the language around patch review
      gitlab: integrate coverage report

Cédric Le Goater (1):
      tests/avocado/machine_aspeed.py: Reduce noise on the console for SDK tests

Peter Maydell (1):
      tests/avocado: Raise timeout for boot_linux.py:BootLinuxPPC64.test_pseries_tcg

Stefan Weil (1):
      Run docker probe only if docker or podman are available

 docs/devel/code-of-conduct.rst           |   2 +
 docs/devel/index-process.rst             |   1 +
 docs/devel/maintainers.rst               | 106 +++++++++++++++++++++++++++++++
 docs/devel/submitting-a-patch.rst        | 101 ++++++++++++++++++-----------
 docs/devel/submitting-a-pull-request.rst |  12 ++--
 configure                                |   2 +-
 .gitlab-ci.d/buildtest.yml               |  12 +++-
 tests/avocado/avocado_qemu/__init__.py   |  90 +++++++++++++++++++++++++-
 tests/avocado/boot_linux.py              |   2 +-
 tests/avocado/machine_aspeed.py          |  17 +++--
 tests/docker/Makefile.include            |   2 +
 tests/docker/common.rc                   |   6 +-
 12 files changed, 297 insertions(+), 56 deletions(-)
 create mode 100644 docs/devel/maintainers.rst

Comments

Stefan Hajnoczi Nov. 15, 2022, 11:53 p.m. UTC | #1
This pull request causes the following CI failure:

https://gitlab.com/qemu-project/qemu/-/jobs/3328449477

I haven't figured out the root cause of the failure. Maybe the pull
request just exposes a latent failure. Please take a look and we can
try again for -rc2.

Thanks,
Stefan

Alex Bennée Nov. 16, 2022, 6:20 p.m. UTC | #2
Stefan Hajnoczi <stefanha@gmail.com> writes:

> This pull request causes the following CI failure:
>
> https://gitlab.com/qemu-project/qemu/-/jobs/3328449477
>
> I haven't figured out the root cause of the failure. Maybe the pull
> request just exposes a latent failure. Please take a look and we can
> try again for -rc2.

OK after a lot of digging I've come to the following conclusion:

  * the Fuloong 2E machine never enables the FIFO on the 16550 (s->fcr & UART_FCR_FE)
  * as a result if qemu_chr_fe_write(&s->chr, &s->tsr, 1) fails with -EAGAIN
    - a serial_watch_cb is queued
    - s->tsr_retry++
  * additional serial_ioport_write's overwrite s->thr
  * the console output gets corrupted
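
As a rough illustration of the sequence above, here is a toy Python model of a
transmitter with no FIFO. It is not the QEMU serial code: the class, its
methods and the "ready pattern" are made up for the example, and only the
field names (thr, tsr, tsr_retry) are borrowed from the description.

```python
class ToyUart:
    """Toy transmitter with a one-byte holding register and no FIFO."""

    def __init__(self, ready_pattern):
        # Hypothetical stand-in for the chardev backend: each entry says
        # whether the next write attempt succeeds; afterwards it always does.
        self._ready = iter(ready_pattern)
        self.thr = None       # holding register, one byte deep
        self.tsr = None       # shift register, byte currently being sent
        self.tsr_retry = 0    # failed attempts for the current tsr byte
        self.sent = bytearray()

    def _backend_accepts(self):
        return next(self._ready, True)

    def guest_write(self, byte):
        # With no FIFO, a new write replaces whatever is still held in thr.
        self.thr = byte
        self.xmit()

    def xmit(self):
        # Load tsr from thr only when tsr is free.
        if self.tsr is None and self.thr is not None:
            self.tsr, self.thr = self.thr, None
        if self.tsr is None:
            return
        if self._backend_accepts():
            self.sent.append(self.tsr)
            self.tsr, self.tsr_retry = None, 0
            if self.thr is not None:
                self.xmit()
        else:
            # Stands in for "queue a watch callback and retry later".
            self.tsr_retry += 1


def demo():
    # The backend accepts 'a', then refuses the next three attempts.
    uart = ToyUart([True, False, False, False])
    for byte in b"abcd":
        uart.guest_write(byte)
    uart.xmit()  # the retry fires once the backend is writable again
    return bytes(uart.sent)
```

In this model demo() returns b"abd": 'c' was sitting in thr while 'b' was
stuck in tsr, and the write of 'd' replaced it before it could be sent.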

You can see the effect by comparing the serial write and xmit values:

  ➜  grep serial_write alex.log | cut -d ' ' -f 6 | xxd -r -p | head -n 10
  [    0.000000] Initializing cgroup subsys cpuset
  [    0.000000] Initializing cgroup subsys cpu
  [    0.000000] Initializing cgroup subsys cpuacct
  [    0.000000] Linux version 3.16.0-6-loongson-2e (debian-kernel@lists.debian.org) (gcc version 4.8.4 (Debian 4.8.4-1) ) #1 Debian 3.16.56-1+deb8u1 (2018-05-08)
  [    0.000000] memsize=256, highmemsize=0
  [    0.000000] CpuClock = 533080000
  [    0.000000] bootconsole [early0] enabled
  [    0.000000] CPU0 revision is: 00006302 (ICT Loongson-2)
  [    0.000000] FPU revision is: 00000501
  [    0.000000] Checking for the multiply/shift bug... no.
  🕙18:27:17 alex@zen:qemu.git/builds/all  on  pr/141122-misc-for-7.2-1 [$!?⇕] 
  ➜  grep serial_xmit alex.log | cut -d ' ' -f 2 | xxd -r -p | head -n 10
  [    0.000000] Initializing cgroup subsys cpuset
  [    0.000000] Initializing cgroup subsys cpu
  [    0.000000] Initializing cgroup subsys cpuacct
  [    0.000000] Linux version 3.16.0-6-loongson-2e (debian-kernel@lists.debian.org) (gcc version 4.8.4 (Debian 4.8.4-1) ) #1 Debian 33 0.000000] bootconsole [early0] enabled
  [    0.000000] CPU0 revision is: 00006302 (ICT Loongson-2)
  [    0.000000] FPU revision is: 00000501
  [    0.000000] Checking for the multiply/shift bug... no.
  [    0.000000] Checking for the daddiu bug... no.
  [    0.000000] Determined physical RAM map:
  [    0.000000]  memory: 000
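
The same comparison can be sketched in Python to find where the two streams
first part ways. This assumes the hex byte values have already been pulled out
of the trace lines (as the cut commands above do); the helper names are
invented for the example and nothing is assumed about the trace format itself.

```python
def decode_hex_bytes(tokens):
    """Turn an iterable of hex strings (one byte each) into bytes."""
    return bytes(int(tok, 16) for tok in tokens)


def first_divergence(written, transmitted):
    """Return the index of the first differing byte.

    Returns None when both streams are identical. If one stream is a
    strict prefix of the other, the length of the shorter one is returned.
    """
    for idx, (w, t) in enumerate(zip(written, transmitted)):
        if w != t:
            return idx
    if len(written) != len(transmitted):
        return min(len(written), len(transmitted))
    return None


def context(data, idx, width=16):
    """Return the bytes around idx, for printing next to the offset."""
    start = max(0, idx - width)
    return data[start:idx + width]
```

For example, first_divergence(b"Debian 3.16", b"Debian 33 0") gives 8, the
offset where the transmitted stream stops matching what the guest wrote.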

As a result the check for the pattern fails:

        console_pattern = 'Kernel command line: %s' % kernel_command_line
        self.wait_for_console_pattern(console_pattern)

resulting in a timeout and a test failure.

In effect the configuration makes the output dependent on how fast the
avocado test can drain the socket as there is no buffering elsewhere in
the system. The changes in:

  Subject: [PULL 02/10] tests/avocado: improve behaviour waiting for login prompts

make this failure more likely to happen - I think because .peek() and
.readline() have different buffering behaviour. Options include:

  - enable the 16550 FIFO for the Loongson kernel (command line option?)
  - increase the buffering of the python socket.socket() code
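
One possible shape for the second option, as a sketch only: a background
thread that drains the socket into memory as fast as it can, so the reader's
pace no longer matters to the sender. This is not the avocado_qemu code; the
DrainingReader class and its readline() are hypothetical, and socketpair() is
used here purely to make the example self-contained.

```python
import socket
import threading


class DrainingReader:
    """Drain a socket into an in-memory buffer from a background thread."""

    def __init__(self, sock, chunk_size=4096):
        self._sock = sock
        self._chunk_size = chunk_size
        self._buf = bytearray()
        self._eof = False
        self._cond = threading.Condition()
        self._thread = threading.Thread(target=self._drain, daemon=True)
        self._thread.start()

    def _drain(self):
        while True:
            try:
                data = self._sock.recv(self._chunk_size)
            except OSError:
                data = b""  # treat a closed or broken socket as end of stream
            with self._cond:
                if data:
                    self._buf.extend(data)
                else:
                    self._eof = True
                self._cond.notify_all()
            if not data:
                return

    def readline(self, timeout=None):
        """Return one line, the remaining bytes at EOF, or b'' on timeout."""
        with self._cond:
            self._cond.wait_for(
                lambda: b"\n" in self._buf or self._eof, timeout)
            idx = self._buf.find(b"\n")
            if idx < 0 and not self._eof:
                return b""
            end = idx + 1 if idx >= 0 else len(self._buf)
            line = bytes(self._buf[:end])
            del self._buf[:end]
            return line


def demo():
    writer, reader_sock = socket.socketpair()
    reader = DrainingReader(reader_sock)
    writer.sendall(b"first line\nKernel command line: example\n")
    writer.close()
    lines = [reader.readline(timeout=5), reader.readline(timeout=5)]
    reader._thread.join(timeout=5)
    reader_sock.close()
    return lines
```

The buffer is unbounded, which is acceptable for a short boot log but would
need a cap for anything long-running.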

I can get it to pass by shuffling the time.sleep() and a few other
checks around, but that seems flaky at best.

Mark Cave-Ayland Nov. 16, 2022, 7:26 p.m. UTC | #3
On 16/11/2022 18:20, Alex Bennée wrote:

> Stefan Hajnoczi <stefanha@gmail.com> writes:
> 
>> This pull request causes the following CI failure:
>>
>> https://gitlab.com/qemu-project/qemu/-/jobs/3328449477
>>
>> I haven't figured out the root cause of the failure. Maybe the pull
>> request just exposes a latent failure. Please take a look and we can
>> try again for -rc2.
> 
> OK after a lot of digging I've come to the following conclusion:
> 
>    * the Fuloong 2E machine never enables the FIFO on the 16550 (s->fcr & UART_FCR_FE)
>    * as a result if qemu_chr_fe_write(&s->chr, &s->tsr, 1) fails with -EAGAIN
>      - a serial_watch_cb is queued
>      - s->tsr_retry++
>    * additional serial_ioport_write's overwrite s->thr
>    * the console output gets corrupted
> 
> You can see the effect by comparing the serial write and xmit values:
> 
>    ➜  grep serial_write alex.log | cut -d ' ' -f 6 | xxd -r -p | head -n 10
>    [    0.000000] Initializing cgroup subsys cpuset
>    [    0.000000] Initializing cgroup subsys cpu
>    [    0.000000] Initializing cgroup subsys cpuacct
>    [    0.000000] Linux version 3.16.0-6-loongson-2e (debian-kernel@lists.debian.org) (gcc version 4.8.4 (Debian 4.8.4-1) ) #1 Debian 3.16.56-1+deb8u1 (2018-05-08)
>    [    0.000000] memsize=256, highmemsize=0
>    [    0.000000] CpuClock = 533080000
>    [    0.000000] bootconsole [early0] enabled
>    [    0.000000] CPU0 revision is: 00006302 (ICT Loongson-2)
>    [    0.000000] FPU revision is: 00000501
>    [    0.000000] Checking for the multiply/shift bug... no.
>    🕙18:27:17 alex@zen:qemu.git/builds/all  on  pr/141122-misc-for-7.2-1 [$!?⇕]
>    ➜  grep serial_xmit alex.log | cut -d ' ' -f 2 | xxd -r -p | head -n 10
>    [    0.000000] Initializing cgroup subsys cpuset
>    [    0.000000] Initializing cgroup subsys cpu
>    [    0.000000] Initializing cgroup subsys cpuacct
>    [    0.000000] Linux version 3.16.0-6-loongson-2e (debian-kernel@lists.debian.org) (gcc version 4.8.4 (Debian 4.8.4-1) ) #1 Debian 33 0.000000] bootconsole [early0] enabled
>    [    0.000000] CPU0 revision is: 00006302 (ICT Loongson-2)
>    [    0.000000] FPU revision is: 00000501
>    [    0.000000] Checking for the multiply/shift bug... no.
>    [    0.000000] Checking for the daddiu bug... no.
>    [    0.000000] Determined physical RAM map:
>    [    0.000000]  memory: 000
> 
> As a result the check for the pattern fails:
> 
>          console_pattern = 'Kernel command line: %s' % kernel_command_line
>          self.wait_for_console_pattern(console_pattern)
> 
> resulting in a timeout and a test failure.
> 
> In effect the configuration makes the output dependent on how fast the
> avocado test can drain the socket as there is no buffering elsewhere in
> the system. The changes in:
> 
>    Subject: [PULL 02/10] tests/avocado: improve behaviour waiting for login prompts
> 
> make this failure more likely to happen - I think because .peek() and
> .readline() have different buffering behaviour. Options
> include:
> 
>    - enable the 16550 FIFO for the Loongson kernel (command line option?)
>    - increase the buffering of the python socket.socket() code
> 
> I can get it to pass by shuffling the time.sleep() and a few other
> checks around but that seems flaky at best.

Nice work! This is the well-known problem whereby the kernel sometimes expects the 
BIOS to have pre-configured the serial ports, which of course never happens when 
booting directly with -kernel.

Given that the fuloong2e machine already has a mini "trampoline" bootloader, would it 
be possible to tweak write_bootloader() at 
https://gitlab.com/qemu-project/qemu/-/blob/master/hw/mips/fuloong2e.c#L166 to set 
UART_FCR_FE on the available UARTs before jumping into the kernel?


ATB,

Mark.