[v7,00/14] Reverse debugging

Message ID 160174516520.12451.10785284392438702137.stgit@pasha-ThinkPad-X280

Message

Pavel Dovgalyuk Oct. 3, 2020, 5:12 p.m. UTC
GDB remote protocol supports reverse debugging of the targets.
It includes 'reverse step' and 'reverse continue' operations.
The first one finds the previous step of the execution,
and the second one is intended to stop at the last breakpoint that
would happen when the program is executed normally.

Reverse debugging is possible in the replay mode, when at least
one snapshot was created at the record or replay phase.
QEMU can use these snapshots for travelling back in time with GDB.

Running the execution in replay mode allows using GDB reverse debugging
commands:
 - reverse-stepi (or rsi): Steps one instruction to the past.
   QEMU loads one of the prior snapshots and proceeds forward to the
   desired instruction. When that step is reached, execution stops.
 - reverse-continue (or rc): Runs execution "backwards".
   QEMU tries to find a breakpoint or watchpoint by loading a prior snapshot
   and replaying the execution. Then QEMU loads snapshots again and
   replays to the latest breakpoint. When there are no breakpoints in
   the examined section of the execution, QEMU finds one more snapshot
   and tries again. After the first snapshot is processed, execution
   stops at this snapshot.
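
As a rough illustration of the two operations above, the following Python
sketch models an execution as numbered steps with snapshots taken at some of
them. It is a toy model under stated assumptions, not the code from this
series: the names reverse_step, reverse_continue, snapshots and
breakpoint_hits are hypothetical, and a real replay executes guest code
instead of scanning a set of step numbers.

```python
from bisect import bisect_right


def reverse_step(snapshots, current):
    """Pick the snapshot to load and the step to replay to.

    snapshots: sorted list of step numbers at which a snapshot exists
               (hypothetical representation).
    current:   step number where execution is stopped now.
    """
    if current <= 0:
        raise ValueError("already at the start of the recording")
    target = current - 1
    # Latest snapshot taken at or before the target step.
    idx = bisect_right(snapshots, target) - 1
    if idx < 0:
        raise ValueError("no snapshot at or before the target step")
    return snapshots[idx], target


def reverse_continue(snapshots, current, breakpoint_hits):
    """Return the step at which a reverse continue would stop.

    breakpoint_hits: set of step numbers where a breakpoint or watchpoint
                     would trigger during replay (hypothetical stand-in
                     for actually replaying the section).
    """
    earlier = [s for s in snapshots if s < current]
    if not earlier:
        raise ValueError("no snapshot before the current step")
    upper = current
    # Examine sections from the newest snapshot back to the oldest one.
    for snap in reversed(earlier):
        last_hit = max(
            (h for h in breakpoint_hits if snap <= h < upper), default=None
        )
        if last_hit is not None:
            return last_hit  # latest breakpoint in this section
        upper = snap  # nothing found: take one more snapshot and retry
    # No breakpoint anywhere: stop at the first snapshot.
    return earlier[0]
```

With snapshots at steps 0, 100 and 200 and execution stopped at step 250,
reverse_step selects snapshot 200 and target step 249. In this model the
cost of a reverse step grows with the distance from the nearest earlier
snapshot, which is one reason to take snapshots regularly.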

The set of patches includes the following modifications:
 - gdbstub update for reverse debugging support
 - functions that automatically perform reverse step and reverse
   continue operations
 - hmp/qmp commands for manipulating the replay process
 - improvement of the snapshotting for saving the execution step
   in the snapshot parameters
 - avocado-based acceptance tests for reverse debugging
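
To give an idea of how the hmp/qmp commands could be driven from a script,
here is a minimal Python sketch that only builds QMP request strings. The
command names query-replay, replay-break and replay-seek and the icount
argument are assumptions based on the patch titles and on qapi/replay.json
in upstream QEMU; they should be checked against the schema in this series.
The helper qmp_command is hypothetical, and nothing is sent to a socket.

```python
import json


def qmp_command(name, **arguments):
    """Serialize one QMP request; 'arguments' is omitted when empty."""
    message = {"execute": name}
    if arguments:
        message["arguments"] = arguments
    return json.dumps(message)


def seek_requests(icount):
    """Requests one might send to inspect the replay and seek to a step.

    The command names are assumptions (see the text above), not a
    confirmed description of this series.
    """
    return [
        qmp_command("query-replay"),               # current mode and step
        qmp_command("replay-seek", icount=icount), # go to the given step
    ]
```

A replay-break request would be built the same way, for example
qmp_command("replay-break", icount=1000).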

The patches are available in the repository:
https://github.com/ispras/qemu/tree/rr-200901

v7 changes:
 - updated snapshot info output format
 - fixed qcow2 snapshot-related tests
v6 changes:
 - removed passing err variable without checking its value afterwards
v5 changes:
 - disabled reverse debugging tests for gitlab-based testing
   due to the unidentified timeout problem
v4 changes:
 - added VM snapshot creation on gdb connect (suggested by Alex Bennée)
 - removed useless calls to error_free
 - updated poll interrupt processing
 - minor changes
v3 changes:
 - rebased to support the new build system
 - bumped avocado framework version for using fixed remote gdb client
v2 changes:
 - rebased to the latest upstream version
 - fixed replaying of the POLL interrupts after the latest debug changes

---

Pavel Dovgaluk (10):
      replay: provide an accessor for rr filename
      qapi: introduce replay.json for record/replay-related stuff
      replay: introduce info hmp/qmp command
      replay: introduce breakpoint at the specified step
      replay: implement replay-seek command
      replay: flush rr queue before loading the vmstate
      gdbstub: add reverse step support in replay mode
      gdbstub: add reverse continue support in replay mode
      replay: describe reverse debugging in docs/replay.txt
      tests/acceptance: add reverse debugging test

Pavel Dovgalyuk (4):
      replay: don't record interrupt poll
      qcow2: introduce icount field for snapshots
      migration: introduce icount field for snapshots
      replay: create temporary snapshot at debugger connection


 MAINTAINERS                           |    2 
 accel/tcg/cpu-exec.c                  |   21 ++
 accel/tcg/translator.c                |    1 
 block/qapi.c                          |   18 +-
 block/qcow2-snapshot.c                |    9 +
 block/qcow2.h                         |    3 
 blockdev.c                            |   10 +
 docs/interop/qcow2.txt                |    5 
 docs/replay.txt                       |   46 +++++
 exec.c                                |    8 +
 gdbstub.c                             |   64 ++++++
 hmp-commands-info.hx                  |   11 +
 hmp-commands.hx                       |   50 +++++
 include/block/snapshot.h              |    1 
 include/monitor/hmp.h                 |    4 
 include/sysemu/replay.h               |   26 +++
 migration/savevm.c                    |   17 +-
 qapi/block-core.json                  |   11 +
 qapi/meson.build                      |    1 
 qapi/misc.json                        |   18 --
 qapi/qapi-schema.json                 |    1 
 qapi/replay.json                      |  121 ++++++++++++
 replay/meson.build                    |    1 
 replay/replay-debugging.c             |  334 +++++++++++++++++++++++++++++++++
 replay/replay-events.c                |    4 
 replay/replay-internal.h              |    6 -
 replay/replay.c                       |   22 ++
 softmmu/cpus.c                        |   19 ++
 stubs/replay.c                        |   15 +
 tests/acceptance/reverse_debugging.py |  208 +++++++++++++++++++++
 tests/qemu-iotests/261                |   19 +-
 tests/qemu-iotests/261.out            |   51 +++--
 tests/qemu-iotests/267.out            |   48 ++---
 33 files changed, 1086 insertions(+), 89 deletions(-)
 create mode 100644 qapi/replay.json
 create mode 100644 replay/replay-debugging.c
 create mode 100644 tests/acceptance/reverse_debugging.py

--
Pavel Dovgalyuk

Comments

no-reply@patchew.org Oct. 4, 2020, 1:06 a.m. UTC | #1
Patchew URL: https://patchew.org/QEMU/160174516520.12451.10785284392438702137.stgit@pasha-ThinkPad-X280/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 160174516520.12451.10785284392438702137.stgit@pasha-ThinkPad-X280
Subject: [PATCH v7 00/14] Reverse debugging

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
ba53a9d tests/acceptance: add reverse debugging test
c6aa9c5 replay: create temporary snapshot at debugger connection
17d5c46 replay: describe reverse debugging in docs/replay.txt
1f88bff gdbstub: add reverse continue support in replay mode
72ef5d6 gdbstub: add reverse step support in replay mode
42bf7cc replay: flush rr queue before loading the vmstate
4285666 replay: implement replay-seek command
653aa62 replay: introduce breakpoint at the specified step
59ab65a replay: introduce info hmp/qmp command
1bc0b45 qapi: introduce replay.json for record/replay-related stuff
c4b17f7 migration: introduce icount field for snapshots
03d28c5 qcow2: introduce icount field for snapshots
6de69ce replay: provide an accessor for rr filename
8ba3d42 replay: don't record interrupt poll

=== OUTPUT BEGIN ===
1/14 Checking commit 8ba3d42631d9 (replay: don't record interrupt poll)
2/14 Checking commit 6de69cee86b9 (replay: provide an accessor for rr filename)
3/14 Checking commit 03d28c50b445 (qcow2: introduce icount field for snapshots)
4/14 Checking commit c4b17f7373f0 (migration: introduce icount field for snapshots)
ERROR: trailing whitespace
#251: FILE: tests/qemu-iotests/267.out:37:
+--        snap0                SIZE yyyy-mm-dd hh:mm:ss 00:00:00.000           $

ERROR: trailing whitespace
#262: FILE: tests/qemu-iotests/267.out:48:
+--        snap0                SIZE yyyy-mm-dd hh:mm:ss 00:00:00.000           $

ERROR: trailing whitespace
#273: FILE: tests/qemu-iotests/267.out:73:
+--        snap0                SIZE yyyy-mm-dd hh:mm:ss 00:00:00.000           $

ERROR: trailing whitespace
#284: FILE: tests/qemu-iotests/267.out:98:
+--        snap0                SIZE yyyy-mm-dd hh:mm:ss 00:00:00.000           $

ERROR: trailing whitespace
#295: FILE: tests/qemu-iotests/267.out:109:
+--        snap0                SIZE yyyy-mm-dd hh:mm:ss 00:00:00.000           $

ERROR: trailing whitespace
#306: FILE: tests/qemu-iotests/267.out:123:
+--        snap0                SIZE yyyy-mm-dd hh:mm:ss 00:00:00.000           $

ERROR: trailing whitespace
#317: FILE: tests/qemu-iotests/267.out:138:
+--        snap0                SIZE yyyy-mm-dd hh:mm:ss 00:00:00.000           $

ERROR: trailing whitespace
#328: FILE: tests/qemu-iotests/267.out:149:
+--        snap0                SIZE yyyy-mm-dd hh:mm:ss 00:00:00.000           $

ERROR: trailing whitespace
#337: FILE: tests/qemu-iotests/267.out:156:
+1         snap0                SIZE yyyy-mm-dd hh:mm:ss 00:00:00.000           $

ERROR: trailing whitespace
#348: FILE: tests/qemu-iotests/267.out:170:
+--        snap0                SIZE yyyy-mm-dd hh:mm:ss 00:00:00.000           $

ERROR: trailing whitespace
#357: FILE: tests/qemu-iotests/267.out:177:
+1         snap0                SIZE yyyy-mm-dd hh:mm:ss 00:00:00.000           $

ERROR: trailing whitespace
#363: FILE: tests/qemu-iotests/267.out:181:
+1         snap0                SIZE yyyy-mm-dd hh:mm:ss 00:00:00.000           $

total: 12 errors, 0 warnings, 275 lines checked

Patch 4/14 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

5/14 Checking commit 1bc0b45203ea (qapi: introduce replay.json for record/replay-related stuff)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#93: 
new file mode 100644

total: 0 errors, 1 warnings, 78 lines checked

Patch 5/14 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
6/14 Checking commit 59ab65a00e3b (replay: introduce info hmp/qmp command)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#122: 
new file mode 100644

total: 0 errors, 1 warnings, 120 lines checked

Patch 6/14 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
7/14 Checking commit 653aa622c001 (replay: introduce breakpoint at the specified step)
8/14 Checking commit 4285666198ee (replay: implement replay-seek command)
9/14 Checking commit 42bf7cc3ae4e (replay: flush rr queue before loading the vmstate)
10/14 Checking commit 72ef5d64fb17 (gdbstub: add reverse step support in replay mode)
11/14 Checking commit 1f88bff3b6ee (gdbstub: add reverse continue support in replay mode)
12/14 Checking commit 17d5c466b4de (replay: describe reverse debugging in docs/replay.txt)
13/14 Checking commit c6aa9c57bfcf (replay: create temporary snapshot at debugger connection)
14/14 Checking commit ba53a9d9a49e (tests/acceptance: add reverse debugging test)
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#41: 
new file mode 100644

total: 0 errors, 1 warnings, 215 lines checked

Patch 14/14 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
=== OUTPUT END ===

Test command exited with code: 1
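
The "trailing whitespace" errors above can be reproduced locally before
sending a series. The following Python sketch is a simplified stand-in for
that single check, not a replacement for scripts/checkpatch.pl: the function
name is hypothetical and it only looks at added lines of a unified diff.

```python
def trailing_whitespace_in_added_lines(diff_text):
    """Return (diff_line_number, line) for added lines ending in whitespace.

    Simplification: any line starting with '+++' is treated as a file
    header, so an added line whose content begins with '++' is skipped.
    """
    found = []
    for number, line in enumerate(diff_text.split("\n"), start=1):
        if line.startswith("+++"):
            continue  # '+++ b/path' header, not an added line
        if line.startswith("+") and line != line.rstrip():
            found.append((number, line))
    return found
```

Here all twelve reports are in tests/qemu-iotests/267.out, a reference
output file, so whether the whitespace can be dropped depends on what the
test itself prints.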


The full log is available at
http://patchew.org/logs/160174516520.12451.10785284392438702137.stgit@pasha-ThinkPad-X280/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

Paolo Bonzini Oct. 5, 2020, 12:27 p.m. UTC | #2
On 03/10/20 19:12, Pavel Dovgalyuk wrote:
> GDB remote protocol supports reverse debugging of the targets.
> It includes 'reverse step' and 'reverse continue' operations.
> The first one finds the previous step of the execution,
> and the second one is intended to stop at the last breakpoint that
> would happen when the program is executed normally.
>
> Reverse debugging is possible in the replay mode, when at least
> one snapshot was created at the record or replay phase.
> QEMU can use these snapshots for travelling back in time with GDB.
>
> Running the execution in replay mode allows using GDB reverse debugging
> commands:
>  - reverse-stepi (or rsi): Steps one instruction to the past.
>    QEMU loads one of the prior snapshots and proceeds forward to the
>    desired instruction. When that step is reached, execution stops.
>  - reverse-continue (or rc): Runs execution "backwards".
>    QEMU tries to find a breakpoint or watchpoint by loading a prior snapshot
>    and replaying the execution. Then QEMU loads snapshots again and
>    replays to the latest breakpoint. When there are no breakpoints in
>    the examined section of the execution, QEMU finds one more snapshot
>    and tries again. After the first snapshot is processed, execution
>    stops at this snapshot.
>
> The set of patches includes the following modifications:
>  - gdbstub update for reverse debugging support
>  - functions that automatically perform reverse step and reverse
>    continue operations
>  - hmp/qmp commands for manipulating the replay process
>  - improvement of the snapshotting for saving the execution step
>    in the snapshot parameters
>  - avocado-based acceptance tests for reverse debugging
>
> The patches are available in the repository:
> https://github.com/ispras/qemu/tree/rr-200901
>
> v7 changes:
>  - updated snapshot info output format
>  - fixed qcow2 snapshot-related tests

Sorry, I'm still seeing a failure

timeout 15  /home/travis/build/bonzini/qemu/build/qemu-system-aarch64 -monitor none -display none -chardev file,path=memory-replay.out,id=output -icount shift=5,rr=replay,rrfile=record.bin  -M virt -cpu max -display none -semihosting-config enable=on,target=native,chardev=output -kernel memory

qemu-system-aarch64: terminating on signal 15 from pid 38312 (timeout)

https://travis-ci.com/gitlab/bonzini/qemu/jobs/395029273

Paolo

Pavel Dovgalyuk Oct. 5, 2020, 1:45 p.m. UTC | #3
On 05.10.2020 15:27, Paolo Bonzini wrote:
> On 03/10/20 19:12, Pavel Dovgalyuk wrote:
>> GDB remote protocol supports reverse debugging of the targets.
>> It includes 'reverse step' and 'reverse continue' operations.
>> The first one finds the previous step of the execution,
>> and the second one is intended to stop at the last breakpoint that
>> would happen when the program is executed normally.
>>
>> Reverse debugging is possible in the replay mode, when at least
>> one snapshot was created at the record or replay phase.
>> QEMU can use these snapshots for travelling back in time with GDB.
>>
>> Running the execution in replay mode allows using GDB reverse debugging
>> commands:
>>   - reverse-stepi (or rsi): Steps one instruction to the past.
>>     QEMU loads one of the prior snapshots and proceeds forward to the
>>     desired instruction. When that step is reached, execution stops.
>>   - reverse-continue (or rc): Runs execution "backwards".
>>     QEMU tries to find a breakpoint or watchpoint by loading a prior snapshot
>>     and replaying the execution. Then QEMU loads snapshots again and
>>     replays to the latest breakpoint. When there are no breakpoints in
>>     the examined section of the execution, QEMU finds one more snapshot
>>     and tries again. After the first snapshot is processed, execution
>>     stops at this snapshot.
>>
>> The set of patches includes the following modifications:
>>   - gdbstub update for reverse debugging support
>>   - functions that automatically perform reverse step and reverse
>>     continue operations
>>   - hmp/qmp commands for manipulating the replay process
>>   - improvement of the snapshotting for saving the execution step
>>     in the snapshot parameters
>>   - avocado-based acceptance tests for reverse debugging
>>
>> The patches are available in the repository:
>> https://github.com/ispras/qemu/tree/rr-200901
>>
>> v7 changes:
>>   - updated snapshot info output format
>>   - fixed qcow2 snapshot-related tests
>
> Sorry, I'm still seeing a failure
>
> timeout 15  /home/travis/build/bonzini/qemu/build/qemu-system-aarch64 -monitor none -display none -chardev file,path=memory-replay.out,id=output -icount shift=5,rr=replay,rrfile=record.bin  -M virt -cpu max -display none -semihosting-config enable=on,target=native,chardev=output -kernel memory
>
> qemu-system-aarch64: terminating on signal 15 from pid 38312 (timeout)

That's very strange.
None of the patches affect RR for AArch64. Is this a real failure or
just a coincidence?
I also tried running this test on my local machine and got a normal
execution time for replay:
real	0m0,968s
user	0m0,657s
sys	0m0,625s

By the way, this is an early RR test. Now we have more complex (and
easier to reproduce) avocado-based RR tests (for aarch64 too).
In this test, record and replay are split into two separate "tests",
which can race if they run in parallel for some reason.
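
To illustrate the ordering constraint, here is a minimal Python sketch that
runs the record phase to completion before starting the replay phase, so
that the replay never reads a half-written log. The -icount option string
follows the command line quoted above, with rr=record assumed for the
recording run; the function names and the run parameter are hypothetical.

```python
import subprocess


def rr_args(mode, rrfile="record.bin"):
    """Build the -icount option for one record/replay phase."""
    if mode not in ("record", "replay"):
        raise ValueError("mode must be 'record' or 'replay'")
    return ["-icount", f"shift=5,rr={mode},rrfile={rrfile}"]


def record_then_replay(qemu_binary, common_args, run=subprocess.run,
                       timeout=15):
    """Run both phases strictly one after the other.

    'run' is injectable so the ordering can be checked without QEMU.
    With the default subprocess.run, check=True raises if a phase fails
    and a timeout raises subprocess.TimeoutExpired.
    """
    for mode in ("record", "replay"):
        argv = [qemu_binary, *common_args, *rr_args(mode)]
        run(argv, check=True, timeout=timeout)
```

Keeping both phases in one sequential function avoids depending on the
order in which a test runner schedules two separate targets.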

Shouldn't we just drop this one?

>
> https://travis-ci.com/gitlab/bonzini/qemu/jobs/395029273
>
> Paolo

Paolo Bonzini Oct. 5, 2020, 1:51 p.m. UTC | #4
On 05/10/20 15:45, Pavel Dovgalyuk wrote:
> That's very strange.
> None of the patches affect RR for AArch64. Is this a real failure or
> just a coincidence?
> I also tried running this test on my local machine and got a normal
> execution time for replay:
> real    0m0,968s
> user    0m0,657s
> sys    0m0,625s
>
> By the way, this is an early RR test. Now we have more complex (and
> easier to reproduce) avocado-based RR tests (for aarch64 too).
> In this test, record and replay are split into two separate "tests",
> which can race if they run in parallel for some reason.

Good to know.  I'll keep this series in my tree so that it reruns, and
will keep an eye on whether I see similar failures in the next few days.
I have seen other similar timeouts (e.g. in xtensa test-timer) that
weren't related to RR so it's possible that it's a false positive.

> Shouldn't we just drop this one?


Feel free to send a patch to Alex for it.

Paolo