mbox series

[v3,00/11] MTTCG fix-ups for 2.9

Message ID 20170307155054.5833-1-alex.bennee@linaro.org
Headers show
Series MTTCG fix-ups for 2.9 | expand

Message

Alex Bennée March 7, 2017, 3:50 p.m. UTC
Hi,

This is the latest iteration of fixes for problems introduced by
MTTCG. Aside from the usual slew of addressing review comments and
applying tags I've also pulled in Yongbok Kim's MIPS patch (to replace
mine) and Paolo's VMEXIT fix for the problem reported by Alexander
Boettcher.

Unless anyone shouts I'll wrap these up into a pull request for Peter
tomorrow.

I'm hoping the relative quite on the list is because other
architectures have so far been un-affected by the changes ;-)

However it is a timely reminder with the 2.9 rc0 coming up that now is
the time for target maintainers to check everything is still working
as expected.

The next thing on my list it to look at the icount problems and review
Paolo's fixes for it. However those fixes should go in a separate
series and I assume via Paolo's tree.

Thanks,

Alex.

Alex Bennée (9):
  vl/cpus: be smarter with icount and MTTCG
  target/i386/cpu.h: declare TCG_GUEST_DEFAULT_MO
  cpus.c: add additional error_report when !TARGET_SUPPORT_MTTCG
  sparc/sparc64: grab BQL before calling cpu_check_irqs
  s390x/misc_helper.c: wrap IO instructions in BQL
  target/xtensa: hold BQL for interrupt processing
  translate-all: exit cpu_restore_state early if translating
  target/arm/helper: make it clear the EC field is also in hex
  hw/intc/arm_gic: modernise the DPRINTF

Paolo Bonzini (1):
  target-i386: defer VMEXIT to do_interrupt

Yongbok Kim (1):
  target/mips: hold BQL for timer interrupts

 cpus.c                      | 11 +++++++----
 hw/intc/arm_gic.c           | 13 +++++++++----
 hw/sparc/sun4m.c            |  3 +++
 hw/sparc64/sparc64.c        |  3 +++
 target/arm/helper.c         |  2 +-
 target/i386/cpu.h           |  5 +++++
 target/i386/seg_helper.c    | 20 +++++++++++---------
 target/i386/svm_helper.c    | 22 +++++++++++++---------
 target/mips/op_helper.c     | 21 ++++++++++++++++++---
 target/s390x/misc_helper.c  | 21 +++++++++++++++++++++
 target/sparc/int64_helper.c |  3 +++
 target/sparc/win_helper.c   | 13 +++++++++++++
 target/xtensa/helper.c      |  1 +
 target/xtensa/op_helper.c   |  7 +++++++
 translate-all.c             | 13 +++++++++++++
 vl.c                        |  7 ++-----
 16 files changed, 130 insertions(+), 35 deletions(-)

-- 
2.11.0

Comments

Pranith Kumar March 7, 2017, 8:25 p.m. UTC | #1
Hi Alex,

On Tue, Mar 7, 2017 at 10:50 AM, Alex Bennée <alex.bennee@linaro.org> wrote:
> Hi,

>

> This is the latest iteration of fixes for problems introduced by

> MTTCG. Aside from the usual slew of addressing review comments and

> applying tags I've also pulled in Yongbok Kim's MIPS patch (to replace

> mine) and Paolo's VMEXIT fix for the problem reported by Alexander

> Boettcher.

>

> Unless anyone shouts I'll wrap these up into a pull request for Peter

> tomorrow.



Can you also pick up this patch? I am not sure who else will.

http://patchwork.ozlabs.org/patch/733293/

Thanks,
-- 
Pranith
Pranith Kumar March 8, 2017, 2:20 p.m. UTC | #2
On Wed, Mar 8, 2017 at 2:39 AM, Alex Bennée <alex.bennee@linaro.org> wrote:
>

> Pranith Kumar <bobby.prani@gmail.com> writes:

>

>> Hi Alex,

>>

>> On Tue, Mar 7, 2017 at 10:50 AM, Alex Bennée <alex.bennee@linaro.org> wrote:

>>> Hi,

>>>

>>> This is the latest iteration of fixes for problems introduced by

>>> MTTCG. Aside from the usual slew of addressing review comments and

>>> applying tags I've also pulled in Yongbok Kim's MIPS patch (to replace

>>> mine) and Paolo's VMEXIT fix for the problem reported by Alexander

>>> Boettcher.

>>>

>>> Unless anyone shouts I'll wrap these up into a pull request for Peter

>>> tomorrow.

>>

>>

>> Can you also pick up this patch? I am not sure who else will.

>>

>> http://patchwork.ozlabs.org/patch/733293/

>

> Does the problem only exhibit itself with --accel tcg,thread=multi? I

> was figuring it would go in with a larger x86 MTTCG enabling patch set

> post 2.9. If it can occur with the defaults now I'll happily add it

> to the pullreq I'm preparing.

>


It happens only with thread=multi, so I guess later on is OK.

Thanks,
-- 
Pranith
Pavel Dovgalyuk March 13, 2017, 12:32 p.m. UTC | #3
Hi,

> From: mttcg-request@listserver.greensocs.com [mailto:mttcg-request@listserver.greensocs.com]

> 

> The next thing on my list it to look at the icount problems and review

> Paolo's fixes for it. However those fixes should go in a separate

> series and I assume via Paolo's tree.

 
Do you mean those problems that completely broke icount?
Are you going to fix it?


> Alex Bennée (9):

>   vl/cpus: be smarter with icount and MTTCG

>   target/i386/cpu.h: declare TCG_GUEST_DEFAULT_MO

>   cpus.c: add additional error_report when !TARGET_SUPPORT_MTTCG

>   sparc/sparc64: grab BQL before calling cpu_check_irqs

>   s390x/misc_helper.c: wrap IO instructions in BQL

>   target/xtensa: hold BQL for interrupt processing

>   translate-all: exit cpu_restore_state early if translating

>   target/arm/helper: make it clear the EC field is also in hex

>   hw/intc/arm_gic: modernise the DPRINTF

> 

> Paolo Bonzini (1):

>   target-i386: defer VMEXIT to do_interrupt

> 

> Yongbok Kim (1):

>   target/mips: hold BQL for timer interrupts



Pavel Dovgalyuk
Alex Bennée March 13, 2017, 1:16 p.m. UTC | #4
Pavel Dovgalyuk <dovgaluk@ispras.ru> writes:

> Hi,

>

>> From: mttcg-request@listserver.greensocs.com [mailto:mttcg-request@listserver.greensocs.com]

>>

>> The next thing on my list it to look at the icount problems and review

>> Paolo's fixes for it. However those fixes should go in a separate

>> series and I assume via Paolo's tree.

>

> Do you mean those problems that completely broke icount?


Completely broke is a bit strong. Certainly I tested icount on my
patches but I missed the pathological failure mode that led to the
interaction between the BQL lock breaking patch and running a real
guest. It didn't break icount so much as slow down guests so much they
never got round to finishing their IRQ handling.

That certainly seems to be fixed by Paolo's patches.

> Are you going to fix it?


Yes. It is certainly not intentional to regress icount and it needs to
be fixed before we release 2.9.

If you can point me towards the record/replay test cases I'll make sure
I run them on the updated series.

>

>

>> Alex Bennée (9):

>>   vl/cpus: be smarter with icount and MTTCG

>>   target/i386/cpu.h: declare TCG_GUEST_DEFAULT_MO

>>   cpus.c: add additional error_report when !TARGET_SUPPORT_MTTCG

>>   sparc/sparc64: grab BQL before calling cpu_check_irqs

>>   s390x/misc_helper.c: wrap IO instructions in BQL

>>   target/xtensa: hold BQL for interrupt processing

>>   translate-all: exit cpu_restore_state early if translating

>>   target/arm/helper: make it clear the EC field is also in hex

>>   hw/intc/arm_gic: modernise the DPRINTF

>>

>> Paolo Bonzini (1):

>>   target-i386: defer VMEXIT to do_interrupt

>>

>> Yongbok Kim (1):

>>   target/mips: hold BQL for timer interrupts

>

>

> Pavel Dovgalyuk



--
Alex Bennée
Pavel Dovgalyuk March 14, 2017, 12:15 p.m. UTC | #5
> From: Alex Bennée [mailto:alex.bennee@linaro.org]

> Pavel Dovgalyuk <dovgaluk@ispras.ru> writes:

> >> From: mttcg-request@listserver.greensocs.com [mailto:mttcg-

> request@listserver.greensocs.com]

> >>

> >> The next thing on my list it to look at the icount problems and review

> >> Paolo's fixes for it. However those fixes should go in a separate

> >> series and I assume via Paolo's tree.

> >

> > Do you mean those problems that completely broke icount?

> 

> Completely broke is a bit strong. Certainly I tested icount on my

> patches but I missed the pathological failure mode that led to the

> interaction between the BQL lock breaking patch and running a real

> guest. It didn't break icount so much as slow down guests so much they

> never got round to finishing their IRQ handling.


That might look as slowing down in regular icount mode.
But it becomes non-deterministic in record/replay mode.
Therefore none of the recorded scenarios may be replayed.

I tried the simplest command lines:

qemu-system-i386.exe -icount shift=7,rr=record,rrfile=replay.bin -net none 

qemu-system-i386.exe -icount shift=7,rr=replay,rrfile=replay.bin -net none

First one was used to record execution until BIOS will print its startup info.
The second one is for replay, but it can replay about 200000 instructions
until the problems with icount manifest itself - the execution does not proceed.

> That certainly seems to be fixed by Paolo's patches.

> 

> > Are you going to fix it?

> 

> Yes. It is certainly not intentional to regress icount and it needs to

> be fixed before we release 2.9.

> 

> If you can point me towards the record/replay test cases I'll make sure

> I run them on the updated series.



Pavel Dovgalyuk
Alex Bennée March 14, 2017, 3:18 p.m. UTC | #6
Pavel Dovgalyuk <dovgaluk@ispras.ru> writes:

>> From: Alex Bennée [mailto:alex.bennee@linaro.org]

>> Pavel Dovgalyuk <dovgaluk@ispras.ru> writes:

>> >> From: mttcg-request@listserver.greensocs.com [mailto:mttcg-

>> request@listserver.greensocs.com]

>> >>

>> >> The next thing on my list it to look at the icount problems and review

>> >> Paolo's fixes for it. However those fixes should go in a separate

>> >> series and I assume via Paolo's tree.

>> >

>> > Do you mean those problems that completely broke icount?

>>

>> Completely broke is a bit strong. Certainly I tested icount on my

>> patches but I missed the pathological failure mode that led to the

>> interaction between the BQL lock breaking patch and running a real

>> guest. It didn't break icount so much as slow down guests so much they

>> never got round to finishing their IRQ handling.

>

> That might look as slowing down in regular icount mode.

> But it becomes non-deterministic in record/replay mode.

> Therefore none of the recorded scenarios may be replayed.

>

> I tried the simplest command lines:

>

> qemu-system-i386.exe -icount shift=7,rr=record,rrfile=replay.bin -net

> none


Running this against 2421f381dc (pre-merge of MTTCG) from the source
tree generates no output. My command is on Linux:

  /x86_64-softmmu/qemu-system-x86_64 -icount shift=7,rr=record,rrfile=replay.bin -net none

Do I need to specify the BIOS manually?

>

> qemu-system-i386.exe -icount shift=7,rr=replay,rrfile=replay.bin -net none

>

> First one was used to record execution until BIOS will print its startup info.

> The second one is for replay, but it can replay about 200000 instructions

> until the problems with icount manifest itself - the execution does

> not proceed.


What error message does it give? How do you know how many instructions
have been executed?

I ran the following test on both pre-mttcg-merge and my current HEAD
which includes Paolo's fix series:

 ./arm-softmmu/qemu-system-arm -machine type=virt \
    -display none -smp 1 -m 4096 -cpu cortex-a15 \
    -kernel ../images/aarch32-current-linux-initrd-guest.img
    -append "console=ttyAMA0" -serial mon:stdio \
    -net none \
    -icount shift=7,rr=record,rrfile=replay.bin

And then:

 ./arm-softmmu/qemu-system-arm -machine type=virt \
    -display none -smp 1 -m 4096 -cpu cortex-a15 \
    -kernel ../images/aarch32-current-linux-initrd-guest.img
    -append "console=ttyAMA0" -serial mon:stdio \
    -net none \
    -icount shift=7,rr=replay,rrfile=replay.bin

And they both ran the same. However I kept running into:

 [    3.542408] RPC: Registered tcp NFSv4.1 backchannel transport module.
qemu-system-arm: Missing character write event in the replay log

This seems to be a pre-existing

>

>> That certainly seems to be fixed by Paolo's patches.

>>

>> > Are you going to fix it?

>>

>> Yes. It is certainly not intentional to regress icount and it needs to

>> be fixed before we release 2.9.

>>

>> If you can point me towards the record/replay test cases I'll make sure

>> I run them on the updated series.

>

>

> Pavel Dovgalyuk



--
Alex Bennée
Pavel Dovgalyuk March 16, 2017, 8:34 a.m. UTC | #7
> From: Alex Bennée [mailto:alex.bennee@linaro.org]

> Pavel Dovgalyuk <dovgaluk@ispras.ru> writes:

> >> From: Alex Bennée [mailto:alex.bennee@linaro.org]

> >> Pavel Dovgalyuk <dovgaluk@ispras.ru> writes:

> >> >> From: mttcg-request@listserver.greensocs.com [mailto:mttcg-

> >> request@listserver.greensocs.com]

> >> >>

> >> >> The next thing on my list it to look at the icount problems and review

> >> >> Paolo's fixes for it. However those fixes should go in a separate

> >> >> series and I assume via Paolo's tree.

> >> >

> >> > Do you mean those problems that completely broke icount?

> >>

> >> Completely broke is a bit strong. Certainly I tested icount on my

> >> patches but I missed the pathological failure mode that led to the

> >> interaction between the BQL lock breaking patch and running a real

> >> guest. It didn't break icount so much as slow down guests so much they

> >> never got round to finishing their IRQ handling.

> >

> > That might look as slowing down in regular icount mode.

> > But it becomes non-deterministic in record/replay mode.

> > Therefore none of the recorded scenarios may be replayed.

> >

> > I tried the simplest command lines:

> >

> > qemu-system-i386.exe -icount shift=7,rr=record,rrfile=replay.bin -net

> > none

> 

> Running this against 2421f381dc (pre-merge of MTTCG) from the source

> tree generates no output. My command is on Linux:

> 

>   /x86_64-softmmu/qemu-system-x86_64 -icount shift=7,rr=record,rrfile=replay.bin -net none

> 

> Do I need to specify the BIOS manually?


No, this command line works well for me. BIOS executes and shows some messages.
Do you have any graphical output for QEMU?

> > qemu-system-i386.exe -icount shift=7,rr=replay,rrfile=replay.bin -net none

> >

> > First one was used to record execution until BIOS will print its startup info.

> > The second one is for replay, but it can replay about 200000 instructions

> > until the problems with icount manifest itself - the execution does

> > not proceed.

> 

> What error message does it give? How do you know how many instructions

> have been executed?


It hangs after executing about 200000 instructions.
I checked -d exec logs.

> I ran the following test on both pre-mttcg-merge and my current HEAD

> which includes Paolo's fix series:

> 

>  ./arm-softmmu/qemu-system-arm -machine type=virt \

>     -display none -smp 1 -m 4096 -cpu cortex-a15 \

>     -kernel ../images/aarch32-current-linux-initrd-guest.img

>     -append "console=ttyAMA0" -serial mon:stdio \

>     -net none \

>     -icount shift=7,rr=record,rrfile=replay.bin

> 

> And then:

> 

>  ./arm-softmmu/qemu-system-arm -machine type=virt \

>     -display none -smp 1 -m 4096 -cpu cortex-a15 \

>     -kernel ../images/aarch32-current-linux-initrd-guest.img

>     -append "console=ttyAMA0" -serial mon:stdio \

>     -net none \

>     -icount shift=7,rr=replay,rrfile=replay.bin

> 

> And they both ran the same. However I kept running into:

> 

>  [    3.542408] RPC: Registered tcp NFSv4.1 backchannel transport module.

> qemu-system-arm: Missing character write event in the replay log

> 

> This seems to be a pre-existing


Does it mean that qemu-arm platform includes some serial devices that were
not detected by the replay?

Pavel Dovgalyuk
Alex Bennée March 16, 2017, 1:06 p.m. UTC | #8
Pavel Dovgalyuk <dovgaluk@ispras.ru> writes:

>> From: Alex Bennée [mailto:alex.bennee@linaro.org]

>> Pavel Dovgalyuk <dovgaluk@ispras.ru> writes:

>> >> From: Alex Bennée [mailto:alex.bennee@linaro.org]

>> >> Pavel Dovgalyuk <dovgaluk@ispras.ru> writes:

>> >> >> From: mttcg-request@listserver.greensocs.com [mailto:mttcg-

>> >> request@listserver.greensocs.com]

>> >> >>

>> >> >> The next thing on my list it to look at the icount problems and review

>> >> >> Paolo's fixes for it. However those fixes should go in a separate

>> >> >> series and I assume via Paolo's tree.

>> >> >

>> >> > Do you mean those problems that completely broke icount?

>> >>

>> >> Completely broke is a bit strong. Certainly I tested icount on my

>> >> patches but I missed the pathological failure mode that led to the

>> >> interaction between the BQL lock breaking patch and running a real

>> >> guest. It didn't break icount so much as slow down guests so much they

>> >> never got round to finishing their IRQ handling.

>> >

>> > That might look as slowing down in regular icount mode.

>> > But it becomes non-deterministic in record/replay mode.

>> > Therefore none of the recorded scenarios may be replayed.

>> >

>> > I tried the simplest command lines:

>> >

>> > qemu-system-i386.exe -icount shift=7,rr=record,rrfile=replay.bin -net

>> > none

>>

>> Running this against 2421f381dc (pre-merge of MTTCG) from the source

>> tree generates no output. My command is on Linux:

>>

>>   /x86_64-softmmu/qemu-system-x86_64 -icount shift=7,rr=record,rrfile=replay.bin -net none

>>

>> Do I need to specify the BIOS manually?

>

> No, this command line works well for me. BIOS executes and shows some messages.

> Do you have any graphical output for QEMU?


Ahh I was running remotely. OK I can now get output on the graphics
display and the described hang. I'll have a look at it.

>

>> > qemu-system-i386.exe -icount shift=7,rr=replay,rrfile=replay.bin -net none

>> >

>> > First one was used to record execution until BIOS will print its startup info.

>> > The second one is for replay, but it can replay about 200000 instructions

>> > until the problems with icount manifest itself - the execution does

>> > not proceed.

>>

>> What error message does it give? How do you know how many instructions

>> have been executed?

>

> It hangs after executing about 200000 instructions.

> I checked -d exec logs.

>

>> I ran the following test on both pre-mttcg-merge and my current HEAD

>> which includes Paolo's fix series:

>>

>>  ./arm-softmmu/qemu-system-arm -machine type=virt \

>>     -display none -smp 1 -m 4096 -cpu cortex-a15 \

>>     -kernel ../images/aarch32-current-linux-initrd-guest.img

>>     -append "console=ttyAMA0" -serial mon:stdio \

>>     -net none \

>>     -icount shift=7,rr=record,rrfile=replay.bin

>>

>> And then:

>>

>>  ./arm-softmmu/qemu-system-arm -machine type=virt \

>>     -display none -smp 1 -m 4096 -cpu cortex-a15 \

>>     -kernel ../images/aarch32-current-linux-initrd-guest.img

>>     -append "console=ttyAMA0" -serial mon:stdio \

>>     -net none \

>>     -icount shift=7,rr=replay,rrfile=replay.bin

>>

>> And they both ran the same. However I kept running into:

>>

>>  [    3.542408] RPC: Registered tcp NFSv4.1 backchannel transport module.

>> qemu-system-arm: Missing character write event in the replay log

>>

>> This seems to be a pre-existing

>

> Does it mean that qemu-arm platform includes some serial devices that were

> not detected by the replay?


It's the standard ARM platform serial port. When I read replay.txt is
said:

 * Supports i386, x86_64, and ARM hardware platforms.

Should we add some qualifications about which machine types are
supported? What is you ARM test case for record/replay?

--
Alex Bennée
Pavel Dovgalyuk March 16, 2017, 2:46 p.m. UTC | #9
> From: Alex Bennée [mailto:alex.bennee@linaro.org]

> Pavel Dovgalyuk <dovgaluk@ispras.ru> writes:

> >> From: Alex Bennée [mailto:alex.bennee@linaro.org]

> >> Pavel Dovgalyuk <dovgaluk@ispras.ru> writes:

> >

> >> I ran the following test on both pre-mttcg-merge and my current HEAD

> >> which includes Paolo's fix series:

> >>

> >>  ./arm-softmmu/qemu-system-arm -machine type=virt \

> >>     -display none -smp 1 -m 4096 -cpu cortex-a15 \

> >>     -kernel ../images/aarch32-current-linux-initrd-guest.img

> >>     -append "console=ttyAMA0" -serial mon:stdio \

> >>     -net none \

> >>     -icount shift=7,rr=record,rrfile=replay.bin

> >>

> >> And then:

> >>

> >>  ./arm-softmmu/qemu-system-arm -machine type=virt \

> >>     -display none -smp 1 -m 4096 -cpu cortex-a15 \

> >>     -kernel ../images/aarch32-current-linux-initrd-guest.img

> >>     -append "console=ttyAMA0" -serial mon:stdio \

> >>     -net none \

> >>     -icount shift=7,rr=replay,rrfile=replay.bin

> >>

> >> And they both ran the same. However I kept running into:

> >>

> >>  [    3.542408] RPC: Registered tcp NFSv4.1 backchannel transport module.

> >> qemu-system-arm: Missing character write event in the replay log

> >>

> >> This seems to be a pre-existing

> >

> > Does it mean that qemu-arm platform includes some serial devices that were

> > not detected by the replay?

> 

> It's the standard ARM platform serial port. When I read replay.txt is

> said:

> 

>  * Supports i386, x86_64, and ARM hardware platforms.

> 

> Should we add some qualifications about which machine types are

> supported? What is you ARM test case for record/replay?


I tested on vexpress-a9 platform with Debian wheezy.

Pavel Dovgalyuk
Alex Bennée March 22, 2017, 2:17 p.m. UTC | #10
Pavel Dovgalyuk <dovgaluk@ispras.ru> writes:

>> From: Alex Bennée [mailto:alex.bennee@linaro.org]

>> Pavel Dovgalyuk <dovgaluk@ispras.ru> writes:

>> >> From: Alex Bennée [mailto:alex.bennee@linaro.org]

>> >> Pavel Dovgalyuk <dovgaluk@ispras.ru> writes:

>> >

>> >> I ran the following test on both pre-mttcg-merge and my current HEAD

>> >> which includes Paolo's fix series:

>> >>

>> >>  ./arm-softmmu/qemu-system-arm -machine type=virt \

>> >>     -display none -smp 1 -m 4096 -cpu cortex-a15 \

>> >>     -kernel ../images/aarch32-current-linux-initrd-guest.img

>> >>     -append "console=ttyAMA0" -serial mon:stdio \

>> >>     -net none \

>> >>     -icount shift=7,rr=record,rrfile=replay.bin

>> >>

>> >> And then:

>> >>

>> >>  ./arm-softmmu/qemu-system-arm -machine type=virt \

>> >>     -display none -smp 1 -m 4096 -cpu cortex-a15 \

>> >>     -kernel ../images/aarch32-current-linux-initrd-guest.img

>> >>     -append "console=ttyAMA0" -serial mon:stdio \

>> >>     -net none \

>> >>     -icount shift=7,rr=replay,rrfile=replay.bin

>> >>

>> >> And they both ran the same. However I kept running into:

>> >>

>> >>  [    3.542408] RPC: Registered tcp NFSv4.1 backchannel transport module.

>> >> qemu-system-arm: Missing character write event in the replay log

>> >>

>> >> This seems to be a pre-existing

>> >

>> > Does it mean that qemu-arm platform includes some serial devices that were

>> > not detected by the replay?

>>

>> It's the standard ARM platform serial port. When I read replay.txt is

>> said:

>>

>>  * Supports i386, x86_64, and ARM hardware platforms.

>>

>> Should we add some qualifications about which machine types are

>> supported? What is you ARM test case for record/replay?

>

> I tested on vexpress-a9 platform with Debian wheezy.


Thanks for that. I now have a test case that I can reproduce failures on
without needing graphics.

I've been investigating if there are any problems with the timer
processing now they have been moved into the TCG thread. The record
stage seems to work fine but I'm having difficulty figuring out why
playback freezes. It seems we get to a point where we are stuck waiting
for a suspiciously exact timer deadline:

  replay_account_executed_instructions: instructions_count reached zero
  handle_icount_deadline: deadline=471077745280
  replay_account_executed_instructions: instructions_count reached zero
  handle_icount_deadline: deadline=10000000
  [Switching to Thread 0x7fff7a7be700 (LWP 733)]

  Thread 3 "qemu-system-arm" hit Breakpoint 1, handle_icount_deadline () at /home/alex/lsrc/qemu/qemu.git/cpus.c:1184
  1184                    fprintf(stderr,"%s: no change to deadline=%ld for %ld\n",
  (gdb) c
  Continuing.
  handle_icount_deadline: no change to deadline=10000000 for 11

  Thread 3 "qemu-system-arm" hit Breakpoint 1, handle_icount_deadline () at /home/alex/lsrc/qemu/qemu.git/cpus.c:1184
  1184                    fprintf(stderr,"%s: no change to deadline=%ld for %ld\n",
  (gdb) c 10
  Will ignore next 9 crossings of breakpoint 1.  Continuing.
  handle_icount_deadline: no change to deadline=10000000 for 12
  handle_icount_deadline: no change to deadline=10000000 for 13
  handle_icount_deadline: no change to deadline=10000000 for 14
  handle_icount_deadline: no change to deadline=10000000 for 15
  handle_icount_deadline: no change to deadline=10000000 for 16
  handle_icount_deadline: no change to deadline=10000000 for 17
  handle_icount_deadline: no change to deadline=10000000 for 18
  handle_icount_deadline: no change to deadline=10000000 for 19
  handle_icount_deadline: no change to deadline=10000000 for 20
  handle_icount_deadline: no change to deadline=10000000 for 21

  (gdb) p replay_state
  $1 = {cached_clock = {1490191319270134000, 0}, current_step = 267869922, instructions_count = 0, data_kind = 12, has_unread_data = 1, file_offset = 0, block_request_id = 0}

But the timers are all enabled:

  (gdb) qemu timers
  Processing Realtime timers
    clock QEMU_CLOCK_REALTIME is enabled:true, last:-9223372036854775808
  Processing Virtual timers
    clock QEMU_CLOCK_VIRTUAL is enabled:true, last:-9223372036854775808
      timer 34297350016/1 (cb:0x555555a2e952 <ptimer_tick>)
      timer 503290000000/1000000 (cb:0x555555bd4d1d <ra_timer_handler>)
  Processing Host timers
    clock QEMU_CLOCK_HOST is enabled:true, last:1490191319270134000
  Processing Virtual RT timers
    clock QEMU_CLOCK_VIRTUAL_RT is enabled:true, last:-9223372036854775808

One area I wanted to look back at was comparing the record trace from
pre-mttcg-merge to now to see if any information was missing. However
the bin file has quite a lot of noise in it from changing fields so I
was wondering do you have any sort of dumper tool for comparing the
traces? If not is the format of the trace file specified anywhere?

I found:
  #define REPLAY_VERSION              0xe02005

but couldn't find any more than that. Is the format basically just
described by the calls to replay_put_byte?

--
Alex Bennée
Pavel Dovgalyuk March 29, 2017, 6:06 a.m. UTC | #11
> From: mttcg-request@listserver.greensocs.com [mailto:mttcg-request@listserver.greensocs.com]

> Pavel Dovgalyuk <dovgaluk@ispras.ru> writes:

> 

> >> From: Alex Bennée [mailto:alex.bennee@linaro.org]

> >> Pavel Dovgalyuk <dovgaluk@ispras.ru> writes:

> >> >> From: Alex Bennée [mailto:alex.bennee@linaro.org]

> >> >> Pavel Dovgalyuk <dovgaluk@ispras.ru> writes:

> >> >

> >> >> I ran the following test on both pre-mttcg-merge and my current HEAD

> >> >> which includes Paolo's fix series:

> >> >>

> >> >>  ./arm-softmmu/qemu-system-arm -machine type=virt \

> >> >>     -display none -smp 1 -m 4096 -cpu cortex-a15 \

> >> >>     -kernel ../images/aarch32-current-linux-initrd-guest.img

> >> >>     -append "console=ttyAMA0" -serial mon:stdio \

> >> >>     -net none \

> >> >>     -icount shift=7,rr=record,rrfile=replay.bin

> >> >>

> >> >> And then:

> >> >>

> >> >>  ./arm-softmmu/qemu-system-arm -machine type=virt \

> >> >>     -display none -smp 1 -m 4096 -cpu cortex-a15 \

> >> >>     -kernel ../images/aarch32-current-linux-initrd-guest.img

> >> >>     -append "console=ttyAMA0" -serial mon:stdio \

> >> >>     -net none \

> >> >>     -icount shift=7,rr=replay,rrfile=replay.bin

> >> >>

> >> >> And they both ran the same. However I kept running into:

> >> >>

> >> >>  [    3.542408] RPC: Registered tcp NFSv4.1 backchannel transport module.

> >> >> qemu-system-arm: Missing character write event in the replay log

> >> >>

> >> >> This seems to be a pre-existing

> >> >

> >> > Does it mean that qemu-arm platform includes some serial devices that were

> >> > not detected by the replay?

> >>

> >> It's the standard ARM platform serial port. When I read replay.txt is

> >> said:

> >>

> >>  * Supports i386, x86_64, and ARM hardware platforms.

> >>

> >> Should we add some qualifications about which machine types are

> >> supported? What is you ARM test case for record/replay?

> >

> > I tested on vexpress-a9 platform with Debian wheezy.

> 

> Thanks for that. I now have a test case that I can reproduce failures on

> without needing graphics.

> 

> I've been investigating if there are any problems with the timer

> processing now they have been moved into the TCG thread. The record

> stage seems to work fine but I'm having difficulty figuring out why

> playback freezes. It seems we get to a point where we are stuck waiting

> for a suspiciously exact timer deadline:


I've encountered the following behavior at replay stage:
- replay takes some instructions to execute (qemu_icount += counter)
- virtual timer is fired
- replay puts back unexecuted instructions (qemu_icount -= counter)

But virtual timer cannot take in account non-executed instructions (counter) and
therefore it reads only qemu_icount, which provides incorrect time.

> But the timers are all enabled:

> 

>   (gdb) qemu timers

>   Processing Realtime timers

>     clock QEMU_CLOCK_REALTIME is enabled:true, last:-9223372036854775808

>   Processing Virtual timers

>     clock QEMU_CLOCK_VIRTUAL is enabled:true, last:-9223372036854775808

>       timer 34297350016/1 (cb:0x555555a2e952 <ptimer_tick>)

>       timer 503290000000/1000000 (cb:0x555555bd4d1d <ra_timer_handler>)

>   Processing Host timers

>     clock QEMU_CLOCK_HOST is enabled:true, last:1490191319270134000

>   Processing Virtual RT timers

>     clock QEMU_CLOCK_VIRTUAL_RT is enabled:true, last:-9223372036854775808


Timers are processed only at checkpoints recorded in the log.
When replay is stuck, probably there is a pending checkpoint in the log
and pending instructions in CPU (because iothread breaks its synchronization).

> One area I wanted to look back at was comparing the record trace from

> pre-mttcg-merge to now to see if any information was missing. However


I usually use in_asm and exec logs and also add some logging at replay checkpoints.

> the bin file has quite a lot of noise in it from changing fields so I

> was wondering do you have any sort of dumper tool for comparing the

> traces? If not is the format of the trace file specified anywhere?


Usually you cannot compare two different record/replay logs, because there
is no synchronization of the timers in different recording runs and
therefore you'll get totally different logs.

Pavel Dovgalyuk
Alex Bennée March 29, 2017, 9:42 a.m. UTC | #12
Pavel Dovgalyuk <dovgaluk@ispras.ru> writes:

>> From: mttcg-request@listserver.greensocs.com [mailto:mttcg-request@listserver.greensocs.com]

>> Pavel Dovgalyuk <dovgaluk@ispras.ru> writes:

>>

>> >> From: Alex Bennée [mailto:alex.bennee@linaro.org]

>> >> Pavel Dovgalyuk <dovgaluk@ispras.ru> writes:

>> >> >> From: Alex Bennée [mailto:alex.bennee@linaro.org]

>> >> >> Pavel Dovgalyuk <dovgaluk@ispras.ru> writes:

>> >> >

>> >> >> I ran the following test on both pre-mttcg-merge and my current HEAD

>> >> >> which includes Paolo's fix series:

>> >> >>

>> >> >>  ./arm-softmmu/qemu-system-arm -machine type=virt \

>> >> >>     -display none -smp 1 -m 4096 -cpu cortex-a15 \

>> >> >>     -kernel ../images/aarch32-current-linux-initrd-guest.img

>> >> >>     -append "console=ttyAMA0" -serial mon:stdio \

>> >> >>     -net none \

>> >> >>     -icount shift=7,rr=record,rrfile=replay.bin

>> >> >>

>> >> >> And then:

>> >> >>

>> >> >>  ./arm-softmmu/qemu-system-arm -machine type=virt \

>> >> >>     -display none -smp 1 -m 4096 -cpu cortex-a15 \

>> >> >>     -kernel ../images/aarch32-current-linux-initrd-guest.img

>> >> >>     -append "console=ttyAMA0" -serial mon:stdio \

>> >> >>     -net none \

>> >> >>     -icount shift=7,rr=replay,rrfile=replay.bin

>> >> >>

>> >> >> And they both ran the same. However I kept running into:

>> >> >>

>> >> >>  [    3.542408] RPC: Registered tcp NFSv4.1 backchannel transport module.

>> >> >> qemu-system-arm: Missing character write event in the replay log

>> >> >>

>> >> >> This seems to be a pre-existing

>> >> >

>> >> > Does it mean that qemu-arm platform includes some serial devices that were

>> >> > not detected by the replay?

>> >>

>> >> It's the standard ARM platform serial port. When I read replay.txt is

>> >> said:

>> >>

>> >>  * Supports i386, x86_64, and ARM hardware platforms.

>> >>

>> >> Should we add some qualifications about which machine types are

>> >> supported? What is you ARM test case for record/replay?

>> >

>> > I tested on vexpress-a9 platform with Debian wheezy.

>>

>> Thanks for that. I now have a test case that I can reproduce failures on

>> without needing graphics.

>>

>> I've been investigating if there are any problems with the timer

>> processing now they have been moved into the TCG thread. The record

>> stage seems to work fine but I'm having difficulty figuring out why

>> playback freezes. It seems we get to a point where we are stuck waiting

>> for a suspiciously exact timer deadline:

>

> I've encountered the following behavior at replay stage:

> - replay takes some instructions to execute (qemu_icount += counter)

> - virtual timer is fired


This is the virtual timer based on icount not the virtual rt timer? So
under the new scheme of being processed in the vCPU loop we should only
fire this one once all execution is done (although you may exit the loop
early before icount is exhausted).

> - replay puts back unexecuted instructions (qemu_icount -= counter)

>

> But virtual timer cannot take in account non-executed instructions (counter) and

> therefore it reads only qemu_icount, which provides incorrect time.

>

>> But the timers are all enabled:

>>

>>   (gdb) qemu timers

>>   Processing Realtime timers

>>     clock QEMU_CLOCK_REALTIME is enabled:true, last:-9223372036854775808

>>   Processing Virtual timers

>>     clock QEMU_CLOCK_VIRTUAL is enabled:true, last:-9223372036854775808

>>       timer 34297350016/1 (cb:0x555555a2e952 <ptimer_tick>)

>>       timer 503290000000/1000000 (cb:0x555555bd4d1d <ra_timer_handler>)

>>   Processing Host timers

>>     clock QEMU_CLOCK_HOST is enabled:true, last:1490191319270134000

>>   Processing Virtual RT timers

>>     clock QEMU_CLOCK_VIRTUAL_RT is enabled:true, last:-9223372036854775808

>

> Timers are processed only at checkpoints recorded in the log.

> When replay is stuck, probably there is a pending checkpoint in the log

> and pending instructions in CPU (because iothread breaks its

> synchronization).


How does this work for instructions that sleep (like WFI)? The WFI will
cause us to exit the loop but I assume we need a "real" timer to trigger
a firing of whatever the WFI is waiting for as time according to icount
isn't advancing.

>

>> One area I wanted to look back at was comparing the record trace from

>> pre-mttcg-merge to now to see if any information was missing. However

>

> I usually use in_asm and exec logs and also add some logging at replay checkpoints.

>

>> the bin file has quite a lot of noise in it from changing fields so I

>> was wondering do you have any sort of dumper tool for comparing the

>> traces? If not is the format of the trace file specified anywhere?

>

> Usually you cannot compare two different record/replay logs, because there

> is no synchronization of the timers in different recording runs and

> therefore you'll get totally different logs.


I wasn't looking to have identical traces but I was interested in the
patterns in the logs and how they have changed. For example:

HEADER: version 0xe02005                                      | HEADER: version 0xe02006
1:EVENT_CP_INIT                                                 1:EVENT_CP_INIT
2:EVENT_CLOCK_HOST 0x14afc6687f1fc738                         | 2:EVENT_CLOCK_HOST 0x14b000644239c8d8
3:EVENT_CLOCK_HOST 0x14afc66c6ac91858                         | 3:EVENT_CLOCK_HOST 0x14b00064423a1310
4:EVENT_CP_RESET no additional data                             4:EVENT_CP_RESET no additional data
5:EVENT_CP_CLOCK_WARP_START no additional data                  5:EVENT_CP_CLOCK_WARP_START no additional data
6:EVENT_CP_CLOCK_VIRTUAL no additional data                   | 6:EVENT_CP_CLOCK_WARP_START no additional data
7:EVENT_CP_CLOCK_WARP_START no additional data                | 7:EVENT_CP_CLOCK_VIRTUAL no additional data
8:EVENT_CP_CLOCK_VIRTUAL no additional data                   | 8:EVENT_CP_CLOCK_WARP_ACCOUNT no additional data
9:EVENT_CP_CLOCK_WARP_START no additional data                | 9:EVENT_CP_CLOCK_WARP_ACCOUNT no additional data
10:EVENT_CP_CLOCK_VIRTUAL no additional data                  | 10:EVENT_INSTRUCTION 0x1000000
11:EVENT_CP_CLOCK_WARP_ACCOUNT no additional data               11:EVENT_CP_CLOCK_WARP_ACCOUNT no additional data
12:EVENT_CP_CLOCK_WARP_ACCOUNT no additional data             | 12:EVENT_INSTRUCTION 0x1000000
13:EVENT_INSTRUCTION 0x1000000                                | 13:EVENT_CP_CLOCK_WARP_ACCOUNT no additional data

There is a difference to the pattern at the start as the initial
checkpoints are layed down. I guess this is due to the interaction of
the main thread with the vCPU as they bounce the BQL with each other.

I'm wondering why we don't just do all the warp timer management in the
vCPU thread as well?

--
Alex Bennée
Pavel Dovgalyuk March 30, 2017, 11:44 a.m. UTC | #13
> From: mttcg-request@listserver.greensocs.com [mailto:mttcg-request@listserver.greensocs.com]

> Pavel Dovgalyuk <dovgaluk@ispras.ru> writes:

> >> From: mttcg-request@listserver.greensocs.com [mailto:mttcg-

> request@listserver.greensocs.com]

> >> Pavel Dovgalyuk <dovgaluk@ispras.ru> writes:

> >>

> >> >> From: Alex Bennée [mailto:alex.bennee@linaro.org]

> >> >> Pavel Dovgalyuk <dovgaluk@ispras.ru> writes:

> >> >> >> From: Alex Bennée [mailto:alex.bennee@linaro.org]

> >> >> >> Pavel Dovgalyuk <dovgaluk@ispras.ru> writes:

> >> >> >

> >> >> >> I ran the following test on both pre-mttcg-merge and my current HEAD

> >> >> >> which includes Paolo's fix series:

> >> >> >>

> >> >> >>  ./arm-softmmu/qemu-system-arm -machine type=virt \

> >> >> >>     -display none -smp 1 -m 4096 -cpu cortex-a15 \

> >> >> >>     -kernel ../images/aarch32-current-linux-initrd-guest.img

> >> >> >>     -append "console=ttyAMA0" -serial mon:stdio \

> >> >> >>     -net none \

> >> >> >>     -icount shift=7,rr=record,rrfile=replay.bin

> >> >> >>

> >> >> >> And then:

> >> >> >>

> >> >> >>  ./arm-softmmu/qemu-system-arm -machine type=virt \

> >> >> >>     -display none -smp 1 -m 4096 -cpu cortex-a15 \

> >> >> >>     -kernel ../images/aarch32-current-linux-initrd-guest.img

> >> >> >>     -append "console=ttyAMA0" -serial mon:stdio \

> >> >> >>     -net none \

> >> >> >>     -icount shift=7,rr=replay,rrfile=replay.bin

> >> >> >>

> >> >> >> And they both ran the same. However I kept running into:

> >> >> >>

> >> >> >>  [    3.542408] RPC: Registered tcp NFSv4.1 backchannel transport module.

> >> >> >> qemu-system-arm: Missing character write event in the replay log

> >> >> >>

> >> >> >> This seems to be a pre-existing

> >> >> >

> >> >> > Does it mean that qemu-arm platform includes some serial devices that were

> >> >> > not detected by the replay?

> >> >>

> >> >> It's the standard ARM platform serial port. When I read replay.txt is

> >> >> said:

> >> >>

> >> >>  * Supports i386, x86_64, and ARM hardware platforms.

> >> >>

> >> >> Should we add some qualifications about which machine types are

> >> >> supported? What is you ARM test case for record/replay?

> >> >

> >> > I tested on vexpress-a9 platform with Debian wheezy.

> >>

> >> Thanks for that. I now have a test case that I can reproduce failures on

> >> without needing graphics.

> >>

> >> I've been investigating if there are any problems with the timer

> >> processing now they have been moved into the TCG thread. The record

> >> stage seems to work fine but I'm having difficulty figuring out why

> >> playback freezes. It seems we get to a point where we are stuck waiting

> >> for a suspiciously exact timer deadline:

> >

> > I've encountered the following behavior at replay stage:

> > - replay takes some instructions to execute (qemu_icount += counter)

> > - virtual timer is fired

> 

> This is the virtual timer based on icount not the virtual rt timer? So


Regular virtual timer. It's value is based on icount.
virtual_rt is used for internal icount purposes.

> under the new scheme of being processed in the vCPU loop we should only

> fire this one once all execution is done (although you may exit the loop

> early before icount is exhausted).


We should stop the vCPU before processing virtual timers because:
- virtual clock depends on instruction counter
- virtual timers may change virtual hardware state

> > - replay puts back unexecuted instructions (qemu_icount -= counter)

> >

> > But virtual timer cannot take in account non-executed instructions (counter) and

> > therefore it reads only qemu_icount, which provides incorrect time.

> >

> >> But the timers are all enabled:

> >>

> >>   (gdb) qemu timers

> >>   Processing Realtime timers

> >>     clock QEMU_CLOCK_REALTIME is enabled:true, last:-9223372036854775808

> >>   Processing Virtual timers

> >>     clock QEMU_CLOCK_VIRTUAL is enabled:true, last:-9223372036854775808

> >>       timer 34297350016/1 (cb:0x555555a2e952 <ptimer_tick>)

> >>       timer 503290000000/1000000 (cb:0x555555bd4d1d <ra_timer_handler>)

> >>   Processing Host timers

> >>     clock QEMU_CLOCK_HOST is enabled:true, last:1490191319270134000

> >>   Processing Virtual RT timers

> >>     clock QEMU_CLOCK_VIRTUAL_RT is enabled:true, last:-9223372036854775808

> >

> > Timers are processed only at checkpoints recorded in the log.

> > When replay is stuck, probably there is a pending checkpoint in the log

> > and pending instructions in CPU (because iothread breaks its

> > synchronization).

> 

> How does this work for instructions that sleep (like WFI)? The WFI will

> cause us to exit the loop but I assume we need a "real" timer to trigger

> a firing of whatever the WFI is waiting for as time according to icount

> isn't advancing.


This is virtual_rt for. It advances even when vCPU is stopped.

> >> One area I wanted to look back at was comparing the record trace from

> >> pre-mttcg-merge to now to see if any information was missing. However

> >

> > I usually use in_asm and exec logs and also add some logging at replay checkpoints.

> >

> >> the bin file has quite a lot of noise in it from changing fields so I

> >> was wondering do you have any sort of dumper tool for comparing the

> >> traces? If not is the format of the trace file specified anywhere?

> >

> > Usually you cannot compare two different record/replay logs, because there

> > is no synchronization of the timers in different recording runs and

> > therefore you'll get totally different logs.

> 

> I wasn't looking to have identical traces but I was interested in the

> patterns in the logs and how they have changed. For example:

> 

> HEADER: version 0xe02005                                      | HEADER: version 0xe02006

> 1:EVENT_CP_INIT                                                 1:EVENT_CP_INIT

> 2:EVENT_CLOCK_HOST 0x14afc6687f1fc738                         | 2:EVENT_CLOCK_HOST

> 0x14b000644239c8d8

> 3:EVENT_CLOCK_HOST 0x14afc66c6ac91858                         | 3:EVENT_CLOCK_HOST

> 0x14b00064423a1310

> 4:EVENT_CP_RESET no additional data                             4:EVENT_CP_RESET no additional

> data

> 5:EVENT_CP_CLOCK_WARP_START no additional data                  5:EVENT_CP_CLOCK_WARP_START no

> additional data

> 6:EVENT_CP_CLOCK_VIRTUAL no additional data                   | 6:EVENT_CP_CLOCK_WARP_START no

> additional data

> 7:EVENT_CP_CLOCK_WARP_START no additional data                | 7:EVENT_CP_CLOCK_VIRTUAL no

> additional data

> 8:EVENT_CP_CLOCK_VIRTUAL no additional data                   | 8:EVENT_CP_CLOCK_WARP_ACCOUNT

> no additional data

> 9:EVENT_CP_CLOCK_WARP_START no additional data                | 9:EVENT_CP_CLOCK_WARP_ACCOUNT

> no additional data

> 10:EVENT_CP_CLOCK_VIRTUAL no additional data                  | 10:EVENT_INSTRUCTION 0x1000000

> 11:EVENT_CP_CLOCK_WARP_ACCOUNT no additional data               11:EVENT_CP_CLOCK_WARP_ACCOUNT

> no additional data

> 12:EVENT_CP_CLOCK_WARP_ACCOUNT no additional data             | 12:EVENT_INSTRUCTION 0x1000000

> 13:EVENT_INSTRUCTION 0x1000000                                | 13:EVENT_CP_CLOCK_WARP_ACCOUNT

> no additional data

> 

> There is a difference to the pattern at the start as the initial

> checkpoints are layed down. I guess this is due to the interaction of

> the main thread with the vCPU as they bounce the BQL with each other.


Yes, this may be caused by changed interaction between the threads.

> I'm wondering why we don't just do all the warp timer management in the

> vCPU thread as well?


Because vCPU may sleep and we'll have to setup some timer (virtual_rt as now) anyway.



By the way, I made small description of the replay log format
to be later included into docs/replay.txt:

Record/replay log consits of the header and the sequence of execution
events. The header includes 4-byte replay version id and 8-byte reserved
field. Version is updated every time replay log format changes to prevent
using replay log created by another build of qemu.

The sequence of the events describes virtual machine state changes.
It includes all non-deterministic inputs of VM, synchronization marks and
instruction counts used to correctly inject inputs at replay.

Synchronization marks (checkpoints) are used for synchronizing qemu threads
that perform operations with virtual hardware. These operations may change
system's state (e.g., change some register or generate interrupt) and
therefore should execute synchronously with CPU thread.

Every event in the log includes 1-byte event id and optional arguments.
When argument is an array, it is stored as 4-byte array length
and corresponding number of bytes with data.
Here is the list of events that are written into the log:

 - EVENT_INSTRUCTION. Instructions executed since last event.
   Argument: 4-byte number of executed instructions.
 - EVENT_INTERRUPT. Used to synchronize interrupt processing.
 - EVENT_EXCEPTION. Used to synchronize exception handling.
 - EVENT_ASYNC. This is a group of events. They are always processed
   together with checkpoints. When such an event is generated, it is
   stored in the queue and processed only when checkpoint occurs.
   Every such event is followed by 1-byte checkpoint id and 1-byte
   async event id from the following list:
     - REPLAY_ASYNC_EVENT_BH. Bottom-half callback. This event synchronizes
       callbacks that affect virtual machine state, but normally called
       asyncronously.
       Argument: 8-byte operation id.
     - REPLAY_ASYNC_EVENT_INPUT. Input device event. Contains
       parameters of keyboard and mouse input operations
       (key press/release, mouse pointer movement).
       Arguments: 9-16 bytes depending of input event.
     - REPLAY_ASYNC_EVENT_INPUT_SYNC. Internal input synchronization event.
     - REPLAY_ASYNC_EVENT_CHAR_READ. Character (e.g., serial port) device input
       initiated by the sender.
       Arguments: 1-byte character device id.
                  Array with bytes were read.
     - REPLAY_ASYNC_EVENT_BLOCK. Block device operation. Used to synchronize
       operations with disk and flash drives with CPU.
       Argument: 8-byte operation id.
     - REPLAY_ASYNC_EVENT_NET. Incoming network packet.
       Arguments: 1-byte network adapter id.
                  4-byte packet flags.
                  Array with packet bytes.
 - EVENT_SHUTDOWN. Occurs when user sends shutdown event to qemu,
   e.g., by closing the window.
 - EVENT_CHAR_WRITE. Used to synchronize character output operations.
   Arguments: 4-byte output function return value.
              4-byte offset in the output array.
 - EVENT_CHAR_READ_ALL. Used to synchronize character input operations,
   initiated by qemu.
   Argument: Array with bytes that were read.
 - EVENT_CHAR_READ_ALL_ERROR. Unsuccessful character input operation,
   initiated by qemu.
   Argument: 4-byte error code.
 - EVENT_CLOCK + clock_id. Group of events for host clock read operations.
   Argument: 8-byte clock value.
 - EVENT_CHECKPOINT + checkpoint_id. Checkpoint for synchronization of
   CPU, internal threads, and asynchronous input events. May be followed
   by one or more EVENT_ASYNC events.
 - EVENT_END. Last event in the log.


Pavel Dovgalyuk
Alex Bennée March 30, 2017, 12:42 p.m. UTC | #14
Pavel Dovgalyuk <dovgaluk@ispras.ru> writes:

>> From: mttcg-request@listserver.greensocs.com [mailto:mttcg-request@listserver.greensocs.com]

>> Pavel Dovgalyuk <dovgaluk@ispras.ru> writes:

>> >> From: mttcg-request@listserver.greensocs.com [mailto:mttcg-

>> request@listserver.greensocs.com]

>> >> Pavel Dovgalyuk <dovgaluk@ispras.ru> writes:

<snip>
>> >> > I tested on vexpress-a9 platform with Debian wheezy.

>> >>

>> >> Thanks for that. I now have a test case that I can reproduce failures on

>> >> without needing graphics.

>> >>

>> >> I've been investigating if there are any problems with the timer

>> >> processing now they have been moved into the TCG thread. The record

>> >> stage seems to work fine but I'm having difficulty figuring out why

>> >> playback freezes. It seems we get to a point where we are stuck waiting

>> >> for a suspiciously exact timer deadline:

>> >

>> > I've encountered the following behavior at replay stage:

>> > - replay takes some instructions to execute (qemu_icount += counter)

>> > - virtual timer is fired

>>

>> This is the virtual timer based on icount not the virtual rt timer? So

>

> Regular virtual timer. It's value is based on icount.

> virtual_rt is used for internal icount purposes.


And this is where the clock warps come in? The bias brings the
virtual_rt time forward because execution is waiting for some external
event to fire (e.g. a timer IRQ)?

>> under the new scheme of being processed in the vCPU loop we should only

>> fire this one once all execution is done (although you may exit the loop

>> early before icount is exhausted).

>

> We should stop the vCPU before processing virtual timers because:

> - virtual clock depends on instruction counter

> - virtual timers may change virtual hardware state


If we do the processing in the top of main vCPU loop this should be
equivalent. The instruction counter cannot run because we haven't
entered tcg_exec_cpu. We also process virtual timers in this thread
outside the loop so nothing else can be poking the hardware state.

>> > - replay puts back unexecuted instructions (qemu_icount -= counter)

>> >

>> > But virtual timer cannot take in account non-executed instructions (counter) and

>> > therefore it reads only qemu_icount, which provides incorrect time.

>> >

>> >> But the timers are all enabled:

>> >>

>> >>   (gdb) qemu timers

>> >>   Processing Realtime timers

>> >>     clock QEMU_CLOCK_REALTIME is enabled:true, last:-9223372036854775808

>> >>   Processing Virtual timers

>> >>     clock QEMU_CLOCK_VIRTUAL is enabled:true, last:-9223372036854775808

>> >>       timer 34297350016/1 (cb:0x555555a2e952 <ptimer_tick>)

>> >>       timer 503290000000/1000000 (cb:0x555555bd4d1d <ra_timer_handler>)

>> >>   Processing Host timers

>> >>     clock QEMU_CLOCK_HOST is enabled:true, last:1490191319270134000

>> >>   Processing Virtual RT timers

>> >>     clock QEMU_CLOCK_VIRTUAL_RT is enabled:true, last:-9223372036854775808

>> >

>> > Timers are processed only at checkpoints recorded in the log.

>> > When replay is stuck, probably there is a pending checkpoint in the log

>> > and pending instructions in CPU (because iothread breaks its

>> > synchronization).

>>

>> How does this work for instructions that sleep (like WFI)? The WFI will

>> cause us to exit the loop but I assume we need a "real" timer to trigger

>> a firing of whatever the WFI is waiting for as time according to icount

>> isn't advancing.

>

> This is virtual_rt for. It advances even when vCPU is stopped.


So I'm not seeing this happen:

  process_icount_data: icount not increased 117302332749=>117302332749 RT:-1

I wonder if the advance of virtual_rt has been deadlocked? The -1
suggests its not even setup.

>> >> One area I wanted to look back at was comparing the record trace from

>> >> pre-mttcg-merge to now to see if any information was missing. However

>> >

>> > I usually use in_asm and exec logs and also add some logging at replay checkpoints.

>> >

>> >> the bin file has quite a lot of noise in it from changing fields so I

>> >> was wondering do you have any sort of dumper tool for comparing the

>> >> traces? If not is the format of the trace file specified anywhere?

>> >

>> > Usually you cannot compare two different record/replay logs, because there

>> > is no synchronization of the timers in different recording runs and

>> > therefore you'll get totally different logs.

>>

>> I wasn't looking to have identical traces but I was interested in the

>> patterns in the logs and how they have changed. For example:

>>

>> HEADER: version 0xe02005                                      | HEADER: version 0xe02006

>> 1:EVENT_CP_INIT                                                 1:EVENT_CP_INIT

>> 2:EVENT_CLOCK_HOST 0x14afc6687f1fc738                         | 2:EVENT_CLOCK_HOST

>> 0x14b000644239c8d8

>> 3:EVENT_CLOCK_HOST 0x14afc66c6ac91858                         | 3:EVENT_CLOCK_HOST

>> 0x14b00064423a1310

>> 4:EVENT_CP_RESET no additional data                             4:EVENT_CP_RESET no additional

>> data

>> 5:EVENT_CP_CLOCK_WARP_START no additional data                  5:EVENT_CP_CLOCK_WARP_START no

>> additional data

>> 6:EVENT_CP_CLOCK_VIRTUAL no additional data                   | 6:EVENT_CP_CLOCK_WARP_START no

>> additional data

>> 7:EVENT_CP_CLOCK_WARP_START no additional data                | 7:EVENT_CP_CLOCK_VIRTUAL no

>> additional data

>> 8:EVENT_CP_CLOCK_VIRTUAL no additional data                   | 8:EVENT_CP_CLOCK_WARP_ACCOUNT

>> no additional data

>> 9:EVENT_CP_CLOCK_WARP_START no additional data                | 9:EVENT_CP_CLOCK_WARP_ACCOUNT

>> no additional data

>> 10:EVENT_CP_CLOCK_VIRTUAL no additional data                  | 10:EVENT_INSTRUCTION 0x1000000

>> 11:EVENT_CP_CLOCK_WARP_ACCOUNT no additional data               11:EVENT_CP_CLOCK_WARP_ACCOUNT

>> no additional data

>> 12:EVENT_CP_CLOCK_WARP_ACCOUNT no additional data             | 12:EVENT_INSTRUCTION 0x1000000

>> 13:EVENT_INSTRUCTION 0x1000000                                | 13:EVENT_CP_CLOCK_WARP_ACCOUNT

>> no additional data

>>

>> There is a difference to the pattern at the start as the initial

>> checkpoints are layed down. I guess this is due to the interaction of

>> the main thread with the vCPU as they bounce the BQL with each other.

>

> Yes, this may be caused by changed interaction between the threads.

>

>> I'm wondering why we don't just do all the warp timer management in the

>> vCPU thread as well?

>

> Because vCPU may sleep and we'll have to setup some timer (virtual_rt

> as now) anyway.


Right - because if we end up qemu_tcg_wait_io_event(cpu ? cpu :
QTAILQ_FIRST(&cpus)); we need another thread to kick us out of the
qemu_cond_wait(cpu->halt_cond, &qemu_global_mutex);

> By the way, I made small description of the replay log format

> to be later included into docs/replay.txt:

>

> Record/replay log consits of the header and the sequence of execution

> events. The header includes 4-byte replay version id and 8-byte reserved

> field. Version is updated every time replay log format changes to prevent

> using replay log created by another build of qemu.

>

> The sequence of the events describes virtual machine state changes.

> It includes all non-deterministic inputs of VM, synchronization marks and

> instruction counts used to correctly inject inputs at replay.

>

> Synchronization marks (checkpoints) are used for synchronizing qemu threads

> that perform operations with virtual hardware. These operations may change

> system's state (e.g., change some register or generate interrupt) and

> therefore should execute synchronously with CPU thread.

>

> Every event in the log includes 1-byte event id and optional arguments.

> When argument is an array, it is stored as 4-byte array length

> and corresponding number of bytes with data.

> Here is the list of events that are written into the log:

>

>  - EVENT_INSTRUCTION. Instructions executed since last event.

>    Argument: 4-byte number of executed instructions.

>  - EVENT_INTERRUPT. Used to synchronize interrupt processing.

>  - EVENT_EXCEPTION. Used to synchronize exception handling.

>  - EVENT_ASYNC. This is a group of events. They are always processed

>    together with checkpoints. When such an event is generated, it is

>    stored in the queue and processed only when checkpoint occurs.

>    Every such event is followed by 1-byte checkpoint id and 1-byte

>    async event id from the following list:

>      - REPLAY_ASYNC_EVENT_BH. Bottom-half callback. This event synchronizes

>        callbacks that affect virtual machine state, but normally called

>        asyncronously.

>        Argument: 8-byte operation id.

>      - REPLAY_ASYNC_EVENT_INPUT. Input device event. Contains

>        parameters of keyboard and mouse input operations

>        (key press/release, mouse pointer movement).

>        Arguments: 9-16 bytes depending of input event.

>      - REPLAY_ASYNC_EVENT_INPUT_SYNC. Internal input synchronization event.

>      - REPLAY_ASYNC_EVENT_CHAR_READ. Character (e.g., serial port) device input

>        initiated by the sender.

>        Arguments: 1-byte character device id.

>                   Array with bytes were read.

>      - REPLAY_ASYNC_EVENT_BLOCK. Block device operation. Used to synchronize

>        operations with disk and flash drives with CPU.

>        Argument: 8-byte operation id.

>      - REPLAY_ASYNC_EVENT_NET. Incoming network packet.

>        Arguments: 1-byte network adapter id.

>                   4-byte packet flags.

>                   Array with packet bytes.

>  - EVENT_SHUTDOWN. Occurs when user sends shutdown event to qemu,

>    e.g., by closing the window.

>  - EVENT_CHAR_WRITE. Used to synchronize character output operations.

>    Arguments: 4-byte output function return value.

>               4-byte offset in the output array.

>  - EVENT_CHAR_READ_ALL. Used to synchronize character input operations,

>    initiated by qemu.

>    Argument: Array with bytes that were read.

>  - EVENT_CHAR_READ_ALL_ERROR. Unsuccessful character input operation,

>    initiated by qemu.

>    Argument: 4-byte error code.

>  - EVENT_CLOCK + clock_id. Group of events for host clock read operations.

>    Argument: 8-byte clock value.

>  - EVENT_CHECKPOINT + checkpoint_id. Checkpoint for synchronization of

>    CPU, internal threads, and asynchronous input events. May be followed

>    by one or more EVENT_ASYNC events.

>  - EVENT_END. Last event in the log.


Thanks. That's really useful.

Do you think it would be worth including the dumper script I wrote in
./scripts/? I mainly wrote it to aid my understanding of the code flow
but it might be useful in future investigations?

--
Alex Bennée
Pavel Dovgalyuk March 31, 2017, 9:16 a.m. UTC | #15
> From: mttcg-request@listserver.greensocs.com [mailto:mttcg-request@listserver.greensocs.com]

> Pavel Dovgalyuk <dovgaluk@ispras.ru> writes:

> >> From: mttcg-request@listserver.greensocs.com [mailto:mttcg-

> request@listserver.greensocs.com]

> >> Pavel Dovgalyuk <dovgaluk@ispras.ru> writes:

> >> >> From: mttcg-request@listserver.greensocs.com [mailto:mttcg-

> >> request@listserver.greensocs.com]

> >> >> Pavel Dovgalyuk <dovgaluk@ispras.ru> writes:

> <snip>

> >> >> > I tested on vexpress-a9 platform with Debian wheezy.

> >> >>

> >> >> Thanks for that. I now have a test case that I can reproduce failures on

> >> >> without needing graphics.

> >> >>

> >> >> I've been investigating if there are any problems with the timer

> >> >> processing now they have been moved into the TCG thread. The record

> >> >> stage seems to work fine but I'm having difficulty figuring out why

> >> >> playback freezes. It seems we get to a point where we are stuck waiting

> >> >> for a suspiciously exact timer deadline:

> >> >

> >> > I've encountered the following behavior at replay stage:

> >> > - replay takes some instructions to execute (qemu_icount += counter)

> >> > - virtual timer is fired

> >>

> >> This is the virtual timer based on icount not the virtual rt timer? So

> >

> > Regular virtual timer. It's value is based on icount.

> > virtual_rt is used for internal icount purposes.

> 

> And this is where the clock warps come in? The bias brings the

> virtual_rt time forward because execution is waiting for some external

> event to fire (e.g. a timer IRQ)?


I guess so. But bias is not updated when the vCPU works.
vCPU thread updates only qemu_icount which is used for virtual clock calculation.

> >> under the new scheme of being processed in the vCPU loop we should only

> >> fire this one once all execution is done (although you may exit the loop

> >> early before icount is exhausted).

> >

> > We should stop the vCPU before processing virtual timers because:

> > - virtual clock depends on instruction counter

> > - virtual timers may change virtual hardware state

> 

> If we do the processing in the top of main vCPU loop this should be

> equivalent. The instruction counter cannot run because we haven't

> entered tcg_exec_cpu. We also process virtual timers in this thread

> outside the loop so nothing else can be poking the hardware state.


This is how qemu worked in older version - it switched between
processing tasks (vCPU and timers) in one thread.
But how we'll join this behavior with the current design and MTTCG?

> 

> >> > - replay puts back unexecuted instructions (qemu_icount -= counter)

> >> >

> >> > But virtual timer cannot take in account non-executed instructions (counter) and

> >> > therefore it reads only qemu_icount, which provides incorrect time.

> >> >

> >> >> But the timers are all enabled:

> >> >>

> >> >>   (gdb) qemu timers

> >> >>   Processing Realtime timers

> >> >>     clock QEMU_CLOCK_REALTIME is enabled:true, last:-9223372036854775808

> >> >>   Processing Virtual timers

> >> >>     clock QEMU_CLOCK_VIRTUAL is enabled:true, last:-9223372036854775808

> >> >>       timer 34297350016/1 (cb:0x555555a2e952 <ptimer_tick>)

> >> >>       timer 503290000000/1000000 (cb:0x555555bd4d1d <ra_timer_handler>)

> >> >>   Processing Host timers

> >> >>     clock QEMU_CLOCK_HOST is enabled:true, last:1490191319270134000

> >> >>   Processing Virtual RT timers

> >> >>     clock QEMU_CLOCK_VIRTUAL_RT is enabled:true, last:-9223372036854775808

> >> >

> >> > Timers are processed only at checkpoints recorded in the log.

> >> > When replay is stuck, probably there is a pending checkpoint in the log

> >> > and pending instructions in CPU (because iothread breaks its

> >> > synchronization).

> >>

> >> How does this work for instructions that sleep (like WFI)? The WFI will

> >> cause us to exit the loop but I assume we need a "real" timer to trigger

> >> a firing of whatever the WFI is waiting for as time according to icount

> >> isn't advancing.

> >

> > This is virtual_rt for. It advances even when vCPU is stopped.

> 

> So I'm not seeing this happen:

> 

>   process_icount_data: icount not increased 117302332749=>117302332749 RT:-1

> 

> I wonder if the advance of virtual_rt has been deadlocked? The -1

> suggests its not even setup.


qemu_start_warp_timer says that it is setup only when there are active virtual timers.


> > By the way, I made small description of the replay log format

> > to be later included into docs/replay.txt:

> >

> Thanks. That's really useful.

> 

> Do you think it would be worth including the dumper script I wrote in

> ./scripts/? I mainly wrote it to aid my understanding of the code flow

> but it might be useful in future investigations?


Usually I get human-readable logs in record and replay modes to find
divergence between them, but I think your script might be useful in some cases.

Pavel Dovgalyuk
Paolo Bonzini March 31, 2017, 10:16 a.m. UTC | #16
On 31/03/2017 11:16, Pavel Dovgalyuk wrote:
> 

> Do you think it would be worth including the dumper script I wrote in

> ./scripts/? I mainly wrote it to aid my understanding of the code flow

> but it might be useful in future investigations?


Yes, I think it is.  Alternatively, adding "-d rr" that dumps what is
recorded/replayed from the rr file would be useful.

Paolo
Alex Bennée March 31, 2017, 11:21 a.m. UTC | #17
Pavel Dovgalyuk <dovgaluk@ispras.ru> writes:

>> From: mttcg-request@listserver.greensocs.com [mailto:mttcg-request@listserver.greensocs.com]

>> Pavel Dovgalyuk <dovgaluk@ispras.ru> writes:

>> >> From: mttcg-request@listserver.greensocs.com [mailto:mttcg-

>> request@listserver.greensocs.com]

>> >> Pavel Dovgalyuk <dovgaluk@ispras.ru> writes:

>> >> >> From: mttcg-request@listserver.greensocs.com [mailto:mttcg-

>> >> request@listserver.greensocs.com]

>> >> >> Pavel Dovgalyuk <dovgaluk@ispras.ru> writes:

>> <snip>

>> >> >> > I tested on vexpress-a9 platform with Debian wheezy.

>> >> >>

>> >> >> Thanks for that. I now have a test case that I can reproduce failures on

>> >> >> without needing graphics.

>> >> >>

>> >> >> I've been investigating if there are any problems with the timer

>> >> >> processing now they have been moved into the TCG thread. The record

>> >> >> stage seems to work fine but I'm having difficulty figuring out why

>> >> >> playback freezes. It seems we get to a point where we are stuck waiting

>> >> >> for a suspiciously exact timer deadline:

>> >> >

>> >> > I've encountered the following behavior at replay stage:

>> >> > - replay takes some instructions to execute (qemu_icount += counter)

>> >> > - virtual timer is fired

>> >>

>> >> This is the virtual timer based on icount not the virtual rt timer? So

>> >

>> > Regular virtual timer. It's value is based on icount.

>> > virtual_rt is used for internal icount purposes.

>>

>> And this is where the clock warps come in? The bias brings the

>> virtual_rt time forward because execution is waiting for some external

>> event to fire (e.g. a timer IRQ)?

>

> I guess so. But bias is not updated when the vCPU works.

> vCPU thread updates only qemu_icount which is used for virtual clock calculation.

>

>> >> under the new scheme of being processed in the vCPU loop we should only

>> >> fire this one once all execution is done (although you may exit the loop

>> >> early before icount is exhausted).

>> >

>> > We should stop the vCPU before processing virtual timers because:

>> > - virtual clock depends on instruction counter

>> > - virtual timers may change virtual hardware state

>>

>> If we do the processing in the top of main vCPU loop this should be

>> equivalent. The instruction counter cannot run because we haven't

>> entered tcg_exec_cpu. We also process virtual timers in this thread

>> outside the loop so nothing else can be poking the hardware state.

>

> This is how qemu worked in older version - it switched between

> processing tasks (vCPU and timers) in one thread.


The only real difference is the sequencing in the old case was protected
by the BQL - now its my program order.

> But how we'll join this behavior with the current design and MTTCG?


Ignore MTTCG for now. We don't even try and run multiple-threads when
icount is enabled. However the change in the BQL locking is what has
triggered the failures.

Anyway I think I'm getting closer to narrowing it down. On record I see
replay_current_step jump backwards with this:

/* A common event print, called when reading or saving an event */
static void print_event(uint8_t event)
{
    static int event_count;
    static uint64_t last_step;
    uint64_t this_step = replay_get_current_step();

    fprintf(stderr, "replay: event %d is %d @ step=%#" PRIx64 "\n",
            event_count, event, this_step);

    if (this_step < last_step) {
        fprintf(stderr,"%s: !!! step %d went backwards %#"PRIx64"=>%#"PRIx64"!!!\n",
                __func__, event_count, last_step, this_step);
        abort();
    }
    last_step = this_step;
    event_count++;
}

void replay_put_event(uint8_t event)
{
    assert(event < EVENT_COUNT);
    replay_put_byte(event);
    print_event(event);
}

The jump back is fairly consistent in my case (event 66 in the vexpress
run) but I'm fairly sure that is the bug. I can't see any reason for the
step count to go backwards - indeed that breaks the premise of .

Unfortunately when I start putting break and watchpoints in to see how
this jump back occurs the problem goes away until I disable them. So
this very much looks like a race condition corrupting the icount data.

>> >> > - replay puts back unexecuted instructions (qemu_icount -= counter)

>> >> >

>> >> > But virtual timer cannot take in account non-executed instructions (counter) and

>> >> > therefore it reads only qemu_icount, which provides incorrect time.

>> >> >

>> >> >> But the timers are all enabled:

>> >> >>

>> >> >>   (gdb) qemu timers

>> >> >>   Processing Realtime timers

>> >> >>     clock QEMU_CLOCK_REALTIME is enabled:true, last:-9223372036854775808

>> >> >>   Processing Virtual timers

>> >> >>     clock QEMU_CLOCK_VIRTUAL is enabled:true, last:-9223372036854775808

>> >> >>       timer 34297350016/1 (cb:0x555555a2e952 <ptimer_tick>)

>> >> >>       timer 503290000000/1000000 (cb:0x555555bd4d1d <ra_timer_handler>)

>> >> >>   Processing Host timers

>> >> >>     clock QEMU_CLOCK_HOST is enabled:true, last:1490191319270134000

>> >> >>   Processing Virtual RT timers

>> >> >>     clock QEMU_CLOCK_VIRTUAL_RT is enabled:true, last:-9223372036854775808

>> >> >

>> >> > Timers are processed only at checkpoints recorded in the log.

>> >> > When replay is stuck, probably there is a pending checkpoint in the log

>> >> > and pending instructions in CPU (because iothread breaks its

>> >> > synchronization).

>> >>

>> >> How does this work for instructions that sleep (like WFI)? The WFI will

>> >> cause us to exit the loop but I assume we need a "real" timer to trigger

>> >> a firing of whatever the WFI is waiting for as time according to icount

>> >> isn't advancing.

>> >

>> > This is virtual_rt for. It advances even when vCPU is stopped.

>>

>> So I'm not seeing this happen:

>>

>>   process_icount_data: icount not increased 117302332749=>117302332749 RT:-1

>>

>> I wonder if the advance of virtual_rt has been deadlocked? The -1

>> suggests its not even setup.

>

> qemu_start_warp_timer says that it is setup only when there are active virtual timers.

>

>

>> > By the way, I made small description of the replay log format

>> > to be later included into docs/replay.txt:

>> >

>> Thanks. That's really useful.

>>

>> Do you think it would be worth including the dumper script I wrote in

>> ./scripts/? I mainly wrote it to aid my understanding of the code flow

>> but it might be useful in future investigations?

>

> Usually I get human-readable logs in record and replay modes to find

> divergence between them, but I think your script might be useful in some cases.


Cool. I'll include it in the icount series once I have something to post
(fingers crossed I'm getting close).

--
Alex Bennée
Paolo Bonzini March 31, 2017, 11:31 a.m. UTC | #18
On 31/03/2017 13:21, Alex Bennée wrote:
> Anyway I think I'm getting closer to narrowing it down. On record I see

> replay_current_step jump backwards with this:

> 

> /* A common event print, called when reading or saving an event */

> static void print_event(uint8_t event)

> {

>     static int event_count;

>     static uint64_t last_step;

>     uint64_t this_step = replay_get_current_step();

> 

>     fprintf(stderr, "replay: event %d is %d @ step=%#" PRIx64 "\n",

>             event_count, event, this_step);

> 

>     if (this_step < last_step) {

>         fprintf(stderr,"%s: !!! step %d went backwards %#"PRIx64"=>%#"PRIx64"!!!\n",

>                 __func__, event_count, last_step, this_step);

>         abort();

>     }

>     last_step = this_step;

>     event_count++;

> }

> 

> void replay_put_event(uint8_t event)

> {

>     assert(event < EVENT_COUNT);

>     replay_put_byte(event);

>     print_event(event);

> }

> 

> The jump back is fairly consistent in my case (event 66 in the vexpress

> run) but I'm fairly sure that is the bug. I can't see any reason for the

> step count to go backwards - indeed that breaks the premise of .


Good catch!  I suspect it's the "else" case in cpu_get_icount_raw:

    icount = timers_state.qemu_icount;
    if (cpu) {
        if (!cpu->can_do_io) {
            fprintf(stderr, "Bad icount read\n");
            exit(1);
        }
        icount -= (cpu->icount_decr.u16.low + cpu->icount_extra);
    }

Between

        timers_state.qemu_icount += count;

and

        timers_state.qemu_icount -= (cpu->icount_decr.u16.low
                                    + cpu->icount_extra);

the I/O thread can read a value that is later rolled back.  I think you
need to do this the other way round: add something to icount in
cpu_get_icount_raw rather than taking it off:

    icount = timers_state.qemu_icount;
    if (cpu) {
        if (!cpu->can_do_io) {
            fprintf(stderr, "Bad icount read\n");
            exit(1);
        }
        icount += cpu->pending_icount
            - (cpu->icount_decr.u16.low + cpu->icount_extra);
    }

where cpu->pending_icount is set before cpu_exec, and folded into
timers_state.qemu_icount afterwards.

Also, here:

    if (use_icount) {
        int64_t count;
        int decr;
        timers_state.qemu_icount -= (cpu->icount_decr.u16.low
                                    + cpu->icount_extra);
        cpu->icount_decr.u16.low = 0;
        cpu->icount_extra = 0;

this should be dead code because tcg_cpu_exec also clears the two
decrementer fields.  So if you can replace the three assignments with
assertions on cpu->icount_decr.u16.low and cpu->icount_extra, that would
also simplify the code and remove race opportunities.

Paolo

>>>>>> - replay puts back unexecuted instructions (qemu_icount -= counter)

>>>>>>

>>>>>> But virtual timer cannot take in account non-executed instructions (counter) and

>>>>>> therefore it reads only qemu_icount, which provides incorrect time.

>>>>>>

>>>>>>> But the timers are all enabled:

>>>>>>>

>>>>>>>   (gdb) qemu timers

>>>>>>>   Processing Realtime timers

>>>>>>>     clock QEMU_CLOCK_REALTIME is enabled:true, last:-9223372036854775808

>>>>>>>   Processing Virtual timers

>>>>>>>     clock QEMU_CLOCK_VIRTUAL is enabled:true, last:-9223372036854775808

>>>>>>>       timer 34297350016/1 (cb:0x555555a2e952 <ptimer_tick>)

>>>>>>>       timer 503290000000/1000000 (cb:0x555555bd4d1d <ra_timer_handler>)

>>>>>>>   Processing Host timers

>>>>>>>     clock QEMU_CLOCK_HOST is enabled:true, last:1490191319270134000

>>>>>>>   Processing Virtual RT timers

>>>>>>>     clock QEMU_CLOCK_VIRTUAL_RT is enabled:true, last:-9223372036854775808

>>>>>>

>>>>>> Timers are processed only at checkpoints recorded in the log.

>>>>>> When replay is stuck, probably there is a pending checkpoint in the log

>>>>>> and pending instructions in CPU (because iothread breaks its

>>>>>> synchronization).

>>>>>

>>>>> How does this work for instructions that sleep (like WFI)? The WFI will

>>>>> cause us to exit the loop but I assume we need a "real" timer to trigger

>>>>> a firing of whatever the WFI is waiting for as time according to icount

>>>>> isn't advancing.

>>>>

>>>> This is virtual_rt for. It advances even when vCPU is stopped.

>>>

>>> So I'm not seeing this happen:

>>>

>>>   process_icount_data: icount not increased 117302332749=>117302332749 RT:-1

>>>

>>> I wonder if the advance of virtual_rt has been deadlocked? The -1

>>> suggests its not even setup.

>>

>> qemu_start_warp_timer says that it is setup only when there are active virtual timers.

>>

>>

>>>> By the way, I made small description of the replay log format

>>>> to be later included into docs/replay.txt:

>>>>

>>> Thanks. That's really useful.

>>>

>>> Do you think it would be worth including the dumper script I wrote in

>>> ./scripts/? I mainly wrote it to aid my understanding of the code flow

>>> but it might be useful in future investigations?

>>

>> Usually I get human-readable logs in record and replay modes to find

>> divergence between them, but I think your script might be useful in some cases.

> 

> Cool. I'll include it in the icount series once I have something to post

> (fingers crossed I'm getting close).

> 

> --

> Alex Bennée

>
Alex Bennée March 31, 2017, 1:14 p.m. UTC | #19
Alex Bennée <alex.bennee@linaro.org> writes:

> Pavel Dovgalyuk <dovgaluk@ispras.ru> writes:

>

>>> From: mttcg-request@listserver.greensocs.com [mailto:mttcg-request@listserver.greensocs.com]

>>> Pavel Dovgalyuk <dovgaluk@ispras.ru> writes:

>>> >> From: mttcg-request@listserver.greensocs.com [mailto:mttcg-

>>> request@listserver.greensocs.com]

>>> >> Pavel Dovgalyuk <dovgaluk@ispras.ru> writes:

>>> >> >> From: mttcg-request@listserver.greensocs.com [mailto:mttcg-

>>> >> request@listserver.greensocs.com]

>>> >> >> Pavel Dovgalyuk <dovgaluk@ispras.ru> writes:

>>> <snip>

>>> >> >> > I tested on vexpress-a9 platform with Debian wheezy.

>>> >> >>

>>> >> >> Thanks for that. I now have a test case that I can reproduce failures on

>>> >> >> without needing graphics.

>>> >> >>

>>> >> >> I've been investigating if there are any problems with the timer

>>> >> >> processing now they have been moved into the TCG thread. The record

>>> >> >> stage seems to work fine but I'm having difficulty figuring out why

>>> >> >> playback freezes. It seems we get to a point where we are stuck waiting

>>> >> >> for a suspiciously exact timer deadline:

>>> >> >

>>> >> > I've encountered the following behavior at replay stage:

>>> >> > - replay takes some instructions to execute (qemu_icount += counter)

>>> >> > - virtual timer is fired

>>> >>

>>> >> This is the virtual timer based on icount not the virtual rt timer? So

>>> >

>>> > Regular virtual timer. It's value is based on icount.

>>> > virtual_rt is used for internal icount purposes.

>>>

>>> And this is where the clock warps come in? The bias brings the

>>> virtual_rt time forward because execution is waiting for some external

>>> event to fire (e.g. a timer IRQ)?

>>

>> I guess so. But bias is not updated when the vCPU works.

>> vCPU thread updates only qemu_icount which is used for virtual clock calculation.

>>

>>> >> under the new scheme of being processed in the vCPU loop we should only

>>> >> fire this one once all execution is done (although you may exit the loop

>>> >> early before icount is exhausted).

>>> >

>>> > We should stop the vCPU before processing virtual timers because:

>>> > - virtual clock depends on instruction counter

>>> > - virtual timers may change virtual hardware state

>>>

>>> If we do the processing in the top of main vCPU loop this should be

>>> equivalent. The instruction counter cannot run because we haven't

>>> entered tcg_exec_cpu. We also process virtual timers in this thread

>>> outside the loop so nothing else can be poking the hardware state.

>>

>> This is how qemu worked in older version - it switched between

>> processing tasks (vCPU and timers) in one thread.

>

> The only real difference is the sequencing in the old case was protected

> by the BQL - now its my program order.

>

>> But how we'll join this behavior with the current design and MTTCG?

>

> Ignore MTTCG for now. We don't even try and run multiple-threads when

> icount is enabled. However the change in the BQL locking is what has

> triggered the failures.

>

> Anyway I think I'm getting closer to narrowing it down. On record I see

> replay_current_step jump backwards with this:

>

> /* A common event print, called when reading or saving an event */

> static void print_event(uint8_t event)

> {

>     static int event_count;

>     static uint64_t last_step;

>     uint64_t this_step = replay_get_current_step();

>

>     fprintf(stderr, "replay: event %d is %d @ step=%#" PRIx64 "\n",

>             event_count, event, this_step);

>

>     if (this_step < last_step) {

>         fprintf(stderr,"%s: !!! step %d went backwards %#"PRIx64"=>%#"PRIx64"!!!\n",

>                 __func__, event_count, last_step, this_step);

>         abort();

>     }

>     last_step = this_step;

>     event_count++;

> }

>

> void replay_put_event(uint8_t event)

> {

>     assert(event < EVENT_COUNT);

>     replay_put_byte(event);

>     print_event(event);

> }

>

> The jump back is fairly consistent in my case (event 66 in the vexpress

> run) but I'm fairly sure that is the bug. I can't see any reason for the

> step count to go backwards - indeed that breaks the premise of .

>

> Unfortunately when I start putting break and watchpoints in to see how

> this jump back occurs the problem goes away until I disable them. So

> this very much looks like a race condition corrupting the icount data.


So this is a symptom of cpu_get_icount_raw(void) only accounting for in
progress instructions when in the vCPU context and the fact
timers_state.qemu_icount is "in credit" while the vCPU is running.

--
Alex Bennée
Alex Bennée March 31, 2017, 7:49 p.m. UTC | #20
Paolo Bonzini <pbonzini@redhat.com> writes:

> On 31/03/2017 13:21, Alex Bennée wrote:

>> Anyway I think I'm getting closer to narrowing it down. On record I see

>> replay_current_step jump backwards with this:

>>

>> /* A common event print, called when reading or saving an event */

>> static void print_event(uint8_t event)

>> {

>>     static int event_count;

>>     static uint64_t last_step;

>>     uint64_t this_step = replay_get_current_step();

>>

>>     fprintf(stderr, "replay: event %d is %d @ step=%#" PRIx64 "\n",

>>             event_count, event, this_step);

>>

>>     if (this_step < last_step) {

>>         fprintf(stderr,"%s: !!! step %d went backwards %#"PRIx64"=>%#"PRIx64"!!!\n",

>>                 __func__, event_count, last_step, this_step);

>>         abort();

>>     }

>>     last_step = this_step;

>>     event_count++;

>> }

>>

>> void replay_put_event(uint8_t event)

>> {

>>     assert(event < EVENT_COUNT);

>>     replay_put_byte(event);

>>     print_event(event);

>> }

>>

>> The jump back is fairly consistent in my case (event 66 in the vexpress

>> run) but I'm fairly sure that is the bug. I can't see any reason for the

>> step count to go backwards - indeed that breaks the premise of .

>

> Good catch!  I suspect it's the "else" case in cpu_get_icount_raw:

>

>     icount = timers_state.qemu_icount;

>     if (cpu) {

>         if (!cpu->can_do_io) {

>             fprintf(stderr, "Bad icount read\n");

>             exit(1);

>         }

>         icount -= (cpu->icount_decr.u16.low + cpu->icount_extra);

>     }

>

> Between

>

>         timers_state.qemu_icount += count;

>

> and

>

>         timers_state.qemu_icount -= (cpu->icount_decr.u16.low

>                                     + cpu->icount_extra);

>

> the I/O thread can read a value that is later rolled back.  I think you

> need to do this the other way round: add something to icount in

> cpu_get_icount_raw rather than taking it off:

>

>     icount = timers_state.qemu_icount;

>     if (cpu) {

>         if (!cpu->can_do_io) {

>             fprintf(stderr, "Bad icount read\n");

>             exit(1);

>         }

>         icount += cpu->pending_icount

>             - (cpu->icount_decr.u16.low + cpu->icount_extra);

>     }

>

> where cpu->pending_icount is set before cpu_exec, and folded into

> timers_state.qemu_icount afterwards.

>

> Also, here:

>

>     if (use_icount) {

>         int64_t count;

>         int decr;

>         timers_state.qemu_icount -= (cpu->icount_decr.u16.low

>                                     + cpu->icount_extra);

>         cpu->icount_decr.u16.low = 0;

>         cpu->icount_extra = 0;

>

> this should be dead code because tcg_cpu_exec also clears the two

> decrementer fields.  So if you can replace the three assignments with

> assertions on cpu->icount_decr.u16.low and cpu->icount_extra, that would

> also simplify the code and remove race opportunities.


I'll have a look at that on Monday. I wrote this before I saw your
email:

  https://github.com/stsquad/qemu/tree/mttcg/debug-record-replay-v1

It fixes the race so time only ever goes forward. Combined with the
following patches it also makes the record/replay streams tolerant of
"missing the boat". With this I can do a initrd run of the vexpress
kernel in both record and playback.

I'm not sure it counts as deterministic though - the vCPU and main-loop
thread seem to figure it out as the go along but I suspect if we really
want to be sure we push the replay_lock() further up. This would ensure
all related events from whichever thread are pushed together.

This is very much a come back on Monday and see if it still seems like a
good idea in the cold light of morning ;-)

--
Alex Bennée