
[RFC] ui/console: ensure graphic updates don't race with TCG vCPUs

Message ID 20170315144825.3108-1-alex.bennee@linaro.org
State Superseded

Commit Message

Alex Bennée March 15, 2017, 2:48 p.m. UTC
Commit 8d04fb55..

  tcg: drop global lock during TCG code execution

..broke the assumption that updates to the GUI couldn't happen at the
same time as TCG vCPUs were running. As a result the TCG vCPU could
still be updating a directly mapped frame-buffer while the display
side was updating. This would cause artefacts to appear when the
update code assumed the memory block hadn't changed.

The simplest solution is to ensure the two things can't happen at the
same time like the old BQL locking scheme. Here we use the solution
introduced for MTTCG and schedule the update as async_safe_work when
we know no vCPUs can be running.

Reported-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Cc: BALATON Zoltan <balaton@eik.bme.hu>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
 ui/console.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

-- 
2.11.0

Comments

Laurent Desnogues March 22, 2017, 3:06 p.m. UTC | #1
Hi Alex,

this patch breaks:

  http://wiki.qemu.org/download/arm-test-0.2.tar.gz

qemu-system-arm -kernel zImage.integrator -initrd arm_root.img
-append "console=ttyAMA0" -machine integratorcp -serial stdio -icount
0
Uncompressing Linux..........................................................................
done, booting the kernel.
qemu-system-arm: /work/qemu/qemu/memory.c:920:
memory_region_transaction_commit: Assertion
`qemu_mutex_iothread_locked()' failed.

The backtrace looks like this:

#4  0x00005555557ced26 in memory_region_transaction_commit ()
    at /work/qemu/qemu/memory.c:920
#5  0x0000555555959b82 in pl110_update_display (opaque=0x555556a839e0)
    at hw/display/pl110.c:245
#6  0x0000555555a60931 in sdl_refresh (dcl=0x555556b90d40) at ui/sdl.c:821
#7  0x00005555558f9f91 in process_queued_cpu_work (
    cpu=cpu@entry=0x5555566c6810) at cpus-common.c:338
#8  0x00005555557ba5a8 in qemu_wait_io_event_common (
    cpu=cpu@entry=0x5555566c6810) at /work/qemu/qemu/cpus.c:1027
#9  0x00005555557ba9bb in qemu_tcg_wait_io_event (cpu=cpu@entry=0x5555566c6810)
    at /work/qemu/qemu/cpus.c:1048
#10 0x00005555557bc2bd in qemu_tcg_cpu_thread_fn (arg=0x5555566c6810)
    at /work/qemu/qemu/cpus.c:1440
#11 0x00007ffff5f4a6ba in start_thread (arg=0x7fffee650700)
    at pthread_create.c:333
#12 0x00007ffff5c8082d in clone ()
    at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

BTW is it easier if I enter such bug reports in QEMU launchpad?

Thanks,

Laurent

On Wed, Mar 15, 2017 at 3:48 PM, Alex Bennée <alex.bennee@linaro.org> wrote:
> Commit 8d04fb55..
>
>   tcg: drop global lock during TCG code execution
>
> ..broke the assumption that updates to the GUI couldn't happen at the
> same time as TCG vCPUs were running. As a result the TCG vCPU could
> still be updating a directly mapped frame-buffer while the display
> side was updating. This would cause artefacts to appear when the
> update code assumed the memory block hadn't changed.
>
> The simplest solution is to ensure the two things can't happen at the
> same time like the old BQL locking scheme. Here we use the solution
> introduced for MTTCG and schedule the update as async_safe_work when
> we know no vCPUs can be running.
>
> Reported-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
> Cc: BALATON Zoltan <balaton@eik.bme.hu>
> Cc: Gerd Hoffmann <kraxel@redhat.com>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>  ui/console.c | 18 +++++++++++++++++-
>  1 file changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/ui/console.c b/ui/console.c
> index d1ff7504ec..37a69e27de 100644
> --- a/ui/console.c
> +++ b/ui/console.c
> @@ -1575,13 +1575,29 @@ bool dpy_gfx_check_format(QemuConsole *con,
>      return true;
>  }
>
> +/*
> + * Safe DPY refresh for TCG guests. This runs when the TCG vCPUs are
> + * quiescent so we can avoid races between dirty page tracking for
> + * direct frame-buffer access by the guest.
> + */
> +static void do_safe_dpy_refresh(CPUState *cpu, run_on_cpu_data opaque)
> +{
> +    DisplayChangeListener *dcl = opaque.host_ptr;
> +    dcl->ops->dpy_refresh(dcl);
> +}
> +
>  static void dpy_refresh(DisplayState *s)
>  {
>      DisplayChangeListener *dcl;
>
>      QLIST_FOREACH(dcl, &s->listeners, next) {
>          if (dcl->ops->dpy_refresh) {
> -            dcl->ops->dpy_refresh(dcl);
> +            if (tcg_enabled()) {
> +                async_safe_run_on_cpu(first_cpu, do_safe_dpy_refresh,
> +                                      RUN_ON_CPU_HOST_PTR(dcl));
> +            } else {
> +                dcl->ops->dpy_refresh(dcl);
> +            }
>          }
>      }
>  }
> --
> 2.11.0
>
Alex Bennée March 22, 2017, 4:28 p.m. UTC | #2
Laurent Desnogues <laurent.desnogues@gmail.com> writes:

> Hi Alex,
>
> this patch breaks:
>
>   http://wiki.qemu.org/download/arm-test-0.2.tar.gz
>
> qemu-system-arm -kernel zImage.integrator -initrd arm_root.img
> -append "console=ttyAMA0" -machine integratorcp -serial stdio -icount
> 0
> Uncompressing Linux..........................................................................
> done, booting the kernel.
> qemu-system-arm: /work/qemu/qemu/memory.c:920:
> memory_region_transaction_commit: Assertion
> `qemu_mutex_iothread_locked()' failed.

Doh - too many levels of BQL. So wi->exclusive tasks drop the iothread
lock so they don't deadlock as the other threads try to wake up and get to
cpu_exec_end. Of course the update was originally under the BQL so the
deferred work should be as well. We have to make a minor tweak to avoid
triggering RCU clean-up outside the execution context though. Try this:

ui/console: ensure do_safe_dpy_refresh holds BQL

I missed the fact that when an exclusive work item runs it drops the
BQL to ensure no vCPUs are stuck waiting for it, hence causing a
deadlock. However the actual helper needs to take the BQL, especially
as we'll be messing with device emulation bits during the update which
all assume the BQL is held.

We make a minor tweak to cpu_reloading_memory_map, which must only
unlock the RCU if we are actually inside a running vCPU context.

Reported-by: Laurent Desnogues <laurent.desnogues@gmail.com>
CC: Gerd Hoffmann <kraxel@redhat.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>


2 files changed, 3 insertions(+), 1 deletion(-)
cpu-exec-common.c | 2 +-
ui/console.c      | 2 ++

modified   cpu-exec-common.c
@@ -35,7 +35,7 @@ void cpu_loop_exit_noexc(CPUState *cpu)
 #if defined(CONFIG_SOFTMMU)
 void cpu_reloading_memory_map(void)
 {
-    if (qemu_in_vcpu_thread()) {
+    if (qemu_in_vcpu_thread() && current_cpu->running) {
         /* The guest can in theory prolong the RCU critical section as long
          * as it feels like. The major problem with this is that because it
          * can do multiple reconfigurations of the memory map within the
modified   ui/console.c
@@ -1586,7 +1586,9 @@ bool dpy_gfx_check_format(QemuConsole *con,
 static void do_safe_dpy_refresh(CPUState *cpu, run_on_cpu_data opaque)
 {
     DisplayChangeListener *dcl = opaque.host_ptr;
+    qemu_mutex_lock_iothread();
     dcl->ops->dpy_refresh(dcl);
+    qemu_mutex_unlock_iothread();
 }

 static void dpy_refresh(DisplayState *s)
Laurent Desnogues March 22, 2017, 4:38 p.m. UTC | #3
On Wed, Mar 22, 2017 at 5:28 PM, Alex Bennée <alex.bennee@linaro.org> wrote:
>
> Laurent Desnogues <laurent.desnogues@gmail.com> writes:
>
>> Hi Alex,
>>
>> this patch breaks:
>>
>>   http://wiki.qemu.org/download/arm-test-0.2.tar.gz
>>
>> qemu-system-arm -kernel zImage.integrator -initrd arm_root.img
>> -append "console=ttyAMA0" -machine integratorcp -serial stdio -icount
>> 0
>> Uncompressing Linux..........................................................................
>> done, booting the kernel.
>> qemu-system-arm: /work/qemu/qemu/memory.c:920:
>> memory_region_transaction_commit: Assertion
>> `qemu_mutex_iothread_locked()' failed.
>
> Doh - too many levels of BQL. So wi->exclusive tasks drop the iothread
> lock so they don't deadlock as the other threads try to wake up and get to
> cpu_exec_end. Of course the update was originally under the BQL so the
> deferred work should be as well. We have to make a minor tweak to avoid
> triggering RCU clean-up outside the execution context though. Try this:
>
> ui/console: ensure do_safe_dpy_refresh holds BQL
>
> I missed the fact that when an exclusive work item runs it drops the
> BQL to ensure no vCPUs are stuck waiting for it, hence causing a
> deadlock. However the actual helper needs to take the BQL, especially
> as we'll be messing with device emulation bits during the update which
> all assume the BQL is held.
>
> We make a minor tweak to cpu_reloading_memory_map, which must only
> unlock the RCU if we are actually inside a running vCPU context.
>
> Reported-by: Laurent Desnogues <laurent.desnogues@gmail.com>
> CC: Gerd Hoffmann <kraxel@redhat.com>
> CC: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>


It works fine on the image I reported and another AArch64 one.

Tested-by: Laurent Desnogues <laurent.desnogues@gmail.com>


Thanks for the quick fix,

Laurent

> 2 files changed, 3 insertions(+), 1 deletion(-)
> cpu-exec-common.c | 2 +-
> ui/console.c      | 2 ++
>
> modified   cpu-exec-common.c
> @@ -35,7 +35,7 @@ void cpu_loop_exit_noexc(CPUState *cpu)
>  #if defined(CONFIG_SOFTMMU)
>  void cpu_reloading_memory_map(void)
>  {
> -    if (qemu_in_vcpu_thread()) {
> +    if (qemu_in_vcpu_thread() && current_cpu->running) {
>          /* The guest can in theory prolong the RCU critical section as long
>           * as it feels like. The major problem with this is that because it
>           * can do multiple reconfigurations of the memory map within the
> modified   ui/console.c
> @@ -1586,7 +1586,9 @@ bool dpy_gfx_check_format(QemuConsole *con,
>  static void do_safe_dpy_refresh(CPUState *cpu, run_on_cpu_data opaque)
>  {
>      DisplayChangeListener *dcl = opaque.host_ptr;
> +    qemu_mutex_lock_iothread();
>      dcl->ops->dpy_refresh(dcl);
> +    qemu_mutex_unlock_iothread();
>  }
>
>  static void dpy_refresh(DisplayState *s)

Patch

diff --git a/ui/console.c b/ui/console.c
index d1ff7504ec..37a69e27de 100644
--- a/ui/console.c
+++ b/ui/console.c
@@ -1575,13 +1575,29 @@  bool dpy_gfx_check_format(QemuConsole *con,
     return true;
 }
 
+/*
+ * Safe DPY refresh for TCG guests. This runs when the TCG vCPUs are
+ * quiescent so we can avoid races between dirty page tracking for
+ * direct frame-buffer access by the guest.
+ */
+static void do_safe_dpy_refresh(CPUState *cpu, run_on_cpu_data opaque)
+{
+    DisplayChangeListener *dcl = opaque.host_ptr;
+    dcl->ops->dpy_refresh(dcl);
+}
+
 static void dpy_refresh(DisplayState *s)
 {
     DisplayChangeListener *dcl;
 
     QLIST_FOREACH(dcl, &s->listeners, next) {
         if (dcl->ops->dpy_refresh) {
-            dcl->ops->dpy_refresh(dcl);
+            if (tcg_enabled()) {
+                async_safe_run_on_cpu(first_cpu, do_safe_dpy_refresh,
+                                      RUN_ON_CPU_HOST_PTR(dcl));
+            } else {
+                dcl->ops->dpy_refresh(dcl);
+            }
         }
     }
 }