[PULL,11/24] tcg: enable thread-per-vCPU

Message ID 20170224112109.3147-12-alex.bennee@linaro.org
State Accepted
Commit 372579427a5040a26dfee78464b50e2bdf27ef26
Series MTTCG Base enabling patches with ARM enablement

Commit Message

Alex Bennée Feb. 24, 2017, 11:20 a.m. UTC
There are a couple of changes that occur at the same time here:

  - introduce a single vCPU qemu_tcg_cpu_thread_fn

  One of these is spawned per vCPU with its own Thread and Condition
  variables. qemu_tcg_rr_cpu_thread_fn is the new name for the old
  single threaded function.

  - the TLS current_cpu variable is now live for the lifetime of MTTCG
    vCPU threads. This is for future work where async jobs need to know
    the vCPU context they are operating in.

The user can now switch on multi-thread behaviour and spawn a thread
per vCPU. A simple kvm-unit-test like:

  ./arm/run ./arm/locking-test.flat -smp 4 -accel tcg,thread=multi

will now use 4 vCPU threads and produce an expected FAIL (instead of the
unexpected PASS), as the default mode of the test has no protection when
incrementing a shared variable.

We enable the parallel_cpus flag to ensure we generate correct barrier
and atomic code if supported by the front and backends. This doesn't
automatically enable MTTCG until default_mttcg_enabled() is updated to
check the configuration is supported.

Signed-off-by: KONRAD Frederic <fred.konrad@greensocs.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[AJB: Some fixes, conditionally, commit rewording]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>

---
 cpu-exec.c |   4 --
 cpus.c     | 134 +++++++++++++++++++++++++++++++++++++++++++++++--------------
 2 files changed, 103 insertions(+), 35 deletions(-)

-- 
2.11.0
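
Why the locking test is expected to FAIL with thread=multi: in its
default mode the test increments a shared counter from all vCPUs with no
atomics or locks, so once the vCPUs genuinely run in parallel some
increments are lost. A minimal host-side sketch of the same race (plain
pthreads; an illustration, not the kvm-unit-test source):

    /* racy.c: unsynchronized increments lose updates once threads
     * really run in parallel.  Build: cc racy.c -o racy -lpthread */
    #include <pthread.h>
    #include <stdio.h>

    enum { NTHREADS = 4, ITERS = 1000000 };

    static long counter;            /* shared, deliberately unprotected */

    static void *worker(void *arg)
    {
        for (int i = 0; i < ITERS; i++) {
            counter++;              /* racy load-modify-store */
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t th[NTHREADS];

        for (int i = 0; i < NTHREADS; i++) {
            pthread_create(&th[i], NULL, worker, NULL);
        }
        for (int i = 0; i < NTHREADS; i++) {
            pthread_join(th[i], NULL);
        }
        /* almost always prints a value short of 4000000 */
        printf("%ld (expected %d)\n", counter, NTHREADS * ITERS);
        return 0;
    }

Making the counter atomic, or taking a lock around the increment, makes
the count exact; that is the property the locking test probes once
thread=multi makes the vCPUs truly concurrent.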

Comments

Laurent Vivier Feb. 27, 2017, 12:48 p.m. UTC | #1
On 24/02/2017 at 12:20, Alex Bennée wrote:
> There are a couple of changes that occur at the same time here:
>
>   - introduce a single vCPU qemu_tcg_cpu_thread_fn
>
>   One of these is spawned per vCPU with its own Thread and Condition
>   variables. qemu_tcg_rr_cpu_thread_fn is the new name for the old
>   single threaded function.
>
>   - the TLS current_cpu variable is now live for the lifetime of MTTCG
>     vCPU threads. This is for future work where async jobs need to know
>     the vCPU context they are operating in.
>
> The user can now switch on multi-thread behaviour and spawn a thread
> per vCPU. A simple kvm-unit-test like:
>
>   ./arm/run ./arm/locking-test.flat -smp 4 -accel tcg,thread=multi
>
> will now use 4 vCPU threads and produce an expected FAIL (instead of the
> unexpected PASS), as the default mode of the test has no protection when
> incrementing a shared variable.
>
> We enable the parallel_cpus flag to ensure we generate correct barrier
> and atomic code if supported by the front and backends. This doesn't
> automatically enable MTTCG until default_mttcg_enabled() is updated to
> check the configuration is supported.


This commit breaks linux-user mode:

debian-8 with qemu-ppc on x86_64 with ltp-full-20170116

cd /opt/ltp
./runltp -p -l "qemu-$(date +%FT%T).log" -f /opt/ltp/runtest/syscalls -s
setgroups03

setgroups03    1  TPASS  :  setgroups(65537) fails, Size is >
sysconf(_SC_NGROUPS_MAX), errno=22
qemu-ppc: /home/laurent/Projects/qemu/include/qemu/rcu.h:89:
rcu_read_unlock: Assertion `p_rcu_reader->depth != 0' failed.
qemu-ppc: /home/laurent/Projects/qemu/include/qemu/rcu.h:89:
rcu_read_unlock: Assertion `p_rcu_reader->depth != 0' failed.
qemu-ppc: /home/laurent/Projects/qemu/include/qemu/rcu.h:89:
rcu_read_unlock: Assertion `p_rcu_reader->depth != 0' failed.
...

Laurent
Alex Bennée Feb. 27, 2017, 2:38 p.m. UTC | #2
Laurent Vivier <laurent@vivier.eu> writes:

> On 24/02/2017 at 12:20, Alex Bennée wrote:
>> [...]
>
> This commit breaks linux-user mode:
>
> debian-8 with qemu-ppc on x86_64 with ltp-full-20170116
>
> cd /opt/ltp
> ./runltp -p -l "qemu-$(date +%FT%T).log" -f /opt/ltp/runtest/syscalls -s
> setgroups03
>
> setgroups03    1  TPASS  :  setgroups(65537) fails, Size is >
> sysconf(_SC_NGROUPS_MAX), errno=22
> qemu-ppc: /home/laurent/Projects/qemu/include/qemu/rcu.h:89:
> rcu_read_unlock: Assertion `p_rcu_reader->depth != 0' failed.
> ...


Interesting. I can only think the current_cpu change has broken it
because most of the changes in this commit affect softmmu targets only
(linux-user has its own run loop).

Thanks for the report - I'll look into it.


--
Alex Bennée
Laurent Vivier March 13, 2017, 2:03 p.m. UTC | #3
On 27/02/2017 at 15:38, Alex Bennée wrote:
> Laurent Vivier <laurent@vivier.eu> writes:
>> [...]
>> This commit breaks linux-user mode:
>> [...]
>
> Interesting. I can only think the current_cpu change has broken it
> because most of the changes in this commit affect softmmu targets only
> (linux-user has its own run loop).
>
> Thanks for the report - I'll look into it.


After:

     95b0eca Merge remote-tracking branch
'remotes/stsquad/tags/pull-mttcg-fixups-090317-1' into staging

[Tested with my HEAD on:
b1616fe Merge remote-tracking branch
'remotes/famz/tags/docker-pull-request' into staging]

I have now:

<<<test_start>>>
tag=setgroups03 stime=1489413401
cmdline="setgroups03"
contacts=""
analysis=exit
<<<test_output>>>
**
ERROR:/home/laurent/Projects/qemu/cpu-exec.c:656:cpu_exec: assertion
failed: (cpu == current_cpu)
**

Laurent
Alex Bennée March 13, 2017, 4:58 p.m. UTC | #4
Laurent Vivier <laurent@vivier.eu> writes:

> On 27/02/2017 at 15:38, Alex Bennée wrote:
>> [...]
>> Thanks for the report - I'll look into it.
>
> After:
>
>      95b0eca Merge remote-tracking branch
> 'remotes/stsquad/tags/pull-mttcg-fixups-090317-1' into staging
>
> [Tested with my HEAD on:
> b1616fe Merge remote-tracking branch
> 'remotes/famz/tags/docker-pull-request' into staging]
>
> I have now:
>
> <<<test_start>>>
> tag=setgroups03 stime=1489413401
> cmdline="setgroups03"
> contacts=""
> analysis=exit
> <<<test_output>>>
> **
> ERROR:/home/laurent/Projects/qemu/cpu-exec.c:656:cpu_exec: assertion
> failed: (cpu == current_cpu)
> **


So I think this is saying that we were outside the tcg_exec_loop for
this cpu and somehow longjmp'ed back into the loop.

I'll start setting up LTP on my system but in the meantime you might
find it useful adding the cpu == current_cpu assert into all the places
in cpu-exec-common.c before siglongjmp is called. Then a backtrace of
the offending call will be easier to follow.

--
Alex Bennée
Laurent Vivier March 13, 2017, 6:21 p.m. UTC | #5
On 13/03/2017 at 17:58, Alex Bennée wrote:
> [...]
> So I think this is saying that we were outside the tcg_exec_loop for
> this cpu and somehow longjmp'ed back into the loop.
>
> I'll start setting up LTP on my system but in the meantime you might
> find it useful adding the cpu == current_cpu assert into all the places
> in cpu-exec-common.c before siglongjmp is called. Then a backtrace of
> the offending call will be easier to follow.


If I patch cpu-exec-common.c (diff attached at the end of this mail),
I have exactly the same trace:

**
ERROR:/home/laurent/Projects/qemu/cpu-exec.c:656:cpu_exec: assertion
failed: (cpu == current_cpu)
**
ERROR:/home/laurent/Projects/qemu/cpu-exec.c:656:cpu_exec: assertion
failed: (cpu == current_cpu)
**
ERROR:/home/laurent/Projects/qemu/cpu-exec.c:656:cpu_exec: assertion
failed: (cpu == current_cpu)
**
ERROR:/home/laurent/Projects/qemu/cpu-exec.c:656:cpu_exec: assertion
failed: (cpu == current_cpu)
**
ERROR:/home/laurent/Projects/qemu/cpu-exec.c:656:cpu_exec: assertion
failed: (cpu == current_cpu)

QEMU_STRACE gives:

6805 close(3) = 0
6805 setgroups(65536,-159891448,0,-150998360,0,0)**
ERROR:/home/laurent/Projects/qemu/cpu-exec.c:656:cpu_exec: assertion
failed: (cpu == current_cpu)

and strace gives:

sudo strace -ffff chroot /var/lib/lxc/debian-8-ppc/rootfs
/opt/ltp/testcases/bin/setgroups03
...
[pid  6690] futex(0x7ffce8bc3340, FUTEX_WAIT_PRIVATE, 1, NULL
<unfinished ...>
[pid  6691] --- SIGRT_1 {si_signo=SIGRT_1, si_code=SI_TKILL,
si_pid=6690, si_uid=0} ---
[pid  6691] setgroups(65536, [65534, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...]) = 0
[pid  6691] futex(0x7f656a601d1c, FUTEX_WAKE_PRIVATE, 1) = 0
[pid  6691] futex(0x7ffce8bc3340, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid  6690] <... futex resumed> )       = 0
[pid  6691] <... futex resumed> )       = 1
[pid  6690] setgroups(65536, [65534, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...]
<unfinished ...>
[pid  6691] rt_sigreturn({mask=~[KILL STOP RTMIN RT_1]} <unfinished ...>
[pid  6690] <... setgroups resumed> )   = -1 EPERM (Operation not permitted)
[pid  6691] <... rt_sigreturn resumed> ) = 202
[pid  6690] rt_sigprocmask(SIG_UNBLOCK, [ABRT],  <unfinished ...>
[pid  6691] futex(0x625ffba4, FUTEX_WAIT, 4294967295, NULL <unfinished ...>
[pid  6690] <... rt_sigprocmask resumed> NULL, 8) = 0
[pid  6690] rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [], 8) = 0
[pid  6690] getpid()                    = 6690
[pid  6690] gettid()                    = 6690
[pid  6690] tgkill(6690, 6690, SIGABRT) = 0
[pid  6690] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid  6690] --- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL,
si_pid=6690, si_uid=0} ---
[pid  6690] rt_sigreturn({mask=~[BUS SEGV]}) = 0
[pid  6690] rt_sigaction(SIGABRT, {sa_handler=SIG_DFL, sa_mask=~[],
sa_flags=SA_RESTORER, sa_restorer=0x6018b100}, NULL, 8) = 0
[pid  6690] rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], ~[BUS KILL SEGV
STOP], 8) = 0
[pid  6690] getpid()                    = 6690
[pid  6690] gettid()                    = 6690
[pid  6690] tgkill(6690, 6690, SIGABRT) = 0
[pid  6690] rt_sigprocmask(SIG_SETMASK, ~[BUS KILL SEGV STOP], NULL, 8) = 0
[pid  6690] --- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL,
si_addr=NULL} ---
[pid  6690] rt_sigprocmask(SIG_SETMASK, ~[BUS KILL SEGV STOP], NULL, 8) = 0
[pid  6690] open("/usr/lib64/charset.alias", O_RDONLY) = -1 ENOENT (No
such file or directory)
[pid  6690] open("/usr/lib64/gconv/gconv-modules.cache", O_RDONLY) = -1
ENOENT (No such file or directory)
[pid  6690] open("/usr/lib64/gconv/gconv-modules", O_RDONLY|O_CLOEXEC) =
-1 ENOENT (No such file or directory)
[pid  6690] futex(0x62605a30, FUTEX_WAKE_PRIVATE, 2147483647) = 0
[pid  6690] brk(0x636dc000)             = 0x636dc000
[pid  6690] write(2, "**\nERROR:/home/laurent/Projects/"..., 101**
ERROR:/home/laurent/Projects/qemu/cpu-exec.c:656:cpu_exec: assertion
failed: (cpu == current_cpu)
) = 101
[pid  6690] brk(0x636d4000)             = 0x636d4000
[pid  6690] --- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL,
si_addr=NULL} ---
[pid  6690] rt_sigprocmask(SIG_SETMASK, ~[BUS KILL SEGV STOP], NULL, 8) = 0
[pid  6690] write(2, "**\nERROR:/home/laurent/Projects/"..., 101**
ERROR:/home/laurent/Projects/qemu/cpu-exec.c:656:cpu_exec: assertion
failed: (cpu == current_cpu)
) = 101
[pid  6690] --- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL,
si_addr=NULL} ---
[pid  6690] rt_sigprocmask(SIG_SETMASK, ~[BUS KILL SEGV STOP], NULL, 8) = 0
[pid  6690] write(2, "**\nERROR:/home/laurent/Projects/"..., 101**
ERROR:/home/laurent/Projects/qemu/cpu-exec.c:656:cpu_exec: assertion
failed: (cpu == current_cpu)
) = 101
[pid  6690] --- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL,
si_addr=NULL} ---
[pid  6690] rt_sigprocmask(SIG_SETMASK, ~[BUS KILL SEGV STOP], NULL, 8) = 0
[pid  6690] write(2, "**\nERROR:/home/laurent/Projects/"..., 101**
ERROR:/home/laurent/Projects/qemu/cpu-exec.c:656:cpu_exec: assertion
failed: (cpu == current_cpu)

Laurent

diff --git a/cpu-exec-common.c b/cpu-exec-common.c
index 0504a94..4bdf295 100644
--- a/cpu-exec-common.c
+++ b/cpu-exec-common.c
@@ -29,6 +29,7 @@ void cpu_loop_exit_noexc(CPUState *cpu)
     /* XXX: restore cpu registers saved in host registers */

     cpu->exception_index = -1;
+g_assert(cpu == current_cpu);
     siglongjmp(cpu->jmp_env, 1);
 }

@@ -64,6 +65,7 @@ void cpu_reloading_memory_map(void)

 void cpu_loop_exit(CPUState *cpu)
 {
+g_assert(cpu == current_cpu);
     siglongjmp(cpu->jmp_env, 1);
 }

@@ -72,6 +74,7 @@ void cpu_loop_exit_restore(CPUState *cpu, uintptr_t pc)
     if (pc) {
         cpu_restore_state(cpu, pc);
     }
+g_assert(cpu == current_cpu);
     siglongjmp(cpu->jmp_env, 1);
 }


Alex Bennée March 16, 2017, 5:31 p.m. UTC | #6
Laurent Vivier <laurent@vivier.eu> writes:

> [...]
> I have now:
>
> <<<test_start>>>
> tag=setgroups03 stime=1489413401
> cmdline="setgroups03"
> contacts=""
> analysis=exit
> <<<test_output>>>
> **
> ERROR:/home/laurent/Projects/qemu/cpu-exec.c:656:cpu_exec: assertion
> failed: (cpu == current_cpu)
> **


Sorry about the delay. After lengthy fighting to get LTP on PowerPC
built I got this behaviour:

  17:26 alex@zen taken:41, git:mttcg/more-fixes-for-rc1, [/home/alex/lsrc/qemu/qemu.git]> sudo ./ppc-linux-user/qemu-ppc ./ppc-linux-user/setgroups03
  setgroups03    1  TPASS  :  setgroups(65537) fails, Size is > sysconf(_SC_NGROUPS_MAX), errno=22
  setgroups03    2  TBROK  :  tst_sig.c:233: unexpected signal SIGSEGV(11) received (pid = 22137).
  setgroups03    3  TBROK  :  tst_sig.c:233: Remaining cases broken

I'm afraid I can't compare the result to real hardware so maybe my LTP
build is broken. But the main thing is I can't seem to reproduce it
here.

Could you ping me your LTP binary so I can have a look?

The other thing to note is the assert you now see firing is a guard for
buggy compilers. What version are you building with and can you try any
other versions?

--
Alex Bennée
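
(The "guard for buggy compilers" is the sigsetjmp re-entry check near
the top of cpu_exec(). In the cpu-exec.c of that era it looks roughly
like the sketch below; a paraphrase, not a verbatim excerpt, so treat
the exact form as approximate:)

    /* prepare setjmp context for exception handling */
    if (sigsetjmp(cpu->jmp_env, 0) != 0) {
    #if defined(__clang__) || !QEMU_GNUC_PREREQ(4, 6)
        /* Some compilers wrongly smash all local variables after
         * siglongjmp: reload them from the global/TLS copies. */
        cpu = current_cpu;
        cc = CPU_GET_CLASS(cpu);
    #else
        /* Assert that the compiler does not smash local variables. */
        g_assert(cpu == current_cpu);
        g_assert(cc == CPU_GET_CLASS(cpu));
    #endif
    }

This is the assertion at cpu-exec.c:656 in Laurent's trace: it fires
when current_cpu no longer matches the cpu that armed the jmp_env.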
Laurent Vivier March 16, 2017, 6:36 p.m. UTC | #7
On 16/03/2017 at 18:31, Alex Bennée wrote:
> [...]
> Sorry about the delay. After lengthy fighting to get LTP on PowerPC
> built I got this behaviour:
>
>   17:26 alex@zen taken:41, git:mttcg/more-fixes-for-rc1, [/home/alex/lsrc/qemu/qemu.git]> sudo ./ppc-linux-user/qemu-ppc ./ppc-linux-user/setgroups03
>   setgroups03    1  TPASS  :  setgroups(65537) fails, Size is > sysconf(_SC_NGROUPS_MAX), errno=22
>   setgroups03    2  TBROK  :  tst_sig.c:233: unexpected signal SIGSEGV(11) received (pid = 22137).
>   setgroups03    3  TBROK  :  tst_sig.c:233: Remaining cases broken


I've just tested with master (272d7de) and I still get the "(cpu ==
current_cpu)" assertion failure.
I think this test has never worked correctly (I had the SIGSEGV signal
before as well); what is annoying here is the infinite loop generated
by this error (the test never ends).

> I'm afraid I can't compare the result to real hardware so maybe my LTP
> build is broken. But the main thing is I can't seem to reproduce it
> here.

Attached.

> Could you ping me your LTP binary so I can have a look?
>
> The other thing to note is the assert you now see firing is a guard for
> buggy compilers. What version are you building with and can you try any
> other versions?


gcc (GCC) 6.3.1 20161221 (Red Hat 6.3.1-1)

Laurent
Alex Bennée March 17, 2017, 8:43 p.m. UTC | #8
Laurent Vivier <laurent@vivier.eu> writes:

> [...]
> <<<test_output>>>
> **
> ERROR:/home/laurent/Projects/qemu/cpu-exec.c:656:cpu_exec: assertion
> failed: (cpu == current_cpu)
> **


OK we now understand what's happening:

 - setgroups calls __nptl_setxid_error, triggers abort()
   - this sends sig_num 6, then 11
 - host_signal_handler tries to handle 11
 - -> handle_cpu_signal

Before "tcg: enable thread-per-vCPU", the problem was:

 - current_cpu was reset to NULL on the way out of the loop
 - therefore handle_cpu_signal went boom because
     cpu = current_cpu;
     cc = CPU_GET_CLASS(cpu);

After "tcg: enable thread-per-vCPU", the problem is:

 - current_cpu is now live outside cpu_exec_loop
   - this is mainly so async_work functions can assert (cpu == current_cpu)
 - hence handle_cpu_signal gets further and calls
    cpu_loop_exit(cpu);
 - hilarity ensues as we siglongjmp into a stale context

Obviously we shouldn't try to siglongjmp. But we also shouldn't rely on
current_cpu as a proxy to crash early when outside of the loop. There is
a slight wrinkle that we also have funny handling of segs during
translation if a guest jumps to code in an as-yet un-mapped region of
memory.

There is currently cpu->running which is set/cleared by
cpu_exec_start/end. Although if we crash between cpu_exec_start and
sigsetjmp the same sort of brokenness might happen.

Anyway understood now. If anyone has any suggestions for neater stuff
over the weekend please shout, otherwise I'll probably just hack
handle_cpu_signal to do:

   cpu = current_cpu;
   if (!cpu->running) {
      /* we weren't running or translating JIT code when the signal came */
      return 1;
   }


--
Alex Bennée
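
(For reference, cpu->running is toggled by cpu_exec_start/end around
each entry into guest execution. Stripped of the start_exclusive()/
end_exclusive() handshaking, the logic reduces to this simplified
sketch, not the full cpus-common.c code:)

    /* simplified: the real functions also synchronise with
     * start_exclusive()/end_exclusive() before flipping the flag */
    void cpu_exec_start(CPUState *cpu)
    {
        cpu->running = true;
    }

    void cpu_exec_end(CPUState *cpu)
    {
        cpu->running = false;
    }

The window described above is the gap between cpu_exec_start() setting
the flag and sigsetjmp() arming a valid jump target: a signal landing
there sees cpu->running == true but still has no safe context to
longjmp into.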
Laurent Vivier March 18, 2017, 11:19 a.m. UTC | #9
On 17/03/2017 at 21:43, Alex Bennée wrote:
> [...]
> Anyway understood now. If anyone has any suggestions for neater stuff
> over the weekend please shout, otherwise I'll probably just hack
> handle_cpu_signal to do:
>
>    cpu = current_cpu;
>    if (!cpu->running) {
>       /* we weren't running or translating JIT code when the signal came */
>       return 1;
>    }


The return doesn't break the loop, but an abort() does.
I think we can put abort() here as it can be seen as an internal error
(and we get back the previous behavior).

Laurent
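
(Putting the two mails together, the guard with Laurent's abort()
variant would sit at the top of handle_cpu_signal() roughly as follows;
a sketch of the idea, not the fix that eventually landed:)

    cpu = current_cpu;
    if (!cpu->running) {
        /* The fault was raised outside the exec loop (and outside
         * translation), so cpu->jmp_env is stale: a siglongjmp would
         * resume a dead context. Treat it as an internal error rather
         * than looping forever on the same SIGSEGV. */
        abort();
    }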
Paolo Bonzini March 20, 2017, 11:19 a.m. UTC | #10
On 17/03/2017 21:43, Alex Bennée wrote:
> There is currently cpu->running which is set/cleared by
> cpu_exec_start/end. Although if we crash between cpu_exec_start and
> sigsetjmp the same sort of brokenness might happen.

I think cpu_exec_start/end should be moved into cpu_exec itself (but
probably just in 2.10).

Paolo

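
(Paolo's suggestion amounts to having cpu_exec() itself bracket the
loop body with the running flag, so the flag and the sigsetjmp context
can never be observed out of sync. A sketch, with cpu_exec_body() as a
hypothetical name for the existing body:)

    int cpu_exec(CPUState *cpu)
    {
        int ret;

        cpu_exec_start(cpu);          /* sets cpu->running */
        ret = cpu_exec_body(cpu);     /* hypothetical: the current body,
                                       * including sigsetjmp/siglongjmp */
        cpu_exec_end(cpu);            /* clears cpu->running */
        return ret;
    }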
Alex Bennée March 20, 2017, 11:47 a.m. UTC | #11
Paolo Bonzini <pbonzini@redhat.com> writes:

> On 17/03/2017 21:43, Alex Bennée wrote:
>> There is currently cpu->running which is set/cleared by
>> cpu_exec_start/end. Although if we crash between cpu_exec_start and
>> sigsetjmp the same sort of brokenness might happen.
>
> I think cpu_exec_start/end should be moved into cpu_exec itself (but
> probably just in 2.10).


Sure. Although hopefully we can resist the temptation to insert segging
code into that small window in the meantime ;-)


--
Alex Bennée

Patch

diff --git a/cpu-exec.c b/cpu-exec.c
index 85f14d4194..2edd26e823 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -396,7 +396,6 @@  static inline bool cpu_handle_halt(CPUState *cpu)
         }
 #endif
         if (!cpu_has_work(cpu)) {
-            current_cpu = NULL;
             return true;
         }
 
@@ -675,8 +674,5 @@  int cpu_exec(CPUState *cpu)
     cc->cpu_exec_exit(cpu);
     rcu_read_unlock();
 
-    /* fail safe : never use current_cpu outside cpu_exec() */
-    current_cpu = NULL;
-
     return ret;
 }
diff --git a/cpus.c b/cpus.c
index e165d18785..bfee326d30 100644
--- a/cpus.c
+++ b/cpus.c
@@ -809,7 +809,7 @@  static void kick_tcg_thread(void *opaque)
 
 static void start_tcg_kick_timer(void)
 {
-    if (!tcg_kick_vcpu_timer && CPU_NEXT(first_cpu)) {
+    if (!mttcg_enabled && !tcg_kick_vcpu_timer && CPU_NEXT(first_cpu)) {
         tcg_kick_vcpu_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
                                            kick_tcg_thread, NULL);
         timer_mod(tcg_kick_vcpu_timer, qemu_tcg_next_kick());
@@ -1063,27 +1063,34 @@  static void qemu_tcg_destroy_vcpu(CPUState *cpu)
 
 static void qemu_wait_io_event_common(CPUState *cpu)
 {
+    atomic_mb_set(&cpu->thread_kicked, false);
     if (cpu->stop) {
         cpu->stop = false;
         cpu->stopped = true;
         qemu_cond_broadcast(&qemu_pause_cond);
     }
     process_queued_cpu_work(cpu);
-    cpu->thread_kicked = false;
+}
+
+static bool qemu_tcg_should_sleep(CPUState *cpu)
+{
+    if (mttcg_enabled) {
+        return cpu_thread_is_idle(cpu);
+    } else {
+        return all_cpu_threads_idle();
+    }
 }
 
 static void qemu_tcg_wait_io_event(CPUState *cpu)
 {
-    while (all_cpu_threads_idle()) {
+    while (qemu_tcg_should_sleep(cpu)) {
         stop_tcg_kick_timer();
         qemu_cond_wait(cpu->halt_cond, &qemu_global_mutex);
     }
 
     start_tcg_kick_timer();
 
-    CPU_FOREACH(cpu) {
-        qemu_wait_io_event_common(cpu);
-    }
+    qemu_wait_io_event_common(cpu);
 }
 
 static void qemu_kvm_wait_io_event(CPUState *cpu)
@@ -1154,6 +1161,7 @@  static void *qemu_dummy_cpu_thread_fn(void *arg)
     qemu_thread_get_self(cpu->thread);
     cpu->thread_id = qemu_get_thread_id();
     cpu->can_do_io = 1;
+    current_cpu = cpu;
 
     sigemptyset(&waitset);
     sigaddset(&waitset, SIG_IPI);
@@ -1162,9 +1170,7 @@  static void *qemu_dummy_cpu_thread_fn(void *arg)
     cpu->created = true;
     qemu_cond_signal(&qemu_cpu_cond);
 
-    current_cpu = cpu;
     while (1) {
-        current_cpu = NULL;
         qemu_mutex_unlock_iothread();
         do {
             int sig;
@@ -1175,7 +1181,6 @@  static void *qemu_dummy_cpu_thread_fn(void *arg)
             exit(1);
         }
         qemu_mutex_lock_iothread();
-        current_cpu = cpu;
         qemu_wait_io_event_common(cpu);
     }
 
@@ -1287,7 +1292,7 @@  static void deal_with_unplugged_cpus(void)
  * elsewhere.
  */
 
-static void *qemu_tcg_cpu_thread_fn(void *arg)
+static void *qemu_tcg_rr_cpu_thread_fn(void *arg)
 {
     CPUState *cpu = arg;
 
@@ -1309,6 +1314,7 @@  static void *qemu_tcg_cpu_thread_fn(void *arg)
 
         /* process any pending work */
         CPU_FOREACH(cpu) {
+            current_cpu = cpu;
             qemu_wait_io_event_common(cpu);
         }
     }
@@ -1331,6 +1337,7 @@  static void *qemu_tcg_cpu_thread_fn(void *arg)
         while (cpu && !cpu->queued_work_first && !cpu->exit_request) {
 
             atomic_mb_set(&tcg_current_rr_cpu, cpu);
+            current_cpu = cpu;
 
             qemu_clock_enable(QEMU_CLOCK_VIRTUAL,
                               (cpu->singlestep_enabled & SSTEP_NOTIMER) == 0);
@@ -1342,7 +1349,7 @@  static void *qemu_tcg_cpu_thread_fn(void *arg)
                     cpu_handle_guest_debug(cpu);
                     break;
                 }
-            } else if (cpu->stop || cpu->stopped) {
+            } else if (cpu->stop) {
                 if (cpu->unplug) {
                     cpu = CPU_NEXT(cpu);
                 }
@@ -1361,7 +1368,7 @@  static void *qemu_tcg_cpu_thread_fn(void *arg)
 
         handle_icount_deadline();
 
-        qemu_tcg_wait_io_event(QTAILQ_FIRST(&cpus));
+        qemu_tcg_wait_io_event(cpu ? cpu : QTAILQ_FIRST(&cpus));
         deal_with_unplugged_cpus();
     }
 
@@ -1408,6 +1415,64 @@  static void CALLBACK dummy_apc_func(ULONG_PTR unused)
 }
 #endif
 
+/* Multi-threaded TCG
+ *
+ * In the multi-threaded case each vCPU has its own thread. The TLS
+ * variable current_cpu can be used deep in the code to find the
+ * current CPUState for a given thread.
+ */
+
+static void *qemu_tcg_cpu_thread_fn(void *arg)
+{
+    CPUState *cpu = arg;
+
+    rcu_register_thread();
+
+    qemu_mutex_lock_iothread();
+    qemu_thread_get_self(cpu->thread);
+
+    cpu->thread_id = qemu_get_thread_id();
+    cpu->created = true;
+    cpu->can_do_io = 1;
+    current_cpu = cpu;
+    qemu_cond_signal(&qemu_cpu_cond);
+
+    /* process any pending work */
+    cpu->exit_request = 1;
+
+    while (1) {
+        if (cpu_can_run(cpu)) {
+            int r;
+            r = tcg_cpu_exec(cpu);
+            switch (r) {
+            case EXCP_DEBUG:
+                cpu_handle_guest_debug(cpu);
+                break;
+            case EXCP_HALTED:
+                /* during start-up the vCPU is reset and the thread is
+                 * kicked several times. If we don't ensure we go back
+                 * to sleep in the halted state we won't cleanly
+                 * start-up when the vCPU is enabled.
+                 *
+                 * cpu->halted should ensure we sleep in wait_io_event
+                 */
+                g_assert(cpu->halted);
+                break;
+            default:
+                /* Ignore everything else? */
+                break;
+            }
+        }
+
+        handle_icount_deadline();
+
+        atomic_mb_set(&cpu->exit_request, 0);
+        qemu_tcg_wait_io_event(cpu);
+    }
+
+    return NULL;
+}
+
 static void qemu_cpu_kick_thread(CPUState *cpu)
 {
 #ifndef _WIN32
@@ -1438,7 +1503,7 @@  void qemu_cpu_kick(CPUState *cpu)
     qemu_cond_broadcast(cpu->halt_cond);
     if (tcg_enabled()) {
         cpu_exit(cpu);
-        /* Also ensure current RR cpu is kicked */
+        /* NOP unless doing single-thread RR */
         qemu_cpu_kick_rr_cpu();
     } else {
         if (hax_enabled()) {
@@ -1514,13 +1579,6 @@  void pause_all_vcpus(void)
 
     if (qemu_in_vcpu_thread()) {
         cpu_stop_current();
-        if (!kvm_enabled()) {
-            CPU_FOREACH(cpu) {
-                cpu->stop = false;
-                cpu->stopped = true;
-            }
-            return;
-        }
     }
 
     while (!all_vcpus_paused()) {
@@ -1569,29 +1627,43 @@  void cpu_remove_sync(CPUState *cpu)
 static void qemu_tcg_init_vcpu(CPUState *cpu)
 {
     char thread_name[VCPU_THREAD_NAME_SIZE];
-    static QemuCond *tcg_halt_cond;
-    static QemuThread *tcg_cpu_thread;
+    static QemuCond *single_tcg_halt_cond;
+    static QemuThread *single_tcg_cpu_thread;
 
-    /* share a single thread for all cpus with TCG */
-    if (!tcg_cpu_thread) {
+    if (qemu_tcg_mttcg_enabled() || !single_tcg_cpu_thread) {
         cpu->thread = g_malloc0(sizeof(QemuThread));
         cpu->halt_cond = g_malloc0(sizeof(QemuCond));
         qemu_cond_init(cpu->halt_cond);
-        tcg_halt_cond = cpu->halt_cond;
-        snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/TCG",
+
+        if (qemu_tcg_mttcg_enabled()) {
+            /* create a thread per vCPU with TCG (MTTCG) */
+            parallel_cpus = true;
+            snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "CPU %d/TCG",
                  cpu->cpu_index);
-        qemu_thread_create(cpu->thread, thread_name, qemu_tcg_cpu_thread_fn,
-                           cpu, QEMU_THREAD_JOINABLE);
+
+            qemu_thread_create(cpu->thread, thread_name, qemu_tcg_cpu_thread_fn,
+                               cpu, QEMU_THREAD_JOINABLE);
+
+        } else {
+            /* share a single thread for all cpus with TCG */
+            snprintf(thread_name, VCPU_THREAD_NAME_SIZE, "ALL CPUs/TCG");
+            qemu_thread_create(cpu->thread, thread_name,
+                               qemu_tcg_rr_cpu_thread_fn,
+                               cpu, QEMU_THREAD_JOINABLE);
+
+            single_tcg_halt_cond = cpu->halt_cond;
+            single_tcg_cpu_thread = cpu->thread;
+        }
 #ifdef _WIN32
         cpu->hThread = qemu_thread_get_handle(cpu->thread);
 #endif
         while (!cpu->created) {
             qemu_cond_wait(&qemu_cpu_cond, &qemu_global_mutex);
         }
-        tcg_cpu_thread = cpu->thread;
     } else {
-        cpu->thread = tcg_cpu_thread;
-        cpu->halt_cond = tcg_halt_cond;
+        /* For non-MTTCG cases we share the thread */
+        cpu->thread = single_tcg_cpu_thread;
+        cpu->halt_cond = single_tcg_halt_cond;
     }
 }