[v1,2/3] user-exec: handle synchronous signals from QEMU gracefully

Message ID 20170320153441.2181-3-alex.bennee@linaro.org
State Superseded
Headers show
Series
  • MTTCG regression fixes for 2.9-rc1
Related show

Commit Message

Alex Bennée March 20, 2017, 3:34 p.m.
When "tcg: enable thread-per-vCPU" (commit 3725794) was merged the
lifetime of current_cpu was changed. Previously a broken linux-user
call might abort() which can eventually escalate into a SIGSEGV which
would then crash qemu as it attempted to deref a NULL current_cpu.
After commit 3725794 it would attempt to fixup state and re-start the
run-loop and much hilarity (i.e. a looping lockup) would ensue from
jumping into a stale jmp_env.

As we can actually tell if we are in the run-loop from looking at the
cpu->running flag we should catch this badness first and abort()
cleanly rather than try to soldier on. There is a theoretical race
between the flag being set and sigsetjmp refreshing the jump buffer
but we can try really hard to not introduce crashes into that code.

[LV: setgroups03 fails on powerpc LTP]
Reported-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
 user-exec.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

-- 
2.11.0

Comments

Richard Henderson March 20, 2017, 9:52 p.m. | #1
On 03/21/2017 01:34 AM, Alex Bennée wrote:
> When "tcg: enable thread-per-vCPU" (commit 3725794) was merged the

> lifetime of current_cpu was changed. Previously a broken linux-user

> call might abort() which can eventually escalate into a SIGSEGV which

> would then crash qemu as it attempted to deref a NULL current_cpu.

> After commit 3725794 it would attempt to fixup state and re-start the

> run-loop and much hilarity (i.e. a looping lockup) would ensue from

> jumping into a stale jmp_env.

>

> As we can actually tell if we are in the run-loop from looking at the

> cpu->running flag we should catch this badness first and abort()

> cleanly rather than try to soldier on. There is a theoretical race

> between the flag being set and sigsetjmp refreshing the jump buffer

> but we can try really hard to not introduce crashes into that code.

>

> [LV: setgroups03 fails on powerpc LTP]

> Reported-by: Laurent Vivier <laurent@vivier.eu>

> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

> ---

>  user-exec.c | 18 +++++++++++++++---

>  1 file changed, 15 insertions(+), 3 deletions(-)


Reviewed-by: Richard Henderson <rth@twiddle.net>



r~

Patch

diff --git a/user-exec.c b/user-exec.c
index 6db075884d..a8f95fa1e1 100644
--- a/user-exec.c
+++ b/user-exec.c
@@ -57,10 +57,23 @@  static void cpu_exit_tb_from_sighandler(CPUState *cpu, sigset_t *old_set)
 static inline int handle_cpu_signal(uintptr_t pc, unsigned long address,
                                     int is_write, sigset_t *old_set)
 {
-    CPUState *cpu;
+    CPUState *cpu = current_cpu;
     CPUClass *cc;
     int ret;
 
+    /* For synchronous signals we expect to be coming from the vCPU
+     * thread (so current_cpu should be valid) and either from running
+     * code or during translation which can fault as we cross pages.
+     *
+     * If neither is true then something has gone wrong and we should
+     * abort rather than try and restart the vCPU execution.
+     */
+    if (!cpu || !cpu->running) {
+        printf("qemu:%s received signal outside vCPU context @ pc=0x%"
+               PRIxPTR "\n",  __func__, pc);
+        abort();
+    }
+
 #if defined(DEBUG_SIGNAL)
     printf("qemu: SIGSEGV pc=0x%08lx address=%08lx w=%d oldset=0x%08lx\n",
            pc, address, is_write, *(unsigned long *)old_set);
@@ -83,7 +96,7 @@  static inline int handle_cpu_signal(uintptr_t pc, unsigned long address,
              * currently executing TB was modified and must be exited
              * immediately.
              */
-            cpu_exit_tb_from_sighandler(current_cpu, old_set);
+            cpu_exit_tb_from_sighandler(cpu, old_set);
             g_assert_not_reached();
         default:
             g_assert_not_reached();
@@ -94,7 +107,6 @@  static inline int handle_cpu_signal(uintptr_t pc, unsigned long address,
        are still valid segv ones */
     address = h2g_nocheck(address);
 
-    cpu = current_cpu;
     cc = CPU_GET_CLASS(cpu);
     /* see if it is an MMU fault */
     g_assert(cc->handle_mmu_fault);