Message ID | 20170302195337.31558-6-alex.bennee@linaro.org
---|---
State | New
Series | MTTCG fixups for 2.9
On 03/03/2017 06:53 AM, Alex Bennée wrote:
> The translation code uses cpu_ld*_code which can trigger a tlb_fill
> which if it fails will attempt a fault resolution. This never works
> during translation as the TB being generated hasn't been added yet.
> However with the new locking regime we end up double locking the
> tb_lock(). As the tcg_ctx.cpu is only set during translation we use
> this to short circuit the restore code and return with a fail.

We *should* have retaddr == 0 for this case, which indicates that we
should not attempt to restore state. Are you seeing a non-zero value?

Hmm.. Or rather we should not have called cpu_restore_state in the
first place. What is the call chain leading to this point?

Is this in fact linux-user, not softmmu, as you imply from tlb_fill?
Because handle_cpu_signal will in fact pass a genuine non-zero retaddr
for the SIGSEGV resulting from a cpu_ld*_code from a non-mapped
address.

So... for linux-user I think the qemu_log is unnecessary -- that's
just the way things are. For softmmu, I don't know what to think
except that we shouldn't have gotten here.

r~
Richard Henderson <rth@twiddle.net> writes:

> On 03/03/2017 06:53 AM, Alex Bennée wrote:
>> The translation code uses cpu_ld*_code which can trigger a tlb_fill
>> which if it fails will attempt a fault resolution. This never works
>> during translation as the TB being generated hasn't been added yet.
>> However with the new locking regime we end up double locking the
>> tb_lock(). As the tcg_ctx.cpu is only set during translation we use
>> this to short circuit the restore code and return with a fail.
>
> We *should* have retaddr == 0 for this case, which indicates that we
> should not attempt to restore state. Are you seeing a non-zero value?

Actually looking at xtensa I see:

  Attempt to resolve CPU state @ 0x0 while translating

So maybe I should check just that - but I don't see where we ensure we
always pass zero.

> Hmm.. Or rather we should not have called cpu_restore_state in the
> first place. What is the call chain leading to this point?

Thread 3 "qemu-system-xte" hit Breakpoint 2, cpu_restore_state (cpu=cpu@entry=0x555556032600, retaddr=retaddr@entry=0) at /home/alex/lsrc/qemu/qemu.git/translate-all.c:338
338         qemu_log_mask(LOG_UNIMP, "Attempt to resolve CPU state @ 0x%" PRIxPTR
#0  0x00005555555e3712 in cpu_restore_state (cpu=cpu@entry=0x555556032600, retaddr=retaddr@entry=0) at /home/alex/lsrc/qemu/qemu.git/translate-all.c:338
#1  0x000055555564cb38 in tlb_fill (cs=cs@entry=0x555556032600, vaddr=vaddr@entry=537034752, access_type=access_type@entry=MMU_INST_FETCH, mmu_idx=mmu_idx@entry=1, retaddr=retaddr@entry=0) at /home/alex/lsrc/qemu/qemu.git/target/xtensa/op_helper.c:73
#2  0x000055555562d604 in helper_ret_ldb_cmmu (env=env@entry=0x55555603a890, addr=537034752, oi=<optimised out>, retaddr=retaddr@entry=0) at /home/alex/lsrc/qemu/qemu.git/softmmu_template.h:127
#3  0x00005555556448df in gen_intermediate_code (retaddr=0, ptr=<optimised out>, env=0x55555603a890) at /home/alex/lsrc/qemu/qemu.git/include/exec/cpu_ldst_template.h:102
#4  0x00005555556448df in gen_intermediate_code (ptr=<optimised out>, env=0x55555603a890) at /home/alex/lsrc/qemu/qemu.git/include/exec/cpu_ldst_template.h:114
#5  0x00005555556448df in gen_intermediate_code (dc=0x7fffcca0f4d0, env=0x55555603a890) at /home/alex/lsrc/qemu/qemu.git/target/xtensa/translate.c:1052
#6  0x00005555556448df in gen_intermediate_code (env=0x55555603a890, tb=0x7fffccc7de00) at /home/alex/lsrc/qemu/qemu.git/target/xtensa/translate.c:3214
#7  0x00005555555e383b in tb_gen_code (cpu=cpu@entry=0x555556032600, pc=pc@entry=537034751, cs_base=cs_base@entry=0, flags=flags@entry=229393, cflags=<optimised out>, cflags@entry=0) at /home/alex/lsrc/qemu/qemu.git/translate-all.c:1288
#8  0x00005555555e5e41 in cpu_exec (tb_exit=0, last_tb=<optimised out>, cpu=0x38011) at /home/alex/lsrc/qemu/qemu.git/cpu-exec.c:370
#9  0x00005555555e5e41 in cpu_exec (cpu=cpu@entry=0x555556032600) at /home/alex/lsrc/qemu/qemu.git/cpu-exec.c:685
#10 0x0000555555611898 in tcg_cpu_exec (cpu=0x555556032600) at /home/alex/lsrc/qemu/qemu.git/cpus.c:1254
#11 0x0000555555611bd4 in qemu_tcg_rr_cpu_thread_fn (arg=<optimised out>) at /home/alex/lsrc/qemu/qemu.git/cpus.c:1350
#12 0x00007fffdf6606ba in start_thread (arg=0x7fffcca12700) at pthread_create.c:333
#13 0x00007fffdf39682d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Continuing.

Thread 3 "qemu-system-xte" hit Breakpoint 2, cpu_restore_state (cpu=cpu@entry=0x555556032600, retaddr=retaddr@entry=0) at /home/alex/lsrc/qemu/qemu.git/translate-all.c:338
338         qemu_log_mask(LOG_UNIMP, "Attempt to resolve CPU state @ 0x%" PRIxPTR
#0  0x00005555555e3712 in cpu_restore_state (cpu=cpu@entry=0x555556032600, retaddr=retaddr@entry=0) at /home/alex/lsrc/qemu/qemu.git/translate-all.c:338
#1  0x000055555564cb38 in tlb_fill (cs=cs@entry=0x555556032600, vaddr=vaddr@entry=4308992, access_type=access_type@entry=MMU_INST_FETCH, mmu_idx=mmu_idx@entry=1, retaddr=retaddr@entry=0) at /home/alex/lsrc/qemu/qemu.git/target/xtensa/op_helper.c:73
#2  0x000055555562d604 in helper_ret_ldb_cmmu (env=env@entry=0x55555603a890, addr=4308992, oi=<optimised out>, retaddr=retaddr@entry=0) at /home/alex/lsrc/qemu/qemu.git/softmmu_template.h:127
#3  0x00005555556448df in gen_intermediate_code (retaddr=0, ptr=<optimised out>, env=0x55555603a890) at /home/alex/lsrc/qemu/qemu.git/include/exec/cpu_ldst_template.h:102
#4  0x00005555556448df in gen_intermediate_code (ptr=<optimised out>, env=0x55555603a890) at /home/alex/lsrc/qemu/qemu.git/include/exec/cpu_ldst_template.h:114
#5  0x00005555556448df in gen_intermediate_code (dc=0x7fffcca0f4d0, env=0x55555603a890) at /home/alex/lsrc/qemu/qemu.git/target/xtensa/translate.c:1052
#6  0x00005555556448df in gen_intermediate_code (env=0x55555603a890, tb=0x7fffccd0b1b0) at /home/alex/lsrc/qemu/qemu.git/target/xtensa/translate.c:3214
#7  0x00005555555e383b in tb_gen_code (cpu=cpu@entry=0x555556032600, pc=pc@entry=4308991, cs_base=cs_base@entry=0, flags=flags@entry=229393, cflags=<optimised out>, cflags@entry=0) at /home/alex/lsrc/qemu/qemu.git/translate-all.c:1288
#8  0x00005555555e5e41 in cpu_exec (tb_exit=0, last_tb=<optimised out>, cpu=0x38011) at /home/alex/lsrc/qemu/qemu.git/cpu-exec.c:370
#9  0x00005555555e5e41 in cpu_exec (cpu=cpu@entry=0x555556032600) at /home/alex/lsrc/qemu/qemu.git/cpu-exec.c:685
#10 0x0000555555611898 in tcg_cpu_exec (cpu=0x555556032600) at /home/alex/lsrc/qemu/qemu.git/cpus.c:1254
#11 0x0000555555611bd4 in qemu_tcg_rr_cpu_thread_fn (arg=<optimised out>) at /home/alex/lsrc/qemu/qemu.git/cpus.c:1350
#12 0x00007fffdf6606ba in start_thread (arg=0x7fffcca12700) at pthread_create.c:333
#13 0x00007fffdf39682d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

> Is this in fact linux-user, not softmmu, as you imply from tlb_fill?
> Because handle_cpu_signal will in fact pass a genuine non-zero retaddr
> for the SIGSEGV resulting from a cpu_ld*_code from a non-mapped
> address.

I think that is another call chain that might trip us up. Peter
mentioned he'd hit it. This one is definitely softmmu.

> So... for linux-user I think the qemu_log is unnecessary -- that's
> just the way things are. For softmmu, I don't know what to think
> except that we shouldn't have gotten here.

I kinda agree but all SoftMMU targets have this potential path. Having
said that, I haven't seen it hit ARM, maybe because we take care not to
cross a page boundary?

I agree it would be better to handle fetching code bytes without this
potential for breakage but that would be a much bigger change and need
more testing this close to rc0.

>
> r~

--
Alex Bennée
On 03/03/2017 09:03 PM, Alex Bennée wrote:
>> We *should* have retaddr == 0 for this case, which indicates that we
>> should not attempt to restore state. Are you seeing a non-zero value?
>
> Actually looking at xtensa I see:
>
>   Attempt to resolve CPU state @ 0x0 while translating
>
> So maybe I should check just that - but I don't see where we ensure we
> always pass zero.

cpu_ld*_cmmu, in include/exec/cpu_ldst_template.h, has

    return glue(glue(glue(cpu_ld, USUFFIX), MEMSUFFIX), _ra)(env, ptr, 0);

so there's your zero.

> #0  0x00005555555e3712 in cpu_restore_state (cpu=cpu@entry=0x555556032600, retaddr=retaddr@entry=0) at /home/alex/lsrc/qemu/qemu.git/translate-all.c:338
> #1  0x000055555564cb38 in tlb_fill (cs=cs@entry=0x555556032600, vaddr=vaddr@entry=537034752, access_type=access_type@entry=MMU_INST_FETCH, mmu_idx=mmu_idx@entry=1, retaddr=retaddr@entry=0) at /home/alex/lsrc/qemu/qemu.git/target/xtensa/op_helper.c:73
> #2  0x000055555562d604 in helper_ret_ldb_cmmu (env=env@entry=0x55555603a890, addr=537034752, oi=<optimised out>, retaddr=retaddr@entry=0) at /home/alex/lsrc/qemu/qemu.git/softmmu_template.h:127

Confirmed. This is a simple bug in xtensa -- failure to check
retaddr == 0 before calling cpu_restore_state.

That said, I do wonder if it wouldn't be better to move that check
inside cpu_restore_state instead. Put the check there now, but leave
the follow-on cleanup for the next devel cycle. It would also save
auditing the other usages of cpu_restore_state in the tree.

>> Is this in fact linux-user, not softmmu, as you imply from tlb_fill?
>> Because handle_cpu_signal will in fact pass a genuine non-zero retaddr
>> for the SIGSEGV resulting from a cpu_ld*_code from a non-mapped
>> address.
>
> I think that is another call chain that might trip us up. Peter
> mentioned he'd hit it. This one is definitely softmmu.

This is really easy to reproduce on any guest with a call to a null
function pointer.

r~
diff --git a/translate-all.c b/translate-all.c
index 7ee273410d..956d54b882 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -333,6 +333,13 @@ bool cpu_restore_state(CPUState *cpu, uintptr_t retaddr)
     TranslationBlock *tb;
     bool r = false;
 
+    /* Don't attempt to restore state if we are translating already */
+    if (tcg_ctx.cpu == cpu) {
+        qemu_log_mask(LOG_UNIMP, "Attempt to resolve CPU state @ 0x%" PRIxPTR
+                      " while translating\n", retaddr);
+        return r;
+    }
+
     tb_lock();
     tb = tb_find_pc(retaddr);
     if (tb) {
The translation code uses cpu_ld*_code which can trigger a tlb_fill
which if it fails will attempt a fault resolution. This never works
during translation as the TB being generated hasn't been added yet.
However with the new locking regime we end up double locking the
tb_lock(). As the tcg_ctx.cpu is only set during translation we use
this to short circuit the restore code and return with a fail.

Most front-ends seem to ignore the pass/fail result anyway but
tolerate not having the cpu environment updated. This is arguably ugly
but will do for now.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 translate-all.c | 7 +++++++
 1 file changed, 7 insertions(+)

--
2.11.0