Message ID | 20240910212351.977753-1-richard.henderson@linaro.org |
---|---|
Series | tcg: Fix branch/label link during plugin expansion |

On 9/10/24 14:23, Richard Henderson wrote:
> With tcg_last_op(), we always get the last op of the stream.
> With TCGContext.emit_before_op, the most recently emitted op
> is no longer the last op.
>
> Instead, pass the op being emitted back from the allocator so
> that we can link it to the label without needing to look it up.

Oh, I meant to point out from whence this comes.
The plugin uses a conditional

  ld_i32 tmp18,env,$0xffffffffffffdb10
  mul_i32 tmp18,tmp18,$0x18
  ext_i32_i64 tmp17,tmp18
  add_i64 tmp17,tmp17,$0x575410edadc8
  ld_i64 tmp21,tmp17,$0x0
  brcond_i64 tmp21,$0x0,ltu,$L1
  ld_i32 tmp18,env,$0xffffffffffffdb10
  call plugin(0x79a2abfde66a),$0x1,$0,tmp18,$0x0
  set_label $L1

Note that the branch is X < 0 (unsigned), which is always false, and
thus the branch is optimized away.

r~
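
The problem described in the commit message can be sketched with a toy op list. This is only an illustration under stated assumptions: ToyOp, ToyStream, toy_emit() and toy_last_op() are hypothetical stand-ins invented here, not QEMU's TCGOp, TCGContext or tcg_last_op(). The sketch shows that once ops are inserted before an existing op, "last op of the stream" and "op just emitted" stop being the same thing, so a branch must be linked to its label through the op that the emitter hands back.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins (assumptions); NOT the real QEMU data structures. */
typedef struct ToyOp {
    int id;
    struct ToyOp *prev, *next;
} ToyOp;

typedef struct {
    ToyOp *head, *tail;
    ToyOp *emit_before;   /* when non-NULL, new ops are inserted before it */
} ToyStream;

/* Link op into the stream and return it, so the caller can use it directly. */
static ToyOp *toy_emit(ToyStream *s, ToyOp *op)
{
    ToyOp *at = s->emit_before;

    assert(op != NULL);
    if (at) {
        /* Insert in front of 'at'; the tail of the stream is unchanged. */
        op->next = at;
        op->prev = at->prev;
        if (at->prev) {
            at->prev->next = op;
        } else {
            s->head = op;
        }
        at->prev = op;
    } else {
        /* Plain append; the new op becomes the tail. */
        op->prev = s->tail;
        op->next = NULL;
        if (s->tail) {
            s->tail->next = op;
        } else {
            s->head = op;
        }
        s->tail = op;
    }
    return op;
}

/* "Last op of the stream", regardless of where emission is happening. */
static ToyOp *toy_last_op(const ToyStream *s)
{
    return s->tail;
}

/* Returns 1 when looking up the last op would pick the wrong op. */
static int toy_last_op_is_stale(void)
{
    ToyOp insn = { .id = 1 }, brcond = { .id = 2 };
    ToyStream s = { 0 };
    ToyOp *emitted;

    toy_emit(&s, &insn);               /* normal append */
    s.emit_before = &insn;             /* plugin expansion inserts earlier */
    emitted = toy_emit(&s, &brcond);   /* the branch we want to link */

    return emitted == &brcond && toy_last_op(&s) != emitted;
}
```

In this toy model, linking the label via toy_last_op() would attach it to insn rather than to brcond; using the pointer returned by toy_emit() avoids the lookup entirely.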

Richard Henderson <richard.henderson@linaro.org> writes:

> On 9/10/24 14:23, Richard Henderson wrote:
>> With tcg_last_op(), we always get the last op of the stream.
>> With TCGContext.emit_before_op, the most recently emitted op
>> is no longer the last op.
>> Instead, pass the op being emitted back from the allocator so
>> that we can link it to the label without needing to look it up.
>
> Oh, I meant to point out from whence this comes.
> The plugin uses a conditional

    size_t n_insns = qemu_plugin_tb_n_insns(tb);
    qemu_plugin_u64 quantum_insn =
        qemu_plugin_scoreboard_u64_in_struct(vcpus, vCPUTime, quantum_insn);
    /* count (and eventually trap) once per tb */
    qemu_plugin_register_vcpu_tb_exec_inline_per_vcpu(
        tb, QEMU_PLUGIN_INLINE_ADD_U64, quantum_insn, n_insns);

>   ld_i32 tmp18,env,$0xffffffffffffdb10
>   mul_i32 tmp18,tmp18,$0x18
>   ext_i32_i64 tmp17,tmp18
>   add_i64 tmp17,tmp17,$0x575410edadc8

    qemu_plugin_register_vcpu_tb_exec_cond_cb(
        tb, every_quantum_insn,
        QEMU_PLUGIN_CB_NO_REGS, QEMU_PLUGIN_COND_GE,
        quantum_insn, max_insn_per_quantum, NULL);

?

>   ld_i64 tmp21,tmp17,$0x0
>   brcond_i64 tmp21,$0x0,ltu,$L1
>   ld_i32 tmp18,env,$0xffffffffffffdb10
>   call plugin(0x79a2abfde66a),$0x1,$0,tmp18,$0x0
>   set_label $L1
>
> Note that the branch is X < 0 (unsigned), which is always false, and
> thus the branch is optimized away.

I'm obviously missing something reading this. How can TCG know the state
of the scoreboard variables and optimise away the branch?

>
>
> r~

On 9/13/24 03:23, Alex Bennée wrote:
>> Note that the branch is X < 0 (unsigned), which is always false, and
>> thus the branch is optimized away.
>
> I'm obviously missing something reading this. How can TCG know the state
> of the scoreboard variables and optimise away the branch?

0 < 0 is of course false.

r~

On 9/13/24 03:23, Alex Bennée wrote:
> Richard Henderson <richard.henderson@linaro.org> writes:
>
>> On 9/10/24 14:23, Richard Henderson wrote:
>>> With tcg_last_op(), we always get the last op of the stream.
>>> With TCGContext.emit_before_op, the most recently emitted op
>>> is no longer the last op.
>>> Instead, pass the op being emitted back from the allocator so
>>> that we can link it to the label without needing to look it up.
>>
>> Oh, I meant to point out from whence this comes.
>> The plugin uses a conditional
>
>     size_t n_insns = qemu_plugin_tb_n_insns(tb);
>     qemu_plugin_u64 quantum_insn =
>         qemu_plugin_scoreboard_u64_in_struct(vcpus, vCPUTime, quantum_insn);
>     /* count (and eventually trap) once per tb */
>     qemu_plugin_register_vcpu_tb_exec_inline_per_vcpu(
>         tb, QEMU_PLUGIN_INLINE_ADD_U64, quantum_insn, n_insns);
>
>>   ld_i32 tmp18,env,$0xffffffffffffdb10
>>   mul_i32 tmp18,tmp18,$0x18
>>   ext_i32_i64 tmp17,tmp18
>>   add_i64 tmp17,tmp17,$0x575410edadc8
>
>     qemu_plugin_register_vcpu_tb_exec_cond_cb(
>         tb, every_quantum_insn,
>         QEMU_PLUGIN_CB_NO_REGS, QEMU_PLUGIN_COND_GE,
>         quantum_insn, max_insn_per_quantum, NULL);
>
> ?
>
>>   ld_i64 tmp21,tmp17,$0x0
>>   brcond_i64 tmp21,$0x0,ltu,$L1
>>   ld_i32 tmp18,env,$0xffffffffffffdb10
>>   call plugin(0x79a2abfde66a),$0x1,$0,tmp18,$0x0
>>   set_label $L1
>>
>> Note that the branch is X < 0 (unsigned), which is always false, and
>> thus the branch is optimized away.
>
> I'm obviously missing something reading this. How can TCG know the state
> of the scoreboard variables and optimise away the branch?
>

The constant against which we compare scoreboard entry value is known at
translation time.

>>
>>
>> r~
>
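
To make the point about the translation-time constant concrete, here is a minimal sketch of how a branch can be decided from the immediate alone, without knowing the runtime value loaded from the scoreboard. Everything in it (ToyCond, ToyFold, toy_fold_brcond) is a hypothetical name made up for illustration; it is not the TCG optimizer's code, only the arithmetic fact that an unsigned value is never below 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical condition codes and fold results (assumptions, not TCG's). */
typedef enum { TOY_COND_LTU, TOY_COND_GEU } ToyCond;
typedef enum { TOY_BR_UNKNOWN, TOY_BR_NEVER, TOY_BR_ALWAYS } ToyFold;

/*
 * Decide a branch "x <cond> imm" when only imm is known, i.e. x is a
 * runtime value such as a scoreboard entry.
 */
static ToyFold toy_fold_brcond(ToyCond cond, uint64_t imm)
{
    assert(cond == TOY_COND_LTU || cond == TOY_COND_GEU);

    if (imm == 0) {
        switch (cond) {
        case TOY_COND_LTU:
            return TOY_BR_NEVER;    /* x < 0 (unsigned) is false for all x */
        case TOY_COND_GEU:
            return TOY_BR_ALWAYS;   /* x >= 0 (unsigned) is true for all x */
        }
    }
    /* Any other immediate depends on the runtime value of x. */
    return TOY_BR_UNKNOWN;
}
```

With the operands from the dump above (ltu against $0x0), this sketch reports that the branch to $L1 is never taken, whatever the scoreboard holds.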