[0/2] tcg: Fix branch/label link during plugin expansion

Message ID: 20240910212351.977753-1-richard.henderson@linaro.org
From: Richard Henderson <richard.henderson@linaro.org>
To: qemu-devel@nongnu.org
Cc: alex.bennee@linaro.org, pierrick.bouvier@linaro.org
Subject: [PATCH 0/2] tcg: Fix branch/label link during plugin expansion
Date: Tue, 10 Sep 2024 14:23:49 -0700
Series: tcg: Fix branch/label link during plugin expansion
  [0/2] tcg: Fix branch/label link during plugin expansion
  [1/2] tcg: Return TCGOp from tcg_gen_op[1-6]
  [2/2] tcg: Propagate new TCGOp to add_as_label_use

Message

Richard Henderson Sept. 10, 2024, 9:23 p.m. UTC

With tcg_last_op(), we always get the last op of the stream.
With TCGContext.emit_before_op, the most recently emitted op
is no longer the last op.

Instead, pass the op being emitted back from the allocator so
that we can link it to the label without needing to look it up.
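
To illustrate the problem, here is a minimal sketch using a toy doubly
linked op list. ToyOp, ToyCtx, toy_emit() and toy_last_op() are
hypothetical stand-ins invented for this example; they are not QEMU's
TCGOp, TCGContext or tcg_last_op(), and the real allocator differs in
detail. The point is only that once ops can be inserted before an
existing op, the tail of the list is no longer the op just emitted,
while the pointer returned by the emitter always is.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins for illustration only; not QEMU's real structures. */
typedef struct ToyOp {
    struct ToyOp *prev, *next;
    int id;
} ToyOp;

typedef struct ToyCtx {
    ToyOp *head, *tail;
    ToyOp *emit_before;   /* when non-NULL, new ops go before this op */
    int next_id;
} ToyCtx;

/* Allocate an op, link it at the current emit position, and return it. */
static ToyOp *toy_emit(ToyCtx *ctx)
{
    ToyOp *op = calloc(1, sizeof(*op));
    assert(op != NULL);
    op->id = ctx->next_id++;

    ToyOp *at = ctx->emit_before;
    if (at) {
        /* Insert in the middle of the stream, before 'at'. */
        op->next = at;
        op->prev = at->prev;
        if (at->prev) {
            at->prev->next = op;
        } else {
            ctx->head = op;
        }
        at->prev = op;
    } else {
        /* Append at the end of the stream. */
        op->prev = ctx->tail;
        if (ctx->tail) {
            ctx->tail->next = op;
        } else {
            ctx->head = op;
        }
        ctx->tail = op;
    }
    return op;
}

/* Look up the op at the end of the stream. */
static ToyOp *toy_last_op(const ToyCtx *ctx)
{
    return ctx->tail;
}

static void toy_free(ToyCtx *ctx)
{
    for (ToyOp *op = ctx->head; op != NULL; ) {
        ToyOp *next = op->next;
        free(op);
        op = next;
    }
    ctx->head = ctx->tail = NULL;
}

/* Does the tail lookup find the op that was just emitted? */
static bool toy_last_op_is_new_op(bool use_emit_before)
{
    ToyCtx ctx = {0};
    toy_emit(&ctx);                    /* op 0 */
    ToyOp *anchor = toy_emit(&ctx);    /* op 1 */
    if (use_emit_before) {
        ctx.emit_before = anchor;
    }
    ToyOp *branch = toy_emit(&ctx);    /* op 2, standing in for a branch */
    bool same = (toy_last_op(&ctx) == branch);
    toy_free(&ctx);
    return same;
}
```

In this toy model, toy_last_op_is_new_op(false) holds because the new
op was appended, while toy_last_op_is_new_op(true) does not: the lookup
returns the op the new one was inserted before. Using the pointer
returned by toy_emit() avoids the lookup entirely.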


r~


Richard Henderson (2):
  tcg: Return TCGOp from tcg_gen_op[1-6]
  tcg: Propagate new TCGOp to add_as_label_use

 tcg/tcg-internal.h | 12 +++----
 tcg/tcg-op.c       | 86 +++++++++++++++++++++++++---------------------
 2 files changed, 53 insertions(+), 45 deletions(-)

Comments

Richard Henderson Sept. 10, 2024, 9:28 p.m. UTC | #1

On 9/10/24 14:23, Richard Henderson wrote:
> With tcg_last_op(), we always get the last op of the stream.
> With TCGContext.emit_before_op, the most recently emitted op
> is no longer the last op.
> 
> Instead, pass the op being emitted back from the allocator so
> that we can link it to the label without needing to look it up.

Oh, I meant to point out from whence this comes.
The plugin uses a conditional callback, which expands to:

  ld_i32 tmp18,env,$0xffffffffffffdb10
  mul_i32 tmp18,tmp18,$0x18
  ext_i32_i64 tmp17,tmp18
  add_i64 tmp17,tmp17,$0x575410edadc8
  ld_i64 tmp21,tmp17,$0x0
  brcond_i64 tmp21,$0x0,ltu,$L1
  ld_i32 tmp18,env,$0xffffffffffffdb10
  call plugin(0x79a2abfde66a),$0x1,$0,tmp18,$0x0
  set_label $L1

Note that the branch is X < 0 (unsigned), which is always false, and
thus the branch is optimized away.


r~

Alex Bennée Sept. 13, 2024, 10:23 a.m. UTC | #2

Richard Henderson <richard.henderson@linaro.org> writes:

> On 9/10/24 14:23, Richard Henderson wrote:
>> With tcg_last_op(), we always get the last op of the stream.
>> With TCGContext.emit_before_op, the most recently emitted op
>> is no longer the last op.
>> Instead, pass the op being emitted back from the allocator so
>> that we can link it to the label without needing to look it up.
>
> Oh, I meant to point out from whence this comes.
> The plugin uses a conditional

    size_t n_insns = qemu_plugin_tb_n_insns(tb);
    qemu_plugin_u64 quantum_insn =
        qemu_plugin_scoreboard_u64_in_struct(vcpus, vCPUTime, quantum_insn);
    /* count (and eventually trap) once per tb */
    qemu_plugin_register_vcpu_tb_exec_inline_per_vcpu(
        tb, QEMU_PLUGIN_INLINE_ADD_U64, quantum_insn, n_insns);

>  ld_i32 tmp18,env,$0xffffffffffffdb10
>  mul_i32 tmp18,tmp18,$0x18
>  ext_i32_i64 tmp17,tmp18
>  add_i64 tmp17,tmp17,$0x575410edadc8

    qemu_plugin_register_vcpu_tb_exec_cond_cb(
        tb, every_quantum_insn,
        QEMU_PLUGIN_CB_NO_REGS, QEMU_PLUGIN_COND_GE,
        quantum_insn, max_insn_per_quantum, NULL);

?

>  ld_i64 tmp21,tmp17,$0x0
>  brcond_i64 tmp21,$0x0,ltu,$L1
>  ld_i32 tmp18,env,$0xffffffffffffdb10
>  call plugin(0x79a2abfde66a),$0x1,$0,tmp18,$0x0
>  set_label $L1
>
> Note that the branch is X < 0 (unsigned), which is always false, and
> thus the branch is optimized away.

I'm obviously missing something reading this. How can TCG know the state
of the scoreboard variables and optimise away the branch?

>
>
> r~

Richard Henderson Sept. 13, 2024, 4:27 p.m. UTC | #3

On 9/13/24 03:23, Alex Bennée wrote:
>> Note that the branch is X < 0 (unsigned), which is always false, and
>> thus the branch is optimized away.
> 
> I'm obviously missing something reading this. How can TCG know the state
> of the scoreboard variables and optimise away the branch?

0 < 0 is of course false.

r~

Pierrick Bouvier Sept. 18, 2024, 6:43 p.m. UTC | #4

On 9/13/24 03:23, Alex Bennée wrote:
> Richard Henderson <richard.henderson@linaro.org> writes:
> 
>> On 9/10/24 14:23, Richard Henderson wrote:
>>> With tcg_last_op(), we always get the last op of the stream.
>>> With TCGContext.emit_before_op, the most recently emitted op
>>> is no longer the last op.
>>> Instead, pass the op being emitted back from the allocator so
>>> that we can link it to the label without needing to look it up.
>>
>> Oh, I meant to point out from whence this comes.
>> The plugin uses a conditional
> 
>      size_t n_insns = qemu_plugin_tb_n_insns(tb);
>      qemu_plugin_u64 quantum_insn =
>          qemu_plugin_scoreboard_u64_in_struct(vcpus, vCPUTime, quantum_insn);
>      /* count (and eventually trap) once per tb */
>      qemu_plugin_register_vcpu_tb_exec_inline_per_vcpu(
>          tb, QEMU_PLUGIN_INLINE_ADD_U64, quantum_insn, n_insns);
> 
>>   ld_i32 tmp18,env,$0xffffffffffffdb10
>>   mul_i32 tmp18,tmp18,$0x18
>>   ext_i32_i64 tmp17,tmp18
>>   add_i64 tmp17,tmp17,$0x575410edadc8
> 
>      qemu_plugin_register_vcpu_tb_exec_cond_cb(
>          tb, every_quantum_insn,
>          QEMU_PLUGIN_CB_NO_REGS, QEMU_PLUGIN_COND_GE,
>          quantum_insn, max_insn_per_quantum, NULL);
> 
> ?
> 
>>   ld_i64 tmp21,tmp17,$0x0
>>   brcond_i64 tmp21,$0x0,ltu,$L1
>>   ld_i32 tmp18,env,$0xffffffffffffdb10
>>   call plugin(0x79a2abfde66a),$0x1,$0,tmp18,$0x0
>>   set_label $L1
>>
>> Note that the branch is X < 0 (unsigned), which is always false, and
>> thus the branch is optimized away.
> 
> I'm obviously missing something reading this. How can TCG know the state
> of the scoreboard variables and optimise away the branch?
> 

The constant against which we compare the scoreboard entry value is
known at translation time.

>>
>>
>> r~
>