
[v4,000/163] tcg: Convert to TCGOutOp structures

Message ID 20250415192515.232910-1-richard.henderson@linaro.org

Message

Richard Henderson April 15, 2025, 7:22 p.m. UTC
v2: 20250107080112.1175095-1-richard.henderson@linaro.org
v3: 20250216231012.2808572-1-richard.henderson@linaro.org

Since it has been 2 months, I don't recall specific changes from v3 to v4.
It's mostly application of r-b tags.  There is one more patch, which I
believe came from Phil asking for one patch to be split.

Patches still requiring review: 29, 41-43, 46, 47, 49-51, 55, 57, 59-62,
  64, 66-68, 70, 72-78, 80, 82-87, 89, 91, 93, 95, 97-102, 104, 106-162.


r~
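Judging from the patch titles (all_outop[], TCGOutOpBinary, "Remove tcg_out_op"), the series replaces the monolithic tcg_out_op() switch with per-opcode structures of emitter callbacks that the generic code dispatches through. A minimal, purely illustrative C sketch of that shape; every name below is hypothetical, not the actual QEMU definition:

```c
#include <assert.h>
#include <stddef.h>

typedef enum { OP_ADD, OP_SUB, OP_NB } OpIndex;
typedef int TCGReg;

/* Each opcode supplies its own emitter; the generic code indexes a
 * table instead of switching on the opcode. */
typedef struct TCGOutOpBinary {
    void (*out_rrr)(TCGReg a0, TCGReg a1, TCGReg a2);
} TCGOutOpBinary;

static int last_emitted = -1;

static void out_add(TCGReg a0, TCGReg a1, TCGReg a2)
{
    (void)a0; (void)a1; (void)a2;
    last_emitted = OP_ADD;          /* stand-in for emitting an add insn */
}

static void out_sub(TCGReg a0, TCGReg a1, TCGReg a2)
{
    (void)a0; (void)a1; (void)a2;
    last_emitted = OP_SUB;          /* stand-in for emitting a sub insn */
}

/* The per-backend table, analogous in spirit to all_outop[]. */
static const TCGOutOpBinary all_outop[OP_NB] = {
    [OP_ADD] = { .out_rrr = out_add },
    [OP_SUB] = { .out_rrr = out_sub },
};

/* Generic dispatch: no per-opcode switch statement required. */
static void tcg_out_binary(OpIndex op, TCGReg a0, TCGReg a1, TCGReg a2)
{
    all_outop[op].out_rrr(a0, a1, a2);
}
```

The structure also gives the table a single place to express which register/constant forms an opcode accepts, which is where the per-opcode constraint sets in the series come in.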


Richard Henderson (163):
  tcg: Add all_outop[]
  tcg: Use extract2 for cross-word 64-bit extract on 32-bit host
  tcg: Remove INDEX_op_ext{8,16,32}*
  tcg: Merge INDEX_op_mov_{i32,i64}
  tcg: Convert add to TCGOutOpBinary
  tcg: Merge INDEX_op_add_{i32,i64}
  tcg: Convert and to TCGOutOpBinary
  tcg: Merge INDEX_op_and_{i32,i64}
  tcg/optimize: Fold andc with immediate to and
  tcg/optimize: Emit add r,r,-1 in fold_setcond_tst_pow2
  tcg: Convert andc to TCGOutOpBinary
  tcg: Merge INDEX_op_andc_{i32,i64}
  tcg: Convert or to TCGOutOpBinary
  tcg: Merge INDEX_op_or_{i32,i64}
  tcg/optimize: Fold orc with immediate to or
  tcg: Convert orc to TCGOutOpBinary
  tcg: Merge INDEX_op_orc_{i32,i64}
  tcg: Convert xor to TCGOutOpBinary
  tcg: Merge INDEX_op_xor_{i32,i64}
  tcg/optimize: Fold eqv with immediate to xor
  tcg: Convert eqv to TCGOutOpBinary
  tcg: Merge INDEX_op_eqv_{i32,i64}
  tcg: Convert nand to TCGOutOpBinary
  tcg: Merge INDEX_op_nand_{i32,i64}
  tcg/loongarch64: Do not accept constant argument to nor
  tcg: Convert nor to TCGOutOpBinary
  tcg: Merge INDEX_op_nor_{i32,i64}
  tcg/arm: Fix constraints for sub
  tcg: Convert sub to TCGOutOpSubtract
  tcg: Merge INDEX_op_sub_{i32,i64}
  tcg: Convert neg to TCGOutOpUnary
  tcg: Merge INDEX_op_neg_{i32,i64}
  tcg: Convert not to TCGOutOpUnary
  tcg: Merge INDEX_op_not_{i32,i64}
  tcg: Convert mul to TCGOutOpBinary
  tcg: Merge INDEX_op_mul_{i32,i64}
  tcg: Convert muluh to TCGOutOpBinary
  tcg: Merge INDEX_op_muluh_{i32,i64}
  tcg: Convert mulsh to TCGOutOpBinary
  tcg: Merge INDEX_op_mulsh_{i32,i64}
  tcg: Convert div to TCGOutOpBinary
  tcg: Merge INDEX_op_div_{i32,i64}
  tcg: Convert divu to TCGOutOpBinary
  tcg: Merge INDEX_op_divu_{i32,i64}
  tcg: Convert div2 to TCGOutOpDivRem
  tcg: Merge INDEX_op_div2_{i32,i64}
  tcg: Convert divu2 to TCGOutOpDivRem
  tcg: Merge INDEX_op_divu2_{i32,i64}
  tcg: Convert rem to TCGOutOpBinary
  tcg: Merge INDEX_op_rem_{i32,i64}
  tcg: Convert remu to TCGOutOpBinary
  tcg: Merge INDEX_op_remu_{i32,i64}
  tcg: Convert shl to TCGOutOpBinary
  tcg: Merge INDEX_op_shl_{i32,i64}
  tcg: Convert shr to TCGOutOpBinary
  tcg: Merge INDEX_op_shr_{i32,i64}
  tcg: Convert sar to TCGOutOpBinary
  tcg: Merge INDEX_op_sar_{i32,i64}
  tcg: Do not require both rotr and rotl from the backend
  tcg: Convert rotl, rotr to TCGOutOpBinary
  tcg: Merge INDEX_op_rot{l,r}_{i32,i64}
  tcg: Convert clz to TCGOutOpBinary
  tcg: Merge INDEX_op_clz_{i32,i64}
  tcg: Convert ctz to TCGOutOpBinary
  tcg: Merge INDEX_op_ctz_{i32,i64}
  tcg: Convert ctpop to TCGOutOpUnary
  tcg: Merge INDEX_op_ctpop_{i32,i64}
  tcg: Convert muls2 to TCGOutOpMul2
  tcg: Merge INDEX_op_muls2_{i32,i64}
  tcg: Convert mulu2 to TCGOutOpMul2
  tcg: Merge INDEX_op_mulu2_{i32,i64}
  tcg/loongarch64: Support negsetcond
  tcg/mips: Support negsetcond
  tcg/tci: Support negsetcond
  tcg: Remove TCG_TARGET_HAS_negsetcond_{i32,i64}
  tcg: Convert setcond, negsetcond to TCGOutOpSetcond
  tcg: Merge INDEX_op_{neg}setcond_{i32,i64}
  tcg: Convert brcond to TCGOutOpBrcond
  tcg: Merge INDEX_op_brcond_{i32,i64}
  tcg: Convert movcond to TCGOutOpMovcond
  tcg: Merge INDEX_op_movcond_{i32,i64}
  tcg/ppc: Drop fallback constant loading in tcg_out_cmp
  tcg/arm: Expand arguments to tcg_out_cmp2
  tcg/ppc: Expand arguments to tcg_out_cmp2
  tcg: Convert brcond2_i32 to TCGOutOpBrcond2
  tcg: Convert setcond2_i32 to TCGOutOpSetcond2
  tcg: Convert bswap16 to TCGOutOpBswap
  tcg: Merge INDEX_op_bswap16_{i32,i64}
  tcg: Convert bswap32 to TCGOutOpBswap
  tcg: Merge INDEX_op_bswap32_{i32,i64}
  tcg: Convert bswap64 to TCGOutOpUnary
  tcg: Rename INDEX_op_bswap64_i64 to INDEX_op_bswap64
  tcg: Convert extract to TCGOutOpExtract
  tcg: Merge INDEX_op_extract_{i32,i64}
  tcg: Convert sextract to TCGOutOpExtract
  tcg: Merge INDEX_op_sextract_{i32,i64}
  tcg: Convert ext_i32_i64 to TCGOutOpUnary
  tcg: Convert extu_i32_i64 to TCGOutOpUnary
  tcg: Convert extrl_i64_i32 to TCGOutOpUnary
  tcg: Convert extrh_i64_i32 to TCGOutOpUnary
  tcg: Convert deposit to TCGOutOpDeposit
  tcg/aarch64: Improve deposit
  tcg: Merge INDEX_op_deposit_{i32,i64}
  tcg: Convert extract2 to TCGOutOpExtract2
  tcg: Merge INDEX_op_extract2_{i32,i64}
  tcg: Expand fallback add2 with 32-bit operations
  tcg: Expand fallback sub2 with 32-bit operations
  tcg: Do not default add2/sub2_i32 for 32-bit hosts
  tcg/mips: Drop support for add2/sub2
  tcg/riscv: Drop support for add2/sub2
  tcg: Move i into each for loop in liveness_pass_1
  tcg: Sink def, nb_iargs, nb_oargs loads in liveness_pass_1
  tcg: Add add/sub with carry opcodes and infrastructure
  tcg: Add TCGOutOp structures for add/sub carry opcodes
  tcg/optimize: Handle add/sub with carry opcodes
  tcg/optimize: With two const operands, prefer 0 in arg1
  tcg: Use add carry opcodes to expand add2
  tcg: Use sub carry opcodes to expand sub2
  tcg/i386: Honor carry_live in tcg_out_movi
  tcg/i386: Implement add/sub carry opcodes
  tcg/i386: Remove support for add2/sub2
  tcg/i386: Special case addci r, 0, 0
  tcg: Add tcg_gen_addcio_{i32,i64,tl}
  target/arm: Use tcg_gen_addcio_* for ADCS
  target/hppa: Use tcg_gen_addcio_i64
  target/microblaze: Use tcg_gen_addcio_i32
  target/openrisc: Use tcg_gen_addcio_* for ADDC
  target/ppc: Use tcg_gen_addcio_tl for ADD and SUBF
  target/s390x: Use tcg_gen_addcio_i64 for op_addc64
  target/sh4: Use tcg_gen_addcio_i32 for addc
  target/sparc: Use tcg_gen_addcio_tl for gen_op_addcc_int
  target/tricore: Use tcg_gen_addcio_i32 for gen_addc_CC
  tcg/aarch64: Implement add/sub carry opcodes
  tcg/aarch64: Remove support for add2/sub2
  tcg/arm: Implement add/sub carry opcodes
  tcg/arm: Remove support for add2/sub2
  tcg/ppc: Implement add/sub carry opcodes
  tcg/ppc: Remove support for add2/sub2
  tcg/s390x: Honor carry_live in tcg_out_movi
  tcg/s390: Add TCG_CT_CONST_N32
  tcg/s390x: Implement add/sub carry opcodes
  tcg/s390x: Use ADD LOGICAL WITH SIGNED IMMEDIATE
  tcg/s390x: Remove support for add2/sub2
  tcg/sparc64: Hoist tcg_cond_to_bcond lookup out of tcg_out_movcc
  tcg/sparc64: Implement add/sub carry opcodes
  tcg/sparc64: Remove support for add2/sub2
  tcg/tci: Implement add/sub carry opcodes
  tcg/tci: Remove support for add2/sub2
  tcg: Remove add2/sub2 opcodes
  tcg: Formalize tcg_out_mb
  tcg: Formalize tcg_out_br
  tcg: Formalize tcg_out_goto_ptr
  tcg: Assign TCGOP_TYPE in liveness_pass_2
  tcg: Convert ld to TCGOutOpLoad
  tcg: Merge INDEX_op_ld*_{i32,i64}
  tcg: Convert st to TCGOutOpStore
  tcg: Merge INDEX_op_st*_{i32,i64}
  tcg: Stash MemOp size in TCGOP_FLAGS
  tcg: Remove INDEX_op_qemu_st8_*
  tcg: Merge INDEX_op_{ld,st}_{i32,i64,i128}
  tcg: Convert qemu_ld{2} to TCGOutOpLoad{2}
  tcg: Convert qemu_st{2} to TCGOutOpLdSt{2}
  tcg: Remove tcg_out_op
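The cross-word extract patch above ("Use extract2 for cross-word 64-bit extract on 32-bit host") builds on the extract2 funnel-shift idea: an extract that straddles the 32-bit word boundary can be composed from the two host words. A sketch of the technique with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Funnel shift: return 32 bits starting at bit 'ofs' of the 64-bit
 * pair hi:lo.  Valid for 0 < ofs < 32. */
static uint32_t extract2(uint32_t lo, uint32_t hi, unsigned ofs)
{
    return (lo >> ofs) | (hi << (32 - ofs));
}

/* Extract 'len' (<= 32) bits at position 'pos' from a 64-bit value
 * represented as two 32-bit host words, for the cross-word case
 * pos < 32 < pos + len. */
static uint32_t extract_cross(uint32_t lo, uint32_t hi,
                              unsigned pos, unsigned len)
{
    uint32_t v = extract2(lo, hi, pos);
    return len < 32 ? v & ((1u << len) - 1) : v;
}
```

For example, extracting 8 bits at position 28 of 0x123456789ABCDEF0 (lo = 0x9ABCDEF0, hi = 0x12345678) yields 0x89.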

 include/tcg/tcg-op-common.h          |    4 +
 include/tcg/tcg-op.h                 |    2 +
 include/tcg/tcg-opc.h                |  212 +--
 include/tcg/tcg.h                    |   15 +-
 tcg/aarch64/tcg-target-con-set.h     |    5 +-
 tcg/aarch64/tcg-target-has.h         |   57 -
 tcg/arm/tcg-target-con-set.h         |    5 +-
 tcg/arm/tcg-target-has.h             |   27 -
 tcg/i386/tcg-target-con-set.h        |    4 +-
 tcg/i386/tcg-target-con-str.h        |    2 +-
 tcg/i386/tcg-target-has.h            |   57 -
 tcg/loongarch64/tcg-target-con-set.h |    9 +-
 tcg/loongarch64/tcg-target-con-str.h |    1 -
 tcg/loongarch64/tcg-target-has.h     |   60 -
 tcg/mips/tcg-target-con-set.h        |   15 +-
 tcg/mips/tcg-target-con-str.h        |    1 -
 tcg/mips/tcg-target-has.h            |   64 -
 tcg/ppc/tcg-target-con-set.h         |   12 +-
 tcg/ppc/tcg-target-con-str.h         |    1 +
 tcg/ppc/tcg-target-has.h             |   59 -
 tcg/riscv/tcg-target-con-set.h       |    7 +-
 tcg/riscv/tcg-target-con-str.h       |    2 -
 tcg/riscv/tcg-target-has.h           |   61 -
 tcg/s390x/tcg-target-con-set.h       |    7 +-
 tcg/s390x/tcg-target-con-str.h       |    1 +
 tcg/s390x/tcg-target-has.h           |   57 -
 tcg/sparc64/tcg-target-con-set.h     |    9 +-
 tcg/sparc64/tcg-target-has.h         |   59 -
 tcg/tcg-has.h                        |   47 -
 tcg/tci/tcg-target-has.h             |   59 -
 target/arm/tcg/translate-a64.c       |   10 +-
 target/arm/tcg/translate-sve.c       |    2 +-
 target/arm/tcg/translate.c           |   17 +-
 target/hppa/translate.c              |   17 +-
 target/microblaze/translate.c        |   10 +-
 target/openrisc/translate.c          |    3 +-
 target/ppc/translate.c               |   11 +-
 target/s390x/tcg/translate.c         |    6 +-
 target/sh4/translate.c               |   36 +-
 target/sparc/translate.c             |    3 +-
 target/tricore/translate.c           |   12 +-
 tcg/optimize.c                       | 1066 ++++++++------
 tcg/tcg-op-ldst.c                    |   74 +-
 tcg/tcg-op.c                         | 1242 ++++++++--------
 tcg/tcg.c                            | 1303 ++++++++++++-----
 tcg/tci.c                            |  766 ++++------
 docs/devel/tcg-ops.rst               |  220 ++-
 target/i386/tcg/emit.c.inc           |   12 +-
 tcg/aarch64/tcg-target.c.inc         | 1626 ++++++++++++---------
 tcg/arm/tcg-target.c.inc             | 1556 ++++++++++++--------
 tcg/i386/tcg-target.c.inc            | 1850 ++++++++++++++----------
 tcg/loongarch64/tcg-target.c.inc     | 1425 +++++++++++--------
 tcg/mips/tcg-target.c.inc            | 1703 ++++++++++++----------
 tcg/ppc/tcg-target.c.inc             | 1978 ++++++++++++++------------
 tcg/riscv/tcg-target.c.inc           | 1375 +++++++++---------
 tcg/s390x/tcg-target.c.inc           | 1945 +++++++++++++------------
 tcg/sparc64/tcg-target.c.inc         | 1295 +++++++++++------
 tcg/tci/tcg-target-opc.h.inc         |   11 +
 tcg/tci/tcg-target.c.inc             | 1175 +++++++++------
 59 files changed, 12100 insertions(+), 9570 deletions(-)

Comments

Nicholas Piggin April 16, 2025, 1:24 p.m. UTC | #1
On Wed Apr 16, 2025 at 5:22 AM AEST, Richard Henderson wrote:
> v2: 20250107080112.1175095-1-richard.henderson@linaro.org
> v3: 20250216231012.2808572-1-richard.henderson@linaro.org
>
> Since it has been 2 months, I don't recall specific changes from v3 to v4.
> It's mostly application of r-b tags.  There is one more patch, which I
> believe came from Phil asking for one patch to be split.
>
> Patches still requiring review: 29, 41-43, 46, 47, 49-51, 55, 57, 59-62,
>   64, 66-68, 70, 72-78, 80, 82-87, 89, 91, 93, 95, 97-102, 104, 106-162.

For the ppc64 host I ran check, the functional and avocado tests, and
some ad hoc tests, and it holds up so far.

Tested-by: Nicholas Piggin <npiggin@gmail.com> (ppc64 host)
Pierrick Bouvier April 16, 2025, 11:38 p.m. UTC | #2
Hi Richard,

On 4/15/25 12:22, Richard Henderson wrote:
> v2: 20250107080112.1175095-1-richard.henderson@linaro.org
> v3: 20250216231012.2808572-1-richard.henderson@linaro.org
> 
> Since it has been 2 months, I don't recall specific changes from v3 to v4.
> It's mostly application of r-b tags.  There is one more patch, which I
> believe came from Phil asking for one patch to be split.
> 
> Patches still requiring review: 29, 41-43, 46, 47, 49-51, 55, 57, 59-62,
>    64, 66-68, 70, 72-78, 80, 82-87, 89, 91, 93, 95, 97-102, 104, 106-162.
> 
> 
> r~
> 

Thanks for this series, Richard; reviewing it is a good opportunity to
look at register allocation and associated constraints in tcg.

The new way to define dynamic constraints is quite neat and readable,
which addresses one of the pieces of feedback you previously asked for.
The only concern I have is that we could create silent "performance" 
related bugs, where a specific feature is deactivated because of a bad 
combination, but it's inherent to this approach and not a blocker.
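As a hypothetical illustration of that failure mode (the names below are invented for this sketch, not taken from the series): a static TCG_TARGET_HAS_* macro is hard to get wrong, while a dynamic predicate with a buggy condition simply returns false, so the opcode is expanded through the slower generic fallback with no error anywhere.

```c
#include <assert.h>
#include <stdbool.h>

#define HAVE_ISA_EXT 1          /* stand-in for a host feature probe */

/* Old style: a plain declaration, visible at a glance. */
#define TCG_TARGET_HAS_rot 1

/* New style: a predicate evaluated dynamically.  If its condition has
 * a bug -- here, 64-bit support is accidentally excluded -- the opcode
 * is silently expanded via the generic fallback instead. */
static bool cset_rot(bool type_is_64)
{
    return HAVE_ISA_EXT && !type_is_64;  /* bug: 64-bit was meant too */
}
```

Nothing fails at build or run time; the only symptom is slower generated code for the affected type, which is exactly the "silent performance bug" concern.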

Even though I reviewed this series, it's hard for me to review all the 
target specific implementations, as I don't have your expertise on such 
a wide range of architectures.

As a more general question, how do you approach testing for a series 
like this one? I see two different challenges, as it touches the IR 
itself, and the various backends.
- For the IR, I don't know how extensive our complete test suite is 
(regarding coverage of all existing TCG ops), but I guess there are some 
holes there. It would be interesting to generate coverage data once we 
can get a single binary in the future.
- For the various backends:
   * Are you able to compile QEMU on all concerned hosts and run testing 
there?
   * Or do you cross compile and run binaries emulated?
   * Or is there another way I'm not aware of at the moment?

Regards,
Pierrick
Richard Henderson April 17, 2025, 12:18 a.m. UTC | #3
On 4/16/25 16:38, Pierrick Bouvier wrote:
> The only concern I have is that we could create silent "performance" related bugs, where a 
> specific feature is deactivated because of a bad combination, but it's inherent to this 
> approach and not a blocker.

I think I know what you mean, and the way I see things is that the silent performance bug 
was previously scattered across different sections of the code, whereas now it is on the 
same page.  But underneath there is no real change.

Unless you mean something different?

> As a more general question, how do you approach testing for a series like this one? I see 
> two different challenges, as it touches the IR itself, and the various backends.
> - For the IR, I don't know how extensive our complete test suite is (regarding coverage of 
> all existing TCG ops), but I guess there are some holes there. It would be interesting to 
> generate coverage data once we can get a single binary in the future.

I don't use anything more than our testsuite.
Coverage data would indeed be interesting; I've not attempted that.

> - For the various backends:
>    * Are you able to compile QEMU on all concerned hosts and run testing there?

I have aarch64, arm, s390x via *.ci.qemu.org;
loongarch64, riscv64, ppc64le via the gcc compile farm.

>    * Or do you cross compile and run binaries emulated?

This is my only option for mipsel, mips64el.

I do not even have a cross-compile solution for ppc32, as there is no longer any distro 
support. I have been ignoring that, waiting to remove it when all 32-bit hosts get kicked.


r~
Pierrick Bouvier April 17, 2025, 12:49 a.m. UTC | #4
On 4/16/25 17:18, Richard Henderson wrote:
> On 4/16/25 16:38, Pierrick Bouvier wrote:
>> The only concern I have is that we could create silent "performance" related bugs, where a
>> specific feature is deactivated because of a bad combination, but it's inherent to this
>> approach and not a blocker.
> 
> I think I know what you mean, and the way I see things is that the silent performance bug
> was previously scattered across different sections of the code, whereas now it is on the
> same page.  But underneath there is no real change.
> 
> Unless you mean something different?
> 

It should be functionally equivalent indeed, but in case one of the 
cset_* functions contains a bug, it might silently fall back to a slower 
implementation. The TCG_TARGET_HAS_* macros were less error-prone I 
guess, as they're just declarations.

But overall, the new approach is really better, so it's worth the risk.

>> As a more general question, how do you approach testing for a series like this one? I see
>> two different challenges, as it touches the IR itself, and the various backends.
>> - For the IR, I don't know how extensive our complete test suite is (regarding coverage of
>> all existing TCG ops), but I guess there are some holes there. It would be interesting to
>> generate coverage data once we can get a single binary in the future.
> 
> I don't use anything more than our testsuite.
> Coverage data would indeed be interesting; I've not attempted that.
> 

I tried previously, but since we have duplicated compilation units per 
target, any coverage tool gets confused as soon as you try to aggregate 
data from several targets.

>> - For the various backends:
>>     * Are you able to compile QEMU on all concerned hosts and run testing there?
> 
> I have aarch64, arm, s390x via *.ci.qemu.org;
> loongarch64, riscv64, ppc64le via the gcc compile farm.
> 
>>     * Or do you cross compile and run binaries emulated?
> 
> This is my only option for mipsel, mips64el.
> 
> I do not even have a cross-compile solution for ppc32, as there is no longer any distro
> support. I have been ignoring that, waiting to remove it when all 32-bit hosts get kicked.
> 
> 
> r~
BALATON Zoltan April 17, 2025, 12:02 p.m. UTC | #5
On Wed, 16 Apr 2025, Richard Henderson wrote:
> On 4/16/25 16:38, Pierrick Bouvier wrote:
>> The only concern I have is that we could create silent "performance" 
>> related bugs, where a specific feature is deactivated because of a bad 
>> combination, but it's inherent to this approach and not a blocker.
>
> I think I know what you mean, and the way I see things is that the silent 
> performance bug was previously scattered across different sections of the 
> code, whereas now it is on the same page.  But underneath there is no real 
> change.
>
> Unless you mean something different?
>
>> As a more general question, how do you approach testing for a series like 
>> this one? I see two different challenges, as it touches the IR itself, and 
>> the various backends.
>> - For the IR, I don't know how extensive our complete test suite is 
>> (regarding coverage of all existing TCG ops), but I guess there are some 
>> holes there. It would be interesting to generate coverage data once we can 
>> get a single binary in the future.
>
> I don't use anything more than our testsuite.
> Coverage data would indeed be interesting; I've not attempted that.
>
>> - For the various backends:
>>    * Are you able to compile QEMU on all concerned hosts and run testing 
>> there?
>
> I have aarch64, arm, s390x via *.ci.qemu.org;
> loongarch64, riscv64, ppc64le via the gcc compile farm.
>
>>    * Or do you cross compile and run binaries emulated?
>
> This is my only option for mipsel, mips64el.
>
> I do not even have a cross-compile solution for ppc32, as there is no longer 
> any distro support. I have been ignoring that, waiting to remove it when all 
> 32-bit hosts get kicked.

Compiling for ppc32 is still possible with powerpc64-linux-gnu-gcc -m32 
-mbig-endian, which should still be available in distros. As long as there 
are ppc32 hosts available, keeping support for KVM may be interesting. And 
I hope you don't want to remove emulating ppc32 on 64-bit hosts, at least.

Regards,
BALATON Zoltan