Message ID | 20230713202327.12662-1-richard.henderson@linaro.org |
---|---|
State | Superseded |
Series | [for-8.1] tcg: Use HAVE_CMPXCHG128 instead of CONFIG_CMPXCHG128 |
Hi Richard,

On 13/7/23 22:23, Richard Henderson wrote:
> We adjust CONFIG_ATOMIC128 and CONFIG_CMPXCHG128 with
> CONFIG_ATOMIC128_OPT in atomic128.h. It is difficult
> to tell when those changes have been applied with the
> ifdef we must use with CONFIG_CMPXCHG128. So instead
> use HAVE_CMPXCHG128, which triggers -Werror-undef when
> the proper header has not been included.
>
> Improves tcg_gen_atomic_cmpxchg_i128 for s390x host, which
> requires CONFIG_ATOMIC128_OPT. Without this we fall back
> to EXCP_ATOMIC to single-step 128-bit atomics, which is
> slow enough to cause some tests to time out.
>
> Reported-by: Thomas Huth <thuth@redhat.com>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>
> Thomas, this issue does not quite match the one you bisected, but
> other than the cmpxchg, I don't see any see any qemu_{ld,st}_i128
> being used in BootLinuxS390X.test_s390_ccw_virtio_tcg.
>
> As far as I can see, this wasn't broken by the addition of
> CONFIG_ATOMIC128_OPT, rather that fix didn't go far enough.
>
> Anyway, test_s390_ccw_virtio_tcg now passes in 159s on our host.

IIUC:

If we have CONFIG_ATOMIC128, we use qatomic_cmpxchg__nocheck;
else if we have CONFIG_CMPXCHG128 we use __sync_val_compare_and_swap_16;
in both cases we set HAVE_CMPXCHG128;
otherwise we can not use atomic128 cmpxchg().

(I'm trying to figure why we need both CONFIGs).
On 7/13/23 22:36, Philippe Mathieu-Daudé wrote:
> Hi Richard,
>
> On 13/7/23 22:23, Richard Henderson wrote:
>> We adjust CONFIG_ATOMIC128 and CONFIG_CMPXCHG128 with
>> CONFIG_ATOMIC128_OPT in atomic128.h. It is difficult
>> to tell when those changes have been applied with the
>> ifdef we must use with CONFIG_CMPXCHG128. So instead
>> use HAVE_CMPXCHG128, which triggers -Werror-undef when
>> the proper header has not been included.
>>
>> Improves tcg_gen_atomic_cmpxchg_i128 for s390x host, which
>> requires CONFIG_ATOMIC128_OPT. Without this we fall back
>> to EXCP_ATOMIC to single-step 128-bit atomics, which is
>> slow enough to cause some tests to time out.
>>
>> Reported-by: Thomas Huth <thuth@redhat.com>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>
>> Thomas, this issue does not quite match the one you bisected, but
>> other than the cmpxchg, I don't see any see any qemu_{ld,st}_i128
>> being used in BootLinuxS390X.test_s390_ccw_virtio_tcg.
>>
>> As far as I can see, this wasn't broken by the addition of
>> CONFIG_ATOMIC128_OPT, rather that fix didn't go far enough.
>>
>> Anyway, test_s390_ccw_virtio_tcg now passes in 159s on our host.
>
> IIUC:
>
> If we have CONFIG_ATOMIC128, we use qatomic_cmpxchg__nocheck;
> else if we have CONFIG_CMPXCHG128 we use __sync_val_compare_and_swap_16;
> in both cases we set HAVE_CMPXCHG128;
> otherwise we can not use atomic128 cmpxchg().
>
> (I'm trying to figure why we need both CONFIGs).

Or sometimes we use inline asm, because there's no compiler support at all.
Please see host/include/*/host/atomic16-*.h.

r~
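The selection order discussed in this exchange can be sketched as a small decision function. This is only an illustration of the thread's description, not code from QEMU: the real choice is made by the preprocessor in the atomic128 headers, and the enum, the function names, and the position of the inline-asm case in the order are all assumptions made for this sketch.

```c
#include <stdbool.h>

/* Hypothetical labels for this sketch; QEMU defines no such enum. */
typedef enum {
    IMPL_NONE,          /* no 128-bit cmpxchg: fall back to EXCP_ATOMIC */
    IMPL_QATOMIC,       /* qatomic_cmpxchg__nocheck (CONFIG_ATOMIC128) */
    IMPL_SYNC_BUILTIN,  /* __sync_val_compare_and_swap_16 (CONFIG_CMPXCHG128) */
    IMPL_HOST_ASM       /* host inline asm, no compiler support */
} Cmpxchg128Impl;

/*
 * Mirror the order given in the thread: CONFIG_ATOMIC128 first, then
 * CONFIG_CMPXCHG128, then host inline asm (placement assumed), else none.
 */
static inline Cmpxchg128Impl pick_cmpxchg128(bool config_atomic128,
                                             bool config_cmpxchg128,
                                             bool host_inline_asm)
{
    if (config_atomic128) {
        return IMPL_QATOMIC;
    }
    if (config_cmpxchg128) {
        return IMPL_SYNC_BUILTIN;
    }
    if (host_inline_asm) {
        return IMPL_HOST_ASM;
    }
    return IMPL_NONE;
}

/* Stand-in for HAVE_CMPXCHG128: true whenever some implementation exists. */
static inline bool have_cmpxchg128(Cmpxchg128Impl impl)
{
    return impl != IMPL_NONE;
}
```

In this sketch a host with only an inline-asm implementation still reports `have_cmpxchg128()` as true, which is the situation a test on `CONFIG_CMPXCHG128` alone would miss.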
On 13/07/2023 22.23, Richard Henderson wrote:
> We adjust CONFIG_ATOMIC128 and CONFIG_CMPXCHG128 with
> CONFIG_ATOMIC128_OPT in atomic128.h. It is difficult
> to tell when those changes have been applied with the
> ifdef we must use with CONFIG_CMPXCHG128. So instead
> use HAVE_CMPXCHG128, which triggers -Werror-undef when
> the proper header has not been included.
>
> Improves tcg_gen_atomic_cmpxchg_i128 for s390x host, which
> requires CONFIG_ATOMIC128_OPT. Without this we fall back
> to EXCP_ATOMIC to single-step 128-bit atomics, which is
> slow enough to cause some tests to time out.
>
> Reported-by: Thomas Huth <thuth@redhat.com>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>
> Thomas, this issue does not quite match the one you bisected, but
> other than the cmpxchg, I don't see any see any qemu_{ld,st}_i128
> being used in BootLinuxS390X.test_s390_ccw_virtio_tcg.
>
> As far as I can see, this wasn't broken by the addition of
> CONFIG_ATOMIC128_OPT, rather that fix didn't go far enough.
>
> Anyway, test_s390_ccw_virtio_tcg now passes in 159s on our host.

Thanks, I can confirm that this fixes the issue for me, too.

Tested-by: Thomas Huth <thuth@redhat.com>
diff --git a/accel/tcg/tcg-runtime.h b/accel/tcg/tcg-runtime.h
index 39e68007f9..186899a2c7 100644
--- a/accel/tcg/tcg-runtime.h
+++ b/accel/tcg/tcg-runtime.h
@@ -58,7 +58,7 @@ DEF_HELPER_FLAGS_5(atomic_cmpxchgq_be, TCG_CALL_NO_WG,
 DEF_HELPER_FLAGS_5(atomic_cmpxchgq_le, TCG_CALL_NO_WG,
                    i64, env, i64, i64, i64, i32)
 #endif
-#ifdef CONFIG_CMPXCHG128
+#if HAVE_CMPXCHG128
 DEF_HELPER_FLAGS_5(atomic_cmpxchgo_be, TCG_CALL_NO_WG,
                    i128, env, i64, i128, i128, i32)
 DEF_HELPER_FLAGS_5(atomic_cmpxchgo_le, TCG_CALL_NO_WG,
diff --git a/include/exec/helper-proto-common.h b/include/exec/helper-proto-common.h
index 4d4b022668..8b67170a22 100644
--- a/include/exec/helper-proto-common.h
+++ b/include/exec/helper-proto-common.h
@@ -7,6 +7,8 @@
 #ifndef HELPER_PROTO_COMMON_H
 #define HELPER_PROTO_COMMON_H
 
+#include "qemu/atomic128.h"  /* for HAVE_CMPXCHG128 */
+
 #define HELPER_H "accel/tcg/tcg-runtime.h"
 #include "exec/helper-proto.h.inc"
 #undef HELPER_H
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index c2b81ec569..e0079c9a9d 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -3105,7 +3105,7 @@ void cpu_st16_mmu(CPUArchState *env, target_ulong addr, Int128 val,
 #include "atomic_template.h"
 #endif
 
-#if defined(CONFIG_ATOMIC128) || defined(CONFIG_CMPXCHG128)
+#if defined(CONFIG_ATOMIC128) || HAVE_CMPXCHG128
 #define DATA_SIZE 16
 #include "atomic_template.h"
 #endif
diff --git a/accel/tcg/user-exec.c b/accel/tcg/user-exec.c
index d95b875a6a..e7225e10e9 100644
--- a/accel/tcg/user-exec.c
+++ b/accel/tcg/user-exec.c
@@ -1385,7 +1385,7 @@ static void *atomic_mmu_lookup(CPUArchState *env, vaddr addr, MemOpIdx oi,
 #include "atomic_template.h"
 #endif
 
-#if defined(CONFIG_ATOMIC128) || defined(CONFIG_CMPXCHG128)
+#if defined(CONFIG_ATOMIC128) || HAVE_CMPXCHG128
 #define DATA_SIZE 16
 #include "atomic_template.h"
 #endif
diff --git a/tcg/tcg-op-ldst.c b/tcg/tcg-op-ldst.c
index 0fcc1618e5..d54c305598 100644
--- a/tcg/tcg-op-ldst.c
+++ b/tcg/tcg-op-ldst.c
@@ -778,7 +778,7 @@ typedef void (*gen_atomic_op_i64)(TCGv_i64, TCGv_env, TCGv_i64,
 #else
 # define WITH_ATOMIC64(X)
 #endif
-#ifdef CONFIG_CMPXCHG128
+#if HAVE_CMPXCHG128
 # define WITH_ATOMIC128(X) X,
 #else
 # define WITH_ATOMIC128(X)
diff --git a/accel/tcg/atomic_common.c.inc b/accel/tcg/atomic_common.c.inc
index ee222fd7e7..95a5c5ff12 100644
--- a/accel/tcg/atomic_common.c.inc
+++ b/accel/tcg/atomic_common.c.inc
@@ -41,7 +41,7 @@ CMPXCHG_HELPER(cmpxchgq_be, uint64_t)
 CMPXCHG_HELPER(cmpxchgq_le, uint64_t)
 #endif
 
-#ifdef CONFIG_CMPXCHG128
+#if HAVE_CMPXCHG128
 CMPXCHG_HELPER(cmpxchgo_be, Int128)
 CMPXCHG_HELPER(cmpxchgo_le, Int128)
 #endif
We adjust CONFIG_ATOMIC128 and CONFIG_CMPXCHG128 with
CONFIG_ATOMIC128_OPT in atomic128.h. It is difficult
to tell when those changes have been applied with the
ifdef we must use with CONFIG_CMPXCHG128. So instead
use HAVE_CMPXCHG128, which triggers -Werror-undef when
the proper header has not been included.

Improves tcg_gen_atomic_cmpxchg_i128 for s390x host, which
requires CONFIG_ATOMIC128_OPT. Without this we fall back
to EXCP_ATOMIC to single-step 128-bit atomics, which is
slow enough to cause some tests to time out.

Reported-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---

Thomas, this issue does not quite match the one you bisected, but
other than the cmpxchg, I don't see any qemu_{ld,st}_i128
being used in BootLinuxS390X.test_s390_ccw_virtio_tcg.

As far as I can see, this wasn't broken by the addition of
CONFIG_ATOMIC128_OPT, rather that fix didn't go far enough.

Anyway, test_s390_ccw_virtio_tcg now passes in 159s on our host.

r~

---
 accel/tcg/tcg-runtime.h            | 2 +-
 include/exec/helper-proto-common.h | 2 ++
 accel/tcg/cputlb.c                 | 2 +-
 accel/tcg/user-exec.c              | 2 +-
 tcg/tcg-op-ldst.c                  | 2 +-
 accel/tcg/atomic_common.c.inc      | 2 +-
 6 files changed, 7 insertions(+), 5 deletions(-)
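To illustrate why the commit message prefers `#if HAVE_CMPXCHG128` over `#ifdef CONFIG_CMPXCHG128`, here is a minimal sketch. It assumes a build in which only CONFIG_ATOMIC128_OPT is set, and it replaces the real header with a simplified stand-in condition; the two function names are hypothetical and exist only for this example.

```c
#include <stdbool.h>

/* Assumed build configuration for this sketch: only the _OPT variant is set. */
#undef CONFIG_ATOMIC128
#undef CONFIG_CMPXCHG128
#define CONFIG_ATOMIC128_OPT 1

/*
 * Simplified stand-in for what a header like "qemu/atomic128.h" provides:
 * HAVE_CMPXCHG128 is always defined, as either 0 or 1.
 * The exact condition here is an assumption, not QEMU's.
 */
#if defined(CONFIG_ATOMIC128) || defined(CONFIG_CMPXCHG128) \
    || defined(CONFIG_ATOMIC128_OPT)
# define HAVE_CMPXCHG128 1
#else
# define HAVE_CMPXCHG128 0
#endif

/* Old style: an undefined macro silently selects the slow path. */
static inline bool cmpxchg128_via_ifdef(void)
{
#ifdef CONFIG_CMPXCHG128
    return true;
#else
    return false;
#endif
}

/*
 * New style: if the header above were missing, HAVE_CMPXCHG128 would be
 * undefined and a build with -Wundef -Werror would fail instead of
 * silently taking the #else branch.
 */
static inline bool cmpxchg128_via_if(void)
{
#if HAVE_CMPXCHG128
    return true;
#else
    return false;
#endif
}
```

Under these assumptions `cmpxchg128_via_ifdef()` reports false while `cmpxchg128_via_if()` reports true, which corresponds to the s390x case described above where the helpers were left out and 128-bit atomics fell back to EXCP_ATOMIC.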