[v3,42/69] target/arm: Introduce gen_gvec_rev{16,32,64}

Series

target/arm: AArch64 decodetree conversion, final part

[v3,00/69] target/arm: AArch64 decodetree conversion, final part
[v3,01/69] target/arm: Add section labels for "Data Processing (register)"
[v3,02/69] target/arm: Convert UDIV, SDIV to decodetree
[v3,03/69] target/arm: Convert LSLV, LSRV, ASRV, RORV to decodetree
[v3,04/69] target/arm: Convert CRC32, CRC32C to decodetree
[v3,05/69] target/arm: Convert SUBP, IRG, GMI to decodetree
[v3,06/69] target/arm: Convert PACGA to decodetree
[v3,07/69] target/arm: Convert RBIT, REV16, REV32, REV64 to decodetree
[v3,08/69] target/arm: Convert CLZ, CLS to decodetree
[v3,09/69] target/arm: Convert PAC[ID], AUT[ID] to decodetree
[v3,10/69] target/arm: Convert XPAC[ID] to decodetree
[v3,11/69] target/arm: Convert disas_logic_reg to decodetree
[v3,12/69] target/arm: Convert disas_add_sub_ext_reg to decodetree
[v3,13/69] target/arm: Convert disas_add_sub_reg to decodetree
[v3,14/69] target/arm: Convert disas_data_proc_3src to decodetree
[v3,15/69] target/arm: Convert disas_adc_sbc to decodetree
[v3,16/69] target/arm: Convert RMIF to decodetree
[v3,17/69] target/arm: Convert SETF8, SETF16 to decodetree
[v3,18/69] target/arm: Convert CCMP, CCMN to decodetree
[v3,19/69] target/arm: Convert disas_cond_select to decodetree
[v3,20/69] target/arm: Introduce fp_access_check_scalar_hsd
[v3,21/69] target/arm: Introduce fp_access_check_vector_hsd
[v3,22/69] target/arm: Convert FCMP, FCMPE, FCCMP, FCCMPE to decodetree
[v3,23/69] target/arm: Fix decode of fp16 vector fabs, fneg, fsqrt
[v3,24/69] target/arm: Convert FMOV, FABS, FNEG (scalar) to decodetree
[v3,25/69] target/arm: Pass fpstatus to vfp_sqrt*
[v3,26/69] target/arm: Remove helper_sqrt_f16
[v3,27/69] target/arm: Convert FSQRT (scalar) to decodetree
[v3,28/69] target/arm: Convert FRINT[NPMSAXI] (scalar) to decodetree
[v3,29/69] target/arm: Convert BFCVT to decodetree
[v3,30/69] target/arm: Convert FRINT{32, 64}[ZX] (scalar) to decodetree
[v3,31/69] target/arm: Convert FCVT (scalar) to decodetree
[v3,32/69] target/arm: Convert handle_fpfpcvt to decodetree
[v3,33/69] target/arm: Convert FJCVTZS to decodetree
[v3,34/69] target/arm: Convert handle_fmov to decodetree
[v3,35/69] target/arm: Convert SQABS, SQNEG to decodetree
[v3,36/69] target/arm: Convert ABS, NEG to decodetree
[v3,37/69] target/arm: Introduce gen_gvec_cls, gen_gvec_clz
[v3,38/69] target/arm: Convert CLS, CLZ (vector) to decodetree
[v3,39/69] target/arm: Introduce gen_gvec_cnt, gen_gvec_rbit
[v3,40/69] target/arm: Convert CNT, NOT, RBIT (vector) to decodetree
[v3,41/69] target/arm: Convert CMGT, CMGE, GMLT, GMLE, CMEQ (zero) to decodetree
[v3,42/69] target/arm: Introduce gen_gvec_rev{16,32,64}
[v3,43/69] target/arm: Convert handle_rev to decodetree
[v3,44/69] target/arm: Move helper_neon_addlp_{s8, s16} to neon_helper.c
[v3,45/69] target/arm: Introduce gen_gvec_{s,u}{add,ada}lp
[v3,46/69] target/arm: Convert handle_2misc_pairwise to decodetree
[v3,47/69] target/arm: Remove helper_neon_{add,sub}l_u{16,32}
[v3,48/69] target/arm: Introduce clear_vec
[v3,49/69] target/arm: Convert XTN, SQXTUN, SQXTN, UQXTN to decodetree
[v3,50/69] target/arm: Convert FCVTN, BFCVTN to decodetree
[v3,51/69] target/arm: Convert FCVTXN to decodetree
[v3,52/69] target/arm: Convert SHLL to decodetree
[v3,53/69] target/arm: Implement gen_gvec_fabs, gen_gvec_fneg
[v3,54/69] target/arm: Convert FABS, FNEG (vector) to decodetree
[v3,55/69] target/arm: Convert FSQRT (vector) to decodetree
[v3,56/69] target/arm: Convert FRINT* (vector) to decodetree
[v3,57/69] target/arm: Convert FCVT* (vector, integer) scalar to decodetree
[v3,58/69] target/arm: Convert FCVT* (vector, fixed-point) scalar to decodetree
[v3,59/69] target/arm: Convert [US]CVTF (vector, integer) scalar to decodetree
[v3,60/69] target/arm: Convert [US]CVTF (vector, fixed-point) scalar to decodetree
[v3,61/69] target/arm: Rename helper_gvec_vcvt_[hf][su] with _rz
[v3,62/69] target/arm: Convert [US]CVTF (vector) to decodetree
[v3,63/69] target/arm: Convert FCVTZ[SU] (vector, fixed-point) to decodetree
[v3,64/69] target/arm: Convert FCVT* (vector, integer) to decodetree
[v3,65/69] target/arm: Convert handle_2misc_fcmp_zero to decodetree
[v3,66/69] target/arm: Convert FRECPE, FRECPX, FRSQRTE to decodetree
[v3,67/69] target/arm: Introduce gen_gvec_urecpe, gen_gvec_ursqrte
[v3,68/69] target/arm: Convert URECPE and URSQRTE to decodetree
[v3,69/69] target/arm: Convert FCVTL to decodetree

Message ID

20241211163036.2297116-43-richard.henderson@linaro.org

State

New

Headers

Received-SPF: pass (google.com: domain of
 qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as
 permitted sender) client-ip=209.51.188.17;
From: Richard Henderson <richard.henderson@linaro.org>
To: qemu-devel@nongnu.org
Cc: qemu-arm@nongnu.org,
	Peter Maydell <peter.maydell@linaro.org>
Subject: [PATCH v3 42/69] target/arm: Introduce gen_gvec_rev{16,32,64}
Date: Wed, 11 Dec 2024 10:30:09 -0600
Message-ID: <20241211163036.2297116-43-richard.henderson@linaro.org>
In-Reply-To: <20241211163036.2297116-1-richard.henderson@linaro.org>
References: <20241211163036.2297116-1-richard.henderson@linaro.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Received-SPF: pass client-ip=2607:f8b0:4864:20::f29;
 envelope-from=richard.henderson@linaro.org; helo=mail-qv1-xf29.google.com
X-Spam_score_int: -20
X-Spam_score: -2.1
X-Spam_bar: --
X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1,
 DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1,
 RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001,
 SPF_PASS=-0.001 autolearn=unavailable autolearn_force=no
X-Spam_action: no action
X-BeenThere: qemu-devel@nongnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
 <mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <https://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
 <mailto:qemu-devel-request@nongnu.org?subject=subscribe>
Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org
Sender: qemu-devel-bounces+patch=linaro.org@nongnu.org


Commit Message

Richard Henderson Dec. 11, 2024, 4:30 p.m. UTC

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/translate.h      |  6 +++
 target/arm/tcg/gengvec.c        | 58 ++++++++++++++++++++++
 target/arm/tcg/translate-neon.c | 88 +++++++--------------------------
 3 files changed, 81 insertions(+), 71 deletions(-)

Comments

Philippe Mathieu-Daudé Dec. 11, 2024, 5:19 p.m. UTC

On 11/12/24 17:30, Richard Henderson wrote:
> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   target/arm/tcg/translate.h      |  6 +++
>   target/arm/tcg/gengvec.c        | 58 ++++++++++++++++++++++
>   target/arm/tcg/translate-neon.c | 88 +++++++--------------------------
>   3 files changed, 81 insertions(+), 71 deletions(-)
> 
> diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
> index cb8e1b2586..342ebedafc 100644
> --- a/target/arm/tcg/translate.h
> +++ b/target/arm/tcg/translate.h
> @@ -586,6 +586,12 @@ void gen_gvec_cnt(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
>                     uint32_t opr_sz, uint32_t max_sz);
>   void gen_gvec_rbit(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
>                      uint32_t opr_sz, uint32_t max_sz);
> +void gen_gvec_rev16(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
> +                    uint32_t opr_sz, uint32_t max_sz);
> +void gen_gvec_rev32(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
> +                    uint32_t opr_sz, uint32_t max_sz);
> +void gen_gvec_rev64(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
> +                    uint32_t opr_sz, uint32_t max_sz);

Remembering 
https://lore.kernel.org/qemu-devel/20230822124042.54739-1-philmd@linaro.org/, 
these gvec helpers might be useful for other targets.

Richard Henderson Dec. 11, 2024, 5:31 p.m. UTC

On 12/11/24 11:19, Philippe Mathieu-Daudé wrote:
> On 11/12/24 17:30, Richard Henderson wrote:
>> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>   target/arm/tcg/translate.h      |  6 +++
>>   target/arm/tcg/gengvec.c        | 58 ++++++++++++++++++++++
>>   target/arm/tcg/translate-neon.c | 88 +++++++--------------------------
>>   3 files changed, 81 insertions(+), 71 deletions(-)
>>
>> diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
>> index cb8e1b2586..342ebedafc 100644
>> --- a/target/arm/tcg/translate.h
>> +++ b/target/arm/tcg/translate.h
>> @@ -586,6 +586,12 @@ void gen_gvec_cnt(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
>>                     uint32_t opr_sz, uint32_t max_sz);
>>   void gen_gvec_rbit(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
>>                      uint32_t opr_sz, uint32_t max_sz);
>> +void gen_gvec_rev16(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
>> +                    uint32_t opr_sz, uint32_t max_sz);
>> +void gen_gvec_rev32(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
>> +                    uint32_t opr_sz, uint32_t max_sz);
>> +void gen_gvec_rev64(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
>> +                    uint32_t opr_sz, uint32_t max_sz);
> 
> Remembering https://lore.kernel.org/qemu-devel/20230822124042.54739-1-philmd@linaro.org/, 
> these gvec helpers might be useful for other targets.

These may be factored incorrectly for other usage.  Here, for rev<N>, N is the size of the
container, and vece specifies the size of the element within each container.  It's the reverse
of the usual meaning of vece, but it maps well to the Arm instruction encoding.

The only other bswap I can recall with vector operands is s390x VLBR/VSTBR, and similar
for Power VSX, which perform the reversal at the same time as a load/store.  So in that
case the heavy lifting of the bswap gets pushed off to MO_BSWAP.


r~
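The point that rev<N> names the container size while vece names the element size inside it can be made concrete with a small reference model. The function below is a hypothetical plain-C sketch, not QEMU code: `ref_rev` and its byte-buffer interface are assumptions chosen for illustration, with sizes given in bytes rather than as MO_* constants.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical reference model, not QEMU code.
 * Reverse the order of esize-byte elements within each csize-byte
 * container of src, writing the result to dst.
 *   csize corresponds to N in rev<N> (in bytes: 2, 4 or 8);
 *   esize corresponds to vece (in bytes: 1, 2 or 4).
 * Assumes esize < csize, csize % esize == 0, len % csize == 0,
 * and that dst and src do not overlap.
 */
static void ref_rev(uint8_t *dst, const uint8_t *src, size_t len,
                    size_t csize, size_t esize)
{
    size_t nelem = csize / esize;

    for (size_t c = 0; c < len; c += csize) {
        for (size_t i = 0; i < nelem; i++) {
            /* Element i of this container moves to slot nelem - 1 - i. */
            memcpy(dst + c + (nelem - 1 - i) * esize,
                   src + c + i * esize, esize);
        }
    }
}
```

For example, REV64 with 16-bit elements on a 128-bit register image corresponds to `ref_rev(dst, src, 16, 8, 2)`, the case the patch routes to `tcg_gen_hswap_i64`. Because the element must be strictly smaller than its container, the patch rejects vece values that are not smaller than N (the `assert` and `g_assert_not_reached()` paths).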

diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
index cb8e1b2586..342ebedafc 100644
--- a/target/arm/tcg/translate.h
+++ b/target/arm/tcg/translate.h
@@ -586,6 +586,12 @@  void gen_gvec_cnt(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
                   uint32_t opr_sz, uint32_t max_sz);
 void gen_gvec_rbit(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
                    uint32_t opr_sz, uint32_t max_sz);
+void gen_gvec_rev16(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+                    uint32_t opr_sz, uint32_t max_sz);
+void gen_gvec_rev32(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+                    uint32_t opr_sz, uint32_t max_sz);
+void gen_gvec_rev64(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+                    uint32_t opr_sz, uint32_t max_sz);
 
 /*
  * Forward to the isar_feature_* tests given a DisasContext pointer.
diff --git a/target/arm/tcg/gengvec.c b/target/arm/tcg/gengvec.c
index 85a0b50496..33c0a94958 100644
--- a/target/arm/tcg/gengvec.c
+++ b/target/arm/tcg/gengvec.c
@@ -2409,3 +2409,61 @@  void gen_gvec_rbit(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
     tcg_gen_gvec_2_ool(rd_ofs, rn_ofs, opr_sz, max_sz, 0,
                        gen_helper_gvec_rbit_b);
 }
+
+void gen_gvec_rev16(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+                    uint32_t opr_sz, uint32_t max_sz)
+{
+    assert(vece == MO_8);
+    tcg_gen_gvec_rotli(MO_16, rd_ofs, rn_ofs, 8, opr_sz, max_sz);
+}
+
+static void gen_bswap32_i64(TCGv_i64 d, TCGv_i64 n)
+{
+    tcg_gen_bswap64_i64(d, n);
+    tcg_gen_rotli_i64(d, d, 32);
+}
+
+void gen_gvec_rev32(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+                    uint32_t opr_sz, uint32_t max_sz)
+{
+    static const GVecGen2 g = {
+        .fni8 = gen_bswap32_i64,
+        .fni4 = tcg_gen_bswap32_i32,
+        .prefer_i64 = TCG_TARGET_REG_BITS == 64,
+        .vece = MO_32
+    };
+
+    switch (vece) {
+    case MO_16:
+        tcg_gen_gvec_rotli(MO_32, rd_ofs, rn_ofs, 16, opr_sz, max_sz);
+        break;
+    case MO_8:
+        tcg_gen_gvec_2(rd_ofs, rn_ofs, opr_sz, max_sz, &g);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
+
+void gen_gvec_rev64(unsigned vece, uint32_t rd_ofs, uint32_t rn_ofs,
+                    uint32_t opr_sz, uint32_t max_sz)
+{
+    static const GVecGen2 g[] = {
+        { .fni8 = tcg_gen_bswap64_i64,
+          .vece = MO_64 },
+        { .fni8 = tcg_gen_hswap_i64,
+          .vece = MO_64 },
+    };
+
+    switch (vece) {
+    case MO_32:
+        tcg_gen_gvec_rotli(MO_64, rd_ofs, rn_ofs, 32, opr_sz, max_sz);
+        break;
+    case MO_8:
+    case MO_16:
+        tcg_gen_gvec_2(rd_ofs, rn_ofs, opr_sz, max_sz, &g[vece]);
+        break;
+    default:
+        g_assert_not_reached();
+    }
+}
diff --git a/target/arm/tcg/translate-neon.c b/target/arm/tcg/translate-neon.c
index 50d0bf7753..ca6f5578b4 100644
--- a/target/arm/tcg/translate-neon.c
+++ b/target/arm/tcg/translate-neon.c
@@ -2565,58 +2565,6 @@  static bool trans_VDUP_scalar(DisasContext *s, arg_VDUP_scalar *a)
     return true;
 }
 
-static bool trans_VREV64(DisasContext *s, arg_VREV64 *a)
-{
-    int pass, half;
-    TCGv_i32 tmp[2];
-
-    if (!arm_dc_feature(s, ARM_FEATURE_NEON)) {
-        return false;
-    }
-
-    /* UNDEF accesses to D16-D31 if they don't exist. */
-    if (!dc_isar_feature(aa32_simd_r32, s) &&
-        ((a->vd | a->vm) & 0x10)) {
-        return false;
-    }
-
-    if ((a->vd | a->vm) & a->q) {
-        return false;
-    }
-
-    if (a->size == 3) {
-        return false;
-    }
-
-    if (!vfp_access_check(s)) {
-        return true;
-    }
-
-    tmp[0] = tcg_temp_new_i32();
-    tmp[1] = tcg_temp_new_i32();
-
-    for (pass = 0; pass < (a->q ? 2 : 1); pass++) {
-        for (half = 0; half < 2; half++) {
-            read_neon_element32(tmp[half], a->vm, pass * 2 + half, MO_32);
-            switch (a->size) {
-            case 0:
-                tcg_gen_bswap32_i32(tmp[half], tmp[half]);
-                break;
-            case 1:
-                gen_swap_half(tmp[half], tmp[half]);
-                break;
-            case 2:
-                break;
-            default:
-                g_assert_not_reached();
-            }
-        }
-        write_neon_element32(tmp[1], a->vd, pass * 2, MO_32);
-        write_neon_element32(tmp[0], a->vd, pass * 2 + 1, MO_32);
-    }
-    return true;
-}
-
 static bool do_2misc_pairwise(DisasContext *s, arg_2misc *a,
                               NeonGenWidenFn *widenfn,
                               NeonGenTwo64OpFn *opfn,
@@ -3122,6 +3070,7 @@  DO_2MISC_VEC(VCGE0, gen_gvec_cge0)
 DO_2MISC_VEC(VCLT0, gen_gvec_clt0)
 DO_2MISC_VEC(VCLS, gen_gvec_cls)
 DO_2MISC_VEC(VCLZ, gen_gvec_clz)
+DO_2MISC_VEC(VREV64, gen_gvec_rev64)
 
 static bool trans_VMVN(DisasContext *s, arg_2misc *a)
 {
@@ -3139,6 +3088,22 @@  static bool trans_VCNT(DisasContext *s, arg_2misc *a)
     return do_2misc_vec(s, a, gen_gvec_cnt);
 }
 
+static bool trans_VREV16(DisasContext *s, arg_2misc *a)
+{
+    if (a->size != 0) {
+        return false;
+    }
+    return do_2misc_vec(s, a, gen_gvec_rev16);
+}
+
+static bool trans_VREV32(DisasContext *s, arg_2misc *a)
+{
+    if (a->size != 0 && a->size != 1) {
+        return false;
+    }
+    return do_2misc_vec(s, a, gen_gvec_rev32);
+}
+
 #define WRAP_2M_3_OOL_FN(WRAPNAME, FUNC, DATA)                          \
     static void WRAPNAME(unsigned vece, uint32_t rd_ofs,                \
                          uint32_t rm_ofs, uint32_t oprsz,               \
@@ -3218,25 +3183,6 @@  static bool do_2misc(DisasContext *s, arg_2misc *a, NeonGenOneOpFn *fn)
     return true;
 }
 
-static bool trans_VREV32(DisasContext *s, arg_2misc *a)
-{
-    static NeonGenOneOpFn * const fn[] = {
-        tcg_gen_bswap32_i32,
-        gen_swap_half,
-        NULL,
-        NULL,
-    };
-    return do_2misc(s, a, fn[a->size]);
-}
-
-static bool trans_VREV16(DisasContext *s, arg_2misc *a)
-{
-    if (a->size != 0) {
-        return false;
-    }
-    return do_2misc(s, a, gen_rev16);
-}
-
 static void gen_VABS_F(unsigned vece, uint32_t rd_ofs, uint32_t rm_ofs,
                        uint32_t oprsz, uint32_t maxsz)
 {
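The patch replaces per-element loops with a few rotate and byte-swap identities. The scalar sketch below checks those identities on plain 64-bit integers. All helpers (`rol16`, `rol32`, `rol64`, `ref_bswap32`, `ref_bswap64`, `ref_hswap64`, `model_bswap32_i64`, `old_vrev64`, `new_vrev64`) are hypothetical stand-ins written for illustration; they model the TCG operations named in the patch rather than call them. `old_vrev64` mirrors one 64-bit pass of the removed `trans_VREV64`, assuming element 0 is the low 32 bits and that `gen_swap_half` swaps the two 16-bit halves of a 32-bit value.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate-left helpers; callers pass 0 < n < width. */
static uint16_t rol16(uint16_t x, unsigned n)
{
    return (uint16_t)((x << n) | (x >> (16 - n)));
}

static uint32_t rol32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32 - n));
}

static uint64_t rol64(uint64_t x, unsigned n)
{
    return (x << n) | (x >> (64 - n));
}

/* Byte-reverse a 32-bit value (models tcg_gen_bswap32_i32). */
static uint32_t ref_bswap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0xff00u) |
           ((x << 8) & 0xff0000u) | (x << 24);
}

/* Byte-reverse a 64-bit value (models tcg_gen_bswap64_i64). */
static uint64_t ref_bswap64(uint64_t x)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (r << 8) | ((x >> (8 * i)) & 0xff);
    }
    return r;
}

/* Reverse the order of the four 16-bit halfwords (models tcg_gen_hswap_i64). */
static uint64_t ref_hswap64(uint64_t x)
{
    uint64_t r = 0;
    for (int i = 0; i < 4; i++) {
        r = (r << 16) | ((x >> (16 * i)) & 0xffff);
    }
    return r;
}

/* Model of the patch's gen_bswap32_i64: bswap64, then rotate by 32. */
static uint64_t model_bswap32_i64(uint64_t n)
{
    return rol64(ref_bswap64(n), 32);
}

/* One 64-bit pass of the removed trans_VREV64 loop; size as in a->size. */
static uint64_t old_vrev64(uint64_t x, int size)
{
    uint32_t tmp[2] = { (uint32_t)x, (uint32_t)(x >> 32) };

    for (int half = 0; half < 2; half++) {
        if (size == 0) {
            tmp[half] = ref_bswap32(tmp[half]);
        } else if (size == 1) {
            tmp[half] = rol32(tmp[half], 16);   /* assumed gen_swap_half */
        }
    }
    /* tmp[1] goes to the low element, tmp[0] to the high element. */
    return ((uint64_t)tmp[0] << 32) | tmp[1];
}

/* New path: gen_gvec_rev64 dispatch on vece, for one 64-bit lane. */
static uint64_t new_vrev64(uint64_t x, int vece)
{
    switch (vece) {
    case 0:  return ref_bswap64(x);   /* MO_8 */
    case 1:  return ref_hswap64(x);   /* MO_16 */
    default: return rol64(x, 32);     /* MO_32: rotate by 32 */
    }
}
```

Each case reduces to one operation per 64-bit lane, or, for rev16 and rev32 with 16-bit elements, one rotate per 16- or 32-bit lane (`rol16(x, 8)` and `rol32(x, 16)` above). That is why the patch can express them through `tcg_gen_gvec_rotli` and `tcg_gen_gvec_2` instead of an explicit per-element loop.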
