From patchwork Sat Feb 17 18:22:45 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Henderson
X-Patchwork-Id: 128712
From: Richard Henderson
To: qemu-devel@nongnu.org
Date: Sat, 17 Feb 2018 10:22:45 -0800
Message-Id: <20180217182323.25885-30-richard.henderson@linaro.org>
X-Mailer: git-send-email 2.14.3
In-Reply-To: <20180217182323.25885-1-richard.henderson@linaro.org>
References: <20180217182323.25885-1-richard.henderson@linaro.org>
Subject: [Qemu-devel] [PATCH v2 29/67] target/arm: Implement SVE Permute - Interleaving Group
Cc: qemu-arm@nongnu.org

Signed-off-by: Richard Henderson
---
 target/arm/helper-sve.h    | 15 ++++++++++
 target/arm/sve_helper.c    | 72 ++++++++++++++++++++++++++++++++++++++++++++++
 target/arm/translate-sve.c | 69 ++++++++++++++++++++++++++++++++++++++++++++
 target/arm/sve.decode      | 10 +++++++
 4 files changed, 166 insertions(+)

--
2.14.3

Reviewed-by: Peter Maydell

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index ff958fcebd..bab20345c6 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -445,6 +445,21 @@ DEF_HELPER_FLAGS_4(sve_trn_p, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_3(sve_rev_p, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
 DEF_HELPER_FLAGS_3(sve_punpk_p, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
 
+DEF_HELPER_FLAGS_4(sve_zip_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_zip_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_zip_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_zip_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_uzp_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_uzp_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_uzp_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_uzp_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
+DEF_HELPER_FLAGS_4(sve_trn_b, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_trn_h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_trn_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_4(sve_trn_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+
 DEF_HELPER_FLAGS_5(sve_and_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_bic_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
 DEF_HELPER_FLAGS_5(sve_eor_pppp, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index c3a2706a16..62982bd099 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -1944,3 +1944,75 @@ void HELPER(sve_punpk_p)(void *vd, void *vn, uint32_t pred_desc)
         }
     }
 }
+
+#define DO_ZIP(NAME, TYPE, H) \
+void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc)       \
+{                                                                    \
+    intptr_t oprsz = simd_oprsz(desc);                               \
+    intptr_t i, oprsz_2 = oprsz / 2;                                 \
+    ARMVectorReg tmp_n, tmp_m;                                       \
+    /* We produce output faster than we consume input.               \
+       Therefore we must be mindful of possible overlap.  */         \
+    if (unlikely((vn - vd) < (uintptr_t)oprsz)) {                    \
+        vn = memcpy(&tmp_n, vn, oprsz_2);                            \
+    }                                                                \
+    if (unlikely((vm - vd) < (uintptr_t)oprsz)) {                    \
+        vm = memcpy(&tmp_m, vm, oprsz_2);                            \
+    }                                                                \
+    for (i = 0; i < oprsz_2; i += sizeof(TYPE)) {                    \
+        *(TYPE *)(vd + H(2 * i + 0)) = *(TYPE *)(vn + H(i));         \
+        *(TYPE *)(vd + H(2 * i + sizeof(TYPE))) = *(TYPE *)(vm + H(i)); \
+    }                                                                \
+}
+
+DO_ZIP(sve_zip_b, uint8_t, H1)
+DO_ZIP(sve_zip_h, uint16_t, H1_2)
+DO_ZIP(sve_zip_s, uint32_t, H1_4)
+DO_ZIP(sve_zip_d, uint64_t, )
+
+#define DO_UZP(NAME, TYPE, H) \
+void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc)         \
+{                                                                      \
+    intptr_t oprsz = simd_oprsz(desc);                                 \
+    intptr_t oprsz_2 = oprsz / 2;                                      \
+    intptr_t odd_ofs = simd_data(desc);                                \
+    intptr_t i;                                                        \
+    ARMVectorReg tmp_m;                                                \
+    if (unlikely((vm - vd) < (uintptr_t)oprsz)) {                      \
+        vm = memcpy(&tmp_m, vm, oprsz);                                \
+    }                                                                  \
+    for (i = 0; i < oprsz_2; i += sizeof(TYPE)) {                      \
+        *(TYPE *)(vd + H(i)) = *(TYPE *)(vn + H(2 * i + odd_ofs));     \
+    }                                                                  \
+    for (i = 0; i < oprsz_2; i += sizeof(TYPE)) {                      \
+        *(TYPE *)(vd + H(oprsz_2 + i)) = *(TYPE *)(vm + H(2 * i + odd_ofs)); \
+    }                                                                  \
+}
+
+DO_UZP(sve_uzp_b, uint8_t, H1)
+DO_UZP(sve_uzp_h, uint16_t, H1_2)
+DO_UZP(sve_uzp_s, uint32_t, H1_4)
+DO_UZP(sve_uzp_d, uint64_t, )
+
+#define DO_TRN(NAME, TYPE, H) \
+void HELPER(NAME)(void *vd, void *vn, void *vm, uint32_t desc)         \
+{                                                                      \
+    intptr_t oprsz = simd_oprsz(desc);                                 \
+    intptr_t odd_ofs = simd_data(desc);                                \
+    intptr_t i;                                                        \
+    for (i = 0; i < oprsz; i += 2 * sizeof(TYPE)) {                    \
+        TYPE ae = *(TYPE *)(vn + H(i + odd_ofs));                      \
+        TYPE be = *(TYPE *)(vm + H(i + odd_ofs));                      \
+        *(TYPE *)(vd + H(i + 0)) = ae;                                 \
+        *(TYPE *)(vd + H(i + sizeof(TYPE))) = be;                      \
+    }                                                                  \
+}
+
+DO_TRN(sve_trn_b, uint8_t, H1)
+DO_TRN(sve_trn_h, uint16_t, H1_2)
+DO_TRN(sve_trn_s, uint32_t, H1_4)
+DO_TRN(sve_trn_d, uint64_t, )
+
+#undef DO_ZIP
+#undef DO_UZP
+#undef DO_TRN
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 45e1ea87bf..09ac955a36 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -2042,6 +2042,75 @@ static void trans_PUNPKHI(DisasContext *s, arg_PUNPKHI *a, uint32_t insn)
     do_perm_pred2(s, a, 1, gen_helper_sve_punpk_p);
 }
 
+/*
+ *** SVE Permute - Interleaving Group
+ */
+
+static void do_zip(DisasContext *s, arg_rrr_esz *a, bool high)
+{
+    static gen_helper_gvec_3 * const fns[4] = {
+        gen_helper_sve_zip_b, gen_helper_sve_zip_h,
+        gen_helper_sve_zip_s, gen_helper_sve_zip_d,
+    };
+    unsigned vsz = vec_full_reg_size(s);
+    unsigned high_ofs = high ? vsz / 2 : 0;
+
+    tcg_gen_gvec_3_ool(vec_full_reg_offset(s, a->rd),
+                       vec_full_reg_offset(s, a->rn) + high_ofs,
+                       vec_full_reg_offset(s, a->rm) + high_ofs,
+                       vsz, vsz, 0, fns[a->esz]);
+}
+
+static void do_zzz_data_ool(DisasContext *s, arg_rrr_esz *a, int data,
+                            gen_helper_gvec_3 *fn)
+{
+    unsigned vsz = vec_full_reg_size(s);
+    tcg_gen_gvec_3_ool(vec_full_reg_offset(s, a->rd),
+                       vec_full_reg_offset(s, a->rn),
+                       vec_full_reg_offset(s, a->rm),
+                       vsz, vsz, data, fn);
+}
+
+static void trans_ZIP1_z(DisasContext *s, arg_rrr_esz *a, uint32_t insn)
+{
+    do_zip(s, a, false);
+}
+
+static void trans_ZIP2_z(DisasContext *s, arg_rrr_esz *a, uint32_t insn)
+{
+    do_zip(s, a, true);
+}
+
+static gen_helper_gvec_3 * const uzp_fns[4] = {
+    gen_helper_sve_uzp_b, gen_helper_sve_uzp_h,
+    gen_helper_sve_uzp_s, gen_helper_sve_uzp_d,
+};
+
+static void trans_UZP1_z(DisasContext *s, arg_rrr_esz *a, uint32_t insn)
+{
+    do_zzz_data_ool(s, a, 0, uzp_fns[a->esz]);
+}
+
+static void trans_UZP2_z(DisasContext *s, arg_rrr_esz *a, uint32_t insn)
+{
+    do_zzz_data_ool(s, a, 1 << a->esz, uzp_fns[a->esz]);
+}
+
+static gen_helper_gvec_3 * const trn_fns[4] = {
+    gen_helper_sve_trn_b, gen_helper_sve_trn_h,
+    gen_helper_sve_trn_s, gen_helper_sve_trn_d,
+};
+
+static void trans_TRN1_z(DisasContext *s, arg_rrr_esz *a, uint32_t insn)
+{
+    do_zzz_data_ool(s, a, 0, trn_fns[a->esz]);
+}
+
+static void trans_TRN2_z(DisasContext *s, arg_rrr_esz *a, uint32_t insn)
+{
+    do_zzz_data_ool(s, a, 1 << a->esz, trn_fns[a->esz]);
+}
+
 /*
  *** SVE Memory - 32-bit Gather and Unsized Contiguous Group
  */
diff --git a/target/arm/sve.decode b/target/arm/sve.decode
index bcbe84c3a6..2efa3773fc 100644
--- a/target/arm/sve.decode
+++ b/target/arm/sve.decode
@@ -415,6 +415,16 @@ REV_p 00000101 .. 11 0100 010 000 0 .... 0 .... @pd_pn
 PUNPKLO         00000101 00 11 0000 010 000 0 .... 0 ....       @pd_pn_e0
 PUNPKHI         00000101 00 11 0001 010 000 0 .... 0 ....       @pd_pn_e0
 
+### SVE Permute - Interleaving Group
+
+# SVE permute vector elements
+ZIP1_z          00000101 .. 1 ..... 011 000 ..... .....         @rd_rn_rm
+ZIP2_z          00000101 .. 1 ..... 011 001 ..... .....         @rd_rn_rm
+UZP1_z          00000101 .. 1 ..... 011 010 ..... .....         @rd_rn_rm
+UZP2_z          00000101 .. 1 ..... 011 011 ..... .....         @rd_rn_rm
+TRN1_z          00000101 .. 1 ..... 011 100 ..... .....         @rd_rn_rm
+TRN2_z          00000101 .. 1 ..... 011 101 ..... .....         @rd_rn_rm
+
 ### SVE Predicate Logical Operations Group
 
 # SVE predicate logical operations
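
Reviewer aid, not part of the patch: the element ordering produced by DO_ZIP, DO_UZP and DO_TRN can be hard to read through the byte offsets and the H() host-endian adjustment. The sketch below models the same three permutes on plain byte arrays. The names model_zip, model_uzp and model_trn are hypothetical and do not exist in QEMU. It assumes byte-sized elements, an even element count n, and a destination that does not overlap either source, so it needs neither H() nor the temporary copies the real helpers make.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative models only; these names are hypothetical, not QEMU API.
 * n is the number of elements per vector and must be even.
 * d must not overlap a or b (the real helpers copy inputs to handle that).
 */

/* ZIP1 (high = 0) / ZIP2 (high = 1): interleave one half of a with b. */
void model_zip(uint8_t *d, const uint8_t *a, const uint8_t *b,
               size_t n, int high)
{
    assert(n % 2 == 0);
    size_t base = high ? n / 2 : 0;   /* mirrors high_ofs in do_zip() */
    for (size_t i = 0; i < n / 2; i++) {
        d[2 * i] = a[base + i];
        d[2 * i + 1] = b[base + i];
    }
}

/* UZP1 (odd = 0) / UZP2 (odd = 1): even or odd elements of a, then of b. */
void model_uzp(uint8_t *d, const uint8_t *a, const uint8_t *b,
               size_t n, int odd)
{
    assert(n % 2 == 0);
    for (size_t i = 0; i < n / 2; i++) {
        d[i] = a[2 * i + odd];
        d[n / 2 + i] = b[2 * i + odd];
    }
}

/* TRN1 (odd = 0) / TRN2 (odd = 1): pair up even or odd elements of a and b. */
void model_trn(uint8_t *d, const uint8_t *a, const uint8_t *b,
               size_t n, int odd)
{
    assert(n % 2 == 0);
    for (size_t i = 0; i < n; i += 2) {
        d[i] = a[i + odd];
        d[i + 1] = b[i + odd];
    }
}
```

With a = {0, 1, 2, 3} and b = {10, 11, 12, 13}, the low zip gives {0, 10, 1, 11}, the even unzip gives {0, 2, 10, 12}, and the even transpose gives {0, 10, 2, 12}. The odd_ofs value of 1 << a->esz passed by the translator corresponds to odd = 1 here, expressed in bytes rather than elements.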