From patchwork Sun Jan 26 19:24:52 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Maydell X-Patchwork-Id: 23707 Return-Path: X-Original-To: linaro@patches.linaro.org Delivered-To: linaro@patches.linaro.org Received: from mail-pd0-f197.google.com (mail-pd0-f197.google.com [209.85.192.197]) by ip-10-151-82-157.ec2.internal (Postfix) with ESMTPS id 3F89E202FA for ; Sun, 26 Jan 2014 19:25:28 +0000 (UTC) Received: by mail-pd0-f197.google.com with SMTP id x10sf12261880pdj.4 for ; Sun, 26 Jan 2014 11:25:27 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:delivered-to:from:to:cc:subject :date:message-id:in-reply-to:references:x-original-sender :x-original-authentication-results:precedence:mailing-list:list-id :list-post:list-help:list-archive:list-unsubscribe; bh=r0OM+ypL8uXc0j8L0PhHJKd+TrSOzkKBAZWYZPxzbYU=; b=JrnlMqxjrjLSP2WiYyfjxO4eFWZh0B7kZ/Gu/zNO+pi22q9Qp8RNknWd/zAb3mdmBP OuEvpNaT72OdZmCsiLGKDJ+4mnpsz1m/CSeu27sretNn6Eouykl46iukRLl82euCM8NW TkE9rYSkDcVHUAosMYhnR8A4zZ9Bofv9Xuz84UBSsZNcNcdeI9uc3DR+OMTvWPJ6X7g4 ujDE1L/iRb7pPPFKIb3UvNN+bp3FlXW6IE+YPSmV8yMlgKhFiZSw0+56JIX4qBsil4cc dc/XPcGlIGSj+Rh/FjpVxSPAi/mtNu6s3eZGyInzUZ3KYxacKXcrTihgq5di1eHw4cwN YHRA== X-Gm-Message-State: ALoCoQkgNKXnVlXu9rHQJpRuZusGsAZF5Iol+ezRa0Pprb9lDRzNnajEEkWnqKS3+Rh+oJqmlkxM X-Received: by 10.66.240.4 with SMTP id vw4mr9193701pac.10.1390764327531; Sun, 26 Jan 2014 11:25:27 -0800 (PST) MIME-Version: 1.0 X-BeenThere: patchwork-forward@linaro.org Received: by 10.140.16.145 with SMTP id 17ls1354780qgb.58.gmail; Sun, 26 Jan 2014 11:25:27 -0800 (PST) X-Received: by 10.59.0.193 with SMTP id ba1mr13544344ved.12.1390764327378; Sun, 26 Jan 2014 11:25:27 -0800 (PST) Received: from mail-ve0-f176.google.com (mail-ve0-f176.google.com [209.85.128.176]) by mx.google.com with ESMTPS id k9si4186112vez.125.2014.01.26.11.25.27 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Sun, 26 Jan 2014 11:25:27 -0800 (PST) Received-SPF: neutral (google.com: 209.85.128.176 is neither permitted nor denied by best guess record for domain of patch+caf_=patchwork-forward=linaro.org@linaro.org) client-ip=209.85.128.176; Received: by mail-ve0-f176.google.com with SMTP id oz11so3007274veb.21 for ; Sun, 26 Jan 2014 11:25:27 -0800 (PST) X-Received: by 10.58.181.165 with SMTP id dx5mr13139717vec.19.1390764327290; Sun, 26 Jan 2014 11:25:27 -0800 (PST) X-Forwarded-To: patchwork-forward@linaro.org X-Forwarded-For: patch@linaro.org patchwork-forward@linaro.org Delivered-To: patches@linaro.org Received: by 10.220.174.196 with SMTP id u4csp81097vcz; Sun, 26 Jan 2014 11:25:26 -0800 (PST) X-Received: by 10.66.144.227 with SMTP id sp3mr26482269pab.100.1390764322245; Sun, 26 Jan 2014 11:25:22 -0800 (PST) Received: from mnementh.archaic.org.uk (mnementh.archaic.org.uk. [81.2.115.146]) by mx.google.com with ESMTPS id nf8si8697341pbc.0.2014.01.26.11.25.18 for (version=TLSv1.2 cipher=RC4-SHA bits=128/128); Sun, 26 Jan 2014 11:25:22 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of pm215@archaic.org.uk designates 81.2.115.146 as permitted sender) client-ip=81.2.115.146; Received: from pm215 by mnementh.archaic.org.uk with local (Exim 4.80) (envelope-from ) id 1W7VKS-0005fy-TO; Sun, 26 Jan 2014 19:25:12 +0000 From: Peter Maydell To: qemu-devel@nongnu.org Cc: patches@linaro.org, Alexander Graf , Michael Matz , Claudio Fontana , Dirk Mueller , Laurent Desnogues , kvmarm@lists.cs.columbia.edu, Richard Henderson , =?UTF-8?q?Alex=20Benn=C3=A9e?= , Christoffer Dall , Will Newton , Peter Crosthwaite Subject: [PATCH 01/21] target-arm: A64: Add SIMD three-different multiply accumulate insns Date: Sun, 26 Jan 2014 19:24:52 +0000 Message-Id: <1390764312-21789-2-git-send-email-peter.maydell@linaro.org> X-Mailer: git-send-email 1.7.10.4 In-Reply-To: <1390764312-21789-1-git-send-email-peter.maydell@linaro.org> References: <1390764312-21789-1-git-send-email-peter.maydell@linaro.org> X-Removed-Original-Auth: Dkim didn't pass. X-Original-Sender: peter.maydell@linaro.org X-Original-Authentication-Results: mx.google.com; spf=neutral (google.com: 209.85.128.176 is neither permitted nor denied by best guess record for domain of patch+caf_=patchwork-forward=linaro.org@linaro.org) smtp.mail=patch+caf_=patchwork-forward=linaro.org@linaro.org Precedence: list Mailing-list: list patchwork-forward@linaro.org; contact patchwork-forward+owners@linaro.org List-ID: X-Google-Group-Id: 836684582541 List-Post: , List-Help: , List-Archive: List-Unsubscribe: , Add support for the multiply-accumulate instructions from the SIMD three-different instructions group (C3.6.15): * skeleton decode of unallocated encodings and split of the group into its three sub-parts * framework for handling the 64x64->128 widening subpart * implementation of the multiply-accumulate instructions SMLAL, SMLAL2, UMLAL, UMLAL2, SMLSL, SMLSL2, UMLSL, UMLSL2, UMULL, UMULL2, SMULL, SMULL2 Signed-off-by: Peter Maydell Reviewed-by: Richard Henderson --- target-arm/translate-a64.c | 233 ++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 232 insertions(+), 1 deletion(-) diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c index 7cfb55b..924a539 100644 --- a/target-arm/translate-a64.c +++ b/target-arm/translate-a64.c @@ -700,6 +700,9 @@ static void do_fp_ld(DisasContext *s, int destidx, TCGv_i64 tcg_addr, int size) * zero extend as we are filling a partial chunk of the vector register. * These functions don't support 128 bit loads/stores, which would be * normal load/store operations. + * + * The _i32 versions are useful when operating on 32 bit quantities + * (eg for floating point single or using Neon helper functions). */ /* Get value of an element within a vector register */ @@ -735,6 +738,32 @@ static void read_vec_element(DisasContext *s, TCGv_i64 tcg_dest, int srcidx, } } +static void read_vec_element_i32(DisasContext *s, TCGv_i32 tcg_dest, int srcidx, + int element, TCGMemOp memop) +{ + int vect_off = vec_reg_offset(srcidx, element, memop & MO_SIZE); + switch (memop) { + case MO_8: + tcg_gen_ld8u_i32(tcg_dest, cpu_env, vect_off); + break; + case MO_16: + tcg_gen_ld16u_i32(tcg_dest, cpu_env, vect_off); + break; + case MO_8|MO_SIGN: + tcg_gen_ld8s_i32(tcg_dest, cpu_env, vect_off); + break; + case MO_16|MO_SIGN: + tcg_gen_ld16s_i32(tcg_dest, cpu_env, vect_off); + break; + case MO_32: + case MO_32|MO_SIGN: + tcg_gen_ld_i32(tcg_dest, cpu_env, vect_off); + break; + default: + g_assert_not_reached(); + } +} + /* Set value of an element within a vector register */ static void write_vec_element(DisasContext *s, TCGv_i64 tcg_src, int destidx, int element, TCGMemOp memop) @@ -5546,6 +5575,150 @@ static void disas_simd_shift_imm(DisasContext *s, uint32_t insn) unsupported_encoding(s, insn); } +static void handle_3rd_widening(DisasContext *s, int is_q, int is_u, int size, + int opcode, int rd, int rn, int rm) +{ + /* 3-reg-different widening insns: 64 x 64 -> 128 */ + TCGv_i64 tcg_res[2]; + int pass, accop; + + tcg_res[0] = tcg_temp_new_i64(); + tcg_res[1] = tcg_temp_new_i64(); + + /* Does this op do an adding accumulate, a subtracting accumulate, + * or no accumulate at all? + */ + switch (opcode) { + case 5: + case 8: + case 9: + accop = 1; + break; + case 10: + case 11: + accop = -1; + break; + default: + accop = 0; + break; + } + + if (accop != 0) { + read_vec_element(s, tcg_res[0], rd, 0, MO_64); + read_vec_element(s, tcg_res[1], rd, 1, MO_64); + } + + /* size == 2 means two 32x32->64 operations; this is worth special + * casing because we can generally handle it inline. + */ + if (size == 2) { + for (pass = 0; pass < 2; pass++) { + TCGv_i64 tcg_op1 = tcg_temp_new_i64(); + TCGv_i64 tcg_op2 = tcg_temp_new_i64(); + TCGv_i64 tcg_passres; + TCGMemOp memop = MO_32 | (is_u ? 0 : MO_SIGN); + + int elt = pass + is_q * 2; + + read_vec_element(s, tcg_op1, rn, elt, memop); + read_vec_element(s, tcg_op2, rm, elt, memop); + + if (accop == 0) { + tcg_passres = tcg_res[pass]; + } else { + tcg_passres = tcg_temp_new_i64(); + } + + switch (opcode) { + case 8: /* SMLAL, SMLAL2, UMLAL, UMLAL2 */ + case 10: /* SMLSL, SMLSL2, UMLSL, UMLSL2 */ + case 12: /* UMULL, UMULL2, SMULL, SMULL2 */ + tcg_gen_mul_i64(tcg_passres, tcg_op1, tcg_op2); + break; + default: + g_assert_not_reached(); + } + + if (accop > 0) { + tcg_gen_add_i64(tcg_res[pass], tcg_res[pass], tcg_passres); + tcg_temp_free_i64(tcg_passres); + } else if (accop < 0) { + tcg_gen_sub_i64(tcg_res[pass], tcg_res[pass], tcg_passres); + tcg_temp_free_i64(tcg_passres); + } + + tcg_temp_free_i64(tcg_op1); + tcg_temp_free_i64(tcg_op2); + } + } else { + /* size 0 or 1, generally helper functions */ + for (pass = 0; pass < 2; pass++) { + TCGv_i32 tcg_op1 = tcg_temp_new_i32(); + TCGv_i32 tcg_op2 = tcg_temp_new_i32(); + TCGv_i64 tcg_passres; + int elt = pass + is_q * 2; + + read_vec_element_i32(s, tcg_op1, rn, elt, MO_32); + read_vec_element_i32(s, tcg_op2, rm, elt, MO_32); + + if (accop == 0) { + tcg_passres = tcg_res[pass]; + } else { + tcg_passres = tcg_temp_new_i64(); + } + + switch (opcode) { + case 8: /* SMLAL, SMLAL2, UMLAL, UMLAL2 */ + case 10: /* SMLSL, SMLSL2, UMLSL, UMLSL2 */ + case 12: /* UMULL, UMULL2, SMULL, SMULL2 */ + if (size == 0) { + if (is_u) { + gen_helper_neon_mull_u8(tcg_passres, tcg_op1, tcg_op2); + } else { + gen_helper_neon_mull_s8(tcg_passres, tcg_op1, tcg_op2); + } + } else { + if (is_u) { + gen_helper_neon_mull_u16(tcg_passres, tcg_op1, tcg_op2); + } else { + gen_helper_neon_mull_s16(tcg_passres, tcg_op1, tcg_op2); + } + } + break; + default: + g_assert_not_reached(); + } + tcg_temp_free_i32(tcg_op1); + tcg_temp_free_i32(tcg_op2); + + if (accop > 0) { + if (size == 0) { + gen_helper_neon_addl_u16(tcg_res[pass], tcg_res[pass], + tcg_passres); + } else { + gen_helper_neon_addl_u32(tcg_res[pass], tcg_res[pass], + tcg_passres); + } + tcg_temp_free_i64(tcg_passres); + } else if (accop < 0) { + if (size == 0) { + gen_helper_neon_subl_u16(tcg_res[pass], tcg_res[pass], + tcg_passres); + } else { + gen_helper_neon_subl_u32(tcg_res[pass], tcg_res[pass], + tcg_passres); + } + tcg_temp_free_i64(tcg_passres); + } + } + } + + write_vec_element(s, tcg_res[0], rd, 0, MO_64); + write_vec_element(s, tcg_res[1], rd, 1, MO_64); + tcg_temp_free_i64(tcg_res[0]); + tcg_temp_free_i64(tcg_res[1]); +} + /* C3.6.15 AdvSIMD three different * 31 30 29 28 24 23 22 21 20 16 15 12 11 10 9 5 4 0 * +---+---+---+-----------+------+---+------+--------+-----+------+------+ @@ -5554,7 +5727,65 @@ static void disas_simd_shift_imm(DisasContext *s, uint32_t insn) */ static void disas_simd_three_reg_diff(DisasContext *s, uint32_t insn) { - unsupported_encoding(s, insn); + /* Instructions in this group fall into three basic classes + * (in each case with the operation working on each element in + * the input vectors): + * (1) widening 64 x 64 -> 128 (with possibly Vd as an extra + * 128 bit input) + * (2) wide 64 x 128 -> 128 + * (3) narrowing 128 x 128 -> 64 + * Here we do initial decode, catch unallocated cases and + * dispatch to separate functions for each class. + */ + int is_q = extract32(insn, 30, 1); + int is_u = extract32(insn, 29, 1); + int size = extract32(insn, 22, 2); + int opcode = extract32(insn, 12, 4); + int rm = extract32(insn, 16, 5); + int rn = extract32(insn, 5, 5); + int rd = extract32(insn, 0, 5); + + switch (opcode) { + case 1: /* SADDW, SADDW2, UADDW, UADDW2 */ + case 3: /* SSUBW, SSUBW2, USUBW, USUBW2 */ + /* 64 x 128 -> 128 */ + unsupported_encoding(s, insn); + break; + case 4: /* ADDHN, ADDHN2, RADDHN, RADDHN2 */ + case 6: /* SUBHN, SUBHN2, RSUBHN, RSUBHN2 */ + /* 128 x 128 -> 64 */ + unsupported_encoding(s, insn); + break; + case 9: + case 11: + case 13: + case 14: + if (is_u) { + unallocated_encoding(s); + return; + } + /* fall through */ + case 0: + case 2: + case 5: + case 7: + unsupported_encoding(s, insn); + break; + case 8: + case 10: + case 12: + /* 64 x 64 -> 128 */ + if (size == 3) { + unallocated_encoding(s); + return; + } + handle_3rd_widening(s, is_q, is_u, size, opcode, rd, rn, rm); + break; + default: + /* opcode 15 not allocated */ + unallocated_encoding(s); + break; + } } /* C3.6.16 AdvSIMD three same