From patchwork Wed Mar 19 12:05:46 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Peter Maydell X-Patchwork-Id: 26558 Return-Path: X-Original-To: linaro@patches.linaro.org Delivered-To: linaro@patches.linaro.org Received: from mail-pd0-f198.google.com (mail-pd0-f198.google.com [209.85.192.198]) by ip-10-151-82-157.ec2.internal (Postfix) with ESMTPS id EFB5020534 for ; Wed, 19 Mar 2014 12:06:32 +0000 (UTC) Received: by mail-pd0-f198.google.com with SMTP id fp1sf20717088pdb.5 for ; Wed, 19 Mar 2014 05:06:32 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:delivered-to:from:to:date:message-id:in-reply-to :references:mime-version:cc:subject:precedence:list-id :list-unsubscribe:list-archive:list-post:list-help:list-subscribe :errors-to:sender:x-original-sender :x-original-authentication-results:mailing-list:content-type :content-transfer-encoding; bh=5p8aMw65v3XlD9x4tKH1W/1QXVEu6Ln0/VUAHqzQYG0=; b=IDhu7CjWfk0Z4DJ4Ev6EBM6vuvJpQGqb3B667BxX5TiYCKtVnXPSorSbS4ENQFl+D4 Dbq5u75ul770E2i42aVkgNs5jDq+rrvnkH/MWyiaHWyyfoZ/I6A+XmtjgO5qP/mLioVk GVzezGkvSSA8HrrJh1X2KrzEPbGMHd4AIRM2ko8MuDYZJZ8mObeWPPxbtY0J7ozpHp0r OQXbU6VZBIxmgL7MMxnoq/mMrEkHGpAISIsPiP0ru4YJoE0MynF6d69Un9iYheGjZUam HIimf/YxW59fJ+ex9i1rwE0TJeHWcCX2lAflQexLmCqvU47s0ntpG2NIYfxw5rKwWPRk sm9w== X-Gm-Message-State: ALoCoQkOGH4/22KT1cCS4A16/9YQf+TPm307zckAGhMmIMeVuSSQChSY7a4g4immz16M/I+SFWNZ X-Received: by 10.66.163.70 with SMTP id yg6mr8913320pab.44.1395230792187; Wed, 19 Mar 2014 05:06:32 -0700 (PDT) X-BeenThere: patchwork-forward@linaro.org Received: by 10.140.80.115 with SMTP id b106ls2489184qgd.97.gmail; Wed, 19 Mar 2014 05:06:32 -0700 (PDT) X-Received: by 10.58.8.8 with SMTP id n8mr1033081vea.23.1395230791996; Wed, 19 Mar 2014 05:06:31 -0700 (PDT) Received: from mail-vc0-f169.google.com (mail-vc0-f169.google.com [209.85.220.169]) by mx.google.com with ESMTPS id yv5si4387537veb.134.2014.03.19.05.06.31 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Wed, 19 Mar 2014 05:06:31 -0700 (PDT) Received-SPF: neutral (google.com: 209.85.220.169 is neither permitted nor denied by best guess record for domain of patch+caf_=patchwork-forward=linaro.org@linaro.org) client-ip=209.85.220.169; Received: by mail-vc0-f169.google.com with SMTP id ik5so9279153vcb.28 for ; Wed, 19 Mar 2014 05:06:31 -0700 (PDT) X-Received: by 10.220.10.2 with SMTP id n2mr1090821vcn.26.1395230791917; Wed, 19 Mar 2014 05:06:31 -0700 (PDT) X-Forwarded-To: patchwork-forward@linaro.org X-Forwarded-For: patch@linaro.org patchwork-forward@linaro.org Delivered-To: patch@linaro.org Received: by 10.220.78.9 with SMTP id i9csp293284vck; Wed, 19 Mar 2014 05:06:31 -0700 (PDT) X-Received: by 10.224.57.81 with SMTP id b17mr43003457qah.44.1395230790900; Wed, 19 Mar 2014 05:06:30 -0700 (PDT) Received: from lists.gnu.org (lists.gnu.org. [2001:4830:134:3::11]) by mx.google.com with ESMTPS id k67si12648971qge.64.2014.03.19.05.06.30 for (version=TLSv1 cipher=RC4-SHA bits=128/128); Wed, 19 Mar 2014 05:06:30 -0700 (PDT) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; Received: from localhost ([::1]:40426 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WQFGQ-0004i1-Av for patch@linaro.org; Wed, 19 Mar 2014 08:06:30 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:49650) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WQFFu-0004AB-5n for qemu-devel@nongnu.org; Wed, 19 Mar 2014 08:05:59 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1WQFFm-0005oE-4h for qemu-devel@nongnu.org; Wed, 19 Mar 2014 08:05:58 -0400 Received: from mnementh.archaic.org.uk ([2001:8b0:1d0::1]:47065) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WQFFl-0005nv-Nb for qemu-devel@nongnu.org; Wed, 19 Mar 2014 08:05:50 -0400 Received: from pm215 by mnementh.archaic.org.uk with local (Exim 4.80) (envelope-from ) id 1WQFFj-0005u4-96; Wed, 19 Mar 2014 12:05:47 +0000 From: Peter Maydell To: Anthony Liguori Date: Wed, 19 Mar 2014 12:05:46 +0000 Message-Id: <1395230746-22643-7-git-send-email-peter.maydell@linaro.org> X-Mailer: git-send-email 1.7.10.4 In-Reply-To: <1395230746-22643-1-git-send-email-peter.maydell@linaro.org> References: <1395230746-22643-1-git-send-email-peter.maydell@linaro.org> MIME-Version: 1.0 X-detected-operating-system: by eggs.gnu.org: Error: Malformed IPv6 address (bad octet value). X-Received-From: 2001:8b0:1d0::1 Cc: Blue Swirl , =?UTF-8?q?Andreas=20F=C3=A4rber?= , qemu-devel@nongnu.org, Aurelien Jarno Subject: [Qemu-devel] [PULL 6/6] target-arm: A64: Add saturating accumulate ops (USQADD/SUQADD) X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: , List-Help: , List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: qemu-devel-bounces+patch=linaro.org@nongnu.org X-Removed-Original-Auth: Dkim didn't pass. X-Original-Sender: peter.maydell@linaro.org X-Original-Authentication-Results: mx.google.com; spf=neutral (google.com: 209.85.220.169 is neither permitted nor denied by best guess record for domain of patch+caf_=patchwork-forward=linaro.org@linaro.org) smtp.mail=patch+caf_=patchwork-forward=linaro.org@linaro.org Mailing-list: list patchwork-forward@linaro.org; contact patchwork-forward+owners@linaro.org X-Google-Group-Id: 836684582541 From: Alex Bennée Add the saturating accumulate operations USQADD and SUQADD to the A64 instruction set. This completes coverage of A64 Neon. These operations (which are unsigned + signed -> signed and signed + unsigned -> unsigned) don't exist in the A32/T32 instruction set, so require a complete new set of helper functions. Signed-off-by: Alex Bennée Signed-off-by: Peter Maydell Reviewed-by: Richard Henderson --- target-arm/helper.h | 20 ++++-- target-arm/neon_helper.c | 165 +++++++++++++++++++++++++++++++++++++++++++++ target-arm/translate-a64.c | 109 ++++++++++++++++++++++++++++-- 3 files changed, 284 insertions(+), 10 deletions(-) diff --git a/target-arm/helper.h b/target-arm/helper.h index b006fd5..366c1b3 100644 --- a/target-arm/helper.h +++ b/target-arm/helper.h @@ -186,12 +186,20 @@ DEF_HELPER_FLAGS_2(rints, TCG_CALL_NO_RWG, f32, f32, ptr) DEF_HELPER_FLAGS_2(rintd, TCG_CALL_NO_RWG, f64, f64, ptr) /* neon_helper.c */ -DEF_HELPER_3(neon_qadd_u8, i32, env, i32, i32) -DEF_HELPER_3(neon_qadd_s8, i32, env, i32, i32) -DEF_HELPER_3(neon_qadd_u16, i32, env, i32, i32) -DEF_HELPER_3(neon_qadd_s16, i32, env, i32, i32) -DEF_HELPER_3(neon_qadd_u32, i32, env, i32, i32) -DEF_HELPER_3(neon_qadd_s32, i32, env, i32, i32) +DEF_HELPER_FLAGS_3(neon_qadd_u8, TCG_CALL_NO_RWG, i32, env, i32, i32) +DEF_HELPER_FLAGS_3(neon_qadd_s8, TCG_CALL_NO_RWG, i32, env, i32, i32) +DEF_HELPER_FLAGS_3(neon_qadd_u16, TCG_CALL_NO_RWG, i32, env, i32, i32) +DEF_HELPER_FLAGS_3(neon_qadd_s16, TCG_CALL_NO_RWG, i32, env, i32, i32) +DEF_HELPER_FLAGS_3(neon_qadd_u32, TCG_CALL_NO_RWG, i32, env, i32, i32) +DEF_HELPER_FLAGS_3(neon_qadd_s32, TCG_CALL_NO_RWG, i32, env, i32, i32) +DEF_HELPER_FLAGS_3(neon_uqadd_s8, TCG_CALL_NO_RWG, i32, env, i32, i32) +DEF_HELPER_FLAGS_3(neon_uqadd_s16, TCG_CALL_NO_RWG, i32, env, i32, i32) +DEF_HELPER_FLAGS_3(neon_uqadd_s32, TCG_CALL_NO_RWG, i32, env, i32, i32) +DEF_HELPER_FLAGS_3(neon_uqadd_s64, TCG_CALL_NO_RWG, i64, env, i64, i64) +DEF_HELPER_FLAGS_3(neon_sqadd_u8, TCG_CALL_NO_RWG, i32, env, i32, i32) +DEF_HELPER_FLAGS_3(neon_sqadd_u16, TCG_CALL_NO_RWG, i32, env, i32, i32) +DEF_HELPER_FLAGS_3(neon_sqadd_u32, TCG_CALL_NO_RWG, i32, env, i32, i32) +DEF_HELPER_FLAGS_3(neon_sqadd_u64, TCG_CALL_NO_RWG, i64, env, i64, i64) DEF_HELPER_3(neon_qsub_u8, i32, env, i32, i32) DEF_HELPER_3(neon_qsub_s8, i32, env, i32, i32) DEF_HELPER_3(neon_qsub_u16, i32, env, i32, i32) diff --git a/target-arm/neon_helper.c b/target-arm/neon_helper.c index e23f224..8d6f9a9 100644 --- a/target-arm/neon_helper.c +++ b/target-arm/neon_helper.c @@ -236,6 +236,171 @@ uint64_t HELPER(neon_qadd_s64)(CPUARMState *env, uint64_t src1, uint64_t src2) return res; } +/* Unsigned saturating accumulate of signed value + * + * Op1/Rn is treated as signed + * Op2/Rd is treated as unsigned + * + * Explicit casting is used to ensure the correct sign extension of + * inputs. The result is treated as a unsigned value and saturated as such. + * + * We use a macro for the 8/16 bit cases which expects signed integers of va, + * vb, and vr for interim calculation and an unsigned 32 bit result value r. + */ + +#define USATACC(bits, shift) \ + do { \ + va = sextract32(a, shift, bits); \ + vb = extract32(b, shift, bits); \ + vr = va + vb; \ + if (vr > UINT##bits##_MAX) { \ + SET_QC(); \ + vr = UINT##bits##_MAX; \ + } else if (vr < 0) { \ + SET_QC(); \ + vr = 0; \ + } \ + r = deposit32(r, shift, bits, vr); \ + } while (0) + +uint32_t HELPER(neon_uqadd_s8)(CPUARMState *env, uint32_t a, uint32_t b) +{ + int16_t va, vb, vr; + uint32_t r = 0; + + USATACC(8, 0); + USATACC(8, 8); + USATACC(8, 16); + USATACC(8, 24); + return r; +} + +uint32_t HELPER(neon_uqadd_s16)(CPUARMState *env, uint32_t a, uint32_t b) +{ + int32_t va, vb, vr; + uint64_t r = 0; + + USATACC(16, 0); + USATACC(16, 16); + return r; +} + +#undef USATACC + +uint32_t HELPER(neon_uqadd_s32)(CPUARMState *env, uint32_t a, uint32_t b) +{ + int64_t va = (int32_t)a; + int64_t vb = (uint32_t)b; + int64_t vr = va + vb; + if (vr > UINT32_MAX) { + SET_QC(); + vr = UINT32_MAX; + } else if (vr < 0) { + SET_QC(); + vr = 0; + } + return vr; +} + +uint64_t HELPER(neon_uqadd_s64)(CPUARMState *env, uint64_t a, uint64_t b) +{ + uint64_t res; + res = a + b; + /* We only need to look at the pattern of SIGN bits to detect + * +ve/-ve saturation + */ + if (~a & b & ~res & SIGNBIT64) { + SET_QC(); + res = UINT64_MAX; + } else if (a & ~b & res & SIGNBIT64) { + SET_QC(); + res = 0; + } + return res; +} + +/* Signed saturating accumulate of unsigned value + * + * Op1/Rn is treated as unsigned + * Op2/Rd is treated as signed + * + * The result is treated as a signed value and saturated as such + * + * We use a macro for the 8/16 bit cases which expects signed integers of va, + * vb, and vr for interim calculation and an unsigned 32 bit result value r. + */ + +#define SSATACC(bits, shift) \ + do { \ + va = extract32(a, shift, bits); \ + vb = sextract32(b, shift, bits); \ + vr = va + vb; \ + if (vr > INT##bits##_MAX) { \ + SET_QC(); \ + vr = INT##bits##_MAX; \ + } else if (vr < INT##bits##_MIN) { \ + SET_QC(); \ + vr = INT##bits##_MIN; \ + } \ + r = deposit32(r, shift, bits, vr); \ + } while (0) + +uint32_t HELPER(neon_sqadd_u8)(CPUARMState *env, uint32_t a, uint32_t b) +{ + int16_t va, vb, vr; + uint32_t r = 0; + + SSATACC(8, 0); + SSATACC(8, 8); + SSATACC(8, 16); + SSATACC(8, 24); + return r; +} + +uint32_t HELPER(neon_sqadd_u16)(CPUARMState *env, uint32_t a, uint32_t b) +{ + int32_t va, vb, vr; + uint32_t r = 0; + + SSATACC(16, 0); + SSATACC(16, 16); + + return r; +} + +#undef SSATACC + +uint32_t HELPER(neon_sqadd_u32)(CPUARMState *env, uint32_t a, uint32_t b) +{ + int64_t res; + int64_t op1 = (uint32_t)a; + int64_t op2 = (int32_t)b; + res = op1 + op2; + if (res > INT32_MAX) { + SET_QC(); + res = INT32_MAX; + } else if (res < INT32_MIN) { + SET_QC(); + res = INT32_MIN; + } + return res; +} + +uint64_t HELPER(neon_sqadd_u64)(CPUARMState *env, uint64_t a, uint64_t b) +{ + uint64_t res; + res = a + b; + /* We only need to look at the pattern of SIGN bits to detect an overflow */ + if (((a & res) + | (~b & res) + | (a & ~b)) & SIGNBIT64) { + SET_QC(); + res = INT64_MAX; + } + return res; +} + + #define NEON_USAT(dest, src1, src2, type) do { \ uint32_t tmp = (uint32_t)src1 - (uint32_t)src2; \ if (tmp != (type)tmp) { \ diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c index 18659d7..9f06450 100644 --- a/target-arm/translate-a64.c +++ b/target-arm/translate-a64.c @@ -7321,6 +7321,101 @@ static void handle_2misc_narrow(DisasContext *s, bool scalar, } } +/* Remaining saturating accumulating ops */ +static void handle_2misc_satacc(DisasContext *s, bool is_scalar, bool is_u, + bool is_q, int size, int rn, int rd) +{ + bool is_double = (size == 3); + + if (is_double) { + TCGv_i64 tcg_rn = tcg_temp_new_i64(); + TCGv_i64 tcg_rd = tcg_temp_new_i64(); + int pass; + + for (pass = 0; pass < (is_scalar ? 1 : 2); pass++) { + read_vec_element(s, tcg_rn, rn, pass, MO_64); + read_vec_element(s, tcg_rd, rd, pass, MO_64); + + if (is_u) { /* USQADD */ + gen_helper_neon_uqadd_s64(tcg_rd, cpu_env, tcg_rn, tcg_rd); + } else { /* SUQADD */ + gen_helper_neon_sqadd_u64(tcg_rd, cpu_env, tcg_rn, tcg_rd); + } + write_vec_element(s, tcg_rd, rd, pass, MO_64); + } + if (is_scalar) { + clear_vec_high(s, rd); + } + + tcg_temp_free_i64(tcg_rd); + tcg_temp_free_i64(tcg_rn); + } else { + TCGv_i32 tcg_rn = tcg_temp_new_i32(); + TCGv_i32 tcg_rd = tcg_temp_new_i32(); + int pass, maxpasses; + + if (is_scalar) { + maxpasses = 1; + } else { + maxpasses = is_q ? 4 : 2; + } + + for (pass = 0; pass < maxpasses; pass++) { + if (is_scalar) { + read_vec_element_i32(s, tcg_rn, rn, pass, size); + read_vec_element_i32(s, tcg_rd, rd, pass, size); + } else { + read_vec_element_i32(s, tcg_rn, rn, pass, MO_32); + read_vec_element_i32(s, tcg_rd, rd, pass, MO_32); + } + + if (is_u) { /* USQADD */ + switch (size) { + case 0: + gen_helper_neon_uqadd_s8(tcg_rd, cpu_env, tcg_rn, tcg_rd); + break; + case 1: + gen_helper_neon_uqadd_s16(tcg_rd, cpu_env, tcg_rn, tcg_rd); + break; + case 2: + gen_helper_neon_uqadd_s32(tcg_rd, cpu_env, tcg_rn, tcg_rd); + break; + default: + g_assert_not_reached(); + } + } else { /* SUQADD */ + switch (size) { + case 0: + gen_helper_neon_sqadd_u8(tcg_rd, cpu_env, tcg_rn, tcg_rd); + break; + case 1: + gen_helper_neon_sqadd_u16(tcg_rd, cpu_env, tcg_rn, tcg_rd); + break; + case 2: + gen_helper_neon_sqadd_u32(tcg_rd, cpu_env, tcg_rn, tcg_rd); + break; + default: + g_assert_not_reached(); + } + } + + if (is_scalar) { + TCGv_i64 tcg_zero = tcg_const_i64(0); + write_vec_element(s, tcg_zero, rd, 0, MO_64); + tcg_temp_free_i64(tcg_zero); + } + write_vec_element_i32(s, tcg_rd, rd, pass, MO_32); + } + + if (!is_q) { + clear_vec_high(s, rd); + } + + tcg_temp_free_i32(tcg_rd); + tcg_temp_free_i32(tcg_rn); + } +} + /* C3.6.12 AdvSIMD scalar two reg misc * 31 30 29 28 24 23 22 21 17 16 12 11 10 9 5 4 0 * +-----+---+-----------+------+-----------+--------+-----+------+------+ @@ -7340,6 +7435,9 @@ static void disas_simd_scalar_two_reg_misc(DisasContext *s, uint32_t insn) TCGv_ptr tcg_fpstatus; switch (opcode) { + case 0x3: /* USQADD / SUQADD*/ + handle_2misc_satacc(s, true, u, false, size, rn, rd); + return; case 0x7: /* SQABS / SQNEG */ break; case 0xa: /* CMLT */ @@ -7427,10 +7525,7 @@ static void disas_simd_scalar_two_reg_misc(DisasContext *s, uint32_t insn) } break; default: - /* Other categories of encoding in this class: - * + SUQADD/USQADD/SQABS/SQNEG : size 8, 16, 32 or 64 - */ - unsupported_encoding(s, insn); + unallocated_encoding(s); return; } @@ -9194,6 +9289,12 @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) } break; case 0x3: /* SUQADD, USQADD */ + if (size == 3 && !is_q) { + unallocated_encoding(s); + return; + } + handle_2misc_satacc(s, false, u, is_q, size, rn, rd); + return; case 0x7: /* SQABS, SQNEG */ if (size == 3 && !is_q) { unallocated_encoding(s);