From patchwork Tue Mar 18 19:23:55 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Peter Maydell
X-Patchwork-Id: 26520
From: Peter Maydell
To: qemu-devel@nongnu.org
Cc: patches@linaro.org, Alexander Graf, Michael Matz, Dirk Mueller,
 Laurent Desnogues, kvmarm@lists.cs.columbia.edu, Richard Henderson,
 Alex Bennée, Christoffer Dall, Will Newton, Peter Crosthwaite
Subject: [PATCH for-2.0 2/2] target-arm: A64: Add saturating accumulate ops (USQADD/SUQADD)
Date: Tue, 18 Mar 2014 19:23:55 +0000
Message-Id: <1395170635-21281-3-git-send-email-peter.maydell@linaro.org>
X-Mailer: git-send-email 1.7.10.4
In-Reply-To: <1395170635-21281-1-git-send-email-peter.maydell@linaro.org>
References: <1395170635-21281-1-git-send-email-peter.maydell@linaro.org>
X-Original-Sender: peter.maydell@linaro.org

From: Alex Bennée

Add the saturating accumulate operations USQADD and SUQADD
to the A64 instruction set. This completes coverage of A64 Neon.
These operations (USQADD: unsigned accumulator + signed value ->
unsigned result; SUQADD: signed accumulator + unsigned value ->
signed result) don't exist in the A32/T32 instruction set, so they
require a completely new set of helper functions.

Signed-off-by: Alex Bennée
Signed-off-by: Peter Maydell
Reviewed-by: Richard Henderson
---
 target-arm/helper.h        |  20 ++++--
 target-arm/neon_helper.c   | 165 +++++++++++++++++++++++++++++++++++++++++++++
 target-arm/translate-a64.c | 109 ++++++++++++++++++++++++++++--
 3 files changed, 284 insertions(+), 10 deletions(-)

diff --git a/target-arm/helper.h b/target-arm/helper.h
index b006fd5..366c1b3 100644
--- a/target-arm/helper.h
+++ b/target-arm/helper.h
@@ -186,12 +186,20 @@ DEF_HELPER_FLAGS_2(rints, TCG_CALL_NO_RWG, f32, f32, ptr)
 DEF_HELPER_FLAGS_2(rintd, TCG_CALL_NO_RWG, f64, f64, ptr)
 
 /* neon_helper.c */
-DEF_HELPER_3(neon_qadd_u8, i32, env, i32, i32)
-DEF_HELPER_3(neon_qadd_s8, i32, env, i32, i32)
-DEF_HELPER_3(neon_qadd_u16, i32, env, i32, i32)
-DEF_HELPER_3(neon_qadd_s16, i32, env, i32, i32)
-DEF_HELPER_3(neon_qadd_u32, i32, env, i32, i32)
-DEF_HELPER_3(neon_qadd_s32, i32, env, i32, i32)
+DEF_HELPER_FLAGS_3(neon_qadd_u8, TCG_CALL_NO_RWG, i32, env, i32, i32)
+DEF_HELPER_FLAGS_3(neon_qadd_s8, TCG_CALL_NO_RWG, i32, env, i32, i32)
+DEF_HELPER_FLAGS_3(neon_qadd_u16, TCG_CALL_NO_RWG, i32, env, i32, i32)
+DEF_HELPER_FLAGS_3(neon_qadd_s16, TCG_CALL_NO_RWG, i32, env, i32, i32)
+DEF_HELPER_FLAGS_3(neon_qadd_u32, TCG_CALL_NO_RWG, i32, env, i32, i32)
+DEF_HELPER_FLAGS_3(neon_qadd_s32, TCG_CALL_NO_RWG, i32, env, i32, i32)
+DEF_HELPER_FLAGS_3(neon_uqadd_s8, TCG_CALL_NO_RWG, i32, env, i32, i32)
+DEF_HELPER_FLAGS_3(neon_uqadd_s16, TCG_CALL_NO_RWG, i32, env, i32, i32)
+DEF_HELPER_FLAGS_3(neon_uqadd_s32, TCG_CALL_NO_RWG, i32, env, i32, i32)
+DEF_HELPER_FLAGS_3(neon_uqadd_s64, TCG_CALL_NO_RWG, i64, env, i64, i64)
+DEF_HELPER_FLAGS_3(neon_sqadd_u8, TCG_CALL_NO_RWG, i32, env, i32, i32)
+DEF_HELPER_FLAGS_3(neon_sqadd_u16, TCG_CALL_NO_RWG, i32, env, i32, i32)
+DEF_HELPER_FLAGS_3(neon_sqadd_u32, TCG_CALL_NO_RWG, i32, env, i32, i32)
+DEF_HELPER_FLAGS_3(neon_sqadd_u64, TCG_CALL_NO_RWG, i64, env, i64, i64)
 DEF_HELPER_3(neon_qsub_u8, i32, env, i32, i32)
 DEF_HELPER_3(neon_qsub_s8, i32, env, i32, i32)
 DEF_HELPER_3(neon_qsub_u16, i32, env, i32, i32)
diff --git a/target-arm/neon_helper.c b/target-arm/neon_helper.c
index e23f224..3c65b8e 100644
--- a/target-arm/neon_helper.c
+++ b/target-arm/neon_helper.c
@@ -236,6 +236,171 @@ uint64_t HELPER(neon_qadd_s64)(CPUARMState *env, uint64_t src1, uint64_t src2)
     return res;
 }
 
+/* Unsigned saturating accumulate of signed value
+ *
+ * Op1/Rn is treated as signed
+ * Op2/Rd is treated as unsigned
+ *
+ * Explicit casting is used to ensure the correct sign extension of
+ * inputs. The result is treated as an unsigned value and saturated as such.
+ *
+ * We use a macro for the 8/16 bit cases which expects signed integers of va,
+ * vb, and vr for interim calculation and an unsigned 32 bit result value r.
+ */
+
+#define USATACC(bits, shift) \
+    do { \
+        va = (int##bits##_t)((a >> shift) & ((1 << bits) - 1)); \
+        vb = (uint##bits##_t)((b >> shift) & ((1 << bits) - 1)); \
+        vr = va + vb; \
+        if (vr > UINT##bits##_MAX) { \
+            SET_QC(); \
+            vr = UINT##bits##_MAX; \
+        } else if (vr < 0) { \
+            SET_QC(); \
+            vr = 0; \
+        } \
+        r |= (uint32_t) vr << shift; \
+    } while (0)
+
+uint32_t HELPER(neon_uqadd_s8)(CPUARMState *env, uint32_t a, uint32_t b)
+{
+    int16_t va, vb, vr;
+    uint32_t r = 0;
+
+    USATACC(8, 0);
+    USATACC(8, 8);
+    USATACC(8, 16);
+    USATACC(8, 24);
+    return r;
+}
+
+uint32_t HELPER(neon_uqadd_s16)(CPUARMState *env, uint32_t a, uint32_t b)
+{
+    int32_t va, vb, vr;
+    uint32_t r = 0;
+
+    USATACC(16, 0);
+    USATACC(16, 16);
+    return r;
+}
+
+#undef USATACC
+
+uint32_t HELPER(neon_uqadd_s32)(CPUARMState *env, uint32_t a, uint32_t b)
+{
+    int64_t va = (int32_t)a;
+    int64_t vb = (uint32_t)b;
+    int64_t vr = va + vb;
+    if (vr > UINT32_MAX) {
+        SET_QC();
+        vr = UINT32_MAX;
+    } else if (vr < 0) {
+        SET_QC();
+        vr = 0;
+    }
+    return vr;
+}
+
+uint64_t HELPER(neon_uqadd_s64)(CPUARMState *env, uint64_t a, uint64_t b)
+{
+    uint64_t res;
+    res = a + b;
+    /* We only need to look at the pattern of SIGN bits to detect
+     * +ve/-ve saturation
+     */
+    if (~a & b & ~res & SIGNBIT64) {
+        SET_QC();
+        res = UINT64_MAX;
+    } else if (a & ~b & res & SIGNBIT64) {
+        SET_QC();
+        res = 0;
+    }
+    return res;
+}
+
+/* Signed saturating accumulate of unsigned value
+ *
+ * Op1/Rn is treated as unsigned
+ * Op2/Rd is treated as signed
+ *
+ * The result is treated as a signed value and saturated as such
+ *
+ * We use a macro for the 8/16 bit cases which expects signed integers of va,
+ * vb, and vr for interim calculation and an unsigned 32 bit result value r.
+ */
+
+#define SSATACC(bits, shift) \
+    do { \
+        va = (uint##bits##_t)((a >> shift) & ((1 << bits) - 1)); \
+        vb = (int##bits##_t)((b >> shift) & ((1 << bits) - 1)); \
+        vr = va + vb; \
+        if (vr > INT##bits##_MAX) { \
+            SET_QC(); \
+            vr = INT##bits##_MAX; \
+        } else if (vr < INT##bits##_MIN) { \
+            SET_QC(); \
+            vr = INT##bits##_MIN; \
+        } \
+        r |= (uint32_t) (vr & ((1 << bits) - 1)) << shift; \
+    } while (0)
+
+uint32_t HELPER(neon_sqadd_u8)(CPUARMState *env, uint32_t a, uint32_t b)
+{
+    int16_t va, vb, vr;
+    uint32_t r = 0;
+
+    SSATACC(8, 0);
+    SSATACC(8, 8);
+    SSATACC(8, 16);
+    SSATACC(8, 24);
+    return r;
+}
+
+uint32_t HELPER(neon_sqadd_u16)(CPUARMState *env, uint32_t a, uint32_t b)
+{
+    int32_t va, vb, vr;
+    uint32_t r = 0;
+
+    SSATACC(16, 0);
+    SSATACC(16, 16);
+
+    return r;
+}
+
+#undef SSATACC
+
+uint32_t HELPER(neon_sqadd_u32)(CPUARMState *env, uint32_t a, uint32_t b)
+{
+    int64_t res;
+    int64_t op1 = (uint32_t)a;
+    int64_t op2 = (int32_t)b;
+    res = op1 + op2;
+    if (res > INT32_MAX) {
+        SET_QC();
+        res = INT32_MAX;
+    } else if (res < INT32_MIN) {
+        SET_QC();
+        res = INT32_MIN;
+    }
+    return res;
+}
+
+uint64_t HELPER(neon_sqadd_u64)(CPUARMState *env, uint64_t a, uint64_t b)
+{
+    uint64_t res;
+    res = a + b;
+    /* We only need to look at the pattern of SIGN bits to detect an overflow */
+    if (((a & res)
+         | (~b & res)
+         | (a & ~b)) & SIGNBIT64) {
+        SET_QC();
+        res = INT64_MAX;
+    }
+    return res;
+}
+
+
 #define NEON_USAT(dest, src1, src2, type) do { \
     uint32_t tmp = (uint32_t)src1 - (uint32_t)src2; \
     if (tmp != (type)tmp) { \
diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index 18659d7..9f06450 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -7321,6 +7321,101 @@ static void handle_2misc_narrow(DisasContext *s, bool scalar,
     }
 }
 
+/* Remaining saturating accumulating ops */
+static void handle_2misc_satacc(DisasContext *s, bool is_scalar, bool is_u,
+                                bool is_q, int size, int rn, int rd)
+{
+    bool is_double = (size == 3);
+
+    if (is_double) {
+        TCGv_i64 tcg_rn = tcg_temp_new_i64();
+        TCGv_i64 tcg_rd = tcg_temp_new_i64();
+        int pass;
+
+        for (pass = 0; pass < (is_scalar ? 1 : 2); pass++) {
+            read_vec_element(s, tcg_rn, rn, pass, MO_64);
+            read_vec_element(s, tcg_rd, rd, pass, MO_64);
+
+            if (is_u) { /* USQADD */
+                gen_helper_neon_uqadd_s64(tcg_rd, cpu_env, tcg_rn, tcg_rd);
+            } else { /* SUQADD */
+                gen_helper_neon_sqadd_u64(tcg_rd, cpu_env, tcg_rn, tcg_rd);
+            }
+            write_vec_element(s, tcg_rd, rd, pass, MO_64);
+        }
+        if (is_scalar) {
+            clear_vec_high(s, rd);
+        }
+
+        tcg_temp_free_i64(tcg_rd);
+        tcg_temp_free_i64(tcg_rn);
+    } else {
+        TCGv_i32 tcg_rn = tcg_temp_new_i32();
+        TCGv_i32 tcg_rd = tcg_temp_new_i32();
+        int pass, maxpasses;
+
+        if (is_scalar) {
+            maxpasses = 1;
+        } else {
+            maxpasses = is_q ? 4 : 2;
+        }
+
+        for (pass = 0; pass < maxpasses; pass++) {
+            if (is_scalar) {
+                read_vec_element_i32(s, tcg_rn, rn, pass, size);
+                read_vec_element_i32(s, tcg_rd, rd, pass, size);
+            } else {
+                read_vec_element_i32(s, tcg_rn, rn, pass, MO_32);
+                read_vec_element_i32(s, tcg_rd, rd, pass, MO_32);
+            }
+
+            if (is_u) { /* USQADD */
+                switch (size) {
+                case 0:
+                    gen_helper_neon_uqadd_s8(tcg_rd, cpu_env, tcg_rn, tcg_rd);
+                    break;
+                case 1:
+                    gen_helper_neon_uqadd_s16(tcg_rd, cpu_env, tcg_rn, tcg_rd);
+                    break;
+                case 2:
+                    gen_helper_neon_uqadd_s32(tcg_rd, cpu_env, tcg_rn, tcg_rd);
+                    break;
+                default:
+                    g_assert_not_reached();
+                }
+            } else { /* SUQADD */
+                switch (size) {
+                case 0:
+                    gen_helper_neon_sqadd_u8(tcg_rd, cpu_env, tcg_rn, tcg_rd);
+                    break;
+                case 1:
+                    gen_helper_neon_sqadd_u16(tcg_rd, cpu_env, tcg_rn, tcg_rd);
+                    break;
+                case 2:
+                    gen_helper_neon_sqadd_u32(tcg_rd, cpu_env, tcg_rn, tcg_rd);
+                    break;
+                default:
+                    g_assert_not_reached();
+                }
+            }
+
+            if (is_scalar) {
+                TCGv_i64 tcg_zero = tcg_const_i64(0);
+                write_vec_element(s, tcg_zero, rd, 0, MO_64);
+                tcg_temp_free_i64(tcg_zero);
+            }
+            write_vec_element_i32(s, tcg_rd, rd, pass, MO_32);
+        }
+
+        if (!is_q) {
+            clear_vec_high(s, rd);
+        }
+
+        tcg_temp_free_i32(tcg_rd);
+        tcg_temp_free_i32(tcg_rn);
+    }
+}
+
 /* C3.6.12 AdvSIMD scalar two reg misc
  *  31 30  29 28       24 23  22 21       17 16    12 11 10 9    5 4    0
  * +-----+---+-----------+------+-----------+--------+-----+------+------+
@@ -7340,6 +7435,9 @@ static void disas_simd_scalar_two_reg_misc(DisasContext *s, uint32_t insn)
     TCGv_ptr tcg_fpstatus;
 
     switch (opcode) {
+    case 0x3: /* USQADD / SUQADD */
+        handle_2misc_satacc(s, true, u, false, size, rn, rd);
+        return;
     case 0x7: /* SQABS / SQNEG */
         break;
     case 0xa: /* CMLT */
@@ -7427,10 +7525,7 @@ static void disas_simd_scalar_two_reg_misc(DisasContext *s, uint32_t insn)
         }
         break;
     default:
-        /* Other categories of encoding in this class:
-         * + SUQADD/USQADD/SQABS/SQNEG : size 8, 16, 32 or 64
-         */
-        unsupported_encoding(s, insn);
+        unallocated_encoding(s);
         return;
     }
 
@@ -9194,6 +9289,12 @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn)
         }
         break;
     case 0x3: /* SUQADD, USQADD */
+        if (size == 3 && !is_q) {
+            unallocated_encoding(s);
+            return;
+        }
+        handle_2misc_satacc(s, false, u, is_q, size, rn, rd);
+        return;
     case 0x7: /* SQABS, SQNEG */
         if (size == 3 && !is_q) {
             unallocated_encoding(s);
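
Note for readers of the helper logic: the saturation rules used by the new helpers can be sketched outside QEMU in plain C. This is only an illustrative sketch, not code from the patch. The function names usqadd8, suqadd8, usqadd64 and satacc_selfcheck are hypothetical, and the bool *sat out-parameter is an assumed stand-in for SET_QC(). The 8-bit functions widen to int before clamping, as the USATACC/SSATACC macros do with their wider temporaries; the 64-bit function uses the same sign-bit pattern test as neon_uqadd_s64.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names throughout; *sat stands in for QEMU's QC flag. */

/* USQADD on one 8-bit lane: unsigned accumulator + signed addend,
 * clamped to [0, UINT8_MAX]. */
uint8_t usqadd8(uint8_t acc, int8_t addend, bool *sat)
{
    int sum = (int)acc + (int)addend;       /* range -128..382 */
    if (sum > UINT8_MAX) {
        *sat = true;
        return UINT8_MAX;
    }
    if (sum < 0) {
        *sat = true;
        return 0;
    }
    return (uint8_t)sum;
}

/* SUQADD on one 8-bit lane: signed accumulator + unsigned addend.
 * The addend is never negative, so only the upper bound can be hit. */
int8_t suqadd8(int8_t acc, uint8_t addend, bool *sat)
{
    int sum = (int)acc + (int)addend;       /* range -128..382 */
    if (sum > INT8_MAX) {
        *sat = true;
        return INT8_MAX;
    }
    return (int8_t)sum;
}

/* 64-bit USQADD: no wider type is available, so overflow is detected
 * from the sign bits of the signed addend, the accumulator and the sum. */
uint64_t usqadd64(uint64_t acc, int64_t addend, bool *sat)
{
    const uint64_t sign = UINT64_C(1) << 63;
    uint64_t a = (uint64_t)addend;
    uint64_t res = a + acc;                 /* wraps modulo 2^64 */

    if (~a & acc & ~res & sign) {           /* addend >= 0 and carry out */
        *sat = true;
        return UINT64_MAX;
    }
    if (a & ~acc & res & sign) {            /* addend < 0 and no carry out */
        *sat = true;
        return 0;
    }
    return res;
}

/* A few spot checks of the behaviour described above. */
void satacc_selfcheck(void)
{
    bool sat = false;
    assert(usqadd8(250, 10, &sat) == 255 && sat);
    sat = false;
    assert(usqadd8(5, -10, &sat) == 0 && sat);
    sat = false;
    assert(suqadd8(100, 100, &sat) == 127 && sat);
    sat = false;
    assert(usqadd64(UINT64_MAX, 1, &sat) == UINT64_MAX && sat);
    sat = false;
    assert(usqadd64(10, -3, &sat) == 7 && !sat);
}
```

Widening keeps the 8-bit case easy to read. The sign-bit test is only needed at 64 bits, where the exact sum does not fit in any standard integer type.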