From patchwork Mon Mar 17 22:12:20 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Peter Maydell X-Patchwork-Id: 26443 Return-Path: X-Original-To: linaro@patches.linaro.org Delivered-To: linaro@patches.linaro.org Received: from mail-ie0-f197.google.com (mail-ie0-f197.google.com [209.85.223.197]) by ip-10-151-82-157.ec2.internal (Postfix) with ESMTPS id 3A3A0202FA for ; Mon, 17 Mar 2014 23:36:12 +0000 (UTC) Received: by mail-ie0-f197.google.com with SMTP id rd18sf23587451iec.0 for ; Mon, 17 Mar 2014 16:36:11 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:delivered-to:from:to:date:message-id:in-reply-to :references:mime-version:cc:subject:precedence:list-id :list-unsubscribe:list-archive:list-post:list-help:list-subscribe :errors-to:sender:x-original-sender :x-original-authentication-results:mailing-list:content-type :content-transfer-encoding; bh=OMNldKletY/mR0eCFgMlQ1VbKieri2Ih3v+mDm76dn0=; b=cSC3mjApDeyNpoxFFc9j+t6BDF5ZOPcR54VAZTev/ErSU4+k3BRA6OwM5kDm36Nmz6 b7eEkfULXu3XBu6a7i8Ofc0lW6hlSepdBJZJ+qjh6vG/NL+AWC+M5Nr9CrKC1wfUeqpP 0pZTwaaQJDjBHRqHYMrfy3PVh4//7lZ5qMVah05nlI6cmtCFn1e3Mi3U1Q9gPbblJalv UXL/R9eV+zB/lKAntSdRlqiN5vdy1dPndvJ1WitPZ34MtAGnzvXQ3lRLcugRA8i7BP0C kUQrNmDrBBVKdKKw2XC3N6QQM5bxetupkmEIGc30EWgn0KUXsONb0p0xF6g8V7/Yy6O1 q5YA== X-Gm-Message-State: ALoCoQlDv35cq2St5VVNVLC4s+hKkm6SeJ9s3BMNYsTAWCvgICBo9E6RysIpwEFE75OmeVIoB1dg X-Received: by 10.42.107.146 with SMTP id d18mr8667820icp.8.1395099371608; Mon, 17 Mar 2014 16:36:11 -0700 (PDT) X-BeenThere: patchwork-forward@linaro.org Received: by 10.140.109.9 with SMTP id k9ls1825940qgf.78.gmail; Mon, 17 Mar 2014 16:36:11 -0700 (PDT) X-Received: by 10.58.161.205 with SMTP id xu13mr1352653veb.4.1395099371436; Mon, 17 Mar 2014 16:36:11 -0700 (PDT) Received: from mail-vc0-f173.google.com (mail-vc0-f173.google.com [209.85.220.173]) by mx.google.com with ESMTPS id yb7si3176698vec.55.2014.03.17.16.36.11 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Mon, 17 Mar 2014 16:36:11 -0700 (PDT) Received-SPF: neutral (google.com: 209.85.220.173 is neither permitted nor denied by best guess record for domain of patch+caf_=patchwork-forward=linaro.org@linaro.org) client-ip=209.85.220.173; Received: by mail-vc0-f173.google.com with SMTP id il7so6359711vcb.4 for ; Mon, 17 Mar 2014 16:36:11 -0700 (PDT) X-Received: by 10.58.31.136 with SMTP id a8mr8478092vei.20.1395099371343; Mon, 17 Mar 2014 16:36:11 -0700 (PDT) X-Forwarded-To: patchwork-forward@linaro.org X-Forwarded-For: patch@linaro.org patchwork-forward@linaro.org Delivered-To: patch@linaro.org Received: by 10.220.78.9 with SMTP id i9csp162999vck; Mon, 17 Mar 2014 16:36:10 -0700 (PDT) X-Received: by 10.224.69.197 with SMTP id a5mr5869339qaj.87.1395099370702; Mon, 17 Mar 2014 16:36:10 -0700 (PDT) Received: from lists.gnu.org (lists.gnu.org. [2001:4830:134:3::11]) by mx.google.com with ESMTPS id 5si4221686qch.7.2014.03.17.16.36.10 for (version=TLSv1 cipher=RC4-SHA bits=128/128); Mon, 17 Mar 2014 16:36:10 -0700 (PDT) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; Received: from localhost ([::1]:60540 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WPfnH-0005ih-4o for patch@linaro.org; Mon, 17 Mar 2014 18:14:03 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:51161) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WPfmJ-0004XE-Nj for qemu-devel@nongnu.org; Mon, 17 Mar 2014 18:13:05 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1WPfmH-0007RW-QE for qemu-devel@nongnu.org; Mon, 17 Mar 2014 18:13:03 -0400 Received: from mnementh.archaic.org.uk ([2001:8b0:1d0::1]:46921) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WPfmH-0007Pr-9W for qemu-devel@nongnu.org; Mon, 17 Mar 2014 18:13:01 -0400 Received: from pm215 by mnementh.archaic.org.uk with local (Exim 4.80) (envelope-from ) id 1WPfle-00055u-Nw; Mon, 17 Mar 2014 22:12:22 +0000 From: Peter Maydell To: Anthony Liguori Date: Mon, 17 Mar 2014 22:12:20 +0000 Message-Id: <1395094341-19339-30-git-send-email-peter.maydell@linaro.org> X-Mailer: git-send-email 1.7.10.4 In-Reply-To: <1395094341-19339-1-git-send-email-peter.maydell@linaro.org> References: <1395094341-19339-1-git-send-email-peter.maydell@linaro.org> MIME-Version: 1.0 X-detected-operating-system: by eggs.gnu.org: Error: Malformed IPv6 address (bad octet value). X-Received-From: 2001:8b0:1d0::1 Cc: Blue Swirl , =?UTF-8?q?Andreas=20F=C3=A4rber?= , qemu-devel@nongnu.org, Aurelien Jarno Subject: [Qemu-devel] [PULL 29/30] target-arm: A64: Add [UF]RSQRTE (reciprocal root estimate) X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: , List-Help: , List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: qemu-devel-bounces+patch=linaro.org@nongnu.org X-Removed-Original-Auth: Dkim didn't pass. X-Original-Sender: peter.maydell@linaro.org X-Original-Authentication-Results: mx.google.com; spf=neutral (google.com: 209.85.220.173 is neither permitted nor denied by best guess record for domain of patch+caf_=patchwork-forward=linaro.org@linaro.org) smtp.mail=patch+caf_=patchwork-forward=linaro.org@linaro.org Mailing-list: list patchwork-forward@linaro.org; contact patchwork-forward+owners@linaro.org X-Google-Group-Id: 836684582541 From: Alex Bennée This adds support for [UF]RSQRTE instructions. It utilises the existing NEON helpers with some changes. The changes include an explicit passing of fpstatus (so the correct one is used between arm32 and aarch64), denormilzation, more correct error handling and also proper scaling of the fraction going into the estimate. Signed-off-by: Alex Bennée Reviewed-by: Richard Henderson Message-id: 1394822294-14837-25-git-send-email-peter.maydell@linaro.org Signed-off-by: Peter Maydell --- target-arm/helper.c | 133 ++++++++++++++++++++++++++++++++++++--------- target-arm/helper.h | 5 +- target-arm/translate-a64.c | 27 +++++++-- target-arm/translate.c | 12 +++- 4 files changed, 140 insertions(+), 37 deletions(-) diff --git a/target-arm/helper.c b/target-arm/helper.c index 535fc8f..55077ed 100644 --- a/target-arm/helper.c +++ b/target-arm/helper.c @@ -4721,12 +4721,12 @@ float64 HELPER(recpe_f64)(float64 input, void *fpstp) /* The algorithm that must be used to calculate the estimate * is specified by the ARM ARM. */ -static float64 recip_sqrt_estimate(float64 a, CPUARMState *env) +static float64 recip_sqrt_estimate(float64 a, float_status *real_fp_status) { /* These calculations mustn't set any fp exception flags, * so we use a local copy of the fp_status. */ - float_status dummy_status = env->vfp.standard_fp_status; + float_status dummy_status = *real_fp_status; float_status *s = &dummy_status; float64 q; int64_t q_int; @@ -4773,49 +4773,64 @@ static float64 recip_sqrt_estimate(float64 a, CPUARMState *env) return float64_div(int64_to_float64(q_int, s), float64_256, s); } -float32 HELPER(rsqrte_f32)(float32 a, CPUARMState *env) +float32 HELPER(rsqrte_f32)(float32 input, void *fpstp) { - float_status *s = &env->vfp.standard_fp_status; + float_status *s = fpstp; + float32 f32 = float32_squash_input_denormal(input, s); + uint32_t val = float32_val(f32); + uint32_t f32_sbit = 0x80000000 & val; + int32_t f32_exp = extract32(val, 23, 8); + uint32_t f32_frac = extract32(val, 0, 23); + uint64_t f64_frac; + uint64_t val64; int result_exp; float64 f64; - uint32_t val; - uint64_t val64; - val = float32_val(a); - - if (float32_is_any_nan(a)) { - if (float32_is_signaling_nan(a)) { + if (float32_is_any_nan(f32)) { + float32 nan = f32; + if (float32_is_signaling_nan(f32)) { float_raise(float_flag_invalid, s); + nan = float32_maybe_silence_nan(f32); } - return float32_default_nan; - } else if (float32_is_zero_or_denormal(a)) { - if (!float32_is_zero(a)) { - float_raise(float_flag_input_denormal, s); + if (s->default_nan_mode) { + nan = float32_default_nan; } + return nan; + } else if (float32_is_zero(f32)) { float_raise(float_flag_divbyzero, s); - return float32_set_sign(float32_infinity, float32_is_neg(a)); - } else if (float32_is_neg(a)) { + return float32_set_sign(float32_infinity, float32_is_neg(f32)); + } else if (float32_is_neg(f32)) { float_raise(float_flag_invalid, s); return float32_default_nan; - } else if (float32_is_infinity(a)) { + } else if (float32_is_infinity(f32)) { return float32_zero; } - /* Normalize to a double-precision value between 0.25 and 1.0, + /* Scale and normalize to a double-precision value between 0.25 and 1.0, * preserving the parity of the exponent. */ - if ((val & 0x800000) == 0) { - f64 = make_float64(((uint64_t)(val & 0x80000000) << 32) + + f64_frac = ((uint64_t) f32_frac) << 29; + if (f32_exp == 0) { + while (extract64(f64_frac, 51, 1) == 0) { + f64_frac = f64_frac << 1; + f32_exp = f32_exp-1; + } + f64_frac = extract64(f64_frac, 0, 51) << 1; + } + + if (extract64(f32_exp, 0, 1) == 0) { + f64 = make_float64(((uint64_t) f32_sbit) << 32 | (0x3feULL << 52) - | ((uint64_t)(val & 0x7fffff) << 29)); + | f64_frac); } else { - f64 = make_float64(((uint64_t)(val & 0x80000000) << 32) + f64 = make_float64(((uint64_t) f32_sbit) << 32 | (0x3fdULL << 52) - | ((uint64_t)(val & 0x7fffff) << 29)); + | f64_frac); } - result_exp = (380 - ((val & 0x7f800000) >> 23)) / 2; + result_exp = (380 - f32_exp) / 2; - f64 = recip_sqrt_estimate(f64, env); + f64 = recip_sqrt_estimate(f64, s); val64 = float64_val(f64); @@ -4824,6 +4839,69 @@ float32 HELPER(rsqrte_f32)(float32 a, CPUARMState *env) return make_float32(val); } +float64 HELPER(rsqrte_f64)(float64 input, void *fpstp) +{ + float_status *s = fpstp; + float64 f64 = float64_squash_input_denormal(input, s); + uint64_t val = float64_val(f64); + uint64_t f64_sbit = 0x8000000000000000ULL & val; + int64_t f64_exp = extract64(val, 52, 11); + uint64_t f64_frac = extract64(val, 0, 52); + int64_t result_exp; + uint64_t result_frac; + + if (float64_is_any_nan(f64)) { + float64 nan = f64; + if (float64_is_signaling_nan(f64)) { + float_raise(float_flag_invalid, s); + nan = float64_maybe_silence_nan(f64); + } + if (s->default_nan_mode) { + nan = float64_default_nan; + } + return nan; + } else if (float64_is_zero(f64)) { + float_raise(float_flag_divbyzero, s); + return float64_set_sign(float64_infinity, float64_is_neg(f64)); + } else if (float64_is_neg(f64)) { + float_raise(float_flag_invalid, s); + return float64_default_nan; + } else if (float64_is_infinity(f64)) { + return float64_zero; + } + + /* Scale and normalize to a double-precision value between 0.25 and 1.0, + * preserving the parity of the exponent. */ + + if (f64_exp == 0) { + while (extract64(f64_frac, 51, 1) == 0) { + f64_frac = f64_frac << 1; + f64_exp = f64_exp - 1; + } + f64_frac = extract64(f64_frac, 0, 51) << 1; + } + + if (extract64(f64_exp, 0, 1) == 0) { + f64 = make_float64(f64_sbit + | (0x3feULL << 52) + | f64_frac); + } else { + f64 = make_float64(f64_sbit + | (0x3fdULL << 52) + | f64_frac); + } + + result_exp = (3068 - f64_exp) / 2; + + f64 = recip_sqrt_estimate(f64, s); + + result_frac = extract64(float64_val(f64), 0, 52); + + return make_float64(f64_sbit | + ((result_exp & 0x7ff) << 52) | + result_frac); +} + uint32_t HELPER(recpe_u32)(uint32_t a, void *fpstp) { float_status *s = fpstp; @@ -4841,8 +4919,9 @@ uint32_t HELPER(recpe_u32)(uint32_t a, void *fpstp) return 0x80000000 | ((float64_val(f64) >> 21) & 0x7fffffff); } -uint32_t HELPER(rsqrte_u32)(uint32_t a, CPUARMState *env) +uint32_t HELPER(rsqrte_u32)(uint32_t a, void *fpstp) { + float_status *fpst = fpstp; float64 f64; if ((a & 0xc0000000) == 0) { @@ -4857,7 +4936,7 @@ uint32_t HELPER(rsqrte_u32)(uint32_t a, CPUARMState *env) | ((uint64_t)(a & 0x3fffffff) << 22)); } - f64 = recip_sqrt_estimate(f64, env); + f64 = recip_sqrt_estimate(f64, fpst); return 0x80000000 | ((float64_val(f64) >> 21) & 0x7fffffff); } diff --git a/target-arm/helper.h b/target-arm/helper.h index f96a824..a3d6f32 100644 --- a/target-arm/helper.h +++ b/target-arm/helper.h @@ -169,9 +169,10 @@ DEF_HELPER_3(recps_f32, f32, f32, f32, env) DEF_HELPER_3(rsqrts_f32, f32, f32, f32, env) DEF_HELPER_FLAGS_2(recpe_f32, TCG_CALL_NO_RWG, f32, f32, ptr) DEF_HELPER_FLAGS_2(recpe_f64, TCG_CALL_NO_RWG, f64, f64, ptr) -DEF_HELPER_2(rsqrte_f32, f32, f32, env) +DEF_HELPER_FLAGS_2(rsqrte_f32, TCG_CALL_NO_RWG, f32, f32, ptr) +DEF_HELPER_FLAGS_2(rsqrte_f64, TCG_CALL_NO_RWG, f64, f64, ptr) DEF_HELPER_2(recpe_u32, i32, i32, ptr) -DEF_HELPER_2(rsqrte_u32, i32, i32, env) +DEF_HELPER_FLAGS_2(rsqrte_u32, TCG_CALL_NO_RWG, i32, i32, ptr) DEF_HELPER_5(neon_tbl, i32, env, i32, i32, i32, i32) DEF_HELPER_3(shl_cc, i32, env, i32, i32) diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c index 235f880..befffac 100644 --- a/target-arm/translate-a64.c +++ b/target-arm/translate-a64.c @@ -7146,6 +7146,9 @@ static void handle_2misc_reciprocal(DisasContext *s, int opcode, case 0x3f: /* FRECPX */ gen_helper_frecpx_f64(tcg_res, tcg_op, fpst); break; + case 0x7d: /* FRSQRTE */ + gen_helper_rsqrte_f64(tcg_res, tcg_op, fpst); + break; default: g_assert_not_reached(); } @@ -7181,6 +7184,9 @@ static void handle_2misc_reciprocal(DisasContext *s, int opcode, case 0x3f: /* FRECPX */ gen_helper_frecpx_f32(tcg_res, tcg_op, fpst); break; + case 0x7d: /* FRSQRTE */ + gen_helper_rsqrte_f32(tcg_res, tcg_op, fpst); + break; default: g_assert_not_reached(); } @@ -7378,6 +7384,7 @@ static void disas_simd_scalar_two_reg_misc(DisasContext *s, uint32_t insn) } case 0x3d: /* FRECPE */ case 0x3f: /* FRECPX */ + case 0x7d: /* FRSQRTE */ handle_2misc_reciprocal(s, opcode, true, u, true, size, rn, rd); return; case 0x1a: /* FCVTNS */ @@ -7404,9 +7411,6 @@ static void disas_simd_scalar_two_reg_misc(DisasContext *s, uint32_t insn) } handle_2misc_narrow(s, true, opcode, u, false, size - 1, rn, rd); return; - case 0x7d: /* FRSQRTE */ - unsupported_encoding(s, insn); - return; default: unallocated_encoding(s); return; @@ -9255,6 +9259,11 @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) } /* fall through */ case 0x3d: /* FRECPE */ + case 0x7d: /* FRSQRTE */ + if (size == 3 && !is_q) { + unallocated_encoding(s); + return; + } handle_2misc_reciprocal(s, opcode, false, u, is_q, size, rn, rd); return; case 0x56: /* FCVTXN, FCVTXN2 */ @@ -9297,9 +9306,12 @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) } break; case 0x7c: /* URSQRTE */ - case 0x7d: /* FRSQRTE */ - unsupported_encoding(s, insn); - return; + if (size == 3) { + unallocated_encoding(s); + return; + } + need_fpstatus = true; + break; default: unallocated_encoding(s); return; @@ -9432,6 +9444,9 @@ static void disas_simd_two_reg_misc(DisasContext *s, uint32_t insn) case 0x59: /* FRINTX */ gen_helper_rints_exact(tcg_res, tcg_op, tcg_fpstatus); break; + case 0x7c: /* URSQRTE */ + gen_helper_rsqrte_u32(tcg_res, tcg_op, tcg_fpstatus); + break; default: g_assert_not_reached(); } diff --git a/target-arm/translate.c b/target-arm/translate.c index 3771953..56e3b4b 100644 --- a/target-arm/translate.c +++ b/target-arm/translate.c @@ -6689,8 +6689,12 @@ static int disas_neon_data_insn(CPUARMState * env, DisasContext *s, uint32_t ins break; } case NEON_2RM_VRSQRTE: - gen_helper_rsqrte_u32(tmp, tmp, cpu_env); + { + TCGv_ptr fpstatus = get_fpstatus_ptr(1); + gen_helper_rsqrte_u32(tmp, tmp, fpstatus); + tcg_temp_free_ptr(fpstatus); break; + } case NEON_2RM_VRECPE_F: { TCGv_ptr fpstatus = get_fpstatus_ptr(1); @@ -6699,8 +6703,12 @@ static int disas_neon_data_insn(CPUARMState * env, DisasContext *s, uint32_t ins break; } case NEON_2RM_VRSQRTE_F: - gen_helper_rsqrte_f32(cpu_F0s, cpu_F0s, cpu_env); + { + TCGv_ptr fpstatus = get_fpstatus_ptr(1); + gen_helper_rsqrte_f32(cpu_F0s, cpu_F0s, fpstatus); + tcg_temp_free_ptr(fpstatus); break; + } case NEON_2RM_VCVT_FS: /* VCVT.F32.S32 */ gen_vfp_sito(0, 1); break;