From patchwork Fri Sep 6 11:12:41 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Michael Tokarev X-Patchwork-Id: 825938 Delivered-To: patch@linaro.org Received: by 2002:adf:a345:0:b0:367:895a:4699 with SMTP id d5csp746581wrb; Fri, 6 Sep 2024 04:20:50 -0700 (PDT) X-Forwarded-Encrypted: i=2; AJvYcCVpsL+iJ+9bWk6AB0BV+UQSgLmS1LFI2BYkGUM7HL9lcZBocSG6JqUoZL/v3Gw/9Wy9qGKiUA==@linaro.org X-Google-Smtp-Source: AGHT+IED3eiu6VWaK77RffZNGI/umD1oDcg2CaY0VuNA2xU6eYcmgbzOXTkDtWywsV7nrQK5lgL7 X-Received: by 2002:a05:6830:911:b0:703:6ac4:2ab8 with SMTP id 46e09a7af769-710cc1cd2e8mr2348526a34.0.1725621649720; Fri, 06 Sep 2024 04:20:49 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1725621649; cv=none; d=google.com; s=arc-20240605; b=NZT8whhocrKgq1q+Y5yIglvKFBGmngojQLIb95bS6lwaLhASVX5Xn5ef3Gcb+BocE9 lhW653S861eozJwTZ0Wpq9Bi8Z0cLcEpGoxwnO0705L0GmFG+gPYA5i5T3o2iFE6Q1T+ 1y7RuBC8KP3bN2Xxk0X4Lp9V09+6GGZN3LTfjsHZigf6LK0k6yX6wrGK2KNKqsWrA2Bb XdkSApp4YbD1Ptbl385PRvfU/521BVKch3gey2Qyji2T0xn2G9jy9aakmtB4jLxfubnt HCtuxOAwsvkodyr6datiIBagZbUBYB4N87k8htc48IcpK0GGyWxTkgGRI8HxrAGU6glN S9VQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:cc:to :from; bh=gmFVaIR/B+pwV8/QFmFWNFKhLR9EVK5Jw2DEXJkWPBU=; fh=xJ1URYKcMN3TM0/XAv5v+aCN+5tIbzAdcfBx5UNgoLw=; b=Xcats37c//Q/SE3U3++5r0BhLzoOvO8edCn30xaaYLO7dldguGK7DeSCRK/Cwyfb+H WGV2Lng9bnGTHa20blYr1bk6Y5YmQKdQsiqZm/PsLyccg2QnrnJWjhbYiK80BjopEBeB NskjmlbHg0ShzALykjNhvq2sNNFchh+xDBrEKZksF0TV32kf2nYbRuWqR9tYVaOsqgkK Vqk7sVDt5HeGz9WPtBevn8HsVwE9qis6N1PcyKYZFjLgPRKX+uMbyLlCxcLxxPt/+ZFR E8EYesi9oSTtmMo1D7enAXczpndeXmj/UKrhYexg79pwPGwVquq8rmIFLVMWfDMWEL/f gCJA==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org" Return-Path: Received: from lists.gnu.org (lists.gnu.org. [209.51.188.17]) by mx.google.com with ESMTPS id 6a1803df08f44-6c53062a25csi742086d6.22.2024.09.06.04.20.48 for (version=TLS1_2 cipher=ECDHE-ECDSA-CHACHA20-POLY1305 bits=256/256); Fri, 06 Sep 2024 04:20:49 -0700 (PDT) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; Authentication-Results: mx.google.com; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org" Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1smWxN-0007HT-E5; Fri, 06 Sep 2024 07:16:49 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1smWwk-0004O3-HN; Fri, 06 Sep 2024 07:16:14 -0400 Received: from isrv.corpit.ru ([86.62.121.231]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1smWwi-00083J-Ei; Fri, 06 Sep 2024 07:16:10 -0400 Received: from tsrv.corpit.ru (tsrv.tls.msk.ru [192.168.177.2]) by isrv.corpit.ru (Postfix) with ESMTP id 9A76C8C490; Fri, 6 Sep 2024 14:12:08 +0300 (MSK) Received: from tls.msk.ru (mjt.wg.tls.msk.ru [192.168.177.130]) by tsrv.corpit.ru (Postfix) with SMTP id A61131336FA; Fri, 6 Sep 2024 14:13:26 +0300 (MSK) Received: (nullmailer pid 353647 invoked by uid 1000); Fri, 06 Sep 2024 11:13:24 -0000 From: Michael Tokarev To: qemu-devel@nongnu.org Cc: qemu-stable@nongnu.org, Peter Maydell , Richard Henderson , Michael Tokarev Subject: [Stable-9.0.3 32/69] target/arm: Handle denormals correctly for FMOPA (widening) Date: Fri, 6 Sep 2024 14:12:41 +0300 Message-Id: <20240906111324.353230-32-mjt@tls.msk.ru> X-Mailer: git-send-email 2.39.2 In-Reply-To: References: MIME-Version: 1.0 Received-SPF: pass client-ip=86.62.121.231; envelope-from=mjt@tls.msk.ru; helo=isrv.corpit.ru X-Spam_score_int: -68 X-Spam_score: -6.9 X-Spam_bar: ------ X-Spam_report: (-6.9 / 5.0 requ) BAYES_00=-1.9, RCVD_IN_DNSWL_HI=-5, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: qemu-devel-bounces+patch=linaro.org@nongnu.org From: Peter Maydell The FMOPA (widening) SME instruction takes pairs of half-precision floating point values, widens them to single-precision, does a two-way dot product and accumulates the results into a single-precision destination. We don't quite correctly handle the FPCR bits FZ and FZ16 which control flushing of denormal inputs and outputs. This is because at the moment we pass a single float_status value to the helper function, which then uses that configuration for all the fp operations it does. However, because the inputs to this operation are float16 and the outputs are float32 we need to use the fp_status_f16 for the float16 input widening but the normal fp_status for everything else. Otherwise we will apply the flushing control FPCR.FZ16 to the 32-bit output rather than the FPCR.FZ control, and incorrectly flush a denormal output to zero when we should not (or vice-versa). (In commit 207d30b5fdb5b we tried to fix the FZ handling but didn't get it right, switching from "use FPCR.FZ for everything" to "use FPCR.FZ16 for everything".) (Mjt: it is commit 43929c818c4b in stable-9.0) Pass the CPU env to the sme_fmopa_h helper instead of an fp_status pointer, and have the helper pass an extra fp_status into the f16_dotadd() function so that we can use the right status for the right parts of this operation. Cc: qemu-stable@nongnu.org Fixes: 207d30b5fdb5 ("target/arm: Use FPST_F16 for SME FMOPA (widening)") Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2373 Signed-off-by: Peter Maydell Reviewed-by: Richard Henderson (cherry picked from commit 55f9f4ee018c5ccea81d8c8c586756d7711ae46f) Signed-off-by: Michael Tokarev diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index 27eef49a11..d22bf9d21b 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -121,7 +121,7 @@ DEF_HELPER_FLAGS_5(sme_addha_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(sme_addva_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_7(sme_fmopa_h, TCG_CALL_NO_RWG, - void, ptr, ptr, ptr, ptr, ptr, ptr, i32) + void, ptr, ptr, ptr, ptr, ptr, env, i32) DEF_HELPER_FLAGS_7(sme_fmopa_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_7(sme_fmopa_d, TCG_CALL_NO_RWG, diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index f9001f5213..3906bb51c0 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -976,12 +976,23 @@ static inline uint32_t f16mop_adj_pair(uint32_t pair, uint32_t pg, uint32_t neg) } static float32 f16_dotadd(float32 sum, uint32_t e1, uint32_t e2, - float_status *s_std, float_status *s_odd) + float_status *s_f16, float_status *s_std, + float_status *s_odd) { - float64 e1r = float16_to_float64(e1 & 0xffff, true, s_std); - float64 e1c = float16_to_float64(e1 >> 16, true, s_std); - float64 e2r = float16_to_float64(e2 & 0xffff, true, s_std); - float64 e2c = float16_to_float64(e2 >> 16, true, s_std); + /* + * We need three different float_status for different parts of this + * operation: + * - the input conversion of the float16 values must use the + * f16-specific float_status, so that the FPCR.FZ16 control is applied + * - operations on float32 including the final accumulation must use + * the normal float_status, so that FPCR.FZ is applied + * - we have pre-set-up copy of s_std which is set to round-to-odd, + * for the multiply (see below) + */ + float64 e1r = float16_to_float64(e1 & 0xffff, true, s_f16); + float64 e1c = float16_to_float64(e1 >> 16, true, s_f16); + float64 e2r = float16_to_float64(e2 & 0xffff, true, s_f16); + float64 e2c = float16_to_float64(e2 >> 16, true, s_f16); float64 t64; float32 t32; @@ -1003,20 +1014,23 @@ static float32 f16_dotadd(float32 sum, uint32_t e1, uint32_t e2, } void HELPER(sme_fmopa_h)(void *vza, void *vzn, void *vzm, void *vpn, - void *vpm, void *vst, uint32_t desc) + void *vpm, CPUARMState *env, uint32_t desc) { intptr_t row, col, oprsz = simd_maxsz(desc); uint32_t neg = simd_data(desc) * 0x80008000u; uint16_t *pn = vpn, *pm = vpm; - float_status fpst_odd, fpst_std; + float_status fpst_odd, fpst_std, fpst_f16; /* - * Make a copy of float_status because this operation does not - * update the cumulative fp exception status. It also produces - * default nans. Make a second copy with round-to-odd -- see above. + * Make copies of fp_status and fp_status_f16, because this operation + * does not update the cumulative fp exception status. It also + * produces default NaNs. We also need a second copy of fp_status with + * round-to-odd -- see above. */ - fpst_std = *(float_status *)vst; + fpst_f16 = env->vfp.fp_status_f16; + fpst_std = env->vfp.fp_status; set_default_nan_mode(true, &fpst_std); + set_default_nan_mode(true, &fpst_f16); fpst_odd = fpst_std; set_float_rounding_mode(float_round_to_odd, &fpst_odd); @@ -1036,7 +1050,8 @@ void HELPER(sme_fmopa_h)(void *vza, void *vzn, void *vzm, void *vpn, uint32_t m = *(uint32_t *)(vzm + H1_4(col)); m = f16mop_adj_pair(m, pcol, 0); - *a = f16_dotadd(*a, n, m, &fpst_std, &fpst_odd); + *a = f16_dotadd(*a, n, m, + &fpst_f16, &fpst_std, &fpst_odd); } col += 4; pcol >>= 4; diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index a50a419af2..ae42ddef7b 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -334,8 +334,29 @@ static bool do_outprod_fpst(DisasContext *s, arg_op *a, MemOp esz, return true; } -TRANS_FEAT(FMOPA_h, aa64_sme, do_outprod_fpst, a, - MO_32, FPST_FPCR_F16, gen_helper_sme_fmopa_h) +static bool do_outprod_env(DisasContext *s, arg_op *a, MemOp esz, + gen_helper_gvec_5_ptr *fn) +{ + int svl = streaming_vec_reg_size(s); + uint32_t desc = simd_desc(svl, svl, a->sub); + TCGv_ptr za, zn, zm, pn, pm; + + if (!sme_smza_enabled_check(s)) { + return true; + } + + za = get_tile(s, esz, a->zad); + zn = vec_full_reg_ptr(s, a->zn); + zm = vec_full_reg_ptr(s, a->zm); + pn = pred_full_reg_ptr(s, a->pn); + pm = pred_full_reg_ptr(s, a->pm); + + fn(za, zn, zm, pn, pm, tcg_env, tcg_constant_i32(desc)); + return true; +} + +TRANS_FEAT(FMOPA_h, aa64_sme, do_outprod_env, a, + MO_32, gen_helper_sme_fmopa_h) TRANS_FEAT(FMOPA_s, aa64_sme, do_outprod_fpst, a, MO_32, FPST_FPCR, gen_helper_sme_fmopa_s) TRANS_FEAT(FMOPA_d, aa64_sme_f64f64, do_outprod_fpst, a,