From patchwork Fri Sep 6 06:53:57 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Michael Tokarev X-Patchwork-Id: 825924 Delivered-To: patch@linaro.org Received: by 2002:adf:a345:0:b0:367:895a:4699 with SMTP id d5csp664748wrb; Fri, 6 Sep 2024 00:04:36 -0700 (PDT) X-Forwarded-Encrypted: i=2; AJvYcCVXdIWeMM8Xg7CGG2tEa2knxERv+E9w1Xr99NFXBqq6oBLkcq4BopbIaLNZyYorrvGwRELqBQ==@linaro.org X-Google-Smtp-Source: AGHT+IHdXmexrPwvMfypNw3CfZujQMSwqvW2C7tF8RqXbGsEnTFl1mvpCt7EiCjIsCllViU6s/31 X-Received: by 2002:a05:620a:370a:b0:7a7:da1e:fe2d with SMTP id af79cd13be357-7a8041bb467mr3116552685a.25.1725606276259; Fri, 06 Sep 2024 00:04:36 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1725606276; cv=none; d=google.com; s=arc-20240605; b=eG1MiDwWeN2bTAJ8feC6J8k0AcEerrJ5G4EmRD5pwr4a8L1jMIUu4vZaR/hMRv/HBC 3m8MFhX+l6aong9HqkJ1uOoW31EUXzb4pCNku+k4PxndJUo5eMgwD9IEAQN4OEhIgrVJ wwVgd8og1R5o1JJonBuiGhX3SgQXvcLHI9Ww4moU461CCMJP20mUCZtGQDd/hf0J1y87 kkuJYluL8K1iW+xev4n91wBFXp8D9pgW7CaYxI+IeNCB+PTpx61F6fb2Wj//guDgMgh3 jQLFr9TUr7aCK49vltOg9p4uKF7rLU9MS5ULDjSaaJCrehj6O+BLNPveJ7k/Qzbnj37o D4QA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:references:in-reply-to:message-id:date:subject:cc:to :from; bh=B6UMtjlZGw29ztUu2T90k4F48ytfJKpl3S6o53CxgGw=; fh=xJ1URYKcMN3TM0/XAv5v+aCN+5tIbzAdcfBx5UNgoLw=; b=eWd/FCY/BLbHp7Zy6bNZum4gHb8bTMznCkDb4Li08n8plr2bb/2UKKXUpRm5L1mLKn yd3qagXSqgI/k1ssT9D0cUtPhU5Y3AUaEIebC9zY3W+3r1p2nVH4psRNt2Kts19u96XS EMBotY4SE6jAXgJ9jS4KnezSwHOHfCVO/htiTYW4HsuVNaHGoH2Gx1gQ0ZdyzEarrCU4 5iTure8oUnE5OYLB95URW6ZplGiPDZKREmYY/qcHuMvXA0qxO0VB7JSkSF1DyRNOrXjv CB2960Q0T69PWzeOJWeN+K6viglxuqRPQoXffcrbrr4BO8xmztCEuD0Af7rX+H2/GA2X 3bKw==; dara=google.com ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org" Return-Path: Received: from lists.gnu.org (lists.gnu.org. [209.51.188.17]) by mx.google.com with ESMTPS id af79cd13be357-7a98ef1e487si374050385a.133.2024.09.06.00.04.36 for (version=TLS1_2 cipher=ECDHE-ECDSA-CHACHA20-POLY1305 bits=256/256); Fri, 06 Sep 2024 00:04:36 -0700 (PDT) Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; Authentication-Results: mx.google.com; spf=pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom="qemu-devel-bounces+patch=linaro.org@nongnu.org" Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1smSuR-0002eS-2a; Fri, 06 Sep 2024 02:57:31 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1smSto-0008PZ-MM; Fri, 06 Sep 2024 02:56:55 -0400 Received: from isrv.corpit.ru ([86.62.121.231]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1smStm-0003TC-7S; Fri, 06 Sep 2024 02:56:52 -0400 Received: from tsrv.corpit.ru (tsrv.tls.msk.ru [192.168.177.2]) by isrv.corpit.ru (Postfix) with ESMTP id 6BD0F8C250; Fri, 6 Sep 2024 09:53:13 +0300 (MSK) Received: from tls.msk.ru (mjt.wg.tls.msk.ru [192.168.177.130]) by tsrv.corpit.ru (Postfix) with SMTP id 33B94133408; Fri, 6 Sep 2024 09:54:31 +0300 (MSK) Received: (nullmailer pid 43529 invoked by uid 1000); Fri, 06 Sep 2024 06:54:30 -0000 From: Michael Tokarev To: qemu-devel@nongnu.org Cc: qemu-stable@nongnu.org, Peter Maydell , Richard Henderson , Michael Tokarev Subject: [Stable-8.2.7 27/53] target/arm: Handle denormals correctly for FMOPA (widening) Date: Fri, 6 Sep 2024 09:53:57 +0300 Message-Id: <20240906065429.42415-27-mjt@tls.msk.ru> X-Mailer: git-send-email 2.39.2 In-Reply-To: References: MIME-Version: 1.0 Received-SPF: pass client-ip=86.62.121.231; envelope-from=mjt@tls.msk.ru; helo=isrv.corpit.ru X-Spam_score_int: -68 X-Spam_score: -6.9 X-Spam_bar: ------ X-Spam_report: (-6.9 / 5.0 requ) BAYES_00=-1.9, RCVD_IN_DNSWL_HI=-5, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: qemu-devel-bounces+patch=linaro.org@nongnu.org From: Peter Maydell The FMOPA (widening) SME instruction takes pairs of half-precision floating point values, widens them to single-precision, does a two-way dot product and accumulates the results into a single-precision destination. We don't quite correctly handle the FPCR bits FZ and FZ16 which control flushing of denormal inputs and outputs. This is because at the moment we pass a single float_status value to the helper function, which then uses that configuration for all the fp operations it does. However, because the inputs to this operation are float16 and the outputs are float32 we need to use the fp_status_f16 for the float16 input widening but the normal fp_status for everything else. Otherwise we will apply the flushing control FPCR.FZ16 to the 32-bit output rather than the FPCR.FZ control, and incorrectly flush a denormal output to zero when we should not (or vice-versa). (In commit 207d30b5fdb5b we tried to fix the FZ handling but didn't get it right, switching from "use FPCR.FZ for everything" to "use FPCR.FZ16 for everything".) (Mjt: it is commit 4975f9fc4ea3 in stable-8.2) Pass the CPU env to the sme_fmopa_h helper instead of an fp_status pointer, and have the helper pass an extra fp_status into the f16_dotadd() function so that we can use the right status for the right parts of this operation. Cc: qemu-stable@nongnu.org Fixes: 207d30b5fdb5 ("target/arm: Use FPST_F16 for SME FMOPA (widening)") Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2373 Signed-off-by: Peter Maydell Reviewed-by: Richard Henderson (cherry picked from commit 55f9f4ee018c5ccea81d8c8c586756d7711ae46f) Signed-off-by: Michael Tokarev diff --git a/target/arm/tcg/helper-sme.h b/target/arm/tcg/helper-sme.h index 27eef49a11..d22bf9d21b 100644 --- a/target/arm/tcg/helper-sme.h +++ b/target/arm/tcg/helper-sme.h @@ -121,7 +121,7 @@ DEF_HELPER_FLAGS_5(sme_addha_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_5(sme_addva_d, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_7(sme_fmopa_h, TCG_CALL_NO_RWG, - void, ptr, ptr, ptr, ptr, ptr, ptr, i32) + void, ptr, ptr, ptr, ptr, ptr, env, i32) DEF_HELPER_FLAGS_7(sme_fmopa_s, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, ptr, ptr, ptr, i32) DEF_HELPER_FLAGS_7(sme_fmopa_d, TCG_CALL_NO_RWG, diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c index f9001f5213..3906bb51c0 100644 --- a/target/arm/tcg/sme_helper.c +++ b/target/arm/tcg/sme_helper.c @@ -976,12 +976,23 @@ static inline uint32_t f16mop_adj_pair(uint32_t pair, uint32_t pg, uint32_t neg) } static float32 f16_dotadd(float32 sum, uint32_t e1, uint32_t e2, - float_status *s_std, float_status *s_odd) + float_status *s_f16, float_status *s_std, + float_status *s_odd) { - float64 e1r = float16_to_float64(e1 & 0xffff, true, s_std); - float64 e1c = float16_to_float64(e1 >> 16, true, s_std); - float64 e2r = float16_to_float64(e2 & 0xffff, true, s_std); - float64 e2c = float16_to_float64(e2 >> 16, true, s_std); + /* + * We need three different float_status for different parts of this + * operation: + * - the input conversion of the float16 values must use the + * f16-specific float_status, so that the FPCR.FZ16 control is applied + * - operations on float32 including the final accumulation must use + * the normal float_status, so that FPCR.FZ is applied + * - we have pre-set-up copy of s_std which is set to round-to-odd, + * for the multiply (see below) + */ + float64 e1r = float16_to_float64(e1 & 0xffff, true, s_f16); + float64 e1c = float16_to_float64(e1 >> 16, true, s_f16); + float64 e2r = float16_to_float64(e2 & 0xffff, true, s_f16); + float64 e2c = float16_to_float64(e2 >> 16, true, s_f16); float64 t64; float32 t32; @@ -1003,20 +1014,23 @@ static float32 f16_dotadd(float32 sum, uint32_t e1, uint32_t e2, } void HELPER(sme_fmopa_h)(void *vza, void *vzn, void *vzm, void *vpn, - void *vpm, void *vst, uint32_t desc) + void *vpm, CPUARMState *env, uint32_t desc) { intptr_t row, col, oprsz = simd_maxsz(desc); uint32_t neg = simd_data(desc) * 0x80008000u; uint16_t *pn = vpn, *pm = vpm; - float_status fpst_odd, fpst_std; + float_status fpst_odd, fpst_std, fpst_f16; /* - * Make a copy of float_status because this operation does not - * update the cumulative fp exception status. It also produces - * default nans. Make a second copy with round-to-odd -- see above. + * Make copies of fp_status and fp_status_f16, because this operation + * does not update the cumulative fp exception status. It also + * produces default NaNs. We also need a second copy of fp_status with + * round-to-odd -- see above. */ - fpst_std = *(float_status *)vst; + fpst_f16 = env->vfp.fp_status_f16; + fpst_std = env->vfp.fp_status; set_default_nan_mode(true, &fpst_std); + set_default_nan_mode(true, &fpst_f16); fpst_odd = fpst_std; set_float_rounding_mode(float_round_to_odd, &fpst_odd); @@ -1036,7 +1050,8 @@ void HELPER(sme_fmopa_h)(void *vza, void *vzn, void *vzm, void *vpn, uint32_t m = *(uint32_t *)(vzm + H1_4(col)); m = f16mop_adj_pair(m, pcol, 0); - *a = f16_dotadd(*a, n, m, &fpst_std, &fpst_odd); + *a = f16_dotadd(*a, n, m, + &fpst_f16, &fpst_std, &fpst_odd); } col += 4; pcol >>= 4; diff --git a/target/arm/tcg/translate-sme.c b/target/arm/tcg/translate-sme.c index a50a419af2..ae42ddef7b 100644 --- a/target/arm/tcg/translate-sme.c +++ b/target/arm/tcg/translate-sme.c @@ -334,8 +334,29 @@ static bool do_outprod_fpst(DisasContext *s, arg_op *a, MemOp esz, return true; } -TRANS_FEAT(FMOPA_h, aa64_sme, do_outprod_fpst, a, - MO_32, FPST_FPCR_F16, gen_helper_sme_fmopa_h) +static bool do_outprod_env(DisasContext *s, arg_op *a, MemOp esz, + gen_helper_gvec_5_ptr *fn) +{ + int svl = streaming_vec_reg_size(s); + uint32_t desc = simd_desc(svl, svl, a->sub); + TCGv_ptr za, zn, zm, pn, pm; + + if (!sme_smza_enabled_check(s)) { + return true; + } + + za = get_tile(s, esz, a->zad); + zn = vec_full_reg_ptr(s, a->zn); + zm = vec_full_reg_ptr(s, a->zm); + pn = pred_full_reg_ptr(s, a->pn); + pm = pred_full_reg_ptr(s, a->pm); + + fn(za, zn, zm, pn, pm, tcg_env, tcg_constant_i32(desc)); + return true; +} + +TRANS_FEAT(FMOPA_h, aa64_sme, do_outprod_env, a, + MO_32, gen_helper_sme_fmopa_h) TRANS_FEAT(FMOPA_s, aa64_sme, do_outprod_fpst, a, MO_32, FPST_FPCR, gen_helper_sme_fmopa_s) TRANS_FEAT(FMOPA_d, aa64_sme_f64f64, do_outprod_fpst, a,