[RFC,28/30] softfloat: float16_to_int16 conversion

Series

v8.2 half-precision support (work-in-progress)

[RFC,00/30] v8.2 half-precision support (work-in-progress)
[RFC,01/30] linux-user/main: support dfilter
[RFC,02/30] arm: introduce ARM_V8_FP16 feature bit
[RFC,03/30] include/exec/helper-head.h: support f16 in helper calls
[RFC,04/30] target/arm/cpu.h: update comment for half-precision values
[RFC,05/30] softfloat: implement propagateFloat16NaN
[RFC,06/30] fpu/softfloat: implement float16_squash_input_denormal
[RFC,07/30] fpu/softfloat: implement float16_abs helper
[RFC,08/30] softfloat: add half-precision expansions for MINMAX fns
[RFC,09/30] softfloat: propagate signalling NaNs in MINMAX
[RFC,10/30] softfloat: improve comments on ARM NaN propagation
[RFC,11/30] target/arm: implement half-precision F(MIN|MAX)(V|NMV)
[RFC,12/30] target/arm/translate-a64.c: handle_3same_64 comment fix
[RFC,13/30] target/arm/translate-a64.c: AdvSIMD scalar 3 Same FP16 initial decode
[RFC,14/30] softfloat: 16 bit helpers for shr, clz and rounding and packing
[RFC,15/30] softfloat: half-precision add/sub/mul/div support
[RFC,16/30] target/arm/translate-a64.c: add FP16 FADD/FMUL/FDIV to AdvSIMD 3 Same (!sub)
[RFC,17/30] target/arm/translate-a64.c: add FP16 FMULX
[RFC,18/30] target/arm/translate-a64.c: add AdvSIMD scalar two-reg misc skeleton
[RFC,19/30] Fix mask for AdvancedSIMD 2 reg misc
[RFC,20/30] softfloat: half-precision compare functions
[RFC,21/30] target/arm/translate-a64: add FP16 2-reg misc compare (zero)
[RFC,22/30] target/arm/translate-a64.c: add FP16 FAGCT to AdvSIMD 3 Same
[RFC,23/30] softfloat: add float16_rem and float16_muladd (!CHECK)
[RFC,24/30] disas_simd_indexed: support half-precision operations
[RFC,25/30] softfloat: float16_round_to_int
[RFC,26/30] tests/test-softfloat: add a simple test framework
[RFC,27/30] target/arm/translate-a64.c: add FP16 FRINTP to 2 reg misc
[RFC,28/30] softfloat: float16_to_int16 conversion
[RFC,29/30] tests/test-softfloat: add f16_to_int16 conversion test
[RFC,30/30] target/arm/translate-a64.c: add FP16 FCVTPS to 2 reg misc

Message ID

20171013162438.32458-29-alex.bennee@linaro.org

State

New

Headers

Received-SPF: pass (google.com: domain of
	qemu-devel-bounces+patch=linaro.org@nongnu.org designates
	2001:4830:134:3::11 as permitted sender)
	client-ip=2001:4830:134:3::11; 
From: =?utf-8?q?Alex_Benn=C3=A9e?= <alex.bennee@linaro.org>
To: richard.henderson@linaro.org
Date: Fri, 13 Oct 2017 17:24:36 +0100
Message-Id: <20171013162438.32458-29-alex.bennee@linaro.org>
In-Reply-To: <20171013162438.32458-1-alex.bennee@linaro.org>
References: <20171013162438.32458-1-alex.bennee@linaro.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Subject: [Qemu-devel] [RFC PATCH 28/30] softfloat: float16_to_int16
	conversion
Precedence: list
Cc: peter.maydell@linaro.org, qemu-arm@nongnu.org, =?utf-8?q?Alex_Benn?=
	=?utf-8?b?w6ll?= <alex.bennee@linaro.org>, qemu-devel@nongnu.org,
	Aurelien Jarno <aurelien@aurel32.net>
Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org
Sender: "Qemu-devel" <qemu-devel-bounces+patch=linaro.org@nongnu.org>

Commit Message

Alex Bennée Oct. 13, 2017, 4:24 p.m. UTC

I didn't have another reference for this so I wrote it from first
principles. roundAndPackInt16 works with the same shifted input as
roundAndPackInt32 but uses different constants when testing for
overflow, which raises the invalid exception.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
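Note (not part of the commit): the rounding step may be easier to review
in isolation. Below is a minimal standalone sketch of it, assuming a
fixed-point magnitude with 7 fraction bits as described in the commit
message. The names sketch_round, sketch_result and sketch_round_int16 are
hypothetical and not part of QEMU. The sketch tests for overflow with an
explicit range comparison rather than the sign-XOR test used in the patch,
and it leaves out the ties-away mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the softfloat rounding-mode constants. */
enum sketch_round { SK_NEAREST_EVEN, SK_TO_ZERO, SK_UP, SK_DOWN };

/* Hypothetical result type: the integer plus the flags that would be raised. */
struct sketch_result {
    int16_t value;
    bool invalid;
    bool inexact;
};

/* Round a sign + magnitude value with 7 fraction bits to an int16_t. */
static struct sketch_result sketch_round_int16(bool sign, uint32_t abs_z,
                                               enum sketch_round mode)
{
    struct sketch_result r = { 0, false, false };
    uint32_t round_bits = abs_z & 0x7f;   /* the 7 fraction bits */
    uint32_t inc;

    assert((abs_z >> 31) == 0);           /* bit 31 must be clear */

    switch (mode) {
    case SK_NEAREST_EVEN: inc = 0x40; break;            /* add one half */
    case SK_UP:           inc = sign ? 0 : 0x7f; break; /* towards +inf */
    case SK_DOWN:         inc = sign ? 0x7f : 0; break; /* towards -inf */
    default:              inc = 0; break;               /* truncate */
    }

    uint32_t mag = (abs_z + inc) >> 7;
    if (mode == SK_NEAREST_EVEN && round_bits == 0x40) {
        mag &= ~1u;                       /* exact tie: force an even result */
    }

    /* int16_t is asymmetric: magnitudes up to 0x8000 fit only when negative. */
    uint32_t limit = sign ? 0x8000u : 0x7fffu;
    if (mag > limit) {
        r.invalid = true;
        r.value = sign ? INT16_MIN : INT16_MAX;
        return r;
    }

    r.value = sign ? (int16_t)(-(int32_t)mag) : (int16_t)mag;
    r.inexact = round_bits != 0;
    return r;
}
```

For example, 2.5 is 0x140 in this format and 32768.0 is 0x400000, so the
second value saturates when positive but is exactly representable when
negative.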
 fpu/softfloat.c         | 98 +++++++++++++++++++++++++++++++++++++++++++++++++
 include/fpu/softfloat.h |  1 +
 2 files changed, 99 insertions(+)

-- 
2.14.1

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index dc7f5f6d88..63f7cd1226 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -132,6 +132,62 @@  static inline flag extractFloat16Sign(float16 a)
     return float16_val(a)>>15;
 }
 
+/*----------------------------------------------------------------------------
+| Takes a 32-bit fixed-point value `absZ' with binary point between bits 6
+| and 7, and returns the properly rounded 16-bit integer corresponding to the
+| input.  If `zSign' is 1, the input is negated before being converted to an
+| integer.  Bit 31 of `absZ' must be zero.  Ordinarily, the fixed-point input
+| is simply rounded to an integer, with the inexact exception raised if the
+| input cannot be represented exactly as an integer.  However, if the fixed-
+| point input is too large, the invalid exception is raised and the largest
+| positive or negative integer is returned.
+*----------------------------------------------------------------------------*/
+
+static int16_t roundAndPackInt16(flag zSign, uint32_t absZ, float_status *status)
+{
+    int8_t roundingMode;
+    flag roundNearestEven;
+    int8_t roundIncrement, roundBits;
+    int16_t z;
+
+    roundingMode = status->float_rounding_mode;
+    roundNearestEven = ( roundingMode == float_round_nearest_even );
+
+    switch (roundingMode) {
+    case float_round_nearest_even:
+    case float_round_ties_away:
+        roundIncrement = 0x40;
+        break;
+    case float_round_to_zero:
+        roundIncrement = 0;
+        break;
+    case float_round_up:
+        roundIncrement = zSign ? 0 : 0x7f;
+        break;
+    case float_round_down:
+        roundIncrement = zSign ? 0x7f : 0;
+        break;
+    default:
+        abort();
+    }
+    roundBits = absZ & 0x7F;
+
+    absZ = ( absZ + roundIncrement )>>7;
+    absZ &= ~ ( ( ( roundBits ^ 0x40 ) == 0 ) & roundNearestEven );
+    z = absZ;
+    if ( zSign ) z = - z;
+
+    if ( ( absZ>>16 ) || ( z && ( ( z < 0 ) ^ zSign ) ) ) {
+        float_raise(float_flag_invalid, status);
+        return zSign ? (int16_t) 0x8000 : 0x7FFF;
+    }
+    if (roundBits) {
+        status->float_exception_flags |= float_flag_inexact;
+    }
+    return z;
+
+}
+
 /*----------------------------------------------------------------------------
 | Takes a 64-bit fixed-point value `absZ' with binary point between bits 6
 | and 7, and returns the properly rounded 32-bit integer corresponding to the
@@ -4509,6 +4565,48 @@  int float16_unordered_quiet(float16 a, float16 b, float_status *status)
     return 0;
 }
 
+/*----------------------------------------------------------------------------
+| Returns the result of converting the half-precision floating-point value
+| `a' to the 16-bit two's complement integer format.  The conversion is
+| performed according to the IEC/IEEE Standard for Binary Floating-Point
+| Arithmetic---which means in particular that the conversion is rounded
+| according to the current rounding mode.  If `a' is a NaN, the largest
+| positive integer is returned.  Otherwise, if the conversion overflows, the
+| largest integer with the same sign as `a' is returned.
+*----------------------------------------------------------------------------*/
+
+int16_t float16_to_int16(float16 a, float_status *status)
+{
+    flag aSign;
+    int aExp;
+    uint32_t aSig;
+
+    a = float16_squash_input_denormal(a, status);
+    aSig = extractFloat16Frac( a );
+    aExp = extractFloat16Exp( a );
+    aSign = extractFloat16Sign( a );
+    if ( ( aExp == 0x1F ) && aSig ) aSign = 0;
+    if ( aExp ) aSig |= 0x0400; /* implicit bit */
+
+    /* At this point the binary point is between bits 10 and 9. We need
+     * to shift the significand up by the +ve exponent to get the
+     * integer and then move the binary point down to 7:6 for the
+     * final roundAndPackInt16.
+     *
+     * Even with the maximum +ve shift everything happily fits in the
+     * 32 bit aSig.
+     */
+    aExp -= 15; /* exp bias */
+    if (aExp >= 3) {
+        aSig <<= aExp - 3;
+    } else {
+        /* ensure small numbers still get rounded */
+        shift32RightJamming( aSig, 3 - aExp, &aSig );
+    }
+
+    return roundAndPackInt16(aSign, aSig, status);
+}
+
 /* Half precision floats come in two formats: standard IEEE and "ARM" format.
    The latter gains extra exponent range by omitting the NaN/Inf encodings.  */
 
diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index 856f67cf12..49517b19ea 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -338,6 +338,7 @@  static inline float64 uint16_to_float64(uint16_t v, float_status *status)
 | Software half-precision conversion routines.
 *----------------------------------------------------------------------------*/
 float16 float32_to_float16(float32, flag, float_status *status);
+int16_t float16_to_int16(float16 a, float_status *status);
 float32 float16_to_float32(float16, flag, float_status *status);
 float16 float64_to_float16(float64 a, flag ieee, float_status *status);
 float64 float16_to_float64(float16 a, flag ieee, float_status *status);
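
The shift arithmetic in float16_to_int16 can be checked by hand with a
standalone sketch of the unpack-and-shift step. This is an illustration
under stated assumptions, not the patch itself: sketch_shift_right_jam and
sketch_f16_to_fixed7 are hypothetical names, the input is a raw IEEE
half-precision bit pattern, and the denormal squashing done by
float16_squash_input_denormal is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Right shift that ORs any bits shifted out into bit 0 (a sticky bit). */
static uint32_t sketch_shift_right_jam(uint32_t a, int count)
{
    assert(count >= 0);
    if (count == 0) {
        return a;
    }
    if (count >= 32) {
        return a != 0;
    }
    return (a >> count) | ((a << (32 - count)) != 0);
}

/*
 * Unpack a half-precision bit pattern into a sign and a magnitude with
 * 7 fraction bits, the format the rounding step expects.
 */
static uint32_t sketch_f16_to_fixed7(uint16_t bits, bool *sign)
{
    uint32_t sig = bits & 0x3ff;          /* 10 fraction bits */
    int exp = (bits >> 10) & 0x1f;        /* 5 exponent bits */

    *sign = (bits >> 15) != 0;
    if (exp == 0x1f && sig) {
        *sign = false;                    /* NaN is treated as positive */
    }
    if (exp) {
        sig |= 0x400;                     /* implicit leading one */
    }

    /*
     * The significand has 10 fraction bits and the target has 7, so the
     * net shift is (unbiased exponent - 3).
     */
    exp -= 15;
    if (exp >= 3) {
        return sig << (exp - 3);
    }
    /* Small values keep a sticky bit so they still round as inexact. */
    return sketch_shift_right_jam(sig, 3 - exp);
}
```

With these definitions 1.0 (0x3C00) maps to 128, that is 1.0 scaled by
2^7, and infinity (0x7C00) maps to 0x800000, which is large enough to be
rejected as an overflow by the rounding step.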
