[v4,13/22] fpu/softfloat: re-factor mul

Message ID 20180206164815.10084-14-alex.bennee@linaro.org
State Superseded
Headers show
Series
  • re-factor softfloat and add fp16 functions
Related show

Commit Message

Alex Bennée Feb. 6, 2018, 4:48 p.m.
We can now add float16_mul and use the common decompose and
canonicalize functions to have a single implementation for
float16/32/64 versions.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>


---
v3
  - mul compile fixes for new names
  - remove duplicate s-o-b
  - more explicit inf * zero check
v4
  - use is_nan
  - pick_nan_parts/pick_nan
---
 fpu/softfloat.c         | 209 +++++++++++++++++++-----------------------------
 include/fpu/softfloat.h |   1 +
 2 files changed, 82 insertions(+), 128 deletions(-)

-- 
2.15.1

Comments

Peter Maydell Feb. 13, 2018, 3:20 p.m. | #1
On 6 February 2018 at 16:48, Alex Bennée <alex.bennee@linaro.org> wrote:
> We can now add float16_mul and use the common decompose and

> canonicalize functions to have a single implementation for

> float16/32/64 versions.

>

> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

>

> ---

> v3


> +/*

> + * Returns the result of multiplying the floating-point values `a' and

> + * `b'. The operation is performed according to the IEC/IEEE Standard

> + * for Binary Floating-Point Arithmetic.

> + */

> +

> +static FloatParts mul_floats(FloatParts a, FloatParts b, float_status *s)

> +{

> +    bool sign = a.sign ^ b.sign;

> +

> +    if (a.cls == float_class_normal && b.cls == float_class_normal) {

> +        uint64_t hi, lo;

> +        int exp = a.exp + b.exp;

> +

> +        mul64To128(a.frac, b.frac, &hi, &lo);


It seems a shame that we previously were able to use a
32x32->64 multiply for the float32 case, and now we have to
do an expensive 64x64->128 multiply regardless...

Regardless
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>


thanks
-- PMM
Richard Henderson Feb. 13, 2018, 3:39 p.m. | #2
On 02/13/2018 07:20 AM, Peter Maydell wrote:
>> +static FloatParts mul_floats(FloatParts a, FloatParts b, float_status *s)

>> +{

>> +    bool sign = a.sign ^ b.sign;

>> +

>> +    if (a.cls == float_class_normal && b.cls == float_class_normal) {

>> +        uint64_t hi, lo;

>> +        int exp = a.exp + b.exp;

>> +

>> +        mul64To128(a.frac, b.frac, &hi, &lo);

> 

> It seems a shame that we previously were able to use a

> 32x32->64 multiply for the float32 case, and now we have to

> do an expensive 64x64->128 multiply regardless...


To be fair, I've proposed two different solutions addressing that -- c++
templates and glibc macros -- and you like neither.  Is there a third
alternative that does not involve code duplication?


r~
Alex Bennée Feb. 19, 2018, 4:04 p.m. | #3
Peter Maydell <peter.maydell@linaro.org> writes:

> On 6 February 2018 at 16:48, Alex Bennée <alex.bennee@linaro.org> wrote:

>> We can now add float16_mul and use the common decompose and

>> canonicalize functions to have a single implementation for

>> float16/32/64 versions.

>>

>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

>>

>> ---

>> v3

>

>> +/*

>> + * Returns the result of multiplying the floating-point values `a' and

>> + * `b'. The operation is performed according to the IEC/IEEE Standard

>> + * for Binary Floating-Point Arithmetic.

>> + */

>> +

>> +static FloatParts mul_floats(FloatParts a, FloatParts b, float_status *s)

>> +{

>> +    bool sign = a.sign ^ b.sign;

>> +

>> +    if (a.cls == float_class_normal && b.cls == float_class_normal) {

>> +        uint64_t hi, lo;

>> +        int exp = a.exp + b.exp;

>> +

>> +        mul64To128(a.frac, b.frac, &hi, &lo);

>

> It seems a shame that we previously were able to use a

> 32x32->64 multiply for the float32 case, and now we have to

> do an expensive 64x64->128 multiply regardless...


Actually for mul the hit isn't too bad. When we do a div however you do
notice a bit of a gulf:

 https://i.imgur.com/KMWceo8.png

We could start passing &floatN_params to the functions much like the
sqrt function and be a bit smarter when we do our multiply and let the
compiler figure it out as we go.

Another avenue worth exploring is ensuring we use native Int128 support
where we can so these wide operations can use wide registers where
available.

However both of these things for future optimisations given it doesn't
show up in dbt-bench timings.

>

> Regardless

> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

>

> thanks

> -- PMM



--
Alex Bennée

Patch

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 2190e7de56..6d29e1a103 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -735,6 +735,87 @@  float64 __attribute__((flatten)) float64_sub(float64 a, float64 b,
     return float64_round_pack_canonical(pr, status);
 }
 
+/*
+ * Returns the result of multiplying the floating-point values `a' and
+ * `b'. The operation is performed according to the IEC/IEEE Standard
+ * for Binary Floating-Point Arithmetic.
+ */
+
+static FloatParts mul_floats(FloatParts a, FloatParts b, float_status *s)
+{
+    bool sign = a.sign ^ b.sign;
+
+    if (a.cls == float_class_normal && b.cls == float_class_normal) {
+        uint64_t hi, lo;
+        int exp = a.exp + b.exp;
+
+        mul64To128(a.frac, b.frac, &hi, &lo);
+        shift128RightJamming(hi, lo, DECOMPOSED_BINARY_POINT, &hi, &lo);
+        if (lo & DECOMPOSED_OVERFLOW_BIT) {
+            shift64RightJamming(lo, 1, &lo);
+            exp += 1;
+        }
+
+        /* Re-use a */
+        a.exp = exp;
+        a.sign = sign;
+        a.frac = lo;
+        return a;
+    }
+    /* handle all the NaN cases */
+    if (is_nan(a.cls) || is_nan(b.cls)) {
+        return pick_nan(a, b, s);
+    }
+    /* Inf * Zero == NaN */
+    if ((a.cls == float_class_inf && b.cls == float_class_zero) ||
+        (a.cls == float_class_zero && b.cls == float_class_inf)) {
+        s->float_exception_flags |= float_flag_invalid;
+        a.cls = float_class_dnan;
+        a.sign = sign;
+        return a;
+    }
+    /* Multiply by 0 or Inf */
+    if (a.cls == float_class_inf || a.cls == float_class_zero) {
+        a.sign = sign;
+        return a;
+    }
+    if (b.cls == float_class_inf || b.cls == float_class_zero) {
+        b.sign = sign;
+        return b;
+    }
+    g_assert_not_reached();
+}
+
+float16 __attribute__((flatten)) float16_mul(float16 a, float16 b,
+                                             float_status *status)
+{
+    FloatParts pa = float16_unpack_canonical(a, status);
+    FloatParts pb = float16_unpack_canonical(b, status);
+    FloatParts pr = mul_floats(pa, pb, status);
+
+    return float16_round_pack_canonical(pr, status);
+}
+
+float32 __attribute__((flatten)) float32_mul(float32 a, float32 b,
+                                             float_status *status)
+{
+    FloatParts pa = float32_unpack_canonical(a, status);
+    FloatParts pb = float32_unpack_canonical(b, status);
+    FloatParts pr = mul_floats(pa, pb, status);
+
+    return float32_round_pack_canonical(pr, status);
+}
+
+float64 __attribute__((flatten)) float64_mul(float64 a, float64 b,
+                                             float_status *status)
+{
+    FloatParts pa = float64_unpack_canonical(a, status);
+    FloatParts pb = float64_unpack_canonical(b, status);
+    FloatParts pr = mul_floats(pa, pb, status);
+
+    return float64_round_pack_canonical(pr, status);
+}
+
 /*----------------------------------------------------------------------------
 | Takes a 64-bit fixed-point value `absZ' with binary point between bits 6
 | and 7, and returns the properly rounded 32-bit integer corresponding to the
@@ -2546,70 +2627,6 @@  float32 float32_round_to_int(float32 a, float_status *status)
 
 }
 
-/*----------------------------------------------------------------------------
-| Returns the result of multiplying the single-precision floating-point values
-| `a' and `b'.  The operation is performed according to the IEC/IEEE Standard
-| for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-float32 float32_mul(float32 a, float32 b, float_status *status)
-{
-    flag aSign, bSign, zSign;
-    int aExp, bExp, zExp;
-    uint32_t aSig, bSig;
-    uint64_t zSig64;
-    uint32_t zSig;
-
-    a = float32_squash_input_denormal(a, status);
-    b = float32_squash_input_denormal(b, status);
-
-    aSig = extractFloat32Frac( a );
-    aExp = extractFloat32Exp( a );
-    aSign = extractFloat32Sign( a );
-    bSig = extractFloat32Frac( b );
-    bExp = extractFloat32Exp( b );
-    bSign = extractFloat32Sign( b );
-    zSign = aSign ^ bSign;
-    if ( aExp == 0xFF ) {
-        if ( aSig || ( ( bExp == 0xFF ) && bSig ) ) {
-            return propagateFloat32NaN(a, b, status);
-        }
-        if ( ( bExp | bSig ) == 0 ) {
-            float_raise(float_flag_invalid, status);
-            return float32_default_nan(status);
-        }
-        return packFloat32( zSign, 0xFF, 0 );
-    }
-    if ( bExp == 0xFF ) {
-        if (bSig) {
-            return propagateFloat32NaN(a, b, status);
-        }
-        if ( ( aExp | aSig ) == 0 ) {
-            float_raise(float_flag_invalid, status);
-            return float32_default_nan(status);
-        }
-        return packFloat32( zSign, 0xFF, 0 );
-    }
-    if ( aExp == 0 ) {
-        if ( aSig == 0 ) return packFloat32( zSign, 0, 0 );
-        normalizeFloat32Subnormal( aSig, &aExp, &aSig );
-    }
-    if ( bExp == 0 ) {
-        if ( bSig == 0 ) return packFloat32( zSign, 0, 0 );
-        normalizeFloat32Subnormal( bSig, &bExp, &bSig );
-    }
-    zExp = aExp + bExp - 0x7F;
-    aSig = ( aSig | 0x00800000 )<<7;
-    bSig = ( bSig | 0x00800000 )<<8;
-    shift64RightJamming( ( (uint64_t) aSig ) * bSig, 32, &zSig64 );
-    zSig = zSig64;
-    if ( 0 <= (int32_t) ( zSig<<1 ) ) {
-        zSig <<= 1;
-        --zExp;
-    }
-    return roundAndPackFloat32(zSign, zExp, zSig, status);
-
-}
 
 /*----------------------------------------------------------------------------
 | Returns the result of dividing the single-precision floating-point value `a'
@@ -4142,70 +4159,6 @@  float64 float64_trunc_to_int(float64 a, float_status *status)
     return res;
 }
 
-
-/*----------------------------------------------------------------------------
-| Returns the result of multiplying the double-precision floating-point values
-| `a' and `b'.  The operation is performed according to the IEC/IEEE Standard
-| for Binary Floating-Point Arithmetic.
-*----------------------------------------------------------------------------*/
-
-float64 float64_mul(float64 a, float64 b, float_status *status)
-{
-    flag aSign, bSign, zSign;
-    int aExp, bExp, zExp;
-    uint64_t aSig, bSig, zSig0, zSig1;
-
-    a = float64_squash_input_denormal(a, status);
-    b = float64_squash_input_denormal(b, status);
-
-    aSig = extractFloat64Frac( a );
-    aExp = extractFloat64Exp( a );
-    aSign = extractFloat64Sign( a );
-    bSig = extractFloat64Frac( b );
-    bExp = extractFloat64Exp( b );
-    bSign = extractFloat64Sign( b );
-    zSign = aSign ^ bSign;
-    if ( aExp == 0x7FF ) {
-        if ( aSig || ( ( bExp == 0x7FF ) && bSig ) ) {
-            return propagateFloat64NaN(a, b, status);
-        }
-        if ( ( bExp | bSig ) == 0 ) {
-            float_raise(float_flag_invalid, status);
-            return float64_default_nan(status);
-        }
-        return packFloat64( zSign, 0x7FF, 0 );
-    }
-    if ( bExp == 0x7FF ) {
-        if (bSig) {
-            return propagateFloat64NaN(a, b, status);
-        }
-        if ( ( aExp | aSig ) == 0 ) {
-            float_raise(float_flag_invalid, status);
-            return float64_default_nan(status);
-        }
-        return packFloat64( zSign, 0x7FF, 0 );
-    }
-    if ( aExp == 0 ) {
-        if ( aSig == 0 ) return packFloat64( zSign, 0, 0 );
-        normalizeFloat64Subnormal( aSig, &aExp, &aSig );
-    }
-    if ( bExp == 0 ) {
-        if ( bSig == 0 ) return packFloat64( zSign, 0, 0 );
-        normalizeFloat64Subnormal( bSig, &bExp, &bSig );
-    }
-    zExp = aExp + bExp - 0x3FF;
-    aSig = ( aSig | LIT64( 0x0010000000000000 ) )<<10;
-    bSig = ( bSig | LIT64( 0x0010000000000000 ) )<<11;
-    mul64To128( aSig, bSig, &zSig0, &zSig1 );
-    zSig0 |= ( zSig1 != 0 );
-    if ( 0 <= (int64_t) ( zSig0<<1 ) ) {
-        zSig0 <<= 1;
-        --zExp;
-    }
-    return roundAndPackFloat64(zSign, zExp, zSig0, status);
-
-}
-
 /*----------------------------------------------------------------------------
 | Returns the result of dividing the double-precision floating-point value `a'
 | by the corresponding value `b'.  The operation is performed according to
diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
index 693ece0974..7fc63dd60f 100644
--- a/include/fpu/softfloat.h
+++ b/include/fpu/softfloat.h
@@ -239,6 +239,7 @@  float64 float16_to_float64(float16 a, flag ieee, float_status *status);
 
 float16 float16_add(float16, float16, float_status *status);
 float16 float16_sub(float16, float16, float_status *status);
+float16 float16_mul(float16, float16, float_status *status);
 
 int float16_is_quiet_nan(float16, float_status *status);
 int float16_is_signaling_nan(float16, float_status *status);