[RFC,14/30] softfloat: 16 bit helpers for shr, clz and rounding and packing

Message ID 20171013162438.32458-15-alex.bennee@linaro.org
State New
Series v8.2 half-precision support (work-in-progress)

Commit Message

Alex Bennée Oct. 13, 2017, 4:24 p.m. UTC
Half-precision helpers for float16 maths. I didn't bother hand-coding
the count leading zeros as we could always fall-back to host-utils if
we needed to.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
 fpu/softfloat-macros.h | 39 +++++++++++++++++++++++++++++++++++++++
 fpu/softfloat.c        | 21 +++++++++++++++++++++
 2 files changed, 60 insertions(+)

-- 
2.14.1

Comments

Richard Henderson Oct. 15, 2017, 6:02 p.m. UTC | #1
On 10/13/2017 09:24 AM, Alex Bennée wrote:
> Half-precision helpers for float16 maths. I didn't bother hand-coding
> the count leading zeros as we could always fall-back to host-utils if
> we needed to.
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>  fpu/softfloat-macros.h | 39 +++++++++++++++++++++++++++++++++++++++
>  fpu/softfloat.c        | 21 +++++++++++++++++++++
>  2 files changed, 60 insertions(+)
>
> diff --git a/fpu/softfloat-macros.h b/fpu/softfloat-macros.h
> index 9cc6158cb4..73091a88a8 100644
> --- a/fpu/softfloat-macros.h
> +++ b/fpu/softfloat-macros.h
> @@ -89,6 +89,31 @@ this code that are retained.
>  # define SOFTFLOAT_GNUC_PREREQ(maj, min) 0
>  #endif
>
> +/*----------------------------------------------------------------------------
> +| Shifts `a' right by the number of bits given in `count'.  If any nonzero
> +| bits are shifted off, they are ``jammed'' into the least significant bit of
> +| the result by setting the least significant bit to 1.  The value of `count'
> +| can be arbitrarily large; in particular, if `count' is greater than 16, the
> +| result will be either 0 or 1, depending on whether `a' is zero or nonzero.
> +| The result is stored in the location pointed to by `zPtr'.
> +*----------------------------------------------------------------------------*/
> +
> +static inline void shift16RightJamming(uint16_t a, int count, uint16_t *zPtr)
> +{
> +    uint16_t z;
> +
> +    if ( count == 0 ) {
> +        z = a;
> +    }
> +    else if ( count < 16 ) {
> +        z = ( a>>count ) | ( ( a<<( ( - count ) & 16 ) ) != 0 );
> +    }
> +    else {
> +        z = ( a != 0 );
> +    }
> +    *zPtr = z;
> +
> +}

When are you going to use a SRJ of a uint16_t?  Isn't most of your actual
arithmetic actually done on uint32_t?
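
For reference, a standalone sketch of a 16-bit shift-right-jamming helper. It is not the code from the patch and the name srj16 is made up for this example. As posted, `( - count ) & 16` evaluates to 16 for every count from 1 to 15, and the left shift is performed on the int-promoted value, so the sticky test is true for any nonzero `a`. The sketch assumes the intent matches the 32-bit helper: mask the count with 15 and truncate back to 16 bits, so that only the bits actually shifted out are tested.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of the patch: returns `a' shifted right by
 * `count', with any nonzero bits shifted out ORed into bit 0. */
static inline uint16_t srj16(uint16_t a, int count)
{
    if (count == 0) {
        return a;
    }
    if (count < 16) {
        /* `a' is promoted to int for the shift; the cast keeps only the
         * low 16 bits, which are exactly the bits lost by a >> count. */
        uint16_t lost = (uint16_t)(a << ((-count) & 15));
        return (uint16_t)((a >> count) | (lost != 0));
    }
    return a != 0;
}

/* A few spot checks of the behaviour described in the comment block. */
void srj16_examples(void)
{
    assert(srj16(0x0100, 4) == 0x0010);  /* nothing lost, no jam bit */
    assert(srj16(0x0101, 4) == 0x0011);  /* low bit lost, jam bit set */
    assert(srj16(0x0001, 20) == 1);      /* count >= 16, nonzero input */
    assert(srj16(0x0000, 20) == 0);      /* count >= 16, zero input */
}
```

Whether a 16-bit variant is needed at all is the open question in this thread; the same logic on uint32_t is what the existing shift32RightJamming already provides.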

> +/*----------------------------------------------------------------------------
> +| Returns the number of leading 0 bits before the most-significant 1 bit of
> +| `a'.  If `a' is zero, 16 is returned.
> +*----------------------------------------------------------------------------*/
> +
> +static int8_t countLeadingZeros16( uint16_t a )
> +{
> +    if (a) {
> +        return __builtin_clz(a);
> +    } else {
> +        return 16;
> +    }
> +}

__builtin_clz works on "int".  You need to use clz32(a) - 16.
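
A minimal sketch of that suggestion, assuming a 32-bit unsigned int. clz32_sketch stands in for QEMU's clz32() from host-utils (assumed here to return 32 for a zero argument), and clz16_sketch is a made-up name for the 16-bit wrapper.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for clz32() from host-utils; assumed to return 32 for zero. */
static inline int clz32_sketch(uint32_t val)
{
    return val ? __builtin_clz(val) : 32;
}

/* Hypothetical 16-bit wrapper: the argument is zero-extended to 32 bits,
 * so the upper 16 leading zeros are subtracted again. */
static inline int clz16_sketch(uint16_t a)
{
    return clz32_sketch(a) - 16;
}

void clz16_examples(void)
{
    assert(clz16_sketch(0x8000) == 0);
    assert(clz16_sketch(0x0001) == 15);
    assert(clz16_sketch(0x0000) == 16);
    /* The raw builtin counts from bit 31 of the promoted value. */
    assert(__builtin_clz(0x0001u) == 31);
}
```

With the zero case handled inside the 32-bit helper, the wrapper needs no branch of its own: 32 - 16 gives the documented result of 16 for a zero input.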

> +/*----------------------------------------------------------------------------
> +| Takes an abstract floating-point value having sign `zSign', exponent `zExp',
> +| and significand `zSig', and returns the proper single-precision floating-

s/single/half/

> +| point value corresponding to the abstract input.  This routine is just like
> +| `roundAndPackFloat32' except that `zSig' does not have to be normalized.
> +| Bit 15 of `zSig' must be zero, and `zExp' must be 1 less than the ``true''
> +| floating-point exponent.
> +*----------------------------------------------------------------------------*/
> +
> +static float16
> + normalizeRoundAndPackFloat16(flag zSign, int zExp, uint16_t zSig,
> +                              float_status *status)
> +{
> +    int8_t shiftCount;
> +
> +    shiftCount = countLeadingZeros16( zSig ) - 1;
> +    return roundAndPackFloat16(zSign, zExp - shiftCount, zSig<<shiftCount,
> +                               true, status);

Do I recall correctly that your lsb is between bits 7:6, like
roundAndPackFloat32?  You've got 11 bits of sig.  Plus 7 bits of extra equals
18 bits.  Which doesn't fit in uint16_t.

So, the reason that roundAndPackFloat32 uses 7 bits is that 7 + 24 == 31.

We can either use a split at (15 - 11 =) 4 bits, and still fit in a uint16_t,
or we can drop uint16_t and admit that the compiler is going to promote to int,
or uint32_t, anyway.  If we do that, we have options of a split between 4 and
(31 - 11 =) 20 bits.

We talked this week re fp->int conversion, it did seem Really Useful when we
noted that sig << exp is representable in a uint32_t.  Which does suggest a
choice at or below (32 - 11 - 14 =) 7.
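
The bit budget above can be checked mechanically. The sketch below only restates the arithmetic from this message: guard_bits is a made-up helper, the container widths of 15 and 31 follow the message (top bit kept clear), and the headroom of 14 is taken from the message as the largest shift that should still fit in a uint32_t.

```c
#include <assert.h>

/* Hypothetical helper: bits left below the significand's lsb once the
 * significand and any reserved headroom are placed in the container. */
static inline int guard_bits(int container_bits, int sig_bits,
                             int headroom_bits)
{
    return container_bits - sig_bits - headroom_bits;
}

void guard_bits_examples(void)
{
    const int f16_sig = 11;  /* 10 fraction bits plus the implicit 1 */
    const int f32_sig = 24;  /* 23 fraction bits plus the implicit 1 */

    assert(guard_bits(31, f32_sig, 0) == 7);   /* roundAndPackFloat32 */
    assert(f16_sig + 7 == 18);                 /* too wide for uint16_t */
    assert(guard_bits(15, f16_sig, 0) == 4);   /* stay within uint16_t */
    assert(guard_bits(31, f16_sig, 0) == 20);  /* widen to uint32_t */
    assert(guard_bits(32, f16_sig, 14) == 7);  /* keep sig << exp in 32 bits */
}
```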


r~

Alex Bennée Oct. 16, 2017, 8:20 a.m. UTC | #2
Richard Henderson <richard.henderson@linaro.org> writes:

> On 10/13/2017 09:24 AM, Alex Bennée wrote:
>> Half-precision helpers for float16 maths. I didn't bother hand-coding
>> the count leading zeros as we could always fall-back to host-utils if
>> we needed to.
>>
>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>> ---
>>  fpu/softfloat-macros.h | 39 +++++++++++++++++++++++++++++++++++++++
>>  fpu/softfloat.c        | 21 +++++++++++++++++++++
>>  2 files changed, 60 insertions(+)
>>
>> diff --git a/fpu/softfloat-macros.h b/fpu/softfloat-macros.h
>> index 9cc6158cb4..73091a88a8 100644
>> --- a/fpu/softfloat-macros.h
>> +++ b/fpu/softfloat-macros.h
>> @@ -89,6 +89,31 @@ this code that are retained.
>>  # define SOFTFLOAT_GNUC_PREREQ(maj, min) 0
>>  #endif
>>
>> +/*----------------------------------------------------------------------------
>> +| Shifts `a' right by the number of bits given in `count'.  If any nonzero
>> +| bits are shifted off, they are ``jammed'' into the least significant bit of
>> +| the result by setting the least significant bit to 1.  The value of `count'
>> +| can be arbitrarily large; in particular, if `count' is greater than 16, the
>> +| result will be either 0 or 1, depending on whether `a' is zero or nonzero.
>> +| The result is stored in the location pointed to by `zPtr'.
>> +*----------------------------------------------------------------------------*/
>> +
>> +static inline void shift16RightJamming(uint16_t a, int count, uint16_t *zPtr)
>> +{
>> +    uint16_t z;
>> +
>> +    if ( count == 0 ) {
>> +        z = a;
>> +    }
>> +    else if ( count < 16 ) {
>> +        z = ( a>>count ) | ( ( a<<( ( - count ) & 16 ) ) != 0 );
>> +    }
>> +    else {
>> +        z = ( a != 0 );
>> +    }
>> +    *zPtr = z;
>> +
>> +}
>
> When are you going to use a SRJ of a uint16_t?  Isn't most of your actual
> arithmetic actually done on uint32_t?

The add/sub code currently uses it. Arguably it could do what it needs
in 32 bits as well, but the spare exponent bits are enough for operating
on the significand. That said, I'm fairly sure it all ends up as 32-bit
in the generated code.

>
>> +/*----------------------------------------------------------------------------
>> +| Returns the number of leading 0 bits before the most-significant 1 bit of
>> +| `a'.  If `a' is zero, 16 is returned.
>> +*----------------------------------------------------------------------------*/
>> +
>> +static int8_t countLeadingZeros16( uint16_t a )
>> +{
>> +    if (a) {
>> +        return __builtin_clz(a);
>> +    } else {
>> +        return 16;
>> +    }
>> +}
>
> __builtin_clz works on "int".  You need to use clz32(a) - 16.

Ahh, my mistake - I'd assumed it had the same smarts as the gcc atomics.
Maybe I should just use our utils functions after all.

>
>> +/*----------------------------------------------------------------------------
>> +| Takes an abstract floating-point value having sign `zSign', exponent `zExp',
>> +| and significand `zSig', and returns the proper single-precision floating-
>
> s/single/half/
>
>> +| point value corresponding to the abstract input.  This routine is just like
>> +| `roundAndPackFloat32' except that `zSig' does not have to be normalized.
>> +| Bit 15 of `zSig' must be zero, and `zExp' must be 1 less than the ``true''
>> +| floating-point exponent.
>> +*----------------------------------------------------------------------------*/
>> +
>> +static float16
>> + normalizeRoundAndPackFloat16(flag zSign, int zExp, uint16_t zSig,
>> +                              float_status *status)
>> +{
>> +    int8_t shiftCount;
>> +
>> +    shiftCount = countLeadingZeros16( zSig ) - 1;
>> +    return roundAndPackFloat16(zSign, zExp - shiftCount, zSig<<shiftCount,
>> +                               true, status);
>
> Do I recall correctly that your lsb is between bits 7:6, like
> roundAndPackFloat32?  You've got 11 bits of sig.  Plus 7 bits of extra equals
> 18 bits.  Which doesn't fit in uint16_t.

No, it takes a 32-bit sig in and deals with it internally.

>
> So, the reason that roundAndPackFloat32 uses 7 bits is that 7 + 24 == 31.
>
> We can either use a split at (15 - 11 =) 4 bits, and still fit in a uint16_t,
> or we can drop uint16_t and admit that the compiler is going to promote to int,
> or uint32_t, anyway.  If we do that, we have options of a split between 4 and
> (31 - 11 =) 20 bits.
>
> We talked this week re fp->int conversion, it did seem Really Useful when we
> noted that sig << exp is representable in a uint32_t.  Which does suggest a
> choice at or below (32 - 11 - 14 =) 7.
>
>
> r~


--
Alex Bennée

Patch

diff --git a/fpu/softfloat-macros.h b/fpu/softfloat-macros.h
index 9cc6158cb4..73091a88a8 100644
--- a/fpu/softfloat-macros.h
+++ b/fpu/softfloat-macros.h
@@ -89,6 +89,31 @@  this code that are retained.
 # define SOFTFLOAT_GNUC_PREREQ(maj, min) 0
 #endif
 
+/*----------------------------------------------------------------------------
+| Shifts `a' right by the number of bits given in `count'.  If any nonzero
+| bits are shifted off, they are ``jammed'' into the least significant bit of
+| the result by setting the least significant bit to 1.  The value of `count'
+| can be arbitrarily large; in particular, if `count' is greater than 16, the
+| result will be either 0 or 1, depending on whether `a' is zero or nonzero.
+| The result is stored in the location pointed to by `zPtr'.
+*----------------------------------------------------------------------------*/
+
+static inline void shift16RightJamming(uint16_t a, int count, uint16_t *zPtr)
+{
+    uint16_t z;
+
+    if ( count == 0 ) {
+        z = a;
+    }
+    else if ( count < 16 ) {
+        z = ( a>>count ) | ( ( a<<( ( - count ) & 16 ) ) != 0 );
+    }
+    else {
+        z = ( a != 0 );
+    }
+    *zPtr = z;
+
+}
 
 /*----------------------------------------------------------------------------
 | Shifts `a' right by the number of bits given in `count'.  If any nonzero
@@ -664,6 +689,20 @@  static uint32_t estimateSqrt32(int aExp, uint32_t a)
 
 }
 
+/*----------------------------------------------------------------------------
+| Returns the number of leading 0 bits before the most-significant 1 bit of
+| `a'.  If `a' is zero, 16 is returned.
+*----------------------------------------------------------------------------*/
+
+static int8_t countLeadingZeros16( uint16_t a )
+{
+    if (a) {
+        return __builtin_clz(a);
+    } else {
+        return 16;
+    }
+}
+
 /*----------------------------------------------------------------------------
 | Returns the number of leading 0 bits before the most-significant 1 bit of
 | `a'.  If `a' is zero, 32 is returned.
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 6ab4b39c09..cf7bf6d4f4 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -3488,6 +3488,27 @@  static float16 roundAndPackFloat16(flag zSign, int zExp,
     return packFloat16(zSign, zExp, zSig >> 13);
 }
 
+/*----------------------------------------------------------------------------
+| Takes an abstract floating-point value having sign `zSign', exponent `zExp',
+| and significand `zSig', and returns the proper single-precision floating-
+| point value corresponding to the abstract input.  This routine is just like
+| `roundAndPackFloat32' except that `zSig' does not have to be normalized.
+| Bit 15 of `zSig' must be zero, and `zExp' must be 1 less than the ``true''
+| floating-point exponent.
+*----------------------------------------------------------------------------*/
+
+static float16
+ normalizeRoundAndPackFloat16(flag zSign, int zExp, uint16_t zSig,
+                              float_status *status)
+{
+    int8_t shiftCount;
+
+    shiftCount = countLeadingZeros16( zSig ) - 1;
+    return roundAndPackFloat16(zSign, zExp - shiftCount, zSig<<shiftCount,
+                               true, status);
+
+}
+
 /*----------------------------------------------------------------------------
 | If `a' is denormal and we are in flush-to-zero mode then set the
 | input-denormal exception and return zero. Otherwise just return the value.