[v2,2/3] fpu/softfloat: support ARM Alternative half-precision

Message ID 20180502154344.10585-3-alex.bennee@linaro.org
State New
Series refactor float-to-float conversions and fix AHP

Commit Message

Alex Bennée May 2, 2018, 3:43 p.m. UTC
For float16 ARM supports an alternative half-precision format which
sacrifices the ability to represent NaN/Inf in return for a higher
dynamic range. To support this I've added an additional
FloatFmt (float16_params_ahp).

The new FloatFmt flag (arm_althp) is then used to modify the behaviour
of canonicalize and round_canonical with respect to representation and
exception raising.

Finally the float16_to_floatN and floatN_to_float16 conversion
routines select the new alternative FloatFmt when !ieee.
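
For illustration only, a minimal sketch of what the alternative
encoding means, assuming the usual IEEE 5/10 field split: exp == 31 is
just the top normal exponent, so there is no Inf/NaN encoding and the
largest magnitude is 2047 * 2^6 = 131008.

    #include <math.h>    /* ldexp */
    #include <stdint.h>

    /* Sketch: decode an AHP-format float16 to double. */
    static double ahp_to_double(uint16_t h)
    {
        int sign = h >> 15;
        int exp = (h >> 10) & 0x1f;
        int frac = h & 0x3ff;
        /* normals: (1024 + frac) * 2^(exp - 25), including exp == 31 */
        double v = exp ? ldexp(1024 + frac, exp - 25)
                       : ldexp(frac, -24);  /* subnormals */
        return sign ? -v : v;
    }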

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

---
 fpu/softfloat.c | 97 +++++++++++++++++++++++++++++++++++++------------
 1 file changed, 74 insertions(+), 23 deletions(-)

-- 
2.17.0

Comments

Peter Maydell May 3, 2018, 6:17 p.m. UTC | #1
On 2 May 2018 at 16:43, Alex Bennée <alex.bennee@linaro.org> wrote:
> For float16 ARM supports an alternative half-precision format which
> sacrifices the ability to represent NaN/Inf in return for a higher
> dynamic range. To support this I've added an additional
> FloatFmt (float16_params_ahp).
>
> The new FloatFmt flag (arm_althp) is then used to modify the behaviour
> of canonicalize and round_canonical with respect to representation and
> exception raising.
>
> Finally the float16_to_floatN and floatN_to_float16 conversion
> routines select the new alternative FloatFmt when !ieee.
>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>  fpu/softfloat.c | 97 +++++++++++++++++++++++++++++++++++++------------
>  1 file changed, 74 insertions(+), 23 deletions(-)

I found some corner cases where this patchset introduces
regressions; details below... They're not all althp related
but I started composing this email in reply to patch 2/3 and
don't want to try to move it to replying to the cover letter now :-)


(1) Here's an example of a wrong 32->16 conversion in alt-hp mode:
for FCVT h5, s0 where s0 is an SNaN, your code gives 0x7E00
when it should give 0x0, because float16a_round_pack_canonical()
tries to return a NaN, which doesn't exist in alt-HP.

In the Arm ARM pseudocode this case is handled by the
FPConvert pseudocode, which treats alt_hp specially
when the input is a NaN. On that analogy I put this into
float_to_float(), which seems to have the desired effect:

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 25a331158f..1cc368175d 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -1256,6 +1256,17 @@ static FloatParts float_to_float(FloatParts a,
             s->float_exception_flags |= float_flag_invalid;
         }

+        if (dstf->arm_althp) {
+            /* There is no NaN in the destination format: raise Invalid
+             * and return a zero with the sign of the input NaN.
+             */
+            s->float_exception_flags |= float_flag_invalid;
+            a.cls = float_class_zero;
+            a.frac = 0;
+            a.exp = 0;
+            return a;
+        }
+
         if (s->default_nan_mode) {
             a.cls = float_class_dnan;
             return a;

(2) You're also failing to set the Inexact flag for cases like
 fcvt h1, s0     where s0 is 0x33000000
which should return a result of 0 with flags Inexact | Underflow,
but after this patchset returns just Underflow.

I think this is because you're not dealing with the
odd handling of flush-to-zero for half-precision:
in the v8A Arm ARM pseudocode, float-to-float conversion
uses the FPRoundCV() function, which squashes FZ16 to 0,
and FPRoundBase() looks at fpcr.FZ for 32 and 64 bit floats
but fpcr.FZ16 for 16 bit floats. (FZ16 exists only with the
halfprec extension, and effectively applies only for the
data processing and int<->fp conversions -- see FPRound().)

In our old code we handled this implicitly by having
roundAndPackFloat16() not check status->flush_to_zero the way
that roundAndPackFloat32/64 did. Now you've combined them all
into one code path, you need to do some special casing, I think.
This change fixes things for the fcvt case, but (a) is a
dubious hack and (b) you'll want to do something different
to handle FZ16 for actual arithmetic operations on halfprec.
If architectures other than Arm use halfprec (MIPS seems to)
then we should check what their semantics are to see if they
match Arm.

--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -453,7 +453,7 @@ static FloatParts round_canonical(FloatParts p, float_status *s,
                     flags |= float_flag_inexact;
                 }
             }
-        } else if (s->flush_to_zero) {
+        } else if (s->flush_to_zero && parm->exp_size != 5) {
             flags |= float_flag_output_denormal;
             p.cls = float_class_zero;
             goto do_zero;
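
Something along these lines exercises both of the cases above (a
hypothetical standalone check against the softfloat API in this
series, not an existing test):

    #include <assert.h>
    #include "fpu/softfloat.h"

    static void check_fcvt_corners(void)
    {
        float_status st = { 0 };  /* round-to-nearest-even, no flags set */
        float16 r;

        /* (1) SNaN input, alt-HP output: expect +0 and Invalid */
        r = float32_to_float16(make_float32(0x7f800001), false, &st);
        assert(float16_val(r) == 0x0000);
        assert(st.float_exception_flags & float_flag_invalid);

        /* (2) 2^-25 input, IEEE output: expect 0, Inexact | Underflow */
        st.float_exception_flags = 0;
        r = float32_to_float16(make_float32(0x33000000), true, &st);
        assert(float16_val(r) == 0x0000);
        assert(st.float_exception_flags & float_flag_inexact);
        assert(st.float_exception_flags & float_flag_underflow);
    }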

(3) Here's a NaN case we get wrong now: 64 to IEEE-16 conversion,
input is 0x7ff0000000000001 (an SNaN), we produce
0x7c00 (infinity) but should produce 0x7e00 (a QNaN).

This is because this code in float16a_round_pack_canonical():

    case float_class_msnan:
        return float16_maybe_silence_nan(float16_pack_raw(p), s);

doesn't consider the possibility that float16_pack_raw()
ends up with something that's not a NaN. In this case
because the float-to-float conversion has thrown away the
bottom bits of the double's mantissa, we have p.frac == 0,
and float16_pack_raw() gives 0x7c00, which is an infinity,
not a NaN. So when float16_maybe_silence_nan() calls
float16_is_signaling_nan() on it, it returns false and then
we don't change the SNaN bit.
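
Concretely, with the exponent saturated and the payload gone, the raw
pack is just field concatenation (a sketch with ad-hoc locals, not the
real FloatParts layout):

    /* exp_max = 31 for float16; frac == 0 because the 64->16 conversion
     * threw away the low mantissa bits. Packing yields +Inf, not a NaN. */
    uint16_t sign = 0, exp = 31, frac = 0;
    uint16_t packed = (sign << 15) | (exp << 10) | frac;  /* 0x7c00 */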

The code as of this patch seems to be a bit confused:
it does part of the conversion of NaNs from one format
to the other in float_to_float() (which is where it's
fiddling with the frac bits) and part of it in
the round_canonical() case (where it then messes
about with quietening the NaN). In an ideal world
this would all be punted out to the softfloat-specialize
code to convert with access to the full details of the
input number, because it's impdef how NaN conversion handles
the fraction bits. Arm happens to choose to use the
most significant bits of the fraction field, but there's
no theoretical reason why you couldn't have an
implementation that wanted to preserve the least
significant bits, for instance.
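
(Arm's choice amounts to something like the following -- a sketch of a
possible specialize hook, names invented:)

    /* Move a NaN payload between formats by keeping its most significant
     * bits, as Arm does; an implementation preserving the least
     * significant bits would mask instead of shifting. */
    static uint64_t nan_payload_convert(uint64_t frac,
                                        int src_frac_size,
                                        int dst_frac_size)
    {
        if (dst_frac_size < src_frac_size) {
            return frac >> (src_frac_size - dst_frac_size);
        }
        return frac << (dst_frac_size - src_frac_size);
    }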

Note also that we currently have workarounds at the target/arm
level for the softfloat code not quietening input NaNs for
fp-to-fp conversion: see the uses of float*_maybe_silence_nan()
after float*_to_float* calls in target/arm/helper.c.
If the softfloat code is now going to get these correct then
we can drop those. HPPA, MIPS, RISCV and S390x have similar
workarounds also. Overall, the maybe_silence_nan function
was a dubious workaround for not having been able to do
the NaN handling when we had a fully unpacked value, and
perhaps we can minimise its use or even get rid of it...
(target/i386 notably does not do this, we should check how
SSE and x87 handle NaNs in fp conversions first.)

thanks
-- PMM
Alex Bennée May 3, 2018, 7:41 p.m. UTC | #2
Peter Maydell <peter.maydell@linaro.org> writes:

> On 2 May 2018 at 16:43, Alex Bennée <alex.bennee@linaro.org> wrote:
>> For float16 ARM supports an alternative half-precision format which
>> sacrifices the ability to represent NaN/Inf in return for a higher
>> dynamic range. To support this I've added an additional
>> FloatFmt (float16_params_ahp).
>>
>> The new FloatFmt flag (arm_althp) is then used to modify the behaviour
>> of canonicalize and round_canonical with respect to representation and
>> exception raising.
>>
>> Finally the float16_to_floatN and floatN_to_float16 conversion
>> routines select the new alternative FloatFmt when !ieee.
>>
>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>> ---
>>  fpu/softfloat.c | 97 +++++++++++++++++++++++++++++++++++++------------
>>  1 file changed, 74 insertions(+), 23 deletions(-)
>
> I found some corner cases where this patchset introduces
> regressions; details below... They're not all althp related
> but I started composing this email in reply to patch 2/3 and
> don't want to try to move it to replying to the cover letter now :-)
>
> (1) Here's an example of a wrong 32->16 conversion in alt-hp mode:
> for FCVT h5, s0 where s0 is an SNaN, your code gives 0x7E00
> when it should give 0x0, because float16a_round_pack_canonical()
> tries to return a NaN, which doesn't exist in alt-HP.
>
> In the Arm ARM pseudocode this case is handled by the
> FPConvert pseudocode, which treats alt_hp specially
> when the input is a NaN. On that analogy I put this into
> float_to_float(), which seems to have the desired effect:
>
> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
> index 25a331158f..1cc368175d 100644
> --- a/fpu/softfloat.c
> +++ b/fpu/softfloat.c
> @@ -1256,6 +1256,17 @@ static FloatParts float_to_float(FloatParts a,
>              s->float_exception_flags |= float_flag_invalid;
>          }
>
> +        if (dstf->arm_althp) {
> +            /* There is no NaN in the destination format: raise Invalid
> +             * and return a zero with the sign of the input NaN.
> +             */
> +            s->float_exception_flags |= float_flag_invalid;
> +            a.cls = float_class_zero;
> +            a.frac = 0;
> +            a.exp = 0;
> +            return a;
> +        }
> +

Doh. The previous version had handling for this in float_to_float, but
my test case came back clean and I thought I'd handled it all in the
unpack/canonicalize step.

>          if (s->default_nan_mode) {
>              a.cls = float_class_dnan;
>              return a;
>

> (2) You're also failing to set the Inexact flag for cases like
>  fcvt h1, s0     where s0 is 0x33000000
> which should return a result of 0 with flags Inexact | Underflow,
> but after this patchset returns just Underflow.

More cases for the test case ;-)

>
> I think this is because you're not dealing with the
> odd handling of flush-to-zero for half-precision:
> in the v8A Arm ARM pseudocode, float-to-float conversion
> uses the FPRoundCV() function, which squashes FZ16 to 0,
> and FPRoundBase() looks at fpcr.FZ for 32 and 64 bit floats
> but fpcr.FZ16 for 16 bit floats. (FZ16 exists only with the
> halfprec extension, and effectively applies only for the
> data processing and int<->fp conversions -- see FPRound().)
>
> In our old code we handled this implicitly by having
> roundAndPackFloat16() not check status->flush_to_zero the way
> that roundAndPackFloat32/64 did. Now you've combined them all
> into one code path, you need to do some special casing, I think.
> This change fixes things for the fcvt case, but (a) is a
> dubious hack and (b) you'll want to do something different
> to handle FZ16 for actual arithmetic operations on halfprec.

We handle FZ16 by passing a different float_status for halfprec. But I
guess the fcvt helpers can do the save/squash/restore dance.
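
i.e. something like this in the conversion helper (a sketch with
invented names; whatever float_status the halfprec ops actually use
would be passed in):

    /* Hypothetical fcvt helper: implement FPRoundCV's "treat FZ16 as
     * zero" by squashing flush_to_zero around the conversion. */
    static float16 fcvt_f32_to_f16(float32 a, float_status *fpst)
    {
        flag save_ftz = fpst->flush_to_zero;
        float16 r;

        fpst->flush_to_zero = 0;         /* FPRoundCV: ignore FZ16 */
        r = float32_to_float16(a, true, fpst);
        fpst->flush_to_zero = save_ftz;  /* restore */
        return r;
    }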

> If architectures other than Arm use halfprec (MIPS seems to)
> then we should check what their semantics are to see if they
> match Arm.

There are helpers but I couldn't see them actually being called. I was
half tempted to delete the helpers and rationalise the softfloat API
convention but decided against it to avoid churn.

>
> --- a/fpu/softfloat.c
> +++ b/fpu/softfloat.c
> @@ -453,7 +453,7 @@ static FloatParts round_canonical(FloatParts p, float_status *s,
>                      flags |= float_flag_inexact;
>                  }
>              }
> -        } else if (s->flush_to_zero) {
> +        } else if (s->flush_to_zero && parm->exp_size != 5) {
>              flags |= float_flag_output_denormal;
>              p.cls = float_class_zero;
>              goto do_zero;
>
> (3) Here's a NaN case we get wrong now: 64 to IEEE-16 conversion,
> input is 0x7ff0000000000001 (an SNaN), we produce
> 0x7c00 (infinity) but should produce 0x7e00 (a QNaN).

OK.

>
> This is because this code in float16a_round_pack_canonical():
>
>     case float_class_msnan:
>         return float16_maybe_silence_nan(float16_pack_raw(p), s);
>
> doesn't consider the possibility that float16_pack_raw()
> ends up with something that's not a NaN. In this case
> because the float-to-float conversion has thrown away the
> bottom bits of the double's mantissa, we have p.frac == 0,
> and float16_pack_raw() gives 0x7c00, which is an infinity,
> not a NaN. So when float16_maybe_silence_nan() calls
> float16_is_signaling_nan() on it, it returns false and then
> we don't change the SNaN bit.
>
> The code as of this patch seems to be a bit confused:
> it does part of the conversion of NaNs from one format
> to the other in float_to_float() (which is where it's
> fiddling with the frac bits) and part of it in
> the round_canonical() case (where it then messes
> about with quietening the NaN). In an ideal world
> this would all be punted out to the softfloat-specialize
> code to convert with access to the full details of the
> input number, because it's impdef how NaN conversion handles
> the fraction bits. Arm happens to choose to use the
> most significant bits of the fraction field, but there's
> no theoretical reason why you couldn't have an
> implementation that wanted to preserve the least
> significant bits, for instance.
>
> Note also that we currently have workarounds at the target/arm
> level for the softfloat code not quietening input NaNs for
> fp-to-fp conversion: see the uses of float*_maybe_silence_nan()
> after float*_to_float* calls in target/arm/helper.c.
> If the softfloat code is now going to get these correct then
> we can drop those. HPPA, MIPS, RISCV and S390x have similar
> workarounds also. Overall, the maybe_silence_nan function
> was a dubious workaround for not having been able to do
> the NaN handling when we had a fully unpacked value, and
> perhaps we can minimise its use or even get rid of it...
> (target/i386 notably does not do this, we should check how
> SSE and x87 handle NaNs in fp conversions first.)

I guess it is time to expose some of the details of the unpacked float
handling to specialize so it's not an after-the-fact hack.

>
> thanks
> -- PMM

--
Alex Bennée
Richard Henderson May 3, 2018, 8:09 p.m. UTC | #3
On 05/03/2018 11:17 AM, Peter Maydell wrote:
> (target/i386 notably does not do this, we should check how
> SSE and x87 handle NaNs in fp conversions first.)

Hardware does silence NaNs.  I tested that earlier:

https://lists.gnu.org/archive/html/qemu-devel/2018-04/msg03114.html


r~
Alex Bennée May 4, 2018, 12:26 p.m. UTC | #4
Richard Henderson <richard.henderson@linaro.org> writes:

> On 05/03/2018 11:17 AM, Peter Maydell wrote:
>> (target/i386 notably does not do this, we should check how
>> SSE and x87 handle NaNs in fp conversions first.)
>
> Hardware does silence NaNs.  I tested that earlier:
>
> https://lists.gnu.org/archive/html/qemu-devel/2018-04/msg03114.html

Does that include SSE? I know the hardware will silence NaNs if the
value is ever pushed into an x87 register (as GCC will do when
spilling/filling floats).

>
>
> r~

--
Alex Bennée
Richard Henderson May 4, 2018, 3:36 p.m. UTC | #5
On 05/04/2018 05:26 AM, Alex Bennée wrote:
>
> Richard Henderson <richard.henderson@linaro.org> writes:
>
>> On 05/03/2018 11:17 AM, Peter Maydell wrote:
>>> (target/i386 notably does not do this, we should check how
>>> SSE and x87 handle NaNs in fp conversions first.)
>>
>> Hardware does silence NaNs.  I tested that earlier:
>>
>> https://lists.gnu.org/archive/html/qemu-devel/2018-04/msg03114.html
>
> Does that include SSE?

Yes.

> I know the hardware will silence NaNs if the
> value is ever pushed into an x87 register (as GCC will do when
> spilling/filling floats).

Actually, loads and stores are purely bit operations that do not modify data.


r~

Patch

diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 28b9f4f79b..25a331158f 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -234,6 +234,8 @@  typedef struct {
  *   frac_lsb: least significant bit of fraction
  *   fram_lsbm1: the bit bellow the least significant bit (for rounding)
  *   round_mask/roundeven_mask: masks used for rounding
+ * The following optional modifiers are available:
+ *   arm_althp: handle ARM Alternative Half Precision
  */
 typedef struct {
     int exp_size;
@@ -245,6 +247,7 @@  typedef struct {
     uint64_t frac_lsbm1;
     uint64_t round_mask;
     uint64_t roundeven_mask;
+    bool arm_althp;
 } FloatFmt;
 
 /* Expand fields based on the size of exponent and fraction */
@@ -257,12 +260,17 @@  typedef struct {
     .frac_lsb       = 1ull << (DECOMPOSED_BINARY_POINT - F),         \
     .frac_lsbm1     = 1ull << ((DECOMPOSED_BINARY_POINT - F) - 1),   \
     .round_mask     = (1ull << (DECOMPOSED_BINARY_POINT - F)) - 1,   \
-    .roundeven_mask = (2ull << (DECOMPOSED_BINARY_POINT - F)) - 1
+    .roundeven_mask = (2ull << (DECOMPOSED_BINARY_POINT - F)) - 1,
 
 static const FloatFmt float16_params = {
     FLOAT_PARAMS(5, 10)
 };
 
+static const FloatFmt float16_params_ahp = {
+    FLOAT_PARAMS(5, 10)
+    .arm_althp = true
+};
+
 static const FloatFmt float32_params = {
     FLOAT_PARAMS(8, 23)
 };
@@ -326,7 +334,7 @@  static inline float64 float64_pack_raw(FloatParts p)
 static FloatParts canonicalize(FloatParts part, const FloatFmt *parm,
                                float_status *status)
 {
-    if (part.exp == parm->exp_max) {
+    if (part.exp == parm->exp_max && !parm->arm_althp) {
         if (part.frac == 0) {
             part.cls = float_class_inf;
         } else {
@@ -412,8 +420,9 @@  static FloatParts round_canonical(FloatParts p, float_status *s,
 
         exp += parm->exp_bias;
         if (likely(exp > 0)) {
+            bool maybe_inexact = false;
             if (frac & round_mask) {
-                flags |= float_flag_inexact;
+                maybe_inexact = true;
                 frac += inc;
                 if (frac & DECOMPOSED_OVERFLOW_BIT) {
                     frac >>= 1;
@@ -422,14 +431,26 @@  static FloatParts round_canonical(FloatParts p, float_status *s,
             }
             frac >>= frac_shift;
 
-            if (unlikely(exp >= exp_max)) {
-                flags |= float_flag_overflow | float_flag_inexact;
-                if (overflow_norm) {
-                    exp = exp_max - 1;
-                    frac = -1;
-                } else {
-                    p.cls = float_class_inf;
-                    goto do_inf;
+            if (parm->arm_althp) {
+                if (unlikely(exp >= exp_max + 1)) {
+                        flags |= float_flag_invalid;
+                        frac = -1;
+                        exp = exp_max;
+                } else if (maybe_inexact) {
+                    flags |= float_flag_inexact;
+                }
+            } else {
+                if (unlikely(exp >= exp_max)) {
+                    flags |= float_flag_overflow | float_flag_inexact;
+                    if (overflow_norm) {
+                        exp = exp_max - 1;
+                        frac = -1;
+                    } else {
+                        p.cls = float_class_inf;
+                        goto do_inf;
+                    }
+                } else if (maybe_inexact) {
+                    flags |= float_flag_inexact;
                 }
             }
         } else if (s->flush_to_zero) {
@@ -474,7 +495,13 @@  static FloatParts round_canonical(FloatParts p, float_status *s,
     case float_class_inf:
     do_inf:
         exp = exp_max;
-        frac = 0;
+        if (parm->arm_althp) {
+            flags |= float_flag_invalid;
+            /* Alt HP returns result = sign:Ones(M-1) */
+            frac = -1;
+        } else {
+            frac = 0;
+        }
         break;
 
     case float_class_qnan:
@@ -492,12 +519,21 @@  static FloatParts round_canonical(FloatParts p, float_status *s,
     return p;
 }
 
+/* Explicit FloatFmt version */
+static FloatParts float16a_unpack_canonical(const FloatFmt *params,
+                                            float16 f, float_status *s)
+{
+    return canonicalize(float16_unpack_raw(f), params, s);
+}
+
 static FloatParts float16_unpack_canonical(float16 f, float_status *s)
 {
-    return canonicalize(float16_unpack_raw(f), &float16_params, s);
+    return float16a_unpack_canonical(&float16_params, f, s);
 }
 
-static float16 float16_round_pack_canonical(FloatParts p, float_status *s)
+
+static float16 float16a_round_pack_canonical(const FloatFmt *params,
+                                             FloatParts p, float_status *s)
 {
     switch (p.cls) {
     case float_class_dnan:
@@ -505,11 +541,16 @@  static float16 float16_round_pack_canonical(FloatParts p, float_status *s)
     case float_class_msnan:
         return float16_maybe_silence_nan(float16_pack_raw(p), s);
     default:
-        p = round_canonical(p, s, &float16_params);
+        p = round_canonical(p, s, params);
         return float16_pack_raw(p);
     }
 }
 
+static float16 float16_round_pack_canonical(FloatParts p, float_status *s)
+{
+    return float16a_round_pack_canonical(&float16_params, p, s);
+}
+
 static FloatParts float32_unpack_canonical(float32 f, float_status *s)
 {
     return canonicalize(float32_unpack_raw(f), &float32_params, s);
@@ -1235,25 +1276,34 @@  static FloatParts float_to_float(FloatParts a,
     return a;
 }
 
+/*
+ * Currently non-ieee implies ARM Alternative Half Precision handling
+ * for float16 values. If more are needed we'll need to expand the API
+ * into softfloat.
+ */
+
 float32 float16_to_float32(float16 a, bool ieee, float_status *s)
 {
-    FloatParts p = float16_unpack_canonical(a, s);
-    FloatParts pr = float_to_float(p, &float16_params, &float32_params, s);
+    const FloatFmt *fmt16 = ieee ? &float16_params : &float16_params_ahp;
+    FloatParts p = float16a_unpack_canonical(fmt16, a, s);
+    FloatParts pr = float_to_float(p, fmt16, &float32_params, s);
     return float32_round_pack_canonical(pr, s);
 }
 
 float64 float16_to_float64(float16 a, bool ieee, float_status *s)
 {
-    FloatParts p = float16_unpack_canonical(a, s);
-    FloatParts pr = float_to_float(p, &float16_params, &float64_params, s);
+    const FloatFmt *fmt16 = ieee ? &float16_params : &float16_params_ahp;
+    FloatParts p = float16a_unpack_canonical(fmt16, a, s);
+    FloatParts pr = float_to_float(p, fmt16, &float64_params, s);
     return float64_round_pack_canonical(pr, s);
 }
 
 float16 float32_to_float16(float32 a, bool ieee, float_status *s)
 {
+    const FloatFmt *fmt16 = ieee ? &float16_params : &float16_params_ahp;
     FloatParts p = float32_unpack_canonical(a, s);
-    FloatParts pr = float_to_float(p, &float32_params, &float16_params, s);
-    return float16_round_pack_canonical(pr, s);
+    FloatParts pr = float_to_float(p, &float32_params, fmt16, s);
+    return float16a_round_pack_canonical(fmt16, pr, s);
 }
 
 float64 float32_to_float64(float32 a, float_status *s)
@@ -1265,9 +1315,10 @@  float64 float32_to_float64(float32 a, float_status *s)
 
 float16 float64_to_float16(float64 a, bool ieee, float_status *s)
 {
+    const FloatFmt *fmt16 = ieee ? &float16_params : &float16_params_ahp;
     FloatParts p = float64_unpack_canonical(a, s);
-    FloatParts pr = float_to_float(p, &float64_params, &float16_params, s);
-    return float16_round_pack_canonical(pr, s);
+    FloatParts pr = float_to_float(p, &float64_params, fmt16, s);
+    return float16a_round_pack_canonical(fmt16, pr, s);
 }
 
 float32 float64_to_float32(float64 a, float_status *s)