diff mbox

[AArch64] Fix vcvt_high_f64_f32 and vcvt_figh_f32_f64 intrinsics.

Message ID 1441893770-27517-1-git-send-email-james.greenhalgh@arm.com
State Superseded
Headers show

Commit Message

James Greenhalgh Sept. 10, 2015, 2:02 p.m. UTC
On Wed, Sep 09, 2015 at 10:28:28AM +0100, Christophe Lyon wrote:
> On 9 September 2015 at 10:31, James Greenhalgh <james.greenhalgh@arm.com> wrote:
> >
> > Hi,
> >
> > This patch clears up some remaining confusion in the vector lane orderings
> > for the two intrinsics mentioned in the title.
> >
> > Bootstrapped on aarch64-none-linux-gnu and regression tested for
> > aarch64_be-none-elf with no issues.
> >
>
> Does this actually fix an existing testcase?

Yes, of course, sorry - that was a useless introduction to the patch!

First, I've updated the patch with a testcase, which fails for me on
aarch64_be-none-elf but not aarch64-*-*.

The issue is that the RTL folding routines will happily fold through
a vec_concat or a vec_select, which we have given the wrong operands
to when in BYTES_BIG_ENDIAN mode.  The fix is similar to that which we
have elsewhere in aarch64-simd.md, which is to split out the big
and little endian forms of the patterns which need vec_concat, and
to build a vec_par_cnst_*_half mask for the patterns which need
vec_select. This keeps us in the GCC-view of lane ordering.

There is test coverage that these patterns do the right thing for the
vectorizer (I know, because I initially typoed s/le/be and saw tests
gcc.dg/vect fall over), and the new testcase adds coverage for the
expansion path through intrinsics.

I've rebased on top of Alan's patch, which goes halfway to fixing the
issue, but which didn't fix the float_truncate patterns, which had an
incorrect vec_concat. That simplifies the patch considerably.

Rechecked on aarch64_be-none-elf and aarch64-none-linux-gnu with no
issues.

OK?

Thanks,
James

---
gcc/

2015-09-09  James Greenhalgh  <james.greenhalgh@arm.com>

	* config/aarch64/aarch64-simd.md

	(aarch64_float_truncate_hi_v4sf): Rewrite as an expand.
	(aarch64_float_truncate_hi_v4sf_le): New.
	(aarch64_float_truncate_hi_v4sf_be): Likewise.

gcc/testsuite/

2015-09-09  James Greenhalgh  <james.greenhalgh@arm.com>

	* gcc.target/aarch64/advsimd-intrinsics/vcvt_high_1.c: New.

Comments

Christophe Lyon Sept. 20, 2015, 8:38 p.m. UTC | #1
On 10 September 2015 at 16:02, James Greenhalgh
<james.greenhalgh@arm.com> wrote:
>
> On Wed, Sep 09, 2015 at 10:28:28AM +0100, Christophe Lyon wrote:
>> On 9 September 2015 at 10:31, James Greenhalgh <james.greenhalgh@arm.com> wrote:
>> >
>> > Hi,
>> >
>> > This patch clears up some remaining confusion in the vector lane orderings
>> > for the two intrinsics mentioned in the title.
>> >
>> > Bootstrapped on aarch64-none-linux-gnu and regression tested for
>> > aarch64_be-none-elf with no issues.
>> >
>>
>> Does this actually fix an existing testcase?
>
> Yes, of course, sorry - that was a useless introduction to the patch!
>
> First, I've updated the patch with a testcase, which fails for me on
> aarch64_be-none-elf but not aarch64-*-*.
>
> The issue is that the RTL folding routines will happily fold through
> a vec_concat or a vec_select, which we have given the wrong operands
> to when in BYTES_BIG_ENDIAN mode.  The fix is similar to that which we
> have elsewhere in aarch64-simd.md, which is to split out the big
> and little endian forms of the patterns which need vec_concat, and
> to build a vec_par_cnst_*_half mask for the patterns which need
> vec_select. This keeps us in the GCC-view of lane ordering.
>
> There is test coverage that these patterns do the right thing for the
> vectorizer (I know, because I initially typoed s/le/be and saw tests
> gcc.dg/vect fall over), and the new testcase adds coverage for the
> expansion path through intrinsics.
>
> I've rebased on top of Alan's patch, which goes halfway to fixing the
> issue, but which didn't fix the float_truncate patterns, which had an
> incorrect vec_concat. That simplifies the patch considerably.
>
> Rechecked on aarch64_be-none-elf and aarch64-none-linux-gnu with no
> issues.
>
> OK?

The testcase should be modified so that it is skipped on arm* targets.

Christophe.

>
> Thanks,
> James
>
> ---
> gcc/
>
> 2015-09-09  James Greenhalgh  <james.greenhalgh@arm.com>
>
>         * config/aarch64/aarch64-simd.md
>
>         (aarch64_float_truncate_hi_v4sf): Rewrite as an expand.
>         (aarch64_float_truncate_hi_v4sf_le): New.
>         (aarch64_float_truncate_hi_v4sf_be): Likewise.
>
> gcc/testsuite/
>
> 2015-09-09  James Greenhalgh  <james.greenhalgh@arm.com>
>
>         * gcc.target/aarch64/advsimd-intrinsics/vcvt_high_1.c: New.
>
Alan Lawrence Sept. 21, 2015, 9:44 a.m. UTC | #2
[Resending in plain text] This makes sense to me now, although I find
your comment slightly confusing:

[....] in that
+;; the meaning of HI and LO is always taken with a little-endian view of
+;; the vector

You mean vec_unpacks_{hi,lo} (which seems to go against the
*architectural* bit after this), or hi/lo in cases other than
vec_unpack (=> not "always"), or something else?

maybe s/always/usually/ or s/always/otherwise/ ?

Cheers, Alan

On 10 September 2015 at 15:02, James Greenhalgh
<james.greenhalgh@arm.com> wrote:
>
> On Wed, Sep 09, 2015 at 10:28:28AM +0100, Christophe Lyon wrote:
>> On 9 September 2015 at 10:31, James Greenhalgh <james.greenhalgh@arm.com> wrote:
>> >
>> > Hi,
>> >
>> > This patch clears up some remaining confusion in the vector lane orderings
>> > for the two intrinsics mentioned in the title.
>> >
>> > Bootstrapped on aarch64-none-linux-gnu and regression tested for
>> > aarch64_be-none-elf with no issues.
>> >
>>
>> Does this actually fix an existing testcase?
>
> Yes, of course, sorry - that was a useless introduction to the patch!
>
> First, I've updated the patch with a testcase, which fails for me on
> aarch64_be-none-elf but not aarch64-*-*.
>
> The issue is that the RTL folding routines will happily fold through
> a vec_concat or a vec_select, which we have given the wrong operands
> to when in BYTES_BIG_ENDIAN mode.  The fix is similar to that which we
> have elsewhere in aarch64-simd.md, which is to split out the big
> and little endian forms of the patterns which need vec_concat, and
> to build a vec_par_cnst_*_half mask for the patterns which need
> vec_select. This keeps us in the GCC-view of lane ordering.
>
> There is test coverage that these patterns do the right thing for the
> vectorizer (I know, because I initially typoed s/le/be and saw tests
> gcc.dg/vect fall over), and the new testcase adds coverage for the
> expansion path through intrinsics.
>
> I've rebased on top of Alan's patch, which goes halfway to fixing the
> issue, but which didn't fix the float_truncate patterns, which had an
> incorrect vec_concat. That simplifies the patch considerably.
>
> Rechecked on aarch64_be-none-elf and aarch64-none-linux-gnu with no
> issues.
>
> OK?
>
> Thanks,
> James
>
> ---
> gcc/
>
> 2015-09-09  James Greenhalgh  <james.greenhalgh@arm.com>
>
>         * config/aarch64/aarch64-simd.md
>
>         (aarch64_float_truncate_hi_v4sf): Rewrite as an expand.
>         (aarch64_float_truncate_hi_v4sf_le): New.
>         (aarch64_float_truncate_hi_v4sf_be): Likewise.
>
> gcc/testsuite/
>
> 2015-09-09  James Greenhalgh  <james.greenhalgh@arm.com>
>
>         * gcc.target/aarch64/advsimd-intrinsics/vcvt_high_1.c: New.
>
diff mbox

Patch

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index a4eaeca..8be9b97 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1703,6 +1703,13 @@ 
   [(set_attr "type" "neon_fp_cvt_widen_s")]
 )
 
+;; ??? Note that the vectorizer usage of the vec_unpacks_[lo/hi] patterns
+;; is inconsistent with vector ordering elsewhere in the compiler, in that
+;; the meaning of HI and LO is always taken with a little-endian view of
+;; the vector.  Thus, while the patterns below look incorrect in that
+;; vec_unpacks_hi always extracts the high *architectural* lanes of a
+;; vector, their behaviour is as required.
+
 (define_expand "vec_unpacks_lo_<mode>"
   [(match_operand:<VWIDE> 0 "register_operand" "")
    (match_operand:VQ_HSF 1 "register_operand" "")]
@@ -1757,17 +1764,42 @@ 
   [(set_attr "type" "neon_fp_cvt_narrow_d_q")]
 )
 
-(define_insn "aarch64_float_truncate_hi_<Vdbl>"
+(define_insn "aarch64_float_truncate_hi_<Vdbl>_le"
   [(set (match_operand:<VDBL> 0 "register_operand" "=w")
     (vec_concat:<VDBL>
       (match_operand:VDF 1 "register_operand" "0")
       (float_truncate:VDF
 	(match_operand:<VWIDE> 2 "register_operand" "w"))))]
-  "TARGET_SIMD"
+  "TARGET_SIMD && !BYTES_BIG_ENDIAN"
   "fcvtn2\\t%0.<Vdtype>, %2<Vmwtype>"
   [(set_attr "type" "neon_fp_cvt_narrow_d_q")]
 )
 
+(define_insn "aarch64_float_truncate_hi_<Vdbl>_be"
+  [(set (match_operand:<VDBL> 0 "register_operand" "=w")
+    (vec_concat:<VDBL>
+      (float_truncate:VDF
+	(match_operand:<VWIDE> 2 "register_operand" "w"))
+      (match_operand:VDF 1 "register_operand" "0")))]
+  "TARGET_SIMD && BYTES_BIG_ENDIAN"
+  "fcvtn2\\t%0.<Vdtype>, %2<Vmwtype>"
+  [(set_attr "type" "neon_fp_cvt_narrow_d_q")]
+)
+
+(define_expand "aarch64_float_truncate_hi_<Vdbl>"
+  [(match_operand:<VDBL> 0 "register_operand" "=w")
+   (match_operand:VDF 1 "register_operand" "0")
+   (match_operand:<VWIDE> 2 "register_operand" "w")]
+  "TARGET_SIMD"
+{
+  rtx (*gen) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN
+			     ? gen_aarch64_float_truncate_hi_<Vdbl>_be
+			     : gen_aarch64_float_truncate_hi_<Vdbl>_le;
+  emit_insn (gen (operands[0], operands[1], operands[2]));
+  DONE;
+}
+)
+
 (define_expand "vec_pack_trunc_v2df"
   [(set (match_operand:V4SF 0 "register_operand")
       (vec_concat:V4SF
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_high_1.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_high_1.c
new file mode 100644
index 0000000..492d6fd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_high_1.c
@@ -0,0 +1,97 @@ 
+#include "arm_neon.h"
+#include <inttypes.h>
+
+void abort (void);
+
+void
+foo (void)
+{
+  /* Test vcvt_high_f32_f64.  */
+  float32x2_t arg1;
+  float64x2_t arg2;
+  float32x4_t result;
+  arg1 = vcreate_f32 (UINT64_C (0x3f0db5793f6e1892));
+  arg2 = vcombine_f64 (vcreate_f64 (UINT64_C (0x3fe8e49d23fb575d)),
+		       vcreate_f64 (UINT64_C (0x3fd921291b3df73e)));
+  //  Expect: "result" = 3ec909483f4724e93f0db5793f6e1892
+  result = vcvt_high_f32_f64 (arg1, arg2);
+  float32_t got;
+  float32_t exp;
+
+  /* Lane 0.  */
+  got = vgetq_lane_f32 (result, 0);
+  exp = ((float32_t) 0.9300624132156372);
+  if (((((exp / got) < ((float32_t) 0.999))
+	 || ((exp / got) > ((float32_t) 1.001)))
+     && (((exp - got) < ((float32_t) -1.0e-4))
+	 || ((exp - got) > ((float32_t) 1.0e-4)))))
+    abort ();
+
+  /* Lane 1.  */
+  got = vgetq_lane_f32 (result, 1);
+  exp = ((float32_t) 0.5535503029823303);
+  if (((((exp / got) < ((float32_t) 0.999))
+	  || ((exp / got) > ((float32_t) 1.001)))
+     && (((exp - got) < ((float32_t) -1.0e-4))
+	   || ((exp - got) > ((float32_t) 1.0e-4)))))
+    abort ();
+
+  /* Lane 2.  */
+  got = vgetq_lane_f32 (result, 2);
+  exp = ((float32_t) 0.7779069617051665);
+  if (((((exp / got) < ((float32_t) 0.999))
+	  || ((exp / got) > ((float32_t) 1.001)))
+      && (((exp - got) < ((float32_t) -1.0e-4))
+	  || ((exp - got) > ((float32_t) 1.0e-4)))))
+    abort ();
+
+  /* Lane 2.  */
+  got = vgetq_lane_f32 (result, 3);
+  exp = ((float32_t) 0.3926489606891329);
+  if (((((exp / got) < ((float32_t) 0.999))
+	  || ((exp / got) > ((float32_t) 1.001)))
+      && (((exp - got) < ((float32_t) -1.0e-4))
+	  || ((exp - got) > ((float32_t) 1.0e-4)))))
+    abort ();
+}
+
+void
+bar (void)
+{
+  /* Test vcvt_high_f64_f32.  */
+  float32x4_t arg1;
+  float64x2_t result;
+  arg1 = vcombine_f32 (vcreate_f32 (UINT64_C (0x3f7c5cf13f261f74)),
+		       vcreate_f32 (UINT64_C (0x3e3a7bc03f6ccc1d)));
+  //  Expect: "result" = 3fc74f78000000003fed9983a0000000
+  result = vcvt_high_f64_f32 (arg1);
+
+  float64_t got;
+  float64_t exp;
+
+  /* Lane 0.  */
+  got = vgetq_lane_f64 (result, 0);
+  exp = 0.9249895215034485;
+  if (((((exp / got) < 0.999)
+	 || ((exp / got) > 1.001))
+     && (((exp - got) < -1.0e-4)
+	 || ((exp - got) > 1.0e-4))))
+    abort ();
+
+  /* Lane 0.  */
+  got = vgetq_lane_f64 (result, 1);
+  exp = 0.1821126937866211;
+  if (((((exp / got) < 0.999)
+	  || ((exp / got) > 1.001))
+      && (((exp - got) < -1.0e-4)
+	  || ((exp - got) > 1.0e-4))))
+    abort ();
+}
+
+int
+main (int argc, char **argv)
+{
+  foo ();
+  bar ();
+  return 0;
+}