Message ID | CAKdteObyBRRiYN9PSxwKUhm7iKQycTLsynFiviAu5txx3dDhOg@mail.gmail.com |
---|---|
State | Accepted |
Commit | 6f105c502c1d53ea7352d2c20d66b8e0c87c1fa2 |
Headers | show |
On 26 January 2015 at 13:10, Tejas Belagod <tejas.belagod@arm.com> wrote: > On 25/01/15 21:05, Christophe Lyon wrote: >> >> On 23 January 2015 at 14:44, Christophe Lyon <christophe.lyon@linaro.org> >> wrote: >>> >>> On 23 January 2015 at 12:42, Christophe Lyon <christophe.lyon@linaro.org> >>> wrote: >>>> >>>> On 23 January 2015 at 11:18, Tejas Belagod <tejas.belagod@arm.com> >>>> wrote: >>>>> >>>>> On 22/01/15 21:31, Christophe Lyon wrote: >>>>>> >>>>>> >>>>>> On 22 January 2015 at 16:22, Tejas Belagod <tejas.belagod@arm.com> >>>>>> wrote: >>>>>>> >>>>>>> >>>>>>> On 22/01/15 14:28, Christophe Lyon wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On 22 January 2015 at 12:19, Tejas Belagod <tejas.belagod@arm.com> >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On 21/01/15 15:07, Christophe Lyon wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 19 January 2015 at 17:54, Marcus Shawcroft >>>>>>>>>> <marcus.shawcroft@gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 19 January 2015 at 15:43, Christophe Lyon >>>>>>>>>>> <christophe.lyon@linaro.org> >>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On 19 January 2015 at 14:29, Marcus Shawcroft >>>>>>>>>>>> <marcus.shawcroft@gmail.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On 16 January 2015 at 17:52, Christophe Lyon >>>>>>>>>>>>> <christophe.lyon@linaro.org> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>>> OK provided, as per the previous couple, that we don;t >>>>>>>>>>>>>>> regression >>>>>>>>>>>>>>> or >>>>>>>>>>>>>>> introduce new fails on aarch64[_be] or aarch32. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> This patch shows failures on aarch64 and aarch64_be for vmax >>>>>>>>>>>>>> and >>>>>>>>>>>>>> vmin >>>>>>>>>>>>>> when the input is -NaN. >>>>>>>>>>>>>> It's a corner case, and my reading of the ARM ARM is that the >>>>>>>>>>>>>> result >>>>>>>>>>>>>> should the same as on aarch32. >>>>>>>>>>>>>> I haven't had time to look at it in more details though. >>>>>>>>>>>>>> So, not OK? >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> They should have the same behaviour in aarch32 and aarch64. Did >>>>>>>>>>>>> you >>>>>>>>>>>>> test on HW or a model? >>>>>>>>>>>>> >>>>>>>>>>>> I ran the tests on qemu for aarch32 and aarch64-linux, and on >>>>>>>>>>>> the >>>>>>>>>>>> foundation model for aarch64*-elf. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Leave this one out until we understand why it fails. /Marcus >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> I've looked at this a bit more. >>>>>>>>>> We have >>>>>>>>>> fmax v0.4s, v0.4s, v1.4s >>>>>>>>>> where v0 is a vector of -NaN (0xffc00000) and v1 is a vector of 1. >>>>>>>>>> >>>>>>>>>> The output is still -NaN (0xffc00000), while the test expects >>>>>>>>>> defaultNaN (0x7fc00000). >>>>>>>>>> >>>>>>>>> >>>>>>>>> In the AArch32 execution state, Advanced SIMD FP arithmetic always >>>>>>>>> uses >>>>>>>>> the >>>>>>>>> DefaultNaN setting regardless of the DN-bit value in the FPSCR. In >>>>>>>>> AArch64 >>>>>>>>> execution state, result of Advanced SIMD FP arithmetic operations >>>>>>>>> depend >>>>>>>>> on >>>>>>>>> the value of the DN-bit i.e. either propagate the input NaN or >>>>>>>>> generate >>>>>>>>> DefaultNaN depending on the value of DN. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Maybe I'm using an outdated doc. On page 2282 of ARMv8 ARM rev C, I >>>>>>>> can see only the latter (no diff between aarch32 and aarch64 in >>>>>>>> FPProcessNan pseudo-code) >>>>>>>> >>>>>>> >>>>>>> If you see pg. 4005 in the same doc(rev C), you'll see the FPSCR spec >>>>>>> - >>>>>>> under DN: >>>>>>> >>>>>>> "The value of this bit only controls scalar floating-point >>>>>>> arithmetic. >>>>>>> Advanced SIMD arithmetic always uses the Default NaN setting, >>>>>>> regardless >>>>>>> of >>>>>>> the value of the DN bit." >>>>>>> >>>>>>> Also on page 3180 for the description of VMAX(vector FP), it says: >>>>>>> " >>>>>>> * max(+0.0, -0.0) = +0.0 >>>>>>> * If any input is a NaN, the corresponding result element is the >>>>>>> default >>>>>>> NaN. >>>>>>> " >>>>>>> >>>>>> Oops I was looking at FMAX (vector) pg 936. >>>>>> >>>>>>> The pseudocode for FPMax () on pg. 3180 passes StandardFPSCRValue() >>>>>>> to >>>>>>> FPMax() which is on pg. 2285 >>>>>>> >>>>>>> // StandardFPSCRValue() >>>>>>> // ==================== >>>>>>> FPCRType StandardFPSCRValue() >>>>>>> return ‘00000’ : FPSCR.AHP : ‘11000000000000000000000000’ >>>>>>> >>>>>>> Here bit-25(FPSCR.DN) is set to 1. >>>>>>> >>>>>> >>>>>> So, we should get defaultNaN too on aarch64, and no need to try to >>>>>> force DN to 1 in gdb? >>>>>> >>>>>> What can be wrong? >>>>>> >>>>> >>>>> On pg 3180, I see VMAX(FPSIMD) for A32/T32, not A64. I hope we're >>>>> reading >>>>> the same document. >>>>> >>>>> Regardless of the page number, if you see the pseudocode for >>>>> VMAX(FPSIMD) >>>>> for AArch32, StandardFPSCRValue() (i.e. DN = 1) is passed to FPMax() >>>>> which >>>>> means generate DefaultNaN() regardless. >>>>> >>>>> OTOH, on pg 936, you have FMAX(vector) for A64 where FPMax() in the >>>>> pseudocode gets just FPCR. >>>>> >>>>> >>>> Ok, that was my initial understanding but our discussion confused me. >>>> >>>> And that's why I tried to force DN = 1 in gdb before single-stepping >>>> over >>>> fmax v0.4s, v0.4s, v1.4s >>>> >>>> but it changed nothing :-( >>>> Hence my question about a gdb possible bug or misuse. >>> >>> >>> Hmm... user error, I missed one bit >>> set $fpcr=0x2000000 >>> works under gdb. >>> >>>> I'll try modifying the test to have it force DN=1. >>>> >>> Forcing DN=1 in the test makes it pass. >>> >>> I am going to look at adding that cleanly to my test, and resubmit it. >>> >>> Thanks, and sorry for the noise. >>> >> Here is the updated version: >> - Now I set DN=1 on AArch64 in clean_results, as it is the main >> initialization function. >> - I removed the double negative :-) >> - I removed the useless [u]int64 and poly variants >> >> Christophe. >> >> 2015-01-25 Christophe Lyon <christophe.lyon@linaro.org> >> >> * gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h >> (_ARM_FPSRC): Add DN and AHP fields. >> (clean_results): Force DN=1 on AArch64. >> * gcc.target/aarch64/advsimd-intrinsics/binary_op_no64.inc: New file. >> * gcc.target/aarch64/advsimd-intrinsics/vhadd.c: New file. >> * gcc.target/aarch64/advsimd-intrinsics/vhsub.c: New file. >> * gcc.target/aarch64/advsimd-intrinsics/vmax.c: New file. >> * gcc.target/aarch64/advsimd-intrinsics/vmin.c: New file. >> * gcc.target/aarch64/advsimd-intrinsics/vrhadd.c: New file. >> > > I guess you don't need the fake dependency fix for this as this is mostly > called only once? > Yes, that is my current assumption: for the time being there is no other code which can potentially change this value. > + _ARM_FPSCR _afpscr_for_dn; > + asm volatile ("mrs %0,fpcr" : "=r" (_afpscr_for_dn)); > + _afpscr_for_dn.b.DN = 1; > + asm volatile ("msr fpcr,%0" : : "r" (_afpscr_for_dn)); Maybe in the future we'll want to check that DN=0 means that we actually forward a NaN != DefaultNaN, but that can be a further improvement to this patch. > Otherwise, your patch looks OK to me(but I can't approve it). Thanks for the review. > Thanks, > Tejas. > >
On 26 January 2015 at 14:23, Christophe Lyon <christophe.lyon@linaro.org> wrote: > On 26 January 2015 at 13:10, Tejas Belagod <tejas.belagod@arm.com> wrote: >> On 25/01/15 21:05, Christophe Lyon wrote: >>> >>> On 23 January 2015 at 14:44, Christophe Lyon <christophe.lyon@linaro.org> >>> wrote: >>>> >>>> On 23 January 2015 at 12:42, Christophe Lyon <christophe.lyon@linaro.org> >>>> wrote: >>>>> >>>>> On 23 January 2015 at 11:18, Tejas Belagod <tejas.belagod@arm.com> >>>>> wrote: >>>>>> >>>>>> On 22/01/15 21:31, Christophe Lyon wrote: >>>>>>> >>>>>>> >>>>>>> On 22 January 2015 at 16:22, Tejas Belagod <tejas.belagod@arm.com> >>>>>>> wrote: >>>>>>>> >>>>>>>> >>>>>>>> On 22/01/15 14:28, Christophe Lyon wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On 22 January 2015 at 12:19, Tejas Belagod <tejas.belagod@arm.com> >>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 21/01/15 15:07, Christophe Lyon wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 19 January 2015 at 17:54, Marcus Shawcroft >>>>>>>>>>> <marcus.shawcroft@gmail.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On 19 January 2015 at 15:43, Christophe Lyon >>>>>>>>>>>> <christophe.lyon@linaro.org> >>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On 19 January 2015 at 14:29, Marcus Shawcroft >>>>>>>>>>>>> <marcus.shawcroft@gmail.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 16 January 2015 at 17:52, Christophe Lyon >>>>>>>>>>>>>> <christophe.lyon@linaro.org> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>>> OK provided, as per the previous couple, that we don;t >>>>>>>>>>>>>>>> regression >>>>>>>>>>>>>>>> or >>>>>>>>>>>>>>>> introduce new fails on aarch64[_be] or aarch32. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> This patch shows failures on aarch64 and aarch64_be for vmax >>>>>>>>>>>>>>> and >>>>>>>>>>>>>>> vmin >>>>>>>>>>>>>>> when the input is -NaN. >>>>>>>>>>>>>>> It's a corner case, and my reading of the ARM ARM is that the >>>>>>>>>>>>>>> result >>>>>>>>>>>>>>> should the same as on aarch32. >>>>>>>>>>>>>>> I haven't had time to look at it in more details though. >>>>>>>>>>>>>>> So, not OK? >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> They should have the same behaviour in aarch32 and aarch64. Did >>>>>>>>>>>>>> you >>>>>>>>>>>>>> test on HW or a model? >>>>>>>>>>>>>> >>>>>>>>>>>>> I ran the tests on qemu for aarch32 and aarch64-linux, and on >>>>>>>>>>>>> the >>>>>>>>>>>>> foundation model for aarch64*-elf. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Leave this one out until we understand why it fails. /Marcus >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I've looked at this a bit more. >>>>>>>>>>> We have >>>>>>>>>>> fmax v0.4s, v0.4s, v1.4s >>>>>>>>>>> where v0 is a vector of -NaN (0xffc00000) and v1 is a vector of 1. >>>>>>>>>>> >>>>>>>>>>> The output is still -NaN (0xffc00000), while the test expects >>>>>>>>>>> defaultNaN (0x7fc00000). >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> In the AArch32 execution state, Advanced SIMD FP arithmetic always >>>>>>>>>> uses >>>>>>>>>> the >>>>>>>>>> DefaultNaN setting regardless of the DN-bit value in the FPSCR. In >>>>>>>>>> AArch64 >>>>>>>>>> execution state, result of Advanced SIMD FP arithmetic operations >>>>>>>>>> depend >>>>>>>>>> on >>>>>>>>>> the value of the DN-bit i.e. either propagate the input NaN or >>>>>>>>>> generate >>>>>>>>>> DefaultNaN depending on the value of DN. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Maybe I'm using an outdated doc. On page 2282 of ARMv8 ARM rev C, I >>>>>>>>> can see only the latter (no diff between aarch32 and aarch64 in >>>>>>>>> FPProcessNan pseudo-code) >>>>>>>>> >>>>>>>> >>>>>>>> If you see pg. 4005 in the same doc(rev C), you'll see the FPSCR spec >>>>>>>> - >>>>>>>> under DN: >>>>>>>> >>>>>>>> "The value of this bit only controls scalar floating-point >>>>>>>> arithmetic. >>>>>>>> Advanced SIMD arithmetic always uses the Default NaN setting, >>>>>>>> regardless >>>>>>>> of >>>>>>>> the value of the DN bit." >>>>>>>> >>>>>>>> Also on page 3180 for the description of VMAX(vector FP), it says: >>>>>>>> " >>>>>>>> * max(+0.0, -0.0) = +0.0 >>>>>>>> * If any input is a NaN, the corresponding result element is the >>>>>>>> default >>>>>>>> NaN. >>>>>>>> " >>>>>>>> >>>>>>> Oops I was looking at FMAX (vector) pg 936. >>>>>>> >>>>>>>> The pseudocode for FPMax () on pg. 3180 passes StandardFPSCRValue() >>>>>>>> to >>>>>>>> FPMax() which is on pg. 2285 >>>>>>>> >>>>>>>> // StandardFPSCRValue() >>>>>>>> // ==================== >>>>>>>> FPCRType StandardFPSCRValue() >>>>>>>> return ‘00000’ : FPSCR.AHP : ‘11000000000000000000000000’ >>>>>>>> >>>>>>>> Here bit-25(FPSCR.DN) is set to 1. >>>>>>>> >>>>>>> >>>>>>> So, we should get defaultNaN too on aarch64, and no need to try to >>>>>>> force DN to 1 in gdb? >>>>>>> >>>>>>> What can be wrong? >>>>>>> >>>>>> >>>>>> On pg 3180, I see VMAX(FPSIMD) for A32/T32, not A64. I hope we're >>>>>> reading >>>>>> the same document. >>>>>> >>>>>> Regardless of the page number, if you see the pseudocode for >>>>>> VMAX(FPSIMD) >>>>>> for AArch32, StandardFPSCRValue() (i.e. DN = 1) is passed to FPMax() >>>>>> which >>>>>> means generate DefaultNaN() regardless. >>>>>> >>>>>> OTOH, on pg 936, you have FMAX(vector) for A64 where FPMax() in the >>>>>> pseudocode gets just FPCR. >>>>>> >>>>>> >>>>> Ok, that was my initial understanding but our discussion confused me. >>>>> >>>>> And that's why I tried to force DN = 1 in gdb before single-stepping >>>>> over >>>>> fmax v0.4s, v0.4s, v1.4s >>>>> >>>>> but it changed nothing :-( >>>>> Hence my question about a gdb possible bug or misuse. >>>> >>>> >>>> Hmm... user error, I missed one bit >>>> set $fpcr=0x2000000 >>>> works under gdb. >>>> >>>>> I'll try modifying the test to have it force DN=1. >>>>> >>>> Forcing DN=1 in the test makes it pass. >>>> >>>> I am going to look at adding that cleanly to my test, and resubmit it. >>>> >>>> Thanks, and sorry for the noise. >>>> >>> Here is the updated version: >>> - Now I set DN=1 on AArch64 in clean_results, as it is the main >>> initialization function. >>> - I removed the double negative :-) >>> - I removed the useless [u]int64 and poly variants >>> >>> Christophe. >>> >>> 2015-01-25 Christophe Lyon <christophe.lyon@linaro.org> >>> >>> * gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h >>> (_ARM_FPSRC): Add DN and AHP fields. >>> (clean_results): Force DN=1 on AArch64. >>> * gcc.target/aarch64/advsimd-intrinsics/binary_op_no64.inc: New file. >>> * gcc.target/aarch64/advsimd-intrinsics/vhadd.c: New file. >>> * gcc.target/aarch64/advsimd-intrinsics/vhsub.c: New file. >>> * gcc.target/aarch64/advsimd-intrinsics/vmax.c: New file. >>> * gcc.target/aarch64/advsimd-intrinsics/vmin.c: New file. >>> * gcc.target/aarch64/advsimd-intrinsics/vrhadd.c: New file. >>> >> >> I guess you don't need the fake dependency fix for this as this is mostly >> called only once? >> > Yes, that is my current assumption: for the time being there is no > other code which can potentially change this value. > >> + _ARM_FPSCR _afpscr_for_dn; >> + asm volatile ("mrs %0,fpcr" : "=r" (_afpscr_for_dn)); >> + _afpscr_for_dn.b.DN = 1; >> + asm volatile ("msr fpcr,%0" : : "r" (_afpscr_for_dn)); > > Maybe in the future we'll want to check that DN=0 means that we > actually forward a NaN != DefaultNaN, but that can be a further > improvement to this patch. > Marcus, Is it OK to commit this one? This is the only remaining one from this series. Thanks, Christophe. >> Otherwise, your patch looks OK to me(but I can't approve it). > Thanks for the review. > >> Thanks, >> Tejas. >> >>
commit 187f34bdf94cfd2fbe7d439d12956edb69a3c131 Author: Christophe Lyon <christophe.lyon@linaro.org> Date: Sun Nov 30 22:29:57 2014 +0100 Add vmax, vmin, vhadd, vhsub and vrhadd tests. On AArch64, force FPCR.DN=1 to get the same results as on AArch32 (DefaultNaN). diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h index 2730a66..5b0c1e7 100644 --- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h @@ -230,7 +230,9 @@ extern ARRAY(expected, hfloat, 32, 4); typedef union { struct { - int _xxx:27; + int _xxx:25; + unsigned int DN:1; + unsigned int AHP:1; unsigned int QC:1; int V:1; int C:1; @@ -249,7 +251,9 @@ typedef union { int C:1; int V:1; unsigned int QC:1; - int _dnm:27; + unsigned int AHP:1; + unsigned int DN:1; + int _dnm:25; } b; unsigned int word; } _ARM_FPSCR; @@ -382,6 +386,15 @@ static void clean_results (void) CLEAN(result, poly, 8, 16); CLEAN(result, poly, 16, 8); CLEAN(result, float, 32, 4); + +#if defined(__aarch64__) + /* On AArch64, make sure to return DefaultNaN to have the same + results as on AArch32. */ + _ARM_FPSCR _afpscr_for_dn; + asm volatile ("mrs %0,fpcr" : "=r" (_afpscr_for_dn)); + _afpscr_for_dn.b.DN = 1; + asm volatile ("msr fpcr,%0" : : "r" (_afpscr_for_dn)); +#endif } diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/binary_op_no64.inc b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/binary_op_no64.inc new file mode 100644 index 0000000..1eb9271 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/binary_op_no64.inc @@ -0,0 +1,134 @@ +/* Can't use the standard binary_op.inc template because vmax has no + 64 bits variant. */ + +#include <math.h> + +#define FNNAME1(NAME) exec_ ## NAME +#define FNNAME(NAME) FNNAME1(NAME) + +void FNNAME (INSN_NAME) (void) +{ + int i; + + /* Basic test: y=vmax(x,x), then store the result. */ +#define TEST_BINARY_OP1(INSN, Q, T1, T2, W, N) \ + VECT_VAR(vector_res, T1, W, N) = \ + INSN##Q##_##T2##W(VECT_VAR(vector, T1, W, N), \ + VECT_VAR(vector2, T1, W, N)); \ + vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(vector_res, T1, W, N)) + +#define TEST_BINARY_OP(INSN, Q, T1, T2, W, N) \ + TEST_BINARY_OP1(INSN, Q, T1, T2, W, N) \ + + DECL_VARIABLE_ALL_VARIANTS(vector); + DECL_VARIABLE_ALL_VARIANTS(vector2); + DECL_VARIABLE_ALL_VARIANTS(vector_res); + + clean_results (); + + /* Initialize input "vector" from "buffer". */ + TEST_MACRO_ALL_VARIANTS_2_5(VLOAD, vector, buffer); +#ifdef HAS_FLOAT_VARIANT + VLOAD(vector, buffer, , float, f, 32, 2); + VLOAD(vector, buffer, q, float, f, 32, 4); +#endif + + /* Choose init value arbitrarily, will be used as comparison value. */ + VDUP(vector2, , int, s, 8, 8, -13); + VDUP(vector2, , int, s, 16, 4, -14); + VDUP(vector2, , int, s, 32, 2, -16); + VDUP(vector2, , uint, u, 8, 8, 0xf3); + VDUP(vector2, , uint, u, 16, 4, 0xfff1); + VDUP(vector2, , uint, u, 32, 2, 0xfffffff0); + VDUP(vector2, q, int, s, 8, 16, -12); + VDUP(vector2, q, int, s, 16, 8, -13); + VDUP(vector2, q, int, s, 32, 4, -15); + VDUP(vector2, q, uint, u, 8, 16, 0xf9); + VDUP(vector2, q, uint, u, 16, 8, 0xfff2); + VDUP(vector2, q, uint, u, 32, 4, 0xfffffff1); +#ifdef HAS_FLOAT_VARIANT + VDUP(vector2, , float, f, 32, 2, -15.5f); + VDUP(vector2, q, float, f, 32, 4, -14.5f); +#endif + +#ifdef HAS_FLOAT_VARIANT +#define FLOAT_VARIANT(MACRO, VAR) \ + MACRO(VAR, , float, f, 32, 2); \ + MACRO(VAR, q, float, f, 32, 4) +#else +#define FLOAT_VARIANT(MACRO, VAR) +#endif + +#define TEST_MACRO_NO64BIT_VARIANT_1_5(MACRO, VAR) \ + MACRO(VAR, , int, s, 8, 8); \ + MACRO(VAR, , int, s, 16, 4); \ + MACRO(VAR, , int, s, 32, 2); \ + MACRO(VAR, , uint, u, 8, 8); \ + MACRO(VAR, , uint, u, 16, 4); \ + MACRO(VAR, , uint, u, 32, 2); \ + MACRO(VAR, q, int, s, 8, 16); \ + MACRO(VAR, q, int, s, 16, 8); \ + MACRO(VAR, q, int, s, 32, 4); \ + MACRO(VAR, q, uint, u, 8, 16); \ + MACRO(VAR, q, uint, u, 16, 8); \ + MACRO(VAR, q, uint, u, 32, 4); \ + FLOAT_VARIANT(MACRO, VAR) + + /* Apply a binary operator named INSN_NAME. */ + TEST_MACRO_NO64BIT_VARIANT_1_5(TEST_BINARY_OP, INSN_NAME); + + CHECK(TEST_MSG, int, 8, 8, PRIx8, expected, ""); + CHECK(TEST_MSG, int, 16, 4, PRIx16, expected, ""); + CHECK(TEST_MSG, int, 32, 2, PRIx32, expected, ""); + CHECK(TEST_MSG, uint, 8, 8, PRIx8, expected, ""); + CHECK(TEST_MSG, uint, 16, 4, PRIx16, expected, ""); + CHECK(TEST_MSG, uint, 32, 2, PRIx32, expected, ""); + CHECK(TEST_MSG, int, 8, 16, PRIx8, expected, ""); + CHECK(TEST_MSG, int, 16, 8, PRIx16, expected, ""); + CHECK(TEST_MSG, int, 32, 4, PRIx32, expected, ""); + CHECK(TEST_MSG, uint, 8, 16, PRIx8, expected, ""); + CHECK(TEST_MSG, uint, 16, 8, PRIx16, expected, ""); + CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected, ""); + +#ifdef HAS_FLOAT_VARIANT + CHECK_FP(TEST_MSG, float, 32, 2, PRIx32, expected, ""); + CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected, ""); + + /* Extra FP tests with special values (NaN, ....) */ + VDUP(vector, q, float, f, 32, 4, 1.0f); + VDUP(vector2, q, float, f, 32, 4, NAN); + TEST_BINARY_OP(INSN_NAME, q, float, f, 32, 4); + CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_nan, " FP special (NaN)"); + + VDUP(vector, q, float, f, 32, 4, -NAN); + VDUP(vector2, q, float, f, 32, 4, 1.0f); + TEST_BINARY_OP(INSN_NAME, q, float, f, 32, 4); + CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_mnan, " FP special (-NaN)"); + + VDUP(vector, q, float, f, 32, 4, 1.0f); + VDUP(vector2, q, float, f, 32, 4, HUGE_VALF); + TEST_BINARY_OP(INSN_NAME, q, float, f, 32, 4); + CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_inf, " FP special (inf)"); + + VDUP(vector, q, float, f, 32, 4, -HUGE_VALF); + VDUP(vector2, q, float, f, 32, 4, 1.0f); + TEST_BINARY_OP(INSN_NAME, q, float, f, 32, 4); + CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_minf, " FP special (-inf)"); + + VDUP(vector, q, float, f, 32, 4, 0.0f); + VDUP(vector2, q, float, f, 32, 4, -0.0f); + TEST_BINARY_OP(INSN_NAME, q, float, f, 32, 4); + CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_zero1, " FP special (-0.0)"); + + VDUP(vector, q, float, f, 32, 4, -0.0f); + VDUP(vector2, q, float, f, 32, 4, 0.0f); + TEST_BINARY_OP(INSN_NAME, q, float, f, 32, 4); + CHECK_FP(TEST_MSG, float, 32, 4, PRIx32, expected_zero2, " FP special (-0.0)"); +#endif +} + +int main (void) +{ + FNNAME (INSN_NAME) (); + return 0; +} diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vhadd.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vhadd.c new file mode 100644 index 0000000..d8a09ca --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vhadd.c @@ -0,0 +1,34 @@ +#include <arm_neon.h> +#include "arm-neon-ref.h" +#include "compute-ref-data.h" + +#define INSN_NAME vhadd +#define TEST_MSG "VHADD/VHADDQ" + +/* Expected results. */ +VECT_VAR_DECL(expected,int,8,8) [] = { 0xf1, 0xf2, 0xf2, 0xf3, + 0xf3, 0xf4, 0xf4, 0xf5 }; +VECT_VAR_DECL(expected,int,16,4) [] = { 0xfff1, 0xfff1, 0xfff2, 0xfff2 }; +VECT_VAR_DECL(expected,int,32,2) [] = { 0xfffffff0, 0xfffffff0 }; +VECT_VAR_DECL(expected,uint,8,8) [] = { 0xf1, 0xf2, 0xf2, 0xf3, + 0xf3, 0xf4, 0xf4, 0xf5 }; +VECT_VAR_DECL(expected,uint,16,4) [] = { 0xfff0, 0xfff1, 0xfff1, 0xfff2 }; +VECT_VAR_DECL(expected,uint,32,2) [] = { 0xfffffff0, 0xfffffff0 }; +VECT_VAR_DECL(expected,int,8,16) [] = { 0xf2, 0xf2, 0xf3, 0xf3, + 0xf4, 0xf4, 0xf5, 0xf5, + 0xf6, 0xf6, 0xf7, 0xf7, + 0xf8, 0xf8, 0xf9, 0xf9 }; +VECT_VAR_DECL(expected,int,16,8) [] = { 0xfff1, 0xfff2, 0xfff2, 0xfff3, + 0xfff3, 0xfff4, 0xfff4, 0xfff5 }; +VECT_VAR_DECL(expected,int,32,4) [] = { 0xfffffff0, 0xfffffff1, + 0xfffffff1, 0xfffffff2 }; +VECT_VAR_DECL(expected,uint,8,16) [] = { 0xf4, 0xf5, 0xf5, 0xf6, + 0xf6, 0xf7, 0xf7, 0xf8, + 0xf8, 0xf9, 0xf9, 0xfa, + 0xfa, 0xfb, 0xfb, 0xfc }; +VECT_VAR_DECL(expected,uint,16,8) [] = { 0xfff1, 0xfff1, 0xfff2, 0xfff2, + 0xfff3, 0xfff3, 0xfff4, 0xfff4 }; +VECT_VAR_DECL(expected,uint,32,4) [] = { 0xfffffff0, 0xfffffff1, + 0xfffffff1, 0xfffffff2 }; + +#include "binary_op_no64.inc" diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vhsub.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vhsub.c new file mode 100644 index 0000000..0fe8080 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vhsub.c @@ -0,0 +1,32 @@ +#include <arm_neon.h> +#include "arm-neon-ref.h" +#include "compute-ref-data.h" + +#define INSN_NAME vhsub +#define TEST_MSG "VHSUB/VHSUBQ" + +/* Expected results. */ +VECT_VAR_DECL(expected,int,8,8) [] = { 0xfe, 0xff, 0xff, 0x0, + 0x0, 0x1, 0x1, 0x2 }; +VECT_VAR_DECL(expected,int,16,4) [] = { 0xffff, 0xffff, 0x0, 0x0 }; +VECT_VAR_DECL(expected,int,32,2) [] = { 0x0, 0x0 }; +VECT_VAR_DECL(expected,uint,8,8) [] = { 0xfe, 0xff, 0xff, 0x0, + 0x0, 0x1, 0x1, 0x2 }; +VECT_VAR_DECL(expected,uint,16,4) [] = { 0xffff, 0x0, 0x0, 0x1 }; +VECT_VAR_DECL(expected,uint,32,2) [] = { 0x0, 0x0 }; +VECT_VAR_DECL(expected,int,8,16) [] = { 0xfe, 0xfe, 0xff, 0xff, + 0x0, 0x0, 0x1, 0x1, + 0x2, 0x2, 0x3, 0x3, + 0x4, 0x4, 0x5, 0x5 }; +VECT_VAR_DECL(expected,int,16,8) [] = { 0xfffe, 0xffff, 0xffff, 0x0, + 0x0, 0x1, 0x1, 0x2 }; +VECT_VAR_DECL(expected,int,32,4) [] = { 0xffffffff, 0x0, 0x0, 0x1 }; +VECT_VAR_DECL(expected,uint,8,16) [] = { 0xfb, 0xfc, 0xfc, 0xfd, + 0xfd, 0xfe, 0xfe, 0xff, + 0xff, 0x0, 0x0, 0x1, + 0x1, 0x2, 0x2, 0x3 }; +VECT_VAR_DECL(expected,uint,16,8) [] = { 0xffff, 0xffff, 0x0, 0x0, + 0x1, 0x1, 0x2, 0x2 }; +VECT_VAR_DECL(expected,uint,32,4) [] = { 0xffffffff, 0x0, 0x0, 0x1 }; + +#include "binary_op_no64.inc" diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmax.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmax.c new file mode 100644 index 0000000..830603d --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmax.c @@ -0,0 +1,51 @@ +#include <arm_neon.h> +#include "arm-neon-ref.h" +#include "compute-ref-data.h" + +#define INSN_NAME vmax +#define TEST_MSG "VMAX/VMAXQ" + +#define HAS_FLOAT_VARIANT + +/* Expected results. */ +VECT_VAR_DECL(expected,int,8,8) [] = { 0xf3, 0xf3, 0xf3, 0xf3, + 0xf4, 0xf5, 0xf6, 0xf7 }; +VECT_VAR_DECL(expected,int,16,4) [] = { 0xfff2, 0xfff2, 0xfff2, 0xfff3 }; +VECT_VAR_DECL(expected,int,32,2) [] = { 0xfffffff0, 0xfffffff1 }; +VECT_VAR_DECL(expected,uint,8,8) [] = { 0xf3, 0xf3, 0xf3, 0xf3, + 0xf4, 0xf5, 0xf6, 0xf7 }; +VECT_VAR_DECL(expected,uint,16,4) [] = { 0xfff1, 0xfff1, 0xfff2, 0xfff3 }; +VECT_VAR_DECL(expected,uint,32,2) [] = { 0xfffffff0, 0xfffffff1 }; +VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0xc1780000, 0xc1700000 }; +VECT_VAR_DECL(expected,int,8,16) [] = { 0xf4, 0xf4, 0xf4, 0xf4, + 0xf4, 0xf5, 0xf6, 0xf7, + 0xf8, 0xf9, 0xfa, 0xfb, + 0xfc, 0xfd, 0xfe, 0xff }; +VECT_VAR_DECL(expected,int,16,8) [] = { 0xfff3, 0xfff3, 0xfff3, 0xfff3, + 0xfff4, 0xfff5, 0xfff6, 0xfff7 }; +VECT_VAR_DECL(expected,int,32,4) [] = { 0xfffffff1, 0xfffffff1, + 0xfffffff2, 0xfffffff3 }; +VECT_VAR_DECL(expected,uint,8,16) [] = { 0xf9, 0xf9, 0xf9, 0xf9, + 0xf9, 0xf9, 0xf9, 0xf9, + 0xf9, 0xf9, 0xfa, 0xfb, + 0xfc, 0xfd, 0xfe, 0xff }; +VECT_VAR_DECL(expected,uint,16,8) [] = { 0xfff2, 0xfff2, 0xfff2, 0xfff3, + 0xfff4, 0xfff5, 0xfff6, 0xfff7 }; +VECT_VAR_DECL(expected,uint,32,4) [] = { 0xfffffff1, 0xfffffff1, + 0xfffffff2, 0xfffffff3 }; +VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0xc1680000, 0xc1680000, + 0xc1600000, 0xc1500000 }; + +/* Expected results with special FP values. */ +VECT_VAR_DECL(expected_nan,hfloat,32,4) [] = { 0x7fc00000, 0x7fc00000, + 0x7fc00000, 0x7fc00000 }; +VECT_VAR_DECL(expected_mnan,hfloat,32,4) [] = { 0x7fc00000, 0x7fc00000, + 0x7fc00000, 0x7fc00000 }; +VECT_VAR_DECL(expected_inf,hfloat,32,4) [] = { 0x7f800000, 0x7f800000, + 0x7f800000, 0x7f800000 }; +VECT_VAR_DECL(expected_minf,hfloat,32,4) [] = { 0x3f800000, 0x3f800000, + 0x3f800000, 0x3f800000 }; +VECT_VAR_DECL(expected_zero1,hfloat,32,4) [] = { 0x0, 0x0, 0x0, 0x0 }; +VECT_VAR_DECL(expected_zero2,hfloat,32,4) [] = { 0x0, 0x0, 0x0, 0x0 }; + +#include "binary_op_no64.inc" diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmin.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmin.c new file mode 100644 index 0000000..8ad2703 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmin.c @@ -0,0 +1,52 @@ +#include <arm_neon.h> +#include "arm-neon-ref.h" +#include "compute-ref-data.h" + +#define INSN_NAME vmin +#define TEST_MSG "VMIN/VMINQ" + +#define HAS_FLOAT_VARIANT + +/* Expected results. */ +VECT_VAR_DECL(expected,int,8,8) [] = { 0xf0, 0xf1, 0xf2, 0xf3, + 0xf3, 0xf3, 0xf3, 0xf3 }; +VECT_VAR_DECL(expected,int,16,4) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff2 }; +VECT_VAR_DECL(expected,int,32,2) [] = { 0xfffffff0, 0xfffffff0 }; +VECT_VAR_DECL(expected,uint,8,8) [] = { 0xf0, 0xf1, 0xf2, 0xf3, + 0xf3, 0xf3, 0xf3, 0xf3 }; +VECT_VAR_DECL(expected,uint,16,4) [] = { 0xfff0, 0xfff1, 0xfff1, 0xfff1 }; +VECT_VAR_DECL(expected,uint,32,2) [] = { 0xfffffff0, 0xfffffff0 }; +VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0xc1800000, 0xc1780000 }; +VECT_VAR_DECL(expected,int,8,16) [] = { 0xf0, 0xf1, 0xf2, 0xf3, + 0xf4, 0xf4, 0xf4, 0xf4, + 0xf4, 0xf4, 0xf4, 0xf4, + 0xf4, 0xf4, 0xf4, 0xf4 }; +VECT_VAR_DECL(expected,int,16,8) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3, + 0xfff3, 0xfff3, 0xfff3, 0xfff3 }; +VECT_VAR_DECL(expected,int,32,4) [] = { 0xfffffff0, 0xfffffff1, + 0xfffffff1, 0xfffffff1 }; +VECT_VAR_DECL(expected,uint,8,16) [] = { 0xf0, 0xf1, 0xf2, 0xf3, + 0xf4, 0xf5, 0xf6, 0xf7, + 0xf8, 0xf9, 0xf9, 0xf9, + 0xf9, 0xf9, 0xf9, 0xf9 }; +VECT_VAR_DECL(expected,uint,16,8) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff2, + 0xfff2, 0xfff2, 0xfff2, 0xfff2 }; +VECT_VAR_DECL(expected,uint,32,4) [] = { 0xfffffff0, 0xfffffff1, + 0xfffffff1, 0xfffffff1 }; +VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0xc1800000, 0xc1700000, + 0xc1680000, 0xc1680000 }; +/* Expected results with special FP values. */ +VECT_VAR_DECL(expected_nan,hfloat,32,4) [] = { 0x7fc00000, 0x7fc00000, + 0x7fc00000, 0x7fc00000 }; +VECT_VAR_DECL(expected_mnan,hfloat,32,4) [] = { 0x7fc00000, 0x7fc00000, + 0x7fc00000, 0x7fc00000 }; +VECT_VAR_DECL(expected_inf,hfloat,32,4) [] = { 0x3f800000, 0x3f800000, + 0x3f800000, 0x3f800000 }; +VECT_VAR_DECL(expected_minf,hfloat,32,4) [] = { 0xff800000, 0xff800000, + 0xff800000, 0xff800000 }; +VECT_VAR_DECL(expected_zero1,hfloat,32,4) [] = { 0x80000000, 0x80000000, + 0x80000000, 0x80000000 }; +VECT_VAR_DECL(expected_zero2,hfloat,32,4) [] = { 0x80000000, 0x80000000, + 0x80000000, 0x80000000 }; + +#include "binary_op_no64.inc" diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrhadd.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrhadd.c new file mode 100644 index 0000000..eb82002 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrhadd.c @@ -0,0 +1,34 @@ +#include <arm_neon.h> +#include "arm-neon-ref.h" +#include "compute-ref-data.h" + +#define INSN_NAME vrhadd +#define TEST_MSG "VRHADD/VRHADDQ" + +/* Expected results. */ +VECT_VAR_DECL(expected,int,8,8) [] = { 0xf2, 0xf2, 0xf3, 0xf3, + 0xf4, 0xf4, 0xf5, 0xf5 }; +VECT_VAR_DECL(expected,int,16,4) [] = { 0xfff1, 0xfff2, 0xfff2, 0xfff3 }; +VECT_VAR_DECL(expected,int,32,2) [] = { 0xfffffff0, 0xfffffff1 }; +VECT_VAR_DECL(expected,uint,8,8) [] = { 0xf2, 0xf2, 0xf3, 0xf3, + 0xf4, 0xf4, 0xf5, 0xf5 }; +VECT_VAR_DECL(expected,uint,16,4) [] = { 0xfff1, 0xfff1, 0xfff2, 0xfff2 }; +VECT_VAR_DECL(expected,uint,32,2) [] = { 0xfffffff0, 0xfffffff1 }; +VECT_VAR_DECL(expected,int,8,16) [] = { 0xf2, 0xf3, 0xf3, 0xf4, + 0xf4, 0xf5, 0xf5, 0xf6, + 0xf6, 0xf7, 0xf7, 0xf8, + 0xf8, 0xf9, 0xf9, 0xfa }; +VECT_VAR_DECL(expected,int,16,8) [] = { 0xfff2, 0xfff2, 0xfff3, 0xfff3, + 0xfff4, 0xfff4, 0xfff5, 0xfff5 }; +VECT_VAR_DECL(expected,int,32,4) [] = { 0xfffffff1, 0xfffffff1, + 0xfffffff2, 0xfffffff2 }; +VECT_VAR_DECL(expected,uint,8,16) [] = { 0xf5, 0xf5, 0xf6, 0xf6, + 0xf7, 0xf7, 0xf8, 0xf8, + 0xf9, 0xf9, 0xfa, 0xfa, + 0xfb, 0xfb, 0xfc, 0xfc }; +VECT_VAR_DECL(expected,uint,16,8) [] = { 0xfff1, 0xfff2, 0xfff2, 0xfff3, + 0xfff3, 0xfff4, 0xfff4, 0xfff5 }; +VECT_VAR_DECL(expected,uint,32,4) [] = { 0xfffffff1, 0xfffffff1, + 0xfffffff2, 0xfffffff2 }; + +#include "binary_op_no64.inc"