diff mbox

[ARM-AArch64/testsuite,v2,01/21] Neon intrinsics execution tests initial framework.

Message ID 1404209174-25364-2-git-send-email-christophe.lyon@linaro.org
State New
Headers show

Commit Message

Christophe Lyon July 1, 2014, 10:05 a.m. UTC
* documentation (README)
* dejanu driver (neon-intrinsics.exp)
* support macros (arm-neon-ref.h, compute-ref-data.h)
* Tests for 3 intrinsics: vaba, vld1, vshl

Comments

Ramana Radhakrishnan July 3, 2014, 11:06 a.m. UTC | #1
On Tue, Jul 1, 2014 at 11:05 AM, Christophe Lyon
<christophe.lyon@linaro.org> wrote:
> * documentation (README)
> * dejanu driver (neon-intrinsics.exp)
> * support macros (arm-neon-ref.h, compute-ref-data.h)
> * Tests for 3 intrinsics: vaba, vld1, vshl
>
> diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
> index c2ed273..3a0f99b 100644
> --- a/gcc/testsuite/ChangeLog
> +++ b/gcc/testsuite/ChangeLog
> @@ -1,3 +1,14 @@
> +2014-06-30  Christophe Lyon  <christophe.lyon@linaro.org>
> +
> +       * gcc.target/arm/README.neon-intrinsics: New file.
> +       * gcc.target/aarch64/neon-intrinsics/README: Likewise.
> +       * gcc.target/aarch64/neon-intrinsics/arm-neon-ref.h: Likewise.
> +       * gcc.target/aarch64/neon-intrinsics/compute-ref-data.h: Likewise.
> +       * gcc.target/aarch64/neon-intrinsics/neon-intrinsics.exp: Likewise.
> +       * gcc.target/aarch64/neon-intrinsics/vaba.c: Likewise.
> +       * gcc.target/aarch64/neon-intrinsics/vld1.c: Likewise.
> +       * gcc.target/aarch64/neon-intrinsics/vshl.c: Likewise.
> +


Ok for the ARM backend , please don't put changelogs in patches.

Assuming no regressions when tested on both backends.

Wait for an explicit ack from one of the AArch64 maintainers as this
is technically gcc.target/aarch64


regards
Ramana




>  2014-06-30  Igor Zamyatin  <igor.zamyatin@intel.com>
>
>         PR middle-end/57541
> diff --git a/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/README b/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/README
> new file mode 100644
> index 0000000..232bb1d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/README
> @@ -0,0 +1,132 @@
> +This directory contains executable tests for ARM/AArch64 Neon
> +intrinsics.
> +
> +It is meant to cover execution cases of all the Advanced SIMD
> +intrinsics, but does not scan the generated assembler code.
> +
> +The general framework is composed as follows:
> +- neon-intrinsics.exp: main dejagnu driver
> +- *.c: actual tests, generally one per intrinsinc family
> +- arm-neon-ref.h: contains macro definitions to save typing in actual
> +  test files
> +- compute-ref-data.h: contains input vectors definitions
> +- *.inc: generic tests, shared by several families of intrinsics. For
> +   instance, unary or binary operators
> +
> +A typical .c test file starts with the following contents (look at
> +vld1.c and vaba.c for sample cases):
> +#include <arm_neon.h>
> +#include "arm-neon-ref.h"
> +#include "compute-ref-data.h"
> +
> +Then, definitions of expected results, based on common input values,
> +as defined in compute-ref-data.h.
> +For example:
> +VECT_VAR_DECL(expected,int,16,4) [] = { 0x16, 0x17, 0x18, 0x19 };
> +defines the expected results of an operator generating int16x4 values.
> +
> +The common input values defined in compute-ref-data.h have been chosen
> +to avoid corner-case values for most operators, yet exposing negative
> +values for signed operators. For this reason, their range is also
> +limited. For instance, the initialization of buffer_int16x4 will be
> +{ -16, -15, -14, -13 }.
> +
> +The initialization of floating-point values is done via hex notation,
> +to avoid potential rounding problems.
> +
> +To test special values and corner cases, specific initialization
> +values should be used in dedicated tests, to ensure proper coverage.
> +An example of this is vshl.
> +
> +When a variant of an intrinsic is not available, its expected result
> +should be defined to the value of CLEAN_PATTERN_8 as defined in
> +arm-neon-ref.h. For example:
> +VECT_VAR_DECL(expected,int,64,1) [] = { 0x3333333333333333 };
> +if the given intrinsic has no variant producing an int64x1 result,
> +like the vcmp family (eg. vclt).
> +
> +This is because the helper function (check_results(), defined in
> +arm-neon-ref.h), iterates over all the possible variants, to save
> +typing in each individual test file. Alternatively, one can directly
> +call the CHECK/CHECK_FP macros to check only a few expected results
> +(see vabs.c for an example).
> +
> +Then, define the TEST_MSG string, which will be used when reporting errors.
> +
> +Next, define the function performing the actual tests, in general
> +relying on the helpers provided by arm-neon-ref.h, which means:
> +
> +* declare necessary vectors of suitable types: using
> +  DECL_VARIABLE_ALL_VARIANTS when all variants are supported, or the
> +  relevant of subset calls to DECL_VARIABLE.
> +
> +* call clean_results() to initialize the 'results' buffers.
> +
> +* initialize the input vectors, using VLOAD, VDUP or VSET_LANE (vld*
> +  tests do not need this step, since their actual purpose is to
> +  initialize vectors).
> +
> +* execute the intrinsic on relevant variants, for instance using
> +  TEST_MACRO_ALL_VARIANTS_2_5.
> +
> +* call check_results() to check that the results match the expected
> +  values.
> +
> +A template test file could be:
> +=================================================================
> +#include <arm_neon.h>
> +#include "arm-neon-ref.h"
> +#include "compute-ref-data.h"
> +
> +/* Expected results.  */
> +VECT_VAR_DECL(expected,int,8,8) [] = { 0xf6, 0xf7, 0xf8, 0xf9,
> +                                      0xfa, 0xfb, 0xfc, 0xfd };
> +/* and as many others as necessary.  */
> +
> +#define TEST_MSG "VMYINTRINSIC"
> +void exec_myintrinsic (void)
> +{
> +  /* my test: v4=vmyintrinsic(v1,v2,v3), then store the result.  */
> +#define TEST_VMYINTR(Q, T1, T2, W, N)                                  \
> +  VECT_VAR(vector_res, T1, W, N) =                                     \
> +    vmyintr##Q##_##T2##W(VECT_VAR(vector1, T1, W, N),                  \
> +                        VECT_VAR(vector2, T1, W, N),                   \
> +                        VECT_VAR(vector3, T1, W, N));                  \
> +  vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(vector_res, T1, W, N))
> +
> +#define DECL_VMYINTR_VAR(VAR)                  \
> +  DECL_VARIABLE(VAR, int, 8, 8);
> +/* And as many others as necessary.  */
> +
> +  DECL_VMYINTR_VAR(vector1);
> +  DECL_VMYINTR_VAR(vector2);
> +  DECL_VMYINTR_VAR(vector3);
> +  DECL_VMYINTR_VAR(vector_res);
> +
> +  clean_results ();
> +
> +  /* Initialize input "vector1" from "buffer".  */
> +  VLOAD(vector1, buffer, , int, s, 8, 8);
> +/* And as many others as necessary.  */
> +
> +  /* Choose init value arbitrarily.  */
> +  VDUP(vector2, , int, s, 8, 8, 1);
> +/* And as many others as necessary.  */
> +
> +  /* Choose init value arbitrarily.  */
> +  VDUP(vector3, , int, s, 8, 8, -5);
> +/* And as many others as necessary.  */
> +
> +  /* Execute the tests.  */
> +  TEST_VMYINTR(, int, s, 8, 8);
> +/* And as many others as necessary.  */
> +
> +  check_results (TEST_MSG, "");
> +}
> +
> +int main (void)
> +{
> +  exec_vmyintrinsic ();
> +  return 0;
> +}
> +=================================================================
> diff --git a/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/arm-neon-ref.h b/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/arm-neon-ref.h
> new file mode 100644
> index 0000000..8ddcbf2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/arm-neon-ref.h
> @@ -0,0 +1,547 @@
> +/* This file defines helper operations shared by all the tests.  */
> +
> +#ifndef _ARM_NEON_REF_H_
> +#define _ARM_NEON_REF_H_
> +
> +#include <stdio.h>
> +#include <inttypes.h>
> +
> +/* helper type, to help write floating point results in integer form.  */
> +typedef uint32_t hfloat32_t;
> +
> +extern void abort(void);
> +extern void *memset(void *, int, size_t);
> +extern void *memcpy(void *, const void *, size_t);
> +extern size_t strlen(const char *);
> +
> +/* Various string construction helpers.  */
> +
> +/*
> +  The most useful at user-level are VECT_VAR and VECT_VAR_DECL, which
> +   construct variable names or declarations, such as:
> +   VECT_VAR(expected, int, 16, 4) -> expected_int16x4
> +   VECT_VAR_DECL(expected, int, 16, 4) -> int16x4_t expected_int16x4
> +*/
> +
> +#define xSTR(X) #X
> +#define STR(X) xSTR(X)
> +
> +#define xNAME1(V,T) V ## _ ##  T
> +#define xNAME(V,T) xNAME1(V,T)
> +
> +/* VAR(foo,int,16) -> foo_int16 */
> +#define VAR(V,T,W) xNAME(V,T##W)
> +/* VAR_DECL(foo,int,16) -> int16_t foo_int16 */
> +#define VAR_DECL(V, T, W) T##W##_t VAR(V,T,W)
> +
> +/* VECT_NAME(int,16,4) ->  int16x4 */
> +#define VECT_NAME(T, W, N) T##W##x##N
> +/* VECT_ARRAY_NAME(int,16,4,2) -> int16x4x2 */
> +#define VECT_ARRAY_NAME(T, W, N, L) T##W##x##N##x##L
> +/* VECT_TYPE(int,16,4) -> int16x4_t */
> +#define VECT_TYPE(T, W, N) xNAME(VECT_NAME(T,W,N),t)
> +/* VECT_ARRAY_TYPE(int,16,4,2) -> int16x4x2_t */
> +#define VECT_ARRAY_TYPE(T, W, N, L) xNAME(VECT_ARRAY_NAME(T,W,N,L),t)
> +
> +/* VECT_VAR(foo,int,16,4) -> foo_int16x4 */
> +#define VECT_VAR(V,T,W,N) xNAME(V,VECT_NAME(T,W,N))
> +/* VECT_VAR_DECL(foo,int,16,4) -> int16_t foo_int16x4 */
> +#define VECT_VAR_DECL(V, T, W, N) T##W##_t VECT_VAR(V,T,W,N)
> +
> +/* Array declarations.  */
> +/* ARRAY(foo,int,16,4) -> int16_t foo_int16x4[4] */
> +#define ARRAY(V, T, W, N) VECT_VAR_DECL(V,T,W,N)[N]
> +
> +/* Arrays of vectors.  */
> +/* VECT_ARRAY_VAR(foo,int,16,4,2) -> foo_int16x4x2 */
> +#define VECT_ARRAY_VAR(V,T,W,N,L) xNAME(V,VECT_ARRAY_NAME(T,W,N,L))
> +/* VECT_ARRAY(foo,int,16,4,2) -> int16_t foo_int16x4x2[4*2] */
> +#define VECT_ARRAY(V, T, W, N, L) T##W##_t VECT_ARRAY_VAR(V,T,W,N,L)[N*L]
> +
> +/* Check results vs expected values. Operates on one vector.  */
> +#define CHECK(MSG,T,W,N,FMT,EXPECTED,COMMENT)                          \
> +  {                                                                    \
> +    int i;                                                             \
> +    for(i=0; i<N ; i++)                                                        \
> +      {                                                                        \
> +       if (VECT_VAR(result, T, W, N)[i] !=                             \
> +           VECT_VAR(EXPECTED, T, W, N)[i]) {                           \
> +         fprintf(stderr,                                               \
> +                 "ERROR in %s (%s line %d in buffer '%s') at type %s " \
> +                 "index %d: got 0x%" FMT " != 0x%" FMT " %s\n",        \
> +                 MSG, __FILE__, __LINE__,                              \
> +                 STR(EXPECTED),                                        \
> +                 STR(VECT_NAME(T, W, N)),                              \
> +                 i,                                                    \
> +                 VECT_VAR(result, T, W, N)[i],                         \
> +                 VECT_VAR(EXPECTED, T, W, N)[i],                       \
> +                 strlen(COMMENT) > 0 ? COMMENT : "");                  \
> +         abort();                                                      \
> +       }                                                               \
> +      }                                                                        \
> +  }
> +
> +/* Floating-point variant.  */
> +#define CHECK_FP(MSG,T,W,N,FMT,EXPECTED,COMMENT)                       \
> +  {                                                                    \
> +    int i;                                                             \
> +    for(i=0; i<N ; i++)                                                        \
> +      {                                                                        \
> +       union fp_operand {                                              \
> +         uint##W##_t i;                                                \
> +         float##W##_t f;                                               \
> +       } tmp_res, tmp_exp;                                             \
> +       tmp_res.f = VECT_VAR(result, T, W, N)[i];                       \
> +       tmp_exp.i = VECT_VAR(EXPECTED, h##T, W, N)[i];                  \
> +       if (tmp_res.i != tmp_exp.i) {                                   \
> +         fprintf(stderr,                                               \
> +                 "ERROR in %s (%s line %d in buffer '%s') at type %s " \
> +                 "index %d: got 0x%" FMT " != 0x%" FMT " %s\n",        \
> +                 MSG, __FILE__, __LINE__,                              \
> +                 STR(EXPECTED),                                        \
> +                 STR(VECT_NAME(T, W, N)),                              \
> +                 i,                                                    \
> +                 tmp_res.i,                                            \
> +                 tmp_exp.i,                                            \
> +                 strlen(COMMENT) > 0 ? COMMENT : "");                  \
> +         abort();                                                      \
> +       }                                                               \
> +      }                                                                        \
> +  }
> +
> +/* Clean buffer with a non-zero pattern to help diagnose buffer
> +   overflows.  */
> +#define CLEAN_PATTERN_8  0x33
> +
> +#define CLEAN(VAR,T,W,N)                                               \
> +  memset(VECT_VAR(VAR, T, W, N),                                       \
> +        CLEAN_PATTERN_8,                                               \
> +        sizeof(VECT_VAR(VAR, T, W, N)));
> +
> +/* Define output buffers, one of each size.  */
> +static ARRAY(result, int, 8, 8);
> +static ARRAY(result, int, 16, 4);
> +static ARRAY(result, int, 32, 2);
> +static ARRAY(result, int, 64, 1);
> +static ARRAY(result, uint, 8, 8);
> +static ARRAY(result, uint, 16, 4);
> +static ARRAY(result, uint, 32, 2);
> +static ARRAY(result, uint, 64, 1);
> +static ARRAY(result, poly, 8, 8);
> +static ARRAY(result, poly, 16, 4);
> +static ARRAY(result, float, 32, 2);
> +static ARRAY(result, int, 8, 16);
> +static ARRAY(result, int, 16, 8);
> +static ARRAY(result, int, 32, 4);
> +static ARRAY(result, int, 64, 2);
> +static ARRAY(result, uint, 8, 16);
> +static ARRAY(result, uint, 16, 8);
> +static ARRAY(result, uint, 32, 4);
> +static ARRAY(result, uint, 64, 2);
> +static ARRAY(result, poly, 8, 16);
> +static ARRAY(result, poly, 16, 8);
> +static ARRAY(result, float, 32, 4);
> +
> +/* Declare expected results, one of each size. They are defined and
> +   initialized in each test file.  */
> +extern ARRAY(expected, int, 8, 8);
> +extern ARRAY(expected, int, 16, 4);
> +extern ARRAY(expected, int, 32, 2);
> +extern ARRAY(expected, int, 64, 1);
> +extern ARRAY(expected, uint, 8, 8);
> +extern ARRAY(expected, uint, 16, 4);
> +extern ARRAY(expected, uint, 32, 2);
> +extern ARRAY(expected, uint, 64, 1);
> +extern ARRAY(expected, poly, 8, 8);
> +extern ARRAY(expected, poly, 16, 4);
> +extern ARRAY(expected, hfloat, 32, 2);
> +extern ARRAY(expected, int, 8, 16);
> +extern ARRAY(expected, int, 16, 8);
> +extern ARRAY(expected, int, 32, 4);
> +extern ARRAY(expected, int, 64, 2);
> +extern ARRAY(expected, uint, 8, 16);
> +extern ARRAY(expected, uint, 16, 8);
> +extern ARRAY(expected, uint, 32, 4);
> +extern ARRAY(expected, uint, 64, 2);
> +extern ARRAY(expected, poly, 8, 16);
> +extern ARRAY(expected, poly, 16, 8);
> +extern ARRAY(expected, hfloat, 32, 4);
> +
> +/* Check results. Operates on all possible vector types.  */
> +#define CHECK_RESULTS(test_name,comment)                               \
> +  {                                                                    \
> +    CHECK(test_name, int, 8, 8, PRIx8, expected, comment);             \
> +    CHECK(test_name, int, 16, 4, PRIx16, expected, comment);           \
> +    CHECK(test_name, int, 32, 2, PRIx32, expected, comment);           \
> +    CHECK(test_name, int, 64, 1, PRIx64, expected, comment);           \
> +    CHECK(test_name, uint, 8, 8, PRIx8, expected, comment);            \
> +    CHECK(test_name, uint, 16, 4, PRIx16, expected, comment);          \
> +    CHECK(test_name, uint, 32, 2, PRIx32, expected, comment);          \
> +    CHECK(test_name, uint, 64, 1, PRIx64, expected, comment);          \
> +    CHECK(test_name, poly, 8, 8, PRIx8, expected, comment);            \
> +    CHECK(test_name, poly, 16, 4, PRIx16, expected, comment);          \
> +    CHECK_FP(test_name, float, 32, 2, PRIx32, expected, comment);      \
> +                                                                       \
> +    CHECK(test_name, int, 8, 16, PRIx8, expected, comment);            \
> +    CHECK(test_name, int, 16, 8, PRIx16, expected, comment);           \
> +    CHECK(test_name, int, 32, 4, PRIx32, expected, comment);           \
> +    CHECK(test_name, int, 64, 2, PRIx64, expected, comment);           \
> +    CHECK(test_name, uint, 8, 16, PRIx8, expected, comment);           \
> +    CHECK(test_name, uint, 16, 8, PRIx16, expected, comment);          \
> +    CHECK(test_name, uint, 32, 4, PRIx32, expected, comment);          \
> +    CHECK(test_name, uint, 64, 2, PRIx64, expected, comment);          \
> +    CHECK(test_name, poly, 8, 16, PRIx8, expected, comment);           \
> +    CHECK(test_name, poly, 16, 8, PRIx16, expected, comment);          \
> +    CHECK_FP(test_name, float, 32, 4, PRIx32, expected, comment);      \
> +  }                                                                    \
> +
> +#define CHECK_RESULTS_NAMED(test_name,EXPECTED,comment)                        \
> +  {                                                                    \
> +    CHECK(test_name, int, 8, 8, PRIx8, EXPECTED, comment);             \
> +    CHECK(test_name, int, 16, 4, PRIx16, EXPECTED, comment);           \
> +    CHECK(test_name, int, 32, 2, PRIx32, EXPECTED, comment);           \
> +    CHECK(test_name, int, 64, 1, PRIx64, EXPECTED, comment);           \
> +    CHECK(test_name, uint, 8, 8, PRIx8, EXPECTED, comment);            \
> +    CHECK(test_name, uint, 16, 4, PRIx16, EXPECTED, comment);          \
> +    CHECK(test_name, uint, 32, 2, PRIx32, EXPECTED, comment);          \
> +    CHECK(test_name, uint, 64, 1, PRIx64, EXPECTED, comment);          \
> +    CHECK(test_name, poly, 8, 8, PRIx8, EXPECTED, comment);            \
> +    CHECK(test_name, poly, 16, 4, PRIx16, EXPECTED, comment);          \
> +    CHECK_FP(test_name, float, 32, 2, PRIx32, EXPECTED, comment);      \
> +                                                                       \
> +    CHECK(test_name, int, 8, 16, PRIx8, EXPECTED, comment);            \
> +    CHECK(test_name, int, 16, 8, PRIx16, EXPECTED, comment);           \
> +    CHECK(test_name, int, 32, 4, PRIx32, EXPECTED, comment);           \
> +    CHECK(test_name, int, 64, 2, PRIx64, EXPECTED, comment);           \
> +    CHECK(test_name, uint, 8, 16, PRIx8, EXPECTED, comment);           \
> +    CHECK(test_name, uint, 16, 8, PRIx16, EXPECTED, comment);          \
> +    CHECK(test_name, uint, 32, 4, PRIx32, EXPECTED, comment);          \
> +    CHECK(test_name, uint, 64, 2, PRIx64, EXPECTED, comment);          \
> +    CHECK(test_name, poly, 8, 16, PRIx8, EXPECTED, comment);           \
> +    CHECK(test_name, poly, 16, 8, PRIx16, EXPECTED, comment);          \
> +    CHECK_FP(test_name, float, 32, 4, PRIx32, EXPECTED, comment);      \
> +  }                                                                    \
> +
> +
> +
> +#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
> +
> +typedef union {
> +  struct {
> +    int _xxx:27;
> +    unsigned int QC:1;
> +    int V:1;
> +    int C:1;
> +    int Z:1;
> +    int N:1;
> +  } b;
> +  unsigned int word;
> +} _ARM_FPSCR;
> +
> +#else /* __ORDER_BIG_ENDIAN__ */
> +
> +typedef union {
> +  struct {
> +    int N:1;
> +    int Z:1;
> +    int C:1;
> +    int V:1;
> +    unsigned int QC:1;
> +    int _dnm:27;
> +  } b;
> +  unsigned int word;
> +} _ARM_FPSCR;
> +
> +#endif /* __ORDER_BIG_ENDIAN__ */
> +
> +#define Neon_Cumulative_Sat  __read_neon_cumulative_sat()
> +#define Set_Neon_Cumulative_Sat(x)  __set_neon_cumulative_sat((x))
> +
> +#if defined(__aarch64__)
> +static volatile int __read_neon_cumulative_sat (void) {
> +    _ARM_FPSCR _afpscr_for_qc;
> +    asm volatile ("mrs %0,fpsr" : "=r" (_afpscr_for_qc));
> +    return _afpscr_for_qc.b.QC;
> +}
> +static void __set_neon_cumulative_sat (int x) {
> +    _ARM_FPSCR _afpscr_for_qc;
> +    asm volatile ("mrs %0,fpsr" : "=r" (_afpscr_for_qc));
> +    _afpscr_for_qc.b.QC = x;
> +    asm volatile ("msr fpsr,%0" : : "r" (_afpscr_for_qc));
> +    return;
> +}
> +#else
> +static volatile int __read_neon_cumulative_sat (void) {
> +    _ARM_FPSCR _afpscr_for_qc;
> +    asm volatile ("vmrs %0,fpscr" : "=r" (_afpscr_for_qc));
> +    return _afpscr_for_qc.b.QC;
> +}
> +
> +static void __set_neon_cumulative_sat (int x) {
> +    _ARM_FPSCR _afpscr_for_qc;
> +    asm volatile ("vmrs %0,fpscr" : "=r" (_afpscr_for_qc));
> +    _afpscr_for_qc.b.QC = x;
> +    asm volatile ("vmsr fpscr,%0" : : "r" (_afpscr_for_qc));
> +    return;
> +}
> +#endif
> +
> +/* Declare expected cumulative saturation results, one for each
> +   size. They are defined and initialized in relevant test files.  */
> +extern int VECT_VAR(expected_cumulative_sat, int, 8, 8);
> +extern int VECT_VAR(expected_cumulative_sat, int, 16, 4);
> +extern int VECT_VAR(expected_cumulative_sat, int, 32, 2);
> +extern int VECT_VAR(expected_cumulative_sat, int, 64, 1);
> +extern int VECT_VAR(expected_cumulative_sat, uint, 8, 8);
> +extern int VECT_VAR(expected_cumulative_sat, uint, 16, 4);
> +extern int VECT_VAR(expected_cumulative_sat, uint, 32, 2);
> +extern int VECT_VAR(expected_cumulative_sat, uint, 64, 1);
> +extern int VECT_VAR(expected_cumulative_sat, int, 8, 16);
> +extern int VECT_VAR(expected_cumulative_sat, int, 16, 8);
> +extern int VECT_VAR(expected_cumulative_sat, int, 32, 4);
> +extern int VECT_VAR(expected_cumulative_sat, int, 64, 2);
> +extern int VECT_VAR(expected_cumulative_sat, uint, 8, 16);
> +extern int VECT_VAR(expected_cumulative_sat, uint, 16, 8);
> +extern int VECT_VAR(expected_cumulative_sat, uint, 32, 4);
> +extern int VECT_VAR(expected_cumulative_sat, uint, 64, 2);
> +
> +/* Check cumulative saturation flag vs expected value.  */
> +#define CHECK_CUMULATIVE_SAT(MSG,T,W,N,EXPECTED,COMMENT)               \
> +  {                                                                    \
> +    if (Neon_Cumulative_Sat !=                                         \
> +       VECT_VAR(EXPECTED, T, W, N)) {                                  \
> +      fprintf(stderr,                                                  \
> +             "ERROR in %s (%s line %d in cumulative_sat '%s') at type %s: " \
> +             "got %d expected %d%s\n",                                 \
> +             MSG, __FILE__, __LINE__,                                  \
> +             STR(EXPECTED),                                            \
> +             STR(VECT_NAME(T, W, N)),                                  \
> +             Neon_Cumulative_Sat,                                      \
> +             VECT_VAR(EXPECTED, T, W, N),                              \
> +             strlen(COMMENT) > 0 ? " " COMMENT : "");                  \
> +      abort();                                                         \
> +    }                                                                  \
> +  }
> +
> +#define CHECK_CUMULATIVE_SAT_NAMED(test_name,EXPECTED,comment)         \
> +  {                                                                    \
> +    CHECK_CUMULATIVE_SAT(test_name, int, 8, 8, PRIx8, EXPECTED, comment); \
> +    CHECK_CUMULATIVE_SAT(test_name, int, 16, 4, PRIx16, EXPECTED, comment); \
> +    CHECK_CUMULATIVE_SAT(test_name, int, 32, 2, PRIx32, EXPECTED, comment); \
> +    CHECK_CUMULATIVE_SAT(test_name, int, 64, 1, PRIx64, EXPECTED, comment); \
> +    CHECK_CUMULATIVE_SAT(test_name, uint, 8, 8, PRIx8, EXPECTED, comment); \
> +    CHECK_CUMULATIVE_SAT(test_name, uint, 16, 4, PRIx16, EXPECTED, comment); \
> +    CHECK_CUMULATIVE_SAT(test_name, uint, 32, 2, PRIx32, EXPECTED, comment); \
> +    CHECK_CUMULATIVE_SAT(test_name, uint, 64, 1, PRIx64, EXPECTED, comment); \
> +    CHECK_CUMULATIVE_SAT(test_name, poly, 8, 8, PRIx8, EXPECTED, comment); \
> +    CHECK_CUMULATIVE_SAT(test_name, poly, 16, 4, PRIx16, EXPECTED, comment); \
> +    CHECK_CUMULATIVE_SAT_FP(test_name, float, 32, 2, PRIx32, EXPECTED, comment); \
> +                                                                       \
> +    CHECK_CUMULATIVE_SAT(test_name, int, 8, 16, PRIx8, EXPECTED, comment); \
> +    CHECK_CUMULATIVE_SAT(test_name, int, 16, 8, PRIx16, EXPECTED, comment); \
> +    CHECK_CUMULATIVE_SAT(test_name, int, 32, 4, PRIx32, EXPECTED, comment); \
> +    CHECK_CUMULATIVE_SAT(test_name, int, 64, 2, PRIx64, EXPECTED, comment); \
> +    CHECK_CUMULATIVE_SAT(test_name, uint, 8, 16, PRIx8, EXPECTED, comment); \
> +    CHECK_CUMULATIVE_SAT(test_name, uint, 16, 8, PRIx16, EXPECTED, comment); \
> +    CHECK_CUMULATIVE_SAT(test_name, uint, 32, 4, PRIx32, EXPECTED, comment); \
> +    CHECK_CUMULATIVE_SAT(test_name, uint, 64, 2, PRIx64, EXPECTED, comment); \
> +    CHECK_CUMULATIVE_SAT(test_name, poly, 8, 16, PRIx8, EXPECTED, comment); \
> +    CHECK_CUMULATIVE_SAT(test_name, poly, 16, 8, PRIx16, EXPECTED, comment); \
> +    CHECK_CUMULATIVE_SAT_FP(test_name, float, 32, 4, PRIx32, EXPECTED, comment); \
> +  }                                                                    \
> +
> +
> +/* Clean output buffers before execution.  */
> +static void clean_results (void)
> +{
> +  CLEAN(result, int, 8, 8);
> +  CLEAN(result, int, 16, 4);
> +  CLEAN(result, int, 32, 2);
> +  CLEAN(result, int, 64, 1);
> +  CLEAN(result, uint, 8, 8);
> +  CLEAN(result, uint, 16, 4);
> +  CLEAN(result, uint, 32, 2);
> +  CLEAN(result, uint, 64, 1);
> +  CLEAN(result, poly, 8, 8);
> +  CLEAN(result, poly, 16, 4);
> +  CLEAN(result, float, 32, 2);
> +
> +  CLEAN(result, int, 8, 16);
> +  CLEAN(result, int, 16, 8);
> +  CLEAN(result, int, 32, 4);
> +  CLEAN(result, int, 64, 2);
> +  CLEAN(result, uint, 8, 16);
> +  CLEAN(result, uint, 16, 8);
> +  CLEAN(result, uint, 32, 4);
> +  CLEAN(result, uint, 64, 2);
> +  CLEAN(result, poly, 8, 16);
> +  CLEAN(result, poly, 16, 8);
> +  CLEAN(result, float, 32, 4);
> +}
> +
> +
> +/* Helpers to declare variables of various types.   */
> +#define DECL_VARIABLE(VAR, T1, W, N)           \
> +  VECT_TYPE(T1, W, N) VECT_VAR(VAR, T1, W, N)
> +
> +/* Declare only 64 bits signed variants.  */
> +#define DECL_VARIABLE_64BITS_SIGNED_VARIANTS(VAR)      \
> +  DECL_VARIABLE(VAR, int, 8, 8);                       \
> +  DECL_VARIABLE(VAR, int, 16, 4);                      \
> +  DECL_VARIABLE(VAR, int, 32, 2);                      \
> +  DECL_VARIABLE(VAR, int, 64, 1)
> +
> +/* Declare only 64 bits unsigned variants.  */
> +#define DECL_VARIABLE_64BITS_UNSIGNED_VARIANTS(VAR)    \
> +  DECL_VARIABLE(VAR, uint, 8, 8);                      \
> +  DECL_VARIABLE(VAR, uint, 16, 4);                     \
> +  DECL_VARIABLE(VAR, uint, 32, 2);                     \
> +  DECL_VARIABLE(VAR, uint, 64, 1)
> +
> +/* Declare only 128 bits signed variants.  */
> +#define DECL_VARIABLE_128BITS_SIGNED_VARIANTS(VAR)     \
> +  DECL_VARIABLE(VAR, int, 8, 16);                      \
> +  DECL_VARIABLE(VAR, int, 16, 8);                      \
> +  DECL_VARIABLE(VAR, int, 32, 4);                      \
> +  DECL_VARIABLE(VAR, int, 64, 2)
> +
> +/* Declare only 128 bits unsigned variants.  */
> +#define DECL_VARIABLE_128BITS_UNSIGNED_VARIANTS(VAR)   \
> +  DECL_VARIABLE(VAR, uint, 8, 16);                     \
> +  DECL_VARIABLE(VAR, uint, 16, 8);                     \
> +  DECL_VARIABLE(VAR, uint, 32, 4);                     \
> +  DECL_VARIABLE(VAR, uint, 64, 2)
> +
> +/* Declare all 64 bits variants.  */
> +#define DECL_VARIABLE_64BITS_VARIANTS(VAR)     \
> +  DECL_VARIABLE_64BITS_SIGNED_VARIANTS(VAR);   \
> +  DECL_VARIABLE_64BITS_UNSIGNED_VARIANTS(VAR); \
> +  DECL_VARIABLE(VAR, poly, 8, 8);              \
> +  DECL_VARIABLE(VAR, poly, 16, 4);             \
> +  DECL_VARIABLE(VAR, float, 32, 2)
> +
> +/* Declare all 128 bits variants.  */
> +#define DECL_VARIABLE_128BITS_VARIANTS(VAR)    \
> +  DECL_VARIABLE_128BITS_SIGNED_VARIANTS(VAR);  \
> +  DECL_VARIABLE_128BITS_UNSIGNED_VARIANTS(VAR);        \
> +  DECL_VARIABLE(VAR, poly, 8, 16);             \
> +  DECL_VARIABLE(VAR, poly, 16, 8);             \
> +  DECL_VARIABLE(VAR, float, 32, 4)
> +
> +/* Declare all variants.  */
> +#define DECL_VARIABLE_ALL_VARIANTS(VAR)                \
> +  DECL_VARIABLE_64BITS_VARIANTS(VAR);          \
> +  DECL_VARIABLE_128BITS_VARIANTS(VAR)
> +
> +/* Declare all signed variants.  */
> +#define DECL_VARIABLE_SIGNED_VARIANTS(VAR)     \
> +  DECL_VARIABLE_64BITS_SIGNED_VARIANTS(VAR);   \
> +  DECL_VARIABLE_128BITS_SIGNED_VARIANTS(VAR)
> +
> +/* Declare all unsigned variants.  */
> +#define DECL_VARIABLE_UNSIGNED_VARIANTS(VAR)   \
> +  DECL_VARIABLE_64BITS_UNSIGNED_VARIANTS(VAR); \
> +  DECL_VARIABLE_128BITS_UNSIGNED_VARIANTS(VAR)
> +
> +/* Helpers to initialize vectors.  */
> +#define VDUP(VAR, Q, T1, T2, W, N, V)                  \
> +  VECT_VAR(VAR, T1, W, N) = vdup##Q##_n_##T2##W(V)
> +
> +#define VSET_LANE(VAR, Q, T1, T2, W, N, L, V)                          \
> +  VECT_VAR(VAR, T1, W, N) = vset##Q##_lane_##T2##W(V,                  \
> +                                                  VECT_VAR(VAR, T1, W, N), \
> +                                                  L)
> +
> +/* We need to load initial values first, so rely on VLD1.  */
> +#define VLOAD(VAR, BUF, Q, T1, T2, W, N)                               \
> +  VECT_VAR(VAR, T1, W, N) = vld1##Q##_##T2##W(VECT_VAR(BUF, T1, W, N))
> +
> +/* Helpers to call macros with 1 constant and 5 variable
> +   arguments.  */
> +#define TEST_MACRO_64BITS_SIGNED_VARIANTS_1_5(MACRO, VAR)      \
> +  MACRO(VAR, , int, s, 8, 8);                                  \
> +  MACRO(VAR, , int, s, 16, 4);                                 \
> +  MACRO(VAR, , int, s, 32, 2);                                 \
> +  MACRO(VAR, , int, s, 64, 1)
> +
> +#define TEST_MACRO_64BITS_UNSIGNED_VARIANTS_1_5(MACRO, VAR)    \
> +  MACRO(VAR, , uint, u, 8, 8);                                 \
> +  MACRO(VAR, , uint, u, 16, 4);                                        \
> +  MACRO(VAR, , uint, u, 32, 2);                                        \
> +  MACRO(VAR, , uint, u, 64, 1)
> +
> +#define TEST_MACRO_128BITS_SIGNED_VARIANTS_1_5(MACRO, VAR)     \
> +  MACRO(VAR, q, int, s, 8, 16);                                        \
> +  MACRO(VAR, q, int, s, 16, 8);                                        \
> +  MACRO(VAR, q, int, s, 32, 4);                                        \
> +  MACRO(VAR, q, int, s, 64, 2)
> +
> +#define TEST_MACRO_128BITS_UNSIGNED_VARIANTS_1_5(MACRO,VAR)    \
> +  MACRO(VAR, q, uint, u, 8, 16);                               \
> +  MACRO(VAR, q, uint, u, 16, 8);                               \
> +  MACRO(VAR, q, uint, u, 32, 4);                               \
> +  MACRO(VAR, q, uint, u, 64, 2)
> +
> +#define TEST_MACRO_64BITS_VARIANTS_1_5(MACRO, VAR)     \
> +  TEST_MACRO_64BITS_SIGNED_VARIANTS_1_5(MACRO, VAR);   \
> +  TEST_MACRO_64BITS_UNSIGNED_VARIANTS_1_5(MACRO, VAR)
> +
> +#define TEST_MACRO_128BITS_VARIANTS_1_5(MACRO, VAR)    \
> +  TEST_MACRO_128BITS_SIGNED_VARIANTS_1_5(MACRO, VAR);  \
> +  TEST_MACRO_128BITS_UNSIGNED_VARIANTS_1_5(MACRO, VAR)
> +
> +#define TEST_MACRO_ALL_VARIANTS_1_5(MACRO, VAR)        \
> +  TEST_MACRO_64BITS_VARIANTS_1_5(MACRO, VAR);  \
> +  TEST_MACRO_128BITS_VARIANTS_1_5(MACRO, VAR)
> +
> +#define TEST_MACRO_SIGNED_VARIANTS_1_5(MACRO, VAR)     \
> +  TEST_MACRO_64BITS_SIGNED_VARIANTS_1_5(MACRO, VAR);   \
> +  TEST_MACRO_128BITS_SIGNED_VARIANTS_1_5(MACRO, VAR)
> +
> +/* Helpers to call macros with 2 constant and 5 variable
> +   arguments.  */
> +#define TEST_MACRO_64BITS_SIGNED_VARIANTS_2_5(MACRO, VAR1, VAR2)       \
> +  MACRO(VAR1, VAR2, , int, s, 8, 8);                                   \
> +  MACRO(VAR1, VAR2, , int, s, 16, 4);                                  \
> +  MACRO(VAR1, VAR2, , int, s, 32, 2);                                  \
> +  MACRO(VAR1, VAR2 , , int, s, 64, 1)
> +
> +#define TEST_MACRO_64BITS_UNSIGNED_VARIANTS_2_5(MACRO, VAR1, VAR2)     \
> +  MACRO(VAR1, VAR2, , uint, u, 8, 8);                                  \
> +  MACRO(VAR1, VAR2, , uint, u, 16, 4);                                 \
> +  MACRO(VAR1, VAR2, , uint, u, 32, 2);                                 \
> +  MACRO(VAR1, VAR2, , uint, u, 64, 1)
> +
> +#define TEST_MACRO_128BITS_SIGNED_VARIANTS_2_5(MACRO, VAR1, VAR2)      \
> +  MACRO(VAR1, VAR2, q, int, s, 8, 16);                                 \
> +  MACRO(VAR1, VAR2, q, int, s, 16, 8);                                 \
> +  MACRO(VAR1, VAR2, q, int, s, 32, 4);                                 \
> +  MACRO(VAR1, VAR2, q, int, s, 64, 2)
> +
> +#define TEST_MACRO_128BITS_UNSIGNED_VARIANTS_2_5(MACRO, VAR1, VAR2)    \
> +  MACRO(VAR1, VAR2, q, uint, u, 8, 16);                                        \
> +  MACRO(VAR1, VAR2, q, uint, u, 16, 8);                                        \
> +  MACRO(VAR1, VAR2, q, uint, u, 32, 4);                                        \
> +  MACRO(VAR1, VAR2, q, uint, u, 64, 2)
> +
> +#define TEST_MACRO_64BITS_VARIANTS_2_5(MACRO, VAR1, VAR2)      \
> +  TEST_MACRO_64BITS_SIGNED_VARIANTS_2_5(MACRO, VAR1, VAR2);    \
> +  TEST_MACRO_64BITS_UNSIGNED_VARIANTS_2_5(MACRO, VAR1, VAR2);  \
> +  MACRO(VAR1, VAR2, , poly, p, 8, 8);                          \
> +  MACRO(VAR1, VAR2, , poly, p, 16, 4)
> +
> +#define TEST_MACRO_128BITS_VARIANTS_2_5(MACRO, VAR1, VAR2)     \
> +  TEST_MACRO_128BITS_SIGNED_VARIANTS_2_5(MACRO, VAR1, VAR2);   \
> +  TEST_MACRO_128BITS_UNSIGNED_VARIANTS_2_5(MACRO, VAR1, VAR2); \
> +  MACRO(VAR1, VAR2, q, poly, p, 8, 16);                                \
> +  MACRO(VAR1, VAR2, q, poly, p, 16, 8)
> +
> +#define TEST_MACRO_ALL_VARIANTS_2_5(MACRO, VAR1, VAR2) \
> +  TEST_MACRO_64BITS_VARIANTS_2_5(MACRO, VAR1, VAR2);   \
> +  TEST_MACRO_128BITS_VARIANTS_2_5(MACRO, VAR1, VAR2)
> +
> +#define TEST_MACRO_SIGNED_VARIANTS_2_5(MACRO, VAR1, VAR2)      \
> +  TEST_MACRO_64BITS_SIGNED_VARIANTS_2_5(MACRO, VAR1, VAR2);    \
> +  TEST_MACRO_128BITS_SIGNED_VARIANTS_2_5(MACRO, VAR1, VAR2)
> +
> +#endif /* _ARM_NEON_REF_H_ */
> diff --git a/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/compute-ref-data.h b/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/compute-ref-data.h
> new file mode 100644
> index 0000000..3673d73
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/compute-ref-data.h
> @@ -0,0 +1,204 @@
> +/* This file contains input data static definitions, shared by most of
> +   the tests.  */
> +
> +#include <arm_neon.h>
> +#include "arm-neon-ref.h"
> +
> +/* Initialization helpers; 4 slices are needed for vld2, vld3 and
> +   vld4.  */
> +#define MY_INIT_TABLE(T,W,N) xNAME(INIT_TABLE,N)(T##W##_t)
> +#define MY_INIT_TABLE2(T,W,N) xNAME(INIT_TABLE2,N)(T##W##_t)
> +#define MY_INIT_TABLE3(T,W,N) xNAME(INIT_TABLE3,N)(T##W##_t)
> +#define MY_INIT_TABLE4(T,W,N) xNAME(INIT_TABLE4,N)(T##W##_t)
> +
> +/* Initialized input buffers.  */
> +#define VECT_VAR_DECL_INIT(V, T, W, N)                 \
> +  VECT_VAR_DECL(V,T,W,N) [] = { MY_INIT_TABLE(T,W,N) }
> +
> +/* Specialized initializer with 4 entries, as used by vldX_dup and
> +   vdup tests, which iterate 4 times on input buffers.  */
> +#define VECT_VAR_DECL_INIT4(V, T, W, N)                        \
> +  VECT_VAR_DECL(V,T,W,N) [] = { MY_INIT_TABLE(T,W,4) };
> +
> +/* Initializers for arrays of vectors.  */
> +#define VECT_ARRAY_INIT2(V, T, W, N)           \
> +  T##W##_t VECT_ARRAY_VAR(V,T,W,N,2)[] =       \
> +  { MY_INIT_TABLE(T,W,N)                       \
> +    MY_INIT_TABLE2(T,W,N) }
> +
> +#define VECT_ARRAY_INIT3(V, T, W, N)                   \
> +  T##W##_t VECT_ARRAY_VAR(V,T,W,N,3)[] =               \
> +  { MY_INIT_TABLE(T,W,N)                               \
> +    MY_INIT_TABLE2(T,W,N)                              \
> +    MY_INIT_TABLE3(T,W,N) }
> +
> +#define VECT_ARRAY_INIT4(V, T, W, N)                   \
> +  T##W##_t VECT_ARRAY_VAR(V,T,W,N,4)[] =               \
> +  { MY_INIT_TABLE(T,W,N)                               \
> +    MY_INIT_TABLE2(T,W,N)                              \
> +    MY_INIT_TABLE3(T,W,N)                              \
> +    MY_INIT_TABLE4(T,W,N) }
> +
> +/* Sample initialization vectors.  */
> +#define INIT_TABLE_1(T)                                \
> +  (T)-16,
> +#define INIT_TABLE2_1(T)                       \
> +  (T)-15,
> +#define INIT_TABLE3_1(T)                       \
> +  (T)-14,
> +#define INIT_TABLE4_1(T)                       \
> +  (T)-13,
> +
> +#define INIT_TABLE_2(T)                                \
> +  (T)-16, (T)-15,
> +#define INIT_TABLE2_2(T)                       \
> +  (T)-14, (T)-13,
> +#define INIT_TABLE3_2(T)                       \
> +  (T)-12, (T)-11,
> +#define INIT_TABLE4_2(T)                       \
> +  (T)-10, (T)-9,
> +
> +/* Initializer for vld3_lane tests.  */
> +#define INIT_TABLE_3(T)                                \
> +  (T)-16, (T)-15, (T)-14,
> +
> +#define INIT_TABLE_4(T)                                \
> +  (T)-16, (T)-15, (T)-14, (T)-13,
> +#define INIT_TABLE2_4(T)                       \
> +  (T)-12, (T)-11, (T)-10, (T)-9,
> +#define INIT_TABLE3_4(T)                       \
> +  (T)-8, (T)-7, (T)-6, (T)-5,
> +#define INIT_TABLE4_4(T)                       \
> +  (T)-4, (T)-3, (T)-2, (T)-1,
> +
> +#define INIT_TABLE_8(T)                                                        \
> +  (T)-16, (T)-15, (T)-14, (T)-13, (T)-12, (T)-11, (T)-10, (T)-9,
> +#define INIT_TABLE2_8(T)                                       \
> +  (T)-8, (T)-7, (T)-6, (T)-5, (T)-4, (T)-3, (T)-2, (T)-1,
> +#define INIT_TABLE3_8(T)                               \
> +  (T)0, (T)1, (T)2, (T)3, (T)4, (T)5, (T)6, (T)7,
> +#define INIT_TABLE4_8(T)                               \
> +  (T)8, (T)9, (T)10, (T)11, (T)12, (T)13, (T)14, (T)15,
> +
> +#define INIT_TABLE_16(T)                                               \
> +  (T)-16, (T)-15, (T)-14, (T)-13, (T)-12, (T)-11, (T)-10, (T)-9,       \
> +  (T)-8, (T)-7, (T)-6, (T)-5, (T)-4, (T)-3, (T)-2, (T)-1,
> +#define INIT_TABLE2_16(T)                                              \
> +  (T)0, (T)1, (T)2, (T)3, (T)4, (T)5, (T)6, (T)7,                      \
> +  (T)8, (T)9, (T)10, (T)11, (T)12, (T)13, (T)14, (T)15,
> +#define INIT_TABLE3_16(T)                                              \
> +  (T)16, (T)17, (T)18, (T)19, (T)20, (T)21, (T)22, (T)23,              \
> +   (T)24, (T)25, (T)26, (T)27, (T)28, (T)29, (T)30, (T)31,
> +#define INIT_TABLE4_16(T)                                              \
> +  (T)32, (T)33, (T)34, (T)35, (T)36, (T)37, (T)38, (T)39,              \
> +  (T)40, (T)41, (T)42, (T)43, (T)44, (T)45, (T)46, (T)47,
> +
> +/* This one is used for padding between input buffers.  */
> +#define PAD(V, T, W, N) char VECT_VAR(V,T,W,N)=42
> +
> +/* Input buffers, one of each size.  */
> +/* Insert some padding to try to exhibit out of bounds accesses.  */
> +VECT_VAR_DECL_INIT(buffer, int, 8, 8);
> +PAD(buffer_pad, int, 8, 8);
> +VECT_VAR_DECL_INIT(buffer, int, 16, 4);
> +PAD(buffer_pad, int, 16, 4);
> +VECT_VAR_DECL_INIT(buffer, int, 32, 2);
> +PAD(buffer_pad, int, 32, 2);
> +VECT_VAR_DECL_INIT(buffer, int, 64, 1);
> +PAD(buffer_pad, int, 64, 1);
> +VECT_VAR_DECL_INIT(buffer, uint, 8, 8);
> +PAD(buffer_pad, uint, 8, 8);
> +VECT_VAR_DECL_INIT(buffer, poly, 8, 8);
> +PAD(buffer_pad, poly, 8, 8);
> +VECT_VAR_DECL_INIT(buffer, poly, 16, 4);
> +PAD(buffer_pad, poly, 16, 4);
> +VECT_VAR_DECL_INIT(buffer, uint, 16, 4);
> +PAD(buffer_pad, uint, 16, 4);
> +VECT_VAR_DECL_INIT(buffer, uint, 32, 2);
> +PAD(buffer_pad, uint, 32, 2);
> +VECT_VAR_DECL_INIT(buffer, uint, 64, 1);
> +PAD(buffer_pad, uint, 64, 1);
> +VECT_VAR_DECL_INIT(buffer, float, 32, 2);
> +PAD(buffer_pad, float, 32, 2);
> +VECT_VAR_DECL_INIT(buffer, int, 8, 16);
> +PAD(buffer_pad, int, 8, 16);
> +VECT_VAR_DECL_INIT(buffer, int, 16, 8);
> +PAD(buffer_pad, int, 16, 8);
> +VECT_VAR_DECL_INIT(buffer, int, 32, 4);
> +PAD(buffer_pad, int, 32, 4);
> +VECT_VAR_DECL_INIT(buffer, int, 64, 2);
> +PAD(buffer_pad, int, 64, 2);
> +VECT_VAR_DECL_INIT(buffer, uint, 8, 16);
> +PAD(buffer_pad, uint, 8, 16);
> +VECT_VAR_DECL_INIT(buffer, uint, 16, 8);
> +PAD(buffer_pad, uint, 16, 8);
> +VECT_VAR_DECL_INIT(buffer, uint, 32, 4);
> +PAD(buffer_pad, uint, 32, 4);
> +VECT_VAR_DECL_INIT(buffer, uint, 64, 2);
> +PAD(buffer_pad, uint, 64, 2);
> +VECT_VAR_DECL_INIT(buffer, poly, 8, 16);
> +PAD(buffer_pad, poly, 8, 16);
> +VECT_VAR_DECL_INIT(buffer, poly, 16, 8);
> +PAD(buffer_pad, poly, 16, 8);
> +VECT_VAR_DECL_INIT(buffer, float, 32, 4);
> +PAD(buffer_pad, float, 32, 4);
> +
> +/* The tests for vld1_dup and vdup expect at least 4 entries in the
> +   input buffer, so force 1- and 2-elements initializers to have 4
> +   entries (using VECT_VAR_DECL_INIT4).  */
> +VECT_VAR_DECL_INIT(buffer_dup, int, 8, 8);
> +VECT_VAR_DECL(buffer_dup_pad, int, 8, 8);
> +VECT_VAR_DECL_INIT(buffer_dup, int, 16, 4);
> +VECT_VAR_DECL(buffer_dup_pad, int, 16, 4);
> +VECT_VAR_DECL_INIT4(buffer_dup, int, 32, 2);
> +VECT_VAR_DECL(buffer_dup_pad, int, 32, 2);
> +VECT_VAR_DECL_INIT4(buffer_dup, int, 64, 1);
> +VECT_VAR_DECL(buffer_dup_pad, int, 64, 1);
> +VECT_VAR_DECL_INIT(buffer_dup, uint, 8, 8);
> +VECT_VAR_DECL(buffer_dup_pad, uint, 8, 8);
> +VECT_VAR_DECL_INIT(buffer_dup, uint, 16, 4);
> +VECT_VAR_DECL(buffer_dup_pad, uint, 16, 4);
> +VECT_VAR_DECL_INIT4(buffer_dup, uint, 32, 2);
> +VECT_VAR_DECL(buffer_dup_pad, uint, 32, 2);
> +VECT_VAR_DECL_INIT4(buffer_dup, uint, 64, 1);
> +VECT_VAR_DECL(buffer_dup_pad, uint, 64, 1);
> +VECT_VAR_DECL_INIT(buffer_dup, poly, 8, 8);
> +VECT_VAR_DECL(buffer_dup_pad, poly, 8, 8);
> +VECT_VAR_DECL_INIT(buffer_dup, poly, 16, 4);
> +VECT_VAR_DECL(buffer_dup_pad, poly, 16, 4);
> +VECT_VAR_DECL_INIT4(buffer_dup, float, 32, 2);
> +VECT_VAR_DECL(buffer_dup_pad, float, 32, 2);
> +#if __ARM_NEON_FP16_INTRINSICS
> +VECT_VAR_DECL(buffer_dup, float, 16, 4)[] = {0xcc00 /* -16 */, 0xcb80 /* -15 */,
> +                                            0xcb00 /* -14 */, 0xca80 /* -13 */};
> +PAD(buffer_dup_pad, float, 16, 4);
> +#endif
> +VECT_VAR_DECL_INIT(buffer_dup, int, 8, 16);
> +VECT_VAR_DECL(buffer_dup_pad, int, 8, 16);
> +VECT_VAR_DECL_INIT(buffer_dup, int, 16, 8);
> +VECT_VAR_DECL(buffer_dup_pad, int, 16, 8);
> +VECT_VAR_DECL_INIT(buffer_dup, int, 32, 4);
> +VECT_VAR_DECL(buffer_dup_pad, int, 32, 4);
> +VECT_VAR_DECL_INIT4(buffer_dup, int, 64, 2);
> +VECT_VAR_DECL(buffer_dup_pad, int, 64, 2);
> +VECT_VAR_DECL_INIT(buffer_dup, uint, 8, 16);
> +VECT_VAR_DECL(buffer_dup_pad, uint, 8, 16);
> +VECT_VAR_DECL_INIT(buffer_dup, uint, 16, 8);
> +VECT_VAR_DECL(buffer_dup_pad, uint, 16, 8);
> +VECT_VAR_DECL_INIT(buffer_dup, uint, 32, 4);
> +VECT_VAR_DECL(buffer_dup_pad, uint, 32, 4);
> +VECT_VAR_DECL_INIT4(buffer_dup, uint, 64, 2);
> +VECT_VAR_DECL(buffer_dup_pad, uint, 64, 2);
> +VECT_VAR_DECL_INIT(buffer_dup, poly, 8, 16);
> +VECT_VAR_DECL(buffer_dup_pad, poly, 8, 16);
> +VECT_VAR_DECL_INIT(buffer_dup, poly, 16, 8);
> +VECT_VAR_DECL(buffer_dup_pad, poly, 16, 8);
> +VECT_VAR_DECL_INIT(buffer_dup, float, 32, 4);
> +VECT_VAR_DECL(buffer_dup_pad, float, 32, 4);
> +#if __ARM_NEON_FP16_INTRINSICS
> +VECT_VAR_DECL(buffer_dup, float, 16, 8)[] = {0xcc00 /* -16 */, 0xcb80 /* -15 */,
> +                                            0xcb00 /* -14 */, 0xca80 /* -13 */,
> +                                            0xca00 /* -12 */, 0xc980 /* -11 */,
> +                                            0xc900 /* -10 */, 0xc880 /* -9 */};
> +PAD(buffer_dup_pad, float, 16, 8);
> +#endif
> diff --git a/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/neon-intrinsics.exp b/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/neon-intrinsics.exp
> new file mode 100644
> index 0000000..38bd207
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/neon-intrinsics.exp
> @@ -0,0 +1,54 @@
> +# Copyright (C) 2013 Free Software Foundation, Inc.
> +
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with GCC; see the file COPYING3.  If not see
> +# <http://www.gnu.org/licenses/>.
> +
> +# GCC testsuite that uses the `dg.exp' driver.
> +
> +# Exit immediately if this isn't an ARM or AArch64 target.
> +if {![istarget arm*-*-*]
> +    && ![istarget aarch64*-*-*]} then {
> +  return
> +}
> +
> +# Load support procs.
> +load_lib gcc-dg.exp
> +
> +# Initialize `dg'.
> +load_lib c-torture.exp
> +load_lib target-supports.exp
> +load_lib torture-options.exp
> +
> +dg-init
> +
> +torture-init
> +set-torture-options $C_TORTURE_OPTIONS {{}} $LTO_TORTURE_OPTIONS
> +
> +# Make sure Neon flags are provided, if necessary.
> +set additional_flags [add_options_for_arm_neon ""]
> +
> +# Main loop.
> +foreach src [lsort [glob -nocomplain $srcdir/$subdir/*.c]] {
> +    # If we're only testing specific files and this isn't one of them, skip it.
> +    if ![runtest_file_p $runtests $src] then {
> +        continue
> +    }
> +
> +    c-torture-execute $src $additional_flags
> +    gcc-dg-runtest $src $additional_flags
> +}
> +
> +# All done.
> +torture-finish
> +dg-finish
> \ No newline at end of file
> diff --git a/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/vaba.c b/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/vaba.c
> new file mode 100644
> index 0000000..2465cd2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/vaba.c
> @@ -0,0 +1,142 @@
> +#include <arm_neon.h>
> +#include "arm-neon-ref.h"
> +#include "compute-ref-data.h"
> +
> +/* Expected results.  */
> +VECT_VAR_DECL(expected,int,8,8) [] = { 0xf6, 0xf7, 0xf8, 0xf9,
> +                                      0xfa, 0xfb, 0xfc, 0xfd };
> +VECT_VAR_DECL(expected,int,16,4) [] = { 0x16, 0x17, 0x18, 0x19 };
> +VECT_VAR_DECL(expected,int,32,2) [] = { 0x20, 0x21 };
> +VECT_VAR_DECL(expected,int,64,1) [] = { 0x3333333333333333 };
> +VECT_VAR_DECL(expected,uint,8,8) [] = { 0x53, 0x54, 0x55, 0x56,
> +                                       0x57, 0x58, 0x59, 0x5a };
> +VECT_VAR_DECL(expected,uint,16,4) [] = { 0x907, 0x908, 0x909, 0x90a };
> +VECT_VAR_DECL(expected,uint,32,2) [] = { 0xffffffe7, 0xffffffe8 };
> +VECT_VAR_DECL(expected,uint,64,1) [] = { 0x3333333333333333 };
> +VECT_VAR_DECL(expected,poly,8,8) [] = { 0x33, 0x33, 0x33, 0x33,
> +                                       0x33, 0x33, 0x33, 0x33 };
> +VECT_VAR_DECL(expected,poly,16,4) [] = { 0x3333, 0x3333, 0x3333, 0x3333 };
> +VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0x33333333, 0x33333333 };
> +VECT_VAR_DECL(expected,int,8,16) [] = { 0x5e, 0x5f, 0x60, 0x61,
> +                                       0x62, 0x63, 0x64, 0x65,
> +                                       0x66, 0x67, 0x68, 0x69,
> +                                       0x6a, 0x6b, 0x6c, 0x6d };
> +VECT_VAR_DECL(expected,int,16,8) [] = { 0xb9c, 0xb9d, 0xb9e, 0xb9f,
> +                                       0xba0, 0xba1, 0xba2, 0xba3 };
> +VECT_VAR_DECL(expected,int,32,4) [] = { 0x26e0, 0x26e1, 0x26e2, 0x26e3 };
> +VECT_VAR_DECL(expected,int,64,2) [] = { 0x3333333333333333,
> +                                       0x3333333333333333 };
> +VECT_VAR_DECL(expected,uint,8,16) [] = { 0xf8, 0xf9, 0xfa, 0xfb,
> +                                        0xfc, 0xfd, 0xfe, 0xff,
> +                                        0x0, 0x1, 0x2, 0x3,
> +                                        0x4, 0x5, 0x6, 0x7 };
> +VECT_VAR_DECL(expected,uint,16,8) [] = { 0xfff9, 0xfffa, 0xfffb, 0xfffc,
> +                                        0xfffd, 0xfffe, 0xffff, 0x0 };
> +VECT_VAR_DECL(expected,uint,32,4) [] = { 0xc, 0xd, 0xe, 0xf };
> +VECT_VAR_DECL(expected,uint,64,2) [] = { 0x3333333333333333,
> +                                        0x3333333333333333 };
> +VECT_VAR_DECL(expected,poly,8,16) [] = { 0x33, 0x33, 0x33, 0x33,
> +                                        0x33, 0x33, 0x33, 0x33,
> +                                        0x33, 0x33, 0x33, 0x33,
> +                                        0x33, 0x33, 0x33, 0x33 };
> +VECT_VAR_DECL(expected,poly,16,8) [] = { 0x3333, 0x3333, 0x3333, 0x3333,
> +                                        0x3333, 0x3333, 0x3333, 0x3333 };
> +VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0x33333333, 0x33333333,
> +                                          0x33333333, 0x33333333 };
> +
> +#define TEST_MSG "VABA/VABAQ"
> +void exec_vaba (void)
> +{
> +  /* Basic test: v4=vaba(v1,v2,v3), then store the result.  */
> +#define TEST_VABA(Q, T1, T2, W, N)                                     \
> +  VECT_VAR(vector_res, T1, W, N) =                                     \
> +    vaba##Q##_##T2##W(VECT_VAR(vector1, T1, W, N),                     \
> +                     VECT_VAR(vector2, T1, W, N),                      \
> +                     VECT_VAR(vector3, T1, W, N));                     \
> +  vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(vector_res, T1, W, N))
> +
> +#define DECL_VABA_VAR(VAR)                     \
> +  DECL_VARIABLE(VAR, int, 8, 8);               \
> +  DECL_VARIABLE(VAR, int, 16, 4);              \
> +  DECL_VARIABLE(VAR, int, 32, 2);              \
> +  DECL_VARIABLE(VAR, uint, 8, 8);              \
> +  DECL_VARIABLE(VAR, uint, 16, 4);             \
> +  DECL_VARIABLE(VAR, uint, 32, 2);             \
> +  DECL_VARIABLE(VAR, int, 8, 16);              \
> +  DECL_VARIABLE(VAR, int, 16, 8);              \
> +  DECL_VARIABLE(VAR, int, 32, 4);              \
> +  DECL_VARIABLE(VAR, uint, 8, 16);             \
> +  DECL_VARIABLE(VAR, uint, 16, 8);             \
> +  DECL_VARIABLE(VAR, uint, 32, 4)
> +
> +  DECL_VABA_VAR(vector1);
> +  DECL_VABA_VAR(vector2);
> +  DECL_VABA_VAR(vector3);
> +  DECL_VABA_VAR(vector_res);
> +
> +  clean_results ();
> +
> +  /* Initialize input "vector1" from "buffer".  */
> +  VLOAD(vector1, buffer, , int, s, 8, 8);
> +  VLOAD(vector1, buffer, , int, s, 16, 4);
> +  VLOAD(vector1, buffer, , int, s, 32, 2);
> +  VLOAD(vector1, buffer, , uint, u, 8, 8);
> +  VLOAD(vector1, buffer, , uint, u, 16, 4);
> +  VLOAD(vector1, buffer, , uint, u, 32, 2);
> +  VLOAD(vector1, buffer, q, int, s, 8, 16);
> +  VLOAD(vector1, buffer, q, int, s, 16, 8);
> +  VLOAD(vector1, buffer, q, int, s, 32, 4);
> +  VLOAD(vector1, buffer, q, uint, u, 8, 16);
> +  VLOAD(vector1, buffer, q, uint, u, 16, 8);
> +  VLOAD(vector1, buffer, q, uint, u, 32, 4);
> +
> +  /* Choose init value arbitrarily.  */
> +  VDUP(vector2, , int, s, 8, 8, 1);
> +  VDUP(vector2, , int, s, 16, 4, -13);
> +  VDUP(vector2, , int, s, 32, 2, 8);
> +  VDUP(vector2, , uint, u, 8, 8, 1);
> +  VDUP(vector2, , uint, u, 16, 4, 13);
> +  VDUP(vector2, , uint, u, 32, 2, 8);
> +  VDUP(vector2, q, int, s, 8, 16, 10);
> +  VDUP(vector2, q, int, s, 16, 8, -12);
> +  VDUP(vector2, q, int, s, 32, 4, 32);
> +  VDUP(vector2, q, uint, u, 8, 16, 10);
> +  VDUP(vector2, q, uint, u, 16, 8, 12);
> +  VDUP(vector2, q, uint, u, 32, 4, 32);
> +
> +  /* Choose init value arbitrarily.  */
> +  VDUP(vector3, , int, s, 8, 8, -5);
> +  VDUP(vector3, , int, s, 16, 4, 25);
> +  VDUP(vector3, , int, s, 32, 2, -40);
> +  VDUP(vector3, , uint, u, 8, 8, 100);
> +  VDUP(vector3, , uint, u, 16, 4, 2340);
> +  VDUP(vector3, , uint, u, 32, 2, 0xffffffff);
> +  VDUP(vector3, q, int, s, 8, 16, -100);
> +  VDUP(vector3, q, int, s, 16, 8, -3000);
> +  VDUP(vector3, q, int, s, 32, 4, 10000);
> +  VDUP(vector3, q, uint, u, 8, 16, 2);
> +  VDUP(vector3, q, uint, u, 16, 8, 3);
> +  VDUP(vector3, q, uint, u, 32, 4, 4);
> +
> +  /* Execute the tests.  */
> +  TEST_VABA(, int, s, 8, 8);
> +  TEST_VABA(, int, s, 16, 4);
> +  TEST_VABA(, int, s, 32, 2);
> +  TEST_VABA(, uint, u, 8, 8);
> +  TEST_VABA(, uint, u, 16, 4);
> +  TEST_VABA(, uint, u, 32, 2);
> +  TEST_VABA(q, int, s, 8, 16);
> +  TEST_VABA(q, int, s, 16, 8);
> +  TEST_VABA(q, int, s, 32, 4);
> +  TEST_VABA(q, uint, u, 8, 16);
> +  TEST_VABA(q, uint, u, 16, 8);
> +  TEST_VABA(q, uint, u, 32, 4);
> +
> +  CHECK_RESULTS (TEST_MSG, "");
> +}
> +
> +int main (void)
> +{
> +  exec_vaba ();
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/vld1.c b/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/vld1.c
> new file mode 100644
> index 0000000..784e004
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/vld1.c
> @@ -0,0 +1,84 @@
> +#include <arm_neon.h>
> +#include "arm-neon-ref.h"
> +#include "compute-ref-data.h"
> +
> +/* Expected results.  */
> +VECT_VAR_DECL(expected,int,8,8) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
> +                                      0xf4, 0xf5, 0xf6, 0xf7 };
> +VECT_VAR_DECL(expected,int,16,4) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3 };
> +VECT_VAR_DECL(expected,int,32,2) [] = { 0xfffffff0, 0xfffffff1 };
> +VECT_VAR_DECL(expected,int,64,1) [] = { 0xfffffffffffffff0 };
> +VECT_VAR_DECL(expected,uint,8,8) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
> +                                       0xf4, 0xf5, 0xf6, 0xf7 };
> +VECT_VAR_DECL(expected,uint,16,4) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3 };
> +VECT_VAR_DECL(expected,uint,32,2) [] = { 0xfffffff0, 0xfffffff1 };
> +VECT_VAR_DECL(expected,uint,64,1) [] = { 0xfffffffffffffff0 };
> +VECT_VAR_DECL(expected,poly,8,8) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
> +                                       0xf4, 0xf5, 0xf6, 0xf7 };
> +VECT_VAR_DECL(expected,poly,16,4) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3 };
> +VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0xc1800000, 0xc1700000 };
> +VECT_VAR_DECL(expected,int,8,16) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
> +                                       0xf4, 0xf5, 0xf6, 0xf7,
> +                                       0xf8, 0xf9, 0xfa, 0xfb,
> +                                       0xfc, 0xfd, 0xfe, 0xff };
> +VECT_VAR_DECL(expected,int,16,8) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3,
> +                                       0xfff4, 0xfff5, 0xfff6, 0xfff7 };
> +VECT_VAR_DECL(expected,int,32,4) [] = { 0xfffffff0, 0xfffffff1,
> +                                       0xfffffff2, 0xfffffff3 };
> +VECT_VAR_DECL(expected,int,64,2) [] = { 0xfffffffffffffff0,
> +                                       0xfffffffffffffff1 };
> +VECT_VAR_DECL(expected,uint,8,16) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
> +                                        0xf4, 0xf5, 0xf6, 0xf7,
> +                                        0xf8, 0xf9, 0xfa, 0xfb,
> +                                        0xfc, 0xfd, 0xfe, 0xff };
> +VECT_VAR_DECL(expected,uint,16,8) [] = { 0xfff0, 0xfff1, 0xfff2,
> +                                        0xfff3, 0xfff4, 0xfff5,
> +                                        0xfff6, 0xfff7 };
> +VECT_VAR_DECL(expected,uint,32,4) [] = { 0xfffffff0, 0xfffffff1,
> +                                        0xfffffff2, 0xfffffff3 };
> +VECT_VAR_DECL(expected,uint,64,2) [] = { 0xfffffffffffffff0,
> +                                        0xfffffffffffffff1 };
> +VECT_VAR_DECL(expected,poly,8,16) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
> +                                        0xf4, 0xf5, 0xf6, 0xf7,
> +                                        0xf8, 0xf9, 0xfa, 0xfb,
> +                                        0xfc, 0xfd, 0xfe, 0xff };
> +VECT_VAR_DECL(expected,poly,16,8) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3,
> +                                        0xfff4, 0xfff5, 0xfff6, 0xfff7 };
> +VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0xc1800000, 0xc1700000,
> +                                          0xc1600000, 0xc1500000 };
> +
> +#define TEST_MSG "VLD1/VLD1Q"
> +void exec_vld1 (void)
> +{
> +  /* Basic test vec=vld1(buffer); then store vec: vst1(result, vector).  */
> +  /* This test actually tests vdl1 and vst1 at the same time.  */
> +#define TEST_VLD1(VAR, BUF, Q, T1, T2, W, N)                           \
> +  VECT_VAR(VAR, T1, W, N) = vld1##Q##_##T2##W(VECT_VAR(BUF, T1, W, N)); \
> +  vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(VAR, T1, W, N))
> +
> +  DECL_VARIABLE_ALL_VARIANTS(vector);
> +#if __ARM_NEON_FP16_INTRINSICS
> +  DECL_VARIABLE(vector, float, 16, 4);
> +  DECL_VARIABLE(vector, float, 16, 8);
> +#endif
> +
> +  clean_results ();
> +
> +  TEST_MACRO_ALL_VARIANTS_2_5(TEST_VLD1, vector, buffer);
> +
> +  TEST_VLD1(vector, buffer, , float, f, 32, 2);
> +  TEST_VLD1(vector, buffer, q, float, f, 32, 4);
> +
> +#if __ARM_NEON_FP16_INTRINSICS
> +  TEST_VLD1(vector, buffer, , float, f, 16, 4);
> +  TEST_VLD1(vector, buffer, q, float, f, 16, 8);
> +#endif
> +
> +  CHECK_RESULTS (TEST_MSG, "");
> +}
> +
> +int main (void)
> +{
> +  exec_vld1 ();
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/vshl.c b/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/vshl.c
> new file mode 100644
> index 0000000..e64d6e3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/vshl.c
> @@ -0,0 +1,230 @@
> +#include <arm_neon.h>
> +#include "arm-neon-ref.h"
> +#include "compute-ref-data.h"
> +
> +/* Expected results.  */
> +VECT_VAR_DECL(expected,int,8,8) [] = { 0xe0, 0xe2, 0xe4, 0xe6,
> +                                      0xe8, 0xea, 0xec, 0xee };
> +VECT_VAR_DECL(expected,int,16,4) [] = { 0xff80, 0xff88, 0xff90, 0xff98 };
> +VECT_VAR_DECL(expected,int,32,2) [] = { 0xfffff000, 0xfffff100 };
> +VECT_VAR_DECL(expected,int,64,1) [] = { 0xffffffffffffff80 };
> +VECT_VAR_DECL(expected,uint,8,8) [] = { 0xe0, 0xe2, 0xe4, 0xe6,
> +                                       0xe8, 0xea, 0xec, 0xee };
> +VECT_VAR_DECL(expected,uint,16,4) [] = { 0xff80, 0xff88, 0xff90, 0xff98 };
> +VECT_VAR_DECL(expected,uint,32,2) [] = { 0xfffff000, 0xfffff100 };
> +VECT_VAR_DECL(expected,uint,64,1) [] = { 0xffffffffffffff80 };
> +VECT_VAR_DECL(expected,poly,8,8) [] = { 0x33, 0x33, 0x33, 0x33,
> +                                       0x33, 0x33, 0x33, 0x33 };
> +VECT_VAR_DECL(expected,poly,16,4) [] = { 0x3333, 0x3333,
> +                                        0x3333, 0x3333 };
> +VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0x33333333, 0x33333333 };
> +VECT_VAR_DECL(expected,int,8,16) [] = { 0x0, 0x20, 0x40, 0x60,
> +                                       0x80, 0xa0, 0xc0, 0xe0,
> +                                       0x0, 0x20, 0x40, 0x60,
> +                                       0x80, 0xa0, 0xc0, 0xe0 };
> +VECT_VAR_DECL(expected,int,16,8) [] = { 0x0, 0x1000, 0x2000, 0x3000,
> +                                       0x4000, 0x5000, 0x6000, 0x7000 };
> +VECT_VAR_DECL(expected,int,32,4) [] = { 0x0, 0x40000000,
> +                                       0x80000000, 0xc0000000 };
> +VECT_VAR_DECL(expected,int,64,2) [] = { 0x0, 0x8000000000000000 };
> +VECT_VAR_DECL(expected,uint,8,16) [] = { 0x0, 0x20, 0x40, 0x60,
> +                                        0x80, 0xa0, 0xc0, 0xe0,
> +                                        0x0, 0x20, 0x40, 0x60,
> +                                        0x80, 0xa0, 0xc0, 0xe0 };
> +VECT_VAR_DECL(expected,uint,16,8) [] = { 0x0, 0x1000, 0x2000, 0x3000,
> +                                        0x4000, 0x5000, 0x6000, 0x7000 };
> +VECT_VAR_DECL(expected,uint,32,4) [] = { 0x0, 0x40000000,
> +                                        0x80000000, 0xc0000000 };
> +VECT_VAR_DECL(expected,uint,64,2) [] = { 0x0, 0x8000000000000000 };
> +VECT_VAR_DECL(expected,poly,8,16) [] = { 0x33, 0x33, 0x33, 0x33,
> +                                        0x33, 0x33, 0x33, 0x33,
> +                                        0x33, 0x33, 0x33, 0x33,
> +                                        0x33, 0x33, 0x33, 0x33 };
> +VECT_VAR_DECL(expected,poly,16,8) [] = { 0x3333, 0x3333, 0x3333, 0x3333,
> +                                        0x3333, 0x3333, 0x3333, 0x3333 };
> +VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0x33333333, 0x33333333,
> +                                          0x33333333, 0x33333333 };
> +
> +/* Expected results with large shift amount.  */
> +VECT_VAR_DECL(expected_large_shift,int,8,8) [] = { 0x0, 0x0, 0x0, 0x0,
> +                                                  0x0, 0x0, 0x0, 0x0 };
> +VECT_VAR_DECL(expected_large_shift,int,16,4) [] = { 0x0, 0x0, 0x0, 0x0 };
> +VECT_VAR_DECL(expected_large_shift,int,32,2) [] = { 0x0, 0x0 };
> +VECT_VAR_DECL(expected_large_shift,int,64,1) [] = { 0x0 };
> +VECT_VAR_DECL(expected_large_shift,uint,8,8) [] = { 0x0, 0x0, 0x0, 0x0,
> +                                                   0x0, 0x0, 0x0, 0x0 };
> +VECT_VAR_DECL(expected_large_shift,uint,16,4) [] = { 0x0, 0x0, 0x0, 0x0 };
> +VECT_VAR_DECL(expected_large_shift,uint,32,2) [] = { 0x0, 0x0 };
> +VECT_VAR_DECL(expected_large_shift,uint,64,1) [] = { 0x0 };
> +VECT_VAR_DECL(expected_large_shift,poly,8,8) [] = { 0x33, 0x33, 0x33, 0x33,
> +                                                   0x33, 0x33, 0x33, 0x33 };
> +VECT_VAR_DECL(expected_large_shift,poly,16,4) [] = { 0x3333, 0x3333,
> +                                                    0x3333, 0x3333 };
> +VECT_VAR_DECL(expected_large_shift,hfloat,32,2) [] = { 0x33333333, 0x33333333 };
> +VECT_VAR_DECL(expected_large_shift,int,8,16) [] = { 0x0, 0x0, 0x0, 0x0,
> +                                                   0x0, 0x0, 0x0, 0x0,
> +                                                   0x0, 0x0, 0x0, 0x0,
> +                                                   0x0, 0x0, 0x0, 0x0 };
> +VECT_VAR_DECL(expected_large_shift,int,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
> +                                                   0x0, 0x0, 0x0, 0x0 };
> +VECT_VAR_DECL(expected_large_shift,int,32,4) [] = { 0x0, 0x0, 0x0, 0x0 };
> +VECT_VAR_DECL(expected_large_shift,int,64,2) [] = { 0x0, 0x0 };
> +VECT_VAR_DECL(expected_large_shift,uint,8,16) [] = { 0x0, 0x0, 0x0, 0x0,
> +                                                    0x0, 0x0, 0x0, 0x0,
> +                                                    0x0, 0x0, 0x0, 0x0,
> +                                                    0x0, 0x0, 0x0, 0x0 };
> +VECT_VAR_DECL(expected_large_shift,uint,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
> +                                                    0x0, 0x0, 0x0, 0x0 };
> +VECT_VAR_DECL(expected_large_shift,uint,32,4) [] = { 0x0, 0x0, 0x0, 0x0 };
> +VECT_VAR_DECL(expected_large_shift,uint,64,2) [] = { 0x0, 0x0 };
> +VECT_VAR_DECL(expected_large_shift,poly,8,16) [] = { 0x33, 0x33, 0x33, 0x33,
> +                                                    0x33, 0x33, 0x33, 0x33,
> +                                                    0x33, 0x33, 0x33, 0x33,
> +                                                    0x33, 0x33, 0x33, 0x33 };
> +VECT_VAR_DECL(expected_large_shift,poly,16,8) [] = { 0x3333, 0x3333,
> +                                                    0x3333, 0x3333,
> +                                                    0x3333, 0x3333,
> +                                                    0x3333, 0x3333 };
> +VECT_VAR_DECL(expected_large_shift,hfloat,32,4) [] = { 0x33333333, 0x33333333,
> +                                                      0x33333333, 0x33333333 };
> +
> +
> +/* Expected results with negative shift amount.  */
> +VECT_VAR_DECL(expected_negative_shift,int,8,8) [] = { 0xf8, 0xf8, 0xf9, 0xf9,
> +                                                     0xfa, 0xfa, 0xfb, 0xfb };
> +VECT_VAR_DECL(expected_negative_shift,int,16,4) [] = { 0xfff8, 0xfff8,
> +                                                      0xfff9, 0xfff9  };
> +VECT_VAR_DECL(expected_negative_shift,int,32,2) [] = { 0xfffffffc, 0xfffffffc };
> +VECT_VAR_DECL(expected_negative_shift,int,64,1) [] = { 0xffffffffffffffff };
> +VECT_VAR_DECL(expected_negative_shift,uint,8,8) [] = { 0x78, 0x78, 0x79, 0x79,
> +                                                      0x7a, 0x7a, 0x7b, 0x7b };
> +VECT_VAR_DECL(expected_negative_shift,uint,16,4) [] = { 0x7ff8, 0x7ff8,
> +                                                       0x7ff9, 0x7ff9 };
> +VECT_VAR_DECL(expected_negative_shift,uint,32,2) [] = { 0x3ffffffc,
> +                                                       0x3ffffffc };
> +VECT_VAR_DECL(expected_negative_shift,uint,64,1) [] = { 0xfffffffffffffff };
> +VECT_VAR_DECL(expected_negative_shift,poly,8,8) [] = { 0x33, 0x33, 0x33, 0x33,
> +                                                      0x33, 0x33, 0x33, 0x33 };
> +VECT_VAR_DECL(expected_negative_shift,poly,16,4) [] = { 0x3333, 0x3333,
> +                                                       0x3333, 0x3333 };
> +VECT_VAR_DECL(expected_negative_shift,hfloat,32,2) [] = { 0x33333333,
> +                                                         0x33333333 };
> +VECT_VAR_DECL(expected_negative_shift,int,8,16) [] = { 0xfc, 0xfc, 0xfc, 0xfc,
> +                                                      0xfd, 0xfd, 0xfd, 0xfd,
> +                                                      0xfe, 0xfe, 0xfe, 0xfe,
> +                                                      0xff, 0xff, 0xff, 0xff };
> +VECT_VAR_DECL(expected_negative_shift,int,16,8) [] = { 0xffff, 0xffff,
> +                                                      0xffff, 0xffff,
> +                                                      0xffff, 0xffff,
> +                                                      0xffff, 0xffff };
> +VECT_VAR_DECL(expected_negative_shift,int,32,4) [] = {  0xfffffffe, 0xfffffffe,
> +                                                       0xfffffffe, 0xfffffffe };
> +VECT_VAR_DECL(expected_negative_shift,int,64,2) [] = { 0xffffffffffffffff,
> +                                                      0xffffffffffffffff };
> +VECT_VAR_DECL(expected_negative_shift,uint,8,16) [] = { 0x3c, 0x3c, 0x3c, 0x3c,
> +                                                       0x3d, 0x3d, 0x3d, 0x3d,
> +                                                       0x3e, 0x3e, 0x3e, 0x3e,
> +                                                       0x3f, 0x3f, 0x3f, 0x3f };
> +VECT_VAR_DECL(expected_negative_shift,uint,16,8) [] = { 0x7ff, 0x7ff,
> +                                                       0x7ff, 0x7ff,
> +                                                       0x7ff, 0x7ff,
> +                                                       0x7ff, 0x7ff };
> +VECT_VAR_DECL(expected_negative_shift,uint,32,4) [] = { 0x1ffffffe, 0x1ffffffe,
> +                                                       0x1ffffffe, 0x1ffffffe };
> +VECT_VAR_DECL(expected_negative_shift,uint,64,2) [] = { 0x7ffffffffffffff,
> +                                                       0x7ffffffffffffff };
> +VECT_VAR_DECL(expected_negative_shift,poly,8,16) [] = { 0x33, 0x33, 0x33, 0x33,
> +                                                       0x33, 0x33, 0x33, 0x33,
> +                                                       0x33, 0x33, 0x33, 0x33,
> +                                                       0x33, 0x33, 0x33, 0x33 };
> +VECT_VAR_DECL(expected_negative_shift,poly,16,8) [] = { 0x3333, 0x3333,
> +                                                       0x3333, 0x3333,
> +                                                       0x3333, 0x3333,
> +                                                       0x3333, 0x3333 };
> +VECT_VAR_DECL(expected_negative_shift,hfloat,32,4) [] = { 0x33333333,
> +                                                         0x33333333,
> +                                                         0x33333333,
> +                                                         0x33333333 };
> +
> +
> +#ifndef INSN_NAME
> +#define INSN_NAME vshl
> +#define TEST_MSG "VSHL/VSHLQ"
> +#endif
> +
> +#define FNNAME1(NAME) exec_ ## NAME
> +#define FNNAME(NAME) FNNAME1(NAME)
> +
> +void FNNAME (INSN_NAME) (void)
> +{
> +  /* Basic test: v3=vshl(v1,v2), then store the result.  */
> +#define TEST_VSHL(T3, Q, T1, T2, W, N)                                 \
> +  VECT_VAR(vector_res, T1, W, N) =                                     \
> +    vshl##Q##_##T2##W(VECT_VAR(vector, T1, W, N),                      \
> +                     VECT_VAR(vector_shift, T3, W, N));                \
> +  vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(vector_res, T1, W, N))
> +
> +  DECL_VARIABLE_ALL_VARIANTS(vector);
> +  DECL_VARIABLE_ALL_VARIANTS(vector_res);
> +
> +  DECL_VARIABLE_SIGNED_VARIANTS(vector_shift);
> +
> +  clean_results ();
> +
> +  /* Initialize input "vector" from "buffer".  */
> +  TEST_MACRO_ALL_VARIANTS_2_5(VLOAD, vector, buffer);
> +
> +  /* Choose init value arbitrarily, will be used as shift amount.  */
> +  VDUP(vector_shift, , int, s, 8, 8, 1);
> +  VDUP(vector_shift, , int, s, 16, 4, 3);
> +  VDUP(vector_shift, , int, s, 32, 2, 8);
> +  VDUP(vector_shift, , int, s, 64, 1, 3);
> +  VDUP(vector_shift, q, int, s, 8, 16, 5);
> +  VDUP(vector_shift, q, int, s, 16, 8, 12);
> +  VDUP(vector_shift, q, int, s, 32, 4, 30);
> +  VDUP(vector_shift, q, int, s, 64, 2, 63);
> +
> +  /* Execute the tests.  */
> +  TEST_MACRO_ALL_VARIANTS_1_5(TEST_VSHL, int);
> +
> +  CHECK_RESULTS (TEST_MSG, "");
> +
> +
> +  /* Test large shift amount (larger or equal to the type width.  */
> +  VDUP(vector_shift, , int, s, 8, 8, 8);
> +  VDUP(vector_shift, , int, s, 16, 4, 16);
> +  VDUP(vector_shift, , int, s, 32, 2, 32);
> +  VDUP(vector_shift, , int, s, 64, 1, 64);
> +  VDUP(vector_shift, q, int, s, 8, 16, 8);
> +  VDUP(vector_shift, q, int, s, 16, 8, 17);
> +  VDUP(vector_shift, q, int, s, 32, 4, 33);
> +  VDUP(vector_shift, q, int, s, 64, 2, 65);
> +
> +  /* Execute the tests.  */
> +  TEST_MACRO_ALL_VARIANTS_1_5(TEST_VSHL, int);
> +
> +  CHECK_RESULTS_NAMED (TEST_MSG, expected_large_shift, "(large shift amount)");
> +
> +
> +  /* Test negative shift amount. */
> +  VDUP(vector_shift, , int, s, 8, 8, -1);
> +  VDUP(vector_shift, , int, s, 16, 4, -1);
> +  VDUP(vector_shift, , int, s, 32, 2, -2);
> +  VDUP(vector_shift, , int, s, 64, 1, -4);
> +  VDUP(vector_shift, q, int, s, 8, 16, -2);
> +  VDUP(vector_shift, q, int, s, 16, 8, -5);
> +  VDUP(vector_shift, q, int, s, 32, 4, -3);
> +  VDUP(vector_shift, q, int, s, 64, 2, -5);
> +
> +  /* Execute the tests.  */
> +  TEST_MACRO_ALL_VARIANTS_1_5(TEST_VSHL, int);
> +
> +  CHECK_RESULTS_NAMED (TEST_MSG, expected_negative_shift, "(negative shift amount)");
> +}
> +
> +int main (void)
> +{
> +  FNNAME (INSN_NAME) ();
> +
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/arm/README.neon-intrinsics b/gcc/testsuite/gcc.target/arm/README.neon-intrinsics
> new file mode 100644
> index 0000000..f246349
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/README.neon-intrinsics
> @@ -0,0 +1 @@
> +Advanced SIMD intrinsics tests are located in gcc.target/aarch64.
> --
> 1.8.3.2
>
Christophe Lyon July 3, 2014, 9:04 p.m. UTC | #2
On 3 July 2014 13:06, Ramana Radhakrishnan <ramana.gcc@googlemail.com> wrote:
> On Tue, Jul 1, 2014 at 11:05 AM, Christophe Lyon
> <christophe.lyon@linaro.org> wrote:
>> * documentation (README)
>> * dejanu driver (neon-intrinsics.exp)
>> * support macros (arm-neon-ref.h, compute-ref-data.h)
>> * Tests for 3 intrinsics: vaba, vld1, vshl
>>
>> diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
>> index c2ed273..3a0f99b 100644
>> --- a/gcc/testsuite/ChangeLog
>> +++ b/gcc/testsuite/ChangeLog
>> @@ -1,3 +1,14 @@
>> +2014-06-30  Christophe Lyon  <christophe.lyon@linaro.org>
>> +
>> +       * gcc.target/arm/README.neon-intrinsics: New file.
>> +       * gcc.target/aarch64/neon-intrinsics/README: Likewise.
>> +       * gcc.target/aarch64/neon-intrinsics/arm-neon-ref.h: Likewise.
>> +       * gcc.target/aarch64/neon-intrinsics/compute-ref-data.h: Likewise.
>> +       * gcc.target/aarch64/neon-intrinsics/neon-intrinsics.exp: Likewise.
>> +       * gcc.target/aarch64/neon-intrinsics/vaba.c: Likewise.
>> +       * gcc.target/aarch64/neon-intrinsics/vld1.c: Likewise.
>> +       * gcc.target/aarch64/neon-intrinsics/vshl.c: Likewise.
>> +
>
>
> Ok for the ARM backend , please don't put changelogs in patches.
>
> Assuming no regressions when tested on both backends.
Not sure what you mean by regression, since I only introduce new tests
and do not modify the compilers.
Some of these new tests do FAIL as already discussed.

> Wait for an explicit ack from one of the AArch64 maintainers as this
> is technically gcc.target/aarch64
sure.

Thanks.

>
> regards
> Ramana
>
>
>
>
>>  2014-06-30  Igor Zamyatin  <igor.zamyatin@intel.com>
>>
>>         PR middle-end/57541
>> diff --git a/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/README b/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/README
>> new file mode 100644
>> index 0000000..232bb1d
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/README
>> @@ -0,0 +1,132 @@
>> +This directory contains executable tests for ARM/AArch64 Neon
>> +intrinsics.
>> +
>> +It is meant to cover execution cases of all the Advanced SIMD
>> +intrinsics, but does not scan the generated assembler code.
>> +
>> +The general framework is composed as follows:
>> +- neon-intrinsics.exp: main dejagnu driver
>> +- *.c: actual tests, generally one per intrinsinc family
>> +- arm-neon-ref.h: contains macro definitions to save typing in actual
>> +  test files
>> +- compute-ref-data.h: contains input vectors definitions
>> +- *.inc: generic tests, shared by several families of intrinsics. For
>> +   instance, unary or binary operators
>> +
>> +A typical .c test file starts with the following contents (look at
>> +vld1.c and vaba.c for sample cases):
>> +#include <arm_neon.h>
>> +#include "arm-neon-ref.h"
>> +#include "compute-ref-data.h"
>> +
>> +Then, definitions of expected results, based on common input values,
>> +as defined in compute-ref-data.h.
>> +For example:
>> +VECT_VAR_DECL(expected,int,16,4) [] = { 0x16, 0x17, 0x18, 0x19 };
>> +defines the expected results of an operator generating int16x4 values.
>> +
>> +The common input values defined in compute-ref-data.h have been chosen
>> +to avoid corner-case values for most operators, yet exposing negative
>> +values for signed operators. For this reason, their range is also
>> +limited. For instance, the initialization of buffer_int16x4 will be
>> +{ -16, -15, -14, -13 }.
>> +
>> +The initialization of floating-point values is done via hex notation,
>> +to avoid potential rounding problems.
>> +
>> +To test special values and corner cases, specific initialization
>> +values should be used in dedicated tests, to ensure proper coverage.
>> +An example of this is vshl.
>> +
>> +When a variant of an intrinsic is not available, its expected result
>> +should be defined to the value of CLEAN_PATTERN_8 as defined in
>> +arm-neon-ref.h. For example:
>> +VECT_VAR_DECL(expected,int,64,1) [] = { 0x3333333333333333 };
>> +if the given intrinsic has no variant producing an int64x1 result,
>> +like the vcmp family (eg. vclt).
>> +
>> +This is because the helper function (check_results(), defined in
>> +arm-neon-ref.h), iterates over all the possible variants, to save
>> +typing in each individual test file. Alternatively, one can directly
>> +call the CHECK/CHECK_FP macros to check only a few expected results
>> +(see vabs.c for an example).
>> +
>> +Then, define the TEST_MSG string, which will be used when reporting errors.
>> +
>> +Next, define the function performing the actual tests, in general
>> +relying on the helpers provided by arm-neon-ref.h, which means:
>> +
>> +* declare necessary vectors of suitable types: using
>> +  DECL_VARIABLE_ALL_VARIANTS when all variants are supported, or the
>> +  relevant of subset calls to DECL_VARIABLE.
>> +
>> +* call clean_results() to initialize the 'results' buffers.
>> +
>> +* initialize the input vectors, using VLOAD, VDUP or VSET_LANE (vld*
>> +  tests do not need this step, since their actual purpose is to
>> +  initialize vectors).
>> +
>> +* execute the intrinsic on relevant variants, for instance using
>> +  TEST_MACRO_ALL_VARIANTS_2_5.
>> +
>> +* call check_results() to check that the results match the expected
>> +  values.
>> +
>> +A template test file could be:
>> +=================================================================
>> +#include <arm_neon.h>
>> +#include "arm-neon-ref.h"
>> +#include "compute-ref-data.h"
>> +
>> +/* Expected results.  */
>> +VECT_VAR_DECL(expected,int,8,8) [] = { 0xf6, 0xf7, 0xf8, 0xf9,
>> +                                      0xfa, 0xfb, 0xfc, 0xfd };
>> +/* and as many others as necessary.  */
>> +
>> +#define TEST_MSG "VMYINTRINSIC"
>> +void exec_myintrinsic (void)
>> +{
>> +  /* my test: v4=vmyintrinsic(v1,v2,v3), then store the result.  */
>> +#define TEST_VMYINTR(Q, T1, T2, W, N)                                  \
>> +  VECT_VAR(vector_res, T1, W, N) =                                     \
>> +    vmyintr##Q##_##T2##W(VECT_VAR(vector1, T1, W, N),                  \
>> +                        VECT_VAR(vector2, T1, W, N),                   \
>> +                        VECT_VAR(vector3, T1, W, N));                  \
>> +  vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(vector_res, T1, W, N))
>> +
>> +#define DECL_VMYINTR_VAR(VAR)                  \
>> +  DECL_VARIABLE(VAR, int, 8, 8);
>> +/* And as many others as necessary.  */
>> +
>> +  DECL_VMYINTR_VAR(vector1);
>> +  DECL_VMYINTR_VAR(vector2);
>> +  DECL_VMYINTR_VAR(vector3);
>> +  DECL_VMYINTR_VAR(vector_res);
>> +
>> +  clean_results ();
>> +
>> +  /* Initialize input "vector1" from "buffer".  */
>> +  VLOAD(vector1, buffer, , int, s, 8, 8);
>> +/* And as many others as necessary.  */
>> +
>> +  /* Choose init value arbitrarily.  */
>> +  VDUP(vector2, , int, s, 8, 8, 1);
>> +/* And as many others as necessary.  */
>> +
>> +  /* Choose init value arbitrarily.  */
>> +  VDUP(vector3, , int, s, 8, 8, -5);
>> +/* And as many others as necessary.  */
>> +
>> +  /* Execute the tests.  */
>> +  TEST_VMYINTR(, int, s, 8, 8);
>> +/* And as many others as necessary.  */
>> +
>> +  check_results (TEST_MSG, "");
>> +}
>> +
>> +int main (void)
>> +{
>> +  exec_vmyintrinsic ();
>> +  return 0;
>> +}
>> +=================================================================
>> diff --git a/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/arm-neon-ref.h b/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/arm-neon-ref.h
>> new file mode 100644
>> index 0000000..8ddcbf2
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/arm-neon-ref.h
>> @@ -0,0 +1,547 @@
>> +/* This file defines helper operations shared by all the tests.  */
>> +
>> +#ifndef _ARM_NEON_REF_H_
>> +#define _ARM_NEON_REF_H_
>> +
>> +#include <stdio.h>
>> +#include <inttypes.h>
>> +
>> +/* helper type, to help write floating point results in integer form.  */
>> +typedef uint32_t hfloat32_t;
>> +
>> +extern void abort(void);
>> +extern void *memset(void *, int, size_t);
>> +extern void *memcpy(void *, const void *, size_t);
>> +extern size_t strlen(const char *);
>> +
>> +/* Various string construction helpers.  */
>> +
>> +/*
>> +  The most useful at user-level are VECT_VAR and VECT_VAR_DECL, which
>> +   construct variable names or declarations, such as:
>> +   VECT_VAR(expected, int, 16, 4) -> expected_int16x4
>> +   VECT_VAR_DECL(expected, int, 16, 4) -> int16x4_t expected_int16x4
>> +*/
>> +
>> +#define xSTR(X) #X
>> +#define STR(X) xSTR(X)
>> +
>> +#define xNAME1(V,T) V ## _ ##  T
>> +#define xNAME(V,T) xNAME1(V,T)
>> +
>> +/* VAR(foo,int,16) -> foo_int16 */
>> +#define VAR(V,T,W) xNAME(V,T##W)
>> +/* VAR_DECL(foo,int,16) -> int16_t foo_int16 */
>> +#define VAR_DECL(V, T, W) T##W##_t VAR(V,T,W)
>> +
>> +/* VECT_NAME(int,16,4) ->  int16x4 */
>> +#define VECT_NAME(T, W, N) T##W##x##N
>> +/* VECT_ARRAY_NAME(int,16,4,2) -> int16x4x2 */
>> +#define VECT_ARRAY_NAME(T, W, N, L) T##W##x##N##x##L
>> +/* VECT_TYPE(int,16,4) -> int16x4_t */
>> +#define VECT_TYPE(T, W, N) xNAME(VECT_NAME(T,W,N),t)
>> +/* VECT_ARRAY_TYPE(int,16,4,2) -> int16x4x2_t */
>> +#define VECT_ARRAY_TYPE(T, W, N, L) xNAME(VECT_ARRAY_NAME(T,W,N,L),t)
>> +
>> +/* VECT_VAR(foo,int,16,4) -> foo_int16x4 */
>> +#define VECT_VAR(V,T,W,N) xNAME(V,VECT_NAME(T,W,N))
>> +/* VECT_VAR_DECL(foo,int,16,4) -> int16_t foo_int16x4 */
>> +#define VECT_VAR_DECL(V, T, W, N) T##W##_t VECT_VAR(V,T,W,N)
>> +
>> +/* Array declarations.  */
>> +/* ARRAY(foo,int,16,4) -> int16_t foo_int16x4[4] */
>> +#define ARRAY(V, T, W, N) VECT_VAR_DECL(V,T,W,N)[N]
>> +
>> +/* Arrays of vectors.  */
>> +/* VECT_ARRAY_VAR(foo,int,16,4,2) -> foo_int16x4x2 */
>> +#define VECT_ARRAY_VAR(V,T,W,N,L) xNAME(V,VECT_ARRAY_NAME(T,W,N,L))
>> +/* VECT_ARRAY(foo,int,16,4,2) -> int16_t foo_int16x4x2[4*2] */
>> +#define VECT_ARRAY(V, T, W, N, L) T##W##_t VECT_ARRAY_VAR(V,T,W,N,L)[N*L]
>> +
>> +/* Check results vs expected values. Operates on one vector.  */
>> +#define CHECK(MSG,T,W,N,FMT,EXPECTED,COMMENT)                          \
>> +  {                                                                    \
>> +    int i;                                                             \
>> +    for(i=0; i<N ; i++)                                                        \
>> +      {                                                                        \
>> +       if (VECT_VAR(result, T, W, N)[i] !=                             \
>> +           VECT_VAR(EXPECTED, T, W, N)[i]) {                           \
>> +         fprintf(stderr,                                               \
>> +                 "ERROR in %s (%s line %d in buffer '%s') at type %s " \
>> +                 "index %d: got 0x%" FMT " != 0x%" FMT " %s\n",        \
>> +                 MSG, __FILE__, __LINE__,                              \
>> +                 STR(EXPECTED),                                        \
>> +                 STR(VECT_NAME(T, W, N)),                              \
>> +                 i,                                                    \
>> +                 VECT_VAR(result, T, W, N)[i],                         \
>> +                 VECT_VAR(EXPECTED, T, W, N)[i],                       \
>> +                 strlen(COMMENT) > 0 ? COMMENT : "");                  \
>> +         abort();                                                      \
>> +       }                                                               \
>> +      }                                                                        \
>> +  }
>> +
>> +/* Floating-point variant.  */
>> +#define CHECK_FP(MSG,T,W,N,FMT,EXPECTED,COMMENT)                       \
>> +  {                                                                    \
>> +    int i;                                                             \
>> +    for(i=0; i<N ; i++)                                                        \
>> +      {                                                                        \
>> +       union fp_operand {                                              \
>> +         uint##W##_t i;                                                \
>> +         float##W##_t f;                                               \
>> +       } tmp_res, tmp_exp;                                             \
>> +       tmp_res.f = VECT_VAR(result, T, W, N)[i];                       \
>> +       tmp_exp.i = VECT_VAR(EXPECTED, h##T, W, N)[i];                  \
>> +       if (tmp_res.i != tmp_exp.i) {                                   \
>> +         fprintf(stderr,                                               \
>> +                 "ERROR in %s (%s line %d in buffer '%s') at type %s " \
>> +                 "index %d: got 0x%" FMT " != 0x%" FMT " %s\n",        \
>> +                 MSG, __FILE__, __LINE__,                              \
>> +                 STR(EXPECTED),                                        \
>> +                 STR(VECT_NAME(T, W, N)),                              \
>> +                 i,                                                    \
>> +                 tmp_res.i,                                            \
>> +                 tmp_exp.i,                                            \
>> +                 strlen(COMMENT) > 0 ? COMMENT : "");                  \
>> +         abort();                                                      \
>> +       }                                                               \
>> +      }                                                                        \
>> +  }
>> +
>> +/* Clean buffer with a non-zero pattern to help diagnose buffer
>> +   overflows.  */
>> +#define CLEAN_PATTERN_8  0x33
>> +
>> +#define CLEAN(VAR,T,W,N)                                               \
>> +  memset(VECT_VAR(VAR, T, W, N),                                       \
>> +        CLEAN_PATTERN_8,                                               \
>> +        sizeof(VECT_VAR(VAR, T, W, N)));
>> +
>> +/* Define output buffers, one of each size.  */
>> +static ARRAY(result, int, 8, 8);
>> +static ARRAY(result, int, 16, 4);
>> +static ARRAY(result, int, 32, 2);
>> +static ARRAY(result, int, 64, 1);
>> +static ARRAY(result, uint, 8, 8);
>> +static ARRAY(result, uint, 16, 4);
>> +static ARRAY(result, uint, 32, 2);
>> +static ARRAY(result, uint, 64, 1);
>> +static ARRAY(result, poly, 8, 8);
>> +static ARRAY(result, poly, 16, 4);
>> +static ARRAY(result, float, 32, 2);
>> +static ARRAY(result, int, 8, 16);
>> +static ARRAY(result, int, 16, 8);
>> +static ARRAY(result, int, 32, 4);
>> +static ARRAY(result, int, 64, 2);
>> +static ARRAY(result, uint, 8, 16);
>> +static ARRAY(result, uint, 16, 8);
>> +static ARRAY(result, uint, 32, 4);
>> +static ARRAY(result, uint, 64, 2);
>> +static ARRAY(result, poly, 8, 16);
>> +static ARRAY(result, poly, 16, 8);
>> +static ARRAY(result, float, 32, 4);
>> +
>> +/* Declare expected results, one of each size. They are defined and
>> +   initialized in each test file.  */
>> +extern ARRAY(expected, int, 8, 8);
>> +extern ARRAY(expected, int, 16, 4);
>> +extern ARRAY(expected, int, 32, 2);
>> +extern ARRAY(expected, int, 64, 1);
>> +extern ARRAY(expected, uint, 8, 8);
>> +extern ARRAY(expected, uint, 16, 4);
>> +extern ARRAY(expected, uint, 32, 2);
>> +extern ARRAY(expected, uint, 64, 1);
>> +extern ARRAY(expected, poly, 8, 8);
>> +extern ARRAY(expected, poly, 16, 4);
>> +extern ARRAY(expected, hfloat, 32, 2);
>> +extern ARRAY(expected, int, 8, 16);
>> +extern ARRAY(expected, int, 16, 8);
>> +extern ARRAY(expected, int, 32, 4);
>> +extern ARRAY(expected, int, 64, 2);
>> +extern ARRAY(expected, uint, 8, 16);
>> +extern ARRAY(expected, uint, 16, 8);
>> +extern ARRAY(expected, uint, 32, 4);
>> +extern ARRAY(expected, uint, 64, 2);
>> +extern ARRAY(expected, poly, 8, 16);
>> +extern ARRAY(expected, poly, 16, 8);
>> +extern ARRAY(expected, hfloat, 32, 4);
>> +
>> +/* Check results. Operates on all possible vector types.  */
>> +#define CHECK_RESULTS(test_name,comment)                               \
>> +  {                                                                    \
>> +    CHECK(test_name, int, 8, 8, PRIx8, expected, comment);             \
>> +    CHECK(test_name, int, 16, 4, PRIx16, expected, comment);           \
>> +    CHECK(test_name, int, 32, 2, PRIx32, expected, comment);           \
>> +    CHECK(test_name, int, 64, 1, PRIx64, expected, comment);           \
>> +    CHECK(test_name, uint, 8, 8, PRIx8, expected, comment);            \
>> +    CHECK(test_name, uint, 16, 4, PRIx16, expected, comment);          \
>> +    CHECK(test_name, uint, 32, 2, PRIx32, expected, comment);          \
>> +    CHECK(test_name, uint, 64, 1, PRIx64, expected, comment);          \
>> +    CHECK(test_name, poly, 8, 8, PRIx8, expected, comment);            \
>> +    CHECK(test_name, poly, 16, 4, PRIx16, expected, comment);          \
>> +    CHECK_FP(test_name, float, 32, 2, PRIx32, expected, comment);      \
>> +                                                                       \
>> +    CHECK(test_name, int, 8, 16, PRIx8, expected, comment);            \
>> +    CHECK(test_name, int, 16, 8, PRIx16, expected, comment);           \
>> +    CHECK(test_name, int, 32, 4, PRIx32, expected, comment);           \
>> +    CHECK(test_name, int, 64, 2, PRIx64, expected, comment);           \
>> +    CHECK(test_name, uint, 8, 16, PRIx8, expected, comment);           \
>> +    CHECK(test_name, uint, 16, 8, PRIx16, expected, comment);          \
>> +    CHECK(test_name, uint, 32, 4, PRIx32, expected, comment);          \
>> +    CHECK(test_name, uint, 64, 2, PRIx64, expected, comment);          \
>> +    CHECK(test_name, poly, 8, 16, PRIx8, expected, comment);           \
>> +    CHECK(test_name, poly, 16, 8, PRIx16, expected, comment);          \
>> +    CHECK_FP(test_name, float, 32, 4, PRIx32, expected, comment);      \
>> +  }                                                                    \
>> +
>> +#define CHECK_RESULTS_NAMED(test_name,EXPECTED,comment)                        \
>> +  {                                                                    \
>> +    CHECK(test_name, int, 8, 8, PRIx8, EXPECTED, comment);             \
>> +    CHECK(test_name, int, 16, 4, PRIx16, EXPECTED, comment);           \
>> +    CHECK(test_name, int, 32, 2, PRIx32, EXPECTED, comment);           \
>> +    CHECK(test_name, int, 64, 1, PRIx64, EXPECTED, comment);           \
>> +    CHECK(test_name, uint, 8, 8, PRIx8, EXPECTED, comment);            \
>> +    CHECK(test_name, uint, 16, 4, PRIx16, EXPECTED, comment);          \
>> +    CHECK(test_name, uint, 32, 2, PRIx32, EXPECTED, comment);          \
>> +    CHECK(test_name, uint, 64, 1, PRIx64, EXPECTED, comment);          \
>> +    CHECK(test_name, poly, 8, 8, PRIx8, EXPECTED, comment);            \
>> +    CHECK(test_name, poly, 16, 4, PRIx16, EXPECTED, comment);          \
>> +    CHECK_FP(test_name, float, 32, 2, PRIx32, EXPECTED, comment);      \
>> +                                                                       \
>> +    CHECK(test_name, int, 8, 16, PRIx8, EXPECTED, comment);            \
>> +    CHECK(test_name, int, 16, 8, PRIx16, EXPECTED, comment);           \
>> +    CHECK(test_name, int, 32, 4, PRIx32, EXPECTED, comment);           \
>> +    CHECK(test_name, int, 64, 2, PRIx64, EXPECTED, comment);           \
>> +    CHECK(test_name, uint, 8, 16, PRIx8, EXPECTED, comment);           \
>> +    CHECK(test_name, uint, 16, 8, PRIx16, EXPECTED, comment);          \
>> +    CHECK(test_name, uint, 32, 4, PRIx32, EXPECTED, comment);          \
>> +    CHECK(test_name, uint, 64, 2, PRIx64, EXPECTED, comment);          \
>> +    CHECK(test_name, poly, 8, 16, PRIx8, EXPECTED, comment);           \
>> +    CHECK(test_name, poly, 16, 8, PRIx16, EXPECTED, comment);          \
>> +    CHECK_FP(test_name, float, 32, 4, PRIx32, EXPECTED, comment);      \
>> +  }                                                                    \
>> +
>> +
>> +
>> +#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
>> +
>> +typedef union {
>> +  struct {
>> +    int _xxx:27;
>> +    unsigned int QC:1;
>> +    int V:1;
>> +    int C:1;
>> +    int Z:1;
>> +    int N:1;
>> +  } b;
>> +  unsigned int word;
>> +} _ARM_FPSCR;
>> +
>> +#else /* __ORDER_BIG_ENDIAN__ */
>> +
>> +typedef union {
>> +  struct {
>> +    int N:1;
>> +    int Z:1;
>> +    int C:1;
>> +    int V:1;
>> +    unsigned int QC:1;
>> +    int _dnm:27;
>> +  } b;
>> +  unsigned int word;
>> +} _ARM_FPSCR;
>> +
>> +#endif /* __ORDER_BIG_ENDIAN__ */
>> +
>> +#define Neon_Cumulative_Sat  __read_neon_cumulative_sat()
>> +#define Set_Neon_Cumulative_Sat(x)  __set_neon_cumulative_sat((x))
>> +
>> +#if defined(__aarch64__)
>> +static volatile int __read_neon_cumulative_sat (void) {
>> +    _ARM_FPSCR _afpscr_for_qc;
>> +    asm volatile ("mrs %0,fpsr" : "=r" (_afpscr_for_qc));
>> +    return _afpscr_for_qc.b.QC;
>> +}
>> +static void __set_neon_cumulative_sat (int x) {
>> +    _ARM_FPSCR _afpscr_for_qc;
>> +    asm volatile ("mrs %0,fpsr" : "=r" (_afpscr_for_qc));
>> +    _afpscr_for_qc.b.QC = x;
>> +    asm volatile ("msr fpsr,%0" : : "r" (_afpscr_for_qc));
>> +    return;
>> +}
>> +#else
>> +static volatile int __read_neon_cumulative_sat (void) {
>> +    _ARM_FPSCR _afpscr_for_qc;
>> +    asm volatile ("vmrs %0,fpscr" : "=r" (_afpscr_for_qc));
>> +    return _afpscr_for_qc.b.QC;
>> +}
>> +
>> +static void __set_neon_cumulative_sat (int x) {
>> +    _ARM_FPSCR _afpscr_for_qc;
>> +    asm volatile ("vmrs %0,fpscr" : "=r" (_afpscr_for_qc));
>> +    _afpscr_for_qc.b.QC = x;
>> +    asm volatile ("vmsr fpscr,%0" : : "r" (_afpscr_for_qc));
>> +    return;
>> +}
>> +#endif
>> +
>> +/* Declare expected cumulative saturation results, one for each
>> +   size. They are defined and initialized in relevant test files.  */
>> +extern int VECT_VAR(expected_cumulative_sat, int, 8, 8);
>> +extern int VECT_VAR(expected_cumulative_sat, int, 16, 4);
>> +extern int VECT_VAR(expected_cumulative_sat, int, 32, 2);
>> +extern int VECT_VAR(expected_cumulative_sat, int, 64, 1);
>> +extern int VECT_VAR(expected_cumulative_sat, uint, 8, 8);
>> +extern int VECT_VAR(expected_cumulative_sat, uint, 16, 4);
>> +extern int VECT_VAR(expected_cumulative_sat, uint, 32, 2);
>> +extern int VECT_VAR(expected_cumulative_sat, uint, 64, 1);
>> +extern int VECT_VAR(expected_cumulative_sat, int, 8, 16);
>> +extern int VECT_VAR(expected_cumulative_sat, int, 16, 8);
>> +extern int VECT_VAR(expected_cumulative_sat, int, 32, 4);
>> +extern int VECT_VAR(expected_cumulative_sat, int, 64, 2);
>> +extern int VECT_VAR(expected_cumulative_sat, uint, 8, 16);
>> +extern int VECT_VAR(expected_cumulative_sat, uint, 16, 8);
>> +extern int VECT_VAR(expected_cumulative_sat, uint, 32, 4);
>> +extern int VECT_VAR(expected_cumulative_sat, uint, 64, 2);
>> +
>> +/* Check cumulative saturation flag vs expected value.  */
>> +#define CHECK_CUMULATIVE_SAT(MSG,T,W,N,EXPECTED,COMMENT)               \
>> +  {                                                                    \
>> +    if (Neon_Cumulative_Sat !=                                         \
>> +       VECT_VAR(EXPECTED, T, W, N)) {                                  \
>> +      fprintf(stderr,                                                  \
>> +             "ERROR in %s (%s line %d in cumulative_sat '%s') at type %s: " \
>> +             "got %d expected %d%s\n",                                 \
>> +             MSG, __FILE__, __LINE__,                                  \
>> +             STR(EXPECTED),                                            \
>> +             STR(VECT_NAME(T, W, N)),                                  \
>> +             Neon_Cumulative_Sat,                                      \
>> +             VECT_VAR(EXPECTED, T, W, N),                              \
>> +             strlen(COMMENT) > 0 ? " " COMMENT : "");                  \
>> +      abort();                                                         \
>> +    }                                                                  \
>> +  }
>> +
>> +#define CHECK_CUMULATIVE_SAT_NAMED(test_name,EXPECTED,comment)         \
>> +  {                                                                    \
>> +    CHECK_CUMULATIVE_SAT(test_name, int, 8, 8, PRIx8, EXPECTED, comment); \
>> +    CHECK_CUMULATIVE_SAT(test_name, int, 16, 4, PRIx16, EXPECTED, comment); \
>> +    CHECK_CUMULATIVE_SAT(test_name, int, 32, 2, PRIx32, EXPECTED, comment); \
>> +    CHECK_CUMULATIVE_SAT(test_name, int, 64, 1, PRIx64, EXPECTED, comment); \
>> +    CHECK_CUMULATIVE_SAT(test_name, uint, 8, 8, PRIx8, EXPECTED, comment); \
>> +    CHECK_CUMULATIVE_SAT(test_name, uint, 16, 4, PRIx16, EXPECTED, comment); \
>> +    CHECK_CUMULATIVE_SAT(test_name, uint, 32, 2, PRIx32, EXPECTED, comment); \
>> +    CHECK_CUMULATIVE_SAT(test_name, uint, 64, 1, PRIx64, EXPECTED, comment); \
>> +    CHECK_CUMULATIVE_SAT(test_name, poly, 8, 8, PRIx8, EXPECTED, comment); \
>> +    CHECK_CUMULATIVE_SAT(test_name, poly, 16, 4, PRIx16, EXPECTED, comment); \
>> +    CHECK_CUMULATIVE_SAT_FP(test_name, float, 32, 2, PRIx32, EXPECTED, comment); \
>> +                                                                       \
>> +    CHECK_CUMULATIVE_SAT(test_name, int, 8, 16, PRIx8, EXPECTED, comment); \
>> +    CHECK_CUMULATIVE_SAT(test_name, int, 16, 8, PRIx16, EXPECTED, comment); \
>> +    CHECK_CUMULATIVE_SAT(test_name, int, 32, 4, PRIx32, EXPECTED, comment); \
>> +    CHECK_CUMULATIVE_SAT(test_name, int, 64, 2, PRIx64, EXPECTED, comment); \
>> +    CHECK_CUMULATIVE_SAT(test_name, uint, 8, 16, PRIx8, EXPECTED, comment); \
>> +    CHECK_CUMULATIVE_SAT(test_name, uint, 16, 8, PRIx16, EXPECTED, comment); \
>> +    CHECK_CUMULATIVE_SAT(test_name, uint, 32, 4, PRIx32, EXPECTED, comment); \
>> +    CHECK_CUMULATIVE_SAT(test_name, uint, 64, 2, PRIx64, EXPECTED, comment); \
>> +    CHECK_CUMULATIVE_SAT(test_name, poly, 8, 16, PRIx8, EXPECTED, comment); \
>> +    CHECK_CUMULATIVE_SAT(test_name, poly, 16, 8, PRIx16, EXPECTED, comment); \
>> +    CHECK_CUMULATIVE_SAT_FP(test_name, float, 32, 4, PRIx32, EXPECTED, comment); \
>> +  }                                                                    \
>> +
>> +
>> +/* Clean output buffers before execution.  */
>> +static void clean_results (void)
>> +{
>> +  CLEAN(result, int, 8, 8);
>> +  CLEAN(result, int, 16, 4);
>> +  CLEAN(result, int, 32, 2);
>> +  CLEAN(result, int, 64, 1);
>> +  CLEAN(result, uint, 8, 8);
>> +  CLEAN(result, uint, 16, 4);
>> +  CLEAN(result, uint, 32, 2);
>> +  CLEAN(result, uint, 64, 1);
>> +  CLEAN(result, poly, 8, 8);
>> +  CLEAN(result, poly, 16, 4);
>> +  CLEAN(result, float, 32, 2);
>> +
>> +  CLEAN(result, int, 8, 16);
>> +  CLEAN(result, int, 16, 8);
>> +  CLEAN(result, int, 32, 4);
>> +  CLEAN(result, int, 64, 2);
>> +  CLEAN(result, uint, 8, 16);
>> +  CLEAN(result, uint, 16, 8);
>> +  CLEAN(result, uint, 32, 4);
>> +  CLEAN(result, uint, 64, 2);
>> +  CLEAN(result, poly, 8, 16);
>> +  CLEAN(result, poly, 16, 8);
>> +  CLEAN(result, float, 32, 4);
>> +}
>> +
>> +
>> +/* Helpers to declare variables of various types.   */
>> +#define DECL_VARIABLE(VAR, T1, W, N)           \
>> +  VECT_TYPE(T1, W, N) VECT_VAR(VAR, T1, W, N)
>> +
>> +/* Declare only 64 bits signed variants.  */
>> +#define DECL_VARIABLE_64BITS_SIGNED_VARIANTS(VAR)      \
>> +  DECL_VARIABLE(VAR, int, 8, 8);                       \
>> +  DECL_VARIABLE(VAR, int, 16, 4);                      \
>> +  DECL_VARIABLE(VAR, int, 32, 2);                      \
>> +  DECL_VARIABLE(VAR, int, 64, 1)
>> +
>> +/* Declare only 64 bits unsigned variants.  */
>> +#define DECL_VARIABLE_64BITS_UNSIGNED_VARIANTS(VAR)    \
>> +  DECL_VARIABLE(VAR, uint, 8, 8);                      \
>> +  DECL_VARIABLE(VAR, uint, 16, 4);                     \
>> +  DECL_VARIABLE(VAR, uint, 32, 2);                     \
>> +  DECL_VARIABLE(VAR, uint, 64, 1)
>> +
>> +/* Declare only 128 bits signed variants.  */
>> +#define DECL_VARIABLE_128BITS_SIGNED_VARIANTS(VAR)     \
>> +  DECL_VARIABLE(VAR, int, 8, 16);                      \
>> +  DECL_VARIABLE(VAR, int, 16, 8);                      \
>> +  DECL_VARIABLE(VAR, int, 32, 4);                      \
>> +  DECL_VARIABLE(VAR, int, 64, 2)
>> +
>> +/* Declare only 128 bits unsigned variants.  */
>> +#define DECL_VARIABLE_128BITS_UNSIGNED_VARIANTS(VAR)   \
>> +  DECL_VARIABLE(VAR, uint, 8, 16);                     \
>> +  DECL_VARIABLE(VAR, uint, 16, 8);                     \
>> +  DECL_VARIABLE(VAR, uint, 32, 4);                     \
>> +  DECL_VARIABLE(VAR, uint, 64, 2)
>> +
>> +/* Declare all 64 bits variants.  */
>> +#define DECL_VARIABLE_64BITS_VARIANTS(VAR)     \
>> +  DECL_VARIABLE_64BITS_SIGNED_VARIANTS(VAR);   \
>> +  DECL_VARIABLE_64BITS_UNSIGNED_VARIANTS(VAR); \
>> +  DECL_VARIABLE(VAR, poly, 8, 8);              \
>> +  DECL_VARIABLE(VAR, poly, 16, 4);             \
>> +  DECL_VARIABLE(VAR, float, 32, 2)
>> +
>> +/* Declare all 128 bits variants.  */
>> +#define DECL_VARIABLE_128BITS_VARIANTS(VAR)    \
>> +  DECL_VARIABLE_128BITS_SIGNED_VARIANTS(VAR);  \
>> +  DECL_VARIABLE_128BITS_UNSIGNED_VARIANTS(VAR);        \
>> +  DECL_VARIABLE(VAR, poly, 8, 16);             \
>> +  DECL_VARIABLE(VAR, poly, 16, 8);             \
>> +  DECL_VARIABLE(VAR, float, 32, 4)
>> +
>> +/* Declare all variants.  */
>> +#define DECL_VARIABLE_ALL_VARIANTS(VAR)                \
>> +  DECL_VARIABLE_64BITS_VARIANTS(VAR);          \
>> +  DECL_VARIABLE_128BITS_VARIANTS(VAR)
>> +
>> +/* Declare all signed variants.  */
>> +#define DECL_VARIABLE_SIGNED_VARIANTS(VAR)     \
>> +  DECL_VARIABLE_64BITS_SIGNED_VARIANTS(VAR);   \
>> +  DECL_VARIABLE_128BITS_SIGNED_VARIANTS(VAR)
>> +
>> +/* Declare all unsigned variants.  */
>> +#define DECL_VARIABLE_UNSIGNED_VARIANTS(VAR)   \
>> +  DECL_VARIABLE_64BITS_UNSIGNED_VARIANTS(VAR); \
>> +  DECL_VARIABLE_128BITS_UNSIGNED_VARIANTS(VAR)
>> +
>> +/* Helpers to initialize vectors.  */
>> +#define VDUP(VAR, Q, T1, T2, W, N, V)                  \
>> +  VECT_VAR(VAR, T1, W, N) = vdup##Q##_n_##T2##W(V)
>> +
>> +#define VSET_LANE(VAR, Q, T1, T2, W, N, L, V)                          \
>> +  VECT_VAR(VAR, T1, W, N) = vset##Q##_lane_##T2##W(V,                  \
>> +                                                  VECT_VAR(VAR, T1, W, N), \
>> +                                                  L)
>> +
>> +/* We need to load initial values first, so rely on VLD1.  */
>> +#define VLOAD(VAR, BUF, Q, T1, T2, W, N)                               \
>> +  VECT_VAR(VAR, T1, W, N) = vld1##Q##_##T2##W(VECT_VAR(BUF, T1, W, N))
>> +
>> +/* Helpers to call macros with 1 constant and 5 variable
>> +   arguments.  */
>> +#define TEST_MACRO_64BITS_SIGNED_VARIANTS_1_5(MACRO, VAR)      \
>> +  MACRO(VAR, , int, s, 8, 8);                                  \
>> +  MACRO(VAR, , int, s, 16, 4);                                 \
>> +  MACRO(VAR, , int, s, 32, 2);                                 \
>> +  MACRO(VAR, , int, s, 64, 1)
>> +
>> +#define TEST_MACRO_64BITS_UNSIGNED_VARIANTS_1_5(MACRO, VAR)    \
>> +  MACRO(VAR, , uint, u, 8, 8);                                 \
>> +  MACRO(VAR, , uint, u, 16, 4);                                        \
>> +  MACRO(VAR, , uint, u, 32, 2);                                        \
>> +  MACRO(VAR, , uint, u, 64, 1)
>> +
>> +#define TEST_MACRO_128BITS_SIGNED_VARIANTS_1_5(MACRO, VAR)     \
>> +  MACRO(VAR, q, int, s, 8, 16);                                        \
>> +  MACRO(VAR, q, int, s, 16, 8);                                        \
>> +  MACRO(VAR, q, int, s, 32, 4);                                        \
>> +  MACRO(VAR, q, int, s, 64, 2)
>> +
>> +#define TEST_MACRO_128BITS_UNSIGNED_VARIANTS_1_5(MACRO,VAR)    \
>> +  MACRO(VAR, q, uint, u, 8, 16);                               \
>> +  MACRO(VAR, q, uint, u, 16, 8);                               \
>> +  MACRO(VAR, q, uint, u, 32, 4);                               \
>> +  MACRO(VAR, q, uint, u, 64, 2)
>> +
>> +#define TEST_MACRO_64BITS_VARIANTS_1_5(MACRO, VAR)     \
>> +  TEST_MACRO_64BITS_SIGNED_VARIANTS_1_5(MACRO, VAR);   \
>> +  TEST_MACRO_64BITS_UNSIGNED_VARIANTS_1_5(MACRO, VAR)
>> +
>> +#define TEST_MACRO_128BITS_VARIANTS_1_5(MACRO, VAR)    \
>> +  TEST_MACRO_128BITS_SIGNED_VARIANTS_1_5(MACRO, VAR);  \
>> +  TEST_MACRO_128BITS_UNSIGNED_VARIANTS_1_5(MACRO, VAR)
>> +
>> +#define TEST_MACRO_ALL_VARIANTS_1_5(MACRO, VAR)        \
>> +  TEST_MACRO_64BITS_VARIANTS_1_5(MACRO, VAR);  \
>> +  TEST_MACRO_128BITS_VARIANTS_1_5(MACRO, VAR)
>> +
>> +#define TEST_MACRO_SIGNED_VARIANTS_1_5(MACRO, VAR)     \
>> +  TEST_MACRO_64BITS_SIGNED_VARIANTS_1_5(MACRO, VAR);   \
>> +  TEST_MACRO_128BITS_SIGNED_VARIANTS_1_5(MACRO, VAR)
>> +
>> +/* Helpers to call macros with 2 constant and 5 variable
>> +   arguments.  */
>> +#define TEST_MACRO_64BITS_SIGNED_VARIANTS_2_5(MACRO, VAR1, VAR2)       \
>> +  MACRO(VAR1, VAR2, , int, s, 8, 8);                                   \
>> +  MACRO(VAR1, VAR2, , int, s, 16, 4);                                  \
>> +  MACRO(VAR1, VAR2, , int, s, 32, 2);                                  \
>> +  MACRO(VAR1, VAR2 , , int, s, 64, 1)
>> +
>> +#define TEST_MACRO_64BITS_UNSIGNED_VARIANTS_2_5(MACRO, VAR1, VAR2)     \
>> +  MACRO(VAR1, VAR2, , uint, u, 8, 8);                                  \
>> +  MACRO(VAR1, VAR2, , uint, u, 16, 4);                                 \
>> +  MACRO(VAR1, VAR2, , uint, u, 32, 2);                                 \
>> +  MACRO(VAR1, VAR2, , uint, u, 64, 1)
>> +
>> +#define TEST_MACRO_128BITS_SIGNED_VARIANTS_2_5(MACRO, VAR1, VAR2)      \
>> +  MACRO(VAR1, VAR2, q, int, s, 8, 16);                                 \
>> +  MACRO(VAR1, VAR2, q, int, s, 16, 8);                                 \
>> +  MACRO(VAR1, VAR2, q, int, s, 32, 4);                                 \
>> +  MACRO(VAR1, VAR2, q, int, s, 64, 2)
>> +
>> +#define TEST_MACRO_128BITS_UNSIGNED_VARIANTS_2_5(MACRO, VAR1, VAR2)    \
>> +  MACRO(VAR1, VAR2, q, uint, u, 8, 16);                                        \
>> +  MACRO(VAR1, VAR2, q, uint, u, 16, 8);                                        \
>> +  MACRO(VAR1, VAR2, q, uint, u, 32, 4);                                        \
>> +  MACRO(VAR1, VAR2, q, uint, u, 64, 2)
>> +
>> +#define TEST_MACRO_64BITS_VARIANTS_2_5(MACRO, VAR1, VAR2)      \
>> +  TEST_MACRO_64BITS_SIGNED_VARIANTS_2_5(MACRO, VAR1, VAR2);    \
>> +  TEST_MACRO_64BITS_UNSIGNED_VARIANTS_2_5(MACRO, VAR1, VAR2);  \
>> +  MACRO(VAR1, VAR2, , poly, p, 8, 8);                          \
>> +  MACRO(VAR1, VAR2, , poly, p, 16, 4)
>> +
>> +#define TEST_MACRO_128BITS_VARIANTS_2_5(MACRO, VAR1, VAR2)     \
>> +  TEST_MACRO_128BITS_SIGNED_VARIANTS_2_5(MACRO, VAR1, VAR2);   \
>> +  TEST_MACRO_128BITS_UNSIGNED_VARIANTS_2_5(MACRO, VAR1, VAR2); \
>> +  MACRO(VAR1, VAR2, q, poly, p, 8, 16);                                \
>> +  MACRO(VAR1, VAR2, q, poly, p, 16, 8)
>> +
>> +#define TEST_MACRO_ALL_VARIANTS_2_5(MACRO, VAR1, VAR2) \
>> +  TEST_MACRO_64BITS_VARIANTS_2_5(MACRO, VAR1, VAR2);   \
>> +  TEST_MACRO_128BITS_VARIANTS_2_5(MACRO, VAR1, VAR2)
>> +
>> +#define TEST_MACRO_SIGNED_VARIANTS_2_5(MACRO, VAR1, VAR2)      \
>> +  TEST_MACRO_64BITS_SIGNED_VARIANTS_2_5(MACRO, VAR1, VAR2);    \
>> +  TEST_MACRO_128BITS_SIGNED_VARIANTS_2_5(MACRO, VAR1, VAR2)
>> +
>> +#endif /* _ARM_NEON_REF_H_ */
>> diff --git a/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/compute-ref-data.h b/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/compute-ref-data.h
>> new file mode 100644
>> index 0000000..3673d73
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/compute-ref-data.h
>> @@ -0,0 +1,204 @@
>> +/* This file contains input data static definitions, shared by most of
>> +   the tests.  */
>> +
>> +#include <arm_neon.h>
>> +#include "arm-neon-ref.h"
>> +
>> +/* Initialization helpers; 4 slices are needed for vld2, vld3 and
>> +   vld4.  */
>> +#define MY_INIT_TABLE(T,W,N) xNAME(INIT_TABLE,N)(T##W##_t)
>> +#define MY_INIT_TABLE2(T,W,N) xNAME(INIT_TABLE2,N)(T##W##_t)
>> +#define MY_INIT_TABLE3(T,W,N) xNAME(INIT_TABLE3,N)(T##W##_t)
>> +#define MY_INIT_TABLE4(T,W,N) xNAME(INIT_TABLE4,N)(T##W##_t)
>> +
>> +/* Initialized input buffers.  */
>> +#define VECT_VAR_DECL_INIT(V, T, W, N)                 \
>> +  VECT_VAR_DECL(V,T,W,N) [] = { MY_INIT_TABLE(T,W,N) }
>> +
>> +/* Specialized initializer with 4 entries, as used by vldX_dup and
>> +   vdup tests, which iterate 4 times on input buffers.  */
>> +#define VECT_VAR_DECL_INIT4(V, T, W, N)                        \
>> +  VECT_VAR_DECL(V,T,W,N) [] = { MY_INIT_TABLE(T,W,4) };
>> +
>> +/* Initializers for arrays of vectors.  */
>> +#define VECT_ARRAY_INIT2(V, T, W, N)           \
>> +  T##W##_t VECT_ARRAY_VAR(V,T,W,N,2)[] =       \
>> +  { MY_INIT_TABLE(T,W,N)                       \
>> +    MY_INIT_TABLE2(T,W,N) }
>> +
>> +#define VECT_ARRAY_INIT3(V, T, W, N)                   \
>> +  T##W##_t VECT_ARRAY_VAR(V,T,W,N,3)[] =               \
>> +  { MY_INIT_TABLE(T,W,N)                               \
>> +    MY_INIT_TABLE2(T,W,N)                              \
>> +    MY_INIT_TABLE3(T,W,N) }
>> +
>> +#define VECT_ARRAY_INIT4(V, T, W, N)                   \
>> +  T##W##_t VECT_ARRAY_VAR(V,T,W,N,4)[] =               \
>> +  { MY_INIT_TABLE(T,W,N)                               \
>> +    MY_INIT_TABLE2(T,W,N)                              \
>> +    MY_INIT_TABLE3(T,W,N)                              \
>> +    MY_INIT_TABLE4(T,W,N) }
>> +
>> +/* Sample initialization vectors.  */
>> +#define INIT_TABLE_1(T)                                \
>> +  (T)-16,
>> +#define INIT_TABLE2_1(T)                       \
>> +  (T)-15,
>> +#define INIT_TABLE3_1(T)                       \
>> +  (T)-14,
>> +#define INIT_TABLE4_1(T)                       \
>> +  (T)-13,
>> +
>> +#define INIT_TABLE_2(T)                                \
>> +  (T)-16, (T)-15,
>> +#define INIT_TABLE2_2(T)                       \
>> +  (T)-14, (T)-13,
>> +#define INIT_TABLE3_2(T)                       \
>> +  (T)-12, (T)-11,
>> +#define INIT_TABLE4_2(T)                       \
>> +  (T)-10, (T)-9,
>> +
>> +/* Initializer for vld3_lane tests.  */
>> +#define INIT_TABLE_3(T)                                \
>> +  (T)-16, (T)-15, (T)-14,
>> +
>> +#define INIT_TABLE_4(T)                                \
>> +  (T)-16, (T)-15, (T)-14, (T)-13,
>> +#define INIT_TABLE2_4(T)                       \
>> +  (T)-12, (T)-11, (T)-10, (T)-9,
>> +#define INIT_TABLE3_4(T)                       \
>> +  (T)-8, (T)-7, (T)-6, (T)-5,
>> +#define INIT_TABLE4_4(T)                       \
>> +  (T)-4, (T)-3, (T)-2, (T)-1,
>> +
>> +#define INIT_TABLE_8(T)                                                        \
>> +  (T)-16, (T)-15, (T)-14, (T)-13, (T)-12, (T)-11, (T)-10, (T)-9,
>> +#define INIT_TABLE2_8(T)                                       \
>> +  (T)-8, (T)-7, (T)-6, (T)-5, (T)-4, (T)-3, (T)-2, (T)-1,
>> +#define INIT_TABLE3_8(T)                               \
>> +  (T)0, (T)1, (T)2, (T)3, (T)4, (T)5, (T)6, (T)7,
>> +#define INIT_TABLE4_8(T)                               \
>> +  (T)8, (T)9, (T)10, (T)11, (T)12, (T)13, (T)14, (T)15,
>> +
>> +#define INIT_TABLE_16(T)                                               \
>> +  (T)-16, (T)-15, (T)-14, (T)-13, (T)-12, (T)-11, (T)-10, (T)-9,       \
>> +  (T)-8, (T)-7, (T)-6, (T)-5, (T)-4, (T)-3, (T)-2, (T)-1,
>> +#define INIT_TABLE2_16(T)                                              \
>> +  (T)0, (T)1, (T)2, (T)3, (T)4, (T)5, (T)6, (T)7,                      \
>> +  (T)8, (T)9, (T)10, (T)11, (T)12, (T)13, (T)14, (T)15,
>> +#define INIT_TABLE3_16(T)                                              \
>> +  (T)16, (T)17, (T)18, (T)19, (T)20, (T)21, (T)22, (T)23,              \
>> +   (T)24, (T)25, (T)26, (T)27, (T)28, (T)29, (T)30, (T)31,
>> +#define INIT_TABLE4_16(T)                                              \
>> +  (T)32, (T)33, (T)34, (T)35, (T)36, (T)37, (T)38, (T)39,              \
>> +  (T)40, (T)41, (T)42, (T)43, (T)44, (T)45, (T)46, (T)47,
>> +
>> +/* This one is used for padding between input buffers.  */
>> +#define PAD(V, T, W, N) char VECT_VAR(V,T,W,N)=42
>> +
>> +/* Input buffers, one of each size.  */
>> +/* Insert some padding to try to exhibit out of bounds accesses.  */
>> +VECT_VAR_DECL_INIT(buffer, int, 8, 8);
>> +PAD(buffer_pad, int, 8, 8);
>> +VECT_VAR_DECL_INIT(buffer, int, 16, 4);
>> +PAD(buffer_pad, int, 16, 4);
>> +VECT_VAR_DECL_INIT(buffer, int, 32, 2);
>> +PAD(buffer_pad, int, 32, 2);
>> +VECT_VAR_DECL_INIT(buffer, int, 64, 1);
>> +PAD(buffer_pad, int, 64, 1);
>> +VECT_VAR_DECL_INIT(buffer, uint, 8, 8);
>> +PAD(buffer_pad, uint, 8, 8);
>> +VECT_VAR_DECL_INIT(buffer, poly, 8, 8);
>> +PAD(buffer_pad, poly, 8, 8);
>> +VECT_VAR_DECL_INIT(buffer, poly, 16, 4);
>> +PAD(buffer_pad, poly, 16, 4);
>> +VECT_VAR_DECL_INIT(buffer, uint, 16, 4);
>> +PAD(buffer_pad, uint, 16, 4);
>> +VECT_VAR_DECL_INIT(buffer, uint, 32, 2);
>> +PAD(buffer_pad, uint, 32, 2);
>> +VECT_VAR_DECL_INIT(buffer, uint, 64, 1);
>> +PAD(buffer_pad, uint, 64, 1);
>> +VECT_VAR_DECL_INIT(buffer, float, 32, 2);
>> +PAD(buffer_pad, float, 32, 2);
>> +VECT_VAR_DECL_INIT(buffer, int, 8, 16);
>> +PAD(buffer_pad, int, 8, 16);
>> +VECT_VAR_DECL_INIT(buffer, int, 16, 8);
>> +PAD(buffer_pad, int, 16, 8);
>> +VECT_VAR_DECL_INIT(buffer, int, 32, 4);
>> +PAD(buffer_pad, int, 32, 4);
>> +VECT_VAR_DECL_INIT(buffer, int, 64, 2);
>> +PAD(buffer_pad, int, 64, 2);
>> +VECT_VAR_DECL_INIT(buffer, uint, 8, 16);
>> +PAD(buffer_pad, uint, 8, 16);
>> +VECT_VAR_DECL_INIT(buffer, uint, 16, 8);
>> +PAD(buffer_pad, uint, 16, 8);
>> +VECT_VAR_DECL_INIT(buffer, uint, 32, 4);
>> +PAD(buffer_pad, uint, 32, 4);
>> +VECT_VAR_DECL_INIT(buffer, uint, 64, 2);
>> +PAD(buffer_pad, uint, 64, 2);
>> +VECT_VAR_DECL_INIT(buffer, poly, 8, 16);
>> +PAD(buffer_pad, poly, 8, 16);
>> +VECT_VAR_DECL_INIT(buffer, poly, 16, 8);
>> +PAD(buffer_pad, poly, 16, 8);
>> +VECT_VAR_DECL_INIT(buffer, float, 32, 4);
>> +PAD(buffer_pad, float, 32, 4);
>> +
>> +/* The tests for vld1_dup and vdup expect at least 4 entries in the
>> +   input buffer, so force 1- and 2-elements initializers to have 4
>> +   entries (using VECT_VAR_DECL_INIT4).  */
>> +VECT_VAR_DECL_INIT(buffer_dup, int, 8, 8);
>> +VECT_VAR_DECL(buffer_dup_pad, int, 8, 8);
>> +VECT_VAR_DECL_INIT(buffer_dup, int, 16, 4);
>> +VECT_VAR_DECL(buffer_dup_pad, int, 16, 4);
>> +VECT_VAR_DECL_INIT4(buffer_dup, int, 32, 2);
>> +VECT_VAR_DECL(buffer_dup_pad, int, 32, 2);
>> +VECT_VAR_DECL_INIT4(buffer_dup, int, 64, 1);
>> +VECT_VAR_DECL(buffer_dup_pad, int, 64, 1);
>> +VECT_VAR_DECL_INIT(buffer_dup, uint, 8, 8);
>> +VECT_VAR_DECL(buffer_dup_pad, uint, 8, 8);
>> +VECT_VAR_DECL_INIT(buffer_dup, uint, 16, 4);
>> +VECT_VAR_DECL(buffer_dup_pad, uint, 16, 4);
>> +VECT_VAR_DECL_INIT4(buffer_dup, uint, 32, 2);
>> +VECT_VAR_DECL(buffer_dup_pad, uint, 32, 2);
>> +VECT_VAR_DECL_INIT4(buffer_dup, uint, 64, 1);
>> +VECT_VAR_DECL(buffer_dup_pad, uint, 64, 1);
>> +VECT_VAR_DECL_INIT(buffer_dup, poly, 8, 8);
>> +VECT_VAR_DECL(buffer_dup_pad, poly, 8, 8);
>> +VECT_VAR_DECL_INIT(buffer_dup, poly, 16, 4);
>> +VECT_VAR_DECL(buffer_dup_pad, poly, 16, 4);
>> +VECT_VAR_DECL_INIT4(buffer_dup, float, 32, 2);
>> +VECT_VAR_DECL(buffer_dup_pad, float, 32, 2);
>> +#if __ARM_NEON_FP16_INTRINSICS
>> +VECT_VAR_DECL(buffer_dup, float, 16, 4)[] = {0xcc00 /* -16 */, 0xcb80 /* -15 */,
>> +                                            0xcb00 /* -14 */, 0xca80 /* -13 */};
>> +PAD(buffer_dup_pad, float, 16, 4);
>> +#endif
>> +VECT_VAR_DECL_INIT(buffer_dup, int, 8, 16);
>> +VECT_VAR_DECL(buffer_dup_pad, int, 8, 16);
>> +VECT_VAR_DECL_INIT(buffer_dup, int, 16, 8);
>> +VECT_VAR_DECL(buffer_dup_pad, int, 16, 8);
>> +VECT_VAR_DECL_INIT(buffer_dup, int, 32, 4);
>> +VECT_VAR_DECL(buffer_dup_pad, int, 32, 4);
>> +VECT_VAR_DECL_INIT4(buffer_dup, int, 64, 2);
>> +VECT_VAR_DECL(buffer_dup_pad, int, 64, 2);
>> +VECT_VAR_DECL_INIT(buffer_dup, uint, 8, 16);
>> +VECT_VAR_DECL(buffer_dup_pad, uint, 8, 16);
>> +VECT_VAR_DECL_INIT(buffer_dup, uint, 16, 8);
>> +VECT_VAR_DECL(buffer_dup_pad, uint, 16, 8);
>> +VECT_VAR_DECL_INIT(buffer_dup, uint, 32, 4);
>> +VECT_VAR_DECL(buffer_dup_pad, uint, 32, 4);
>> +VECT_VAR_DECL_INIT4(buffer_dup, uint, 64, 2);
>> +VECT_VAR_DECL(buffer_dup_pad, uint, 64, 2);
>> +VECT_VAR_DECL_INIT(buffer_dup, poly, 8, 16);
>> +VECT_VAR_DECL(buffer_dup_pad, poly, 8, 16);
>> +VECT_VAR_DECL_INIT(buffer_dup, poly, 16, 8);
>> +VECT_VAR_DECL(buffer_dup_pad, poly, 16, 8);
>> +VECT_VAR_DECL_INIT(buffer_dup, float, 32, 4);
>> +VECT_VAR_DECL(buffer_dup_pad, float, 32, 4);
>> +#if __ARM_NEON_FP16_INTRINSICS
>> +VECT_VAR_DECL(buffer_dup, float, 16, 8)[] = {0xcc00 /* -16 */, 0xcb80 /* -15 */,
>> +                                            0xcb00 /* -14 */, 0xca80 /* -13 */,
>> +                                            0xca00 /* -12 */, 0xc980 /* -11 */,
>> +                                            0xc900 /* -10 */, 0xc880 /* -9 */};
>> +PAD(buffer_dup_pad, float, 16, 8);
>> +#endif
>> diff --git a/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/neon-intrinsics.exp b/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/neon-intrinsics.exp
>> new file mode 100644
>> index 0000000..38bd207
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/neon-intrinsics.exp
>> @@ -0,0 +1,54 @@
>> +# Copyright (C) 2013 Free Software Foundation, Inc.
>> +
>> +# This program is free software; you can redistribute it and/or modify
>> +# it under the terms of the GNU General Public License as published by
>> +# the Free Software Foundation; either version 3 of the License, or
>> +# (at your option) any later version.
>> +#
>> +# This program is distributed in the hope that it will be useful,
>> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +# GNU General Public License for more details.
>> +#
>> +# You should have received a copy of the GNU General Public License
>> +# along with GCC; see the file COPYING3.  If not see
>> +# <http://www.gnu.org/licenses/>.
>> +
>> +# GCC testsuite that uses the `dg.exp' driver.
>> +
>> +# Exit immediately if this isn't an ARM or AArch64 target.
>> +if {![istarget arm*-*-*]
>> +    && ![istarget aarch64*-*-*]} then {
>> +  return
>> +}
>> +
>> +# Load support procs.
>> +load_lib gcc-dg.exp
>> +
>> +# Initialize `dg'.
>> +load_lib c-torture.exp
>> +load_lib target-supports.exp
>> +load_lib torture-options.exp
>> +
>> +dg-init
>> +
>> +torture-init
>> +set-torture-options $C_TORTURE_OPTIONS {{}} $LTO_TORTURE_OPTIONS
>> +
>> +# Make sure Neon flags are provided, if necessary.
>> +set additional_flags [add_options_for_arm_neon ""]
>> +
>> +# Main loop.
>> +foreach src [lsort [glob -nocomplain $srcdir/$subdir/*.c]] {
>> +    # If we're only testing specific files and this isn't one of them, skip it.
>> +    if ![runtest_file_p $runtests $src] then {
>> +        continue
>> +    }
>> +
>> +    c-torture-execute $src $additional_flags
>> +    gcc-dg-runtest $src $additional_flags
>> +}
>> +
>> +# All done.
>> +torture-finish
>> +dg-finish
>> \ No newline at end of file
>> diff --git a/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/vaba.c b/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/vaba.c
>> new file mode 100644
>> index 0000000..2465cd2
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/vaba.c
>> @@ -0,0 +1,142 @@
>> +#include <arm_neon.h>
>> +#include "arm-neon-ref.h"
>> +#include "compute-ref-data.h"
>> +
>> +/* Expected results.  */
>> +VECT_VAR_DECL(expected,int,8,8) [] = { 0xf6, 0xf7, 0xf8, 0xf9,
>> +                                      0xfa, 0xfb, 0xfc, 0xfd };
>> +VECT_VAR_DECL(expected,int,16,4) [] = { 0x16, 0x17, 0x18, 0x19 };
>> +VECT_VAR_DECL(expected,int,32,2) [] = { 0x20, 0x21 };
>> +VECT_VAR_DECL(expected,int,64,1) [] = { 0x3333333333333333 };
>> +VECT_VAR_DECL(expected,uint,8,8) [] = { 0x53, 0x54, 0x55, 0x56,
>> +                                       0x57, 0x58, 0x59, 0x5a };
>> +VECT_VAR_DECL(expected,uint,16,4) [] = { 0x907, 0x908, 0x909, 0x90a };
>> +VECT_VAR_DECL(expected,uint,32,2) [] = { 0xffffffe7, 0xffffffe8 };
>> +VECT_VAR_DECL(expected,uint,64,1) [] = { 0x3333333333333333 };
>> +VECT_VAR_DECL(expected,poly,8,8) [] = { 0x33, 0x33, 0x33, 0x33,
>> +                                       0x33, 0x33, 0x33, 0x33 };
>> +VECT_VAR_DECL(expected,poly,16,4) [] = { 0x3333, 0x3333, 0x3333, 0x3333 };
>> +VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0x33333333, 0x33333333 };
>> +VECT_VAR_DECL(expected,int,8,16) [] = { 0x5e, 0x5f, 0x60, 0x61,
>> +                                       0x62, 0x63, 0x64, 0x65,
>> +                                       0x66, 0x67, 0x68, 0x69,
>> +                                       0x6a, 0x6b, 0x6c, 0x6d };
>> +VECT_VAR_DECL(expected,int,16,8) [] = { 0xb9c, 0xb9d, 0xb9e, 0xb9f,
>> +                                       0xba0, 0xba1, 0xba2, 0xba3 };
>> +VECT_VAR_DECL(expected,int,32,4) [] = { 0x26e0, 0x26e1, 0x26e2, 0x26e3 };
>> +VECT_VAR_DECL(expected,int,64,2) [] = { 0x3333333333333333,
>> +                                       0x3333333333333333 };
>> +VECT_VAR_DECL(expected,uint,8,16) [] = { 0xf8, 0xf9, 0xfa, 0xfb,
>> +                                        0xfc, 0xfd, 0xfe, 0xff,
>> +                                        0x0, 0x1, 0x2, 0x3,
>> +                                        0x4, 0x5, 0x6, 0x7 };
>> +VECT_VAR_DECL(expected,uint,16,8) [] = { 0xfff9, 0xfffa, 0xfffb, 0xfffc,
>> +                                        0xfffd, 0xfffe, 0xffff, 0x0 };
>> +VECT_VAR_DECL(expected,uint,32,4) [] = { 0xc, 0xd, 0xe, 0xf };
>> +VECT_VAR_DECL(expected,uint,64,2) [] = { 0x3333333333333333,
>> +                                        0x3333333333333333 };
>> +VECT_VAR_DECL(expected,poly,8,16) [] = { 0x33, 0x33, 0x33, 0x33,
>> +                                        0x33, 0x33, 0x33, 0x33,
>> +                                        0x33, 0x33, 0x33, 0x33,
>> +                                        0x33, 0x33, 0x33, 0x33 };
>> +VECT_VAR_DECL(expected,poly,16,8) [] = { 0x3333, 0x3333, 0x3333, 0x3333,
>> +                                        0x3333, 0x3333, 0x3333, 0x3333 };
>> +VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0x33333333, 0x33333333,
>> +                                          0x33333333, 0x33333333 };
>> +
>> +#define TEST_MSG "VABA/VABAQ"
>> +void exec_vaba (void)
>> +{
>> +  /* Basic test: v4=vaba(v1,v2,v3), then store the result.  */
>> +#define TEST_VABA(Q, T1, T2, W, N)                                     \
>> +  VECT_VAR(vector_res, T1, W, N) =                                     \
>> +    vaba##Q##_##T2##W(VECT_VAR(vector1, T1, W, N),                     \
>> +                     VECT_VAR(vector2, T1, W, N),                      \
>> +                     VECT_VAR(vector3, T1, W, N));                     \
>> +  vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(vector_res, T1, W, N))
>> +
>> +#define DECL_VABA_VAR(VAR)                     \
>> +  DECL_VARIABLE(VAR, int, 8, 8);               \
>> +  DECL_VARIABLE(VAR, int, 16, 4);              \
>> +  DECL_VARIABLE(VAR, int, 32, 2);              \
>> +  DECL_VARIABLE(VAR, uint, 8, 8);              \
>> +  DECL_VARIABLE(VAR, uint, 16, 4);             \
>> +  DECL_VARIABLE(VAR, uint, 32, 2);             \
>> +  DECL_VARIABLE(VAR, int, 8, 16);              \
>> +  DECL_VARIABLE(VAR, int, 16, 8);              \
>> +  DECL_VARIABLE(VAR, int, 32, 4);              \
>> +  DECL_VARIABLE(VAR, uint, 8, 16);             \
>> +  DECL_VARIABLE(VAR, uint, 16, 8);             \
>> +  DECL_VARIABLE(VAR, uint, 32, 4)
>> +
>> +  DECL_VABA_VAR(vector1);
>> +  DECL_VABA_VAR(vector2);
>> +  DECL_VABA_VAR(vector3);
>> +  DECL_VABA_VAR(vector_res);
>> +
>> +  clean_results ();
>> +
>> +  /* Initialize input "vector1" from "buffer".  */
>> +  VLOAD(vector1, buffer, , int, s, 8, 8);
>> +  VLOAD(vector1, buffer, , int, s, 16, 4);
>> +  VLOAD(vector1, buffer, , int, s, 32, 2);
>> +  VLOAD(vector1, buffer, , uint, u, 8, 8);
>> +  VLOAD(vector1, buffer, , uint, u, 16, 4);
>> +  VLOAD(vector1, buffer, , uint, u, 32, 2);
>> +  VLOAD(vector1, buffer, q, int, s, 8, 16);
>> +  VLOAD(vector1, buffer, q, int, s, 16, 8);
>> +  VLOAD(vector1, buffer, q, int, s, 32, 4);
>> +  VLOAD(vector1, buffer, q, uint, u, 8, 16);
>> +  VLOAD(vector1, buffer, q, uint, u, 16, 8);
>> +  VLOAD(vector1, buffer, q, uint, u, 32, 4);
>> +
>> +  /* Choose init value arbitrarily.  */
>> +  VDUP(vector2, , int, s, 8, 8, 1);
>> +  VDUP(vector2, , int, s, 16, 4, -13);
>> +  VDUP(vector2, , int, s, 32, 2, 8);
>> +  VDUP(vector2, , uint, u, 8, 8, 1);
>> +  VDUP(vector2, , uint, u, 16, 4, 13);
>> +  VDUP(vector2, , uint, u, 32, 2, 8);
>> +  VDUP(vector2, q, int, s, 8, 16, 10);
>> +  VDUP(vector2, q, int, s, 16, 8, -12);
>> +  VDUP(vector2, q, int, s, 32, 4, 32);
>> +  VDUP(vector2, q, uint, u, 8, 16, 10);
>> +  VDUP(vector2, q, uint, u, 16, 8, 12);
>> +  VDUP(vector2, q, uint, u, 32, 4, 32);
>> +
>> +  /* Choose init value arbitrarily.  */
>> +  VDUP(vector3, , int, s, 8, 8, -5);
>> +  VDUP(vector3, , int, s, 16, 4, 25);
>> +  VDUP(vector3, , int, s, 32, 2, -40);
>> +  VDUP(vector3, , uint, u, 8, 8, 100);
>> +  VDUP(vector3, , uint, u, 16, 4, 2340);
>> +  VDUP(vector3, , uint, u, 32, 2, 0xffffffff);
>> +  VDUP(vector3, q, int, s, 8, 16, -100);
>> +  VDUP(vector3, q, int, s, 16, 8, -3000);
>> +  VDUP(vector3, q, int, s, 32, 4, 10000);
>> +  VDUP(vector3, q, uint, u, 8, 16, 2);
>> +  VDUP(vector3, q, uint, u, 16, 8, 3);
>> +  VDUP(vector3, q, uint, u, 32, 4, 4);
>> +
>> +  /* Execute the tests.  */
>> +  TEST_VABA(, int, s, 8, 8);
>> +  TEST_VABA(, int, s, 16, 4);
>> +  TEST_VABA(, int, s, 32, 2);
>> +  TEST_VABA(, uint, u, 8, 8);
>> +  TEST_VABA(, uint, u, 16, 4);
>> +  TEST_VABA(, uint, u, 32, 2);
>> +  TEST_VABA(q, int, s, 8, 16);
>> +  TEST_VABA(q, int, s, 16, 8);
>> +  TEST_VABA(q, int, s, 32, 4);
>> +  TEST_VABA(q, uint, u, 8, 16);
>> +  TEST_VABA(q, uint, u, 16, 8);
>> +  TEST_VABA(q, uint, u, 32, 4);
>> +
>> +  CHECK_RESULTS (TEST_MSG, "");
>> +}
>> +
>> +int main (void)
>> +{
>> +  exec_vaba ();
>> +  return 0;
>> +}
>> diff --git a/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/vld1.c b/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/vld1.c
>> new file mode 100644
>> index 0000000..784e004
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/vld1.c
>> @@ -0,0 +1,84 @@
>> +#include <arm_neon.h>
>> +#include "arm-neon-ref.h"
>> +#include "compute-ref-data.h"
>> +
>> +/* Expected results.  */
>> +VECT_VAR_DECL(expected,int,8,8) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
>> +                                      0xf4, 0xf5, 0xf6, 0xf7 };
>> +VECT_VAR_DECL(expected,int,16,4) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3 };
>> +VECT_VAR_DECL(expected,int,32,2) [] = { 0xfffffff0, 0xfffffff1 };
>> +VECT_VAR_DECL(expected,int,64,1) [] = { 0xfffffffffffffff0 };
>> +VECT_VAR_DECL(expected,uint,8,8) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
>> +                                       0xf4, 0xf5, 0xf6, 0xf7 };
>> +VECT_VAR_DECL(expected,uint,16,4) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3 };
>> +VECT_VAR_DECL(expected,uint,32,2) [] = { 0xfffffff0, 0xfffffff1 };
>> +VECT_VAR_DECL(expected,uint,64,1) [] = { 0xfffffffffffffff0 };
>> +VECT_VAR_DECL(expected,poly,8,8) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
>> +                                       0xf4, 0xf5, 0xf6, 0xf7 };
>> +VECT_VAR_DECL(expected,poly,16,4) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3 };
>> +VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0xc1800000, 0xc1700000 };
>> +VECT_VAR_DECL(expected,int,8,16) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
>> +                                       0xf4, 0xf5, 0xf6, 0xf7,
>> +                                       0xf8, 0xf9, 0xfa, 0xfb,
>> +                                       0xfc, 0xfd, 0xfe, 0xff };
>> +VECT_VAR_DECL(expected,int,16,8) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3,
>> +                                       0xfff4, 0xfff5, 0xfff6, 0xfff7 };
>> +VECT_VAR_DECL(expected,int,32,4) [] = { 0xfffffff0, 0xfffffff1,
>> +                                       0xfffffff2, 0xfffffff3 };
>> +VECT_VAR_DECL(expected,int,64,2) [] = { 0xfffffffffffffff0,
>> +                                       0xfffffffffffffff1 };
>> +VECT_VAR_DECL(expected,uint,8,16) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
>> +                                        0xf4, 0xf5, 0xf6, 0xf7,
>> +                                        0xf8, 0xf9, 0xfa, 0xfb,
>> +                                        0xfc, 0xfd, 0xfe, 0xff };
>> +VECT_VAR_DECL(expected,uint,16,8) [] = { 0xfff0, 0xfff1, 0xfff2,
>> +                                        0xfff3, 0xfff4, 0xfff5,
>> +                                        0xfff6, 0xfff7 };
>> +VECT_VAR_DECL(expected,uint,32,4) [] = { 0xfffffff0, 0xfffffff1,
>> +                                        0xfffffff2, 0xfffffff3 };
>> +VECT_VAR_DECL(expected,uint,64,2) [] = { 0xfffffffffffffff0,
>> +                                        0xfffffffffffffff1 };
>> +VECT_VAR_DECL(expected,poly,8,16) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
>> +                                        0xf4, 0xf5, 0xf6, 0xf7,
>> +                                        0xf8, 0xf9, 0xfa, 0xfb,
>> +                                        0xfc, 0xfd, 0xfe, 0xff };
>> +VECT_VAR_DECL(expected,poly,16,8) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3,
>> +                                        0xfff4, 0xfff5, 0xfff6, 0xfff7 };
>> +VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0xc1800000, 0xc1700000,
>> +                                          0xc1600000, 0xc1500000 };
>> +
>> +#define TEST_MSG "VLD1/VLD1Q"
>> +void exec_vld1 (void)
>> +{
>> +  /* Basic test vec=vld1(buffer); then store vec: vst1(result, vector).  */
>> +  /* This test actually tests vdl1 and vst1 at the same time.  */
>> +#define TEST_VLD1(VAR, BUF, Q, T1, T2, W, N)                           \
>> +  VECT_VAR(VAR, T1, W, N) = vld1##Q##_##T2##W(VECT_VAR(BUF, T1, W, N)); \
>> +  vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(VAR, T1, W, N))
>> +
>> +  DECL_VARIABLE_ALL_VARIANTS(vector);
>> +#if __ARM_NEON_FP16_INTRINSICS
>> +  DECL_VARIABLE(vector, float, 16, 4);
>> +  DECL_VARIABLE(vector, float, 16, 8);
>> +#endif
>> +
>> +  clean_results ();
>> +
>> +  TEST_MACRO_ALL_VARIANTS_2_5(TEST_VLD1, vector, buffer);
>> +
>> +  TEST_VLD1(vector, buffer, , float, f, 32, 2);
>> +  TEST_VLD1(vector, buffer, q, float, f, 32, 4);
>> +
>> +#if __ARM_NEON_FP16_INTRINSICS
>> +  TEST_VLD1(vector, buffer, , float, f, 16, 4);
>> +  TEST_VLD1(vector, buffer, q, float, f, 16, 8);
>> +#endif
>> +
>> +  CHECK_RESULTS (TEST_MSG, "");
>> +}
>> +
>> +int main (void)
>> +{
>> +  exec_vld1 ();
>> +  return 0;
>> +}
>> diff --git a/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/vshl.c b/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/vshl.c
>> new file mode 100644
>> index 0000000..e64d6e3
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/vshl.c
>> @@ -0,0 +1,230 @@
>> +#include <arm_neon.h>
>> +#include "arm-neon-ref.h"
>> +#include "compute-ref-data.h"
>> +
>> +/* Expected results.  */
>> +VECT_VAR_DECL(expected,int,8,8) [] = { 0xe0, 0xe2, 0xe4, 0xe6,
>> +                                      0xe8, 0xea, 0xec, 0xee };
>> +VECT_VAR_DECL(expected,int,16,4) [] = { 0xff80, 0xff88, 0xff90, 0xff98 };
>> +VECT_VAR_DECL(expected,int,32,2) [] = { 0xfffff000, 0xfffff100 };
>> +VECT_VAR_DECL(expected,int,64,1) [] = { 0xffffffffffffff80 };
>> +VECT_VAR_DECL(expected,uint,8,8) [] = { 0xe0, 0xe2, 0xe4, 0xe6,
>> +                                       0xe8, 0xea, 0xec, 0xee };
>> +VECT_VAR_DECL(expected,uint,16,4) [] = { 0xff80, 0xff88, 0xff90, 0xff98 };
>> +VECT_VAR_DECL(expected,uint,32,2) [] = { 0xfffff000, 0xfffff100 };
>> +VECT_VAR_DECL(expected,uint,64,1) [] = { 0xffffffffffffff80 };
>> +VECT_VAR_DECL(expected,poly,8,8) [] = { 0x33, 0x33, 0x33, 0x33,
>> +                                       0x33, 0x33, 0x33, 0x33 };
>> +VECT_VAR_DECL(expected,poly,16,4) [] = { 0x3333, 0x3333,
>> +                                        0x3333, 0x3333 };
>> +VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0x33333333, 0x33333333 };
>> +VECT_VAR_DECL(expected,int,8,16) [] = { 0x0, 0x20, 0x40, 0x60,
>> +                                       0x80, 0xa0, 0xc0, 0xe0,
>> +                                       0x0, 0x20, 0x40, 0x60,
>> +                                       0x80, 0xa0, 0xc0, 0xe0 };
>> +VECT_VAR_DECL(expected,int,16,8) [] = { 0x0, 0x1000, 0x2000, 0x3000,
>> +                                       0x4000, 0x5000, 0x6000, 0x7000 };
>> +VECT_VAR_DECL(expected,int,32,4) [] = { 0x0, 0x40000000,
>> +                                       0x80000000, 0xc0000000 };
>> +VECT_VAR_DECL(expected,int,64,2) [] = { 0x0, 0x8000000000000000 };
>> +VECT_VAR_DECL(expected,uint,8,16) [] = { 0x0, 0x20, 0x40, 0x60,
>> +                                        0x80, 0xa0, 0xc0, 0xe0,
>> +                                        0x0, 0x20, 0x40, 0x60,
>> +                                        0x80, 0xa0, 0xc0, 0xe0 };
>> +VECT_VAR_DECL(expected,uint,16,8) [] = { 0x0, 0x1000, 0x2000, 0x3000,
>> +                                        0x4000, 0x5000, 0x6000, 0x7000 };
>> +VECT_VAR_DECL(expected,uint,32,4) [] = { 0x0, 0x40000000,
>> +                                        0x80000000, 0xc0000000 };
>> +VECT_VAR_DECL(expected,uint,64,2) [] = { 0x0, 0x8000000000000000 };
>> +VECT_VAR_DECL(expected,poly,8,16) [] = { 0x33, 0x33, 0x33, 0x33,
>> +                                        0x33, 0x33, 0x33, 0x33,
>> +                                        0x33, 0x33, 0x33, 0x33,
>> +                                        0x33, 0x33, 0x33, 0x33 };
>> +VECT_VAR_DECL(expected,poly,16,8) [] = { 0x3333, 0x3333, 0x3333, 0x3333,
>> +                                        0x3333, 0x3333, 0x3333, 0x3333 };
>> +VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0x33333333, 0x33333333,
>> +                                          0x33333333, 0x33333333 };
>> +
>> +/* Expected results with large shift amount.  */
>> +VECT_VAR_DECL(expected_large_shift,int,8,8) [] = { 0x0, 0x0, 0x0, 0x0,
>> +                                                  0x0, 0x0, 0x0, 0x0 };
>> +VECT_VAR_DECL(expected_large_shift,int,16,4) [] = { 0x0, 0x0, 0x0, 0x0 };
>> +VECT_VAR_DECL(expected_large_shift,int,32,2) [] = { 0x0, 0x0 };
>> +VECT_VAR_DECL(expected_large_shift,int,64,1) [] = { 0x0 };
>> +VECT_VAR_DECL(expected_large_shift,uint,8,8) [] = { 0x0, 0x0, 0x0, 0x0,
>> +                                                   0x0, 0x0, 0x0, 0x0 };
>> +VECT_VAR_DECL(expected_large_shift,uint,16,4) [] = { 0x0, 0x0, 0x0, 0x0 };
>> +VECT_VAR_DECL(expected_large_shift,uint,32,2) [] = { 0x0, 0x0 };
>> +VECT_VAR_DECL(expected_large_shift,uint,64,1) [] = { 0x0 };
>> +VECT_VAR_DECL(expected_large_shift,poly,8,8) [] = { 0x33, 0x33, 0x33, 0x33,
>> +                                                   0x33, 0x33, 0x33, 0x33 };
>> +VECT_VAR_DECL(expected_large_shift,poly,16,4) [] = { 0x3333, 0x3333,
>> +                                                    0x3333, 0x3333 };
>> +VECT_VAR_DECL(expected_large_shift,hfloat,32,2) [] = { 0x33333333, 0x33333333 };
>> +VECT_VAR_DECL(expected_large_shift,int,8,16) [] = { 0x0, 0x0, 0x0, 0x0,
>> +                                                   0x0, 0x0, 0x0, 0x0,
>> +                                                   0x0, 0x0, 0x0, 0x0,
>> +                                                   0x0, 0x0, 0x0, 0x0 };
>> +VECT_VAR_DECL(expected_large_shift,int,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
>> +                                                   0x0, 0x0, 0x0, 0x0 };
>> +VECT_VAR_DECL(expected_large_shift,int,32,4) [] = { 0x0, 0x0, 0x0, 0x0 };
>> +VECT_VAR_DECL(expected_large_shift,int,64,2) [] = { 0x0, 0x0 };
>> +VECT_VAR_DECL(expected_large_shift,uint,8,16) [] = { 0x0, 0x0, 0x0, 0x0,
>> +                                                    0x0, 0x0, 0x0, 0x0,
>> +                                                    0x0, 0x0, 0x0, 0x0,
>> +                                                    0x0, 0x0, 0x0, 0x0 };
>> +VECT_VAR_DECL(expected_large_shift,uint,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
>> +                                                    0x0, 0x0, 0x0, 0x0 };
>> +VECT_VAR_DECL(expected_large_shift,uint,32,4) [] = { 0x0, 0x0, 0x0, 0x0 };
>> +VECT_VAR_DECL(expected_large_shift,uint,64,2) [] = { 0x0, 0x0 };
>> +VECT_VAR_DECL(expected_large_shift,poly,8,16) [] = { 0x33, 0x33, 0x33, 0x33,
>> +                                                    0x33, 0x33, 0x33, 0x33,
>> +                                                    0x33, 0x33, 0x33, 0x33,
>> +                                                    0x33, 0x33, 0x33, 0x33 };
>> +VECT_VAR_DECL(expected_large_shift,poly,16,8) [] = { 0x3333, 0x3333,
>> +                                                    0x3333, 0x3333,
>> +                                                    0x3333, 0x3333,
>> +                                                    0x3333, 0x3333 };
>> +VECT_VAR_DECL(expected_large_shift,hfloat,32,4) [] = { 0x33333333, 0x33333333,
>> +                                                      0x33333333, 0x33333333 };
>> +
>> +
>> +/* Expected results with negative shift amount.  */
>> +VECT_VAR_DECL(expected_negative_shift,int,8,8) [] = { 0xf8, 0xf8, 0xf9, 0xf9,
>> +                                                     0xfa, 0xfa, 0xfb, 0xfb };
>> +VECT_VAR_DECL(expected_negative_shift,int,16,4) [] = { 0xfff8, 0xfff8,
>> +                                                      0xfff9, 0xfff9  };
>> +VECT_VAR_DECL(expected_negative_shift,int,32,2) [] = { 0xfffffffc, 0xfffffffc };
>> +VECT_VAR_DECL(expected_negative_shift,int,64,1) [] = { 0xffffffffffffffff };
>> +VECT_VAR_DECL(expected_negative_shift,uint,8,8) [] = { 0x78, 0x78, 0x79, 0x79,
>> +                                                      0x7a, 0x7a, 0x7b, 0x7b };
>> +VECT_VAR_DECL(expected_negative_shift,uint,16,4) [] = { 0x7ff8, 0x7ff8,
>> +                                                       0x7ff9, 0x7ff9 };
>> +VECT_VAR_DECL(expected_negative_shift,uint,32,2) [] = { 0x3ffffffc,
>> +                                                       0x3ffffffc };
>> +VECT_VAR_DECL(expected_negative_shift,uint,64,1) [] = { 0xfffffffffffffff };
>> +VECT_VAR_DECL(expected_negative_shift,poly,8,8) [] = { 0x33, 0x33, 0x33, 0x33,
>> +                                                      0x33, 0x33, 0x33, 0x33 };
>> +VECT_VAR_DECL(expected_negative_shift,poly,16,4) [] = { 0x3333, 0x3333,
>> +                                                       0x3333, 0x3333 };
>> +VECT_VAR_DECL(expected_negative_shift,hfloat,32,2) [] = { 0x33333333,
>> +                                                         0x33333333 };
>> +VECT_VAR_DECL(expected_negative_shift,int,8,16) [] = { 0xfc, 0xfc, 0xfc, 0xfc,
>> +                                                      0xfd, 0xfd, 0xfd, 0xfd,
>> +                                                      0xfe, 0xfe, 0xfe, 0xfe,
>> +                                                      0xff, 0xff, 0xff, 0xff };
>> +VECT_VAR_DECL(expected_negative_shift,int,16,8) [] = { 0xffff, 0xffff,
>> +                                                      0xffff, 0xffff,
>> +                                                      0xffff, 0xffff,
>> +                                                      0xffff, 0xffff };
>> +VECT_VAR_DECL(expected_negative_shift,int,32,4) [] = {  0xfffffffe, 0xfffffffe,
>> +                                                       0xfffffffe, 0xfffffffe };
>> +VECT_VAR_DECL(expected_negative_shift,int,64,2) [] = { 0xffffffffffffffff,
>> +                                                      0xffffffffffffffff };
>> +VECT_VAR_DECL(expected_negative_shift,uint,8,16) [] = { 0x3c, 0x3c, 0x3c, 0x3c,
>> +                                                       0x3d, 0x3d, 0x3d, 0x3d,
>> +                                                       0x3e, 0x3e, 0x3e, 0x3e,
>> +                                                       0x3f, 0x3f, 0x3f, 0x3f };
>> +VECT_VAR_DECL(expected_negative_shift,uint,16,8) [] = { 0x7ff, 0x7ff,
>> +                                                       0x7ff, 0x7ff,
>> +                                                       0x7ff, 0x7ff,
>> +                                                       0x7ff, 0x7ff };
>> +VECT_VAR_DECL(expected_negative_shift,uint,32,4) [] = { 0x1ffffffe, 0x1ffffffe,
>> +                                                       0x1ffffffe, 0x1ffffffe };
>> +VECT_VAR_DECL(expected_negative_shift,uint,64,2) [] = { 0x7ffffffffffffff,
>> +                                                       0x7ffffffffffffff };
>> +VECT_VAR_DECL(expected_negative_shift,poly,8,16) [] = { 0x33, 0x33, 0x33, 0x33,
>> +                                                       0x33, 0x33, 0x33, 0x33,
>> +                                                       0x33, 0x33, 0x33, 0x33,
>> +                                                       0x33, 0x33, 0x33, 0x33 };
>> +VECT_VAR_DECL(expected_negative_shift,poly,16,8) [] = { 0x3333, 0x3333,
>> +                                                       0x3333, 0x3333,
>> +                                                       0x3333, 0x3333,
>> +                                                       0x3333, 0x3333 };
>> +VECT_VAR_DECL(expected_negative_shift,hfloat,32,4) [] = { 0x33333333,
>> +                                                         0x33333333,
>> +                                                         0x33333333,
>> +                                                         0x33333333 };
>> +
>> +
>> +#ifndef INSN_NAME
>> +#define INSN_NAME vshl
>> +#define TEST_MSG "VSHL/VSHLQ"
>> +#endif
>> +
>> +#define FNNAME1(NAME) exec_ ## NAME
>> +#define FNNAME(NAME) FNNAME1(NAME)
>> +
>> +void FNNAME (INSN_NAME) (void)
>> +{
>> +  /* Basic test: v3=vshl(v1,v2), then store the result.  */
>> +#define TEST_VSHL(T3, Q, T1, T2, W, N)                                 \
>> +  VECT_VAR(vector_res, T1, W, N) =                                     \
>> +    vshl##Q##_##T2##W(VECT_VAR(vector, T1, W, N),                      \
>> +                     VECT_VAR(vector_shift, T3, W, N));                \
>> +  vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(vector_res, T1, W, N))
>> +
>> +  DECL_VARIABLE_ALL_VARIANTS(vector);
>> +  DECL_VARIABLE_ALL_VARIANTS(vector_res);
>> +
>> +  DECL_VARIABLE_SIGNED_VARIANTS(vector_shift);
>> +
>> +  clean_results ();
>> +
>> +  /* Initialize input "vector" from "buffer".  */
>> +  TEST_MACRO_ALL_VARIANTS_2_5(VLOAD, vector, buffer);
>> +
>> +  /* Choose init value arbitrarily, will be used as shift amount.  */
>> +  VDUP(vector_shift, , int, s, 8, 8, 1);
>> +  VDUP(vector_shift, , int, s, 16, 4, 3);
>> +  VDUP(vector_shift, , int, s, 32, 2, 8);
>> +  VDUP(vector_shift, , int, s, 64, 1, 3);
>> +  VDUP(vector_shift, q, int, s, 8, 16, 5);
>> +  VDUP(vector_shift, q, int, s, 16, 8, 12);
>> +  VDUP(vector_shift, q, int, s, 32, 4, 30);
>> +  VDUP(vector_shift, q, int, s, 64, 2, 63);
>> +
>> +  /* Execute the tests.  */
>> +  TEST_MACRO_ALL_VARIANTS_1_5(TEST_VSHL, int);
>> +
>> +  CHECK_RESULTS (TEST_MSG, "");
>> +
>> +
>> +  /* Test large shift amount (larger or equal to the type width.  */
>> +  VDUP(vector_shift, , int, s, 8, 8, 8);
>> +  VDUP(vector_shift, , int, s, 16, 4, 16);
>> +  VDUP(vector_shift, , int, s, 32, 2, 32);
>> +  VDUP(vector_shift, , int, s, 64, 1, 64);
>> +  VDUP(vector_shift, q, int, s, 8, 16, 8);
>> +  VDUP(vector_shift, q, int, s, 16, 8, 17);
>> +  VDUP(vector_shift, q, int, s, 32, 4, 33);
>> +  VDUP(vector_shift, q, int, s, 64, 2, 65);
>> +
>> +  /* Execute the tests.  */
>> +  TEST_MACRO_ALL_VARIANTS_1_5(TEST_VSHL, int);
>> +
>> +  CHECK_RESULTS_NAMED (TEST_MSG, expected_large_shift, "(large shift amount)");
>> +
>> +
>> +  /* Test negative shift amount. */
>> +  VDUP(vector_shift, , int, s, 8, 8, -1);
>> +  VDUP(vector_shift, , int, s, 16, 4, -1);
>> +  VDUP(vector_shift, , int, s, 32, 2, -2);
>> +  VDUP(vector_shift, , int, s, 64, 1, -4);
>> +  VDUP(vector_shift, q, int, s, 8, 16, -2);
>> +  VDUP(vector_shift, q, int, s, 16, 8, -5);
>> +  VDUP(vector_shift, q, int, s, 32, 4, -3);
>> +  VDUP(vector_shift, q, int, s, 64, 2, -5);
>> +
>> +  /* Execute the tests.  */
>> +  TEST_MACRO_ALL_VARIANTS_1_5(TEST_VSHL, int);
>> +
>> +  CHECK_RESULTS_NAMED (TEST_MSG, expected_negative_shift, "(negative shift amount)");
>> +}
>> +
>> +int main (void)
>> +{
>> +  FNNAME (INSN_NAME) ();
>> +
>> +  return 0;
>> +}
>> diff --git a/gcc/testsuite/gcc.target/arm/README.neon-intrinsics b/gcc/testsuite/gcc.target/arm/README.neon-intrinsics
>> new file mode 100644
>> index 0000000..f246349
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/arm/README.neon-intrinsics
>> @@ -0,0 +1 @@
>> +Advanced SIMD intrinsics tests are located in gcc.target/aarch64.
>> --
>> 1.8.3.2
>>
Marcus Shawcroft July 10, 2014, 10:12 a.m. UTC | #3
On 1 July 2014 11:05, Christophe Lyon <christophe.lyon@linaro.org> wrote:
> * documentation (README)
> * dejanu driver (neon-intrinsics.exp)
> * support macros (arm-neon-ref.h, compute-ref-data.h)
> * Tests for 3 intrinsics: vaba, vld1, vshl

Hi, The terminology in armv8 is advsimd rather than neon.  Can we
rename neon-intrinsics to advsimd-intrinsics or simd-intrinsics
throughout please.  The existing gcc.target/aarch64/simd directory of
tests will presumably be superseded by this more comprehensive set of
tests so I suggest these tests go in gcc.target/aarch64/advsimd and we
eventually remove gcc.target/aarch64/simd/ directory.

GNU style should apply throughout this patch series, notably double
space after period in comments and README text.  Space before left
parenthesis in function/macro call and function declaration.  The
function name in a declaration goes on a new line.  The GCC wiki notes
on test case state individual test should have file names ending in
_<number>, see here https://gcc.gnu.org/wiki/TestCaseWriting

I'm OK with the execute only no scan nature of the tests.

> diff --git a/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/README b/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/README
> new file mode 100644
> index 0000000..232bb1d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/README
> @@ -0,0 +1,132 @@
> +This directory contains executable tests for ARM/AArch64 Neon
> +intrinsics.

Neon -> Advanced SIMD as below.

> +
> +It is meant to cover execution cases of all the Advanced SIMD
> +intrinsics, but does not scan the generated assembler code.

> +#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
> +
> +typedef union {
> +  struct {

GNUstyle { on new lne.

> +#define Neon_Cumulative_Sat  __read_neon_cumulative_sat()
> +#define Set_Neon_Cumulative_Sat(x)  __set_neon_cumulative_sat((x))

Upper case the macro's rather than camel case.

> +# Copyright (C) 2013 Free Software Foundation, Inc.

s/13/14/

Cheers
/Marcus
Richard Earnshaw July 11, 2014, 10:41 a.m. UTC | #4
On 10/07/14 11:12, Marcus Shawcroft wrote:
> On 1 July 2014 11:05, Christophe Lyon <christophe.lyon@linaro.org> wrote:
>> * documentation (README)
>> * dejanu driver (neon-intrinsics.exp)
>> * support macros (arm-neon-ref.h, compute-ref-data.h)
>> * Tests for 3 intrinsics: vaba, vld1, vshl
> 
> Hi, The terminology in armv8 is advsimd rather than neon.  Can we
> rename neon-intrinsics to advsimd-intrinsics or simd-intrinsics
> throughout please.  The existing gcc.target/aarch64/simd directory of
> tests will presumably be superseded by this more comprehensive set of
> tests so I suggest these tests go in gcc.target/aarch64/advsimd and we
> eventually remove gcc.target/aarch64/simd/ directory.
> 
> GNU style should apply throughout this patch series, notably double
> space after period in comments and README text.  Space before left
> parenthesis in function/macro call and function declaration.  The
> function name in a declaration goes on a new line.  The GCC wiki notes
> on test case state individual test should have file names ending in
> _<number>, see here https://gcc.gnu.org/wiki/TestCaseWriting
> 
> I'm OK with the execute only no scan nature of the tests.
> 
>> diff --git a/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/README b/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/README
>> new file mode 100644
>> index 0000000..232bb1d
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/README
>> @@ -0,0 +1,132 @@
>> +This directory contains executable tests for ARM/AArch64 Neon
>> +intrinsics.
> 
> Neon -> Advanced SIMD as below.

On first use, I think "Advanced SIMD (Neon)" is even better.

R.

> 
>> +
>> +It is meant to cover execution cases of all the Advanced SIMD
>> +intrinsics, but does not scan the generated assembler code.
> 
>> +#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
>> +
>> +typedef union {
>> +  struct {
> 
> GNUstyle { on new lne.
> 
>> +#define Neon_Cumulative_Sat  __read_neon_cumulative_sat()
>> +#define Set_Neon_Cumulative_Sat(x)  __set_neon_cumulative_sat((x))
> 
> Upper case the macro's rather than camel case.
> 
>> +# Copyright (C) 2013 Free Software Foundation, Inc.
> 
> s/13/14/
> 
> Cheers
> /Marcus
>
Christophe Lyon Sept. 30, 2014, 2:27 p.m. UTC | #5
On 10 July 2014 12:12, Marcus Shawcroft <marcus.shawcroft@gmail.com> wrote:
> On 1 July 2014 11:05, Christophe Lyon <christophe.lyon@linaro.org> wrote:
>> * documentation (README)
>> * dejanu driver (neon-intrinsics.exp)
>> * support macros (arm-neon-ref.h, compute-ref-data.h)
>> * Tests for 3 intrinsics: vaba, vld1, vshl
>
> Hi, The terminology in armv8 is advsimd rather than neon.  Can we
> rename neon-intrinsics to advsimd-intrinsics or simd-intrinsics
> throughout please.  The existing gcc.target/aarch64/simd directory of
> tests will presumably be superseded by this more comprehensive set of
> tests so I suggest these tests go in gcc.target/aarch64/advsimd and we
> eventually remove gcc.target/aarch64/simd/ directory.
>
> GNU style should apply throughout this patch series, notably double
> space after period in comments and README text.  Space before left
> parenthesis in function/macro call and function declaration.  The
> function name in a declaration goes on a new line.  The GCC wiki notes
> on test case state individual test should have file names ending in
> _<number>, see here https://gcc.gnu.org/wiki/TestCaseWriting
>

Hi,

For the record, these tests are based on a testsuite I wrote quite
some time ago:
https://gitorious.org/arm-neon-tests/

where obviously I had no such requirement (and v8 wasn't public yet)

So I prefer to apply the changes you request in my main version before
re-submitting it here.
(libsanitizer-style, sort-of....).

This will take me some time, so the next version of my patch series
should not be expected really soon :-(

Christophe.


> I'm OK with the execute only no scan nature of the tests.
>
>> diff --git a/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/README b/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/README
>> new file mode 100644
>> index 0000000..232bb1d
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/README
>> @@ -0,0 +1,132 @@
>> +This directory contains executable tests for ARM/AArch64 Neon
>> +intrinsics.
>
> Neon -> Advanced SIMD as below.
>
>> +
>> +It is meant to cover execution cases of all the Advanced SIMD
>> +intrinsics, but does not scan the generated assembler code.
>
>> +#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
>> +
>> +typedef union {
>> +  struct {
>
> GNUstyle { on new lne.
>
>> +#define Neon_Cumulative_Sat  __read_neon_cumulative_sat()
>> +#define Set_Neon_Cumulative_Sat(x)  __set_neon_cumulative_sat((x))
>
> Upper case the macro's rather than camel case.
>
>> +# Copyright (C) 2013 Free Software Foundation, Inc.
>
> s/13/14/
>
> Cheers
> /Marcus
Marcus Shawcroft Oct. 1, 2014, 3:11 p.m. UTC | #6
On 30 September 2014 15:27, Christophe Lyon <christophe.lyon@linaro.org> wrote:
> On 10 July 2014 12:12, Marcus Shawcroft <marcus.shawcroft@gmail.com> wrote:
>> On 1 July 2014 11:05, Christophe Lyon <christophe.lyon@linaro.org> wrote:
>>> * documentation (README)
>>> * dejanu driver (neon-intrinsics.exp)
>>> * support macros (arm-neon-ref.h, compute-ref-data.h)
>>> * Tests for 3 intrinsics: vaba, vld1, vshl
>>
>> Hi, The terminology in armv8 is advsimd rather than neon.  Can we
>> rename neon-intrinsics to advsimd-intrinsics or simd-intrinsics
>> throughout please.  The existing gcc.target/aarch64/simd directory of
>> tests will presumably be superseded by this more comprehensive set of
>> tests so I suggest these tests go in gcc.target/aarch64/advsimd and we
>> eventually remove gcc.target/aarch64/simd/ directory.
>>
>> GNU style should apply throughout this patch series, notably double
>> space after period in comments and README text.  Space before left
>> parenthesis in function/macro call and function declaration.  The
>> function name in a declaration goes on a new line.  The GCC wiki notes
>> on test case state individual test should have file names ending in
>> _<number>, see here https://gcc.gnu.org/wiki/TestCaseWriting
>>
>
> Hi,
>
> For the record, these tests are based on a testsuite I wrote quite
> some time ago:
> https://gitorious.org/arm-neon-tests/
>
> where obviously I had no such requirement (and v8 wasn't public yet)
>
> So I prefer to apply the changes you request in my main version before
> re-submitting it here.
> (libsanitizer-style, sort-of....).
>
> This will take me some time, so the next version of my patch series
> should not be expected really soon :-(


Hi Christophe,   Given that this test suite code is an existing body
of work I see no reason to impose the GNU style change I originally
asked for. I withdraw my original comment that these patches should
conform to GNU style.  My comment on file names is also withdrawn.  I
would like to see the terminology corrected.

Thanks
/Marcus
Christophe Lyon Oct. 7, 2014, 1:33 p.m. UTC | #7
On 1 October 2014 17:11, Marcus Shawcroft <marcus.shawcroft@gmail.com> wrote:
> On 30 September 2014 15:27, Christophe Lyon <christophe.lyon@linaro.org> wrote:
>> On 10 July 2014 12:12, Marcus Shawcroft <marcus.shawcroft@gmail.com> wrote:
>>> On 1 July 2014 11:05, Christophe Lyon <christophe.lyon@linaro.org> wrote:
>>>> * documentation (README)
>>>> * dejanu driver (neon-intrinsics.exp)
>>>> * support macros (arm-neon-ref.h, compute-ref-data.h)
>>>> * Tests for 3 intrinsics: vaba, vld1, vshl
>>>
>>> Hi, The terminology in armv8 is advsimd rather than neon.  Can we
>>> rename neon-intrinsics to advsimd-intrinsics or simd-intrinsics
>>> throughout please.  The existing gcc.target/aarch64/simd directory of
>>> tests will presumably be superseded by this more comprehensive set of
>>> tests so I suggest these tests go in gcc.target/aarch64/advsimd and we
>>> eventually remove gcc.target/aarch64/simd/ directory.
>>>
>>> GNU style should apply throughout this patch series, notably double
>>> space after period in comments and README text.  Space before left
>>> parenthesis in function/macro call and function declaration.  The
>>> function name in a declaration goes on a new line.  The GCC wiki notes
>>> on test case state individual test should have file names ending in
>>> _<number>, see here https://gcc.gnu.org/wiki/TestCaseWriting
>>>
>>
>> Hi,
>>
>> For the record, these tests are based on a testsuite I wrote quite
>> some time ago:
>> https://gitorious.org/arm-neon-tests/
>>
>> where obviously I had no such requirement (and v8 wasn't public yet)
>>
>> So I prefer to apply the changes you request in my main version before
>> re-submitting it here.
>> (libsanitizer-style, sort-of....).
>>
>> This will take me some time, so the next version of my patch series
>> should not be expected really soon :-(
>
>
Ramana, Marcus,

> Hi Christophe,   Given that this test suite code is an existing body
> of work I see no reason to impose the GNU style change I originally
> asked for. I withdraw my original comment that these patches should
> conform to GNU style.  My comment on file names is also withdrawn.  I
> would like to see the terminology corrected.
>

Thanks, I have updated my patch according to this.

But meanwhile I have also updated my testsuite, and fixed the #define
flag I used to toggle float16 tests: I now use __ARM_FP16_FORMAT_IEEE,
such as:
#if defined(__ARM_FP16_FORMAT_IEEE)
  TEST_VLD1(vector, buffer, , float, f, 16, 4);
  TEST_VLD1(vector, buffer, q, float, f, 16, 8);
#endif

Which reminded me that:
- on ARM (AArch32), float16x4_t is supported, but float16x8_t isn't yet
- on AArch64, -mfp16-format=ieee is rejected, and I didn't see a
similar option in the doc

What do you prefer me to do for these tests? I can think of:
- do not include them at all until fp16 is fully supported on both
AArch32 and AArch64
- include only those with float16x4_t
- include both float16x4_t and float16x8_t tests, leaving float16x8_t commented
- include both, uncommented, but do not test with -mfp16-format=ieee

Thanks,

Christophe.


> Thanks
> /Marcus
Ramana Radhakrishnan Oct. 8, 2014, 12:16 p.m. UTC | #8
> What do you prefer me to do for these tests? I can think of:
> - do not include them at all until fp16 is fully supported on both
> AArch32 and AArch64
> - include only those with float16x4_t
> - include both float16x4_t and float16x8_t tests, leaving float16x8_t commented

I would include them both and XFAIL the failing tests. So when the 
support arrives, it's just a question of adjusting the tests.


regards
Ramana


>
> Thanks,
>
> Christophe.
>
>
>> Thanks
>> /Marcus
>
diff mbox

Patch

diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index c2ed273..3a0f99b 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,14 @@ 
+2014-06-30  Christophe Lyon  <christophe.lyon@linaro.org>
+
+	* gcc.target/arm/README.neon-intrinsics: New file.
+	* gcc.target/aarch64/neon-intrinsics/README: Likewise.
+	* gcc.target/aarch64/neon-intrinsics/arm-neon-ref.h: Likewise.
+	* gcc.target/aarch64/neon-intrinsics/compute-ref-data.h: Likewise.
+	* gcc.target/aarch64/neon-intrinsics/neon-intrinsics.exp: Likewise.
+	* gcc.target/aarch64/neon-intrinsics/vaba.c: Likewise.
+	* gcc.target/aarch64/neon-intrinsics/vld1.c: Likewise.
+	* gcc.target/aarch64/neon-intrinsics/vshl.c: Likewise.
+
 2014-06-30  Igor Zamyatin  <igor.zamyatin@intel.com>
 
 	PR middle-end/57541
diff --git a/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/README b/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/README
new file mode 100644
index 0000000..232bb1d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/README
@@ -0,0 +1,132 @@ 
+This directory contains executable tests for ARM/AArch64 Neon
+intrinsics.
+
+It is meant to cover execution cases of all the Advanced SIMD
+intrinsics, but does not scan the generated assembler code.
+
+The general framework is composed as follows:
+- neon-intrinsics.exp: main dejagnu driver
+- *.c: actual tests, generally one per intrinsinc family
+- arm-neon-ref.h: contains macro definitions to save typing in actual
+  test files
+- compute-ref-data.h: contains input vectors definitions
+- *.inc: generic tests, shared by several families of intrinsics. For
+   instance, unary or binary operators
+
+A typical .c test file starts with the following contents (look at
+vld1.c and vaba.c for sample cases):
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+Then, definitions of expected results, based on common input values,
+as defined in compute-ref-data.h.
+For example:
+VECT_VAR_DECL(expected,int,16,4) [] = { 0x16, 0x17, 0x18, 0x19 };
+defines the expected results of an operator generating int16x4 values.
+
+The common input values defined in compute-ref-data.h have been chosen
+to avoid corner-case values for most operators, yet exposing negative
+values for signed operators. For this reason, their range is also
+limited. For instance, the initialization of buffer_int16x4 will be
+{ -16, -15, -14, -13 }.
+
+The initialization of floating-point values is done via hex notation,
+to avoid potential rounding problems.
+
+To test special values and corner cases, specific initialization
+values should be used in dedicated tests, to ensure proper coverage.
+An example of this is vshl.
+
+When a variant of an intrinsic is not available, its expected result
+should be defined to the value of CLEAN_PATTERN_8 as defined in
+arm-neon-ref.h. For example:
+VECT_VAR_DECL(expected,int,64,1) [] = { 0x3333333333333333 };
+if the given intrinsic has no variant producing an int64x1 result,
+like the vcmp family (eg. vclt).
+
+This is because the helper function (check_results(), defined in
+arm-neon-ref.h), iterates over all the possible variants, to save
+typing in each individual test file. Alternatively, one can directly
+call the CHECK/CHECK_FP macros to check only a few expected results
+(see vabs.c for an example).
+
+Then, define the TEST_MSG string, which will be used when reporting errors.
+
+Next, define the function performing the actual tests, in general
+relying on the helpers provided by arm-neon-ref.h, which means:
+
+* declare necessary vectors of suitable types: using
+  DECL_VARIABLE_ALL_VARIANTS when all variants are supported, or the
+  relevant of subset calls to DECL_VARIABLE.
+
+* call clean_results() to initialize the 'results' buffers.
+
+* initialize the input vectors, using VLOAD, VDUP or VSET_LANE (vld*
+  tests do not need this step, since their actual purpose is to
+  initialize vectors).
+
+* execute the intrinsic on relevant variants, for instance using
+  TEST_MACRO_ALL_VARIANTS_2_5.
+
+* call check_results() to check that the results match the expected
+  values.
+
+A template test file could be:
+=================================================================
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+/* Expected results.  */
+VECT_VAR_DECL(expected,int,8,8) [] = { 0xf6, 0xf7, 0xf8, 0xf9,
+				       0xfa, 0xfb, 0xfc, 0xfd };
+/* and as many others as necessary.  */
+
+#define TEST_MSG "VMYINTRINSIC"
+void exec_myintrinsic (void)
+{
+  /* my test: v4=vmyintrinsic(v1,v2,v3), then store the result.  */
+#define TEST_VMYINTR(Q, T1, T2, W, N)					\
+  VECT_VAR(vector_res, T1, W, N) =					\
+    vmyintr##Q##_##T2##W(VECT_VAR(vector1, T1, W, N),			\
+		         VECT_VAR(vector2, T1, W, N),			\
+		         VECT_VAR(vector3, T1, W, N));			\
+  vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(vector_res, T1, W, N))
+
+#define DECL_VMYINTR_VAR(VAR)			\
+  DECL_VARIABLE(VAR, int, 8, 8);
+/* And as many others as necessary.  */
+
+  DECL_VMYINTR_VAR(vector1);
+  DECL_VMYINTR_VAR(vector2);
+  DECL_VMYINTR_VAR(vector3);
+  DECL_VMYINTR_VAR(vector_res);
+
+  clean_results ();
+
+  /* Initialize input "vector1" from "buffer".  */
+  VLOAD(vector1, buffer, , int, s, 8, 8);
+/* And as many others as necessary.  */
+
+  /* Choose init value arbitrarily.  */
+  VDUP(vector2, , int, s, 8, 8, 1);
+/* And as many others as necessary.  */
+
+  /* Choose init value arbitrarily.  */
+  VDUP(vector3, , int, s, 8, 8, -5);
+/* And as many others as necessary.  */
+
+  /* Execute the tests.  */
+  TEST_VMYINTR(, int, s, 8, 8);
+/* And as many others as necessary.  */
+
+  check_results (TEST_MSG, "");
+}
+
+int main (void)
+{
+  exec_vmyintrinsic ();
+  return 0;
+}
+=================================================================
diff --git a/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/arm-neon-ref.h b/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/arm-neon-ref.h
new file mode 100644
index 0000000..8ddcbf2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/arm-neon-ref.h
@@ -0,0 +1,547 @@ 
+/* This file defines helper operations shared by all the tests.  */
+
+#ifndef _ARM_NEON_REF_H_
+#define _ARM_NEON_REF_H_
+
+#include <stdio.h>
+#include <inttypes.h>
+
+/* helper type, to help write floating point results in integer form.  */
+typedef uint32_t hfloat32_t;
+
+extern void abort(void);
+extern void *memset(void *, int, size_t);
+extern void *memcpy(void *, const void *, size_t);
+extern size_t strlen(const char *);
+
+/* Various string construction helpers.  */
+
+/*
+  The most useful at user-level are VECT_VAR and VECT_VAR_DECL, which
+   construct variable names or declarations, such as:
+   VECT_VAR(expected, int, 16, 4) -> expected_int16x4
+   VECT_VAR_DECL(expected, int, 16, 4) -> int16x4_t expected_int16x4
+*/
+
+#define xSTR(X) #X
+#define STR(X) xSTR(X)
+
+#define xNAME1(V,T) V ## _ ##  T
+#define xNAME(V,T) xNAME1(V,T)
+
+/* VAR(foo,int,16) -> foo_int16 */
+#define VAR(V,T,W) xNAME(V,T##W)
+/* VAR_DECL(foo,int,16) -> int16_t foo_int16 */
+#define VAR_DECL(V, T, W) T##W##_t VAR(V,T,W)
+
+/* VECT_NAME(int,16,4) ->  int16x4 */
+#define VECT_NAME(T, W, N) T##W##x##N
+/* VECT_ARRAY_NAME(int,16,4,2) -> int16x4x2 */
+#define VECT_ARRAY_NAME(T, W, N, L) T##W##x##N##x##L
+/* VECT_TYPE(int,16,4) -> int16x4_t */
+#define VECT_TYPE(T, W, N) xNAME(VECT_NAME(T,W,N),t)
+/* VECT_ARRAY_TYPE(int,16,4,2) -> int16x4x2_t */
+#define VECT_ARRAY_TYPE(T, W, N, L) xNAME(VECT_ARRAY_NAME(T,W,N,L),t)
+
+/* VECT_VAR(foo,int,16,4) -> foo_int16x4 */
+#define VECT_VAR(V,T,W,N) xNAME(V,VECT_NAME(T,W,N))
+/* VECT_VAR_DECL(foo,int,16,4) -> int16_t foo_int16x4 */
+#define VECT_VAR_DECL(V, T, W, N) T##W##_t VECT_VAR(V,T,W,N)
+
+/* Array declarations.  */
+/* ARRAY(foo,int,16,4) -> int16_t foo_int16x4[4] */
+#define ARRAY(V, T, W, N) VECT_VAR_DECL(V,T,W,N)[N]
+
+/* Arrays of vectors.  */
+/* VECT_ARRAY_VAR(foo,int,16,4,2) -> foo_int16x4x2 */
+#define VECT_ARRAY_VAR(V,T,W,N,L) xNAME(V,VECT_ARRAY_NAME(T,W,N,L))
+/* VECT_ARRAY(foo,int,16,4,2) -> int16_t foo_int16x4x2[4*2] */
+#define VECT_ARRAY(V, T, W, N, L) T##W##_t VECT_ARRAY_VAR(V,T,W,N,L)[N*L]
+
+/* Check results vs expected values. Operates on one vector.  */
+#define CHECK(MSG,T,W,N,FMT,EXPECTED,COMMENT)				\
+  {									\
+    int i;								\
+    for(i=0; i<N ; i++)							\
+      {									\
+	if (VECT_VAR(result, T, W, N)[i] !=				\
+	    VECT_VAR(EXPECTED, T, W, N)[i]) {				\
+	  fprintf(stderr,						\
+		  "ERROR in %s (%s line %d in buffer '%s') at type %s "	\
+		  "index %d: got 0x%" FMT " != 0x%" FMT " %s\n",	\
+		  MSG, __FILE__, __LINE__,				\
+		  STR(EXPECTED),					\
+		  STR(VECT_NAME(T, W, N)),				\
+		  i,							\
+		  VECT_VAR(result, T, W, N)[i],				\
+		  VECT_VAR(EXPECTED, T, W, N)[i],			\
+		  strlen(COMMENT) > 0 ? COMMENT : "");			\
+	  abort();							\
+	}								\
+      }									\
+  }
+
+/* Floating-point variant.  */
+#define CHECK_FP(MSG,T,W,N,FMT,EXPECTED,COMMENT)			\
+  {									\
+    int i;								\
+    for(i=0; i<N ; i++)							\
+      {									\
+	union fp_operand {						\
+	  uint##W##_t i;						\
+	  float##W##_t f;						\
+	} tmp_res, tmp_exp;						\
+	tmp_res.f = VECT_VAR(result, T, W, N)[i];			\
+	tmp_exp.i = VECT_VAR(EXPECTED, h##T, W, N)[i];			\
+	if (tmp_res.i != tmp_exp.i) {					\
+	  fprintf(stderr,						\
+		  "ERROR in %s (%s line %d in buffer '%s') at type %s "	\
+		  "index %d: got 0x%" FMT " != 0x%" FMT " %s\n",	\
+		  MSG, __FILE__, __LINE__,				\
+		  STR(EXPECTED),					\
+		  STR(VECT_NAME(T, W, N)),				\
+		  i,							\
+		  tmp_res.i,						\
+		  tmp_exp.i,						\
+		  strlen(COMMENT) > 0 ? COMMENT : "");			\
+	  abort();							\
+	}								\
+      }									\
+  }
+
+/* Clean buffer with a non-zero pattern to help diagnose buffer
+   overflows.  */
+#define CLEAN_PATTERN_8  0x33
+
+#define CLEAN(VAR,T,W,N)						\
+  memset(VECT_VAR(VAR, T, W, N),					\
+	 CLEAN_PATTERN_8,						\
+	 sizeof(VECT_VAR(VAR, T, W, N)));
+
+/* Define output buffers, one of each size.  */
+static ARRAY(result, int, 8, 8);
+static ARRAY(result, int, 16, 4);
+static ARRAY(result, int, 32, 2);
+static ARRAY(result, int, 64, 1);
+static ARRAY(result, uint, 8, 8);
+static ARRAY(result, uint, 16, 4);
+static ARRAY(result, uint, 32, 2);
+static ARRAY(result, uint, 64, 1);
+static ARRAY(result, poly, 8, 8);
+static ARRAY(result, poly, 16, 4);
+static ARRAY(result, float, 32, 2);
+static ARRAY(result, int, 8, 16);
+static ARRAY(result, int, 16, 8);
+static ARRAY(result, int, 32, 4);
+static ARRAY(result, int, 64, 2);
+static ARRAY(result, uint, 8, 16);
+static ARRAY(result, uint, 16, 8);
+static ARRAY(result, uint, 32, 4);
+static ARRAY(result, uint, 64, 2);
+static ARRAY(result, poly, 8, 16);
+static ARRAY(result, poly, 16, 8);
+static ARRAY(result, float, 32, 4);
+
+/* Declare expected results, one of each size. They are defined and
+   initialized in each test file.  */
+extern ARRAY(expected, int, 8, 8);
+extern ARRAY(expected, int, 16, 4);
+extern ARRAY(expected, int, 32, 2);
+extern ARRAY(expected, int, 64, 1);
+extern ARRAY(expected, uint, 8, 8);
+extern ARRAY(expected, uint, 16, 4);
+extern ARRAY(expected, uint, 32, 2);
+extern ARRAY(expected, uint, 64, 1);
+extern ARRAY(expected, poly, 8, 8);
+extern ARRAY(expected, poly, 16, 4);
+extern ARRAY(expected, hfloat, 32, 2);
+extern ARRAY(expected, int, 8, 16);
+extern ARRAY(expected, int, 16, 8);
+extern ARRAY(expected, int, 32, 4);
+extern ARRAY(expected, int, 64, 2);
+extern ARRAY(expected, uint, 8, 16);
+extern ARRAY(expected, uint, 16, 8);
+extern ARRAY(expected, uint, 32, 4);
+extern ARRAY(expected, uint, 64, 2);
+extern ARRAY(expected, poly, 8, 16);
+extern ARRAY(expected, poly, 16, 8);
+extern ARRAY(expected, hfloat, 32, 4);
+
+/* Check results. Operates on all possible vector types.  */
+#define CHECK_RESULTS(test_name,comment)				\
+  {									\
+    CHECK(test_name, int, 8, 8, PRIx8, expected, comment);		\
+    CHECK(test_name, int, 16, 4, PRIx16, expected, comment);		\
+    CHECK(test_name, int, 32, 2, PRIx32, expected, comment);		\
+    CHECK(test_name, int, 64, 1, PRIx64, expected, comment);		\
+    CHECK(test_name, uint, 8, 8, PRIx8, expected, comment);		\
+    CHECK(test_name, uint, 16, 4, PRIx16, expected, comment);		\
+    CHECK(test_name, uint, 32, 2, PRIx32, expected, comment);		\
+    CHECK(test_name, uint, 64, 1, PRIx64, expected, comment);		\
+    CHECK(test_name, poly, 8, 8, PRIx8, expected, comment);		\
+    CHECK(test_name, poly, 16, 4, PRIx16, expected, comment);		\
+    CHECK_FP(test_name, float, 32, 2, PRIx32, expected, comment);	\
+									\
+    CHECK(test_name, int, 8, 16, PRIx8, expected, comment);		\
+    CHECK(test_name, int, 16, 8, PRIx16, expected, comment);		\
+    CHECK(test_name, int, 32, 4, PRIx32, expected, comment);		\
+    CHECK(test_name, int, 64, 2, PRIx64, expected, comment);		\
+    CHECK(test_name, uint, 8, 16, PRIx8, expected, comment);		\
+    CHECK(test_name, uint, 16, 8, PRIx16, expected, comment);		\
+    CHECK(test_name, uint, 32, 4, PRIx32, expected, comment);		\
+    CHECK(test_name, uint, 64, 2, PRIx64, expected, comment);		\
+    CHECK(test_name, poly, 8, 16, PRIx8, expected, comment);		\
+    CHECK(test_name, poly, 16, 8, PRIx16, expected, comment);		\
+    CHECK_FP(test_name, float, 32, 4, PRIx32, expected, comment);	\
+  }									\
+
+#define CHECK_RESULTS_NAMED(test_name,EXPECTED,comment)			\
+  {									\
+    CHECK(test_name, int, 8, 8, PRIx8, EXPECTED, comment);		\
+    CHECK(test_name, int, 16, 4, PRIx16, EXPECTED, comment);		\
+    CHECK(test_name, int, 32, 2, PRIx32, EXPECTED, comment);		\
+    CHECK(test_name, int, 64, 1, PRIx64, EXPECTED, comment);		\
+    CHECK(test_name, uint, 8, 8, PRIx8, EXPECTED, comment);		\
+    CHECK(test_name, uint, 16, 4, PRIx16, EXPECTED, comment);		\
+    CHECK(test_name, uint, 32, 2, PRIx32, EXPECTED, comment);		\
+    CHECK(test_name, uint, 64, 1, PRIx64, EXPECTED, comment);		\
+    CHECK(test_name, poly, 8, 8, PRIx8, EXPECTED, comment);		\
+    CHECK(test_name, poly, 16, 4, PRIx16, EXPECTED, comment);		\
+    CHECK_FP(test_name, float, 32, 2, PRIx32, EXPECTED, comment);	\
+									\
+    CHECK(test_name, int, 8, 16, PRIx8, EXPECTED, comment);		\
+    CHECK(test_name, int, 16, 8, PRIx16, EXPECTED, comment);		\
+    CHECK(test_name, int, 32, 4, PRIx32, EXPECTED, comment);		\
+    CHECK(test_name, int, 64, 2, PRIx64, EXPECTED, comment);		\
+    CHECK(test_name, uint, 8, 16, PRIx8, EXPECTED, comment);		\
+    CHECK(test_name, uint, 16, 8, PRIx16, EXPECTED, comment);		\
+    CHECK(test_name, uint, 32, 4, PRIx32, EXPECTED, comment);		\
+    CHECK(test_name, uint, 64, 2, PRIx64, EXPECTED, comment);		\
+    CHECK(test_name, poly, 8, 16, PRIx8, EXPECTED, comment);		\
+    CHECK(test_name, poly, 16, 8, PRIx16, EXPECTED, comment);		\
+    CHECK_FP(test_name, float, 32, 4, PRIx32, EXPECTED, comment);	\
+  }									\
+
+
+
+#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+
+typedef union {
+  struct {
+    int _xxx:27;
+    unsigned int QC:1;
+    int V:1;
+    int C:1;
+    int Z:1;
+    int N:1;
+  } b;
+  unsigned int word;
+} _ARM_FPSCR;
+
+#else /* __ORDER_BIG_ENDIAN__ */
+
+typedef union {
+  struct {
+    int N:1;
+    int Z:1;
+    int C:1;
+    int V:1;
+    unsigned int QC:1;
+    int _dnm:27;
+  } b;
+  unsigned int word;
+} _ARM_FPSCR;
+
+#endif /* __ORDER_BIG_ENDIAN__ */
+
+#define Neon_Cumulative_Sat  __read_neon_cumulative_sat()
+#define Set_Neon_Cumulative_Sat(x)  __set_neon_cumulative_sat((x))
+
+#if defined(__aarch64__)
+static volatile int __read_neon_cumulative_sat (void) {
+    _ARM_FPSCR _afpscr_for_qc;
+    asm volatile ("mrs %0,fpsr" : "=r" (_afpscr_for_qc));
+    return _afpscr_for_qc.b.QC;
+}
+static void __set_neon_cumulative_sat (int x) {
+    _ARM_FPSCR _afpscr_for_qc;
+    asm volatile ("mrs %0,fpsr" : "=r" (_afpscr_for_qc));
+    _afpscr_for_qc.b.QC = x;
+    asm volatile ("msr fpsr,%0" : : "r" (_afpscr_for_qc));
+    return;
+}
+#else
+static volatile int __read_neon_cumulative_sat (void) {
+    _ARM_FPSCR _afpscr_for_qc;
+    asm volatile ("vmrs %0,fpscr" : "=r" (_afpscr_for_qc));
+    return _afpscr_for_qc.b.QC;
+}
+
+static void __set_neon_cumulative_sat (int x) {
+    _ARM_FPSCR _afpscr_for_qc;
+    asm volatile ("vmrs %0,fpscr" : "=r" (_afpscr_for_qc));
+    _afpscr_for_qc.b.QC = x;
+    asm volatile ("vmsr fpscr,%0" : : "r" (_afpscr_for_qc));
+    return;
+}
+#endif
+
+/* Declare expected cumulative saturation results, one for each
+   size. They are defined and initialized in relevant test files.  */
+extern int VECT_VAR(expected_cumulative_sat, int, 8, 8);
+extern int VECT_VAR(expected_cumulative_sat, int, 16, 4);
+extern int VECT_VAR(expected_cumulative_sat, int, 32, 2);
+extern int VECT_VAR(expected_cumulative_sat, int, 64, 1);
+extern int VECT_VAR(expected_cumulative_sat, uint, 8, 8);
+extern int VECT_VAR(expected_cumulative_sat, uint, 16, 4);
+extern int VECT_VAR(expected_cumulative_sat, uint, 32, 2);
+extern int VECT_VAR(expected_cumulative_sat, uint, 64, 1);
+extern int VECT_VAR(expected_cumulative_sat, int, 8, 16);
+extern int VECT_VAR(expected_cumulative_sat, int, 16, 8);
+extern int VECT_VAR(expected_cumulative_sat, int, 32, 4);
+extern int VECT_VAR(expected_cumulative_sat, int, 64, 2);
+extern int VECT_VAR(expected_cumulative_sat, uint, 8, 16);
+extern int VECT_VAR(expected_cumulative_sat, uint, 16, 8);
+extern int VECT_VAR(expected_cumulative_sat, uint, 32, 4);
+extern int VECT_VAR(expected_cumulative_sat, uint, 64, 2);
+
+/* Check cumulative saturation flag vs expected value.  */
+#define CHECK_CUMULATIVE_SAT(MSG,T,W,N,EXPECTED,COMMENT)		\
+  {									\
+    if (Neon_Cumulative_Sat !=						\
+	VECT_VAR(EXPECTED, T, W, N)) {					\
+      fprintf(stderr,							\
+	      "ERROR in %s (%s line %d in cumulative_sat '%s') at type %s: " \
+	      "got %d expected %d%s\n",					\
+	      MSG, __FILE__, __LINE__,					\
+	      STR(EXPECTED),						\
+	      STR(VECT_NAME(T, W, N)),					\
+	      Neon_Cumulative_Sat,					\
+	      VECT_VAR(EXPECTED, T, W, N),				\
+	      strlen(COMMENT) > 0 ? " " COMMENT : "");			\
+      abort();								\
+    }									\
+  }
+
+#define CHECK_CUMULATIVE_SAT_NAMED(test_name,EXPECTED,comment)		\
+  {									\
+    CHECK_CUMULATIVE_SAT(test_name, int, 8, 8, PRIx8, EXPECTED, comment); \
+    CHECK_CUMULATIVE_SAT(test_name, int, 16, 4, PRIx16, EXPECTED, comment); \
+    CHECK_CUMULATIVE_SAT(test_name, int, 32, 2, PRIx32, EXPECTED, comment); \
+    CHECK_CUMULATIVE_SAT(test_name, int, 64, 1, PRIx64, EXPECTED, comment); \
+    CHECK_CUMULATIVE_SAT(test_name, uint, 8, 8, PRIx8, EXPECTED, comment); \
+    CHECK_CUMULATIVE_SAT(test_name, uint, 16, 4, PRIx16, EXPECTED, comment); \
+    CHECK_CUMULATIVE_SAT(test_name, uint, 32, 2, PRIx32, EXPECTED, comment); \
+    CHECK_CUMULATIVE_SAT(test_name, uint, 64, 1, PRIx64, EXPECTED, comment); \
+    CHECK_CUMULATIVE_SAT(test_name, poly, 8, 8, PRIx8, EXPECTED, comment); \
+    CHECK_CUMULATIVE_SAT(test_name, poly, 16, 4, PRIx16, EXPECTED, comment); \
+    CHECK_CUMULATIVE_SAT_FP(test_name, float, 32, 2, PRIx32, EXPECTED, comment); \
+    									\
+    CHECK_CUMULATIVE_SAT(test_name, int, 8, 16, PRIx8, EXPECTED, comment); \
+    CHECK_CUMULATIVE_SAT(test_name, int, 16, 8, PRIx16, EXPECTED, comment); \
+    CHECK_CUMULATIVE_SAT(test_name, int, 32, 4, PRIx32, EXPECTED, comment); \
+    CHECK_CUMULATIVE_SAT(test_name, int, 64, 2, PRIx64, EXPECTED, comment); \
+    CHECK_CUMULATIVE_SAT(test_name, uint, 8, 16, PRIx8, EXPECTED, comment); \
+    CHECK_CUMULATIVE_SAT(test_name, uint, 16, 8, PRIx16, EXPECTED, comment); \
+    CHECK_CUMULATIVE_SAT(test_name, uint, 32, 4, PRIx32, EXPECTED, comment); \
+    CHECK_CUMULATIVE_SAT(test_name, uint, 64, 2, PRIx64, EXPECTED, comment); \
+    CHECK_CUMULATIVE_SAT(test_name, poly, 8, 16, PRIx8, EXPECTED, comment); \
+    CHECK_CUMULATIVE_SAT(test_name, poly, 16, 8, PRIx16, EXPECTED, comment); \
+    CHECK_CUMULATIVE_SAT_FP(test_name, float, 32, 4, PRIx32, EXPECTED, comment); \
+  }									\
+
+
+/* Clean output buffers before execution.  */
+static void clean_results (void)
+{
+  CLEAN(result, int, 8, 8);
+  CLEAN(result, int, 16, 4);
+  CLEAN(result, int, 32, 2);
+  CLEAN(result, int, 64, 1);
+  CLEAN(result, uint, 8, 8);
+  CLEAN(result, uint, 16, 4);
+  CLEAN(result, uint, 32, 2);
+  CLEAN(result, uint, 64, 1);
+  CLEAN(result, poly, 8, 8);
+  CLEAN(result, poly, 16, 4);
+  CLEAN(result, float, 32, 2);
+
+  CLEAN(result, int, 8, 16);
+  CLEAN(result, int, 16, 8);
+  CLEAN(result, int, 32, 4);
+  CLEAN(result, int, 64, 2);
+  CLEAN(result, uint, 8, 16);
+  CLEAN(result, uint, 16, 8);
+  CLEAN(result, uint, 32, 4);
+  CLEAN(result, uint, 64, 2);
+  CLEAN(result, poly, 8, 16);
+  CLEAN(result, poly, 16, 8);
+  CLEAN(result, float, 32, 4);
+}
+
+
+/* Helpers to declare variables of various types.   */
+#define DECL_VARIABLE(VAR, T1, W, N)		\
+  VECT_TYPE(T1, W, N) VECT_VAR(VAR, T1, W, N)
+
+/* Declare only 64 bits signed variants.  */
+#define DECL_VARIABLE_64BITS_SIGNED_VARIANTS(VAR)	\
+  DECL_VARIABLE(VAR, int, 8, 8);			\
+  DECL_VARIABLE(VAR, int, 16, 4);			\
+  DECL_VARIABLE(VAR, int, 32, 2);			\
+  DECL_VARIABLE(VAR, int, 64, 1)
+
+/* Declare only 64 bits unsigned variants.  */
+#define DECL_VARIABLE_64BITS_UNSIGNED_VARIANTS(VAR)	\
+  DECL_VARIABLE(VAR, uint, 8, 8);			\
+  DECL_VARIABLE(VAR, uint, 16, 4);			\
+  DECL_VARIABLE(VAR, uint, 32, 2);			\
+  DECL_VARIABLE(VAR, uint, 64, 1)
+
+/* Declare only 128 bits signed variants.  */
+#define DECL_VARIABLE_128BITS_SIGNED_VARIANTS(VAR)	\
+  DECL_VARIABLE(VAR, int, 8, 16);			\
+  DECL_VARIABLE(VAR, int, 16, 8);			\
+  DECL_VARIABLE(VAR, int, 32, 4);			\
+  DECL_VARIABLE(VAR, int, 64, 2)
+
+/* Declare only 128 bits unsigned variants.  */
+#define DECL_VARIABLE_128BITS_UNSIGNED_VARIANTS(VAR)	\
+  DECL_VARIABLE(VAR, uint, 8, 16);			\
+  DECL_VARIABLE(VAR, uint, 16, 8);			\
+  DECL_VARIABLE(VAR, uint, 32, 4);			\
+  DECL_VARIABLE(VAR, uint, 64, 2)
+
+/* Declare all 64 bits variants.  */
+#define DECL_VARIABLE_64BITS_VARIANTS(VAR)	\
+  DECL_VARIABLE_64BITS_SIGNED_VARIANTS(VAR);	\
+  DECL_VARIABLE_64BITS_UNSIGNED_VARIANTS(VAR);	\
+  DECL_VARIABLE(VAR, poly, 8, 8);		\
+  DECL_VARIABLE(VAR, poly, 16, 4);		\
+  DECL_VARIABLE(VAR, float, 32, 2)
+
+/* Declare all 128 bits variants.  */
+#define DECL_VARIABLE_128BITS_VARIANTS(VAR)	\
+  DECL_VARIABLE_128BITS_SIGNED_VARIANTS(VAR);	\
+  DECL_VARIABLE_128BITS_UNSIGNED_VARIANTS(VAR);	\
+  DECL_VARIABLE(VAR, poly, 8, 16);		\
+  DECL_VARIABLE(VAR, poly, 16, 8);		\
+  DECL_VARIABLE(VAR, float, 32, 4)
+
+/* Declare all variants.  */
+#define DECL_VARIABLE_ALL_VARIANTS(VAR)		\
+  DECL_VARIABLE_64BITS_VARIANTS(VAR);		\
+  DECL_VARIABLE_128BITS_VARIANTS(VAR)
+
+/* Declare all signed variants.  */
+#define DECL_VARIABLE_SIGNED_VARIANTS(VAR)	\
+  DECL_VARIABLE_64BITS_SIGNED_VARIANTS(VAR);	\
+  DECL_VARIABLE_128BITS_SIGNED_VARIANTS(VAR)
+
+/* Declare all unsigned variants.  */
+#define DECL_VARIABLE_UNSIGNED_VARIANTS(VAR)	\
+  DECL_VARIABLE_64BITS_UNSIGNED_VARIANTS(VAR);	\
+  DECL_VARIABLE_128BITS_UNSIGNED_VARIANTS(VAR)
+
+/* Helpers to initialize vectors.  */
+#define VDUP(VAR, Q, T1, T2, W, N, V)			\
+  VECT_VAR(VAR, T1, W, N) = vdup##Q##_n_##T2##W(V)
+
+#define VSET_LANE(VAR, Q, T1, T2, W, N, L, V)				\
+  VECT_VAR(VAR, T1, W, N) = vset##Q##_lane_##T2##W(V,			\
+						   VECT_VAR(VAR, T1, W, N), \
+						   L)
+
+/* We need to load initial values first, so rely on VLD1.  */
+#define VLOAD(VAR, BUF, Q, T1, T2, W, N)				\
+  VECT_VAR(VAR, T1, W, N) = vld1##Q##_##T2##W(VECT_VAR(BUF, T1, W, N))
+
+/* Helpers to call macros with 1 constant and 5 variable
+   arguments.  */
+#define TEST_MACRO_64BITS_SIGNED_VARIANTS_1_5(MACRO, VAR)	\
+  MACRO(VAR, , int, s, 8, 8);					\
+  MACRO(VAR, , int, s, 16, 4);					\
+  MACRO(VAR, , int, s, 32, 2);					\
+  MACRO(VAR, , int, s, 64, 1)
+
+#define TEST_MACRO_64BITS_UNSIGNED_VARIANTS_1_5(MACRO, VAR)	\
+  MACRO(VAR, , uint, u, 8, 8);					\
+  MACRO(VAR, , uint, u, 16, 4);					\
+  MACRO(VAR, , uint, u, 32, 2);					\
+  MACRO(VAR, , uint, u, 64, 1)
+
+#define TEST_MACRO_128BITS_SIGNED_VARIANTS_1_5(MACRO, VAR)	\
+  MACRO(VAR, q, int, s, 8, 16);					\
+  MACRO(VAR, q, int, s, 16, 8);					\
+  MACRO(VAR, q, int, s, 32, 4);					\
+  MACRO(VAR, q, int, s, 64, 2)
+
+#define TEST_MACRO_128BITS_UNSIGNED_VARIANTS_1_5(MACRO,VAR)	\
+  MACRO(VAR, q, uint, u, 8, 16);				\
+  MACRO(VAR, q, uint, u, 16, 8);				\
+  MACRO(VAR, q, uint, u, 32, 4);				\
+  MACRO(VAR, q, uint, u, 64, 2)
+
+#define TEST_MACRO_64BITS_VARIANTS_1_5(MACRO, VAR)	\
+  TEST_MACRO_64BITS_SIGNED_VARIANTS_1_5(MACRO, VAR);	\
+  TEST_MACRO_64BITS_UNSIGNED_VARIANTS_1_5(MACRO, VAR)
+
+#define TEST_MACRO_128BITS_VARIANTS_1_5(MACRO, VAR)	\
+  TEST_MACRO_128BITS_SIGNED_VARIANTS_1_5(MACRO, VAR);	\
+  TEST_MACRO_128BITS_UNSIGNED_VARIANTS_1_5(MACRO, VAR)
+
+#define TEST_MACRO_ALL_VARIANTS_1_5(MACRO, VAR)	\
+  TEST_MACRO_64BITS_VARIANTS_1_5(MACRO, VAR);	\
+  TEST_MACRO_128BITS_VARIANTS_1_5(MACRO, VAR)
+
+#define TEST_MACRO_SIGNED_VARIANTS_1_5(MACRO, VAR)	\
+  TEST_MACRO_64BITS_SIGNED_VARIANTS_1_5(MACRO, VAR);	\
+  TEST_MACRO_128BITS_SIGNED_VARIANTS_1_5(MACRO, VAR)
+
+/* Helpers to call macros with 2 constant and 5 variable
+   arguments.  */
+#define TEST_MACRO_64BITS_SIGNED_VARIANTS_2_5(MACRO, VAR1, VAR2)	\
+  MACRO(VAR1, VAR2, , int, s, 8, 8);					\
+  MACRO(VAR1, VAR2, , int, s, 16, 4);					\
+  MACRO(VAR1, VAR2, , int, s, 32, 2);					\
+  MACRO(VAR1, VAR2 , , int, s, 64, 1)
+
+#define TEST_MACRO_64BITS_UNSIGNED_VARIANTS_2_5(MACRO, VAR1, VAR2)	\
+  MACRO(VAR1, VAR2, , uint, u, 8, 8);					\
+  MACRO(VAR1, VAR2, , uint, u, 16, 4);					\
+  MACRO(VAR1, VAR2, , uint, u, 32, 2);					\
+  MACRO(VAR1, VAR2, , uint, u, 64, 1)
+
+#define TEST_MACRO_128BITS_SIGNED_VARIANTS_2_5(MACRO, VAR1, VAR2)	\
+  MACRO(VAR1, VAR2, q, int, s, 8, 16);					\
+  MACRO(VAR1, VAR2, q, int, s, 16, 8);					\
+  MACRO(VAR1, VAR2, q, int, s, 32, 4);					\
+  MACRO(VAR1, VAR2, q, int, s, 64, 2)
+
+#define TEST_MACRO_128BITS_UNSIGNED_VARIANTS_2_5(MACRO, VAR1, VAR2)	\
+  MACRO(VAR1, VAR2, q, uint, u, 8, 16);					\
+  MACRO(VAR1, VAR2, q, uint, u, 16, 8);					\
+  MACRO(VAR1, VAR2, q, uint, u, 32, 4);					\
+  MACRO(VAR1, VAR2, q, uint, u, 64, 2)
+
+#define TEST_MACRO_64BITS_VARIANTS_2_5(MACRO, VAR1, VAR2)	\
+  TEST_MACRO_64BITS_SIGNED_VARIANTS_2_5(MACRO, VAR1, VAR2);	\
+  TEST_MACRO_64BITS_UNSIGNED_VARIANTS_2_5(MACRO, VAR1, VAR2);	\
+  MACRO(VAR1, VAR2, , poly, p, 8, 8);				\
+  MACRO(VAR1, VAR2, , poly, p, 16, 4)
+
+#define TEST_MACRO_128BITS_VARIANTS_2_5(MACRO, VAR1, VAR2)	\
+  TEST_MACRO_128BITS_SIGNED_VARIANTS_2_5(MACRO, VAR1, VAR2);	\
+  TEST_MACRO_128BITS_UNSIGNED_VARIANTS_2_5(MACRO, VAR1, VAR2);	\
+  MACRO(VAR1, VAR2, q, poly, p, 8, 16);				\
+  MACRO(VAR1, VAR2, q, poly, p, 16, 8)
+
+#define TEST_MACRO_ALL_VARIANTS_2_5(MACRO, VAR1, VAR2)	\
+  TEST_MACRO_64BITS_VARIANTS_2_5(MACRO, VAR1, VAR2);	\
+  TEST_MACRO_128BITS_VARIANTS_2_5(MACRO, VAR1, VAR2)
+
+#define TEST_MACRO_SIGNED_VARIANTS_2_5(MACRO, VAR1, VAR2)	\
+  TEST_MACRO_64BITS_SIGNED_VARIANTS_2_5(MACRO, VAR1, VAR2);	\
+  TEST_MACRO_128BITS_SIGNED_VARIANTS_2_5(MACRO, VAR1, VAR2)
+
+#endif /* _ARM_NEON_REF_H_ */
diff --git a/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/compute-ref-data.h b/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/compute-ref-data.h
new file mode 100644
index 0000000..3673d73
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/compute-ref-data.h
@@ -0,0 +1,204 @@ 
+/* This file contains input data static definitions, shared by most of
+   the tests.  */
+
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+
+/* Initialization helpers; 4 slices are needed for vld2, vld3 and
+   vld4.  */
+#define MY_INIT_TABLE(T,W,N) xNAME(INIT_TABLE,N)(T##W##_t)
+#define MY_INIT_TABLE2(T,W,N) xNAME(INIT_TABLE2,N)(T##W##_t)
+#define MY_INIT_TABLE3(T,W,N) xNAME(INIT_TABLE3,N)(T##W##_t)
+#define MY_INIT_TABLE4(T,W,N) xNAME(INIT_TABLE4,N)(T##W##_t)
+
+/* Initialized input buffers.  */
+#define VECT_VAR_DECL_INIT(V, T, W, N)			\
+  VECT_VAR_DECL(V,T,W,N) [] = { MY_INIT_TABLE(T,W,N) }
+
+/* Specialized initializer with 4 entries, as used by vldX_dup and
+   vdup tests, which iterate 4 times on input buffers.  */
+#define VECT_VAR_DECL_INIT4(V, T, W, N)			\
+  VECT_VAR_DECL(V,T,W,N) [] = { MY_INIT_TABLE(T,W,4) };
+
+/* Initializers for arrays of vectors.  */
+#define VECT_ARRAY_INIT2(V, T, W, N)		\
+  T##W##_t VECT_ARRAY_VAR(V,T,W,N,2)[] =	\
+  { MY_INIT_TABLE(T,W,N)			\
+    MY_INIT_TABLE2(T,W,N) }
+
+#define VECT_ARRAY_INIT3(V, T, W, N)			\
+  T##W##_t VECT_ARRAY_VAR(V,T,W,N,3)[] =		\
+  { MY_INIT_TABLE(T,W,N)				\
+    MY_INIT_TABLE2(T,W,N)				\
+    MY_INIT_TABLE3(T,W,N) }
+
+#define VECT_ARRAY_INIT4(V, T, W, N)			\
+  T##W##_t VECT_ARRAY_VAR(V,T,W,N,4)[] =		\
+  { MY_INIT_TABLE(T,W,N)				\
+    MY_INIT_TABLE2(T,W,N)				\
+    MY_INIT_TABLE3(T,W,N)				\
+    MY_INIT_TABLE4(T,W,N) }
+
+/* Sample initialization vectors.  */
+#define INIT_TABLE_1(T)				\
+  (T)-16,
+#define INIT_TABLE2_1(T)			\
+  (T)-15,
+#define INIT_TABLE3_1(T)			\
+  (T)-14,
+#define INIT_TABLE4_1(T)			\
+  (T)-13,
+
+#define INIT_TABLE_2(T)				\
+  (T)-16, (T)-15,
+#define INIT_TABLE2_2(T)			\
+  (T)-14, (T)-13,
+#define INIT_TABLE3_2(T)			\
+  (T)-12, (T)-11,
+#define INIT_TABLE4_2(T)			\
+  (T)-10, (T)-9,
+
+/* Initializer for vld3_lane tests.  */
+#define INIT_TABLE_3(T)				\
+  (T)-16, (T)-15, (T)-14,
+
+#define INIT_TABLE_4(T)				\
+  (T)-16, (T)-15, (T)-14, (T)-13,
+#define INIT_TABLE2_4(T)			\
+  (T)-12, (T)-11, (T)-10, (T)-9,
+#define INIT_TABLE3_4(T)			\
+  (T)-8, (T)-7, (T)-6, (T)-5,
+#define INIT_TABLE4_4(T)			\
+  (T)-4, (T)-3, (T)-2, (T)-1,
+
+#define INIT_TABLE_8(T)							\
+  (T)-16, (T)-15, (T)-14, (T)-13, (T)-12, (T)-11, (T)-10, (T)-9,
+#define INIT_TABLE2_8(T)					\
+  (T)-8, (T)-7, (T)-6, (T)-5, (T)-4, (T)-3, (T)-2, (T)-1,
+#define INIT_TABLE3_8(T)				\
+  (T)0, (T)1, (T)2, (T)3, (T)4, (T)5, (T)6, (T)7,
+#define INIT_TABLE4_8(T)				\
+  (T)8, (T)9, (T)10, (T)11, (T)12, (T)13, (T)14, (T)15,
+
+#define INIT_TABLE_16(T)						\
+  (T)-16, (T)-15, (T)-14, (T)-13, (T)-12, (T)-11, (T)-10, (T)-9,	\
+  (T)-8, (T)-7, (T)-6, (T)-5, (T)-4, (T)-3, (T)-2, (T)-1,
+#define INIT_TABLE2_16(T)						\
+  (T)0, (T)1, (T)2, (T)3, (T)4, (T)5, (T)6, (T)7,			\
+  (T)8, (T)9, (T)10, (T)11, (T)12, (T)13, (T)14, (T)15,
+#define INIT_TABLE3_16(T)						\
+  (T)16, (T)17, (T)18, (T)19, (T)20, (T)21, (T)22, (T)23,		\
+   (T)24, (T)25, (T)26, (T)27, (T)28, (T)29, (T)30, (T)31,
+#define INIT_TABLE4_16(T)						\
+  (T)32, (T)33, (T)34, (T)35, (T)36, (T)37, (T)38, (T)39,		\
+  (T)40, (T)41, (T)42, (T)43, (T)44, (T)45, (T)46, (T)47,
+
+/* This one is used for padding between input buffers.  */
+#define PAD(V, T, W, N) char VECT_VAR(V,T,W,N)=42
+
+/* Input buffers, one of each size.  */
+/* Insert some padding to try to exhibit out of bounds accesses.  */
+VECT_VAR_DECL_INIT(buffer, int, 8, 8);
+PAD(buffer_pad, int, 8, 8);
+VECT_VAR_DECL_INIT(buffer, int, 16, 4);
+PAD(buffer_pad, int, 16, 4);
+VECT_VAR_DECL_INIT(buffer, int, 32, 2);
+PAD(buffer_pad, int, 32, 2);
+VECT_VAR_DECL_INIT(buffer, int, 64, 1);
+PAD(buffer_pad, int, 64, 1);
+VECT_VAR_DECL_INIT(buffer, uint, 8, 8);
+PAD(buffer_pad, uint, 8, 8);
+VECT_VAR_DECL_INIT(buffer, poly, 8, 8);
+PAD(buffer_pad, poly, 8, 8);
+VECT_VAR_DECL_INIT(buffer, poly, 16, 4);
+PAD(buffer_pad, poly, 16, 4);
+VECT_VAR_DECL_INIT(buffer, uint, 16, 4);
+PAD(buffer_pad, uint, 16, 4);
+VECT_VAR_DECL_INIT(buffer, uint, 32, 2);
+PAD(buffer_pad, uint, 32, 2);
+VECT_VAR_DECL_INIT(buffer, uint, 64, 1);
+PAD(buffer_pad, uint, 64, 1);
+VECT_VAR_DECL_INIT(buffer, float, 32, 2);
+PAD(buffer_pad, float, 32, 2);
+VECT_VAR_DECL_INIT(buffer, int, 8, 16);
+PAD(buffer_pad, int, 8, 16);
+VECT_VAR_DECL_INIT(buffer, int, 16, 8);
+PAD(buffer_pad, int, 16, 8);
+VECT_VAR_DECL_INIT(buffer, int, 32, 4);
+PAD(buffer_pad, int, 32, 4);
+VECT_VAR_DECL_INIT(buffer, int, 64, 2);
+PAD(buffer_pad, int, 64, 2);
+VECT_VAR_DECL_INIT(buffer, uint, 8, 16);
+PAD(buffer_pad, uint, 8, 16);
+VECT_VAR_DECL_INIT(buffer, uint, 16, 8);
+PAD(buffer_pad, uint, 16, 8);
+VECT_VAR_DECL_INIT(buffer, uint, 32, 4);
+PAD(buffer_pad, uint, 32, 4);
+VECT_VAR_DECL_INIT(buffer, uint, 64, 2);
+PAD(buffer_pad, uint, 64, 2);
+VECT_VAR_DECL_INIT(buffer, poly, 8, 16);
+PAD(buffer_pad, poly, 8, 16);
+VECT_VAR_DECL_INIT(buffer, poly, 16, 8);
+PAD(buffer_pad, poly, 16, 8);
+VECT_VAR_DECL_INIT(buffer, float, 32, 4);
+PAD(buffer_pad, float, 32, 4);
+
+/* The tests for vld1_dup and vdup expect at least 4 entries in the
+   input buffer, so force 1- and 2-elements initializers to have 4
+   entries (using VECT_VAR_DECL_INIT4).  */
+VECT_VAR_DECL_INIT(buffer_dup, int, 8, 8);
+VECT_VAR_DECL(buffer_dup_pad, int, 8, 8);
+VECT_VAR_DECL_INIT(buffer_dup, int, 16, 4);
+VECT_VAR_DECL(buffer_dup_pad, int, 16, 4);
+VECT_VAR_DECL_INIT4(buffer_dup, int, 32, 2);
+VECT_VAR_DECL(buffer_dup_pad, int, 32, 2);
+VECT_VAR_DECL_INIT4(buffer_dup, int, 64, 1);
+VECT_VAR_DECL(buffer_dup_pad, int, 64, 1);
+VECT_VAR_DECL_INIT(buffer_dup, uint, 8, 8);
+VECT_VAR_DECL(buffer_dup_pad, uint, 8, 8);
+VECT_VAR_DECL_INIT(buffer_dup, uint, 16, 4);
+VECT_VAR_DECL(buffer_dup_pad, uint, 16, 4);
+VECT_VAR_DECL_INIT4(buffer_dup, uint, 32, 2);
+VECT_VAR_DECL(buffer_dup_pad, uint, 32, 2);
+VECT_VAR_DECL_INIT4(buffer_dup, uint, 64, 1);
+VECT_VAR_DECL(buffer_dup_pad, uint, 64, 1);
+VECT_VAR_DECL_INIT(buffer_dup, poly, 8, 8);
+VECT_VAR_DECL(buffer_dup_pad, poly, 8, 8);
+VECT_VAR_DECL_INIT(buffer_dup, poly, 16, 4);
+VECT_VAR_DECL(buffer_dup_pad, poly, 16, 4);
+VECT_VAR_DECL_INIT4(buffer_dup, float, 32, 2);
+VECT_VAR_DECL(buffer_dup_pad, float, 32, 2);
+#if __ARM_NEON_FP16_INTRINSICS
+VECT_VAR_DECL(buffer_dup, float, 16, 4)[] = {0xcc00 /* -16 */, 0xcb80 /* -15 */,
+					     0xcb00 /* -14 */, 0xca80 /* -13 */};
+PAD(buffer_dup_pad, float, 16, 4);
+#endif
+VECT_VAR_DECL_INIT(buffer_dup, int, 8, 16);
+VECT_VAR_DECL(buffer_dup_pad, int, 8, 16);
+VECT_VAR_DECL_INIT(buffer_dup, int, 16, 8);
+VECT_VAR_DECL(buffer_dup_pad, int, 16, 8);
+VECT_VAR_DECL_INIT(buffer_dup, int, 32, 4);
+VECT_VAR_DECL(buffer_dup_pad, int, 32, 4);
+VECT_VAR_DECL_INIT4(buffer_dup, int, 64, 2);
+VECT_VAR_DECL(buffer_dup_pad, int, 64, 2);
+VECT_VAR_DECL_INIT(buffer_dup, uint, 8, 16);
+VECT_VAR_DECL(buffer_dup_pad, uint, 8, 16);
+VECT_VAR_DECL_INIT(buffer_dup, uint, 16, 8);
+VECT_VAR_DECL(buffer_dup_pad, uint, 16, 8);
+VECT_VAR_DECL_INIT(buffer_dup, uint, 32, 4);
+VECT_VAR_DECL(buffer_dup_pad, uint, 32, 4);
+VECT_VAR_DECL_INIT4(buffer_dup, uint, 64, 2);
+VECT_VAR_DECL(buffer_dup_pad, uint, 64, 2);
+VECT_VAR_DECL_INIT(buffer_dup, poly, 8, 16);
+VECT_VAR_DECL(buffer_dup_pad, poly, 8, 16);
+VECT_VAR_DECL_INIT(buffer_dup, poly, 16, 8);
+VECT_VAR_DECL(buffer_dup_pad, poly, 16, 8);
+VECT_VAR_DECL_INIT(buffer_dup, float, 32, 4);
+VECT_VAR_DECL(buffer_dup_pad, float, 32, 4);
+#if __ARM_NEON_FP16_INTRINSICS
+VECT_VAR_DECL(buffer_dup, float, 16, 8)[] = {0xcc00 /* -16 */, 0xcb80 /* -15 */,
+					     0xcb00 /* -14 */, 0xca80 /* -13 */,
+					     0xca00 /* -12 */, 0xc980 /* -11 */,
+					     0xc900 /* -10 */, 0xc880 /* -9 */};
+PAD(buffer_dup_pad, float, 16, 8);
+#endif
diff --git a/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/neon-intrinsics.exp b/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/neon-intrinsics.exp
new file mode 100644
index 0000000..38bd207
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/neon-intrinsics.exp
@@ -0,0 +1,54 @@ 
+# Copyright (C) 2013 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# <http://www.gnu.org/licenses/>.
+
+# GCC testsuite that uses the `dg.exp' driver.
+
+# Exit immediately if this isn't an ARM or AArch64 target.
+if {![istarget arm*-*-*]
+    && ![istarget aarch64*-*-*]} then {
+  return
+}
+
+# Load support procs.
+load_lib gcc-dg.exp
+
+# Initialize `dg'.
+load_lib c-torture.exp
+load_lib target-supports.exp
+load_lib torture-options.exp
+
+dg-init
+
+torture-init
+set-torture-options $C_TORTURE_OPTIONS {{}} $LTO_TORTURE_OPTIONS
+
+# Make sure Neon flags are provided, if necessary.
+set additional_flags [add_options_for_arm_neon ""]
+
+# Main loop.
+foreach src [lsort [glob -nocomplain $srcdir/$subdir/*.c]] {
+    # If we're only testing specific files and this isn't one of them, skip it.
+    if ![runtest_file_p $runtests $src] then {
+        continue
+    }
+
+    c-torture-execute $src $additional_flags
+    gcc-dg-runtest $src $additional_flags
+}
+
+# All done.
+torture-finish
+dg-finish
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/vaba.c b/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/vaba.c
new file mode 100644
index 0000000..2465cd2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/vaba.c
@@ -0,0 +1,142 @@ 
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+/* Expected results.  */
+VECT_VAR_DECL(expected,int,8,8) [] = { 0xf6, 0xf7, 0xf8, 0xf9,
+				       0xfa, 0xfb, 0xfc, 0xfd };
+VECT_VAR_DECL(expected,int,16,4) [] = { 0x16, 0x17, 0x18, 0x19 };
+VECT_VAR_DECL(expected,int,32,2) [] = { 0x20, 0x21 };
+VECT_VAR_DECL(expected,int,64,1) [] = { 0x3333333333333333 };
+VECT_VAR_DECL(expected,uint,8,8) [] = { 0x53, 0x54, 0x55, 0x56,
+					0x57, 0x58, 0x59, 0x5a };
+VECT_VAR_DECL(expected,uint,16,4) [] = { 0x907, 0x908, 0x909, 0x90a };
+VECT_VAR_DECL(expected,uint,32,2) [] = { 0xffffffe7, 0xffffffe8 };
+VECT_VAR_DECL(expected,uint,64,1) [] = { 0x3333333333333333 };
+VECT_VAR_DECL(expected,poly,8,8) [] = { 0x33, 0x33, 0x33, 0x33,
+					0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,poly,16,4) [] = { 0x3333, 0x3333, 0x3333, 0x3333 };
+VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0x33333333, 0x33333333 };
+VECT_VAR_DECL(expected,int,8,16) [] = { 0x5e, 0x5f, 0x60, 0x61,
+					0x62, 0x63, 0x64, 0x65,
+					0x66, 0x67, 0x68, 0x69,
+					0x6a, 0x6b, 0x6c, 0x6d };
+VECT_VAR_DECL(expected,int,16,8) [] = { 0xb9c, 0xb9d, 0xb9e, 0xb9f,
+					0xba0, 0xba1, 0xba2, 0xba3 };
+VECT_VAR_DECL(expected,int,32,4) [] = { 0x26e0, 0x26e1, 0x26e2, 0x26e3 };
+VECT_VAR_DECL(expected,int,64,2) [] = { 0x3333333333333333,
+					0x3333333333333333 };
+VECT_VAR_DECL(expected,uint,8,16) [] = { 0xf8, 0xf9, 0xfa, 0xfb,
+					 0xfc, 0xfd, 0xfe, 0xff,
+					 0x0, 0x1, 0x2, 0x3,
+					 0x4, 0x5, 0x6, 0x7 };
+VECT_VAR_DECL(expected,uint,16,8) [] = { 0xfff9, 0xfffa, 0xfffb, 0xfffc,
+					 0xfffd, 0xfffe, 0xffff, 0x0 };
+VECT_VAR_DECL(expected,uint,32,4) [] = { 0xc, 0xd, 0xe, 0xf };
+VECT_VAR_DECL(expected,uint,64,2) [] = { 0x3333333333333333,
+					 0x3333333333333333 };
+VECT_VAR_DECL(expected,poly,8,16) [] = { 0x33, 0x33, 0x33, 0x33,
+					 0x33, 0x33, 0x33, 0x33,
+					 0x33, 0x33, 0x33, 0x33,
+					 0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,poly,16,8) [] = { 0x3333, 0x3333, 0x3333, 0x3333,
+					 0x3333, 0x3333, 0x3333, 0x3333 };
+VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0x33333333, 0x33333333,
+					   0x33333333, 0x33333333 };
+
+#define TEST_MSG "VABA/VABAQ"
+void exec_vaba (void)
+{
+  /* Basic test: v4=vaba(v1,v2,v3), then store the result.  */
+#define TEST_VABA(Q, T1, T2, W, N)					\
+  VECT_VAR(vector_res, T1, W, N) =					\
+    vaba##Q##_##T2##W(VECT_VAR(vector1, T1, W, N),			\
+		      VECT_VAR(vector2, T1, W, N),			\
+		      VECT_VAR(vector3, T1, W, N));			\
+  vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(vector_res, T1, W, N))
+
+#define DECL_VABA_VAR(VAR)			\
+  DECL_VARIABLE(VAR, int, 8, 8);		\
+  DECL_VARIABLE(VAR, int, 16, 4);		\
+  DECL_VARIABLE(VAR, int, 32, 2);		\
+  DECL_VARIABLE(VAR, uint, 8, 8);		\
+  DECL_VARIABLE(VAR, uint, 16, 4);		\
+  DECL_VARIABLE(VAR, uint, 32, 2);		\
+  DECL_VARIABLE(VAR, int, 8, 16);		\
+  DECL_VARIABLE(VAR, int, 16, 8);		\
+  DECL_VARIABLE(VAR, int, 32, 4);		\
+  DECL_VARIABLE(VAR, uint, 8, 16);		\
+  DECL_VARIABLE(VAR, uint, 16, 8);		\
+  DECL_VARIABLE(VAR, uint, 32, 4)
+
+  DECL_VABA_VAR(vector1);
+  DECL_VABA_VAR(vector2);
+  DECL_VABA_VAR(vector3);
+  DECL_VABA_VAR(vector_res);
+
+  clean_results ();
+
+  /* Initialize input "vector1" from "buffer".  */
+  VLOAD(vector1, buffer, , int, s, 8, 8);
+  VLOAD(vector1, buffer, , int, s, 16, 4);
+  VLOAD(vector1, buffer, , int, s, 32, 2);
+  VLOAD(vector1, buffer, , uint, u, 8, 8);
+  VLOAD(vector1, buffer, , uint, u, 16, 4);
+  VLOAD(vector1, buffer, , uint, u, 32, 2);
+  VLOAD(vector1, buffer, q, int, s, 8, 16);
+  VLOAD(vector1, buffer, q, int, s, 16, 8);
+  VLOAD(vector1, buffer, q, int, s, 32, 4);
+  VLOAD(vector1, buffer, q, uint, u, 8, 16);
+  VLOAD(vector1, buffer, q, uint, u, 16, 8);
+  VLOAD(vector1, buffer, q, uint, u, 32, 4);
+
+  /* Choose init value arbitrarily.  */
+  VDUP(vector2, , int, s, 8, 8, 1);
+  VDUP(vector2, , int, s, 16, 4, -13);
+  VDUP(vector2, , int, s, 32, 2, 8);
+  VDUP(vector2, , uint, u, 8, 8, 1);
+  VDUP(vector2, , uint, u, 16, 4, 13);
+  VDUP(vector2, , uint, u, 32, 2, 8);
+  VDUP(vector2, q, int, s, 8, 16, 10);
+  VDUP(vector2, q, int, s, 16, 8, -12);
+  VDUP(vector2, q, int, s, 32, 4, 32);
+  VDUP(vector2, q, uint, u, 8, 16, 10);
+  VDUP(vector2, q, uint, u, 16, 8, 12);
+  VDUP(vector2, q, uint, u, 32, 4, 32);
+
+  /* Choose init value arbitrarily.  */
+  VDUP(vector3, , int, s, 8, 8, -5);
+  VDUP(vector3, , int, s, 16, 4, 25);
+  VDUP(vector3, , int, s, 32, 2, -40);
+  VDUP(vector3, , uint, u, 8, 8, 100);
+  VDUP(vector3, , uint, u, 16, 4, 2340);
+  VDUP(vector3, , uint, u, 32, 2, 0xffffffff);
+  VDUP(vector3, q, int, s, 8, 16, -100);
+  VDUP(vector3, q, int, s, 16, 8, -3000);
+  VDUP(vector3, q, int, s, 32, 4, 10000);
+  VDUP(vector3, q, uint, u, 8, 16, 2);
+  VDUP(vector3, q, uint, u, 16, 8, 3);
+  VDUP(vector3, q, uint, u, 32, 4, 4);
+
+  /* Execute the tests.  */
+  TEST_VABA(, int, s, 8, 8);
+  TEST_VABA(, int, s, 16, 4);
+  TEST_VABA(, int, s, 32, 2);
+  TEST_VABA(, uint, u, 8, 8);
+  TEST_VABA(, uint, u, 16, 4);
+  TEST_VABA(, uint, u, 32, 2);
+  TEST_VABA(q, int, s, 8, 16);
+  TEST_VABA(q, int, s, 16, 8);
+  TEST_VABA(q, int, s, 32, 4);
+  TEST_VABA(q, uint, u, 8, 16);
+  TEST_VABA(q, uint, u, 16, 8);
+  TEST_VABA(q, uint, u, 32, 4);
+
+  CHECK_RESULTS (TEST_MSG, "");
+}
+
+int main (void)
+{
+  exec_vaba ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/vld1.c b/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/vld1.c
new file mode 100644
index 0000000..784e004
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/vld1.c
@@ -0,0 +1,84 @@ 
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+/* Expected results.  */
+VECT_VAR_DECL(expected,int,8,8) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
+				       0xf4, 0xf5, 0xf6, 0xf7 };
+VECT_VAR_DECL(expected,int,16,4) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3 };
+VECT_VAR_DECL(expected,int,32,2) [] = { 0xfffffff0, 0xfffffff1 };
+VECT_VAR_DECL(expected,int,64,1) [] = { 0xfffffffffffffff0 };
+VECT_VAR_DECL(expected,uint,8,8) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
+					0xf4, 0xf5, 0xf6, 0xf7 };
+VECT_VAR_DECL(expected,uint,16,4) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3 };
+VECT_VAR_DECL(expected,uint,32,2) [] = { 0xfffffff0, 0xfffffff1 };
+VECT_VAR_DECL(expected,uint,64,1) [] = { 0xfffffffffffffff0 };
+VECT_VAR_DECL(expected,poly,8,8) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
+					0xf4, 0xf5, 0xf6, 0xf7 };
+VECT_VAR_DECL(expected,poly,16,4) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3 };
+VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0xc1800000, 0xc1700000 };
+VECT_VAR_DECL(expected,int,8,16) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
+					0xf4, 0xf5, 0xf6, 0xf7,
+					0xf8, 0xf9, 0xfa, 0xfb,
+					0xfc, 0xfd, 0xfe, 0xff };
+VECT_VAR_DECL(expected,int,16,8) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3,
+					0xfff4, 0xfff5, 0xfff6, 0xfff7 };
+VECT_VAR_DECL(expected,int,32,4) [] = { 0xfffffff0, 0xfffffff1,
+					0xfffffff2, 0xfffffff3 };
+VECT_VAR_DECL(expected,int,64,2) [] = { 0xfffffffffffffff0,
+					0xfffffffffffffff1 };
+VECT_VAR_DECL(expected,uint,8,16) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
+					 0xf4, 0xf5, 0xf6, 0xf7,
+					 0xf8, 0xf9, 0xfa, 0xfb,
+					 0xfc, 0xfd, 0xfe, 0xff };
+VECT_VAR_DECL(expected,uint,16,8) [] = { 0xfff0, 0xfff1, 0xfff2,
+					 0xfff3, 0xfff4, 0xfff5,
+					 0xfff6, 0xfff7 };
+VECT_VAR_DECL(expected,uint,32,4) [] = { 0xfffffff0, 0xfffffff1,
+					 0xfffffff2, 0xfffffff3 };
+VECT_VAR_DECL(expected,uint,64,2) [] = { 0xfffffffffffffff0,
+					 0xfffffffffffffff1 };
+VECT_VAR_DECL(expected,poly,8,16) [] = { 0xf0, 0xf1, 0xf2, 0xf3,
+					 0xf4, 0xf5, 0xf6, 0xf7,
+					 0xf8, 0xf9, 0xfa, 0xfb,
+					 0xfc, 0xfd, 0xfe, 0xff };
+VECT_VAR_DECL(expected,poly,16,8) [] = { 0xfff0, 0xfff1, 0xfff2, 0xfff3,
+					 0xfff4, 0xfff5, 0xfff6, 0xfff7 };
+VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0xc1800000, 0xc1700000,
+					   0xc1600000, 0xc1500000 };
+
+#define TEST_MSG "VLD1/VLD1Q"
+void exec_vld1 (void)
+{
+  /* Basic test vec=vld1(buffer); then store vec: vst1(result, vector).  */
+  /* This test actually tests vdl1 and vst1 at the same time.  */
+#define TEST_VLD1(VAR, BUF, Q, T1, T2, W, N)				\
+  VECT_VAR(VAR, T1, W, N) = vld1##Q##_##T2##W(VECT_VAR(BUF, T1, W, N)); \
+  vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(VAR, T1, W, N))
+
+  DECL_VARIABLE_ALL_VARIANTS(vector);
+#if __ARM_NEON_FP16_INTRINSICS
+  DECL_VARIABLE(vector, float, 16, 4);
+  DECL_VARIABLE(vector, float, 16, 8);
+#endif
+
+  clean_results ();
+
+  TEST_MACRO_ALL_VARIANTS_2_5(TEST_VLD1, vector, buffer);
+
+  TEST_VLD1(vector, buffer, , float, f, 32, 2);
+  TEST_VLD1(vector, buffer, q, float, f, 32, 4);
+
+#if __ARM_NEON_FP16_INTRINSICS
+  TEST_VLD1(vector, buffer, , float, f, 16, 4);
+  TEST_VLD1(vector, buffer, q, float, f, 16, 8);
+#endif
+
+  CHECK_RESULTS (TEST_MSG, "");
+}
+
+int main (void)
+{
+  exec_vld1 ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/vshl.c b/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/vshl.c
new file mode 100644
index 0000000..e64d6e3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/vshl.c
@@ -0,0 +1,230 @@ 
+#include <arm_neon.h>
+#include "arm-neon-ref.h"
+#include "compute-ref-data.h"
+
+/* Expected results.  */
+VECT_VAR_DECL(expected,int,8,8) [] = { 0xe0, 0xe2, 0xe4, 0xe6,
+				       0xe8, 0xea, 0xec, 0xee };
+VECT_VAR_DECL(expected,int,16,4) [] = { 0xff80, 0xff88, 0xff90, 0xff98 };
+VECT_VAR_DECL(expected,int,32,2) [] = { 0xfffff000, 0xfffff100 };
+VECT_VAR_DECL(expected,int,64,1) [] = { 0xffffffffffffff80 };
+VECT_VAR_DECL(expected,uint,8,8) [] = { 0xe0, 0xe2, 0xe4, 0xe6,
+					0xe8, 0xea, 0xec, 0xee };
+VECT_VAR_DECL(expected,uint,16,4) [] = { 0xff80, 0xff88, 0xff90, 0xff98 };
+VECT_VAR_DECL(expected,uint,32,2) [] = { 0xfffff000, 0xfffff100 };
+VECT_VAR_DECL(expected,uint,64,1) [] = { 0xffffffffffffff80 };
+VECT_VAR_DECL(expected,poly,8,8) [] = { 0x33, 0x33, 0x33, 0x33,
+					0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,poly,16,4) [] = { 0x3333, 0x3333,
+					 0x3333, 0x3333 };
+VECT_VAR_DECL(expected,hfloat,32,2) [] = { 0x33333333, 0x33333333 };
+VECT_VAR_DECL(expected,int,8,16) [] = { 0x0, 0x20, 0x40, 0x60,
+					0x80, 0xa0, 0xc0, 0xe0,
+					0x0, 0x20, 0x40, 0x60,
+					0x80, 0xa0, 0xc0, 0xe0 };
+VECT_VAR_DECL(expected,int,16,8) [] = { 0x0, 0x1000, 0x2000, 0x3000,
+					0x4000, 0x5000, 0x6000, 0x7000 };
+VECT_VAR_DECL(expected,int,32,4) [] = { 0x0, 0x40000000,
+					0x80000000, 0xc0000000 };
+VECT_VAR_DECL(expected,int,64,2) [] = { 0x0, 0x8000000000000000 };
+VECT_VAR_DECL(expected,uint,8,16) [] = { 0x0, 0x20, 0x40, 0x60,
+					 0x80, 0xa0, 0xc0, 0xe0,
+					 0x0, 0x20, 0x40, 0x60,
+					 0x80, 0xa0, 0xc0, 0xe0 };
+VECT_VAR_DECL(expected,uint,16,8) [] = { 0x0, 0x1000, 0x2000, 0x3000,
+					 0x4000, 0x5000, 0x6000, 0x7000 };
+VECT_VAR_DECL(expected,uint,32,4) [] = { 0x0, 0x40000000,
+					 0x80000000, 0xc0000000 };
+VECT_VAR_DECL(expected,uint,64,2) [] = { 0x0, 0x8000000000000000 };
+VECT_VAR_DECL(expected,poly,8,16) [] = { 0x33, 0x33, 0x33, 0x33,
+					 0x33, 0x33, 0x33, 0x33,
+					 0x33, 0x33, 0x33, 0x33,
+					 0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,poly,16,8) [] = { 0x3333, 0x3333, 0x3333, 0x3333,
+					 0x3333, 0x3333, 0x3333, 0x3333 };
+VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0x33333333, 0x33333333,
+					   0x33333333, 0x33333333 };
+
+/* Expected results with large shift amount.  */
+VECT_VAR_DECL(expected_large_shift,int,8,8) [] = { 0x0, 0x0, 0x0, 0x0,
+						   0x0, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL(expected_large_shift,int,16,4) [] = { 0x0, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL(expected_large_shift,int,32,2) [] = { 0x0, 0x0 };
+VECT_VAR_DECL(expected_large_shift,int,64,1) [] = { 0x0 };
+VECT_VAR_DECL(expected_large_shift,uint,8,8) [] = { 0x0, 0x0, 0x0, 0x0,
+						    0x0, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL(expected_large_shift,uint,16,4) [] = { 0x0, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL(expected_large_shift,uint,32,2) [] = { 0x0, 0x0 };
+VECT_VAR_DECL(expected_large_shift,uint,64,1) [] = { 0x0 };
+VECT_VAR_DECL(expected_large_shift,poly,8,8) [] = { 0x33, 0x33, 0x33, 0x33,
+						    0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected_large_shift,poly,16,4) [] = { 0x3333, 0x3333,
+						     0x3333, 0x3333 };
+VECT_VAR_DECL(expected_large_shift,hfloat,32,2) [] = { 0x33333333, 0x33333333 };
+VECT_VAR_DECL(expected_large_shift,int,8,16) [] = { 0x0, 0x0, 0x0, 0x0,
+						    0x0, 0x0, 0x0, 0x0,
+						    0x0, 0x0, 0x0, 0x0,
+						    0x0, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL(expected_large_shift,int,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
+						    0x0, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL(expected_large_shift,int,32,4) [] = { 0x0, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL(expected_large_shift,int,64,2) [] = { 0x0, 0x0 };
+VECT_VAR_DECL(expected_large_shift,uint,8,16) [] = { 0x0, 0x0, 0x0, 0x0,
+						     0x0, 0x0, 0x0, 0x0,
+						     0x0, 0x0, 0x0, 0x0,
+						     0x0, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL(expected_large_shift,uint,16,8) [] = { 0x0, 0x0, 0x0, 0x0,
+						     0x0, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL(expected_large_shift,uint,32,4) [] = { 0x0, 0x0, 0x0, 0x0 };
+VECT_VAR_DECL(expected_large_shift,uint,64,2) [] = { 0x0, 0x0 };
+VECT_VAR_DECL(expected_large_shift,poly,8,16) [] = { 0x33, 0x33, 0x33, 0x33,
+						     0x33, 0x33, 0x33, 0x33,
+						     0x33, 0x33, 0x33, 0x33,
+						     0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected_large_shift,poly,16,8) [] = { 0x3333, 0x3333,
+						     0x3333, 0x3333,
+						     0x3333, 0x3333,
+						     0x3333, 0x3333 };
+VECT_VAR_DECL(expected_large_shift,hfloat,32,4) [] = { 0x33333333, 0x33333333,
+						       0x33333333, 0x33333333 };
+
+
+/* Expected results with negative shift amount.  */
+VECT_VAR_DECL(expected_negative_shift,int,8,8) [] = { 0xf8, 0xf8, 0xf9, 0xf9,
+						      0xfa, 0xfa, 0xfb, 0xfb };
+VECT_VAR_DECL(expected_negative_shift,int,16,4) [] = { 0xfff8, 0xfff8,
+						       0xfff9, 0xfff9  };
+VECT_VAR_DECL(expected_negative_shift,int,32,2) [] = { 0xfffffffc, 0xfffffffc };
+VECT_VAR_DECL(expected_negative_shift,int,64,1) [] = { 0xffffffffffffffff };
+VECT_VAR_DECL(expected_negative_shift,uint,8,8) [] = { 0x78, 0x78, 0x79, 0x79,
+						       0x7a, 0x7a, 0x7b, 0x7b };
+VECT_VAR_DECL(expected_negative_shift,uint,16,4) [] = { 0x7ff8, 0x7ff8,
+							0x7ff9, 0x7ff9 };
+VECT_VAR_DECL(expected_negative_shift,uint,32,2) [] = { 0x3ffffffc,
+							0x3ffffffc };
+VECT_VAR_DECL(expected_negative_shift,uint,64,1) [] = { 0xfffffffffffffff };
+VECT_VAR_DECL(expected_negative_shift,poly,8,8) [] = { 0x33, 0x33, 0x33, 0x33,
+						       0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected_negative_shift,poly,16,4) [] = { 0x3333, 0x3333,
+							0x3333, 0x3333 };
+VECT_VAR_DECL(expected_negative_shift,hfloat,32,2) [] = { 0x33333333,
+							  0x33333333 };
+VECT_VAR_DECL(expected_negative_shift,int,8,16) [] = { 0xfc, 0xfc, 0xfc, 0xfc,
+						       0xfd, 0xfd, 0xfd, 0xfd,
+						       0xfe, 0xfe, 0xfe, 0xfe,
+						       0xff, 0xff, 0xff, 0xff };
+VECT_VAR_DECL(expected_negative_shift,int,16,8) [] = { 0xffff, 0xffff,
+						       0xffff, 0xffff,
+						       0xffff, 0xffff,
+						       0xffff, 0xffff };
+VECT_VAR_DECL(expected_negative_shift,int,32,4) [] = {  0xfffffffe, 0xfffffffe,
+							0xfffffffe, 0xfffffffe };
+VECT_VAR_DECL(expected_negative_shift,int,64,2) [] = { 0xffffffffffffffff,
+						       0xffffffffffffffff };
+VECT_VAR_DECL(expected_negative_shift,uint,8,16) [] = { 0x3c, 0x3c, 0x3c, 0x3c,
+							0x3d, 0x3d, 0x3d, 0x3d,
+							0x3e, 0x3e, 0x3e, 0x3e,
+							0x3f, 0x3f, 0x3f, 0x3f };
+VECT_VAR_DECL(expected_negative_shift,uint,16,8) [] = { 0x7ff, 0x7ff,
+							0x7ff, 0x7ff,
+							0x7ff, 0x7ff,
+							0x7ff, 0x7ff };
+VECT_VAR_DECL(expected_negative_shift,uint,32,4) [] = { 0x1ffffffe, 0x1ffffffe,
+							0x1ffffffe, 0x1ffffffe };
+VECT_VAR_DECL(expected_negative_shift,uint,64,2) [] = { 0x7ffffffffffffff,
+							0x7ffffffffffffff };
+VECT_VAR_DECL(expected_negative_shift,poly,8,16) [] = { 0x33, 0x33, 0x33, 0x33,
+							0x33, 0x33, 0x33, 0x33,
+							0x33, 0x33, 0x33, 0x33,
+							0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected_negative_shift,poly,16,8) [] = { 0x3333, 0x3333,
+							0x3333, 0x3333,
+							0x3333, 0x3333,
+							0x3333, 0x3333 };
+VECT_VAR_DECL(expected_negative_shift,hfloat,32,4) [] = { 0x33333333,
+							  0x33333333,
+							  0x33333333,
+							  0x33333333 };
+
+
+#ifndef INSN_NAME
+#define INSN_NAME vshl
+#define TEST_MSG "VSHL/VSHLQ"
+#endif
+
+#define FNNAME1(NAME) exec_ ## NAME
+#define FNNAME(NAME) FNNAME1(NAME)
+
+void FNNAME (INSN_NAME) (void)
+{
+  /* Basic test: v3=vshl(v1,v2), then store the result.  */
+#define TEST_VSHL(T3, Q, T1, T2, W, N)					\
+  VECT_VAR(vector_res, T1, W, N) =					\
+    vshl##Q##_##T2##W(VECT_VAR(vector, T1, W, N),			\
+		      VECT_VAR(vector_shift, T3, W, N));		\
+  vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(vector_res, T1, W, N))
+
+  DECL_VARIABLE_ALL_VARIANTS(vector);
+  DECL_VARIABLE_ALL_VARIANTS(vector_res);
+
+  DECL_VARIABLE_SIGNED_VARIANTS(vector_shift);
+
+  clean_results ();
+
+  /* Initialize input "vector" from "buffer".  */
+  TEST_MACRO_ALL_VARIANTS_2_5(VLOAD, vector, buffer);
+
+  /* Choose init value arbitrarily, will be used as shift amount.  */
+  VDUP(vector_shift, , int, s, 8, 8, 1);
+  VDUP(vector_shift, , int, s, 16, 4, 3);
+  VDUP(vector_shift, , int, s, 32, 2, 8);
+  VDUP(vector_shift, , int, s, 64, 1, 3);
+  VDUP(vector_shift, q, int, s, 8, 16, 5);
+  VDUP(vector_shift, q, int, s, 16, 8, 12);
+  VDUP(vector_shift, q, int, s, 32, 4, 30);
+  VDUP(vector_shift, q, int, s, 64, 2, 63);
+
+  /* Execute the tests.  */
+  TEST_MACRO_ALL_VARIANTS_1_5(TEST_VSHL, int);
+
+  CHECK_RESULTS (TEST_MSG, "");
+
+
+  /* Test large shift amount (larger or equal to the type width.  */
+  VDUP(vector_shift, , int, s, 8, 8, 8);
+  VDUP(vector_shift, , int, s, 16, 4, 16);
+  VDUP(vector_shift, , int, s, 32, 2, 32);
+  VDUP(vector_shift, , int, s, 64, 1, 64);
+  VDUP(vector_shift, q, int, s, 8, 16, 8);
+  VDUP(vector_shift, q, int, s, 16, 8, 17);
+  VDUP(vector_shift, q, int, s, 32, 4, 33);
+  VDUP(vector_shift, q, int, s, 64, 2, 65);
+
+  /* Execute the tests.  */
+  TEST_MACRO_ALL_VARIANTS_1_5(TEST_VSHL, int);
+
+  CHECK_RESULTS_NAMED (TEST_MSG, expected_large_shift, "(large shift amount)");
+
+
+  /* Test negative shift amount. */
+  VDUP(vector_shift, , int, s, 8, 8, -1);
+  VDUP(vector_shift, , int, s, 16, 4, -1);
+  VDUP(vector_shift, , int, s, 32, 2, -2);
+  VDUP(vector_shift, , int, s, 64, 1, -4);
+  VDUP(vector_shift, q, int, s, 8, 16, -2);
+  VDUP(vector_shift, q, int, s, 16, 8, -5);
+  VDUP(vector_shift, q, int, s, 32, 4, -3);
+  VDUP(vector_shift, q, int, s, 64, 2, -5);
+
+  /* Execute the tests.  */
+  TEST_MACRO_ALL_VARIANTS_1_5(TEST_VSHL, int);
+
+  CHECK_RESULTS_NAMED (TEST_MSG, expected_negative_shift, "(negative shift amount)");
+}
+
+int main (void)
+{
+  FNNAME (INSN_NAME) ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/arm/README.neon-intrinsics b/gcc/testsuite/gcc.target/arm/README.neon-intrinsics
new file mode 100644
index 0000000..f246349
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/README.neon-intrinsics
@@ -0,0 +1 @@ 
+Advanced SIMD intrinsics tests are located in gcc.target/aarch64.