
[AArch64,ARM,PATCHv2,3/3] Add tests for missing Poly64_t intrinsics to GCC

Message ID VI1PR0801MB20314DA97DD94B48F7814297FFB60@VI1PR0801MB2031.eurprd08.prod.outlook.com
State Superseded

Commit Message

Tamar Christina Nov. 24, 2016, 11:45 a.m. UTC
Hi Christophe,

I have combined most of the tests into p64_p128, except for the
vreinterpret_p128 and vreinterpret_p64 ones, because I felt the amount
of code that would have to be added to p64_p128, versus keeping them in
those files, isn't worth it, since a lot of the test setup would have to
be copied.

Kind regards,
Tamar
________________________________________
From: Tamar Christina

Sent: Tuesday, November 8, 2016 11:58:46 AM
To: Christophe Lyon
Cc: GCC Patches; christophe.lyon@st.com; Marcus Shawcroft; Richard Earnshaw; James Greenhalgh; Kyrylo Tkachov; nd
Subject: RE: [AArch64][ARM][GCC][PATCHv2 3/3] Add tests for missing Poly64_t intrinsics to GCC

Hi Christophe,

Thanks for the review!

>

> A while ago I added p64_p128.c, to contain all the poly64/128 tests except for

> vreinterpret.

> Why do you need to create p64.c ?


I originally created it because I had a much smaller set of intrinsics that I wanted
to add initially; this grew, and it hadn't occurred to me that I could use the
existing file now.

Another reason was the effective-target arm_crypto_ok as you mentioned below.

>

> Similarly, adding tests for vcreate_p64 etc... in p64.c or p64_p128.c might be

> easier to maintain than adding them to vcreate.c etc with several #ifdef

> conditions.


Fair enough, I'll move them to p64_p128.c.

> For vdup-vmod.c, why do you add the "&& defined(__aarch64__)"

> condition? These intrinsics are defined in arm/arm_neon.h, right?

> They are tested in p64_p128.c


I should have looked for them; they weren't being tested before, so I had
mistakenly assumed that they weren't available. Now I realize I just need
to add the proper test option to the file to enable crypto. I'll update this as well.

> Looking at your patch, it seems some tests are currently missing for arm:

> vget_high_p64. I'm not sure why I missed it when I removed neon-

> testgen...


I'll adjust the test conditions so they run for ARM as well.

>

> Regarding vreinterpret_p128.c, doesn't the existing effective-target

> arm_crypto_ok prevent the tests from running on aarch64?


Yes, it does; I was comparing the output against a clean version and hadn't noticed
that they weren't running. Thanks!
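For reference, the updated p64_p128.c in the patch below handles this by restricting the effective-target check to arm targets and passing the crypto option explicitly on aarch64:

```c
/* { dg-require-effective-target arm_crypto_ok { target { arm*-*-* } } } */
/* { dg-add-options arm_crypto } */
/* { dg-additional-options "-march=armv8-a+crypto" { target { aarch64*-*-* } } } */
```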

>

> Thanks,

>

> Christophe

Comments

Christophe Lyon Nov. 25, 2016, 2:53 p.m. UTC | #1
Hi Tamar,

On 24 November 2016 at 12:45, Tamar Christina <Tamar.Christina@arm.com> wrote:
> Hi Christophe,

>

> I have combined most of the tests into p64_p128, except for the

> vreinterpret_p128 and vreinterpret_p64 ones, because I felt the amount of

> code that would have to be added to p64_p128, versus keeping them in those

> files, isn't worth it, since a lot of the test setup would have to be copied.

>


A few comments about this new version:
* arm-neon-ref.h: why do you create CHECK_RESULTS_NAMED_NO_FP16_NO_POLY64?
Can't you just add calls to CHECK_CRYPTO in the existing
CHECK_RESULTS_NAMED_NO_FP16?

* p64_p128:
From what I can see ARM and AArch64 differ on the vceq variants
available with poly64.
For ARM, arm_neon.h contains: uint64x1_t vceq_p64 (poly64x1_t __a,
poly64x1_t __b)
For AArch64, I can't see vceq_p64 in arm_neon.h? ... Actually, I've just noticed
the other patch you submitted while I was writing this, where you add vceq_p64
for aarch64, but it still returns uint64_t.
Why do you change the vceq_p64 test to return poly64_t instead of uint64_t?

Why do you add #ifdef __aarch64__ before the vldX_p64 tests and until vsli_p64?

The comment /* vget_lane_p64 tests.  */ is wrong before the VLDX_LANE tests.

You need to protect the new vmov, vget_high and vget_lane tests with
#ifdef __aarch64__.

Christophe

Christophe Lyon Nov. 25, 2016, 3:03 p.m. UTC | #2
On 25 November 2016 at 15:53, Christophe Lyon
<christophe.lyon@linaro.org> wrote:
> [...]

> You need to protect the new vmov, vget_high and vget_lane tests with

> #ifdef __aarch64__.


Actually, vget_high_p64 exists on arm, so no need for the #ifdef for it.


Tamar Christina Nov. 25, 2016, 4:01 p.m. UTC | #3
>

> > A few comments about this new version:

> > * arm-neon-ref.h: why do you create

> CHECK_RESULTS_NAMED_NO_FP16_NO_POLY64?

> > Can't you just add calls to CHECK_CRYPTO in the existing

> > CHECK_RESULTS_NAMED_NO_FP16?


Yes, that should be fine. I didn't have CHECK_CRYPTO before, and when I added it
I didn't remove the split. I'll do it now.

> >

> > * p64_p128:

> > From what I can see ARM and AArch64 differ on the vceq variants

> > available with poly64.

> > For ARM, arm_neon.h contains: uint64x1_t vceq_p64 (poly64x1_t __a,

> > poly64x1_t __b) For AArch64, I can't see vceq_p64 in arm_neon.h? ...

> > Actually, I've just noticed the other patch you submitted while I was writing

> > this, where you add vceq_p64 for aarch64, but it still returns

> > uint64_t.

> > Why do you change the vceq_p64 test to return poly64_t instead of

> uint64_t?


This patch is slightly outdated. The correct type is uint64_t, but by the time
that was noticed this patch had already been sent. A new one is coming soon.

> >

> > Why do you add #ifdef __aarch64__ before vldX_p64 tests and until vsli_p64?

> >


These are wrong; I'll remove them. The #ifdef was supposed to be around the vldX_lane_p64 tests.

> > The comment /* vget_lane_p64 tests.  */ is wrong before VLDX_LANE

> > tests

> >

> > You need to protect the new vmov, vget_high and vget_lane tests with

> > #ifdef __aarch64__.

> >


vget_lane is already in an #ifdef; vmov, you're right. But I also noticed that the
test calls VDUP instead of VMOV, which explains why I didn't get a test failure.

Thanks for the feedback,
I'll get these updated.


Patch

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
index 462141586b3db7c5256c74b08fa0449210634226..174c1948221025b860aaac503354b406fa804007 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/arm-neon-ref.h
@@ -32,6 +32,13 @@  extern size_t strlen(const char *);
    VECT_VAR(expected, int, 16, 4) -> expected_int16x4
    VECT_VAR_DECL(expected, int, 16, 4) -> int16x4_t expected_int16x4
 */
+/* Some instructions don't exist on ARM.
+   Use this macro to guard against them.  */
+#ifdef __aarch64__
+#define AARCH64_ONLY(X) X
+#else
+#define AARCH64_ONLY(X)
+#endif
 
 #define xSTR(X) #X
 #define STR(X) xSTR(X)
@@ -92,6 +99,13 @@  extern size_t strlen(const char *);
     fprintf(stderr, "CHECKED %s %s\n", STR(VECT_TYPE(T, W, N)), MSG);	\
   }
 
+#if defined (__ARM_FEATURE_CRYPTO)
+#define CHECK_CRYPTO(MSG,T,W,N,FMT,EXPECTED,COMMENT) \
+	       CHECK(MSG,T,W,N,FMT,EXPECTED,COMMENT)
+#else
+#define CHECK_CRYPTO(MSG,T,W,N,FMT,EXPECTED,COMMENT)
+#endif
+
 /* Floating-point variant.  */
 #define CHECK_FP(MSG,T,W,N,FMT,EXPECTED,COMMENT)			\
   {									\
@@ -184,6 +198,9 @@  extern ARRAY(expected, uint, 32, 2);
 extern ARRAY(expected, uint, 64, 1);
 extern ARRAY(expected, poly, 8, 8);
 extern ARRAY(expected, poly, 16, 4);
+#if defined (__ARM_FEATURE_CRYPTO)
+extern ARRAY(expected, poly, 64, 1);
+#endif
 extern ARRAY(expected, hfloat, 16, 4);
 extern ARRAY(expected, hfloat, 32, 2);
 extern ARRAY(expected, hfloat, 64, 1);
@@ -197,11 +214,14 @@  extern ARRAY(expected, uint, 32, 4);
 extern ARRAY(expected, uint, 64, 2);
 extern ARRAY(expected, poly, 8, 16);
 extern ARRAY(expected, poly, 16, 8);
+#if defined (__ARM_FEATURE_CRYPTO)
+extern ARRAY(expected, poly, 64, 2);
+#endif
 extern ARRAY(expected, hfloat, 16, 8);
 extern ARRAY(expected, hfloat, 32, 4);
 extern ARRAY(expected, hfloat, 64, 2);
 
-#define CHECK_RESULTS_NAMED_NO_FP16(test_name,EXPECTED,comment)		\
+#define CHECK_RESULTS_NAMED_NO_FP16_NO_POLY64(test_name,EXPECTED,comment)		\
   {									\
     CHECK(test_name, int, 8, 8, PRIx8, EXPECTED, comment);		\
     CHECK(test_name, int, 16, 4, PRIx16, EXPECTED, comment);		\
@@ -228,6 +248,13 @@  extern ARRAY(expected, hfloat, 64, 2);
     CHECK_FP(test_name, float, 32, 4, PRIx32, EXPECTED, comment);	\
   }									\
 
+#define CHECK_RESULTS_NAMED_NO_FP16(test_name,EXPECTED,comment)		\
+  {									\
+    CHECK_RESULTS_NAMED_NO_FP16_NO_POLY64(test_name, EXPECTED, comment);		\
+    CHECK_CRYPTO(test_name, poly, 64, 1, PRIx64, EXPECTED, comment);	\
+    CHECK_CRYPTO(test_name, poly, 64, 2, PRIx64, EXPECTED, comment);	\
+  }									\
+
 /* Check results against EXPECTED.  Operates on all possible vector types.  */
 #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
 #define CHECK_RESULTS_NAMED(test_name,EXPECTED,comment)			\
@@ -398,6 +425,9 @@  static void clean_results (void)
   CLEAN(result, uint, 64, 1);
   CLEAN(result, poly, 8, 8);
   CLEAN(result, poly, 16, 4);
+#if defined (__ARM_FEATURE_CRYPTO)
+  CLEAN(result, poly, 64, 1);
+#endif
 #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
   CLEAN(result, float, 16, 4);
 #endif
@@ -413,6 +443,9 @@  static void clean_results (void)
   CLEAN(result, uint, 64, 2);
   CLEAN(result, poly, 8, 16);
   CLEAN(result, poly, 16, 8);
+#if defined (__ARM_FEATURE_CRYPTO)
+  CLEAN(result, poly, 64, 2);
+#endif
 #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
   CLEAN(result, float, 16, 8);
 #endif
@@ -438,6 +471,13 @@  static void clean_results (void)
 #define DECL_VARIABLE(VAR, T1, W, N)		\
   VECT_TYPE(T1, W, N) VECT_VAR(VAR, T1, W, N)
 
+#if defined (__ARM_FEATURE_CRYPTO)
+#define DECL_VARIABLE_CRYPTO(VAR, T1, W, N) \
+  DECL_VARIABLE(VAR, T1, W, N)
+#else
+#define DECL_VARIABLE_CRYPTO(VAR, T1, W, N)
+#endif
+
 /* Declare only 64 bits signed variants.  */
 #define DECL_VARIABLE_64BITS_SIGNED_VARIANTS(VAR)	\
   DECL_VARIABLE(VAR, int, 8, 8);			\
@@ -473,6 +513,7 @@  static void clean_results (void)
   DECL_VARIABLE_64BITS_UNSIGNED_VARIANTS(VAR);	\
   DECL_VARIABLE(VAR, poly, 8, 8);		\
   DECL_VARIABLE(VAR, poly, 16, 4);		\
+  DECL_VARIABLE_CRYPTO(VAR, poly, 64, 1);	\
   DECL_VARIABLE(VAR, float, 16, 4);		\
   DECL_VARIABLE(VAR, float, 32, 2)
 #else
@@ -481,6 +522,7 @@  static void clean_results (void)
   DECL_VARIABLE_64BITS_UNSIGNED_VARIANTS(VAR);	\
   DECL_VARIABLE(VAR, poly, 8, 8);		\
   DECL_VARIABLE(VAR, poly, 16, 4);		\
+  DECL_VARIABLE_CRYPTO(VAR, poly, 64, 1);	\
   DECL_VARIABLE(VAR, float, 32, 2)
 #endif
 
@@ -491,6 +533,7 @@  static void clean_results (void)
   DECL_VARIABLE_128BITS_UNSIGNED_VARIANTS(VAR);	\
   DECL_VARIABLE(VAR, poly, 8, 16);		\
   DECL_VARIABLE(VAR, poly, 16, 8);		\
+  DECL_VARIABLE_CRYPTO(VAR, poly, 64, 2);	\
   DECL_VARIABLE(VAR, float, 16, 8);		\
   DECL_VARIABLE(VAR, float, 32, 4)
 #else
@@ -499,6 +542,7 @@  static void clean_results (void)
   DECL_VARIABLE_128BITS_UNSIGNED_VARIANTS(VAR);	\
   DECL_VARIABLE(VAR, poly, 8, 16);		\
   DECL_VARIABLE(VAR, poly, 16, 8);		\
+  DECL_VARIABLE_CRYPTO(VAR, poly, 64, 2);	\
   DECL_VARIABLE(VAR, float, 32, 4)
 #endif
 /* Declare all variants.  */
@@ -531,6 +575,13 @@  static void clean_results (void)
 
 /* Helpers to call macros with 1 constant and 5 variable
    arguments.  */
+#if defined (__ARM_FEATURE_CRYPTO)
+#define MACRO_CRYPTO(MACRO, VAR1, VAR2, T1, T2, T3, W, N) \
+  MACRO(VAR1, VAR2, T1, T2, T3, W, N)
+#else
+#define MACRO_CRYPTO(MACRO, VAR1, VAR2, T1, T2, T3, W, N)
+#endif
+
 #define TEST_MACRO_64BITS_SIGNED_VARIANTS_1_5(MACRO, VAR)	\
   MACRO(VAR, , int, s, 8, 8);					\
   MACRO(VAR, , int, s, 16, 4);					\
@@ -601,13 +652,15 @@  static void clean_results (void)
   TEST_MACRO_64BITS_SIGNED_VARIANTS_2_5(MACRO, VAR1, VAR2);	\
   TEST_MACRO_64BITS_UNSIGNED_VARIANTS_2_5(MACRO, VAR1, VAR2);	\
   MACRO(VAR1, VAR2, , poly, p, 8, 8);				\
-  MACRO(VAR1, VAR2, , poly, p, 16, 4)
+  MACRO(VAR1, VAR2, , poly, p, 16, 4);				\
+  MACRO_CRYPTO(MACRO, VAR1, VAR2, , poly, p, 64, 1)
 
 #define TEST_MACRO_128BITS_VARIANTS_2_5(MACRO, VAR1, VAR2)	\
   TEST_MACRO_128BITS_SIGNED_VARIANTS_2_5(MACRO, VAR1, VAR2);	\
   TEST_MACRO_128BITS_UNSIGNED_VARIANTS_2_5(MACRO, VAR1, VAR2);	\
   MACRO(VAR1, VAR2, q, poly, p, 8, 16);				\
-  MACRO(VAR1, VAR2, q, poly, p, 16, 8)
+  MACRO(VAR1, VAR2, q, poly, p, 16, 8);				\
+  MACRO_CRYPTO(MACRO, VAR1, VAR2, q, poly, p, 64, 2)
 
 #define TEST_MACRO_ALL_VARIANTS_2_5(MACRO, VAR1, VAR2)	\
   TEST_MACRO_64BITS_VARIANTS_2_5(MACRO, VAR1, VAR2);	\
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/p64_p128.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/p64_p128.c
index 519cffb0125079022e7ba876c1ca657d9e37cac2..f92c820f4c7de0dff2e593559412cf1702e860ff 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/p64_p128.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/p64_p128.c
@@ -1,8 +1,9 @@ 
 /* This file contains tests for all the *p64 intrinsics, except for
    vreinterpret which have their own testcase.  */
 
-/* { dg-require-effective-target arm_crypto_ok } */
+/* { dg-require-effective-target arm_crypto_ok { target { arm*-*-* } } } */
 /* { dg-add-options arm_crypto } */
+/* { dg-additional-options "-march=armv8-a+crypto" { target { aarch64*-*-* } } } */
 
 #include <arm_neon.h>
 #include "arm-neon-ref.h"
@@ -14,7 +15,7 @@  VECT_VAR_DECL(vbsl_expected,poly,64,2) [] = { 0xfffffff1,
 					      0xfffffff1 };
 
 /* Expected results: vceq.  */
-VECT_VAR_DECL(vceq_expected,uint,64,1) [] = { 0x0 };
+VECT_VAR_DECL(vceq_expected,poly,64,1) [] = { 0x0 };
 
 /* Expected results: vcombine.  */
 VECT_VAR_DECL(vcombine_expected,poly,64,2) [] = { 0xfffffffffffffff0, 0x88 };
@@ -38,6 +39,17 @@  VECT_VAR_DECL(vdup_n_expected2,poly,64,1) [] = { 0xfffffffffffffff2 };
 VECT_VAR_DECL(vdup_n_expected2,poly,64,2) [] = { 0xfffffffffffffff2,
 						 0xfffffffffffffff2 };
 
+/* Expected results: vmov_n.  */
+VECT_VAR_DECL(vmov_n_expected0,poly,64,1) [] = { 0xfffffffffffffff0 };
+VECT_VAR_DECL(vmov_n_expected0,poly,64,2) [] = { 0xfffffffffffffff0,
+						 0xfffffffffffffff0 };
+VECT_VAR_DECL(vmov_n_expected1,poly,64,1) [] = { 0xfffffffffffffff1 };
+VECT_VAR_DECL(vmov_n_expected1,poly,64,2) [] = { 0xfffffffffffffff1,
+						 0xfffffffffffffff1 };
+VECT_VAR_DECL(vmov_n_expected2,poly,64,1) [] = { 0xfffffffffffffff2 };
+VECT_VAR_DECL(vmov_n_expected2,poly,64,2) [] = { 0xfffffffffffffff2,
+						 0xfffffffffffffff2 };
+
 /* Expected results: vext.  */
 VECT_VAR_DECL(vext_expected,poly,64,1) [] = { 0xfffffffffffffff0 };
 VECT_VAR_DECL(vext_expected,poly,64,2) [] = { 0xfffffffffffffff1, 0x88 };
@@ -45,6 +57,9 @@  VECT_VAR_DECL(vext_expected,poly,64,2) [] = { 0xfffffffffffffff1, 0x88 };
 /* Expected results: vget_low.  */
 VECT_VAR_DECL(vget_low_expected,poly,64,1) [] = { 0xfffffffffffffff0 };
 
+/* Expected results: vget_high.  */
+VECT_VAR_DECL(vget_high_expected,poly,64,1) [] = { 0xfffffffffffffff1 };
+
 /* Expected results: vld1.  */
 VECT_VAR_DECL(vld1_expected,poly,64,1) [] = { 0xfffffffffffffff0 };
 VECT_VAR_DECL(vld1_expected,poly,64,2) [] = { 0xfffffffffffffff0,
@@ -109,6 +124,39 @@  VECT_VAR_DECL(vst1_lane_expected,poly,64,1) [] = { 0xfffffffffffffff0 };
 VECT_VAR_DECL(vst1_lane_expected,poly,64,2) [] = { 0xfffffffffffffff0,
 						   0x3333333333333333 };
 
+/* Expected results: vldX_lane.  */
+VECT_VAR_DECL(expected_vld_st2_0,poly,64,1) [] = { 0xfffffffffffffff0 };
+VECT_VAR_DECL(expected_vld_st2_0,poly,64,2) [] = { 0xfffffffffffffff0,
+						   0xfffffffffffffff1 };
+VECT_VAR_DECL(expected_vld_st2_1,poly,64,1) [] = { 0xfffffffffffffff1 };
+VECT_VAR_DECL(expected_vld_st2_1,poly,64,2) [] = { 0xaaaaaaaaaaaaaaaa,
+						   0xaaaaaaaaaaaaaaaa };
+VECT_VAR_DECL(expected_vld_st3_0,poly,64,1) [] = { 0xfffffffffffffff0 };
+VECT_VAR_DECL(expected_vld_st3_0,poly,64,2) [] = { 0xfffffffffffffff0,
+						   0xfffffffffffffff1 };
+VECT_VAR_DECL(expected_vld_st3_1,poly,64,1) [] = { 0xfffffffffffffff1 };
+VECT_VAR_DECL(expected_vld_st3_1,poly,64,2) [] = { 0xfffffffffffffff2,
+						   0xaaaaaaaaaaaaaaaa };
+VECT_VAR_DECL(expected_vld_st3_2,poly,64,1) [] = { 0xfffffffffffffff2 };
+VECT_VAR_DECL(expected_vld_st3_2,poly,64,2) [] = { 0xaaaaaaaaaaaaaaaa,
+						   0xaaaaaaaaaaaaaaaa };
+VECT_VAR_DECL(expected_vld_st4_0,poly,64,1) [] = { 0xfffffffffffffff0 };
+VECT_VAR_DECL(expected_vld_st4_0,poly,64,2) [] = { 0xfffffffffffffff0,
+						   0xfffffffffffffff1 };
+VECT_VAR_DECL(expected_vld_st4_1,poly,64,1) [] = { 0xfffffffffffffff1 };
+VECT_VAR_DECL(expected_vld_st4_1,poly,64,2) [] = { 0xfffffffffffffff2,
+						   0xfffffffffffffff3 };
+VECT_VAR_DECL(expected_vld_st4_2,poly,64,1) [] = { 0xfffffffffffffff2 };
+VECT_VAR_DECL(expected_vld_st4_2,poly,64,2) [] = { 0xaaaaaaaaaaaaaaaa,
+						   0xaaaaaaaaaaaaaaaa };
+VECT_VAR_DECL(expected_vld_st4_3,poly,64,1) [] = { 0xfffffffffffffff3 };
+VECT_VAR_DECL(expected_vld_st4_3,poly,64,2) [] = { 0xaaaaaaaaaaaaaaaa,
+						   0xaaaaaaaaaaaaaaaa };
+
+/* Expected results: vget_lane.  */
+VECT_VAR_DECL(vget_lane_expected,poly,64,1) = 0xfffffffffffffff0;
+VECT_VAR_DECL(vget_lane_expected,poly,64,2) = 0xfffffffffffffff0;
+
 int main (void)
 {
   int i;
@@ -159,24 +207,24 @@  int main (void)
   VECT_VAR(vceq_vector_res, T3, W, N) =					\
     INSN##Q##_##T2##W(VECT_VAR(vceq_vector, T1, W, N),			\
 		      VECT_VAR(vceq_vector2, T1, W, N));		\
-  vst1##Q##_u##W(VECT_VAR(result, T3, W, N), VECT_VAR(vceq_vector_res, T3, W, N))
+  vst1##Q##_p##W(VECT_VAR(result, T3, W, N), VECT_VAR(vceq_vector_res, T3, W, N))
 
 #define TEST_VCOMP(INSN, Q, T1, T2, T3, W, N)				\
-  TEST_VCOMP1(INSN, Q, T1, T2, T3, W, N)
+  TEST_VCOMP1(INSN, Q, T1, T2, T3, W, N);
 
   DECL_VARIABLE(vceq_vector, poly, 64, 1);
   DECL_VARIABLE(vceq_vector2, poly, 64, 1);
-  DECL_VARIABLE(vceq_vector_res, uint, 64, 1);
+  DECL_VARIABLE(vceq_vector_res, poly, 64, 1);
 
-  CLEAN(result, uint, 64, 1);
+  CLEAN(result, poly, 64, 1);
 
   VLOAD(vceq_vector, buffer, , poly, p, 64, 1);
 
   VDUP(vceq_vector2, , poly, p, 64, 1, 0x88);
 
-  TEST_VCOMP(vceq, , poly, p, uint, 64, 1);
+  TEST_VCOMP(vceq, , poly, p, poly, 64, 1);
 
-  CHECK(TEST_MSG, uint, 64, 1, PRIx64, vceq_expected, "");
+  CHECK(TEST_MSG, poly, 64, 1, PRIx64, vceq_expected, "");
 
   /* vcombine_p64 tests.  */
 #undef TEST_MSG
@@ -288,6 +336,44 @@  int main (void)
     }
   }
 
+  /* vmov_n_p64 tests.  */
+#undef TEST_MSG
+#define TEST_MSG "VMOV/VMOVQ"
+
+#define TEST_VMOV(Q, T1, T2, W, N)					\
+  VECT_VAR(vmov_n_vector, T1, W, N) =					\
+    vmov##Q##_n_##T2##W(VECT_VAR(buffer_dup, T1, W, N)[i]);		\
+  vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(vmov_n_vector, T1, W, N))
+
+  DECL_VARIABLE(vmov_n_vector, poly, 64, 1);
+  DECL_VARIABLE(vmov_n_vector, poly, 64, 2);
+
+  /* Try to read different places from the input buffer.  */
+  for (i=0; i< 3; i++) {
+    CLEAN(result, poly, 64, 1);
+    CLEAN(result, poly, 64, 2);
+
+    TEST_VDUP(, poly, p, 64, 1);
+    TEST_VDUP(q, poly, p, 64, 2);
+
+    switch (i) {
+    case 0:
+      CHECK(TEST_MSG, poly, 64, 1, PRIx64, vmov_n_expected0, "");
+      CHECK(TEST_MSG, poly, 64, 2, PRIx64, vmov_n_expected0, "");
+      break;
+    case 1:
+      CHECK(TEST_MSG, poly, 64, 1, PRIx64, vmov_n_expected1, "");
+      CHECK(TEST_MSG, poly, 64, 2, PRIx64, vmov_n_expected1, "");
+      break;
+    case 2:
+      CHECK(TEST_MSG, poly, 64, 1, PRIx64, vmov_n_expected2, "");
+      CHECK(TEST_MSG, poly, 64, 2, PRIx64, vmov_n_expected2, "");
+      break;
+    default:
+      abort();
+    }
+  }
+
   /* vexit_p64 tests.  */
 #undef TEST_MSG
 #define TEST_MSG "VEXT/VEXTQ"
@@ -341,6 +427,26 @@  int main (void)
 
   CHECK(TEST_MSG, poly, 64, 1, PRIx64, vget_low_expected, "");
 
+  /* vget_high_p64 tests.  */
+#undef TEST_MSG
+#define TEST_MSG "VGET_HIGH"
+
+#define TEST_VGET_HIGH(T1, T2, W, N, N2)					\
+  VECT_VAR(vget_high_vector64, T1, W, N) =				\
+    vget_high_##T2##W(VECT_VAR(vget_high_vector128, T1, W, N2));		\
+  vst1_##T2##W(VECT_VAR(result, T1, W, N), VECT_VAR(vget_high_vector64, T1, W, N))
+
+  DECL_VARIABLE(vget_high_vector64, poly, 64, 1);
+  DECL_VARIABLE(vget_high_vector128, poly, 64, 2);
+
+  CLEAN(result, poly, 64, 1);
+
+  VLOAD(vget_high_vector128, buffer, q, poly, p, 64, 2);
+
+  TEST_VGET_HIGH(poly, p, 64, 1, 2);
+
+  CHECK(TEST_MSG, poly, 64, 1, PRIx64, vget_high_expected, "");
+
   /* vld1_p64 tests.  */
 #undef TEST_MSG
 #define TEST_MSG "VLD1/VLD1Q"
@@ -432,6 +538,8 @@  int main (void)
   CHECK(TEST_MSG, poly, 64, 1, PRIx64, vld1_lane_expected, "");
   CHECK(TEST_MSG, poly, 64, 2, PRIx64, vld1_lane_expected, "");
 
+#ifdef __aarch64__
+
   /* vldX_p64 tests.  */
 #define DECL_VLDX(T1, W, N, X)						\
   VECT_ARRAY_TYPE(T1, W, N, X) VECT_ARRAY_VAR(vldX_vector, T1, W, N, X); \
@@ -560,6 +668,8 @@  int main (void)
   TEST_VLDX_DUP_EXTRA_CHUNK(poly, 64, 1, 4, 3);
   CHECK(TEST_MSG, poly, 64, 1, PRIx64, vld4_dup_expected_3, "chunk 3");
 
+#endif /* __aarch64__.  */
+
   /* vsli_p64 tests.  */
 #undef TEST_MSG
 #define TEST_MSG "VSLI"
@@ -645,7 +755,7 @@  int main (void)
   VECT_VAR(vst1_lane_vector, T1, W, N) =				\
     vld1##Q##_##T2##W(VECT_VAR(buffer, T1, W, N));			\
   vst1##Q##_lane_##T2##W(VECT_VAR(result, T1, W, N),			\
-			 VECT_VAR(vst1_lane_vector, T1, W, N), L)
+			 VECT_VAR(vst1_lane_vector, T1, W, N), L);
 
   DECL_VARIABLE(vst1_lane_vector, poly, 64, 1);
   DECL_VARIABLE(vst1_lane_vector, poly, 64, 2);
@@ -659,5 +769,260 @@  int main (void)
   CHECK(TEST_MSG, poly, 64, 1, PRIx64, vst1_lane_expected, "");
   CHECK(TEST_MSG, poly, 64, 2, PRIx64, vst1_lane_expected, "");
 
+#ifdef __aarch64__
+
+  /* vget_lane_p64 tests.  */
+#undef TEST_MSG
+#define TEST_MSG "VGET_LANE/VGETQ_LANE"
+
+#define TEST_VGET_LANE(Q, T1, T2, W, N, L)				   \
+  VECT_VAR(vget_lane_vector, T1, W, N) = vget##Q##_lane_##T2##W(VECT_VAR(vector, T1, W, N), L); \
+  if (VECT_VAR(vget_lane_vector, T1, W, N) != VECT_VAR(vget_lane_expected, T1, W, N)) {		\
+    fprintf(stderr,							   \
+	    "ERROR in %s (%s line %d in result '%s') at type %s "	   \
+	    "got 0x%" PRIx##W " != 0x%" PRIx##W "\n",			   \
+	    TEST_MSG, __FILE__, __LINE__,				   \
+	    STR(VECT_VAR(vget_lane_expected, T1, W, N)),		   \
+	    STR(VECT_NAME(T1, W, N)),					   \
+	    VECT_VAR(vget_lane_vector, T1, W, N),			   \
+	    VECT_VAR(vget_lane_expected, T1, W, N));			   \
+    abort ();								   \
+  }
+
+  /* Initialize input values.  */
+  DECL_VARIABLE(vector, poly, 64, 1);
+  DECL_VARIABLE(vector, poly, 64, 2);
+
+  VLOAD(vector, buffer,  , poly, p, 64, 1);
+  VLOAD(vector, buffer, q, poly, p, 64, 2);
+
+  VECT_VAR_DECL(vget_lane_vector, poly, 64, 1);
+  VECT_VAR_DECL(vget_lane_vector, poly, 64, 2);
+
+  TEST_VGET_LANE( , poly, p, 64, 1, 0);
+  TEST_VGET_LANE(q, poly, p, 64, 2, 0);
+
+  /* vget_lane_p64 tests.  */
+#undef TEST_MSG
+#define TEST_MSG "VLDX_LANE/VLDXQ_LANE"
+
+VECT_VAR_DECL_INIT(buffer_vld2_lane, poly, 64, 2);
+VECT_VAR_DECL_INIT(buffer_vld3_lane, poly, 64, 3);
+VECT_VAR_DECL_INIT(buffer_vld4_lane, poly, 64, 4);
+
+  /* In this case, input variables are arrays of vectors.  */
+#define DECL_VLD_STX_LANE(T1, W, N, X)					\
+  VECT_ARRAY_TYPE(T1, W, N, X) VECT_ARRAY_VAR(vector, T1, W, N, X);	\
+  VECT_ARRAY_TYPE(T1, W, N, X) VECT_ARRAY_VAR(vector_src, T1, W, N, X);	\
+  VECT_VAR_DECL(result_bis_##X, T1, W, N)[X * N]
+
+  /* We need to use a temporary result buffer (result_bis), because
+     the one used for other tests is not large enough. A subset of the
+     result data is moved from result_bis to result, and it is this
+     subset which is used to check the actual behavior. The next
+     macro enables to move another chunk of data from result_bis to
+     result.  */
+  /* We also use another extra input buffer (buffer_src), which we
+     fill with 0xAA, and which it used to load a vector from which we
+     read a given lane.  */
+
+#define TEST_VLDX_LANE(Q, T1, T2, W, N, X, L)				\
+  memset (VECT_VAR(buffer_src, T1, W, N), 0xAA,				\
+	  sizeof(VECT_VAR(buffer_src, T1, W, N)));			\
+									\
+  VECT_ARRAY_VAR(vector_src, T1, W, N, X) =				\
+    vld##X##Q##_##T2##W(VECT_VAR(buffer_src, T1, W, N));		\
+									\
+  VECT_ARRAY_VAR(vector, T1, W, N, X) =					\
+    /* Use dedicated init buffer, of size X.  */			\
+    vld##X##Q##_lane_##T2##W(VECT_VAR(buffer_vld##X##_lane, T1, W, X),	\
+			     VECT_ARRAY_VAR(vector_src, T1, W, N, X),	\
+			     L);					\
+  vst##X##Q##_##T2##W(VECT_VAR(result_bis_##X, T1, W, N),		\
+		      VECT_ARRAY_VAR(vector, T1, W, N, X));		\
+  memcpy(VECT_VAR(result, T1, W, N), VECT_VAR(result_bis_##X, T1, W, N), \
+	 sizeof(VECT_VAR(result, T1, W, N)))
+
+  /* Overwrite "result" with the contents of "result_bis"[Y].  */
+#undef TEST_EXTRA_CHUNK
+#define TEST_EXTRA_CHUNK(T1, W, N, X, Y)		\
+  memcpy(VECT_VAR(result, T1, W, N),			\
+	 &(VECT_VAR(result_bis_##X, T1, W, N)[Y*N]),	\
+	 sizeof(VECT_VAR(result, T1, W, N)));
+
+  /* Add some padding to try to catch out of bound accesses.  */
+#define ARRAY1(V, T, W, N) VECT_VAR_DECL(V,T,W,N)[1]={42}
+#define DUMMY_ARRAY(V, T, W, N, L) \
+  VECT_VAR_DECL(V,T,W,N)[N*L]={0}; \
+  ARRAY1(V##_pad,T,W,N)
+
+#define DECL_ALL_VLD_STX_LANE(X)     \
+  DECL_VLD_STX_LANE(poly, 64, 1, X); \
+  DECL_VLD_STX_LANE(poly, 64, 2, X);
+
+#define TEST_ALL_VLDX_LANE(X)		  \
+  TEST_VLDX_LANE(, poly, p, 64, 1, X, 0); \
+  TEST_VLDX_LANE(q, poly, p, 64, 2, X, 0);
+
+#define TEST_ALL_EXTRA_CHUNKS(X,Y)	     \
+  TEST_EXTRA_CHUNK(poly, 64, 1, X, Y) \
+  TEST_EXTRA_CHUNK(poly, 64, 2, X, Y)
+
+#define CHECK_RESULTS_VLD_STX_LANE(test_name,EXPECTED,comment)	\
+  CHECK(test_name, poly, 64, 1, PRIx64, EXPECTED, comment);	\
+  CHECK(test_name, poly, 64, 2, PRIx64, EXPECTED, comment);
+
+  /* Declare the temporary buffers / variables.  */
+  DECL_ALL_VLD_STX_LANE(2);
+  DECL_ALL_VLD_STX_LANE(3);
+  DECL_ALL_VLD_STX_LANE(4);
+
+  DUMMY_ARRAY(buffer_src, poly, 64, 1, 4);
+  DUMMY_ARRAY(buffer_src, poly, 64, 2, 4);
+
+  /* Check vld2_lane/vld2q_lane.  */
+  clean_results ();
+#undef TEST_MSG
+#define TEST_MSG "VLD2_LANE/VLD2Q_LANE"
+  TEST_ALL_VLDX_LANE(2);
+  CHECK_RESULTS_VLD_STX_LANE (TEST_MSG, expected_vld_st2_0, " chunk 0");
+
+  TEST_ALL_EXTRA_CHUNKS(2, 1);
+  CHECK_RESULTS_VLD_STX_LANE (TEST_MSG, expected_vld_st2_1, " chunk 1");
+
+  /* Check vld3_lane/vld3q_lane.  */
+  clean_results ();
+#undef TEST_MSG
+#define TEST_MSG "VLD3_LANE/VLD3Q_LANE"
+  TEST_ALL_VLDX_LANE(3);
+  CHECK_RESULTS_VLD_STX_LANE (TEST_MSG, expected_vld_st3_0, " chunk 0");
+
+  TEST_ALL_EXTRA_CHUNKS(3, 1);
+  CHECK_RESULTS_VLD_STX_LANE (TEST_MSG, expected_vld_st3_1, " chunk 1");
+
+  TEST_ALL_EXTRA_CHUNKS(3, 2);
+  CHECK_RESULTS_VLD_STX_LANE (TEST_MSG, expected_vld_st3_2, " chunk 2");
+
+  /* Check vld4_lane/vld4q_lane.  */
+  clean_results ();
+#undef TEST_MSG
+#define TEST_MSG "VLD4_LANE/VLD4Q_LANE"
+  TEST_ALL_VLDX_LANE(4);
+  CHECK_RESULTS_VLD_STX_LANE (TEST_MSG, expected_vld_st4_0, " chunk 0");
+
+  TEST_ALL_EXTRA_CHUNKS(4, 1);
+  CHECK_RESULTS_VLD_STX_LANE (TEST_MSG, expected_vld_st4_1, " chunk 1");
+
+  TEST_ALL_EXTRA_CHUNKS(4, 2);
+  CHECK_RESULTS_VLD_STX_LANE (TEST_MSG, expected_vld_st4_2, " chunk 2");
+
+  TEST_ALL_EXTRA_CHUNKS(4, 3);
+  CHECK_RESULTS_VLD_STX_LANE (TEST_MSG, expected_vld_st4_3, " chunk 3");
+
+  /* In this case, input variables are arrays of vectors.  */
+#define DECL_VSTX_LANE(T1, W, N, X)					\
+  VECT_ARRAY_TYPE(T1, W, N, X) VECT_ARRAY_VAR(vector, T1, W, N, X);	\
+  VECT_ARRAY_TYPE(T1, W, N, X) VECT_ARRAY_VAR(vector_src, T1, W, N, X);	\
+  VECT_VAR_DECL(result_bis_##X, T1, W, N)[X * N]
+
+  /* We need to use a temporary result buffer (result_bis) because
+     the one used for other tests is not large enough.  A subset of
+     the result data is moved from result_bis to result, and it is
+     this subset which is used to check the actual behavior.  The
+     next macro moves another chunk of data from result_bis to
+     result.  */
+  /* We also use an extra input buffer (buffer_src), which we fill
+     with 0xAA and which is used to load a vector from which we read
+     a given lane.  */
+#define TEST_VSTX_LANE(Q, T1, T2, W, N, X, L)				 \
+  memset (VECT_VAR(buffer_src, T1, W, N), 0xAA,				 \
+	  sizeof(VECT_VAR(buffer_src, T1, W, N)));			 \
+  memset (VECT_VAR(result_bis_##X, T1, W, N), 0,			 \
+	  sizeof(VECT_VAR(result_bis_##X, T1, W, N)));			 \
+									 \
+  VECT_ARRAY_VAR(vector_src, T1, W, N, X) =				 \
+    vld##X##Q##_##T2##W(VECT_VAR(buffer_src, T1, W, N));		 \
+									 \
+  VECT_ARRAY_VAR(vector, T1, W, N, X) =					 \
+    /* Use dedicated init buffer, of size X.  */			 \
+    vld##X##Q##_lane_##T2##W(VECT_VAR(buffer_vld##X##_lane, T1, W, X),	 \
+			     VECT_ARRAY_VAR(vector_src, T1, W, N, X),	 \
+			     L);					 \
+  vst##X##Q##_lane_##T2##W(VECT_VAR(result_bis_##X, T1, W, N),		 \
+			   VECT_ARRAY_VAR(vector, T1, W, N, X),		 \
+			   L);						 \
+  memcpy(VECT_VAR(result, T1, W, N), VECT_VAR(result_bis_##X, T1, W, N), \
+	 sizeof(VECT_VAR(result, T1, W, N)));
+
+#define TEST_ALL_VSTX_LANE(X)		  \
+  TEST_VSTX_LANE(, poly, p, 64, 1, X, 0); \
+  TEST_VSTX_LANE(q, poly, p, 64, 2, X, 0);
+
+  /* Check vst2_lane/vst2q_lane.  */
+  clean_results ();
+#undef TEST_MSG
+#define TEST_MSG "VST2_LANE/VST2Q_LANE"
+  TEST_ALL_VSTX_LANE(2);
+
+#define CMT " (chunk 0)"
+  CHECK(TEST_MSG, poly, 64, 1, PRIx64, expected_vld_st2_0, CMT);
+
+  TEST_ALL_EXTRA_CHUNKS(2, 1);
+#undef CMT
+#define CMT " (chunk 1)"
+  CHECK(TEST_MSG, poly, 64, 1, PRIx64, expected_vld_st2_1, CMT);
+
+  /* Check vst3_lane/vst3q_lane.  */
+  clean_results ();
+#undef TEST_MSG
+#define TEST_MSG "VST3_LANE/VST3Q_LANE"
+  TEST_ALL_VSTX_LANE(3);
+
+#undef CMT
+#define CMT " (chunk 0)"
+  CHECK(TEST_MSG, poly, 64, 1, PRIx64, expected_vld_st3_0, CMT);
+
+  TEST_ALL_EXTRA_CHUNKS(3, 1);
+
+#undef CMT
+#define CMT " (chunk 1)"
+  CHECK(TEST_MSG, poly, 64, 1, PRIx64, expected_vld_st3_1, CMT);
+
+  TEST_ALL_EXTRA_CHUNKS(3, 2);
+
+#undef CMT
+#define CMT " (chunk 2)"
+  CHECK(TEST_MSG, poly, 64, 1, PRIx64, expected_vld_st3_2, CMT);
+
+  /* Check vst4_lane/vst4q_lane.  */
+  clean_results ();
+#undef TEST_MSG
+#define TEST_MSG "VST4_LANE/VST4Q_LANE"
+  TEST_ALL_VSTX_LANE(4);
+
+#undef CMT
+#define CMT " (chunk 0)"
+  CHECK(TEST_MSG, poly, 64, 1, PRIx64, expected_vld_st4_0, CMT);
+
+  TEST_ALL_EXTRA_CHUNKS(4, 1);
+
+#undef CMT
+#define CMT " (chunk 1)"
+  CHECK(TEST_MSG, poly, 64, 1, PRIx64, expected_vld_st4_1, CMT);
+
+  TEST_ALL_EXTRA_CHUNKS(4, 2);
+
+#undef CMT
+#define CMT " (chunk 2)"
+  CHECK(TEST_MSG, poly, 64, 1, PRIx64, expected_vld_st4_2, CMT);
+
+  TEST_ALL_EXTRA_CHUNKS(4, 3);
+
+#undef CMT
+#define CMT " (chunk 3)"
+  CHECK(TEST_MSG, poly, 64, 1, PRIx64, expected_vld_st4_3, CMT);
+
+#endif /* __aarch64__.  */
+
   return 0;
 }
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vreinterpret_p128.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vreinterpret_p128.c
index 808641524c47b2c245ee2f10e74a784a7bccefc9..f192d4dda514287c8417e7fc922bc580b209b163 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vreinterpret_p128.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vreinterpret_p128.c
@@ -1,7 +1,8 @@ 
 /* This file contains tests for the vreinterpret *p128 intrinsics.  */
 
-/* { dg-require-effective-target arm_crypto_ok } */
+/* { dg-require-effective-target arm_crypto_ok { target { arm*-*-* } } } */
 /* { dg-add-options arm_crypto } */
+/* { dg-additional-options "-march=armv8-a+crypto" { target { aarch64*-*-* } } } */
 
 #include <arm_neon.h>
 #include "arm-neon-ref.h"
@@ -78,9 +79,7 @@  VECT_VAR_DECL(vreint_expected_q_f16_p128,hfloat,16,8) [] = { 0xfff0, 0xffff,
 int main (void)
 {
   DECL_VARIABLE_128BITS_VARIANTS(vreint_vector);
-  DECL_VARIABLE(vreint_vector, poly, 64, 2);
   DECL_VARIABLE_128BITS_VARIANTS(vreint_vector_res);
-  DECL_VARIABLE(vreint_vector_res, poly, 64, 2);
 
   clean_results ();
 
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vreinterpret_p64.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vreinterpret_p64.c
index 1d8cf9aa69f0b5b0717e98de613e3c350d6395d4..c915fd2fea6b4d8770c9a4aab88caad391105d89 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vreinterpret_p64.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vreinterpret_p64.c
@@ -1,7 +1,8 @@ 
 /* This file contains tests for the vreinterpret *p64 intrinsics.  */
 
-/* { dg-require-effective-target arm_crypto_ok } */
+/* { dg-require-effective-target arm_crypto_ok { target { arm*-*-* } } } */
 /* { dg-add-options arm_crypto } */
+/* { dg-additional-options "-march=armv8-a+crypto" { target { aarch64*-*-* } } } */
 
 #include <arm_neon.h>
 #include "arm-neon-ref.h"
@@ -121,11 +122,7 @@  int main (void)
   CHECK_FP(TEST_MSG, T1, W, N, PRIx##W, EXPECTED, "");
 
   DECL_VARIABLE_ALL_VARIANTS(vreint_vector);
-  DECL_VARIABLE(vreint_vector, poly, 64, 1);
-  DECL_VARIABLE(vreint_vector, poly, 64, 2);
   DECL_VARIABLE_ALL_VARIANTS(vreint_vector_res);
-  DECL_VARIABLE(vreint_vector_res, poly, 64, 1);
-  DECL_VARIABLE(vreint_vector_res, poly, 64, 2);
 
   clean_results ();