From patchwork Tue Feb  9 17:00:57 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Charles Baylis <charles.baylis@linaro.org>
X-Patchwork-Id: 61568
Delivered-To: patch@linaro.org
Received: by 10.112.43.199 with SMTP id y7csp2159579lbl;
 Tue, 9 Feb 2016 09:01:19 -0800 (PST)
X-Received: by 10.98.69.78 with SMTP id s75mr52148625pfa.102.1455037278933; 
 Tue, 09 Feb 2016 09:01:18 -0800 (PST)
Return-Path: <gcc-patches-return-421079-patch=linaro.org@gcc.gnu.org>
Received: from sourceware.org (server1.sourceware.org. [209.132.180.131])
 by mx.google.com with ESMTPS id
 bc9si55020376pad.140.2016.02.09.09.01.18 for <patch@linaro.org>
 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
 Tue, 09 Feb 2016 09:01:18 -0800 (PST)
Received-SPF: pass (google.com: domain of
 gcc-patches-return-421079-patch=linaro.org@gcc.gnu.org
 designates 209.132.180.131 as permitted sender)
 client-ip=209.132.180.131; 
Authentication-Results: mx.google.com; spf=pass (google.com: domain of
 gcc-patches-return-421079-patch=linaro.org@gcc.gnu.org
 designates 209.132.180.131 as permitted sender)
 smtp.mailfrom=gcc-patches-return-421079-patch=linaro.org@gcc.gnu.org;
 dkim=pass header.i=@gcc.gnu.org
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id
 :list-unsubscribe:list-archive:list-post:list-help:sender
 :mime-version:in-reply-to:references:date:message-id:subject
 :from:to:cc:content-type; q=dns; s=default; b=aUMzLgNzBrCLLm9t1x
 uLDlTT91KLTCyHWIA5H9vzZl959oIdJtMgC0dYYZZiKvRqSRTYlAWFvHjBODYtLu
 /uoNch0il+65M6usQqN0s4tJ1p8xMaX9Ja9HYW0yzAfxIf8ma4SQoHNwHFNT+QSz
 ujwlCI9DULTqb9xvfLAGr0DWQ=
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id
 :list-unsubscribe:list-archive:list-post:list-help:sender
 :mime-version:in-reply-to:references:date:message-id:subject
 :from:to:cc:content-type; s=default; bh=6T/WJegC/bvSMP/mncwsWBZ/
 neU=; b=izQNGfaa9/zxgmoli5cyw55qmHwM1QwEPooytqPsfiA1ldQ1fdW0s7tR
 7WQcifNydIxJ+EHMHfbIkjP/YOKrIGK1yzY0VJrNVY7ajQ3xy9si52b/kVoFF5GR
 N47Vbx5uAZ04ZtjPL8cur0RmXo1DnxZ/xwNctGIK8ZZPhRXco38=
Received: (qmail 53029 invoked by alias); 9 Feb 2016 17:01:03 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <mailto:gcc-patches-unsubscribe-patch=linaro.org@gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Received: (qmail 53009 invoked by uid 89); 9 Feb 2016 17:01:02 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-1.3 required=5.0 tests=AWL, BAYES_50,
 RCVD_IN_DNSWL_LOW,
 SPF_PASS autolearn=ham version=3.3.2 spammy=uint8x16x2_t, *gen,
 sk:arm_evp, Baylis
X-HELO: mail-oi0-f45.google.com
Received: from mail-oi0-f45.google.com (HELO mail-oi0-f45.google.com)
 (209.85.218.45) by sourceware.org
 (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256
 encrypted) ESMTPS; Tue, 09 Feb 2016 17:00:59 +0000
Received: by mail-oi0-f45.google.com with SMTP id s4so18760692oif.3 for
 <gcc-patches@gcc.gnu.org>; Tue, 09 Feb 2016 09:00:59 -0800 (PST)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net;
 s=20130820;
 h=x-gm-message-state:mime-version:in-reply-to:references:date
 :message-id:subject:from:to:cc:content-type;
 bh=XHTVeLgDDA18x3qmJsD1rohnGQTLAuY9heSv3L6rdjQ=;
 b=UuUw11ejwtpyTSTNPNfPLjP+Lqp6oUgF/T32nnMlWxcIVwb6S+uGzoh2Jbopncsgk4
 kOK6bv4pL4vEdz+R0+YPxcDop9saTllM2rG8sDO0NGP2E/H+x1YXRgx0QBLcUonGFrxI
 zi7Iln4MzizgIWXwaWn+6ymbLnEysz+3rwgKt7X7DzZcmBwJJwCqz2Vuv7Y94toteawL
 fKT0dTX+XDcFPDfGZHGF+HqWQ5QV+P9MFgS4hgQMtAzaJNcv2vLU0KW6u3IvGFLV6HHr
 AEOELQ4HGfSx3ABR3iSoU1ulQ6zRv+LIAi9AHTbU1R5cMSsfAa2u73lkQH0n3joN+HDx
 ULbw==
X-Gm-Message-State: AG10YOQz9af/xjepds+BG0kfGYPaBd3h7i6FkgHUgzqpExKBC5DdSUMZEZcNtu5XY7w5R1elauh7Wq41lOiVXaEh
MIME-Version: 1.0
X-Received: by 10.202.194.132 with SMTP id s126mr206367oif.15.1455037257839;
 Tue, 09 Feb 2016 09:00:57 -0800 (PST)
Received: by 10.202.224.4 with HTTP; Tue, 9 Feb 2016 09:00:57 -0800 (PST)
In-Reply-To: <56B87F1D.1070905@foss.arm.com>
References: <1454525947-14690-1-git-send-email-charles.baylis@linaro.org>	<1454525947-14690-2-git-send-email-charles.baylis@linaro.org>	<56B87F1D.1070905@foss.arm.com>
Date: Tue, 9 Feb 2016 17:00:57 +0000
Message-ID: <CADnVucB=gALNdOEgo9rAq8xjykN6y_BPObLw_x6rEEe2t5C28Q@mail.gmail.com>
Subject: Re: [PATCH 1/2] [ARM] PR68532: Fix up vuzp for big endian
From: Charles Baylis <charles.baylis@linaro.org>
To: Kyrill Tkachov <kyrylo.tkachov@foss.arm.com>
Cc: Ramana Radhakrishnan <Ramana.Radhakrishnan@arm.com>,
 Richard Earnshaw <richard.earnshaw@arm.com>,
 Richard Earnshaw <rearnsha@arm.com>, GCC Patches <gcc-patches@gcc.gnu.org>,
 Michael Collison <michael.collison@linaro.org>
X-IsSubscribed: yes

On 8 February 2016 at 11:42, Kyrill Tkachov <kyrylo.tkachov@foss.arm.com> wrote:
> Hi Charles,
>
>
> On 03/02/16 18:59, charles.baylis@linaro.org wrote:
>>

>> --- a/gcc/config/arm/arm.c
>> +++ b/gcc/config/arm/arm.c
>> @@ -28208,6 +28208,35 @@ arm_expand_vec_perm (rtx target, rtx op0, rtx
>> op1, rtx sel)
>>     arm_expand_vec_perm_1 (target, op0, op1, sel);
>>   }
>>   +/* map lane ordering between architectural lane order, and GCC lane
>> order,
>> +   taking into account ABI.  See comment above output_move_neon for
>> details.  */
>> +static int
>> +neon_endian_lane_map (machine_mode mode, int lane)
>
>
> s/map/Map/
> New line between comment and function signature.

Done.

>> +{
>> +  if (BYTES_BIG_ENDIAN)
>> +  {
>> +    int nelems = GET_MODE_NUNITS (mode);
>> +    /* Reverse lane order.  */
>> +    lane = (nelems - 1 - lane);
>> +    /* Reverse D register order, to match ABI.  */
>> +    if (GET_MODE_SIZE (mode) == 16)
>> +      lane = lane ^ (nelems / 2);
>> +  }
>> +  return lane;
>> +}
>> +
>> +/* some permutations index into pairs of vectors, this is a helper
>> function
>> +   to map indexes into those pairs of vectors.  */
>> +static int
>> +neon_pair_endian_lane_map (machine_mode mode, int lane)
>
>
> Similarly, s/some/Some/ and new line after comment.

Done.

>> +{
>> +  int nelem = GET_MODE_NUNITS (mode);
>> +  if (BYTES_BIG_ENDIAN)
>> +    lane =
>> +      neon_endian_lane_map (mode, lane & (nelem - 1)) + (lane & nelem);
>> +  return lane;
>> +}
>> +
>>   /* Generate or test for an insn that supports a constant permutation.
>> */
>>     /* Recognize patterns for the VUZP insns.  */
>> @@ -28218,14 +28247,22 @@ arm_evpc_neon_vuzp (struct expand_vec_perm_d *d)
>>     unsigned int i, odd, mask, nelt = d->nelt;
>>     rtx out0, out1, in0, in1;
>>     rtx (*gen)(rtx, rtx, rtx, rtx);
>> +  int first_elem;
>> +  int swap;
>>
>
> Just make this a bool.

As discussed on IRC, this variable does contain an integer. I have
renamed it as swap_nelt, and changed the test on it below.

[snip]

>>   @@ -28258,10 +28296,9 @@ arm_evpc_neon_vuzp (struct expand_vec_perm_d
>> *d)
>>       in0 = d->op0;
>>     in1 = d->op1;
>> -  if (BYTES_BIG_ENDIAN)
>> +  if (swap)
>>       {
>>         std::swap (in0, in1);
>> -      odd = !odd;
>>       }
>
> remove the braces around the std::swap

Done. Also changed if (swap) to if (swap_nelt != 0)

[snip]

>> @@ -0,0 +1,24 @@
>> +/* { dg-options "-O2 -ftree-vectorize -fno-vect-cost-model" } */
>> +
>> +#define SIZE 128
>> +unsigned short _Alignas (16) in[SIZE];
>> +
>> +extern void abort (void);
>> +
>> +__attribute__ ((noinline)) int
>> +test (unsigned short sum, unsigned short *in, int x)
>> +{
>> +  for (int j = 0; j < SIZE; j += 8)
>> +    sum += in[j] * x;
>> +  return sum;
>> +}
>> +
>> +int
>> +main ()
>> +{
>> +  for (int i = 0; i < SIZE; i++)
>> +    in[i] = i;
>> +  if (test (0, in, 1) != 960)
>> +    abort ();
>
>
> AFAIK tests here usually prefer __builtin_abort ();
> That way you don't have to declare the abort prototype in the beginning.

Done.

Updated patch attached

>From 99a536e2e10e3759a5de88422fadcabb22084b2f Mon Sep 17 00:00:00 2001
From: Charles Baylis <charles.baylis@linaro.org>
Date: Tue, 9 Feb 2016 15:18:43 +0000
Subject: [PATCH 1/2] [ARM] PR68532: Fix up vuzp for big endian

gcc/ChangeLog:

2016-02-09  Charles Baylis  <charles.baylis@linaro.org>

	PR target/68532
	* config/arm/arm.c (neon_endian_lane_map): New function.
	(neon_vector_pair_endian_lane_map): New function.
	(arm_evpc_neon_vuzp): Allow for big endian lane order.
	* config/arm/arm_neon.h (vuzpq_s8): Adjust shuffle patterns for big
	endian.
	(vuzpq_s16): Likewise.
	(vuzpq_s32): Likewise.
	(vuzpq_f32): Likewise.
	(vuzpq_u8): Likewise.
	(vuzpq_u16): Likewise.
	(vuzpq_u32): Likewise.
	(vuzpq_p8): Likewise.
	(vuzpq_p16): Likewise.

gcc/testsuite/ChangeLog:

2016-02-09  Charles Baylis  <charles.baylis@linaro.org>

	PR target/68532
	* gcc.c-torture/execute/pr68532.c: New test.

Change-Id: Ifd35d79bd42825f05403a1b96d8f34ef0f21dac3

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index d8a2745..95ee9a5 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -28208,6 +28208,37 @@ arm_expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel)
   arm_expand_vec_perm_1 (target, op0, op1, sel);
 }
 
+/* Map lane ordering between architectural lane order, and GCC lane order,
+   taking into account ABI.  See comment above output_move_neon for details.  */
+
+static int
+neon_endian_lane_map (machine_mode mode, int lane)
+{
+  if (BYTES_BIG_ENDIAN)
+  {
+    int nelems = GET_MODE_NUNITS (mode);
+    /* Reverse lane order.  */
+    lane = (nelems - 1 - lane);
+    /* Reverse D register order, to match ABI.  */
+    if (GET_MODE_SIZE (mode) == 16)
+      lane = lane ^ (nelems / 2);
+  }
+  return lane;
+}
+
+/* Some permutations index into pairs of vectors, this is a helper function
+   to map indexes into those pairs of vectors.  */
+
+static int
+neon_pair_endian_lane_map (machine_mode mode, int lane)
+{
+  int nelem = GET_MODE_NUNITS (mode);
+  if (BYTES_BIG_ENDIAN)
+    lane =
+      neon_endian_lane_map (mode, lane & (nelem - 1)) + (lane & nelem);
+  return lane;
+}
+
 /* Generate or test for an insn that supports a constant permutation.  */
 
 /* Recognize patterns for the VUZP insns.  */
@@ -28218,14 +28249,22 @@ arm_evpc_neon_vuzp (struct expand_vec_perm_d *d)
   unsigned int i, odd, mask, nelt = d->nelt;
   rtx out0, out1, in0, in1;
   rtx (*gen)(rtx, rtx, rtx, rtx);
+  int first_elem;
+  int swap_nelt;
 
   if (GET_MODE_UNIT_SIZE (d->vmode) >= 8)
     return false;
 
-  /* Note that these are little-endian tests.  Adjust for big-endian later.  */
-  if (d->perm[0] == 0)
+  /* arm_expand_vec_perm_const_1 () helpfully swaps the operands for the
+     big endian pattern on 64 bit vectors, so we correct for that.  */
+  swap_nelt = BYTES_BIG_ENDIAN && !d->one_vector_p
+    && GET_MODE_SIZE (d->vmode) == 8 ? d->nelt : 0;
+
+  first_elem = d->perm[neon_endian_lane_map (d->vmode, 0)] ^ swap_nelt;
+
+  if (first_elem == neon_endian_lane_map (d->vmode, 0))
     odd = 0;
-  else if (d->perm[0] == 1)
+  else if (first_elem == neon_endian_lane_map (d->vmode, 1))
     odd = 1;
   else
     return false;
@@ -28233,8 +28272,9 @@ arm_evpc_neon_vuzp (struct expand_vec_perm_d *d)
 
   for (i = 0; i < nelt; i++)
     {
-      unsigned elt = (i * 2 + odd) & mask;
-      if (d->perm[i] != elt)
+      unsigned elt =
+	(neon_pair_endian_lane_map (d->vmode, i) * 2 + odd) & mask;
+      if ((d->perm[i] ^ swap_nelt) != neon_pair_endian_lane_map (d->vmode, elt))
 	return false;
     }
 
@@ -28258,11 +28298,8 @@ arm_evpc_neon_vuzp (struct expand_vec_perm_d *d)
 
   in0 = d->op0;
   in1 = d->op1;
-  if (BYTES_BIG_ENDIAN)
-    {
-      std::swap (in0, in1);
-      odd = !odd;
-    }
+  if (swap_nelt != 0)
+    std::swap (in0, in1);
 
   out0 = d->target;
   out1 = gen_reg_rtx (d->vmode);
diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 47816d5..2e014b6 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -8741,9 +8741,9 @@ vuzpq_s8 (int8x16_t __a, int8x16_t __b)
   int8x16x2_t __rv;
 #ifdef __ARM_BIG_ENDIAN
   __rv.val[0] = __builtin_shuffle (__a, __b, (uint8x16_t)
-      { 17, 19, 21, 23, 25, 27, 29, 31, 1, 3, 5, 7, 9, 11, 13, 15 });
+      { 9, 11, 13, 15, 1, 3, 5, 7, 25, 27, 29, 31, 17, 19, 21, 23 });
   __rv.val[1] = __builtin_shuffle (__a, __b, (uint8x16_t)
-      { 16, 18, 20, 22, 24, 26, 28, 30, 0, 2, 4, 6, 8, 10, 12, 14 });
+      { 8, 10, 12, 14, 0, 2, 4, 6, 24, 26, 28, 30, 16, 18, 20, 22 });
 #else
   __rv.val[0] = __builtin_shuffle (__a, __b, (uint8x16_t)
       { 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30 });
@@ -8759,9 +8759,9 @@ vuzpq_s16 (int16x8_t __a, int16x8_t __b)
   int16x8x2_t __rv;
 #ifdef __ARM_BIG_ENDIAN
   __rv.val[0] = __builtin_shuffle (__a, __b, (uint16x8_t)
-      { 9, 11, 13, 15, 1, 3, 5, 7 });
+      { 5, 7, 1, 3, 13, 15, 9, 11 });
   __rv.val[1] = __builtin_shuffle (__a, __b, (uint16x8_t)
-      { 8, 10, 12, 14, 0, 2, 4, 6 });
+      { 4, 6, 0, 2, 12, 14, 8, 10 });
 #else
   __rv.val[0] = __builtin_shuffle (__a, __b, (uint16x8_t)
       { 0, 2, 4, 6, 8, 10, 12, 14 });
@@ -8776,8 +8776,8 @@ vuzpq_s32 (int32x4_t __a, int32x4_t __b)
 {
   int32x4x2_t __rv;
 #ifdef __ARM_BIG_ENDIAN
-  __rv.val[0] = __builtin_shuffle (__a, __b, (uint32x4_t) { 5, 7, 1, 3 });
-  __rv.val[1] = __builtin_shuffle (__a, __b, (uint32x4_t) { 4, 6, 0, 2 });
+  __rv.val[0] = __builtin_shuffle (__a, __b, (uint32x4_t) { 3, 1, 7, 5 });
+  __rv.val[1] = __builtin_shuffle (__a, __b, (uint32x4_t) { 2, 0, 6, 4 });
 #else
   __rv.val[0] = __builtin_shuffle (__a, __b, (uint32x4_t) { 0, 2, 4, 6 });
   __rv.val[1] = __builtin_shuffle (__a, __b, (uint32x4_t) { 1, 3, 5, 7 });
@@ -8790,8 +8790,8 @@ vuzpq_f32 (float32x4_t __a, float32x4_t __b)
 {
   float32x4x2_t __rv;
 #ifdef __ARM_BIG_ENDIAN
-  __rv.val[0] = __builtin_shuffle (__a, __b, (uint32x4_t) { 5, 7, 1, 3 });
-  __rv.val[1] = __builtin_shuffle (__a, __b, (uint32x4_t) { 4, 6, 0, 2 });
+  __rv.val[0] = __builtin_shuffle (__a, __b, (uint32x4_t) { 3, 1, 7, 5 });
+  __rv.val[1] = __builtin_shuffle (__a, __b, (uint32x4_t) { 2, 0, 6, 4 });
 #else
   __rv.val[0] = __builtin_shuffle (__a, __b, (uint32x4_t) { 0, 2, 4, 6 });
   __rv.val[1] = __builtin_shuffle (__a, __b, (uint32x4_t) { 1, 3, 5, 7 });
@@ -8805,9 +8805,9 @@ vuzpq_u8 (uint8x16_t __a, uint8x16_t __b)
   uint8x16x2_t __rv;
 #ifdef __ARM_BIG_ENDIAN
   __rv.val[0] = __builtin_shuffle (__a, __b, (uint8x16_t)
-      { 17, 19, 21, 23, 25, 27, 29, 31, 1, 3, 5, 7, 9, 11, 13, 15 });
+      { 9, 11, 13, 15, 1, 3, 5, 7, 25, 27, 29, 31, 17, 19, 21, 23 });
   __rv.val[1] = __builtin_shuffle (__a, __b, (uint8x16_t)
-      { 16, 18, 20, 22, 24, 26, 28, 30, 0, 2, 4, 6, 8, 10, 12, 14 });
+      { 8, 10, 12, 14, 0, 2, 4, 6, 24, 26, 28, 30, 16, 18, 20, 22 });
 #else
   __rv.val[0] = __builtin_shuffle (__a, __b, (uint8x16_t)
       { 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30 });
@@ -8823,9 +8823,9 @@ vuzpq_u16 (uint16x8_t __a, uint16x8_t __b)
   uint16x8x2_t __rv;
 #ifdef __ARM_BIG_ENDIAN
   __rv.val[0] = __builtin_shuffle (__a, __b, (uint16x8_t)
-      { 9, 11, 13, 15, 1, 3, 5, 7 });
+      { 5, 7, 1, 3, 13, 15, 9, 11 });
   __rv.val[1] = __builtin_shuffle (__a, __b, (uint16x8_t)
-      { 8, 10, 12, 14, 0, 2, 4, 6 });
+      { 4, 6, 0, 2, 12, 14, 8, 10 });
 #else
   __rv.val[0] = __builtin_shuffle (__a, __b, (uint16x8_t)
       { 0, 2, 4, 6, 8, 10, 12, 14 });
@@ -8840,8 +8840,8 @@ vuzpq_u32 (uint32x4_t __a, uint32x4_t __b)
 {
   uint32x4x2_t __rv;
 #ifdef __ARM_BIG_ENDIAN
-  __rv.val[0] = __builtin_shuffle (__a, __b, (uint32x4_t) { 5, 7, 1, 3 });
-  __rv.val[1] = __builtin_shuffle (__a, __b, (uint32x4_t) { 4, 6, 0, 2 });
+  __rv.val[0] = __builtin_shuffle (__a, __b, (uint32x4_t) { 3, 1, 7, 5 });
+  __rv.val[1] = __builtin_shuffle (__a, __b, (uint32x4_t) { 2, 0, 6, 4 });
 #else
   __rv.val[0] = __builtin_shuffle (__a, __b, (uint32x4_t) { 0, 2, 4, 6 });
   __rv.val[1] = __builtin_shuffle (__a, __b, (uint32x4_t) { 1, 3, 5, 7 });
@@ -8855,9 +8855,9 @@ vuzpq_p8 (poly8x16_t __a, poly8x16_t __b)
   poly8x16x2_t __rv;
 #ifdef __ARM_BIG_ENDIAN
   __rv.val[0] = __builtin_shuffle (__a, __b, (uint8x16_t)
-      { 17, 19, 21, 23, 25, 27, 29, 31, 1, 3, 5, 7, 9, 11, 13, 15 });
+      { 9, 11, 13, 15, 1, 3, 5, 7, 25, 27, 29, 31, 17, 19, 21, 23 });
   __rv.val[1] = __builtin_shuffle (__a, __b, (uint8x16_t)
-      { 16, 18, 20, 22, 24, 26, 28, 30, 0, 2, 4, 6, 8, 10, 12, 14 });
+      { 8, 10, 12, 14, 0, 2, 4, 6, 24, 26, 28, 30, 16, 18, 20, 22 });
 #else
   __rv.val[0] = __builtin_shuffle (__a, __b, (uint8x16_t)
       { 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30 });
@@ -8873,9 +8873,9 @@ vuzpq_p16 (poly16x8_t __a, poly16x8_t __b)
   poly16x8x2_t __rv;
 #ifdef __ARM_BIG_ENDIAN
   __rv.val[0] = __builtin_shuffle (__a, __b, (uint16x8_t)
-      { 9, 11, 13, 15, 1, 3, 5, 7 });
+      { 5, 7, 1, 3, 13, 15, 9, 11 });
   __rv.val[1] = __builtin_shuffle (__a, __b, (uint16x8_t)
-      { 8, 10, 12, 14, 0, 2, 4, 6 });
+      { 4, 6, 0, 2, 12, 14, 8, 10 });
 #else
   __rv.val[0] = __builtin_shuffle (__a, __b, (uint16x8_t)
       { 0, 2, 4, 6, 8, 10, 12, 14 });
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr68532.c b/gcc/testsuite/gcc.c-torture/execute/pr68532.c
new file mode 100644
index 0000000..5d4bd8e
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr68532.c
@@ -0,0 +1,22 @@
+/* { dg-options "-O2 -ftree-vectorize -fno-vect-cost-model" } */
+
+#define SIZE 128
+unsigned short _Alignas (16) in[SIZE];
+
+__attribute__ ((noinline)) int
+test (unsigned short sum, unsigned short *in, int x)
+{
+  for (int j = 0; j < SIZE; j += 8)
+    sum += in[j] * x;
+  return sum;
+}
+
+int
+main ()
+{
+  for (int i = 0; i < SIZE; i++)
+    in[i] = i;
+  if (test (0, in, 1) != 960)
+    __builtin_abort ();
+  return 0;
+}
-- 
1.9.1