From patchwork Wed Feb 3 18:59:06 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Charles Baylis X-Patchwork-Id: 61139 Delivered-To: patch@linaro.org Received: by 10.112.43.199 with SMTP id y7csp23063lbl; Wed, 3 Feb 2016 10:59:55 -0800 (PST) X-Received: by 10.66.159.38 with SMTP id wz6mr4751267pab.12.1454525995049; Wed, 03 Feb 2016 10:59:55 -0800 (PST) Return-Path: Received: from sourceware.org (server1.sourceware.org. [209.132.180.131]) by mx.google.com with ESMTPS id o12si10834064pfa.162.2016.02.03.10.59.54 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 03 Feb 2016 10:59:55 -0800 (PST) Received-SPF: pass (google.com: domain of gcc-patches-return-420676-patch=linaro.org@gcc.gnu.org designates 209.132.180.131 as permitted sender) client-ip=209.132.180.131; Authentication-Results: mx.google.com; spf=pass (google.com: domain of gcc-patches-return-420676-patch=linaro.org@gcc.gnu.org designates 209.132.180.131 as permitted sender) smtp.mailfrom=gcc-patches-return-420676-patch=linaro.org@gcc.gnu.org; dkim=pass header.i=@gcc.gnu.org DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id:in-reply-to:references; q=dns; s= default; b=GcixXhdTsPtfd7xAW5ugjYLdef5OXASkiVNpVYZ8uNqIN1X/4Ovcq VjUjp5/hN7WlZMPkETaXH2AVQ7+ffIFWGBrocnnVv22bdMa5ecStB6jPpfZ3AvIU Q6VNzNYRw7JmJQWSoq6hODxu89UZJM8DmOpJnCtP4z8FUBw+crCQNg= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id:in-reply-to:references; s= default; bh=rv79RSOGB2rz2oACHt/B/RZtcN8=; b=XWxsvFpUHygmsflB+Nxy xvtNC+T/dr2GV3aAdQPC2Z7hkCFS9fw24UQvD5U2qi3KKHaCcCQyn6keHkAR7VY5 0632b12gp0+A90e2HWhNoozmxDzAmmBJzU/6FqDGTG23/vbob4nh8mnSLgEXHKIO 44de1hsJKTxZpJBzbQ56/uU= Received: (qmail 36686 invoked by alias); 3 Feb 2016 18:59:29 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 36606 invoked by uid 89); 3 Feb 2016 18:59:28 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.6 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_LOW, SPF_PASS autolearn=ham version=3.3.2 spammy=Recognize, elt, Hx-languages-length:9140, Reverse X-HELO: mail-wm0-f52.google.com Received: from mail-wm0-f52.google.com (HELO mail-wm0-f52.google.com) (74.125.82.52) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256 encrypted) ESMTPS; Wed, 03 Feb 2016 18:59:26 +0000 Received: by mail-wm0-f52.google.com with SMTP id p63so84759825wmp.1 for ; Wed, 03 Feb 2016 10:59:26 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=33ttP2lOgcGgb3Ubq3XdaQQ3V/GQF+t1ROnJ/u1yLr0=; b=Mu1fGgJCIOT0icWr0+PsNjRmjT4+7b2EPBCJZts7Il2cr24VI/FvzcAklFsN5Scw7Q xmAXYhBCo6s9aLHe4Uc8QzlB0TwqYQ5yEGlgTkV6LtXdbl7p73qWMSznd5H1GBqex2Ey TfDkvf6HkydGhEw4nr5pxE3qZFpg6FSMso7Wbftsk1m1WwRkmkP0MKzo/oOfBe5Tu6l3 hLVQ3EcU4YTplDYdyR2NRLLuuPgZ1DvpU9rz1C48N3BztVCkv1XoT1byrYyKDR9lKBJr hVXmtaXZhI44qTymDkx1WiHNELk1IWmiCg2MgTFk8osYFzlu+A2WilQeVM26LlqK8+lE N3AQ== X-Gm-Message-State: AG10YOSI3oOhfTrzxqhcgSgYXGXIJiIuFWN0Zdl2YjkZz0iJM/It6Wu8ezRWGaHfzWPRr33z X-Received: by 10.28.218.81 with SMTP id r78mr5773014wmg.91.1454525963569; Wed, 03 Feb 2016 10:59:23 -0800 (PST) Received: from localhost.localdomain (cpc92322-cmbg19-2-0-cust1928.5-4.cable.virginm.net. [86.26.39.137]) by smtp.gmail.com with ESMTPSA id w62sm22774792wmg.21.2016.02.03.10.59.22 (version=TLS1_2 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Wed, 03 Feb 2016 10:59:22 -0800 (PST) From: charles.baylis@linaro.org To: Ramana.Radhakrishnan@arm.com, kyrylo.tkachov@arm.com, richard.earnshaw@arm.com Cc: rearnsha@arm.com, gcc-patches@gcc.gnu.org, michael.collison@linaro.org Subject: [PATCH 1/2] [ARM] PR68532: Fix up vuzp for big endian Date: Wed, 3 Feb 2016 18:59:06 +0000 Message-Id: <1454525947-14690-2-git-send-email-charles.baylis@linaro.org> In-Reply-To: <1454525947-14690-1-git-send-email-charles.baylis@linaro.org> References: <1454525947-14690-1-git-send-email-charles.baylis@linaro.org> X-IsSubscribed: yes From: Charles Baylis gcc/ChangeLog: 2016-02-03 Charles Baylis PR target/68532 * config/arm/arm.c (neon_endian_lane_map): New function. (neon_vector_pair_endian_lane_map): New function. (arm_evpc_neon_vuzp): Allow for big endian lane order. * config/arm/arm_neon.h (vuzpq_s8): Adjust shuffle patterns for big endian. (vuzpq_s16): Likewise. (vuzpq_s32): Likewise. (vuzpq_f32): Likewise. (vuzpq_u8): Likewise. (vuzpq_u16): Likewise. (vuzpq_u32): Likewise. (vuzpq_p8): Likewise. (vuzpq_p16): Likewise. gcc/testsuite/ChangeLog: 2015-12-15 Charles Baylis PR target/68532 * gcc.c-torture/execute/pr68532.c: New test. Change-Id: Ifd35d79bd42825f05403a1b96d8f34ef0f21dac3 -- 1.9.1 diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index d8a2745..e9aa982 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -28208,6 +28208,35 @@ arm_expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel) arm_expand_vec_perm_1 (target, op0, op1, sel); } +/* map lane ordering between architectural lane order, and GCC lane order, + taking into account ABI. See comment above output_move_neon for details. */ +static int +neon_endian_lane_map (machine_mode mode, int lane) +{ + if (BYTES_BIG_ENDIAN) + { + int nelems = GET_MODE_NUNITS (mode); + /* Reverse lane order. */ + lane = (nelems - 1 - lane); + /* Reverse D register order, to match ABI. */ + if (GET_MODE_SIZE (mode) == 16) + lane = lane ^ (nelems / 2); + } + return lane; +} + +/* some permutations index into pairs of vectors, this is a helper function + to map indexes into those pairs of vectors. */ +static int +neon_pair_endian_lane_map (machine_mode mode, int lane) +{ + int nelem = GET_MODE_NUNITS (mode); + if (BYTES_BIG_ENDIAN) + lane = + neon_endian_lane_map (mode, lane & (nelem - 1)) + (lane & nelem); + return lane; +} + /* Generate or test for an insn that supports a constant permutation. */ /* Recognize patterns for the VUZP insns. */ @@ -28218,14 +28247,22 @@ arm_evpc_neon_vuzp (struct expand_vec_perm_d *d) unsigned int i, odd, mask, nelt = d->nelt; rtx out0, out1, in0, in1; rtx (*gen)(rtx, rtx, rtx, rtx); + int first_elem; + int swap; if (GET_MODE_UNIT_SIZE (d->vmode) >= 8) return false; - /* Note that these are little-endian tests. Adjust for big-endian later. */ - if (d->perm[0] == 0) + /* arm_expand_vec_perm_const_1 () helpfully swaps the operands for the + big endian pattern on 64 bit vectors, so we correct for that. */ + swap = BYTES_BIG_ENDIAN && !d->one_vector_p + && GET_MODE_SIZE (d->vmode) == 8 ? d->nelt : 0; + + first_elem = d->perm[neon_endian_lane_map (d->vmode, 0)] ^ swap; + + if (first_elem == neon_endian_lane_map (d->vmode, 0)) odd = 0; - else if (d->perm[0] == 1) + else if (first_elem == neon_endian_lane_map (d->vmode, 1)) odd = 1; else return false; @@ -28233,8 +28270,9 @@ arm_evpc_neon_vuzp (struct expand_vec_perm_d *d) for (i = 0; i < nelt; i++) { - unsigned elt = (i * 2 + odd) & mask; - if (d->perm[i] != elt) + unsigned elt = + (neon_pair_endian_lane_map (d->vmode, i) * 2 + odd) & mask; + if ((d->perm[i] ^ swap) != neon_pair_endian_lane_map (d->vmode, elt)) return false; } @@ -28258,10 +28296,9 @@ arm_evpc_neon_vuzp (struct expand_vec_perm_d *d) in0 = d->op0; in1 = d->op1; - if (BYTES_BIG_ENDIAN) + if (swap) { std::swap (in0, in1); - odd = !odd; } out0 = d->target; diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h index 47816d5..2e014b6 100644 --- a/gcc/config/arm/arm_neon.h +++ b/gcc/config/arm/arm_neon.h @@ -8741,9 +8741,9 @@ vuzpq_s8 (int8x16_t __a, int8x16_t __b) int8x16x2_t __rv; #ifdef __ARM_BIG_ENDIAN __rv.val[0] = __builtin_shuffle (__a, __b, (uint8x16_t) - { 17, 19, 21, 23, 25, 27, 29, 31, 1, 3, 5, 7, 9, 11, 13, 15 }); + { 9, 11, 13, 15, 1, 3, 5, 7, 25, 27, 29, 31, 17, 19, 21, 23 }); __rv.val[1] = __builtin_shuffle (__a, __b, (uint8x16_t) - { 16, 18, 20, 22, 24, 26, 28, 30, 0, 2, 4, 6, 8, 10, 12, 14 }); + { 8, 10, 12, 14, 0, 2, 4, 6, 24, 26, 28, 30, 16, 18, 20, 22 }); #else __rv.val[0] = __builtin_shuffle (__a, __b, (uint8x16_t) { 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30 }); @@ -8759,9 +8759,9 @@ vuzpq_s16 (int16x8_t __a, int16x8_t __b) int16x8x2_t __rv; #ifdef __ARM_BIG_ENDIAN __rv.val[0] = __builtin_shuffle (__a, __b, (uint16x8_t) - { 9, 11, 13, 15, 1, 3, 5, 7 }); + { 5, 7, 1, 3, 13, 15, 9, 11 }); __rv.val[1] = __builtin_shuffle (__a, __b, (uint16x8_t) - { 8, 10, 12, 14, 0, 2, 4, 6 }); + { 4, 6, 0, 2, 12, 14, 8, 10 }); #else __rv.val[0] = __builtin_shuffle (__a, __b, (uint16x8_t) { 0, 2, 4, 6, 8, 10, 12, 14 }); @@ -8776,8 +8776,8 @@ vuzpq_s32 (int32x4_t __a, int32x4_t __b) { int32x4x2_t __rv; #ifdef __ARM_BIG_ENDIAN - __rv.val[0] = __builtin_shuffle (__a, __b, (uint32x4_t) { 5, 7, 1, 3 }); - __rv.val[1] = __builtin_shuffle (__a, __b, (uint32x4_t) { 4, 6, 0, 2 }); + __rv.val[0] = __builtin_shuffle (__a, __b, (uint32x4_t) { 3, 1, 7, 5 }); + __rv.val[1] = __builtin_shuffle (__a, __b, (uint32x4_t) { 2, 0, 6, 4 }); #else __rv.val[0] = __builtin_shuffle (__a, __b, (uint32x4_t) { 0, 2, 4, 6 }); __rv.val[1] = __builtin_shuffle (__a, __b, (uint32x4_t) { 1, 3, 5, 7 }); @@ -8790,8 +8790,8 @@ vuzpq_f32 (float32x4_t __a, float32x4_t __b) { float32x4x2_t __rv; #ifdef __ARM_BIG_ENDIAN - __rv.val[0] = __builtin_shuffle (__a, __b, (uint32x4_t) { 5, 7, 1, 3 }); - __rv.val[1] = __builtin_shuffle (__a, __b, (uint32x4_t) { 4, 6, 0, 2 }); + __rv.val[0] = __builtin_shuffle (__a, __b, (uint32x4_t) { 3, 1, 7, 5 }); + __rv.val[1] = __builtin_shuffle (__a, __b, (uint32x4_t) { 2, 0, 6, 4 }); #else __rv.val[0] = __builtin_shuffle (__a, __b, (uint32x4_t) { 0, 2, 4, 6 }); __rv.val[1] = __builtin_shuffle (__a, __b, (uint32x4_t) { 1, 3, 5, 7 }); @@ -8805,9 +8805,9 @@ vuzpq_u8 (uint8x16_t __a, uint8x16_t __b) uint8x16x2_t __rv; #ifdef __ARM_BIG_ENDIAN __rv.val[0] = __builtin_shuffle (__a, __b, (uint8x16_t) - { 17, 19, 21, 23, 25, 27, 29, 31, 1, 3, 5, 7, 9, 11, 13, 15 }); + { 9, 11, 13, 15, 1, 3, 5, 7, 25, 27, 29, 31, 17, 19, 21, 23 }); __rv.val[1] = __builtin_shuffle (__a, __b, (uint8x16_t) - { 16, 18, 20, 22, 24, 26, 28, 30, 0, 2, 4, 6, 8, 10, 12, 14 }); + { 8, 10, 12, 14, 0, 2, 4, 6, 24, 26, 28, 30, 16, 18, 20, 22 }); #else __rv.val[0] = __builtin_shuffle (__a, __b, (uint8x16_t) { 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30 }); @@ -8823,9 +8823,9 @@ vuzpq_u16 (uint16x8_t __a, uint16x8_t __b) uint16x8x2_t __rv; #ifdef __ARM_BIG_ENDIAN __rv.val[0] = __builtin_shuffle (__a, __b, (uint16x8_t) - { 9, 11, 13, 15, 1, 3, 5, 7 }); + { 5, 7, 1, 3, 13, 15, 9, 11 }); __rv.val[1] = __builtin_shuffle (__a, __b, (uint16x8_t) - { 8, 10, 12, 14, 0, 2, 4, 6 }); + { 4, 6, 0, 2, 12, 14, 8, 10 }); #else __rv.val[0] = __builtin_shuffle (__a, __b, (uint16x8_t) { 0, 2, 4, 6, 8, 10, 12, 14 }); @@ -8840,8 +8840,8 @@ vuzpq_u32 (uint32x4_t __a, uint32x4_t __b) { uint32x4x2_t __rv; #ifdef __ARM_BIG_ENDIAN - __rv.val[0] = __builtin_shuffle (__a, __b, (uint32x4_t) { 5, 7, 1, 3 }); - __rv.val[1] = __builtin_shuffle (__a, __b, (uint32x4_t) { 4, 6, 0, 2 }); + __rv.val[0] = __builtin_shuffle (__a, __b, (uint32x4_t) { 3, 1, 7, 5 }); + __rv.val[1] = __builtin_shuffle (__a, __b, (uint32x4_t) { 2, 0, 6, 4 }); #else __rv.val[0] = __builtin_shuffle (__a, __b, (uint32x4_t) { 0, 2, 4, 6 }); __rv.val[1] = __builtin_shuffle (__a, __b, (uint32x4_t) { 1, 3, 5, 7 }); @@ -8855,9 +8855,9 @@ vuzpq_p8 (poly8x16_t __a, poly8x16_t __b) poly8x16x2_t __rv; #ifdef __ARM_BIG_ENDIAN __rv.val[0] = __builtin_shuffle (__a, __b, (uint8x16_t) - { 17, 19, 21, 23, 25, 27, 29, 31, 1, 3, 5, 7, 9, 11, 13, 15 }); + { 9, 11, 13, 15, 1, 3, 5, 7, 25, 27, 29, 31, 17, 19, 21, 23 }); __rv.val[1] = __builtin_shuffle (__a, __b, (uint8x16_t) - { 16, 18, 20, 22, 24, 26, 28, 30, 0, 2, 4, 6, 8, 10, 12, 14 }); + { 8, 10, 12, 14, 0, 2, 4, 6, 24, 26, 28, 30, 16, 18, 20, 22 }); #else __rv.val[0] = __builtin_shuffle (__a, __b, (uint8x16_t) { 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30 }); @@ -8873,9 +8873,9 @@ vuzpq_p16 (poly16x8_t __a, poly16x8_t __b) poly16x8x2_t __rv; #ifdef __ARM_BIG_ENDIAN __rv.val[0] = __builtin_shuffle (__a, __b, (uint16x8_t) - { 9, 11, 13, 15, 1, 3, 5, 7 }); + { 5, 7, 1, 3, 13, 15, 9, 11 }); __rv.val[1] = __builtin_shuffle (__a, __b, (uint16x8_t) - { 8, 10, 12, 14, 0, 2, 4, 6 }); + { 4, 6, 0, 2, 12, 14, 8, 10 }); #else __rv.val[0] = __builtin_shuffle (__a, __b, (uint16x8_t) { 0, 2, 4, 6, 8, 10, 12, 14 }); diff --git a/gcc/testsuite/gcc.c-torture/execute/pr68532.c b/gcc/testsuite/gcc.c-torture/execute/pr68532.c new file mode 100644 index 0000000..3c40aa8 --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/execute/pr68532.c @@ -0,0 +1,24 @@ +/* { dg-options "-O2 -ftree-vectorize -fno-vect-cost-model" } */ + +#define SIZE 128 +unsigned short _Alignas (16) in[SIZE]; + +extern void abort (void); + +__attribute__ ((noinline)) int +test (unsigned short sum, unsigned short *in, int x) +{ + for (int j = 0; j < SIZE; j += 8) + sum += in[j] * x; + return sum; +} + +int +main () +{ + for (int i = 0; i < SIZE; i++) + in[i] = i; + if (test (0, in, 1) != 960) + abort (); + return 0; +}