From patchwork Tue Feb 9 17:00:57 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Charles Baylis X-Patchwork-Id: 61568 Delivered-To: patch@linaro.org Received: by 10.112.43.199 with SMTP id y7csp2159579lbl; Tue, 9 Feb 2016 09:01:19 -0800 (PST) X-Received: by 10.98.69.78 with SMTP id s75mr52148625pfa.102.1455037278933; Tue, 09 Feb 2016 09:01:18 -0800 (PST) Return-Path: Received: from sourceware.org (server1.sourceware.org. [209.132.180.131]) by mx.google.com with ESMTPS id bc9si55020376pad.140.2016.02.09.09.01.18 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Tue, 09 Feb 2016 09:01:18 -0800 (PST) Received-SPF: pass (google.com: domain of gcc-patches-return-421079-patch=linaro.org@gcc.gnu.org designates 209.132.180.131 as permitted sender) client-ip=209.132.180.131; Authentication-Results: mx.google.com; spf=pass (google.com: domain of gcc-patches-return-421079-patch=linaro.org@gcc.gnu.org designates 209.132.180.131 as permitted sender) smtp.mailfrom=gcc-patches-return-421079-patch=linaro.org@gcc.gnu.org; dkim=pass header.i=@gcc.gnu.org DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :mime-version:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; q=dns; s=default; b=aUMzLgNzBrCLLm9t1x uLDlTT91KLTCyHWIA5H9vzZl959oIdJtMgC0dYYZZiKvRqSRTYlAWFvHjBODYtLu /uoNch0il+65M6usQqN0s4tJ1p8xMaX9Ja9HYW0yzAfxIf8ma4SQoHNwHFNT+QSz ujwlCI9DULTqb9xvfLAGr0DWQ= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :mime-version:in-reply-to:references:date:message-id:subject :from:to:cc:content-type; s=default; bh=6T/WJegC/bvSMP/mncwsWBZ/ neU=; b=izQNGfaa9/zxgmoli5cyw55qmHwM1QwEPooytqPsfiA1ldQ1fdW0s7tR 7WQcifNydIxJ+EHMHfbIkjP/YOKrIGK1yzY0VJrNVY7ajQ3xy9si52b/kVoFF5GR N47Vbx5uAZ04ZtjPL8cur0RmXo1DnxZ/xwNctGIK8ZZPhRXco38= Received: (qmail 53029 invoked by alias); 9 Feb 2016 17:01:03 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 53009 invoked by uid 89); 9 Feb 2016 17:01:02 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.3 required=5.0 tests=AWL, BAYES_50, RCVD_IN_DNSWL_LOW, SPF_PASS autolearn=ham version=3.3.2 spammy=uint8x16x2_t, *gen, sk:arm_evp, Baylis X-HELO: mail-oi0-f45.google.com Received: from mail-oi0-f45.google.com (HELO mail-oi0-f45.google.com) (209.85.218.45) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256 encrypted) ESMTPS; Tue, 09 Feb 2016 17:00:59 +0000 Received: by mail-oi0-f45.google.com with SMTP id s4so18760692oif.3 for ; Tue, 09 Feb 2016 09:00:59 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=XHTVeLgDDA18x3qmJsD1rohnGQTLAuY9heSv3L6rdjQ=; b=UuUw11ejwtpyTSTNPNfPLjP+Lqp6oUgF/T32nnMlWxcIVwb6S+uGzoh2Jbopncsgk4 kOK6bv4pL4vEdz+R0+YPxcDop9saTllM2rG8sDO0NGP2E/H+x1YXRgx0QBLcUonGFrxI zi7Iln4MzizgIWXwaWn+6ymbLnEysz+3rwgKt7X7DzZcmBwJJwCqz2Vuv7Y94toteawL fKT0dTX+XDcFPDfGZHGF+HqWQ5QV+P9MFgS4hgQMtAzaJNcv2vLU0KW6u3IvGFLV6HHr AEOELQ4HGfSx3ABR3iSoU1ulQ6zRv+LIAi9AHTbU1R5cMSsfAa2u73lkQH0n3joN+HDx ULbw== X-Gm-Message-State: AG10YOQz9af/xjepds+BG0kfGYPaBd3h7i6FkgHUgzqpExKBC5DdSUMZEZcNtu5XY7w5R1elauh7Wq41lOiVXaEh MIME-Version: 1.0 X-Received: by 10.202.194.132 with SMTP id s126mr206367oif.15.1455037257839; Tue, 09 Feb 2016 09:00:57 -0800 (PST) Received: by 10.202.224.4 with HTTP; Tue, 9 Feb 2016 09:00:57 -0800 (PST) In-Reply-To: <56B87F1D.1070905@foss.arm.com> References: <1454525947-14690-1-git-send-email-charles.baylis@linaro.org> <1454525947-14690-2-git-send-email-charles.baylis@linaro.org> <56B87F1D.1070905@foss.arm.com> Date: Tue, 9 Feb 2016 17:00:57 +0000 Message-ID: Subject: Re: [PATCH 1/2] [ARM] PR68532: Fix up vuzp for big endian From: Charles Baylis To: Kyrill Tkachov Cc: Ramana Radhakrishnan , Richard Earnshaw , Richard Earnshaw , GCC Patches , Michael Collison X-IsSubscribed: yes On 8 February 2016 at 11:42, Kyrill Tkachov wrote: > Hi Charles, > > > On 03/02/16 18:59, charles.baylis@linaro.org wrote: >> >> --- a/gcc/config/arm/arm.c >> +++ b/gcc/config/arm/arm.c >> @@ -28208,6 +28208,35 @@ arm_expand_vec_perm (rtx target, rtx op0, rtx >> op1, rtx sel) >> arm_expand_vec_perm_1 (target, op0, op1, sel); >> } >> +/* map lane ordering between architectural lane order, and GCC lane >> order, >> + taking into account ABI. See comment above output_move_neon for >> details. */ >> +static int >> +neon_endian_lane_map (machine_mode mode, int lane) > > > s/map/Map/ > New line between comment and function signature. Done. >> +{ >> + if (BYTES_BIG_ENDIAN) >> + { >> + int nelems = GET_MODE_NUNITS (mode); >> + /* Reverse lane order. */ >> + lane = (nelems - 1 - lane); >> + /* Reverse D register order, to match ABI. */ >> + if (GET_MODE_SIZE (mode) == 16) >> + lane = lane ^ (nelems / 2); >> + } >> + return lane; >> +} >> + >> +/* some permutations index into pairs of vectors, this is a helper >> function >> + to map indexes into those pairs of vectors. */ >> +static int >> +neon_pair_endian_lane_map (machine_mode mode, int lane) > > > Similarly, s/some/Some/ and new line after comment. Done. >> +{ >> + int nelem = GET_MODE_NUNITS (mode); >> + if (BYTES_BIG_ENDIAN) >> + lane = >> + neon_endian_lane_map (mode, lane & (nelem - 1)) + (lane & nelem); >> + return lane; >> +} >> + >> /* Generate or test for an insn that supports a constant permutation. >> */ >> /* Recognize patterns for the VUZP insns. */ >> @@ -28218,14 +28247,22 @@ arm_evpc_neon_vuzp (struct expand_vec_perm_d *d) >> unsigned int i, odd, mask, nelt = d->nelt; >> rtx out0, out1, in0, in1; >> rtx (*gen)(rtx, rtx, rtx, rtx); >> + int first_elem; >> + int swap; >> > > Just make this a bool. As discussed on IRC, this variable does contain an integer. I have renamed it as swap_nelt, and changed the test on it below. [snip] >> @@ -28258,10 +28296,9 @@ arm_evpc_neon_vuzp (struct expand_vec_perm_d >> *d) >> in0 = d->op0; >> in1 = d->op1; >> - if (BYTES_BIG_ENDIAN) >> + if (swap) >> { >> std::swap (in0, in1); >> - odd = !odd; >> } > > remove the braces around the std::swap Done. Also changed if (swap) to if (swap_nelt != 0) [snip] >> @@ -0,0 +1,24 @@ >> +/* { dg-options "-O2 -ftree-vectorize -fno-vect-cost-model" } */ >> + >> +#define SIZE 128 >> +unsigned short _Alignas (16) in[SIZE]; >> + >> +extern void abort (void); >> + >> +__attribute__ ((noinline)) int >> +test (unsigned short sum, unsigned short *in, int x) >> +{ >> + for (int j = 0; j < SIZE; j += 8) >> + sum += in[j] * x; >> + return sum; >> +} >> + >> +int >> +main () >> +{ >> + for (int i = 0; i < SIZE; i++) >> + in[i] = i; >> + if (test (0, in, 1) != 960) >> + abort (); > > > AFAIK tests here usually prefer __builtin_abort (); > That way you don't have to declare the abort prototype in the beginning. Done. Updated patch attached >From 99a536e2e10e3759a5de88422fadcabb22084b2f Mon Sep 17 00:00:00 2001 From: Charles Baylis Date: Tue, 9 Feb 2016 15:18:43 +0000 Subject: [PATCH 1/2] [ARM] PR68532: Fix up vuzp for big endian gcc/ChangeLog: 2016-02-09 Charles Baylis PR target/68532 * config/arm/arm.c (neon_endian_lane_map): New function. (neon_vector_pair_endian_lane_map): New function. (arm_evpc_neon_vuzp): Allow for big endian lane order. * config/arm/arm_neon.h (vuzpq_s8): Adjust shuffle patterns for big endian. (vuzpq_s16): Likewise. (vuzpq_s32): Likewise. (vuzpq_f32): Likewise. (vuzpq_u8): Likewise. (vuzpq_u16): Likewise. (vuzpq_u32): Likewise. (vuzpq_p8): Likewise. (vuzpq_p16): Likewise. gcc/testsuite/ChangeLog: 2016-02-09 Charles Baylis PR target/68532 * gcc.c-torture/execute/pr68532.c: New test. Change-Id: Ifd35d79bd42825f05403a1b96d8f34ef0f21dac3 diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index d8a2745..95ee9a5 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -28208,6 +28208,37 @@ arm_expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel) arm_expand_vec_perm_1 (target, op0, op1, sel); } +/* Map lane ordering between architectural lane order, and GCC lane order, + taking into account ABI. See comment above output_move_neon for details. */ + +static int +neon_endian_lane_map (machine_mode mode, int lane) +{ + if (BYTES_BIG_ENDIAN) + { + int nelems = GET_MODE_NUNITS (mode); + /* Reverse lane order. */ + lane = (nelems - 1 - lane); + /* Reverse D register order, to match ABI. */ + if (GET_MODE_SIZE (mode) == 16) + lane = lane ^ (nelems / 2); + } + return lane; +} + +/* Some permutations index into pairs of vectors, this is a helper function + to map indexes into those pairs of vectors. */ + +static int +neon_pair_endian_lane_map (machine_mode mode, int lane) +{ + int nelem = GET_MODE_NUNITS (mode); + if (BYTES_BIG_ENDIAN) + lane = + neon_endian_lane_map (mode, lane & (nelem - 1)) + (lane & nelem); + return lane; +} + /* Generate or test for an insn that supports a constant permutation. */ /* Recognize patterns for the VUZP insns. */ @@ -28218,14 +28249,22 @@ arm_evpc_neon_vuzp (struct expand_vec_perm_d *d) unsigned int i, odd, mask, nelt = d->nelt; rtx out0, out1, in0, in1; rtx (*gen)(rtx, rtx, rtx, rtx); + int first_elem; + int swap_nelt; if (GET_MODE_UNIT_SIZE (d->vmode) >= 8) return false; - /* Note that these are little-endian tests. Adjust for big-endian later. */ - if (d->perm[0] == 0) + /* arm_expand_vec_perm_const_1 () helpfully swaps the operands for the + big endian pattern on 64 bit vectors, so we correct for that. */ + swap_nelt = BYTES_BIG_ENDIAN && !d->one_vector_p + && GET_MODE_SIZE (d->vmode) == 8 ? d->nelt : 0; + + first_elem = d->perm[neon_endian_lane_map (d->vmode, 0)] ^ swap_nelt; + + if (first_elem == neon_endian_lane_map (d->vmode, 0)) odd = 0; - else if (d->perm[0] == 1) + else if (first_elem == neon_endian_lane_map (d->vmode, 1)) odd = 1; else return false; @@ -28233,8 +28272,9 @@ arm_evpc_neon_vuzp (struct expand_vec_perm_d *d) for (i = 0; i < nelt; i++) { - unsigned elt = (i * 2 + odd) & mask; - if (d->perm[i] != elt) + unsigned elt = + (neon_pair_endian_lane_map (d->vmode, i) * 2 + odd) & mask; + if ((d->perm[i] ^ swap_nelt) != neon_pair_endian_lane_map (d->vmode, elt)) return false; } @@ -28258,11 +28298,8 @@ arm_evpc_neon_vuzp (struct expand_vec_perm_d *d) in0 = d->op0; in1 = d->op1; - if (BYTES_BIG_ENDIAN) - { - std::swap (in0, in1); - odd = !odd; - } + if (swap_nelt != 0) + std::swap (in0, in1); out0 = d->target; out1 = gen_reg_rtx (d->vmode); diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h index 47816d5..2e014b6 100644 --- a/gcc/config/arm/arm_neon.h +++ b/gcc/config/arm/arm_neon.h @@ -8741,9 +8741,9 @@ vuzpq_s8 (int8x16_t __a, int8x16_t __b) int8x16x2_t __rv; #ifdef __ARM_BIG_ENDIAN __rv.val[0] = __builtin_shuffle (__a, __b, (uint8x16_t) - { 17, 19, 21, 23, 25, 27, 29, 31, 1, 3, 5, 7, 9, 11, 13, 15 }); + { 9, 11, 13, 15, 1, 3, 5, 7, 25, 27, 29, 31, 17, 19, 21, 23 }); __rv.val[1] = __builtin_shuffle (__a, __b, (uint8x16_t) - { 16, 18, 20, 22, 24, 26, 28, 30, 0, 2, 4, 6, 8, 10, 12, 14 }); + { 8, 10, 12, 14, 0, 2, 4, 6, 24, 26, 28, 30, 16, 18, 20, 22 }); #else __rv.val[0] = __builtin_shuffle (__a, __b, (uint8x16_t) { 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30 }); @@ -8759,9 +8759,9 @@ vuzpq_s16 (int16x8_t __a, int16x8_t __b) int16x8x2_t __rv; #ifdef __ARM_BIG_ENDIAN __rv.val[0] = __builtin_shuffle (__a, __b, (uint16x8_t) - { 9, 11, 13, 15, 1, 3, 5, 7 }); + { 5, 7, 1, 3, 13, 15, 9, 11 }); __rv.val[1] = __builtin_shuffle (__a, __b, (uint16x8_t) - { 8, 10, 12, 14, 0, 2, 4, 6 }); + { 4, 6, 0, 2, 12, 14, 8, 10 }); #else __rv.val[0] = __builtin_shuffle (__a, __b, (uint16x8_t) { 0, 2, 4, 6, 8, 10, 12, 14 }); @@ -8776,8 +8776,8 @@ vuzpq_s32 (int32x4_t __a, int32x4_t __b) { int32x4x2_t __rv; #ifdef __ARM_BIG_ENDIAN - __rv.val[0] = __builtin_shuffle (__a, __b, (uint32x4_t) { 5, 7, 1, 3 }); - __rv.val[1] = __builtin_shuffle (__a, __b, (uint32x4_t) { 4, 6, 0, 2 }); + __rv.val[0] = __builtin_shuffle (__a, __b, (uint32x4_t) { 3, 1, 7, 5 }); + __rv.val[1] = __builtin_shuffle (__a, __b, (uint32x4_t) { 2, 0, 6, 4 }); #else __rv.val[0] = __builtin_shuffle (__a, __b, (uint32x4_t) { 0, 2, 4, 6 }); __rv.val[1] = __builtin_shuffle (__a, __b, (uint32x4_t) { 1, 3, 5, 7 }); @@ -8790,8 +8790,8 @@ vuzpq_f32 (float32x4_t __a, float32x4_t __b) { float32x4x2_t __rv; #ifdef __ARM_BIG_ENDIAN - __rv.val[0] = __builtin_shuffle (__a, __b, (uint32x4_t) { 5, 7, 1, 3 }); - __rv.val[1] = __builtin_shuffle (__a, __b, (uint32x4_t) { 4, 6, 0, 2 }); + __rv.val[0] = __builtin_shuffle (__a, __b, (uint32x4_t) { 3, 1, 7, 5 }); + __rv.val[1] = __builtin_shuffle (__a, __b, (uint32x4_t) { 2, 0, 6, 4 }); #else __rv.val[0] = __builtin_shuffle (__a, __b, (uint32x4_t) { 0, 2, 4, 6 }); __rv.val[1] = __builtin_shuffle (__a, __b, (uint32x4_t) { 1, 3, 5, 7 }); @@ -8805,9 +8805,9 @@ vuzpq_u8 (uint8x16_t __a, uint8x16_t __b) uint8x16x2_t __rv; #ifdef __ARM_BIG_ENDIAN __rv.val[0] = __builtin_shuffle (__a, __b, (uint8x16_t) - { 17, 19, 21, 23, 25, 27, 29, 31, 1, 3, 5, 7, 9, 11, 13, 15 }); + { 9, 11, 13, 15, 1, 3, 5, 7, 25, 27, 29, 31, 17, 19, 21, 23 }); __rv.val[1] = __builtin_shuffle (__a, __b, (uint8x16_t) - { 16, 18, 20, 22, 24, 26, 28, 30, 0, 2, 4, 6, 8, 10, 12, 14 }); + { 8, 10, 12, 14, 0, 2, 4, 6, 24, 26, 28, 30, 16, 18, 20, 22 }); #else __rv.val[0] = __builtin_shuffle (__a, __b, (uint8x16_t) { 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30 }); @@ -8823,9 +8823,9 @@ vuzpq_u16 (uint16x8_t __a, uint16x8_t __b) uint16x8x2_t __rv; #ifdef __ARM_BIG_ENDIAN __rv.val[0] = __builtin_shuffle (__a, __b, (uint16x8_t) - { 9, 11, 13, 15, 1, 3, 5, 7 }); + { 5, 7, 1, 3, 13, 15, 9, 11 }); __rv.val[1] = __builtin_shuffle (__a, __b, (uint16x8_t) - { 8, 10, 12, 14, 0, 2, 4, 6 }); + { 4, 6, 0, 2, 12, 14, 8, 10 }); #else __rv.val[0] = __builtin_shuffle (__a, __b, (uint16x8_t) { 0, 2, 4, 6, 8, 10, 12, 14 }); @@ -8840,8 +8840,8 @@ vuzpq_u32 (uint32x4_t __a, uint32x4_t __b) { uint32x4x2_t __rv; #ifdef __ARM_BIG_ENDIAN - __rv.val[0] = __builtin_shuffle (__a, __b, (uint32x4_t) { 5, 7, 1, 3 }); - __rv.val[1] = __builtin_shuffle (__a, __b, (uint32x4_t) { 4, 6, 0, 2 }); + __rv.val[0] = __builtin_shuffle (__a, __b, (uint32x4_t) { 3, 1, 7, 5 }); + __rv.val[1] = __builtin_shuffle (__a, __b, (uint32x4_t) { 2, 0, 6, 4 }); #else __rv.val[0] = __builtin_shuffle (__a, __b, (uint32x4_t) { 0, 2, 4, 6 }); __rv.val[1] = __builtin_shuffle (__a, __b, (uint32x4_t) { 1, 3, 5, 7 }); @@ -8855,9 +8855,9 @@ vuzpq_p8 (poly8x16_t __a, poly8x16_t __b) poly8x16x2_t __rv; #ifdef __ARM_BIG_ENDIAN __rv.val[0] = __builtin_shuffle (__a, __b, (uint8x16_t) - { 17, 19, 21, 23, 25, 27, 29, 31, 1, 3, 5, 7, 9, 11, 13, 15 }); + { 9, 11, 13, 15, 1, 3, 5, 7, 25, 27, 29, 31, 17, 19, 21, 23 }); __rv.val[1] = __builtin_shuffle (__a, __b, (uint8x16_t) - { 16, 18, 20, 22, 24, 26, 28, 30, 0, 2, 4, 6, 8, 10, 12, 14 }); + { 8, 10, 12, 14, 0, 2, 4, 6, 24, 26, 28, 30, 16, 18, 20, 22 }); #else __rv.val[0] = __builtin_shuffle (__a, __b, (uint8x16_t) { 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30 }); @@ -8873,9 +8873,9 @@ vuzpq_p16 (poly16x8_t __a, poly16x8_t __b) poly16x8x2_t __rv; #ifdef __ARM_BIG_ENDIAN __rv.val[0] = __builtin_shuffle (__a, __b, (uint16x8_t) - { 9, 11, 13, 15, 1, 3, 5, 7 }); + { 5, 7, 1, 3, 13, 15, 9, 11 }); __rv.val[1] = __builtin_shuffle (__a, __b, (uint16x8_t) - { 8, 10, 12, 14, 0, 2, 4, 6 }); + { 4, 6, 0, 2, 12, 14, 8, 10 }); #else __rv.val[0] = __builtin_shuffle (__a, __b, (uint16x8_t) { 0, 2, 4, 6, 8, 10, 12, 14 }); diff --git a/gcc/testsuite/gcc.c-torture/execute/pr68532.c b/gcc/testsuite/gcc.c-torture/execute/pr68532.c new file mode 100644 index 0000000..5d4bd8e --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/execute/pr68532.c @@ -0,0 +1,22 @@ +/* { dg-options "-O2 -ftree-vectorize -fno-vect-cost-model" } */ + +#define SIZE 128 +unsigned short _Alignas (16) in[SIZE]; + +__attribute__ ((noinline)) int +test (unsigned short sum, unsigned short *in, int x) +{ + for (int j = 0; j < SIZE; j += 8) + sum += in[j] * x; + return sum; +} + +int +main () +{ + for (int i = 0; i < SIZE; i++) + in[i] = i; + if (test (0, in, 1) != 960) + __builtin_abort (); + return 0; +} -- 1.9.1