From patchwork Sun May 1 07:30:00 2011 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Rosen X-Patchwork-Id: 1240 Return-Path: Delivered-To: unknown Received: from imap.gmail.com (74.125.159.109) by localhost6.localdomain6 with IMAP4-SSL; 08 Jun 2011 14:50:30 -0000 Delivered-To: patches@linaro.org Received: by 10.224.2.73 with SMTP id 9cs226472qai; Sun, 1 May 2011 00:30:07 -0700 (PDT) Received: by 10.227.182.74 with SMTP id cb10mr1305423wbb.48.1304235005878; Sun, 01 May 2011 00:30:05 -0700 (PDT) Received: from mtagate6.uk.ibm.com (mtagate6.uk.ibm.com [194.196.100.166]) by mx.google.com with ESMTPS id l2si11993215wba.76.2011.05.01.00.30.04 (version=TLSv1/SSLv3 cipher=OTHER); Sun, 01 May 2011 00:30:05 -0700 (PDT) Received-SPF: pass (google.com: domain of IRAR@il.ibm.com designates 194.196.100.166 as permitted sender) client-ip=194.196.100.166; Authentication-Results: mx.google.com; spf=pass (google.com: domain of IRAR@il.ibm.com designates 194.196.100.166 as permitted sender) smtp.mail=IRAR@il.ibm.com Received: from d06nrmr1806.portsmouth.uk.ibm.com (d06nrmr1806.portsmouth.uk.ibm.com [9.149.39.193]) by mtagate6.uk.ibm.com (8.13.1/8.13.1) with ESMTP id p417U3xi021277; Sun, 1 May 2011 07:30:03 GMT Received: from d06av12.portsmouth.uk.ibm.com (d06av12.portsmouth.uk.ibm.com [9.149.37.247]) by d06nrmr1806.portsmouth.uk.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id p417VBxj868392; Sun, 1 May 2011 08:31:11 +0100 Received: from d06av12.portsmouth.uk.ibm.com (loopback [127.0.0.1]) by d06av12.portsmouth.uk.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id p417U3jM013525; Sun, 1 May 2011 01:30:03 -0600 Received: from d12mc102.megacenter.de.ibm.com (d12mc102.megacenter.de.ibm.com [9.149.167.114]) by d06av12.portsmouth.uk.ibm.com (8.14.4/8.13.1/NCO v10.0 AVin) with ESMTP id p417U3Jk013517; Sun, 1 May 2011 01:30:03 -0600 In-Reply-To: <4D9DAB2C.6010306@linaro.org> References: <4D9DAB2C.6010306@linaro.org> Subject: Re: [patch, ARM] Fix PR target/48252 X-KeepSent: 8E0EC0C3:51E64C85-C2257883:00282C16; type=4; name=$KeepSent To: Ramana Radhakrishnan Cc: gcc-patches@gcc.gnu.org, Ira Rosen , Patch Tracking , Ulrich Weigand X-Mailer: Lotus Notes Release 8.5 HF58 February 06, 2009 Message-ID: From: Ira Rosen Date: Sun, 1 May 2011 10:30:00 +0300 X-MIMETrack: Serialize by Router on D12MC102/12/M/IBM(Release 8.5.2FP1|November 29, 2010) at 01/05/2011 10:30:09 MIME-Version: 1.0 Content-type: text/plain; charset=US-ASCII Ramana Radhakrishnan wrote on 07/04/2011 03:16:44 PM: > > On 07/04/11 08:42, Ira Rosen wrote: > > Hi, > > > > This patch makes both outputs of neon_vzip/vuzp/vtrn_internal > > explicitly dependent on both inputs, preventing incorrect > > optimization: > > for > > (a,b)<- vzip (c,d) > > and > > (e,f)<- vzip (g,d) > > CSE decides that b==f, since b and f depend only on d. > > > > Tested on arm-linux-gnueabi. OK for trunk? > > This is OK for trunk. > > > OK for 4.6 after testing? > > I have no objections to this going into 4.5 and 4.6 since it corrects > the implementation of the neon intrinsics but please check with the > release managers. OK to backport to 4.5 and 4.6 - both tested on arm-linux-gnueabi? Thanks, Ira 4.5 and 4.6 ChangeLog: Backport from mainline: 2011-04-18 Ulrich Weigand Ira Rosen PR target/48252 * config/arm/arm.c (neon_emit_pair_result_insn): Swap arguments to match neon_vzip/vuzp/vtrn_internal. * config/arm/neon.md (neon_vtrn_internal): Make both outputs explicitly dependent on both inputs. (neon_vzip_internal, neon_vuzp_internal): Likewise. testsuite/Changelog: Backport from mainline: 2011-04-18 Ulrich Weigand Ira Rosen PR target/48252 * gcc.target/arm/pr48252.c: New test. 4.5 patch: > > cheers > Ramana > > > > > Thanks, > > Ira > > > > ChangeLog: > > > > 2011-04-07 Ulrich Weigand > > Ira Rosen > > > > PR target/48252 > > * config/arm/arm.c (neon_emit_pair_result_insn): Swap arguments > > to match neon_vzip/vuzp/vtrn_internal. > > * config/arm/neon.md (neon_vtrn_internal): Make both > > outputs explicitly dependent on both inputs. > > (neon_vzip_internal, neon_vuzp_internal): Likewise. > > > > testsuite/Changelog: > > > > PR target/48252 > > * gcc.target/arm/pr48252.c: New test. > Index: config/arm/arm.c =================================================================== --- config/arm/arm.c (revision 172714) +++ config/arm/arm.c (working copy) @@ -18237,7 +18237,7 @@ neon_emit_pair_result_insn (enum machine_mode mode rtx tmp1 = gen_reg_rtx (mode); rtx tmp2 = gen_reg_rtx (mode); - emit_insn (intfn (tmp1, op1, tmp2, op2)); + emit_insn (intfn (tmp1, op1, op2, tmp2)); emit_move_insn (mem, tmp1); mem = adjust_address (mem, mode, GET_MODE_SIZE (mode)); Index: config/arm/neon.md =================================================================== --- config/arm/neon.md (revision 172714) +++ config/arm/neon.md (working copy) @@ -3895,13 +3895,14 @@ (define_insn "neon_vtrn_internal" [(set (match_operand:VDQW 0 "s_register_operand" "=w") - (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "0")] - UNSPEC_VTRN1)) - (set (match_operand:VDQW 2 "s_register_operand" "=w") - (unspec:VDQW [(match_operand:VDQW 3 "s_register_operand" "2")] - UNSPEC_VTRN2))] + (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "0") + (match_operand:VDQW 2 "s_register_operand" "w")] + UNSPEC_VTRN1)) + (set (match_operand:VDQW 3 "s_register_operand" "=2") + (unspec:VDQW [(match_dup 1) (match_dup 2)] + UNSPEC_VTRN2))] "TARGET_NEON" - "vtrn.\t%0, %2" + "vtrn.\t%0, %3" [(set (attr "neon_type") (if_then_else (ne (symbol_ref "") (const_int 0)) (const_string "neon_bp_simple") @@ -3921,13 +3922,14 @@ (define_insn "neon_vzip_internal" [(set (match_operand:VDQW 0 "s_register_operand" "=w") - (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "0")] - UNSPEC_VZIP1)) - (set (match_operand:VDQW 2 "s_register_operand" "=w") - (unspec:VDQW [(match_operand:VDQW 3 "s_register_operand" "2")] - UNSPEC_VZIP2))] + (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "0") + (match_operand:VDQW 2 "s_register_operand" "w")] + UNSPEC_VZIP1)) + (set (match_operand:VDQW 3 "s_register_operand" "=2") + (unspec:VDQW [(match_dup 1) (match_dup 2)] + UNSPEC_VZIP2))] "TARGET_NEON" - "vzip.\t%0, %2" + "vzip.\t%0, %3" [(set (attr "neon_type") (if_then_else (ne (symbol_ref "") (const_int 0)) (const_string "neon_bp_simple") @@ -3947,13 +3949,14 @@ (define_insn "neon_vuzp_internal" [(set (match_operand:VDQW 0 "s_register_operand" "=w") - (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "0")] + (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "0") + (match_operand:VDQW 2 "s_register_operand" "w")] UNSPEC_VUZP1)) - (set (match_operand:VDQW 2 "s_register_operand" "=w") - (unspec:VDQW [(match_operand:VDQW 3 "s_register_operand" "2")] - UNSPEC_VUZP2))] + (set (match_operand:VDQW 3 "s_register_operand" "=2") + (unspec:VDQW [(match_dup 1) (match_dup 2)] + UNSPEC_VUZP2))] "TARGET_NEON" - "vuzp.\t%0, %2" + "vuzp.\t%0, %3" [(set (attr "neon_type") (if_then_else (ne (symbol_ref "") (const_int 0)) (const_string "neon_bp_simple") Index: testsuite/gcc.target/arm/pr48252.c =================================================================== --- testsuite/gcc.target/arm/pr48252.c (revision 0) +++ testsuite/gcc.target/arm/pr48252.c (revision 0) @@ -0,0 +1,32 @@ +/* { dg-do run } */ +/* { dg-require-effective-target arm_neon_hw } */ +/* { dg-options "-O2" } */ +/* { dg-add-options arm_neon } */ + +#include "arm_neon.h" +#include + +int main(void) +{ + uint8x8_t v1 = {1, 1, 1, 1, 1, 1, 1, 1}; + uint8x8_t v2 = {2, 2, 2, 2, 2, 2, 2, 2}; + uint8x8x2_t vd1, vd2; + union {uint8x8_t v; uint8_t buf[8];} d1, d2, d3, d4; + int i; + + vd1 = vzip_u8(v1, vdup_n_u8(0)); + vd2 = vzip_u8(v2, vdup_n_u8(0)); + + vst1_u8(d1.buf, vd1.val[0]); + vst1_u8(d2.buf, vd1.val[1]); + vst1_u8(d3.buf, vd2.val[0]); + vst1_u8(d4.buf, vd2.val[1]); + + for (i = 0; i < 8; i++) + if ((i % 2 == 0 && d4.buf[i] != 2) + || (i % 2 == 1 && d4.buf[i] != 0)) + abort (); + + return 0; +} + 4.6 patch: Index: config/arm/arm.c =================================================================== --- config/arm/arm.c (revision 172810) +++ config/arm/arm.c (working copy) @@ -19564,7 +19564,7 @@ neon_emit_pair_result_insn (enum machine_mode mode rtx tmp1 = gen_reg_rtx (mode); rtx tmp2 = gen_reg_rtx (mode); - emit_insn (intfn (tmp1, op1, tmp2, op2)); + emit_insn (intfn (tmp1, op1, op2, tmp2)); emit_move_insn (mem, tmp1); mem = adjust_address (mem, mode, GET_MODE_SIZE (mode)); Index: config/arm/neon.md =================================================================== --- config/arm/neon.md (revision 172810) +++ config/arm/neon.md (working copy) @@ -4079,13 +4079,14 @@ (define_insn "neon_vtrn_internal" [(set (match_operand:VDQW 0 "s_register_operand" "=w") - (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "0")] - UNSPEC_VTRN1)) - (set (match_operand:VDQW 2 "s_register_operand" "=w") - (unspec:VDQW [(match_operand:VDQW 3 "s_register_operand" "2")] - UNSPEC_VTRN2))] + (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "0") + (match_operand:VDQW 2 "s_register_operand" "w")] + UNSPEC_VTRN1)) + (set (match_operand:VDQW 3 "s_register_operand" "=2") + (unspec:VDQW [(match_dup 1) (match_dup 2)] + UNSPEC_VTRN2))] "TARGET_NEON" - "vtrn.\t%0, %2" + "vtrn.\t%0, %3" [(set (attr "neon_type") (if_then_else (ne (symbol_ref "") (const_int 0)) (const_string "neon_bp_simple") @@ -4105,13 +4106,14 @@ (define_insn "neon_vzip_internal" [(set (match_operand:VDQW 0 "s_register_operand" "=w") - (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "0")] - UNSPEC_VZIP1)) - (set (match_operand:VDQW 2 "s_register_operand" "=w") - (unspec:VDQW [(match_operand:VDQW 3 "s_register_operand" "2")] - UNSPEC_VZIP2))] + (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "0") + (match_operand:VDQW 2 "s_register_operand" "w")] + UNSPEC_VZIP1)) + (set (match_operand:VDQW 3 "s_register_operand" "=2") + (unspec:VDQW [(match_dup 1) (match_dup 2)] + UNSPEC_VZIP2))] "TARGET_NEON" - "vzip.\t%0, %2" + "vzip.\t%0, %3" [(set (attr "neon_type") (if_then_else (ne (symbol_ref "") (const_int 0)) (const_string "neon_bp_simple") @@ -4131,13 +4133,14 @@ (define_insn "neon_vuzp_internal" [(set (match_operand:VDQW 0 "s_register_operand" "=w") - (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "0")] + (unspec:VDQW [(match_operand:VDQW 1 "s_register_operand" "0") + (match_operand:VDQW 2 "s_register_operand" "w")] UNSPEC_VUZP1)) - (set (match_operand:VDQW 2 "s_register_operand" "=w") - (unspec:VDQW [(match_operand:VDQW 3 "s_register_operand" "2")] - UNSPEC_VUZP2))] + (set (match_operand:VDQW 3 "s_register_operand" "=2") + (unspec:VDQW [(match_dup 1) (match_dup 2)] + UNSPEC_VUZP2))] "TARGET_NEON" - "vuzp.\t%0, %2" + "vuzp.\t%0, %3" [(set (attr "neon_type") (if_then_else (ne (symbol_ref "") (const_int 0)) (const_string "neon_bp_simple") Index: testsuite/gcc.target/arm/pr48252.c =================================================================== --- testsuite/gcc.target/arm/pr48252.c (revision 0) +++ testsuite/gcc.target/arm/pr48252.c (revision 0) @@ -0,0 +1,32 @@ +/* { dg-do run } */ +/* { dg-require-effective-target arm_neon_hw } */ +/* { dg-options "-O2" } */ +/* { dg-add-options arm_neon } */ + +#include "arm_neon.h" +#include + +int main(void) +{ + uint8x8_t v1 = {1, 1, 1, 1, 1, 1, 1, 1}; + uint8x8_t v2 = {2, 2, 2, 2, 2, 2, 2, 2}; + uint8x8x2_t vd1, vd2; + union {uint8x8_t v; uint8_t buf[8];} d1, d2, d3, d4; + int i; + + vd1 = vzip_u8(v1, vdup_n_u8(0)); + vd2 = vzip_u8(v2, vdup_n_u8(0)); + + vst1_u8(d1.buf, vd1.val[0]); + vst1_u8(d2.buf, vd1.val[1]); + vst1_u8(d3.buf, vd2.val[0]); + vst1_u8(d4.buf, vd2.val[1]); + + for (i = 0; i < 8; i++) + if ((i % 2 == 0 && d4.buf[i] != 2) + || (i % 2 == 1 && d4.buf[i] != 0)) + abort (); + + return 0; +} +