From patchwork Wed May 30 02:03:45 2012
X-Patchwork-Submitter: Ramana Radhakrishnan
X-Patchwork-Id: 9018
In-Reply-To: <4FC5079D.2010904@redhat.com>
References: <4FC5079D.2010904@redhat.com>
Date: Wed, 30 May 2012 03:03:45 +0100
Subject: Re: [Patch ARM] Fix off by one error in neon_evpc_vrev.
From: Ramana Radhakrishnan
To: Richard Henderson
Cc: gcc-patches@gcc.gnu.org, Patch Tracking, Richard Earnshaw,
 Richard Guenther

On 29 May 2012 18:30, Richard Henderson wrote:
> On 05/26/2012 01:27 AM, Ramana Radhakrishnan wrote:
>>
>> -  for (i = 0; i < nelt; i += diff)
>> +  for (i = 0; i < nelt; i += (diff + 1))
>>      for (j = 0; j <= diff; j += 1)
>> -      if (d->perm[i + j] != i + diff - j)
>> -       return false;
>> +      {
>> +       /* This is guaranteed to be true as the value of diff
>> +          is 7, 3, 1 and we should have enough elements in the
>> +          queue to generate this.  Getting a vector mask with a
>> +          value of diff other than these values implies that
>> +          something is wrong by the time we get here.  */
>> +       gcc_assert ((i + j) < nelt);
>
> Yep, that all looks correct.  Unnecessary () in both lines though.
>
> r~

Bah - thanks - I don't know why I put those in :( . Committed to trunk
with those changes.  If the release managers don't object, I would like
to backport this to the 4.7 branch after a couple of weeks, to give the
auto-testers time to pick it up, since this change is what actually
turns on the vrev patterns in this case.  This is a significant
performance issue in 4.7 wherever we reverse vectors, and it would be
nice to fix it there: without the fix a vector reversal costs 2 loads
plus 2 generic permutes instead of a single vrev instruction (a quick
way to see the difference on the testcase is sketched after the patch
below).

regards,
Ramana

2012-05-30  Ramana Radhakrishnan

	* config/arm/arm.c (arm_evpc_neon_vrev): Fix off-by-one error
	in the outer loop stride.
	* gcc.target/arm/neon-vrev.c: New test.
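As a side note, here is a minimal standalone sketch of the check the
fixed loop performs (the check_vrev wrapper and the driver are just for
illustration, not part of the patch).  For the vrev64.16 mask below,
nelt is 8 and diff is 3.  With the old stride of diff the second
iteration starts at i = 3 and demands perm[3] == 6, so every genuine
vrev mask was rejected (and the i = 6 iteration could even read
perm[9], past the end of the mask); with diff + 1 the loop lands on
i = 0 and i = 4 and accepts it:

#include <assert.h>

/* Same shape as the fixed loop in arm_evpc_neon_vrev: PERM has NELT
   entries and DIFF is 7, 3 or 1 depending on the element width.  */
static int
check_vrev (const unsigned char *perm, unsigned int nelt,
	    unsigned int diff)
{
  unsigned int i, j;

  for (i = 0; i < nelt; i += diff + 1)
    for (j = 0; j <= diff; j++)
      {
	assert (i + j < nelt);	/* Mirrors the new gcc_assert.  */
	if (perm[i + j] != i + diff - j)
	  return 0;
      }
  return 1;
}

int
main (void)
{
  /* The vrev64.16 mask from tst_vrev64q2_u16 in the testcase.  */
  unsigned char perm[8] = { 3, 2, 1, 0, 7, 6, 5, 4 };
  assert (check_vrev (perm, 8, 3));
  return 0;
}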
Index: gcc/testsuite/gcc.target/arm/neon-vrev.c
===================================================================
--- gcc/testsuite/gcc.target/arm/neon-vrev.c	(revision 0)
+++ gcc/testsuite/gcc.target/arm/neon-vrev.c	(revision 187999)
@@ -0,0 +1,105 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-O2" } */
+/* { dg-add-options arm_neon } */
+
+#include <arm_neon.h>
+
+uint16x4_t
+tst_vrev642_u16 (uint16x4_t __a)
+{
+  uint16x4_t __rv;
+  uint16x4_t __mask1 = { 3, 2, 1, 0 };
+  return __builtin_shuffle (__a, __mask1);
+}
+
+uint16x8_t
+tst_vrev64q2_u16 (uint16x8_t __a)
+{
+  uint16x8_t __rv;
+  uint16x8_t __mask1 = { 3, 2, 1, 0, 7, 6, 5, 4 };
+  return __builtin_shuffle (__a, __mask1);
+}
+
+uint8x8_t
+tst_vrev642_u8 (uint8x8_t __a)
+{
+  uint8x8_t __rv;
+  uint8x8_t __mask1 = { 7, 6, 5, 4, 3, 2, 1, 0 };
+  return __builtin_shuffle (__a, __mask1);
+}
+
+uint8x16_t
+tst_vrev64q2_u8 (uint8x16_t __a)
+{
+  uint8x16_t __rv;
+  uint8x16_t __mask1 = { 7, 6, 5, 4, 3, 2, 1, 0, 15, 14, 13, 12, 11, 10, 9, 8 };
+  return __builtin_shuffle (__a, __mask1);
+
+}
+
+uint32x2_t
+tst_vrev642_u32 (uint32x2_t __a)
+{
+  uint32x2_t __rv;
+  uint32x2_t __mask1 = { 1, 0 };
+  return __builtin_shuffle (__a, __mask1);
+
+}
+
+uint32x4_t
+tst_vrev64q2_u32 (uint32x4_t __a)
+{
+  uint32x4_t __rv;
+  uint32x4_t __mask1 = { 1, 0, 3, 2 };
+  return __builtin_shuffle (__a, __mask1);
+}
+
+uint16x4_t
+tst_vrev322_u16 (uint16x4_t __a)
+{
+  uint16x4_t __mask1 = { 1, 0, 3, 2 };
+  return __builtin_shuffle (__a, __mask1);
+}
+
+uint16x8_t
+tst_vrev32q2_u16 (uint16x8_t __a)
+{
+  uint16x8_t __mask1 = { 1, 0, 3, 2, 5, 4, 7, 6 };
+  return __builtin_shuffle (__a, __mask1);
+}
+
+uint8x8_t
+tst_vrev322_u8 (uint8x8_t __a)
+{
+  uint8x8_t __mask1 = { 3, 2, 1, 0, 7, 6, 5, 4 };
+  return __builtin_shuffle (__a, __mask1);
+}
+
+uint8x16_t
+tst_vrev32q2_u8 (uint8x16_t __a)
+{
+  uint8x16_t __mask1 = { 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12 };
+  return __builtin_shuffle (__a, __mask1);
+}
+
+uint8x8_t
+tst_vrev162_u8 (uint8x8_t __a)
+{
+  uint8x8_t __mask = { 1, 0, 3, 2, 5, 4, 7, 6 };
+  return __builtin_shuffle (__a, __mask);
+}
+
+uint8x16_t
+tst_vrev16q2_u8 (uint8x16_t __a)
+{
+  uint8x16_t __mask = { 1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14 };
+  return __builtin_shuffle (__a, __mask);
+}
+
+/* { dg-final {scan-assembler-times "vrev32\.16\\t" 2} } */
+/* { dg-final {scan-assembler-times "vrev32\.8\\t" 2} } */
+/* { dg-final {scan-assembler-times "vrev16\.8\\t" 2} } */
+/* { dg-final {scan-assembler-times "vrev64\.8\\t" 2} } */
+/* { dg-final {scan-assembler-times "vrev64\.32\\t" 2} } */
+/* { dg-final {scan-assembler-times "vrev64\.16\\t" 2} } */
Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	(revision 187998)
+++ gcc/config/arm/arm.c	(revision 187999)
@@ -25637,10 +25637,18 @@
       return false;
     }
 
-  for (i = 0; i < nelt; i += diff)
+  for (i = 0; i < nelt; i += diff + 1)
     for (j = 0; j <= diff; j += 1)
-      if (d->perm[i + j] != i + diff - j)
-	return false;
+      {
+	/* This is guaranteed to be true as the value of diff
+	   is 7, 3, 1 and we should have enough elements in the
+	   queue to generate this.  Getting a vector mask with a
+	   value of diff other than these values implies that
+	   something is wrong by the time we get here.  */
+	gcc_assert (i + j < nelt);
+	if (d->perm[i + j] != i + diff - j)
+	  return false;
+      }
 
   /* Success! */
   if (d->testing_p)
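To reproduce the scan-assembler counts by hand, something along these
lines should work with a NEON-capable cross compiler (the triplet and
float ABI below are only an example; -mfpu=neon plus a suitable
-mfloat-abi is what matters):

  arm-linux-gnueabi-gcc -O2 -mfpu=neon -mfloat-abi=softfp -S neon-vrev.c
  grep -c vrev neon-vrev.s

With the fix, each of the vrev16/vrev32/vrev64 variants above should
appear exactly twice; without it, each __builtin_shuffle instead
expands to the mask-load-plus-generic-permute sequence described
earlier in the thread.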