From patchwork Wed May 30 02:03:45 2012
X-Patchwork-Submitter: Ramana Radhakrishnan
X-Patchwork-Id: 9018
In-Reply-To: <4FC5079D.2010904@redhat.com>
References: <4FC5079D.2010904@redhat.com>
Date: Wed, 30 May 2012 03:03:45 +0100
Subject: Re: [Patch ARM] Fix off by one error in neon_evpc_vrev.
From: Ramana Radhakrishnan
To: Richard Henderson
Cc: gcc-patches@gcc.gnu.org, Patch Tracking, Richard Earnshaw,
 Richard Guenther

On 29 May 2012 18:30, Richard Henderson wrote:
> On 05/26/2012 01:27 AM, Ramana Radhakrishnan wrote:
>>
>> -  for (i = 0; i < nelt; i += diff)
>> +  for (i = 0; i < nelt; i += (diff + 1))
>>      for (j = 0; j <= diff; j += 1)
>> -      if (d->perm[i + j] != i + diff - j)
>> -       return false;
>> +      {
>> +       /* This is guaranteed to be true as the value of diff
>> +          is 7, 3, 1 and we should have enough elements in the
>> +          queue to generate this.  Getting a vector mask with a
>> +          value of diff other than these values implies that
>> +          something is wrong by the time we get here.  */
>> +       gcc_assert ((i + j) < nelt);
>
> Yep, that all looks correct.  Unnecessary () in both lines though.
>
> r~

Bah - thanks - I don't know why I put those in :( . Committed to trunk
with those changes.  If the release managers don't object, I would like
to backport this to the 4.7 branch after a couple of weeks, to give the
auto-testers time to pick it up, since this change is what actually
turns on the vrev patterns in this case.  This is a significant
performance issue in 4.7 wherever we reverse vectors, and it would be
nice to fix it there: without the fix a vector reversal costs 2 loads
plus 2 generic permutes instead of a single vrev instruction (a quick
way to see the difference on the testcase is sketched after the patch
below).

regards,
Ramana

2012-05-30  Ramana Radhakrishnan

	* config/arm/arm.c (arm_evpc_neon_vrev): Fix off-by-one error
	in the outer loop stride.
	* gcc.target/arm/neon-vrev.c: New test.
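As a side note, here is a minimal standalone sketch of the check the
fixed loop performs (the check_vrev wrapper and the driver are just for
illustration, not part of the patch).  For the vrev64.16 mask below,
nelt is 8 and diff is 3.  With the old stride of diff the second
iteration starts at i = 3 and demands perm[3] == 6, so every genuine
vrev mask was rejected (and the i = 6 iteration could even read
perm[9], past the end of the mask); with diff + 1 the loop lands on
i = 0 and i = 4 and accepts it:

#include <assert.h>

/* Same shape as the fixed loop in arm_evpc_neon_vrev: PERM has NELT
   entries and DIFF is 7, 3 or 1 depending on the element width.  */
static int
check_vrev (const unsigned char *perm, unsigned int nelt,
	    unsigned int diff)
{
  unsigned int i, j;

  for (i = 0; i < nelt; i += diff + 1)
    for (j = 0; j <= diff; j++)
      {
	assert (i + j < nelt);	/* Mirrors the new gcc_assert.  */
	if (perm[i + j] != i + diff - j)
	  return 0;
      }
  return 1;
}

int
main (void)
{
  /* The vrev64.16 mask from tst_vrev64q2_u16 in the testcase.  */
  unsigned char perm[8] = { 3, 2, 1, 0, 7, 6, 5, 4 };
  assert (check_vrev (perm, 8, 3));
  return 0;
}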
Index: gcc/testsuite/gcc.target/arm/neon-vrev.c
===================================================================
--- gcc/testsuite/gcc.target/arm/neon-vrev.c	(revision 0)
+++ gcc/testsuite/gcc.target/arm/neon-vrev.c	(revision 187999)
@@ -0,0 +1,105 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-O2" } */
+/* { dg-add-options arm_neon } */
+
+#include <arm_neon.h>
+
+uint16x4_t
+tst_vrev642_u16 (uint16x4_t __a)
+{
+  uint16x4_t __rv;
+  uint16x4_t __mask1 = { 3, 2, 1, 0 };
+  return __builtin_shuffle (__a, __mask1);
+}
+
+uint16x8_t
+tst_vrev64q2_u16 (uint16x8_t __a)
+{
+  uint16x8_t __rv;
+  uint16x8_t __mask1 = { 3, 2, 1, 0, 7, 6, 5, 4 };
+  return __builtin_shuffle (__a, __mask1);
+}
+
+uint8x8_t
+tst_vrev642_u8 (uint8x8_t __a)
+{
+  uint8x8_t __rv;
+  uint8x8_t __mask1 = { 7, 6, 5, 4, 3, 2, 1, 0 };
+  return __builtin_shuffle (__a, __mask1);
+}
+
+uint8x16_t
+tst_vrev64q2_u8 (uint8x16_t __a)
+{
+  uint8x16_t __rv;
+  uint8x16_t __mask1 = { 7, 6, 5, 4, 3, 2, 1, 0, 15, 14, 13, 12, 11, 10, 9, 8 };
+  return __builtin_shuffle (__a, __mask1);
+
+}
+
+uint32x2_t
+tst_vrev642_u32 (uint32x2_t __a)
+{
+  uint32x2_t __rv;
+  uint32x2_t __mask1 = { 1, 0 };
+  return __builtin_shuffle (__a, __mask1);
+
+}
+
+uint32x4_t
+tst_vrev64q2_u32 (uint32x4_t __a)
+{
+  uint32x4_t __rv;
+  uint32x4_t __mask1 = { 1, 0, 3, 2 };
+  return __builtin_shuffle (__a, __mask1);
+}
+
+uint16x4_t
+tst_vrev322_u16 (uint16x4_t __a)
+{
+  uint16x4_t __mask1 = { 1, 0, 3, 2 };
+  return __builtin_shuffle (__a, __mask1);
+}
+
+uint16x8_t
+tst_vrev32q2_u16 (uint16x8_t __a)
+{
+  uint16x8_t __mask1 = { 1, 0, 3, 2, 5, 4, 7, 6 };
+  return __builtin_shuffle (__a, __mask1);
+}
+
+uint8x8_t
+tst_vrev322_u8 (uint8x8_t __a)
+{
+  uint8x8_t __mask1 = { 3, 2, 1, 0, 7, 6, 5, 4 };
+  return __builtin_shuffle (__a, __mask1);
+}
+
+uint8x16_t
+tst_vrev32q2_u8 (uint8x16_t __a)
+{
+  uint8x16_t __mask1 = { 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12 };
+  return __builtin_shuffle (__a, __mask1);
+}
+
+uint8x8_t
+tst_vrev162_u8 (uint8x8_t __a)
+{
+  uint8x8_t __mask = { 1, 0, 3, 2, 5, 4, 7, 6 };
+  return __builtin_shuffle (__a, __mask);
+}
+
+uint8x16_t
+tst_vrev16q2_u8 (uint8x16_t __a)
+{
+  uint8x16_t __mask = { 1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14 };
+  return __builtin_shuffle (__a, __mask);
+}
+
+/* { dg-final {scan-assembler-times "vrev32\.16\\t" 2} } */
+/* { dg-final {scan-assembler-times "vrev32\.8\\t" 2} } */
+/* { dg-final {scan-assembler-times "vrev16\.8\\t" 2} } */
+/* { dg-final {scan-assembler-times "vrev64\.8\\t" 2} } */
+/* { dg-final {scan-assembler-times "vrev64\.32\\t" 2} } */
+/* { dg-final {scan-assembler-times "vrev64\.16\\t" 2} } */
Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	(revision 187998)
+++ gcc/config/arm/arm.c	(revision 187999)
@@ -25637,10 +25637,18 @@
       return false;
     }
 
-  for (i = 0; i < nelt; i += diff)
+  for (i = 0; i < nelt; i += diff + 1)
     for (j = 0; j <= diff; j += 1)
-      if (d->perm[i + j] != i + diff - j)
-	return false;
+      {
+	/* This is guaranteed to be true as the value of diff
+	   is 7, 3, 1 and we should have enough elements in the
+	   queue to generate this.  Getting a vector mask with a
+	   value of diff other than these values implies that
+	   something is wrong by the time we get here.  */
+	gcc_assert (i + j < nelt);
+	if (d->perm[i + j] != i + diff - j)
+	  return false;
+      }
 
   /* Success! */
   if (d->testing_p)
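To reproduce the scan-assembler counts by hand, something along these
lines should work with a NEON-capable cross compiler (the triplet and
float ABI below are only an example; -mfpu=neon plus a suitable
-mfloat-abi is what matters):

  arm-linux-gnueabi-gcc -O2 -mfpu=neon -mfloat-abi=softfp -S neon-vrev.c
  grep -c vrev neon-vrev.s

With the fix, each of the vrev16/vrev32/vrev64 variants above should
appear exactly twice; without it, each __builtin_shuffle instead
expands to the mask-load-plus-generic-permute sequence described
earlier in the thread.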