From patchwork Thu Oct  1 10:05:27 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Michael Collison <michael.collison@linaro.org>
X-Patchwork-Id: 54360
Return-Path: <patchwork-forward+bncBDIIVBVZ6QLRBAMLWSYAKGQEV2X25YI@linaro.org>
X-Original-To: linaro@patches.linaro.org
Delivered-To: linaro@patches.linaro.org
Received: from mail-wi0-f197.google.com (mail-wi0-f197.google.com
 [209.85.212.197])
 by patches.linaro.org (Postfix) with ESMTPS id D8549205D0
 for <linaro@patches.linaro.org>; Thu,  1 Oct 2015 10:05:53 +0000 (UTC)
Received: by wicgb1 with SMTP id gb1sf6167217wic.3
 for <linaro@patches.linaro.org>; Thu, 01 Oct 2015 03:05:53 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:delivered-to:mailing-list:precedence:list-id
 :list-unsubscribe:list-archive:list-post:list-help:sender
 :delivered-to:message-id:date:from:user-agent:mime-version:to
 :subject:references:in-reply-to:content-type:x-original-sender
 :x-original-authentication-results;
 bh=no/1VUbuOQHd66aZVX0930RLTJcZSKXJ7QoTUWzEXLU=;
 b=YStzNi78Qua9NCug17k0Bby+VNamN21d7p6iGDO0DS9SIV5nnMgwFWRxVBcqfN2D61
 U5KEkXKz/Uu4aAf7jEYjzQgyd9ZijEDtXclglQ/VpVehehzcOXA0q0pmRIk/0ooRzIzk
 iEpkNL+81j9pc3kq+8gHmBW80+eC8z+aExZtwIxuoEFxQID/aJJ6/5mG7FzkFzc8g1cd
 Z9Fdr9fIRb2YU3c4zSoOPbIk4BkUUtRPA0MwO5Z+s6kk6wCM3zk5lymnXB2DSZB2kAda
 0jfzoLkfuO5p8QLOd6WLTmbXwbD/6L1zzvVCNvi3Exv4GZ+BzUGjfch92mpyzo+y7nzl
 LRjg==
X-Gm-Message-State: ALoCoQlPqbxtKrVA9wt1Z02Yt1kDprNmMPEkl1YcjVjm8qzKIC+DfG83NXyH4hfvey9bQZyJsg22
X-Received: by 10.112.198.33 with SMTP id iz1mr1256338lbc.8.1443693953136;
 Thu, 01 Oct 2015 03:05:53 -0700 (PDT)
X-BeenThere: patchwork-forward@linaro.org
Received: by 10.152.28.202 with SMTP id d10ls94872lah.43.gmail; Thu, 01 Oct
 2015 03:05:52 -0700 (PDT)
X-Received: by 10.152.30.3 with SMTP id o3mr1336026lah.90.1443693952771;
 Thu, 01 Oct 2015 03:05:52 -0700 (PDT)
Received: from mail-lb0-x22d.google.com (mail-lb0-x22d.google.com.
 [2a00:1450:4010:c04::22d])
 by mx.google.com with ESMTPS id s5si2438872lbw.89.2015.10.01.03.05.52
 for <patchwork-forward@linaro.org>
 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
 Thu, 01 Oct 2015 03:05:52 -0700 (PDT)
Received-SPF: pass (google.com: domain of
 patch+caf_=patchwork-forward=linaro.org@linaro.org designates
 2a00:1450:4010:c04::22d as permitted sender)
 client-ip=2a00:1450:4010:c04::22d; 
Received: by lbcao8 with SMTP id ao8so6182297lbc.3
 for <patchwork-forward@linaro.org>;
 Thu, 01 Oct 2015 03:05:52 -0700 (PDT)
X-Received: by 10.152.5.133 with SMTP id s5mr2606039las.19.1443693952594;
 Thu, 01 Oct 2015 03:05:52 -0700 (PDT)
X-Forwarded-To: patchwork-forward@linaro.org
X-Forwarded-For: patch@linaro.org patchwork-forward@linaro.org
Delivered-To: patch@linaro.org
Received: by 10.112.59.35 with SMTP id w3csp492223lbq;
 Thu, 1 Oct 2015 03:05:51 -0700 (PDT)
X-Received: by 10.66.146.197 with SMTP id te5mr6486000pab.35.1443693950920; 
 Thu, 01 Oct 2015 03:05:50 -0700 (PDT)
Received: from sourceware.org (server1.sourceware.org. [209.132.180.131])
 by mx.google.com with ESMTPS id
 xr2si7835677pab.191.2015.10.01.03.05.50 for <patch@linaro.org>
 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
 Thu, 01 Oct 2015 03:05:50 -0700 (PDT)
Received-SPF: pass (google.com: domain of
 gcc-patches-return-408808-patch=linaro.org@gcc.gnu.org
 designates 209.132.180.131 as permitted sender)
 client-ip=209.132.180.131; 
Received: (qmail 83849 invoked by alias); 1 Oct 2015 10:05:36 -0000
Mailing-List: list patchwork-forward@linaro.org;
 contact patchwork-forward+owners@linaro.org
Precedence: list
List-Id: <patchwork-forward.linaro.org>
List-Unsubscribe: <mailto:googlegroups-manage+836684582541+unsubscribe@googlegroups.com>, 
 <http://groups.google.com/a/linaro.org/group/patchwork-forward/subscribe>
List-Archive: <http://groups.google.com/a/linaro.org/group/patchwork-forward/>
List-Post: <http://groups.google.com/a/linaro.org/group/patchwork-forward/post>, 
 <mailto:patchwork-forward@linaro.org>
List-Help: <http://support.google.com/a/linaro.org/bin/topic.py?topic=25838>, 
 <mailto:patchwork-forward+help@linaro.org>
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Received: (qmail 83794 invoked by uid 89); 1 Oct 2015 10:05:35 -0000
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-2.4 required=5.0 tests=AWL, BAYES_00,
 RCVD_IN_DNSWL_LOW, SPF_PASS autolearn=ham version=3.3.2
X-HELO: mail-pa0-f41.google.com
Received: from mail-pa0-f41.google.com (HELO mail-pa0-f41.google.com)
 (209.85.220.41) by sourceware.org
 (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256
 encrypted) ESMTPS; Thu, 01 Oct 2015 10:05:32 +0000
Received: by pacfv12 with SMTP id fv12so71853973pac.2 for
 <gcc-patches@gcc.gnu.org>; Thu, 01 Oct 2015 03:05:30 -0700 (PDT)
X-Received: by 10.66.184.42 with SMTP id er10mr11329778pac.117.1443693930698; 
 Thu, 01 Oct 2015 03:05:30 -0700 (PDT)
Received: from [192.168.1.14] (ip70-176-202-128.ph.ph.cox.net.
 [70.176.202.128]) by smtp.googlemail.com with ESMTPSA id
 fx4sm5611445pbb.92.2015.10.01.03.05.29 (version=TLSv1.2
 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
 Thu, 01 Oct 2015 03:05:29 -0700 (PDT)
Message-ID: <560D0567.40207@linaro.org>
Date: Thu, 01 Oct 2015 03:05:27 -0700
From: Michael Collison <michael.collison@linaro.org>
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
 rv:31.0) Gecko/20100101 Thunderbird/31.4.0
MIME-Version: 1.0
To: Kyrill Tkachov <kyrylo.tkachov@arm.com>,
 GCC Patches <gcc-patches@gcc.gnu.org>,
 Ramana Radhakrishnan <Ramana.Radhakrishnan@arm.com>
Subject: Re: [ARM] Use vector wide add for mixed-mode adds
References: <5601E9B9.5060600@linaro.org> <560267B4.5070809@arm.com>
In-Reply-To: <560267B4.5070809@arm.com>
X-Original-Sender: michael.collison@linaro.org
X-Original-Authentication-Results: mx.google.com; spf=pass (google.com:
 domain of
 patch+caf_=patchwork-forward=linaro.org@linaro.org designates
 2a00:1450:4010:c04::22d as permitted sender)
 smtp.mailfrom=patch+caf_=patchwork-forward=linaro.org@linaro.org;
 dkim=pass header.i=@gcc.gnu.org
X-Google-Group-Id: 836684582541

Kyrill,

I have modified the patch to address your comments. I also modified
check_effective_target_vect_widen_sum_hi_to_si_pattern in
target-supports.exp to indicate that ARM NEON supports a vector widening
sum of HImode to SImode. This resolved several test suite failures.

Successfully tested on arm-none-eabi and arm-none-linux-gnueabihf. Four
related execution tests fail on armeb-none-linux-gnueabihf, with -flto only:

gcc.dg/vect/vect-outer-4f.c -flto -ffat-lto-objects execution test
gcc.dg/vect/vect-outer-4g.c -flto -ffat-lto-objects execution test
gcc.dg/vect/vect-outer-4k.c -flto -ffat-lto-objects execution test
gcc.dg/vect/vect-outer-4l.c -flto -ffat-lto-objects execution test


I am debugging but have not tracked down the root cause yet. Feedback?

2015-07-22  Michael Collison  <michael.collison@linaro.org>

     * config/arm/neon.md (widen_<us>sum<mode>): New patterns
     where mode is VQI to improve mixed mode vectorization.
     * config/arm/neon.md (vec_sel_widen_ssum_lo<VQI:mode><VW:mode>3): New
     define_insn to match low half of signed vaddw.
     * config/arm/neon.md (vec_sel_widen_ssum_hi<VQI:mode><VW:mode>3): New
     define_insn to match high half of signed vaddw.
     * config/arm/neon.md (vec_sel_widen_usum_lo<VQI:mode><VW:mode>3): New
     define_insn to match low half of unsigned vaddw.
     * config/arm/neon.md (vec_sel_widen_usum_hi<VQI:mode><VW:mode>3): New
     define_insn to match high half of unsigned vaddw.
     * testsuite/gcc.target/arm/neon-vaddws16.c: New test.
     * testsuite/gcc.target/arm/neon-vaddws32.c: New test.
     * testsuite/gcc.target/arm/neon-vaddwu16.c: New test.
     * testsuite/gcc.target/arm/neon-vaddwu32.c: New test.
     * testsuite/gcc.target/arm/neon-vaddwu8.c: New test.
     * testsuite/lib/target-supports.exp
     (check_effective_target_vect_widen_sum_hi_to_si_pattern): Indicate
     that ARM NEON supports vector widen sum of HImode to SImode.
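
For anyone reviewing the expanders, here is a minimal scalar sketch in C of
what I intend widen_ssum to compute for a V8HI input: one widening add of
the low half followed by one of the high half, both into the same
accumulator. The function names (model_vaddw_s16, model_widen_ssum_v8hi,
model_sum_shorts) are hypothetical and are not part of the patch. The model
uses memory-order lane numbering and says nothing about big-endian lane
layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { NARROW_LANES = 8, WIDE_LANES = NARROW_LANES / 2 };

/* Hypothetical scalar stand-in for one vaddw.s16: each 32-bit lane of
   ACC gains the sign-extended 16-bit lane of HALF at the same index.  */
static void
model_vaddw_s16 (int32_t acc[WIDE_LANES], const int16_t half[WIDE_LANES])
{
  for (int i = 0; i < WIDE_LANES; i++)
    acc[i] += (int32_t) half[i];
}

/* Model of the widen_ssum expansion for one V8HI input: add the low
   half (lanes 0..3), then the high half (lanes 4..7).  */
static void
model_widen_ssum_v8hi (int32_t acc[WIDE_LANES],
                       const int16_t x[NARROW_LANES])
{
  model_vaddw_s16 (acc, x);
  model_vaddw_s16 (acc, x + WIDE_LANES);
}

/* Reduce an array of shorts the way the t6 loop could be reduced after
   vectorization; LEN must be a multiple of NARROW_LANES.  */
static int
model_sum_shorts (const int16_t *x, size_t len)
{
  int32_t acc[WIDE_LANES] = { 0, 0, 0, 0 };
  int result = 0;

  assert (len % NARROW_LANES == 0);
  for (size_t i = 0; i < len; i += NARROW_LANES)
    model_widen_ssum_v8hi (acc, x + i);

  /* Final horizontal reduction of the four wide lanes.  */
  for (int i = 0; i < WIDE_LANES; i++)
    result += acc[i];
  return result;
}
```

Calling model_sum_shorts on the eight values {1, -2, 3, -4, 32767, -32768,
7, 8} gives 12, the same as the plain scalar loop. The per-lane partial sums
differ from the scalar order, but the total is what widen_sum requires.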

On 09/23/2015 01:49 AM, Kyrill Tkachov wrote:
> Hi Michael,
>
> On 23/09/15 00:52, Michael Collison wrote:
>> This is a modified version of the previous patch that removes the
>> documentation and read-md.c fixes. These patches have been submitted
>> separately and approved.
>>
>> This patch is designed to address code that was not being vectorized due
>> to missing widening patterns in the ARM backend. Code such as:
>>
>> int t6(int len, void * dummy, short * __restrict x)
>> {
>>     len = len & ~31;
>>     int result = 0;
>>     __asm volatile ("");
>>     for (int i = 0; i < len; i++)
>>       result += x[i];
>>     return result;
>> }
>>
>> Validated on arm-none-eabi, arm-none-linux-gnueabi,
>> arm-none-linux-gnueabihf, and armeb-none-linux-gnueabihf.
>>
>> 2015-09-22  Michael Collison <michael.collison@linaro.org>
>>
>>       * config/arm/neon.md (widen_<us>sum<mode>): New patterns
>>       where mode is VQI to improve mixed mode add vectorization.
>>
>
> Please list all the new define_expands and define_insns
> in the changelog. Also, please add a ChangeLog entry for
> the testsuite additions.
>
> The approach looks ok to me with a few comments on some
> parts of the patch itself.
>
>
> +(define_insn "vec_sel_widen_ssum_hi<VQI:mode><VW:mode>3"
> +  [(set (match_operand:<VW:V_widen> 0 "s_register_operand" "=w")
> +    (plus:<VW:V_widen> (sign_extend:<VW:V_widen> (vec_select:VW (match_operand:VQI 1 "s_register_operand" "%w")
> +                           (match_operand:VQI 2 "vect_par_constant_high" "")))
> +                (match_operand:<VW:V_widen> 3 "s_register_operand" "0")))]
> +  "TARGET_NEON"
> +  "vaddw.<V_s_elem>\t%q0, %q3, %f1"
> +  [(set_attr "type" "neon_add_widen")
> +  (set_attr "length" "8")]
> +)
>
>
> This is a single instruction, and it has a length of 4, so no need to 
> override the length attribute.
> Same with the other define_insns in this patch.
>
>
> diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddws16.c b/gcc/testsuite/gcc.target/arm/neon-vaddws16.c
> new file mode 100644
> index 0000000..ed10669
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/neon-vaddws16.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target arm_neon_hw } */
>
> The arm_neon_hw check is usually used when you want to run the tests.
> Since this is a compile-only test you just need arm_neon_ok.
>
>  +/* { dg-add-options arm_neon_ok } */
> +/* { dg-options "-O3" } */
> +
> +
> +int
> +t6(int len, void * dummy, short * __restrict x)
> +{
> +  len = len & ~31;
> +  int result = 0;
> +  __asm volatile ("");
> +  for (int i = 0; i < len; i++)
> +    result += x[i];
> +  return result;
> +}
> +
> +/* { dg-final { scan-assembler "vaddw\.s16" } } */
> +
> +
> +
>
> Stray trailing newlines. Similar comments for the other testcases.
>
> Thanks,
> Kyrill
>

diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 654d9d5..b3485f1 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -1174,6 +1174,55 @@
 
 ;; Widening operations
 
+(define_expand "widen_ssum<mode>3"
+  [(set (match_operand:<V_double_width> 0 "s_register_operand" "")
+	(plus:<V_double_width> (sign_extend:<V_double_width> (match_operand:VQI 1 "s_register_operand" ""))
+			       (match_operand:<V_double_width> 2 "s_register_operand" "")))]
+  "TARGET_NEON"
+  {
+    int i;
+    int half_elem = <V_mode_nunits>/2;
+    rtvec v1 = rtvec_alloc (half_elem);
+    rtvec v2 = rtvec_alloc (half_elem);
+    rtx p1, p2;
+
+    for (i = 0; i < half_elem; i++)
+      RTVEC_ELT (v1, i) = GEN_INT (i);
+    p1 = gen_rtx_PARALLEL (GET_MODE (operands[1]), v1);
+
+    for (i = half_elem; i < <V_mode_nunits>; i++)
+      RTVEC_ELT (v2, i - half_elem) = GEN_INT (i);
+    p2 = gen_rtx_PARALLEL (GET_MODE (operands[1]), v2);
+
+    if (operands[0] != operands[2])
+      emit_move_insn (operands[0], operands[2]);
+
+    emit_insn (gen_vec_sel_widen_ssum_lo<mode><V_half>3 (operands[0], operands[1], p1, operands[0]));
+    emit_insn (gen_vec_sel_widen_ssum_hi<mode><V_half>3 (operands[0], operands[1], p2, operands[0]));
+    DONE;
+  }
+)
+
+(define_insn "vec_sel_widen_ssum_lo<VQI:mode><VW:mode>3"
+  [(set (match_operand:<VW:V_widen> 0 "s_register_operand" "=w")
+	(plus:<VW:V_widen> (sign_extend:<VW:V_widen> (vec_select:VW (match_operand:VQI 1 "s_register_operand" "%w")
+						   (match_operand:VQI 2 "vect_par_constant_low" "")))
+		        (match_operand:<VW:V_widen> 3 "s_register_operand" "0")))]
+  "TARGET_NEON"
+  "vaddw.<V_s_elem>\t%q0, %q3, %e1"
+  [(set_attr "type" "neon_add_widen")]
+)
+
+(define_insn "vec_sel_widen_ssum_hi<VQI:mode><VW:mode>3"
+  [(set (match_operand:<VW:V_widen> 0 "s_register_operand" "=w")
+	(plus:<VW:V_widen> (sign_extend:<VW:V_widen> (vec_select:VW (match_operand:VQI 1 "s_register_operand" "%w")
+						   (match_operand:VQI 2 "vect_par_constant_high" "")))
+		        (match_operand:<VW:V_widen> 3 "s_register_operand" "0")))]
+  "TARGET_NEON"
+  "vaddw.<V_s_elem>\t%q0, %q3, %f1"
+  [(set_attr "type" "neon_add_widen")]
+)
+
 (define_insn "widen_ssum<mode>3"
   [(set (match_operand:<V_widen> 0 "s_register_operand" "=w")
 	(plus:<V_widen> (sign_extend:<V_widen>
@@ -1184,6 +1233,55 @@
   [(set_attr "type" "neon_add_widen")]
 )
 
+(define_expand "widen_usum<mode>3"
+  [(set (match_operand:<V_double_width> 0 "s_register_operand" "")
+	(plus:<V_double_width> (zero_extend:<V_double_width> (match_operand:VQI 1 "s_register_operand" ""))
+			       (match_operand:<V_double_width> 2 "s_register_operand" "")))]
+  "TARGET_NEON"
+  {
+    int i;
+    int half_elem = <V_mode_nunits>/2;
+    rtvec v1 = rtvec_alloc (half_elem);
+    rtvec v2 = rtvec_alloc (half_elem);
+    rtx p1, p2;
+
+    for (i = 0; i < half_elem; i++)
+      RTVEC_ELT (v1, i) = GEN_INT (i);
+    p1 = gen_rtx_PARALLEL (GET_MODE (operands[1]), v1);
+
+    for (i = half_elem; i < <V_mode_nunits>; i++)
+      RTVEC_ELT (v2, i - half_elem) = GEN_INT (i);
+    p2 = gen_rtx_PARALLEL (GET_MODE (operands[1]), v2);
+
+    if (operands[0] != operands[2])
+      emit_move_insn (operands[0], operands[2]);
+
+    emit_insn (gen_vec_sel_widen_usum_lo<mode><V_half>3 (operands[0], operands[1], p1, operands[0]));
+    emit_insn (gen_vec_sel_widen_usum_hi<mode><V_half>3 (operands[0], operands[1], p2, operands[0]));
+    DONE;
+  }
+)
+
+(define_insn "vec_sel_widen_usum_lo<VQI:mode><VW:mode>3"
+  [(set (match_operand:<VW:V_widen> 0 "s_register_operand" "=w")
+	(plus:<VW:V_widen> (zero_extend:<VW:V_widen> (vec_select:VW (match_operand:VQI 1 "s_register_operand" "%w")
+						   (match_operand:VQI 2 "vect_par_constant_low" "")))
+		        (match_operand:<VW:V_widen> 3 "s_register_operand" "0")))]
+  "TARGET_NEON"
+  "vaddw.<V_u_elem>\t%q0, %q3, %e1"
+  [(set_attr "type" "neon_add_widen")]
+)
+
+(define_insn "vec_sel_widen_usum_hi<VQI:mode><VW:mode>3"
+  [(set (match_operand:<VW:V_widen> 0 "s_register_operand" "=w")
+	(plus:<VW:V_widen> (zero_extend:<VW:V_widen> (vec_select:VW (match_operand:VQI 1 "s_register_operand" "%w")
+						   (match_operand:VQI 2 "vect_par_constant_high" "")))
+		        (match_operand:<VW:V_widen> 3 "s_register_operand" "0")))]
+  "TARGET_NEON"
+  "vaddw.<V_u_elem>\t%q0, %q3, %f1"
+  [(set_attr "type" "neon_add_widen")]
+)
+
 (define_insn "widen_usum<mode>3"
   [(set (match_operand:<V_widen> 0 "s_register_operand" "=w")
 	(plus:<V_widen> (zero_extend:<V_widen>
@@ -5347,7 +5445,7 @@
  [(set (match_operand:<V_unpack> 0 "register_operand" "=w")
        (mult:<V_unpack> (SE:<V_unpack> (vec_select:<V_HALF>
 			   (match_operand:VU 1 "register_operand" "w") 
-                           (match_operand:VU 2 "vect_par_constant_low" "")))
+					(match_operand:VU 2 "vect_par_constant_low" "")))
  		        (SE:<V_unpack> (vec_select:<V_HALF>
                            (match_operand:VU 3 "register_operand" "w") 
                            (match_dup 2)))))]
diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddws16.c b/gcc/testsuite/gcc.target/arm/neon-vaddws16.c
new file mode 100644
index 0000000..96c657e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-vaddws16.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-add-options arm_neon } */
+/* { dg-options "-O3" } */
+
+
+int 
+t6(int len, void * dummy, short * __restrict x)
+{
+  len = len & ~31;
+  int result = 0;
+  __asm volatile ("");
+  for (int i = 0; i < len; i++)
+    result += x[i];
+  return result;
+}
+
+/* { dg-final { scan-assembler "vaddw\.s16" } } */
diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddws32.c b/gcc/testsuite/gcc.target/arm/neon-vaddws32.c
new file mode 100644
index 0000000..1bfdc13
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-vaddws32.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-add-options arm_neon } */
+/* { dg-options "-O3" } */
+
+int 
+t6(int len, void * dummy, int * __restrict x)
+{
+  len = len & ~31;
+  long long result = 0;
+  __asm volatile ("");
+  for (int i = 0; i < len; i++)
+    result += x[i];
+  return result;
+}
+
+/* { dg-final { scan-assembler "vaddw\.s32" } } */
diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddwu16.c b/gcc/testsuite/gcc.target/arm/neon-vaddwu16.c
new file mode 100644
index 0000000..98f8768
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-vaddwu16.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-add-options arm_neon } */
+/* { dg-options "-O3" } */
+
+
+int 
+t6(int len, void * dummy, unsigned short * __restrict x)
+{
+  len = len & ~31;
+  unsigned int result = 0;
+  __asm volatile ("");
+  for (int i = 0; i < len; i++)
+    result += x[i];
+  return result;
+}
+
+/* { dg-final { scan-assembler "vaddw\.u16" } } */
diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddwu32.c b/gcc/testsuite/gcc.target/arm/neon-vaddwu32.c
new file mode 100644
index 0000000..4a72a39
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-vaddwu32.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-add-options arm_neon } */
+/* { dg-options "-O3" } */
+
+int 
+t6(int len, void * dummy, unsigned int * __restrict x)
+{
+  len = len & ~31;
+  unsigned long long result = 0;
+  __asm volatile ("");
+  for (int i = 0; i < len; i++)
+    result += x[i];
+  return result;
+}
+
+/* { dg-final { scan-assembler "vaddw\.u32" } } */
diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddwu8.c b/gcc/testsuite/gcc.target/arm/neon-vaddwu8.c
new file mode 100644
index 0000000..9c9c68a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-vaddwu8.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-add-options arm_neon } */
+/* { dg-options "-O3" } */
+
+
+int 
+t6(int len, void * dummy, unsigned char * __restrict x)
+{
+  len = len & ~31;
+  unsigned short result = 0;
+  __asm volatile ("");
+  for (int i = 0; i < len; i++)
+    result += x[i];
+  return result;
+}
+
+/* { dg-final { scan-assembler "vaddw\.u8" } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 1988301..5530edc 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -3838,6 +3838,7 @@ proc check_effective_target_vect_widen_sum_hi_to_si_pattern { } {
     } else {
         set et_vect_widen_sum_hi_to_si_pattern_saved 0
         if { [istarget powerpc*-*-*]
+	     || [check_effective_target_arm_neon_ok]
              || [istarget ia64-*-*] } {
             set et_vect_widen_sum_hi_to_si_pattern_saved 1
         }
-- 
1.9.1