From patchwork Thu Oct 1 10:05:27 2015
X-Patchwork-Submitter: Michael Collison
X-Patchwork-Id: 54360
Message-ID: <560D0567.40207@linaro.org>
Date: Thu, 01 Oct 2015 03:05:27 -0700
From: Michael Collison
To: Kyrill Tkachov, GCC Patches, Ramana Radhakrishnan
Subject: Re: [ARM] Use vector wide add for mixed-mode adds
References: <5601E9B9.5060600@linaro.org> <560267B4.5070809@arm.com>
In-Reply-To: <560267B4.5070809@arm.com>

Kyrill,

I have modified the patch to address your comments. I also modified
check_effective_target_vect_widen_sum_hi_to_si_pattern in
target-supports.exp to indicate that ARM NEON supports a vector widening
sum of HImode to SImode. This resolved several testsuite failures.

Successfully tested on arm-none-eabi and arm-none-linux-gnueabihf. I have
four related execution test failures on armeb-none-linux-gnueabihf, with
-flto only:

  gcc.dg/vect/vect-outer-4f.c -flto -ffat-lto-objects execution test
  gcc.dg/vect/vect-outer-4g.c -flto -ffat-lto-objects execution test
  gcc.dg/vect/vect-outer-4k.c -flto -ffat-lto-objects execution test
  gcc.dg/vect/vect-outer-4l.c -flto -ffat-lto-objects execution test

I am debugging but have not tracked down the root cause yet. Feedback?

2015-07-22  Michael Collison

	* config/arm/neon.md (widen_sum): New patterns where mode is
	VQI to improve mixed mode vectorization.
	* config/arm/neon.md (vec_sel_widen_ssum_lo3): New define_insn to
	match low half of signed vaddw.
	* config/arm/neon.md (vec_sel_widen_ssum_hi3): New define_insn to
	match high half of signed vaddw.
	* config/arm/neon.md (vec_sel_widen_usum_lo3): New define_insn to
	match low half of unsigned vaddw.
	* config/arm/neon.md (vec_sel_widen_usum_hi3): New define_insn to
	match high half of unsigned vaddw.
	* testsuite/gcc.target/arm/neon-vaddws16.c: New test.
	* testsuite/gcc.target/arm/neon-vaddws32.c: New test.
	* testsuite/gcc.target/arm/neon-vaddwu16.c: New test.
	* testsuite/gcc.target/arm/neon-vaddwu32.c: New test.
	* testsuite/gcc.target/arm/neon-vaddwu8.c: New test.
	* testsuite/lib/target-supports.exp
	(check_effective_target_vect_widen_sum_hi_to_si_pattern): Indicate
	that ARM NEON supports vector widen sum of HImode to SImode.

On 09/23/2015 01:49 AM, Kyrill Tkachov wrote:
> Hi Michael,
>
> On 23/09/15 00:52, Michael Collison wrote:
>> This is a modified version of the previous patch that removes the
>> documentation and read-md.c fixes. These patches have been submitted
>> separately and approved.
>>
>> This patch is designed to address code that was not being vectorized due
>> to missing widening patterns in the ARM backend.
>> Code such as:
>>
>> int t6(int len, void * dummy, short * __restrict x)
>> {
>>   len = len & ~31;
>>   int result = 0;
>>   __asm volatile ("");
>>   for (int i = 0; i < len; i++)
>>     result += x[i];
>>   return result;
>> }
>>
>> Validated on arm-none-eabi, arm-none-linux-gnueabi,
>> arm-none-linux-gnueabihf, and armeb-none-linux-gnueabihf.
>>
>> 2015-09-22  Michael Collison
>>
>>     * config/arm/neon.md (widen_sum): New patterns
>>     where mode is VQI to improve mixed mode add vectorization.
>>
>
> Please list all the new define_expands and define_insns
> in the changelog. Also, please add a ChangeLog entry for
> the testsuite additions.
>
> The approach looks ok to me with a few comments on some
> parts of the patch itself.
>
>
> +(define_insn "vec_sel_widen_ssum_hi3"
> +  [(set (match_operand: 0 "s_register_operand" "=w")
> +        (plus: (sign_extend: (vec_select:VW
> (match_operand:VQI 1 "s_register_operand" "%w")
> +                                            (match_operand:VQI 2
> "vect_par_constant_high" "")))
> +               (match_operand: 3 "s_register_operand"
> "0")))]
> +  "TARGET_NEON"
> +  "vaddw.\t%q0, %q3, %f1"
> +  [(set_attr "type" "neon_add_widen")
> +  (set_attr "length" "8")]
> +)
>
>
> This is a single instruction, and it has a length of 4, so no need to
> override the length attribute.
> Same with the other define_insns in this patch.
>
>
> diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddws16.c
> b/gcc/testsuite/gcc.target/arm/neon-vaddws16.c
> new file mode 100644
> index 0000000..ed10669
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/neon-vaddws16.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target arm_neon_hw } */
>
> The arm_neon_hw check is usually used when you want to run the tests.
> Since this is a compile-only test you just need arm_neon_ok.
>
> +/* { dg-add-options arm_neon_ok } */
> +/* { dg-options "-O3" } */
> +
> +
> +int
> +t6(int len, void * dummy, short * __restrict x)
> +{
> +  len = len & ~31;
> +  int result = 0;
> +  __asm volatile ("");
> +  for (int i = 0; i < len; i++)
> +    result += x[i];
> +  return result;
> +}
> +
> +/* { dg-final { scan-assembler "vaddw\.s16" } } */
> +
> +
> +
>
> Stray trailing newlines. Similar comments for the other testcases.
>
> Thanks,
> Kyrill
>

diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 654d9d5..b3485f1 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -1174,6 +1174,55 @@
 
 ;; Widening operations
 
+(define_expand "widen_ssum3"
+  [(set (match_operand: 0 "s_register_operand" "")
+        (plus: (sign_extend: (match_operand:VQI 1 "s_register_operand" ""))
+               (match_operand: 2 "s_register_operand" "")))]
+  "TARGET_NEON"
+  {
+    int i;
+    int half_elem = /2;
+    rtvec v1 = rtvec_alloc (half_elem);
+    rtvec v2 = rtvec_alloc (half_elem);
+    rtx p1, p2;
+
+    for (i = 0; i < half_elem; i++)
+      RTVEC_ELT (v1, i) = GEN_INT (i);
+    p1 = gen_rtx_PARALLEL (GET_MODE (operands[1]), v1);
+
+    for (i = half_elem; i < ; i++)
+      RTVEC_ELT (v2, i - half_elem) = GEN_INT (i);
+    p2 = gen_rtx_PARALLEL (GET_MODE (operands[1]), v2);
+
+    if (operands[0] != operands[2])
+      emit_move_insn (operands[0], operands[2]);
+
+    emit_insn (gen_vec_sel_widen_ssum_lo3 (operands[0], operands[1], p1, operands[0]));
+    emit_insn (gen_vec_sel_widen_ssum_hi3 (operands[0], operands[1], p2, operands[0]));
+    DONE;
+  }
+)
+
+(define_insn "vec_sel_widen_ssum_lo3"
+  [(set (match_operand: 0 "s_register_operand" "=w")
+        (plus: (sign_extend: (vec_select:VW (match_operand:VQI 1 "s_register_operand" "%w")
+                                            (match_operand:VQI 2 "vect_par_constant_low" "")))
+               (match_operand: 3 "s_register_operand" "0")))]
+  "TARGET_NEON"
+  "vaddw.\t%q0, %q3, %e1"
+  [(set_attr "type" "neon_add_widen")]
+)
+
+(define_insn "vec_sel_widen_ssum_hi3"
+  [(set (match_operand: 0 "s_register_operand" "=w")
+        (plus: (sign_extend: (vec_select:VW (match_operand:VQI 1 "s_register_operand" "%w")
+                                            (match_operand:VQI 2 "vect_par_constant_high" "")))
+               (match_operand: 3 "s_register_operand" "0")))]
+  "TARGET_NEON"
+  "vaddw.\t%q0, %q3, %f1"
+  [(set_attr "type" "neon_add_widen")]
+)
+
 (define_insn "widen_ssum3"
   [(set (match_operand: 0 "s_register_operand" "=w")
         (plus: (sign_extend:
@@ -1184,6 +1233,55 @@
   [(set_attr "type" "neon_add_widen")]
 )
 
+(define_expand "widen_usum3"
+  [(set (match_operand: 0 "s_register_operand" "")
+        (plus: (zero_extend: (match_operand:VQI 1 "s_register_operand" ""))
+               (match_operand: 2 "s_register_operand" "")))]
+  "TARGET_NEON"
+  {
+    int i;
+    int half_elem = /2;
+    rtvec v1 = rtvec_alloc (half_elem);
+    rtvec v2 = rtvec_alloc (half_elem);
+    rtx p1, p2;
+
+    for (i = 0; i < half_elem; i++)
+      RTVEC_ELT (v1, i) = GEN_INT (i);
+    p1 = gen_rtx_PARALLEL (GET_MODE (operands[1]), v1);
+
+    for (i = half_elem; i < ; i++)
+      RTVEC_ELT (v2, i - half_elem) = GEN_INT (i);
+    p2 = gen_rtx_PARALLEL (GET_MODE (operands[1]), v2);
+
+    if (operands[0] != operands[2])
+      emit_move_insn (operands[0], operands[2]);
+
+    emit_insn (gen_vec_sel_widen_usum_lo3 (operands[0], operands[1], p1, operands[0]));
+    emit_insn (gen_vec_sel_widen_usum_hi3 (operands[0], operands[1], p2, operands[0]));
+    DONE;
+  }
+)
+
+(define_insn "vec_sel_widen_usum_lo3"
+  [(set (match_operand: 0 "s_register_operand" "=w")
+        (plus: (zero_extend: (vec_select:VW (match_operand:VQI 1 "s_register_operand" "%w")
+                                            (match_operand:VQI 2 "vect_par_constant_low" "")))
+               (match_operand: 3 "s_register_operand" "0")))]
+  "TARGET_NEON"
+  "vaddw.\t%q0, %q3, %e1"
+  [(set_attr "type" "neon_add_widen")]
+)
+
+(define_insn "vec_sel_widen_usum_hi3"
+  [(set (match_operand: 0 "s_register_operand" "=w")
+        (plus: (zero_extend: (vec_select:VW (match_operand:VQI 1 "s_register_operand" "%w")
+                                            (match_operand:VQI 2 "vect_par_constant_high" "")))
+               (match_operand: 3 "s_register_operand" "0")))]
+  "TARGET_NEON"
+  "vaddw.\t%q0, %q3, %f1"
+  [(set_attr "type" "neon_add_widen")]
+)
+
 (define_insn "widen_usum3"
   [(set (match_operand: 0 "s_register_operand" "=w")
         (plus: (zero_extend:
@@ -5347,7 +5445,7 @@
  [(set (match_operand: 0 "register_operand" "=w")
        (mult: (SE: (vec_select:
                            (match_operand:VU 1 "register_operand" "w")
-                           (match_operand:VU 2 "vect_par_constant_low" "")))
+                           (match_operand:VU 2 "vect_par_constant_low" "")))
                (SE: (vec_select:
                            (match_operand:VU 3 "register_operand" "w")
                            (match_dup 2)))))]
diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddws16.c b/gcc/testsuite/gcc.target/arm/neon-vaddws16.c
new file mode 100644
index 0000000..96c657e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-vaddws16.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_hw } */
+/* { dg-add-options arm_neon_ok } */
+/* { dg-options "-O3" } */
+
+
+int
+t6(int len, void * dummy, short * __restrict x)
+{
+  len = len & ~31;
+  int result = 0;
+  __asm volatile ("");
+  for (int i = 0; i < len; i++)
+    result += x[i];
+  return result;
+}
+
+/* { dg-final { scan-assembler "vaddw\.s16" } } */
diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddws32.c b/gcc/testsuite/gcc.target/arm/neon-vaddws32.c
new file mode 100644
index 0000000..1bfdc13
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-vaddws32.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_hw } */
+/* { dg-add-options arm_neon_ok } */
+/* { dg-options "-O3" } */
+
+int
+t6(int len, void * dummy, int * __restrict x)
+{
+  len = len & ~31;
+  long long result = 0;
+  __asm volatile ("");
+  for (int i = 0; i < len; i++)
+    result += x[i];
+  return result;
+}
+
+/* { dg-final { scan-assembler "vaddw\.s32" } } */
diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddwu16.c b/gcc/testsuite/gcc.target/arm/neon-vaddwu16.c
new file mode 100644
index 0000000..98f8768
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-vaddwu16.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_hw } */
+/* { dg-add-options arm_neon_ok } */
+/* { dg-options "-O3" } */
+
+
+int
+t6(int len, void * dummy, unsigned short * __restrict x)
+{
+  len = len & ~31;
+  unsigned int result = 0;
+  __asm volatile ("");
+  for (int i = 0; i < len; i++)
+    result += x[i];
+  return result;
+}
+
+/* { dg-final { scan-assembler "vaddw.u16" } } */
diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddwu32.c b/gcc/testsuite/gcc.target/arm/neon-vaddwu32.c
new file mode 100644
index 0000000..4a72a39
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-vaddwu32.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_hw } */
+/* { dg-add-options arm_neon_ok } */
+/* { dg-options "-O3" } */
+
+int
+t6(int len, void * dummy, unsigned int * __restrict x)
+{
+  len = len & ~31;
+  unsigned long long result = 0;
+  __asm volatile ("");
+  for (int i = 0; i < len; i++)
+    result += x[i];
+  return result;
+}
+
+/* { dg-final { scan-assembler "vaddw\.u32" } } */
diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddwu8.c b/gcc/testsuite/gcc.target/arm/neon-vaddwu8.c
new file mode 100644
index 0000000..9c9c68a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-vaddwu8.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_hw } */
+/* { dg-add-options arm_neon_ok } */
+/* { dg-options "-O3" } */
+
+
+int
+t6(int len, void * dummy, char * __restrict x)
+{
+  len = len & ~31;
+  unsigned short result = 0;
+  __asm volatile ("");
+  for (int i = 0; i < len; i++)
+    result += x[i];
+  return result;
+}
+
+/* { dg-final { scan-assembler "vaddw\.u8" } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 1988301..5530edc 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -3838,6 +3838,7 @@ proc check_effective_target_vect_widen_sum_hi_to_si_pattern { } {
     } else {
         set et_vect_widen_sum_hi_to_si_pattern_saved 0
         if { [istarget powerpc*-*-*]
+             || [check_effective_target_arm_neon_ok]
              || [istarget ia64-*-*] } {
             set et_vect_widen_sum_hi_to_si_pattern_saved 1
         }
-- 
1.9.1
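The widen_ssum expander splits one widening accumulation into two vaddw instructions, one fed by the low half of the narrow input (the vect_par_constant_low selector, printed with %e1) and one by the high half (vect_par_constant_high, %f1). The following is a minimal scalar sketch of that split in portable C, assuming the V8HI input and V4SI accumulator shape that neon-vaddws16.c exercises. The helper names are hypothetical and this is a model for illustration, not GCC code. It shows that each wide lane picks up one element from each half, yet the horizontal sum of the accumulator still equals the scalar sum of the input, which is what a plain sum reduction like t6 relies on.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model: V8HI input (8 x int16) accumulated into a
   V4SI vector (4 x int32) via a low-half and a high-half vaddw.s16.  */
enum { NUNITS = 8, HALF = NUNITS / 2 };

/* Mirrors the selectors built in the expander:
   lo = {0, ..., half_elem - 1}, hi = {half_elem, ..., nunits - 1}.  */
static void build_selectors (int nunits, int *lo, int *hi)
{
  int half_elem = nunits / 2;
  assert (nunits % 2 == 0);  /* quad-register modes have an even lane count */
  for (int i = 0; i < half_elem; i++)
    lo[i] = i;
  for (int i = half_elem; i < nunits; i++)
    hi[i - half_elem] = i;
}

/* One vaddw.s16: wide lane i gets the sign-extended element x[sel[i]].  */
static void vaddw_s16_model (int32_t acc[HALF], const int16_t x[NUNITS],
                             const int sel[HALF])
{
  for (int i = 0; i < HALF; i++)
    acc[i] += (int32_t) x[sel[i]];  /* int16 -> int32 preserves the sign */
}

/* Model of widen_ssum: acc += sign_extend (x), low half then high half.  */
static void widen_ssum_model (int32_t acc[HALF], const int16_t x[NUNITS])
{
  int lo[HALF], hi[HALF];
  build_selectors (NUNITS, lo, hi);
  vaddw_s16_model (acc, x, lo);  /* cf. the %e1 (low D register) insn  */
  vaddw_s16_model (acc, x, hi);  /* cf. the %f1 (high D register) insn */
}

/* Horizontal sum of the accumulator lanes after the loop.  */
static int32_t reduce_model (const int32_t acc[HALF])
{
  int32_t sum = 0;
  for (int i = 0; i < HALF; i++)
    sum += acc[i];
  return sum;
}
```

For the input {1, -2, 3, -4, 5, -6, 7, -32768}, lane 0 receives x[0] + x[4] = 6 rather than x[0] + x[1], but the four lanes still sum to -32764, the same result as the scalar loop.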
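The signed and unsigned pattern families in the patch differ only in sign_extend versus zero_extend, which matters whenever a narrow element has its top bit set. The sketch below is a hypothetical scalar model in portable C, not code from the patch. It contrasts the two for a V16QI input feeding an eight-lane 16-bit accumulator, again as a low-half step followed by a high-half step. Note that neon-vaddwu8.c reads plain char, which is unsigned by default on ARM EABI targets, so that test exercises the zero-extending widen_usum path.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model: V16QI input (16 x 8-bit) accumulated into
   an eight-lane 16-bit vector, low half first, then high half.  */
enum { NUNITS_QI = 16, HALF_QI = NUNITS_QI / 2 };

/* widen_usum / vaddw.u8: uint8_t promotes to int as 0..255 (zero extension).  */
static void widen_usum_u8_model (uint16_t acc[HALF_QI],
                                 const uint8_t x[NUNITS_QI])
{
  for (int i = 0; i < HALF_QI; i++)   /* low half  */
    acc[i] = (uint16_t) (acc[i] + x[i]);
  for (int i = 0; i < HALF_QI; i++)   /* high half */
    acc[i] = (uint16_t) (acc[i] + x[i + HALF_QI]);
}

/* widen_ssum / vaddw.s8: int8_t promotes to int keeping its sign.  */
static void widen_ssum_s8_model (int16_t acc[HALF_QI],
                                 const int8_t x[NUNITS_QI])
{
  for (int i = 0; i < HALF_QI; i++)   /* low half  */
    acc[i] = (int16_t) (acc[i] + x[i]);
  for (int i = 0; i < HALF_QI; i++)   /* high half */
    acc[i] = (int16_t) (acc[i] + x[i + HALF_QI]);
}
```

With all sixteen input bytes equal to 0xFF, each unsigned lane ends up at 255 + 255 = 510, while each signed lane (where the same bit pattern is -1) ends up at -2.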