From patchwork Sat Aug 22 21:38:33 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Michael Collison
X-Patchwork-Id: 52611
Message-ID: <55D8EBD9.20408@linaro.org>
Date: Sat, 22 Aug 2015 14:38:33 -0700
From: Michael Collison
To: Ramana Radhakrishnan, gcc-patches@gcc.gnu.org
Subject: Re: [ARM] Use vector wide add for mixed-mode adds
References: <55D2E483.5050806@linaro.org> <55D3339C.1080807@foss.arm.com>
In-Reply-To: <55D3339C.1080807@foss.arm.com>
X-Original-Sender: michael.collison@linaro.org

This is a modified version of the previous patch that addresses the issue raised by Ramana. The patch now uses vec_select instead of an unspec.

I also had to fix an unrelated issue in the read_name function in read-md.c. The fix corrects broken support for mode iterators inside '<>'. Without this fix, support for rtl expressions such as 'plus:' was broken.

A second change unrelated to this patch corrects the documentation of the standard names for wide add support.

This patch is designed to address code that was not being vectorized due to missing widening patterns in the ARM backend. Code such as:

int t6(int len, void * dummy, short * __restrict x)
{
  len = len & ~31;
  int result = 0;
  __asm volatile ("");
  for (int i = 0; i < len; i++)
    result += x[i];
  return result;
}

Validated on arm-none-eabi, arm-none-linux-gnueabi, arm-none-linux-gnueabihf, and armeb-none-linux-gnueabihf.
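To illustrate what the new expansion computes, here is a minimal scalar C sketch of a signed widening sum over one 128-bit input of eight 16-bit lanes. Two widening adds (low half, then high half) accumulate into the same four 32-bit lanes, which is my reading of the widen_ssum expander in the patch. The function names (model_vaddw_s16, model_widen_ssum, model_reduce) and the fixed lane counts are assumptions made for this example; none of it is GCC code.

```c
#include <assert.h>
#include <stdint.h>

#define NELEM 8            /* lanes in the narrow 128-bit input (like V8HI) */
#define HALF  (NELEM / 2)  /* lanes in the wide accumulator (like V4SI) */

/* Scalar model of one widening add: sign-extend HALF narrow lanes starting
   at OFFSET and add them lane-wise into the wide accumulator.  OFFSET
   selects the low (0) or high (HALF) half of the input.  */
static void
model_vaddw_s16 (int32_t acc[HALF], const int16_t src[NELEM], int offset)
{
  assert (offset == 0 || offset == HALF);
  for (int i = 0; i < HALF; i++)
    acc[i] += (int32_t) src[offset + i];
}

/* Model of the widening-sum expansion: low half first, then high half,
   both accumulating into the same wide register.  */
static void
model_widen_ssum (int32_t acc[HALF], const int16_t src[NELEM])
{
  model_vaddw_s16 (acc, src, 0);
  model_vaddw_s16 (acc, src, HALF);
}

/* Horizontal sum of the accumulator lanes, standing in for the
   reduction a vectorized loop performs after its last iteration.  */
static int32_t
model_reduce (const int32_t acc[HALF])
{
  int32_t sum = 0;
  for (int i = 0; i < HALF; i++)
    sum += acc[i];
  return sum;
}
```

With x = { 1, -2, 3, -4, 32767, -32768, 7, 8 } and a zeroed accumulator, the lanes become { 32768, -32770, 10, 4 } and model_reduce returns 12, the same value a scalar loop over x produces.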
--------------------------------------------------------------------------------------------------------------------------------------------------
2015-08-21 Michael Collison

    * config/arm/neon.md (widen_sum): New patterns where mode is VQI to improve mixed mode vectorization.
    * read-md.c (read_name): Allow mode iterators inside '<>' in rtl expressions.
    * doc/md.texi: Rename [su]sum_widen to widen_[su]sum to reflect correct standard name
    * gcc.target/arm/neon-vaddws16.c: New test.
    * gcc.target/arm/neon-vaddws32.c: New test.
    * gcc.target/arm/neon-vaddwu16.c: New test.
    * gcc.target/arm/neon-vaddwu32.c: New test.
    * gcc.target/arm/neon-vaddwu8.c: New test.

On 08/18/2015 06:31 AM, Ramana Radhakrishnan wrote:
>
> On 18/08/15 08:53, Michael Collison wrote:
>> This patch is designed to address code that was not being vectorized due to missing widening patterns in the ARM backend. Code such as:
>>
>> int t6(int len, void * dummy, short * __restrict x)
>> {
>>   len = len & ~31;
>>   int result = 0;
>>   __asm volatile ("");
>>   for (int i = 0; i < len; i++)
>>     result += x[i];
>>   return result;
>> }
>>
>> Validated on arm-none-eabi, arm-none-linux-gnueabi, arm-none-linux-gnueabihf, and armeb-none-linux-gnueabihf.
>>
>> There is one regression on gcc.dg/vect/slp-reduc-3.c that only occurs when -flto is enabled:
>>
>> gcc.dg/vect/slp-reduc-3.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorizing stmts using SLP" 1
>> gcc.dg/vect/slp-reduc-3.c scan-tree-dump-times vect "vectorizing stmts using SLP" 1
>>
> Interesting, though not sure why that happens without some digging further.
>
>> I could use some feedback on whether this is a regression or issue with the test case.
>> -------------------------------------------------------------------------------------------------------------
>> 2015-08-18 Michael Collison
>>
>> * config/arm/neon.md (widen_sum): New patterns
>> where mode is VQI to improve mixed mode vectorization.
>> * config/arm/unspec.md: Add new unspecs: UNSPEC_VZERO_EXTEND and
>> UNSPEC_VSIGN_EXTEND.
>> * gcc.target/arm/neon-vaddws16.c: New test.
>> * gcc.target/arm/neon-vaddws32.c: New test.
>> * gcc.target/arm/neon-vaddwu16.c: New test.
>> * gcc.target/arm/neon-vaddwu32.c: New test.
>> * gcc.target/arm/neon-vaddwu8.c: New test.
>>
>>
>> diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
>> index 654d9d5..50cb409 100644
>> --- a/gcc/config/arm/neon.md
>> +++ b/gcc/config/arm/neon.md
>> @@ -1174,6 +1174,27 @@
>>
>>  ;; Widening operations
>>
>> +(define_insn_and_split "widen_ssum3"
>> +  [(set (match_operand: 0 "s_register_operand" "=&w")
>> +    (plus: (unspec:
>> +      [(match_operand:VQI 1 "s_register_operand" "w")]
>> +      UNSPEC_VSIGN_EXTEND)
>> +    (match_operand: 2 "s_register_operand" "0")))]
>> +  "TARGET_NEON"
>> +  "#"
>> +  "&& reload_completed"
>
> I notice widen_ssum and widen_usum do not have any documentation with it - can you look to provide some kind of followup documentation for these patterns in md.texi while you are here ?
>
>
>> +  [(const_int 0)]
>> +{
>> +    rtx loreg = simplify_gen_subreg (mode, operands[1], mode, 0);
>> +    rtx hireg = simplify_gen_subreg (mode, operands[1], mode, GET_MODE_SIZE (mode));
>> +
>> +    emit_insn (gen_widen_ssum3 (operands[0], loreg, operands[2]));
>> +    emit_insn (gen_widen_ssum3 (operands[0], hireg, operands[2]));
>> +    DONE;
>> +  }
>> +  [(set_attr "type" "neon_add_widen")
>> +   (set_attr "length" "8")])
> Isn't it better to expand this into
>
> (set (reg:V4SI reg) (plus:V4SI (sign_extend:V4SI (vec_select:V4HI (reg:V8HI ...)
>         (parallel:V8HI (const_vector { 4, 5, 6, 7})))
>     (reg:V4SI reg)))
>
> (set (reg:V4SI reg) (plus:V4SI (sign_extend: V4SI (vec_select:V4HI (reg:V8HI)
>         (parallel: V8HI (const_vector { 0, 1, 2, 3}))))
>
>
>
> That way we can "combine" cases where we have this kind of expressions from the intrinsics - I'm wondering about combinations from vmovl / vadd / vget_low ?
>
> I'd like us to avoid unspecs where we can...
>
>
> regards
> Ramana
>
>> +
>>  (define_insn "widen_ssum3"
>>    [(set (match_operand: 0 "s_register_operand" "=w")
>>      (plus: (sign_extend:
>> @@ -1184,6 +1205,27 @@
>>    [(set_attr "type" "neon_add_widen")]
>>  )
>>
>> +(define_insn_and_split "widen_usum3"
>> +  [(set (match_operand: 0 "s_register_operand" "=&w")
>> +    (plus: (unspec:
>> +      [(match_operand:VQI 1 "s_register_operand" "w")]
>> +      UNSPEC_VZERO_EXTEND)
>> +    (match_operand: 2 "s_register_operand" "0")))]
>> +  "TARGET_NEON"
>> +  "#"
>> +  "&& reload_completed"
>> +  [(const_int 0)]
>> +{
>> +    rtx loreg = simplify_gen_subreg (mode, operands[1], mode, 0);
>> +    rtx hireg = simplify_gen_subreg (mode, operands[1], mode, GET_MODE_SIZE (mode));
>> +
>> +    emit_insn (gen_widen_usum3 (operands[0], loreg, operands[2]));
>> +    emit_insn (gen_widen_usum3 (operands[0], hireg, operands[2]));
>> +    DONE;
>> +  }
>> +  [(set_attr "type" "neon_add_widen")
>> +   (set_attr "length" "8")])
>> +
>>  (define_insn "widen_usum3"
>>    [(set (match_operand: 0 "s_register_operand" "=w")
>>      (plus: (zero_extend:
>> diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
>> index 0ec2c48..e9cf836 100644
>> --- a/gcc/config/arm/unspecs.md
>> +++ b/gcc/config/arm/unspecs.md
>> @@ -358,5 +358,7 @@
>>    UNSPEC_NVRINTX
>>    UNSPEC_NVRINTA
>>    UNSPEC_NVRINTN
>> +  UNSPEC_VZERO_EXTEND
>> +  UNSPEC_VSIGN_EXTEND
>>  ])
>>
>> diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddws16.c b/gcc/testsuite/gcc.target/arm/neon-vaddws16.c
>> new file mode 100644
>> index 0000000..ed10669
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/arm/neon-vaddws16.c
>> @@ -0,0 +1,21 @@
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target arm_neon_hw } */
>> +/* { dg-add-options arm_neon_ok } */
>> +/* { dg-options "-O3" } */
>> +
>> +
>> +int
>> +t6(int len, void * dummy, short * __restrict x)
>> +{
>> +  len = len & ~31;
>> +  int result = 0;
>> +  __asm volatile ("");
>> +  for (int i = 0; i < len; i++)
>> +    result += x[i];
>> +  return result;
>> +}
>> +
>> +/* { dg-final { scan-assembler "vaddw\.s16" } } */
>> +
>> +
>> +
>> diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddws32.c b/gcc/testsuite/gcc.target/arm/neon-vaddws32.c
>> new file mode 100644
>> index 0000000..94bf0c9
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/arm/neon-vaddws32.c
>> @@ -0,0 +1,19 @@
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target arm_neon_hw } */
>> +/* { dg-add-options arm_neon_ok } */
>> +/* { dg-options "-O3" } */
>> +
>> +int
>> +t6(int len, void * dummy, int * __restrict x)
>> +{
>> +  len = len & ~31;
>> +  long long result = 0;
>> +  __asm volatile ("");
>> +  for (int i = 0; i < len; i++)
>> +    result += x[i];
>> +  return result;
>> +}
>> +
>> +/* { dg-final { scan-assembler "vaddw\.s32" } } */
>> +
>> +
>> diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddwu16.c b/gcc/testsuite/gcc.target/arm/neon-vaddwu16.c
>> new file mode 100644
>> index 0000000..98f8768
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/arm/neon-vaddwu16.c
>> @@ -0,0 +1,18 @@
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target arm_neon_hw } */
>> +/* { dg-add-options arm_neon_ok } */
>> +/* { dg-options "-O3" } */
>> +
>> +
>> +int
>> +t6(int len, void * dummy, unsigned short * __restrict x)
>> +{
>> +  len = len & ~31;
>> +  unsigned int result = 0;
>> +  __asm volatile ("");
>> +  for (int i = 0; i < len; i++)
>> +    result += x[i];
>> +  return result;
>> +}
>> +
>> +/* { dg-final { scan-assembler "vaddw.u16" } } */
>> diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddwu32.c b/gcc/testsuite/gcc.target/arm/neon-vaddwu32.c
>> new file mode 100644
>> index 0000000..2e9af56
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/arm/neon-vaddwu32.c
>> @@ -0,0 +1,18 @@
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target arm_neon_hw } */
>> +/* { dg-add-options arm_neon_ok } */
>> +/* { dg-options "-O3" } */
>> +
>> +int
>> +t6(int len, void * dummy, unsigned int * __restrict x)
>> +{
>> +  len = len & ~31;
>> +  unsigned long long result = 0;
>> +  __asm volatile ("");
>> +  for (int i = 0; i < len; i++)
>> +    result += x[i];
>> +  return result;
>> +}
>> +
>> +/* { dg-final { scan-assembler "vaddw\.u32" } } */
>> +
>> diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddwu8.c b/gcc/testsuite/gcc.target/arm/neon-vaddwu8.c
>> new file mode 100644
>> index 0000000..de2ad8a
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/arm/neon-vaddwu8.c
>> @@ -0,0 +1,21 @@
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target arm_neon_hw } */
>> +/* { dg-add-options arm_neon_ok } */
>> +/* { dg-options "-O3" } */
>> +
>> +
>> +int
>> +t6(int len, void * dummy, char * __restrict x)
>> +{
>> +  len = len & ~31;
>> +  unsigned short result = 0;
>> +  __asm volatile ("");
>> +  for (int i = 0; i < len; i++)
>> +    result += x[i];
>> +  return result;
>> +}
>> +
>> +/* { dg-final { scan-assembler "vaddw\.u8" } } */
>> +
>> +
>> +

diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 654d9d5..54623fe 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -1174,6 +1174,57 @@
 
 ;; Widening operations
 
+(define_expand "widen_ssum3"
+  [(set (match_operand: 0 "s_register_operand" "")
+    (plus: (sign_extend: (match_operand:VQI 1 "s_register_operand" ""))
+      (match_operand: 2 "s_register_operand" "")))]
+  "TARGET_NEON"
+  {
+    int i;
+    int half_elem = /2;
+    rtvec v1 = rtvec_alloc (half_elem);
+    rtvec v2 = rtvec_alloc (half_elem);
+    rtx p1, p2;
+
+    for (i = 0; i < half_elem; i++)
+      RTVEC_ELT (v1, i) = GEN_INT (i);
+    p1 = gen_rtx_PARALLEL (GET_MODE (operands[1]), v1);
+
+    for (i = half_elem; i < ; i++)
+      RTVEC_ELT (v2, i - half_elem) = GEN_INT (i);
+    p2 = gen_rtx_PARALLEL (GET_MODE (operands[1]), v2);
+
+    if (operands[0] != operands[2])
+      emit_move_insn (operands[0], operands[2]);
+
+    emit_insn (gen_vec_sel_widen_ssum_lo3 (operands[0], operands[1], p1, operands[0]));
+    emit_insn (gen_vec_sel_widen_ssum_hi3 (operands[0], operands[1], p2, operands[0]));
+    DONE;
+  }
+)
+
+(define_insn "vec_sel_widen_ssum_lo3"
+  [(set (match_operand: 0 "s_register_operand" "=w")
+    (plus: (sign_extend: (vec_select:VW (match_operand:VQI 1 "s_register_operand" "%w")
+        (match_operand:VQI 2 "vect_par_constant_low" "")))
+      (match_operand: 3 "s_register_operand" "0")))]
+  "TARGET_NEON"
+  "vaddw.\t%q0, %q3, %e1"
+  [(set_attr "type" "neon_add_widen")
+   (set_attr "length" "8")]
+)
+
+(define_insn "vec_sel_widen_ssum_hi3"
+  [(set (match_operand: 0 "s_register_operand" "=w")
+    (plus: (sign_extend: (vec_select:VW (match_operand:VQI 1 "s_register_operand" "%w")
+        (match_operand:VQI 2 "vect_par_constant_high" "")))
+      (match_operand: 3 "s_register_operand" "0")))]
+  "TARGET_NEON"
+  "vaddw.\t%q0, %q3, %f1"
+  [(set_attr "type" "neon_add_widen")
+   (set_attr "length" "8")]
+)
+
 (define_insn "widen_ssum3"
   [(set (match_operand: 0 "s_register_operand" "=w")
     (plus: (sign_extend:
@@ -1184,6 +1235,57 @@
   [(set_attr "type" "neon_add_widen")]
 )
 
+(define_expand "widen_usum3"
+  [(set (match_operand: 0 "s_register_operand" "")
+    (plus: (zero_extend: (match_operand:VQI 1 "s_register_operand" ""))
+      (match_operand: 2 "s_register_operand" "")))]
+  "TARGET_NEON"
+  {
+    int i;
+    int half_elem = /2;
+    rtvec v1 = rtvec_alloc (half_elem);
+    rtvec v2 = rtvec_alloc (half_elem);
+    rtx p1, p2;
+
+    for (i = 0; i < half_elem; i++)
+      RTVEC_ELT (v1, i) = GEN_INT (i);
+    p1 = gen_rtx_PARALLEL (GET_MODE (operands[1]), v1);
+
+    for (i = half_elem; i < ; i++)
+      RTVEC_ELT (v2, i - half_elem) = GEN_INT (i);
+    p2 = gen_rtx_PARALLEL (GET_MODE (operands[1]), v2);
+
+    if (operands[0] != operands[2])
+      emit_move_insn (operands[0], operands[2]);
+
+    emit_insn (gen_vec_sel_widen_usum_lo3 (operands[0], operands[1], p1, operands[0]));
+    emit_insn (gen_vec_sel_widen_usum_hi3 (operands[0], operands[1], p2, operands[0]));
+    DONE;
+  }
+)
+
+(define_insn "vec_sel_widen_usum_lo3"
+  [(set (match_operand: 0 "s_register_operand" "=w")
+    (plus: (zero_extend: (vec_select:VW (match_operand:VQI 1 "s_register_operand" "%w")
+        (match_operand:VQI 2 "vect_par_constant_low" "")))
+      (match_operand: 3 "s_register_operand" "0")))]
+  "TARGET_NEON"
+  "vaddw.\t%q0, %q3, %e1"
+  [(set_attr "type" "neon_add_widen")
+   (set_attr "length" "8")]
+)
+
+(define_insn "vec_sel_widen_usum_hi3"
+  [(set (match_operand: 0 "s_register_operand" "=w")
+    (plus: (zero_extend: (vec_select:VW (match_operand:VQI 1 "s_register_operand" "%w")
+        (match_operand:VQI 2 "vect_par_constant_high" "")))
+      (match_operand: 3 "s_register_operand" "0")))]
+  "TARGET_NEON"
+  "vaddw.\t%q0, %q3, %f1"
+  [(set_attr "type" "neon_add_widen")
+   (set_attr "length" "8")]
+)
+
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 0ec229f..7af4183 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -4939,10 +4939,10 @@ is of a wider mode, is computed and added to operand 3. Operand 3 is of a mode
 equal or wider than the mode of the absolute difference. The result is placed
 in operand 0, which is of the same mode as operand 3.
 
-@cindex @code{ssum_widen@var{m3}} instruction pattern
-@item @samp{ssum_widen@var{m3}}
-@cindex @code{usum_widen@var{m3}} instruction pattern
-@itemx @samp{usum_widen@var{m3}}
+@cindex @code{widen_ssum@var{m3}} instruction pattern
+@item @samp{widen_ssum@var{m3}}
+@cindex @code{widen_usum@var{m3}} instruction pattern
+@itemx @samp{widen_usum@var{m3}}
 Operands 0 and 2 are of the same mode, which is wider than the mode of
 operand 1. Add operand 1 to operand 2 and place the widened result in
 operand 0.
 (This is used express accumulation of elements into an accumulator
diff --git a/gcc/read-md.c b/gcc/read-md.c
index 9f158ec..df5748f 100644
--- a/gcc/read-md.c
+++ b/gcc/read-md.c
@@ -399,16 +399,24 @@ read_name (struct md_name *name)
 {
   int c;
   size_t i;
+  int in_angle_bracket;
 
   c = read_skip_spaces ();
 
   i = 0;
+  in_angle_bracket = 0;
   while (1)
     {
+      if (c == '<')
+        in_angle_bracket = 1;
+
+      if (c == '>')
+        in_angle_bracket = 0;
+
       if (c == ' ' || c == '\n' || c == '\t' || c == '\f' || c == '\r'
           || c == EOF)
         break;
-      if (c == ':' || c == ')' || c == ']' || c == '"' || c == '/'
+      if (((c == ':') && (in_angle_bracket == 0)) || c == ')' || c == ']' || c == '"' || c == '/'
           || c == '(' || c == '[')
         {
           unread_char (c);
diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddws16.c b/gcc/testsuite/gcc.target/arm/neon-vaddws16.c
new file mode 100644
index 0000000..ed10669
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-vaddws16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_hw } */
+/* { dg-add-options arm_neon_ok } */
+/* { dg-options "-O3" } */
+
+
+int
+t6(int len, void * dummy, short * __restrict x)
+{
+  len = len & ~31;
+  int result = 0;
+  __asm volatile ("");
+  for (int i = 0; i < len; i++)
+    result += x[i];
+  return result;
+}
+
+/* { dg-final { scan-assembler "vaddw\.s16" } } */
+
+
+
diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddws32.c b/gcc/testsuite/gcc.target/arm/neon-vaddws32.c
new file mode 100644
index 0000000..94bf0c9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-vaddws32.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_hw } */
+/* { dg-add-options arm_neon_ok } */
+/* { dg-options "-O3" } */
+
+int
+t6(int len, void * dummy, int * __restrict x)
+{
+  len = len & ~31;
+  long long result = 0;
+  __asm volatile ("");
+  for (int i = 0; i < len; i++)
+    result += x[i];
+  return result;
+}
+
+/* { dg-final { scan-assembler "vaddw\.s32" } } */
+
+
diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddwu16.c b/gcc/testsuite/gcc.target/arm/neon-vaddwu16.c
new file mode 100644
index 0000000..98f8768
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-vaddwu16.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_hw } */
+/* { dg-add-options arm_neon_ok } */
+/* { dg-options "-O3" } */
+
+
+int
+t6(int len, void * dummy, unsigned short * __restrict x)
+{
+  len = len & ~31;
+  unsigned int result = 0;
+  __asm volatile ("");
+  for (int i = 0; i < len; i++)
+    result += x[i];
+  return result;
+}
+
+/* { dg-final { scan-assembler "vaddw.u16" } } */
diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddwu32.c b/gcc/testsuite/gcc.target/arm/neon-vaddwu32.c
new file mode 100644
index 0000000..2e9af56
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-vaddwu32.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_hw } */
+/* { dg-add-options arm_neon_ok } */
+/* { dg-options "-O3" } */
+
+int
+t6(int len, void * dummy, unsigned int * __restrict x)
+{
+  len = len & ~31;
+  unsigned long long result = 0;
+  __asm volatile ("");
+  for (int i = 0; i < len; i++)
+    result += x[i];
+  return result;
+}
+
+/* { dg-final { scan-assembler "vaddw\.u32" } } */
+
diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddwu8.c b/gcc/testsuite/gcc.target/arm/neon-vaddwu8.c
new file mode 100644
index 0000000..de2ad8a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-vaddwu8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_hw } */
+/* { dg-add-options arm_neon_ok } */
+/* { dg-options "-O3" } */
+
+
+int
+t6(int len, void * dummy, char * __restrict x)
+{
+  len = len & ~31;
+  unsigned short result = 0;
+  __asm volatile ("");
+  for (int i = 0; i < len; i++)
+    result += x[i];
+  return result;
+}
+
+/* { dg-final { scan-assembler "vaddw\.u8" } } */
+
+
+
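The read_name change in the read-md.c hunk can be tried in isolation with the small C sketch below. It is not the GCC function: scan_name, its buffer-based interface, and sample inputs such as "<ITER:attr>" are hypothetical stand-ins chosen for the example, whereas the real code pulls characters from the .md file with read_skip_spaces and unread_char. The sketch only shows the one idea in the hunk, namely that a ':' ends a name unless it appears between '<' and '>'.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for read_name: copy the leading name of SRC into
   BUF (capacity CAP, including the terminating NUL) and return the number
   of characters copied.  A ':' ends the name only when it appears outside
   a '<' ... '>' pair.  */
static size_t
scan_name (const char *src, char *buf, size_t cap)
{
  size_t i = 0;
  int in_angle_bracket = 0;

  assert (cap > 0);
  while (src[i] != '\0' && i + 1 < cap)
    {
      char c = src[i];

      /* Track whether we are inside an iterator reference.  */
      if (c == '<')
        in_angle_bracket = 1;
      if (c == '>')
        in_angle_bracket = 0;

      /* Whitespace always terminates the name.  */
      if (c == ' ' || c == '\n' || c == '\t' || c == '\f' || c == '\r')
        break;

      /* Punctuation terminates it too, except ':' inside '<>'.  */
      if ((c == ':' && !in_angle_bracket) || c == ')' || c == ']'
          || c == '"' || c == '/' || c == '(' || c == '[')
        break;

      buf[i] = c;
      i++;
    }
  buf[i] = '\0';
  return i;
}
```

For the input "<ITER:attr> 0" the sketch copies "<ITER:attr>" and returns 11, while for "plus:SI" it stops at the colon and returns 4. Like the patch, it keeps a single flag rather than a nesting depth, so nested angle brackets are not handled.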