From patchwork Tue Aug 18 07:53:39 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Michael Collison
X-Patchwork-Id: 52488
Message-ID: <55D2E483.5050806@linaro.org>
Date: Tue, 18 Aug 2015 00:53:39 -0700
From: Michael Collison
To: gcc-patches@gcc.gnu.org
Subject: [ARM] Use vector wide add for mixed-mode adds

This patch addresses code that was not being vectorized because the ARM
backend was missing widening patterns. Code such as:

int
t6(int len, void * dummy, short * __restrict x)
{
  len = len & ~31;
  int result = 0;
  __asm volatile ("");
  for (int i = 0; i < len; i++)
    result += x[i];
  return result;
}

Validated on arm-none-eabi, arm-none-linux-gnueabi,
arm-none-linux-gnueabihf, and armeb-none-linux-gnueabihf. There is one
regression on gcc.dg/vect/slp-reduc-3.c that only occurs when -flto is
enabled:

gcc.dg/vect/slp-reduc-3.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorizing stmts using SLP" 1
gcc.dg/vect/slp-reduc-3.c scan-tree-dump-times vect "vectorizing stmts using SLP" 1

I could use some feedback on whether this is a regression or an issue
with the test case.

-------------------------------------------------------------------------------------------------------------

2015-08-18  Michael Collison

	* config/arm/neon.md (widen_sum): New patterns where mode is VQI
	to improve mixed mode vectorization.
	* config/arm/unspecs.md: Add new unspecs: UNSPEC_VZERO_EXTEND and
	UNSPEC_VSIGN_EXTEND.
	* gcc.target/arm/neon-vaddws16.c: New test.
	* gcc.target/arm/neon-vaddws32.c: New test.
	* gcc.target/arm/neon-vaddwu16.c: New test.
	* gcc.target/arm/neon-vaddwu32.c: New test.
	* gcc.target/arm/neon-vaddwu8.c: New test.

diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 654d9d5..50cb409 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -1174,6 +1174,27 @@
 
 ;; Widening operations
 
+(define_insn_and_split "widen_ssum3"
+  [(set (match_operand: 0 "s_register_operand" "=&w")
+        (plus: (unspec:
+        [(match_operand:VQI 1 "s_register_operand" "w")]
+        UNSPEC_VSIGN_EXTEND)
+        (match_operand: 2 "s_register_operand" "0")))]
+  "TARGET_NEON"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+    rtx loreg = simplify_gen_subreg (mode, operands[1], mode, 0);
+    rtx hireg = simplify_gen_subreg (mode, operands[1], mode, GET_MODE_SIZE (mode));
+
+    emit_insn (gen_widen_ssum3 (operands[0], loreg, operands[2]));
+    emit_insn (gen_widen_ssum3 (operands[0], hireg, operands[2]));
+    DONE;
+  }
+  [(set_attr "type" "neon_add_widen")
+   (set_attr "length" "8")])
+
 (define_insn "widen_ssum3"
   [(set (match_operand: 0 "s_register_operand" "=w")
         (plus: (sign_extend:
@@ -1184,6 +1205,27 @@
   [(set_attr "type" "neon_add_widen")]
 )
 
+(define_insn_and_split "widen_usum3"
+  [(set (match_operand: 0 "s_register_operand" "=&w")
+        (plus: (unspec:
+        [(match_operand:VQI 1 "s_register_operand" "w")]
+        UNSPEC_VZERO_EXTEND)
+        (match_operand: 2 "s_register_operand" "0")))]
+  "TARGET_NEON"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+    rtx loreg = simplify_gen_subreg (mode, operands[1], mode, 0);
+    rtx hireg = simplify_gen_subreg (mode, operands[1], mode, GET_MODE_SIZE (mode));
+
+    emit_insn (gen_widen_usum3 (operands[0], loreg, operands[2]));
+    emit_insn (gen_widen_usum3 (operands[0], hireg, operands[2]));
+    DONE;
+  }
+  [(set_attr "type" "neon_add_widen")
+   (set_attr "length" "8")])
+
 (define_insn "widen_usum3"
   [(set (match_operand: 0 "s_register_operand" "=w")
         (plus: (zero_extend:
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index 0ec2c48..e9cf836 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -358,5 +358,7 @@
   UNSPEC_NVRINTX
   UNSPEC_NVRINTA
   UNSPEC_NVRINTN
+  UNSPEC_VZERO_EXTEND
+  UNSPEC_VSIGN_EXTEND
 ])
 
diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddws16.c b/gcc/testsuite/gcc.target/arm/neon-vaddws16.c
new file mode 100644
index 0000000..ed10669
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-vaddws16.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_hw } */
+/* { dg-add-options arm_neon_ok } */
+/* { dg-options "-O3" } */
+
+
+int
+t6(int len, void * dummy, short * __restrict x)
+{
+  len = len & ~31;
+  int result = 0;
+  __asm volatile ("");
+  for (int i = 0; i < len; i++)
+    result += x[i];
+  return result;
+}
+
+/* { dg-final { scan-assembler "vaddw\.s16" } } */
+
+
+
diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddws32.c b/gcc/testsuite/gcc.target/arm/neon-vaddws32.c
new file mode 100644
index 0000000..94bf0c9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-vaddws32.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_hw } */
+/* { dg-add-options arm_neon_ok } */
+/* { dg-options "-O3" } */
+
+int
+t6(int len, void * dummy, int * __restrict x)
+{
+  len = len & ~31;
+  long long result = 0;
+  __asm volatile ("");
+  for (int i = 0; i < len; i++)
+    result += x[i];
+  return result;
+}
+
+/* { dg-final { scan-assembler "vaddw\.s32" } } */
+
+
diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddwu16.c b/gcc/testsuite/gcc.target/arm/neon-vaddwu16.c
new file mode 100644
index 0000000..98f8768
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-vaddwu16.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_hw } */
+/* { dg-add-options arm_neon_ok } */
+/* { dg-options "-O3" } */
+
+
+int
+t6(int len, void * dummy, unsigned short * __restrict x)
+{
+  len = len & ~31;
+  unsigned int result = 0;
+  __asm volatile ("");
+  for (int i = 0; i < len; i++)
+    result += x[i];
+  return result;
+}
+
+/* { dg-final { scan-assembler "vaddw\.u16" } } */
diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddwu32.c b/gcc/testsuite/gcc.target/arm/neon-vaddwu32.c
new file mode 100644
index 0000000..2e9af56
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-vaddwu32.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_hw } */
+/* { dg-add-options arm_neon_ok } */
+/* { dg-options "-O3" } */
+
+int
+t6(int len, void * dummy, unsigned int * __restrict x)
+{
+  len = len & ~31;
+  unsigned long long result = 0;
+  __asm volatile ("");
+  for (int i = 0; i < len; i++)
+    result += x[i];
+  return result;
+}
+
+/* { dg-final { scan-assembler "vaddw\.u32" } } */
+
diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddwu8.c b/gcc/testsuite/gcc.target/arm/neon-vaddwu8.c
new file mode 100644
index 0000000..de2ad8a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-vaddwu8.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_hw } */
+/* { dg-add-options arm_neon_ok } */
+/* { dg-options "-O3" } */
+
+
+int
+t6(int len, void * dummy, char * __restrict x)
+{
+  len = len & ~31;
+  unsigned short result = 0;
+  __asm volatile ("");
+  for (int i = 0; i < len; i++)
+    result += x[i];
+  return result;
+}
+
+/* { dg-final { scan-assembler "vaddw\.u8" } } */
+
+
+
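
For readers following the split in the new widen_ssum and widen_usum patterns, the arithmetic they appear intended to perform can be modelled in portable C, without NEON intrinsics. This is only a scalar sketch under stated assumptions: the function names (model_vaddw_s16, model_widen_ssum_v8hi, model_sum) are hypothetical and do not appear in the patch, the 8-lane input stands in for a quad register of shorts, and each half is widened into the same 4-lane accumulator, which is how the two emitted widening adds seem meant to behave.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of one vaddw.s16: add four sign-extended
   16-bit lanes into four 32-bit accumulator lanes.  */
void
model_vaddw_s16 (int32_t acc[4], const int16_t half[4])
{
  for (int lane = 0; lane < 4; lane++)
    acc[lane] += (int32_t) half[lane];
}

/* Hypothetical model of the split: the 8-lane input is consumed as a
   low half and a high half, each widened into the same accumulator.  */
void
model_widen_ssum_v8hi (int32_t acc[4], const int16_t vec[8])
{
  model_vaddw_s16 (acc, vec);      /* low half (lanes 0..3)  */
  model_vaddw_s16 (acc, vec + 4);  /* high half (lanes 4..7) */
}

/* Sum LEN shorts the way a vectorized reduction would: accumulate per
   lane, then add the lanes together once at the end.  Assumes LEN is a
   non-negative multiple of 8.  */
int
model_sum (const int16_t *x, int len)
{
  int32_t acc[4] = { 0, 0, 0, 0 };

  assert (len >= 0 && len % 8 == 0);
  for (int i = 0; i < len; i += 8)
    model_widen_ssum_v8hi (acc, x + i);
  return acc[0] + acc[1] + acc[2] + acc[3];
}
```

Calling model_sum on sixteen elements that are all 32767 returns 524272, a value that does not fit in 16 bits; that is why the accumulator lanes have to be wider than the input lanes.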