From patchwork Thu Feb 23 21:40:39 2012
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrew Stubbs
X-Patchwork-Id: 6909
Message-ID: <4F46B257.3000505@codesourcery.com>
Date: Thu, 23 Feb 2012 21:40:39 +0000
From: Andrew Stubbs
To: "gcc-patches@gcc.gnu.org"
CC: "patches@linaro.org"
Subject: [PATCH][ARM] core -> NEON extend

Hi All,

This patch changes how SImode-to-DImode extends are done when the value
also moves from core registers to VFP/NEON registers.

Currently, the compiler does the extend in core registers first, and then
does the move. This adds to register pressure, which I would imagine to
be a bad thing. If the value is not in a properly aligned register (the
first parameter to a function never is) then it also has to move that
around.

With my patch, it first moves the SImode value into the NEON register,
and then extends it, which uses no extra registers.

Zero extend, before and after (assuming the value is passed in r0):

   mov   r2, r0         |  vdup.32   d16, r0
   movs  r3, #0         |  vshr.u64  d16, d16, #32
   fmdrr d16, r2, r3    |

Sign extend:

   mov   r2, r0         |  vdup.32   d16, r0
   asrs  r3, r0, #31    |  vshr.s64  d16, d16, #32
   fmdrr d16, r2, r3    |

OK for 4.8?

Andrew

P.S. I have experimented with doing zero-extends something like

   vmov.i64 d7, #0
   fmsr     s14, r0

But, somehow the immediate load doesn't seem to work, and it limits the
target register to VFP_LO_REGS. It's also not possible to load into only
s15, so I'm not sure there's any advantage.

2012-02-23  Andrew Stubbs

    gcc/
    * config/arm/arm.md (zero_extenddi2): Add extra alternatives for
    NEON registers.
    (extenddi2): Likewise.
    Prevent extend splitters doing NEON alternatives.
    * config/arm/iterators.md (qhs_extenddi_cstr, qhs_zextenddi_cstr):
    Adjust constraints to add new alternatives.
    * config/arm/neon.md: Add splitters for zero- and sign-extend.

    gcc/testsuite/
    * gcc.target/arm/neon-extend-1.c: New file.
    * gcc.target/arm/neon-extend-2.c: New file.

---
 gcc/config/arm/arm.md                        |   26 +++++++++++++++-----------
 gcc/config/arm/iterators.md                  |    4 ++--
 gcc/config/arm/neon.md                       |   22 ++++++++++++++++++++++
 gcc/testsuite/gcc.target/arm/neon-extend-1.c |   13 +++++++++++++
 gcc/testsuite/gcc.target/arm/neon-extend-2.c |   13 +++++++++++++
 5 files changed, 65 insertions(+), 13 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/neon-extend-1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/neon-extend-2.c

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 182c52a..35bf688 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -4479,33 +4479,35 @@
 ;; Zero and sign extension instructions.
 
 (define_insn "zero_extenddi2"
-  [(set (match_operand:DI 0 "s_register_operand" "=r")
+  [(set (match_operand:DI 0 "s_register_operand" "=w, r")
         (zero_extend:DI (match_operand:QHSI 1 ""
                                             "")))]
   "TARGET_32BIT "
   "#"
-  [(set_attr "length" "8")
-   (set_attr "ce_count" "2")
-   (set_attr "predicable" "yes")]
+  [(set_attr "length" "8,8")
+   (set_attr "ce_count" "2,2")
+   (set_attr "predicable" "yes,yes")]
 )
 
 (define_insn "extenddi2"
-  [(set (match_operand:DI 0 "s_register_operand" "=r")
+  [(set (match_operand:DI 0 "s_register_operand" "=w,r")
         (sign_extend:DI (match_operand:QHSI 1 ""
                                             "")))]
   "TARGET_32BIT "
   "#"
-  [(set_attr "length" "8")
-   (set_attr "ce_count" "2")
-   (set_attr "shift" "1")
-   (set_attr "predicable" "yes")]
+  [(set_attr "length" "8,8")
+   (set_attr "ce_count" "2,2")
+   (set_attr "shift" "1,1")
+   (set_attr "predicable" "yes,yes")]
 )
 
 ;; Splits for all extensions to DImode
 (define_split
   [(set (match_operand:DI 0 "s_register_operand" "")
         (zero_extend:DI (match_operand 1 "nonimmediate_operand" "")))]
-  "TARGET_32BIT"
+  "TARGET_32BIT && (!TARGET_NEON
+                    || (reload_completed
+                        && !(IS_VFP_REGNUM (REGNO (operands[0])))))"
   [(set (match_dup 0) (match_dup 1))]
 {
   rtx lo_part = gen_lowpart (SImode, operands[0]);
@@ -4531,7 +4533,9 @@
 (define_split
   [(set (match_operand:DI 0 "s_register_operand" "")
         (sign_extend:DI (match_operand 1 "nonimmediate_operand" "")))]
-  "TARGET_32BIT"
+  "TARGET_32BIT && (!TARGET_NEON
+                    || (reload_completed
+                        && !(IS_VFP_REGNUM (REGNO (operands[0])))))"
   [(set (match_dup 0) (ashiftrt:SI (match_dup 1) (const_int 31)))]
 {
   rtx lo_part = gen_lowpart (SImode, operands[0]);
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 1567264..07ac5da 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -409,8 +409,8 @@
 (define_mode_attr qhs_extenddi_op [(SI "s_register_operand")
                                    (HI "nonimmediate_operand")
                                    (QI "arm_reg_or_extendqisi_mem_op")])
-(define_mode_attr qhs_extenddi_cstr [(SI "r") (HI "rm") (QI "rUq")])
-(define_mode_attr qhs_zextenddi_cstr [(SI "r") (HI "rm") (QI "rm")])
+(define_mode_attr qhs_extenddi_cstr [(SI "r,r") (HI "r,rm") (QI "r,rUq")])
+(define_mode_attr qhs_zextenddi_cstr [(SI "r,r") (HI "r,rm") (QI "r,rm")])
 
 ;; Mode attributes used for fixed-point support.
 (define_mode_attr qaddsub_suf [(V4UQQ "8") (V2UHQ "16") (UQQ "8") (UHQ "16")
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 6492721..618d59d 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -5879,3 +5879,25 @@
                  (const_string "neon_fp_vadd_qqq_vabs_qq"))
               (const_string "neon_int_5")))]
 )
+
+;; Copy from core-to-neon regs, then extend, not vice-versa
+
+(define_split
+  [(set (match_operand:DI 0 "s_register_operand" "")
+        (sign_extend:DI (match_operand:SI 1 "s_register_operand" "")))]
+  "TARGET_NEON && reload_completed && IS_VFP_REGNUM (REGNO (operands[0]))"
+  [(set (match_dup 2) (vec_duplicate:V2SI (match_dup 1)))
+   (set (match_dup 0) (ashiftrt:DI (match_dup 0) (const_int 32)))]
+  {
+    operands[2] = gen_rtx_REG (V2SImode, REGNO (operands[0]));
+  })
+
+(define_split
+  [(set (match_operand:DI 0 "s_register_operand" "")
+        (zero_extend:DI (match_operand:SI 1 "s_register_operand" "")))]
+  "TARGET_NEON && reload_completed && IS_VFP_REGNUM (REGNO (operands[0]))"
+  [(set (match_dup 2) (vec_duplicate:V2SI (match_dup 1)))
+   (set (match_dup 0) (lshiftrt:DI (match_dup 0) (const_int 32)))]
+  {
+    operands[2] = gen_rtx_REG (V2SImode, REGNO (operands[0]));
+  })
diff --git a/gcc/testsuite/gcc.target/arm/neon-extend-1.c b/gcc/testsuite/gcc.target/arm/neon-extend-1.c
new file mode 100644
index 0000000..cfe83ce
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-extend-1.c
@@ -0,0 +1,13 @@
+/* { dg-require-effective-target arm_neon_hw } */
+/* { dg-options "-O2" } */
+/* { dg-add-options arm_neon } */
+
+void
+f (unsigned int a)
+{
+  unsigned long long b = a;
+  asm volatile ("@ extended to %0" : : "w" (b));
+}
+
+/* { dg-final { scan-assembler "vdup.32" } } */
+/* { dg-final { scan-assembler "vshr.u64" } } */
diff --git a/gcc/testsuite/gcc.target/arm/neon-extend-2.c b/gcc/testsuite/gcc.target/arm/neon-extend-2.c
new file mode 100644
index 0000000..1c5a17e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-extend-2.c
@@ -0,0 +1,13 @@
+/* { dg-require-effective-target arm_neon_hw } */
+/* { dg-options "-O2" } */
+/* { dg-add-options arm_neon } */
+
+void
+f (int a)
+{
+  long long b = a;
+  asm volatile ("@ extended to %0" : : "w" (b));
+}
+
+/* { dg-final { scan-assembler "vdup.32" } } */
+/* { dg-final { scan-assembler "vshr.s64" } } */
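
The vdup.32 / vshr sequence used by the new splitters rests on a simple identity: if a 32-bit value is copied into both halves of a 64-bit register, shifting the whole register right by 32 leaves the zero extension (logical shift) or the sign extension (arithmetic shift) of that value. The following is only a rough C model of that identity, not part of the patch. The function names are invented for illustration, and a plain uint64_t stands in for a D register.

```c
#include <assert.h>
#include <stdint.h>

/* Model of "vdup.32 dN, rM": copy the 32-bit value into both halves
   of a 64-bit register (hypothetical helper, not a GCC or NEON API).  */
uint64_t dup32(uint32_t x)
{
    return ((uint64_t)x << 32) | x;
}

/* Model of "vshr.u64 dN, dN, #32": a logical shift discards the low
   copy and fills the top half with zeros, i.e. zero extension.  */
uint64_t zext_via_dup_shift(uint32_t x)
{
    return dup32(x) >> 32;
}

/* Model of "vshr.s64 dN, dN, #32": an arithmetic shift fills the top
   half with copies of bit 63, which equals bit 31 of the input.
   Written with unsigned arithmetic because >> on a negative signed
   value is implementation-defined in C.  */
uint64_t sext_via_dup_shift(uint32_t x)
{
    uint64_t v = dup32(x);
    uint64_t fill = (v >> 63) ? 0xFFFFFFFF00000000ull : 0ull;
    return (v >> 32) | fill;
}

/* A few spot checks of the model.  */
void check_extend_model(void)
{
    assert(zext_via_dup_shift(0x80000000u) == 0x0000000080000000ull);
    assert(sext_via_dup_shift(0x80000000u) == 0xFFFFFFFF80000000ull);
    assert(zext_via_dup_shift(0x7FFFFFFFu) == 0x000000007FFFFFFFull);
    assert(sext_via_dup_shift(0x7FFFFFFFu) == 0x000000007FFFFFFFull);
    assert(sext_via_dup_shift(0xFFFFFFFFu) == 0xFFFFFFFFFFFFFFFFull);
}
```

In this model the duplicate step is what makes a single shift sufficient: the copy in the upper half provides both the payload that ends up in the low word and the sign bit that the arithmetic shift replicates.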