From: Andrew Stubbs
Date: Thu, 23 Feb 2012 20:36:06 +0000
To: "gcc-patches@gcc.gnu.org"
CC: "patches@linaro.org"
Subject: Re: [PATCH][ARM] 64-bit shifts in NEON.
Message-ID: <4F46A336.5000503@codesourcery.com>
References: <4F2FD216.6090507@codesourcery.com> <4F43B707.7070908@codesourcery.com>
In-Reply-To: <4F43B707.7070908@codesourcery.com>

On 21/02/12 15:23, Andrew Stubbs wrote:
> On 06/02/12 13:13, Andrew Stubbs wrote:
>> This patch adds DImode shift support in NEON registers/instructions.
>>
>> The patch delays any lowering until the split2 pass, after the
>> register allocator has chosen whether to do the shift in NEON (VFP)
>> registers or in core-registers.
>>
>> The core-registers case depends on the patch I previously posted here:
>> http://gcc.gnu.org/ml/gcc-patches/2012-01/msg01472.html
>>
>> The NEON right-shifts make life more interesting by using a left-shift
>> instruction with a negative offset. This means that the amount has to be
>> negated. Ideally you'd want to do this at expand time, but the delayed
>> NEON/core decision makes this impossible, so I've chosen to expand this
>> in the post-reload split pass. Unfortunately, NEON does not provide a
>> suitable instruction for negating the shift amount, so that ends up
>> happening in core-registers.
>>
>> Another complication is that the NEON shift instructions use a 64-bit
>> register for the shift amount, but they only pay attention to the bottom
>> 8 bits. I did experiment with using a DImode shift amount, but that
>> didn't work out well; there were unnecessary extends and the
>> core-registers fall-back was less efficient.
>>
>> Therefore, I've chosen to create a new register class, VFP_LO_REGS_EVEN,
>> which includes only the 32-bit low parts of the DImode NEON registers, so
>> the shift amount can be loaded into VFP regs without extending it.
>> This required a new print format 'E' that converts the low-part name to
>> the full register name the instructions need. Unfortunately, this does
>> artificially limit the shift amount to the bottom half of the register
>> set, but hopefully that's not going to be a big problem.
>>
>> The register allocator is causing me trouble, though. The problem is
>> that the compiler just refused to use the NEON variant in all of my toy
>> examples. It turns out to be simply that the IRA & reload passes do not
>> change hard-registers already present in the RTL (function parameters,
>> return values, etc.) unless there is absolutely no alternative that
>> works with that register. I'm not sure whether anything can be done
>> about this. I'm not even sure it isn't the right choice much of the
>> time, cost-wise.
>
> I've now updated the patch to take size optimization into account.
>
> Currently, when optimizing for size the compiler prefers to call the
> libgcc function rather than do the shift inline.
>
> With my old patch, when NEON was enabled it always used the inline code
> (either in NEON or core-registers) no matter which optimization flags
> were set. This is more-or-less correct if the register allocator chooses
> to do the operation in NEON, but much less space efficient otherwise.
>
> The update simply disables the core-registers fall-back option when
> optimizing for size. Transferring the values to NEON registers and back
> should be roughly the same size as calling a function, so there
> shouldn't be a big loss.
>
> I'm in two minds about the shift-by-constant cases though, since they
> expand to fewer instructions. Any thoughts?

And yet another update. This time I noticed that I didn't discard the
"clobber"s after the split had determined they were no longer necessary.
Presumably the unallocated "match_scratch"es were harmless, but the
unnecessary CC clobbers could affect if-conversion and scheduling.

This patch is the same as the previous one, except that I've broken out
the alternatives that don't need any clobbers.

Ok for 4.8?

Andrew

2012-02-21  Andrew Stubbs

	gcc/
	* config/arm/arm.c (arm_print_operand): Add new 'E' format code.
	* config/arm/arm.h (enum reg_class): Add VFP_LO_REGS_EVEN.
	(REG_CLASS_NAMES, REG_CLASS_CONTENTS, IS_VFP_CLASS): Likewise.
	* config/arm/arm.md (ashldi3): Add TARGET_NEON case.
	(ashrdi3, lshrdi3): Likewise.
	* config/arm/constraints.md (T): New register constraint.
	(Pe, P1, Pf, Pg): New constraints.
	* config/arm/neon.md (signed_shift_di3_neon, unsigned_shift_di3_neon,
	ashldi3_neon, ashldi3_neon_noclobber, ashrdi3_neon_imm,
	ashrdi3_neon_imm_noclobber, ashrdi3_neon_reg, ashrdi3_neon,
	lshrdi3_neon_imm, lshrdi3_neon_imm_noclobber, lshrdi3_neon_reg,
	lshrdi3_neon): New patterns.
	* config/arm/predicates.md (int_0_to_63): New predicate.
	(shift_amount_64): New predicate.

---
 gcc/config/arm/arm.c          |   18 +++
 gcc/config/arm/arm.h          |    5 +
 gcc/config/arm/arm.md         |   33 ++++-
 gcc/config/arm/constraints.md |   30 ++++
 gcc/config/arm/neon.md        |  290 +++++++++++++++++++++++++++++++++++++++++
 gcc/config/arm/predicates.md  |    8 +
 6 files changed, 374 insertions(+), 10 deletions(-)

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 386231a..65ccd91 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -17661,6 +17661,24 @@ arm_print_operand (FILE *stream, rtx x, int code)
         }
       return;
 
+    /* Print the VFP/Neon double precision register name that overlaps the
+       given single-precision register.  */
+    case 'E':
+      {
+        int mode = GET_MODE (x);
+
+        if (GET_MODE_SIZE (mode) != 4
+            || GET_CODE (x) != REG
+            || !IS_VFP_REGNUM (REGNO (x)))
+          {
+            output_operand_lossage ("invalid operand for code '%c'", code);
+            return;
+          }
+
+        fprintf (stream, "d%d", (REGNO (x) - FIRST_VFP_REGNUM) >> 1);
+      }
+      return;
+
     /* These two codes print the low/high doubleword register of a Neon quad
        register, respectively.  For pair-structure types, can also print
        low/high quadword registers.  */
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 5a78125..6f0df83 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -1061,6 +1061,7 @@ enum reg_class
   CIRRUS_REGS,
   VFP_D0_D7_REGS,
   VFP_LO_REGS,
+  VFP_LO_REGS_EVEN,
   VFP_HI_REGS,
   VFP_REGS,
   IWMMXT_GR_REGS,
@@ -1087,6 +1088,7 @@ enum reg_class
   "CIRRUS_REGS",        \
   "VFP_D0_D7_REGS",     \
   "VFP_LO_REGS",        \
+  "VFP_LO_REGS_EVEN",   \
   "VFP_HI_REGS",        \
   "VFP_REGS",           \
   "IWMMXT_GR_REGS",     \
@@ -1112,6 +1114,7 @@ enum reg_class
   { 0xF8000000, 0x000007FF, 0x00000000, 0x00000000 }, /* CIRRUS_REGS */      \
   { 0x00000000, 0x80000000, 0x00007FFF, 0x00000000 }, /* VFP_D0_D7_REGS */   \
   { 0x00000000, 0x80000000, 0x7FFFFFFF, 0x00000000 }, /* VFP_LO_REGS */      \
+  { 0x00000000, 0x80000000, 0x2AAAAAAA, 0x00000000 }, /* VFP_LO_REGS_EVEN */ \
   { 0x00000000, 0x00000000, 0x80000000, 0x7FFFFFFF }, /* VFP_HI_REGS */      \
   { 0x00000000, 0x80000000, 0xFFFFFFFF, 0x7FFFFFFF }, /* VFP_REGS */         \
   { 0x00000000, 0x00007800, 0x00000000, 0x00000000 }, /* IWMMXT_GR_REGS */   \
@@ -1129,7 +1132,7 @@ enum reg_class
 
 /* Any of the VFP register classes.  */
 #define IS_VFP_CLASS(X) \
-  ((X) == VFP_D0_D7_REGS || (X) == VFP_LO_REGS \
+  ((X) == VFP_D0_D7_REGS || (X) == VFP_LO_REGS || (X) == VFP_LO_REGS_EVEN \
    || (X) == VFP_HI_REGS || (X) == VFP_REGS)
 
 /* The same information, inverted:
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 7910bae..182c52a 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -3466,8 +3466,15 @@
                    (match_operand:SI 2 "reg_or_int_operand" "")))]
   "TARGET_32BIT"
   "
-  if (!CONST_INT_P (operands[2])
-      && (TARGET_REALLY_IWMMXT || (TARGET_HARD_FLOAT && TARGET_MAVERICK)))
+  if (TARGET_NEON)
+    {
+      /* Delay the decision whether to use NEON or core-regs until
+         register allocation.  */
+      emit_insn (gen_ashldi3_neon (operands[0], operands[1], operands[2]));
+      DONE;
+    }
+  else if (!CONST_INT_P (operands[2])
+           && (TARGET_REALLY_IWMMXT || (TARGET_HARD_FLOAT && TARGET_MAVERICK)))
     ; /* No special preparation statements; expand pattern as above.  */
   else
     {
@@ -3541,8 +3548,15 @@
                    (match_operand:SI 2 "reg_or_int_operand" "")))]
   "TARGET_32BIT"
   "
-  if (!CONST_INT_P (operands[2])
-      && (TARGET_REALLY_IWMMXT || (TARGET_HARD_FLOAT && TARGET_MAVERICK)))
+  if (TARGET_NEON)
+    {
+      /* Delay the decision whether to use NEON or core-regs until
+         register allocation.  */
+      emit_insn (gen_ashrdi3_neon (operands[0], operands[1], operands[2]));
+      DONE;
+    }
+  else if (!CONST_INT_P (operands[2])
+           && (TARGET_REALLY_IWMMXT || (TARGET_HARD_FLOAT && TARGET_MAVERICK)))
     ; /* No special preparation statements; expand pattern as above.  */
   else
     {
@@ -3614,8 +3628,15 @@
                    (match_operand:SI 2 "reg_or_int_operand" "")))]
   "TARGET_32BIT"
   "
-  if (!CONST_INT_P (operands[2])
-      && (TARGET_REALLY_IWMMXT || (TARGET_HARD_FLOAT && TARGET_MAVERICK)))
+  if (TARGET_NEON)
+    {
+      /* Delay the decision whether to use NEON or core-regs until
+         register allocation.  */
+      emit_insn (gen_lshrdi3_neon (operands[0], operands[1], operands[2]));
+      DONE;
+    }
+  else if (!CONST_INT_P (operands[2])
+           && (TARGET_REALLY_IWMMXT || (TARGET_HARD_FLOAT && TARGET_MAVERICK)))
     ; /* No special preparation statements; expand pattern as above.  */
   else
     {
diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
index 7d0269a..a1aaf43 100644
--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -19,7 +19,7 @@
 ;; .

 ;; The following register constraints have been used:
-;; - in ARM/Thumb-2 state: f, t, v, w, x, y, z
+;; - in ARM/Thumb-2 state: f, t, T, v, w, x, y, z
 ;; - in Thumb state: h, b
 ;; - in both states: l, c, k
 ;; In ARM state, 'l' is an alias for 'r'
@@ -29,9 +29,9 @@
 ;; in Thumb-1 state: I, J, K, L, M, N, O
 
 ;; The following multi-letter normal constraints have been used:
-;; in ARM/Thumb-2 state: Da, Db, Dc, Dn, Dl, DL, Dv, Dy, Di, Dt, Dz
+;; in ARM/Thumb-2 state: Da, Db, Dc, Dn, Dl, DL, Dv, Dy, Di, Dt, Dz, Pe, Pf, P1
 ;; in Thumb-1 state: Pa, Pb, Pc, Pd
-;; in Thumb-2 state: Pj, PJ, Ps, Pt, Pu, Pv, Pw, Px, Py
+;; in Thumb-2 state: Pg, Pj, PJ, Ps, Pt, Pu, Pv, Pw, Px, Py
 
 ;; The following memory constraints have been used:
 ;; in ARM/Thumb-2 state: Q, Ut, Uv, Uy, Un, Um, Us
@@ -45,6 +45,9 @@
 (define_register_constraint "t" "TARGET_32BIT ? VFP_LO_REGS : NO_REGS"
  "The VFP registers @code{s0}-@code{s31}.")
 
+(define_register_constraint "T" "TARGET_32BIT ? VFP_LO_REGS_EVEN : NO_REGS"
+ "The even numbered VFP registers @code{s0}-@code{s31}.")
+
 (define_register_constraint "v" "TARGET_ARM ? CIRRUS_REGS : NO_REGS"
  "The Cirrus Maverick co-processor registers.")
 
@@ -172,6 +175,27 @@
  (and (match_code "const_int")
       (match_test "TARGET_THUMB1 && ival >= 0 && ival <= 7")))
 
+(define_constraint "Pe"
+  "@internal In ARM/Thumb-2 state, a constant in the range 0 to 63"
+  (and (match_code "const_int")
+       (match_test "TARGET_32BIT && ival >= 0 && ival < 64")))
+
+(define_constraint "P1"
+  "@internal In ARM/Thumb-2 state, a constant of 1"
+  (and (match_code "const_int")
+       (match_test "TARGET_32BIT && ival == 1")))
+
+(define_constraint "Pf"
+  "@internal In ARM state, a constant in the range 0 to 63, and in Thumb-2 state, 32 to 63"
+  (and (match_code "const_int")
+       (match_test "(TARGET_ARM && ival >= 0 && ival < 64)
+                    || (TARGET_THUMB2 && ival >= 32 && ival < 64)")))
+
+(define_constraint "Pg"
+  "@internal In Thumb-2 state, a constant in the range 0 to 31"
+  (and (match_code "const_int")
+       (match_test "TARGET_THUMB2 && ival >= 0 && ival <= 31")))
+
 (define_constraint "Ps"
   "@internal In Thumb-2 state a constant in the range -255 to +255"
   (and (match_code "const_int")
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index d7caa37..6492721 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -1090,6 +1090,296 @@
   DONE;
 })
 
+;; 64-bit shifts
+
+; The shift amount needs to be negated for right-shifts
+(define_insn "signed_shift_di3_neon"
+  [(set (match_operand:DI 0 "s_register_operand"             "=w")
+        (unspec:DI [(match_operand:DI 1 "s_register_operand" " w")
+                    (match_operand:SI 2 "s_register_operand" " T")]
+                   UNSPEC_ASHIFT_SIGNED))]
+  "TARGET_NEON"
+  "vshl.s64\t%P0, %P1, %E2"
+  [(set_attr "neon_type" "neon_vshl_ddd")]
+)
+
+; The shift amount needs to be negated for right-shifts
+(define_insn "unsigned_shift_di3_neon"
+  [(set (match_operand:DI 0 "s_register_operand"             "=w")
+        (unspec:DI [(match_operand:DI 1 "s_register_operand" " w")
+                    (match_operand:SI 2 "s_register_operand" " T")]
+                   UNSPEC_ASHIFT_UNSIGNED))]
+  "TARGET_NEON"
+  "vshl.u64\t%P0, %P1, %E2"
+  [(set_attr "neon_type" "neon_vshl_ddd")]
+)
+
+(define_insn "ashldi3_neon_noclobber"
+  [(set (match_operand:DI 0 "s_register_operand"           "=w, w")
+        (ashift:DI (match_operand:DI 1 "s_register_operand" " w, w")
+                   (match_operand:SI 2 "shift_amount_64"    " T,Pe")))]
+  "TARGET_NEON"
+  "@
+   vshl.u64\t%P0, %P1, %E2
+   vshl.u64\t%P0, %P1, %2"
+  [(set_attr "neon_type" "neon_vshl_ddd,neon_vshl_ddd")]
+)
+
+(define_insn_and_split "ashldi3_neon"
+  [(set (match_operand:DI 0 "s_register_operand"           "=w, w,?&r,?&r,?r,?r,?r,?w,?w")
+        (ashift:DI (match_operand:DI 1 "s_register_operand" " w, w, 0, r, r, r, r, w, w")
+                   (match_operand:SI 2 "shift_amount_64"    " T,Pe, r, r,P1,Pf,Pg, T,Pe")))
+   (clobber (match_scratch:SI 3                             "=X, X, r, r, X, X, r, X, X"))
+   (clobber (match_scratch:SI 4                             "=X, X, r, r, X, X, X, X, X"))
+   (clobber (reg:CC_C CC_REGNUM))]
+  "TARGET_NEON"
+  "#"
+  "TARGET_NEON && reload_completed"
+  [(const_int 0)]
+  "
+  {
+    if (IS_VFP_REGNUM (REGNO (operands[0])))
+      /* Ditch the unnecessary clobbers.  */
+      emit_insn (gen_ashldi3_neon_noclobber (operands[0], operands[1],
+                                             operands[2]));
+    else if (CONST_INT_P (operands[2]) && INTVAL (operands[2]) == 1)
+      /* This clobbers CC.  */
+      emit_insn (gen_arm_ashldi3_1bit (operands[0], operands[1]));
+    else
+      arm_emit_coreregs_64bit_shift (ASHIFT, operands[0], operands[1],
+                                     operands[2], operands[3], operands[4]);
+    DONE;
+  }"
+  [(set_attr "length" "*,*,24,24,8,12,12,*,*")
+   (set_attr "arch" "nota8,nota8,*,*,*,*,*,onlya8,onlya8")
+   (set_attr_alternative "insn_enabled"
+     [(const_string "yes")
+      (const_string "yes")
+      (if_then_else (match_test "optimize_function_for_size_p (cfun)")
+                    (const_string "no")
+                    (const_string "yes"))
+      (if_then_else (match_test "optimize_function_for_size_p (cfun)")
+                    (const_string "no")
+                    (const_string "yes"))
+      (if_then_else (match_test "optimize_function_for_size_p (cfun)")
+                    (const_string "no")
+                    (const_string "yes"))
+      (if_then_else (match_test "optimize_function_for_size_p (cfun)")
+                    (const_string "no")
+                    (const_string "yes"))
+      (if_then_else (match_test "optimize_function_for_size_p (cfun)")
+                    (const_string "no")
+                    (const_string "yes"))
+      (const_string "yes")
+      (const_string "yes")])]
+)
+
+(define_insn "ashrdi3_neon_imm_noclobber"
+  [(set (match_operand:DI 0 "s_register_operand"             "=w")
+        (ashiftrt:DI (match_operand:DI 1 "s_register_operand" " w")
+                     (match_operand:SI 2 "int_0_to_63"        "Pe")))]
+  "TARGET_NEON"
+  "vshr.s64\t%P0, %P1, %2"
+  [(set_attr "neon_type" "neon_vshl_ddd")]
+)
+
+(define_insn_and_split "ashrdi3_neon_imm"
+  [(set (match_operand:DI 0 "s_register_operand"             "=w,?r,?r,?r,?w")
+        (ashiftrt:DI (match_operand:DI 1 "s_register_operand" " w, r, r, r, w")
+                     (match_operand:SI 2 "int_0_to_63"        "Pe,P1,Pf,Pg,Pe")))
+   (clobber (match_scratch:SI 3                               "=X, X, X, r, X"))
+   (clobber (reg:CC_C CC_REGNUM))]
+  "TARGET_NEON"
+  "#"
+  "TARGET_NEON && reload_completed"
+  [(const_int 0)]
+  "
+  {
+    if (IS_VFP_REGNUM (REGNO (operands[0])))
+      /* Ditch the unnecessary clobbers.  */
+      emit_insn (gen_ashrdi3_neon_imm_noclobber (operands[0], operands[1],
+                                                 operands[2]));
+    else if (INTVAL (operands[2]) == 1)
+      /* This clobbers CC.  */
+      emit_insn (gen_arm_ashrdi3_1bit (operands[0], operands[1]));
+    else
+      arm_emit_coreregs_64bit_shift (ASHIFTRT, operands[0], operands[1],
+                                     operands[2], operands[3], NULL);
+    DONE;
+  }"
+  [(set_attr "length" "*,8,12,12,*")
+   (set_attr "arch" "nota8,*,*,*,onlya8")
+   (set_attr_alternative "insn_enabled"
+     [(const_string "yes")
+      (if_then_else (match_test "optimize_function_for_size_p (cfun)")
+                    (const_string "no")
+                    (const_string "yes"))
+      (if_then_else (match_test "optimize_function_for_size_p (cfun)")
+                    (const_string "no")
+                    (const_string "yes"))
+      (if_then_else (match_test "optimize_function_for_size_p (cfun)")
+                    (const_string "no")
+                    (const_string "yes"))
+      (const_string "yes")])]
+)
+
+(define_insn_and_split "ashrdi3_neon_reg"
+  [(set (match_operand:DI 0 "s_register_operand"             "=w,w,?&r,?&r,?w,?w")
+        (unspec:DI [(match_operand:DI 1 "s_register_operand" " w,w, 0, r, w, w")
+                    (match_operand:SI 2 "s_register_operand" " r,r, r, r, r, r")]
+                   UNSPEC_ASHIFT_SIGNED))
+   (clobber (match_scratch:SI 3                               "=2,r, r, r, 2, r"))
+   (clobber (match_scratch:SI 4                               "=T,T, r, r, T, T"))
+   (clobber (reg:CC CC_REGNUM))]
+  "TARGET_NEON"
+  "#"
+  "TARGET_NEON && reload_completed"
+  [(const_int 0)]
+  "
+  {
+    if (IS_VFP_REGNUM (REGNO (operands[0])))
+      {
+        emit_insn (gen_negsi2 (operands[3], operands[2]));
+        emit_insn (gen_rtx_SET (SImode, operands[4], operands[3]));
+        emit_insn (gen_signed_shift_di3_neon (operands[0], operands[1],
+                                              operands[4]));
+      }
+    else
+      /* This clobbers CC (ASHIFTRT only).  */
+      arm_emit_coreregs_64bit_shift (ASHIFTRT, operands[0], operands[1],
+                                     operands[2], operands[3], operands[4]);
+    DONE;
+  }"
+  [(set_attr "length" "12,12,24,24,12,12")
+   (set_attr "arch" "nota8,nota8,*,*,onlya8,onlya8")
+   (set_attr_alternative "insn_enabled"
+     [(const_string "yes")
+      (const_string "yes")
+      (if_then_else (match_test "optimize_function_for_size_p (cfun)")
+                    (const_string "no")
+                    (const_string "yes"))
+      (if_then_else (match_test "optimize_function_for_size_p (cfun)")
+                    (const_string "no")
+                    (const_string "yes"))
+      (const_string "yes")
+      (const_string "yes")])]
+)
+
+(define_expand "ashrdi3_neon"
+  [(match_operand:DI 0 "s_register_operand" "")
+   (match_operand:DI 1 "s_register_operand" "")
+   (match_operand:SI 2 "shift_amount_64" "")]
+  "TARGET_NEON"
+{
+  if (CONST_INT_P (operands[2]))
+    emit_insn (gen_ashrdi3_neon_imm (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (gen_ashrdi3_neon_reg (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
+(define_insn "lshrdi3_neon_imm_noclobber"
+  [(set (match_operand:DI 0 "s_register_operand"             "=w")
+        (lshiftrt:DI (match_operand:DI 1 "s_register_operand" " w")
+                     (match_operand:SI 2 "int_0_to_63"        "Pe")))]
+  "TARGET_NEON"
+  "vshr.u64\t%P0, %P1, %2"
+  [(set_attr "neon_type" "neon_vshl_ddd")]
+)
+
+(define_insn_and_split "lshrdi3_neon_imm"
+  [(set (match_operand:DI 0 "s_register_operand"             "=w,?r,?r,?r,?w")
+        (lshiftrt:DI (match_operand:DI 1 "s_register_operand" " w, r, r, r, w")
+                     (match_operand:SI 2 "int_0_to_63"        "Pe,P1,Pf,Pg,Pe")))
+   (clobber (match_scratch:SI 3                               "=X, X, X, r, X"))
+   (clobber (reg:CC_C CC_REGNUM))]
+  "TARGET_NEON"
+  "#"
+  "TARGET_NEON && reload_completed"
+  [(const_int 0)]
+  "
+  {
+    if (IS_VFP_REGNUM (REGNO (operands[0])))
+      /* Ditch the unnecessary clobbers.  */
+      emit_insn (gen_lshrdi3_neon_imm_noclobber (operands[0], operands[1],
+                                                 operands[2]));
+    else if (INTVAL (operands[2]) == 1)
+      /* This clobbers CC.  */
+      emit_insn (gen_arm_lshrdi3_1bit (operands[0], operands[1]));
+    else
+      arm_emit_coreregs_64bit_shift (LSHIFTRT, operands[0], operands[1],
+                                     operands[2], operands[3], NULL);
+    DONE;
+  }"
+  [(set_attr "length" "4,8,12,12,4")
+   (set_attr "arch" "nota8,*,*,*,onlya8")
+   (set_attr_alternative "insn_enabled"
+     [(const_string "yes")
+      (if_then_else (match_test "optimize_function_for_size_p (cfun)")
+                    (const_string "no")
+                    (const_string "yes"))
+      (if_then_else (match_test "optimize_function_for_size_p (cfun)")
+                    (const_string "no")
+                    (const_string "yes"))
+      (if_then_else (match_test "optimize_function_for_size_p (cfun)")
+                    (const_string "no")
+                    (const_string "yes"))
+      (const_string "yes")])]
+)
+
+(define_insn_and_split "lshrdi3_neon_reg"
+  [(set (match_operand:DI 0 "s_register_operand"             "=w,w,?&r,?&r,?w,?w")
+        (unspec:DI [(match_operand:DI 1 "s_register_operand" " w,w, 0, r, w, w")
+                    (match_operand:SI 2 "s_register_operand" " r,r, r, r, r, r")]
+                   UNSPEC_ASHIFT_UNSIGNED))
+   (clobber (match_scratch:SI 3                               "=2,r, r, r, 2, r"))
+   (clobber (match_scratch:SI 4                               "=T,T, r, r, T, T"))]
+  "TARGET_NEON"
+  "#"
+  "TARGET_NEON && reload_completed"
+  [(const_int 0)]
+  "
+  {
+    if (IS_VFP_REGNUM (REGNO (operands[0])))
+      {
+        emit_insn (gen_negsi2 (operands[3], operands[2]));
+        emit_insn (gen_rtx_SET (SImode, operands[4], operands[3]));
+        emit_insn (gen_unsigned_shift_di3_neon (operands[0], operands[1],
+                                                operands[4]));
+      }
+    else
+      arm_emit_coreregs_64bit_shift (LSHIFTRT, operands[0], operands[1],
+                                     operands[2], operands[3], operands[4]);
+    DONE;
+  }"
+  [(set_attr "length" "12,12,24,24,12,12")
+   (set_attr "arch" "nota8,nota8,*,*,onlya8,onlya8")
+   (set_attr_alternative "insn_enabled"
+     [(const_string "yes")
+      (const_string "yes")
+      (if_then_else (match_test "optimize_function_for_size_p (cfun)")
+                    (const_string "no")
+                    (const_string "yes"))
+      (if_then_else (match_test "optimize_function_for_size_p (cfun)")
+                    (const_string "no")
+                    (const_string "yes"))
+      (const_string "yes")
+      (const_string "yes")])]
+)
+
+(define_expand "lshrdi3_neon"
+  [(match_operand:DI 0 "s_register_operand" "")
+   (match_operand:DI 1 "s_register_operand" "")
+   (match_operand:SI 2 "shift_amount_64" "")]
+  "TARGET_NEON"
+{
+  if (CONST_INT_P (operands[2]))
+    emit_insn (gen_lshrdi3_neon_imm (operands[0], operands[1], operands[2]));
+  else
+    emit_insn (gen_lshrdi3_neon_reg (operands[0], operands[1], operands[2]));
+  DONE;
+})
+
 ;; Widening operations
 
 (define_insn "widen_ssum3"
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index b535335..64eb3b8 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -769,3 +769,11 @@
 
 (define_special_predicate "add_operator"
   (match_code "plus"))
+
+(define_predicate "int_0_to_63"
+  (and (match_code "const_int")
+       (match_test "IN_RANGE (INTVAL (op), 0, 63)")))
+
+(define_predicate "shift_amount_64"
+  (ior (match_operand 0 "s_register_operand")
+       (match_operand 0 "int_0_to_63")))
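The right-shift patterns in this patch rely on two properties of the NEON register-operand VSHL described in the message: a negative amount shifts right, and only the bottom 8 bits of the amount register are read. The C sketch below models one 64-bit lane under those assumptions. The helper names (shift_amount_from_low_byte, vshl64, shift_right_via_vshl) are hypothetical, not GCC or ARM APIs, and the results for amounts of 64 or more follow my reading of the architecture pseudocode rather than anything stated in the patch. It also illustrates why the *_neon_reg splits only need to write the negated amount into the low (even) S register: whatever sits in the upper half of that D register is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low byte of the shift register.  Assumption: VSHL
   (register) reads only bits [7:0] of the amount, as a signed value.  */
static int shift_amount_from_low_byte(uint64_t dm)
{
  int amount = (int) (dm & 0xff);
  return amount > 127 ? amount - 256 : amount;
}

/* Hypothetical model of one 64-bit lane of VSHL.S64 (is_signed != 0) or
   VSHL.U64.  A positive amount shifts left; a negative amount shifts
   right, with sign fill for the signed form.  Values are kept as raw
   unsigned bit patterns so every C shift below is well defined.  */
static uint64_t vshl64(uint64_t x, uint64_t dm, int is_signed)
{
  int amount = shift_amount_from_low_byte(dm);
  int negative = is_signed && (x >> 63) != 0;

  if (amount >= 64)
    return 0;                           /* every bit shifted out */
  if (amount >= 0)
    return x << amount;
  if (amount <= -64)
    return negative ? UINT64_MAX : 0;   /* only sign bits remain */

  int n = -amount;
  assert(n >= 1 && n <= 63);
  uint64_t result = x >> n;
  if (negative)
    result |= ~(UINT64_MAX >> n);       /* replicate the sign bit */
  return result;
}

/* Right shift by a register amount, modelled on the *_neon_reg splits:
   negate in a core register, copy the 32-bit result into the low (even)
   S register of a D register, then VSHL.  'junk' stands in for whatever
   happens to be in the odd S register above it.  */
static uint64_t shift_right_via_vshl(uint64_t x, uint32_t amount,
                                     int is_signed, uint32_t junk)
{
  uint32_t negated = (uint32_t) (0u - amount);      /* core-register NEG */
  uint64_t dm = ((uint64_t) junk << 32) | negated;  /* s(2k+1):s(2k) */
  return vshl64(x, dm, is_signed);
}
```

For example, shift_right_via_vshl(0x8000000000000000, 4, 1, 0xdeadbeef) gives 0xF800000000000000 and the unsigned form gives 0x0800000000000000, regardless of the junk word.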
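The VFP_LO_REGS_EVEN mask added to arm.h and the new 'E' operand code in arm.c can be cross-checked with a small model. The sketch below assumes, from the existing VFP_LO_REGS and VFP_D0_D7_REGS rows in arm.h, that hard register 63 is s0 and that s0-s31 are numbered consecutively; MODEL_FIRST_VFP_REGNUM and both helper functions are illustrative stand-ins, not GCC's own definitions. Because 'E' prints d<n> with n = (regno - 63) >> 1, an odd S register would name the D register whose low half is a different S register, which is why the T constraint admits only the even ones.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption inferred from the VFP_LO_REGS mask (bit 31 of word 1 plus
   bits 0-30 of word 2): hard registers 63..94 are s0..s31.  */
#define MODEL_FIRST_VFP_REGNUM 63
#define MODEL_NUM_LO_VFP_REGS 32

/* Build a REG_CLASS_CONTENTS-style mask (four 32-bit words) containing
   the even single-precision registers s0, s2, ..., s30.  */
static void even_lo_vfp_mask(uint32_t mask[4])
{
  for (int w = 0; w < 4; w++)
    mask[w] = 0;
  for (int s = 0; s < MODEL_NUM_LO_VFP_REGS; s += 2)
    {
      int regno = MODEL_FIRST_VFP_REGNUM + s;
      mask[regno / 32] |= (uint32_t) 1 << (regno % 32);
    }
}

/* The D-register index that the 'E' operand code prints ("d%d") for the
   single-precision hard register REGNO.  */
static int d_reg_for_s_reg(int regno)
{
  int s = regno - MODEL_FIRST_VFP_REGNUM;
  assert(s >= 0 && s < MODEL_NUM_LO_VFP_REGS);
  return s >> 1;
}
```

With these numbers the computed mask matches the { 0x00000000, 0x80000000, 0x2AAAAAAA, 0x00000000 } row the patch adds, and d_reg_for_s_reg returns 0 for both s0 (regno 63) and s1 (regno 64).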