From patchwork Thu Oct  8 08:54:11 2015
X-Patchwork-Submitter: Kyrylo Tkachov
X-Patchwork-Id: 54649
Message-ID: <56162F33.6060103@arm.com>
Date: Thu, 08 Oct 2015 09:54:11 +0100
From: Kyrill Tkachov <kyrylo.tkachov@arm.com>
To: GCC Patches <gcc-patches@gcc.gnu.org>
CC: Marcus Shawcroft, Richard Earnshaw, James Greenhalgh
Subject: [PATCH][AArch64] Improve comparison with complex immediates followed by branch/cset
Hi all,

This patch slightly improves sequences where we want to compare against a
complex immediate and branch on the result, or perform a cset on it.  It
transforms sequences of mov+movk+cmp+branch into sub+subs+branch, and
similarly for cset.

Unfortunately I can't do this by simply matching a (compare (reg) (const_int))
rtx, because the transformation is only valid for equal/not-equal comparisons,
not greater-than/less-than ones, and the compare instruction pattern only has
the general CC mode.  We need to also match the use of the condition code.

I've done this by creating a splitter for the conditional jump where the
condition is the comparison between the register and the complex immediate,
and splitting it into the sub+subs+condjump sequence.  Similarly for the
cstore pattern.  Thankfully we don't split immediate moves until later in the
optimization pipeline, so combine can still try the right patterns.

With this patch, for the example code:

void g (void);
void
f8 (int x)
{
  if (x != 0x123456)
    g ();
}

I get:

f8:
	sub	w0, w0, #1191936
	subs	w0, w0, #1110
	beq	.L1
	b	g
	.p2align 3
.L1:
	ret

instead of the previous:

f8:
	mov	w1, 13398
	movk	w1, 0x12, lsl 16
	cmp	w0, w1
	beq	.L1
	b	g
	.p2align 3
.L1:
	ret

The condjump case triggered 130 times across all of SPEC2006, which is
admittedly not much, whereas the cstore case didn't trigger at all.  However,
the testcase included in the patch demonstrates the kind of code it would
trigger on.

Bootstrapped and tested on aarch64.

Ok for trunk?

Thanks,
Kyrill

2015-10-08  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

	* config/aarch64/aarch64.md (*condjump): Rename to...
	(condjump): ... This.
	(*compare_condjump<mode>): New define_insn_and_split.
	(*compare_cstore<mode>_insn): Likewise.
	(*cstore<mode>_insn): Rename to...
	(cstore<mode>_insn): ... This.
	* config/aarch64/iterators.md (CMP): Handle ne code.
	* config/aarch64/predicates.md (aarch64_imm24): New predicate.

2015-10-08  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

	* gcc.target/aarch64/cmpimm_branch_1.c: New test.
	* gcc.target/aarch64/cmpimm_cset_1.c: Likewise.

commit 0c1530fab4c3979fb287f3b960f110e857df79b6
Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Date:   Mon Sep 21 10:56:47 2015 +0100

    [AArch64] Improve comparison with complex immediates

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 83ea74a..acda64f 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -369,7 +369,7 @@ (define_expand "mod<mode>3"
   }
 )
 
-(define_insn "*condjump"
+(define_insn "condjump"
   [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
 			    [(match_operand 1 "cc_register" "") (const_int 0)])
 			   (label_ref (match_operand 2 "" ""))
@@ -394,6 +394,42 @@ (define_insn "*condjump"
 		      (const_int 1)))]
 )
 
+
+;; For a 24-bit immediate CST we can optimize the compare for equality
+;; and branch sequence from:
+;;	mov	x0, #imm1
+;;	movk	x0, #imm2, lsl 16 // x0 contains CST
+;;	cmp	x1, x0
+;;	b<ne,eq> .Label
+;; into the shorter:
+;;	sub	x0, x1, #(CST & 0xfff000)
+;;	subs	x0, x0, #(CST & 0x000fff)
+;;	b<ne,eq> .Label
+(define_insn_and_split "*compare_condjump<mode>"
+  [(set (pc) (if_then_else (EQL
+			      (match_operand:GPI 0 "register_operand" "r")
+			      (match_operand:GPI 1 "aarch64_imm24" "n"))
+			   (label_ref:DI (match_operand 2 "" ""))
+			   (pc)))]
+  "!aarch64_move_imm (INTVAL (operands[1]), <MODE>mode)
+   && !aarch64_plus_operand (operands[1], <MODE>mode)"
+  "#"
+  "&& true"
+  [(const_int 0)]
+  {
+    HOST_WIDE_INT lo_imm = UINTVAL (operands[1]) & 0xfff;
+    HOST_WIDE_INT hi_imm = UINTVAL (operands[1]) & 0xfff000;
+    rtx tmp = gen_reg_rtx (<MODE>mode);
+    emit_insn (gen_add<mode>3 (tmp, operands[0], GEN_INT (-hi_imm)));
+    emit_insn (gen_add<mode>3_compare0 (tmp, tmp, GEN_INT (-lo_imm)));
+    rtx cc_reg = gen_rtx_REG (CC_NZmode, CC_REGNUM);
+    rtx cmp_rtx = gen_rtx_fmt_ee (<CMP>, <MODE>mode, cc_reg, const0_rtx);
+    emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[2]));
+    DONE;
+  }
+)
+
+
 (define_expand "casesi"
   [(match_operand:SI 0 "register_operand" "")	; Index
    (match_operand:SI 1 "const_int_operand" "")	; Lower bound
@@ -2894,7 +2930,7 @@ (define_expand "cstore<mode>4"
   "
 )
 
-(define_insn "*cstore<mode>_insn"
+(define_insn "cstore<mode>_insn"
   [(set (match_operand:ALLI 0 "register_operand" "=r")
 	(match_operator:ALLI 1 "aarch64_comparison_operator"
 	 [(match_operand 2 "cc_register" "") (const_int 0)]))]
@@ -2903,6 +2939,39 @@ (define_insn "*cstore<mode>_insn"
   [(set_attr "type" "csel")]
 )
 
+;; For a 24-bit immediate CST we can optimize the compare for equality
+;; and branch sequence from:
+;;	mov	x0, #imm1
+;;	movk	x0, #imm2, lsl 16 // x0 contains CST
+;;	cmp	x1, x0
+;;	cset	x2, <ne,eq>
+;; into the shorter:
+;;	sub	x0, x1, #(CST & 0xfff000)
+;;	subs	x0, x0, #(CST & 0x000fff)
+;;	cset	x1, <ne, eq>.
+(define_insn_and_split "*compare_cstore<mode>_insn"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+	(EQL:GPI (match_operand:GPI 1 "register_operand" "r")
+		 (match_operand:GPI 2 "aarch64_imm24" "n")))]
+  "!aarch64_move_imm (INTVAL (operands[2]), <MODE>mode)
+   && !aarch64_plus_operand (operands[2], <MODE>mode)"
+  "#"
+  "&& true"
+  [(const_int 0)]
+  {
+    HOST_WIDE_INT lo_imm = UINTVAL (operands[2]) & 0xfff;
+    HOST_WIDE_INT hi_imm = UINTVAL (operands[2]) & 0xfff000;
+    rtx tmp = gen_reg_rtx (<MODE>mode);
+    emit_insn (gen_add<mode>3 (tmp, operands[1], GEN_INT (-hi_imm)));
+    emit_insn (gen_add<mode>3_compare0 (tmp, tmp, GEN_INT (-lo_imm)));
+    rtx cc_reg = gen_rtx_REG (CC_NZmode, CC_REGNUM);
+    rtx cmp_rtx = gen_rtx_fmt_ee (<CMP>, <MODE>mode, cc_reg, const0_rtx);
+    emit_insn (gen_cstore<mode>_insn (operands[0], cmp_rtx, cc_reg));
+    DONE;
+  }
+  [(set_attr "type" "csel")]
+)
+
 ;; zero_extend version of the above
 (define_insn "*cstoresi_insn_uxtw"
   [(set (match_operand:DI 0 "register_operand" "=r")
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index a1436ac..8b2663b 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -798,7 +798,7 @@ (define_code_attr cmp_2   [(lt "1") (le "1") (eq "2") (ge "2") (gt "2")
			      (ltu "1") (leu "1") (geu "2") (gtu "2")])
 
 (define_code_attr CMP [(lt "LT") (le "LE") (eq "EQ") (ge "GE") (gt "GT")
-		       (ltu "LTU") (leu "LEU") (geu "GEU") (gtu "GTU")])
+		       (ltu "LTU") (leu "LEU") (ne "NE") (geu "GEU") (gtu "GTU")])
 
 (define_code_attr fix_trunc_optab [(fix "fix_trunc")
				    (unsigned_fix "fixuns_trunc")])
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index 7b852a4..1b62432 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -138,6 +138,11 @@ (define_predicate "aarch64_imm3"
   (and (match_code "const_int")
        (match_test "(unsigned HOST_WIDE_INT) INTVAL (op) <= 4")))
 
+;; An immediate that fits into 24 bits.
+(define_predicate "aarch64_imm24"
+  (and (match_code "const_int")
+       (match_test "(UINTVAL (op) & 0xffffff) == UINTVAL (op)")))
+
 (define_predicate "aarch64_pwr_imm3"
   (and (match_code "const_int")
        (match_test "INTVAL (op) != 0
diff --git a/gcc/testsuite/gcc.target/aarch64/cmpimm_branch_1.c b/gcc/testsuite/gcc.target/aarch64/cmpimm_branch_1.c
new file mode 100644
index 0000000..d7a8d5b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/cmpimm_branch_1.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-save-temps -O2" } */
+
+/* Test that we emit a sub+subs sequence rather than mov+movk+cmp.  */
+
+void g (void);
+void
+foo (int x)
+{
+  if (x != 0x123456)
+    g ();
+}
+
+void
+fool (long long x)
+{
+  if (x != 0x123456)
+    g ();
+}
+
+/* { dg-final { scan-assembler-not "cmp\tw\[0-9\]*.*" } } */
+/* { dg-final { scan-assembler-not "cmp\tx\[0-9\]*.*" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/cmpimm_cset_1.c b/gcc/testsuite/gcc.target/aarch64/cmpimm_cset_1.c
new file mode 100644
index 0000000..619c026
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/cmpimm_cset_1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-save-temps -O2" } */
+
+/* Test that we emit a sub+subs sequence rather than mov+movk+cmp.  */
+
+int
+foo (int x)
+{
+  return x == 0x123456;
+}
+
+long
+fool (long x)
+{
+  return x == 0x123456;
+}
+
+/* { dg-final { scan-assembler-not "cmp\tw\[0-9\]*.*" } } */
+/* { dg-final { scan-assembler-not "cmp\tx\[0-9\]*.*" } } */
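[Editor's note: the following is not part of the patch.  It is a minimal C
sketch of the arithmetic identity the splitters rely on: a 24-bit constant
CST splits into hi = CST & 0xfff000 (a 12-bit immediate shifted left by 12,
encodable by sub) and lo = CST & 0xfff (a plain 12-bit immediate, encodable
by subs), and (x - hi) - lo == 0 exactly when x == CST, so the Z flag set by
subs answers the eq/ne comparison.  The function name is made up for
illustration.]

/* Hypothetical illustration of the sub+subs equality test; mirrors the
   semantics of the split, not any GCC-internal API.  */
#include <assert.h>
#include <stdint.h>

static int
equals_cst_via_sub_subs (uint32_t x)
{
  const uint32_t cst = 0x123456;       /* 24-bit constant, as in the tests */
  uint32_t hi = cst & 0xfff000;        /* sub  w0, w0, #hi  (imm12, lsl 12) */
  uint32_t lo = cst & 0x000fff;        /* subs w0, w0, #lo  (imm12)         */
  uint32_t tmp = x - hi;               /* first subtraction                 */
  tmp = tmp - lo;                      /* second subtraction sets Z flag    */
  return tmp == 0;                     /* beq / cset consume the Z flag     */
}

int
main (void)
{
  assert (equals_cst_via_sub_subs (0x123456) == 1);
  assert (equals_cst_via_sub_subs (0x123457) == 0);
  assert (equals_cst_via_sub_subs (0) == 0);
  return 0;
}

Since subtraction wraps modulo 2^32 for every x, the two-step subtraction is
zero iff x == hi + lo == CST, which is why the split is valid only for
eq/ne and not for signed or unsigned ordering comparisons.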