From patchwork Mon Nov 23 10:33:01 2015
X-Patchwork-Submitter: Kyrylo Tkachov
X-Patchwork-Id: 57137
Message-ID: <5652EB5D.8040002@arm.com>
Date: Mon, 23 Nov 2015 10:33:01 +0000
From: Kyrill Tkachov
To: James Greenhalgh
CC: GCC Patches, Marcus Shawcroft, Richard Earnshaw
Subject: Re: [PATCH][AArch64][v2] Improve comparison with complex immediates
 followed by branch/cset
References: <5638D61C.5060100@arm.com> <20151112120543.GA22716@arm.com>
In-Reply-To: <20151112120543.GA22716@arm.com>

On 12/11/15 12:05, James Greenhalgh wrote:
> On Tue, Nov 03, 2015 at 03:43:24PM +0000, Kyrill Tkachov wrote:
>> Hi all,
>>
>> Bootstrapped and tested on aarch64.
>>
>> Ok for trunk?
> Comments in-line.

Here's an updated patch according to your comments.
Sorry it took so long to respin it; I had other things to deal with, with
stage1 closing...

I've indented the sample code sequences and used valid mnemonics.
These patterns can only match during combine, so I'd expect them to always
split during combine or immediately after, but I don't think that's a
documented guarantee, so I've gated them on !reload_completed.
I've used IN_RANGE in the predicates.md hunk and added scan-assembler
checks in the tests.

Is this ok?

Thanks,
Kyrill

2015-11-20  Kyrylo Tkachov

    * config/aarch64/aarch64.md (*condjump): Rename to...
    (condjump): ... This.
    (*compare_condjump<mode>): New define_insn_and_split.
    (*compare_cstore<mode>_insn): Likewise.
    (*cstore<mode>_insn): Rename to...
    (aarch64_cstore<mode>): ... This.
    * config/aarch64/iterators.md (CMP): Handle ne code.
    * config/aarch64/predicates.md (aarch64_imm24): New predicate.

2015-11-20  Kyrylo Tkachov

    * gcc.target/aarch64/cmpimm_branch_1.c: New test.
    * gcc.target/aarch64/cmpimm_cset_1.c: Likewise.

>> Thanks,
>> Kyrill
>>
>>
>> 2015-11-03  Kyrylo Tkachov
>>
>>     * config/aarch64/aarch64.md (*condjump): Rename to...
>>     (condjump): ... This.
>>     (*compare_condjump<mode>): New define_insn_and_split.
>>     (*compare_cstore<mode>_insn): Likewise.
>>     (*cstore<mode>_insn): Rename to...
>>     (aarch64_cstore<mode>): ... This.
>>     * config/aarch64/iterators.md (CMP): Handle ne code.
>>     * config/aarch64/predicates.md (aarch64_imm24): New predicate.
>>
>> 2015-11-03  Kyrylo Tkachov
>>
>>     * gcc.target/aarch64/cmpimm_branch_1.c: New test.
>>     * gcc.target/aarch64/cmpimm_cset_1.c: Likewise.
>> commit 7df013a391532f39932b80c902e3b4bbd841710f
>> Author: Kyrylo Tkachov
>> Date:   Mon Sep 21 10:56:47 2015 +0100
>>
>>     [AArch64] Improve comparison with complex immediates
>>
>> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
>> index 126c9c2..1bfc870 100644
>> --- a/gcc/config/aarch64/aarch64.md
>> +++ b/gcc/config/aarch64/aarch64.md
>> @@ -369,7 +369,7 @@ (define_expand "mod<mode>3"
>>    }
>>  )
>>
>> -(define_insn "*condjump"
>> +(define_insn "condjump"
>>    [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
>>			      [(match_operand 1 "cc_register" "") (const_int 0)])
>>			     (label_ref (match_operand 2 "" ""))
>> @@ -394,6 +394,40 @@ (define_insn "*condjump"
>>			(const_int 1)))]
>>  )
>>
>> +;; For a 24-bit immediate CST we can optimize the compare for equality
>> +;; and branch sequence from:
>> +;; mov x0, #imm1
>> +;; movk x0, #imm2, lsl 16 /* x0 contains CST.  */
>> +;; cmp x1, x0
>> +;; b .Label
> This would be easier on the eyes if you were to indent the code sequence.
>
> +;; and branch sequence from:
> +;;	mov	x0, #imm1
> +;;	movk	x0, #imm2, lsl 16 /* x0 contains CST.  */
> +;;	cmp	x1, x0
> +;;	b	.Label
> +;; into the shorter:
> +;;	sub	x0, #(CST & 0xfff000)
>
>> +;; into the shorter:
>> +;; sub x0, #(CST & 0xfff000)
>> +;; subs x0, #(CST & 0x000fff)
> These instructions are not valid (2 operand sub/subs?) can you write them
> out fully for this comment so I can see the data flow?
>
>> +;; b .Label
>> +(define_insn_and_split "*compare_condjump<mode>"
>> +  [(set (pc) (if_then_else (EQL
>> +			      (match_operand:GPI 0 "register_operand" "r")
>> +			      (match_operand:GPI 1 "aarch64_imm24" "n"))
>> +			   (label_ref:P (match_operand 2 "" ""))
>> +			   (pc)))]
>> +  "!aarch64_move_imm (INTVAL (operands[1]), <MODE>mode)
>> +   && !aarch64_plus_operand (operands[1], <MODE>mode)"
>> +  "#"
>> +  "&& true"
>> +  [(const_int 0)]
>> +  {
>> +    HOST_WIDE_INT lo_imm = UINTVAL (operands[1]) & 0xfff;
>> +    HOST_WIDE_INT hi_imm = UINTVAL (operands[1]) & 0xfff000;
>> +    rtx tmp = gen_reg_rtx (<MODE>mode);
> Can you guarantee we can always create this pseudo?  What if we're a
> post-register-allocation split?
>
>> +    emit_insn (gen_add<mode>3 (tmp, operands[0], GEN_INT (-hi_imm)));
>> +    emit_insn (gen_add<mode>3_compare0 (tmp, tmp, GEN_INT (-lo_imm)));
>> +    rtx cc_reg = gen_rtx_REG (CC_NZmode, CC_REGNUM);
>> +    rtx cmp_rtx = gen_rtx_fmt_ee (<CMP>, <MODE>mode, cc_reg, const0_rtx);
>> +    emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[2]));
>> +    DONE;
>> +  }
>> +)
>> +
>>  (define_expand "casesi"
>>    [(match_operand:SI 0 "register_operand" "")	; Index
>>     (match_operand:SI 1 "const_int_operand" "")	; Lower bound
>> @@ -2898,7 +2932,7 @@ (define_expand "cstore<mode>4"
>>    "
>>  )
>>
>> -(define_insn "*cstore<mode>_insn"
>> +(define_insn "aarch64_cstore<mode>"
>>    [(set (match_operand:ALLI 0 "register_operand" "=r")
>>	  (match_operator:ALLI 1 "aarch64_comparison_operator"
>>	   [(match_operand 2 "cc_register" "") (const_int 0)]))]
>> @@ -2907,6 +2941,39 @@ (define_insn "*cstore<mode>_insn"
>>    [(set_attr "type" "csel")]
>>  )
>>
>> +;; For a 24-bit immediate CST we can optimize the compare for equality
>> +;; and branch sequence from:
>> +;; mov x0, #imm1
>> +;; movk x0, #imm2, lsl 16 /* x0 contains CST.  */
>> +;; cmp x1, x0
>> +;; cset x2, <ne,eq>
>> +;; into the shorter:
>> +;; sub x0, #(CST & 0xfff000)
>> +;; subs x0, #(CST & 0x000fff)
>> +;; cset x1, <ne,eq>.
> Same comments as above regarding formatting and making this a valid set
> of instructions.
>
>> +(define_insn_and_split "*compare_cstore<mode>_insn"
>> +  [(set (match_operand:GPI 0 "register_operand" "=r")
>> +	(EQL:GPI (match_operand:GPI 1 "register_operand" "r")
>> +		 (match_operand:GPI 2 "aarch64_imm24" "n")))]
>> +  "!aarch64_move_imm (INTVAL (operands[2]), <MODE>mode)
>> +   && !aarch64_plus_operand (operands[2], <MODE>mode)"
>> +  "#"
>> +  "&& true"
>> +  [(const_int 0)]
>> +  {
>> +    HOST_WIDE_INT lo_imm = UINTVAL (operands[2]) & 0xfff;
>> +    HOST_WIDE_INT hi_imm = UINTVAL (operands[2]) & 0xfff000;
>> +    rtx tmp = gen_reg_rtx (<MODE>mode);
>> +    emit_insn (gen_add<mode>3 (tmp, operands[1], GEN_INT (-hi_imm)));
>> +    emit_insn (gen_add<mode>3_compare0 (tmp, tmp, GEN_INT (-lo_imm)));
>> +    rtx cc_reg = gen_rtx_REG (CC_NZmode, CC_REGNUM);
>> +    rtx cmp_rtx = gen_rtx_fmt_ee (<CMP>, <MODE>mode, cc_reg, const0_rtx);
>> +    emit_insn (gen_aarch64_cstore<mode> (operands[0], cmp_rtx, cc_reg));
>> +    DONE;
>> +  }
>> +  [(set_attr "type" "csel")]
>> +)
>> +
>>  ;; zero_extend version of the above
>>  (define_insn "*cstoresi_insn_uxtw"
>>    [(set (match_operand:DI 0 "register_operand" "=r")
>> diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
>> index c4a1c98..9f63ef2 100644
>> --- a/gcc/config/aarch64/iterators.md
>> +++ b/gcc/config/aarch64/iterators.md
>> @@ -801,7 +801,7 @@ (define_code_attr cmp_2 [(lt "1") (le "1") (eq "2") (ge "2") (gt "2")
>>			    (ltu "1") (leu "1") (geu "2") (gtu "2")])
>>
>>  (define_code_attr CMP [(lt "LT") (le "LE") (eq "EQ") (ge "GE") (gt "GT")
>> -		       (ltu "LTU") (leu "LEU") (geu "GEU") (gtu "GTU")])
>> +		       (ltu "LTU") (leu "LEU") (ne "NE") (geu "GEU") (gtu "GTU")])
>>
>>  (define_code_attr fix_trunc_optab [(fix "fix_trunc")
>>				     (unsigned_fix "fixuns_trunc")])
>> diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
>> index 046f852..1bcbf62 100644
>> --- a/gcc/config/aarch64/predicates.md
>> +++ b/gcc/config/aarch64/predicates.md
>> @@ -145,6 +145,11 @@ (define_predicate "aarch64_imm3"
>>    (and (match_code "const_int")
>>	   (match_test "(unsigned HOST_WIDE_INT) INTVAL (op) <= 4")))
>>
>> +;; An immediate that fits into 24 bits.
>> +(define_predicate "aarch64_imm24"
>> +  (and (match_code "const_int")
>> +       (match_test "(UINTVAL (op) & 0xffffff) == UINTVAL (op)")))
>> +
> IN_RANGE (UINTVAL (op), 0, 0xffffff) ?
>
> We use quite a few different ways to check an immediate fits in a particular
> range in the AArch64 backend, it would be good to pick just one idiomatic
> way.
>
>>  (define_predicate "aarch64_pwr_imm3"
>>    (and (match_code "const_int")
>>	   (match_test "INTVAL (op) != 0
>> diff --git a/gcc/testsuite/gcc.target/aarch64/cmpimm_branch_1.c b/gcc/testsuite/gcc.target/aarch64/cmpimm_branch_1.c
>> new file mode 100644
>> index 0000000..d7a8d5b
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/cmpimm_branch_1.c
>> @@ -0,0 +1,22 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-save-temps -O2" } */
>> +
>> +/* Test that we emit a sub+subs sequence rather than mov+movk+cmp.  */
>> +
> This just tests that we don't emit cmp, it doesn't test anything else.
>
>> +void g (void);
>> +void
>> +foo (int x)
>> +{
>> +  if (x != 0x123456)
>> +    g ();
>> +}
>> +
>> +void
>> +fool (long long x)
>> +{
>> +  if (x != 0x123456)
>> +    g ();
>> +}
>> +
>> +/* { dg-final { scan-assembler-not "cmp\tw\[0-9\]*.*" } } */
>> +/* { dg-final { scan-assembler-not "cmp\tx\[0-9\]*.*" } } */
>> diff --git a/gcc/testsuite/gcc.target/aarch64/cmpimm_cset_1.c b/gcc/testsuite/gcc.target/aarch64/cmpimm_cset_1.c
>> new file mode 100644
>> index 0000000..619c026
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/cmpimm_cset_1.c
>> @@ -0,0 +1,19 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-save-temps -O2" } */
>> +
>> +/* Test that we emit a sub+subs sequence rather than mov+movk+cmp.  */
> Likewise, I don't see any checks for sub/subs.
>
>> +
>> +int
>> +foo (int x)
>> +{
>> +  return x == 0x123456;
>> +}
>> +
>> +long
>> +fool (long x)
>> +{
>> +  return x == 0x123456;
>> +}
>> +
>> +/* { dg-final { scan-assembler-not "cmp\tw\[0-9\]*.*" } } */
>> +/* { dg-final { scan-assembler-not "cmp\tx\[0-9\]*.*" } } */
> Thanks,
> James
>

commit bb44feed4e6beaae25d9bdffa45073dc61c65838
Author: Kyrylo Tkachov
Date:   Mon Sep 21 10:56:47 2015 +0100

    [AArch64] Improve comparison with complex immediates

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 11f6387..3e57d08 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -372,7 +372,7 @@ (define_expand "mod<mode>3"
   }
 )
 
-(define_insn "*condjump"
+(define_insn "condjump"
   [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
			    [(match_operand 1 "cc_register" "") (const_int 0)])
			   (label_ref (match_operand 2 "" ""))
@@ -397,6 +397,41 @@ (define_insn "*condjump"
		      (const_int 1)))]
 )
 
+;; For a 24-bit immediate CST we can optimize the compare for equality
+;; and branch sequence from:
+;;	mov	x0, #imm1
+;;	movk	x0, #imm2, lsl 16 /* x0 contains CST.  */
+;;	cmp	x1, x0
+;;	b	.Label
+;; into the shorter:
+;;	sub	x0, x0, #(CST & 0xfff000)
+;;	subs	x0, x0, #(CST & 0x000fff)
+;;	b	.Label
+(define_insn_and_split "*compare_condjump<mode>"
+  [(set (pc) (if_then_else (EQL
+			      (match_operand:GPI 0 "register_operand" "r")
+			      (match_operand:GPI 1 "aarch64_imm24" "n"))
+			   (label_ref:P (match_operand 2 "" ""))
+			   (pc)))]
+  "!aarch64_move_imm (INTVAL (operands[1]), <MODE>mode)
+   && !aarch64_plus_operand (operands[1], <MODE>mode)
+   && !reload_completed"
+  "#"
+  "&& true"
+  [(const_int 0)]
+  {
+    HOST_WIDE_INT lo_imm = UINTVAL (operands[1]) & 0xfff;
+    HOST_WIDE_INT hi_imm = UINTVAL (operands[1]) & 0xfff000;
+    rtx tmp = gen_reg_rtx (<MODE>mode);
+    emit_insn (gen_add<mode>3 (tmp, operands[0], GEN_INT (-hi_imm)));
+    emit_insn (gen_add<mode>3_compare0 (tmp, tmp, GEN_INT (-lo_imm)));
+    rtx cc_reg = gen_rtx_REG (CC_NZmode, CC_REGNUM);
+    rtx cmp_rtx = gen_rtx_fmt_ee (<CMP>, <MODE>mode, cc_reg, const0_rtx);
+    emit_jump_insn (gen_condjump (cmp_rtx, cc_reg, operands[2]));
+    DONE;
+  }
+)
+
 (define_expand "casesi"
   [(match_operand:SI 0 "register_operand" "")	; Index
    (match_operand:SI 1 "const_int_operand" "")	; Lower bound
@@ -2901,7 +2936,7 @@ (define_expand "cstore<mode>4"
   "
 )
 
-(define_insn "*cstore<mode>_insn"
+(define_insn "aarch64_cstore<mode>"
   [(set (match_operand:ALLI 0 "register_operand" "=r")
	(match_operator:ALLI 1 "aarch64_comparison_operator"
	 [(match_operand 2 "cc_register" "") (const_int 0)]))]
@@ -2910,6 +2945,40 @@ (define_insn "*cstore<mode>_insn"
   [(set_attr "type" "csel")]
 )
 
+;; For a 24-bit immediate CST we can optimize the compare for equality
+;; and branch sequence from:
+;;	mov	x0, #imm1
+;;	movk	x0, #imm2, lsl 16 /* x0 contains CST.  */
+;;	cmp	x1, x0
+;;	cset	x2, <ne,eq>
+;; into the shorter:
+;;	sub	x0, x0, #(CST & 0xfff000)
+;;	subs	x0, x0, #(CST & 0x000fff)
+;;	cset	x1, <ne,eq>.
+(define_insn_and_split "*compare_cstore<mode>_insn"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+	(EQL:GPI (match_operand:GPI 1 "register_operand" "r")
+		 (match_operand:GPI 2 "aarch64_imm24" "n")))]
+  "!aarch64_move_imm (INTVAL (operands[2]), <MODE>mode)
+   && !aarch64_plus_operand (operands[2], <MODE>mode)
+   && !reload_completed"
+  "#"
+  "&& true"
+  [(const_int 0)]
+  {
+    HOST_WIDE_INT lo_imm = UINTVAL (operands[2]) & 0xfff;
+    HOST_WIDE_INT hi_imm = UINTVAL (operands[2]) & 0xfff000;
+    rtx tmp = gen_reg_rtx (<MODE>mode);
+    emit_insn (gen_add<mode>3 (tmp, operands[1], GEN_INT (-hi_imm)));
+    emit_insn (gen_add<mode>3_compare0 (tmp, tmp, GEN_INT (-lo_imm)));
+    rtx cc_reg = gen_rtx_REG (CC_NZmode, CC_REGNUM);
+    rtx cmp_rtx = gen_rtx_fmt_ee (<CMP>, <MODE>mode, cc_reg, const0_rtx);
+    emit_insn (gen_aarch64_cstore<mode> (operands[0], cmp_rtx, cc_reg));
+    DONE;
+  }
+  [(set_attr "type" "csel")]
+)
+
 ;; zero_extend version of the above
 (define_insn "*cstoresi_insn_uxtw"
   [(set (match_operand:DI 0 "register_operand" "=r")
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index c2eb7de..422bc87 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -824,7 +824,8 @@ (define_code_attr cmp_2 [(lt "1") (le "1") (eq "2") (ge "2") (gt "2")
			    (ltu "1") (leu "1") (geu "2") (gtu "2")])
 
 (define_code_attr CMP [(lt "LT") (le "LE") (eq "EQ") (ge "GE") (gt "GT")
-		       (ltu "LTU") (leu "LEU") (geu "GEU") (gtu "GTU")])
+		       (ltu "LTU") (leu "LEU") (ne "NE") (geu "GEU")
+		       (gtu "GTU")])
 
 (define_code_attr fix_trunc_optab [(fix "fix_trunc")
				   (unsigned_fix "fixuns_trunc")])
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index e7f76e0..c0c3ff5 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -145,6 +145,11 @@ (define_predicate "aarch64_imm3"
   (and (match_code "const_int")
        (match_test "(unsigned HOST_WIDE_INT) INTVAL (op) <= 4")))
 
+;; An immediate that fits into 24 bits.
+(define_predicate "aarch64_imm24"
+  (and (match_code "const_int")
+       (match_test "IN_RANGE (UINTVAL (op), 0, 0xffffff)")))
+
 (define_predicate "aarch64_pwr_imm3"
   (and (match_code "const_int")
        (match_test "INTVAL (op) != 0
diff --git a/gcc/testsuite/gcc.target/aarch64/cmpimm_branch_1.c b/gcc/testsuite/gcc.target/aarch64/cmpimm_branch_1.c
new file mode 100644
index 0000000..7ad736b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/cmpimm_branch_1.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-save-temps -O2" } */
+
+/* Test that we emit a sub+subs sequence rather than mov+movk+cmp.  */
+
+void g (void);
+void
+foo (int x)
+{
+  if (x != 0x123456)
+    g ();
+}
+
+void
+fool (long long x)
+{
+  if (x != 0x123456)
+    g ();
+}
+
+/* { dg-final { scan-assembler-not "cmp\tw\[0-9\]*.*" } } */
+/* { dg-final { scan-assembler-not "cmp\tx\[0-9\]*.*" } } */
+/* { dg-final { scan-assembler-times "sub\tw\[0-9\]+.*" 1 } } */
+/* { dg-final { scan-assembler-times "sub\tx\[0-9\]+.*" 1 } } */
+/* { dg-final { scan-assembler-times "subs\tw\[0-9\]+.*" 1 } } */
+/* { dg-final { scan-assembler-times "subs\tx\[0-9\]+.*" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/cmpimm_cset_1.c b/gcc/testsuite/gcc.target/aarch64/cmpimm_cset_1.c
new file mode 100644
index 0000000..6a03cc9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/cmpimm_cset_1.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-save-temps -O2" } */
+
+/* Test that we emit a sub+subs sequence rather than mov+movk+cmp.  */
+
+int
+foo (int x)
+{
+  return x == 0x123456;
+}
+
+long
+fool (long x)
+{
+  return x == 0x123456;
+}
+
+/* { dg-final { scan-assembler-not "cmp\tw\[0-9\]*.*" } } */
+/* { dg-final { scan-assembler-not "cmp\tx\[0-9\]*.*" } } */
+/* { dg-final { scan-assembler-times "sub\tw\[0-9\]+.*" 1 } } */
+/* { dg-final { scan-assembler-times "sub\tx\[0-9\]+.*" 1 } } */
+/* { dg-final { scan-assembler-times "subs\tw\[0-9\]+.*" 1 } } */
+/* { dg-final { scan-assembler-times "subs\tx\[0-9\]+.*" 1 } } */