diff mbox

[AArch64] Use preferred aliases for CSNEG, CSINC, CSINV

Message ID 55E5790A.2090205@arm.com
State New
Headers show

Commit Message

Kyrylo Tkachov Sept. 1, 2015, 10:08 a.m. UTC
Hi all,

The ARMv8-A reference manual says:
"CNEG <Wd>, <Wn>, <cond>
is equivalent to
CSNEG <Wd>, <Wn>, <Wn>, invert(<cond>)
and is the preferred disassembly when Rn == Rm && cond != '111x'."

That is, when the two input registers are the same we can use the shorter CNEG mnemonic
with the inverse condition instead of the longer CSNEG instruction. Similarly for the
CSINV and CSINC instructions, they have shorter CINV and CINC forms.
This patch adjusts the output templates to emit the preferred shorter sequences when possible.

The new mnemonics are just aliases, they map down to the same instruction in the end, so there
are no performance or behaviour implications. But it does make the assembly a bit more readable
IMO, since:
"cneg    w27, w9, le"
can be simply read as "if the condition is less or equal negate w9" instead of the previous:
"csneg    w27, w9, w9, gt" where you have to remember which of the input registers is negated.


Bootstrapped and tested on aarch64-linux-gnu.
Ok for trunk?

Thanks,
Kyrill

2015-09-01  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * config/aarch64/aarch64.md (csinc3<mode>_insn): Use CINC
     mnemonic when possible.
     (*csinv3<mode>_insn): Use CINV mnemonic when possible.
     (csneg3<mode>_insn): USE CNEG mnemonic when possible.

2015-09-01  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * gcc.target/aarch64/abs_1.c: Update scan-assembler checks
     to allow cneg.
     * gcc.target/aarch64/cond_op_imm_1.c: Likewise.  Likewise for cinv.
     * gcc.target/aarch64/mod_2.c: Likewise.

Comments

Kyrylo Tkachov Sept. 11, 2015, 3:25 p.m. UTC | #1
Ping.
https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00020.html

Thanks,
Kyrill

On 01/09/15 11:08, Kyrill Tkachov wrote:
> Hi all,
>
> The ARMv8-A reference manual says:
> "CNEG <Wd>, <Wn>, <cond>
> is equivalent to
> CSNEG <Wd>, <Wn>, <Wn>, invert(<cond>)
> and is the preferred disassembly when Rn == Rm && cond != '111x'."
>
> That is, when the two input registers are the same we can use the shorter CNEG mnemonic
> with the inverse condition instead of the longer CSNEG instruction. Similarly for the
> CSINV and CSINC instructions, they have shorter CINV and CINC forms.
> This patch adjusts the output templates to emit the preferred shorter sequences when possible.
>
> The new mnemonics are just aliases, they map down to the same instruction in the end, so there
> are no performance or behaviour implications. But it does make the assembly a bit more readable
> IMO, since:
> "cneg    w27, w9, le"
> can be simply read as "if the condition is less or equal negate w9" instead of the previous:
> "csneg    w27, w9, w9, gt" where you have to remember which of the input registers is negated.
>
>
> Bootstrapped and tested on aarch64-linux-gnu.
> Ok for trunk?
>
> Thanks,
> Kyrill
>
> 2015-09-01  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>       * config/aarch64/aarch64.md (csinc3<mode>_insn): Use CINC
>       mnemonic when possible.
>       (*csinv3<mode>_insn): Use CINV mnemonic when possible.
>       (csneg3<mode>_insn): USE CNEG mnemonic when possible.
>
> 2015-09-01  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>       * gcc.target/aarch64/abs_1.c: Update scan-assembler checks
>       to allow cneg.
>       * gcc.target/aarch64/cond_op_imm_1.c: Likewise.  Likewise for cinv.
>       * gcc.target/aarch64/mod_2.c: Likewise.
James Greenhalgh Sept. 11, 2015, 3:31 p.m. UTC | #2
On Tue, Sep 01, 2015 at 11:08:10AM +0100, Kyrill Tkachov wrote:
> Hi all,
> 
> The ARMv8-A reference manual says:
> "CNEG <Wd>, <Wn>, <cond>
> is equivalent to
> CSNEG <Wd>, <Wn>, <Wn>, invert(<cond>)
> and is the preferred disassembly when Rn == Rm && cond != '111x'."
> 
> That is, when the two input registers are the same we can use the shorter CNEG mnemonic
> with the inverse condition instead of the longer CSNEG instruction. Similarly for the
> CSINV and CSINC instructions, they have shorter CINV and CINC forms.
> This patch adjusts the output templates to emit the preferred shorter sequences when possible.
> 
> The new mnemonics are just aliases, they map down to the same instruction in the end, so there
> are no performance or behaviour implications. But it does make the assembly a bit more readable
> IMO, since:
> "cneg    w27, w9, le"
> can be simply read as "if the condition is less or equal negate w9" instead of the previous:
> "csneg    w27, w9, w9, gt" where you have to remember which of the input registers is negated.
> 
> 
> Bootstrapped and tested on aarch64-linux-gnu.
> Ok for trunk?
>
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 77bc7cd..2e4b26c 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -3090,7 +3090,12 @@ (define_insn "csinc3<mode>_insn"
>  		    (const_int 1))
>  	  (match_operand:GPI 3 "aarch64_reg_or_zero" "rZ")))]
>    ""
> -  "csinc\\t%<w>0, %<w>3, %<w>2, %M1"
> +  {
> +    if (rtx_equal_p (operands[2], operands[3]))
> +      return "cinc\\t%<w>0, %<w>2, %m1";
> +    else
> +      return "csinc\\t%<w>0, %<w>3, %<w>2, %M1";
> +  }
>    [(set_attr "type" "csel")]
>  )

I guess you do it this way rather than just adding a new alternative in
the pattern to avoid any chance of constraining the register allocator, but
would this not be more natural to read as an {r, r, r, 2} alternative, or
similar?

If you've given that some thought and decided it doesn't work for you,
then this is OK for trunk.

Thanks,
James
Kyrylo Tkachov Sept. 11, 2015, 4:15 p.m. UTC | #3
On 11/09/15 16:31, James Greenhalgh wrote:
> On Tue, Sep 01, 2015 at 11:08:10AM +0100, Kyrill Tkachov wrote:
>> Hi all,
>>
>> The ARMv8-A reference manual says:
>> "CNEG <Wd>, <Wn>, <cond>
>> is equivalent to
>> CSNEG <Wd>, <Wn>, <Wn>, invert(<cond>)
>> and is the preferred disassembly when Rn == Rm && cond != '111x'."
>>
>> That is, when the two input registers are the same we can use the shorter CNEG mnemonic
>> with the inverse condition instead of the longer CSNEG instruction. Similarly for the
>> CSINV and CSINC instructions, they have shorter CINV and CINC forms.
>> This patch adjusts the output templates to emit the preferred shorter sequences when possible.
>>
>> The new mnemonics are just aliases, they map down to the same instruction in the end, so there
>> are no performance or behaviour implications. But it does make the assembly a bit more readable
>> IMO, since:
>> "cneg    w27, w9, le"
>> can be simply read as "if the condition is less or equal negate w9" instead of the previous:
>> "csneg    w27, w9, w9, gt" where you have to remember which of the input registers is negated.
>>
>>
>> Bootstrapped and tested on aarch64-linux-gnu.
>> Ok for trunk?
>>
>> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
>> index 77bc7cd..2e4b26c 100644
>> --- a/gcc/config/aarch64/aarch64.md
>> +++ b/gcc/config/aarch64/aarch64.md
>> @@ -3090,7 +3090,12 @@ (define_insn "csinc3<mode>_insn"
>>   		    (const_int 1))
>>   	  (match_operand:GPI 3 "aarch64_reg_or_zero" "rZ")))]
>>     ""
>> -  "csinc\\t%<w>0, %<w>3, %<w>2, %M1"
>> +  {
>> +    if (rtx_equal_p (operands[2], operands[3]))
>> +      return "cinc\\t%<w>0, %<w>2, %m1";
>> +    else
>> +      return "csinc\\t%<w>0, %<w>3, %<w>2, %M1";
>> +  }
>>     [(set_attr "type" "csel")]
>>   )
> I guess you do it this way rather than just adding a new alternative in
> the pattern to avoid any chance of constraining the register allocator, but
> would this not be more natural to read as an {r, r, r, 2} alternative, or
> similar?

I had not considered this approach and I'm a bit sceptical on how feasible it is.
If we put the {r,r,r,2} as a second alternative then it will be a purely more restrictive
version of the first alternative and so will never match.
If, however, we put it as the first alternative we'll be expressing some preference for
allocating the same register for operands 2 and 3, which is not something we want to do.

>
> If you've given that some thought and decided it doesn't work for you,
> then this is OK for trunk.
Given the above
I'll commit this version next week if there are no objections.

Thanks,
Kyrill

>
> Thanks,
> James
>
Andrew Pinski Sept. 12, 2015, 1:15 a.m. UTC | #4
On Tue, Sep 1, 2015 at 6:08 PM, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:
> Hi all,
>
> The ARMv8-A reference manual says:
> "CNEG <Wd>, <Wn>, <cond>
> is equivalent to
> CSNEG <Wd>, <Wn>, <Wn>, invert(<cond>)
> and is the preferred disassembly when Rn == Rm && cond != '111x'."
>
> That is, when the two input registers are the same we can use the shorter
> CNEG mnemonic
> with the inverse condition instead of the longer CSNEG instruction.
> Similarly for the
> CSINV and CSINC instructions, they have shorter CINV and CINC forms.
> This patch adjusts the output templates to emit the preferred shorter
> sequences when possible.
>
> The new mnemonics are just aliases, they map down to the same instruction in
> the end, so there
> are no performance or behaviour implications. But it does make the assembly
> a bit more readable
> IMO, since:
> "cneg    w27, w9, le"
> can be simply read as "if the condition is less or equal negate w9" instead
> of the previous:
> "csneg    w27, w9, w9, gt" where you have to remember which of the input
> registers is negated.
>
>
> Bootstrapped and tested on aarch64-linux-gnu.
> Ok for trunk?

I really think this kind of special casing is not correct and does not
belong in the compiler.  The main reason it complicates the back-end
more than the benefit of easier of reading the assembly code.

Thanks,
Andrew Pinski

>
> Thanks,
> Kyrill
>
> 2015-09-01  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>     * config/aarch64/aarch64.md (csinc3<mode>_insn): Use CINC
>     mnemonic when possible.
>     (*csinv3<mode>_insn): Use CINV mnemonic when possible.
>     (csneg3<mode>_insn): USE CNEG mnemonic when possible.
>
> 2015-09-01  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>
>     * gcc.target/aarch64/abs_1.c: Update scan-assembler checks
>     to allow cneg.
>     * gcc.target/aarch64/cond_op_imm_1.c: Likewise.  Likewise for cinv.
>     * gcc.target/aarch64/mod_2.c: Likewise.
Kyrylo Tkachov Sept. 21, 2015, 1:16 p.m. UTC | #5
Hi Andrew,

On 12/09/15 02:15, Andrew Pinski wrote:
> On Tue, Sep 1, 2015 at 6:08 PM, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:
>> Hi all,
>>
>> The ARMv8-A reference manual says:
>> "CNEG <Wd>, <Wn>, <cond>
>> is equivalent to
>> CSNEG <Wd>, <Wn>, <Wn>, invert(<cond>)
>> and is the preferred disassembly when Rn == Rm && cond != '111x'."
>>
>> That is, when the two input registers are the same we can use the shorter
>> CNEG mnemonic
>> with the inverse condition instead of the longer CSNEG instruction.
>> Similarly for the
>> CSINV and CSINC instructions, they have shorter CINV and CINC forms.
>> This patch adjusts the output templates to emit the preferred shorter
>> sequences when possible.
>>
>> The new mnemonics are just aliases, they map down to the same instruction in
>> the end, so there
>> are no performance or behaviour implications. But it does make the assembly
>> a bit more readable
>> IMO, since:
>> "cneg    w27, w9, le"
>> can be simply read as "if the condition is less or equal negate w9" instead
>> of the previous:
>> "csneg    w27, w9, w9, gt" where you have to remember which of the input
>> registers is negated.
>>
>>
>> Bootstrapped and tested on aarch64-linux-gnu.
>> Ok for trunk?
> I really think this kind of special casing is not correct and does not
> belong in the compiler.  The main reason it complicates the back-end
> more than the benefit of easier of reading the assembly code.

<sorry for the delay>.
The complication is an extra if-else statement with
explicit strings on each arm i.e. there's no snprintf trickery.
I tend to read a lot of the generated assembly when performing assembly
comparisons while working on performance patches and I find that having
the cneg from with two register operands and the negate condition is quicker
to parse than the full csneg form where I need to remember that extra bit
of info that the condition there must be inverted to get the negation condition.

If you feel very strongly against this I can withdraw this patch, but I'd rather have it in.
FWIW, clang also emits the CNEG when it can AFAICS, though I admit that's not necessarily a strong
argument for this change.

Kyrill

>
> Thanks,
> Andrew Pinski
>
>> Thanks,
>> Kyrill
>>
>> 2015-09-01  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>
>>      * config/aarch64/aarch64.md (csinc3<mode>_insn): Use CINC
>>      mnemonic when possible.
>>      (*csinv3<mode>_insn): Use CINV mnemonic when possible.
>>      (csneg3<mode>_insn): USE CNEG mnemonic when possible.
>>
>> 2015-09-01  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>
>>      * gcc.target/aarch64/abs_1.c: Update scan-assembler checks
>>      to allow cneg.
>>      * gcc.target/aarch64/cond_op_imm_1.c: Likewise.  Likewise for cinv.
>>      * gcc.target/aarch64/mod_2.c: Likewise.
diff mbox

Patch

commit 5f2598ffa7e0d7db92163cc5e8f4f26f7d2aff5a
Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Date:   Fri Aug 21 14:51:55 2015 +0100

    [AArch64] Use preferred aliases for CSNEG, CSINC, CSINV

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 77bc7cd..2e4b26c 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3090,7 +3090,12 @@  (define_insn "csinc3<mode>_insn"
 		    (const_int 1))
 	  (match_operand:GPI 3 "aarch64_reg_or_zero" "rZ")))]
   ""
-  "csinc\\t%<w>0, %<w>3, %<w>2, %M1"
+  {
+    if (rtx_equal_p (operands[2], operands[3]))
+      return "cinc\\t%<w>0, %<w>2, %m1";
+    else
+      return "csinc\\t%<w>0, %<w>3, %<w>2, %M1";
+  }
   [(set_attr "type" "csel")]
 )
 
@@ -3101,7 +3106,12 @@  (define_insn "*csinv3<mode>_insn"
 	  (not:GPI (match_operand:GPI 2 "register_operand" "r"))
 	  (match_operand:GPI 3 "aarch64_reg_or_zero" "rZ")))]
   ""
-  "csinv\\t%<w>0, %<w>3, %<w>2, %M1"
+  {
+    if (rtx_equal_p (operands[2], operands[3]))
+      return "cinv\\t%<w>0, %<w>2, %m1";
+    else
+      return "csinv\\t%<w>0, %<w>3, %<w>2, %M1";
+  }
   [(set_attr "type" "csel")]
 )
 
@@ -3112,7 +3122,12 @@  (define_insn "csneg3<mode>_insn"
 	  (neg:GPI (match_operand:GPI 2 "register_operand" "r"))
 	  (match_operand:GPI 3 "aarch64_reg_or_zero" "rZ")))]
   ""
-  "csneg\\t%<w>0, %<w>3, %<w>2, %M1"
+  {
+    if (rtx_equal_p (operands[2], operands[3]))
+      return "cneg\\t%<w>0, %<w>2, %m1";
+    else
+      return "csneg\\t%<w>0, %<w>3, %<w>2, %M1";
+  }
   [(set_attr "type" "csel")]
 )
 
diff --git a/gcc/testsuite/gcc.target/aarch64/abs_1.c b/gcc/testsuite/gcc.target/aarch64/abs_1.c
index 39364f4..84996a42 100644
--- a/gcc/testsuite/gcc.target/aarch64/abs_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/abs_1.c
@@ -7,14 +7,14 @@  extern void abort (void);
 long long
 abs64 (long long a)
 {
-  /* { dg-final { scan-assembler "csneg\t" } } */
+  /* { dg-final { scan-assembler "cs?neg\t" } } */
   return llabs (a);
 }
 
 long long
 abs64_in_dreg (long long a)
 {
-  /* { dg-final { scan-assembler "csneg\t" } } */
+  /* { dg-final { scan-assembler "cs?neg\t" } } */
   register long long x asm ("d8") = a;
   register long long y asm ("d9");
   asm volatile ("" : : "w" (x));
diff --git a/gcc/testsuite/gcc.target/aarch64/cond_op_imm_1.c b/gcc/testsuite/gcc.target/aarch64/cond_op_imm_1.c
index e93a693..a5394cc 100644
--- a/gcc/testsuite/gcc.target/aarch64/cond_op_imm_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/cond_op_imm_1.c
@@ -12,7 +12,7 @@  foonegsi (int a)
   return a ? N : -N;
 }
 
-/* { dg-final { scan-assembler "csneg\tw\[0-9\]*.*" } } */
+/* { dg-final { scan-assembler "cs?neg\tw\[0-9\]*.*" } } */
 
 
 int
@@ -21,7 +21,7 @@  fooinvsi (int a)
   return a ? N : ~N;
 }
 
-/* { dg-final { scan-assembler "csinv\tw\[0-9\]*.*" } } */
+/* { dg-final { scan-assembler "cs?inv\tw\[0-9\]*.*" } } */
 
 
 long long
@@ -36,7 +36,7 @@  largefooneg (long long a)
   return a ? M : -M;
 }
 
-/* { dg-final { scan-assembler "csneg\tx\[0-9\]*.*" } } */
+/* { dg-final { scan-assembler "cs?neg\tx\[0-9\]*.*" } } */
 
 long long
 fooinvdi (long long a)
@@ -50,7 +50,7 @@  largefooinv (long long a)
   return a ? M : ~M;
 }
 
-/* { dg-final { scan-assembler "csinv\tx\[0-9\]*.*" } } */
+/* { dg-final { scan-assembler "cs?inv\tx\[0-9\]*.*" } } */
 
 
 int
diff --git a/gcc/testsuite/gcc.target/aarch64/mod_2.c b/gcc/testsuite/gcc.target/aarch64/mod_2.c
index 2645c18..a49783d 100644
--- a/gcc/testsuite/gcc.target/aarch64/mod_2.c
+++ b/gcc/testsuite/gcc.target/aarch64/mod_2.c
@@ -3,5 +3,5 @@ 
 
 #include "mod_2.x"
 
-/* { dg-final { scan-assembler "csneg\t\[wx\]\[0-9\]*" } } */
+/* { dg-final { scan-assembler "cs?neg\t\[wx\]\[0-9\]*" } } */
 /* { dg-final { scan-assembler-times "and\t\[wx\]\[0-9\]*" 1 } } */