diff mbox

[AArch64] Split X-reg UBFX into W-reg LSR when possible

Message ID 5853DC60.4060000@foss.arm.com
State New
Headers show

Commit Message

Kyrill Tkachov Dec. 16, 2016, 12:21 p.m. UTC
On 15/12/16 11:53, James Greenhalgh wrote:
> On Thu, Dec 08, 2016 at 09:35:08AM +0000, Kyrill Tkachov wrote:

>> Hi all,

>>

>> In this patch we split X-register UBFX instructions that extract up to the

>> edge of a W-register into a W-register LSR instruction. So for the example in

>> the testcase instead of:

>> UBFX    X0, X0, 24, 8

>>

>> we'd generate:

>> LSR     w0, w0, 24

>>

>> An LSR is a simpler instruction and there's a higher chance that it can be

>> combined with other instructions.

>>

>> To do this the patch separates the sign_extract case from the zero_extract

>> case in the *<optab><mode> ANY_EXTRACT pattern and further splits the

>> SImode/DImode patterns from the resulting zrero_extract pattern.

>> The DImode zero_extract pattern then becomes a define_insn_and_split that

>> splits into a zero_extend of an lshiftrt when the bitposition and width of

>> the zero_extract add up to 32.

>>

>> Bootstrapped and tested on aarch64-none-linux-gnu.

>>

>> Since we're in stage 3 perhaps this is not for GCC 6, but it is fairly low

>> risk.  I'm happy for it to wait for the next release if necessary.

> I'm OK with the idea, and I'm OK taking this in for Stage 3, but I'm not

> convinced by the implementation.

>

>> 2016-12-08  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

>>

>>      * config/aarch64/aarch64.md (*<optab><mode>): Split into...

>>      (*extv<mode>): ...This...

>>      (*extzvsi): ...This...

>>      (*extzvdi:): ... And this.  Add splitting to lshiftrt when possible.

> Why do we want to to it this way, rather than simply defining a single

> "split" which works in the case you're trying to catch.

>

> i.e. (untested)

>

> (define_split

>     [(set (match_operand:DI 0 "register_operand")

> 	(zero_extract:DI (match_operand:DI 1 "register_operand")

> 			 (match_operand 2

> 			   "aarch64_simd_shift_imm_offset_di")

> 			 (match_operand 3

> 			   "aarch64_simd_shift_imm_di")))]

>    "IN_RANGE (INTVAL (operands[2]) + INTVAL (operands[3]),

> 	     1, GET_MODE_BITSIZE (DImode) - 1)

>     && (INTVAL (operands[2]) + INTVAL (operands[3]))

> 	== GET_MODE_BITSIZE (SImode)"

>    [(set (match_dup 0)

> 	(zero_extend:DI (lshiftrt:SI (match_dup 4) (match_dup 3))))]

>    {

>      operands[4] = gen_lowpart (SImode, operands[1]);

>    }

> )

>

> Thanks,

> James


Yes, that works and is simpler.
Is this ok then?
Bootstrapped and tested on aarch64-none-linux-gnu.
Thanks,
Kyrill

2016-12-16  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * config/aarch64/aarch64.md: New define_split above insv<mode>.

2016-12-16  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * gcc.target/aarch64/ubfx_lsr_1.c: New test.

Comments

James Greenhalgh Dec. 16, 2016, 4:17 p.m. UTC | #1
On Fri, Dec 16, 2016 at 12:21:52PM +0000, Kyrill Tkachov wrote:
> 

> On 15/12/16 11:53, James Greenhalgh wrote:

> >On Thu, Dec 08, 2016 at 09:35:08AM +0000, Kyrill Tkachov wrote:

> >>Hi all,

> >>

> >>In this patch we split X-register UBFX instructions that extract up to the

> >>edge of a W-register into a W-register LSR instruction. So for the example in

> >>the testcase instead of:

> >>UBFX    X0, X0, 24, 8

> >>

> >>we'd generate:

> >>LSR     w0, w0, 24

> >>

> >>An LSR is a simpler instruction and there's a higher chance that it can be

> >>combined with other instructions.

> >>

> >>To do this the patch separates the sign_extract case from the zero_extract

> >>case in the *<optab><mode> ANY_EXTRACT pattern and further splits the

> >>SImode/DImode patterns from the resulting zrero_extract pattern.

> >>The DImode zero_extract pattern then becomes a define_insn_and_split that

> >>splits into a zero_extend of an lshiftrt when the bitposition and width of

> >>the zero_extract add up to 32.

> >>

> >>Bootstrapped and tested on aarch64-none-linux-gnu.

> >>

> >>Since we're in stage 3 perhaps this is not for GCC 6, but it is fairly low

> >>risk.  I'm happy for it to wait for the next release if necessary.

> >I'm OK with the idea, and I'm OK taking this in for Stage 3, but I'm not

> >convinced by the implementation.

> >

> >>2016-12-08  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

> >>

> >>     * config/aarch64/aarch64.md (*<optab><mode>): Split into...

> >>     (*extv<mode>): ...This...

> >>     (*extzvsi): ...This...

> >>     (*extzvdi:): ... And this.  Add splitting to lshiftrt when possible.

> >Why do we want to to it this way, rather than simply defining a single

> >"split" which works in the case you're trying to catch.

> >

> >i.e. (untested)

> >

> >(define_split

> >    [(set (match_operand:DI 0 "register_operand")

> >	(zero_extract:DI (match_operand:DI 1 "register_operand")

> >			 (match_operand 2

> >			   "aarch64_simd_shift_imm_offset_di")

> >			 (match_operand 3

> >			   "aarch64_simd_shift_imm_di")))]

> >   "IN_RANGE (INTVAL (operands[2]) + INTVAL (operands[3]),

> >	     1, GET_MODE_BITSIZE (DImode) - 1)

> >    && (INTVAL (operands[2]) + INTVAL (operands[3]))

> >	== GET_MODE_BITSIZE (SImode)"

> >   [(set (match_dup 0)

> >	(zero_extend:DI (lshiftrt:SI (match_dup 4) (match_dup 3))))]

> >   {

> >     operands[4] = gen_lowpart (SImode, operands[1]);

> >   }

> >)

> >

> >Thanks,

> >James

> 

> Yes, that works and is simpler.

> Is this ok then?


OK with the typo fixed.

Thanks,
James

> 2016-12-16  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

> 

>     * config/aarch64/aarch64.md: New define_split above insv<mode>.

> 

> 2016-12-16  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

> 

>     * gcc.target/aarch64/ubfx_lsr_1.c: New test.


> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md

> index 65ea04587442b0cab18b1e4652537524b82d5b86..5a40ee6abd5e123116aaaa478dced2207dd59478 100644

> --- a/gcc/config/aarch64/aarch64.md

> +++ b/gcc/config/aarch64/aarch64.md

> @@ -4340,6 +4340,26 @@ (define_insn "*<optab><mode>"

>    [(set_attr "type" "bfx")]

>  )

>  

> +;; When the bitposition and width add up to 32 we can use a W-reg LSR


s/bitposition/bit position/
diff mbox

Patch

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 65ea04587442b0cab18b1e4652537524b82d5b86..5a40ee6abd5e123116aaaa478dced2207dd59478 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4340,6 +4340,26 @@  (define_insn "*<optab><mode>"
   [(set_attr "type" "bfx")]
 )
 
+;; When the bitposition and width add up to 32 we can use a W-reg LSR
+;; instruction taking advantage of the implicit zero-extension of the X-reg.
+(define_split
+  [(set (match_operand:DI 0 "register_operand")
+	(zero_extract:DI (match_operand:DI 1 "register_operand")
+			 (match_operand 2
+			   "aarch64_simd_shift_imm_offset_di")
+			 (match_operand 3
+			   "aarch64_simd_shift_imm_di")))]
+  "IN_RANGE (INTVAL (operands[2]) + INTVAL (operands[3]), 1,
+	     GET_MODE_BITSIZE (DImode) - 1)
+   && (INTVAL (operands[2]) + INTVAL (operands[3]))
+       == GET_MODE_BITSIZE (SImode)"
+  [(set (match_dup 0)
+	(zero_extend:DI (lshiftrt:SI (match_dup 4) (match_dup 3))))]
+  {
+    operands[4] = gen_lowpart (SImode, operands[1]);
+  }
+)
+
 ;; Bitfield Insert (insv)
 (define_expand "insv<mode>"
   [(set (zero_extract:GPI (match_operand:GPI 0 "register_operand")
diff --git a/gcc/testsuite/gcc.target/aarch64/ubfx_lsr_1.c b/gcc/testsuite/gcc.target/aarch64/ubfx_lsr_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..f6f72b074e1fc6bcb1976eee6c545e9781b4bed6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ubfx_lsr_1.c
@@ -0,0 +1,14 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+/* Check that an X-reg UBFX can be simplified into a W-reg LSR.  */
+
+int
+f (unsigned long long x)
+{
+  x = (x >> 24) & 255;
+  return x + 1;
+}
+
+/* { dg-final { scan-assembler "lsr\tw" } } */
+/* { dg-final { scan-assembler-not "ubfx\tx" } } */