[v2] Recognize a missed usage of a sbfiz instruction

Message ID 1517929440-7534-1-git-send-email-luis.machado@linaro.org
State New

Commit Message

Luis Machado Feb. 6, 2018, 3:04 p.m.
Thanks for the feedback Kyrill. I've adjusted the v2 patch based on your
suggestions and re-tested the changes. Everything is still sane.

Since this is ARM-specific and fairly narrow in scope, I wonder if it would
be reasonable to consider it for inclusion at the current stage.

Regards,
Luis

Changes in v2:

- Added more restrictive predicates to operands 2, 3 and 4.
- Removed pattern conditional.
- Switched to long long for 64-bit signed integer for the testcase.

---

A customer reported the following missed opportunities to combine a couple of
instructions into an sbfiz.

int sbfiz32 (int x)
{
  return x << 29 >> 10;
}

long long sbfiz64 (long long x)
{
  return x << 58 >> 20;
}

This gets converted to the following pattern:

(set (reg:SI 98)
    (ashift:SI (sign_extend:SI (reg:HI 0 x0 [ xD.3334 ]))
        (const_int 6 [0x6])))

Currently, gcc generates the following:

sbfiz32:
	lsl x0, x0, 29
	asr x0, x0, 10
	ret

sbfiz64:
	lsl x0, x0, 58
	asr x0, x0, 20
	ret

It could generate this instead:

sbfiz32:
	sbfiz   w0, w0, 19, 3
	ret

sbfiz64:
	sbfiz   x0, x0, 38, 6
	ret

The unsigned versions already generate ubfiz for the same code, so the lack of
a sbfiz pattern may have been an oversight.

This particular sbfiz pattern shows up in both CPU2006 (~80 hits) and
CPU2017 (~280 hits). That is not a lot, but the change seems beneficial in any
case. I saw no significant performance differences, probably due to the small
number of occurrences.

2018-02-06  Luis Machado  <luis.machado@linaro.org>

	gcc/
	* config/aarch64/aarch64.md (*ashift<mode>_extv_bfiz): New pattern.

2018-02-06  Luis Machado  <luis.machado@linaro.org>

	gcc/testsuite/
	* gcc.target/aarch64/lsl_asr_sbfiz.c: New test.
---
 gcc/config/aarch64/aarch64.md                    | 13 +++++++++++++
 gcc/testsuite/gcc.target/aarch64/lsl_asr_sbfiz.c | 24 ++++++++++++++++++++++++
 2 files changed, 37 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/lsl_asr_sbfiz.c

-- 
2.7.4

Comments

Kyrill Tkachov Feb. 8, 2018, 11:48 a.m. | #1
Hi Luis,

On 06/02/18 15:04, Luis Machado wrote:
> Thanks for the feedback Kyrill. I've adjusted the v2 patch based on your
> suggestions and re-tested the changes. Everything is still sane.

Thanks! This looks pretty good to me.

> Since this is ARM-specific and fairly specific, i wonder if it would be
> reasonable to consider it for inclusion at the current stage.


It is true that the target maintainers can choose to take
such patches at any stage. However, any patch at this stage increases
the risk of regressions being introduced, and these regressions
can come back to bite us in ways that are very hard to anticipate.

Have a look at some of the bugs in bugzilla (or a quick scan of the gcc-bugs list)
for examples of the ways that things can go wrong with any of the myriad of GCC components
and the unexpected ways in which they can interact.

For example, I am now working on what I initially thought was a one-liner fix for
PR 84164 but it has expanded into a 3-patch series with a midend component and
target-specific changes for 2 ports.

These issues are very hard to catch during review and normal testing, and can
sometimes take months of deep testing by fuzzing and massive codebase rebuilds
to expose. So the closer the commit is to a release, the higher the risk that
an obscure edge case will go unnoticed and unfixed in the release.

So the priority at this stage is to minimise the risk of destabilising the
codebase, as opposed to taking in new features and desirable performance
improvements (like your patch!).

That is the rationale for delaying committing such changes until the start
of GCC 9 development. But again, this is up to the aarch64 maintainers.
I'm sure the patch will be a perfectly fine and desirable commit for GCC 9.
This is just my perspective as maintainer of the arm port.

Thanks,
Kyrill

Luis Machado Feb. 8, 2018, 12:45 p.m. | #2
Hi Kyrill,

On 02/08/2018 09:48 AM, Kyrill Tkachov wrote:
> That is the rationale for delaying committing such changes until the start
> of GCC 9 development. But again, this is up to the aarch64 maintainers.
> I'm sure the patch will be a perfectly fine and desirable commit for GCC 9.
> This is just my perspective as maintainer of the arm port.

Thanks. Your explanation makes the situation pretty clear and it sounds 
very reasonable. I'll put the patch on hold until development is open again.

Regards,
Luis

Patch

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 5a2a930..e8284ae 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4828,6 +4828,19 @@ 
   [(set_attr "type" "bfx")]
 )
 
+;; Match sbfiz pattern in a shift left + shift right operation.
+
+(define_insn "*ashift<mode>_extv_bfiz"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+	(ashift:GPI (sign_extract:GPI (match_operand:GPI 1 "register_operand" "r")
+				      (match_operand 2 "aarch64_simd_shift_imm_offset_<mode>" "n")
+				      (match_operand 3 "aarch64_simd_shift_imm_<mode>" "n"))
+		     (match_operand 4 "aarch64_simd_shift_imm_<mode>" "n")))]
+  ""
+  "sbfiz\\t%<w>0, %<w>1, %4, %2"
+  [(set_attr "type" "bfx")]
+)
+
 ;; When the bit position and width of the equivalent extraction add up to 32
 ;; we can use a W-reg LSL instruction taking advantage of the implicit
 ;; zero-extension of the X-reg.
diff --git a/gcc/testsuite/gcc.target/aarch64/lsl_asr_sbfiz.c b/gcc/testsuite/gcc.target/aarch64/lsl_asr_sbfiz.c
new file mode 100644
index 0000000..106433d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/lsl_asr_sbfiz.c
@@ -0,0 +1,24 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+
+/* Check that a LSL followed by an ASR can be combined into a single SBFIZ
+   instruction.  */
+
+/* Using W-reg */
+
+int
+sbfiz32 (int x)
+{
+  return x << 29 >> 10;
+}
+
+/* Using X-reg */
+
+long long
+sbfiz64 (long long x)
+{
+  return x << 58 >> 20;
+}
+
+/* { dg-final { scan-assembler "sbfiz\tw" } } */
+/* { dg-final { scan-assembler "sbfiz\tx" } } */