| Message ID | 1517582287-9923-1-git-send-email-luis.machado@linaro.org |
| --- | --- |
| State | Superseded |
| Series | Recognize a missed usage of a sbfiz instruction |
Hi Luis,

On 02/02/18 14:38, Luis Machado wrote:
> A customer reported the following missed opportunities to combine a couple
> instructions into a sbfiz.
>
> int sbfiz32 (int x)
> {
>   return x << 29 >> 10;
> }
>
> long sbfiz64 (long x)
> {
>   return x << 58 >> 20;
> }
>
> This gets converted to the following pattern:
>
> (set (reg:SI 98)
>     (ashift:SI (sign_extend:SI (reg:HI 0 x0 [ xD.3334 ]))
>         (const_int 6 [0x6])))
>
> Currently, gcc generates the following:
>
> sbfiz32:
>   lsl x0, x0, 29
>   asr x0, x0, 10
>   ret
>
> sbfiz64:
>   lsl x0, x0, 58
>   asr x0, x0, 20
>   ret
>
> It could generate this instead:
>
> sbfiz32:
>   sbfiz w0, w0, 19, 3
>   ret
>
> sbfiz64:
>   sbfiz x0, x0, 38, 6
>   ret
>
> The unsigned versions already generate ubfiz for the same code, so the lack of
> a sbfiz pattern may have been an oversight.

You're right. In fact, LLVM will generate the SBFIZ form.

> This particular sbfiz pattern shows up in both CPU2006 (~ 80 hits) and
> CPU2017 (~ 280 hits). It's not a lot, but seems beneficial in any case. No
> significant performance differences, probably due to the small number of
> occurrences or cases outside hot areas.

Yeah, these little missed opportunities add up in the long run :)

> Regression-tested and bootstrapped ok on aarch64-linux. Validated with
> CPU2017 and CPU2006 runs.
>
> I thought I'd put this up for review. I know we're still not in development
> mode yet.
>
> 2018-02-02  Luis Machado  <luis.machado@linaro.org>
>
> gcc/
> 	* config/aarch64/aarch64.md (*ashift<mode>_extv_bfiz): New pattern.
> 	* testsuite/gcc.target/aarch64/lsl_asr_sbfiz.c: New test.

I'm sure you know this already: the testsuite entry will go into a ChangeLog
of its own in gcc/testsuite/, with the testsuite/ prefix removed.

> ---
>  gcc/config/aarch64/aarch64.md                    | 13 +++++++++++++
>  gcc/testsuite/gcc.target/aarch64/lsl_asr_sbfiz.c | 24 ++++++++++++++++++++++++
>  2 files changed, 37 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/lsl_asr_sbfiz.c
>
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 5a2a930..d336bf0 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -4828,6 +4828,19 @@
>    [(set_attr "type" "bfx")]
>  )
>
> +;; Match sbfiz pattern in a shift left + shift right operation.
> +
> +(define_insn "*ashift<mode>_extv_bfiz"
> +  [(set (match_operand:GPI 0 "register_operand" "=r")
> +	(ashift:GPI (sign_extract:GPI (match_operand:GPI 1 "register_operand" "r")
> +				      (match_operand 2 "const_int_operand" "n")
> +				      (match_operand 3 "const_int_operand" "n"))
> +		    (match_operand 4 "const_int_operand" "n")))]
> +  "UINTVAL (operands[2]) < <GPI:sizen>"
> +  "sbfiz\\t%<w>0, %<w>1, %4, %2"
> +  [(set_attr "type" "bfx")]
> +)

I think you need some more validation on operands 3 and 4. Look at other
similar patterns, but you want something like the aarch64_simd_shift_imm_<mode>
predicate on them to make sure they fall within [0, 63] or [0, 31]. Also, for
operand 2 you probably want the aarch64_simd_shift_imm_offset_di predicate to
make sure it doesn't allow a size of zero. Don't know if the RTL optimisers
will try to create such wonky RTL, but we tend to be defensive in the pattern
about these things.

> +
>  ;; When the bit position and width of the equivalent extraction add up to 32
>  ;; we can use a W-reg LSL instruction taking advantage of the implicit
>  ;; zero-extension of the X-reg.
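To see where the sbfiz operands come from: for `x << 29 >> 10` on a 32-bit
value, only the low 32 - 29 = 3 bits of x survive the left shift, and the net
displacement is 29 - 10 = 19 bits, so the result is that 3-bit field
sign-extended and inserted at bit 19, i.e. `sbfiz w0, w0, 19, 3` (and likewise
38, 6 in the 64-bit case). A minimal C sketch of the equivalence follows; it
is an illustration only, and relies on GCC treating `>>` on signed values as
an arithmetic shift:

```c
#include <assert.h>
#include <stdint.h>

/* SBFIZ semantics: sign-extend the low WIDTH bits of X, then place the
   field at bit LSB, with sign bits above and zeros below.  */
static int32_t
sbfiz_ref (int32_t x, unsigned lsb, unsigned width)
{
  int32_t field = (int32_t) ((uint32_t) x << (32 - width)) >> (32 - width);
  return (int32_t) ((uint32_t) field << lsb);
}

int
main (void)
{
  for (int32_t x = -512; x <= 512; x++)
    /* x << 29 >> 10  ==  sbfiz (x, 29 - 10, 32 - 29).  */
    assert ((int32_t) ((uint32_t) x << 29) >> 10 == sbfiz_ref (x, 19, 3));
  return 0;
}
```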
> diff --git a/gcc/testsuite/gcc.target/aarch64/lsl_asr_sbfiz.c b/gcc/testsuite/gcc.target/aarch64/lsl_asr_sbfiz.c
> new file mode 100644
> index 0000000..931f8f4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/lsl_asr_sbfiz.c
> @@ -0,0 +1,24 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3" } */
> +
> +/* Check that a LSL followed by an ASR can be combined into a single SBFIZ
> +   instruction.  */
> +
> +/* Using W-reg */
> +
> +int
> +sbfiz32 (int x)
> +{
> +  return x << 29 >> 10;
> +}
> +
> +/* Using X-reg */
> +
> +long
> +sbfiz64 (long x)
> +{
> +  return x << 58 >> 20;
> +}

long will be 32 bits when testing with -mabi=ilp32, so you want this to be
long long, or some other guaranteed 64-bit type.

Thanks,
Kyrill

> +
> +/* { dg-final { scan-assembler "sbfiz\tw" } } */
> +/* { dg-final { scan-assembler "sbfiz\tx" } } */
> --
> 2.7.4
>
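A testcase variant addressing the ILP32 point might look like this (a sketch
of one possible fix, not the revised patch):

```c
/* long is only 32 bits under -mabi=ilp32; long long is guaranteed to be
   at least 64 bits, so it exercises the X-reg form on both ABIs.  */
long long
sbfiz64 (long long x)
{
  return x << 58 >> 20;
}
```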
```diff
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 5a2a930..d336bf0 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4828,6 +4828,19 @@
   [(set_attr "type" "bfx")]
 )
 
+;; Match sbfiz pattern in a shift left + shift right operation.
+
+(define_insn "*ashift<mode>_extv_bfiz"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+	(ashift:GPI (sign_extract:GPI (match_operand:GPI 1 "register_operand" "r")
+				      (match_operand 2 "const_int_operand" "n")
+				      (match_operand 3 "const_int_operand" "n"))
+		    (match_operand 4 "const_int_operand" "n")))]
+  "UINTVAL (operands[2]) < <GPI:sizen>"
+  "sbfiz\\t%<w>0, %<w>1, %4, %2"
+  [(set_attr "type" "bfx")]
+)
+
 ;; When the bit position and width of the equivalent extraction add up to 32
 ;; we can use a W-reg LSL instruction taking advantage of the implicit
 ;; zero-extension of the X-reg.
diff --git a/gcc/testsuite/gcc.target/aarch64/lsl_asr_sbfiz.c b/gcc/testsuite/gcc.target/aarch64/lsl_asr_sbfiz.c
new file mode 100644
index 0000000..931f8f4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/lsl_asr_sbfiz.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+
+/* Check that a LSL followed by an ASR can be combined into a single SBFIZ
+   instruction.  */
+
+/* Using W-reg */
+
+int
+sbfiz32 (int x)
+{
+  return x << 29 >> 10;
+}
+
+/* Using X-reg */
+
+long
+sbfiz64 (long x)
+{
+  return x << 58 >> 20;
+}
+
+/* { dg-final { scan-assembler "sbfiz\tw" } } */
+/* { dg-final { scan-assembler "sbfiz\tx" } } */
```
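Applying Kyrill's predicate suggestions to the posted pattern would give
something along these lines. This is an editorial sketch, not the committed
revision; the exact predicate names and mode suffixes would need checking
against config/aarch64/predicates.md:

```
;; Sketch only: operands 3 and 4 limited to valid shift amounts for the
;; mode ([0, 31] for SI, [0, 63] for DI), operand 2 to a non-zero width.
(define_insn "*ashift<mode>_extv_bfiz"
  [(set (match_operand:GPI 0 "register_operand" "=r")
	(ashift:GPI (sign_extract:GPI
		      (match_operand:GPI 1 "register_operand" "r")
		      (match_operand 2 "aarch64_simd_shift_imm_offset_<mode>" "n")
		      (match_operand 3 "aarch64_simd_shift_imm_<mode>" "n"))
		    (match_operand 4 "aarch64_simd_shift_imm_<mode>" "n")))]
  "UINTVAL (operands[2]) < <GPI:sizen>"
  "sbfiz\\t%<w>0, %<w>1, %4, %2"
  [(set_attr "type" "bfx")]
)
```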