Message ID | 87wp0x8gad.fsf@linaro.org |
---|---|
State | Accepted |
Commit | b4c969723c1e61dbdaadc61b0b5fd5393e9ff76e |
Headers | show |
Series | Tighten LRA cycling check | expand |
On 01/04/2018 02:28 PM, Richard Sandiford wrote: > LRA has code to try to prevent cycling, by avoiding reloads that > look too similar to the instruction being reloaded. E.g. if we > have a R<-C move for some constant C, reloading the source with > another R<-C move is unlikely to be a good idea. > > However, this safeguard unnecessarily triggered in tests like > the one in the patch. We started with instructions like: > > (insn 12 9 13 5 (set (reg:DI 0 x0) > (reg/f:DI 459)) "reg-alloc-1.c":18 47 {*movdi_aarch64} > (expr_list:REG_EQUAL (symbol_ref:DI ("x00") [flags 0xc0] <var_decl 0x7f3c03c1f510 x00>) > (nil))) > > where r459 didn't get allocated a register and is equivalent to > constant x00. LRA would then handle it like this: > > Changing pseudo 459 in operand 1 of insn 12 on equiv `x00' > 1 Non-pseudo reload: reject+=2 > 1 Non input pseudo reload: reject++ > alt=0,overall=609,losers=1,rld_nregs=1 > [...] > alt=13,overall=9,losers=1,rld_nregs=1 > [...] > Choosing alt 13 in insn 12: (0) r (1) w {*movdi_aarch64} > > In other words, to avoid loading the constant x00 into another GPR, > LRA decided instead to move it into a floating-point register, > then move that floating-point register into x0: > > Creating newreg=630, assigning class FP_REGS to r630 > Set class ALL_REGS for r631 > 12: x0:DI=r630:DI > REG_EQUAL `x00' > Inserting insn reload before: > 815: r631:DI=high(`x00') > 816: r630:DI=r631:DI+low(`x00') > REG_EQUAL `x00' > > That's inefficient and doesn't really help to resolve a cycling > problem, since the r630 destination of 816 needs to be reloaded into > a GPR anyway. > > The cycling check already had an exception for source values that are > the result of an elimination. This patch extends it to include the > result of equivalence substitution. > > Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu. > Also tested by comparing the before and after assembly output for at > least one target per CPU directory. The targets most affected were: > > aarch64_be-linux-gn > aarch64-linux-gnu > powerpc-eabispe > riscv32-elf > riscv64-elf > sparc64-linux-gnu > sparc-linux-gnu > spu-elf > > for which it improved code size overall. There were minor register > renaming differences on some other targets. x86 and powerpc*-linux-gnu > targets were unaffected. > > OK to install? > > Richard > > > 2018-01-04 Richard Sandiford <richard.sandiford@linaro.org> > > gcc/ > * lra-constraints.c (process_alt_operands): Test for the equivalence > substitutions when detecting a possible reload cycle. > > gcc/testsuite/ > * gcc.target/aarch64/reg-alloc-1.c: New test. OK. Presumably related to 83699? Do you want to reference it in the ChangeLog? jeff
Jeff Law <law@redhat.com> writes: > On 01/04/2018 02:28 PM, Richard Sandiford wrote: >> LRA has code to try to prevent cycling, by avoiding reloads that >> look too similar to the instruction being reloaded. E.g. if we >> have a R<-C move for some constant C, reloading the source with >> another R<-C move is unlikely to be a good idea. >> >> However, this safeguard unnecessarily triggered in tests like >> the one in the patch. We started with instructions like: >> >> (insn 12 9 13 5 (set (reg:DI 0 x0) >> (reg/f:DI 459)) "reg-alloc-1.c":18 47 {*movdi_aarch64} >> (expr_list:REG_EQUAL (symbol_ref:DI ("x00") [flags 0xc0] <var_decl 0x7f3c03c1f510 x00>) >> (nil))) >> >> where r459 didn't get allocated a register and is equivalent to >> constant x00. LRA would then handle it like this: >> >> Changing pseudo 459 in operand 1 of insn 12 on equiv `x00' >> 1 Non-pseudo reload: reject+=2 >> 1 Non input pseudo reload: reject++ >> alt=0,overall=609,losers=1,rld_nregs=1 >> [...] >> alt=13,overall=9,losers=1,rld_nregs=1 >> [...] >> Choosing alt 13 in insn 12: (0) r (1) w {*movdi_aarch64} >> >> In other words, to avoid loading the constant x00 into another GPR, >> LRA decided instead to move it into a floating-point register, >> then move that floating-point register into x0: >> >> Creating newreg=630, assigning class FP_REGS to r630 >> Set class ALL_REGS for r631 >> 12: x0:DI=r630:DI >> REG_EQUAL `x00' >> Inserting insn reload before: >> 815: r631:DI=high(`x00') >> 816: r630:DI=r631:DI+low(`x00') >> REG_EQUAL `x00' >> >> That's inefficient and doesn't really help to resolve a cycling >> problem, since the r630 destination of 816 needs to be reloaded into >> a GPR anyway. >> >> The cycling check already had an exception for source values that are >> the result of an elimination. This patch extends it to include the >> result of equivalence substitution. >> >> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu. >> Also tested by comparing the before and after assembly output for at >> least one target per CPU directory. The targets most affected were: >> >> aarch64_be-linux-gn >> aarch64-linux-gnu >> powerpc-eabispe >> riscv32-elf >> riscv64-elf >> sparc64-linux-gnu >> sparc-linux-gnu >> spu-elf >> >> for which it improved code size overall. There were minor register >> renaming differences on some other targets. x86 and powerpc*-linux-gnu >> targets were unaffected. >> >> OK to install? >> >> Richard >> >> >> 2018-01-04 Richard Sandiford <richard.sandiford@linaro.org> >> >> gcc/ >> * lra-constraints.c (process_alt_operands): Test for the equivalence >> substitutions when detecting a possible reload cycle. >> >> gcc/testsuite/ >> * gcc.target/aarch64/reg-alloc-1.c: New test. > OK. Thanks. > Presumably related to 83699? Do you want to reference it in the > ChangeLog? It's actually an unrelated problem that I'd originally seen on SVE. I guess I was just lucky to have two cycling bugs around the same time :-) Richard
Index: gcc/lra-constraints.c =================================================================== --- gcc/lra-constraints.c 2018-01-04 21:28:22.495890864 +0000 +++ gcc/lra-constraints.c 2018-01-04 21:28:22.637885000 +0000 @@ -2866,7 +2866,12 @@ process_alt_operands (int only_alternati /* If it is a result of recent elimination in move insn we can transform it into an add still by using this alternative. */ - && GET_CODE (no_subreg_reg_operand[1]) != PLUS))) + && GET_CODE (no_subreg_reg_operand[1]) != PLUS + /* Likewise if the source has been replaced with an + equivalent value. This only happens once -- the reload + will use the equivalent value instead of the register it + replaces -- so there should be no danger of cycling. */ + && !equiv_substition_p[1]))) { /* We have a move insn and a new reload insn will be similar to the current insn. We should avoid such situation as Index: gcc/testsuite/gcc.target/aarch64/reg-alloc-1.c =================================================================== --- /dev/null 2018-01-04 17:16:45.681472446 +0000 +++ gcc/testsuite/gcc.target/aarch64/reg-alloc-1.c 2018-01-04 21:28:22.637885000 +0000 @@ -0,0 +1,26 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-rtl-reload-details" } */ + +#define R1(X, Y) X (Y##0) X (Y##1) X (Y##2) X (Y##3) \ + X (Y##4) X (Y##5) X (Y##6) X (Y##7) +#define R2(X) R1 (X, 0) R1 (X, 1) R1 (X, 2) R1 (X, 3) + +#define DEFINE(N) extern int x##N; +R2 (DEFINE) + +void b1 (int *); + +void +foo (int n) +{ + for (int i = 0; i < n; ++i) + { +#define CALL(N) b1 (&x##N); + R2 (CALL); + R2 (CALL); + } +#define INC(N) x##N += 1; + R2 (INC); +} + +/* { dg-final { scan-rtl-dump-not "DI 32 v0" "reload" } } */