Message ID | 877enl9vfy.fsf_-_@linaro.org |
---|---|
State | Accepted |
Commit | 8fd966327ae16a820143f589dd48b8c17a6f6ae5 |
Headers | show |
Series | Tighten LRA test for reloading the inner reg of a paradoxical subreg | expand |
On 05/30/2018 06:31 AM, Richard Sandiford wrote: > Kyrill Tkachov <kyrylo.tkachov@foss.arm.com> writes: >> Hi Richard, >> >> On 29/05/18 15:26, Richard Sandiford wrote: >>> Kyrill Tkachov <kyrylo.tkachov@foss.arm.com> writes: >>>> Hi all, >>>> >>>> The recent changes to aarch64_expand_vector_init cause an ICE in the >>>> attached testcase. The register allocator "ICEs with Max. number of >>>> generated reload insns per insn is achieved (90)" >>>> >>>> That is because aarch64_expand_vector_init creates a paradoxical >>>> subreg to move a DImode value >>>> into a V2DI vector: >>>> (insn 74 72 76 8 (set (reg:V2DI 287 [ _166 ]) >>>> (subreg:V2DI (reg/v/f:DI 112 [ d ]) 0)) 1050 >>>> {*aarch64_simd_movv2di} >>>> >>>> This is done because we want to express that the whole of the V2DI >>>> vector will be written so that init-regs doesn't try to >>>> zero-initialise it before we overwrite each lane individually anyway. >>>> >>>> This can go bad for because if the DImode value is allocated in, say, >>>> x30: the last register in that register class, the V2DI subreg of that >>>> isn't valid or meaningful and that seems to cause the trouble. >>>> >>>> It's kinda hard to know what the right solution for this is. >>>> We could emit a duplicate of the value into all the lanes of the >>>> vector, but we have tests that test against that >>>> (we're trying to avoid unnecessary duplicates) >>>> >>>> What this patch does is it defines a pattern for moving a scalar into >>>> lane 0 of a vector using a simple FMOV or LDR and represents that as a >>>> merging with a vector of zeroes. That way, the instruction represents >>>> a full write of the destination vector but doesn't "read" more bits >>>> from the source than necessary. The zeroing effect is also a more >>>> accurate representation of the semantics of FMOV. >>> This feels like a hack. Either the paradoxical subreg of the pseudo >>> is invalid for some reason (in which case we should ICE when it's formed, >>> not just in the case of x30 being allocated) or the subreg is valid, >>> in which case the RA should handle it correctly (and the backend should >>> give it the information it needs to do that). >>> >>> I could see the argument for ignoring the problem for expediency if the >>> patch was a clean-up in its own right, but I think it's wrong to add so >>> much code to paper over a bug. >> >> I see what you mean. Do you have any thoughts on where in RA we'd go >> about fixing this? Since I don't know my way around RA I tried in the >> place I was most comfortable changing :) > > Ah, but everyone who's patched the RA had to patch it for the > first time. :-) And it's not that scary. But anyway... > > The original insn was: > > (insn 74 72 76 8 (set (reg:V2DI 287 [ _166 ]) > (subreg:V2DI (reg/v/f:DI 112 [ d ]) 0)) 1060 {*aarch64_simd_movv2di} > (nil)) > > which IRA converted to: > > (insn 74 72 580 8 (set (reg:V2DI 287 [ _166 ]) > (subreg:V2DI (reg/v/f:DI 517 [orig:112 d ] [112]) 0)) 1060 {*aarch64_simd_movv2di} > (nil)) > > after creating loop allocnos. It happens that the ALLOCNO_WMODEs for > both 112 and 517 were not set to V2DI due to another bug that I'll post > a separate patch for, but we nevertheless got a valid allocation of > register 1. > > LRA's first try at constraining the instruction gave: > > Choosing alt 5 in insn 74: (0) ?w (1) r {*aarch64_simd_movv2di} > > at which point all was good. But LRA later decided it needed > to spill r517: > > Spill r517 after risky transformations > > so the next constraint attempt gave: > > Choosing alt 0 in insn 74: (0) =w (1) m {*aarch64_simd_movv2di} > > which was still good. Then during inheritance we had: > > Creating newreg=672 from oldreg=517, assigning class GENERAL_REGS to inheritance r672 > Original reg change 517->672 (bb8): > 74: r287:V2DI=r672:DI#0 > Add inheritance<-original before: > 939: r672:DI=r517:DI > > Inheritance reuse change 517->672 (bb8): > 620: r572:DI=r672:DI > REG_DEAD r672:DI > > Use smallest class of POINTER_REGS and GENERAL_REGS > Creating newreg=673 from oldreg=517, assigning class POINTER_REGS to inheritance r673 > Original reg change 517->673 (bb8): > 936: r669:DI=r673:DI > Add inheritance<-original before: > 940: r673:DI=r517:DI > > ("Use smallest class of POINTER_REGS and GENERAL_REGS" ought to > give GENERAL_REGS. That might be a missed optimisation, and probably > due to both classes having the same number of allocatable registers. > I'll look at that as a follow-on.) > > Thus LRA created two inheritance registers for r517, one (r673) > that included the unallocatable x31 and another (r672) that didn't. > The r672 references included the paradoxical subreg in insn 74 but the > r673 ones didn't. LRA then allocated x30 to r673, which was a valid > choice. > > Later LRA decided to "undo" the inheritance for insn 620, but because > of the double inheritance, it got confused as to what the original > situation was, and made insn 74 use the other inheritance register > instead of r517: > > ********** Undoing inheritance #2: ********** > > Inherit 11 out of 12 (91.67%) > Insn after restoring regs: > 620: r572:DI=r517:DI > REG_DEAD r517:DI > Change reload insn: > 74: r287:V2DI=r673:DI#0 <------------------- > Insn after restoring regs: > 939: r517:DI=r673:DI > REG_DEAD r673:DI > > This might be a bug in itself: we should probably look through sets > of other inheritance pseudos to find the "real" origin. > > Either way, at this point we had a situation in which r673 was used in an > insn whose subreg was larger than the biggest_mode that r673 had when it > was allocated. While x30 was valid for the original biggest_mode, it > wasn't valid for this subreg use. > > The next attempt to constrain insn 74 was: > > Choosing alt 5 in insn 74: (0) ?w (1) r {*aarch64_simd_movv2di} > Creating newreg=684, assigning class GENERAL_REGS to r684 > 74: r287:V2DI=r684:V2DI > Inserting insn reload before: > 951: r684:V2DI=r673:DI#0 > > where LRA reloaded the SUBREG rather than the SUBREG_REG. And it > then cycled trying the same thing when reloading the reload (and the > reload of the reload, etc.). > > What it should be doing here is reloading the SUBREG_REG instead. > There's already code to cope with this case when the paradoxical > subreg falls outside the class (which isn't true here, since r673 > is POINTER_REGS and POINTER_REGS includes x31). But I think we > should also test whether LRA is entitled to allocate the spanned > registers. Not doing that seems like a bug regardless of the above > missed optimisation and the mix-up undoing inheritance. > > Tested so far on aarch64-linux-gnu. I'll try aarch64_be-elf > and x86_64-linux-gnu too. > > Thanks, > Richard > > > 2018-05-30 Richard Sandiford <richard.sandiford@linaro.org> > > gcc/ > * lra-constraints.c (simplify_operand_subreg): In the paradoxical > case, check whether the outer register overlaps an unallocatable > register, not just whether it fits the required class. > > gcc/testsuite/ > * g++.dg/torture/aarch64-vect-init-1.C: New test. OK. jeff
Index: gcc/lra-constraints.c =================================================================== --- gcc/lra-constraints.c 2018-03-30 12:28:37.301927949 +0100 +++ gcc/lra-constraints.c 2018-05-30 12:07:18.220421910 +0100 @@ -1722,7 +1722,13 @@ simplify_operand_subreg (int nop, machin (subreg:TI (reg:TI 180 [orig:107 __comp ] [107]) 0)) {*movti_internal_rex64} Two reload hard registers will be allocated to reg180 to save TImode data - in LRA_assign. */ + in LRA_assign. + + For LRA pseudos this should normally be handled by the biggest_mode + mechanism. However, it's possible for new uses of an LRA pseudo + to be introduced after we've allocated it, such as when undoing + inheritance, and the allocated register might not then be appropriate + for the new uses. */ else if (REG_P (reg) && REGNO (reg) >= FIRST_PSEUDO_REGISTER && (hard_regno = lra_get_regno_hard_regno (REGNO (reg))) >= 0 @@ -1731,7 +1737,9 @@ simplify_operand_subreg (int nop, machin && (regclass = lra_get_allocno_class (REGNO (reg))) && (type != OP_IN || !in_hard_reg_set_p (reg_class_contents[regclass], - mode, hard_regno))) + mode, hard_regno) + || overlaps_hard_reg_set_p (lra_no_alloc_regs, + mode, hard_regno))) { /* The class will be defined later in curr_insn_transform. */ enum reg_class rclass Index: gcc/testsuite/g++.dg/torture/aarch64-vect-init-1.C =================================================================== --- /dev/null 2018-04-20 16:19:46.369131350 +0100 +++ gcc/testsuite/g++.dg/torture/aarch64-vect-init-1.C 2018-05-30 12:07:18.220421910 +0100 @@ -0,0 +1,31 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-mcpu=cortex-a72" { target aarch64*-*-* } } */ + +class A { +public: + unsigned char *fn1(); + int fn2(); +}; + +class B { + A fld1; + int fld2; + void fn3(); + unsigned char fld3; +}; + +int a; + +void +B::fn3() { + int b = fld1.fn2() / 8; + unsigned char *c = fld1.fn1(), *d = &fld3, *e = c; + for (; a < fld2;) + for (int j = 0; j < b; j++) + *d++ = e[j]; + for (; 0 < fld2;) + for (int j = 0; j < b; j++) + e[j] = *d++; + for (; fld2;) + ; +}