Message ID | 871sgj0ym5.fsf@linaro.org |
---|---|
State | New |
Headers | show |
Series | Use SCEV information when aligning for vectorisation (PR 84005) | expand |
On Sat, Mar 17, 2018 at 11:45 AM, Richard Sandiford <richard.sandiford@linaro.org> wrote: > This PR is another regression caused by the removal of the simple_iv > check in dr_analyze_innermost for BB analysis. Without splitting out > the step, we weren't able to find an underlying object whose alignment > could be increased. > > As with PR81635, I think the simple_iv was only handling one special > case of something that ought to be more general. The more general > thing here is that if the address can be analysed as a scalar > evolution, and if all updates preserve alignment N, it's possible > to align the address to N by increasing the alignment of the base > object to N. That applies also to outer loops, and to both loop > and BB analysis. > > I wasn't sure where the new functions ought to live, but tree-data-ref.c > seemed OK since (a) that already does scev analysis on addresses and > (b) you'd want to use dr_analyze_innermost first if you were analysing > a reference. > > Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install? > > Richard > > > 2018-03-17 Richard Sandiford <richard.sandiford@linaro.org> > > gcc/ > PR tree-optimization/84005 > * tree-data-ref.h (get_base_for_alignment): Declare. > * tree-data-ref.c (get_base_for_alignment_1): New function. > (get_base_for_alignment): Likewise. > * tree-vect-data-refs.c (vect_compute_data_ref_alignment): Use > get_base_for_alignment to find a suitable base object, instead > of always using drb->base_address. > > gcc/testsuite/ > PR tree-optimization/84005 > * gcc.dg/vect/bb-slp-1.c: Make sure there is no message about > failing to force the alignment. > > Index: gcc/tree-data-ref.h > =================================================================== > --- gcc/tree-data-ref.h 2018-01-13 18:02:00.948360274 +0000 > +++ gcc/tree-data-ref.h 2018-03-17 10:43:42.544669549 +0000 > @@ -463,6 +463,7 @@ extern bool compute_all_dependences (vec > extern tree find_data_references_in_bb (struct loop *, basic_block, > vec<data_reference_p> *); > extern unsigned int dr_alignment (innermost_loop_behavior *); > +extern tree get_base_for_alignment (tree, unsigned int *); > > /* Return the alignment in bytes that DR is guaranteed to have at all > times. */ > Index: gcc/tree-data-ref.c > =================================================================== > --- gcc/tree-data-ref.c 2018-02-14 13:14:36.268006810 +0000 > +++ gcc/tree-data-ref.c 2018-03-17 10:43:42.544669549 +0000 > @@ -5200,6 +5200,75 @@ dr_alignment (innermost_loop_behavior *d > return alignment; > } > > +/* If BASE is a pointer-typed SSA name, try to find the object that it > + is based on. Return this object X on success and store the alignment > + in bytes of BASE - &X in *ALIGNMENT_OUT. */ > + > +static tree > +get_base_for_alignment_1 (tree base, unsigned int *alignment_out) > +{ > + if (TREE_CODE (base) != SSA_NAME || !POINTER_TYPE_P (TREE_TYPE (base))) > + return NULL_TREE; > + > + gimple *def = SSA_NAME_DEF_STMT (base); > + base = analyze_scalar_evolution (loop_containing_stmt (def), base); I think it is an error to use the def stmt of base here. Instead you need to pass down the point/loop of the use. For the same reason you also want to instantiate parameters after analyzing the evolution. In the end, if the BB to be vectorized is contained in a loop nest we want to instantiate up to the point where (eventually) a DECL we can re-align appears, but using the containing loop for now looks OK. > + /* Peel chrecs and record the minimum alignment preserved by > + all steps. */ > + unsigned int alignment = MAX_OFILE_ALIGNMENT / BITS_PER_UNIT; > + while (TREE_CODE (base) == POLYNOMIAL_CHREC) > + { > + unsigned int step_alignment = highest_pow2_factor (CHREC_RIGHT (base)); > + alignment = MIN (alignment, step_alignment); > + base = CHREC_LEFT (base); > + } > + > + /* Punt if the expression is too complicated to handle. */ > + if (tree_contains_chrecs (base, NULL) || !POINTER_TYPE_P (TREE_TYPE (base))) > + return NULL_TREE; > + > + /* Analyze the base to which the steps we peeled were applied. */ > + poly_int64 bitsize, bitpos, bytepos; > + machine_mode mode; > + int unsignedp, reversep, volatilep; > + tree offset; > + base = get_inner_reference (build_fold_indirect_ref (base), ick :/ what happens if you simply use get_pointer_alignment here and strip down POINTER_PLUS_EXPRs to the ultimate LHS? (basically what get_pointer_alignment_1 does) After all get_base_for_alignment_1 itself only deals with plain SSA_NAMEs, not, say, &a_1->b.c or &MEM[a_1 + 4]. Thanks, Richard. > + &bitsize, &bitpos, &offset, &mode, > + &unsignedp, &reversep, &volatilep); > + if (!base || !multiple_p (bitpos, BITS_PER_UNIT, &bytepos)) > + return NULL_TREE; > + > + /* Restrict the alignment to that guaranteed by the offsets. */ > + unsigned int bytepos_alignment = known_alignment (bytepos); > + if (bytepos_alignment != 0) > + alignment = MIN (alignment, bytepos_alignment); > + if (offset) > + { > + unsigned int offset_alignment = highest_pow2_factor (offset); > + alignment = MIN (alignment, offset_alignment); > + } > + > + *alignment_out = alignment; > + return base; > +} > + > +/* Return the object whose alignment would need to be changed in order > + to increase the alignment of ADDR. Store the maximum achievable > + alignment in *MAX_ALIGNMENT. */ > + > +tree > +get_base_for_alignment (tree addr, unsigned int *max_alignment) > +{ > + tree base = get_base_for_alignment_1 (addr, max_alignment); > + if (base) > + return base; > + > + if (TREE_CODE (addr) == ADDR_EXPR) > + addr = TREE_OPERAND (addr, 0); > + *max_alignment = MAX_OFILE_ALIGNMENT / BITS_PER_UNIT; > + return addr; > +} > + > /* Recursive helper function. */ > > static bool > Index: gcc/tree-vect-data-refs.c > =================================================================== > --- gcc/tree-vect-data-refs.c 2018-03-02 09:45:40.478265383 +0000 > +++ gcc/tree-vect-data-refs.c 2018-03-17 10:43:42.544669549 +0000 > @@ -957,11 +957,11 @@ vect_compute_data_ref_alignment (struct > > if (base_alignment < vector_alignment) > { > - tree base = drb->base_address; > - if (TREE_CODE (base) == ADDR_EXPR) > - base = TREE_OPERAND (base, 0); > - if (!vect_can_force_dr_alignment_p (base, > - vector_alignment * BITS_PER_UNIT)) > + unsigned int max_alignment; > + tree base = get_base_for_alignment (drb->base_address, &max_alignment); > + if (max_alignment < vector_alignment > + || !vect_can_force_dr_alignment_p (base, > + vector_alignment * BITS_PER_UNIT)) > { > if (dump_enabled_p ()) > { > Index: gcc/testsuite/gcc.dg/vect/bb-slp-1.c > =================================================================== > --- gcc/testsuite/gcc.dg/vect/bb-slp-1.c 2017-11-09 15:15:06.846650880 +0000 > +++ gcc/testsuite/gcc.dg/vect/bb-slp-1.c 2018-03-17 10:43:42.541669742 +0000 > @@ -54,5 +54,5 @@ int main (void) > return 0; > } > > +/* { dg-final { scan-tree-dump-not "can't force alignment" "slp1" } } */ > /* { dg-final { scan-tree-dump-times "basic block vectorized" 1 "slp1" } } */ > -
Richard Biener <richard.guenther@gmail.com> writes: > On Sat, Mar 17, 2018 at 11:45 AM, Richard Sandiford > <richard.sandiford@linaro.org> wrote: >> Index: gcc/tree-data-ref.c >> =================================================================== >> --- gcc/tree-data-ref.c 2018-02-14 13:14:36.268006810 +0000 >> +++ gcc/tree-data-ref.c 2018-03-17 10:43:42.544669549 +0000 >> @@ -5200,6 +5200,75 @@ dr_alignment (innermost_loop_behavior *d >> return alignment; >> } >> >> +/* If BASE is a pointer-typed SSA name, try to find the object that it >> + is based on. Return this object X on success and store the alignment >> + in bytes of BASE - &X in *ALIGNMENT_OUT. */ >> + >> +static tree >> +get_base_for_alignment_1 (tree base, unsigned int *alignment_out) >> +{ >> + if (TREE_CODE (base) != SSA_NAME || !POINTER_TYPE_P (TREE_TYPE (base))) >> + return NULL_TREE; >> + >> + gimple *def = SSA_NAME_DEF_STMT (base); >> + base = analyze_scalar_evolution (loop_containing_stmt (def), base); > > I think it is an error to use the def stmt of base here. Instead you > need to pass down the point/loop of the use. For the same reason you > also want to instantiate parameters after analyzing the evolution. > > In the end, if the BB to be vectorized is contained in a loop nest we want to > instantiate up to the point where (eventually) a DECL we can re-align appears, > but using the containing loop for now looks OK. Why's that necessary/better though? We're not interested in the evolution of the value at the point it*s used by the potential vector code, but in how we get from the ultimate base (if there is one) to the definition of the SSA name. I don't think instantiating the SCEV would give any new information, but it could lose some. E.g. if we have: x = &foo; do x += 8; while (*y++ < 10) ...potential vector use of *x... we wouldn't be able to express the address of x after the do-while loop, because there's nothing that counts the number of iterations. But the uninstantiated SCEV could tell us that x = &foo + N * 8 for some (unknown) N. (OK, so that doesn't actually work unless we take the effort to look through loop-closed SSA form, but still...) >> + /* Peel chrecs and record the minimum alignment preserved by >> + all steps. */ >> + unsigned int alignment = MAX_OFILE_ALIGNMENT / BITS_PER_UNIT; >> + while (TREE_CODE (base) == POLYNOMIAL_CHREC) >> + { >> + unsigned int step_alignment = highest_pow2_factor (CHREC_RIGHT (base)); >> + alignment = MIN (alignment, step_alignment); >> + base = CHREC_LEFT (base); >> + } >> + >> + /* Punt if the expression is too complicated to handle. */ >> + if (tree_contains_chrecs (base, NULL) || !POINTER_TYPE_P (TREE_TYPE >> (base))) >> + return NULL_TREE; >> + >> + /* Analyze the base to which the steps we peeled were applied. */ >> + poly_int64 bitsize, bitpos, bytepos; >> + machine_mode mode; >> + int unsignedp, reversep, volatilep; >> + tree offset; >> + base = get_inner_reference (build_fold_indirect_ref (base), > > ick :/ > > what happens if you simply use get_pointer_alignment here and > strip down POINTER_PLUS_EXPRs to the ultimate LHS? (basically > what get_pointer_alignment_1 does) After all get_base_for_alignment_1 > itself only deals with plain SSA_NAMEs, not, say, &a_1->b.c or &MEM[a_1 + 4]. Yeah, but those have (hopefully) been handled by dr_analyze_innermost already, which should have stripped off both the constant and variable parts of the offset. So I think SSA names are the only interesting input. The problem is that once we follow the definitions of an SSA name through CHREC_LEFTs, we get a general address again. get_pointer_alignment and get_pointer_alignment_1 don't do what we want because they take the alignment of the existing object into account, whereas here we explicitly want to ignore that and only calculate the alignment of the offset. I guess we could pass a flag down to ignore the current alignment though? Thanks, Richard
On Mon, Mar 19, 2018 at 10:27 PM, Richard Sandiford <richard.sandiford@linaro.org> wrote: > Richard Biener <richard.guenther@gmail.com> writes: >> On Sat, Mar 17, 2018 at 11:45 AM, Richard Sandiford >> <richard.sandiford@linaro.org> wrote: >>> Index: gcc/tree-data-ref.c >>> =================================================================== >>> --- gcc/tree-data-ref.c 2018-02-14 13:14:36.268006810 +0000 >>> +++ gcc/tree-data-ref.c 2018-03-17 10:43:42.544669549 +0000 >>> @@ -5200,6 +5200,75 @@ dr_alignment (innermost_loop_behavior *d >>> return alignment; >>> } >>> >>> +/* If BASE is a pointer-typed SSA name, try to find the object that it >>> + is based on. Return this object X on success and store the alignment >>> + in bytes of BASE - &X in *ALIGNMENT_OUT. */ >>> + >>> +static tree >>> +get_base_for_alignment_1 (tree base, unsigned int *alignment_out) >>> +{ >>> + if (TREE_CODE (base) != SSA_NAME || !POINTER_TYPE_P (TREE_TYPE (base))) >>> + return NULL_TREE; >>> + >>> + gimple *def = SSA_NAME_DEF_STMT (base); >>> + base = analyze_scalar_evolution (loop_containing_stmt (def), base); >> >> I think it is an error to use the def stmt of base here. Instead you >> need to pass down the point/loop of the use. For the same reason you >> also want to instantiate parameters after analyzing the evolution. >> >> In the end, if the BB to be vectorized is contained in a loop nest we want to >> instantiate up to the point where (eventually) a DECL we can re-align appears, >> but using the containing loop for now looks OK. > > Why's that necessary/better though? We're not interested in the > evolution of the value at the point it*s used by the potential vector > code, but in how we get from the ultimate base (if there is one) to the > definition of the SSA name. Hmm, ok. > I don't think instantiating the SCEV would give any new information, > but it could lose some. E.g. if we have: > > x = &foo; > do > x += 8; > while (*y++ < 10) > ...potential vector use of *x... > > we wouldn't be able to express the address of x after the do-while > loop, because there's nothing that counts the number of iterations. > But the uninstantiated SCEV could tell us that x = &foo + N * 8 for > some (unknown) N. Not sure that it works that way. I'm not 100% sure what kind of parameters are left symbolic, and if analysis loop and instantiation "loop" are the same I guess it doesn't make any difference... > (OK, so that doesn't actually work unless we take the effort > to look through loop-closed SSA form, but still...) > >>> + /* Peel chrecs and record the minimum alignment preserved by >>> + all steps. */ >>> + unsigned int alignment = MAX_OFILE_ALIGNMENT / BITS_PER_UNIT; >>> + while (TREE_CODE (base) == POLYNOMIAL_CHREC) >>> + { >>> + unsigned int step_alignment = highest_pow2_factor (CHREC_RIGHT (base)); >>> + alignment = MIN (alignment, step_alignment); >>> + base = CHREC_LEFT (base); >>> + } >>> + >>> + /* Punt if the expression is too complicated to handle. */ >>> + if (tree_contains_chrecs (base, NULL) || !POINTER_TYPE_P (TREE_TYPE >>> (base))) >>> + return NULL_TREE; >>> + >>> + /* Analyze the base to which the steps we peeled were applied. */ >>> + poly_int64 bitsize, bitpos, bytepos; >>> + machine_mode mode; >>> + int unsignedp, reversep, volatilep; >>> + tree offset; >>> + base = get_inner_reference (build_fold_indirect_ref (base), >> >> ick :/ >> >> what happens if you simply use get_pointer_alignment here and >> strip down POINTER_PLUS_EXPRs to the ultimate LHS? (basically >> what get_pointer_alignment_1 does) After all get_base_for_alignment_1 >> itself only deals with plain SSA_NAMEs, not, say, &a_1->b.c or &MEM[a_1 + 4]. > > Yeah, but those have (hopefully) been handled by dr_analyze_innermost > already, which should have stripped off both the constant and variable > parts of the offset. So I think SSA names are the only interesting > input. The problem is that once we follow the definitions of an SSA > name through CHREC_LEFTs, we get a general address again. > > get_pointer_alignment and get_pointer_alignment_1 don't do what we want > because they take the alignment of the existing object into account, > whereas here we explicitly want to ignore that and only calculate the > alignment of the offset. Ah, indeed - I missed that fact. > I guess we could pass a flag down to ignore the current alignment though? But we need the base anyway... so, given we can only ever re-align decls and never plain pointers instead of using build_fold_indirect_ref do if (TREE_CODE (base) != ADDR_EXPR) return NULL_TREE; else use TREE_OPERAND (base, 0)? Richard. > > Thanks, > Richard
Richard Biener <richard.guenther@gmail.com> writes: > On Mon, Mar 19, 2018 at 10:27 PM, Richard Sandiford > <richard.sandiford@linaro.org> wrote: >> Richard Biener <richard.guenther@gmail.com> writes: >>> On Sat, Mar 17, 2018 at 11:45 AM, Richard Sandiford >>> <richard.sandiford@linaro.org> wrote: >>>> Index: gcc/tree-data-ref.c >>>> =================================================================== >>>> --- gcc/tree-data-ref.c 2018-02-14 13:14:36.268006810 +0000 >>>> +++ gcc/tree-data-ref.c 2018-03-17 10:43:42.544669549 +0000 >>>> @@ -5200,6 +5200,75 @@ dr_alignment (innermost_loop_behavior *d >>>> return alignment; >>>> } >>>> >>>> +/* If BASE is a pointer-typed SSA name, try to find the object that it >>>> + is based on. Return this object X on success and store the alignment >>>> + in bytes of BASE - &X in *ALIGNMENT_OUT. */ >>>> + >>>> +static tree >>>> +get_base_for_alignment_1 (tree base, unsigned int *alignment_out) >>>> +{ >>>> + if (TREE_CODE (base) != SSA_NAME || !POINTER_TYPE_P (TREE_TYPE (base))) >>>> + return NULL_TREE; >>>> + >>>> + gimple *def = SSA_NAME_DEF_STMT (base); >>>> + base = analyze_scalar_evolution (loop_containing_stmt (def), base); >>> >>> I think it is an error to use the def stmt of base here. Instead you >>> need to pass down the point/loop of the use. For the same reason you >>> also want to instantiate parameters after analyzing the evolution. >>> >>> In the end, if the BB to be vectorized is contained in a loop nest we want to >>> instantiate up to the point where (eventually) a DECL we can re-align >>> appears, >>> but using the containing loop for now looks OK. >> >> Why's that necessary/better though? We're not interested in the >> evolution of the value at the point it*s used by the potential vector >> code, but in how we get from the ultimate base (if there is one) to the >> definition of the SSA name. > > Hmm, ok. > >> I don't think instantiating the SCEV would give any new information, >> but it could lose some. E.g. if we have: >> >> x = &foo; >> do >> x += 8; >> while (*y++ < 10) >> ...potential vector use of *x... >> >> we wouldn't be able to express the address of x after the do-while >> loop, because there's nothing that counts the number of iterations. >> But the uninstantiated SCEV could tell us that x = &foo + N * 8 for >> some (unknown) N. > > Not sure that it works that way. I'm not 100% sure what kind of > parameters are left symbolic, and if analysis loop and instantiation > "loop" are the same I guess it doesn't make any difference... > >> (OK, so that doesn't actually work unless we take the effort >> to look through loop-closed SSA form, but still...) >> >>>> + /* Peel chrecs and record the minimum alignment preserved by >>>> + all steps. */ >>>> + unsigned int alignment = MAX_OFILE_ALIGNMENT / BITS_PER_UNIT; >>>> + while (TREE_CODE (base) == POLYNOMIAL_CHREC) >>>> + { >>>> + unsigned int step_alignment = highest_pow2_factor (CHREC_RIGHT >>>> (base)); >>>> + alignment = MIN (alignment, step_alignment); >>>> + base = CHREC_LEFT (base); >>>> + } >>>> + >>>> + /* Punt if the expression is too complicated to handle. */ >>>> + if (tree_contains_chrecs (base, NULL) || !POINTER_TYPE_P (TREE_TYPE >>>> (base))) >>>> + return NULL_TREE; >>>> + >>>> + /* Analyze the base to which the steps we peeled were applied. */ >>>> + poly_int64 bitsize, bitpos, bytepos; >>>> + machine_mode mode; >>>> + int unsignedp, reversep, volatilep; >>>> + tree offset; >>>> + base = get_inner_reference (build_fold_indirect_ref (base), >>> >>> ick :/ >>> >>> what happens if you simply use get_pointer_alignment here and >>> strip down POINTER_PLUS_EXPRs to the ultimate LHS? (basically >>> what get_pointer_alignment_1 does) After all get_base_for_alignment_1 >>> itself only deals with plain SSA_NAMEs, not, say, &a_1->b.c or &MEM[a_1 + 4]. >> >> Yeah, but those have (hopefully) been handled by dr_analyze_innermost >> already, which should have stripped off both the constant and variable >> parts of the offset. So I think SSA names are the only interesting >> input. The problem is that once we follow the definitions of an SSA >> name through CHREC_LEFTs, we get a general address again. >> >> get_pointer_alignment and get_pointer_alignment_1 don't do what we want >> because they take the alignment of the existing object into account, >> whereas here we explicitly want to ignore that and only calculate the >> alignment of the offset. > > Ah, indeed - I missed that fact. > >> I guess we could pass a flag down to ignore the current alignment though? > > But we need the base anyway... so, given we can only ever re-align decls > and never plain pointers instead of using build_fold_indirect_ref do > > if (TREE_CODE (base) != ADDR_EXPR) > return NULL_TREE; > > else use TREE_OPERAND (base, 0)? The build_fold_indirect_ref also helps for POINTER_PLUS_EXPR, which is what we get for things like the do-while above (e.g. { &a + 64, +, 64 }_n if x points to a double *.) I guess everything we care about would be handled by fold_indirect_ref_1 though, so would that be OK instead? Thanks, Richard
On Tue, Mar 20, 2018 at 4:26 PM, Richard Sandiford <richard.sandiford@linaro.org> wrote: > Richard Biener <richard.guenther@gmail.com> writes: >> On Mon, Mar 19, 2018 at 10:27 PM, Richard Sandiford >> <richard.sandiford@linaro.org> wrote: >>> Richard Biener <richard.guenther@gmail.com> writes: >>>> On Sat, Mar 17, 2018 at 11:45 AM, Richard Sandiford >>>> <richard.sandiford@linaro.org> wrote: >>>>> Index: gcc/tree-data-ref.c >>>>> =================================================================== >>>>> --- gcc/tree-data-ref.c 2018-02-14 13:14:36.268006810 +0000 >>>>> +++ gcc/tree-data-ref.c 2018-03-17 10:43:42.544669549 +0000 >>>>> @@ -5200,6 +5200,75 @@ dr_alignment (innermost_loop_behavior *d >>>>> return alignment; >>>>> } >>>>> >>>>> +/* If BASE is a pointer-typed SSA name, try to find the object that it >>>>> + is based on. Return this object X on success and store the alignment >>>>> + in bytes of BASE - &X in *ALIGNMENT_OUT. */ >>>>> + >>>>> +static tree >>>>> +get_base_for_alignment_1 (tree base, unsigned int *alignment_out) >>>>> +{ >>>>> + if (TREE_CODE (base) != SSA_NAME || !POINTER_TYPE_P (TREE_TYPE (base))) >>>>> + return NULL_TREE; >>>>> + >>>>> + gimple *def = SSA_NAME_DEF_STMT (base); >>>>> + base = analyze_scalar_evolution (loop_containing_stmt (def), base); >>>> >>>> I think it is an error to use the def stmt of base here. Instead you >>>> need to pass down the point/loop of the use. For the same reason you >>>> also want to instantiate parameters after analyzing the evolution. >>>> >>>> In the end, if the BB to be vectorized is contained in a loop nest we want to >>>> instantiate up to the point where (eventually) a DECL we can re-align >>>> appears, >>>> but using the containing loop for now looks OK. >>> >>> Why's that necessary/better though? We're not interested in the >>> evolution of the value at the point it*s used by the potential vector >>> code, but in how we get from the ultimate base (if there is one) to the >>> definition of the SSA name. >> >> Hmm, ok. >> >>> I don't think instantiating the SCEV would give any new information, >>> but it could lose some. E.g. if we have: >>> >>> x = &foo; >>> do >>> x += 8; >>> while (*y++ < 10) >>> ...potential vector use of *x... >>> >>> we wouldn't be able to express the address of x after the do-while >>> loop, because there's nothing that counts the number of iterations. >>> But the uninstantiated SCEV could tell us that x = &foo + N * 8 for >>> some (unknown) N. >> >> Not sure that it works that way. I'm not 100% sure what kind of >> parameters are left symbolic, and if analysis loop and instantiation >> "loop" are the same I guess it doesn't make any difference... >> >>> (OK, so that doesn't actually work unless we take the effort >>> to look through loop-closed SSA form, but still...) >>> >>>>> + /* Peel chrecs and record the minimum alignment preserved by >>>>> + all steps. */ >>>>> + unsigned int alignment = MAX_OFILE_ALIGNMENT / BITS_PER_UNIT; >>>>> + while (TREE_CODE (base) == POLYNOMIAL_CHREC) >>>>> + { >>>>> + unsigned int step_alignment = highest_pow2_factor (CHREC_RIGHT >>>>> (base)); >>>>> + alignment = MIN (alignment, step_alignment); >>>>> + base = CHREC_LEFT (base); >>>>> + } >>>>> + >>>>> + /* Punt if the expression is too complicated to handle. */ >>>>> + if (tree_contains_chrecs (base, NULL) || !POINTER_TYPE_P (TREE_TYPE >>>>> (base))) >>>>> + return NULL_TREE; >>>>> + >>>>> + /* Analyze the base to which the steps we peeled were applied. */ >>>>> + poly_int64 bitsize, bitpos, bytepos; >>>>> + machine_mode mode; >>>>> + int unsignedp, reversep, volatilep; >>>>> + tree offset; >>>>> + base = get_inner_reference (build_fold_indirect_ref (base), >>>> >>>> ick :/ >>>> >>>> what happens if you simply use get_pointer_alignment here and >>>> strip down POINTER_PLUS_EXPRs to the ultimate LHS? (basically >>>> what get_pointer_alignment_1 does) After all get_base_for_alignment_1 >>>> itself only deals with plain SSA_NAMEs, not, say, &a_1->b.c or &MEM[a_1 + 4]. >>> >>> Yeah, but those have (hopefully) been handled by dr_analyze_innermost >>> already, which should have stripped off both the constant and variable >>> parts of the offset. So I think SSA names are the only interesting >>> input. The problem is that once we follow the definitions of an SSA >>> name through CHREC_LEFTs, we get a general address again. >>> >>> get_pointer_alignment and get_pointer_alignment_1 don't do what we want >>> because they take the alignment of the existing object into account, >>> whereas here we explicitly want to ignore that and only calculate the >>> alignment of the offset. >> >> Ah, indeed - I missed that fact. >> >>> I guess we could pass a flag down to ignore the current alignment though? >> >> But we need the base anyway... so, given we can only ever re-align decls >> and never plain pointers instead of using build_fold_indirect_ref do >> >> if (TREE_CODE (base) != ADDR_EXPR) >> return NULL_TREE; >> >> else use TREE_OPERAND (base, 0)? > > The build_fold_indirect_ref also helps for POINTER_PLUS_EXPR, > which is what we get for things like the do-while above (e.g. > { &a + 64, +, 64 }_n if x points to a double *.) > > I guess everything we care about would be handled by fold_indirect_ref_1 > though, so would that be OK instead? Sounds like a compromise ;) Patch is ok with that change. Thanks, Richard. > Thanks, > Richard
Richard Biener <richard.guenther@gmail.com> writes: > On Tue, Mar 20, 2018 at 4:26 PM, Richard Sandiford > <richard.sandiford@linaro.org> wrote: >> Richard Biener <richard.guenther@gmail.com> writes: >>> On Mon, Mar 19, 2018 at 10:27 PM, Richard Sandiford >>> <richard.sandiford@linaro.org> wrote: >>>> Richard Biener <richard.guenther@gmail.com> writes: >>>>> On Sat, Mar 17, 2018 at 11:45 AM, Richard Sandiford >>>>> <richard.sandiford@linaro.org> wrote: >>>>>> Index: gcc/tree-data-ref.c >>>>>> =================================================================== >>>>>> --- gcc/tree-data-ref.c 2018-02-14 13:14:36.268006810 +0000 >>>>>> +++ gcc/tree-data-ref.c 2018-03-17 10:43:42.544669549 +0000 >>>>>> @@ -5200,6 +5200,75 @@ dr_alignment (innermost_loop_behavior *d >>>>>> return alignment; >>>>>> } >>>>>> >>>>>> +/* If BASE is a pointer-typed SSA name, try to find the object that it >>>>>> + is based on. Return this object X on success and store the alignment >>>>>> + in bytes of BASE - &X in *ALIGNMENT_OUT. */ >>>>>> + >>>>>> +static tree >>>>>> +get_base_for_alignment_1 (tree base, unsigned int *alignment_out) >>>>>> +{ >>>>>> + if (TREE_CODE (base) != SSA_NAME || !POINTER_TYPE_P (TREE_TYPE (base))) >>>>>> + return NULL_TREE; >>>>>> + >>>>>> + gimple *def = SSA_NAME_DEF_STMT (base); >>>>>> + base = analyze_scalar_evolution (loop_containing_stmt (def), base); >>>>> >>>>> I think it is an error to use the def stmt of base here. Instead you >>>>> need to pass down the point/loop of the use. For the same reason you >>>>> also want to instantiate parameters after analyzing the evolution. >>>>> >>>>> In the end, if the BB to be vectorized is contained in a loop nest >>>>> we want to >>>>> instantiate up to the point where (eventually) a DECL we can re-align >>>>> appears, >>>>> but using the containing loop for now looks OK. >>>> >>>> Why's that necessary/better though? We're not interested in the >>>> evolution of the value at the point it*s used by the potential vector >>>> code, but in how we get from the ultimate base (if there is one) to the >>>> definition of the SSA name. >>> >>> Hmm, ok. >>> >>>> I don't think instantiating the SCEV would give any new information, >>>> but it could lose some. E.g. if we have: >>>> >>>> x = &foo; >>>> do >>>> x += 8; >>>> while (*y++ < 10) >>>> ...potential vector use of *x... >>>> >>>> we wouldn't be able to express the address of x after the do-while >>>> loop, because there's nothing that counts the number of iterations. >>>> But the uninstantiated SCEV could tell us that x = &foo + N * 8 for >>>> some (unknown) N. >>> >>> Not sure that it works that way. I'm not 100% sure what kind of >>> parameters are left symbolic, and if analysis loop and instantiation >>> "loop" are the same I guess it doesn't make any difference... >>> >>>> (OK, so that doesn't actually work unless we take the effort >>>> to look through loop-closed SSA form, but still...) >>>> >>>>>> + /* Peel chrecs and record the minimum alignment preserved by >>>>>> + all steps. */ >>>>>> + unsigned int alignment = MAX_OFILE_ALIGNMENT / BITS_PER_UNIT; >>>>>> + while (TREE_CODE (base) == POLYNOMIAL_CHREC) >>>>>> + { >>>>>> + unsigned int step_alignment = highest_pow2_factor (CHREC_RIGHT >>>>>> (base)); >>>>>> + alignment = MIN (alignment, step_alignment); >>>>>> + base = CHREC_LEFT (base); >>>>>> + } >>>>>> + >>>>>> + /* Punt if the expression is too complicated to handle. */ >>>>>> + if (tree_contains_chrecs (base, NULL) || !POINTER_TYPE_P (TREE_TYPE >>>>>> (base))) >>>>>> + return NULL_TREE; >>>>>> + >>>>>> + /* Analyze the base to which the steps we peeled were applied. */ >>>>>> + poly_int64 bitsize, bitpos, bytepos; >>>>>> + machine_mode mode; >>>>>> + int unsignedp, reversep, volatilep; >>>>>> + tree offset; >>>>>> + base = get_inner_reference (build_fold_indirect_ref (base), >>>>> >>>>> ick :/ >>>>> >>>>> what happens if you simply use get_pointer_alignment here and >>>>> strip down POINTER_PLUS_EXPRs to the ultimate LHS? (basically >>>>> what get_pointer_alignment_1 does) After all get_base_for_alignment_1 >>>>> itself only deals with plain SSA_NAMEs, not, say, &a_1->b.c or >>>>> &MEM[a_1 + 4]. >>>> >>>> Yeah, but those have (hopefully) been handled by dr_analyze_innermost >>>> already, which should have stripped off both the constant and variable >>>> parts of the offset. So I think SSA names are the only interesting >>>> input. The problem is that once we follow the definitions of an SSA >>>> name through CHREC_LEFTs, we get a general address again. >>>> >>>> get_pointer_alignment and get_pointer_alignment_1 don't do what we want >>>> because they take the alignment of the existing object into account, >>>> whereas here we explicitly want to ignore that and only calculate the >>>> alignment of the offset. >>> >>> Ah, indeed - I missed that fact. >>> >>>> I guess we could pass a flag down to ignore the current alignment though? >>> >>> But we need the base anyway... so, given we can only ever re-align decls >>> and never plain pointers instead of using build_fold_indirect_ref do >>> >>> if (TREE_CODE (base) != ADDR_EXPR) >>> return NULL_TREE; >>> >>> else use TREE_OPERAND (base, 0)? >> >> The build_fold_indirect_ref also helps for POINTER_PLUS_EXPR, >> which is what we get for things like the do-while above (e.g. >> { &a + 64, +, 64 }_n if x points to a double *.) >> >> I guess everything we care about would be handled by fold_indirect_ref_1 >> though, so would that be OK instead? > > Sounds like a compromise ;) > > Patch is ok with that change. Thanks, here's what I installed. Tested as before. Richard 2018-03-24 Richard Sandiford <richard.sandiford@linaro.org> gcc/ PR tree-optimization/84005 * tree-data-ref.h (get_base_for_alignment): Declare. * tree-data-ref.c (get_base_for_alignment_1): New function. (get_base_for_alignment): Likewise. * tree-vect-data-refs.c (vect_compute_data_ref_alignment): Use get_base_for_alignment to find a suitable base object, instead of always using drb->base_address. gcc/testsuite/ PR tree-optimization/84005 * gcc.dg/vect/bb-slp-1.c: Make sure there is no message about failing to force the alignment. Index: gcc/tree-data-ref.h =================================================================== --- gcc/tree-data-ref.h 2018-03-18 10:24:43.602669700 +0000 +++ gcc/tree-data-ref.h 2018-03-24 10:47:51.281625630 +0000 @@ -463,6 +463,7 @@ extern bool compute_all_dependences (vec extern tree find_data_references_in_bb (struct loop *, basic_block, vec<data_reference_p> *); extern unsigned int dr_alignment (innermost_loop_behavior *); +extern tree get_base_for_alignment (tree, unsigned int *); /* Return the alignment in bytes that DR is guaranteed to have at all times. */ Index: gcc/tree-data-ref.c =================================================================== --- gcc/tree-data-ref.c 2018-03-21 20:51:32.502280975 +0000 +++ gcc/tree-data-ref.c 2018-03-24 10:47:51.281625630 +0000 @@ -5202,6 +5202,81 @@ dr_alignment (innermost_loop_behavior *d return alignment; } +/* If BASE is a pointer-typed SSA name, try to find the object that it + is based on. Return this object X on success and store the alignment + in bytes of BASE - &X in *ALIGNMENT_OUT. */ + +static tree +get_base_for_alignment_1 (tree base, unsigned int *alignment_out) +{ + if (TREE_CODE (base) != SSA_NAME || !POINTER_TYPE_P (TREE_TYPE (base))) + return NULL_TREE; + + gimple *def = SSA_NAME_DEF_STMT (base); + base = analyze_scalar_evolution (loop_containing_stmt (def), base); + + /* Peel chrecs and record the minimum alignment preserved by + all steps. */ + unsigned int alignment = MAX_OFILE_ALIGNMENT / BITS_PER_UNIT; + while (TREE_CODE (base) == POLYNOMIAL_CHREC) + { + unsigned int step_alignment = highest_pow2_factor (CHREC_RIGHT (base)); + alignment = MIN (alignment, step_alignment); + base = CHREC_LEFT (base); + } + + /* Punt if the expression is too complicated to handle. */ + if (tree_contains_chrecs (base, NULL) || !POINTER_TYPE_P (TREE_TYPE (base))) + return NULL_TREE; + + /* The only useful cases are those for which a dereference folds to something + other than an INDIRECT_REF. */ + tree ref_type = TREE_TYPE (TREE_TYPE (base)); + tree ref = fold_indirect_ref_1 (UNKNOWN_LOCATION, ref_type, base); + if (!ref) + return NULL_TREE; + + /* Analyze the base to which the steps we peeled were applied. */ + poly_int64 bitsize, bitpos, bytepos; + machine_mode mode; + int unsignedp, reversep, volatilep; + tree offset; + base = get_inner_reference (ref, &bitsize, &bitpos, &offset, &mode, + &unsignedp, &reversep, &volatilep); + if (!base || !multiple_p (bitpos, BITS_PER_UNIT, &bytepos)) + return NULL_TREE; + + /* Restrict the alignment to that guaranteed by the offsets. */ + unsigned int bytepos_alignment = known_alignment (bytepos); + if (bytepos_alignment != 0) + alignment = MIN (alignment, bytepos_alignment); + if (offset) + { + unsigned int offset_alignment = highest_pow2_factor (offset); + alignment = MIN (alignment, offset_alignment); + } + + *alignment_out = alignment; + return base; +} + +/* Return the object whose alignment would need to be changed in order + to increase the alignment of ADDR. Store the maximum achievable + alignment in *MAX_ALIGNMENT. */ + +tree +get_base_for_alignment (tree addr, unsigned int *max_alignment) +{ + tree base = get_base_for_alignment_1 (addr, max_alignment); + if (base) + return base; + + if (TREE_CODE (addr) == ADDR_EXPR) + addr = TREE_OPERAND (addr, 0); + *max_alignment = MAX_OFILE_ALIGNMENT / BITS_PER_UNIT; + return addr; +} + /* Recursive helper function. */ static bool Index: gcc/tree-vect-data-refs.c =================================================================== --- gcc/tree-vect-data-refs.c 2018-03-18 10:24:43.602669700 +0000 +++ gcc/tree-vect-data-refs.c 2018-03-24 10:47:51.283625566 +0000 @@ -957,11 +957,11 @@ vect_compute_data_ref_alignment (struct if (base_alignment < vector_alignment) { - tree base = drb->base_address; - if (TREE_CODE (base) == ADDR_EXPR) - base = TREE_OPERAND (base, 0); - if (!vect_can_force_dr_alignment_p (base, - vector_alignment * BITS_PER_UNIT)) + unsigned int max_alignment; + tree base = get_base_for_alignment (drb->base_address, &max_alignment); + if (max_alignment < vector_alignment + || !vect_can_force_dr_alignment_p (base, + vector_alignment * BITS_PER_UNIT)) { if (dump_enabled_p ()) { Index: gcc/testsuite/gcc.dg/vect/bb-slp-1.c =================================================================== --- gcc/testsuite/gcc.dg/vect/bb-slp-1.c 2018-03-18 10:24:43.602669700 +0000 +++ gcc/testsuite/gcc.dg/vect/bb-slp-1.c 2018-03-24 10:47:51.280625662 +0000 @@ -54,5 +54,5 @@ int main (void) return 0; } +/* { dg-final { scan-tree-dump-not "can't force alignment" "slp1" } } */ /* { dg-final { scan-tree-dump-times "basic block vectorized" 1 "slp1" } } */ -
Index: gcc/tree-data-ref.h =================================================================== --- gcc/tree-data-ref.h 2018-01-13 18:02:00.948360274 +0000 +++ gcc/tree-data-ref.h 2018-03-17 10:43:42.544669549 +0000 @@ -463,6 +463,7 @@ extern bool compute_all_dependences (vec extern tree find_data_references_in_bb (struct loop *, basic_block, vec<data_reference_p> *); extern unsigned int dr_alignment (innermost_loop_behavior *); +extern tree get_base_for_alignment (tree, unsigned int *); /* Return the alignment in bytes that DR is guaranteed to have at all times. */ Index: gcc/tree-data-ref.c =================================================================== --- gcc/tree-data-ref.c 2018-02-14 13:14:36.268006810 +0000 +++ gcc/tree-data-ref.c 2018-03-17 10:43:42.544669549 +0000 @@ -5200,6 +5200,75 @@ dr_alignment (innermost_loop_behavior *d return alignment; } +/* If BASE is a pointer-typed SSA name, try to find the object that it + is based on. Return this object X on success and store the alignment + in bytes of BASE - &X in *ALIGNMENT_OUT. */ + +static tree +get_base_for_alignment_1 (tree base, unsigned int *alignment_out) +{ + if (TREE_CODE (base) != SSA_NAME || !POINTER_TYPE_P (TREE_TYPE (base))) + return NULL_TREE; + + gimple *def = SSA_NAME_DEF_STMT (base); + base = analyze_scalar_evolution (loop_containing_stmt (def), base); + + /* Peel chrecs and record the minimum alignment preserved by + all steps. */ + unsigned int alignment = MAX_OFILE_ALIGNMENT / BITS_PER_UNIT; + while (TREE_CODE (base) == POLYNOMIAL_CHREC) + { + unsigned int step_alignment = highest_pow2_factor (CHREC_RIGHT (base)); + alignment = MIN (alignment, step_alignment); + base = CHREC_LEFT (base); + } + + /* Punt if the expression is too complicated to handle. */ + if (tree_contains_chrecs (base, NULL) || !POINTER_TYPE_P (TREE_TYPE (base))) + return NULL_TREE; + + /* Analyze the base to which the steps we peeled were applied. */ + poly_int64 bitsize, bitpos, bytepos; + machine_mode mode; + int unsignedp, reversep, volatilep; + tree offset; + base = get_inner_reference (build_fold_indirect_ref (base), + &bitsize, &bitpos, &offset, &mode, + &unsignedp, &reversep, &volatilep); + if (!base || !multiple_p (bitpos, BITS_PER_UNIT, &bytepos)) + return NULL_TREE; + + /* Restrict the alignment to that guaranteed by the offsets. */ + unsigned int bytepos_alignment = known_alignment (bytepos); + if (bytepos_alignment != 0) + alignment = MIN (alignment, bytepos_alignment); + if (offset) + { + unsigned int offset_alignment = highest_pow2_factor (offset); + alignment = MIN (alignment, offset_alignment); + } + + *alignment_out = alignment; + return base; +} + +/* Return the object whose alignment would need to be changed in order + to increase the alignment of ADDR. Store the maximum achievable + alignment in *MAX_ALIGNMENT. */ + +tree +get_base_for_alignment (tree addr, unsigned int *max_alignment) +{ + tree base = get_base_for_alignment_1 (addr, max_alignment); + if (base) + return base; + + if (TREE_CODE (addr) == ADDR_EXPR) + addr = TREE_OPERAND (addr, 0); + *max_alignment = MAX_OFILE_ALIGNMENT / BITS_PER_UNIT; + return addr; +} + /* Recursive helper function. */ static bool Index: gcc/tree-vect-data-refs.c =================================================================== --- gcc/tree-vect-data-refs.c 2018-03-02 09:45:40.478265383 +0000 +++ gcc/tree-vect-data-refs.c 2018-03-17 10:43:42.544669549 +0000 @@ -957,11 +957,11 @@ vect_compute_data_ref_alignment (struct if (base_alignment < vector_alignment) { - tree base = drb->base_address; - if (TREE_CODE (base) == ADDR_EXPR) - base = TREE_OPERAND (base, 0); - if (!vect_can_force_dr_alignment_p (base, - vector_alignment * BITS_PER_UNIT)) + unsigned int max_alignment; + tree base = get_base_for_alignment (drb->base_address, &max_alignment); + if (max_alignment < vector_alignment + || !vect_can_force_dr_alignment_p (base, + vector_alignment * BITS_PER_UNIT)) { if (dump_enabled_p ()) { Index: gcc/testsuite/gcc.dg/vect/bb-slp-1.c =================================================================== --- gcc/testsuite/gcc.dg/vect/bb-slp-1.c 2017-11-09 15:15:06.846650880 +0000 +++ gcc/testsuite/gcc.dg/vect/bb-slp-1.c 2018-03-17 10:43:42.541669742 +0000 @@ -54,5 +54,5 @@ int main (void) return 0; } +/* { dg-final { scan-tree-dump-not "can't force alignment" "slp1" } } */ /* { dg-final { scan-tree-dump-times "basic block vectorized" 1 "slp1" } } */ -