Use SCEV information when aligning for vectorisation (PR 84005)

Message ID 871sgj0ym5.fsf@linaro.org
State New
Headers show
Series
  • Use SCEV information when aligning for vectorisation (PR 84005)
Related show

Commit Message

Richard Sandiford March 17, 2018, 10:45 a.m.
This PR is another regression caused by the removal of the simple_iv
check in dr_analyze_innermost for BB analysis.  Without splitting out
the step, we weren't able to find an underlying object whose alignment
could be increased.

As with PR81635, I think the simple_iv was only handling one special
case of something that ought to be more general.  The more general
thing here is that if the address can be analysed as a scalar
evolution, and if all updates preserve alignment N, it's possible
to align the address to N by increasing the alignment of the base
object to N.  That applies also to outer loops, and to both loop
and BB analysis.

I wasn't sure where the new functions ought to live, but tree-data-ref.c
seemed OK since (a) that already does scev analysis on addresses and
(b) you'd want to use dr_analyze_innermost first if you were analysing
a reference.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


2018-03-17  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	PR tree-optimization/84005
	* tree-data-ref.h (get_base_for_alignment): Declare.
	* tree-data-ref.c (get_base_for_alignment_1): New function.
	(get_base_for_alignment): Likewise.
	* tree-vect-data-refs.c (vect_compute_data_ref_alignment): Use
	get_base_for_alignment to find a suitable base object, instead
	of always using drb->base_address.

gcc/testsuite/
	PR tree-optimization/84005
	* gcc.dg/vect/bb-slp-1.c: Make sure there is no message about
	failing to force the alignment.

Comments

Richard Biener March 19, 2018, 2:38 p.m. | #1
On Sat, Mar 17, 2018 at 11:45 AM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> This PR is another regression caused by the removal of the simple_iv

> check in dr_analyze_innermost for BB analysis.  Without splitting out

> the step, we weren't able to find an underlying object whose alignment

> could be increased.

>

> As with PR81635, I think the simple_iv was only handling one special

> case of something that ought to be more general.  The more general

> thing here is that if the address can be analysed as a scalar

> evolution, and if all updates preserve alignment N, it's possible

> to align the address to N by increasing the alignment of the base

> object to N.  That applies also to outer loops, and to both loop

> and BB analysis.

>

> I wasn't sure where the new functions ought to live, but tree-data-ref.c

> seemed OK since (a) that already does scev analysis on addresses and

> (b) you'd want to use dr_analyze_innermost first if you were analysing

> a reference.

>

> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

>

> Richard

>

>

> 2018-03-17  Richard Sandiford  <richard.sandiford@linaro.org>

>

> gcc/

>         PR tree-optimization/84005

>         * tree-data-ref.h (get_base_for_alignment): Declare.

>         * tree-data-ref.c (get_base_for_alignment_1): New function.

>         (get_base_for_alignment): Likewise.

>         * tree-vect-data-refs.c (vect_compute_data_ref_alignment): Use

>         get_base_for_alignment to find a suitable base object, instead

>         of always using drb->base_address.

>

> gcc/testsuite/

>         PR tree-optimization/84005

>         * gcc.dg/vect/bb-slp-1.c: Make sure there is no message about

>         failing to force the alignment.

>

> Index: gcc/tree-data-ref.h

> ===================================================================

> --- gcc/tree-data-ref.h 2018-01-13 18:02:00.948360274 +0000

> +++ gcc/tree-data-ref.h 2018-03-17 10:43:42.544669549 +0000

> @@ -463,6 +463,7 @@ extern bool compute_all_dependences (vec

>  extern tree find_data_references_in_bb (struct loop *, basic_block,

>                                          vec<data_reference_p> *);

>  extern unsigned int dr_alignment (innermost_loop_behavior *);

> +extern tree get_base_for_alignment (tree, unsigned int *);

>

>  /* Return the alignment in bytes that DR is guaranteed to have at all

>     times.  */

> Index: gcc/tree-data-ref.c

> ===================================================================

> --- gcc/tree-data-ref.c 2018-02-14 13:14:36.268006810 +0000

> +++ gcc/tree-data-ref.c 2018-03-17 10:43:42.544669549 +0000

> @@ -5200,6 +5200,75 @@ dr_alignment (innermost_loop_behavior *d

>    return alignment;

>  }

>

> +/* If BASE is a pointer-typed SSA name, try to find the object that it

> +   is based on.  Return this object X on success and store the alignment

> +   in bytes of BASE - &X in *ALIGNMENT_OUT.  */

> +

> +static tree

> +get_base_for_alignment_1 (tree base, unsigned int *alignment_out)

> +{

> +  if (TREE_CODE (base) != SSA_NAME || !POINTER_TYPE_P (TREE_TYPE (base)))

> +    return NULL_TREE;

> +

> +  gimple *def = SSA_NAME_DEF_STMT (base);

> +  base = analyze_scalar_evolution (loop_containing_stmt (def), base);


I think it is an error to use the def stmt of base here.  Instead you
need to pass down
the point/loop of the use.  For the same reason you also want to instantiate
parameters after analyzing the evolution.

In the end, if the BB to be vectorized is contained in a loop nest we want to
instantiate up to the point where (eventually) a DECL we can re-align appears,
but using the containing loop for now looks OK.

> +  /* Peel chrecs and record the minimum alignment preserved by

> +     all steps.  */

> +  unsigned int alignment = MAX_OFILE_ALIGNMENT / BITS_PER_UNIT;

> +  while (TREE_CODE (base) == POLYNOMIAL_CHREC)

> +    {

> +      unsigned int step_alignment = highest_pow2_factor (CHREC_RIGHT (base));

> +      alignment = MIN (alignment, step_alignment);

> +      base = CHREC_LEFT (base);

> +    }

> +

> +  /* Punt if the expression is too complicated to handle.  */

> +  if (tree_contains_chrecs (base, NULL) || !POINTER_TYPE_P (TREE_TYPE (base)))

> +    return NULL_TREE;

> +

> +  /* Analyze the base to which the steps we peeled were applied.  */

> +  poly_int64 bitsize, bitpos, bytepos;

> +  machine_mode mode;

> +  int unsignedp, reversep, volatilep;

> +  tree offset;

> +  base = get_inner_reference (build_fold_indirect_ref (base),


ick :/

what happens if you simply use get_pointer_alignment here and
strip down POINTER_PLUS_EXPRs to the ultimate LHS?  (basically
what get_pointer_alignment_1 does)  After all get_base_for_alignment_1
itself only deals with plain SSA_NAMEs, not, say, &a_1->b.c or &MEM[a_1 + 4].

Thanks,
Richard.

> +                             &bitsize, &bitpos, &offset, &mode,

> +                             &unsignedp, &reversep, &volatilep);

> +  if (!base || !multiple_p (bitpos, BITS_PER_UNIT, &bytepos))

> +    return NULL_TREE;

> +

> +  /* Restrict the alignment to that guaranteed by the offsets.  */

> +  unsigned int bytepos_alignment = known_alignment (bytepos);

> +  if (bytepos_alignment != 0)

> +    alignment = MIN (alignment, bytepos_alignment);

> +  if (offset)

> +    {

> +      unsigned int offset_alignment = highest_pow2_factor (offset);

> +      alignment = MIN (alignment, offset_alignment);

> +    }

> +

> +  *alignment_out = alignment;

> +  return base;

> +}

> +

> +/* Return the object whose alignment would need to be changed in order

> +   to increase the alignment of ADDR.  Store the maximum achievable

> +   alignment in *MAX_ALIGNMENT.  */

> +

> +tree

> +get_base_for_alignment (tree addr, unsigned int *max_alignment)

> +{

> +  tree base = get_base_for_alignment_1 (addr, max_alignment);

> +  if (base)

> +    return base;

> +

> +  if (TREE_CODE (addr) == ADDR_EXPR)

> +    addr = TREE_OPERAND (addr, 0);

> +  *max_alignment = MAX_OFILE_ALIGNMENT / BITS_PER_UNIT;

> +  return addr;

> +}

> +

>  /* Recursive helper function.  */

>

>  static bool

> Index: gcc/tree-vect-data-refs.c

> ===================================================================

> --- gcc/tree-vect-data-refs.c   2018-03-02 09:45:40.478265383 +0000

> +++ gcc/tree-vect-data-refs.c   2018-03-17 10:43:42.544669549 +0000

> @@ -957,11 +957,11 @@ vect_compute_data_ref_alignment (struct

>

>    if (base_alignment < vector_alignment)

>      {

> -      tree base = drb->base_address;

> -      if (TREE_CODE (base) == ADDR_EXPR)

> -       base = TREE_OPERAND (base, 0);

> -      if (!vect_can_force_dr_alignment_p (base,

> -                                         vector_alignment * BITS_PER_UNIT))

> +      unsigned int max_alignment;

> +      tree base = get_base_for_alignment (drb->base_address, &max_alignment);

> +      if (max_alignment < vector_alignment

> +         || !vect_can_force_dr_alignment_p (base,

> +                                            vector_alignment * BITS_PER_UNIT))

>         {

>           if (dump_enabled_p ())

>             {

> Index: gcc/testsuite/gcc.dg/vect/bb-slp-1.c

> ===================================================================

> --- gcc/testsuite/gcc.dg/vect/bb-slp-1.c        2017-11-09 15:15:06.846650880 +0000

> +++ gcc/testsuite/gcc.dg/vect/bb-slp-1.c        2018-03-17 10:43:42.541669742 +0000

> @@ -54,5 +54,5 @@ int main (void)

>    return 0;

>  }

>

> +/* { dg-final { scan-tree-dump-not "can't force alignment" "slp1" } } */

>  /* { dg-final { scan-tree-dump-times "basic block vectorized" 1 "slp1" } } */

> -
Richard Sandiford March 19, 2018, 9:27 p.m. | #2
Richard Biener <richard.guenther@gmail.com> writes:
> On Sat, Mar 17, 2018 at 11:45 AM, Richard Sandiford

> <richard.sandiford@linaro.org> wrote:

>> Index: gcc/tree-data-ref.c

>> ===================================================================

>> --- gcc/tree-data-ref.c 2018-02-14 13:14:36.268006810 +0000

>> +++ gcc/tree-data-ref.c 2018-03-17 10:43:42.544669549 +0000

>> @@ -5200,6 +5200,75 @@ dr_alignment (innermost_loop_behavior *d

>>    return alignment;

>>  }

>>

>> +/* If BASE is a pointer-typed SSA name, try to find the object that it

>> +   is based on.  Return this object X on success and store the alignment

>> +   in bytes of BASE - &X in *ALIGNMENT_OUT.  */

>> +

>> +static tree

>> +get_base_for_alignment_1 (tree base, unsigned int *alignment_out)

>> +{

>> +  if (TREE_CODE (base) != SSA_NAME || !POINTER_TYPE_P (TREE_TYPE (base)))

>> +    return NULL_TREE;

>> +

>> +  gimple *def = SSA_NAME_DEF_STMT (base);

>> +  base = analyze_scalar_evolution (loop_containing_stmt (def), base);

>

> I think it is an error to use the def stmt of base here.  Instead you

> need to pass down the point/loop of the use.  For the same reason you

> also want to instantiate parameters after analyzing the evolution.

>

> In the end, if the BB to be vectorized is contained in a loop nest we want to

> instantiate up to the point where (eventually) a DECL we can re-align appears,

> but using the containing loop for now looks OK.


Why's that necessary/better though?  We're not interested in the
evolution of the value at the point it*s used by the potential vector
code, but in how we get from the ultimate base (if there is one) to the
definition of the SSA name.

I don't think instantiating the SCEV would give any new information,
but it could lose some.  E.g. if we have:

  x = &foo;
  do
    x += 8;
  while (*y++ < 10)
  ...potential vector use of *x...

we wouldn't be able to express the address of x after the do-while
loop, because there's nothing that counts the number of iterations.
But the uninstantiated SCEV could tell us that x = &foo + N * 8 for
some (unknown) N.

(OK, so that doesn't actually work unless we take the effort
to look through loop-closed SSA form, but still...)

>> +  /* Peel chrecs and record the minimum alignment preserved by

>> +     all steps.  */

>> +  unsigned int alignment = MAX_OFILE_ALIGNMENT / BITS_PER_UNIT;

>> +  while (TREE_CODE (base) == POLYNOMIAL_CHREC)

>> +    {

>> +      unsigned int step_alignment = highest_pow2_factor (CHREC_RIGHT (base));

>> +      alignment = MIN (alignment, step_alignment);

>> +      base = CHREC_LEFT (base);

>> +    }

>> +

>> +  /* Punt if the expression is too complicated to handle.  */

>> + if (tree_contains_chrecs (base, NULL) || !POINTER_TYPE_P (TREE_TYPE

>> (base)))

>> +    return NULL_TREE;

>> +

>> +  /* Analyze the base to which the steps we peeled were applied.  */

>> +  poly_int64 bitsize, bitpos, bytepos;

>> +  machine_mode mode;

>> +  int unsignedp, reversep, volatilep;

>> +  tree offset;

>> +  base = get_inner_reference (build_fold_indirect_ref (base),

>

> ick :/

>

> what happens if you simply use get_pointer_alignment here and

> strip down POINTER_PLUS_EXPRs to the ultimate LHS?  (basically

> what get_pointer_alignment_1 does)  After all get_base_for_alignment_1

> itself only deals with plain SSA_NAMEs, not, say, &a_1->b.c or &MEM[a_1 + 4].


Yeah, but those have (hopefully) been handled by dr_analyze_innermost
already, which should have stripped off both the constant and variable
parts of the offset.  So I think SSA names are the only interesting
input.  The problem is that once we follow the definitions of an SSA
name through CHREC_LEFTs, we get a general address again.

get_pointer_alignment and get_pointer_alignment_1 don't do what we want
because they take the alignment of the existing object into account,
whereas here we explicitly want to ignore that and only calculate the
alignment of the offset.

I guess we could pass a flag down to ignore the current alignment though?

Thanks,
Richard
Richard Biener March 20, 2018, 1:53 p.m. | #3
On Mon, Mar 19, 2018 at 10:27 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Richard Biener <richard.guenther@gmail.com> writes:

>> On Sat, Mar 17, 2018 at 11:45 AM, Richard Sandiford

>> <richard.sandiford@linaro.org> wrote:

>>> Index: gcc/tree-data-ref.c

>>> ===================================================================

>>> --- gcc/tree-data-ref.c 2018-02-14 13:14:36.268006810 +0000

>>> +++ gcc/tree-data-ref.c 2018-03-17 10:43:42.544669549 +0000

>>> @@ -5200,6 +5200,75 @@ dr_alignment (innermost_loop_behavior *d

>>>    return alignment;

>>>  }

>>>

>>> +/* If BASE is a pointer-typed SSA name, try to find the object that it

>>> +   is based on.  Return this object X on success and store the alignment

>>> +   in bytes of BASE - &X in *ALIGNMENT_OUT.  */

>>> +

>>> +static tree

>>> +get_base_for_alignment_1 (tree base, unsigned int *alignment_out)

>>> +{

>>> +  if (TREE_CODE (base) != SSA_NAME || !POINTER_TYPE_P (TREE_TYPE (base)))

>>> +    return NULL_TREE;

>>> +

>>> +  gimple *def = SSA_NAME_DEF_STMT (base);

>>> +  base = analyze_scalar_evolution (loop_containing_stmt (def), base);

>>

>> I think it is an error to use the def stmt of base here.  Instead you

>> need to pass down the point/loop of the use.  For the same reason you

>> also want to instantiate parameters after analyzing the evolution.

>>

>> In the end, if the BB to be vectorized is contained in a loop nest we want to

>> instantiate up to the point where (eventually) a DECL we can re-align appears,

>> but using the containing loop for now looks OK.

>

> Why's that necessary/better though?  We're not interested in the

> evolution of the value at the point it*s used by the potential vector

> code, but in how we get from the ultimate base (if there is one) to the

> definition of the SSA name.


Hmm, ok.

> I don't think instantiating the SCEV would give any new information,

> but it could lose some.  E.g. if we have:

>

>   x = &foo;

>   do

>     x += 8;

>   while (*y++ < 10)

>   ...potential vector use of *x...

>

> we wouldn't be able to express the address of x after the do-while

> loop, because there's nothing that counts the number of iterations.

> But the uninstantiated SCEV could tell us that x = &foo + N * 8 for

> some (unknown) N.


Not sure that it works that way.  I'm not 100% sure what kind of
parameters are left symbolic, and if analysis loop and instantiation
"loop" are the same I guess it doesn't make any difference...

> (OK, so that doesn't actually work unless we take the effort

> to look through loop-closed SSA form, but still...)

>

>>> +  /* Peel chrecs and record the minimum alignment preserved by

>>> +     all steps.  */

>>> +  unsigned int alignment = MAX_OFILE_ALIGNMENT / BITS_PER_UNIT;

>>> +  while (TREE_CODE (base) == POLYNOMIAL_CHREC)

>>> +    {

>>> +      unsigned int step_alignment = highest_pow2_factor (CHREC_RIGHT (base));

>>> +      alignment = MIN (alignment, step_alignment);

>>> +      base = CHREC_LEFT (base);

>>> +    }

>>> +

>>> +  /* Punt if the expression is too complicated to handle.  */

>>> + if (tree_contains_chrecs (base, NULL) || !POINTER_TYPE_P (TREE_TYPE

>>> (base)))

>>> +    return NULL_TREE;

>>> +

>>> +  /* Analyze the base to which the steps we peeled were applied.  */

>>> +  poly_int64 bitsize, bitpos, bytepos;

>>> +  machine_mode mode;

>>> +  int unsignedp, reversep, volatilep;

>>> +  tree offset;

>>> +  base = get_inner_reference (build_fold_indirect_ref (base),

>>

>> ick :/

>>

>> what happens if you simply use get_pointer_alignment here and

>> strip down POINTER_PLUS_EXPRs to the ultimate LHS?  (basically

>> what get_pointer_alignment_1 does)  After all get_base_for_alignment_1

>> itself only deals with plain SSA_NAMEs, not, say, &a_1->b.c or &MEM[a_1 + 4].

>

> Yeah, but those have (hopefully) been handled by dr_analyze_innermost

> already, which should have stripped off both the constant and variable

> parts of the offset.  So I think SSA names are the only interesting

> input.  The problem is that once we follow the definitions of an SSA

> name through CHREC_LEFTs, we get a general address again.

>

> get_pointer_alignment and get_pointer_alignment_1 don't do what we want

> because they take the alignment of the existing object into account,

> whereas here we explicitly want to ignore that and only calculate the

> alignment of the offset.


Ah, indeed - I missed that fact.

> I guess we could pass a flag down to ignore the current alignment though?


But we need the base anyway...  so, given we can only ever re-align decls
and never plain pointers instead of using build_fold_indirect_ref do

 if (TREE_CODE (base) != ADDR_EXPR)
   return NULL_TREE;

else use TREE_OPERAND (base, 0)?

Richard.

>

> Thanks,

> Richard
Richard Sandiford March 20, 2018, 3:26 p.m. | #4
Richard Biener <richard.guenther@gmail.com> writes:
> On Mon, Mar 19, 2018 at 10:27 PM, Richard Sandiford

> <richard.sandiford@linaro.org> wrote:

>> Richard Biener <richard.guenther@gmail.com> writes:

>>> On Sat, Mar 17, 2018 at 11:45 AM, Richard Sandiford

>>> <richard.sandiford@linaro.org> wrote:

>>>> Index: gcc/tree-data-ref.c

>>>> ===================================================================

>>>> --- gcc/tree-data-ref.c 2018-02-14 13:14:36.268006810 +0000

>>>> +++ gcc/tree-data-ref.c 2018-03-17 10:43:42.544669549 +0000

>>>> @@ -5200,6 +5200,75 @@ dr_alignment (innermost_loop_behavior *d

>>>>    return alignment;

>>>>  }

>>>>

>>>> +/* If BASE is a pointer-typed SSA name, try to find the object that it

>>>> +   is based on.  Return this object X on success and store the alignment

>>>> +   in bytes of BASE - &X in *ALIGNMENT_OUT.  */

>>>> +

>>>> +static tree

>>>> +get_base_for_alignment_1 (tree base, unsigned int *alignment_out)

>>>> +{

>>>> +  if (TREE_CODE (base) != SSA_NAME || !POINTER_TYPE_P (TREE_TYPE (base)))

>>>> +    return NULL_TREE;

>>>> +

>>>> +  gimple *def = SSA_NAME_DEF_STMT (base);

>>>> +  base = analyze_scalar_evolution (loop_containing_stmt (def), base);

>>>

>>> I think it is an error to use the def stmt of base here.  Instead you

>>> need to pass down the point/loop of the use.  For the same reason you

>>> also want to instantiate parameters after analyzing the evolution.

>>>

>>> In the end, if the BB to be vectorized is contained in a loop nest we want to

>>> instantiate up to the point where (eventually) a DECL we can re-align

>>> appears,

>>> but using the containing loop for now looks OK.

>>

>> Why's that necessary/better though?  We're not interested in the

>> evolution of the value at the point it*s used by the potential vector

>> code, but in how we get from the ultimate base (if there is one) to the

>> definition of the SSA name.

>

> Hmm, ok.

>

>> I don't think instantiating the SCEV would give any new information,

>> but it could lose some.  E.g. if we have:

>>

>>   x = &foo;

>>   do

>>     x += 8;

>>   while (*y++ < 10)

>>   ...potential vector use of *x...

>>

>> we wouldn't be able to express the address of x after the do-while

>> loop, because there's nothing that counts the number of iterations.

>> But the uninstantiated SCEV could tell us that x = &foo + N * 8 for

>> some (unknown) N.

>

> Not sure that it works that way.  I'm not 100% sure what kind of

> parameters are left symbolic, and if analysis loop and instantiation

> "loop" are the same I guess it doesn't make any difference...

>

>> (OK, so that doesn't actually work unless we take the effort

>> to look through loop-closed SSA form, but still...)

>>

>>>> +  /* Peel chrecs and record the minimum alignment preserved by

>>>> +     all steps.  */

>>>> +  unsigned int alignment = MAX_OFILE_ALIGNMENT / BITS_PER_UNIT;

>>>> +  while (TREE_CODE (base) == POLYNOMIAL_CHREC)

>>>> +    {

>>>> + unsigned int step_alignment = highest_pow2_factor (CHREC_RIGHT

>>>> (base));

>>>> +      alignment = MIN (alignment, step_alignment);

>>>> +      base = CHREC_LEFT (base);

>>>> +    }

>>>> +

>>>> +  /* Punt if the expression is too complicated to handle.  */

>>>> + if (tree_contains_chrecs (base, NULL) || !POINTER_TYPE_P (TREE_TYPE

>>>> (base)))

>>>> +    return NULL_TREE;

>>>> +

>>>> +  /* Analyze the base to which the steps we peeled were applied.  */

>>>> +  poly_int64 bitsize, bitpos, bytepos;

>>>> +  machine_mode mode;

>>>> +  int unsignedp, reversep, volatilep;

>>>> +  tree offset;

>>>> +  base = get_inner_reference (build_fold_indirect_ref (base),

>>>

>>> ick :/

>>>

>>> what happens if you simply use get_pointer_alignment here and

>>> strip down POINTER_PLUS_EXPRs to the ultimate LHS?  (basically

>>> what get_pointer_alignment_1 does)  After all get_base_for_alignment_1

>>> itself only deals with plain SSA_NAMEs, not, say, &a_1->b.c or &MEM[a_1 + 4].

>>

>> Yeah, but those have (hopefully) been handled by dr_analyze_innermost

>> already, which should have stripped off both the constant and variable

>> parts of the offset.  So I think SSA names are the only interesting

>> input.  The problem is that once we follow the definitions of an SSA

>> name through CHREC_LEFTs, we get a general address again.

>>

>> get_pointer_alignment and get_pointer_alignment_1 don't do what we want

>> because they take the alignment of the existing object into account,

>> whereas here we explicitly want to ignore that and only calculate the

>> alignment of the offset.

>

> Ah, indeed - I missed that fact.

>

>> I guess we could pass a flag down to ignore the current alignment though?

>

> But we need the base anyway...  so, given we can only ever re-align decls

> and never plain pointers instead of using build_fold_indirect_ref do

>

>  if (TREE_CODE (base) != ADDR_EXPR)

>    return NULL_TREE;

>

> else use TREE_OPERAND (base, 0)?


The build_fold_indirect_ref also helps for POINTER_PLUS_EXPR,
which is what we get for things like the do-while above (e.g.
{ &a + 64, +, 64 }_n if x points to a double *.)

I guess everything we care about would be handled by fold_indirect_ref_1
though, so would that be OK instead?

Thanks,
Richard
Richard Biener March 21, 2018, 10:38 a.m. | #5
On Tue, Mar 20, 2018 at 4:26 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Richard Biener <richard.guenther@gmail.com> writes:

>> On Mon, Mar 19, 2018 at 10:27 PM, Richard Sandiford

>> <richard.sandiford@linaro.org> wrote:

>>> Richard Biener <richard.guenther@gmail.com> writes:

>>>> On Sat, Mar 17, 2018 at 11:45 AM, Richard Sandiford

>>>> <richard.sandiford@linaro.org> wrote:

>>>>> Index: gcc/tree-data-ref.c

>>>>> ===================================================================

>>>>> --- gcc/tree-data-ref.c 2018-02-14 13:14:36.268006810 +0000

>>>>> +++ gcc/tree-data-ref.c 2018-03-17 10:43:42.544669549 +0000

>>>>> @@ -5200,6 +5200,75 @@ dr_alignment (innermost_loop_behavior *d

>>>>>    return alignment;

>>>>>  }

>>>>>

>>>>> +/* If BASE is a pointer-typed SSA name, try to find the object that it

>>>>> +   is based on.  Return this object X on success and store the alignment

>>>>> +   in bytes of BASE - &X in *ALIGNMENT_OUT.  */

>>>>> +

>>>>> +static tree

>>>>> +get_base_for_alignment_1 (tree base, unsigned int *alignment_out)

>>>>> +{

>>>>> +  if (TREE_CODE (base) != SSA_NAME || !POINTER_TYPE_P (TREE_TYPE (base)))

>>>>> +    return NULL_TREE;

>>>>> +

>>>>> +  gimple *def = SSA_NAME_DEF_STMT (base);

>>>>> +  base = analyze_scalar_evolution (loop_containing_stmt (def), base);

>>>>

>>>> I think it is an error to use the def stmt of base here.  Instead you

>>>> need to pass down the point/loop of the use.  For the same reason you

>>>> also want to instantiate parameters after analyzing the evolution.

>>>>

>>>> In the end, if the BB to be vectorized is contained in a loop nest we want to

>>>> instantiate up to the point where (eventually) a DECL we can re-align

>>>> appears,

>>>> but using the containing loop for now looks OK.

>>>

>>> Why's that necessary/better though?  We're not interested in the

>>> evolution of the value at the point it*s used by the potential vector

>>> code, but in how we get from the ultimate base (if there is one) to the

>>> definition of the SSA name.

>>

>> Hmm, ok.

>>

>>> I don't think instantiating the SCEV would give any new information,

>>> but it could lose some.  E.g. if we have:

>>>

>>>   x = &foo;

>>>   do

>>>     x += 8;

>>>   while (*y++ < 10)

>>>   ...potential vector use of *x...

>>>

>>> we wouldn't be able to express the address of x after the do-while

>>> loop, because there's nothing that counts the number of iterations.

>>> But the uninstantiated SCEV could tell us that x = &foo + N * 8 for

>>> some (unknown) N.

>>

>> Not sure that it works that way.  I'm not 100% sure what kind of

>> parameters are left symbolic, and if analysis loop and instantiation

>> "loop" are the same I guess it doesn't make any difference...

>>

>>> (OK, so that doesn't actually work unless we take the effort

>>> to look through loop-closed SSA form, but still...)

>>>

>>>>> +  /* Peel chrecs and record the minimum alignment preserved by

>>>>> +     all steps.  */

>>>>> +  unsigned int alignment = MAX_OFILE_ALIGNMENT / BITS_PER_UNIT;

>>>>> +  while (TREE_CODE (base) == POLYNOMIAL_CHREC)

>>>>> +    {

>>>>> + unsigned int step_alignment = highest_pow2_factor (CHREC_RIGHT

>>>>> (base));

>>>>> +      alignment = MIN (alignment, step_alignment);

>>>>> +      base = CHREC_LEFT (base);

>>>>> +    }

>>>>> +

>>>>> +  /* Punt if the expression is too complicated to handle.  */

>>>>> + if (tree_contains_chrecs (base, NULL) || !POINTER_TYPE_P (TREE_TYPE

>>>>> (base)))

>>>>> +    return NULL_TREE;

>>>>> +

>>>>> +  /* Analyze the base to which the steps we peeled were applied.  */

>>>>> +  poly_int64 bitsize, bitpos, bytepos;

>>>>> +  machine_mode mode;

>>>>> +  int unsignedp, reversep, volatilep;

>>>>> +  tree offset;

>>>>> +  base = get_inner_reference (build_fold_indirect_ref (base),

>>>>

>>>> ick :/

>>>>

>>>> what happens if you simply use get_pointer_alignment here and

>>>> strip down POINTER_PLUS_EXPRs to the ultimate LHS?  (basically

>>>> what get_pointer_alignment_1 does)  After all get_base_for_alignment_1

>>>> itself only deals with plain SSA_NAMEs, not, say, &a_1->b.c or &MEM[a_1 + 4].

>>>

>>> Yeah, but those have (hopefully) been handled by dr_analyze_innermost

>>> already, which should have stripped off both the constant and variable

>>> parts of the offset.  So I think SSA names are the only interesting

>>> input.  The problem is that once we follow the definitions of an SSA

>>> name through CHREC_LEFTs, we get a general address again.

>>>

>>> get_pointer_alignment and get_pointer_alignment_1 don't do what we want

>>> because they take the alignment of the existing object into account,

>>> whereas here we explicitly want to ignore that and only calculate the

>>> alignment of the offset.

>>

>> Ah, indeed - I missed that fact.

>>

>>> I guess we could pass a flag down to ignore the current alignment though?

>>

>> But we need the base anyway...  so, given we can only ever re-align decls

>> and never plain pointers instead of using build_fold_indirect_ref do

>>

>>  if (TREE_CODE (base) != ADDR_EXPR)

>>    return NULL_TREE;

>>

>> else use TREE_OPERAND (base, 0)?

>

> The build_fold_indirect_ref also helps for POINTER_PLUS_EXPR,

> which is what we get for things like the do-while above (e.g.

> { &a + 64, +, 64 }_n if x points to a double *.)

>

> I guess everything we care about would be handled by fold_indirect_ref_1

> though, so would that be OK instead?


Sounds like a compromise ;)

Patch is ok with that change.

Thanks,
Richard.

> Thanks,

> Richard
Richard Sandiford March 24, 2018, 10:52 a.m. | #6
Richard Biener <richard.guenther@gmail.com> writes:
> On Tue, Mar 20, 2018 at 4:26 PM, Richard Sandiford

> <richard.sandiford@linaro.org> wrote:

>> Richard Biener <richard.guenther@gmail.com> writes:

>>> On Mon, Mar 19, 2018 at 10:27 PM, Richard Sandiford

>>> <richard.sandiford@linaro.org> wrote:

>>>> Richard Biener <richard.guenther@gmail.com> writes:

>>>>> On Sat, Mar 17, 2018 at 11:45 AM, Richard Sandiford

>>>>> <richard.sandiford@linaro.org> wrote:

>>>>>> Index: gcc/tree-data-ref.c

>>>>>> ===================================================================

>>>>>> --- gcc/tree-data-ref.c 2018-02-14 13:14:36.268006810 +0000

>>>>>> +++ gcc/tree-data-ref.c 2018-03-17 10:43:42.544669549 +0000

>>>>>> @@ -5200,6 +5200,75 @@ dr_alignment (innermost_loop_behavior *d

>>>>>>    return alignment;

>>>>>>  }

>>>>>>

>>>>>> +/* If BASE is a pointer-typed SSA name, try to find the object that it

>>>>>> +   is based on.  Return this object X on success and store the alignment

>>>>>> +   in bytes of BASE - &X in *ALIGNMENT_OUT.  */

>>>>>> +

>>>>>> +static tree

>>>>>> +get_base_for_alignment_1 (tree base, unsigned int *alignment_out)

>>>>>> +{

>>>>>> +  if (TREE_CODE (base) != SSA_NAME || !POINTER_TYPE_P (TREE_TYPE (base)))

>>>>>> +    return NULL_TREE;

>>>>>> +

>>>>>> +  gimple *def = SSA_NAME_DEF_STMT (base);

>>>>>> +  base = analyze_scalar_evolution (loop_containing_stmt (def), base);

>>>>>

>>>>> I think it is an error to use the def stmt of base here.  Instead you

>>>>> need to pass down the point/loop of the use.  For the same reason you

>>>>> also want to instantiate parameters after analyzing the evolution.

>>>>>

>>>>> In the end, if the BB to be vectorized is contained in a loop nest

>>>>> we want to

>>>>> instantiate up to the point where (eventually) a DECL we can re-align

>>>>> appears,

>>>>> but using the containing loop for now looks OK.

>>>>

>>>> Why's that necessary/better though?  We're not interested in the

>>>> evolution of the value at the point it*s used by the potential vector

>>>> code, but in how we get from the ultimate base (if there is one) to the

>>>> definition of the SSA name.

>>>

>>> Hmm, ok.

>>>

>>>> I don't think instantiating the SCEV would give any new information,

>>>> but it could lose some.  E.g. if we have:

>>>>

>>>>   x = &foo;

>>>>   do

>>>>     x += 8;

>>>>   while (*y++ < 10)

>>>>   ...potential vector use of *x...

>>>>

>>>> we wouldn't be able to express the address of x after the do-while

>>>> loop, because there's nothing that counts the number of iterations.

>>>> But the uninstantiated SCEV could tell us that x = &foo + N * 8 for

>>>> some (unknown) N.

>>>

>>> Not sure that it works that way.  I'm not 100% sure what kind of

>>> parameters are left symbolic, and if analysis loop and instantiation

>>> "loop" are the same I guess it doesn't make any difference...

>>>

>>>> (OK, so that doesn't actually work unless we take the effort

>>>> to look through loop-closed SSA form, but still...)

>>>>

>>>>>> +  /* Peel chrecs and record the minimum alignment preserved by

>>>>>> +     all steps.  */

>>>>>> +  unsigned int alignment = MAX_OFILE_ALIGNMENT / BITS_PER_UNIT;

>>>>>> +  while (TREE_CODE (base) == POLYNOMIAL_CHREC)

>>>>>> +    {

>>>>>> + unsigned int step_alignment = highest_pow2_factor (CHREC_RIGHT

>>>>>> (base));

>>>>>> +      alignment = MIN (alignment, step_alignment);

>>>>>> +      base = CHREC_LEFT (base);

>>>>>> +    }

>>>>>> +

>>>>>> +  /* Punt if the expression is too complicated to handle.  */

>>>>>> + if (tree_contains_chrecs (base, NULL) || !POINTER_TYPE_P (TREE_TYPE

>>>>>> (base)))

>>>>>> +    return NULL_TREE;

>>>>>> +

>>>>>> +  /* Analyze the base to which the steps we peeled were applied.  */

>>>>>> +  poly_int64 bitsize, bitpos, bytepos;

>>>>>> +  machine_mode mode;

>>>>>> +  int unsignedp, reversep, volatilep;

>>>>>> +  tree offset;

>>>>>> +  base = get_inner_reference (build_fold_indirect_ref (base),

>>>>>

>>>>> ick :/

>>>>>

>>>>> what happens if you simply use get_pointer_alignment here and

>>>>> strip down POINTER_PLUS_EXPRs to the ultimate LHS?  (basically

>>>>> what get_pointer_alignment_1 does)  After all get_base_for_alignment_1

>>>>> itself only deals with plain SSA_NAMEs, not, say, &a_1->b.c or

>>>>> &MEM[a_1 + 4].

>>>>

>>>> Yeah, but those have (hopefully) been handled by dr_analyze_innermost

>>>> already, which should have stripped off both the constant and variable

>>>> parts of the offset.  So I think SSA names are the only interesting

>>>> input.  The problem is that once we follow the definitions of an SSA

>>>> name through CHREC_LEFTs, we get a general address again.

>>>>

>>>> get_pointer_alignment and get_pointer_alignment_1 don't do what we want

>>>> because they take the alignment of the existing object into account,

>>>> whereas here we explicitly want to ignore that and only calculate the

>>>> alignment of the offset.

>>>

>>> Ah, indeed - I missed that fact.

>>>

>>>> I guess we could pass a flag down to ignore the current alignment though?

>>>

>>> But we need the base anyway...  so, given we can only ever re-align decls

>>> and never plain pointers instead of using build_fold_indirect_ref do

>>>

>>>  if (TREE_CODE (base) != ADDR_EXPR)

>>>    return NULL_TREE;

>>>

>>> else use TREE_OPERAND (base, 0)?

>>

>> The build_fold_indirect_ref also helps for POINTER_PLUS_EXPR,

>> which is what we get for things like the do-while above (e.g.

>> { &a + 64, +, 64 }_n if x points to a double *.)

>>

>> I guess everything we care about would be handled by fold_indirect_ref_1

>> though, so would that be OK instead?

>

> Sounds like a compromise ;)

>

> Patch is ok with that change.


Thanks, here's what I installed.  Tested as before.

Richard


2018-03-24  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	PR tree-optimization/84005
	* tree-data-ref.h (get_base_for_alignment): Declare.
	* tree-data-ref.c (get_base_for_alignment_1): New function.
	(get_base_for_alignment): Likewise.
	* tree-vect-data-refs.c (vect_compute_data_ref_alignment): Use
	get_base_for_alignment to find a suitable base object, instead
	of always using drb->base_address.

gcc/testsuite/
	PR tree-optimization/84005
	* gcc.dg/vect/bb-slp-1.c: Make sure there is no message about
	failing to force the alignment.

Index: gcc/tree-data-ref.h
===================================================================
--- gcc/tree-data-ref.h	2018-03-18 10:24:43.602669700 +0000
+++ gcc/tree-data-ref.h	2018-03-24 10:47:51.281625630 +0000
@@ -463,6 +463,7 @@ extern bool compute_all_dependences (vec
 extern tree find_data_references_in_bb (struct loop *, basic_block,
                                         vec<data_reference_p> *);
 extern unsigned int dr_alignment (innermost_loop_behavior *);
+extern tree get_base_for_alignment (tree, unsigned int *);
 
 /* Return the alignment in bytes that DR is guaranteed to have at all
    times.  */
Index: gcc/tree-data-ref.c
===================================================================
--- gcc/tree-data-ref.c	2018-03-21 20:51:32.502280975 +0000
+++ gcc/tree-data-ref.c	2018-03-24 10:47:51.281625630 +0000
@@ -5202,6 +5202,81 @@ dr_alignment (innermost_loop_behavior *d
   return alignment;
 }
 
+/* If BASE is a pointer-typed SSA name, try to find the object that it
+   is based on.  Return this object X on success and store the alignment
+   in bytes of BASE - &X in *ALIGNMENT_OUT.  */
+
+static tree
+get_base_for_alignment_1 (tree base, unsigned int *alignment_out)
+{
+  if (TREE_CODE (base) != SSA_NAME || !POINTER_TYPE_P (TREE_TYPE (base)))
+    return NULL_TREE;
+
+  gimple *def = SSA_NAME_DEF_STMT (base);
+  base = analyze_scalar_evolution (loop_containing_stmt (def), base);
+
+  /* Peel chrecs and record the minimum alignment preserved by
+     all steps.  */
+  unsigned int alignment = MAX_OFILE_ALIGNMENT / BITS_PER_UNIT;
+  while (TREE_CODE (base) == POLYNOMIAL_CHREC)
+    {
+      unsigned int step_alignment = highest_pow2_factor (CHREC_RIGHT (base));
+      alignment = MIN (alignment, step_alignment);
+      base = CHREC_LEFT (base);
+    }
+
+  /* Punt if the expression is too complicated to handle.  */
+  if (tree_contains_chrecs (base, NULL) || !POINTER_TYPE_P (TREE_TYPE (base)))
+    return NULL_TREE;
+
+  /* The only useful cases are those for which a dereference folds to something
+     other than an INDIRECT_REF.  */
+  tree ref_type = TREE_TYPE (TREE_TYPE (base));
+  tree ref = fold_indirect_ref_1 (UNKNOWN_LOCATION, ref_type, base);
+  if (!ref)
+    return NULL_TREE;
+
+  /* Analyze the base to which the steps we peeled were applied.  */
+  poly_int64 bitsize, bitpos, bytepos;
+  machine_mode mode;
+  int unsignedp, reversep, volatilep;
+  tree offset;
+  base = get_inner_reference (ref, &bitsize, &bitpos, &offset, &mode,
+			      &unsignedp, &reversep, &volatilep);
+  if (!base || !multiple_p (bitpos, BITS_PER_UNIT, &bytepos))
+    return NULL_TREE;
+
+  /* Restrict the alignment to that guaranteed by the offsets.  */
+  unsigned int bytepos_alignment = known_alignment (bytepos);
+  if (bytepos_alignment != 0)
+    alignment = MIN (alignment, bytepos_alignment);
+  if (offset)
+    {
+      unsigned int offset_alignment = highest_pow2_factor (offset);
+      alignment = MIN (alignment, offset_alignment);
+    }
+
+  *alignment_out = alignment;
+  return base;
+}
+
+/* Return the object whose alignment would need to be changed in order
+   to increase the alignment of ADDR.  Store the maximum achievable
+   alignment in *MAX_ALIGNMENT.  */
+
+tree
+get_base_for_alignment (tree addr, unsigned int *max_alignment)
+{
+  tree base = get_base_for_alignment_1 (addr, max_alignment);
+  if (base)
+    return base;
+
+  if (TREE_CODE (addr) == ADDR_EXPR)
+    addr = TREE_OPERAND (addr, 0);
+  *max_alignment = MAX_OFILE_ALIGNMENT / BITS_PER_UNIT;
+  return addr;
+}
+
 /* Recursive helper function.  */
 
 static bool
Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c	2018-03-18 10:24:43.602669700 +0000
+++ gcc/tree-vect-data-refs.c	2018-03-24 10:47:51.283625566 +0000
@@ -957,11 +957,11 @@ vect_compute_data_ref_alignment (struct
 
   if (base_alignment < vector_alignment)
     {
-      tree base = drb->base_address;
-      if (TREE_CODE (base) == ADDR_EXPR)
-	base = TREE_OPERAND (base, 0);
-      if (!vect_can_force_dr_alignment_p (base,
-					  vector_alignment * BITS_PER_UNIT))
+      unsigned int max_alignment;
+      tree base = get_base_for_alignment (drb->base_address, &max_alignment);
+      if (max_alignment < vector_alignment
+	  || !vect_can_force_dr_alignment_p (base,
+					     vector_alignment * BITS_PER_UNIT))
 	{
 	  if (dump_enabled_p ())
 	    {
Index: gcc/testsuite/gcc.dg/vect/bb-slp-1.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/bb-slp-1.c	2018-03-18 10:24:43.602669700 +0000
+++ gcc/testsuite/gcc.dg/vect/bb-slp-1.c	2018-03-24 10:47:51.280625662 +0000
@@ -54,5 +54,5 @@ int main (void)
   return 0;
 }
 
+/* { dg-final { scan-tree-dump-not "can't force alignment" "slp1" } } */
 /* { dg-final { scan-tree-dump-times "basic block vectorized" 1 "slp1" } } */
-

Patch

Index: gcc/tree-data-ref.h
===================================================================
--- gcc/tree-data-ref.h	2018-01-13 18:02:00.948360274 +0000
+++ gcc/tree-data-ref.h	2018-03-17 10:43:42.544669549 +0000
@@ -463,6 +463,7 @@  extern bool compute_all_dependences (vec
 extern tree find_data_references_in_bb (struct loop *, basic_block,
                                         vec<data_reference_p> *);
 extern unsigned int dr_alignment (innermost_loop_behavior *);
+extern tree get_base_for_alignment (tree, unsigned int *);
 
 /* Return the alignment in bytes that DR is guaranteed to have at all
    times.  */
Index: gcc/tree-data-ref.c
===================================================================
--- gcc/tree-data-ref.c	2018-02-14 13:14:36.268006810 +0000
+++ gcc/tree-data-ref.c	2018-03-17 10:43:42.544669549 +0000
@@ -5200,6 +5200,75 @@  dr_alignment (innermost_loop_behavior *d
   return alignment;
 }
 
+/* If BASE is a pointer-typed SSA name, try to find the object that it
+   is based on.  Return this object X on success and store the alignment
+   in bytes of BASE - &X in *ALIGNMENT_OUT.  */
+
+static tree
+get_base_for_alignment_1 (tree base, unsigned int *alignment_out)
+{
+  if (TREE_CODE (base) != SSA_NAME || !POINTER_TYPE_P (TREE_TYPE (base)))
+    return NULL_TREE;
+
+  gimple *def = SSA_NAME_DEF_STMT (base);
+  base = analyze_scalar_evolution (loop_containing_stmt (def), base);
+
+  /* Peel chrecs and record the minimum alignment preserved by
+     all steps.  */
+  unsigned int alignment = MAX_OFILE_ALIGNMENT / BITS_PER_UNIT;
+  while (TREE_CODE (base) == POLYNOMIAL_CHREC)
+    {
+      unsigned int step_alignment = highest_pow2_factor (CHREC_RIGHT (base));
+      alignment = MIN (alignment, step_alignment);
+      base = CHREC_LEFT (base);
+    }
+
+  /* Punt if the expression is too complicated to handle.  */
+  if (tree_contains_chrecs (base, NULL) || !POINTER_TYPE_P (TREE_TYPE (base)))
+    return NULL_TREE;
+
+  /* Analyze the base to which the steps we peeled were applied.  */
+  poly_int64 bitsize, bitpos, bytepos;
+  machine_mode mode;
+  int unsignedp, reversep, volatilep;
+  tree offset;
+  base = get_inner_reference (build_fold_indirect_ref (base),
+			      &bitsize, &bitpos, &offset, &mode,
+			      &unsignedp, &reversep, &volatilep);
+  if (!base || !multiple_p (bitpos, BITS_PER_UNIT, &bytepos))
+    return NULL_TREE;
+
+  /* Restrict the alignment to that guaranteed by the offsets.  */
+  unsigned int bytepos_alignment = known_alignment (bytepos);
+  if (bytepos_alignment != 0)
+    alignment = MIN (alignment, bytepos_alignment);
+  if (offset)
+    {
+      unsigned int offset_alignment = highest_pow2_factor (offset);
+      alignment = MIN (alignment, offset_alignment);
+    }
+
+  *alignment_out = alignment;
+  return base;
+}
+
+/* Return the object whose alignment would need to be changed in order
+   to increase the alignment of ADDR.  Store the maximum achievable
+   alignment in *MAX_ALIGNMENT.  */
+
+tree
+get_base_for_alignment (tree addr, unsigned int *max_alignment)
+{
+  tree base = get_base_for_alignment_1 (addr, max_alignment);
+  if (base)
+    return base;
+
+  if (TREE_CODE (addr) == ADDR_EXPR)
+    addr = TREE_OPERAND (addr, 0);
+  *max_alignment = MAX_OFILE_ALIGNMENT / BITS_PER_UNIT;
+  return addr;
+}
+
 /* Recursive helper function.  */
 
 static bool
Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c	2018-03-02 09:45:40.478265383 +0000
+++ gcc/tree-vect-data-refs.c	2018-03-17 10:43:42.544669549 +0000
@@ -957,11 +957,11 @@  vect_compute_data_ref_alignment (struct
 
   if (base_alignment < vector_alignment)
     {
-      tree base = drb->base_address;
-      if (TREE_CODE (base) == ADDR_EXPR)
-	base = TREE_OPERAND (base, 0);
-      if (!vect_can_force_dr_alignment_p (base,
-					  vector_alignment * BITS_PER_UNIT))
+      unsigned int max_alignment;
+      tree base = get_base_for_alignment (drb->base_address, &max_alignment);
+      if (max_alignment < vector_alignment
+	  || !vect_can_force_dr_alignment_p (base,
+					     vector_alignment * BITS_PER_UNIT))
 	{
 	  if (dump_enabled_p ())
 	    {
Index: gcc/testsuite/gcc.dg/vect/bb-slp-1.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/bb-slp-1.c	2017-11-09 15:15:06.846650880 +0000
+++ gcc/testsuite/gcc.dg/vect/bb-slp-1.c	2018-03-17 10:43:42.541669742 +0000
@@ -54,5 +54,5 @@  int main (void)
   return 0;
 }
 
+/* { dg-final { scan-tree-dump-not "can't force alignment" "slp1" } } */
 /* { dg-final { scan-tree-dump-times "basic block vectorized" 1 "slp1" } } */
-