Don't vectorise zero-step rmw operations (PR 84485)

Message ID 87muzt8ac4.fsf@linaro.org
State New
Headers show
Series
  • Don't vectorise zero-step rmw operations (PR 84485)
Related show

Commit Message

Richard Sandiford Feb. 28, 2018, 2:20 p.m.
GCC 6 and 7 would vectorise:

void
f (unsigned long incx, unsigned long incy,
   float *restrict dx, float *restrict dy)
{
  unsigned long ix = 0, iy = 0;
  for (unsigned long i = 0; i < 512; ++i)
    {
      dy[iy] += dx[ix];
      ix += incx;
      iy += incy;
    }
}

without first proving that incy is nonzero.  This is a regression from
GCC 5.  It was fixed on trunk in r223486, which versioned the loop based
on whether incy is zero, but that's obviously too invasive to backport.
This patch instead bails out for non-constant steps in the place that
trunk would try a check for zeroness.

I did wonder about trying to use range information to prove nonzeroness
for SSA_NAMEs, but that would be entirely new code and didn't seem
suitable for a release branch.

Tested on aarch64-linux-gnu.  OK for GCC 7 and 6?  I'll add the testcase
to trunk too.

Richard


2018-02-28  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	PR tree-optimization/84485
	* tree-vect-data-refs.c (vect_analyze_data_ref_dependence): Return
	true for zero dependence distances if either step is variable, and if
	there is no metadata that guarantees correctness.

gcc/testsuite/
	PR tree-optimization/84485
	* gcc.dg/vect/pr84485.c: New test.

Comments

Jakub Jelinek Feb. 28, 2018, 2:24 p.m. | #1
On Wed, Feb 28, 2018 at 02:20:27PM +0000, Richard Sandiford wrote:
> GCC 6 and 7 would vectorise:

> 

> void

> f (unsigned long incx, unsigned long incy,

>    float *restrict dx, float *restrict dy)

> {

>   unsigned long ix = 0, iy = 0;

>   for (unsigned long i = 0; i < 512; ++i)

>     {

>       dy[iy] += dx[ix];

>       ix += incx;

>       iy += incy;

>     }

> }

> 

> without first proving that incy is nonzero.  This is a regression from

> GCC 5.  It was fixed on trunk in r223486, which versioned the loop based


Are you sure about the revision?  This one is from 2015 and so should be in
all of GCC 6/7/8.

	Jakub
Richard Sandiford Feb. 28, 2018, 4:59 p.m. | #2
Jakub Jelinek <jakub@redhat.com> writes:
> On Wed, Feb 28, 2018 at 02:20:27PM +0000, Richard Sandiford wrote:

>> GCC 6 and 7 would vectorise:

>> 

>> void

>> f (unsigned long incx, unsigned long incy,

>>    float *restrict dx, float *restrict dy)

>> {

>>   unsigned long ix = 0, iy = 0;

>>   for (unsigned long i = 0; i < 512; ++i)

>>     {

>>       dy[iy] += dx[ix];

>>       ix += incx;

>>       iy += incy;

>>     }

>> }

>> 

>> without first proving that incy is nonzero.  This is a regression from

>> GCC 5.  It was fixed on trunk in r223486, which versioned the loop based

>

> Are you sure about the revision?  This one is from 2015 and so should be in

> all of GCC 6/7/8.


Oops, that was when the bug started.  r256644 was the fix.
Richard Biener March 1, 2018, 10:37 a.m. | #3
On Wed, Feb 28, 2018 at 3:20 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> GCC 6 and 7 would vectorise:

>

> void

> f (unsigned long incx, unsigned long incy,

>    float *restrict dx, float *restrict dy)

> {

>   unsigned long ix = 0, iy = 0;

>   for (unsigned long i = 0; i < 512; ++i)

>     {

>       dy[iy] += dx[ix];

>       ix += incx;

>       iy += incy;

>     }

> }

>

> without first proving that incy is nonzero.  This is a regression from

> GCC 5.  It was fixed on trunk in r223486, which versioned the loop based

> on whether incy is zero, but that's obviously too invasive to backport.

> This patch instead bails out for non-constant steps in the place that

> trunk would try a check for zeroness.

>

> I did wonder about trying to use range information to prove nonzeroness

> for SSA_NAMEs, but that would be entirely new code and didn't seem

> suitable for a release branch.

>

> Tested on aarch64-linux-gnu.  OK for GCC 7 and 6?  I'll add the testcase

> to trunk too.


Given dist == 0 isn't it enough to test either DR_STEP (dra) or DR_STEP (drb)?
That seems what trunk is doing (just look at dr_zero_step_indicator of dra).
Even when not using range-info I think that using !
tree_expr_nonzero_p (DR_STEP (dra))
is more to the point of the issue we're fixing -- that also would catch
integer_zerop (DR_STEP (dra)) which trunk handles by failing as well but your
patch wouldn't so it looks like a more "complete" fix.

Last but not least trunk and your patch guards all this by
!loop->force_vectorize.
But force_vectorize doesn't give any such guarantee that step isn't
zero so I wonder
why we deliberately choose to possibly miscompile stuff here?

Thus I'd like to see a simpler

+         if (! tree_expr_nonzero_p (DR_STEP (dra)))
+           {
+             if (dump_enabled_p ())
+               dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+                                "step could be zero.\n");
+             return true;
+           }

if you think that works out and also the force_vectorize guard removed from the
trunk version.

OK with that change.

Thanks,
Richard.

> Richard

>

>

> 2018-02-28  Richard Sandiford  <richard.sandiford@linaro.org>

>

> gcc/

>         PR tree-optimization/84485

>         * tree-vect-data-refs.c (vect_analyze_data_ref_dependence): Return

>         true for zero dependence distances if either step is variable, and if

>         there is no metadata that guarantees correctness.

>

> gcc/testsuite/

>         PR tree-optimization/84485

>         * gcc.dg/vect/pr84485.c: New test.

>

> Index: gcc/tree-vect-data-refs.c

> ===================================================================

> --- gcc/tree-vect-data-refs.c   2017-07-27 18:08:43.779978373 +0100

> +++ gcc/tree-vect-data-refs.c   2018-02-28 14:16:36.621113244 +0000

> @@ -394,6 +394,16 @@ vect_analyze_data_ref_dependence (struct

>                 }

>             }

>

> +         if (!loop->force_vectorize

> +             && (TREE_CODE (DR_STEP (dra)) != INTEGER_CST

> +                 || TREE_CODE (DR_STEP (drb)) != INTEGER_CST))

> +           {

> +             if (dump_enabled_p ())

> +               dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,

> +                                "step could be zero.\n");

> +             return true;

> +           }

> +

>           continue;

>         }

>

> Index: gcc/testsuite/gcc.dg/vect/pr84485.c

> ===================================================================

> --- /dev/null   2018-02-26 10:26:41.624847060 +0000

> +++ gcc/testsuite/gcc.dg/vect/pr84485.c 2018-02-28 14:16:36.620112862 +0000

> @@ -0,0 +1,34 @@

> +/* { dg-do run } */

> +

> +#include "tree-vect.h"

> +

> +#define N 256

> +

> +void __attribute__ ((noinline, noclone))

> +f (unsigned long incx, unsigned long incy,

> +   float *restrict dx, float *restrict dy)

> +{

> +  unsigned long ix = 0, iy = 0;

> +  for (unsigned long i = 0; i < N; ++i)

> +    {

> +      dy[iy] += dx[ix];

> +      ix += incx;

> +      iy += incy;

> +    }

> +}

> +

> +float a = 0.0;

> +float b[N];

> +

> +int

> +main (void)

> +{

> +  check_vect ();

> +

> +  for (int i = 0; i < N; ++i)

> +    b[i] = i;

> +  f (1, 0, b, &a);

> +  if (a != N * (N - 1) / 2)

> +    __builtin_abort ();

> +  return 0;

> +}
Richard Sandiford March 1, 2018, 11:38 a.m. | #4
Richard Biener <richard.guenther@gmail.com> writes:
> On Wed, Feb 28, 2018 at 3:20 PM, Richard Sandiford

> <richard.sandiford@linaro.org> wrote:

>> GCC 6 and 7 would vectorise:

>>

>> void

>> f (unsigned long incx, unsigned long incy,

>>    float *restrict dx, float *restrict dy)

>> {

>>   unsigned long ix = 0, iy = 0;

>>   for (unsigned long i = 0; i < 512; ++i)

>>     {

>>       dy[iy] += dx[ix];

>>       ix += incx;

>>       iy += incy;

>>     }

>> }

>>

>> without first proving that incy is nonzero.  This is a regression from

>> GCC 5.  It was fixed on trunk in r223486, which versioned the loop based

>> on whether incy is zero, but that's obviously too invasive to backport.

>> This patch instead bails out for non-constant steps in the place that

>> trunk would try a check for zeroness.

>>

>> I did wonder about trying to use range information to prove nonzeroness

>> for SSA_NAMEs, but that would be entirely new code and didn't seem

>> suitable for a release branch.

>>

>> Tested on aarch64-linux-gnu.  OK for GCC 7 and 6?  I'll add the testcase

>> to trunk too.

>

> Given dist == 0 isn't it enough to test either DR_STEP (dra) or DR_STEP (drb)?

> That seems what trunk is doing (just look at dr_zero_step_indicator of dra).

> Even when not using range-info I think that using !

> tree_expr_nonzero_p (DR_STEP (dra))

> is more to the point of the issue we're fixing -- that also would catch

> integer_zerop (DR_STEP (dra)) which trunk handles by failing as well but your

> patch wouldn't so it looks like a more "complete" fix.


OK.

> Last but not least trunk and your patch guards all this by

> !loop->force_vectorize.

> But force_vectorize doesn't give any such guarantee that step isn't

> zero so I wonder

> why we deliberately choose to possibly miscompile stuff here?


This was based on the pre-existing:

  if (loop_vinfo && integer_zerop (step))
    {
      GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt)) = NULL;
      if (!nested_in_vect_loop_p (loop, stmt))
	return DR_IS_READ (dr);
      /* Allow references with zero step for outer loops marked
	 with pragma omp simd only - it guarantees absence of
	 loop-carried dependencies between inner loop iterations.  */
      if (!loop->force_vectorize)
	{
	  if (dump_enabled_p ())
	    dump_printf_loc (MSG_NOTE, vect_location,
			     "zero step in inner loop of nest\n");
	  return false;
	}
    }

AIUI #pragma omp simd really does guarantee that iterations of
the loop can be executed concurrently (up to the limit given by
safelen if present).  So something like:

  #pragma omp simd
  for (int i = 0; i < n; ++i)
    a[i * step] += 1;

would be incorrect for step==0.  (#pragma ordered simd forces
things to be executed in order where necessary.)

Thanks,
Richard

> Thus I'd like to see a simpler

>

> +         if (! tree_expr_nonzero_p (DR_STEP (dra)))

> +           {

> +             if (dump_enabled_p ())

> +               dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,

> +                                "step could be zero.\n");

> +             return true;

> +           }

>

> if you think that works out and also the force_vectorize guard removed from the

> trunk version.

>

> OK with that change.

>

> Thanks,

> Richard.

>

>> Richard

>>

>>

>> 2018-02-28  Richard Sandiford  <richard.sandiford@linaro.org>

>>

>> gcc/

>>         PR tree-optimization/84485

>>         * tree-vect-data-refs.c (vect_analyze_data_ref_dependence): Return

>>         true for zero dependence distances if either step is variable, and if

>>         there is no metadata that guarantees correctness.

>>

>> gcc/testsuite/

>>         PR tree-optimization/84485

>>         * gcc.dg/vect/pr84485.c: New test.

>>

>> Index: gcc/tree-vect-data-refs.c

>> ===================================================================

>> --- gcc/tree-vect-data-refs.c   2017-07-27 18:08:43.779978373 +0100

>> +++ gcc/tree-vect-data-refs.c   2018-02-28 14:16:36.621113244 +0000

>> @@ -394,6 +394,16 @@ vect_analyze_data_ref_dependence (struct

>>                 }

>>             }

>>

>> +         if (!loop->force_vectorize

>> +             && (TREE_CODE (DR_STEP (dra)) != INTEGER_CST

>> +                 || TREE_CODE (DR_STEP (drb)) != INTEGER_CST))

>> +           {

>> +             if (dump_enabled_p ())

>> +               dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,

>> +                                "step could be zero.\n");

>> +             return true;

>> +           }

>> +

>>           continue;

>>         }

>>

>> Index: gcc/testsuite/gcc.dg/vect/pr84485.c

>> ===================================================================

>> --- /dev/null   2018-02-26 10:26:41.624847060 +0000

>> +++ gcc/testsuite/gcc.dg/vect/pr84485.c 2018-02-28 14:16:36.620112862 +0000

>> @@ -0,0 +1,34 @@

>> +/* { dg-do run } */

>> +

>> +#include "tree-vect.h"

>> +

>> +#define N 256

>> +

>> +void __attribute__ ((noinline, noclone))

>> +f (unsigned long incx, unsigned long incy,

>> +   float *restrict dx, float *restrict dy)

>> +{

>> +  unsigned long ix = 0, iy = 0;

>> +  for (unsigned long i = 0; i < N; ++i)

>> +    {

>> +      dy[iy] += dx[ix];

>> +      ix += incx;

>> +      iy += incy;

>> +    }

>> +}

>> +

>> +float a = 0.0;

>> +float b[N];

>> +

>> +int

>> +main (void)

>> +{

>> +  check_vect ();

>> +

>> +  for (int i = 0; i < N; ++i)

>> +    b[i] = i;

>> +  f (1, 0, b, &a);

>> +  if (a != N * (N - 1) / 2)

>> +    __builtin_abort ();

>> +  return 0;

>> +}
Richard Biener March 1, 2018, 1:58 p.m. | #5
On Thu, Mar 1, 2018 at 12:38 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Richard Biener <richard.guenther@gmail.com> writes:

>> On Wed, Feb 28, 2018 at 3:20 PM, Richard Sandiford

>> <richard.sandiford@linaro.org> wrote:

>>> GCC 6 and 7 would vectorise:

>>>

>>> void

>>> f (unsigned long incx, unsigned long incy,

>>>    float *restrict dx, float *restrict dy)

>>> {

>>>   unsigned long ix = 0, iy = 0;

>>>   for (unsigned long i = 0; i < 512; ++i)

>>>     {

>>>       dy[iy] += dx[ix];

>>>       ix += incx;

>>>       iy += incy;

>>>     }

>>> }

>>>

>>> without first proving that incy is nonzero.  This is a regression from

>>> GCC 5.  It was fixed on trunk in r223486, which versioned the loop based

>>> on whether incy is zero, but that's obviously too invasive to backport.

>>> This patch instead bails out for non-constant steps in the place that

>>> trunk would try a check for zeroness.

>>>

>>> I did wonder about trying to use range information to prove nonzeroness

>>> for SSA_NAMEs, but that would be entirely new code and didn't seem

>>> suitable for a release branch.

>>>

>>> Tested on aarch64-linux-gnu.  OK for GCC 7 and 6?  I'll add the testcase

>>> to trunk too.

>>

>> Given dist == 0 isn't it enough to test either DR_STEP (dra) or DR_STEP (drb)?

>> That seems what trunk is doing (just look at dr_zero_step_indicator of dra).

>> Even when not using range-info I think that using !

>> tree_expr_nonzero_p (DR_STEP (dra))

>> is more to the point of the issue we're fixing -- that also would catch

>> integer_zerop (DR_STEP (dra)) which trunk handles by failing as well but your

>> patch wouldn't so it looks like a more "complete" fix.

>

> OK.

>

>> Last but not least trunk and your patch guards all this by

>> !loop->force_vectorize.

>> But force_vectorize doesn't give any such guarantee that step isn't

>> zero so I wonder

>> why we deliberately choose to possibly miscompile stuff here?

>

> This was based on the pre-existing:

>

>   if (loop_vinfo && integer_zerop (step))

>     {

>       GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt)) = NULL;

>       if (!nested_in_vect_loop_p (loop, stmt))

>         return DR_IS_READ (dr);

>       /* Allow references with zero step for outer loops marked

>          with pragma omp simd only - it guarantees absence of

>          loop-carried dependencies between inner loop iterations.  */

>       if (!loop->force_vectorize)

>         {

>           if (dump_enabled_p ())

>             dump_printf_loc (MSG_NOTE, vect_location,

>                              "zero step in inner loop of nest\n");

>           return false;

>         }

>     }

>

> AIUI #pragma omp simd really does guarantee that iterations of

> the loop can be executed concurrently (up to the limit given by

> safelen if present).  So something like:

>

>   #pragma omp simd

>   for (int i = 0; i < n; ++i)

>     a[i * step] += 1;

>

> would be incorrect for step==0.  (#pragma ordered simd forces

> things to be executed in order where necessary.)


But then we should check safelen, not force_vectorize...

Richard.

> Thanks,

> Richard

>

>> Thus I'd like to see a simpler

>>

>> +         if (! tree_expr_nonzero_p (DR_STEP (dra)))

>> +           {

>> +             if (dump_enabled_p ())

>> +               dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,

>> +                                "step could be zero.\n");

>> +             return true;

>> +           }

>>

>> if you think that works out and also the force_vectorize guard removed from the

>> trunk version.

>>

>> OK with that change.

>>

>> Thanks,

>> Richard.

>>

>>> Richard

>>>

>>>

>>> 2018-02-28  Richard Sandiford  <richard.sandiford@linaro.org>

>>>

>>> gcc/

>>>         PR tree-optimization/84485

>>>         * tree-vect-data-refs.c (vect_analyze_data_ref_dependence): Return

>>>         true for zero dependence distances if either step is variable, and if

>>>         there is no metadata that guarantees correctness.

>>>

>>> gcc/testsuite/

>>>         PR tree-optimization/84485

>>>         * gcc.dg/vect/pr84485.c: New test.

>>>

>>> Index: gcc/tree-vect-data-refs.c

>>> ===================================================================

>>> --- gcc/tree-vect-data-refs.c   2017-07-27 18:08:43.779978373 +0100

>>> +++ gcc/tree-vect-data-refs.c   2018-02-28 14:16:36.621113244 +0000

>>> @@ -394,6 +394,16 @@ vect_analyze_data_ref_dependence (struct

>>>                 }

>>>             }

>>>

>>> +         if (!loop->force_vectorize

>>> +             && (TREE_CODE (DR_STEP (dra)) != INTEGER_CST

>>> +                 || TREE_CODE (DR_STEP (drb)) != INTEGER_CST))

>>> +           {

>>> +             if (dump_enabled_p ())

>>> +               dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,

>>> +                                "step could be zero.\n");

>>> +             return true;

>>> +           }

>>> +

>>>           continue;

>>>         }

>>>

>>> Index: gcc/testsuite/gcc.dg/vect/pr84485.c

>>> ===================================================================

>>> --- /dev/null   2018-02-26 10:26:41.624847060 +0000

>>> +++ gcc/testsuite/gcc.dg/vect/pr84485.c 2018-02-28 14:16:36.620112862 +0000

>>> @@ -0,0 +1,34 @@

>>> +/* { dg-do run } */

>>> +

>>> +#include "tree-vect.h"

>>> +

>>> +#define N 256

>>> +

>>> +void __attribute__ ((noinline, noclone))

>>> +f (unsigned long incx, unsigned long incy,

>>> +   float *restrict dx, float *restrict dy)

>>> +{

>>> +  unsigned long ix = 0, iy = 0;

>>> +  for (unsigned long i = 0; i < N; ++i)

>>> +    {

>>> +      dy[iy] += dx[ix];

>>> +      ix += incx;

>>> +      iy += incy;

>>> +    }

>>> +}

>>> +

>>> +float a = 0.0;

>>> +float b[N];

>>> +

>>> +int

>>> +main (void)

>>> +{

>>> +  check_vect ();

>>> +

>>> +  for (int i = 0; i < N; ++i)

>>> +    b[i] = i;

>>> +  f (1, 0, b, &a);

>>> +  if (a != N * (N - 1) / 2)

>>> +    __builtin_abort ();

>>> +  return 0;

>>> +}
Richard Sandiford March 2, 2018, 2:12 p.m. | #6
Richard Biener <richard.guenther@gmail.com> writes:
> On Thu, Mar 1, 2018 at 12:38 PM, Richard Sandiford

> <richard.sandiford@linaro.org> wrote:

>> Richard Biener <richard.guenther@gmail.com> writes:

>>> On Wed, Feb 28, 2018 at 3:20 PM, Richard Sandiford

>>> <richard.sandiford@linaro.org> wrote:

>>>> GCC 6 and 7 would vectorise:

>>>>

>>>> void

>>>> f (unsigned long incx, unsigned long incy,

>>>>    float *restrict dx, float *restrict dy)

>>>> {

>>>>   unsigned long ix = 0, iy = 0;

>>>>   for (unsigned long i = 0; i < 512; ++i)

>>>>     {

>>>>       dy[iy] += dx[ix];

>>>>       ix += incx;

>>>>       iy += incy;

>>>>     }

>>>> }

>>>>

>>>> without first proving that incy is nonzero.  This is a regression from

>>>> GCC 5.  It was fixed on trunk in r223486, which versioned the loop based

>>>> on whether incy is zero, but that's obviously too invasive to backport.

>>>> This patch instead bails out for non-constant steps in the place that

>>>> trunk would try a check for zeroness.

>>>>

>>>> I did wonder about trying to use range information to prove nonzeroness

>>>> for SSA_NAMEs, but that would be entirely new code and didn't seem

>>>> suitable for a release branch.

>>>>

>>>> Tested on aarch64-linux-gnu.  OK for GCC 7 and 6?  I'll add the testcase

>>>> to trunk too.

>>>

>>> Given dist == 0 isn't it enough to test either DR_STEP (dra) or

>>> DR_STEP (drb)?

>>> That seems what trunk is doing (just look at dr_zero_step_indicator of dra).

>>> Even when not using range-info I think that using !

>>> tree_expr_nonzero_p (DR_STEP (dra))

>>> is more to the point of the issue we're fixing -- that also would catch

>>> integer_zerop (DR_STEP (dra)) which trunk handles by failing as well but your

>>> patch wouldn't so it looks like a more "complete" fix.

>>

>> OK.

>>

>>> Last but not least trunk and your patch guards all this by

>>> !loop->force_vectorize.

>>> But force_vectorize doesn't give any such guarantee that step isn't

>>> zero so I wonder

>>> why we deliberately choose to possibly miscompile stuff here?

>>

>> This was based on the pre-existing:

>>

>>   if (loop_vinfo && integer_zerop (step))

>>     {

>>       GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt)) = NULL;

>>       if (!nested_in_vect_loop_p (loop, stmt))

>>         return DR_IS_READ (dr);

>>       /* Allow references with zero step for outer loops marked

>>          with pragma omp simd only - it guarantees absence of

>>          loop-carried dependencies between inner loop iterations.  */

>>       if (!loop->force_vectorize)

>>         {

>>           if (dump_enabled_p ())

>>             dump_printf_loc (MSG_NOTE, vect_location,

>>                              "zero step in inner loop of nest\n");

>>           return false;

>>         }

>>     }

>>

>> AIUI #pragma omp simd really does guarantee that iterations of

>> the loop can be executed concurrently (up to the limit given by

>> safelen if present).  So something like:

>>

>>   #pragma omp simd

>>   for (int i = 0; i < n; ++i)

>>     a[i * step] += 1;

>>

>> would be incorrect for step==0.  (#pragma ordered simd forces

>> things to be executed in order where necessary.)

>

> But then we should check safelen, not force_vectorize...


OK.  Following on from irc, this version uses expr_not_equal_to
instead of tree_expr_nonzero_p.  It also adds safelen to the existing
use of force_vectorize (which seemed safer than replacing it).

Tested on aarch64-linux-gnu.  OK to install?

Richard


2018-02-28  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	PR tree-optimization/84485
	* tree-vect-data-refs.c (vect_analyze_data_ref_dependence): Return
	true for zero dependence distances if the step might be zero,
	and if there is no metadata that guarantees correctness.
	(vect_analyze_data_ref_access): Check safelen as well as
	force_vectorize.

gcc/testsuite/
	PR tree-optimization/84485
	* gcc.dg/vect/pr84485.c: New test.

Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c	2018-02-28 14:19:22.326678337 +0000
+++ gcc/tree-vect-data-refs.c	2018-03-02 13:28:06.339321095 +0000
@@ -394,6 +394,16 @@ vect_analyze_data_ref_dependence (struct
 		}
 	    }
 
+	  unsigned int step_prec = TYPE_PRECISION (TREE_TYPE (DR_STEP (dra)));
+	  if (loop->safelen < 2
+	      && !expr_not_equal_to (DR_STEP (dra), wi::zero (step_prec)))
+	    {
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				 "step could be zero.\n");
+	      return true;
+	    }
+
 	  continue;
 	}
 
@@ -2515,7 +2525,7 @@ vect_analyze_data_ref_access (struct dat
       /* Allow references with zero step for outer loops marked
 	 with pragma omp simd only - it guarantees absence of
 	 loop-carried dependencies between inner loop iterations.  */
-      if (!loop->force_vectorize)
+      if (!loop->force_vectorize || loop->safelen < 2)
 	{
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_NOTE, vect_location,
Index: gcc/testsuite/gcc.dg/vect/pr84485.c
===================================================================
--- /dev/null	2018-03-01 08:17:49.562264353 +0000
+++ gcc/testsuite/gcc.dg/vect/pr84485.c	2018-03-02 13:28:06.338321095 +0000
@@ -0,0 +1,34 @@
+/* { dg-do run } */
+
+#include "tree-vect.h"
+
+#define N 256
+
+void __attribute__ ((noinline, noclone))
+f (unsigned long incx, unsigned long incy,
+   float *restrict dx, float *restrict dy)
+{
+  unsigned long ix = 0, iy = 0;
+  for (unsigned long i = 0; i < N; ++i)
+    {
+      dy[iy] += dx[ix];
+      ix += incx;
+      iy += incy;
+    }
+}
+
+float a = 0.0;
+float b[N];
+
+int
+main (void)
+{
+  check_vect ();
+
+  for (int i = 0; i < N; ++i)
+    b[i] = i;
+  f (1, 0, b, &a);
+  if (a != N * (N - 1) / 2)
+    __builtin_abort ();
+  return 0;
+}
Richard Biener March 5, 2018, 2:17 p.m. | #7
On Fri, Mar 2, 2018 at 3:12 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Richard Biener <richard.guenther@gmail.com> writes:

>> On Thu, Mar 1, 2018 at 12:38 PM, Richard Sandiford

>> <richard.sandiford@linaro.org> wrote:

>>> Richard Biener <richard.guenther@gmail.com> writes:

>>>> On Wed, Feb 28, 2018 at 3:20 PM, Richard Sandiford

>>>> <richard.sandiford@linaro.org> wrote:

>>>>> GCC 6 and 7 would vectorise:

>>>>>

>>>>> void

>>>>> f (unsigned long incx, unsigned long incy,

>>>>>    float *restrict dx, float *restrict dy)

>>>>> {

>>>>>   unsigned long ix = 0, iy = 0;

>>>>>   for (unsigned long i = 0; i < 512; ++i)

>>>>>     {

>>>>>       dy[iy] += dx[ix];

>>>>>       ix += incx;

>>>>>       iy += incy;

>>>>>     }

>>>>> }

>>>>>

>>>>> without first proving that incy is nonzero.  This is a regression from

>>>>> GCC 5.  It was fixed on trunk in r223486, which versioned the loop based

>>>>> on whether incy is zero, but that's obviously too invasive to backport.

>>>>> This patch instead bails out for non-constant steps in the place that

>>>>> trunk would try a check for zeroness.

>>>>>

>>>>> I did wonder about trying to use range information to prove nonzeroness

>>>>> for SSA_NAMEs, but that would be entirely new code and didn't seem

>>>>> suitable for a release branch.

>>>>>

>>>>> Tested on aarch64-linux-gnu.  OK for GCC 7 and 6?  I'll add the testcase

>>>>> to trunk too.

>>>>

>>>> Given dist == 0 isn't it enough to test either DR_STEP (dra) or

>>>> DR_STEP (drb)?

>>>> That seems what trunk is doing (just look at dr_zero_step_indicator of dra).

>>>> Even when not using range-info I think that using !

>>>> tree_expr_nonzero_p (DR_STEP (dra))

>>>> is more to the point of the issue we're fixing -- that also would catch

>>>> integer_zerop (DR_STEP (dra)) which trunk handles by failing as well but your

>>>> patch wouldn't so it looks like a more "complete" fix.

>>>

>>> OK.

>>>

>>>> Last but not least trunk and your patch guards all this by

>>>> !loop->force_vectorize.

>>>> But force_vectorize doesn't give any such guarantee that step isn't

>>>> zero so I wonder

>>>> why we deliberately choose to possibly miscompile stuff here?

>>>

>>> This was based on the pre-existing:

>>>

>>>   if (loop_vinfo && integer_zerop (step))

>>>     {

>>>       GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt)) = NULL;

>>>       if (!nested_in_vect_loop_p (loop, stmt))

>>>         return DR_IS_READ (dr);

>>>       /* Allow references with zero step for outer loops marked

>>>          with pragma omp simd only - it guarantees absence of

>>>          loop-carried dependencies between inner loop iterations.  */

>>>       if (!loop->force_vectorize)

>>>         {

>>>           if (dump_enabled_p ())

>>>             dump_printf_loc (MSG_NOTE, vect_location,

>>>                              "zero step in inner loop of nest\n");

>>>           return false;

>>>         }

>>>     }

>>>

>>> AIUI #pragma omp simd really does guarantee that iterations of

>>> the loop can be executed concurrently (up to the limit given by

>>> safelen if present).  So something like:

>>>

>>>   #pragma omp simd

>>>   for (int i = 0; i < n; ++i)

>>>     a[i * step] += 1;

>>>

>>> would be incorrect for step==0.  (#pragma ordered simd forces

>>> things to be executed in order where necessary.)

>>

>> But then we should check safelen, not force_vectorize...

>

> OK.  Following on from irc, this version uses expr_not_equal_to

> instead of tree_expr_nonzero_p.  It also adds safelen to the existing

> use of force_vectorize (which seemed safer than replacing it).

>

> Tested on aarch64-linux-gnu.  OK to install?


Ok.

Richard.

> Richard

>

>

> 2018-02-28  Richard Sandiford  <richard.sandiford@linaro.org>

>

> gcc/

>         PR tree-optimization/84485

>         * tree-vect-data-refs.c (vect_analyze_data_ref_dependence): Return

>         true for zero dependence distances if the step might be zero,

>         and if there is no metadata that guarantees correctness.

>         (vect_analyze_data_ref_access): Check safelen as well as

>         force_vectorize.

>

> gcc/testsuite/

>         PR tree-optimization/84485

>         * gcc.dg/vect/pr84485.c: New test.

>

> Index: gcc/tree-vect-data-refs.c

> ===================================================================

> --- gcc/tree-vect-data-refs.c   2018-02-28 14:19:22.326678337 +0000

> +++ gcc/tree-vect-data-refs.c   2018-03-02 13:28:06.339321095 +0000

> @@ -394,6 +394,16 @@ vect_analyze_data_ref_dependence (struct

>                 }

>             }

>

> +         unsigned int step_prec = TYPE_PRECISION (TREE_TYPE (DR_STEP (dra)));

> +         if (loop->safelen < 2

> +             && !expr_not_equal_to (DR_STEP (dra), wi::zero (step_prec)))

> +           {

> +             if (dump_enabled_p ())

> +               dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,

> +                                "step could be zero.\n");

> +             return true;

> +           }

> +

>           continue;

>         }

>

> @@ -2515,7 +2525,7 @@ vect_analyze_data_ref_access (struct dat

>        /* Allow references with zero step for outer loops marked

>          with pragma omp simd only - it guarantees absence of

>          loop-carried dependencies between inner loop iterations.  */

> -      if (!loop->force_vectorize)

> +      if (!loop->force_vectorize || loop->safelen < 2)

>         {

>           if (dump_enabled_p ())

>             dump_printf_loc (MSG_NOTE, vect_location,

> Index: gcc/testsuite/gcc.dg/vect/pr84485.c

> ===================================================================

> --- /dev/null   2018-03-01 08:17:49.562264353 +0000

> +++ gcc/testsuite/gcc.dg/vect/pr84485.c 2018-03-02 13:28:06.338321095 +0000

> @@ -0,0 +1,34 @@

> +/* { dg-do run } */

> +

> +#include "tree-vect.h"

> +

> +#define N 256

> +

> +void __attribute__ ((noinline, noclone))

> +f (unsigned long incx, unsigned long incy,

> +   float *restrict dx, float *restrict dy)

> +{

> +  unsigned long ix = 0, iy = 0;

> +  for (unsigned long i = 0; i < N; ++i)

> +    {

> +      dy[iy] += dx[ix];

> +      ix += incx;

> +      iy += incy;

> +    }

> +}

> +

> +float a = 0.0;

> +float b[N];

> +

> +int

> +main (void)

> +{

> +  check_vect ();

> +

> +  for (int i = 0; i < N; ++i)

> +    b[i] = i;

> +  f (1, 0, b, &a);

> +  if (a != N * (N - 1) / 2)

> +    __builtin_abort ();

> +  return 0;

> +}

Patch

Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c	2017-07-27 18:08:43.779978373 +0100
+++ gcc/tree-vect-data-refs.c	2018-02-28 14:16:36.621113244 +0000
@@ -394,6 +394,16 @@  vect_analyze_data_ref_dependence (struct
 		}
 	    }
 
+	  if (!loop->force_vectorize
+	      && (TREE_CODE (DR_STEP (dra)) != INTEGER_CST
+		  || TREE_CODE (DR_STEP (drb)) != INTEGER_CST))
+	    {
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				 "step could be zero.\n");
+	      return true;
+	    }
+
 	  continue;
 	}
 
Index: gcc/testsuite/gcc.dg/vect/pr84485.c
===================================================================
--- /dev/null	2018-02-26 10:26:41.624847060 +0000
+++ gcc/testsuite/gcc.dg/vect/pr84485.c	2018-02-28 14:16:36.620112862 +0000
@@ -0,0 +1,34 @@ 
+/* { dg-do run } */
+
+#include "tree-vect.h"
+
+#define N 256
+
+void __attribute__ ((noinline, noclone))
+f (unsigned long incx, unsigned long incy,
+   float *restrict dx, float *restrict dy)
+{
+  unsigned long ix = 0, iy = 0;
+  for (unsigned long i = 0; i < N; ++i)
+    {
+      dy[iy] += dx[ix];
+      ix += incx;
+      iy += incy;
+    }
+}
+
+float a = 0.0;
+float b[N];
+
+int
+main (void)
+{
+  check_vect ();
+
+  for (int i = 0; i < N; ++i)
+    b[i] = i;
+  f (1, 0, b, &a);
+  if (a != N * (N - 1) / 2)
+    __builtin_abort ();
+  return 0;
+}