Message ID | 87muzt8ac4.fsf@linaro.org |
---|---|
State | New |
Headers | show |
Series | Don't vectorise zero-step rmw operations (PR 84485) | expand |
On Wed, Feb 28, 2018 at 02:20:27PM +0000, Richard Sandiford wrote: > GCC 6 and 7 would vectorise: > > void > f (unsigned long incx, unsigned long incy, > float *restrict dx, float *restrict dy) > { > unsigned long ix = 0, iy = 0; > for (unsigned long i = 0; i < 512; ++i) > { > dy[iy] += dx[ix]; > ix += incx; > iy += incy; > } > } > > without first proving that incy is nonzero. This is a regression from > GCC 5. It was fixed on trunk in r223486, which versioned the loop based Are you sure about the revision? This one is from 2015 and so should be in all of GCC 6/7/8. Jakub
Jakub Jelinek <jakub@redhat.com> writes: > On Wed, Feb 28, 2018 at 02:20:27PM +0000, Richard Sandiford wrote: >> GCC 6 and 7 would vectorise: >> >> void >> f (unsigned long incx, unsigned long incy, >> float *restrict dx, float *restrict dy) >> { >> unsigned long ix = 0, iy = 0; >> for (unsigned long i = 0; i < 512; ++i) >> { >> dy[iy] += dx[ix]; >> ix += incx; >> iy += incy; >> } >> } >> >> without first proving that incy is nonzero. This is a regression from >> GCC 5. It was fixed on trunk in r223486, which versioned the loop based > > Are you sure about the revision? This one is from 2015 and so should be in > all of GCC 6/7/8. Oops, that was when the bug started. r256644 was the fix.
On Wed, Feb 28, 2018 at 3:20 PM, Richard Sandiford <richard.sandiford@linaro.org> wrote: > GCC 6 and 7 would vectorise: > > void > f (unsigned long incx, unsigned long incy, > float *restrict dx, float *restrict dy) > { > unsigned long ix = 0, iy = 0; > for (unsigned long i = 0; i < 512; ++i) > { > dy[iy] += dx[ix]; > ix += incx; > iy += incy; > } > } > > without first proving that incy is nonzero. This is a regression from > GCC 5. It was fixed on trunk in r223486, which versioned the loop based > on whether incy is zero, but that's obviously too invasive to backport. > This patch instead bails out for non-constant steps in the place that > trunk would try a check for zeroness. > > I did wonder about trying to use range information to prove nonzeroness > for SSA_NAMEs, but that would be entirely new code and didn't seem > suitable for a release branch. > > Tested on aarch64-linux-gnu. OK for GCC 7 and 6? I'll add the testcase > to trunk too. Given dist == 0 isn't it enough to test either DR_STEP (dra) or DR_STEP (drb)? That seems what trunk is doing (just look at dr_zero_step_indicator of dra). Even when not using range-info I think that using ! tree_expr_nonzero_p (DR_STEP (dra)) is more to the point of the issue we're fixing -- that also would catch integer_zerop (DR_STEP (dra)) which trunk handles by failing as well but your patch wouldn't so it looks like a more "complete" fix. Last but not least trunk and your patch guards all this by !loop->force_vectorize. But force_vectorize doesn't give any such guarantee that step isn't zero so I wonder why we deliberately choose to possibly miscompile stuff here? Thus I'd like to see a simpler + if (! tree_expr_nonzero_p (DR_STEP (dra))) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, + "step could be zero.\n"); + return true; + } if you think that works out and also the force_vectorize guard removed from the trunk version. OK with that change. Thanks, Richard. > Richard > > > 2018-02-28 Richard Sandiford <richard.sandiford@linaro.org> > > gcc/ > PR tree-optimization/84485 > * tree-vect-data-refs.c (vect_analyze_data_ref_dependence): Return > true for zero dependence distances if either step is variable, and if > there is no metadata that guarantees correctness. > > gcc/testsuite/ > PR tree-optimization/84485 > * gcc.dg/vect/pr84485.c: New test. > > Index: gcc/tree-vect-data-refs.c > =================================================================== > --- gcc/tree-vect-data-refs.c 2017-07-27 18:08:43.779978373 +0100 > +++ gcc/tree-vect-data-refs.c 2018-02-28 14:16:36.621113244 +0000 > @@ -394,6 +394,16 @@ vect_analyze_data_ref_dependence (struct > } > } > > + if (!loop->force_vectorize > + && (TREE_CODE (DR_STEP (dra)) != INTEGER_CST > + || TREE_CODE (DR_STEP (drb)) != INTEGER_CST)) > + { > + if (dump_enabled_p ()) > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > + "step could be zero.\n"); > + return true; > + } > + > continue; > } > > Index: gcc/testsuite/gcc.dg/vect/pr84485.c > =================================================================== > --- /dev/null 2018-02-26 10:26:41.624847060 +0000 > +++ gcc/testsuite/gcc.dg/vect/pr84485.c 2018-02-28 14:16:36.620112862 +0000 > @@ -0,0 +1,34 @@ > +/* { dg-do run } */ > + > +#include "tree-vect.h" > + > +#define N 256 > + > +void __attribute__ ((noinline, noclone)) > +f (unsigned long incx, unsigned long incy, > + float *restrict dx, float *restrict dy) > +{ > + unsigned long ix = 0, iy = 0; > + for (unsigned long i = 0; i < N; ++i) > + { > + dy[iy] += dx[ix]; > + ix += incx; > + iy += incy; > + } > +} > + > +float a = 0.0; > +float b[N]; > + > +int > +main (void) > +{ > + check_vect (); > + > + for (int i = 0; i < N; ++i) > + b[i] = i; > + f (1, 0, b, &a); > + if (a != N * (N - 1) / 2) > + __builtin_abort (); > + return 0; > +}
Richard Biener <richard.guenther@gmail.com> writes: > On Wed, Feb 28, 2018 at 3:20 PM, Richard Sandiford > <richard.sandiford@linaro.org> wrote: >> GCC 6 and 7 would vectorise: >> >> void >> f (unsigned long incx, unsigned long incy, >> float *restrict dx, float *restrict dy) >> { >> unsigned long ix = 0, iy = 0; >> for (unsigned long i = 0; i < 512; ++i) >> { >> dy[iy] += dx[ix]; >> ix += incx; >> iy += incy; >> } >> } >> >> without first proving that incy is nonzero. This is a regression from >> GCC 5. It was fixed on trunk in r223486, which versioned the loop based >> on whether incy is zero, but that's obviously too invasive to backport. >> This patch instead bails out for non-constant steps in the place that >> trunk would try a check for zeroness. >> >> I did wonder about trying to use range information to prove nonzeroness >> for SSA_NAMEs, but that would be entirely new code and didn't seem >> suitable for a release branch. >> >> Tested on aarch64-linux-gnu. OK for GCC 7 and 6? I'll add the testcase >> to trunk too. > > Given dist == 0 isn't it enough to test either DR_STEP (dra) or DR_STEP (drb)? > That seems what trunk is doing (just look at dr_zero_step_indicator of dra). > Even when not using range-info I think that using ! > tree_expr_nonzero_p (DR_STEP (dra)) > is more to the point of the issue we're fixing -- that also would catch > integer_zerop (DR_STEP (dra)) which trunk handles by failing as well but your > patch wouldn't so it looks like a more "complete" fix. OK. > Last but not least trunk and your patch guards all this by > !loop->force_vectorize. > But force_vectorize doesn't give any such guarantee that step isn't > zero so I wonder > why we deliberately choose to possibly miscompile stuff here? This was based on the pre-existing: if (loop_vinfo && integer_zerop (step)) { GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt)) = NULL; if (!nested_in_vect_loop_p (loop, stmt)) return DR_IS_READ (dr); /* Allow references with zero step for outer loops marked with pragma omp simd only - it guarantees absence of loop-carried dependencies between inner loop iterations. */ if (!loop->force_vectorize) { if (dump_enabled_p ()) dump_printf_loc (MSG_NOTE, vect_location, "zero step in inner loop of nest\n"); return false; } } AIUI #pragma omp simd really does guarantee that iterations of the loop can be executed concurrently (up to the limit given by safelen if present). So something like: #pragma omp simd for (int i = 0; i < n; ++i) a[i * step] += 1; would be incorrect for step==0. (#pragma ordered simd forces things to be executed in order where necessary.) Thanks, Richard > Thus I'd like to see a simpler > > + if (! tree_expr_nonzero_p (DR_STEP (dra))) > + { > + if (dump_enabled_p ()) > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > + "step could be zero.\n"); > + return true; > + } > > if you think that works out and also the force_vectorize guard removed from the > trunk version. > > OK with that change. > > Thanks, > Richard. > >> Richard >> >> >> 2018-02-28 Richard Sandiford <richard.sandiford@linaro.org> >> >> gcc/ >> PR tree-optimization/84485 >> * tree-vect-data-refs.c (vect_analyze_data_ref_dependence): Return >> true for zero dependence distances if either step is variable, and if >> there is no metadata that guarantees correctness. >> >> gcc/testsuite/ >> PR tree-optimization/84485 >> * gcc.dg/vect/pr84485.c: New test. >> >> Index: gcc/tree-vect-data-refs.c >> =================================================================== >> --- gcc/tree-vect-data-refs.c 2017-07-27 18:08:43.779978373 +0100 >> +++ gcc/tree-vect-data-refs.c 2018-02-28 14:16:36.621113244 +0000 >> @@ -394,6 +394,16 @@ vect_analyze_data_ref_dependence (struct >> } >> } >> >> + if (!loop->force_vectorize >> + && (TREE_CODE (DR_STEP (dra)) != INTEGER_CST >> + || TREE_CODE (DR_STEP (drb)) != INTEGER_CST)) >> + { >> + if (dump_enabled_p ()) >> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, >> + "step could be zero.\n"); >> + return true; >> + } >> + >> continue; >> } >> >> Index: gcc/testsuite/gcc.dg/vect/pr84485.c >> =================================================================== >> --- /dev/null 2018-02-26 10:26:41.624847060 +0000 >> +++ gcc/testsuite/gcc.dg/vect/pr84485.c 2018-02-28 14:16:36.620112862 +0000 >> @@ -0,0 +1,34 @@ >> +/* { dg-do run } */ >> + >> +#include "tree-vect.h" >> + >> +#define N 256 >> + >> +void __attribute__ ((noinline, noclone)) >> +f (unsigned long incx, unsigned long incy, >> + float *restrict dx, float *restrict dy) >> +{ >> + unsigned long ix = 0, iy = 0; >> + for (unsigned long i = 0; i < N; ++i) >> + { >> + dy[iy] += dx[ix]; >> + ix += incx; >> + iy += incy; >> + } >> +} >> + >> +float a = 0.0; >> +float b[N]; >> + >> +int >> +main (void) >> +{ >> + check_vect (); >> + >> + for (int i = 0; i < N; ++i) >> + b[i] = i; >> + f (1, 0, b, &a); >> + if (a != N * (N - 1) / 2) >> + __builtin_abort (); >> + return 0; >> +}
On Thu, Mar 1, 2018 at 12:38 PM, Richard Sandiford <richard.sandiford@linaro.org> wrote: > Richard Biener <richard.guenther@gmail.com> writes: >> On Wed, Feb 28, 2018 at 3:20 PM, Richard Sandiford >> <richard.sandiford@linaro.org> wrote: >>> GCC 6 and 7 would vectorise: >>> >>> void >>> f (unsigned long incx, unsigned long incy, >>> float *restrict dx, float *restrict dy) >>> { >>> unsigned long ix = 0, iy = 0; >>> for (unsigned long i = 0; i < 512; ++i) >>> { >>> dy[iy] += dx[ix]; >>> ix += incx; >>> iy += incy; >>> } >>> } >>> >>> without first proving that incy is nonzero. This is a regression from >>> GCC 5. It was fixed on trunk in r223486, which versioned the loop based >>> on whether incy is zero, but that's obviously too invasive to backport. >>> This patch instead bails out for non-constant steps in the place that >>> trunk would try a check for zeroness. >>> >>> I did wonder about trying to use range information to prove nonzeroness >>> for SSA_NAMEs, but that would be entirely new code and didn't seem >>> suitable for a release branch. >>> >>> Tested on aarch64-linux-gnu. OK for GCC 7 and 6? I'll add the testcase >>> to trunk too. >> >> Given dist == 0 isn't it enough to test either DR_STEP (dra) or DR_STEP (drb)? >> That seems what trunk is doing (just look at dr_zero_step_indicator of dra). >> Even when not using range-info I think that using ! >> tree_expr_nonzero_p (DR_STEP (dra)) >> is more to the point of the issue we're fixing -- that also would catch >> integer_zerop (DR_STEP (dra)) which trunk handles by failing as well but your >> patch wouldn't so it looks like a more "complete" fix. > > OK. > >> Last but not least trunk and your patch guards all this by >> !loop->force_vectorize. >> But force_vectorize doesn't give any such guarantee that step isn't >> zero so I wonder >> why we deliberately choose to possibly miscompile stuff here? > > This was based on the pre-existing: > > if (loop_vinfo && integer_zerop (step)) > { > GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt)) = NULL; > if (!nested_in_vect_loop_p (loop, stmt)) > return DR_IS_READ (dr); > /* Allow references with zero step for outer loops marked > with pragma omp simd only - it guarantees absence of > loop-carried dependencies between inner loop iterations. */ > if (!loop->force_vectorize) > { > if (dump_enabled_p ()) > dump_printf_loc (MSG_NOTE, vect_location, > "zero step in inner loop of nest\n"); > return false; > } > } > > AIUI #pragma omp simd really does guarantee that iterations of > the loop can be executed concurrently (up to the limit given by > safelen if present). So something like: > > #pragma omp simd > for (int i = 0; i < n; ++i) > a[i * step] += 1; > > would be incorrect for step==0. (#pragma ordered simd forces > things to be executed in order where necessary.) But then we should check safelen, not force_vectorize... Richard. > Thanks, > Richard > >> Thus I'd like to see a simpler >> >> + if (! tree_expr_nonzero_p (DR_STEP (dra))) >> + { >> + if (dump_enabled_p ()) >> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, >> + "step could be zero.\n"); >> + return true; >> + } >> >> if you think that works out and also the force_vectorize guard removed from the >> trunk version. >> >> OK with that change. >> >> Thanks, >> Richard. >> >>> Richard >>> >>> >>> 2018-02-28 Richard Sandiford <richard.sandiford@linaro.org> >>> >>> gcc/ >>> PR tree-optimization/84485 >>> * tree-vect-data-refs.c (vect_analyze_data_ref_dependence): Return >>> true for zero dependence distances if either step is variable, and if >>> there is no metadata that guarantees correctness. >>> >>> gcc/testsuite/ >>> PR tree-optimization/84485 >>> * gcc.dg/vect/pr84485.c: New test. >>> >>> Index: gcc/tree-vect-data-refs.c >>> =================================================================== >>> --- gcc/tree-vect-data-refs.c 2017-07-27 18:08:43.779978373 +0100 >>> +++ gcc/tree-vect-data-refs.c 2018-02-28 14:16:36.621113244 +0000 >>> @@ -394,6 +394,16 @@ vect_analyze_data_ref_dependence (struct >>> } >>> } >>> >>> + if (!loop->force_vectorize >>> + && (TREE_CODE (DR_STEP (dra)) != INTEGER_CST >>> + || TREE_CODE (DR_STEP (drb)) != INTEGER_CST)) >>> + { >>> + if (dump_enabled_p ()) >>> + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, >>> + "step could be zero.\n"); >>> + return true; >>> + } >>> + >>> continue; >>> } >>> >>> Index: gcc/testsuite/gcc.dg/vect/pr84485.c >>> =================================================================== >>> --- /dev/null 2018-02-26 10:26:41.624847060 +0000 >>> +++ gcc/testsuite/gcc.dg/vect/pr84485.c 2018-02-28 14:16:36.620112862 +0000 >>> @@ -0,0 +1,34 @@ >>> +/* { dg-do run } */ >>> + >>> +#include "tree-vect.h" >>> + >>> +#define N 256 >>> + >>> +void __attribute__ ((noinline, noclone)) >>> +f (unsigned long incx, unsigned long incy, >>> + float *restrict dx, float *restrict dy) >>> +{ >>> + unsigned long ix = 0, iy = 0; >>> + for (unsigned long i = 0; i < N; ++i) >>> + { >>> + dy[iy] += dx[ix]; >>> + ix += incx; >>> + iy += incy; >>> + } >>> +} >>> + >>> +float a = 0.0; >>> +float b[N]; >>> + >>> +int >>> +main (void) >>> +{ >>> + check_vect (); >>> + >>> + for (int i = 0; i < N; ++i) >>> + b[i] = i; >>> + f (1, 0, b, &a); >>> + if (a != N * (N - 1) / 2) >>> + __builtin_abort (); >>> + return 0; >>> +}
Richard Biener <richard.guenther@gmail.com> writes: > On Thu, Mar 1, 2018 at 12:38 PM, Richard Sandiford > <richard.sandiford@linaro.org> wrote: >> Richard Biener <richard.guenther@gmail.com> writes: >>> On Wed, Feb 28, 2018 at 3:20 PM, Richard Sandiford >>> <richard.sandiford@linaro.org> wrote: >>>> GCC 6 and 7 would vectorise: >>>> >>>> void >>>> f (unsigned long incx, unsigned long incy, >>>> float *restrict dx, float *restrict dy) >>>> { >>>> unsigned long ix = 0, iy = 0; >>>> for (unsigned long i = 0; i < 512; ++i) >>>> { >>>> dy[iy] += dx[ix]; >>>> ix += incx; >>>> iy += incy; >>>> } >>>> } >>>> >>>> without first proving that incy is nonzero. This is a regression from >>>> GCC 5. It was fixed on trunk in r223486, which versioned the loop based >>>> on whether incy is zero, but that's obviously too invasive to backport. >>>> This patch instead bails out for non-constant steps in the place that >>>> trunk would try a check for zeroness. >>>> >>>> I did wonder about trying to use range information to prove nonzeroness >>>> for SSA_NAMEs, but that would be entirely new code and didn't seem >>>> suitable for a release branch. >>>> >>>> Tested on aarch64-linux-gnu. OK for GCC 7 and 6? I'll add the testcase >>>> to trunk too. >>> >>> Given dist == 0 isn't it enough to test either DR_STEP (dra) or >>> DR_STEP (drb)? >>> That seems what trunk is doing (just look at dr_zero_step_indicator of dra). >>> Even when not using range-info I think that using ! >>> tree_expr_nonzero_p (DR_STEP (dra)) >>> is more to the point of the issue we're fixing -- that also would catch >>> integer_zerop (DR_STEP (dra)) which trunk handles by failing as well but your >>> patch wouldn't so it looks like a more "complete" fix. >> >> OK. >> >>> Last but not least trunk and your patch guards all this by >>> !loop->force_vectorize. >>> But force_vectorize doesn't give any such guarantee that step isn't >>> zero so I wonder >>> why we deliberately choose to possibly miscompile stuff here? >> >> This was based on the pre-existing: >> >> if (loop_vinfo && integer_zerop (step)) >> { >> GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt)) = NULL; >> if (!nested_in_vect_loop_p (loop, stmt)) >> return DR_IS_READ (dr); >> /* Allow references with zero step for outer loops marked >> with pragma omp simd only - it guarantees absence of >> loop-carried dependencies between inner loop iterations. */ >> if (!loop->force_vectorize) >> { >> if (dump_enabled_p ()) >> dump_printf_loc (MSG_NOTE, vect_location, >> "zero step in inner loop of nest\n"); >> return false; >> } >> } >> >> AIUI #pragma omp simd really does guarantee that iterations of >> the loop can be executed concurrently (up to the limit given by >> safelen if present). So something like: >> >> #pragma omp simd >> for (int i = 0; i < n; ++i) >> a[i * step] += 1; >> >> would be incorrect for step==0. (#pragma ordered simd forces >> things to be executed in order where necessary.) > > But then we should check safelen, not force_vectorize... OK. Following on from irc, this version uses expr_not_equal_to instead of tree_expr_nonzero_p. It also adds safelen to the existing use of force_vectorize (which seemed safer than replacing it). Tested on aarch64-linux-gnu. OK to install? Richard 2018-02-28 Richard Sandiford <richard.sandiford@linaro.org> gcc/ PR tree-optimization/84485 * tree-vect-data-refs.c (vect_analyze_data_ref_dependence): Return true for zero dependence distances if the step might be zero, and if there is no metadata that guarantees correctness. (vect_analyze_data_ref_access): Check safelen as well as force_vectorize. gcc/testsuite/ PR tree-optimization/84485 * gcc.dg/vect/pr84485.c: New test. Index: gcc/tree-vect-data-refs.c =================================================================== --- gcc/tree-vect-data-refs.c 2018-02-28 14:19:22.326678337 +0000 +++ gcc/tree-vect-data-refs.c 2018-03-02 13:28:06.339321095 +0000 @@ -394,6 +394,16 @@ vect_analyze_data_ref_dependence (struct } } + unsigned int step_prec = TYPE_PRECISION (TREE_TYPE (DR_STEP (dra))); + if (loop->safelen < 2 + && !expr_not_equal_to (DR_STEP (dra), wi::zero (step_prec))) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, + "step could be zero.\n"); + return true; + } + continue; } @@ -2515,7 +2525,7 @@ vect_analyze_data_ref_access (struct dat /* Allow references with zero step for outer loops marked with pragma omp simd only - it guarantees absence of loop-carried dependencies between inner loop iterations. */ - if (!loop->force_vectorize) + if (!loop->force_vectorize || loop->safelen < 2) { if (dump_enabled_p ()) dump_printf_loc (MSG_NOTE, vect_location, Index: gcc/testsuite/gcc.dg/vect/pr84485.c =================================================================== --- /dev/null 2018-03-01 08:17:49.562264353 +0000 +++ gcc/testsuite/gcc.dg/vect/pr84485.c 2018-03-02 13:28:06.338321095 +0000 @@ -0,0 +1,34 @@ +/* { dg-do run } */ + +#include "tree-vect.h" + +#define N 256 + +void __attribute__ ((noinline, noclone)) +f (unsigned long incx, unsigned long incy, + float *restrict dx, float *restrict dy) +{ + unsigned long ix = 0, iy = 0; + for (unsigned long i = 0; i < N; ++i) + { + dy[iy] += dx[ix]; + ix += incx; + iy += incy; + } +} + +float a = 0.0; +float b[N]; + +int +main (void) +{ + check_vect (); + + for (int i = 0; i < N; ++i) + b[i] = i; + f (1, 0, b, &a); + if (a != N * (N - 1) / 2) + __builtin_abort (); + return 0; +}
On Fri, Mar 2, 2018 at 3:12 PM, Richard Sandiford <richard.sandiford@linaro.org> wrote: > Richard Biener <richard.guenther@gmail.com> writes: >> On Thu, Mar 1, 2018 at 12:38 PM, Richard Sandiford >> <richard.sandiford@linaro.org> wrote: >>> Richard Biener <richard.guenther@gmail.com> writes: >>>> On Wed, Feb 28, 2018 at 3:20 PM, Richard Sandiford >>>> <richard.sandiford@linaro.org> wrote: >>>>> GCC 6 and 7 would vectorise: >>>>> >>>>> void >>>>> f (unsigned long incx, unsigned long incy, >>>>> float *restrict dx, float *restrict dy) >>>>> { >>>>> unsigned long ix = 0, iy = 0; >>>>> for (unsigned long i = 0; i < 512; ++i) >>>>> { >>>>> dy[iy] += dx[ix]; >>>>> ix += incx; >>>>> iy += incy; >>>>> } >>>>> } >>>>> >>>>> without first proving that incy is nonzero. This is a regression from >>>>> GCC 5. It was fixed on trunk in r223486, which versioned the loop based >>>>> on whether incy is zero, but that's obviously too invasive to backport. >>>>> This patch instead bails out for non-constant steps in the place that >>>>> trunk would try a check for zeroness. >>>>> >>>>> I did wonder about trying to use range information to prove nonzeroness >>>>> for SSA_NAMEs, but that would be entirely new code and didn't seem >>>>> suitable for a release branch. >>>>> >>>>> Tested on aarch64-linux-gnu. OK for GCC 7 and 6? I'll add the testcase >>>>> to trunk too. >>>> >>>> Given dist == 0 isn't it enough to test either DR_STEP (dra) or >>>> DR_STEP (drb)? >>>> That seems what trunk is doing (just look at dr_zero_step_indicator of dra). >>>> Even when not using range-info I think that using ! >>>> tree_expr_nonzero_p (DR_STEP (dra)) >>>> is more to the point of the issue we're fixing -- that also would catch >>>> integer_zerop (DR_STEP (dra)) which trunk handles by failing as well but your >>>> patch wouldn't so it looks like a more "complete" fix. >>> >>> OK. >>> >>>> Last but not least trunk and your patch guards all this by >>>> !loop->force_vectorize. >>>> But force_vectorize doesn't give any such guarantee that step isn't >>>> zero so I wonder >>>> why we deliberately choose to possibly miscompile stuff here? >>> >>> This was based on the pre-existing: >>> >>> if (loop_vinfo && integer_zerop (step)) >>> { >>> GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt)) = NULL; >>> if (!nested_in_vect_loop_p (loop, stmt)) >>> return DR_IS_READ (dr); >>> /* Allow references with zero step for outer loops marked >>> with pragma omp simd only - it guarantees absence of >>> loop-carried dependencies between inner loop iterations. */ >>> if (!loop->force_vectorize) >>> { >>> if (dump_enabled_p ()) >>> dump_printf_loc (MSG_NOTE, vect_location, >>> "zero step in inner loop of nest\n"); >>> return false; >>> } >>> } >>> >>> AIUI #pragma omp simd really does guarantee that iterations of >>> the loop can be executed concurrently (up to the limit given by >>> safelen if present). So something like: >>> >>> #pragma omp simd >>> for (int i = 0; i < n; ++i) >>> a[i * step] += 1; >>> >>> would be incorrect for step==0. (#pragma ordered simd forces >>> things to be executed in order where necessary.) >> >> But then we should check safelen, not force_vectorize... > > OK. Following on from irc, this version uses expr_not_equal_to > instead of tree_expr_nonzero_p. It also adds safelen to the existing > use of force_vectorize (which seemed safer than replacing it). > > Tested on aarch64-linux-gnu. OK to install? Ok. Richard. > Richard > > > 2018-02-28 Richard Sandiford <richard.sandiford@linaro.org> > > gcc/ > PR tree-optimization/84485 > * tree-vect-data-refs.c (vect_analyze_data_ref_dependence): Return > true for zero dependence distances if the step might be zero, > and if there is no metadata that guarantees correctness. > (vect_analyze_data_ref_access): Check safelen as well as > force_vectorize. > > gcc/testsuite/ > PR tree-optimization/84485 > * gcc.dg/vect/pr84485.c: New test. > > Index: gcc/tree-vect-data-refs.c > =================================================================== > --- gcc/tree-vect-data-refs.c 2018-02-28 14:19:22.326678337 +0000 > +++ gcc/tree-vect-data-refs.c 2018-03-02 13:28:06.339321095 +0000 > @@ -394,6 +394,16 @@ vect_analyze_data_ref_dependence (struct > } > } > > + unsigned int step_prec = TYPE_PRECISION (TREE_TYPE (DR_STEP (dra))); > + if (loop->safelen < 2 > + && !expr_not_equal_to (DR_STEP (dra), wi::zero (step_prec))) > + { > + if (dump_enabled_p ()) > + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > + "step could be zero.\n"); > + return true; > + } > + > continue; > } > > @@ -2515,7 +2525,7 @@ vect_analyze_data_ref_access (struct dat > /* Allow references with zero step for outer loops marked > with pragma omp simd only - it guarantees absence of > loop-carried dependencies between inner loop iterations. */ > - if (!loop->force_vectorize) > + if (!loop->force_vectorize || loop->safelen < 2) > { > if (dump_enabled_p ()) > dump_printf_loc (MSG_NOTE, vect_location, > Index: gcc/testsuite/gcc.dg/vect/pr84485.c > =================================================================== > --- /dev/null 2018-03-01 08:17:49.562264353 +0000 > +++ gcc/testsuite/gcc.dg/vect/pr84485.c 2018-03-02 13:28:06.338321095 +0000 > @@ -0,0 +1,34 @@ > +/* { dg-do run } */ > + > +#include "tree-vect.h" > + > +#define N 256 > + > +void __attribute__ ((noinline, noclone)) > +f (unsigned long incx, unsigned long incy, > + float *restrict dx, float *restrict dy) > +{ > + unsigned long ix = 0, iy = 0; > + for (unsigned long i = 0; i < N; ++i) > + { > + dy[iy] += dx[ix]; > + ix += incx; > + iy += incy; > + } > +} > + > +float a = 0.0; > +float b[N]; > + > +int > +main (void) > +{ > + check_vect (); > + > + for (int i = 0; i < N; ++i) > + b[i] = i; > + f (1, 0, b, &a); > + if (a != N * (N - 1) / 2) > + __builtin_abort (); > + return 0; > +}
Index: gcc/tree-vect-data-refs.c =================================================================== --- gcc/tree-vect-data-refs.c 2017-07-27 18:08:43.779978373 +0100 +++ gcc/tree-vect-data-refs.c 2018-02-28 14:16:36.621113244 +0000 @@ -394,6 +394,16 @@ vect_analyze_data_ref_dependence (struct } } + if (!loop->force_vectorize + && (TREE_CODE (DR_STEP (dra)) != INTEGER_CST + || TREE_CODE (DR_STEP (drb)) != INTEGER_CST)) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, + "step could be zero.\n"); + return true; + } + continue; } Index: gcc/testsuite/gcc.dg/vect/pr84485.c =================================================================== --- /dev/null 2018-02-26 10:26:41.624847060 +0000 +++ gcc/testsuite/gcc.dg/vect/pr84485.c 2018-02-28 14:16:36.620112862 +0000 @@ -0,0 +1,34 @@ +/* { dg-do run } */ + +#include "tree-vect.h" + +#define N 256 + +void __attribute__ ((noinline, noclone)) +f (unsigned long incx, unsigned long incy, + float *restrict dx, float *restrict dy) +{ + unsigned long ix = 0, iy = 0; + for (unsigned long i = 0; i < N; ++i) + { + dy[iy] += dx[ix]; + ix += incx; + iy += incy; + } +} + +float a = 0.0; +float b[N]; + +int +main (void) +{ + check_vect (); + + for (int i = 0; i < N; ++i) + b[i] = i; + f (1, 0, b, &a); + if (a != N * (N - 1) / 2) + __builtin_abort (); + return 0; +}