Message ID | 201206151313.q5FDDK22032184@d06av02.portsmouth.uk.ibm.com |
---|---|
State | Accepted |
Headers | show |
On Fri, Jun 15, 2012 at 3:13 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote: > Hello, > > PR tree-optimization/53636 is a crash due to an invalid unaligned access > generated by the vectorizer. > > The problem is that vect_compute_data_ref_alignment uses DR_ALIGNED_TO > as computed by the default data-ref analysis to decide whether an access > is sufficiently aligned for the vectorizer. > > However, this analysis computes the scalar evolution relative to the > innermost loop in which the access takes place; DR_ALIGNED_TO only > reflects the known alignmnent of the *base* address according to > that evolution. > > Now, if we're actually about to vectorize this particular loop, then > just checking the alignment of the base is exactly what we need to do > (subsequent accesses will usually be misaligned, but that's OK since > we're transforming those into a vector access). > > However, if we're actually currently vectorizing something else, this > test is not sufficient. The code currently already checks for the > case where we're performing outer-loop vectorization. In this case, > we need to check alignment of the access on *every* pass through the > inner loop, as the comment states: > > /* In case the dataref is in an inner-loop of the loop that is being > vectorized (LOOP), we use the base and misalignment information > relative to the outer-loop (LOOP). This is ok only if the misalignment > stays the same throughout the execution of the inner-loop, which is why > we have to check that the stride of the dataref in the inner-loop evenly > divides by the vector size. */ > > However, there is a second case where we need to check every pass: if > we're not actually vectorizing any loop, but are performing basic-block > SLP. In this case, it would appear that we need the same check as > described in the comment above, i.e. to verify that the stride is a > multiple of the vector size. > > The patch below adds this check, and this indeed fixes the invalid access > I was seeing in the test case (in the final assembler, we now get a vld1.16 > instead of vldr). > > Tested on arm-linux-gnueabi with no regressions. > > OK for mainline? Ok. Thanks, Richard. > Bye, > Ulrich > > > ChangeLog: > > gcc/ > PR tree-optimization/53636 > * tree-vect-data-refs.c (vect_compute_data_ref_alignment): Verify > stride when doing basic-block vectorization. > > gcc/testsuite/ > PR tree-optimization/53636 > * gcc.target/arm/pr53636.c: New test. > > > === added file 'gcc/testsuite/gcc.target/arm/pr53636.c' > --- gcc/testsuite/gcc.target/arm/pr53636.c 1970-01-01 00:00:00 +0000 > +++ gcc/testsuite/gcc.target/arm/pr53636.c 2012-06-11 17:31:41 +0000 > @@ -0,0 +1,48 @@ > +/* { dg-do run } */ > +/* { dg-require-effective-target arm_neon_hw } */ > +/* { dg-options "-O -ftree-vectorize" } */ > +/* { dg-add-options arm_neon } */ > + > +void fill (short *buf) __attribute__ ((noinline)); > +void fill (short *buf) > +{ > + int i; > + > + for (i = 0; i < 11 * 8; i++) > + buf[i] = i; > +} > + > +void test (unsigned char *dst) __attribute__ ((noinline)); > +void test (unsigned char *dst) > +{ > + short tmp[11 * 8], *tptr; > + int i; > + > + fill (tmp); > + > + tptr = tmp; > + for (i = 0; i < 8; i++) > + { > + dst[0] = (-tptr[0] + 9 * tptr[0 + 1] + 9 * tptr[0 + 2] - tptr[0 + 3]) >> 7; > + dst[1] = (-tptr[1] + 9 * tptr[1 + 1] + 9 * tptr[1 + 2] - tptr[1 + 3]) >> 7; > + dst[2] = (-tptr[2] + 9 * tptr[2 + 1] + 9 * tptr[2 + 2] - tptr[2 + 3]) >> 7; > + dst[3] = (-tptr[3] + 9 * tptr[3 + 1] + 9 * tptr[3 + 2] - tptr[3 + 3]) >> 7; > + dst[4] = (-tptr[4] + 9 * tptr[4 + 1] + 9 * tptr[4 + 2] - tptr[4 + 3]) >> 7; > + dst[5] = (-tptr[5] + 9 * tptr[5 + 1] + 9 * tptr[5 + 2] - tptr[5 + 3]) >> 7; > + dst[6] = (-tptr[6] + 9 * tptr[6 + 1] + 9 * tptr[6 + 2] - tptr[6 + 3]) >> 7; > + dst[7] = (-tptr[7] + 9 * tptr[7 + 1] + 9 * tptr[7 + 2] - tptr[7 + 3]) >> 7; > + > + dst += 8; > + tptr += 11; > + } > +} > + > +int main (void) > +{ > + char buf [8 * 8]; > + > + test (buf); > + > + return 0; > +} > + > > === modified file 'gcc/tree-vect-data-refs.c' > --- gcc/tree-vect-data-refs.c 2012-05-31 08:46:10 +0000 > +++ gcc/tree-vect-data-refs.c 2012-06-11 17:31:41 +0000 > @@ -845,6 +845,24 @@ > } > } > > + /* Similarly, if we're doing basic-block vectorization, we can only use > + base and misalignment information relative to an innermost loop if the > + misalignment stays the same throughout the execution of the loop. > + As above, this is the case if the stride of the dataref evenly divides > + by the vector size. */ > + if (!loop) > + { > + tree step = DR_STEP (dr); > + HOST_WIDE_INT dr_step = TREE_INT_CST_LOW (step); > + > + if (dr_step % GET_MODE_SIZE (TYPE_MODE (vectype)) != 0) > + { > + if (vect_print_dump_info (REPORT_ALIGNMENT)) > + fprintf (vect_dump, "SLP: step doesn't divide the vector-size."); > + misalign = NULL_TREE; > + } > + } > + > base = build_fold_indirect_ref (base_addr); > alignment = ssize_int (TYPE_ALIGN (vectype)/BITS_PER_UNIT); > > > -- > Dr. Ulrich Weigand > GNU Toolchain for Linux on System z and Cell BE > Ulrich.Weigand@de.ibm.com >
On Fri, Jun 15, 2012 at 5:00 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote: > Richard Guenther wrote: >> On Fri, Jun 15, 2012 at 3:13 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote: >> > However, there is a second case where we need to check every pass: if >> > we're not actually vectorizing any loop, but are performing basic-block >> > SLP. In this case, it would appear that we need the same check as >> > described in the comment above, i.e. to verify that the stride is a >> > multiple of the vector size. >> > >> > The patch below adds this check, and this indeed fixes the invalid access >> > I was seeing in the test case (in the final assembler, we now get a >> > vld1.16 instead of vldr). >> > >> > Tested on arm-linux-gnueabi with no regressions. >> > >> > OK for mainline? >> >> Ok. > > Thanks for the quick review; I've checked this in to mainline now. > > I just noticed that the test case also crashes on 4.7, but not on 4.6. > > Would a backport to 4.7 also be OK, once testing passes? Yes. Please leave it on mainline a few days to catch fallout from autotesters. Thanks, Richard. > Thanks, > Ulrich > > -- > Dr. Ulrich Weigand > GNU Toolchain for Linux on System z and Cell BE > Ulrich.Weigand@de.ibm.com >
Richard Guenther writes: > On Fri, Jun 15, 2012 at 5:00 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote: > > Richard Guenther wrote: > >> On Fri, Jun 15, 2012 at 3:13 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote: > >> > However, there is a second case where we need to check every pass: if > >> > we're not actually vectorizing any loop, but are performing basic-block > >> > SLP. In this case, it would appear that we need the same check as > >> > described in the comment above, i.e. to verify that the stride is a > >> > multiple of the vector size. > >> > > >> > The patch below adds this check, and this indeed fixes the invalid access > >> > I was seeing in the test case (in the final assembler, we now get a > >> > vld1.16 instead of vldr). > >> > > >> > Tested on arm-linux-gnueabi with no regressions. > >> > > >> > OK for mainline? > >> > >> Ok. > > > > Thanks for the quick review; I've checked this in to mainline now. > > > > I just noticed that the test case also crashes on 4.7, but not on 4.6. > > > > Would a backport to 4.7 also be OK, once testing passes? > > Yes. Please leave it on mainline a few days to catch fallout from > autotesters. This patch caused FAIL: gcc.dg/vect/bb-slp-16.c scan-tree-dump-times slp "basic block vectorized using SLP" 1 on sparc64-linux. Comparing the pre and post patch dumps for that file shows 22: vect_compute_data_ref_alignment: 22: misalign = 4 bytes of ref MEM[(unsigned int *)pout_90 + 28B] 22: vect_compute_data_ref_alignment: -22: force alignment of arr[i_87] -22: misalign = 0 bytes of ref arr[i_87] +22: SLP: step doesn't divide the vector-size. +22: Unknown alignment for access: arr (lots of stuff that's simply gone) -22: BASIC BLOCK VECTORIZED - -22: basic block vectorized using SLP +22: not vectorized: unsupported unaligned store.arr[i_87] +22: not vectorized: unsupported alignment in basic block. /Mikael
On Tue, Jun 19, 2012 at 11:36 PM, Mikael Pettersson <mikpe@it.uu.se> wrote: > Richard Guenther writes: > > On Fri, Jun 15, 2012 at 5:00 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote: > > > Richard Guenther wrote: > > >> On Fri, Jun 15, 2012 at 3:13 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote: > > >> > However, there is a second case where we need to check every pass: if > > >> > we're not actually vectorizing any loop, but are performing basic-block > > >> > SLP. In this case, it would appear that we need the same check as > > >> > described in the comment above, i.e. to verify that the stride is a > > >> > multiple of the vector size. > > >> > > > >> > The patch below adds this check, and this indeed fixes the invalid access > > >> > I was seeing in the test case (in the final assembler, we now get a > > >> > vld1.16 instead of vldr). > > >> > > > >> > Tested on arm-linux-gnueabi with no regressions. > > >> > > > >> > OK for mainline? > > >> > > >> Ok. > > > > > > Thanks for the quick review; I've checked this in to mainline now. > > > > > > I just noticed that the test case also crashes on 4.7, but not on 4.6. > > > > > > Would a backport to 4.7 also be OK, once testing passes? > > > > Yes. Please leave it on mainline a few days to catch fallout from > > autotesters. > > This patch caused > > FAIL: gcc.dg/vect/bb-slp-16.c scan-tree-dump-times slp "basic block vectorized using SLP" 1 > > on sparc64-linux. Comparing the pre and post patch dumps for that file shows > > 22: vect_compute_data_ref_alignment: > 22: misalign = 4 bytes of ref MEM[(unsigned int *)pout_90 + 28B] > 22: vect_compute_data_ref_alignment: > -22: force alignment of arr[i_87] > -22: misalign = 0 bytes of ref arr[i_87] > +22: SLP: step doesn't divide the vector-size. > +22: Unknown alignment for access: arr > > (lots of stuff that's simply gone) > > -22: BASIC BLOCK VECTORIZED > - > -22: basic block vectorized using SLP > +22: not vectorized: unsupported unaligned store.arr[i_87] > +22: not vectorized: unsupported alignment in basic block. In this testcase the alignment of arr[i] should be irrelevant - it is not part of the stmts that are going to be vectorized. But of course this may be simply an odering issue in how we analyze data-references / statements in basic-block vectorization (thus we possibly did not yet declare the arr[i] = i statement as not taking part in the vectorization). The line > -22: force alignment of arr[i_87] is odd, too - as said we do not need to touch arr when vectorizing the basic-block. Ulrich, can you look into this or do you want me to take a look here? Mikael - please open a bugreport for this. Thanks, Richard. > /Mikael
Richard Guenther writes: > On Tue, Jun 19, 2012 at 11:36 PM, Mikael Pettersson <mikpe@it.uu.se> wrote: > > Richard Guenther writes: > > > On Fri, Jun 15, 2012 at 5:00 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote: > > > > Richard Guenther wrote: > > > >> On Fri, Jun 15, 2012 at 3:13 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote: > > > >> > However, there is a second case where we need to check every pass: if > > > >> > we're not actually vectorizing any loop, but are performing basic-block > > > >> > SLP. In this case, it would appear that we need the same check as > > > >> > described in the comment above, i.e. to verify that the stride is a > > > >> > multiple of the vector size. > > > >> > > > > >> > The patch below adds this check, and this indeed fixes the invalid access > > > >> > I was seeing in the test case (in the final assembler, we now get a > > > >> > vld1.16 instead of vldr). > > > >> > > > > >> > Tested on arm-linux-gnueabi with no regressions. > > > >> > > > > >> > OK for mainline? > > > >> > > > >> Ok. > > > > > > > > Thanks for the quick review; I've checked this in to mainline now. > > > > > > > > I just noticed that the test case also crashes on 4.7, but not on 4.6. > > > > > > > > Would a backport to 4.7 also be OK, once testing passes? > > > > > > Yes. Please leave it on mainline a few days to catch fallout from > > > autotesters. > > > > This patch caused > > > > FAIL: gcc.dg/vect/bb-slp-16.c scan-tree-dump-times slp "basic block vectorized using SLP" 1 > > > > on sparc64-linux. Comparing the pre and post patch dumps for that file shows > > > > 22: vect_compute_data_ref_alignment: > > 22: misalign = 4 bytes of ref MEM[(unsigned int *)pout_90 + 28B] > > 22: vect_compute_data_ref_alignment: > > -22: force alignment of arr[i_87] > > -22: misalign = 0 bytes of ref arr[i_87] > > +22: SLP: step doesn't divide the vector-size. > > +22: Unknown alignment for access: arr > > > > (lots of stuff that's simply gone) > > > > -22: BASIC BLOCK VECTORIZED > > - > > -22: basic block vectorized using SLP > > +22: not vectorized: unsupported unaligned store.arr[i_87] > > +22: not vectorized: unsupported alignment in basic block. > > In this testcase the alignment of arr[i] should be irrelevant - it is > not part of > the stmts that are going to be vectorized. But of course this may be > simply an odering issue in how we analyze data-references / statements > in basic-block vectorization (thus we possibly did not yet declare the > arr[i] = i statement as not taking part in the vectorization). > > The line > > > -22: force alignment of arr[i_87] > > is odd, too - as said we do not need to touch arr when vectorizing the > basic-block. > > Ulrich, can you look into this or do you want me to take a look here? > > Mikael - please open a bugreport for this. I opened PR53729 for this, with an update saying that powerpc64-linux also has this regression. /Mikael
Ulrich Weigand writes: > Richard Guenther wrote: > > > In this testcase the alignment of arr[i] should be irrelevant - it is > > not part of > > the stmts that are going to be vectorized. But of course this may be > > simply an odering issue in how we analyze data-references / statements > > in basic-block vectorization (thus we possibly did not yet declare the > > arr[i] = i statement as not taking part in the vectorization). > > The following patch changes tree-vect-data-refs.c:vect_verify_datarefs_alignment > to only take into account statements marked as "relevant". > > This should have no impact for loop-based vectorization, since the only caller > (vect_enhance_data_refs_alignment) already skips irrelevant statements. > [ As an aside, that routine should probably use STMT_VINFO_RELEVANT_P instead > of just checking STMT_VINFO_RELEVANT as a boolean. ] > > However, for SLP this change in itself doesn't work, since vect_slp_analyze_bb_1 > calls vect_verify_datarefs_alignment *before* statements have actually been > marked as relevant or irrelevant. Therefore, the patch also rearranges the > sequence in vect_slp_analyze_bb_1. > > This in turn caused ICEs in vect_get_store_cost/vect_get_load_cost, since those > now can get called with statements with unsupported alignment. There is already > one caller (vect_get_data_access_cost) that checks for this case and simply > returns "infinite" cost instead of aborting. The patch moves this behaviour > into vect_get_store_cost/vect_get_load_cost themselves. (The particular cost > shouldn't matter since vectorization will still be rejected in the end, it's > just that the test now runs a bit later.) > > Overall, I've tested the patch with no regresions on powerpc64-linux and > arm-linux-gnueabi. On PowerPC, it fixed the gcc.dg/vect/bb-slp-16.c test. > > Mikael, would you mind verifying that it fixes the problem on sparc64? On sparc64-linux it fixed the bb-slp-16.c regression and didn't cause any new ones. /Mikael > > OK for mainline? > > Bye, > Ulrich > > > ChangeLog: > > PR tree-optimization/53729 > PR tree-optimization/53636 > * tree-vect-slp.c (vect_slp_analyze_bb_1): Delay call to > vect_verify_datarefs_alignment until after statements have > been marked as relevant/irrelevant. > * tree-vect-data-refs.c (vect_verify_datarefs_alignment): > Skip irrelevant statements. > (vect_enhance_data_refs_alignment): Use STMT_VINFO_RELEVANT_P > instead of STMT_VINFO_RELEVANT. > (vect_get_data_access_cost): Do not check for supportable > alignment before calling vect_get_load_cost/vect_get_store_cost. > * tree-vect-stmts.c (vect_get_store_cost): Do not abort when > handling unsupported alignment. > (vect_get_load_cost): Likewise. > > > Index: gcc/tree-vect-data-refs.c > =================================================================== > *** gcc/tree-vect-data-refs.c (revision 188850) > --- gcc/tree-vect-data-refs.c (working copy) > *************** vect_verify_datarefs_alignment (loop_vec > *** 1094,1099 **** > --- 1094,1102 ---- > gimple stmt = DR_STMT (dr); > stmt_vec_info stmt_info = vinfo_for_stmt (stmt); > > + if (!STMT_VINFO_RELEVANT_P (stmt_info)) > + continue; > + > /* For interleaving, only the alignment of the first access matters. > Skip statements marked as not vectorizable. */ > if ((STMT_VINFO_GROUPED_ACCESS (stmt_info) > *************** vect_get_data_access_cost (struct data_r > *** 1213,1229 **** > loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info); > int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo); > int ncopies = vf / nunits; > - bool supportable_dr_alignment = vect_supportable_dr_alignment (dr, true); > > ! if (!supportable_dr_alignment) > ! *inside_cost = VECT_MAX_COST; > else > ! { > ! if (DR_IS_READ (dr)) > ! vect_get_load_cost (dr, ncopies, true, inside_cost, outside_cost); > ! else > ! vect_get_store_cost (dr, ncopies, inside_cost); > ! } > > if (vect_print_dump_info (REPORT_COST)) > fprintf (vect_dump, "vect_get_data_access_cost: inside_cost = %d, " > --- 1216,1226 ---- > loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info); > int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo); > int ncopies = vf / nunits; > > ! if (DR_IS_READ (dr)) > ! vect_get_load_cost (dr, ncopies, true, inside_cost, outside_cost); > else > ! vect_get_store_cost (dr, ncopies, inside_cost); > > if (vect_print_dump_info (REPORT_COST)) > fprintf (vect_dump, "vect_get_data_access_cost: inside_cost = %d, " > *************** vect_enhance_data_refs_alignment (loop_v > *** 1537,1543 **** > stmt = DR_STMT (dr); > stmt_info = vinfo_for_stmt (stmt); > > ! if (!STMT_VINFO_RELEVANT (stmt_info)) > continue; > > /* For interleaving, only the alignment of the first access > --- 1534,1540 ---- > stmt = DR_STMT (dr); > stmt_info = vinfo_for_stmt (stmt); > > ! if (!STMT_VINFO_RELEVANT_P (stmt_info)) > continue; > > /* For interleaving, only the alignment of the first access > Index: gcc/tree-vect-stmts.c > =================================================================== > *** gcc/tree-vect-stmts.c (revision 188850) > --- gcc/tree-vect-stmts.c (working copy) > *************** vect_get_store_cost (struct data_referen > *** 931,936 **** > --- 931,946 ---- > break; > } > > + case dr_unaligned_unsupported: > + { > + *inside_cost = VECT_MAX_COST; > + > + if (vect_print_dump_info (REPORT_COST)) > + fprintf (vect_dump, "vect_model_store_cost: unsupported access."); > + > + break; > + } > + > default: > gcc_unreachable (); > } > *************** vect_get_load_cost (struct data_referenc > *** 1094,1099 **** > --- 1104,1119 ---- > break; > } > > + case dr_unaligned_unsupported: > + { > + *inside_cost = VECT_MAX_COST; > + > + if (vect_print_dump_info (REPORT_COST)) > + fprintf (vect_dump, "vect_model_load_cost: unsupported access."); > + > + break; > + } > + > default: > gcc_unreachable (); > } > Index: gcc/tree-vect-slp.c > =================================================================== > *** gcc/tree-vect-slp.c (revision 188850) > --- gcc/tree-vect-slp.c (working copy) > *************** vect_slp_analyze_bb_1 (basic_block bb) > *** 2050,2065 **** > return NULL; > } > > - if (!vect_verify_datarefs_alignment (NULL, bb_vinfo)) > - { > - if (vect_print_dump_info (REPORT_UNVECTORIZED_LOCATIONS)) > - fprintf (vect_dump, "not vectorized: unsupported alignment in basic " > - "block.\n"); > - > - destroy_bb_vec_info (bb_vinfo); > - return NULL; > - } > - > /* Check the SLP opportunities in the basic block, analyze and build SLP > trees. */ > if (!vect_analyze_slp (NULL, bb_vinfo)) > --- 2050,2055 ---- > *************** vect_slp_analyze_bb_1 (basic_block bb) > *** 2082,2087 **** > --- 2072,2087 ---- > vect_mark_slp_stmts_relevant (SLP_INSTANCE_TREE (instance)); > } > > + if (!vect_verify_datarefs_alignment (NULL, bb_vinfo)) > + { > + if (vect_print_dump_info (REPORT_UNVECTORIZED_LOCATIONS)) > + fprintf (vect_dump, "not vectorized: unsupported alignment in basic " > + "block.\n"); > + > + destroy_bb_vec_info (bb_vinfo); > + return NULL; > + } > + > if (!vect_slp_analyze_operations (bb_vinfo)) > { > if (vect_print_dump_info (REPORT_UNVECTORIZED_LOCATIONS)) > > -- > Dr. Ulrich Weigand > GNU Toolchain for Linux on System z and Cell BE > Ulrich.Weigand@de.ibm.com >
On Mon, Jun 25, 2012 at 5:44 PM, Ulrich Weigand <uweigand@de.ibm.com> wrote: > Richard Guenther wrote: > >> In this testcase the alignment of arr[i] should be irrelevant - it is >> not part of >> the stmts that are going to be vectorized. But of course this may be >> simply an odering issue in how we analyze data-references / statements >> in basic-block vectorization (thus we possibly did not yet declare the >> arr[i] = i statement as not taking part in the vectorization). > > The following patch changes tree-vect-data-refs.c:vect_verify_datarefs_alignment > to only take into account statements marked as "relevant". > > This should have no impact for loop-based vectorization, since the only caller > (vect_enhance_data_refs_alignment) already skips irrelevant statements. > [ As an aside, that routine should probably use STMT_VINFO_RELEVANT_P instead > of just checking STMT_VINFO_RELEVANT as a boolean. ] > > However, for SLP this change in itself doesn't work, since vect_slp_analyze_bb_1 > calls vect_verify_datarefs_alignment *before* statements have actually been > marked as relevant or irrelevant. Therefore, the patch also rearranges the > sequence in vect_slp_analyze_bb_1. > > This in turn caused ICEs in vect_get_store_cost/vect_get_load_cost, since those > now can get called with statements with unsupported alignment. There is already > one caller (vect_get_data_access_cost) that checks for this case and simply > returns "infinite" cost instead of aborting. The patch moves this behaviour > into vect_get_store_cost/vect_get_load_cost themselves. (The particular cost > shouldn't matter since vectorization will still be rejected in the end, it's > just that the test now runs a bit later.) > > Overall, I've tested the patch with no regresions on powerpc64-linux and > arm-linux-gnueabi. On PowerPC, it fixed the gcc.dg/vect/bb-slp-16.c test. > > Mikael, would you mind verifying that it fixes the problem on sparc64? > > OK for mainline? Ok. Thanks, Richard. > Bye, > Ulrich > > > ChangeLog: > > PR tree-optimization/53729 > PR tree-optimization/53636 > * tree-vect-slp.c (vect_slp_analyze_bb_1): Delay call to > vect_verify_datarefs_alignment until after statements have > been marked as relevant/irrelevant. > * tree-vect-data-refs.c (vect_verify_datarefs_alignment): > Skip irrelevant statements. > (vect_enhance_data_refs_alignment): Use STMT_VINFO_RELEVANT_P > instead of STMT_VINFO_RELEVANT. > (vect_get_data_access_cost): Do not check for supportable > alignment before calling vect_get_load_cost/vect_get_store_cost. > * tree-vect-stmts.c (vect_get_store_cost): Do not abort when > handling unsupported alignment. > (vect_get_load_cost): Likewise. > > > Index: gcc/tree-vect-data-refs.c > =================================================================== > *** gcc/tree-vect-data-refs.c (revision 188850) > --- gcc/tree-vect-data-refs.c (working copy) > *************** vect_verify_datarefs_alignment (loop_vec > *** 1094,1099 **** > --- 1094,1102 ---- > gimple stmt = DR_STMT (dr); > stmt_vec_info stmt_info = vinfo_for_stmt (stmt); > > + if (!STMT_VINFO_RELEVANT_P (stmt_info)) > + continue; > + > /* For interleaving, only the alignment of the first access matters. > Skip statements marked as not vectorizable. */ > if ((STMT_VINFO_GROUPED_ACCESS (stmt_info) > *************** vect_get_data_access_cost (struct data_r > *** 1213,1229 **** > loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info); > int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo); > int ncopies = vf / nunits; > - bool supportable_dr_alignment = vect_supportable_dr_alignment (dr, true); > > ! if (!supportable_dr_alignment) > ! *inside_cost = VECT_MAX_COST; > else > ! { > ! if (DR_IS_READ (dr)) > ! vect_get_load_cost (dr, ncopies, true, inside_cost, outside_cost); > ! else > ! vect_get_store_cost (dr, ncopies, inside_cost); > ! } > > if (vect_print_dump_info (REPORT_COST)) > fprintf (vect_dump, "vect_get_data_access_cost: inside_cost = %d, " > --- 1216,1226 ---- > loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info); > int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo); > int ncopies = vf / nunits; > > ! if (DR_IS_READ (dr)) > ! vect_get_load_cost (dr, ncopies, true, inside_cost, outside_cost); > else > ! vect_get_store_cost (dr, ncopies, inside_cost); > > if (vect_print_dump_info (REPORT_COST)) > fprintf (vect_dump, "vect_get_data_access_cost: inside_cost = %d, " > *************** vect_enhance_data_refs_alignment (loop_v > *** 1537,1543 **** > stmt = DR_STMT (dr); > stmt_info = vinfo_for_stmt (stmt); > > ! if (!STMT_VINFO_RELEVANT (stmt_info)) > continue; > > /* For interleaving, only the alignment of the first access > --- 1534,1540 ---- > stmt = DR_STMT (dr); > stmt_info = vinfo_for_stmt (stmt); > > ! if (!STMT_VINFO_RELEVANT_P (stmt_info)) > continue; > > /* For interleaving, only the alignment of the first access > Index: gcc/tree-vect-stmts.c > =================================================================== > *** gcc/tree-vect-stmts.c (revision 188850) > --- gcc/tree-vect-stmts.c (working copy) > *************** vect_get_store_cost (struct data_referen > *** 931,936 **** > --- 931,946 ---- > break; > } > > + case dr_unaligned_unsupported: > + { > + *inside_cost = VECT_MAX_COST; > + > + if (vect_print_dump_info (REPORT_COST)) > + fprintf (vect_dump, "vect_model_store_cost: unsupported access."); > + > + break; > + } > + > default: > gcc_unreachable (); > } > *************** vect_get_load_cost (struct data_referenc > *** 1094,1099 **** > --- 1104,1119 ---- > break; > } > > + case dr_unaligned_unsupported: > + { > + *inside_cost = VECT_MAX_COST; > + > + if (vect_print_dump_info (REPORT_COST)) > + fprintf (vect_dump, "vect_model_load_cost: unsupported access."); > + > + break; > + } > + > default: > gcc_unreachable (); > } > Index: gcc/tree-vect-slp.c > =================================================================== > *** gcc/tree-vect-slp.c (revision 188850) > --- gcc/tree-vect-slp.c (working copy) > *************** vect_slp_analyze_bb_1 (basic_block bb) > *** 2050,2065 **** > return NULL; > } > > - if (!vect_verify_datarefs_alignment (NULL, bb_vinfo)) > - { > - if (vect_print_dump_info (REPORT_UNVECTORIZED_LOCATIONS)) > - fprintf (vect_dump, "not vectorized: unsupported alignment in basic " > - "block.\n"); > - > - destroy_bb_vec_info (bb_vinfo); > - return NULL; > - } > - > /* Check the SLP opportunities in the basic block, analyze and build SLP > trees. */ > if (!vect_analyze_slp (NULL, bb_vinfo)) > --- 2050,2055 ---- > *************** vect_slp_analyze_bb_1 (basic_block bb) > *** 2082,2087 **** > --- 2072,2087 ---- > vect_mark_slp_stmts_relevant (SLP_INSTANCE_TREE (instance)); > } > > + if (!vect_verify_datarefs_alignment (NULL, bb_vinfo)) > + { > + if (vect_print_dump_info (REPORT_UNVECTORIZED_LOCATIONS)) > + fprintf (vect_dump, "not vectorized: unsupported alignment in basic " > + "block.\n"); > + > + destroy_bb_vec_info (bb_vinfo); > + return NULL; > + } > + > if (!vect_slp_analyze_operations (bb_vinfo)) > { > if (vect_print_dump_info (REPORT_UNVECTORIZED_LOCATIONS)) > > -- > Dr. Ulrich Weigand > GNU Toolchain for Linux on System z and Cell BE > Ulrich.Weigand@de.ibm.com >
=== added file 'gcc/testsuite/gcc.target/arm/pr53636.c' --- gcc/testsuite/gcc.target/arm/pr53636.c 1970-01-01 00:00:00 +0000 +++ gcc/testsuite/gcc.target/arm/pr53636.c 2012-06-11 17:31:41 +0000 @@ -0,0 +1,48 @@ +/* { dg-do run } */ +/* { dg-require-effective-target arm_neon_hw } */ +/* { dg-options "-O -ftree-vectorize" } */ +/* { dg-add-options arm_neon } */ + +void fill (short *buf) __attribute__ ((noinline)); +void fill (short *buf) +{ + int i; + + for (i = 0; i < 11 * 8; i++) + buf[i] = i; +} + +void test (unsigned char *dst) __attribute__ ((noinline)); +void test (unsigned char *dst) +{ + short tmp[11 * 8], *tptr; + int i; + + fill (tmp); + + tptr = tmp; + for (i = 0; i < 8; i++) + { + dst[0] = (-tptr[0] + 9 * tptr[0 + 1] + 9 * tptr[0 + 2] - tptr[0 + 3]) >> 7; + dst[1] = (-tptr[1] + 9 * tptr[1 + 1] + 9 * tptr[1 + 2] - tptr[1 + 3]) >> 7; + dst[2] = (-tptr[2] + 9 * tptr[2 + 1] + 9 * tptr[2 + 2] - tptr[2 + 3]) >> 7; + dst[3] = (-tptr[3] + 9 * tptr[3 + 1] + 9 * tptr[3 + 2] - tptr[3 + 3]) >> 7; + dst[4] = (-tptr[4] + 9 * tptr[4 + 1] + 9 * tptr[4 + 2] - tptr[4 + 3]) >> 7; + dst[5] = (-tptr[5] + 9 * tptr[5 + 1] + 9 * tptr[5 + 2] - tptr[5 + 3]) >> 7; + dst[6] = (-tptr[6] + 9 * tptr[6 + 1] + 9 * tptr[6 + 2] - tptr[6 + 3]) >> 7; + dst[7] = (-tptr[7] + 9 * tptr[7 + 1] + 9 * tptr[7 + 2] - tptr[7 + 3]) >> 7; + + dst += 8; + tptr += 11; + } +} + +int main (void) +{ + char buf [8 * 8]; + + test (buf); + + return 0; +} + === modified file 'gcc/tree-vect-data-refs.c' --- gcc/tree-vect-data-refs.c 2012-05-31 08:46:10 +0000 +++ gcc/tree-vect-data-refs.c 2012-06-11 17:31:41 +0000 @@ -845,6 +845,24 @@ } } + /* Similarly, if we're doing basic-block vectorization, we can only use + base and misalignment information relative to an innermost loop if the + misalignment stays the same throughout the execution of the loop. + As above, this is the case if the stride of the dataref evenly divides + by the vector size. */ + if (!loop) + { + tree step = DR_STEP (dr); + HOST_WIDE_INT dr_step = TREE_INT_CST_LOW (step); + + if (dr_step % GET_MODE_SIZE (TYPE_MODE (vectype)) != 0) + { + if (vect_print_dump_info (REPORT_ALIGNMENT)) + fprintf (vect_dump, "SLP: step doesn't divide the vector-size."); + misalign = NULL_TREE; + } + } + base = build_fold_indirect_ref (base_addr); alignment = ssize_int (TYPE_ALIGN (vectype)/BITS_PER_UNIT);