Message ID | 87tw52s73r.fsf@linaro.org |
---|---|

State | New |

Headers | show |

On Wed, May 3, 2017 at 9:00 AM, Richard Sandiford <richard.sandiford@linaro.org> wrote: > This patch tries to calculate conservatively-correct distance > vectors for two references whose base addresses are not the same. > It sets a new flag DDR_COULD_BE_INDEPENDENT_P if the dependence > isn't guaranteed to occur. > > The motivating example is: > > struct s { int x[8]; }; > void > f (struct s *a, struct s *b) > { > for (int i = 0; i < 8; ++i) > a->x[i] += b->x[i]; > } > > in which the "a" and "b" accesses are either independent or have a > dependence distance of 0 (assuming -fstrict-aliasing). Neither case > prevents vectorisation, so we can vectorise without an alias check. > > I'd originally wanted to do the same thing for arrays as well, e.g.: > > void > f (int a[][8], struct b[][8]) > { > for (int i = 0; i < 8; ++i) > a[0][i] += b[0][i]; > } > > I think this is valid because C11 6.7.6.2/6 says: > > For two array types to be compatible, both shall have compatible > element types, and if both size specifiers are present, and are > integer constant expressions, then both size specifiers shall have > the same constant value. > > So if we access an array through an int (*)[8], it must have type X[8] > or X[], where X is compatible with int. It doesn't seem possible in > either case for "a[0]" and "b[0]" to overlap when "a != b". > > However, Richard B said that (at least in gimple) we support arbitrary > overlap of arrays and allow arrays to be accessed with different > dimensionality. There are examples of this in PR50067. I've therefore > only handled references that end in a structure field access. > > There are two ways of handling these dependences in the vectoriser: > use them to limit VF, or check at runtime as before. I've gone for > the approach of checking at runtime if we can, to avoid limiting VF > unnecessarily. We still fall back to a VF cap when runtime checks > aren't allowed. > > The patch tests whether we queued an alias check with a dependence > distance of X and then picked a VF <= X, in which case it's safe to > drop the alias check. Since vect_prune_runtime_alias_check_list can > be called twice with different VF for the same loop, it's no longer > safe to clear may_alias_ddrs on exit. Instead we should use > comp_alias_ddrs to check whether versioning is necessary. > > Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install? Hi Richard, It's nice to explore more alias opportunity, below are some simple comments embedded. > > Thanks, > Richard > > > gcc/ > 2017-05-03 Richard Sandiford <richard.sandiford@linaro.org> > > * tree-data-ref.h (subscript): Add access_fn field. > (data_dependence_relation): Add could_be_independent_p. > (SUB_ACCESS_FN, DDR_COULD_BE_INDEPENDENT_P): New macros. > (same_access_functions): Move to tree-data-ref.c. > * tree-data-ref.c (ref_contains_union_access_p): New function. > (dump_data_dependence_relation): Use SUB_ACCESS_FN instead of > DR_ACCESS_FN. > (constant_access_functions): Likewise. > (add_other_self_distances): Likewise. > (same_access_functions): Likewise. (Moved from tree-data-ref.h.) > (initialize_data_dependence_relation): Use XCNEW and remove > explicit zeroing of DDR_REVERSED_P. Look for a subsequence > of access functions that have the same type. Allow the > subsequence to end with different bases in some circumstances. > Record the chosen access functions in SUB_ACCESS_FN. > (build_classic_dist_vector_1): Replace ddr_a and ddr_b with > a_index and b_index. Use SUB_ACCESS_FN instead of DR_ACCESS_FN. > (subscript_dependence_tester_1): Likewise dra and drb. > (build_classic_dist_vector): Update calls accordingly. > (subscript_dependence_tester): Likewise. > * tree-ssa-loop-prefetch.c (determine_loop_nest_reuse): Check > DDR_COULD_BE_INDEPENDENT_P. > * tree-vectorizer.h (LOOP_REQUIRES_VERSIONING_FOR_ALIAS): Test > comp_alias_ddrs instead of may_alias_ddrs. > * tree-vect-data-refs.c (vect_analyze_data_ref_dependence): Try > to mark for aliasing if DDR_COULD_BE_INDEPENDENT_P, but fall back > to using the recorded distance vectors if that fails. > (dependence_distance_ge_vf): New function. > (vect_prune_runtime_alias_test_list): Use it. Don't clear > LOOP_VINFO_MAY_ALIAS_DDRS. > > gcc/testsuite/ > * gcc.dg/vect/vect-alias-check-3.c: New test. > * gcc.dg/vect/vect-alias-check-4.c: Likewise. > * gcc.dg/vect/vect-alias-check-5.c: Likewise. > > Index: gcc/tree-data-ref.h > =================================================================== > --- gcc/tree-data-ref.h 2017-05-03 08:48:11.977015306 +0100 > +++ gcc/tree-data-ref.h 2017-05-03 08:48:48.737038502 +0100 > @@ -191,6 +191,9 @@ struct conflict_function > > struct subscript > { > + /* The access functions of the two references. */ > + tree access_fn[2]; Is it better to follow existing code, i.e, name this as access_fn_a/access_fn_b. Thus we don't need to use const value 0/1 in various places, which is a little bit confusing. > + > /* A description of the iterations for which the elements are > accessed twice. */ > conflict_function *conflicting_iterations_in_a; > @@ -209,6 +212,7 @@ struct subscript > > typedef struct subscript *subscript_p; > > +#define SUB_ACCESS_FN(SUB, I) (SUB)->access_fn[I] > #define SUB_CONFLICTS_IN_A(SUB) (SUB)->conflicting_iterations_in_a > #define SUB_CONFLICTS_IN_B(SUB) (SUB)->conflicting_iterations_in_b > #define SUB_LAST_CONFLICT(SUB) (SUB)->last_conflict > @@ -264,6 +268,33 @@ struct data_dependence_relation > /* Set to true when the dependence relation is on the same data > access. */ > bool self_reference_p; > + > + /* True if the dependence described is conservatively correct rather > + than exact, and if it is still possible for the accesses to be > + conditionally independent. For example, the a and b references in: > + > + struct s *a, *b; > + for (int i = 0; i < n; ++i) > + a->f[i] += b->f[i]; > + > + conservatively have a distance vector of (0), for the case in which > + a == b, but the accesses are independent if a != b. Similarly, > + the a and b references in: > + > + struct s *a, *b; > + for (int i = 0; i < n; ++i) > + a[0].f[i] += b[i].f[i]; > + > + conservatively have a distance vector of (0), but they are indepenent > + when a != b + i. In contrast, the references in: > + > + struct s *a; > + for (int i = 0; i < n; ++i) > + a->f[i] += a->f[i]; > + > + have the same distance vector of (0), but the accesses can never be > + independent. */ > + bool could_be_independent_p; > }; > > typedef struct data_dependence_relation *ddr_p; > @@ -294,6 +325,7 @@ #define DDR_DIR_VECT(DDR, I) \ > #define DDR_DIST_VECT(DDR, I) \ > DDR_DIST_VECTS (DDR)[I] > #define DDR_REVERSED_P(DDR) (DDR)->reversed_p > +#define DDR_COULD_BE_INDEPENDENT_P(DDR) (DDR)->could_be_independent_p > > > bool dr_analyze_innermost (struct data_reference *, struct loop *); > @@ -372,22 +404,6 @@ same_data_refs (data_reference_p a, data > return false; > > return true; > -} > - > -/* Return true when the DDR contains two data references that have the > - same access functions. */ > - > -static inline bool > -same_access_functions (const struct data_dependence_relation *ddr) > -{ > - unsigned i; > - > - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) > - if (!eq_evolutions_p (DR_ACCESS_FN (DDR_A (ddr), i), > - DR_ACCESS_FN (DDR_B (ddr), i))) > - return false; > - > - return true; > } > > /* Returns true when all the dependences are computable. */ > Index: gcc/tree-data-ref.c > =================================================================== > --- gcc/tree-data-ref.c 2017-02-23 19:54:15.000000000 +0000 > +++ gcc/tree-data-ref.c 2017-05-03 08:48:48.737038502 +0100 > @@ -123,8 +123,7 @@ Software Foundation; either version 3, o > } dependence_stats; > > static bool subscript_dependence_tester_1 (struct data_dependence_relation *, > - struct data_reference *, > - struct data_reference *, > + unsigned int, unsigned int, > struct loop *); As mentioned, how about passing access_fn directly, rather than less meaningful 0/1 values? > /* Returns true iff A divides B. */ > > @@ -144,6 +143,21 @@ int_divides_p (int a, int b) > return ((b % a) == 0); > } > > +/* Return true if reference REF contains a union access. */ > + > +static bool > +ref_contains_union_access_p (tree ref) > +{ > + while (handled_component_p (ref)) > + { > + ref = TREE_OPERAND (ref, 0); > + if (TREE_CODE (TREE_TYPE (ref)) == UNION_TYPE > + || TREE_CODE (TREE_TYPE (ref)) == QUAL_UNION_TYPE) > + return true; > + } > + return false; > +} > + > > > /* Dump into FILE all the data references from DATAREFS. */ > @@ -433,13 +447,14 @@ dump_data_dependence_relation (FILE *out > unsigned int i; > struct loop *loopi; > > - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) > + subscript *sub; > + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) > { > fprintf (outf, " access_fn_A: "); > - print_generic_stmt (outf, DR_ACCESS_FN (dra, i), 0); > + print_generic_stmt (outf, SUB_ACCESS_FN (sub, 0), 0); > fprintf (outf, " access_fn_B: "); > - print_generic_stmt (outf, DR_ACCESS_FN (drb, i), 0); > - dump_subscript (outf, DDR_SUBSCRIPT (ddr, i)); > + print_generic_stmt (outf, SUB_ACCESS_FN (sub, 1), 0); > + dump_subscript (outf, sub); > } > > fprintf (outf, " inner loop index: %d\n", DDR_INNER_LOOP (ddr)); > @@ -1484,11 +1499,10 @@ initialize_data_dependence_relation (str > struct data_dependence_relation *res; > unsigned int i; > > - res = XNEW (struct data_dependence_relation); > + res = XCNEW (struct data_dependence_relation); > DDR_A (res) = a; > DDR_B (res) = b; > DDR_LOOP_NEST (res).create (0); > - DDR_REVERSED_P (res) = false; > DDR_SUBSCRIPTS (res).create (0); > DDR_DIR_VECTS (res).create (0); > DDR_DIST_VECTS (res).create (0); > @@ -1506,82 +1520,217 @@ initialize_data_dependence_relation (str > return res; > } > > - /* The case where the references are exactly the same. */ > - if (operand_equal_p (DR_REF (a), DR_REF (b), 0)) > + unsigned int num_dimensions_a = DR_NUM_DIMENSIONS (a); > + unsigned int num_dimensions_b = DR_NUM_DIMENSIONS (b); > + if (num_dimensions_a == 0 || num_dimensions_b == 0) > { > - if ((loop_nest.exists () > - && !object_address_invariant_in_loop_p (loop_nest[0], > - DR_BASE_OBJECT (a))) > - || DR_NUM_DIMENSIONS (a) == 0) > - { > - DDR_ARE_DEPENDENT (res) = chrec_dont_know; > - return res; > - } > - DDR_AFFINE_P (res) = true; > - DDR_ARE_DEPENDENT (res) = NULL_TREE; > - DDR_SUBSCRIPTS (res).create (DR_NUM_DIMENSIONS (a)); > - DDR_LOOP_NEST (res) = loop_nest; > - DDR_INNER_LOOP (res) = 0; > - DDR_SELF_REFERENCE (res) = true; > - for (i = 0; i < DR_NUM_DIMENSIONS (a); i++) > - { > - struct subscript *subscript; > + DDR_ARE_DEPENDENT (res) = chrec_dont_know; > + return res; > + } > + > + /* For unconstrained bases, the outer (highest-index) subscript > + describes a variation in the base of the original DR_REF rather > + than a component access. We have no type that accurately describes > + the new DR_BASE_OBJECT (whose TREE_TYPE describes the type *after* > + applying the outer subscript) so limit the search to the last real > + component access. > + > + E.g. for: > > - subscript = XNEW (struct subscript); > - SUB_CONFLICTS_IN_A (subscript) = conflict_fn_not_known (); > - SUB_CONFLICTS_IN_B (subscript) = conflict_fn_not_known (); > - SUB_LAST_CONFLICT (subscript) = chrec_dont_know; > - SUB_DISTANCE (subscript) = chrec_dont_know; > - DDR_SUBSCRIPTS (res).safe_push (subscript); > + void > + f (int a[][8], int b[][8]) > + { > + for (int i = 0; i < 8; ++i) > + a[i * 2][0] = b[i][0]; > } > - return res; > + > + the a and b accesses have a single ARRAY_REF component reference [0] > + but have two subscripts. */ > + if (DR_UNCONSTRAINED_BASE (a)) > + num_dimensions_a -= 1; > + if (DR_UNCONSTRAINED_BASE (b)) > + num_dimensions_b -= 1; > + > + /* Now look for two sequences of component references that have the same > + type in both A and B. The first sequence includes an arbitrary mixture > + of array and structure references while the second always ends on a > + structure reference. > + > + The former (arbitrary) sequence uses access functions: > + > + [START_A, START_A + NUM_DIMENSIONS) of A > + [START_B, START_B + NUM_DIMENSIONS) of B > + > + The latter sequence uses access functions: > + > + [STRUCT_START_A, STRUCT_START_A + STRUCT_NUM_DIMENSIONS) of A > + [STRUCT_START_B, STRUCT_START_B + STRUCT_NUM_DIMENSIONS) of B > + > + STRUCT_REF_A and STRUCT_REF_B are the outer references for the IIUC, A and B always share the same latter sequence, and the common latter sequence ends at a structure reference providing alias information. Is it possible to record the the former arbitrary references instead of simple flag DDR_COULD_BE_INDEPENDENT_P. With this information, alias check can be simplified by stripping away address computation for the shared common sub-sequence. I doubt vect_create_cond_for_alias_checks could detect this kind CSE for now. Ah, I see you changed alias check code generation in order to handle this. > + latter sequence. */ > + unsigned int start_a = 0; > + unsigned int start_b = 0; > + unsigned int num_dimensions = 0; > + unsigned int struct_start_a = 0; > + unsigned int struct_start_b = 0; > + unsigned int struct_num_dimensions = 0; > + unsigned int index_a = 0; > + unsigned int index_b = 0; > + tree next_ref_a = DR_REF (a); > + tree next_ref_b = DR_REF (b); > + tree struct_ref_a = NULL_TREE; > + tree struct_ref_b = NULL_TREE; > + while (index_a < num_dimensions_a && index_b < num_dimensions_b) > + { > + gcc_checking_assert (handled_component_p (next_ref_a)); > + gcc_checking_assert (handled_component_p (next_ref_b)); > + tree outer_ref_a = TREE_OPERAND (next_ref_a, 0); > + tree outer_ref_b = TREE_OPERAND (next_ref_b, 0); > + tree type_a = TREE_TYPE (outer_ref_a); > + tree type_b = TREE_TYPE (outer_ref_b); > + if (types_compatible_p (type_a, type_b)) > + { > + /* This pair of accesses belong to a suitable sequence. */ > + if (start_a + num_dimensions != index_a > + || start_b + num_dimensions != index_b) > + { > + /* Start a new sequence here. */ > + start_a = index_a; > + start_b = index_b; > + num_dimensions = 0; > + } > + num_dimensions += 1; > + if (TREE_CODE (type_a) == RECORD_TYPE) > + { > + struct_start_a = start_a; > + struct_start_b = start_b; > + struct_num_dimensions = num_dimensions; > + struct_ref_a = outer_ref_a; > + struct_ref_b = outer_ref_b; > + } > + next_ref_a = outer_ref_a; > + next_ref_b = outer_ref_b; > + index_a += 1; > + index_b += 1; > + continue; > + } > + /* Try to approach equal type sizes. */ > + if (!COMPLETE_TYPE_P (type_a) > + || !COMPLETE_TYPE_P (type_b) > + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_a)) > + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_b))) > + break; > + unsigned HOST_WIDE_INT size_a = tree_to_uhwi (TYPE_SIZE_UNIT (type_a)); > + unsigned HOST_WIDE_INT size_b = tree_to_uhwi (TYPE_SIZE_UNIT (type_b)); > + if (size_a <= size_b) > + { > + index_a += 1; > + next_ref_a = outer_ref_a; > + } > + if (size_b <= size_a) > + { > + index_b += 1; > + next_ref_b = outer_ref_b; > + } > } > > - /* If the references do not access the same object, we do not know > - whether they alias or not. We do not care about TBAA or alignment > - info so we can use OEP_ADDRESS_OF to avoid false negatives. > - But the accesses have to use compatible types as otherwise the > - built indices would not match. */ > - if (!operand_equal_p (DR_BASE_OBJECT (a), DR_BASE_OBJECT (b), OEP_ADDRESS_OF) > - || !types_compatible_p (TREE_TYPE (DR_BASE_OBJECT (a)), > - TREE_TYPE (DR_BASE_OBJECT (b)))) > + /* See whether the sequence ends at the base and whether the two bases > + are equal. We do not care about TBAA or alignment info so we can use > + OEP_ADDRESS_OF to avoid false negatives. */ > + tree base_a = DR_BASE_OBJECT (a); > + tree base_b = DR_BASE_OBJECT (b); > + bool same_base_p = (start_a + num_dimensions == num_dimensions_a > + && start_b + num_dimensions == num_dimensions_b > + && DR_UNCONSTRAINED_BASE (a) == DR_UNCONSTRAINED_BASE (b) > + && operand_equal_p (base_a, base_b, OEP_ADDRESS_OF) > + && types_compatible_p (TREE_TYPE (base_a), > + TREE_TYPE (base_b)) > + && (!loop_nest.exists () > + || (object_address_invariant_in_loop_p > + (loop_nest[0], base_a)))); Major change is in function initialize_data_dependence_relation in order to detect partial alias opportunity. The original equality check on DR_BASE_OBJECT is bypassed now. IMHO better to introduce a new parameter to compute_data_reference_for_loop etc., indicating whether we want to handle partial alias opportunity or not. After all, such computation is unnecessary for predcom/prefetch/parloop. It's only a waste of time computing it. > + > + /* If the bases are the same, we can include the base variation too. > + E.g. the b accesses in: > + > + for (int i = 0; i < n; ++i) > + b[i + 4][0] = b[i][0]; > + > + have a definite dependence distance of 4, while for: > + > + for (int i = 0; i < n; ++i) > + a[i + 4][0] = b[i][0]; > + > + the dependence distance depends on the gap between a and b. > + > + If the bases are different then we can only rely on the sequence > + rooted at a structure access, since arrays are allowed to overlap > + arbitrarily and change shape arbitrarily. E.g. we treat this as > + valid code: > + > + int a[256]; > + ... > + ((int (*)[4][3])&a[1])[i][0] += ((int (*)[4][3])&a[2])[i][0]; > + > + where two lvalues with the same int[4][3] type overlap, and where > + both lvalues are distinct from the object's declared type. */ > + if (same_base_p) > { > - DDR_ARE_DEPENDENT (res) = chrec_dont_know; > - return res; > + if (DR_UNCONSTRAINED_BASE (a)) > + num_dimensions += 1; > + } > + else > + { > + start_a = struct_start_a; > + start_b = struct_start_b; > + num_dimensions = struct_num_dimensions; > } > > - /* If the base of the object is not invariant in the loop nest, we cannot > - analyze it. TODO -- in fact, it would suffice to record that there may > - be arbitrary dependences in the loops where the base object varies. */ > - if ((loop_nest.exists () > - && !object_address_invariant_in_loop_p (loop_nest[0], DR_BASE_OBJECT (a))) > - || DR_NUM_DIMENSIONS (a) == 0) > + /* Punt if we didn't find a suitable sequence. */ > + if (num_dimensions == 0) > { > DDR_ARE_DEPENDENT (res) = chrec_dont_know; > return res; > } > > - /* If the number of dimensions of the access to not agree we can have > - a pointer access to a component of the array element type and an > - array access while the base-objects are still the same. Punt. */ > - if (DR_NUM_DIMENSIONS (a) != DR_NUM_DIMENSIONS (b)) > + if (!same_base_p) > { > - DDR_ARE_DEPENDENT (res) = chrec_dont_know; > - return res; > + /* Partial overlap is possible for different bases when strict aliasing > + is not in effect. It's also possible if either base involves a union > + access; e.g. for: > + > + struct s1 { int a[2]; }; > + struct s2 { struct s1 b; int c; }; > + struct s3 { int d; struct s1 e; }; > + union u { struct s2 f; struct s3 g; } *p, *q; > + > + the s1 at "p->f.b" (base "p->f") partially overlaps the s1 at > + "p->g.e" (base "p->g") and might partially overlap the s1 at > + "q->g.e" (base "q->g"). */ > + if (!flag_strict_aliasing > + || ref_contains_union_access_p (struct_ref_a) > + || ref_contains_union_access_p (struct_ref_b)) > + { > + DDR_ARE_DEPENDENT (res) = chrec_dont_know; > + return res; > + } > + > + DDR_COULD_BE_INDEPENDENT_P (res) = true; > } > > DDR_AFFINE_P (res) = true; > DDR_ARE_DEPENDENT (res) = NULL_TREE; > - DDR_SUBSCRIPTS (res).create (DR_NUM_DIMENSIONS (a)); > + DDR_SUBSCRIPTS (res).create (num_dimensions); > DDR_LOOP_NEST (res) = loop_nest; > DDR_INNER_LOOP (res) = 0; > DDR_SELF_REFERENCE (res) = false; > > - for (i = 0; i < DR_NUM_DIMENSIONS (a); i++) > + for (i = 0; i < num_dimensions; ++i) > { > struct subscript *subscript; > > subscript = XNEW (struct subscript); > + SUB_ACCESS_FN (subscript, 0) = DR_ACCESS_FN (a, start_a + i); > + SUB_ACCESS_FN (subscript, 1) = DR_ACCESS_FN (b, start_b + i); > SUB_CONFLICTS_IN_A (subscript) = conflict_fn_not_known (); > SUB_CONFLICTS_IN_B (subscript) = conflict_fn_not_known (); > SUB_LAST_CONFLICT (subscript) = chrec_dont_know; > @@ -3163,14 +3312,15 @@ add_outer_distances (struct data_depende > } > > /* Return false when fail to represent the data dependence as a > - distance vector. INIT_B is set to true when a component has been > + distance vector. A_INDEX is the index of the first reference > + (0 for DDR_A, 1 for DDR_B) and B_INDEX is the index of the > + second reference. INIT_B is set to true when a component has been > added to the distance vector DIST_V. INDEX_CARRY is then set to > the index in DIST_V that carries the dependence. */ > > static bool > build_classic_dist_vector_1 (struct data_dependence_relation *ddr, > - struct data_reference *ddr_a, > - struct data_reference *ddr_b, > + unsigned int a_index, unsigned int b_index, > lambda_vector dist_v, bool *init_b, > int *index_carry) > { > @@ -3188,8 +3338,8 @@ build_classic_dist_vector_1 (struct data > return false; > } > > - access_fn_a = DR_ACCESS_FN (ddr_a, i); > - access_fn_b = DR_ACCESS_FN (ddr_b, i); > + access_fn_a = SUB_ACCESS_FN (subscript, a_index); > + access_fn_b = SUB_ACCESS_FN (subscript, b_index); > > if (TREE_CODE (access_fn_a) == POLYNOMIAL_CHREC > && TREE_CODE (access_fn_b) == POLYNOMIAL_CHREC) > @@ -3249,10 +3399,11 @@ build_classic_dist_vector_1 (struct data > constant_access_functions (const struct data_dependence_relation *ddr) > { > unsigned i; > + subscript *sub; > > - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) > - if (!evolution_function_is_constant_p (DR_ACCESS_FN (DDR_A (ddr), i)) > - || !evolution_function_is_constant_p (DR_ACCESS_FN (DDR_B (ddr), i))) > + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) > + if (!evolution_function_is_constant_p (SUB_ACCESS_FN (sub, 0)) > + || !evolution_function_is_constant_p (SUB_ACCESS_FN (sub, 1))) > return false; > > return true; > @@ -3315,10 +3466,11 @@ add_other_self_distances (struct data_de > lambda_vector dist_v; > unsigned i; > int index_carry = DDR_NB_LOOPS (ddr); > + subscript *sub; > > - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) > + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) > { > - tree access_fun = DR_ACCESS_FN (DDR_A (ddr), i); > + tree access_fun = SUB_ACCESS_FN (sub, 0); > > if (TREE_CODE (access_fun) == POLYNOMIAL_CHREC) > { > @@ -3330,7 +3482,7 @@ add_other_self_distances (struct data_de > return; > } > > - access_fun = DR_ACCESS_FN (DDR_A (ddr), 0); > + access_fun = SUB_ACCESS_FN (DDR_SUBSCRIPT (ddr, 0), 0); > > if (TREE_CODE (CHREC_LEFT (access_fun)) == POLYNOMIAL_CHREC) > add_multivariate_self_dist (ddr, access_fun); > @@ -3401,6 +3553,23 @@ add_distance_for_zero_overlaps (struct d > } > } > > +/* Return true when the DDR contains two data references that have the > + same access functions. */ > + > +static inline bool > +same_access_functions (const struct data_dependence_relation *ddr) > +{ > + unsigned i; > + subscript *sub; > + > + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) > + if (!eq_evolutions_p (SUB_ACCESS_FN (sub, 0), > + SUB_ACCESS_FN (sub, 1))) > + return false; > + > + return true; > +} > + > /* Compute the classic per loop distance vector. DDR is the data > dependence relation to build a vector from. Return false when fail > to represent the data dependence as a distance vector. */ > @@ -3432,8 +3601,7 @@ build_classic_dist_vector (struct data_d > } > > dist_v = lambda_vector_new (DDR_NB_LOOPS (ddr)); > - if (!build_classic_dist_vector_1 (ddr, DDR_A (ddr), DDR_B (ddr), > - dist_v, &init_b, &index_carry)) > + if (!build_classic_dist_vector_1 (ddr, 0, 1, dist_v, &init_b, &index_carry)) > return false; > > /* Save the distance vector if we initialized one. */ > @@ -3466,12 +3634,11 @@ build_classic_dist_vector (struct data_d > if (!lambda_vector_lexico_pos (dist_v, DDR_NB_LOOPS (ddr))) > { > lambda_vector save_v = lambda_vector_new (DDR_NB_LOOPS (ddr)); > - if (!subscript_dependence_tester_1 (ddr, DDR_B (ddr), DDR_A (ddr), > - loop_nest)) > + if (!subscript_dependence_tester_1 (ddr, 1, 0, loop_nest)) > return false; > compute_subscript_distance (ddr); > - if (!build_classic_dist_vector_1 (ddr, DDR_B (ddr), DDR_A (ddr), > - save_v, &init_b, &index_carry)) > + if (!build_classic_dist_vector_1 (ddr, 1, 0, save_v, &init_b, > + &index_carry)) > return false; > save_dist_v (ddr, save_v); > DDR_REVERSED_P (ddr) = true; > @@ -3507,12 +3674,10 @@ build_classic_dist_vector (struct data_d > { > lambda_vector opposite_v = lambda_vector_new (DDR_NB_LOOPS (ddr)); > > - if (!subscript_dependence_tester_1 (ddr, DDR_B (ddr), > - DDR_A (ddr), loop_nest)) > + if (!subscript_dependence_tester_1 (ddr, 1, 0, loop_nest)) > return false; > compute_subscript_distance (ddr); > - if (!build_classic_dist_vector_1 (ddr, DDR_B (ddr), DDR_A (ddr), > - opposite_v, &init_b, > + if (!build_classic_dist_vector_1 (ddr, 1, 0, opposite_v, &init_b, > &index_carry)) > return false; > > @@ -3591,13 +3756,13 @@ build_classic_dir_vector (struct data_de > } > } > > -/* Helper function. Returns true when there is a dependence between > - data references DRA and DRB. */ > +/* Helper function. Returns true when there is a dependence between the > + data references. A_INDEX is the index of the first reference (0 for > + DDR_A, 1 for DDR_B) and B_INDEX is the index of the second reference. */ > > static bool > subscript_dependence_tester_1 (struct data_dependence_relation *ddr, > - struct data_reference *dra, > - struct data_reference *drb, > + unsigned int a_index, unsigned int b_index, > struct loop *loop_nest) > { > unsigned int i; > @@ -3609,8 +3774,8 @@ subscript_dependence_tester_1 (struct da > { > conflict_function *overlaps_a, *overlaps_b; > > - analyze_overlapping_iterations (DR_ACCESS_FN (dra, i), > - DR_ACCESS_FN (drb, i), > + analyze_overlapping_iterations (SUB_ACCESS_FN (subscript, a_index), > + SUB_ACCESS_FN (subscript, b_index), > &overlaps_a, &overlaps_b, > &last_conflicts, loop_nest); > > @@ -3659,7 +3824,7 @@ subscript_dependence_tester_1 (struct da > subscript_dependence_tester (struct data_dependence_relation *ddr, > struct loop *loop_nest) > { > - if (subscript_dependence_tester_1 (ddr, DDR_A (ddr), DDR_B (ddr), loop_nest)) > + if (subscript_dependence_tester_1 (ddr, 0, 1, loop_nest)) > dependence_stats.num_dependence_dependent++; > > compute_subscript_distance (ddr); > Index: gcc/tree-ssa-loop-prefetch.c > =================================================================== > --- gcc/tree-ssa-loop-prefetch.c 2017-03-28 16:19:28.000000000 +0100 > +++ gcc/tree-ssa-loop-prefetch.c 2017-05-03 08:48:48.737038502 +0100 > @@ -1650,6 +1650,7 @@ determine_loop_nest_reuse (struct loop * > refb = (struct mem_ref *) DDR_B (dep)->aux; > > if (DDR_ARE_DEPENDENT (dep) == chrec_dont_know > + || DDR_COULD_BE_INDEPENDENT_P (dep) > || DDR_NUM_DIST_VECTS (dep) == 0) > { > /* If the dependence cannot be analyzed, assume that there might be As said, we could avoid computing such information in the first place. I can see predcom ingores it by explicitly checking DR_BASE_OBJECT, what about tree-parloops.c? > Index: gcc/tree-vectorizer.h > =================================================================== > --- gcc/tree-vectorizer.h 2017-03-28 16:19:28.000000000 +0100 > +++ gcc/tree-vectorizer.h 2017-05-03 08:48:48.738045102 +0100 > @@ -383,7 +383,7 @@ #define LOOP_VINFO_ORIG_LOOP_INFO(L) > #define LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT(L) \ > ((L)->may_misalign_stmts.length () > 0) > #define LOOP_REQUIRES_VERSIONING_FOR_ALIAS(L) \ > - ((L)->may_alias_ddrs.length () > 0) > + ((L)->comp_alias_ddrs.length () > 0) > #define LOOP_REQUIRES_VERSIONING_FOR_NITERS(L) \ > (LOOP_VINFO_NITERS_ASSUMPTIONS (L)) > #define LOOP_REQUIRES_VERSIONING(L) \ > Index: gcc/tree-vect-data-refs.c > =================================================================== > --- gcc/tree-vect-data-refs.c 2017-05-03 08:48:30.536704993 +0100 > +++ gcc/tree-vect-data-refs.c 2017-05-03 08:48:48.738045102 +0100 > @@ -340,6 +340,26 @@ vect_analyze_data_ref_dependence (struct > } > > loop_depth = index_in_loop_nest (loop->num, DDR_LOOP_NEST (ddr)); > + > + if (DDR_COULD_BE_INDEPENDENT_P (ddr)) > + /* For dependence distances of 2 or more, we have the option of > + limiting VF or checking for an alias at runtime. Prefer to check > + at runtime if we can, to avoid limiting the VF unnecessarily when > + the bases are in fact independent. > + > + Note that the alias checks will be removed if the VF ends up > + being small enough. */ > + FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v) > + { > + int dist = dist_v[loop_depth]; > + if (dist != 0 && !(dist > 0 && DDR_REVERSED_P (ddr))) > + { > + if (vect_mark_for_runtime_alias_test (ddr, loop_vinfo)) > + return false; > + break; > + } > + } In theory, we could end up with DDR pushed more than once for alias test? Flag it in FOR loop and only mark it later? Thanks, bin > + > FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v) > { > int dist = dist_v[loop_depth]; > @@ -3017,6 +3037,44 @@ vect_no_alias_p (struct data_reference * > return false; > } > > +/* Return true if the minimum nonzero dependence distance for loop LOOP_DEPTH > + in DDR is >= VF. */ > + > +static bool > +dependence_distance_ge_vf (data_dependence_relation *ddr, > + unsigned int loop_depth, unsigned HOST_WIDE_INT vf) > +{ > + if (DDR_ARE_DEPENDENT (ddr) != NULL_TREE > + || DDR_NUM_DIST_VECTS (ddr) == 0) > + return false; > + > + /* If the dependence is exact, we should have limited the VF instead. */ > + gcc_checking_assert (DDR_COULD_BE_INDEPENDENT_P (ddr)); > + > + unsigned int i; > + lambda_vector dist_v; > + FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v) > + { > + HOST_WIDE_INT dist = dist_v[loop_depth]; > + if (dist != 0 > + && !(dist > 0 && DDR_REVERSED_P (ddr)) > + && (unsigned HOST_WIDE_INT) abs_hwi (dist) < vf) > + return false; > + } > + > + if (dump_enabled_p ()) > + { > + dump_printf_loc (MSG_NOTE, vect_location, > + "dependence distance between "); > + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (DDR_A (ddr))); > + dump_printf (MSG_NOTE, " and "); > + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (DDR_B (ddr))); > + dump_printf (MSG_NOTE, " is >= VF\n"); > + } > + > + return true; > +} > + > /* Function vect_prune_runtime_alias_test_list. > > Prune a list of ddrs to be tested at run-time by versioning for alias. > @@ -3075,6 +3133,10 @@ vect_prune_runtime_alias_test_list (loop > > comp_alias_ddrs.create (may_alias_ddrs.length ()); > > + unsigned int loop_depth > + = index_in_loop_nest (LOOP_VINFO_LOOP (loop_vinfo)->num, > + LOOP_VINFO_LOOP_NEST (loop_vinfo)); > + > /* First, we collect all data ref pairs for aliasing checks. */ > FOR_EACH_VEC_ELT (may_alias_ddrs, i, ddr) > { > @@ -3084,6 +3146,11 @@ vect_prune_runtime_alias_test_list (loop > tree segment_length_a, segment_length_b; > gimple *stmt_a, *stmt_b; > > + /* Ignore the alias if the VF we chose ended up being no greater > + than the dependence distance. */ > + if (dependence_distance_ge_vf (ddr, loop_depth, vect_factor)) > + continue; > + > dr_a = DDR_A (ddr); > stmt_a = DR_STMT (DDR_A (ddr)); > dr_group_first_a = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_a)); > @@ -3294,10 +3361,6 @@ vect_prune_runtime_alias_test_list (loop > return false; > } > > - /* All alias checks have been resolved at compilation time. */ > - if (!comp_alias_ddrs.length ()) > - LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo).truncate (0); > - > return true; > } > > Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-3.c > =================================================================== > --- /dev/null 2017-05-03 08:16:26.972699664 +0100 > +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-3.c 2017-05-03 08:48:48.736031902 +0100 > @@ -0,0 +1,104 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target vect_int } */ > +/* { dg-additional-options "--param vect-max-version-for-alias-checks=0" } */ > + > +/* Intended to be larger than any VF. */ > +#define GAP 128 > +#define N (GAP * 3) > + > +struct s { int x[N + 1]; }; > +struct t { struct s x[N + 1]; }; > +struct u { int x[N + 1]; int y; }; > + > +void > +f1 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < N; ++i) > + a->x[i] += b->x[i]; > +} > + > +void > +f2 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < N; ++i) > + a[1].x[i] += b[2].x[i]; > +} > + > +void > +f3 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < N; ++i) > + a[1].x[i] += b[i].x[i]; > +} > + > +void > +f4 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < N; ++i) > + a[i].x[i] += b[i].x[i]; > +} > + > +void > +f5 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < N; ++i) > + a->x[i] += b->x[i + 1]; > +} > + > +void > +f6 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < N; ++i) > + a[1].x[i] += b[2].x[i + 1]; > +} > + > +void > +f7 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < N; ++i) > + a[1].x[i] += b[i].x[i + 1]; > +} > + > +void > +f8 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < N; ++i) > + a[i].x[i] += b[i].x[i + 1]; > +} > + > +void > +f9 (struct s *a, struct t *b) > +{ > + for (int i = 0; i < N; ++i) > + a->x[i] += b->x[1].x[i]; > +} > + > +void > +f10 (struct s *a, struct t *b) > +{ > + for (int i = 0; i < N; ++i) > + a->x[i] += b->x[i].x[i]; > +} > + > +void > +f11 (struct u *a, struct u *b) > +{ > + for (int i = 0; i < N; ++i) > + a->x[i] += b->x[i] + b[i].y; > +} > + > +void > +f12 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < GAP; ++i) > + a->x[i + GAP] += b->x[i]; > +} > + > +void > +f13 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < GAP * 2; ++i) > + a->x[i + GAP] += b->x[i]; > +} > + > +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 13 "vect" } } */ > Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-4.c > =================================================================== > --- /dev/null 2017-05-03 08:16:26.972699664 +0100 > +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-4.c 2017-05-03 08:48:48.736031902 +0100 > @@ -0,0 +1,35 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target vect_int } */ > +/* { dg-additional-options "--param vect-max-version-for-alias-checks=0" } */ > + > +#define N 16 > + > +struct s1 { int a[N]; }; > +struct s2 { struct s1 b; int c; }; > +struct s3 { int d; struct s1 e; }; > +union u { struct s2 f; struct s3 g; }; > + > +/* We allow a and b to overlap arbitrarily. */ > + > +void > +f1 (int a[][N], int b[][N]) > +{ > + for (int i = 0; i < N; ++i) > + a[0][i] += b[0][i]; > +} > + > +void > +f2 (union u *a, union u *b) > +{ > + for (int i = 0; i < N; ++i) > + a->f.b.a[i] += b->g.e.a[i]; > +} > + > +void > +f3 (struct s1 *a, struct s1 *b) > +{ > + for (int i = 0; i < N - 1; ++i) > + a->a[i + 1] += b->a[i]; > +} > + > +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */ > Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-5.c > =================================================================== > --- /dev/null 2017-05-03 08:16:26.972699664 +0100 > +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-5.c 2017-05-03 08:48:48.736031902 +0100 > @@ -0,0 +1,19 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target vect_int } */ > + > +/* Intended to be larger than any VF. */ > +#define GAP 128 > +#define N (GAP * 3) > + > +struct s { int x[N]; }; > + > +void > +f1 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < GAP * 2; ++i) > + a->x[i + GAP] += b->x[i]; > +} > + > +/* { dg-final { scan-tree-dump-times "mark for run-time aliasing" 1 "vect" } } */ > +/* { dg-final { scan-tree-dump-times "improved number of alias checks from 1 to 0" 1 "vect" } } */ > +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 1 "vect" } } */

"Bin.Cheng" <amker.cheng@gmail.com> writes: > On Wed, May 3, 2017 at 9:00 AM, Richard Sandiford > <richard.sandiford@linaro.org> wrote: >> Index: gcc/tree-data-ref.h >> =================================================================== >> --- gcc/tree-data-ref.h 2017-05-03 08:48:11.977015306 +0100 >> +++ gcc/tree-data-ref.h 2017-05-03 08:48:48.737038502 +0100 >> @@ -191,6 +191,9 @@ struct conflict_function >> >> struct subscript >> { >> + /* The access functions of the two references. */ >> + tree access_fn[2]; > Is it better to follow existing code, i.e, name this as > access_fn_a/access_fn_b. Thus we don't need to use const value 0/1 in > various places, which is a little bit confusing. [Answered below] >> + >> /* A description of the iterations for which the elements are >> accessed twice. */ >> conflict_function *conflicting_iterations_in_a; >> @@ -209,6 +212,7 @@ struct subscript >> >> typedef struct subscript *subscript_p; >> >> +#define SUB_ACCESS_FN(SUB, I) (SUB)->access_fn[I] >> #define SUB_CONFLICTS_IN_A(SUB) (SUB)->conflicting_iterations_in_a >> #define SUB_CONFLICTS_IN_B(SUB) (SUB)->conflicting_iterations_in_b >> #define SUB_LAST_CONFLICT(SUB) (SUB)->last_conflict >> @@ -264,6 +268,33 @@ struct data_dependence_relation >> /* Set to true when the dependence relation is on the same data >> access. */ >> bool self_reference_p; >> + >> + /* True if the dependence described is conservatively correct rather >> + than exact, and if it is still possible for the accesses to be >> + conditionally independent. For example, the a and b references in: >> + >> + struct s *a, *b; >> + for (int i = 0; i < n; ++i) >> + a->f[i] += b->f[i]; >> + >> + conservatively have a distance vector of (0), for the case in which >> + a == b, but the accesses are independent if a != b. Similarly, >> + the a and b references in: >> + >> + struct s *a, *b; >> + for (int i = 0; i < n; ++i) >> + a[0].f[i] += b[i].f[i]; >> + >> + conservatively have a distance vector of (0), but they are indepenent >> + when a != b + i. In contrast, the references in: >> + >> + struct s *a; >> + for (int i = 0; i < n; ++i) >> + a->f[i] += a->f[i]; >> + >> + have the same distance vector of (0), but the accesses can never be >> + independent. */ >> + bool could_be_independent_p; >> }; >> >> typedef struct data_dependence_relation *ddr_p; >> @@ -294,6 +325,7 @@ #define DDR_DIR_VECT(DDR, I) \ >> #define DDR_DIST_VECT(DDR, I) \ >> DDR_DIST_VECTS (DDR)[I] >> #define DDR_REVERSED_P(DDR) (DDR)->reversed_p >> +#define DDR_COULD_BE_INDEPENDENT_P(DDR) (DDR)->could_be_independent_p >> >> >> bool dr_analyze_innermost (struct data_reference *, struct loop *); >> @@ -372,22 +404,6 @@ same_data_refs (data_reference_p a, data >> return false; >> >> return true; >> -} >> - >> -/* Return true when the DDR contains two data references that have the >> - same access functions. */ >> - >> -static inline bool >> -same_access_functions (const struct data_dependence_relation *ddr) >> -{ >> - unsigned i; >> - >> - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) >> - if (!eq_evolutions_p (DR_ACCESS_FN (DDR_A (ddr), i), >> - DR_ACCESS_FN (DDR_B (ddr), i))) >> - return false; >> - >> - return true; >> } >> >> /* Returns true when all the dependences are computable. */ >> Index: gcc/tree-data-ref.c >> =================================================================== >> --- gcc/tree-data-ref.c 2017-02-23 19:54:15.000000000 +0000 >> +++ gcc/tree-data-ref.c 2017-05-03 08:48:48.737038502 +0100 >> @@ -123,8 +123,7 @@ Software Foundation; either version 3, o >> } dependence_stats; >> >> static bool subscript_dependence_tester_1 (struct data_dependence_relation *, >> - struct data_reference *, >> - struct data_reference *, >> + unsigned int, unsigned int, >> struct loop *); > As mentioned, how about passing access_fn directly, rather than less > meaningful 0/1 values? The problem is that access_fn is a property of the individual subscripts, whereas this is operating on a full data_reference. One alternative would be to use conditions like: first_is_a ? SUB_ACCESS_FN_A (sub) : SUB_ACCESS_FN_B (sub) but IMO that's less readable than the existing: SUB_ACCESS_FN (sub, index) Or we could have individual access_fn arrays for A and B, separate from the main subscript array, but that would mean allocating three arrays instead of one. >> /* Returns true iff A divides B. */ >> >> @@ -144,6 +143,21 @@ int_divides_p (int a, int b) >> return ((b % a) == 0); >> } >> >> +/* Return true if reference REF contains a union access. */ >> + >> +static bool >> +ref_contains_union_access_p (tree ref) >> +{ >> + while (handled_component_p (ref)) >> + { >> + ref = TREE_OPERAND (ref, 0); >> + if (TREE_CODE (TREE_TYPE (ref)) == UNION_TYPE >> + || TREE_CODE (TREE_TYPE (ref)) == QUAL_UNION_TYPE) >> + return true; >> + } >> + return false; >> +} >> + >> >> >> /* Dump into FILE all the data references from DATAREFS. */ >> @@ -433,13 +447,14 @@ dump_data_dependence_relation (FILE *out >> unsigned int i; >> struct loop *loopi; >> >> - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) >> + subscript *sub; >> + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) >> { >> fprintf (outf, " access_fn_A: "); >> - print_generic_stmt (outf, DR_ACCESS_FN (dra, i), 0); >> + print_generic_stmt (outf, SUB_ACCESS_FN (sub, 0), 0); >> fprintf (outf, " access_fn_B: "); >> - print_generic_stmt (outf, DR_ACCESS_FN (drb, i), 0); >> - dump_subscript (outf, DDR_SUBSCRIPT (ddr, i)); >> + print_generic_stmt (outf, SUB_ACCESS_FN (sub, 1), 0); >> + dump_subscript (outf, sub); >> } >> >> fprintf (outf, " inner loop index: %d\n", DDR_INNER_LOOP (ddr)); >> @@ -1484,11 +1499,10 @@ initialize_data_dependence_relation (str >> struct data_dependence_relation *res; >> unsigned int i; >> >> - res = XNEW (struct data_dependence_relation); >> + res = XCNEW (struct data_dependence_relation); >> DDR_A (res) = a; >> DDR_B (res) = b; >> DDR_LOOP_NEST (res).create (0); >> - DDR_REVERSED_P (res) = false; >> DDR_SUBSCRIPTS (res).create (0); >> DDR_DIR_VECTS (res).create (0); >> DDR_DIST_VECTS (res).create (0); >> @@ -1506,82 +1520,217 @@ initialize_data_dependence_relation (str >> return res; >> } >> >> - /* The case where the references are exactly the same. */ >> - if (operand_equal_p (DR_REF (a), DR_REF (b), 0)) >> + unsigned int num_dimensions_a = DR_NUM_DIMENSIONS (a); >> + unsigned int num_dimensions_b = DR_NUM_DIMENSIONS (b); >> + if (num_dimensions_a == 0 || num_dimensions_b == 0) >> { >> - if ((loop_nest.exists () >> - && !object_address_invariant_in_loop_p (loop_nest[0], >> - DR_BASE_OBJECT (a))) >> - || DR_NUM_DIMENSIONS (a) == 0) >> - { >> - DDR_ARE_DEPENDENT (res) = chrec_dont_know; >> - return res; >> - } >> - DDR_AFFINE_P (res) = true; >> - DDR_ARE_DEPENDENT (res) = NULL_TREE; >> - DDR_SUBSCRIPTS (res).create (DR_NUM_DIMENSIONS (a)); >> - DDR_LOOP_NEST (res) = loop_nest; >> - DDR_INNER_LOOP (res) = 0; >> - DDR_SELF_REFERENCE (res) = true; >> - for (i = 0; i < DR_NUM_DIMENSIONS (a); i++) >> - { >> - struct subscript *subscript; >> + DDR_ARE_DEPENDENT (res) = chrec_dont_know; >> + return res; >> + } >> + >> + /* For unconstrained bases, the outer (highest-index) subscript >> + describes a variation in the base of the original DR_REF rather >> + than a component access. We have no type that accurately describes >> + the new DR_BASE_OBJECT (whose TREE_TYPE describes the type *after* >> + applying the outer subscript) so limit the search to the last real >> + component access. >> + >> + E.g. for: >> >> - subscript = XNEW (struct subscript); >> - SUB_CONFLICTS_IN_A (subscript) = conflict_fn_not_known (); >> - SUB_CONFLICTS_IN_B (subscript) = conflict_fn_not_known (); >> - SUB_LAST_CONFLICT (subscript) = chrec_dont_know; >> - SUB_DISTANCE (subscript) = chrec_dont_know; >> - DDR_SUBSCRIPTS (res).safe_push (subscript); >> + void >> + f (int a[][8], int b[][8]) >> + { >> + for (int i = 0; i < 8; ++i) >> + a[i * 2][0] = b[i][0]; >> } >> - return res; >> + >> + the a and b accesses have a single ARRAY_REF component reference [0] >> + but have two subscripts. */ >> + if (DR_UNCONSTRAINED_BASE (a)) >> + num_dimensions_a -= 1; >> + if (DR_UNCONSTRAINED_BASE (b)) >> + num_dimensions_b -= 1; >> + >> + /* Now look for two sequences of component references that have the same >> + type in both A and B. The first sequence includes an arbitrary mixture >> + of array and structure references while the second always ends on a >> + structure reference. >> + >> + The former (arbitrary) sequence uses access functions: >> + >> + [START_A, START_A + NUM_DIMENSIONS) of A >> + [START_B, START_B + NUM_DIMENSIONS) of B >> + >> + The latter sequence uses access functions: >> + >> + [STRUCT_START_A, STRUCT_START_A + STRUCT_NUM_DIMENSIONS) of A >> + [STRUCT_START_B, STRUCT_START_B + STRUCT_NUM_DIMENSIONS) of B >> + >> + STRUCT_REF_A and STRUCT_REF_B are the outer references for the > IIUC, A and B always share the same latter sequence, and the common > latter sequence ends at a structure reference providing alias > information. The A and B accesses aren't necessarily the same, they just have the compatible types. E.g. for: struct s { int x[8]; int y[8]; } *a, *b; ... a->x[0] = b->y[1] ... the sequence would include: a: [0] .x b: [1] .y > Is it possible to record the the former arbitrary > references instead of simple flag DDR_COULD_BE_INDEPENDENT_P. With > this information, alias check can be simplified by stripping away > address computation for the shared common sub-sequence. I doubt > vect_create_cond_for_alias_checks could detect this kind CSE for now. > Ah, I see you changed alias check code generation in order to handle > this. The num_dimensions sequence is only used if it ends at the original base and if the bases are equal. In other cases it doesn't really help. The struct_num_dimensions sequence is meant to be the one that is helpful even when the bases aren't equal. Like you say, there's a follow-on patch that uses this for runtime alias checking. >> + latter sequence. */ >> + unsigned int start_a = 0; >> + unsigned int start_b = 0; >> + unsigned int num_dimensions = 0; >> + unsigned int struct_start_a = 0; >> + unsigned int struct_start_b = 0; >> + unsigned int struct_num_dimensions = 0; >> + unsigned int index_a = 0; >> + unsigned int index_b = 0; >> + tree next_ref_a = DR_REF (a); >> + tree next_ref_b = DR_REF (b); >> + tree struct_ref_a = NULL_TREE; >> + tree struct_ref_b = NULL_TREE; >> + while (index_a < num_dimensions_a && index_b < num_dimensions_b) >> + { >> + gcc_checking_assert (handled_component_p (next_ref_a)); >> + gcc_checking_assert (handled_component_p (next_ref_b)); >> + tree outer_ref_a = TREE_OPERAND (next_ref_a, 0); >> + tree outer_ref_b = TREE_OPERAND (next_ref_b, 0); >> + tree type_a = TREE_TYPE (outer_ref_a); >> + tree type_b = TREE_TYPE (outer_ref_b); >> + if (types_compatible_p (type_a, type_b)) >> + { >> + /* This pair of accesses belong to a suitable sequence. */ >> + if (start_a + num_dimensions != index_a >> + || start_b + num_dimensions != index_b) >> + { >> + /* Start a new sequence here. */ >> + start_a = index_a; >> + start_b = index_b; >> + num_dimensions = 0; >> + } >> + num_dimensions += 1; >> + if (TREE_CODE (type_a) == RECORD_TYPE) >> + { >> + struct_start_a = start_a; >> + struct_start_b = start_b; >> + struct_num_dimensions = num_dimensions; >> + struct_ref_a = outer_ref_a; >> + struct_ref_b = outer_ref_b; >> + } >> + next_ref_a = outer_ref_a; >> + next_ref_b = outer_ref_b; >> + index_a += 1; >> + index_b += 1; >> + continue; >> + } >> + /* Try to approach equal type sizes. */ >> + if (!COMPLETE_TYPE_P (type_a) >> + || !COMPLETE_TYPE_P (type_b) >> + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_a)) >> + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_b))) >> + break; >> + unsigned HOST_WIDE_INT size_a = tree_to_uhwi (TYPE_SIZE_UNIT (type_a)); >> + unsigned HOST_WIDE_INT size_b = tree_to_uhwi (TYPE_SIZE_UNIT (type_b)); >> + if (size_a <= size_b) >> + { >> + index_a += 1; >> + next_ref_a = outer_ref_a; >> + } >> + if (size_b <= size_a) >> + { >> + index_b += 1; >> + next_ref_b = outer_ref_b; >> + } >> } >> >> - /* If the references do not access the same object, we do not know >> - whether they alias or not. We do not care about TBAA or alignment >> - info so we can use OEP_ADDRESS_OF to avoid false negatives. >> - But the accesses have to use compatible types as otherwise the >> - built indices would not match. */ >> - if (!operand_equal_p (DR_BASE_OBJECT (a), DR_BASE_OBJECT (b), > OEP_ADDRESS_OF) >> - || !types_compatible_p (TREE_TYPE (DR_BASE_OBJECT (a)), >> - TREE_TYPE (DR_BASE_OBJECT (b)))) >> + /* See whether the sequence ends at the base and whether the two bases >> + are equal. We do not care about TBAA or alignment info so we can use >> + OEP_ADDRESS_OF to avoid false negatives. */ >> + tree base_a = DR_BASE_OBJECT (a); >> + tree base_b = DR_BASE_OBJECT (b); >> + bool same_base_p = (start_a + num_dimensions == num_dimensions_a >> + && start_b + num_dimensions == num_dimensions_b >> + && DR_UNCONSTRAINED_BASE (a) == DR_UNCONSTRAINED_BASE (b) >> + && operand_equal_p (base_a, base_b, OEP_ADDRESS_OF) >> + && types_compatible_p (TREE_TYPE (base_a), >> + TREE_TYPE (base_b)) >> + && (!loop_nest.exists () >> + || (object_address_invariant_in_loop_p >> + (loop_nest[0], base_a)))); > Major change is in function initialize_data_dependence_relation in > order to detect partial alias opportunity. The original equality > check on DR_BASE_OBJECT is bypassed now. IMHO better to introduce a > new parameter to compute_data_reference_for_loop etc., indicating > whether we want to handle partial alias opportunity or not. After > all, such computation is unnecessary for predcom/prefetch/parloop. > It's only a waste of time computing it. Well, it also means that we can now prove the accesses are independent in more cases. E.g. previously we would assume the a and b accesses in: struct s { int x[16]; } *a, *b; for (int i = 0; i < 8; ++i) a->x[i] = b->x[i + 8]; could conflict. If callers don't need to know what the relationship between a and b is, I think they should check for that before going through the process of initialising and analysing the ddr. >> + >> + /* If the bases are the same, we can include the base variation too. >> + E.g. the b accesses in: >> + >> + for (int i = 0; i < n; ++i) >> + b[i + 4][0] = b[i][0]; >> + >> + have a definite dependence distance of 4, while for: >> + >> + for (int i = 0; i < n; ++i) >> + a[i + 4][0] = b[i][0]; >> + >> + the dependence distance depends on the gap between a and b. >> + >> + If the bases are different then we can only rely on the sequence >> + rooted at a structure access, since arrays are allowed to overlap >> + arbitrarily and change shape arbitrarily. E.g. we treat this as >> + valid code: >> + >> + int a[256]; >> + ... >> + ((int (*)[4][3])&a[1])[i][0] += ((int (*)[4][3])&a[2])[i][0]; >> + >> + where two lvalues with the same int[4][3] type overlap, and where >> + both lvalues are distinct from the object's declared type. */ >> + if (same_base_p) >> { >> - DDR_ARE_DEPENDENT (res) = chrec_dont_know; >> - return res; >> + if (DR_UNCONSTRAINED_BASE (a)) >> + num_dimensions += 1; >> + } >> + else >> + { >> + start_a = struct_start_a; >> + start_b = struct_start_b; >> + num_dimensions = struct_num_dimensions; >> } >> >> - /* If the base of the object is not invariant in the loop nest, we cannot >> - analyze it. TODO -- in fact, it would suffice to record that there may >> - be arbitrary dependences in the loops where the base object varies. */ >> - if ((loop_nest.exists () >> - && !object_address_invariant_in_loop_p (loop_nest[0], DR_BASE_OBJECT > (a))) >> - || DR_NUM_DIMENSIONS (a) == 0) >> + /* Punt if we didn't find a suitable sequence. */ >> + if (num_dimensions == 0) >> { >> DDR_ARE_DEPENDENT (res) = chrec_dont_know; >> return res; >> } >> >> - /* If the number of dimensions of the access to not agree we can have >> - a pointer access to a component of the array element type and an >> - array access while the base-objects are still the same. Punt. */ >> - if (DR_NUM_DIMENSIONS (a) != DR_NUM_DIMENSIONS (b)) >> + if (!same_base_p) >> { >> - DDR_ARE_DEPENDENT (res) = chrec_dont_know; >> - return res; >> + /* Partial overlap is possible for different bases when strict aliasing >> + is not in effect. It's also possible if either base involves a union >> + access; e.g. for: >> + >> + struct s1 { int a[2]; }; >> + struct s2 { struct s1 b; int c; }; >> + struct s3 { int d; struct s1 e; }; >> + union u { struct s2 f; struct s3 g; } *p, *q; >> + >> + the s1 at "p->f.b" (base "p->f") partially overlaps the s1 at >> + "p->g.e" (base "p->g") and might partially overlap the s1 at >> + "q->g.e" (base "q->g"). */ >> + if (!flag_strict_aliasing >> + || ref_contains_union_access_p (struct_ref_a) >> + || ref_contains_union_access_p (struct_ref_b)) >> + { >> + DDR_ARE_DEPENDENT (res) = chrec_dont_know; >> + return res; >> + } >> + >> + DDR_COULD_BE_INDEPENDENT_P (res) = true; >> } >> >> DDR_AFFINE_P (res) = true; >> DDR_ARE_DEPENDENT (res) = NULL_TREE; >> - DDR_SUBSCRIPTS (res).create (DR_NUM_DIMENSIONS (a)); >> + DDR_SUBSCRIPTS (res).create (num_dimensions); >> DDR_LOOP_NEST (res) = loop_nest; >> DDR_INNER_LOOP (res) = 0; >> DDR_SELF_REFERENCE (res) = false; >> >> - for (i = 0; i < DR_NUM_DIMENSIONS (a); i++) >> + for (i = 0; i < num_dimensions; ++i) >> { >> struct subscript *subscript; >> >> subscript = XNEW (struct subscript); >> + SUB_ACCESS_FN (subscript, 0) = DR_ACCESS_FN (a, start_a + i); >> + SUB_ACCESS_FN (subscript, 1) = DR_ACCESS_FN (b, start_b + i); >> SUB_CONFLICTS_IN_A (subscript) = conflict_fn_not_known (); >> SUB_CONFLICTS_IN_B (subscript) = conflict_fn_not_known (); >> SUB_LAST_CONFLICT (subscript) = chrec_dont_know; >> @@ -3163,14 +3312,15 @@ add_outer_distances (struct data_depende >> } >> >> /* Return false when fail to represent the data dependence as a >> - distance vector. INIT_B is set to true when a component has been >> + distance vector. A_INDEX is the index of the first reference >> + (0 for DDR_A, 1 for DDR_B) and B_INDEX is the index of the >> + second reference. INIT_B is set to true when a component has been >> added to the distance vector DIST_V. INDEX_CARRY is then set to >> the index in DIST_V that carries the dependence. */ >> >> static bool >> build_classic_dist_vector_1 (struct data_dependence_relation *ddr, >> - struct data_reference *ddr_a, >> - struct data_reference *ddr_b, >> + unsigned int a_index, unsigned int b_index, >> lambda_vector dist_v, bool *init_b, >> int *index_carry) >> { >> @@ -3188,8 +3338,8 @@ build_classic_dist_vector_1 (struct data >> return false; >> } >> >> - access_fn_a = DR_ACCESS_FN (ddr_a, i); >> - access_fn_b = DR_ACCESS_FN (ddr_b, i); >> + access_fn_a = SUB_ACCESS_FN (subscript, a_index); >> + access_fn_b = SUB_ACCESS_FN (subscript, b_index); >> >> if (TREE_CODE (access_fn_a) == POLYNOMIAL_CHREC >> && TREE_CODE (access_fn_b) == POLYNOMIAL_CHREC) >> @@ -3249,10 +3399,11 @@ build_classic_dist_vector_1 (struct data >> constant_access_functions (const struct data_dependence_relation *ddr) >> { >> unsigned i; >> + subscript *sub; >> >> - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) >> - if (!evolution_function_is_constant_p (DR_ACCESS_FN (DDR_A (ddr), i)) >> - || !evolution_function_is_constant_p (DR_ACCESS_FN (DDR_B (ddr), i))) >> + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) >> + if (!evolution_function_is_constant_p (SUB_ACCESS_FN (sub, 0)) >> + || !evolution_function_is_constant_p (SUB_ACCESS_FN (sub, 1))) >> return false; >> >> return true; >> @@ -3315,10 +3466,11 @@ add_other_self_distances (struct data_de >> lambda_vector dist_v; >> unsigned i; >> int index_carry = DDR_NB_LOOPS (ddr); >> + subscript *sub; >> >> - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) >> + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) >> { >> - tree access_fun = DR_ACCESS_FN (DDR_A (ddr), i); >> + tree access_fun = SUB_ACCESS_FN (sub, 0); >> >> if (TREE_CODE (access_fun) == POLYNOMIAL_CHREC) >> { >> @@ -3330,7 +3482,7 @@ add_other_self_distances (struct data_de >> return; >> } >> >> - access_fun = DR_ACCESS_FN (DDR_A (ddr), 0); >> + access_fun = SUB_ACCESS_FN (DDR_SUBSCRIPT (ddr, 0), 0); >> >> if (TREE_CODE (CHREC_LEFT (access_fun)) == POLYNOMIAL_CHREC) >> add_multivariate_self_dist (ddr, access_fun); >> @@ -3401,6 +3553,23 @@ add_distance_for_zero_overlaps (struct d >> } >> } >> >> +/* Return true when the DDR contains two data references that have the >> + same access functions. */ >> + >> +static inline bool >> +same_access_functions (const struct data_dependence_relation *ddr) >> +{ >> + unsigned i; >> + subscript *sub; >> + >> + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) >> + if (!eq_evolutions_p (SUB_ACCESS_FN (sub, 0), >> + SUB_ACCESS_FN (sub, 1))) >> + return false; >> + >> + return true; >> +} >> + >> /* Compute the classic per loop distance vector. DDR is the data >> dependence relation to build a vector from. Return false when fail >> to represent the data dependence as a distance vector. */ >> @@ -3432,8 +3601,7 @@ build_classic_dist_vector (struct data_d >> } >> >> dist_v = lambda_vector_new (DDR_NB_LOOPS (ddr)); >> - if (!build_classic_dist_vector_1 (ddr, DDR_A (ddr), DDR_B (ddr), >> - dist_v, &init_b, &index_carry)) >> + if (!build_classic_dist_vector_1 (ddr, 0, 1, dist_v, &init_b, > &index_carry)) >> return false; >> >> /* Save the distance vector if we initialized one. */ >> @@ -3466,12 +3634,11 @@ build_classic_dist_vector (struct data_d >> if (!lambda_vector_lexico_pos (dist_v, DDR_NB_LOOPS (ddr))) >> { >> lambda_vector save_v = lambda_vector_new (DDR_NB_LOOPS (ddr)); >> - if (!subscript_dependence_tester_1 (ddr, DDR_B (ddr), DDR_A (ddr), >> - loop_nest)) >> + if (!subscript_dependence_tester_1 (ddr, 1, 0, loop_nest)) >> return false; >> compute_subscript_distance (ddr); >> - if (!build_classic_dist_vector_1 (ddr, DDR_B (ddr), DDR_A (ddr), >> - save_v, &init_b, &index_carry)) >> + if (!build_classic_dist_vector_1 (ddr, 1, 0, save_v, &init_b, >> + &index_carry)) >> return false; >> save_dist_v (ddr, save_v); >> DDR_REVERSED_P (ddr) = true; >> @@ -3507,12 +3674,10 @@ build_classic_dist_vector (struct data_d >> { >> lambda_vector opposite_v = lambda_vector_new (DDR_NB_LOOPS (ddr)); >> >> - if (!subscript_dependence_tester_1 (ddr, DDR_B (ddr), >> - DDR_A (ddr), loop_nest)) >> + if (!subscript_dependence_tester_1 (ddr, 1, 0, loop_nest)) >> return false; >> compute_subscript_distance (ddr); >> - if (!build_classic_dist_vector_1 (ddr, DDR_B (ddr), DDR_A (ddr), >> - opposite_v, &init_b, >> + if (!build_classic_dist_vector_1 (ddr, 1, 0, opposite_v, &init_b, >> &index_carry)) >> return false; >> >> @@ -3591,13 +3756,13 @@ build_classic_dir_vector (struct data_de >> } >> } >> >> -/* Helper function. Returns true when there is a dependence between >> - data references DRA and DRB. */ >> +/* Helper function. Returns true when there is a dependence between the >> + data references. A_INDEX is the index of the first reference (0 for >> + DDR_A, 1 for DDR_B) and B_INDEX is the index of the second reference. */ >> >> static bool >> subscript_dependence_tester_1 (struct data_dependence_relation *ddr, >> - struct data_reference *dra, >> - struct data_reference *drb, >> + unsigned int a_index, unsigned int b_index, >> struct loop *loop_nest) >> { >> unsigned int i; >> @@ -3609,8 +3774,8 @@ subscript_dependence_tester_1 (struct da >> { >> conflict_function *overlaps_a, *overlaps_b; >> >> - analyze_overlapping_iterations (DR_ACCESS_FN (dra, i), >> - DR_ACCESS_FN (drb, i), >> + analyze_overlapping_iterations (SUB_ACCESS_FN (subscript, a_index), >> + SUB_ACCESS_FN (subscript, b_index), >> &overlaps_a, &overlaps_b, >> &last_conflicts, loop_nest); >> >> @@ -3659,7 +3824,7 @@ subscript_dependence_tester_1 (struct da >> subscript_dependence_tester (struct data_dependence_relation *ddr, >> struct loop *loop_nest) >> { >> - if (subscript_dependence_tester_1 (ddr, DDR_A (ddr), DDR_B (ddr), > loop_nest)) >> + if (subscript_dependence_tester_1 (ddr, 0, 1, loop_nest)) >> dependence_stats.num_dependence_dependent++; >> >> compute_subscript_distance (ddr); >> Index: gcc/tree-ssa-loop-prefetch.c >> =================================================================== >> --- gcc/tree-ssa-loop-prefetch.c 2017-03-28 16:19:28.000000000 +0100 >> +++ gcc/tree-ssa-loop-prefetch.c 2017-05-03 08:48:48.737038502 +0100 >> @@ -1650,6 +1650,7 @@ determine_loop_nest_reuse (struct loop * >> refb = (struct mem_ref *) DDR_B (dep)->aux; >> >> if (DDR_ARE_DEPENDENT (dep) == chrec_dont_know >> + || DDR_COULD_BE_INDEPENDENT_P (dep) >> || DDR_NUM_DIST_VECTS (dep) == 0) >> { >> /* If the dependence cannot be analyzed, assume that there might be > As said, we could avoid computing such information in the first place. > I can see predcom ingores it by explicitly checking DR_BASE_OBJECT, > what about tree-parloops.c? For parloops, it should help that we can now prove lack of dependence in more cases. Thanks, Richard

On Thu, May 4, 2017 at 11:06 AM, Richard Sandiford <richard.sandiford@linaro.org> wrote: > "Bin.Cheng" <amker.cheng@gmail.com> writes: >> On Wed, May 3, 2017 at 9:00 AM, Richard Sandiford >> <richard.sandiford@linaro.org> wrote: >>> Index: gcc/tree-data-ref.h >>> =================================================================== >>> --- gcc/tree-data-ref.h 2017-05-03 08:48:11.977015306 +0100 >>> +++ gcc/tree-data-ref.h 2017-05-03 08:48:48.737038502 +0100 >>> @@ -191,6 +191,9 @@ struct conflict_function >>> >>> struct subscript >>> { >>> + /* The access functions of the two references. */ >>> + tree access_fn[2]; >> Is it better to follow existing code, i.e, name this as >> access_fn_a/access_fn_b. Thus we don't need to use const value 0/1 in >> various places, which is a little bit confusing. > > [Answered below] > >>> + >>> /* A description of the iterations for which the elements are >>> accessed twice. */ >>> conflict_function *conflicting_iterations_in_a; >>> @@ -209,6 +212,7 @@ struct subscript >>> >>> typedef struct subscript *subscript_p; >>> >>> +#define SUB_ACCESS_FN(SUB, I) (SUB)->access_fn[I] >>> #define SUB_CONFLICTS_IN_A(SUB) (SUB)->conflicting_iterations_in_a >>> #define SUB_CONFLICTS_IN_B(SUB) (SUB)->conflicting_iterations_in_b >>> #define SUB_LAST_CONFLICT(SUB) (SUB)->last_conflict >>> @@ -264,6 +268,33 @@ struct data_dependence_relation >>> /* Set to true when the dependence relation is on the same data >>> access. */ >>> bool self_reference_p; >>> + >>> + /* True if the dependence described is conservatively correct rather >>> + than exact, and if it is still possible for the accesses to be >>> + conditionally independent. For example, the a and b references in: >>> + >>> + struct s *a, *b; >>> + for (int i = 0; i < n; ++i) >>> + a->f[i] += b->f[i]; >>> + >>> + conservatively have a distance vector of (0), for the case in which >>> + a == b, but the accesses are independent if a != b. Similarly, >>> + the a and b references in: >>> + >>> + struct s *a, *b; >>> + for (int i = 0; i < n; ++i) >>> + a[0].f[i] += b[i].f[i]; >>> + >>> + conservatively have a distance vector of (0), but they are indepenent >>> + when a != b + i. In contrast, the references in: >>> + >>> + struct s *a; >>> + for (int i = 0; i < n; ++i) >>> + a->f[i] += a->f[i]; >>> + >>> + have the same distance vector of (0), but the accesses can never be >>> + independent. */ >>> + bool could_be_independent_p; >>> }; >>> >>> typedef struct data_dependence_relation *ddr_p; >>> @@ -294,6 +325,7 @@ #define DDR_DIR_VECT(DDR, I) \ >>> #define DDR_DIST_VECT(DDR, I) \ >>> DDR_DIST_VECTS (DDR)[I] >>> #define DDR_REVERSED_P(DDR) (DDR)->reversed_p >>> +#define DDR_COULD_BE_INDEPENDENT_P(DDR) (DDR)->could_be_independent_p >>> >>> >>> bool dr_analyze_innermost (struct data_reference *, struct loop *); >>> @@ -372,22 +404,6 @@ same_data_refs (data_reference_p a, data >>> return false; >>> >>> return true; >>> -} >>> - >>> -/* Return true when the DDR contains two data references that have the >>> - same access functions. */ >>> - >>> -static inline bool >>> -same_access_functions (const struct data_dependence_relation *ddr) >>> -{ >>> - unsigned i; >>> - >>> - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) >>> - if (!eq_evolutions_p (DR_ACCESS_FN (DDR_A (ddr), i), >>> - DR_ACCESS_FN (DDR_B (ddr), i))) >>> - return false; >>> - >>> - return true; >>> } >>> >>> /* Returns true when all the dependences are computable. */ >>> Index: gcc/tree-data-ref.c >>> =================================================================== >>> --- gcc/tree-data-ref.c 2017-02-23 19:54:15.000000000 +0000 >>> +++ gcc/tree-data-ref.c 2017-05-03 08:48:48.737038502 +0100 >>> @@ -123,8 +123,7 @@ Software Foundation; either version 3, o >>> } dependence_stats; >>> >>> static bool subscript_dependence_tester_1 (struct data_dependence_relation *, >>> - struct data_reference *, >>> - struct data_reference *, >>> + unsigned int, unsigned int, >>> struct loop *); >> As mentioned, how about passing access_fn directly, rather than less >> meaningful 0/1 values? > > The problem is that access_fn is a property of the individual > subscripts, whereas this is operating on a full data_reference. > > One alternative would be to use conditions like: > > first_is_a ? SUB_ACCESS_FN_A (sub) : SUB_ACCESS_FN_B (sub) > > but IMO that's less readable than the existing: > > SUB_ACCESS_FN (sub, index) > > Or we could have individual access_fn arrays for A and B, separate > from the main subscript array, but that would mean allocating three > arrays instead of one. Thanks for explanation, I see the problem now. Even the latter sequence could be different for A and B, there should have the same number index? If that's the case, is it possible just recording the starting position (or length) in DR_ACCESS_FN and use that information to access to access_fn vector. This can save the copy in subscript. Anyway, this is not am important problem. > >>> /* Returns true iff A divides B. */ >>> >>> @@ -144,6 +143,21 @@ int_divides_p (int a, int b) >>> return ((b % a) == 0); >>> } >>> >>> +/* Return true if reference REF contains a union access. */ >>> + >>> +static bool >>> +ref_contains_union_access_p (tree ref) >>> +{ >>> + while (handled_component_p (ref)) >>> + { >>> + ref = TREE_OPERAND (ref, 0); >>> + if (TREE_CODE (TREE_TYPE (ref)) == UNION_TYPE >>> + || TREE_CODE (TREE_TYPE (ref)) == QUAL_UNION_TYPE) >>> + return true; >>> + } >>> + return false; >>> +} >>> + >>> >>> >>> /* Dump into FILE all the data references from DATAREFS. */ >>> @@ -433,13 +447,14 @@ dump_data_dependence_relation (FILE *out >>> unsigned int i; >>> struct loop *loopi; >>> >>> - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) >>> + subscript *sub; >>> + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) >>> { >>> fprintf (outf, " access_fn_A: "); >>> - print_generic_stmt (outf, DR_ACCESS_FN (dra, i), 0); >>> + print_generic_stmt (outf, SUB_ACCESS_FN (sub, 0), 0); >>> fprintf (outf, " access_fn_B: "); >>> - print_generic_stmt (outf, DR_ACCESS_FN (drb, i), 0); >>> - dump_subscript (outf, DDR_SUBSCRIPT (ddr, i)); >>> + print_generic_stmt (outf, SUB_ACCESS_FN (sub, 1), 0); >>> + dump_subscript (outf, sub); >>> } >>> >>> fprintf (outf, " inner loop index: %d\n", DDR_INNER_LOOP (ddr)); >>> @@ -1484,11 +1499,10 @@ initialize_data_dependence_relation (str >>> struct data_dependence_relation *res; >>> unsigned int i; >>> >>> - res = XNEW (struct data_dependence_relation); >>> + res = XCNEW (struct data_dependence_relation); >>> DDR_A (res) = a; >>> DDR_B (res) = b; >>> DDR_LOOP_NEST (res).create (0); >>> - DDR_REVERSED_P (res) = false; >>> DDR_SUBSCRIPTS (res).create (0); >>> DDR_DIR_VECTS (res).create (0); >>> DDR_DIST_VECTS (res).create (0); >>> @@ -1506,82 +1520,217 @@ initialize_data_dependence_relation (str >>> return res; >>> } >>> >>> - /* The case where the references are exactly the same. */ >>> - if (operand_equal_p (DR_REF (a), DR_REF (b), 0)) >>> + unsigned int num_dimensions_a = DR_NUM_DIMENSIONS (a); >>> + unsigned int num_dimensions_b = DR_NUM_DIMENSIONS (b); >>> + if (num_dimensions_a == 0 || num_dimensions_b == 0) >>> { >>> - if ((loop_nest.exists () >>> - && !object_address_invariant_in_loop_p (loop_nest[0], >>> - DR_BASE_OBJECT (a))) >>> - || DR_NUM_DIMENSIONS (a) == 0) >>> - { >>> - DDR_ARE_DEPENDENT (res) = chrec_dont_know; >>> - return res; >>> - } >>> - DDR_AFFINE_P (res) = true; >>> - DDR_ARE_DEPENDENT (res) = NULL_TREE; >>> - DDR_SUBSCRIPTS (res).create (DR_NUM_DIMENSIONS (a)); >>> - DDR_LOOP_NEST (res) = loop_nest; >>> - DDR_INNER_LOOP (res) = 0; >>> - DDR_SELF_REFERENCE (res) = true; >>> - for (i = 0; i < DR_NUM_DIMENSIONS (a); i++) >>> - { >>> - struct subscript *subscript; >>> + DDR_ARE_DEPENDENT (res) = chrec_dont_know; >>> + return res; >>> + } >>> + >>> + /* For unconstrained bases, the outer (highest-index) subscript >>> + describes a variation in the base of the original DR_REF rather >>> + than a component access. We have no type that accurately describes >>> + the new DR_BASE_OBJECT (whose TREE_TYPE describes the type *after* >>> + applying the outer subscript) so limit the search to the last real >>> + component access. >>> + >>> + E.g. for: >>> >>> - subscript = XNEW (struct subscript); >>> - SUB_CONFLICTS_IN_A (subscript) = conflict_fn_not_known (); >>> - SUB_CONFLICTS_IN_B (subscript) = conflict_fn_not_known (); >>> - SUB_LAST_CONFLICT (subscript) = chrec_dont_know; >>> - SUB_DISTANCE (subscript) = chrec_dont_know; >>> - DDR_SUBSCRIPTS (res).safe_push (subscript); >>> + void >>> + f (int a[][8], int b[][8]) >>> + { >>> + for (int i = 0; i < 8; ++i) >>> + a[i * 2][0] = b[i][0]; >>> } >>> - return res; >>> + >>> + the a and b accesses have a single ARRAY_REF component reference [0] >>> + but have two subscripts. */ >>> + if (DR_UNCONSTRAINED_BASE (a)) >>> + num_dimensions_a -= 1; >>> + if (DR_UNCONSTRAINED_BASE (b)) >>> + num_dimensions_b -= 1; >>> + >>> + /* Now look for two sequences of component references that have the same >>> + type in both A and B. The first sequence includes an arbitrary mixture >>> + of array and structure references while the second always ends on a >>> + structure reference. >>> + >>> + The former (arbitrary) sequence uses access functions: >>> + >>> + [START_A, START_A + NUM_DIMENSIONS) of A >>> + [START_B, START_B + NUM_DIMENSIONS) of B >>> + >>> + The latter sequence uses access functions: >>> + >>> + [STRUCT_START_A, STRUCT_START_A + STRUCT_NUM_DIMENSIONS) of A >>> + [STRUCT_START_B, STRUCT_START_B + STRUCT_NUM_DIMENSIONS) of B >>> + >>> + STRUCT_REF_A and STRUCT_REF_B are the outer references for the >> IIUC, A and B always share the same latter sequence, and the common >> latter sequence ends at a structure reference providing alias >> information. > > The A and B accesses aren't necessarily the same, they just have the > compatible types. E.g. for: > > struct s { int x[8]; int y[8]; } *a, *b; > > ... a->x[0] = b->y[1] ... > > the sequence would include: > > a: [0] .x > b: [1] .y I see. > >> Is it possible to record the the former arbitrary >> references instead of simple flag DDR_COULD_BE_INDEPENDENT_P. With >> this information, alias check can be simplified by stripping away >> address computation for the shared common sub-sequence. I doubt >> vect_create_cond_for_alias_checks could detect this kind CSE for now. >> Ah, I see you changed alias check code generation in order to handle >> this. > > The num_dimensions sequence is only used if it ends at the original > base and if the bases are equal. In other cases it doesn't really help. > The struct_num_dimensions sequence is meant to be the one that is > helpful even when the bases aren't equal. > > Like you say, there's a follow-on patch that uses this for runtime > alias checking. > >>> + latter sequence. */ >>> + unsigned int start_a = 0; >>> + unsigned int start_b = 0; >>> + unsigned int num_dimensions = 0; >>> + unsigned int struct_start_a = 0; >>> + unsigned int struct_start_b = 0; >>> + unsigned int struct_num_dimensions = 0; >>> + unsigned int index_a = 0; >>> + unsigned int index_b = 0; >>> + tree next_ref_a = DR_REF (a); >>> + tree next_ref_b = DR_REF (b); >>> + tree struct_ref_a = NULL_TREE; >>> + tree struct_ref_b = NULL_TREE; >>> + while (index_a < num_dimensions_a && index_b < num_dimensions_b) >>> + { >>> + gcc_checking_assert (handled_component_p (next_ref_a)); >>> + gcc_checking_assert (handled_component_p (next_ref_b)); >>> + tree outer_ref_a = TREE_OPERAND (next_ref_a, 0); >>> + tree outer_ref_b = TREE_OPERAND (next_ref_b, 0); >>> + tree type_a = TREE_TYPE (outer_ref_a); >>> + tree type_b = TREE_TYPE (outer_ref_b); >>> + if (types_compatible_p (type_a, type_b)) >>> + { >>> + /* This pair of accesses belong to a suitable sequence. */ >>> + if (start_a + num_dimensions != index_a >>> + || start_b + num_dimensions != index_b) >>> + { >>> + /* Start a new sequence here. */ >>> + start_a = index_a; >>> + start_b = index_b; >>> + num_dimensions = 0; >>> + } >>> + num_dimensions += 1; >>> + if (TREE_CODE (type_a) == RECORD_TYPE) >>> + { >>> + struct_start_a = start_a; >>> + struct_start_b = start_b; >>> + struct_num_dimensions = num_dimensions; >>> + struct_ref_a = outer_ref_a; >>> + struct_ref_b = outer_ref_b; >>> + } >>> + next_ref_a = outer_ref_a; >>> + next_ref_b = outer_ref_b; >>> + index_a += 1; >>> + index_b += 1; >>> + continue; >>> + } >>> + /* Try to approach equal type sizes. */ >>> + if (!COMPLETE_TYPE_P (type_a) >>> + || !COMPLETE_TYPE_P (type_b) >>> + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_a)) >>> + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_b))) >>> + break; >>> + unsigned HOST_WIDE_INT size_a = tree_to_uhwi (TYPE_SIZE_UNIT (type_a)); >>> + unsigned HOST_WIDE_INT size_b = tree_to_uhwi (TYPE_SIZE_UNIT (type_b)); >>> + if (size_a <= size_b) >>> + { >>> + index_a += 1; >>> + next_ref_a = outer_ref_a; >>> + } >>> + if (size_b <= size_a) >>> + { >>> + index_b += 1; >>> + next_ref_b = outer_ref_b; >>> + } >>> } >>> >>> - /* If the references do not access the same object, we do not know >>> - whether they alias or not. We do not care about TBAA or alignment >>> - info so we can use OEP_ADDRESS_OF to avoid false negatives. >>> - But the accesses have to use compatible types as otherwise the >>> - built indices would not match. */ >>> - if (!operand_equal_p (DR_BASE_OBJECT (a), DR_BASE_OBJECT (b), >> OEP_ADDRESS_OF) >>> - || !types_compatible_p (TREE_TYPE (DR_BASE_OBJECT (a)), >>> - TREE_TYPE (DR_BASE_OBJECT (b)))) >>> + /* See whether the sequence ends at the base and whether the two bases >>> + are equal. We do not care about TBAA or alignment info so we can use >>> + OEP_ADDRESS_OF to avoid false negatives. */ >>> + tree base_a = DR_BASE_OBJECT (a); >>> + tree base_b = DR_BASE_OBJECT (b); >>> + bool same_base_p = (start_a + num_dimensions == num_dimensions_a >>> + && start_b + num_dimensions == num_dimensions_b >>> + && DR_UNCONSTRAINED_BASE (a) == DR_UNCONSTRAINED_BASE (b) >>> + && operand_equal_p (base_a, base_b, OEP_ADDRESS_OF) >>> + && types_compatible_p (TREE_TYPE (base_a), >>> + TREE_TYPE (base_b)) >>> + && (!loop_nest.exists () >>> + || (object_address_invariant_in_loop_p >>> + (loop_nest[0], base_a)))); >> Major change is in function initialize_data_dependence_relation in >> order to detect partial alias opportunity. The original equality >> check on DR_BASE_OBJECT is bypassed now. IMHO better to introduce a >> new parameter to compute_data_reference_for_loop etc., indicating >> whether we want to handle partial alias opportunity or not. After >> all, such computation is unnecessary for predcom/prefetch/parloop. >> It's only a waste of time computing it. > > Well, it also means that we can now prove the accesses are independent > in more cases. E.g. previously we would assume the a and b accesses in: Predcom cares about dependent refs with constant distance, so independent (neither possible dependent) information based on partial alias is not interested. > > struct s { int x[16]; } *a, *b; > for (int i = 0; i < 8; ++i) > a->x[i] = b->x[i + 8]; > > could conflict. > > If callers don't need to know what the relationship between a and b is, > I think they should check for that before going through the process of > initialising and analysing the ddr. This I don't think so. Users don't have the information to pre-check/analyze reference pair. Even it can do that by repeating most work as in data-ref-analyzer, it sounds not a good practice. That's exactly analyzer's job and the reason why interfaces like compute_data_dependence_for_loop are introduced. It doesn't make much sense requiring users to do additional analysis before looking for help from data-ref-analyzer. Thanks, bin > >>> + >>> + /* If the bases are the same, we can include the base variation too. >>> + E.g. the b accesses in: >>> + >>> + for (int i = 0; i < n; ++i) >>> + b[i + 4][0] = b[i][0]; >>> + >>> + have a definite dependence distance of 4, while for: >>> + >>> + for (int i = 0; i < n; ++i) >>> + a[i + 4][0] = b[i][0]; >>> + >>> + the dependence distance depends on the gap between a and b. >>> + >>> + If the bases are different then we can only rely on the sequence >>> + rooted at a structure access, since arrays are allowed to overlap >>> + arbitrarily and change shape arbitrarily. E.g. we treat this as >>> + valid code: >>> + >>> + int a[256]; >>> + ... >>> + ((int (*)[4][3])&a[1])[i][0] += ((int (*)[4][3])&a[2])[i][0]; >>> + >>> + where two lvalues with the same int[4][3] type overlap, and where >>> + both lvalues are distinct from the object's declared type. */ >>> + if (same_base_p) >>> { >>> - DDR_ARE_DEPENDENT (res) = chrec_dont_know; >>> - return res; >>> + if (DR_UNCONSTRAINED_BASE (a)) >>> + num_dimensions += 1; >>> + } >>> + else >>> + { >>> + start_a = struct_start_a; >>> + start_b = struct_start_b; >>> + num_dimensions = struct_num_dimensions; >>> } >>> >>> - /* If the base of the object is not invariant in the loop nest, we cannot >>> - analyze it. TODO -- in fact, it would suffice to record that there may >>> - be arbitrary dependences in the loops where the base object varies. */ >>> - if ((loop_nest.exists () >>> - && !object_address_invariant_in_loop_p (loop_nest[0], DR_BASE_OBJECT >> (a))) >>> - || DR_NUM_DIMENSIONS (a) == 0) >>> + /* Punt if we didn't find a suitable sequence. */ >>> + if (num_dimensions == 0) >>> { >>> DDR_ARE_DEPENDENT (res) = chrec_dont_know; >>> return res; >>> } >>> >>> - /* If the number of dimensions of the access to not agree we can have >>> - a pointer access to a component of the array element type and an >>> - array access while the base-objects are still the same. Punt. */ >>> - if (DR_NUM_DIMENSIONS (a) != DR_NUM_DIMENSIONS (b)) >>> + if (!same_base_p) >>> { >>> - DDR_ARE_DEPENDENT (res) = chrec_dont_know; >>> - return res; >>> + /* Partial overlap is possible for different bases when strict aliasing >>> + is not in effect. It's also possible if either base involves a union >>> + access; e.g. for: >>> + >>> + struct s1 { int a[2]; }; >>> + struct s2 { struct s1 b; int c; }; >>> + struct s3 { int d; struct s1 e; }; >>> + union u { struct s2 f; struct s3 g; } *p, *q; >>> + >>> + the s1 at "p->f.b" (base "p->f") partially overlaps the s1 at >>> + "p->g.e" (base "p->g") and might partially overlap the s1 at >>> + "q->g.e" (base "q->g"). */ >>> + if (!flag_strict_aliasing >>> + || ref_contains_union_access_p (struct_ref_a) >>> + || ref_contains_union_access_p (struct_ref_b)) >>> + { >>> + DDR_ARE_DEPENDENT (res) = chrec_dont_know; >>> + return res; >>> + } >>> + >>> + DDR_COULD_BE_INDEPENDENT_P (res) = true; >>> } >>> >>> DDR_AFFINE_P (res) = true; >>> DDR_ARE_DEPENDENT (res) = NULL_TREE; >>> - DDR_SUBSCRIPTS (res).create (DR_NUM_DIMENSIONS (a)); >>> + DDR_SUBSCRIPTS (res).create (num_dimensions); >>> DDR_LOOP_NEST (res) = loop_nest; >>> DDR_INNER_LOOP (res) = 0; >>> DDR_SELF_REFERENCE (res) = false; >>> >>> - for (i = 0; i < DR_NUM_DIMENSIONS (a); i++) >>> + for (i = 0; i < num_dimensions; ++i) >>> { >>> struct subscript *subscript; >>> >>> subscript = XNEW (struct subscript); >>> + SUB_ACCESS_FN (subscript, 0) = DR_ACCESS_FN (a, start_a + i); >>> + SUB_ACCESS_FN (subscript, 1) = DR_ACCESS_FN (b, start_b + i); >>> SUB_CONFLICTS_IN_A (subscript) = conflict_fn_not_known (); >>> SUB_CONFLICTS_IN_B (subscript) = conflict_fn_not_known (); >>> SUB_LAST_CONFLICT (subscript) = chrec_dont_know; >>> @@ -3163,14 +3312,15 @@ add_outer_distances (struct data_depende >>> } >>> >>> /* Return false when fail to represent the data dependence as a >>> - distance vector. INIT_B is set to true when a component has been >>> + distance vector. A_INDEX is the index of the first reference >>> + (0 for DDR_A, 1 for DDR_B) and B_INDEX is the index of the >>> + second reference. INIT_B is set to true when a component has been >>> added to the distance vector DIST_V. INDEX_CARRY is then set to >>> the index in DIST_V that carries the dependence. */ >>> >>> static bool >>> build_classic_dist_vector_1 (struct data_dependence_relation *ddr, >>> - struct data_reference *ddr_a, >>> - struct data_reference *ddr_b, >>> + unsigned int a_index, unsigned int b_index, >>> lambda_vector dist_v, bool *init_b, >>> int *index_carry) >>> { >>> @@ -3188,8 +3338,8 @@ build_classic_dist_vector_1 (struct data >>> return false; >>> } >>> >>> - access_fn_a = DR_ACCESS_FN (ddr_a, i); >>> - access_fn_b = DR_ACCESS_FN (ddr_b, i); >>> + access_fn_a = SUB_ACCESS_FN (subscript, a_index); >>> + access_fn_b = SUB_ACCESS_FN (subscript, b_index); >>> >>> if (TREE_CODE (access_fn_a) == POLYNOMIAL_CHREC >>> && TREE_CODE (access_fn_b) == POLYNOMIAL_CHREC) >>> @@ -3249,10 +3399,11 @@ build_classic_dist_vector_1 (struct data >>> constant_access_functions (const struct data_dependence_relation *ddr) >>> { >>> unsigned i; >>> + subscript *sub; >>> >>> - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) >>> - if (!evolution_function_is_constant_p (DR_ACCESS_FN (DDR_A (ddr), i)) >>> - || !evolution_function_is_constant_p (DR_ACCESS_FN (DDR_B (ddr), i))) >>> + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) >>> + if (!evolution_function_is_constant_p (SUB_ACCESS_FN (sub, 0)) >>> + || !evolution_function_is_constant_p (SUB_ACCESS_FN (sub, 1))) >>> return false; >>> >>> return true; >>> @@ -3315,10 +3466,11 @@ add_other_self_distances (struct data_de >>> lambda_vector dist_v; >>> unsigned i; >>> int index_carry = DDR_NB_LOOPS (ddr); >>> + subscript *sub; >>> >>> - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) >>> + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) >>> { >>> - tree access_fun = DR_ACCESS_FN (DDR_A (ddr), i); >>> + tree access_fun = SUB_ACCESS_FN (sub, 0); >>> >>> if (TREE_CODE (access_fun) == POLYNOMIAL_CHREC) >>> { >>> @@ -3330,7 +3482,7 @@ add_other_self_distances (struct data_de >>> return; >>> } >>> >>> - access_fun = DR_ACCESS_FN (DDR_A (ddr), 0); >>> + access_fun = SUB_ACCESS_FN (DDR_SUBSCRIPT (ddr, 0), 0); >>> >>> if (TREE_CODE (CHREC_LEFT (access_fun)) == POLYNOMIAL_CHREC) >>> add_multivariate_self_dist (ddr, access_fun); >>> @@ -3401,6 +3553,23 @@ add_distance_for_zero_overlaps (struct d >>> } >>> } >>> >>> +/* Return true when the DDR contains two data references that have the >>> + same access functions. */ >>> + >>> +static inline bool >>> +same_access_functions (const struct data_dependence_relation *ddr) >>> +{ >>> + unsigned i; >>> + subscript *sub; >>> + >>> + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) >>> + if (!eq_evolutions_p (SUB_ACCESS_FN (sub, 0), >>> + SUB_ACCESS_FN (sub, 1))) >>> + return false; >>> + >>> + return true; >>> +} >>> + >>> /* Compute the classic per loop distance vector. DDR is the data >>> dependence relation to build a vector from. Return false when fail >>> to represent the data dependence as a distance vector. */ >>> @@ -3432,8 +3601,7 @@ build_classic_dist_vector (struct data_d >>> } >>> >>> dist_v = lambda_vector_new (DDR_NB_LOOPS (ddr)); >>> - if (!build_classic_dist_vector_1 (ddr, DDR_A (ddr), DDR_B (ddr), >>> - dist_v, &init_b, &index_carry)) >>> + if (!build_classic_dist_vector_1 (ddr, 0, 1, dist_v, &init_b, >> &index_carry)) >>> return false; >>> >>> /* Save the distance vector if we initialized one. */ >>> @@ -3466,12 +3634,11 @@ build_classic_dist_vector (struct data_d >>> if (!lambda_vector_lexico_pos (dist_v, DDR_NB_LOOPS (ddr))) >>> { >>> lambda_vector save_v = lambda_vector_new (DDR_NB_LOOPS (ddr)); >>> - if (!subscript_dependence_tester_1 (ddr, DDR_B (ddr), DDR_A (ddr), >>> - loop_nest)) >>> + if (!subscript_dependence_tester_1 (ddr, 1, 0, loop_nest)) >>> return false; >>> compute_subscript_distance (ddr); >>> - if (!build_classic_dist_vector_1 (ddr, DDR_B (ddr), DDR_A (ddr), >>> - save_v, &init_b, &index_carry)) >>> + if (!build_classic_dist_vector_1 (ddr, 1, 0, save_v, &init_b, >>> + &index_carry)) >>> return false; >>> save_dist_v (ddr, save_v); >>> DDR_REVERSED_P (ddr) = true; >>> @@ -3507,12 +3674,10 @@ build_classic_dist_vector (struct data_d >>> { >>> lambda_vector opposite_v = lambda_vector_new (DDR_NB_LOOPS (ddr)); >>> >>> - if (!subscript_dependence_tester_1 (ddr, DDR_B (ddr), >>> - DDR_A (ddr), loop_nest)) >>> + if (!subscript_dependence_tester_1 (ddr, 1, 0, loop_nest)) >>> return false; >>> compute_subscript_distance (ddr); >>> - if (!build_classic_dist_vector_1 (ddr, DDR_B (ddr), DDR_A (ddr), >>> - opposite_v, &init_b, >>> + if (!build_classic_dist_vector_1 (ddr, 1, 0, opposite_v, &init_b, >>> &index_carry)) >>> return false; >>> >>> @@ -3591,13 +3756,13 @@ build_classic_dir_vector (struct data_de >>> } >>> } >>> >>> -/* Helper function. Returns true when there is a dependence between >>> - data references DRA and DRB. */ >>> +/* Helper function. Returns true when there is a dependence between the >>> + data references. A_INDEX is the index of the first reference (0 for >>> + DDR_A, 1 for DDR_B) and B_INDEX is the index of the second reference. */ >>> >>> static bool >>> subscript_dependence_tester_1 (struct data_dependence_relation *ddr, >>> - struct data_reference *dra, >>> - struct data_reference *drb, >>> + unsigned int a_index, unsigned int b_index, >>> struct loop *loop_nest) >>> { >>> unsigned int i; >>> @@ -3609,8 +3774,8 @@ subscript_dependence_tester_1 (struct da >>> { >>> conflict_function *overlaps_a, *overlaps_b; >>> >>> - analyze_overlapping_iterations (DR_ACCESS_FN (dra, i), >>> - DR_ACCESS_FN (drb, i), >>> + analyze_overlapping_iterations (SUB_ACCESS_FN (subscript, a_index), >>> + SUB_ACCESS_FN (subscript, b_index), >>> &overlaps_a, &overlaps_b, >>> &last_conflicts, loop_nest); >>> >>> @@ -3659,7 +3824,7 @@ subscript_dependence_tester_1 (struct da >>> subscript_dependence_tester (struct data_dependence_relation *ddr, >>> struct loop *loop_nest) >>> { >>> - if (subscript_dependence_tester_1 (ddr, DDR_A (ddr), DDR_B (ddr), >> loop_nest)) >>> + if (subscript_dependence_tester_1 (ddr, 0, 1, loop_nest)) >>> dependence_stats.num_dependence_dependent++; >>> >>> compute_subscript_distance (ddr); >>> Index: gcc/tree-ssa-loop-prefetch.c >>> =================================================================== >>> --- gcc/tree-ssa-loop-prefetch.c 2017-03-28 16:19:28.000000000 +0100 >>> +++ gcc/tree-ssa-loop-prefetch.c 2017-05-03 08:48:48.737038502 +0100 >>> @@ -1650,6 +1650,7 @@ determine_loop_nest_reuse (struct loop * >>> refb = (struct mem_ref *) DDR_B (dep)->aux; >>> >>> if (DDR_ARE_DEPENDENT (dep) == chrec_dont_know >>> + || DDR_COULD_BE_INDEPENDENT_P (dep) >>> || DDR_NUM_DIST_VECTS (dep) == 0) >>> { >>> /* If the dependence cannot be analyzed, assume that there might be >> As said, we could avoid computing such information in the first place. >> I can see predcom ingores it by explicitly checking DR_BASE_OBJECT, >> what about tree-parloops.c? > > For parloops, it should help that we can now prove lack of dependence > in more cases. > > Thanks, > Richard

"Bin.Cheng" <amker.cheng@gmail.com> writes: > On Thu, May 4, 2017 at 11:06 AM, Richard Sandiford > <richard.sandiford@linaro.org> wrote: >> "Bin.Cheng" <amker.cheng@gmail.com> writes: >>> On Wed, May 3, 2017 at 9:00 AM, Richard Sandiford >>> <richard.sandiford@linaro.org> wrote: >>>> Index: gcc/tree-data-ref.c >>>> =================================================================== >>>> --- gcc/tree-data-ref.c 2017-02-23 19:54:15.000000000 +0000 >>>> +++ gcc/tree-data-ref.c 2017-05-03 08:48:48.737038502 +0100 >>>> @@ -123,8 +123,7 @@ Software Foundation; either version 3, o >>>> } dependence_stats; >>>> >>>> static bool subscript_dependence_tester_1 (struct > data_dependence_relation *, >>>> - struct data_reference *, >>>> - struct data_reference *, >>>> + unsigned int, unsigned int, >>>> struct loop *); >>> As mentioned, how about passing access_fn directly, rather than less >>> meaningful 0/1 values? >> >> The problem is that access_fn is a property of the individual >> subscripts, whereas this is operating on a full data_reference. >> >> One alternative would be to use conditions like: >> >> first_is_a ? SUB_ACCESS_FN_A (sub) : SUB_ACCESS_FN_B (sub) >> >> but IMO that's less readable than the existing: >> >> SUB_ACCESS_FN (sub, index) >> >> Or we could have individual access_fn arrays for A and B, separate >> from the main subscript array, but that would mean allocating three >> arrays instead of one. > Thanks for explanation, I see the problem now. Even the latter > sequence could be different for A and B, there should have the same > number index? If that's the case, is it possible just recording the > starting position (or length) in DR_ACCESS_FN and use that information > to access to access_fn vector. This can save the copy in subscript. > Anyway, this is not am important problem. I think that would mean trading fields in the subscript for fields in the main ddr structure. One advantage of doing it in the subscript is that those are freed after the analysis is complete, whereas the ddr stays around until the caller has finished with it. >>>> + latter sequence. */ >>>> + unsigned int start_a = 0; >>>> + unsigned int start_b = 0; >>>> + unsigned int num_dimensions = 0; >>>> + unsigned int struct_start_a = 0; >>>> + unsigned int struct_start_b = 0; >>>> + unsigned int struct_num_dimensions = 0; >>>> + unsigned int index_a = 0; >>>> + unsigned int index_b = 0; >>>> + tree next_ref_a = DR_REF (a); >>>> + tree next_ref_b = DR_REF (b); >>>> + tree struct_ref_a = NULL_TREE; >>>> + tree struct_ref_b = NULL_TREE; >>>> + while (index_a < num_dimensions_a && index_b < num_dimensions_b) >>>> + { >>>> + gcc_checking_assert (handled_component_p (next_ref_a)); >>>> + gcc_checking_assert (handled_component_p (next_ref_b)); >>>> + tree outer_ref_a = TREE_OPERAND (next_ref_a, 0); >>>> + tree outer_ref_b = TREE_OPERAND (next_ref_b, 0); >>>> + tree type_a = TREE_TYPE (outer_ref_a); >>>> + tree type_b = TREE_TYPE (outer_ref_b); >>>> + if (types_compatible_p (type_a, type_b)) >>>> + { >>>> + /* This pair of accesses belong to a suitable sequence. */ >>>> + if (start_a + num_dimensions != index_a >>>> + || start_b + num_dimensions != index_b) >>>> + { >>>> + /* Start a new sequence here. */ >>>> + start_a = index_a; >>>> + start_b = index_b; >>>> + num_dimensions = 0; >>>> + } >>>> + num_dimensions += 1; >>>> + if (TREE_CODE (type_a) == RECORD_TYPE) >>>> + { >>>> + struct_start_a = start_a; >>>> + struct_start_b = start_b; >>>> + struct_num_dimensions = num_dimensions; >>>> + struct_ref_a = outer_ref_a; >>>> + struct_ref_b = outer_ref_b; >>>> + } >>>> + next_ref_a = outer_ref_a; >>>> + next_ref_b = outer_ref_b; >>>> + index_a += 1; >>>> + index_b += 1; >>>> + continue; >>>> + } >>>> + /* Try to approach equal type sizes. */ >>>> + if (!COMPLETE_TYPE_P (type_a) >>>> + || !COMPLETE_TYPE_P (type_b) >>>> + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_a)) >>>> + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_b))) >>>> + break; >>>> + unsigned HOST_WIDE_INT size_a = tree_to_uhwi (TYPE_SIZE_UNIT > (type_a)); >>>> + unsigned HOST_WIDE_INT size_b = tree_to_uhwi (TYPE_SIZE_UNIT > (type_b)); >>>> + if (size_a <= size_b) >>>> + { >>>> + index_a += 1; >>>> + next_ref_a = outer_ref_a; >>>> + } >>>> + if (size_b <= size_a) >>>> + { >>>> + index_b += 1; >>>> + next_ref_b = outer_ref_b; >>>> + } >>>> } >>>> >>>> - /* If the references do not access the same object, we do not know >>>> - whether they alias or not. We do not care about TBAA or alignment >>>> - info so we can use OEP_ADDRESS_OF to avoid false negatives. >>>> - But the accesses have to use compatible types as otherwise the >>>> - built indices would not match. */ >>>> - if (!operand_equal_p (DR_BASE_OBJECT (a), DR_BASE_OBJECT (b), >>> OEP_ADDRESS_OF) >>>> - || !types_compatible_p (TREE_TYPE (DR_BASE_OBJECT (a)), >>>> - TREE_TYPE (DR_BASE_OBJECT (b)))) >>>> + /* See whether the sequence ends at the base and whether the two bases >>>> + are equal. We do not care about TBAA or alignment info so we can use >>>> + OEP_ADDRESS_OF to avoid false negatives. */ >>>> + tree base_a = DR_BASE_OBJECT (a); >>>> + tree base_b = DR_BASE_OBJECT (b); >>>> + bool same_base_p = (start_a + num_dimensions == num_dimensions_a >>>> + && start_b + num_dimensions == num_dimensions_b >>>> + && DR_UNCONSTRAINED_BASE (a) == DR_UNCONSTRAINED_BASE (b) >>>> + && operand_equal_p (base_a, base_b, OEP_ADDRESS_OF) >>>> + && types_compatible_p (TREE_TYPE (base_a), >>>> + TREE_TYPE (base_b)) >>>> + && (!loop_nest.exists () >>>> + || (object_address_invariant_in_loop_p >>>> + (loop_nest[0], base_a)))); >>> Major change is in function initialize_data_dependence_relation in >>> order to detect partial alias opportunity. The original equality >>> check on DR_BASE_OBJECT is bypassed now. IMHO better to introduce a >>> new parameter to compute_data_reference_for_loop etc., indicating >>> whether we want to handle partial alias opportunity or not. After >>> all, such computation is unnecessary for predcom/prefetch/parloop. >>> It's only a waste of time computing it. >> >> Well, it also means that we can now prove the accesses are independent >> in more cases. E.g. previously we would assume the a and b accesses in: > Predcom cares about dependent refs with constant distance, so > independent (neither possible dependent) information based on partial > alias is not interested. Yeah, but it checks for that itself, based on the individual data references. E.g.: if (!operand_equal_p (DR_STEP (a), DR_STEP (b), 0) || !operand_equal_p (DR_BASE_ADDRESS (a), DR_BASE_ADDRESS (b), 0)) return false; This condition is independent of the ddr. My point is that if we wanted to avoid doing redundant work in this kind of situation, we should reorganise things so that we filter uninteresting pairs of references out *before* creating the ddr. Even if we passed a flag down to initialize_data_dependence_relation, we'd still end up with a ddr for the uninteresting pairs, just like we have now. And if you're using compute_all_dependences, those pairs still count towards PARAM_LOOP_MAX_DATAREFS_FOR_DATADEPS. (I think the only thing predcom uses the ddr for is to check for relations that are known to be independent. So creating specific distance and direction vectors is itself redundant work.) >> >> struct s { int x[16]; } *a, *b; >> for (int i = 0; i < 8; ++i) >> a->x[i] = b->x[i + 8]; >> >> could conflict. >> >> If callers don't need to know what the relationship between a and b is, >> I think they should check for that before going through the process of >> initialising and analysing the ddr. > This I don't think so. Users don't have the information to > pre-check/analyze reference pair. Even it can do that by repeating > most work as in data-ref-analyzer, it sounds not a good practice. > That's exactly analyzer's job and the reason why interfaces like > compute_data_dependence_for_loop are introduced. It doesn't make much > sense requiring users to do additional analysis before looking for > help from data-ref-analyzer. (The reply above was for this too.) Thanks, Richard

On Wed, May 3, 2017 at 10:00 AM, Richard Sandiford <richard.sandiford@linaro.org> wrote: > This patch tries to calculate conservatively-correct distance > vectors for two references whose base addresses are not the same. > It sets a new flag DDR_COULD_BE_INDEPENDENT_P if the dependence > isn't guaranteed to occur. > > The motivating example is: > > struct s { int x[8]; }; > void > f (struct s *a, struct s *b) > { > for (int i = 0; i < 8; ++i) > a->x[i] += b->x[i]; > } > > in which the "a" and "b" accesses are either independent or have a > dependence distance of 0 (assuming -fstrict-aliasing). Neither case > prevents vectorisation, so we can vectorise without an alias check. > > I'd originally wanted to do the same thing for arrays as well, e.g.: > > void > f (int a[][8], struct b[][8]) > { > for (int i = 0; i < 8; ++i) > a[0][i] += b[0][i]; > } > > I think this is valid because C11 6.7.6.2/6 says: > > For two array types to be compatible, both shall have compatible > element types, and if both size specifiers are present, and are > integer constant expressions, then both size specifiers shall have > the same constant value. > > So if we access an array through an int (*)[8], it must have type X[8] > or X[], where X is compatible with int. It doesn't seem possible in > either case for "a[0]" and "b[0]" to overlap when "a != b". > > However, Richard B said that (at least in gimple) we support arbitrary > overlap of arrays and allow arrays to be accessed with different > dimensionality. There are examples of this in PR50067. I've therefore > only handled references that end in a structure field access. > > There are two ways of handling these dependences in the vectoriser: > use them to limit VF, or check at runtime as before. I've gone for > the approach of checking at runtime if we can, to avoid limiting VF > unnecessarily. We still fall back to a VF cap when runtime checks > aren't allowed. > > The patch tests whether we queued an alias check with a dependence > distance of X and then picked a VF <= X, in which case it's safe to > drop the alias check. Since vect_prune_runtime_alias_check_list can > be called twice with different VF for the same loop, it's no longer > safe to clear may_alias_ddrs on exit. Instead we should use > comp_alias_ddrs to check whether versioning is necessary. > > Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install? You seem to do your "fancy" thing but also later compute the old base equality anyway (for same_base_p). It looks to me for this case the new fancy code can be simply skipped, keeping num_dimensions as before? + /* Try to approach equal type sizes. */ + if (!COMPLETE_TYPE_P (type_a) + || !COMPLETE_TYPE_P (type_b) + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_a)) + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_b))) + break; ah, interesting idea to avoid a quadratic search. Note that you should conservatively handle both BIT_FIELD_REF and VIEW_CONVERT_EXPR as they are used for type-punning. I see nonoverlapping_component_refs_of_decl_p should simply skip ARRAY_REFs - but I also see there: /* ??? We cannot simply use the type of operand #0 of the refs here as the Fortran compiler smuggles type punning into COMPONENT_REFs for common blocks instead of using unions like everyone else. */ tree type1 = DECL_CONTEXT (field1); tree type2 = DECL_CONTEXT (field2); so you probably can't simply use TREE_TYPE (outer_ref) for type compatibility. You also may not use types_compatible_p here as for LTO that is _way_ too lax for aggregates. The above uses /* We cannot disambiguate fields in a union or qualified union. */ if (type1 != type2 || TREE_CODE (type1) != RECORD_TYPE) return false; so you should also bail out on unions here, rather than the check you do later. You seem to rely on getting an access_fn entry for each handled_component_p. It looks like this is the case -- we even seem to stop at unions (with the same fortran "issue"). I'm not sure that's the best thing to do but you rely on that. I don't understand the looping, it needs more comments. You seem to be looking for the innermost compatible RECORD_TYPE but then num_dimensions is how many compatible refs you found on the way (with incompatible ones not counting?!). What about an inner varying array of structs? This seems to be disregarded in the analysis now? Thus, a[i].s.b[i].j vs. __real b[i].s.b[i].j? nonoverlapping_component_refs_of_decl_p/nonoverlapping_component_refs_p conveniently start from the other end of the ref here. Richard. > Thanks, > Richard > > > gcc/ > 2017-05-03 Richard Sandiford <richard.sandiford@linaro.org> > > * tree-data-ref.h (subscript): Add access_fn field. > (data_dependence_relation): Add could_be_independent_p. > (SUB_ACCESS_FN, DDR_COULD_BE_INDEPENDENT_P): New macros. > (same_access_functions): Move to tree-data-ref.c. > * tree-data-ref.c (ref_contains_union_access_p): New function. > (dump_data_dependence_relation): Use SUB_ACCESS_FN instead of > DR_ACCESS_FN. > (constant_access_functions): Likewise. > (add_other_self_distances): Likewise. > (same_access_functions): Likewise. (Moved from tree-data-ref.h.) > (initialize_data_dependence_relation): Use XCNEW and remove > explicit zeroing of DDR_REVERSED_P. Look for a subsequence > of access functions that have the same type. Allow the > subsequence to end with different bases in some circumstances. > Record the chosen access functions in SUB_ACCESS_FN. > (build_classic_dist_vector_1): Replace ddr_a and ddr_b with > a_index and b_index. Use SUB_ACCESS_FN instead of DR_ACCESS_FN. > (subscript_dependence_tester_1): Likewise dra and drb. > (build_classic_dist_vector): Update calls accordingly. > (subscript_dependence_tester): Likewise. > * tree-ssa-loop-prefetch.c (determine_loop_nest_reuse): Check > DDR_COULD_BE_INDEPENDENT_P. > * tree-vectorizer.h (LOOP_REQUIRES_VERSIONING_FOR_ALIAS): Test > comp_alias_ddrs instead of may_alias_ddrs. > * tree-vect-data-refs.c (vect_analyze_data_ref_dependence): Try > to mark for aliasing if DDR_COULD_BE_INDEPENDENT_P, but fall back > to using the recorded distance vectors if that fails. > (dependence_distance_ge_vf): New function. > (vect_prune_runtime_alias_test_list): Use it. Don't clear > LOOP_VINFO_MAY_ALIAS_DDRS. > > gcc/testsuite/ > * gcc.dg/vect/vect-alias-check-3.c: New test. > * gcc.dg/vect/vect-alias-check-4.c: Likewise. > * gcc.dg/vect/vect-alias-check-5.c: Likewise. > > Index: gcc/tree-data-ref.h > =================================================================== > --- gcc/tree-data-ref.h 2017-05-03 08:48:11.977015306 +0100 > +++ gcc/tree-data-ref.h 2017-05-03 08:48:48.737038502 +0100 > @@ -191,6 +191,9 @@ struct conflict_function > > struct subscript > { > + /* The access functions of the two references. */ > + tree access_fn[2]; > + > /* A description of the iterations for which the elements are > accessed twice. */ > conflict_function *conflicting_iterations_in_a; > @@ -209,6 +212,7 @@ struct subscript > > typedef struct subscript *subscript_p; > > +#define SUB_ACCESS_FN(SUB, I) (SUB)->access_fn[I] > #define SUB_CONFLICTS_IN_A(SUB) (SUB)->conflicting_iterations_in_a > #define SUB_CONFLICTS_IN_B(SUB) (SUB)->conflicting_iterations_in_b > #define SUB_LAST_CONFLICT(SUB) (SUB)->last_conflict > @@ -264,6 +268,33 @@ struct data_dependence_relation > /* Set to true when the dependence relation is on the same data > access. */ > bool self_reference_p; > + > + /* True if the dependence described is conservatively correct rather > + than exact, and if it is still possible for the accesses to be > + conditionally independent. For example, the a and b references in: > + > + struct s *a, *b; > + for (int i = 0; i < n; ++i) > + a->f[i] += b->f[i]; > + > + conservatively have a distance vector of (0), for the case in which > + a == b, but the accesses are independent if a != b. Similarly, > + the a and b references in: > + > + struct s *a, *b; > + for (int i = 0; i < n; ++i) > + a[0].f[i] += b[i].f[i]; > + > + conservatively have a distance vector of (0), but they are indepenent > + when a != b + i. In contrast, the references in: > + > + struct s *a; > + for (int i = 0; i < n; ++i) > + a->f[i] += a->f[i]; > + > + have the same distance vector of (0), but the accesses can never be > + independent. */ > + bool could_be_independent_p; > }; > > typedef struct data_dependence_relation *ddr_p; > @@ -294,6 +325,7 @@ #define DDR_DIR_VECT(DDR, I) \ > #define DDR_DIST_VECT(DDR, I) \ > DDR_DIST_VECTS (DDR)[I] > #define DDR_REVERSED_P(DDR) (DDR)->reversed_p > +#define DDR_COULD_BE_INDEPENDENT_P(DDR) (DDR)->could_be_independent_p > > > bool dr_analyze_innermost (struct data_reference *, struct loop *); > @@ -372,22 +404,6 @@ same_data_refs (data_reference_p a, data > return false; > > return true; > -} > - > -/* Return true when the DDR contains two data references that have the > - same access functions. */ > - > -static inline bool > -same_access_functions (const struct data_dependence_relation *ddr) > -{ > - unsigned i; > - > - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) > - if (!eq_evolutions_p (DR_ACCESS_FN (DDR_A (ddr), i), > - DR_ACCESS_FN (DDR_B (ddr), i))) > - return false; > - > - return true; > } > > /* Returns true when all the dependences are computable. */ > Index: gcc/tree-data-ref.c > =================================================================== > --- gcc/tree-data-ref.c 2017-02-23 19:54:15.000000000 +0000 > +++ gcc/tree-data-ref.c 2017-05-03 08:48:48.737038502 +0100 > @@ -123,8 +123,7 @@ Software Foundation; either version 3, o > } dependence_stats; > > static bool subscript_dependence_tester_1 (struct data_dependence_relation *, > - struct data_reference *, > - struct data_reference *, > + unsigned int, unsigned int, > struct loop *); > /* Returns true iff A divides B. */ > > @@ -144,6 +143,21 @@ int_divides_p (int a, int b) > return ((b % a) == 0); > } > > +/* Return true if reference REF contains a union access. */ > + > +static bool > +ref_contains_union_access_p (tree ref) > +{ > + while (handled_component_p (ref)) > + { > + ref = TREE_OPERAND (ref, 0); > + if (TREE_CODE (TREE_TYPE (ref)) == UNION_TYPE > + || TREE_CODE (TREE_TYPE (ref)) == QUAL_UNION_TYPE) > + return true; > + } > + return false; > +} > + > > > /* Dump into FILE all the data references from DATAREFS. */ > @@ -433,13 +447,14 @@ dump_data_dependence_relation (FILE *out > unsigned int i; > struct loop *loopi; > > - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) > + subscript *sub; > + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) > { > fprintf (outf, " access_fn_A: "); > - print_generic_stmt (outf, DR_ACCESS_FN (dra, i), 0); > + print_generic_stmt (outf, SUB_ACCESS_FN (sub, 0), 0); > fprintf (outf, " access_fn_B: "); > - print_generic_stmt (outf, DR_ACCESS_FN (drb, i), 0); > - dump_subscript (outf, DDR_SUBSCRIPT (ddr, i)); > + print_generic_stmt (outf, SUB_ACCESS_FN (sub, 1), 0); > + dump_subscript (outf, sub); > } > > fprintf (outf, " inner loop index: %d\n", DDR_INNER_LOOP (ddr)); > @@ -1484,11 +1499,10 @@ initialize_data_dependence_relation (str > struct data_dependence_relation *res; > unsigned int i; > > - res = XNEW (struct data_dependence_relation); > + res = XCNEW (struct data_dependence_relation); > DDR_A (res) = a; > DDR_B (res) = b; > DDR_LOOP_NEST (res).create (0); > - DDR_REVERSED_P (res) = false; > DDR_SUBSCRIPTS (res).create (0); > DDR_DIR_VECTS (res).create (0); > DDR_DIST_VECTS (res).create (0); > @@ -1506,82 +1520,217 @@ initialize_data_dependence_relation (str > return res; > } > > - /* The case where the references are exactly the same. */ > - if (operand_equal_p (DR_REF (a), DR_REF (b), 0)) > + unsigned int num_dimensions_a = DR_NUM_DIMENSIONS (a); > + unsigned int num_dimensions_b = DR_NUM_DIMENSIONS (b); > + if (num_dimensions_a == 0 || num_dimensions_b == 0) > { > - if ((loop_nest.exists () > - && !object_address_invariant_in_loop_p (loop_nest[0], > - DR_BASE_OBJECT (a))) > - || DR_NUM_DIMENSIONS (a) == 0) > - { > - DDR_ARE_DEPENDENT (res) = chrec_dont_know; > - return res; > - } > - DDR_AFFINE_P (res) = true; > - DDR_ARE_DEPENDENT (res) = NULL_TREE; > - DDR_SUBSCRIPTS (res).create (DR_NUM_DIMENSIONS (a)); > - DDR_LOOP_NEST (res) = loop_nest; > - DDR_INNER_LOOP (res) = 0; > - DDR_SELF_REFERENCE (res) = true; > - for (i = 0; i < DR_NUM_DIMENSIONS (a); i++) > - { > - struct subscript *subscript; > + DDR_ARE_DEPENDENT (res) = chrec_dont_know; > + return res; > + } > + > + /* For unconstrained bases, the outer (highest-index) subscript > + describes a variation in the base of the original DR_REF rather > + than a component access. We have no type that accurately describes > + the new DR_BASE_OBJECT (whose TREE_TYPE describes the type *after* > + applying the outer subscript) so limit the search to the last real > + component access. > + > + E.g. for: > > - subscript = XNEW (struct subscript); > - SUB_CONFLICTS_IN_A (subscript) = conflict_fn_not_known (); > - SUB_CONFLICTS_IN_B (subscript) = conflict_fn_not_known (); > - SUB_LAST_CONFLICT (subscript) = chrec_dont_know; > - SUB_DISTANCE (subscript) = chrec_dont_know; > - DDR_SUBSCRIPTS (res).safe_push (subscript); > + void > + f (int a[][8], int b[][8]) > + { > + for (int i = 0; i < 8; ++i) > + a[i * 2][0] = b[i][0]; > } > - return res; > + > + the a and b accesses have a single ARRAY_REF component reference [0] > + but have two subscripts. */ > + if (DR_UNCONSTRAINED_BASE (a)) > + num_dimensions_a -= 1; > + if (DR_UNCONSTRAINED_BASE (b)) > + num_dimensions_b -= 1; > + > + /* Now look for two sequences of component references that have the same > + type in both A and B. The first sequence includes an arbitrary mixture > + of array and structure references while the second always ends on a > + structure reference. > + > + The former (arbitrary) sequence uses access functions: > + > + [START_A, START_A + NUM_DIMENSIONS) of A > + [START_B, START_B + NUM_DIMENSIONS) of B > + > + The latter sequence uses access functions: > + > + [STRUCT_START_A, STRUCT_START_A + STRUCT_NUM_DIMENSIONS) of A > + [STRUCT_START_B, STRUCT_START_B + STRUCT_NUM_DIMENSIONS) of B > + > + STRUCT_REF_A and STRUCT_REF_B are the outer references for the > + latter sequence. */ > + unsigned int start_a = 0; > + unsigned int start_b = 0; > + unsigned int num_dimensions = 0; > + unsigned int struct_start_a = 0; > + unsigned int struct_start_b = 0; > + unsigned int struct_num_dimensions = 0; > + unsigned int index_a = 0; > + unsigned int index_b = 0; > + tree next_ref_a = DR_REF (a); > + tree next_ref_b = DR_REF (b); > + tree struct_ref_a = NULL_TREE; > + tree struct_ref_b = NULL_TREE; > + while (index_a < num_dimensions_a && index_b < num_dimensions_b) > + { > + gcc_checking_assert (handled_component_p (next_ref_a)); > + gcc_checking_assert (handled_component_p (next_ref_b)); > + tree outer_ref_a = TREE_OPERAND (next_ref_a, 0); > + tree outer_ref_b = TREE_OPERAND (next_ref_b, 0); > + tree type_a = TREE_TYPE (outer_ref_a); > + tree type_b = TREE_TYPE (outer_ref_b); > + if (types_compatible_p (type_a, type_b)) > + { > + /* This pair of accesses belong to a suitable sequence. */ > + if (start_a + num_dimensions != index_a > + || start_b + num_dimensions != index_b) > + { > + /* Start a new sequence here. */ > + start_a = index_a; > + start_b = index_b; > + num_dimensions = 0; > + } > + num_dimensions += 1; > + if (TREE_CODE (type_a) == RECORD_TYPE) > + { > + struct_start_a = start_a; > + struct_start_b = start_b; > + struct_num_dimensions = num_dimensions; > + struct_ref_a = outer_ref_a; > + struct_ref_b = outer_ref_b; > + } > + next_ref_a = outer_ref_a; > + next_ref_b = outer_ref_b; > + index_a += 1; > + index_b += 1; > + continue; > + } > + /* Try to approach equal type sizes. */ > + if (!COMPLETE_TYPE_P (type_a) > + || !COMPLETE_TYPE_P (type_b) > + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_a)) > + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_b))) > + break; > + unsigned HOST_WIDE_INT size_a = tree_to_uhwi (TYPE_SIZE_UNIT (type_a)); > + unsigned HOST_WIDE_INT size_b = tree_to_uhwi (TYPE_SIZE_UNIT (type_b)); > + if (size_a <= size_b) > + { > + index_a += 1; > + next_ref_a = outer_ref_a; > + } > + if (size_b <= size_a) > + { > + index_b += 1; > + next_ref_b = outer_ref_b; > + } > } > > - /* If the references do not access the same object, we do not know > - whether they alias or not. We do not care about TBAA or alignment > - info so we can use OEP_ADDRESS_OF to avoid false negatives. > - But the accesses have to use compatible types as otherwise the > - built indices would not match. */ > - if (!operand_equal_p (DR_BASE_OBJECT (a), DR_BASE_OBJECT (b), OEP_ADDRESS_OF) > - || !types_compatible_p (TREE_TYPE (DR_BASE_OBJECT (a)), > - TREE_TYPE (DR_BASE_OBJECT (b)))) > + /* See whether the sequence ends at the base and whether the two bases > + are equal. We do not care about TBAA or alignment info so we can use > + OEP_ADDRESS_OF to avoid false negatives. */ > + tree base_a = DR_BASE_OBJECT (a); > + tree base_b = DR_BASE_OBJECT (b); > + bool same_base_p = (start_a + num_dimensions == num_dimensions_a > + && start_b + num_dimensions == num_dimensions_b > + && DR_UNCONSTRAINED_BASE (a) == DR_UNCONSTRAINED_BASE (b) > + && operand_equal_p (base_a, base_b, OEP_ADDRESS_OF) > + && types_compatible_p (TREE_TYPE (base_a), > + TREE_TYPE (base_b)) > + && (!loop_nest.exists () > + || (object_address_invariant_in_loop_p > + (loop_nest[0], base_a)))); > + > + /* If the bases are the same, we can include the base variation too. > + E.g. the b accesses in: > + > + for (int i = 0; i < n; ++i) > + b[i + 4][0] = b[i][0]; > + > + have a definite dependence distance of 4, while for: > + > + for (int i = 0; i < n; ++i) > + a[i + 4][0] = b[i][0]; > + > + the dependence distance depends on the gap between a and b. > + > + If the bases are different then we can only rely on the sequence > + rooted at a structure access, since arrays are allowed to overlap > + arbitrarily and change shape arbitrarily. E.g. we treat this as > + valid code: > + > + int a[256]; > + ... > + ((int (*)[4][3])&a[1])[i][0] += ((int (*)[4][3])&a[2])[i][0]; > + > + where two lvalues with the same int[4][3] type overlap, and where > + both lvalues are distinct from the object's declared type. */ > + if (same_base_p) > { > - DDR_ARE_DEPENDENT (res) = chrec_dont_know; > - return res; > + if (DR_UNCONSTRAINED_BASE (a)) > + num_dimensions += 1; > + } > + else > + { > + start_a = struct_start_a; > + start_b = struct_start_b; > + num_dimensions = struct_num_dimensions; > } > > - /* If the base of the object is not invariant in the loop nest, we cannot > - analyze it. TODO -- in fact, it would suffice to record that there may > - be arbitrary dependences in the loops where the base object varies. */ > - if ((loop_nest.exists () > - && !object_address_invariant_in_loop_p (loop_nest[0], DR_BASE_OBJECT (a))) > - || DR_NUM_DIMENSIONS (a) == 0) > + /* Punt if we didn't find a suitable sequence. */ > + if (num_dimensions == 0) > { > DDR_ARE_DEPENDENT (res) = chrec_dont_know; > return res; > } > > - /* If the number of dimensions of the access to not agree we can have > - a pointer access to a component of the array element type and an > - array access while the base-objects are still the same. Punt. */ > - if (DR_NUM_DIMENSIONS (a) != DR_NUM_DIMENSIONS (b)) > + if (!same_base_p) > { > - DDR_ARE_DEPENDENT (res) = chrec_dont_know; > - return res; > + /* Partial overlap is possible for different bases when strict aliasing > + is not in effect. It's also possible if either base involves a union > + access; e.g. for: > + > + struct s1 { int a[2]; }; > + struct s2 { struct s1 b; int c; }; > + struct s3 { int d; struct s1 e; }; > + union u { struct s2 f; struct s3 g; } *p, *q; > + > + the s1 at "p->f.b" (base "p->f") partially overlaps the s1 at > + "p->g.e" (base "p->g") and might partially overlap the s1 at > + "q->g.e" (base "q->g"). */ > + if (!flag_strict_aliasing > + || ref_contains_union_access_p (struct_ref_a) > + || ref_contains_union_access_p (struct_ref_b)) > + { > + DDR_ARE_DEPENDENT (res) = chrec_dont_know; > + return res; > + } > + > + DDR_COULD_BE_INDEPENDENT_P (res) = true; > } > > DDR_AFFINE_P (res) = true; > DDR_ARE_DEPENDENT (res) = NULL_TREE; > - DDR_SUBSCRIPTS (res).create (DR_NUM_DIMENSIONS (a)); > + DDR_SUBSCRIPTS (res).create (num_dimensions); > DDR_LOOP_NEST (res) = loop_nest; > DDR_INNER_LOOP (res) = 0; > DDR_SELF_REFERENCE (res) = false; > > - for (i = 0; i < DR_NUM_DIMENSIONS (a); i++) > + for (i = 0; i < num_dimensions; ++i) > { > struct subscript *subscript; > > subscript = XNEW (struct subscript); > + SUB_ACCESS_FN (subscript, 0) = DR_ACCESS_FN (a, start_a + i); > + SUB_ACCESS_FN (subscript, 1) = DR_ACCESS_FN (b, start_b + i); > SUB_CONFLICTS_IN_A (subscript) = conflict_fn_not_known (); > SUB_CONFLICTS_IN_B (subscript) = conflict_fn_not_known (); > SUB_LAST_CONFLICT (subscript) = chrec_dont_know; > @@ -3163,14 +3312,15 @@ add_outer_distances (struct data_depende > } > > /* Return false when fail to represent the data dependence as a > - distance vector. INIT_B is set to true when a component has been > + distance vector. A_INDEX is the index of the first reference > + (0 for DDR_A, 1 for DDR_B) and B_INDEX is the index of the > + second reference. INIT_B is set to true when a component has been > added to the distance vector DIST_V. INDEX_CARRY is then set to > the index in DIST_V that carries the dependence. */ > > static bool > build_classic_dist_vector_1 (struct data_dependence_relation *ddr, > - struct data_reference *ddr_a, > - struct data_reference *ddr_b, > + unsigned int a_index, unsigned int b_index, > lambda_vector dist_v, bool *init_b, > int *index_carry) > { > @@ -3188,8 +3338,8 @@ build_classic_dist_vector_1 (struct data > return false; > } > > - access_fn_a = DR_ACCESS_FN (ddr_a, i); > - access_fn_b = DR_ACCESS_FN (ddr_b, i); > + access_fn_a = SUB_ACCESS_FN (subscript, a_index); > + access_fn_b = SUB_ACCESS_FN (subscript, b_index); > > if (TREE_CODE (access_fn_a) == POLYNOMIAL_CHREC > && TREE_CODE (access_fn_b) == POLYNOMIAL_CHREC) > @@ -3249,10 +3399,11 @@ build_classic_dist_vector_1 (struct data > constant_access_functions (const struct data_dependence_relation *ddr) > { > unsigned i; > + subscript *sub; > > - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) > - if (!evolution_function_is_constant_p (DR_ACCESS_FN (DDR_A (ddr), i)) > - || !evolution_function_is_constant_p (DR_ACCESS_FN (DDR_B (ddr), i))) > + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) > + if (!evolution_function_is_constant_p (SUB_ACCESS_FN (sub, 0)) > + || !evolution_function_is_constant_p (SUB_ACCESS_FN (sub, 1))) > return false; > > return true; > @@ -3315,10 +3466,11 @@ add_other_self_distances (struct data_de > lambda_vector dist_v; > unsigned i; > int index_carry = DDR_NB_LOOPS (ddr); > + subscript *sub; > > - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) > + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) > { > - tree access_fun = DR_ACCESS_FN (DDR_A (ddr), i); > + tree access_fun = SUB_ACCESS_FN (sub, 0); > > if (TREE_CODE (access_fun) == POLYNOMIAL_CHREC) > { > @@ -3330,7 +3482,7 @@ add_other_self_distances (struct data_de > return; > } > > - access_fun = DR_ACCESS_FN (DDR_A (ddr), 0); > + access_fun = SUB_ACCESS_FN (DDR_SUBSCRIPT (ddr, 0), 0); > > if (TREE_CODE (CHREC_LEFT (access_fun)) == POLYNOMIAL_CHREC) > add_multivariate_self_dist (ddr, access_fun); > @@ -3401,6 +3553,23 @@ add_distance_for_zero_overlaps (struct d > } > } > > +/* Return true when the DDR contains two data references that have the > + same access functions. */ > + > +static inline bool > +same_access_functions (const struct data_dependence_relation *ddr) > +{ > + unsigned i; > + subscript *sub; > + > + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) > + if (!eq_evolutions_p (SUB_ACCESS_FN (sub, 0), > + SUB_ACCESS_FN (sub, 1))) > + return false; > + > + return true; > +} > + > /* Compute the classic per loop distance vector. DDR is the data > dependence relation to build a vector from. Return false when fail > to represent the data dependence as a distance vector. */ > @@ -3432,8 +3601,7 @@ build_classic_dist_vector (struct data_d > } > > dist_v = lambda_vector_new (DDR_NB_LOOPS (ddr)); > - if (!build_classic_dist_vector_1 (ddr, DDR_A (ddr), DDR_B (ddr), > - dist_v, &init_b, &index_carry)) > + if (!build_classic_dist_vector_1 (ddr, 0, 1, dist_v, &init_b, &index_carry)) > return false; > > /* Save the distance vector if we initialized one. */ > @@ -3466,12 +3634,11 @@ build_classic_dist_vector (struct data_d > if (!lambda_vector_lexico_pos (dist_v, DDR_NB_LOOPS (ddr))) > { > lambda_vector save_v = lambda_vector_new (DDR_NB_LOOPS (ddr)); > - if (!subscript_dependence_tester_1 (ddr, DDR_B (ddr), DDR_A (ddr), > - loop_nest)) > + if (!subscript_dependence_tester_1 (ddr, 1, 0, loop_nest)) > return false; > compute_subscript_distance (ddr); > - if (!build_classic_dist_vector_1 (ddr, DDR_B (ddr), DDR_A (ddr), > - save_v, &init_b, &index_carry)) > + if (!build_classic_dist_vector_1 (ddr, 1, 0, save_v, &init_b, > + &index_carry)) > return false; > save_dist_v (ddr, save_v); > DDR_REVERSED_P (ddr) = true; > @@ -3507,12 +3674,10 @@ build_classic_dist_vector (struct data_d > { > lambda_vector opposite_v = lambda_vector_new (DDR_NB_LOOPS (ddr)); > > - if (!subscript_dependence_tester_1 (ddr, DDR_B (ddr), > - DDR_A (ddr), loop_nest)) > + if (!subscript_dependence_tester_1 (ddr, 1, 0, loop_nest)) > return false; > compute_subscript_distance (ddr); > - if (!build_classic_dist_vector_1 (ddr, DDR_B (ddr), DDR_A (ddr), > - opposite_v, &init_b, > + if (!build_classic_dist_vector_1 (ddr, 1, 0, opposite_v, &init_b, > &index_carry)) > return false; > > @@ -3591,13 +3756,13 @@ build_classic_dir_vector (struct data_de > } > } > > -/* Helper function. Returns true when there is a dependence between > - data references DRA and DRB. */ > +/* Helper function. Returns true when there is a dependence between the > + data references. A_INDEX is the index of the first reference (0 for > + DDR_A, 1 for DDR_B) and B_INDEX is the index of the second reference. */ > > static bool > subscript_dependence_tester_1 (struct data_dependence_relation *ddr, > - struct data_reference *dra, > - struct data_reference *drb, > + unsigned int a_index, unsigned int b_index, > struct loop *loop_nest) > { > unsigned int i; > @@ -3609,8 +3774,8 @@ subscript_dependence_tester_1 (struct da > { > conflict_function *overlaps_a, *overlaps_b; > > - analyze_overlapping_iterations (DR_ACCESS_FN (dra, i), > - DR_ACCESS_FN (drb, i), > + analyze_overlapping_iterations (SUB_ACCESS_FN (subscript, a_index), > + SUB_ACCESS_FN (subscript, b_index), > &overlaps_a, &overlaps_b, > &last_conflicts, loop_nest); > > @@ -3659,7 +3824,7 @@ subscript_dependence_tester_1 (struct da > subscript_dependence_tester (struct data_dependence_relation *ddr, > struct loop *loop_nest) > { > - if (subscript_dependence_tester_1 (ddr, DDR_A (ddr), DDR_B (ddr), loop_nest)) > + if (subscript_dependence_tester_1 (ddr, 0, 1, loop_nest)) > dependence_stats.num_dependence_dependent++; > > compute_subscript_distance (ddr); > Index: gcc/tree-ssa-loop-prefetch.c > =================================================================== > --- gcc/tree-ssa-loop-prefetch.c 2017-03-28 16:19:28.000000000 +0100 > +++ gcc/tree-ssa-loop-prefetch.c 2017-05-03 08:48:48.737038502 +0100 > @@ -1650,6 +1650,7 @@ determine_loop_nest_reuse (struct loop * > refb = (struct mem_ref *) DDR_B (dep)->aux; > > if (DDR_ARE_DEPENDENT (dep) == chrec_dont_know > + || DDR_COULD_BE_INDEPENDENT_P (dep) > || DDR_NUM_DIST_VECTS (dep) == 0) > { > /* If the dependence cannot be analyzed, assume that there might be > Index: gcc/tree-vectorizer.h > =================================================================== > --- gcc/tree-vectorizer.h 2017-03-28 16:19:28.000000000 +0100 > +++ gcc/tree-vectorizer.h 2017-05-03 08:48:48.738045102 +0100 > @@ -383,7 +383,7 @@ #define LOOP_VINFO_ORIG_LOOP_INFO(L) > #define LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT(L) \ > ((L)->may_misalign_stmts.length () > 0) > #define LOOP_REQUIRES_VERSIONING_FOR_ALIAS(L) \ > - ((L)->may_alias_ddrs.length () > 0) > + ((L)->comp_alias_ddrs.length () > 0) > #define LOOP_REQUIRES_VERSIONING_FOR_NITERS(L) \ > (LOOP_VINFO_NITERS_ASSUMPTIONS (L)) > #define LOOP_REQUIRES_VERSIONING(L) \ > Index: gcc/tree-vect-data-refs.c > =================================================================== > --- gcc/tree-vect-data-refs.c 2017-05-03 08:48:30.536704993 +0100 > +++ gcc/tree-vect-data-refs.c 2017-05-03 08:48:48.738045102 +0100 > @@ -340,6 +340,26 @@ vect_analyze_data_ref_dependence (struct > } > > loop_depth = index_in_loop_nest (loop->num, DDR_LOOP_NEST (ddr)); > + > + if (DDR_COULD_BE_INDEPENDENT_P (ddr)) > + /* For dependence distances of 2 or more, we have the option of > + limiting VF or checking for an alias at runtime. Prefer to check > + at runtime if we can, to avoid limiting the VF unnecessarily when > + the bases are in fact independent. > + > + Note that the alias checks will be removed if the VF ends up > + being small enough. */ > + FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v) > + { > + int dist = dist_v[loop_depth]; > + if (dist != 0 && !(dist > 0 && DDR_REVERSED_P (ddr))) > + { > + if (vect_mark_for_runtime_alias_test (ddr, loop_vinfo)) > + return false; > + break; > + } > + } > + > FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v) > { > int dist = dist_v[loop_depth]; > @@ -3017,6 +3037,44 @@ vect_no_alias_p (struct data_reference * > return false; > } > > +/* Return true if the minimum nonzero dependence distance for loop LOOP_DEPTH > + in DDR is >= VF. */ > + > +static bool > +dependence_distance_ge_vf (data_dependence_relation *ddr, > + unsigned int loop_depth, unsigned HOST_WIDE_INT vf) > +{ > + if (DDR_ARE_DEPENDENT (ddr) != NULL_TREE > + || DDR_NUM_DIST_VECTS (ddr) == 0) > + return false; > + > + /* If the dependence is exact, we should have limited the VF instead. */ > + gcc_checking_assert (DDR_COULD_BE_INDEPENDENT_P (ddr)); > + > + unsigned int i; > + lambda_vector dist_v; > + FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v) > + { > + HOST_WIDE_INT dist = dist_v[loop_depth]; > + if (dist != 0 > + && !(dist > 0 && DDR_REVERSED_P (ddr)) > + && (unsigned HOST_WIDE_INT) abs_hwi (dist) < vf) > + return false; > + } > + > + if (dump_enabled_p ()) > + { > + dump_printf_loc (MSG_NOTE, vect_location, > + "dependence distance between "); > + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (DDR_A (ddr))); > + dump_printf (MSG_NOTE, " and "); > + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (DDR_B (ddr))); > + dump_printf (MSG_NOTE, " is >= VF\n"); > + } > + > + return true; > +} > + > /* Function vect_prune_runtime_alias_test_list. > > Prune a list of ddrs to be tested at run-time by versioning for alias. > @@ -3075,6 +3133,10 @@ vect_prune_runtime_alias_test_list (loop > > comp_alias_ddrs.create (may_alias_ddrs.length ()); > > + unsigned int loop_depth > + = index_in_loop_nest (LOOP_VINFO_LOOP (loop_vinfo)->num, > + LOOP_VINFO_LOOP_NEST (loop_vinfo)); > + > /* First, we collect all data ref pairs for aliasing checks. */ > FOR_EACH_VEC_ELT (may_alias_ddrs, i, ddr) > { > @@ -3084,6 +3146,11 @@ vect_prune_runtime_alias_test_list (loop > tree segment_length_a, segment_length_b; > gimple *stmt_a, *stmt_b; > > + /* Ignore the alias if the VF we chose ended up being no greater > + than the dependence distance. */ > + if (dependence_distance_ge_vf (ddr, loop_depth, vect_factor)) > + continue; > + > dr_a = DDR_A (ddr); > stmt_a = DR_STMT (DDR_A (ddr)); > dr_group_first_a = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_a)); > @@ -3294,10 +3361,6 @@ vect_prune_runtime_alias_test_list (loop > return false; > } > > - /* All alias checks have been resolved at compilation time. */ > - if (!comp_alias_ddrs.length ()) > - LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo).truncate (0); > - > return true; > } > > Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-3.c > =================================================================== > --- /dev/null 2017-05-03 08:16:26.972699664 +0100 > +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-3.c 2017-05-03 08:48:48.736031902 +0100 > @@ -0,0 +1,104 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target vect_int } */ > +/* { dg-additional-options "--param vect-max-version-for-alias-checks=0" } */ > + > +/* Intended to be larger than any VF. */ > +#define GAP 128 > +#define N (GAP * 3) > + > +struct s { int x[N + 1]; }; > +struct t { struct s x[N + 1]; }; > +struct u { int x[N + 1]; int y; }; > + > +void > +f1 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < N; ++i) > + a->x[i] += b->x[i]; > +} > + > +void > +f2 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < N; ++i) > + a[1].x[i] += b[2].x[i]; > +} > + > +void > +f3 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < N; ++i) > + a[1].x[i] += b[i].x[i]; > +} > + > +void > +f4 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < N; ++i) > + a[i].x[i] += b[i].x[i]; > +} > + > +void > +f5 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < N; ++i) > + a->x[i] += b->x[i + 1]; > +} > + > +void > +f6 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < N; ++i) > + a[1].x[i] += b[2].x[i + 1]; > +} > + > +void > +f7 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < N; ++i) > + a[1].x[i] += b[i].x[i + 1]; > +} > + > +void > +f8 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < N; ++i) > + a[i].x[i] += b[i].x[i + 1]; > +} > + > +void > +f9 (struct s *a, struct t *b) > +{ > + for (int i = 0; i < N; ++i) > + a->x[i] += b->x[1].x[i]; > +} > + > +void > +f10 (struct s *a, struct t *b) > +{ > + for (int i = 0; i < N; ++i) > + a->x[i] += b->x[i].x[i]; > +} > + > +void > +f11 (struct u *a, struct u *b) > +{ > + for (int i = 0; i < N; ++i) > + a->x[i] += b->x[i] + b[i].y; > +} > + > +void > +f12 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < GAP; ++i) > + a->x[i + GAP] += b->x[i]; > +} > + > +void > +f13 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < GAP * 2; ++i) > + a->x[i + GAP] += b->x[i]; > +} > + > +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 13 "vect" } } */ > Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-4.c > =================================================================== > --- /dev/null 2017-05-03 08:16:26.972699664 +0100 > +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-4.c 2017-05-03 08:48:48.736031902 +0100 > @@ -0,0 +1,35 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target vect_int } */ > +/* { dg-additional-options "--param vect-max-version-for-alias-checks=0" } */ > + > +#define N 16 > + > +struct s1 { int a[N]; }; > +struct s2 { struct s1 b; int c; }; > +struct s3 { int d; struct s1 e; }; > +union u { struct s2 f; struct s3 g; }; > + > +/* We allow a and b to overlap arbitrarily. */ > + > +void > +f1 (int a[][N], int b[][N]) > +{ > + for (int i = 0; i < N; ++i) > + a[0][i] += b[0][i]; > +} > + > +void > +f2 (union u *a, union u *b) > +{ > + for (int i = 0; i < N; ++i) > + a->f.b.a[i] += b->g.e.a[i]; > +} > + > +void > +f3 (struct s1 *a, struct s1 *b) > +{ > + for (int i = 0; i < N - 1; ++i) > + a->a[i + 1] += b->a[i]; > +} > + > +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */ > Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-5.c > =================================================================== > --- /dev/null 2017-05-03 08:16:26.972699664 +0100 > +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-5.c 2017-05-03 08:48:48.736031902 +0100 > @@ -0,0 +1,19 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target vect_int } */ > + > +/* Intended to be larger than any VF. */ > +#define GAP 128 > +#define N (GAP * 3) > + > +struct s { int x[N]; }; > + > +void > +f1 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < GAP * 2; ++i) > + a->x[i + GAP] += b->x[i]; > +} > + > +/* { dg-final { scan-tree-dump-times "mark for run-time aliasing" 1 "vect" } } */ > +/* { dg-final { scan-tree-dump-times "improved number of alias checks from 1 to 0" 1 "vect" } } */ > +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 1 "vect" } } */

On Thu, May 4, 2017 at 2:12 PM, Richard Biener <richard.guenther@gmail.com> wrote: > On Wed, May 3, 2017 at 10:00 AM, Richard Sandiford > <richard.sandiford@linaro.org> wrote: >> This patch tries to calculate conservatively-correct distance >> vectors for two references whose base addresses are not the same. >> It sets a new flag DDR_COULD_BE_INDEPENDENT_P if the dependence >> isn't guaranteed to occur. >> >> The motivating example is: >> >> struct s { int x[8]; }; >> void >> f (struct s *a, struct s *b) >> { >> for (int i = 0; i < 8; ++i) >> a->x[i] += b->x[i]; >> } >> >> in which the "a" and "b" accesses are either independent or have a >> dependence distance of 0 (assuming -fstrict-aliasing). Neither case >> prevents vectorisation, so we can vectorise without an alias check. >> >> I'd originally wanted to do the same thing for arrays as well, e.g.: >> >> void >> f (int a[][8], struct b[][8]) >> { >> for (int i = 0; i < 8; ++i) >> a[0][i] += b[0][i]; >> } >> >> I think this is valid because C11 6.7.6.2/6 says: >> >> For two array types to be compatible, both shall have compatible >> element types, and if both size specifiers are present, and are >> integer constant expressions, then both size specifiers shall have >> the same constant value. >> >> So if we access an array through an int (*)[8], it must have type X[8] >> or X[], where X is compatible with int. It doesn't seem possible in >> either case for "a[0]" and "b[0]" to overlap when "a != b". >> >> However, Richard B said that (at least in gimple) we support arbitrary >> overlap of arrays and allow arrays to be accessed with different >> dimensionality. There are examples of this in PR50067. I've therefore >> only handled references that end in a structure field access. >> >> There are two ways of handling these dependences in the vectoriser: >> use them to limit VF, or check at runtime as before. I've gone for >> the approach of checking at runtime if we can, to avoid limiting VF >> unnecessarily. We still fall back to a VF cap when runtime checks >> aren't allowed. >> >> The patch tests whether we queued an alias check with a dependence >> distance of X and then picked a VF <= X, in which case it's safe to >> drop the alias check. Since vect_prune_runtime_alias_check_list can >> be called twice with different VF for the same loop, it's no longer >> safe to clear may_alias_ddrs on exit. Instead we should use >> comp_alias_ddrs to check whether versioning is necessary. >> >> Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install? > > You seem to do your "fancy" thing but also later compute the old > base equality anyway (for same_base_p). It looks to me for this > case the new fancy code can be simply skipped, keeping num_dimensions > as before? > > + /* Try to approach equal type sizes. */ > + if (!COMPLETE_TYPE_P (type_a) > + || !COMPLETE_TYPE_P (type_b) > + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_a)) > + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_b))) > + break; > > ah, interesting idea to avoid a quadratic search. Note that you should > conservatively handle both BIT_FIELD_REF and VIEW_CONVERT_EXPR > as they are used for type-punning. I see > nonoverlapping_component_refs_of_decl_p > should simply skip ARRAY_REFs - but I also see there: > > /* ??? We cannot simply use the type of operand #0 of the refs here > as the Fortran compiler smuggles type punning into COMPONENT_REFs > for common blocks instead of using unions like everyone else. */ > tree type1 = DECL_CONTEXT (field1); > tree type2 = DECL_CONTEXT (field2); > > so you probably can't simply use TREE_TYPE (outer_ref) for type compatibility. > You also may not use types_compatible_p here as for LTO that is _way_ too > lax for aggregates. The above uses > > /* We cannot disambiguate fields in a union or qualified union. */ > if (type1 != type2 || TREE_CODE (type1) != RECORD_TYPE) > return false; > > so you should also bail out on unions here, rather than the check you do later. > > You seem to rely on getting an access_fn entry for each handled_component_p. > It looks like this is the case -- we even seem to stop at unions (with the same > fortran "issue"). I'm not sure that's the best thing to do but you > rely on that. > > I don't understand the looping, it needs more comments. You seem to be > looking for the innermost compatible RECORD_TYPE but then num_dimensions > is how many compatible refs you found on the way (with incompatible ones > not counting?!). What about an inner varying array of structs? This seems to > be disregarded in the analysis now? Thus, a[i].s.b[i].j vs. __real > b[i].s.b[i].j? > > nonoverlapping_component_refs_of_decl_p/nonoverlapping_component_refs_p > conveniently start from the other > end of the ref here. That said, for the motivational cases we either have one ref having more dimensions than the other (the __real vs. full complex access) or they have the same number of dimensions (and no access fn for the base). For the first case we should simply "drop" access_fns of the larger dimensional ref (from the start, plus outer component refs) up to the point the number of dimensions are equal. Then we have the case of ! types_compatible_p (TREE_TYPE (base_a), TREE_TYPE (base_b)) where we have to punt. Then we have the case of ! operand_equal_p (base_a, base_b, OEP_ADDRESS_OF) which is where the new code should kick in to see if we can drop access_fns from the other end (as unanalyzable but either having distance zero or not aliased because of TBAA). At least your testcases suggest you do not want to handle struct s { int x[N]; }; struct r { struct s s; }; f (struct s *a, struct r *b) { for (i = 0; i < N; ++i) a->s.x[i] = b->x[i]; } ? With this example your loop which seems to search for a "common" sequence in (different) midst of the reference trees makes more sense (still that loop is awkward to understand). Richard. > Richard. > >> Thanks, >> Richard >> >> >> gcc/ >> 2017-05-03 Richard Sandiford <richard.sandiford@linaro.org> >> >> * tree-data-ref.h (subscript): Add access_fn field. >> (data_dependence_relation): Add could_be_independent_p. >> (SUB_ACCESS_FN, DDR_COULD_BE_INDEPENDENT_P): New macros. >> (same_access_functions): Move to tree-data-ref.c. >> * tree-data-ref.c (ref_contains_union_access_p): New function. >> (dump_data_dependence_relation): Use SUB_ACCESS_FN instead of >> DR_ACCESS_FN. >> (constant_access_functions): Likewise. >> (add_other_self_distances): Likewise. >> (same_access_functions): Likewise. (Moved from tree-data-ref.h.) >> (initialize_data_dependence_relation): Use XCNEW and remove >> explicit zeroing of DDR_REVERSED_P. Look for a subsequence >> of access functions that have the same type. Allow the >> subsequence to end with different bases in some circumstances. >> Record the chosen access functions in SUB_ACCESS_FN. >> (build_classic_dist_vector_1): Replace ddr_a and ddr_b with >> a_index and b_index. Use SUB_ACCESS_FN instead of DR_ACCESS_FN. >> (subscript_dependence_tester_1): Likewise dra and drb. >> (build_classic_dist_vector): Update calls accordingly. >> (subscript_dependence_tester): Likewise. >> * tree-ssa-loop-prefetch.c (determine_loop_nest_reuse): Check >> DDR_COULD_BE_INDEPENDENT_P. >> * tree-vectorizer.h (LOOP_REQUIRES_VERSIONING_FOR_ALIAS): Test >> comp_alias_ddrs instead of may_alias_ddrs. >> * tree-vect-data-refs.c (vect_analyze_data_ref_dependence): Try >> to mark for aliasing if DDR_COULD_BE_INDEPENDENT_P, but fall back >> to using the recorded distance vectors if that fails. >> (dependence_distance_ge_vf): New function. >> (vect_prune_runtime_alias_test_list): Use it. Don't clear >> LOOP_VINFO_MAY_ALIAS_DDRS. >> >> gcc/testsuite/ >> * gcc.dg/vect/vect-alias-check-3.c: New test. >> * gcc.dg/vect/vect-alias-check-4.c: Likewise. >> * gcc.dg/vect/vect-alias-check-5.c: Likewise. >> >> Index: gcc/tree-data-ref.h >> =================================================================== >> --- gcc/tree-data-ref.h 2017-05-03 08:48:11.977015306 +0100 >> +++ gcc/tree-data-ref.h 2017-05-03 08:48:48.737038502 +0100 >> @@ -191,6 +191,9 @@ struct conflict_function >> >> struct subscript >> { >> + /* The access functions of the two references. */ >> + tree access_fn[2]; >> + >> /* A description of the iterations for which the elements are >> accessed twice. */ >> conflict_function *conflicting_iterations_in_a; >> @@ -209,6 +212,7 @@ struct subscript >> >> typedef struct subscript *subscript_p; >> >> +#define SUB_ACCESS_FN(SUB, I) (SUB)->access_fn[I] >> #define SUB_CONFLICTS_IN_A(SUB) (SUB)->conflicting_iterations_in_a >> #define SUB_CONFLICTS_IN_B(SUB) (SUB)->conflicting_iterations_in_b >> #define SUB_LAST_CONFLICT(SUB) (SUB)->last_conflict >> @@ -264,6 +268,33 @@ struct data_dependence_relation >> /* Set to true when the dependence relation is on the same data >> access. */ >> bool self_reference_p; >> + >> + /* True if the dependence described is conservatively correct rather >> + than exact, and if it is still possible for the accesses to be >> + conditionally independent. For example, the a and b references in: >> + >> + struct s *a, *b; >> + for (int i = 0; i < n; ++i) >> + a->f[i] += b->f[i]; >> + >> + conservatively have a distance vector of (0), for the case in which >> + a == b, but the accesses are independent if a != b. Similarly, >> + the a and b references in: >> + >> + struct s *a, *b; >> + for (int i = 0; i < n; ++i) >> + a[0].f[i] += b[i].f[i]; >> + >> + conservatively have a distance vector of (0), but they are indepenent >> + when a != b + i. In contrast, the references in: >> + >> + struct s *a; >> + for (int i = 0; i < n; ++i) >> + a->f[i] += a->f[i]; >> + >> + have the same distance vector of (0), but the accesses can never be >> + independent. */ >> + bool could_be_independent_p; >> }; >> >> typedef struct data_dependence_relation *ddr_p; >> @@ -294,6 +325,7 @@ #define DDR_DIR_VECT(DDR, I) \ >> #define DDR_DIST_VECT(DDR, I) \ >> DDR_DIST_VECTS (DDR)[I] >> #define DDR_REVERSED_P(DDR) (DDR)->reversed_p >> +#define DDR_COULD_BE_INDEPENDENT_P(DDR) (DDR)->could_be_independent_p >> >> >> bool dr_analyze_innermost (struct data_reference *, struct loop *); >> @@ -372,22 +404,6 @@ same_data_refs (data_reference_p a, data >> return false; >> >> return true; >> -} >> - >> -/* Return true when the DDR contains two data references that have the >> - same access functions. */ >> - >> -static inline bool >> -same_access_functions (const struct data_dependence_relation *ddr) >> -{ >> - unsigned i; >> - >> - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) >> - if (!eq_evolutions_p (DR_ACCESS_FN (DDR_A (ddr), i), >> - DR_ACCESS_FN (DDR_B (ddr), i))) >> - return false; >> - >> - return true; >> } >> >> /* Returns true when all the dependences are computable. */ >> Index: gcc/tree-data-ref.c >> =================================================================== >> --- gcc/tree-data-ref.c 2017-02-23 19:54:15.000000000 +0000 >> +++ gcc/tree-data-ref.c 2017-05-03 08:48:48.737038502 +0100 >> @@ -123,8 +123,7 @@ Software Foundation; either version 3, o >> } dependence_stats; >> >> static bool subscript_dependence_tester_1 (struct data_dependence_relation *, >> - struct data_reference *, >> - struct data_reference *, >> + unsigned int, unsigned int, >> struct loop *); >> /* Returns true iff A divides B. */ >> >> @@ -144,6 +143,21 @@ int_divides_p (int a, int b) >> return ((b % a) == 0); >> } >> >> +/* Return true if reference REF contains a union access. */ >> + >> +static bool >> +ref_contains_union_access_p (tree ref) >> +{ >> + while (handled_component_p (ref)) >> + { >> + ref = TREE_OPERAND (ref, 0); >> + if (TREE_CODE (TREE_TYPE (ref)) == UNION_TYPE >> + || TREE_CODE (TREE_TYPE (ref)) == QUAL_UNION_TYPE) >> + return true; >> + } >> + return false; >> +} >> + >> >> >> /* Dump into FILE all the data references from DATAREFS. */ >> @@ -433,13 +447,14 @@ dump_data_dependence_relation (FILE *out >> unsigned int i; >> struct loop *loopi; >> >> - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) >> + subscript *sub; >> + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) >> { >> fprintf (outf, " access_fn_A: "); >> - print_generic_stmt (outf, DR_ACCESS_FN (dra, i), 0); >> + print_generic_stmt (outf, SUB_ACCESS_FN (sub, 0), 0); >> fprintf (outf, " access_fn_B: "); >> - print_generic_stmt (outf, DR_ACCESS_FN (drb, i), 0); >> - dump_subscript (outf, DDR_SUBSCRIPT (ddr, i)); >> + print_generic_stmt (outf, SUB_ACCESS_FN (sub, 1), 0); >> + dump_subscript (outf, sub); >> } >> >> fprintf (outf, " inner loop index: %d\n", DDR_INNER_LOOP (ddr)); >> @@ -1484,11 +1499,10 @@ initialize_data_dependence_relation (str >> struct data_dependence_relation *res; >> unsigned int i; >> >> - res = XNEW (struct data_dependence_relation); >> + res = XCNEW (struct data_dependence_relation); >> DDR_A (res) = a; >> DDR_B (res) = b; >> DDR_LOOP_NEST (res).create (0); >> - DDR_REVERSED_P (res) = false; >> DDR_SUBSCRIPTS (res).create (0); >> DDR_DIR_VECTS (res).create (0); >> DDR_DIST_VECTS (res).create (0); >> @@ -1506,82 +1520,217 @@ initialize_data_dependence_relation (str >> return res; >> } >> >> - /* The case where the references are exactly the same. */ >> - if (operand_equal_p (DR_REF (a), DR_REF (b), 0)) >> + unsigned int num_dimensions_a = DR_NUM_DIMENSIONS (a); >> + unsigned int num_dimensions_b = DR_NUM_DIMENSIONS (b); >> + if (num_dimensions_a == 0 || num_dimensions_b == 0) >> { >> - if ((loop_nest.exists () >> - && !object_address_invariant_in_loop_p (loop_nest[0], >> - DR_BASE_OBJECT (a))) >> - || DR_NUM_DIMENSIONS (a) == 0) >> - { >> - DDR_ARE_DEPENDENT (res) = chrec_dont_know; >> - return res; >> - } >> - DDR_AFFINE_P (res) = true; >> - DDR_ARE_DEPENDENT (res) = NULL_TREE; >> - DDR_SUBSCRIPTS (res).create (DR_NUM_DIMENSIONS (a)); >> - DDR_LOOP_NEST (res) = loop_nest; >> - DDR_INNER_LOOP (res) = 0; >> - DDR_SELF_REFERENCE (res) = true; >> - for (i = 0; i < DR_NUM_DIMENSIONS (a); i++) >> - { >> - struct subscript *subscript; >> + DDR_ARE_DEPENDENT (res) = chrec_dont_know; >> + return res; >> + } >> + >> + /* For unconstrained bases, the outer (highest-index) subscript >> + describes a variation in the base of the original DR_REF rather >> + than a component access. We have no type that accurately describes >> + the new DR_BASE_OBJECT (whose TREE_TYPE describes the type *after* >> + applying the outer subscript) so limit the search to the last real >> + component access. >> + >> + E.g. for: >> >> - subscript = XNEW (struct subscript); >> - SUB_CONFLICTS_IN_A (subscript) = conflict_fn_not_known (); >> - SUB_CONFLICTS_IN_B (subscript) = conflict_fn_not_known (); >> - SUB_LAST_CONFLICT (subscript) = chrec_dont_know; >> - SUB_DISTANCE (subscript) = chrec_dont_know; >> - DDR_SUBSCRIPTS (res).safe_push (subscript); >> + void >> + f (int a[][8], int b[][8]) >> + { >> + for (int i = 0; i < 8; ++i) >> + a[i * 2][0] = b[i][0]; >> } >> - return res; >> + >> + the a and b accesses have a single ARRAY_REF component reference [0] >> + but have two subscripts. */ >> + if (DR_UNCONSTRAINED_BASE (a)) >> + num_dimensions_a -= 1; >> + if (DR_UNCONSTRAINED_BASE (b)) >> + num_dimensions_b -= 1; >> + >> + /* Now look for two sequences of component references that have the same >> + type in both A and B. The first sequence includes an arbitrary mixture >> + of array and structure references while the second always ends on a >> + structure reference. >> + >> + The former (arbitrary) sequence uses access functions: >> + >> + [START_A, START_A + NUM_DIMENSIONS) of A >> + [START_B, START_B + NUM_DIMENSIONS) of B >> + >> + The latter sequence uses access functions: >> + >> + [STRUCT_START_A, STRUCT_START_A + STRUCT_NUM_DIMENSIONS) of A >> + [STRUCT_START_B, STRUCT_START_B + STRUCT_NUM_DIMENSIONS) of B >> + >> + STRUCT_REF_A and STRUCT_REF_B are the outer references for the >> + latter sequence. */ >> + unsigned int start_a = 0; >> + unsigned int start_b = 0; >> + unsigned int num_dimensions = 0; >> + unsigned int struct_start_a = 0; >> + unsigned int struct_start_b = 0; >> + unsigned int struct_num_dimensions = 0; >> + unsigned int index_a = 0; >> + unsigned int index_b = 0; >> + tree next_ref_a = DR_REF (a); >> + tree next_ref_b = DR_REF (b); >> + tree struct_ref_a = NULL_TREE; >> + tree struct_ref_b = NULL_TREE; >> + while (index_a < num_dimensions_a && index_b < num_dimensions_b) >> + { >> + gcc_checking_assert (handled_component_p (next_ref_a)); >> + gcc_checking_assert (handled_component_p (next_ref_b)); >> + tree outer_ref_a = TREE_OPERAND (next_ref_a, 0); >> + tree outer_ref_b = TREE_OPERAND (next_ref_b, 0); >> + tree type_a = TREE_TYPE (outer_ref_a); >> + tree type_b = TREE_TYPE (outer_ref_b); >> + if (types_compatible_p (type_a, type_b)) >> + { >> + /* This pair of accesses belong to a suitable sequence. */ >> + if (start_a + num_dimensions != index_a >> + || start_b + num_dimensions != index_b) >> + { >> + /* Start a new sequence here. */ >> + start_a = index_a; >> + start_b = index_b; >> + num_dimensions = 0; >> + } >> + num_dimensions += 1; >> + if (TREE_CODE (type_a) == RECORD_TYPE) >> + { >> + struct_start_a = start_a; >> + struct_start_b = start_b; >> + struct_num_dimensions = num_dimensions; >> + struct_ref_a = outer_ref_a; >> + struct_ref_b = outer_ref_b; >> + } >> + next_ref_a = outer_ref_a; >> + next_ref_b = outer_ref_b; >> + index_a += 1; >> + index_b += 1; >> + continue; >> + } >> + /* Try to approach equal type sizes. */ >> + if (!COMPLETE_TYPE_P (type_a) >> + || !COMPLETE_TYPE_P (type_b) >> + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_a)) >> + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_b))) >> + break; >> + unsigned HOST_WIDE_INT size_a = tree_to_uhwi (TYPE_SIZE_UNIT (type_a)); >> + unsigned HOST_WIDE_INT size_b = tree_to_uhwi (TYPE_SIZE_UNIT (type_b)); >> + if (size_a <= size_b) >> + { >> + index_a += 1; >> + next_ref_a = outer_ref_a; >> + } >> + if (size_b <= size_a) >> + { >> + index_b += 1; >> + next_ref_b = outer_ref_b; >> + } >> } >> >> - /* If the references do not access the same object, we do not know >> - whether they alias or not. We do not care about TBAA or alignment >> - info so we can use OEP_ADDRESS_OF to avoid false negatives. >> - But the accesses have to use compatible types as otherwise the >> - built indices would not match. */ >> - if (!operand_equal_p (DR_BASE_OBJECT (a), DR_BASE_OBJECT (b), OEP_ADDRESS_OF) >> - || !types_compatible_p (TREE_TYPE (DR_BASE_OBJECT (a)), >> - TREE_TYPE (DR_BASE_OBJECT (b)))) >> + /* See whether the sequence ends at the base and whether the two bases >> + are equal. We do not care about TBAA or alignment info so we can use >> + OEP_ADDRESS_OF to avoid false negatives. */ >> + tree base_a = DR_BASE_OBJECT (a); >> + tree base_b = DR_BASE_OBJECT (b); >> + bool same_base_p = (start_a + num_dimensions == num_dimensions_a >> + && start_b + num_dimensions == num_dimensions_b >> + && DR_UNCONSTRAINED_BASE (a) == DR_UNCONSTRAINED_BASE (b) >> + && operand_equal_p (base_a, base_b, OEP_ADDRESS_OF) >> + && types_compatible_p (TREE_TYPE (base_a), >> + TREE_TYPE (base_b)) >> + && (!loop_nest.exists () >> + || (object_address_invariant_in_loop_p >> + (loop_nest[0], base_a)))); >> + >> + /* If the bases are the same, we can include the base variation too. >> + E.g. the b accesses in: >> + >> + for (int i = 0; i < n; ++i) >> + b[i + 4][0] = b[i][0]; >> + >> + have a definite dependence distance of 4, while for: >> + >> + for (int i = 0; i < n; ++i) >> + a[i + 4][0] = b[i][0]; >> + >> + the dependence distance depends on the gap between a and b. >> + >> + If the bases are different then we can only rely on the sequence >> + rooted at a structure access, since arrays are allowed to overlap >> + arbitrarily and change shape arbitrarily. E.g. we treat this as >> + valid code: >> + >> + int a[256]; >> + ... >> + ((int (*)[4][3])&a[1])[i][0] += ((int (*)[4][3])&a[2])[i][0]; >> + >> + where two lvalues with the same int[4][3] type overlap, and where >> + both lvalues are distinct from the object's declared type. */ >> + if (same_base_p) >> { >> - DDR_ARE_DEPENDENT (res) = chrec_dont_know; >> - return res; >> + if (DR_UNCONSTRAINED_BASE (a)) >> + num_dimensions += 1; >> + } >> + else >> + { >> + start_a = struct_start_a; >> + start_b = struct_start_b; >> + num_dimensions = struct_num_dimensions; >> } >> >> - /* If the base of the object is not invariant in the loop nest, we cannot >> - analyze it. TODO -- in fact, it would suffice to record that there may >> - be arbitrary dependences in the loops where the base object varies. */ >> - if ((loop_nest.exists () >> - && !object_address_invariant_in_loop_p (loop_nest[0], DR_BASE_OBJECT (a))) >> - || DR_NUM_DIMENSIONS (a) == 0) >> + /* Punt if we didn't find a suitable sequence. */ >> + if (num_dimensions == 0) >> { >> DDR_ARE_DEPENDENT (res) = chrec_dont_know; >> return res; >> } >> >> - /* If the number of dimensions of the access to not agree we can have >> - a pointer access to a component of the array element type and an >> - array access while the base-objects are still the same. Punt. */ >> - if (DR_NUM_DIMENSIONS (a) != DR_NUM_DIMENSIONS (b)) >> + if (!same_base_p) >> { >> - DDR_ARE_DEPENDENT (res) = chrec_dont_know; >> - return res; >> + /* Partial overlap is possible for different bases when strict aliasing >> + is not in effect. It's also possible if either base involves a union >> + access; e.g. for: >> + >> + struct s1 { int a[2]; }; >> + struct s2 { struct s1 b; int c; }; >> + struct s3 { int d; struct s1 e; }; >> + union u { struct s2 f; struct s3 g; } *p, *q; >> + >> + the s1 at "p->f.b" (base "p->f") partially overlaps the s1 at >> + "p->g.e" (base "p->g") and might partially overlap the s1 at >> + "q->g.e" (base "q->g"). */ >> + if (!flag_strict_aliasing >> + || ref_contains_union_access_p (struct_ref_a) >> + || ref_contains_union_access_p (struct_ref_b)) >> + { >> + DDR_ARE_DEPENDENT (res) = chrec_dont_know; >> + return res; >> + } >> + >> + DDR_COULD_BE_INDEPENDENT_P (res) = true; >> } >> >> DDR_AFFINE_P (res) = true; >> DDR_ARE_DEPENDENT (res) = NULL_TREE; >> - DDR_SUBSCRIPTS (res).create (DR_NUM_DIMENSIONS (a)); >> + DDR_SUBSCRIPTS (res).create (num_dimensions); >> DDR_LOOP_NEST (res) = loop_nest; >> DDR_INNER_LOOP (res) = 0; >> DDR_SELF_REFERENCE (res) = false; >> >> - for (i = 0; i < DR_NUM_DIMENSIONS (a); i++) >> + for (i = 0; i < num_dimensions; ++i) >> { >> struct subscript *subscript; >> >> subscript = XNEW (struct subscript); >> + SUB_ACCESS_FN (subscript, 0) = DR_ACCESS_FN (a, start_a + i); >> + SUB_ACCESS_FN (subscript, 1) = DR_ACCESS_FN (b, start_b + i); >> SUB_CONFLICTS_IN_A (subscript) = conflict_fn_not_known (); >> SUB_CONFLICTS_IN_B (subscript) = conflict_fn_not_known (); >> SUB_LAST_CONFLICT (subscript) = chrec_dont_know; >> @@ -3163,14 +3312,15 @@ add_outer_distances (struct data_depende >> } >> >> /* Return false when fail to represent the data dependence as a >> - distance vector. INIT_B is set to true when a component has been >> + distance vector. A_INDEX is the index of the first reference >> + (0 for DDR_A, 1 for DDR_B) and B_INDEX is the index of the >> + second reference. INIT_B is set to true when a component has been >> added to the distance vector DIST_V. INDEX_CARRY is then set to >> the index in DIST_V that carries the dependence. */ >> >> static bool >> build_classic_dist_vector_1 (struct data_dependence_relation *ddr, >> - struct data_reference *ddr_a, >> - struct data_reference *ddr_b, >> + unsigned int a_index, unsigned int b_index, >> lambda_vector dist_v, bool *init_b, >> int *index_carry) >> { >> @@ -3188,8 +3338,8 @@ build_classic_dist_vector_1 (struct data >> return false; >> } >> >> - access_fn_a = DR_ACCESS_FN (ddr_a, i); >> - access_fn_b = DR_ACCESS_FN (ddr_b, i); >> + access_fn_a = SUB_ACCESS_FN (subscript, a_index); >> + access_fn_b = SUB_ACCESS_FN (subscript, b_index); >> >> if (TREE_CODE (access_fn_a) == POLYNOMIAL_CHREC >> && TREE_CODE (access_fn_b) == POLYNOMIAL_CHREC) >> @@ -3249,10 +3399,11 @@ build_classic_dist_vector_1 (struct data >> constant_access_functions (const struct data_dependence_relation *ddr) >> { >> unsigned i; >> + subscript *sub; >> >> - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) >> - if (!evolution_function_is_constant_p (DR_ACCESS_FN (DDR_A (ddr), i)) >> - || !evolution_function_is_constant_p (DR_ACCESS_FN (DDR_B (ddr), i))) >> + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) >> + if (!evolution_function_is_constant_p (SUB_ACCESS_FN (sub, 0)) >> + || !evolution_function_is_constant_p (SUB_ACCESS_FN (sub, 1))) >> return false; >> >> return true; >> @@ -3315,10 +3466,11 @@ add_other_self_distances (struct data_de >> lambda_vector dist_v; >> unsigned i; >> int index_carry = DDR_NB_LOOPS (ddr); >> + subscript *sub; >> >> - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) >> + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) >> { >> - tree access_fun = DR_ACCESS_FN (DDR_A (ddr), i); >> + tree access_fun = SUB_ACCESS_FN (sub, 0); >> >> if (TREE_CODE (access_fun) == POLYNOMIAL_CHREC) >> { >> @@ -3330,7 +3482,7 @@ add_other_self_distances (struct data_de >> return; >> } >> >> - access_fun = DR_ACCESS_FN (DDR_A (ddr), 0); >> + access_fun = SUB_ACCESS_FN (DDR_SUBSCRIPT (ddr, 0), 0); >> >> if (TREE_CODE (CHREC_LEFT (access_fun)) == POLYNOMIAL_CHREC) >> add_multivariate_self_dist (ddr, access_fun); >> @@ -3401,6 +3553,23 @@ add_distance_for_zero_overlaps (struct d >> } >> } >> >> +/* Return true when the DDR contains two data references that have the >> + same access functions. */ >> + >> +static inline bool >> +same_access_functions (const struct data_dependence_relation *ddr) >> +{ >> + unsigned i; >> + subscript *sub; >> + >> + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) >> + if (!eq_evolutions_p (SUB_ACCESS_FN (sub, 0), >> + SUB_ACCESS_FN (sub, 1))) >> + return false; >> + >> + return true; >> +} >> + >> /* Compute the classic per loop distance vector. DDR is the data >> dependence relation to build a vector from. Return false when fail >> to represent the data dependence as a distance vector. */ >> @@ -3432,8 +3601,7 @@ build_classic_dist_vector (struct data_d >> } >> >> dist_v = lambda_vector_new (DDR_NB_LOOPS (ddr)); >> - if (!build_classic_dist_vector_1 (ddr, DDR_A (ddr), DDR_B (ddr), >> - dist_v, &init_b, &index_carry)) >> + if (!build_classic_dist_vector_1 (ddr, 0, 1, dist_v, &init_b, &index_carry)) >> return false; >> >> /* Save the distance vector if we initialized one. */ >> @@ -3466,12 +3634,11 @@ build_classic_dist_vector (struct data_d >> if (!lambda_vector_lexico_pos (dist_v, DDR_NB_LOOPS (ddr))) >> { >> lambda_vector save_v = lambda_vector_new (DDR_NB_LOOPS (ddr)); >> - if (!subscript_dependence_tester_1 (ddr, DDR_B (ddr), DDR_A (ddr), >> - loop_nest)) >> + if (!subscript_dependence_tester_1 (ddr, 1, 0, loop_nest)) >> return false; >> compute_subscript_distance (ddr); >> - if (!build_classic_dist_vector_1 (ddr, DDR_B (ddr), DDR_A (ddr), >> - save_v, &init_b, &index_carry)) >> + if (!build_classic_dist_vector_1 (ddr, 1, 0, save_v, &init_b, >> + &index_carry)) >> return false; >> save_dist_v (ddr, save_v); >> DDR_REVERSED_P (ddr) = true; >> @@ -3507,12 +3674,10 @@ build_classic_dist_vector (struct data_d >> { >> lambda_vector opposite_v = lambda_vector_new (DDR_NB_LOOPS (ddr)); >> >> - if (!subscript_dependence_tester_1 (ddr, DDR_B (ddr), >> - DDR_A (ddr), loop_nest)) >> + if (!subscript_dependence_tester_1 (ddr, 1, 0, loop_nest)) >> return false; >> compute_subscript_distance (ddr); >> - if (!build_classic_dist_vector_1 (ddr, DDR_B (ddr), DDR_A (ddr), >> - opposite_v, &init_b, >> + if (!build_classic_dist_vector_1 (ddr, 1, 0, opposite_v, &init_b, >> &index_carry)) >> return false; >> >> @@ -3591,13 +3756,13 @@ build_classic_dir_vector (struct data_de >> } >> } >> >> -/* Helper function. Returns true when there is a dependence between >> - data references DRA and DRB. */ >> +/* Helper function. Returns true when there is a dependence between the >> + data references. A_INDEX is the index of the first reference (0 for >> + DDR_A, 1 for DDR_B) and B_INDEX is the index of the second reference. */ >> >> static bool >> subscript_dependence_tester_1 (struct data_dependence_relation *ddr, >> - struct data_reference *dra, >> - struct data_reference *drb, >> + unsigned int a_index, unsigned int b_index, >> struct loop *loop_nest) >> { >> unsigned int i; >> @@ -3609,8 +3774,8 @@ subscript_dependence_tester_1 (struct da >> { >> conflict_function *overlaps_a, *overlaps_b; >> >> - analyze_overlapping_iterations (DR_ACCESS_FN (dra, i), >> - DR_ACCESS_FN (drb, i), >> + analyze_overlapping_iterations (SUB_ACCESS_FN (subscript, a_index), >> + SUB_ACCESS_FN (subscript, b_index), >> &overlaps_a, &overlaps_b, >> &last_conflicts, loop_nest); >> >> @@ -3659,7 +3824,7 @@ subscript_dependence_tester_1 (struct da >> subscript_dependence_tester (struct data_dependence_relation *ddr, >> struct loop *loop_nest) >> { >> - if (subscript_dependence_tester_1 (ddr, DDR_A (ddr), DDR_B (ddr), loop_nest)) >> + if (subscript_dependence_tester_1 (ddr, 0, 1, loop_nest)) >> dependence_stats.num_dependence_dependent++; >> >> compute_subscript_distance (ddr); >> Index: gcc/tree-ssa-loop-prefetch.c >> =================================================================== >> --- gcc/tree-ssa-loop-prefetch.c 2017-03-28 16:19:28.000000000 +0100 >> +++ gcc/tree-ssa-loop-prefetch.c 2017-05-03 08:48:48.737038502 +0100 >> @@ -1650,6 +1650,7 @@ determine_loop_nest_reuse (struct loop * >> refb = (struct mem_ref *) DDR_B (dep)->aux; >> >> if (DDR_ARE_DEPENDENT (dep) == chrec_dont_know >> + || DDR_COULD_BE_INDEPENDENT_P (dep) >> || DDR_NUM_DIST_VECTS (dep) == 0) >> { >> /* If the dependence cannot be analyzed, assume that there might be >> Index: gcc/tree-vectorizer.h >> =================================================================== >> --- gcc/tree-vectorizer.h 2017-03-28 16:19:28.000000000 +0100 >> +++ gcc/tree-vectorizer.h 2017-05-03 08:48:48.738045102 +0100 >> @@ -383,7 +383,7 @@ #define LOOP_VINFO_ORIG_LOOP_INFO(L) >> #define LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT(L) \ >> ((L)->may_misalign_stmts.length () > 0) >> #define LOOP_REQUIRES_VERSIONING_FOR_ALIAS(L) \ >> - ((L)->may_alias_ddrs.length () > 0) >> + ((L)->comp_alias_ddrs.length () > 0) >> #define LOOP_REQUIRES_VERSIONING_FOR_NITERS(L) \ >> (LOOP_VINFO_NITERS_ASSUMPTIONS (L)) >> #define LOOP_REQUIRES_VERSIONING(L) \ >> Index: gcc/tree-vect-data-refs.c >> =================================================================== >> --- gcc/tree-vect-data-refs.c 2017-05-03 08:48:30.536704993 +0100 >> +++ gcc/tree-vect-data-refs.c 2017-05-03 08:48:48.738045102 +0100 >> @@ -340,6 +340,26 @@ vect_analyze_data_ref_dependence (struct >> } >> >> loop_depth = index_in_loop_nest (loop->num, DDR_LOOP_NEST (ddr)); >> + >> + if (DDR_COULD_BE_INDEPENDENT_P (ddr)) >> + /* For dependence distances of 2 or more, we have the option of >> + limiting VF or checking for an alias at runtime. Prefer to check >> + at runtime if we can, to avoid limiting the VF unnecessarily when >> + the bases are in fact independent. >> + >> + Note that the alias checks will be removed if the VF ends up >> + being small enough. */ >> + FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v) >> + { >> + int dist = dist_v[loop_depth]; >> + if (dist != 0 && !(dist > 0 && DDR_REVERSED_P (ddr))) >> + { >> + if (vect_mark_for_runtime_alias_test (ddr, loop_vinfo)) >> + return false; >> + break; >> + } >> + } >> + >> FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v) >> { >> int dist = dist_v[loop_depth]; >> @@ -3017,6 +3037,44 @@ vect_no_alias_p (struct data_reference * >> return false; >> } >> >> +/* Return true if the minimum nonzero dependence distance for loop LOOP_DEPTH >> + in DDR is >= VF. */ >> + >> +static bool >> +dependence_distance_ge_vf (data_dependence_relation *ddr, >> + unsigned int loop_depth, unsigned HOST_WIDE_INT vf) >> +{ >> + if (DDR_ARE_DEPENDENT (ddr) != NULL_TREE >> + || DDR_NUM_DIST_VECTS (ddr) == 0) >> + return false; >> + >> + /* If the dependence is exact, we should have limited the VF instead. */ >> + gcc_checking_assert (DDR_COULD_BE_INDEPENDENT_P (ddr)); >> + >> + unsigned int i; >> + lambda_vector dist_v; >> + FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v) >> + { >> + HOST_WIDE_INT dist = dist_v[loop_depth]; >> + if (dist != 0 >> + && !(dist > 0 && DDR_REVERSED_P (ddr)) >> + && (unsigned HOST_WIDE_INT) abs_hwi (dist) < vf) >> + return false; >> + } >> + >> + if (dump_enabled_p ()) >> + { >> + dump_printf_loc (MSG_NOTE, vect_location, >> + "dependence distance between "); >> + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (DDR_A (ddr))); >> + dump_printf (MSG_NOTE, " and "); >> + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (DDR_B (ddr))); >> + dump_printf (MSG_NOTE, " is >= VF\n"); >> + } >> + >> + return true; >> +} >> + >> /* Function vect_prune_runtime_alias_test_list. >> >> Prune a list of ddrs to be tested at run-time by versioning for alias. >> @@ -3075,6 +3133,10 @@ vect_prune_runtime_alias_test_list (loop >> >> comp_alias_ddrs.create (may_alias_ddrs.length ()); >> >> + unsigned int loop_depth >> + = index_in_loop_nest (LOOP_VINFO_LOOP (loop_vinfo)->num, >> + LOOP_VINFO_LOOP_NEST (loop_vinfo)); >> + >> /* First, we collect all data ref pairs for aliasing checks. */ >> FOR_EACH_VEC_ELT (may_alias_ddrs, i, ddr) >> { >> @@ -3084,6 +3146,11 @@ vect_prune_runtime_alias_test_list (loop >> tree segment_length_a, segment_length_b; >> gimple *stmt_a, *stmt_b; >> >> + /* Ignore the alias if the VF we chose ended up being no greater >> + than the dependence distance. */ >> + if (dependence_distance_ge_vf (ddr, loop_depth, vect_factor)) >> + continue; >> + >> dr_a = DDR_A (ddr); >> stmt_a = DR_STMT (DDR_A (ddr)); >> dr_group_first_a = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_a)); >> @@ -3294,10 +3361,6 @@ vect_prune_runtime_alias_test_list (loop >> return false; >> } >> >> - /* All alias checks have been resolved at compilation time. */ >> - if (!comp_alias_ddrs.length ()) >> - LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo).truncate (0); >> - >> return true; >> } >> >> Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-3.c >> =================================================================== >> --- /dev/null 2017-05-03 08:16:26.972699664 +0100 >> +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-3.c 2017-05-03 08:48:48.736031902 +0100 >> @@ -0,0 +1,104 @@ >> +/* { dg-do compile } */ >> +/* { dg-require-effective-target vect_int } */ >> +/* { dg-additional-options "--param vect-max-version-for-alias-checks=0" } */ >> + >> +/* Intended to be larger than any VF. */ >> +#define GAP 128 >> +#define N (GAP * 3) >> + >> +struct s { int x[N + 1]; }; >> +struct t { struct s x[N + 1]; }; >> +struct u { int x[N + 1]; int y; }; >> + >> +void >> +f1 (struct s *a, struct s *b) >> +{ >> + for (int i = 0; i < N; ++i) >> + a->x[i] += b->x[i]; >> +} >> + >> +void >> +f2 (struct s *a, struct s *b) >> +{ >> + for (int i = 0; i < N; ++i) >> + a[1].x[i] += b[2].x[i]; >> +} >> + >> +void >> +f3 (struct s *a, struct s *b) >> +{ >> + for (int i = 0; i < N; ++i) >> + a[1].x[i] += b[i].x[i]; >> +} >> + >> +void >> +f4 (struct s *a, struct s *b) >> +{ >> + for (int i = 0; i < N; ++i) >> + a[i].x[i] += b[i].x[i]; >> +} >> + >> +void >> +f5 (struct s *a, struct s *b) >> +{ >> + for (int i = 0; i < N; ++i) >> + a->x[i] += b->x[i + 1]; >> +} >> + >> +void >> +f6 (struct s *a, struct s *b) >> +{ >> + for (int i = 0; i < N; ++i) >> + a[1].x[i] += b[2].x[i + 1]; >> +} >> + >> +void >> +f7 (struct s *a, struct s *b) >> +{ >> + for (int i = 0; i < N; ++i) >> + a[1].x[i] += b[i].x[i + 1]; >> +} >> + >> +void >> +f8 (struct s *a, struct s *b) >> +{ >> + for (int i = 0; i < N; ++i) >> + a[i].x[i] += b[i].x[i + 1]; >> +} >> + >> +void >> +f9 (struct s *a, struct t *b) >> +{ >> + for (int i = 0; i < N; ++i) >> + a->x[i] += b->x[1].x[i]; >> +} >> + >> +void >> +f10 (struct s *a, struct t *b) >> +{ >> + for (int i = 0; i < N; ++i) >> + a->x[i] += b->x[i].x[i]; >> +} >> + >> +void >> +f11 (struct u *a, struct u *b) >> +{ >> + for (int i = 0; i < N; ++i) >> + a->x[i] += b->x[i] + b[i].y; >> +} >> + >> +void >> +f12 (struct s *a, struct s *b) >> +{ >> + for (int i = 0; i < GAP; ++i) >> + a->x[i + GAP] += b->x[i]; >> +} >> + >> +void >> +f13 (struct s *a, struct s *b) >> +{ >> + for (int i = 0; i < GAP * 2; ++i) >> + a->x[i + GAP] += b->x[i]; >> +} >> + >> +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 13 "vect" } } */ >> Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-4.c >> =================================================================== >> --- /dev/null 2017-05-03 08:16:26.972699664 +0100 >> +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-4.c 2017-05-03 08:48:48.736031902 +0100 >> @@ -0,0 +1,35 @@ >> +/* { dg-do compile } */ >> +/* { dg-require-effective-target vect_int } */ >> +/* { dg-additional-options "--param vect-max-version-for-alias-checks=0" } */ >> + >> +#define N 16 >> + >> +struct s1 { int a[N]; }; >> +struct s2 { struct s1 b; int c; }; >> +struct s3 { int d; struct s1 e; }; >> +union u { struct s2 f; struct s3 g; }; >> + >> +/* We allow a and b to overlap arbitrarily. */ >> + >> +void >> +f1 (int a[][N], int b[][N]) >> +{ >> + for (int i = 0; i < N; ++i) >> + a[0][i] += b[0][i]; >> +} >> + >> +void >> +f2 (union u *a, union u *b) >> +{ >> + for (int i = 0; i < N; ++i) >> + a->f.b.a[i] += b->g.e.a[i]; >> +} >> + >> +void >> +f3 (struct s1 *a, struct s1 *b) >> +{ >> + for (int i = 0; i < N - 1; ++i) >> + a->a[i + 1] += b->a[i]; >> +} >> + >> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */ >> Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-5.c >> =================================================================== >> --- /dev/null 2017-05-03 08:16:26.972699664 +0100 >> +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-5.c 2017-05-03 08:48:48.736031902 +0100 >> @@ -0,0 +1,19 @@ >> +/* { dg-do compile } */ >> +/* { dg-require-effective-target vect_int } */ >> + >> +/* Intended to be larger than any VF. */ >> +#define GAP 128 >> +#define N (GAP * 3) >> + >> +struct s { int x[N]; }; >> + >> +void >> +f1 (struct s *a, struct s *b) >> +{ >> + for (int i = 0; i < GAP * 2; ++i) >> + a->x[i + GAP] += b->x[i]; >> +} >> + >> +/* { dg-final { scan-tree-dump-times "mark for run-time aliasing" 1 "vect" } } */ >> +/* { dg-final { scan-tree-dump-times "improved number of alias checks from 1 to 0" 1 "vect" } } */ >> +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 1 "vect" } } */

Richard Biener <richard.guenther@gmail.com> writes: > On Thu, May 4, 2017 at 2:12 PM, Richard Biener > <richard.guenther@gmail.com> wrote: >> On Wed, May 3, 2017 at 10:00 AM, Richard Sandiford >> <richard.sandiford@linaro.org> wrote: >>> This patch tries to calculate conservatively-correct distance >>> vectors for two references whose base addresses are not the same. >>> It sets a new flag DDR_COULD_BE_INDEPENDENT_P if the dependence >>> isn't guaranteed to occur. >>> >>> The motivating example is: >>> >>> struct s { int x[8]; }; >>> void >>> f (struct s *a, struct s *b) >>> { >>> for (int i = 0; i < 8; ++i) >>> a->x[i] += b->x[i]; >>> } >>> >>> in which the "a" and "b" accesses are either independent or have a >>> dependence distance of 0 (assuming -fstrict-aliasing). Neither case >>> prevents vectorisation, so we can vectorise without an alias check. >>> >>> I'd originally wanted to do the same thing for arrays as well, e.g.: >>> >>> void >>> f (int a[][8], struct b[][8]) >>> { >>> for (int i = 0; i < 8; ++i) >>> a[0][i] += b[0][i]; >>> } >>> >>> I think this is valid because C11 6.7.6.2/6 says: >>> >>> For two array types to be compatible, both shall have compatible >>> element types, and if both size specifiers are present, and are >>> integer constant expressions, then both size specifiers shall have >>> the same constant value. >>> >>> So if we access an array through an int (*)[8], it must have type X[8] >>> or X[], where X is compatible with int. It doesn't seem possible in >>> either case for "a[0]" and "b[0]" to overlap when "a != b". >>> >>> However, Richard B said that (at least in gimple) we support arbitrary >>> overlap of arrays and allow arrays to be accessed with different >>> dimensionality. There are examples of this in PR50067. I've therefore >>> only handled references that end in a structure field access. >>> >>> There are two ways of handling these dependences in the vectoriser: >>> use them to limit VF, or check at runtime as before. I've gone for >>> the approach of checking at runtime if we can, to avoid limiting VF >>> unnecessarily. We still fall back to a VF cap when runtime checks >>> aren't allowed. >>> >>> The patch tests whether we queued an alias check with a dependence >>> distance of X and then picked a VF <= X, in which case it's safe to >>> drop the alias check. Since vect_prune_runtime_alias_check_list can >>> be called twice with different VF for the same loop, it's no longer >>> safe to clear may_alias_ddrs on exit. Instead we should use >>> comp_alias_ddrs to check whether versioning is necessary. >>> >>> Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install? >> >> You seem to do your "fancy" thing but also later compute the old >> base equality anyway (for same_base_p). It looks to me for this >> case the new fancy code can be simply skipped, keeping num_dimensions >> as before? >> >> + /* Try to approach equal type sizes. */ >> + if (!COMPLETE_TYPE_P (type_a) >> + || !COMPLETE_TYPE_P (type_b) >> + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_a)) >> + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_b))) >> + break; >> >> ah, interesting idea to avoid a quadratic search. Note that you should >> conservatively handle both BIT_FIELD_REF and VIEW_CONVERT_EXPR >> as they are used for type-punning. All the component refs here should be REALPART_EXPRs, IMAGPART_EXPRs, ARRAY_REFs or COMPONENT_REFs of structures, since that's all that dr_analyze_indices allows, so I think we safe in terms of the tree codes. >> I see nonoverlapping_component_refs_of_decl_p should simply skip >> ARRAY_REFs - but I also see there: >> >> /* ??? We cannot simply use the type of operand #0 of the refs here >> as the Fortran compiler smuggles type punning into COMPONENT_REFs >> for common blocks instead of using unions like everyone else. */ >> tree type1 = DECL_CONTEXT (field1); >> tree type2 = DECL_CONTEXT (field2); >> >> so you probably can't simply use TREE_TYPE (outer_ref) for type compatibility. >> You also may not use types_compatible_p here as for LTO that is _way_ too >> lax for aggregates. The above uses >> >> /* We cannot disambiguate fields in a union or qualified union. */ >> if (type1 != type2 || TREE_CODE (type1) != RECORD_TYPE) >> return false; >> >> so you should also bail out on unions here, rather than the check you do later. The loop stops before we get to a union, so I think "only" the RECORD_TYPE COMPONENT_REF handling is a potential problem. Does this mean that I should use the nonoverlapping_component_refs_of_decl_p code: tree field1 = TREE_OPERAND (ref1, 1); tree field2 = TREE_OPERAND (ref2, 1); /* ??? We cannot simply use the type of operand #0 of the refs here as the Fortran compiler smuggles type punning into COMPONENT_REFs for common blocks instead of using unions like everyone else. */ tree type1 = DECL_CONTEXT (field1); tree type2 = DECL_CONTEXT (field2); /* We cannot disambiguate fields in a union or qualified union. */ if (type1 != type2 || TREE_CODE (type1) != RECORD_TYPE) return false; if (field1 != field2) { /* A field and its representative need to be considered the same. */ if (DECL_BIT_FIELD_REPRESENTATIVE (field1) == field2 || DECL_BIT_FIELD_REPRESENTATIVE (field2) == field1) return false; /* Different fields of the same record type cannot overlap. ??? Bitfields can overlap at RTL level so punt on them. */ if (DECL_BIT_FIELD (field1) && DECL_BIT_FIELD (field2)) return false; return true; } as the disambiguation test for COMPONENT_REFs, instead of types_compatible_p during the new loop? And test for this as well as unions in the outer references? >> You seem to rely on getting an access_fn entry for each handled_component_p. >> It looks like this is the case -- we even seem to stop at unions (with the same >> fortran "issue"). I'm not sure that's the best thing to do but you >> rely on that. Yeah, the loop is deliberately limited to the components associated with an access_fn. I did wonder at first whether dr_analyze_indices should store the original component reference trees for each access function. That would make things simpler and more explicit, but would also eat up more memory. Things like object_address_invariant_in_loop_p rely on the access_fns in the same way that the loop in the patch does. >> I don't understand the looping, it needs more comments. You seem to be >> looking for the innermost compatible RECORD_TYPE but then num_dimensions >> is how many compatible refs you found on the way (with incompatible ones >> not counting?!). What about an inner varying array of structs? This seems to >> be disregarded in the analysis now? Thus, a[i].s.b[i].j vs. __real >> b[i].s.b[i].j? I'll try to improve the comments. But the idea is that both sequences are as long as possible, while that still gives compatible types. If there is more than one such sequence, we pick the one nearest the base. So in your example, the access functions would be: 0 1 2 3 4 a: .j [i] .b .s [i] 0 1 2 3 4 5 b: __real .j [i] .b .s [i] If a and b are pointers, the final access functions would be unconstrained base accesses, so we'd end up with: a: [0, 3] b: [1, 4] for both sequences. >> nonoverlapping_component_refs_of_decl_p/nonoverlapping_component_refs_p >> conveniently start from the other >> end of the ref here. > > That said, for the motivational cases we either have one ref having > more dimensions than the other (the __real vs. full complex access) or > they have the same number of dimensions (and no access fn for the > base). > > For the first case we should simply "drop" access_fns of the larger > dimensional ref (from the start, plus outer component refs) up to the > point the number of dimensions are equal. Yeah, that's what happens for your example. But if we had: a[i].s.c.d __real b[i].s.b[i].j (where d is the same type as the real component) then the access functions would be: 0 1 2 3 a: .d .c .s [i] 0 1 2 3 4 5 b: __real .j [i] .b .s [i] Comparing the a0/b2 column doesn't make sense, because one's an array and the other is a structure. In this case the sequence we care about is: a: [1, 3] b: [3, 5] which is what the loop gives. The a1/b3 column is the one that proves there's no dependence. > Then we have the case of > > ! types_compatible_p (TREE_TYPE (base_a), TREE_TYPE (base_b)) > > where we have to punt. > > Then we have the case of > > ! operand_equal_p (base_a, base_b, OEP_ADDRESS_OF) > > which is where the new code should kick in to see if we can drop access_fns > from the other end (as unanalyzable but either having distance zero or not > aliased because of TBAA). > > At least your testcases suggest you do not want to handle > > struct s { int x[N]; }; > struct r { struct s s; }; > f (struct s *a, struct r *b) > { > for (i = 0; i < N; ++i) > a->s.x[i] = b->x[i]; > } > > ? > > With this example your loop which seems to search for a "common" > sequence in (different) midst of the reference trees makes more sense > (still that loop is awkward to understand). Yeah, I want to handle that too, just hadn't thought of it as a specific testcase. The code does give the expected dependence distance of 0. Thanks, Richard

On 4 May 2017 14:12:04 CEST, Richard Biener <richard.guenther@gmail.com> wrote: >nonoverlapping_component_refs_of_decl_p >should simply skip ARRAY_REFs - but I also see there: > > /* ??? We cannot simply use the type of operand #0 of the refs here > as the Fortran compiler smuggles type punning into COMPONENT_REFs > for common blocks instead of using unions like everyone else. */ > tree type1 = DECL_CONTEXT (field1); > tree type2 = DECL_CONTEXT (field2); > >so you probably can't simply use TREE_TYPE (outer_ref) for type >compatibility. >You also may not use types_compatible_p here as for LTO that is _way_ >too >lax for aggregates. The above uses > > /* We cannot disambiguate fields in a union or qualified union. */ > if (type1 != type2 || TREE_CODE (type1) != RECORD_TYPE) > return false; > >so you should also bail out on unions here, rather than the check you >do later. > >You seem to rely on getting an access_fn entry for each >handled_component_p. >It looks like this is the case -- we even seem to stop at unions (with >the same >fortran "issue"). I'm not sure that's the best thing to do but you >rely on that. Is there a PR for the (IIUC) common as union? Maybe around https://gcc.gnu.org/bugzilla/show_bug.cgi?id=41227 COMMON block, BIND(C) and LTO interoperability issues Thanks

On Thu, May 4, 2017 at 7:21 PM, Richard Sandiford <richard.sandiford@linaro.org> wrote: > Richard Biener <richard.guenther@gmail.com> writes: >> On Thu, May 4, 2017 at 2:12 PM, Richard Biener >> <richard.guenther@gmail.com> wrote: >>> On Wed, May 3, 2017 at 10:00 AM, Richard Sandiford >>> <richard.sandiford@linaro.org> wrote: >>>> This patch tries to calculate conservatively-correct distance >>>> vectors for two references whose base addresses are not the same. >>>> It sets a new flag DDR_COULD_BE_INDEPENDENT_P if the dependence >>>> isn't guaranteed to occur. >>>> >>>> The motivating example is: >>>> >>>> struct s { int x[8]; }; >>>> void >>>> f (struct s *a, struct s *b) >>>> { >>>> for (int i = 0; i < 8; ++i) >>>> a->x[i] += b->x[i]; >>>> } >>>> >>>> in which the "a" and "b" accesses are either independent or have a >>>> dependence distance of 0 (assuming -fstrict-aliasing). Neither case >>>> prevents vectorisation, so we can vectorise without an alias check. >>>> >>>> I'd originally wanted to do the same thing for arrays as well, e.g.: >>>> >>>> void >>>> f (int a[][8], struct b[][8]) >>>> { >>>> for (int i = 0; i < 8; ++i) >>>> a[0][i] += b[0][i]; >>>> } >>>> >>>> I think this is valid because C11 6.7.6.2/6 says: >>>> >>>> For two array types to be compatible, both shall have compatible >>>> element types, and if both size specifiers are present, and are >>>> integer constant expressions, then both size specifiers shall have >>>> the same constant value. >>>> >>>> So if we access an array through an int (*)[8], it must have type X[8] >>>> or X[], where X is compatible with int. It doesn't seem possible in >>>> either case for "a[0]" and "b[0]" to overlap when "a != b". >>>> >>>> However, Richard B said that (at least in gimple) we support arbitrary >>>> overlap of arrays and allow arrays to be accessed with different >>>> dimensionality. There are examples of this in PR50067. I've therefore >>>> only handled references that end in a structure field access. >>>> >>>> There are two ways of handling these dependences in the vectoriser: >>>> use them to limit VF, or check at runtime as before. I've gone for >>>> the approach of checking at runtime if we can, to avoid limiting VF >>>> unnecessarily. We still fall back to a VF cap when runtime checks >>>> aren't allowed. >>>> >>>> The patch tests whether we queued an alias check with a dependence >>>> distance of X and then picked a VF <= X, in which case it's safe to >>>> drop the alias check. Since vect_prune_runtime_alias_check_list can >>>> be called twice with different VF for the same loop, it's no longer >>>> safe to clear may_alias_ddrs on exit. Instead we should use >>>> comp_alias_ddrs to check whether versioning is necessary. >>>> >>>> Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install? >>> >>> You seem to do your "fancy" thing but also later compute the old >>> base equality anyway (for same_base_p). It looks to me for this >>> case the new fancy code can be simply skipped, keeping num_dimensions >>> as before? >>> >>> + /* Try to approach equal type sizes. */ >>> + if (!COMPLETE_TYPE_P (type_a) >>> + || !COMPLETE_TYPE_P (type_b) >>> + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_a)) >>> + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_b))) >>> + break; >>> >>> ah, interesting idea to avoid a quadratic search. Note that you should >>> conservatively handle both BIT_FIELD_REF and VIEW_CONVERT_EXPR >>> as they are used for type-punning. > > All the component refs here should be REALPART_EXPRs, IMAGPART_EXPRs, > ARRAY_REFs or COMPONENT_REFs of structures, since that's all that > dr_analyze_indices allows, so I think we safe in terms of the tree codes. Yeah. I think we need to document that we should have a 1:1 match here. >>> I see nonoverlapping_component_refs_of_decl_p should simply skip >>> ARRAY_REFs - but I also see there: >>> >>> /* ??? We cannot simply use the type of operand #0 of the refs here >>> as the Fortran compiler smuggles type punning into COMPONENT_REFs >>> for common blocks instead of using unions like everyone else. */ >>> tree type1 = DECL_CONTEXT (field1); >>> tree type2 = DECL_CONTEXT (field2); >>> >>> so you probably can't simply use TREE_TYPE (outer_ref) for type compatibility. >>> You also may not use types_compatible_p here as for LTO that is _way_ too >>> lax for aggregates. The above uses >>> >>> /* We cannot disambiguate fields in a union or qualified union. */ >>> if (type1 != type2 || TREE_CODE (type1) != RECORD_TYPE) >>> return false; >>> >>> so you should also bail out on unions here, rather than the check you do later. > > The loop stops before we get to a union, so I think "only" the RECORD_TYPE > COMPONENT_REF handling is a potential problem. Does this mean that > I should use the nonoverlapping_component_refs_of_decl_p code: > > tree field1 = TREE_OPERAND (ref1, 1); > tree field2 = TREE_OPERAND (ref2, 1); > > /* ??? We cannot simply use the type of operand #0 of the refs here > as the Fortran compiler smuggles type punning into COMPONENT_REFs > for common blocks instead of using unions like everyone else. */ > tree type1 = DECL_CONTEXT (field1); > tree type2 = DECL_CONTEXT (field2); > > /* We cannot disambiguate fields in a union or qualified union. */ > if (type1 != type2 || TREE_CODE (type1) != RECORD_TYPE) > return false; > > if (field1 != field2) > { > /* A field and its representative need to be considered the > same. */ > if (DECL_BIT_FIELD_REPRESENTATIVE (field1) == field2 > || DECL_BIT_FIELD_REPRESENTATIVE (field2) == field1) > return false; > /* Different fields of the same record type cannot overlap. > ??? Bitfields can overlap at RTL level so punt on them. */ > if (DECL_BIT_FIELD (field1) && DECL_BIT_FIELD (field2)) > return false; > return true; > } > > as the disambiguation test for COMPONENT_REFs, instead of types_compatible_p > during the new loop? Yes. OTOH you want to "match" while the above disambiguates. So it means you should use either FIELD_DECL equality or DECL_CONTEXT of the FIELD_DECL equality (which should be the same in the end). The RTL concern should not matter here. > And test for this as well as unions in the outer > references? So looking at dr_analyze_indices a union would be always the DR_BASE_OBJECT, and you (should) stop the ref walk at DR_BASE_OBJECT. The dr_analyze_indices code is also somewhat fishy in that it simply ignores everything below unhandled component-refs even if there are indices involved (and it gets away with this because dependence analysis likely/hopefully gives up on the DR_BASE_OBJECT equality test in case it is sth like a[i].union for example ... hopefully ...). >>> You seem to rely on getting an access_fn entry for each handled_component_p. >>> It looks like this is the case -- we even seem to stop at unions (with the same >>> fortran "issue"). I'm not sure that's the best thing to do but you >>> rely on that. > > Yeah, the loop is deliberately limited to the components associated with > an access_fn. I did wonder at first whether dr_analyze_indices should > store the original component reference trees for each access function. > That would make things simpler and more explicit, but would also eat up > more memory. Things like object_address_invariant_in_loop_p rely on the > access_fns in the same way that the loop in the patch does. in fact it fails to handle ARRAY_RANGE_REFs ... >>> I don't understand the looping, it needs more comments. You seem to be >>> looking for the innermost compatible RECORD_TYPE but then num_dimensions >>> is how many compatible refs you found on the way (with incompatible ones >>> not counting?!). What about an inner varying array of structs? This seems to >>> be disregarded in the analysis now? Thus, a[i].s.b[i].j vs. __real >>> b[i].s.b[i].j? > > I'll try to improve the comments. But the idea is that both sequences are > as long as possible, while that still gives compatible types. If there is > more than one such sequence, we pick the one nearest the base. > > So in your example, the access functions would be: > > 0 1 2 3 4 > a: .j [i] .b .s [i] > > 0 1 2 3 4 5 > b: __real .j [i] .b .s [i] > > If a and b are pointers, the final access functions would be > unconstrained base accesses, so we'd end up with: > > a: [0, 3] > b: [1, 4] > > for both sequences. > >>> nonoverlapping_component_refs_of_decl_p/nonoverlapping_component_refs_p >>> conveniently start from the other >>> end of the ref here. >> >> That said, for the motivational cases we either have one ref having >> more dimensions than the other (the __real vs. full complex access) or >> they have the same number of dimensions (and no access fn for the >> base). >> >> For the first case we should simply "drop" access_fns of the larger >> dimensional ref (from the start, plus outer component refs) up to the >> point the number of dimensions are equal. > > Yeah, that's what happens for your example. But if we had: > > a[i].s.c.d > __real b[i].s.b[i].j > > (where d is the same type as the real component) then the access > functions would be: > > 0 1 2 3 > a: .d .c .s [i] > > 0 1 2 3 4 5 > b: __real .j [i] .b .s [i] > > Comparing the a0/b2 column doesn't make sense, because one's an array > and the other is a structure. In this case the sequence we care about is: > > a: [1, 3] > b: [3, 5] > > which is what the loop gives. The a1/b3 column is the one that proves > there's no dependence. > >> Then we have the case of >> >> ! types_compatible_p (TREE_TYPE (base_a), TREE_TYPE (base_b)) >> >> where we have to punt. >> >> Then we have the case of >> >> ! operand_equal_p (base_a, base_b, OEP_ADDRESS_OF) >> >> which is where the new code should kick in to see if we can drop access_fns >> from the other end (as unanalyzable but either having distance zero or not >> aliased because of TBAA). >> >> At least your testcases suggest you do not want to handle >> >> struct s { int x[N]; }; >> struct r { struct s s; }; >> f (struct s *a, struct r *b) >> { >> for (i = 0; i < N; ++i) >> a->s.x[i] = b->x[i]; >> } >> >> ? >> >> With this example your loop which seems to search for a "common" >> sequence in (different) midst of the reference trees makes more sense >> (still that loop is awkward to understand). > > Yeah, I want to handle that too, just hadn't thought of it as a specific > testcase. The code does give the expected dependence distance of 0. Ok. I think the patch is reasonable, maybe the loop can be restructured / simplified a bit and handling of the union case for example be done first (by looking at DR_BASE_OBJECT). Thanks, Richard. > > Thanks, > Richard

On Fri, May 5, 2017 at 11:27 PM, Bernhard Reutner-Fischer <rep.dot.nop@gmail.com> wrote: > On 4 May 2017 14:12:04 CEST, Richard Biener <richard.guenther@gmail.com> wrote: > >>nonoverlapping_component_refs_of_decl_p >>should simply skip ARRAY_REFs - but I also see there: >> >> /* ??? We cannot simply use the type of operand #0 of the refs here >> as the Fortran compiler smuggles type punning into COMPONENT_REFs >> for common blocks instead of using unions like everyone else. */ >> tree type1 = DECL_CONTEXT (field1); >> tree type2 = DECL_CONTEXT (field2); >> >>so you probably can't simply use TREE_TYPE (outer_ref) for type >>compatibility. >>You also may not use types_compatible_p here as for LTO that is _way_ >>too >>lax for aggregates. The above uses >> >> /* We cannot disambiguate fields in a union or qualified union. */ >> if (type1 != type2 || TREE_CODE (type1) != RECORD_TYPE) >> return false; >> >>so you should also bail out on unions here, rather than the check you >>do later. >> >>You seem to rely on getting an access_fn entry for each >>handled_component_p. >>It looks like this is the case -- we even seem to stop at unions (with >>the same >>fortran "issue"). I'm not sure that's the best thing to do but you >>rely on that. > > Is there a PR for the (IIUC) common as union? > Maybe around > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=41227 > COMMON block, BIND(C) and LTO interoperability issues I'm not sure, this is Erics code so maybe he remembers. Richard. > Thanks

Richard Biener <richard.guenther@gmail.com> writes: > On Thu, May 4, 2017 at 7:21 PM, Richard Sandiford > <richard.sandiford@linaro.org> wrote: >> Richard Biener <richard.guenther@gmail.com> writes: >>> On Thu, May 4, 2017 at 2:12 PM, Richard Biener >>> <richard.guenther@gmail.com> wrote: >>>> On Wed, May 3, 2017 at 10:00 AM, Richard Sandiford >>>> <richard.sandiford@linaro.org> wrote: >>>>> This patch tries to calculate conservatively-correct distance >>>>> vectors for two references whose base addresses are not the same. >>>>> It sets a new flag DDR_COULD_BE_INDEPENDENT_P if the dependence >>>>> isn't guaranteed to occur. >>>>> >>>>> The motivating example is: >>>>> >>>>> struct s { int x[8]; }; >>>>> void >>>>> f (struct s *a, struct s *b) >>>>> { >>>>> for (int i = 0; i < 8; ++i) >>>>> a->x[i] += b->x[i]; >>>>> } >>>>> >>>>> in which the "a" and "b" accesses are either independent or have a >>>>> dependence distance of 0 (assuming -fstrict-aliasing). Neither case >>>>> prevents vectorisation, so we can vectorise without an alias check. >>>>> >>>>> I'd originally wanted to do the same thing for arrays as well, e.g.: >>>>> >>>>> void >>>>> f (int a[][8], struct b[][8]) >>>>> { >>>>> for (int i = 0; i < 8; ++i) >>>>> a[0][i] += b[0][i]; >>>>> } >>>>> >>>>> I think this is valid because C11 6.7.6.2/6 says: >>>>> >>>>> For two array types to be compatible, both shall have compatible >>>>> element types, and if both size specifiers are present, and are >>>>> integer constant expressions, then both size specifiers shall have >>>>> the same constant value. >>>>> >>>>> So if we access an array through an int (*)[8], it must have type X[8] >>>>> or X[], where X is compatible with int. It doesn't seem possible in >>>>> either case for "a[0]" and "b[0]" to overlap when "a != b". >>>>> >>>>> However, Richard B said that (at least in gimple) we support arbitrary >>>>> overlap of arrays and allow arrays to be accessed with different >>>>> dimensionality. There are examples of this in PR50067. I've therefore >>>>> only handled references that end in a structure field access. >>>>> >>>>> There are two ways of handling these dependences in the vectoriser: >>>>> use them to limit VF, or check at runtime as before. I've gone for >>>>> the approach of checking at runtime if we can, to avoid limiting VF >>>>> unnecessarily. We still fall back to a VF cap when runtime checks >>>>> aren't allowed. >>>>> >>>>> The patch tests whether we queued an alias check with a dependence >>>>> distance of X and then picked a VF <= X, in which case it's safe to >>>>> drop the alias check. Since vect_prune_runtime_alias_check_list can >>>>> be called twice with different VF for the same loop, it's no longer >>>>> safe to clear may_alias_ddrs on exit. Instead we should use >>>>> comp_alias_ddrs to check whether versioning is necessary. >>>>> >>>>> Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install? >>>> >>>> You seem to do your "fancy" thing but also later compute the old >>>> base equality anyway (for same_base_p). It looks to me for this >>>> case the new fancy code can be simply skipped, keeping num_dimensions >>>> as before? >>>> >>>> + /* Try to approach equal type sizes. */ >>>> + if (!COMPLETE_TYPE_P (type_a) >>>> + || !COMPLETE_TYPE_P (type_b) >>>> + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_a)) >>>> + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_b))) >>>> + break; >>>> >>>> ah, interesting idea to avoid a quadratic search. Note that you should >>>> conservatively handle both BIT_FIELD_REF and VIEW_CONVERT_EXPR >>>> as they are used for type-punning. >> >> All the component refs here should be REALPART_EXPRs, IMAGPART_EXPRs, >> ARRAY_REFs or COMPONENT_REFs of structures, since that's all that >> dr_analyze_indices allows, so I think we safe in terms of the tree codes. > > Yeah. I think we need to document that we should have a 1:1 match here. OK, I added that to the comments and also added an access_fn_component_p that we can assert on. >>>> I see nonoverlapping_component_refs_of_decl_p should simply skip >>>> ARRAY_REFs - but I also see there: >>>> >>>> /* ??? We cannot simply use the type of operand #0 of the refs here >>>> as the Fortran compiler smuggles type punning into COMPONENT_REFs >>>> for common blocks instead of using unions like everyone else. */ >>>> tree type1 = DECL_CONTEXT (field1); >>>> tree type2 = DECL_CONTEXT (field2); >>>> >>>> so you probably can't simply use TREE_TYPE (outer_ref) for type compatibility. >>>> You also may not use types_compatible_p here as for LTO that is _way_ too >>>> lax for aggregates. The above uses >>>> >>>> /* We cannot disambiguate fields in a union or qualified union. */ >>>> if (type1 != type2 || TREE_CODE (type1) != RECORD_TYPE) >>>> return false; >>>> >>>> so you should also bail out on unions here, rather than the check you do later. >> >> The loop stops before we get to a union, so I think "only" the RECORD_TYPE >> COMPONENT_REF handling is a potential problem. Does this mean that >> I should use the nonoverlapping_component_refs_of_decl_p code: >> >> tree field1 = TREE_OPERAND (ref1, 1); >> tree field2 = TREE_OPERAND (ref2, 1); >> >> /* ??? We cannot simply use the type of operand #0 of the refs here >> as the Fortran compiler smuggles type punning into COMPONENT_REFs >> for common blocks instead of using unions like everyone else. */ >> tree type1 = DECL_CONTEXT (field1); >> tree type2 = DECL_CONTEXT (field2); >> >> /* We cannot disambiguate fields in a union or qualified union. */ >> if (type1 != type2 || TREE_CODE (type1) != RECORD_TYPE) >> return false; >> >> if (field1 != field2) >> { >> /* A field and its representative need to be considered the >> same. */ >> if (DECL_BIT_FIELD_REPRESENTATIVE (field1) == field2 >> || DECL_BIT_FIELD_REPRESENTATIVE (field2) == field1) >> return false; >> /* Different fields of the same record type cannot overlap. >> ??? Bitfields can overlap at RTL level so punt on them. */ >> if (DECL_BIT_FIELD (field1) && DECL_BIT_FIELD (field2)) >> return false; >> return true; >> } >> >> as the disambiguation test for COMPONENT_REFs, instead of types_compatible_p >> during the new loop? > > Yes. OTOH you want to "match" while the above disambiguates. So it means > you should use either FIELD_DECL equality or DECL_CONTEXT of the FIELD_DECL > equality (which should be the same in the end). The RTL concern > should not matter > here. The attached patch adds an access_fn_components_comparable_p helper function that checks whether the DECL_CONTEXTs are the same. >> And test for this as well as unions in the outer >> references? > > So looking at dr_analyze_indices a union would be always the DR_BASE_OBJECT, > and you (should) stop the ref walk at DR_BASE_OBJECT. I was just thinking that if the Fortran front-end has cases in which TREE_TYPE (TREE_OPERAND (ref, 0)) != DECL_CONTEXT (TREE_OPERAND (ref, 1)) for a COMPONENT_REF, should we treat that as equivalent to a union access in ref_contains_union_access_p? But I'm not sure that's necessary after all. > The dr_analyze_indices code is also somewhat fishy in that it simply > ignores everything below unhandled component-refs even if there are > indices involved (and it gets away with this because dependence > analysis likely/hopefully gives up on the DR_BASE_OBJECT equality test > in case it is sth like a[i].union for example ... hopefully ...). I think this is what you meant, but: I don't think the base object itself can be a union, because we need at least one component reference for the DR, and don't accept COMPONENT_REFs for unions as access functions. So if the base involves a union, the base would also need to have a COMPONENT_REF that selects a particular member of that union. And yeah, before the patch we did allow a dependence distance to be calculated for a[i].union.f[j] vs. a[i].union.f[j + 1] (and still do after the patch), on the basis that a[i].union.f refers to the same object in both cases. >>>> You seem to rely on getting an access_fn entry for each handled_component_p. >>>> It looks like this is the case -- we even seem to stop at unions >>>> (with the same >>>> fortran "issue"). I'm not sure that's the best thing to do but you >>>> rely on that. >> >> Yeah, the loop is deliberately limited to the components associated with >> an access_fn. I did wonder at first whether dr_analyze_indices should >> store the original component reference trees for each access function. >> That would make things simpler and more explicit, but would also eat up >> more memory. Things like object_address_invariant_in_loop_p rely on the >> access_fns in the same way that the loop in the patch does. > > in fact it fails to handle ARRAY_RANGE_REFs ... Yeah, the whole file seems to ignore those. What kind of code would benefit? >>>> I don't understand the looping, it needs more comments. You seem to be >>>> looking for the innermost compatible RECORD_TYPE but then num_dimensions >>>> is how many compatible refs you found on the way (with incompatible ones >>>> not counting?!). What about an inner varying array of structs? >>>> This seems to >>>> be disregarded in the analysis now? Thus, a[i].s.b[i].j vs. __real >>>> b[i].s.b[i].j? >> >> I'll try to improve the comments. But the idea is that both sequences are >> as long as possible, while that still gives compatible types. If there is >> more than one such sequence, we pick the one nearest the base. >> >> So in your example, the access functions would be: >> >> 0 1 2 3 4 >> a: .j [i] .b .s [i] >> >> 0 1 2 3 4 5 >> b: __real .j [i] .b .s [i] >> >> If a and b are pointers, the final access functions would be >> unconstrained base accesses, so we'd end up with: >> >> a: [0, 3] >> b: [1, 4] >> >> for both sequences. >> >>>> nonoverlapping_component_refs_of_decl_p/nonoverlapping_component_refs_p >>>> conveniently start from the other >>>> end of the ref here. >>> >>> That said, for the motivational cases we either have one ref having >>> more dimensions than the other (the __real vs. full complex access) or >>> they have the same number of dimensions (and no access fn for the >>> base). >>> >>> For the first case we should simply "drop" access_fns of the larger >>> dimensional ref (from the start, plus outer component refs) up to the >>> point the number of dimensions are equal. >> >> Yeah, that's what happens for your example. But if we had: >> >> a[i].s.c.d >> __real b[i].s.b[i].j >> >> (where d is the same type as the real component) then the access >> functions would be: >> >> 0 1 2 3 >> a: .d .c .s [i] >> >> 0 1 2 3 4 5 >> b: __real .j [i] .b .s [i] >> >> Comparing the a0/b2 column doesn't make sense, because one's an array >> and the other is a structure. In this case the sequence we care about is: >> >> a: [1, 3] >> b: [3, 5] >> >> which is what the loop gives. The a1/b3 column is the one that proves >> there's no dependence. >> >>> Then we have the case of >>> >>> ! types_compatible_p (TREE_TYPE (base_a), TREE_TYPE (base_b)) >>> >>> where we have to punt. >>> >>> Then we have the case of >>> >>> ! operand_equal_p (base_a, base_b, OEP_ADDRESS_OF) >>> >>> which is where the new code should kick in to see if we can drop access_fns >>> from the other end (as unanalyzable but either having distance zero or not >>> aliased because of TBAA). >>> >>> At least your testcases suggest you do not want to handle >>> >>> struct s { int x[N]; }; >>> struct r { struct s s; }; >>> f (struct s *a, struct r *b) >>> { >>> for (i = 0; i < N; ++i) >>> a->s.x[i] = b->x[i]; >>> } >>> >>> ? >>> >>> With this example your loop which seems to search for a "common" >>> sequence in (different) midst of the reference trees makes more sense >>> (still that loop is awkward to understand). >> >> Yeah, I want to handle that too, just hadn't thought of it as a specific >> testcase. The code does give the expected dependence distance of 0. > > Ok. > > I think the patch is reasonable, maybe the loop can be restructured / > simplified a bit and handling of the union case for example be done > first (by looking at DR_BASE_OBJECT). I still prefer doing the loop first and keeping the "same base" check together as a single condition, since it means that we're analysing the reference in a single direction (DR_REF to base) rather than jumping around. And unequal bases should be more common that equal ones. I think both orders involve doing potentially redundant work. The current order tends towards doing redundant work for union accesses and !flag_strict_aliasing, but they should be the less common cases. How does this look? Changes since v1: - Added access_fn_component_p to check for valid access function components. - Added access_fn_components_comparable_p instead of using types_compatibloe_p directly. - Added more commentary. - Added local structures to represent the sequence, so that it's more obvious which variables are temporaries and which aren't. - Added the test above to vect-alias-check-3.c. Tested on aarch64-linux-gnu and x86_64-linux-gnu. Thanks, Richard 2017-05-18 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * tree-data-ref.h (subscript): Add access_fn field. (data_dependence_relation): Add could_be_independent_p. (SUB_ACCESS_FN, DDR_COULD_BE_INDEPENDENT_P): New macros. (same_access_functions): Move to tree-data-ref.c. * tree-data-ref.c (ref_contains_union_access_p): New function. (access_fn_component_p): Likewise. (access_fn_components_comparable_p): Likewise. (dr_analyze_indices): Add a comment that this code needs to be kept in sync with access_fn_component_p. (dump_data_dependence_relation): Use SUB_ACCESS_FN instead of DR_ACCESS_FN. (constant_access_functions): Likewise. (add_other_self_distances): Likewise. (same_access_functions): Likewise. (Moved from tree-data-ref.h.) (initialize_data_dependence_relation): Use XCNEW and remove explicit zeroing of DDR_REVERSED_P. Look for a subsequence of access functions that have the same type. Allow the subsequence to end with different bases in some circumstances. Record the chosen access functions in SUB_ACCESS_FN. (build_classic_dist_vector_1): Replace ddr_a and ddr_b with a_index and b_index. Use SUB_ACCESS_FN instead of DR_ACCESS_FN. (subscript_dependence_tester_1): Likewise dra and drb. (build_classic_dist_vector): Update calls accordingly. (subscript_dependence_tester): Likewise. * tree-ssa-loop-prefetch.c (determine_loop_nest_reuse): Check DDR_COULD_BE_INDEPENDENT_P. * tree-vectorizer.h (LOOP_REQUIRES_VERSIONING_FOR_ALIAS): Test comp_alias_ddrs instead of may_alias_ddrs. * tree-vect-data-refs.c (vect_analyze_data_ref_dependence): Try to mark for aliasing if DDR_COULD_BE_INDEPENDENT_P, but fall back to using the recorded distance vectors if that fails. (dependence_distance_ge_vf): New function. (vect_prune_runtime_alias_test_list): Use it. Don't clear LOOP_VINFO_MAY_ALIAS_DDRS. gcc/testsuite/ * gcc.dg/vect/vect-alias-check-3.c: New test. * gcc.dg/vect/vect-alias-check-4.c: Likewise. * gcc.dg/vect/vect-alias-check-5.c: Likewise.Index: gcc/tree-data-ref.h =================================================================== --- gcc/tree-data-ref.h 2017-05-04 11:36:51.157328631 +0100 +++ gcc/tree-data-ref.h 2017-05-18 07:51:50.871904726 +0100 @@ -191,6 +191,9 @@ struct conflict_function struct subscript { + /* The access functions of the two references. */ + tree access_fn[2]; + /* A description of the iterations for which the elements are accessed twice. */ conflict_function *conflicting_iterations_in_a; @@ -209,6 +212,7 @@ struct subscript typedef struct subscript *subscript_p; +#define SUB_ACCESS_FN(SUB, I) (SUB)->access_fn[I] #define SUB_CONFLICTS_IN_A(SUB) (SUB)->conflicting_iterations_in_a #define SUB_CONFLICTS_IN_B(SUB) (SUB)->conflicting_iterations_in_b #define SUB_LAST_CONFLICT(SUB) (SUB)->last_conflict @@ -264,6 +268,33 @@ struct data_dependence_relation /* Set to true when the dependence relation is on the same data access. */ bool self_reference_p; + + /* True if the dependence described is conservatively correct rather + than exact, and if it is still possible for the accesses to be + conditionally independent. For example, the a and b references in: + + struct s *a, *b; + for (int i = 0; i < n; ++i) + a->f[i] += b->f[i]; + + conservatively have a distance vector of (0), for the case in which + a == b, but the accesses are independent if a != b. Similarly, + the a and b references in: + + struct s *a, *b; + for (int i = 0; i < n; ++i) + a[0].f[i] += b[i].f[i]; + + conservatively have a distance vector of (0), but they are indepenent + when a != b + i. In contrast, the references in: + + struct s *a; + for (int i = 0; i < n; ++i) + a->f[i] += a->f[i]; + + have the same distance vector of (0), but the accesses can never be + independent. */ + bool could_be_independent_p; }; typedef struct data_dependence_relation *ddr_p; @@ -294,6 +325,7 @@ #define DDR_DIR_VECT(DDR, I) \ #define DDR_DIST_VECT(DDR, I) \ DDR_DIST_VECTS (DDR)[I] #define DDR_REVERSED_P(DDR) (DDR)->reversed_p +#define DDR_COULD_BE_INDEPENDENT_P(DDR) (DDR)->could_be_independent_p bool dr_analyze_innermost (struct data_reference *, struct loop *); @@ -372,22 +404,6 @@ same_data_refs (data_reference_p a, data return false; return true; -} - -/* Return true when the DDR contains two data references that have the - same access functions. */ - -static inline bool -same_access_functions (const struct data_dependence_relation *ddr) -{ - unsigned i; - - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) - if (!eq_evolutions_p (DR_ACCESS_FN (DDR_A (ddr), i), - DR_ACCESS_FN (DDR_B (ddr), i))) - return false; - - return true; } /* Returns true when all the dependences are computable. */ Index: gcc/tree-data-ref.c =================================================================== --- gcc/tree-data-ref.c 2017-05-18 07:51:26.126377691 +0100 +++ gcc/tree-data-ref.c 2017-05-18 07:51:50.871904726 +0100 @@ -123,8 +123,7 @@ Software Foundation; either version 3, o } dependence_stats; static bool subscript_dependence_tester_1 (struct data_dependence_relation *, - struct data_reference *, - struct data_reference *, + unsigned int, unsigned int, struct loop *); /* Returns true iff A divides B. */ @@ -144,6 +143,21 @@ int_divides_p (int a, int b) return ((b % a) == 0); } +/* Return true if reference REF contains a union access. */ + +static bool +ref_contains_union_access_p (tree ref) +{ + while (handled_component_p (ref)) + { + ref = TREE_OPERAND (ref, 0); + if (TREE_CODE (TREE_TYPE (ref)) == UNION_TYPE + || TREE_CODE (TREE_TYPE (ref)) == QUAL_UNION_TYPE) + return true; + } + return false; +} + /* Dump into FILE all the data references from DATAREFS. */ @@ -433,13 +447,14 @@ dump_data_dependence_relation (FILE *out unsigned int i; struct loop *loopi; - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) + subscript *sub; + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) { fprintf (outf, " access_fn_A: "); - print_generic_stmt (outf, DR_ACCESS_FN (dra, i)); + print_generic_stmt (outf, SUB_ACCESS_FN (sub, 0)); fprintf (outf, " access_fn_B: "); - print_generic_stmt (outf, DR_ACCESS_FN (drb, i)); - dump_subscript (outf, DDR_SUBSCRIPT (ddr, i)); + print_generic_stmt (outf, SUB_ACCESS_FN (sub, 1)); + dump_subscript (outf, sub); } fprintf (outf, " inner loop index: %d\n", DDR_INNER_LOOP (ddr)); @@ -886,6 +901,27 @@ dr_analyze_innermost (struct data_refere return true; } +/* Return true if OP is a valid component reference for a DR access + function. This accepts a subset of what handled_component_p accepts. */ + +static bool +access_fn_component_p (tree op) +{ + switch (TREE_CODE (op)) + { + case REALPART_EXPR: + case IMAGPART_EXPR: + case ARRAY_REF: + return true; + + case COMPONENT_REF: + return TREE_CODE (TREE_TYPE (TREE_OPERAND (op, 0))) == RECORD_TYPE; + + default: + return false; + } +} + /* Determines the base object and the list of indices of memory reference DR, analyzed in LOOP and instantiated in loop nest NEST. */ @@ -923,7 +959,9 @@ dr_analyze_indices (struct data_referenc access_fns.safe_push (integer_one_node); } - /* Analyze access functions of dimensions we know to be independent. */ + /* Analyze access functions of dimensions we know to be independent. + The list of component references handled here should be kept in + sync with access_fn_component_p. */ while (handled_component_p (ref)) { if (TREE_CODE (ref) == ARRAY_REF) @@ -1472,6 +1510,27 @@ dr_may_alias_p (const struct data_refere return refs_may_alias_p (addr_a, addr_b); } +/* REF_A and REF_B both satisfy access_fns_comparable_p. Return true + if it is meaningful to compare their associated access functions + when checking for dependencies. */ + +static bool +access_fn_components_comparable_p (tree ref_a, tree ref_b) +{ + if (TREE_CODE (ref_a) != TREE_CODE (ref_b)) + return false; + + if (TREE_CODE (ref_a) == COMPONENT_REF) + /* ??? We cannot simply use the type of operand #0 of the refs here as + the Fortran compiler smuggles type punning into COMPONENT_REFs. + Use the DECL_CONTEXT of the FIELD_DECLs instead. */ + return (DECL_CONTEXT (TREE_OPERAND (ref_a, 1)) + == DECL_CONTEXT (TREE_OPERAND (ref_b, 1))); + + return types_compatible_p (TREE_TYPE (TREE_OPERAND (ref_a, 0)), + TREE_TYPE (TREE_OPERAND (ref_b, 0))); +} + /* Initialize a data dependence relation between data accesses A and B. NB_LOOPS is the number of loops surrounding the references: the size of the classic distance/direction vectors. */ @@ -1484,11 +1543,10 @@ initialize_data_dependence_relation (str struct data_dependence_relation *res; unsigned int i; - res = XNEW (struct data_dependence_relation); + res = XCNEW (struct data_dependence_relation); DDR_A (res) = a; DDR_B (res) = b; DDR_LOOP_NEST (res).create (0); - DDR_REVERSED_P (res) = false; DDR_SUBSCRIPTS (res).create (0); DDR_DIR_VECTS (res).create (0); DDR_DIST_VECTS (res).create (0); @@ -1506,82 +1564,277 @@ initialize_data_dependence_relation (str return res; } - /* The case where the references are exactly the same. */ - if (operand_equal_p (DR_REF (a), DR_REF (b), 0)) + unsigned int num_dimensions_a = DR_NUM_DIMENSIONS (a); + unsigned int num_dimensions_b = DR_NUM_DIMENSIONS (b); + if (num_dimensions_a == 0 || num_dimensions_b == 0) { - if ((loop_nest.exists () - && !object_address_invariant_in_loop_p (loop_nest[0], - DR_BASE_OBJECT (a))) - || DR_NUM_DIMENSIONS (a) == 0) + DDR_ARE_DEPENDENT (res) = chrec_dont_know; + return res; + } + + /* For unconstrained bases, the root (highest-indexed) subscript + describes a variation in the base of the original DR_REF rather + than a component access. We have no type that accurately describes + the new DR_BASE_OBJECT (whose TREE_TYPE describes the type *after* + applying this subscript) so limit the search to the last real + component access. + + E.g. for: + + void + f (int a[][8], int b[][8]) + { + for (int i = 0; i < 8; ++i) + a[i * 2][0] = b[i][0]; + } + + the a and b accesses have a single ARRAY_REF component reference [0] + but have two subscripts. */ + if (DR_UNCONSTRAINED_BASE (a)) + num_dimensions_a -= 1; + if (DR_UNCONSTRAINED_BASE (b)) + num_dimensions_b -= 1; + + /* These structures describe sequences of component references in + DR_REF (A) and DR_REF (B). Each component reference is tied to a + specific access function. */ + struct { + /* The sequence starts at DR_ACCESS_FN (A, START_A) of A and + DR_ACCESS_FN (B, START_B) of B (inclusive) and extends to higher + indices. In C notation, these are the indices of the rightmost + component references; e.g. for a sequence .b.c.d, the start + index is for .d. */ + unsigned int start_a; + unsigned int start_b; + + /* The sequence contains LENGTH consecutive access functions from + each DR. */ + unsigned int length; + + /* The enclosing objects for the A and B sequences respectively, + i.e. the objects to which DR_ACCESS_FN (A, START_A + LENGTH - 1) + and DR_ACCESS_FN (B, START_B + LENGTH - 1) are applied. */ + tree object_a; + tree object_b; + } full_seq = {}, struct_seq = {}; + + /* Before each iteration of the loop: + + - REF_A is what you get after applying DR_ACCESS_FN (A, INDEX_A) and + - REF_B is what you get after applying DR_ACCESS_FN (B, INDEX_B). */ + unsigned int index_a = 0; + unsigned int index_b = 0; + tree ref_a = DR_REF (a); + tree ref_b = DR_REF (b); + + /* Now walk the component references from the final DR_REFs back up to + the enclosing base objects. Each component reference corresponds + to one access function in the DR, with access function 0 being for + the final DR_REF and the highest-indexed access function being the + one that is applied to the base of the DR. + + Look for a sequence of component references whose access functions + are comparable (see access_fn_components_comparable_p). If more + than one such sequence exists, pick the one nearest the base + (which is the leftmost sequence in C notation). Store this sequence + in FULL_SEQ. + + For example, if we have: + + struct foo { struct bar s; ... } (*a)[10], (*b)[10]; + + A: a[0][i].s.c.d + B: __real b[0][i].s.e[i].f + + (where d is the same type as the real component of f) then the access + functions would be: + + 0 1 2 3 + A: .d .c .s [i] + + 0 1 2 3 4 5 + B: __real .f [i] .e .s [i] + + The A0/B2 column isn't comparable, since .d is a COMPONENT_REF + and [i] is an ARRAY_REF. However, the A1/B3 column contains two + COMPONENT_REF accesses for struct bar, so is comparable. Likewise + the A2/B4 column contains two COMPONENT_REF accesses for struct foo, + so is comparable. The A3/B5 column contains two ARRAY_REFs that + index foo[10] arrays, so is again comparable. The sequence is + therefore: + + A: [1, 3] (i.e. [i].s.c) + B: [3, 5] (i.e. [i].s.e) + + Also look for sequences of component references whose access + functions are comparable and whose enclosing objects have the same + RECORD_TYPE. Store this sequence in STRUCT_SEQ. In the above + example, STRUCT_SEQ would be: + + A: [1, 2] (i.e. s.c) + B: [3, 4] (i.e. s.e) */ + while (index_a < num_dimensions_a && index_b < num_dimensions_b) + { + /* REF_A and REF_B must be one of the component access types + allowed by dr_analyze_indices. */ + gcc_checking_assert (access_fn_component_p (ref_a)); + gcc_checking_assert (access_fn_component_p (ref_b)); + + /* Get the immediately-enclosing objects for REF_A and REF_B, + i.e. the references *before* applying DR_ACCESS_FN (A, INDEX_A) + and DR_ACCESS_FN (B, INDEX_B). */ + tree object_a = TREE_OPERAND (ref_a, 0); + tree object_b = TREE_OPERAND (ref_b, 0); + + tree type_a = TREE_TYPE (object_a); + tree type_b = TREE_TYPE (object_b); + if (access_fn_components_comparable_p (ref_a, ref_b)) + { + /* This pair of component accesses is comparable for dependence + analysis, so we can include DR_ACCESS_FN (A, INDEX_A) and + DR_ACCESS_FN (B, INDEX_B) in the sequence. */ + if (full_seq.start_a + full_seq.length != index_a + || full_seq.start_b + full_seq.length != index_b) + { + /* The accesses don't extend the current sequence, + so start a new one here. */ + full_seq.start_a = index_a; + full_seq.start_b = index_b; + full_seq.length = 0; + } + + /* Add this pair of references to the sequence. */ + full_seq.length += 1; + full_seq.object_a = object_a; + full_seq.object_b = object_b; + + /* If the enclosing objects are structures (and thus have the + same RECORD_TYPE), record the new sequence in STRUCT_SEQ. */ + if (TREE_CODE (type_a) == RECORD_TYPE) + struct_seq = full_seq; + + /* Move to the next containing reference for both A and B. */ + ref_a = object_a; + ref_b = object_b; + index_a += 1; + index_b += 1; + continue; + } + + /* Try to approach equal type sizes. */ + if (!COMPLETE_TYPE_P (type_a) + || !COMPLETE_TYPE_P (type_b) + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_a)) + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_b))) + break; + + unsigned HOST_WIDE_INT size_a = tree_to_uhwi (TYPE_SIZE_UNIT (type_a)); + unsigned HOST_WIDE_INT size_b = tree_to_uhwi (TYPE_SIZE_UNIT (type_b)); + if (size_a <= size_b) { - DDR_ARE_DEPENDENT (res) = chrec_dont_know; - return res; + index_a += 1; + ref_a = object_a; + } + if (size_b <= size_a) + { + index_b += 1; + ref_b = object_b; } - DDR_AFFINE_P (res) = true; - DDR_ARE_DEPENDENT (res) = NULL_TREE; - DDR_SUBSCRIPTS (res).create (DR_NUM_DIMENSIONS (a)); - DDR_LOOP_NEST (res) = loop_nest; - DDR_INNER_LOOP (res) = 0; - DDR_SELF_REFERENCE (res) = true; - for (i = 0; i < DR_NUM_DIMENSIONS (a); i++) - { - struct subscript *subscript; - - subscript = XNEW (struct subscript); - SUB_CONFLICTS_IN_A (subscript) = conflict_fn_not_known (); - SUB_CONFLICTS_IN_B (subscript) = conflict_fn_not_known (); - SUB_LAST_CONFLICT (subscript) = chrec_dont_know; - SUB_DISTANCE (subscript) = chrec_dont_know; - DDR_SUBSCRIPTS (res).safe_push (subscript); - } - return res; } - /* If the references do not access the same object, we do not know - whether they alias or not. We do not care about TBAA or alignment - info so we can use OEP_ADDRESS_OF to avoid false negatives. - But the accesses have to use compatible types as otherwise the - built indices would not match. */ - if (!operand_equal_p (DR_BASE_OBJECT (a), DR_BASE_OBJECT (b), OEP_ADDRESS_OF) - || !types_compatible_p (TREE_TYPE (DR_BASE_OBJECT (a)), - TREE_TYPE (DR_BASE_OBJECT (b)))) + /* See whether FULL_SEQ ends at the base and whether the two bases + are equal. We do not care about TBAA or alignment info so we can + use OEP_ADDRESS_OF to avoid false negatives. */ + tree base_a = DR_BASE_OBJECT (a); + tree base_b = DR_BASE_OBJECT (b); + bool same_base_p = (full_seq.start_a + full_seq.length == num_dimensions_a + && full_seq.start_b + full_seq.length == num_dimensions_b + && DR_UNCONSTRAINED_BASE (a) == DR_UNCONSTRAINED_BASE (b) + && operand_equal_p (base_a, base_b, OEP_ADDRESS_OF) + && types_compatible_p (TREE_TYPE (base_a), + TREE_TYPE (base_b)) + && (!loop_nest.exists () + || (object_address_invariant_in_loop_p + (loop_nest[0], base_a)))); + + /* If the bases are the same, we can include the base variation too. + E.g. the b accesses in: + + for (int i = 0; i < n; ++i) + b[i + 4][0] = b[i][0]; + + have a definite dependence distance of 4, while for: + + for (int i = 0; i < n; ++i) + a[i + 4][0] = b[i][0]; + + the dependence distance depends on the gap between a and b. + + If the bases are different then we can only rely on the sequence + rooted at a structure access, since arrays are allowed to overlap + arbitrarily and change shape arbitrarily. E.g. we treat this as + valid code: + + int a[256]; + ... + ((int (*)[4][3]) &a[1])[i][0] += ((int (*)[4][3]) &a[2])[i][0]; + + where two lvalues with the same int[4][3] type overlap, and where + both lvalues are distinct from the object's declared type. */ + if (same_base_p) { - DDR_ARE_DEPENDENT (res) = chrec_dont_know; - return res; + if (DR_UNCONSTRAINED_BASE (a)) + full_seq.length += 1; } + else + full_seq = struct_seq; - /* If the base of the object is not invariant in the loop nest, we cannot - analyze it. TODO -- in fact, it would suffice to record that there may - be arbitrary dependences in the loops where the base object varies. */ - if ((loop_nest.exists () - && !object_address_invariant_in_loop_p (loop_nest[0], DR_BASE_OBJECT (a))) - || DR_NUM_DIMENSIONS (a) == 0) + /* Punt if we didn't find a suitable sequence. */ + if (full_seq.length == 0) { DDR_ARE_DEPENDENT (res) = chrec_dont_know; return res; } - /* If the number of dimensions of the access to not agree we can have - a pointer access to a component of the array element type and an - array access while the base-objects are still the same. Punt. */ - if (DR_NUM_DIMENSIONS (a) != DR_NUM_DIMENSIONS (b)) + if (!same_base_p) { - DDR_ARE_DEPENDENT (res) = chrec_dont_know; - return res; + /* Partial overlap is possible for different bases when strict aliasing + is not in effect. It's also possible if either base involves a union + access; e.g. for: + + struct s1 { int a[2]; }; + struct s2 { struct s1 b; int c; }; + struct s3 { int d; struct s1 e; }; + union u { struct s2 f; struct s3 g; } *p, *q; + + the s1 at "p->f.b" (base "p->f") partially overlaps the s1 at + "p->g.e" (base "p->g") and might partially overlap the s1 at + "q->g.e" (base "q->g"). */ + if (!flag_strict_aliasing + || ref_contains_union_access_p (full_seq.object_a) + || ref_contains_union_access_p (full_seq.object_b)) + { + DDR_ARE_DEPENDENT (res) = chrec_dont_know; + return res; + } + + DDR_COULD_BE_INDEPENDENT_P (res) = true; } DDR_AFFINE_P (res) = true; DDR_ARE_DEPENDENT (res) = NULL_TREE; - DDR_SUBSCRIPTS (res).create (DR_NUM_DIMENSIONS (a)); + DDR_SUBSCRIPTS (res).create (full_seq.length); DDR_LOOP_NEST (res) = loop_nest; DDR_INNER_LOOP (res) = 0; DDR_SELF_REFERENCE (res) = false; - for (i = 0; i < DR_NUM_DIMENSIONS (a); i++) + for (i = 0; i < full_seq.length; ++i) { struct subscript *subscript; subscript = XNEW (struct subscript); + SUB_ACCESS_FN (subscript, 0) = DR_ACCESS_FN (a, full_seq.start_a + i); + SUB_ACCESS_FN (subscript, 1) = DR_ACCESS_FN (b, full_seq.start_b + i); SUB_CONFLICTS_IN_A (subscript) = conflict_fn_not_known (); SUB_CONFLICTS_IN_B (subscript) = conflict_fn_not_known (); SUB_LAST_CONFLICT (subscript) = chrec_dont_know; @@ -3163,14 +3416,15 @@ add_outer_distances (struct data_depende } /* Return false when fail to represent the data dependence as a - distance vector. INIT_B is set to true when a component has been + distance vector. A_INDEX is the index of the first reference + (0 for DDR_A, 1 for DDR_B) and B_INDEX is the index of the + second reference. INIT_B is set to true when a component has been added to the distance vector DIST_V. INDEX_CARRY is then set to the index in DIST_V that carries the dependence. */ static bool build_classic_dist_vector_1 (struct data_dependence_relation *ddr, - struct data_reference *ddr_a, - struct data_reference *ddr_b, + unsigned int a_index, unsigned int b_index, lambda_vector dist_v, bool *init_b, int *index_carry) { @@ -3188,8 +3442,8 @@ build_classic_dist_vector_1 (struct data return false; } - access_fn_a = DR_ACCESS_FN (ddr_a, i); - access_fn_b = DR_ACCESS_FN (ddr_b, i); + access_fn_a = SUB_ACCESS_FN (subscript, a_index); + access_fn_b = SUB_ACCESS_FN (subscript, b_index); if (TREE_CODE (access_fn_a) == POLYNOMIAL_CHREC && TREE_CODE (access_fn_b) == POLYNOMIAL_CHREC) @@ -3249,10 +3503,11 @@ build_classic_dist_vector_1 (struct data constant_access_functions (const struct data_dependence_relation *ddr) { unsigned i; + subscript *sub; - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) - if (!evolution_function_is_constant_p (DR_ACCESS_FN (DDR_A (ddr), i)) - || !evolution_function_is_constant_p (DR_ACCESS_FN (DDR_B (ddr), i))) + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) + if (!evolution_function_is_constant_p (SUB_ACCESS_FN (sub, 0)) + || !evolution_function_is_constant_p (SUB_ACCESS_FN (sub, 1))) return false; return true; @@ -3315,10 +3570,11 @@ add_other_self_distances (struct data_de lambda_vector dist_v; unsigned i; int index_carry = DDR_NB_LOOPS (ddr); + subscript *sub; - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) { - tree access_fun = DR_ACCESS_FN (DDR_A (ddr), i); + tree access_fun = SUB_ACCESS_FN (sub, 0); if (TREE_CODE (access_fun) == POLYNOMIAL_CHREC) { @@ -3330,7 +3586,7 @@ add_other_self_distances (struct data_de return; } - access_fun = DR_ACCESS_FN (DDR_A (ddr), 0); + access_fun = SUB_ACCESS_FN (DDR_SUBSCRIPT (ddr, 0), 0); if (TREE_CODE (CHREC_LEFT (access_fun)) == POLYNOMIAL_CHREC) add_multivariate_self_dist (ddr, access_fun); @@ -3401,6 +3657,23 @@ add_distance_for_zero_overlaps (struct d } } +/* Return true when the DDR contains two data references that have the + same access functions. */ + +static inline bool +same_access_functions (const struct data_dependence_relation *ddr) +{ + unsigned i; + subscript *sub; + + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) + if (!eq_evolutions_p (SUB_ACCESS_FN (sub, 0), + SUB_ACCESS_FN (sub, 1))) + return false; + + return true; +} + /* Compute the classic per loop distance vector. DDR is the data dependence relation to build a vector from. Return false when fail to represent the data dependence as a distance vector. */ @@ -3432,8 +3705,7 @@ build_classic_dist_vector (struct data_d } dist_v = lambda_vector_new (DDR_NB_LOOPS (ddr)); - if (!build_classic_dist_vector_1 (ddr, DDR_A (ddr), DDR_B (ddr), - dist_v, &init_b, &index_carry)) + if (!build_classic_dist_vector_1 (ddr, 0, 1, dist_v, &init_b, &index_carry)) return false; /* Save the distance vector if we initialized one. */ @@ -3466,12 +3738,11 @@ build_classic_dist_vector (struct data_d if (!lambda_vector_lexico_pos (dist_v, DDR_NB_LOOPS (ddr))) { lambda_vector save_v = lambda_vector_new (DDR_NB_LOOPS (ddr)); - if (!subscript_dependence_tester_1 (ddr, DDR_B (ddr), DDR_A (ddr), - loop_nest)) + if (!subscript_dependence_tester_1 (ddr, 1, 0, loop_nest)) return false; compute_subscript_distance (ddr); - if (!build_classic_dist_vector_1 (ddr, DDR_B (ddr), DDR_A (ddr), - save_v, &init_b, &index_carry)) + if (!build_classic_dist_vector_1 (ddr, 1, 0, save_v, &init_b, + &index_carry)) return false; save_dist_v (ddr, save_v); DDR_REVERSED_P (ddr) = true; @@ -3507,12 +3778,10 @@ build_classic_dist_vector (struct data_d { lambda_vector opposite_v = lambda_vector_new (DDR_NB_LOOPS (ddr)); - if (!subscript_dependence_tester_1 (ddr, DDR_B (ddr), - DDR_A (ddr), loop_nest)) + if (!subscript_dependence_tester_1 (ddr, 1, 0, loop_nest)) return false; compute_subscript_distance (ddr); - if (!build_classic_dist_vector_1 (ddr, DDR_B (ddr), DDR_A (ddr), - opposite_v, &init_b, + if (!build_classic_dist_vector_1 (ddr, 1, 0, opposite_v, &init_b, &index_carry)) return false; @@ -3591,13 +3860,13 @@ build_classic_dir_vector (struct data_de } } -/* Helper function. Returns true when there is a dependence between - data references DRA and DRB. */ +/* Helper function. Returns true when there is a dependence between the + data references. A_INDEX is the index of the first reference (0 for + DDR_A, 1 for DDR_B) and B_INDEX is the index of the second reference. */ static bool subscript_dependence_tester_1 (struct data_dependence_relation *ddr, - struct data_reference *dra, - struct data_reference *drb, + unsigned int a_index, unsigned int b_index, struct loop *loop_nest) { unsigned int i; @@ -3609,8 +3878,8 @@ subscript_dependence_tester_1 (struct da { conflict_function *overlaps_a, *overlaps_b; - analyze_overlapping_iterations (DR_ACCESS_FN (dra, i), - DR_ACCESS_FN (drb, i), + analyze_overlapping_iterations (SUB_ACCESS_FN (subscript, a_index), + SUB_ACCESS_FN (subscript, b_index), &overlaps_a, &overlaps_b, &last_conflicts, loop_nest); @@ -3659,7 +3928,7 @@ subscript_dependence_tester_1 (struct da subscript_dependence_tester (struct data_dependence_relation *ddr, struct loop *loop_nest) { - if (subscript_dependence_tester_1 (ddr, DDR_A (ddr), DDR_B (ddr), loop_nest)) + if (subscript_dependence_tester_1 (ddr, 0, 1, loop_nest)) dependence_stats.num_dependence_dependent++; compute_subscript_distance (ddr); Index: gcc/tree-ssa-loop-prefetch.c =================================================================== --- gcc/tree-ssa-loop-prefetch.c 2017-05-18 07:51:26.127377591 +0100 +++ gcc/tree-ssa-loop-prefetch.c 2017-05-18 07:51:50.871904726 +0100 @@ -1650,6 +1650,7 @@ determine_loop_nest_reuse (struct loop * refb = (struct mem_ref *) DDR_B (dep)->aux; if (DDR_ARE_DEPENDENT (dep) == chrec_dont_know + || DDR_COULD_BE_INDEPENDENT_P (dep) || DDR_NUM_DIST_VECTS (dep) == 0) { /* If the dependence cannot be analyzed, assume that there might be Index: gcc/tree-vectorizer.h =================================================================== --- gcc/tree-vectorizer.h 2017-05-18 07:51:26.128377491 +0100 +++ gcc/tree-vectorizer.h 2017-05-18 07:51:50.872904626 +0100 @@ -383,7 +383,7 @@ #define LOOP_VINFO_ORIG_LOOP_INFO(L) #define LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT(L) \ ((L)->may_misalign_stmts.length () > 0) #define LOOP_REQUIRES_VERSIONING_FOR_ALIAS(L) \ - ((L)->may_alias_ddrs.length () > 0) + ((L)->comp_alias_ddrs.length () > 0) #define LOOP_REQUIRES_VERSIONING_FOR_NITERS(L) \ (LOOP_VINFO_NITERS_ASSUMPTIONS (L)) #define LOOP_REQUIRES_VERSIONING(L) \ Index: gcc/tree-vect-data-refs.c =================================================================== --- gcc/tree-vect-data-refs.c 2017-05-18 07:51:23.307659382 +0100 +++ gcc/tree-vect-data-refs.c 2017-05-18 07:51:50.872904626 +0100 @@ -340,6 +340,26 @@ vect_analyze_data_ref_dependence (struct } loop_depth = index_in_loop_nest (loop->num, DDR_LOOP_NEST (ddr)); + + if (DDR_COULD_BE_INDEPENDENT_P (ddr)) + /* For dependence distances of 2 or more, we have the option of + limiting VF or checking for an alias at runtime. Prefer to check + at runtime if we can, to avoid limiting the VF unnecessarily when + the bases are in fact independent. + + Note that the alias checks will be removed if the VF ends up + being small enough. */ + FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v) + { + int dist = dist_v[loop_depth]; + if (dist != 0 && !(dist > 0 && DDR_REVERSED_P (ddr))) + { + if (vect_mark_for_runtime_alias_test (ddr, loop_vinfo)) + return false; + break; + } + } + FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v) { int dist = dist_v[loop_depth]; @@ -3017,6 +3037,44 @@ vect_no_alias_p (struct data_reference * return false; } +/* Return true if the minimum nonzero dependence distance for loop LOOP_DEPTH + in DDR is >= VF. */ + +static bool +dependence_distance_ge_vf (data_dependence_relation *ddr, + unsigned int loop_depth, unsigned HOST_WIDE_INT vf) +{ + if (DDR_ARE_DEPENDENT (ddr) != NULL_TREE + || DDR_NUM_DIST_VECTS (ddr) == 0) + return false; + + /* If the dependence is exact, we should have limited the VF instead. */ + gcc_checking_assert (DDR_COULD_BE_INDEPENDENT_P (ddr)); + + unsigned int i; + lambda_vector dist_v; + FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v) + { + HOST_WIDE_INT dist = dist_v[loop_depth]; + if (dist != 0 + && !(dist > 0 && DDR_REVERSED_P (ddr)) + && (unsigned HOST_WIDE_INT) abs_hwi (dist) < vf) + return false; + } + + if (dump_enabled_p ()) + { + dump_printf_loc (MSG_NOTE, vect_location, + "dependence distance between "); + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (DDR_A (ddr))); + dump_printf (MSG_NOTE, " and "); + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (DDR_B (ddr))); + dump_printf (MSG_NOTE, " is >= VF\n"); + } + + return true; +} + /* Function vect_prune_runtime_alias_test_list. Prune a list of ddrs to be tested at run-time by versioning for alias. @@ -3075,6 +3133,10 @@ vect_prune_runtime_alias_test_list (loop comp_alias_ddrs.create (may_alias_ddrs.length ()); + unsigned int loop_depth + = index_in_loop_nest (LOOP_VINFO_LOOP (loop_vinfo)->num, + LOOP_VINFO_LOOP_NEST (loop_vinfo)); + /* First, we collect all data ref pairs for aliasing checks. */ FOR_EACH_VEC_ELT (may_alias_ddrs, i, ddr) { @@ -3084,6 +3146,11 @@ vect_prune_runtime_alias_test_list (loop tree segment_length_a, segment_length_b; gimple *stmt_a, *stmt_b; + /* Ignore the alias if the VF we chose ended up being no greater + than the dependence distance. */ + if (dependence_distance_ge_vf (ddr, loop_depth, vect_factor)) + continue; + dr_a = DDR_A (ddr); stmt_a = DR_STMT (DDR_A (ddr)); dr_group_first_a = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_a)); @@ -3294,10 +3361,6 @@ vect_prune_runtime_alias_test_list (loop return false; } - /* All alias checks have been resolved at compilation time. */ - if (!comp_alias_ddrs.length ()) - LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo).truncate (0); - return true; } Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-3.c =================================================================== --- /dev/null 2017-05-17 17:16:48.996861112 +0100 +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-3.c 2017-05-18 07:51:50.870904826 +0100 @@ -0,0 +1,112 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_int } */ +/* { dg-additional-options "--param vect-max-version-for-alias-checks=0" } */ + +/* Intended to be larger than any VF. */ +#define GAP 128 +#define N (GAP * 3) + +struct s { int x[N + 1]; }; +struct t { struct s x[N + 1]; }; +struct u { int x[N + 1]; int y; }; +struct v { struct s s; }; + +void +f1 (struct s *a, struct s *b) +{ + for (int i = 0; i < N; ++i) + a->x[i] += b->x[i]; +} + +void +f2 (struct s *a, struct s *b) +{ + for (int i = 0; i < N; ++i) + a[1].x[i] += b[2].x[i]; +} + +void +f3 (struct s *a, struct s *b) +{ + for (int i = 0; i < N; ++i) + a[1].x[i] += b[i].x[i]; +} + +void +f4 (struct s *a, struct s *b) +{ + for (int i = 0; i < N; ++i) + a[i].x[i] += b[i].x[i]; +} + +void +f5 (struct s *a, struct s *b) +{ + for (int i = 0; i < N; ++i) + a->x[i] += b->x[i + 1]; +} + +void +f6 (struct s *a, struct s *b) +{ + for (int i = 0; i < N; ++i) + a[1].x[i] += b[2].x[i + 1]; +} + +void +f7 (struct s *a, struct s *b) +{ + for (int i = 0; i < N; ++i) + a[1].x[i] += b[i].x[i + 1]; +} + +void +f8 (struct s *a, struct s *b) +{ + for (int i = 0; i < N; ++i) + a[i].x[i] += b[i].x[i + 1]; +} + +void +f9 (struct s *a, struct t *b) +{ + for (int i = 0; i < N; ++i) + a->x[i] += b->x[1].x[i]; +} + +void +f10 (struct s *a, struct t *b) +{ + for (int i = 0; i < N; ++i) + a->x[i] += b->x[i].x[i]; +} + +void +f11 (struct u *a, struct u *b) +{ + for (int i = 0; i < N; ++i) + a->x[i] += b->x[i] + b[i].y; +} + +void +f12 (struct s *a, struct s *b) +{ + for (int i = 0; i < GAP; ++i) + a->x[i + GAP] += b->x[i]; +} + +void +f13 (struct s *a, struct s *b) +{ + for (int i = 0; i < GAP * 2; ++i) + a->x[i + GAP] += b->x[i]; +} + +void +f14 (struct v *a, struct s *b) +{ + for (int i = 0; i < N; ++i) + a->s.x[i] = b->x[i]; +} + +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 14 "vect" } } */ Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-4.c =================================================================== --- /dev/null 2017-05-17 17:16:48.996861112 +0100 +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-4.c 2017-05-18 07:51:50.870904826 +0100 @@ -0,0 +1,35 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_int } */ +/* { dg-additional-options "--param vect-max-version-for-alias-checks=0" } */ + +#define N 16 + +struct s1 { int a[N]; }; +struct s2 { struct s1 b; int c; }; +struct s3 { int d; struct s1 e; }; +union u { struct s2 f; struct s3 g; }; + +/* We allow a and b to overlap arbitrarily. */ + +void +f1 (int a[][N], int b[][N]) +{ + for (int i = 0; i < N; ++i) + a[0][i] += b[0][i]; +} + +void +f2 (union u *a, union u *b) +{ + for (int i = 0; i < N; ++i) + a->f.b.a[i] += b->g.e.a[i]; +} + +void +f3 (struct s1 *a, struct s1 *b) +{ + for (int i = 0; i < N - 1; ++i) + a->a[i + 1] += b->a[i]; +} + +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */ Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-5.c =================================================================== --- /dev/null 2017-05-17 17:16:48.996861112 +0100 +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-5.c 2017-05-18 07:51:50.870904826 +0100 @@ -0,0 +1,19 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_int } */ + +/* Intended to be larger than any VF. */ +#define GAP 128 +#define N (GAP * 3) + +struct s { int x[N]; }; + +void +f1 (struct s *a, struct s *b) +{ + for (int i = 0; i < GAP * 2; ++i) + a->x[i + GAP] += b->x[i]; +} + +/* { dg-final { scan-tree-dump-times "mark for run-time aliasing" 1 "vect" } } */ +/* { dg-final { scan-tree-dump-times "improved number of alias checks from 1 to 0" 1 "vect" } } */ +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 1 "vect" } } */

Ping Richard Sandiford <richard.sandiford@linaro.org> writes: > Richard Biener <richard.guenther@gmail.com> writes: >> On Thu, May 4, 2017 at 7:21 PM, Richard Sandiford >> <richard.sandiford@linaro.org> wrote: >>> Richard Biener <richard.guenther@gmail.com> writes: >>>> On Thu, May 4, 2017 at 2:12 PM, Richard Biener >>>> <richard.guenther@gmail.com> wrote: >>>>> On Wed, May 3, 2017 at 10:00 AM, Richard Sandiford >>>>> <richard.sandiford@linaro.org> wrote: >>>>>> This patch tries to calculate conservatively-correct distance >>>>>> vectors for two references whose base addresses are not the same. >>>>>> It sets a new flag DDR_COULD_BE_INDEPENDENT_P if the dependence >>>>>> isn't guaranteed to occur. >>>>>> >>>>>> The motivating example is: >>>>>> >>>>>> struct s { int x[8]; }; >>>>>> void >>>>>> f (struct s *a, struct s *b) >>>>>> { >>>>>> for (int i = 0; i < 8; ++i) >>>>>> a->x[i] += b->x[i]; >>>>>> } >>>>>> >>>>>> in which the "a" and "b" accesses are either independent or have a >>>>>> dependence distance of 0 (assuming -fstrict-aliasing). Neither case >>>>>> prevents vectorisation, so we can vectorise without an alias check. >>>>>> >>>>>> I'd originally wanted to do the same thing for arrays as well, e.g.: >>>>>> >>>>>> void >>>>>> f (int a[][8], struct b[][8]) >>>>>> { >>>>>> for (int i = 0; i < 8; ++i) >>>>>> a[0][i] += b[0][i]; >>>>>> } >>>>>> >>>>>> I think this is valid because C11 6.7.6.2/6 says: >>>>>> >>>>>> For two array types to be compatible, both shall have compatible >>>>>> element types, and if both size specifiers are present, and are >>>>>> integer constant expressions, then both size specifiers shall have >>>>>> the same constant value. >>>>>> >>>>>> So if we access an array through an int (*)[8], it must have type X[8] >>>>>> or X[], where X is compatible with int. It doesn't seem possible in >>>>>> either case for "a[0]" and "b[0]" to overlap when "a != b". >>>>>> >>>>>> However, Richard B said that (at least in gimple) we support arbitrary >>>>>> overlap of arrays and allow arrays to be accessed with different >>>>>> dimensionality. There are examples of this in PR50067. I've therefore >>>>>> only handled references that end in a structure field access. >>>>>> >>>>>> There are two ways of handling these dependences in the vectoriser: >>>>>> use them to limit VF, or check at runtime as before. I've gone for >>>>>> the approach of checking at runtime if we can, to avoid limiting VF >>>>>> unnecessarily. We still fall back to a VF cap when runtime checks >>>>>> aren't allowed. >>>>>> >>>>>> The patch tests whether we queued an alias check with a dependence >>>>>> distance of X and then picked a VF <= X, in which case it's safe to >>>>>> drop the alias check. Since vect_prune_runtime_alias_check_list can >>>>>> be called twice with different VF for the same loop, it's no longer >>>>>> safe to clear may_alias_ddrs on exit. Instead we should use >>>>>> comp_alias_ddrs to check whether versioning is necessary. >>>>>> >>>>>> Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install? >>>>> >>>>> You seem to do your "fancy" thing but also later compute the old >>>>> base equality anyway (for same_base_p). It looks to me for this >>>>> case the new fancy code can be simply skipped, keeping num_dimensions >>>>> as before? >>>>> >>>>> + /* Try to approach equal type sizes. */ >>>>> + if (!COMPLETE_TYPE_P (type_a) >>>>> + || !COMPLETE_TYPE_P (type_b) >>>>> + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_a)) >>>>> + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_b))) >>>>> + break; >>>>> >>>>> ah, interesting idea to avoid a quadratic search. Note that you should >>>>> conservatively handle both BIT_FIELD_REF and VIEW_CONVERT_EXPR >>>>> as they are used for type-punning. >>> >>> All the component refs here should be REALPART_EXPRs, IMAGPART_EXPRs, >>> ARRAY_REFs or COMPONENT_REFs of structures, since that's all that >>> dr_analyze_indices allows, so I think we safe in terms of the tree codes. >> >> Yeah. I think we need to document that we should have a 1:1 match here. > > OK, I added that to the comments and also added an access_fn_component_p > that we can assert on. > >>>>> I see nonoverlapping_component_refs_of_decl_p should simply skip >>>>> ARRAY_REFs - but I also see there: >>>>> >>>>> /* ??? We cannot simply use the type of operand #0 of the refs here >>>>> as the Fortran compiler smuggles type punning into COMPONENT_REFs >>>>> for common blocks instead of using unions like everyone else. */ >>>>> tree type1 = DECL_CONTEXT (field1); >>>>> tree type2 = DECL_CONTEXT (field2); >>>>> >>>>> so you probably can't simply use TREE_TYPE (outer_ref) for type compatibility. >>>>> You also may not use types_compatible_p here as for LTO that is _way_ too >>>>> lax for aggregates. The above uses >>>>> >>>>> /* We cannot disambiguate fields in a union or qualified union. */ >>>>> if (type1 != type2 || TREE_CODE (type1) != RECORD_TYPE) >>>>> return false; >>>>> >>>>> so you should also bail out on unions here, rather than the check you do later. >>> >>> The loop stops before we get to a union, so I think "only" the RECORD_TYPE >>> COMPONENT_REF handling is a potential problem. Does this mean that >>> I should use the nonoverlapping_component_refs_of_decl_p code: >>> >>> tree field1 = TREE_OPERAND (ref1, 1); >>> tree field2 = TREE_OPERAND (ref2, 1); >>> >>> /* ??? We cannot simply use the type of operand #0 of the refs here >>> as the Fortran compiler smuggles type punning into COMPONENT_REFs >>> for common blocks instead of using unions like everyone else. */ >>> tree type1 = DECL_CONTEXT (field1); >>> tree type2 = DECL_CONTEXT (field2); >>> >>> /* We cannot disambiguate fields in a union or qualified union. */ >>> if (type1 != type2 || TREE_CODE (type1) != RECORD_TYPE) >>> return false; >>> >>> if (field1 != field2) >>> { >>> /* A field and its representative need to be considered the >>> same. */ >>> if (DECL_BIT_FIELD_REPRESENTATIVE (field1) == field2 >>> || DECL_BIT_FIELD_REPRESENTATIVE (field2) == field1) >>> return false; >>> /* Different fields of the same record type cannot overlap. >>> ??? Bitfields can overlap at RTL level so punt on them. */ >>> if (DECL_BIT_FIELD (field1) && DECL_BIT_FIELD (field2)) >>> return false; >>> return true; >>> } >>> >>> as the disambiguation test for COMPONENT_REFs, instead of types_compatible_p >>> during the new loop? >> >> Yes. OTOH you want to "match" while the above disambiguates. So it means >> you should use either FIELD_DECL equality or DECL_CONTEXT of the FIELD_DECL >> equality (which should be the same in the end). The RTL concern >> should not matter >> here. > > The attached patch adds an access_fn_components_comparable_p helper > function that checks whether the DECL_CONTEXTs are the same. > >>> And test for this as well as unions in the outer >>> references? >> >> So looking at dr_analyze_indices a union would be always the DR_BASE_OBJECT, >> and you (should) stop the ref walk at DR_BASE_OBJECT. > > I was just thinking that if the Fortran front-end has cases in which > TREE_TYPE (TREE_OPERAND (ref, 0)) != DECL_CONTEXT (TREE_OPERAND (ref, 1)) > for a COMPONENT_REF, should we treat that as equivalent to a union > access in ref_contains_union_access_p? But I'm not sure that's > necessary after all. > >> The dr_analyze_indices code is also somewhat fishy in that it simply >> ignores everything below unhandled component-refs even if there are >> indices involved (and it gets away with this because dependence >> analysis likely/hopefully gives up on the DR_BASE_OBJECT equality test >> in case it is sth like a[i].union for example ... hopefully ...). > > I think this is what you meant, but: I don't think the base object > itself can be a union, because we need at least one component reference > for the DR, and don't accept COMPONENT_REFs for unions as access functions. > So if the base involves a union, the base would also need to have a > COMPONENT_REF that selects a particular member of that union. > > And yeah, before the patch we did allow a dependence distance to be > calculated for a[i].union.f[j] vs. a[i].union.f[j + 1] (and still do > after the patch), on the basis that a[i].union.f refers to the same > object in both cases. > >>>>> You seem to rely on getting an access_fn entry for each handled_component_p. >>>>> It looks like this is the case -- we even seem to stop at unions >>>>> (with the same >>>>> fortran "issue"). I'm not sure that's the best thing to do but you >>>>> rely on that. >>> >>> Yeah, the loop is deliberately limited to the components associated with >>> an access_fn. I did wonder at first whether dr_analyze_indices should >>> store the original component reference trees for each access function. >>> That would make things simpler and more explicit, but would also eat up >>> more memory. Things like object_address_invariant_in_loop_p rely on the >>> access_fns in the same way that the loop in the patch does. >> >> in fact it fails to handle ARRAY_RANGE_REFs ... > > Yeah, the whole file seems to ignore those. What kind of code would > benefit? > >>>>> I don't understand the looping, it needs more comments. You seem to be >>>>> looking for the innermost compatible RECORD_TYPE but then num_dimensions >>>>> is how many compatible refs you found on the way (with incompatible ones >>>>> not counting?!). What about an inner varying array of structs? >>>>> This seems to >>>>> be disregarded in the analysis now? Thus, a[i].s.b[i].j vs. __real >>>>> b[i].s.b[i].j? >>> >>> I'll try to improve the comments. But the idea is that both sequences are >>> as long as possible, while that still gives compatible types. If there is >>> more than one such sequence, we pick the one nearest the base. >>> >>> So in your example, the access functions would be: >>> >>> 0 1 2 3 4 >>> a: .j [i] .b .s [i] >>> >>> 0 1 2 3 4 5 >>> b: __real .j [i] .b .s [i] >>> >>> If a and b are pointers, the final access functions would be >>> unconstrained base accesses, so we'd end up with: >>> >>> a: [0, 3] >>> b: [1, 4] >>> >>> for both sequences. >>> >>>>> nonoverlapping_component_refs_of_decl_p/nonoverlapping_component_refs_p >>>>> conveniently start from the other >>>>> end of the ref here. >>>> >>>> That said, for the motivational cases we either have one ref having >>>> more dimensions than the other (the __real vs. full complex access) or >>>> they have the same number of dimensions (and no access fn for the >>>> base). >>>> >>>> For the first case we should simply "drop" access_fns of the larger >>>> dimensional ref (from the start, plus outer component refs) up to the >>>> point the number of dimensions are equal. >>> >>> Yeah, that's what happens for your example. But if we had: >>> >>> a[i].s.c.d >>> __real b[i].s.b[i].j >>> >>> (where d is the same type as the real component) then the access >>> functions would be: >>> >>> 0 1 2 3 >>> a: .d .c .s [i] >>> >>> 0 1 2 3 4 5 >>> b: __real .j [i] .b .s [i] >>> >>> Comparing the a0/b2 column doesn't make sense, because one's an array >>> and the other is a structure. In this case the sequence we care about is: >>> >>> a: [1, 3] >>> b: [3, 5] >>> >>> which is what the loop gives. The a1/b3 column is the one that proves >>> there's no dependence. >>> >>>> Then we have the case of >>>> >>>> ! types_compatible_p (TREE_TYPE (base_a), TREE_TYPE (base_b)) >>>> >>>> where we have to punt. >>>> >>>> Then we have the case of >>>> >>>> ! operand_equal_p (base_a, base_b, OEP_ADDRESS_OF) >>>> >>>> which is where the new code should kick in to see if we can drop access_fns >>>> from the other end (as unanalyzable but either having distance zero or not >>>> aliased because of TBAA). >>>> >>>> At least your testcases suggest you do not want to handle >>>> >>>> struct s { int x[N]; }; >>>> struct r { struct s s; }; >>>> f (struct s *a, struct r *b) >>>> { >>>> for (i = 0; i < N; ++i) >>>> a->s.x[i] = b->x[i]; >>>> } >>>> >>>> ? >>>> >>>> With this example your loop which seems to search for a "common" >>>> sequence in (different) midst of the reference trees makes more sense >>>> (still that loop is awkward to understand). >>> >>> Yeah, I want to handle that too, just hadn't thought of it as a specific >>> testcase. The code does give the expected dependence distance of 0. >> >> Ok. >> >> I think the patch is reasonable, maybe the loop can be restructured / >> simplified a bit and handling of the union case for example be done >> first (by looking at DR_BASE_OBJECT). > > I still prefer doing the loop first and keeping the "same base" check > together as a single condition, since it means that we're analysing the > reference in a single direction (DR_REF to base) rather than jumping > around. And unequal bases should be more common that equal ones. > > I think both orders involve doing potentially redundant work. The > current order tends towards doing redundant work for union accesses > and !flag_strict_aliasing, but they should be the less common cases. > > How does this look? Changes since v1: > > - Added access_fn_component_p to check for valid access function components. > > - Added access_fn_components_comparable_p instead of using > types_compatibloe_p directly. > > - Added more commentary. > > - Added local structures to represent the sequence, so that it's > more obvious which variables are temporaries and which aren't. > > - Added the test above to vect-alias-check-3.c. > > Tested on aarch64-linux-gnu and x86_64-linux-gnu. > > Thanks, > Richard > > > 2017-05-18 Richard Sandiford <richard.sandiford@linaro.org> > > gcc/ > > * tree-data-ref.h (subscript): Add access_fn field. > (data_dependence_relation): Add could_be_independent_p. > (SUB_ACCESS_FN, DDR_COULD_BE_INDEPENDENT_P): New macros. > (same_access_functions): Move to tree-data-ref.c. > * tree-data-ref.c (ref_contains_union_access_p): New function. > (access_fn_component_p): Likewise. > (access_fn_components_comparable_p): Likewise. > (dr_analyze_indices): Add a comment that this code needs to be > kept in sync with access_fn_component_p. > (dump_data_dependence_relation): Use SUB_ACCESS_FN instead of > DR_ACCESS_FN. > (constant_access_functions): Likewise. > (add_other_self_distances): Likewise. > (same_access_functions): Likewise. (Moved from tree-data-ref.h.) > (initialize_data_dependence_relation): Use XCNEW and remove > explicit zeroing of DDR_REVERSED_P. Look for a subsequence > of access functions that have the same type. Allow the > subsequence to end with different bases in some circumstances. > Record the chosen access functions in SUB_ACCESS_FN. > (build_classic_dist_vector_1): Replace ddr_a and ddr_b with > a_index and b_index. Use SUB_ACCESS_FN instead of DR_ACCESS_FN. > (subscript_dependence_tester_1): Likewise dra and drb. > (build_classic_dist_vector): Update calls accordingly. > (subscript_dependence_tester): Likewise. > * tree-ssa-loop-prefetch.c (determine_loop_nest_reuse): Check > DDR_COULD_BE_INDEPENDENT_P. > * tree-vectorizer.h (LOOP_REQUIRES_VERSIONING_FOR_ALIAS): Test > comp_alias_ddrs instead of may_alias_ddrs. > * tree-vect-data-refs.c (vect_analyze_data_ref_dependence): Try > to mark for aliasing if DDR_COULD_BE_INDEPENDENT_P, but fall back > to using the recorded distance vectors if that fails. > (dependence_distance_ge_vf): New function. > (vect_prune_runtime_alias_test_list): Use it. Don't clear > LOOP_VINFO_MAY_ALIAS_DDRS. > > gcc/testsuite/ > * gcc.dg/vect/vect-alias-check-3.c: New test. > * gcc.dg/vect/vect-alias-check-4.c: Likewise. > * gcc.dg/vect/vect-alias-check-5.c: Likewise. > > Index: gcc/tree-data-ref.h > =================================================================== > --- gcc/tree-data-ref.h 2017-05-04 11:36:51.157328631 +0100 > +++ gcc/tree-data-ref.h 2017-05-18 07:51:50.871904726 +0100 > @@ -191,6 +191,9 @@ struct conflict_function > > struct subscript > { > + /* The access functions of the two references. */ > + tree access_fn[2]; > + > /* A description of the iterations for which the elements are > accessed twice. */ > conflict_function *conflicting_iterations_in_a; > @@ -209,6 +212,7 @@ struct subscript > > typedef struct subscript *subscript_p; > > +#define SUB_ACCESS_FN(SUB, I) (SUB)->access_fn[I] > #define SUB_CONFLICTS_IN_A(SUB) (SUB)->conflicting_iterations_in_a > #define SUB_CONFLICTS_IN_B(SUB) (SUB)->conflicting_iterations_in_b > #define SUB_LAST_CONFLICT(SUB) (SUB)->last_conflict > @@ -264,6 +268,33 @@ struct data_dependence_relation > /* Set to true when the dependence relation is on the same data > access. */ > bool self_reference_p; > + > + /* True if the dependence described is conservatively correct rather > + than exact, and if it is still possible for the accesses to be > + conditionally independent. For example, the a and b references in: > + > + struct s *a, *b; > + for (int i = 0; i < n; ++i) > + a->f[i] += b->f[i]; > + > + conservatively have a distance vector of (0), for the case in which > + a == b, but the accesses are independent if a != b. Similarly, > + the a and b references in: > + > + struct s *a, *b; > + for (int i = 0; i < n; ++i) > + a[0].f[i] += b[i].f[i]; > + > + conservatively have a distance vector of (0), but they are indepenent > + when a != b + i. In contrast, the references in: > + > + struct s *a; > + for (int i = 0; i < n; ++i) > + a->f[i] += a->f[i]; > + > + have the same distance vector of (0), but the accesses can never be > + independent. */ > + bool could_be_independent_p; > }; > > typedef struct data_dependence_relation *ddr_p; > @@ -294,6 +325,7 @@ #define DDR_DIR_VECT(DDR, I) \ > #define DDR_DIST_VECT(DDR, I) \ > DDR_DIST_VECTS (DDR)[I] > #define DDR_REVERSED_P(DDR) (DDR)->reversed_p > +#define DDR_COULD_BE_INDEPENDENT_P(DDR) (DDR)->could_be_independent_p > > > bool dr_analyze_innermost (struct data_reference *, struct loop *); > @@ -372,22 +404,6 @@ same_data_refs (data_reference_p a, data > return false; > > return true; > -} > - > -/* Return true when the DDR contains two data references that have the > - same access functions. */ > - > -static inline bool > -same_access_functions (const struct data_dependence_relation *ddr) > -{ > - unsigned i; > - > - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) > - if (!eq_evolutions_p (DR_ACCESS_FN (DDR_A (ddr), i), > - DR_ACCESS_FN (DDR_B (ddr), i))) > - return false; > - > - return true; > } > > /* Returns true when all the dependences are computable. */ > Index: gcc/tree-data-ref.c > =================================================================== > --- gcc/tree-data-ref.c 2017-05-18 07:51:26.126377691 +0100 > +++ gcc/tree-data-ref.c 2017-05-18 07:51:50.871904726 +0100 > @@ -123,8 +123,7 @@ Software Foundation; either version 3, o > } dependence_stats; > > static bool subscript_dependence_tester_1 (struct data_dependence_relation *, > - struct data_reference *, > - struct data_reference *, > + unsigned int, unsigned int, > struct loop *); > /* Returns true iff A divides B. */ > > @@ -144,6 +143,21 @@ int_divides_p (int a, int b) > return ((b % a) == 0); > } > > +/* Return true if reference REF contains a union access. */ > + > +static bool > +ref_contains_union_access_p (tree ref) > +{ > + while (handled_component_p (ref)) > + { > + ref = TREE_OPERAND (ref, 0); > + if (TREE_CODE (TREE_TYPE (ref)) == UNION_TYPE > + || TREE_CODE (TREE_TYPE (ref)) == QUAL_UNION_TYPE) > + return true; > + } > + return false; > +} > + > > > /* Dump into FILE all the data references from DATAREFS. */ > @@ -433,13 +447,14 @@ dump_data_dependence_relation (FILE *out > unsigned int i; > struct loop *loopi; > > - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) > + subscript *sub; > + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) > { > fprintf (outf, " access_fn_A: "); > - print_generic_stmt (outf, DR_ACCESS_FN (dra, i)); > + print_generic_stmt (outf, SUB_ACCESS_FN (sub, 0)); > fprintf (outf, " access_fn_B: "); > - print_generic_stmt (outf, DR_ACCESS_FN (drb, i)); > - dump_subscript (outf, DDR_SUBSCRIPT (ddr, i)); > + print_generic_stmt (outf, SUB_ACCESS_FN (sub, 1)); > + dump_subscript (outf, sub); > } > > fprintf (outf, " inner loop index: %d\n", DDR_INNER_LOOP (ddr)); > @@ -886,6 +901,27 @@ dr_analyze_innermost (struct data_refere > return true; > } > > +/* Return true if OP is a valid component reference for a DR access > + function. This accepts a subset of what handled_component_p accepts. */ > + > +static bool > +access_fn_component_p (tree op) > +{ > + switch (TREE_CODE (op)) > + { > + case REALPART_EXPR: > + case IMAGPART_EXPR: > + case ARRAY_REF: > + return true; > + > + case COMPONENT_REF: > + return TREE_CODE (TREE_TYPE (TREE_OPERAND (op, 0))) == RECORD_TYPE; > + > + default: > + return false; > + } > +} > + > /* Determines the base object and the list of indices of memory reference > DR, analyzed in LOOP and instantiated in loop nest NEST. */ > > @@ -923,7 +959,9 @@ dr_analyze_indices (struct data_referenc > access_fns.safe_push (integer_one_node); > } > > - /* Analyze access functions of dimensions we know to be independent. */ > + /* Analyze access functions of dimensions we know to be independent. > + The list of component references handled here should be kept in > + sync with access_fn_component_p. */ > while (handled_component_p (ref)) > { > if (TREE_CODE (ref) == ARRAY_REF) > @@ -1472,6 +1510,27 @@ dr_may_alias_p (const struct data_refere > return refs_may_alias_p (addr_a, addr_b); > } > > +/* REF_A and REF_B both satisfy access_fns_comparable_p. Return true > + if it is meaningful to compare their associated access functions > + when checking for dependencies. */ > + > +static bool > +access_fn_components_comparable_p (tree ref_a, tree ref_b) > +{ > + if (TREE_CODE (ref_a) != TREE_CODE (ref_b)) > + return false; > + > + if (TREE_CODE (ref_a) == COMPONENT_REF) > + /* ??? We cannot simply use the type of operand #0 of the refs here as > + the Fortran compiler smuggles type punning into COMPONENT_REFs. > + Use the DECL_CONTEXT of the FIELD_DECLs instead. */ > + return (DECL_CONTEXT (TREE_OPERAND (ref_a, 1)) > + == DECL_CONTEXT (TREE_OPERAND (ref_b, 1))); > + > + return types_compatible_p (TREE_TYPE (TREE_OPERAND (ref_a, 0)), > + TREE_TYPE (TREE_OPERAND (ref_b, 0))); > +} > + > /* Initialize a data dependence relation between data accesses A and > B. NB_LOOPS is the number of loops surrounding the references: the > size of the classic distance/direction vectors. */ > @@ -1484,11 +1543,10 @@ initialize_data_dependence_relation (str > struct data_dependence_relation *res; > unsigned int i; > > - res = XNEW (struct data_dependence_relation); > + res = XCNEW (struct data_dependence_relation); > DDR_A (res) = a; > DDR_B (res) = b; > DDR_LOOP_NEST (res).create (0); > - DDR_REVERSED_P (res) = false; > DDR_SUBSCRIPTS (res).create (0); > DDR_DIR_VECTS (res).create (0); > DDR_DIST_VECTS (res).create (0); > @@ -1506,82 +1564,277 @@ initialize_data_dependence_relation (str > return res; > } > > - /* The case where the references are exactly the same. */ > - if (operand_equal_p (DR_REF (a), DR_REF (b), 0)) > + unsigned int num_dimensions_a = DR_NUM_DIMENSIONS (a); > + unsigned int num_dimensions_b = DR_NUM_DIMENSIONS (b); > + if (num_dimensions_a == 0 || num_dimensions_b == 0) > { > - if ((loop_nest.exists () > - && !object_address_invariant_in_loop_p (loop_nest[0], > - DR_BASE_OBJECT (a))) > - || DR_NUM_DIMENSIONS (a) == 0) > + DDR_ARE_DEPENDENT (res) = chrec_dont_know; > + return res; > + } > + > + /* For unconstrained bases, the root (highest-indexed) subscript > + describes a variation in the base of the original DR_REF rather > + than a component access. We have no type that accurately describes > + the new DR_BASE_OBJECT (whose TREE_TYPE describes the type *after* > + applying this subscript) so limit the search to the last real > + component access. > + > + E.g. for: > + > + void > + f (int a[][8], int b[][8]) > + { > + for (int i = 0; i < 8; ++i) > + a[i * 2][0] = b[i][0]; > + } > + > + the a and b accesses have a single ARRAY_REF component reference [0] > + but have two subscripts. */ > + if (DR_UNCONSTRAINED_BASE (a)) > + num_dimensions_a -= 1; > + if (DR_UNCONSTRAINED_BASE (b)) > + num_dimensions_b -= 1; > + > + /* These structures describe sequences of component references in > + DR_REF (A) and DR_REF (B). Each component reference is tied to a > + specific access function. */ > + struct { > + /* The sequence starts at DR_ACCESS_FN (A, START_A) of A and > + DR_ACCESS_FN (B, START_B) of B (inclusive) and extends to higher > + indices. In C notation, these are the indices of the rightmost > + component references; e.g. for a sequence .b.c.d, the start > + index is for .d. */ > + unsigned int start_a; > + unsigned int start_b; > + > + /* The sequence contains LENGTH consecutive access functions from > + each DR. */ > + unsigned int length; > + > + /* The enclosing objects for the A and B sequences respectively, > + i.e. the objects to which DR_ACCESS_FN (A, START_A + LENGTH - 1) > + and DR_ACCESS_FN (B, START_B + LENGTH - 1) are applied. */ > + tree object_a; > + tree object_b; > + } full_seq = {}, struct_seq = {}; > + > + /* Before each iteration of the loop: > + > + - REF_A is what you get after applying DR_ACCESS_FN (A, INDEX_A) and > + - REF_B is what you get after applying DR_ACCESS_FN (B, INDEX_B). */ > + unsigned int index_a = 0; > + unsigned int index_b = 0; > + tree ref_a = DR_REF (a); > + tree ref_b = DR_REF (b); > + > + /* Now walk the component references from the final DR_REFs back up to > + the enclosing base objects. Each component reference corresponds > + to one access function in the DR, with access function 0 being for > + the final DR_REF and the highest-indexed access function being the > + one that is applied to the base of the DR. > + > + Look for a sequence of component references whose access functions > + are comparable (see access_fn_components_comparable_p). If more > + than one such sequence exists, pick the one nearest the base > + (which is the leftmost sequence in C notation). Store this sequence > + in FULL_SEQ. > + > + For example, if we have: > + > + struct foo { struct bar s; ... } (*a)[10], (*b)[10]; > + > + A: a[0][i].s.c.d > + B: __real b[0][i].s.e[i].f > + > + (where d is the same type as the real component of f) then the access > + functions would be: > + > + 0 1 2 3 > + A: .d .c .s [i] > + > + 0 1 2 3 4 5 > + B: __real .f [i] .e .s [i] > + > + The A0/B2 column isn't comparable, since .d is a COMPONENT_REF > + and [i] is an ARRAY_REF. However, the A1/B3 column contains two > + COMPONENT_REF accesses for struct bar, so is comparable. Likewise > + the A2/B4 column contains two COMPONENT_REF accesses for struct foo, > + so is comparable. The A3/B5 column contains two ARRAY_REFs that > + index foo[10] arrays, so is again comparable. The sequence is > + therefore: > + > + A: [1, 3] (i.e. [i].s.c) > + B: [3, 5] (i.e. [i].s.e) > + > + Also look for sequences of component references whose access > + functions are comparable and whose enclosing objects have the same > + RECORD_TYPE. Store this sequence in STRUCT_SEQ. In the above > + example, STRUCT_SEQ would be: > + > + A: [1, 2] (i.e. s.c) > + B: [3, 4] (i.e. s.e) */ > + while (index_a < num_dimensions_a && index_b < num_dimensions_b) > + { > + /* REF_A and REF_B must be one of the component access types > + allowed by dr_analyze_indices. */ > + gcc_checking_assert (access_fn_component_p (ref_a)); > + gcc_checking_assert (access_fn_component_p (ref_b)); > + > + /* Get the immediately-enclosing objects for REF_A and REF_B, > + i.e. the references *before* applying DR_ACCESS_FN (A, INDEX_A) > + and DR_ACCESS_FN (B, INDEX_B). */ > + tree object_a = TREE_OPERAND (ref_a, 0); > + tree object_b = TREE_OPERAND (ref_b, 0); > + > + tree type_a = TREE_TYPE (object_a); > + tree type_b = TREE_TYPE (object_b); > + if (access_fn_components_comparable_p (ref_a, ref_b)) > + { > + /* This pair of component accesses is comparable for dependence > + analysis, so we can include DR_ACCESS_FN (A, INDEX_A) and > + DR_ACCESS_FN (B, INDEX_B) in the sequence. */ > + if (full_seq.start_a + full_seq.length != index_a > + || full_seq.start_b + full_seq.length != index_b) > + { > + /* The accesses don't extend the current sequence, > + so start a new one here. */ > + full_seq.start_a = index_a; > + full_seq.start_b = index_b; > + full_seq.length = 0; > + } > + > + /* Add this pair of references to the sequence. */ > + full_seq.length += 1; > + full_seq.object_a = object_a; > + full_seq.object_b = object_b; > + > + /* If the enclosing objects are structures (and thus have the > + same RECORD_TYPE), record the new sequence in STRUCT_SEQ. */ > + if (TREE_CODE (type_a) == RECORD_TYPE) > + struct_seq = full_seq; > + > + /* Move to the next containing reference for both A and B. */ > + ref_a = object_a; > + ref_b = object_b; > + index_a += 1; > + index_b += 1; > + continue; > + } > + > + /* Try to approach equal type sizes. */ > + if (!COMPLETE_TYPE_P (type_a) > + || !COMPLETE_TYPE_P (type_b) > + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_a)) > + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_b))) > + break; > + > + unsigned HOST_WIDE_INT size_a = tree_to_uhwi (TYPE_SIZE_UNIT (type_a)); > + unsigned HOST_WIDE_INT size_b = tree_to_uhwi (TYPE_SIZE_UNIT (type_b)); > + if (size_a <= size_b) > { > - DDR_ARE_DEPENDENT (res) = chrec_dont_know; > - return res; > + index_a += 1; > + ref_a = object_a; > + } > + if (size_b <= size_a) > + { > + index_b += 1; > + ref_b = object_b; > } > - DDR_AFFINE_P (res) = true; > - DDR_ARE_DEPENDENT (res) = NULL_TREE; > - DDR_SUBSCRIPTS (res).create (DR_NUM_DIMENSIONS (a)); > - DDR_LOOP_NEST (res) = loop_nest; > - DDR_INNER_LOOP (res) = 0; > - DDR_SELF_REFERENCE (res) = true; > - for (i = 0; i < DR_NUM_DIMENSIONS (a); i++) > - { > - struct subscript *subscript; > - > - subscript = XNEW (struct subscript); > - SUB_CONFLICTS_IN_A (subscript) = conflict_fn_not_known (); > - SUB_CONFLICTS_IN_B (subscript) = conflict_fn_not_known (); > - SUB_LAST_CONFLICT (subscript) = chrec_dont_know; > - SUB_DISTANCE (subscript) = chrec_dont_know; > - DDR_SUBSCRIPTS (res).safe_push (subscript); > - } > - return res; > } > > - /* If the references do not access the same object, we do not know > - whether they alias or not. We do not care about TBAA or alignment > - info so we can use OEP_ADDRESS_OF to avoid false negatives. > - But the accesses have to use compatible types as otherwise the > - built indices would not match. */ > - if (!operand_equal_p (DR_BASE_OBJECT (a), DR_BASE_OBJECT (b), OEP_ADDRESS_OF) > - || !types_compatible_p (TREE_TYPE (DR_BASE_OBJECT (a)), > - TREE_TYPE (DR_BASE_OBJECT (b)))) > + /* See whether FULL_SEQ ends at the base and whether the two bases > + are equal. We do not care about TBAA or alignment info so we can > + use OEP_ADDRESS_OF to avoid false negatives. */ > + tree base_a = DR_BASE_OBJECT (a); > + tree base_b = DR_BASE_OBJECT (b); > + bool same_base_p = (full_seq.start_a + full_seq.length == num_dimensions_a > + && full_seq.start_b + full_seq.length == num_dimensions_b > + && DR_UNCONSTRAINED_BASE (a) == DR_UNCONSTRAINED_BASE (b) > + && operand_equal_p (base_a, base_b, OEP_ADDRESS_OF) > + && types_compatible_p (TREE_TYPE (base_a), > + TREE_TYPE (base_b)) > + && (!loop_nest.exists () > + || (object_address_invariant_in_loop_p > + (loop_nest[0], base_a)))); > + > + /* If the bases are the same, we can include the base variation too. > + E.g. the b accesses in: > + > + for (int i = 0; i < n; ++i) > + b[i + 4][0] = b[i][0]; > + > + have a definite dependence distance of 4, while for: > + > + for (int i = 0; i < n; ++i) > + a[i + 4][0] = b[i][0]; > + > + the dependence distance depends on the gap between a and b. > + > + If the bases are different then we can only rely on the sequence > + rooted at a structure access, since arrays are allowed to overlap > + arbitrarily and change shape arbitrarily. E.g. we treat this as > + valid code: > + > + int a[256]; > + ... > + ((int (*)[4][3]) &a[1])[i][0] += ((int (*)[4][3]) &a[2])[i][0]; > + > + where two lvalues with the same int[4][3] type overlap, and where > + both lvalues are distinct from the object's declared type. */ > + if (same_base_p) > { > - DDR_ARE_DEPENDENT (res) = chrec_dont_know; > - return res; > + if (DR_UNCONSTRAINED_BASE (a)) > + full_seq.length += 1; > } > + else > + full_seq = struct_seq; > > - /* If the base of the object is not invariant in the loop nest, we cannot > - analyze it. TODO -- in fact, it would suffice to record that there may > - be arbitrary dependences in the loops where the base object varies. */ > - if ((loop_nest.exists () > - && !object_address_invariant_in_loop_p (loop_nest[0], DR_BASE_OBJECT (a))) > - || DR_NUM_DIMENSIONS (a) == 0) > + /* Punt if we didn't find a suitable sequence. */ > + if (full_seq.length == 0) > { > DDR_ARE_DEPENDENT (res) = chrec_dont_know; > return res; > } > > - /* If the number of dimensions of the access to not agree we can have > - a pointer access to a component of the array element type and an > - array access while the base-objects are still the same. Punt. */ > - if (DR_NUM_DIMENSIONS (a) != DR_NUM_DIMENSIONS (b)) > + if (!same_base_p) > { > - DDR_ARE_DEPENDENT (res) = chrec_dont_know; > - return res; > + /* Partial overlap is possible for different bases when strict aliasing > + is not in effect. It's also possible if either base involves a union > + access; e.g. for: > + > + struct s1 { int a[2]; }; > + struct s2 { struct s1 b; int c; }; > + struct s3 { int d; struct s1 e; }; > + union u { struct s2 f; struct s3 g; } *p, *q; > + > + the s1 at "p->f.b" (base "p->f") partially overlaps the s1 at > + "p->g.e" (base "p->g") and might partially overlap the s1 at > + "q->g.e" (base "q->g"). */ > + if (!flag_strict_aliasing > + || ref_contains_union_access_p (full_seq.object_a) > + || ref_contains_union_access_p (full_seq.object_b)) > + { > + DDR_ARE_DEPENDENT (res) = chrec_dont_know; > + return res; > + } > + > + DDR_COULD_BE_INDEPENDENT_P (res) = true; > } > > DDR_AFFINE_P (res) = true; > DDR_ARE_DEPENDENT (res) = NULL_TREE; > - DDR_SUBSCRIPTS (res).create (DR_NUM_DIMENSIONS (a)); > + DDR_SUBSCRIPTS (res).create (full_seq.length); > DDR_LOOP_NEST (res) = loop_nest; > DDR_INNER_LOOP (res) = 0; > DDR_SELF_REFERENCE (res) = false; > > - for (i = 0; i < DR_NUM_DIMENSIONS (a); i++) > + for (i = 0; i < full_seq.length; ++i) > { > struct subscript *subscript; > > subscript = XNEW (struct subscript); > + SUB_ACCESS_FN (subscript, 0) = DR_ACCESS_FN (a, full_seq.start_a + i); > + SUB_ACCESS_FN (subscript, 1) = DR_ACCESS_FN (b, full_seq.start_b + i); > SUB_CONFLICTS_IN_A (subscript) = conflict_fn_not_known (); > SUB_CONFLICTS_IN_B (subscript) = conflict_fn_not_known (); > SUB_LAST_CONFLICT (subscript) = chrec_dont_know; > @@ -3163,14 +3416,15 @@ add_outer_distances (struct data_depende > } > > /* Return false when fail to represent the data dependence as a > - distance vector. INIT_B is set to true when a component has been > + distance vector. A_INDEX is the index of the first reference > + (0 for DDR_A, 1 for DDR_B) and B_INDEX is the index of the > + second reference. INIT_B is set to true when a component has been > added to the distance vector DIST_V. INDEX_CARRY is then set to > the index in DIST_V that carries the dependence. */ > > static bool > build_classic_dist_vector_1 (struct data_dependence_relation *ddr, > - struct data_reference *ddr_a, > - struct data_reference *ddr_b, > + unsigned int a_index, unsigned int b_index, > lambda_vector dist_v, bool *init_b, > int *index_carry) > { > @@ -3188,8 +3442,8 @@ build_classic_dist_vector_1 (struct data > return false; > } > > - access_fn_a = DR_ACCESS_FN (ddr_a, i); > - access_fn_b = DR_ACCESS_FN (ddr_b, i); > + access_fn_a = SUB_ACCESS_FN (subscript, a_index); > + access_fn_b = SUB_ACCESS_FN (subscript, b_index); > > if (TREE_CODE (access_fn_a) == POLYNOMIAL_CHREC > && TREE_CODE (access_fn_b) == POLYNOMIAL_CHREC) > @@ -3249,10 +3503,11 @@ build_classic_dist_vector_1 (struct data > constant_access_functions (const struct data_dependence_relation *ddr) > { > unsigned i; > + subscript *sub; > > - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) > - if (!evolution_function_is_constant_p (DR_ACCESS_FN (DDR_A (ddr), i)) > - || !evolution_function_is_constant_p (DR_ACCESS_FN (DDR_B (ddr), i))) > + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) > + if (!evolution_function_is_constant_p (SUB_ACCESS_FN (sub, 0)) > + || !evolution_function_is_constant_p (SUB_ACCESS_FN (sub, 1))) > return false; > > return true; > @@ -3315,10 +3570,11 @@ add_other_self_distances (struct data_de > lambda_vector dist_v; > unsigned i; > int index_carry = DDR_NB_LOOPS (ddr); > + subscript *sub; > > - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) > + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) > { > - tree access_fun = DR_ACCESS_FN (DDR_A (ddr), i); > + tree access_fun = SUB_ACCESS_FN (sub, 0); > > if (TREE_CODE (access_fun) == POLYNOMIAL_CHREC) > { > @@ -3330,7 +3586,7 @@ add_other_self_distances (struct data_de > return; > } > > - access_fun = DR_ACCESS_FN (DDR_A (ddr), 0); > + access_fun = SUB_ACCESS_FN (DDR_SUBSCRIPT (ddr, 0), 0); > > if (TREE_CODE (CHREC_LEFT (access_fun)) == POLYNOMIAL_CHREC) > add_multivariate_self_dist (ddr, access_fun); > @@ -3401,6 +3657,23 @@ add_distance_for_zero_overlaps (struct d > } > } > > +/* Return true when the DDR contains two data references that have the > + same access functions. */ > + > +static inline bool > +same_access_functions (const struct data_dependence_relation *ddr) > +{ > + unsigned i; > + subscript *sub; > + > + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) > + if (!eq_evolutions_p (SUB_ACCESS_FN (sub, 0), > + SUB_ACCESS_FN (sub, 1))) > + return false; > + > + return true; > +} > + > /* Compute the classic per loop distance vector. DDR is the data > dependence relation to build a vector from. Return false when fail > to represent the data dependence as a distance vector. */ > @@ -3432,8 +3705,7 @@ build_classic_dist_vector (struct data_d > } > > dist_v = lambda_vector_new (DDR_NB_LOOPS (ddr)); > - if (!build_classic_dist_vector_1 (ddr, DDR_A (ddr), DDR_B (ddr), > - dist_v, &init_b, &index_carry)) > + if (!build_classic_dist_vector_1 (ddr, 0, 1, dist_v, &init_b, &index_carry)) > return false; > > /* Save the distance vector if we initialized one. */ > @@ -3466,12 +3738,11 @@ build_classic_dist_vector (struct data_d > if (!lambda_vector_lexico_pos (dist_v, DDR_NB_LOOPS (ddr))) > { > lambda_vector save_v = lambda_vector_new (DDR_NB_LOOPS (ddr)); > - if (!subscript_dependence_tester_1 (ddr, DDR_B (ddr), DDR_A (ddr), > - loop_nest)) > + if (!subscript_dependence_tester_1 (ddr, 1, 0, loop_nest)) > return false; > compute_subscript_distance (ddr); > - if (!build_classic_dist_vector_1 (ddr, DDR_B (ddr), DDR_A (ddr), > - save_v, &init_b, &index_carry)) > + if (!build_classic_dist_vector_1 (ddr, 1, 0, save_v, &init_b, > + &index_carry)) > return false; > save_dist_v (ddr, save_v); > DDR_REVERSED_P (ddr) = true; > @@ -3507,12 +3778,10 @@ build_classic_dist_vector (struct data_d > { > lambda_vector opposite_v = lambda_vector_new (DDR_NB_LOOPS (ddr)); > > - if (!subscript_dependence_tester_1 (ddr, DDR_B (ddr), > - DDR_A (ddr), loop_nest)) > + if (!subscript_dependence_tester_1 (ddr, 1, 0, loop_nest)) > return false; > compute_subscript_distance (ddr); > - if (!build_classic_dist_vector_1 (ddr, DDR_B (ddr), DDR_A (ddr), > - opposite_v, &init_b, > + if (!build_classic_dist_vector_1 (ddr, 1, 0, opposite_v, &init_b, > &index_carry)) > return false; > > @@ -3591,13 +3860,13 @@ build_classic_dir_vector (struct data_de > } > } > > -/* Helper function. Returns true when there is a dependence between > - data references DRA and DRB. */ > +/* Helper function. Returns true when there is a dependence between the > + data references. A_INDEX is the index of the first reference (0 for > + DDR_A, 1 for DDR_B) and B_INDEX is the index of the second reference. */ > > static bool > subscript_dependence_tester_1 (struct data_dependence_relation *ddr, > - struct data_reference *dra, > - struct data_reference *drb, > + unsigned int a_index, unsigned int b_index, > struct loop *loop_nest) > { > unsigned int i; > @@ -3609,8 +3878,8 @@ subscript_dependence_tester_1 (struct da > { > conflict_function *overlaps_a, *overlaps_b; > > - analyze_overlapping_iterations (DR_ACCESS_FN (dra, i), > - DR_ACCESS_FN (drb, i), > + analyze_overlapping_iterations (SUB_ACCESS_FN (subscript, a_index), > + SUB_ACCESS_FN (subscript, b_index), > &overlaps_a, &overlaps_b, > &last_conflicts, loop_nest); > > @@ -3659,7 +3928,7 @@ subscript_dependence_tester_1 (struct da > subscript_dependence_tester (struct data_dependence_relation *ddr, > struct loop *loop_nest) > { > - if (subscript_dependence_tester_1 (ddr, DDR_A (ddr), DDR_B (ddr), loop_nest)) > + if (subscript_dependence_tester_1 (ddr, 0, 1, loop_nest)) > dependence_stats.num_dependence_dependent++; > > compute_subscript_distance (ddr); > Index: gcc/tree-ssa-loop-prefetch.c > =================================================================== > --- gcc/tree-ssa-loop-prefetch.c 2017-05-18 07:51:26.127377591 +0100 > +++ gcc/tree-ssa-loop-prefetch.c 2017-05-18 07:51:50.871904726 +0100 > @@ -1650,6 +1650,7 @@ determine_loop_nest_reuse (struct loop * > refb = (struct mem_ref *) DDR_B (dep)->aux; > > if (DDR_ARE_DEPENDENT (dep) == chrec_dont_know > + || DDR_COULD_BE_INDEPENDENT_P (dep) > || DDR_NUM_DIST_VECTS (dep) == 0) > { > /* If the dependence cannot be analyzed, assume that there might be > Index: gcc/tree-vectorizer.h > =================================================================== > --- gcc/tree-vectorizer.h 2017-05-18 07:51:26.128377491 +0100 > +++ gcc/tree-vectorizer.h 2017-05-18 07:51:50.872904626 +0100 > @@ -383,7 +383,7 @@ #define LOOP_VINFO_ORIG_LOOP_INFO(L) > #define LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT(L) \ > ((L)->may_misalign_stmts.length () > 0) > #define LOOP_REQUIRES_VERSIONING_FOR_ALIAS(L) \ > - ((L)->may_alias_ddrs.length () > 0) > + ((L)->comp_alias_ddrs.length () > 0) > #define LOOP_REQUIRES_VERSIONING_FOR_NITERS(L) \ > (LOOP_VINFO_NITERS_ASSUMPTIONS (L)) > #define LOOP_REQUIRES_VERSIONING(L) \ > Index: gcc/tree-vect-data-refs.c > =================================================================== > --- gcc/tree-vect-data-refs.c 2017-05-18 07:51:23.307659382 +0100 > +++ gcc/tree-vect-data-refs.c 2017-05-18 07:51:50.872904626 +0100 > @@ -340,6 +340,26 @@ vect_analyze_data_ref_dependence (struct > } > > loop_depth = index_in_loop_nest (loop->num, DDR_LOOP_NEST (ddr)); > + > + if (DDR_COULD_BE_INDEPENDENT_P (ddr)) > + /* For dependence distances of 2 or more, we have the option of > + limiting VF or checking for an alias at runtime. Prefer to check > + at runtime if we can, to avoid limiting the VF unnecessarily when > + the bases are in fact independent. > + > + Note that the alias checks will be removed if the VF ends up > + being small enough. */ > + FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v) > + { > + int dist = dist_v[loop_depth]; > + if (dist != 0 && !(dist > 0 && DDR_REVERSED_P (ddr))) > + { > + if (vect_mark_for_runtime_alias_test (ddr, loop_vinfo)) > + return false; > + break; > + } > + } > + > FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v) > { > int dist = dist_v[loop_depth]; > @@ -3017,6 +3037,44 @@ vect_no_alias_p (struct data_reference * > return false; > } > > +/* Return true if the minimum nonzero dependence distance for loop LOOP_DEPTH > + in DDR is >= VF. */ > + > +static bool > +dependence_distance_ge_vf (data_dependence_relation *ddr, > + unsigned int loop_depth, unsigned HOST_WIDE_INT vf) > +{ > + if (DDR_ARE_DEPENDENT (ddr) != NULL_TREE > + || DDR_NUM_DIST_VECTS (ddr) == 0) > + return false; > + > + /* If the dependence is exact, we should have limited the VF instead. */ > + gcc_checking_assert (DDR_COULD_BE_INDEPENDENT_P (ddr)); > + > + unsigned int i; > + lambda_vector dist_v; > + FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v) > + { > + HOST_WIDE_INT dist = dist_v[loop_depth]; > + if (dist != 0 > + && !(dist > 0 && DDR_REVERSED_P (ddr)) > + && (unsigned HOST_WIDE_INT) abs_hwi (dist) < vf) > + return false; > + } > + > + if (dump_enabled_p ()) > + { > + dump_printf_loc (MSG_NOTE, vect_location, > + "dependence distance between "); > + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (DDR_A (ddr))); > + dump_printf (MSG_NOTE, " and "); > + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (DDR_B (ddr))); > + dump_printf (MSG_NOTE, " is >= VF\n"); > + } > + > + return true; > +} > + > /* Function vect_prune_runtime_alias_test_list. > > Prune a list of ddrs to be tested at run-time by versioning for alias. > @@ -3075,6 +3133,10 @@ vect_prune_runtime_alias_test_list (loop > > comp_alias_ddrs.create (may_alias_ddrs.length ()); > > + unsigned int loop_depth > + = index_in_loop_nest (LOOP_VINFO_LOOP (loop_vinfo)->num, > + LOOP_VINFO_LOOP_NEST (loop_vinfo)); > + > /* First, we collect all data ref pairs for aliasing checks. */ > FOR_EACH_VEC_ELT (may_alias_ddrs, i, ddr) > { > @@ -3084,6 +3146,11 @@ vect_prune_runtime_alias_test_list (loop > tree segment_length_a, segment_length_b; > gimple *stmt_a, *stmt_b; > > + /* Ignore the alias if the VF we chose ended up being no greater > + than the dependence distance. */ > + if (dependence_distance_ge_vf (ddr, loop_depth, vect_factor)) > + continue; > + > dr_a = DDR_A (ddr); > stmt_a = DR_STMT (DDR_A (ddr)); > dr_group_first_a = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_a)); > @@ -3294,10 +3361,6 @@ vect_prune_runtime_alias_test_list (loop > return false; > } > > - /* All alias checks have been resolved at compilation time. */ > - if (!comp_alias_ddrs.length ()) > - LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo).truncate (0); > - > return true; > } > > Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-3.c > =================================================================== > --- /dev/null 2017-05-17 17:16:48.996861112 +0100 > +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-3.c 2017-05-18 07:51:50.870904826 +0100 > @@ -0,0 +1,112 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target vect_int } */ > +/* { dg-additional-options "--param vect-max-version-for-alias-checks=0" } */ > + > +/* Intended to be larger than any VF. */ > +#define GAP 128 > +#define N (GAP * 3) > + > +struct s { int x[N + 1]; }; > +struct t { struct s x[N + 1]; }; > +struct u { int x[N + 1]; int y; }; > +struct v { struct s s; }; > + > +void > +f1 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < N; ++i) > + a->x[i] += b->x[i]; > +} > + > +void > +f2 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < N; ++i) > + a[1].x[i] += b[2].x[i]; > +} > + > +void > +f3 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < N; ++i) > + a[1].x[i] += b[i].x[i]; > +} > + > +void > +f4 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < N; ++i) > + a[i].x[i] += b[i].x[i]; > +} > + > +void > +f5 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < N; ++i) > + a->x[i] += b->x[i + 1]; > +} > + > +void > +f6 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < N; ++i) > + a[1].x[i] += b[2].x[i + 1]; > +} > + > +void > +f7 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < N; ++i) > + a[1].x[i] += b[i].x[i + 1]; > +} > + > +void > +f8 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < N; ++i) > + a[i].x[i] += b[i].x[i + 1]; > +} > + > +void > +f9 (struct s *a, struct t *b) > +{ > + for (int i = 0; i < N; ++i) > + a->x[i] += b->x[1].x[i]; > +} > + > +void > +f10 (struct s *a, struct t *b) > +{ > + for (int i = 0; i < N; ++i) > + a->x[i] += b->x[i].x[i]; > +} > + > +void > +f11 (struct u *a, struct u *b) > +{ > + for (int i = 0; i < N; ++i) > + a->x[i] += b->x[i] + b[i].y; > +} > + > +void > +f12 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < GAP; ++i) > + a->x[i + GAP] += b->x[i]; > +} > + > +void > +f13 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < GAP * 2; ++i) > + a->x[i + GAP] += b->x[i]; > +} > + > +void > +f14 (struct v *a, struct s *b) > +{ > + for (int i = 0; i < N; ++i) > + a->s.x[i] = b->x[i]; > +} > + > +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 14 "vect" } } */ > Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-4.c > =================================================================== > --- /dev/null 2017-05-17 17:16:48.996861112 +0100 > +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-4.c 2017-05-18 07:51:50.870904826 +0100 > @@ -0,0 +1,35 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target vect_int } */ > +/* { dg-additional-options "--param vect-max-version-for-alias-checks=0" } */ > + > +#define N 16 > + > +struct s1 { int a[N]; }; > +struct s2 { struct s1 b; int c; }; > +struct s3 { int d; struct s1 e; }; > +union u { struct s2 f; struct s3 g; }; > + > +/* We allow a and b to overlap arbitrarily. */ > + > +void > +f1 (int a[][N], int b[][N]) > +{ > + for (int i = 0; i < N; ++i) > + a[0][i] += b[0][i]; > +} > + > +void > +f2 (union u *a, union u *b) > +{ > + for (int i = 0; i < N; ++i) > + a->f.b.a[i] += b->g.e.a[i]; > +} > + > +void > +f3 (struct s1 *a, struct s1 *b) > +{ > + for (int i = 0; i < N - 1; ++i) > + a->a[i + 1] += b->a[i]; > +} > + > +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */ > Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-5.c > =================================================================== > --- /dev/null 2017-05-17 17:16:48.996861112 +0100 > +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-5.c 2017-05-18 07:51:50.870904826 +0100 > @@ -0,0 +1,19 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target vect_int } */ > + > +/* Intended to be larger than any VF. */ > +#define GAP 128 > +#define N (GAP * 3) > + > +struct s { int x[N]; }; > + > +void > +f1 (struct s *a, struct s *b) > +{ > + for (int i = 0; i < GAP * 2; ++i) > + a->x[i + GAP] += b->x[i]; > +} > + > +/* { dg-final { scan-tree-dump-times "mark for run-time aliasing" 1 "vect" } } */ > +/* { dg-final { scan-tree-dump-times "improved number of alias checks from 1 to 0" 1 "vect" } } */ > +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 1 "vect" } } */

On Wed, May 31, 2017 at 8:56 AM, Richard Sandiford <richard.sandiford@linaro.org> wrote: > Ping > > Richard Sandiford <richard.sandiford@linaro.org> writes: >> Richard Biener <richard.guenther@gmail.com> writes: >>> On Thu, May 4, 2017 at 7:21 PM, Richard Sandiford >>> <richard.sandiford@linaro.org> wrote: >>>> Richard Biener <richard.guenther@gmail.com> writes: >>>>> On Thu, May 4, 2017 at 2:12 PM, Richard Biener >>>>> <richard.guenther@gmail.com> wrote: >>>>>> On Wed, May 3, 2017 at 10:00 AM, Richard Sandiford >>>>>> <richard.sandiford@linaro.org> wrote: >>>>>>> This patch tries to calculate conservatively-correct distance >>>>>>> vectors for two references whose base addresses are not the same. >>>>>>> It sets a new flag DDR_COULD_BE_INDEPENDENT_P if the dependence >>>>>>> isn't guaranteed to occur. >>>>>>> >>>>>>> The motivating example is: >>>>>>> >>>>>>> struct s { int x[8]; }; >>>>>>> void >>>>>>> f (struct s *a, struct s *b) >>>>>>> { >>>>>>> for (int i = 0; i < 8; ++i) >>>>>>> a->x[i] += b->x[i]; >>>>>>> } >>>>>>> >>>>>>> in which the "a" and "b" accesses are either independent or have a >>>>>>> dependence distance of 0 (assuming -fstrict-aliasing). Neither case >>>>>>> prevents vectorisation, so we can vectorise without an alias check. >>>>>>> >>>>>>> I'd originally wanted to do the same thing for arrays as well, e.g.: >>>>>>> >>>>>>> void >>>>>>> f (int a[][8], struct b[][8]) >>>>>>> { >>>>>>> for (int i = 0; i < 8; ++i) >>>>>>> a[0][i] += b[0][i]; >>>>>>> } >>>>>>> >>>>>>> I think this is valid because C11 6.7.6.2/6 says: >>>>>>> >>>>>>> For two array types to be compatible, both shall have compatible >>>>>>> element types, and if both size specifiers are present, and are >>>>>>> integer constant expressions, then both size specifiers shall have >>>>>>> the same constant value. >>>>>>> >>>>>>> So if we access an array through an int (*)[8], it must have type X[8] >>>>>>> or X[], where X is compatible with int. It doesn't seem possible in >>>>>>> either case for "a[0]" and "b[0]" to overlap when "a != b". >>>>>>> >>>>>>> However, Richard B said that (at least in gimple) we support arbitrary >>>>>>> overlap of arrays and allow arrays to be accessed with different >>>>>>> dimensionality. There are examples of this in PR50067. I've therefore >>>>>>> only handled references that end in a structure field access. >>>>>>> >>>>>>> There are two ways of handling these dependences in the vectoriser: >>>>>>> use them to limit VF, or check at runtime as before. I've gone for >>>>>>> the approach of checking at runtime if we can, to avoid limiting VF >>>>>>> unnecessarily. We still fall back to a VF cap when runtime checks >>>>>>> aren't allowed. >>>>>>> >>>>>>> The patch tests whether we queued an alias check with a dependence >>>>>>> distance of X and then picked a VF <= X, in which case it's safe to >>>>>>> drop the alias check. Since vect_prune_runtime_alias_check_list can >>>>>>> be called twice with different VF for the same loop, it's no longer >>>>>>> safe to clear may_alias_ddrs on exit. Instead we should use >>>>>>> comp_alias_ddrs to check whether versioning is necessary. >>>>>>> >>>>>>> Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install? >>>>>> >>>>>> You seem to do your "fancy" thing but also later compute the old >>>>>> base equality anyway (for same_base_p). It looks to me for this >>>>>> case the new fancy code can be simply skipped, keeping num_dimensions >>>>>> as before? >>>>>> >>>>>> + /* Try to approach equal type sizes. */ >>>>>> + if (!COMPLETE_TYPE_P (type_a) >>>>>> + || !COMPLETE_TYPE_P (type_b) >>>>>> + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_a)) >>>>>> + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_b))) >>>>>> + break; >>>>>> >>>>>> ah, interesting idea to avoid a quadratic search. Note that you should >>>>>> conservatively handle both BIT_FIELD_REF and VIEW_CONVERT_EXPR >>>>>> as they are used for type-punning. >>>> >>>> All the component refs here should be REALPART_EXPRs, IMAGPART_EXPRs, >>>> ARRAY_REFs or COMPONENT_REFs of structures, since that's all that >>>> dr_analyze_indices allows, so I think we safe in terms of the tree codes. >>> >>> Yeah. I think we need to document that we should have a 1:1 match here. >> >> OK, I added that to the comments and also added an access_fn_component_p >> that we can assert on. >> >>>>>> I see nonoverlapping_component_refs_of_decl_p should simply skip >>>>>> ARRAY_REFs - but I also see there: >>>>>> >>>>>> /* ??? We cannot simply use the type of operand #0 of the refs here >>>>>> as the Fortran compiler smuggles type punning into COMPONENT_REFs >>>>>> for common blocks instead of using unions like everyone else. */ >>>>>> tree type1 = DECL_CONTEXT (field1); >>>>>> tree type2 = DECL_CONTEXT (field2); >>>>>> >>>>>> so you probably can't simply use TREE_TYPE (outer_ref) for type compatibility. >>>>>> You also may not use types_compatible_p here as for LTO that is _way_ too >>>>>> lax for aggregates. The above uses >>>>>> >>>>>> /* We cannot disambiguate fields in a union or qualified union. */ >>>>>> if (type1 != type2 || TREE_CODE (type1) != RECORD_TYPE) >>>>>> return false; >>>>>> >>>>>> so you should also bail out on unions here, rather than the check you do later. >>>> >>>> The loop stops before we get to a union, so I think "only" the RECORD_TYPE >>>> COMPONENT_REF handling is a potential problem. Does this mean that >>>> I should use the nonoverlapping_component_refs_of_decl_p code: >>>> >>>> tree field1 = TREE_OPERAND (ref1, 1); >>>> tree field2 = TREE_OPERAND (ref2, 1); >>>> >>>> /* ??? We cannot simply use the type of operand #0 of the refs here >>>> as the Fortran compiler smuggles type punning into COMPONENT_REFs >>>> for common blocks instead of using unions like everyone else. */ >>>> tree type1 = DECL_CONTEXT (field1); >>>> tree type2 = DECL_CONTEXT (field2); >>>> >>>> /* We cannot disambiguate fields in a union or qualified union. */ >>>> if (type1 != type2 || TREE_CODE (type1) != RECORD_TYPE) >>>> return false; >>>> >>>> if (field1 != field2) >>>> { >>>> /* A field and its representative need to be considered the >>>> same. */ >>>> if (DECL_BIT_FIELD_REPRESENTATIVE (field1) == field2 >>>> || DECL_BIT_FIELD_REPRESENTATIVE (field2) == field1) >>>> return false; >>>> /* Different fields of the same record type cannot overlap. >>>> ??? Bitfields can overlap at RTL level so punt on them. */ >>>> if (DECL_BIT_FIELD (field1) && DECL_BIT_FIELD (field2)) >>>> return false; >>>> return true; >>>> } >>>> >>>> as the disambiguation test for COMPONENT_REFs, instead of types_compatible_p >>>> during the new loop? >>> >>> Yes. OTOH you want to "match" while the above disambiguates. So it means >>> you should use either FIELD_DECL equality or DECL_CONTEXT of the FIELD_DECL >>> equality (which should be the same in the end). The RTL concern >>> should not matter >>> here. >> >> The attached patch adds an access_fn_components_comparable_p helper >> function that checks whether the DECL_CONTEXTs are the same. >> >>>> And test for this as well as unions in the outer >>>> references? >>> >>> So looking at dr_analyze_indices a union would be always the DR_BASE_OBJECT, >>> and you (should) stop the ref walk at DR_BASE_OBJECT. >> >> I was just thinking that if the Fortran front-end has cases in which >> TREE_TYPE (TREE_OPERAND (ref, 0)) != DECL_CONTEXT (TREE_OPERAND (ref, 1)) >> for a COMPONENT_REF, should we treat that as equivalent to a union >> access in ref_contains_union_access_p? But I'm not sure that's >> necessary after all. >> >>> The dr_analyze_indices code is also somewhat fishy in that it simply >>> ignores everything below unhandled component-refs even if there are >>> indices involved (and it gets away with this because dependence >>> analysis likely/hopefully gives up on the DR_BASE_OBJECT equality test >>> in case it is sth like a[i].union for example ... hopefully ...). >> >> I think this is what you meant, but: I don't think the base object >> itself can be a union, because we need at least one component reference >> for the DR, and don't accept COMPONENT_REFs for unions as access functions. >> So if the base involves a union, the base would also need to have a >> COMPONENT_REF that selects a particular member of that union. >> >> And yeah, before the patch we did allow a dependence distance to be >> calculated for a[i].union.f[j] vs. a[i].union.f[j + 1] (and still do >> after the patch), on the basis that a[i].union.f refers to the same >> object in both cases. >> >>>>>> You seem to rely on getting an access_fn entry for each handled_component_p. >>>>>> It looks like this is the case -- we even seem to stop at unions >>>>>> (with the same >>>>>> fortran "issue"). I'm not sure that's the best thing to do but you >>>>>> rely on that. >>>> >>>> Yeah, the loop is deliberately limited to the components associated with >>>> an access_fn. I did wonder at first whether dr_analyze_indices should >>>> store the original component reference trees for each access function. >>>> That would make things simpler and more explicit, but would also eat up >>>> more memory. Things like object_address_invariant_in_loop_p rely on the >>>> access_fns in the same way that the loop in the patch does. >>> >>> in fact it fails to handle ARRAY_RANGE_REFs ... >> >> Yeah, the whole file seems to ignore those. What kind of code would >> benefit? This is currently only used by the Ada frontend so I'm not sure. It would be also non-trivial to handle them as they do not represent an independent dimension but just adjust the index domain (and size, but that doesn't matter). >>>>>> I don't understand the looping, it needs more comments. You seem to be >>>>>> looking for the innermost compatible RECORD_TYPE but then num_dimensions >>>>>> is how many compatible refs you found on the way (with incompatible ones >>>>>> not counting?!). What about an inner varying array of structs? >>>>>> This seems to >>>>>> be disregarded in the analysis now? Thus, a[i].s.b[i].j vs. __real >>>>>> b[i].s.b[i].j? >>>> >>>> I'll try to improve the comments. But the idea is that both sequences are >>>> as long as possible, while that still gives compatible types. If there is >>>> more than one such sequence, we pick the one nearest the base. >>>> >>>> So in your example, the access functions would be: >>>> >>>> 0 1 2 3 4 >>>> a: .j [i] .b .s [i] >>>> >>>> 0 1 2 3 4 5 >>>> b: __real .j [i] .b .s [i] >>>> >>>> If a and b are pointers, the final access functions would be >>>> unconstrained base accesses, so we'd end up with: >>>> >>>> a: [0, 3] >>>> b: [1, 4] >>>> >>>> for both sequences. >>>> >>>>>> nonoverlapping_component_refs_of_decl_p/nonoverlapping_component_refs_p >>>>>> conveniently start from the other >>>>>> end of the ref here. >>>>> >>>>> That said, for the motivational cases we either have one ref having >>>>> more dimensions than the other (the __real vs. full complex access) or >>>>> they have the same number of dimensions (and no access fn for the >>>>> base). >>>>> >>>>> For the first case we should simply "drop" access_fns of the larger >>>>> dimensional ref (from the start, plus outer component refs) up to the >>>>> point the number of dimensions are equal. >>>> >>>> Yeah, that's what happens for your example. But if we had: >>>> >>>> a[i].s.c.d >>>> __real b[i].s.b[i].j >>>> >>>> (where d is the same type as the real component) then the access >>>> functions would be: >>>> >>>> 0 1 2 3 >>>> a: .d .c .s [i] >>>> >>>> 0 1 2 3 4 5 >>>> b: __real .j [i] .b .s [i] >>>> >>>> Comparing the a0/b2 column doesn't make sense, because one's an array >>>> and the other is a structure. In this case the sequence we care about is: >>>> >>>> a: [1, 3] >>>> b: [3, 5] >>>> >>>> which is what the loop gives. The a1/b3 column is the one that proves >>>> there's no dependence. >>>> >>>>> Then we have the case of >>>>> >>>>> ! types_compatible_p (TREE_TYPE (base_a), TREE_TYPE (base_b)) >>>>> >>>>> where we have to punt. >>>>> >>>>> Then we have the case of >>>>> >>>>> ! operand_equal_p (base_a, base_b, OEP_ADDRESS_OF) >>>>> >>>>> which is where the new code should kick in to see if we can drop access_fns >>>>> from the other end (as unanalyzable but either having distance zero or not >>>>> aliased because of TBAA). >>>>> >>>>> At least your testcases suggest you do not want to handle >>>>> >>>>> struct s { int x[N]; }; >>>>> struct r { struct s s; }; >>>>> f (struct s *a, struct r *b) >>>>> { >>>>> for (i = 0; i < N; ++i) >>>>> a->s.x[i] = b->x[i]; >>>>> } >>>>> >>>>> ? >>>>> >>>>> With this example your loop which seems to search for a "common" >>>>> sequence in (different) midst of the reference trees makes more sense >>>>> (still that loop is awkward to understand). >>>> >>>> Yeah, I want to handle that too, just hadn't thought of it as a specific >>>> testcase. The code does give the expected dependence distance of 0. >>> >>> Ok. >>> >>> I think the patch is reasonable, maybe the loop can be restructured / >>> simplified a bit and handling of the union case for example be done >>> first (by looking at DR_BASE_OBJECT). >> >> I still prefer doing the loop first and keeping the "same base" check >> together as a single condition, since it means that we're analysing the >> reference in a single direction (DR_REF to base) rather than jumping >> around. And unequal bases should be more common that equal ones. >> >> I think both orders involve doing potentially redundant work. The >> current order tends towards doing redundant work for union accesses >> and !flag_strict_aliasing, but they should be the less common cases. >> >> How does this look? Changes since v1: >> >> - Added access_fn_component_p to check for valid access function components. >> >> - Added access_fn_components_comparable_p instead of using >> types_compatibloe_p directly. >> >> - Added more commentary. >> >> - Added local structures to represent the sequence, so that it's >> more obvious which variables are temporaries and which aren't. >> >> - Added the test above to vect-alias-check-3.c. >> >> Tested on aarch64-linux-gnu and x86_64-linux-gnu. This is ok. Thanks, Richard. >> Thanks, >> Richard >> >> >> 2017-05-18 Richard Sandiford <richard.sandiford@linaro.org> >> >> gcc/ >> >> * tree-data-ref.h (subscript): Add access_fn field. >> (data_dependence_relation): Add could_be_independent_p. >> (SUB_ACCESS_FN, DDR_COULD_BE_INDEPENDENT_P): New macros. >> (same_access_functions): Move to tree-data-ref.c. >> * tree-data-ref.c (ref_contains_union_access_p): New function. >> (access_fn_component_p): Likewise. >> (access_fn_components_comparable_p): Likewise. >> (dr_analyze_indices): Add a comment that this code needs to be >> kept in sync with access_fn_component_p. >> (dump_data_dependence_relation): Use SUB_ACCESS_FN instead of >> DR_ACCESS_FN. >> (constant_access_functions): Likewise. >> (add_other_self_distances): Likewise. >> (same_access_functions): Likewise. (Moved from tree-data-ref.h.) >> (initialize_data_dependence_relation): Use XCNEW and remove >> explicit zeroing of DDR_REVERSED_P. Look for a subsequence >> of access functions that have the same type. Allow the >> subsequence to end with different bases in some circumstances. >> Record the chosen access functions in SUB_ACCESS_FN. >> (build_classic_dist_vector_1): Replace ddr_a and ddr_b with >> a_index and b_index. Use SUB_ACCESS_FN instead of DR_ACCESS_FN. >> (subscript_dependence_tester_1): Likewise dra and drb. >> (build_classic_dist_vector): Update calls accordingly. >> (subscript_dependence_tester): Likewise. >> * tree-ssa-loop-prefetch.c (determine_loop_nest_reuse): Check >> DDR_COULD_BE_INDEPENDENT_P. >> * tree-vectorizer.h (LOOP_REQUIRES_VERSIONING_FOR_ALIAS): Test >> comp_alias_ddrs instead of may_alias_ddrs. >> * tree-vect-data-refs.c (vect_analyze_data_ref_dependence): Try >> to mark for aliasing if DDR_COULD_BE_INDEPENDENT_P, but fall back >> to using the recorded distance vectors if that fails. >> (dependence_distance_ge_vf): New function. >> (vect_prune_runtime_alias_test_list): Use it. Don't clear >> LOOP_VINFO_MAY_ALIAS_DDRS. >> >> gcc/testsuite/ >> * gcc.dg/vect/vect-alias-check-3.c: New test. >> * gcc.dg/vect/vect-alias-check-4.c: Likewise. >> * gcc.dg/vect/vect-alias-check-5.c: Likewise. >> >> Index: gcc/tree-data-ref.h >> =================================================================== >> --- gcc/tree-data-ref.h 2017-05-04 11:36:51.157328631 +0100 >> +++ gcc/tree-data-ref.h 2017-05-18 07:51:50.871904726 +0100 >> @@ -191,6 +191,9 @@ struct conflict_function >> >> struct subscript >> { >> + /* The access functions of the two references. */ >> + tree access_fn[2]; >> + >> /* A description of the iterations for which the elements are >> accessed twice. */ >> conflict_function *conflicting_iterations_in_a; >> @@ -209,6 +212,7 @@ struct subscript >> >> typedef struct subscript *subscript_p; >> >> +#define SUB_ACCESS_FN(SUB, I) (SUB)->access_fn[I] >> #define SUB_CONFLICTS_IN_A(SUB) (SUB)->conflicting_iterations_in_a >> #define SUB_CONFLICTS_IN_B(SUB) (SUB)->conflicting_iterations_in_b >> #define SUB_LAST_CONFLICT(SUB) (SUB)->last_conflict >> @@ -264,6 +268,33 @@ struct data_dependence_relation >> /* Set to true when the dependence relation is on the same data >> access. */ >> bool self_reference_p; >> + >> + /* True if the dependence described is conservatively correct rather >> + than exact, and if it is still possible for the accesses to be >> + conditionally independent. For example, the a and b references in: >> + >> + struct s *a, *b; >> + for (int i = 0; i < n; ++i) >> + a->f[i] += b->f[i]; >> + >> + conservatively have a distance vector of (0), for the case in which >> + a == b, but the accesses are independent if a != b. Similarly, >> + the a and b references in: >> + >> + struct s *a, *b; >> + for (int i = 0; i < n; ++i) >> + a[0].f[i] += b[i].f[i]; >> + >> + conservatively have a distance vector of (0), but they are indepenent >> + when a != b + i. In contrast, the references in: >> + >> + struct s *a; >> + for (int i = 0; i < n; ++i) >> + a->f[i] += a->f[i]; >> + >> + have the same distance vector of (0), but the accesses can never be >> + independent. */ >> + bool could_be_independent_p; >> }; >> >> typedef struct data_dependence_relation *ddr_p; >> @@ -294,6 +325,7 @@ #define DDR_DIR_VECT(DDR, I) \ >> #define DDR_DIST_VECT(DDR, I) \ >> DDR_DIST_VECTS (DDR)[I] >> #define DDR_REVERSED_P(DDR) (DDR)->reversed_p >> +#define DDR_COULD_BE_INDEPENDENT_P(DDR) (DDR)->could_be_independent_p >> >> >> bool dr_analyze_innermost (struct data_reference *, struct loop *); >> @@ -372,22 +404,6 @@ same_data_refs (data_reference_p a, data >> return false; >> >> return true; >> -} >> - >> -/* Return true when the DDR contains two data references that have the >> - same access functions. */ >> - >> -static inline bool >> -same_access_functions (const struct data_dependence_relation *ddr) >> -{ >> - unsigned i; >> - >> - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) >> - if (!eq_evolutions_p (DR_ACCESS_FN (DDR_A (ddr), i), >> - DR_ACCESS_FN (DDR_B (ddr), i))) >> - return false; >> - >> - return true; >> } >> >> /* Returns true when all the dependences are computable. */ >> Index: gcc/tree-data-ref.c >> =================================================================== >> --- gcc/tree-data-ref.c 2017-05-18 07:51:26.126377691 +0100 >> +++ gcc/tree-data-ref.c 2017-05-18 07:51:50.871904726 +0100 >> @@ -123,8 +123,7 @@ Software Foundation; either version 3, o >> } dependence_stats; >> >> static bool subscript_dependence_tester_1 (struct data_dependence_relation *, >> - struct data_reference *, >> - struct data_reference *, >> + unsigned int, unsigned int, >> struct loop *); >> /* Returns true iff A divides B. */ >> >> @@ -144,6 +143,21 @@ int_divides_p (int a, int b) >> return ((b % a) == 0); >> } >> >> +/* Return true if reference REF contains a union access. */ >> + >> +static bool >> +ref_contains_union_access_p (tree ref) >> +{ >> + while (handled_component_p (ref)) >> + { >> + ref = TREE_OPERAND (ref, 0); >> + if (TREE_CODE (TREE_TYPE (ref)) == UNION_TYPE >> + || TREE_CODE (TREE_TYPE (ref)) == QUAL_UNION_TYPE) >> + return true; >> + } >> + return false; >> +} >> + >> >> >> /* Dump into FILE all the data references from DATAREFS. */ >> @@ -433,13 +447,14 @@ dump_data_dependence_relation (FILE *out >> unsigned int i; >> struct loop *loopi; >> >> - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) >> + subscript *sub; >> + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) >> { >> fprintf (outf, " access_fn_A: "); >> - print_generic_stmt (outf, DR_ACCESS_FN (dra, i)); >> + print_generic_stmt (outf, SUB_ACCESS_FN (sub, 0)); >> fprintf (outf, " access_fn_B: "); >> - print_generic_stmt (outf, DR_ACCESS_FN (drb, i)); >> - dump_subscript (outf, DDR_SUBSCRIPT (ddr, i)); >> + print_generic_stmt (outf, SUB_ACCESS_FN (sub, 1)); >> + dump_subscript (outf, sub); >> } >> >> fprintf (outf, " inner loop index: %d\n", DDR_INNER_LOOP (ddr)); >> @@ -886,6 +901,27 @@ dr_analyze_innermost (struct data_refere >> return true; >> } >> >> +/* Return true if OP is a valid component reference for a DR access >> + function. This accepts a subset of what handled_component_p accepts. */ >> + >> +static bool >> +access_fn_component_p (tree op) >> +{ >> + switch (TREE_CODE (op)) >> + { >> + case REALPART_EXPR: >> + case IMAGPART_EXPR: >> + case ARRAY_REF: >> + return true; >> + >> + case COMPONENT_REF: >> + return TREE_CODE (TREE_TYPE (TREE_OPERAND (op, 0))) == RECORD_TYPE; >> + >> + default: >> + return false; >> + } >> +} >> + >> /* Determines the base object and the list of indices of memory reference >> DR, analyzed in LOOP and instantiated in loop nest NEST. */ >> >> @@ -923,7 +959,9 @@ dr_analyze_indices (struct data_referenc >> access_fns.safe_push (integer_one_node); >> } >> >> - /* Analyze access functions of dimensions we know to be independent. */ >> + /* Analyze access functions of dimensions we know to be independent. >> + The list of component references handled here should be kept in >> + sync with access_fn_component_p. */ >> while (handled_component_p (ref)) >> { >> if (TREE_CODE (ref) == ARRAY_REF) >> @@ -1472,6 +1510,27 @@ dr_may_alias_p (const struct data_refere >> return refs_may_alias_p (addr_a, addr_b); >> } >> >> +/* REF_A and REF_B both satisfy access_fns_comparable_p. Return true >> + if it is meaningful to compare their associated access functions >> + when checking for dependencies. */ >> + >> +static bool >> +access_fn_components_comparable_p (tree ref_a, tree ref_b) >> +{ >> + if (TREE_CODE (ref_a) != TREE_CODE (ref_b)) >> + return false; >> + >> + if (TREE_CODE (ref_a) == COMPONENT_REF) >> + /* ??? We cannot simply use the type of operand #0 of the refs here as >> + the Fortran compiler smuggles type punning into COMPONENT_REFs. >> + Use the DECL_CONTEXT of the FIELD_DECLs instead. */ >> + return (DECL_CONTEXT (TREE_OPERAND (ref_a, 1)) >> + == DECL_CONTEXT (TREE_OPERAND (ref_b, 1))); >> + >> + return types_compatible_p (TREE_TYPE (TREE_OPERAND (ref_a, 0)), >> + TREE_TYPE (TREE_OPERAND (ref_b, 0))); >> +} >> + >> /* Initialize a data dependence relation between data accesses A and >> B. NB_LOOPS is the number of loops surrounding the references: the >> size of the classic distance/direction vectors. */ >> @@ -1484,11 +1543,10 @@ initialize_data_dependence_relation (str >> struct data_dependence_relation *res; >> unsigned int i; >> >> - res = XNEW (struct data_dependence_relation); >> + res = XCNEW (struct data_dependence_relation); >> DDR_A (res) = a; >> DDR_B (res) = b; >> DDR_LOOP_NEST (res).create (0); >> - DDR_REVERSED_P (res) = false; >> DDR_SUBSCRIPTS (res).create (0); >> DDR_DIR_VECTS (res).create (0); >> DDR_DIST_VECTS (res).create (0); >> @@ -1506,82 +1564,277 @@ initialize_data_dependence_relation (str >> return res; >> } >> >> - /* The case where the references are exactly the same. */ >> - if (operand_equal_p (DR_REF (a), DR_REF (b), 0)) >> + unsigned int num_dimensions_a = DR_NUM_DIMENSIONS (a); >> + unsigned int num_dimensions_b = DR_NUM_DIMENSIONS (b); >> + if (num_dimensions_a == 0 || num_dimensions_b == 0) >> { >> - if ((loop_nest.exists () >> - && !object_address_invariant_in_loop_p (loop_nest[0], >> - DR_BASE_OBJECT (a))) >> - || DR_NUM_DIMENSIONS (a) == 0) >> + DDR_ARE_DEPENDENT (res) = chrec_dont_know; >> + return res; >> + } >> + >> + /* For unconstrained bases, the root (highest-indexed) subscript >> + describes a variation in the base of the original DR_REF rather >> + than a component access. We have no type that accurately describes >> + the new DR_BASE_OBJECT (whose TREE_TYPE describes the type *after* >> + applying this subscript) so limit the search to the last real >> + component access. >> + >> + E.g. for: >> + >> + void >> + f (int a[][8], int b[][8]) >> + { >> + for (int i = 0; i < 8; ++i) >> + a[i * 2][0] = b[i][0]; >> + } >> + >> + the a and b accesses have a single ARRAY_REF component reference [0] >> + but have two subscripts. */ >> + if (DR_UNCONSTRAINED_BASE (a)) >> + num_dimensions_a -= 1; >> + if (DR_UNCONSTRAINED_BASE (b)) >> + num_dimensions_b -= 1; >> + >> + /* These structures describe sequences of component references in >> + DR_REF (A) and DR_REF (B). Each component reference is tied to a >> + specific access function. */ >> + struct { >> + /* The sequence starts at DR_ACCESS_FN (A, START_A) of A and >> + DR_ACCESS_FN (B, START_B) of B (inclusive) and extends to higher >> + indices. In C notation, these are the indices of the rightmost >> + component references; e.g. for a sequence .b.c.d, the start >> + index is for .d. */ >> + unsigned int start_a; >> + unsigned int start_b; >> + >> + /* The sequence contains LENGTH consecutive access functions from >> + each DR. */ >> + unsigned int length; >> + >> + /* The enclosing objects for the A and B sequences respectively, >> + i.e. the objects to which DR_ACCESS_FN (A, START_A + LENGTH - 1) >> + and DR_ACCESS_FN (B, START_B + LENGTH - 1) are applied. */ >> + tree object_a; >> + tree object_b; >> + } full_seq = {}, struct_seq = {}; >> + >> + /* Before each iteration of the loop: >> + >> + - REF_A is what you get after applying DR_ACCESS_FN (A, INDEX_A) and >> + - REF_B is what you get after applying DR_ACCESS_FN (B, INDEX_B). */ >> + unsigned int index_a = 0; >> + unsigned int index_b = 0; >> + tree ref_a = DR_REF (a); >> + tree ref_b = DR_REF (b); >> + >> + /* Now walk the component references from the final DR_REFs back up to >> + the enclosing base objects. Each component reference corresponds >> + to one access function in the DR, with access function 0 being for >> + the final DR_REF and the highest-indexed access function being the >> + one that is applied to the base of the DR. >> + >> + Look for a sequence of component references whose access functions >> + are comparable (see access_fn_components_comparable_p). If more >> + than one such sequence exists, pick the one nearest the base >> + (which is the leftmost sequence in C notation). Store this sequence >> + in FULL_SEQ. >> + >> + For example, if we have: >> + >> + struct foo { struct bar s; ... } (*a)[10], (*b)[10]; >> + >> + A: a[0][i].s.c.d >> + B: __real b[0][i].s.e[i].f >> + >> + (where d is the same type as the real component of f) then the access >> + functions would be: >> + >> + 0 1 2 3 >> + A: .d .c .s [i] >> + >> + 0 1 2 3 4 5 >> + B: __real .f [i] .e .s [i] >> + >> + The A0/B2 column isn't comparable, since .d is a COMPONENT_REF >> + and [i] is an ARRAY_REF. However, the A1/B3 column contains two >> + COMPONENT_REF accesses for struct bar, so is comparable. Likewise >> + the A2/B4 column contains two COMPONENT_REF accesses for struct foo, >> + so is comparable. The A3/B5 column contains two ARRAY_REFs that >> + index foo[10] arrays, so is again comparable. The sequence is >> + therefore: >> + >> + A: [1, 3] (i.e. [i].s.c) >> + B: [3, 5] (i.e. [i].s.e) >> + >> + Also look for sequences of component references whose access >> + functions are comparable and whose enclosing objects have the same >> + RECORD_TYPE. Store this sequence in STRUCT_SEQ. In the above >> + example, STRUCT_SEQ would be: >> + >> + A: [1, 2] (i.e. s.c) >> + B: [3, 4] (i.e. s.e) */ >> + while (index_a < num_dimensions_a && index_b < num_dimensions_b) >> + { >> + /* REF_A and REF_B must be one of the component access types >> + allowed by dr_analyze_indices. */ >> + gcc_checking_assert (access_fn_component_p (ref_a)); >> + gcc_checking_assert (access_fn_component_p (ref_b)); >> + >> + /* Get the immediately-enclosing objects for REF_A and REF_B, >> + i.e. the references *before* applying DR_ACCESS_FN (A, INDEX_A) >> + and DR_ACCESS_FN (B, INDEX_B). */ >> + tree object_a = TREE_OPERAND (ref_a, 0); >> + tree object_b = TREE_OPERAND (ref_b, 0); >> + >> + tree type_a = TREE_TYPE (object_a); >> + tree type_b = TREE_TYPE (object_b); >> + if (access_fn_components_comparable_p (ref_a, ref_b)) >> + { >> + /* This pair of component accesses is comparable for dependence >> + analysis, so we can include DR_ACCESS_FN (A, INDEX_A) and >> + DR_ACCESS_FN (B, INDEX_B) in the sequence. */ >> + if (full_seq.start_a + full_seq.length != index_a >> + || full_seq.start_b + full_seq.length != index_b) >> + { >> + /* The accesses don't extend the current sequence, >> + so start a new one here. */ >> + full_seq.start_a = index_a; >> + full_seq.start_b = index_b; >> + full_seq.length = 0; >> + } >> + >> + /* Add this pair of references to the sequence. */ >> + full_seq.length += 1; >> + full_seq.object_a = object_a; >> + full_seq.object_b = object_b; >> + >> + /* If the enclosing objects are structures (and thus have the >> + same RECORD_TYPE), record the new sequence in STRUCT_SEQ. */ >> + if (TREE_CODE (type_a) == RECORD_TYPE) >> + struct_seq = full_seq; >> + >> + /* Move to the next containing reference for both A and B. */ >> + ref_a = object_a; >> + ref_b = object_b; >> + index_a += 1; >> + index_b += 1; >> + continue; >> + } >> + >> + /* Try to approach equal type sizes. */ >> + if (!COMPLETE_TYPE_P (type_a) >> + || !COMPLETE_TYPE_P (type_b) >> + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_a)) >> + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_b))) >> + break; >> + >> + unsigned HOST_WIDE_INT size_a = tree_to_uhwi (TYPE_SIZE_UNIT (type_a)); >> + unsigned HOST_WIDE_INT size_b = tree_to_uhwi (TYPE_SIZE_UNIT (type_b)); >> + if (size_a <= size_b) >> { >> - DDR_ARE_DEPENDENT (res) = chrec_dont_know; >> - return res; >> + index_a += 1; >> + ref_a = object_a; >> + } >> + if (size_b <= size_a) >> + { >> + index_b += 1; >> + ref_b = object_b; >> } >> - DDR_AFFINE_P (res) = true; >> - DDR_ARE_DEPENDENT (res) = NULL_TREE; >> - DDR_SUBSCRIPTS (res).create (DR_NUM_DIMENSIONS (a)); >> - DDR_LOOP_NEST (res) = loop_nest; >> - DDR_INNER_LOOP (res) = 0; >> - DDR_SELF_REFERENCE (res) = true; >> - for (i = 0; i < DR_NUM_DIMENSIONS (a); i++) >> - { >> - struct subscript *subscript; >> - >> - subscript = XNEW (struct subscript); >> - SUB_CONFLICTS_IN_A (subscript) = conflict_fn_not_known (); >> - SUB_CONFLICTS_IN_B (subscript) = conflict_fn_not_known (); >> - SUB_LAST_CONFLICT (subscript) = chrec_dont_know; >> - SUB_DISTANCE (subscript) = chrec_dont_know; >> - DDR_SUBSCRIPTS (res).safe_push (subscript); >> - } >> - return res; >> } >> >> - /* If the references do not access the same object, we do not know >> - whether they alias or not. We do not care about TBAA or alignment >> - info so we can use OEP_ADDRESS_OF to avoid false negatives. >> - But the accesses have to use compatible types as otherwise the >> - built indices would not match. */ >> - if (!operand_equal_p (DR_BASE_OBJECT (a), DR_BASE_OBJECT (b), OEP_ADDRESS_OF) >> - || !types_compatible_p (TREE_TYPE (DR_BASE_OBJECT (a)), >> - TREE_TYPE (DR_BASE_OBJECT (b)))) >> + /* See whether FULL_SEQ ends at the base and whether the two bases >> + are equal. We do not care about TBAA or alignment info so we can >> + use OEP_ADDRESS_OF to avoid false negatives. */ >> + tree base_a = DR_BASE_OBJECT (a); >> + tree base_b = DR_BASE_OBJECT (b); >> + bool same_base_p = (full_seq.start_a + full_seq.length == num_dimensions_a >> + && full_seq.start_b + full_seq.length == num_dimensions_b >> + && DR_UNCONSTRAINED_BASE (a) == DR_UNCONSTRAINED_BASE (b) >> + && operand_equal_p (base_a, base_b, OEP_ADDRESS_OF) >> + && types_compatible_p (TREE_TYPE (base_a), >> + TREE_TYPE (base_b)) >> + && (!loop_nest.exists () >> + || (object_address_invariant_in_loop_p >> + (loop_nest[0], base_a)))); >> + >> + /* If the bases are the same, we can include the base variation too. >> + E.g. the b accesses in: >> + >> + for (int i = 0; i < n; ++i) >> + b[i + 4][0] = b[i][0]; >> + >> + have a definite dependence distance of 4, while for: >> + >> + for (int i = 0; i < n; ++i) >> + a[i + 4][0] = b[i][0]; >> + >> + the dependence distance depends on the gap between a and b. >> + >> + If the bases are different then we can only rely on the sequence >> + rooted at a structure access, since arrays are allowed to overlap >> + arbitrarily and change shape arbitrarily. E.g. we treat this as >> + valid code: >> + >> + int a[256]; >> + ... >> + ((int (*)[4][3]) &a[1])[i][0] += ((int (*)[4][3]) &a[2])[i][0]; >> + >> + where two lvalues with the same int[4][3] type overlap, and where >> + both lvalues are distinct from the object's declared type. */ >> + if (same_base_p) >> { >> - DDR_ARE_DEPENDENT (res) = chrec_dont_know; >> - return res; >> + if (DR_UNCONSTRAINED_BASE (a)) >> + full_seq.length += 1; >> } >> + else >> + full_seq = struct_seq; >> >> - /* If the base of the object is not invariant in the loop nest, we cannot >> - analyze it. TODO -- in fact, it would suffice to record that there may >> - be arbitrary dependences in the loops where the base object varies. */ >> - if ((loop_nest.exists () >> - && !object_address_invariant_in_loop_p (loop_nest[0], DR_BASE_OBJECT (a))) >> - || DR_NUM_DIMENSIONS (a) == 0) >> + /* Punt if we didn't find a suitable sequence. */ >> + if (full_seq.length == 0) >> { >> DDR_ARE_DEPENDENT (res) = chrec_dont_know; >> return res; >> } >> >> - /* If the number of dimensions of the access to not agree we can have >> - a pointer access to a component of the array element type and an >> - array access while the base-objects are still the same. Punt. */ >> - if (DR_NUM_DIMENSIONS (a) != DR_NUM_DIMENSIONS (b)) >> + if (!same_base_p) >> { >> - DDR_ARE_DEPENDENT (res) = chrec_dont_know; >> - return res; >> + /* Partial overlap is possible for different bases when strict aliasing >> + is not in effect. It's also possible if either base involves a union >> + access; e.g. for: >> + >> + struct s1 { int a[2]; }; >> + struct s2 { struct s1 b; int c; }; >> + struct s3 { int d; struct s1 e; }; >> + union u { struct s2 f; struct s3 g; } *p, *q; >> + >> + the s1 at "p->f.b" (base "p->f") partially overlaps the s1 at >> + "p->g.e" (base "p->g") and might partially overlap the s1 at >> + "q->g.e" (base "q->g"). */ >> + if (!flag_strict_aliasing >> + || ref_contains_union_access_p (full_seq.object_a) >> + || ref_contains_union_access_p (full_seq.object_b)) >> + { >> + DDR_ARE_DEPENDENT (res) = chrec_dont_know; >> + return res; >> + } >> + >> + DDR_COULD_BE_INDEPENDENT_P (res) = true; >> } >> >> DDR_AFFINE_P (res) = true; >> DDR_ARE_DEPENDENT (res) = NULL_TREE; >> - DDR_SUBSCRIPTS (res).create (DR_NUM_DIMENSIONS (a)); >> + DDR_SUBSCRIPTS (res).create (full_seq.length); >> DDR_LOOP_NEST (res) = loop_nest; >> DDR_INNER_LOOP (res) = 0; >> DDR_SELF_REFERENCE (res) = false; >> >> - for (i = 0; i < DR_NUM_DIMENSIONS (a); i++) >> + for (i = 0; i < full_seq.length; ++i) >> { >> struct subscript *subscript; >> >> subscript = XNEW (struct subscript); >> + SUB_ACCESS_FN (subscript, 0) = DR_ACCESS_FN (a, full_seq.start_a + i); >> + SUB_ACCESS_FN (subscript, 1) = DR_ACCESS_FN (b, full_seq.start_b + i); >> SUB_CONFLICTS_IN_A (subscript) = conflict_fn_not_known (); >> SUB_CONFLICTS_IN_B (subscript) = conflict_fn_not_known (); >> SUB_LAST_CONFLICT (subscript) = chrec_dont_know; >> @@ -3163,14 +3416,15 @@ add_outer_distances (struct data_depende >> } >> >> /* Return false when fail to represent the data dependence as a >> - distance vector. INIT_B is set to true when a component has been >> + distance vector. A_INDEX is the index of the first reference >> + (0 for DDR_A, 1 for DDR_B) and B_INDEX is the index of the >> + second reference. INIT_B is set to true when a component has been >> added to the distance vector DIST_V. INDEX_CARRY is then set to >> the index in DIST_V that carries the dependence. */ >> >> static bool >> build_classic_dist_vector_1 (struct data_dependence_relation *ddr, >> - struct data_reference *ddr_a, >> - struct data_reference *ddr_b, >> + unsigned int a_index, unsigned int b_index, >> lambda_vector dist_v, bool *init_b, >> int *index_carry) >> { >> @@ -3188,8 +3442,8 @@ build_classic_dist_vector_1 (struct data >> return false; >> } >> >> - access_fn_a = DR_ACCESS_FN (ddr_a, i); >> - access_fn_b = DR_ACCESS_FN (ddr_b, i); >> + access_fn_a = SUB_ACCESS_FN (subscript, a_index); >> + access_fn_b = SUB_ACCESS_FN (subscript, b_index); >> >> if (TREE_CODE (access_fn_a) == POLYNOMIAL_CHREC >> && TREE_CODE (access_fn_b) == POLYNOMIAL_CHREC) >> @@ -3249,10 +3503,11 @@ build_classic_dist_vector_1 (struct data >> constant_access_functions (const struct data_dependence_relation *ddr) >> { >> unsigned i; >> + subscript *sub; >> >> - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) >> - if (!evolution_function_is_constant_p (DR_ACCESS_FN (DDR_A (ddr), i)) >> - || !evolution_function_is_constant_p (DR_ACCESS_FN (DDR_B (ddr), i))) >> + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) >> + if (!evolution_function_is_constant_p (SUB_ACCESS_FN (sub, 0)) >> + || !evolution_function_is_constant_p (SUB_ACCESS_FN (sub, 1))) >> return false; >> >> return true; >> @@ -3315,10 +3570,11 @@ add_other_self_distances (struct data_de >> lambda_vector dist_v; >> unsigned i; >> int index_carry = DDR_NB_LOOPS (ddr); >> + subscript *sub; >> >> - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) >> + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) >> { >> - tree access_fun = DR_ACCESS_FN (DDR_A (ddr), i); >> + tree access_fun = SUB_ACCESS_FN (sub, 0); >> >> if (TREE_CODE (access_fun) == POLYNOMIAL_CHREC) >> { >> @@ -3330,7 +3586,7 @@ add_other_self_distances (struct data_de >> return; >> } >> >> - access_fun = DR_ACCESS_FN (DDR_A (ddr), 0); >> + access_fun = SUB_ACCESS_FN (DDR_SUBSCRIPT (ddr, 0), 0); >> >> if (TREE_CODE (CHREC_LEFT (access_fun)) == POLYNOMIAL_CHREC) >> add_multivariate_self_dist (ddr, access_fun); >> @@ -3401,6 +3657,23 @@ add_distance_for_zero_overlaps (struct d >> } >> } >> >> +/* Return true when the DDR contains two data references that have the >> + same access functions. */ >> + >> +static inline bool >> +same_access_functions (const struct data_dependence_relation *ddr) >> +{ >> + unsigned i; >> + subscript *sub; >> + >> + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) >> + if (!eq_evolutions_p (SUB_ACCESS_FN (sub, 0), >> + SUB_ACCESS_FN (sub, 1))) >> + return false; >> + >> + return true; >> +} >> + >> /* Compute the classic per loop distance vector. DDR is the data >> dependence relation to build a vector from. Return false when fail >> to represent the data dependence as a distance vector. */ >> @@ -3432,8 +3705,7 @@ build_classic_dist_vector (struct data_d >> } >> >> dist_v = lambda_vector_new (DDR_NB_LOOPS (ddr)); >> - if (!build_classic_dist_vector_1 (ddr, DDR_A (ddr), DDR_B (ddr), >> - dist_v, &init_b, &index_carry)) >> + if (!build_classic_dist_vector_1 (ddr, 0, 1, dist_v, &init_b, &index_carry)) >> return false; >> >> /* Save the distance vector if we initialized one. */ >> @@ -3466,12 +3738,11 @@ build_classic_dist_vector (struct data_d >> if (!lambda_vector_lexico_pos (dist_v, DDR_NB_LOOPS (ddr))) >> { >> lambda_vector save_v = lambda_vector_new (DDR_NB_LOOPS (ddr)); >> - if (!subscript_dependence_tester_1 (ddr, DDR_B (ddr), DDR_A (ddr), >> - loop_nest)) >> + if (!subscript_dependence_tester_1 (ddr, 1, 0, loop_nest)) >> return false; >> compute_subscript_distance (ddr); >> - if (!build_classic_dist_vector_1 (ddr, DDR_B (ddr), DDR_A (ddr), >> - save_v, &init_b, &index_carry)) >> + if (!build_classic_dist_vector_1 (ddr, 1, 0, save_v, &init_b, >> + &index_carry)) >> return false; >> save_dist_v (ddr, save_v); >> DDR_REVERSED_P (ddr) = true; >> @@ -3507,12 +3778,10 @@ build_classic_dist_vector (struct data_d >> { >> lambda_vector opposite_v = lambda_vector_new (DDR_NB_LOOPS (ddr)); >> >> - if (!subscript_dependence_tester_1 (ddr, DDR_B (ddr), >> - DDR_A (ddr), loop_nest)) >> + if (!subscript_dependence_tester_1 (ddr, 1, 0, loop_nest)) >> return false; >> compute_subscript_distance (ddr); >> - if (!build_classic_dist_vector_1 (ddr, DDR_B (ddr), DDR_A (ddr), >> - opposite_v, &init_b, >> + if (!build_classic_dist_vector_1 (ddr, 1, 0, opposite_v, &init_b, >> &index_carry)) >> return false; >> >> @@ -3591,13 +3860,13 @@ build_classic_dir_vector (struct data_de >> } >> } >> >> -/* Helper function. Returns true when there is a dependence between >> - data references DRA and DRB. */ >> +/* Helper function. Returns true when there is a dependence between the >> + data references. A_INDEX is the index of the first reference (0 for >> + DDR_A, 1 for DDR_B) and B_INDEX is the index of the second reference. */ >> >> static bool >> subscript_dependence_tester_1 (struct data_dependence_relation *ddr, >> - struct data_reference *dra, >> - struct data_reference *drb, >> + unsigned int a_index, unsigned int b_index, >> struct loop *loop_nest) >> { >> unsigned int i; >> @@ -3609,8 +3878,8 @@ subscript_dependence_tester_1 (struct da >> { >> conflict_function *overlaps_a, *overlaps_b; >> >> - analyze_overlapping_iterations (DR_ACCESS_FN (dra, i), >> - DR_ACCESS_FN (drb, i), >> + analyze_overlapping_iterations (SUB_ACCESS_FN (subscript, a_index), >> + SUB_ACCESS_FN (subscript, b_index), >> &overlaps_a, &overlaps_b, >> &last_conflicts, loop_nest); >> >> @@ -3659,7 +3928,7 @@ subscript_dependence_tester_1 (struct da >> subscript_dependence_tester (struct data_dependence_relation *ddr, >> struct loop *loop_nest) >> { >> - if (subscript_dependence_tester_1 (ddr, DDR_A (ddr), DDR_B (ddr), loop_nest)) >> + if (subscript_dependence_tester_1 (ddr, 0, 1, loop_nest)) >> dependence_stats.num_dependence_dependent++; >> >> compute_subscript_distance (ddr); >> Index: gcc/tree-ssa-loop-prefetch.c >> =================================================================== >> --- gcc/tree-ssa-loop-prefetch.c 2017-05-18 07:51:26.127377591 +0100 >> +++ gcc/tree-ssa-loop-prefetch.c 2017-05-18 07:51:50.871904726 +0100 >> @@ -1650,6 +1650,7 @@ determine_loop_nest_reuse (struct loop * >> refb = (struct mem_ref *) DDR_B (dep)->aux; >> >> if (DDR_ARE_DEPENDENT (dep) == chrec_dont_know >> + || DDR_COULD_BE_INDEPENDENT_P (dep) >> || DDR_NUM_DIST_VECTS (dep) == 0) >> { >> /* If the dependence cannot be analyzed, assume that there might be >> Index: gcc/tree-vectorizer.h >> =================================================================== >> --- gcc/tree-vectorizer.h 2017-05-18 07:51:26.128377491 +0100 >> +++ gcc/tree-vectorizer.h 2017-05-18 07:51:50.872904626 +0100 >> @@ -383,7 +383,7 @@ #define LOOP_VINFO_ORIG_LOOP_INFO(L) >> #define LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT(L) \ >> ((L)->may_misalign_stmts.length () > 0) >> #define LOOP_REQUIRES_VERSIONING_FOR_ALIAS(L) \ >> - ((L)->may_alias_ddrs.length () > 0) >> + ((L)->comp_alias_ddrs.length () > 0) >> #define LOOP_REQUIRES_VERSIONING_FOR_NITERS(L) \ >> (LOOP_VINFO_NITERS_ASSUMPTIONS (L)) >> #define LOOP_REQUIRES_VERSIONING(L) \ >> Index: gcc/tree-vect-data-refs.c >> =================================================================== >> --- gcc/tree-vect-data-refs.c 2017-05-18 07:51:23.307659382 +0100 >> +++ gcc/tree-vect-data-refs.c 2017-05-18 07:51:50.872904626 +0100 >> @@ -340,6 +340,26 @@ vect_analyze_data_ref_dependence (struct >> } >> >> loop_depth = index_in_loop_nest (loop->num, DDR_LOOP_NEST (ddr)); >> + >> + if (DDR_COULD_BE_INDEPENDENT_P (ddr)) >> + /* For dependence distances of 2 or more, we have the option of >> + limiting VF or checking for an alias at runtime. Prefer to check >> + at runtime if we can, to avoid limiting the VF unnecessarily when >> + the bases are in fact independent. >> + >> + Note that the alias checks will be removed if the VF ends up >> + being small enough. */ >> + FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v) >> + { >> + int dist = dist_v[loop_depth]; >> + if (dist != 0 && !(dist > 0 && DDR_REVERSED_P (ddr))) >> + { >> + if (vect_mark_for_runtime_alias_test (ddr, loop_vinfo)) >> + return false; >> + break; >> + } >> + } >> + >> FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v) >> { >> int dist = dist_v[loop_depth]; >> @@ -3017,6 +3037,44 @@ vect_no_alias_p (struct data_reference * >> return false; >> } >> >> +/* Return true if the minimum nonzero dependence distance for loop LOOP_DEPTH >> + in DDR is >= VF. */ >> + >> +static bool >> +dependence_distance_ge_vf (data_dependence_relation *ddr, >> + unsigned int loop_depth, unsigned HOST_WIDE_INT vf) >> +{ >> + if (DDR_ARE_DEPENDENT (ddr) != NULL_TREE >> + || DDR_NUM_DIST_VECTS (ddr) == 0) >> + return false; >> + >> + /* If the dependence is exact, we should have limited the VF instead. */ >> + gcc_checking_assert (DDR_COULD_BE_INDEPENDENT_P (ddr)); >> + >> + unsigned int i; >> + lambda_vector dist_v; >> + FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v) >> + { >> + HOST_WIDE_INT dist = dist_v[loop_depth]; >> + if (dist != 0 >> + && !(dist > 0 && DDR_REVERSED_P (ddr)) >> + && (unsigned HOST_WIDE_INT) abs_hwi (dist) < vf) >> + return false; >> + } >> + >> + if (dump_enabled_p ()) >> + { >> + dump_printf_loc (MSG_NOTE, vect_location, >> + "dependence distance between "); >> + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (DDR_A (ddr))); >> + dump_printf (MSG_NOTE, " and "); >> + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (DDR_B (ddr))); >> + dump_printf (MSG_NOTE, " is >= VF\n"); >> + } >> + >> + return true; >> +} >> + >> /* Function vect_prune_runtime_alias_test_list. >> >> Prune a list of ddrs to be tested at run-time by versioning for alias. >> @@ -3075,6 +3133,10 @@ vect_prune_runtime_alias_test_list (loop >> >> comp_alias_ddrs.create (may_alias_ddrs.length ()); >> >> + unsigned int loop_depth >> + = index_in_loop_nest (LOOP_VINFO_LOOP (loop_vinfo)->num, >> + LOOP_VINFO_LOOP_NEST (loop_vinfo)); >> + >> /* First, we collect all data ref pairs for aliasing checks. */ >> FOR_EACH_VEC_ELT (may_alias_ddrs, i, ddr) >> { >> @@ -3084,6 +3146,11 @@ vect_prune_runtime_alias_test_list (loop >> tree segment_length_a, segment_length_b; >> gimple *stmt_a, *stmt_b; >> >> + /* Ignore the alias if the VF we chose ended up being no greater >> + than the dependence distance. */ >> + if (dependence_distance_ge_vf (ddr, loop_depth, vect_factor)) >> + continue; >> + >> dr_a = DDR_A (ddr); >> stmt_a = DR_STMT (DDR_A (ddr)); >> dr_group_first_a = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_a)); >> @@ -3294,10 +3361,6 @@ vect_prune_runtime_alias_test_list (loop >> return false; >> } >> >> - /* All alias checks have been resolved at compilation time. */ >> - if (!comp_alias_ddrs.length ()) >> - LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo).truncate (0); >> - >> return true; >> } >> >> Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-3.c >> =================================================================== >> --- /dev/null 2017-05-17 17:16:48.996861112 +0100 >> +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-3.c 2017-05-18 07:51:50.870904826 +0100 >> @@ -0,0 +1,112 @@ >> +/* { dg-do compile } */ >> +/* { dg-require-effective-target vect_int } */ >> +/* { dg-additional-options "--param vect-max-version-for-alias-checks=0" } */ >> + >> +/* Intended to be larger than any VF. */ >> +#define GAP 128 >> +#define N (GAP * 3) >> + >> +struct s { int x[N + 1]; }; >> +struct t { struct s x[N + 1]; }; >> +struct u { int x[N + 1]; int y; }; >> +struct v { struct s s; }; >> + >> +void >> +f1 (struct s *a, struct s *b) >> +{ >> + for (int i = 0; i < N; ++i) >> + a->x[i] += b->x[i]; >> +} >> + >> +void >> +f2 (struct s *a, struct s *b) >> +{ >> + for (int i = 0; i < N; ++i) >> + a[1].x[i] += b[2].x[i]; >> +} >> + >> +void >> +f3 (struct s *a, struct s *b) >> +{ >> + for (int i = 0; i < N; ++i) >> + a[1].x[i] += b[i].x[i]; >> +} >> + >> +void >> +f4 (struct s *a, struct s *b) >> +{ >> + for (int i = 0; i < N; ++i) >> + a[i].x[i] += b[i].x[i]; >> +} >> + >> +void >> +f5 (struct s *a, struct s *b) >> +{ >> + for (int i = 0; i < N; ++i) >> + a->x[i] += b->x[i + 1]; >> +} >> + >> +void >> +f6 (struct s *a, struct s *b) >> +{ >> + for (int i = 0; i < N; ++i) >> + a[1].x[i] += b[2].x[i + 1]; >> +} >> + >> +void >> +f7 (struct s *a, struct s *b) >> +{ >> + for (int i = 0; i < N; ++i) >> + a[1].x[i] += b[i].x[i + 1]; >> +} >> + >> +void >> +f8 (struct s *a, struct s *b) >> +{ >> + for (int i = 0; i < N; ++i) >> + a[i].x[i] += b[i].x[i + 1]; >> +} >> + >> +void >> +f9 (struct s *a, struct t *b) >> +{ >> + for (int i = 0; i < N; ++i) >> + a->x[i] += b->x[1].x[i]; >> +} >> + >> +void >> +f10 (struct s *a, struct t *b) >> +{ >> + for (int i = 0; i < N; ++i) >> + a->x[i] += b->x[i].x[i]; >> +} >> + >> +void >> +f11 (struct u *a, struct u *b) >> +{ >> + for (int i = 0; i < N; ++i) >> + a->x[i] += b->x[i] + b[i].y; >> +} >> + >> +void >> +f12 (struct s *a, struct s *b) >> +{ >> + for (int i = 0; i < GAP; ++i) >> + a->x[i + GAP] += b->x[i]; >> +} >> + >> +void >> +f13 (struct s *a, struct s *b) >> +{ >> + for (int i = 0; i < GAP * 2; ++i) >> + a->x[i + GAP] += b->x[i]; >> +} >> + >> +void >> +f14 (struct v *a, struct s *b) >> +{ >> + for (int i = 0; i < N; ++i) >> + a->s.x[i] = b->x[i]; >> +} >> + >> +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 14 "vect" } } */ >> Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-4.c >> =================================================================== >> --- /dev/null 2017-05-17 17:16:48.996861112 +0100 >> +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-4.c 2017-05-18 07:51:50.870904826 +0100 >> @@ -0,0 +1,35 @@ >> +/* { dg-do compile } */ >> +/* { dg-require-effective-target vect_int } */ >> +/* { dg-additional-options "--param vect-max-version-for-alias-checks=0" } */ >> + >> +#define N 16 >> + >> +struct s1 { int a[N]; }; >> +struct s2 { struct s1 b; int c; }; >> +struct s3 { int d; struct s1 e; }; >> +union u { struct s2 f; struct s3 g; }; >> + >> +/* We allow a and b to overlap arbitrarily. */ >> + >> +void >> +f1 (int a[][N], int b[][N]) >> +{ >> + for (int i = 0; i < N; ++i) >> + a[0][i] += b[0][i]; >> +} >> + >> +void >> +f2 (union u *a, union u *b) >> +{ >> + for (int i = 0; i < N; ++i) >> + a->f.b.a[i] += b->g.e.a[i]; >> +} >> + >> +void >> +f3 (struct s1 *a, struct s1 *b) >> +{ >> + for (int i = 0; i < N - 1; ++i) >> + a->a[i + 1] += b->a[i]; >> +} >> + >> +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */ >> Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-5.c >> =================================================================== >> --- /dev/null 2017-05-17 17:16:48.996861112 +0100 >> +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-5.c 2017-05-18 07:51:50.870904826 +0100 >> @@ -0,0 +1,19 @@ >> +/* { dg-do compile } */ >> +/* { dg-require-effective-target vect_int } */ >> + >> +/* Intended to be larger than any VF. */ >> +#define GAP 128 >> +#define N (GAP * 3) >> + >> +struct s { int x[N]; }; >> + >> +void >> +f1 (struct s *a, struct s *b) >> +{ >> + for (int i = 0; i < GAP * 2; ++i) >> + a->x[i + GAP] += b->x[i]; >> +} >> + >> +/* { dg-final { scan-tree-dump-times "mark for run-time aliasing" 1 "vect" } } */ >> +/* { dg-final { scan-tree-dump-times "improved number of alias checks from 1 to 0" 1 "vect" } } */ >> +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 1 "vect" } } */

Richard Biener <richard.guenther@gmail.com> writes: >>> How does this look? Changes since v1: >>> >>> - Added access_fn_component_p to check for valid access function components. >>> >>> - Added access_fn_components_comparable_p instead of using >>> types_compatibloe_p directly. >>> >>> - Added more commentary. >>> >>> - Added local structures to represent the sequence, so that it's >>> more obvious which variables are temporaries and which aren't. >>> >>> - Added the test above to vect-alias-check-3.c. >>> >>> Tested on aarch64-linux-gnu and x86_64-linux-gnu. > > This is ok. Thanks. Just been retesting, and I think I must have forgotten to include Ada last time. It turns out that the patch causes a dg-scan regression in gnat.dg/vect17.adb, because we now think that if the array RECORD_TYPEs *do* alias in: procedure Add (X, Y : aliased Sarray; R : aliased out Sarray) is begin for I in Sarray'Range loop R(I) := X(I) + Y(I); end loop; end; then the dependence distance must be zero. Eric, does that hold true for Ada? I.e. if X and R (or Y and R) alias, must it be the case that X(I) can only alias R(I) and not for example R(I-1) or R(I+1)? Or are the arrays allowed to overlap by an arbitrary number of indices? If the assumption is correct, is the patch below OK? Thanks, Richard 2017-06-07 Richard Sandiford <richard.sandiford@linaro.org> gcc/testsuite/ * gnat.dg/vect17.ads (Sarray): Increase range to 1 .. 5. * gnat.dg/vect17.adb (Add): Create a dependence distance of 1 when X = R or Y = R.Index: gcc/testsuite/gnat.dg/vect17.ads =================================================================== --- gcc/testsuite/gnat.dg/vect17.ads 2015-10-14 14:58:56.000000000 +0100 +++ gcc/testsuite/gnat.dg/vect17.ads 2017-06-07 22:10:24.796368118 +0100 @@ -1,6 +1,6 @@ package Vect17 is - type Sarray is array (1 .. 4) of Long_Float; + type Sarray is array (1 .. 5) of Long_Float; for Sarray'Alignment use 16; procedure Add (X, Y : aliased Sarray; R : aliased out Sarray); Index: gcc/testsuite/gnat.dg/vect17.adb =================================================================== --- gcc/testsuite/gnat.dg/vect17.adb 2015-10-14 14:58:56.000000000 +0100 +++ gcc/testsuite/gnat.dg/vect17.adb 2017-06-07 22:10:24.796368118 +0100 @@ -5,8 +5,9 @@ package body Vect17 is procedure Add (X, Y : aliased Sarray; R : aliased out Sarray) is begin - for I in Sarray'Range loop - R(I) := X(I) + Y(I); + R(1) := X(5) + Y(5); + for I in 1 .. 4 loop + R(I + 1) := X(I) + Y(I); end loop; end;

Ping for this Ada patch/question. Richard Sandiford <richard.sandiford@linaro.org> writes: > Richard Biener <richard.guenther@gmail.com> writes: >>>> How does this look? Changes since v1: >>>> >>>> - Added access_fn_component_p to check for valid access function components. >>>> >>>> - Added access_fn_components_comparable_p instead of using >>>> types_compatibloe_p directly. >>>> >>>> - Added more commentary. >>>> >>>> - Added local structures to represent the sequence, so that it's >>>> more obvious which variables are temporaries and which aren't. >>>> >>>> - Added the test above to vect-alias-check-3.c. >>>> >>>> Tested on aarch64-linux-gnu and x86_64-linux-gnu. >> >> This is ok. > > Thanks. Just been retesting, and I think I must have forgotten > to include Ada last time. It turns out that the patch causes a dg-scan > regression in gnat.dg/vect17.adb, because we now think that if the > array RECORD_TYPEs *do* alias in: > > procedure Add (X, Y : aliased Sarray; R : aliased out Sarray) is > begin > for I in Sarray'Range loop > R(I) := X(I) + Y(I); > end loop; > end; > > then the dependence distance must be zero. Eric, does that hold true > for Ada? I.e. if X and R (or Y and R) alias, must it be the case that > X(I) can only alias R(I) and not for example R(I-1) or R(I+1)? Or are > the arrays allowed to overlap by an arbitrary number of indices? > > If the assumption is correct, is the patch below OK? > > Thanks, > Richard > > > 2017-06-07 Richard Sandiford <richard.sandiford@linaro.org> > > gcc/testsuite/ > * gnat.dg/vect17.ads (Sarray): Increase range to 1 .. 5. > * gnat.dg/vect17.adb (Add): Create a dependence distance of 1 > when X = R or Y = R. > > Index: gcc/testsuite/gnat.dg/vect17.ads > =================================================================== > --- gcc/testsuite/gnat.dg/vect17.ads 2015-10-14 14:58:56.000000000 +0100 > +++ gcc/testsuite/gnat.dg/vect17.ads 2017-06-07 22:10:24.796368118 +0100 > @@ -1,6 +1,6 @@ > package Vect17 is > > - type Sarray is array (1 .. 4) of Long_Float; > + type Sarray is array (1 .. 5) of Long_Float; > for Sarray'Alignment use 16; > > procedure Add (X, Y : aliased Sarray; R : aliased out Sarray); > Index: gcc/testsuite/gnat.dg/vect17.adb > =================================================================== > --- gcc/testsuite/gnat.dg/vect17.adb 2015-10-14 14:58:56.000000000 +0100 > +++ gcc/testsuite/gnat.dg/vect17.adb 2017-06-07 22:10:24.796368118 +0100 > @@ -5,8 +5,9 @@ package body Vect17 is > > procedure Add (X, Y : aliased Sarray; R : aliased out Sarray) is > begin > - for I in Sarray'Range loop > - R(I) := X(I) + Y(I); > + R(1) := X(5) + Y(5); > + for I in 1 .. 4 loop > + R(I + 1) := X(I) + Y(I); > end loop; > end; >

Ping*2 Richard Sandiford <richard.sandiford@linaro.org> writes: > Ping for this Ada patch/question. > > Richard Sandiford <richard.sandiford@linaro.org> writes: >> Richard Biener <richard.guenther@gmail.com> writes: >>>>> How does this look? Changes since v1: >>>>> >>>>> - Added access_fn_component_p to check for valid access function >>>>> components. >>>>> >>>>> - Added access_fn_components_comparable_p instead of using >>>>> types_compatibloe_p directly. >>>>> >>>>> - Added more commentary. >>>>> >>>>> - Added local structures to represent the sequence, so that it's >>>>> more obvious which variables are temporaries and which aren't. >>>>> >>>>> - Added the test above to vect-alias-check-3.c. >>>>> >>>>> Tested on aarch64-linux-gnu and x86_64-linux-gnu. >>> >>> This is ok. >> >> Thanks. Just been retesting, and I think I must have forgotten >> to include Ada last time. It turns out that the patch causes a dg-scan >> regression in gnat.dg/vect17.adb, because we now think that if the >> array RECORD_TYPEs *do* alias in: >> >> procedure Add (X, Y : aliased Sarray; R : aliased out Sarray) is >> begin >> for I in Sarray'Range loop >> R(I) := X(I) + Y(I); >> end loop; >> end; >> >> then the dependence distance must be zero. Eric, does that hold true >> for Ada? I.e. if X and R (or Y and R) alias, must it be the case that >> X(I) can only alias R(I) and not for example R(I-1) or R(I+1)? Or are >> the arrays allowed to overlap by an arbitrary number of indices? >> >> If the assumption is correct, is the patch below OK? >> >> Thanks, >> Richard >> >> >> 2017-06-07 Richard Sandiford <richard.sandiford@linaro.org> >> >> gcc/testsuite/ >> * gnat.dg/vect17.ads (Sarray): Increase range to 1 .. 5. >> * gnat.dg/vect17.adb (Add): Create a dependence distance of 1 >> when X = R or Y = R. >> >> Index: gcc/testsuite/gnat.dg/vect17.ads >> =================================================================== >> --- gcc/testsuite/gnat.dg/vect17.ads 2015-10-14 14:58:56.000000000 +0100 >> +++ gcc/testsuite/gnat.dg/vect17.ads 2017-06-07 22:10:24.796368118 +0100 >> @@ -1,6 +1,6 @@ >> package Vect17 is >> >> - type Sarray is array (1 .. 4) of Long_Float; >> + type Sarray is array (1 .. 5) of Long_Float; >> for Sarray'Alignment use 16; >> >> procedure Add (X, Y : aliased Sarray; R : aliased out Sarray); >> Index: gcc/testsuite/gnat.dg/vect17.adb >> =================================================================== >> --- gcc/testsuite/gnat.dg/vect17.adb 2015-10-14 14:58:56.000000000 +0100 >> +++ gcc/testsuite/gnat.dg/vect17.adb 2017-06-07 22:10:24.796368118 +0100 >> @@ -5,8 +5,9 @@ package body Vect17 is >> >> procedure Add (X, Y : aliased Sarray; R : aliased out Sarray) is >> begin >> - for I in Sarray'Range loop >> - R(I) := X(I) + Y(I); >> + R(1) := X(5) + Y(5); >> + for I in 1 .. 4 loop >> + R(I + 1) := X(I) + Y(I); >> end loop; >> end; >>

Index: gcc/tree-data-ref.h =================================================================== --- gcc/tree-data-ref.h 2017-05-03 08:48:11.977015306 +0100 +++ gcc/tree-data-ref.h 2017-05-03 08:48:48.737038502 +0100 @@ -191,6 +191,9 @@ struct conflict_function struct subscript { + /* The access functions of the two references. */ + tree access_fn[2]; + /* A description of the iterations for which the elements are accessed twice. */ conflict_function *conflicting_iterations_in_a; @@ -209,6 +212,7 @@ struct subscript typedef struct subscript *subscript_p; +#define SUB_ACCESS_FN(SUB, I) (SUB)->access_fn[I] #define SUB_CONFLICTS_IN_A(SUB) (SUB)->conflicting_iterations_in_a #define SUB_CONFLICTS_IN_B(SUB) (SUB)->conflicting_iterations_in_b #define SUB_LAST_CONFLICT(SUB) (SUB)->last_conflict @@ -264,6 +268,33 @@ struct data_dependence_relation /* Set to true when the dependence relation is on the same data access. */ bool self_reference_p; + + /* True if the dependence described is conservatively correct rather + than exact, and if it is still possible for the accesses to be + conditionally independent. For example, the a and b references in: + + struct s *a, *b; + for (int i = 0; i < n; ++i) + a->f[i] += b->f[i]; + + conservatively have a distance vector of (0), for the case in which + a == b, but the accesses are independent if a != b. Similarly, + the a and b references in: + + struct s *a, *b; + for (int i = 0; i < n; ++i) + a[0].f[i] += b[i].f[i]; + + conservatively have a distance vector of (0), but they are indepenent + when a != b + i. In contrast, the references in: + + struct s *a; + for (int i = 0; i < n; ++i) + a->f[i] += a->f[i]; + + have the same distance vector of (0), but the accesses can never be + independent. */ + bool could_be_independent_p; }; typedef struct data_dependence_relation *ddr_p; @@ -294,6 +325,7 @@ #define DDR_DIR_VECT(DDR, I) \ #define DDR_DIST_VECT(DDR, I) \ DDR_DIST_VECTS (DDR)[I] #define DDR_REVERSED_P(DDR) (DDR)->reversed_p +#define DDR_COULD_BE_INDEPENDENT_P(DDR) (DDR)->could_be_independent_p bool dr_analyze_innermost (struct data_reference *, struct loop *); @@ -372,22 +404,6 @@ same_data_refs (data_reference_p a, data return false; return true; -} - -/* Return true when the DDR contains two data references that have the - same access functions. */ - -static inline bool -same_access_functions (const struct data_dependence_relation *ddr) -{ - unsigned i; - - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) - if (!eq_evolutions_p (DR_ACCESS_FN (DDR_A (ddr), i), - DR_ACCESS_FN (DDR_B (ddr), i))) - return false; - - return true; } /* Returns true when all the dependences are computable. */ Index: gcc/tree-data-ref.c =================================================================== --- gcc/tree-data-ref.c 2017-02-23 19:54:15.000000000 +0000 +++ gcc/tree-data-ref.c 2017-05-03 08:48:48.737038502 +0100 @@ -123,8 +123,7 @@ Software Foundation; either version 3, o } dependence_stats; static bool subscript_dependence_tester_1 (struct data_dependence_relation *, - struct data_reference *, - struct data_reference *, + unsigned int, unsigned int, struct loop *); /* Returns true iff A divides B. */ @@ -144,6 +143,21 @@ int_divides_p (int a, int b) return ((b % a) == 0); } +/* Return true if reference REF contains a union access. */ + +static bool +ref_contains_union_access_p (tree ref) +{ + while (handled_component_p (ref)) + { + ref = TREE_OPERAND (ref, 0); + if (TREE_CODE (TREE_TYPE (ref)) == UNION_TYPE + || TREE_CODE (TREE_TYPE (ref)) == QUAL_UNION_TYPE) + return true; + } + return false; +} + /* Dump into FILE all the data references from DATAREFS. */ @@ -433,13 +447,14 @@ dump_data_dependence_relation (FILE *out unsigned int i; struct loop *loopi; - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) + subscript *sub; + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) { fprintf (outf, " access_fn_A: "); - print_generic_stmt (outf, DR_ACCESS_FN (dra, i), 0); + print_generic_stmt (outf, SUB_ACCESS_FN (sub, 0), 0); fprintf (outf, " access_fn_B: "); - print_generic_stmt (outf, DR_ACCESS_FN (drb, i), 0); - dump_subscript (outf, DDR_SUBSCRIPT (ddr, i)); + print_generic_stmt (outf, SUB_ACCESS_FN (sub, 1), 0); + dump_subscript (outf, sub); } fprintf (outf, " inner loop index: %d\n", DDR_INNER_LOOP (ddr)); @@ -1484,11 +1499,10 @@ initialize_data_dependence_relation (str struct data_dependence_relation *res; unsigned int i; - res = XNEW (struct data_dependence_relation); + res = XCNEW (struct data_dependence_relation); DDR_A (res) = a; DDR_B (res) = b; DDR_LOOP_NEST (res).create (0); - DDR_REVERSED_P (res) = false; DDR_SUBSCRIPTS (res).create (0); DDR_DIR_VECTS (res).create (0); DDR_DIST_VECTS (res).create (0); @@ -1506,82 +1520,217 @@ initialize_data_dependence_relation (str return res; } - /* The case where the references are exactly the same. */ - if (operand_equal_p (DR_REF (a), DR_REF (b), 0)) + unsigned int num_dimensions_a = DR_NUM_DIMENSIONS (a); + unsigned int num_dimensions_b = DR_NUM_DIMENSIONS (b); + if (num_dimensions_a == 0 || num_dimensions_b == 0) { - if ((loop_nest.exists () - && !object_address_invariant_in_loop_p (loop_nest[0], - DR_BASE_OBJECT (a))) - || DR_NUM_DIMENSIONS (a) == 0) - { - DDR_ARE_DEPENDENT (res) = chrec_dont_know; - return res; - } - DDR_AFFINE_P (res) = true; - DDR_ARE_DEPENDENT (res) = NULL_TREE; - DDR_SUBSCRIPTS (res).create (DR_NUM_DIMENSIONS (a)); - DDR_LOOP_NEST (res) = loop_nest; - DDR_INNER_LOOP (res) = 0; - DDR_SELF_REFERENCE (res) = true; - for (i = 0; i < DR_NUM_DIMENSIONS (a); i++) - { - struct subscript *subscript; + DDR_ARE_DEPENDENT (res) = chrec_dont_know; + return res; + } + + /* For unconstrained bases, the outer (highest-index) subscript + describes a variation in the base of the original DR_REF rather + than a component access. We have no type that accurately describes + the new DR_BASE_OBJECT (whose TREE_TYPE describes the type *after* + applying the outer subscript) so limit the search to the last real + component access. + + E.g. for: - subscript = XNEW (struct subscript); - SUB_CONFLICTS_IN_A (subscript) = conflict_fn_not_known (); - SUB_CONFLICTS_IN_B (subscript) = conflict_fn_not_known (); - SUB_LAST_CONFLICT (subscript) = chrec_dont_know; - SUB_DISTANCE (subscript) = chrec_dont_know; - DDR_SUBSCRIPTS (res).safe_push (subscript); + void + f (int a[][8], int b[][8]) + { + for (int i = 0; i < 8; ++i) + a[i * 2][0] = b[i][0]; } - return res; + + the a and b accesses have a single ARRAY_REF component reference [0] + but have two subscripts. */ + if (DR_UNCONSTRAINED_BASE (a)) + num_dimensions_a -= 1; + if (DR_UNCONSTRAINED_BASE (b)) + num_dimensions_b -= 1; + + /* Now look for two sequences of component references that have the same + type in both A and B. The first sequence includes an arbitrary mixture + of array and structure references while the second always ends on a + structure reference. + + The former (arbitrary) sequence uses access functions: + + [START_A, START_A + NUM_DIMENSIONS) of A + [START_B, START_B + NUM_DIMENSIONS) of B + + The latter sequence uses access functions: + + [STRUCT_START_A, STRUCT_START_A + STRUCT_NUM_DIMENSIONS) of A + [STRUCT_START_B, STRUCT_START_B + STRUCT_NUM_DIMENSIONS) of B + + STRUCT_REF_A and STRUCT_REF_B are the outer references for the + latter sequence. */ + unsigned int start_a = 0; + unsigned int start_b = 0; + unsigned int num_dimensions = 0; + unsigned int struct_start_a = 0; + unsigned int struct_start_b = 0; + unsigned int struct_num_dimensions = 0; + unsigned int index_a = 0; + unsigned int index_b = 0; + tree next_ref_a = DR_REF (a); + tree next_ref_b = DR_REF (b); + tree struct_ref_a = NULL_TREE; + tree struct_ref_b = NULL_TREE; + while (index_a < num_dimensions_a && index_b < num_dimensions_b) + { + gcc_checking_assert (handled_component_p (next_ref_a)); + gcc_checking_assert (handled_component_p (next_ref_b)); + tree outer_ref_a = TREE_OPERAND (next_ref_a, 0); + tree outer_ref_b = TREE_OPERAND (next_ref_b, 0); + tree type_a = TREE_TYPE (outer_ref_a); + tree type_b = TREE_TYPE (outer_ref_b); + if (types_compatible_p (type_a, type_b)) + { + /* This pair of accesses belong to a suitable sequence. */ + if (start_a + num_dimensions != index_a + || start_b + num_dimensions != index_b) + { + /* Start a new sequence here. */ + start_a = index_a; + start_b = index_b; + num_dimensions = 0; + } + num_dimensions += 1; + if (TREE_CODE (type_a) == RECORD_TYPE) + { + struct_start_a = start_a; + struct_start_b = start_b; + struct_num_dimensions = num_dimensions; + struct_ref_a = outer_ref_a; + struct_ref_b = outer_ref_b; + } + next_ref_a = outer_ref_a; + next_ref_b = outer_ref_b; + index_a += 1; + index_b += 1; + continue; + } + /* Try to approach equal type sizes. */ + if (!COMPLETE_TYPE_P (type_a) + || !COMPLETE_TYPE_P (type_b) + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_a)) + || !tree_fits_uhwi_p (TYPE_SIZE_UNIT (type_b))) + break; + unsigned HOST_WIDE_INT size_a = tree_to_uhwi (TYPE_SIZE_UNIT (type_a)); + unsigned HOST_WIDE_INT size_b = tree_to_uhwi (TYPE_SIZE_UNIT (type_b)); + if (size_a <= size_b) + { + index_a += 1; + next_ref_a = outer_ref_a; + } + if (size_b <= size_a) + { + index_b += 1; + next_ref_b = outer_ref_b; + } } - /* If the references do not access the same object, we do not know - whether they alias or not. We do not care about TBAA or alignment - info so we can use OEP_ADDRESS_OF to avoid false negatives. - But the accesses have to use compatible types as otherwise the - built indices would not match. */ - if (!operand_equal_p (DR_BASE_OBJECT (a), DR_BASE_OBJECT (b), OEP_ADDRESS_OF) - || !types_compatible_p (TREE_TYPE (DR_BASE_OBJECT (a)), - TREE_TYPE (DR_BASE_OBJECT (b)))) + /* See whether the sequence ends at the base and whether the two bases + are equal. We do not care about TBAA or alignment info so we can use + OEP_ADDRESS_OF to avoid false negatives. */ + tree base_a = DR_BASE_OBJECT (a); + tree base_b = DR_BASE_OBJECT (b); + bool same_base_p = (start_a + num_dimensions == num_dimensions_a + && start_b + num_dimensions == num_dimensions_b + && DR_UNCONSTRAINED_BASE (a) == DR_UNCONSTRAINED_BASE (b) + && operand_equal_p (base_a, base_b, OEP_ADDRESS_OF) + && types_compatible_p (TREE_TYPE (base_a), + TREE_TYPE (base_b)) + && (!loop_nest.exists () + || (object_address_invariant_in_loop_p + (loop_nest[0], base_a)))); + + /* If the bases are the same, we can include the base variation too. + E.g. the b accesses in: + + for (int i = 0; i < n; ++i) + b[i + 4][0] = b[i][0]; + + have a definite dependence distance of 4, while for: + + for (int i = 0; i < n; ++i) + a[i + 4][0] = b[i][0]; + + the dependence distance depends on the gap between a and b. + + If the bases are different then we can only rely on the sequence + rooted at a structure access, since arrays are allowed to overlap + arbitrarily and change shape arbitrarily. E.g. we treat this as + valid code: + + int a[256]; + ... + ((int (*)[4][3])&a[1])[i][0] += ((int (*)[4][3])&a[2])[i][0]; + + where two lvalues with the same int[4][3] type overlap, and where + both lvalues are distinct from the object's declared type. */ + if (same_base_p) { - DDR_ARE_DEPENDENT (res) = chrec_dont_know; - return res; + if (DR_UNCONSTRAINED_BASE (a)) + num_dimensions += 1; + } + else + { + start_a = struct_start_a; + start_b = struct_start_b; + num_dimensions = struct_num_dimensions; } - /* If the base of the object is not invariant in the loop nest, we cannot - analyze it. TODO -- in fact, it would suffice to record that there may - be arbitrary dependences in the loops where the base object varies. */ - if ((loop_nest.exists () - && !object_address_invariant_in_loop_p (loop_nest[0], DR_BASE_OBJECT (a))) - || DR_NUM_DIMENSIONS (a) == 0) + /* Punt if we didn't find a suitable sequence. */ + if (num_dimensions == 0) { DDR_ARE_DEPENDENT (res) = chrec_dont_know; return res; } - /* If the number of dimensions of the access to not agree we can have - a pointer access to a component of the array element type and an - array access while the base-objects are still the same. Punt. */ - if (DR_NUM_DIMENSIONS (a) != DR_NUM_DIMENSIONS (b)) + if (!same_base_p) { - DDR_ARE_DEPENDENT (res) = chrec_dont_know; - return res; + /* Partial overlap is possible for different bases when strict aliasing + is not in effect. It's also possible if either base involves a union + access; e.g. for: + + struct s1 { int a[2]; }; + struct s2 { struct s1 b; int c; }; + struct s3 { int d; struct s1 e; }; + union u { struct s2 f; struct s3 g; } *p, *q; + + the s1 at "p->f.b" (base "p->f") partially overlaps the s1 at + "p->g.e" (base "p->g") and might partially overlap the s1 at + "q->g.e" (base "q->g"). */ + if (!flag_strict_aliasing + || ref_contains_union_access_p (struct_ref_a) + || ref_contains_union_access_p (struct_ref_b)) + { + DDR_ARE_DEPENDENT (res) = chrec_dont_know; + return res; + } + + DDR_COULD_BE_INDEPENDENT_P (res) = true; } DDR_AFFINE_P (res) = true; DDR_ARE_DEPENDENT (res) = NULL_TREE; - DDR_SUBSCRIPTS (res).create (DR_NUM_DIMENSIONS (a)); + DDR_SUBSCRIPTS (res).create (num_dimensions); DDR_LOOP_NEST (res) = loop_nest; DDR_INNER_LOOP (res) = 0; DDR_SELF_REFERENCE (res) = false; - for (i = 0; i < DR_NUM_DIMENSIONS (a); i++) + for (i = 0; i < num_dimensions; ++i) { struct subscript *subscript; subscript = XNEW (struct subscript); + SUB_ACCESS_FN (subscript, 0) = DR_ACCESS_FN (a, start_a + i); + SUB_ACCESS_FN (subscript, 1) = DR_ACCESS_FN (b, start_b + i); SUB_CONFLICTS_IN_A (subscript) = conflict_fn_not_known (); SUB_CONFLICTS_IN_B (subscript) = conflict_fn_not_known (); SUB_LAST_CONFLICT (subscript) = chrec_dont_know; @@ -3163,14 +3312,15 @@ add_outer_distances (struct data_depende } /* Return false when fail to represent the data dependence as a - distance vector. INIT_B is set to true when a component has been + distance vector. A_INDEX is the index of the first reference + (0 for DDR_A, 1 for DDR_B) and B_INDEX is the index of the + second reference. INIT_B is set to true when a component has been added to the distance vector DIST_V. INDEX_CARRY is then set to the index in DIST_V that carries the dependence. */ static bool build_classic_dist_vector_1 (struct data_dependence_relation *ddr, - struct data_reference *ddr_a, - struct data_reference *ddr_b, + unsigned int a_index, unsigned int b_index, lambda_vector dist_v, bool *init_b, int *index_carry) { @@ -3188,8 +3338,8 @@ build_classic_dist_vector_1 (struct data return false; } - access_fn_a = DR_ACCESS_FN (ddr_a, i); - access_fn_b = DR_ACCESS_FN (ddr_b, i); + access_fn_a = SUB_ACCESS_FN (subscript, a_index); + access_fn_b = SUB_ACCESS_FN (subscript, b_index); if (TREE_CODE (access_fn_a) == POLYNOMIAL_CHREC && TREE_CODE (access_fn_b) == POLYNOMIAL_CHREC) @@ -3249,10 +3399,11 @@ build_classic_dist_vector_1 (struct data constant_access_functions (const struct data_dependence_relation *ddr) { unsigned i; + subscript *sub; - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) - if (!evolution_function_is_constant_p (DR_ACCESS_FN (DDR_A (ddr), i)) - || !evolution_function_is_constant_p (DR_ACCESS_FN (DDR_B (ddr), i))) + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) + if (!evolution_function_is_constant_p (SUB_ACCESS_FN (sub, 0)) + || !evolution_function_is_constant_p (SUB_ACCESS_FN (sub, 1))) return false; return true; @@ -3315,10 +3466,11 @@ add_other_self_distances (struct data_de lambda_vector dist_v; unsigned i; int index_carry = DDR_NB_LOOPS (ddr); + subscript *sub; - for (i = 0; i < DDR_NUM_SUBSCRIPTS (ddr); i++) + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) { - tree access_fun = DR_ACCESS_FN (DDR_A (ddr), i); + tree access_fun = SUB_ACCESS_FN (sub, 0); if (TREE_CODE (access_fun) == POLYNOMIAL_CHREC) { @@ -3330,7 +3482,7 @@ add_other_self_distances (struct data_de return; } - access_fun = DR_ACCESS_FN (DDR_A (ddr), 0); + access_fun = SUB_ACCESS_FN (DDR_SUBSCRIPT (ddr, 0), 0); if (TREE_CODE (CHREC_LEFT (access_fun)) == POLYNOMIAL_CHREC) add_multivariate_self_dist (ddr, access_fun); @@ -3401,6 +3553,23 @@ add_distance_for_zero_overlaps (struct d } } +/* Return true when the DDR contains two data references that have the + same access functions. */ + +static inline bool +same_access_functions (const struct data_dependence_relation *ddr) +{ + unsigned i; + subscript *sub; + + FOR_EACH_VEC_ELT (DDR_SUBSCRIPTS (ddr), i, sub) + if (!eq_evolutions_p (SUB_ACCESS_FN (sub, 0), + SUB_ACCESS_FN (sub, 1))) + return false; + + return true; +} + /* Compute the classic per loop distance vector. DDR is the data dependence relation to build a vector from. Return false when fail to represent the data dependence as a distance vector. */ @@ -3432,8 +3601,7 @@ build_classic_dist_vector (struct data_d } dist_v = lambda_vector_new (DDR_NB_LOOPS (ddr)); - if (!build_classic_dist_vector_1 (ddr, DDR_A (ddr), DDR_B (ddr), - dist_v, &init_b, &index_carry)) + if (!build_classic_dist_vector_1 (ddr, 0, 1, dist_v, &init_b, &index_carry)) return false; /* Save the distance vector if we initialized one. */ @@ -3466,12 +3634,11 @@ build_classic_dist_vector (struct data_d if (!lambda_vector_lexico_pos (dist_v, DDR_NB_LOOPS (ddr))) { lambda_vector save_v = lambda_vector_new (DDR_NB_LOOPS (ddr)); - if (!subscript_dependence_tester_1 (ddr, DDR_B (ddr), DDR_A (ddr), - loop_nest)) + if (!subscript_dependence_tester_1 (ddr, 1, 0, loop_nest)) return false; compute_subscript_distance (ddr); - if (!build_classic_dist_vector_1 (ddr, DDR_B (ddr), DDR_A (ddr), - save_v, &init_b, &index_carry)) + if (!build_classic_dist_vector_1 (ddr, 1, 0, save_v, &init_b, + &index_carry)) return false; save_dist_v (ddr, save_v); DDR_REVERSED_P (ddr) = true; @@ -3507,12 +3674,10 @@ build_classic_dist_vector (struct data_d { lambda_vector opposite_v = lambda_vector_new (DDR_NB_LOOPS (ddr)); - if (!subscript_dependence_tester_1 (ddr, DDR_B (ddr), - DDR_A (ddr), loop_nest)) + if (!subscript_dependence_tester_1 (ddr, 1, 0, loop_nest)) return false; compute_subscript_distance (ddr); - if (!build_classic_dist_vector_1 (ddr, DDR_B (ddr), DDR_A (ddr), - opposite_v, &init_b, + if (!build_classic_dist_vector_1 (ddr, 1, 0, opposite_v, &init_b, &index_carry)) return false; @@ -3591,13 +3756,13 @@ build_classic_dir_vector (struct data_de } } -/* Helper function. Returns true when there is a dependence between - data references DRA and DRB. */ +/* Helper function. Returns true when there is a dependence between the + data references. A_INDEX is the index of the first reference (0 for + DDR_A, 1 for DDR_B) and B_INDEX is the index of the second reference. */ static bool subscript_dependence_tester_1 (struct data_dependence_relation *ddr, - struct data_reference *dra, - struct data_reference *drb, + unsigned int a_index, unsigned int b_index, struct loop *loop_nest) { unsigned int i; @@ -3609,8 +3774,8 @@ subscript_dependence_tester_1 (struct da { conflict_function *overlaps_a, *overlaps_b; - analyze_overlapping_iterations (DR_ACCESS_FN (dra, i), - DR_ACCESS_FN (drb, i), + analyze_overlapping_iterations (SUB_ACCESS_FN (subscript, a_index), + SUB_ACCESS_FN (subscript, b_index), &overlaps_a, &overlaps_b, &last_conflicts, loop_nest); @@ -3659,7 +3824,7 @@ subscript_dependence_tester_1 (struct da subscript_dependence_tester (struct data_dependence_relation *ddr, struct loop *loop_nest) { - if (subscript_dependence_tester_1 (ddr, DDR_A (ddr), DDR_B (ddr), loop_nest)) + if (subscript_dependence_tester_1 (ddr, 0, 1, loop_nest)) dependence_stats.num_dependence_dependent++; compute_subscript_distance (ddr); Index: gcc/tree-ssa-loop-prefetch.c =================================================================== --- gcc/tree-ssa-loop-prefetch.c 2017-03-28 16:19:28.000000000 +0100 +++ gcc/tree-ssa-loop-prefetch.c 2017-05-03 08:48:48.737038502 +0100 @@ -1650,6 +1650,7 @@ determine_loop_nest_reuse (struct loop * refb = (struct mem_ref *) DDR_B (dep)->aux; if (DDR_ARE_DEPENDENT (dep) == chrec_dont_know + || DDR_COULD_BE_INDEPENDENT_P (dep) || DDR_NUM_DIST_VECTS (dep) == 0) { /* If the dependence cannot be analyzed, assume that there might be Index: gcc/tree-vectorizer.h =================================================================== --- gcc/tree-vectorizer.h 2017-03-28 16:19:28.000000000 +0100 +++ gcc/tree-vectorizer.h 2017-05-03 08:48:48.738045102 +0100 @@ -383,7 +383,7 @@ #define LOOP_VINFO_ORIG_LOOP_INFO(L) #define LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT(L) \ ((L)->may_misalign_stmts.length () > 0) #define LOOP_REQUIRES_VERSIONING_FOR_ALIAS(L) \ - ((L)->may_alias_ddrs.length () > 0) + ((L)->comp_alias_ddrs.length () > 0) #define LOOP_REQUIRES_VERSIONING_FOR_NITERS(L) \ (LOOP_VINFO_NITERS_ASSUMPTIONS (L)) #define LOOP_REQUIRES_VERSIONING(L) \ Index: gcc/tree-vect-data-refs.c =================================================================== --- gcc/tree-vect-data-refs.c 2017-05-03 08:48:30.536704993 +0100 +++ gcc/tree-vect-data-refs.c 2017-05-03 08:48:48.738045102 +0100 @@ -340,6 +340,26 @@ vect_analyze_data_ref_dependence (struct } loop_depth = index_in_loop_nest (loop->num, DDR_LOOP_NEST (ddr)); + + if (DDR_COULD_BE_INDEPENDENT_P (ddr)) + /* For dependence distances of 2 or more, we have the option of + limiting VF or checking for an alias at runtime. Prefer to check + at runtime if we can, to avoid limiting the VF unnecessarily when + the bases are in fact independent. + + Note that the alias checks will be removed if the VF ends up + being small enough. */ + FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v) + { + int dist = dist_v[loop_depth]; + if (dist != 0 && !(dist > 0 && DDR_REVERSED_P (ddr))) + { + if (vect_mark_for_runtime_alias_test (ddr, loop_vinfo)) + return false; + break; + } + } + FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v) { int dist = dist_v[loop_depth]; @@ -3017,6 +3037,44 @@ vect_no_alias_p (struct data_reference * return false; } +/* Return true if the minimum nonzero dependence distance for loop LOOP_DEPTH + in DDR is >= VF. */ + +static bool +dependence_distance_ge_vf (data_dependence_relation *ddr, + unsigned int loop_depth, unsigned HOST_WIDE_INT vf) +{ + if (DDR_ARE_DEPENDENT (ddr) != NULL_TREE + || DDR_NUM_DIST_VECTS (ddr) == 0) + return false; + + /* If the dependence is exact, we should have limited the VF instead. */ + gcc_checking_assert (DDR_COULD_BE_INDEPENDENT_P (ddr)); + + unsigned int i; + lambda_vector dist_v; + FOR_EACH_VEC_ELT (DDR_DIST_VECTS (ddr), i, dist_v) + { + HOST_WIDE_INT dist = dist_v[loop_depth]; + if (dist != 0 + && !(dist > 0 && DDR_REVERSED_P (ddr)) + && (unsigned HOST_WIDE_INT) abs_hwi (dist) < vf) + return false; + } + + if (dump_enabled_p ()) + { + dump_printf_loc (MSG_NOTE, vect_location, + "dependence distance between "); + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (DDR_A (ddr))); + dump_printf (MSG_NOTE, " and "); + dump_generic_expr (MSG_NOTE, TDF_SLIM, DR_REF (DDR_B (ddr))); + dump_printf (MSG_NOTE, " is >= VF\n"); + } + + return true; +} + /* Function vect_prune_runtime_alias_test_list. Prune a list of ddrs to be tested at run-time by versioning for alias. @@ -3075,6 +3133,10 @@ vect_prune_runtime_alias_test_list (loop comp_alias_ddrs.create (may_alias_ddrs.length ()); + unsigned int loop_depth + = index_in_loop_nest (LOOP_VINFO_LOOP (loop_vinfo)->num, + LOOP_VINFO_LOOP_NEST (loop_vinfo)); + /* First, we collect all data ref pairs for aliasing checks. */ FOR_EACH_VEC_ELT (may_alias_ddrs, i, ddr) { @@ -3084,6 +3146,11 @@ vect_prune_runtime_alias_test_list (loop tree segment_length_a, segment_length_b; gimple *stmt_a, *stmt_b; + /* Ignore the alias if the VF we chose ended up being no greater + than the dependence distance. */ + if (dependence_distance_ge_vf (ddr, loop_depth, vect_factor)) + continue; + dr_a = DDR_A (ddr); stmt_a = DR_STMT (DDR_A (ddr)); dr_group_first_a = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_a)); @@ -3294,10 +3361,6 @@ vect_prune_runtime_alias_test_list (loop return false; } - /* All alias checks have been resolved at compilation time. */ - if (!comp_alias_ddrs.length ()) - LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo).truncate (0); - return true; } Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-3.c =================================================================== --- /dev/null 2017-05-03 08:16:26.972699664 +0100 +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-3.c 2017-05-03 08:48:48.736031902 +0100 @@ -0,0 +1,104 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_int } */ +/* { dg-additional-options "--param vect-max-version-for-alias-checks=0" } */ + +/* Intended to be larger than any VF. */ +#define GAP 128 +#define N (GAP * 3) + +struct s { int x[N + 1]; }; +struct t { struct s x[N + 1]; }; +struct u { int x[N + 1]; int y; }; + +void +f1 (struct s *a, struct s *b) +{ + for (int i = 0; i < N; ++i) + a->x[i] += b->x[i]; +} + +void +f2 (struct s *a, struct s *b) +{ + for (int i = 0; i < N; ++i) + a[1].x[i] += b[2].x[i]; +} + +void +f3 (struct s *a, struct s *b) +{ + for (int i = 0; i < N; ++i) + a[1].x[i] += b[i].x[i]; +} + +void +f4 (struct s *a, struct s *b) +{ + for (int i = 0; i < N; ++i) + a[i].x[i] += b[i].x[i]; +} + +void +f5 (struct s *a, struct s *b) +{ + for (int i = 0; i < N; ++i) + a->x[i] += b->x[i + 1]; +} + +void +f6 (struct s *a, struct s *b) +{ + for (int i = 0; i < N; ++i) + a[1].x[i] += b[2].x[i + 1]; +} + +void +f7 (struct s *a, struct s *b) +{ + for (int i = 0; i < N; ++i) + a[1].x[i] += b[i].x[i + 1]; +} + +void +f8 (struct s *a, struct s *b) +{ + for (int i = 0; i < N; ++i) + a[i].x[i] += b[i].x[i + 1]; +} + +void +f9 (struct s *a, struct t *b) +{ + for (int i = 0; i < N; ++i) + a->x[i] += b->x[1].x[i]; +} + +void +f10 (struct s *a, struct t *b) +{ + for (int i = 0; i < N; ++i) + a->x[i] += b->x[i].x[i]; +} + +void +f11 (struct u *a, struct u *b) +{ + for (int i = 0; i < N; ++i) + a->x[i] += b->x[i] + b[i].y; +} + +void +f12 (struct s *a, struct s *b) +{ + for (int i = 0; i < GAP; ++i) + a->x[i + GAP] += b->x[i]; +} + +void +f13 (struct s *a, struct s *b) +{ + for (int i = 0; i < GAP * 2; ++i) + a->x[i + GAP] += b->x[i]; +} + +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 13 "vect" } } */ Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-4.c =================================================================== --- /dev/null 2017-05-03 08:16:26.972699664 +0100 +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-4.c 2017-05-03 08:48:48.736031902 +0100 @@ -0,0 +1,35 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_int } */ +/* { dg-additional-options "--param vect-max-version-for-alias-checks=0" } */ + +#define N 16 + +struct s1 { int a[N]; }; +struct s2 { struct s1 b; int c; }; +struct s3 { int d; struct s1 e; }; +union u { struct s2 f; struct s3 g; }; + +/* We allow a and b to overlap arbitrarily. */ + +void +f1 (int a[][N], int b[][N]) +{ + for (int i = 0; i < N; ++i) + a[0][i] += b[0][i]; +} + +void +f2 (union u *a, union u *b) +{ + for (int i = 0; i < N; ++i) + a->f.b.a[i] += b->g.e.a[i]; +} + +void +f3 (struct s1 *a, struct s1 *b) +{ + for (int i = 0; i < N - 1; ++i) + a->a[i + 1] += b->a[i]; +} + +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */ Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-5.c =================================================================== --- /dev/null 2017-05-03 08:16:26.972699664 +0100 +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-5.c 2017-05-03 08:48:48.736031902 +0100 @@ -0,0 +1,19 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_int } */ + +/* Intended to be larger than any VF. */ +#define GAP 128 +#define N (GAP * 3) + +struct s { int x[N]; }; + +void +f1 (struct s *a, struct s *b) +{ + for (int i = 0; i < GAP * 2; ++i) + a->x[i + GAP] += b->x[i]; +} + +/* { dg-final { scan-tree-dump-times "mark for run-time aliasing" 1 "vect" } } */ +/* { dg-final { scan-tree-dump-times "improved number of alias checks from 1 to 0" 1 "vect" } } */ +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 1 "vect" } } */