Message ID | 87k244z2c5.fsf@linaro.org |
---|---|
State | New |
Headers | show |
On Thu, Jun 22, 2017 at 1:30 PM, Richard Sandiford <richard.sandiford@linaro.org> wrote: > The test case triggered this assert in vect_update_misalignment_for_peel: > > gcc_assert (DR_MISALIGNMENT (dr) / dr_size == > DR_MISALIGNMENT (dr_peel) / dr_peel_size); > > We knew that the two DRs had the same misalignment at runtime, but when > considered in isolation, one data reference guaranteed a higher compile-time > base alignment than the other. > > In the test case this looks like a missed opportunity. Both references > are unconditional, so it should be possible to use the highest of the > available base alignment guarantees when analyzing each reference. > The patch does this. > > However, as the comment in the patch says, the base alignment guarantees > provided by a conditional reference only apply if the reference occurs > at least once. In this case it would be legitimate for two references > to have the same runtime misalignment and for one reference to provide a > stronger compile-time guarantee than the other about what the misalignment > actually is. The patch therefore relaxes the assert to handle that case. Hmm, but you don't actually check whether a reference occurs only conditional, do you? You just seem to say that for masked loads/stores the reference is conditional (I believe that's not true). But for a loop like for (;;) if (a[i]) sum += b[j]; you still assume b[j] executes unconditionally? The vectorizer of course only sees unconditionally executed stmts. So - I'd simply not add this DR_IS_CONDITIONAL. Did you run into any real-world (testsuite) issues without this? Note that the assert is to prevent bogus information. Iff we aligned DR with base alignment 8 and misalign 3 then if another same-align DR has base alignment 16 we can't simply zero its DR_MISALIGNMENT as it still can be 8 after aligning DR. So I think it's wrong to put DRs with differing base-alignment into the same-align-refs chain, those should get their DR_MISALIGNMENT updated independenlty after peeling. I'd rather not mix fixing this with the improvement to eventuall use a larger align for the other DR if possible. Thanks, Richard. > Tested on powerpc64-linux-gnu, aarch64-linux-gnu and x86_64-linux-gnu. > OK to instal? > > Richard > > > 2017-06-22 Richard Sandiford <richard.sandiford@linaro.org> > > gcc/ > PR tree-optimization/81136 > * tree-vectorizer.h: Include tree-hash-traits.h. > (vec_base_alignments): New typedef. > (vec_info): Add a base_alignments field. > (vect_compute_base_alignments: Declare. > * tree-data-ref.h (data_reference): Add an is_conditional field. > (DR_IS_CONDITIONAL): New macro. > (create_data_ref): Add an is_conditional argument. > * tree-data-ref.c (create_data_ref): Likewise. Use it to initialize > the is_conditional field. > (data_ref_loc): Add an is_conditional field. > (get_references_in_stmt): Set the is_conditional field. > (find_data_references_in_stmt): Update call to create_data_ref. > (graphite_find_data_references_in_stmt): Likewise. > * tree-ssa-loop-prefetch.c (determine_loop_nest_reuse): Likewise. > * tree-vect-data-refs.c (vect_analyze_data_refs): Likewise. > (vect_get_base_address): New function. > (vect_compute_base_alignments): Likewise. > (vect_compute_base_alignment): Likewise, split out from... > (vect_compute_data_ref_alignment): ...here. Use precomputed > base alignments. Only compute a new base alignment here if the > reference is conditional. > (vect_update_misalignment_for_peel): Allow the compile-time > DR_MISALIGNMENTs of two references with the same runtime alignment > to be different if one of the references is conditional. > (vect_find_same_alignment_drs): Compare base addresses instead > of base objects. > (vect_compute_data_ref_alignment): Call vect_compute_base_alignments. > * tree-vect-slp.c (vect_slp_analyze_bb_1): Likewise. > (new_bb_vec_info): Initialize base_alignments. > * tree-vect-loop.c (new_loop_vec_info): Likewise. > * tree-vectorizer.c (vect_destroy_datarefs): Release base_alignments. > > gcc/testsuite/ > PR tree-optimization/81136 > * gcc.dg/vect/pr81136.c: New test. > > Index: gcc/tree-vectorizer.h > =================================================================== > --- gcc/tree-vectorizer.h 2017-06-08 08:51:43.347264181 +0100 > +++ gcc/tree-vectorizer.h 2017-06-22 12:23:21.288421018 +0100 > @@ -22,6 +22,7 @@ Software Foundation; either version 3, o > #define GCC_TREE_VECTORIZER_H > > #include "tree-data-ref.h" > +#include "tree-hash-traits.h" > #include "target.h" > > /* Used for naming of new temporaries. */ > @@ -84,6 +85,10 @@ struct stmt_info_for_cost { > > typedef vec<stmt_info_for_cost> stmt_vector_for_cost; > > +/* Maps base addresses to the largest alignment that we've been able > + to calculate for them. */ > +typedef hash_map<tree_operand_hash, unsigned int> vec_base_alignments; > + > /************************************************************************ > SLP > ************************************************************************/ > @@ -156,6 +161,10 @@ struct vec_info { > /* All data references. */ > vec<data_reference_p> datarefs; > > + /* Maps the base addresses of all data references in DATAREFS to the > + largest alignment that we've been able to calculate for them. */ > + vec_base_alignments base_alignments; > + > /* All data dependences. */ > vec<ddr_p> ddrs; > > @@ -1117,6 +1126,7 @@ extern bool vect_prune_runtime_alias_tes > extern bool vect_check_gather_scatter (gimple *, loop_vec_info, > gather_scatter_info *); > extern bool vect_analyze_data_refs (vec_info *, int *); > +extern void vect_compute_base_alignments (vec_info *); > extern tree vect_create_data_ref_ptr (gimple *, tree, struct loop *, tree, > tree *, gimple_stmt_iterator *, > gimple **, bool, bool *, > Index: gcc/tree-data-ref.h > =================================================================== > --- gcc/tree-data-ref.h 2017-06-08 08:51:43.349263895 +0100 > +++ gcc/tree-data-ref.h 2017-06-22 12:23:21.285421180 +0100 > @@ -119,6 +119,10 @@ struct data_reference > /* True when the data reference is in RHS of a stmt. */ > bool is_read; > > + /* True when the data reference is conditional, i.e. if it might not > + occur even when the statement runs to completion. */ > + bool is_conditional; > + > /* Behavior of the memory reference in the innermost loop. */ > struct innermost_loop_behavior innermost; > > @@ -138,6 +142,7 @@ #define DR_ACCESS_FN(DR, I) DR_AC > #define DR_NUM_DIMENSIONS(DR) DR_ACCESS_FNS (DR).length () > #define DR_IS_READ(DR) (DR)->is_read > #define DR_IS_WRITE(DR) (!DR_IS_READ (DR)) > +#define DR_IS_CONDITIONAL(DR) (DR)->is_conditional > #define DR_BASE_ADDRESS(DR) (DR)->innermost.base_address > #define DR_OFFSET(DR) (DR)->innermost.offset > #define DR_INIT(DR) (DR)->innermost.init > @@ -350,7 +355,8 @@ extern bool graphite_find_data_reference > vec<data_reference_p> *); > tree find_data_references_in_loop (struct loop *, vec<data_reference_p> *); > bool loop_nest_has_data_refs (loop_p loop); > -struct data_reference *create_data_ref (loop_p, loop_p, tree, gimple *, bool); > +struct data_reference *create_data_ref (loop_p, loop_p, tree, gimple *, bool, > + bool); > extern bool find_loop_nest (struct loop *, vec<loop_p> *); > extern struct data_dependence_relation *initialize_data_dependence_relation > (struct data_reference *, struct data_reference *, vec<loop_p>); > Index: gcc/tree-data-ref.c > =================================================================== > --- gcc/tree-data-ref.c 2017-06-08 08:51:43.349263895 +0100 > +++ gcc/tree-data-ref.c 2017-06-22 12:23:21.284421233 +0100 > @@ -1053,15 +1053,18 @@ free_data_ref (data_reference_p dr) > free (dr); > } > > -/* Analyzes memory reference MEMREF accessed in STMT. The reference > - is read if IS_READ is true, write otherwise. Returns the > - data_reference description of MEMREF. NEST is the outermost loop > - in which the reference should be instantiated, LOOP is the loop in > - which the data reference should be analyzed. */ > +/* Analyze memory reference MEMREF, which is accessed in STMT. The reference > + is a read if IS_READ is true, otherwise it is a write. IS_CONDITIONAL > + indicates that the reference is conditional, i.e. that it might not > + occur every time that STMT runs to completion. > + > + Return the data_reference description of MEMREF. NEST is the outermost > + loop in which the reference should be instantiated, LOOP is the loop > + in which the data reference should be analyzed. */ > > struct data_reference * > create_data_ref (loop_p nest, loop_p loop, tree memref, gimple *stmt, > - bool is_read) > + bool is_read, bool is_conditional) > { > struct data_reference *dr; > > @@ -1076,6 +1079,7 @@ create_data_ref (loop_p nest, loop_p loo > DR_STMT (dr) = stmt; > DR_REF (dr) = memref; > DR_IS_READ (dr) = is_read; > + DR_IS_CONDITIONAL (dr) = is_conditional; > > dr_analyze_innermost (dr, nest); > dr_analyze_indices (dr, nest, loop); > @@ -4446,6 +4450,10 @@ struct data_ref_loc > > /* True if the memory reference is read. */ > bool is_read; > + > + /* True if the data reference is conditional, i.e. if it might not > + occur even when the statement runs to completion. */ > + bool is_conditional; > }; > > > @@ -4512,6 +4520,7 @@ get_references_in_stmt (gimple *stmt, ve > { > ref.ref = op1; > ref.is_read = true; > + ref.is_conditional = false; > references->safe_push (ref); > } > } > @@ -4539,6 +4548,7 @@ get_references_in_stmt (gimple *stmt, ve > type = TREE_TYPE (gimple_call_arg (stmt, 3)); > if (TYPE_ALIGN (type) != align) > type = build_aligned_type (type, align); > + ref.is_conditional = true; > ref.ref = fold_build2 (MEM_REF, type, gimple_call_arg (stmt, 0), > ptr); > references->safe_push (ref); > @@ -4558,6 +4568,7 @@ get_references_in_stmt (gimple *stmt, ve > { > ref.ref = op1; > ref.is_read = true; > + ref.is_conditional = false; > references->safe_push (ref); > } > } > @@ -4571,6 +4582,7 @@ get_references_in_stmt (gimple *stmt, ve > { > ref.ref = op0; > ref.is_read = false; > + ref.is_conditional = false; > references->safe_push (ref); > } > return clobbers_memory; > @@ -4635,8 +4647,8 @@ find_data_references_in_stmt (struct loo > > FOR_EACH_VEC_ELT (references, i, ref) > { > - dr = create_data_ref (nest, loop_containing_stmt (stmt), > - ref->ref, stmt, ref->is_read); > + dr = create_data_ref (nest, loop_containing_stmt (stmt), ref->ref, > + stmt, ref->is_read, ref->is_conditional); > gcc_assert (dr != NULL); > datarefs->safe_push (dr); > } > @@ -4665,7 +4677,8 @@ graphite_find_data_references_in_stmt (l > > FOR_EACH_VEC_ELT (references, i, ref) > { > - dr = create_data_ref (nest, loop, ref->ref, stmt, ref->is_read); > + dr = create_data_ref (nest, loop, ref->ref, stmt, ref->is_read, > + ref->is_conditional); > gcc_assert (dr != NULL); > datarefs->safe_push (dr); > } > Index: gcc/tree-ssa-loop-prefetch.c > =================================================================== > --- gcc/tree-ssa-loop-prefetch.c 2017-06-07 21:58:55.928557601 +0100 > +++ gcc/tree-ssa-loop-prefetch.c 2017-06-22 12:23:21.285421180 +0100 > @@ -1633,7 +1633,7 @@ determine_loop_nest_reuse (struct loop * > for (ref = gr->refs; ref; ref = ref->next) > { > dr = create_data_ref (nest, loop_containing_stmt (ref->stmt), > - ref->mem, ref->stmt, !ref->write_p); > + ref->mem, ref->stmt, !ref->write_p, false); > > if (dr) > { > Index: gcc/tree-vect-data-refs.c > =================================================================== > --- gcc/tree-vect-data-refs.c 2017-06-08 08:51:43.350263752 +0100 > +++ gcc/tree-vect-data-refs.c 2017-06-22 12:23:21.286421126 +0100 > @@ -646,6 +646,102 @@ vect_slp_analyze_instance_dependence (sl > return res; > } > > +/* If DR is nested in a loop that is being vectorized, return the base > + address in the context of the vectorized loop (rather than the > + nested loop). Otherwise return the base address in the context > + of the containing statement. */ > + > +static tree > +vect_get_base_address (data_reference *dr) > +{ > + gimple *stmt = DR_STMT (dr); > + stmt_vec_info stmt_info = vinfo_for_stmt (stmt); > + loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info); > + struct loop *loop = loop_vinfo != NULL ? LOOP_VINFO_LOOP (loop_vinfo) : NULL; > + if (loop && nested_in_vect_loop_p (loop, stmt)) > + return STMT_VINFO_DR_BASE_ADDRESS (stmt_info); > + else > + return DR_BASE_ADDRESS (dr); > +} > + > +/* Compute and return the alignment of base address BASE_ADDR in DR. */ > + > +static unsigned int > +vect_compute_base_alignment (data_reference *dr, tree base_addr) > +{ > + /* To look at the alignment of the base we have to preserve an inner > + MEM_REF as that carries the alignment information of the actual > + access. */ > + tree base = DR_REF (dr); > + while (handled_component_p (base)) > + base = TREE_OPERAND (base, 0); > + unsigned int base_alignment = 0; > + unsigned HOST_WIDE_INT base_bitpos; > + get_object_alignment_1 (base, &base_alignment, &base_bitpos); > + > + /* As data-ref analysis strips the MEM_REF down to its base operand > + to form DR_BASE_ADDRESS and adds the offset to DR_INIT we have to > + adjust things to make base_alignment valid as the alignment of > + DR_BASE_ADDRESS. */ > + if (TREE_CODE (base) == MEM_REF) > + { > + /* Note all this only works if DR_BASE_ADDRESS is the same as > + MEM_REF operand zero, otherwise DR/SCEV analysis might have factored > + in other offsets. We need to rework DR to compute the alingment > + of DR_BASE_ADDRESS as long as all information is still available. */ > + if (operand_equal_p (TREE_OPERAND (base, 0), base_addr, 0)) > + { > + base_bitpos -= mem_ref_offset (base).to_short_addr () * BITS_PER_UNIT; > + base_bitpos &= (base_alignment - 1); > + } > + else > + base_bitpos = BITS_PER_UNIT; > + } > + if (base_bitpos != 0) > + base_alignment = base_bitpos & -base_bitpos; > + > + /* Also look at the alignment of the base address DR analysis > + computed. */ > + unsigned int base_addr_alignment = get_pointer_alignment (base_addr); > + if (base_addr_alignment > base_alignment) > + base_alignment = base_addr_alignment; > + > + return base_alignment; > +} > + > +/* Compute alignments for the base addresses of all datarefs in VINFO. */ > + > +void > +vect_compute_base_alignments (vec_info *vinfo) > +{ > + /* If the region we're going to vectorize is reached, all unconditional > + data references occur at least once. We can therefore pool the base > + alignment guarantees from each unconditional reference. */ > + data_reference *dr; > + unsigned int i; > + FOR_EACH_VEC_ELT (vinfo->datarefs, i, dr) > + if (!DR_IS_CONDITIONAL (dr)) > + { > + tree base_addr = vect_get_base_address (dr); > + unsigned int alignment = vect_compute_base_alignment (dr, base_addr); > + bool existed; > + unsigned int &entry > + = vinfo->base_alignments.get_or_insert (base_addr, &existed); > + if (!existed || entry < alignment) > + { > + entry = alignment; > + if (dump_enabled_p ()) > + { > + dump_printf_loc (MSG_NOTE, vect_location, > + "setting base alignment for "); > + dump_generic_expr (MSG_NOTE, TDF_SLIM, base_addr); > + dump_printf (MSG_NOTE, " to %d, based on ", alignment); > + dump_gimple_stmt (MSG_NOTE, TDF_SLIM, DR_STMT (dr), 0); > + } > + } > + } > +} > + > /* Function vect_compute_data_ref_alignment > > Compute the misalignment of the data reference DR. > @@ -663,6 +759,7 @@ vect_compute_data_ref_alignment (struct > { > gimple *stmt = DR_STMT (dr); > stmt_vec_info stmt_info = vinfo_for_stmt (stmt); > + vec_base_alignments *base_alignments = &stmt_info->vinfo->base_alignments; > loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info); > struct loop *loop = NULL; > tree ref = DR_REF (dr); > @@ -699,6 +796,8 @@ vect_compute_data_ref_alignment (struct > { > tree step = DR_STEP (dr); > > + base_addr = STMT_VINFO_DR_BASE_ADDRESS (stmt_info); > + aligned_to = STMT_VINFO_DR_ALIGNED_TO (stmt_info); > if (tree_fits_shwi_p (step) > && tree_to_shwi (step) % GET_MODE_SIZE (TYPE_MODE (vectype)) == 0) > { > @@ -706,8 +805,6 @@ vect_compute_data_ref_alignment (struct > dump_printf_loc (MSG_NOTE, vect_location, > "inner step divides the vector-size.\n"); > misalign = STMT_VINFO_DR_INIT (stmt_info); > - aligned_to = STMT_VINFO_DR_ALIGNED_TO (stmt_info); > - base_addr = STMT_VINFO_DR_BASE_ADDRESS (stmt_info); > } > else > { > @@ -738,39 +835,15 @@ vect_compute_data_ref_alignment (struct > } > } > > - /* To look at alignment of the base we have to preserve an inner MEM_REF > - as that carries alignment information of the actual access. */ > - base = ref; > - while (handled_component_p (base)) > - base = TREE_OPERAND (base, 0); > + /* Calculate the maximum of the pooled base address alignment and the > + alignment that we can compute for DR itself. The latter should > + already be included in the former for unconditional references. */ > unsigned int base_alignment = 0; > - unsigned HOST_WIDE_INT base_bitpos; > - get_object_alignment_1 (base, &base_alignment, &base_bitpos); > - /* As data-ref analysis strips the MEM_REF down to its base operand > - to form DR_BASE_ADDRESS and adds the offset to DR_INIT we have to > - adjust things to make base_alignment valid as the alignment of > - DR_BASE_ADDRESS. */ > - if (TREE_CODE (base) == MEM_REF) > - { > - /* Note all this only works if DR_BASE_ADDRESS is the same as > - MEM_REF operand zero, otherwise DR/SCEV analysis might have factored > - in other offsets. We need to rework DR to compute the alingment > - of DR_BASE_ADDRESS as long as all information is still available. */ > - if (operand_equal_p (TREE_OPERAND (base, 0), base_addr, 0)) > - { > - base_bitpos -= mem_ref_offset (base).to_short_addr () * BITS_PER_UNIT; > - base_bitpos &= (base_alignment - 1); > - } > - else > - base_bitpos = BITS_PER_UNIT; > - } > - if (base_bitpos != 0) > - base_alignment = base_bitpos & -base_bitpos; > - /* Also look at the alignment of the base address DR analysis > - computed. */ > - unsigned int base_addr_alignment = get_pointer_alignment (base_addr); > - if (base_addr_alignment > base_alignment) > - base_alignment = base_addr_alignment; > + if (DR_IS_CONDITIONAL (dr)) > + base_alignment = vect_compute_base_alignment (dr, base_addr); > + if (unsigned int *entry = base_alignments->get (base_addr)) > + base_alignment = MAX (base_alignment, *entry); > + gcc_assert (base_alignment != 0); > > if (base_alignment >= TYPE_ALIGN (TREE_TYPE (vectype))) > DR_VECT_AUX (dr)->base_element_aligned = true; > @@ -906,8 +979,29 @@ vect_update_misalignment_for_peel (struc > { > if (current_dr != dr) > continue; > - gcc_assert (DR_MISALIGNMENT (dr) / dr_size == > - DR_MISALIGNMENT (dr_peel) / dr_peel_size); > + /* Any alignment guarantees provided by a reference only apply if > + the reference actually occurs. For example, in: > + > + struct s __attribute__((aligned(32))) { > + int misaligner; > + int array[N]; > + }; > + > + int *ptr; > + for (int i = 0; i < n; ++i) > + ptr[i] = c[i] ? ((struct s *) (ptr - 1))->array[i] : 0; > + > + we can only assume that ptr is part of a struct s if at least one > + c[i] is true. This in turn means that we have a higher base > + alignment guarantee for the read from ptr (if it occurs) than for > + the write to ptr, and we cannot unconditionally carry the former > + over to the latter. We still know that the two address values > + have the same misalignment, so if peeling has forced one of them > + to be aligned, the other must be too. */ > + gcc_assert (DR_IS_CONDITIONAL (dr_peel) > + || DR_IS_CONDITIONAL (dr) > + || (DR_MISALIGNMENT (dr) / dr_size > + == DR_MISALIGNMENT (dr_peel) / dr_peel_size)); > SET_DR_MISALIGNMENT (dr, 0); > return; > } > @@ -2117,8 +2211,7 @@ vect_find_same_alignment_drs (struct dat > if (dra == drb) > return; > > - if (!operand_equal_p (DR_BASE_OBJECT (dra), DR_BASE_OBJECT (drb), > - OEP_ADDRESS_OF) > + if (!operand_equal_p (DR_BASE_ADDRESS (dra), DR_BASE_ADDRESS (drb), 0) > || !operand_equal_p (DR_OFFSET (dra), DR_OFFSET (drb), 0) > || !operand_equal_p (DR_STEP (dra), DR_STEP (drb), 0)) > return; > @@ -2176,6 +2269,7 @@ vect_analyze_data_refs_alignment (loop_v > vec<data_reference_p> datarefs = vinfo->datarefs; > struct data_reference *dr; > > + vect_compute_base_alignments (vinfo); > FOR_EACH_VEC_ELT (datarefs, i, dr) > { > stmt_vec_info stmt_info = vinfo_for_stmt (DR_STMT (dr)); > @@ -3374,7 +3468,8 @@ vect_analyze_data_refs (vec_info *vinfo, > { > struct data_reference *newdr > = create_data_ref (NULL, loop_containing_stmt (stmt), > - DR_REF (dr), stmt, maybe_scatter ? false : true); > + DR_REF (dr), stmt, !maybe_scatter, > + DR_IS_CONDITIONAL (dr)); > gcc_assert (newdr != NULL && DR_REF (newdr)); > if (DR_BASE_ADDRESS (newdr) > && DR_OFFSET (newdr) > Index: gcc/tree-vect-slp.c > =================================================================== > --- gcc/tree-vect-slp.c 2017-06-07 21:58:56.336475882 +0100 > +++ gcc/tree-vect-slp.c 2017-06-22 12:23:21.288421018 +0100 > @@ -2367,6 +2367,7 @@ new_bb_vec_info (gimple_stmt_iterator re > gimple_stmt_iterator gsi; > > res = (bb_vec_info) xcalloc (1, sizeof (struct _bb_vec_info)); > + new (&res->base_alignments) vec_base_alignments (); > res->kind = vec_info::bb; > BB_VINFO_BB (res) = bb; > res->region_begin = region_begin; > @@ -2741,6 +2742,8 @@ vect_slp_analyze_bb_1 (gimple_stmt_itera > return NULL; > } > > + vect_compute_base_alignments (bb_vinfo); > + > /* Analyze and verify the alignment of data references and the > dependence in the SLP instances. */ > for (i = 0; BB_VINFO_SLP_INSTANCES (bb_vinfo).iterate (i, &instance); ) > Index: gcc/tree-vect-loop.c > =================================================================== > --- gcc/tree-vect-loop.c 2017-06-22 12:22:57.734313143 +0100 > +++ gcc/tree-vect-loop.c 2017-06-22 12:23:21.287421072 +0100 > @@ -1157,6 +1157,7 @@ new_loop_vec_info (struct loop *loop) > LOOP_VINFO_VECT_FACTOR (res) = 0; > LOOP_VINFO_LOOP_NEST (res) = vNULL; > LOOP_VINFO_DATAREFS (res) = vNULL; > + new (&res->base_alignments) vec_base_alignments (); > LOOP_VINFO_DDRS (res) = vNULL; > LOOP_VINFO_UNALIGNED_DR (res) = NULL; > LOOP_VINFO_MAY_MISALIGN_STMTS (res) = vNULL; > Index: gcc/tree-vectorizer.c > =================================================================== > --- gcc/tree-vectorizer.c 2017-06-22 12:22:57.732313220 +0100 > +++ gcc/tree-vectorizer.c 2017-06-22 12:23:21.288421018 +0100 > @@ -370,6 +370,8 @@ vect_destroy_datarefs (vec_info *vinfo) > } > > free_data_refs (vinfo->datarefs); > + > + vinfo->base_alignments.~vec_base_alignments (); > } > > /* A helper function to free scev and LOOP niter information, as well as > Index: gcc/testsuite/gcc.dg/vect/pr81136.c > =================================================================== > --- /dev/null 2017-06-22 07:43:14.805493307 +0100 > +++ gcc/testsuite/gcc.dg/vect/pr81136.c 2017-06-22 12:23:21.283421287 +0100 > @@ -0,0 +1,16 @@ > +/* { dg-do compile } */ > + > +struct __attribute__((aligned (32))) > +{ > + char misaligner; > + int foo[100]; > + int bar[100]; > +} *a; > + > +void > +fn1 (int n) > +{ > + int *b = a->foo; > + for (int i = 0; i < n; i++) > + a->bar[i] = b[i]; > +}
On Fri, Jun 23, 2017 at 1:19 PM, Richard Biener <richard.guenther@gmail.com> wrote: > On Thu, Jun 22, 2017 at 1:30 PM, Richard Sandiford > <richard.sandiford@linaro.org> wrote: >> The test case triggered this assert in vect_update_misalignment_for_peel: >> >> gcc_assert (DR_MISALIGNMENT (dr) / dr_size == >> DR_MISALIGNMENT (dr_peel) / dr_peel_size); >> >> We knew that the two DRs had the same misalignment at runtime, but when >> considered in isolation, one data reference guaranteed a higher compile-time >> base alignment than the other. >> >> In the test case this looks like a missed opportunity. Both references >> are unconditional, so it should be possible to use the highest of the >> available base alignment guarantees when analyzing each reference. >> The patch does this. >> >> However, as the comment in the patch says, the base alignment guarantees >> provided by a conditional reference only apply if the reference occurs >> at least once. In this case it would be legitimate for two references >> to have the same runtime misalignment and for one reference to provide a >> stronger compile-time guarantee than the other about what the misalignment >> actually is. The patch therefore relaxes the assert to handle that case. > > Hmm, but you don't actually check whether a reference occurs only conditional, > do you? You just seem to say that for masked loads/stores the reference > is conditional (I believe that's not true). But for a loop like > > for (;;) > if (a[i]) > sum += b[j]; > > you still assume b[j] executes unconditionally? > > The vectorizer of course only sees unconditionally executed stmts. > > So - I'd simply not add this DR_IS_CONDITIONAL. Did you run into > any real-world (testsuite) issues without this? + for (int i = 0; i < n; ++i) + ptr[i] = c[i] ? ((struct s *) (ptr - 1))->array[i] : 0; + note that this example shows an if-conversion bug as if-conversion makes the access to ((struct s *) (ptr - 1))->array[i] unconditional if it may not trap (or the trap helper doesn't consider alignment-related traps). I'm not convinced we should worry ... (ok, I didn't say that). Richard. > Note that the assert is to prevent bogus information. Iff we aligned > DR with base alignment 8 and misalign 3 then if another same-align > DR has base alignment 16 we can't simply zero its DR_MISALIGNMENT > as it still can be 8 after aligning DR. > > So I think it's wrong to put DRs with differing base-alignment into > the same-align-refs chain, those should get their DR_MISALIGNMENT > updated independenlty after peeling. > > I'd rather not mix fixing this with the improvement to eventuall use a > larger align for the other DR if possible. > > Thanks, > Richard. > >> Tested on powerpc64-linux-gnu, aarch64-linux-gnu and x86_64-linux-gnu. >> OK to instal? >> >> Richard >> >> >> 2017-06-22 Richard Sandiford <richard.sandiford@linaro.org> >> >> gcc/ >> PR tree-optimization/81136 >> * tree-vectorizer.h: Include tree-hash-traits.h. >> (vec_base_alignments): New typedef. >> (vec_info): Add a base_alignments field. >> (vect_compute_base_alignments: Declare. >> * tree-data-ref.h (data_reference): Add an is_conditional field. >> (DR_IS_CONDITIONAL): New macro. >> (create_data_ref): Add an is_conditional argument. >> * tree-data-ref.c (create_data_ref): Likewise. Use it to initialize >> the is_conditional field. >> (data_ref_loc): Add an is_conditional field. >> (get_references_in_stmt): Set the is_conditional field. >> (find_data_references_in_stmt): Update call to create_data_ref. >> (graphite_find_data_references_in_stmt): Likewise. >> * tree-ssa-loop-prefetch.c (determine_loop_nest_reuse): Likewise. >> * tree-vect-data-refs.c (vect_analyze_data_refs): Likewise. >> (vect_get_base_address): New function. >> (vect_compute_base_alignments): Likewise. >> (vect_compute_base_alignment): Likewise, split out from... >> (vect_compute_data_ref_alignment): ...here. Use precomputed >> base alignments. Only compute a new base alignment here if the >> reference is conditional. >> (vect_update_misalignment_for_peel): Allow the compile-time >> DR_MISALIGNMENTs of two references with the same runtime alignment >> to be different if one of the references is conditional. >> (vect_find_same_alignment_drs): Compare base addresses instead >> of base objects. >> (vect_compute_data_ref_alignment): Call vect_compute_base_alignments. >> * tree-vect-slp.c (vect_slp_analyze_bb_1): Likewise. >> (new_bb_vec_info): Initialize base_alignments. >> * tree-vect-loop.c (new_loop_vec_info): Likewise. >> * tree-vectorizer.c (vect_destroy_datarefs): Release base_alignments. >> >> gcc/testsuite/ >> PR tree-optimization/81136 >> * gcc.dg/vect/pr81136.c: New test. >> >> Index: gcc/tree-vectorizer.h >> =================================================================== >> --- gcc/tree-vectorizer.h 2017-06-08 08:51:43.347264181 +0100 >> +++ gcc/tree-vectorizer.h 2017-06-22 12:23:21.288421018 +0100 >> @@ -22,6 +22,7 @@ Software Foundation; either version 3, o >> #define GCC_TREE_VECTORIZER_H >> >> #include "tree-data-ref.h" >> +#include "tree-hash-traits.h" >> #include "target.h" >> >> /* Used for naming of new temporaries. */ >> @@ -84,6 +85,10 @@ struct stmt_info_for_cost { >> >> typedef vec<stmt_info_for_cost> stmt_vector_for_cost; >> >> +/* Maps base addresses to the largest alignment that we've been able >> + to calculate for them. */ >> +typedef hash_map<tree_operand_hash, unsigned int> vec_base_alignments; >> + >> /************************************************************************ >> SLP >> ************************************************************************/ >> @@ -156,6 +161,10 @@ struct vec_info { >> /* All data references. */ >> vec<data_reference_p> datarefs; >> >> + /* Maps the base addresses of all data references in DATAREFS to the >> + largest alignment that we've been able to calculate for them. */ >> + vec_base_alignments base_alignments; >> + >> /* All data dependences. */ >> vec<ddr_p> ddrs; >> >> @@ -1117,6 +1126,7 @@ extern bool vect_prune_runtime_alias_tes >> extern bool vect_check_gather_scatter (gimple *, loop_vec_info, >> gather_scatter_info *); >> extern bool vect_analyze_data_refs (vec_info *, int *); >> +extern void vect_compute_base_alignments (vec_info *); >> extern tree vect_create_data_ref_ptr (gimple *, tree, struct loop *, tree, >> tree *, gimple_stmt_iterator *, >> gimple **, bool, bool *, >> Index: gcc/tree-data-ref.h >> =================================================================== >> --- gcc/tree-data-ref.h 2017-06-08 08:51:43.349263895 +0100 >> +++ gcc/tree-data-ref.h 2017-06-22 12:23:21.285421180 +0100 >> @@ -119,6 +119,10 @@ struct data_reference >> /* True when the data reference is in RHS of a stmt. */ >> bool is_read; >> >> + /* True when the data reference is conditional, i.e. if it might not >> + occur even when the statement runs to completion. */ >> + bool is_conditional; >> + >> /* Behavior of the memory reference in the innermost loop. */ >> struct innermost_loop_behavior innermost; >> >> @@ -138,6 +142,7 @@ #define DR_ACCESS_FN(DR, I) DR_AC >> #define DR_NUM_DIMENSIONS(DR) DR_ACCESS_FNS (DR).length () >> #define DR_IS_READ(DR) (DR)->is_read >> #define DR_IS_WRITE(DR) (!DR_IS_READ (DR)) >> +#define DR_IS_CONDITIONAL(DR) (DR)->is_conditional >> #define DR_BASE_ADDRESS(DR) (DR)->innermost.base_address >> #define DR_OFFSET(DR) (DR)->innermost.offset >> #define DR_INIT(DR) (DR)->innermost.init >> @@ -350,7 +355,8 @@ extern bool graphite_find_data_reference >> vec<data_reference_p> *); >> tree find_data_references_in_loop (struct loop *, vec<data_reference_p> *); >> bool loop_nest_has_data_refs (loop_p loop); >> -struct data_reference *create_data_ref (loop_p, loop_p, tree, gimple *, bool); >> +struct data_reference *create_data_ref (loop_p, loop_p, tree, gimple *, bool, >> + bool); >> extern bool find_loop_nest (struct loop *, vec<loop_p> *); >> extern struct data_dependence_relation *initialize_data_dependence_relation >> (struct data_reference *, struct data_reference *, vec<loop_p>); >> Index: gcc/tree-data-ref.c >> =================================================================== >> --- gcc/tree-data-ref.c 2017-06-08 08:51:43.349263895 +0100 >> +++ gcc/tree-data-ref.c 2017-06-22 12:23:21.284421233 +0100 >> @@ -1053,15 +1053,18 @@ free_data_ref (data_reference_p dr) >> free (dr); >> } >> >> -/* Analyzes memory reference MEMREF accessed in STMT. The reference >> - is read if IS_READ is true, write otherwise. Returns the >> - data_reference description of MEMREF. NEST is the outermost loop >> - in which the reference should be instantiated, LOOP is the loop in >> - which the data reference should be analyzed. */ >> +/* Analyze memory reference MEMREF, which is accessed in STMT. The reference >> + is a read if IS_READ is true, otherwise it is a write. IS_CONDITIONAL >> + indicates that the reference is conditional, i.e. that it might not >> + occur every time that STMT runs to completion. >> + >> + Return the data_reference description of MEMREF. NEST is the outermost >> + loop in which the reference should be instantiated, LOOP is the loop >> + in which the data reference should be analyzed. */ >> >> struct data_reference * >> create_data_ref (loop_p nest, loop_p loop, tree memref, gimple *stmt, >> - bool is_read) >> + bool is_read, bool is_conditional) >> { >> struct data_reference *dr; >> >> @@ -1076,6 +1079,7 @@ create_data_ref (loop_p nest, loop_p loo >> DR_STMT (dr) = stmt; >> DR_REF (dr) = memref; >> DR_IS_READ (dr) = is_read; >> + DR_IS_CONDITIONAL (dr) = is_conditional; >> >> dr_analyze_innermost (dr, nest); >> dr_analyze_indices (dr, nest, loop); >> @@ -4446,6 +4450,10 @@ struct data_ref_loc >> >> /* True if the memory reference is read. */ >> bool is_read; >> + >> + /* True if the data reference is conditional, i.e. if it might not >> + occur even when the statement runs to completion. */ >> + bool is_conditional; >> }; >> >> >> @@ -4512,6 +4520,7 @@ get_references_in_stmt (gimple *stmt, ve >> { >> ref.ref = op1; >> ref.is_read = true; >> + ref.is_conditional = false; >> references->safe_push (ref); >> } >> } >> @@ -4539,6 +4548,7 @@ get_references_in_stmt (gimple *stmt, ve >> type = TREE_TYPE (gimple_call_arg (stmt, 3)); >> if (TYPE_ALIGN (type) != align) >> type = build_aligned_type (type, align); >> + ref.is_conditional = true; >> ref.ref = fold_build2 (MEM_REF, type, gimple_call_arg (stmt, 0), >> ptr); >> references->safe_push (ref); >> @@ -4558,6 +4568,7 @@ get_references_in_stmt (gimple *stmt, ve >> { >> ref.ref = op1; >> ref.is_read = true; >> + ref.is_conditional = false; >> references->safe_push (ref); >> } >> } >> @@ -4571,6 +4582,7 @@ get_references_in_stmt (gimple *stmt, ve >> { >> ref.ref = op0; >> ref.is_read = false; >> + ref.is_conditional = false; >> references->safe_push (ref); >> } >> return clobbers_memory; >> @@ -4635,8 +4647,8 @@ find_data_references_in_stmt (struct loo >> >> FOR_EACH_VEC_ELT (references, i, ref) >> { >> - dr = create_data_ref (nest, loop_containing_stmt (stmt), >> - ref->ref, stmt, ref->is_read); >> + dr = create_data_ref (nest, loop_containing_stmt (stmt), ref->ref, >> + stmt, ref->is_read, ref->is_conditional); >> gcc_assert (dr != NULL); >> datarefs->safe_push (dr); >> } >> @@ -4665,7 +4677,8 @@ graphite_find_data_references_in_stmt (l >> >> FOR_EACH_VEC_ELT (references, i, ref) >> { >> - dr = create_data_ref (nest, loop, ref->ref, stmt, ref->is_read); >> + dr = create_data_ref (nest, loop, ref->ref, stmt, ref->is_read, >> + ref->is_conditional); >> gcc_assert (dr != NULL); >> datarefs->safe_push (dr); >> } >> Index: gcc/tree-ssa-loop-prefetch.c >> =================================================================== >> --- gcc/tree-ssa-loop-prefetch.c 2017-06-07 21:58:55.928557601 +0100 >> +++ gcc/tree-ssa-loop-prefetch.c 2017-06-22 12:23:21.285421180 +0100 >> @@ -1633,7 +1633,7 @@ determine_loop_nest_reuse (struct loop * >> for (ref = gr->refs; ref; ref = ref->next) >> { >> dr = create_data_ref (nest, loop_containing_stmt (ref->stmt), >> - ref->mem, ref->stmt, !ref->write_p); >> + ref->mem, ref->stmt, !ref->write_p, false); >> >> if (dr) >> { >> Index: gcc/tree-vect-data-refs.c >> =================================================================== >> --- gcc/tree-vect-data-refs.c 2017-06-08 08:51:43.350263752 +0100 >> +++ gcc/tree-vect-data-refs.c 2017-06-22 12:23:21.286421126 +0100 >> @@ -646,6 +646,102 @@ vect_slp_analyze_instance_dependence (sl >> return res; >> } >> >> +/* If DR is nested in a loop that is being vectorized, return the base >> + address in the context of the vectorized loop (rather than the >> + nested loop). Otherwise return the base address in the context >> + of the containing statement. */ >> + >> +static tree >> +vect_get_base_address (data_reference *dr) >> +{ >> + gimple *stmt = DR_STMT (dr); >> + stmt_vec_info stmt_info = vinfo_for_stmt (stmt); >> + loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info); >> + struct loop *loop = loop_vinfo != NULL ? LOOP_VINFO_LOOP (loop_vinfo) : NULL; >> + if (loop && nested_in_vect_loop_p (loop, stmt)) >> + return STMT_VINFO_DR_BASE_ADDRESS (stmt_info); >> + else >> + return DR_BASE_ADDRESS (dr); >> +} >> + >> +/* Compute and return the alignment of base address BASE_ADDR in DR. */ >> + >> +static unsigned int >> +vect_compute_base_alignment (data_reference *dr, tree base_addr) >> +{ >> + /* To look at the alignment of the base we have to preserve an inner >> + MEM_REF as that carries the alignment information of the actual >> + access. */ >> + tree base = DR_REF (dr); >> + while (handled_component_p (base)) >> + base = TREE_OPERAND (base, 0); >> + unsigned int base_alignment = 0; >> + unsigned HOST_WIDE_INT base_bitpos; >> + get_object_alignment_1 (base, &base_alignment, &base_bitpos); >> + >> + /* As data-ref analysis strips the MEM_REF down to its base operand >> + to form DR_BASE_ADDRESS and adds the offset to DR_INIT we have to >> + adjust things to make base_alignment valid as the alignment of >> + DR_BASE_ADDRESS. */ >> + if (TREE_CODE (base) == MEM_REF) >> + { >> + /* Note all this only works if DR_BASE_ADDRESS is the same as >> + MEM_REF operand zero, otherwise DR/SCEV analysis might have factored >> + in other offsets. We need to rework DR to compute the alingment >> + of DR_BASE_ADDRESS as long as all information is still available. */ >> + if (operand_equal_p (TREE_OPERAND (base, 0), base_addr, 0)) >> + { >> + base_bitpos -= mem_ref_offset (base).to_short_addr () * BITS_PER_UNIT; >> + base_bitpos &= (base_alignment - 1); >> + } >> + else >> + base_bitpos = BITS_PER_UNIT; >> + } >> + if (base_bitpos != 0) >> + base_alignment = base_bitpos & -base_bitpos; >> + >> + /* Also look at the alignment of the base address DR analysis >> + computed. */ >> + unsigned int base_addr_alignment = get_pointer_alignment (base_addr); >> + if (base_addr_alignment > base_alignment) >> + base_alignment = base_addr_alignment; >> + >> + return base_alignment; >> +} >> + >> +/* Compute alignments for the base addresses of all datarefs in VINFO. */ >> + >> +void >> +vect_compute_base_alignments (vec_info *vinfo) >> +{ >> + /* If the region we're going to vectorize is reached, all unconditional >> + data references occur at least once. We can therefore pool the base >> + alignment guarantees from each unconditional reference. */ >> + data_reference *dr; >> + unsigned int i; >> + FOR_EACH_VEC_ELT (vinfo->datarefs, i, dr) >> + if (!DR_IS_CONDITIONAL (dr)) >> + { >> + tree base_addr = vect_get_base_address (dr); >> + unsigned int alignment = vect_compute_base_alignment (dr, base_addr); >> + bool existed; >> + unsigned int &entry >> + = vinfo->base_alignments.get_or_insert (base_addr, &existed); >> + if (!existed || entry < alignment) >> + { >> + entry = alignment; >> + if (dump_enabled_p ()) >> + { >> + dump_printf_loc (MSG_NOTE, vect_location, >> + "setting base alignment for "); >> + dump_generic_expr (MSG_NOTE, TDF_SLIM, base_addr); >> + dump_printf (MSG_NOTE, " to %d, based on ", alignment); >> + dump_gimple_stmt (MSG_NOTE, TDF_SLIM, DR_STMT (dr), 0); >> + } >> + } >> + } >> +} >> + >> /* Function vect_compute_data_ref_alignment >> >> Compute the misalignment of the data reference DR. >> @@ -663,6 +759,7 @@ vect_compute_data_ref_alignment (struct >> { >> gimple *stmt = DR_STMT (dr); >> stmt_vec_info stmt_info = vinfo_for_stmt (stmt); >> + vec_base_alignments *base_alignments = &stmt_info->vinfo->base_alignments; >> loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info); >> struct loop *loop = NULL; >> tree ref = DR_REF (dr); >> @@ -699,6 +796,8 @@ vect_compute_data_ref_alignment (struct >> { >> tree step = DR_STEP (dr); >> >> + base_addr = STMT_VINFO_DR_BASE_ADDRESS (stmt_info); >> + aligned_to = STMT_VINFO_DR_ALIGNED_TO (stmt_info); >> if (tree_fits_shwi_p (step) >> && tree_to_shwi (step) % GET_MODE_SIZE (TYPE_MODE (vectype)) == 0) >> { >> @@ -706,8 +805,6 @@ vect_compute_data_ref_alignment (struct >> dump_printf_loc (MSG_NOTE, vect_location, >> "inner step divides the vector-size.\n"); >> misalign = STMT_VINFO_DR_INIT (stmt_info); >> - aligned_to = STMT_VINFO_DR_ALIGNED_TO (stmt_info); >> - base_addr = STMT_VINFO_DR_BASE_ADDRESS (stmt_info); >> } >> else >> { >> @@ -738,39 +835,15 @@ vect_compute_data_ref_alignment (struct >> } >> } >> >> - /* To look at alignment of the base we have to preserve an inner MEM_REF >> - as that carries alignment information of the actual access. */ >> - base = ref; >> - while (handled_component_p (base)) >> - base = TREE_OPERAND (base, 0); >> + /* Calculate the maximum of the pooled base address alignment and the >> + alignment that we can compute for DR itself. The latter should >> + already be included in the former for unconditional references. */ >> unsigned int base_alignment = 0; >> - unsigned HOST_WIDE_INT base_bitpos; >> - get_object_alignment_1 (base, &base_alignment, &base_bitpos); >> - /* As data-ref analysis strips the MEM_REF down to its base operand >> - to form DR_BASE_ADDRESS and adds the offset to DR_INIT we have to >> - adjust things to make base_alignment valid as the alignment of >> - DR_BASE_ADDRESS. */ >> - if (TREE_CODE (base) == MEM_REF) >> - { >> - /* Note all this only works if DR_BASE_ADDRESS is the same as >> - MEM_REF operand zero, otherwise DR/SCEV analysis might have factored >> - in other offsets. We need to rework DR to compute the alingment >> - of DR_BASE_ADDRESS as long as all information is still available. */ >> - if (operand_equal_p (TREE_OPERAND (base, 0), base_addr, 0)) >> - { >> - base_bitpos -= mem_ref_offset (base).to_short_addr () * BITS_PER_UNIT; >> - base_bitpos &= (base_alignment - 1); >> - } >> - else >> - base_bitpos = BITS_PER_UNIT; >> - } >> - if (base_bitpos != 0) >> - base_alignment = base_bitpos & -base_bitpos; >> - /* Also look at the alignment of the base address DR analysis >> - computed. */ >> - unsigned int base_addr_alignment = get_pointer_alignment (base_addr); >> - if (base_addr_alignment > base_alignment) >> - base_alignment = base_addr_alignment; >> + if (DR_IS_CONDITIONAL (dr)) >> + base_alignment = vect_compute_base_alignment (dr, base_addr); >> + if (unsigned int *entry = base_alignments->get (base_addr)) >> + base_alignment = MAX (base_alignment, *entry); >> + gcc_assert (base_alignment != 0); >> >> if (base_alignment >= TYPE_ALIGN (TREE_TYPE (vectype))) >> DR_VECT_AUX (dr)->base_element_aligned = true; >> @@ -906,8 +979,29 @@ vect_update_misalignment_for_peel (struc >> { >> if (current_dr != dr) >> continue; >> - gcc_assert (DR_MISALIGNMENT (dr) / dr_size == >> - DR_MISALIGNMENT (dr_peel) / dr_peel_size); >> + /* Any alignment guarantees provided by a reference only apply if >> + the reference actually occurs. For example, in: >> + >> + struct s __attribute__((aligned(32))) { >> + int misaligner; >> + int array[N]; >> + }; >> + >> + int *ptr; >> + for (int i = 0; i < n; ++i) >> + ptr[i] = c[i] ? ((struct s *) (ptr - 1))->array[i] : 0; >> + >> + we can only assume that ptr is part of a struct s if at least one >> + c[i] is true. This in turn means that we have a higher base >> + alignment guarantee for the read from ptr (if it occurs) than for >> + the write to ptr, and we cannot unconditionally carry the former >> + over to the latter. We still know that the two address values >> + have the same misalignment, so if peeling has forced one of them >> + to be aligned, the other must be too. */ >> + gcc_assert (DR_IS_CONDITIONAL (dr_peel) >> + || DR_IS_CONDITIONAL (dr) >> + || (DR_MISALIGNMENT (dr) / dr_size >> + == DR_MISALIGNMENT (dr_peel) / dr_peel_size)); >> SET_DR_MISALIGNMENT (dr, 0); >> return; >> } >> @@ -2117,8 +2211,7 @@ vect_find_same_alignment_drs (struct dat >> if (dra == drb) >> return; >> >> - if (!operand_equal_p (DR_BASE_OBJECT (dra), DR_BASE_OBJECT (drb), >> - OEP_ADDRESS_OF) >> + if (!operand_equal_p (DR_BASE_ADDRESS (dra), DR_BASE_ADDRESS (drb), 0) >> || !operand_equal_p (DR_OFFSET (dra), DR_OFFSET (drb), 0) >> || !operand_equal_p (DR_STEP (dra), DR_STEP (drb), 0)) >> return; >> @@ -2176,6 +2269,7 @@ vect_analyze_data_refs_alignment (loop_v >> vec<data_reference_p> datarefs = vinfo->datarefs; >> struct data_reference *dr; >> >> + vect_compute_base_alignments (vinfo); >> FOR_EACH_VEC_ELT (datarefs, i, dr) >> { >> stmt_vec_info stmt_info = vinfo_for_stmt (DR_STMT (dr)); >> @@ -3374,7 +3468,8 @@ vect_analyze_data_refs (vec_info *vinfo, >> { >> struct data_reference *newdr >> = create_data_ref (NULL, loop_containing_stmt (stmt), >> - DR_REF (dr), stmt, maybe_scatter ? false : true); >> + DR_REF (dr), stmt, !maybe_scatter, >> + DR_IS_CONDITIONAL (dr)); >> gcc_assert (newdr != NULL && DR_REF (newdr)); >> if (DR_BASE_ADDRESS (newdr) >> && DR_OFFSET (newdr) >> Index: gcc/tree-vect-slp.c >> =================================================================== >> --- gcc/tree-vect-slp.c 2017-06-07 21:58:56.336475882 +0100 >> +++ gcc/tree-vect-slp.c 2017-06-22 12:23:21.288421018 +0100 >> @@ -2367,6 +2367,7 @@ new_bb_vec_info (gimple_stmt_iterator re >> gimple_stmt_iterator gsi; >> >> res = (bb_vec_info) xcalloc (1, sizeof (struct _bb_vec_info)); >> + new (&res->base_alignments) vec_base_alignments (); >> res->kind = vec_info::bb; >> BB_VINFO_BB (res) = bb; >> res->region_begin = region_begin; >> @@ -2741,6 +2742,8 @@ vect_slp_analyze_bb_1 (gimple_stmt_itera >> return NULL; >> } >> >> + vect_compute_base_alignments (bb_vinfo); >> + >> /* Analyze and verify the alignment of data references and the >> dependence in the SLP instances. */ >> for (i = 0; BB_VINFO_SLP_INSTANCES (bb_vinfo).iterate (i, &instance); ) >> Index: gcc/tree-vect-loop.c >> =================================================================== >> --- gcc/tree-vect-loop.c 2017-06-22 12:22:57.734313143 +0100 >> +++ gcc/tree-vect-loop.c 2017-06-22 12:23:21.287421072 +0100 >> @@ -1157,6 +1157,7 @@ new_loop_vec_info (struct loop *loop) >> LOOP_VINFO_VECT_FACTOR (res) = 0; >> LOOP_VINFO_LOOP_NEST (res) = vNULL; >> LOOP_VINFO_DATAREFS (res) = vNULL; >> + new (&res->base_alignments) vec_base_alignments (); >> LOOP_VINFO_DDRS (res) = vNULL; >> LOOP_VINFO_UNALIGNED_DR (res) = NULL; >> LOOP_VINFO_MAY_MISALIGN_STMTS (res) = vNULL; >> Index: gcc/tree-vectorizer.c >> =================================================================== >> --- gcc/tree-vectorizer.c 2017-06-22 12:22:57.732313220 +0100 >> +++ gcc/tree-vectorizer.c 2017-06-22 12:23:21.288421018 +0100 >> @@ -370,6 +370,8 @@ vect_destroy_datarefs (vec_info *vinfo) >> } >> >> free_data_refs (vinfo->datarefs); >> + >> + vinfo->base_alignments.~vec_base_alignments (); >> } >> >> /* A helper function to free scev and LOOP niter information, as well as >> Index: gcc/testsuite/gcc.dg/vect/pr81136.c >> =================================================================== >> --- /dev/null 2017-06-22 07:43:14.805493307 +0100 >> +++ gcc/testsuite/gcc.dg/vect/pr81136.c 2017-06-22 12:23:21.283421287 +0100 >> @@ -0,0 +1,16 @@ >> +/* { dg-do compile } */ >> + >> +struct __attribute__((aligned (32))) >> +{ >> + char misaligner; >> + int foo[100]; >> + int bar[100]; >> +} *a; >> + >> +void >> +fn1 (int n) >> +{ >> + int *b = a->foo; >> + for (int i = 0; i < n; i++) >> + a->bar[i] = b[i]; >> +}
Richard Biener <richard.guenther@gmail.com> writes: > On Thu, Jun 22, 2017 at 1:30 PM, Richard Sandiford > <richard.sandiford@linaro.org> wrote: >> The test case triggered this assert in vect_update_misalignment_for_peel: >> >> gcc_assert (DR_MISALIGNMENT (dr) / dr_size == >> DR_MISALIGNMENT (dr_peel) / dr_peel_size); >> >> We knew that the two DRs had the same misalignment at runtime, but when >> considered in isolation, one data reference guaranteed a higher compile-time >> base alignment than the other. >> >> In the test case this looks like a missed opportunity. Both references >> are unconditional, so it should be possible to use the highest of the >> available base alignment guarantees when analyzing each reference. >> The patch does this. >> >> However, as the comment in the patch says, the base alignment guarantees >> provided by a conditional reference only apply if the reference occurs >> at least once. In this case it would be legitimate for two references >> to have the same runtime misalignment and for one reference to provide a >> stronger compile-time guarantee than the other about what the misalignment >> actually is. The patch therefore relaxes the assert to handle that case. > > Hmm, but you don't actually check whether a reference occurs only conditional, > do you? You just seem to say that for masked loads/stores the reference > is conditional (I believe that's not true). But for a loop like > > for (;;) > if (a[i]) > sum += b[j]; > > you still assume b[j] executes unconditionally? Maybe the documentation isn't clear enough, but DR_IS_CONDITIONAL was supposed to mean "even if the containing statement executes and runs to completion, the reference might not actually occur". The example above isn't conditional in that sense because the reference to b[j] does occur if the store is reached and completes. Masked loads and stores are conditional in that sense though. The reference only occurs if the mask is nonzero; the memory isn't touched otherwise. The functions are used to if-convert things like: for (...) a[i] = b[i] ? c[i] : d[i]; where there's no guarantee that it's safe to access c[i] when !b[i] (or d[i] when b[i]). No reference occurs for an all-false mask. > The vectorizer of course only sees unconditionally executed stmts. > > So - I'd simply not add this DR_IS_CONDITIONAL. Did you run into > any real-world (testsuite) issues without this? Dropping DR_IS_CONDITIONAL would cause us to make invalid alignment assumptions in silly corner cases. I could add a scan test for it, for targets with masked loads and stores. It wouldn't trigger an execution failure though because we assume that targets with masked loads and stores allow unaligned accesses: /* For now assume all conditional loads/stores support unaligned access without any special code. */ if (is_gimple_call (stmt) && gimple_call_internal_p (stmt) && (gimple_call_internal_fn (stmt) == IFN_MASK_LOAD || gimple_call_internal_fn (stmt) == IFN_MASK_STORE)) return dr_unaligned_supported; So the worst that would happen is that we'd supposedly peel for alignment, but actually misalign everything instead, and so make things slower rather than quicker. > Note that the assert is to prevent bogus information. Iff we aligned > DR with base alignment 8 and misalign 3 then if another same-align > DR has base alignment 16 we can't simply zero its DR_MISALIGNMENT > as it still can be 8 after aligning DR. > > So I think it's wrong to put DRs with differing base-alignment into > the same-align-refs chain, those should get their DR_MISALIGNMENT > updated independenlty after peeling. DR_MISALIGNMENT is relative to the vector alignment rather than the base alignment though. So: a) when looking for references *A1 and *A2 with the same alignment, we simply have to prove that A1 % vecalign == A2 % vecalign. This doesn't require any knowledge about the base alignment. If we break the addresses down as: A1 = BASE1 + REST1, REST1 = INIT1 + OFFSET1 + X * STEP1 A2 = BASE2 + REST2, REST2 = INIT2 + OFFSET2 + X * STEP2 and can prove that BASE1 == BASE2, the alignment of that base isn't important. We simply need to prove that REST1 % vecalign == REST2 % vecalign for all X. b) In the assert, we've peeled the loop so that DR_PEEL is guaranteed to be vector-aligned. If DR_PEEL is A1 in the example above, we have A1 % vecalign == 0, so A2 % vecalign must be 0 too. This again doesn't rely on the base alignment being known. What a high base alignment for DR_PEEL gives us is the ability to know at compile how many iterations need to be peeled to make DR_PEEL aligned. But the points above apply regardless of whether we know that value at compile time or not. In examples like the test case, we would have known at compile time that VF-1 iterations would need to be peeled if we'd picked the store as the DR_PEEL, but would have treated the number of peels as variable if we'd picked the load. The value calculated at runtime would still have been VF-1, it's just that the code wouldn't have been as efficient. One of the benefits of pooling the alignments for unconditional references is that it no longer matters which DR we pick: the number of peels will be a compile-time constant both ways. Thanks, Richard > I'd rather not mix fixing this with the improvement to eventuall use a > larger align for the other DR if possible. > > Thanks, > Richard. > >> Tested on powerpc64-linux-gnu, aarch64-linux-gnu and x86_64-linux-gnu. >> OK to instal? >> >> Richard >> >> >> 2017-06-22 Richard Sandiford <richard.sandiford@linaro.org> >> >> gcc/ >> PR tree-optimization/81136 >> * tree-vectorizer.h: Include tree-hash-traits.h. >> (vec_base_alignments): New typedef. >> (vec_info): Add a base_alignments field. >> (vect_compute_base_alignments: Declare. >> * tree-data-ref.h (data_reference): Add an is_conditional field. >> (DR_IS_CONDITIONAL): New macro. >> (create_data_ref): Add an is_conditional argument. >> * tree-data-ref.c (create_data_ref): Likewise. Use it to initialize >> the is_conditional field. >> (data_ref_loc): Add an is_conditional field. >> (get_references_in_stmt): Set the is_conditional field. >> (find_data_references_in_stmt): Update call to create_data_ref. >> (graphite_find_data_references_in_stmt): Likewise. >> * tree-ssa-loop-prefetch.c (determine_loop_nest_reuse): Likewise. >> * tree-vect-data-refs.c (vect_analyze_data_refs): Likewise. >> (vect_get_base_address): New function. >> (vect_compute_base_alignments): Likewise. >> (vect_compute_base_alignment): Likewise, split out from... >> (vect_compute_data_ref_alignment): ...here. Use precomputed >> base alignments. Only compute a new base alignment here if the >> reference is conditional. >> (vect_update_misalignment_for_peel): Allow the compile-time >> DR_MISALIGNMENTs of two references with the same runtime alignment >> to be different if one of the references is conditional. >> (vect_find_same_alignment_drs): Compare base addresses instead >> of base objects. >> (vect_compute_data_ref_alignment): Call vect_compute_base_alignments. >> * tree-vect-slp.c (vect_slp_analyze_bb_1): Likewise. >> (new_bb_vec_info): Initialize base_alignments. >> * tree-vect-loop.c (new_loop_vec_info): Likewise. >> * tree-vectorizer.c (vect_destroy_datarefs): Release base_alignments. >> >> gcc/testsuite/ >> PR tree-optimization/81136 >> * gcc.dg/vect/pr81136.c: New test. >> >> Index: gcc/tree-vectorizer.h >> =================================================================== >> --- gcc/tree-vectorizer.h 2017-06-08 08:51:43.347264181 +0100 >> +++ gcc/tree-vectorizer.h 2017-06-22 12:23:21.288421018 +0100 >> @@ -22,6 +22,7 @@ Software Foundation; either version 3, o >> #define GCC_TREE_VECTORIZER_H >> >> #include "tree-data-ref.h" >> +#include "tree-hash-traits.h" >> #include "target.h" >> >> /* Used for naming of new temporaries. */ >> @@ -84,6 +85,10 @@ struct stmt_info_for_cost { >> >> typedef vec<stmt_info_for_cost> stmt_vector_for_cost; >> >> +/* Maps base addresses to the largest alignment that we've been able >> + to calculate for them. */ >> +typedef hash_map<tree_operand_hash, unsigned int> vec_base_alignments; >> + >> /************************************************************************ >> SLP >> ************************************************************************/ >> @@ -156,6 +161,10 @@ struct vec_info { >> /* All data references. */ >> vec<data_reference_p> datarefs; >> >> + /* Maps the base addresses of all data references in DATAREFS to the >> + largest alignment that we've been able to calculate for them. */ >> + vec_base_alignments base_alignments; >> + >> /* All data dependences. */ >> vec<ddr_p> ddrs; >> >> @@ -1117,6 +1126,7 @@ extern bool vect_prune_runtime_alias_tes >> extern bool vect_check_gather_scatter (gimple *, loop_vec_info, >> gather_scatter_info *); >> extern bool vect_analyze_data_refs (vec_info *, int *); >> +extern void vect_compute_base_alignments (vec_info *); >> extern tree vect_create_data_ref_ptr (gimple *, tree, struct loop *, tree, >> tree *, gimple_stmt_iterator *, >> gimple **, bool, bool *, >> Index: gcc/tree-data-ref.h >> =================================================================== >> --- gcc/tree-data-ref.h 2017-06-08 08:51:43.349263895 +0100 >> +++ gcc/tree-data-ref.h 2017-06-22 12:23:21.285421180 +0100 >> @@ -119,6 +119,10 @@ struct data_reference >> /* True when the data reference is in RHS of a stmt. */ >> bool is_read; >> >> + /* True when the data reference is conditional, i.e. if it might not >> + occur even when the statement runs to completion. */ >> + bool is_conditional; >> + >> /* Behavior of the memory reference in the innermost loop. */ >> struct innermost_loop_behavior innermost; >> >> @@ -138,6 +142,7 @@ #define DR_ACCESS_FN(DR, I) DR_AC >> #define DR_NUM_DIMENSIONS(DR) DR_ACCESS_FNS (DR).length () >> #define DR_IS_READ(DR) (DR)->is_read >> #define DR_IS_WRITE(DR) (!DR_IS_READ (DR)) >> +#define DR_IS_CONDITIONAL(DR) (DR)->is_conditional >> #define DR_BASE_ADDRESS(DR) (DR)->innermost.base_address >> #define DR_OFFSET(DR) (DR)->innermost.offset >> #define DR_INIT(DR) (DR)->innermost.init >> @@ -350,7 +355,8 @@ extern bool graphite_find_data_reference >> vec<data_reference_p> *); >> tree find_data_references_in_loop (struct loop *, vec<data_reference_p> *); >> bool loop_nest_has_data_refs (loop_p loop); >> -struct data_reference *create_data_ref (loop_p, loop_p, tree, gimple *, bool); >> +struct data_reference *create_data_ref (loop_p, loop_p, tree, gimple *, bool, >> + bool); >> extern bool find_loop_nest (struct loop *, vec<loop_p> *); >> extern struct data_dependence_relation *initialize_data_dependence_relation >> (struct data_reference *, struct data_reference *, vec<loop_p>); >> Index: gcc/tree-data-ref.c >> =================================================================== >> --- gcc/tree-data-ref.c 2017-06-08 08:51:43.349263895 +0100 >> +++ gcc/tree-data-ref.c 2017-06-22 12:23:21.284421233 +0100 >> @@ -1053,15 +1053,18 @@ free_data_ref (data_reference_p dr) >> free (dr); >> } >> >> -/* Analyzes memory reference MEMREF accessed in STMT. The reference >> - is read if IS_READ is true, write otherwise. Returns the >> - data_reference description of MEMREF. NEST is the outermost loop >> - in which the reference should be instantiated, LOOP is the loop in >> - which the data reference should be analyzed. */ >> +/* Analyze memory reference MEMREF, which is accessed in STMT. The reference >> + is a read if IS_READ is true, otherwise it is a write. IS_CONDITIONAL >> + indicates that the reference is conditional, i.e. that it might not >> + occur every time that STMT runs to completion. >> + >> + Return the data_reference description of MEMREF. NEST is the outermost >> + loop in which the reference should be instantiated, LOOP is the loop >> + in which the data reference should be analyzed. */ >> >> struct data_reference * >> create_data_ref (loop_p nest, loop_p loop, tree memref, gimple *stmt, >> - bool is_read) >> + bool is_read, bool is_conditional) >> { >> struct data_reference *dr; >> >> @@ -1076,6 +1079,7 @@ create_data_ref (loop_p nest, loop_p loo >> DR_STMT (dr) = stmt; >> DR_REF (dr) = memref; >> DR_IS_READ (dr) = is_read; >> + DR_IS_CONDITIONAL (dr) = is_conditional; >> >> dr_analyze_innermost (dr, nest); >> dr_analyze_indices (dr, nest, loop); >> @@ -4446,6 +4450,10 @@ struct data_ref_loc >> >> /* True if the memory reference is read. */ >> bool is_read; >> + >> + /* True if the data reference is conditional, i.e. if it might not >> + occur even when the statement runs to completion. */ >> + bool is_conditional; >> }; >> >> >> @@ -4512,6 +4520,7 @@ get_references_in_stmt (gimple *stmt, ve >> { >> ref.ref = op1; >> ref.is_read = true; >> + ref.is_conditional = false; >> references->safe_push (ref); >> } >> } >> @@ -4539,6 +4548,7 @@ get_references_in_stmt (gimple *stmt, ve >> type = TREE_TYPE (gimple_call_arg (stmt, 3)); >> if (TYPE_ALIGN (type) != align) >> type = build_aligned_type (type, align); >> + ref.is_conditional = true; >> ref.ref = fold_build2 (MEM_REF, type, gimple_call_arg (stmt, 0), >> ptr); >> references->safe_push (ref); >> @@ -4558,6 +4568,7 @@ get_references_in_stmt (gimple *stmt, ve >> { >> ref.ref = op1; >> ref.is_read = true; >> + ref.is_conditional = false; >> references->safe_push (ref); >> } >> } >> @@ -4571,6 +4582,7 @@ get_references_in_stmt (gimple *stmt, ve >> { >> ref.ref = op0; >> ref.is_read = false; >> + ref.is_conditional = false; >> references->safe_push (ref); >> } >> return clobbers_memory; >> @@ -4635,8 +4647,8 @@ find_data_references_in_stmt (struct loo >> >> FOR_EACH_VEC_ELT (references, i, ref) >> { >> - dr = create_data_ref (nest, loop_containing_stmt (stmt), >> - ref->ref, stmt, ref->is_read); >> + dr = create_data_ref (nest, loop_containing_stmt (stmt), ref->ref, >> + stmt, ref->is_read, ref->is_conditional); >> gcc_assert (dr != NULL); >> datarefs->safe_push (dr); >> } >> @@ -4665,7 +4677,8 @@ graphite_find_data_references_in_stmt (l >> >> FOR_EACH_VEC_ELT (references, i, ref) >> { >> - dr = create_data_ref (nest, loop, ref->ref, stmt, ref->is_read); >> + dr = create_data_ref (nest, loop, ref->ref, stmt, ref->is_read, >> + ref->is_conditional); >> gcc_assert (dr != NULL); >> datarefs->safe_push (dr); >> } >> Index: gcc/tree-ssa-loop-prefetch.c >> =================================================================== >> --- gcc/tree-ssa-loop-prefetch.c 2017-06-07 21:58:55.928557601 +0100 >> +++ gcc/tree-ssa-loop-prefetch.c 2017-06-22 12:23:21.285421180 +0100 >> @@ -1633,7 +1633,7 @@ determine_loop_nest_reuse (struct loop * >> for (ref = gr->refs; ref; ref = ref->next) >> { >> dr = create_data_ref (nest, loop_containing_stmt (ref->stmt), >> - ref->mem, ref->stmt, !ref->write_p); >> + ref->mem, ref->stmt, !ref->write_p, false); >> >> if (dr) >> { >> Index: gcc/tree-vect-data-refs.c >> =================================================================== >> --- gcc/tree-vect-data-refs.c 2017-06-08 08:51:43.350263752 +0100 >> +++ gcc/tree-vect-data-refs.c 2017-06-22 12:23:21.286421126 +0100 >> @@ -646,6 +646,102 @@ vect_slp_analyze_instance_dependence (sl >> return res; >> } >> >> +/* If DR is nested in a loop that is being vectorized, return the base >> + address in the context of the vectorized loop (rather than the >> + nested loop). Otherwise return the base address in the context >> + of the containing statement. */ >> + >> +static tree >> +vect_get_base_address (data_reference *dr) >> +{ >> + gimple *stmt = DR_STMT (dr); >> + stmt_vec_info stmt_info = vinfo_for_stmt (stmt); >> + loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info); >> + struct loop *loop = loop_vinfo != NULL ? LOOP_VINFO_LOOP (loop_vinfo) : NULL; >> + if (loop && nested_in_vect_loop_p (loop, stmt)) >> + return STMT_VINFO_DR_BASE_ADDRESS (stmt_info); >> + else >> + return DR_BASE_ADDRESS (dr); >> +} >> + >> +/* Compute and return the alignment of base address BASE_ADDR in DR. */ >> + >> +static unsigned int >> +vect_compute_base_alignment (data_reference *dr, tree base_addr) >> +{ >> + /* To look at the alignment of the base we have to preserve an inner >> + MEM_REF as that carries the alignment information of the actual >> + access. */ >> + tree base = DR_REF (dr); >> + while (handled_component_p (base)) >> + base = TREE_OPERAND (base, 0); >> + unsigned int base_alignment = 0; >> + unsigned HOST_WIDE_INT base_bitpos; >> + get_object_alignment_1 (base, &base_alignment, &base_bitpos); >> + >> + /* As data-ref analysis strips the MEM_REF down to its base operand >> + to form DR_BASE_ADDRESS and adds the offset to DR_INIT we have to >> + adjust things to make base_alignment valid as the alignment of >> + DR_BASE_ADDRESS. */ >> + if (TREE_CODE (base) == MEM_REF) >> + { >> + /* Note all this only works if DR_BASE_ADDRESS is the same as >> + MEM_REF operand zero, otherwise DR/SCEV analysis might have factored >> + in other offsets. We need to rework DR to compute the alingment >> + of DR_BASE_ADDRESS as long as all information is still available. */ >> + if (operand_equal_p (TREE_OPERAND (base, 0), base_addr, 0)) >> + { >> + base_bitpos -= mem_ref_offset (base).to_short_addr () * BITS_PER_UNIT; >> + base_bitpos &= (base_alignment - 1); >> + } >> + else >> + base_bitpos = BITS_PER_UNIT; >> + } >> + if (base_bitpos != 0) >> + base_alignment = base_bitpos & -base_bitpos; >> + >> + /* Also look at the alignment of the base address DR analysis >> + computed. */ >> + unsigned int base_addr_alignment = get_pointer_alignment (base_addr); >> + if (base_addr_alignment > base_alignment) >> + base_alignment = base_addr_alignment; >> + >> + return base_alignment; >> +} >> + >> +/* Compute alignments for the base addresses of all datarefs in VINFO. */ >> + >> +void >> +vect_compute_base_alignments (vec_info *vinfo) >> +{ >> + /* If the region we're going to vectorize is reached, all unconditional >> + data references occur at least once. We can therefore pool the base >> + alignment guarantees from each unconditional reference. */ >> + data_reference *dr; >> + unsigned int i; >> + FOR_EACH_VEC_ELT (vinfo->datarefs, i, dr) >> + if (!DR_IS_CONDITIONAL (dr)) >> + { >> + tree base_addr = vect_get_base_address (dr); >> + unsigned int alignment = vect_compute_base_alignment (dr, base_addr); >> + bool existed; >> + unsigned int &entry >> + = vinfo->base_alignments.get_or_insert (base_addr, &existed); >> + if (!existed || entry < alignment) >> + { >> + entry = alignment; >> + if (dump_enabled_p ()) >> + { >> + dump_printf_loc (MSG_NOTE, vect_location, >> + "setting base alignment for "); >> + dump_generic_expr (MSG_NOTE, TDF_SLIM, base_addr); >> + dump_printf (MSG_NOTE, " to %d, based on ", alignment); >> + dump_gimple_stmt (MSG_NOTE, TDF_SLIM, DR_STMT (dr), 0); >> + } >> + } >> + } >> +} >> + >> /* Function vect_compute_data_ref_alignment >> >> Compute the misalignment of the data reference DR. >> @@ -663,6 +759,7 @@ vect_compute_data_ref_alignment (struct >> { >> gimple *stmt = DR_STMT (dr); >> stmt_vec_info stmt_info = vinfo_for_stmt (stmt); >> + vec_base_alignments *base_alignments = &stmt_info->vinfo->base_alignments; >> loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info); >> struct loop *loop = NULL; >> tree ref = DR_REF (dr); >> @@ -699,6 +796,8 @@ vect_compute_data_ref_alignment (struct >> { >> tree step = DR_STEP (dr); >> >> + base_addr = STMT_VINFO_DR_BASE_ADDRESS (stmt_info); >> + aligned_to = STMT_VINFO_DR_ALIGNED_TO (stmt_info); >> if (tree_fits_shwi_p (step) >> && tree_to_shwi (step) % GET_MODE_SIZE (TYPE_MODE (vectype)) == 0) >> { >> @@ -706,8 +805,6 @@ vect_compute_data_ref_alignment (struct >> dump_printf_loc (MSG_NOTE, vect_location, >> "inner step divides the vector-size.\n"); >> misalign = STMT_VINFO_DR_INIT (stmt_info); >> - aligned_to = STMT_VINFO_DR_ALIGNED_TO (stmt_info); >> - base_addr = STMT_VINFO_DR_BASE_ADDRESS (stmt_info); >> } >> else >> { >> @@ -738,39 +835,15 @@ vect_compute_data_ref_alignment (struct >> } >> } >> >> - /* To look at alignment of the base we have to preserve an inner MEM_REF >> - as that carries alignment information of the actual access. */ >> - base = ref; >> - while (handled_component_p (base)) >> - base = TREE_OPERAND (base, 0); >> + /* Calculate the maximum of the pooled base address alignment and the >> + alignment that we can compute for DR itself. The latter should >> + already be included in the former for unconditional references. */ >> unsigned int base_alignment = 0; >> - unsigned HOST_WIDE_INT base_bitpos; >> - get_object_alignment_1 (base, &base_alignment, &base_bitpos); >> - /* As data-ref analysis strips the MEM_REF down to its base operand >> - to form DR_BASE_ADDRESS and adds the offset to DR_INIT we have to >> - adjust things to make base_alignment valid as the alignment of >> - DR_BASE_ADDRESS. */ >> - if (TREE_CODE (base) == MEM_REF) >> - { >> - /* Note all this only works if DR_BASE_ADDRESS is the same as >> - MEM_REF operand zero, otherwise DR/SCEV analysis might have factored >> - in other offsets. We need to rework DR to compute the alingment >> - of DR_BASE_ADDRESS as long as all information is still available. */ >> - if (operand_equal_p (TREE_OPERAND (base, 0), base_addr, 0)) >> - { >> - base_bitpos -= mem_ref_offset (base).to_short_addr () * BITS_PER_UNIT; >> - base_bitpos &= (base_alignment - 1); >> - } >> - else >> - base_bitpos = BITS_PER_UNIT; >> - } >> - if (base_bitpos != 0) >> - base_alignment = base_bitpos & -base_bitpos; >> - /* Also look at the alignment of the base address DR analysis >> - computed. */ >> - unsigned int base_addr_alignment = get_pointer_alignment (base_addr); >> - if (base_addr_alignment > base_alignment) >> - base_alignment = base_addr_alignment; >> + if (DR_IS_CONDITIONAL (dr)) >> + base_alignment = vect_compute_base_alignment (dr, base_addr); >> + if (unsigned int *entry = base_alignments->get (base_addr)) >> + base_alignment = MAX (base_alignment, *entry); >> + gcc_assert (base_alignment != 0); >> >> if (base_alignment >= TYPE_ALIGN (TREE_TYPE (vectype))) >> DR_VECT_AUX (dr)->base_element_aligned = true; >> @@ -906,8 +979,29 @@ vect_update_misalignment_for_peel (struc >> { >> if (current_dr != dr) >> continue; >> - gcc_assert (DR_MISALIGNMENT (dr) / dr_size == >> - DR_MISALIGNMENT (dr_peel) / dr_peel_size); >> + /* Any alignment guarantees provided by a reference only apply if >> + the reference actually occurs. For example, in: >> + >> + struct s __attribute__((aligned(32))) { >> + int misaligner; >> + int array[N]; >> + }; >> + >> + int *ptr; >> + for (int i = 0; i < n; ++i) >> + ptr[i] = c[i] ? ((struct s *) (ptr - 1))->array[i] : 0; >> + >> + we can only assume that ptr is part of a struct s if at least one >> + c[i] is true. This in turn means that we have a higher base >> + alignment guarantee for the read from ptr (if it occurs) than for >> + the write to ptr, and we cannot unconditionally carry the former >> + over to the latter. We still know that the two address values >> + have the same misalignment, so if peeling has forced one of them >> + to be aligned, the other must be too. */ >> + gcc_assert (DR_IS_CONDITIONAL (dr_peel) >> + || DR_IS_CONDITIONAL (dr) >> + || (DR_MISALIGNMENT (dr) / dr_size >> + == DR_MISALIGNMENT (dr_peel) / dr_peel_size)); >> SET_DR_MISALIGNMENT (dr, 0); >> return; >> } >> @@ -2117,8 +2211,7 @@ vect_find_same_alignment_drs (struct dat >> if (dra == drb) >> return; >> >> - if (!operand_equal_p (DR_BASE_OBJECT (dra), DR_BASE_OBJECT (drb), >> - OEP_ADDRESS_OF) >> + if (!operand_equal_p (DR_BASE_ADDRESS (dra), DR_BASE_ADDRESS (drb), 0) >> || !operand_equal_p (DR_OFFSET (dra), DR_OFFSET (drb), 0) >> || !operand_equal_p (DR_STEP (dra), DR_STEP (drb), 0)) >> return; >> @@ -2176,6 +2269,7 @@ vect_analyze_data_refs_alignment (loop_v >> vec<data_reference_p> datarefs = vinfo->datarefs; >> struct data_reference *dr; >> >> + vect_compute_base_alignments (vinfo); >> FOR_EACH_VEC_ELT (datarefs, i, dr) >> { >> stmt_vec_info stmt_info = vinfo_for_stmt (DR_STMT (dr)); >> @@ -3374,7 +3468,8 @@ vect_analyze_data_refs (vec_info *vinfo, >> { >> struct data_reference *newdr >> = create_data_ref (NULL, loop_containing_stmt (stmt), >> - DR_REF (dr), stmt, maybe_scatter ? false : true); >> + DR_REF (dr), stmt, !maybe_scatter, >> + DR_IS_CONDITIONAL (dr)); >> gcc_assert (newdr != NULL && DR_REF (newdr)); >> if (DR_BASE_ADDRESS (newdr) >> && DR_OFFSET (newdr) >> Index: gcc/tree-vect-slp.c >> =================================================================== >> --- gcc/tree-vect-slp.c 2017-06-07 21:58:56.336475882 +0100 >> +++ gcc/tree-vect-slp.c 2017-06-22 12:23:21.288421018 +0100 >> @@ -2367,6 +2367,7 @@ new_bb_vec_info (gimple_stmt_iterator re >> gimple_stmt_iterator gsi; >> >> res = (bb_vec_info) xcalloc (1, sizeof (struct _bb_vec_info)); >> + new (&res->base_alignments) vec_base_alignments (); >> res->kind = vec_info::bb; >> BB_VINFO_BB (res) = bb; >> res->region_begin = region_begin; >> @@ -2741,6 +2742,8 @@ vect_slp_analyze_bb_1 (gimple_stmt_itera >> return NULL; >> } >> >> + vect_compute_base_alignments (bb_vinfo); >> + >> /* Analyze and verify the alignment of data references and the >> dependence in the SLP instances. */ >> for (i = 0; BB_VINFO_SLP_INSTANCES (bb_vinfo).iterate (i, &instance); ) >> Index: gcc/tree-vect-loop.c >> =================================================================== >> --- gcc/tree-vect-loop.c 2017-06-22 12:22:57.734313143 +0100 >> +++ gcc/tree-vect-loop.c 2017-06-22 12:23:21.287421072 +0100 >> @@ -1157,6 +1157,7 @@ new_loop_vec_info (struct loop *loop) >> LOOP_VINFO_VECT_FACTOR (res) = 0; >> LOOP_VINFO_LOOP_NEST (res) = vNULL; >> LOOP_VINFO_DATAREFS (res) = vNULL; >> + new (&res->base_alignments) vec_base_alignments (); >> LOOP_VINFO_DDRS (res) = vNULL; >> LOOP_VINFO_UNALIGNED_DR (res) = NULL; >> LOOP_VINFO_MAY_MISALIGN_STMTS (res) = vNULL; >> Index: gcc/tree-vectorizer.c >> =================================================================== >> --- gcc/tree-vectorizer.c 2017-06-22 12:22:57.732313220 +0100 >> +++ gcc/tree-vectorizer.c 2017-06-22 12:23:21.288421018 +0100 >> @@ -370,6 +370,8 @@ vect_destroy_datarefs (vec_info *vinfo) >> } >> >> free_data_refs (vinfo->datarefs); >> + >> + vinfo->base_alignments.~vec_base_alignments (); >> } >> >> /* A helper function to free scev and LOOP niter information, as well as >> Index: gcc/testsuite/gcc.dg/vect/pr81136.c >> =================================================================== >> --- /dev/null 2017-06-22 07:43:14.805493307 +0100 >> +++ gcc/testsuite/gcc.dg/vect/pr81136.c 2017-06-22 12:23:21.283421287 +0100 >> @@ -0,0 +1,16 @@ >> +/* { dg-do compile } */ >> + >> +struct __attribute__((aligned (32))) >> +{ >> + char misaligner; >> + int foo[100]; >> + int bar[100]; >> +} *a; >> + >> +void >> +fn1 (int n) >> +{ >> + int *b = a->foo; >> + for (int i = 0; i < n; i++) >> + a->bar[i] = b[i]; >> +}
On Fri, Jun 23, 2017 at 2:05 PM, Richard Sandiford <richard.sandiford@linaro.org> wrote: > Richard Biener <richard.guenther@gmail.com> writes: >> On Thu, Jun 22, 2017 at 1:30 PM, Richard Sandiford >> <richard.sandiford@linaro.org> wrote: >>> The test case triggered this assert in vect_update_misalignment_for_peel: >>> >>> gcc_assert (DR_MISALIGNMENT (dr) / dr_size == >>> DR_MISALIGNMENT (dr_peel) / dr_peel_size); >>> >>> We knew that the two DRs had the same misalignment at runtime, but when >>> considered in isolation, one data reference guaranteed a higher compile-time >>> base alignment than the other. >>> >>> In the test case this looks like a missed opportunity. Both references >>> are unconditional, so it should be possible to use the highest of the >>> available base alignment guarantees when analyzing each reference. >>> The patch does this. >>> >>> However, as the comment in the patch says, the base alignment guarantees >>> provided by a conditional reference only apply if the reference occurs >>> at least once. In this case it would be legitimate for two references >>> to have the same runtime misalignment and for one reference to provide a >>> stronger compile-time guarantee than the other about what the misalignment >>> actually is. The patch therefore relaxes the assert to handle that case. >> >> Hmm, but you don't actually check whether a reference occurs only conditional, >> do you? You just seem to say that for masked loads/stores the reference >> is conditional (I believe that's not true). But for a loop like >> >> for (;;) >> if (a[i]) >> sum += b[j]; >> >> you still assume b[j] executes unconditionally? > > Maybe the documentation isn't clear enough, but DR_IS_CONDITIONAL > was supposed to mean "even if the containing statement executes > and runs to completion, the reference might not actually occur". > The example above isn't conditional in that sense because the > reference to b[j] does occur if the store is reached and completes. > > Masked loads and stores are conditional in that sense though. > The reference only occurs if the mask is nonzero; the memory > isn't touched otherwise. The functions are used to if-convert > things like: > > for (...) > a[i] = b[i] ? c[i] : d[i]; > > where there's no guarantee that it's safe to access c[i] when !b[i] > (or d[i] when b[i]). No reference occurs for an all-false mask. But as you touch generic data-ref code here you should apply more sensible semantics to DR_IS_CONDITIONAL than just marking masked loads/stores but not DRs occuring inside BBs only executed conditionally ... >> The vectorizer of course only sees unconditionally executed stmts. >> >> So - I'd simply not add this DR_IS_CONDITIONAL. Did you run into >> any real-world (testsuite) issues without this? > > Dropping DR_IS_CONDITIONAL would cause us to make invalid alignment > assumptions in silly corner cases. I could add a scan test for it, > for targets with masked loads and stores. It wouldn't trigger > an execution failure though because we assume that targets with > masked loads and stores allow unaligned accesses: > > /* For now assume all conditional loads/stores support unaligned > access without any special code. */ > if (is_gimple_call (stmt) > && gimple_call_internal_p (stmt) > && (gimple_call_internal_fn (stmt) == IFN_MASK_LOAD > || gimple_call_internal_fn (stmt) == IFN_MASK_STORE)) > return dr_unaligned_supported; > > So the worst that would happen is that we'd supposedly peel for > alignment, but actually misalign everything instead, and so make > things slower rather than quicker. > >> Note that the assert is to prevent bogus information. Iff we aligned >> DR with base alignment 8 and misalign 3 then if another same-align >> DR has base alignment 16 we can't simply zero its DR_MISALIGNMENT >> as it still can be 8 after aligning DR. >> >> So I think it's wrong to put DRs with differing base-alignment into >> the same-align-refs chain, those should get their DR_MISALIGNMENT >> updated independenlty after peeling. > > DR_MISALIGNMENT is relative to the vector alignment rather than > the base alignment though. So: We seem to use it that way, yes (looking at set_ptr_info_alignment uses). So why not fix the assert then by capping the alignment/misalignment we compute at this value as well? (and document this in the header around DR_MISALIGNMENT) Ideally we'd do alignment analysis independent of the vector size though (for those stupid targets with multiple vector sizes to consider...). > a) when looking for references *A1 and *A2 with the same alignment, > we simply have to prove that A1 % vecalign == A2 % vecalign. > This doesn't require any knowledge about the base alignment. > If we break the addresses down as: > > A1 = BASE1 + REST1, REST1 = INIT1 + OFFSET1 + X * STEP1 > A2 = BASE2 + REST2, REST2 = INIT2 + OFFSET2 + X * STEP2 > > and can prove that BASE1 == BASE2, the alignment of that base > isn't important. We simply need to prove that REST1 % vecalign > == REST2 % vecalign for all X. > > b) In the assert, we've peeled the loop so that DR_PEEL is guaranteed > to be vector-aligned. If DR_PEEL is A1 in the example above, we have > A1 % vecalign == 0, so A2 % vecalign must be 0 too. This again doesn't > rely on the base alignment being known. > > What a high base alignment for DR_PEEL gives us is the ability to know > at compile how many iterations need to be peeled to make DR_PEEL aligned. > But the points above apply regardless of whether we know that value at > compile time or not. > > In examples like the test case, we would have known at compile time that > VF-1 iterations would need to be peeled if we'd picked the store as the > DR_PEEL, but would have treated the number of peels as variable if we'd > picked the load. The value calculated at runtime would still have been > VF-1, it's just that the code wouldn't have been as efficient. > > One of the benefits of pooling the alignments for unconditional references > is that it no longer matters which DR we pick: the number of peels will > be a compile-time constant both ways. > > Thanks, > Richard > >> I'd rather not mix fixing this with the improvement to eventuall use a >> larger align for the other DR if possible. ^^^ So can you fix the ICE with capping base alignment / DR_MISALIGNMENT? Richard. >> Thanks, >> Richard. >> >>> Tested on powerpc64-linux-gnu, aarch64-linux-gnu and x86_64-linux-gnu. >>> OK to instal? >>> >>> Richard >>> >>> >>> 2017-06-22 Richard Sandiford <richard.sandiford@linaro.org> >>> >>> gcc/ >>> PR tree-optimization/81136 >>> * tree-vectorizer.h: Include tree-hash-traits.h. >>> (vec_base_alignments): New typedef. >>> (vec_info): Add a base_alignments field. >>> (vect_compute_base_alignments: Declare. >>> * tree-data-ref.h (data_reference): Add an is_conditional field. >>> (DR_IS_CONDITIONAL): New macro. >>> (create_data_ref): Add an is_conditional argument. >>> * tree-data-ref.c (create_data_ref): Likewise. Use it to initialize >>> the is_conditional field. >>> (data_ref_loc): Add an is_conditional field. >>> (get_references_in_stmt): Set the is_conditional field. >>> (find_data_references_in_stmt): Update call to create_data_ref. >>> (graphite_find_data_references_in_stmt): Likewise. >>> * tree-ssa-loop-prefetch.c (determine_loop_nest_reuse): Likewise. >>> * tree-vect-data-refs.c (vect_analyze_data_refs): Likewise. >>> (vect_get_base_address): New function. >>> (vect_compute_base_alignments): Likewise. >>> (vect_compute_base_alignment): Likewise, split out from... >>> (vect_compute_data_ref_alignment): ...here. Use precomputed >>> base alignments. Only compute a new base alignment here if the >>> reference is conditional. >>> (vect_update_misalignment_for_peel): Allow the compile-time >>> DR_MISALIGNMENTs of two references with the same runtime alignment >>> to be different if one of the references is conditional. >>> (vect_find_same_alignment_drs): Compare base addresses instead >>> of base objects. >>> (vect_compute_data_ref_alignment): Call vect_compute_base_alignments. >>> * tree-vect-slp.c (vect_slp_analyze_bb_1): Likewise. >>> (new_bb_vec_info): Initialize base_alignments. >>> * tree-vect-loop.c (new_loop_vec_info): Likewise. >>> * tree-vectorizer.c (vect_destroy_datarefs): Release base_alignments. >>> >>> gcc/testsuite/ >>> PR tree-optimization/81136 >>> * gcc.dg/vect/pr81136.c: New test. >>> >>> Index: gcc/tree-vectorizer.h >>> =================================================================== >>> --- gcc/tree-vectorizer.h 2017-06-08 08:51:43.347264181 +0100 >>> +++ gcc/tree-vectorizer.h 2017-06-22 12:23:21.288421018 +0100 >>> @@ -22,6 +22,7 @@ Software Foundation; either version 3, o >>> #define GCC_TREE_VECTORIZER_H >>> >>> #include "tree-data-ref.h" >>> +#include "tree-hash-traits.h" >>> #include "target.h" >>> >>> /* Used for naming of new temporaries. */ >>> @@ -84,6 +85,10 @@ struct stmt_info_for_cost { >>> >>> typedef vec<stmt_info_for_cost> stmt_vector_for_cost; >>> >>> +/* Maps base addresses to the largest alignment that we've been able >>> + to calculate for them. */ >>> +typedef hash_map<tree_operand_hash, unsigned int> vec_base_alignments; >>> + >>> /************************************************************************ >>> SLP >>> ************************************************************************/ >>> @@ -156,6 +161,10 @@ struct vec_info { >>> /* All data references. */ >>> vec<data_reference_p> datarefs; >>> >>> + /* Maps the base addresses of all data references in DATAREFS to the >>> + largest alignment that we've been able to calculate for them. */ >>> + vec_base_alignments base_alignments; >>> + >>> /* All data dependences. */ >>> vec<ddr_p> ddrs; >>> >>> @@ -1117,6 +1126,7 @@ extern bool vect_prune_runtime_alias_tes >>> extern bool vect_check_gather_scatter (gimple *, loop_vec_info, >>> gather_scatter_info *); >>> extern bool vect_analyze_data_refs (vec_info *, int *); >>> +extern void vect_compute_base_alignments (vec_info *); >>> extern tree vect_create_data_ref_ptr (gimple *, tree, struct loop *, tree, >>> tree *, gimple_stmt_iterator *, >>> gimple **, bool, bool *, >>> Index: gcc/tree-data-ref.h >>> =================================================================== >>> --- gcc/tree-data-ref.h 2017-06-08 08:51:43.349263895 +0100 >>> +++ gcc/tree-data-ref.h 2017-06-22 12:23:21.285421180 +0100 >>> @@ -119,6 +119,10 @@ struct data_reference >>> /* True when the data reference is in RHS of a stmt. */ >>> bool is_read; >>> >>> + /* True when the data reference is conditional, i.e. if it might not >>> + occur even when the statement runs to completion. */ >>> + bool is_conditional; >>> + >>> /* Behavior of the memory reference in the innermost loop. */ >>> struct innermost_loop_behavior innermost; >>> >>> @@ -138,6 +142,7 @@ #define DR_ACCESS_FN(DR, I) DR_AC >>> #define DR_NUM_DIMENSIONS(DR) DR_ACCESS_FNS (DR).length () >>> #define DR_IS_READ(DR) (DR)->is_read >>> #define DR_IS_WRITE(DR) (!DR_IS_READ (DR)) >>> +#define DR_IS_CONDITIONAL(DR) (DR)->is_conditional >>> #define DR_BASE_ADDRESS(DR) (DR)->innermost.base_address >>> #define DR_OFFSET(DR) (DR)->innermost.offset >>> #define DR_INIT(DR) (DR)->innermost.init >>> @@ -350,7 +355,8 @@ extern bool graphite_find_data_reference >>> vec<data_reference_p> *); >>> tree find_data_references_in_loop (struct loop *, vec<data_reference_p> *); >>> bool loop_nest_has_data_refs (loop_p loop); >>> -struct data_reference *create_data_ref (loop_p, loop_p, tree, gimple *, bool); >>> +struct data_reference *create_data_ref (loop_p, loop_p, tree, gimple *, bool, >>> + bool); >>> extern bool find_loop_nest (struct loop *, vec<loop_p> *); >>> extern struct data_dependence_relation *initialize_data_dependence_relation >>> (struct data_reference *, struct data_reference *, vec<loop_p>); >>> Index: gcc/tree-data-ref.c >>> =================================================================== >>> --- gcc/tree-data-ref.c 2017-06-08 08:51:43.349263895 +0100 >>> +++ gcc/tree-data-ref.c 2017-06-22 12:23:21.284421233 +0100 >>> @@ -1053,15 +1053,18 @@ free_data_ref (data_reference_p dr) >>> free (dr); >>> } >>> >>> -/* Analyzes memory reference MEMREF accessed in STMT. The reference >>> - is read if IS_READ is true, write otherwise. Returns the >>> - data_reference description of MEMREF. NEST is the outermost loop >>> - in which the reference should be instantiated, LOOP is the loop in >>> - which the data reference should be analyzed. */ >>> +/* Analyze memory reference MEMREF, which is accessed in STMT. The reference >>> + is a read if IS_READ is true, otherwise it is a write. IS_CONDITIONAL >>> + indicates that the reference is conditional, i.e. that it might not >>> + occur every time that STMT runs to completion. >>> + >>> + Return the data_reference description of MEMREF. NEST is the outermost >>> + loop in which the reference should be instantiated, LOOP is the loop >>> + in which the data reference should be analyzed. */ >>> >>> struct data_reference * >>> create_data_ref (loop_p nest, loop_p loop, tree memref, gimple *stmt, >>> - bool is_read) >>> + bool is_read, bool is_conditional) >>> { >>> struct data_reference *dr; >>> >>> @@ -1076,6 +1079,7 @@ create_data_ref (loop_p nest, loop_p loo >>> DR_STMT (dr) = stmt; >>> DR_REF (dr) = memref; >>> DR_IS_READ (dr) = is_read; >>> + DR_IS_CONDITIONAL (dr) = is_conditional; >>> >>> dr_analyze_innermost (dr, nest); >>> dr_analyze_indices (dr, nest, loop); >>> @@ -4446,6 +4450,10 @@ struct data_ref_loc >>> >>> /* True if the memory reference is read. */ >>> bool is_read; >>> + >>> + /* True if the data reference is conditional, i.e. if it might not >>> + occur even when the statement runs to completion. */ >>> + bool is_conditional; >>> }; >>> >>> >>> @@ -4512,6 +4520,7 @@ get_references_in_stmt (gimple *stmt, ve >>> { >>> ref.ref = op1; >>> ref.is_read = true; >>> + ref.is_conditional = false; >>> references->safe_push (ref); >>> } >>> } >>> @@ -4539,6 +4548,7 @@ get_references_in_stmt (gimple *stmt, ve >>> type = TREE_TYPE (gimple_call_arg (stmt, 3)); >>> if (TYPE_ALIGN (type) != align) >>> type = build_aligned_type (type, align); >>> + ref.is_conditional = true; >>> ref.ref = fold_build2 (MEM_REF, type, gimple_call_arg (stmt, 0), >>> ptr); >>> references->safe_push (ref); >>> @@ -4558,6 +4568,7 @@ get_references_in_stmt (gimple *stmt, ve >>> { >>> ref.ref = op1; >>> ref.is_read = true; >>> + ref.is_conditional = false; >>> references->safe_push (ref); >>> } >>> } >>> @@ -4571,6 +4582,7 @@ get_references_in_stmt (gimple *stmt, ve >>> { >>> ref.ref = op0; >>> ref.is_read = false; >>> + ref.is_conditional = false; >>> references->safe_push (ref); >>> } >>> return clobbers_memory; >>> @@ -4635,8 +4647,8 @@ find_data_references_in_stmt (struct loo >>> >>> FOR_EACH_VEC_ELT (references, i, ref) >>> { >>> - dr = create_data_ref (nest, loop_containing_stmt (stmt), >>> - ref->ref, stmt, ref->is_read); >>> + dr = create_data_ref (nest, loop_containing_stmt (stmt), ref->ref, >>> + stmt, ref->is_read, ref->is_conditional); >>> gcc_assert (dr != NULL); >>> datarefs->safe_push (dr); >>> } >>> @@ -4665,7 +4677,8 @@ graphite_find_data_references_in_stmt (l >>> >>> FOR_EACH_VEC_ELT (references, i, ref) >>> { >>> - dr = create_data_ref (nest, loop, ref->ref, stmt, ref->is_read); >>> + dr = create_data_ref (nest, loop, ref->ref, stmt, ref->is_read, >>> + ref->is_conditional); >>> gcc_assert (dr != NULL); >>> datarefs->safe_push (dr); >>> } >>> Index: gcc/tree-ssa-loop-prefetch.c >>> =================================================================== >>> --- gcc/tree-ssa-loop-prefetch.c 2017-06-07 21:58:55.928557601 +0100 >>> +++ gcc/tree-ssa-loop-prefetch.c 2017-06-22 12:23:21.285421180 +0100 >>> @@ -1633,7 +1633,7 @@ determine_loop_nest_reuse (struct loop * >>> for (ref = gr->refs; ref; ref = ref->next) >>> { >>> dr = create_data_ref (nest, loop_containing_stmt (ref->stmt), >>> - ref->mem, ref->stmt, !ref->write_p); >>> + ref->mem, ref->stmt, !ref->write_p, false); >>> >>> if (dr) >>> { >>> Index: gcc/tree-vect-data-refs.c >>> =================================================================== >>> --- gcc/tree-vect-data-refs.c 2017-06-08 08:51:43.350263752 +0100 >>> +++ gcc/tree-vect-data-refs.c 2017-06-22 12:23:21.286421126 +0100 >>> @@ -646,6 +646,102 @@ vect_slp_analyze_instance_dependence (sl >>> return res; >>> } >>> >>> +/* If DR is nested in a loop that is being vectorized, return the base >>> + address in the context of the vectorized loop (rather than the >>> + nested loop). Otherwise return the base address in the context >>> + of the containing statement. */ >>> + >>> +static tree >>> +vect_get_base_address (data_reference *dr) >>> +{ >>> + gimple *stmt = DR_STMT (dr); >>> + stmt_vec_info stmt_info = vinfo_for_stmt (stmt); >>> + loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info); >>> + struct loop *loop = loop_vinfo != NULL ? LOOP_VINFO_LOOP (loop_vinfo) : NULL; >>> + if (loop && nested_in_vect_loop_p (loop, stmt)) >>> + return STMT_VINFO_DR_BASE_ADDRESS (stmt_info); >>> + else >>> + return DR_BASE_ADDRESS (dr); >>> +} >>> + >>> +/* Compute and return the alignment of base address BASE_ADDR in DR. */ >>> + >>> +static unsigned int >>> +vect_compute_base_alignment (data_reference *dr, tree base_addr) >>> +{ >>> + /* To look at the alignment of the base we have to preserve an inner >>> + MEM_REF as that carries the alignment information of the actual >>> + access. */ >>> + tree base = DR_REF (dr); >>> + while (handled_component_p (base)) >>> + base = TREE_OPERAND (base, 0); >>> + unsigned int base_alignment = 0; >>> + unsigned HOST_WIDE_INT base_bitpos; >>> + get_object_alignment_1 (base, &base_alignment, &base_bitpos); >>> + >>> + /* As data-ref analysis strips the MEM_REF down to its base operand >>> + to form DR_BASE_ADDRESS and adds the offset to DR_INIT we have to >>> + adjust things to make base_alignment valid as the alignment of >>> + DR_BASE_ADDRESS. */ >>> + if (TREE_CODE (base) == MEM_REF) >>> + { >>> + /* Note all this only works if DR_BASE_ADDRESS is the same as >>> + MEM_REF operand zero, otherwise DR/SCEV analysis might have factored >>> + in other offsets. We need to rework DR to compute the alingment >>> + of DR_BASE_ADDRESS as long as all information is still available. */ >>> + if (operand_equal_p (TREE_OPERAND (base, 0), base_addr, 0)) >>> + { >>> + base_bitpos -= mem_ref_offset (base).to_short_addr () * BITS_PER_UNIT; >>> + base_bitpos &= (base_alignment - 1); >>> + } >>> + else >>> + base_bitpos = BITS_PER_UNIT; >>> + } >>> + if (base_bitpos != 0) >>> + base_alignment = base_bitpos & -base_bitpos; >>> + >>> + /* Also look at the alignment of the base address DR analysis >>> + computed. */ >>> + unsigned int base_addr_alignment = get_pointer_alignment (base_addr); >>> + if (base_addr_alignment > base_alignment) >>> + base_alignment = base_addr_alignment; >>> + >>> + return base_alignment; >>> +} >>> + >>> +/* Compute alignments for the base addresses of all datarefs in VINFO. */ >>> + >>> +void >>> +vect_compute_base_alignments (vec_info *vinfo) >>> +{ >>> + /* If the region we're going to vectorize is reached, all unconditional >>> + data references occur at least once. We can therefore pool the base >>> + alignment guarantees from each unconditional reference. */ >>> + data_reference *dr; >>> + unsigned int i; >>> + FOR_EACH_VEC_ELT (vinfo->datarefs, i, dr) >>> + if (!DR_IS_CONDITIONAL (dr)) >>> + { >>> + tree base_addr = vect_get_base_address (dr); >>> + unsigned int alignment = vect_compute_base_alignment (dr, base_addr); >>> + bool existed; >>> + unsigned int &entry >>> + = vinfo->base_alignments.get_or_insert (base_addr, &existed); >>> + if (!existed || entry < alignment) >>> + { >>> + entry = alignment; >>> + if (dump_enabled_p ()) >>> + { >>> + dump_printf_loc (MSG_NOTE, vect_location, >>> + "setting base alignment for "); >>> + dump_generic_expr (MSG_NOTE, TDF_SLIM, base_addr); >>> + dump_printf (MSG_NOTE, " to %d, based on ", alignment); >>> + dump_gimple_stmt (MSG_NOTE, TDF_SLIM, DR_STMT (dr), 0); >>> + } >>> + } >>> + } >>> +} >>> + >>> /* Function vect_compute_data_ref_alignment >>> >>> Compute the misalignment of the data reference DR. >>> @@ -663,6 +759,7 @@ vect_compute_data_ref_alignment (struct >>> { >>> gimple *stmt = DR_STMT (dr); >>> stmt_vec_info stmt_info = vinfo_for_stmt (stmt); >>> + vec_base_alignments *base_alignments = &stmt_info->vinfo->base_alignments; >>> loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info); >>> struct loop *loop = NULL; >>> tree ref = DR_REF (dr); >>> @@ -699,6 +796,8 @@ vect_compute_data_ref_alignment (struct >>> { >>> tree step = DR_STEP (dr); >>> >>> + base_addr = STMT_VINFO_DR_BASE_ADDRESS (stmt_info); >>> + aligned_to = STMT_VINFO_DR_ALIGNED_TO (stmt_info); >>> if (tree_fits_shwi_p (step) >>> && tree_to_shwi (step) % GET_MODE_SIZE (TYPE_MODE (vectype)) == 0) >>> { >>> @@ -706,8 +805,6 @@ vect_compute_data_ref_alignment (struct >>> dump_printf_loc (MSG_NOTE, vect_location, >>> "inner step divides the vector-size.\n"); >>> misalign = STMT_VINFO_DR_INIT (stmt_info); >>> - aligned_to = STMT_VINFO_DR_ALIGNED_TO (stmt_info); >>> - base_addr = STMT_VINFO_DR_BASE_ADDRESS (stmt_info); >>> } >>> else >>> { >>> @@ -738,39 +835,15 @@ vect_compute_data_ref_alignment (struct >>> } >>> } >>> >>> - /* To look at alignment of the base we have to preserve an inner MEM_REF >>> - as that carries alignment information of the actual access. */ >>> - base = ref; >>> - while (handled_component_p (base)) >>> - base = TREE_OPERAND (base, 0); >>> + /* Calculate the maximum of the pooled base address alignment and the >>> + alignment that we can compute for DR itself. The latter should >>> + already be included in the former for unconditional references. */ >>> unsigned int base_alignment = 0; >>> - unsigned HOST_WIDE_INT base_bitpos; >>> - get_object_alignment_1 (base, &base_alignment, &base_bitpos); >>> - /* As data-ref analysis strips the MEM_REF down to its base operand >>> - to form DR_BASE_ADDRESS and adds the offset to DR_INIT we have to >>> - adjust things to make base_alignment valid as the alignment of >>> - DR_BASE_ADDRESS. */ >>> - if (TREE_CODE (base) == MEM_REF) >>> - { >>> - /* Note all this only works if DR_BASE_ADDRESS is the same as >>> - MEM_REF operand zero, otherwise DR/SCEV analysis might have factored >>> - in other offsets. We need to rework DR to compute the alingment >>> - of DR_BASE_ADDRESS as long as all information is still available. */ >>> - if (operand_equal_p (TREE_OPERAND (base, 0), base_addr, 0)) >>> - { >>> - base_bitpos -= mem_ref_offset (base).to_short_addr () * BITS_PER_UNIT; >>> - base_bitpos &= (base_alignment - 1); >>> - } >>> - else >>> - base_bitpos = BITS_PER_UNIT; >>> - } >>> - if (base_bitpos != 0) >>> - base_alignment = base_bitpos & -base_bitpos; >>> - /* Also look at the alignment of the base address DR analysis >>> - computed. */ >>> - unsigned int base_addr_alignment = get_pointer_alignment (base_addr); >>> - if (base_addr_alignment > base_alignment) >>> - base_alignment = base_addr_alignment; >>> + if (DR_IS_CONDITIONAL (dr)) >>> + base_alignment = vect_compute_base_alignment (dr, base_addr); >>> + if (unsigned int *entry = base_alignments->get (base_addr)) >>> + base_alignment = MAX (base_alignment, *entry); >>> + gcc_assert (base_alignment != 0); >>> >>> if (base_alignment >= TYPE_ALIGN (TREE_TYPE (vectype))) >>> DR_VECT_AUX (dr)->base_element_aligned = true; >>> @@ -906,8 +979,29 @@ vect_update_misalignment_for_peel (struc >>> { >>> if (current_dr != dr) >>> continue; >>> - gcc_assert (DR_MISALIGNMENT (dr) / dr_size == >>> - DR_MISALIGNMENT (dr_peel) / dr_peel_size); >>> + /* Any alignment guarantees provided by a reference only apply if >>> + the reference actually occurs. For example, in: >>> + >>> + struct s __attribute__((aligned(32))) { >>> + int misaligner; >>> + int array[N]; >>> + }; >>> + >>> + int *ptr; >>> + for (int i = 0; i < n; ++i) >>> + ptr[i] = c[i] ? ((struct s *) (ptr - 1))->array[i] : 0; >>> + >>> + we can only assume that ptr is part of a struct s if at least one >>> + c[i] is true. This in turn means that we have a higher base >>> + alignment guarantee for the read from ptr (if it occurs) than for >>> + the write to ptr, and we cannot unconditionally carry the former >>> + over to the latter. We still know that the two address values >>> + have the same misalignment, so if peeling has forced one of them >>> + to be aligned, the other must be too. */ >>> + gcc_assert (DR_IS_CONDITIONAL (dr_peel) >>> + || DR_IS_CONDITIONAL (dr) >>> + || (DR_MISALIGNMENT (dr) / dr_size >>> + == DR_MISALIGNMENT (dr_peel) / dr_peel_size)); >>> SET_DR_MISALIGNMENT (dr, 0); >>> return; >>> } >>> @@ -2117,8 +2211,7 @@ vect_find_same_alignment_drs (struct dat >>> if (dra == drb) >>> return; >>> >>> - if (!operand_equal_p (DR_BASE_OBJECT (dra), DR_BASE_OBJECT (drb), >>> - OEP_ADDRESS_OF) >>> + if (!operand_equal_p (DR_BASE_ADDRESS (dra), DR_BASE_ADDRESS (drb), 0) >>> || !operand_equal_p (DR_OFFSET (dra), DR_OFFSET (drb), 0) >>> || !operand_equal_p (DR_STEP (dra), DR_STEP (drb), 0)) >>> return; >>> @@ -2176,6 +2269,7 @@ vect_analyze_data_refs_alignment (loop_v >>> vec<data_reference_p> datarefs = vinfo->datarefs; >>> struct data_reference *dr; >>> >>> + vect_compute_base_alignments (vinfo); >>> FOR_EACH_VEC_ELT (datarefs, i, dr) >>> { >>> stmt_vec_info stmt_info = vinfo_for_stmt (DR_STMT (dr)); >>> @@ -3374,7 +3468,8 @@ vect_analyze_data_refs (vec_info *vinfo, >>> { >>> struct data_reference *newdr >>> = create_data_ref (NULL, loop_containing_stmt (stmt), >>> - DR_REF (dr), stmt, maybe_scatter ? false : true); >>> + DR_REF (dr), stmt, !maybe_scatter, >>> + DR_IS_CONDITIONAL (dr)); >>> gcc_assert (newdr != NULL && DR_REF (newdr)); >>> if (DR_BASE_ADDRESS (newdr) >>> && DR_OFFSET (newdr) >>> Index: gcc/tree-vect-slp.c >>> =================================================================== >>> --- gcc/tree-vect-slp.c 2017-06-07 21:58:56.336475882 +0100 >>> +++ gcc/tree-vect-slp.c 2017-06-22 12:23:21.288421018 +0100 >>> @@ -2367,6 +2367,7 @@ new_bb_vec_info (gimple_stmt_iterator re >>> gimple_stmt_iterator gsi; >>> >>> res = (bb_vec_info) xcalloc (1, sizeof (struct _bb_vec_info)); >>> + new (&res->base_alignments) vec_base_alignments (); >>> res->kind = vec_info::bb; >>> BB_VINFO_BB (res) = bb; >>> res->region_begin = region_begin; >>> @@ -2741,6 +2742,8 @@ vect_slp_analyze_bb_1 (gimple_stmt_itera >>> return NULL; >>> } >>> >>> + vect_compute_base_alignments (bb_vinfo); >>> + >>> /* Analyze and verify the alignment of data references and the >>> dependence in the SLP instances. */ >>> for (i = 0; BB_VINFO_SLP_INSTANCES (bb_vinfo).iterate (i, &instance); ) >>> Index: gcc/tree-vect-loop.c >>> =================================================================== >>> --- gcc/tree-vect-loop.c 2017-06-22 12:22:57.734313143 +0100 >>> +++ gcc/tree-vect-loop.c 2017-06-22 12:23:21.287421072 +0100 >>> @@ -1157,6 +1157,7 @@ new_loop_vec_info (struct loop *loop) >>> LOOP_VINFO_VECT_FACTOR (res) = 0; >>> LOOP_VINFO_LOOP_NEST (res) = vNULL; >>> LOOP_VINFO_DATAREFS (res) = vNULL; >>> + new (&res->base_alignments) vec_base_alignments (); >>> LOOP_VINFO_DDRS (res) = vNULL; >>> LOOP_VINFO_UNALIGNED_DR (res) = NULL; >>> LOOP_VINFO_MAY_MISALIGN_STMTS (res) = vNULL; >>> Index: gcc/tree-vectorizer.c >>> =================================================================== >>> --- gcc/tree-vectorizer.c 2017-06-22 12:22:57.732313220 +0100 >>> +++ gcc/tree-vectorizer.c 2017-06-22 12:23:21.288421018 +0100 >>> @@ -370,6 +370,8 @@ vect_destroy_datarefs (vec_info *vinfo) >>> } >>> >>> free_data_refs (vinfo->datarefs); >>> + >>> + vinfo->base_alignments.~vec_base_alignments (); >>> } >>> >>> /* A helper function to free scev and LOOP niter information, as well as >>> Index: gcc/testsuite/gcc.dg/vect/pr81136.c >>> =================================================================== >>> --- /dev/null 2017-06-22 07:43:14.805493307 +0100 >>> +++ gcc/testsuite/gcc.dg/vect/pr81136.c 2017-06-22 12:23:21.283421287 +0100 >>> @@ -0,0 +1,16 @@ >>> +/* { dg-do compile } */ >>> + >>> +struct __attribute__((aligned (32))) >>> +{ >>> + char misaligner; >>> + int foo[100]; >>> + int bar[100]; >>> +} *a; >>> + >>> +void >>> +fn1 (int n) >>> +{ >>> + int *b = a->foo; >>> + for (int i = 0; i < n; i++) >>> + a->bar[i] = b[i]; >>> +}
On Mon, Jun 26, 2017 at 12:25 PM, Richard Biener <richard.guenther@gmail.com> wrote: > On Fri, Jun 23, 2017 at 2:05 PM, Richard Sandiford > <richard.sandiford@linaro.org> wrote: >> Richard Biener <richard.guenther@gmail.com> writes: >>> On Thu, Jun 22, 2017 at 1:30 PM, Richard Sandiford >>> <richard.sandiford@linaro.org> wrote: >>>> The test case triggered this assert in vect_update_misalignment_for_peel: >>>> >>>> gcc_assert (DR_MISALIGNMENT (dr) / dr_size == >>>> DR_MISALIGNMENT (dr_peel) / dr_peel_size); >>>> >>>> We knew that the two DRs had the same misalignment at runtime, but when >>>> considered in isolation, one data reference guaranteed a higher compile-time >>>> base alignment than the other. >>>> >>>> In the test case this looks like a missed opportunity. Both references >>>> are unconditional, so it should be possible to use the highest of the >>>> available base alignment guarantees when analyzing each reference. >>>> The patch does this. >>>> >>>> However, as the comment in the patch says, the base alignment guarantees >>>> provided by a conditional reference only apply if the reference occurs >>>> at least once. In this case it would be legitimate for two references >>>> to have the same runtime misalignment and for one reference to provide a >>>> stronger compile-time guarantee than the other about what the misalignment >>>> actually is. The patch therefore relaxes the assert to handle that case. >>> >>> Hmm, but you don't actually check whether a reference occurs only conditional, >>> do you? You just seem to say that for masked loads/stores the reference >>> is conditional (I believe that's not true). But for a loop like >>> >>> for (;;) >>> if (a[i]) >>> sum += b[j]; >>> >>> you still assume b[j] executes unconditionally? >> >> Maybe the documentation isn't clear enough, but DR_IS_CONDITIONAL >> was supposed to mean "even if the containing statement executes >> and runs to completion, the reference might not actually occur". >> The example above isn't conditional in that sense because the >> reference to b[j] does occur if the store is reached and completes. >> >> Masked loads and stores are conditional in that sense though. >> The reference only occurs if the mask is nonzero; the memory >> isn't touched otherwise. The functions are used to if-convert >> things like: >> >> for (...) >> a[i] = b[i] ? c[i] : d[i]; >> >> where there's no guarantee that it's safe to access c[i] when !b[i] >> (or d[i] when b[i]). No reference occurs for an all-false mask. > > But as you touch generic data-ref code here you should apply more > sensible semantics to DR_IS_CONDITIONAL than just marking > masked loads/stores but not DRs occuring inside BBs only executed > conditionally ... > >>> The vectorizer of course only sees unconditionally executed stmts. >>> >>> So - I'd simply not add this DR_IS_CONDITIONAL. Did you run into >>> any real-world (testsuite) issues without this? >> >> Dropping DR_IS_CONDITIONAL would cause us to make invalid alignment >> assumptions in silly corner cases. I could add a scan test for it, >> for targets with masked loads and stores. It wouldn't trigger >> an execution failure though because we assume that targets with >> masked loads and stores allow unaligned accesses: >> >> /* For now assume all conditional loads/stores support unaligned >> access without any special code. */ >> if (is_gimple_call (stmt) >> && gimple_call_internal_p (stmt) >> && (gimple_call_internal_fn (stmt) == IFN_MASK_LOAD >> || gimple_call_internal_fn (stmt) == IFN_MASK_STORE)) >> return dr_unaligned_supported; >> >> So the worst that would happen is that we'd supposedly peel for >> alignment, but actually misalign everything instead, and so make >> things slower rather than quicker. >> >>> Note that the assert is to prevent bogus information. Iff we aligned >>> DR with base alignment 8 and misalign 3 then if another same-align >>> DR has base alignment 16 we can't simply zero its DR_MISALIGNMENT >>> as it still can be 8 after aligning DR. >>> >>> So I think it's wrong to put DRs with differing base-alignment into >>> the same-align-refs chain, those should get their DR_MISALIGNMENT >>> updated independenlty after peeling. >> >> DR_MISALIGNMENT is relative to the vector alignment rather than >> the base alignment though. So: > > We seem to use it that way, yes (looking at set_ptr_info_alignment > uses). So why not fix the assert then by capping the alignment/misalignment > we compute at this value as well? (and document this in the header > around DR_MISALIGNMENT) > > Ideally we'd do alignment analysis independent of the vector size > though (for those stupid targets with multiple vector sizes to consider...). > >> a) when looking for references *A1 and *A2 with the same alignment, >> we simply have to prove that A1 % vecalign == A2 % vecalign. >> This doesn't require any knowledge about the base alignment. >> If we break the addresses down as: >> >> A1 = BASE1 + REST1, REST1 = INIT1 + OFFSET1 + X * STEP1 >> A2 = BASE2 + REST2, REST2 = INIT2 + OFFSET2 + X * STEP2 >> >> and can prove that BASE1 == BASE2, the alignment of that base >> isn't important. We simply need to prove that REST1 % vecalign >> == REST2 % vecalign for all X. >> >> b) In the assert, we've peeled the loop so that DR_PEEL is guaranteed >> to be vector-aligned. If DR_PEEL is A1 in the example above, we have >> A1 % vecalign == 0, so A2 % vecalign must be 0 too. This again doesn't >> rely on the base alignment being known. >> >> What a high base alignment for DR_PEEL gives us is the ability to know >> at compile how many iterations need to be peeled to make DR_PEEL aligned. >> But the points above apply regardless of whether we know that value at >> compile time or not. >> >> In examples like the test case, we would have known at compile time that >> VF-1 iterations would need to be peeled if we'd picked the store as the >> DR_PEEL, but would have treated the number of peels as variable if we'd >> picked the load. The value calculated at runtime would still have been >> VF-1, it's just that the code wouldn't have been as efficient. >> >> One of the benefits of pooling the alignments for unconditional references >> is that it no longer matters which DR we pick: the number of peels will >> be a compile-time constant both ways. >> >> Thanks, >> Richard >> >>> I'd rather not mix fixing this with the improvement to eventuall use a >>> larger align for the other DR if possible. > > ^^^ > > So can you fix the ICE with capping base alignment / DR_MISALIGNMENT? Oh, and as you are touching general tree-data-refs.[ch] on my wishlist is still "properly" computing the base alignment in tree-data-refs.[ch] itself rather than trying to undo analyze_innermost in the vectorizer and 2nd-guessing the alignment. Basically make alignment/misalignment first-class citizens. Richard. > Richard. > >>> Thanks, >>> Richard. >>> >>>> Tested on powerpc64-linux-gnu, aarch64-linux-gnu and x86_64-linux-gnu. >>>> OK to instal? >>>> >>>> Richard >>>> >>>> >>>> 2017-06-22 Richard Sandiford <richard.sandiford@linaro.org> >>>> >>>> gcc/ >>>> PR tree-optimization/81136 >>>> * tree-vectorizer.h: Include tree-hash-traits.h. >>>> (vec_base_alignments): New typedef. >>>> (vec_info): Add a base_alignments field. >>>> (vect_compute_base_alignments: Declare. >>>> * tree-data-ref.h (data_reference): Add an is_conditional field. >>>> (DR_IS_CONDITIONAL): New macro. >>>> (create_data_ref): Add an is_conditional argument. >>>> * tree-data-ref.c (create_data_ref): Likewise. Use it to initialize >>>> the is_conditional field. >>>> (data_ref_loc): Add an is_conditional field. >>>> (get_references_in_stmt): Set the is_conditional field. >>>> (find_data_references_in_stmt): Update call to create_data_ref. >>>> (graphite_find_data_references_in_stmt): Likewise. >>>> * tree-ssa-loop-prefetch.c (determine_loop_nest_reuse): Likewise. >>>> * tree-vect-data-refs.c (vect_analyze_data_refs): Likewise. >>>> (vect_get_base_address): New function. >>>> (vect_compute_base_alignments): Likewise. >>>> (vect_compute_base_alignment): Likewise, split out from... >>>> (vect_compute_data_ref_alignment): ...here. Use precomputed >>>> base alignments. Only compute a new base alignment here if the >>>> reference is conditional. >>>> (vect_update_misalignment_for_peel): Allow the compile-time >>>> DR_MISALIGNMENTs of two references with the same runtime alignment >>>> to be different if one of the references is conditional. >>>> (vect_find_same_alignment_drs): Compare base addresses instead >>>> of base objects. >>>> (vect_compute_data_ref_alignment): Call vect_compute_base_alignments. >>>> * tree-vect-slp.c (vect_slp_analyze_bb_1): Likewise. >>>> (new_bb_vec_info): Initialize base_alignments. >>>> * tree-vect-loop.c (new_loop_vec_info): Likewise. >>>> * tree-vectorizer.c (vect_destroy_datarefs): Release base_alignments. >>>> >>>> gcc/testsuite/ >>>> PR tree-optimization/81136 >>>> * gcc.dg/vect/pr81136.c: New test. >>>> >>>> Index: gcc/tree-vectorizer.h >>>> =================================================================== >>>> --- gcc/tree-vectorizer.h 2017-06-08 08:51:43.347264181 +0100 >>>> +++ gcc/tree-vectorizer.h 2017-06-22 12:23:21.288421018 +0100 >>>> @@ -22,6 +22,7 @@ Software Foundation; either version 3, o >>>> #define GCC_TREE_VECTORIZER_H >>>> >>>> #include "tree-data-ref.h" >>>> +#include "tree-hash-traits.h" >>>> #include "target.h" >>>> >>>> /* Used for naming of new temporaries. */ >>>> @@ -84,6 +85,10 @@ struct stmt_info_for_cost { >>>> >>>> typedef vec<stmt_info_for_cost> stmt_vector_for_cost; >>>> >>>> +/* Maps base addresses to the largest alignment that we've been able >>>> + to calculate for them. */ >>>> +typedef hash_map<tree_operand_hash, unsigned int> vec_base_alignments; >>>> + >>>> /************************************************************************ >>>> SLP >>>> ************************************************************************/ >>>> @@ -156,6 +161,10 @@ struct vec_info { >>>> /* All data references. */ >>>> vec<data_reference_p> datarefs; >>>> >>>> + /* Maps the base addresses of all data references in DATAREFS to the >>>> + largest alignment that we've been able to calculate for them. */ >>>> + vec_base_alignments base_alignments; >>>> + >>>> /* All data dependences. */ >>>> vec<ddr_p> ddrs; >>>> >>>> @@ -1117,6 +1126,7 @@ extern bool vect_prune_runtime_alias_tes >>>> extern bool vect_check_gather_scatter (gimple *, loop_vec_info, >>>> gather_scatter_info *); >>>> extern bool vect_analyze_data_refs (vec_info *, int *); >>>> +extern void vect_compute_base_alignments (vec_info *); >>>> extern tree vect_create_data_ref_ptr (gimple *, tree, struct loop *, tree, >>>> tree *, gimple_stmt_iterator *, >>>> gimple **, bool, bool *, >>>> Index: gcc/tree-data-ref.h >>>> =================================================================== >>>> --- gcc/tree-data-ref.h 2017-06-08 08:51:43.349263895 +0100 >>>> +++ gcc/tree-data-ref.h 2017-06-22 12:23:21.285421180 +0100 >>>> @@ -119,6 +119,10 @@ struct data_reference >>>> /* True when the data reference is in RHS of a stmt. */ >>>> bool is_read; >>>> >>>> + /* True when the data reference is conditional, i.e. if it might not >>>> + occur even when the statement runs to completion. */ >>>> + bool is_conditional; >>>> + >>>> /* Behavior of the memory reference in the innermost loop. */ >>>> struct innermost_loop_behavior innermost; >>>> >>>> @@ -138,6 +142,7 @@ #define DR_ACCESS_FN(DR, I) DR_AC >>>> #define DR_NUM_DIMENSIONS(DR) DR_ACCESS_FNS (DR).length () >>>> #define DR_IS_READ(DR) (DR)->is_read >>>> #define DR_IS_WRITE(DR) (!DR_IS_READ (DR)) >>>> +#define DR_IS_CONDITIONAL(DR) (DR)->is_conditional >>>> #define DR_BASE_ADDRESS(DR) (DR)->innermost.base_address >>>> #define DR_OFFSET(DR) (DR)->innermost.offset >>>> #define DR_INIT(DR) (DR)->innermost.init >>>> @@ -350,7 +355,8 @@ extern bool graphite_find_data_reference >>>> vec<data_reference_p> *); >>>> tree find_data_references_in_loop (struct loop *, vec<data_reference_p> *); >>>> bool loop_nest_has_data_refs (loop_p loop); >>>> -struct data_reference *create_data_ref (loop_p, loop_p, tree, gimple *, bool); >>>> +struct data_reference *create_data_ref (loop_p, loop_p, tree, gimple *, bool, >>>> + bool); >>>> extern bool find_loop_nest (struct loop *, vec<loop_p> *); >>>> extern struct data_dependence_relation *initialize_data_dependence_relation >>>> (struct data_reference *, struct data_reference *, vec<loop_p>); >>>> Index: gcc/tree-data-ref.c >>>> =================================================================== >>>> --- gcc/tree-data-ref.c 2017-06-08 08:51:43.349263895 +0100 >>>> +++ gcc/tree-data-ref.c 2017-06-22 12:23:21.284421233 +0100 >>>> @@ -1053,15 +1053,18 @@ free_data_ref (data_reference_p dr) >>>> free (dr); >>>> } >>>> >>>> -/* Analyzes memory reference MEMREF accessed in STMT. The reference >>>> - is read if IS_READ is true, write otherwise. Returns the >>>> - data_reference description of MEMREF. NEST is the outermost loop >>>> - in which the reference should be instantiated, LOOP is the loop in >>>> - which the data reference should be analyzed. */ >>>> +/* Analyze memory reference MEMREF, which is accessed in STMT. The reference >>>> + is a read if IS_READ is true, otherwise it is a write. IS_CONDITIONAL >>>> + indicates that the reference is conditional, i.e. that it might not >>>> + occur every time that STMT runs to completion. >>>> + >>>> + Return the data_reference description of MEMREF. NEST is the outermost >>>> + loop in which the reference should be instantiated, LOOP is the loop >>>> + in which the data reference should be analyzed. */ >>>> >>>> struct data_reference * >>>> create_data_ref (loop_p nest, loop_p loop, tree memref, gimple *stmt, >>>> - bool is_read) >>>> + bool is_read, bool is_conditional) >>>> { >>>> struct data_reference *dr; >>>> >>>> @@ -1076,6 +1079,7 @@ create_data_ref (loop_p nest, loop_p loo >>>> DR_STMT (dr) = stmt; >>>> DR_REF (dr) = memref; >>>> DR_IS_READ (dr) = is_read; >>>> + DR_IS_CONDITIONAL (dr) = is_conditional; >>>> >>>> dr_analyze_innermost (dr, nest); >>>> dr_analyze_indices (dr, nest, loop); >>>> @@ -4446,6 +4450,10 @@ struct data_ref_loc >>>> >>>> /* True if the memory reference is read. */ >>>> bool is_read; >>>> + >>>> + /* True if the data reference is conditional, i.e. if it might not >>>> + occur even when the statement runs to completion. */ >>>> + bool is_conditional; >>>> }; >>>> >>>> >>>> @@ -4512,6 +4520,7 @@ get_references_in_stmt (gimple *stmt, ve >>>> { >>>> ref.ref = op1; >>>> ref.is_read = true; >>>> + ref.is_conditional = false; >>>> references->safe_push (ref); >>>> } >>>> } >>>> @@ -4539,6 +4548,7 @@ get_references_in_stmt (gimple *stmt, ve >>>> type = TREE_TYPE (gimple_call_arg (stmt, 3)); >>>> if (TYPE_ALIGN (type) != align) >>>> type = build_aligned_type (type, align); >>>> + ref.is_conditional = true; >>>> ref.ref = fold_build2 (MEM_REF, type, gimple_call_arg (stmt, 0), >>>> ptr); >>>> references->safe_push (ref); >>>> @@ -4558,6 +4568,7 @@ get_references_in_stmt (gimple *stmt, ve >>>> { >>>> ref.ref = op1; >>>> ref.is_read = true; >>>> + ref.is_conditional = false; >>>> references->safe_push (ref); >>>> } >>>> } >>>> @@ -4571,6 +4582,7 @@ get_references_in_stmt (gimple *stmt, ve >>>> { >>>> ref.ref = op0; >>>> ref.is_read = false; >>>> + ref.is_conditional = false; >>>> references->safe_push (ref); >>>> } >>>> return clobbers_memory; >>>> @@ -4635,8 +4647,8 @@ find_data_references_in_stmt (struct loo >>>> >>>> FOR_EACH_VEC_ELT (references, i, ref) >>>> { >>>> - dr = create_data_ref (nest, loop_containing_stmt (stmt), >>>> - ref->ref, stmt, ref->is_read); >>>> + dr = create_data_ref (nest, loop_containing_stmt (stmt), ref->ref, >>>> + stmt, ref->is_read, ref->is_conditional); >>>> gcc_assert (dr != NULL); >>>> datarefs->safe_push (dr); >>>> } >>>> @@ -4665,7 +4677,8 @@ graphite_find_data_references_in_stmt (l >>>> >>>> FOR_EACH_VEC_ELT (references, i, ref) >>>> { >>>> - dr = create_data_ref (nest, loop, ref->ref, stmt, ref->is_read); >>>> + dr = create_data_ref (nest, loop, ref->ref, stmt, ref->is_read, >>>> + ref->is_conditional); >>>> gcc_assert (dr != NULL); >>>> datarefs->safe_push (dr); >>>> } >>>> Index: gcc/tree-ssa-loop-prefetch.c >>>> =================================================================== >>>> --- gcc/tree-ssa-loop-prefetch.c 2017-06-07 21:58:55.928557601 +0100 >>>> +++ gcc/tree-ssa-loop-prefetch.c 2017-06-22 12:23:21.285421180 +0100 >>>> @@ -1633,7 +1633,7 @@ determine_loop_nest_reuse (struct loop * >>>> for (ref = gr->refs; ref; ref = ref->next) >>>> { >>>> dr = create_data_ref (nest, loop_containing_stmt (ref->stmt), >>>> - ref->mem, ref->stmt, !ref->write_p); >>>> + ref->mem, ref->stmt, !ref->write_p, false); >>>> >>>> if (dr) >>>> { >>>> Index: gcc/tree-vect-data-refs.c >>>> =================================================================== >>>> --- gcc/tree-vect-data-refs.c 2017-06-08 08:51:43.350263752 +0100 >>>> +++ gcc/tree-vect-data-refs.c 2017-06-22 12:23:21.286421126 +0100 >>>> @@ -646,6 +646,102 @@ vect_slp_analyze_instance_dependence (sl >>>> return res; >>>> } >>>> >>>> +/* If DR is nested in a loop that is being vectorized, return the base >>>> + address in the context of the vectorized loop (rather than the >>>> + nested loop). Otherwise return the base address in the context >>>> + of the containing statement. */ >>>> + >>>> +static tree >>>> +vect_get_base_address (data_reference *dr) >>>> +{ >>>> + gimple *stmt = DR_STMT (dr); >>>> + stmt_vec_info stmt_info = vinfo_for_stmt (stmt); >>>> + loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info); >>>> + struct loop *loop = loop_vinfo != NULL ? LOOP_VINFO_LOOP (loop_vinfo) : NULL; >>>> + if (loop && nested_in_vect_loop_p (loop, stmt)) >>>> + return STMT_VINFO_DR_BASE_ADDRESS (stmt_info); >>>> + else >>>> + return DR_BASE_ADDRESS (dr); >>>> +} >>>> + >>>> +/* Compute and return the alignment of base address BASE_ADDR in DR. */ >>>> + >>>> +static unsigned int >>>> +vect_compute_base_alignment (data_reference *dr, tree base_addr) >>>> +{ >>>> + /* To look at the alignment of the base we have to preserve an inner >>>> + MEM_REF as that carries the alignment information of the actual >>>> + access. */ >>>> + tree base = DR_REF (dr); >>>> + while (handled_component_p (base)) >>>> + base = TREE_OPERAND (base, 0); >>>> + unsigned int base_alignment = 0; >>>> + unsigned HOST_WIDE_INT base_bitpos; >>>> + get_object_alignment_1 (base, &base_alignment, &base_bitpos); >>>> + >>>> + /* As data-ref analysis strips the MEM_REF down to its base operand >>>> + to form DR_BASE_ADDRESS and adds the offset to DR_INIT we have to >>>> + adjust things to make base_alignment valid as the alignment of >>>> + DR_BASE_ADDRESS. */ >>>> + if (TREE_CODE (base) == MEM_REF) >>>> + { >>>> + /* Note all this only works if DR_BASE_ADDRESS is the same as >>>> + MEM_REF operand zero, otherwise DR/SCEV analysis might have factored >>>> + in other offsets. We need to rework DR to compute the alingment >>>> + of DR_BASE_ADDRESS as long as all information is still available. */ >>>> + if (operand_equal_p (TREE_OPERAND (base, 0), base_addr, 0)) >>>> + { >>>> + base_bitpos -= mem_ref_offset (base).to_short_addr () * BITS_PER_UNIT; >>>> + base_bitpos &= (base_alignment - 1); >>>> + } >>>> + else >>>> + base_bitpos = BITS_PER_UNIT; >>>> + } >>>> + if (base_bitpos != 0) >>>> + base_alignment = base_bitpos & -base_bitpos; >>>> + >>>> + /* Also look at the alignment of the base address DR analysis >>>> + computed. */ >>>> + unsigned int base_addr_alignment = get_pointer_alignment (base_addr); >>>> + if (base_addr_alignment > base_alignment) >>>> + base_alignment = base_addr_alignment; >>>> + >>>> + return base_alignment; >>>> +} >>>> + >>>> +/* Compute alignments for the base addresses of all datarefs in VINFO. */ >>>> + >>>> +void >>>> +vect_compute_base_alignments (vec_info *vinfo) >>>> +{ >>>> + /* If the region we're going to vectorize is reached, all unconditional >>>> + data references occur at least once. We can therefore pool the base >>>> + alignment guarantees from each unconditional reference. */ >>>> + data_reference *dr; >>>> + unsigned int i; >>>> + FOR_EACH_VEC_ELT (vinfo->datarefs, i, dr) >>>> + if (!DR_IS_CONDITIONAL (dr)) >>>> + { >>>> + tree base_addr = vect_get_base_address (dr); >>>> + unsigned int alignment = vect_compute_base_alignment (dr, base_addr); >>>> + bool existed; >>>> + unsigned int &entry >>>> + = vinfo->base_alignments.get_or_insert (base_addr, &existed); >>>> + if (!existed || entry < alignment) >>>> + { >>>> + entry = alignment; >>>> + if (dump_enabled_p ()) >>>> + { >>>> + dump_printf_loc (MSG_NOTE, vect_location, >>>> + "setting base alignment for "); >>>> + dump_generic_expr (MSG_NOTE, TDF_SLIM, base_addr); >>>> + dump_printf (MSG_NOTE, " to %d, based on ", alignment); >>>> + dump_gimple_stmt (MSG_NOTE, TDF_SLIM, DR_STMT (dr), 0); >>>> + } >>>> + } >>>> + } >>>> +} >>>> + >>>> /* Function vect_compute_data_ref_alignment >>>> >>>> Compute the misalignment of the data reference DR. >>>> @@ -663,6 +759,7 @@ vect_compute_data_ref_alignment (struct >>>> { >>>> gimple *stmt = DR_STMT (dr); >>>> stmt_vec_info stmt_info = vinfo_for_stmt (stmt); >>>> + vec_base_alignments *base_alignments = &stmt_info->vinfo->base_alignments; >>>> loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info); >>>> struct loop *loop = NULL; >>>> tree ref = DR_REF (dr); >>>> @@ -699,6 +796,8 @@ vect_compute_data_ref_alignment (struct >>>> { >>>> tree step = DR_STEP (dr); >>>> >>>> + base_addr = STMT_VINFO_DR_BASE_ADDRESS (stmt_info); >>>> + aligned_to = STMT_VINFO_DR_ALIGNED_TO (stmt_info); >>>> if (tree_fits_shwi_p (step) >>>> && tree_to_shwi (step) % GET_MODE_SIZE (TYPE_MODE (vectype)) == 0) >>>> { >>>> @@ -706,8 +805,6 @@ vect_compute_data_ref_alignment (struct >>>> dump_printf_loc (MSG_NOTE, vect_location, >>>> "inner step divides the vector-size.\n"); >>>> misalign = STMT_VINFO_DR_INIT (stmt_info); >>>> - aligned_to = STMT_VINFO_DR_ALIGNED_TO (stmt_info); >>>> - base_addr = STMT_VINFO_DR_BASE_ADDRESS (stmt_info); >>>> } >>>> else >>>> { >>>> @@ -738,39 +835,15 @@ vect_compute_data_ref_alignment (struct >>>> } >>>> } >>>> >>>> - /* To look at alignment of the base we have to preserve an inner MEM_REF >>>> - as that carries alignment information of the actual access. */ >>>> - base = ref; >>>> - while (handled_component_p (base)) >>>> - base = TREE_OPERAND (base, 0); >>>> + /* Calculate the maximum of the pooled base address alignment and the >>>> + alignment that we can compute for DR itself. The latter should >>>> + already be included in the former for unconditional references. */ >>>> unsigned int base_alignment = 0; >>>> - unsigned HOST_WIDE_INT base_bitpos; >>>> - get_object_alignment_1 (base, &base_alignment, &base_bitpos); >>>> - /* As data-ref analysis strips the MEM_REF down to its base operand >>>> - to form DR_BASE_ADDRESS and adds the offset to DR_INIT we have to >>>> - adjust things to make base_alignment valid as the alignment of >>>> - DR_BASE_ADDRESS. */ >>>> - if (TREE_CODE (base) == MEM_REF) >>>> - { >>>> - /* Note all this only works if DR_BASE_ADDRESS is the same as >>>> - MEM_REF operand zero, otherwise DR/SCEV analysis might have factored >>>> - in other offsets. We need to rework DR to compute the alingment >>>> - of DR_BASE_ADDRESS as long as all information is still available. */ >>>> - if (operand_equal_p (TREE_OPERAND (base, 0), base_addr, 0)) >>>> - { >>>> - base_bitpos -= mem_ref_offset (base).to_short_addr () * BITS_PER_UNIT; >>>> - base_bitpos &= (base_alignment - 1); >>>> - } >>>> - else >>>> - base_bitpos = BITS_PER_UNIT; >>>> - } >>>> - if (base_bitpos != 0) >>>> - base_alignment = base_bitpos & -base_bitpos; >>>> - /* Also look at the alignment of the base address DR analysis >>>> - computed. */ >>>> - unsigned int base_addr_alignment = get_pointer_alignment (base_addr); >>>> - if (base_addr_alignment > base_alignment) >>>> - base_alignment = base_addr_alignment; >>>> + if (DR_IS_CONDITIONAL (dr)) >>>> + base_alignment = vect_compute_base_alignment (dr, base_addr); >>>> + if (unsigned int *entry = base_alignments->get (base_addr)) >>>> + base_alignment = MAX (base_alignment, *entry); >>>> + gcc_assert (base_alignment != 0); >>>> >>>> if (base_alignment >= TYPE_ALIGN (TREE_TYPE (vectype))) >>>> DR_VECT_AUX (dr)->base_element_aligned = true; >>>> @@ -906,8 +979,29 @@ vect_update_misalignment_for_peel (struc >>>> { >>>> if (current_dr != dr) >>>> continue; >>>> - gcc_assert (DR_MISALIGNMENT (dr) / dr_size == >>>> - DR_MISALIGNMENT (dr_peel) / dr_peel_size); >>>> + /* Any alignment guarantees provided by a reference only apply if >>>> + the reference actually occurs. For example, in: >>>> + >>>> + struct s __attribute__((aligned(32))) { >>>> + int misaligner; >>>> + int array[N]; >>>> + }; >>>> + >>>> + int *ptr; >>>> + for (int i = 0; i < n; ++i) >>>> + ptr[i] = c[i] ? ((struct s *) (ptr - 1))->array[i] : 0; >>>> + >>>> + we can only assume that ptr is part of a struct s if at least one >>>> + c[i] is true. This in turn means that we have a higher base >>>> + alignment guarantee for the read from ptr (if it occurs) than for >>>> + the write to ptr, and we cannot unconditionally carry the former >>>> + over to the latter. We still know that the two address values >>>> + have the same misalignment, so if peeling has forced one of them >>>> + to be aligned, the other must be too. */ >>>> + gcc_assert (DR_IS_CONDITIONAL (dr_peel) >>>> + || DR_IS_CONDITIONAL (dr) >>>> + || (DR_MISALIGNMENT (dr) / dr_size >>>> + == DR_MISALIGNMENT (dr_peel) / dr_peel_size)); >>>> SET_DR_MISALIGNMENT (dr, 0); >>>> return; >>>> } >>>> @@ -2117,8 +2211,7 @@ vect_find_same_alignment_drs (struct dat >>>> if (dra == drb) >>>> return; >>>> >>>> - if (!operand_equal_p (DR_BASE_OBJECT (dra), DR_BASE_OBJECT (drb), >>>> - OEP_ADDRESS_OF) >>>> + if (!operand_equal_p (DR_BASE_ADDRESS (dra), DR_BASE_ADDRESS (drb), 0) >>>> || !operand_equal_p (DR_OFFSET (dra), DR_OFFSET (drb), 0) >>>> || !operand_equal_p (DR_STEP (dra), DR_STEP (drb), 0)) >>>> return; >>>> @@ -2176,6 +2269,7 @@ vect_analyze_data_refs_alignment (loop_v >>>> vec<data_reference_p> datarefs = vinfo->datarefs; >>>> struct data_reference *dr; >>>> >>>> + vect_compute_base_alignments (vinfo); >>>> FOR_EACH_VEC_ELT (datarefs, i, dr) >>>> { >>>> stmt_vec_info stmt_info = vinfo_for_stmt (DR_STMT (dr)); >>>> @@ -3374,7 +3468,8 @@ vect_analyze_data_refs (vec_info *vinfo, >>>> { >>>> struct data_reference *newdr >>>> = create_data_ref (NULL, loop_containing_stmt (stmt), >>>> - DR_REF (dr), stmt, maybe_scatter ? false : true); >>>> + DR_REF (dr), stmt, !maybe_scatter, >>>> + DR_IS_CONDITIONAL (dr)); >>>> gcc_assert (newdr != NULL && DR_REF (newdr)); >>>> if (DR_BASE_ADDRESS (newdr) >>>> && DR_OFFSET (newdr) >>>> Index: gcc/tree-vect-slp.c >>>> =================================================================== >>>> --- gcc/tree-vect-slp.c 2017-06-07 21:58:56.336475882 +0100 >>>> +++ gcc/tree-vect-slp.c 2017-06-22 12:23:21.288421018 +0100 >>>> @@ -2367,6 +2367,7 @@ new_bb_vec_info (gimple_stmt_iterator re >>>> gimple_stmt_iterator gsi; >>>> >>>> res = (bb_vec_info) xcalloc (1, sizeof (struct _bb_vec_info)); >>>> + new (&res->base_alignments) vec_base_alignments (); >>>> res->kind = vec_info::bb; >>>> BB_VINFO_BB (res) = bb; >>>> res->region_begin = region_begin; >>>> @@ -2741,6 +2742,8 @@ vect_slp_analyze_bb_1 (gimple_stmt_itera >>>> return NULL; >>>> } >>>> >>>> + vect_compute_base_alignments (bb_vinfo); >>>> + >>>> /* Analyze and verify the alignment of data references and the >>>> dependence in the SLP instances. */ >>>> for (i = 0; BB_VINFO_SLP_INSTANCES (bb_vinfo).iterate (i, &instance); ) >>>> Index: gcc/tree-vect-loop.c >>>> =================================================================== >>>> --- gcc/tree-vect-loop.c 2017-06-22 12:22:57.734313143 +0100 >>>> +++ gcc/tree-vect-loop.c 2017-06-22 12:23:21.287421072 +0100 >>>> @@ -1157,6 +1157,7 @@ new_loop_vec_info (struct loop *loop) >>>> LOOP_VINFO_VECT_FACTOR (res) = 0; >>>> LOOP_VINFO_LOOP_NEST (res) = vNULL; >>>> LOOP_VINFO_DATAREFS (res) = vNULL; >>>> + new (&res->base_alignments) vec_base_alignments (); >>>> LOOP_VINFO_DDRS (res) = vNULL; >>>> LOOP_VINFO_UNALIGNED_DR (res) = NULL; >>>> LOOP_VINFO_MAY_MISALIGN_STMTS (res) = vNULL; >>>> Index: gcc/tree-vectorizer.c >>>> =================================================================== >>>> --- gcc/tree-vectorizer.c 2017-06-22 12:22:57.732313220 +0100 >>>> +++ gcc/tree-vectorizer.c 2017-06-22 12:23:21.288421018 +0100 >>>> @@ -370,6 +370,8 @@ vect_destroy_datarefs (vec_info *vinfo) >>>> } >>>> >>>> free_data_refs (vinfo->datarefs); >>>> + >>>> + vinfo->base_alignments.~vec_base_alignments (); >>>> } >>>> >>>> /* A helper function to free scev and LOOP niter information, as well as >>>> Index: gcc/testsuite/gcc.dg/vect/pr81136.c >>>> =================================================================== >>>> --- /dev/null 2017-06-22 07:43:14.805493307 +0100 >>>> +++ gcc/testsuite/gcc.dg/vect/pr81136.c 2017-06-22 12:23:21.283421287 +0100 >>>> @@ -0,0 +1,16 @@ >>>> +/* { dg-do compile } */ >>>> + >>>> +struct __attribute__((aligned (32))) >>>> +{ >>>> + char misaligner; >>>> + int foo[100]; >>>> + int bar[100]; >>>> +} *a; >>>> + >>>> +void >>>> +fn1 (int n) >>>> +{ >>>> + int *b = a->foo; >>>> + for (int i = 0; i < n; i++) >>>> + a->bar[i] = b[i]; >>>> +}
Richard Biener <richard.guenther@gmail.com> writes: > On Fri, Jun 23, 2017 at 2:05 PM, Richard Sandiford > <richard.sandiford@linaro.org> wrote: >> Richard Biener <richard.guenther@gmail.com> writes: >>> On Thu, Jun 22, 2017 at 1:30 PM, Richard Sandiford >>> <richard.sandiford@linaro.org> wrote: >>>> The test case triggered this assert in vect_update_misalignment_for_peel: >>>> >>>> gcc_assert (DR_MISALIGNMENT (dr) / dr_size == >>>> DR_MISALIGNMENT (dr_peel) / dr_peel_size); >>>> >>>> We knew that the two DRs had the same misalignment at runtime, but when >>>> considered in isolation, one data reference guaranteed a higher compile-time >>>> base alignment than the other. >>>> >>>> In the test case this looks like a missed opportunity. Both references >>>> are unconditional, so it should be possible to use the highest of the >>>> available base alignment guarantees when analyzing each reference. >>>> The patch does this. >>>> >>>> However, as the comment in the patch says, the base alignment guarantees >>>> provided by a conditional reference only apply if the reference occurs >>>> at least once. In this case it would be legitimate for two references >>>> to have the same runtime misalignment and for one reference to provide a >>>> stronger compile-time guarantee than the other about what the misalignment >>>> actually is. The patch therefore relaxes the assert to handle that case. >>> >>> Hmm, but you don't actually check whether a reference occurs only > conditional, >>> do you? You just seem to say that for masked loads/stores the reference >>> is conditional (I believe that's not true). But for a loop like >>> >>> for (;;) >>> if (a[i]) >>> sum += b[j]; >>> >>> you still assume b[j] executes unconditionally? >> >> Maybe the documentation isn't clear enough, but DR_IS_CONDITIONAL >> was supposed to mean "even if the containing statement executes >> and runs to completion, the reference might not actually occur". >> The example above isn't conditional in that sense because the >> reference to b[j] does occur if the store is reached and completes. >> >> Masked loads and stores are conditional in that sense though. >> The reference only occurs if the mask is nonzero; the memory >> isn't touched otherwise. The functions are used to if-convert >> things like: >> >> for (...) >> a[i] = b[i] ? c[i] : d[i]; >> >> where there's no guarantee that it's safe to access c[i] when !b[i] >> (or d[i] when b[i]). No reference occurs for an all-false mask. > > But as you touch generic data-ref code here you should apply more > sensible semantics to DR_IS_CONDITIONAL than just marking > masked loads/stores but not DRs occuring inside BBs only executed > conditionally ... I don't see why that's more sensible though. If a statement is only conditionally executed in a loop, it's up to the consumer to decide what to do about that. The conditions under which the statement is reached are a control-flow issue and tree-data-ref.c doesn't have any special information about it. Masked loads and stores are special because the DR_REFs created by tree-data-ref.c are artificial: they didn't exist as MEM_REFs in the original DR_STMT. And AIUI they didn't exist as MEM_REFs precisely because they're not guaranteed to happen, even if the load or store statement itself is executed. So in this case the DR_IS_CONDITIONAL is reflecting something that tree-data-ref.c itself has done. How about calling it DR_IS_CONDITIONAL_IN_STMT to avoid the general-sounding name? >>> The vectorizer of course only sees unconditionally executed stmts. >>> >>> So - I'd simply not add this DR_IS_CONDITIONAL. Did you run into >>> any real-world (testsuite) issues without this? >> >> Dropping DR_IS_CONDITIONAL would cause us to make invalid alignment >> assumptions in silly corner cases. I could add a scan test for it, >> for targets with masked loads and stores. It wouldn't trigger >> an execution failure though because we assume that targets with >> masked loads and stores allow unaligned accesses: >> >> /* For now assume all conditional loads/stores support unaligned >> access without any special code. */ >> if (is_gimple_call (stmt) >> && gimple_call_internal_p (stmt) >> && (gimple_call_internal_fn (stmt) == IFN_MASK_LOAD >> || gimple_call_internal_fn (stmt) == IFN_MASK_STORE)) >> return dr_unaligned_supported; >> >> So the worst that would happen is that we'd supposedly peel for >> alignment, but actually misalign everything instead, and so make >> things slower rather than quicker. >> >>> Note that the assert is to prevent bogus information. Iff we aligned >>> DR with base alignment 8 and misalign 3 then if another same-align >>> DR has base alignment 16 we can't simply zero its DR_MISALIGNMENT >>> as it still can be 8 after aligning DR. >>> >>> So I think it's wrong to put DRs with differing base-alignment into >>> the same-align-refs chain, those should get their DR_MISALIGNMENT >>> updated independenlty after peeling. >> >> DR_MISALIGNMENT is relative to the vector alignment rather than >> the base alignment though. So: > > We seem to use it that way, yes (looking at set_ptr_info_alignment > uses). So why not fix the assert then by capping the alignment/misalignment > we compute at this value as well? (and document this in the header > around DR_MISALIGNMENT) > > Ideally we'd do alignment analysis independent of the vector size > though (for those stupid targets with multiple vector sizes to consider...). > >> a) when looking for references *A1 and *A2 with the same alignment, >> we simply have to prove that A1 % vecalign == A2 % vecalign. >> This doesn't require any knowledge about the base alignment. >> If we break the addresses down as: >> >> A1 = BASE1 + REST1, REST1 = INIT1 + OFFSET1 + X * STEP1 >> A2 = BASE2 + REST2, REST2 = INIT2 + OFFSET2 + X * STEP2 >> >> and can prove that BASE1 == BASE2, the alignment of that base >> isn't important. We simply need to prove that REST1 % vecalign >> == REST2 % vecalign for all X. >> >> b) In the assert, we've peeled the loop so that DR_PEEL is guaranteed >> to be vector-aligned. If DR_PEEL is A1 in the example above, we have >> A1 % vecalign == 0, so A2 % vecalign must be 0 too. This again doesn't >> rely on the base alignment being known. >> >> What a high base alignment for DR_PEEL gives us is the ability to know >> at compile how many iterations need to be peeled to make DR_PEEL aligned. >> But the points above apply regardless of whether we know that value at >> compile time or not. >> >> In examples like the test case, we would have known at compile time that >> VF-1 iterations would need to be peeled if we'd picked the store as the >> DR_PEEL, but would have treated the number of peels as variable if we'd >> picked the load. The value calculated at runtime would still have been >> VF-1, it's just that the code wouldn't have been as efficient. >> >> One of the benefits of pooling the alignments for unconditional references >> is that it no longer matters which DR we pick: the number of peels will >> be a compile-time constant both ways. >> >> Thanks, >> Richard >> >>> I'd rather not mix fixing this with the improvement to eventuall use a >>> larger align for the other DR if possible. > > ^^^ > > So can you fix the ICE with capping base alignment / DR_MISALIGNMENT? I don't think the problem is the lack of a cap. In the test case we see that: 1. B is known at compile time to be X * vecsize + Y when considered in isolation, because the base alignment derived from its DR_REF >= vecsize. So DR_MISALIGNMENT (B) == Y. 2. A's misalignment wrt vecsize is not known at compile time when considered in isolation, because no useful base alignment can be derived from its DR_REF. (The DR_REF is to a plain int rather than to a structure with a high alignment.) So DR_MISALIGNMENT (A) == -1. 3. A and B when considered as a pair trivially have the same misalignment wrt vecsize, for the reasons above. Each of these results is individually correct. The problem is that the assert is conflating two things: it's saying that if we know two datarefs have the same misaligment, we must either be able to calculate a compile-time misalignment for both datarefs in isolation, or we must fail to calculate a compile-time misalignment for both datarefs in isolation. That isn't true: it's valid to have situations in which the compile-time misalignment is known for one dataref in isolation but not for the other. Thanks, Richard
On Mon, Jun 26, 2017 at 1:14 PM, Richard Sandiford <richard.sandiford@linaro.org> wrote: > Richard Biener <richard.guenther@gmail.com> writes: >> On Fri, Jun 23, 2017 at 2:05 PM, Richard Sandiford >> <richard.sandiford@linaro.org> wrote: >>> Richard Biener <richard.guenther@gmail.com> writes: >>>> On Thu, Jun 22, 2017 at 1:30 PM, Richard Sandiford >>>> <richard.sandiford@linaro.org> wrote: >>>>> The test case triggered this assert in vect_update_misalignment_for_peel: >>>>> >>>>> gcc_assert (DR_MISALIGNMENT (dr) / dr_size == >>>>> DR_MISALIGNMENT (dr_peel) / dr_peel_size); >>>>> >>>>> We knew that the two DRs had the same misalignment at runtime, but when >>>>> considered in isolation, one data reference guaranteed a higher compile-time >>>>> base alignment than the other. >>>>> >>>>> In the test case this looks like a missed opportunity. Both references >>>>> are unconditional, so it should be possible to use the highest of the >>>>> available base alignment guarantees when analyzing each reference. >>>>> The patch does this. >>>>> >>>>> However, as the comment in the patch says, the base alignment guarantees >>>>> provided by a conditional reference only apply if the reference occurs >>>>> at least once. In this case it would be legitimate for two references >>>>> to have the same runtime misalignment and for one reference to provide a >>>>> stronger compile-time guarantee than the other about what the misalignment >>>>> actually is. The patch therefore relaxes the assert to handle that case. >>>> >>>> Hmm, but you don't actually check whether a reference occurs only >> conditional, >>>> do you? You just seem to say that for masked loads/stores the reference >>>> is conditional (I believe that's not true). But for a loop like >>>> >>>> for (;;) >>>> if (a[i]) >>>> sum += b[j]; >>>> >>>> you still assume b[j] executes unconditionally? >>> >>> Maybe the documentation isn't clear enough, but DR_IS_CONDITIONAL >>> was supposed to mean "even if the containing statement executes >>> and runs to completion, the reference might not actually occur". >>> The example above isn't conditional in that sense because the >>> reference to b[j] does occur if the store is reached and completes. >>> >>> Masked loads and stores are conditional in that sense though. >>> The reference only occurs if the mask is nonzero; the memory >>> isn't touched otherwise. The functions are used to if-convert >>> things like: >>> >>> for (...) >>> a[i] = b[i] ? c[i] : d[i]; >>> >>> where there's no guarantee that it's safe to access c[i] when !b[i] >>> (or d[i] when b[i]). No reference occurs for an all-false mask. >> >> But as you touch generic data-ref code here you should apply more >> sensible semantics to DR_IS_CONDITIONAL than just marking >> masked loads/stores but not DRs occuring inside BBs only executed >> conditionally ... > > I don't see why that's more sensible though. If a statement is only > conditionally executed in a loop, it's up to the consumer to decide > what to do about that. The conditions under which the statement > is reached are a control-flow issue and tree-data-ref.c doesn't > have any special information about it. > > Masked loads and stores are special because the DR_REFs created by > tree-data-ref.c are artificial: they didn't exist as MEM_REFs in the > original DR_STMT. And AIUI they didn't exist as MEM_REFs precisely > because they're not guaranteed to happen, even if the load or store > statement itself is executed. So in this case the DR_IS_CONDITIONAL > is reflecting something that tree-data-ref.c itself has done. > > How about calling it DR_IS_CONDITIONAL_IN_STMT to avoid the > general-sounding name? That sounds better and avoids the ambiguity. >>>> The vectorizer of course only sees unconditionally executed stmts. >>>> >>>> So - I'd simply not add this DR_IS_CONDITIONAL. Did you run into >>>> any real-world (testsuite) issues without this? >>> >>> Dropping DR_IS_CONDITIONAL would cause us to make invalid alignment >>> assumptions in silly corner cases. I could add a scan test for it, >>> for targets with masked loads and stores. It wouldn't trigger >>> an execution failure though because we assume that targets with >>> masked loads and stores allow unaligned accesses: >>> >>> /* For now assume all conditional loads/stores support unaligned >>> access without any special code. */ >>> if (is_gimple_call (stmt) >>> && gimple_call_internal_p (stmt) >>> && (gimple_call_internal_fn (stmt) == IFN_MASK_LOAD >>> || gimple_call_internal_fn (stmt) == IFN_MASK_STORE)) >>> return dr_unaligned_supported; >>> >>> So the worst that would happen is that we'd supposedly peel for >>> alignment, but actually misalign everything instead, and so make >>> things slower rather than quicker. >>> >>>> Note that the assert is to prevent bogus information. Iff we aligned >>>> DR with base alignment 8 and misalign 3 then if another same-align >>>> DR has base alignment 16 we can't simply zero its DR_MISALIGNMENT >>>> as it still can be 8 after aligning DR. >>>> >>>> So I think it's wrong to put DRs with differing base-alignment into >>>> the same-align-refs chain, those should get their DR_MISALIGNMENT >>>> updated independenlty after peeling. >>> >>> DR_MISALIGNMENT is relative to the vector alignment rather than >>> the base alignment though. So: >> >> We seem to use it that way, yes (looking at set_ptr_info_alignment >> uses). So why not fix the assert then by capping the alignment/misalignment >> we compute at this value as well? (and document this in the header >> around DR_MISALIGNMENT) >> >> Ideally we'd do alignment analysis independent of the vector size >> though (for those stupid targets with multiple vector sizes to consider...). >> >>> a) when looking for references *A1 and *A2 with the same alignment, >>> we simply have to prove that A1 % vecalign == A2 % vecalign. >>> This doesn't require any knowledge about the base alignment. >>> If we break the addresses down as: >>> >>> A1 = BASE1 + REST1, REST1 = INIT1 + OFFSET1 + X * STEP1 >>> A2 = BASE2 + REST2, REST2 = INIT2 + OFFSET2 + X * STEP2 >>> >>> and can prove that BASE1 == BASE2, the alignment of that base >>> isn't important. We simply need to prove that REST1 % vecalign >>> == REST2 % vecalign for all X. >>> >>> b) In the assert, we've peeled the loop so that DR_PEEL is guaranteed >>> to be vector-aligned. If DR_PEEL is A1 in the example above, we have >>> A1 % vecalign == 0, so A2 % vecalign must be 0 too. This again doesn't >>> rely on the base alignment being known. >>> >>> What a high base alignment for DR_PEEL gives us is the ability to know >>> at compile how many iterations need to be peeled to make DR_PEEL aligned. >>> But the points above apply regardless of whether we know that value at >>> compile time or not. >>> >>> In examples like the test case, we would have known at compile time that >>> VF-1 iterations would need to be peeled if we'd picked the store as the >>> DR_PEEL, but would have treated the number of peels as variable if we'd >>> picked the load. The value calculated at runtime would still have been >>> VF-1, it's just that the code wouldn't have been as efficient. >>> >>> One of the benefits of pooling the alignments for unconditional references >>> is that it no longer matters which DR we pick: the number of peels will >>> be a compile-time constant both ways. >>> >>> Thanks, >>> Richard >>> >>>> I'd rather not mix fixing this with the improvement to eventuall use a >>>> larger align for the other DR if possible. >> >> ^^^ >> >> So can you fix the ICE with capping base alignment / DR_MISALIGNMENT? > > I don't think the problem is the lack of a cap. In the test case we > see that: > > 1. B is known at compile time to be X * vecsize + Y when considered in > isolation, because the base alignment derived from its DR_REF >= vecsize. > So DR_MISALIGNMENT (B) == Y. > > 2. A's misalignment wrt vecsize is not known at compile time when > considered in isolation, because no useful base alignment can be > derived from its DR_REF. (The DR_REF is to a plain int rather than > to a structure with a high alignment.) So DR_MISALIGNMENT (A) == -1. > > 3. A and B when considered as a pair trivially have the same misalignment > wrt vecsize, for the reasons above. > > Each of these results is individually correct. The problem is that the > assert is conflating two things: it's saying that if we know two datarefs > have the same misaligment, we must either be able to calculate a > compile-time misalignment for both datarefs in isolation, or we must > fail to calculate a compile-time misalignment for both datarefs in > isolation. That isn't true: it's valid to have situations in which the > compile-time misalignment is known for one dataref in isolation but not > for the other. True. So the assert should then become gcc_assert (! known_alignment_for_access_p (dr) || DR_MISALIGNMENT (dr) / dr_size == DR_MISALIGNMENT (dr_peel) / dr_peel_size); ? Richard. > > Thanks, > Richard >
Richard Biener <richard.guenther@gmail.com> writes: > On Mon, Jun 26, 2017 at 1:14 PM, Richard Sandiford > <richard.sandiford@linaro.org> wrote: >> Richard Biener <richard.guenther@gmail.com> writes: >>> On Fri, Jun 23, 2017 at 2:05 PM, Richard Sandiford >>> <richard.sandiford@linaro.org> wrote: >>>> Richard Biener <richard.guenther@gmail.com> writes: >>>>> On Thu, Jun 22, 2017 at 1:30 PM, Richard Sandiford >>>>> <richard.sandiford@linaro.org> wrote: >>>>>> The test case triggered this assert in vect_update_misalignment_for_peel: >>>>>> >>>>>> gcc_assert (DR_MISALIGNMENT (dr) / dr_size == >>>>>> DR_MISALIGNMENT (dr_peel) / dr_peel_size); >>>>>> >>>>>> We knew that the two DRs had the same misalignment at runtime, but when >>>>>> considered in isolation, one data reference guaranteed a higher compile-time >>>>>> base alignment than the other. >>>>>> >>>>>> In the test case this looks like a missed opportunity. Both references >>>>>> are unconditional, so it should be possible to use the highest of the >>>>>> available base alignment guarantees when analyzing each reference. >>>>>> The patch does this. >>>>>> >>>>>> However, as the comment in the patch says, the base alignment guarantees >>>>>> provided by a conditional reference only apply if the reference occurs >>>>>> at least once. In this case it would be legitimate for two references >>>>>> to have the same runtime misalignment and for one reference to provide a >>>>>> stronger compile-time guarantee than the other about what the misalignment >>>>>> actually is. The patch therefore relaxes the assert to handle that case. >>>>> >>>>> Hmm, but you don't actually check whether a reference occurs only >>> conditional, >>>>> do you? You just seem to say that for masked loads/stores the reference >>>>> is conditional (I believe that's not true). But for a loop like >>>>> >>>>> for (;;) >>>>> if (a[i]) >>>>> sum += b[j]; >>>>> >>>>> you still assume b[j] executes unconditionally? >>>> >>>> Maybe the documentation isn't clear enough, but DR_IS_CONDITIONAL >>>> was supposed to mean "even if the containing statement executes >>>> and runs to completion, the reference might not actually occur". >>>> The example above isn't conditional in that sense because the >>>> reference to b[j] does occur if the store is reached and completes. >>>> >>>> Masked loads and stores are conditional in that sense though. >>>> The reference only occurs if the mask is nonzero; the memory >>>> isn't touched otherwise. The functions are used to if-convert >>>> things like: >>>> >>>> for (...) >>>> a[i] = b[i] ? c[i] : d[i]; >>>> >>>> where there's no guarantee that it's safe to access c[i] when !b[i] >>>> (or d[i] when b[i]). No reference occurs for an all-false mask. >>> >>> But as you touch generic data-ref code here you should apply more >>> sensible semantics to DR_IS_CONDITIONAL than just marking >>> masked loads/stores but not DRs occuring inside BBs only executed >>> conditionally ... >> >> I don't see why that's more sensible though. If a statement is only >> conditionally executed in a loop, it's up to the consumer to decide >> what to do about that. The conditions under which the statement >> is reached are a control-flow issue and tree-data-ref.c doesn't >> have any special information about it. >> >> Masked loads and stores are special because the DR_REFs created by >> tree-data-ref.c are artificial: they didn't exist as MEM_REFs in the >> original DR_STMT. And AIUI they didn't exist as MEM_REFs precisely >> because they're not guaranteed to happen, even if the load or store >> statement itself is executed. So in this case the DR_IS_CONDITIONAL >> is reflecting something that tree-data-ref.c itself has done. >> >> How about calling it DR_IS_CONDITIONAL_IN_STMT to avoid the >> general-sounding name? > > That sounds better and avoids the ambiguity. OK. >>>>> The vectorizer of course only sees unconditionally executed stmts. >>>>> >>>>> So - I'd simply not add this DR_IS_CONDITIONAL. Did you run into >>>>> any real-world (testsuite) issues without this? >>>> >>>> Dropping DR_IS_CONDITIONAL would cause us to make invalid alignment >>>> assumptions in silly corner cases. I could add a scan test for it, >>>> for targets with masked loads and stores. It wouldn't trigger >>>> an execution failure though because we assume that targets with >>>> masked loads and stores allow unaligned accesses: >>>> >>>> /* For now assume all conditional loads/stores support unaligned >>>> access without any special code. */ >>>> if (is_gimple_call (stmt) >>>> && gimple_call_internal_p (stmt) >>>> && (gimple_call_internal_fn (stmt) == IFN_MASK_LOAD >>>> || gimple_call_internal_fn (stmt) == IFN_MASK_STORE)) >>>> return dr_unaligned_supported; >>>> >>>> So the worst that would happen is that we'd supposedly peel for >>>> alignment, but actually misalign everything instead, and so make >>>> things slower rather than quicker. >>>> >>>>> Note that the assert is to prevent bogus information. Iff we aligned >>>>> DR with base alignment 8 and misalign 3 then if another same-align >>>>> DR has base alignment 16 we can't simply zero its DR_MISALIGNMENT >>>>> as it still can be 8 after aligning DR. >>>>> >>>>> So I think it's wrong to put DRs with differing base-alignment into >>>>> the same-align-refs chain, those should get their DR_MISALIGNMENT >>>>> updated independenlty after peeling. >>>> >>>> DR_MISALIGNMENT is relative to the vector alignment rather than >>>> the base alignment though. So: >>> >>> We seem to use it that way, yes (looking at set_ptr_info_alignment >>> uses). So why not fix the assert then by capping the alignment/misalignment >>> we compute at this value as well? (and document this in the header >>> around DR_MISALIGNMENT) >>> >>> Ideally we'd do alignment analysis independent of the vector size >>> though (for those stupid targets with multiple vector sizes to consider...). >>> >>>> a) when looking for references *A1 and *A2 with the same alignment, >>>> we simply have to prove that A1 % vecalign == A2 % vecalign. >>>> This doesn't require any knowledge about the base alignment. >>>> If we break the addresses down as: >>>> >>>> A1 = BASE1 + REST1, REST1 = INIT1 + OFFSET1 + X * STEP1 >>>> A2 = BASE2 + REST2, REST2 = INIT2 + OFFSET2 + X * STEP2 >>>> >>>> and can prove that BASE1 == BASE2, the alignment of that base >>>> isn't important. We simply need to prove that REST1 % vecalign >>>> == REST2 % vecalign for all X. >>>> >>>> b) In the assert, we've peeled the loop so that DR_PEEL is guaranteed >>>> to be vector-aligned. If DR_PEEL is A1 in the example above, we have >>>> A1 % vecalign == 0, so A2 % vecalign must be 0 too. This again doesn't >>>> rely on the base alignment being known. >>>> >>>> What a high base alignment for DR_PEEL gives us is the ability to know >>>> at compile how many iterations need to be peeled to make DR_PEEL aligned. >>>> But the points above apply regardless of whether we know that value at >>>> compile time or not. >>>> >>>> In examples like the test case, we would have known at compile time that >>>> VF-1 iterations would need to be peeled if we'd picked the store as the >>>> DR_PEEL, but would have treated the number of peels as variable if we'd >>>> picked the load. The value calculated at runtime would still have been >>>> VF-1, it's just that the code wouldn't have been as efficient. >>>> >>>> One of the benefits of pooling the alignments for unconditional references >>>> is that it no longer matters which DR we pick: the number of peels will >>>> be a compile-time constant both ways. >>>> >>>> Thanks, >>>> Richard >>>> >>>>> I'd rather not mix fixing this with the improvement to eventuall use a >>>>> larger align for the other DR if possible. >>> >>> ^^^ >>> >>> So can you fix the ICE with capping base alignment / DR_MISALIGNMENT? >> >> I don't think the problem is the lack of a cap. In the test case we >> see that: >> >> 1. B is known at compile time to be X * vecsize + Y when considered in >> isolation, because the base alignment derived from its DR_REF >= vecsize. >> So DR_MISALIGNMENT (B) == Y. >> >> 2. A's misalignment wrt vecsize is not known at compile time when >> considered in isolation, because no useful base alignment can be >> derived from its DR_REF. (The DR_REF is to a plain int rather than >> to a structure with a high alignment.) So DR_MISALIGNMENT (A) == -1. >> >> 3. A and B when considered as a pair trivially have the same misalignment >> wrt vecsize, for the reasons above. >> >> Each of these results is individually correct. The problem is that the >> assert is conflating two things: it's saying that if we know two datarefs >> have the same misaligment, we must either be able to calculate a >> compile-time misalignment for both datarefs in isolation, or we must >> fail to calculate a compile-time misalignment for both datarefs in >> isolation. That isn't true: it's valid to have situations in which the >> compile-time misalignment is known for one dataref in isolation but not >> for the other. > > True. So the assert should then become > > gcc_assert (! known_alignment_for_access_p (dr) > || DR_MISALIGNMENT (dr) / dr_size == > DR_MISALIGNMENT (dr_peel) / dr_peel_size); > > ? I think it would need to be: gcc_assert (!known_alignment_for_access_p (dr) || !known_alignment_for_access_p (dr_peel) || (DR_MISALIGNMENT (dr) / dr_size == DR_MISALIGNMENT (dr_peel) / dr_peel_size)); But yeah, that would work too. The idea with the assert in the patch was that for unconditional references we probably *do* want to try to compute the same compile-time misalignment, but for optimisation reasons rather than correctness. Maybe that's more properly a gcc_checking_assert though, since nothing goes wrong if it fails. So perhaps: gcc_checking_assert (DR_IS_CONDITIONAL_IN_STMT (dr) || DR_IS_CONDITIONAL_IN_STMT (dr_peel) || (known_alignment_for_access_p (dr) == known_alignment_for_access_p (dr_peel))); as a follow-on assert. Should I split it into two patches, one to change the gcc_assert and another to add the optimisation? Thanks, Richard
On Mon, Jun 26, 2017 at 1:50 PM, Richard Sandiford <richard.sandiford@linaro.org> wrote: > Richard Biener <richard.guenther@gmail.com> writes: >> On Mon, Jun 26, 2017 at 1:14 PM, Richard Sandiford >> <richard.sandiford@linaro.org> wrote: >>> Richard Biener <richard.guenther@gmail.com> writes: >>>> On Fri, Jun 23, 2017 at 2:05 PM, Richard Sandiford >>>> <richard.sandiford@linaro.org> wrote: >>>>> Richard Biener <richard.guenther@gmail.com> writes: >>>>>> On Thu, Jun 22, 2017 at 1:30 PM, Richard Sandiford >>>>>> <richard.sandiford@linaro.org> wrote: >>>>>>> The test case triggered this assert in vect_update_misalignment_for_peel: >>>>>>> >>>>>>> gcc_assert (DR_MISALIGNMENT (dr) / dr_size == >>>>>>> DR_MISALIGNMENT (dr_peel) / dr_peel_size); >>>>>>> >>>>>>> We knew that the two DRs had the same misalignment at runtime, but when >>>>>>> considered in isolation, one data reference guaranteed a higher compile-time >>>>>>> base alignment than the other. >>>>>>> >>>>>>> In the test case this looks like a missed opportunity. Both references >>>>>>> are unconditional, so it should be possible to use the highest of the >>>>>>> available base alignment guarantees when analyzing each reference. >>>>>>> The patch does this. >>>>>>> >>>>>>> However, as the comment in the patch says, the base alignment guarantees >>>>>>> provided by a conditional reference only apply if the reference occurs >>>>>>> at least once. In this case it would be legitimate for two references >>>>>>> to have the same runtime misalignment and for one reference to provide a >>>>>>> stronger compile-time guarantee than the other about what the misalignment >>>>>>> actually is. The patch therefore relaxes the assert to handle that case. >>>>>> >>>>>> Hmm, but you don't actually check whether a reference occurs only >>>> conditional, >>>>>> do you? You just seem to say that for masked loads/stores the reference >>>>>> is conditional (I believe that's not true). But for a loop like >>>>>> >>>>>> for (;;) >>>>>> if (a[i]) >>>>>> sum += b[j]; >>>>>> >>>>>> you still assume b[j] executes unconditionally? >>>>> >>>>> Maybe the documentation isn't clear enough, but DR_IS_CONDITIONAL >>>>> was supposed to mean "even if the containing statement executes >>>>> and runs to completion, the reference might not actually occur". >>>>> The example above isn't conditional in that sense because the >>>>> reference to b[j] does occur if the store is reached and completes. >>>>> >>>>> Masked loads and stores are conditional in that sense though. >>>>> The reference only occurs if the mask is nonzero; the memory >>>>> isn't touched otherwise. The functions are used to if-convert >>>>> things like: >>>>> >>>>> for (...) >>>>> a[i] = b[i] ? c[i] : d[i]; >>>>> >>>>> where there's no guarantee that it's safe to access c[i] when !b[i] >>>>> (or d[i] when b[i]). No reference occurs for an all-false mask. >>>> >>>> But as you touch generic data-ref code here you should apply more >>>> sensible semantics to DR_IS_CONDITIONAL than just marking >>>> masked loads/stores but not DRs occuring inside BBs only executed >>>> conditionally ... >>> >>> I don't see why that's more sensible though. If a statement is only >>> conditionally executed in a loop, it's up to the consumer to decide >>> what to do about that. The conditions under which the statement >>> is reached are a control-flow issue and tree-data-ref.c doesn't >>> have any special information about it. >>> >>> Masked loads and stores are special because the DR_REFs created by >>> tree-data-ref.c are artificial: they didn't exist as MEM_REFs in the >>> original DR_STMT. And AIUI they didn't exist as MEM_REFs precisely >>> because they're not guaranteed to happen, even if the load or store >>> statement itself is executed. So in this case the DR_IS_CONDITIONAL >>> is reflecting something that tree-data-ref.c itself has done. >>> >>> How about calling it DR_IS_CONDITIONAL_IN_STMT to avoid the >>> general-sounding name? >> >> That sounds better and avoids the ambiguity. > > OK. > >>>>>> The vectorizer of course only sees unconditionally executed stmts. >>>>>> >>>>>> So - I'd simply not add this DR_IS_CONDITIONAL. Did you run into >>>>>> any real-world (testsuite) issues without this? >>>>> >>>>> Dropping DR_IS_CONDITIONAL would cause us to make invalid alignment >>>>> assumptions in silly corner cases. I could add a scan test for it, >>>>> for targets with masked loads and stores. It wouldn't trigger >>>>> an execution failure though because we assume that targets with >>>>> masked loads and stores allow unaligned accesses: >>>>> >>>>> /* For now assume all conditional loads/stores support unaligned >>>>> access without any special code. */ >>>>> if (is_gimple_call (stmt) >>>>> && gimple_call_internal_p (stmt) >>>>> && (gimple_call_internal_fn (stmt) == IFN_MASK_LOAD >>>>> || gimple_call_internal_fn (stmt) == IFN_MASK_STORE)) >>>>> return dr_unaligned_supported; >>>>> >>>>> So the worst that would happen is that we'd supposedly peel for >>>>> alignment, but actually misalign everything instead, and so make >>>>> things slower rather than quicker. >>>>> >>>>>> Note that the assert is to prevent bogus information. Iff we aligned >>>>>> DR with base alignment 8 and misalign 3 then if another same-align >>>>>> DR has base alignment 16 we can't simply zero its DR_MISALIGNMENT >>>>>> as it still can be 8 after aligning DR. >>>>>> >>>>>> So I think it's wrong to put DRs with differing base-alignment into >>>>>> the same-align-refs chain, those should get their DR_MISALIGNMENT >>>>>> updated independenlty after peeling. >>>>> >>>>> DR_MISALIGNMENT is relative to the vector alignment rather than >>>>> the base alignment though. So: >>>> >>>> We seem to use it that way, yes (looking at set_ptr_info_alignment >>>> uses). So why not fix the assert then by capping the alignment/misalignment >>>> we compute at this value as well? (and document this in the header >>>> around DR_MISALIGNMENT) >>>> >>>> Ideally we'd do alignment analysis independent of the vector size >>>> though (for those stupid targets with multiple vector sizes to consider...). >>>> >>>>> a) when looking for references *A1 and *A2 with the same alignment, >>>>> we simply have to prove that A1 % vecalign == A2 % vecalign. >>>>> This doesn't require any knowledge about the base alignment. >>>>> If we break the addresses down as: >>>>> >>>>> A1 = BASE1 + REST1, REST1 = INIT1 + OFFSET1 + X * STEP1 >>>>> A2 = BASE2 + REST2, REST2 = INIT2 + OFFSET2 + X * STEP2 >>>>> >>>>> and can prove that BASE1 == BASE2, the alignment of that base >>>>> isn't important. We simply need to prove that REST1 % vecalign >>>>> == REST2 % vecalign for all X. >>>>> >>>>> b) In the assert, we've peeled the loop so that DR_PEEL is guaranteed >>>>> to be vector-aligned. If DR_PEEL is A1 in the example above, we have >>>>> A1 % vecalign == 0, so A2 % vecalign must be 0 too. This again doesn't >>>>> rely on the base alignment being known. >>>>> >>>>> What a high base alignment for DR_PEEL gives us is the ability to know >>>>> at compile how many iterations need to be peeled to make DR_PEEL aligned. >>>>> But the points above apply regardless of whether we know that value at >>>>> compile time or not. >>>>> >>>>> In examples like the test case, we would have known at compile time that >>>>> VF-1 iterations would need to be peeled if we'd picked the store as the >>>>> DR_PEEL, but would have treated the number of peels as variable if we'd >>>>> picked the load. The value calculated at runtime would still have been >>>>> VF-1, it's just that the code wouldn't have been as efficient. >>>>> >>>>> One of the benefits of pooling the alignments for unconditional references >>>>> is that it no longer matters which DR we pick: the number of peels will >>>>> be a compile-time constant both ways. >>>>> >>>>> Thanks, >>>>> Richard >>>>> >>>>>> I'd rather not mix fixing this with the improvement to eventuall use a >>>>>> larger align for the other DR if possible. >>>> >>>> ^^^ >>>> >>>> So can you fix the ICE with capping base alignment / DR_MISALIGNMENT? >>> >>> I don't think the problem is the lack of a cap. In the test case we >>> see that: >>> >>> 1. B is known at compile time to be X * vecsize + Y when considered in >>> isolation, because the base alignment derived from its DR_REF >= vecsize. >>> So DR_MISALIGNMENT (B) == Y. >>> >>> 2. A's misalignment wrt vecsize is not known at compile time when >>> considered in isolation, because no useful base alignment can be >>> derived from its DR_REF. (The DR_REF is to a plain int rather than >>> to a structure with a high alignment.) So DR_MISALIGNMENT (A) == -1. >>> >>> 3. A and B when considered as a pair trivially have the same misalignment >>> wrt vecsize, for the reasons above. >>> >>> Each of these results is individually correct. The problem is that the >>> assert is conflating two things: it's saying that if we know two datarefs >>> have the same misaligment, we must either be able to calculate a >>> compile-time misalignment for both datarefs in isolation, or we must >>> fail to calculate a compile-time misalignment for both datarefs in >>> isolation. That isn't true: it's valid to have situations in which the >>> compile-time misalignment is known for one dataref in isolation but not >>> for the other. >> >> True. So the assert should then become >> >> gcc_assert (! known_alignment_for_access_p (dr) >> || DR_MISALIGNMENT (dr) / dr_size == >> DR_MISALIGNMENT (dr_peel) / dr_peel_size); >> >> ? > > I think it would need to be: > > gcc_assert (!known_alignment_for_access_p (dr) > || !known_alignment_for_access_p (dr_peel) > || (DR_MISALIGNMENT (dr) / dr_size > == DR_MISALIGNMENT (dr_peel) / dr_peel_size)); I think for !known_alignment_for_access_p (dr_peel) the assert doesn't make any sense (DR_MISALIGNMENT is -1), so yes, you are right. > But yeah, that would work too. The idea with the assert in the patch was > that for unconditional references we probably *do* want to try to compute > the same compile-time misalignment, but for optimisation reasons rather > than correctness. Maybe that's more properly a gcc_checking_assert > though, since nothing goes wrong if it fails. So perhaps: We shouldn't have asserts for optimization reasons, even with checking IMHO. > > gcc_checking_assert (DR_IS_CONDITIONAL_IN_STMT (dr) > || DR_IS_CONDITIONAL_IN_STMT (dr_peel) > || (known_alignment_for_access_p (dr) > == known_alignment_for_access_p (dr_peel))); > > as a follow-on assert. > > Should I split it into two patches, one to change the gcc_assert and > another to add the optimisation? Yes please. Thanks, Richard. > Thanks, > Richard
Index: gcc/tree-vectorizer.h =================================================================== --- gcc/tree-vectorizer.h 2017-06-08 08:51:43.347264181 +0100 +++ gcc/tree-vectorizer.h 2017-06-22 12:23:21.288421018 +0100 @@ -22,6 +22,7 @@ Software Foundation; either version 3, o #define GCC_TREE_VECTORIZER_H #include "tree-data-ref.h" +#include "tree-hash-traits.h" #include "target.h" /* Used for naming of new temporaries. */ @@ -84,6 +85,10 @@ struct stmt_info_for_cost { typedef vec<stmt_info_for_cost> stmt_vector_for_cost; +/* Maps base addresses to the largest alignment that we've been able + to calculate for them. */ +typedef hash_map<tree_operand_hash, unsigned int> vec_base_alignments; + /************************************************************************ SLP ************************************************************************/ @@ -156,6 +161,10 @@ struct vec_info { /* All data references. */ vec<data_reference_p> datarefs; + /* Maps the base addresses of all data references in DATAREFS to the + largest alignment that we've been able to calculate for them. */ + vec_base_alignments base_alignments; + /* All data dependences. */ vec<ddr_p> ddrs; @@ -1117,6 +1126,7 @@ extern bool vect_prune_runtime_alias_tes extern bool vect_check_gather_scatter (gimple *, loop_vec_info, gather_scatter_info *); extern bool vect_analyze_data_refs (vec_info *, int *); +extern void vect_compute_base_alignments (vec_info *); extern tree vect_create_data_ref_ptr (gimple *, tree, struct loop *, tree, tree *, gimple_stmt_iterator *, gimple **, bool, bool *, Index: gcc/tree-data-ref.h =================================================================== --- gcc/tree-data-ref.h 2017-06-08 08:51:43.349263895 +0100 +++ gcc/tree-data-ref.h 2017-06-22 12:23:21.285421180 +0100 @@ -119,6 +119,10 @@ struct data_reference /* True when the data reference is in RHS of a stmt. */ bool is_read; + /* True when the data reference is conditional, i.e. if it might not + occur even when the statement runs to completion. */ + bool is_conditional; + /* Behavior of the memory reference in the innermost loop. */ struct innermost_loop_behavior innermost; @@ -138,6 +142,7 @@ #define DR_ACCESS_FN(DR, I) DR_AC #define DR_NUM_DIMENSIONS(DR) DR_ACCESS_FNS (DR).length () #define DR_IS_READ(DR) (DR)->is_read #define DR_IS_WRITE(DR) (!DR_IS_READ (DR)) +#define DR_IS_CONDITIONAL(DR) (DR)->is_conditional #define DR_BASE_ADDRESS(DR) (DR)->innermost.base_address #define DR_OFFSET(DR) (DR)->innermost.offset #define DR_INIT(DR) (DR)->innermost.init @@ -350,7 +355,8 @@ extern bool graphite_find_data_reference vec<data_reference_p> *); tree find_data_references_in_loop (struct loop *, vec<data_reference_p> *); bool loop_nest_has_data_refs (loop_p loop); -struct data_reference *create_data_ref (loop_p, loop_p, tree, gimple *, bool); +struct data_reference *create_data_ref (loop_p, loop_p, tree, gimple *, bool, + bool); extern bool find_loop_nest (struct loop *, vec<loop_p> *); extern struct data_dependence_relation *initialize_data_dependence_relation (struct data_reference *, struct data_reference *, vec<loop_p>); Index: gcc/tree-data-ref.c =================================================================== --- gcc/tree-data-ref.c 2017-06-08 08:51:43.349263895 +0100 +++ gcc/tree-data-ref.c 2017-06-22 12:23:21.284421233 +0100 @@ -1053,15 +1053,18 @@ free_data_ref (data_reference_p dr) free (dr); } -/* Analyzes memory reference MEMREF accessed in STMT. The reference - is read if IS_READ is true, write otherwise. Returns the - data_reference description of MEMREF. NEST is the outermost loop - in which the reference should be instantiated, LOOP is the loop in - which the data reference should be analyzed. */ +/* Analyze memory reference MEMREF, which is accessed in STMT. The reference + is a read if IS_READ is true, otherwise it is a write. IS_CONDITIONAL + indicates that the reference is conditional, i.e. that it might not + occur every time that STMT runs to completion. + + Return the data_reference description of MEMREF. NEST is the outermost + loop in which the reference should be instantiated, LOOP is the loop + in which the data reference should be analyzed. */ struct data_reference * create_data_ref (loop_p nest, loop_p loop, tree memref, gimple *stmt, - bool is_read) + bool is_read, bool is_conditional) { struct data_reference *dr; @@ -1076,6 +1079,7 @@ create_data_ref (loop_p nest, loop_p loo DR_STMT (dr) = stmt; DR_REF (dr) = memref; DR_IS_READ (dr) = is_read; + DR_IS_CONDITIONAL (dr) = is_conditional; dr_analyze_innermost (dr, nest); dr_analyze_indices (dr, nest, loop); @@ -4446,6 +4450,10 @@ struct data_ref_loc /* True if the memory reference is read. */ bool is_read; + + /* True if the data reference is conditional, i.e. if it might not + occur even when the statement runs to completion. */ + bool is_conditional; }; @@ -4512,6 +4520,7 @@ get_references_in_stmt (gimple *stmt, ve { ref.ref = op1; ref.is_read = true; + ref.is_conditional = false; references->safe_push (ref); } } @@ -4539,6 +4548,7 @@ get_references_in_stmt (gimple *stmt, ve type = TREE_TYPE (gimple_call_arg (stmt, 3)); if (TYPE_ALIGN (type) != align) type = build_aligned_type (type, align); + ref.is_conditional = true; ref.ref = fold_build2 (MEM_REF, type, gimple_call_arg (stmt, 0), ptr); references->safe_push (ref); @@ -4558,6 +4568,7 @@ get_references_in_stmt (gimple *stmt, ve { ref.ref = op1; ref.is_read = true; + ref.is_conditional = false; references->safe_push (ref); } } @@ -4571,6 +4582,7 @@ get_references_in_stmt (gimple *stmt, ve { ref.ref = op0; ref.is_read = false; + ref.is_conditional = false; references->safe_push (ref); } return clobbers_memory; @@ -4635,8 +4647,8 @@ find_data_references_in_stmt (struct loo FOR_EACH_VEC_ELT (references, i, ref) { - dr = create_data_ref (nest, loop_containing_stmt (stmt), - ref->ref, stmt, ref->is_read); + dr = create_data_ref (nest, loop_containing_stmt (stmt), ref->ref, + stmt, ref->is_read, ref->is_conditional); gcc_assert (dr != NULL); datarefs->safe_push (dr); } @@ -4665,7 +4677,8 @@ graphite_find_data_references_in_stmt (l FOR_EACH_VEC_ELT (references, i, ref) { - dr = create_data_ref (nest, loop, ref->ref, stmt, ref->is_read); + dr = create_data_ref (nest, loop, ref->ref, stmt, ref->is_read, + ref->is_conditional); gcc_assert (dr != NULL); datarefs->safe_push (dr); } Index: gcc/tree-ssa-loop-prefetch.c =================================================================== --- gcc/tree-ssa-loop-prefetch.c 2017-06-07 21:58:55.928557601 +0100 +++ gcc/tree-ssa-loop-prefetch.c 2017-06-22 12:23:21.285421180 +0100 @@ -1633,7 +1633,7 @@ determine_loop_nest_reuse (struct loop * for (ref = gr->refs; ref; ref = ref->next) { dr = create_data_ref (nest, loop_containing_stmt (ref->stmt), - ref->mem, ref->stmt, !ref->write_p); + ref->mem, ref->stmt, !ref->write_p, false); if (dr) { Index: gcc/tree-vect-data-refs.c =================================================================== --- gcc/tree-vect-data-refs.c 2017-06-08 08:51:43.350263752 +0100 +++ gcc/tree-vect-data-refs.c 2017-06-22 12:23:21.286421126 +0100 @@ -646,6 +646,102 @@ vect_slp_analyze_instance_dependence (sl return res; } +/* If DR is nested in a loop that is being vectorized, return the base + address in the context of the vectorized loop (rather than the + nested loop). Otherwise return the base address in the context + of the containing statement. */ + +static tree +vect_get_base_address (data_reference *dr) +{ + gimple *stmt = DR_STMT (dr); + stmt_vec_info stmt_info = vinfo_for_stmt (stmt); + loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info); + struct loop *loop = loop_vinfo != NULL ? LOOP_VINFO_LOOP (loop_vinfo) : NULL; + if (loop && nested_in_vect_loop_p (loop, stmt)) + return STMT_VINFO_DR_BASE_ADDRESS (stmt_info); + else + return DR_BASE_ADDRESS (dr); +} + +/* Compute and return the alignment of base address BASE_ADDR in DR. */ + +static unsigned int +vect_compute_base_alignment (data_reference *dr, tree base_addr) +{ + /* To look at the alignment of the base we have to preserve an inner + MEM_REF as that carries the alignment information of the actual + access. */ + tree base = DR_REF (dr); + while (handled_component_p (base)) + base = TREE_OPERAND (base, 0); + unsigned int base_alignment = 0; + unsigned HOST_WIDE_INT base_bitpos; + get_object_alignment_1 (base, &base_alignment, &base_bitpos); + + /* As data-ref analysis strips the MEM_REF down to its base operand + to form DR_BASE_ADDRESS and adds the offset to DR_INIT we have to + adjust things to make base_alignment valid as the alignment of + DR_BASE_ADDRESS. */ + if (TREE_CODE (base) == MEM_REF) + { + /* Note all this only works if DR_BASE_ADDRESS is the same as + MEM_REF operand zero, otherwise DR/SCEV analysis might have factored + in other offsets. We need to rework DR to compute the alingment + of DR_BASE_ADDRESS as long as all information is still available. */ + if (operand_equal_p (TREE_OPERAND (base, 0), base_addr, 0)) + { + base_bitpos -= mem_ref_offset (base).to_short_addr () * BITS_PER_UNIT; + base_bitpos &= (base_alignment - 1); + } + else + base_bitpos = BITS_PER_UNIT; + } + if (base_bitpos != 0) + base_alignment = base_bitpos & -base_bitpos; + + /* Also look at the alignment of the base address DR analysis + computed. */ + unsigned int base_addr_alignment = get_pointer_alignment (base_addr); + if (base_addr_alignment > base_alignment) + base_alignment = base_addr_alignment; + + return base_alignment; +} + +/* Compute alignments for the base addresses of all datarefs in VINFO. */ + +void +vect_compute_base_alignments (vec_info *vinfo) +{ + /* If the region we're going to vectorize is reached, all unconditional + data references occur at least once. We can therefore pool the base + alignment guarantees from each unconditional reference. */ + data_reference *dr; + unsigned int i; + FOR_EACH_VEC_ELT (vinfo->datarefs, i, dr) + if (!DR_IS_CONDITIONAL (dr)) + { + tree base_addr = vect_get_base_address (dr); + unsigned int alignment = vect_compute_base_alignment (dr, base_addr); + bool existed; + unsigned int &entry + = vinfo->base_alignments.get_or_insert (base_addr, &existed); + if (!existed || entry < alignment) + { + entry = alignment; + if (dump_enabled_p ()) + { + dump_printf_loc (MSG_NOTE, vect_location, + "setting base alignment for "); + dump_generic_expr (MSG_NOTE, TDF_SLIM, base_addr); + dump_printf (MSG_NOTE, " to %d, based on ", alignment); + dump_gimple_stmt (MSG_NOTE, TDF_SLIM, DR_STMT (dr), 0); + } + } + } +} + /* Function vect_compute_data_ref_alignment Compute the misalignment of the data reference DR. @@ -663,6 +759,7 @@ vect_compute_data_ref_alignment (struct { gimple *stmt = DR_STMT (dr); stmt_vec_info stmt_info = vinfo_for_stmt (stmt); + vec_base_alignments *base_alignments = &stmt_info->vinfo->base_alignments; loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info); struct loop *loop = NULL; tree ref = DR_REF (dr); @@ -699,6 +796,8 @@ vect_compute_data_ref_alignment (struct { tree step = DR_STEP (dr); + base_addr = STMT_VINFO_DR_BASE_ADDRESS (stmt_info); + aligned_to = STMT_VINFO_DR_ALIGNED_TO (stmt_info); if (tree_fits_shwi_p (step) && tree_to_shwi (step) % GET_MODE_SIZE (TYPE_MODE (vectype)) == 0) { @@ -706,8 +805,6 @@ vect_compute_data_ref_alignment (struct dump_printf_loc (MSG_NOTE, vect_location, "inner step divides the vector-size.\n"); misalign = STMT_VINFO_DR_INIT (stmt_info); - aligned_to = STMT_VINFO_DR_ALIGNED_TO (stmt_info); - base_addr = STMT_VINFO_DR_BASE_ADDRESS (stmt_info); } else { @@ -738,39 +835,15 @@ vect_compute_data_ref_alignment (struct } } - /* To look at alignment of the base we have to preserve an inner MEM_REF - as that carries alignment information of the actual access. */ - base = ref; - while (handled_component_p (base)) - base = TREE_OPERAND (base, 0); + /* Calculate the maximum of the pooled base address alignment and the + alignment that we can compute for DR itself. The latter should + already be included in the former for unconditional references. */ unsigned int base_alignment = 0; - unsigned HOST_WIDE_INT base_bitpos; - get_object_alignment_1 (base, &base_alignment, &base_bitpos); - /* As data-ref analysis strips the MEM_REF down to its base operand - to form DR_BASE_ADDRESS and adds the offset to DR_INIT we have to - adjust things to make base_alignment valid as the alignment of - DR_BASE_ADDRESS. */ - if (TREE_CODE (base) == MEM_REF) - { - /* Note all this only works if DR_BASE_ADDRESS is the same as - MEM_REF operand zero, otherwise DR/SCEV analysis might have factored - in other offsets. We need to rework DR to compute the alingment - of DR_BASE_ADDRESS as long as all information is still available. */ - if (operand_equal_p (TREE_OPERAND (base, 0), base_addr, 0)) - { - base_bitpos -= mem_ref_offset (base).to_short_addr () * BITS_PER_UNIT; - base_bitpos &= (base_alignment - 1); - } - else - base_bitpos = BITS_PER_UNIT; - } - if (base_bitpos != 0) - base_alignment = base_bitpos & -base_bitpos; - /* Also look at the alignment of the base address DR analysis - computed. */ - unsigned int base_addr_alignment = get_pointer_alignment (base_addr); - if (base_addr_alignment > base_alignment) - base_alignment = base_addr_alignment; + if (DR_IS_CONDITIONAL (dr)) + base_alignment = vect_compute_base_alignment (dr, base_addr); + if (unsigned int *entry = base_alignments->get (base_addr)) + base_alignment = MAX (base_alignment, *entry); + gcc_assert (base_alignment != 0); if (base_alignment >= TYPE_ALIGN (TREE_TYPE (vectype))) DR_VECT_AUX (dr)->base_element_aligned = true; @@ -906,8 +979,29 @@ vect_update_misalignment_for_peel (struc { if (current_dr != dr) continue; - gcc_assert (DR_MISALIGNMENT (dr) / dr_size == - DR_MISALIGNMENT (dr_peel) / dr_peel_size); + /* Any alignment guarantees provided by a reference only apply if + the reference actually occurs. For example, in: + + struct s __attribute__((aligned(32))) { + int misaligner; + int array[N]; + }; + + int *ptr; + for (int i = 0; i < n; ++i) + ptr[i] = c[i] ? ((struct s *) (ptr - 1))->array[i] : 0; + + we can only assume that ptr is part of a struct s if at least one + c[i] is true. This in turn means that we have a higher base + alignment guarantee for the read from ptr (if it occurs) than for + the write to ptr, and we cannot unconditionally carry the former + over to the latter. We still know that the two address values + have the same misalignment, so if peeling has forced one of them + to be aligned, the other must be too. */ + gcc_assert (DR_IS_CONDITIONAL (dr_peel) + || DR_IS_CONDITIONAL (dr) + || (DR_MISALIGNMENT (dr) / dr_size + == DR_MISALIGNMENT (dr_peel) / dr_peel_size)); SET_DR_MISALIGNMENT (dr, 0); return; } @@ -2117,8 +2211,7 @@ vect_find_same_alignment_drs (struct dat if (dra == drb) return; - if (!operand_equal_p (DR_BASE_OBJECT (dra), DR_BASE_OBJECT (drb), - OEP_ADDRESS_OF) + if (!operand_equal_p (DR_BASE_ADDRESS (dra), DR_BASE_ADDRESS (drb), 0) || !operand_equal_p (DR_OFFSET (dra), DR_OFFSET (drb), 0) || !operand_equal_p (DR_STEP (dra), DR_STEP (drb), 0)) return; @@ -2176,6 +2269,7 @@ vect_analyze_data_refs_alignment (loop_v vec<data_reference_p> datarefs = vinfo->datarefs; struct data_reference *dr; + vect_compute_base_alignments (vinfo); FOR_EACH_VEC_ELT (datarefs, i, dr) { stmt_vec_info stmt_info = vinfo_for_stmt (DR_STMT (dr)); @@ -3374,7 +3468,8 @@ vect_analyze_data_refs (vec_info *vinfo, { struct data_reference *newdr = create_data_ref (NULL, loop_containing_stmt (stmt), - DR_REF (dr), stmt, maybe_scatter ? false : true); + DR_REF (dr), stmt, !maybe_scatter, + DR_IS_CONDITIONAL (dr)); gcc_assert (newdr != NULL && DR_REF (newdr)); if (DR_BASE_ADDRESS (newdr) && DR_OFFSET (newdr) Index: gcc/tree-vect-slp.c =================================================================== --- gcc/tree-vect-slp.c 2017-06-07 21:58:56.336475882 +0100 +++ gcc/tree-vect-slp.c 2017-06-22 12:23:21.288421018 +0100 @@ -2367,6 +2367,7 @@ new_bb_vec_info (gimple_stmt_iterator re gimple_stmt_iterator gsi; res = (bb_vec_info) xcalloc (1, sizeof (struct _bb_vec_info)); + new (&res->base_alignments) vec_base_alignments (); res->kind = vec_info::bb; BB_VINFO_BB (res) = bb; res->region_begin = region_begin; @@ -2741,6 +2742,8 @@ vect_slp_analyze_bb_1 (gimple_stmt_itera return NULL; } + vect_compute_base_alignments (bb_vinfo); + /* Analyze and verify the alignment of data references and the dependence in the SLP instances. */ for (i = 0; BB_VINFO_SLP_INSTANCES (bb_vinfo).iterate (i, &instance); ) Index: gcc/tree-vect-loop.c =================================================================== --- gcc/tree-vect-loop.c 2017-06-22 12:22:57.734313143 +0100 +++ gcc/tree-vect-loop.c 2017-06-22 12:23:21.287421072 +0100 @@ -1157,6 +1157,7 @@ new_loop_vec_info (struct loop *loop) LOOP_VINFO_VECT_FACTOR (res) = 0; LOOP_VINFO_LOOP_NEST (res) = vNULL; LOOP_VINFO_DATAREFS (res) = vNULL; + new (&res->base_alignments) vec_base_alignments (); LOOP_VINFO_DDRS (res) = vNULL; LOOP_VINFO_UNALIGNED_DR (res) = NULL; LOOP_VINFO_MAY_MISALIGN_STMTS (res) = vNULL; Index: gcc/tree-vectorizer.c =================================================================== --- gcc/tree-vectorizer.c 2017-06-22 12:22:57.732313220 +0100 +++ gcc/tree-vectorizer.c 2017-06-22 12:23:21.288421018 +0100 @@ -370,6 +370,8 @@ vect_destroy_datarefs (vec_info *vinfo) } free_data_refs (vinfo->datarefs); + + vinfo->base_alignments.~vec_base_alignments (); } /* A helper function to free scev and LOOP niter information, as well as Index: gcc/testsuite/gcc.dg/vect/pr81136.c =================================================================== --- /dev/null 2017-06-22 07:43:14.805493307 +0100 +++ gcc/testsuite/gcc.dg/vect/pr81136.c 2017-06-22 12:23:21.283421287 +0100 @@ -0,0 +1,16 @@ +/* { dg-do compile } */ + +struct __attribute__((aligned (32))) +{ + char misaligner; + int foo[100]; + int bar[100]; +} *a; + +void +fn1 (int n) +{ + int *b = a->foo; + for (int i = 0; i < n; i++) + a->bar[i] = b[i]; +}