diff mbox

Tweak BB analysis for dr_analyze_innermost

Message ID 87r2y46xpp.fsf@linaro.org
State Accepted
Commit bc9f4235bcac6304141c472c94ecedeb9dbbff56
Headers show

Commit Message

Richard Sandiford June 28, 2017, 1:36 p.m. UTC
dr_analyze_innermost had a "struct loop *nest" parameter that acted
like a boolean.  This was added in r179161, with the idea that a
null nest selected BB-level analysis rather than loop analysis.

The handling seemed strange though.  If the DR was part of a loop,
we still tried to express the base and offset values as IVs, potentially
giving a nonzero step.  If that failed for any reason, we'd revert to
using the original base and offset, just as we would if we hadn't asked
for an IV in the first place.

It seems more natural to use the !in_loop handling whenever nest is null
and always set the step to zero.  This actually enables one more SLP
opportunity in bb-slp-pr65935.c.

I checked out r179161 and tried the patch there.  The test case added
in that revision still passes, so I don't think there was any particular
need to check simple_iv.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


2017-06-28  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	* tree-data-ref.c (dr_analyze_innermost): Replace the "nest"
	parameter with a "loop" parameter and use it instead of the
	loop containing DR_STMT.  Don't check simple_iv when doing
	BB analysis.  Describe the two analysis modes in the comment.

gcc/testsuite/
	* gcc.dg/vect/bb-slp-pr65935.c: Expect SLP to be used in main
	as well.

Comments

Bin.Cheng June 28, 2017, 2 p.m. UTC | #1
On Wed, Jun 28, 2017 at 2:36 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Index: gcc/tree-data-ref.c

> ===================================================================

> --- gcc/tree-data-ref.c 2017-06-28 14:33:41.294720044 +0100

> +++ gcc/tree-data-ref.c 2017-06-28 14:35:30.475155670 +0100

> @@ -749,15 +749,29 @@ canonicalize_base_object_address (tree a

>    return build_fold_addr_expr (TREE_OPERAND (addr, 0));

>  }

>

> -/* Analyzes the behavior of the memory reference DR in the innermost loop or

> -   basic block that contains it.  Returns true if analysis succeed or false

> -   otherwise.  */

> +/* Analyze the behavior of memory reference DR.  There are two modes:

> +

> +   - BB analysis.  In this case we simply split the address into base,

> +     init and offset components, without reference to any containing loop.

> +     The resulting base and offset are general expressions and they can

> +     vary arbitrarily from one iteration of the containing loop to the next.

> +     The step is always zero.

> +

> +   - loop analysis.  In this case we analyze the reference both wrt LOOP

> +     and on the basis that the reference occurs (is "used") in LOOP;

> +     see the comment above analyze_scalar_evolution_in_loop for more

> +     information about this distinction.  The base, init, offset and

> +     step fields are all invariant in LOOP.

> +

> +   Perform BB analysis if LOOP is null, or if LOOP is the function's

> +   dummy outermost loop.  In other cases perform loop analysis.

> +

> +   Return true if the analysis succeeded and store the results in DR if so.

> +   BB analysis can only fail for bitfield or reversed-storage accesses.  */

>

>  bool

> -dr_analyze_innermost (struct data_reference *dr, struct loop *nest)

> +dr_analyze_innermost (struct data_reference *dr, struct loop *loop)

>  {

> -  gimple *stmt = DR_STMT (dr);

> -  struct loop *loop = loop_containing_stmt (stmt);

>    tree ref = DR_REF (dr);

>    HOST_WIDE_INT pbitsize, pbitpos;

>    tree base, poffset;

> @@ -806,22 +820,11 @@ dr_analyze_innermost (struct data_refere

>

>    if (in_loop)

A nit.  No need to use bool variable now?  We can check loop != NULL instead.

Thanks,
bin
Richard Sandiford June 28, 2017, 2:04 p.m. UTC | #2
"Bin.Cheng" <amker.cheng@gmail.com> writes:
> On Wed, Jun 28, 2017 at 2:36 PM, Richard Sandiford

> <richard.sandiford@linaro.org> wrote:

>> Index: gcc/tree-data-ref.c

>> ===================================================================

>> --- gcc/tree-data-ref.c 2017-06-28 14:33:41.294720044 +0100

>> +++ gcc/tree-data-ref.c 2017-06-28 14:35:30.475155670 +0100

>> @@ -749,15 +749,29 @@ canonicalize_base_object_address (tree a

>>    return build_fold_addr_expr (TREE_OPERAND (addr, 0));

>>  }

>>

>> -/* Analyzes the behavior of the memory reference DR in the innermost loop or

>> -   basic block that contains it.  Returns true if analysis succeed or false

>> -   otherwise.  */

>> +/* Analyze the behavior of memory reference DR.  There are two modes:

>> +

>> +   - BB analysis.  In this case we simply split the address into base,

>> +     init and offset components, without reference to any containing loop.

>> +     The resulting base and offset are general expressions and they can

>> +     vary arbitrarily from one iteration of the containing loop to the next.

>> +     The step is always zero.

>> +

>> +   - loop analysis.  In this case we analyze the reference both wrt LOOP

>> +     and on the basis that the reference occurs (is "used") in LOOP;

>> +     see the comment above analyze_scalar_evolution_in_loop for more

>> +     information about this distinction.  The base, init, offset and

>> +     step fields are all invariant in LOOP.

>> +

>> +   Perform BB analysis if LOOP is null, or if LOOP is the function's

>> +   dummy outermost loop.  In other cases perform loop analysis.

>> +

>> +   Return true if the analysis succeeded and store the results in DR if so.

>> +   BB analysis can only fail for bitfield or reversed-storage accesses.  */

>>

>>  bool

>> -dr_analyze_innermost (struct data_reference *dr, struct loop *nest)

>> +dr_analyze_innermost (struct data_reference *dr, struct loop *loop)

>>  {

>> -  gimple *stmt = DR_STMT (dr);

>> -  struct loop *loop = loop_containing_stmt (stmt);

>>    tree ref = DR_REF (dr);

>>    HOST_WIDE_INT pbitsize, pbitpos;

>>    tree base, poffset;

>> @@ -806,22 +820,11 @@ dr_analyze_innermost (struct data_refere

>>

>>    if (in_loop)

> A nit.  No need to use bool variable now?  We can check loop != NULL instead.


As the comment says, we also want to do BB analysis if the loop is the
outermost dummy loop, so in_loop remains set to:

  bool in_loop = (loop && loop->num);

I think it's worth keeping that abstraction.

Thanks,
Richard
Bin.Cheng June 28, 2017, 4:05 p.m. UTC | #3
On Wed, Jun 28, 2017 at 3:04 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> "Bin.Cheng" <amker.cheng@gmail.com> writes:

>> On Wed, Jun 28, 2017 at 2:36 PM, Richard Sandiford

>> <richard.sandiford@linaro.org> wrote:

>>> Index: gcc/tree-data-ref.c

>>> ===================================================================

>>> --- gcc/tree-data-ref.c 2017-06-28 14:33:41.294720044 +0100

>>> +++ gcc/tree-data-ref.c 2017-06-28 14:35:30.475155670 +0100

>>> @@ -749,15 +749,29 @@ canonicalize_base_object_address (tree a

>>>    return build_fold_addr_expr (TREE_OPERAND (addr, 0));

>>>  }

>>>

>>> -/* Analyzes the behavior of the memory reference DR in the innermost loop or

>>> -   basic block that contains it.  Returns true if analysis succeed or false

>>> -   otherwise.  */

>>> +/* Analyze the behavior of memory reference DR.  There are two modes:

>>> +

>>> +   - BB analysis.  In this case we simply split the address into base,

>>> +     init and offset components, without reference to any containing loop.

>>> +     The resulting base and offset are general expressions and they can

>>> +     vary arbitrarily from one iteration of the containing loop to the next.

>>> +     The step is always zero.

>>> +

>>> +   - loop analysis.  In this case we analyze the reference both wrt LOOP

>>> +     and on the basis that the reference occurs (is "used") in LOOP;

>>> +     see the comment above analyze_scalar_evolution_in_loop for more

>>> +     information about this distinction.  The base, init, offset and

>>> +     step fields are all invariant in LOOP.

>>> +

>>> +   Perform BB analysis if LOOP is null, or if LOOP is the function's

>>> +   dummy outermost loop.  In other cases perform loop analysis.

>>> +

>>> +   Return true if the analysis succeeded and store the results in DR if so.

>>> +   BB analysis can only fail for bitfield or reversed-storage accesses.  */

>>>

>>>  bool

>>> -dr_analyze_innermost (struct data_reference *dr, struct loop *nest)

>>> +dr_analyze_innermost (struct data_reference *dr, struct loop *loop)

>>>  {

>>> -  gimple *stmt = DR_STMT (dr);

>>> -  struct loop *loop = loop_containing_stmt (stmt);

>>>    tree ref = DR_REF (dr);

>>>    HOST_WIDE_INT pbitsize, pbitpos;

>>>    tree base, poffset;

>>> @@ -806,22 +820,11 @@ dr_analyze_innermost (struct data_refere

>>>

>>>    if (in_loop)

>> A nit.  No need to use bool variable now?  We can check loop != NULL instead.

>

> As the comment says, we also want to do BB analysis if the loop is the

> outermost dummy loop, so in_loop remains set to:

>

>   bool in_loop = (loop && loop->num);

>

> I think it's worth keeping that abstraction.

Sorry if I misunderstand this.   "loop != NULL" is different to
in_loop only when loop is the outermost dummy loop.  Which means nest
!= NULL based on change:
-  dr_analyze_innermost (dr, nest);
+  dr_analyze_innermost (dr, nest != NULL ? loop : NULL);

I wonder if it's possible for caller to pass non-NULL nest when
create_data_ref for a statement in outermost dummy loop?  Or should
it?

Also noticed that in vect_analyze_data_refs, there is below code:

      /* If target supports vector gather loads or scatter stores, or if
         this might be a SIMD lane access, see if they can't be used.  */
      if (is_a <loop_vec_info> (vinfo)
          && (maybe_gather || maybe_scatter || maybe_simd_lane_access)
          && !nested_in_vect_loop_p (loop, stmt))
        {
          struct data_reference *newdr
        = create_data_ref (NULL, loop_containing_stmt (stmt),
                   DR_REF (dr), stmt, maybe_scatter ? false : true);

we have nest==NULL and loop is the inner loop, but after change NULL
is passed to dr_analyze_innermost because of:

-  dr_analyze_innermost (dr, nest);
+  dr_analyze_innermost (dr, nest != NULL ? loop : NULL);

Is this the reason that bb-slp-pr65935.c gets vectorized after change?
 simple_iv should return {&s->phase[dir], 0} successfully though.
Question is what would happen if simple_iv succeeds with non-ZERO step
when called with nest==NULL?  The patch skips simple_iv and forces
ZERO step?

Thanks,
bin
>

> Thanks,

> Richard

>
Richard Sandiford June 28, 2017, 4:56 p.m. UTC | #4
"Bin.Cheng" <amker.cheng@gmail.com> writes:
> On Wed, Jun 28, 2017 at 3:04 PM, Richard Sandiford

> <richard.sandiford@linaro.org> wrote:

>> "Bin.Cheng" <amker.cheng@gmail.com> writes:

>>> On Wed, Jun 28, 2017 at 2:36 PM, Richard Sandiford

>>> <richard.sandiford@linaro.org> wrote:

>>>> Index: gcc/tree-data-ref.c

>>>> ===================================================================

>>>> --- gcc/tree-data-ref.c 2017-06-28 14:33:41.294720044 +0100

>>>> +++ gcc/tree-data-ref.c 2017-06-28 14:35:30.475155670 +0100

>>>> @@ -749,15 +749,29 @@ canonicalize_base_object_address (tree a

>>>>    return build_fold_addr_expr (TREE_OPERAND (addr, 0));

>>>>  }

>>>>

>>>> -/* Analyzes the behavior of the memory reference DR in the innermost loop or

>>>> -   basic block that contains it.  Returns true if analysis succeed or false

>>>> -   otherwise.  */

>>>> +/* Analyze the behavior of memory reference DR.  There are two modes:

>>>> +

>>>> +   - BB analysis.  In this case we simply split the address into base,

>>>> +     init and offset components, without reference to any containing loop.

>>>> +     The resulting base and offset are general expressions and they can

>>>> +     vary arbitrarily from one iteration of the containing loop to the next.

>>>> +     The step is always zero.

>>>> +

>>>> +   - loop analysis.  In this case we analyze the reference both wrt LOOP

>>>> +     and on the basis that the reference occurs (is "used") in LOOP;

>>>> +     see the comment above analyze_scalar_evolution_in_loop for more

>>>> +     information about this distinction.  The base, init, offset and

>>>> +     step fields are all invariant in LOOP.

>>>> +

>>>> +   Perform BB analysis if LOOP is null, or if LOOP is the function's

>>>> +   dummy outermost loop.  In other cases perform loop analysis.

>>>> +

>>>> +   Return true if the analysis succeeded and store the results in DR if so.

>>>> +   BB analysis can only fail for bitfield or reversed-storage accesses.  */

>>>>

>>>>  bool

>>>> -dr_analyze_innermost (struct data_reference *dr, struct loop *nest)

>>>> +dr_analyze_innermost (struct data_reference *dr, struct loop *loop)

>>>>  {

>>>> -  gimple *stmt = DR_STMT (dr);

>>>> -  struct loop *loop = loop_containing_stmt (stmt);

>>>>    tree ref = DR_REF (dr);

>>>>    HOST_WIDE_INT pbitsize, pbitpos;

>>>>    tree base, poffset;

>>>> @@ -806,22 +820,11 @@ dr_analyze_innermost (struct data_refere

>>>>

>>>>    if (in_loop)

>>> A nit.  No need to use bool variable now?  We can check loop != NULL instead.

>>

>> As the comment says, we also want to do BB analysis if the loop is the

>> outermost dummy loop, so in_loop remains set to:

>>

>>   bool in_loop = (loop && loop->num);

>>

>> I think it's worth keeping that abstraction.

> Sorry if I misunderstand this.   "loop != NULL" is different to

> in_loop only when loop is the outermost dummy loop.  Which means nest

> != NULL based on change:

> -  dr_analyze_innermost (dr, nest);

> +  dr_analyze_innermost (dr, nest != NULL ? loop : NULL);

>

> I wonder if it's possible for caller to pass non-NULL nest when

> create_data_ref for a statement in outermost dummy loop?  Or should

> it?


Yeah, it (intentionally) doesn't change whether in_loop is true for
calls via create_data_ref.  But the interface is useful for direct
calls to dr_analyze_innermost.  Currently that's just predcom,
but the later patch adds another in the vectoriser.

> Also noticed that in vect_analyze_data_refs, there is below code:

>

>       /* If target supports vector gather loads or scatter stores, or if

>          this might be a SIMD lane access, see if they can't be used.  */

>       if (is_a <loop_vec_info> (vinfo)

>           && (maybe_gather || maybe_scatter || maybe_simd_lane_access)

>           && !nested_in_vect_loop_p (loop, stmt))

>         {

>           struct data_reference *newdr

>         = create_data_ref (NULL, loop_containing_stmt (stmt),

>                    DR_REF (dr), stmt, maybe_scatter ? false : true);

>

> we have nest==NULL and loop is the inner loop, but after change NULL

> is passed to dr_analyze_innermost because of:

>

> -  dr_analyze_innermost (dr, nest);

> +  dr_analyze_innermost (dr, nest != NULL ? loop : NULL);

>

> Is this the reason that bb-slp-pr65935.c gets vectorized after change?

>  simple_iv should return {&s->phase[dir], 0} successfully though.


No, bb-slp-pr65935.c is testing BB SLP, so we wouldn't have a
loop_vec_info.

> Question is what would happen if simple_iv succeeds with non-ZERO step

> when called with nest==NULL?  The patch skips simple_iv and forces

> ZERO step?


Yeah, I mentioned that in the covering note:

  The handling seemed strange though.  If the DR was part of a loop,
  we still tried to express the base and offset values as IVs, potentially
  giving a nonzero step.  If that failed for any reason, we'd revert to
  using the original base and offset, just as we would if we hadn't asked
  for an IV in the first place.

And like I say, the patch that introduced that behaviour didn't seem
to rely on it.

Thanks,
Richard
Bin.Cheng June 28, 2017, 5:16 p.m. UTC | #5
On Wed, Jun 28, 2017 at 5:56 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> "Bin.Cheng" <amker.cheng@gmail.com> writes:

>> On Wed, Jun 28, 2017 at 3:04 PM, Richard Sandiford

>> <richard.sandiford@linaro.org> wrote:

>>> "Bin.Cheng" <amker.cheng@gmail.com> writes:

>>>> On Wed, Jun 28, 2017 at 2:36 PM, Richard Sandiford

>>>> <richard.sandiford@linaro.org> wrote:

>>>>> Index: gcc/tree-data-ref.c

>>>>> ===================================================================

>>>>> --- gcc/tree-data-ref.c 2017-06-28 14:33:41.294720044 +0100

>>>>> +++ gcc/tree-data-ref.c 2017-06-28 14:35:30.475155670 +0100

>>>>> @@ -749,15 +749,29 @@ canonicalize_base_object_address (tree a

>>>>>    return build_fold_addr_expr (TREE_OPERAND (addr, 0));

>>>>>  }

>>>>>

>>>>> -/* Analyzes the behavior of the memory reference DR in the innermost loop or

>>>>> -   basic block that contains it.  Returns true if analysis succeed or false

>>>>> -   otherwise.  */

>>>>> +/* Analyze the behavior of memory reference DR.  There are two modes:

>>>>> +

>>>>> +   - BB analysis.  In this case we simply split the address into base,

>>>>> +     init and offset components, without reference to any containing loop.

>>>>> +     The resulting base and offset are general expressions and they can

>>>>> +     vary arbitrarily from one iteration of the containing loop to the next.

>>>>> +     The step is always zero.

>>>>> +

>>>>> +   - loop analysis.  In this case we analyze the reference both wrt LOOP

>>>>> +     and on the basis that the reference occurs (is "used") in LOOP;

>>>>> +     see the comment above analyze_scalar_evolution_in_loop for more

>>>>> +     information about this distinction.  The base, init, offset and

>>>>> +     step fields are all invariant in LOOP.

>>>>> +

>>>>> +   Perform BB analysis if LOOP is null, or if LOOP is the function's

>>>>> +   dummy outermost loop.  In other cases perform loop analysis.

>>>>> +

>>>>> +   Return true if the analysis succeeded and store the results in DR if so.

>>>>> +   BB analysis can only fail for bitfield or reversed-storage accesses.  */

>>>>>

>>>>>  bool

>>>>> -dr_analyze_innermost (struct data_reference *dr, struct loop *nest)

>>>>> +dr_analyze_innermost (struct data_reference *dr, struct loop *loop)

>>>>>  {

>>>>> -  gimple *stmt = DR_STMT (dr);

>>>>> -  struct loop *loop = loop_containing_stmt (stmt);

>>>>>    tree ref = DR_REF (dr);

>>>>>    HOST_WIDE_INT pbitsize, pbitpos;

>>>>>    tree base, poffset;

>>>>> @@ -806,22 +820,11 @@ dr_analyze_innermost (struct data_refere

>>>>>

>>>>>    if (in_loop)

>>>> A nit.  No need to use bool variable now?  We can check loop != NULL instead.

>>>

>>> As the comment says, we also want to do BB analysis if the loop is the

>>> outermost dummy loop, so in_loop remains set to:

>>>

>>>   bool in_loop = (loop && loop->num);

>>>

>>> I think it's worth keeping that abstraction.

>> Sorry if I misunderstand this.   "loop != NULL" is different to

>> in_loop only when loop is the outermost dummy loop.  Which means nest

>> != NULL based on change:

>> -  dr_analyze_innermost (dr, nest);

>> +  dr_analyze_innermost (dr, nest != NULL ? loop : NULL);

>>

>> I wonder if it's possible for caller to pass non-NULL nest when

>> create_data_ref for a statement in outermost dummy loop?  Or should

>> it?

>

> Yeah, it (intentionally) doesn't change whether in_loop is true for

> calls via create_data_ref.  But the interface is useful for direct

> calls to dr_analyze_innermost.  Currently that's just predcom,

> but the later patch adds another in the vectoriser.

>

>> Also noticed that in vect_analyze_data_refs, there is below code:

>>

>>       /* If target supports vector gather loads or scatter stores, or if

>>          this might be a SIMD lane access, see if they can't be used.  */

>>       if (is_a <loop_vec_info> (vinfo)

>>           && (maybe_gather || maybe_scatter || maybe_simd_lane_access)

>>           && !nested_in_vect_loop_p (loop, stmt))

>>         {

>>           struct data_reference *newdr

>>         = create_data_ref (NULL, loop_containing_stmt (stmt),

>>                    DR_REF (dr), stmt, maybe_scatter ? false : true);

>>

>> we have nest==NULL and loop is the inner loop, but after change NULL

>> is passed to dr_analyze_innermost because of:

>>

>> -  dr_analyze_innermost (dr, nest);

>> +  dr_analyze_innermost (dr, nest != NULL ? loop : NULL);

>>

>> Is this the reason that bb-slp-pr65935.c gets vectorized after change?

>>  simple_iv should return {&s->phase[dir], 0} successfully though.

>

> No, bb-slp-pr65935.c is testing BB SLP, so we wouldn't have a

> loop_vec_info.

>

>> Question is what would happen if simple_iv succeeds with non-ZERO step

>> when called with nest==NULL?  The patch skips simple_iv and forces

>> ZERO step?

>

> Yeah, I mentioned that in the covering note:

>

>   The handling seemed strange though.  If the DR was part of a loop,

>   we still tried to express the base and offset values as IVs, potentially

>   giving a nonzero step.  If that failed for any reason, we'd revert to

>   using the original base and offset, just as we would if we hadn't asked

>   for an IV in the first place.

Hmm, it says "If that failed for any reason, .... revert to using the
original base and offset, just as we would if we hadn't asked for an
IV in the first place", but question is what would happen if simple_iv
succeeds with nest==NULL.  After change, the successful simple_iv is
bypassed.  It's likely this case can't happen in reality.
Also the patch is a simplification for me and I don't have any objection here.

Thanks,
bin
>

> And like I say, the patch that introduced that behaviour didn't seem

> to rely on it.

>

> Thanks,

> Richard
Richard Sandiford June 28, 2017, 7:06 p.m. UTC | #6
"Bin.Cheng" <amker.cheng@gmail.com> writes:
> On Wed, Jun 28, 2017 at 5:56 PM, Richard Sandiford

> <richard.sandiford@linaro.org> wrote:

>> "Bin.Cheng" <amker.cheng@gmail.com> writes:

>>> Question is what would happen if simple_iv succeeds with non-ZERO step

>>> when called with nest==NULL?  The patch skips simple_iv and forces

>>> ZERO step?

>>

>> Yeah, I mentioned that in the covering note:

>>

>>   The handling seemed strange though.  If the DR was part of a loop,

>>   we still tried to express the base and offset values as IVs, potentially

>>   giving a nonzero step.  If that failed for any reason, we'd revert to

>>   using the original base and offset, just as we would if we hadn't asked

>>   for an IV in the first place.

> Hmm, it says "If that failed for any reason, .... revert to using the

> original base and offset, just as we would if we hadn't asked for an

> IV in the first place", but question is what would happen if simple_iv

> succeeds with nest==NULL.  After change, the successful simple_iv is

> bypassed.  It's likely this case can't happen in reality.


I suspect we're really in violent agreement here :-) but the reason
I was surprised at the question was that you seemed to be treating
the nest==NULL && simple_iv/nonzero-step combination as a corner case
that hadn't really been addressed, whereas removing that combination is
the main point of the patch.  The reason for removing it isn't that it
can't happen (bb-slp-pr65935.c proves that it did, to detrimental effect).
The reason is that IMO it isn't useful.  I don't think it makes sense to
set the step to a nonzero value for "BB analysis".

E.g., to ask a few rhetorical questions: why should we do that only for
"simple" IVs when analysis will still succeed for complex IVs and leave
the step zero?  Why should the simple_iv case be restricted to constant
steps for nest==NULL, rather than allow steps that are invariant in the
containing loop, like nest!=NULL does?  Why should the IV analysis be
based on the immediately containing loop rather than some other loop,
if the caller has asked for BB analysis rather than analysis wrt a
particular loop?

The reason I went back to the revision that introduced this behavior
was to check whether it really did need the nest==NULL && simple_iv/
nonzero-step combination.  And AFAICT it didn't.  I think it was just
accidental.

> Also the patch is a simplification for me and I don't have any objection here.


Thanks,
Richard
Bin.Cheng June 28, 2017, 9:06 p.m. UTC | #7
On Wed, Jun 28, 2017 at 8:06 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> "Bin.Cheng" <amker.cheng@gmail.com> writes:

>> On Wed, Jun 28, 2017 at 5:56 PM, Richard Sandiford

>> <richard.sandiford@linaro.org> wrote:

>>> "Bin.Cheng" <amker.cheng@gmail.com> writes:

>>>> Question is what would happen if simple_iv succeeds with non-ZERO step

>>>> when called with nest==NULL?  The patch skips simple_iv and forces

>>>> ZERO step?

>>>

>>> Yeah, I mentioned that in the covering note:

>>>

>>>   The handling seemed strange though.  If the DR was part of a loop,

>>>   we still tried to express the base and offset values as IVs, potentially

>>>   giving a nonzero step.  If that failed for any reason, we'd revert to

>>>   using the original base and offset, just as we would if we hadn't asked

>>>   for an IV in the first place.

>> Hmm, it says "If that failed for any reason, .... revert to using the

>> original base and offset, just as we would if we hadn't asked for an

>> IV in the first place", but question is what would happen if simple_iv

>> succeeds with nest==NULL.  After change, the successful simple_iv is

>> bypassed.  It's likely this case can't happen in reality.

>

> I suspect we're really in violent agreement here :-) but the reason

> I was surprised at the question was that you seemed to be treating

> the nest==NULL && simple_iv/nonzero-step combination as a corner case

> that hadn't really been addressed, whereas removing that combination is

> the main point of the patch.  The reason for removing it isn't that it

Yes, this is where I misunderstood the patch.  Thanks very much for
your patient explanation.  Now I can see it IS a nice simplification.

Thanks,
Bin
> can't happen (bb-slp-pr65935.c proves that it did, to detrimental effect).

> The reason is that IMO it isn't useful.  I don't think it makes sense to

> set the step to a nonzero value for "BB analysis".

>

> Thanks,

> Richard
Richard Biener June 29, 2017, 9:40 a.m. UTC | #8
On Wed, Jun 28, 2017 at 3:36 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> dr_analyze_innermost had a "struct loop *nest" parameter that acted

> like a boolean.  This was added in r179161, with the idea that a

> null nest selected BB-level analysis rather than loop analysis.

>

> The handling seemed strange though.  If the DR was part of a loop,

> we still tried to express the base and offset values as IVs, potentially

> giving a nonzero step.  If that failed for any reason, we'd revert to

> using the original base and offset, just as we would if we hadn't asked

> for an IV in the first place.

>

> It seems more natural to use the !in_loop handling whenever nest is null

> and always set the step to zero.  This actually enables one more SLP

> opportunity in bb-slp-pr65935.c.

>

> I checked out r179161 and tried the patch there.  The test case added

> in that revision still passes, so I don't think there was any particular

> need to check simple_iv.

>

> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?


I have a few additional comments for consideration.  I remembered code
in vect_compute_data_ref_alignment explicitely looking at DR_STEP in
BB mode:

  /* Similarly we can only use base and misalignment information relative to
     an innermost loop if the misalignment stays the same throughout the
     execution of the loop.  As above, this is the case if the stride of
     the dataref evenly divides by the vector size.  */
  else
    {
      tree step = DR_STEP (dr);
      unsigned vf = loop ? LOOP_VINFO_VECT_FACTOR (loop_vinfo) : 1;

      if (tree_fits_shwi_p (step)
          && ((tree_to_shwi (step) * vf)
              % GET_MODE_SIZE (TYPE_MODE (vectype)) != 0))
        {
          if (dump_enabled_p ())
            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                             "step doesn't divide the vector-size.\n");
          misalign = NULL_TREE;
        }
    }

so I guess with the change we may end up with worse (misalign is
always NULL?) or bogus (with DR_STEP == 0 the test is always false)
alignment analysis results for BB vectorization?  I guess worse
or we had the issue before if DR_STEP was just not analyzable.

The testcase that now gets vectorized tried to use the asm to prevent
vectorization but that of course doesn't work for BB vectorization.
Using volatile stores might.  I guess the chance that we miscompile
both loops in a way that we still pass the testcase (even the compare
loop could be BB vectorized I guess) is unlikely.

The patch is ok but I guess we need to keep an eye on BB vectorization
results for targets w/o unaligned vector loads/stores for the above
alignment issue?

Thanks,
Richard.

> Richard

>

>

> 2017-06-28  Richard Sandiford  <richard.sandiford@linaro.org>

>

> gcc/

>         * tree-data-ref.c (dr_analyze_innermost): Replace the "nest"

>         parameter with a "loop" parameter and use it instead of the

>         loop containing DR_STMT.  Don't check simple_iv when doing

>         BB analysis.  Describe the two analysis modes in the comment.

>

> gcc/testsuite/

>         * gcc.dg/vect/bb-slp-pr65935.c: Expect SLP to be used in main

>         as well.

>

> Index: gcc/tree-data-ref.c

> ===================================================================

> --- gcc/tree-data-ref.c 2017-06-28 14:33:41.294720044 +0100

> +++ gcc/tree-data-ref.c 2017-06-28 14:35:30.475155670 +0100

> @@ -749,15 +749,29 @@ canonicalize_base_object_address (tree a

>    return build_fold_addr_expr (TREE_OPERAND (addr, 0));

>  }

>

> -/* Analyzes the behavior of the memory reference DR in the innermost loop or

> -   basic block that contains it.  Returns true if analysis succeed or false

> -   otherwise.  */

> +/* Analyze the behavior of memory reference DR.  There are two modes:

> +

> +   - BB analysis.  In this case we simply split the address into base,

> +     init and offset components, without reference to any containing loop.

> +     The resulting base and offset are general expressions and they can

> +     vary arbitrarily from one iteration of the containing loop to the next.

> +     The step is always zero.

> +

> +   - loop analysis.  In this case we analyze the reference both wrt LOOP

> +     and on the basis that the reference occurs (is "used") in LOOP;

> +     see the comment above analyze_scalar_evolution_in_loop for more

> +     information about this distinction.  The base, init, offset and

> +     step fields are all invariant in LOOP.

> +

> +   Perform BB analysis if LOOP is null, or if LOOP is the function's

> +   dummy outermost loop.  In other cases perform loop analysis.

> +

> +   Return true if the analysis succeeded and store the results in DR if so.

> +   BB analysis can only fail for bitfield or reversed-storage accesses.  */

>

>  bool

> -dr_analyze_innermost (struct data_reference *dr, struct loop *nest)

> +dr_analyze_innermost (struct data_reference *dr, struct loop *loop)

>  {

> -  gimple *stmt = DR_STMT (dr);

> -  struct loop *loop = loop_containing_stmt (stmt);

>    tree ref = DR_REF (dr);

>    HOST_WIDE_INT pbitsize, pbitpos;

>    tree base, poffset;

> @@ -806,22 +820,11 @@ dr_analyze_innermost (struct data_refere

>

>    if (in_loop)

>      {

> -      if (!simple_iv (loop, loop_containing_stmt (stmt), base, &base_iv,

> -                      nest ? true : false))

> +      if (!simple_iv (loop, loop, base, &base_iv, true))

>          {

> -          if (nest)

> -            {

> -              if (dump_file && (dump_flags & TDF_DETAILS))

> -                fprintf (dump_file, "failed: evolution of base is not"

> -                                    " affine.\n");

> -              return false;

> -            }

> -          else

> -            {

> -              base_iv.base = base;

> -              base_iv.step = ssize_int (0);

> -              base_iv.no_overflow = true;

> -            }

> +         if (dump_file && (dump_flags & TDF_DETAILS))

> +           fprintf (dump_file, "failed: evolution of base is not affine.\n");

> +         return false;

>          }

>      }

>    else

> @@ -843,22 +846,11 @@ dr_analyze_innermost (struct data_refere

>            offset_iv.base = poffset;

>            offset_iv.step = ssize_int (0);

>          }

> -      else if (!simple_iv (loop, loop_containing_stmt (stmt),

> -                           poffset, &offset_iv,

> -                          nest ? true : false))

> +      else if (!simple_iv (loop, loop, poffset, &offset_iv, true))

>          {

> -          if (nest)

> -            {

> -              if (dump_file && (dump_flags & TDF_DETAILS))

> -                fprintf (dump_file, "failed: evolution of offset is not"

> -                                    " affine.\n");

> -              return false;

> -            }

> -          else

> -            {

> -              offset_iv.base = poffset;

> -              offset_iv.step = ssize_int (0);

> -            }

> +         if (dump_file && (dump_flags & TDF_DETAILS))

> +           fprintf (dump_file, "failed: evolution of offset is not affine.\n");

> +         return false;

>          }

>      }

>

> @@ -1077,7 +1069,7 @@ create_data_ref (loop_p nest, loop_p loo

>    DR_REF (dr) = memref;

>    DR_IS_READ (dr) = is_read;

>

> -  dr_analyze_innermost (dr, nest);

> +  dr_analyze_innermost (dr, nest != NULL ? loop : NULL);

>    dr_analyze_indices (dr, nest, loop);

>    dr_analyze_alias (dr);

>

> Index: gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c

> ===================================================================

> --- gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c  2017-06-28 14:33:41.294720044 +0100

> +++ gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c  2017-06-28 14:34:09.357523025 +0100

> @@ -57,4 +57,6 @@ int main()

>    return 0;

>  }

>

> -/* { dg-final { scan-tree-dump-times "basic block vectorized" 1 "slp1" } } */

> +/* We should also be able to use 2-lane SLP to initialize the real and

> +   imaginary components in the first loop of main.  */

> +/* { dg-final { scan-tree-dump-times "basic block vectorized" 2 "slp1" } } */
Richard Sandiford June 29, 2017, 10:32 a.m. UTC | #9
Richard Biener <richard.guenther@gmail.com> writes:
> On Wed, Jun 28, 2017 at 3:36 PM, Richard Sandiford

> <richard.sandiford@linaro.org> wrote:

>> dr_analyze_innermost had a "struct loop *nest" parameter that acted

>> like a boolean.  This was added in r179161, with the idea that a

>> null nest selected BB-level analysis rather than loop analysis.

>>

>> The handling seemed strange though.  If the DR was part of a loop,

>> we still tried to express the base and offset values as IVs, potentially

>> giving a nonzero step.  If that failed for any reason, we'd revert to

>> using the original base and offset, just as we would if we hadn't asked

>> for an IV in the first place.

>>

>> It seems more natural to use the !in_loop handling whenever nest is null

>> and always set the step to zero.  This actually enables one more SLP

>> opportunity in bb-slp-pr65935.c.

>>

>> I checked out r179161 and tried the patch there.  The test case added

>> in that revision still passes, so I don't think there was any particular

>> need to check simple_iv.

>>

>> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

>

> I have a few additional comments for consideration.  I remembered code

> in vect_compute_data_ref_alignment explicitely looking at DR_STEP in

> BB mode:

>

>   /* Similarly we can only use base and misalignment information relative to

>      an innermost loop if the misalignment stays the same throughout the

>      execution of the loop.  As above, this is the case if the stride of

>      the dataref evenly divides by the vector size.  */

>   else

>     {

>       tree step = DR_STEP (dr);

>       unsigned vf = loop ? LOOP_VINFO_VECT_FACTOR (loop_vinfo) : 1;

>

>       if (tree_fits_shwi_p (step)

>           && ((tree_to_shwi (step) * vf)

>               % GET_MODE_SIZE (TYPE_MODE (vectype)) != 0))

>         {

>           if (dump_enabled_p ())

>             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,

>                              "step doesn't divide the vector-size.\n");

>           misalign = NULL_TREE;

>         }

>     }

>

> so I guess with the change we may end up with worse (misalign is

> always NULL?) or bogus (with DR_STEP == 0 the test is always false)

> alignment analysis results for BB vectorization?  I guess worse

> or we had the issue before if DR_STEP was just not analyzable.


DR_STEP will always be zero for bb vectorisation, so the results
shouldn't get worse.  But the value that was previously a nonzero
step is now part of the base or offset instead (the choice between
the two being the same as it would be for get_inner_reference).
We still take the alignments of those into account, so I think
we should be safe (at least after DR_BASE_ALIGNMENT).

Like you say, we previously had the same situation for bases
that weren't simple IVs, or for bases that were simple IVs but
had an invariant rather than constant step.

> The testcase that now gets vectorized tried to use the asm to prevent

> vectorization but that of course doesn't work for BB vectorization.

> Using volatile stores might.  I guess the chance that we miscompile

> both loops in a way that we still pass the testcase (even the compare

> loop could be BB vectorized I guess) is unlikely.


Putting an extra volatile asm between the real and imag stores stops
the vectorisation, if you'd prefer that.

> The patch is ok but I guess we need to keep an eye on BB vectorization

> results for targets w/o unaligned vector loads/stores for the above

> alignment issue?


Thanks,
Richard
Richard Biener June 29, 2017, 10:40 a.m. UTC | #10
On Thu, Jun 29, 2017 at 12:32 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Richard Biener <richard.guenther@gmail.com> writes:

>> On Wed, Jun 28, 2017 at 3:36 PM, Richard Sandiford

>> <richard.sandiford@linaro.org> wrote:

>>> dr_analyze_innermost had a "struct loop *nest" parameter that acted

>>> like a boolean.  This was added in r179161, with the idea that a

>>> null nest selected BB-level analysis rather than loop analysis.

>>>

>>> The handling seemed strange though.  If the DR was part of a loop,

>>> we still tried to express the base and offset values as IVs, potentially

>>> giving a nonzero step.  If that failed for any reason, we'd revert to

>>> using the original base and offset, just as we would if we hadn't asked

>>> for an IV in the first place.

>>>

>>> It seems more natural to use the !in_loop handling whenever nest is null

>>> and always set the step to zero.  This actually enables one more SLP

>>> opportunity in bb-slp-pr65935.c.

>>>

>>> I checked out r179161 and tried the patch there.  The test case added

>>> in that revision still passes, so I don't think there was any particular

>>> need to check simple_iv.

>>>

>>> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

>>

>> I have a few additional comments for consideration.  I remembered code

>> in vect_compute_data_ref_alignment explicitely looking at DR_STEP in

>> BB mode:

>>

>>   /* Similarly we can only use base and misalignment information relative to

>>      an innermost loop if the misalignment stays the same throughout the

>>      execution of the loop.  As above, this is the case if the stride of

>>      the dataref evenly divides by the vector size.  */

>>   else

>>     {

>>       tree step = DR_STEP (dr);

>>       unsigned vf = loop ? LOOP_VINFO_VECT_FACTOR (loop_vinfo) : 1;

>>

>>       if (tree_fits_shwi_p (step)

>>           && ((tree_to_shwi (step) * vf)

>>               % GET_MODE_SIZE (TYPE_MODE (vectype)) != 0))

>>         {

>>           if (dump_enabled_p ())

>>             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,

>>                              "step doesn't divide the vector-size.\n");

>>           misalign = NULL_TREE;

>>         }

>>     }

>>

>> so I guess with the change we may end up with worse (misalign is

>> always NULL?) or bogus (with DR_STEP == 0 the test is always false)

>> alignment analysis results for BB vectorization?  I guess worse

>> or we had the issue before if DR_STEP was just not analyzable.

>

> DR_STEP will always be zero for bb vectorisation, so the results

> shouldn't get worse.  But the value that was previously a nonzero

> step is now part of the base or offset instead (the choice between

> the two being the same as it would be for get_inner_reference).

> We still take the alignments of those into account, so I think

> we should be safe (at least after DR_BASE_ALIGNMENT).


I think for

  DR_ALIGNED_TO (dr) = size_int (highest_pow2_factor (offset_iv.base));

we have a harder time extracting alignment from sth like (sizetype)i * 4 + 8
than if the base is zero.  But yes, in principle we can do that and hopefully
SCEV analysis is not much more powerful than highest_pow2_factor.

> Like you say, we previously had the same situation for bases

> that weren't simple IVs, or for bases that were simple IVs but

> had an invariant rather than constant step.


Yes.

>> The testcase that now gets vectorized tried to use the asm to prevent

>> vectorization but that of course doesn't work for BB vectorization.

>> Using volatile stores might.  I guess the chance that we miscompile

>> both loops in a way that we still pass the testcase (even the compare

>> loop could be BB vectorized I guess) is unlikely.

>

> Putting an extra volatile asm between the real and imag stores stops

> the vectorisation, if you'd prefer that.


No, your patch is fine.

>> The patch is ok but I guess we need to keep an eye on BB vectorization

>> results for targets w/o unaligned vector loads/stores for the above

>> alignment issue?


Still keep an eye on SCEV vs. highest_pow2_factor from testsuite fallout.

Patch still ok,
Richard.

> Thanks,

> Richard
diff mbox

Patch

Index: gcc/tree-data-ref.c
===================================================================
--- gcc/tree-data-ref.c	2017-06-28 14:33:41.294720044 +0100
+++ gcc/tree-data-ref.c	2017-06-28 14:35:30.475155670 +0100
@@ -749,15 +749,29 @@  canonicalize_base_object_address (tree a
   return build_fold_addr_expr (TREE_OPERAND (addr, 0));
 }
 
-/* Analyzes the behavior of the memory reference DR in the innermost loop or
-   basic block that contains it.  Returns true if analysis succeed or false
-   otherwise.  */
+/* Analyze the behavior of memory reference DR.  There are two modes:
+
+   - BB analysis.  In this case we simply split the address into base,
+     init and offset components, without reference to any containing loop.
+     The resulting base and offset are general expressions and they can
+     vary arbitrarily from one iteration of the containing loop to the next.
+     The step is always zero.
+
+   - loop analysis.  In this case we analyze the reference both wrt LOOP
+     and on the basis that the reference occurs (is "used") in LOOP;
+     see the comment above analyze_scalar_evolution_in_loop for more
+     information about this distinction.  The base, init, offset and
+     step fields are all invariant in LOOP.
+
+   Perform BB analysis if LOOP is null, or if LOOP is the function's
+   dummy outermost loop.  In other cases perform loop analysis.
+
+   Return true if the analysis succeeded and store the results in DR if so.
+   BB analysis can only fail for bitfield or reversed-storage accesses.  */
 
 bool
-dr_analyze_innermost (struct data_reference *dr, struct loop *nest)
+dr_analyze_innermost (struct data_reference *dr, struct loop *loop)
 {
-  gimple *stmt = DR_STMT (dr);
-  struct loop *loop = loop_containing_stmt (stmt);
   tree ref = DR_REF (dr);
   HOST_WIDE_INT pbitsize, pbitpos;
   tree base, poffset;
@@ -806,22 +820,11 @@  dr_analyze_innermost (struct data_refere
 
   if (in_loop)
     {
-      if (!simple_iv (loop, loop_containing_stmt (stmt), base, &base_iv,
-                      nest ? true : false))
+      if (!simple_iv (loop, loop, base, &base_iv, true))
         {
-          if (nest)
-            {
-              if (dump_file && (dump_flags & TDF_DETAILS))
-                fprintf (dump_file, "failed: evolution of base is not"
-                                    " affine.\n");
-              return false;
-            }
-          else
-            {
-              base_iv.base = base;
-              base_iv.step = ssize_int (0);
-              base_iv.no_overflow = true;
-            }
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	    fprintf (dump_file, "failed: evolution of base is not affine.\n");
+	  return false;
         }
     }
   else
@@ -843,22 +846,11 @@  dr_analyze_innermost (struct data_refere
           offset_iv.base = poffset;
           offset_iv.step = ssize_int (0);
         }
-      else if (!simple_iv (loop, loop_containing_stmt (stmt),
-                           poffset, &offset_iv,
-			   nest ? true : false))
+      else if (!simple_iv (loop, loop, poffset, &offset_iv, true))
         {
-          if (nest)
-            {
-              if (dump_file && (dump_flags & TDF_DETAILS))
-                fprintf (dump_file, "failed: evolution of offset is not"
-                                    " affine.\n");
-              return false;
-            }
-          else
-            {
-              offset_iv.base = poffset;
-              offset_iv.step = ssize_int (0);
-            }
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	    fprintf (dump_file, "failed: evolution of offset is not affine.\n");
+	  return false;
         }
     }
 
@@ -1077,7 +1069,7 @@  create_data_ref (loop_p nest, loop_p loo
   DR_REF (dr) = memref;
   DR_IS_READ (dr) = is_read;
 
-  dr_analyze_innermost (dr, nest);
+  dr_analyze_innermost (dr, nest != NULL ? loop : NULL);
   dr_analyze_indices (dr, nest, loop);
   dr_analyze_alias (dr);
 
Index: gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c	2017-06-28 14:33:41.294720044 +0100
+++ gcc/testsuite/gcc.dg/vect/bb-slp-pr65935.c	2017-06-28 14:34:09.357523025 +0100
@@ -57,4 +57,6 @@  int main()
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "basic block vectorized" 1 "slp1" } } */
+/* We should also be able to use 2-lane SLP to initialize the real and
+   imaginary components in the first loop of main.  */
+/* { dg-final { scan-tree-dump-times "basic block vectorized" 2 "slp1" } } */