[RFC] Improve tree DSE

Message ID CAELXzTOSDWemBGrLbJqu+1fUoWWJ9bUJCvzT8P8HPGci67HQEg@mail.gmail.com
State New
Series [RFC] Improve tree DSE

Commit Message

Kugan Vivekanandarajah April 10, 2018, 12:52 a.m. UTC
I would like to queue this patch for stage1 review.

In DSE, in dse_classify_store, as soon as we see a PHI use statement
that is part of a loop, we immediately give up.

As far as I understand, this can be improved.  The attached patch tries
to walk the uses of the PHI statement (by recursively calling
dse_classify_store) and then makes sure the obtained store is indeed
redundant.
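
For illustration, a minimal sketch of the pattern this is aimed at (it
is essentially the new ssa-dse-31.c test): the memset inside the loop
is made dead by the larger memset that follows it, but today the walk
stops at the loop PHI on the virtual operand:

  #define SIZE 4

  int main ()
  {
    static float a[SIZE];
    int i;
    for (i = 0; i < SIZE; i++)
      __builtin_memset ((void *) a, 0, sizeof (float) * 3);  /* dead store */
    __builtin_memset ((void *) a, 0, sizeof (float) * SIZE); /* kills it */
    return 0;
  }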

This partly addresses one of the test cases reported in PR44612, but
that PR is also about other issues that are not handled by this patch.

Bootstrapped and regression tested on aarch64-linux-gnu with no new regressions.

Is this OK for next stage1?

Thanks,
Kugan


gcc/ChangeLog:

2018-04-10  Kugan Vivekanandarajah  <kuganv@linaro.org>

    * tree-ssa-dse.c (dse_classify_store): Handle recursive PHI.
    (dse_dom_walker::dse_optimize_stmt): Update call dse_classify_store.

gcc/testsuite/ChangeLog:

2018-04-10  Kugan Vivekanandarajah  <kuganv@linaro.org>

    * gcc.dg/tree-ssa/ssa-dse-31.c: New test.
    * gcc.dg/tree-ssa/ssa-dse-32.c: New test.

Comments

Jeff Law May 1, 2018, 3:43 p.m. UTC | #1
On 04/09/2018 06:52 PM, Kugan Vivekanandarajah wrote:
> I would like to queue this patch for stage1 review.
>
> In DSE, while in dse_classify_store, as soon as we see a PHI use
> statement that is part of the loop, we are immediately giving up.
>
> As far as I understand, this can be improved. Attached patch is trying
> to walk the uses of the PHI statement (by recursively calling
> dse_classify_store) and then making sure the obtained store is indeed
> redundant.
>
> This is partly as reported in one of the testcase from PR44612. But
> this PR is about other issues that is not handled in this patch.
>
> Bootstrapped and regression tested on aarch64-linux-gnu with no new regressions.
>
> Is this OK for next stage1?
>
> Thanks,
> Kugan
>
>
> gcc/ChangeLog:
>
> 2018-04-10  Kugan Vivekanandarajah  <kuganv@linaro.org>
>
>     * tree-ssa-dse.c (dse_classify_store): Handle recursive PHI.
>     (dse_dom_walker::dse_optimize_stmt): Update call dse_classify_store.
>
> gcc/testsuite/ChangeLog:
>
> 2018-04-10  Kugan Vivekanandarajah  <kuganv@linaro.org>
>
>     * gcc.dg/tree-ssa/ssa-dse-31.c: New test.
>     * gcc.dg/tree-ssa/ssa-dse-32.c: New test.
>
>
> 0001-improve-dse.patch
>
>
> From 5751eaff3d1c263e8631d5a07e43fecaaa0e9d26 Mon Sep 17 00:00:00 2001
> From: Kugan Vivekanandarajah <kugan.vivekanandarajah@linaro.org>
> Date: Tue, 10 Apr 2018 09:49:10 +1000
> Subject: [PATCH] improve dse
>
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-31.c | 16 ++++++++++
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-32.c | 23 ++++++++++++++
>  gcc/tree-ssa-dse.c                         | 51 ++++++++++++++++++++++++------
>  3 files changed, 81 insertions(+), 9 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-31.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-32.c
>
> diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c
> index 9220fea..3513fda 100644
> --- a/gcc/tree-ssa-dse.c
> +++ b/gcc/tree-ssa-dse.c
> @@ -521,11 +521,11 @@ live_bytes_read (ao_ref use_ref, ao_ref *ref, sbitmap live)
>     Return TRUE if the above conditions are met, otherwise FALSE.  */
>
>  static dse_store_status
> -dse_classify_store (ao_ref *ref, gimple *stmt, gimple **use_stmt,
> -		    bool byte_tracking_enabled, sbitmap live_bytes)
> +dse_classify_store (ao_ref *ref, gimple *stmt_outer, gimple *stmt,
> +		    gimple **use_stmt, bool byte_tracking_enabled,
> +		    sbitmap live_bytes, unsigned cnt = 0)
>  {
>    gimple *temp;
> -  unsigned cnt = 0;
>
>    *use_stmt = NULL;
>
> @@ -556,9 +556,11 @@ dse_classify_store (ao_ref *ref, gimple *stmt, gimple **use_stmt,
>  	{
>  	  cnt++;
>
> +	  if (use_stmt == stmt_outer)
> +	    continue;

So is this really safe?  This seems to catch the case where the
recursive call stumbles onto the same statement we're already
processing, i.e., we've followed a loop backedge.

ISTM that further analysis is needed here -- don't you have to make
sure that USE_STMT does not read from REF?  It could be a memmove call,
for example.
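
To make that concrete, a hypothetical sketch of the concern: each
iteration's memmove reads bytes written by the previous iteration, so
the earlier store is not dead even though the walk comes back around
the loop to the very statement being classified:

  void g (char *p)
  {
    for (int i = 0; i < 4; i++)
      __builtin_memmove (p, p + 1, 15);  /* writes p[0..14], reads p[1..15] */
  }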

I'm also struggling a little bit to see much value in handling this
case.  In the included testcases we've got a memset in a loop where the
args do not vary across the loop iterations and there are no reads from
the memory location within the loop.  How realistic is that?


If you're looking to improve DSE, the cases in PR63185, PR64380 and
PR79958 may be interesting.
Kugan Vivekanandarajah May 1, 2018, 11:16 p.m. UTC | #2
Hi Jeff,

Thanks for the review.

On 2 May 2018 at 01:43, Jeff Law <law@redhat.com> wrote:
> On 04/09/2018 06:52 PM, Kugan Vivekanandarajah wrote:
>> I would like to queue this patch for stage1 review.
>>
>> In DSE, while in dse_classify_store, as soon as we see a PHI use
>> statement that is part of the loop, we are immediately giving up.
>>
>> As far as I understand, this can be improved. Attached patch is trying
>> to walk the uses of the PHI statement (by recursively calling
>> dse_classify_store) and then making sure the obtained store is indeed
>> redundant.
>>
>> This is partly as reported in one of the testcase from PR44612. But
>> this PR is about other issues that is not handled in this patch.
>>
>> Bootstrapped and regression tested on aarch64-linux-gnu with no new regressions.
>>
>> Is this OK for next stage1?
>>
>> Thanks,
>> Kugan
>>
>>
>> gcc/ChangeLog:
>>
>> 2018-04-10  Kugan Vivekanandarajah  <kuganv@linaro.org>
>>
>>     * tree-ssa-dse.c (dse_classify_store): Handle recursive PHI.
>>     (dse_dom_walker::dse_optimize_stmt): Update call dse_classify_store.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2018-04-10  Kugan Vivekanandarajah  <kuganv@linaro.org>
>>
>>     * gcc.dg/tree-ssa/ssa-dse-31.c: New test.
>>     * gcc.dg/tree-ssa/ssa-dse-32.c: New test.
>>
>>
>> 0001-improve-dse.patch
>>
>>
>> From 5751eaff3d1c263e8631d5a07e43fecaaa0e9d26 Mon Sep 17 00:00:00 2001
>> From: Kugan Vivekanandarajah <kugan.vivekanandarajah@linaro.org>
>> Date: Tue, 10 Apr 2018 09:49:10 +1000
>> Subject: [PATCH] improve dse
>>
>> ---
>>  gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-31.c | 16 ++++++++++
>>  gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-32.c | 23 ++++++++++++++
>>  gcc/tree-ssa-dse.c                         | 51 ++++++++++++++++++++++++------
>>  3 files changed, 81 insertions(+), 9 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-31.c
>>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-32.c
>>
>
>> diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c
>> index 9220fea..3513fda 100644
>> --- a/gcc/tree-ssa-dse.c
>> +++ b/gcc/tree-ssa-dse.c
>> @@ -521,11 +521,11 @@ live_bytes_read (ao_ref use_ref, ao_ref *ref, sbitmap live)
>>     Return TRUE if the above conditions are met, otherwise FALSE.  */
>>
>>  static dse_store_status
>> -dse_classify_store (ao_ref *ref, gimple *stmt, gimple **use_stmt,
>> -                 bool byte_tracking_enabled, sbitmap live_bytes)
>> +dse_classify_store (ao_ref *ref, gimple *stmt_outer, gimple *stmt,
>> +                 gimple **use_stmt, bool byte_tracking_enabled,
>> +                 sbitmap live_bytes, unsigned cnt = 0)
>>  {
>>    gimple *temp;
>> -  unsigned cnt = 0;
>>
>>    *use_stmt = NULL;
>>
>> @@ -556,9 +556,11 @@ dse_classify_store (ao_ref *ref, gimple *stmt, gimple **use_stmt,
>>       {
>>         cnt++;
>>
>> +       if (use_stmt == stmt_outer)
>> +         continue;
> So is this really safe?  This seems to catch the case where the
> recursive call stumbles onto the same statement we're already
> processing.  ie, we've followed a loop backedge.
>
> ISTM that further analysis here  is needed -- don't you have to make
> sure that USE_STMT does not read from REF?  It could be a memmove call
> for example.

I think you are right. This has to be handled.

>
> I'm also struggling a little bit to see much value in handling this
> case.  In the included testcases we've got a memset in a loop where the
> args do not vary across the loop iterations and there are no reads from
> the memory location within the loop. How realistic is that?


I was looking into another case from an application; it was not handled,
partly due to limitations of alias analysis, and I thought this case
could be handled.  If you think it is not going to happen often in
practice, I agree that it is not worth the trouble.

>
>
> If you're looking to improve DSE, the cases in 63185, 64380 and 79958
> may be interesting.


Thanks for the pointers. I will have a look at them.

Thanks,
Kugan

>
Richard Biener May 2, 2018, 9:27 a.m. UTC | #3
On Tue, Apr 10, 2018 at 2:52 AM, Kugan Vivekanandarajah
<kugan.vivekanandarajah@linaro.org> wrote:
> I would like to queue this patch for stage1 review.
>
> In DSE, while in dse_classify_store, as soon as we see a PHI use
> statement that is part of the loop, we are immediately giving up.
>
> As far as I understand, this can be improved. Attached patch is trying
> to walk the uses of the PHI statement (by recursively calling
> dse_classify_store) and then making sure the obtained store is indeed
> redundant.
>
> This is partly as reported in one of the testcase from PR44612. But
> this PR is about other issues that is not handled in this patch.
>
> Bootstrapped and regression tested on aarch64-linux-gnu with no new regressions.
>
> Is this OK for next stage1?


              if (temp
                  /* Make sure we are not in a loop latch block.  */
                  || gimple_bb (stmt) == gimple_bb (use_stmt)
-                 || dominated_by_p (CDI_DOMINATORS,
-                                    gimple_bb (stmt), gimple_bb (use_stmt))
                  /* We can look through PHIs to regions post-dominating

you are removing part of the latch-block check but not the other.

+                 /* If stmt dominates PHI stmt, follow the PHI stmt.  */
+                 if (!temp)

well, just do this check earlier.  Or rather, it's already done in the
test above.

+                     /* Makesure the use stmt found is post dominated.  */
+                     && dominated_by_p (CDI_POST_DOMINATORS,
+                                        gimple_bb (stmt_outer),
gimple_bb (inner_use_stmt))

I think that this check also covers
gimple_bb (stmt_outer) == gimple_bb (inner_use_stmt), so for that case
you'd need to check stmt dominance.  But better just give up?

Given the simple testcases you add, I wonder if you want a cheaper
implementation, namely to check that, when reaching a loop PHI, the
only aliasing stmt in its use-chain is the use_stmt you reached the
PHI from.  That would avoid this and the tests for the store being
redundant, and would simplify the patch considerably.
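
In IL terms (a rough sketch with made-up SSA names) the situation at
such a loop PHI looks like this; the idea is to look through the PHI
only when the store we came from is the sole aliasing use of the PHI
result:

  <loop header>:
  # .MEM_3 = PHI <.MEM_1(preheader), .MEM_2(latch)>
  ...
  # .MEM_2 = VDEF <.MEM_3>
  __builtin_memset (&a, 0, 12);   <-- the DSE candidate we came from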

Thanks,
Richard.

> Thanks,
> Kugan
>
>
> gcc/ChangeLog:
>
> 2018-04-10  Kugan Vivekanandarajah  <kuganv@linaro.org>
>
>     * tree-ssa-dse.c (dse_classify_store): Handle recursive PHI.
>     (dse_dom_walker::dse_optimize_stmt): Update call dse_classify_store.
>
> gcc/testsuite/ChangeLog:
>
> 2018-04-10  Kugan Vivekanandarajah  <kuganv@linaro.org>
>
>     * gcc.dg/tree-ssa/ssa-dse-31.c: New test.
>     * gcc.dg/tree-ssa/ssa-dse-32.c: New test.
Jeff Law May 2, 2018, 4:17 p.m. UTC | #4
On 05/02/2018 03:27 AM, Richard Biener wrote:
> On Tue, Apr 10, 2018 at 2:52 AM, Kugan Vivekanandarajah
> <kugan.vivekanandarajah@linaro.org> wrote:
>> I would like to queue this patch for stage1 review.
>>
>> In DSE, while in dse_classify_store, as soon as we see a PHI use
>> statement that is part of the loop, we are immediately giving up.
>>
>> As far as I understand, this can be improved. Attached patch is trying
>> to walk the uses of the PHI statement (by recursively calling
>> dse_classify_store) and then making sure the obtained store is indeed
>> redundant.
>>
>> This is partly as reported in one of the testcase from PR44612. But
>> this PR is about other issues that is not handled in this patch.
>>
>> Bootstrapped and regression tested on aarch64-linux-gnu with no new regressions.
>>
>> Is this OK for next stage1?
>
>               if (temp
>                   /* Make sure we are not in a loop latch block.  */
>                   || gimple_bb (stmt) == gimple_bb (use_stmt)
> -                 || dominated_by_p (CDI_DOMINATORS,
> -                                    gimple_bb (stmt), gimple_bb (use_stmt))
>                   /* We can look through PHIs to regions post-dominating
>
> you are removing part of the latch-block check but not the other.
>
> +                 /* If stmt dominates PHI stmt, follow the PHI stmt.  */
> +                 if (!temp)
>
> well, just do this check earlier.  Or rather, it's already done in the
> test above.
>
> +                     /* Makesure the use stmt found is post dominated.  */
> +                     && dominated_by_p (CDI_POST_DOMINATORS,
> +                                        gimple_bb (stmt_outer),
> gimple_bb (inner_use_stmt))
>
> I think that this check also covers gimple_bb (stmt_outer) ==
> gimple_bb (inner_use_stmt)
> so for that case you'd need to check stmt dominance.  But better just give up?
>
> Given the simple testcases you add I wonder if you want a cheaper
> implementation,
> namely check that when reaching a loop PHI the only aliasing stmt in
> its use-chain
> is the use_stmt you reached the PHI from.  That would avoid this and the tests
> for the store being redundant and simplify the patch considerably.

Yea,  but the ideas in the patch might be useful for some of the other
DSE missed optimizations :-)

Jeff
Richard Biener May 2, 2018, 5:36 p.m. UTC | #5
On May 2, 2018 6:17:50 PM GMT+02:00, Jeff Law <law@redhat.com> wrote:
>On 05/02/2018 03:27 AM, Richard Biener wrote:
>> On Tue, Apr 10, 2018 at 2:52 AM, Kugan Vivekanandarajah
>> <kugan.vivekanandarajah@linaro.org> wrote:
>>> I would like to queue this patch for stage1 review.
>>>
>>> In DSE, while in dse_classify_store, as soon as we see a PHI use
>>> statement that is part of the loop, we are immediately giving up.
>>>
>>> As far as I understand, this can be improved. Attached patch is
>trying
>>> to walk the uses of the PHI statement (by recursively calling
>>> dse_classify_store) and then making sure the obtained store is
>indeed
>>> redundant.
>>>
>>> This is partly as reported in one of the testcase from PR44612. But
>>> this PR is about other issues that is not handled in this patch.
>>>
>>> Bootstrapped and regression tested on aarch64-linux-gnu with no new
>regressions.
>>>
>>> Is this OK for next stage1?
>>
>>               if (temp
>>                   /* Make sure we are not in a loop latch block.  */
>>                   || gimple_bb (stmt) == gimple_bb (use_stmt)
>> -                 || dominated_by_p (CDI_DOMINATORS,
>> -                                    gimple_bb (stmt), gimple_bb
>(use_stmt))
>>                   /* We can look through PHIs to regions
>post-dominating
>>
>> you are removing part of the latch-block check but not the other.
>>
>> +                 /* If stmt dominates PHI stmt, follow the PHI stmt.
> */
>> +                 if (!temp)
>>
>> well, just do this check earlier.  Or rather, it's already done in
>the
>> test above.
>>
>> +                     /* Makesure the use stmt found is post
>dominated.  */
>> +                     && dominated_by_p (CDI_POST_DOMINATORS,
>> +                                        gimple_bb (stmt_outer),
>> gimple_bb (inner_use_stmt))
>>
>> I think that this check also covers gimple_bb (stmt_outer) ==
>> gimple_bb (inner_use_stmt)
>> so for that case you'd need to check stmt dominance.  But better just
>give up?
>>
>> Given the simple testcases you add I wonder if you want a cheaper
>> implementation,
>> namely check that when reaching a loop PHI the only aliasing stmt in
>> its use-chain
>> is the use_stmt you reached the PHI from.  That would avoid this and
>the tests
>> for the store being redundant and simplify the patch considerably.
>Yea,  but the ideas in the patch might be useful for some of the other
>DSE missed optimizations :-)


Note that what we want in the end is some kind of general partial dead
code elimination pass that also handles stores.  There was an SSU PRE
implementation as part of a GSoC project many years ago.  I believe
building on top of the ad-hoc DSE pass we have right now is not the
very best thing to do long term.  SSU PRE also performs store/code
sinking, thus merging with the also ad-hoc sinking pass...

Richard. 

>Jeff
Jeff Law May 2, 2018, 5:39 p.m. UTC | #6
On 05/02/2018 11:36 AM, Richard Biener wrote:
> On May 2, 2018 6:17:50 PM GMT+02:00, Jeff Law <law@redhat.com> wrote:
>> On 05/02/2018 03:27 AM, Richard Biener wrote:
>>> On Tue, Apr 10, 2018 at 2:52 AM, Kugan Vivekanandarajah
>>> <kugan.vivekanandarajah@linaro.org> wrote:
>>>> I would like to queue this patch for stage1 review.
>>>>
>>>> In DSE, while in dse_classify_store, as soon as we see a PHI use
>>>> statement that is part of the loop, we are immediately giving up.
>>>>
>>>> As far as I understand, this can be improved. Attached patch is
>> trying
>>>> to walk the uses of the PHI statement (by recursively calling
>>>> dse_classify_store) and then making sure the obtained store is
>> indeed
>>>> redundant.
>>>>
>>>> This is partly as reported in one of the testcase from PR44612. But
>>>> this PR is about other issues that is not handled in this patch.
>>>>
>>>> Bootstrapped and regression tested on aarch64-linux-gnu with no new
>> regressions.
>>>>
>>>> Is this OK for next stage1?
>>>
>>>               if (temp
>>>                   /* Make sure we are not in a loop latch block.  */
>>>                   || gimple_bb (stmt) == gimple_bb (use_stmt)
>>> -                 || dominated_by_p (CDI_DOMINATORS,
>>> -                                    gimple_bb (stmt), gimple_bb
>> (use_stmt))
>>>                   /* We can look through PHIs to regions
>> post-dominating
>>>
>>> you are removing part of the latch-block check but not the other.
>>>
>>> +                 /* If stmt dominates PHI stmt, follow the PHI stmt.
>> */
>>> +                 if (!temp)
>>>
>>> well, just do this check earlier.  Or rather, it's already done in
>> the
>>> test above.
>>>
>>> +                     /* Makesure the use stmt found is post
>> dominated.  */
>>> +                     && dominated_by_p (CDI_POST_DOMINATORS,
>>> +                                        gimple_bb (stmt_outer),
>>> gimple_bb (inner_use_stmt))
>>>
>>> I think that this check also covers gimple_bb (stmt_outer) ==
>>> gimple_bb (inner_use_stmt)
>>> so for that case you'd need to check stmt dominance.  But better just
>> give up?
>>>
>>> Given the simple testcases you add I wonder if you want a cheaper
>>> implementation,
>>> namely check that when reaching a loop PHI the only aliasing stmt in
>>> its use-chain
>>> is the use_stmt you reached the PHI from.  That would avoid this and
>> the tests
>>> for the store being redundant and simplify the patch considerably.
>> Yea,  but the ideas in the patch might be useful for some of the other
>> DSE missed optimizations :-)
>
> Note that what we want in the end is some kind of general partial dead code elimination pass that also handles stores. There was a SSU PRE implementation as part of a gsoc project many years ago. I believe building on top of the ad-hoc DSE pass we have right now is not the very best thing to do long term. SSU PRE also performs store/code sinking thus merging with the also ad-hoc sinking pass...

If we did good store sinking I think partial dead store elimination just
falls out with the existing tree-ssa-dse.c implementation.  The cases I
was thinking about are already fully redundant, but the structure of
tree-ssa-dse isn't great for discovering them.
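
A hypothetical illustration of the partial-dead-store side of this:
the first store below is only needed on the !cond path, so sinking it
there would let the existing DSE machinery delete it on the other path:

  void h (int *p, int cond)
  {
    *p = 0;        /* partially dead: overwritten when cond is true */
    if (cond)
      *p = 1;
  }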

jeff
Richard Biener May 3, 2018, 7:23 a.m. UTC | #7
On Wed, May 2, 2018 at 7:39 PM, Jeff Law <law@redhat.com> wrote:
> On 05/02/2018 11:36 AM, Richard Biener wrote:
>> On May 2, 2018 6:17:50 PM GMT+02:00, Jeff Law <law@redhat.com> wrote:
>>> On 05/02/2018 03:27 AM, Richard Biener wrote:
>>>> On Tue, Apr 10, 2018 at 2:52 AM, Kugan Vivekanandarajah
>>>> <kugan.vivekanandarajah@linaro.org> wrote:
>>>>> I would like to queue this patch for stage1 review.
>>>>>
>>>>> In DSE, while in dse_classify_store, as soon as we see a PHI use
>>>>> statement that is part of the loop, we are immediately giving up.
>>>>>
>>>>> As far as I understand, this can be improved. Attached patch is
>>> trying
>>>>> to walk the uses of the PHI statement (by recursively calling
>>>>> dse_classify_store) and then making sure the obtained store is
>>> indeed
>>>>> redundant.
>>>>>
>>>>> This is partly as reported in one of the testcase from PR44612. But
>>>>> this PR is about other issues that is not handled in this patch.
>>>>>
>>>>> Bootstrapped and regression tested on aarch64-linux-gnu with no new
>>> regressions.
>>>>>
>>>>> Is this OK for next stage1?
>>>>
>>>>               if (temp
>>>>                   /* Make sure we are not in a loop latch block.  */
>>>>                   || gimple_bb (stmt) == gimple_bb (use_stmt)
>>>> -                 || dominated_by_p (CDI_DOMINATORS,
>>>> -                                    gimple_bb (stmt), gimple_bb
>>> (use_stmt))
>>>>                   /* We can look through PHIs to regions
>>> post-dominating
>>>>
>>>> you are removing part of the latch-block check but not the other.
>>>>
>>>> +                 /* If stmt dominates PHI stmt, follow the PHI stmt.
>>> */
>>>> +                 if (!temp)
>>>>
>>>> well, just do this check earlier.  Or rather, it's already done in
>>> the
>>>> test above.
>>>>
>>>> +                     /* Makesure the use stmt found is post
>>> dominated.  */
>>>> +                     && dominated_by_p (CDI_POST_DOMINATORS,
>>>> +                                        gimple_bb (stmt_outer),
>>>> gimple_bb (inner_use_stmt))
>>>>
>>>> I think that this check also covers gimple_bb (stmt_outer) ==
>>>> gimple_bb (inner_use_stmt)
>>>> so for that case you'd need to check stmt dominance.  But better just
>>> give up?
>>>>
>>>> Given the simple testcases you add I wonder if you want a cheaper
>>>> implementation,
>>>> namely check that when reaching a loop PHI the only aliasing stmt in
>>>> its use-chain
>>>> is the use_stmt you reached the PHI from.  That would avoid this and
>>> the tests
>>>> for the store being redundant and simplify the patch considerably.
>>> Yea,  but the ideas in the patch might be useful for some of the other
>>> DSE missed optimizations :-)
>>
>> Note that what we want in the end is some kind of general partial dead code elimination pass that also handles stores. There was a SSU PRE implementation as part of a gsoc project many years ago. I believe building on top of the ad-hoc DSE pass we have right now is not the very best thing to do long term. SSU PRE also performs store/code sinking thus merging with the also ad-hoc sinking pass...
> If we did good store sinking I think partial dead store elimination just
> falls out with the existing tree-ssa-dse.c implementation.  The cases I
> was thinking about are already fully redundant, but the structure of
> tree-ssa-dse isn't great for discovering them.


Yeah.

As a side-note - Kugan, you rely on post-dominance checks for
correctness, but post-dominance isn't what you expect when dealing
with infinite loops (with blocks not reverse reachable from exit).
So relying on post-dominance for correctness is very fragile - we've
been bitten by this before.
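
A hypothetical example of the hazard: in a function like the one below
the loop body never reaches the exit block, so its blocks are not
reverse reachable from exit and post-dominance answers for them are not
what a correctness check would want:

  void spin (char *p)
  {
    for (;;)
      __builtin_memset (p, 0, 16);
  }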

Richard.

>

> jeff
Kugan Vivekanandarajah May 14, 2018, 3:32 a.m. UTC | #8
Hi Richard,

>> Given the simple testcases you add I wonder if you want a cheaper
>> implementation,
>> namely check that when reaching a loop PHI the only aliasing stmt in
>> its use-chain
>> is the use_stmt you reached the PHI from.  That would avoid this and the tests
>> for the store being redundant and simplify the patch considerably.


I tried implementing the above in the attached patch.  Bootstrapped on
x86_64-linux-gnu; full testing is ongoing.

Thanks,
Kugan

gcc/ChangeLog:

2018-05-14  Kugan Vivekanandarajah  <kuganv@linaro.org>

    * tree-ssa-dse.c (phi_aliases_stmt_only): New.
    (dse_classify_store): Use phi_aliases_stmt_only.

gcc/testsuite/ChangeLog:

2018-05-14  Kugan Vivekanandarajah  <kuganv@linaro.org>

    * gcc.dg/tree-ssa/ssa-dse-31.c: New test.
    * gcc.dg/tree-ssa/ssa-dse-32.c: New test.
From 102b1dd676446055fb881daa1fee4e96b6fe676d Mon Sep 17 00:00:00 2001
From: Kugan Vivekanandarajah <kugan.vivekanandarajah@linaro.org>
Date: Wed, 9 May 2018 08:57:23 +1000
Subject: [PATCH] improve dse

Change-Id: If23529a3ede8230b26de8d60c1e0c5141be8edb7

---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-31.c | 16 +++++++++++++++
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-32.c | 23 +++++++++++++++++++++
 gcc/tree-ssa-dse.c                         | 33 +++++++++++++++++++++++++++---
 3 files changed, 69 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-31.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-32.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-31.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-31.c
new file mode 100644
index 0000000..e4d71b2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-31.c
@@ -0,0 +1,16 @@
+
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-dse-details" } */
+#define SIZE 4
+
+int main ()
+{
+  static float a[SIZE];
+  int i;
+  for (i = 0; i < SIZE; i++)
+   __builtin_memset ((void *) a, 0, sizeof(float)*3);
+   __builtin_memset ((void *) a, 0, sizeof(float)*SIZE);
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "Deleted dead calls" 1 "dse1"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-32.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-32.c
new file mode 100644
index 0000000..3d8fd5f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-32.c
@@ -0,0 +1,23 @@
+
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-dse-details" } */
+#define SIZE 4
+
+void s4 (float *restrict a)
+{
+  (void) __builtin_memset ((void *) a, 0, sizeof(float)*SIZE);
+}
+
+
+int main ()
+{
+  int i;
+  float a[10];
+  printf("Start\n");
+  for (i = 0; i < SIZE; i++)
+    s4 (a);
+  printf("Done\n");
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "Deleted dead calls" 1 "dse1"} } */
diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c
index 9220fea..6522a94 100644
--- a/gcc/tree-ssa-dse.c
+++ b/gcc/tree-ssa-dse.c
@@ -515,6 +515,28 @@ live_bytes_read (ao_ref use_ref, ao_ref *ref, sbitmap live)
   return true;
 }
 
+/* Return true if PHI stmt aliases only STMT1. */
+
+static bool
+phi_aliases_stmt_only (gphi *phi, gimple *stmt1)
+{
+  gimple *phi_use;
+  imm_use_iterator ui2;
+  tree def = PHI_RESULT (phi);
+  bool ok = true;
+
+  FOR_EACH_IMM_USE_STMT (phi_use, ui2, def)
+    {
+      if (phi_use != stmt1)
+	{
+	  ok = false;
+	  BREAK_FROM_IMM_USE_STMT (ui2);
+	}
+    }
+
+  return ok;
+}
+
 /* A helper of dse_optimize_stmt.
    Given a GIMPLE_ASSIGN in STMT that writes to REF, find a candidate
    statement *USE_STMT that may prove STMT to be dead.
@@ -571,9 +593,14 @@ dse_classify_store (ao_ref *ref, gimple *stmt, gimple **use_stmt,
 	    {
 	      if (temp
 		  /* Make sure we are not in a loop latch block.  */
-		  || gimple_bb (stmt) == gimple_bb (use_stmt)
-		  || dominated_by_p (CDI_DOMINATORS,
-				     gimple_bb (stmt), gimple_bb (use_stmt))
+		  || ((gimple_bb (stmt) == gimple_bb (use_stmt)
+		       || dominated_by_p (CDI_DOMINATORS,
+					  gimple_bb (stmt), gimple_bb (use_stmt)))
+		      /* When reaching a loop PHI, it is OK if the only
+			 aliasing stmt in its use-chain is the stmt we
+			 reached the PHI from.  */
+		      && !phi_aliases_stmt_only (as_a <gphi *> (use_stmt),
+						 stmt))
 		  /* We can look through PHIs to regions post-dominating
 		     the DSE candidate stmt.  */
 		  || !dominated_by_p (CDI_POST_DOMINATORS,
Richard Biener May 14, 2018, 12:12 p.m. UTC | #9
On Mon, May 14, 2018 at 5:32 AM, Kugan Vivekanandarajah
<kugan.vivekanandarajah@linaro.org> wrote:
> Hi Richard,
>
>>> Given the simple testcases you add I wonder if you want a cheaper
>>> implementation,
>>> namely check that when reaching a loop PHI the only aliasing stmt in
>>> its use-chain
>>> is the use_stmt you reached the PHI from.  That would avoid this and the tests
>>> for the store being redundant and simplify the patch considerably.
>
> Tried implementing above in the attached patch.  Bootstrapped on
> x86_64-linux-gnu. Full testing is ongoing.


I think your phi_aliases_stmt_only is equal to

   has_single_use (PHI_RESULT (phi))
   && PHI_RESULT (phi) == gimple_vuse (defvar_def)

that is, we are sure we have

  phidef = PHI < , ... defvar>

  # defvar = VUSE <phidef>
  defvar_def stmt

Testing it this way also matches the intended structure more clearly.
Maybe you can add the above IL picture as a comment.

OK with those changes.
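
For reference, a rough sketch (not the committed form; names follow the
surrounding dse_classify_store code) of how that check, with the IL
picture as the suggested comment, could read:

	      /* Look through a loop PHI when the statement we came from
		 is the single use of the PHI result, i.e. we have

		   phidef = PHI <..., defvar>
		   # defvar = VUSE <phidef>
		   defvar_def stmt

		 which is what phi_aliases_stmt_only tested.  */
	      if (has_single_use (PHI_RESULT (phi))
		  && PHI_RESULT (phi) == gimple_vuse (defvar_def))
		/* ... keep walking through the PHI ... */;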

Richard.

> Thanks,
> Kugan
>
> gcc/ChangeLog:
>
> 2018-05-14  Kugan Vivekanandarajah  <kuganv@linaro.org>
>
>     * tree-ssa-dse.c (phi_aliases_stmt_only): New.
>     (dse_classify_store): Use phi_aliases_stmt_only.
>
> gcc/testsuite/ChangeLog:
>
> 2018-05-14  Kugan Vivekanandarajah  <kuganv@linaro.org>
>
>     * gcc.dg/tree-ssa/ssa-dse-31.c: New test.
>     * gcc.dg/tree-ssa/ssa-dse-32.c: New test.
Jeff Law May 14, 2018, 8:15 p.m. UTC | #10
On 05/01/2018 05:16 PM, Kugan Vivekanandarajah wrote:
>> I'm also struggling a little bit to see much value in handling this
>> case.  In the included testcases we've got a memset in a loop where the
>> args do not vary across the loop iterations and there are no reads from
>> the memory location within the loop. How realistic is that?
>
> I was looking into another case from an application but that was not
> handled partly due to limitations of alias analysis and thought that
> this could be handled. If you think that this is not going to happen
> often in practice, I agree that this is not worth the trouble.

I'm not sure, that's why I asked :-)

If you've seen this happen in your larger application or it's a
prerequisite for addressing a missed DSE in your larger application,
then that's probably enough justification to move forward with the
simpler solution Richi suggested.


Jeff

Patch

From 5751eaff3d1c263e8631d5a07e43fecaaa0e9d26 Mon Sep 17 00:00:00 2001
From: Kugan Vivekanandarajah <kugan.vivekanandarajah@linaro.org>
Date: Tue, 10 Apr 2018 09:49:10 +1000
Subject: [PATCH] improve dse

---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-31.c | 16 ++++++++++
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-32.c | 23 ++++++++++++++
 gcc/tree-ssa-dse.c                         | 51 ++++++++++++++++++++++++------
 3 files changed, 81 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-31.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-32.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-31.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-31.c
new file mode 100644
index 0000000..e4d71b2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-31.c
@@ -0,0 +1,16 @@ 
+
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-dse-details" } */
+#define SIZE 4
+
+int main ()
+{
+  static float a[SIZE];
+  int i;
+  for (i = 0; i < SIZE; i++)
+   __builtin_memset ((void *) a, 0, sizeof(float)*3);
+   __builtin_memset ((void *) a, 0, sizeof(float)*SIZE);
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "Deleted dead calls" 1 "dse1"} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-32.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-32.c
new file mode 100644
index 0000000..3d8fd5f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-32.c
@@ -0,0 +1,23 @@ 
+
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-dse-details" } */
+#define SIZE 4
+
+void s4 (float *restrict a)
+{
+  (void) __builtin_memset ((void *) a, 0, sizeof(float)*SIZE);
+}
+
+
+int main ()
+{
+  int i;
+  float a[10];
+  printf("Start\n");
+  for (i = 0; i < SIZE; i++)
+    s4 (a);
+  printf("Done\n");
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "Deleted dead calls" 1 "dse1"} } */
diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c
index 9220fea..3513fda 100644
--- a/gcc/tree-ssa-dse.c
+++ b/gcc/tree-ssa-dse.c
@@ -521,11 +521,11 @@  live_bytes_read (ao_ref use_ref, ao_ref *ref, sbitmap live)
    Return TRUE if the above conditions are met, otherwise FALSE.  */
 
 static dse_store_status
-dse_classify_store (ao_ref *ref, gimple *stmt, gimple **use_stmt,
-		    bool byte_tracking_enabled, sbitmap live_bytes)
+dse_classify_store (ao_ref *ref, gimple *stmt_outer, gimple *stmt,
+		    gimple **use_stmt, bool byte_tracking_enabled,
+		    sbitmap live_bytes, unsigned cnt = 0)
 {
   gimple *temp;
-  unsigned cnt = 0;
 
   *use_stmt = NULL;
 
@@ -556,9 +556,11 @@  dse_classify_store (ao_ref *ref, gimple *stmt, gimple **use_stmt,
 	{
 	  cnt++;
 
+	  if (use_stmt == stmt_outer)
+	    continue;
 	  /* If we ever reach our DSE candidate stmt again fail.  We
 	     cannot handle dead stores in loops.  */
-	  if (use_stmt == stmt)
+	  else if (use_stmt == stmt)
 	    {
 	      fail = true;
 	      BREAK_FROM_IMM_USE_STMT (ui);
@@ -572,8 +574,6 @@  dse_classify_store (ao_ref *ref, gimple *stmt, gimple **use_stmt,
 	      if (temp
 		  /* Make sure we are not in a loop latch block.  */
 		  || gimple_bb (stmt) == gimple_bb (use_stmt)
-		  || dominated_by_p (CDI_DOMINATORS,
-				     gimple_bb (stmt), gimple_bb (use_stmt))
 		  /* We can look through PHIs to regions post-dominating
 		     the DSE candidate stmt.  */
 		  || !dominated_by_p (CDI_POST_DOMINATORS,
@@ -582,8 +582,41 @@  dse_classify_store (ao_ref *ref, gimple *stmt, gimple **use_stmt,
 		  fail = true;
 		  BREAK_FROM_IMM_USE_STMT (ui);
 		}
+	      else if (dominated_by_p (CDI_DOMINATORS,
+				       gimple_bb (stmt), gimple_bb (use_stmt)))
+		{
+		  gphi *phi = as_a <gphi *> (use_stmt);
+		  gimple *def_stmt = SSA_NAME_DEF_STMT (PHI_RESULT (phi));
+		  enum dse_store_status status = DSE_STORE_LIVE;
+		  ao_ref use_ref;
+		  gimple *inner_use_stmt;
+
+		  /* If stmt dominates PHI stmt, follow the PHI stmt.  */
+		  if (!temp)
+		    status = dse_classify_store (ref, stmt, def_stmt, &inner_use_stmt,
+						 byte_tracking_enabled,
+						 live_bytes, cnt);
+
+		  if (status == DSE_STORE_DEAD
+		      /* Makesure the use stmt found is post dominated.  */
+		      && dominated_by_p (CDI_POST_DOMINATORS,
+					 gimple_bb (stmt_outer), gimple_bb (inner_use_stmt))
+		      /* Also check that the store is redundant.  */
+		      && initialize_ao_ref_for_dse (inner_use_stmt, &use_ref)
+		      && valid_ao_ref_for_dse (&use_ref)
+		      && use_ref.base == ref->base
+		      && known_eq (use_ref.size, use_ref.max_size)
+		      && known_ge (use_ref.size, ref->size)
+		      && known_eq (use_ref.offset, ref->offset))
+		    temp = inner_use_stmt;
+		  else
+		    {
+		      fail = true;
+		      BREAK_FROM_IMM_USE_STMT (ui);
+		    }
+		}
 	      /* Do not consider the PHI as use if it dominates the
-	         stmt defining the virtual operand we are processing,
+		 stmt defining the virtual operand we are processing,
 		 we have processed it already in this case.  */
 	      if (gimple_bb (defvar_def) != gimple_bb (use_stmt)
 		  && !dominated_by_p (CDI_DOMINATORS,
@@ -799,7 +832,7 @@  dse_dom_walker::dse_optimize_stmt (gimple_stmt_iterator *gsi)
 	      enum dse_store_status store_status;
 	      m_byte_tracking_enabled
 		= setup_live_bytes_from_ref (&ref, m_live_bytes);
-	      store_status = dse_classify_store (&ref, stmt, &use_stmt,
+	      store_status = dse_classify_store (&ref, stmt, stmt, &use_stmt,
 						 m_byte_tracking_enabled,
 						 m_live_bytes);
 	      if (store_status == DSE_STORE_LIVE)
@@ -834,7 +867,7 @@  dse_dom_walker::dse_optimize_stmt (gimple_stmt_iterator *gsi)
 	  m_byte_tracking_enabled
 	    = setup_live_bytes_from_ref (&ref, m_live_bytes);
 	  enum dse_store_status store_status;
-	  store_status = dse_classify_store (&ref, stmt, &use_stmt,
+	  store_status = dse_classify_store (&ref, stmt, stmt, &use_stmt,
 					     m_byte_tracking_enabled,
 					     m_live_bytes);
 	  if (store_status == DSE_STORE_LIVE)
-- 
2.7.4