
[1/2] shrink wrap a function with a single loop: copy propagation

Message ID CACgzC7C5gtZKnuLoVc53NFOEbNa81e-Mh75BrEOq-DdyVNfr3A@mail.gmail.com
State New

Commit Message

Zhenqiang Chen May 8, 2014, 8:06 a.m. UTC
Hi,

A similar issue was discussed in the thread
http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01145.html. The patches
are close to Jeff's suggestion there: "sink just the moves out of the
incoming argument registers".

This patch and the following one try to shrink-wrap a function with a
single loop, which cannot be handled by
split_live_ranges_for_shrink_wrap and prepare_shrink_wrap, since the
induction variable has more than one definition. Taking the test case
in the patch as an example, the pseudo code before shrink-wrapping is:

     p = p2
     if (!p) goto return
 L1: ...
     p = ...
     ...
     goto L1
     ...
return:
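
For reference, this mirrors the new test case in the patch
(gcc.dg/shrink-wrap-loop.c):

     int foo (int *p1, int *p2);

     int
     test (int *p1, int *p2)
     {
       int *p;

       /* p has two definitions: p = p2 and p++.  */
       for (p = p2; p != 0; p++)
         if (!foo (p, p1))
           return 0;

       return 1;
     }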

Function prepare_shrink_wrap does a PRE-like optimization to sink some
copies from the entry block into the live block. The patches enhance
prepare_shrink_wrap to:
(1) Replace the references to p with p2 in the entry block. (This patch)
(2) Create a new basic block on the live edge to hold the copy "p =
p2". (Next patch)

After shrink-wrapping, the pseudo code would look like:

     if (!p2) goto return
     p = p2
 L1: ...
     p = ...
     ...
     goto L1
return:

Bootstrap and no make check regression on X86-64 and ARM.
No Spec2k INT performance regression for X86-64 and ARM with -O3.

With the two patches, for X86-64 Spec2K INT, the number of functions
shrink-wrapped increases from 619 to 671.
For 453.povray in Spec2006, X86-64 is ~1% better and ARM Thumb mode is
~4% faster. There is no improvement for ARM mode, since it uses the
flag-setting mov (subs) to set the CC, and that insn cannot be sunk
out of the entry block.

OK for trunk?

Thanks!
-Zhenqiang

ChangeLog:
2014-05-08  Zhenqiang Chen  <zhenqiang.chen@linaro.org>

        * function.c (last_or_compare_p, try_copy_prop): New functions.
        (move_insn_for_shrink_wrap): Try copy propagation.
        (prepare_shrink_wrap): Separate last_uses from uses.

testsuite/ChangeLog:
2014-05-08  Zhenqiang Chen  <zhenqiang.chen@linaro.org>

        * gcc.dg/shrink-wrap-loop.c: New test case.


Comments

Jeff Law May 9, 2014, 6 a.m. UTC | #1
On 05/08/14 02:06, Zhenqiang Chen wrote:
> Hi,
>
> A similar issue was discussed in the thread
> http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01145.html. The patches
> are close to Jeff's suggestion there: "sink just the moves out of the
> incoming argument registers".
>
> This patch and the following one try to shrink-wrap a function with a
> single loop, which cannot be handled by
> split_live_ranges_for_shrink_wrap and prepare_shrink_wrap, since the
> induction variable has more than one definition. Taking the test case
> in the patch as an example, the pseudo code before shrink-wrapping is:
>
>       p = p2
>       if (!p) goto return
>   L1: ...
>       p = ...
>       ...
>       goto L1
>       ...
> return:
>
> Function prepare_shrink_wrap does a PRE-like optimization to sink some
> copies from the entry block into the live block. The patches enhance
> prepare_shrink_wrap to:
> (1) Replace the references to p with p2 in the entry block. (This patch)
> (2) Create a new basic block on the live edge to hold the copy "p =
> p2". (Next patch)
>
> After shrink-wrapping, the pseudo code would look like:
>
>       if (!p2) goto return
>       p = p2
>   L1: ...
>       p = ...
>       ...
>       goto L1
> return:
Right. Seems like a reasonably useful transformation.  Not the totally 
general solution, but one that clearly has value.


> 2014-05-08  Zhenqiang Chen  <zhenqiang.chen@linaro.org>
>
>          * function.c (last_or_compare_p, try_copy_prop): New functions.
>          (move_insn_for_shrink_wrap): Try copy propagation.
>          (prepare_shrink_wrap): Separate last_uses from uses.
>
> testsuite/ChangeLog:
> 2014-05-08  Zhenqiang Chen  <zhenqiang.chen@linaro.org>
>
>          * gcc.dg/shrink-wrap-loop.c: New test case.
So first at a high level, Steven B.'s recommendation to pull the 
shrink-wrapping bits out of function.c is a good one.  I'd really like 
to see that happen as well.

> +/* Check whether INSN is the last insn in BB or
> +   a COMPARE for the last insn in BB.  */
> +
> +static bool
> +last_or_compare_p (basic_block bb, rtx insn)
> +{
> +  rtx x = single_set (insn);
> +
> +  if ((insn == BB_END (bb))
> +       || ((insn == PREV_INSN (BB_END (bb)))
> +          && x && REG_P (SET_DEST (x))
> +          && GET_MODE_CLASS (GET_MODE (SET_DEST (x))) == MODE_CC))
> +    return true;
> +
> +  return false;
So you don't actually check if the insn is a compare, just that its
destination is MODE_CC. That's probably close enough, but please note it
in the comment.  Folks are going to read the comment first, not the
implementation.
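
For instance, the check could also insist on an actual COMPARE source;
an untested sketch, not part of the patch:

  rtx x = single_set (insn);

  /* Sketch: accept only a single_set whose destination is a MODE_CC
     register and whose source is an actual COMPARE.  */
  if (insn == PREV_INSN (BB_END (bb))
      && x && REG_P (SET_DEST (x))
      && GET_MODE_CLASS (GET_MODE (SET_DEST (x))) == MODE_CC
      && GET_CODE (SET_SRC (x)) == COMPARE)
    return true;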

Q. Do we have to do anything special for HAVE_cc0 targets here?

> +}
> +
> +/* Try to copy propagate INSN with SRC and DEST in BB to the last COMPARE
> +   or JUMP insn, which use registers in LAST_USES.  */
So why restrict this to just cases where we have to propagate into a 
COMPARE at the end of a block?   So in your example, assume the first 
block looks like

p = p2;
if (!q) return;
[ ... ]

We ought to be able to turn that into

if (!q) return
p = p2;
[ ... ]


> +
> +static bool
> +try_copy_prop (basic_block bb, rtx insn, rtx src, rtx dest,
> +              HARD_REG_SET *last_uses)
My first thought here was that we must have some code which does 90% of 
what you need.  Did you look at any of the existing RTL optimization 
infrastructure to see if there was code you could extend to handle this?

Jeff
Zhenqiang Chen May 9, 2014, 7:30 a.m. UTC | #2
On 9 May 2014 14:00, Jeff Law <law@redhat.com> wrote:
> On 05/08/14 02:06, Zhenqiang Chen wrote:
>> [...]
>
>> 2014-05-08  Zhenqiang Chen  <zhenqiang.chen@linaro.org>
>>
>>          * function.c (last_or_compare_p, try_copy_prop): New functions.
>>          (move_insn_for_shrink_wrap): Try copy propagation.
>>          (prepare_shrink_wrap): Separate last_uses from uses.
>>
>> testsuite/ChangeLog:
>> 2014-05-08  Zhenqiang Chen  <zhenqiang.chen@linaro.org>
>>
>>          * gcc.dg/shrink-wrap-loop.c: New test case.
>
> So first at a high level, Steven B.'s recommendation to pull the
> shrink-wrapping bits out of function.c is a good one.  I'd really like to
> see that happen as well.

I will do it.

>> +/* Check whether INSN is the last insn in BB or
>> +   a COMPARE for the last insn in BB.  */
>> +
>> +static bool
>> +last_or_compare_p (basic_block bb, rtx insn)
>> +{
>> +  rtx x = single_set (insn);
>> +
>> +  if ((insn == BB_END (bb))
>> +       || ((insn == PREV_INSN (BB_END (bb)))
>> +          && x && REG_P (SET_DEST (x))
>> +          && GET_MODE_CLASS (GET_MODE (SET_DEST (x))) == MODE_CC))
>> +    return true;
>> +
>> +  return false;
>
> So you don't actually check if the insn is a compare, just that its
> destination is MODE_CC. That's probably close enough, but please note it in
> the comment.  Folks are going to read the comment first, not the
> implementation.
>
> Q. Do we have to do anything special for HAVE_cc0 targets here?

I have not thought about it. I will recheck it.

>> +}
>> +
>> +/* Try to copy propagate INSN with SRC and DEST in BB to the last COMPARE
>> +   or JUMP insn, which use registers in LAST_USES.  */
>
> So why restrict this to just cases where we have to propagate into a COMPARE
> at the end of a block?   So in your example, assume the first block looks
> like

prepare_shrink_wrap calls move_insn_for_shrink_wrap on the insns in
reverse order (FOR_BB_INSNS_REVERSE_SAFE). The current
prepare_shrink_wrap should already handle that case, since there is no
data flow dependence.
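
Concretely, the driver loop (simplified from the prepare_shrink_wrap
hunk in this patch):

  FOR_BB_INSNS_REVERSE_SAFE (entry_block, insn, curr)
    if (NONDEBUG_INSN_P (insn)
        && !move_insn_for_shrink_wrap (entry_block, insn, uses, defs,
                                       &last_uses))
      {
        /* INSN stays in the entry block: record its defs and uses so
           earlier insns are not sunk past it.  */
        ...
      }

The jump is visited first, so its use of q is already recorded when "p
= p2" is seen; since p is not among the recorded uses, the copy can be
sunk directly, with no propagation needed.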

> p = p2;
> if (!q) return;
> [ ... ]
>
> We ought to be able to turn that into
>
> if (!q) return
> p = p2;
> [ ... ]
>
>
>
>> +
>> +static bool
>> +try_copy_prop (basic_block bb, rtx insn, rtx src, rtx dest,
>> +              HARD_REG_SET *last_uses)
>
> My first thought here was that we must have some code which does 90% of what
> you need.  Did you look at any of the existing RTL optimization
> infrastructure to see if there was code you could extend to handle this?

Most of the code is from function copyprop_hardreg_forward_1 in the
"cprop_hardreg" pass. Previously I had a patch to reuse it:
http://gcc.gnu.org/ml/gcc-patches/2013-06/msg00305.html. What do you
think about the old patch?

Thanks!
-Zhenqiang
Jeff Law May 13, 2014, 7:23 p.m. UTC | #3
On 05/09/14 01:30, Zhenqiang Chen wrote:
>>
>> So why restrict this to just cases where we have to propagate into a COMPARE
>> at the end of a block?   So in your example, assume the first block looks
>> like
>
> prepare_shrink_wrap calls move_insn_for_shrink_wrap on the insns in
> reverse order (FOR_BB_INSNS_REVERSE_SAFE). The current
> prepare_shrink_wrap should already handle that case, since there is no
> data flow dependence.
OK.

>>
>> My first thought here was that we must have some code which does 90% of what
>> you need.  Did you look at any of the existing RTL optimization
>> infrastructure to see if there was code you could extend to handle this?
>
> Most of the code is from function copyprop_hardreg_forward_1 in the
> "cprop_hardreg" pass. Previously I had a patch to reuse it:
> http://gcc.gnu.org/ml/gcc-patches/2013-06/msg00305.html. What do you
> think about the old patch?
Utilizing routines from the hard register copy propagation pass seems 
wise.  I'll take a look at the updated patch.

jeff

Patch

diff --git a/gcc/function.c b/gcc/function.c
index 383a52a..764ac82 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -5421,14 +5421,139 @@  next_block_for_reg (basic_block bb, int regno, int end_regno)
   return live_edge->dest;
 }

-/* Try to move INSN from BB to a successor.  Return true on success.
-   USES and DEFS are the set of registers that are used and defined
-   after INSN in BB.  */
+/* Check whether INSN is the last insn in BB or
+   a COMPARE for the last insn in BB.  */
+
+static bool
+last_or_compare_p (basic_block bb, rtx insn)
+{
+  rtx x = single_set (insn);
+
+  if ((insn == BB_END (bb))
+       || ((insn == PREV_INSN (BB_END (bb)))
+          && x && REG_P (SET_DEST (x))
+          && GET_MODE_CLASS (GET_MODE (SET_DEST (x))) == MODE_CC))
+    return true;
+
+  return false;
+}
+
+/* Try to copy propagate INSN with SRC and DEST in BB to the last COMPARE
+   or JUMP insn, which use registers in LAST_USES.  */
+
+static bool
+try_copy_prop (basic_block bb, rtx insn, rtx src, rtx dest,
+              HARD_REG_SET *last_uses)
+{
+  bool ret = false;
+  bool changed, is_asm;
+  unsigned i, alt, n_ops, dregno, sregno;
+
+  rtx x, r, n, tmp;
+
+  if (GET_CODE (dest) == SUBREG || GET_CODE (src) == SUBREG
+      || insn == BB_END (bb))
+    return false;
+
+  x = NEXT_INSN (insn);
+  sregno = REGNO (src);
+  dregno = REGNO (dest);
+
+  while (x != NULL_RTX)
+    {
+      tmp = NEXT_INSN (x);
+
+      if (BARRIER_P(x))
+       return false;
+
+      /* Skip other insns since dregno is not referred according to
+        previous checks.  */
+      if (!last_or_compare_p (bb, x))
+       {
+         x = tmp;
+         continue;
+       }
+      changed = 0;
+      extract_insn (x);
+      if (!constrain_operands (1))
+       fatal_insn_not_found (x);
+      preprocess_constraints ();
+      alt = which_alternative;
+      n_ops = recog_data.n_operands;
+
+      is_asm = asm_noperands (PATTERN (x)) >= 0;
+      if (is_asm)
+       return false;
+
+      for (i = 0; i < n_ops; i ++)
+       {
+         r = recog_data.operand [i];
+         if (REG_P (r) && REGNO (r) == dregno)
+           {
+             enum reg_class cl = recog_op_alt[i][alt].cl;
+             if (GET_MODE (r) != GET_MODE (src)
+                 || !in_hard_reg_set_p (reg_class_contents[cl],
+                                        GET_MODE (r), sregno)
+                 || recog_op_alt[i][alt].earlyclobber)
+               {
+                 if (changed)
+                   cancel_changes (0);
+                 return false;
+                }
+             n = gen_rtx_raw_REG (GET_MODE (r), sregno);
+             if (!validate_unshare_change (x, recog_data.operand_loc[i],
+                                           n, true))
+               {
+                 cancel_changes (0);
+                 return false;
+               }
+
+             ORIGINAL_REGNO (n) = ORIGINAL_REGNO (r);
+             REG_ATTRS (n) = REG_ATTRS (r);
+             REG_POINTER (n) = REG_POINTER (r);
+             changed = true;
+            }
+       }
+
+      if (changed)
+       {
+          if (!verify_changes (0))
+           {
+             cancel_changes (0);
+             return false;
+           }
+         else
+           {
+             confirm_change_group ();
+             df_insn_rescan (x);
+             ret = true;
+           }
+       }
+
+      if (x == BB_END (bb))
+       break;
+
+      x = tmp;
+    }
+
+  if (ret)
+    {
+      CLEAR_HARD_REG_BIT (*last_uses, dregno);
+      SET_HARD_REG_BIT (*last_uses, sregno);
+      df_set_bb_dirty (bb);
+    }
+  return ret;
+}
+
+ /* Try to move INSN from BB to a successor.  Return true on success.
+    USES and DEFS are the set of registers that are used and defined
+    after INSN in BB.  */

 static bool
 move_insn_for_shrink_wrap (basic_block bb, rtx insn,
                           const HARD_REG_SET uses,
-                          const HARD_REG_SET defs)
+                          const HARD_REG_SET defs,
+                          HARD_REG_SET *last_uses)
 {
   rtx set, src, dest;
   bitmap live_out, live_in, bb_uses, bb_defs;
@@ -5460,6 +5585,12 @@  move_insn_for_shrink_wrap (basic_block bb, rtx insn,
   /* See whether there is a successor block to which we could move INSN.  */
   next_block = next_block_for_reg (bb, dregno, end_dregno);
   if (!next_block)
+     return false;
+
+  /* If the destination register is referred in later insn,
+     try to forward it.  */
+  if (overlaps_hard_reg_set_p (*last_uses, GET_MODE (dest), dregno)
+      && !try_copy_prop (bb, insn, src, dest, last_uses))
     return false;

   /* At this point we are committed to moving INSN, but let's try to
@@ -5551,14 +5682,18 @@  static void
 prepare_shrink_wrap (basic_block entry_block)
 {
   rtx insn, curr, x;
-  HARD_REG_SET uses, defs;
+  HARD_REG_SET uses, defs, last_uses;
   df_ref *ref;

+  if (!JUMP_P (BB_END (entry_block)))
+    return;
+  CLEAR_HARD_REG_SET (last_uses);
   CLEAR_HARD_REG_SET (uses);
   CLEAR_HARD_REG_SET (defs);
   FOR_BB_INSNS_REVERSE_SAFE (entry_block, insn, curr)
     if (NONDEBUG_INSN_P (insn)
-       && !move_insn_for_shrink_wrap (entry_block, insn, uses, defs))
+       && !move_insn_for_shrink_wrap (entry_block, insn, uses, defs,
+                                      &last_uses))
       {
        /* Add all defined registers to DEFs.  */
        for (ref = DF_INSN_DEFS (insn); *ref; ref++)
@@ -5568,6 +5703,19 @@  prepare_shrink_wrap (basic_block entry_block)
              SET_HARD_REG_BIT (defs, REGNO (x));
          }

+       /* Add all used registers for the last and compare insn in BB.
+          These insns can not be sinked out of the ENTRY_BLOCK.  */
+       if (last_or_compare_p (entry_block, insn))
+         {
+           for (ref = DF_INSN_USES (insn); *ref; ref++)
+             {
+               x = DF_REF_REG (*ref);
+               if (REG_P (x) && HARD_REGISTER_P (x))
+                 SET_HARD_REG_BIT (last_uses, REGNO (x));
+             }
+           continue;
+         }
+
        /* Add all used registers to USESs.  */
        for (ref = DF_INSN_USES (insn); *ref; ref++)
          {
diff --git a/gcc/testsuite/gcc.dg/shrink-wrap-loop.c b/gcc/testsuite/gcc.dg/shrink-wrap-loop.c
new file mode 100644
index 0000000..17dca4e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/shrink-wrap-loop.c
@@ -0,0 +1,20 @@ 
+/* { dg-do compile { target { { x86_64-*-* } || { arm_thumb2 } } } } */
+/* { dg-options "-O2 -fdump-rtl-pro_and_epilogue"  } */
+
+int foo (int *p1, int *p2);
+
+int
+test (int *p1, int *p2)
+{
+  int *p;
+
+  for (p = p2; p != 0; p++)
+    {
+      if (!foo (p, p1))
+        return 0;
+    }
+
+  return 1;
+}
+/* { dg-final { scan-rtl-dump "Performing shrink-wrapping" "pro_and_epilogue" } } */
+/* { dg-final { cleanup-rtl-dump "pro_and_epilogue" } } */