diff mbox

[cprop] Check rtx_cost when propagating constant

Message ID CACgzC7A8cVyoyFQO24eOo9t0+UEuKcYtWdANiJvgukcg_P9baw@mail.gmail.com
State New
Headers show

Commit Message

Zhenqiang Chen June 17, 2014, 2:11 a.m. UTC
Hi,

For some large constant, ports like ARM, need one more instructions to
operate it. e.g

#define MASK 0xfe00ff
void maskdata (int * data, int len)
{
   int i = len;
   for (; i > 0; i -= 2)
    {
      data[i] &= MASK;
      data[i + 1] &= MASK;
    }
}

Need two instructions for each AND operation:

    and    r3, r3, #16711935
    bic    r3, r3, #65536

If we keep the MASK in a register, loop2_invariant pass can hoist it
out the loop. And it can be shared by different references.

So the patch skips constant propagation if it makes INSN's cost higher.

Bootstrap and no make check regression on X86-64 and ARM Chrome book.

OK for trunk?

Thanks!
-Zhenqiang

ChangeLog:
2014-06-17  Zhenqiang Chen  <zhenqiang.chen@linaro.org>

        * cprop.c (try_replace_reg): Check cost for constants.

Comments

Richard Biener June 17, 2014, 8:15 a.m. UTC | #1
On Tue, Jun 17, 2014 at 4:11 AM, Zhenqiang Chen
<zhenqiang.chen@linaro.org> wrote:
> Hi,
>
> For some large constant, ports like ARM, need one more instructions to
> operate it. e.g
>
> #define MASK 0xfe00ff
> void maskdata (int * data, int len)
> {
>    int i = len;
>    for (; i > 0; i -= 2)
>     {
>       data[i] &= MASK;
>       data[i + 1] &= MASK;
>     }
> }
>
> Need two instructions for each AND operation:
>
>     and    r3, r3, #16711935
>     bic    r3, r3, #65536
>
> If we keep the MASK in a register, loop2_invariant pass can hoist it
> out the loop. And it can be shared by different references.
>
> So the patch skips constant propagation if it makes INSN's cost higher.

So cprop undos invariant motions work here?

Should we make sure we add a REG_EQUAL note when not propagating?

> Bootstrap and no make check regression on X86-64 and ARM Chrome book.
>
> OK for trunk?
>
> Thanks!
> -Zhenqiang
>
> ChangeLog:
> 2014-06-17  Zhenqiang Chen  <zhenqiang.chen@linaro.org>
>
>         * cprop.c (try_replace_reg): Check cost for constants.
>
> diff --git a/gcc/cprop.c b/gcc/cprop.c
> index aef3ee8..c9cf02a 100644
> --- a/gcc/cprop.c
> +++ b/gcc/cprop.c
> @@ -733,6 +733,14 @@ try_replace_reg (rtx from, rtx to, rtx insn)
>    rtx src = 0;
>    int success = 0;
>    rtx set = single_set (insn);
> +  int old_cost = 0;
> +  bool copy_p = false;
> +  bool speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (insn));
> +
> +  if (set && SET_SRC (set) && REG_P (SET_SRC (set)))
> +    copy_p = true;
> +  else
> +    old_cost = set_rtx_cost (set, speed);

Looks bogus for set == NULL?

Also what about register pressure?

I think this kind of change needs wider testing as RTX costs are
usually not fully implemented and you introduce a new use kind
(or is it already used elsewhere in this way to compute cost
difference of a set with s/reg/const?).

What kind of performance difference do you see?

Thanks,
Richard.

>    /* Usually we substitute easy stuff, so we won't copy everything.
>       We however need to take care to not duplicate non-trivial CONST
> @@ -740,6 +748,20 @@ try_replace_reg (rtx from, rtx to, rtx insn)
>    to = copy_rtx (to);
>
>    validate_replace_src_group (from, to, insn);
> +
> +  /* For CONSTANT_P (TO), loop2_invariant pass might hoist it out the loop.
> +     And it can be shared by different references.  So skip propagation if
> +     it makes INSN's rtx cost higher.  */
> +  if (set && !copy_p && CONSTANT_P (to))
> +    {
> +      int new_cost = set_rtx_cost (set, speed);
> +      if (new_cost > old_cost)
> +       {
> +         cancel_changes (0);
> +         return false;
> +       }
> +    }
> +
>    if (num_changes_pending () && apply_change_group ())
>      success = 1;
Zhenqiang Chen June 17, 2014, 9:42 a.m. UTC | #2
On 17 June 2014 16:15, Richard Biener <richard.guenther@gmail.com> wrote:
> On Tue, Jun 17, 2014 at 4:11 AM, Zhenqiang Chen
> <zhenqiang.chen@linaro.org> wrote:
>> Hi,
>>
>> For some large constant, ports like ARM, need one more instructions to
>> operate it. e.g
>>
>> #define MASK 0xfe00ff
>> void maskdata (int * data, int len)
>> {
>>    int i = len;
>>    for (; i > 0; i -= 2)
>>     {
>>       data[i] &= MASK;
>>       data[i + 1] &= MASK;
>>     }
>> }
>>
>> Need two instructions for each AND operation:
>>
>>     and    r3, r3, #16711935
>>     bic    r3, r3, #65536
>>
>> If we keep the MASK in a register, loop2_invariant pass can hoist it
>> out the loop. And it can be shared by different references.
>>
>> So the patch skips constant propagation if it makes INSN's cost higher.
>
> So cprop undos invariant motions work here?

Yes. GLOBAL CONST-PROP will undo invariant motions.

> Should we make sure we add a REG_EQUAL note when not propagating?

Logs show there already has REG_EQUAL note.

>> Bootstrap and no make check regression on X86-64 and ARM Chrome book.
>>
>> OK for trunk?
>>
>> Thanks!
>> -Zhenqiang
>>
>> ChangeLog:
>> 2014-06-17  Zhenqiang Chen  <zhenqiang.chen@linaro.org>
>>
>>         * cprop.c (try_replace_reg): Check cost for constants.
>>
>> diff --git a/gcc/cprop.c b/gcc/cprop.c
>> index aef3ee8..c9cf02a 100644
>> --- a/gcc/cprop.c
>> +++ b/gcc/cprop.c
>> @@ -733,6 +733,14 @@ try_replace_reg (rtx from, rtx to, rtx insn)
>>    rtx src = 0;
>>    int success = 0;
>>    rtx set = single_set (insn);
>> +  int old_cost = 0;
>> +  bool copy_p = false;
>> +  bool speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (insn));
>> +
>> +  if (set && SET_SRC (set) && REG_P (SET_SRC (set)))
>> +    copy_p = true;
>> +  else
>> +    old_cost = set_rtx_cost (set, speed);
>
> Looks bogus for set == NULL?

set_rtx_cost has checked it. If it is NULL, the function will return 0;

> Also what about register pressure?

Do you think it has big register pressure impact? I think it does not
increase register pressure.

> I think this kind of change needs wider testing as RTX costs are
> usually not fully implemented and you introduce a new use kind
> (or is it already used elsewhere in this way to compute cost
> difference of a set with s/reg/const?).

Passes like fwprop, cse, auto_inc_dec, uses RTX costs to make the
decision. e.g. in function attempt_change of auto-inc-dec.c, it has
code segments like:

  old_cost = (set_src_cost (mem, speed)
              + set_rtx_cost (PATTERN (inc_insn.insn), speed));
  new_cost = set_src_cost (mem_tmp, speed);
  ...
  if (old_cost < new_cost)
    {
      ...
      return false;
    }

The usage of RTX costs in this patch is similar.

I had run X86-64 bootstrap and regression tests with
--enable-languages=c,c++,lto,fortran,go,ada,objc,obj-c++,java

And ARM bootstrap and regression tests with
--enable-languages=c,c++,fortran,lto,objc,obj-c++

I will run tests on i686. What other tests do you think I have to run?

> What kind of performance difference do you see?

I had run coremark, dhrystone, eembc on ARM Cortex-M4 (with some arm
backend changes). Coremark with some options show >10% performance
improvement. dhrystone is a little better. Some wave in eembc, but
overall result is better.

I will run spec2000 on X86-64 and ARM, and back to you about the
performance changes.

Thanks!
-Zhenqiang

> Thanks,
> Richard.
>
>>    /* Usually we substitute easy stuff, so we won't copy everything.
>>       We however need to take care to not duplicate non-trivial CONST
>> @@ -740,6 +748,20 @@ try_replace_reg (rtx from, rtx to, rtx insn)
>>    to = copy_rtx (to);
>>
>>    validate_replace_src_group (from, to, insn);
>> +
>> +  /* For CONSTANT_P (TO), loop2_invariant pass might hoist it out the loop.
>> +     And it can be shared by different references.  So skip propagation if
>> +     it makes INSN's rtx cost higher.  */
>> +  if (set && !copy_p && CONSTANT_P (to))
>> +    {
>> +      int new_cost = set_rtx_cost (set, speed);
>> +      if (new_cost > old_cost)
>> +       {
>> +         cancel_changes (0);
>> +         return false;
>> +       }
>> +    }
>> +
>>    if (num_changes_pending () && apply_change_group ())
>>      success = 1;
diff mbox

Patch

diff --git a/gcc/cprop.c b/gcc/cprop.c
index aef3ee8..c9cf02a 100644
--- a/gcc/cprop.c
+++ b/gcc/cprop.c
@@ -733,6 +733,14 @@  try_replace_reg (rtx from, rtx to, rtx insn)
   rtx src = 0;
   int success = 0;
   rtx set = single_set (insn);
+  int old_cost = 0;
+  bool copy_p = false;
+  bool speed = optimize_bb_for_speed_p (BLOCK_FOR_INSN (insn));
+
+  if (set && SET_SRC (set) && REG_P (SET_SRC (set)))
+    copy_p = true;
+  else
+    old_cost = set_rtx_cost (set, speed);

   /* Usually we substitute easy stuff, so we won't copy everything.
      We however need to take care to not duplicate non-trivial CONST
@@ -740,6 +748,20 @@  try_replace_reg (rtx from, rtx to, rtx insn)
   to = copy_rtx (to);

   validate_replace_src_group (from, to, insn);
+
+  /* For CONSTANT_P (TO), loop2_invariant pass might hoist it out the loop.
+     And it can be shared by different references.  So skip propagation if
+     it makes INSN's rtx cost higher.  */
+  if (set && !copy_p && CONSTANT_P (to))
+    {
+      int new_cost = set_rtx_cost (set, speed);
+      if (new_cost > old_cost)
+       {
+         cancel_changes (0);
+         return false;
+       }
+    }
+
   if (num_changes_pending () && apply_change_group ())
     success = 1;