diff mbox

[0/7] Type promotion pass and elimination of zext/sext

Message ID 56372A31.3080607@linaro.org
State Superseded
Headers show

Commit Message

Kugan Vivekanandarajah Nov. 2, 2015, 9:17 a.m. UTC
On 29/10/15 02:45, Richard Biener wrote:
> On Tue, Oct 27, 2015 at 1:50 AM, kugan

> <kugan.vivekanandarajah@linaro.org> wrote:

>>

>>

>> On 23/10/15 01:23, Richard Biener wrote:

>>>

>>> On Thu, Oct 22, 2015 at 12:50 PM, Kugan

>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>>

>>>>

>>>>

>>>> On 21/10/15 23:45, Richard Biener wrote:

>>>>>

>>>>> On Tue, Oct 20, 2015 at 10:03 PM, Kugan

>>>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>>>>

>>>>>>

>>>>>>

>>>>>> On 07/09/15 12:53, Kugan wrote:

>>>>>>>

>>>>>>>

>>>>>>> This a new version of the patch posted in

>>>>>>> https://gcc.gnu.org/ml/gcc-patches/2015-08/msg00226.html. I have done

>>>>>>> more testing and spitted the patch to make it more easier to review.

>>>>>>> There are still couple of issues to be addressed and I am working on

>>>>>>> them.

>>>>>>>

>>>>>>> 1. AARCH64 bootstrap now fails with the commit

>>>>>>> 94f92c36a83d66a893c3bc6f00a038ba3dbe2a6f. simplify-rtx.c is

>>>>>>> mis-compiled

>>>>>>> in stage2 and fwprop.c is failing. It looks to me that there is a

>>>>>>> latent

>>>>>>> issue which gets exposed my patch. I can also reproduce this in x86_64

>>>>>>> if I use the same PROMOTE_MODE which is used in aarch64 port. For the

>>>>>>> time being, I am using  patch

>>>>>>> 0006-temporary-workaround-for-bootstrap-failure-due-to-co.patch as a

>>>>>>> workaround. This meeds to be fixed before the patches are ready to be

>>>>>>> committed.

>>>>>>>

>>>>>>> 2. vector-compare-1.c from c-c++-common/torture fails to assemble with

>>>>>>> -O3 -g Error: unaligned opcodes detected in executable segment. It

>>>>>>> works

>>>>>>> fine if I remove the -g. I am looking into it and needs to be fixed as

>>>>>>> well.

>>>>>>

>>>>>>

>>>>>> Hi Richard,

>>>>>>

>>>>>> Now that stage 1 is going to close, I would like to get these patches

>>>>>> accepted for stage1. I will try my best to address your review comments

>>>>>> ASAP.

>>>>>

>>>>>

>>>>> Ok, can you make the whole patch series available so I can poke at the

>>>>> implementation a bit?  Please state the revision it was rebased on

>>>>> (or point me to a git/svn branch the work resides on).

>>>>>

>>>>

>>>> Thanks. Please find the patched rebated against trunk@229156. I have

>>>> skipped the test-case readjustment patches.

>>>

>>>

>>> Some quick observations.  On x86_64 when building

>>

>>

>> Hi Richard,

>>

>> Thanks for the review.

>>

>>>

>>> short bar (short y);

>>> int foo (short x)

>>> {

>>>    short y = bar (x) + 15;

>>>    return y;

>>> }

>>>

>>> with -m32 -O2 -mtune=pentiumpro (which ends up promoting HImode regs)

>>> I get

>>>

>>>    <bb 2>:

>>>    _1 = (int) x_10(D);

>>>    _2 = (_1) sext (16);

>>>    _11 = bar (_2);

>>>    _5 = (int) _11;

>>>    _12 = (unsigned int) _5;

>>>    _6 = _12 & 65535;

>>>    _7 = _6 + 15;

>>>    _13 = (int) _7;

>>>    _8 = (_13) sext (16);

>>>    _9 = (_8) sext (16);

>>>    return _9;

>>>

>>> which looks fine but the VRP optimization doesn't trigger for the

>>> redundant sext

>>> (ranges are computed correctly but the 2nd extension is not removed).


Thanks for the comments. Please fond the attached patches with which I
am now getting
cat .192t.optimized

;; Function foo (foo, funcdef_no=0, decl_uid=1406, cgraph_uid=0,
symbol_order=0)

foo (short int x)
{
  signed int _1;
  int _2;
  signed int _5;
  unsigned int _6;
  unsigned int _7;
  signed int _8;
  int _9;
  short int _11;
  unsigned int _12;
  signed int _13;

  <bb 2>:
  _1 = (signed int) x_10(D);
  _2 = _1;
  _11 = bar (_2);
  _5 = (signed int) _11;
  _12 = (unsigned int) _11;
  _6 = _12 & 65535;
  _7 = _6 + 15;
  _13 = (signed int) _7;
  _8 = (_13) sext (16);
  _9 = _8;
  return _9;

}


There are still some redundancies. The asm difference after RTL
optimizations is

-	addl	$15, %eax
+	addw	$15, %ax


>>>

>>> This also makes me notice trivial match.pd patterns are missing, like

>>> for example

>>>

>>> (simplify

>>>   (sext (sext@2 @0 @1) @3)

>>>   (if (tree_int_cst_compare (@1, @3) <= 0)

>>>    @2

>>>    (sext @0 @3)))

>>>

>>> as VRP doesn't run at -O1 we must rely on those to remove rendudant

>>> extensions,

>>> otherwise generated code might get worse compared to without the pass(?)

>>

>>

>> Do you think that we should enable this pass only when vrp is enabled.

>> Otherwise, even when we do the simple optimizations you mentioned below, we

>> might not be able to remove all the redundancies.

>>

>>>

>>> I also notice that the 'short' argument does not get it's sign-extension

>>> removed

>>> as redundand either even though we have

>>>

>>> _1 = (int) x_8(D);

>>> Found new range for _1: [-32768, 32767]

>>>

>>

>> I am looking into it.

>>

>>> In the end I suspect that keeping track of the "simple" cases in the

>>> promotion

>>> pass itself (by keeping a lattice) might be a good idea (after we fix VRP

>>> to do

>>> its work).  In some way whether the ABI guarantees promoted argument

>>> registers might need some other target hook queries.


I tried adding it in the attached patch with record_visit_stmt to track
whether an ssa would have value overflow or properly zero/sign extended
in promoted mode. We can use this to eliminate some of the zero/sign
extension at gimple level. As it is, it doesn't do much. If this is what
you had in mind, I will extend it based on your feedback.


>>>

>>> Now onto the 0002 patch.

>>>

>>> +static bool

>>> +type_precision_ok (tree type)

>>> +{

>>> +  return (TYPE_PRECISION (type)  == 8

>>> +         || TYPE_PRECISION (type) == 16

>>> +         || TYPE_PRECISION (type) == 32);

>>> +}

>>>

>>> that's a weird function to me.  You probably want

>>> TYPE_PRECISION (type) == GET_MODE_PRECISION (TYPE_MODE (type))

>>> here?  And guard that thing with POINTER_TYPE_P || INTEGRAL_TYPE_P?

>>>

>>

>> I will change this. (I have a patch which I am testing with other changes

>> you have asked for)

>>

>>

>>> +/* Return the promoted type for TYPE.  */

>>> +static tree

>>> +get_promoted_type (tree type)

>>> +{

>>> +  tree promoted_type;

>>> +  enum machine_mode mode;

>>> +  int uns;

>>> +  if (POINTER_TYPE_P (type)

>>> +      || !INTEGRAL_TYPE_P (type)

>>> +      || !type_precision_ok (type))

>>> +    return type;

>>> +

>>> +  mode = TYPE_MODE (type);

>>> +#ifdef PROMOTE_MODE

>>> +  uns = TYPE_SIGN (type);

>>> +  PROMOTE_MODE (mode, uns, type);

>>> +#endif

>>> +  uns = TYPE_SIGN (type);

>>> +  promoted_type = lang_hooks.types.type_for_mode (mode, uns);

>>> +  if (promoted_type

>>> +      && (TYPE_PRECISION (promoted_type) > TYPE_PRECISION (type)))

>>> +    type = promoted_type;

>>>

>>> I think what you want to verify is that TYPE_PRECISION (promoted_type)

>>> == GET_MODE_PRECISION (mode).

>>> And to not even bother with this simply use

>>>

>>> promoted_type = build_nonstandard_integer_type (GET_MODE_PRECISION (mode),

>>> uns);

>>>

>>

>> I am changing this too.

>>

>>> You use a domwalk but also might create new basic-blocks during it

>>> (insert_on_edge_immediate), that's a

>>> no-no, commit edge inserts after the domwalk.

>>

>>

>> I am sorry, I dont understand "commit edge inserts after the domwalk" Is

>> there a way to do this in the current implementation?

> 

> Yes, simply use gsi_insert_on_edge () and after the domwalk is done do

> gsi_commit_edge_inserts ().

> 

>>> ssa_sets_higher_bits_bitmap looks unused and

>>> we generally don't free dominance info, so please don't do that.

>>>

>>> I fired off a bootstrap on ppc64-linux which fails building stage1 libgcc

>>> with

>>>

>>> /abuild/rguenther/obj/./gcc/xgcc -B/abuild/rguenther/obj/./gcc/

>>> -B/usr/local/powerpc64-unknown-linux-gnu/bin/

>>> -B/usr/local/powerpc64-unknown-linux-gnu/lib/ -isystem

>>> /usr/local/powerpc64-unknown-linux-gnu/include -isystem

>>> /usr/local/powerpc64-unknown-linux-gnu/sys-include    -g -O2 -O2  -g

>>> -O2 -DIN_GCC    -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual

>>> -Wno-format -Wstrict-prototypes -Wmissing-prototypes

>>> -Wold-style-definition  -isystem ./include   -fPIC -mlong-double-128

>>> -mno-minimal-toc -g -DIN_LIBGCC2 -fbuilding-libgcc

>>> -fno-stack-protector   -fPIC -mlong-double-128 -mno-minimal-toc -I.

>>> -I. -I../.././gcc -I../../../trunk/libgcc -I../../../trunk/libgcc/.

>>> -I../../../trunk/libgcc/../gcc -I../../../trunk/libgcc/../include

>>> -I../../../trunk/libgcc/../libdecnumber/dpd

>>> -I../../../trunk/libgcc/../libdecnumber -DHAVE_CC_TLS  -o _divdi3.o

>>> -MT _divdi3.o -MD -MP -MF _divdi3.dep -DL_divdi3 -c

>>> ../../../trunk/libgcc/libgcc2.c \

>>>            -fexceptions -fnon-call-exceptions -fvisibility=hidden

>>> -DHIDE_EXPORTS

>>> In file included from ../../../trunk/libgcc/libgcc2.c:56:0:

>>> ../../../trunk/libgcc/libgcc2.c: In function ‘__divti3’:

>>> ../../../trunk/libgcc/libgcc2.h:193:20: internal compiler error: in

>>> expand_debug_locations, at cfgexpand.c:5277

>>>


With the attached patch, now I am running into Bootstrap comparison
failure. I am looking into it. Please review this version so that I can
address them while fixing this issue.

Thanks,
Kugan

>>

>> I am testing on gcc computefarm. I will get it to bootstrap and will do the

>> regression testing before posting the next version.

>>

>>> as hinted at above a bootstrap on i?86 (yes, 32bit) with

>>> --with-tune=pentiumpro might be another good testing candidate.

>>>

>>> +      FOR_EACH_SSA_TREE_OPERAND (def, stmt, iter, SSA_OP_USE |

>>> SSA_OP_DEF)

>>> +       promote_def_and_uses (def);

>>>

>>> it looks like you are doing some redundant work by walking both defs

>>> and uses of each stmt.  I'd say you should separate

>>> def and use processing and use

>>>

>>>    FOR_EACH_SSA_USE_OPERAND (use, stmt, iter, SSA_OP_USE)

>>>      promote_use (use);

>>>    FOR_EACH_SSA_DEF_OPERAND (def, stmt, iter, SSA_OP_DEF)

>>>      promote_def (def);

>>>

>>

>> Name promote_def_and_uses in my implementation is a bit confusing. It is

>> promoting the SSA_NAMEs. We only have to do that for the definitions if we

>> can do the SSA_NAMEs defined by parameters.

>>

>> I also have a bitmap to see if we have promoted a variable and avoid doing

>> it again. I will try to improve this.

>>

>>

>>

>>> this should make processing more efficient (memory local) compared to

>>> doing the split handling

>>> in promote_def_and_uses.

>>>

>>> I think it will be convenient to have a SSA name info structure where

>>> you can remember the original

>>> type a name was promoted from as well as whether it was promoted or

>>> not.  This way adjusting

>>> debug uses should be "trivial":

>>>

>>> +static unsigned int

>>> +fixup_uses (tree use, tree promoted_type, tree old_type)

>>> +{

>>> +  gimple *stmt;

>>> +  imm_use_iterator ui;

>>> +  gimple_stmt_iterator gsi;

>>> +  use_operand_p op;

>>> +

>>> +  FOR_EACH_IMM_USE_STMT (stmt, ui, use)

>>> +    {

>>> +      bool do_not_promote = false;

>>> +      switch (gimple_code (stmt))

>>> +       {

>>> +       case GIMPLE_DEBUG:

>>> +           {

>>> +             gsi = gsi_for_stmt (stmt);

>>> +             gsi_remove (&gsi, true);

>>>

>>> rather than doing the above you'd do sth like

>>>

>>>    SET_USE (use, fold_convert (old_type, new_def));

>>>    update_stmt (stmt);

>>>

>>

>> We do have these information (original type a name was promoted from as well

>> as whether it was promoted or not). To make it easy to review, in the patch

>> that adds the pass,I am removing these debug stmts. But in patch 4, I am

>> trying to handle this properly. Maybe  I should combine them.

> 

> Yeah, it's a bit confusing otherwise.

> 

>>> note that while you may not be able to use promoted regs at all uses

>>> (like calls or asms) you can promote all defs, if only with a compensation

>>> statement after the original def.  The SSA name info struct can be used

>>> to note down the actual SSA name holding the promoted def.

>>>

>>> The pass looks a lot better than last time (it's way smaller!) but

>>> still needs some

>>> improvements.  There are some more fishy details with respect to how you

>>> allocate/change SSA names but I think those can be dealt with once the

>>> basic structure looks how I like it to be.

>>>

>>

>> I will post an updated patch in a day or two.

> 

> Thanks,

> Richard.

> 

>> Thanks again,

>> Kugan
diff mbox

Patch

From c0ce364e3a422912a08189645efde46c36583753 Mon Sep 17 00:00:00 2001
From: Kugan Vivekanandarajah <kugan.vivekanandarajah@linaro.org>
Date: Thu, 22 Oct 2015 10:51:42 +1100
Subject: [PATCH 1/3] Add new SEXT_EXPR tree code

---
 gcc/cfgexpand.c         | 12 ++++++++++++
 gcc/expr.c              | 20 ++++++++++++++++++++
 gcc/fold-const.c        |  4 ++++
 gcc/tree-cfg.c          | 12 ++++++++++++
 gcc/tree-inline.c       |  1 +
 gcc/tree-pretty-print.c | 11 +++++++++++
 gcc/tree.def            |  5 +++++
 7 files changed, 65 insertions(+)

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index eaad859..aeb64bb 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -5054,6 +5054,18 @@  expand_debug_expr (tree exp)
     case FMA_EXPR:
       return simplify_gen_ternary (FMA, mode, inner_mode, op0, op1, op2);
 
+    case SEXT_EXPR:
+      gcc_assert (CONST_INT_P (op1));
+      inner_mode = mode_for_size (INTVAL (op1), MODE_INT, 0);
+      gcc_assert (GET_MODE_BITSIZE (inner_mode) == INTVAL (op1));
+
+      if (mode != inner_mode)
+	op0 = simplify_gen_unary (SIGN_EXTEND,
+				  mode,
+				  gen_lowpart_SUBREG (inner_mode, op0),
+				  inner_mode);
+      return op0;
+
     default:
     flag_unsupported:
 #ifdef ENABLE_CHECKING
diff --git a/gcc/expr.c b/gcc/expr.c
index da68870..c2f535f 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -9318,6 +9318,26 @@  expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
       target = expand_vec_cond_expr (type, treeop0, treeop1, treeop2, target);
       return target;
 
+    case SEXT_EXPR:
+	{
+	  machine_mode inner_mode = mode_for_size (tree_to_uhwi (treeop1),
+						   MODE_INT, 0);
+	  rtx temp, result;
+	  rtx op0 = expand_normal (treeop0);
+	  op0 = force_reg (mode, op0);
+	  if (mode != inner_mode)
+	    {
+	      result = gen_reg_rtx (mode);
+	      temp = simplify_gen_unary (SIGN_EXTEND, mode,
+					 gen_lowpart_SUBREG (inner_mode, op0),
+					 inner_mode);
+	      convert_move (result, temp, 0);
+	    }
+	  else
+	    result = op0;
+	  return result;
+	}
+
     default:
       gcc_unreachable ();
     }
diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index 602ea24..a149bad 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -987,6 +987,10 @@  int_const_binop_1 (enum tree_code code, const_tree arg1, const_tree parg2,
       res = wi::bit_and (arg1, arg2);
       break;
 
+    case SEXT_EXPR:
+      res = wi::sext (arg1, arg2.to_uhwi ());
+      break;
+
     case RSHIFT_EXPR:
     case LSHIFT_EXPR:
       if (wi::neg_p (arg2))
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index 8e3e810..d18b3f7 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -3752,6 +3752,18 @@  verify_gimple_assign_binary (gassign *stmt)
         return false;
       }
 
+    case SEXT_EXPR:
+      {
+	if (!INTEGRAL_TYPE_P (lhs_type)
+	    || !useless_type_conversion_p (lhs_type, rhs1_type)
+	    || !tree_fits_uhwi_p (rhs2))
+	  {
+	    error ("invalid operands in sext expr");
+	    return true;
+	  }
+	return false;
+      }
+
     case VEC_WIDEN_LSHIFT_HI_EXPR:
     case VEC_WIDEN_LSHIFT_LO_EXPR:
       {
diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index b8269ef..e61c200 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -3893,6 +3893,7 @@  estimate_operator_cost (enum tree_code code, eni_weights *weights,
     case BIT_XOR_EXPR:
     case BIT_AND_EXPR:
     case BIT_NOT_EXPR:
+    case SEXT_EXPR:
 
     case TRUTH_ANDIF_EXPR:
     case TRUTH_ORIF_EXPR:
diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c
index 11f90051..bec9082 100644
--- a/gcc/tree-pretty-print.c
+++ b/gcc/tree-pretty-print.c
@@ -1923,6 +1923,14 @@  dump_generic_node (pretty_printer *pp, tree node, int spc, int flags,
       }
       break;
 
+    case SEXT_EXPR:
+      pp_string (pp, "SEXT_EXPR <");
+      dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false);
+      pp_string (pp, ", ");
+      dump_generic_node (pp, TREE_OPERAND (node, 1), spc, flags, false);
+      pp_greater (pp);
+      break;
+
     case MODIFY_EXPR:
     case INIT_EXPR:
       dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags,
@@ -3561,6 +3569,9 @@  op_symbol_code (enum tree_code code)
     case MIN_EXPR:
       return "min";
 
+    case SEXT_EXPR:
+      return "sext";
+
     default:
       return "<<< ??? >>>";
     }
diff --git a/gcc/tree.def b/gcc/tree.def
index d0a3bd6..789cfdd 100644
--- a/gcc/tree.def
+++ b/gcc/tree.def
@@ -760,6 +760,11 @@  DEFTREECODE (BIT_XOR_EXPR, "bit_xor_expr", tcc_binary, 2)
 DEFTREECODE (BIT_AND_EXPR, "bit_and_expr", tcc_binary, 2)
 DEFTREECODE (BIT_NOT_EXPR, "bit_not_expr", tcc_unary, 1)
 
+/*  Sign-extend operation.  It will sign extend first operand from
+ the sign bit specified by the second operand.  The type of the
+ result is that of the first operand.  */
+DEFTREECODE (SEXT_EXPR, "sext_expr", tcc_binary, 2)
+
 /* ANDIF and ORIF allow the second operand not to be computed if the
    value of the expression is determined from the first operand.  AND,
    OR, and XOR always compute the second operand whether its value is
-- 
1.9.1