[0/7] Type promotion pass and elimination of zext/sext

On 29/10/15 02:45, Richard Biener wrote:
> On Tue, Oct 27, 2015 at 1:50 AM, kugan

> <kugan.vivekanandarajah@linaro.org> wrote:

>>

>>

>> On 23/10/15 01:23, Richard Biener wrote:

>>>

>>> On Thu, Oct 22, 2015 at 12:50 PM, Kugan

>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>>

>>>>

>>>>

>>>> On 21/10/15 23:45, Richard Biener wrote:

>>>>>

>>>>> On Tue, Oct 20, 2015 at 10:03 PM, Kugan

>>>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>>>>

>>>>>>

>>>>>>

>>>>>> On 07/09/15 12:53, Kugan wrote:

>>>>>>>

>>>>>>>

>>>>>>> This a new version of the patch posted in

>>>>>>> https://gcc.gnu.org/ml/gcc-patches/2015-08/msg00226.html. I have done

>>>>>>> more testing and spitted the patch to make it more easier to review.

>>>>>>> There are still couple of issues to be addressed and I am working on

>>>>>>> them.

>>>>>>>

>>>>>>> 1. AARCH64 bootstrap now fails with the commit

>>>>>>> 94f92c36a83d66a893c3bc6f00a038ba3dbe2a6f. simplify-rtx.c is

>>>>>>> mis-compiled

>>>>>>> in stage2 and fwprop.c is failing. It looks to me that there is a

>>>>>>> latent

>>>>>>> issue which gets exposed my patch. I can also reproduce this in x86_64

>>>>>>> if I use the same PROMOTE_MODE which is used in aarch64 port. For the

>>>>>>> time being, I am using  patch

>>>>>>> 0006-temporary-workaround-for-bootstrap-failure-due-to-co.patch as a

>>>>>>> workaround. This meeds to be fixed before the patches are ready to be

>>>>>>> committed.

>>>>>>>

>>>>>>> 2. vector-compare-1.c from c-c++-common/torture fails to assemble with

>>>>>>> -O3 -g Error: unaligned opcodes detected in executable segment. It

>>>>>>> works

>>>>>>> fine if I remove the -g. I am looking into it and needs to be fixed as

>>>>>>> well.

>>>>>>

>>>>>>

>>>>>> Hi Richard,

>>>>>>

>>>>>> Now that stage 1 is going to close, I would like to get these patches

>>>>>> accepted for stage1. I will try my best to address your review comments

>>>>>> ASAP.

>>>>>

>>>>>

>>>>> Ok, can you make the whole patch series available so I can poke at the

>>>>> implementation a bit?  Please state the revision it was rebased on

>>>>> (or point me to a git/svn branch the work resides on).

>>>>>

>>>>

>>>> Thanks. Please find the patched rebated against trunk@229156. I have

>>>> skipped the test-case readjustment patches.

>>>

>>>

>>> Some quick observations.  On x86_64 when building

>>

>>

>> Hi Richard,

>>

>> Thanks for the review.

>>

>>>

>>> short bar (short y);

>>> int foo (short x)

>>> {

>>>    short y = bar (x) + 15;

>>>    return y;

>>> }

>>>

>>> with -m32 -O2 -mtune=pentiumpro (which ends up promoting HImode regs)

>>> I get

>>>

>>>    <bb 2>:

>>>    _1 = (int) x_10(D);

>>>    _2 = (_1) sext (16);

>>>    _11 = bar (_2);

>>>    _5 = (int) _11;

>>>    _12 = (unsigned int) _5;

>>>    _6 = _12 & 65535;

>>>    _7 = _6 + 15;

>>>    _13 = (int) _7;

>>>    _8 = (_13) sext (16);

>>>    _9 = (_8) sext (16);

>>>    return _9;

>>>

>>> which looks fine but the VRP optimization doesn't trigger for the

>>> redundant sext

>>> (ranges are computed correctly but the 2nd extension is not removed).

Thanks for the comments. Please fond the attached patches with which I
am now getting
cat .192t.optimized

;; Function foo (foo, funcdef_no=0, decl_uid=1406, cgraph_uid=0,
symbol_order=0)

foo (short int x)
{
  signed int _1;
  int _2;
  signed int _5;
  unsigned int _6;
  unsigned int _7;
  signed int _8;
  int _9;
  short int _11;
  unsigned int _12;
  signed int _13;

  <bb 2>:
  _1 = (signed int) x_10(D);
  _2 = _1;
  _11 = bar (_2);
  _5 = (signed int) _11;
  _12 = (unsigned int) _11;
  _6 = _12 & 65535;
  _7 = _6 + 15;
  _13 = (signed int) _7;
  _8 = (_13) sext (16);
  _9 = _8;
  return _9;

}

There are still some redundancies. The asm difference after RTL
optimizations is

-	addl	$15, %eax
+	addw	$15, %ax

>>>

>>> This also makes me notice trivial match.pd patterns are missing, like

>>> for example

>>>

>>> (simplify

>>>   (sext (sext@2 @0 @1) @3)

>>>   (if (tree_int_cst_compare (@1, @3) <= 0)

>>>    @2

>>>    (sext @0 @3)))

>>>

>>> as VRP doesn't run at -O1 we must rely on those to remove rendudant

>>> extensions,

>>> otherwise generated code might get worse compared to without the pass(?)

>>

>>

>> Do you think that we should enable this pass only when vrp is enabled.

>> Otherwise, even when we do the simple optimizations you mentioned below, we

>> might not be able to remove all the redundancies.

>>

>>>

>>> I also notice that the 'short' argument does not get it's sign-extension

>>> removed

>>> as redundand either even though we have

>>>

>>> _1 = (int) x_8(D);

>>> Found new range for _1: [-32768, 32767]

>>>

>>

>> I am looking into it.

>>

>>> In the end I suspect that keeping track of the "simple" cases in the

>>> promotion

>>> pass itself (by keeping a lattice) might be a good idea (after we fix VRP

>>> to do

>>> its work).  In some way whether the ABI guarantees promoted argument

>>> registers might need some other target hook queries.

I tried adding it in the attached patch with record_visit_stmt to track
whether an ssa would have value overflow or properly zero/sign extended
in promoted mode. We can use this to eliminate some of the zero/sign
extension at gimple level. As it is, it doesn't do much. If this is what
you had in mind, I will extend it based on your feedback.

>>>

>>> Now onto the 0002 patch.

>>>

>>> +static bool

>>> +type_precision_ok (tree type)

>>> +{

>>> +  return (TYPE_PRECISION (type)  == 8

>>> +         || TYPE_PRECISION (type) == 16

>>> +         || TYPE_PRECISION (type) == 32);

>>> +}

>>>

>>> that's a weird function to me.  You probably want

>>> TYPE_PRECISION (type) == GET_MODE_PRECISION (TYPE_MODE (type))

>>> here?  And guard that thing with POINTER_TYPE_P || INTEGRAL_TYPE_P?

>>>

>>

>> I will change this. (I have a patch which I am testing with other changes

>> you have asked for)

>>

>>

>>> +/* Return the promoted type for TYPE.  */

>>> +static tree

>>> +get_promoted_type (tree type)

>>> +{

>>> +  tree promoted_type;

>>> +  enum machine_mode mode;

>>> +  int uns;

>>> +  if (POINTER_TYPE_P (type)

>>> +      || !INTEGRAL_TYPE_P (type)

>>> +      || !type_precision_ok (type))

>>> +    return type;

>>> +

>>> +  mode = TYPE_MODE (type);

>>> +#ifdef PROMOTE_MODE

>>> +  uns = TYPE_SIGN (type);

>>> +  PROMOTE_MODE (mode, uns, type);

>>> +#endif

>>> +  uns = TYPE_SIGN (type);

>>> +  promoted_type = lang_hooks.types.type_for_mode (mode, uns);

>>> +  if (promoted_type

>>> +      && (TYPE_PRECISION (promoted_type) > TYPE_PRECISION (type)))

>>> +    type = promoted_type;

>>>

>>> I think what you want to verify is that TYPE_PRECISION (promoted_type)

>>> == GET_MODE_PRECISION (mode).

>>> And to not even bother with this simply use

>>>

>>> promoted_type = build_nonstandard_integer_type (GET_MODE_PRECISION (mode),

>>> uns);

>>>

>>

>> I am changing this too.

>>

>>> You use a domwalk but also might create new basic-blocks during it

>>> (insert_on_edge_immediate), that's a

>>> no-no, commit edge inserts after the domwalk.

>>

>>

>> I am sorry, I dont understand "commit edge inserts after the domwalk" Is

>> there a way to do this in the current implementation?

> 

> Yes, simply use gsi_insert_on_edge () and after the domwalk is done do

> gsi_commit_edge_inserts ().

> 

>>> ssa_sets_higher_bits_bitmap looks unused and

>>> we generally don't free dominance info, so please don't do that.

>>>

>>> I fired off a bootstrap on ppc64-linux which fails building stage1 libgcc

>>> with

>>>

>>> /abuild/rguenther/obj/./gcc/xgcc -B/abuild/rguenther/obj/./gcc/

>>> -B/usr/local/powerpc64-unknown-linux-gnu/bin/

>>> -B/usr/local/powerpc64-unknown-linux-gnu/lib/ -isystem

>>> /usr/local/powerpc64-unknown-linux-gnu/include -isystem

>>> /usr/local/powerpc64-unknown-linux-gnu/sys-include    -g -O2 -O2  -g

>>> -O2 -DIN_GCC    -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual

>>> -Wno-format -Wstrict-prototypes -Wmissing-prototypes

>>> -Wold-style-definition  -isystem ./include   -fPIC -mlong-double-128

>>> -mno-minimal-toc -g -DIN_LIBGCC2 -fbuilding-libgcc

>>> -fno-stack-protector   -fPIC -mlong-double-128 -mno-minimal-toc -I.

>>> -I. -I../.././gcc -I../../../trunk/libgcc -I../../../trunk/libgcc/.

>>> -I../../../trunk/libgcc/../gcc -I../../../trunk/libgcc/../include

>>> -I../../../trunk/libgcc/../libdecnumber/dpd

>>> -I../../../trunk/libgcc/../libdecnumber -DHAVE_CC_TLS  -o _divdi3.o

>>> -MT _divdi3.o -MD -MP -MF _divdi3.dep -DL_divdi3 -c

>>> ../../../trunk/libgcc/libgcc2.c \

>>>            -fexceptions -fnon-call-exceptions -fvisibility=hidden

>>> -DHIDE_EXPORTS

>>> In file included from ../../../trunk/libgcc/libgcc2.c:56:0:

>>> ../../../trunk/libgcc/libgcc2.c: In function ‘__divti3’:

>>> ../../../trunk/libgcc/libgcc2.h:193:20: internal compiler error: in

>>> expand_debug_locations, at cfgexpand.c:5277

>>>

With the attached patch, now I am running into Bootstrap comparison
failure. I am looking into it. Please review this version so that I can
address them while fixing this issue.

Thanks,
Kugan

>>

>> I am testing on gcc computefarm. I will get it to bootstrap and will do the

>> regression testing before posting the next version.

>>

>>> as hinted at above a bootstrap on i?86 (yes, 32bit) with

>>> --with-tune=pentiumpro might be another good testing candidate.

>>>

>>> +      FOR_EACH_SSA_TREE_OPERAND (def, stmt, iter, SSA_OP_USE |

>>> SSA_OP_DEF)

>>> +       promote_def_and_uses (def);

>>>

>>> it looks like you are doing some redundant work by walking both defs

>>> and uses of each stmt.  I'd say you should separate

>>> def and use processing and use

>>>

>>>    FOR_EACH_SSA_USE_OPERAND (use, stmt, iter, SSA_OP_USE)

>>>      promote_use (use);

>>>    FOR_EACH_SSA_DEF_OPERAND (def, stmt, iter, SSA_OP_DEF)

>>>      promote_def (def);

>>>

>>

>> Name promote_def_and_uses in my implementation is a bit confusing. It is

>> promoting the SSA_NAMEs. We only have to do that for the definitions if we

>> can do the SSA_NAMEs defined by parameters.

>>

>> I also have a bitmap to see if we have promoted a variable and avoid doing

>> it again. I will try to improve this.

>>

>>

>>

>>> this should make processing more efficient (memory local) compared to

>>> doing the split handling

>>> in promote_def_and_uses.

>>>

>>> I think it will be convenient to have a SSA name info structure where

>>> you can remember the original

>>> type a name was promoted from as well as whether it was promoted or

>>> not.  This way adjusting

>>> debug uses should be "trivial":

>>>

>>> +static unsigned int

>>> +fixup_uses (tree use, tree promoted_type, tree old_type)

>>> +{

>>> +  gimple *stmt;

>>> +  imm_use_iterator ui;

>>> +  gimple_stmt_iterator gsi;

>>> +  use_operand_p op;

>>> +

>>> +  FOR_EACH_IMM_USE_STMT (stmt, ui, use)

>>> +    {

>>> +      bool do_not_promote = false;

>>> +      switch (gimple_code (stmt))

>>> +       {

>>> +       case GIMPLE_DEBUG:

>>> +           {

>>> +             gsi = gsi_for_stmt (stmt);

>>> +             gsi_remove (&gsi, true);

>>>

>>> rather than doing the above you'd do sth like

>>>

>>>    SET_USE (use, fold_convert (old_type, new_def));

>>>    update_stmt (stmt);

>>>

>>

>> We do have these information (original type a name was promoted from as well

>> as whether it was promoted or not). To make it easy to review, in the patch

>> that adds the pass,I am removing these debug stmts. But in patch 4, I am

>> trying to handle this properly. Maybe  I should combine them.

> 

> Yeah, it's a bit confusing otherwise.

> 

>>> note that while you may not be able to use promoted regs at all uses

>>> (like calls or asms) you can promote all defs, if only with a compensation

>>> statement after the original def.  The SSA name info struct can be used

>>> to note down the actual SSA name holding the promoted def.

>>>

>>> The pass looks a lot better than last time (it's way smaller!) but

>>> still needs some

>>> improvements.  There are some more fishy details with respect to how you

>>> allocate/change SSA names but I think those can be dealt with once the

>>> basic structure looks how I like it to be.

>>>

>>

>> I will post an updated patch in a day or two.

> 

> Thanks,

> Richard.

> 

>> Thanks again,

>> Kugan

[0/7] Type promotion pass and elimination of zext/sext

Commit Message

Patch