RFC [1/2] divmod transform

On 25 May 2016 at 12:52, Richard Biener <rguenther@suse.de> wrote:
> On Tue, 24 May 2016, Prathamesh Kulkarni wrote:

>

>> On 24 May 2016 at 19:39, Richard Biener <rguenther@suse.de> wrote:

>> > On Tue, 24 May 2016, Prathamesh Kulkarni wrote:

>> >

>> >> On 24 May 2016 at 17:42, Richard Biener <rguenther@suse.de> wrote:

>> >> > On Tue, 24 May 2016, Prathamesh Kulkarni wrote:

>> >> >

>> >> >> On 23 May 2016 at 17:35, Richard Biener <richard.guenther@gmail.com> wrote:

>> >> >> > On Mon, May 23, 2016 at 10:58 AM, Prathamesh Kulkarni

>> >> >> > <prathamesh.kulkarni@linaro.org> wrote:

>> >> >> >> Hi,

>> >> >> >> I have updated my patch for divmod (attached), which was originally

>> >> >> >> based on Kugan's patch.

>> >> >> >> The patch transforms stmts with code TRUNC_DIV_EXPR and TRUNC_MOD_EXPR

>> >> >> >> having same operands to divmod representation, so we can cse computation of mod.

>> >> >> >>

>> >> >> >> t1 = a TRUNC_DIV_EXPR b;

>> >> >> >> t2 = a TRUNC_MOD_EXPR b

>> >> >> >> is transformed to:

>> >> >> >> complex_tmp = DIVMOD (a, b);

>> >> >> >> t1 = REALPART_EXPR (complex_tmp);

>> >> >> >> t2 = IMAGPART_EXPR (complex_tmp);

>> >> >> >>

>> >> >> >> * New hook divmod_expand_libfunc

>> >> >> >> The rationale for introducing the hook is that different targets have

>> >> >> >> incompatible calling conventions for divmod libfunc.

>> >> >> >> Currently three ports define divmod libfunc: c6x, spu and arm.

>> >> >> >> c6x and spu follow the convention of libgcc2.c:__udivmoddi4:

>> >> >> >> return quotient and store remainder in argument passed as pointer,

>> >> >> >> while the arm version takes two arguments and returns both

>> >> >> >> quotient and remainder having mode double the size of the operand mode.

>> >> >> >> The port should hence override the hook expand_divmod_libfunc

>> >> >> >> to generate call to target-specific divmod.

>> >> >> >> Ports should define this hook if:

>> >> >> >> a) The port does not have divmod or div insn for the given mode.

>> >> >> >> b) The port defines divmod libfunc for the given mode.

>> >> >> >> The default hook default_expand_divmod_libfunc() generates call

>> >> >> >> to libgcc2.c:__udivmoddi4 provided the operands are unsigned and

>> >> >> >> are of DImode.

>> >> >> >>

>> >> >> >> Patch passes bootstrap+test on x86_64-unknown-linux-gnu and

>> >> >> >> cross-tested on arm*-*-*.

>> >> >> >> Bootstrap+test in progress on arm-linux-gnueabihf.

>> >> >> >> Does this patch look OK ?

>> >> >> >

>> >> >> > diff --git a/gcc/targhooks.c b/gcc/targhooks.c

>> >> >> > index 6b4601b..e4a021a 100644

>> >> >> > --- a/gcc/targhooks.c

>> >> >> > +++ b/gcc/targhooks.c

>> >> >> > @@ -1965,4 +1965,31 @@ default_optab_supported_p (int, machine_mode,

>> >> >> > machine_mode, optimization_type)

>> >> >> >    return true;

>> >> >> >  }

>> >> >> >

>> >> >> > +void

>> >> >> > +default_expand_divmod_libfunc (bool unsignedp, machine_mode mode,

>> >> >> > +                              rtx op0, rtx op1,

>> >> >> > +                              rtx *quot_p, rtx *rem_p)

>> >> >> >

>> >> >> > functions need a comment.

>> >> >> >

>> >> >> > ISTR it was suggested that ARM change to libgcc2.c__udivmoddi4 style?  In that

>> >> >> > case we could avoid the target hook.

>> >> >> Well I would prefer adding the hook because that's more easier -;)

>> >> >> Would it be ok for now to go with the hook ?

>> >> >> >

>> >> >> > +      /* If target overrides expand_divmod_libfunc hook

>> >> >> > +        then perform divmod by generating call to the target-specifc divmod

>> >> >> > libfunc.  */

>> >> >> > +      if (targetm.expand_divmod_libfunc != default_expand_divmod_libfunc)

>> >> >> > +       return true;

>> >> >> > +

>> >> >> > +      /* Fall back to using libgcc2.c:__udivmoddi4.  */

>> >> >> > +      return (mode == DImode && unsignedp);

>> >> >> >

>> >> >> > I don't understand this - we know optab_libfunc returns non-NULL for 'mode'

>> >> >> > but still restrict this to DImode && unsigned?  Also if

>> >> >> > targetm.expand_divmod_libfunc

>> >> >> > is not the default we expect the target to handle all modes?

>> >> >> Ah indeed, the check for DImode is unnecessary.

>> >> >> However I suppose the check for unsignedp should be there,

>> >> >> since we want to generate call to __udivmoddi4 only if operand is unsigned ?

>> >> >

>> >> > The optab libfunc for sdivmod should be NULL in that case.

>> >> Ah indeed, thanks.

>> >> >

>> >> >> >

>> >> >> > That said - I expected the above piece to be simply a 'return true;' ;)

>> >> >> >

>> >> >> > Usually we use some can_expand_XXX helper in optabs.c to query if the target

>> >> >> > supports a specific operation (for example SImode divmod would use DImode

>> >> >> > divmod by means of widening operands - for the unsigned case of course).

>> >> >> Thanks for pointing out. So if a target does not support divmod

>> >> >> libfunc for a mode

>> >> >> but for a wider mode, then we could zero-extend operands to the wider-mode,

>> >> >> perform divmod on the wider-mode, and then cast result back to the

>> >> >> original mode.

>> >> >> I haven't done that in this patch, would it be OK to do that as a follow up ?

>> >> >

>> >> > I think that you should conservatively handle the div_optab query, thus if

>> >> > the target has a HW division in a wider mode don't use the divmod IFN.

>> >> > You'd simply iterate over GET_MODE_WIDER_MODE and repeat the

>> >> > if (optab_handler (div_optab, mode) != CODE_FOR_nothing) check, bailing

>> >> > out if that is available.

>> >> Done.

>> >> >

>> >> >> > +  /* Disable the transform if either is a constant, since

>> >> >> > division-by-constant

>> >> >> > +     may have specialized expansion.  */

>> >> >> > +  if (TREE_CONSTANT (op1) || TREE_CONSTANT (op2))

>> >> >> > +    return false;

>> >> >> >

>> >> >> > please use CONSTANT_CLASS_P (op1) || CONSTANT_CLASS_P (op2)

>> >> >> >

>> >> >> > +  if (TYPE_OVERFLOW_TRAPS (type))

>> >> >> > +    return false;

>> >> >> >

>> >> >> > why's that?  Generally please first test cheap things (trapping, constant-ness)

>> >> >> > before checking expensive stuff (target_supports_divmod_p).

>> >> >> I added TYPE_OVERFLOW_TRAPS (type) based on your suggestion in:

>> >> >> https://www.mail-archive.com/gcc@gcc.gnu.org/msg78534.html

>> >> >> "When looking at TRUNC_DIV_EXPR you should also exclude

>> >> >> the case where TYPE_OVERFLOW_TRAPS (type) as that should

>> >> >> expand using the [su]divv optabs (no trapping overflow

>> >> >> divmod optab exists)."

>> >> >

>> >> > Ok, didn't remember that.

>> >> >

>> >> >> >

>> >> >> > +static bool

>> >> >> > +convert_to_divmod (gassign *stmt)

>> >> >> > +{

>> >> >> > +  if (!divmod_candidate_p (stmt))

>> >> >> > +    return false;

>> >> >> > +

>> >> >> > +  tree op1 = gimple_assign_rhs1 (stmt);

>> >> >> > +  tree op2 = gimple_assign_rhs2 (stmt);

>> >> >> > +

>> >> >> > +  vec<gimple *> stmts = vNULL;

>> >> >> >

>> >> >> > use an auto_vec <gimple *> - you currently leak it in at least one place.

>> >> >> >

>> >> >> > +      if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt))

>> >> >> > +       cfg_changed = true;

>> >> >> >

>> >> >> > note that this suggests you should check whether any of the stmts may throw

>> >> >> > internally as you don't update / transfer EH info correctly.  So for 'stmt' and

>> >> >> > all 'use_stmt' check stmt_can_throw_internal and if so do not add it to

>> >> >> > the list of stmts to modify.

>> >> >> >

>> >> >> > Btw, I think you should not add 'stmt' immediately but when iterating over

>> >> >> > all uses also gather uses in TRUNC_MOD_EXPR.

>> >> >> >

>> >> >> > Otherwise looks ok.

>> >> >> Done changes in this version. I am gathering mod uses same time as div uses,

>> >> >> so this imposes a constraint that mod dominates mod. I am not sure if

>> >> >> that's desirable.

>> >> >

>> >> > I think you also need a mod_seen variable now that you don't necessarily

>> >> > end up with 'stmt' in the vector of stmts.  I don't see how there is a

>> >> > constraint that mod dominates mod - it's just that the top_stmt needs

>> >> > to dominate all other uses that can be replaced with replacing top_stmt

>> >> > with a divmod.  It's just that the actual stmt set we choose may now

>> >> > depend on immediate uses order which on a second thought is bad

>> >> > as immediate uses order could be affected by debug stmts ... hmm.

>> >> >

>> >> > To avoid this please re-add the code adding 'stmt' to stmts immediately

>> >> > and add a use_stmt != stmt check to the immediate use processing loop

>> >> > so that we don't end up adding it twice.

>> >> Well I wonder what will happen for the following case:

>> >> t1 = x / y;

>> >> if (cond)

>> >>   t2 = x % y;

>> >> else

>> >>   t3 = x % y;

>> >>

>> >> Assuming stmt == top_stmt is "t2 = x % y" and use_stmt is "t3 = x % y",

>> >> use_stmt will not get added to stmts vector, since top_stmt and

>> >> use_stmt are not in same bb,

>> >> and bb's containing top_stmt and use_stmt don't dominate each other.

>> >> Not sure if this is practical case (I assume fre will hoist mod

>> >> outside if-else?)

>> >>

>> >> Now that we immediately add stmt to stmts vector, I suppose mod_seen

>> >> shall not be required ?

>> >

>> > In that case mod_seen is not needed.  But the situation you say will

>> > still happen so I wonder if we'd need a better way of iterating over

>> > immediate uses, like first pushing all candidates into a worklist

>> > vector and then iterating over that until we find no more candidates.

>> >

>> > You can then also handle the case of more than one group of stmts

>> > (the pass currently doesn't iterate in any particular useful order

>> > over BBs).

>> IIUC, we want to perform the transform if:

>> i) there exists top_stmt with code trunc_div_expr/trunc_mod_expr and

>> having same operands as stmt.

>> ii) top_stmt dominates all other stmts with code

>> trunc_div_expr/trunc_mod_expr and having same operands as top_stmt.

>>

>> Firstly, we try to get to top_stmt if it exists, by iterating over uses of stmt,

>> and then iterate over all uses of top_stmt and add them to stmts vector

>> only if top_stmt dominates all the stmts with same operands as top_stmt

>> and have code trunc_div_expr/trunc_mod_expr.

>>

>> /* Get to top_stmt.  */

>> top_stmt = stmt;

>> top_bb = gimple_bb (stmt);

>>

>> FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, op1)

>> {

>>   if (use_stmt code is TRUNC_DIV_EXPR or TRUNC_MOD_EXPR

>>       && use_stmt has same operands as stmt)

>>     {

>>       if (gimple_bb (use_stmt) dominates top_bb)

>>         {

>>           top_bb = gimple_bb (use_stmt);

>>           top_stmt = use_stmt;

>>         }

>>       else if (gimple_bb (use_stmt) == top_stmt

>>                && gimple_uid (use_stmt) < top_stmt)

>>         top_stmt = use_stmt;

>>     }

>> }

>>

>> /* Speculatively consider top_stmt as dominating all other

>> div_expr/mod_expr stmts with same operands as stmt.  */

>> stmts.safe_push (top_stmt);

>>

>> /* Walk uses of top_stmt to ensure that all stmts are dominated by top_stmt.  */

>> top_op1 = gimple_assign_rhs1 (top_stmt);

>> FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, top_op1)

>> {

>>   if (use_stmt code is TRUNC_DIV_EXPR or TRUNC_MOD_EXPR

>>       && use_stmt has same operands as top_stmt)

>>     {

>>       if (use_stmt == top_stmt)

>>         continue;

>>

>>       /* No top_stmt exits, do not proceed with transform  */

>>       if (top_bb does not dominate gimple_bb (use_stmt))

>>         return false;

>>

>>       stmts.safe_push (use_stmt);

>>     }

>> }

>>

>> For the case:

>> t1 = x / y;

>> if (cond)

>>   t2 = x % y;

>> else

>>   t3 = x % y;

>>

>> Assuming stmt is "t2 = x % y", it will walk uses of stmt and set

>> top_stmt to "t1 = x / y"

>> Then it will walk all uses of top_stmt:

>> "t2 = x % y" -> dominated by top_stmt

>> "t3 = x % y" -> dominated by top_stmt

>> Since all stmts are dominated by top_stmt, we add all three stmts to

>> vector of stmts and proceed with transform.

>>

>> For the case where, top_stmt dominates original stmt but not all stmts:

>>

>> if (cond)

>>   t1 = x / y;

>> else

>> {

>>   t2 = x % y;

>>   return;

>> }

>>

>> t3 = x % y;

>>

>> Assuming stmt is "t3 = x % y",

>> Walking stmt uses will set top_stmt to "t1 = x / y";

>>

>> Walking immediate uses of top_stmt, we find that "t2 = x % y" is not

>> dominated by top_stmt,

>> and hence don't do the transform.

>>

>> Does this sound reasonable ?

>

> Yes, that's reasonable.

Thanks, I have attached patch that implements above approach.
Does it look OK ?

The patch does not still handle the following case:
int f(int x, int y)
{
  extern int cond;
  int q, r;

  if (cond)
    q = x % y;
  else
    q = x % y;

  r = x % y;
  return q + r;
}

In above case although the mod stmt is not dominated by either div
stmt, I suppose the transform
is still possible by inserting DIVMOD (x, y) before if-else ?

For the following test-case, I am surprised why CSE didn't take place before
widening_mul pass ?

int
f_1 (int x, int y)
{
  int q = x / y;
  int r1 = 0, r2 = 0;
  if (cond)
    r1 = x % y;
  else
    r2 = x % y;
  return q + r1 + r2;
}

The input to widening_mul pass is:
f_1 (int x, int y)
{
  int r2;
  int r1;
  int q;
  int cond.0_1;
  int _2;
  int _11;

  <bb 2>:
  q_7 = x_5(D) / y_6(D);
  cond.0_1 = cond;
  if (cond.0_1 != 0)
    goto <bb 3>;
  else
    goto <bb 4>;

  <bb 3>:
  r1_9 = x_5(D) % y_6(D);
  goto <bb 5>;

  <bb 4>:
  r2_10 = x_5(D) % y_6(D);

  <bb 5>:
  # r1_3 = PHI <r1_9(3), 0(4)>
  # r2_4 = PHI <0(3), r2_10(4)>
  _2 = r1_3 + q_7;
  _11 = _2 + r2_4;
  return _11;

}

Thanks,
Prathamesh
>

> Richard.

>

>> Thanks,

>> Prathamesh

>> >

>> > Richard.

>> >

>>

>>

>

> --

> Richard Biener <rguenther@suse.de>

> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

RFC [1/2] divmod transform

Commit Message

Patch