diff mbox

[tree-tailcall] Check if function returns it's argument

Message ID CAAgBjMnDQ8jJNdYT1m=tAfJ69BNAFu1RM558-4fegJS4ciScfg@mail.gmail.com
State Superseded
Headers show

Commit Message

Prathamesh Kulkarni Dec. 1, 2016, 12:50 p.m. UTC
On 1 December 2016 at 17:40, Richard Biener <rguenther@suse.de> wrote:
> On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:

>

>> On 25 November 2016 at 21:17, Jeff Law <law@redhat.com> wrote:

>> > On 11/25/2016 01:07 AM, Richard Biener wrote:

>> >

>> >>> For the tail-call, issue should we artificially create a lhs and use that

>> >>> as return value (perhaps by a separate pass before tailcall) ?

>> >>>

>> >>> __builtin_memcpy (a1, a2, a3);

>> >>> return a1;

>> >>>

>> >>> gets transformed to:

>> >>> _1 = __builtin_memcpy (a1, a2, a3)

>> >>> return _1;

>> >>>

>> >>> So tail-call optimization pass would see the IL in it's expected form.

>> >>

>> >>

>> >> As said, a RTL expert needs to chime in here.  Iff then tail-call

>> >> itself should do this rewrite.  But if this form is required to make

>> >> things work (I suppose you checked it _does_ actually work?) then

>> >> we'd need to make sure later passes do not undo it.  So it looks

>> >> fragile to me.  OTOH I seem to remember that the flags we set on

>> >> GIMPLE are merely a hint to RTL expansion and the tailcalling is

>> >> verified again there?

>> >

>> > So tail calling actually sits on the border between trees and RTL.

>> > Essentially it's an expand-time decision as we use information from trees as

>> > well as low level target information.

>> >

>> > I would not expect the former sequence to tail call.  The tail calling code

>> > does not know that the return value from memcpy will be a1.  Thus the tail

>> > calling code has to assume that it'll have to copy a1 into the return

>> > register after returning from memcpy, which obviously can't be done if we

>> > tail called memcpy.

>> >

>> > The second form is much more likely to turn into a tail call sequence

>> > because the return value from memcpy will be sitting in the proper register.

>> > This form out to work for most calling conventions that allow tail calls.

>> >

>> > We could (in theory) try and exploit the fact that memcpy returns its first

>> > argument as a return value, but that would only be helpful on a target where

>> > the first argument and return value use the same register. So I'd have a

>> > slight preference to rewriting per Prathamesh's suggestion above since it's

>> > more general.

>> Thanks for the suggestion. The attached patch creates artificial lhs,

>> and returns it if the function returns it's argument and that argument

>> is used as return-value.

>>

>> eg:

>> f (void * a1, void * a2, long unsigned int a3)

>> {

>>   <bb 2> [0.0%]:

>>   # .MEM_5 = VDEF <.MEM_1(D)>

>>   __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));

>>   # VUSE <.MEM_5>

>>   return a1_2(D);

>>

>> }

>>

>> is transformed to:

>> f (void * a1, void * a2, long unsigned int a3)

>> {

>>   void * _6;

>>

>>   <bb 2> [0.0%]:

>>   # .MEM_5 = VDEF <.MEM_1(D)>

>>   _6 = __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));

>>   # VUSE <.MEM_5>

>>   return _6;

>>

>> }

>>

>> While testing, I came across an issue with function f() defined

>> intail-padding1.C:

>> struct X

>> {

>>   ~X() {}

>>   int n;

>>   char d;

>> };

>>

>> X f()

>> {

>>   X nrvo;

>>   __builtin_memset (&nrvo, 0, sizeof(X));

>>   return nrvo;

>> }

>>

>> input to the pass:

>> X f() ()

>> {

>>   <bb 2> [0.0%]:

>>   # .MEM_3 = VDEF <.MEM_1(D)>

>>   __builtin_memset (nrvo_2(D), 0, 8);

>>   # VUSE <.MEM_3>

>>   return nrvo_2(D);

>>

>> }

>>

>> verify_gimple_return failed with:

>> tail-padding1.C:13:1: error: invalid conversion in return statement

>>  }

>>  ^

>> struct X

>>

>> struct X &

>>

>> # VUSE <.MEM_3>

>> return _4;

>>

>> It seems the return type of function (struct X) differs with the type

>> of return value (struct X&).

>> Not sure how this is possible ?

>

> You need to honor DECL_BY_REFERENCE of DECL_RESULT.

Thanks! Gating on !DECL_BY_REFERENCE (DECL_RESULT (cfun->decl))
resolved the error.
Does the attached version look OK ?
Validation in progress.

Thanks,
Prathamesh
>

>> To work around that, I guarded the transform on:

>> useless_type_conversion_p (TREE_TYPE (TREE_TYPE (cfun->decl)),

>>                                              TREE_TYPE (retval)))

>>

>> in the patch. Does that look OK ?

>>

>> Bootstrap+tested on x86_64-unknown-linux-gnu with --enable-languages=all,ada.

>> Cross-tested on arm*-*-*, aarch64*-*-*.

>>

>> Thanks,

>> Prathamesh

>> >

>> >

>> > Jeff

>>

>

> --

> Richard Biener <rguenther@suse.de>

> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

Comments

Richard Biener Dec. 1, 2016, 12:56 p.m. UTC | #1
On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:

> On 1 December 2016 at 17:40, Richard Biener <rguenther@suse.de> wrote:

> > On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote:

> >

> >> On 25 November 2016 at 21:17, Jeff Law <law@redhat.com> wrote:

> >> > On 11/25/2016 01:07 AM, Richard Biener wrote:

> >> >

> >> >>> For the tail-call, issue should we artificially create a lhs and use that

> >> >>> as return value (perhaps by a separate pass before tailcall) ?

> >> >>>

> >> >>> __builtin_memcpy (a1, a2, a3);

> >> >>> return a1;

> >> >>>

> >> >>> gets transformed to:

> >> >>> _1 = __builtin_memcpy (a1, a2, a3)

> >> >>> return _1;

> >> >>>

> >> >>> So tail-call optimization pass would see the IL in it's expected form.

> >> >>

> >> >>

> >> >> As said, a RTL expert needs to chime in here.  Iff then tail-call

> >> >> itself should do this rewrite.  But if this form is required to make

> >> >> things work (I suppose you checked it _does_ actually work?) then

> >> >> we'd need to make sure later passes do not undo it.  So it looks

> >> >> fragile to me.  OTOH I seem to remember that the flags we set on

> >> >> GIMPLE are merely a hint to RTL expansion and the tailcalling is

> >> >> verified again there?

> >> >

> >> > So tail calling actually sits on the border between trees and RTL.

> >> > Essentially it's an expand-time decision as we use information from trees as

> >> > well as low level target information.

> >> >

> >> > I would not expect the former sequence to tail call.  The tail calling code

> >> > does not know that the return value from memcpy will be a1.  Thus the tail

> >> > calling code has to assume that it'll have to copy a1 into the return

> >> > register after returning from memcpy, which obviously can't be done if we

> >> > tail called memcpy.

> >> >

> >> > The second form is much more likely to turn into a tail call sequence

> >> > because the return value from memcpy will be sitting in the proper register.

> >> > This form out to work for most calling conventions that allow tail calls.

> >> >

> >> > We could (in theory) try and exploit the fact that memcpy returns its first

> >> > argument as a return value, but that would only be helpful on a target where

> >> > the first argument and return value use the same register. So I'd have a

> >> > slight preference to rewriting per Prathamesh's suggestion above since it's

> >> > more general.

> >> Thanks for the suggestion. The attached patch creates artificial lhs,

> >> and returns it if the function returns it's argument and that argument

> >> is used as return-value.

> >>

> >> eg:

> >> f (void * a1, void * a2, long unsigned int a3)

> >> {

> >>   <bb 2> [0.0%]:

> >>   # .MEM_5 = VDEF <.MEM_1(D)>

> >>   __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));

> >>   # VUSE <.MEM_5>

> >>   return a1_2(D);

> >>

> >> }

> >>

> >> is transformed to:

> >> f (void * a1, void * a2, long unsigned int a3)

> >> {

> >>   void * _6;

> >>

> >>   <bb 2> [0.0%]:

> >>   # .MEM_5 = VDEF <.MEM_1(D)>

> >>   _6 = __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D));

> >>   # VUSE <.MEM_5>

> >>   return _6;

> >>

> >> }

> >>

> >> While testing, I came across an issue with function f() defined

> >> intail-padding1.C:

> >> struct X

> >> {

> >>   ~X() {}

> >>   int n;

> >>   char d;

> >> };

> >>

> >> X f()

> >> {

> >>   X nrvo;

> >>   __builtin_memset (&nrvo, 0, sizeof(X));

> >>   return nrvo;

> >> }

> >>

> >> input to the pass:

> >> X f() ()

> >> {

> >>   <bb 2> [0.0%]:

> >>   # .MEM_3 = VDEF <.MEM_1(D)>

> >>   __builtin_memset (nrvo_2(D), 0, 8);

> >>   # VUSE <.MEM_3>

> >>   return nrvo_2(D);

> >>

> >> }

> >>

> >> verify_gimple_return failed with:

> >> tail-padding1.C:13:1: error: invalid conversion in return statement

> >>  }

> >>  ^

> >> struct X

> >>

> >> struct X &

> >>

> >> # VUSE <.MEM_3>

> >> return _4;

> >>

> >> It seems the return type of function (struct X) differs with the type

> >> of return value (struct X&).

> >> Not sure how this is possible ?

> >

> > You need to honor DECL_BY_REFERENCE of DECL_RESULT.

> Thanks! Gating on !DECL_BY_REFERENCE (DECL_RESULT (cfun->decl))

> resolved the error.

> Does the attached version look OK ?


+                         ass_var = make_ssa_name (TREE_TYPE (arg));

can you try

      ass_var = copy_ssa_name (arg);

instead?  That way the underlying decl should make sure the
DECL_BY_REFERENCE check in the IL verification works.

Thanks,
Richard.


> Validation in progress.

> 

> Thanks,

> Prathamesh

> >

> >> To work around that, I guarded the transform on:

> >> useless_type_conversion_p (TREE_TYPE (TREE_TYPE (cfun->decl)),

> >>                                              TREE_TYPE (retval)))

> >>

> >> in the patch. Does that look OK ?

> >>

> >> Bootstrap+tested on x86_64-unknown-linux-gnu with --enable-languages=all,ada.

> >> Cross-tested on arm*-*-*, aarch64*-*-*.

> >>

> >> Thanks,

> >> Prathamesh

> >> >

> >> >

> >> > Jeff

> >>

> >

> > --

> > Richard Biener <rguenther@suse.de>

> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

> 


-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
diff mbox

Patch

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/tailcall-9.c b/gcc/testsuite/gcc.dg/tree-ssa/tailcall-9.c
new file mode 100644
index 0000000..b3fdc6c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/tailcall-9.c
@@ -0,0 +1,10 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-tailc-details" } */
+
+void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
+{
+  __builtin_memcpy (a1, a2, a3);
+  return a1;
+}
+
+/* { dg-final { scan-tree-dump-times "Found tail call" 1 "tailc" } } */ 
diff --git a/gcc/tree-tailcall.c b/gcc/tree-tailcall.c
index 66a0a4c..a1c8bd7 100644
--- a/gcc/tree-tailcall.c
+++ b/gcc/tree-tailcall.c
@@ -401,6 +401,7 @@  find_tail_calls (basic_block bb, struct tailcall **ret)
   basic_block abb;
   size_t idx;
   tree var;
+  greturn *ret_stmt = NULL;
 
   if (!single_succ_p (bb))
     return;
@@ -408,6 +409,8 @@  find_tail_calls (basic_block bb, struct tailcall **ret)
   for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi); gsi_prev (&gsi))
     {
       stmt = gsi_stmt (gsi);
+      if (!ret_stmt)
+	ret_stmt = dyn_cast<greturn *> (stmt);
 
       /* Ignore labels, returns, nops, clobbers and debug stmts.  */
       if (gimple_code (stmt) == GIMPLE_LABEL
@@ -422,6 +425,36 @@  find_tail_calls (basic_block bb, struct tailcall **ret)
 	{
 	  call = as_a <gcall *> (stmt);
 	  ass_var = gimple_call_lhs (call);
+	  if (!ass_var)
+	    {
+	      /* Check if function returns one if it's arguments
+		 and that argument is used as return value.
+		 In that case create an artificial lhs to call_stmt,
+		 and set it as the return value.  */
+
+	      unsigned rf = gimple_call_return_flags (call);
+	      if (rf & ERF_RETURNS_ARG)
+		{
+		  unsigned argnum = rf & ERF_RETURN_ARG_MASK;
+		  if (argnum < gimple_call_num_args (call)
+		      && ret_stmt)
+		    {
+		      tree arg = gimple_call_arg (call, argnum);
+		      tree retval = gimple_return_retval (ret_stmt);
+		      if (retval
+			  && TREE_CODE (retval) == SSA_NAME
+			  && operand_equal_p (retval, arg, 0)
+			  && !DECL_BY_REFERENCE (DECL_RESULT (cfun->decl)))
+			{
+			  ass_var = make_ssa_name (TREE_TYPE (arg));
+			  gimple_call_set_lhs (call, ass_var);
+			  update_stmt (call);
+			  gimple_return_set_retval (ret_stmt, ass_var);
+			  update_stmt (ret_stmt);
+			}
+		    }
+		}
+	    }
 	  break;
 	}