[tree-tailcall] Check if function returns it's argument

Message ID CAAgBjM=gRR+C8GwrT6=9nHQZ3vapJKuOPsrDMHjEWmEnFCa3Rw@mail.gmail.com
State New
Headers show

Commit Message

Prathamesh Kulkarni Nov. 24, 2016, 12:15 p.m.
On 24 November 2016 at 14:07, Richard Biener <rguenther@suse.de> wrote:
> On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:

>

>> Hi,

>> Consider following test-case:

>>

>> void *f(void *a1, void *a2, __SIZE_TYPE__ a3)

>> {

>>   __builtin_memcpy (a1, a2, a3);

>>   return a1;

>> }

>>

>> return a1 can be considered equivalent to return value of memcpy,

>> and the call could be emitted as a tail-call.

>> gcc doesn't emit the above call to memcpy as a tail-call,

>> but if it is changed to:

>>

>> void *t1 = __builtin_memcpy (a1, a2, a3);

>> return t1;

>>

>> Then memcpy is emitted as a tail-call.

>> The attached patch tries to handle the former case.

>>

>> Bootstrapped+tested on x86_64-unknown-linux-gnu.

>> Cross tested on arm*-*-*, aarch64*-*-*

>> Does this patch look OK ?

>

> +/* Return arg, if function returns it's argument or NULL if it doesn't.

> */

> +tree

> +gimple_call_return_arg (gcall *call_stmt)

> +{

>

>

> Please just inline it at the single use - the name is not terribly

> informative.

>

> I'm not sure you can rely on code-generation working if you not

> effectively change the IL to

>

>   a1 = __builtin_memcpy (a1, a2, a3);

>   return a1;

>

> someone more familiar with RTL expansion plus tail call emission on

> RTL needs to chime in.

Well I was trying to copy-propagate function's argument into uses of
it's return value if
function returned that argument, so the assignment to lhs of call
could be made redundant.

eg:
void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
{
  void *t1 = __builtin_memcpy (a1, a2, a3);
  return t1;
}

After patch, copyprop transformed it into:
t1 = __builtin_memcpy (a1, a2, a3);
return a1;

But this now interferes with tail-call optimization, because it is not
able to emit memcpy
as tail-call anymore due to which the patch regressed 20050503-1.c.
I am not sure how to workaround this.

Thanks,
Prathamesh
>

> Richard.

Comments

Richard Biener Nov. 24, 2016, 12:18 p.m. | #1
On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:

> On 24 November 2016 at 14:07, Richard Biener <rguenther@suse.de> wrote:

> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:

> >

> >> Hi,

> >> Consider following test-case:

> >>

> >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3)

> >> {

> >>   __builtin_memcpy (a1, a2, a3);

> >>   return a1;

> >> }

> >>

> >> return a1 can be considered equivalent to return value of memcpy,

> >> and the call could be emitted as a tail-call.

> >> gcc doesn't emit the above call to memcpy as a tail-call,

> >> but if it is changed to:

> >>

> >> void *t1 = __builtin_memcpy (a1, a2, a3);

> >> return t1;

> >>

> >> Then memcpy is emitted as a tail-call.

> >> The attached patch tries to handle the former case.

> >>

> >> Bootstrapped+tested on x86_64-unknown-linux-gnu.

> >> Cross tested on arm*-*-*, aarch64*-*-*

> >> Does this patch look OK ?

> >

> > +/* Return arg, if function returns it's argument or NULL if it doesn't.

> > */

> > +tree

> > +gimple_call_return_arg (gcall *call_stmt)

> > +{

> >

> >

> > Please just inline it at the single use - the name is not terribly

> > informative.

> >

> > I'm not sure you can rely on code-generation working if you not

> > effectively change the IL to

> >

> >   a1 = __builtin_memcpy (a1, a2, a3);

> >   return a1;

> >

> > someone more familiar with RTL expansion plus tail call emission on

> > RTL needs to chime in.

> Well I was trying to copy-propagate function's argument into uses of

> it's return value if

> function returned that argument, so the assignment to lhs of call

> could be made redundant.

> 

> eg:

> void *f(void *a1, void *a2, __SIZE_TYPE__ a3)

> {

>   void *t1 = __builtin_memcpy (a1, a2, a3);

>   return t1;

> }

>

> After patch, copyprop transformed it into:

> t1 = __builtin_memcpy (a1, a2, a3);

> return a1;


But that's a bad transform -- if we know that t1 == a1 then it's
better to use t1 as that's readily available in the return register
while the register for a1 might have been clobbered and thus we
need to spill it for the later return.
 
> But this now interferes with tail-call optimization, because it is not

> able to emit memcpy

> as tail-call anymore due to which the patch regressed 20050503-1.c.

> I am not sure how to workaround this.

> 

> Thanks,

> Prathamesh

> >

> > Richard.

> 


-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Jakub Jelinek Nov. 24, 2016, 12:21 p.m. | #2
On Thu, Nov 24, 2016 at 01:18:37PM +0100, Richard Biener wrote:
> > eg:

> > void *f(void *a1, void *a2, __SIZE_TYPE__ a3)

> > {

> >   void *t1 = __builtin_memcpy (a1, a2, a3);

> >   return t1;

> > }

> >

> > After patch, copyprop transformed it into:

> > t1 = __builtin_memcpy (a1, a2, a3);

> > return a1;

> 

> But that's a bad transform -- if we know that t1 == a1 then it's

> better to use t1 as that's readily available in the return register

> while the register for a1 might have been clobbered and thus we

> need to spill it for the later return.


I thought that the RA is aware of this and is able to undo it.

	Jakub
Richard Biener Nov. 24, 2016, 12:23 p.m. | #3
On Thu, 24 Nov 2016, Jakub Jelinek wrote:

> On Thu, Nov 24, 2016 at 01:18:37PM +0100, Richard Biener wrote:

> > > eg:

> > > void *f(void *a1, void *a2, __SIZE_TYPE__ a3)

> > > {

> > >   void *t1 = __builtin_memcpy (a1, a2, a3);

> > >   return t1;

> > > }

> > >

> > > After patch, copyprop transformed it into:

> > > t1 = __builtin_memcpy (a1, a2, a3);

> > > return a1;

> > 

> > But that's a bad transform -- if we know that t1 == a1 then it's

> > better to use t1 as that's readily available in the return register

> > while the register for a1 might have been clobbered and thus we

> > need to spill it for the later return.

> 

> I thought that the RA is aware of this and is able to undo it.


The RA has some feature here, yes.  Not sure what it can do
exactly and if it can do something when the return value is
optimized away.

Richard.
Prathamesh Kulkarni Nov. 24, 2016, 12:31 p.m. | #4
On 24 November 2016 at 17:48, Richard Biener <rguenther@suse.de> wrote:
> On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:

>

>> On 24 November 2016 at 14:07, Richard Biener <rguenther@suse.de> wrote:

>> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:

>> >

>> >> Hi,

>> >> Consider following test-case:

>> >>

>> >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3)

>> >> {

>> >>   __builtin_memcpy (a1, a2, a3);

>> >>   return a1;

>> >> }

>> >>

>> >> return a1 can be considered equivalent to return value of memcpy,

>> >> and the call could be emitted as a tail-call.

>> >> gcc doesn't emit the above call to memcpy as a tail-call,

>> >> but if it is changed to:

>> >>

>> >> void *t1 = __builtin_memcpy (a1, a2, a3);

>> >> return t1;

>> >>

>> >> Then memcpy is emitted as a tail-call.

>> >> The attached patch tries to handle the former case.

>> >>

>> >> Bootstrapped+tested on x86_64-unknown-linux-gnu.

>> >> Cross tested on arm*-*-*, aarch64*-*-*

>> >> Does this patch look OK ?

>> >

>> > +/* Return arg, if function returns it's argument or NULL if it doesn't.

>> > */

>> > +tree

>> > +gimple_call_return_arg (gcall *call_stmt)

>> > +{

>> >

>> >

>> > Please just inline it at the single use - the name is not terribly

>> > informative.

>> >

>> > I'm not sure you can rely on code-generation working if you not

>> > effectively change the IL to

>> >

>> >   a1 = __builtin_memcpy (a1, a2, a3);

>> >   return a1;

>> >

>> > someone more familiar with RTL expansion plus tail call emission on

>> > RTL needs to chime in.

>> Well I was trying to copy-propagate function's argument into uses of

>> it's return value if

>> function returned that argument, so the assignment to lhs of call

>> could be made redundant.

>>

>> eg:

>> void *f(void *a1, void *a2, __SIZE_TYPE__ a3)

>> {

>>   void *t1 = __builtin_memcpy (a1, a2, a3);

>>   return t1;

>> }

>>

>> After patch, copyprop transformed it into:

>> t1 = __builtin_memcpy (a1, a2, a3);

>> return a1;

>

> But that's a bad transform -- if we know that t1 == a1 then it's

> better to use t1 as that's readily available in the return register

> while the register for a1 might have been clobbered and thus we

> need to spill it for the later return.

Oh I didn't realize this could possibly pessimize RA.
For test-case:

void *t1 = memcpy (dest, src, n);
if (t1 != dest)
  __builtin_abort ();

we could copy-propagate t1 into cond_expr and make the condition redundant.
However I suppose this particular case could be handled with VRP instead
(t1 and dest should be marked equivalent) ?

Thanks,
Prathamesh
>

>> But this now interferes with tail-call optimization, because it is not

>> able to emit memcpy

>> as tail-call anymore due to which the patch regressed 20050503-1.c.

>> I am not sure how to workaround this.

>>

>> Thanks,

>> Prathamesh

>> >

>> > Richard.

>>

>

> --

> Richard Biener <rguenther@suse.de>

> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Richard Biener Nov. 24, 2016, 12:38 p.m. | #5
On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:

> On 24 November 2016 at 17:48, Richard Biener <rguenther@suse.de> wrote:

> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:

> >

> >> On 24 November 2016 at 14:07, Richard Biener <rguenther@suse.de> wrote:

> >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:

> >> >

> >> >> Hi,

> >> >> Consider following test-case:

> >> >>

> >> >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3)

> >> >> {

> >> >>   __builtin_memcpy (a1, a2, a3);

> >> >>   return a1;

> >> >> }

> >> >>

> >> >> return a1 can be considered equivalent to return value of memcpy,

> >> >> and the call could be emitted as a tail-call.

> >> >> gcc doesn't emit the above call to memcpy as a tail-call,

> >> >> but if it is changed to:

> >> >>

> >> >> void *t1 = __builtin_memcpy (a1, a2, a3);

> >> >> return t1;

> >> >>

> >> >> Then memcpy is emitted as a tail-call.

> >> >> The attached patch tries to handle the former case.

> >> >>

> >> >> Bootstrapped+tested on x86_64-unknown-linux-gnu.

> >> >> Cross tested on arm*-*-*, aarch64*-*-*

> >> >> Does this patch look OK ?

> >> >

> >> > +/* Return arg, if function returns it's argument or NULL if it doesn't.

> >> > */

> >> > +tree

> >> > +gimple_call_return_arg (gcall *call_stmt)

> >> > +{

> >> >

> >> >

> >> > Please just inline it at the single use - the name is not terribly

> >> > informative.

> >> >

> >> > I'm not sure you can rely on code-generation working if you not

> >> > effectively change the IL to

> >> >

> >> >   a1 = __builtin_memcpy (a1, a2, a3);

> >> >   return a1;

> >> >

> >> > someone more familiar with RTL expansion plus tail call emission on

> >> > RTL needs to chime in.

> >> Well I was trying to copy-propagate function's argument into uses of

> >> it's return value if

> >> function returned that argument, so the assignment to lhs of call

> >> could be made redundant.

> >>

> >> eg:

> >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3)

> >> {

> >>   void *t1 = __builtin_memcpy (a1, a2, a3);

> >>   return t1;

> >> }

> >>

> >> After patch, copyprop transformed it into:

> >> t1 = __builtin_memcpy (a1, a2, a3);

> >> return a1;

> >

> > But that's a bad transform -- if we know that t1 == a1 then it's

> > better to use t1 as that's readily available in the return register

> > while the register for a1 might have been clobbered and thus we

> > need to spill it for the later return.

> Oh I didn't realize this could possibly pessimize RA.

> For test-case:

> 

> void *t1 = memcpy (dest, src, n);

> if (t1 != dest)

>   __builtin_abort ();

> 

> we could copy-propagate t1 into cond_expr and make the condition redundant.

> However I suppose this particular case could be handled with VRP instead

> (t1 and dest should be marked equivalent) ?


Yeah, exposing this to value-numbering in general can enable some
optimizations (but I wouldn't put it in copyprop).  Note it's then
difficult to avoid copy-propgating things...

The user can also write

void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
{
  __builtin_memcpy (a1, a2, a3);
  return a1;
}

so it's good to improve code-gen for that (for the tailcall issue).

Richard.

> Thanks,

> Prathamesh

> >

> >> But this now interferes with tail-call optimization, because it is not

> >> able to emit memcpy

> >> as tail-call anymore due to which the patch regressed 20050503-1.c.

> >> I am not sure how to workaround this.

> >>

> >> Thanks,

> >> Prathamesh

> >> >

> >> > Richard.

> >>

> >

> > --

> > Richard Biener <rguenther@suse.de>

> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

> 

> 


-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Prathamesh Kulkarni Nov. 25, 2016, 5:16 a.m. | #6
On 24 November 2016 at 18:08, Richard Biener <rguenther@suse.de> wrote:
> On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:

>

>> On 24 November 2016 at 17:48, Richard Biener <rguenther@suse.de> wrote:

>> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:

>> >

>> >> On 24 November 2016 at 14:07, Richard Biener <rguenther@suse.de> wrote:

>> >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:

>> >> >

>> >> >> Hi,

>> >> >> Consider following test-case:

>> >> >>

>> >> >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3)

>> >> >> {

>> >> >>   __builtin_memcpy (a1, a2, a3);

>> >> >>   return a1;

>> >> >> }

>> >> >>

>> >> >> return a1 can be considered equivalent to return value of memcpy,

>> >> >> and the call could be emitted as a tail-call.

>> >> >> gcc doesn't emit the above call to memcpy as a tail-call,

>> >> >> but if it is changed to:

>> >> >>

>> >> >> void *t1 = __builtin_memcpy (a1, a2, a3);

>> >> >> return t1;

>> >> >>

>> >> >> Then memcpy is emitted as a tail-call.

>> >> >> The attached patch tries to handle the former case.

>> >> >>

>> >> >> Bootstrapped+tested on x86_64-unknown-linux-gnu.

>> >> >> Cross tested on arm*-*-*, aarch64*-*-*

>> >> >> Does this patch look OK ?

>> >> >

>> >> > +/* Return arg, if function returns it's argument or NULL if it doesn't.

>> >> > */

>> >> > +tree

>> >> > +gimple_call_return_arg (gcall *call_stmt)

>> >> > +{

>> >> >

>> >> >

>> >> > Please just inline it at the single use - the name is not terribly

>> >> > informative.

>> >> >

>> >> > I'm not sure you can rely on code-generation working if you not

>> >> > effectively change the IL to

>> >> >

>> >> >   a1 = __builtin_memcpy (a1, a2, a3);

>> >> >   return a1;

>> >> >

>> >> > someone more familiar with RTL expansion plus tail call emission on

>> >> > RTL needs to chime in.

>> >> Well I was trying to copy-propagate function's argument into uses of

>> >> it's return value if

>> >> function returned that argument, so the assignment to lhs of call

>> >> could be made redundant.

>> >>

>> >> eg:

>> >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3)

>> >> {

>> >>   void *t1 = __builtin_memcpy (a1, a2, a3);

>> >>   return t1;

>> >> }

>> >>

>> >> After patch, copyprop transformed it into:

>> >> t1 = __builtin_memcpy (a1, a2, a3);

>> >> return a1;

>> >

>> > But that's a bad transform -- if we know that t1 == a1 then it's

>> > better to use t1 as that's readily available in the return register

>> > while the register for a1 might have been clobbered and thus we

>> > need to spill it for the later return.

>> Oh I didn't realize this could possibly pessimize RA.

>> For test-case:

>>

>> void *t1 = memcpy (dest, src, n);

>> if (t1 != dest)

>>   __builtin_abort ();

>>

>> we could copy-propagate t1 into cond_expr and make the condition redundant.

>> However I suppose this particular case could be handled with VRP instead

>> (t1 and dest should be marked equivalent) ?

>

> Yeah, exposing this to value-numbering in general can enable some

> optimizations (but I wouldn't put it in copyprop).  Note it's then

> difficult to avoid copy-propgating things...

>

> The user can also write

>

> void *f(void *a1, void *a2, __SIZE_TYPE__ a3)

> {

>   __builtin_memcpy (a1, a2, a3);

>   return a1;

> }

>

> so it's good to improve code-gen for that (for the tailcall issue).

For the tail-call, issue should we artificially create a lhs and use that
as return value (perhaps by a separate pass before tailcall) ?

__builtin_memcpy (a1, a2, a3);
return a1;

gets transformed to:
_1 = __builtin_memcpy (a1, a2, a3)
return _1;

So tail-call optimization pass would see the IL in it's expected form.

Thanks,
Prathamesh
>

> Richard.

>

>> Thanks,

>> Prathamesh

>> >

>> >> But this now interferes with tail-call optimization, because it is not

>> >> able to emit memcpy

>> >> as tail-call anymore due to which the patch regressed 20050503-1.c.

>> >> I am not sure how to workaround this.

>> >>

>> >> Thanks,

>> >> Prathamesh

>> >> >

>> >> > Richard.

>> >>

>> >

>> > --

>> > Richard Biener <rguenther@suse.de>

>> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

>>

>>

>

> --

> Richard Biener <rguenther@suse.de>

> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Richard Biener Nov. 25, 2016, 8:07 a.m. | #7
On Fri, 25 Nov 2016, Prathamesh Kulkarni wrote:

> On 24 November 2016 at 18:08, Richard Biener <rguenther@suse.de> wrote:

> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:

> >

> >> On 24 November 2016 at 17:48, Richard Biener <rguenther@suse.de> wrote:

> >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:

> >> >

> >> >> On 24 November 2016 at 14:07, Richard Biener <rguenther@suse.de> wrote:

> >> >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:

> >> >> >

> >> >> >> Hi,

> >> >> >> Consider following test-case:

> >> >> >>

> >> >> >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3)

> >> >> >> {

> >> >> >>   __builtin_memcpy (a1, a2, a3);

> >> >> >>   return a1;

> >> >> >> }

> >> >> >>

> >> >> >> return a1 can be considered equivalent to return value of memcpy,

> >> >> >> and the call could be emitted as a tail-call.

> >> >> >> gcc doesn't emit the above call to memcpy as a tail-call,

> >> >> >> but if it is changed to:

> >> >> >>

> >> >> >> void *t1 = __builtin_memcpy (a1, a2, a3);

> >> >> >> return t1;

> >> >> >>

> >> >> >> Then memcpy is emitted as a tail-call.

> >> >> >> The attached patch tries to handle the former case.

> >> >> >>

> >> >> >> Bootstrapped+tested on x86_64-unknown-linux-gnu.

> >> >> >> Cross tested on arm*-*-*, aarch64*-*-*

> >> >> >> Does this patch look OK ?

> >> >> >

> >> >> > +/* Return arg, if function returns it's argument or NULL if it doesn't.

> >> >> > */

> >> >> > +tree

> >> >> > +gimple_call_return_arg (gcall *call_stmt)

> >> >> > +{

> >> >> >

> >> >> >

> >> >> > Please just inline it at the single use - the name is not terribly

> >> >> > informative.

> >> >> >

> >> >> > I'm not sure you can rely on code-generation working if you not

> >> >> > effectively change the IL to

> >> >> >

> >> >> >   a1 = __builtin_memcpy (a1, a2, a3);

> >> >> >   return a1;

> >> >> >

> >> >> > someone more familiar with RTL expansion plus tail call emission on

> >> >> > RTL needs to chime in.

> >> >> Well I was trying to copy-propagate function's argument into uses of

> >> >> it's return value if

> >> >> function returned that argument, so the assignment to lhs of call

> >> >> could be made redundant.

> >> >>

> >> >> eg:

> >> >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3)

> >> >> {

> >> >>   void *t1 = __builtin_memcpy (a1, a2, a3);

> >> >>   return t1;

> >> >> }

> >> >>

> >> >> After patch, copyprop transformed it into:

> >> >> t1 = __builtin_memcpy (a1, a2, a3);

> >> >> return a1;

> >> >

> >> > But that's a bad transform -- if we know that t1 == a1 then it's

> >> > better to use t1 as that's readily available in the return register

> >> > while the register for a1 might have been clobbered and thus we

> >> > need to spill it for the later return.

> >> Oh I didn't realize this could possibly pessimize RA.

> >> For test-case:

> >>

> >> void *t1 = memcpy (dest, src, n);

> >> if (t1 != dest)

> >>   __builtin_abort ();

> >>

> >> we could copy-propagate t1 into cond_expr and make the condition redundant.

> >> However I suppose this particular case could be handled with VRP instead

> >> (t1 and dest should be marked equivalent) ?

> >

> > Yeah, exposing this to value-numbering in general can enable some

> > optimizations (but I wouldn't put it in copyprop).  Note it's then

> > difficult to avoid copy-propgating things...

> >

> > The user can also write

> >

> > void *f(void *a1, void *a2, __SIZE_TYPE__ a3)

> > {

> >   __builtin_memcpy (a1, a2, a3);

> >   return a1;

> > }

> >

> > so it's good to improve code-gen for that (for the tailcall issue).

> For the tail-call, issue should we artificially create a lhs and use that

> as return value (perhaps by a separate pass before tailcall) ?

> 

> __builtin_memcpy (a1, a2, a3);

> return a1;

> 

> gets transformed to:

> _1 = __builtin_memcpy (a1, a2, a3)

> return _1;

> 

> So tail-call optimization pass would see the IL in it's expected form.


As said, a RTL expert needs to chime in here.  Iff then tail-call
itself should do this rewrite.  But if this form is required to make
things work (I suppose you checked it _does_ actually work?) then
we'd need to make sure later passes do not undo it.  So it looks
fragile to me.  OTOH I seem to remember that the flags we set on
GIMPLE are merely a hint to RTL expansion and the tailcalling is
verified again there?

Thanks,
Richard.

> Thanks,

> Prathamesh

> >

> > Richard.

> >

> >> Thanks,

> >> Prathamesh

> >> >

> >> >> But this now interferes with tail-call optimization, because it is not

> >> >> able to emit memcpy

> >> >> as tail-call anymore due to which the patch regressed 20050503-1.c.

> >> >> I am not sure how to workaround this.

> >> >>

> >> >> Thanks,

> >> >> Prathamesh

> >> >> >

> >> >> > Richard.

> >> >>

> >> >

> >> > --

> >> > Richard Biener <rguenther@suse.de>

> >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

> >>

> >>

> >

> > --

> > Richard Biener <rguenther@suse.de>

> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

> 

> 


-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Prathamesh Kulkarni Nov. 25, 2016, 8:18 a.m. | #8
On 25 November 2016 at 13:37, Richard Biener <rguenther@suse.de> wrote:
> On Fri, 25 Nov 2016, Prathamesh Kulkarni wrote:

>

>> On 24 November 2016 at 18:08, Richard Biener <rguenther@suse.de> wrote:

>> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:

>> >

>> >> On 24 November 2016 at 17:48, Richard Biener <rguenther@suse.de> wrote:

>> >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:

>> >> >

>> >> >> On 24 November 2016 at 14:07, Richard Biener <rguenther@suse.de> wrote:

>> >> >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:

>> >> >> >

>> >> >> >> Hi,

>> >> >> >> Consider following test-case:

>> >> >> >>

>> >> >> >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3)

>> >> >> >> {

>> >> >> >>   __builtin_memcpy (a1, a2, a3);

>> >> >> >>   return a1;

>> >> >> >> }

>> >> >> >>

>> >> >> >> return a1 can be considered equivalent to return value of memcpy,

>> >> >> >> and the call could be emitted as a tail-call.

>> >> >> >> gcc doesn't emit the above call to memcpy as a tail-call,

>> >> >> >> but if it is changed to:

>> >> >> >>

>> >> >> >> void *t1 = __builtin_memcpy (a1, a2, a3);

>> >> >> >> return t1;

>> >> >> >>

>> >> >> >> Then memcpy is emitted as a tail-call.

>> >> >> >> The attached patch tries to handle the former case.

>> >> >> >>

>> >> >> >> Bootstrapped+tested on x86_64-unknown-linux-gnu.

>> >> >> >> Cross tested on arm*-*-*, aarch64*-*-*

>> >> >> >> Does this patch look OK ?

>> >> >> >

>> >> >> > +/* Return arg, if function returns it's argument or NULL if it doesn't.

>> >> >> > */

>> >> >> > +tree

>> >> >> > +gimple_call_return_arg (gcall *call_stmt)

>> >> >> > +{

>> >> >> >

>> >> >> >

>> >> >> > Please just inline it at the single use - the name is not terribly

>> >> >> > informative.

>> >> >> >

>> >> >> > I'm not sure you can rely on code-generation working if you not

>> >> >> > effectively change the IL to

>> >> >> >

>> >> >> >   a1 = __builtin_memcpy (a1, a2, a3);

>> >> >> >   return a1;

>> >> >> >

>> >> >> > someone more familiar with RTL expansion plus tail call emission on

>> >> >> > RTL needs to chime in.

>> >> >> Well I was trying to copy-propagate function's argument into uses of

>> >> >> it's return value if

>> >> >> function returned that argument, so the assignment to lhs of call

>> >> >> could be made redundant.

>> >> >>

>> >> >> eg:

>> >> >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3)

>> >> >> {

>> >> >>   void *t1 = __builtin_memcpy (a1, a2, a3);

>> >> >>   return t1;

>> >> >> }

>> >> >>

>> >> >> After patch, copyprop transformed it into:

>> >> >> t1 = __builtin_memcpy (a1, a2, a3);

>> >> >> return a1;

>> >> >

>> >> > But that's a bad transform -- if we know that t1 == a1 then it's

>> >> > better to use t1 as that's readily available in the return register

>> >> > while the register for a1 might have been clobbered and thus we

>> >> > need to spill it for the later return.

>> >> Oh I didn't realize this could possibly pessimize RA.

>> >> For test-case:

>> >>

>> >> void *t1 = memcpy (dest, src, n);

>> >> if (t1 != dest)

>> >>   __builtin_abort ();

>> >>

>> >> we could copy-propagate t1 into cond_expr and make the condition redundant.

>> >> However I suppose this particular case could be handled with VRP instead

>> >> (t1 and dest should be marked equivalent) ?

>> >

>> > Yeah, exposing this to value-numbering in general can enable some

>> > optimizations (but I wouldn't put it in copyprop).  Note it's then

>> > difficult to avoid copy-propgating things...

>> >

>> > The user can also write

>> >

>> > void *f(void *a1, void *a2, __SIZE_TYPE__ a3)

>> > {

>> >   __builtin_memcpy (a1, a2, a3);

>> >   return a1;

>> > }

>> >

>> > so it's good to improve code-gen for that (for the tailcall issue).

>> For the tail-call, issue should we artificially create a lhs and use that

>> as return value (perhaps by a separate pass before tailcall) ?

>>

>> __builtin_memcpy (a1, a2, a3);

>> return a1;

>>

>> gets transformed to:

>> _1 = __builtin_memcpy (a1, a2, a3)

>> return _1;

>>

>> So tail-call optimization pass would see the IL in it's expected form.

>

> As said, a RTL expert needs to chime in here.  Iff then tail-call

> itself should do this rewrite.  But if this form is required to make

> things work (I suppose you checked it _does_ actually work?) then

> we'd need to make sure later passes do not undo it.  So it looks

> fragile to me.  OTOH I seem to remember that the flags we set on

> GIMPLE are merely a hint to RTL expansion and the tailcalling is

> verified again there?


Yeah, I verified the form works:
void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
{
  void *t1 = __builtin_memcpy (a1, a2, a3);
  return t1;
}

assembly:
f:
.LFB0:
        .cfi_startproc
        jmp     memcpy
        .cfi_endproc

Thanks,
Prathamesh
>

> Thanks,

> Richard.

>

>> Thanks,

>> Prathamesh

>> >

>> > Richard.

>> >

>> >> Thanks,

>> >> Prathamesh

>> >> >

>> >> >> But this now interferes with tail-call optimization, because it is not

>> >> >> able to emit memcpy

>> >> >> as tail-call anymore due to which the patch regressed 20050503-1.c.

>> >> >> I am not sure how to workaround this.

>> >> >>

>> >> >> Thanks,

>> >> >> Prathamesh

>> >> >> >

>> >> >> > Richard.

>> >> >>

>> >> >

>> >> > --

>> >> > Richard Biener <rguenther@suse.de>

>> >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

>> >>

>> >>

>> >

>> > --

>> > Richard Biener <rguenther@suse.de>

>> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

>>

>>

>

> --

> Richard Biener <rguenther@suse.de>

> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Richard Biener Nov. 25, 2016, 8:25 a.m. | #9
On Fri, 25 Nov 2016, Prathamesh Kulkarni wrote:

> On 25 November 2016 at 13:37, Richard Biener <rguenther@suse.de> wrote:

> > On Fri, 25 Nov 2016, Prathamesh Kulkarni wrote:

> >

> >> On 24 November 2016 at 18:08, Richard Biener <rguenther@suse.de> wrote:

> >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:

> >> >

> >> >> On 24 November 2016 at 17:48, Richard Biener <rguenther@suse.de> wrote:

> >> >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:

> >> >> >

> >> >> >> On 24 November 2016 at 14:07, Richard Biener <rguenther@suse.de> wrote:

> >> >> >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:

> >> >> >> >

> >> >> >> >> Hi,

> >> >> >> >> Consider following test-case:

> >> >> >> >>

> >> >> >> >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3)

> >> >> >> >> {

> >> >> >> >>   __builtin_memcpy (a1, a2, a3);

> >> >> >> >>   return a1;

> >> >> >> >> }

> >> >> >> >>

> >> >> >> >> return a1 can be considered equivalent to return value of memcpy,

> >> >> >> >> and the call could be emitted as a tail-call.

> >> >> >> >> gcc doesn't emit the above call to memcpy as a tail-call,

> >> >> >> >> but if it is changed to:

> >> >> >> >>

> >> >> >> >> void *t1 = __builtin_memcpy (a1, a2, a3);

> >> >> >> >> return t1;

> >> >> >> >>

> >> >> >> >> Then memcpy is emitted as a tail-call.

> >> >> >> >> The attached patch tries to handle the former case.

> >> >> >> >>

> >> >> >> >> Bootstrapped+tested on x86_64-unknown-linux-gnu.

> >> >> >> >> Cross tested on arm*-*-*, aarch64*-*-*

> >> >> >> >> Does this patch look OK ?

> >> >> >> >

> >> >> >> > +/* Return arg, if function returns it's argument or NULL if it doesn't.

> >> >> >> > */

> >> >> >> > +tree

> >> >> >> > +gimple_call_return_arg (gcall *call_stmt)

> >> >> >> > +{

> >> >> >> >

> >> >> >> >

> >> >> >> > Please just inline it at the single use - the name is not terribly

> >> >> >> > informative.

> >> >> >> >

> >> >> >> > I'm not sure you can rely on code-generation working if you not

> >> >> >> > effectively change the IL to

> >> >> >> >

> >> >> >> >   a1 = __builtin_memcpy (a1, a2, a3);

> >> >> >> >   return a1;

> >> >> >> >

> >> >> >> > someone more familiar with RTL expansion plus tail call emission on

> >> >> >> > RTL needs to chime in.

> >> >> >> Well I was trying to copy-propagate function's argument into uses of

> >> >> >> it's return value if

> >> >> >> function returned that argument, so the assignment to lhs of call

> >> >> >> could be made redundant.

> >> >> >>

> >> >> >> eg:

> >> >> >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3)

> >> >> >> {

> >> >> >>   void *t1 = __builtin_memcpy (a1, a2, a3);

> >> >> >>   return t1;

> >> >> >> }

> >> >> >>

> >> >> >> After patch, copyprop transformed it into:

> >> >> >> t1 = __builtin_memcpy (a1, a2, a3);

> >> >> >> return a1;

> >> >> >

> >> >> > But that's a bad transform -- if we know that t1 == a1 then it's

> >> >> > better to use t1 as that's readily available in the return register

> >> >> > while the register for a1 might have been clobbered and thus we

> >> >> > need to spill it for the later return.

> >> >> Oh I didn't realize this could possibly pessimize RA.

> >> >> For test-case:

> >> >>

> >> >> void *t1 = memcpy (dest, src, n);

> >> >> if (t1 != dest)

> >> >>   __builtin_abort ();

> >> >>

> >> >> we could copy-propagate t1 into cond_expr and make the condition redundant.

> >> >> However I suppose this particular case could be handled with VRP instead

> >> >> (t1 and dest should be marked equivalent) ?

> >> >

> >> > Yeah, exposing this to value-numbering in general can enable some

> >> > optimizations (but I wouldn't put it in copyprop).  Note it's then

> >> > difficult to avoid copy-propgating things...

> >> >

> >> > The user can also write

> >> >

> >> > void *f(void *a1, void *a2, __SIZE_TYPE__ a3)

> >> > {

> >> >   __builtin_memcpy (a1, a2, a3);

> >> >   return a1;

> >> > }

> >> >

> >> > so it's good to improve code-gen for that (for the tailcall issue).

> >> For the tail-call, issue should we artificially create a lhs and use that

> >> as return value (perhaps by a separate pass before tailcall) ?

> >>

> >> __builtin_memcpy (a1, a2, a3);

> >> return a1;

> >>

> >> gets transformed to:

> >> _1 = __builtin_memcpy (a1, a2, a3)

> >> return _1;

> >>

> >> So tail-call optimization pass would see the IL in it's expected form.

> >

> > As said, a RTL expert needs to chime in here.  Iff then tail-call

> > itself should do this rewrite.  But if this form is required to make

> > things work (I suppose you checked it _does_ actually work?) then

> > we'd need to make sure later passes do not undo it.  So it looks

> > fragile to me.  OTOH I seem to remember that the flags we set on

> > GIMPLE are merely a hint to RTL expansion and the tailcalling is

> > verified again there?

> 

> Yeah, I verified the form works:

> void *f(void *a1, void *a2, __SIZE_TYPE__ a3)

> {

>   void *t1 = __builtin_memcpy (a1, a2, a3);

>   return t1;

> }

> 

> assembly:

> f:

> .LFB0:

>         .cfi_startproc

>         jmp     memcpy

>         .cfi_endproc


I meant the

void *f(void *a1, void *a2, __SIZE_TYPE__ a3)
{
  __builtin_memcpy (a1, a2, a3);
  return a1;
}

form after your patch to the tailcall pass.

Richard.

> Thanks,

> Prathamesh

> >

> > Thanks,

> > Richard.

> >

> >> Thanks,

> >> Prathamesh

> >> >

> >> > Richard.

> >> >

> >> >> Thanks,

> >> >> Prathamesh

> >> >> >

> >> >> >> But this now interferes with tail-call optimization, because it is not

> >> >> >> able to emit memcpy

> >> >> >> as tail-call anymore due to which the patch regressed 20050503-1.c.

> >> >> >> I am not sure how to workaround this.

> >> >> >>

> >> >> >> Thanks,

> >> >> >> Prathamesh

> >> >> >> >

> >> >> >> > Richard.

> >> >> >>

> >> >> >

> >> >> > --

> >> >> > Richard Biener <rguenther@suse.de>

> >> >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

> >> >>

> >> >>

> >> >

> >> > --

> >> > Richard Biener <rguenther@suse.de>

> >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

> >>

> >>

> >

> > --

> > Richard Biener <rguenther@suse.de>

> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

> 

> 


-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Prathamesh Kulkarni Nov. 25, 2016, 8:31 a.m. | #10
On 25 November 2016 at 13:55, Richard Biener <rguenther@suse.de> wrote:
> On Fri, 25 Nov 2016, Prathamesh Kulkarni wrote:

>

>> On 25 November 2016 at 13:37, Richard Biener <rguenther@suse.de> wrote:

>> > On Fri, 25 Nov 2016, Prathamesh Kulkarni wrote:

>> >

>> >> On 24 November 2016 at 18:08, Richard Biener <rguenther@suse.de> wrote:

>> >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:

>> >> >

>> >> >> On 24 November 2016 at 17:48, Richard Biener <rguenther@suse.de> wrote:

>> >> >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:

>> >> >> >

>> >> >> >> On 24 November 2016 at 14:07, Richard Biener <rguenther@suse.de> wrote:

>> >> >> >> > On Thu, 24 Nov 2016, Prathamesh Kulkarni wrote:

>> >> >> >> >

>> >> >> >> >> Hi,

>> >> >> >> >> Consider following test-case:

>> >> >> >> >>

>> >> >> >> >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3)

>> >> >> >> >> {

>> >> >> >> >>   __builtin_memcpy (a1, a2, a3);

>> >> >> >> >>   return a1;

>> >> >> >> >> }

>> >> >> >> >>

>> >> >> >> >> return a1 can be considered equivalent to return value of memcpy,

>> >> >> >> >> and the call could be emitted as a tail-call.

>> >> >> >> >> gcc doesn't emit the above call to memcpy as a tail-call,

>> >> >> >> >> but if it is changed to:

>> >> >> >> >>

>> >> >> >> >> void *t1 = __builtin_memcpy (a1, a2, a3);

>> >> >> >> >> return t1;

>> >> >> >> >>

>> >> >> >> >> Then memcpy is emitted as a tail-call.

>> >> >> >> >> The attached patch tries to handle the former case.

>> >> >> >> >>

>> >> >> >> >> Bootstrapped+tested on x86_64-unknown-linux-gnu.

>> >> >> >> >> Cross tested on arm*-*-*, aarch64*-*-*

>> >> >> >> >> Does this patch look OK ?

>> >> >> >> >

>> >> >> >> > +/* Return arg, if function returns it's argument or NULL if it doesn't.

>> >> >> >> > */

>> >> >> >> > +tree

>> >> >> >> > +gimple_call_return_arg (gcall *call_stmt)

>> >> >> >> > +{

>> >> >> >> >

>> >> >> >> >

>> >> >> >> > Please just inline it at the single use - the name is not terribly

>> >> >> >> > informative.

>> >> >> >> >

>> >> >> >> > I'm not sure you can rely on code-generation working if you not

>> >> >> >> > effectively change the IL to

>> >> >> >> >

>> >> >> >> >   a1 = __builtin_memcpy (a1, a2, a3);

>> >> >> >> >   return a1;

>> >> >> >> >

>> >> >> >> > someone more familiar with RTL expansion plus tail call emission on

>> >> >> >> > RTL needs to chime in.

>> >> >> >> Well I was trying to copy-propagate function's argument into uses of

>> >> >> >> it's return value if

>> >> >> >> function returned that argument, so the assignment to lhs of call

>> >> >> >> could be made redundant.

>> >> >> >>

>> >> >> >> eg:

>> >> >> >> void *f(void *a1, void *a2, __SIZE_TYPE__ a3)

>> >> >> >> {

>> >> >> >>   void *t1 = __builtin_memcpy (a1, a2, a3);

>> >> >> >>   return t1;

>> >> >> >> }

>> >> >> >>

>> >> >> >> After patch, copyprop transformed it into:

>> >> >> >> t1 = __builtin_memcpy (a1, a2, a3);

>> >> >> >> return a1;

>> >> >> >

>> >> >> > But that's a bad transform -- if we know that t1 == a1 then it's

>> >> >> > better to use t1 as that's readily available in the return register

>> >> >> > while the register for a1 might have been clobbered and thus we

>> >> >> > need to spill it for the later return.

>> >> >> Oh I didn't realize this could possibly pessimize RA.

>> >> >> For test-case:

>> >> >>

>> >> >> void *t1 = memcpy (dest, src, n);

>> >> >> if (t1 != dest)

>> >> >>   __builtin_abort ();

>> >> >>

>> >> >> we could copy-propagate t1 into cond_expr and make the condition redundant.

>> >> >> However I suppose this particular case could be handled with VRP instead

>> >> >> (t1 and dest should be marked equivalent) ?

>> >> >

>> >> > Yeah, exposing this to value-numbering in general can enable some

>> >> > optimizations (but I wouldn't put it in copyprop).  Note it's then

>> >> > difficult to avoid copy-propgating things...

>> >> >

>> >> > The user can also write

>> >> >

>> >> > void *f(void *a1, void *a2, __SIZE_TYPE__ a3)

>> >> > {

>> >> >   __builtin_memcpy (a1, a2, a3);

>> >> >   return a1;

>> >> > }

>> >> >

>> >> > so it's good to improve code-gen for that (for the tailcall issue).

>> >> For the tail-call, issue should we artificially create a lhs and use that

>> >> as return value (perhaps by a separate pass before tailcall) ?

>> >>

>> >> __builtin_memcpy (a1, a2, a3);

>> >> return a1;

>> >>

>> >> gets transformed to:

>> >> _1 = __builtin_memcpy (a1, a2, a3)

>> >> return _1;

>> >>

>> >> So tail-call optimization pass would see the IL in it's expected form.

>> >

>> > As said, a RTL expert needs to chime in here.  Iff then tail-call

>> > itself should do this rewrite.  But if this form is required to make

>> > things work (I suppose you checked it _does_ actually work?) then

>> > we'd need to make sure later passes do not undo it.  So it looks

>> > fragile to me.  OTOH I seem to remember that the flags we set on

>> > GIMPLE are merely a hint to RTL expansion and the tailcalling is

>> > verified again there?

>>

>> Yeah, I verified the form works:

>> void *f(void *a1, void *a2, __SIZE_TYPE__ a3)

>> {

>>   void *t1 = __builtin_memcpy (a1, a2, a3);

>>   return t1;

>> }

>>

>> assembly:

>> f:

>> .LFB0:

>>         .cfi_startproc

>>         jmp     memcpy

>>         .cfi_endproc

>

> I meant the

>

> void *f(void *a1, void *a2, __SIZE_TYPE__ a3)

> {

>   __builtin_memcpy (a1, a2, a3);

>   return a1;

> }

>

> form after your patch to the tailcall pass.

Oops, sorry -;)
Yes, before the patch:
f:
.LFB0:
        .cfi_startproc
        subq    $8, %rsp
        .cfi_def_cfa_offset 16
        call    memcpy
        addq    $8, %rsp
        .cfi_def_cfa_offset 8
        ret
        .cfi_endproc

after patch:
f:
.LFB0:
        .cfi_startproc
        jmp     memcpy
        .cfi_endproc

Thanks,
Prathamesh
>

> Richard.

>

>> Thanks,

>> Prathamesh

>> >

>> > Thanks,

>> > Richard.

>> >

>> >> Thanks,

>> >> Prathamesh

>> >> >

>> >> > Richard.

>> >> >

>> >> >> Thanks,

>> >> >> Prathamesh

>> >> >> >

>> >> >> >> But this now interferes with tail-call optimization, because it is not

>> >> >> >> able to emit memcpy

>> >> >> >> as tail-call anymore due to which the patch regressed 20050503-1.c.

>> >> >> >> I am not sure how to workaround this.

>> >> >> >>

>> >> >> >> Thanks,

>> >> >> >> Prathamesh

>> >> >> >> >

>> >> >> >> > Richard.

>> >> >> >>

>> >> >> >

>> >> >> > --

>> >> >> > Richard Biener <rguenther@suse.de>

>> >> >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

>> >> >>

>> >> >>

>> >> >

>> >> > --

>> >> > Richard Biener <rguenther@suse.de>

>> >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

>> >>

>> >>

>> >

>> > --

>> > Richard Biener <rguenther@suse.de>

>> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

>>

>>

>

> --

> Richard Biener <rguenther@suse.de>

> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Jeff Law Nov. 25, 2016, 3:47 p.m. | #11
On 11/25/2016 01:07 AM, Richard Biener wrote:

>> For the tail-call, issue should we artificially create a lhs and use that

>> as return value (perhaps by a separate pass before tailcall) ?

>>

>> __builtin_memcpy (a1, a2, a3);

>> return a1;

>>

>> gets transformed to:

>> _1 = __builtin_memcpy (a1, a2, a3)

>> return _1;

>>

>> So tail-call optimization pass would see the IL in it's expected form.

>

> As said, a RTL expert needs to chime in here.  Iff then tail-call

> itself should do this rewrite.  But if this form is required to make

> things work (I suppose you checked it _does_ actually work?) then

> we'd need to make sure later passes do not undo it.  So it looks

> fragile to me.  OTOH I seem to remember that the flags we set on

> GIMPLE are merely a hint to RTL expansion and the tailcalling is

> verified again there?

So tail calling actually sits on the border between trees and RTL. 
Essentially it's an expand-time decision as we use information from 
trees as well as low level target information.

I would not expect the former sequence to tail call.  The tail calling 
code does not know that the return value from memcpy will be a1.  Thus 
the tail calling code has to assume that it'll have to copy a1 into the 
return register after returning from memcpy, which obviously can't be 
done if we tail called memcpy.

The second form is much more likely to turn into a tail call sequence 
because the return value from memcpy will be sitting in the proper 
register.  This form out to work for most calling conventions that allow 
tail calls.

We could (in theory) try and exploit the fact that memcpy returns its 
first argument as a return value, but that would only be helpful on a 
target where the first argument and return value use the same register. 
So I'd have a slight preference to rewriting per Prathamesh's suggestion 
above since it's more general.


Jeff

Patch hide | download patch | download mbox

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-copyprop-3.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-copyprop-3.c
new file mode 100644
index 0000000..8319e9f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-copyprop-3.c
@@ -0,0 +1,35 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-copyprop-slim" } */
+
+void f(char *dest, const char *src, __SIZE_TYPE__ n)
+{
+  char *t1 = __builtin_strcpy (dest, src);
+  if (t1 != dest)
+    __builtin_abort ();
+
+  char *t2 = __builtin_strncpy (dest, src, n);
+  if (t2 != dest)
+    __builtin_abort ();
+
+  char *t3 = __builtin_strcat (dest, src);
+  if (t3 != dest)
+    __builtin_abort ();
+
+  char *t4 = __builtin_strncat (dest, src, n);
+  if (t4 != dest)
+    __builtin_abort ();
+
+  void *t5 = __builtin_memmove (dest, src, n);
+  if (t5 != dest)
+    __builtin_abort ();
+
+  void *t6 = __builtin_memcpy (dest, src, n);
+  if (t6 != dest)
+    __builtin_abort ();
+
+  void *t7 = __builtin_memset (dest, 0, n);
+  if (t7 != dest)
+    __builtin_abort ();
+}
+
+/* { dg-final { scan-tree-dump-not "__builtin_abort" "copyprop1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-copyprop-4.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-copyprop-4.c
new file mode 100644
index 0000000..199074d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-copyprop-4.c
@@ -0,0 +1,10 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-copyprop" } */
+
+char *f(char *dest, const char *src)
+{
+  char *t1 = __builtin_strcpy (dest, src);
+  return t1; 
+}
+
+/* { dg-final { scan-tree-dump "return dest_\[0-9\]*\\(D\\);" "copyprop1" } } */
diff --git a/gcc/tree-ssa-copy.c b/gcc/tree-ssa-copy.c
index abc6205..77f4ad7 100644
--- a/gcc/tree-ssa-copy.c
+++ b/gcc/tree-ssa-copy.c
@@ -80,6 +80,9 @@  stmt_may_generate_copy (gimple *stmt)
   if (gimple_code (stmt) == GIMPLE_PHI)
     return !SSA_NAME_OCCURS_IN_ABNORMAL_PHI (gimple_phi_result (stmt));
 
+  if (gimple_code (stmt) == GIMPLE_CALL)
+    return true;
+
   if (gimple_code (stmt) != GIMPLE_ASSIGN)
     return false;
 
@@ -252,6 +255,40 @@  copy_prop_visit_cond_stmt (gimple *stmt, edge *taken_edge_p)
   return retval;
 }
 
+/* Check if call returns one of it's arguments, and in that
+   case set copy-of (lhs) to the returned argument.  */
+
+static enum ssa_prop_result
+copy_prop_visit_call (gcall *call_stmt, tree *result_p)
+{
+  tree lhs = gimple_call_lhs (call_stmt);
+  if (!lhs || TREE_CODE (lhs) != SSA_NAME)
+    return SSA_PROP_VARYING;
+
+  unsigned rf = gimple_call_return_flags (call_stmt);
+  if (rf & ERF_RETURNS_ARG)
+    {
+      unsigned argnum = rf & ERF_RETURN_ARG_MASK;
+      if (argnum < gimple_call_num_args (call_stmt))
+	{
+	  tree arg = gimple_call_arg (call_stmt, argnum);
+	  if (TREE_CODE (arg) == SSA_NAME)
+	    {
+	      arg = valueize_val (arg);
+	      if (!may_propagate_copy (lhs, arg))
+		return SSA_PROP_VARYING;
+
+	      *result_p = lhs;
+	      if (set_copy_of_val (*result_p, arg))
+		return SSA_PROP_INTERESTING;
+	      else
+		return SSA_PROP_NOT_INTERESTING;
+	    }
+	}
+    }
+
+  return SSA_PROP_VARYING;
+}
 
 /* Evaluate statement STMT.  If the statement produces a new output
    value, return SSA_PROP_INTERESTING and store the SSA_NAME holding
@@ -290,6 +327,8 @@  copy_prop_visit_stmt (gimple *stmt, edge *taken_edge_p, tree *result_p)
 	 jump.  */
       retval = copy_prop_visit_cond_stmt (stmt, taken_edge_p);
     }
+  else if (gcall *call_stmt = dyn_cast<gcall *> (stmt))
+    retval = copy_prop_visit_call (call_stmt, result_p);
   else
     retval = SSA_PROP_VARYING;