Message ID | CAAgBjMmjBidSVB02pv8oW5ZJhxP-_yAK9nC6e8KxkJz-EgxK7Q@mail.gmail.com |
---|---|
State | Superseded |
Headers | show |
On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote: > On 1 December 2016 at 18:26, Richard Biener <rguenther@suse.de> wrote: > > On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote: > > > >> On 1 December 2016 at 17:40, Richard Biener <rguenther@suse.de> wrote: > >> > On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote: > >> > > >> >> On 25 November 2016 at 21:17, Jeff Law <law@redhat.com> wrote: > >> >> > On 11/25/2016 01:07 AM, Richard Biener wrote: > >> >> > > >> >> >>> For the tail-call, issue should we artificially create a lhs and use that > >> >> >>> as return value (perhaps by a separate pass before tailcall) ? > >> >> >>> > >> >> >>> __builtin_memcpy (a1, a2, a3); > >> >> >>> return a1; > >> >> >>> > >> >> >>> gets transformed to: > >> >> >>> _1 = __builtin_memcpy (a1, a2, a3) > >> >> >>> return _1; > >> >> >>> > >> >> >>> So tail-call optimization pass would see the IL in it's expected form. > >> >> >> > >> >> >> > >> >> >> As said, a RTL expert needs to chime in here. Iff then tail-call > >> >> >> itself should do this rewrite. But if this form is required to make > >> >> >> things work (I suppose you checked it _does_ actually work?) then > >> >> >> we'd need to make sure later passes do not undo it. So it looks > >> >> >> fragile to me. OTOH I seem to remember that the flags we set on > >> >> >> GIMPLE are merely a hint to RTL expansion and the tailcalling is > >> >> >> verified again there? > >> >> > > >> >> > So tail calling actually sits on the border between trees and RTL. > >> >> > Essentially it's an expand-time decision as we use information from trees as > >> >> > well as low level target information. > >> >> > > >> >> > I would not expect the former sequence to tail call. The tail calling code > >> >> > does not know that the return value from memcpy will be a1. Thus the tail > >> >> > calling code has to assume that it'll have to copy a1 into the return > >> >> > register after returning from memcpy, which obviously can't be done if we > >> >> > tail called memcpy. > >> >> > > >> >> > The second form is much more likely to turn into a tail call sequence > >> >> > because the return value from memcpy will be sitting in the proper register. > >> >> > This form out to work for most calling conventions that allow tail calls. > >> >> > > >> >> > We could (in theory) try and exploit the fact that memcpy returns its first > >> >> > argument as a return value, but that would only be helpful on a target where > >> >> > the first argument and return value use the same register. So I'd have a > >> >> > slight preference to rewriting per Prathamesh's suggestion above since it's > >> >> > more general. > >> >> Thanks for the suggestion. The attached patch creates artificial lhs, > >> >> and returns it if the function returns it's argument and that argument > >> >> is used as return-value. > >> >> > >> >> eg: > >> >> f (void * a1, void * a2, long unsigned int a3) > >> >> { > >> >> <bb 2> [0.0%]: > >> >> # .MEM_5 = VDEF <.MEM_1(D)> > >> >> __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D)); > >> >> # VUSE <.MEM_5> > >> >> return a1_2(D); > >> >> > >> >> } > >> >> > >> >> is transformed to: > >> >> f (void * a1, void * a2, long unsigned int a3) > >> >> { > >> >> void * _6; > >> >> > >> >> <bb 2> [0.0%]: > >> >> # .MEM_5 = VDEF <.MEM_1(D)> > >> >> _6 = __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D)); > >> >> # VUSE <.MEM_5> > >> >> return _6; > >> >> > >> >> } > >> >> > >> >> While testing, I came across an issue with function f() defined > >> >> intail-padding1.C: > >> >> struct X > >> >> { > >> >> ~X() {} > >> >> int n; > >> >> char d; > >> >> }; > >> >> > >> >> X f() > >> >> { > >> >> X nrvo; > >> >> __builtin_memset (&nrvo, 0, sizeof(X)); > >> >> return nrvo; > >> >> } > >> >> > >> >> input to the pass: > >> >> X f() () > >> >> { > >> >> <bb 2> [0.0%]: > >> >> # .MEM_3 = VDEF <.MEM_1(D)> > >> >> __builtin_memset (nrvo_2(D), 0, 8); > >> >> # VUSE <.MEM_3> > >> >> return nrvo_2(D); > >> >> > >> >> } > >> >> > >> >> verify_gimple_return failed with: > >> >> tail-padding1.C:13:1: error: invalid conversion in return statement > >> >> } > >> >> ^ > >> >> struct X > >> >> > >> >> struct X & > >> >> > >> >> # VUSE <.MEM_3> > >> >> return _4; > >> >> > >> >> It seems the return type of function (struct X) differs with the type > >> >> of return value (struct X&). > >> >> Not sure how this is possible ? > >> > > >> > You need to honor DECL_BY_REFERENCE of DECL_RESULT. > >> Thanks! Gating on !DECL_BY_REFERENCE (DECL_RESULT (cfun->decl)) > >> resolved the error. > >> Does the attached version look OK ? > > > > + ass_var = make_ssa_name (TREE_TYPE (arg)); > > > > can you try > > > > ass_var = copy_ssa_name (arg); > > > > instead? That way the underlying decl should make sure the > > DECL_BY_REFERENCE check in the IL verification works. > Done in the attached version and verified tail-padding1.C passes with > the change. > Does it look OK ? + if (!ass_var) + { + /* Check if function returns one if it's arguments + and that argument is used as return value. + In that case create an artificial lhs to call_stmt, + and set it as the return value. */ + + unsigned rf = gimple_call_return_flags (call); + if (rf & ERF_RETURNS_ARG) + { + unsigned argnum = rf & ERF_RETURN_ARG_MASK; + if (argnum < gimple_call_num_args (call) + && ret_stmt) check && ret_stmt above together with !ass_var. + { + tree arg = gimple_call_arg (call, argnum); + tree retval = gimple_return_retval (ret_stmt); + if (retval + && TREE_CODE (retval) == SSA_NAME + && operand_equal_p (retval, arg, 0) + && !DECL_BY_REFERENCE (DECL_RESULT (cfun->decl))) the DECL_BY_REFERENCE check can be removed now (I think). Richard. > Bootstrap+test in progress on x86_64-unknown-linux-gnu. > > Thanks, > Prathamesh > > > > Thanks, > > Richard. > > > > > >> Validation in progress. > >> > >> Thanks, > >> Prathamesh > >> > > >> >> To work around that, I guarded the transform on: > >> >> useless_type_conversion_p (TREE_TYPE (TREE_TYPE (cfun->decl)), > >> >> TREE_TYPE (retval))) > >> >> > >> >> in the patch. Does that look OK ? > >> >> > >> >> Bootstrap+tested on x86_64-unknown-linux-gnu with --enable-languages=all,ada. > >> >> Cross-tested on arm*-*-*, aarch64*-*-*. > >> >> > >> >> Thanks, > >> >> Prathamesh > >> >> > > >> >> > > >> >> > Jeff > >> >> > >> > > >> > -- > >> > Richard Biener <rguenther@suse.de> > >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg) > >> > > > > -- > > Richard Biener <rguenther@suse.de> > > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg) > -- Richard Biener <rguenther@suse.de> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
On 1 December 2016 at 18:38, Richard Biener <rguenther@suse.de> wrote: > On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote: > >> On 1 December 2016 at 18:26, Richard Biener <rguenther@suse.de> wrote: >> > On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote: >> > >> >> On 1 December 2016 at 17:40, Richard Biener <rguenther@suse.de> wrote: >> >> > On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote: >> >> > >> >> >> On 25 November 2016 at 21:17, Jeff Law <law@redhat.com> wrote: >> >> >> > On 11/25/2016 01:07 AM, Richard Biener wrote: >> >> >> > >> >> >> >>> For the tail-call, issue should we artificially create a lhs and use that >> >> >> >>> as return value (perhaps by a separate pass before tailcall) ? >> >> >> >>> >> >> >> >>> __builtin_memcpy (a1, a2, a3); >> >> >> >>> return a1; >> >> >> >>> >> >> >> >>> gets transformed to: >> >> >> >>> _1 = __builtin_memcpy (a1, a2, a3) >> >> >> >>> return _1; >> >> >> >>> >> >> >> >>> So tail-call optimization pass would see the IL in it's expected form. >> >> >> >> >> >> >> >> >> >> >> >> As said, a RTL expert needs to chime in here. Iff then tail-call >> >> >> >> itself should do this rewrite. But if this form is required to make >> >> >> >> things work (I suppose you checked it _does_ actually work?) then >> >> >> >> we'd need to make sure later passes do not undo it. So it looks >> >> >> >> fragile to me. OTOH I seem to remember that the flags we set on >> >> >> >> GIMPLE are merely a hint to RTL expansion and the tailcalling is >> >> >> >> verified again there? >> >> >> > >> >> >> > So tail calling actually sits on the border between trees and RTL. >> >> >> > Essentially it's an expand-time decision as we use information from trees as >> >> >> > well as low level target information. >> >> >> > >> >> >> > I would not expect the former sequence to tail call. The tail calling code >> >> >> > does not know that the return value from memcpy will be a1. Thus the tail >> >> >> > calling code has to assume that it'll have to copy a1 into the return >> >> >> > register after returning from memcpy, which obviously can't be done if we >> >> >> > tail called memcpy. >> >> >> > >> >> >> > The second form is much more likely to turn into a tail call sequence >> >> >> > because the return value from memcpy will be sitting in the proper register. >> >> >> > This form out to work for most calling conventions that allow tail calls. >> >> >> > >> >> >> > We could (in theory) try and exploit the fact that memcpy returns its first >> >> >> > argument as a return value, but that would only be helpful on a target where >> >> >> > the first argument and return value use the same register. So I'd have a >> >> >> > slight preference to rewriting per Prathamesh's suggestion above since it's >> >> >> > more general. >> >> >> Thanks for the suggestion. The attached patch creates artificial lhs, >> >> >> and returns it if the function returns it's argument and that argument >> >> >> is used as return-value. >> >> >> >> >> >> eg: >> >> >> f (void * a1, void * a2, long unsigned int a3) >> >> >> { >> >> >> <bb 2> [0.0%]: >> >> >> # .MEM_5 = VDEF <.MEM_1(D)> >> >> >> __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D)); >> >> >> # VUSE <.MEM_5> >> >> >> return a1_2(D); >> >> >> >> >> >> } >> >> >> >> >> >> is transformed to: >> >> >> f (void * a1, void * a2, long unsigned int a3) >> >> >> { >> >> >> void * _6; >> >> >> >> >> >> <bb 2> [0.0%]: >> >> >> # .MEM_5 = VDEF <.MEM_1(D)> >> >> >> _6 = __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D)); >> >> >> # VUSE <.MEM_5> >> >> >> return _6; >> >> >> >> >> >> } >> >> >> >> >> >> While testing, I came across an issue with function f() defined >> >> >> intail-padding1.C: >> >> >> struct X >> >> >> { >> >> >> ~X() {} >> >> >> int n; >> >> >> char d; >> >> >> }; >> >> >> >> >> >> X f() >> >> >> { >> >> >> X nrvo; >> >> >> __builtin_memset (&nrvo, 0, sizeof(X)); >> >> >> return nrvo; >> >> >> } >> >> >> >> >> >> input to the pass: >> >> >> X f() () >> >> >> { >> >> >> <bb 2> [0.0%]: >> >> >> # .MEM_3 = VDEF <.MEM_1(D)> >> >> >> __builtin_memset (nrvo_2(D), 0, 8); >> >> >> # VUSE <.MEM_3> >> >> >> return nrvo_2(D); >> >> >> >> >> >> } >> >> >> >> >> >> verify_gimple_return failed with: >> >> >> tail-padding1.C:13:1: error: invalid conversion in return statement >> >> >> } >> >> >> ^ >> >> >> struct X >> >> >> >> >> >> struct X & >> >> >> >> >> >> # VUSE <.MEM_3> >> >> >> return _4; >> >> >> >> >> >> It seems the return type of function (struct X) differs with the type >> >> >> of return value (struct X&). >> >> >> Not sure how this is possible ? >> >> > >> >> > You need to honor DECL_BY_REFERENCE of DECL_RESULT. >> >> Thanks! Gating on !DECL_BY_REFERENCE (DECL_RESULT (cfun->decl)) >> >> resolved the error. >> >> Does the attached version look OK ? >> > >> > + ass_var = make_ssa_name (TREE_TYPE (arg)); >> > >> > can you try >> > >> > ass_var = copy_ssa_name (arg); >> > >> > instead? That way the underlying decl should make sure the >> > DECL_BY_REFERENCE check in the IL verification works. >> Done in the attached version and verified tail-padding1.C passes with >> the change. >> Does it look OK ? > > + if (!ass_var) > + { > + /* Check if function returns one if it's arguments > + and that argument is used as return value. > + In that case create an artificial lhs to call_stmt, > + and set it as the return value. */ > + > + unsigned rf = gimple_call_return_flags (call); > + if (rf & ERF_RETURNS_ARG) > + { > + unsigned argnum = rf & ERF_RETURN_ARG_MASK; > + if (argnum < gimple_call_num_args (call) > + && ret_stmt) > > check && ret_stmt above together with !ass_var. Oops, sorry about that. > > + { > + tree arg = gimple_call_arg (call, argnum); > + tree retval = gimple_return_retval (ret_stmt); > + if (retval > + && TREE_CODE (retval) == SSA_NAME > + && operand_equal_p (retval, arg, 0) > + && !DECL_BY_REFERENCE (DECL_RESULT > (cfun->decl))) > > the DECL_BY_REFERENCE check can be removed now (I think). Well after removing DECL_BY_REFERENCE, verify_gimple still fails but differently: tail-padding1.C:13:1: error: RESULT_DECL should be read only when DECL_BY_REFERENCE is set } ^ while verifying SSA_NAME nrvo_4 in statement # .MEM_3 = VDEF <.MEM_1(D)> nrvo_4 = __builtin_memset (nrvo_2(D), 0, 8); tail-padding1.C:13:1: internal compiler error: verify_ssa failed Thanks, Prathamesh > > Richard. > >> Bootstrap+test in progress on x86_64-unknown-linux-gnu. >> >> Thanks, >> Prathamesh >> > >> > Thanks, >> > Richard. >> > >> > >> >> Validation in progress. >> >> >> >> Thanks, >> >> Prathamesh >> >> > >> >> >> To work around that, I guarded the transform on: >> >> >> useless_type_conversion_p (TREE_TYPE (TREE_TYPE (cfun->decl)), >> >> >> TREE_TYPE (retval))) >> >> >> >> >> >> in the patch. Does that look OK ? >> >> >> >> >> >> Bootstrap+tested on x86_64-unknown-linux-gnu with --enable-languages=all,ada. >> >> >> Cross-tested on arm*-*-*, aarch64*-*-*. >> >> >> >> >> >> Thanks, >> >> >> Prathamesh >> >> >> > >> >> >> > >> >> >> > Jeff >> >> >> >> >> > >> >> > -- >> >> > Richard Biener <rguenther@suse.de> >> >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg) >> >> >> > >> > -- >> > Richard Biener <rguenther@suse.de> >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg) >> > > -- > Richard Biener <rguenther@suse.de> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote: > On 1 December 2016 at 18:38, Richard Biener <rguenther@suse.de> wrote: > > On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote: > > > >> On 1 December 2016 at 18:26, Richard Biener <rguenther@suse.de> wrote: > >> > On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote: > >> > > >> >> On 1 December 2016 at 17:40, Richard Biener <rguenther@suse.de> wrote: > >> >> > On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote: > >> >> > > >> >> >> On 25 November 2016 at 21:17, Jeff Law <law@redhat.com> wrote: > >> >> >> > On 11/25/2016 01:07 AM, Richard Biener wrote: > >> >> >> > > >> >> >> >>> For the tail-call, issue should we artificially create a lhs and use that > >> >> >> >>> as return value (perhaps by a separate pass before tailcall) ? > >> >> >> >>> > >> >> >> >>> __builtin_memcpy (a1, a2, a3); > >> >> >> >>> return a1; > >> >> >> >>> > >> >> >> >>> gets transformed to: > >> >> >> >>> _1 = __builtin_memcpy (a1, a2, a3) > >> >> >> >>> return _1; > >> >> >> >>> > >> >> >> >>> So tail-call optimization pass would see the IL in it's expected form. > >> >> >> >> > >> >> >> >> > >> >> >> >> As said, a RTL expert needs to chime in here. Iff then tail-call > >> >> >> >> itself should do this rewrite. But if this form is required to make > >> >> >> >> things work (I suppose you checked it _does_ actually work?) then > >> >> >> >> we'd need to make sure later passes do not undo it. So it looks > >> >> >> >> fragile to me. OTOH I seem to remember that the flags we set on > >> >> >> >> GIMPLE are merely a hint to RTL expansion and the tailcalling is > >> >> >> >> verified again there? > >> >> >> > > >> >> >> > So tail calling actually sits on the border between trees and RTL. > >> >> >> > Essentially it's an expand-time decision as we use information from trees as > >> >> >> > well as low level target information. > >> >> >> > > >> >> >> > I would not expect the former sequence to tail call. The tail calling code > >> >> >> > does not know that the return value from memcpy will be a1. Thus the tail > >> >> >> > calling code has to assume that it'll have to copy a1 into the return > >> >> >> > register after returning from memcpy, which obviously can't be done if we > >> >> >> > tail called memcpy. > >> >> >> > > >> >> >> > The second form is much more likely to turn into a tail call sequence > >> >> >> > because the return value from memcpy will be sitting in the proper register. > >> >> >> > This form out to work for most calling conventions that allow tail calls. > >> >> >> > > >> >> >> > We could (in theory) try and exploit the fact that memcpy returns its first > >> >> >> > argument as a return value, but that would only be helpful on a target where > >> >> >> > the first argument and return value use the same register. So I'd have a > >> >> >> > slight preference to rewriting per Prathamesh's suggestion above since it's > >> >> >> > more general. > >> >> >> Thanks for the suggestion. The attached patch creates artificial lhs, > >> >> >> and returns it if the function returns it's argument and that argument > >> >> >> is used as return-value. > >> >> >> > >> >> >> eg: > >> >> >> f (void * a1, void * a2, long unsigned int a3) > >> >> >> { > >> >> >> <bb 2> [0.0%]: > >> >> >> # .MEM_5 = VDEF <.MEM_1(D)> > >> >> >> __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D)); > >> >> >> # VUSE <.MEM_5> > >> >> >> return a1_2(D); > >> >> >> > >> >> >> } > >> >> >> > >> >> >> is transformed to: > >> >> >> f (void * a1, void * a2, long unsigned int a3) > >> >> >> { > >> >> >> void * _6; > >> >> >> > >> >> >> <bb 2> [0.0%]: > >> >> >> # .MEM_5 = VDEF <.MEM_1(D)> > >> >> >> _6 = __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D)); > >> >> >> # VUSE <.MEM_5> > >> >> >> return _6; > >> >> >> > >> >> >> } > >> >> >> > >> >> >> While testing, I came across an issue with function f() defined > >> >> >> intail-padding1.C: > >> >> >> struct X > >> >> >> { > >> >> >> ~X() {} > >> >> >> int n; > >> >> >> char d; > >> >> >> }; > >> >> >> > >> >> >> X f() > >> >> >> { > >> >> >> X nrvo; > >> >> >> __builtin_memset (&nrvo, 0, sizeof(X)); > >> >> >> return nrvo; > >> >> >> } > >> >> >> > >> >> >> input to the pass: > >> >> >> X f() () > >> >> >> { > >> >> >> <bb 2> [0.0%]: > >> >> >> # .MEM_3 = VDEF <.MEM_1(D)> > >> >> >> __builtin_memset (nrvo_2(D), 0, 8); > >> >> >> # VUSE <.MEM_3> > >> >> >> return nrvo_2(D); > >> >> >> > >> >> >> } > >> >> >> > >> >> >> verify_gimple_return failed with: > >> >> >> tail-padding1.C:13:1: error: invalid conversion in return statement > >> >> >> } > >> >> >> ^ > >> >> >> struct X > >> >> >> > >> >> >> struct X & > >> >> >> > >> >> >> # VUSE <.MEM_3> > >> >> >> return _4; > >> >> >> > >> >> >> It seems the return type of function (struct X) differs with the type > >> >> >> of return value (struct X&). > >> >> >> Not sure how this is possible ? > >> >> > > >> >> > You need to honor DECL_BY_REFERENCE of DECL_RESULT. > >> >> Thanks! Gating on !DECL_BY_REFERENCE (DECL_RESULT (cfun->decl)) > >> >> resolved the error. > >> >> Does the attached version look OK ? > >> > > >> > + ass_var = make_ssa_name (TREE_TYPE (arg)); > >> > > >> > can you try > >> > > >> > ass_var = copy_ssa_name (arg); > >> > > >> > instead? That way the underlying decl should make sure the > >> > DECL_BY_REFERENCE check in the IL verification works. > >> Done in the attached version and verified tail-padding1.C passes with > >> the change. > >> Does it look OK ? > > > > + if (!ass_var) > > + { > > + /* Check if function returns one if it's arguments > > + and that argument is used as return value. > > + In that case create an artificial lhs to call_stmt, > > + and set it as the return value. */ > > + > > + unsigned rf = gimple_call_return_flags (call); > > + if (rf & ERF_RETURNS_ARG) > > + { > > + unsigned argnum = rf & ERF_RETURN_ARG_MASK; > > + if (argnum < gimple_call_num_args (call) > > + && ret_stmt) > > > > check && ret_stmt above together with !ass_var. > Oops, sorry about that. > > > > + { > > + tree arg = gimple_call_arg (call, argnum); > > + tree retval = gimple_return_retval (ret_stmt); > > + if (retval > > + && TREE_CODE (retval) == SSA_NAME > > + && operand_equal_p (retval, arg, 0) > > + && !DECL_BY_REFERENCE (DECL_RESULT > > (cfun->decl))) > > > > the DECL_BY_REFERENCE check can be removed now (I think). > Well after removing DECL_BY_REFERENCE, verify_gimple still fails but > differently: > > tail-padding1.C:13:1: error: RESULT_DECL should be read only when > DECL_BY_REFERENCE is set > } > ^ > while verifying SSA_NAME nrvo_4 in statement > # .MEM_3 = VDEF <.MEM_1(D)> > nrvo_4 = __builtin_memset (nrvo_2(D), 0, 8); > tail-padding1.C:13:1: internal compiler error: verify_ssa failed Hmm, ok. Not sure why we enforce this. Note that in the end this patch looks fishy -- iff we really need the LHS on the assignment for correctness if we have the tailcall flag set then what guarantees that later passes do not remove it again? So anybody removing a LHS would need to unset the tailcall flag? Saying again that I don't know enough about the RTL part of tailcall expansion. Richard. > Thanks, > Prathamesh > > > > Richard. > > > >> Bootstrap+test in progress on x86_64-unknown-linux-gnu. > >> > >> Thanks, > >> Prathamesh > >> > > >> > Thanks, > >> > Richard. > >> > > >> > > >> >> Validation in progress. > >> >> > >> >> Thanks, > >> >> Prathamesh > >> >> > > >> >> >> To work around that, I guarded the transform on: > >> >> >> useless_type_conversion_p (TREE_TYPE (TREE_TYPE (cfun->decl)), > >> >> >> TREE_TYPE (retval))) > >> >> >> > >> >> >> in the patch. Does that look OK ? > >> >> >> > >> >> >> Bootstrap+tested on x86_64-unknown-linux-gnu with --enable-languages=all,ada. > >> >> >> Cross-tested on arm*-*-*, aarch64*-*-*. > >> >> >> > >> >> >> Thanks, > >> >> >> Prathamesh > >> >> >> > > >> >> >> > > >> >> >> > Jeff > >> >> >> > >> >> > > >> >> > -- > >> >> > Richard Biener <rguenther@suse.de> > >> >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg) > >> >> > >> > > >> > -- > >> > Richard Biener <rguenther@suse.de> > >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg) > >> > > > > -- > > Richard Biener <rguenther@suse.de> > > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg) > > -- Richard Biener <rguenther@suse.de> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
On 12/01/2016 06:22 AM, Richard Biener wrote: >> Well after removing DECL_BY_REFERENCE, verify_gimple still fails but >> differently: >> >> tail-padding1.C:13:1: error: RESULT_DECL should be read only when >> DECL_BY_REFERENCE is set >> } >> ^ >> while verifying SSA_NAME nrvo_4 in statement >> # .MEM_3 = VDEF <.MEM_1(D)> >> nrvo_4 = __builtin_memset (nrvo_2(D), 0, 8); >> tail-padding1.C:13:1: internal compiler error: verify_ssa failed > > Hmm, ok. Not sure why we enforce this. I don't know either. But I would start by looking at tree-nrv.c since it looks (based on the variable names) that the named-value-return optimization kicked in. > > Note that in the end this patch looks fishy -- iff we really need > the LHS on the assignment for correctness if we have the tailcall > flag set then what guarantees that later passes do not remove it > again? So anybody removing a LHS would need to unset the tailcall flag? > > Saying again that I don't know enough about the RTL part of tailcall > expansion. The LHS on the assignment makes it easier to identify when a tail call is possible. It's not needed for correctness. Not having the LHS on the assignment just means we won't get an optimized tail call. Under what circumstances would the LHS possibly be removed? We know the return statement references the LHS, so it's not going to be something that DCE will do. jeff
On Thu, 1 Dec 2016, Jeff Law wrote: > On 12/01/2016 06:22 AM, Richard Biener wrote: > > > Well after removing DECL_BY_REFERENCE, verify_gimple still fails but > > > differently: > > > > > > tail-padding1.C:13:1: error: RESULT_DECL should be read only when > > > DECL_BY_REFERENCE is set > > > } > > > ^ > > > while verifying SSA_NAME nrvo_4 in statement > > > # .MEM_3 = VDEF <.MEM_1(D)> > > > nrvo_4 = __builtin_memset (nrvo_2(D), 0, 8); > > > tail-padding1.C:13:1: internal compiler error: verify_ssa failed > > > > Hmm, ok. Not sure why we enforce this. > I don't know either. But I would start by looking at tree-nrv.c since it > looks (based on the variable names) that the named-value-return optimization > kicked in. > > > > > Note that in the end this patch looks fishy -- iff we really need > > the LHS on the assignment for correctness if we have the tailcall > > flag set then what guarantees that later passes do not remove it > > again? So anybody removing a LHS would need to unset the tailcall flag? > > > > Saying again that I don't know enough about the RTL part of tailcall > > expansion. > The LHS on the assignment makes it easier to identify when a tail call is > possible. It's not needed for correctness. Not having the LHS on the > assignment just means we won't get an optimized tail call. > > Under what circumstances would the LHS possibly be removed? We know the > return statement references the LHS, so it's not going to be something that > DCE will do. Well, I thought Prathamesh added the optimization to copy-propagate the lhs from the returned argument. So we'd have both transforms here. Of course as always the user could have written the code in this way. If the LHS is not required for correctness then I don't think we need to put it there - Pratamesh verified the call is tail-called already if marked by the tailcall pass, even if the LHS is not present. Richard.
On 12/02/2016 01:33 AM, Richard Biener wrote: >> The LHS on the assignment makes it easier to identify when a tail call is >> possible. It's not needed for correctness. Not having the LHS on the >> assignment just means we won't get an optimized tail call. >> >> Under what circumstances would the LHS possibly be removed? We know the >> return statement references the LHS, so it's not going to be something that >> DCE will do. > > Well, I thought Prathamesh added the optimization to copy-propagate > the lhs from the returned argument. So we'd have both transforms here. That seems like a mistake -- the fact that we can copy propagate the LHS from the returned argument is interesting, but in practice I've found it to not be useful to do so. The problem is it makes the value look live across a the call and we're then dependent upon the register allocator to know the trick about the returned argument value and apply it consistently -- which it does not last I checked. I think we're better off leaving the call in the form of LHS = call () if the return value is used. That's going to be more palatable to tail calling. > > Of course as always the user could have written the code in this way. > > If the LHS is not required for correctness then I don't think we need > to put it there - Pratamesh verified the call is tail-called already > if marked by the tailcall pass, even if the LHS is not present. But if the function returns the value from the tail call, then going through an LHS is the right thing to do. Using the magic "argX will be the return value" seems clever, but actually hurts in practice. jeff
On Mon, 5 Dec 2016, Jeff Law wrote: > On 12/02/2016 01:33 AM, Richard Biener wrote: > > > The LHS on the assignment makes it easier to identify when a tail call is > > > possible. It's not needed for correctness. Not having the LHS on the > > > assignment just means we won't get an optimized tail call. > > > > > > Under what circumstances would the LHS possibly be removed? We know the > > > return statement references the LHS, so it's not going to be something > > > that > > > DCE will do. > > > > Well, I thought Prathamesh added the optimization to copy-propagate > > the lhs from the returned argument. So we'd have both transforms here. > That seems like a mistake -- the fact that we can copy propagate the LHS from > the returned argument is interesting, but in practice I've found it to not be > useful to do so. > > The problem is it makes the value look live across a the call and we're then > dependent upon the register allocator to know the trick about the returned > argument value and apply it consistently -- which it does not last I checked. > > I think we're better off leaving the call in the form of LHS = call () if the > return value is used. That's going to be more palatable to tail calling. Yes, that's something I also raised earlier in the thread. Note that any kind of value-numbering probably wants to know the equivalence for simplifications but in the end wants to disable propagating the copy (in fact it should propagate the return value from the point of the call). I suppose I know how to implement that in FRE/PRE given it has separate value-numbering and elimination phases. Something for GCC 8. > > Of course as always the user could have written the code in this way. > > > > If the LHS is not required for correctness then I don't think we need > > to put it there - Pratamesh verified the call is tail-called already > > if marked by the tailcall pass, even if the LHS is not present. > But if the function returns the value from the tail call, then going through > an LHS is the right thing to do. Using the magic "argX will be the return > value" seems clever, but actually hurts in practice. So we do want the reverse transform (for the case of returning by reference that's going to be tricky if not impossible due to the IL hygiene we enforce). Richard.
On 12/06/2016 01:25 AM, Richard Biener wrote: >> But if the function returns the value from the tail call, then going through >> an LHS is the right thing to do. Using the magic "argX will be the return >> value" seems clever, but actually hurts in practice. > > So we do want the reverse transform (for the case of returning by > reference that's going to be tricky if not impossible due to the > IL hygiene we enforce). It might be useful, but I'd like to see instrumentation before doing any significant work on this problem. jeff
On 12/06/2016 03:16 AM, Richard Biener wrote: > On Tue, 6 Dec 2016, Richard Biener wrote: > >> On Mon, 5 Dec 2016, Jeff Law wrote: >> >>> On 12/02/2016 01:33 AM, Richard Biener wrote: >>>>> The LHS on the assignment makes it easier to identify when a tail call is >>>>> possible. It's not needed for correctness. Not having the LHS on the >>>>> assignment just means we won't get an optimized tail call. >>>>> >>>>> Under what circumstances would the LHS possibly be removed? We know the >>>>> return statement references the LHS, so it's not going to be something >>>>> that >>>>> DCE will do. >>>> >>>> Well, I thought Prathamesh added the optimization to copy-propagate >>>> the lhs from the returned argument. So we'd have both transforms here. >>> That seems like a mistake -- the fact that we can copy propagate the LHS from >>> the returned argument is interesting, but in practice I've found it to not be >>> useful to do so. >>> >>> The problem is it makes the value look live across a the call and we're then >>> dependent upon the register allocator to know the trick about the returned >>> argument value and apply it consistently -- which it does not last I checked. >>> >>> I think we're better off leaving the call in the form of LHS = call () if the >>> return value is used. That's going to be more palatable to tail calling. >> >> Yes, that's something I also raised earlier in the thread. Note that >> any kind of value-numbering probably wants to know the equivalence >> for simplifications but in the end wants to disable propagating the >> copy (in fact it should propagate the return value from the point of >> the call). I suppose I know how to implement that in FRE/PRE given it has >> separate value-numbering and elimination phases. Something for GCC 8. > > The following does that (it shows we don't handle separating LHS > and overall stmt effect very well). It optimizes a testcase like > > void *foo (void *p, int c, __SIZE_TYPE__ n) > { > void *q = __builtin_memset (p, c, n); > if (q == p) > return p; > return q; > } > > to > > foo (void * p, int c, long unsigned int n) > { > void * q; > > <bb 2> [0.0%]: > q_7 = __builtin_memset (p_3(D), c_4(D), n_5(D)); > return q_7; > > in early FRE. Yea. Not sure how often something like that would happen in practice, but using the equivalence to simplify rather than for propagation seems like the way to go. I keep thinking about doing some similar in DOM, but haven't gotten around to seeing what the fallout would be. jeff
On Wed, 7 Dec 2016, Jeff Law wrote: > On 12/06/2016 03:16 AM, Richard Biener wrote: > > On Tue, 6 Dec 2016, Richard Biener wrote: > > > > > On Mon, 5 Dec 2016, Jeff Law wrote: > > > > > > > On 12/02/2016 01:33 AM, Richard Biener wrote: > > > > > > The LHS on the assignment makes it easier to identify when a tail > > > > > > call is > > > > > > possible. It's not needed for correctness. Not having the LHS on > > > > > > the > > > > > > assignment just means we won't get an optimized tail call. > > > > > > > > > > > > Under what circumstances would the LHS possibly be removed? We know > > > > > > the > > > > > > return statement references the LHS, so it's not going to be > > > > > > something > > > > > > that > > > > > > DCE will do. > > > > > > > > > > Well, I thought Prathamesh added the optimization to copy-propagate > > > > > the lhs from the returned argument. So we'd have both transforms > > > > > here. > > > > That seems like a mistake -- the fact that we can copy propagate the LHS > > > > from > > > > the returned argument is interesting, but in practice I've found it to > > > > not be > > > > useful to do so. > > > > > > > > The problem is it makes the value look live across a the call and we're > > > > then > > > > dependent upon the register allocator to know the trick about the > > > > returned > > > > argument value and apply it consistently -- which it does not last I > > > > checked. > > > > > > > > I think we're better off leaving the call in the form of LHS = call () > > > > if the > > > > return value is used. That's going to be more palatable to tail > > > > calling. > > > > > > Yes, that's something I also raised earlier in the thread. Note that > > > any kind of value-numbering probably wants to know the equivalence > > > for simplifications but in the end wants to disable propagating the > > > copy (in fact it should propagate the return value from the point of > > > the call). I suppose I know how to implement that in FRE/PRE given it has > > > separate value-numbering and elimination phases. Something for GCC 8. > > > > The following does that (it shows we don't handle separating LHS > > and overall stmt effect very well). It optimizes a testcase like > > > > void *foo (void *p, int c, __SIZE_TYPE__ n) > > { > > void *q = __builtin_memset (p, c, n); > > if (q == p) > > return p; > > return q; > > } > > > > to > > > > foo (void * p, int c, long unsigned int n) > > { > > void * q; > > > > <bb 2> [0.0%]: > > q_7 = __builtin_memset (p_3(D), c_4(D), n_5(D)); > > return q_7; > > > > in early FRE. > Yea. Not sure how often something like that would happen in practice, but > using the equivalence to simplify rather than for propagation seems like the > way to go. > > I keep thinking about doing some similar in DOM, but haven't gotten around to > seeing what the fallout would be. Shouldn't be too bad (it would require to keep an additional what-to-substitute-for-value-X lattice during the DOM walk). But it will still require some "magic" to decide about those conditional equivalences... (I think). Separating "values" from what we substitute during elimination is a good thing in general, so we can be more aggressive with the value parts.
On 12/09/2016 01:10 AM, Richard Biener wrote: >> Yea. Not sure how often something like that would happen in practice, but >> using the equivalence to simplify rather than for propagation seems like the >> way to go. >> >> I keep thinking about doing some similar in DOM, but haven't gotten around to >> seeing what the fallout would be. > > Shouldn't be too bad (it would require to keep an additional > what-to-substitute-for-value-X lattice during the DOM walk). But it > will still require some "magic" to decide about those conditional > equivalences... (I think). > > Separating "values" from what we substitute during elimination is a good > thing in general, so we can be more aggressive with the value parts. I think we'd just need to change SSA_NAME_VALUE from a vector of trees to a pair representation or to have an on-the-side bitmap to indicate values that can be used for simplification but not for propagation. It shouldn't be terrible. jeff
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/tailcall-9.c b/gcc/testsuite/gcc.dg/tree-ssa/tailcall-9.c new file mode 100644 index 0000000..b3fdc6c --- /dev/null +++ b/gcc/testsuite/gcc.dg/tree-ssa/tailcall-9.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fdump-tree-tailc-details" } */ + +void *f(void *a1, void *a2, __SIZE_TYPE__ a3) +{ + __builtin_memcpy (a1, a2, a3); + return a1; +} + +/* { dg-final { scan-tree-dump-times "Found tail call" 1 "tailc" } } */ diff --git a/gcc/tree-tailcall.c b/gcc/tree-tailcall.c index 66a0a4c..6135dc2 100644 --- a/gcc/tree-tailcall.c +++ b/gcc/tree-tailcall.c @@ -401,6 +401,7 @@ find_tail_calls (basic_block bb, struct tailcall **ret) basic_block abb; size_t idx; tree var; + greturn *ret_stmt = NULL; if (!single_succ_p (bb)) return; @@ -408,6 +409,8 @@ find_tail_calls (basic_block bb, struct tailcall **ret) for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi); gsi_prev (&gsi)) { stmt = gsi_stmt (gsi); + if (!ret_stmt) + ret_stmt = dyn_cast<greturn *> (stmt); /* Ignore labels, returns, nops, clobbers and debug stmts. */ if (gimple_code (stmt) == GIMPLE_LABEL @@ -422,6 +425,36 @@ find_tail_calls (basic_block bb, struct tailcall **ret) { call = as_a <gcall *> (stmt); ass_var = gimple_call_lhs (call); + if (!ass_var) + { + /* Check if function returns one if it's arguments + and that argument is used as return value. + In that case create an artificial lhs to call_stmt, + and set it as the return value. */ + + unsigned rf = gimple_call_return_flags (call); + if (rf & ERF_RETURNS_ARG) + { + unsigned argnum = rf & ERF_RETURN_ARG_MASK; + if (argnum < gimple_call_num_args (call) + && ret_stmt) + { + tree arg = gimple_call_arg (call, argnum); + tree retval = gimple_return_retval (ret_stmt); + if (retval + && TREE_CODE (retval) == SSA_NAME + && operand_equal_p (retval, arg, 0) + && !DECL_BY_REFERENCE (DECL_RESULT (cfun->decl))) + { + ass_var = copy_ssa_name (arg); + gimple_call_set_lhs (call, ass_var); + update_stmt (call); + gimple_return_set_retval (ret_stmt, ass_var); + update_stmt (ret_stmt); + } + } + } + } break; }