PR83750: CSE erf/erfc pair

Message ID CAAgBjMmWhVpka6LaP6J3gq=Y+aey9mhWRpQFN=sDY+4nmz8ZQQ@mail.gmail.com
State New
Headers show
Series
  • PR83750: CSE erf/erfc pair
Related show

Commit Message

Prathamesh Kulkarni Nov. 2, 2018, 9:36 a.m.
Hi,
This patch adds two transforms to match.pd to CSE erf/erfc pair.
erfc(x) is canonicalized to 1 - erf(x) and is then reversed to 1 -
erf(x) when canonicalization is disabled and result of erf(x) has
single use within 1 - erf(x).

The patch regressed builtin-nonneg-1.c. The following test-case
reproduces the issue with patch:

void test(double d1) {
  if (signbit(erfc(d1)))
    link_failure_erfc();
}

ssa dump:

  <bb 2> :
  _5 = __builtin_erf (d1_4(D));
  _1 = 1.0e+0 - _5;
  _6 = _1 < 0.0;
  _2 = (int) _6;
  if (_2 != 0)
    goto <bb 3>; [INV]
  else
    goto <bb 4>; [INV]

  <bb 3> :
  link_failure_erfc ();

  <bb 4> :
  return;

As can be seen, erfc(d1) is folded to 1 - erf(d1).
forwprop then transforms the if condition from _2 != 0
to _5 > 1.0e+0 and that defeats DCE thus resulting in link failure
in undefined reference to link_failure_erfc().

So, the patch adds another transform erf(x) > 1 -> 0
which resolves the regression.

Bootstrapped+tested on x86_64-unknown-linux-gnu.
Cross-testing on arm and aarch64 variants in progress.
OK for trunk if passes ?

Thanks,
Prathamesh
2018-11-02  Prathamesh Kulkarni  <prathamesh.kulkarni@linaro.org>

	* match.pd (erfc(x) -> 1 - erf(x)): New pattern.
	(1 - erf(x) -> erfc(x)): Likewise.
	(erf(x) > 1 -> 0): Likewise.

testsuite/
	* gcc.dg/tree-ssa/pr83750-1.c: New test
	* gcc.dg/tree-ssa/pr83750-2.c: Likewise.

Comments

Ulrich Drepper Nov. 2, 2018, 10:03 a.m. | #1
On Fri, Nov 2, 2018 at 10:36 AM Prathamesh Kulkarni
<prathamesh.kulkarni@linaro.org> wrote:
> So, the patch adds another transform erf(x) > 1 -> 0

> which resolves the regression.


Why don't you match for any constant with absolute value >= 1.0
instead of just 1.0?
Jeff Law Nov. 4, 2018, 7:31 p.m. | #2
On 11/2/18 3:36 AM, Prathamesh Kulkarni wrote:
> Hi,

> This patch adds two transforms to match.pd to CSE erf/erfc pair.

> erfc(x) is canonicalized to 1 - erf(x) and is then reversed to 1 -

> erf(x) when canonicalization is disabled and result of erf(x) has

> single use within 1 - erf(x).

> 

> The patch regressed builtin-nonneg-1.c. The following test-case

> reproduces the issue with patch:

> 

> void test(double d1) {

>   if (signbit(erfc(d1)))

>     link_failure_erfc();

> }

> 

> ssa dump:

> 

>   <bb 2> :

>   _5 = __builtin_erf (d1_4(D));

>   _1 = 1.0e+0 - _5;

>   _6 = _1 < 0.0;

>   _2 = (int) _6;

>   if (_2 != 0)

>     goto <bb 3>; [INV]

>   else

>     goto <bb 4>; [INV]

> 

>   <bb 3> :

>   link_failure_erfc ();

> 

>   <bb 4> :

>   return;

> 

> As can be seen, erfc(d1) is folded to 1 - erf(d1).

> forwprop then transforms the if condition from _2 != 0

> to _5 > 1.0e+0 and that defeats DCE thus resulting in link failure

> in undefined reference to link_failure_erfc().

> 

> So, the patch adds another transform erf(x) > 1 -> 0

> which resolves the regression.

> 

> Bootstrapped+tested on x86_64-unknown-linux-gnu.

> Cross-testing on arm and aarch64 variants in progress.

> OK for trunk if passes ?

> 

> Thanks,

> Prathamesh

> 

> 

> pr83750-4.txt

> 

> 2018-11-02  Prathamesh Kulkarni  <prathamesh.kulkarni@linaro.org>

> 

> 	* match.pd (erfc(x) -> 1 - erf(x)): New pattern.

> 	(1 - erf(x) -> erfc(x)): Likewise.

> 	(erf(x) > 1 -> 0): Likewise.

> 

> testsuite/

> 	* gcc.dg/tree-ssa/pr83750-1.c: New test

> 	* gcc.dg/tree-ssa/pr83750-2.c: Likewise.

Don't we have a flag specific to honoring nans?  Would that be better to
use than flag_unsafe_math_optimizations?  As Uli mentioned, there's
other cases (where ABS (const) >= 1.0.).

jeff
Richard Biener Nov. 5, 2018, 9:40 a.m. | #3
On Fri, Nov 2, 2018 at 10:37 AM Prathamesh Kulkarni
<prathamesh.kulkarni@linaro.org> wrote:
>

> Hi,

> This patch adds two transforms to match.pd to CSE erf/erfc pair.

> erfc(x) is canonicalized to 1 - erf(x) and is then reversed to 1 -

> erf(x) when canonicalization is disabled and result of erf(x) has

> single use within 1 - erf(x).

>

> The patch regressed builtin-nonneg-1.c. The following test-case

> reproduces the issue with patch:

>

> void test(double d1) {

>   if (signbit(erfc(d1)))

>     link_failure_erfc();

> }

>

> ssa dump:

>

>   <bb 2> :

>   _5 = __builtin_erf (d1_4(D));

>   _1 = 1.0e+0 - _5;

>   _6 = _1 < 0.0;

>   _2 = (int) _6;

>   if (_2 != 0)

>     goto <bb 3>; [INV]

>   else

>     goto <bb 4>; [INV]

>

>   <bb 3> :

>   link_failure_erfc ();

>

>   <bb 4> :

>   return;

>

> As can be seen, erfc(d1) is folded to 1 - erf(d1).

> forwprop then transforms the if condition from _2 != 0

> to _5 > 1.0e+0 and that defeats DCE thus resulting in link failure

> in undefined reference to link_failure_erfc().

>

> So, the patch adds another transform erf(x) > 1 -> 0.


Ick.

Why not canonicalize erf (x) to 1-erfc(x) instead?

> which resolves the regression.

>

> Bootstrapped+tested on x86_64-unknown-linux-gnu.

> Cross-testing on arm and aarch64 variants in progress.

> OK for trunk if passes ?

>

> Thanks,

> Prathamesh
Prathamesh Kulkarni Nov. 5, 2018, 12:10 p.m. | #4
On Mon, 5 Nov 2018 at 15:10, Richard Biener <richard.guenther@gmail.com> wrote:
>

> On Fri, Nov 2, 2018 at 10:37 AM Prathamesh Kulkarni

> <prathamesh.kulkarni@linaro.org> wrote:

> >

> > Hi,

> > This patch adds two transforms to match.pd to CSE erf/erfc pair.

> > erfc(x) is canonicalized to 1 - erf(x) and is then reversed to 1 -

> > erf(x) when canonicalization is disabled and result of erf(x) has

> > single use within 1 - erf(x).

> >

> > The patch regressed builtin-nonneg-1.c. The following test-case

> > reproduces the issue with patch:

> >

> > void test(double d1) {

> >   if (signbit(erfc(d1)))

> >     link_failure_erfc();

> > }

> >

> > ssa dump:

> >

> >   <bb 2> :

> >   _5 = __builtin_erf (d1_4(D));

> >   _1 = 1.0e+0 - _5;

> >   _6 = _1 < 0.0;

> >   _2 = (int) _6;

> >   if (_2 != 0)

> >     goto <bb 3>; [INV]

> >   else

> >     goto <bb 4>; [INV]

> >

> >   <bb 3> :

> >   link_failure_erfc ();

> >

> >   <bb 4> :

> >   return;

> >

> > As can be seen, erfc(d1) is folded to 1 - erf(d1).

> > forwprop then transforms the if condition from _2 != 0

> > to _5 > 1.0e+0 and that defeats DCE thus resulting in link failure

> > in undefined reference to link_failure_erfc().

> >

> > So, the patch adds another transform erf(x) > 1 -> 0.

>

> Ick.

>

> Why not canonicalize erf (x) to 1-erfc(x) instead?

Sorry I didn't quite follow, won't this cause similar issue with erf ?
I changed the pattern to canonicalize erf(x) -> 1 - erfc(x)
and 1 - erfc(x) -> erf(x) after canonicalization is disabled.

This caused undefined reference to link_failure_erf() in following test-case:

extern int signbit(double);
extern void link_failure_erf(void);
extern double erf(double);

void test(double d1) {
  if (signbit(erf(d1)))
    link_failure_erf();
}

forwprop1 shows:

   <bb 2> :
  _5 = __builtin_erfc (d1_4(D));
  _1 = 1.0e+0 - _5;
  _6 = _5 > 1.0e+0;
  _2 = (int) _6;
  if (_5 > 1.0e+0)
    goto <bb 3>; [INV]
  else
    goto <bb 4>; [INV]

  <bb 3> :
  link_failure_erf ();

  <bb 4> :
  return;

which defeats DCE to remove call to link_failure_erf.

Thanks,
Prathamesh
>

> > which resolves the regression.

> >

> > Bootstrapped+tested on x86_64-unknown-linux-gnu.

> > Cross-testing on arm and aarch64 variants in progress.

> > OK for trunk if passes ?

> >

> > Thanks,

> > Prathamesh
Richard Biener Nov. 5, 2018, 12:44 p.m. | #5
On Mon, Nov 5, 2018 at 1:11 PM Prathamesh Kulkarni
<prathamesh.kulkarni@linaro.org> wrote:
>

> On Mon, 5 Nov 2018 at 15:10, Richard Biener <richard.guenther@gmail.com> wrote:

> >

> > On Fri, Nov 2, 2018 at 10:37 AM Prathamesh Kulkarni

> > <prathamesh.kulkarni@linaro.org> wrote:

> > >

> > > Hi,

> > > This patch adds two transforms to match.pd to CSE erf/erfc pair.

> > > erfc(x) is canonicalized to 1 - erf(x) and is then reversed to 1 -

> > > erf(x) when canonicalization is disabled and result of erf(x) has

> > > single use within 1 - erf(x).

> > >

> > > The patch regressed builtin-nonneg-1.c. The following test-case

> > > reproduces the issue with patch:

> > >

> > > void test(double d1) {

> > >   if (signbit(erfc(d1)))

> > >     link_failure_erfc();

> > > }

> > >

> > > ssa dump:

> > >

> > >   <bb 2> :

> > >   _5 = __builtin_erf (d1_4(D));

> > >   _1 = 1.0e+0 - _5;

> > >   _6 = _1 < 0.0;

> > >   _2 = (int) _6;

> > >   if (_2 != 0)

> > >     goto <bb 3>; [INV]

> > >   else

> > >     goto <bb 4>; [INV]

> > >

> > >   <bb 3> :

> > >   link_failure_erfc ();

> > >

> > >   <bb 4> :

> > >   return;

> > >

> > > As can be seen, erfc(d1) is folded to 1 - erf(d1).

> > > forwprop then transforms the if condition from _2 != 0

> > > to _5 > 1.0e+0 and that defeats DCE thus resulting in link failure

> > > in undefined reference to link_failure_erfc().

> > >

> > > So, the patch adds another transform erf(x) > 1 -> 0.

> >

> > Ick.

> >

> > Why not canonicalize erf (x) to 1-erfc(x) instead?

> Sorry I didn't quite follow, won't this cause similar issue with erf ?

> I changed the pattern to canonicalize erf(x) -> 1 - erfc(x)

> and 1 - erfc(x) -> erf(x) after canonicalization is disabled.

>

> This caused undefined reference to link_failure_erf() in following test-case:

>

> extern int signbit(double);

> extern void link_failure_erf(void);

> extern double erf(double);

>

> void test(double d1) {

>   if (signbit(erf(d1)))

>     link_failure_erf();

> }


But that's already not optimized without any canonicalization
because erf returns sth in range [-1, 1].

I suggested the change because we have limited support for FP
value-ranges and nonnegative is one thing we can compute
(and erfc as opposed to erf is nonnegative).

> forwprop1 shows:

>

>    <bb 2> :

>   _5 = __builtin_erfc (d1_4(D));

>   _1 = 1.0e+0 - _5;

>   _6 = _5 > 1.0e+0;

>   _2 = (int) _6;

>   if (_5 > 1.0e+0)

>     goto <bb 3>; [INV]

>   else

>     goto <bb 4>; [INV]

>

>   <bb 3> :

>   link_failure_erf ();

>

>   <bb 4> :

>   return;

>

> which defeats DCE to remove call to link_failure_erf.

>

> Thanks,

> Prathamesh

> >

> > > which resolves the regression.

> > >

> > > Bootstrapped+tested on x86_64-unknown-linux-gnu.

> > > Cross-testing on arm and aarch64 variants in progress.

> > > OK for trunk if passes ?

> > >

> > > Thanks,

> > > Prathamesh
Prathamesh Kulkarni Nov. 5, 2018, 2:11 p.m. | #6
On Mon, 5 Nov 2018 at 18:14, Richard Biener <richard.guenther@gmail.com> wrote:
>

> On Mon, Nov 5, 2018 at 1:11 PM Prathamesh Kulkarni

> <prathamesh.kulkarni@linaro.org> wrote:

> >

> > On Mon, 5 Nov 2018 at 15:10, Richard Biener <richard.guenther@gmail.com> wrote:

> > >

> > > On Fri, Nov 2, 2018 at 10:37 AM Prathamesh Kulkarni

> > > <prathamesh.kulkarni@linaro.org> wrote:

> > > >

> > > > Hi,

> > > > This patch adds two transforms to match.pd to CSE erf/erfc pair.

> > > > erfc(x) is canonicalized to 1 - erf(x) and is then reversed to 1 -

> > > > erf(x) when canonicalization is disabled and result of erf(x) has

> > > > single use within 1 - erf(x).

> > > >

> > > > The patch regressed builtin-nonneg-1.c. The following test-case

> > > > reproduces the issue with patch:

> > > >

> > > > void test(double d1) {

> > > >   if (signbit(erfc(d1)))

> > > >     link_failure_erfc();

> > > > }

> > > >

> > > > ssa dump:

> > > >

> > > >   <bb 2> :

> > > >   _5 = __builtin_erf (d1_4(D));

> > > >   _1 = 1.0e+0 - _5;

> > > >   _6 = _1 < 0.0;

> > > >   _2 = (int) _6;

> > > >   if (_2 != 0)

> > > >     goto <bb 3>; [INV]

> > > >   else

> > > >     goto <bb 4>; [INV]

> > > >

> > > >   <bb 3> :

> > > >   link_failure_erfc ();

> > > >

> > > >   <bb 4> :

> > > >   return;

> > > >

> > > > As can be seen, erfc(d1) is folded to 1 - erf(d1).

> > > > forwprop then transforms the if condition from _2 != 0

> > > > to _5 > 1.0e+0 and that defeats DCE thus resulting in link failure

> > > > in undefined reference to link_failure_erfc().

> > > >

> > > > So, the patch adds another transform erf(x) > 1 -> 0.

> > >

> > > Ick.

> > >

> > > Why not canonicalize erf (x) to 1-erfc(x) instead?

> > Sorry I didn't quite follow, won't this cause similar issue with erf ?

> > I changed the pattern to canonicalize erf(x) -> 1 - erfc(x)

> > and 1 - erfc(x) -> erf(x) after canonicalization is disabled.

> >

> > This caused undefined reference to link_failure_erf() in following test-case:

> >

> > extern int signbit(double);

> > extern void link_failure_erf(void);

> > extern double erf(double);

> >

> > void test(double d1) {

> >   if (signbit(erf(d1)))

> >     link_failure_erf();

> > }

>

> But that's already not optimized without any canonicalization

> because erf returns sth in range [-1, 1].

>

> I suggested the change because we have limited support for FP

> value-ranges and nonnegative is one thing we can compute

> (and erfc as opposed to erf is nonnegative).

Ah right, thanks for the explanation.
Unfortunately this still regresses builtin-nonneg-1.c, which can be
reproduced with following test-case:

extern int signbit(double);
extern void link_failure_erf(void);
extern double erf(double);
extern double fabs(double);

void test(double d1) {
  if (signbit(erf(fabs(d1))))
    link_failure_erf();
}

signbit(erf(fabs(d1)) is transformed to 0 without patch but with patch
it gets canonicalized to signbit(1 - erfc(fabs(d1))) which similarly
defeats DCE.

forwprop1 shows:
<bb 2> :
  _1 = ABS_EXPR <d1_5(D)>;
  _6 = __builtin_erfc (_1);
  _2 = 1.0e+0 - _6;
  _7 = _6 > 1.0e+0;
  _3 = (int) _7;
  if (_6 > 1.0e+0)
    goto <bb 3>; [INV]
  else
    goto <bb 4>; [INV]

  <bb 3> :
  link_failure_erf ();

  <bb 4> :
  return;

I assume we would need to somehow tell gcc that the canonicalized
expression 1 - erfc(x) would not exceed 1.0 ?
Is there a better way to do that apart from defining pattern (1 -
erfc(x)) > 1.0 -> 0
which I agree doesn't look ideal to add in match.pd ?

Thanks
Prathamesh
>

> > forwprop1 shows:

> >

> >    <bb 2> :

> >   _5 = __builtin_erfc (d1_4(D));

> >   _1 = 1.0e+0 - _5;

> >   _6 = _5 > 1.0e+0;

> >   _2 = (int) _6;

> >   if (_5 > 1.0e+0)

> >     goto <bb 3>; [INV]

> >   else

> >     goto <bb 4>; [INV]

> >

> >   <bb 3> :

> >   link_failure_erf ();

> >

> >   <bb 4> :

> >   return;

> >

> > which defeats DCE to remove call to link_failure_erf.

> >

> > Thanks,

> > Prathamesh

> > >

> > > > which resolves the regression.

> > > >

> > > > Bootstrapped+tested on x86_64-unknown-linux-gnu.

> > > > Cross-testing on arm and aarch64 variants in progress.

> > > > OK for trunk if passes ?

> > > >

> > > > Thanks,

> > > > Prathamesh
Joseph Myers Nov. 5, 2018, 6:27 p.m. | #7
On Sun, 4 Nov 2018, Jeff Law wrote:

> Don't we have a flag specific to honoring nans?  Would that be better to

> use than flag_unsafe_math_optimizations?  As Uli mentioned, there's


That's only relevant for the comparison optimization, of course.

Converting erfc to 1-erf is dubious, since the whole point of erfc is for 
cases where 1-erf is inaccurate.  (Conversion in the other direction also 
needs flag_unsafe_math_optimizations.)

-- 
Joseph S. Myers
joseph@codesourcery.com
Jeff Law Nov. 5, 2018, 6:37 p.m. | #8
On 11/5/18 11:27 AM, Joseph Myers wrote:
> On Sun, 4 Nov 2018, Jeff Law wrote:

> 

>> Don't we have a flag specific to honoring nans?  Would that be better to

>> use than flag_unsafe_math_optimizations?  As Uli mentioned, there's

> 

> That's only relevant for the comparison optimization, of course.

> 

> Converting erfc to 1-erf is dubious, since the whole point of erfc is for 

> cases where 1-erf is inaccurate.  (Conversion in the other direction also 

> needs flag_unsafe_math_optimizations.)

> 

Understood.  Thanks for clarifying.  It seems like
unsafe-math-optimization is a better fit than the nan specific flag.

jeff
Michael Matz Nov. 5, 2018, 6:48 p.m. | #9
Hi,

On Mon, 5 Nov 2018, Jeff Law wrote:

> >> Don't we have a flag specific to honoring nans?  Would that be better 

> >> to use than flag_unsafe_math_optimizations?  As Uli mentioned, 

> >> there's

> > 

> > That's only relevant for the comparison optimization, of course.

> > 

> > Converting erfc to 1-erf is dubious, since the whole point of erfc is 

> > for cases where 1-erf is inaccurate.  (Conversion in the other 

> > direction also needs flag_unsafe_math_optimizations.)

> > 

> Understood.  Thanks for clarifying.  It seems like 

> unsafe-math-optimization is a better fit than the nan specific flag.


But still we should consider general usefullness, even with unsafe-math.  
In this case we would remove a usage of a slow function that the user 
specifically used to deal with inaccuracies with an equally slow function 
(plus a little arithmetic that is shadows by the functions slowness) that 
now exacly produces the inaccuracies the user wanted to avoid.  I.e. the 
speed gain is zero.  The "canonicalization gain" referred to in the PR 
might be real, but it comes at the cost of introducing definite 
catastrophic cancellation.

IMHO that's not a sensible transformation to do, under any flags.


Ciao,
Michael.
Paul Koning Nov. 5, 2018, 7:07 p.m. | #10
> On Nov 5, 2018, at 1:48 PM, Michael Matz <matz@suse.de> wrote:

> 

> Hi,

> 

> On Mon, 5 Nov 2018, Jeff Law wrote:

> 

>>>> Don't we have a flag specific to honoring nans?  Would that be better 

>>>> to use than flag_unsafe_math_optimizations?  As Uli mentioned, 

>>>> there's

>>> 

>>> That's only relevant for the comparison optimization, of course.

>>> 

>>> Converting erfc to 1-erf is dubious, since the whole point of erfc is 

>>> for cases where 1-erf is inaccurate.  (Conversion in the other 

>>> direction also needs flag_unsafe_math_optimizations.)

>>> 

>> Understood.  Thanks for clarifying.  It seems like 

>> unsafe-math-optimization is a better fit than the nan specific flag.

> 

> But still we should consider general usefullness, even with unsafe-math.  

> In this case we would remove a usage of a slow function that the user 

> specifically used to deal with inaccuracies with an equally slow function 

> (plus a little arithmetic that is shadows by the functions slowness) that 

> now exacly produces the inaccuracies the user wanted to avoid.  I.e. the 

> speed gain is zero.  The "canonicalization gain" referred to in the PR 

> might be real, but it comes at the cost of introducing definite 

> catastrophic cancellation.

> 

> IMHO that's not a sensible transformation to do, under any flags.


That seems right.  The same goes for log vs. logp1, and exp vs. expm1.

	paul
Richard Biener Nov. 6, 2018, 10:34 a.m. | #11
On Mon, Nov 5, 2018 at 3:11 PM Prathamesh Kulkarni
<prathamesh.kulkarni@linaro.org> wrote:
>

> On Mon, 5 Nov 2018 at 18:14, Richard Biener <richard.guenther@gmail.com> wrote:

> >

> > On Mon, Nov 5, 2018 at 1:11 PM Prathamesh Kulkarni

> > <prathamesh.kulkarni@linaro.org> wrote:

> > >

> > > On Mon, 5 Nov 2018 at 15:10, Richard Biener <richard.guenther@gmail.com> wrote:

> > > >

> > > > On Fri, Nov 2, 2018 at 10:37 AM Prathamesh Kulkarni

> > > > <prathamesh.kulkarni@linaro.org> wrote:

> > > > >

> > > > > Hi,

> > > > > This patch adds two transforms to match.pd to CSE erf/erfc pair.

> > > > > erfc(x) is canonicalized to 1 - erf(x) and is then reversed to 1 -

> > > > > erf(x) when canonicalization is disabled and result of erf(x) has

> > > > > single use within 1 - erf(x).

> > > > >

> > > > > The patch regressed builtin-nonneg-1.c. The following test-case

> > > > > reproduces the issue with patch:

> > > > >

> > > > > void test(double d1) {

> > > > >   if (signbit(erfc(d1)))

> > > > >     link_failure_erfc();

> > > > > }

> > > > >

> > > > > ssa dump:

> > > > >

> > > > >   <bb 2> :

> > > > >   _5 = __builtin_erf (d1_4(D));

> > > > >   _1 = 1.0e+0 - _5;

> > > > >   _6 = _1 < 0.0;

> > > > >   _2 = (int) _6;

> > > > >   if (_2 != 0)

> > > > >     goto <bb 3>; [INV]

> > > > >   else

> > > > >     goto <bb 4>; [INV]

> > > > >

> > > > >   <bb 3> :

> > > > >   link_failure_erfc ();

> > > > >

> > > > >   <bb 4> :

> > > > >   return;

> > > > >

> > > > > As can be seen, erfc(d1) is folded to 1 - erf(d1).

> > > > > forwprop then transforms the if condition from _2 != 0

> > > > > to _5 > 1.0e+0 and that defeats DCE thus resulting in link failure

> > > > > in undefined reference to link_failure_erfc().

> > > > >

> > > > > So, the patch adds another transform erf(x) > 1 -> 0.

> > > >

> > > > Ick.

> > > >

> > > > Why not canonicalize erf (x) to 1-erfc(x) instead?

> > > Sorry I didn't quite follow, won't this cause similar issue with erf ?

> > > I changed the pattern to canonicalize erf(x) -> 1 - erfc(x)

> > > and 1 - erfc(x) -> erf(x) after canonicalization is disabled.

> > >

> > > This caused undefined reference to link_failure_erf() in following test-case:

> > >

> > > extern int signbit(double);

> > > extern void link_failure_erf(void);

> > > extern double erf(double);

> > >

> > > void test(double d1) {

> > >   if (signbit(erf(d1)))

> > >     link_failure_erf();

> > > }

> >

> > But that's already not optimized without any canonicalization

> > because erf returns sth in range [-1, 1].

> >

> > I suggested the change because we have limited support for FP

> > value-ranges and nonnegative is one thing we can compute

> > (and erfc as opposed to erf is nonnegative).

> Ah right, thanks for the explanation.

> Unfortunately this still regresses builtin-nonneg-1.c, which can be

> reproduced with following test-case:

>

> extern int signbit(double);

> extern void link_failure_erf(void);

> extern double erf(double);

> extern double fabs(double);

>

> void test(double d1) {

>   if (signbit(erf(fabs(d1))))

>     link_failure_erf();

> }

>

> signbit(erf(fabs(d1)) is transformed to 0 without patch but with patch

> it gets canonicalized to signbit(1 - erfc(fabs(d1))) which similarly

> defeats DCE.

>

> forwprop1 shows:

> <bb 2> :

>   _1 = ABS_EXPR <d1_5(D)>;

>   _6 = __builtin_erfc (_1);

>   _2 = 1.0e+0 - _6;

>   _7 = _6 > 1.0e+0;

>   _3 = (int) _7;

>   if (_6 > 1.0e+0)

>     goto <bb 3>; [INV]

>   else

>     goto <bb 4>; [INV]

>

>   <bb 3> :

>   link_failure_erf ();

>

>   <bb 4> :

>   return;

>

> I assume we would need to somehow tell gcc that the canonicalized

> expression 1 - erfc(x) would not exceed 1.0 ?

> Is there a better way to do that apart from defining pattern (1 -

> erfc(x)) > 1.0 -> 0

> which I agree doesn't look ideal to add in match.pd ?


You could handle a MINUS_EXPR of 1 and erfc() in
gimple_assign_nonnegative_warnv_p (I wouldn't bother
to do it for tree_binary_nonnegative_warnv_p)

This is of course similarly "lame" but a bit cleaner than
a match.pd pattern that just works for the comparison.

In reality the proper long-term fix is to add basic range
propagation to floats.

Richard.

>

> Thanks

> Prathamesh

> >

> > > forwprop1 shows:

> > >

> > >    <bb 2> :

> > >   _5 = __builtin_erfc (d1_4(D));

> > >   _1 = 1.0e+0 - _5;

> > >   _6 = _5 > 1.0e+0;

> > >   _2 = (int) _6;

> > >   if (_5 > 1.0e+0)

> > >     goto <bb 3>; [INV]

> > >   else

> > >     goto <bb 4>; [INV]

> > >

> > >   <bb 3> :

> > >   link_failure_erf ();

> > >

> > >   <bb 4> :

> > >   return;

> > >

> > > which defeats DCE to remove call to link_failure_erf.

> > >

> > > Thanks,

> > > Prathamesh

> > > >

> > > > > which resolves the regression.

> > > > >

> > > > > Bootstrapped+tested on x86_64-unknown-linux-gnu.

> > > > > Cross-testing on arm and aarch64 variants in progress.

> > > > > OK for trunk if passes ?

> > > > >

> > > > > Thanks,

> > > > > Prathamesh
Prathamesh Kulkarni Nov. 9, 2018, 7:11 a.m. | #12
On Tue, 6 Nov 2018 at 16:04, Richard Biener <richard.guenther@gmail.com> wrote:
>

> On Mon, Nov 5, 2018 at 3:11 PM Prathamesh Kulkarni

> <prathamesh.kulkarni@linaro.org> wrote:

> >

> > On Mon, 5 Nov 2018 at 18:14, Richard Biener <richard.guenther@gmail.com> wrote:

> > >

> > > On Mon, Nov 5, 2018 at 1:11 PM Prathamesh Kulkarni

> > > <prathamesh.kulkarni@linaro.org> wrote:

> > > >

> > > > On Mon, 5 Nov 2018 at 15:10, Richard Biener <richard.guenther@gmail.com> wrote:

> > > > >

> > > > > On Fri, Nov 2, 2018 at 10:37 AM Prathamesh Kulkarni

> > > > > <prathamesh.kulkarni@linaro.org> wrote:

> > > > > >

> > > > > > Hi,

> > > > > > This patch adds two transforms to match.pd to CSE erf/erfc pair.

> > > > > > erfc(x) is canonicalized to 1 - erf(x) and is then reversed to 1 -

> > > > > > erf(x) when canonicalization is disabled and result of erf(x) has

> > > > > > single use within 1 - erf(x).

> > > > > >

> > > > > > The patch regressed builtin-nonneg-1.c. The following test-case

> > > > > > reproduces the issue with patch:

> > > > > >

> > > > > > void test(double d1) {

> > > > > >   if (signbit(erfc(d1)))

> > > > > >     link_failure_erfc();

> > > > > > }

> > > > > >

> > > > > > ssa dump:

> > > > > >

> > > > > >   <bb 2> :

> > > > > >   _5 = __builtin_erf (d1_4(D));

> > > > > >   _1 = 1.0e+0 - _5;

> > > > > >   _6 = _1 < 0.0;

> > > > > >   _2 = (int) _6;

> > > > > >   if (_2 != 0)

> > > > > >     goto <bb 3>; [INV]

> > > > > >   else

> > > > > >     goto <bb 4>; [INV]

> > > > > >

> > > > > >   <bb 3> :

> > > > > >   link_failure_erfc ();

> > > > > >

> > > > > >   <bb 4> :

> > > > > >   return;

> > > > > >

> > > > > > As can be seen, erfc(d1) is folded to 1 - erf(d1).

> > > > > > forwprop then transforms the if condition from _2 != 0

> > > > > > to _5 > 1.0e+0 and that defeats DCE thus resulting in link failure

> > > > > > in undefined reference to link_failure_erfc().

> > > > > >

> > > > > > So, the patch adds another transform erf(x) > 1 -> 0.

> > > > >

> > > > > Ick.

> > > > >

> > > > > Why not canonicalize erf (x) to 1-erfc(x) instead?

> > > > Sorry I didn't quite follow, won't this cause similar issue with erf ?

> > > > I changed the pattern to canonicalize erf(x) -> 1 - erfc(x)

> > > > and 1 - erfc(x) -> erf(x) after canonicalization is disabled.

> > > >

> > > > This caused undefined reference to link_failure_erf() in following test-case:

> > > >

> > > > extern int signbit(double);

> > > > extern void link_failure_erf(void);

> > > > extern double erf(double);

> > > >

> > > > void test(double d1) {

> > > >   if (signbit(erf(d1)))

> > > >     link_failure_erf();

> > > > }

> > >

> > > But that's already not optimized without any canonicalization

> > > because erf returns sth in range [-1, 1].

> > >

> > > I suggested the change because we have limited support for FP

> > > value-ranges and nonnegative is one thing we can compute

> > > (and erfc as opposed to erf is nonnegative).

> > Ah right, thanks for the explanation.

> > Unfortunately this still regresses builtin-nonneg-1.c, which can be

> > reproduced with following test-case:

> >

> > extern int signbit(double);

> > extern void link_failure_erf(void);

> > extern double erf(double);

> > extern double fabs(double);

> >

> > void test(double d1) {

> >   if (signbit(erf(fabs(d1))))

> >     link_failure_erf();

> > }

> >

> > signbit(erf(fabs(d1)) is transformed to 0 without patch but with patch

> > it gets canonicalized to signbit(1 - erfc(fabs(d1))) which similarly

> > defeats DCE.

> >

> > forwprop1 shows:

> > <bb 2> :

> >   _1 = ABS_EXPR <d1_5(D)>;

> >   _6 = __builtin_erfc (_1);

> >   _2 = 1.0e+0 - _6;

> >   _7 = _6 > 1.0e+0;

> >   _3 = (int) _7;

> >   if (_6 > 1.0e+0)

> >     goto <bb 3>; [INV]

> >   else

> >     goto <bb 4>; [INV]

> >

> >   <bb 3> :

> >   link_failure_erf ();

> >

> >   <bb 4> :

> >   return;

> >

> > I assume we would need to somehow tell gcc that the canonicalized

> > expression 1 - erfc(x) would not exceed 1.0 ?

> > Is there a better way to do that apart from defining pattern (1 -

> > erfc(x)) > 1.0 -> 0

> > which I agree doesn't look ideal to add in match.pd ?

>

> You could handle a MINUS_EXPR of 1 and erfc() in

> gimple_assign_nonnegative_warnv_p (I wouldn't bother

> to do it for tree_binary_nonnegative_warnv_p)

>

Um won't the value range of 1 - erfc(x) be [-1.0, 1.0] (same as
erf(x)), so would adding it to gimple_assign_nonnegative_warnv_p be
correct ?
erf(fabs(x)) is non-negative and between [0.0, 1.0] although I am not
sure if 1 - erfc(fabs(x)) would be always non-negative too if
erfc(fabs(x)) can exceed 1.0 for some value of x ?
AFAIU for the above case, gcc doesn't know that 1 - erfc(x) cannot
exceed 1.0, and that's because we don't propagate value range for
floats as you pointed out.

Thanks,
Prathamesh
> This is of course similarly "lame" but a bit cleaner than

> a match.pd pattern that just works for the comparison.

>

> In reality the proper long-term fix is to add basic range

> propagation to floats.

>

> Richard.

>

> >

> > Thanks

> > Prathamesh

> > >

> > > > forwprop1 shows:

> > > >

> > > >    <bb 2> :

> > > >   _5 = __builtin_erfc (d1_4(D));

> > > >   _1 = 1.0e+0 - _5;

> > > >   _6 = _5 > 1.0e+0;

> > > >   _2 = (int) _6;

> > > >   if (_5 > 1.0e+0)

> > > >     goto <bb 3>; [INV]

> > > >   else

> > > >     goto <bb 4>; [INV]

> > > >

> > > >   <bb 3> :

> > > >   link_failure_erf ();

> > > >

> > > >   <bb 4> :

> > > >   return;

> > > >

> > > > which defeats DCE to remove call to link_failure_erf.

> > > >

> > > > Thanks,

> > > > Prathamesh

> > > > >

> > > > > > which resolves the regression.

> > > > > >

> > > > > > Bootstrapped+tested on x86_64-unknown-linux-gnu.

> > > > > > Cross-testing on arm and aarch64 variants in progress.

> > > > > > OK for trunk if passes ?

> > > > > >

> > > > > > Thanks,

> > > > > > Prathamesh
Richard Biener Nov. 9, 2018, 12:01 p.m. | #13
On Fri, Nov 9, 2018 at 8:11 AM Prathamesh Kulkarni
<prathamesh.kulkarni@linaro.org> wrote:
>

> On Tue, 6 Nov 2018 at 16:04, Richard Biener <richard.guenther@gmail.com> wrote:

> >

> > On Mon, Nov 5, 2018 at 3:11 PM Prathamesh Kulkarni

> > <prathamesh.kulkarni@linaro.org> wrote:

> > >

> > > On Mon, 5 Nov 2018 at 18:14, Richard Biener <richard.guenther@gmail.com> wrote:

> > > >

> > > > On Mon, Nov 5, 2018 at 1:11 PM Prathamesh Kulkarni

> > > > <prathamesh.kulkarni@linaro.org> wrote:

> > > > >

> > > > > On Mon, 5 Nov 2018 at 15:10, Richard Biener <richard.guenther@gmail.com> wrote:

> > > > > >

> > > > > > On Fri, Nov 2, 2018 at 10:37 AM Prathamesh Kulkarni

> > > > > > <prathamesh.kulkarni@linaro.org> wrote:

> > > > > > >

> > > > > > > Hi,

> > > > > > > This patch adds two transforms to match.pd to CSE erf/erfc pair.

> > > > > > > erfc(x) is canonicalized to 1 - erf(x) and is then reversed to 1 -

> > > > > > > erf(x) when canonicalization is disabled and result of erf(x) has

> > > > > > > single use within 1 - erf(x).

> > > > > > >

> > > > > > > The patch regressed builtin-nonneg-1.c. The following test-case

> > > > > > > reproduces the issue with patch:

> > > > > > >

> > > > > > > void test(double d1) {

> > > > > > >   if (signbit(erfc(d1)))

> > > > > > >     link_failure_erfc();

> > > > > > > }

> > > > > > >

> > > > > > > ssa dump:

> > > > > > >

> > > > > > >   <bb 2> :

> > > > > > >   _5 = __builtin_erf (d1_4(D));

> > > > > > >   _1 = 1.0e+0 - _5;

> > > > > > >   _6 = _1 < 0.0;

> > > > > > >   _2 = (int) _6;

> > > > > > >   if (_2 != 0)

> > > > > > >     goto <bb 3>; [INV]

> > > > > > >   else

> > > > > > >     goto <bb 4>; [INV]

> > > > > > >

> > > > > > >   <bb 3> :

> > > > > > >   link_failure_erfc ();

> > > > > > >

> > > > > > >   <bb 4> :

> > > > > > >   return;

> > > > > > >

> > > > > > > As can be seen, erfc(d1) is folded to 1 - erf(d1).

> > > > > > > forwprop then transforms the if condition from _2 != 0

> > > > > > > to _5 > 1.0e+0 and that defeats DCE thus resulting in link failure

> > > > > > > in undefined reference to link_failure_erfc().

> > > > > > >

> > > > > > > So, the patch adds another transform erf(x) > 1 -> 0.

> > > > > >

> > > > > > Ick.

> > > > > >

> > > > > > Why not canonicalize erf (x) to 1-erfc(x) instead?

> > > > > Sorry I didn't quite follow, won't this cause similar issue with erf ?

> > > > > I changed the pattern to canonicalize erf(x) -> 1 - erfc(x)

> > > > > and 1 - erfc(x) -> erf(x) after canonicalization is disabled.

> > > > >

> > > > > This caused undefined reference to link_failure_erf() in following test-case:

> > > > >

> > > > > extern int signbit(double);

> > > > > extern void link_failure_erf(void);

> > > > > extern double erf(double);

> > > > >

> > > > > void test(double d1) {

> > > > >   if (signbit(erf(d1)))

> > > > >     link_failure_erf();

> > > > > }

> > > >

> > > > But that's already not optimized without any canonicalization

> > > > because erf returns sth in range [-1, 1].

> > > >

> > > > I suggested the change because we have limited support for FP

> > > > value-ranges and nonnegative is one thing we can compute

> > > > (and erfc as opposed to erf is nonnegative).

> > > Ah right, thanks for the explanation.

> > > Unfortunately this still regresses builtin-nonneg-1.c, which can be

> > > reproduced with following test-case:

> > >

> > > extern int signbit(double);

> > > extern void link_failure_erf(void);

> > > extern double erf(double);

> > > extern double fabs(double);

> > >

> > > void test(double d1) {

> > >   if (signbit(erf(fabs(d1))))

> > >     link_failure_erf();

> > > }

> > >

> > > signbit(erf(fabs(d1)) is transformed to 0 without patch but with patch

> > > it gets canonicalized to signbit(1 - erfc(fabs(d1))) which similarly

> > > defeats DCE.

> > >

> > > forwprop1 shows:

> > > <bb 2> :

> > >   _1 = ABS_EXPR <d1_5(D)>;

> > >   _6 = __builtin_erfc (_1);

> > >   _2 = 1.0e+0 - _6;

> > >   _7 = _6 > 1.0e+0;

> > >   _3 = (int) _7;

> > >   if (_6 > 1.0e+0)

> > >     goto <bb 3>; [INV]

> > >   else

> > >     goto <bb 4>; [INV]

> > >

> > >   <bb 3> :

> > >   link_failure_erf ();

> > >

> > >   <bb 4> :

> > >   return;

> > >

> > > I assume we would need to somehow tell gcc that the canonicalized

> > > expression 1 - erfc(x) would not exceed 1.0 ?

> > > Is there a better way to do that apart from defining pattern (1 -

> > > erfc(x)) > 1.0 -> 0

> > > which I agree doesn't look ideal to add in match.pd ?

> >

> > You could handle a MINUS_EXPR of 1 and erfc() in

> > gimple_assign_nonnegative_warnv_p (I wouldn't bother

> > to do it for tree_binary_nonnegative_warnv_p)

> >

> Um won't the value range of 1 - erfc(x) be [-1.0, 1.0] (same as

> erf(x)), so would adding it to gimple_assign_nonnegative_warnv_p be

> correct ?


Under the same condition as erf, when the argument is positive.
This is what the testcase is testing.

> erf(fabs(x)) is non-negative and between [0.0, 1.0] although I am not

> sure if 1 - erfc(fabs(x)) would be always non-negative too if

> erfc(fabs(x)) can exceed 1.0 for some value of x ?

> AFAIU for the above case, gcc doesn't know that 1 - erfc(x) cannot

> exceed 1.0, and that's because we don't propagate value range for

> floats as you pointed out.

>

> Thanks,

> Prathamesh

> > This is of course similarly "lame" but a bit cleaner than

> > a match.pd pattern that just works for the comparison.

> >

> > In reality the proper long-term fix is to add basic range

> > propagation to floats.

> >

> > Richard.

> >

> > >

> > > Thanks

> > > Prathamesh

> > > >

> > > > > forwprop1 shows:

> > > > >

> > > > >    <bb 2> :

> > > > >   _5 = __builtin_erfc (d1_4(D));

> > > > >   _1 = 1.0e+0 - _5;

> > > > >   _6 = _5 > 1.0e+0;

> > > > >   _2 = (int) _6;

> > > > >   if (_5 > 1.0e+0)

> > > > >     goto <bb 3>; [INV]

> > > > >   else

> > > > >     goto <bb 4>; [INV]

> > > > >

> > > > >   <bb 3> :

> > > > >   link_failure_erf ();

> > > > >

> > > > >   <bb 4> :

> > > > >   return;

> > > > >

> > > > > which defeats DCE to remove call to link_failure_erf.

> > > > >

> > > > > Thanks,

> > > > > Prathamesh

> > > > > >

> > > > > > > which resolves the regression.

> > > > > > >

> > > > > > > Bootstrapped+tested on x86_64-unknown-linux-gnu.

> > > > > > > Cross-testing on arm and aarch64 variants in progress.

> > > > > > > OK for trunk if passes ?

> > > > > > >

> > > > > > > Thanks,

> > > > > > > Prathamesh

Patch

diff --git a/gcc/match.pd b/gcc/match.pd
index d07ceb7d087..03e9230a579 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -4490,7 +4490,28 @@  DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
    (if (targetm.libc_has_function (function_c99_math_complex))
     (complex
      (mult (exps@1 (realpart @0)) (realpart (cexpis:type@2 (imagpart @0))))
-     (mult @1 (imagpart @2)))))))
+     (mult @1 (imagpart @2))))))
+
+
+ /* Canonicalize erfc(x) -> 1 - erf(x) */
+ (simplify
+  (ERFC @0)
+   (minus { build_one_cst (TREE_TYPE (@0)); } (ERF @0))))
+
+(if (flag_unsafe_math_optimizations
+     && !canonicalize_math_p())
+
+ /* 1 - erf(x) -> erfc(x)
+    This is only done if result of erf() has single use in 1 - erf(x). */
+ (simplify
+  (minus real_onep (ERF@1 @0))
+   (if (single_use (@1))
+    (ERFC @0)))
+
+ /* erf(x) > 1 -> 0 */
+ (simplify
+  (gt (ERF @0) real_onep)
+  { integer_zero_node; }))
 
 (if (canonicalize_math_p ())
  /* floor(x) -> trunc(x) if x is nonnegative.  */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr83750-1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr83750-1.c
new file mode 100644
index 00000000000..c4d3e428f15
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr83750-1.c
@@ -0,0 +1,32 @@ 
+/* { dg-do compile } */
+/* { dg-require-effective-target c99_runtime } */
+/* { dg-options "-O2 -ffast-math -fdump-tree-optimized" } */
+
+float f1(float x)
+{
+  float g1(float, float);
+
+  float r = __builtin_erff (x);
+  float t = __builtin_erfcf (x);
+  return g1 (r, t);
+}
+
+double f2(double x)
+{
+  double g2(double, double);
+
+  double r = __builtin_erf (x);
+  double t = __builtin_erfc (x);
+  return g2 (r, t);
+}
+
+long double f3(long double x)
+{
+  long double g3(long double, long double);
+
+  long double r = __builtin_erfl (x);
+  long double t = __builtin_erfcl (x);
+  return g3(r, t);
+}
+
+/* { dg-final { scan-tree-dump-not "erfc" "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr83750-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr83750-2.c
new file mode 100644
index 00000000000..60417b38681
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr83750-2.c
@@ -0,0 +1,22 @@ 
+/* { dg-do compile } */
+/* { dg-require-effective-target c99_runtime } */
+/* { dg-options "-O2 -ffast-math -fdump-tree-optimized" } */
+
+/* Check that the canonicalized form 1 - erf(x) is folded to erfc(x). */
+
+float f1(float x)
+{
+  return __builtin_erfcf (x);
+}
+
+double f2(double x)
+{
+  return __builtin_erfc (x);
+}
+
+long double f3(long double x)
+{
+  return __builtin_erfcl (x); 
+}
+
+/* { dg-final { scan-tree-dump-times "erfc" 3 "optimized" } } */