[match.pd] Fix for PR35691

Message ID CAAgBjMnP8EK=wgJ9kwZFOPM-Cmmsb7iYTBKDNOy88-0t3T6Ovw@mail.gmail.com
State New
Headers show

Commit Message

Prathamesh Kulkarni Nov. 7, 2016, 5:36 p.m.
On 7 November 2016 at 15:43, Richard Biener <rguenther@suse.de> wrote:
> On Fri, 4 Nov 2016, Prathamesh Kulkarni wrote:

>

>> On 4 November 2016 at 13:41, Richard Biener <rguenther@suse.de> wrote:

>> > On Thu, 3 Nov 2016, Marc Glisse wrote:

>> >

>> >> On Thu, 3 Nov 2016, Richard Biener wrote:

>> >>

>> >> > > > > The transform would also work for vectors (element_precision for

>> >> > > > > the test but also a value-matching zero which should ensure the

>> >> > > > > same number of elements).

>> >> > > > Um sorry, I didn't get how to check vectors to be of equal length by a

>> >> > > > matching zero.

>> >> > > > Could you please elaborate on that ?

>> >> > >

>> >> > > He may have meant something like:

>> >> > >

>> >> > >   (op (cmp @0 integer_zerop@2) (cmp @1 @2))

>> >> >

>> >> > I meant with one being @@2 to allow signed vs. Unsigned @0/@1 which was the

>> >> > point of the pattern.

>> >>

>> >> Oups, that's what I had written first, and then I somehow managed to confuse

>> >> myself enough to remove it so as to remove the call to types_match :-(

>> >>

>> >> > > So the last operand is checked with operand_equal_p instead of

>> >> > > integer_zerop. But the fact that we could compute bit_ior on the

>> >> > > comparison results should already imply that the number of elements is the

>> >> > > same.

>> >> >

>> >> > Though for equality compares we also allow scalar results IIRC.

>> >>

>> >> Oh, right, I keep forgetting that :-( And I have no idea how to generate one

>> >> for a testcase, at least until the GIMPLE FE lands...

>> >>

>> >> > > On platforms that have IOR on floats (at least x86 with SSE, maybe some

>> >> > > vector mode on s390?), it would be cool to do the same for floats (most

>> >> > > likely at the RTL level).

>> >> >

>> >> > On GIMPLE view-converts could come to the rescue here as well.  Or we cab

>> >> > just allow bit-and/or on floats as much as we allow them on pointers.

>> >>

>> >> Would that generate sensible code on targets that do not have logic insns for

>> >> floats? Actually, even on x86_64 that generates inefficient code, so there

>> >> would be some work (for instance grep finds no gen_iordf3, only gen_iorv2df3).

>> >>

>> >> I am also a bit wary of doing those obfuscating optimizations too early...

>> >> a==0 is something that other optimizations might use. long

>> >> c=(long&)a|(long&)b; (double&)c==0; less so...

>> >>

>> >> (and I am assuming that signaling NaNs don't make the whole transformation

>> >> impossible, which might be wrong)

>> >

>> > Yeah.  I also think it's not so much important - I just wanted to mention

>> > vectors...

>> >

>> > Btw, I still think we need a more sensible infrastructure for passes

>> > to gather, analyze and modify complex conditions.  (I'm always pointing

>> > to tree-affine.c as an, albeit not very good, example for handling

>> > a similar problem)

>> Thanks for mentioning the value-matching capture @@, I wasn't aware of

>> this match.pd feature.

>> The current patch keeps it restricted to only bitwise operators on integers.

>> Bootstrap+test running on x86_64-unknown-linux-gnu.

>> OK to commit if passes ?

>

> +/* PR35691: Transform

> +   (x == 0 & y == 0) -> (x | typeof(x)(y)) == 0.

> +   (x != 0 | y != 0) -> (x | typeof(x)(y)) != 0.  */

> +

>

> Please omit the vertical space

>

> +(for bitop (bit_and bit_ior)

> +     cmp (eq ne)

> + (simplify

> +  (bitop (cmp @0 integer_zerop) (cmp @1 integer_zerop))

>

> if you capture the first integer_zerop as @2 then you can re-use it...

>

> +   (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))

> +       && INTEGRAL_TYPE_P (TREE_TYPE (@1))

> +       && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE

> (@1)))

> +    (cmp (bit_ior @0 (convert @1)) { build_zero_cst (TREE_TYPE (@0));

>

> ... here inplace of the { build_zero_cst ... }.

>

> Ok with that changes.

Thanks, committed the attached version as r241915.

>

> Richard.
2016-11-07  Prathamesh Kulkarni  <prathamesh.kulkarni@linaro.org>

	PR middle-end/35691
	* match.pd: Add following two patterns:
	(x == 0 & y == 0) -> (x | typeof(x)(y)) == 0.
	(x != 0 | y != 0) -> (x | typeof(x)(y)) != 0.

testsuite/
	* gcc.dg/pr35691-1.c: New test-case.
	* gcc.dg/pr35691-4.c: Likewise.

Comments

Prathamesh Kulkarni Nov. 7, 2016, 5:40 p.m. | #1
On 7 November 2016 at 23:06, Prathamesh Kulkarni
<prathamesh.kulkarni@linaro.org> wrote:
> On 7 November 2016 at 15:43, Richard Biener <rguenther@suse.de> wrote:

>> On Fri, 4 Nov 2016, Prathamesh Kulkarni wrote:

>>

>>> On 4 November 2016 at 13:41, Richard Biener <rguenther@suse.de> wrote:

>>> > On Thu, 3 Nov 2016, Marc Glisse wrote:

>>> >

>>> >> On Thu, 3 Nov 2016, Richard Biener wrote:

>>> >>

>>> >> > > > > The transform would also work for vectors (element_precision for

>>> >> > > > > the test but also a value-matching zero which should ensure the

>>> >> > > > > same number of elements).

>>> >> > > > Um sorry, I didn't get how to check vectors to be of equal length by a

>>> >> > > > matching zero.

>>> >> > > > Could you please elaborate on that ?

>>> >> > >

>>> >> > > He may have meant something like:

>>> >> > >

>>> >> > >   (op (cmp @0 integer_zerop@2) (cmp @1 @2))

>>> >> >

>>> >> > I meant with one being @@2 to allow signed vs. Unsigned @0/@1 which was the

>>> >> > point of the pattern.

>>> >>

>>> >> Oups, that's what I had written first, and then I somehow managed to confuse

>>> >> myself enough to remove it so as to remove the call to types_match :-(

>>> >>

>>> >> > > So the last operand is checked with operand_equal_p instead of

>>> >> > > integer_zerop. But the fact that we could compute bit_ior on the

>>> >> > > comparison results should already imply that the number of elements is the

>>> >> > > same.

>>> >> >

>>> >> > Though for equality compares we also allow scalar results IIRC.

>>> >>

>>> >> Oh, right, I keep forgetting that :-( And I have no idea how to generate one

>>> >> for a testcase, at least until the GIMPLE FE lands...

>>> >>

>>> >> > > On platforms that have IOR on floats (at least x86 with SSE, maybe some

>>> >> > > vector mode on s390?), it would be cool to do the same for floats (most

>>> >> > > likely at the RTL level).

>>> >> >

>>> >> > On GIMPLE view-converts could come to the rescue here as well.  Or we cab

>>> >> > just allow bit-and/or on floats as much as we allow them on pointers.

>>> >>

>>> >> Would that generate sensible code on targets that do not have logic insns for

>>> >> floats? Actually, even on x86_64 that generates inefficient code, so there

>>> >> would be some work (for instance grep finds no gen_iordf3, only gen_iorv2df3).

>>> >>

>>> >> I am also a bit wary of doing those obfuscating optimizations too early...

>>> >> a==0 is something that other optimizations might use. long

>>> >> c=(long&)a|(long&)b; (double&)c==0; less so...

>>> >>

>>> >> (and I am assuming that signaling NaNs don't make the whole transformation

>>> >> impossible, which might be wrong)

>>> >

>>> > Yeah.  I also think it's not so much important - I just wanted to mention

>>> > vectors...

>>> >

>>> > Btw, I still think we need a more sensible infrastructure for passes

>>> > to gather, analyze and modify complex conditions.  (I'm always pointing

>>> > to tree-affine.c as an, albeit not very good, example for handling

>>> > a similar problem)

>>> Thanks for mentioning the value-matching capture @@, I wasn't aware of

>>> this match.pd feature.

>>> The current patch keeps it restricted to only bitwise operators on integers.

>>> Bootstrap+test running on x86_64-unknown-linux-gnu.

>>> OK to commit if passes ?

>>

>> +/* PR35691: Transform

>> +   (x == 0 & y == 0) -> (x | typeof(x)(y)) == 0.

>> +   (x != 0 | y != 0) -> (x | typeof(x)(y)) != 0.  */

>> +

>>

>> Please omit the vertical space

>>

>> +(for bitop (bit_and bit_ior)

>> +     cmp (eq ne)

>> + (simplify

>> +  (bitop (cmp @0 integer_zerop) (cmp @1 integer_zerop))

>>

>> if you capture the first integer_zerop as @2 then you can re-use it...

>>

>> +   (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))

>> +       && INTEGRAL_TYPE_P (TREE_TYPE (@1))

>> +       && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE

>> (@1)))

>> +    (cmp (bit_ior @0 (convert @1)) { build_zero_cst (TREE_TYPE (@0));

>>

>> ... here inplace of the { build_zero_cst ... }.

>>

>> Ok with that changes.

> Thanks, committed the attached version as r241915.

ugh, the svn commit message has:

testsuite/
* gcc.dg/pr35691-1.c: New test-case.
* gcc.dg/pr35691-4.c: Likewise.

pr35691-4.c was a typo, should be pr35691-2.c :/
However testsuite/ChangeLog correctly has entry for pr35691-2.c
Is it possible to edit the commit message for r241915 ?
Sorry about this.

Regards,
Prathamesh
>

>>

>> Richard.
Richard Biener Nov. 8, 2016, 7:53 a.m. | #2
On Mon, 7 Nov 2016, Prathamesh Kulkarni wrote:

> On 7 November 2016 at 23:06, Prathamesh Kulkarni

> <prathamesh.kulkarni@linaro.org> wrote:

> > On 7 November 2016 at 15:43, Richard Biener <rguenther@suse.de> wrote:

> >> On Fri, 4 Nov 2016, Prathamesh Kulkarni wrote:

> >>

> >>> On 4 November 2016 at 13:41, Richard Biener <rguenther@suse.de> wrote:

> >>> > On Thu, 3 Nov 2016, Marc Glisse wrote:

> >>> >

> >>> >> On Thu, 3 Nov 2016, Richard Biener wrote:

> >>> >>

> >>> >> > > > > The transform would also work for vectors (element_precision for

> >>> >> > > > > the test but also a value-matching zero which should ensure the

> >>> >> > > > > same number of elements).

> >>> >> > > > Um sorry, I didn't get how to check vectors to be of equal length by a

> >>> >> > > > matching zero.

> >>> >> > > > Could you please elaborate on that ?

> >>> >> > >

> >>> >> > > He may have meant something like:

> >>> >> > >

> >>> >> > >   (op (cmp @0 integer_zerop@2) (cmp @1 @2))

> >>> >> >

> >>> >> > I meant with one being @@2 to allow signed vs. Unsigned @0/@1 which was the

> >>> >> > point of the pattern.

> >>> >>

> >>> >> Oups, that's what I had written first, and then I somehow managed to confuse

> >>> >> myself enough to remove it so as to remove the call to types_match :-(

> >>> >>

> >>> >> > > So the last operand is checked with operand_equal_p instead of

> >>> >> > > integer_zerop. But the fact that we could compute bit_ior on the

> >>> >> > > comparison results should already imply that the number of elements is the

> >>> >> > > same.

> >>> >> >

> >>> >> > Though for equality compares we also allow scalar results IIRC.

> >>> >>

> >>> >> Oh, right, I keep forgetting that :-( And I have no idea how to generate one

> >>> >> for a testcase, at least until the GIMPLE FE lands...

> >>> >>

> >>> >> > > On platforms that have IOR on floats (at least x86 with SSE, maybe some

> >>> >> > > vector mode on s390?), it would be cool to do the same for floats (most

> >>> >> > > likely at the RTL level).

> >>> >> >

> >>> >> > On GIMPLE view-converts could come to the rescue here as well.  Or we cab

> >>> >> > just allow bit-and/or on floats as much as we allow them on pointers.

> >>> >>

> >>> >> Would that generate sensible code on targets that do not have logic insns for

> >>> >> floats? Actually, even on x86_64 that generates inefficient code, so there

> >>> >> would be some work (for instance grep finds no gen_iordf3, only gen_iorv2df3).

> >>> >>

> >>> >> I am also a bit wary of doing those obfuscating optimizations too early...

> >>> >> a==0 is something that other optimizations might use. long

> >>> >> c=(long&)a|(long&)b; (double&)c==0; less so...

> >>> >>

> >>> >> (and I am assuming that signaling NaNs don't make the whole transformation

> >>> >> impossible, which might be wrong)

> >>> >

> >>> > Yeah.  I also think it's not so much important - I just wanted to mention

> >>> > vectors...

> >>> >

> >>> > Btw, I still think we need a more sensible infrastructure for passes

> >>> > to gather, analyze and modify complex conditions.  (I'm always pointing

> >>> > to tree-affine.c as an, albeit not very good, example for handling

> >>> > a similar problem)

> >>> Thanks for mentioning the value-matching capture @@, I wasn't aware of

> >>> this match.pd feature.

> >>> The current patch keeps it restricted to only bitwise operators on integers.

> >>> Bootstrap+test running on x86_64-unknown-linux-gnu.

> >>> OK to commit if passes ?

> >>

> >> +/* PR35691: Transform

> >> +   (x == 0 & y == 0) -> (x | typeof(x)(y)) == 0.

> >> +   (x != 0 | y != 0) -> (x | typeof(x)(y)) != 0.  */

> >> +

> >>

> >> Please omit the vertical space

> >>

> >> +(for bitop (bit_and bit_ior)

> >> +     cmp (eq ne)

> >> + (simplify

> >> +  (bitop (cmp @0 integer_zerop) (cmp @1 integer_zerop))

> >>

> >> if you capture the first integer_zerop as @2 then you can re-use it...

> >>

> >> +   (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))

> >> +       && INTEGRAL_TYPE_P (TREE_TYPE (@1))

> >> +       && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE

> >> (@1)))

> >> +    (cmp (bit_ior @0 (convert @1)) { build_zero_cst (TREE_TYPE (@0));

> >>

> >> ... here inplace of the { build_zero_cst ... }.

> >>

> >> Ok with that changes.

> > Thanks, committed the attached version as r241915.

> ugh, the svn commit message has:

> 

> testsuite/

> * gcc.dg/pr35691-1.c: New test-case.

> * gcc.dg/pr35691-4.c: Likewise.

> 

> pr35691-4.c was a typo, should be pr35691-2.c :/

> However testsuite/ChangeLog correctly has entry for pr35691-2.c

> Is it possible to edit the commit message for r241915 ?

> Sorry about this.


No, just leave it as-is.

Richard.

> Regards,

> Prathamesh

> >

> >>

> >> Richard.

> 

> 


-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Prathamesh Kulkarni Nov. 8, 2016, 11:12 a.m. | #3
On 8 November 2016 at 13:23, Richard Biener <rguenther@suse.de> wrote:
> On Mon, 7 Nov 2016, Prathamesh Kulkarni wrote:

>

>> On 7 November 2016 at 23:06, Prathamesh Kulkarni

>> <prathamesh.kulkarni@linaro.org> wrote:

>> > On 7 November 2016 at 15:43, Richard Biener <rguenther@suse.de> wrote:

>> >> On Fri, 4 Nov 2016, Prathamesh Kulkarni wrote:

>> >>

>> >>> On 4 November 2016 at 13:41, Richard Biener <rguenther@suse.de> wrote:

>> >>> > On Thu, 3 Nov 2016, Marc Glisse wrote:

>> >>> >

>> >>> >> On Thu, 3 Nov 2016, Richard Biener wrote:

>> >>> >>

>> >>> >> > > > > The transform would also work for vectors (element_precision for

>> >>> >> > > > > the test but also a value-matching zero which should ensure the

>> >>> >> > > > > same number of elements).

>> >>> >> > > > Um sorry, I didn't get how to check vectors to be of equal length by a

>> >>> >> > > > matching zero.

>> >>> >> > > > Could you please elaborate on that ?

>> >>> >> > >

>> >>> >> > > He may have meant something like:

>> >>> >> > >

>> >>> >> > >   (op (cmp @0 integer_zerop@2) (cmp @1 @2))

>> >>> >> >

>> >>> >> > I meant with one being @@2 to allow signed vs. Unsigned @0/@1 which was the

>> >>> >> > point of the pattern.

>> >>> >>

>> >>> >> Oups, that's what I had written first, and then I somehow managed to confuse

>> >>> >> myself enough to remove it so as to remove the call to types_match :-(

>> >>> >>

>> >>> >> > > So the last operand is checked with operand_equal_p instead of

>> >>> >> > > integer_zerop. But the fact that we could compute bit_ior on the

>> >>> >> > > comparison results should already imply that the number of elements is the

>> >>> >> > > same.

>> >>> >> >

>> >>> >> > Though for equality compares we also allow scalar results IIRC.

>> >>> >>

>> >>> >> Oh, right, I keep forgetting that :-( And I have no idea how to generate one

>> >>> >> for a testcase, at least until the GIMPLE FE lands...

>> >>> >>

>> >>> >> > > On platforms that have IOR on floats (at least x86 with SSE, maybe some

>> >>> >> > > vector mode on s390?), it would be cool to do the same for floats (most

>> >>> >> > > likely at the RTL level).

>> >>> >> >

>> >>> >> > On GIMPLE view-converts could come to the rescue here as well.  Or we cab

>> >>> >> > just allow bit-and/or on floats as much as we allow them on pointers.

>> >>> >>

>> >>> >> Would that generate sensible code on targets that do not have logic insns for

>> >>> >> floats? Actually, even on x86_64 that generates inefficient code, so there

>> >>> >> would be some work (for instance grep finds no gen_iordf3, only gen_iorv2df3).

>> >>> >>

>> >>> >> I am also a bit wary of doing those obfuscating optimizations too early...

>> >>> >> a==0 is something that other optimizations might use. long

>> >>> >> c=(long&)a|(long&)b; (double&)c==0; less so...

>> >>> >>

>> >>> >> (and I am assuming that signaling NaNs don't make the whole transformation

>> >>> >> impossible, which might be wrong)

>> >>> >

>> >>> > Yeah.  I also think it's not so much important - I just wanted to mention

>> >>> > vectors...

>> >>> >

>> >>> > Btw, I still think we need a more sensible infrastructure for passes

>> >>> > to gather, analyze and modify complex conditions.  (I'm always pointing

>> >>> > to tree-affine.c as an, albeit not very good, example for handling

>> >>> > a similar problem)

>> >>> Thanks for mentioning the value-matching capture @@, I wasn't aware of

>> >>> this match.pd feature.

>> >>> The current patch keeps it restricted to only bitwise operators on integers.

>> >>> Bootstrap+test running on x86_64-unknown-linux-gnu.

>> >>> OK to commit if passes ?

>> >>

>> >> +/* PR35691: Transform

>> >> +   (x == 0 & y == 0) -> (x | typeof(x)(y)) == 0.

>> >> +   (x != 0 | y != 0) -> (x | typeof(x)(y)) != 0.  */

>> >> +

>> >>

>> >> Please omit the vertical space

>> >>

>> >> +(for bitop (bit_and bit_ior)

>> >> +     cmp (eq ne)

>> >> + (simplify

>> >> +  (bitop (cmp @0 integer_zerop) (cmp @1 integer_zerop))

>> >>

>> >> if you capture the first integer_zerop as @2 then you can re-use it...

>> >>

>> >> +   (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))

>> >> +       && INTEGRAL_TYPE_P (TREE_TYPE (@1))

>> >> +       && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE

>> >> (@1)))

>> >> +    (cmp (bit_ior @0 (convert @1)) { build_zero_cst (TREE_TYPE (@0));

>> >>

>> >> ... here inplace of the { build_zero_cst ... }.

>> >>

>> >> Ok with that changes.

>> > Thanks, committed the attached version as r241915.

>> ugh, the svn commit message has:

>>

>> testsuite/

>> * gcc.dg/pr35691-1.c: New test-case.

>> * gcc.dg/pr35691-4.c: Likewise.

>>

>> pr35691-4.c was a typo, should be pr35691-2.c :/

>> However testsuite/ChangeLog correctly has entry for pr35691-2.c

>> Is it possible to edit the commit message for r241915 ?

>> Sorry about this.

>

> No, just leave it as-is.

Hi,
Chritstophe reported to me that the commit caused test-cases
pr35691-1.c and pr35691-2.c (which were added by the commit)
to FAIL for cortex-a5:
http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/241915/arm-none-linux-gnueabihf/diff-gcc-rh60-arm-none-linux-gnueabihf-arm-cortex-a5-vfpv3-d16-fp16.txt

It seems truth_andif_expr is not simplified to bit_and_expr on
cortex-a5 as for x86_64 (and other arm variants).
The differences in dumps start from 004t.gimple for pr35691-1.c:

x86_64 gimple dump:
foo (int z0, unsigned int z1)
{
  int D.1800;
  int t0;
  int t1;
  int t2;

  _1 = z0 == 0;
  t0 = (int) _1;
  _2 = z1 == 0;
  t1 = (int) _2;
  _3 = t0 != 0;
  _4 = t1 != 0;
  _5 = _3 & _4;
  t2 = (int) _5;
  D.1800 = t2;
  return D.1800;
}

cortex-a5 gimple dump:
foo (int z0, unsigned int z1)
{
  int iftmp.0;
  int D.4176;
  int t0;
  int t1;
  int t2;

  _1 = z0 == 0;
  t0 = (int) _1;
  _2 = z1 == 0;
  t1 = (int) _2;
  if (t0 != 0) goto <D.4174>; else goto <D.4172>;
  <D.4174>:
  if (t1 != 0) goto <D.4175>; else goto <D.4172>;
  <D.4175>:
  iftmp.0 = 1;
  goto <D.4173>;
  <D.4172>:
  iftmp.0 = 0;
  <D.4173>:
  t2 = iftmp.0;
  D.4176 = t2;
  return D.4176;
}

Since the pattern expects truth_andif_expr to be converted to bit_and_expr,
it fails to match for cortex-a5.
This seems to happen only for cortex-a5 (the other variants a9, a15,
a57 are OK).

Is my assumption that truth_andif_expr would be always converted to bit_and_expr
for above case incorrect ?

Thanks,
Prathamesh
>

> Richard.

>

>> Regards,

>> Prathamesh

>> >

>> >>

>> >> Richard.

>>

>>

>

> --

> Richard Biener <rguenther@suse.de>

> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Richard Biener Nov. 8, 2016, 11:16 a.m. | #4
On Tue, 8 Nov 2016, Prathamesh Kulkarni wrote:

> On 8 November 2016 at 13:23, Richard Biener <rguenther@suse.de> wrote:

> > On Mon, 7 Nov 2016, Prathamesh Kulkarni wrote:

> >

> >> On 7 November 2016 at 23:06, Prathamesh Kulkarni

> >> <prathamesh.kulkarni@linaro.org> wrote:

> >> > On 7 November 2016 at 15:43, Richard Biener <rguenther@suse.de> wrote:

> >> >> On Fri, 4 Nov 2016, Prathamesh Kulkarni wrote:

> >> >>

> >> >>> On 4 November 2016 at 13:41, Richard Biener <rguenther@suse.de> wrote:

> >> >>> > On Thu, 3 Nov 2016, Marc Glisse wrote:

> >> >>> >

> >> >>> >> On Thu, 3 Nov 2016, Richard Biener wrote:

> >> >>> >>

> >> >>> >> > > > > The transform would also work for vectors (element_precision for

> >> >>> >> > > > > the test but also a value-matching zero which should ensure the

> >> >>> >> > > > > same number of elements).

> >> >>> >> > > > Um sorry, I didn't get how to check vectors to be of equal length by a

> >> >>> >> > > > matching zero.

> >> >>> >> > > > Could you please elaborate on that ?

> >> >>> >> > >

> >> >>> >> > > He may have meant something like:

> >> >>> >> > >

> >> >>> >> > >   (op (cmp @0 integer_zerop@2) (cmp @1 @2))

> >> >>> >> >

> >> >>> >> > I meant with one being @@2 to allow signed vs. Unsigned @0/@1 which was the

> >> >>> >> > point of the pattern.

> >> >>> >>

> >> >>> >> Oups, that's what I had written first, and then I somehow managed to confuse

> >> >>> >> myself enough to remove it so as to remove the call to types_match :-(

> >> >>> >>

> >> >>> >> > > So the last operand is checked with operand_equal_p instead of

> >> >>> >> > > integer_zerop. But the fact that we could compute bit_ior on the

> >> >>> >> > > comparison results should already imply that the number of elements is the

> >> >>> >> > > same.

> >> >>> >> >

> >> >>> >> > Though for equality compares we also allow scalar results IIRC.

> >> >>> >>

> >> >>> >> Oh, right, I keep forgetting that :-( And I have no idea how to generate one

> >> >>> >> for a testcase, at least until the GIMPLE FE lands...

> >> >>> >>

> >> >>> >> > > On platforms that have IOR on floats (at least x86 with SSE, maybe some

> >> >>> >> > > vector mode on s390?), it would be cool to do the same for floats (most

> >> >>> >> > > likely at the RTL level).

> >> >>> >> >

> >> >>> >> > On GIMPLE view-converts could come to the rescue here as well.  Or we cab

> >> >>> >> > just allow bit-and/or on floats as much as we allow them on pointers.

> >> >>> >>

> >> >>> >> Would that generate sensible code on targets that do not have logic insns for

> >> >>> >> floats? Actually, even on x86_64 that generates inefficient code, so there

> >> >>> >> would be some work (for instance grep finds no gen_iordf3, only gen_iorv2df3).

> >> >>> >>

> >> >>> >> I am also a bit wary of doing those obfuscating optimizations too early...

> >> >>> >> a==0 is something that other optimizations might use. long

> >> >>> >> c=(long&)a|(long&)b; (double&)c==0; less so...

> >> >>> >>

> >> >>> >> (and I am assuming that signaling NaNs don't make the whole transformation

> >> >>> >> impossible, which might be wrong)

> >> >>> >

> >> >>> > Yeah.  I also think it's not so much important - I just wanted to mention

> >> >>> > vectors...

> >> >>> >

> >> >>> > Btw, I still think we need a more sensible infrastructure for passes

> >> >>> > to gather, analyze and modify complex conditions.  (I'm always pointing

> >> >>> > to tree-affine.c as an, albeit not very good, example for handling

> >> >>> > a similar problem)

> >> >>> Thanks for mentioning the value-matching capture @@, I wasn't aware of

> >> >>> this match.pd feature.

> >> >>> The current patch keeps it restricted to only bitwise operators on integers.

> >> >>> Bootstrap+test running on x86_64-unknown-linux-gnu.

> >> >>> OK to commit if passes ?

> >> >>

> >> >> +/* PR35691: Transform

> >> >> +   (x == 0 & y == 0) -> (x | typeof(x)(y)) == 0.

> >> >> +   (x != 0 | y != 0) -> (x | typeof(x)(y)) != 0.  */

> >> >> +

> >> >>

> >> >> Please omit the vertical space

> >> >>

> >> >> +(for bitop (bit_and bit_ior)

> >> >> +     cmp (eq ne)

> >> >> + (simplify

> >> >> +  (bitop (cmp @0 integer_zerop) (cmp @1 integer_zerop))

> >> >>

> >> >> if you capture the first integer_zerop as @2 then you can re-use it...

> >> >>

> >> >> +   (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))

> >> >> +       && INTEGRAL_TYPE_P (TREE_TYPE (@1))

> >> >> +       && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE

> >> >> (@1)))

> >> >> +    (cmp (bit_ior @0 (convert @1)) { build_zero_cst (TREE_TYPE (@0));

> >> >>

> >> >> ... here inplace of the { build_zero_cst ... }.

> >> >>

> >> >> Ok with that changes.

> >> > Thanks, committed the attached version as r241915.

> >> ugh, the svn commit message has:

> >>

> >> testsuite/

> >> * gcc.dg/pr35691-1.c: New test-case.

> >> * gcc.dg/pr35691-4.c: Likewise.

> >>

> >> pr35691-4.c was a typo, should be pr35691-2.c :/

> >> However testsuite/ChangeLog correctly has entry for pr35691-2.c

> >> Is it possible to edit the commit message for r241915 ?

> >> Sorry about this.

> >

> > No, just leave it as-is.

> Hi,

> Chritstophe reported to me that the commit caused test-cases

> pr35691-1.c and pr35691-2.c (which were added by the commit)

> to FAIL for cortex-a5:

> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/241915/arm-none-linux-gnueabihf/diff-gcc-rh60-arm-none-linux-gnueabihf-arm-cortex-a5-vfpv3-d16-fp16.txt

> 

> It seems truth_andif_expr is not simplified to bit_and_expr on

> cortex-a5 as for x86_64 (and other arm variants).

> The differences in dumps start from 004t.gimple for pr35691-1.c:

> 

> x86_64 gimple dump:

> foo (int z0, unsigned int z1)

> {

>   int D.1800;

>   int t0;

>   int t1;

>   int t2;

> 

>   _1 = z0 == 0;

>   t0 = (int) _1;

>   _2 = z1 == 0;

>   t1 = (int) _2;

>   _3 = t0 != 0;

>   _4 = t1 != 0;

>   _5 = _3 & _4;

>   t2 = (int) _5;

>   D.1800 = t2;

>   return D.1800;

> }

> 

> cortex-a5 gimple dump:

> foo (int z0, unsigned int z1)

> {

>   int iftmp.0;

>   int D.4176;

>   int t0;

>   int t1;

>   int t2;

> 

>   _1 = z0 == 0;

>   t0 = (int) _1;

>   _2 = z1 == 0;

>   t1 = (int) _2;

>   if (t0 != 0) goto <D.4174>; else goto <D.4172>;

>   <D.4174>:

>   if (t1 != 0) goto <D.4175>; else goto <D.4172>;

>   <D.4175>:

>   iftmp.0 = 1;

>   goto <D.4173>;

>   <D.4172>:

>   iftmp.0 = 0;

>   <D.4173>:

>   t2 = iftmp.0;

>   D.4176 = t2;

>   return D.4176;

> }

> 

> Since the pattern expects truth_andif_expr to be converted to bit_and_expr,

> it fails to match for cortex-a5.

> This seems to happen only for cortex-a5 (the other variants a9, a15,

> a57 are OK).

> 

> Is my assumption that truth_andif_expr would be always converted to bit_and_expr

> for above case incorrect ?


Yes, it depends on LOGICAL_OP_SHORT_CIRCUIT.

Richard.

> Thanks,

> Prathamesh

> >

> > Richard.

> >

> >> Regards,

> >> Prathamesh

> >> >

> >> >>

> >> >> Richard.

> >>

> >>

> >

> > --

> > Richard Biener <rguenther@suse.de>

> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

> 

> 


-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Kyrill Tkachov Nov. 8, 2016, 11:19 a.m. | #5
On 08/11/16 11:16, Richard Biener wrote:
> On Tue, 8 Nov 2016, Prathamesh Kulkarni wrote:

>

>> On 8 November 2016 at 13:23, Richard Biener <rguenther@suse.de> wrote:

>>> On Mon, 7 Nov 2016, Prathamesh Kulkarni wrote:

>>>

>>>> On 7 November 2016 at 23:06, Prathamesh Kulkarni

>>>> <prathamesh.kulkarni@linaro.org> wrote:

>>>>> On 7 November 2016 at 15:43, Richard Biener <rguenther@suse.de> wrote:

>>>>>> On Fri, 4 Nov 2016, Prathamesh Kulkarni wrote:

>>>>>>

>>>>>>> On 4 November 2016 at 13:41, Richard Biener <rguenther@suse.de> wrote:

>>>>>>>> On Thu, 3 Nov 2016, Marc Glisse wrote:

>>>>>>>>

>>>>>>>>> On Thu, 3 Nov 2016, Richard Biener wrote:

>>>>>>>>>

>>>>>>>>>>>>> The transform would also work for vectors (element_precision for

>>>>>>>>>>>>> the test but also a value-matching zero which should ensure the

>>>>>>>>>>>>> same number of elements).

>>>>>>>>>>>> Um sorry, I didn't get how to check vectors to be of equal length by a

>>>>>>>>>>>> matching zero.

>>>>>>>>>>>> Could you please elaborate on that ?

>>>>>>>>>>> He may have meant something like:

>>>>>>>>>>>

>>>>>>>>>>>    (op (cmp @0 integer_zerop@2) (cmp @1 @2))

>>>>>>>>>> I meant with one being @@2 to allow signed vs. Unsigned @0/@1 which was the

>>>>>>>>>> point of the pattern.

>>>>>>>>> Oups, that's what I had written first, and then I somehow managed to confuse

>>>>>>>>> myself enough to remove it so as to remove the call to types_match :-(

>>>>>>>>>

>>>>>>>>>>> So the last operand is checked with operand_equal_p instead of

>>>>>>>>>>> integer_zerop. But the fact that we could compute bit_ior on the

>>>>>>>>>>> comparison results should already imply that the number of elements is the

>>>>>>>>>>> same.

>>>>>>>>>> Though for equality compares we also allow scalar results IIRC.

>>>>>>>>> Oh, right, I keep forgetting that :-( And I have no idea how to generate one

>>>>>>>>> for a testcase, at least until the GIMPLE FE lands...

>>>>>>>>>

>>>>>>>>>>> On platforms that have IOR on floats (at least x86 with SSE, maybe some

>>>>>>>>>>> vector mode on s390?), it would be cool to do the same for floats (most

>>>>>>>>>>> likely at the RTL level).

>>>>>>>>>> On GIMPLE view-converts could come to the rescue here as well.  Or we cab

>>>>>>>>>> just allow bit-and/or on floats as much as we allow them on pointers.

>>>>>>>>> Would that generate sensible code on targets that do not have logic insns for

>>>>>>>>> floats? Actually, even on x86_64 that generates inefficient code, so there

>>>>>>>>> would be some work (for instance grep finds no gen_iordf3, only gen_iorv2df3).

>>>>>>>>>

>>>>>>>>> I am also a bit wary of doing those obfuscating optimizations too early...

>>>>>>>>> a==0 is something that other optimizations might use. long

>>>>>>>>> c=(long&)a|(long&)b; (double&)c==0; less so...

>>>>>>>>>

>>>>>>>>> (and I am assuming that signaling NaNs don't make the whole transformation

>>>>>>>>> impossible, which might be wrong)

>>>>>>>> Yeah.  I also think it's not so much important - I just wanted to mention

>>>>>>>> vectors...

>>>>>>>>

>>>>>>>> Btw, I still think we need a more sensible infrastructure for passes

>>>>>>>> to gather, analyze and modify complex conditions.  (I'm always pointing

>>>>>>>> to tree-affine.c as an, albeit not very good, example for handling

>>>>>>>> a similar problem)

>>>>>>> Thanks for mentioning the value-matching capture @@, I wasn't aware of

>>>>>>> this match.pd feature.

>>>>>>> The current patch keeps it restricted to only bitwise operators on integers.

>>>>>>> Bootstrap+test running on x86_64-unknown-linux-gnu.

>>>>>>> OK to commit if passes ?

>>>>>> +/* PR35691: Transform

>>>>>> +   (x == 0 & y == 0) -> (x | typeof(x)(y)) == 0.

>>>>>> +   (x != 0 | y != 0) -> (x | typeof(x)(y)) != 0.  */

>>>>>> +

>>>>>>

>>>>>> Please omit the vertical space

>>>>>>

>>>>>> +(for bitop (bit_and bit_ior)

>>>>>> +     cmp (eq ne)

>>>>>> + (simplify

>>>>>> +  (bitop (cmp @0 integer_zerop) (cmp @1 integer_zerop))

>>>>>>

>>>>>> if you capture the first integer_zerop as @2 then you can re-use it...

>>>>>>

>>>>>> +   (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))

>>>>>> +       && INTEGRAL_TYPE_P (TREE_TYPE (@1))

>>>>>> +       && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE

>>>>>> (@1)))

>>>>>> +    (cmp (bit_ior @0 (convert @1)) { build_zero_cst (TREE_TYPE (@0));

>>>>>>

>>>>>> ... here inplace of the { build_zero_cst ... }.

>>>>>>

>>>>>> Ok with that changes.

>>>>> Thanks, committed the attached version as r241915.

>>>> ugh, the svn commit message has:

>>>>

>>>> testsuite/

>>>> * gcc.dg/pr35691-1.c: New test-case.

>>>> * gcc.dg/pr35691-4.c: Likewise.

>>>>

>>>> pr35691-4.c was a typo, should be pr35691-2.c :/

>>>> However testsuite/ChangeLog correctly has entry for pr35691-2.c

>>>> Is it possible to edit the commit message for r241915 ?

>>>> Sorry about this.

>>> No, just leave it as-is.

>> Hi,

>> Chritstophe reported to me that the commit caused test-cases

>> pr35691-1.c and pr35691-2.c (which were added by the commit)

>> to FAIL for cortex-a5:

>> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/241915/arm-none-linux-gnueabihf/diff-gcc-rh60-arm-none-linux-gnueabihf-arm-cortex-a5-vfpv3-d16-fp16.txt

>>

>> It seems truth_andif_expr is not simplified to bit_and_expr on

>> cortex-a5 as for x86_64 (and other arm variants).

>> The differences in dumps start from 004t.gimple for pr35691-1.c:

>>

>> x86_64 gimple dump:

>> foo (int z0, unsigned int z1)

>> {

>>    int D.1800;

>>    int t0;

>>    int t1;

>>    int t2;

>>

>>    _1 = z0 == 0;

>>    t0 = (int) _1;

>>    _2 = z1 == 0;

>>    t1 = (int) _2;

>>    _3 = t0 != 0;

>>    _4 = t1 != 0;

>>    _5 = _3 & _4;

>>    t2 = (int) _5;

>>    D.1800 = t2;

>>    return D.1800;

>> }

>>

>> cortex-a5 gimple dump:

>> foo (int z0, unsigned int z1)

>> {

>>    int iftmp.0;

>>    int D.4176;

>>    int t0;

>>    int t1;

>>    int t2;

>>

>>    _1 = z0 == 0;

>>    t0 = (int) _1;

>>    _2 = z1 == 0;

>>    t1 = (int) _2;

>>    if (t0 != 0) goto <D.4174>; else goto <D.4172>;

>>    <D.4174>:

>>    if (t1 != 0) goto <D.4175>; else goto <D.4172>;

>>    <D.4175>:

>>    iftmp.0 = 1;

>>    goto <D.4173>;

>>    <D.4172>:

>>    iftmp.0 = 0;

>>    <D.4173>:

>>    t2 = iftmp.0;

>>    D.4176 = t2;

>>    return D.4176;

>> }

>>

>> Since the pattern expects truth_andif_expr to be converted to bit_and_expr,

>> it fails to match for cortex-a5.

>> This seems to happen only for cortex-a5 (the other variants a9, a15,

>> a57 are OK).

>>

>> Is my assumption that truth_andif_expr would be always converted to bit_and_expr

>> for above case incorrect ?

> Yes, it depends on LOGICAL_OP_SHORT_CIRCUIT.


Right, and that varies by core in the arm backend.
looking at the tuning structures in arm.c I'd expect exynosm1 to also exhibit this

Kyrill

> Richard.

>

>> Thanks,

>> Prathamesh

>>> Richard.

>>>

>>>> Regards,

>>>> Prathamesh

>>>>>> Richard.

>>>>

>>> --

>>> Richard Biener <rguenther@suse.de>

>>> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

>>
Martin Sebor Nov. 8, 2016, 4:48 p.m. | #6
> Chritstophe reported to me that the commit caused test-cases

> pr35691-1.c and pr35691-2.c (which were added by the commit)

> to FAIL for cortex-a5:

> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/241915/arm-none-linux-gnueabihf/diff-gcc-rh60-arm-none-linux-gnueabihf-arm-cortex-a5-vfpv3-d16-fp16.txt


I also see the test fail on powerpc64le.  The forwprop1 tree dump
is attached in case it helps.

Martin
;; Function foo (foo, funcdef_no=0, decl_uid=2506, cgraph_uid=0, symbol_order=0)

Applying pattern match.pd:2422, gimple-match.c:1704
Applying pattern match.pd:913, gimple-match.c:621
Applying pattern match.pd:901, gimple-match.c:164
Applying pattern match.pd:2685, gimple-match.c:59999
gimple_simplified to if (_1 != 0)
Applying pattern match.pd:913, generic-match.c:429
Applying pattern match.pd:901, generic-match.c:136
Applying pattern match.pd:2685, generic-match.c:30968
  Replaced '_1 != 0' with 'z0_4(D) == 0'
Applying pattern match.pd:2422, gimple-match.c:1704
Applying pattern match.pd:913, gimple-match.c:621
Applying pattern match.pd:901, gimple-match.c:164
Applying pattern match.pd:2685, gimple-match.c:59999
gimple_simplified to if (_2 != 0)
Applying pattern match.pd:913, generic-match.c:429
Applying pattern match.pd:901, generic-match.c:136
Applying pattern match.pd:2685, generic-match.c:30968
  Replaced '_2 != 0' with 'z1_6(D) == 0'
gimple_simplified to _11 = iftmp.0_3;
foo (int z0, unsigned int z1)
{
  int t2;
  int t1;
  int t0;
  _Bool _1;
  _Bool _2;
  int iftmp.0_3;
  int _11;

  <bb 2>:
  _1 = z0_4(D) == 0;
  t0_5 = (int) _1;
  _2 = z1_6(D) == 0;
  t1_7 = (int) _2;
  if (z0_4(D) == 0)
    goto <bb 3>;
  else
    goto <bb 4>;

  <bb 3>:
  if (z1_6(D) == 0)
    goto <bb 5>;
  else
    goto <bb 4>;

  <bb 4>:

  <bb 5>:
  # iftmp.0_3 = PHI <1(3), 0(4)>
  t2_10 = iftmp.0_3;
  _11 = iftmp.0_3;
  return _11;

}

Patch hide | download patch | download mbox

diff --git a/gcc/match.pd b/gcc/match.pd
index 48f7351..29ddcd8 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -519,6 +519,18 @@  DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (if (TYPE_UNSIGNED (type))
     (bit_and @0 (bit_not (lshift { build_all_ones_cst (type); } @1)))))
 
+/* PR35691: Transform
+   (x == 0 & y == 0) -> (x | typeof(x)(y)) == 0.
+   (x != 0 | y != 0) -> (x | typeof(x)(y)) != 0.  */
+(for bitop (bit_and bit_ior)
+     cmp (eq ne)
+ (simplify
+  (bitop (cmp @0 integer_zerop@2) (cmp @1 integer_zerop))
+   (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
+       && INTEGRAL_TYPE_P (TREE_TYPE (@1))
+       && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@1)))
+    (cmp (bit_ior @0 (convert @1)) @2))))
+
 /* Fold (A & ~B) - (A & B) into (A ^ B) - B.  */
 (simplify
  (minus (bit_and:cs @0 (bit_not @1)) (bit_and:cs @0 @1))
diff --git a/gcc/testsuite/gcc.dg/pr35691-1.c b/gcc/testsuite/gcc.dg/pr35691-1.c
new file mode 100644
index 0000000..5211f815
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr35691-1.c
@@ -0,0 +1,12 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-forwprop-details" } */
+
+int foo(int z0, unsigned z1)
+{
+  int t0 = (z0 == 0);
+  int t1 = (z1 == 0);
+  int t2 = (t0 && t1);
+  return t2;
+}
+
+/* { dg-final { scan-tree-dump "gimple_simplified to _\[0-9\]* = \\(int\\) z1_\[0-9\]*\\(D\\);" "forwprop1" } } */
diff --git a/gcc/testsuite/gcc.dg/pr35691-2.c b/gcc/testsuite/gcc.dg/pr35691-2.c
new file mode 100644
index 0000000..90cbf6d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr35691-2.c
@@ -0,0 +1,12 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-forwprop-details" } */
+
+int foo(int z0, unsigned z1)
+{
+  int t0 = (z0 != 0);
+  int t1 = (z1 != 0);
+  int t2 = (t0 || t1);
+  return t2;
+}
+
+/* { dg-final { scan-tree-dump "gimple_simplified to _\[0-9\]* = \\(int\\) z1_\[0-9\]*\\(D\\);" "forwprop1" } } */