diff mbox

Optimize certain end of loop conditions into min/max operation

Message ID 55FBAFD1.9080300@linaro.org
State New
Headers show

Commit Message

Michael Collison Sept. 18, 2015, 6:31 a.m. UTC
On 07/31/2015 11:27 AM, Jeff Law wrote:
> On 07/31/2015 12:18 PM, Michael Collison wrote:
>> Hi Jeff,
>>
>> Yes I will create a test case. I'm not quite sure what to check for even
>> in the machine dependent test case. It's quite possible for the
>> instructions that are generated to change over time.
> I think we're going to want to look at the gimple IR and search for 
> the MIN/MAX expressions rather than the instructions.  Given we don't 
> know where the transformation is going to land (yet), you can probably 
> start with -fdump-tree-optimized and scanning the .optimized dump.
>
> We can still do that and have the test be target specific.
>
> jeff
>
Jeff and Richard,

Here is the the patch modified with test cases for MIN_EXPR and MAX_EXPR 
expressions. I need some assistance; this test case will fail on targets 
that don't have support for MIN/MAX such as 68k. Is there any way to 
remedy this short of enumerating whether a target support MIN/MAX in 
testsuite/lib/target_support?

2015-07-24  Michael Collison <michael.collison@linaro.org>
                     Andrew Pinski <andrew.pinski@caviumnetworks.com>

                 * match.pd ((x < y) && (x < z) -> x < min (y,z),
                 (x > y) and (x > z) -> x > max (y,z))
                 * testsuite/gcc.dg/tree-ssa/minmax-loopend.c: New test.

Comments

Marc Glisse Sept. 18, 2015, 7 a.m. UTC | #1
On Thu, 17 Sep 2015, Michael Collison wrote:

> Here is the the patch modified with test cases for MIN_EXPR and MAX_EXPR 
> expressions. I need some assistance; this test case will fail on targets that 
> don't have support for MIN/MAX such as 68k. Is there any way to remedy this 
> short of enumerating whether a target support MIN/MAX in 
> testsuite/lib/target_support?
>
> 2015-07-24  Michael Collison <michael.collison@linaro.org>
>                    Andrew Pinski <andrew.pinski@caviumnetworks.com>
>
>                * match.pd ((x < y) && (x < z) -> x < min (y,z),
>                (x > y) and (x > z) -> x > max (y,z))
>                * testsuite/gcc.dg/tree-ssa/minmax-loopend.c: New test.
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 5e8fd32..8691710 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -1793,3 +1793,17 @@ along with GCC; see the file COPYING3.  If not see
>     (convert (bit_and (op (convert:utype @0) (convert:utype @1))
>               (convert:utype @4)))))))
>
> +
> +/* Transform (@0 < @1 and @0 < @2) to use min */
> +(for op (lt le)
> +(simplify

You seem to be missing all indentation.

> +(bit_and:c (op @0 @1) (op @0 @2))

:c seems useless here. On the other hand, it might make sense to use op:s 
since this is mostly useful if it removes the 2 original comparisons.

> +(if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))

How did you chose this restriction? It seems safe enough, but the 
transformation could make sense in other cases as well. It can always be 
generalized later though.

> +(op @0 (min @1 @2)))))
> +
> +/* Transform (@0 > @1 and @0 > @2) to use max */
> +(for op (gt ge)

Note that you could unify the patterns with something like:
(for op (lt le gt ge)
      ext (min min max max)
  (simplify ...

> +(simplify
> +(bit_and:c (op @0 @1) (op @0 @2))
> +(if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
> +(op @0 (max @1 @2)))))
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c
> new file mode 100644
> index 0000000..cc0189a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +#define N 1024
> +
> +int a[N], b[N], c[N];
> +
> +void add (unsigned int m, unsigned int n)
> +{
> +  unsigned int i;
> +  for (i = 0; i < m && i < n; ++i)

Maybe writing '&' instead of '&&' would make it depend less on the target. 
Also, both tests seem to be for GENERIC (i.e. I expect that you are 
already seeing the optimized version with -fdump-tree-original or 
-fdump-tree-gimple). Maybe something as simple as:
int f(long a, long b, long c) {
   int cmp1 = a < b;
   int cmp2 = a < c;
   return cmp1 & cmp2;
}

> +    a[i] = b[i] + c[i];
> +}
> +
> +void add2 (unsigned int m, unsigned int n)
> +{
> +  unsigned int i;
> +  for (i = N-1; i > m && i > n; --i)
> +    a[i] = b[i] + c[i];
> +}
> +
> +/* { dg-final { scan-tree-dump "MIN_EXPR" 1 "optimized" } } */
> +/* { dg-final { scan-tree-dump "MAX_EXPR" 1 "optimized" } } */
Marc Glisse Sept. 18, 2015, 7:38 a.m. UTC | #2
Just a couple extra points. We can end up with a mix of < and >, which 
might prevent from matching:

   _3 = b_1(D) > a_2(D);
   _5 = a_2(D) < c_4(D);
   _8 = _3 & _5;

Just like with &, we could also transform:
x < y | x < z  --->  x < max(y, z)

(but maybe wait to make sure reviewers are ok with the first 
transformation before generalizing)

On Fri, 18 Sep 2015, Marc Glisse wrote:

> On Thu, 17 Sep 2015, Michael Collison wrote:
>
>> Here is the the patch modified with test cases for MIN_EXPR and MAX_EXPR 
>> expressions. I need some assistance; this test case will fail on targets 
>> that don't have support for MIN/MAX such as 68k. Is there any way to remedy 
>> this short of enumerating whether a target support MIN/MAX in 
>> testsuite/lib/target_support?
>> 
>> 2015-07-24  Michael Collison <michael.collison@linaro.org>
>>                    Andrew Pinski <andrew.pinski@caviumnetworks.com>
>>
>>                * match.pd ((x < y) && (x < z) -> x < min (y,z),
>>                (x > y) and (x > z) -> x > max (y,z))
>>                * testsuite/gcc.dg/tree-ssa/minmax-loopend.c: New test.
>> 
>> diff --git a/gcc/match.pd b/gcc/match.pd
>> index 5e8fd32..8691710 100644
>> --- a/gcc/match.pd
>> +++ b/gcc/match.pd
>> @@ -1793,3 +1793,17 @@ along with GCC; see the file COPYING3.  If not see
>>     (convert (bit_and (op (convert:utype @0) (convert:utype @1))
>>               (convert:utype @4)))))))
>> 
>> +
>> +/* Transform (@0 < @1 and @0 < @2) to use min */
>> +(for op (lt le)
>> +(simplify
>
> You seem to be missing all indentation.
>
>> +(bit_and:c (op @0 @1) (op @0 @2))
>
> :c seems useless here. On the other hand, it might make sense to use op:s 
> since this is mostly useful if it removes the 2 original comparisons.
>
>> +(if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
>
> How did you chose this restriction? It seems safe enough, but the 
> transformation could make sense in other cases as well. It can always be 
> generalized later though.
>
>> +(op @0 (min @1 @2)))))
>> +
>> +/* Transform (@0 > @1 and @0 > @2) to use max */
>> +(for op (gt ge)
>
> Note that you could unify the patterns with something like:
> (for op (lt le gt ge)
>     ext (min min max max)
> (simplify ...
>
>> +(simplify
>> +(bit_and:c (op @0 @1) (op @0 @2))
>> +(if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
>> +(op @0 (max @1 @2)))))
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c 
>> b/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c
>> new file mode 100644
>> index 0000000..cc0189a
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c
>> @@ -0,0 +1,23 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -fdump-tree-optimized" } */
>> +
>> +#define N 1024
>> +
>> +int a[N], b[N], c[N];
>> +
>> +void add (unsigned int m, unsigned int n)
>> +{
>> +  unsigned int i;
>> +  for (i = 0; i < m && i < n; ++i)
>
> Maybe writing '&' instead of '&&' would make it depend less on the target. 
> Also, both tests seem to be for GENERIC (i.e. I expect that you are already 
> seeing the optimized version with -fdump-tree-original or 
> -fdump-tree-gimple). Maybe something as simple as:
> int f(long a, long b, long c) {
>  int cmp1 = a < b;
>  int cmp2 = a < c;
>  return cmp1 & cmp2;
> }
>
>> +    a[i] = b[i] + c[i];
>> +}
>> +
>> +void add2 (unsigned int m, unsigned int n)
>> +{
>> +  unsigned int i;
>> +  for (i = N-1; i > m && i > n; --i)
>> +    a[i] = b[i] + c[i];
>> +}
>> +
>> +/* { dg-final { scan-tree-dump "MIN_EXPR" 1 "optimized" } } */
>> +/* { dg-final { scan-tree-dump "MAX_EXPR" 1 "optimized" } } */
>
>
Richard Biener Sept. 18, 2015, 9:23 a.m. UTC | #3
On Fri, Sep 18, 2015 at 9:38 AM, Marc Glisse <marc.glisse@inria.fr> wrote:
> Just a couple extra points. We can end up with a mix of < and >, which might
> prevent from matching:
>
>   _3 = b_1(D) > a_2(D);
>   _5 = a_2(D) < c_4(D);
>   _8 = _3 & _5;
>
> Just like with &, we could also transform:
> x < y | x < z  --->  x < max(y, z)
>
> (but maybe wait to make sure reviewers are ok with the first transformation
> before generalizing)

Please merge the patterns as suggested and do the :c/:s changes as well.

The issue with getting mixed < and > is indeed there - I've wanted to
extend :c to handle tcc_comparison in some way at some point but
didn't get to how best to implement that yet...

So to fix that currently you have to replicate the merged pattern
with swapped comparison operands.

Otherwise I'm fine with the general approach.

Richard.

>
> On Fri, 18 Sep 2015, Marc Glisse wrote:
>
>> On Thu, 17 Sep 2015, Michael Collison wrote:
>>
>>> Here is the the patch modified with test cases for MIN_EXPR and MAX_EXPR
>>> expressions. I need some assistance; this test case will fail on targets
>>> that don't have support for MIN/MAX such as 68k. Is there any way to remedy
>>> this short of enumerating whether a target support MIN/MAX in
>>> testsuite/lib/target_support?
>>>
>>> 2015-07-24  Michael Collison <michael.collison@linaro.org>
>>>                    Andrew Pinski <andrew.pinski@caviumnetworks.com>
>>>
>>>                * match.pd ((x < y) && (x < z) -> x < min (y,z),
>>>                (x > y) and (x > z) -> x > max (y,z))
>>>                * testsuite/gcc.dg/tree-ssa/minmax-loopend.c: New test.
>>>
>>> diff --git a/gcc/match.pd b/gcc/match.pd
>>> index 5e8fd32..8691710 100644
>>> --- a/gcc/match.pd
>>> +++ b/gcc/match.pd
>>> @@ -1793,3 +1793,17 @@ along with GCC; see the file COPYING3.  If not see
>>>     (convert (bit_and (op (convert:utype @0) (convert:utype @1))
>>>               (convert:utype @4)))))))
>>>
>>> +
>>> +/* Transform (@0 < @1 and @0 < @2) to use min */
>>> +(for op (lt le)
>>> +(simplify
>>
>>
>> You seem to be missing all indentation.
>>
>>> +(bit_and:c (op @0 @1) (op @0 @2))
>>
>>
>> :c seems useless here. On the other hand, it might make sense to use op:s
>> since this is mostly useful if it removes the 2 original comparisons.
>>
>>> +(if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
>>
>>
>> How did you chose this restriction? It seems safe enough, but the
>> transformation could make sense in other cases as well. It can always be
>> generalized later though.
>>
>>> +(op @0 (min @1 @2)))))
>>> +
>>> +/* Transform (@0 > @1 and @0 > @2) to use max */
>>> +(for op (gt ge)
>>
>>
>> Note that you could unify the patterns with something like:
>> (for op (lt le gt ge)
>>     ext (min min max max)
>> (simplify ...
>>
>>> +(simplify
>>> +(bit_and:c (op @0 @1) (op @0 @2))
>>> +(if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
>>> +(op @0 (max @1 @2)))))
>>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c
>>> b/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c
>>> new file mode 100644
>>> index 0000000..cc0189a
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c
>>> @@ -0,0 +1,23 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O2 -fdump-tree-optimized" } */
>>> +
>>> +#define N 1024
>>> +
>>> +int a[N], b[N], c[N];
>>> +
>>> +void add (unsigned int m, unsigned int n)
>>> +{
>>> +  unsigned int i;
>>> +  for (i = 0; i < m && i < n; ++i)
>>
>>
>> Maybe writing '&' instead of '&&' would make it depend less on the target.
>> Also, both tests seem to be for GENERIC (i.e. I expect that you are already
>> seeing the optimized version with -fdump-tree-original or
>> -fdump-tree-gimple). Maybe something as simple as:
>> int f(long a, long b, long c) {
>>  int cmp1 = a < b;
>>  int cmp2 = a < c;
>>  return cmp1 & cmp2;
>> }
>>
>>> +    a[i] = b[i] + c[i];
>>> +}
>>> +
>>> +void add2 (unsigned int m, unsigned int n)
>>> +{
>>> +  unsigned int i;
>>> +  for (i = N-1; i > m && i > n; --i)
>>> +    a[i] = b[i] + c[i];
>>> +}
>>> +
>>> +/* { dg-final { scan-tree-dump "MIN_EXPR" 1 "optimized" } } */
>>> +/* { dg-final { scan-tree-dump "MAX_EXPR" 1 "optimized" } } */
>>
>>
>>
>
> --
> Marc Glisse
Michael Collison Sept. 18, 2015, 9:41 p.m. UTC | #4
Marc,

Can you elaborate on merging the patterns using 'ext' as mentioned in 
your post? I don't see any documentation or examples.

On 09/18/2015 12:00 AM, Marc Glisse wrote:
> On Thu, 17 Sep 2015, Michael Collison wrote:
>
>> Here is the the patch modified with test cases for MIN_EXPR and 
>> MAX_EXPR expressions. I need some assistance; this test case will 
>> fail on targets that don't have support for MIN/MAX such as 68k. Is 
>> there any way to remedy this short of enumerating whether a target 
>> support MIN/MAX in testsuite/lib/target_support?
>>
>> 2015-07-24  Michael Collison <michael.collison@linaro.org>
>>                    Andrew Pinski <andrew.pinski@caviumnetworks.com>
>>
>>                * match.pd ((x < y) && (x < z) -> x < min (y,z),
>>                (x > y) and (x > z) -> x > max (y,z))
>>                * testsuite/gcc.dg/tree-ssa/minmax-loopend.c: New test.
>>
>> diff --git a/gcc/match.pd b/gcc/match.pd
>> index 5e8fd32..8691710 100644
>> --- a/gcc/match.pd
>> +++ b/gcc/match.pd
>> @@ -1793,3 +1793,17 @@ along with GCC; see the file COPYING3. If not see
>>     (convert (bit_and (op (convert:utype @0) (convert:utype @1))
>>               (convert:utype @4)))))))
>>
>> +
>> +/* Transform (@0 < @1 and @0 < @2) to use min */
>> +(for op (lt le)
>> +(simplify
>
> You seem to be missing all indentation.
>
>> +(bit_and:c (op @0 @1) (op @0 @2))
>
> :c seems useless here. On the other hand, it might make sense to use 
> op:s since this is mostly useful if it removes the 2 original 
> comparisons.
>
>> +(if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
>
> How did you chose this restriction? It seems safe enough, but the 
> transformation could make sense in other cases as well. It can always 
> be generalized later though.
>
>> +(op @0 (min @1 @2)))))
>> +
>> +/* Transform (@0 > @1 and @0 > @2) to use max */
>> +(for op (gt ge)
>
> Note that you could unify the patterns with something like:
> (for op (lt le gt ge)
>      ext (min min max max)
>  (simplify ...
>
>> +(simplify
>> +(bit_and:c (op @0 @1) (op @0 @2))
>> +(if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
>> +(op @0 (max @1 @2)))))
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c 
>> b/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c
>> new file mode 100644
>> index 0000000..cc0189a
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c
>> @@ -0,0 +1,23 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -fdump-tree-optimized" } */
>> +
>> +#define N 1024
>> +
>> +int a[N], b[N], c[N];
>> +
>> +void add (unsigned int m, unsigned int n)
>> +{
>> +  unsigned int i;
>> +  for (i = 0; i < m && i < n; ++i)
>
> Maybe writing '&' instead of '&&' would make it depend less on the 
> target. Also, both tests seem to be for GENERIC (i.e. I expect that 
> you are already seeing the optimized version with -fdump-tree-original 
> or -fdump-tree-gimple). Maybe something as simple as:
> int f(long a, long b, long c) {
>   int cmp1 = a < b;
>   int cmp2 = a < c;
>   return cmp1 & cmp2;
> }
>
>> +    a[i] = b[i] + c[i];
>> +}
>> +
>> +void add2 (unsigned int m, unsigned int n)
>> +{
>> +  unsigned int i;
>> +  for (i = N-1; i > m && i > n; --i)
>> +    a[i] = b[i] + c[i];
>> +}
>> +
>> +/* { dg-final { scan-tree-dump "MIN_EXPR" 1 "optimized" } } */
>> +/* { dg-final { scan-tree-dump "MAX_EXPR" 1 "optimized" } } */
>
Marc Glisse Sept. 18, 2015, 10:01 p.m. UTC | #5
On Fri, 18 Sep 2015, Michael Collison wrote:

> Can you elaborate on merging the patterns using 'ext' as mentioned in your 
> post? I don't see any documentation or examples.

https://gcc.gnu.org/onlinedocs/gccint/The-Language.html
"for" lets you iterate on several variables at the same time.

For instance,

(for bitop (bit_and bit_ior)
      rbitop (bit_ior bit_and)
  (simplify
   (bitop:c (rbitop:c @0 @1) (bit_not@2 @0))
   (bitop @1 @2)))

expands to

(simplify
  (bit_and:c (bit_ior:c @0 @1) (bit_not@2 @0))
  (bit_and @1 @2))
(simplify
  (bit_ior:c (bit_and:c @0 @1) (bit_not@2 @0))
  (bit_ior @1 @2))

there are other examples using

(for cmp (eq ne)
      icmp (ne eq)

or

(for cmp (simple_comparison)
      scmp (swapped_simple_comparison)

and you can have repetitions

(for logs (LOG LOG
 	   LOG2 LOG2
 	   LOG10 LOG10)
      exps (SQRT CBRT)

(the iteration wraps around for exps).
Michael Collison Sept. 30, 2015, 7:29 a.m. UTC | #6
Richard and Marc,

What is ':s'? I don't see any documentation for it. So you would like me 
to remove :c and add :s?


On 09/18/2015 02:23 AM, Richard Biener wrote:
> On Fri, Sep 18, 2015 at 9:38 AM, Marc Glisse <marc.glisse@inria.fr> wrote:
>> Just a couple extra points. We can end up with a mix of < and >, which might
>> prevent from matching:
>>
>>    _3 = b_1(D) > a_2(D);
>>    _5 = a_2(D) < c_4(D);
>>    _8 = _3 & _5;
>>
>> Just like with &, we could also transform:
>> x < y | x < z  --->  x < max(y, z)
>>
>> (but maybe wait to make sure reviewers are ok with the first transformation
>> before generalizing)
> Please merge the patterns as suggested and do the :c/:s changes as well.
>
> The issue with getting mixed < and > is indeed there - I've wanted to
> extend :c to handle tcc_comparison in some way at some point but
> didn't get to how best to implement that yet...
>
> So to fix that currently you have to replicate the merged pattern
> with swapped comparison operands.
>
> Otherwise I'm fine with the general approach.
>
> Richard.
>
>> On Fri, 18 Sep 2015, Marc Glisse wrote:
>>
>>> On Thu, 17 Sep 2015, Michael Collison wrote:
>>>
>>>> Here is the the patch modified with test cases for MIN_EXPR and MAX_EXPR
>>>> expressions. I need some assistance; this test case will fail on targets
>>>> that don't have support for MIN/MAX such as 68k. Is there any way to remedy
>>>> this short of enumerating whether a target support MIN/MAX in
>>>> testsuite/lib/target_support?
>>>>
>>>> 2015-07-24  Michael Collison <michael.collison@linaro.org>
>>>>                     Andrew Pinski <andrew.pinski@caviumnetworks.com>
>>>>
>>>>                 * match.pd ((x < y) && (x < z) -> x < min (y,z),
>>>>                 (x > y) and (x > z) -> x > max (y,z))
>>>>                 * testsuite/gcc.dg/tree-ssa/minmax-loopend.c: New test.
>>>>
>>>> diff --git a/gcc/match.pd b/gcc/match.pd
>>>> index 5e8fd32..8691710 100644
>>>> --- a/gcc/match.pd
>>>> +++ b/gcc/match.pd
>>>> @@ -1793,3 +1793,17 @@ along with GCC; see the file COPYING3.  If not see
>>>>      (convert (bit_and (op (convert:utype @0) (convert:utype @1))
>>>>                (convert:utype @4)))))))
>>>>
>>>> +
>>>> +/* Transform (@0 < @1 and @0 < @2) to use min */
>>>> +(for op (lt le)
>>>> +(simplify
>>>
>>> You seem to be missing all indentation.
>>>
>>>> +(bit_and:c (op @0 @1) (op @0 @2))
>>>
>>> :c seems useless here. On the other hand, it might make sense to use op:s
>>> since this is mostly useful if it removes the 2 original comparisons.
>>>
>>>> +(if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
>>>
>>> How did you chose this restriction? It seems safe enough, but the
>>> transformation could make sense in other cases as well. It can always be
>>> generalized later though.
>>>
>>>> +(op @0 (min @1 @2)))))
>>>> +
>>>> +/* Transform (@0 > @1 and @0 > @2) to use max */
>>>> +(for op (gt ge)
>>>
>>> Note that you could unify the patterns with something like:
>>> (for op (lt le gt ge)
>>>      ext (min min max max)
>>> (simplify ...
>>>
>>>> +(simplify
>>>> +(bit_and:c (op @0 @1) (op @0 @2))
>>>> +(if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
>>>> +(op @0 (max @1 @2)))))
>>>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c
>>>> b/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c
>>>> new file mode 100644
>>>> index 0000000..cc0189a
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c
>>>> @@ -0,0 +1,23 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-O2 -fdump-tree-optimized" } */
>>>> +
>>>> +#define N 1024
>>>> +
>>>> +int a[N], b[N], c[N];
>>>> +
>>>> +void add (unsigned int m, unsigned int n)
>>>> +{
>>>> +  unsigned int i;
>>>> +  for (i = 0; i < m && i < n; ++i)
>>>
>>> Maybe writing '&' instead of '&&' would make it depend less on the target.
>>> Also, both tests seem to be for GENERIC (i.e. I expect that you are already
>>> seeing the optimized version with -fdump-tree-original or
>>> -fdump-tree-gimple). Maybe something as simple as:
>>> int f(long a, long b, long c) {
>>>   int cmp1 = a < b;
>>>   int cmp2 = a < c;
>>>   return cmp1 & cmp2;
>>> }
>>>
>>>> +    a[i] = b[i] + c[i];
>>>> +}
>>>> +
>>>> +void add2 (unsigned int m, unsigned int n)
>>>> +{
>>>> +  unsigned int i;
>>>> +  for (i = N-1; i > m && i > n; --i)
>>>> +    a[i] = b[i] + c[i];
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-tree-dump "MIN_EXPR" 1 "optimized" } } */
>>>> +/* { dg-final { scan-tree-dump "MAX_EXPR" 1 "optimized" } } */
>>>
>>>
>> --
>> Marc Glisse
Richard Biener Sept. 30, 2015, 8:14 a.m. UTC | #7
On Wed, Sep 30, 2015 at 9:29 AM, Michael Collison
<michael.collison@linaro.org> wrote:
> Richard and Marc,
>
> What is ':s'? I don't see any documentation for it. So you would like me to
> remove :c and add :s?

There is documentation for both in the internals manual.

I don't have enough context to say whether you should remove "them" or
not.  What's
the current patch?  If you made the suggested changes you should be left with
only required :s and :c.

Richard.

>
>
> On 09/18/2015 02:23 AM, Richard Biener wrote:
>>
>> On Fri, Sep 18, 2015 at 9:38 AM, Marc Glisse <marc.glisse@inria.fr> wrote:
>>>
>>> Just a couple extra points. We can end up with a mix of < and >, which
>>> might
>>> prevent from matching:
>>>
>>>    _3 = b_1(D) > a_2(D);
>>>    _5 = a_2(D) < c_4(D);
>>>    _8 = _3 & _5;
>>>
>>> Just like with &, we could also transform:
>>> x < y | x < z  --->  x < max(y, z)
>>>
>>> (but maybe wait to make sure reviewers are ok with the first
>>> transformation
>>> before generalizing)
>>
>> Please merge the patterns as suggested and do the :c/:s changes as well.
>>
>> The issue with getting mixed < and > is indeed there - I've wanted to
>> extend :c to handle tcc_comparison in some way at some point but
>> didn't get to how best to implement that yet...
>>
>> So to fix that currently you have to replicate the merged pattern
>> with swapped comparison operands.
>>
>> Otherwise I'm fine with the general approach.
>>
>> Richard.
>>
>>> On Fri, 18 Sep 2015, Marc Glisse wrote:
>>>
>>>> On Thu, 17 Sep 2015, Michael Collison wrote:
>>>>
>>>>> Here is the the patch modified with test cases for MIN_EXPR and
>>>>> MAX_EXPR
>>>>> expressions. I need some assistance; this test case will fail on
>>>>> targets
>>>>> that don't have support for MIN/MAX such as 68k. Is there any way to
>>>>> remedy
>>>>> this short of enumerating whether a target support MIN/MAX in
>>>>> testsuite/lib/target_support?
>>>>>
>>>>> 2015-07-24  Michael Collison <michael.collison@linaro.org>
>>>>>                     Andrew Pinski <andrew.pinski@caviumnetworks.com>
>>>>>
>>>>>                 * match.pd ((x < y) && (x < z) -> x < min (y,z),
>>>>>                 (x > y) and (x > z) -> x > max (y,z))
>>>>>                 * testsuite/gcc.dg/tree-ssa/minmax-loopend.c: New test.
>>>>>
>>>>> diff --git a/gcc/match.pd b/gcc/match.pd
>>>>> index 5e8fd32..8691710 100644
>>>>> --- a/gcc/match.pd
>>>>> +++ b/gcc/match.pd
>>>>> @@ -1793,3 +1793,17 @@ along with GCC; see the file COPYING3.  If not
>>>>> see
>>>>>      (convert (bit_and (op (convert:utype @0) (convert:utype @1))
>>>>>                (convert:utype @4)))))))
>>>>>
>>>>> +
>>>>> +/* Transform (@0 < @1 and @0 < @2) to use min */
>>>>> +(for op (lt le)
>>>>> +(simplify
>>>>
>>>>
>>>> You seem to be missing all indentation.
>>>>
>>>>> +(bit_and:c (op @0 @1) (op @0 @2))
>>>>
>>>>
>>>> :c seems useless here. On the other hand, it might make sense to use
>>>> op:s
>>>> since this is mostly useful if it removes the 2 original comparisons.
>>>>
>>>>> +(if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
>>>>
>>>>
>>>> How did you chose this restriction? It seems safe enough, but the
>>>> transformation could make sense in other cases as well. It can always be
>>>> generalized later though.
>>>>
>>>>> +(op @0 (min @1 @2)))))
>>>>> +
>>>>> +/* Transform (@0 > @1 and @0 > @2) to use max */
>>>>> +(for op (gt ge)
>>>>
>>>>
>>>> Note that you could unify the patterns with something like:
>>>> (for op (lt le gt ge)
>>>>      ext (min min max max)
>>>> (simplify ...
>>>>
>>>>> +(simplify
>>>>> +(bit_and:c (op @0 @1) (op @0 @2))
>>>>> +(if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
>>>>> +(op @0 (max @1 @2)))))
>>>>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c
>>>>> b/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c
>>>>> new file mode 100644
>>>>> index 0000000..cc0189a
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c
>>>>> @@ -0,0 +1,23 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-O2 -fdump-tree-optimized" } */
>>>>> +
>>>>> +#define N 1024
>>>>> +
>>>>> +int a[N], b[N], c[N];
>>>>> +
>>>>> +void add (unsigned int m, unsigned int n)
>>>>> +{
>>>>> +  unsigned int i;
>>>>> +  for (i = 0; i < m && i < n; ++i)
>>>>
>>>>
>>>> Maybe writing '&' instead of '&&' would make it depend less on the
>>>> target.
>>>> Also, both tests seem to be for GENERIC (i.e. I expect that you are
>>>> already
>>>> seeing the optimized version with -fdump-tree-original or
>>>> -fdump-tree-gimple). Maybe something as simple as:
>>>> int f(long a, long b, long c) {
>>>>   int cmp1 = a < b;
>>>>   int cmp2 = a < c;
>>>>   return cmp1 & cmp2;
>>>> }
>>>>
>>>>> +    a[i] = b[i] + c[i];
>>>>> +}
>>>>> +
>>>>> +void add2 (unsigned int m, unsigned int n)
>>>>> +{
>>>>> +  unsigned int i;
>>>>> +  for (i = N-1; i > m && i > n; --i)
>>>>> +    a[i] = b[i] + c[i];
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-tree-dump "MIN_EXPR" 1 "optimized" } } */
>>>>> +/* { dg-final { scan-tree-dump "MAX_EXPR" 1 "optimized" } } */
>>>>
>>>>
>>>>
>>> --
>>> Marc Glisse
>
>
> --
> Michael Collison
> Linaro Toolchain Working Group
> michael.collison@linaro.org
>
diff mbox

Patch

diff --git a/gcc/match.pd b/gcc/match.pd
index 5e8fd32..8691710 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1793,3 +1793,17 @@  along with GCC; see the file COPYING3.  If not see
      (convert (bit_and (op (convert:utype @0) (convert:utype @1))
                (convert:utype @4)))))))

+
+/* Transform (@0 < @1 and @0 < @2) to use min */
+(for op (lt le)
+(simplify
+(bit_and:c (op @0 @1) (op @0 @2))
+(if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
+(op @0 (min @1 @2)))))
+
+/* Transform (@0 > @1 and @0 > @2) to use max */
+(for op (gt ge)
+(simplify
+(bit_and:c (op @0 @1) (op @0 @2))
+(if (INTEGRAL_TYPE_P (TREE_TYPE (@0)))
+(op @0 (max @1 @2)))))
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c 
b/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c
new file mode 100644
index 0000000..cc0189a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/minmax-loopend.c
@@ -0,0 +1,23 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#define N 1024
+
+int a[N], b[N], c[N];
+
+void add (unsigned int m, unsigned int n)
+{
+  unsigned int i;
+  for (i = 0; i < m && i < n; ++i)
+    a[i] = b[i] + c[i];
+}
+
+void add2 (unsigned int m, unsigned int n)
+{
+  unsigned int i;
+  for (i = N-1; i > m && i > n; --i)
+    a[i] = b[i] + c[i];
+}
+
+/* { dg-final { scan-tree-dump "MIN_EXPR" 1 "optimized" } } */
+/* { dg-final { scan-tree-dump "MAX_EXPR" 1 "optimized" } } */