Transform (x >> cst) != 0 to x >= (1 << cst) and (x >> cst) == 0 to x < (1 << cst)

Message ID CAAgBjM=mScG0DeJ=wSn1eoQCrpZZdr8T9=CpGQbfiQ1WKazMyQ@mail.gmail.com
State New
Headers show
Series
  • Transform (x >> cst) != 0 to x >= (1 << cst) and (x >> cst) == 0 to x < (1 << cst)
Related show

Commit Message

Prathamesh Kulkarni Oct. 3, 2017, 7:54 p.m.
Hi,
This follow-up patch implements the patterns mentioned in $subject.
Bootstrap+test in progress on x86_64-unknown-linux-gnu and aarch64-linux-gnu.
OK to commit if passes ?

Thanks,
Prathamesh
2017-10-03  Prathamesh Kulkarni  <prathamesh.kulkarni@linaro.org>

	* match.pd ((X >> CST) == 0 -> X < (1 << CST)): New pattern.
	((X >> CST) != 0 -> X >= (1 << CST)): Likewise.

testsuite/
	* gcc.dg/tree-ssa/cmpdiv.c: Add test-cases f3 and f4.

Comments

Jakub Jelinek Oct. 3, 2017, 8:19 p.m. | #1
On Tue, Oct 03, 2017 at 12:54:39PM -0700, Prathamesh Kulkarni wrote:
> Hi,

> This follow-up patch implements the patterns mentioned in $subject.

> Bootstrap+test in progress on x86_64-unknown-linux-gnu and aarch64-linux-gnu.

> OK to commit if passes ?

> 

> Thanks,

> Prathamesh


> 2017-10-03  Prathamesh Kulkarni  <prathamesh.kulkarni@linaro.org>

> 

> 	* match.pd ((X >> CST) == 0 -> X < (1 << CST)): New pattern.

> 	((X >> CST) != 0 -> X >= (1 << CST)): Likewise.

> 

> testsuite/

> 	* gcc.dg/tree-ssa/cmpdiv.c: Add test-cases f3 and f4.


Why this way and not the other way around?
E.g. i?86/x86_64 and various other targets have shift instructions which
set condition codes, so (X >> 51) == 0 is certainly smaller
smaller and I believe cheaper than the latter.
Try:
void foo (void);

void
bar (unsigned long long x)
{
  if ((x >> 51) == 0)
    foo ();
}

void
baz (unsigned long long x)
{
  if (x < (1LL << 51))
    foo ();
}
with -Os on x86_64, the first function is 4 insns, 12 bytes,
the second one 5 insns, 21 bytes.

I wonder if this kind of instruction selection stuff shouldn't be
done in target.pd instead, with input from the target.

	Jakub
Marc Glisse Oct. 3, 2017, 9 p.m. | #2
On Tue, 3 Oct 2017, Jakub Jelinek wrote:

> On Tue, Oct 03, 2017 at 12:54:39PM -0700, Prathamesh Kulkarni wrote:

>> Hi,

>> This follow-up patch implements the patterns mentioned in $subject.

>> Bootstrap+test in progress on x86_64-unknown-linux-gnu and aarch64-linux-gnu.

>> OK to commit if passes ?

>>

>> Thanks,

>> Prathamesh

>

>> 2017-10-03  Prathamesh Kulkarni  <prathamesh.kulkarni@linaro.org>

>>

>> 	* match.pd ((X >> CST) == 0 -> X < (1 << CST)): New pattern.

>> 	((X >> CST) != 0 -> X >= (1 << CST)): Likewise.


build_int_cstu doesn't work for vectors, you want build_one_cst. I never 
know if we should check single_use or not :-(

>> testsuite/

>> 	* gcc.dg/tree-ssa/cmpdiv.c: Add test-cases f3 and f4.

>

> Why this way and not the other way around?


For high level gimple optimizations, X < CST is more convenient (and 
smaller, just one insn) than (X >> CST) == 0.

> E.g. i?86/x86_64 and various other targets have shift instructions which

> set condition codes, so (X >> 51) == 0 is certainly smaller

> smaller and I believe cheaper than the latter.

> Try:

> void foo (void);

>

> void

> bar (unsigned long long x)

> {

>  if ((x >> 51) == 0)

>    foo ();

> }

>

> void

> baz (unsigned long long x)

> {

>  if (x < (1LL << 51))

>    foo ();

> }

> with -Os on x86_64, the first function is 4 insns, 12 bytes,

> the second one 5 insns, 21 bytes.

>

> I wonder if this kind of instruction selection stuff shouldn't be

> done in target.pd instead, with input from the target.


At a late stage, maybe during an RTL pass or expansion (or just before 
expansion) it would indeed be good to generate a shift for such 
comparisons, on targets where that sets a cc. The lack of this 
transformation could be considered a blocker for the other one, to avoid 
regressing on bar.

-- 
Marc Glisse
Jeff Law Oct. 3, 2017, 9:10 p.m. | #3
On 10/03/2017 03:00 PM, Marc Glisse wrote:
> On Tue, 3 Oct 2017, Jakub Jelinek wrote:

> 

>> On Tue, Oct 03, 2017 at 12:54:39PM -0700, Prathamesh Kulkarni wrote:

>>> Hi,

>>> This follow-up patch implements the patterns mentioned in $subject.

>>> Bootstrap+test in progress on x86_64-unknown-linux-gnu and

>>> aarch64-linux-gnu.

>>> OK to commit if passes ?

>>>

>>> Thanks,

>>> Prathamesh

>>

>>> 2017-10-03  Prathamesh Kulkarni  <prathamesh.kulkarni@linaro.org>

>>>

>>>     * match.pd ((X >> CST) == 0 -> X < (1 << CST)): New pattern.

>>>     ((X >> CST) != 0 -> X >= (1 << CST)): Likewise.

> 

> build_int_cstu doesn't work for vectors, you want build_one_cst. I never

> know if we should check single_use or not :-(

> 

>>> testsuite/

>>>     * gcc.dg/tree-ssa/cmpdiv.c: Add test-cases f3 and f4.

>>

>> Why this way and not the other way around?

> 

> For high level gimple optimizations, X < CST is more convenient (and

> smaller, just one insn) than (X >> CST) == 0.

Right.  One could easily argue that Prathamesh's form should be the
preferred form because it is simpler at the gimple level -- and that
x86-isms should be dealt with later in the pipeline.


> 

>> E.g. i?86/x86_64 and various other targets have shift instructions which

>> set condition codes, so (X >> 51) == 0 is certainly smaller

>> smaller and I believe cheaper than the latter.

>> Try:

>> void foo (void);

>>

>> void

>> bar (unsigned long long x)

>> {

>>  if ((x >> 51) == 0)

>>    foo ();

>> }

>>

>> void

>> baz (unsigned long long x)

>> {

>>  if (x < (1LL << 51))

>>    foo ();

>> }

>> with -Os on x86_64, the first function is 4 insns, 12 bytes,

>> the second one 5 insns, 21 bytes.

>>

>> I wonder if this kind of instruction selection stuff shouldn't be

>> done in target.pd instead, with input from the target.

Right, but I think that argues that Prathamesh's patch is the right
direction and that to move forward what needs to happen is something
needs to be fixed at the gimple/rtl border to ensure we get good x86 code.

> 

> At a late stage, maybe during an RTL pass or expansion (or just before

> expansion) it would indeed be good to generate a shift for such

> comparisons, on targets where that sets a cc. The lack of this

> transformation could be considered a blocker for the other one, to avoid

> regressing on bar.

Right.  In fact, I think Jakub's test ought to be added to this work as
part of its basic testing.

jeff
Prathamesh Kulkarni Oct. 4, 2017, 6:34 p.m. | #4
On 3 October 2017 at 14:10, Jeff Law <law@redhat.com> wrote:
> On 10/03/2017 03:00 PM, Marc Glisse wrote:

>> On Tue, 3 Oct 2017, Jakub Jelinek wrote:

>>

>>> On Tue, Oct 03, 2017 at 12:54:39PM -0700, Prathamesh Kulkarni wrote:

>>>> Hi,

>>>> This follow-up patch implements the patterns mentioned in $subject.

>>>> Bootstrap+test in progress on x86_64-unknown-linux-gnu and

>>>> aarch64-linux-gnu.

>>>> OK to commit if passes ?

>>>>

>>>> Thanks,

>>>> Prathamesh

>>>

>>>> 2017-10-03  Prathamesh Kulkarni  <prathamesh.kulkarni@linaro.org>

>>>>

>>>>     * match.pd ((X >> CST) == 0 -> X < (1 << CST)): New pattern.

>>>>     ((X >> CST) != 0 -> X >= (1 << CST)): Likewise.

>>

>> build_int_cstu doesn't work for vectors, you want build_one_cst. I never

>> know if we should check single_use or not :-(

Thanks for the pointers, I wasn't aware about build_one_cst.
>>

>>>> testsuite/

>>>>     * gcc.dg/tree-ssa/cmpdiv.c: Add test-cases f3 and f4.

>>>

>>> Why this way and not the other way around?

>>

>> For high level gimple optimizations, X < CST is more convenient (and

>> smaller, just one insn) than (X >> CST) == 0.

> Right.  One could easily argue that Prathamesh's form should be the

> preferred form because it is simpler at the gimple level -- and that

> x86-isms should be dealt with later in the pipeline.

>

>

>>

>>> E.g. i?86/x86_64 and various other targets have shift instructions which

>>> set condition codes, so (X >> 51) == 0 is certainly smaller

>>> smaller and I believe cheaper than the latter.

>>> Try:

>>> void foo (void);

>>>

>>> void

>>> bar (unsigned long long x)

>>> {

>>>  if ((x >> 51) == 0)

>>>    foo ();

>>> }

>>>

>>> void

>>> baz (unsigned long long x)

>>> {

>>>  if (x < (1LL << 51))

>>>    foo ();

>>> }

>>> with -Os on x86_64, the first function is 4 insns, 12 bytes,

>>> the second one 5 insns, 21 bytes.

Indeed, I can reproduce this behavior. Thanks for pointing it out.
>>>

>>> I wonder if this kind of instruction selection stuff shouldn't be

>>> done in target.pd instead, with input from the target.

> Right, but I think that argues that Prathamesh's patch is the right

> direction and that to move forward what needs to happen is something

> needs to be fixed at the gimple/rtl border to ensure we get good x86 code.

>

>>

>> At a late stage, maybe during an RTL pass or expansion (or just before

>> expansion) it would indeed be good to generate a shift for such

>> comparisons, on targets where that sets a cc. The lack of this

>> transformation could be considered a blocker for the other one, to avoid

>> regressing on bar.

> Right.  In fact, I think Jakub's test ought to be added to this work as

> part of its basic testing.

Um, how to write the above-test case for bar() in DejaGNU format ?
Is it possible to test for number of insns or to use nm to test for
size of a function with dejagnu directive ?
Or should I scan for "cmpq" since the pattern transforms a right shift
and cmp against 0 to cmp between operands ?

Thanks,
Prathamesh
>

> jeff

>
Richard Biener Oct. 5, 2017, 8:28 a.m. | #5
On Tue, 3 Oct 2017, Jeff Law wrote:

> On 10/03/2017 03:00 PM, Marc Glisse wrote:

> > On Tue, 3 Oct 2017, Jakub Jelinek wrote:

> > 

> >> On Tue, Oct 03, 2017 at 12:54:39PM -0700, Prathamesh Kulkarni wrote:

> >>> Hi,

> >>> This follow-up patch implements the patterns mentioned in $subject.

> >>> Bootstrap+test in progress on x86_64-unknown-linux-gnu and

> >>> aarch64-linux-gnu.

> >>> OK to commit if passes ?

> >>>

> >>> Thanks,

> >>> Prathamesh

> >>

> >>> 2017-10-03  Prathamesh Kulkarni  <prathamesh.kulkarni@linaro.org>

> >>>

> >>>     * match.pd ((X >> CST) == 0 -> X < (1 << CST)): New pattern.

> >>>     ((X >> CST) != 0 -> X >= (1 << CST)): Likewise.

> > 

> > build_int_cstu doesn't work for vectors, you want build_one_cst. I never

> > know if we should check single_use or not :-(

> > 

> >>> testsuite/

> >>>     * gcc.dg/tree-ssa/cmpdiv.c: Add test-cases f3 and f4.

> >>

> >> Why this way and not the other way around?

> > 

> > For high level gimple optimizations, X < CST is more convenient (and

> > smaller, just one insn) than (X >> CST) == 0.

> Right.  One could easily argue that Prathamesh's form should be the

> preferred form because it is simpler at the gimple level -- and that

> x86-isms should be dealt with later in the pipeline.

> 

> 

> > 

> >> E.g. i?86/x86_64 and various other targets have shift instructions which

> >> set condition codes, so (X >> 51) == 0 is certainly smaller

> >> smaller and I believe cheaper than the latter.

> >> Try:

> >> void foo (void);

> >>

> >> void

> >> bar (unsigned long long x)

> >> {

> >>  if ((x >> 51) == 0)

> >>    foo ();

> >> }

> >>

> >> void

> >> baz (unsigned long long x)

> >> {

> >>  if (x < (1LL << 51))

> >>    foo ();

> >> }

> >> with -Os on x86_64, the first function is 4 insns, 12 bytes,

> >> the second one 5 insns, 21 bytes.

> >>

> >> I wonder if this kind of instruction selection stuff shouldn't be

> >> done in target.pd instead, with input from the target.

> Right, but I think that argues that Prathamesh's patch is the right

> direction and that to move forward what needs to happen is something

> needs to be fixed at the gimple/rtl border to ensure we get good x86 code.


For this case it should be even "easy" within the current RTL expansion
framework since all we need is to look at a single comparison and its
operands.

So I'd suggest to go forward with the canonicalization as proposed
and address the expansion issue as followup.  It's currently a missed
optimization for baz anyway.

Richard.

> > 

> > At a late stage, maybe during an RTL pass or expansion (or just before

> > expansion) it would indeed be good to generate a shift for such

> > comparisons, on targets where that sets a cc. The lack of this

> > transformation could be considered a blocker for the other one, to avoid

> > regressing on bar.

> Right.  In fact, I think Jakub's test ought to be added to this work as

> part of its basic testing.

> 

> jeff

> 

> 


-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
Jeff Law Oct. 12, 2017, 4:58 a.m. | #6
On 10/04/2017 12:34 PM, Prathamesh Kulkarni wrote:

>>> At a late stage, maybe during an RTL pass or expansion (or just before

>>> expansion) it would indeed be good to generate a shift for such

>>> comparisons, on targets where that sets a cc. The lack of this

>>> transformation could be considered a blocker for the other one, to avoid

>>> regressing on bar.

>> Right.  In fact, I think Jakub's test ought to be added to this work as

>> part of its basic testing.

> Um, how to write the above-test case for bar() in DejaGNU format ?

> Is it possible to test for number of insns or to use nm to test for

> size of a function with dejagnu directive ?

> Or should I scan for "cmpq" since the pattern transforms a right shift

> and cmp against 0 to cmp between operands ?

I've used a variety of approaches.  The difficulty in the x86 world is
that there's enough tuning variants that change the resulting code which
can make scanning problematical.

So one approach is to look at the total object size if that's a reliable
indicator of the issue at hand.

As an example see gcc.target/m68k/pr52076-?.c.  I'm sure there's others
if you were to search for "object-size" in the testsuite.

You could try to count the insns or search for specific key insns that
indicate we generated desirable or undesirable code.

But what might ultimately be best would be to scan the rtl at expansion
time.  That catches things at the earliest point we can observe them.

Another alternative would be to include some dumping code to indicate
when we transform the canonical gimple form into the form we want for
the x86 backend at rtl expansion.  You could then scan the debugging
dumps for those annotations.

jeff

Patch

diff --git a/gcc/match.pd b/gcc/match.pd
index 43ab226a705..883ad5ba53c 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1287,6 +1287,18 @@  DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
        && (VECTOR_TYPE_P (type) || !VECTOR_TYPE_P (TREE_TYPE (@0))))
    (ocmp @0 @1))))
 
+/* Transform
+   (x >> cst) != 0 -> x >= (1 << cst)
+   (x >> cst) == 0 -> x < (1 << cst)
+   if x, cst are unsigned.  */
+(for cmp (eq ne)
+     ocmp (lt ge)
+ (simplify
+  (cmp (rshift @0 INTEGER_CST@1) integer_zerop)
+  (if (TYPE_UNSIGNED (TREE_TYPE (@0))
+      && (VECTOR_TYPE_P (type) || !VECTOR_TYPE_P (TREE_TYPE (@0))))
+   (ocmp @0 (lshift { build_int_cstu (TREE_TYPE (@0), 1); } @1)))))
+
 /* X == C - X can never be true if C is odd.  */
 (for cmp (eq ne)
  (simplify
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/cmpdiv.c b/gcc/testsuite/gcc.dg/tree-ssa/cmpdiv.c
index 14161f5ea6f..fc5bc8c3674 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/cmpdiv.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/cmpdiv.c
@@ -15,4 +15,19 @@  _Bool f2(unsigned x, unsigned y)
   return t2;
 }
 
+_Bool f3(unsigned x)
+{
+  unsigned t1 = x >> 4;
+  _Bool t2 = (t1 != 0);
+  return t2;
+}
+
+_Bool f4(unsigned x)
+{
+  unsigned t1 = x >> 4;
+  _Bool t2 = (t1 == 0);
+  return t2;
+}
+
 /* { dg-final { scan-tree-dump-not "trunc_div_expr" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "rshift_expr" "optimized" } } */