[RFC,PR82479] missing popcount builtin detection

Message ID CAELXzTNXvW1jo4hbn3LHgo5ne=u-0tFJB5Ri3-P=1z-_YHNnFA@mail.gmail.com
State New
Headers show
Series
  • [RFC,PR82479] missing popcount builtin detection
Related show

Commit Message

Kugan Vivekanandarajah Jan. 24, 2018, 9:56 p.m.
Hi All,

Here is a patch for popcount builtin detection similar to LLVM. I
would like to queue this for review for next stage 1.

1. This is done part of loop-distribution and effective for -O3 and above.
2. This does not distribute loop to detect popcount (like
memcpy/memmove). I dont think that happens in practice. Please correct
me if I am wrong.

Bootstrapped and regression tested on aarch64-linux-gnu with no new regressions.

Thanks,
Kugan

gcc/ChangeLog:

2018-01-25  Kugan Vivekanandarajah  <kuganv@linaro.org>

    PR middle-end/82479
    * tree-loop-distribution.c (handle_popcount): New.
    (pass_loop_distribution::execute): Use handle_popcount.

gcc/testsuite/ChangeLog:

2018-01-25  Kugan Vivekanandarajah  <kuganv@linaro.org>

    PR middle-end/82479
    * gcc.dg/tree-ssa/popcount.c: New test.

Comments

Richard Biener Jan. 25, 2018, 9:04 a.m. | #1
On Wed, Jan 24, 2018 at 10:56 PM, Kugan Vivekanandarajah
<kugan.vivekanandarajah@linaro.org> wrote:
> Hi All,

>

> Here is a patch for popcount builtin detection similar to LLVM. I

> would like to queue this for review for next stage 1.

>

> 1. This is done part of loop-distribution and effective for -O3 and above.

> 2. This does not distribute loop to detect popcount (like

> memcpy/memmove). I dont think that happens in practice. Please correct

> me if I am wrong.


But then it has no business inside loop distribution but instead is
doing final value
replacement, right?  You are pattern-matching the whole loop after all.  I think
final value replacement would already do the correct thing if you
teached number of
iteration analysis that niter for

  <bb 3> [local count: 955630224]:
  # b_11 = PHI <b_5(5), b_8(6)>
  _1 = b_11 + -1;
  b_8 = _1 & b_11;
  if (b_8 != 0)
    goto <bb 6>; [89.00%]
  else
    goto <bb 8>; [11.00%]

  <bb 6> [local count: 850510900]:
  goto <bb 3>; [100.00%]

is related to popcount (b_5).

Richard.

> Bootstrapped and regression tested on aarch64-linux-gnu with no new regressions.

>

> Thanks,

> Kugan

>

> gcc/ChangeLog:

>

> 2018-01-25  Kugan Vivekanandarajah  <kuganv@linaro.org>

>

>     PR middle-end/82479

>     * tree-loop-distribution.c (handle_popcount): New.

>     (pass_loop_distribution::execute): Use handle_popcount.

>

> gcc/testsuite/ChangeLog:

>

> 2018-01-25  Kugan Vivekanandarajah  <kuganv@linaro.org>

>

>     PR middle-end/82479

>     * gcc.dg/tree-ssa/popcount.c: New test.
Kugan Vivekanandarajah Jan. 31, 2018, 10:28 a.m. | #2
Hi Richard,

Thanks for the review.
On 25 January 2018 at 20:04, Richard Biener <richard.guenther@gmail.com> wrote:
> On Wed, Jan 24, 2018 at 10:56 PM, Kugan Vivekanandarajah

> <kugan.vivekanandarajah@linaro.org> wrote:

>> Hi All,

>>

>> Here is a patch for popcount builtin detection similar to LLVM. I

>> would like to queue this for review for next stage 1.

>>

>> 1. This is done part of loop-distribution and effective for -O3 and above.

>> 2. This does not distribute loop to detect popcount (like

>> memcpy/memmove). I dont think that happens in practice. Please correct

>> me if I am wrong.

>

> But then it has no business inside loop distribution but instead is

> doing final value

> replacement, right?  You are pattern-matching the whole loop after all.  I think

> final value replacement would already do the correct thing if you

> teached number of

> iteration analysis that niter for

>

>   <bb 3> [local count: 955630224]:

>   # b_11 = PHI <b_5(5), b_8(6)>

>   _1 = b_11 + -1;

>   b_8 = _1 & b_11;

>   if (b_8 != 0)

>     goto <bb 6>; [89.00%]

>   else

>     goto <bb 8>; [11.00%]

>

>   <bb 6> [local count: 850510900]:

>   goto <bb 3>; [100.00%]


I am looking into this approach. What should be the scalar evolution
for b_8 (i.e. b & (b -1) in a loop) should be? This is not clear to me
and can this be represented with the scev?

Thanks,
Kugan
>

> is related to popcount (b_5).

>

> Richard.

>

>> Bootstrapped and regression tested on aarch64-linux-gnu with no new regressions.

>>

>> Thanks,

>> Kugan

>>

>> gcc/ChangeLog:

>>

>> 2018-01-25  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>

>>     PR middle-end/82479

>>     * tree-loop-distribution.c (handle_popcount): New.

>>     (pass_loop_distribution::execute): Use handle_popcount.

>>

>> gcc/testsuite/ChangeLog:

>>

>> 2018-01-25  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>

>>     PR middle-end/82479

>>     * gcc.dg/tree-ssa/popcount.c: New test.
Richard Biener Jan. 31, 2018, 10:39 a.m. | #3
On Wed, Jan 31, 2018 at 11:28 AM, Kugan Vivekanandarajah
<kugan.vivekanandarajah@linaro.org> wrote:
> Hi Richard,

>

> Thanks for the review.

> On 25 January 2018 at 20:04, Richard Biener <richard.guenther@gmail.com> wrote:

>> On Wed, Jan 24, 2018 at 10:56 PM, Kugan Vivekanandarajah

>> <kugan.vivekanandarajah@linaro.org> wrote:

>>> Hi All,

>>>

>>> Here is a patch for popcount builtin detection similar to LLVM. I

>>> would like to queue this for review for next stage 1.

>>>

>>> 1. This is done part of loop-distribution and effective for -O3 and above.

>>> 2. This does not distribute loop to detect popcount (like

>>> memcpy/memmove). I dont think that happens in practice. Please correct

>>> me if I am wrong.

>>

>> But then it has no business inside loop distribution but instead is

>> doing final value

>> replacement, right?  You are pattern-matching the whole loop after all.  I think

>> final value replacement would already do the correct thing if you

>> teached number of

>> iteration analysis that niter for

>>

>>   <bb 3> [local count: 955630224]:

>>   # b_11 = PHI <b_5(5), b_8(6)>

>>   _1 = b_11 + -1;

>>   b_8 = _1 & b_11;

>>   if (b_8 != 0)

>>     goto <bb 6>; [89.00%]

>>   else

>>     goto <bb 8>; [11.00%]

>>

>>   <bb 6> [local count: 850510900]:

>>   goto <bb 3>; [100.00%]

>

> I am looking into this approach. What should be the scalar evolution

> for b_8 (i.e. b & (b -1) in a loop) should be? This is not clear to me

> and can this be represented with the scev?


No, it's not affine and thus cannot be represented.  You only need the
scalar evolution of the counting IV which is already handled and
the number of iteration analysis needs to handle the above IV - this
is the missing part.

Richard.

> Thanks,

> Kugan

>>

>> is related to popcount (b_5).

>>

>> Richard.

>>

>>> Bootstrapped and regression tested on aarch64-linux-gnu with no new regressions.

>>>

>>> Thanks,

>>> Kugan

>>>

>>> gcc/ChangeLog:

>>>

>>> 2018-01-25  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>>

>>>     PR middle-end/82479

>>>     * tree-loop-distribution.c (handle_popcount): New.

>>>     (pass_loop_distribution::execute): Use handle_popcount.

>>>

>>> gcc/testsuite/ChangeLog:

>>>

>>> 2018-01-25  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>>

>>>     PR middle-end/82479

>>>     * gcc.dg/tree-ssa/popcount.c: New test.
Kugan Vivekanandarajah Feb. 1, 2018, 4:07 a.m. | #4
Hi Richard,

On 31 January 2018 at 21:39, Richard Biener <richard.guenther@gmail.com> wrote:
> On Wed, Jan 31, 2018 at 11:28 AM, Kugan Vivekanandarajah

> <kugan.vivekanandarajah@linaro.org> wrote:

>> Hi Richard,

>>

>> Thanks for the review.

>> On 25 January 2018 at 20:04, Richard Biener <richard.guenther@gmail.com> wrote:

>>> On Wed, Jan 24, 2018 at 10:56 PM, Kugan Vivekanandarajah

>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>> Hi All,

>>>>

>>>> Here is a patch for popcount builtin detection similar to LLVM. I

>>>> would like to queue this for review for next stage 1.

>>>>

>>>> 1. This is done part of loop-distribution and effective for -O3 and above.

>>>> 2. This does not distribute loop to detect popcount (like

>>>> memcpy/memmove). I dont think that happens in practice. Please correct

>>>> me if I am wrong.

>>>

>>> But then it has no business inside loop distribution but instead is

>>> doing final value

>>> replacement, right?  You are pattern-matching the whole loop after all.  I think

>>> final value replacement would already do the correct thing if you

>>> teached number of

>>> iteration analysis that niter for

>>>

>>>   <bb 3> [local count: 955630224]:

>>>   # b_11 = PHI <b_5(5), b_8(6)>

>>>   _1 = b_11 + -1;

>>>   b_8 = _1 & b_11;

>>>   if (b_8 != 0)

>>>     goto <bb 6>; [89.00%]

>>>   else

>>>     goto <bb 8>; [11.00%]

>>>

>>>   <bb 6> [local count: 850510900]:

>>>   goto <bb 3>; [100.00%]

>>

>> I am looking into this approach. What should be the scalar evolution

>> for b_8 (i.e. b & (b -1) in a loop) should be? This is not clear to me

>> and can this be represented with the scev?

>

> No, it's not affine and thus cannot be represented.  You only need the

> scalar evolution of the counting IV which is already handled and

> the number of iteration analysis needs to handle the above IV - this

> is the missing part.

Thanks for the clarification. I am now matching this loop pattern in
number_of_iterations_exit when number_of_iterations_exit_assumptions
fails. If the pattern matches, I am inserting the _builtin_popcount in
the loop preheater and setting the loop niter with this. This will be
used by the final value replacement. Is this what you wanted?

In general, there is also a condition gating the loop like

if (b_4 != 0)
  goto loop;
else
  end:

This of course will not be removed by final value replacement. Since
popcount (0) is defined, this is redundant and could be removed but
not removed.

Thanks,
Kugan

>

> Richard.

>

>> Thanks,

>> Kugan

>>>

>>> is related to popcount (b_5).

>>>

>>> Richard.

>>>

>>>> Bootstrapped and regression tested on aarch64-linux-gnu with no new regressions.

>>>>

>>>> Thanks,

>>>> Kugan

>>>>

>>>> gcc/ChangeLog:

>>>>

>>>> 2018-01-25  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>>>

>>>>     PR middle-end/82479

>>>>     * tree-loop-distribution.c (handle_popcount): New.

>>>>     (pass_loop_distribution::execute): Use handle_popcount.

>>>>

>>>> gcc/testsuite/ChangeLog:

>>>>

>>>> 2018-01-25  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>>>

>>>>     PR middle-end/82479

>>>>     * gcc.dg/tree-ssa/popcount.c: New test.
Richard Biener Feb. 1, 2018, 12:21 p.m. | #5
On Thu, Feb 1, 2018 at 5:07 AM, Kugan Vivekanandarajah
<kugan.vivekanandarajah@linaro.org> wrote:
> Hi Richard,

>

> On 31 January 2018 at 21:39, Richard Biener <richard.guenther@gmail.com> wrote:

>> On Wed, Jan 31, 2018 at 11:28 AM, Kugan Vivekanandarajah

>> <kugan.vivekanandarajah@linaro.org> wrote:

>>> Hi Richard,

>>>

>>> Thanks for the review.

>>> On 25 January 2018 at 20:04, Richard Biener <richard.guenther@gmail.com> wrote:

>>>> On Wed, Jan 24, 2018 at 10:56 PM, Kugan Vivekanandarajah

>>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>>> Hi All,

>>>>>

>>>>> Here is a patch for popcount builtin detection similar to LLVM. I

>>>>> would like to queue this for review for next stage 1.

>>>>>

>>>>> 1. This is done part of loop-distribution and effective for -O3 and above.

>>>>> 2. This does not distribute loop to detect popcount (like

>>>>> memcpy/memmove). I dont think that happens in practice. Please correct

>>>>> me if I am wrong.

>>>>

>>>> But then it has no business inside loop distribution but instead is

>>>> doing final value

>>>> replacement, right?  You are pattern-matching the whole loop after all.  I think

>>>> final value replacement would already do the correct thing if you

>>>> teached number of

>>>> iteration analysis that niter for

>>>>

>>>>   <bb 3> [local count: 955630224]:

>>>>   # b_11 = PHI <b_5(5), b_8(6)>

>>>>   _1 = b_11 + -1;

>>>>   b_8 = _1 & b_11;

>>>>   if (b_8 != 0)

>>>>     goto <bb 6>; [89.00%]

>>>>   else

>>>>     goto <bb 8>; [11.00%]

>>>>

>>>>   <bb 6> [local count: 850510900]:

>>>>   goto <bb 3>; [100.00%]

>>>

>>> I am looking into this approach. What should be the scalar evolution

>>> for b_8 (i.e. b & (b -1) in a loop) should be? This is not clear to me

>>> and can this be represented with the scev?

>>

>> No, it's not affine and thus cannot be represented.  You only need the

>> scalar evolution of the counting IV which is already handled and

>> the number of iteration analysis needs to handle the above IV - this

>> is the missing part.

> Thanks for the clarification. I am now matching this loop pattern in

> number_of_iterations_exit when number_of_iterations_exit_assumptions

> fails. If the pattern matches, I am inserting the _builtin_popcount in

> the loop preheater and setting the loop niter with this. This will be

> used by the final value replacement. Is this what you wanted?


No, you shouldn't insert a popcount stmt but instead the niter
GENERIC tree should be a CALL_EXPR to popcount with the
appropriate argument.

> In general, there is also a condition gating the loop like

>

> if (b_4 != 0)

>   goto loop;

> else

>   end:

>

> This of course will not be removed by final value replacement. Since

> popcount (0) is defined, this is redundant and could be removed but

> not removed.


Yeah, that's probably sth for another pass though.  I suppose the
end: case just uses zero in which case you'll have a PHI and you
can optimize

  if (b != 0)
    return popcount (b);
  return 0;

in phiopt.

Richard.

> Thanks,

> Kugan

>

>>

>> Richard.

>>

>>> Thanks,

>>> Kugan

>>>>

>>>> is related to popcount (b_5).

>>>>

>>>> Richard.

>>>>

>>>>> Bootstrapped and regression tested on aarch64-linux-gnu with no new regressions.

>>>>>

>>>>> Thanks,

>>>>> Kugan

>>>>>

>>>>> gcc/ChangeLog:

>>>>>

>>>>> 2018-01-25  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>>>>

>>>>>     PR middle-end/82479

>>>>>     * tree-loop-distribution.c (handle_popcount): New.

>>>>>     (pass_loop_distribution::execute): Use handle_popcount.

>>>>>

>>>>> gcc/testsuite/ChangeLog:

>>>>>

>>>>> 2018-01-25  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>>>>

>>>>>     PR middle-end/82479

>>>>>     * gcc.dg/tree-ssa/popcount.c: New test.
Kugan Vivekanandarajah Feb. 8, 2018, 12:41 a.m. | #6
Hi Richard,

On 1 February 2018 at 23:21, Richard Biener <richard.guenther@gmail.com> wrote:
> On Thu, Feb 1, 2018 at 5:07 AM, Kugan Vivekanandarajah

> <kugan.vivekanandarajah@linaro.org> wrote:

>> Hi Richard,

>>

>> On 31 January 2018 at 21:39, Richard Biener <richard.guenther@gmail.com> wrote:

>>> On Wed, Jan 31, 2018 at 11:28 AM, Kugan Vivekanandarajah

>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>> Hi Richard,

>>>>

>>>> Thanks for the review.

>>>> On 25 January 2018 at 20:04, Richard Biener <richard.guenther@gmail.com> wrote:

>>>>> On Wed, Jan 24, 2018 at 10:56 PM, Kugan Vivekanandarajah

>>>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>>>> Hi All,

>>>>>>

>>>>>> Here is a patch for popcount builtin detection similar to LLVM. I

>>>>>> would like to queue this for review for next stage 1.

>>>>>>

>>>>>> 1. This is done part of loop-distribution and effective for -O3 and above.

>>>>>> 2. This does not distribute loop to detect popcount (like

>>>>>> memcpy/memmove). I dont think that happens in practice. Please correct

>>>>>> me if I am wrong.

>>>>>

>>>>> But then it has no business inside loop distribution but instead is

>>>>> doing final value

>>>>> replacement, right?  You are pattern-matching the whole loop after all.  I think

>>>>> final value replacement would already do the correct thing if you

>>>>> teached number of

>>>>> iteration analysis that niter for

>>>>>

>>>>>   <bb 3> [local count: 955630224]:

>>>>>   # b_11 = PHI <b_5(5), b_8(6)>

>>>>>   _1 = b_11 + -1;

>>>>>   b_8 = _1 & b_11;

>>>>>   if (b_8 != 0)

>>>>>     goto <bb 6>; [89.00%]

>>>>>   else

>>>>>     goto <bb 8>; [11.00%]

>>>>>

>>>>>   <bb 6> [local count: 850510900]:

>>>>>   goto <bb 3>; [100.00%]

>>>>

>>>> I am looking into this approach. What should be the scalar evolution

>>>> for b_8 (i.e. b & (b -1) in a loop) should be? This is not clear to me

>>>> and can this be represented with the scev?

>>>

>>> No, it's not affine and thus cannot be represented.  You only need the

>>> scalar evolution of the counting IV which is already handled and

>>> the number of iteration analysis needs to handle the above IV - this

>>> is the missing part.

>> Thanks for the clarification. I am now matching this loop pattern in

>> number_of_iterations_exit when number_of_iterations_exit_assumptions

>> fails. If the pattern matches, I am inserting the _builtin_popcount in

>> the loop preheater and setting the loop niter with this. This will be

>> used by the final value replacement. Is this what you wanted?

>

> No, you shouldn't insert a popcount stmt but instead the niter

> GENERIC tree should be a CALL_EXPR to popcount with the

> appropriate argument.


Thats what I tried earlier but ran into some ICEs. I wasn't sure if
niter in tree_niter_desc can take such.

Attached patch now does this. Also had to add support for CALL_EXPR in
few places to handle niter with CALL_EXPR. Does this look OK?

Thanks,
Kugan


gcc/ChangeLog:

2018-02-08  Kugan Vivekanandarajah  <kuganv@linaro.org>

    * gimple-expr.c (extract_ops_from_tree): Handle CALL_EXPR.
    * tree-chrec.c (chrec_convert_1): Likewise.
    * tree-scalar-evolution.c (expression_expensive_p): Likewise.
    * tree-ssa-loop-ivopts.c (contains_abnormal_ssa_name_p): Likewise.
    * tree-ssa-loop-niter.c (check_popcount_pattern): New.
    (number_of_iterations_exit): Record niter for popcount patern.
    * tree-ssa.c (verify_ssa): Check stmt to be non NULL.

gcc/testsuite/ChangeLog:

2018-02-08  Kugan Vivekanandarajah  <kuganv@linaro.org>

    * gcc.dg/tree-ssa/popcount.c: New test.


>

>> In general, there is also a condition gating the loop like

>>

>> if (b_4 != 0)

>>   goto loop;

>> else

>>   end:

>>

>> This of course will not be removed by final value replacement. Since

>> popcount (0) is defined, this is redundant and could be removed but

>> not removed.

>

> Yeah, that's probably sth for another pass though.  I suppose the

> end: case just uses zero in which case you'll have a PHI and you

> can optimize

>

>   if (b != 0)

>     return popcount (b);

>   return 0;

>

> in phiopt.

>

> Richard.

>

>> Thanks,

>> Kugan

>>

>>>

>>> Richard.

>>>

>>>> Thanks,

>>>> Kugan

>>>>>

>>>>> is related to popcount (b_5).

>>>>>

>>>>> Richard.

>>>>>

>>>>>> Bootstrapped and regression tested on aarch64-linux-gnu with no new regressions.

>>>>>>

>>>>>> Thanks,

>>>>>> Kugan

>>>>>>

>>>>>> gcc/ChangeLog:

>>>>>>

>>>>>> 2018-01-25  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>>>>>

>>>>>>     PR middle-end/82479

>>>>>>     * tree-loop-distribution.c (handle_popcount): New.

>>>>>>     (pass_loop_distribution::execute): Use handle_popcount.

>>>>>>

>>>>>> gcc/testsuite/ChangeLog:

>>>>>>

>>>>>> 2018-01-25  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>>>>>

>>>>>>     PR middle-end/82479

>>>>>>     * gcc.dg/tree-ssa/popcount.c: New test.
From 296efa2c09cdd797e3295f0c29a5a943dc87e9f2 Mon Sep 17 00:00:00 2001
From: Kugan Vivekanandarajah <kugan.vivekanandarajah@linaro.org>
Date: Thu, 8 Feb 2018 09:28:57 +1100
Subject: [PATCH] popcount v2

---
 gcc/gimple-expr.c                        |   3 +-
 gcc/testsuite/gcc.dg/tree-ssa/popcount.c |  41 ++++++++++
 gcc/tree-chrec.c                         |   2 +
 gcc/tree-scalar-evolution.c              |   2 +
 gcc/tree-ssa-loop-ivopts.c               |  10 +++
 gcc/tree-ssa-loop-niter.c                | 124 ++++++++++++++++++++++++++++++-
 gcc/tree-ssa.c                           |   2 +-
 7 files changed, 181 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/popcount.c

diff --git a/gcc/gimple-expr.c b/gcc/gimple-expr.c
index 56caaca..3fc04ff 100644
--- a/gcc/gimple-expr.c
+++ b/gcc/gimple-expr.c
@@ -548,7 +548,8 @@ extract_ops_from_tree (tree expr, enum tree_code *subcode_p, tree *op1_p,
       *op2_p = NULL_TREE;
       *op3_p = NULL_TREE;
     }
-  else if (grhs_class == GIMPLE_SINGLE_RHS)
+  else if (grhs_class == GIMPLE_SINGLE_RHS
+	   || TREE_CODE (expr) == CALL_EXPR)
     {
       *op1_p = expr;
       *op2_p = NULL_TREE;
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/popcount.c b/gcc/testsuite/gcc.dg/tree-ssa/popcount.c
new file mode 100644
index 0000000..86a66cb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/popcount.c
@@ -0,0 +1,41 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+extern int foo (int);
+
+int PopCount (long b) {
+    int c = 0;
+    b++;
+
+    while (b) {
+	b &= b - 1;
+	c++;
+    }
+    return c;
+}
+int PopCount2 (long b) {
+    int c = 0;
+
+    while (b) {
+	b &= b - 1;
+	c++;
+    }
+    foo (c);
+    return foo (c);
+}
+
+void PopCount3 (long b1) {
+
+    for (long i = 0; i < b1; ++i)
+      {
+	long b = i;
+	int c = 0;
+	while (b) {
+	    b &= b - 1;
+	    c++;
+	}
+	foo (c);
+      }
+}
+
+/* { dg-final { scan-tree-dump-times "__builtin_popcount" 3 "optimized" } } */
diff --git a/gcc/tree-chrec.c b/gcc/tree-chrec.c
index 924df04..098f46f 100644
--- a/gcc/tree-chrec.c
+++ b/gcc/tree-chrec.c
@@ -1382,6 +1382,8 @@ keep_cast:
 						  CHREC_RIGHT (chrec)));
       res = chrec_convert_1 (type, res, at_stmt, use_overflow_semantics, from);
     }
+  else if (TREE_CODE (chrec) == CALL_EXPR)
+    res = fold_build1 (NOP_EXPR, TREE_TYPE (TREE_TYPE (chrec)), chrec);
   else
     res = fold_convert (type, chrec);
 
diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
index 0ba1aa8..ff313c9 100644
--- a/gcc/tree-scalar-evolution.c
+++ b/gcc/tree-scalar-evolution.c
@@ -3488,6 +3488,8 @@ expression_expensive_p (tree expr)
       if (!integer_pow2p (TREE_OPERAND (expr, 1)))
 	return true;
     }
+  if (code == CALL_EXPR)
+    return false;
 
   switch (TREE_CODE_CLASS (code))
     {
diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index b313571..519649a 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -985,6 +985,16 @@ contains_abnormal_ssa_name_p (tree expr)
   code = TREE_CODE (expr);
   codeclass = TREE_CODE_CLASS (code);
 
+  if (code == CALL_EXPR)
+    {
+      tree arg;
+      call_expr_arg_iterator iter;
+      FOR_EACH_CALL_EXPR_ARG (arg, iter, expr)
+	if (contains_abnormal_ssa_name_p (arg))
+	  return true;
+      return false;
+    }
+
   if (code == SSA_NAME)
     return SSA_NAME_OCCURS_IN_ABNORMAL_PHI (expr) != 0;
 
diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
index fa49abf..470ba2f 100644
--- a/gcc/tree-ssa-loop-niter.c
+++ b/gcc/tree-ssa-loop-niter.c
@@ -2430,6 +2430,111 @@ number_of_iterations_exit_assumptions (struct loop *loop, edge exit,
   return (!integer_zerop (niter->assumptions));
 }
 
+/* See if LOOP is a popcout implementation of the form
+
+    int c = 0;
+    while (b) {
+	b = b & (b - 1);
+	c++;
+    }
+
+    If so, Set NITER to  __builtin_popcount (b) - 1
+    return true if we did, false otherwise.  */
+
+static bool
+check_popcount_pattern (loop_p loop, tree *niter)
+{
+  tree lhs, rhs;
+  tree dest, src;
+  gimple *and_minus_one;
+  int count = 0;
+  gphi *count_phi = NULL;
+
+  if (!single_exit (loop))
+    return false;
+
+  /* Check loop terminating branch is like
+     if (b != 0).  */
+  gimple *stmt = last_stmt (loop->header);
+  if (!stmt
+      || gimple_code (stmt) != GIMPLE_COND
+      || !zerop (gimple_cond_rhs (stmt)))
+    return false;
+
+  /* Cheeck "b = b & (b - 1)" is calculated.  */
+  lhs = gimple_cond_lhs (stmt);
+  if (TREE_CODE (lhs) != SSA_NAME)
+    return false;
+  gimple *and_stmt = SSA_NAME_DEF_STMT (lhs);
+  if (!is_gimple_assign (and_stmt)
+      || gimple_assign_rhs_code (and_stmt) != BIT_AND_EXPR)
+    return false;
+  lhs = gimple_assign_rhs1 (and_stmt);
+  rhs = gimple_assign_rhs2 (and_stmt);
+  if (TREE_CODE (lhs) == SSA_NAME
+      && (and_minus_one = SSA_NAME_DEF_STMT (lhs))
+      && is_gimple_assign (and_minus_one)
+      && (gimple_assign_rhs_code (and_minus_one) == PLUS_EXPR)
+      && integer_minus_onep (gimple_assign_rhs2 (and_minus_one)))
+      lhs = rhs;
+  else if (TREE_CODE (rhs) == SSA_NAME
+      && (and_minus_one = SSA_NAME_DEF_STMT (rhs))
+      && is_gimple_assign (and_minus_one)
+      && (gimple_assign_rhs_code (and_minus_one) == PLUS_EXPR)
+      && integer_minus_onep (gimple_assign_rhs2 (and_minus_one)))
+      ;
+  else
+    return false;
+  if ((gimple_assign_rhs1 (and_stmt) != gimple_assign_rhs1 (and_minus_one))
+      && (gimple_assign_rhs2 (and_stmt) != gimple_assign_rhs1 (and_minus_one)))
+    return false;
+
+  /* Check the recurrence.  */
+  gimple *phi = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (and_minus_one));
+  gimple *src_phi = SSA_NAME_DEF_STMT (lhs);
+  if (gimple_code (phi) != GIMPLE_PHI
+      || gimple_code (src_phi) != GIMPLE_PHI)
+    return false;
+
+  /* Check the loop closed SSA definition for just the variable c defined in
+     loop.  */
+  src = gimple_phi_arg_def (src_phi, loop_preheader_edge (loop)->dest_idx);
+  basic_block bb = single_exit (loop)->dest;
+  for (gphi_iterator gpi = gsi_start_phis (bb);
+       !gsi_end_p (gpi); gsi_next (&gpi))
+    {
+      count_phi = gpi.phi ();
+      count++;
+    }
+
+  if (count != 1)
+    return false;
+
+  /* Make sure we have a count by one and it starts from zero.  */
+  rhs = gimple_phi_arg_def (count_phi, 0);
+  stmt = SSA_NAME_DEF_STMT (rhs);
+  if (!is_gimple_assign (stmt)
+      || (gimple_assign_rhs_code (stmt) != PLUS_EXPR)
+      || !integer_onep (gimple_assign_rhs2 (stmt)))
+    return false;
+  rhs = gimple_assign_rhs1 (stmt);
+  stmt = SSA_NAME_DEF_STMT (rhs);
+  if (gimple_code (stmt) != GIMPLE_PHI
+      || !integer_zerop (gimple_phi_arg_def (stmt,
+					     loop_preheader_edge (loop)->dest_idx)))
+    return false;
+
+  dest = gimple_phi_result (count_phi);
+  tree var = make_ssa_name (TREE_TYPE (dest), NULL);
+  tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT);
+
+  var = build_call_expr (fn, 1, src);
+  *niter = fold_build2 (MINUS_EXPR, TREE_TYPE (dest), var,
+			build_int_cst (TREE_TYPE (dest), 1));
+  return true;
+}
+
+
 /* Like number_of_iterations_exit_assumptions, but return TRUE only if
    the niter information holds unconditionally.  */
 
@@ -2441,7 +2546,24 @@ number_of_iterations_exit (struct loop *loop, edge exit,
   gcond *stmt;
   if (!number_of_iterations_exit_assumptions (loop, exit, niter,
 					      &stmt, every_iteration))
-    return false;
+    {
+      tree count;
+      if (check_popcount_pattern (loop, &count))
+	{
+	  niter->assumptions = boolean_false_node;
+	  niter->control.base = NULL_TREE;
+	  niter->control.step = NULL_TREE;
+	  niter->control.no_overflow = false;
+	  niter->niter = count;
+	  niter->assumptions = boolean_true_node;
+	  niter->may_be_zero = boolean_false_node;
+	  niter->max = -1;
+	  niter->bound = NULL_TREE;
+	  niter->cmp = ERROR_MARK;
+	  return true;
+	}
+      return false;
+    }
 
   if (integer_nonzerop (niter->assumptions))
     return true;
diff --git a/gcc/tree-ssa.c b/gcc/tree-ssa.c
index ee311ce..572419c 100644
--- a/gcc/tree-ssa.c
+++ b/gcc/tree-ssa.c
@@ -1047,7 +1047,7 @@ verify_ssa (bool check_modified_stmt, bool check_ssa_operands)
 	  verify_ssa_name (name, virtual_operand_p (name));
 
 	  stmt = SSA_NAME_DEF_STMT (name);
-	  if (!gimple_nop_p (stmt))
+	  if (stmt && !gimple_nop_p (stmt))
 	    {
 	      basic_block bb = gimple_bb (stmt);
 	      if (verify_def (bb, definition_block,
Richard Biener March 5, 2018, 3:24 p.m. | #7
On Thu, Feb 8, 2018 at 1:41 AM, Kugan Vivekanandarajah
<kugan.vivekanandarajah@linaro.org> wrote:
> Hi Richard,

>

> On 1 February 2018 at 23:21, Richard Biener <richard.guenther@gmail.com> wrote:

>> On Thu, Feb 1, 2018 at 5:07 AM, Kugan Vivekanandarajah

>> <kugan.vivekanandarajah@linaro.org> wrote:

>>> Hi Richard,

>>>

>>> On 31 January 2018 at 21:39, Richard Biener <richard.guenther@gmail.com> wrote:

>>>> On Wed, Jan 31, 2018 at 11:28 AM, Kugan Vivekanandarajah

>>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>>> Hi Richard,

>>>>>

>>>>> Thanks for the review.

>>>>> On 25 January 2018 at 20:04, Richard Biener <richard.guenther@gmail.com> wrote:

>>>>>> On Wed, Jan 24, 2018 at 10:56 PM, Kugan Vivekanandarajah

>>>>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>>>>> Hi All,

>>>>>>>

>>>>>>> Here is a patch for popcount builtin detection similar to LLVM. I

>>>>>>> would like to queue this for review for next stage 1.

>>>>>>>

>>>>>>> 1. This is done part of loop-distribution and effective for -O3 and above.

>>>>>>> 2. This does not distribute loop to detect popcount (like

>>>>>>> memcpy/memmove). I dont think that happens in practice. Please correct

>>>>>>> me if I am wrong.

>>>>>>

>>>>>> But then it has no business inside loop distribution but instead is

>>>>>> doing final value

>>>>>> replacement, right?  You are pattern-matching the whole loop after all.  I think

>>>>>> final value replacement would already do the correct thing if you

>>>>>> teached number of

>>>>>> iteration analysis that niter for

>>>>>>

>>>>>>   <bb 3> [local count: 955630224]:

>>>>>>   # b_11 = PHI <b_5(5), b_8(6)>

>>>>>>   _1 = b_11 + -1;

>>>>>>   b_8 = _1 & b_11;

>>>>>>   if (b_8 != 0)

>>>>>>     goto <bb 6>; [89.00%]

>>>>>>   else

>>>>>>     goto <bb 8>; [11.00%]

>>>>>>

>>>>>>   <bb 6> [local count: 850510900]:

>>>>>>   goto <bb 3>; [100.00%]

>>>>>

>>>>> I am looking into this approach. What should be the scalar evolution

>>>>> for b_8 (i.e. b & (b -1) in a loop) should be? This is not clear to me

>>>>> and can this be represented with the scev?

>>>>

>>>> No, it's not affine and thus cannot be represented.  You only need the

>>>> scalar evolution of the counting IV which is already handled and

>>>> the number of iteration analysis needs to handle the above IV - this

>>>> is the missing part.

>>> Thanks for the clarification. I am now matching this loop pattern in

>>> number_of_iterations_exit when number_of_iterations_exit_assumptions

>>> fails. If the pattern matches, I am inserting the _builtin_popcount in

>>> the loop preheater and setting the loop niter with this. This will be

>>> used by the final value replacement. Is this what you wanted?

>>

>> No, you shouldn't insert a popcount stmt but instead the niter

>> GENERIC tree should be a CALL_EXPR to popcount with the

>> appropriate argument.

>

> Thats what I tried earlier but ran into some ICEs. I wasn't sure if

> niter in tree_niter_desc can take such.

>

> Attached patch now does this. Also had to add support for CALL_EXPR in

> few places to handle niter with CALL_EXPR. Does this look OK?


Overall this looks ok - the patch includes changes in places that I don't think
need changes such as chrec_convert_1 or extract_ops_from_tree.
The expression_expensive_p change should be more specific than making
all calls inexpensive as well.

The verify_ssa change looks bogus, you do

+  dest = gimple_phi_result (count_phi);
+  tree var = make_ssa_name (TREE_TYPE (dest), NULL);
+  tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT);
+
+  var = build_call_expr (fn, 1, src);
+  *niter = fold_build2 (MINUS_EXPR, TREE_TYPE (dest), var,
+                       build_int_cst (TREE_TYPE (dest), 1));

why do you allocate a new SSA name here?  It seems unused
as you overwrive 'var' with the CALL_EXPR immediately.

I didn't review the pattern matching thoroughly nor the exact place you
call it.  But

+      if (check_popcount_pattern (loop, &count))
+       {
+         niter->assumptions = boolean_false_node;
+         niter->control.base = NULL_TREE;
+         niter->control.step = NULL_TREE;
+         niter->control.no_overflow = false;
+         niter->niter = count;
+         niter->assumptions = boolean_true_node;
+         niter->may_be_zero = boolean_false_node;
+         niter->max = -1;
+         niter->bound = NULL_TREE;
+         niter->cmp = ERROR_MARK;
+         return true;
+       }

simply setting may_be_zero to false looks fishy.  Try
with -fno-tree-loop-ch.  Also max should not be negative,
it should be the number of bits in the IV type?

A related testcase could be that we can completely peel
a loop like the following which iterates at most 8 times:

int a[8];
void foo (unsigned char ctrl)
{
  int c = 0;
  while (ctrl)
    {
       ctrl = ctrl & (ctrl - 1);
       a[c++] = ctrl;
    }
}

This is now stage1 material so please update and re-post.  Maybe Bin has
further suggestions as well.

Thanks,
Richard.

> Thanks,

> Kugan

>

>

> gcc/ChangeLog:

>

> 2018-02-08  Kugan Vivekanandarajah  <kuganv@linaro.org>

>

>     * gimple-expr.c (extract_ops_from_tree): Handle CALL_EXPR.

>     * tree-chrec.c (chrec_convert_1): Likewise.

>     * tree-scalar-evolution.c (expression_expensive_p): Likewise.

>     * tree-ssa-loop-ivopts.c (contains_abnormal_ssa_name_p): Likewise.

>     * tree-ssa-loop-niter.c (check_popcount_pattern): New.

>     (number_of_iterations_exit): Record niter for popcount patern.

>     * tree-ssa.c (verify_ssa): Check stmt to be non NULL.

>

> gcc/testsuite/ChangeLog:

>

> 2018-02-08  Kugan Vivekanandarajah  <kuganv@linaro.org>

>

>     * gcc.dg/tree-ssa/popcount.c: New test.

>

>

>>

>>> In general, there is also a condition gating the loop like

>>>

>>> if (b_4 != 0)

>>>   goto loop;

>>> else

>>>   end:

>>>

>>> This of course will not be removed by final value replacement. Since

>>> popcount (0) is defined, this is redundant and could be removed but

>>> not removed.

>>

>> Yeah, that's probably sth for another pass though.  I suppose the

>> end: case just uses zero in which case you'll have a PHI and you

>> can optimize

>>

>>   if (b != 0)

>>     return popcount (b);

>>   return 0;

>>

>> in phiopt.

>>

>> Richard.

>>

>>> Thanks,

>>> Kugan

>>>

>>>>

>>>> Richard.

>>>>

>>>>> Thanks,

>>>>> Kugan

>>>>>>

>>>>>> is related to popcount (b_5).

>>>>>>

>>>>>> Richard.

>>>>>>

>>>>>>> Bootstrapped and regression tested on aarch64-linux-gnu with no new regressions.

>>>>>>>

>>>>>>> Thanks,

>>>>>>> Kugan

>>>>>>>

>>>>>>> gcc/ChangeLog:

>>>>>>>

>>>>>>> 2018-01-25  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>>>>>>

>>>>>>>     PR middle-end/82479

>>>>>>>     * tree-loop-distribution.c (handle_popcount): New.

>>>>>>>     (pass_loop_distribution::execute): Use handle_popcount.

>>>>>>>

>>>>>>> gcc/testsuite/ChangeLog:

>>>>>>>

>>>>>>> 2018-01-25  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>>>>>>

>>>>>>>     PR middle-end/82479

>>>>>>>     * gcc.dg/tree-ssa/popcount.c: New test.
Bin.Cheng March 6, 2018, 4:20 p.m. | #8
On Mon, Mar 5, 2018 at 3:24 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Thu, Feb 8, 2018 at 1:41 AM, Kugan Vivekanandarajah

> <kugan.vivekanandarajah@linaro.org> wrote:

>> Hi Richard,

>>

>> On 1 February 2018 at 23:21, Richard Biener <richard.guenther@gmail.com> wrote:

>>> On Thu, Feb 1, 2018 at 5:07 AM, Kugan Vivekanandarajah

>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>> Hi Richard,

>>>>

>>>> On 31 January 2018 at 21:39, Richard Biener <richard.guenther@gmail.com> wrote:

>>>>> On Wed, Jan 31, 2018 at 11:28 AM, Kugan Vivekanandarajah

>>>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>>>> Hi Richard,

>>>>>>

>>>>>> Thanks for the review.

>>>>>> On 25 January 2018 at 20:04, Richard Biener <richard.guenther@gmail.com> wrote:

>>>>>>> On Wed, Jan 24, 2018 at 10:56 PM, Kugan Vivekanandarajah

>>>>>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>>>>>> Hi All,

>>>>>>>>

>>>>>>>> Here is a patch for popcount builtin detection similar to LLVM. I

>>>>>>>> would like to queue this for review for next stage 1.

>>>>>>>>

>>>>>>>> 1. This is done part of loop-distribution and effective for -O3 and above.

>>>>>>>> 2. This does not distribute loop to detect popcount (like

>>>>>>>> memcpy/memmove). I dont think that happens in practice. Please correct

>>>>>>>> me if I am wrong.

>>>>>>>

>>>>>>> But then it has no business inside loop distribution but instead is

>>>>>>> doing final value

>>>>>>> replacement, right?  You are pattern-matching the whole loop after all.  I think

>>>>>>> final value replacement would already do the correct thing if you

>>>>>>> teached number of

>>>>>>> iteration analysis that niter for

>>>>>>>

>>>>>>>   <bb 3> [local count: 955630224]:

>>>>>>>   # b_11 = PHI <b_5(5), b_8(6)>

>>>>>>>   _1 = b_11 + -1;

>>>>>>>   b_8 = _1 & b_11;

>>>>>>>   if (b_8 != 0)

>>>>>>>     goto <bb 6>; [89.00%]

>>>>>>>   else

>>>>>>>     goto <bb 8>; [11.00%]

>>>>>>>

>>>>>>>   <bb 6> [local count: 850510900]:

>>>>>>>   goto <bb 3>; [100.00%]

>>>>>>

>>>>>> I am looking into this approach. What should be the scalar evolution

>>>>>> for b_8 (i.e. b & (b -1) in a loop) should be? This is not clear to me

>>>>>> and can this be represented with the scev?

>>>>>

>>>>> No, it's not affine and thus cannot be represented.  You only need the

>>>>> scalar evolution of the counting IV which is already handled and

>>>>> the number of iteration analysis needs to handle the above IV - this

>>>>> is the missing part.

>>>> Thanks for the clarification. I am now matching this loop pattern in

>>>> number_of_iterations_exit when number_of_iterations_exit_assumptions

>>>> fails. If the pattern matches, I am inserting the _builtin_popcount in

>>>> the loop preheater and setting the loop niter with this. This will be

>>>> used by the final value replacement. Is this what you wanted?

>>>

>>> No, you shouldn't insert a popcount stmt but instead the niter

>>> GENERIC tree should be a CALL_EXPR to popcount with the

>>> appropriate argument.

>>

>> Thats what I tried earlier but ran into some ICEs. I wasn't sure if

>> niter in tree_niter_desc can take such.

>>

>> Attached patch now does this. Also had to add support for CALL_EXPR in

>> few places to handle niter with CALL_EXPR. Does this look OK?

>

> Overall this looks ok - the patch includes changes in places that I don't think

> need changes such as chrec_convert_1 or extract_ops_from_tree.

> The expression_expensive_p change should be more specific than making

> all calls inexpensive as well.

>

> The verify_ssa change looks bogus, you do

>

> +  dest = gimple_phi_result (count_phi);

> +  tree var = make_ssa_name (TREE_TYPE (dest), NULL);

> +  tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT);

> +

> +  var = build_call_expr (fn, 1, src);

> +  *niter = fold_build2 (MINUS_EXPR, TREE_TYPE (dest), var,

> +                       build_int_cst (TREE_TYPE (dest), 1));

>

> why do you allocate a new SSA name here?  It seems unused

> as you overwrive 'var' with the CALL_EXPR immediately.

>

> I didn't review the pattern matching thoroughly nor the exact place you

> call it.  But

>

> +      if (check_popcount_pattern (loop, &count))

> +       {

> +         niter->assumptions = boolean_false_node;

> +         niter->control.base = NULL_TREE;

> +         niter->control.step = NULL_TREE;

> +         niter->control.no_overflow = false;

> +         niter->niter = count;

> +         niter->assumptions = boolean_true_node;

> +         niter->may_be_zero = boolean_false_node;

> +         niter->max = -1;

> +         niter->bound = NULL_TREE;

> +         niter->cmp = ERROR_MARK;

> +         return true;

> +       }

>

> simply setting may_be_zero to false looks fishy.  Try

> with -fno-tree-loop-ch.  Also max should not be negative,

> it should be the number of bits in the IV type?

>

> A related testcase could be that we can completely peel

> a loop like the following which iterates at most 8 times:

>

> int a[8];

> void foo (unsigned char ctrl)

> {

>   int c = 0;

>   while (ctrl)

>     {

>        ctrl = ctrl & (ctrl - 1);

>        a[c++] = ctrl;

>     }

> }

>

> This is now stage1 material so please update and re-post.  Maybe Bin has

> further suggestions as well.

Sorry for being late on this.  Actually I thought about popcount in
distribution before.  More like the first patch, but handled as
another distribution pattern rather than a special case.  It's a bit
strange to compute and store the info in niters.  It's also not
straight forward when/where the transformation is finally done.
I haven't looked into the details so not sure how appropriate it will
be as a distribution pattern (current ones are only about data
references).  So I am okay with this version if it's more appropriate.

Thanks,
bin
>

> Thanks,

> Richard.

>

>> Thanks,

>> Kugan

>>

>>

>> gcc/ChangeLog:

>>

>> 2018-02-08  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>

>>     * gimple-expr.c (extract_ops_from_tree): Handle CALL_EXPR.

>>     * tree-chrec.c (chrec_convert_1): Likewise.

>>     * tree-scalar-evolution.c (expression_expensive_p): Likewise.

>>     * tree-ssa-loop-ivopts.c (contains_abnormal_ssa_name_p): Likewise.

>>     * tree-ssa-loop-niter.c (check_popcount_pattern): New.

>>     (number_of_iterations_exit): Record niter for popcount patern.

>>     * tree-ssa.c (verify_ssa): Check stmt to be non NULL.

>>

>> gcc/testsuite/ChangeLog:

>>

>> 2018-02-08  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>

>>     * gcc.dg/tree-ssa/popcount.c: New test.

>>

>>

>>>

>>>> In general, there is also a condition gating the loop like

>>>>

>>>> if (b_4 != 0)

>>>>   goto loop;

>>>> else

>>>>   end:

>>>>

>>>> This of course will not be removed by final value replacement. Since

>>>> popcount (0) is defined, this is redundant and could be removed but

>>>> not removed.

>>>

>>> Yeah, that's probably sth for another pass though.  I suppose the

>>> end: case just uses zero in which case you'll have a PHI and you

>>> can optimize

>>>

>>>   if (b != 0)

>>>     return popcount (b);

>>>   return 0;

>>>

>>> in phiopt.

>>>

>>> Richard.

>>>

>>>> Thanks,

>>>> Kugan

>>>>

>>>>>

>>>>> Richard.

>>>>>

>>>>>> Thanks,

>>>>>> Kugan

>>>>>>>

>>>>>>> is related to popcount (b_5).

>>>>>>>

>>>>>>> Richard.

>>>>>>>

>>>>>>>> Bootstrapped and regression tested on aarch64-linux-gnu with no new regressions.

>>>>>>>>

>>>>>>>> Thanks,

>>>>>>>> Kugan

>>>>>>>>

>>>>>>>> gcc/ChangeLog:

>>>>>>>>

>>>>>>>> 2018-01-25  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>>>>>>>

>>>>>>>>     PR middle-end/82479

>>>>>>>>     * tree-loop-distribution.c (handle_popcount): New.

>>>>>>>>     (pass_loop_distribution::execute): Use handle_popcount.

>>>>>>>>

>>>>>>>> gcc/testsuite/ChangeLog:

>>>>>>>>

>>>>>>>> 2018-01-25  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>>>>>>>

>>>>>>>>     PR middle-end/82479

>>>>>>>>     * gcc.dg/tree-ssa/popcount.c: New test.
Richard Biener March 7, 2018, 8:26 a.m. | #9
On Tue, Mar 6, 2018 at 5:20 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Mon, Mar 5, 2018 at 3:24 PM, Richard Biener

> <richard.guenther@gmail.com> wrote:

>> On Thu, Feb 8, 2018 at 1:41 AM, Kugan Vivekanandarajah

>> <kugan.vivekanandarajah@linaro.org> wrote:

>>> Hi Richard,

>>>

>>> On 1 February 2018 at 23:21, Richard Biener <richard.guenther@gmail.com> wrote:

>>>> On Thu, Feb 1, 2018 at 5:07 AM, Kugan Vivekanandarajah

>>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>>> Hi Richard,

>>>>>

>>>>> On 31 January 2018 at 21:39, Richard Biener <richard.guenther@gmail.com> wrote:

>>>>>> On Wed, Jan 31, 2018 at 11:28 AM, Kugan Vivekanandarajah

>>>>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>>>>> Hi Richard,

>>>>>>>

>>>>>>> Thanks for the review.

>>>>>>> On 25 January 2018 at 20:04, Richard Biener <richard.guenther@gmail.com> wrote:

>>>>>>>> On Wed, Jan 24, 2018 at 10:56 PM, Kugan Vivekanandarajah

>>>>>>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>>>>>>> Hi All,

>>>>>>>>>

>>>>>>>>> Here is a patch for popcount builtin detection similar to LLVM. I

>>>>>>>>> would like to queue this for review for next stage 1.

>>>>>>>>>

>>>>>>>>> 1. This is done part of loop-distribution and effective for -O3 and above.

>>>>>>>>> 2. This does not distribute loop to detect popcount (like

>>>>>>>>> memcpy/memmove). I dont think that happens in practice. Please correct

>>>>>>>>> me if I am wrong.

>>>>>>>>

>>>>>>>> But then it has no business inside loop distribution but instead is

>>>>>>>> doing final value

>>>>>>>> replacement, right?  You are pattern-matching the whole loop after all.  I think

>>>>>>>> final value replacement would already do the correct thing if you

>>>>>>>> teached number of

>>>>>>>> iteration analysis that niter for

>>>>>>>>

>>>>>>>>   <bb 3> [local count: 955630224]:

>>>>>>>>   # b_11 = PHI <b_5(5), b_8(6)>

>>>>>>>>   _1 = b_11 + -1;

>>>>>>>>   b_8 = _1 & b_11;

>>>>>>>>   if (b_8 != 0)

>>>>>>>>     goto <bb 6>; [89.00%]

>>>>>>>>   else

>>>>>>>>     goto <bb 8>; [11.00%]

>>>>>>>>

>>>>>>>>   <bb 6> [local count: 850510900]:

>>>>>>>>   goto <bb 3>; [100.00%]

>>>>>>>

>>>>>>> I am looking into this approach. What should be the scalar evolution

>>>>>>> for b_8 (i.e. b & (b -1) in a loop) should be? This is not clear to me

>>>>>>> and can this be represented with the scev?

>>>>>>

>>>>>> No, it's not affine and thus cannot be represented.  You only need the

>>>>>> scalar evolution of the counting IV which is already handled and

>>>>>> the number of iteration analysis needs to handle the above IV - this

>>>>>> is the missing part.

>>>>> Thanks for the clarification. I am now matching this loop pattern in

>>>>> number_of_iterations_exit when number_of_iterations_exit_assumptions

>>>>> fails. If the pattern matches, I am inserting the _builtin_popcount in

>>>>> the loop preheater and setting the loop niter with this. This will be

>>>>> used by the final value replacement. Is this what you wanted?

>>>>

>>>> No, you shouldn't insert a popcount stmt but instead the niter

>>>> GENERIC tree should be a CALL_EXPR to popcount with the

>>>> appropriate argument.

>>>

>>> Thats what I tried earlier but ran into some ICEs. I wasn't sure if

>>> niter in tree_niter_desc can take such.

>>>

>>> Attached patch now does this. Also had to add support for CALL_EXPR in

>>> few places to handle niter with CALL_EXPR. Does this look OK?

>>

>> Overall this looks ok - the patch includes changes in places that I don't think

>> need changes such as chrec_convert_1 or extract_ops_from_tree.

>> The expression_expensive_p change should be more specific than making

>> all calls inexpensive as well.

>>

>> The verify_ssa change looks bogus, you do

>>

>> +  dest = gimple_phi_result (count_phi);

>> +  tree var = make_ssa_name (TREE_TYPE (dest), NULL);

>> +  tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT);

>> +

>> +  var = build_call_expr (fn, 1, src);

>> +  *niter = fold_build2 (MINUS_EXPR, TREE_TYPE (dest), var,

>> +                       build_int_cst (TREE_TYPE (dest), 1));

>>

>> why do you allocate a new SSA name here?  It seems unused

>> as you overwrive 'var' with the CALL_EXPR immediately.

>>

>> I didn't review the pattern matching thoroughly nor the exact place you

>> call it.  But

>>

>> +      if (check_popcount_pattern (loop, &count))

>> +       {

>> +         niter->assumptions = boolean_false_node;

>> +         niter->control.base = NULL_TREE;

>> +         niter->control.step = NULL_TREE;

>> +         niter->control.no_overflow = false;

>> +         niter->niter = count;

>> +         niter->assumptions = boolean_true_node;

>> +         niter->may_be_zero = boolean_false_node;

>> +         niter->max = -1;

>> +         niter->bound = NULL_TREE;

>> +         niter->cmp = ERROR_MARK;

>> +         return true;

>> +       }

>>

>> simply setting may_be_zero to false looks fishy.  Try

>> with -fno-tree-loop-ch.  Also max should not be negative,

>> it should be the number of bits in the IV type?

>>

>> A related testcase could be that we can completely peel

>> a loop like the following which iterates at most 8 times:

>>

>> int a[8];

>> void foo (unsigned char ctrl)

>> {

>>   int c = 0;

>>   while (ctrl)

>>     {

>>        ctrl = ctrl & (ctrl - 1);

>>        a[c++] = ctrl;

>>     }

>> }

>>

>> This is now stage1 material so please update and re-post.  Maybe Bin has

>> further suggestions as well.

> Sorry for being late on this.  Actually I thought about popcount in

> distribution before.  More like the first patch, but handled as

> another distribution pattern rather than a special case.  It's a bit

> strange to compute and store the info in niters.  It's also not

> straight forward when/where the transformation is finally done.


It's done at final value replacement if the counter is used.  But with
it in place we should also be able to compute niters for the case
where it isn't (like the above case).  So I believe that in the end handling
this in niter analysis is more powerful.

> I haven't looked into the details so not sure how appropriate it will

> be as a distribution pattern (current ones are only about data

> references).  So I am okay with this version if it's more appropriate.


Yeah, adding distribution patterns for scalar reduction forms is certainly
appropriate but then this is the same or at least similar as final
value replacement.

Richard.

> Thanks,

> bin

>>

>> Thanks,

>> Richard.

>>

>>> Thanks,

>>> Kugan

>>>

>>>

>>> gcc/ChangeLog:

>>>

>>> 2018-02-08  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>>

>>>     * gimple-expr.c (extract_ops_from_tree): Handle CALL_EXPR.

>>>     * tree-chrec.c (chrec_convert_1): Likewise.

>>>     * tree-scalar-evolution.c (expression_expensive_p): Likewise.

>>>     * tree-ssa-loop-ivopts.c (contains_abnormal_ssa_name_p): Likewise.

>>>     * tree-ssa-loop-niter.c (check_popcount_pattern): New.

>>>     (number_of_iterations_exit): Record niter for popcount patern.

>>>     * tree-ssa.c (verify_ssa): Check stmt to be non NULL.

>>>

>>> gcc/testsuite/ChangeLog:

>>>

>>> 2018-02-08  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>>

>>>     * gcc.dg/tree-ssa/popcount.c: New test.

>>>

>>>

>>>>

>>>>> In general, there is also a condition gating the loop like

>>>>>

>>>>> if (b_4 != 0)

>>>>>   goto loop;

>>>>> else

>>>>>   end:

>>>>>

>>>>> This of course will not be removed by final value replacement. Since

>>>>> popcount (0) is defined, this is redundant and could be removed but

>>>>> not removed.

>>>>

>>>> Yeah, that's probably sth for another pass though.  I suppose the

>>>> end: case just uses zero in which case you'll have a PHI and you

>>>> can optimize

>>>>

>>>>   if (b != 0)

>>>>     return popcount (b);

>>>>   return 0;

>>>>

>>>> in phiopt.

>>>>

>>>> Richard.

>>>>

>>>>> Thanks,

>>>>> Kugan

>>>>>

>>>>>>

>>>>>> Richard.

>>>>>>

>>>>>>> Thanks,

>>>>>>> Kugan

>>>>>>>>

>>>>>>>> is related to popcount (b_5).

>>>>>>>>

>>>>>>>> Richard.

>>>>>>>>

>>>>>>>>> Bootstrapped and regression tested on aarch64-linux-gnu with no new regressions.

>>>>>>>>>

>>>>>>>>> Thanks,

>>>>>>>>> Kugan

>>>>>>>>>

>>>>>>>>> gcc/ChangeLog:

>>>>>>>>>

>>>>>>>>> 2018-01-25  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>>>>>>>>

>>>>>>>>>     PR middle-end/82479

>>>>>>>>>     * tree-loop-distribution.c (handle_popcount): New.

>>>>>>>>>     (pass_loop_distribution::execute): Use handle_popcount.

>>>>>>>>>

>>>>>>>>> gcc/testsuite/ChangeLog:

>>>>>>>>>

>>>>>>>>> 2018-01-25  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>>>>>>>>

>>>>>>>>>     PR middle-end/82479

>>>>>>>>>     * gcc.dg/tree-ssa/popcount.c: New test.
Bin.Cheng March 7, 2018, 11:25 a.m. | #10
On Wed, Mar 7, 2018 at 8:26 AM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Tue, Mar 6, 2018 at 5:20 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:

>> On Mon, Mar 5, 2018 at 3:24 PM, Richard Biener

>> <richard.guenther@gmail.com> wrote:

>>> On Thu, Feb 8, 2018 at 1:41 AM, Kugan Vivekanandarajah

>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>> Hi Richard,

>>>>

>>>> On 1 February 2018 at 23:21, Richard Biener <richard.guenther@gmail.com> wrote:

>>>>> On Thu, Feb 1, 2018 at 5:07 AM, Kugan Vivekanandarajah

>>>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>>>> Hi Richard,

>>>>>>

>>>>>> On 31 January 2018 at 21:39, Richard Biener <richard.guenther@gmail.com> wrote:

>>>>>>> On Wed, Jan 31, 2018 at 11:28 AM, Kugan Vivekanandarajah

>>>>>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>>>>>> Hi Richard,

>>>>>>>>

>>>>>>>> Thanks for the review.

>>>>>>>> On 25 January 2018 at 20:04, Richard Biener <richard.guenther@gmail.com> wrote:

>>>>>>>>> On Wed, Jan 24, 2018 at 10:56 PM, Kugan Vivekanandarajah

>>>>>>>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>>>>>>>> Hi All,

>>>>>>>>>>

>>>>>>>>>> Here is a patch for popcount builtin detection similar to LLVM. I

>>>>>>>>>> would like to queue this for review for next stage 1.

>>>>>>>>>>

>>>>>>>>>> 1. This is done part of loop-distribution and effective for -O3 and above.

>>>>>>>>>> 2. This does not distribute loop to detect popcount (like

>>>>>>>>>> memcpy/memmove). I dont think that happens in practice. Please correct

>>>>>>>>>> me if I am wrong.

>>>>>>>>>

>>>>>>>>> But then it has no business inside loop distribution but instead is

>>>>>>>>> doing final value

>>>>>>>>> replacement, right?  You are pattern-matching the whole loop after all.  I think

>>>>>>>>> final value replacement would already do the correct thing if you

>>>>>>>>> teached number of

>>>>>>>>> iteration analysis that niter for

>>>>>>>>>

>>>>>>>>>   <bb 3> [local count: 955630224]:

>>>>>>>>>   # b_11 = PHI <b_5(5), b_8(6)>

>>>>>>>>>   _1 = b_11 + -1;

>>>>>>>>>   b_8 = _1 & b_11;

>>>>>>>>>   if (b_8 != 0)

>>>>>>>>>     goto <bb 6>; [89.00%]

>>>>>>>>>   else

>>>>>>>>>     goto <bb 8>; [11.00%]

>>>>>>>>>

>>>>>>>>>   <bb 6> [local count: 850510900]:

>>>>>>>>>   goto <bb 3>; [100.00%]

>>>>>>>>

>>>>>>>> I am looking into this approach. What should be the scalar evolution

>>>>>>>> for b_8 (i.e. b & (b -1) in a loop) should be? This is not clear to me

>>>>>>>> and can this be represented with the scev?

>>>>>>>

>>>>>>> No, it's not affine and thus cannot be represented.  You only need the

>>>>>>> scalar evolution of the counting IV which is already handled and

>>>>>>> the number of iteration analysis needs to handle the above IV - this

>>>>>>> is the missing part.

>>>>>> Thanks for the clarification. I am now matching this loop pattern in

>>>>>> number_of_iterations_exit when number_of_iterations_exit_assumptions

>>>>>> fails. If the pattern matches, I am inserting the _builtin_popcount in

>>>>>> the loop preheater and setting the loop niter with this. This will be

>>>>>> used by the final value replacement. Is this what you wanted?

>>>>>

>>>>> No, you shouldn't insert a popcount stmt but instead the niter

>>>>> GENERIC tree should be a CALL_EXPR to popcount with the

>>>>> appropriate argument.

>>>>

>>>> Thats what I tried earlier but ran into some ICEs. I wasn't sure if

>>>> niter in tree_niter_desc can take such.

>>>>

>>>> Attached patch now does this. Also had to add support for CALL_EXPR in

>>>> few places to handle niter with CALL_EXPR. Does this look OK?

>>>

>>> Overall this looks ok - the patch includes changes in places that I don't think

>>> need changes such as chrec_convert_1 or extract_ops_from_tree.

>>> The expression_expensive_p change should be more specific than making

>>> all calls inexpensive as well.

>>>

>>> The verify_ssa change looks bogus, you do

>>>

>>> +  dest = gimple_phi_result (count_phi);

>>> +  tree var = make_ssa_name (TREE_TYPE (dest), NULL);

>>> +  tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT);

>>> +

>>> +  var = build_call_expr (fn, 1, src);

>>> +  *niter = fold_build2 (MINUS_EXPR, TREE_TYPE (dest), var,

>>> +                       build_int_cst (TREE_TYPE (dest), 1));

>>>

>>> why do you allocate a new SSA name here?  It seems unused

>>> as you overwrive 'var' with the CALL_EXPR immediately.

>>>

>>> I didn't review the pattern matching thoroughly nor the exact place you

>>> call it.  But

>>>

>>> +      if (check_popcount_pattern (loop, &count))

>>> +       {

>>> +         niter->assumptions = boolean_false_node;

>>> +         niter->control.base = NULL_TREE;

>>> +         niter->control.step = NULL_TREE;

>>> +         niter->control.no_overflow = false;

>>> +         niter->niter = count;

>>> +         niter->assumptions = boolean_true_node;

>>> +         niter->may_be_zero = boolean_false_node;

>>> +         niter->max = -1;

>>> +         niter->bound = NULL_TREE;

>>> +         niter->cmp = ERROR_MARK;

>>> +         return true;

>>> +       }

>>>

>>> simply setting may_be_zero to false looks fishy.  Try

>>> with -fno-tree-loop-ch.  Also max should not be negative,

>>> it should be the number of bits in the IV type?

>>>

>>> A related testcase could be that we can completely peel

>>> a loop like the following which iterates at most 8 times:

>>>

>>> int a[8];

>>> void foo (unsigned char ctrl)

>>> {

>>>   int c = 0;

>>>   while (ctrl)

>>>     {

>>>        ctrl = ctrl & (ctrl - 1);

>>>        a[c++] = ctrl;

>>>     }

>>> }

>>>

>>> This is now stage1 material so please update and re-post.  Maybe Bin has

>>> further suggestions as well.

>> Sorry for being late on this.  Actually I thought about popcount in

>> distribution before.  More like the first patch, but handled as

>> another distribution pattern rather than a special case.  It's a bit

>> strange to compute and store the info in niters.  It's also not

>> straight forward when/where the transformation is finally done.

>

> It's done at final value replacement if the counter is used.  But with

> it in place we should also be able to compute niters for the case

> where it isn't (like the above case).  So I believe that in the end handling

> this in niter analysis is more powerful.

Yes, you are right, as distribution pattern, we would only be able to
handle standalone popcount loop.

Thanks,
bin
>

>> I haven't looked into the details so not sure how appropriate it will

>> be as a distribution pattern (current ones are only about data

>> references).  So I am okay with this version if it's more appropriate.

>

> Yeah, adding distribution patterns for scalar reduction forms is certainly

> appropriate but then this is the same or at least similar as final

> value replacement.

>

> Richard.

>

>> Thanks,

>> bin

>>>

>>> Thanks,

>>> Richard.

>>>

>>>> Thanks,

>>>> Kugan

>>>>

>>>>

>>>> gcc/ChangeLog:

>>>>

>>>> 2018-02-08  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>>>

>>>>     * gimple-expr.c (extract_ops_from_tree): Handle CALL_EXPR.

>>>>     * tree-chrec.c (chrec_convert_1): Likewise.

>>>>     * tree-scalar-evolution.c (expression_expensive_p): Likewise.

>>>>     * tree-ssa-loop-ivopts.c (contains_abnormal_ssa_name_p): Likewise.

>>>>     * tree-ssa-loop-niter.c (check_popcount_pattern): New.

>>>>     (number_of_iterations_exit): Record niter for popcount patern.

>>>>     * tree-ssa.c (verify_ssa): Check stmt to be non NULL.

>>>>

>>>> gcc/testsuite/ChangeLog:

>>>>

>>>> 2018-02-08  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>>>

>>>>     * gcc.dg/tree-ssa/popcount.c: New test.

>>>>

>>>>

>>>>>

>>>>>> In general, there is also a condition gating the loop like

>>>>>>

>>>>>> if (b_4 != 0)

>>>>>>   goto loop;

>>>>>> else

>>>>>>   end:

>>>>>>

>>>>>> This of course will not be removed by final value replacement. Since

>>>>>> popcount (0) is defined, this is redundant and could be removed but

>>>>>> not removed.

>>>>>

>>>>> Yeah, that's probably sth for another pass though.  I suppose the

>>>>> end: case just uses zero in which case you'll have a PHI and you

>>>>> can optimize

>>>>>

>>>>>   if (b != 0)

>>>>>     return popcount (b);

>>>>>   return 0;

>>>>>

>>>>> in phiopt.

>>>>>

>>>>> Richard.

>>>>>

>>>>>> Thanks,

>>>>>> Kugan

>>>>>>

>>>>>>>

>>>>>>> Richard.

>>>>>>>

>>>>>>>> Thanks,

>>>>>>>> Kugan

>>>>>>>>>

>>>>>>>>> is related to popcount (b_5).

>>>>>>>>>

>>>>>>>>> Richard.

>>>>>>>>>

>>>>>>>>>> Bootstrapped and regression tested on aarch64-linux-gnu with no new regressions.

>>>>>>>>>>

>>>>>>>>>> Thanks,

>>>>>>>>>> Kugan

>>>>>>>>>>

>>>>>>>>>> gcc/ChangeLog:

>>>>>>>>>>

>>>>>>>>>> 2018-01-25  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>>>>>>>>>

>>>>>>>>>>     PR middle-end/82479

>>>>>>>>>>     * tree-loop-distribution.c (handle_popcount): New.

>>>>>>>>>>     (pass_loop_distribution::execute): Use handle_popcount.

>>>>>>>>>>

>>>>>>>>>> gcc/testsuite/ChangeLog:

>>>>>>>>>>

>>>>>>>>>> 2018-01-25  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>>>>>>>>>

>>>>>>>>>>     PR middle-end/82479

>>>>>>>>>>     * gcc.dg/tree-ssa/popcount.c: New test.
Kugan Vivekanandarajah March 8, 2018, 10:06 p.m. | #11
Hi Richard and Bin,

Thanks for your comments. I will revise the patch and post it as soon
as stage-1 opens.

Kugan

On 7 March 2018 at 22:25, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Wed, Mar 7, 2018 at 8:26 AM, Richard Biener

> <richard.guenther@gmail.com> wrote:

>> On Tue, Mar 6, 2018 at 5:20 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:

>>> On Mon, Mar 5, 2018 at 3:24 PM, Richard Biener

>>> <richard.guenther@gmail.com> wrote:

>>>> On Thu, Feb 8, 2018 at 1:41 AM, Kugan Vivekanandarajah

>>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>>> Hi Richard,

>>>>>

>>>>> On 1 February 2018 at 23:21, Richard Biener <richard.guenther@gmail.com> wrote:

>>>>>> On Thu, Feb 1, 2018 at 5:07 AM, Kugan Vivekanandarajah

>>>>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>>>>> Hi Richard,

>>>>>>>

>>>>>>> On 31 January 2018 at 21:39, Richard Biener <richard.guenther@gmail.com> wrote:

>>>>>>>> On Wed, Jan 31, 2018 at 11:28 AM, Kugan Vivekanandarajah

>>>>>>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>>>>>>> Hi Richard,

>>>>>>>>>

>>>>>>>>> Thanks for the review.

>>>>>>>>> On 25 January 2018 at 20:04, Richard Biener <richard.guenther@gmail.com> wrote:

>>>>>>>>>> On Wed, Jan 24, 2018 at 10:56 PM, Kugan Vivekanandarajah

>>>>>>>>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>>>>>>>>> Hi All,

>>>>>>>>>>>

>>>>>>>>>>> Here is a patch for popcount builtin detection similar to LLVM. I

>>>>>>>>>>> would like to queue this for review for next stage 1.

>>>>>>>>>>>

>>>>>>>>>>> 1. This is done part of loop-distribution and effective for -O3 and above.

>>>>>>>>>>> 2. This does not distribute loop to detect popcount (like

>>>>>>>>>>> memcpy/memmove). I dont think that happens in practice. Please correct

>>>>>>>>>>> me if I am wrong.

>>>>>>>>>>

>>>>>>>>>> But then it has no business inside loop distribution but instead is

>>>>>>>>>> doing final value

>>>>>>>>>> replacement, right?  You are pattern-matching the whole loop after all.  I think

>>>>>>>>>> final value replacement would already do the correct thing if you

>>>>>>>>>> teached number of

>>>>>>>>>> iteration analysis that niter for

>>>>>>>>>>

>>>>>>>>>>   <bb 3> [local count: 955630224]:

>>>>>>>>>>   # b_11 = PHI <b_5(5), b_8(6)>

>>>>>>>>>>   _1 = b_11 + -1;

>>>>>>>>>>   b_8 = _1 & b_11;

>>>>>>>>>>   if (b_8 != 0)

>>>>>>>>>>     goto <bb 6>; [89.00%]

>>>>>>>>>>   else

>>>>>>>>>>     goto <bb 8>; [11.00%]

>>>>>>>>>>

>>>>>>>>>>   <bb 6> [local count: 850510900]:

>>>>>>>>>>   goto <bb 3>; [100.00%]

>>>>>>>>>

>>>>>>>>> I am looking into this approach. What should be the scalar evolution

>>>>>>>>> for b_8 (i.e. b & (b -1) in a loop) should be? This is not clear to me

>>>>>>>>> and can this be represented with the scev?

>>>>>>>>

>>>>>>>> No, it's not affine and thus cannot be represented.  You only need the

>>>>>>>> scalar evolution of the counting IV which is already handled and

>>>>>>>> the number of iteration analysis needs to handle the above IV - this

>>>>>>>> is the missing part.

>>>>>>> Thanks for the clarification. I am now matching this loop pattern in

>>>>>>> number_of_iterations_exit when number_of_iterations_exit_assumptions

>>>>>>> fails. If the pattern matches, I am inserting the _builtin_popcount in

>>>>>>> the loop preheater and setting the loop niter with this. This will be

>>>>>>> used by the final value replacement. Is this what you wanted?

>>>>>>

>>>>>> No, you shouldn't insert a popcount stmt but instead the niter

>>>>>> GENERIC tree should be a CALL_EXPR to popcount with the

>>>>>> appropriate argument.

>>>>>

>>>>> Thats what I tried earlier but ran into some ICEs. I wasn't sure if

>>>>> niter in tree_niter_desc can take such.

>>>>>

>>>>> Attached patch now does this. Also had to add support for CALL_EXPR in

>>>>> few places to handle niter with CALL_EXPR. Does this look OK?

>>>>

>>>> Overall this looks ok - the patch includes changes in places that I don't think

>>>> need changes such as chrec_convert_1 or extract_ops_from_tree.

>>>> The expression_expensive_p change should be more specific than making

>>>> all calls inexpensive as well.

>>>>

>>>> The verify_ssa change looks bogus, you do

>>>>

>>>> +  dest = gimple_phi_result (count_phi);

>>>> +  tree var = make_ssa_name (TREE_TYPE (dest), NULL);

>>>> +  tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT);

>>>> +

>>>> +  var = build_call_expr (fn, 1, src);

>>>> +  *niter = fold_build2 (MINUS_EXPR, TREE_TYPE (dest), var,

>>>> +                       build_int_cst (TREE_TYPE (dest), 1));

>>>>

>>>> why do you allocate a new SSA name here?  It seems unused

>>>> as you overwrive 'var' with the CALL_EXPR immediately.

>>>>

>>>> I didn't review the pattern matching thoroughly nor the exact place you

>>>> call it.  But

>>>>

>>>> +      if (check_popcount_pattern (loop, &count))

>>>> +       {

>>>> +         niter->assumptions = boolean_false_node;

>>>> +         niter->control.base = NULL_TREE;

>>>> +         niter->control.step = NULL_TREE;

>>>> +         niter->control.no_overflow = false;

>>>> +         niter->niter = count;

>>>> +         niter->assumptions = boolean_true_node;

>>>> +         niter->may_be_zero = boolean_false_node;

>>>> +         niter->max = -1;

>>>> +         niter->bound = NULL_TREE;

>>>> +         niter->cmp = ERROR_MARK;

>>>> +         return true;

>>>> +       }

>>>>

>>>> simply setting may_be_zero to false looks fishy.  Try

>>>> with -fno-tree-loop-ch.  Also max should not be negative,

>>>> it should be the number of bits in the IV type?

>>>>

>>>> A related testcase could be that we can completely peel

>>>> a loop like the following which iterates at most 8 times:

>>>>

>>>> int a[8];

>>>> void foo (unsigned char ctrl)

>>>> {

>>>>   int c = 0;

>>>>   while (ctrl)

>>>>     {

>>>>        ctrl = ctrl & (ctrl - 1);

>>>>        a[c++] = ctrl;

>>>>     }

>>>> }

>>>>

>>>> This is now stage1 material so please update and re-post.  Maybe Bin has

>>>> further suggestions as well.

>>> Sorry for being late on this.  Actually I thought about popcount in

>>> distribution before.  More like the first patch, but handled as

>>> another distribution pattern rather than a special case.  It's a bit

>>> strange to compute and store the info in niters.  It's also not

>>> straight forward when/where the transformation is finally done.

>>

>> It's done at final value replacement if the counter is used.  But with

>> it in place we should also be able to compute niters for the case

>> where it isn't (like the above case).  So I believe that in the end handling

>> this in niter analysis is more powerful.

> Yes, you are right, as distribution pattern, we would only be able to

> handle standalone popcount loop.

>

> Thanks,

> bin

>>

>>> I haven't looked into the details so not sure how appropriate it will

>>> be as a distribution pattern (current ones are only about data

>>> references).  So I am okay with this version if it's more appropriate.

>>

>> Yeah, adding distribution patterns for scalar reduction forms is certainly

>> appropriate but then this is the same or at least similar as final

>> value replacement.

>>

>> Richard.

>>

>>> Thanks,

>>> bin

>>>>

>>>> Thanks,

>>>> Richard.

>>>>

>>>>> Thanks,

>>>>> Kugan

>>>>>

>>>>>

>>>>> gcc/ChangeLog:

>>>>>

>>>>> 2018-02-08  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>>>>

>>>>>     * gimple-expr.c (extract_ops_from_tree): Handle CALL_EXPR.

>>>>>     * tree-chrec.c (chrec_convert_1): Likewise.

>>>>>     * tree-scalar-evolution.c (expression_expensive_p): Likewise.

>>>>>     * tree-ssa-loop-ivopts.c (contains_abnormal_ssa_name_p): Likewise.

>>>>>     * tree-ssa-loop-niter.c (check_popcount_pattern): New.

>>>>>     (number_of_iterations_exit): Record niter for popcount patern.

>>>>>     * tree-ssa.c (verify_ssa): Check stmt to be non NULL.

>>>>>

>>>>> gcc/testsuite/ChangeLog:

>>>>>

>>>>> 2018-02-08  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>>>>

>>>>>     * gcc.dg/tree-ssa/popcount.c: New test.

>>>>>

>>>>>

>>>>>>

>>>>>>> In general, there is also a condition gating the loop like

>>>>>>>

>>>>>>> if (b_4 != 0)

>>>>>>>   goto loop;

>>>>>>> else

>>>>>>>   end:

>>>>>>>

>>>>>>> This of course will not be removed by final value replacement. Since

>>>>>>> popcount (0) is defined, this is redundant and could be removed but

>>>>>>> not removed.

>>>>>>

>>>>>> Yeah, that's probably sth for another pass though.  I suppose the

>>>>>> end: case just uses zero in which case you'll have a PHI and you

>>>>>> can optimize

>>>>>>

>>>>>>   if (b != 0)

>>>>>>     return popcount (b);

>>>>>>   return 0;

>>>>>>

>>>>>> in phiopt.

>>>>>>

>>>>>> Richard.

>>>>>>

>>>>>>> Thanks,

>>>>>>> Kugan

>>>>>>>

>>>>>>>>

>>>>>>>> Richard.

>>>>>>>>

>>>>>>>>> Thanks,

>>>>>>>>> Kugan

>>>>>>>>>>

>>>>>>>>>> is related to popcount (b_5).

>>>>>>>>>>

>>>>>>>>>> Richard.

>>>>>>>>>>

>>>>>>>>>>> Bootstrapped and regression tested on aarch64-linux-gnu with no new regressions.

>>>>>>>>>>>

>>>>>>>>>>> Thanks,

>>>>>>>>>>> Kugan

>>>>>>>>>>>

>>>>>>>>>>> gcc/ChangeLog:

>>>>>>>>>>>

>>>>>>>>>>> 2018-01-25  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>>>>>>>>>>

>>>>>>>>>>>     PR middle-end/82479

>>>>>>>>>>>     * tree-loop-distribution.c (handle_popcount): New.

>>>>>>>>>>>     (pass_loop_distribution::execute): Use handle_popcount.

>>>>>>>>>>>

>>>>>>>>>>> gcc/testsuite/ChangeLog:

>>>>>>>>>>>

>>>>>>>>>>> 2018-01-25  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>>>>>>>>>>

>>>>>>>>>>>     PR middle-end/82479

>>>>>>>>>>>     * gcc.dg/tree-ssa/popcount.c: New test.
Kugan Vivekanandarajah May 17, 2018, 1:39 a.m. | #12
Hi Richard,

On 6 March 2018 at 02:24, Richard Biener <richard.guenther@gmail.com> wrote:
> On Thu, Feb 8, 2018 at 1:41 AM, Kugan Vivekanandarajah

> <kugan.vivekanandarajah@linaro.org> wrote:

>> Hi Richard,

>>

>> On 1 February 2018 at 23:21, Richard Biener <richard.guenther@gmail.com> wrote:

>>> On Thu, Feb 1, 2018 at 5:07 AM, Kugan Vivekanandarajah

>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>> Hi Richard,

>>>>

>>>> On 31 January 2018 at 21:39, Richard Biener <richard.guenther@gmail.com> wrote:

>>>>> On Wed, Jan 31, 2018 at 11:28 AM, Kugan Vivekanandarajah

>>>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>>>> Hi Richard,

>>>>>>

>>>>>> Thanks for the review.

>>>>>> On 25 January 2018 at 20:04, Richard Biener <richard.guenther@gmail.com> wrote:

>>>>>>> On Wed, Jan 24, 2018 at 10:56 PM, Kugan Vivekanandarajah

>>>>>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>>>>>> Hi All,

>>>>>>>>

>>>>>>>> Here is a patch for popcount builtin detection similar to LLVM. I

>>>>>>>> would like to queue this for review for next stage 1.

>>>>>>>>

>>>>>>>> 1. This is done part of loop-distribution and effective for -O3 and above.

>>>>>>>> 2. This does not distribute loop to detect popcount (like

>>>>>>>> memcpy/memmove). I dont think that happens in practice. Please correct

>>>>>>>> me if I am wrong.

>>>>>>>

>>>>>>> But then it has no business inside loop distribution but instead is

>>>>>>> doing final value

>>>>>>> replacement, right?  You are pattern-matching the whole loop after all.  I think

>>>>>>> final value replacement would already do the correct thing if you

>>>>>>> teached number of

>>>>>>> iteration analysis that niter for

>>>>>>>

>>>>>>>   <bb 3> [local count: 955630224]:

>>>>>>>   # b_11 = PHI <b_5(5), b_8(6)>

>>>>>>>   _1 = b_11 + -1;

>>>>>>>   b_8 = _1 & b_11;

>>>>>>>   if (b_8 != 0)

>>>>>>>     goto <bb 6>; [89.00%]

>>>>>>>   else

>>>>>>>     goto <bb 8>; [11.00%]

>>>>>>>

>>>>>>>   <bb 6> [local count: 850510900]:

>>>>>>>   goto <bb 3>; [100.00%]

>>>>>>

>>>>>> I am looking into this approach. What should be the scalar evolution

>>>>>> for b_8 (i.e. b & (b -1) in a loop) should be? This is not clear to me

>>>>>> and can this be represented with the scev?

>>>>>

>>>>> No, it's not affine and thus cannot be represented.  You only need the

>>>>> scalar evolution of the counting IV which is already handled and

>>>>> the number of iteration analysis needs to handle the above IV - this

>>>>> is the missing part.

>>>> Thanks for the clarification. I am now matching this loop pattern in

>>>> number_of_iterations_exit when number_of_iterations_exit_assumptions

>>>> fails. If the pattern matches, I am inserting the _builtin_popcount in

>>>> the loop preheater and setting the loop niter with this. This will be

>>>> used by the final value replacement. Is this what you wanted?

>>>

>>> No, you shouldn't insert a popcount stmt but instead the niter

>>> GENERIC tree should be a CALL_EXPR to popcount with the

>>> appropriate argument.

>>

>> Thats what I tried earlier but ran into some ICEs. I wasn't sure if

>> niter in tree_niter_desc can take such.

>>

>> Attached patch now does this. Also had to add support for CALL_EXPR in

>> few places to handle niter with CALL_EXPR. Does this look OK?

>

> Overall this looks ok - the patch includes changes in places that I don't think

> need changes such as chrec_convert_1 or extract_ops_from_tree.

> The expression_expensive_p change should be more specific than making

> all calls inexpensive as well.


Changed it.

>

> The verify_ssa change looks bogus, you do

>

> +  dest = gimple_phi_result (count_phi);

> +  tree var = make_ssa_name (TREE_TYPE (dest), NULL);

> +  tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT);

> +

> +  var = build_call_expr (fn, 1, src);

> +  *niter = fold_build2 (MINUS_EXPR, TREE_TYPE (dest), var,

> +                       build_int_cst (TREE_TYPE (dest), 1));

>

> why do you allocate a new SSA name here?  It seems unused

> as you overwrive 'var' with the CALL_EXPR immediately.

Changed now.

>

> I didn't review the pattern matching thoroughly nor the exact place you

> call it.  But

>

> +      if (check_popcount_pattern (loop, &count))

> +       {

> +         niter->assumptions = boolean_false_node;

> +         niter->control.base = NULL_TREE;

> +         niter->control.step = NULL_TREE;

> +         niter->control.no_overflow = false;

> +         niter->niter = count;

> +         niter->assumptions = boolean_true_node;

> +         niter->may_be_zero = boolean_false_node;

> +         niter->max = -1;

> +         niter->bound = NULL_TREE;

> +         niter->cmp = ERROR_MARK;

> +         return true;

> +       }

>

> simply setting may_be_zero to false looks fishy.

Should I set this to (argument to popcount == zero)?

> Try with -fno-tree-loop-ch.

I changed the pattern matching to handle loop without header copying
too. Looks a bit complicated checking all the conditions. Wondering if
this can be done in a simpler and easier to read way.

>  Also max should not be negative,

> it should be the number of bits in the IV type?


Changed this too.
>

> A related testcase could be that we can completely peel

> a loop like the following which iterates at most 8 times:

>

> int a[8];

> void foo (unsigned char ctrl)

> {

>   int c = 0;

>   while (ctrl)

>     {

>        ctrl = ctrl & (ctrl - 1);

>        a[c++] = ctrl;

>     }

> }


Hmm, this is an interesting test case but as I am now trying to match
a loop which does popcount, this is not handled.


Attaching the current version for review.


Thanks,
Kugan

>

> This is now stage1 material so please update and re-post.  Maybe Bin has

> further suggestions as well.

>

> Thanks,

> Richard.

>

>> Thanks,

>> Kugan

>>

>>

>> gcc/ChangeLog:

>>

>> 2018-02-08  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>

>>     * gimple-expr.c (extract_ops_from_tree): Handle CALL_EXPR.

>>     * tree-chrec.c (chrec_convert_1): Likewise.

>>     * tree-scalar-evolution.c (expression_expensive_p): Likewise.

>>     * tree-ssa-loop-ivopts.c (contains_abnormal_ssa_name_p): Likewise.

>>     * tree-ssa-loop-niter.c (check_popcount_pattern): New.

>>     (number_of_iterations_exit): Record niter for popcount patern.

>>     * tree-ssa.c (verify_ssa): Check stmt to be non NULL.

>>

>> gcc/testsuite/ChangeLog:

>>

>> 2018-02-08  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>

>>     * gcc.dg/tree-ssa/popcount.c: New test.

>>

>>

>>>

>>>> In general, there is also a condition gating the loop like

>>>>

>>>> if (b_4 != 0)

>>>>   goto loop;

>>>> else

>>>>   end:

>>>>

>>>> This of course will not be removed by final value replacement. Since

>>>> popcount (0) is defined, this is redundant and could be removed but

>>>> not removed.

>>>

>>> Yeah, that's probably sth for another pass though.  I suppose the

>>> end: case just uses zero in which case you'll have a PHI and you

>>> can optimize

>>>

>>>   if (b != 0)

>>>     return popcount (b);

>>>   return 0;

>>>

>>> in phiopt.

>>>

>>> Richard.

>>>

>>>> Thanks,

>>>> Kugan

>>>>

>>>>>

>>>>> Richard.

>>>>>

>>>>>> Thanks,

>>>>>> Kugan

>>>>>>>

>>>>>>> is related to popcount (b_5).

>>>>>>>

>>>>>>> Richard.

>>>>>>>

>>>>>>>> Bootstrapped and regression tested on aarch64-linux-gnu with no new regressions.

>>>>>>>>

>>>>>>>> Thanks,

>>>>>>>> Kugan

>>>>>>>>

>>>>>>>> gcc/ChangeLog:

>>>>>>>>

>>>>>>>> 2018-01-25  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>>>>>>>

>>>>>>>>     PR middle-end/82479

>>>>>>>>     * tree-loop-distribution.c (handle_popcount): New.

>>>>>>>>     (pass_loop_distribution::execute): Use handle_popcount.

>>>>>>>>

>>>>>>>> gcc/testsuite/ChangeLog:

>>>>>>>>

>>>>>>>> 2018-01-25  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>>>>>>>

>>>>>>>>     PR middle-end/82479

>>>>>>>>     * gcc.dg/tree-ssa/popcount.c: New test.
From 69093cbf4c2a2c86628f78acd563a56081e234f6 Mon Sep 17 00:00:00 2001
From: Kugan Vivekanandarajah <kugan.vivekanandarajah@linaro.org>
Date: Thu, 10 May 2018 21:41:53 +1000
Subject: [PATCH] popcount

Change-Id: I49b8557601cabffb5257e1daabeb5c59979fca96
---
 gcc/ipa-fnsummary.c                       |   4 +
 gcc/testsuite/gcc.dg/tree-ssa/popcount.c  |  41 +++++++++
 gcc/testsuite/gcc.dg/tree-ssa/popcount2.c |  29 ++++++
 gcc/testsuite/gcc.dg/tree-ssa/popcount3.c |  28 ++++++
 gcc/tree-scalar-evolution.c               |   7 ++
 gcc/tree-ssa-loop-ivopts.c                |  10 ++
 gcc/tree-ssa-loop-niter.c                 | 148 +++++++++++++++++++++++++++++-
 7 files changed, 266 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/popcount.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/popcount2.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/popcount3.c

diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c
index bdf9ba1..4dc156a 100644
--- a/gcc/ipa-fnsummary.c
+++ b/gcc/ipa-fnsummary.c
@@ -1485,6 +1485,10 @@ will_be_nonconstant_expr_predicate (struct ipa_node_params *info,
 					       nonconstant_names);
       return p2.or_with (summary->conds, p1);
     }
+  else if (TREE_CODE (expr) == CALL_EXPR)
+    {
+      return false;
+    }
   else
     {
       debug_tree (expr);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/popcount.c b/gcc/testsuite/gcc.dg/tree-ssa/popcount.c
new file mode 100644
index 0000000..86a66cb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/popcount.c
@@ -0,0 +1,41 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+extern int foo (int);
+
+int PopCount (long b) {
+    int c = 0;
+    b++;
+
+    while (b) {
+	b &= b - 1;
+	c++;
+    }
+    return c;
+}
+int PopCount2 (long b) {
+    int c = 0;
+
+    while (b) {
+	b &= b - 1;
+	c++;
+    }
+    foo (c);
+    return foo (c);
+}
+
+void PopCount3 (long b1) {
+
+    for (long i = 0; i < b1; ++i)
+      {
+	long b = i;
+	int c = 0;
+	while (b) {
+	    b &= b - 1;
+	    c++;
+	}
+	foo (c);
+      }
+}
+
+/* { dg-final { scan-tree-dump-times "__builtin_popcount" 3 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/popcount2.c b/gcc/testsuite/gcc.dg/tree-ssa/popcount2.c
new file mode 100644
index 0000000..52afc2d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/popcount2.c
@@ -0,0 +1,29 @@
+/* { dg-do execute } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int
+__attribute__ ((noinline, noclone))
+foo (int i, long b)
+{
+    int c = i;
+
+    while (b) {
+	b &= b - 1;
+	c++;
+    }
+    return c;
+}
+
+int main()
+{
+
+  if (foo (1, 7) != 4)
+   __builtin_abort ();
+  if (foo (0, 0) != 0)
+   __builtin_abort ();
+  if (foo (8, 0xff) != 16)
+   __builtin_abort ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "__builtin_popcount" 1 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/popcount3.c b/gcc/testsuite/gcc.dg/tree-ssa/popcount3.c
new file mode 100644
index 0000000..0c69d97
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/popcount3.c
@@ -0,0 +1,28 @@
+/* { dg-do execute } */
+/* { dg-options "-O2 -fno-tree-ch -fdump-tree-optimized" } */
+
+int
+__attribute__ ((noinline, noclone))
+foo (long b)
+{
+    int c = i;
+
+    while (b) {
+	b &= b - 1;
+	c++;
+    }
+    return c;
+}
+
+int main()
+{
+  if (foo (7) != 3)
+   __builtin_abort ();
+  if (foo (0) != 0)
+   __builtin_abort ();
+  if (foo (0xff) != 8)
+   __builtin_abort ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "__builtin_popcount" 1 "optimized" } } */
diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
index fefc9de..967b154 100644
--- a/gcc/tree-scalar-evolution.c
+++ b/gcc/tree-scalar-evolution.c
@@ -1984,6 +1984,7 @@ interpret_expr (struct loop *loop, gimple *at_stmt, tree expr)
     return expr;
 
   if (TREE_CODE (expr) == POLYNOMIAL_CHREC
+      || TREE_CODE (expr) == CALL_EXPR
       || get_gimple_rhs_class (TREE_CODE (expr)) == GIMPLE_TERNARY_RHS)
     return chrec_dont_know;
 
@@ -3472,6 +3473,7 @@ bool
 expression_expensive_p (tree expr)
 {
   enum tree_code code;
+  tree fndecl;
 
   if (is_gimple_val (expr))
     return false;
@@ -3492,6 +3494,11 @@ expression_expensive_p (tree expr)
       if (!integer_pow2p (TREE_OPERAND (expr, 1)))
 	return true;
     }
+  if (code == CALL_EXPR
+      && (fndecl = get_callee_fndecl (expr))
+      && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL
+      && (BUILT_IN_POPCOUNT == DECL_FUNCTION_CODE (fndecl)))
+    return false;
 
   switch (TREE_CODE_CLASS (code))
     {
diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index b313571..519649a 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -985,6 +985,16 @@ contains_abnormal_ssa_name_p (tree expr)
   code = TREE_CODE (expr);
   codeclass = TREE_CODE_CLASS (code);
 
+  if (code == CALL_EXPR)
+    {
+      tree arg;
+      call_expr_arg_iterator iter;
+      FOR_EACH_CALL_EXPR_ARG (arg, iter, expr)
+	if (contains_abnormal_ssa_name_p (arg))
+	  return true;
+      return false;
+    }
+
   if (code == SSA_NAME)
     return SSA_NAME_OCCURS_IN_ABNORMAL_PHI (expr) != 0;
 
diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
index 7a54c5f..f390321 100644
--- a/gcc/tree-ssa-loop-niter.c
+++ b/gcc/tree-ssa-loop-niter.c
@@ -2430,6 +2430,134 @@ number_of_iterations_exit_assumptions (struct loop *loop, edge exit,
   return (!integer_zerop (niter->assumptions));
 }
 
+/* See if LOOP is a popcout implementation of the form
+
+    int c = 0;
+    while (b) {
+	b = b & (b - 1);
+	c++;
+    }
+
+    If so, Set NITER to  __builtin_popcount (b) - 1
+    return true if we did, false otherwise.  */
+
+static bool
+check_popcount_pattern (loop_p loop, tree *niter, HOST_WIDE_INT *max)
+{
+  tree lhs, rhs;
+  tree dest;
+  gimple *and_minus_one;
+  gimple *phi;
+  int count = 0;
+  gimple *count_stmt = NULL;
+  bool adjust = true;
+
+  if (!single_exit (loop))
+    return false;
+
+  /* Check loop terminating branch is like
+     if (b != 0).  */
+  gimple *stmt = last_stmt (loop->header);
+  if (!stmt
+      || gimple_code (stmt) != GIMPLE_COND
+      || !zerop (gimple_cond_rhs (stmt)))
+    return false;
+
+  /* Check the loop closed SSA definition for just the variable c defined in
+     loop.  */
+  basic_block bb = single_exit (loop)->dest;
+  for (gphi_iterator gpi = gsi_start_phis (bb);
+       !gsi_end_p (gpi); gsi_next (&gpi))
+    {
+      phi = gpi.phi ();
+      count++;
+    }
+
+  if (count != 1)
+    return false;
+
+  rhs = gimple_phi_arg_def (phi, single_exit (loop)->dest_idx);
+  if (TREE_CODE (rhs) != SSA_NAME)
+    return false;
+  count_stmt = SSA_NAME_DEF_STMT (rhs);
+  lhs = gimple_cond_lhs (stmt);
+  if (TREE_CODE (lhs) != SSA_NAME)
+    return false;
+  gimple *and_stmt = SSA_NAME_DEF_STMT (lhs);
+
+  /* Depending on copy-header is performed, feeding PHI stmts might be in
+     the loop header or loop exit, handle this.  */
+  if (gimple_code (count_stmt) == GIMPLE_PHI)
+    {
+      tree t;
+      if (gimple_code (and_stmt) != GIMPLE_PHI
+	  || gimple_bb (and_stmt) != single_exit (loop)->src
+	  || gimple_bb (count_stmt) != single_exit (loop)->src)
+	return false;
+      t = gimple_phi_arg_def (count_stmt, loop_latch_edge (loop)->dest_idx);
+      if (TREE_CODE (t) != SSA_NAME)
+	return false;
+      count_stmt = SSA_NAME_DEF_STMT (t);
+      t = gimple_phi_arg_def (and_stmt, loop_latch_edge (loop)->dest_idx);
+      if (TREE_CODE (t) != SSA_NAME)
+	return false;
+      and_stmt = SSA_NAME_DEF_STMT (t);
+      adjust = false;
+    }
+
+  /* Make sure we have a count by one.  */
+  if (!is_gimple_assign (count_stmt)
+      || (gimple_assign_rhs_code (count_stmt) != PLUS_EXPR)
+      || !integer_onep (gimple_assign_rhs2 (count_stmt)))
+    return false;
+
+  /* Cheeck "b = b & (b - 1)" is calculated.  */
+  if (!is_gimple_assign (and_stmt)
+      || gimple_assign_rhs_code (and_stmt) != BIT_AND_EXPR)
+    return false;
+
+  lhs = gimple_assign_rhs1 (and_stmt);
+  rhs = gimple_assign_rhs2 (and_stmt);
+  if (TREE_CODE (lhs) == SSA_NAME
+      && (and_minus_one = SSA_NAME_DEF_STMT (lhs))
+      && is_gimple_assign (and_minus_one)
+      && (gimple_assign_rhs_code (and_minus_one) == PLUS_EXPR)
+      && integer_minus_onep (gimple_assign_rhs2 (and_minus_one)))
+      lhs = rhs;
+  else if (TREE_CODE (rhs) == SSA_NAME
+      && (and_minus_one = SSA_NAME_DEF_STMT (rhs))
+      && is_gimple_assign (and_minus_one)
+      && (gimple_assign_rhs_code (and_minus_one) == PLUS_EXPR)
+      && integer_minus_onep (gimple_assign_rhs2 (and_minus_one)))
+      ;
+  else
+    return false;
+
+  if ((gimple_assign_rhs1 (and_stmt) != gimple_assign_rhs1 (and_minus_one))
+      && (gimple_assign_rhs2 (and_stmt) != gimple_assign_rhs1 (and_minus_one)))
+    return false;
+
+  /* Check the recurrence.  */
+  phi = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (and_minus_one));
+  gimple *src_phi = SSA_NAME_DEF_STMT (lhs);
+  if (gimple_code (phi) != GIMPLE_PHI
+      || gimple_code (src_phi) != GIMPLE_PHI)
+    return false;
+
+  dest = gimple_assign_lhs (count_stmt);
+  tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT);
+  tree src = gimple_phi_arg_def (src_phi, loop_preheader_edge (loop)->dest_idx);
+  if (adjust)
+    *niter = fold_build2 (MINUS_EXPR, TREE_TYPE (dest),
+			  build_call_expr (fn, 1, src),
+			  build_int_cst (TREE_TYPE (dest), 1));
+  else
+    *niter = build_call_expr (fn, 1, src);
+  *max = int_cst_value (TYPE_MAX_VALUE (TREE_TYPE (dest)));
+  return true;
+}
+
+
 /* Like number_of_iterations_exit_assumptions, but return TRUE only if
    the niter information holds unconditionally.  */
 
@@ -2441,7 +2569,25 @@ number_of_iterations_exit (struct loop *loop, edge exit,
   gcond *stmt;
   if (!number_of_iterations_exit_assumptions (loop, exit, niter,
 					      &stmt, every_iteration))
-    return false;
+    {
+      tree count;
+      HOST_WIDE_INT max;
+      if (check_popcount_pattern (loop, &count, &max))
+	{
+	  niter->assumptions = boolean_false_node;
+	  niter->control.base = NULL_TREE;
+	  niter->control.step = NULL_TREE;
+	  niter->control.no_overflow = false;
+	  niter->niter = count;
+	  niter->assumptions = boolean_true_node;
+	  niter->may_be_zero = boolean_false_node;
+	  niter->max = max;
+	  niter->bound = NULL_TREE;
+	  niter->cmp = ERROR_MARK;
+	  return true;
+	}
+      return false;
+    }
 
   if (integer_nonzerop (niter->assumptions))
     return true;
Bin.Cheng May 17, 2018, 9:56 a.m. | #13
On Thu, May 17, 2018 at 2:39 AM, Kugan Vivekanandarajah
<kugan.vivekanandarajah@linaro.org> wrote:
> Hi Richard,

>

> On 6 March 2018 at 02:24, Richard Biener <richard.guenther@gmail.com> wrote:

>> On Thu, Feb 8, 2018 at 1:41 AM, Kugan Vivekanandarajah

>> <kugan.vivekanandarajah@linaro.org> wrote:

>>> Hi Richard,

>>>

>>> On 1 February 2018 at 23:21, Richard Biener <richard.guenther@gmail.com> wrote:

>>>> On Thu, Feb 1, 2018 at 5:07 AM, Kugan Vivekanandarajah

>>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>>> Hi Richard,

>>>>>

>>>>> On 31 January 2018 at 21:39, Richard Biener <richard.guenther@gmail.com> wrote:

>>>>>> On Wed, Jan 31, 2018 at 11:28 AM, Kugan Vivekanandarajah

>>>>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>>>>> Hi Richard,

>>>>>>>

>>>>>>> Thanks for the review.

>>>>>>> On 25 January 2018 at 20:04, Richard Biener <richard.guenther@gmail.com> wrote:

>>>>>>>> On Wed, Jan 24, 2018 at 10:56 PM, Kugan Vivekanandarajah

>>>>>>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>>>>>>> Hi All,

>>>>>>>>>

>>>>>>>>> Here is a patch for popcount builtin detection similar to LLVM. I

>>>>>>>>> would like to queue this for review for next stage 1.

>>>>>>>>>

>>>>>>>>> 1. This is done part of loop-distribution and effective for -O3 and above.

>>>>>>>>> 2. This does not distribute loop to detect popcount (like

>>>>>>>>> memcpy/memmove). I dont think that happens in practice. Please correct

>>>>>>>>> me if I am wrong.

>>>>>>>>

>>>>>>>> But then it has no business inside loop distribution but instead is

>>>>>>>> doing final value

>>>>>>>> replacement, right?  You are pattern-matching the whole loop after all.  I think

>>>>>>>> final value replacement would already do the correct thing if you

>>>>>>>> teached number of

>>>>>>>> iteration analysis that niter for

>>>>>>>>

>>>>>>>>   <bb 3> [local count: 955630224]:

>>>>>>>>   # b_11 = PHI <b_5(5), b_8(6)>

>>>>>>>>   _1 = b_11 + -1;

>>>>>>>>   b_8 = _1 & b_11;

>>>>>>>>   if (b_8 != 0)

>>>>>>>>     goto <bb 6>; [89.00%]

>>>>>>>>   else

>>>>>>>>     goto <bb 8>; [11.00%]

>>>>>>>>

>>>>>>>>   <bb 6> [local count: 850510900]:

>>>>>>>>   goto <bb 3>; [100.00%]

>>>>>>>

>>>>>>> I am looking into this approach. What should be the scalar evolution

>>>>>>> for b_8 (i.e. b & (b -1) in a loop) should be? This is not clear to me

>>>>>>> and can this be represented with the scev?

>>>>>>

>>>>>> No, it's not affine and thus cannot be represented.  You only need the

>>>>>> scalar evolution of the counting IV which is already handled and

>>>>>> the number of iteration analysis needs to handle the above IV - this

>>>>>> is the missing part.

>>>>> Thanks for the clarification. I am now matching this loop pattern in

>>>>> number_of_iterations_exit when number_of_iterations_exit_assumptions

>>>>> fails. If the pattern matches, I am inserting the _builtin_popcount in

>>>>> the loop preheater and setting the loop niter with this. This will be

>>>>> used by the final value replacement. Is this what you wanted?

>>>>

>>>> No, you shouldn't insert a popcount stmt but instead the niter

>>>> GENERIC tree should be a CALL_EXPR to popcount with the

>>>> appropriate argument.

>>>

>>> Thats what I tried earlier but ran into some ICEs. I wasn't sure if

>>> niter in tree_niter_desc can take such.

>>>

>>> Attached patch now does this. Also had to add support for CALL_EXPR in

>>> few places to handle niter with CALL_EXPR. Does this look OK?

>>

>> Overall this looks ok - the patch includes changes in places that I don't think

>> need changes such as chrec_convert_1 or extract_ops_from_tree.

>> The expression_expensive_p change should be more specific than making

>> all calls inexpensive as well.

>

> Changed it.

>

>>

>> The verify_ssa change looks bogus, you do

>>

>> +  dest = gimple_phi_result (count_phi);

>> +  tree var = make_ssa_name (TREE_TYPE (dest), NULL);

>> +  tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT);

>> +

>> +  var = build_call_expr (fn, 1, src);

>> +  *niter = fold_build2 (MINUS_EXPR, TREE_TYPE (dest), var,

>> +                       build_int_cst (TREE_TYPE (dest), 1));

>>

>> why do you allocate a new SSA name here?  It seems unused

>> as you overwrive 'var' with the CALL_EXPR immediately.

> Changed now.

>

>>

>> I didn't review the pattern matching thoroughly nor the exact place you

>> call it.  But

>>

>> +      if (check_popcount_pattern (loop, &count))

>> +       {

>> +         niter->assumptions = boolean_false_node;

>> +         niter->control.base = NULL_TREE;

>> +         niter->control.step = NULL_TREE;

>> +         niter->control.no_overflow = false;

>> +         niter->niter = count;

>> +         niter->assumptions = boolean_true_node;

>> +         niter->may_be_zero = boolean_false_node;

>> +         niter->max = -1;

>> +         niter->bound = NULL_TREE;

>> +         niter->cmp = ERROR_MARK;

>> +         return true;

>> +       }

>>

>> simply setting may_be_zero to false looks fishy.

> Should I set this to (argument to popcount == zero)?

No, I think that's unnecessary.  The number of iterations is computed
as: may_be_zero ? 0 : niter;
Here niter is ZERO even when may_be_zero is set to false, and niters
is computed correctly.

I think the point is that may_be_zero is false doesn't imply that
niters is non-zero.

>

>> Try with -fno-tree-loop-ch.

> I changed the pattern matching to handle loop without header copying

> too. Looks a bit complicated checking all the conditions. Wondering if

> this can be done in a simpler and easier to read way.

>

>>  Also max should not be negative,

>> it should be the number of bits in the IV type?

>

> Changed this too.

>>

>> A related testcase could be that we can completely peel

>> a loop like the following which iterates at most 8 times:

>>

>> int a[8];

>> void foo (unsigned char ctrl)

>> {

>>   int c = 0;

>>   while (ctrl)

>>     {

>>        ctrl = ctrl & (ctrl - 1);

>>        a[c++] = ctrl;

>>     }

>> }

>

> Hmm, this is an interesting test case but as I am now trying to match

> a loop which does popcount, this is not handled.

>

>

> Attaching the current version for review.

Here are some comments.

> diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c

> index 7a54c5f..f390321 100644

> --- a/gcc/tree-ssa-loop-niter.c

> +++ b/gcc/tree-ssa-loop-niter.c

> @@ -2430,6 +2430,134 @@ number_of_iterations_exit_assumptions (struct loop *loop, edge exit,

>    return (!integer_zerop (niter->assumptions));

>  }

>

> +/* See if LOOP is a popcout implementation of the form

> +

> +    int c = 0;

> +    while (b) {

> +    b = b & (b - 1);

> +    c++;

> +    }

> +

> +    If so, Set NITER to  __builtin_popcount (b) - 1

> +    return true if we did, false otherwise.  */

> +

> +static bool

> +check_popcount_pattern (loop_p loop, tree *niter, HOST_WIDE_INT *max)

> +{

> +  tree lhs, rhs;

> +  tree dest;

> +  gimple *and_minus_one;

> +  gimple *phi;

> +  int count = 0;

> +  gimple *count_stmt = NULL;

> +  bool adjust = true;

> +

> +  if (!single_exit (loop))

> +    return false;

> +

> +  /* Check loop terminating branch is like

> +     if (b != 0).  */

> +  gimple *stmt = last_stmt (loop->header);

> +  if (!stmt

> +      || gimple_code (stmt) != GIMPLE_COND

> +      || !zerop (gimple_cond_rhs (stmt)))

The check doesn't fully match the comment.  NE is not checked here.
Also can move below "(TREE_CODE (lhs) != SSA_NAME)" check up to this
point, making simple checks earlier.

> +    return false;

> +

> +  /* Check the loop closed SSA definition for just the variable c defined in

> +     loop.  */

> +  basic_block bb = single_exit (loop)->dest;

single_exit is repeatedly called various times, call it once and use
the returning edge instead?  I am not sure GCC is smart enough
removing repeated call instances.  Same to loop_latch_edge.

> +  for (gphi_iterator gpi = gsi_start_phis (bb);

> +       !gsi_end_p (gpi); gsi_next (&gpi))

> +    {

> +      phi = gpi.phi ();

> +      count++;

> +    }

> +

> +  if (count != 1)

> +    return false;

> +

> +  rhs = gimple_phi_arg_def (phi, single_exit (loop)->dest_idx);

> +  if (TREE_CODE (rhs) != SSA_NAME)

> +    return false;

> +  count_stmt = SSA_NAME_DEF_STMT (rhs);

> +  lhs = gimple_cond_lhs (stmt);

> +  if (TREE_CODE (lhs) != SSA_NAME)

> +    return false;

> +  gimple *and_stmt = SSA_NAME_DEF_STMT (lhs);

> +

> +  /* Depending on copy-header is performed, feeding PHI stmts might be in

> +     the loop header or loop exit, handle this.  */

> +  if (gimple_code (count_stmt) == GIMPLE_PHI)

> +    {

> +      tree t;

> +      if (gimple_code (and_stmt) != GIMPLE_PHI

> +      || gimple_bb (and_stmt) != single_exit (loop)->src

> +      || gimple_bb (count_stmt) != single_exit (loop)->src)

> +    return false;

> +      t = gimple_phi_arg_def (count_stmt, loop_latch_edge (loop)->dest_idx);

> +      if (TREE_CODE (t) != SSA_NAME)

> +    return false;

> +      count_stmt = SSA_NAME_DEF_STMT (t);

> +      t = gimple_phi_arg_def (and_stmt, loop_latch_edge (loop)->dest_idx);

> +      if (TREE_CODE (t) != SSA_NAME)

> +    return false;

> +      and_stmt = SSA_NAME_DEF_STMT (t);

> +      adjust = false;

> +    }

> +

> +  /* Make sure we have a count by one.  */

> +  if (!is_gimple_assign (count_stmt)

> +      || (gimple_assign_rhs_code (count_stmt) != PLUS_EXPR)

> +      || !integer_onep (gimple_assign_rhs2 (count_stmt)))

> +    return false;

> +

> +  /* Cheeck "b = b & (b - 1)" is calculated.  */

Typo.

> +  if (!is_gimple_assign (and_stmt)

> +      || gimple_assign_rhs_code (and_stmt) != BIT_AND_EXPR)

> +    return false;

> +

> +  lhs = gimple_assign_rhs1 (and_stmt);

> +  rhs = gimple_assign_rhs2 (and_stmt);

> +  if (TREE_CODE (lhs) == SSA_NAME

> +      && (and_minus_one = SSA_NAME_DEF_STMT (lhs))

> +      && is_gimple_assign (and_minus_one)

> +      && (gimple_assign_rhs_code (and_minus_one) == PLUS_EXPR)

> +      && integer_minus_onep (gimple_assign_rhs2 (and_minus_one)))

> +      lhs = rhs;

> +  else if (TREE_CODE (rhs) == SSA_NAME

> +      && (and_minus_one = SSA_NAME_DEF_STMT (rhs))

> +      && is_gimple_assign (and_minus_one)

> +      && (gimple_assign_rhs_code (and_minus_one) == PLUS_EXPR)

> +      && integer_minus_onep (gimple_assign_rhs2 (and_minus_one)))

Could you avoid duplication by factoring the condition into an inline
function?  They are exactly the same for lhs/rhs.

> +      ;

> +  else

> +    return false;

> +

> +  if ((gimple_assign_rhs1 (and_stmt) != gimple_assign_rhs1 (and_minus_one))

> +      && (gimple_assign_rhs2 (and_stmt) != gimple_assign_rhs1 (and_minus_one)))

Here you already got lhs correctly, so don't need to check on and_stmt
rhs directly.  You can even merge this check into above one.

> +    return false;

> +

> +  /* Check the recurrence.  */

> +  phi = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (and_minus_one));

> +  gimple *src_phi = SSA_NAME_DEF_STMT (lhs);

> +  if (gimple_code (phi) != GIMPLE_PHI

> +      || gimple_code (src_phi) != GIMPLE_PHI)

I think this is redundant since you have lhs equals to
gimple_assign_rhs1 (and_minus_one).  So phi == src_phi is always true?

> +    return false;

> +

> +  dest = gimple_assign_lhs (count_stmt);

> +  tree fn = builtin_decl_implicit (BUILT_IN_POPCOUNT);

> +  tree src = gimple_phi_arg_def (src_phi, loop_preheader_edge (loop)->dest_idx);

> +  if (adjust)

> +    *niter = fold_build2 (MINUS_EXPR, TREE_TYPE (dest),

> +              build_call_expr (fn, 1, src),

> +              build_int_cst (TREE_TYPE (dest), 1));

> +  else

> +    *niter = build_call_expr (fn, 1, src);

> +  *max = int_cst_value (TYPE_MAX_VALUE (TREE_TYPE (dest)));

> +  return true;

> +}

> +

> +

>  /* Like number_of_iterations_exit_assumptions, but return TRUE only if

>     the niter information holds unconditionally.  */

>

> @@ -2441,7 +2569,25 @@ number_of_iterations_exit (struct loop *loop, edge exit,

>    gcond *stmt;

>    if (!number_of_iterations_exit_assumptions (loop, exit, niter,

>                            &stmt, every_iteration))

> -    return false;

> +    {

> +      tree count;

> +      HOST_WIDE_INT max;

> +      if (check_popcount_pattern (loop, &count, &max))

> +    {

> +      niter->assumptions = boolean_false_node;

> +      niter->control.base = NULL_TREE;

> +      niter->control.step = NULL_TREE;

> +      niter->control.no_overflow = false;

> +      niter->niter = count;

> +      niter->assumptions = boolean_true_node;

> +      niter->may_be_zero = boolean_false_node;

> +      niter->max = max;

> +      niter->bound = NULL_TREE;

> +      niter->cmp = ERROR_MARK;

> +      return true;

> +    }

Better to merge these compound statement into check_popcount_pattern
and rename it into something like number_of_iterations_popcount.

I wondered if the more inefficient version popcount should be checked, like:

int count = 0;
while (x)
{
  count += x & 1;
  x = x >> 1;
}

Thanks,
bin

> +      return false;

> +    }

>

>    if (integer_nonzerop (niter->assumptions))

>      return true;

> --

> 2.7.4

>

Patch

From 9fa09af4b7013c6207e59a4920c82f089bfe45c2 Mon Sep 17 00:00:00 2001
From: Kugan Vivekanandarajah <kugan.vivekanandarajah@linaro.org>
Date: Wed, 24 Jan 2018 08:50:08 +1100
Subject: [PATCH] pocount builtin detection

Change-Id: Ic6e175f9cc9a69bd417936a4845c2c046fd446b4

Change-Id: I680eb107445660c60a5d38f5d7300ab1a3243bf5

Change-Id: Ia9f0df89e05520091dc7797195098118768c7ac2
---
 gcc/testsuite/gcc.dg/tree-ssa/popcount.c |  41 +++++++++
 gcc/tree-loop-distribution.c             | 145 +++++++++++++++++++++++++++++++
 2 files changed, 186 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/popcount.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/popcount.c b/gcc/testsuite/gcc.dg/tree-ssa/popcount.c
new file mode 100644
index 0000000..86a66cb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/popcount.c
@@ -0,0 +1,41 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+extern int foo (int);
+
+int PopCount (long b) {
+    int c = 0;
+    b++;
+
+    while (b) {
+	b &= b - 1;
+	c++;
+    }
+    return c;
+}
+int PopCount2 (long b) {
+    int c = 0;
+
+    while (b) {
+	b &= b - 1;
+	c++;
+    }
+    foo (c);
+    return foo (c);
+}
+
+void PopCount3 (long b1) {
+
+    for (long i = 0; i < b1; ++i)
+      {
+	long b = i;
+	int c = 0;
+	while (b) {
+	    b &= b - 1;
+	    c++;
+	}
+	foo (c);
+      }
+}
+
+/* { dg-final { scan-tree-dump-times "__builtin_popcount" 3 "optimized" } } */
diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
index a3d76e4..1060700 100644
--- a/gcc/tree-loop-distribution.c
+++ b/gcc/tree-loop-distribution.c
@@ -1585,6 +1585,148 @@  classify_builtin_ldst (loop_p loop, struct graph *rdg, partition *partition,
   return;
 }
 
+/* See if loop is a popcout implementation of the form
+
+    int c = 0;
+    while (b) {
+	b = b & (b - 1);
+	c++;
+    }
+
+    If so, convert this into c = __builtin_popcount (b)
+    return true if we did, false otherwise.  */
+
+
+static bool
+handle_popcount (loop_p loop)
+{
+  tree lhs, rhs;
+  tree dest, src;
+  gimple *and_minus_one;
+  int count = 0;
+  gphi *count_phi;
+  gimple *fn_call;
+  gimple *use_stmt;
+  use_operand_p use_p;
+  imm_use_iterator iter;
+  gimple_stmt_iterator gsi;
+
+  /* Check loop terminating branch is like
+     if (b != 0).  */
+  gimple *stmt = last_stmt (loop->header);
+  if (!stmt
+      || gimple_code (stmt) != GIMPLE_COND
+      || !zerop (gimple_cond_rhs (stmt)))
+    return false;
+
+  /* Cheeck "b = b & (b - 1)" is calculated.  */
+  lhs = gimple_cond_lhs (stmt);
+  gimple *and_stmt = SSA_NAME_DEF_STMT (lhs);
+  if (gimple_assign_rhs_code (and_stmt) != BIT_AND_EXPR)
+    return false;
+  lhs = gimple_assign_rhs1 (and_stmt);
+  rhs = gimple_assign_rhs2 (and_stmt);
+  if (TREE_CODE (lhs) == SSA_NAME
+      && (and_minus_one = SSA_NAME_DEF_STMT (lhs))
+      && is_gimple_assign (and_minus_one)
+      && (gimple_assign_rhs_code (and_minus_one) == PLUS_EXPR)
+      && integer_minus_onep (gimple_assign_rhs2 (and_minus_one)))
+      lhs = rhs;
+  else if (TREE_CODE (rhs) == SSA_NAME
+      && (and_minus_one = SSA_NAME_DEF_STMT (rhs))
+      && is_gimple_assign (and_minus_one)
+      && (gimple_assign_rhs_code (and_minus_one) == PLUS_EXPR)
+      && integer_minus_onep (gimple_assign_rhs2 (and_minus_one)))
+      ;
+  else
+    return false;
+  if ((gimple_assign_rhs1 (and_stmt) != gimple_assign_rhs1 (and_minus_one))
+      && (gimple_assign_rhs2 (and_stmt) != gimple_assign_rhs1 (and_minus_one)))
+    return false;
+
+  /* Check the recurrence.  */
+  gimple *phi = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (and_minus_one));
+  gimple *src_phi = SSA_NAME_DEF_STMT (lhs);
+  if (gimple_code (phi) != GIMPLE_PHI
+      || gimple_code (src_phi) != GIMPLE_PHI)
+    return false;
+
+  /* Check the loop closed SSA definition for just the variable c defined in
+     loop.  */
+  src = gimple_phi_arg_def (src_phi, loop_preheader_edge (loop)->dest_idx);
+  basic_block bb = single_exit (loop)->dest;
+  for (gphi_iterator gpi = gsi_start_phis (bb);
+       !gsi_end_p (gpi); gsi_next (&gpi))
+    {
+      count_phi = gpi.phi ();
+      count++;
+    }
+  if (count != 1)
+    return false;
+
+  /* Make sure we have a count by one and it starts from zero.  */
+  rhs = gimple_phi_arg_def (count_phi, 0);
+  stmt = SSA_NAME_DEF_STMT (rhs);
+  if  (!is_gimple_assign (stmt)
+      || (gimple_assign_rhs_code (stmt) != PLUS_EXPR)
+      || !integer_onep (gimple_assign_rhs2 (stmt)))
+    return false;
+  rhs = gimple_assign_rhs1 (stmt);
+  stmt = SSA_NAME_DEF_STMT (rhs);
+  if (gimple_code (stmt) != GIMPLE_PHI
+      || !integer_zerop (gimple_phi_arg_def (stmt, loop_preheader_edge (loop)->dest_idx)))
+    return false;
+
+  /* Create a var for builtin_popcount result and update the uses.  */
+  dest = gimple_phi_result (count_phi);
+  tree var = make_ssa_name (TREE_TYPE (dest), NULL);
+  FOR_EACH_IMM_USE_STMT (use_stmt, iter, dest)
+    {
+      FOR_EACH_IMM_USE_ON_STMT (use_p, iter)
+	SET_USE (use_p, var);
+    }
+  dest = var;
+
+  /* Remove the loop closed PHI stmt.  */
+  gsi = gsi_after_labels (gimple_bb (count_phi));
+  gphi_iterator gsi_phi = gsi_for_phi (count_phi);
+  remove_phi_node (&gsi_phi, true);
+
+  /* Insert the builtin.  */
+  gsi = gsi_after_labels (loop_preheader_edge (loop)->src);
+  src = force_gimple_operand_gsi (&gsi, src, true, NULL_TREE,
+				  false, GSI_CONTINUE_LINKING);
+  tree fn = build_fold_addr_expr (builtin_decl_implicit (BUILT_IN_POPCOUNT));
+  fn_call = gimple_build_call (fn, 1, src);
+  gimple_call_set_lhs (fn_call, dest);
+  gsi_insert_before (&gsi, fn_call, GSI_CONTINUE_LINKING);
+
+  /* In  most cases, we will have if (b != 0) preceeding the loop in
+     the form
+     if (b != 0)
+     {
+	loop;
+     }
+
+     Since __builtin_popcount (0) is defined, we can remove this condition
+     by making it always true.  */
+
+  stmt = last_stmt (EDGE_PRED (loop_preheader_edge (loop)->src, 0)->src);
+  if (stmt
+      && gimple_code (stmt) == GIMPLE_COND
+      && (gimple_cond_lhs (stmt) == src)
+      && zerop (gimple_cond_rhs (stmt)))
+    {
+      gimple_cond_set_lhs (as_a <gcond *> (stmt),
+			   build_int_cst (TREE_TYPE (src), 1));
+      update_stmt (stmt);
+    }
+
+  /* Finaly remove the loop.  */
+  destroy_loop (loop);
+  return true;
+}
+
 /* Classifies the builtin kind we can generate for PARTITION of RDG and LOOP.
    For the moment we detect memset, memcpy and memmove patterns.  Bitmap
    STMT_IN_ALL_PARTITIONS contains statements belonging to all partitions.  */
@@ -3032,6 +3174,9 @@  pass_loop_distribution::execute (function *fun)
 	  || !optimize_loop_for_speed_p (loop))
 	continue;
 
+      if (handle_popcount (loop))
+	continue;
+
       /* Don't distribute loop if niters is unknown.  */
       tree niters = number_of_latch_executions (loop);
       if (niters == NULL_TREE || niters == chrec_dont_know)
-- 
2.7.4