diff mbox series

Check whether any statements need masking (PR 83922)

Message ID 87lggu41h9.fsf@linaro.org
State Accepted
Commit 09a7858b2c53eccf28f780f5f3e4f2764f440eb1
Headers show
Series Check whether any statements need masking (PR 83922) | expand

Commit Message

Richard Sandiford Jan. 19, 2018, 9:55 a.m. UTC
This PR is an odd case in which, due to the low optimisation level,
we enter vectorisation with:

  outer1:
    x_1 = PHI <x_3(outer2), ...>;
    ...

  inner:
    x_2 = 0;
    ...

  outer2:
    x_3 = PHI <x_2(inner)>;

These statements are tentatively treated as a double reduction by
vect_force_simple_reduction, but in the end only x_3 and x_2 are marked
as relevant.  vect_analyze_loop_operations skips over x_3, leaving the
vectorizable_reduction check to a presumed future test of x_1, which
in this case never happens.  We therefore end up vectorising x_2 only
(complete with peeling for niters!) and leave the scalar x_3 in place.

This caused a segfault in the support for fully-masked loops,
since there were no statements that needed masking.  Fixed by
checking for that.

But I think this is also a flaw in vect_analyze_loop_operations.
Outer loop vectorisation reduces the number of times that the
inner loop is executed, so it wouldn't necessarily be valid
to leave the scalar x_3 in place for all vectorisable x_2.
There's already code to forbid that when x_1 isn't present:

              /* FORNOW: we currently don't support the case that these phis
                 are not used in the outerloop (unless it is double reduction,
                 i.e., this phi is vect_reduction_def), cause this case
                 requires to actually do something here.  */

I think we need to do the same if x_1 is present but not relevant.

Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
OK to install?

Richard


2018-01-19  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	PR tree-optimization/83922
	* tree-vect-loop.c (vect_verify_full_masking): Return false if
	there are no statements that need masking.
	(vect_active_double_reduction_p): New function.
	(vect_analyze_loop_operations): Use it when handling phis that
	are not in the loop header.

gcc/testsuite/
	PR tree-optimization/83922
	* gcc.dg/pr83922.c: New test.

Comments

Richard Biener Jan. 19, 2018, 11:54 a.m. UTC | #1
On Fri, Jan 19, 2018 at 10:55 AM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> This PR is an odd case in which, due to the low optimisation level,

> we enter vectorisation with:

>

>   outer1:

>     x_1 = PHI <x_3(outer2), ...>;

>     ...

>

>   inner:

>     x_2 = 0;

>     ...

>

>   outer2:

>     x_3 = PHI <x_2(inner)>;

>

> These statements are tentatively treated as a double reduction by

> vect_force_simple_reduction, but in the end only x_3 and x_2 are marked

> as relevant.  vect_analyze_loop_operations skips over x_3, leaving the

> vectorizable_reduction check to a presumed future test of x_1, which

> in this case never happens.  We therefore end up vectorising x_2 only

> (complete with peeling for niters!) and leave the scalar x_3 in place.

>

> This caused a segfault in the support for fully-masked loops,

> since there were no statements that needed masking.  Fixed by

> checking for that.

>

> But I think this is also a flaw in vect_analyze_loop_operations.

> Outer loop vectorisation reduces the number of times that the

> inner loop is executed, so it wouldn't necessarily be valid

> to leave the scalar x_3 in place for all vectorisable x_2.

> There's already code to forbid that when x_1 isn't present:

>

>               /* FORNOW: we currently don't support the case that these phis

>                  are not used in the outerloop (unless it is double reduction,

>                  i.e., this phi is vect_reduction_def), cause this case

>                  requires to actually do something here.  */

>

> I think we need to do the same if x_1 is present but not relevant.


Hmm, yeah...

> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.

> OK to install?


Ok.

Richard.

> Richard

>

>

> 2018-01-19  Richard Sandiford  <richard.sandiford@linaro.org>

>

> gcc/

>         PR tree-optimization/83922

>         * tree-vect-loop.c (vect_verify_full_masking): Return false if

>         there are no statements that need masking.

>         (vect_active_double_reduction_p): New function.

>         (vect_analyze_loop_operations): Use it when handling phis that

>         are not in the loop header.

>

> gcc/testsuite/

>         PR tree-optimization/83922

>         * gcc.dg/pr83922.c: New test.

>

> Index: gcc/tree-vect-loop.c

> ===================================================================

> --- gcc/tree-vect-loop.c        2018-01-19 09:36:33.409191362 +0000

> +++ gcc/tree-vect-loop.c        2018-01-19 09:52:00.681330865 +0000

> @@ -1294,6 +1294,12 @@ vect_verify_full_masking (loop_vec_info

>    struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);

>    unsigned int min_ni_width;

>

> +  /* Use a normal loop if there are no statements that need masking.

> +     This only happens in rare degenerate cases: it means that the loop

> +     has no loads, no stores, and no live-out values.  */

> +  if (LOOP_VINFO_MASKS (loop_vinfo).is_empty ())

> +    return false;

> +

>    /* Get the maximum number of iterations that is representable

>       in the counter type.  */

>    tree ni_type = TREE_TYPE (LOOP_VINFO_NITERSM1 (loop_vinfo));

> @@ -1739,6 +1745,33 @@ vect_update_vf_for_slp (loop_vec_info lo

>      }

>  }

>

> +/* Return true if STMT_INFO describes a double reduction phi and if

> +   the other phi in the reduction is also relevant for vectorization.

> +   This rejects cases such as:

> +

> +      outer1:

> +       x_1 = PHI <x_3(outer2), ...>;

> +       ...

> +

> +      inner:

> +       x_2 = ...;

> +       ...

> +

> +      outer2:

> +       x_3 = PHI <x_2(inner)>;

> +

> +   if nothing in x_2 or elsewhere makes x_1 relevant.  */

> +

> +static bool

> +vect_active_double_reduction_p (stmt_vec_info stmt_info)

> +{

> +  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_double_reduction_def)

> +    return false;

> +

> +  gimple *other_phi = STMT_VINFO_REDUC_DEF (stmt_info);

> +  return STMT_VINFO_RELEVANT_P (vinfo_for_stmt (other_phi));

> +}

> +

>  /* Function vect_analyze_loop_operations.

>

>     Scan the loop stmts and make sure they are all vectorizable.  */

> @@ -1786,8 +1819,7 @@ vect_analyze_loop_operations (loop_vec_i

>                   i.e., this phi is vect_reduction_def), cause this case

>                   requires to actually do something here.  */

>                if (STMT_VINFO_LIVE_P (stmt_info)

> -                  && STMT_VINFO_DEF_TYPE (stmt_info)

> -                     != vect_double_reduction_def)

> +                 && !vect_active_double_reduction_p (stmt_info))

>                  {

>                    if (dump_enabled_p ())

>                     dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,

> Index: gcc/testsuite/gcc.dg/pr83922.c

> ===================================================================

> --- /dev/null   2018-01-19 09:30:49.543814408 +0000

> +++ gcc/testsuite/gcc.dg/pr83922.c      2018-01-19 09:52:00.680331041 +0000

> @@ -0,0 +1,21 @@

> +/* { dg-options "-O -ftree-vectorize" } */

> +

> +int j4;

> +

> +void

> +k1 (int ak)

> +{

> +  while (ak < 1)

> +    {

> +      int ur;

> +

> +      for (ur = 0; ur < 2; ++ur)

> +        {

> +          ++j4;

> +          if (j4 != 0)

> +            j4 = 0;

> +        }

> +

> +      ++ak;

> +    }

> +}
diff mbox series

Patch

Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c	2018-01-19 09:36:33.409191362 +0000
+++ gcc/tree-vect-loop.c	2018-01-19 09:52:00.681330865 +0000
@@ -1294,6 +1294,12 @@  vect_verify_full_masking (loop_vec_info
   struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   unsigned int min_ni_width;
 
+  /* Use a normal loop if there are no statements that need masking.
+     This only happens in rare degenerate cases: it means that the loop
+     has no loads, no stores, and no live-out values.  */
+  if (LOOP_VINFO_MASKS (loop_vinfo).is_empty ())
+    return false;
+
   /* Get the maximum number of iterations that is representable
      in the counter type.  */
   tree ni_type = TREE_TYPE (LOOP_VINFO_NITERSM1 (loop_vinfo));
@@ -1739,6 +1745,33 @@  vect_update_vf_for_slp (loop_vec_info lo
     }
 }
 
+/* Return true if STMT_INFO describes a double reduction phi and if
+   the other phi in the reduction is also relevant for vectorization.
+   This rejects cases such as:
+
+      outer1:
+	x_1 = PHI <x_3(outer2), ...>;
+	...
+
+      inner:
+	x_2 = ...;
+	...
+
+      outer2:
+	x_3 = PHI <x_2(inner)>;
+
+   if nothing in x_2 or elsewhere makes x_1 relevant.  */
+
+static bool
+vect_active_double_reduction_p (stmt_vec_info stmt_info)
+{
+  if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_double_reduction_def)
+    return false;
+
+  gimple *other_phi = STMT_VINFO_REDUC_DEF (stmt_info);
+  return STMT_VINFO_RELEVANT_P (vinfo_for_stmt (other_phi));
+}
+
 /* Function vect_analyze_loop_operations.
 
    Scan the loop stmts and make sure they are all vectorizable.  */
@@ -1786,8 +1819,7 @@  vect_analyze_loop_operations (loop_vec_i
                  i.e., this phi is vect_reduction_def), cause this case
                  requires to actually do something here.  */
               if (STMT_VINFO_LIVE_P (stmt_info)
-                  && STMT_VINFO_DEF_TYPE (stmt_info)
-                     != vect_double_reduction_def)
+		  && !vect_active_double_reduction_p (stmt_info))
                 {
                   if (dump_enabled_p ())
 		    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
Index: gcc/testsuite/gcc.dg/pr83922.c
===================================================================
--- /dev/null	2018-01-19 09:30:49.543814408 +0000
+++ gcc/testsuite/gcc.dg/pr83922.c	2018-01-19 09:52:00.680331041 +0000
@@ -0,0 +1,21 @@ 
+/* { dg-options "-O -ftree-vectorize" } */
+
+int j4;
+
+void
+k1 (int ak)
+{
+  while (ak < 1)
+    {
+      int ur;
+
+      for (ur = 0; ur < 2; ++ur)
+        {
+          ++j4;
+          if (j4 != 0)
+            j4 = 0;
+        }
+
+      ++ak;
+    }
+}