diff mbox

gomp-nvptx branch - middle-end changes

Message ID alpine.LNX.2.20.13.1611221958200.31471@monopod.intra.ispras.ru
State New
Headers show

Commit Message

Alexander Monakov Nov. 22, 2016, 5:25 p.m. UTC
On Fri, 11 Nov 2016, Jakub Jelinek wrote:
> Ok for trunk, once the needed corresponding config/nvptx bits are committed,

> with one nit below that needs immediate action and the rest can be resolved

> incrementally.  I'd like to check in afterwards the attached patch, at least

> for now, so that non-offloaded SIMD code is less affected.


Testing your patch revealed an issue in Fortran offloaded code; types of
boolean_type_node in f951 and boolean_false_node in lto1 (when omp_device_lower
runs) don't match.  I'm attaching a revised patch that addresses it by simply
using an integer type (there are also two other minor issues, below).

> Please change this into

> (ENABLE_OFFLOADING && (flag_openmp || in_lto))

> for now, so that we don't waste compile time even when clearly it

> isn't needed, and incrementally change the inliner to propagate

> the property.


As ENABLE_OFFLOADING is not set in the offloading compiler, this additionally
needs to accept ACCEL_COMPILER.  Applied like this:

+  virtual bool gate (function *ARG_UNUSED (fun))
+    {
+      /* FIXME: this should use PROP_gimple_lomp_dev.  */
+#ifdef ACCEL_COMPILER
+      return true;
+#else
+      return ENABLE_OFFLOADING && (flag_openmp || in_lto_p);
+#endif
+    }


In your GOMP_USE_SIMT() patch,

> @@ -4314,6 +4364,12 @@ lower_rec_simd_input_clauses (tree new_v

>    if (max_vf == 0)

>      {

>        max_vf = omp_max_vf ();

> +      if (find_omp_clause (gimple_omp_for_clauses (ctx->stmt),

> +			   OMP_CLAUSE__SIMT_))

> +	{

> +	  int max_simt = omp_max_simt_vf ();

> +	  max_vf = MAX (max_vf, max_simt);

> +	}


I don't believe here there's a need to take a maximum.  Cloning the loop upfront
means that SIMD+SIMT styles are not going to mix within a single loop.  I've
simplified it to an if-then-else in the revised patch.

> @@ -10601,7 +10656,11 @@ expand_omp_simd (struct omp_region *regi

>    bool offloaded = cgraph_node::get (current_function_decl)->offloadable;

>    for (struct omp_region *rgn = region; !offloaded && rgn; rgn = rgn->outer)

>      offloaded = rgn->type == GIMPLE_OMP_TARGET;

> -  bool is_simt = offloaded && omp_max_simt_vf () > 1 && safelen_int > 1;

> +  bool is_simt

> +    = (offloaded

> +       && find_omp_clause (gimple_omp_for_clauses (fd->for_stmt),

> +			   OMP_CLAUSE__SIMT_)

> +       && safelen_int > 1);


Here computation of 'offloaded' is no longer needed, because presence of
OMP_CLAUSE__SIMT_ would imply that.  Removed in the revised patch.

I've noticed that your patch doesn't adjust 'maybe_simt' in "ordered" lowering.
Not sure if that's intentional -- as I understand it's possible to look at the
enclosing context's clauses because 'omp ordered' must be closely nested with
the corresponding loop.  I've added a FIXME in the patch.

Alexander

Comments

Jakub Jelinek Nov. 22, 2016, 5:37 p.m. UTC | #1
On Tue, Nov 22, 2016 at 08:25:45PM +0300, Alexander Monakov wrote:
> On Fri, 11 Nov 2016, Jakub Jelinek wrote:

> > Ok for trunk, once the needed corresponding config/nvptx bits are committed,

> > with one nit below that needs immediate action and the rest can be resolved

> > incrementally.  I'd like to check in afterwards the attached patch, at least

> > for now, so that non-offloaded SIMD code is less affected.

> 

> Testing your patch revealed an issue in Fortran offloaded code; types of

> boolean_type_node in f951 and boolean_false_node in lto1 (when omp_device_lower

> runs) don't match.  I'm attaching a revised patch that addresses it by simply

> using an integer type (there are also two other minor issues, below).


Ok.

> > Please change this into

> > (ENABLE_OFFLOADING && (flag_openmp || in_lto))

> > for now, so that we don't waste compile time even when clearly it

> > isn't needed, and incrementally change the inliner to propagate

> > the property.

> 

> As ENABLE_OFFLOADING is not set in the offloading compiler, this additionally

> needs to accept ACCEL_COMPILER.  Applied like this:

> 

> +  virtual bool gate (function *ARG_UNUSED (fun))

> +    {

> +      /* FIXME: this should use PROP_gimple_lomp_dev.  */

> +#ifdef ACCEL_COMPILER

> +      return true;

> +#else

> +      return ENABLE_OFFLOADING && (flag_openmp || in_lto_p);

> +#endif

> +    }


Makes sense.

> > @@ -4314,6 +4364,12 @@ lower_rec_simd_input_clauses (tree new_v

> >    if (max_vf == 0)

> >      {

> >        max_vf = omp_max_vf ();

> > +      if (find_omp_clause (gimple_omp_for_clauses (ctx->stmt),

> > +			   OMP_CLAUSE__SIMT_))

> > +	{

> > +	  int max_simt = omp_max_simt_vf ();

> > +	  max_vf = MAX (max_vf, max_simt);

> > +	}

> 

> I don't believe here there's a need to take a maximum.  Cloning the loop upfront

> means that SIMD+SIMT styles are not going to mix within a single loop.  I've

> simplified it to an if-then-else in the revised patch.


Ok.

> > @@ -10601,7 +10656,11 @@ expand_omp_simd (struct omp_region *regi

> >    bool offloaded = cgraph_node::get (current_function_decl)->offloadable;

> >    for (struct omp_region *rgn = region; !offloaded && rgn; rgn = rgn->outer)

> >      offloaded = rgn->type == GIMPLE_OMP_TARGET;

> > -  bool is_simt = offloaded && omp_max_simt_vf () > 1 && safelen_int > 1;

> > +  bool is_simt

> > +    = (offloaded

> > +       && find_omp_clause (gimple_omp_for_clauses (fd->for_stmt),

> > +			   OMP_CLAUSE__SIMT_)

> > +       && safelen_int > 1);

> 

> Here computation of 'offloaded' is no longer needed, because presence of

> OMP_CLAUSE__SIMT_ would imply that.  Removed in the revised patch.

> 

> I've noticed that your patch doesn't adjust 'maybe_simt' in "ordered" lowering.

> Not sure if that's intentional -- as I understand it's possible to look at the

> enclosing context's clauses because 'omp ordered' must be closely nested with


Right now omp ordered simd for non-simt basically causes vf 1, because the
vectorizer isn't ready for having non-vectorized portions of code within
vectorized loop.

> the corresponding loop.  I've added a FIXME in the patch.


Ok for trunk, thanks.

	Jakub
diff mbox

Patch

	* internal-fn.c (expand_GOMP_USE_SIMT): New function.
	* tree.c (omp_clause_num_ops): OMP_CLAUSE__SIMT_ has 0 operands.
	(omp_clause_code_name): Add _simt_ name.
	(walk_tree_1): Handle OMP_CLAUSE__SIMT_.
	* tree-core.h (enum omp_clause_code): Add OMP_CLAUSE__SIMT_.
	* omp-low.c (scan_sharing_clauses): Handle OMP_CLAUSE__SIMT_.
	(scan_omp_simd): New function.
	(scan_omp_1_stmt): Use it in target regions if needed.
	(omp_max_vf): Don't max with omp_max_simt_vf.
	(lower_rec_simd_input_clauses): Use omp_max_simt_vf if
	OMP_CLAUSE__SIMT_ is present.
	(lower_rec_input_clauses): Compute maybe_simt from presence of
	OMP_CLAUSE__SIMT_.
	(lower_lastprivate_clauses): Likewise.
	(expand_omp_simd): Likewise.
	(execute_omp_device_lower): Lower IFN_GOMP_USE_SIMT.
	* internal-fn.def (GOMP_USE_SIMT): New internal function.
	* tree-pretty-print.c (dump_omp_clause): Handle OMP_CLAUSE__SIMT_.

diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index 6cd8522..b1dbc98 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -158,6 +158,14 @@  expand_ANNOTATE (internal_fn, gcall *)
   gcc_unreachable ();
 }
 
+/* This should get expanded in omp_device_lower pass.  */
+
+static void
+expand_GOMP_USE_SIMT (internal_fn, gcall *)
+{
+  gcc_unreachable ();
+}
+
 /* Lane index on SIMT targets: thread index in the warp on NVPTX.  On targets
    without SIMT execution this should be expanded in omp_device_lower pass.  */
 
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index f055230..9a03e17 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -141,6 +141,7 @@  DEF_INTERNAL_INT_FN (FFS, ECF_CONST, ffs, unary)
 DEF_INTERNAL_INT_FN (PARITY, ECF_CONST, parity, unary)
 DEF_INTERNAL_INT_FN (POPCOUNT, ECF_CONST, popcount, unary)
 
+DEF_INTERNAL_FN (GOMP_USE_SIMT, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (GOMP_SIMT_LANE, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (GOMP_SIMT_VF, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (GOMP_SIMT_LAST_LANE, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 6c52bff..eab0af5 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -278,6 +278,7 @@  static bool omp_any_child_fn_dumped;
 static void scan_omp (gimple_seq *, omp_context *);
 static tree scan_omp_1_op (tree *, int *, void *);
 static gphi *find_phi_with_arg_on_edge (tree, edge);
+static int omp_max_simt_vf (void);
 
 #define WALK_SUBSTMTS  \
     case GIMPLE_BIND: \
@@ -2192,6 +2193,7 @@  scan_sharing_clauses (tree clauses, omp_context *ctx,
 	case OMP_CLAUSE_INDEPENDENT:
 	case OMP_CLAUSE_AUTO:
 	case OMP_CLAUSE_SEQ:
+	case OMP_CLAUSE__SIMT_:
 	  break;
 
 	case OMP_CLAUSE_ALIGNED:
@@ -2363,6 +2365,7 @@  scan_sharing_clauses (tree clauses, omp_context *ctx,
 	case OMP_CLAUSE_AUTO:
 	case OMP_CLAUSE_SEQ:
 	case OMP_CLAUSE__GRIDDIM_:
+	case OMP_CLAUSE__SIMT_:
 	  break;
 
 	case OMP_CLAUSE_TILE:
@@ -3066,6 +3069,48 @@  scan_omp_for (gomp_for *stmt, omp_context *outer_ctx)
   scan_omp (gimple_omp_body_ptr (stmt), ctx);
 }
 
+/* Duplicate #pragma omp simd, one for SIMT, another one for SIMD.  */
+
+static void
+scan_omp_simd (gimple_stmt_iterator *gsi, gomp_for *stmt,
+	       omp_context *outer_ctx)
+{
+  gbind *bind = gimple_build_bind (NULL, NULL, NULL);
+  gsi_replace (gsi, bind, false);
+  gimple_seq seq = NULL;
+  gimple *g = gimple_build_call_internal (IFN_GOMP_USE_SIMT, 0);
+  tree cond = create_tmp_var_raw (integer_type_node);
+  DECL_CONTEXT (cond) = current_function_decl;
+  DECL_SEEN_IN_BIND_EXPR_P (cond) = 1;
+  gimple_bind_set_vars (bind, cond);
+  gimple_call_set_lhs (g, cond);
+  gimple_seq_add_stmt (&seq, g);
+  tree lab1 = create_artificial_label (UNKNOWN_LOCATION);
+  tree lab2 = create_artificial_label (UNKNOWN_LOCATION);
+  tree lab3 = create_artificial_label (UNKNOWN_LOCATION);
+  g = gimple_build_cond (NE_EXPR, cond, integer_zero_node, lab1, lab2);
+  gimple_seq_add_stmt (&seq, g);
+  g = gimple_build_label (lab1);
+  gimple_seq_add_stmt (&seq, g);
+  gimple_seq new_seq = copy_gimple_seq_and_replace_locals (stmt);
+  gomp_for *new_stmt = as_a <gomp_for *> (new_seq);
+  tree clause = build_omp_clause (gimple_location (stmt), OMP_CLAUSE__SIMT_);
+  OMP_CLAUSE_CHAIN (clause) = gimple_omp_for_clauses (new_stmt);
+  gimple_omp_for_set_clauses (new_stmt, clause);
+  gimple_seq_add_stmt (&seq, new_stmt);
+  g = gimple_build_goto (lab3);
+  gimple_seq_add_stmt (&seq, g);
+  g = gimple_build_label (lab2);
+  gimple_seq_add_stmt (&seq, g);
+  gimple_seq_add_stmt (&seq, stmt);
+  g = gimple_build_label (lab3);
+  gimple_seq_add_stmt (&seq, g);
+  gimple_bind_set_body (bind, seq);
+  update_stmt (bind);
+  scan_omp_for (new_stmt, outer_ctx);
+  scan_omp_for (stmt, outer_ctx);
+}
+
 /* Scan an OpenMP sections directive.  */
 
 static void
@@ -3969,7 +4014,13 @@  scan_omp_1_stmt (gimple_stmt_iterator *gsi, bool *handled_ops_p,
       break;
 
     case GIMPLE_OMP_FOR:
-      scan_omp_for (as_a <gomp_for *> (stmt), ctx);
+      if (((gimple_omp_for_kind (as_a <gomp_for *> (stmt))
+	    & GF_OMP_FOR_KIND_MASK) == GF_OMP_FOR_KIND_SIMD)
+	  && omp_maybe_offloaded_ctx (ctx)
+	  && omp_max_simt_vf ())
+	scan_omp_simd (gsi, as_a <gomp_for *> (stmt), ctx);
+      else
+	scan_omp_for (as_a <gomp_for *> (stmt), ctx);
       break;
 
     case GIMPLE_OMP_SECTIONS:
@@ -4316,8 +4367,7 @@  omp_max_vf (void)
       if (GET_MODE_CLASS (vqimode) == MODE_VECTOR_INT)
 	vf = GET_MODE_NUNITS (vqimode);
     }
-  int svf = omp_max_simt_vf ();
-  return MAX (vf, svf);
+  return vf;
 }
 
 /* Helper function of lower_rec_input_clauses, used for #pragma omp simd
@@ -4329,7 +4379,11 @@  lower_rec_simd_input_clauses (tree new_var, omp_context *ctx, int &max_vf,
 {
   if (max_vf == 0)
     {
-      max_vf = omp_max_vf ();
+      if (find_omp_clause (gimple_omp_for_clauses (ctx->stmt),
+			   OMP_CLAUSE__SIMT_))
+	max_vf = omp_max_simt_vf ();
+      else
+	max_vf = omp_max_vf ();
       if (max_vf > 1)
 	{
 	  tree c = find_omp_clause (gimple_omp_for_clauses (ctx->stmt),
@@ -4405,8 +4459,7 @@  lower_rec_input_clauses (tree clauses, gimple_seq *ilist, gimple_seq *dlist,
   int pass;
   bool is_simd = (gimple_code (ctx->stmt) == GIMPLE_OMP_FOR
 		  && gimple_omp_for_kind (ctx->stmt) & GF_OMP_FOR_SIMD);
-  bool maybe_simt
-    = is_simd && omp_maybe_offloaded_ctx (ctx) && omp_max_simt_vf () > 1;
+  bool maybe_simt = is_simd && find_omp_clause (clauses, OMP_CLAUSE__SIMT_);
   int max_vf = 0;
   tree lane = NULL_TREE, idx = NULL_TREE;
   tree simt_lane = NULL_TREE;
@@ -5497,7 +5550,7 @@  lower_lastprivate_clauses (tree clauses, tree predicate, gimple_seq *stmt_list,
   if (gimple_code (ctx->stmt) == GIMPLE_OMP_FOR
       && gimple_omp_for_kind (ctx->stmt) & GF_OMP_FOR_SIMD)
     {
-      maybe_simt = omp_maybe_offloaded_ctx (ctx) && omp_max_simt_vf () > 1;
+      maybe_simt = find_omp_clause (orig_clauses, OMP_CLAUSE__SIMT_);
       simduid = find_omp_clause (orig_clauses, OMP_CLAUSE__SIMDUID_);
       if (simduid)
 	simduid = OMP_CLAUSE__SIMDUID__DECL (simduid);
@@ -10749,10 +10802,9 @@  expand_omp_simd (struct omp_region *region, struct omp_for_data *fd)
     }
   tree step = fd->loop.step;
 
-  bool offloaded = cgraph_node::get (current_function_decl)->offloadable;
-  for (struct omp_region *rgn = region; !offloaded && rgn; rgn = rgn->outer)
-    offloaded = rgn->type == GIMPLE_OMP_TARGET;
-  bool is_simt = offloaded && omp_max_simt_vf () > 1 && safelen_int > 1;
+  bool is_simt = (safelen_int > 1
+		  && find_omp_clause (gimple_omp_for_clauses (fd->for_stmt),
+				      OMP_CLAUSE__SIMT_));
   tree simt_lane = NULL_TREE, simt_maxlane = NULL_TREE;
   if (is_simt)
     {
@@ -15006,6 +15058,8 @@  lower_omp_ordered (gimple_stmt_iterator *gsi_p, omp_context *ctx)
   gbind *bind;
   bool simd = find_omp_clause (gimple_omp_ordered_clauses (ord_stmt),
 			       OMP_CLAUSE_SIMD);
+  /* FIXME: this should check presence of OMP_CLAUSE__SIMT_ on the enclosing
+     loop.  */
   bool maybe_simt
     = simd && omp_maybe_offloaded_ctx (ctx) && omp_max_simt_vf () > 1;
   bool threads = find_omp_clause (gimple_omp_ordered_clauses (ord_stmt),
@@ -20167,6 +20221,9 @@  execute_omp_device_lower ()
 	tree type = lhs ? TREE_TYPE (lhs) : integer_type_node;
 	switch (gimple_call_internal_fn (stmt))
 	  {
+	  case IFN_GOMP_USE_SIMT:
+	    rhs = vf == 1 ? integer_zero_node : integer_one_node;
+	    break;
 	  case IFN_GOMP_SIMT_LANE:
 	  case IFN_GOMP_SIMT_LAST_LANE:
 	    rhs = vf == 1 ? build_zero_cst (type) : NULL_TREE;
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index a3d220d..eec2d4f3 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -438,6 +438,10 @@  enum omp_clause_code {
   /* Internally used only clause, holding SIMD uid.  */
   OMP_CLAUSE__SIMDUID_,
 
+  /* Internally used only clause, flag whether this is SIMT simd
+     loop or not.  */
+  OMP_CLAUSE__SIMT_,
+
   /* Internally used only clause, holding _Cilk_for # of iterations
      on OMP_PARALLEL.  */
   OMP_CLAUSE__CILK_FOR_COUNT_,
diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c
index 096eefd..95db710 100644
--- a/gcc/tree-pretty-print.c
+++ b/gcc/tree-pretty-print.c
@@ -893,6 +893,10 @@  dump_omp_clause (pretty_printer *pp, tree clause, int spc, int flags)
       pp_right_paren (pp);
       break;
 
+    case OMP_CLAUSE__SIMT_:
+      pp_string (pp, "_simt_");
+      break;
+
     case OMP_CLAUSE_GANG:
       pp_string (pp, "gang");
       if (OMP_CLAUSE_GANG_EXPR (clause) != NULL_TREE)
diff --git a/gcc/tree.c b/gcc/tree.c
index a4c5b1b..9b0b806 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -320,6 +320,7 @@  unsigned const char omp_clause_num_ops[] =
   1, /* OMP_CLAUSE_HINT  */
   0, /* OMP_CLAUSE_DEFALTMAP  */
   1, /* OMP_CLAUSE__SIMDUID_  */
+  0, /* OMP_CLAUSE__SIMT_  */
   1, /* OMP_CLAUSE__CILK_FOR_COUNT_  */
   0, /* OMP_CLAUSE_INDEPENDENT  */
   1, /* OMP_CLAUSE_WORKER  */
@@ -391,6 +392,7 @@  const char * const omp_clause_code_name[] =
   "hint",
   "defaultmap",
   "_simduid_",
+  "_simt_",
   "_Cilk_for_count_",
   "independent",
   "worker",
@@ -11893,6 +11895,7 @@  walk_tree_1 (tree *tp, walk_tree_fn func, void *data,
 	case OMP_CLAUSE_AUTO:
 	case OMP_CLAUSE_SEQ:
 	case OMP_CLAUSE_TILE:
+	case OMP_CLAUSE__SIMT_:
 	  WALK_SUBTREE_TAIL (OMP_CLAUSE_CHAIN (*tp));
 
 	case OMP_CLAUSE_LASTPRIVATE: