diff mbox

[RFC,tree-ifcombine] Making tree-ifcombine more friendly for conditional comparisons generation

Message ID 5630B718.3060204@arm.com
State New
Headers show

Commit Message

Kyrylo Tkachov Oct. 28, 2015, 11:52 a.m. UTC
Hi all,

This is a follow up to the thread at https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01646.html
I'm trying to increase the utilisation of conditional compare instructions on aarch64 and I've identified
tree ifcombine as a place that improves the opportunities for that.

The transformations in the ifcombine_ifandif function combine basic blocks with an AND or an IOR when possible,
which is the form that the expander code in ccmp.c looks for when generating conditional compares.
To recap from the referenced thread, currently ifcombine will only merge such blocks if the inner block contains
only a single statement i.e. the condition. I'd like to make it more aggressive.

This patch allows merging of inner blocks with multiple statements, as long as those statements are comparisons
or AND/IOR operations. We don't allow any number of such statements though. An additional gating is introduced
via the PARAM_IFCOMBINE_THRESHOLD param which is used to gate the estimate_num_insns_seq cost of the inner basic
block, as Richard suggested. Using this param, targets that don't utilise the ccmp.c machinery can make sure that
the codegen effect from this patch is minimal.  The default value for the param is such that it has minimal
codegen difference on targets other than aarch64 (for which I set it to a higher value)

I tried allowing any kind of statements in the inner block, not just
the types mentioned above and using just the param to gate this, but this didn't show good results on aarch64.
Pulling in the extra code into the unconditional path was not beneficial on the whole.

With this approach we can avoid explicitly checking for target conditional compare support in tree-ssa-ifcombine.c.
Also, with this patch I don't see the gcc.dg/pr66299-2.c regression that I complained about in the
earlier thread.

On aarch64 with this patch I see about 4.5% more ccmp instructions being generated on SPEC2006.
SPECINT improves by about 0.4% on a Cortex-A57 platform and SPECFP shows no regressions.

On x86_64 there was no codegen difference for SPEC2006.

What do people think of this approach?

A concern is that the new param is essentially a new magic number replacing an implicit magic number of '1'.
It feels that how aggressively we do this for ccmp generation ultimately should depend on BRANCH_COST somehow,
but BRANCH_COST is already overloaded and meaningless as discussed in other threads

Thanks,
Kyrill

2015-10-28  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * params.def (PARAM_IFCOMBINE_THRESHOLD): New param.
     * tree-ssa-ifcombine.c: Include tree-inline.h and params.h.
     (gimple_seq_contains_logical_ops_p): New function.
     (ifcombine_ifandif): Use it.
     * config/aarch64/aarch64.c: Include params.h
     (aarch64_override_options): Set PARAM_IFCOMBINE_THRESHOLD.

2015-10-28  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     * gcc.target/aarch64/ifcombine_cond_cmp_1.c: New test.
diff mbox

Patch

commit 0a6d01f7a3c0a59e5119b3ec85528b89c1129efb
Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Date:   Wed Sep 23 15:39:32 2015 +0100

    Add tree ifcombine cost gating

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 205f12d..9d59427 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -72,6 +72,7 @@ 
 #include "aarch64-cost-tables.h"
 #include "dumpfile.h"
 #include "builtins.h"
+#include "params.h"
 #include "rtl-iter.h"
 #include "tm-constrs.h"
 #include "sched-int.h"
@@ -7933,6 +7934,11 @@  aarch64_override_options (void)
 
   aarch64_override_options_internal (&global_options);
 
+  maybe_set_param_value (PARAM_IFCOMBINE_THRESHOLD,
+			  5,
+			  global_options.x_param_values,
+			  global_options_set.x_param_values);
+
   /* Save these options as the default ones in case we push and pop them later
      while processing functions with potential target attributes.  */
   target_option_default_node = target_option_current_node
diff --git a/gcc/params.def b/gcc/params.def
index da2c6a3..7969e08 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -1041,6 +1041,12 @@  DEFPARAM (PARAM_CASE_VALUES_THRESHOLD,
 	  "if 0, use the default for the machine",
           0, 0, 0)
 
+DEFPARAM (PARAM_IFCOMBINE_THRESHOLD,
+	  "ifcombine-threshold",
+	  "The approximate cost of statements in an inner block above which "
+	  "it is not deemed beneficial to if-combine it with an outer block",
+	  2, 0, 0)
+
 /* Data race flags for C++0x memory model compliance.  */
 DEFPARAM (PARAM_ALLOW_STORE_DATA_RACES,
 	  "allow-store-data-races",
diff --git a/gcc/testsuite/gcc.target/aarch64/ifcombine_cond_cmp_1.c b/gcc/testsuite/gcc.target/aarch64/ifcombine_cond_cmp_1.c
new file mode 100644
index 0000000..36b78a6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ifcombine_cond_cmp_1.c
@@ -0,0 +1,16 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int
+foo1 (int a, int b, int c, int d)
+{
+  return a > b && b <= c && c > d;
+}
+
+int
+foo2 (int a, int b, int c, int d)
+{
+  return a > b && (b <= c && c > d);
+}
+
+/* { dg-final { scan-assembler-times "ccmp" 4 } } */
diff --git a/gcc/tree-ssa-ifcombine.c b/gcc/tree-ssa-ifcombine.c
index 66be430..4b6c98e 100644
--- a/gcc/tree-ssa-ifcombine.c
+++ b/gcc/tree-ssa-ifcombine.c
@@ -35,10 +35,12 @@  along with GCC; see the file COPYING3.  If not see
 #include "stor-layout.h"
 #include "cfganal.h"
 #include "tree-pretty-print.h"
+#include "tree-inline.h"
 #include "internal-fn.h"
 #include "gimple-fold.h"
 #include "gimple-iterator.h"
 #include "gimplify-me.h"
+#include "params.h"
 #include "tree-cfg.h"
 #include "tree-pass.h"
 
@@ -325,6 +327,41 @@  recognize_bits_test (gcond *cond, tree *name, tree *bits, bool inv)
   return true;
 }
 
+/* Return true if the sequence GS contains only comparisons
+   and logical combinations of variables.  */
+
+static bool
+gimple_seq_contains_logical_ops_p (gimple_seq gs)
+{
+  gimple_stmt_iterator gsi;
+  bool ret = true;
+
+  for (gsi = gsi_start (gs); !gsi_one_before_end_p (gsi); gsi_next (&gsi))
+    {
+      gimple *stmt = gsi_stmt (gsi);
+      enum gimple_code code = gimple_code (stmt);
+
+      if (code == GIMPLE_DEBUG)
+	continue;
+
+      if (code != GIMPLE_ASSIGN)
+	return false;
+
+      enum tree_code tcode = gimple_assign_rhs_code (stmt);
+      if (TREE_CODE_CLASS (tcode) != tcc_comparison
+	  && tcode != TRUTH_AND_EXPR
+	  && tcode != TRUTH_ANDIF_EXPR
+	  && tcode != TRUTH_OR_EXPR
+	  && tcode != TRUTH_ORIF_EXPR
+	  && tcode != BIT_AND_EXPR
+	  && tcode != BIT_IOR_EXPR
+	  && tcode != BIT_NOT_EXPR)
+	return false;
+    }
+
+  return ret;
+}
+
 /* If-convert on a and pattern with a common else block.  The inner
    if is specified by its INNER_COND_BB, the outer by OUTER_COND_BB.
    inner_inv, outer_inv and result_inv indicate whether the conditions
@@ -511,9 +548,31 @@  ifcombine_ifandif (basic_block inner_cond_bb, bool inner_inv,
 	  gimple_stmt_iterator gsi;
 	  if (!LOGICAL_OP_NON_SHORT_CIRCUIT)
 	    return false;
-	  /* Only do this optimization if the inner bb contains only the conditional. */
-	  if (!gsi_one_before_end_p (gsi_start_nondebug_after_labels_bb (inner_cond_bb)))
-	    return false;
+
+	  /* We'll be pulling in the inner_cond_bb code to execute
+	     unconditionally.  If the inner bb is just the condition then
+	     always combine.  If there are more statements, only allow
+	     statements that facilitate generation of conditional compare
+	     operations and even then only up to a limit of
+	     PARAM_IFCOMBINE_THRESHOLD.  */
+	  if (!gsi_one_before_end_p (gsi_start_nondebug_after_labels_bb
+				       (inner_cond_bb)))
+	    {
+	      bool bb_logic_ops_and_comparisons_p
+		= gimple_seq_contains_logical_ops_p (bb_seq (inner_cond_bb));
+
+	      if (!bb_logic_ops_and_comparisons_p)
+		return false;
+
+	      int inner_cost
+		= estimate_num_insns_seq (bb_seq (inner_cond_bb),
+					       &eni_time_weights);
+	      int threshold = PARAM_VALUE (PARAM_IFCOMBINE_THRESHOLD);
+
+	      if (inner_cost > threshold)
+		return false;
+	    }
+
 	  t1 = fold_build2_loc (gimple_location (inner_cond),
 				inner_cond_code,
 				boolean_type_node,