
[RFC:,1/6,v2] New target hook: max_noce_ifcvt_seq_cost

Message ID 1466524231-17412-1-git-send-email-james.greenhalgh@arm.com
State New

Commit Message

James Greenhalgh June 21, 2016, 3:50 p.m. UTC
On Fri, Jun 03, 2016 at 12:39:42PM +0200, Richard Biener wrote:
> On Thu, Jun 2, 2016 at 6:53 PM, James Greenhalgh
> <james.greenhalgh@arm.com> wrote:
> >
> > Hi,
> >
> > This patch introduces a new target hook, to be used like BRANCH_COST but
> > with a guaranteed unit of measurement. We want this to break away from
> > the current ambiguous uses of BRANCH_COST.
> >
> > BRANCH_COST is used in ifcvt.c in two types of comparisons. One against
> > instruction counts - where it is used as the limit on the number of new
> > instructions we are permitted to generate. The other (after multiplying
> > by COSTS_N_INSNS (1)) directly against RTX costs.
> >
> > Of these, a comparison against RTX costs is the more easily understood
> > metric across the compiler, and the one I've pulled out to the new hook.
> > To keep things consistent for targets which don't migrate, this new hook
> > has a default value of BRANCH_COST * COSTS_N_INSNS (1).
> >
> > OK?
>
> How does the caller compute "predictable"?  There are some archs where
> an information on whether this is a forward or backward jump is more
> useful I guess.  Also at least for !speed_p the distance of the branch is
> important given not all targets support arbitrary branch offsets.

Just through a call to predictable_edge_p. It isn't perfect. My worry
with adding more details of the branch is that you end up with a nonsense
target implementation that tries way too hard to be clever. But I don't
mind passing the edge through to the target hook; that way a target has
it if it wants it. In this patch revision, I pass the edge through.

> I remember that at the last Cauldron we discussed to change things to
> compare costs of sequences of instructions rather than giving targets no
> context with just asking for single (sub-)insn rtx costs.

I've made better use of seq_cost in this respin. Bernd was right:
constructing dummy RTX just for costs, discarding it, and then
constructing the actual RTX for matching doesn't make sense as a pipeline.
It is better just to construct the real sequence and use the cost of that.

In this patch revision, I started by removing the idea that this costs
a branch at all. It doesn't; the use of this hook is really a target
trying to limit if-conversion so that it does not pull too much onto the
unconditional path. It seems better to expose that limit directly by
explicitly asking for the maximum cost of an unconditional sequence we
would create, and comparing against seq_cost of the new RTL. This saves
a target from trying to figure out what is meant by the cost of a branch.

Having done that, I think I can see a clearer path to getting the
default hook implementation in shape. I've introduced two new params,
which give maximum costs for the generated sequence (one for a "predictable"
branch, one for an "unpredictable" one) in the speed_p cases. I don't expect
it to be useful to give the user control when we are compiling for size:
whether if-conversion is a size win or not is independent of whether the
branch is predictable.

For the default implementation, I just multiply BRANCH_COST through by
COSTS_N_INSNS (1) for size and, if the relevant parameter is not set, by
COSTS_N_INSNS (3) for speed. I know this is not ideal, but I'm still short
of ideas on how best to form the default implementation. This means we're
still potentially going to introduce performance regressions for targets
that don't provide an implementation of the new hook, or a default value
for the new parameters. It does mean we can keep the testsuite clean by
setting parameter values suitably high for all targets that have
conditional move instructions.

The new default causes some changes in generated conditional move sequences
for x86_64. Whether these changes are for the better or not I can't say.

This first patch introduces the two new parameters, and uses them in the
default implementation of the target hook.

Bootstrapped on x86_64 and aarch64 with no issues.

OK?

Thanks,
James

---
2016-06-21  James Greenhalgh  <james.greenhalgh@arm.com>

	* target.def (max_noce_ifcvt_seq_cost): New.
	* doc/tm.texi.in (TARGET_MAX_NOCE_IFCVT_SEQ_COST): Document it.
	* doc/tm.texi: Regenerate.
	* targhooks.h (default_max_noce_ifcvt_seq_cost): New.
	* targhooks.c (default_max_noce_ifcvt_seq_cost): New.
	* params.def (PARAM_MAX_RTL_IF_CONVERSION_PREDICTABLE_COST): New.
	(PARAM_MAX_RTL_IF_CONVERSION_UNPREDICTABLE_COST): Likewise.
	* doc/invoke.texi: Document new params.

Patch

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index e000218..b71968f 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -8816,6 +8816,17 @@  considered for if-conversion.  The default is 10, though the compiler will
 also use other heuristics to decide whether if-conversion is likely to be
 profitable.
 
+@item max-rtl-if-conversion-predictable-cost
+@itemx max-rtl-if-conversion-unpredictable-cost
+RTL if-conversion tries to remove conditional branches around a block and
+replace them with conditionally executed instructions.  When optimizing for
+speed, these parameters give the maximum permissible cost for the sequence
+that would be generated by if-conversion, depending on whether the branch
+is statically determined to be predictable or not.  The units for these
+parameters are the same as those for the GCC internal @code{seq_cost}
+metric.  If a parameter is not set, the compiler tries to provide a
+reasonable default using the @code{BRANCH_COST} target macro.
+
 @item max-crossjump-edges
 The maximum number of incoming edges to consider for cross-jumping.
 The algorithm used by @option{-fcrossjumping} is @math{O(N^2)} in
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index b318615..bbf6c1b 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6526,6 +6526,31 @@  should probably only be given to addresses with different numbers of
 registers on machines with lots of registers.
 @end deftypefn
 
+@deftypefn {Target Hook} {unsigned int} TARGET_MAX_NOCE_IFCVT_SEQ_COST (bool @var{speed_p}, edge @var{e})
+This hook should return a value in the same units as
+@code{TARGET_RTX_COSTS}, giving the maximum acceptable cost for
+a sequence generated by the RTL if-conversion pass when conditional
+execution is not available.  The RTL if-conversion pass attempts
+to convert conditional operations that would require a branch to a
+series of unconditional operations and @code{mov@var{mode}cc} insns.
+This hook gives the maximum cost of the unconditional instructions and
+the @code{mov@var{mode}cc} insns.  RTL if-conversion is cancelled if the
+cost of the converted sequence is greater than the value returned by this
+hook.
+
+@code{speed_p} is true if we are compiling for speed.  @code{e} is the
+edge of the conditional branch being considered for if-conversion; the
+default implementation passes it to @code{predictable_edge_p}.  A target
+may decide to implement this hook to return a lower maximum cost for
+branches that the compiler believes will be predictable.
+
+The default implementation of this hook uses
+@code{BRANCH_COST * COSTS_N_INSNS (1)} if we are compiling for size.
+When compiling for speed, it uses the appropriate
+@code{max-rtl-if-conversion-[un]predictable-cost} parameter if it is
+set, and @code{BRANCH_COST * COSTS_N_INSNS (3)} otherwise.
+@end deftypefn
+
 @deftypefn {Target Hook} bool TARGET_NO_SPECULATION_IN_DELAY_SLOTS_P (void)
 This predicate controls the use of the eager delay slot filler to disallow
 speculatively executed instructions being placed in delay slots.  Targets
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 1e8423c..d2b7f41 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4762,6 +4762,8 @@  Define this macro if a non-short-circuit operation produced by
 
 @hook TARGET_ADDRESS_COST
 
+@hook TARGET_MAX_NOCE_IFCVT_SEQ_COST
+
 @hook TARGET_NO_SPECULATION_IN_DELAY_SLOTS_P
 
 @node Scheduling
diff --git a/gcc/params.def b/gcc/params.def
index 894b7f3..682adbd 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -1217,6 +1217,20 @@  DEFPARAM (PARAM_MAX_RTL_IF_CONVERSION_INSNS,
 	  "if-conversion.",
 	  10, 0, 99)
 
+DEFPARAM (PARAM_MAX_RTL_IF_CONVERSION_PREDICTABLE_COST,
+	  "max-rtl-if-conversion-predictable-cost",
+	  "Maximum permissible cost for the sequence that would be "
+	  "generated by the RTL if-conversion pass for a branch which "
+	  "is considered predictable.",
+	  20, 0, 200)
+
+DEFPARAM (PARAM_MAX_RTL_IF_CONVERSION_UNPREDICTABLE_COST,
+	  "max-rtl-if-conversion-unpredictable-cost",
+	  "Maximum permissible cost for the sequence that would be "
+	  "generated by the RTL if-conversion pass for a branch which "
+	  "is considered unpredictable.",
+	  40, 0, 200)
+
 DEFPARAM (PARAM_HSA_GEN_DEBUG_STORES,
 	  "hsa-gen-debug-stores",
 	  "Level of hsa debug stores verbosity",
diff --git a/gcc/target.def b/gcc/target.def
index a4df363..22e4898 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -3572,6 +3572,35 @@  registers on machines with lots of registers.",
  int, (rtx address, machine_mode mode, addr_space_t as, bool speed),
  default_address_cost)
 
+/* Return the maximum cost, in RTX cost units, of a sequence generated by
+   RTL if-conversion.  Like BRANCH_COST, but with well-defined units.  */
+DEFHOOK
+(max_noce_ifcvt_seq_cost,
+ "This hook should return a value in the same units as\n\
+@code{TARGET_RTX_COSTS}, giving the maximum acceptable cost for\n\
+a sequence generated by the RTL if-conversion pass when conditional\n\
+execution is not available.  The RTL if-conversion pass attempts\n\
+to convert conditional operations that would require a branch to a\n\
+series of unconditional operations and @code{mov@var{mode}cc} insns.\n\
+This hook gives the maximum cost of the unconditional instructions and\n\
+the @code{mov@var{mode}cc} insns.  RTL if-conversion is cancelled if the\n\
+cost of the converted sequence is greater than the value returned by this\n\
+hook.\n\
+\n\
+@code{speed_p} is true if we are compiling for speed.  @code{e} is the\n\
+edge of the conditional branch being considered for if-conversion; the\n\
+default implementation passes it to @code{predictable_edge_p}.  A target\n\
+may decide to implement this hook to return a lower maximum cost for\n\
+branches that the compiler believes will be predictable.\n\
+\n\
+The default implementation of this hook uses\n\
+@code{BRANCH_COST * COSTS_N_INSNS (1)} if we are compiling for size.\n\
+When compiling for speed, it uses the appropriate\n\
+@code{max-rtl-if-conversion-[un]predictable-cost} parameter if it is\n\
+set, and @code{BRANCH_COST * COSTS_N_INSNS (3)} otherwise.",
+unsigned int, (bool speed_p, edge e),
+default_max_noce_ifcvt_seq_cost)
+
 /* Permit speculative instructions in delay slots during delayed-branch 
    scheduling.  */
 DEFHOOK
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 3e089e7..42dea3b 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -74,6 +74,8 @@  along with GCC; see the file COPYING3.  If not see
 #include "intl.h"
 #include "opts.h"
 #include "gimplify.h"
+#include "predict.h"
+#include "params.h"
 
 
 bool
@@ -1977,4 +1979,29 @@  default_optab_supported_p (int, machine_mode, machine_mode, optimization_type)
   return true;
 }
 
+/* Default implementation of TARGET_MAX_NOCE_IFCVT_SEQ_COST.  */
+
+unsigned int
+default_max_noce_ifcvt_seq_cost (bool speed_p, edge e)
+{
+  bool predictable_p = predictable_edge_p (e);
+  /* For size, some targets set a BRANCH_COST of zero to disable ifcvt;
+     continue to allow that.  Then multiply through by COSTS_N_INSNS (1)
+     so the result is in the same units as seq_cost.  */
+
+  if (!speed_p)
+    return BRANCH_COST (speed_p, predictable_p) * COSTS_N_INSNS (1);
+
+  enum compiler_param param = predictable_p
+			      ? PARAM_MAX_RTL_IF_CONVERSION_PREDICTABLE_COST
+			      : PARAM_MAX_RTL_IF_CONVERSION_UNPREDICTABLE_COST;
+
+  /* If the parameter has been set, use it; otherwise take a guess based
+     on BRANCH_COST.  */
+  if (global_options_set.x_param_values[param])
+    return PARAM_VALUE (param);
+  else
+    return BRANCH_COST (speed_p, predictable_p) * COSTS_N_INSNS (3);
+}
+
 #include "gt-targhooks.h"
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index d6581cf..e1bae6b 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -255,4 +255,6 @@  extern void default_setup_incoming_vararg_bounds (cumulative_args_t ca ATTRIBUTE
 extern bool default_optab_supported_p (int, machine_mode, machine_mode,
 				       optimization_type);
 
+extern unsigned int default_max_noce_ifcvt_seq_cost (bool, edge);
+
 #endif /* GCC_TARGHOOKS_H */