From patchwork Tue Sep 8 14:53:49 2015
X-Patchwork-Submitter: James Greenhalgh
X-Patchwork-Id: 53271
From: James Greenhalgh <james.greenhalgh@arm.com>
To: gcc-patches@gcc.gnu.org
Cc: law@redhat.com, ebotcazou@libertysurf.fr, steven@gcc.gnu.org
Subject: [Patch] Teach RTL ifcvt to handle multiple simple set instructions
Date: Tue, 8 Sep 2015 15:53:49 +0100
Message-Id: <1441724029-3124-1-git-send-email-james.greenhalgh@arm.com>

Hi,

RTL "noce" ifcvt will currently give up if the branches it is trying to
make conditional are too complicated. One of the conditions for "too
complicated" is that the branch sets more than one value.

One common idiom that this misses is something like:

  int d = a[i];
  int e = b[i];
  if (d > e)
    std::swap (d, e);
  [...]

which currently generates something like:

  compare (d, e)
  branch.le L1
  tmp = d;
  d = e;
  e = tmp;
  L1:

In the case that this is an unpredictable branch, we can do better with:

  compare (d, e)
  d1 = if_then_else (le, e, d)
  e1 = if_then_else (le, d, e)
  d = d1
  e = e1

Register allocation will eliminate the two trailing unconditional
assignments, and we get a neater sequence.
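For reference, the idiom above can be written as a small self-contained C function (an illustrative sketch only, not a testcase from the patch; the `order_pair` name is made up):

```c
/* Illustrative sketch of the pattern the patch targets: a branch whose
   "then" block performs nothing but simple sets (here, a swap).  With a
   high enough branch cost, the intent is that ifcvt replaces the branch
   with a pair of conditional moves (e.g. CSEL on AArch64).  */
static void
order_pair (int *lo, int *hi)
{
  int d = *lo;
  int e = *hi;
  if (d > e)
    {
      /* The "then" block: three simple sets, as in the swap idiom.  */
      int tmp = d;
      d = e;
      e = tmp;
    }
  *lo = d;
  *hi = e;
}
```

After the transformation there is no control-flow join in `order_pair`; the two stores always execute, fed by conditionally-selected values.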
This patch introduces this logic to the RTL if-conversion passes, catching
cases where a basic block does nothing other than multiple SETs. This
helps both with the std::swap idiom above, and with pathological cases
where tree passes create new basic blocks to resolve Phi nodes, which
contain only set instructions and end up unpredictable.

One big question I have with this patch is how I ought to write a
meaningful cost model. The one I've used seems like yet another misuse of
RTX costs, and another bit of stuff for targets to carefully balance: if
the relative cost of branches and conditional move instructions is not
carefully managed, you may unintentionally enable or disable these
optimisations. This is probably acceptable, but I dislike adding more and
more gotchas to target costs, as I get bitten by them hard enough as is!

Elsewhere the ifcvt cost usage is pretty lacking - essentially counting
the number of instructions which will be if-converted and comparing that
against the magic number "2". I could follow this lead and just count the
number of moves I would convert, then compare that to the branch cost,
but this feels... wrong. It also makes it pretty tough to choose a "good"
number for TARGET_BRANCH_COST, which isn't helped now that higher branch
costs can mean pulling expensive instructions in to the main execution
stream.

I've picked a fairly straightforward cost model for this patch: compare
the cost of each conditional move, as calculated with rtx_costs, against
COSTS_N_INSNS (branch_cost). This essentially kills the optimisation for
any target with conditional-move cost > 1. Personally, I consider that a
pretty horrible bug in this patch - but I couldn't think of anything
better to try.

As you might expect, this triggers all over the place when
TARGET_BRANCH_COST numbers are tuned high. In an AArch64 Spec2006 build
with TARGET_BRANCH_COST set to 4, I saw 3.9% more CSEL operations with
this patch.
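The accept/reject rule described above can be sketched outside of GCC as follows (a hedged sketch, not GCC's API: `worth_converting` and the per-move cost array are invented for illustration, and `SKETCH_COSTS_N_INSNS` stands in for GCC's COSTS_N_INSNS macro, which scales an instruction count by 4 cost units):

```c
#include <stdbool.h>

/* Mirrors the scaling GCC's COSTS_N_INSNS applies; name is made up.  */
#define SKETCH_COSTS_N_INSNS(n) ((n) * 4)

/* Sketch of the cost gate: sum the rtx-style cost of each conditional
   move the conversion would emit, and accept only if the total does not
   exceed the branch cost expressed in the same units.  */
static bool
worth_converting (const unsigned *cmove_costs, unsigned n_cmoves,
                  unsigned branch_cost)
{
  unsigned total = 0;
  for (unsigned i = 0; i < n_cmoves; i++)
    total += cmove_costs[i];

  /* If any single conditional move costs more than one instruction
     unit, a low branch_cost rejects the conversion outright - the
     "horrible bug" noted above.  */
  return total <= SKETCH_COSTS_N_INSNS (branch_cost);
}
```

With two single-insn conditional moves (cost 4 each) against branch_cost 4 the conversion is accepted; three double-cost moves against branch_cost 1 are rejected.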
Performance is also good on AArch64 on a range of microbenchmarks and
larger workloads (after playing with the branch costs). I didn't see any
performance regression on x86_64, as you would expect given that the cost
models preclude x86_64 targets from ever hitting this optimisation.

Bootstrapped and tested on x86_64 and AArch64 with no issues, and
bootstrapped and tested with the cost model turned off, to have some
confidence that we will continue to do the right thing if any targets do
up their branch costs and start using this code.

No testcase provided, as currently I don't know of targets with a high
enough branch cost to actually trigger the optimisation.

OK?

Thanks,
James

---
gcc/

2015-09-07  James Greenhalgh  <james.greenhalgh@arm.com>

	* ifcvt.c (bb_ok_for_noce_convert_multiple_sets): New.
	(noce_convert_multiple_sets): Likewise.
	(noce_process_if_block): Call them.

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 157a716..059bd89 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -2982,6 +2982,223 @@ bb_valid_for_noce_process_p (basic_block test_bb, rtx cond,
   return false;
 }
 
+/* We have something like:
+
+     if (x > y)
+       { i = a; j = b; k = c; }
+
+   Make it:
+
+     tmp_i = (x > y) ? a : i;
+     tmp_j = (x > y) ? b : j;
+     tmp_k = (x > y) ? c : k;
+     i = tmp_i;  <- Should be cleaned up
+     j = tmp_j;  <- Likewise.
+     k = tmp_k;  <- Likewise.
+
+   Look for special cases such as use of temporary registers (for
+   example in a swap idiom).
+
+   IF_INFO contains the useful information about the block structure and
+   jump instructions.  */
+
+static int
+noce_convert_multiple_sets (struct noce_if_info *if_info)
+{
+  basic_block test_bb = if_info->test_bb;
+  basic_block then_bb = if_info->then_bb;
+  basic_block join_bb = if_info->join_bb;
+  rtx_insn *jump = if_info->jump;
+  rtx_insn *cond_earliest;
+  unsigned int cost = 0;
+  rtx_insn *insn;
+
+  start_sequence ();
+
+  /* Decompose the condition attached to the jump.  */
+  rtx cond = noce_get_condition (jump, &cond_earliest, false);
+  rtx x = XEXP (cond, 0);
+  rtx y = XEXP (cond, 1);
+  rtx_code cond_code = GET_CODE (cond);
+
+  /* The true targets for a conditional move.  */
+  vec<rtx> targets = vNULL;
+  /* The temporaries introduced to allow us to not consider register
+     overlap.  */
+  vec<rtx> temporaries = vNULL;
+  /* The insns we've emitted.  */
+  vec<rtx_insn *> unmodified_insns = vNULL;
+  unsigned count = 0;
+
+  FOR_BB_INSNS (then_bb, insn)
+    {
+      /* Skip over non-insns.  */
+      if (!active_insn_p (insn))
+	continue;
+
+      rtx set = single_set (insn);
+      gcc_checking_assert (set);
+
+      rtx target = SET_DEST (set);
+      rtx temp = gen_reg_rtx (GET_MODE (target));
+      rtx new_val = SET_SRC (set);
+      rtx old_val = target;
+
+      /* If we were supposed to read from an earlier write in this block,
+	 we've changed the register allocation.  Rewire the read.  While
+	 we are looking, also try to catch a swap idiom.  */
+      rtx candidate_rewire = new_val;
+      for (unsigned i = 0; i < count; i++)
+	{
+	  if (reg_overlap_mentioned_p (new_val, targets[i]))
+	    {
+	      /* Catch a "swap" style idiom.  */
+	      if (find_reg_note (insn, REG_DEAD, new_val) != NULL_RTX)
+		{
+		  /* The write to targets[i] is only live until the read
+		     here.  As the condition codes match, we can propagate
+		     the set to here.  */
+		  candidate_rewire
+		    = SET_SRC (single_set (unmodified_insns[i]));
+
+		  /* Discount the cost calculation by one conditional
+		     set instruction.  As we are just putting out
+		     a group of SET instructions, any will do.  */
+		  cost -= insn_rtx_cost (PATTERN (get_last_insn ()),
+					 optimize_bb_for_speed_p (test_bb));
+		}
+	      else
+		candidate_rewire = temporaries[i];
+	    }
+	}
+      new_val = candidate_rewire;
+
+      /* If we had a non-canonical conditional jump (i.e. one where
+	 the fallthrough is to the "else" case) we need to reverse
+	 the conditional select.  */
+      if (if_info->then_else_reversed)
+	std::swap (old_val, new_val);
+
+      /* Actually emit the conditional move.  */
+      rtx temp_dest = noce_emit_cmove (if_info, temp, cond_code,
+				       x, y, new_val, old_val);
+
+      /* If we failed to expand the conditional move, drop out and don't
+	 try to continue.  */
+      if (temp_dest == NULL_RTX)
+	{
+	  end_sequence ();
+	  return FALSE;
+	}
+
+      /* Track the cost of building these conditional instructions.  */
+      cost += insn_rtx_cost (PATTERN (get_last_insn ()),
+			     optimize_bb_for_speed_p (test_bb));
+
+      /* Bookkeeping.  */
+      count++;
+      targets.safe_push (target);
+      temporaries.safe_push (temp_dest);
+      unmodified_insns.safe_push (insn);
+    }
+
+  /* We must have seen some sort of insn to insert, otherwise we were
+     given an empty BB to convert, and we can't handle that.  */
+  if (unmodified_insns.is_empty ())
+    {
+      end_sequence ();
+      return FALSE;
+    }
+
+  /* Check if this is actually beneficial.  */
+  if (cost > COSTS_N_INSNS (if_info->branch_cost))
+    {
+      end_sequence ();
+      return FALSE;
+    }
+
+  /* Now fixup the assignments.  */
+  for (unsigned i = 0; i < count; i++)
+    noce_emit_move_insn (targets[i], temporaries[i]);
+
+  /* Actually emit the sequence.  */
+  rtx_insn *seq = get_insns ();
+
+  for (insn = seq; insn; insn = NEXT_INSN (insn))
+    set_used_flags (insn);
+
+  unshare_all_rtl_in_chain (seq);
+  end_sequence ();
+
+  if (!seq)
+    return FALSE;
+
+  emit_insn_before_setloc (seq, if_info->jump,
+			   INSN_LOCATION (unmodified_insns.last ()));
+
+  /* Clean up THEN_BB and the edges in and out of it.  */
+  remove_edge (find_edge (test_bb, join_bb));
+  remove_edge (find_edge (then_bb, join_bb));
+  redirect_edge_and_branch_force (single_succ_edge (test_bb), join_bb);
+  delete_basic_block (then_bb);
+  num_true_changes++;
+
+  /* Maybe merge blocks now the jump is simple enough.  */
+  if (can_merge_blocks_p (test_bb, join_bb))
+    {
+      merge_blocks (test_bb, join_bb);
+      num_true_changes++;
+    }
+
+  num_updated_if_blocks++;
+  return TRUE;
+}
+
+/* Return true iff basic block TEST_BB is comprised of only
+   (SET (REG) (REG)) insns suitable for conversion to a series
+   of conditional moves.  */
+
+static bool
+bb_ok_for_noce_convert_multiple_sets (basic_block test_bb)
+{
+  rtx_insn *insn;
+
+  /* We must have at least one real insn to convert, or there will
+     be trouble!  */
+  bool bb_is_not_empty = false;
+  FOR_BB_INSNS (test_bb, insn)
+    {
+      /* Skip over notes etc.  */
+      if (!active_insn_p (insn))
+	continue;
+
+      /* We only handle SET insns.  */
+      rtx set = single_set (insn);
+      if (set == NULL_RTX)
+	return false;
+
+      rtx dest = SET_DEST (set);
+      rtx src = SET_SRC (set);
+
+      /* We can possibly relax this, but for now only handle REG to REG
+	 moves.  This avoids any issues that might come from introducing
+	 loads/stores that might violate data-race-freedom guarantees.  */
+      if (!(REG_P (src) && REG_P (dest)))
+	return false;
+
+      /* Destination must be appropriate for a conditional write.  */
+      if (!noce_operand_ok (dest))
+	return false;
+
+      /* We must be able to conditionally move in this mode.  */
+      if (!can_conditionally_move_p (GET_MODE (dest)))
+	return false;
+
+      bb_is_not_empty = true;
+    }
+  return bb_is_not_empty;
+}
+
 /* Given a simple IF-THEN-JOIN or IF-THEN-ELSE-JOIN block, attempt
    to convert it without using conditional execution.  Return TRUE if we
    were successful at converting the block.  */
@@ -3004,12 +3221,22 @@ noce_process_if_block (struct noce_if_info *if_info)
      (1) if (...) x = a; else x = b;
      (2) x = b; if (...) x = a;
      (3) if (...) x = a; // as if with an initial x = x.
-
+     (4) if (...) { x = a; y = b; z = c; }  // Like 3, for multiple SETS.
 
    The later patterns require jumps to be more expensive.
    For the if (...) x = a; else x = b; case we allow multiple insns
    inside the then and else blocks as long as their only effect is
    to calculate a value for x.
-   ??? For future expansion, look for multiple X in such patterns.  */
+   ??? For future expansion, further expand the "multiple X" rules.  */
+
+  /* First look for multiple SETS.  */
+  if (!else_bb
+      && HAVE_conditional_move
+      && !HAVE_cc0
+      && bb_ok_for_noce_convert_multiple_sets (then_bb))
+    {
+      if (noce_convert_multiple_sets (if_info))
+	return TRUE;
+    }
 
   if (! bb_valid_for_noce_process_p (then_bb, cond, &if_info->then_cost,
				      &if_info->then_simple))