From patchwork Tue Sep 8 14:53:49 2015
X-Patchwork-Submitter: James Greenhalgh
X-Patchwork-Id: 53271
From: James Greenhalgh <james.greenhalgh@arm.com>
To: gcc-patches@gcc.gnu.org
Cc: law@redhat.com, ebotcazou@libertysurf.fr, steven@gcc.gnu.org
Subject: [Patch] Teach RTL ifcvt to handle multiple simple set instructions
Date: Tue, 8 Sep 2015 15:53:49 +0100
Message-Id: <1441724029-3124-1-git-send-email-james.greenhalgh@arm.com>

Hi,

RTL "noce" ifcvt will currently give up if the branches it is trying to
make conditional are too complicated. One of the conditions for "too
complicated" is that the branch sets more than one value.

One common idiom that this misses is something like:

  int d = a[i];
  int e = b[i];
  if (d > e)
    std::swap (d, e);
  [...]

which currently generates something like:

  compare (d, e)
  branch.le L1
  tmp = d;
  d = e;
  e = tmp;
  L1:

In the case that this is an unpredictable branch, we can do better with:

  compare (d, e)
  d1 = if_then_else (le, e, d)
  e1 = if_then_else (le, d, e)
  d = d1
  e = e1

Register allocation will eliminate the two trailing unconditional
assignments, and we get a neater sequence.
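For reference, the idiom above can be written as a small self-contained C function (an illustrative sketch only, not a testcase from the patch; the `order_pair` name is made up):

```c
/* Illustrative sketch of the pattern the patch targets: a branch whose
   "then" block performs nothing but simple sets (here, a swap).  With a
   high enough branch cost, the intent is that ifcvt replaces the branch
   with a pair of conditional moves (e.g. CSEL on AArch64).  */
static void
order_pair (int *lo, int *hi)
{
  int d = *lo;
  int e = *hi;
  if (d > e)
    {
      /* The "then" block: three simple sets, as in the swap idiom.  */
      int tmp = d;
      d = e;
      e = tmp;
    }
  *lo = d;
  *hi = e;
}
```

After the transformation there is no control-flow join in `order_pair`; the two stores always execute, fed by conditionally-selected values.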
This patch introduces this logic to the RTL if-conversion passes, catching
cases where a basic block does nothing other than multiple SETs. This
helps both with the std::swap idiom above, and with pathological cases
where tree passes create new basic blocks to resolve Phi nodes, which
contain only set instructions and end up unpredictable.

One big question I have with this patch is how I ought to write a
meaningful cost model. The one I've used seems like yet another misuse of
RTX costs, and another bit of stuff for targets to carefully balance: if
the relative cost of branches and conditional move instructions is not
carefully managed, you may unintentionally enable or disable these
optimisations. This is probably acceptable, but I dislike adding more and
more gotchas to target costs, as I get bitten by them hard enough as is!

Elsewhere the ifcvt cost usage is pretty lacking - essentially counting
the number of instructions which will be if-converted and comparing that
against the magic number "2". I could follow this lead and just count the
number of moves I would convert, then compare that to the branch cost,
but this feels... wrong. It also makes it pretty tough to choose a "good"
number for TARGET_BRANCH_COST, which isn't helped now that higher branch
costs can mean pulling expensive instructions in to the main execution
stream.

I've picked a fairly straightforward cost model for this patch: compare
the cost of each conditional move, as calculated with rtx_costs, against
COSTS_N_INSNS (branch_cost). This essentially kills the optimisation for
any target with conditional-move cost > 1. Personally, I consider that a
pretty horrible bug in this patch - but I couldn't think of anything
better to try.

As you might expect, this triggers all over the place when
TARGET_BRANCH_COST numbers are tuned high. In an AArch64 Spec2006 build
with TARGET_BRANCH_COST set to 4, I saw 3.9% more CSEL operations with
this patch.
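The accept/reject rule described above can be sketched outside of GCC as follows (a hedged sketch, not GCC's API: `worth_converting` and the per-move cost array are invented for illustration, and `SKETCH_COSTS_N_INSNS` stands in for GCC's COSTS_N_INSNS macro, which scales an instruction count by 4 cost units):

```c
#include <stdbool.h>

/* Mirrors the scaling GCC's COSTS_N_INSNS applies; name is made up.  */
#define SKETCH_COSTS_N_INSNS(n) ((n) * 4)

/* Sketch of the cost gate: sum the rtx-style cost of each conditional
   move the conversion would emit, and accept only if the total does not
   exceed the branch cost expressed in the same units.  */
static bool
worth_converting (const unsigned *cmove_costs, unsigned n_cmoves,
                  unsigned branch_cost)
{
  unsigned total = 0;
  for (unsigned i = 0; i < n_cmoves; i++)
    total += cmove_costs[i];

  /* If any single conditional move costs more than one instruction
     unit, a low branch_cost rejects the conversion outright - the
     "horrible bug" noted above.  */
  return total <= SKETCH_COSTS_N_INSNS (branch_cost);
}
```

With two single-insn conditional moves (cost 4 each) against branch_cost 4 the conversion is accepted; three double-cost moves against branch_cost 1 are rejected.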
Performance is also good on AArch64 on a range of microbenchmarks and
larger workloads (after playing with the branch costs). I didn't see any
performance regression on x86_64, as you would expect given that the cost
models preclude x86_64 targets from ever hitting this optimisation.

Bootstrapped and tested on x86_64 and AArch64 with no issues, and
bootstrapped and tested with the cost model turned off, to have some
confidence that we will continue to do the right thing if any targets do
up their branch costs and start using this code.

No testcase provided, as currently I don't know of targets with a high
enough branch cost to actually trigger the optimisation.

OK?

Thanks,
James

---
gcc/

2015-09-07  James Greenhalgh  <james.greenhalgh@arm.com>

	* ifcvt.c (bb_ok_for_noce_convert_multiple_sets): New.
	(noce_convert_multiple_sets): Likewise.
	(noce_process_if_block): Call them.

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 157a716..059bd89 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -2982,6 +2982,223 @@ bb_valid_for_noce_process_p (basic_block test_bb, rtx cond,
   return false;
 }
 
+/* We have something like:
+
+     if (x > y)
+       { i = a; j = b; k = c; }
+
+   Make it:
+
+     tmp_i = (x > y) ? a : i;
+     tmp_j = (x > y) ? b : j;
+     tmp_k = (x > y) ? c : k;
+     i = tmp_i;  <- Should be cleaned up
+     j = tmp_j;  <- Likewise.
+     k = tmp_k;  <- Likewise.
+
+   Look for special cases such as use of temporary registers (for
+   example in a swap idiom).
+
+   IF_INFO contains the useful information about the block structure and
+   jump instructions.  */
+
+static int
+noce_convert_multiple_sets (struct noce_if_info *if_info)
+{
+  basic_block test_bb = if_info->test_bb;
+  basic_block then_bb = if_info->then_bb;
+  basic_block join_bb = if_info->join_bb;
+  rtx_insn *jump = if_info->jump;
+  rtx_insn *cond_earliest;
+  unsigned int cost = 0;
+  rtx_insn *insn;
+
+  start_sequence ();
+
+  /* Decompose the condition attached to the jump.  */
+  rtx cond = noce_get_condition (jump, &cond_earliest, false);
+  rtx x = XEXP (cond, 0);
+  rtx y = XEXP (cond, 1);
+  rtx_code cond_code = GET_CODE (cond);
+
+  /* The true targets for a conditional move.  */
+  vec<rtx> targets = vNULL;
+  /* The temporaries introduced to allow us to not consider register
+     overlap.  */
+  vec<rtx> temporaries = vNULL;
+  /* The insns we've emitted.  */
+  vec<rtx_insn *> unmodified_insns = vNULL;
+  unsigned count = 0;
+
+  FOR_BB_INSNS (then_bb, insn)
+    {
+      /* Skip over non-insns.  */
+      if (!active_insn_p (insn))
+	continue;
+
+      rtx set = single_set (insn);
+      gcc_checking_assert (set);
+
+      rtx target = SET_DEST (set);
+      rtx temp = gen_reg_rtx (GET_MODE (target));
+      rtx new_val = SET_SRC (set);
+      rtx old_val = target;
+
+      /* If we were supposed to read from an earlier write in this block,
+	 we've changed the register allocation.  Rewire the read.  While
+	 we are looking, also try to catch a swap idiom.  */
+      rtx candidate_rewire = new_val;
+      for (unsigned i = 0; i < count; i++)
+	{
+	  if (reg_overlap_mentioned_p (new_val, targets[i]))
+	    {
+	      /* Catch a "swap" style idiom.  */
+	      if (find_reg_note (insn, REG_DEAD, new_val) != NULL_RTX)
+		{
+		  /* The write to targets[i] is only live until the read
+		     here.  As the condition codes match, we can propagate
+		     the set to here.  */
+		  candidate_rewire
+		    = SET_SRC (single_set (unmodified_insns[i]));
+
+		  /* Discount the cost calculation by one conditional
+		     set instruction.  As we are just putting out
+		     a group of SET instructions, any will do.  */
+		  cost -= insn_rtx_cost (PATTERN (get_last_insn ()),
+					 optimize_bb_for_speed_p (test_bb));
+		}
+	      else
+		candidate_rewire = temporaries[i];
+	    }
+	}
+      new_val = candidate_rewire;
+
+      /* If we had a non-canonical conditional jump (i.e. one where
+	 the fallthrough is to the "else" case) we need to reverse
+	 the conditional select.  */
+      if (if_info->then_else_reversed)
+	std::swap (old_val, new_val);
+
+      /* Actually emit the conditional move.  */
+      rtx temp_dest = noce_emit_cmove (if_info, temp, cond_code,
+				       x, y, new_val, old_val);
+
+      /* If we failed to expand the conditional move, drop out and don't
+	 try to continue.  */
+      if (temp_dest == NULL_RTX)
+	{
+	  end_sequence ();
+	  return FALSE;
+	}
+
+      /* Track the cost of building these conditional instructions.  */
+      cost += insn_rtx_cost (PATTERN (get_last_insn ()),
+			     optimize_bb_for_speed_p (test_bb));
+
+      /* Bookkeeping.  */
+      count++;
+      targets.safe_push (target);
+      temporaries.safe_push (temp_dest);
+      unmodified_insns.safe_push (insn);
+    }
+
+  /* We must have seen some sort of insn to insert, otherwise we were
+     given an empty BB to convert, and we can't handle that.  */
+  if (unmodified_insns.is_empty ())
+    {
+      end_sequence ();
+      return FALSE;
+    }
+
+  /* Check if this is actually beneficial.  */
+  if (cost > COSTS_N_INSNS (if_info->branch_cost))
+    {
+      end_sequence ();
+      return FALSE;
+    }
+
+  /* Now fixup the assignments.  */
+  for (unsigned i = 0; i < count; i++)
+    noce_emit_move_insn (targets[i], temporaries[i]);
+
+  /* Actually emit the sequence.  */
+  rtx_insn *seq = get_insns ();
+
+  for (insn = seq; insn; insn = NEXT_INSN (insn))
+    set_used_flags (insn);
+
+  unshare_all_rtl_in_chain (seq);
+  end_sequence ();
+
+  if (!seq)
+    return FALSE;
+
+  emit_insn_before_setloc (seq, if_info->jump,
+			   INSN_LOCATION (unmodified_insns.last ()));
+
+  /* Clean up THEN_BB and the edges in and out of it.  */
+  remove_edge (find_edge (test_bb, join_bb));
+  remove_edge (find_edge (then_bb, join_bb));
+  redirect_edge_and_branch_force (single_succ_edge (test_bb), join_bb);
+  delete_basic_block (then_bb);
+  num_true_changes++;
+
+  /* Maybe merge blocks now the jump is simple enough.  */
+  if (can_merge_blocks_p (test_bb, join_bb))
+    {
+      merge_blocks (test_bb, join_bb);
+      num_true_changes++;
+    }
+
+  num_updated_if_blocks++;
+  return TRUE;
+}
+
+/* Return true iff basic block TEST_BB is comprised of only
+   (SET (REG) (REG)) insns suitable for conversion to a series
+   of conditional moves.  */
+
+static bool
+bb_ok_for_noce_convert_multiple_sets (basic_block test_bb)
+{
+  rtx_insn *insn;
+
+  /* We must have at least one real insn to convert, or there will
+     be trouble!  */
+  bool bb_is_not_empty = false;
+  FOR_BB_INSNS (test_bb, insn)
+    {
+      /* Skip over notes etc.  */
+      if (!active_insn_p (insn))
+	continue;
+
+      /* We only handle SET insns.  */
+      rtx set = single_set (insn);
+      if (set == NULL_RTX)
+	return false;
+
+      rtx dest = SET_DEST (set);
+      rtx src = SET_SRC (set);
+
+      /* We can possibly relax this, but for now only handle REG to REG
+	 moves.  This avoids any issues that might come from introducing
+	 loads/stores that might violate data-race-freedom guarantees.  */
+      if (!(REG_P (src) && REG_P (dest)))
+	return false;
+
+      /* Destination must be appropriate for a conditional write.  */
+      if (!noce_operand_ok (dest))
+	return false;
+
+      /* We must be able to conditionally move in this mode.  */
+      if (!can_conditionally_move_p (GET_MODE (dest)))
+	return false;
+
+      bb_is_not_empty = true;
+    }
+  return bb_is_not_empty;
+}
+
 /* Given a simple IF-THEN-JOIN or IF-THEN-ELSE-JOIN block, attempt
    to convert it without using conditional execution.  Return TRUE if we
    were successful at converting the block.  */
@@ -3004,12 +3221,22 @@ noce_process_if_block (struct noce_if_info *if_info)
      (1) if (...) x = a; else x = b;
      (2) x = b; if (...) x = a;
      (3) if (...) x = a; // as if with an initial x = x.
-
+     (4) if (...) { x = a; y = b; z = c; }  // Like 3, for multiple SETS.
 
    The later patterns require jumps to be more expensive.
    For the if (...) x = a; else x = b; case we allow multiple insns
    inside the then and else blocks as long as their only effect is
    to calculate a value for x.
-   ??? For future expansion, look for multiple X in such patterns.  */
+   ??? For future expansion, further expand the "multiple X" rules.  */
+
+  /* First look for multiple SETS.  */
+  if (!else_bb
+      && HAVE_conditional_move
+      && !HAVE_cc0
+      && bb_ok_for_noce_convert_multiple_sets (then_bb))
+    {
+      if (noce_convert_multiple_sets (if_info))
+	return TRUE;
+    }
 
   if (! bb_valid_for_noce_process_p (then_bb, cond, &if_info->then_cost,
				      &if_info->then_simple))