From patchwork Mon Jul 3 08:45:34 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Sandiford X-Patchwork-Id: 106890 Delivered-To: patch@linaro.org Received: by 10.140.101.44 with SMTP id t41csp5522923qge; Mon, 3 Jul 2017 01:46:17 -0700 (PDT) X-Received: by 10.84.215.2 with SMTP id k2mr2129791pli.104.1499071577811; Mon, 03 Jul 2017 01:46:17 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1499071577; cv=none; d=google.com; s=arc-20160816; b=fmnAss2IL0Qp/qB4Ql4fjTImPmi/WjuYCNbxLf0t73n8EdpmMyColQ7KDzk/EenSVP UbLEuwwxShBJaNYAElYNeZg2KY9UCOOs9XRXP9U6VY3g/gW20nnsYFdHLsFiULWTzKWN zhv9oZi1mNnxPV0SPIZRlsuHz5vFggjKlH3M1VCq1MC1bmjZmq1SjSnKTwbw89craOv4 JMlcaq0/ENnNJxGXM49ros+qvH8FsTHeWn8CAwWfKmwn/Mb7x30RsqUe0+TwGhZ6WH1+ eoysv18QbIDadeT5UkJ60x1jowl661Iz312KPbVIb7LnQeTr4V084xCmAGKRS5XSvLTl 6zBg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=mime-version:user-agent:message-id:date:subject:mail-followup-to:to :from:delivered-to:sender:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:mailing-list:dkim-signature :domainkey-signature:arc-authentication-results; bh=89Ec3DOcIfA84mlqdPPXOA9ChEK+exaa8UvyLuvWbas=; b=mdbtUxk17lzoDxH2odKwFGkVwo03kItLnutIXL04Yu6ynZ+liZIg3+RE5ydN7dQrLw p1zSfoOyRoflYpchBKzxO/GKwbj8X9nFAtZTrt5loaSs++uhj8upRvqaqXlEaDzukLf8 i/gwm0tYThHGmHCUO3x4Uk7Cc4VgE9xQQqgqZTL8GmnKYONzd1GjpfurH74Dwy4N5nWP 9L1DUChqE+KZLBeFbWnXLvIi5dKYSadqDgZhhASzEzL17+8hxL/1s6Jxcdck1lek5/vh TE827nXajABJwmJBgRiUEPS7hKpZZTy9EQB6Wkdpkwq9QgeipjoCozryQmBUcaEyV2VM eoLQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gcc.gnu.org header.b=J9giigSg; spf=pass (google.com: domain of gcc-patches-return-457418-patch=linaro.org@gcc.gnu.org designates 209.132.180.131 as permitted sender) smtp.mailfrom=gcc-patches-return-457418-patch=linaro.org@gcc.gnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from sourceware.org (server1.sourceware.org. [209.132.180.131]) by mx.google.com with ESMTPS id p22si11720832pfl.488.2017.07.03.01.46.17 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 03 Jul 2017 01:46:17 -0700 (PDT) Received-SPF: pass (google.com: domain of gcc-patches-return-457418-patch=linaro.org@gcc.gnu.org designates 209.132.180.131 as permitted sender) client-ip=209.132.180.131; Authentication-Results: mx.google.com; dkim=pass header.i=@gcc.gnu.org header.b=J9giigSg; spf=pass (google.com: domain of gcc-patches-return-457418-patch=linaro.org@gcc.gnu.org designates 209.132.180.131 as permitted sender) smtp.mailfrom=gcc-patches-return-457418-patch=linaro.org@gcc.gnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:subject:date:message-id:mime-version:content-type; q=dns; s= default; b=D4ZKopHb/EkKKdSR0ldoU37YIadf7uC7wpPOTVnkLz1SVeAqtVHLk 5+K1LM8Ml8YrYW9XW6+YF5GEmNK6swGlatc6DRhQPyvFJ3la+90g1yCIEt5AA4A7 Q1AMpt6nnbItqRulm8aZnmRyliCHhyqbq5GRKR+ea1px+RAgJ2zMlc= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:subject:date:message-id:mime-version:content-type; s= default; bh=mPC1KCwiCQTId7q3GVg1XIpfAeU=; b=J9giigSguNgoqvfSsthk MscP5VOEea5zAq0x/RRdAFqrLXl/89Sb2rh4DzQbjgsYv4DEnmP5oHIZrWYO2U5u YhrabRMq8NOpvcGc6muaqXjvW82uPB8jF7nv4xdp4hm1f4NlXFRHQxnF4Z9dkld7 0kNU2eSsZAaFHVu2FIsG0W4= Received: (qmail 87788 invoked by alias); 3 Jul 2017 08:46:00 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 87527 invoked by uid 89); 3 Jul 2017 08:45:45 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-9.2 required=5.0 tests=AWL, BAYES_00, GIT_PATCH_2, GIT_PATCH_3, KAM_ASCII_DIVIDERS, RCVD_IN_DNSWL_NONE, SPF_PASS, UNSUBSCRIBE_BODY autolearn=ham version=3.3.2 spammy=exposes, 25628, sk:propert, motivating X-HELO: mail-wm0-f49.google.com Received: from mail-wm0-f49.google.com (HELO mail-wm0-f49.google.com) (74.125.82.49) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Mon, 03 Jul 2017 08:45:42 +0000 Received: by mail-wm0-f49.google.com with SMTP id w126so161078249wme.0 for ; Mon, 03 Jul 2017 01:45:41 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:mail-followup-to:subject:date:message-id :user-agent:mime-version; bh=89Ec3DOcIfA84mlqdPPXOA9ChEK+exaa8UvyLuvWbas=; b=KhAhpnVCjXGFfQSKiyqtddlHj3p9PL0XmnXZMOu3yhRSRr0fOb5FO0FtCcZjonvx9V v8sQ1KOv/xp6nXLXhMr6HLLgxtRcE6Wa1dYg6WhDl+Nd39qnrBiYO0snCxUU0vcYB/ku W8GqlpnVh/cMc9/GMVrFe4FCHR2RrNfUsn2gsBvQbjRgetVif/D00w0/jVjTFjEig3Jh Qky9k2iwxMvYo3QUKvX5dIsCy763CnVZTpGntetMIBtU5bcDOoYyqRq07OXSXzph//Ci cVQccBiCOOWyCLiPk7tt+eNNQ/2T7g4pI9dks+GbJFgEe8euL15tpVoheaqWUetdeAMD ekbw== X-Gm-Message-State: AKS2vOykUXSZFCQnP+9Fvr7rvXedaVptQeiHdAgd0Qrl6ihUf+rfvNKE fcHitQsb9MR0DKVYKAL/yA== X-Received: by 10.28.194.137 with SMTP id s131mr13003787wmf.111.1499071539344; Mon, 03 Jul 2017 01:45:39 -0700 (PDT) Received: from localhost (92.40.248.165.threembb.co.uk. [92.40.248.165]) by smtp.gmail.com with ESMTPSA id 15sm12776745wmx.18.2017.07.03.01.45.36 for (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Mon, 03 Jul 2017 01:45:36 -0700 (PDT) From: Richard Sandiford To: gcc-patches@gcc.gnu.org Mail-Followup-To: gcc-patches@gcc.gnu.org, richard.sandiford@linaro.org Subject: RFC/A: Early predictive commoning pass Date: Mon, 03 Jul 2017 09:45:34 +0100 Message-ID: <8760f9yklt.fsf@linaro.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.2 (gnu/linux) MIME-Version: 1.0 General predictive commoning would play havoc with loop vectorisation, so the current pass order is clearly the right one. But running a very limited form of predictive commoning before vectorisation would allow us to vectorise things like: for (int i = 1; i < n; ++i) x[i] = x[i - 1] + 1; This patch adds an extra pass that is restricted to cases that should help (or at least not hinder) vectorisation. It gives some nice improvements on some internal benchmarks. I compared the output for SPEC 2k6 before and after the patch. For some benchmarks it led to a trivial register renaming, but had no effect on those benchmarks beyond that. The only benchmark that changed in a significant way was 416.gamess, where we were able to vectorise some simple loops that we weren't previously. None of those loops seem to be hot though, so there was no measurable difference in the score. Tested on aarch64-linux-gnu and x86_64-linux-gnu. Thoughts? Is this too much of a special case to support a new pass? OTOH, other compilers do vectorise the loop above, so it would be nice if we could too... Richard 2017-07-03 Richard Sandiford gcc/ * passes.def (pass_early_predcom): New. * tree-pass.h (make_pass_early_predcom): Declare. * tree-predcom.c (MAX_DISTANCE): Turn into an inclusive rather than exclusive upper bound. (only_simple_p): New variable. (max_distance): Likewise. (add_ref_to_chain): Use MAX_DISTANCE rather than max_distance and treat it as an inclusive upper bound. Require the store to come after the load at the maximum distance if only_simple_p. (add_looparound_copies): Do nothing if only_simple_p. (determine_roots_comp): Use MAX_DISTANCE rather than max_distance and treat it as an inclusive upper bound. Require the start of a chain to be a store if only_simple_p. (determine_unroll_factor): Return 1 if only_simple_p. (tree_predictive_commoning): Add an early_p parameter. Set up only_simple_p and max_distance. (run_tree_predictive_commoning): Add an early_p parameter. Update call to tree_predictive_commoning. (pass_data_early_predcom): New descriptor. (pass_early_predcom): New class. (pass_data_predcom::execute): Update call to run_tree_predictive_commoning. (make_pass_early_predcom): New function. gcc/testsuite/ * gnat.dg/vect18.adb: Turn off predictive commoning. Index: gcc/passes.def =================================================================== --- gcc/passes.def 2017-06-22 12:22:55.989380389 +0100 +++ gcc/passes.def 2017-07-03 09:17:28.626495661 +0100 @@ -290,6 +290,7 @@ along with GCC; see the file COPYING3. NEXT_PASS (pass_parallelize_loops, false /* oacc_kernels_p */); NEXT_PASS (pass_expand_omp_ssa); NEXT_PASS (pass_ch_vect); + NEXT_PASS (pass_early_predcom); NEXT_PASS (pass_if_conversion); /* pass_vectorize must immediately follow pass_if_conversion. Please do not add any other passes in between. */ Index: gcc/tree-pass.h =================================================================== --- gcc/tree-pass.h 2017-06-22 12:22:34.954287935 +0100 +++ gcc/tree-pass.h 2017-07-03 09:17:28.627495621 +0100 @@ -369,6 +369,7 @@ extern gimple_opt_pass *make_pass_tree_l extern gimple_opt_pass *make_pass_lim (gcc::context *ctxt); extern gimple_opt_pass *make_pass_tree_unswitch (gcc::context *ctxt); extern gimple_opt_pass *make_pass_loop_split (gcc::context *ctxt); +extern gimple_opt_pass *make_pass_early_predcom (gcc::context *ctxt); extern gimple_opt_pass *make_pass_predcom (gcc::context *ctxt); extern gimple_opt_pass *make_pass_iv_canon (gcc::context *ctxt); extern gimple_opt_pass *make_pass_scev_cprop (gcc::context *ctxt); Index: gcc/tree-predcom.c =================================================================== --- gcc/tree-predcom.c 2017-07-03 08:42:47.632532107 +0100 +++ gcc/tree-predcom.c 2017-07-03 09:29:08.744451338 +0100 @@ -218,7 +218,7 @@ Free Software Foundation; either version /* The maximum number of iterations between the considered memory references. */ -#define MAX_DISTANCE (target_avail_regs < 16 ? 4 : 8) +#define MAX_DISTANCE (target_avail_regs < 16 ? 3 : 7) /* Data references (or phi nodes that carry data reference values across loop iterations). */ @@ -343,6 +343,71 @@ struct component static hash_map *name_expansions; +/* True if we're running the early predcom pass and should only handle + cases that aid vectorization. Specifically this means that: + + - only CT_INVARIANT and CT_STORE_LOAD chains are used + - the maximum distance for a CT_STORE_LOAD chain is 1 iteration, + and at that distance the store must come after the load + - there's no unrolling or detection of looparound phis. + + The idea is handle inductions that go via memory, such as: + + for (int i = 1; i < n; ++i) + x[i] = x[i - 1] + 1; + + As it stands this loop could never be vectorized, because a loop body + that contains a read of x[j] followed by a write to x[j + 1] would + have its vectorization factor limited to 1. Transforming it to: + + int tmp = x[0]; + for (int i = 0; i < n; ++i) + { + tmp += 1; + x[i] = tmp: + } + + exposes the fact that the stored value is a simple vectorizable + induction with start value x[0] and step 1. + + Commoning is not always useful even in this situation. For example, + carrying over the value of x[i] won't help us to vectorize: + + for (int i = 1; i < n; ++i) + { + y[i] = x[i - 1]; + x[i] += i; + } + + There's no real need to restrict things further though, because we're + unable to vectorize these load/store combinations in their current + form whatever happens. + + We require the store to come after the load when the distance is 1 + to avoid cases like: + + for (int i = 1; i < n; ++i) + { + x[i] = ...; + ... = x[i - 1]; + } + + These accesses effectively have a distance somewhere between 1 and 2, + since after commoning the value stored in the previous iteration would + still be live at the next store. This means that the combination + isn't useful for exposing simple inductions. + + Also, unlike the motivating case above, this combination does not + prevent vectorization. If a write to x[j + 1] comes before a read + of x[j], the vectorized write completes for all vector elements + before the read starts for any vector elements. */ + +static bool only_simple_p; + +/* The maximum loop carry distance for this execution of the pass. */ + +static int max_distance; + /* Dumps data reference REF to FILE. */ extern void dump_dref (FILE *, dref); @@ -960,7 +1025,13 @@ add_ref_to_chain (chain_p chain, dref re gcc_assert (wi::les_p (root->offset, ref->offset)); widest_int dist = ref->offset - root->offset; - if (wi::leu_p (MAX_DISTANCE, dist)) + /* When running before vectorization, only allow the maximum distance + if the consumer comes before the producer. See the comment above + ONLY_SIMPLE_P for details. */ + if (wi::ltu_p (max_distance, dist) + || (only_simple_p + && wi::eq_p (max_distance, dist) + && root->pos < ref->pos)) { free (ref); return; @@ -1194,6 +1265,11 @@ add_looparound_copies (struct loop *loop dref ref, root = get_chain_root (chain); gphi *phi; + /* There's no point doing this when running before vectorization, + since we won't unroll the loop or combine chains. */ + if (only_simple_p) + return; + FOR_EACH_VEC_ELT (chain->refs, i, ref) { phi = find_looparound_phi (loop, ref, root); @@ -1232,7 +1308,7 @@ determine_roots_comp (struct loop *loop, FOR_EACH_VEC_ELT (comp->refs, i, a) { if (!chain || DR_IS_WRITE (a->ref) - || wi::leu_p (MAX_DISTANCE, a->offset - last_ofs)) + || wi::ltu_p (max_distance, a->offset - last_ofs)) { if (nontrivial_chain_p (chain)) { @@ -1241,6 +1317,14 @@ determine_roots_comp (struct loop *loop, } else release_chain (chain); + /* Only create CT_STORE_LOAD and CT_INVARIANT chains when + running before vectorization. */ + if (only_simple_p && !DR_IS_WRITE (a->ref)) + { + free (a); + chain = NULL; + continue; + } chain = make_rooted_chain (a); last_ofs = a->offset; continue; @@ -1759,6 +1843,10 @@ determine_unroll_factor (vec ch unsigned factor = 1, af, nfactor, i; unsigned max = PARAM_VALUE (PARAM_MAX_UNROLL_TIMES); + /* Do not unroll when running before vectorization. */ + if (only_simple_p) + return 1; + FOR_EACH_VEC_ELT (chains, i, chain) { if (chain->type == CT_INVARIANT) @@ -2562,8 +2650,11 @@ tree_predictive_commoning_loop (struct l } prepare_initializers (loop, chains); - /* Try to combine the chains that are always worked with together. */ - try_combine_chains (&chains); + /* During the main pass, try to combine the chains that are always + worked with together. For the early pass it should be better + to leave this to the vectorizer. */ + if (!only_simple_p) + try_combine_chains (&chains); if (dump_file && (dump_flags & TDF_DETAILS)) { @@ -2623,15 +2714,19 @@ tree_predictive_commoning_loop (struct l return unroll; } -/* Runs predictive commoning. */ +/* Runs predictive commoning. EARLY_P is true if we are running before + vectorization. */ unsigned -tree_predictive_commoning (void) +tree_predictive_commoning (bool early_p) { bool unrolled = false; struct loop *loop; unsigned ret = 0; + only_simple_p = early_p; + max_distance = early_p ? 1 : MAX_DISTANCE; + initialize_original_copy_tables (); FOR_EACH_LOOP (loop, LI_ONLY_INNERMOST) if (optimize_loop_for_speed_p (loop)) @@ -2649,19 +2744,51 @@ tree_predictive_commoning (void) return ret; } -/* Predictive commoning Pass. */ +/* Predictive commoning pass. EARLY_P is true if we are running before + vectorization. */ static unsigned -run_tree_predictive_commoning (struct function *fun) +run_tree_predictive_commoning (struct function *fun, bool early_p) { if (number_of_loops (fun) <= 1) return 0; - return tree_predictive_commoning (); + return tree_predictive_commoning (early_p); } namespace { +const pass_data pass_data_early_predcom = +{ + GIMPLE_PASS, /* type */ + "epcom", /* name */ + OPTGROUP_LOOP, /* optinfo_flags */ + TV_PREDCOM, /* tv_id */ + PROP_cfg, /* properties_required */ + 0, /* properties_provided */ + 0, /* properties_destroyed */ + 0, /* todo_flags_start */ + TODO_update_ssa_only_virtuals, /* todo_flags_finish */ +}; + +class pass_early_predcom : public gimple_opt_pass +{ +public: + pass_early_predcom (gcc::context *ctxt) + : gimple_opt_pass (pass_data_early_predcom, ctxt) + {} + + /* opt_pass methods: */ + virtual bool gate (function *) + { + return flag_predictive_commoning && flag_tree_loop_vectorize; + } + virtual unsigned int execute (function *fun) + { + return run_tree_predictive_commoning (fun, true); + } +}; // class pass_early_predcom + const pass_data pass_data_predcom = { GIMPLE_PASS, /* type */ @@ -2686,7 +2813,7 @@ const pass_data pass_data_predcom = virtual bool gate (function *) { return flag_predictive_commoning != 0; } virtual unsigned int execute (function *fun) { - return run_tree_predictive_commoning (fun); + return run_tree_predictive_commoning (fun, false); } }; // class pass_predcom @@ -2694,9 +2821,13 @@ const pass_data pass_data_predcom = } // anon namespace gimple_opt_pass * +make_pass_early_predcom (gcc::context *ctxt) +{ + return new pass_early_predcom (ctxt); +} + +gimple_opt_pass * make_pass_predcom (gcc::context *ctxt) { return new pass_predcom (ctxt); } - - Index: gcc/testsuite/gnat.dg/vect18.adb =================================================================== --- gcc/testsuite/gnat.dg/vect18.adb 2015-10-14 14:58:56.000000000 +0100 +++ gcc/testsuite/gnat.dg/vect18.adb 2017-07-03 09:17:28.627495621 +0100 @@ -1,5 +1,5 @@ -- { dg-do compile { target i?86-*-* x86_64-*-* } } --- { dg-options "-O3 -msse2 -fdump-tree-vect-details" } +-- { dg-options "-O3 -msse2 -fdump-tree-vect-details -fno-predictive-commoning" } package body Vect18 is