From patchwork Mon Jan 22 13:46:09 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Luis Machado
X-Patchwork-Id: 125424
From: Luis Machado
To: gcc-patches@gcc.gnu.org
Cc: james.greenhalgh@arm.com, Richard.Earnshaw@arm.com
Subject: [PATCH 1/2] Introduce prefetch-minimum stride option
Date: Mon, 22 Jan 2018 11:46:09 -0200
Message-Id:
 <1516628770-25036-2-git-send-email-luis.machado@linaro.org>
In-Reply-To: <1516628770-25036-1-git-send-email-luis.machado@linaro.org>
References: <1516628770-25036-1-git-send-email-luis.machado@linaro.org>

This patch adds a new option that sets the minimum stride a memory
reference must have before the loop prefetch pass may issue software
prefetch hints for it.

There are two motivations:

* Make the pass less aggressive, so that it only issues prefetch hints for
bigger strides, which are more likely to benefit from prefetching.  For
example, I've noticed a case in cpu2017 where we were issuing thousands of
hints.

* For processors that have a hardware prefetcher, like Falkor, it lets the
loop prefetch pass leave strides smaller than the threshold to the hardware
prefetcher.  This prevents conflicts between the software prefetcher and
the hardware prefetcher.

With the patch I've noticed a considerable reduction in the number of
prefetch hints and slightly positive performance numbers.  This aligns GCC
and LLVM in terms of prefetch behavior for Falkor.

The default settings should guarantee no changes for existing targets.
Those are free to tweak the settings as necessary.

No regressions in the testsuite and bootstrapped ok on aarch64-linux.

Ok?

2018-01-22  Luis Machado

        Introduce option to limit software prefetching to known constant
        strides above a specific threshold with the goal of preventing
        conflicts with a hardware prefetcher.

        gcc/
        * config/aarch64/aarch64-protos.h (cpu_prefetch_tune)
        <minimum_stride>: New const int field.
        * config/aarch64/aarch64.c (generic_prefetch_tune): Update to include
        minimum_stride field.
        (exynosm1_prefetch_tune): Likewise.
        (thunderxt88_prefetch_tune): Likewise.
        (thunderx_prefetch_tune): Likewise.
        (thunderx2t99_prefetch_tune): Likewise.
        (qdf24xx_prefetch_tune): Likewise.  Set minimum_stride to 2048.
        (aarch64_override_options_internal): Update to set
        PARAM_PREFETCH_MINIMUM_STRIDE.
        * doc/invoke.texi (prefetch-minimum-stride): Document new option.
        * params.def (PARAM_PREFETCH_MINIMUM_STRIDE): New.
        * params.h (PARAM_PREFETCH_MINIMUM_STRIDE): Define.
        * tree-ssa-loop-prefetch.c (should_issue_prefetch_p): Return false if
        stride is constant and is below the minimum stride threshold.
---
 gcc/config/aarch64/aarch64-protos.h |  3 +++
 gcc/config/aarch64/aarch64.c        | 13 ++++++++++++-
 gcc/doc/invoke.texi                 | 15 +++++++++++++++
 gcc/params.def                      |  9 +++++++++
 gcc/params.h                        |  2 ++
 gcc/tree-ssa-loop-prefetch.c        | 16 ++++++++++++++++
 6 files changed, 57 insertions(+), 1 deletion(-)

--
2.7.4

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index ef1b0bc..8736bd9 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -230,6 +230,9 @@ struct cpu_prefetch_tune
   const int l1_cache_size;
   const int l1_cache_line_size;
   const int l2_cache_size;
+  /* The minimum constant stride beyond which we should use prefetch
+     hints for.  */
+  const int minimum_stride;
   const int default_opt_level;
 };
 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 174310c..0ed9f14 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -547,6 +547,7 @@ static const cpu_prefetch_tune generic_prefetch_tune =
   -1,                   /* l1_cache_size */
   -1,                   /* l1_cache_line_size */
   -1,                   /* l2_cache_size */
+  -1,                   /* minimum_stride */
   -1                    /* default_opt_level */
 };
 
@@ -556,6 +557,7 @@ static const cpu_prefetch_tune exynosm1_prefetch_tune =
   -1,                   /* l1_cache_size */
   64,                   /* l1_cache_line_size */
   -1,                   /* l2_cache_size */
+  -1,                   /* minimum_stride */
   -1                    /* default_opt_level */
 };
 
@@ -565,7 +567,8 @@ static const cpu_prefetch_tune qdf24xx_prefetch_tune =
   32,                   /* l1_cache_size */
   64,                   /* l1_cache_line_size */
   1024,                 /* l2_cache_size */
-  -1                    /* default_opt_level */
+  2048,                 /* minimum_stride */
+  3                     /* default_opt_level */
 };
 
 static const cpu_prefetch_tune thunderxt88_prefetch_tune =
@@ -574,6 +577,7 @@ static const cpu_prefetch_tune thunderxt88_prefetch_tune =
   32,                   /* l1_cache_size */
   128,                  /* l1_cache_line_size */
   16*1024,              /* l2_cache_size */
+  -1,                   /* minimum_stride */
   3                     /* default_opt_level */
 };
 
@@ -583,6 +587,7 @@ static const cpu_prefetch_tune thunderx_prefetch_tune =
   32,                   /* l1_cache_size */
   128,                  /* l1_cache_line_size */
   -1,                   /* l2_cache_size */
+  -1,                   /* minimum_stride */
   -1                    /* default_opt_level */
 };
 
@@ -592,6 +597,7 @@ static const cpu_prefetch_tune thunderx2t99_prefetch_tune =
   32,                   /* l1_cache_size */
   64,                   /* l1_cache_line_size */
   256,                  /* l2_cache_size */
+  -1,                   /* minimum_stride */
   -1                    /* default_opt_level */
 };
 
@@ -10461,6 +10467,11 @@ aarch64_override_options_internal (struct gcc_options *opts)
                            aarch64_tune_params.prefetch->l2_cache_size,
                            opts->x_param_values,
                            global_options_set.x_param_values);
+  if (aarch64_tune_params.prefetch->minimum_stride >= 0)
+    maybe_set_param_value (PARAM_PREFETCH_MINIMUM_STRIDE,
+                           aarch64_tune_params.prefetch->minimum_stride,
+                           opts->x_param_values,
+                           global_options_set.x_param_values);
 
   /* Use the alternative scheduling-pressure algorithm by default.  */
   maybe_set_param_value (PARAM_SCHED_PRESSURE_ALGORITHM, SCHED_PRESSURE_MODEL,
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 27c5974..1cb1ef5 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -10567,6 +10567,21 @@ The size of L1 cache, in kilobytes.
 @item l2-cache-size
 The size of L2 cache, in kilobytes.
 
+@item prefetch-minimum-stride
+Minimum constant stride, in bytes, to start using prefetch hints for.  If
+the stride is less than this threshold, prefetch hints will not be issued.
+
+This setting is useful for processors that have hardware prefetchers, in
+which case there may be conflicts between the hardware prefetchers and
+the software prefetchers.  If the hardware prefetchers have a maximum
+stride they can handle, it should be used here to improve the use of
+software prefetchers.
+
+A value of -1, the default, means we don't have a threshold and therefore
+prefetch hints can be issued for any constant stride.
+
+This setting is only useful for strides that are known and constant.
+
 @item loop-interchange-max-num-stmts
 The maximum number of stmts in a loop to be interchanged.
 
diff --git a/gcc/params.def b/gcc/params.def
index 930b318..bf2d12c 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -790,6 +790,15 @@ DEFPARAM (PARAM_L2_CACHE_SIZE,
           "The size of L2 cache.",
           512, 0, 0)
 
+/* The minimum constant stride beyond which we should use prefetch hints
+   for.  */
+
+DEFPARAM (PARAM_PREFETCH_MINIMUM_STRIDE,
+          "prefetch-minimum-stride",
+          "The minimum constant stride beyond which we should use prefetch "
+          "hints for.",
+          -1, 0, 0)
+
 /* Maximum number of statements in loop nest for loop interchange.  */
 
 DEFPARAM (PARAM_LOOP_INTERCHANGE_MAX_NUM_STMTS,
diff --git a/gcc/params.h b/gcc/params.h
index 98249d2..96012db 100644
--- a/gcc/params.h
+++ b/gcc/params.h
@@ -196,6 +196,8 @@ extern void init_param_values (int *params);
   PARAM_VALUE (PARAM_L1_CACHE_LINE_SIZE)
 #define L2_CACHE_SIZE \
   PARAM_VALUE (PARAM_L2_CACHE_SIZE)
+#define PREFETCH_MINIMUM_STRIDE \
+  PARAM_VALUE (PARAM_PREFETCH_MINIMUM_STRIDE)
 #define USE_CANONICAL_TYPES \
   PARAM_VALUE (PARAM_USE_CANONICAL_TYPES)
 #define IRA_MAX_LOOPS_NUM \
diff --git a/gcc/tree-ssa-loop-prefetch.c b/gcc/tree-ssa-loop-prefetch.c
index 2f10db1..112ccac 100644
--- a/gcc/tree-ssa-loop-prefetch.c
+++ b/gcc/tree-ssa-loop-prefetch.c
@@ -992,6 +992,22 @@ prune_by_reuse (struct mem_ref_group *groups)
 static bool
 should_issue_prefetch_p (struct mem_ref *ref)
 {
+  /* Some processors may have a hardware prefetcher that may conflict with
+     prefetch hints for a range of strides.  Make sure we don't issue
+     prefetches for such cases if the stride is within this particular
+     range.  */
+  if (cst_and_fits_in_hwi (ref->group->step)
+      && abs_hwi (int_cst_value (ref->group->step)) < PREFETCH_MINIMUM_STRIDE)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+        fprintf (dump_file,
+                 "Step for reference %u:%u (" HOST_WIDE_INT_PRINT_DEC
+                 ") is less than the minimum required stride of %d\n",
+                 ref->group->uid, ref->uid, int_cst_value (ref->group->step),
+                 PREFETCH_MINIMUM_STRIDE);
+      return false;
+    }
+
   /* For now do not issue prefetches for only first few of the iterations.
      */
   if (ref->prefetch_before != PREFETCH_ALL)

From patchwork Mon Jan 22 13:46:10 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Luis Machado
X-Patchwork-Id: 125425
From: Luis Machado
To: gcc-patches@gcc.gnu.org
Cc: james.greenhalgh@arm.com, Richard.Earnshaw@arm.com
Subject: [PATCH 2/2] Introduce prefetch-dynamic-strides option.
Date: Mon, 22 Jan 2018 11:46:10 -0200
Message-Id: <1516628770-25036-3-git-send-email-luis.machado@linaro.org>
In-Reply-To: <1516628770-25036-1-git-send-email-luis.machado@linaro.org>
References: <1516628770-25036-1-git-send-email-luis.machado@linaro.org>

The following patch adds an option to control software prefetching of
memory references with non-constant/unknown strides.

Currently we prefetch these references whenever the pass thinks there is a
benefit to doing so.  Since that decision is based on heuristics, we do not
always end up with better performance.

For Falkor there is also the problem of conflicts with the hardware
prefetcher, so we need to be more conservative about which references get
software prefetch hints.  This also aligns GCC with what LLVM does for
Falkor.

As with the previous patch, the defaults guarantee no change in behavior
for other targets and architectures.

I've regression-tested and bootstrapped it on aarch64-linux.  No problems
found.

Ok?

2018-01-22  Luis Machado

        Introduce option to control whether the software prefetch pass
        should issue prefetch hints for non-constant strides.

        gcc/
        * config/aarch64/aarch64-protos.h (cpu_prefetch_tune)
        <prefetch_dynamic_strides>: New const unsigned int field.
        * config/aarch64/aarch64.c (generic_prefetch_tune): Update to include
        prefetch_dynamic_strides.
        (exynosm1_prefetch_tune): Likewise.
        (thunderxt88_prefetch_tune): Likewise.
        (thunderx_prefetch_tune): Likewise.
        (thunderx2t99_prefetch_tune): Likewise.
        (qdf24xx_prefetch_tune): Likewise.  Set prefetch_dynamic_strides to 0.
        (aarch64_override_options_internal): Update to set
        PARAM_PREFETCH_DYNAMIC_STRIDES.
        * doc/invoke.texi (prefetch-dynamic-strides): Document new option.
        * params.def (PARAM_PREFETCH_DYNAMIC_STRIDES): New.
        * params.h (PARAM_PREFETCH_DYNAMIC_STRIDES): Define.
        * tree-ssa-loop-prefetch.c (should_issue_prefetch_p): Account for
        prefetch-dynamic-strides setting.
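
For illustration only (this is not part of the patch), the two early-exit
checks that this series adds to should_issue_prefetch_p can be modelled by
the standalone C sketch below.  The struct and function names are
hypothetical and do not exist in GCC.  The sketch assumes the documented
meaning of the params, where a prefetch-minimum-stride of -1 means no
threshold, and it ignores the later checks the real function performs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two --param values; not GCC internals.  */
struct prefetch_params
{
  int minimum_stride;   /* prefetch-minimum-stride, -1 means no threshold.  */
  int dynamic_strides;  /* prefetch-dynamic-strides, 0 or 1.  */
};

/* Return true if a reference survives the two stride checks.
   STRIDE_IS_CONSTANT says whether the step is a known constant; STRIDE is
   its value in bytes and is ignored for non-constant steps.  */
static bool
model_passes_stride_checks (const struct prefetch_params *p,
                            bool stride_is_constant, long long stride)
{
  /* Non-constant stride: allowed only if dynamic strides are enabled.  */
  if (!stride_is_constant)
    return p->dynamic_strides != 0;

  /* Constant stride: compare its magnitude with the threshold using signed
     arithmetic, so that -1 never rejects anything.  Assumes STRIDE is not
     LLONG_MIN.  */
  long long magnitude = stride < 0 ? -stride : stride;
  return magnitude >= (long long) p->minimum_stride;
}

/* Self-check using the defaults (-1, 1) and the qdf24xx values set by the
   two patches (2048, 0).  */
void
model_selftest (void)
{
  const struct prefetch_params defaults = { -1, 1 };
  const struct prefetch_params qdf24xx = { 2048, 0 };

  assert (model_passes_stride_checks (&defaults, true, 4));
  assert (model_passes_stride_checks (&defaults, false, 0));
  assert (!model_passes_stride_checks (&qdf24xx, true, 64));
  assert (model_passes_stride_checks (&qdf24xx, true, -4096));
  assert (!model_passes_stride_checks (&qdf24xx, false, 0));
}
```

With the default values every reference passes both checks, which is the
"no change for existing targets" case.  With the qdf24xx values only
constant strides of at least 2048 bytes pass.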
---
 gcc/config/aarch64/aarch64-protos.h |  3 +++
 gcc/config/aarch64/aarch64.c        | 11 +++++++++++
 gcc/doc/invoke.texi                 | 10 ++++++++++
 gcc/params.def                      |  9 +++++++++
 gcc/params.h                        |  2 ++
 gcc/tree-ssa-loop-prefetch.c        | 10 ++++++++++
 6 files changed, 45 insertions(+)

--
2.7.4

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 8736bd9..22bd9ae 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -230,6 +230,9 @@ struct cpu_prefetch_tune
   const int l1_cache_size;
   const int l1_cache_line_size;
   const int l2_cache_size;
+  /* Whether software prefetch hints should be issued for non-constant
+     strides.  */
+  const unsigned int prefetch_dynamic_strides;
   /* The minimum constant stride beyond which we should use prefetch
      hints for.  */
   const int minimum_stride;
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 0ed9f14..713b230 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -547,6 +547,7 @@ static const cpu_prefetch_tune generic_prefetch_tune =
   -1,                   /* l1_cache_size */
   -1,                   /* l1_cache_line_size */
   -1,                   /* l2_cache_size */
+  1,                    /* prefetch_dynamic_strides */
   -1,                   /* minimum_stride */
   -1                    /* default_opt_level */
 };
@@ -557,6 +558,7 @@ static const cpu_prefetch_tune exynosm1_prefetch_tune =
   -1,                   /* l1_cache_size */
   64,                   /* l1_cache_line_size */
   -1,                   /* l2_cache_size */
+  1,                    /* prefetch_dynamic_strides */
   -1,                   /* minimum_stride */
   -1                    /* default_opt_level */
 };
@@ -567,6 +569,7 @@ static const cpu_prefetch_tune qdf24xx_prefetch_tune =
   32,                   /* l1_cache_size */
   64,                   /* l1_cache_line_size */
   1024,                 /* l2_cache_size */
+  0,                    /* prefetch_dynamic_strides */
   2048,                 /* minimum_stride */
   3                     /* default_opt_level */
 };
@@ -577,6 +580,7 @@ static const cpu_prefetch_tune thunderxt88_prefetch_tune =
   32,                   /* l1_cache_size */
   128,                  /* l1_cache_line_size */
   16*1024,              /* l2_cache_size */
+  1,                    /* prefetch_dynamic_strides */
   -1,                   /* minimum_stride */
   3                     /* default_opt_level */
 };
@@ -587,6 +591,7 @@ static const cpu_prefetch_tune thunderx_prefetch_tune =
   32,                   /* l1_cache_size */
   128,                  /* l1_cache_line_size */
   -1,                   /* l2_cache_size */
+  1,                    /* prefetch_dynamic_strides */
   -1,                   /* minimum_stride */
   -1                    /* default_opt_level */
 };
@@ -597,6 +602,7 @@ static const cpu_prefetch_tune thunderx2t99_prefetch_tune =
   32,                   /* l1_cache_size */
   64,                   /* l1_cache_line_size */
   256,                  /* l2_cache_size */
+  1,                    /* prefetch_dynamic_strides */
   -1,                   /* minimum_stride */
   -1                    /* default_opt_level */
 };
@@ -10467,6 +10473,11 @@ aarch64_override_options_internal (struct gcc_options *opts)
                            aarch64_tune_params.prefetch->l2_cache_size,
                            opts->x_param_values,
                            global_options_set.x_param_values);
+  if (aarch64_tune_params.prefetch->prefetch_dynamic_strides == 0)
+    maybe_set_param_value (PARAM_PREFETCH_DYNAMIC_STRIDES,
+                           aarch64_tune_params.prefetch->prefetch_dynamic_strides,
+                           opts->x_param_values,
+                           global_options_set.x_param_values);
   if (aarch64_tune_params.prefetch->minimum_stride >= 0)
     maybe_set_param_value (PARAM_PREFETCH_MINIMUM_STRIDE,
                            aarch64_tune_params.prefetch->minimum_stride,
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 1cb1ef5..541c24c 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -10567,6 +10567,16 @@ The size of L1 cache, in kilobytes.
 @item l2-cache-size
 The size of L2 cache, in kilobytes.
 
+@item prefetch-dynamic-strides
+Whether the loop array prefetch pass should issue software prefetch hints
+for strides that are non-constant.  In some cases this may be
+beneficial, though the fact the stride is non-constant may make it
+hard to predict when there is clear benefit to issuing these hints.
+
+Set to 1, the default, if the prefetch hints should be issued for non-constant
+strides.  Set to 0 if prefetch hints should be issued only for strides that
+are known to be constant and not below @option{prefetch-minimum-stride}.
+
 @item prefetch-minimum-stride
 Minimum constant stride, in bytes, to start using prefetch hints for.  If
 the stride is less than this threshold, prefetch hints will not be issued.
diff --git a/gcc/params.def b/gcc/params.def
index bf2d12c..c564178 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -790,6 +790,15 @@ DEFPARAM (PARAM_L2_CACHE_SIZE,
           "The size of L2 cache.",
           512, 0, 0)
 
+/* Whether software prefetch hints should be issued for non-constant
+   strides.  */
+
+DEFPARAM (PARAM_PREFETCH_DYNAMIC_STRIDES,
+          "prefetch-dynamic-strides",
+          "Whether software prefetch hints should be issued for non-constant "
+          "strides.",
+          1, 0, 1)
+
 /* The minimum constant stride beyond which we should use prefetch hints
    for.  */
 
diff --git a/gcc/params.h b/gcc/params.h
index 96012db..8aa960a 100644
--- a/gcc/params.h
+++ b/gcc/params.h
@@ -196,6 +196,8 @@ extern void init_param_values (int *params);
   PARAM_VALUE (PARAM_L1_CACHE_LINE_SIZE)
 #define L2_CACHE_SIZE \
   PARAM_VALUE (PARAM_L2_CACHE_SIZE)
+#define PREFETCH_DYNAMIC_STRIDES \
+  PARAM_VALUE (PARAM_PREFETCH_DYNAMIC_STRIDES)
 #define PREFETCH_MINIMUM_STRIDE \
   PARAM_VALUE (PARAM_PREFETCH_MINIMUM_STRIDE)
 #define USE_CANONICAL_TYPES \
diff --git a/gcc/tree-ssa-loop-prefetch.c b/gcc/tree-ssa-loop-prefetch.c
index 112ccac..de2acc8 100644
--- a/gcc/tree-ssa-loop-prefetch.c
+++ b/gcc/tree-ssa-loop-prefetch.c
@@ -992,6 +992,16 @@ prune_by_reuse (struct mem_ref_group *groups)
 static bool
 should_issue_prefetch_p (struct mem_ref *ref)
 {
+  /* Do we want to issue prefetches for non-constant strides?  */
+  if (!cst_and_fits_in_hwi (ref->group->step) && PREFETCH_DYNAMIC_STRIDES == 0)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+        fprintf (dump_file,
+                 "Skipping non-constant step for reference %u:%u\n",
+                 ref->group->uid, ref->uid);
+      return false;
+    }
+
   /* Some processors may have a hardware prefetcher that may conflict with
      prefetch hints for a range of strides.  Make sure we don't issue
      prefetches for such cases if the stride is within this particular