[RFC,AARCH64,2/5] : Add number of hw prefetchers available to cpu_prefetch_tune

Message ID CAELXzTORASUv0UGqNwWByWNNDY5xiiZA3Pt-TJfd+S1MWbgdNw@mail.gmail.com
State New
Series
  • Loop unrolling and memory load streams

Commit Message

Kugan Vivekanandarajah Sept. 15, 2017, 1:28 a.m.
This patch adds the number of hw prefetchers available to
cpu_prefetch_tune so that it can be used in loop unrolling decisions.

Thanks,
Kugan

gcc/ChangeLog:

2017-09-12  Kugan Vivekanandarajah  <kuganv@linaro.org>

    * config/aarch64/aarch64-protos.h (struct cpu_prefetch_tune): Add
      new field hw_prefetchers_avail.
    * config/aarch64/aarch64.c: Add values for hw_prefetchers_avail.

Comments

Andrew Pinski Sept. 15, 2017, 3:20 a.m. | #1
On Thu, Sep 14, 2017 at 6:28 PM, Kugan Vivekanandarajah
<kugan.vivekanandarajah@linaro.org> wrote:
> This patch adds number of hw prefetchers available to
> cpu_prefetch_tune so it can be used in loop unrolling decisions.

Can you explain the difference between this and num_slots
(PARAM_SIMULTANEOUS_PREFETCHES)?  Because it seems like they should be
the same here.

Thanks,
Andrew

>
> Thanks,
> Kugan
>
> gcc/ChangeLog:
>
> 2017-09-12  Kugan Vivekanandarajah  <kuganv@linaro.org>
>
>     * config/aarch64/aarch64-protos.h (struct cpu_prefetch_tune): Add
>       new field hw_prefetchers_avail.
>     * config/aarch64/aarch64.c: Add values for hw_prefetchers_avail.

Kugan Vivekanandarajah Sept. 16, 2017, 10:51 p.m. | #2
Hi Andrew,

On 15 September 2017 at 13:20, Andrew Pinski <pinskia@gmail.com> wrote:
> On Thu, Sep 14, 2017 at 6:28 PM, Kugan Vivekanandarajah
> <kugan.vivekanandarajah@linaro.org> wrote:
>> This patch adds number of hw prefetchers available to
>> cpu_prefetch_tune so it can be used in loop unrolling decisions.
>
> Can you explain the difference between this and num_slots
> (PARAM_SIMULTANEOUS_PREFETCHES)?  Because it seems like they should be
> the same here.
>

I kept it separate for two reasons.

1. I am not sure this would have the same effect on all
microarchitectures. Keeping it separate allows each microarchitecture
to enable prefetch loop arrays and to aid the hw prefetcher (my goal
here) by limiting prefetch streams.

2. The values used for PARAM_SIMULTANEOUS_PREFETCHES seem to be
determined by experimentation rather than based on functional units
in hardware. A separate field also allows tuning the two separately.

Thanks,
Kugan
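
For illustration only (this is not code from the series): a minimal sketch
of how a separate hw_prefetchers_avail value could cap an unroll factor
independently of num_slots. The struct prefetch_tune_sketch and the helper
limit_unroll_for_hw_prefetchers are hypothetical names, and the sketch
assumes that each unrolled copy of the loop body adds its own set of load
streams, which need not hold for every loop.

```c
/* Simplified stand-in for cpu_prefetch_tune; only the two fields
   compared in this thread are kept.  Hypothetical, not GCC's struct.  */
struct prefetch_tune_sketch
{
  int num_slots;		/* Software prefetch slots (tuned value).  */
  int hw_prefetchers_avail;	/* Hardware prefetcher streams, -1 if unknown.  */
};

/* Hypothetical helper: return an unroll factor no larger than NUNROLL
   such that LOAD_STREAMS * factor does not exceed the number of
   hardware prefetchers.  Assumes every unrolled copy contributes
   LOAD_STREAMS additional streams.  */
static unsigned
limit_unroll_for_hw_prefetchers (const struct prefetch_tune_sketch *tune,
				 unsigned load_streams, unsigned nunroll)
{
  /* Unknown prefetcher count or no load streams: leave NUNROLL alone.  */
  if (tune->hw_prefetchers_avail <= 0 || load_streams == 0)
    return nunroll;

  unsigned max_unroll = (unsigned) tune->hw_prefetchers_avail / load_streams;

  /* Never go below a factor of 1 (i.e. no unrolling).  */
  if (max_unroll < 1)
    max_unroll = 1;

  return nunroll < max_unroll ? nunroll : max_unroll;
}
```

With hw_prefetchers_avail set to 7, as in the qdf24xx entry of the patch,
a loop with 2 load streams and a requested factor of 8 would be capped
at 3 under these assumptions. With -1 the requested factor is returned
unchanged, and num_slots plays no part in the decision.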

Patch

From 07de7988c4c36a8eb262d53c259dc17d20d3b770 Mon Sep 17 00:00:00 2001
From: Kugan Vivekanandarajah <kugan.vivekanandarajah@linaro.org>
Date: Fri, 25 Aug 2017 10:02:45 +1000
Subject: [PATCH 2/5] Add hw prefetchers to cpu_prefetch_tune

---
 gcc/config/aarch64/aarch64-protos.h |  1 +
 gcc/config/aarch64/aarch64.c        | 18 ++++++++++++------
 2 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index e397ff4..a182105 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -211,6 +211,7 @@  struct cpu_prefetch_tune
   const int l1_cache_line_size;
   const int l2_cache_size;
   const int default_opt_level;
+  const int hw_prefetchers_avail;
 };
 
 struct tune_params
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index d753666..7d1ee70 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -533,7 +533,8 @@  static const cpu_prefetch_tune generic_prefetch_tune =
   -1,			/* l1_cache_size  */
   -1,			/* l1_cache_line_size  */
   -1,			/* l2_cache_size  */
-  -1			/* default_opt_level  */
+  -1,			/* default_opt_level  */
+  -1			/* default hw_prefetchers_avail */
 };
 
 static const cpu_prefetch_tune exynosm1_prefetch_tune =
@@ -542,7 +543,8 @@  static const cpu_prefetch_tune exynosm1_prefetch_tune =
   -1,			/* l1_cache_size  */
   64,			/* l1_cache_line_size  */
   -1,			/* l2_cache_size  */
-  -1			/* default_opt_level  */
+  -1,			/* default_opt_level  */
+  -1			/* default hw_prefetchers_avail */
 };
 
 static const cpu_prefetch_tune qdf24xx_prefetch_tune =
@@ -551,7 +553,8 @@  static const cpu_prefetch_tune qdf24xx_prefetch_tune =
   32,			/* l1_cache_size  */
   64,			/* l1_cache_line_size  */
   1024,			/* l2_cache_size  */
-  3			/* default_opt_level  */
+  3,			/* default_opt_level  */
+  7			/* hw_prefetchers_avail */
 };
 
 static const cpu_prefetch_tune thunderxt88_prefetch_tune =
@@ -560,7 +563,8 @@  static const cpu_prefetch_tune thunderxt88_prefetch_tune =
   32,			/* l1_cache_size  */
   128,			/* l1_cache_line_size  */
   16*1024,		/* l2_cache_size  */
-  3			/* default_opt_level  */
+  3,			/* default_opt_level  */
+  -1			/* default hw_prefetchers_avail */
 };
 
 static const cpu_prefetch_tune thunderx_prefetch_tune =
@@ -569,7 +573,8 @@  static const cpu_prefetch_tune thunderx_prefetch_tune =
   32,			/* l1_cache_size  */
   128,			/* l1_cache_line_size  */
   -1,			/* l2_cache_size  */
-  -1			/* default_opt_level  */
+  -1,			/* default_opt_level  */
+  -1			/* default hw_prefetchers_avail */
 };
 
 static const cpu_prefetch_tune thunderx2t99_prefetch_tune =
@@ -578,7 +583,8 @@  static const cpu_prefetch_tune thunderx2t99_prefetch_tune =
   32,			/* l1_cache_size  */
   64,			/* l1_cache_line_size  */
   256,			/* l2_cache_size  */
-  -1			/* default_opt_level  */
+  -1,			/* default_opt_level  */
+  -1			/* default hw_prefetchers_avail */
 };
 
 static const struct tune_params generic_tunings =
-- 
2.7.4