Message ID | 1516628770-25036-3-git-send-email-luis.machado@linaro.org |
---|---|
State | New |
Headers | show |
Series | Add a couple new options to control loop prefetch pass | expand |
Hi Luis, On 22/01/18 13:46, Luis Machado wrote: > The following patch adds an option to control software prefetching of memory > references with non-constant/unknown strides. > > Currently we prefetch these references if the pass thinks there is benefit to > doing so. But, since this is all based on heuristics, it's not always the case > that we end up with better performance. > > For Falkor there is also the problem of conflicts with the hardware prefetcher, > so we need to be more conservative in terms of what we issue software prefetch > hints for. > > This also aligns GCC with what LLVM does for Falkor. > > Similarly to the previous patch, the defaults guarantee no change in behavior > for other targets and architectures. > > I've regression-tested and bootstrapped it on aarch64-linux. No problems found. > > Ok? > This also looks like a sensible approach to me with a caveat inline. The same general comments as for patch [1/2] apply. Thanks, Kyrill > 2018-01-22 Luis Machado <luis.machado@linaro.org> > > Introduce option to control whether the software prefetch pass should > issue prefetch hints for non-constant strides. > > gcc/ > * config/aarch64/aarch64-protos.h (cpu_prefetch_tune) > <prefetch_dynamic_strides>: New const unsigned int field. > * config/aarch64/aarch64.c (generic_prefetch_tune): Update to include > prefetch_dynamic_strides. > (exynosm1_prefetch_tune): Likewise. > (thunderxt88_prefetch_tune): Likewise. > (thunderx_prefetch_tune): Likewise. > (thunderx2t99_prefetch_tune): Likewise. > (qdf24xx_prefetch_tune): Likewise. Set prefetch_dynamic_strides to 0. > (aarch64_override_options_internal): Update to set > PARAM_PREFETCH_DYNAMIC_STRIDES. > * doc/invoke.texi (prefetch-dynamic-strides): Document new option. > * params.def (PARAM_PREFETCH_DYNAMIC_STRIDES): New. > * params.h (PARAM_PREFETCH_DYNAMIC_STRIDES): Define. > * tree-ssa-loop-prefetch.c (should_issue_prefetch_p): Account for > prefetch-dynamic-strides setting. > --- > gcc/config/aarch64/aarch64-protos.h | 3 +++ > gcc/config/aarch64/aarch64.c | 11 +++++++++++ > gcc/doc/invoke.texi | 10 ++++++++++ > gcc/params.def | 9 +++++++++ > gcc/params.h | 2 ++ > gcc/tree-ssa-loop-prefetch.c | 10 ++++++++++ > 6 files changed, 45 insertions(+) > > diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h > index 8736bd9..22bd9ae 100644 > --- a/gcc/config/aarch64/aarch64-protos.h > +++ b/gcc/config/aarch64/aarch64-protos.h > @@ -230,6 +230,9 @@ struct cpu_prefetch_tune > const int l1_cache_size; > const int l1_cache_line_size; > const int l2_cache_size; > + /* Whether software prefetch hints should be issued for non-constant > + strides. */ > + const unsigned int prefetch_dynamic_strides; I understand that the midend PARAMs are defined as integers, but I think the backend tuning option here is better represented as a boolean as it really is just a yes/no decision. > /* The minimum constant stride beyond which we should use prefetch > hints for. */ > const int minimum_stride; > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > index 0ed9f14..713b230 100644 > --- a/gcc/config/aarch64/aarch64.c > +++ b/gcc/config/aarch64/aarch64.c > @@ -547,6 +547,7 @@ static const cpu_prefetch_tune generic_prefetch_tune = > -1, /* l1_cache_size */ > -1, /* l1_cache_line_size */ > -1, /* l2_cache_size */ > + 1, /* prefetch_dynamic_strides */ > -1, /* minimum_stride */ > -1 /* default_opt_level */ > }; > @@ -557,6 +558,7 @@ static const cpu_prefetch_tune exynosm1_prefetch_tune = > -1, /* l1_cache_size */ > 64, /* l1_cache_line_size */ > -1, /* l2_cache_size */ > + 1, /* prefetch_dynamic_strides */ > -1, /* minimum_stride */ > -1 /* default_opt_level */ > }; > @@ -567,6 +569,7 @@ static const cpu_prefetch_tune qdf24xx_prefetch_tune = > 32, /* l1_cache_size */ > 64, /* l1_cache_line_size */ > 1024, /* l2_cache_size */ > + 0, /* prefetch_dynamic_strides */ > 2048, /* minimum_stride */ > 3 /* default_opt_level */ > }; > @@ -577,6 +580,7 @@ static const cpu_prefetch_tune thunderxt88_prefetch_tune = > 32, /* l1_cache_size */ > 128, /* l1_cache_line_size */ > 16*1024, /* l2_cache_size */ > + 1, /* prefetch_dynamic_strides */ > -1, /* minimum_stride */ > 3 /* default_opt_level */ > }; > @@ -587,6 +591,7 @@ static const cpu_prefetch_tune thunderx_prefetch_tune = > 32, /* l1_cache_size */ > 128, /* l1_cache_line_size */ > -1, /* l2_cache_size */ > + 1, /* prefetch_dynamic_strides */ > -1, /* minimum_stride */ > -1 /* default_opt_level */ > }; > @@ -597,6 +602,7 @@ static const cpu_prefetch_tune thunderx2t99_prefetch_tune = > 32, /* l1_cache_size */ > 64, /* l1_cache_line_size */ > 256, /* l2_cache_size */ > + 1, /* prefetch_dynamic_strides */ > -1, /* minimum_stride */ > -1 /* default_opt_level */ > }; > @@ -10467,6 +10473,11 @@ aarch64_override_options_internal (struct gcc_options *opts) > aarch64_tune_params.prefetch->l2_cache_size, > opts->x_param_values, > global_options_set.x_param_values); > + if (aarch64_tune_params.prefetch->prefetch_dynamic_strides == 0) > + maybe_set_param_value (PARAM_PREFETCH_DYNAMIC_STRIDES, > + aarch64_tune_params.prefetch->prefetch_dynamic_strides, > + opts->x_param_values, > + global_options_set.x_param_values); > if (aarch64_tune_params.prefetch->minimum_stride >= 0) > maybe_set_param_value (PARAM_PREFETCH_MINIMUM_STRIDE, > aarch64_tune_params.prefetch->minimum_stride, > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > index 1cb1ef5..541c24c 100644 > --- a/gcc/doc/invoke.texi > +++ b/gcc/doc/invoke.texi > @@ -10567,6 +10567,16 @@ The size of L1 cache, in kilobytes. > @item l2-cache-size > The size of L2 cache, in kilobytes. > > +@item prefetch-dynamic-strides > +Whether the loop array prefetch pass should issue software prefetch hints > +for strides that are non-constant. In some cases this may be > +beneficial, though the fact the stride is non-constant may make it > +hard to predict when there is clear benefit to issuing these hints. > + > +Set to 1, the default, if the prefetch hints should be issued for non-constant > +strides. Set to 0 if prefetch hints should be issued only for strides that > +are known to be constant and below @option{prefetch-minimum-stride}. > + > @item prefetch-minimum-stride > Minimum constant stride, in bytes, to start using prefetch hints for. If > the stride is less than this threshold, prefetch hints will not be issued. > diff --git a/gcc/params.def b/gcc/params.def > index bf2d12c..c564178 100644 > --- a/gcc/params.def > +++ b/gcc/params.def > @@ -790,6 +790,15 @@ DEFPARAM (PARAM_L2_CACHE_SIZE, > "The size of L2 cache.", > 512, 0, 0) > > +/* Whether software prefetch hints should be issued for non-constant > + strides. */ > + > +DEFPARAM (PARAM_PREFETCH_DYNAMIC_STRIDES, > + "prefetch-dynamic-strides", > + "Whether software prefetch hints should be issued for non-constant " > + "strides.", > + 1, 0, 1) > + > /* The minimum constant stride beyond which we should use prefetch hints > for. */ > > diff --git a/gcc/params.h b/gcc/params.h > index 96012db..8aa960a 100644 > --- a/gcc/params.h > +++ b/gcc/params.h > @@ -196,6 +196,8 @@ extern void init_param_values (int *params); > PARAM_VALUE (PARAM_L1_CACHE_LINE_SIZE) > #define L2_CACHE_SIZE \ > PARAM_VALUE (PARAM_L2_CACHE_SIZE) > +#define PREFETCH_DYNAMIC_STRIDES \ > + PARAM_VALUE (PARAM_PREFETCH_DYNAMIC_STRIDES) > #define PREFETCH_MINIMUM_STRIDE \ > PARAM_VALUE (PARAM_PREFETCH_MINIMUM_STRIDE) > #define USE_CANONICAL_TYPES \ > diff --git a/gcc/tree-ssa-loop-prefetch.c b/gcc/tree-ssa-loop-prefetch.c > index 112ccac..de2acc8 100644 > --- a/gcc/tree-ssa-loop-prefetch.c > +++ b/gcc/tree-ssa-loop-prefetch.c > @@ -992,6 +992,16 @@ prune_by_reuse (struct mem_ref_group *groups) > static bool > should_issue_prefetch_p (struct mem_ref *ref) > { > + /* Do we want to issue prefetches for non-constant strides? */ > + if (!cst_and_fits_in_hwi (ref->group->step) && PREFETCH_DYNAMIC_STRIDES == 0) > + { > + if (dump_file && (dump_flags & TDF_DETAILS)) > + fprintf (dump_file, > + "Skipping non-constant step for reference %u:%u\n", > + ref->group->uid, ref->uid); > + return false; > + } > + > /* Some processors may have a hardware prefetcher that may conflict with > prefetch hints for a range of strides. Make sure we don't issue > prefetches for such cases if the stride is within this particular > -- > 2.7.4 >
Hi, On 01/23/2018 07:40 AM, Kyrill Tkachov wrote: > Hi Luis, > > On 22/01/18 13:46, Luis Machado wrote: >> The following patch adds an option to control software prefetching of >> memory >> references with non-constant/unknown strides. >> >> Currently we prefetch these references if the pass thinks there is >> benefit to >> doing so. But, since this is all based on heuristics, it's not always >> the case >> that we end up with better performance. >> >> For Falkor there is also the problem of conflicts with the hardware >> prefetcher, >> so we need to be more conservative in terms of what we issue software >> prefetch >> hints for. >> >> This also aligns GCC with what LLVM does for Falkor. >> >> Similarly to the previous patch, the defaults guarantee no change in >> behavior >> for other targets and architectures. >> >> I've regression-tested and bootstrapped it on aarch64-linux. No >> problems found. >> >> Ok? >> > > This also looks like a sensible approach to me with a caveat inline. > The same general comments as for patch [1/2] apply. >> diff --git a/gcc/config/aarch64/aarch64-protos.h >> b/gcc/config/aarch64/aarch64-protos.h >> index 8736bd9..22bd9ae 100644 >> --- a/gcc/config/aarch64/aarch64-protos.h >> +++ b/gcc/config/aarch64/aarch64-protos.h >> @@ -230,6 +230,9 @@ struct cpu_prefetch_tune >> const int l1_cache_size; >> const int l1_cache_line_size; >> const int l2_cache_size; >> + /* Whether software prefetch hints should be issued for non-constant >> + strides. */ >> + const unsigned int prefetch_dynamic_strides; > > I understand that the midend PARAMs are defined as integers, but I think > the backend tuning option here is better represented as a boolean as it > really > is just a yes/no decision. > I started off with a boolean to be honest. Then i noticed the midend only used integers, which i restricted to the range of 0..1. I'll change this locally to use booleans again. Thanks! Luis
On 01/22/2018 06:46 AM, Luis Machado wrote: > The following patch adds an option to control software prefetching of memory > references with non-constant/unknown strides. > > Currently we prefetch these references if the pass thinks there is benefit to > doing so. But, since this is all based on heuristics, it's not always the case > that we end up with better performance. > > For Falkor there is also the problem of conflicts with the hardware prefetcher, > so we need to be more conservative in terms of what we issue software prefetch > hints for. > > This also aligns GCC with what LLVM does for Falkor. > > Similarly to the previous patch, the defaults guarantee no change in behavior > for other targets and architectures. > > I've regression-tested and bootstrapped it on aarch64-linux. No problems found. > > Ok? > > 2018-01-22 Luis Machado <luis.machado@linaro.org> > > Introduce option to control whether the software prefetch pass should > issue prefetch hints for non-constant strides. > > gcc/ > * config/aarch64/aarch64-protos.h (cpu_prefetch_tune) > <prefetch_dynamic_strides>: New const unsigned int field. > * config/aarch64/aarch64.c (generic_prefetch_tune): Update to include > prefetch_dynamic_strides. > (exynosm1_prefetch_tune): Likewise. > (thunderxt88_prefetch_tune): Likewise. > (thunderx_prefetch_tune): Likewise. > (thunderx2t99_prefetch_tune): Likewise. > (qdf24xx_prefetch_tune): Likewise. Set prefetch_dynamic_strides to 0. > (aarch64_override_options_internal): Update to set > PARAM_PREFETCH_DYNAMIC_STRIDES. > * doc/invoke.texi (prefetch-dynamic-strides): Document new option. > * params.def (PARAM_PREFETCH_DYNAMIC_STRIDES): New. > * params.h (PARAM_PREFETCH_DYNAMIC_STRIDES): Define. > * tree-ssa-loop-prefetch.c (should_issue_prefetch_p): Account for > prefetch-dynamic-strides setting. OK for the trunk. jeff
On 05/01/2018 03:30 PM, Jeff Law wrote: > On 01/22/2018 06:46 AM, Luis Machado wrote: >> The following patch adds an option to control software prefetching of memory >> references with non-constant/unknown strides. >> >> Currently we prefetch these references if the pass thinks there is benefit to >> doing so. But, since this is all based on heuristics, it's not always the case >> that we end up with better performance. >> >> For Falkor there is also the problem of conflicts with the hardware prefetcher, >> so we need to be more conservative in terms of what we issue software prefetch >> hints for. >> >> This also aligns GCC with what LLVM does for Falkor. >> >> Similarly to the previous patch, the defaults guarantee no change in behavior >> for other targets and architectures. >> >> I've regression-tested and bootstrapped it on aarch64-linux. No problems found. >> >> Ok? >> >> 2018-01-22 Luis Machado <luis.machado@linaro.org> >> >> Introduce option to control whether the software prefetch pass should >> issue prefetch hints for non-constant strides. >> >> gcc/ >> * config/aarch64/aarch64-protos.h (cpu_prefetch_tune) >> <prefetch_dynamic_strides>: New const unsigned int field. >> * config/aarch64/aarch64.c (generic_prefetch_tune): Update to include >> prefetch_dynamic_strides. >> (exynosm1_prefetch_tune): Likewise. >> (thunderxt88_prefetch_tune): Likewise. >> (thunderx_prefetch_tune): Likewise. >> (thunderx2t99_prefetch_tune): Likewise. >> (qdf24xx_prefetch_tune): Likewise. Set prefetch_dynamic_strides to 0. >> (aarch64_override_options_internal): Update to set >> PARAM_PREFETCH_DYNAMIC_STRIDES. >> * doc/invoke.texi (prefetch-dynamic-strides): Document new option. >> * params.def (PARAM_PREFETCH_DYNAMIC_STRIDES): New. >> * params.h (PARAM_PREFETCH_DYNAMIC_STRIDES): Define. >> * tree-ssa-loop-prefetch.c (should_issue_prefetch_p): Account for >> prefetch-dynamic-strides setting. > OK for the trunk. > jeff > Thanks. Committed as r259996.
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index 8736bd9..22bd9ae 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -230,6 +230,9 @@ struct cpu_prefetch_tune const int l1_cache_size; const int l1_cache_line_size; const int l2_cache_size; + /* Whether software prefetch hints should be issued for non-constant + strides. */ + const unsigned int prefetch_dynamic_strides; /* The minimum constant stride beyond which we should use prefetch hints for. */ const int minimum_stride; diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 0ed9f14..713b230 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -547,6 +547,7 @@ static const cpu_prefetch_tune generic_prefetch_tune = -1, /* l1_cache_size */ -1, /* l1_cache_line_size */ -1, /* l2_cache_size */ + 1, /* prefetch_dynamic_strides */ -1, /* minimum_stride */ -1 /* default_opt_level */ }; @@ -557,6 +558,7 @@ static const cpu_prefetch_tune exynosm1_prefetch_tune = -1, /* l1_cache_size */ 64, /* l1_cache_line_size */ -1, /* l2_cache_size */ + 1, /* prefetch_dynamic_strides */ -1, /* minimum_stride */ -1 /* default_opt_level */ }; @@ -567,6 +569,7 @@ static const cpu_prefetch_tune qdf24xx_prefetch_tune = 32, /* l1_cache_size */ 64, /* l1_cache_line_size */ 1024, /* l2_cache_size */ + 0, /* prefetch_dynamic_strides */ 2048, /* minimum_stride */ 3 /* default_opt_level */ }; @@ -577,6 +580,7 @@ static const cpu_prefetch_tune thunderxt88_prefetch_tune = 32, /* l1_cache_size */ 128, /* l1_cache_line_size */ 16*1024, /* l2_cache_size */ + 1, /* prefetch_dynamic_strides */ -1, /* minimum_stride */ 3 /* default_opt_level */ }; @@ -587,6 +591,7 @@ static const cpu_prefetch_tune thunderx_prefetch_tune = 32, /* l1_cache_size */ 128, /* l1_cache_line_size */ -1, /* l2_cache_size */ + 1, /* prefetch_dynamic_strides */ -1, /* minimum_stride */ -1 /* default_opt_level */ }; @@ -597,6 +602,7 @@ static const cpu_prefetch_tune thunderx2t99_prefetch_tune = 32, /* l1_cache_size */ 64, /* l1_cache_line_size */ 256, /* l2_cache_size */ + 1, /* prefetch_dynamic_strides */ -1, /* minimum_stride */ -1 /* default_opt_level */ }; @@ -10467,6 +10473,11 @@ aarch64_override_options_internal (struct gcc_options *opts) aarch64_tune_params.prefetch->l2_cache_size, opts->x_param_values, global_options_set.x_param_values); + if (aarch64_tune_params.prefetch->prefetch_dynamic_strides == 0) + maybe_set_param_value (PARAM_PREFETCH_DYNAMIC_STRIDES, + aarch64_tune_params.prefetch->prefetch_dynamic_strides, + opts->x_param_values, + global_options_set.x_param_values); if (aarch64_tune_params.prefetch->minimum_stride >= 0) maybe_set_param_value (PARAM_PREFETCH_MINIMUM_STRIDE, aarch64_tune_params.prefetch->minimum_stride, diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 1cb1ef5..541c24c 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -10567,6 +10567,16 @@ The size of L1 cache, in kilobytes. @item l2-cache-size The size of L2 cache, in kilobytes. +@item prefetch-dynamic-strides +Whether the loop array prefetch pass should issue software prefetch hints +for strides that are non-constant. In some cases this may be +beneficial, though the fact the stride is non-constant may make it +hard to predict when there is clear benefit to issuing these hints. + +Set to 1, the default, if the prefetch hints should be issued for non-constant +strides. Set to 0 if prefetch hints should be issued only for strides that +are known to be constant and below @option{prefetch-minimum-stride}. + @item prefetch-minimum-stride Minimum constant stride, in bytes, to start using prefetch hints for. If the stride is less than this threshold, prefetch hints will not be issued. diff --git a/gcc/params.def b/gcc/params.def index bf2d12c..c564178 100644 --- a/gcc/params.def +++ b/gcc/params.def @@ -790,6 +790,15 @@ DEFPARAM (PARAM_L2_CACHE_SIZE, "The size of L2 cache.", 512, 0, 0) +/* Whether software prefetch hints should be issued for non-constant + strides. */ + +DEFPARAM (PARAM_PREFETCH_DYNAMIC_STRIDES, + "prefetch-dynamic-strides", + "Whether software prefetch hints should be issued for non-constant " + "strides.", + 1, 0, 1) + /* The minimum constant stride beyond which we should use prefetch hints for. */ diff --git a/gcc/params.h b/gcc/params.h index 96012db..8aa960a 100644 --- a/gcc/params.h +++ b/gcc/params.h @@ -196,6 +196,8 @@ extern void init_param_values (int *params); PARAM_VALUE (PARAM_L1_CACHE_LINE_SIZE) #define L2_CACHE_SIZE \ PARAM_VALUE (PARAM_L2_CACHE_SIZE) +#define PREFETCH_DYNAMIC_STRIDES \ + PARAM_VALUE (PARAM_PREFETCH_DYNAMIC_STRIDES) #define PREFETCH_MINIMUM_STRIDE \ PARAM_VALUE (PARAM_PREFETCH_MINIMUM_STRIDE) #define USE_CANONICAL_TYPES \ diff --git a/gcc/tree-ssa-loop-prefetch.c b/gcc/tree-ssa-loop-prefetch.c index 112ccac..de2acc8 100644 --- a/gcc/tree-ssa-loop-prefetch.c +++ b/gcc/tree-ssa-loop-prefetch.c @@ -992,6 +992,16 @@ prune_by_reuse (struct mem_ref_group *groups) static bool should_issue_prefetch_p (struct mem_ref *ref) { + /* Do we want to issue prefetches for non-constant strides? */ + if (!cst_and_fits_in_hwi (ref->group->step) && PREFETCH_DYNAMIC_STRIDES == 0) + { + if (dump_file && (dump_flags & TDF_DETAILS)) + fprintf (dump_file, + "Skipping non-constant step for reference %u:%u\n", + ref->group->uid, ref->uid); + return false; + } + /* Some processors may have a hardware prefetcher that may conflict with prefetch hints for a range of strides. Make sure we don't issue prefetches for such cases if the stride is within this particular