diff mbox series

[2/2] Introduce prefetch-dynamic-strides option.

Message ID 1516628770-25036-3-git-send-email-luis.machado@linaro.org
State New
Headers show
Series Add a couple new options to control loop prefetch pass | expand

Commit Message

Luis Machado Jan. 22, 2018, 1:46 p.m. UTC
The following patch adds an option to control software prefetching of memory
references with non-constant/unknown strides.

Currently we prefetch these references if the pass thinks there is benefit to
doing so. But, since this is all based on heuristics, it's not always the case
that we end up with better performance.

For Falkor there is also the problem of conflicts with the hardware prefetcher,
so we need to be more conservative in terms of what we issue software prefetch
hints for.

This also aligns GCC with what LLVM does for Falkor.

Similarly to the previous patch, the defaults guarantee no change in behavior
for other targets and architectures.

I've regression-tested and bootstrapped it on aarch64-linux. No problems found.

Ok?

2018-01-22  Luis Machado  <luis.machado@linaro.org>

	Introduce option to control whether the software prefetch pass should
	issue prefetch hints for non-constant strides.

	gcc/
	* config/aarch64/aarch64-protos.h (cpu_prefetch_tune)
	<prefetch_dynamic_strides>: New const unsigned int field.
	* config/aarch64/aarch64.c (generic_prefetch_tune): Update to include
	prefetch_dynamic_strides.
	(exynosm1_prefetch_tune): Likewise.
	(thunderxt88_prefetch_tune): Likewise.
	(thunderx_prefetch_tune): Likewise.
	(thunderx2t99_prefetch_tune): Likewise.
	(qdf24xx_prefetch_tune): Likewise. Set prefetch_dynamic_strides to 0.
	(aarch64_override_options_internal): Update to set
	PARAM_PREFETCH_DYNAMIC_STRIDES.
	* doc/invoke.texi (prefetch-dynamic-strides): Document new option.
	* params.def (PARAM_PREFETCH_DYNAMIC_STRIDES): New.
	* params.h (PARAM_PREFETCH_DYNAMIC_STRIDES): Define.
	* tree-ssa-loop-prefetch.c (should_issue_prefetch_p): Account for
	prefetch-dynamic-strides setting.
---
 gcc/config/aarch64/aarch64-protos.h |  3 +++
 gcc/config/aarch64/aarch64.c        | 11 +++++++++++
 gcc/doc/invoke.texi                 | 10 ++++++++++
 gcc/params.def                      |  9 +++++++++
 gcc/params.h                        |  2 ++
 gcc/tree-ssa-loop-prefetch.c        | 10 ++++++++++
 6 files changed, 45 insertions(+)

-- 
2.7.4

Comments

Kyrill Tkachov Jan. 23, 2018, 9:40 a.m. UTC | #1
Hi Luis,

On 22/01/18 13:46, Luis Machado wrote:
> The following patch adds an option to control software prefetching of memory

> references with non-constant/unknown strides.

>

> Currently we prefetch these references if the pass thinks there is benefit to

> doing so. But, since this is all based on heuristics, it's not always the case

> that we end up with better performance.

>

> For Falkor there is also the problem of conflicts with the hardware prefetcher,

> so we need to be more conservative in terms of what we issue software prefetch

> hints for.

>

> This also aligns GCC with what LLVM does for Falkor.

>

> Similarly to the previous patch, the defaults guarantee no change in behavior

> for other targets and architectures.

>

> I've regression-tested and bootstrapped it on aarch64-linux. No problems found.

>

> Ok?

>


This also looks like a sensible approach to me with a caveat inline.
The same general comments as for patch [1/2] apply.

Thanks,
Kyrill

> 2018-01-22  Luis Machado  <luis.machado@linaro.org>

>

>         Introduce option to control whether the software prefetch pass should

>         issue prefetch hints for non-constant strides.

>

>         gcc/

>         * config/aarch64/aarch64-protos.h (cpu_prefetch_tune)

>         <prefetch_dynamic_strides>: New const unsigned int field.

>         * config/aarch64/aarch64.c (generic_prefetch_tune): Update to include

>         prefetch_dynamic_strides.

>         (exynosm1_prefetch_tune): Likewise.

>         (thunderxt88_prefetch_tune): Likewise.

>         (thunderx_prefetch_tune): Likewise.

>         (thunderx2t99_prefetch_tune): Likewise.

>         (qdf24xx_prefetch_tune): Likewise. Set prefetch_dynamic_strides to 0.

>         (aarch64_override_options_internal): Update to set

>         PARAM_PREFETCH_DYNAMIC_STRIDES.

>         * doc/invoke.texi (prefetch-dynamic-strides): Document new option.

>         * params.def (PARAM_PREFETCH_DYNAMIC_STRIDES): New.

>         * params.h (PARAM_PREFETCH_DYNAMIC_STRIDES): Define.

>         * tree-ssa-loop-prefetch.c (should_issue_prefetch_p): Account for

>         prefetch-dynamic-strides setting.

> ---

>  gcc/config/aarch64/aarch64-protos.h |  3 +++

>  gcc/config/aarch64/aarch64.c        | 11 +++++++++++

>  gcc/doc/invoke.texi                 | 10 ++++++++++

>  gcc/params.def                      |  9 +++++++++

>  gcc/params.h                        |  2 ++

>  gcc/tree-ssa-loop-prefetch.c        | 10 ++++++++++

>  6 files changed, 45 insertions(+)

>

> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h

> index 8736bd9..22bd9ae 100644

> --- a/gcc/config/aarch64/aarch64-protos.h

> +++ b/gcc/config/aarch64/aarch64-protos.h

> @@ -230,6 +230,9 @@ struct cpu_prefetch_tune

>    const int l1_cache_size;

>    const int l1_cache_line_size;

>    const int l2_cache_size;

> +  /* Whether software prefetch hints should be issued for non-constant

> +     strides.  */

> +  const unsigned int prefetch_dynamic_strides;


I understand that the midend PARAMs are defined as integers, but I think
the backend tuning option here is better represented as a boolean as it really
is just a yes/no decision.

>    /* The minimum constant stride beyond which we should use prefetch

>       hints for.  */

>    const int minimum_stride;

> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c

> index 0ed9f14..713b230 100644

> --- a/gcc/config/aarch64/aarch64.c

> +++ b/gcc/config/aarch64/aarch64.c

> @@ -547,6 +547,7 @@ static const cpu_prefetch_tune generic_prefetch_tune =

>    -1,                  /* l1_cache_size  */

>    -1,                  /* l1_cache_line_size  */

>    -1,                  /* l2_cache_size  */

> +  1,                   /* prefetch_dynamic_strides */

>    -1,                  /* minimum_stride */

>    -1                   /* default_opt_level  */

>  };

> @@ -557,6 +558,7 @@ static const cpu_prefetch_tune exynosm1_prefetch_tune =

>    -1,                  /* l1_cache_size  */

>    64,                  /* l1_cache_line_size  */

>    -1,                  /* l2_cache_size  */

> +  1,                   /* prefetch_dynamic_strides */

>    -1,                  /* minimum_stride */

>    -1                   /* default_opt_level  */

>  };

> @@ -567,6 +569,7 @@ static const cpu_prefetch_tune qdf24xx_prefetch_tune =

>    32,                  /* l1_cache_size  */

>    64,                  /* l1_cache_line_size  */

>    1024,                        /* l2_cache_size  */

> +  0,                   /* prefetch_dynamic_strides */

>    2048,                        /* minimum_stride */

>    3                    /* default_opt_level  */

>  };

> @@ -577,6 +580,7 @@ static const cpu_prefetch_tune thunderxt88_prefetch_tune =

>    32,                  /* l1_cache_size  */

>    128,                 /* l1_cache_line_size  */

>    16*1024,             /* l2_cache_size  */

> +  1,                   /* prefetch_dynamic_strides */

>    -1,                  /* minimum_stride */

>    3                    /* default_opt_level  */

>  };

> @@ -587,6 +591,7 @@ static const cpu_prefetch_tune thunderx_prefetch_tune =

>    32,                  /* l1_cache_size  */

>    128,                 /* l1_cache_line_size  */

>    -1,                  /* l2_cache_size  */

> +  1,                   /* prefetch_dynamic_strides */

>    -1,                  /* minimum_stride */

>    -1                   /* default_opt_level  */

>  };

> @@ -597,6 +602,7 @@ static const cpu_prefetch_tune thunderx2t99_prefetch_tune =

>    32,                  /* l1_cache_size  */

>    64,                  /* l1_cache_line_size  */

>    256,                 /* l2_cache_size  */

> +  1,                   /* prefetch_dynamic_strides */

>    -1,                  /* minimum_stride */

>    -1                   /* default_opt_level  */

>  };

> @@ -10467,6 +10473,11 @@ aarch64_override_options_internal (struct gcc_options *opts)

> aarch64_tune_params.prefetch->l2_cache_size,

>                             opts->x_param_values,

> global_options_set.x_param_values);

> +  if (aarch64_tune_params.prefetch->prefetch_dynamic_strides == 0)

> +    maybe_set_param_value (PARAM_PREFETCH_DYNAMIC_STRIDES,

> + aarch64_tune_params.prefetch->prefetch_dynamic_strides,

> +                          opts->x_param_values,

> + global_options_set.x_param_values);

>    if (aarch64_tune_params.prefetch->minimum_stride >= 0)

>      maybe_set_param_value (PARAM_PREFETCH_MINIMUM_STRIDE,

> aarch64_tune_params.prefetch->minimum_stride,

> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi

> index 1cb1ef5..541c24c 100644

> --- a/gcc/doc/invoke.texi

> +++ b/gcc/doc/invoke.texi

> @@ -10567,6 +10567,16 @@ The size of L1 cache, in kilobytes.

>  @item l2-cache-size

>  The size of L2 cache, in kilobytes.

>

> +@item prefetch-dynamic-strides

> +Whether the loop array prefetch pass should issue software prefetch hints

> +for strides that are non-constant.  In some cases this may be

> +beneficial, though the fact the stride is non-constant may make it

> +hard to predict when there is clear benefit to issuing these hints.

> +

> +Set to 1, the default, if the prefetch hints should be issued for non-constant

> +strides.  Set to 0 if prefetch hints should be issued only for strides that

> +are known to be constant and below @option{prefetch-minimum-stride}.

> +

>  @item prefetch-minimum-stride

>  Minimum constant stride, in bytes, to start using prefetch hints for.  If

>  the stride is less than this threshold, prefetch hints will not be issued.

> diff --git a/gcc/params.def b/gcc/params.def

> index bf2d12c..c564178 100644

> --- a/gcc/params.def

> +++ b/gcc/params.def

> @@ -790,6 +790,15 @@ DEFPARAM (PARAM_L2_CACHE_SIZE,

>            "The size of L2 cache.",

>            512, 0, 0)

>

> +/* Whether software prefetch hints should be issued for non-constant

> +   strides.  */

> +

> +DEFPARAM (PARAM_PREFETCH_DYNAMIC_STRIDES,

> +         "prefetch-dynamic-strides",

> +         "Whether software prefetch hints should be issued for non-constant "

> +         "strides.",

> +         1, 0, 1)

> +

>  /* The minimum constant stride beyond which we should use prefetch hints

>     for.  */

>

> diff --git a/gcc/params.h b/gcc/params.h

> index 96012db..8aa960a 100644

> --- a/gcc/params.h

> +++ b/gcc/params.h

> @@ -196,6 +196,8 @@ extern void init_param_values (int *params);

>    PARAM_VALUE (PARAM_L1_CACHE_LINE_SIZE)

>  #define L2_CACHE_SIZE \

>    PARAM_VALUE (PARAM_L2_CACHE_SIZE)

> +#define PREFETCH_DYNAMIC_STRIDES \

> +  PARAM_VALUE (PARAM_PREFETCH_DYNAMIC_STRIDES)

>  #define PREFETCH_MINIMUM_STRIDE \

>    PARAM_VALUE (PARAM_PREFETCH_MINIMUM_STRIDE)

>  #define USE_CANONICAL_TYPES \

> diff --git a/gcc/tree-ssa-loop-prefetch.c b/gcc/tree-ssa-loop-prefetch.c

> index 112ccac..de2acc8 100644

> --- a/gcc/tree-ssa-loop-prefetch.c

> +++ b/gcc/tree-ssa-loop-prefetch.c

> @@ -992,6 +992,16 @@ prune_by_reuse (struct mem_ref_group *groups)

>  static bool

>  should_issue_prefetch_p (struct mem_ref *ref)

>  {

> +  /* Do we want to issue prefetches for non-constant strides?  */

> +  if (!cst_and_fits_in_hwi (ref->group->step) && PREFETCH_DYNAMIC_STRIDES == 0)

> +    {

> +      if (dump_file && (dump_flags & TDF_DETAILS))

> +       fprintf (dump_file,

> +                "Skipping non-constant step for reference %u:%u\n",

> +                ref->group->uid, ref->uid);

> +      return false;

> +    }

> +

>    /* Some processors may have a hardware prefetcher that may conflict with

>       prefetch hints for a range of strides.  Make sure we don't issue

>       prefetches for such cases if the stride is within this particular

> -- 

> 2.7.4

>
Luis Machado Jan. 23, 2018, 1:17 p.m. UTC | #2
Hi,

On 01/23/2018 07:40 AM, Kyrill Tkachov wrote:
> Hi Luis,

> 

> On 22/01/18 13:46, Luis Machado wrote:

>> The following patch adds an option to control software prefetching of 

>> memory

>> references with non-constant/unknown strides.

>>

>> Currently we prefetch these references if the pass thinks there is 

>> benefit to

>> doing so. But, since this is all based on heuristics, it's not always 

>> the case

>> that we end up with better performance.

>>

>> For Falkor there is also the problem of conflicts with the hardware 

>> prefetcher,

>> so we need to be more conservative in terms of what we issue software 

>> prefetch

>> hints for.

>>

>> This also aligns GCC with what LLVM does for Falkor.

>>

>> Similarly to the previous patch, the defaults guarantee no change in 

>> behavior

>> for other targets and architectures.

>>

>> I've regression-tested and bootstrapped it on aarch64-linux. No 

>> problems found.

>>

>> Ok?

>>

> 

> This also looks like a sensible approach to me with a caveat inline.

> The same general comments as for patch [1/2] apply.

>> diff --git a/gcc/config/aarch64/aarch64-protos.h 

>> b/gcc/config/aarch64/aarch64-protos.h

>> index 8736bd9..22bd9ae 100644

>> --- a/gcc/config/aarch64/aarch64-protos.h

>> +++ b/gcc/config/aarch64/aarch64-protos.h

>> @@ -230,6 +230,9 @@ struct cpu_prefetch_tune

>>    const int l1_cache_size;

>>    const int l1_cache_line_size;

>>    const int l2_cache_size;

>> +  /* Whether software prefetch hints should be issued for non-constant

>> +     strides.  */

>> +  const unsigned int prefetch_dynamic_strides;

> 

> I understand that the midend PARAMs are defined as integers, but I think

> the backend tuning option here is better represented as a boolean as it 

> really

> is just a yes/no decision.

>

I started off with a boolean to be honest. Then i noticed the midend 
only used integers, which i restricted to the range of 0..1.

I'll change this locally to use booleans again.

Thanks!
Luis
Jeff Law May 1, 2018, 6:30 p.m. UTC | #3
On 01/22/2018 06:46 AM, Luis Machado wrote:
> The following patch adds an option to control software prefetching of memory

> references with non-constant/unknown strides.

> 

> Currently we prefetch these references if the pass thinks there is benefit to

> doing so. But, since this is all based on heuristics, it's not always the case

> that we end up with better performance.

> 

> For Falkor there is also the problem of conflicts with the hardware prefetcher,

> so we need to be more conservative in terms of what we issue software prefetch

> hints for.

> 

> This also aligns GCC with what LLVM does for Falkor.

> 

> Similarly to the previous patch, the defaults guarantee no change in behavior

> for other targets and architectures.

> 

> I've regression-tested and bootstrapped it on aarch64-linux. No problems found.

> 

> Ok?

> 

> 2018-01-22  Luis Machado  <luis.machado@linaro.org>

> 

> 	Introduce option to control whether the software prefetch pass should

> 	issue prefetch hints for non-constant strides.

> 

> 	gcc/

> 	* config/aarch64/aarch64-protos.h (cpu_prefetch_tune)

> 	<prefetch_dynamic_strides>: New const unsigned int field.

> 	* config/aarch64/aarch64.c (generic_prefetch_tune): Update to include

> 	prefetch_dynamic_strides.

> 	(exynosm1_prefetch_tune): Likewise.

> 	(thunderxt88_prefetch_tune): Likewise.

> 	(thunderx_prefetch_tune): Likewise.

> 	(thunderx2t99_prefetch_tune): Likewise.

> 	(qdf24xx_prefetch_tune): Likewise. Set prefetch_dynamic_strides to 0.

> 	(aarch64_override_options_internal): Update to set

> 	PARAM_PREFETCH_DYNAMIC_STRIDES.

> 	* doc/invoke.texi (prefetch-dynamic-strides): Document new option.

> 	* params.def (PARAM_PREFETCH_DYNAMIC_STRIDES): New.

> 	* params.h (PARAM_PREFETCH_DYNAMIC_STRIDES): Define.

> 	* tree-ssa-loop-prefetch.c (should_issue_prefetch_p): Account for

> 	prefetch-dynamic-strides setting.

OK for the trunk.
jeff
Luis Machado May 7, 2018, 2:13 p.m. UTC | #4
On 05/01/2018 03:30 PM, Jeff Law wrote:
> On 01/22/2018 06:46 AM, Luis Machado wrote:

>> The following patch adds an option to control software prefetching of memory

>> references with non-constant/unknown strides.

>>

>> Currently we prefetch these references if the pass thinks there is benefit to

>> doing so. But, since this is all based on heuristics, it's not always the case

>> that we end up with better performance.

>>

>> For Falkor there is also the problem of conflicts with the hardware prefetcher,

>> so we need to be more conservative in terms of what we issue software prefetch

>> hints for.

>>

>> This also aligns GCC with what LLVM does for Falkor.

>>

>> Similarly to the previous patch, the defaults guarantee no change in behavior

>> for other targets and architectures.

>>

>> I've regression-tested and bootstrapped it on aarch64-linux. No problems found.

>>

>> Ok?

>>

>> 2018-01-22  Luis Machado  <luis.machado@linaro.org>

>>

>> 	Introduce option to control whether the software prefetch pass should

>> 	issue prefetch hints for non-constant strides.

>>

>> 	gcc/

>> 	* config/aarch64/aarch64-protos.h (cpu_prefetch_tune)

>> 	<prefetch_dynamic_strides>: New const unsigned int field.

>> 	* config/aarch64/aarch64.c (generic_prefetch_tune): Update to include

>> 	prefetch_dynamic_strides.

>> 	(exynosm1_prefetch_tune): Likewise.

>> 	(thunderxt88_prefetch_tune): Likewise.

>> 	(thunderx_prefetch_tune): Likewise.

>> 	(thunderx2t99_prefetch_tune): Likewise.

>> 	(qdf24xx_prefetch_tune): Likewise. Set prefetch_dynamic_strides to 0.

>> 	(aarch64_override_options_internal): Update to set

>> 	PARAM_PREFETCH_DYNAMIC_STRIDES.

>> 	* doc/invoke.texi (prefetch-dynamic-strides): Document new option.

>> 	* params.def (PARAM_PREFETCH_DYNAMIC_STRIDES): New.

>> 	* params.h (PARAM_PREFETCH_DYNAMIC_STRIDES): Define.

>> 	* tree-ssa-loop-prefetch.c (should_issue_prefetch_p): Account for

>> 	prefetch-dynamic-strides setting.

> OK for the trunk.

> jeff

> 


Thanks. Committed as r259996.
diff mbox series

Patch

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 8736bd9..22bd9ae 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -230,6 +230,9 @@  struct cpu_prefetch_tune
   const int l1_cache_size;
   const int l1_cache_line_size;
   const int l2_cache_size;
+  /* Whether software prefetch hints should be issued for non-constant
+     strides.  */
+  const unsigned int prefetch_dynamic_strides;
   /* The minimum constant stride beyond which we should use prefetch
      hints for.  */
   const int minimum_stride;
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 0ed9f14..713b230 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -547,6 +547,7 @@  static const cpu_prefetch_tune generic_prefetch_tune =
   -1,			/* l1_cache_size  */
   -1,			/* l1_cache_line_size  */
   -1,			/* l2_cache_size  */
+  1,			/* prefetch_dynamic_strides */
   -1,			/* minimum_stride */
   -1			/* default_opt_level  */
 };
@@ -557,6 +558,7 @@  static const cpu_prefetch_tune exynosm1_prefetch_tune =
   -1,			/* l1_cache_size  */
   64,			/* l1_cache_line_size  */
   -1,			/* l2_cache_size  */
+  1,			/* prefetch_dynamic_strides */
   -1,			/* minimum_stride */
   -1			/* default_opt_level  */
 };
@@ -567,6 +569,7 @@  static const cpu_prefetch_tune qdf24xx_prefetch_tune =
   32,			/* l1_cache_size  */
   64,			/* l1_cache_line_size  */
   1024,			/* l2_cache_size  */
+  0,			/* prefetch_dynamic_strides */
   2048,			/* minimum_stride */
   3			/* default_opt_level  */
 };
@@ -577,6 +580,7 @@  static const cpu_prefetch_tune thunderxt88_prefetch_tune =
   32,			/* l1_cache_size  */
   128,			/* l1_cache_line_size  */
   16*1024,		/* l2_cache_size  */
+  1,			/* prefetch_dynamic_strides */
   -1,			/* minimum_stride */
   3			/* default_opt_level  */
 };
@@ -587,6 +591,7 @@  static const cpu_prefetch_tune thunderx_prefetch_tune =
   32,			/* l1_cache_size  */
   128,			/* l1_cache_line_size  */
   -1,			/* l2_cache_size  */
+  1,			/* prefetch_dynamic_strides */
   -1,			/* minimum_stride */
   -1			/* default_opt_level  */
 };
@@ -597,6 +602,7 @@  static const cpu_prefetch_tune thunderx2t99_prefetch_tune =
   32,			/* l1_cache_size  */
   64,			/* l1_cache_line_size  */
   256,			/* l2_cache_size  */
+  1,			/* prefetch_dynamic_strides */
   -1,			/* minimum_stride */
   -1			/* default_opt_level  */
 };
@@ -10467,6 +10473,11 @@  aarch64_override_options_internal (struct gcc_options *opts)
 			   aarch64_tune_params.prefetch->l2_cache_size,
 			   opts->x_param_values,
 			   global_options_set.x_param_values);
+  if (aarch64_tune_params.prefetch->prefetch_dynamic_strides == 0)
+    maybe_set_param_value (PARAM_PREFETCH_DYNAMIC_STRIDES,
+			   aarch64_tune_params.prefetch->prefetch_dynamic_strides,
+			   opts->x_param_values,
+			   global_options_set.x_param_values);
   if (aarch64_tune_params.prefetch->minimum_stride >= 0)
     maybe_set_param_value (PARAM_PREFETCH_MINIMUM_STRIDE,
 			   aarch64_tune_params.prefetch->minimum_stride,
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 1cb1ef5..541c24c 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -10567,6 +10567,16 @@  The size of L1 cache, in kilobytes.
 @item l2-cache-size
 The size of L2 cache, in kilobytes.
 
+@item prefetch-dynamic-strides
+Whether the loop array prefetch pass should issue software prefetch hints
+for strides that are non-constant.  In some cases this may be
+beneficial, though the fact the stride is non-constant may make it
+hard to predict when there is clear benefit to issuing these hints.
+
+Set to 1, the default, if the prefetch hints should be issued for non-constant
+strides.  Set to 0 if prefetch hints should be issued only for strides that
+are known to be constant and below @option{prefetch-minimum-stride}.
+
 @item prefetch-minimum-stride
 Minimum constant stride, in bytes, to start using prefetch hints for.  If
 the stride is less than this threshold, prefetch hints will not be issued.
diff --git a/gcc/params.def b/gcc/params.def
index bf2d12c..c564178 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -790,6 +790,15 @@  DEFPARAM (PARAM_L2_CACHE_SIZE,
 	  "The size of L2 cache.",
 	  512, 0, 0)
 
+/* Whether software prefetch hints should be issued for non-constant
+   strides.  */
+
+DEFPARAM (PARAM_PREFETCH_DYNAMIC_STRIDES,
+	  "prefetch-dynamic-strides",
+	  "Whether software prefetch hints should be issued for non-constant "
+	  "strides.",
+	  1, 0, 1)
+
 /* The minimum constant stride beyond which we should use prefetch hints
    for.  */
 
diff --git a/gcc/params.h b/gcc/params.h
index 96012db..8aa960a 100644
--- a/gcc/params.h
+++ b/gcc/params.h
@@ -196,6 +196,8 @@  extern void init_param_values (int *params);
   PARAM_VALUE (PARAM_L1_CACHE_LINE_SIZE)
 #define L2_CACHE_SIZE \
   PARAM_VALUE (PARAM_L2_CACHE_SIZE)
+#define PREFETCH_DYNAMIC_STRIDES \
+  PARAM_VALUE (PARAM_PREFETCH_DYNAMIC_STRIDES)
 #define PREFETCH_MINIMUM_STRIDE \
   PARAM_VALUE (PARAM_PREFETCH_MINIMUM_STRIDE)
 #define USE_CANONICAL_TYPES \
diff --git a/gcc/tree-ssa-loop-prefetch.c b/gcc/tree-ssa-loop-prefetch.c
index 112ccac..de2acc8 100644
--- a/gcc/tree-ssa-loop-prefetch.c
+++ b/gcc/tree-ssa-loop-prefetch.c
@@ -992,6 +992,16 @@  prune_by_reuse (struct mem_ref_group *groups)
 static bool
 should_issue_prefetch_p (struct mem_ref *ref)
 {
+  /* Do we want to issue prefetches for non-constant strides?  */
+  if (!cst_and_fits_in_hwi (ref->group->step) && PREFETCH_DYNAMIC_STRIDES == 0)
+    {
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file,
+		 "Skipping non-constant step for reference %u:%u\n",
+		 ref->group->uid, ref->uid);
+      return false;
+    }
+
   /* Some processors may have a hardware prefetcher that may conflict with
      prefetch hints for a range of strides.  Make sure we don't issue
      prefetches for such cases if the stride is within this particular