diff mbox series

[AArch64] Adjust tuning parameters for Falkor

Message ID 1510711256-1931-1-git-send-email-luis.machado@linaro.org
State New
Headers show
Series [AArch64] Adjust tuning parameters for Falkor | expand

Commit Message

Luis Machado Nov. 15, 2017, 2 a.m. UTC
Disabling software prefetching and switching the autoprefetcher to weak improves
CPU2017 rate and speed benchmarks for both int and fp sets on Falkor.

SPECrate 2017 fp is up 0.38%
SPECspeed 2017 fp is up 0.54%
SPECrate 2017 int is up 3.02%
SPECspeed 2017 int is up 3.16%

There are only a couple individual regressions. The biggest one being about 4%
in parest.

For SPEC2006, we've noticed the following:

SPECint is up 0.91%
SPECfp is stable

In the case of SPEC2006 we noticed both a big regression in mcf (about 20%)
and a big improvement for hmmer (about 40%).

Since the overall result is positive, we would like to make these new tuning
settings the default for Falkor.

We may revisit the software prefetcher setting in the future, in case we
can adjust it enough so it provides us a good balance between improvements and
regressions (mcf). But for now it is best if it stays off.

I understand the freeze is happening soon, so it would be great to have this
in before then.

OK?

Thanks,
Luis

2017-11-14  Luis Machado  <luis.machado@linaro.org>

	* config/aarch64/aarch64.c (qdf24xx_prefetch_tune): Remove.
	(qdf24xx_tunings): Replace qdf24xx_prefetch_tune with
	generic_prefetch_tune and tune_params::AUTOPREFETCHER_STRONG with
	tune_params::AUTOPREFETCHER_WEAK.
---
 gcc/ChangeLog                |  7 +++++++
 gcc/config/aarch64/aarch64.c | 13 ++-----------
 2 files changed, 9 insertions(+), 11 deletions(-)

-- 
2.7.4

Comments

Andrew Pinski Nov. 15, 2017, 2:07 a.m. UTC | #1
On Tue, Nov 14, 2017 at 6:00 PM, Luis Machado <luis.machado@linaro.org> wrote:
> Disabling software prefetching and switching the autoprefetcher to weak improves

> CPU2017 rate and speed benchmarks for both int and fp sets on Falkor.

>

> SPECrate 2017 fp is up 0.38%

> SPECspeed 2017 fp is up 0.54%

> SPECrate 2017 int is up 3.02%

> SPECspeed 2017 int is up 3.16%

>

> There are only a couple individual regressions. The biggest one being about 4%

> in parest.

>

> For SPEC2006, we've noticed the following:

>

> SPECint is up 0.91%

> SPECfp is stable

>

> In the case of SPEC2006 we noticed both a big regression in mcf (about 20%)

> and a big improvement for hmmer (about 40%).

>

> Since the overall result is positive, we would like to make these new tuning

> settings the default for Falkor.

>

> We may revisit the software prefetcher setting in the future, in case we

> can adjust it enough so it provides us a good balance between improvements and

> regressions (mcf). But for now it is best if it stays off.

>

> I understand the freeze is happening soon, so it would be great to have this

> in before then.

>

> OK?

>

> Thanks,

> Luis

>

> 2017-11-14  Luis Machado  <luis.machado@linaro.org>

>

>         * config/aarch64/aarch64.c (qdf24xx_prefetch_tune): Remove.

>         (qdf24xx_tunings): Replace qdf24xx_prefetch_tune with

>         generic_prefetch_tune and tune_params::AUTOPREFETCHER_STRONG with

>         tune_params::AUTOPREFETCHER_WEAK.

> ---

>  gcc/ChangeLog                |  7 +++++++

>  gcc/config/aarch64/aarch64.c | 13 ++-----------

>  2 files changed, 9 insertions(+), 11 deletions(-)

>

> diff --git a/gcc/ChangeLog b/gcc/ChangeLog

> index b80a421..4dbfda0 100644

> --- a/gcc/ChangeLog

> +++ b/gcc/ChangeLog

> @@ -1,3 +1,10 @@

> +2017-11-14  Luis Machado  <luis.machado@linaro.org>

> +

> +       * config/aarch64/aarch64.c (qdf24xx_prefetch_tune): Remove.

> +       (qdf24xx_tunings): Replace qdf24xx_prefetch_tune with

> +       generic_prefetch_tune and tune_params::AUTOPREFETCHER_STRONG with

> +       tune_params::AUTOPREFETCHER_WEAK.

> +

>  2017-11-14  Carl Love  <cel@us.ibm.com>

>

>         * config/rs6000/rs6000.c (swap_endian_selector_for_mode): Remove

> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c

> index 0c67e2b..171a230 100644

> --- a/gcc/config/aarch64/aarch64.c

> +++ b/gcc/config/aarch64/aarch64.c

> @@ -502,15 +502,6 @@ static const cpu_prefetch_tune exynosm1_prefetch_tune =

>    -1                   /* default_opt_level  */

>  };

>

> -static const cpu_prefetch_tune qdf24xx_prefetch_tune =

> -{

> -  4,                   /* num_slots  */

> -  32,                  /* l1_cache_size  */

> -  64,                  /* l1_cache_line_size  */

> -  1024,                        /* l2_cache_size  */

> -  3                    /* default_opt_level  */


I think the best thing is to leave this tuning structure in place and
just change default_opt_level   to -1 to disable it at -O3.

Thanks,
Andrew

> -};

> -

>  static const cpu_prefetch_tune thunderxt88_prefetch_tune =

>  {

>    8,                   /* num_slots  */

> @@ -817,9 +808,9 @@ static const struct tune_params qdf24xx_tunings =

>    2,   /* min_div_recip_mul_sf.  */

>    2,   /* min_div_recip_mul_df.  */

>    0,   /* max_case_values.  */

> -  tune_params::AUTOPREFETCHER_STRONG,  /* autoprefetcher_model.  */

> +  tune_params::AUTOPREFETCHER_WEAK,    /* autoprefetcher_model.  */

>    (AARCH64_EXTRA_TUNE_NONE),           /* tune_flags.  */

> -  &qdf24xx_prefetch_tune

> +  &generic_prefetch_tune

>  };

>

>  /* Tuning structure for the Qualcomm Saphira core.  Default to falkor values

> --

> 2.7.4

><div id="DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2"><br />

<table style="border-top: 1px solid #D3D4DE;">
	<tr>
        <td style="width: 55px; padding-top: 13px;"><a
href="http://www.avg.com/email-signature?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=webmail"
target="_blank"><img
src="https://ipmcdn.avast.com/images/icons/icon-envelope-tick-green-avg-v1.png"
alt="" width="46" height="29" style="width: 46px; height: 29px;"
/></a></td>
		<td style="width: 470px; padding-top: 12px; color: #41424e;
font-size: 13px; font-family: Arial, Helvetica, sans-serif;
line-height: 18px;">Virus-free. <a
href="http://www.avg.com/email-signature?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=webmail"
target="_blank" style="color: #4453ea;">www.avg.com</a>
		</td>
	</tr>
</table><a href="#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2" width="1"
height="1"></a></div>
Luis Machado Nov. 15, 2017, 3 a.m. UTC | #2
> I think the best thing is to leave this tuning structure in place and

> just change default_opt_level   to -1 to disable it at -O3.

> 

> Thanks,

> Andrew

> 


Indeed that seems to be more appropriate if re-enabling prefetches in the
future is a possibility.

How about the following?

Thanks,
Luis

2017-11-15  Luis Machado  <luis.machado@linaro.org>

	gcc/
	* config/aarch64/aarch64.c
	(qdf24xx_prefetch_tune) <default_opt_level>: Set to -1.
	(qdf24xx_tunings) <autoprefetcher_model>: Set to
	tune_params::AUTOPREFETCHER_WEAK.
---
 gcc/ChangeLog                | 7 +++++++
 gcc/config/aarch64/aarch64.c | 4 ++--
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index b80a421..0e05f9e 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,10 @@
+2017-11-15  Luis Machado  <luis.machado@linaro.org>
+
+	* config/aarch64/aarch64.c
+	(qdf24xx_prefetch_tune) <default_opt_level>: Set to -1.
+	(qdf24xx_tunings) <autoprefetcher_model>: Set to
+	tune_params::AUTOPREFETCHER_WEAK.
+
 2017-11-14  Carl Love  <cel@us.ibm.com>
 
 	* config/rs6000/rs6000.c (swap_endian_selector_for_mode): Remove
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 0c67e2b..8779cad 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -508,7 +508,7 @@ static const cpu_prefetch_tune qdf24xx_prefetch_tune =
   32,			/* l1_cache_size  */
   64,			/* l1_cache_line_size  */
   1024,			/* l2_cache_size  */
-  3			/* default_opt_level  */
+  -1			/* default_opt_level  */
 };
 
 static const cpu_prefetch_tune thunderxt88_prefetch_tune =
@@ -817,7 +817,7 @@ static const struct tune_params qdf24xx_tunings =
   2,	/* min_div_recip_mul_sf.  */
   2,	/* min_div_recip_mul_df.  */
   0,	/* max_case_values.  */
-  tune_params::AUTOPREFETCHER_STRONG,	/* autoprefetcher_model.  */
+  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_NONE),		/* tune_flags.  */
   &qdf24xx_prefetch_tune
 };
-- 
2.7.4
Luis Machado Nov. 16, 2017, 8:05 p.m. UTC | #3
On 15 November 2017 at 01:00, Luis Machado <luis.machado@linaro.org> wrote:

> > I think the best thing is to leave this tuning structure in place and

> > just change default_opt_level   to -1 to disable it at -O3.

> >

> > Thanks,

> > Andrew

> >

>

> Indeed that seems to be more appropriate if re-enabling prefetches in the

> future is a possibility.

>

> How about the following?

>

> Thanks,

> Luis

>


Ping?
Kyrill Tkachov Nov. 17, 2017, 9:25 a.m. UTC | #4
Hi Luis,

[cc'ing aarch64 maintainers, it's quicker to get review that way]

On 15/11/17 03:00, Luis Machado wrote:
> > I think the best thing is to leave this tuning structure in place and

> > just change default_opt_level   to -1 to disable it at -O3.

> >

> > Thanks,

> > Andrew

> >

>

> Indeed that seems to be more appropriate if re-enabling prefetches in the

> future is a possibility.

>

> How about the following?

>


This looks correct to me to achieve what you want to achieve,
but I can't approve it myself so you'll need an ok from an aarch64 
maintainer.

Thanks,
Kyrill

> Thanks,

> Luis

>

> 2017-11-15  Luis Machado  <luis.machado@linaro.org>

>

>         gcc/

>         * config/aarch64/aarch64.c

>         (qdf24xx_prefetch_tune) <default_opt_level>: Set to -1.

>         (qdf24xx_tunings) <autoprefetcher_model>: Set to

>         tune_params::AUTOPREFETCHER_WEAK.

> ---

>  gcc/ChangeLog                | 7 +++++++

>  gcc/config/aarch64/aarch64.c | 4 ++--

>  2 files changed, 9 insertions(+), 2 deletions(-)

>

> diff --git a/gcc/ChangeLog b/gcc/ChangeLog

> index b80a421..0e05f9e 100644

> --- a/gcc/ChangeLog

> +++ b/gcc/ChangeLog

> @@ -1,3 +1,10 @@

> +2017-11-15  Luis Machado  <luis.machado@linaro.org>

> +

> +       * config/aarch64/aarch64.c

> +       (qdf24xx_prefetch_tune) <default_opt_level>: Set to -1.

> +       (qdf24xx_tunings) <autoprefetcher_model>: Set to

> +       tune_params::AUTOPREFETCHER_WEAK.

> +

>  2017-11-14  Carl Love  <cel@us.ibm.com>

>

>          * config/rs6000/rs6000.c (swap_endian_selector_for_mode): Remove

> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c

> index 0c67e2b..8779cad 100644

> --- a/gcc/config/aarch64/aarch64.c

> +++ b/gcc/config/aarch64/aarch64.c

> @@ -508,7 +508,7 @@ static const cpu_prefetch_tune qdf24xx_prefetch_tune =

>    32,                  /* l1_cache_size  */

>    64,                  /* l1_cache_line_size  */

>    1024,                        /* l2_cache_size  */

> -  3                    /* default_opt_level  */

> +  -1                   /* default_opt_level  */

>  };

>

>  static const cpu_prefetch_tune thunderxt88_prefetch_tune =

> @@ -817,7 +817,7 @@ static const struct tune_params qdf24xx_tunings =

>    2,   /* min_div_recip_mul_sf.  */

>    2,   /* min_div_recip_mul_df.  */

>    0,   /* max_case_values.  */

> -  tune_params::AUTOPREFETCHER_STRONG,  /* autoprefetcher_model.  */

> +  tune_params::AUTOPREFETCHER_WEAK,    /* autoprefetcher_model.  */

>    (AARCH64_EXTRA_TUNE_NONE),           /* tune_flags.  */

>    &qdf24xx_prefetch_tune

>  };

> -- 

> 2.7.4

>
Luis Machado Nov. 17, 2017, 3:46 p.m. UTC | #5
On 11/17/2017 07:25 AM, Kyrill Tkachov wrote:
> Hi Luis,

> 

> [cc'ing aarch64 maintainers, it's quicker to get review that way]

> 

> On 15/11/17 03:00, Luis Machado wrote:

>> > I think the best thing is to leave this tuning structure in place and

>> > just change default_opt_level   to -1 to disable it at -O3.

>> >

>> > Thanks,

>> > Andrew

>> >

>>

>> Indeed that seems to be more appropriate if re-enabling prefetches in the

>> future is a possibility.

>>

>> How about the following?

>>

> 

> This looks correct to me to achieve what you want to achieve,

> but I can't approve it myself so you'll need an ok from an aarch64 

> maintainer.

> 

> Thanks,

> Kyrill

> 


Thanks for the feedback. I'll wait for one of the maintainers to OK it.

Luis
James Greenhalgh Nov. 17, 2017, 3:48 p.m. UTC | #6
On Wed, Nov 15, 2017 at 03:00:53AM +0000, Luis Machado wrote:
> > I think the best thing is to leave this tuning structure in place and

> > just change default_opt_level   to -1 to disable it at -O3.

> > 

> > Thanks,

> > Andrew

> > 

> 

> Indeed that seems to be more appropriate if re-enabling prefetches in the

> future is a possibility.

> 

> How about the following?


This is OK.

Thanks,
James

> 2017-11-15  Luis Machado  <luis.machado@linaro.org>

> 

> 	gcc/

> 	* config/aarch64/aarch64.c

> 	(qdf24xx_prefetch_tune) <default_opt_level>: Set to -1.

> 	(qdf24xx_tunings) <autoprefetcher_model>: Set to

> 	tune_params::AUTOPREFETCHER_WEAK.

> ---

>  gcc/ChangeLog                | 7 +++++++

>  gcc/config/aarch64/aarch64.c | 4 ++--

>  2 files changed, 9 insertions(+), 2 deletions(-)

> 

> diff --git a/gcc/ChangeLog b/gcc/ChangeLog

> index b80a421..0e05f9e 100644

> --- a/gcc/ChangeLog

> +++ b/gcc/ChangeLog

> @@ -1,3 +1,10 @@

> +2017-11-15  Luis Machado  <luis.machado@linaro.org>

> +

> +	* config/aarch64/aarch64.c

> +	(qdf24xx_prefetch_tune) <default_opt_level>: Set to -1.

> +	(qdf24xx_tunings) <autoprefetcher_model>: Set to

> +	tune_params::AUTOPREFETCHER_WEAK.

> +

>  2017-11-14  Carl Love  <cel@us.ibm.com>

>  

>  	* config/rs6000/rs6000.c (swap_endian_selector_for_mode): Remove

> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c

> index 0c67e2b..8779cad 100644

> --- a/gcc/config/aarch64/aarch64.c

> +++ b/gcc/config/aarch64/aarch64.c

> @@ -508,7 +508,7 @@ static const cpu_prefetch_tune qdf24xx_prefetch_tune =

>    32,			/* l1_cache_size  */

>    64,			/* l1_cache_line_size  */

>    1024,			/* l2_cache_size  */

> -  3			/* default_opt_level  */

> +  -1			/* default_opt_level  */

>  };

>  

>  static const cpu_prefetch_tune thunderxt88_prefetch_tune =

> @@ -817,7 +817,7 @@ static const struct tune_params qdf24xx_tunings =

>    2,	/* min_div_recip_mul_sf.  */

>    2,	/* min_div_recip_mul_df.  */

>    0,	/* max_case_values.  */

> -  tune_params::AUTOPREFETCHER_STRONG,	/* autoprefetcher_model.  */

> +  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */

>    (AARCH64_EXTRA_TUNE_NONE),		/* tune_flags.  */

>    &qdf24xx_prefetch_tune

>  };

> -- 

> 2.7.4

>
Luis Machado Nov. 17, 2017, 4:04 p.m. UTC | #7
On 11/17/2017 01:48 PM, James Greenhalgh wrote:
> On Wed, Nov 15, 2017 at 03:00:53AM +0000, Luis Machado wrote:

>>> I think the best thing is to leave this tuning structure in place and

>>> just change default_opt_level   to -1 to disable it at -O3.

>>>

>>> Thanks,

>>> Andrew

>>>

>>

>> Indeed that seems to be more appropriate if re-enabling prefetches in the

>> future is a possibility.

>>

>> How about the following?

> 

> This is OK.

> 

> Thanks,

> James


Thanks James. I've pushed this now.

Luis
diff mbox series

Patch

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index b80a421..4dbfda0 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,10 @@ 
+2017-11-14  Luis Machado  <luis.machado@linaro.org>
+
+	* config/aarch64/aarch64.c (qdf24xx_prefetch_tune): Remove.
+	(qdf24xx_tunings): Replace qdf24xx_prefetch_tune with
+	generic_prefetch_tune and tune_params::AUTOPREFETCHER_STRONG with
+	tune_params::AUTOPREFETCHER_WEAK.
+
 2017-11-14  Carl Love  <cel@us.ibm.com>
 
 	* config/rs6000/rs6000.c (swap_endian_selector_for_mode): Remove
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 0c67e2b..171a230 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -502,15 +502,6 @@  static const cpu_prefetch_tune exynosm1_prefetch_tune =
   -1			/* default_opt_level  */
 };
 
-static const cpu_prefetch_tune qdf24xx_prefetch_tune =
-{
-  4,			/* num_slots  */
-  32,			/* l1_cache_size  */
-  64,			/* l1_cache_line_size  */
-  1024,			/* l2_cache_size  */
-  3			/* default_opt_level  */
-};
-
 static const cpu_prefetch_tune thunderxt88_prefetch_tune =
 {
   8,			/* num_slots  */
@@ -817,9 +808,9 @@  static const struct tune_params qdf24xx_tunings =
   2,	/* min_div_recip_mul_sf.  */
   2,	/* min_div_recip_mul_df.  */
   0,	/* max_case_values.  */
-  tune_params::AUTOPREFETCHER_STRONG,	/* autoprefetcher_model.  */
+  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_NONE),		/* tune_flags.  */
-  &qdf24xx_prefetch_tune
+  &generic_prefetch_tune
 };
 
 /* Tuning structure for the Qualcomm Saphira core.  Default to falkor values