Message ID | 1452513883-25826-1-git-send-email-james.greenhalgh@arm.com |
---|---|
State | New |
Headers | show |
On Mon, Jan 11, 2016 at 12:04:43PM +0000, James Greenhalgh wrote: > > Hi, > > I've seen a couple of large performance issues caused by expanding > the high-precision reciprocal square root for Cortex-A57, so I'd like > to turn it off by default. > > This is good for art (~2%) from Spec2000, bad (~3.5%) for fma3d from > Spec2000, good (~5.5%) for gromcas from Spec2006, and very good (>10%) for > some private microbenchmark kernels which stress the divide/sqrt/multiply > units. It therefore seems to me to be the correct choice to make across > a number of workloads. > > Bootstrapped and tested on aarch64-none-linux-gnu with no issues. > > OK? *Ping* Thanks, James > --- > 2015-12-11 James Greenhalgh <james.greenhalgh@arm.com> > > * config/aarch64/aarch64.c (cortexa57_tunings): Remove > AARCH64_EXTRA_TUNE_RECIP_SQRT. > > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > index 1d5d898..999c9fc 100644 > --- a/gcc/config/aarch64/aarch64.c > +++ b/gcc/config/aarch64/aarch64.c > @@ -484,8 +484,7 @@ static const struct tune_params cortexa57_tunings = > 0, /* max_case_values. */ > 0, /* cache_line_size. */ > tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ > - (AARCH64_EXTRA_TUNE_RENAME_FMA_REGS > - | AARCH64_EXTRA_TUNE_RECIP_SQRT) /* tune_flags. */ > + (AARCH64_EXTRA_TUNE_RENAME_FMA_REGS) /* tune_flags. */ > }; > > static const struct tune_params cortexa72_tunings =
On Mon, Jan 25, 2016 at 11:20:46AM +0000, James Greenhalgh wrote: > On Mon, Jan 11, 2016 at 12:04:43PM +0000, James Greenhalgh wrote: > > > > Hi, > > > > I've seen a couple of large performance issues caused by expanding > > the high-precision reciprocal square root for Cortex-A57, so I'd like > > to turn it off by default. > > > > This is good for art (~2%) from Spec2000, bad (~3.5%) for fma3d from > > Spec2000, good (~5.5%) for gromcas from Spec2006, and very good (>10%) for > > some private microbenchmark kernels which stress the divide/sqrt/multiply > > units. It therefore seems to me to be the correct choice to make across > > a number of workloads. > > > > Bootstrapped and tested on aarch64-none-linux-gnu with no issues. > > > > OK? > > *Ping* *pingx2* Thanks, James > > --- > > 2015-12-11 James Greenhalgh <james.greenhalgh@arm.com> > > > > * config/aarch64/aarch64.c (cortexa57_tunings): Remove > > AARCH64_EXTRA_TUNE_RECIP_SQRT. > > > > > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > > index 1d5d898..999c9fc 100644 > > --- a/gcc/config/aarch64/aarch64.c > > +++ b/gcc/config/aarch64/aarch64.c > > @@ -484,8 +484,7 @@ static const struct tune_params cortexa57_tunings = > > 0, /* max_case_values. */ > > 0, /* cache_line_size. */ > > tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ > > - (AARCH64_EXTRA_TUNE_RENAME_FMA_REGS > > - | AARCH64_EXTRA_TUNE_RECIP_SQRT) /* tune_flags. */ > > + (AARCH64_EXTRA_TUNE_RENAME_FMA_REGS) /* tune_flags. */ > > }; > > > > static const struct tune_params cortexa72_tunings = >
On Mon, Feb 01, 2016 at 02:00:01PM +0000, James Greenhalgh wrote: > On Mon, Jan 25, 2016 at 11:20:46AM +0000, James Greenhalgh wrote: > > On Mon, Jan 11, 2016 at 12:04:43PM +0000, James Greenhalgh wrote: > > > > > > Hi, > > > > > > I've seen a couple of large performance issues caused by expanding > > > the high-precision reciprocal square root for Cortex-A57, so I'd like > > > to turn it off by default. > > > > > > This is good for art (~2%) from Spec2000, bad (~3.5%) for fma3d from > > > Spec2000, good (~5.5%) for gromcas from Spec2006, and very good (>10%) for > > > some private microbenchmark kernels which stress the divide/sqrt/multiply > > > units. It therefore seems to me to be the correct choice to make across > > > a number of workloads. > > > > > > Bootstrapped and tested on aarch64-none-linux-gnu with no issues. > > > > > > OK? > > > > *Ping* > > *pingx2* *ping^3* Thanks, James > > > --- > > > 2015-12-11 James Greenhalgh <james.greenhalgh@arm.com> > > > > > > * config/aarch64/aarch64.c (cortexa57_tunings): Remove > > > AARCH64_EXTRA_TUNE_RECIP_SQRT. > > > > > > > > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > > > index 1d5d898..999c9fc 100644 > > > --- a/gcc/config/aarch64/aarch64.c > > > +++ b/gcc/config/aarch64/aarch64.c > > > @@ -484,8 +484,7 @@ static const struct tune_params cortexa57_tunings = > > > 0, /* max_case_values. */ > > > 0, /* cache_line_size. */ > > > tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ > > > - (AARCH64_EXTRA_TUNE_RENAME_FMA_REGS > > > - | AARCH64_EXTRA_TUNE_RECIP_SQRT) /* tune_flags. */ > > > + (AARCH64_EXTRA_TUNE_RENAME_FMA_REGS) /* tune_flags. */ > > > }; > > > > > > static const struct tune_params cortexa72_tunings = > > >
On Mon, Feb 08, 2016 at 10:57:10AM +0000, James Greenhalgh wrote: > On Mon, Feb 01, 2016 at 02:00:01PM +0000, James Greenhalgh wrote: > > On Mon, Jan 25, 2016 at 11:20:46AM +0000, James Greenhalgh wrote: > > > On Mon, Jan 11, 2016 at 12:04:43PM +0000, James Greenhalgh wrote: > > > > > > > > Hi, > > > > > > > > I've seen a couple of large performance issues caused by expanding > > > > the high-precision reciprocal square root for Cortex-A57, so I'd like > > > > to turn it off by default. > > > > > > > > This is good for art (~2%) from Spec2000, bad (~3.5%) for fma3d from > > > > Spec2000, good (~5.5%) for gromcas from Spec2006, and very good (>10%) for > > > > some private microbenchmark kernels which stress the divide/sqrt/multiply > > > > units. It therefore seems to me to be the correct choice to make across > > > > a number of workloads. > > > > > > > > Bootstrapped and tested on aarch64-none-linux-gnu with no issues. > > > > > > > > OK? > > > > > > *Ping* > > > > *pingx2* > > *ping^3* *ping^4* Thanks, James > > > > --- > > > > 2015-12-11 James Greenhalgh <james.greenhalgh@arm.com> > > > > > > > > * config/aarch64/aarch64.c (cortexa57_tunings): Remove > > > > AARCH64_EXTRA_TUNE_RECIP_SQRT. > > > > > > > > > > > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > > > > index 1d5d898..999c9fc 100644 > > > > --- a/gcc/config/aarch64/aarch64.c > > > > +++ b/gcc/config/aarch64/aarch64.c > > > > @@ -484,8 +484,7 @@ static const struct tune_params cortexa57_tunings = > > > > 0, /* max_case_values. */ > > > > 0, /* cache_line_size. */ > > > > tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ > > > > - (AARCH64_EXTRA_TUNE_RENAME_FMA_REGS > > > > - | AARCH64_EXTRA_TUNE_RECIP_SQRT) /* tune_flags. */ > > > > + (AARCH64_EXTRA_TUNE_RENAME_FMA_REGS) /* tune_flags. */ > > > > }; > > > > > > > > static const struct tune_params cortexa72_tunings = > > > > > >
On 11 January 2016 at 12:04, James Greenhalgh <james.greenhalgh@arm.com> wrote: > 2015-12-11 James Greenhalgh <james.greenhalgh@arm.com> > > * config/aarch64/aarch64.c (cortexa57_tunings): Remove > AARCH64_EXTRA_TUNE_RECIP_SQRT. > OK /Marcus
On Mon, Feb 15, 2016 at 11:24:53AM -0600, Evandro Menezes wrote: > On 02/15/16 04:50, James Greenhalgh wrote: > >On Mon, Feb 08, 2016 at 10:57:10AM +0000, James Greenhalgh wrote: > >>On Mon, Feb 01, 2016 at 02:00:01PM +0000, James Greenhalgh wrote: > >>>On Mon, Jan 25, 2016 at 11:20:46AM +0000, James Greenhalgh wrote: > >>>>On Mon, Jan 11, 2016 at 12:04:43PM +0000, James Greenhalgh wrote: > >>>>>Hi, > >>>>> > >>>>>I've seen a couple of large performance issues caused by expanding > >>>>>the high-precision reciprocal square root for Cortex-A57, so I'd like > >>>>>to turn it off by default. > >>>>> > >>>>>This is good for art (~2%) from Spec2000, bad (~3.5%) for fma3d from > >>>>>Spec2000, good (~5.5%) for gromcas from Spec2006, and very good (>10%) for > >>>>>some private microbenchmark kernels which stress the divide/sqrt/multiply > >>>>>units. It therefore seems to me to be the correct choice to make across > >>>>>a number of workloads. > >>>>> > >>>>>Bootstrapped and tested on aarch64-none-linux-gnu with no issues. > >>>>> > >>>>>OK? > >>>>*Ping* > >>>*pingx2* > >>*ping^3* > >*ping^4* > > > >Thanks, > >James > > > >>>>>--- > >>>>>2015-12-11 James Greenhalgh <james.greenhalgh@arm.com> > >>>>> > >>>>> * config/aarch64/aarch64.c (cortexa57_tunings): Remove > >>>>> AARCH64_EXTRA_TUNE_RECIP_SQRT. > >>>>> > >>>>>diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > >>>>>index 1d5d898..999c9fc 100644 > >>>>>--- a/gcc/config/aarch64/aarch64.c > >>>>>+++ b/gcc/config/aarch64/aarch64.c > >>>>>@@ -484,8 +484,7 @@ static const struct tune_params cortexa57_tunings = > >>>>> 0, /* max_case_values. */ > >>>>> 0, /* cache_line_size. */ > >>>>> tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ > >>>>>- (AARCH64_EXTRA_TUNE_RENAME_FMA_REGS > >>>>>- | AARCH64_EXTRA_TUNE_RECIP_SQRT) /* tune_flags. */ > >>>>>+ (AARCH64_EXTRA_TUNE_RENAME_FMA_REGS) /* tune_flags. */ > >>>>> }; > >>>>> static const struct tune_params cortexa72_tunings = > > > > James, > > There seem to be SPEC CPU2000fp validation issues on A57 when this > flag is present too. Though I evaluated the algorithm with a huge > random set of values, always delivering accuracy around 1ulp, which > should be enough for CPU2000fp (wit x86-64), I expected the > benchmarks to pass. > > My suspicion is that the Newton series on AArch64 is probably good > only for SP. Then, DP might require an extra round, probably > exacerbating the performance penalty. > > I'd like to try to split this tuning option into one for SP and > another for DP. Thoughts? I haven't seen validation issues with the default expansion, but with -mlow-precision-recip-sqrt I do see failures. I think this is to be expected. I don't support splitting the low-precision flag to "-mlow-precision-float-recip-sqrt" and "-mlow-precision-double-recip-sqrt", I think that is pushing a particular set of Spec tuning flags over any meaningful use case. I could imagine a case for splitting the internal tuning flag to give AARCH64_EXTRA_TUNE_SF_RECIP_SQRT and AARCH64_EXTRA_TUNE_DF_RECIP_SQRT, but I'm not sure I understand the benefits of this? Certainly, I think your goals for performance (turn on for 64-bit divide/sqrt) would contradict your goals for accuracy (turn off for 64-bit divide/sqrt). I'm happy with these flags as they are, but I might be missing a subtelty in your argument? Thanks, James
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 1d5d898..999c9fc 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -484,8 +484,7 @@ static const struct tune_params cortexa57_tunings = 0, /* max_case_values. */ 0, /* cache_line_size. */ tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model. */ - (AARCH64_EXTRA_TUNE_RENAME_FMA_REGS - | AARCH64_EXTRA_TUNE_RECIP_SQRT) /* tune_flags. */ + (AARCH64_EXTRA_TUNE_RENAME_FMA_REGS) /* tune_flags. */ }; static const struct tune_params cortexa72_tunings =