[RFC,1/5] Add separate parms for rtl unroller

Message ID CAELXzTMdH_scC+o3xFy2mBTakFDohccDLF8ZgX1AcSS03s7k7Q@mail.gmail.com
State New
Headers show
Series
  • Loop unrolling and memory load streams
Related show

Commit Message

Kugan Vivekanandarajah Sept. 15, 2017, 1:27 a.m.
This patch adds separate params for rtl unroller so that they can be
tunned accordingly. Default values I have are based on some testing on
aarch64. I am happy to leave it as the current value and set them in
the back-end.

Thanks,
Kugan


gcc/ChangeLog:

2017-09-12  Kugan Vivekanandarajah  <kuganv@linaro.org>

    * loop-unroll.c (decide_unroll_constant_iterations): Use new params.
    (decide_unroll_runtime_iterations): Likewise.
    (decide_unroll_stupid): Likewise.
    * params.def (DEFPARAM): Separate and add new params for rtl unroller.

Comments

Richard Biener Sept. 15, 2017, 9:31 a.m. | #1
On Fri, Sep 15, 2017 at 3:27 AM, Kugan Vivekanandarajah
<kugan.vivekanandarajah@linaro.org> wrote:
> This patch adds separate params for rtl unroller so that they can be

> tunned accordingly. Default values I have are based on some testing on

> aarch64. I am happy to leave it as the current value and set them in

> the back-end.


PARAM_MAX_AVERAGE_UNROLLED_INSNS is only used by the RTL
unroller.  Why should we separate PARAM_MAX_UNROLL_TIMES?

PARAM_MAX_UNROLLED_INSNS is only used by gimple passes
that perform unrolling.  Since GIMPLE is three-address it should
match RTL reasonably well -- but I'd be ok in having a separate param
for those.  But I wouldn't name those 'partial'.

That said, those are magic numbers and I expect we can find some
that work well on RTL and GIMPLE.

Richard.

>

> Thanks,

> Kugan

>

>

> gcc/ChangeLog:

>

> 2017-09-12  Kugan Vivekanandarajah  <kuganv@linaro.org>

>

>     * loop-unroll.c (decide_unroll_constant_iterations): Use new params.

>     (decide_unroll_runtime_iterations): Likewise.

>     (decide_unroll_stupid): Likewise.

>     * params.def (DEFPARAM): Separate and add new params for rtl unroller.
Kugan Vivekanandarajah Sept. 18, 2017, 1:36 a.m. | #2
Hi Richard,

On 15 September 2017 at 19:31, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Fri, Sep 15, 2017 at 3:27 AM, Kugan Vivekanandarajah

> <kugan.vivekanandarajah@linaro.org> wrote:

>> This patch adds separate params for rtl unroller so that they can be

>> tunned accordingly. Default values I have are based on some testing on

>> aarch64. I am happy to leave it as the current value and set them in

>> the back-end.

>

> PARAM_MAX_AVERAGE_UNROLLED_INSNS is only used by the RTL

> unroller.  Why should we separate PARAM_MAX_UNROLL_TIMES?

>

> PARAM_MAX_UNROLLED_INSNS is only used by gimple passes

> that perform unrolling.  Since GIMPLE is three-address it should

> match RTL reasonably well -- but I'd be ok in having a separate param

> for those.  But I wouldn't name those 'partial'.

>

> That said, those are magic numbers and I expect we can find some

> that work well on RTL and GIMPLE.


Thanks for the review. I am mostly interested in having separate
params for RTL runtime unrolling as this is different to what GIMPLE
unroller does. May be I should have separate params only for the
runtime unrolling (or the partial unroller)  and let RTL/GIMPLE share
the other. Any preference here ? Any preference on the name ?

I am suspecting that RTL unroller which does the same as GIMPLE is
kind of obsolete now?

Thanks,
Kugan





> Richard.

>

>>

>> Thanks,

>> Kugan

>>

>>

>> gcc/ChangeLog:

>>

>> 2017-09-12  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>

>>     * loop-unroll.c (decide_unroll_constant_iterations): Use new params.

>>     (decide_unroll_runtime_iterations): Likewise.

>>     (decide_unroll_stupid): Likewise.

>>     * params.def (DEFPARAM): Separate and add new params for rtl unroller.
Richard Biener Sept. 18, 2017, 7:50 a.m. | #3
On Mon, Sep 18, 2017 at 3:36 AM, Kugan Vivekanandarajah
<kugan.vivekanandarajah@linaro.org> wrote:
> Hi Richard,

>

> On 15 September 2017 at 19:31, Richard Biener

> <richard.guenther@gmail.com> wrote:

>> On Fri, Sep 15, 2017 at 3:27 AM, Kugan Vivekanandarajah

>> <kugan.vivekanandarajah@linaro.org> wrote:

>>> This patch adds separate params for rtl unroller so that they can be

>>> tunned accordingly. Default values I have are based on some testing on

>>> aarch64. I am happy to leave it as the current value and set them in

>>> the back-end.

>>

>> PARAM_MAX_AVERAGE_UNROLLED_INSNS is only used by the RTL

>> unroller.  Why should we separate PARAM_MAX_UNROLL_TIMES?

>>

>> PARAM_MAX_UNROLLED_INSNS is only used by gimple passes

>> that perform unrolling.  Since GIMPLE is three-address it should

>> match RTL reasonably well -- but I'd be ok in having a separate param

>> for those.  But I wouldn't name those 'partial'.

>>

>> That said, those are magic numbers and I expect we can find some

>> that work well on RTL and GIMPLE.

>

> Thanks for the review. I am mostly interested in having separate

> params for RTL runtime unrolling as this is different to what GIMPLE

> unroller does.


Why?  Do you just want to have more magic knobs to machine-auto-tune?

> May be I should have separate params only for the

> runtime unrolling (or the partial unroller)  and let RTL/GIMPLE share

> the other. Any preference here ? Any preference on the name ?

>

> I am suspecting that RTL unroller which does the same as GIMPLE is

> kind of obsolete now?


We do not have a GIMPLE loop unroller pass.  On GIMPLE we only do
complete peeling as a separate pass and several passes perform
unrolling as part of their transform.

Richard.

> Thanks,

> Kugan

>

>

>

>

>

>> Richard.

>>

>>>

>>> Thanks,

>>> Kugan

>>>

>>>

>>> gcc/ChangeLog:

>>>

>>> 2017-09-12  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>>

>>>     * loop-unroll.c (decide_unroll_constant_iterations): Use new params.

>>>     (decide_unroll_runtime_iterations): Likewise.

>>>     (decide_unroll_stupid): Likewise.

>>>     * params.def (DEFPARAM): Separate and add new params for rtl unroller.
Kugan Vivekanandarajah Sept. 19, 2017, 6:25 a.m. | #4
Hi Richard,

On 18 September 2017 at 17:50, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Mon, Sep 18, 2017 at 3:36 AM, Kugan Vivekanandarajah

> <kugan.vivekanandarajah@linaro.org> wrote:

>> Hi Richard,

>>

>> On 15 September 2017 at 19:31, Richard Biener

>> <richard.guenther@gmail.com> wrote:

>>> On Fri, Sep 15, 2017 at 3:27 AM, Kugan Vivekanandarajah

>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>> This patch adds separate params for rtl unroller so that they can be

>>>> tunned accordingly. Default values I have are based on some testing on

>>>> aarch64. I am happy to leave it as the current value and set them in

>>>> the back-end.

>>>

>>> PARAM_MAX_AVERAGE_UNROLLED_INSNS is only used by the RTL

>>> unroller.  Why should we separate PARAM_MAX_UNROLL_TIMES?

>>>

>>> PARAM_MAX_UNROLLED_INSNS is only used by gimple passes

>>> that perform unrolling.  Since GIMPLE is three-address it should

>>> match RTL reasonably well -- but I'd be ok in having a separate param

>>> for those.  But I wouldn't name those 'partial'.

>>>

>>> That said, those are magic numbers and I expect we can find some

>>> that work well on RTL and GIMPLE.

>>

>> Thanks for the review. I am mostly interested in having separate

>> params for RTL runtime unrolling as this is different to what GIMPLE

>> unroller does.

>

> Why?  Do you just want to have more magic knobs to machine-auto-tune?

>

>> May be I should have separate params only for the

>> runtime unrolling (or the partial unroller)  and let RTL/GIMPLE share

>> the other. Any preference here ? Any preference on the name ?

>>

>> I am suspecting that RTL unroller which does the same as GIMPLE is

>> kind of obsolete now?

>

> We do not have a GIMPLE loop unroller pass.  On GIMPLE we only do

> complete peeling as a separate pass and several passes perform

> unrolling as part of their transform.


Sorry for my terminology. I was referring to complete unrolling of
loops in tree-sea-loop-ivcanon.c. In any case, what I am interested in
is partial unrolling of loops in loop-unroll.c. For Falkor at least,
partially unrolling of small loops that satisfies certain condition
seems to improve performance. Since this is not enabled by default and
some of the params are also shared by other optimisations (as you also
have pointed out), I thought that having separate params for
loop-unroll.c will limit the interference of changing this params on
other optimisations. My initial patch was wrong and this should only
separate params that are shared.

And also, is anyone working on a partial unroller for Gimple? I would
like to give it a try if no one is working on this. Is there any
preferences here.

Thanks,
Kugan




> Richard.

>

>> Thanks,

>> Kugan

>>

>>

>>

>>

>>

>>> Richard.

>>>

>>>>

>>>> Thanks,

>>>> Kugan

>>>>

>>>>

>>>> gcc/ChangeLog:

>>>>

>>>> 2017-09-12  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>>>

>>>>     * loop-unroll.c (decide_unroll_constant_iterations): Use new params.

>>>>     (decide_unroll_runtime_iterations): Likewise.

>>>>     (decide_unroll_stupid): Likewise.

>>>>     * params.def (DEFPARAM): Separate and add new params for rtl unroller.
Richard Biener Sept. 19, 2017, 11:45 a.m. | #5
On Tue, Sep 19, 2017 at 8:25 AM, Kugan Vivekanandarajah
<kugan.vivekanandarajah@linaro.org> wrote:
> Hi Richard,

>

> On 18 September 2017 at 17:50, Richard Biener

> <richard.guenther@gmail.com> wrote:

>> On Mon, Sep 18, 2017 at 3:36 AM, Kugan Vivekanandarajah

>> <kugan.vivekanandarajah@linaro.org> wrote:

>>> Hi Richard,

>>>

>>> On 15 September 2017 at 19:31, Richard Biener

>>> <richard.guenther@gmail.com> wrote:

>>>> On Fri, Sep 15, 2017 at 3:27 AM, Kugan Vivekanandarajah

>>>> <kugan.vivekanandarajah@linaro.org> wrote:

>>>>> This patch adds separate params for rtl unroller so that they can be

>>>>> tunned accordingly. Default values I have are based on some testing on

>>>>> aarch64. I am happy to leave it as the current value and set them in

>>>>> the back-end.

>>>>

>>>> PARAM_MAX_AVERAGE_UNROLLED_INSNS is only used by the RTL

>>>> unroller.  Why should we separate PARAM_MAX_UNROLL_TIMES?

>>>>

>>>> PARAM_MAX_UNROLLED_INSNS is only used by gimple passes

>>>> that perform unrolling.  Since GIMPLE is three-address it should

>>>> match RTL reasonably well -- but I'd be ok in having a separate param

>>>> for those.  But I wouldn't name those 'partial'.

>>>>

>>>> That said, those are magic numbers and I expect we can find some

>>>> that work well on RTL and GIMPLE.

>>>

>>> Thanks for the review. I am mostly interested in having separate

>>> params for RTL runtime unrolling as this is different to what GIMPLE

>>> unroller does.

>>

>> Why?  Do you just want to have more magic knobs to machine-auto-tune?

>>

>>> May be I should have separate params only for the

>>> runtime unrolling (or the partial unroller)  and let RTL/GIMPLE share

>>> the other. Any preference here ? Any preference on the name ?

>>>

>>> I am suspecting that RTL unroller which does the same as GIMPLE is

>>> kind of obsolete now?

>>

>> We do not have a GIMPLE loop unroller pass.  On GIMPLE we only do

>> complete peeling as a separate pass and several passes perform

>> unrolling as part of their transform.

>

> Sorry for my terminology. I was referring to complete unrolling of

> loops in tree-sea-loop-ivcanon.c. In any case, what I am interested in

> is partial unrolling of loops in loop-unroll.c. For Falkor at least,

> partially unrolling of small loops that satisfies certain condition

> seems to improve performance. Since this is not enabled by default and

> some of the params are also shared by other optimisations (as you also

> have pointed out), I thought that having separate params for

> loop-unroll.c will limit the interference of changing this params on

> other optimisations. My initial patch was wrong and this should only

> separate params that are shared.


Well, as the complete peeling doesn't use those params but only
passes that perform "partial" unrolling I see nothing wrong with sharing
the parameters.

> And also, is anyone working on a partial unroller for Gimple? I would

> like to give it a try if no one is working on this. Is there any

> preferences here.


People have talked about it but I'm not aware of actual implementations.
Though an implementation is quite trivial with the interesting piece being
the costing (as usual).

Richard.

> Thanks,

> Kugan

>

>

>

>

>> Richard.

>>

>>> Thanks,

>>> Kugan

>>>

>>>

>>>

>>>

>>>

>>>> Richard.

>>>>

>>>>>

>>>>> Thanks,

>>>>> Kugan

>>>>>

>>>>>

>>>>> gcc/ChangeLog:

>>>>>

>>>>> 2017-09-12  Kugan Vivekanandarajah  <kuganv@linaro.org>

>>>>>

>>>>>     * loop-unroll.c (decide_unroll_constant_iterations): Use new params.

>>>>>     (decide_unroll_runtime_iterations): Likewise.

>>>>>     (decide_unroll_stupid): Likewise.

>>>>>     * params.def (DEFPARAM): Separate and add new params for rtl unroller.

Patch

From a899caf9f82767de3db556225b28dc52a81d5967 Mon Sep 17 00:00:00 2001
From: Kugan Vivekanandarajah <kugan.vivekanandarajah@linaro.org>
Date: Mon, 14 Aug 2017 10:12:09 +1000
Subject: [PATCH 1/5] add parms for rtl unroller

---
 gcc/loop-unroll.c | 24 ++++++++++++------------
 gcc/params.def    | 17 +++++++++++++++++
 2 files changed, 29 insertions(+), 12 deletions(-)

diff --git a/gcc/loop-unroll.c b/gcc/loop-unroll.c
index 84145bb..871558c 100644
--- a/gcc/loop-unroll.c
+++ b/gcc/loop-unroll.c
@@ -360,13 +360,13 @@  decide_unroll_constant_iterations (struct loop *loop, int flags)
 
   /* nunroll = total number of copies of the original loop body in
      unrolled loop (i.e. if it is 2, we have to duplicate loop body once.  */
-  nunroll = PARAM_VALUE (PARAM_MAX_UNROLLED_INSNS) / loop->ninsns;
+  nunroll = PARAM_VALUE (PARAM_MAX_UNROLLEDP_INSNS) / loop->ninsns;
   nunroll_by_av
-    = PARAM_VALUE (PARAM_MAX_AVERAGE_UNROLLED_INSNS) / loop->av_ninsns;
+    = PARAM_VALUE (PARAM_MAX_AVERAGE_UNROLLEDP_INSNS) / loop->av_ninsns;
   if (nunroll > nunroll_by_av)
     nunroll = nunroll_by_av;
-  if (nunroll > (unsigned) PARAM_VALUE (PARAM_MAX_UNROLL_TIMES))
-    nunroll = PARAM_VALUE (PARAM_MAX_UNROLL_TIMES);
+  if (nunroll > (unsigned) PARAM_VALUE (PARAM_MAX_UNROLLP_TIMES))
+    nunroll = PARAM_VALUE (PARAM_MAX_UNROLLP_TIMES);
 
   if (targetm.loop_unroll_adjust)
     nunroll = targetm.loop_unroll_adjust (nunroll, loop);
@@ -664,12 +664,12 @@  decide_unroll_runtime_iterations (struct loop *loop, int flags)
 
   /* nunroll = total number of copies of the original loop body in
      unrolled loop (i.e. if it is 2, we have to duplicate loop body once.  */
-  nunroll = PARAM_VALUE (PARAM_MAX_UNROLLED_INSNS) / loop->ninsns;
-  nunroll_by_av = PARAM_VALUE (PARAM_MAX_AVERAGE_UNROLLED_INSNS) / loop->av_ninsns;
+  nunroll = PARAM_VALUE (PARAM_MAX_UNROLLEDP_INSNS) / loop->ninsns;
+  nunroll_by_av = PARAM_VALUE (PARAM_MAX_AVERAGE_UNROLLEDP_INSNS) / loop->av_ninsns;
   if (nunroll > nunroll_by_av)
     nunroll = nunroll_by_av;
-  if (nunroll > (unsigned) PARAM_VALUE (PARAM_MAX_UNROLL_TIMES))
-    nunroll = PARAM_VALUE (PARAM_MAX_UNROLL_TIMES);
+  if (nunroll > (unsigned) PARAM_VALUE (PARAM_MAX_UNROLLP_TIMES))
+    nunroll = PARAM_VALUE (PARAM_MAX_UNROLLP_TIMES);
 
   if (targetm.loop_unroll_adjust)
     nunroll = targetm.loop_unroll_adjust (nunroll, loop);
@@ -1158,13 +1158,13 @@  decide_unroll_stupid (struct loop *loop, int flags)
 
   /* nunroll = total number of copies of the original loop body in
      unrolled loop (i.e. if it is 2, we have to duplicate loop body once.  */
-  nunroll = PARAM_VALUE (PARAM_MAX_UNROLLED_INSNS) / loop->ninsns;
+  nunroll = PARAM_VALUE (PARAM_MAX_UNROLLEDP_INSNS) / loop->ninsns;
   nunroll_by_av
-    = PARAM_VALUE (PARAM_MAX_AVERAGE_UNROLLED_INSNS) / loop->av_ninsns;
+    = PARAM_VALUE (PARAM_MAX_AVERAGE_UNROLLEDP_INSNS) / loop->av_ninsns;
   if (nunroll > nunroll_by_av)
     nunroll = nunroll_by_av;
-  if (nunroll > (unsigned) PARAM_VALUE (PARAM_MAX_UNROLL_TIMES))
-    nunroll = PARAM_VALUE (PARAM_MAX_UNROLL_TIMES);
+  if (nunroll > (unsigned) PARAM_VALUE (PARAM_MAX_UNROLLP_TIMES))
+    nunroll = PARAM_VALUE (PARAM_MAX_UNROLLP_TIMES);
 
   if (targetm.loop_unroll_adjust)
     nunroll = targetm.loop_unroll_adjust (nunroll, loop);
diff --git a/gcc/params.def b/gcc/params.def
index 805302b..c8b0a2b 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -302,6 +302,23 @@  DEFPARAM(PARAM_MAX_PEELED_INSNS,
 	"max-peeled-insns",
 	"The maximum number of insns of a peeled loop.",
 	100, 0, 0)
+
+DEFPARAM(PARAM_MAX_UNROLLEDP_INSNS,
+	 "max-partial-unrolled-insns",
+	 "The maximum number of instructions to consider to unroll in a loop by rtl unroller.",
+	 100, 0, 0)
+/* This parameter limits how many times the loop is unrolled depending
+   on number of insns really executed in each iteration.  */
+DEFPARAM(PARAM_MAX_AVERAGE_UNROLLEDP_INSNS,
+	 "max-partial-average-unrolled-insns",
+	 "The maximum number of instructions to consider to unroll in a loop on average by rtl unroller.",
+	 40, 0, 0)
+/* The maximum number of unrollings of a single loop.  */
+DEFPARAM(PARAM_MAX_UNROLLP_TIMES,
+	"max-partial-unroll-times",
+	"The maximum number of unrollings of a single loop by rtl unroller.",
+	4, 0, 0)
+
 /* The maximum number of peelings of a single loop.  */
 DEFPARAM(PARAM_MAX_PEEL_TIMES,
 	"max-peel-times",
-- 
2.7.4