diff mbox

[SMS] Avoid considering debug_insn when calculating SCCs

Message ID BANLkTimKTPwg3kM+rUsH7SDQOBX2iF-e5Q@mail.gmail.com
State New
Headers show

Commit Message

Revital Eres April 17, 2011, 7 a.m. UTC
Hello,

The attached patch avoids considering debug_insn when calculating SCCs.
With this change the existence of debug_insn does not influence
the scheduling order and rec_MII.

Bootstrap and regtest on ppc64-redhat-linux and regtest on arm-linux-gnueabi.

OK for mainline?

Thanks,
Revital

Changelog:

* ddg.c (find_nodes_on_paths): Ignore DEBUG_INSNs.

Comments

Revital Eres May 4, 2011, 8:15 a.m. UTC | #1
Hello,

The following is a summary of discussion I had with Ayal regarding the patch:

Some background: currently, SMS supports only  targets where the doloop
pattern is decoupled from the rest of the loop's instructions (for example
PowerPC) (we'll call it 'case decoupled' for simplicity) In this case,
the branch is not been scheduled with the rest of the instructions but
rather placed in row ii-1 at the end of the scheduling procedure after
all the rest of the instructions had been scheduled. The resulting kernel
is optimal with respect to the Stage Count because min_cycle placed in
row 0 after normalizing the cycles to start from cycle zero.
This patch tries to extend SMS to support targets where the doloop
pattern is not decoupled from the rest of the loop's instructions (name
it 'case NEW' for simplicity). In this case the branch can not be placed
wherever we want due to the fact it must honor dependencies and thus we
schedule the branch instruction with the rest of the loop's instructions
and rotate it to be in row ii-1 at the end of the scheduling procedure
to make sure it's the last instruction in the iteration.

The suggestion was to simplify the patch by always schedule the branch
with the rest of the instructions.
This should not effect performance but rather code size by increasing
the SC by at most 1, which means adding instructions from at most one
iteration to the prologue and epilogue; for case decoupled. (where we
have the alternative of normalizing the cycles and achieve optimal SC).

The following is my attempt to prove that the SC can increase by
at most one:
If the distance between min_cycle and max_cycle remains the same when
considering the same loop with decoupled branch part, once scheduling
the branch instruction with the rest of the loop's instructions and
once ignoring it; it means that the SC is at most +1 for the first case.
This is true in one direction as the branch instruction should not effect
the scheduling window of any other instruction which is what we expect
for case decoupled. The question is if there are cases where the branch
can be scheduled outside the range of min_cycle and max_cycle.  I think
there is no such case because the branch will be scheduled in asap =
0 which means that it will fall in the range of min_cycle max_cycle.
In practice there is edge between memory references and the branch
instruction with latency zero which is inserted by haifa sched. Also,
it might be that the branch will be scheduled outside the range of
min_cycle and max_cycle due to resources constraints. For example, in
PowerPC the issue rate in SMS in always 1 which forces the branch to be
scheduled in a new cycle (and might also influence ii in artifact way).

Example of resulting SMS kernel for the same loop:

The SMS kernel for case NEW, resulting in SC of 3 and ii 5:

cycle                      node
------------------------------------
-3                                0
-2
-1                                1
<- start of SC 2
0
1
2                                 3
3
4                               5 the branch
<- start of SC 3
5                                  4

The SMS kernel for case decoupled resulting in SC of 2 and ii 5:

cycle                      node
------------------------------------
0                                0
1
2                                 1
3                                 3
4                                 2
<- start of SC 2
5
6                                  3


Thanks,
Revital
diff mbox

Patch

=== modified file 'gcc/ddg.c'
--- gcc/ddg.c	2011-03-27 07:11:08 +0000
+++ gcc/ddg.c	2011-04-11 12:04:54 +0000
@@ -1116,13 +1116,19 @@  find_nodes_on_paths (sbitmap result, ddg
 	{
 	  ddg_edge_ptr e;
 	  ddg_node_ptr u_node = &g->nodes[u];
-
+	  
+	  /* Ignore DEBUG_INSNs when calculating the SCCs to avoid their
+	     influence on the scheduling order and rec_mii.  */
+	  if (DEBUG_INSN_P (u_node->insn))
+	    continue;
+	  
 	  for (e = u_node->out; e != (ddg_edge_ptr) 0; e = e->next_out)
 	    {
 	      ddg_node_ptr v_node = e->dest;
 	      int v = v_node->cuid;
 
-	      if (!TEST_BIT (reachable_from, v))
+	      /* Ignore DEBUG_INSN when calculating the SCCs.  */
+	      if (!TEST_BIT (reachable_from, v) && !DEBUG_INSN_P (v_node->insn))
 		{
 		  SET_BIT (reachable_from, v);
 		  SET_BIT (tmp, v);
@@ -1146,12 +1152,18 @@  find_nodes_on_paths (sbitmap result, ddg
 	  ddg_edge_ptr e;
 	  ddg_node_ptr u_node = &g->nodes[u];
 
+	  /* Ignore DEBUG_INSNs when calculating the SCCs to avoid their
+	     influence on the scheduling order and rec_mii.  */
+	  if (DEBUG_INSN_P (u_node->insn))
+	    continue;
+	  
 	  for (e = u_node->in; e != (ddg_edge_ptr) 0; e = e->next_in)
 	    {
 	      ddg_node_ptr v_node = e->src;
 	      int v = v_node->cuid;
 
-	      if (!TEST_BIT (reach_to, v))
+	      /* Ignore DEBUG_INSN when calculating the SCCs.  */
+	      if (!TEST_BIT (reach_to, v) && !DEBUG_INSN_P (v_node->insn))
 		{
 		  SET_BIT (reach_to, v);
 		  SET_BIT (tmp, v);