
[SMS,2/2,RFC] Register pressure estimation for the partial schedule

Message ID CAHz1=dUPK80PjmqTaO=FDXw56f5vunRxuDwGcfNvwLj5mqL6iA@mail.gmail.com
State New

Commit Message

Revital Eres Nov. 21, 2011, 5:07 a.m. UTC
Hello,

The attached patch adds register pressure estimation of the partial schedule.

Bootstrapped and tested with the other patches in the series on
ppc64-redhat-linux, enabling SMS on loops with SC 1.

Comments are welcome.

Thanks,
Revital

Changelog:
        * loop-invariant.c (get_regno_pressure_class): Move function to...
        * ira.c (get_regno_pressure_class): Here.
        * common.opt (fmodulo-sched-reg-pressure): New flag.
        * doc/invoke.texi (fmodulo-sched-reg-pressure): Document it.
        * ira.h (get_regno_pressure_class): Declare.
        * rtl.h (set_reg_allocno_class): Declare.
        * reginfo.c (set_reg_allocno_class): New function.
        * Makefile.in (modulo-sched.o): Include ira.h.
        * modulo-sched.c (ira.h): New include.
        (rtl_insn_ps, undo_reg_moves, mark_def_regs, mark_reg_use,
        mark_reg_use_1, insn_exists_in_epilog_p, calc_lr_out_regs,
        change_pressure, update_reg_moves_pressure_info,
        initiate_reg_pressure_info, mark_regno_live, mark_reg_birth_1,
        mark_reg_birth, mark_regno_death, mark_ref_regs,
        calc_insn_reg_pressure_info, calc_reg_pressure, free_loop_data,
        free_reg_pressure_info, ps_reg_pressure_p): New functions.
        (apply_reg_moves): Add parameter.
        (curr_regs_live, curr_reg_pressure, curr_loop): New
        data-structures.
        (loop_data): New struct.
        (LOOP_DATA): New definition.
        (sms_schedule): Use register pressure estimation.

Comments

Richard Sandiford Nov. 25, 2011, 10:17 a.m. UTC | #1
Hi Revital,

Revital Eres <revital.eres@linaro.org> writes:
> The attached patch adds register pressure estimation of the partial schedule.

My main comment is that we shouldn't need to track separate liveness
sets for each loop here, since we're only looking at one basic block.
I.e., rather than operate on the per-loop LOOP_DATA (loop)->regs_{ref,live},
we should be able to use a single pair of bitmaps.
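
As a rough illustration of the single-pair idea, here is a standalone toy
model rather than GCC code. Registers are bits in a uint64_t, and
toy_insn, toy_bb_regs and toy_scan_block are made-up names that stand in
for the real insn walk and for regs_{ref,live}:

```c
#include <stdint.h>

/* Toy instruction: each register is one bit of a 64-bit mask.  */
struct toy_insn
{
  uint64_t defs;  /* Registers written by the instruction.  */
  uint64_t uses;  /* Registers read by the instruction.  */
};

/* One pair of sets for the single block being scheduled, instead of
   one pair per enclosing loop.  */
struct toy_bb_regs
{
  uint64_t ref;   /* Every register referenced in the block.  */
  uint64_t live;  /* Every register live at some point in the block.  */
};

/* Walk the N instructions in INSNS backwards, starting from LIVE_OUT,
   and record the referenced and live registers in INFO.  Return the
   set of registers live on entry to the block.  */
static uint64_t
toy_scan_block (const struct toy_insn *insns, int n, uint64_t live_out,
                struct toy_bb_regs *info)
{
  uint64_t live = live_out;
  int i;

  info->ref = 0;
  info->live = live_out;
  for (i = n - 1; i >= 0; i--)
    {
      info->ref |= insns[i].defs | insns[i].uses;
      /* A definition ends the live range, a use starts one.  */
      live = (live & ~insns[i].defs) | insns[i].uses;
      info->live |= live;
    }
  return live;
}
```

Nothing here needs to walk up the loop tree, which is the point: the
block is the only thing being measured.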

Also, the code goes to a lot of trouble over this case:

+  /* Add to the set of out live regs all the registers defined in bb
+     which have uses outside of it (those registers were eliminated in
+     the above calculation).  Eliminate from this set the definitions
+     that exist in the epilog and with no uses inside the basic-block
+     as these definitions will be eliminated from the bb and thus should
+     not be considered for estimating register pressure in the bb.  */

But how often does it occur in practice?  It's not necessarily the case
that the instruction will be eliminated, because things like volatility
might require us to keep it.  It's probably more accurate to say that we
can treat these as unused defs.

There's an argument to say that we should only consider registers
that are used in the loop.  If the pressure is high because of
registers that are live across the loop but not used within it,
then it's reasonable to force code outside the loop to spill some
of those.  That would suggest starting with the intersection of
DF_LR_OUT and DF_LR_BB_INFO (bb)->use.  Starting with that set
also has the advantage of handling the above case for free.

(This occurs often in our friend the popular embedded benchmark, which
often has a single function of the form:

  A: ...set up...
  B: for (i = 0; i < num_runs; i++)
  C:   ...benchmark...
  D: ...record time...

Some values are live from A->D, but those values shouldn't affect
an SMSable loop somewhere in C.)
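
A minimal sketch of that starting set, again as a toy model with
registers as bits in a uint64_t.  Here lr_out stands in for
DF_LR_OUT (bb) and bb_use for DF_LR_BB_INFO (bb)->use; the function
names are hypothetical:

```c
#include <stdint.h>

/* Return the registers to start the pressure estimate with: those
   that are live out of the block and also used inside it.  */
static uint64_t
toy_initial_live (uint64_t lr_out, uint64_t bb_use)
{
  return lr_out & bb_use;
}

/* Count the registers in SET.  */
static int
toy_count_regs (uint64_t set)
{
  int n = 0;

  while (set)
    {
      set &= set - 1;  /* Clear the lowest set bit.  */
      n++;
    }
  return n;
}
```

With lr_out = 0x6A (r1, r3, r5, r6) and bb_use = 0x07 (r0, r1, r2), only
r1 survives.  r5 and r6 model values that are live across the loop but
never used in it, and r3 models a register defined in the block whose
only uses are outside it, so that case drops out without special code.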

We talked earlier about making the main pressure-estimation code
process the loop twice, but I see instead you've gone for two
separate passes, one to calculate LR out, then the main pass.
I think with the changes above, running the same loop twice is
going to be easier and no less efficient.  We could even add
code to skip the second iteration if it would start with the
same lr_out as the first iteration.
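
One possible reading of the run-it-twice scheme, sketched as a toy model
(single register class, registers as bits in a uint64_t, all names
hypothetical).  The per-insn estimate is conservative in the same way as
the patch: everything referenced by the insn counts as live at that
point.

```c
#include <stdint.h>

struct toy_insn
{
  uint64_t defs;  /* Registers written by the instruction.  */
  uint64_t uses;  /* Registers read by the instruction.  */
};

static int
toy_count_regs (uint64_t set)
{
  int n = 0;

  while (set)
    {
      set &= set - 1;
      n++;
    }
  return n;
}

/* Walk the body backwards from *LIVE.  Return the larger of MAX and
   the highest pressure seen; leave the live-in set in *LIVE.  */
static int
toy_walk (const struct toy_insn *insns, int n, uint64_t *live, int max)
{
  int i;

  for (i = n - 1; i >= 0; i--)
    {
      int here = toy_count_regs (*live | insns[i].defs | insns[i].uses);

      if (here > max)
        max = here;
      *live = (*live & ~insns[i].defs) | insns[i].uses;
    }
  return max;
}

/* Estimate the pressure of a loop body.  The second walk starts from
   LR_OUT plus whatever the first walk found live at the top of the
   body; it is skipped when that adds nothing.  *PASSES is set to the
   number of walks performed.  */
static int
toy_loop_pressure (const struct toy_insn *insns, int n, uint64_t lr_out,
                   int *passes)
{
  uint64_t live = lr_out;
  uint64_t second_start;
  int max = toy_walk (insns, n, &live, 0);

  *passes = 1;
  second_start = lr_out | live;
  if (second_start != lr_out)
    {
      live = second_start;
      max = toy_walk (insns, n, &live, max);
      *passes = 2;
    }
  return max;
}
```

For a body like { r1 = r0 + r2; r0 = r1; r6 = r5 + r1 } with only r6 in
the initial lr_out, the first walk misses r0 and r2 at the bottom of the
body and the second walk picks them up.  Starting from a set that
already contains them, one walk is enough.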

Richard

Patch

Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi	(revision 181149)
+++ doc/invoke.texi	(working copy)
@@ -373,6 +373,7 @@  Objective-C and Objective-C++ Dialects}.
 -floop-parallelize-all -flto -flto-compression-level @gol
 -flto-partition=@var{alg} -flto-report -fmerge-all-constants @gol
 -fmerge-constants -fmodulo-sched -fmodulo-sched-allow-regmoves @gol
+-fmodulo-sched-reg-pressure @gol
 -fmove-loop-invariants fmudflap -fmudflapir -fmudflapth -fno-branch-count-reg @gol
 -fno-default-inline @gol
 -fno-defer-pop -fno-function-cse -fno-guess-branch-probability @gol
@@ -6457,6 +6458,11 @@  deleted which will trigger the generatio
 life-range analysis.  This option is effective only with
 @option{-fmodulo-sched} enabled.
 
+@item -fmodulo-sched-reg-pressure
+@opindex fmodulo-sched-reg-pressure
+Perform SMS based modulo scheduling with register pressure estimation.
+This option is effective only with @option{-fmodulo-sched} enabled.
+
 @item -fno-branch-count-reg
 @opindex fno-branch-count-reg
 Do not use ``decrement and branch'' instructions on a count register,
Index: loop-invariant.c
===================================================================
--- loop-invariant.c	(revision 181149)
+++ loop-invariant.c	(working copy)
@@ -1619,34 +1619,6 @@  static rtx regs_set[(FIRST_PSEUDO_REGIST
 /* Number of regs stored in the previous array.  */
 static int n_regs_set;
 
-/* Return pressure class and number of needed hard registers (through
-   *NREGS) of register REGNO.  */
-static enum reg_class
-get_regno_pressure_class (int regno, int *nregs)
-{
-  if (regno >= FIRST_PSEUDO_REGISTER)
-    {
-      enum reg_class pressure_class;
-
-      pressure_class = reg_allocno_class (regno);
-      pressure_class = ira_pressure_class_translate[pressure_class];
-      *nregs
-	= ira_reg_class_max_nregs[pressure_class][PSEUDO_REGNO_MODE (regno)];
-      return pressure_class;
-    }
-  else if (! TEST_HARD_REG_BIT (ira_no_alloc_regs, regno)
-	   && ! TEST_HARD_REG_BIT (eliminable_regset, regno))
-    {
-      *nregs = 1;
-      return ira_pressure_class_translate[REGNO_REG_CLASS (regno)];
-    }
-  else
-    {
-      *nregs = 0;
-      return NO_REGS;
-    }
-}
-
 /* Increase (if INCR_P) or decrease current register pressure for
    register REGNO.  */
 static void
Index: common.opt
===================================================================
--- common.opt	(revision 181149)
+++ common.opt	(working copy)
@@ -1457,6 +1457,10 @@  fmodulo-sched-allow-regmoves
 Common Report Var(flag_modulo_sched_allow_regmoves)
 Perform SMS based modulo scheduling with register moves allowed
 
+fmodulo-sched-reg-pressure
+Common Report Var(flag_modulo_sched_reg_pressure)
+Perform SMS based modulo scheduling with register pressure estimation.
+
 fmove-loop-invariants
 Common Report Var(flag_move_loop_invariants) Init(1) Optimization
 Move loop invariant computations out of loops
Index: ira.c
===================================================================
--- ira.c	(revision 181149)
+++ ira.c	(working copy)
@@ -3784,6 +3784,34 @@  ira (FILE *f)
   timevar_pop (TV_IRA);
 }
 
+/* Return pressure class and number of needed hard registers (through
+   *NREGS) of register REGNO.  */
+enum reg_class
+get_regno_pressure_class (int regno, int *nregs)
+{
+  if (regno >= FIRST_PSEUDO_REGISTER)
+    {
+      enum reg_class pressure_class;
+
+      pressure_class = reg_allocno_class (regno);
+      pressure_class = ira_pressure_class_translate[pressure_class];
+      *nregs
+	= ira_reg_class_max_nregs[pressure_class][PSEUDO_REGNO_MODE (regno)];
+      return pressure_class;
+    }
+  else if (!TEST_HARD_REG_BIT (ira_no_alloc_regs, regno)
+	   && !TEST_HARD_REG_BIT (eliminable_regset, regno))
+    {
+      *nregs = 1;
+      return ira_pressure_class_translate[REGNO_REG_CLASS (regno)];
+    }
+  else
+    {
+      *nregs = 0;
+      return NO_REGS;
+    }
+}
+
 
 
 static bool
Index: ira.h
===================================================================
--- ira.h	(revision 181149)
+++ ira.h	(working copy)
@@ -145,3 +145,4 @@  extern bool ira_better_spill_reload_regn
 extern bool ira_bad_reload_regno (int, rtx, rtx);
 
 extern void ira_adjust_equiv_reg_cost (unsigned, int);
+enum reg_class get_regno_pressure_class (int, int *);
Index: rtl.h
===================================================================
--- rtl.h	(revision 181149)
+++ rtl.h	(working copy)
@@ -2074,6 +2074,7 @@  extern const char *decode_asm_operands (
 extern enum reg_class reg_preferred_class (int);
 extern enum reg_class reg_alternate_class (int);
 extern enum reg_class reg_allocno_class (int);
+extern void set_reg_allocno_class (int, enum reg_class);
 extern void setup_reg_classes (int, enum reg_class, enum reg_class,
 			       enum reg_class);
 
Index: reginfo.c
===================================================================
--- reginfo.c	(revision 181149)
+++ reginfo.c	(working copy)
@@ -953,6 +953,16 @@  reg_allocno_class (int regno)
   return (enum reg_class) reg_pref[regno].allocnoclass;
 }
 
+/* Set the allocno class of register REGNO to ALLOCNOCLASS.  */
+void
+set_reg_allocno_class (int regno, enum reg_class allocnoclass)
+{
+  if (reg_pref == 0)
+    return;
+
+  reg_pref[regno].allocnoclass = allocnoclass;
+}
+
 
 
 /* Allocate space for reg info.  */
Index: Makefile.in
===================================================================
--- Makefile.in	(revision 181149)
+++ Makefile.in	(working copy)
@@ -3311,7 +3311,7 @@  modulo-sched.o : modulo-sched.c $(DDG_H)
    $(FLAGS_H) insn-config.h $(INSN_ATTR_H) $(EXCEPT_H) $(RECOG_H) \
    $(SCHED_INT_H) $(CFGLAYOUT_H) $(CFGLOOP_H) $(EXPR_H) $(PARAMS_H) \
    cfghooks.h $(GCOV_IO_H) hard-reg-set.h $(TM_H) $(TIMEVAR_H) $(TREE_PASS_H) \
-   $(DF_H) $(DBGCNT_H)
+   $(DF_H) $(DBGCNT_H) ira.h
 haifa-sched.o : haifa-sched.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
    $(SCHED_INT_H) $(REGS_H) hard-reg-set.h $(FLAGS_H) insn-config.h $(FUNCTION_H) \
    $(INSN_ATTR_H) $(DIAGNOSTIC_CORE_H) $(RECOG_H) $(EXCEPT_H) $(TM_P_H) $(TARGET_H) output.h \

--- modulo-sched2.c	2011-11-20 08:30:12.000000000 +0100
+++ modulo-sched.c	2011-11-20 09:14:48.000000000 +0100
@@ -48,6 +48,7 @@  along with GCC; see the file COPYING3.  
 #include "tree-pass.h"
 #include "dbgcnt.h"
 #include "df.h"
+#include "ira.h"
 
 #ifdef INSN_SCHEDULING
 
@@ -326,6 +327,22 @@  ps_rtl_insn (partial_schedule_ptr ps, in
     return ps_reg_move (ps, id)->insn;
 }
 
+/* Return the ID of the partial schedule instruction in PS which belongs
+   to INSN.  */
+static int
+rtl_insn_ps (partial_schedule_ptr ps, rtx insn)
+{
+  int row;
+  ps_insn_ptr crr_insn;
+
+  for (row = 0; row < ps->ii; row++)
+    for (crr_insn = ps->rows[row]; crr_insn; crr_insn = crr_insn->next_in_row)
+      if (insn == ps_rtl_insn (ps, crr_insn->id))
+        return crr_insn->id;
+
+  return -1;
+}
+
 /* Partial schedule instruction ID, which belongs to PS, occured in
    the original (unscheduled) loop.  Return the first instruction
    in the loop that was associated with ps_rtl_insn (PS, ID).
@@ -823,10 +840,11 @@  schedule_reg_moves (partial_schedule_ptr
   return true;
 }
 
-/* Emit the moves associatied with PS.  Apply the substitutions
-   associated with them.  */
+/* Emit the moves associated with PS.  Apply the substitutions
+   associated with them.  If DF_RESCAN_P is true, also update the
+   df information.  */
 static void
-apply_reg_moves (partial_schedule_ptr ps)
+apply_reg_moves (partial_schedule_ptr ps, bool df_rescan_p)
 {
   ps_reg_move_info *move;
   int i;
@@ -839,11 +857,29 @@  apply_reg_moves (partial_schedule_ptr ps
       EXECUTE_IF_SET_IN_SBITMAP (move->uses, 0, i_use, sbi)
 	{
 	  replace_rtx (ps->g->nodes[i_use].insn, move->old_reg, move->new_reg);
-	  df_insn_rescan (ps->g->nodes[i_use].insn);
+	  if (df_rescan_p)
+	    df_insn_rescan (ps->g->nodes[i_use].insn);
 	}
     }
 }
 
+/* Undo the moves associated with PS.  */
+static void
+undo_reg_moves (partial_schedule_ptr ps)
+{
+  ps_reg_move_info *move;
+  int i;
+
+  FOR_EACH_VEC_ELT (ps_reg_move_info, ps->reg_moves, i, move)
+  {
+    unsigned int i_use;
+    sbitmap_iterator sbi;
+
+    EXECUTE_IF_SET_IN_SBITMAP (move->uses, 0, i_use, sbi)
+      replace_rtx (ps->g->nodes[i_use].insn, move->new_reg, move->old_reg);
+  }
+}
+
 /* Bump the SCHED_TIMEs of all nodes by AMOUNT.  Set the values of
    SCHED_ROW and SCHED_STAGE.  Instruction scheduled on cycle AMOUNT
    will move to cycle zero.  */
@@ -1334,6 +1370,586 @@  setup_sched_infos (void)
   current_sched_info = &sms_sched_info;
 }
 
+/* Registers currently living.  */
+static bitmap_head curr_regs_live;
+
+/* Current reg pressure for each pressure class.  */
+static int curr_reg_pressure[N_REG_CLASSES];
+
+/* The data stored for the loop.  */
+
+struct loop_data
+{
+  /* Maximal register pressure inside loop for given register class
+     (defined only for the pressure classes).  */
+  int max_reg_pressure[N_REG_CLASSES];
+  /* Loop regs referenced and live pseudo-registers.  */
+  bitmap_head regs_ref;
+  bitmap_head regs_live;
+};
+
+#define LOOP_DATA(LOOP) ((struct loop_data *) (LOOP)->aux)
+
+/* Currently processed loop.  */
+static struct loop *curr_loop;
+
+/* Auxiliary function for calc_lr_out_regs.  */
+static void
+mark_def_regs (rtx reg, const_rtx setter ATTRIBUTE_UNUSED, void *data)
+{
+  bitmap_head *def_regs = (bitmap_head *) data;
+  
+  if (GET_CODE (reg) == SUBREG)
+    reg = SUBREG_REG (reg);
+  
+  if (!REG_P (reg))
+    return;
+  
+  bitmap_set_bit (def_regs, REGNO (reg));
+  return;
+}
+
+/* Auxiliary function for mark_reg_use_1.  */
+static int
+mark_reg_use (rtx * x, void *data)
+{
+  bitmap_head *reg_used = (bitmap_head *) data;
+  
+  if (REG_P (*x))
+    bitmap_set_bit (reg_used, REGNO (*x));
+
+  return 0;
+}
+
+/* Auxiliary function for calc_lr_out_regs.  */
+static void
+mark_reg_use_1 (rtx * x, void *data)
+{
+  for_each_rtx (x, mark_reg_use, data);
+}
+
+/* Return TRUE if the instruction noted by ID will be emitted in the
+   epilog.  Otherwise return FALSE.  Use STAGE_COUNT in the
+   calculation.  */
+static bool
+insn_exists_in_epilog_p (partial_schedule_ptr ps, int id, int stage_count)
+{
+  int last_stage = stage_count - 1;
+  int first_u, last_u;
+  int i;
+
+  first_u = SCHED_STAGE (id);
+  last_u = first_u + ps_num_consecutive_stages (ps, id) - 1;
+
+  for (i = 0; i < last_stage; i++)
+    if ((i + 1) <= last_u && last_stage >= first_u)
+      return true;
+
+  return false;
+}
+
+/* Calculate the registers that are live out of the basic-block and mark
+   them in LR_OUT_REGS bitmap.  Use stage-count SC in the calculation.
+   Mark in SKIP_INSNS bitmap instructions that should not be considered
+   for the register pressure calculation.  */
+static void
+calc_lr_out_regs (partial_schedule_ptr ps,
+		  bitmap_head *lr_out_regs, bitmap_head *skip_insns, int sc)
+{
+  rtx insn;
+  bitmap_head insn_defs;
+  bitmap_head tmp_lr_out_regs;
+  basic_block bb = ps->g->bb;
+  unsigned int j;
+  bitmap_iterator bi;
+  int k;
+  unsigned rd_num;
+  struct df_rd_bb_info *rd_bb_info;
+  rtx link;
+
+  bitmap_initialize (&tmp_lr_out_regs, &reg_obstack);
+  bitmap_initialize (&insn_defs, &reg_obstack);
+
+  /* Start with the set of registers in DF_LR_OUT.  */
+  bitmap_copy (lr_out_regs, DF_LR_OUT (bb));
+  for (k = ps->ii - 1; k >= 0; k--)
+    {
+      ps_insn_ptr ps_i = ps->rows_reverse[k];
+
+      /* Advance in the loop header so that `continue' cannot spin.  */
+      for (; ps_i; ps_i = ps_i->prev_in_row)
+	{
+	  insn = ps_rtl_insn (ps, ps_i->id);
+
+	  if (!NONDEBUG_INSN_P (insn))
+	    continue;
+
+	  bitmap_clear (&tmp_lr_out_regs);
+	  bitmap_clear (&insn_defs);
+	  note_uses (&PATTERN (insn), mark_reg_use_1, &tmp_lr_out_regs);
+#ifdef AUTO_INC_DEC
+	  for (link = REG_NOTES (insn); link; link = XEXP (link, 1))
+	    if (REG_NOTE_KIND (link) == REG_INC)
+	      mark_def_regs (XEXP (link, 0), NULL, &insn_defs);
+#endif
+	  note_stores (PATTERN (insn), mark_def_regs, &insn_defs);
+	  /* Remove from the set of lr_out_regs registers any register
+	     defined in the current instruction.  */
+	  EXECUTE_IF_SET_IN_BITMAP (&insn_defs, 0, j, bi)
+	    bitmap_clear_bit (lr_out_regs, j);
+	  bitmap_ior_into (lr_out_regs, &tmp_lr_out_regs);
+	}
+    }
+  /* Add to the set of out live regs all the registers defined in bb
+     which have uses outside of it (those registers were eliminated in
+     the above calculation).  Eliminate from this set the definitions
+     that exist in the epilog and with no uses inside the basic-block
+     as these definitions will be eliminated from the bb and thus should
+     not be considered for estimating register pressure in the bb.  */
+  rd_bb_info = DF_RD_BB_INFO (bb);
+  EXECUTE_IF_SET_IN_BITMAP (&rd_bb_info->gen, 0, rd_num, bi)
+  {
+    df_ref rd = DF_DEFS_GET (rd_num);
+    struct df_link *r_use;
+    int regno = DF_REF_REGNO (rd);
+    rtx def_insn = DF_REF_INSN (rd);
+    bool use_outside_of_bb = false;
+    int num_uses = 0;
+
+    if (!bitmap_bit_p (DF_LR_OUT (bb), regno))
+      continue;
+
+    for (r_use = DF_REF_CHAIN (rd); r_use != NULL; r_use = r_use->next)
+      {
+	rtx use_insn = DF_REF_INSN (r_use->ref);
+
+	if (BLOCK_FOR_INSN (use_insn) != bb)
+	  {
+	    use_outside_of_bb = true;
+	    continue;
+	  }
+
+	num_uses++;
+      }
+
+    if (use_outside_of_bb)
+      bitmap_set_bit (lr_out_regs, regno);
+
+    if (num_uses == 0)
+      {
+	int id = rtl_insn_ps (ps, def_insn);
+
+	gcc_assert (id >= 0);
+
+	if (insn_exists_in_epilog_p (ps, id, sc))
+	  {
+	    bitmap_set_bit (skip_insns, INSN_UID (def_insn));
+	    bitmap_clear_bit (lr_out_regs, regno);
+	  }
+      }
+  }
+
+  bitmap_clear (&insn_defs);
+  bitmap_clear (&tmp_lr_out_regs);
+}
+
+/* Increase (if INCR_P) or decrease current register pressure for
+   register REGNO.  */
+static void
+change_pressure (int regno, bool incr_p)
+{
+  int nregs;
+  enum reg_class pressure_class;
+
+  pressure_class = get_regno_pressure_class (regno, &nregs);
+  if (!incr_p)
+    curr_reg_pressure[pressure_class] -= nregs;
+  else
+    curr_reg_pressure[pressure_class] += nregs;
+}
+
+/* Update the register class information for the register moves in PS.  */
+static void
+update_reg_moves_pressure_info (partial_schedule_ptr ps)
+{
+  ps_reg_move_info *move;
+  int i;
+
+  if (resize_reg_info ())
+    ira_set_pseudo_classes (dump_file);
+
+  FOR_EACH_VEC_ELT (ps_reg_move_info, ps->reg_moves, i, move)
+  {
+    enum reg_class pressure_class;
+    int regno_new = REGNO (move->new_reg);
+    int regno_old = REGNO (move->old_reg);
+
+    /* Update register class information for the register moves.  */
+    pressure_class = reg_allocno_class (regno_old);
+    set_reg_allocno_class (regno_new, pressure_class);
+    pressure_class = reg_allocno_class (regno_new);
+
+    pressure_class = ira_pressure_class_translate[pressure_class];
+    ira_reg_class_max_nregs[pressure_class][PSEUDO_REGNO_MODE (regno_new)]
+      =
+      ira_reg_class_max_nregs[pressure_class][PSEUDO_REGNO_MODE (regno_old)];
+
+    setup_reg_classes (regno_new, reg_preferred_class (regno_old),
+		       reg_alternate_class (regno_old),
+		       reg_allocno_class (regno_old));
+  }
+}
+
+/* Initialize the data-structures needed for the register pressure
+   calculation.  Mark in LR_OUT_REGS bitmap the live out registers
+   and in SKIP_INSNS the instructions that should not be considered in
+   the calculation.  */
+static void
+initiate_reg_pressure_info (partial_schedule_ptr ps,
+			    bitmap_head *lr_out_regs,
+			    bitmap_head *skip_insns, int sc)
+{
+  struct loop *loop;
+  loop_iterator li;
+  int i;
+  unsigned int j;
+  bitmap_iterator bi;
+  basic_block bb = ps->g->bb;
+
+  /* Calculate the LR_LIVE_OUT set of registers.  */  
+  calc_lr_out_regs (ps, lr_out_regs, skip_insns, sc);
+  update_reg_moves_pressure_info (ps);
+  
+  FOR_EACH_LOOP (li, loop, 0) 
+    if (loop->aux == NULL)
+      {
+	loop->aux = xcalloc (1, sizeof (struct loop_data));
+	memset (LOOP_DATA (loop)->max_reg_pressure, INT_MIN,
+		sizeof (LOOP_DATA (loop)->max_reg_pressure));
+	bitmap_initialize (&LOOP_DATA (loop)->regs_ref, &reg_obstack);
+	bitmap_initialize (&LOOP_DATA (loop)->regs_live, &reg_obstack);
+      }
+  
+  bitmap_initialize (&curr_regs_live, &reg_obstack);
+  curr_loop = bb->loop_father;
+  if (curr_loop != current_loops->tree_root)
+    for (loop = curr_loop;
+	 loop->num != current_loops->tree_root->num;
+	 loop = loop_outer (loop))
+      bitmap_ior_into (&LOOP_DATA (loop)->regs_live, DF_LR_OUT (bb));
+  
+  bitmap_ior_into (&LOOP_DATA (curr_loop)->regs_live, lr_out_regs);
+  bitmap_copy (&curr_regs_live, lr_out_regs);
+  for (i = 0; i < ira_pressure_classes_num; i++)
+    curr_reg_pressure[ira_pressure_classes[i]] = 0;
+
+  EXECUTE_IF_SET_IN_BITMAP (&curr_regs_live, 0, j, bi)
+    change_pressure (j, true);
+}
+
+/* Mark REGNO birth.  */
+static void
+mark_regno_live (int regno)
+{
+  struct loop *loop;
+
+  for (loop = curr_loop;
+       loop != current_loops->tree_root;
+       loop = loop_outer (loop))
+    bitmap_set_bit (&LOOP_DATA (loop)->regs_live, regno);
+  if (!bitmap_set_bit (&curr_regs_live, regno))
+    return;
+
+  change_pressure (regno, true);
+}
+
+static int
+mark_reg_birth_1 (rtx *x, void *data ATTRIBUTE_UNUSED)
+{
+  int regno;
+  rtx reg = *x;
+
+  if (GET_CODE (reg) == SUBREG)
+    reg = SUBREG_REG (reg);
+
+  if (!REG_P (reg))
+    return 0;
+
+  regno = REGNO (reg);
+
+  if (regno >= FIRST_PSEUDO_REGISTER)
+    mark_regno_live (regno);
+  else
+    {
+      int last = regno + hard_regno_nregs[regno][GET_MODE (reg)];
+
+      while (regno < last)
+	{
+	  mark_regno_live (regno);
+	  regno++;
+	}
+    }
+  return 0;
+}
+
+/* Mark any register in X as live.  */
+static void
+mark_reg_birth (rtx *x, void *data)
+{
+  for_each_rtx (x, mark_reg_birth_1, data);
+}
+
+/* Mark REGNO death.  */
+static void
+mark_regno_death (int regno, void *data ATTRIBUTE_UNUSED)
+{
+  if (!bitmap_clear_bit (&curr_regs_live, regno))
+    return;
+
+  change_pressure (regno, false);
+}
+
+/* Mark register REG death.  */
+static void
+mark_reg_death (rtx reg, const_rtx setter ATTRIBUTE_UNUSED,
+		void *data ATTRIBUTE_UNUSED)
+{
+  int regno;
+
+  if (GET_CODE (reg) == SUBREG)
+    reg = SUBREG_REG (reg);
+
+  if (!REG_P (reg))
+    return;
+
+  regno = REGNO (reg);
+
+  if (regno >= FIRST_PSEUDO_REGISTER)
+    mark_regno_death (regno, data);
+  else
+    {
+      int last = regno + hard_regno_nregs[regno][GET_MODE (reg)];
+
+      while (regno < last)
+	{
+	  mark_regno_death (regno, data);
+	  regno++;
+	}
+    }
+}
+
+/* Mark occurrence of registers in X. TMP_CURR_REGS_LIVE
+   bitmap holds the set of live registers. TMP_REG_PRESSURE holds the
+   register pressure so far.  */
+static void
+mark_ref_regs (rtx x, int *tmp_reg_pressure, bitmap_head *tmp_curr_regs_live)
+{
+  RTX_CODE code;
+  int i;
+  const char *fmt;
+  int nregs;
+  enum reg_class pressure_class;
+
+  if (!x)
+    return;
+
+  code = GET_CODE (x);
+  if (code == REG)
+    {
+      struct loop *loop;
+
+      for (loop = curr_loop;
+	   loop != current_loops->tree_root; loop = loop_outer (loop))
+	bitmap_set_bit (&LOOP_DATA (loop)->regs_ref, REGNO (x));
+
+      if (bitmap_set_bit (tmp_curr_regs_live, REGNO (x)))
+	{
+	  pressure_class = get_regno_pressure_class (REGNO (x), &nregs);
+	  tmp_reg_pressure[pressure_class] += nregs;
+	}
+      return;
+    }
+  fmt = GET_RTX_FORMAT (code);
+  for (i = GET_RTX_LENGTH (code) - 1; i >= 0; i--)
+    if (fmt[i] == 'e')
+      mark_ref_regs (XEXP (x, i), tmp_reg_pressure, tmp_curr_regs_live);
+    else if (fmt[i] == 'E')
+      {
+	int j;
+
+	for (j = 0; j < XVECLEN (x, i); j++)
+	  mark_ref_regs (XVECEXP (x, i, j), tmp_reg_pressure,
+			 tmp_curr_regs_live);
+      }
+}
+
+/* Update the register pressure for INSN.  */
+static void
+calc_insn_reg_pressure_info (rtx insn)
+{
+  rtx link;
+  int i;
+  int tmp_reg_pressure[N_REG_CLASSES];
+  bitmap_head tmp_curr_regs_live;
+
+  bitmap_initialize (&tmp_curr_regs_live, &reg_obstack);
+
+  /* Tmp_curr_regs_live and tmp_reg_pressure hold the register pressure
+     information seen so far including for the current instruction.
+     We are taking a conservative approach here in the sense that we do
+     not add the dead registers in the current instruction to the pool
+     of available registers just yet.  */
+  bitmap_copy (&tmp_curr_regs_live, &curr_regs_live);
+  memcpy (tmp_reg_pressure, curr_reg_pressure, N_REG_CLASSES * sizeof (int));
+  mark_ref_regs (PATTERN (insn), tmp_reg_pressure, &tmp_curr_regs_live);
+
+  /* Update the pressure after the instruction.  */
+  note_stores (PATTERN (insn), mark_reg_death, NULL);
+
+#ifdef AUTO_INC_DEC
+  for (link = REG_NOTES (insn); link; link = XEXP (link, 1))
+    if (REG_NOTE_KIND (link) == REG_INC)
+      mark_reg_death (XEXP (link, 0), NULL, NULL);
+#endif
+  note_uses (&PATTERN (insn), mark_reg_birth, NULL);
+  /* Update max pressure.  */
+  for (i = 0; (int) i < ira_pressure_classes_num; i++)
+    {
+      enum reg_class pressure_class;
+
+      pressure_class = ira_pressure_classes[i];
+      if (LOOP_DATA (curr_loop)->max_reg_pressure[pressure_class] <
+	  tmp_reg_pressure[pressure_class])
+	LOOP_DATA (curr_loop)->max_reg_pressure[pressure_class] =
+	  tmp_reg_pressure[pressure_class];
+    }
+
+  bitmap_clear (&tmp_curr_regs_live);
+}
+
+/* Calculate the register pressure in PS.  SKIP_INSNS bitmap holds
+   the instructions that should be ignored during the calculation.  */
+static void
+calc_reg_pressure (partial_schedule_ptr ps, 
+		   bitmap_head *skip_insns)
+{
+  int k;
+  
+  for (k = ps->ii - 1; k >= 0; k--)
+    {
+      ps_insn_ptr ps_i = ps->rows_reverse[k];
+      
+      while (ps_i)
+	{
+	  rtx insn = ps_rtl_insn (ps, ps_i->id);
+	  
+	  if (bitmap_bit_p (skip_insns, INSN_UID (insn)))
+	    goto next;
+	  
+	  if (!NONDEBUG_INSN_P (insn))
+	    goto next;
+	  
+	  calc_insn_reg_pressure_info (insn);
+	next:
+	  ps_i = ps_i->prev_in_row;
+	}
+    }
+}
+
+/* Releases the auxiliary data for LOOP.  */
+static void
+free_loop_data (struct loop *loop)
+{
+  struct loop_data *data = LOOP_DATA (loop);
+  if (!data)
+    return;
+
+  bitmap_clear (&LOOP_DATA (loop)->regs_ref);
+  bitmap_clear (&LOOP_DATA (loop)->regs_live);
+  free (data);
+  loop->aux = NULL;
+}
+
+/* Free the data-structures needed for the calculation.  */
+static void
+free_reg_pressure_info (void)
+{
+  loop_iterator li;
+  struct loop *loop;
+
+  bitmap_clear (&curr_regs_live);
+
+  FOR_EACH_LOOP (li, loop, 0) 
+    free_loop_data (loop);
+}
+
+/* Return TRUE if the estimated register pressure of PS exceeds the
+   number of hard registers in some pressure class, FALSE otherwise.
+   LOOP is the original loop and SC is the stage count.  */
+static bool
+ps_reg_pressure_p (struct loop *loop, partial_schedule_ptr ps, int sc)
+{
+  bool pressure_p = false;
+  int i;
+  bitmap_head lr_out_regs;
+  bitmap_head skip_insns;
+
+  bitmap_initialize (&lr_out_regs, &reg_obstack);
+  bitmap_initialize (&skip_insns, &reg_obstack);
+  apply_reg_moves (ps, false);
+  initiate_reg_pressure_info (ps, &lr_out_regs, &skip_insns, sc); 
+  calc_reg_pressure (ps, &skip_insns);
+
+  if (dump_file)
+    {
+      struct loop *parent;
+      unsigned int j;
+      bitmap_iterator bi;
+
+      parent = loop_outer (loop);
+      fprintf (dump_file, "\n  Loop %d (parent %d, header bb%d, depth %d)\n",
+	       loop->num, (parent == NULL ? -1 : parent->num),
+	       loop->header->index, loop_depth (loop));
+      fprintf (dump_file, "\n    ref. regnos:");
+      EXECUTE_IF_SET_IN_BITMAP (&LOOP_DATA (loop)->regs_ref, 0, j, bi)
+	fprintf (dump_file, " %d", j);
+      fprintf (dump_file, "\n    live regnos:");
+      EXECUTE_IF_SET_IN_BITMAP (&LOOP_DATA (loop)->regs_live, 0, j, bi)
+	fprintf (dump_file, " %d", j);
+      fprintf (dump_file, "\n    Pressure:");
+    }
+
+  for (i = 0; i < ira_pressure_classes_num; i++)
+    {
+      enum reg_class pressure_class;
+
+      pressure_class = ira_pressure_classes[i];
+
+      if (LOOP_DATA (loop)->max_reg_pressure[pressure_class] == 0)
+	continue;
+
+      if (dump_file)
+	fprintf (dump_file, "%s=%d  %d ", reg_class_names[pressure_class],
+		 LOOP_DATA (loop)->max_reg_pressure[pressure_class],
+		 ira_available_class_regs[pressure_class]);
+
+      if (LOOP_DATA (loop)->max_reg_pressure[pressure_class]
+	  > ira_class_hard_regs_num[pressure_class])
+	{
+	  if (dump_file)
+	    fprintf (dump_file, "(pressure)\n");
+
+	  pressure_p = true;
+	}
+    }
+
+  bitmap_clear (&lr_out_regs);
+  bitmap_clear (&skip_insns);
+  free_reg_pressure_info ();
+  undo_reg_moves (ps);
+  return pressure_p;
+}
+
 /* Probability in % that the sms-ed loop rolls enough so that optimized
    version may be entered.  Just a guess.  */
 #define PROB_SMS_ENOUGH_ITERATIONS 80
@@ -1366,6 +1982,13 @@  sms_schedule (void)
       return;  /* There are no loops to schedule.  */
     }
 
+  if (flag_modulo_sched_reg_pressure)
+    {
+      regstat_init_n_sets_and_refs ();
+      ira_set_pseudo_classes (dump_file);
+      ira_setup_eliminable_regset ();
+    }
+
   /* Initialize issue_rate.  */
   if (targetm.sched.issue_rate)
     {
@@ -1681,7 +2304,9 @@  sms_schedule (void)
 	  set_columns_for_ps (ps);
 
 	  min_cycle = PS_MIN_CYCLE (ps) - SMODULO (PS_MIN_CYCLE (ps), ps->ii);
-	  if (!schedule_reg_moves (ps))
+	  if (!schedule_reg_moves (ps) 
+	      || (flag_modulo_sched_reg_pressure
+		  && ps_reg_pressure_p (loop, ps, stage_count)))
 	    {
 	      mii = ps->ii + 1;
 	      free_partial_schedule (ps);
@@ -1742,7 +2367,7 @@  sms_schedule (void)
 	  /* The life-info is not valid any more.  */
 	  df_set_bb_dirty (g->bb);
 
-	  apply_reg_moves (ps);
+	  apply_reg_moves (ps, true);
 	  if (dump_file)
 	    print_node_sched_params (dump_file, g->num_nodes, ps);
 	  /* Generate prolog and epilog.  */
@@ -1757,6 +2382,11 @@  sms_schedule (void)
     }
 
   free (g_arr);
+  if (flag_modulo_sched_reg_pressure)
+    {
+      regstat_free_n_sets_and_refs ();
+      free_reg_info ();
+    }
 
   /* Release scheduler data, needed until now because of DFA.  */
   haifa_sched_finish ();