
Improve spilling for variable-size slots

Message ID 87po8znwur.fsf@linaro.org
State New
Series Improve spilling for variable-size slots

Commit Message

Richard Sandiford Nov. 3, 2017, 4:35 p.m. UTC
Once SVE is enabled, a general AArch64 spill slot offset will be

  A + B * VL

where A and B are constants and VL is the (runtime) SVE vector length.
The offsets in SVE load and store instructions are multiples of VL
(and so can encode some values of B), while offsets for base AArch64
load and store instructions aren't (and so can encode some values of A).

We therefore get better spill code if variable-sized slots are grouped
together separately from constant-sized slots, and if variable-sized
slots are not reused for constant-sized data.  Then, spills to the
constant-sized slots can add B * VL to the offset first, creating a
common anchor point for spills with the same B component but different
A components.  Spills to variable-sized slots can likewise add A to
the offset first, creating a common anchor point for spills with the
same A component but different B components.

This patch implements the sorting and grouping side of the optimisation.
A later patch creates the anchor points.

The patch is a no-op on other targets.

Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64-linux-gnu.
OK to install?

Richard


2017-11-03  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* lra-spills.c (pseudo_reg_slot_compare): Sort slots by whether
	they are variable or constant sized.
	(assign_stack_slot_num_and_sort_pseudos): Don't reuse variable-sized
	slots for constant-sized data.

Comments

Jeff Law Nov. 16, 2017, 6:51 p.m. UTC | #1
On 11/03/2017 10:35 AM, Richard Sandiford wrote:
> 2017-11-03  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
>
> gcc/
> 	* lra-spills.c (pseudo_reg_slot_compare): Sort slots by whether
> 	they are variable or constant sized.
> 	(assign_stack_slot_num_and_sort_pseudos): Don't reuse variable-sized
> 	slots for constant-sized data.
OK.
jeff

Patch

Index: gcc/lra-spills.c
===================================================================
--- gcc/lra-spills.c	2017-11-03 12:15:45.033032920 +0000
+++ gcc/lra-spills.c	2017-11-03 12:22:34.003396358 +0000
@@ -174,9 +174,17 @@  regno_freq_compare (const void *v1p, con
 }
 
 /* Sort pseudos according to their slots, putting the slots in the order
-   that they should be allocated.  Slots with lower numbers have the highest
-   priority and should get the smallest displacement from the stack or
-   frame pointer (whichever is being used).
+   that they should be allocated.
+
+   First prefer to group slots with variable sizes together and slots
+   with constant sizes together, since that usually makes them easier
+   to address from a common anchor point.  E.g. loads of polynomial-sized
+   registers tend to take polynomial offsets while loads of constant-sized
+   registers tend to take constant (non-polynomial) offsets.
+
+   Next, slots with lower numbers have the highest priority and should
+   get the smallest displacement from the stack or frame pointer
+   (whichever is being used).
 
    The first allocated slot is always closest to the frame pointer,
    so prefer lower slot numbers when frame_pointer_needed.  If the stack
@@ -194,6 +202,10 @@  pseudo_reg_slot_compare (const void *v1p
 
   slot_num1 = pseudo_slots[regno1].slot_num;
   slot_num2 = pseudo_slots[regno2].slot_num;
+  diff = (int (slots[slot_num1].size.is_constant ())
+	  - int (slots[slot_num2].size.is_constant ()));
+  if (diff != 0)
+    return diff;
   if ((diff = slot_num1 - slot_num2) != 0)
     return (frame_pointer_needed
 	    || (!FRAME_GROWS_DOWNWARD) == STACK_GROWS_DOWNWARD ? diff : -diff);
@@ -356,8 +368,17 @@  assign_stack_slot_num_and_sort_pseudos (
 	j = slots_num;
       else
 	{
+	  machine_mode mode
+	    = wider_subreg_mode (PSEUDO_REGNO_MODE (regno),
+				 lra_reg_info[regno].biggest_mode);
 	  for (j = 0; j < slots_num; j++)
 	    if (slots[j].hard_regno < 0
+		/* Although it's possible to share slots between modes
+		   with constant and non-constant widths, we usually
+		   get better spill code by keeping the constant and
+		   non-constant areas separate.  */
+		&& (GET_MODE_SIZE (mode).is_constant ()
+		    == slots[j].size.is_constant ())
 		&& ! (lra_intersected_live_ranges_p
 		      (slots[j].live_ranges,
 		       lra_reg_info[regno].live_ranges)))