handling address mode changes inside extract_bit_field

Message ID CABXYE2VQo+55AACbN_H-DJXVZwu5TAbTYP94Op8Voh62cZ5_zw@mail.gmail.com
State New
Headers show

Commit Message

Jim Wilson March 1, 2017, 10:06 p.m.
This is a proposed patch for the bug 79794 which I just submitted.
This isn't a regression, so this can wait for after the gcc 7 branch
if necessary.

The problem here is that a reg+offset MEM target is passed to
extract_bit_field with a vector register source.  On aarch64, we have
an instruction for this, but it accepts a reg address only, so the
address gets loaded into a reg inside extract_bit_field.  We then
return to expand_expr which does
      ! rtx_equal_p (temp, target)
which fails because of the address mode change, so we end up copying
target into a reg and then back to itself.

expand_expr has a solution for this problem.  There is an alt_rtl
variable that can be set when temp is logically the same as target.
This variable is currently not passed into extract_bit_field.  This
patch does that.

There is an additional complication that the actual address load into
a reg occurs inside maybe_expand_insn, and it doesn't seem reasonable
to pass alt_reg into that.  However, I can grab a bit from the
expand_operand structure to indicate when an operand is the target,
and then clear it if target is replaced with a reg.

The resulting patch works, but ends up a bit more invasive than I
hoped.  The patch has passed a bootstrap and make check test on x86_64
and aarch64.

Jim

Comments

Jeff Law May 5, 2017, 2:24 a.m. | #1
On 03/01/2017 03:06 PM, Jim Wilson wrote:
> This is a proposed patch for the bug 79794 which I just submitted.

> This isn't a regression, so this can wait for after the gcc 7 branch

> if necessary.

> 

> The problem here is that a reg+offset MEM target is passed to

> extract_bit_field with a vector register source.  On aarch64, we have

> an instruction for this, but it accepts a reg address only, so the

> address gets loaded into a reg inside extract_bit_field.  We then

> return to expand_expr which does

>        ! rtx_equal_p (temp, target)

> which fails because of the address mode change, so we end up copying

> target into a reg and then back to itself.

> 

> expand_expr has a solution for this problem.  There is an alt_rtl

> variable that can be set when temp is logically the same as target.

> This variable is currently not passed into extract_bit_field.  This

> patch does that.

> 

> There is an additional complication that the actual address load into

> a reg occurs inside maybe_expand_insn, and it doesn't seem reasonable

> to pass alt_reg into that.  However, I can grab a bit from the

> expand_operand structure to indicate when an operand is the target,

> and then clear it if target is replaced with a reg.

> 

> The resulting patch works, but ends up a bit more invasive than I

> hoped.  The patch has passed a bootstrap and make check test on x86_64

> and aarch64.

> 

> Jim

> 

> 

> alt-rtl.patch

> 

> 

> Proposed patch for RTL expand bug affecting aarch64 vector code.

> 

> 	PR middle-end/79794

>      	* expmed.c (extract_bit_field_1): Add alt_rtl argument.  Before

> 	maybe_expand_insn call, set ops[0].target.  If still set after call,

> 	set alt_rtl.  Add extra arg to recursive calls.

> 	(extract_bit_field): Add alt_rtl argument.  Pass to

> 	extract_bit_field.

> 	* expmed.h (extract_bit_field): Fix prototype.

> 	* expr.c (emit_group_load_1, copy_blkmode_from_reg)

> 	(copy_blkmode_to_reg, read_complex_part, store_field): Pass extra NULL

> 	to extract_bit_field_calls.

> 	(expand_expr_real_1): Pass alt_rtl to expand_expr_real instead of 0.

> 	Pass alt_rtl to extract_bit_field calls.

> 	* calls.c (store_unaligned_arguments_into_psuedos)

> 	load_register_parameters): Pass extra NULL to extract_bit_field calls.

> 	* optabs.c (maybe_legitimize_operand): Clear op->target when call

> 	gen_reg_rtx.

> 	* optabs.h (struct expand_operand): Add target bitfield.

The only part I found intrusive was the op->target stuff, but it wasn't 
terrible.

The additional argument creates visual clutter in the diffs as the 
callers get updated, but that's pretty easy to filter out.

This seems fine to me.  A testcase to add to the gcc.target testsuite 
would be useful, but I don't think it's strictly necessary.



jeff
Jim Wilson May 13, 2017, 1:40 a.m. | #2
On Thu, May 4, 2017 at 7:24 PM, Jeff Law <law@redhat.com> wrote:
> On 03/01/2017 03:06 PM, Jim Wilson wrote:

> This seems fine to me.  A testcase to add to the gcc.target testsuite would

> be useful, but I don't think it's strictly necessary.


Thanks for the review.  It was 2 months since I posted it, so I
retested it to make sure x86_64 and aarch64 bootstrap and make check
still worked, then checked it in.  That took me a while as I was busy
with other stuff this week.  The comment about a testcase is a good
point; I will work on one to submit.

Jim
Martin Sebor May 13, 2017, 2:01 a.m. | #3
On 05/04/2017 08:24 PM, Jeff Law wrote:
> On 03/01/2017 03:06 PM, Jim Wilson wrote:

>> This is a proposed patch for the bug 79794 which I just submitted.

>> This isn't a regression, so this can wait for after the gcc 7 branch

>> if necessary.

>>

>> The problem here is that a reg+offset MEM target is passed to

>> extract_bit_field with a vector register source.  On aarch64, we have

>> an instruction for this, but it accepts a reg address only, so the

>> address gets loaded into a reg inside extract_bit_field.  We then

>> return to expand_expr which does

>>        ! rtx_equal_p (temp, target)

>> which fails because of the address mode change, so we end up copying

>> target into a reg and then back to itself.

>>

>> expand_expr has a solution for this problem.  There is an alt_rtl

>> variable that can be set when temp is logically the same as target.

>> This variable is currently not passed into extract_bit_field.  This

>> patch does that.

>>

>> There is an additional complication that the actual address load into

>> a reg occurs inside maybe_expand_insn, and it doesn't seem reasonable

>> to pass alt_reg into that.  However, I can grab a bit from the

>> expand_operand structure to indicate when an operand is the target,

>> and then clear it if target is replaced with a reg.

>>

>> The resulting patch works, but ends up a bit more invasive than I

>> hoped.  The patch has passed a bootstrap and make check test on x86_64

>> and aarch64.

>>

>> Jim

>>

>>

>> alt-rtl.patch

>>

>>

>> Proposed patch for RTL expand bug affecting aarch64 vector code.

>>

>>     PR middle-end/79794

>>          * expmed.c (extract_bit_field_1): Add alt_rtl argument.  Before

>>     maybe_expand_insn call, set ops[0].target.  If still set after call,

>>     set alt_rtl.  Add extra arg to recursive calls.

>>     (extract_bit_field): Add alt_rtl argument.  Pass to

>>     extract_bit_field.

>>     * expmed.h (extract_bit_field): Fix prototype.

>>     * expr.c (emit_group_load_1, copy_blkmode_from_reg)

>>     (copy_blkmode_to_reg, read_complex_part, store_field): Pass extra

>> NULL

>>     to extract_bit_field_calls.

>>     (expand_expr_real_1): Pass alt_rtl to expand_expr_real instead of 0.

>>     Pass alt_rtl to extract_bit_field calls.

>>     * calls.c (store_unaligned_arguments_into_psuedos)

>>     load_register_parameters): Pass extra NULL to extract_bit_field

>> calls.

>>     * optabs.c (maybe_legitimize_operand): Clear op->target when call

>>     gen_reg_rtx.

>>     * optabs.h (struct expand_operand): Add target bitfield.

> The only part I found intrusive was the op->target stuff, but it wasn't

> terrible.

>

> The additional argument creates visual clutter in the diffs as the

> callers get updated, but that's pretty easy to filter out.


Explicitly passing the additional argument at all the call sites
can be mitigated by giving the new alt_rtl argument a default
value of NULL in the declarations of the extract_bit_field functions.

Martin
Jim Wilson May 13, 2017, 2:27 a.m. | #4
On Fri, May 12, 2017 at 7:01 PM, Martin Sebor <msebor@gmail.com> wrote:
> Explicitly passing the additional argument at all the call sites

> can be mitigated by giving the new alt_rtl argument a default

> value of NULL in the declarations of the extract_bit_field functions.


I keep forgetting about C++ features, as I'm not used to writing C++.
I already checked in the patch, and I don't see the benefit of
changing it again.  I will try to remember this for next time.

Jim
Joseph Myers May 15, 2017, 11:46 a.m. | #5
The extra argument to extract_bit_field breaks builds for tilegx-linux-gnu 
and tilepro-linux-gnu (as shown by my glibc bot); there are calls in those 
back ends which haven't been updated.

-- 
Joseph S. Myers
joseph@codesourcery.com
Jeff Law May 15, 2017, 1:05 p.m. | #6
On 05/15/2017 05:46 AM, Joseph Myers wrote:
> The extra argument to extract_bit_field breaks builds for tilegx-linux-gnu

> and tilepro-linux-gnu (as shown by my glibc bot); there are calls in those

> back ends which haven't been updated.

> 

I've got patches for the tile backends that I'll push today.

jeff
Jim Wilson June 8, 2017, 5:07 p.m. | #7
I've got a testcase to add for this patch.  Sorry about the delay, I
took some time off to deal with a medical problem.

This was tested with and without the extract_bit_field patch.  The
testcase fails without the patch and works with the patch.

Jim	gcc/testsuite/
	PR middle-end/79794
	* gcc.target/aarch64/pr79794.c: New.

Index: gcc/testsuite/gcc.target/aarch64/pr79794.c
===================================================================
--- gcc/testsuite/gcc.target/aarch64/pr79794.c	(nonexistent)
+++ gcc/testsuite/gcc.target/aarch64/pr79794.c	(working copy)
@@ -0,0 +1,25 @@
+/* PR middle-end/79794  */
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+/* { dg-final { scan-assembler-not "umov" } } */
+
+struct node_struct
+{
+  float _Complex gap;
+  unsigned long long state;
+};
+
+struct reg_struct
+{
+  int size;
+  struct node_struct *node;
+};
+
+void
+func(int target, struct reg_struct *reg)
+{
+  int i;
+
+  for(i=0; i<reg->size; i++)
+    reg->node[i].state ^= ((unsigned long long) 1 << target);
+}

Jeff Law June 23, 2017, 4:05 p.m. | #8
On 06/08/2017 11:07 AM, Jim Wilson wrote:
> I've got a testcase to add for this patch.  Sorry about the delay, I

> took some time off to deal with a medical problem.

> 

> This was tested with and without the extract_bit_field patch.  The

> testcase fails without the patch and works with the patch.

Thanks.  Please go ahead and install this.

Sorry for the delay on my side as well.  stack-clash has had me buried
since late May and I'm just starting to dig out a bit.

jeff

Patch

Proposed patch for RTL expand bug affecting aarch64 vector code.

	PR middle-end/79794
    	* expmed.c (extract_bit_field_1): Add alt_rtl argument.  Before
	maybe_expand_insn call, set ops[0].target.  If still set after call,
	set alt_rtl.  Add extra arg to recursive calls.
	(extract_bit_field): Add alt_rtl argument.  Pass to
	extract_bit_field.
	* expmed.h (extract_bit_field): Fix prototype.
	* expr.c (emit_group_load_1, copy_blkmode_from_reg)
	(copy_blkmode_to_reg, read_complex_part, store_field): Pass extra NULL
	to extract_bit_field_calls.
	(expand_expr_real_1): Pass alt_rtl to expand_expr_real instead of 0.
	Pass alt_rtl to extract_bit_field calls.
	* calls.c (store_unaligned_arguments_into_psuedos)
	load_register_parameters): Pass extra NULL to extract_bit_field calls.
	* optabs.c (maybe_legitimize_operand): Clear op->target when call
	gen_reg_rtx.
	* optabs.h (struct expand_operand): Add target bitfield.

Index: gcc/calls.c
===================================================================
--- gcc/calls.c	(revision 245764)
+++ gcc/calls.c	(working copy)
@@ -1161,7 +1161,7 @@  store_unaligned_arguments_into_pseudos (struct arg
 
 	    args[i].aligned_regs[j] = reg;
 	    word = extract_bit_field (word, bitsize, 0, 1, NULL_RTX,
-				      word_mode, word_mode, false);
+				      word_mode, word_mode, false, NULL);
 
 	    /* There is no need to restrict this code to loading items
 	       in TYPE_ALIGN sized hunks.  The bitfield instructions can
@@ -2554,7 +2554,8 @@  load_register_parameters (struct arg_data *args, i
 		  unsigned int bitoff = (nregs - 1) * BITS_PER_WORD;
 		  unsigned int bitsize = size * BITS_PER_UNIT - bitoff;
 		  rtx x = extract_bit_field (mem, bitsize, bitoff, 1, dest,
-					     word_mode, word_mode, false);
+					     word_mode, word_mode, false,
+					     NULL);
 		  if (BYTES_BIG_ENDIAN)
 		    x = expand_shift (LSHIFT_EXPR, word_mode, x,
 				      BITS_PER_WORD - bitsize, dest, 1);
Index: gcc/expmed.c
===================================================================
--- gcc/expmed.c	(revision 245764)
+++ gcc/expmed.c	(working copy)
@@ -1528,7 +1528,7 @@  static rtx
 extract_bit_field_1 (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
 		     unsigned HOST_WIDE_INT bitnum, int unsignedp, rtx target,
 		     machine_mode mode, machine_mode tmode,
-		     bool reverse, bool fallback_p)
+		     bool reverse, bool fallback_p, rtx *alt_rtl)
 {
   rtx op0 = str_rtx;
   machine_mode int_mode;
@@ -1604,10 +1604,13 @@  extract_bit_field_1 (rtx str_rtx, unsigned HOST_WI
       unsigned HOST_WIDE_INT pos = bitnum / GET_MODE_BITSIZE (innermode);
 
       create_output_operand (&ops[0], target, innermode);
+      ops[0].target = 1;
       create_input_operand (&ops[1], op0, outermode);
       create_integer_operand (&ops[2], pos);
       if (maybe_expand_insn (icode, 3, ops))
 	{
+	  if (alt_rtl && ops[0].target)
+	    *alt_rtl = target;
 	  target = ops[0].value;
       	  if (GET_MODE (target) != mode)
 	    return gen_lowpart (tmode, target);
@@ -1729,7 +1732,7 @@  extract_bit_field_1 (rtx str_rtx, unsigned HOST_WI
 	    = extract_bit_field_1 (op0, MIN (BITS_PER_WORD,
 					     bitsize - i * BITS_PER_WORD),
 				   bitnum + bit_offset, 1, target_part,
-				   mode, word_mode, reverse, fallback_p);
+				   mode, word_mode, reverse, fallback_p, NULL);
 
 	  gcc_assert (target_part);
 	  if (!result_part)
@@ -1832,7 +1835,7 @@  extract_bit_field_1 (rtx str_rtx, unsigned HOST_WI
 	  xop0 = copy_to_reg (xop0);
 	  rtx result = extract_bit_field_1 (xop0, bitsize, bitpos,
 					    unsignedp, target,
-					    mode, tmode, reverse, false);
+					    mode, tmode, reverse, false, NULL);
 	  if (result)
 	    return result;
 	  delete_insns_since (last);
@@ -1888,7 +1891,8 @@  extract_bit_field_1 (rtx str_rtx, unsigned HOST_WI
 rtx
 extract_bit_field (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
 		   unsigned HOST_WIDE_INT bitnum, int unsignedp, rtx target,
-		   machine_mode mode, machine_mode tmode, bool reverse)
+		   machine_mode mode, machine_mode tmode, bool reverse,
+		   rtx *alt_rtl)
 {
   machine_mode mode1;
 
@@ -1923,7 +1927,7 @@  extract_bit_field (rtx str_rtx, unsigned HOST_WIDE
     }
 
   return extract_bit_field_1 (str_rtx, bitsize, bitnum, unsignedp,
-			      target, mode, tmode, reverse, true);
+			      target, mode, tmode, reverse, true, alt_rtl);
 }
 
 /* Use shifts and boolean operations to extract a field of BITSIZE bits
Index: gcc/expmed.h
===================================================================
--- gcc/expmed.h	(revision 245764)
+++ gcc/expmed.h	(working copy)
@@ -725,7 +725,7 @@  extern void store_bit_field (rtx, unsigned HOST_WI
 			     machine_mode, rtx, bool);
 extern rtx extract_bit_field (rtx, unsigned HOST_WIDE_INT,
 			      unsigned HOST_WIDE_INT, int, rtx,
-			      machine_mode, machine_mode, bool);
+			      machine_mode, machine_mode, bool, rtx *);
 extern rtx extract_low_bits (machine_mode, machine_mode, rtx);
 extern rtx expand_mult (machine_mode, rtx, rtx, rtx, int);
 extern rtx expand_mult_highpart_adjust (machine_mode, rtx, rtx, rtx, rtx, int);
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	(revision 245764)
+++ gcc/expr.c	(working copy)
@@ -2192,7 +2192,8 @@  emit_group_load_1 (rtx *tmps, rtx dst, rtx orig_sr
 		      && (!REG_P (tmps[i]) || GET_MODE (tmps[i]) != mode)))
 		tmps[i] = extract_bit_field (tmps[i], bytelen * BITS_PER_UNIT,
 					     subpos * BITS_PER_UNIT,
-					     1, NULL_RTX, mode, mode, false);
+					     1, NULL_RTX, mode, mode, false,
+					     NULL);
 	    }
 	  else
 	    {
@@ -2202,7 +2203,8 @@  emit_group_load_1 (rtx *tmps, rtx dst, rtx orig_sr
 	      mem = assign_stack_temp (GET_MODE (src), slen);
 	      emit_move_insn (mem, src);
 	      tmps[i] = extract_bit_field (mem, bytelen * BITS_PER_UNIT,
-					   0, 1, NULL_RTX, mode, mode, false);
+					   0, 1, NULL_RTX, mode, mode, false,
+					   NULL);
 	    }
 	}
       /* FIXME: A SIMD parallel will eventually lead to a subreg of a
@@ -2245,7 +2247,7 @@  emit_group_load_1 (rtx *tmps, rtx dst, rtx orig_sr
       else
 	tmps[i] = extract_bit_field (src, bytelen * BITS_PER_UNIT,
 				     bytepos * BITS_PER_UNIT, 1, NULL_RTX,
-				     mode, mode, false);
+				     mode, mode, false, NULL);
 
       if (shift)
 	tmps[i] = expand_shift (LSHIFT_EXPR, mode, tmps[i],
@@ -2697,7 +2699,7 @@  copy_blkmode_from_reg (rtx target, rtx srcreg, tre
 		       extract_bit_field (src, bitsize,
 					  xbitpos % BITS_PER_WORD, 1,
 					  NULL_RTX, copy_mode, copy_mode,
-					  false),
+					  false, NULL),
 		       false);
     }
 }
@@ -2776,7 +2778,7 @@  copy_blkmode_to_reg (machine_mode mode, tree src)
 		       extract_bit_field (src_word, bitsize,
 					  bitpos % BITS_PER_WORD, 1,
 					  NULL_RTX, word_mode, word_mode,
-					  false),
+					  false, NULL),
 		       false);
     }
 
@@ -3225,7 +3227,7 @@  read_complex_part (rtx cplx, bool imag_p)
     }
 
   return extract_bit_field (cplx, ibitsize, imag_p ? ibitsize : 0,
-			    true, NULL_RTX, imode, imode, false);
+			    true, NULL_RTX, imode, imode, false, NULL);
 }
 
 /* A subroutine of emit_move_insn_1.  Yet another lowpart generator.
@@ -6911,7 +6913,7 @@  store_field (rtx target, HOST_WIDE_INT bitsize, HO
 	{
 	  machine_mode temp_mode = smallest_mode_for_size (bitsize, MODE_INT);
 	  temp = extract_bit_field (temp, bitsize, 0, 1, NULL_RTX, temp_mode,
-				    temp_mode, false);
+				    temp_mode, false, NULL);
 	}
 
       /* Store the value in the bitfield.  */
@@ -9760,7 +9762,8 @@  expand_expr_real_1 (tree exp, rtx target, machine_
 	    case GIMPLE_SINGLE_RHS:
 	      {
 		r = expand_expr_real (gimple_assign_rhs1 (g), target,
-				      tmode, modifier, NULL, inner_reference_p);
+				      tmode, modifier, alt_rtl,
+				      inner_reference_p);
 		break;
 	      }
 	    default:
@@ -10190,7 +10193,7 @@  expand_expr_real_1 (tree exp, rtx target, machine_
 					0, TYPE_UNSIGNED (TREE_TYPE (exp)),
 					(modifier == EXPAND_STACK_PARM
 					 ? NULL_RTX : target),
-					mode, mode, false);
+					mode, mode, false, alt_rtl);
 	  }
 	if (reverse
 	    && modifier != EXPAND_MEMORY
@@ -10687,7 +10690,7 @@  expand_expr_real_1 (tree exp, rtx target, machine_
 	    op0 = extract_bit_field (op0, bitsize, bitpos, unsignedp,
 				     (modifier == EXPAND_STACK_PARM
 				      ? NULL_RTX : target),
-				     ext_mode, ext_mode, reversep);
+				     ext_mode, ext_mode, reversep, alt_rtl);
 
 	    /* If the result has a record type and the mode of OP0 is an
 	       integral mode then, if BITSIZE is narrower than this mode
@@ -10908,7 +10911,7 @@  expand_expr_real_1 (tree exp, rtx target, machine_
       else if (reduce_bit_field)
 	return extract_bit_field (op0, TYPE_PRECISION (type), 0,
 				  TYPE_UNSIGNED (type), NULL_RTX,
-				  mode, mode, false);
+				  mode, mode, false, NULL);
       /* As a last resort, spill op0 to memory, and reload it in a
 	 different mode.  */
       else if (!MEM_P (op0))
Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c	(revision 245764)
+++ gcc/optabs.c	(working copy)
@@ -6942,6 +6942,7 @@  maybe_legitimize_operand (enum insn_code icode, un
 	return true;
 
       op->value = gen_reg_rtx (mode);
+      op->target = 0;
       break;
 
     case EXPAND_INPUT:
Index: gcc/optabs.h
===================================================================
--- gcc/optabs.h	(revision 245764)
+++ gcc/optabs.h	(working copy)
@@ -48,8 +48,11 @@  struct expand_operand {
      rather than signed.  Only meaningful for certain types.  */
   unsigned int unsigned_p : 1;
 
+  /* Is the target operand.  */
+  unsigned int target : 1;
+
   /* Unused; available for future use.  */
-  unsigned int unused : 7;
+  unsigned int unused : 6;
 
   /* The mode passed to the convert_*_operand function.  It has a
      type-dependent meaning.  */