diff mbox

[ARM] stop changing signedness in PROMOTE_MODE

Message ID CABXYE2X4OhMvyK-z+yQvDkcT0TgKUc2Nv+YdhXWZnJDNDMeJqg@mail.gmail.com
State New
Headers show

Commit Message

Jim Wilson July 10, 2015, 3:46 p.m. UTC
On Tue, Jul 7, 2015 at 2:35 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On July 7, 2015 6:29:21 PM GMT+02:00, Jim Wilson <jim.wilson@linaro.org> wrote:
>>signed sub-word locals.  Thus to detect the need for a conversion, you
>>have to have the decls, and we don't have them here.  There is also
>
> It probably is.  The decks for the parameter based SSA names are available, for the PHI destination there might be no decl.

I tried looking again, and found the decls.  I'm able to get correct
code for my testcase with the attached patch to force the conversion.
It is rather inelegant, but I think I can cache the values I need to
make this simpler and cleaner.  I still don't have decls from
insert_part_to_rtx_on_edge and insert_rtx_to_part_on_edge, but it
looks like those are for breaking cycles, and hence might not need
conversions.

Jim

Comments

Jim Wilson July 14, 2015, 4:49 p.m. UTC | #1
On Tue, Jul 14, 2015 at 9:13 AM, Richard Earnshaw
<Richard.Earnshaw@foss.arm.com> wrote:
> We went through this a couple of weeks back.  The backend documentation
> for PROMOTE_MODE says:

I disagree that this is a fully resolved issue.  I see clear problems
with how the ARM port uses PROMOTE_MODE.

> " For most machines, the macro definition does not change @var{unsignedp}.
> However, some machines, have instructions that preferentially handle
> either signed or unsigned quantities of certain modes.  For example, on
> the DEC Alpha, 32-bit loads from memory and 32-bit add instructions
> sign-extend the result to 64 bits.  On such machines, set
> @var{unsignedp} according to which kind of extension is more efficient."

The Alpha case is different than the ARM case.  The Alpha only changes
sign for 32-bit values in 64-bit registers.  The alpha happens to have
a nice set of instructions that operates on 32-bit values, that
accepts these wrong-signed values and handle them correctly.  Thus on
the alpha, it appears that there are no extra zero/sign extends
required, and everything is the same speed or faster with the wrong
sign.  The MIPS port works the same way.  This is not true on the arm
though.  The arm port is changing sign on 8 and 16 bit value, but does
not have instructions that directly operate on 8 and 16 bit values.
This requires the compiler to emit extra zero/sign extend instructions
that affect the performance of the code.  Other than the thumb1 8-bit
load, and the pre-armv6 use of andi for 8-bit zero-extend, I haven't
seen anything that is faster in the wrong sign, and there are a number
of operations that are actually slower because of the extra zero/sign
extends required by the arm PROMOTE_MODE definition.  I've given some
examples.

I have since noticed that the relatively new pass to optimize away
unnecessary zero/sign extensions is actually fixing some of the damage
caused by the ARM PROMOTE_MODE definition.  But there are still some
cases that won't get fixed, as per my earlier examples.  It is better
to emit the fast code at the beginning than to rely on an optimization
pass to convert the slow code into fast code.  So I think the ARM
PROMOTE_MODE definition still needs to be fixed.

> So clearly it anticipates that all permitted extensions should work, and
> in particular it makes no mention of this having to match some
> abi-mandated promotions.  That makes this a MI bug not a target one.

If you look at older gcc releases, like gcc-2.95.3, you will see that
there is only PROMOTE_MODE and a boolean PROMOTE_FUNCTION_ARGS which
says whether PROMOTE_MODE is applied to function arguments or not.
You can't have different zero/sign extensions for parameters and
locals in gcc-2.95.3.  The fact that you can have it today is more an
accident than a deliberate choice.  When PROMOTE_FUNCTION_ARGS was
hookized, it was made into a separate function, and for the ARM port,
the code for PROMOTE_MODE was not correctly copied into the new hook,
resulting in the accidental difference for parameter and local
zero/sign promotion for ARM.  The PROMOTE_MODE docs were written back
when different sign/zero extensions for parms/locals weren't possible,
and hence did not consider the case.

Now that we do have the problem, we can't fix it without an ARM port
ABI change, which is undesirable, so we may have to fix it with a MI
change.

There were two MI changes suggested, one was fixing the out-of-ssa
pass to handle SUBREG_PROMOTED_P promotions.  The other was to
disallow creating PHI nodes between parms and locals.  I haven't had a
chance to try implementing the second one yet; I hope to work on that
today.

Jim
Jim Wilson July 15, 2015, 3:54 p.m. UTC | #2
On Wed, Jul 15, 2015 at 6:04 AM, Michael Matz <matz@suse.de> wrote:
> Hi,
>
> On Tue, 14 Jul 2015, Jim Wilson wrote:
>
>> Now that we do have the problem, we can't fix it without an ARM port ABI
>> change, which is undesirable, so we may have to fix it with a MI change.
>
> What's the ABI implication of fixing the inconsistency?

Currently signed chars and signed shorts are passed sign-extended.  If
we make TARGET_PROMOTE_FUNCTION_MODE work the same as PROMOTE_MODE,
then they will be passed zero-extended.

Given the testcase:

int sub (int) __attribute__ ((noinline));
int sub2 (signed char) __attribute__ ((noinline));
int sub (int i) { return sub2 (i); }
int sub2 (signed char c) { return c & 0xff; }

Currently sub will do a char sign-extend to convert the int to signed
char, and sub2 will do a char zero-extend for the and.  With the
change, sub will do a char zero-extend to convert the int to unsigned
char, and sub2 will do nothing.  If you compile sub without the change
and sub2 with the change, then you lose the and operation and get a
sign-extended char at the end.

>> There were two MI changes suggested, one was fixing the out-of-ssa pass
>> to handle SUBREG_PROMOTED_P promotions.  The other was to disallow
>> creating PHI nodes between parms and locals.  I haven't had a chance to
>> try implementing the second one yet; I hope to work on that today.
>
> Don't bother with the latter, it doesn't have a chance of being accepted.

I tried looking at it anyways, as I need to learn more about this
stuff.  It didn't seem feasible without changing a lot of optimization
passes which doesn't seem reasonable.

> If the terrible hack in outof-ssa really will be necessary (and I really
> really hope it won't) then I think I prefer the approach you partly tried
> in comment #12 of PR 65932 already.  Let partition_to_pseudo[] refer to
> the promoted subreg and deal with that situation in emit_partition_copy;
> I'd then hope that the unsignedsrcp parameter could go away (unfortunately
> the sizeexp will have to stay).

Yes, I think that is a cleaner way to do it, but I had trouble getting
that to work as I don't know enough about the code yet.  Doing it
directly in emit_partition_copy was easier, just to prove it could
work.  I can go back and try to make this work again.

Jim
diff mbox

Patch

Index: tree-outof-ssa.c
===================================================================
--- tree-outof-ssa.c	(revision 225477)
+++ tree-outof-ssa.c	(working copy)
@@ -230,11 +230,32 @@  set_location_for_edge (edge e)
    SRC/DEST might be BLKmode memory locations SIZEEXP is a tree from
    which we deduce the size to copy in that case.  */
 
-static inline rtx_insn *
-emit_partition_copy (rtx dest, rtx src, int unsignedsrcp, tree sizeexp)
+rtx_insn *
+emit_partition_copy (rtx dest, rtx src, int unsignedsrcp, tree sizeexp,
+		     tree var2 ATTRIBUTE_UNUSED)
 {
   start_sequence ();
 
+  /* If var2 is set, then sizeexp is the src decl and var2 is the dest decl.  */
+  if (var2)
+    {
+      tree src_var = (TREE_CODE (sizeexp) == SSA_NAME
+		      ? SSA_NAME_VAR (sizeexp) : sizeexp);
+      tree dest_var = (TREE_CODE (var2) == SSA_NAME
+		       ? SSA_NAME_VAR (var2) : var2);
+      int src_unsignedp = TYPE_UNSIGNED (TREE_TYPE (src_var));
+      int dest_unsignedp = TYPE_UNSIGNED (TREE_TYPE (dest_var));
+      machine_mode src_mode = promote_decl_mode (src_var, &src_unsignedp);
+      machine_mode dest_mode = promote_decl_mode (dest_var, &dest_unsignedp);
+      if (src_unsignedp != dest_unsignedp
+	  && src_mode != DECL_MODE (src_var)
+	  && dest_mode != DECL_MODE (dest_var))
+	{
+	  src = gen_lowpart_common (DECL_MODE (src_var), src);
+	  unsignedsrcp = dest_unsignedp;
+	}
+    }
+
   if (GET_MODE (src) != VOIDmode && GET_MODE (src) != GET_MODE (dest))
     src = convert_to_mode (GET_MODE (dest), src, unsignedsrcp);
   if (GET_MODE (src) == BLKmode)
@@ -256,7 +277,7 @@  emit_partition_copy (rtx dest, rtx src,
 static void
 insert_partition_copy_on_edge (edge e, int dest, int src, source_location locus)
 {
-  tree var;
+  tree var, var2;
   if (dump_file && (dump_flags & TDF_DETAILS))
     {
       fprintf (dump_file,
@@ -276,10 +297,11 @@  insert_partition_copy_on_edge (edge e, i
     set_curr_insn_location (locus);
 
   var = partition_to_var (SA.map, src);
+  var2 = partition_to_var (SA.map, dest);
   rtx_insn *seq = emit_partition_copy (copy_rtx (SA.partition_to_pseudo[dest]),
 				       copy_rtx (SA.partition_to_pseudo[src]),
 				       TYPE_UNSIGNED (TREE_TYPE (var)),
-				       var);
+				       var, var2);
 
   insert_insn_on_edge (seq, e);
 }
@@ -373,7 +395,8 @@  insert_rtx_to_part_on_edge (edge e, int
      involved), so it doesn't matter.  */
   rtx_insn *seq = emit_partition_copy (copy_rtx (SA.partition_to_pseudo[dest]),
 				       src, unsignedsrcp,
-				       partition_to_var (SA.map, dest));
+				       partition_to_var (SA.map, dest), 0);
+
 
   insert_insn_on_edge (seq, e);
 }
@@ -406,7 +429,7 @@  insert_part_to_rtx_on_edge (edge e, rtx
   rtx_insn *seq = emit_partition_copy (dest,
 				       copy_rtx (SA.partition_to_pseudo[src]),
 				       TYPE_UNSIGNED (TREE_TYPE (var)),
-				       var);
+				       var, 0);
 
   insert_insn_on_edge (seq, e);
 }