diff mbox

[AARCH64] improve long double 0.0 support

Message ID CABXYE2UuEX7nTeo_msQ7Wzjs2Bn0214yLSJu-GLuP+O0k-cy=g@mail.gmail.com
State New
Headers show

Commit Message

Jim Wilson June 4, 2015, 12:35 a.m. UTC
I noticed that poor code is emitted for a long double 0.0.  This testcase
long double sub (void) { return 0.0; }
void sub2 (long double *ld) { *ld = 0.0; }
currently generates
sub:
ldr q0, .LC0
ret
...
sub2:
ldr q0, .LC1
str q0, [x0]
ret
where LC0 and LC1 are 16-byte constant table long double zeros.  With
the attached patch, I get
sub:
movi v0.2d, #0
ret
...
sub2:
stp xzr, xzr, [x0]
ret

The main problem is in aarch64_valid_floating_const, which rejects all
constants for TFmode.  There is a comment that says we should handle
0, but not until after the movtf pattern is improved.  This
improvement apparently happened two years ago with this patch
2013-05-09  Sofiane Naci  <sofiane.naci@arm.com>
        * config/aarch64/aarch64.md: New movtf split.
        ...
so this comment is no longer relevant, and we should handle 0 now.
The patch deletes the out of date comment and moves the 0 check before
the TFmode check so that TFmode 0 is accepted.

There are a few other changes needed to make this work well.  The
movtf expander needs to avoid forcing 0 to a reg for a mem dest, just
like the movti pattern already does.  The Ump/?rY alternative needs to
be split into two, as %H doesn't work for const_double 0, again this
is like the movti pattern.  The condition needs to allow 0 values in
operand 1, as is done in the movti pattern.

I noticed another related problem while making this change.  The
ldp/stp instructions in the movtf_aarch64 pattern have neon attribute
types.  However, these are integer instructions with matching 'r'
constraints and hence should be using load2/store2 attribute types,
just like in the movti pattern.

This was tested with a default languages make bootstrap and make check
on an APM system.

There are some similar problems with the movsf and movdf patterns.  I
plan to submit a patch to fix them after this one is accepted.

Jim

Comments

Jim Wilson June 12, 2015, 8:24 p.m. UTC | #1
On 06/03/2015 05:35 PM, Jim Wilson wrote:
> I noticed that poor code is emitted for a long double 0.0.

ping

https://gcc.gnu.org/ml/gcc-patches/2015-06/msg00370.html

Jim
diff mbox

Patch

2015-06-03  Jim Wilson  <jim.wilson@linaro.org>

	* config/aarch64/aarch64.c (aarch64_valid_floating_const): Move
	aarch64_float_const_zero_rtx_p check before TFmode check.
	* config/aarch64/aarch64.md (movtf): Don't call force_reg if op1 is
	an fp zero.
	(movtf_aarch64): Separate ?rY alternative into two.  Adjust assembly
	code and attributes to match.  Change condition from register_operand
	to aarch64_reg_or_fp_zero for op1.  Change type for ldp from
	neon_load1_2reg to load2.  Change type for stp from neon_store1_2reg
	to store2.

Index: config/aarch64/aarch64.c
===================================================================
--- config/aarch64/aarch64.c	(revision 224054)
+++ config/aarch64/aarch64.c	(working copy)
@@ -7430,16 +7430,13 @@  aarch64_valid_floating_const (machine_mo
   if (!CONST_DOUBLE_P (x))
     return false;
 
-  /* TODO: We could handle moving 0.0 to a TFmode register,
-     but first we would like to refactor the movtf_aarch64
-     to be more amicable to split moves properly and
-     correctly gate on TARGET_SIMD.  For now - reject all
-     constants which are not to SFmode or DFmode registers.  */
+  if (aarch64_float_const_zero_rtx_p (x))
+    return true;
+
+  /* We only handle moving 0.0 to a TFmode register.  */
   if (!(mode == SFmode || mode == DFmode))
     return false;
 
-  if (aarch64_float_const_zero_rtx_p (x))
-    return true;
   return aarch64_float_const_representable_p (x);
 }
 
Index: config/aarch64/aarch64.md
===================================================================
--- config/aarch64/aarch64.md	(revision 224053)
+++ config/aarch64/aarch64.md	(working copy)
@@ -1040,18 +1040,20 @@  (define_expand "movtf"
 	FAIL;
      }
 
-    if (GET_CODE (operands[0]) == MEM)
+    if (GET_CODE (operands[0]) == MEM
+        && ! (GET_CODE (operands[1]) == CONST_DOUBLE
+	      && aarch64_float_const_zero_rtx_p (operands[1])))
       operands[1] = force_reg (TFmode, operands[1]);
   "
 )
 
 (define_insn "*movtf_aarch64"
   [(set (match_operand:TF 0
-	 "nonimmediate_operand" "=w,?&r,w ,?r,w,?w,w,m,?r ,Ump")
+	 "nonimmediate_operand" "=w,?&r,w ,?r,w,?w,w,m,?r ,Ump,Ump")
 	(match_operand:TF 1
-	 "general_operand"      " w,?r, ?r,w ,Y,Y ,m,w,Ump,?rY"))]
+	 "general_operand"      " w,?r, ?r,w ,Y,Y ,m,w,Ump,?r ,Y"))]
   "TARGET_FLOAT && (register_operand (operands[0], TFmode)
-    || register_operand (operands[1], TFmode))"
+    || aarch64_reg_or_fp_zero (operands[1], TFmode))"
   "@
    orr\\t%0.16b, %1.16b, %1.16b
    #
@@ -1062,12 +1064,13 @@  (define_insn "*movtf_aarch64"
    ldr\\t%q0, %1
    str\\t%q1, %0
    ldp\\t%0, %H0, %1
-   stp\\t%1, %H1, %0"
+   stp\\t%1, %H1, %0
+   stp\\txzr, xzr, %0"
   [(set_attr "type" "logic_reg,multiple,f_mcr,f_mrc,fconstd,fconstd,\
-                     f_loadd,f_stored,neon_load1_2reg,neon_store1_2reg")
-   (set_attr "length" "4,8,8,8,4,4,4,4,4,4")
-   (set_attr "fp" "*,*,yes,yes,*,yes,yes,yes,*,*")
-   (set_attr "simd" "yes,*,*,*,yes,*,*,*,*,*")]
+                     f_loadd,f_stored,load2,store2,store2")
+   (set_attr "length" "4,8,8,8,4,4,4,4,4,4,4")
+   (set_attr "fp" "*,*,yes,yes,*,yes,yes,yes,*,*,*")
+   (set_attr "simd" "yes,*,*,*,yes,*,*,*,*,*,*")]
 )
 
 (define_split