From patchwork Thu Jun  4 00:35:35 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jim Wilson <jim.wilson@linaro.org>
X-Patchwork-Id: 49497
Return-Path: <patchwork-forward+bncBC7PZZF3R4KBB3N2X2VQKGQEC4EVOLY@linaro.org>
X-Original-To: linaro@patches.linaro.org
Delivered-To: linaro@patches.linaro.org
Received: from mail-wi0-f198.google.com (mail-wi0-f198.google.com
 [209.85.212.198])
 by ip-10-151-82-157.ec2.internal (Postfix) with ESMTPS id 4A10820DDC
 for <linaro@patches.linaro.org>; Thu,  4 Jun 2015 00:35:58 +0000 (UTC)
Received: by wifx6 with SMTP id x6sf9342973wif.1
 for <linaro@patches.linaro.org>; Wed, 03 Jun 2015 17:35:57 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:delivered-to:mailing-list:precedence:list-id
 :list-unsubscribe:list-archive:list-post:list-help:sender
 :delivered-to:mime-version:date:message-id:subject:from:to
 :content-type:x-original-sender:x-original-authentication-results;
 bh=hMEHwSk7D6nRvvq1sIh2OW0JhoL5RfeymS3bXGjVIyw=;
 b=Kl5tLYOoAE1BFQdBIQNfQsTdICJwqIlnU8G2LxUmXZSV0ox+xhPNFW9azK/c8l5uRw
 03HyFQutth355QkesAzvQz6nMaYAqK4UFNySY4Ov0zNiRxO7EFS5aqs38vhzObavGC/D
 Pg6gW2vYJRLHnPA9TBLF/B9w0VNzIM4tbB+W95a9iEznUr8S+ib1ct2fI/l2Inp2Dxp2
 pnl5i41EZemBueFbQ+nbsd66d+HCsC5v7vpBpWDl/T16McjmG6SNFGGP9aFb+aoHmnMt
 cN/Ze6xXFv9A90Elg3syjapxD22KI8zfcM7w4CUBoPCqXmfqj6Htc0jQcW6r2eHgxHwa
 ZPKQ==
X-Gm-Message-State: ALoCoQl5eYndbbXNFWw+ZdnVJyblmhwm86ZK3us5sGbkeMbtbmd8/RCzwJqiN5jX+uXInSaepqlb
X-Received: by 10.112.118.162 with SMTP id kn2mr33291355lbb.22.1433378157467; 
 Wed, 03 Jun 2015 17:35:57 -0700 (PDT)
X-BeenThere: patchwork-forward@linaro.org
Received: by 10.152.115.161 with SMTP id jp1ls130850lab.96.gmail; Wed, 03 Jun
 2015 17:35:57 -0700 (PDT)
X-Received: by 10.112.93.37 with SMTP id cr5mr13274837lbb.106.1433378157319; 
 Wed, 03 Jun 2015 17:35:57 -0700 (PDT)
Received: from mail-lb0-x230.google.com (mail-lb0-x230.google.com.
 [2a00:1450:4010:c04::230])
 by mx.google.com with ESMTPS id z9si321206laj.152.2015.06.03.17.35.57
 for <patchwork-forward@linaro.org>
 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
 Wed, 03 Jun 2015 17:35:57 -0700 (PDT)
Received-SPF: pass (google.com: domain of
 patch+caf_=patchwork-forward=linaro.org@linaro.org designates
 2a00:1450:4010:c04::230 as permitted sender)
 client-ip=2a00:1450:4010:c04::230; 
Received: by lbbqq2 with SMTP id qq2so17464306lbb.3
 for <patchwork-forward@linaro.org>;
 Wed, 03 Jun 2015 17:35:57 -0700 (PDT)
X-Received: by 10.112.182.4 with SMTP id ea4mr29801413lbc.35.1433378156975; 
 Wed, 03 Jun 2015 17:35:56 -0700 (PDT)
X-Forwarded-To: patchwork-forward@linaro.org
X-Forwarded-For: patch@linaro.org patchwork-forward@linaro.org
Delivered-To: patch@linaro.org
Received: by 10.112.108.230 with SMTP id hn6csp198387lbb;
 Wed, 3 Jun 2015 17:35:55 -0700 (PDT)
X-Received: by 10.70.88.145 with SMTP id bg17mr64394839pdb.167.1433378154535; 
 Wed, 03 Jun 2015 17:35:54 -0700 (PDT)
Received: from sourceware.org (server1.sourceware.org. [209.132.180.131])
 by mx.google.com with ESMTPS id
 yl8si3182932pab.167.2015.06.03.17.35.53 for <patch@linaro.org>
 (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
 Wed, 03 Jun 2015 17:35:54 -0700 (PDT)
Received-SPF: pass (google.com: domain of
 gcc-patches-return-399836-patch=linaro.org@gcc.gnu.org
 designates 209.132.180.131 as permitted sender)
 client-ip=209.132.180.131; 
Received: (qmail 36529 invoked by alias); 4 Jun 2015 00:35:40 -0000
Mailing-List: list patchwork-forward@linaro.org;
 contact patchwork-forward+owners@linaro.org
Precedence: list
List-Id: <patchwork-forward.linaro.org>
List-Unsubscribe: <mailto:googlegroups-manage+836684582541+unsubscribe@googlegroups.com>, 
 <http://groups.google.com/a/linaro.org/group/patchwork-forward/subscribe>
List-Archive: <http://groups.google.com/a/linaro.org/group/patchwork-forward/>
List-Post: <http://groups.google.com/a/linaro.org/group/patchwork-forward/post>, 
 <mailto:patchwork-forward@linaro.org>
List-Help: <http://support.google.com/a/linaro.org/bin/topic.py?topic=25838>, 
 <mailto:patchwork-forward+help@linaro.org>
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Received: (qmail 36519 invoked by uid 89); 4 Jun 2015 00:35:39 -0000
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-1.6 required=5.0 tests=AWL, BAYES_00,
 KAM_ASCII_DIVIDERS, RCVD_IN_DNSWL_LOW,
 SPF_PASS autolearn=no version=3.3.2
X-HELO: mail-qk0-f175.google.com
Received: from mail-qk0-f175.google.com (HELO mail-qk0-f175.google.com)
 (209.85.220.175) by sourceware.org
 (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256
 encrypted) ESMTPS; Thu, 04 Jun 2015 00:35:38 +0000
Received: by qkhg32 with SMTP id g32so15895312qkh.0 for
 <gcc-patches@gcc.gnu.org>; Wed, 03 Jun 2015 17:35:36 -0700 (PDT)
MIME-Version: 1.0
X-Received: by 10.140.102.180 with SMTP id w49mr38700579qge.82.1433378136025; 
 Wed, 03 Jun 2015 17:35:36 -0700 (PDT)
Received: by 10.140.94.182 with HTTP; Wed, 3 Jun 2015 17:35:35 -0700 (PDT)
Date: Wed, 3 Jun 2015 17:35:35 -0700
Message-ID: <CABXYE2UuEX7nTeo_msQ7Wzjs2Bn0214yLSJu-GLuP+O0k-cy=g@mail.gmail.com>
Subject: [PATCH, AARCH64] improve long double 0.0 support
From: Jim Wilson <jim.wilson@linaro.org>
To: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
X-Original-Sender: jim.wilson@linaro.org
X-Original-Authentication-Results: mx.google.com; spf=pass (google.com:
 domain of
 patch+caf_=patchwork-forward=linaro.org@linaro.org designates
 2a00:1450:4010:c04::230 as permitted sender)
 smtp.mail=patch+caf_=patchwork-forward=linaro.org@linaro.org; 
 dkim=pass header.i=@gcc.gnu.org
X-Google-Group-Id: 836684582541

I noticed that poor code is emitted for a long double 0.0.  This testcase

  long double sub (void) { return 0.0; }
  void sub2 (long double *ld) { *ld = 0.0; }

currently generates

  sub:
          ldr     q0, .LC0
          ret
  ...
  sub2:
          ldr     q0, .LC1
          str     q0, [x0]
          ret

where LC0 and LC1 are 16-byte constant table long double zeros.  With
the attached patch, I get

  sub:
          movi    v0.2d, #0
          ret
  ...
  sub2:
          stp     xzr, xzr, [x0]
          ret

The main problem is in aarch64_valid_floating_const, which rejects all
constants for TFmode.  There is a comment that says we should handle
0, but not until after the movtf pattern is improved.  This
improvement apparently happened two years ago with this patch
2013-05-09  Sofiane Naci  <sofiane.naci@arm.com>
        * config/aarch64/aarch64.md: New movtf split.
        ...
so this comment is no longer relevant, and we should handle 0 now.
The patch deletes the out-of-date comment and moves the 0 check before
the TFmode check so that TFmode 0 is accepted.
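
As a rough illustration of the reordering (not the real GCC code), the
effect on the predicate can be modelled in plain C.  The enum, struct and
function names below are hypothetical stand-ins; the two flags replace
the results of aarch64_float_const_zero_rtx_p and
aarch64_float_const_representable_p, which operate on rtx values in the
actual source.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the machine modes the predicate cares about. */
enum fp_mode { MODE_SF, MODE_DF, MODE_TF };

/* Simplified model of a CONST_DOUBLE: only the facts the predicate consults. */
struct fp_const {
    enum fp_mode mode;
    bool is_zero;        /* models aarch64_float_const_zero_rtx_p (x) */
    bool fmov_encodable; /* models aarch64_float_const_representable_p (x) */
};

/* Order of checks before the patch: the mode check runs first,
   so every TFmode constant is rejected, including 0.0. */
bool valid_floating_const_old(const struct fp_const *c)
{
    if (!(c->mode == MODE_SF || c->mode == MODE_DF))
        return false;
    if (c->is_zero)
        return true;
    return c->fmov_encodable;
}

/* Order of checks after the patch: the zero check runs first,
   so 0.0 is accepted in any mode, TFmode included. */
bool valid_floating_const_new(const struct fp_const *c)
{
    if (c->is_zero)
        return true;
    if (!(c->mode == MODE_SF || c->mode == MODE_DF))
        return false;
    return c->fmov_encodable;
}
```

In this model only the TFmode zero case changes its answer; SFmode and
DFmode constants, and non-zero TFmode constants, are classified exactly
as before.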

There are a few other changes needed to make this work well.  The
movtf expander needs to avoid forcing 0 to a reg for a mem dest, just
like the movti pattern already does.  The Ump/?rY alternative needs to
be split into two, because %H doesn't work for const_double 0; again,
this is like the movti pattern.  The insn condition needs to allow 0
values in operand 1, as is done in the movti pattern.
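
For readers wondering why storing two zero registers is a valid way to
write a long double 0.0: positive zero is the all-zero bit pattern, so
zero-filling the object's storage produces +0.0.  The following is a
small portable sketch of that fact, not part of the patch; the helper
names are made up for illustration, and memset stands in for the
stp xzr, xzr store.

```c
#include <math.h>
#include <stdbool.h>
#include <string.h>

/* Fill a long double's storage with zero bytes, which is what a pair of
   zero-register stores does to a 16-byte long double on AArch64. */
long double zero_filled_long_double(void)
{
    long double ld;
    memset(&ld, 0, sizeof ld);
    return ld;
}

/* True only for +0.0; -0.0 compares equal to 0.0 but has the sign bit set. */
bool is_positive_zero(long double v)
{
    return v == 0.0L && !signbit(v);
}
```

Calling is_positive_zero (zero_filled_long_double ()) is expected to
return true.  Note that -0.0 has a non-zero sign bit, so the all-zero
store is only a correct replacement for positive zero.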

I noticed another related problem while making this change.  The
ldp/stp instructions in the movtf_aarch64 pattern have neon attribute
types.  However, these are integer instructions with matching 'r'
constraints and hence should be using load2/store2 attribute types,
just like in the movti pattern.

This was tested with a default languages make bootstrap and make check
on an APM system.

There are some similar problems with the movsf and movdf patterns.  I
plan to submit a patch to fix them after this one is accepted.

Jim

2015-06-03  Jim Wilson  <jim.wilson@linaro.org>

	* config/aarch64/aarch64.c (aarch64_valid_floating_const): Move
	aarch64_float_const_zero_rtx_p check before TFmode check.
	* config/aarch64/aarch64.md (movtf): Don't call force_reg if op1 is
	an fp zero.
	(movtf_aarch64): Separate ?rY alternative into two.  Adjust assembly
	code and attributes to match.  Change condition from register_operand
	to aarch64_reg_or_fp_zero for op1.  Change type for ldp from
	neon_load1_2reg to load2.  Change type for stp from neon_store1_2reg
	to store2.

Index: config/aarch64/aarch64.c
===================================================================
--- config/aarch64/aarch64.c	(revision 224054)
+++ config/aarch64/aarch64.c	(working copy)
@@ -7430,16 +7430,13 @@ aarch64_valid_floating_const (machine_mo
   if (!CONST_DOUBLE_P (x))
     return false;
 
-  /* TODO: We could handle moving 0.0 to a TFmode register,
-     but first we would like to refactor the movtf_aarch64
-     to be more amicable to split moves properly and
-     correctly gate on TARGET_SIMD.  For now - reject all
-     constants which are not to SFmode or DFmode registers.  */
+  if (aarch64_float_const_zero_rtx_p (x))
+    return true;
+
+  /* We only handle moving 0.0 to a TFmode register.  */
   if (!(mode == SFmode || mode == DFmode))
     return false;
 
-  if (aarch64_float_const_zero_rtx_p (x))
-    return true;
   return aarch64_float_const_representable_p (x);
 }
 
Index: config/aarch64/aarch64.md
===================================================================
--- config/aarch64/aarch64.md	(revision 224053)
+++ config/aarch64/aarch64.md	(working copy)
@@ -1040,18 +1040,20 @@ (define_expand "movtf"
 	FAIL;
      }
 
-    if (GET_CODE (operands[0]) == MEM)
+    if (GET_CODE (operands[0]) == MEM
+        && ! (GET_CODE (operands[1]) == CONST_DOUBLE
+	      && aarch64_float_const_zero_rtx_p (operands[1])))
       operands[1] = force_reg (TFmode, operands[1]);
   "
 )
 
 (define_insn "*movtf_aarch64"
   [(set (match_operand:TF 0
-	 "nonimmediate_operand" "=w,?&r,w ,?r,w,?w,w,m,?r ,Ump")
+	 "nonimmediate_operand" "=w,?&r,w ,?r,w,?w,w,m,?r ,Ump,Ump")
 	(match_operand:TF 1
-	 "general_operand"      " w,?r, ?r,w ,Y,Y ,m,w,Ump,?rY"))]
+	 "general_operand"      " w,?r, ?r,w ,Y,Y ,m,w,Ump,?r ,Y"))]
   "TARGET_FLOAT && (register_operand (operands[0], TFmode)
-    || register_operand (operands[1], TFmode))"
+    || aarch64_reg_or_fp_zero (operands[1], TFmode))"
   "@
    orr\\t%0.16b, %1.16b, %1.16b
    #
@@ -1062,12 +1064,13 @@ (define_insn "*movtf_aarch64"
    ldr\\t%q0, %1
    str\\t%q1, %0
    ldp\\t%0, %H0, %1
-   stp\\t%1, %H1, %0"
+   stp\\t%1, %H1, %0
+   stp\\txzr, xzr, %0"
   [(set_attr "type" "logic_reg,multiple,f_mcr,f_mrc,fconstd,fconstd,\
-                     f_loadd,f_stored,neon_load1_2reg,neon_store1_2reg")
-   (set_attr "length" "4,8,8,8,4,4,4,4,4,4")
-   (set_attr "fp" "*,*,yes,yes,*,yes,yes,yes,*,*")
-   (set_attr "simd" "yes,*,*,*,yes,*,*,*,*,*")]
+                     f_loadd,f_stored,load2,store2,store2")
+   (set_attr "length" "4,8,8,8,4,4,4,4,4,4,4")
+   (set_attr "fp" "*,*,yes,yes,*,yes,yes,yes,*,*,*")
+   (set_attr "simd" "yes,*,*,*,yes,*,*,*,*,*,*")]
 )
 
 (define_split