From patchwork Tue Dec 11 11:27:32 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Richard Earnshaw \(lists\)" <richard.earnshaw@arm.com>
X-Patchwork-Id: 153449
Delivered-To: patch@linaro.org
Received: by 2002:a2e:299d:0:0:0:0:0 with SMTP id p29-v6csp499504ljp;
 Tue, 11 Dec 2018 03:27:51 -0800 (PST)
X-Google-Smtp-Source: AFSGD/XV4rQbj4HJ1bKQrkk4CvznPo+E/GgKK6YV4hVSu1CjaIE4bOB4DWbZt8lvknFRJQjaOFpe
X-Received: by 2002:a62:5b83:: with SMTP id
 p125mr16138530pfb.116.1544527671168; 
 Tue, 11 Dec 2018 03:27:51 -0800 (PST)
ARC-Seal: i=1; a=rsa-sha256; t=1544527671; cv=none;
 d=google.com; s=arc-20160816;
 b=e98FYwK4hGzjRRSl1KLxJ4/6nlKikWUsm5xrqiZuUJWkEG9bFJf9ttPoOuamx8Jn0d
 alChkEZM40g90/Nx14F382tqrtPiOS6xz7M29b9+j9X1lqCy4OKqvXL9lHz635Qt4UV5
 413cQqGU2nQ95gyMtWFL9SXV7FiXwDGIMRKb1iPrZFphz2gJk79aOBmC9dnGIwfJx2IE
 5FY/vXztz+g4M0ES+HkXibH2R/ddFdx52J4t1t+DugalCG2ZCjno9uC2wjTXs4BiPGu6
 R9kV3Ivn9bz6crBO/k54j/hC3w8Nj8qZt9A3sXBOqtlrsq1Zeq2UAB++cUFLfg3436u9
 aowg==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
 s=arc-20160816; 
 h=mime-version:user-agent:date:message-id:openpgp:subject:from:to
 :delivered-to:sender:list-help:list-post:list-archive
 :list-unsubscribe:list-id:precedence:mailing-list:dkim-signature
 :domainkey-signature;
 bh=b3sWYIgwAEfsHTSUPOWrWJSr0t++29xe3tFROcysu/I=;
 b=nowF8nSbieswfPic43dklvqP3oek1lUnmw6RL6Jz3xDvzBATQl9xQjMJLyGog9P2ry
 JXsIPJrh3q8ydA61c65piWu6wX3GoWDScA+31fxbz5cLpjaANK4/al/6uEJi1wU1r3Wn
 aRwL/0S9azTJRpJ+2hIOZlqadbZzFdEX6J0BqH/ciaX6mkbNKVzujZ8oHsNI8i6JPqDG
 rVV4os4mD33jdBgZNUSBHp2QUorr4yDIJwzpcOaHZ+zYFX+qvvaAnsZT5uYcsTBAyuUf
 Oiax1Hngc8ES1Vs9Wfn0KBy5s8z8F3/989YCqtRMBcW9Xt6uRV3asg7CCQU4cDwWUM27
 +Haw==
ARC-Authentication-Results: i=1; mx.google.com;
 dkim=pass header.i=@gcc.gnu.org header.s=default header.b="Yi5/FdQ2"; 
 spf=pass (google.com: domain of
 gcc-patches-return-492085-patch=linaro.org@gcc.gnu.org
 designates 209.132.180.131 as permitted sender)
 smtp.mailfrom="gcc-patches-return-492085-patch=linaro.org@gcc.gnu.org"
Return-Path: <gcc-patches-return-492085-patch=linaro.org@gcc.gnu.org>
Received: from sourceware.org (server1.sourceware.org. [209.132.180.131])
 by mx.google.com with ESMTPS id
 n4si12344557pgd.10.2018.12.11.03.27.50 for <patch@linaro.org>
 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
 Tue, 11 Dec 2018 03:27:51 -0800 (PST)
Received-SPF: pass (google.com: domain of
 gcc-patches-return-492085-patch=linaro.org@gcc.gnu.org
 designates 209.132.180.131 as permitted sender)
 client-ip=209.132.180.131; 
Authentication-Results: mx.google.com;
 dkim=pass header.i=@gcc.gnu.org header.s=default header.b="Yi5/FdQ2"; 
 spf=pass (google.com: domain of
 gcc-patches-return-492085-patch=linaro.org@gcc.gnu.org
 designates 209.132.180.131 as permitted sender)
 smtp.mailfrom="gcc-patches-return-492085-patch=linaro.org@gcc.gnu.org"
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id
 :list-unsubscribe:list-archive:list-post:list-help:sender:to
 :from:subject:message-id:date:mime-version:content-type; q=dns;
 s=default; b=FqKrHH4GT2iJKqKNu76hUUzRdRYJ5VoWGoAjrI+QZs6uC/WhOQ
 madUgQtF7WtjnX829PO0BGEpy0YaxSJjqXe0+AtsbGz3aOND+SqURtA20/BlfWrO
 AWJuha3ra/u6lkgAgQtCCctWh+60ZUG88xcRyv2u7VU5LF0N01LWAlEVs=
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id
 :list-unsubscribe:list-archive:list-post:list-help:sender:to
 :from:subject:message-id:date:mime-version:content-type; s=
 default; bh=OdhM3fZPhLgvcAMfDJwJvxBnlBY=; b=Yi5/FdQ26pw3+HnZBV2x
 B0BnaiNE6xJTbaw19YvU0xKd0wjAtNjKOpt3kbU+g22lyHcMgFDJv5XJbUz6cH5L
 smWwEZLfPgKActWN92URhTRnFeAZfFLMpSlGgaoDqIyiOYda8KARLVRQBf3wJy9M
 M/ZalgzG4VXNQ7f3SENSHNI=
Received: (qmail 51993 invoked by alias); 11 Dec 2018 11:27:39 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <mailto:gcc-patches-unsubscribe-patch=linaro.org@gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Received: (qmail 51977 invoked by uid 89); 11 Dec 2018 11:27:38 -0000
Authentication-Results: sourceware.org; auth=none
X-Spam-SWARE-Status: No, score=-26.9 required=5.0 tests=BAYES_00, GIT_PATCH_0, 
 GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3,
 SPF_PASS autolearn=ham version=3.3.2 spammy=operate,
 preferable, define_expand, sk:UNSPEC_
X-HELO: foss.arm.com
Received: from usa-sjc-mx-foss1.foss.arm.com (HELO foss.arm.com)
 (217.140.101.70) by sourceware.org
 (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP;
 Tue, 11 Dec 2018 11:27:36 +0000
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249])	by
 usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id
 B63BC80D; Tue, 11 Dec 2018 03:27:34 -0800 (PST)
Received: from e120077-lin.cambridge.arm.com (e120077-lin.cambridge.arm.com
 [10.2.206.231])	by usa-sjc-imap-foss1.foss.arm.com (Postfix)
 with ESMTPSA id 3FEFA3F6A8; Tue, 11 Dec 2018 03:27:34 -0800 (PST)
To: gcc-patches <gcc-patches@gcc.gnu.org>
From: "Richard Earnshaw (lists)" <Richard.Earnshaw@arm.com>
Subject: [aarch64] PR target/87369 Prefer bsl/bit/bif for copysign
Openpgp: preference=signencrypt
Message-ID: <39893456-e4ec-1a0b-0bed-9255917bfa41@arm.com>
Date: Tue, 11 Dec 2018 11:27:32 +0000
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
 rv:60.0) Gecko/20100101 Thunderbird/60.2.1
MIME-Version: 1.0

The copysign operations will almost always be performed on values in
floating-point registers.  As such, we do not want the compiler to
simplify the operations into code sequences that can only be done using
the general-purpose register set.  Unfortunately, this is what is
currently happening.

Fortunately, it seems quite unlikely that copysign() will be followed
by other logical operations on the values involved, so I think it is
acceptable to use an unspec here.  This preserves the operation in a
form that lets the register allocator make the right choice later on,
without restricting the final form of the operation.  The one cost is
that if we do end up using the gp register bank, we are left with a
dead constant load that we cannot easily eliminate at a late stage.
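
For readers less familiar with the two forms, here is a rough C model of
what the vector-side bitwise select (BSL/BIT/BIF) and the integer-side
bit-field insert (BFXIL) each compute for a double.  This is only an
illustrative sketch: the helper names (bits_of, double_of,
copysign_select, copysign_insert) are made up for this example and do
not appear in GCC or in the patch.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: view a double as its 64-bit pattern and back. */
static uint64_t bits_of(double d)
{
  uint64_t u;
  memcpy(&u, &d, sizeof u);
  return u;
}

static double double_of(uint64_t u)
{
  double d;
  memcpy(&d, &u, sizeof d);
  return d;
}

/* Bitwise-select form: where the mask bit is set take the bit from y
   (the sign source), elsewhere take it from x (the magnitude).  */
static double copysign_select(double x, double y)
{
  const uint64_t mask = UINT64_C(1) << 63;   /* sign bit only */
  return double_of((bits_of(y) & mask) | (bits_of(x) & ~mask));
}

/* Bit-field-insert form: start from y and overwrite its low 63 bits
   with the low 63 bits of x, leaving only y's sign bit in place.  */
static double copysign_insert(double x, double y)
{
  const uint64_t low63 = (UINT64_C(1) << 63) - 1;
  return double_of((bits_of(y) & ~low63) | (bits_of(x) & low63));
}
```

Both functions produce the same bit pattern; the difference that matters
for the patch is which register bank the operation runs in, and that the
insert form needs no mask constant at all.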

	PR target/87369
	* config/aarch64/iterators.md (sizem1): Add sizes for HFmode, SFmode
	and DFmode.
	(Vbtype): Add SFmode mapping.
	* config/aarch64/aarch64.md (copysigndf3, copysignsf3): Delete.
	(copysign<GPF:mode>3): New expand pattern.
	(copysign<GPF:mode>3_insn): New insn pattern.

Applied to trunk.

R.

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 82af4d47f78..6657316c5dd 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -222,6 +222,7 @@ (define_c_enum "unspec" [
     UNSPEC_FADDA
     UNSPEC_REV_SUBREG
     UNSPEC_SPECULATION_TRACKER
+    UNSPEC_COPYSIGN
 ])
 
 (define_c_enum "unspecv" [
@@ -5987,49 +5988,47 @@ (define_expand "lrint<GPF:mode><GPI:mode>2"
 ;;   LDR d2, #(1 << 63)
 ;;   BSL v2.8b, [y], [x]
 ;;
-;; or another, equivalent, sequence using one of BSL/BIT/BIF.
-;; aarch64_simd_bsldf will select the best suited of these instructions
-;; to generate based on register allocation, and knows how to partially
-;; constant fold based on the values of X and Y, so expand through that.
-
-(define_expand "copysigndf3"
-  [(match_operand:DF 0 "register_operand")
-   (match_operand:DF 1 "register_operand")
-   (match_operand:DF 2 "register_operand")]
+;; or another, equivalent, sequence using one of BSL/BIT/BIF.  Because
+;; we expect these operations to nearly always operate on
+;; floating-point values, we do not want the operation to be
+;; simplified into a bit-field insert operation that operates on the
+;; integer side, since typically that would involve three inter-bank
+;; register copies.  As we do not expect copysign to be followed by
+;; other logical operations on the result, it seems preferable to keep
+;; this as an unspec operation, rather than exposing the underlying
+;; logic to the compiler.
+
+(define_expand "copysign<GPF:mode>3"
+  [(match_operand:GPF 0 "register_operand")
+   (match_operand:GPF 1 "register_operand")
+   (match_operand:GPF 2 "register_operand")]
   "TARGET_FLOAT && TARGET_SIMD"
 {
-  rtx mask = gen_reg_rtx (DImode);
-  emit_move_insn (mask, GEN_INT (HOST_WIDE_INT_1U << 63));
-  emit_insn (gen_aarch64_simd_bsldf (operands[0], mask,
-				     operands[2], operands[1]));
+  rtx bitmask = gen_reg_rtx (<V_INT_EQUIV>mode);
+  emit_move_insn (bitmask, GEN_INT (HOST_WIDE_INT_M1U
+				    << (GET_MODE_BITSIZE (<MODE>mode) - 1)));
+  emit_insn (gen_copysign<mode>3_insn (operands[0], operands[1], operands[2],
+				       bitmask));
   DONE;
 }
 )
 
-;; As above, but we must first get to a 64-bit value if we wish to use
-;; aarch64_simd_bslv2sf.
-
-(define_expand "copysignsf3"
-  [(match_operand:SF 0 "register_operand")
-   (match_operand:SF 1 "register_operand")
-   (match_operand:SF 2 "register_operand")]
+(define_insn "copysign<GPF:mode>3_insn"
+  [(set (match_operand:GPF 0 "register_operand" "=w,w,w,r")
+	(unspec:GPF [(match_operand:GPF 1 "register_operand" "w,0,w,r")
+		     (match_operand:GPF 2 "register_operand" "w,w,0,0")
+		     (match_operand:<V_INT_EQUIV> 3 "register_operand" "0,w,w,X")]
+	 UNSPEC_COPYSIGN))]
   "TARGET_FLOAT && TARGET_SIMD"
-{
-  rtx v_bitmask = gen_reg_rtx (V2SImode);
-
-  /* Juggle modes to get us in to a vector mode for BSL.  */
-  rtx op1 = lowpart_subreg (DImode, operands[1], SFmode);
-  rtx op2 = lowpart_subreg (V2SFmode, operands[2], SFmode);
-  rtx tmp = gen_reg_rtx (V2SFmode);
-  emit_move_insn (v_bitmask,
-		  aarch64_simd_gen_const_vector_dup (V2SImode,
-						     HOST_WIDE_INT_M1U << 31));
-  emit_insn (gen_aarch64_simd_bslv2sf (tmp, v_bitmask, op2, op1));
-  emit_move_insn (operands[0], lowpart_subreg (SFmode, tmp, V2SFmode));
-  DONE;
-}
+  "@
+   bsl\\t%0.<Vbtype>, %2.<Vbtype>, %1.<Vbtype>
+   bit\\t%0.<Vbtype>, %2.<Vbtype>, %3.<Vbtype>
+   bif\\t%0.<Vbtype>, %1.<Vbtype>, %3.<Vbtype>
+   bfxil\\t%<w1>0, %<w1>1, #0, <sizem1>"
+  [(set_attr "type" "neon_bsl<q>,neon_bsl<q>,neon_bsl<q>,bfm")]
 )
 
+
 ;; For xorsign (x, y), we want to generate:
 ;;
 ;; LDR   d2, #1<<63
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index a80755734d6..ae75666167d 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -601,7 +601,8 @@ (define_mode_attr size [(QI "b") (HI "h") (SI "w")])
 (define_mode_attr sizen [(QI "8") (HI "16") (SI "32") (DI "64")])
 
 ;; Give the ordinal of the MSB in the mode
-(define_mode_attr sizem1 [(QI "#7") (HI "#15") (SI "#31") (DI "#63")])
+(define_mode_attr sizem1 [(QI "#7") (HI "#15") (SI "#31") (DI "#63")
+			  (HF "#15") (SF "#31") (DF "#63")])
 
 ;; Attribute to describe constants acceptable in logical operations
 (define_mode_attr lconst [(SI "K") (DI "L")])
@@ -687,7 +688,7 @@ (define_mode_attr Vbtype [(V8QI "8b")  (V16QI "16b")
 			  (V8HF "16b") (V2SF  "8b")
 			  (V4SF "16b") (V2DF  "16b")
 			  (DI   "8b")  (DF    "8b")
-			  (SI   "8b")])
+			  (SI   "8b")  (SF    "8b")])
 
 ;; Define element mode for each vector mode.
 (define_mode_attr VEL [(V8QI  "QI") (V16QI "QI") (VNx16QI "QI")