Message ID: 1463476002-513-1-git-send-email-james.greenhalgh@arm.com
State: New
On 17 May 2016 at 10:06, James Greenhalgh <james.greenhalgh@arm.com> wrote:
>
> Hi,
>
> This is just a simplification, it probably makes life easier for register
> allocation in some corner cases and seems the right thing to do. We don't
> use the internal version elsewhere, so we're safe to delete it and change
> the types.
>
> OK?
>
> Bootstrapped on AArch64 with no issues.

Help me understand why this is ok for BE ?

Cheers
/Marcus
On Tue, May 17, 2016 at 11:32:36AM +0100, Marcus Shawcroft wrote:
> On 17 May 2016 at 10:06, James Greenhalgh <james.greenhalgh@arm.com> wrote:
> >
> > Hi,
> >
> > This is just a simplification, it probably makes life easier for register
> > allocation in some corner cases and seems the right thing to do. We don't
> > use the internal version elsewhere, so we're safe to delete it and change
> > the types.
> >
> > OK?
> >
> > Bootstrapped on AArch64 with no issues.
>
> Help me understand why this is ok for BE ?

The reduc_plus_scal_<mode> pattern wants to take a vector and return a
scalar value representing the sum of the lanes of that vector. We want to
go from V2DFmode to DFmode.

The architectural instruction FADDP writes a scalar value to the low bits
of the register, leaving zeroes in the upper bits. i.e.

  faddp d0, v1.2d

  128                64                     0
   |       0x0       |  v1.d[0] + v1.d[1]  |

In the current implementation, we use the aarch64_reduc_plus_internal<mode>
pattern, which treats the result of FADDP as a vector of two elements. We
then need an extra step to extract the correct scalar value from that
vector. From GCC's point of view the lane containing the result is either
lane 0 (little-endian) or lane 1 (big-endian), which is why the current
code is endian dependent. The extract operation will always be a NOP move
from architectural bits 0-63 to architectural bits 0-63 - but we never
elide the move, as later passes can't be certain that the upper bits are
zero (they come out of an UNSPEC, so could be anything).

However, this is all unnecessary. FADDP does exactly what we want,
regardless of endianness; we just need to model the instruction as writing
the scalar value in the first place, which is what this patch wires up.

We probably just missed this optimization in the migration from the
reduc_splus optabs (which required a vector return value) to the
reduc_plus_scal optabs (which require a scalar return value).

Does that help?

Thanks,
James
On 17 May 2016 at 12:02, James Greenhalgh <james.greenhalgh@arm.com> wrote:
> On Tue, May 17, 2016 at 11:32:36AM +0100, Marcus Shawcroft wrote:
>> On 17 May 2016 at 10:06, James Greenhalgh <james.greenhalgh@arm.com> wrote:
>> >
>> > Hi,
>> >
>> > This is just a simplification, it probably makes life easier for register
>> > allocation in some corner cases and seems the right thing to do. We don't
>> > use the internal version elsewhere, so we're safe to delete it and change
>> > the types.
>> >
>> > OK?
>> >
>> > Bootstrapped on AArch64 with no issues.
>>
>> Help me understand why this is ok for BE ?
>
> The reduc_plus_scal_<mode> pattern wants to take a vector and return a
> scalar value representing the sum of the lanes of that vector. We want
> to go from V2DFmode to DFmode.
>
> The architectural instruction FADDP writes a scalar value to the low
> bits of the register, leaving zeroes in the upper bits. i.e.
>
>   faddp d0, v1.2d
>
>   128                64                     0
>    |       0x0       |  v1.d[0] + v1.d[1]  |
>
> In the current implementation, we use the
> aarch64_reduc_plus_internal<mode> pattern, which treats the result of
> FADDP as a vector of two elements. We then need an extra step to extract
> the correct scalar value from that vector. From GCC's point of view the
> lane containing the result is either lane 0 (little-endian) or lane 1
> (big-endian), which is why the current code is endian dependent. The
> extract operation will always be a NOP move from architectural bits 0-63
> to architectural bits 0-63 - but we never elide the move, as later
> passes can't be certain that the upper bits are zero (they come out of
> an UNSPEC, so could be anything).
>
> However, this is all unnecessary. FADDP does exactly what we want,
> regardless of endianness; we just need to model the instruction as
> writing the scalar value in the first place, which is what this patch
> wires up.
>
> We probably just missed this optimization in the migration from the
> reduc_splus optabs (which required a vector return value) to the
> reduc_plus_scal optabs (which require a scalar return value).
>
> Does that help?

Yep. Thanks. OK to commit.

/Marcus

> Thanks,
> James
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index bd73bce..30023f0 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1989,19 +1989,6 @@
 }
 )
 
-(define_expand "reduc_plus_scal_<mode>"
-  [(match_operand:<VEL> 0 "register_operand" "=w")
-   (match_operand:V2F 1 "register_operand" "w")]
-  "TARGET_SIMD"
-  {
-    rtx elt = GEN_INT (ENDIAN_LANE_N (<MODE>mode, 0));
-    rtx scratch = gen_reg_rtx (<MODE>mode);
-    emit_insn (gen_aarch64_reduc_plus_internal<mode> (scratch, operands[1]));
-    emit_insn (gen_aarch64_get_lane<mode> (operands[0], scratch, elt));
-    DONE;
-  }
-)
-
 (define_insn "aarch64_reduc_plus_internal<mode>"
   [(set (match_operand:VDQV 0 "register_operand" "=w")
        (unspec:VDQV [(match_operand:VDQV 1 "register_operand" "w")]
@@ -2020,9 +2007,9 @@
   [(set_attr "type" "neon_reduc_add")]
 )
 
-(define_insn "aarch64_reduc_plus_internal<mode>"
-  [(set (match_operand:V2F 0 "register_operand" "=w")
-       (unspec:V2F [(match_operand:V2F 1 "register_operand" "w")]
+(define_insn "reduc_plus_scal_<mode>"
+  [(set (match_operand:<VEL> 0 "register_operand" "=w")
+       (unspec:<VEL> [(match_operand:V2F 1 "register_operand" "w")]
 	UNSPEC_FADDV))]
   "TARGET_SIMD"
   "faddp\\t%<Vetype>0, %1.<Vtype>"