[AArch64] Fix sve/extract_[12].c for big-endian SVE

Message ID	87d11wpx0r.fsf@linaro.org
State	Accepted
Commit	8711e791deaf97590d68ee82ff7a0b81d54e944d
Headers	show Delivered-To: patch@linaro.org Received-SPF: pass (google.com: domain of gcc-patches-return-472099-patch=linaro.org@gcc.gnu.org designates 209.132.180.131 as permitted sender) client-ip=209.132.180.131; DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id:mime-version:content-type; q=dns; s=default; b=NeXzIwggXXaxeDJODQWZd38tlGwxykPEVUKbwTR4mbVZ5FBSge m0gP6ZB/GG7wQI2cxpNdAePRlHqbXw2c0uW1woXBhTAgzEG8ndl1UaGzUabfZ7WU ENjMHnBsNRUpdopKZxyZ7qAgo9hUiLq6sHgU/5DSq6gql++nV4+LyNqpQ= Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk Sender: gcc-patches-owner@gcc.gnu.org From: Richard Sandiford <richard.sandiford@linaro.org> To: gcc-patches@gcc.gnu.org Mail-Followup-To: gcc-patches@gcc.gnu.org, richard.earnshaw@arm.com, james.greenhalgh@arm.com, marcus.shawcroft@arm.com, richard.sandiford@linaro.org Cc: richard.earnshaw@arm.com, james.greenhalgh@arm.com, marcus.shawcroft@arm.com Subject: [AArch64] Fix sve/extract_[12].c for big-endian SVE Date: Fri, 26 Jan 2018 13:31:00 +0000 Message-ID: <87d11wpx0r.fsf@linaro.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain
Series	[AArch64] Fix sve/extract_[12].c for big-endian SVE \| expand [AArch64] Fix sve/extract_[12].c for big-endian SVE

Message ID

87d11wpx0r.fsf@linaro.org

State

Accepted

Commit

8711e791deaf97590d68ee82ff7a0b81d54e944d

Headers

Received-SPF: pass (google.com: domain of
	gcc-patches-return-472099-patch=linaro.org@gcc.gnu.org
	designates 209.132.180.131 as permitted sender)
	client-ip=209.132.180.131; 
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender:from
	:to:cc:subject:date:message-id:mime-version:content-type; q=dns;
	s=default; b=NeXzIwggXXaxeDJODQWZd38tlGwxykPEVUKbwTR4mbVZ5FBSge
	m0gP6ZB/GG7wQI2cxpNdAePRlHqbXw2c0uW1woXBhTAgzEG8ndl1UaGzUabfZ7WU
	ENjMHnBsNRUpdopKZxyZ7qAgo9hUiLq6sHgU/5DSq6gql++nV4+LyNqpQ=
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
Sender: gcc-patches-owner@gcc.gnu.org
From: Richard Sandiford <richard.sandiford@linaro.org>
To: gcc-patches@gcc.gnu.org
Mail-Followup-To: gcc-patches@gcc.gnu.org, richard.earnshaw@arm.com,
	james.greenhalgh@arm.com, marcus.shawcroft@arm.com,
	richard.sandiford@linaro.org
Cc: richard.earnshaw@arm.com, james.greenhalgh@arm.com,
	marcus.shawcroft@arm.com
Subject: [AArch64] Fix sve/extract_[12].c for big-endian SVE
Date: Fri, 26 Jan 2018 13:31:00 +0000
Message-ID: <87d11wpx0r.fsf@linaro.org>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain

Series

[AArch64] Fix sve/extract_[12].c for big-endian SVE | expand

Commit Message

Richard Sandiford Jan. 26, 2018, 1:31 p.m. UTC

sve/extract_[12].c were relying on the target-independent optimisation
that removes a redundant vec_select, so that we don't end up with
things like:

    dup v0.4s, v0.4s[0]
    ...use s0...

But that optimisation rightly doesn't trigger for big-endian targets,
because GCC expects lane 0 to be in the high part of the register
rather than the low part.

SVE breaks this assumption -- see the comment at the head of
aarch64-sve.md for details -- so the optimisation is valid for
both endiannesses.  Long term, we probably need some kind of target
hook to make GCC aware of this.

But there's another problem with the current extract pattern: it doesn't
tell the register allocator how cheap an extraction of lane 0 is with
tied registers.  It seems better to split the lane 0 case out into
its own pattern and use tied operands for the FPR<-SIMD case,
so that using different registers has the cost of an extra reload.
I think we want this for both endiannesses, regardless of the hook
described above.

Also, the gen_lowpart in this pattern fails for aarch64_be due to
TARGET_CAN_CHANGE_MODE_CLASS restrictions, so the patch uses gen_rtx_REG
instead.  We're only creating this rtl in order to print it, so there's
no need for anything fancier.

Tested on aarch64_be-elf and aarch64-linux-gnu.  OK to install?

Richard


2018-01-26  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	* config/aarch64/aarch64-sve.md (*vec_extract<mode><Vel>_0): New
	pattern.
	(*vec_extract<mode><Vel>_v128): Require a nonzero lane number.
	Use gen_rtx_REG rather than gen_lowpart.

Comments

Kyrill Tkachov Jan. 26, 2018, 1:59 p.m. UTC | #1

Hi Richard,

On 26/01/18 13:31, Richard Sandiford wrote:
> sve/extract_[12].c were relying on the target-independent optimisation

> that removes a redundant vec_select, so that we don't end up with

> things like:

>

>     dup v0.4s, v0.4s[0]

>     ...use s0...

>

> But that optimisation rightly doesn't trigger for big-endian targets,

> because GCC expects lane 0 to be in the high part of the register

> rather than the low part.

>

> SVE breaks this assumption -- see the comment at the head of

> aarch64-sve.md for details -- so the optimisation is valid for

> both endiannesses.  Long term, we probably need some kind of target

> hook to make GCC aware of this.

>

> But there's another problem with the current extract pattern: it doesn't

> tell the register allocator how cheap an extraction of lane 0 is with

> tied registers.  It seems better to split the lane 0 case out into

> its own pattern and use tied operands for the FPR<-SIMD case,

> so that using different registers has the cost of an extra reload.

> I think we want this for both endiannesses, regardless of the hook

> described above.

>

> Also, the gen_lowpart in this pattern fails for aarch64_be due to

> TARGET_CAN_CHANGE_MODE_CLASS restrictions, so the patch uses gen_rtx_REG

> instead.  We're only creating this rtl in order to print it, so there's

> no need for anything fancier.

>

> Tested on aarch64_be-elf and aarch64-linux-gnu.  OK to install?

>

> Richard

>

>

> 2018-01-26  Richard Sandiford <richard.sandiford@linaro.org>

>

> gcc/

>         * config/aarch64/aarch64-sve.md (*vec_extract<mode><Vel>_0): New

>         pattern.

>         (*vec_extract<mode><Vel>_v128): Require a nonzero lane number.

>         Use gen_rtx_REG rather than gen_lowpart.

>

> Index: gcc/config/aarch64/aarch64-sve.md

> ===================================================================

> --- gcc/config/aarch64/aarch64-sve.md   2018-01-13 18:01:51.232735405 +0000

> +++ gcc/config/aarch64/aarch64-sve.md   2018-01-26 13:26:50.176756711 +0000

> @@ -484,18 +484,52 @@ (define_expand "vec_extract<mode><Vel>"

>    }

>  )

>

> +;; Extract element zero.  This is a special case because we want to force

> +;; the registers to be the same for the second alternative, and then

> +;; split the instruction into nothing after RA.

> +(define_insn_and_split "*vec_extract<mode><Vel>_0"

> +  [(set (match_operand:<VEL> 0 "aarch64_simd_nonimmediate_operand" "=r, w, Utv")

> +       (vec_select:<VEL>

> +         (match_operand:SVE_ALL 1 "register_operand" "w, 0, w")

> +         (parallel [(const_int 0)])))]

> +  "TARGET_SVE"

> +  {

> +    operands[1] = gen_rtx_REG (<V128>mode, REGNO (operands[1]));

> +    switch (which_alternative)

> +      {

> +       case 0:

> +         return "umov\\t%<vwcore>0, %1.<Vetype>[0]";

> +       case 1:

> +         return "#";

> +       case 2:

> +         return "st1\\t{%1.<Vetype>}[0], %0";

> +       default:

> +         gcc_unreachable ();

> +      }

> +  }

> +  "&& reload_completed

> +   && REG_P (operands[1])

> +   && REGNO (operands[0]) == REGNO (operands[1])"


Is it guaranteed that operands[0] will be a REG rtx here?
The aarch64_simd_nonimmediate_operand predicate and Utv constraint might allow
a MEM, which would cause an error in an RTL-checking build.
If I'm not mistaken GCC may attempt the split even when matching alternative 2.

Kyrill

> +  [(const_int 0)]

> +  {

> +    emit_note (NOTE_INSN_DELETED);

> +    DONE;

> +  }

> +  [(set_attr "type" "neon_to_gp_q, untyped, neon_store1_one_lane_q")]

> +)

> +

>  ;; Extract an element from the Advanced SIMD portion of the register.

>  ;; We don't just reuse the aarch64-simd.md pattern because we don't

> -;; want any chnage in lane number on big-endian targets.

> +;; want any change in lane number on big-endian targets.

>  (define_insn "*vec_extract<mode><Vel>_v128"

>    [(set (match_operand:<VEL> 0 "aarch64_simd_nonimmediate_operand" "=r, w, Utv")

>          (vec_select:<VEL>

>            (match_operand:SVE_ALL 1 "register_operand" "w, w, w")

>            (parallel [(match_operand:SI 2 "const_int_operand")])))]

>    "TARGET_SVE

> -   && IN_RANGE (INTVAL (operands[2]) * GET_MODE_SIZE (<VEL>mode), 0, 15)"

> +   && IN_RANGE (INTVAL (operands[2]) * GET_MODE_SIZE (<VEL>mode), 1, 15)"

>    {

> -    operands[1] = gen_lowpart (<V128>mode, operands[1]);

> +    operands[1] = gen_rtx_REG (<V128>mode, REGNO (operands[1]));

>      switch (which_alternative)

>        {

>          case 0:

Richard Sandiford Jan. 26, 2018, 3:15 p.m. UTC | #2

Kyrill  Tkachov <kyrylo.tkachov@foss.arm.com> writes:
> On 26/01/18 13:31, Richard Sandiford wrote:

>> sve/extract_[12].c were relying on the target-independent optimisation

>> that removes a redundant vec_select, so that we don't end up with

>> things like:

>>

>>     dup v0.4s, v0.4s[0]

>>     ...use s0...

>>

>> But that optimisation rightly doesn't trigger for big-endian targets,

>> because GCC expects lane 0 to be in the high part of the register

>> rather than the low part.

>>

>> SVE breaks this assumption -- see the comment at the head of

>> aarch64-sve.md for details -- so the optimisation is valid for

>> both endiannesses.  Long term, we probably need some kind of target

>> hook to make GCC aware of this.

>>

>> But there's another problem with the current extract pattern: it doesn't

>> tell the register allocator how cheap an extraction of lane 0 is with

>> tied registers.  It seems better to split the lane 0 case out into

>> its own pattern and use tied operands for the FPR<-SIMD case,

>> so that using different registers has the cost of an extra reload.

>> I think we want this for both endiannesses, regardless of the hook

>> described above.

>>

>> Also, the gen_lowpart in this pattern fails for aarch64_be due to

>> TARGET_CAN_CHANGE_MODE_CLASS restrictions, so the patch uses gen_rtx_REG

>> instead.  We're only creating this rtl in order to print it, so there's

>> no need for anything fancier.

>>

>> Tested on aarch64_be-elf and aarch64-linux-gnu.  OK to install?

>>

>> Richard

>>

>>

>> 2018-01-26  Richard Sandiford <richard.sandiford@linaro.org>

>>

>> gcc/

>>         * config/aarch64/aarch64-sve.md (*vec_extract<mode><Vel>_0): New

>>         pattern.

>>         (*vec_extract<mode><Vel>_v128): Require a nonzero lane number.

>>         Use gen_rtx_REG rather than gen_lowpart.

>>

>> Index: gcc/config/aarch64/aarch64-sve.md

>> ===================================================================

>> --- gcc/config/aarch64/aarch64-sve.md   2018-01-13 18:01:51.232735405 +0000

>> +++ gcc/config/aarch64/aarch64-sve.md   2018-01-26 13:26:50.176756711 +0000

>> @@ -484,18 +484,52 @@ (define_expand "vec_extract<mode><Vel>"

>>    }

>>  )

>>

>> +;; Extract element zero.  This is a special case because we want to force

>> +;; the registers to be the same for the second alternative, and then

>> +;; split the instruction into nothing after RA.

>> +(define_insn_and_split "*vec_extract<mode><Vel>_0"

>> +  [(set (match_operand:<VEL> 0 "aarch64_simd_nonimmediate_operand" "=r, w, Utv")

>> +       (vec_select:<VEL>

>> +         (match_operand:SVE_ALL 1 "register_operand" "w, 0, w")

>> +         (parallel [(const_int 0)])))]

>> +  "TARGET_SVE"

>> +  {

>> +    operands[1] = gen_rtx_REG (<V128>mode, REGNO (operands[1]));

>> +    switch (which_alternative)

>> +      {

>> +       case 0:

>> +         return "umov\\t%<vwcore>0, %1.<Vetype>[0]";

>> +       case 1:

>> +         return "#";

>> +       case 2:

>> +         return "st1\\t{%1.<Vetype>}[0], %0";

>> +       default:

>> +         gcc_unreachable ();

>> +      }

>> +  }

>> +  "&& reload_completed

>> +   && REG_P (operands[1])

>> +   && REGNO (operands[0]) == REGNO (operands[1])"

>

> Is it guaranteed that operands[0] will be a REG rtx here?

> The aarch64_simd_nonimmediate_operand predicate and Utv constraint might allow

> a MEM, which would cause an error in an RTL-checking build.

> If I'm not mistaken GCC may attempt the split even when matching alternative 2


Bah, good catch.  The REG_P was checking the wrong operand.
Fixed version below.

Richard


2018-01-26  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	* config/aarch64/aarch64-sve.md (*vec_extract<mode><Vel>_0): New
	pattern.
	(*vec_extract<mode><Vel>_v128): Require a nonzero lane number.

Index: gcc/config/aarch64/aarch64-sve.md
===================================================================
--- gcc/config/aarch64/aarch64-sve.md	2018-01-26 15:14:36.016144657 +0000
+++ gcc/config/aarch64/aarch64-sve.md	2018-01-26 15:14:36.171138212 +0000
@@ -484,18 +484,52 @@ (define_expand "vec_extract<mode><Vel>"
   }
 )
 
+;; Extract element zero.  This is a special case because we want to force
+;; the registers to be the same for the second alternative, and then
+;; split the instruction into nothing after RA.
+(define_insn_and_split "*vec_extract<mode><Vel>_0"
+  [(set (match_operand:<VEL> 0 "aarch64_simd_nonimmediate_operand" "=r, w, Utv")
+	(vec_select:<VEL>
+	  (match_operand:SVE_ALL 1 "register_operand" "w, 0, w")
+	  (parallel [(const_int 0)])))]
+  "TARGET_SVE"
+  {
+    operands[1] = gen_rtx_REG (<V128>mode, REGNO (operands[1]));
+    switch (which_alternative)
+      {
+	case 0:
+	  return "umov\\t%<vwcore>0, %1.<Vetype>[0]";
+	case 1:
+	  return "#";
+	case 2:
+	  return "st1\\t{%1.<Vetype>}[0], %0";
+	default:
+	  gcc_unreachable ();
+      }
+  }
+  "&& reload_completed
+   && REG_P (operands[0])
+   && REGNO (operands[0]) == REGNO (operands[1])"
+  [(const_int 0)]
+  {
+    emit_note (NOTE_INSN_DELETED);
+    DONE;
+  }
+  [(set_attr "type" "neon_to_gp_q, untyped, neon_store1_one_lane_q")]
+)
+
 ;; Extract an element from the Advanced SIMD portion of the register.
 ;; We don't just reuse the aarch64-simd.md pattern because we don't
-;; want any chnage in lane number on big-endian targets.
+;; want any change in lane number on big-endian targets.
 (define_insn "*vec_extract<mode><Vel>_v128"
   [(set (match_operand:<VEL> 0 "aarch64_simd_nonimmediate_operand" "=r, w, Utv")
 	(vec_select:<VEL>
 	  (match_operand:SVE_ALL 1 "register_operand" "w, w, w")
 	  (parallel [(match_operand:SI 2 "const_int_operand")])))]
   "TARGET_SVE
-   && IN_RANGE (INTVAL (operands[2]) * GET_MODE_SIZE (<VEL>mode), 0, 15)"
+   && IN_RANGE (INTVAL (operands[2]) * GET_MODE_SIZE (<VEL>mode), 1, 15)"
   {
-    operands[1] = gen_lowpart (<V128>mode, operands[1]);
+    operands[1] = gen_rtx_REG (<V128>mode, REGNO (operands[1]));
     switch (which_alternative)
       {
 	case 0:

James Greenhalgh Jan. 29, 2018, 10:48 a.m. UTC | #3

On Fri, Jan 26, 2018 at 03:15:58PM +0000, Richard Sandiford wrote:
> Kyrill  Tkachov <kyrylo.tkachov@foss.arm.com> writes:

> > On 26/01/18 13:31, Richard Sandiford wrote:

> >> sve/extract_[12].c were relying on the target-independent optimisation

> >> that removes a redundant vec_select, so that we don't end up with

> >> things like:

> >>

> >>     dup v0.4s, v0.4s[0]

> >>     ...use s0...

> >>

> >> But that optimisation rightly doesn't trigger for big-endian targets,

> >> because GCC expects lane 0 to be in the high part of the register

> >> rather than the low part.

> >>

> >> SVE breaks this assumption -- see the comment at the head of

> >> aarch64-sve.md for details -- so the optimisation is valid for

> >> both endiannesses.  Long term, we probably need some kind of target

> >> hook to make GCC aware of this.


This explanation is scary - it implies there might be more surprises
waiting for us.

> >>

> >> But there's another problem with the current extract pattern: it doesn't

> >> tell the register allocator how cheap an extraction of lane 0 is with

> >> tied registers.  It seems better to split the lane 0 case out into

> >> its own pattern and use tied operands for the FPR<-SIMD case,

> >> so that using different registers has the cost of an extra reload.

> >> I think we want this for both endiannesses, regardless of the hook

> >> described above.

> >>

> >> Also, the gen_lowpart in this pattern fails for aarch64_be due to

> >> TARGET_CAN_CHANGE_MODE_CLASS restrictions, so the patch uses gen_rtx_REG

> >> instead.  We're only creating this rtl in order to print it, so there's

> >> no need for anything fancier.

> >>

> >> Tested on aarch64_be-elf and aarch64-linux-gnu.  OK to install?


OK.

Thanks,
James

Index: gcc/config/aarch64/aarch64-sve.md
===================================================================
--- gcc/config/aarch64/aarch64-sve.md	2018-01-13 18:01:51.232735405 +0000
+++ gcc/config/aarch64/aarch64-sve.md	2018-01-26 13:26:50.176756711 +0000
@@ -484,18 +484,52 @@  (define_expand "vec_extract<mode><Vel>"
   }
 )
 
+;; Extract element zero.  This is a special case because we want to force
+;; the registers to be the same for the second alternative, and then
+;; split the instruction into nothing after RA.
+(define_insn_and_split "*vec_extract<mode><Vel>_0"
+  [(set (match_operand:<VEL> 0 "aarch64_simd_nonimmediate_operand" "=r, w, Utv")
+	(vec_select:<VEL>
+	  (match_operand:SVE_ALL 1 "register_operand" "w, 0, w")
+	  (parallel [(const_int 0)])))]
+  "TARGET_SVE"
+  {
+    operands[1] = gen_rtx_REG (<V128>mode, REGNO (operands[1]));
+    switch (which_alternative)
+      {
+	case 0:
+	  return "umov\\t%<vwcore>0, %1.<Vetype>[0]";
+	case 1:
+	  return "#";
+	case 2:
+	  return "st1\\t{%1.<Vetype>}[0], %0";
+	default:
+	  gcc_unreachable ();
+      }
+  }
+  "&& reload_completed
+   && REG_P (operands[1])
+   && REGNO (operands[0]) == REGNO (operands[1])"
+  [(const_int 0)]
+  {
+    emit_note (NOTE_INSN_DELETED);
+    DONE;
+  }
+  [(set_attr "type" "neon_to_gp_q, untyped, neon_store1_one_lane_q")]
+)
+
 ;; Extract an element from the Advanced SIMD portion of the register.
 ;; We don't just reuse the aarch64-simd.md pattern because we don't
-;; want any chnage in lane number on big-endian targets.
+;; want any change in lane number on big-endian targets.
 (define_insn "*vec_extract<mode><Vel>_v128"
   [(set (match_operand:<VEL> 0 "aarch64_simd_nonimmediate_operand" "=r, w, Utv")
 	(vec_select:<VEL>
 	  (match_operand:SVE_ALL 1 "register_operand" "w, w, w")
 	  (parallel [(match_operand:SI 2 "const_int_operand")])))]
   "TARGET_SVE
-   && IN_RANGE (INTVAL (operands[2]) * GET_MODE_SIZE (<VEL>mode), 0, 15)"
+   && IN_RANGE (INTVAL (operands[2]) * GET_MODE_SIZE (<VEL>mode), 1, 15)"
   {
-    operands[1] = gen_lowpart (<V128>mode, operands[1]);
+    operands[1] = gen_rtx_REG (<V128>mode, REGNO (operands[1]));
     switch (which_alternative)
       {
 	case 0:

[AArch64] Fix sve/extract_[12].c for big-endian SVE

Commit Message

Comments

Patch