[AArch64] Use all SVE LD1RQ variants

Message ID	87y3kkohjj.fsf@linaro.org
State	Accepted
Commit	947b137212d16d432eec201fe7f800dfdb481203
Headers	show Delivered-To: patch@linaro.org Received-SPF: pass (google.com: domain of gcc-patches-return-472104-patch=linaro.org@gcc.gnu.org designates 209.132.180.131 as permitted sender) client-ip=209.132.180.131; DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id:mime-version:content-type; q=dns; s=default; b=mjr1rFCpzF2GpPFV/mmzxsW/lCEtX2Uy8jZJ4GnRhCcE0hZdii 0BKGPfR3+WWpSL821aBoDwpum9LpcuSx1Dm2HncFjKbh3s2CTtizaMBkzLpXkkH6 SOLPPNADMyNyni4LR2D9pqrr/N35jROYIi/EHA5NIUhi+n6y9y8/v+tJQ= Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk Sender: gcc-patches-owner@gcc.gnu.org From: Richard Sandiford <richard.sandiford@linaro.org> To: gcc-patches@gcc.gnu.org Mail-Followup-To: gcc-patches@gcc.gnu.org, richard.earnshaw@arm.com, james.greenhalgh@arm.com, marcus.shawcroft@arm.com, richard.sandiford@linaro.org Cc: richard.earnshaw@arm.com, james.greenhalgh@arm.com, marcus.shawcroft@arm.com Subject: [AArch64] Use all SVE LD1RQ variants Date: Fri, 26 Jan 2018 13:50:40 +0000 Message-ID: <87y3kkohjj.fsf@linaro.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain
Series	[AArch64] Use all SVE LD1RQ variants \| expand [AArch64] Use all SVE LD1RQ variants

Message ID

87y3kkohjj.fsf@linaro.org

State

Accepted

Commit

947b137212d16d432eec201fe7f800dfdb481203

Headers

Received-SPF: pass (google.com: domain of
	gcc-patches-return-472104-patch=linaro.org@gcc.gnu.org
	designates 209.132.180.131 as permitted sender)
	client-ip=209.132.180.131; 
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender:from
	:to:cc:subject:date:message-id:mime-version:content-type; q=dns;
	s=default; b=mjr1rFCpzF2GpPFV/mmzxsW/lCEtX2Uy8jZJ4GnRhCcE0hZdii
	0BKGPfR3+WWpSL821aBoDwpum9LpcuSx1Dm2HncFjKbh3s2CTtizaMBkzLpXkkH6
	SOLPPNADMyNyni4LR2D9pqrr/N35jROYIi/EHA5NIUhi+n6y9y8/v+tJQ=
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
Sender: gcc-patches-owner@gcc.gnu.org
From: Richard Sandiford <richard.sandiford@linaro.org>
To: gcc-patches@gcc.gnu.org
Mail-Followup-To: gcc-patches@gcc.gnu.org, richard.earnshaw@arm.com,
	james.greenhalgh@arm.com, marcus.shawcroft@arm.com,
	richard.sandiford@linaro.org
Cc: richard.earnshaw@arm.com, james.greenhalgh@arm.com,
	marcus.shawcroft@arm.com
Subject: [AArch64] Use all SVE LD1RQ variants
Date: Fri, 26 Jan 2018 13:50:40 +0000
Message-ID: <87y3kkohjj.fsf@linaro.org>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain

Series

[AArch64] Use all SVE LD1RQ variants | expand

Commit Message

Richard Sandiford Jan. 26, 2018, 1:50 p.m. UTC

The fallback way of handling a repeated 128-bit constant vector for SVE
is to force the 128 bits to the constant pool and use LD1RQ to load it.
Previously the code always used the byte variant of LD1RQ (LD1RQB),
with a preceding BSWAP for big-endian targets.  However, that BSWAP
doesn't handle all cases correctly.

The simplest fix seemed to be to use the LD1RQ appropriate for the
element size.

This helps to fix some of the sve/slp_*.c tests for aarch64_be,
although a later patch is needed as well.

Tested on aarch64_be-elf and aarch64-linux-gnu.  OK to install?

Richard


2018-01-26  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	* config/aarch64/aarch64-sve.md (sve_ld1rq): Replace with...
	(*sve_ld1rq<Vesize>): ... this new pattern.  Handle all element sizes,
	not just bytes.
	* config/aarch64/aarch64.c (aarch64_expand_sve_widened_duplicate):
	Remove BSWAP handing for big-endian targets and use the form of
	LD1RQ appropariate for the mode.

gcc/testsuite/
	* gcc.target/aarch64/sve/slp_2.c: Expect LD1RQD rather than LD1RQB.
	* gcc.target/aarch64/sve/slp_3.c: Expect LD1RQW rather than LD1RQB.
	* gcc.target/aarch64/sve/slp_4.c: Expect LD1RQH rather than LD1RQB.

Comments

James Greenhalgh Jan. 29, 2018, 10:46 a.m. UTC | #1

On Fri, Jan 26, 2018 at 01:50:40PM +0000, Richard Sandiford wrote:
> The fallback way of handling a repeated 128-bit constant vector for SVE

> is to force the 128 bits to the constant pool and use LD1RQ to load it.

> Previously the code always used the byte variant of LD1RQ (LD1RQB),

> with a preceding BSWAP for big-endian targets.  However, that BSWAP

> doesn't handle all cases correctly.

> 

> The simplest fix seemed to be to use the LD1RQ appropriate for the

> element size.

> 

> This helps to fix some of the sve/slp_*.c tests for aarch64_be,

> although a later patch is needed as well.

> 

> Tested on aarch64_be-elf and aarch64-linux-gnu.  OK to install?


OK.

Thanks,
James

> 2018-01-26  Richard Sandiford  <richard.sandiford@linaro.org>

> 

> gcc/

> 	* config/aarch64/aarch64-sve.md (sve_ld1rq): Replace with...

> 	(*sve_ld1rq<Vesize>): ... this new pattern.  Handle all element sizes,

> 	not just bytes.

> 	* config/aarch64/aarch64.c (aarch64_expand_sve_widened_duplicate):

> 	Remove BSWAP handing for big-endian targets and use the form of

> 	LD1RQ appropariate for the mode.

> 

> gcc/testsuite/

> 	* gcc.target/aarch64/sve/slp_2.c: Expect LD1RQD rather than LD1RQB.

> 	* gcc.target/aarch64/sve/slp_3.c: Expect LD1RQW rather than LD1RQB.

> 	* gcc.target/aarch64/sve/slp_4.c: Expect LD1RQH rather than LD1RQB.

>

Index: gcc/config/aarch64/aarch64-sve.md
===================================================================
--- gcc/config/aarch64/aarch64-sve.md	2018-01-26 13:26:50.176756711 +0000
+++ gcc/config/aarch64/aarch64-sve.md	2018-01-26 13:49:00.069630245 +0000
@@ -652,14 +652,14 @@  (define_insn "sve_ld1r<mode>"
 ;; Load 128 bits from memory and duplicate to fill a vector.  Since there
 ;; are so few operations on 128-bit "elements", we don't define a VNx1TI
 ;; and simply use vectors of bytes instead.
-(define_insn "sve_ld1rq"
-  [(set (match_operand:VNx16QI 0 "register_operand" "=w")
-	(unspec:VNx16QI
-	  [(match_operand:VNx16BI 1 "register_operand" "Upl")
+(define_insn "*sve_ld1rq<Vesize>"
+  [(set (match_operand:SVE_ALL 0 "register_operand" "=w")
+	(unspec:SVE_ALL
+	  [(match_operand:<VPRED> 1 "register_operand" "Upl")
 	   (match_operand:TI 2 "aarch64_sve_ld1r_operand" "Uty")]
 	  UNSPEC_LD1RQ))]
   "TARGET_SVE"
-  "ld1rqb\t%0.b, %1/z, %2"
+  "ld1rq<Vesize>\t%0.<Vetype>, %1/z, %2"
 )
 
 ;; Implement a predicate broadcast by shifting the low bit of the scalar
Index: gcc/config/aarch64/aarch64.c
===================================================================
--- gcc/config/aarch64/aarch64.c	2018-01-26 13:46:00.955822193 +0000
+++ gcc/config/aarch64/aarch64.c	2018-01-26 13:49:00.071630173 +0000
@@ -2787,16 +2787,7 @@  aarch64_expand_sve_widened_duplicate (rt
       return true;
     }
 
-  /* The bytes are loaded in little-endian order, so do a byteswap on
-     big-endian targets.  */
-  if (BYTES_BIG_ENDIAN)
-    {
-      src = simplify_unary_operation (BSWAP, src_mode, src, src_mode);
-      if (!src)
-	return NULL_RTX;
-    }
-
-  /* Use LD1RQ to load the 128 bits from memory.  */
+  /* Use LD1RQ[BHWD] to load the 128 bits from memory.  */
   src = force_const_mem (src_mode, src);
   if (!src)
     return false;
@@ -2808,8 +2799,12 @@  aarch64_expand_sve_widened_duplicate (rt
       src = replace_equiv_address (src, addr);
     }
 
-  rtx ptrue = force_reg (VNx16BImode, CONSTM1_RTX (VNx16BImode));
-  emit_insn (gen_sve_ld1rq (gen_lowpart (VNx16QImode, dest), ptrue, src));
+  machine_mode mode = GET_MODE (dest);
+  unsigned int elem_bytes = GET_MODE_UNIT_SIZE (mode);
+  machine_mode pred_mode = aarch64_sve_pred_mode (elem_bytes).require ();
+  rtx ptrue = force_reg (pred_mode, CONSTM1_RTX (pred_mode));
+  src = gen_rtx_UNSPEC (mode, gen_rtvec (2, ptrue, src), UNSPEC_LD1RQ);
+  emit_insn (gen_rtx_SET (dest, src));
   return true;
 }
 
Index: gcc/testsuite/gcc.target/aarch64/sve/slp_2.c
===================================================================
--- gcc/testsuite/gcc.target/aarch64/sve/slp_2.c	2018-01-13 17:58:43.651957575 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve/slp_2.c	2018-01-26 13:49:00.071630173 +0000
@@ -32,7 +32,7 @@  TEST_ALL (VEC_PERM)
 /* { dg-final { scan-assembler-times {\tld1rh\tz[0-9]+\.h, } 2 } } */
 /* { dg-final { scan-assembler-times {\tld1rw\tz[0-9]+\.s, } 3 } } */
 /* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 3 } } */
-/* { dg-final { scan-assembler-times {\tld1rqb\tz[0-9]+\.b, } 3 } } */
+/* { dg-final { scan-assembler-times {\tld1rqd\tz[0-9]+\.d, } 3 } } */
 /* { dg-final { scan-assembler-not {\tzip1\t} } } */
 /* { dg-final { scan-assembler-not {\tzip2\t} } } */
 
Index: gcc/testsuite/gcc.target/aarch64/sve/slp_3.c
===================================================================
--- gcc/testsuite/gcc.target/aarch64/sve/slp_3.c	2018-01-13 17:58:43.651957575 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve/slp_3.c	2018-01-26 13:49:00.071630173 +0000
@@ -36,7 +36,7 @@  TEST_ALL (VEC_PERM)
 /* 1 for each 16-bit type and 4 for double.  */
 /* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 7 } } */
 /* 1 for each 32-bit type.  */
-/* { dg-final { scan-assembler-times {\tld1rqb\tz[0-9]+\.b, } 3 } } */
+/* { dg-final { scan-assembler-times {\tld1rqw\tz[0-9]+\.s, } 3 } } */
 /* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, #41\n} 2 } } */
 /* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, #25\n} 2 } } */
 /* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, #31\n} 2 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/slp_4.c
===================================================================
--- gcc/testsuite/gcc.target/aarch64/sve/slp_4.c	2018-01-13 17:58:43.651957575 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve/slp_4.c	2018-01-26 13:49:00.072630137 +0000
@@ -38,7 +38,7 @@  TEST_ALL (VEC_PERM)
 /* 1 for each 8-bit type, 4 for each 32-bit type and 8 for double.  */
 /* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 22 } } */
 /* 1 for each 16-bit type.  */
-/* { dg-final { scan-assembler-times {\tld1rqb\tz[0-9]\.b, } 3 } } */
+/* { dg-final { scan-assembler-times {\tld1rqh\tz[0-9]\.h, } 3 } } */
 /* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, #99\n} 2 } } */
 /* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, #11\n} 2 } } */
 /* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, #17\n} 2 } } */

[AArch64] Use all SVE LD1RQ variants

Commit Message

Comments

Patch