[AArch64] Prefer LD1RQ for big-endian SVE

Message ID	87tvv8ohct.fsf@linaro.org
State	Accepted
Commit	8179efe00e04285184112de7dbb977a75852197c
Headers	show Delivered-To: patch@linaro.org Received-SPF: pass (google.com: domain of gcc-patches-return-472106-patch=linaro.org@gcc.gnu.org designates 209.132.180.131 as permitted sender) client-ip=209.132.180.131; DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:cc:subject:date:message-id:mime-version:content-type; q=dns; s=default; b=D9QqGSjuwDfY8hpUdSiunISXaUCu0IM63lg2bvBREIXfjpAiXP Wwl6uaRUpMiMdIimp0xFBbCQtGbKHfPMrkcV8cVwzsEoEpEuIArJUwU75dtXdTcv A7AWQd6HFcPeu/BF+H/R24MS/X6x/HFbG1A1j9rTgmehzbc91miuVgeTc= Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk Sender: gcc-patches-owner@gcc.gnu.org From: Richard Sandiford <richard.sandiford@linaro.org> To: gcc-patches@gcc.gnu.org Mail-Followup-To: gcc-patches@gcc.gnu.org, richard.earnshaw@arm.com, james.greenhalgh@arm.com, marcus.shawcroft@arm.com, richard.sandiford@linaro.org Cc: richard.earnshaw@arm.com, james.greenhalgh@arm.com, marcus.shawcroft@arm.com Subject: [AArch64] Prefer LD1RQ for big-endian SVE Date: Fri, 26 Jan 2018 13:54:42 +0000 Message-ID: <87tvv8ohct.fsf@linaro.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.3 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain
Series	[AArch64] Prefer LD1RQ for big-endian SVE \| expand [AArch64] Prefer LD1RQ for big-endian SVE

Message ID

87tvv8ohct.fsf@linaro.org

State

Accepted

Commit

8179efe00e04285184112de7dbb977a75852197c

Headers

Received-SPF: pass (google.com: domain of
	gcc-patches-return-472106-patch=linaro.org@gcc.gnu.org
	designates 209.132.180.131 as permitted sender)
	client-ip=209.132.180.131; 
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id
	:list-unsubscribe:list-archive:list-post:list-help:sender:from
	:to:cc:subject:date:message-id:mime-version:content-type; q=dns;
	s=default; b=D9QqGSjuwDfY8hpUdSiunISXaUCu0IM63lg2bvBREIXfjpAiXP
	Wwl6uaRUpMiMdIimp0xFBbCQtGbKHfPMrkcV8cVwzsEoEpEuIArJUwU75dtXdTcv
	A7AWQd6HFcPeu/BF+H/R24MS/X6x/HFbG1A1j9rTgmehzbc91miuVgeTc=
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
Sender: gcc-patches-owner@gcc.gnu.org
From: Richard Sandiford <richard.sandiford@linaro.org>
To: gcc-patches@gcc.gnu.org
Mail-Followup-To: gcc-patches@gcc.gnu.org, richard.earnshaw@arm.com,
	james.greenhalgh@arm.com, marcus.shawcroft@arm.com,
	richard.sandiford@linaro.org
Cc: richard.earnshaw@arm.com, james.greenhalgh@arm.com,
	marcus.shawcroft@arm.com
Subject: [AArch64] Prefer LD1RQ for big-endian SVE
Date: Fri, 26 Jan 2018 13:54:42 +0000
Message-ID: <87tvv8ohct.fsf@linaro.org>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain

Series

[AArch64] Prefer LD1RQ for big-endian SVE | expand

Commit Message

Richard Sandiford Jan. 26, 2018, 1:54 p.m. UTC

This patch deals with cases in which a CONST_VECTOR contains a
repeating bit pattern that is wider than one element but narrower
than 128 bits.  The current code:

* treats the repeating pattern as a single element
* uses the associated LD1R to load and replicate it (such as LD1RD
  for 64-bit patterns)
* uses a subreg to cast the result back to the original vector type

The problem is that for big-endian targets, the final cast is
effectively a form of element reverse.  E.g. say we're using LD1RD to load
16-bit elements, with h being the high parts and l being the low parts:

                               +-----+-----+-----+-----+-----+----
                         lanes |  0  |  1  |  2  |  3  |  4  | ...
                               +-----+-----+-----+-----+-----+----
     memory              bytes |h0 l0 h1 l1 h2 l2 h3 l3 h0 l0 ....
                               +----------------------------------
                                 V  V  V  V  V  V  V  V
                     ----------+-----------------------+
    register         ....      |           0           |
     after           ----------+-----------------------+  lsb
     LD1RD           .... h3 l3 h0 l0 h1 l1 h2 l2 h3 l3|
                     ----------------------------------+

                     ----+-----+-----+-----+-----+-----+
    expected         ... |  4  |  3  |  2  |  1  |  0  |
    register         ----+-----+-----+-----+-----+-----+  lsb
    contents         .... h0 l0 h3 l3 h2 l2 h1 l1 h0 l0|
                     ----------------------------------+

A later patch fixes the handling of general subregs to account
for this, but it means that we need to do a REV instruction
after the load.  It seems better to use LD1RQ[BHW] on a 128-bit
pattern instead, since that gets the endianness right without
a separate fixup instruction.

This is another step towards fixing sve/slp_* for aarch64_be.

Tested on aarch64_be-elf and aarch64-linux-gnu.  OK to install?

Richard


2018-01-26  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	* config/aarch64/aarch64.c (aarch64_expand_sve_const_vector): Prefer
	the TImode handling for big-endian targets.

gcc/testsuite/
	* gcc.target/aarch64/sve/slp_2.c: Expect LD1RQ to be used instead
	of LD1R[HWD] for multi-element constants on big-endian targets.
	* gcc.target/aarch64/sve/slp_3.c: Likewise.
	* gcc.target/aarch64/sve/slp_4.c: Likewise.

Comments

James Greenhalgh Jan. 29, 2018, 10:10 a.m. UTC | #1

On Fri, Jan 26, 2018 at 01:54:42PM +0000, Richard Sandiford wrote:
> This patch deals with cases in which a CONST_VECTOR contains a

> repeating bit pattern that is wider than one element but narrower

> than 128 bits.  The current code:

> 

> * treats the repeating pattern as a single element

> * uses the associated LD1R to load and replicate it (such as LD1RD

>   for 64-bit patterns)

> * uses a subreg to cast the result back to the original vector type

> 

> The problem is that for big-endian targets, the final cast is

> effectively a form of element reverse.  E.g. say we're using LD1RD to load

> 16-bit elements, with h being the high parts and l being the low parts:

> 

>                                +-----+-----+-----+-----+-----+----

>                          lanes |  0  |  1  |  2  |  3  |  4  | ...

>                                +-----+-----+-----+-----+-----+----

>      memory              bytes |h0 l0 h1 l1 h2 l2 h3 l3 h0 l0 ....

>                                +----------------------------------

>                                  V  V  V  V  V  V  V  V

>                      ----------+-----------------------+

>     register         ....      |           0           |

>      after           ----------+-----------------------+  lsb

>      LD1RD           .... h3 l3 h0 l0 h1 l1 h2 l2 h3 l3|

>                      ----------------------------------+

> 

>                      ----+-----+-----+-----+-----+-----+

>     expected         ... |  4  |  3  |  2  |  1  |  0  |

>     register         ----+-----+-----+-----+-----+-----+  lsb

>     contents         .... h0 l0 h3 l3 h2 l2 h1 l1 h0 l0|

>                      ----------------------------------+

> 

> A later patch fixes the handling of general subregs to account

> for this, but it means that we need to do a REV instruction

> after the load.  It seems better to use LD1RQ[BHW] on a 128-bit

> pattern instead, since that gets the endianness right without

> a separate fixup instruction.

> 

> This is another step towards fixing sve/slp_* for aarch64_be.

> 

> Tested on aarch64_be-elf and aarch64-linux-gnu.  OK to install?


OK.

Thanks,
James

> 

> 2018-01-26  Richard Sandiford  <richard.sandiford@linaro.org>

> 

> gcc/

> 	* config/aarch64/aarch64.c (aarch64_expand_sve_const_vector): Prefer

> 	the TImode handling for big-endian targets.

> 

> gcc/testsuite/

> 	* gcc.target/aarch64/sve/slp_2.c: Expect LD1RQ to be used instead

> 	of LD1R[HWD] for multi-element constants on big-endian targets.

> 	* gcc.target/aarch64/sve/slp_3.c: Likewise.

> 	* gcc.target/aarch64/sve/slp_4.c: Likewise.

> 

> Index: gcc/config/aarch64/aarch64.c

> ===================================================================

> --- gcc/config/aarch64/aarch64.c	2018-01-26 13:49:00.071630173 +0000

> +++ gcc/config/aarch64/aarch64.c	2018-01-26 13:51:19.665760175 +0000

> @@ -2824,10 +2824,18 @@ aarch64_expand_sve_const_vector (rtx des

>        /* The constant is a repeating seqeuence of at least two elements,

>  	 where the repeating elements occupy no more than 128 bits.

>  	 Get an integer representation of the replicated value.  */

> -      unsigned int int_bits = GET_MODE_UNIT_BITSIZE (mode) * npatterns;

> -      gcc_assert (int_bits <= 128);

> -

> -      scalar_int_mode int_mode = int_mode_for_size (int_bits, 0).require ();

> +      scalar_int_mode int_mode;

> +      if (BYTES_BIG_ENDIAN)

> +	/* For now, always use LD1RQ to load the value on big-endian

> +	   targets, since the handling of smaller integers includes a

> +	   subreg that is semantically an element reverse.  */

> +	int_mode = TImode;

> +      else

> +	{

> +	  unsigned int int_bits = GET_MODE_UNIT_BITSIZE (mode) * npatterns;

> +	  gcc_assert (int_bits <= 128);

> +	  int_mode = int_mode_for_size (int_bits, 0).require ();

> +	}

>        rtx int_value = simplify_gen_subreg (int_mode, src, mode, 0);

>        if (int_value

>  	  && aarch64_expand_sve_widened_duplicate (dest, int_mode, int_value))

> Index: gcc/testsuite/gcc.target/aarch64/sve/slp_2.c

> ===================================================================

> --- gcc/testsuite/gcc.target/aarch64/sve/slp_2.c	2018-01-26 13:49:00.071630173 +0000

> +++ gcc/testsuite/gcc.target/aarch64/sve/slp_2.c	2018-01-26 13:51:19.665760175 +0000

> @@ -29,9 +29,12 @@ #define TEST_ALL(T)				\

>  

>  TEST_ALL (VEC_PERM)

>  

> -/* { dg-final { scan-assembler-times {\tld1rh\tz[0-9]+\.h, } 2 } } */

> -/* { dg-final { scan-assembler-times {\tld1rw\tz[0-9]+\.s, } 3 } } */

> -/* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 3 } } */

> +/* { dg-final { scan-assembler-times {\tld1rh\tz[0-9]+\.h, } 2 { target aarch64_little_endian } } } */

> +/* { dg-final { scan-assembler-times {\tld1rqb\tz[0-9]+\.b, } 2 { target aarch64_big_endian } } } */

> +/* { dg-final { scan-assembler-times {\tld1rw\tz[0-9]+\.s, } 3 { target aarch64_little_endian } } } */

> +/* { dg-final { scan-assembler-times {\tld1rqh\tz[0-9]+\.h, } 3 { target aarch64_big_endian } } } */

> +/* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 3 { target aarch64_little_endian } } } */

> +/* { dg-final { scan-assembler-times {\tld1rqw\tz[0-9]+\.s, } 3 { target aarch64_big_endian } } } */

>  /* { dg-final { scan-assembler-times {\tld1rqd\tz[0-9]+\.d, } 3 } } */

>  /* { dg-final { scan-assembler-not {\tzip1\t} } } */

>  /* { dg-final { scan-assembler-not {\tzip2\t} } } */

> Index: gcc/testsuite/gcc.target/aarch64/sve/slp_3.c

> ===================================================================

> --- gcc/testsuite/gcc.target/aarch64/sve/slp_3.c	2018-01-26 13:49:00.071630173 +0000

> +++ gcc/testsuite/gcc.target/aarch64/sve/slp_3.c	2018-01-26 13:51:19.665760175 +0000

> @@ -32,9 +32,12 @@ #define TEST_ALL(T)				\

>  TEST_ALL (VEC_PERM)

>  

>  /* 1 for each 8-bit type.  */

> -/* { dg-final { scan-assembler-times {\tld1rw\tz[0-9]+\.s, } 2 } } */

> +/* { dg-final { scan-assembler-times {\tld1rw\tz[0-9]+\.s, } 2 { target aarch64_little_endian } } } */

> +/* { dg-final { scan-assembler-times {\tld1rqb\tz[0-9]+\.b, } 2 { target aarch64_big_endian } } } */

>  /* 1 for each 16-bit type and 4 for double.  */

> -/* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 7 } } */

> +/* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 7 { target aarch64_little_endian } } } */

> +/* { dg-final { scan-assembler-times {\tld1rqh\tz[0-9]+\.h, } 3 { target aarch64_big_endian } } } */

> +/* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 4 { target aarch64_big_endian } } } */

>  /* 1 for each 32-bit type.  */

>  /* { dg-final { scan-assembler-times {\tld1rqw\tz[0-9]+\.s, } 3 } } */

>  /* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, #41\n} 2 } } */

> Index: gcc/testsuite/gcc.target/aarch64/sve/slp_4.c

> ===================================================================

> --- gcc/testsuite/gcc.target/aarch64/sve/slp_4.c	2018-01-26 13:49:00.072630137 +0000

> +++ gcc/testsuite/gcc.target/aarch64/sve/slp_4.c	2018-01-26 13:51:19.665760175 +0000

> @@ -36,7 +36,9 @@ #define TEST_ALL(T)				\

>  TEST_ALL (VEC_PERM)

>  

>  /* 1 for each 8-bit type, 4 for each 32-bit type and 8 for double.  */

> -/* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 22 } } */

> +/* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 22 { target aarch64_little_endian } } } */

> +/* { dg-final { scan-assembler-times {\tld1rqb\tz[0-9]+\.b, } 2 { target aarch64_big_endian } } } */

> +/* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 20 { target aarch64_big_endian } } } */

>  /* 1 for each 16-bit type.  */

>  /* { dg-final { scan-assembler-times {\tld1rqh\tz[0-9]\.h, } 3 } } */

>  /* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, #99\n} 2 } } */

Index: gcc/config/aarch64/aarch64.c
===================================================================
--- gcc/config/aarch64/aarch64.c	2018-01-26 13:49:00.071630173 +0000
+++ gcc/config/aarch64/aarch64.c	2018-01-26 13:51:19.665760175 +0000
@@ -2824,10 +2824,18 @@  aarch64_expand_sve_const_vector (rtx des
       /* The constant is a repeating seqeuence of at least two elements,
 	 where the repeating elements occupy no more than 128 bits.
 	 Get an integer representation of the replicated value.  */
-      unsigned int int_bits = GET_MODE_UNIT_BITSIZE (mode) * npatterns;
-      gcc_assert (int_bits <= 128);
-
-      scalar_int_mode int_mode = int_mode_for_size (int_bits, 0).require ();
+      scalar_int_mode int_mode;
+      if (BYTES_BIG_ENDIAN)
+	/* For now, always use LD1RQ to load the value on big-endian
+	   targets, since the handling of smaller integers includes a
+	   subreg that is semantically an element reverse.  */
+	int_mode = TImode;
+      else
+	{
+	  unsigned int int_bits = GET_MODE_UNIT_BITSIZE (mode) * npatterns;
+	  gcc_assert (int_bits <= 128);
+	  int_mode = int_mode_for_size (int_bits, 0).require ();
+	}
       rtx int_value = simplify_gen_subreg (int_mode, src, mode, 0);
       if (int_value
 	  && aarch64_expand_sve_widened_duplicate (dest, int_mode, int_value))
Index: gcc/testsuite/gcc.target/aarch64/sve/slp_2.c
===================================================================
--- gcc/testsuite/gcc.target/aarch64/sve/slp_2.c	2018-01-26 13:49:00.071630173 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve/slp_2.c	2018-01-26 13:51:19.665760175 +0000
@@ -29,9 +29,12 @@  #define TEST_ALL(T)				\
 
 TEST_ALL (VEC_PERM)
 
-/* { dg-final { scan-assembler-times {\tld1rh\tz[0-9]+\.h, } 2 } } */
-/* { dg-final { scan-assembler-times {\tld1rw\tz[0-9]+\.s, } 3 } } */
-/* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 3 } } */
+/* { dg-final { scan-assembler-times {\tld1rh\tz[0-9]+\.h, } 2 { target aarch64_little_endian } } } */
+/* { dg-final { scan-assembler-times {\tld1rqb\tz[0-9]+\.b, } 2 { target aarch64_big_endian } } } */
+/* { dg-final { scan-assembler-times {\tld1rw\tz[0-9]+\.s, } 3 { target aarch64_little_endian } } } */
+/* { dg-final { scan-assembler-times {\tld1rqh\tz[0-9]+\.h, } 3 { target aarch64_big_endian } } } */
+/* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 3 { target aarch64_little_endian } } } */
+/* { dg-final { scan-assembler-times {\tld1rqw\tz[0-9]+\.s, } 3 { target aarch64_big_endian } } } */
 /* { dg-final { scan-assembler-times {\tld1rqd\tz[0-9]+\.d, } 3 } } */
 /* { dg-final { scan-assembler-not {\tzip1\t} } } */
 /* { dg-final { scan-assembler-not {\tzip2\t} } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/slp_3.c
===================================================================
--- gcc/testsuite/gcc.target/aarch64/sve/slp_3.c	2018-01-26 13:49:00.071630173 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve/slp_3.c	2018-01-26 13:51:19.665760175 +0000
@@ -32,9 +32,12 @@  #define TEST_ALL(T)				\
 TEST_ALL (VEC_PERM)
 
 /* 1 for each 8-bit type.  */
-/* { dg-final { scan-assembler-times {\tld1rw\tz[0-9]+\.s, } 2 } } */
+/* { dg-final { scan-assembler-times {\tld1rw\tz[0-9]+\.s, } 2 { target aarch64_little_endian } } } */
+/* { dg-final { scan-assembler-times {\tld1rqb\tz[0-9]+\.b, } 2 { target aarch64_big_endian } } } */
 /* 1 for each 16-bit type and 4 for double.  */
-/* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 7 } } */
+/* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 7 { target aarch64_little_endian } } } */
+/* { dg-final { scan-assembler-times {\tld1rqh\tz[0-9]+\.h, } 3 { target aarch64_big_endian } } } */
+/* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 4 { target aarch64_big_endian } } } */
 /* 1 for each 32-bit type.  */
 /* { dg-final { scan-assembler-times {\tld1rqw\tz[0-9]+\.s, } 3 } } */
 /* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, #41\n} 2 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/slp_4.c
===================================================================
--- gcc/testsuite/gcc.target/aarch64/sve/slp_4.c	2018-01-26 13:49:00.072630137 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve/slp_4.c	2018-01-26 13:51:19.665760175 +0000
@@ -36,7 +36,9 @@  #define TEST_ALL(T)				\
 TEST_ALL (VEC_PERM)
 
 /* 1 for each 8-bit type, 4 for each 32-bit type and 8 for double.  */
-/* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 22 } } */
+/* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 22 { target aarch64_little_endian } } } */
+/* { dg-final { scan-assembler-times {\tld1rqb\tz[0-9]+\.b, } 2 { target aarch64_big_endian } } } */
+/* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 20 { target aarch64_big_endian } } } */
 /* 1 for each 16-bit type.  */
 /* { dg-final { scan-assembler-times {\tld1rqh\tz[0-9]\.h, } 3 } } */
 /* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, #99\n} 2 } } */

[AArch64] Prefer LD1RQ for big-endian SVE

Commit Message

Comments

Patch