Message ID | 87tvv8ohct.fsf@linaro.org |
---|---|
State | Accepted |
Commit | 8179efe00e04285184112de7dbb977a75852197c |
Headers | show |
Series | [AArch64] Prefer LD1RQ for big-endian SVE | expand |
On Fri, Jan 26, 2018 at 01:54:42PM +0000, Richard Sandiford wrote: > This patch deals with cases in which a CONST_VECTOR contains a > repeating bit pattern that is wider than one element but narrower > than 128 bits. The current code: > > * treats the repeating pattern as a single element > * uses the associated LD1R to load and replicate it (such as LD1RD > for 64-bit patterns) > * uses a subreg to cast the result back to the original vector type > > The problem is that for big-endian targets, the final cast is > effectively a form of element reverse. E.g. say we're using LD1RD to load > 16-bit elements, with h being the high parts and l being the low parts: > > +-----+-----+-----+-----+-----+---- > lanes | 0 | 1 | 2 | 3 | 4 | ... > +-----+-----+-----+-----+-----+---- > memory bytes |h0 l0 h1 l1 h2 l2 h3 l3 h0 l0 .... > +---------------------------------- > V V V V V V V V > ----------+-----------------------+ > register .... | 0 | > after ----------+-----------------------+ lsb > LD1RD .... h3 l3 h0 l0 h1 l1 h2 l2 h3 l3| > ----------------------------------+ > > ----+-----+-----+-----+-----+-----+ > expected ... | 4 | 3 | 2 | 1 | 0 | > register ----+-----+-----+-----+-----+-----+ lsb > contents .... h0 l0 h3 l3 h2 l2 h1 l1 h0 l0| > ----------------------------------+ > > A later patch fixes the handling of general subregs to account > for this, but it means that we need to do a REV instruction > after the load. It seems better to use LD1RQ[BHW] on a 128-bit > pattern instead, since that gets the endianness right without > a separate fixup instruction. > > This is another step towards fixing sve/slp_* for aarch64_be. > > Tested on aarch64_be-elf and aarch64-linux-gnu. OK to install? OK. Thanks, James > > 2018-01-26 Richard Sandiford <richard.sandiford@linaro.org> > > gcc/ > * config/aarch64/aarch64.c (aarch64_expand_sve_const_vector): Prefer > the TImode handling for big-endian targets. > > gcc/testsuite/ > * gcc.target/aarch64/sve/slp_2.c: Expect LD1RQ to be used instead > of LD1R[HWD] for multi-element constants on big-endian targets. > * gcc.target/aarch64/sve/slp_3.c: Likewise. > * gcc.target/aarch64/sve/slp_4.c: Likewise. > > Index: gcc/config/aarch64/aarch64.c > =================================================================== > --- gcc/config/aarch64/aarch64.c 2018-01-26 13:49:00.071630173 +0000 > +++ gcc/config/aarch64/aarch64.c 2018-01-26 13:51:19.665760175 +0000 > @@ -2824,10 +2824,18 @@ aarch64_expand_sve_const_vector (rtx des > /* The constant is a repeating seqeuence of at least two elements, > where the repeating elements occupy no more than 128 bits. > Get an integer representation of the replicated value. */ > - unsigned int int_bits = GET_MODE_UNIT_BITSIZE (mode) * npatterns; > - gcc_assert (int_bits <= 128); > - > - scalar_int_mode int_mode = int_mode_for_size (int_bits, 0).require (); > + scalar_int_mode int_mode; > + if (BYTES_BIG_ENDIAN) > + /* For now, always use LD1RQ to load the value on big-endian > + targets, since the handling of smaller integers includes a > + subreg that is semantically an element reverse. */ > + int_mode = TImode; > + else > + { > + unsigned int int_bits = GET_MODE_UNIT_BITSIZE (mode) * npatterns; > + gcc_assert (int_bits <= 128); > + int_mode = int_mode_for_size (int_bits, 0).require (); > + } > rtx int_value = simplify_gen_subreg (int_mode, src, mode, 0); > if (int_value > && aarch64_expand_sve_widened_duplicate (dest, int_mode, int_value)) > Index: gcc/testsuite/gcc.target/aarch64/sve/slp_2.c > =================================================================== > --- gcc/testsuite/gcc.target/aarch64/sve/slp_2.c 2018-01-26 13:49:00.071630173 +0000 > +++ gcc/testsuite/gcc.target/aarch64/sve/slp_2.c 2018-01-26 13:51:19.665760175 +0000 > @@ -29,9 +29,12 @@ #define TEST_ALL(T) \ > > TEST_ALL (VEC_PERM) > > -/* { dg-final { scan-assembler-times {\tld1rh\tz[0-9]+\.h, } 2 } } */ > -/* { dg-final { scan-assembler-times {\tld1rw\tz[0-9]+\.s, } 3 } } */ > -/* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 3 } } */ > +/* { dg-final { scan-assembler-times {\tld1rh\tz[0-9]+\.h, } 2 { target aarch64_little_endian } } } */ > +/* { dg-final { scan-assembler-times {\tld1rqb\tz[0-9]+\.b, } 2 { target aarch64_big_endian } } } */ > +/* { dg-final { scan-assembler-times {\tld1rw\tz[0-9]+\.s, } 3 { target aarch64_little_endian } } } */ > +/* { dg-final { scan-assembler-times {\tld1rqh\tz[0-9]+\.h, } 3 { target aarch64_big_endian } } } */ > +/* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 3 { target aarch64_little_endian } } } */ > +/* { dg-final { scan-assembler-times {\tld1rqw\tz[0-9]+\.s, } 3 { target aarch64_big_endian } } } */ > /* { dg-final { scan-assembler-times {\tld1rqd\tz[0-9]+\.d, } 3 } } */ > /* { dg-final { scan-assembler-not {\tzip1\t} } } */ > /* { dg-final { scan-assembler-not {\tzip2\t} } } */ > Index: gcc/testsuite/gcc.target/aarch64/sve/slp_3.c > =================================================================== > --- gcc/testsuite/gcc.target/aarch64/sve/slp_3.c 2018-01-26 13:49:00.071630173 +0000 > +++ gcc/testsuite/gcc.target/aarch64/sve/slp_3.c 2018-01-26 13:51:19.665760175 +0000 > @@ -32,9 +32,12 @@ #define TEST_ALL(T) \ > TEST_ALL (VEC_PERM) > > /* 1 for each 8-bit type. */ > -/* { dg-final { scan-assembler-times {\tld1rw\tz[0-9]+\.s, } 2 } } */ > +/* { dg-final { scan-assembler-times {\tld1rw\tz[0-9]+\.s, } 2 { target aarch64_little_endian } } } */ > +/* { dg-final { scan-assembler-times {\tld1rqb\tz[0-9]+\.b, } 2 { target aarch64_big_endian } } } */ > /* 1 for each 16-bit type and 4 for double. */ > -/* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 7 } } */ > +/* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 7 { target aarch64_little_endian } } } */ > +/* { dg-final { scan-assembler-times {\tld1rqh\tz[0-9]+\.h, } 3 { target aarch64_big_endian } } } */ > +/* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 4 { target aarch64_big_endian } } } */ > /* 1 for each 32-bit type. */ > /* { dg-final { scan-assembler-times {\tld1rqw\tz[0-9]+\.s, } 3 } } */ > /* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, #41\n} 2 } } */ > Index: gcc/testsuite/gcc.target/aarch64/sve/slp_4.c > =================================================================== > --- gcc/testsuite/gcc.target/aarch64/sve/slp_4.c 2018-01-26 13:49:00.072630137 +0000 > +++ gcc/testsuite/gcc.target/aarch64/sve/slp_4.c 2018-01-26 13:51:19.665760175 +0000 > @@ -36,7 +36,9 @@ #define TEST_ALL(T) \ > TEST_ALL (VEC_PERM) > > /* 1 for each 8-bit type, 4 for each 32-bit type and 8 for double. */ > -/* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 22 } } */ > +/* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 22 { target aarch64_little_endian } } } */ > +/* { dg-final { scan-assembler-times {\tld1rqb\tz[0-9]+\.b, } 2 { target aarch64_big_endian } } } */ > +/* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 20 { target aarch64_big_endian } } } */ > /* 1 for each 16-bit type. */ > /* { dg-final { scan-assembler-times {\tld1rqh\tz[0-9]\.h, } 3 } } */ > /* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, #99\n} 2 } } */
Index: gcc/config/aarch64/aarch64.c =================================================================== --- gcc/config/aarch64/aarch64.c 2018-01-26 13:49:00.071630173 +0000 +++ gcc/config/aarch64/aarch64.c 2018-01-26 13:51:19.665760175 +0000 @@ -2824,10 +2824,18 @@ aarch64_expand_sve_const_vector (rtx des /* The constant is a repeating seqeuence of at least two elements, where the repeating elements occupy no more than 128 bits. Get an integer representation of the replicated value. */ - unsigned int int_bits = GET_MODE_UNIT_BITSIZE (mode) * npatterns; - gcc_assert (int_bits <= 128); - - scalar_int_mode int_mode = int_mode_for_size (int_bits, 0).require (); + scalar_int_mode int_mode; + if (BYTES_BIG_ENDIAN) + /* For now, always use LD1RQ to load the value on big-endian + targets, since the handling of smaller integers includes a + subreg that is semantically an element reverse. */ + int_mode = TImode; + else + { + unsigned int int_bits = GET_MODE_UNIT_BITSIZE (mode) * npatterns; + gcc_assert (int_bits <= 128); + int_mode = int_mode_for_size (int_bits, 0).require (); + } rtx int_value = simplify_gen_subreg (int_mode, src, mode, 0); if (int_value && aarch64_expand_sve_widened_duplicate (dest, int_mode, int_value)) Index: gcc/testsuite/gcc.target/aarch64/sve/slp_2.c =================================================================== --- gcc/testsuite/gcc.target/aarch64/sve/slp_2.c 2018-01-26 13:49:00.071630173 +0000 +++ gcc/testsuite/gcc.target/aarch64/sve/slp_2.c 2018-01-26 13:51:19.665760175 +0000 @@ -29,9 +29,12 @@ #define TEST_ALL(T) \ TEST_ALL (VEC_PERM) -/* { dg-final { scan-assembler-times {\tld1rh\tz[0-9]+\.h, } 2 } } */ -/* { dg-final { scan-assembler-times {\tld1rw\tz[0-9]+\.s, } 3 } } */ -/* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 3 } } */ +/* { dg-final { scan-assembler-times {\tld1rh\tz[0-9]+\.h, } 2 { target aarch64_little_endian } } } */ +/* { dg-final { scan-assembler-times {\tld1rqb\tz[0-9]+\.b, } 2 { target aarch64_big_endian } } } */ +/* { dg-final { scan-assembler-times {\tld1rw\tz[0-9]+\.s, } 3 { target aarch64_little_endian } } } */ +/* { dg-final { scan-assembler-times {\tld1rqh\tz[0-9]+\.h, } 3 { target aarch64_big_endian } } } */ +/* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 3 { target aarch64_little_endian } } } */ +/* { dg-final { scan-assembler-times {\tld1rqw\tz[0-9]+\.s, } 3 { target aarch64_big_endian } } } */ /* { dg-final { scan-assembler-times {\tld1rqd\tz[0-9]+\.d, } 3 } } */ /* { dg-final { scan-assembler-not {\tzip1\t} } } */ /* { dg-final { scan-assembler-not {\tzip2\t} } } */ Index: gcc/testsuite/gcc.target/aarch64/sve/slp_3.c =================================================================== --- gcc/testsuite/gcc.target/aarch64/sve/slp_3.c 2018-01-26 13:49:00.071630173 +0000 +++ gcc/testsuite/gcc.target/aarch64/sve/slp_3.c 2018-01-26 13:51:19.665760175 +0000 @@ -32,9 +32,12 @@ #define TEST_ALL(T) \ TEST_ALL (VEC_PERM) /* 1 for each 8-bit type. */ -/* { dg-final { scan-assembler-times {\tld1rw\tz[0-9]+\.s, } 2 } } */ +/* { dg-final { scan-assembler-times {\tld1rw\tz[0-9]+\.s, } 2 { target aarch64_little_endian } } } */ +/* { dg-final { scan-assembler-times {\tld1rqb\tz[0-9]+\.b, } 2 { target aarch64_big_endian } } } */ /* 1 for each 16-bit type and 4 for double. */ -/* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 7 } } */ +/* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 7 { target aarch64_little_endian } } } */ +/* { dg-final { scan-assembler-times {\tld1rqh\tz[0-9]+\.h, } 3 { target aarch64_big_endian } } } */ +/* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 4 { target aarch64_big_endian } } } */ /* 1 for each 32-bit type. */ /* { dg-final { scan-assembler-times {\tld1rqw\tz[0-9]+\.s, } 3 } } */ /* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, #41\n} 2 } } */ Index: gcc/testsuite/gcc.target/aarch64/sve/slp_4.c =================================================================== --- gcc/testsuite/gcc.target/aarch64/sve/slp_4.c 2018-01-26 13:49:00.072630137 +0000 +++ gcc/testsuite/gcc.target/aarch64/sve/slp_4.c 2018-01-26 13:51:19.665760175 +0000 @@ -36,7 +36,9 @@ #define TEST_ALL(T) \ TEST_ALL (VEC_PERM) /* 1 for each 8-bit type, 4 for each 32-bit type and 8 for double. */ -/* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 22 } } */ +/* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 22 { target aarch64_little_endian } } } */ +/* { dg-final { scan-assembler-times {\tld1rqb\tz[0-9]+\.b, } 2 { target aarch64_big_endian } } } */ +/* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 20 { target aarch64_big_endian } } } */ /* 1 for each 16-bit type. */ /* { dg-final { scan-assembler-times {\tld1rqh\tz[0-9]\.h, } 3 } } */ /* { dg-final { scan-assembler-times {\tmov\tz[0-9]+\.d, #99\n} 2 } } */