diff mbox

[ARM] PR target/71270 fix neon_valid_immediate for big-endian

Message ID 586F858B.5050209@foss.arm.com
State New
Headers show

Commit Message

Kyrill Tkachov Jan. 6, 2017, 11:54 a.m. UTC
Hi all,

In this wrong-code issue the RTL tries to load a const_vector:
(const_vector:V8QI [
         (const_int 1 [0x1])
         (const_int 0 [0])
         (const_int 1 [0x1])
         (const_int 0 [0])
         (const_int 1 [0x1])
         (const_int 0 [0])
         (const_int 1 [0x1])
         (const_int 0 [0])
     ])

into a NEON register. The logic for that is in neon_valid_immediate which does a number of tricks
to decide what value and of what size to do a VMOV on to load the correct vector immediate into the register.
It goes wrong on big-endian. On both big and little-endian this outputs:
vmov.i16    d18, #0x1

This is wrong on big-endian. The vector layout has to be such as if loaded from memory.
I've tried various approaches of fixing neon_valid_immediate to generate the correct immediate but have been unsuccessful,
resulting in regressions in various parts of the testsuite or making a big mess of the function.

Given that armeb is not a target of major concern I believe the safest route at this stage is to only allow vector constants
that will obviously work on big-endian, that is the ones that are just a single element duplicated in all lanes.

This patch fixes the execution failures:
FAIL: gfortran.dg/intrinsic_pack_1.f90
FAIL: gfortran.dg/c_f_pointer_logical.f03
FAIL: gcc.dg/torture/pr60183.c
FAIL: gcc.dg/vect/pr51581-4.c

on armeb-none-eabi.

Ok for trunk?

Thanks,
Kyrill

2017-01-06  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     PR target/71270
     * config/arm/arm.c (neon_valid_immediate): Reject vector constants
     in big-endian mode when they are not a single duplicated value.

Comments

Kyrill Tkachov Jan. 18, 2017, 9:35 a.m. UTC | #1
Ping.
https://gcc.gnu.org/ml/gcc-patches/2017-01/msg00381.html

Thanks,
Kyrill

On 06/01/17 11:54, Kyrill Tkachov wrote:
> Hi all,

>

> In this wrong-code issue the RTL tries to load a const_vector:

> (const_vector:V8QI [

>         (const_int 1 [0x1])

>         (const_int 0 [0])

>         (const_int 1 [0x1])

>         (const_int 0 [0])

>         (const_int 1 [0x1])

>         (const_int 0 [0])

>         (const_int 1 [0x1])

>         (const_int 0 [0])

>     ])

>

> into a NEON register. The logic for that is in neon_valid_immediate which does a number of tricks

> to decide what value and of what size to do a VMOV on to load the correct vector immediate into the register.

> It goes wrong on big-endian. On both big and little-endian this outputs:

> vmov.i16    d18, #0x1

>

> This is wrong on big-endian. The vector layout has to be such as if loaded from memory.

> I've tried various approaches of fixing neon_valid_immediate to generate the correct immediate but have been unsuccessful,

> resulting in regressions in various parts of the testsuite or making a big mess of the function.

>

> Given that armeb is not a target of major concern I believe the safest route at this stage is to only allow vector constants

> that will obviously work on big-endian, that is the ones that are just a single element duplicated in all lanes.

>

> This patch fixes the execution failures:

> FAIL: gfortran.dg/intrinsic_pack_1.f90

> FAIL: gfortran.dg/c_f_pointer_logical.f03

> FAIL: gcc.dg/torture/pr60183.c

> FAIL: gcc.dg/vect/pr51581-4.c

>

> on armeb-none-eabi.

>

> Ok for trunk?

>

> Thanks,

> Kyrill

>

> 2017-01-06  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

>

>     PR target/71270

>     * config/arm/arm.c (neon_valid_immediate): Reject vector constants

>     in big-endian mode when they are not a single duplicated value.
Richard Earnshaw (lists) Jan. 20, 2017, 2:33 p.m. UTC | #2
On 06/01/17 11:54, Kyrill Tkachov wrote:
> Hi all,

> 

> In this wrong-code issue the RTL tries to load a const_vector:

> (const_vector:V8QI [

>         (const_int 1 [0x1])

>         (const_int 0 [0])

>         (const_int 1 [0x1])

>         (const_int 0 [0])

>         (const_int 1 [0x1])

>         (const_int 0 [0])

>         (const_int 1 [0x1])

>         (const_int 0 [0])

>     ])

> 

> into a NEON register. The logic for that is in neon_valid_immediate

> which does a number of tricks

> to decide what value and of what size to do a VMOV on to load the

> correct vector immediate into the register.

> It goes wrong on big-endian. On both big and little-endian this outputs:

> vmov.i16    d18, #0x1

> 

> This is wrong on big-endian. The vector layout has to be such as if

> loaded from memory.

> I've tried various approaches of fixing neon_valid_immediate to generate

> the correct immediate but have been unsuccessful,

> resulting in regressions in various parts of the testsuite or making a

> big mess of the function.

> 

> Given that armeb is not a target of major concern I believe the safest

> route at this stage is to only allow vector constants

> that will obviously work on big-endian, that is the ones that are just a

> single element duplicated in all lanes.

> 

> This patch fixes the execution failures:

> FAIL: gfortran.dg/intrinsic_pack_1.f90

> FAIL: gfortran.dg/c_f_pointer_logical.f03

> FAIL: gcc.dg/torture/pr60183.c

> FAIL: gcc.dg/vect/pr51581-4.c

> 

> on armeb-none-eabi.

> 

> Ok for trunk?

> 

> Thanks,

> Kyrill

> 

> 2017-01-06  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

> 

>     PR target/71270

>     * config/arm/arm.c (neon_valid_immediate): Reject vector constants

>     in big-endian mode when they are not a single duplicated value.

> 


Ok, but if you plan to close the PR above, please create a new
'enhancement' PR to fix the missed optimization.

R.

> armeb-vec.patch

> 

> 

> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c

> index 45e0a3cd11fa650f456f7382ffbbcd1c932b95eb..2beee8ebe94eeadd6fb1d5b119e3b474e1ab902a 100644

> --- a/gcc/config/arm/arm.c

> +++ b/gcc/config/arm/arm.c

> @@ -11542,6 +11542,12 @@ neon_valid_immediate (rtx op, machine_mode mode, int inverse,

>  	return 18;

>      }

>  

> +  /* The tricks done in the code below apply for little-endian vector layout.

> +     For big-endian vectors only allow vectors of the form { a, a, a..., a }.

> +     FIXME: Implement logic for big-endian vectors.  */

> +  if (BYTES_BIG_ENDIAN && vector && !const_vec_duplicate_p (op))

> +    return -1;

> +

>    /* Splat vector constant out into a byte vector.  */

>    for (i = 0; i < n_elts; i++)

>      {

>
Kyrill Tkachov Jan. 20, 2017, 2:41 p.m. UTC | #3
On 20/01/17 14:33, Richard Earnshaw (lists) wrote:
> On 06/01/17 11:54, Kyrill Tkachov wrote:

>> Hi all,

>>

>> In this wrong-code issue the RTL tries to load a const_vector:

>> (const_vector:V8QI [

>>          (const_int 1 [0x1])

>>          (const_int 0 [0])

>>          (const_int 1 [0x1])

>>          (const_int 0 [0])

>>          (const_int 1 [0x1])

>>          (const_int 0 [0])

>>          (const_int 1 [0x1])

>>          (const_int 0 [0])

>>      ])

>>

>> into a NEON register. The logic for that is in neon_valid_immediate

>> which does a number of tricks

>> to decide what value and of what size to do a VMOV on to load the

>> correct vector immediate into the register.

>> It goes wrong on big-endian. On both big and little-endian this outputs:

>> vmov.i16    d18, #0x1

>>

>> This is wrong on big-endian. The vector layout has to be such as if

>> loaded from memory.

>> I've tried various approaches of fixing neon_valid_immediate to generate

>> the correct immediate but have been unsuccessful,

>> resulting in regressions in various parts of the testsuite or making a

>> big mess of the function.

>>

>> Given that armeb is not a target of major concern I believe the safest

>> route at this stage is to only allow vector constants

>> that will obviously work on big-endian, that is the ones that are just a

>> single element duplicated in all lanes.

>>

>> This patch fixes the execution failures:

>> FAIL: gfortran.dg/intrinsic_pack_1.f90

>> FAIL: gfortran.dg/c_f_pointer_logical.f03

>> FAIL: gcc.dg/torture/pr60183.c

>> FAIL: gcc.dg/vect/pr51581-4.c

>>

>> on armeb-none-eabi.

>>

>> Ok for trunk?

>>

>> Thanks,

>> Kyrill

>>

>> 2017-01-06  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

>>

>>      PR target/71270

>>      * config/arm/arm.c (neon_valid_immediate): Reject vector constants

>>      in big-endian mode when they are not a single duplicated value.

>>

> Ok, but if you plan to close the PR above, please create a new

> 'enhancement' PR to fix the missed optimization.


Thanks, I've committed it as r244716 and opened PR 79166 to track
the missed optimisation.

Kyrill

> R.

>

>> armeb-vec.patch

>>

>>

>> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c

>> index 45e0a3cd11fa650f456f7382ffbbcd1c932b95eb..2beee8ebe94eeadd6fb1d5b119e3b474e1ab902a 100644

>> --- a/gcc/config/arm/arm.c

>> +++ b/gcc/config/arm/arm.c

>> @@ -11542,6 +11542,12 @@ neon_valid_immediate (rtx op, machine_mode mode, int inverse,

>>   	return 18;

>>       }

>>   

>> +  /* The tricks done in the code below apply for little-endian vector layout.

>> +     For big-endian vectors only allow vectors of the form { a, a, a..., a }.

>> +     FIXME: Implement logic for big-endian vectors.  */

>> +  if (BYTES_BIG_ENDIAN && vector && !const_vec_duplicate_p (op))

>> +    return -1;

>> +

>>     /* Splat vector constant out into a byte vector.  */

>>     for (i = 0; i < n_elts; i++)

>>       {

>>
diff mbox

Patch

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 45e0a3cd11fa650f456f7382ffbbcd1c932b95eb..2beee8ebe94eeadd6fb1d5b119e3b474e1ab902a 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -11542,6 +11542,12 @@  neon_valid_immediate (rtx op, machine_mode mode, int inverse,
 	return 18;
     }
 
+  /* The tricks done in the code below apply for little-endian vector layout.
+     For big-endian vectors only allow vectors of the form { a, a, a..., a }.
+     FIXME: Implement logic for big-endian vectors.  */
+  if (BYTES_BIG_ENDIAN && vector && !const_vec_duplicate_p (op))
+    return -1;
+
   /* Splat vector constant out into a byte vector.  */
   for (i = 0; i < n_elts; i++)
     {