diff mbox

[ARM] Fix testsuite testcase neon-vcond-[ltgt,unordered].c

Message ID 52696EFA.9060003@linaro.org
State Superseded
Headers show

Commit Message

Kugan Vivekanandarajah Oct. 24, 2013, 7:03 p.m. UTC
> I can't seem to get it to fail on my checkout of the linaro 4.8 branch.
> I tried both arm-none-eabi and arm-none-linux-gnueabihf. What kind of
> options/configuration are needed to reproduce this? Also, what kind of
> assembly is produced when the testcase fails? It'd be nice to make sure
> that the allocator doesn't end up doing something sub-optimal and
> unnecessarily moving stuff around to satisfy the alternative constraints
> that produce the other bit-select variants.
> 

Hi Kyrill,

It happens for armv5te arm-none-linux-gnueabi. --with-mode=arm
--with-arch=armv5te --with-float=soft

You can also find the logs here in
http://cbuild.validation.linaro.org/build/gcc-linaro-4.8-2013.10/logs/armv7l-precise-cbuild461-calxeda02_21_00_precise_armel-armv5r2/

I changed neon-vcond-gt.c too.

Thanks,
Kugan

2013-10-23  Kugan Vivekanandarajah  <kuganv@linaro.org>

	* gcc.target/arm/neon-vcond-gt.c: Scan for vbsl or vbit or vbif.
	* gcc.target/arm/neon-vcond-ltgt.c: Scan for vbsl or vbit or vbif.
	* gcc.target/arm/neon-vcond-unordered.c: Scan for vbsl or vbit or vbif.

Comments

Kyrylo Tkachov Oct. 25, 2013, 8:34 a.m. UTC | #1
On 24/10/13 20:03, Kugan wrote:
>
> Hi Kyrill,
>
> It happens for armv5te arm-none-linux-gnueabi. --with-mode=arm
> --with-arch=armv5te --with-float=soft

Ah ok, I can reproduce it now. So, while I agree that we add a scan for vbit and 
vbif to these testcases, there seems to be something dodgy going on with the 
register allocation.

With -march=armv5te I'm getting the following snippet of code in the ltgt case:

.L12:
         ldr     r4, [ip]
         ldr     r5, [ip, #4]
         ldr     r6, [ip, #8]
         ldr     r7, [ip, #12]
         vmov    d20, r4, r5  @ v4sf
         vmov    d21, r6, r7
         vcgt.f32        q8, q10, q9
         vcgt.f32        q10, q9, q10
         vorr    q8, q8, q10
         vmov    d22, r4, r5  @ v4sf
         vmov    d23, r6, r7
         vbit    q11, q9, q8
         vmov    r4, r5, d22  @ v4sf
         vmov    r6, r7, d23

The second vcgt.f32 trashes q10, then recreates it in q11 with:
vmov    d22, r4, r5  @ v4sf
vmov    d23, r6, r7

so it can do the vbit. Surely there's something better that can be done?

In contrast, with -march=armv7-a we get:

.L12:
         vld1.32 {q9}, [r4]!
         vcgt.f32        q8, q9, q10
         vcgt.f32        q11, q10, q9
         vorr    q8, q8, q11
         vbsl    q8, q10, q9
         vst1.32 {q8}, [lr]!



So, while I agree with the patch, there seems to be some funny business with the 
register allocation that could be worth looking into.

Thanks,
Kyrill

>
> You can also find the logs here in
> http://cbuild.validation.linaro.org/build/gcc-linaro-4.8-2013.10/logs/armv7l-precise-cbuild461-calxeda02_21_00_precise_armel-armv5r2/
>
> I changed neon-vcond-gt.c too.
>
> Thanks,
> Kugan
>
> 2013-10-23  Kugan Vivekanandarajah  <kuganv@linaro.org>
>
> 	* gcc.target/arm/neon-vcond-gt.c: Scan for vbsl or vbit or vbif.
> 	* gcc.target/arm/neon-vcond-ltgt.c: Scan for vbsl or vbit or vbif.
> 	* gcc.target/arm/neon-vcond-unordered.c: Scan for vbsl or vbit or vbif.
>
>
>
>
Kugan Vivekanandarajah Oct. 28, 2013, 11:07 p.m. UTC | #2
On 25/10/13 19:04, Kyrill Tkachov wrote:
> On 24/10/13 20:03, Kugan wrote:
>>
>> Hi Kyrill,
>>
>> It happens for armv5te arm-none-linux-gnueabi. --with-mode=arm
>> --with-arch=armv5te --with-float=soft
> 
> Ah ok, I can reproduce it now. So, while I agree that we add a scan for
> vbit and vbif to these testcases, there seems to be something dodgy
> going on with the register allocation.
> 
> With -march=armv5te I'm getting the following snippet of code in the
> ltgt case:
> 
> .L12:
>         ldr     r4, [ip]
>         ldr     r5, [ip, #4]
>         ldr     r6, [ip, #8]
>         ldr     r7, [ip, #12]
>         vmov    d20, r4, r5  @ v4sf
>         vmov    d21, r6, r7
>         vcgt.f32        q8, q10, q9
>         vcgt.f32        q10, q9, q10
>         vorr    q8, q8, q10
>         vmov    d22, r4, r5  @ v4sf
>         vmov    d23, r6, r7
>         vbit    q11, q9, q8
>         vmov    r4, r5, d22  @ v4sf
>         vmov    r6, r7, d23
> 
> The second vcgt.f32 trashes q10, then recreates it in q11 with:
> vmov    d22, r4, r5  @ v4sf
> vmov    d23, r6, r7
> 
> so it can do the vbit. Surely there's something better that can be done?
> 
> In contrast, with -march=armv7-a we get:
> 
> .L12:
>         vld1.32 {q9}, [r4]!
>         vcgt.f32        q8, q9, q10
>         vcgt.f32        q11, q10, q9
>         vorr    q8, q8, q11
>         vbsl    q8, q10, q9
>         vst1.32 {q8}, [lr]!
> 

This is because  of the unaligned access done for armv7-a. arm.c has the
following comment:

  /* Enable -munaligned-access by default for
     - all ARMv6 architecture-based processors
     - ARMv7-A, ARMv7-R, and ARMv7-M architecture-based processors.
     - ARMv8 architecture-base processors.

     Disable -munaligned-access by default for
     - all pre-ARMv6 architecture-based processors
     - ARMv6-M architecture-based processors.  */

Please look at the rtl difference.
- is armv7-a
+ is armv5te

;; vect_var_.18_61 = MEM[(float *)vect_pw2.14_59];

-(insn 71 70 72 (set (reg:V4SF 192)
-        (unspec:V4SF [
-                (mem:V4SF (reg:SI 163 [ ivtmp.47 ]) [0 MEM[(float
*)vect_pw2.14_59]+0 S16 A32])
-            ] UNSPEC_MISALIGNED_ACCESS)) neon-vcond-ltgt.c:12 -1
+(insn 71 70 72 (clobber (reg:V4SF 168 [ vect_var_.18 ]))
neon-vcond-ltgt.c:12 -1
+     (nil))
+
+(insn 72 71 73 (set (subreg:SI (reg:V4SF 168 [ vect_var_.18 ]) 0)
+        (mem:SI (reg:SI 163 [ ivtmp.47 ]) [0 MEM[(float
*)vect_pw2.14_59]+0 S4 A32])) neon-vcond-ltgt.c:12 -1
+     (nil))
+
+(insn 73 72 74 (set (subreg:SI (reg:V4SF 168 [ vect_var_.18 ]) 4)
+        (mem:SI (plus:SI (reg:SI 163 [ ivtmp.47 ])
+                (const_int 4 [0x4])) [0 MEM[(float *)vect_pw2.14_59]+4
S4 A32])) neon-vcond-ltgt.c:12 -1
+     (nil))
+
+(insn 74 73 75 (set (subreg:SI (reg:V4SF 168 [ vect_var_.18 ]) 8)
+        (mem:SI (plus:SI (reg:SI 163 [ ivtmp.47 ])
+                (const_int 8 [0x8])) [0 MEM[(float *)vect_pw2.14_59]+8
S4 A32])) neon-vcond-ltgt.c:12 -1
      (nil))

-(insn 72 71 0 (set (reg:V4SF 168 [ vect_var_.18 ])
-        (reg:V4SF 192)) neon-vcond-ltgt.c:12 -1
+(insn 75 74 0 (set (subreg:SI (reg:V4SF 168 [ vect_var_.18 ]) 12)
+        (mem:SI (plus:SI (reg:SI 163 [ ivtmp.47 ])
+                (const_int 12 [0xc])) [0 MEM[(float
*)vect_pw2.14_59]+12 S4 A32])) neon-vcond-ltgt.c:12 -1
      (nil))
diff mbox

Patch

diff --git a/gcc/testsuite/gcc.target/arm/neon-vcond-gt.c b/gcc/testsuite/gcc.target/arm/neon-vcond-gt.c
index 86ccf95..8e9f378 100644
--- a/gcc/testsuite/gcc.target/arm/neon-vcond-gt.c
+++ b/gcc/testsuite/gcc.target/arm/neon-vcond-gt.c
@@ -14,4 +14,4 @@  void foo (int ilast,float* w, float* w2)
 }
 
 /* { dg-final { scan-assembler "vcgt\\.f32\[\\t \]*q\[0-9\]+,\[\\t \]*q\[0-9\]+,\[\\t \]*q\[0-9\]+" } } */
-/* { dg-final { scan-assembler "vbit\[\\t \]*q\[0-9\]+,\[\\t \]*q\[0-9\]+,\[\\t \]*q\[0-9\]+" } } */
+/* { dg-final { scan-assembler "vbsl|vbit|vbif\[\\t \]*q\[0-9\]+,\[\\t \]*q\[0-9\]+,\[\\t \]*q\[0-9\]+" } } */
diff --git a/gcc/testsuite/gcc.target/arm/neon-vcond-ltgt.c b/gcc/testsuite/gcc.target/arm/neon-vcond-ltgt.c
index acb23a9..c8306e3 100644
--- a/gcc/testsuite/gcc.target/arm/neon-vcond-ltgt.c
+++ b/gcc/testsuite/gcc.target/arm/neon-vcond-ltgt.c
@@ -15,4 +15,4 @@  void foo (int ilast,float* w, float* w2)
 
 /* { dg-final { scan-assembler-times "vcgt\\.f32\[\\t \]*q\[0-9\]+,\[\\t \]*q\[0-9\]+,\[\\t \]*q\[0-9\]+" 2 } } */
 /* { dg-final { scan-assembler "vorr\[\\t \]*q\[0-9\]+,\[\\t \]*q\[0-9\]+,\[\\t \]*q\[0-9\]+" } } */
-/* { dg-final { scan-assembler "vbsl\[\\t \]*q\[0-9\]+,\[\\t \]*q\[0-9\]+,\[\\t \]*q\[0-9\]+" } } */
+/* { dg-final { scan-assembler "vbsl|vbit|vbif\[\\t \]*q\[0-9\]+,\[\\t \]*q\[0-9\]+,\[\\t \]*q\[0-9\]+" } } */
diff --git a/gcc/testsuite/gcc.target/arm/neon-vcond-unordered.c b/gcc/testsuite/gcc.target/arm/neon-vcond-unordered.c
index c3e448d..3bb67d3 100644
--- a/gcc/testsuite/gcc.target/arm/neon-vcond-unordered.c
+++ b/gcc/testsuite/gcc.target/arm/neon-vcond-unordered.c
@@ -16,4 +16,4 @@  void foo (int ilast,float* w, float* w2)
 /* { dg-final { scan-assembler "vcgt\\.f32\[\\t \]*q\[0-9\]+,\[\\t \]*q\[0-9\]+,\[\\t \]*q\[0-9\]+" } } */
 /* { dg-final { scan-assembler "vcge\\.f32\[\\t \]*q\[0-9\]+,\[\\t \]*q\[0-9\]+,\[\\t \]*q\[0-9\]+" } } */
 /* { dg-final { scan-assembler "vorr\[\\t \]*q\[0-9\]+,\[\\t \]*q\[0-9\]+,\[\\t \]*q\[0-9\]+" } } */
-/* { dg-final { scan-assembler "vbsl\[\\t \]*q\[0-9\]+,\[\\t \]*q\[0-9\]+,\[\\t \]*q\[0-9\]+" } } */
+/* { dg-final { scan-assembler "vbsl|vbit|vbif\[\\t \]*q\[0-9\]+,\[\\t \]*q\[0-9\]+,\[\\t \]*q\[0-9\]+" } } */