Message ID | 52696EFA.9060003@linaro.org |
---|---|
State | Superseded |
Headers | show |
On 24/10/13 20:03, Kugan wrote: > > Hi Kyrill, > > It happens for armv5te arm-none-linux-gnueabi. --with-mode=arm > --with-arch=armv5te --with-float=soft Ah ok, I can reproduce it now. So, while I agree that we add a scan for vbit and vbif to these testcases, there seems to be something dodgy going on with the register allocation. With -march=armv5te I'm getting the following snippet of code in the ltgt case: .L12: ldr r4, [ip] ldr r5, [ip, #4] ldr r6, [ip, #8] ldr r7, [ip, #12] vmov d20, r4, r5 @ v4sf vmov d21, r6, r7 vcgt.f32 q8, q10, q9 vcgt.f32 q10, q9, q10 vorr q8, q8, q10 vmov d22, r4, r5 @ v4sf vmov d23, r6, r7 vbit q11, q9, q8 vmov r4, r5, d22 @ v4sf vmov r6, r7, d23 The second vcgt.f32 trashes q10, then recreates it in q11 with: vmov d22, r4, r5 @ v4sf vmov d23, r6, r7 so it can do the vbit. Surely there's something better that can be done? In contrast, with -march=armv7-a we get: .L12: vld1.32 {q9}, [r4]! vcgt.f32 q8, q9, q10 vcgt.f32 q11, q10, q9 vorr q8, q8, q11 vbsl q8, q10, q9 vst1.32 {q8}, [lr]! So, while I agree with the patch, there seems to be some funny business with the register allocation that could be worth looking into. Thanks, Kyrill > > You can also find the logs here in > http://cbuild.validation.linaro.org/build/gcc-linaro-4.8-2013.10/logs/armv7l-precise-cbuild461-calxeda02_21_00_precise_armel-armv5r2/ > > I changed neon-vcond-gt.c too. > > Thanks, > Kugan > > 2013-10-23 Kugan Vivekanandarajah <kuganv@linaro.org> > > * gcc.target/arm/neon-vcond-gt.c: Scan for vbsl or vbit or vbif. > * gcc.target/arm/neon-vcond-ltgt.c: Scan for vbsl or vbit or vbif. > * gcc.target/arm/neon-vcond-unordered.c: Scan for vbsl or vbit or vbif. > > > >
On 25/10/13 19:04, Kyrill Tkachov wrote: > On 24/10/13 20:03, Kugan wrote: >> >> Hi Kyrill, >> >> It happens for armv5te arm-none-linux-gnueabi. --with-mode=arm >> --with-arch=armv5te --with-float=soft > > Ah ok, I can reproduce it now. So, while I agree that we add a scan for > vbit and vbif to these testcases, there seems to be something dodgy > going on with the register allocation. > > With -march=armv5te I'm getting the following snippet of code in the > ltgt case: > > .L12: > ldr r4, [ip] > ldr r5, [ip, #4] > ldr r6, [ip, #8] > ldr r7, [ip, #12] > vmov d20, r4, r5 @ v4sf > vmov d21, r6, r7 > vcgt.f32 q8, q10, q9 > vcgt.f32 q10, q9, q10 > vorr q8, q8, q10 > vmov d22, r4, r5 @ v4sf > vmov d23, r6, r7 > vbit q11, q9, q8 > vmov r4, r5, d22 @ v4sf > vmov r6, r7, d23 > > The second vcgt.f32 trashes q10, then recreates it in q11 with: > vmov d22, r4, r5 @ v4sf > vmov d23, r6, r7 > > so it can do the vbit. Surely there's something better that can be done? > > In contrast, with -march=armv7-a we get: > > .L12: > vld1.32 {q9}, [r4]! > vcgt.f32 q8, q9, q10 > vcgt.f32 q11, q10, q9 > vorr q8, q8, q11 > vbsl q8, q10, q9 > vst1.32 {q8}, [lr]! > This is because of the unaligned access done for armv7-a. arm.c has the following comment: /* Enable -munaligned-access by default for - all ARMv6 architecture-based processors - ARMv7-A, ARMv7-R, and ARMv7-M architecture-based processors. - ARMv8 architecture-base processors. Disable -munaligned-access by default for - all pre-ARMv6 architecture-based processors - ARMv6-M architecture-based processors. */ Please look at the rtl difference. - is armv7-a + is armv5te ;; vect_var_.18_61 = MEM[(float *)vect_pw2.14_59]; -(insn 71 70 72 (set (reg:V4SF 192) - (unspec:V4SF [ - (mem:V4SF (reg:SI 163 [ ivtmp.47 ]) [0 MEM[(float *)vect_pw2.14_59]+0 S16 A32]) - ] UNSPEC_MISALIGNED_ACCESS)) neon-vcond-ltgt.c:12 -1 +(insn 71 70 72 (clobber (reg:V4SF 168 [ vect_var_.18 ])) neon-vcond-ltgt.c:12 -1 + (nil)) + +(insn 72 71 73 (set (subreg:SI (reg:V4SF 168 [ vect_var_.18 ]) 0) + (mem:SI (reg:SI 163 [ ivtmp.47 ]) [0 MEM[(float *)vect_pw2.14_59]+0 S4 A32])) neon-vcond-ltgt.c:12 -1 + (nil)) + +(insn 73 72 74 (set (subreg:SI (reg:V4SF 168 [ vect_var_.18 ]) 4) + (mem:SI (plus:SI (reg:SI 163 [ ivtmp.47 ]) + (const_int 4 [0x4])) [0 MEM[(float *)vect_pw2.14_59]+4 S4 A32])) neon-vcond-ltgt.c:12 -1 + (nil)) + +(insn 74 73 75 (set (subreg:SI (reg:V4SF 168 [ vect_var_.18 ]) 8) + (mem:SI (plus:SI (reg:SI 163 [ ivtmp.47 ]) + (const_int 8 [0x8])) [0 MEM[(float *)vect_pw2.14_59]+8 S4 A32])) neon-vcond-ltgt.c:12 -1 (nil)) -(insn 72 71 0 (set (reg:V4SF 168 [ vect_var_.18 ]) - (reg:V4SF 192)) neon-vcond-ltgt.c:12 -1 +(insn 75 74 0 (set (subreg:SI (reg:V4SF 168 [ vect_var_.18 ]) 12) + (mem:SI (plus:SI (reg:SI 163 [ ivtmp.47 ]) + (const_int 12 [0xc])) [0 MEM[(float *)vect_pw2.14_59]+12 S4 A32])) neon-vcond-ltgt.c:12 -1 (nil))
diff --git a/gcc/testsuite/gcc.target/arm/neon-vcond-gt.c b/gcc/testsuite/gcc.target/arm/neon-vcond-gt.c index 86ccf95..8e9f378 100644 --- a/gcc/testsuite/gcc.target/arm/neon-vcond-gt.c +++ b/gcc/testsuite/gcc.target/arm/neon-vcond-gt.c @@ -14,4 +14,4 @@ void foo (int ilast,float* w, float* w2) } /* { dg-final { scan-assembler "vcgt\\.f32\[\\t \]*q\[0-9\]+,\[\\t \]*q\[0-9\]+,\[\\t \]*q\[0-9\]+" } } */ -/* { dg-final { scan-assembler "vbit\[\\t \]*q\[0-9\]+,\[\\t \]*q\[0-9\]+,\[\\t \]*q\[0-9\]+" } } */ +/* { dg-final { scan-assembler "vbsl|vbit|vbif\[\\t \]*q\[0-9\]+,\[\\t \]*q\[0-9\]+,\[\\t \]*q\[0-9\]+" } } */ diff --git a/gcc/testsuite/gcc.target/arm/neon-vcond-ltgt.c b/gcc/testsuite/gcc.target/arm/neon-vcond-ltgt.c index acb23a9..c8306e3 100644 --- a/gcc/testsuite/gcc.target/arm/neon-vcond-ltgt.c +++ b/gcc/testsuite/gcc.target/arm/neon-vcond-ltgt.c @@ -15,4 +15,4 @@ void foo (int ilast,float* w, float* w2) /* { dg-final { scan-assembler-times "vcgt\\.f32\[\\t \]*q\[0-9\]+,\[\\t \]*q\[0-9\]+,\[\\t \]*q\[0-9\]+" 2 } } */ /* { dg-final { scan-assembler "vorr\[\\t \]*q\[0-9\]+,\[\\t \]*q\[0-9\]+,\[\\t \]*q\[0-9\]+" } } */ -/* { dg-final { scan-assembler "vbsl\[\\t \]*q\[0-9\]+,\[\\t \]*q\[0-9\]+,\[\\t \]*q\[0-9\]+" } } */ +/* { dg-final { scan-assembler "vbsl|vbit|vbif\[\\t \]*q\[0-9\]+,\[\\t \]*q\[0-9\]+,\[\\t \]*q\[0-9\]+" } } */ diff --git a/gcc/testsuite/gcc.target/arm/neon-vcond-unordered.c b/gcc/testsuite/gcc.target/arm/neon-vcond-unordered.c index c3e448d..3bb67d3 100644 --- a/gcc/testsuite/gcc.target/arm/neon-vcond-unordered.c +++ b/gcc/testsuite/gcc.target/arm/neon-vcond-unordered.c @@ -16,4 +16,4 @@ void foo (int ilast,float* w, float* w2) /* { dg-final { scan-assembler "vcgt\\.f32\[\\t \]*q\[0-9\]+,\[\\t \]*q\[0-9\]+,\[\\t \]*q\[0-9\]+" } } */ /* { dg-final { scan-assembler "vcge\\.f32\[\\t \]*q\[0-9\]+,\[\\t \]*q\[0-9\]+,\[\\t \]*q\[0-9\]+" } } */ /* { dg-final { scan-assembler "vorr\[\\t \]*q\[0-9\]+,\[\\t \]*q\[0-9\]+,\[\\t \]*q\[0-9\]+" } } */ -/* { dg-final { scan-assembler "vbsl\[\\t \]*q\[0-9\]+,\[\\t \]*q\[0-9\]+,\[\\t \]*q\[0-9\]+" } } */ +/* { dg-final { scan-assembler "vbsl|vbit|vbif\[\\t \]*q\[0-9\]+,\[\\t \]*q\[0-9\]+,\[\\t \]*q\[0-9\]+" } } */