Message ID | 560D0567.40207@linaro.org |
---|---|
State | New |
Hi Michael,

On 01/10/15 11:05, Michael Collison wrote:
> Kyrill,
>
> I have modified the patch to address your comments. I also modified
> check_effective_target_vect_widen_sum_hi_to_si_pattern in
> target-supports.exp to indicate that arm neon supports vector widen
> sum of HImode to SImode. This resolved several test suite failures.
>
> Successfully tested on arm-none-eabi and arm-none-linux-gnueabihf. I
> have four related execution test failures on armeb-none-linux-gnueabihf,
> with -flto only:
>
> gcc.dg/vect/vect-outer-4f.c -flto -ffat-lto-objects execution test
> gcc.dg/vect/vect-outer-4g.c -flto -ffat-lto-objects execution test
> gcc.dg/vect/vect-outer-4k.c -flto -ffat-lto-objects execution test
> gcc.dg/vect/vect-outer-4l.c -flto -ffat-lto-objects execution test

We'd want to get to the bottom of these before committing.
Does codegen before and after the patch show anything?
When it comes to big-endian and NEON, the fiddly parts are
usually lane numbers. Do you need to select the proper lanes with
ENDIAN_LANE_N like Charles in his patch at:
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg00656.html?

Thanks,
Kyrill

>
> I am debugging but have not tracked down the root cause yet. Feedback?
>
> 2015-07-22  Michael Collison  <michael.collison@linaro.org>
>
> 	* config/arm/neon.md (widen_<us>sum<mode>): New patterns
> 	where mode is VQI to improve mixed mode vectorization.
> 	* config/arm/neon.md (vec_sel_widen_ssum_lo<VQI:mode><VW:mode>3): New
> 	define_insn to match low half of signed vaddw.
> 	* config/arm/neon.md (vec_sel_widen_ssum_hi<VQI:mode><VW:mode>3): New
> 	define_insn to match high half of signed vaddw.
> 	* config/arm/neon.md (vec_sel_widen_usum_lo<VQI:mode><VW:mode>3): New
> 	define_insn to match low half of unsigned vaddw.
> 	* config/arm/neon.md (vec_sel_widen_usum_hi<VQI:mode><VW:mode>3): New
> 	define_insn to match high half of unsigned vaddw.
> 	* testsuite/gcc.target/arm/neon-vaddws16.c: New test.
> 	* testsuite/gcc.target/arm/neon-vaddws32.c: New test.
> 	* testsuite/gcc.target/arm/neon-vaddwu16.c: New test.
> 	* testsuite/gcc.target/arm/neon-vaddwu32.c: New test.
> 	* testsuite/gcc.target/arm/neon-vaddwu8.c: New test.
> 	* testsuite/lib/target-supports.exp
> 	(check_effective_target_vect_widen_sum_hi_to_si_pattern): Indicate
> 	that arm neon supports vector widen sum of HImode to SImode.

Note that the testsuite changes should have their own ChangeLog entry,
with the paths there starting relative to gcc/testsuite/.

>
> On 09/23/2015 01:49 AM, Kyrill Tkachov wrote:
>> Hi Michael,
>>
>> On 23/09/15 00:52, Michael Collison wrote:
>>> This is a modified version of the previous patch that removes the
>>> documentation and read-md.c fixes. These patches have been submitted
>>> separately and approved.
>>>
>>> This patch is designed to address code that was not being vectorized due
>>> to missing widening patterns in the ARM backend. Code such as:
>>>
>>> int t6(int len, void * dummy, short * __restrict x)
>>> {
>>>   len = len & ~31;
>>>   int result = 0;
>>>   __asm volatile ("");
>>>   for (int i = 0; i < len; i++)
>>>     result += x[i];
>>>   return result;
>>> }
>>>
>>> Validated on arm-none-eabi, arm-none-linux-gnueabi,
>>> arm-none-linux-gnueabihf, and armeb-none-linux-gnueabihf.
>>>
>>> 2015-09-22  Michael Collison  <michael.collison@linaro.org>
>>>
>>> 	* config/arm/neon.md (widen_<us>sum<mode>): New patterns
>>> 	where mode is VQI to improve mixed mode add vectorization.
>>>
>> Please list all the new define_expands and define_insns
>> in the ChangeLog. Also, please add a ChangeLog entry for
>> the testsuite additions.
>>
>> The approach looks ok to me with a few comments on some
>> parts of the patch itself.
>>
>>
>> +(define_insn "vec_sel_widen_ssum_hi<VQI:mode><VW:mode>3"
>> +  [(set (match_operand:<VW:V_widen> 0 "s_register_operand" "=w")
>> +        (plus:<VW:V_widen> (sign_extend:<VW:V_widen> (vec_select:VW (match_operand:VQI 1 "s_register_operand" "%w")
>> +                                                   (match_operand:VQI 2 "vect_par_constant_high" "")))
>> +                           (match_operand:<VW:V_widen> 3 "s_register_operand" "0")))]
>> +  "TARGET_NEON"
>> +  "vaddw.<V_s_elem>\t%q0, %q3, %f1"
>> +  [(set_attr "type" "neon_add_widen")
>> +   (set_attr "length" "8")]
>> +)
>>
>>
>> This is a single instruction, and it has a length of 4, so there is no
>> need to override the length attribute.
>> Same with the other define_insns in this patch.
>>
>>
>> diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddws16.c b/gcc/testsuite/gcc.target/arm/neon-vaddws16.c
>> new file mode 100644
>> index 0000000..ed10669
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/arm/neon-vaddws16.c
>> @@ -0,0 +1,21 @@
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target arm_neon_hw } */
>>
>> The arm_neon_hw check is usually used when you want to run the tests.
>> Since this is a compile-only test, you just need arm_neon_ok.
>>
>> +/* { dg-add-options arm_neon_ok } */
>> +/* { dg-options "-O3" } */
>> +
>> +
>> +int
>> +t6(int len, void * dummy, short * __restrict x)
>> +{
>> +  len = len & ~31;
>> +  int result = 0;
>> +  __asm volatile ("");
>> +  for (int i = 0; i < len; i++)
>> +    result += x[i];
>> +  return result;
>> +}
>> +
>> +/* { dg-final { scan-assembler "vaddw\.s16" } } */
>> +
>> +
>> +
>>
>> Stray trailing newlines. Similar comments for the other testcases.
>>
>> Thanks,
>> Kyrill
>>
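Applying the two test-file comments above to neon-vaddws16.c might look like the sketch below: arm_neon_hw becomes arm_neon_ok and the trailing blank lines are gone, while the function body is unchanged from the posted test. It also makes two assumptions that the thread does not state: that the option set to request with dg-add-options is named arm_neon (the posted test asks for arm_neon_ok), and that dg-add-options should come after dg-options, as the GCC testsuite documentation asks. Treat it as a sketch, not the revised file.

```c
/* { dg-do compile } */
/* { dg-require-effective-target arm_neon_ok } */
/* { dg-options "-O3" } */
/* Assumption: "arm_neon" is the option set paired with arm_neon_ok, and it
   must be added after dg-options.  */
/* { dg-add-options arm_neon } */

int
t6 (int len, void *dummy, short *__restrict x)
{
  len = len & ~31;          /* round the trip count down to a multiple of 32 */
  int result = 0;
  __asm volatile ("");
  for (int i = 0; i < len; i++)
    result += x[i];         /* short -> int widening sum the patch targets */
  return result;
}

/* { dg-final { scan-assembler "vaddw\.s16" } } */
```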
Hi Kyrill,

Since your email I have done the following:

1. Added ENDIAN_LANE_N to the define_expand patterns for big-endian
targets. The big-endian patches produced no change in the test results:
I still have several execution failures when targeting big-endian with
LTO enabled.

2. Diffed the RTL dumps from a big-endian compiler with LTO enabled and
disabled. I also examined the assembly language, and there were no
differences except for the .ascii directives.

I want to ask a question about existing patterns in neon.md that use
vec_select with all the lanes, as my example does: why are the following
patterns not matched if the target is big-endian?

(define_insn "neon_vec_unpack<US>_lo_<mode>"
  [(set (match_operand:<V_unpack> 0 "register_operand" "=w")
        (SE:<V_unpack> (vec_select:<V_HALF>
                         (match_operand:VU 1 "register_operand" "w")
                         (match_operand:VU 2 "vect_par_constant_low" ""))))]
  "TARGET_NEON && !BYTES_BIG_ENDIAN"
  "vmovl.<US><V_sz_elem> %q0, %e1"
  [(set_attr "type" "neon_shift_imm_long")]
)

(define_insn "neon_vec_unpack<US>_hi_<mode>"
  [(set (match_operand:<V_unpack> 0 "register_operand" "=w")
        (SE:<V_unpack> (vec_select:<V_HALF>
                         (match_operand:VU 1 "register_operand" "w")
                         (match_operand:VU 2 "vect_par_constant_high" ""))))]
  "TARGET_NEON && !BYTES_BIG_ENDIAN"
  "vmovl.<US><V_sz_elem> %q0, %f1"
  [(set_attr "type" "neon_shift_imm_long")]
)

These patterns are similar to the new patterns I am adding, and I am
wondering whether my patterns should also exclude BYTES_BIG_ENDIAN.
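The first step above remaps lane indices in the define_expand. The arithmetic of such a remapping can be sketched in plain C. Here endian_lane_n is a hypothetical stand-in written to mirror the usual ENDIAN_LANE_N definition (reverse the lane index on big-endian), and build_half_selector mimics the two loops that fill v1 and v2 in the patch's expander. The sketch only shows which indices end up in each PARALLEL. It does not model register allocation or the %e/%f operand modifiers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of ENDIAN_LANE_N: reverse the lane index on
   big-endian, leave it unchanged on little-endian.  */
static int endian_lane_n(int nunits, int n, bool big_endian)
{
    return big_endian ? nunits - 1 - n : n;
}

/* Fill OUT with the nunits/2 indices the expander would place in the
   "lo" (high == false) or "hi" (high == true) PARALLEL, after remapping
   each one with endian_lane_n.  */
static void build_half_selector(int *out, int nunits, bool high,
                                bool big_endian)
{
    assert(nunits > 0 && nunits % 2 == 0);
    int half = nunits / 2;
    for (int i = 0; i < half; i++)
        out[i] = endian_lane_n(nunits, (high ? half : 0) + i, big_endian);
}
```

With 8 lanes, the remapped "lo" selector on big-endian is 7, 6, 5, 4. That is the same set of lane numbers as the unmapped "hi" selector, in reverse order.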
On 20 October 2015 at 08:54, Michael Collison <michael.collison@linaro.org> wrote:
> I want to ask a question about existing patterns in neon.md that utilize the
> vec_select and all the lanes as my example does: Why are the following
> patterns not matched if the target is big endian?
> (define_insn "neon_vec_unpack<US>_lo_<mode>"
>   [(set (match_operand:<V_unpack> 0 "register_operand" "=w")
>         (SE:<V_unpack> (vec_select:<V_HALF>
>                          (match_operand:VU 1 "register_operand" "w")
>                          (match_operand:VU 2 "vect_par_constant_low" ""))))]
>   "TARGET_NEON && !BYTES_BIG_ENDIAN"
>   "vmovl.<US><V_sz_elem> %q0, %e1"
>   [(set_attr "type" "neon_shift_imm_long")]
> )
>
> (define_insn "neon_vec_unpack<US>_hi_<mode>"
>   [(set (match_operand:<V_unpack> 0 "register_operand" "=w")
>         (SE:<V_unpack> (vec_select:<V_HALF>
>                          (match_operand:VU 1 "register_operand" "w")
>                          (match_operand:VU 2 "vect_par_constant_high" ""))))]
>   "TARGET_NEON && !BYTES_BIG_ENDIAN"
>   "vmovl.<US><V_sz_elem> %q0, %f1"
>   [(set_attr "type" "neon_shift_imm_long")]
>
> These patterns are similar to the new patterns I am adding and I am
> wondering if my patterns should exclude BYTES_BIG_ENDIAN?

These patterns use %e and %f to access the low and high parts of the
input operand, so %e is used to match the use of _lo in the pattern
name and vect_par_constant_low, and %f to match _hi and
vect_par_constant_high. For big-endian, the use of %e and %f would
need to be swapped.
Looking at the patch you posted last month (possibly not the latest
version?):

This is a pattern which is supposed to act on the low part of the
input vector, hence _lo in the name:

+(define_insn "vec_sel_widen_ssum_lo<VQI:mode><VW:mode>3"
+  [(set (match_operand:<VW:V_widen> 0 "s_register_operand" "=w")
+        (plus:<VW:V_widen> (sign_extend:<VW:V_widen> (vec_select:VW (match_operand:VQI 1 "s_register_operand" "%w")
+                                                   (match_operand:VQI 2 "vect_par_constant_low" "")))
+                           (match_operand:<VW:V_widen> 3 "s_register_operand" "0")))]
+  "TARGET_NEON"
+  "vaddw.<V_s_elem>\t%q0, %q3, %e1"

Here, using %e1 carries an implicit assumption that the low part of
the input vector is in the lower-numbered of the pair of D registers,
which is only true on little-endian.

This is a bit ugly (and untested), but perhaps something like this
would fix the problem:

  {
    return BYTES_BIG_ENDIAN ?
      "vaddw.<V_s_elem>\t%q0, %q3, %f1" :
      "vaddw.<V_s_elem>\t%q0, %q3, %e1";
  }

+  [(set_attr "type" "neon_add_widen")
+   (set_attr "length" "8")]
+)

Similarly here. The pattern is _hi, the register is %f1:

+(define_insn "vec_sel_widen_ssum_hi<VQI:mode><VW:mode>3"
+  [(set (match_operand:<VW:V_widen> 0 "s_register_operand" "=w")
+        (plus:<VW:V_widen> (sign_extend:<VW:V_widen> (vec_select:VW (match_operand:VQI 1 "s_register_operand" "%w")
+                                                   (match_operand:VQI 2 "vect_par_constant_high" "")))
+                           (match_operand:<VW:V_widen> 3 "s_register_operand" "0")))]
+  "TARGET_NEON"
+  "vaddw.<V_s_elem>\t%q0, %q3, %f1"
+  [(set_attr "type" "neon_add_widen")
+   (set_attr "length" "8")]
+)

However, as far as I can see, there isn't an endianness dependency in
widen_ssum<mode>3/widen_usum<mode>3, because both halves of the vector
are used and added together.

Hope this helps
Charles
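The %e/%f rule described above can be written as a small truth table. The sketch below is a hypothetical C model: dreg_modifier and vaddw_s_lo_template are illustrative names, not GCC functions. It returns the operand modifier ('e' for the lower-numbered D register, 'f' for the higher-numbered one) for the _lo or _hi half, and it chooses the output template the same way as the suggested BYTES_BIG_ENDIAN block. Like that suggestion, it has not been checked against the real backend.

```c
#include <assert.h>
#include <stdbool.h>

/* Operand modifier for the D register that holds the requested half of a
   Q register.  half: 0 = low half (_lo pattern, vect_par_constant_low),
   1 = high half (_hi pattern, vect_par_constant_high).  On little-endian
   the low half is in the lower-numbered D register (%e); big-endian swaps
   %e and %f.  */
static char dreg_modifier(int half, bool big_endian)
{
    assert(half == 0 || half == 1);
    bool use_high_numbered = (half == 1) != big_endian;
    return use_high_numbered ? 'f' : 'e';
}

/* Output template for the signed _lo insn, chosen like the
   "return BYTES_BIG_ENDIAN ? ... : ...;" suggestion.  */
static const char *vaddw_s_lo_template(bool big_endian)
{
    return dreg_modifier(0, big_endian) == 'f'
           ? "vaddw.<V_s_elem>\t%q0, %q3, %f1"
           : "vaddw.<V_s_elem>\t%q0, %q3, %e1";
}
```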
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 654d9d5..b3485f1 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -1174,6 +1174,55 @@
 
 ;; Widening operations
 
+(define_expand "widen_ssum<mode>3"
+  [(set (match_operand:<V_double_width> 0 "s_register_operand" "")
+        (plus:<V_double_width> (sign_extend:<V_double_width> (match_operand:VQI 1 "s_register_operand" ""))
+                               (match_operand:<V_double_width> 2 "s_register_operand" "")))]
+  "TARGET_NEON"
+  {
+    int i;
+    int half_elem = <V_mode_nunits>/2;
+    rtvec v1 = rtvec_alloc (half_elem);
+    rtvec v2 = rtvec_alloc (half_elem);
+    rtx p1, p2;
+
+    for (i = 0; i < half_elem; i++)
+      RTVEC_ELT (v1, i) = GEN_INT (i);
+    p1 = gen_rtx_PARALLEL (GET_MODE (operands[1]), v1);
+
+    for (i = half_elem; i < <V_mode_nunits>; i++)
+      RTVEC_ELT (v2, i - half_elem) = GEN_INT (i);
+    p2 = gen_rtx_PARALLEL (GET_MODE (operands[1]), v2);
+
+    if (operands[0] != operands[2])
+      emit_move_insn (operands[0], operands[2]);
+
+    emit_insn (gen_vec_sel_widen_ssum_lo<mode><V_half>3 (operands[0], operands[1], p1, operands[0]));
+    emit_insn (gen_vec_sel_widen_ssum_hi<mode><V_half>3 (operands[0], operands[1], p2, operands[0]));
+    DONE;
+  }
+)
+
+(define_insn "vec_sel_widen_ssum_lo<VQI:mode><VW:mode>3"
+  [(set (match_operand:<VW:V_widen> 0 "s_register_operand" "=w")
+        (plus:<VW:V_widen> (sign_extend:<VW:V_widen> (vec_select:VW (match_operand:VQI 1 "s_register_operand" "%w")
+                                                   (match_operand:VQI 2 "vect_par_constant_low" "")))
+                           (match_operand:<VW:V_widen> 3 "s_register_operand" "0")))]
+  "TARGET_NEON"
+  "vaddw.<V_s_elem>\t%q0, %q3, %e1"
+  [(set_attr "type" "neon_add_widen")]
+)
+
+(define_insn "vec_sel_widen_ssum_hi<VQI:mode><VW:mode>3"
+  [(set (match_operand:<VW:V_widen> 0 "s_register_operand" "=w")
+        (plus:<VW:V_widen> (sign_extend:<VW:V_widen> (vec_select:VW (match_operand:VQI 1 "s_register_operand" "%w")
+                                                   (match_operand:VQI 2 "vect_par_constant_high" "")))
+                           (match_operand:<VW:V_widen> 3 "s_register_operand" "0")))]
+  "TARGET_NEON"
+  "vaddw.<V_s_elem>\t%q0, %q3, %f1"
+  [(set_attr "type" "neon_add_widen")]
+)
+
 (define_insn "widen_ssum<mode>3"
   [(set (match_operand:<V_widen> 0 "s_register_operand" "=w")
         (plus:<V_widen> (sign_extend:<V_widen>
@@ -1184,6 +1233,55 @@
   [(set_attr "type" "neon_add_widen")]
 )
 
+(define_expand "widen_usum<mode>3"
+  [(set (match_operand:<V_double_width> 0 "s_register_operand" "")
+        (plus:<V_double_width> (zero_extend:<V_double_width> (match_operand:VQI 1 "s_register_operand" ""))
+                               (match_operand:<V_double_width> 2 "s_register_operand" "")))]
+  "TARGET_NEON"
+  {
+    int i;
+    int half_elem = <V_mode_nunits>/2;
+    rtvec v1 = rtvec_alloc (half_elem);
+    rtvec v2 = rtvec_alloc (half_elem);
+    rtx p1, p2;
+
+    for (i = 0; i < half_elem; i++)
+      RTVEC_ELT (v1, i) = GEN_INT (i);
+    p1 = gen_rtx_PARALLEL (GET_MODE (operands[1]), v1);
+
+    for (i = half_elem; i < <V_mode_nunits>; i++)
+      RTVEC_ELT (v2, i - half_elem) = GEN_INT (i);
+    p2 = gen_rtx_PARALLEL (GET_MODE (operands[1]), v2);
+
+    if (operands[0] != operands[2])
+      emit_move_insn (operands[0], operands[2]);
+
+    emit_insn (gen_vec_sel_widen_usum_lo<mode><V_half>3 (operands[0], operands[1], p1, operands[0]));
+    emit_insn (gen_vec_sel_widen_usum_hi<mode><V_half>3 (operands[0], operands[1], p2, operands[0]));
+    DONE;
+  }
+)
+
+(define_insn "vec_sel_widen_usum_lo<VQI:mode><VW:mode>3"
+  [(set (match_operand:<VW:V_widen> 0 "s_register_operand" "=w")
+        (plus:<VW:V_widen> (zero_extend:<VW:V_widen> (vec_select:VW (match_operand:VQI 1 "s_register_operand" "%w")
+                                                   (match_operand:VQI 2 "vect_par_constant_low" "")))
+                           (match_operand:<VW:V_widen> 3 "s_register_operand" "0")))]
+  "TARGET_NEON"
+  "vaddw.<V_u_elem>\t%q0, %q3, %e1"
+  [(set_attr "type" "neon_add_widen")]
+)
+
+(define_insn "vec_sel_widen_usum_hi<VQI:mode><VW:mode>3"
+  [(set (match_operand:<VW:V_widen> 0 "s_register_operand" "=w")
+        (plus:<VW:V_widen> (zero_extend:<VW:V_widen> (vec_select:VW (match_operand:VQI 1 "s_register_operand" "%w")
+                                                   (match_operand:VQI 2 "vect_par_constant_high" "")))
+                           (match_operand:<VW:V_widen> 3 "s_register_operand" "0")))]
+  "TARGET_NEON"
+  "vaddw.<V_u_elem>\t%q0, %q3, %f1"
+  [(set_attr "type" "neon_add_widen")]
+)
+
 (define_insn "widen_usum<mode>3"
   [(set (match_operand:<V_widen> 0 "s_register_operand" "=w")
         (plus:<V_widen> (zero_extend:<V_widen>
@@ -5347,7 +5445,7 @@
   [(set (match_operand:<V_unpack> 0 "register_operand" "=w")
         (mult:<V_unpack> (SE:<V_unpack> (vec_select:<V_HALF>
                            (match_operand:VU 1 "register_operand" "w")
-                           (match_operand:VU 2 "vect_par_constant_low" "")))
+                           (match_operand:VU 2 "vect_par_constant_low" "")))
                          (SE:<V_unpack> (vec_select:<V_HALF>
                            (match_operand:VU 3 "register_operand" "w")
                            (match_dup 2)))))]
diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddws16.c b/gcc/testsuite/gcc.target/arm/neon-vaddws16.c
new file mode 100644
index 0000000..96c657e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-vaddws16.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_hw } */
+/* { dg-add-options arm_neon_ok } */
+/* { dg-options "-O3" } */
+
+
+int
+t6(int len, void * dummy, short * __restrict x)
+{
+  len = len & ~31;
+  int result = 0;
+  __asm volatile ("");
+  for (int i = 0; i < len; i++)
+    result += x[i];
+  return result;
+}
+
+/* { dg-final { scan-assembler "vaddw\.s16" } } */
diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddws32.c b/gcc/testsuite/gcc.target/arm/neon-vaddws32.c
new file mode 100644
index 0000000..1bfdc13
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-vaddws32.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_hw } */
+/* { dg-add-options arm_neon_ok } */
+/* { dg-options "-O3" } */
+
+int
+t6(int len, void * dummy, int * __restrict x)
+{
+  len = len & ~31;
+  long long result = 0;
+  __asm volatile ("");
+  for (int i = 0; i < len; i++)
+    result += x[i];
+  return result;
+}
+
+/* { dg-final { scan-assembler "vaddw\.s32" } } */
diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddwu16.c b/gcc/testsuite/gcc.target/arm/neon-vaddwu16.c
new file mode 100644
index 0000000..98f8768
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-vaddwu16.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_hw } */
+/* { dg-add-options arm_neon_ok } */
+/* { dg-options "-O3" } */
+
+
+int
+t6(int len, void * dummy, unsigned short * __restrict x)
+{
+  len = len & ~31;
+  unsigned int result = 0;
+  __asm volatile ("");
+  for (int i = 0; i < len; i++)
+    result += x[i];
+  return result;
+}
+
+/* { dg-final { scan-assembler "vaddw.u16" } } */
diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddwu32.c b/gcc/testsuite/gcc.target/arm/neon-vaddwu32.c
new file mode 100644
index 0000000..4a72a39
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-vaddwu32.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_hw } */
+/* { dg-add-options arm_neon_ok } */
+/* { dg-options "-O3" } */
+
+int
+t6(int len, void * dummy, unsigned int * __restrict x)
+{
+  len = len & ~31;
+  unsigned long long result = 0;
+  __asm volatile ("");
+  for (int i = 0; i < len; i++)
+    result += x[i];
+  return result;
+}
+
+/* { dg-final { scan-assembler "vaddw\.u32" } } */
diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddwu8.c b/gcc/testsuite/gcc.target/arm/neon-vaddwu8.c
new file mode 100644
index 0000000..9c9c68a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-vaddwu8.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_hw } */
+/* { dg-add-options arm_neon_ok } */
+/* { dg-options "-O3" } */
+
+
+int
+t6(int len, void * dummy, char * __restrict x)
+{
+  len = len & ~31;
+  unsigned short result = 0;
+  __asm volatile ("");
+  for (int i = 0; i < len; i++)
+    result += x[i];
+  return result;
+}
+
+/* { dg-final { scan-assembler "vaddw\.u8" } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 1988301..5530edc 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -3838,6 +3838,7 @@ proc check_effective_target_vect_widen_sum_hi_to_si_pattern { } {
     } else {
         set et_vect_widen_sum_hi_to_si_pattern_saved 0
         if { [istarget powerpc*-*-*]
+             || [check_effective_target_arm_neon_ok]
              || [istarget ia64-*-*] } {
             set et_vect_widen_sum_hi_to_si_pattern_saved 1
         }
-- 
1.9.1
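The widen_ssum<mode>3 expander in the patch copies operand 2 into the accumulator. It then adds the low half (p1, lanes 0 to N/2-1) and the high half (p2, lanes N/2 to N-1) of operand 1 into that accumulator with two vaddw instructions. Below is a scalar C model of that data flow for eight 16-bit inputs and four 32-bit accumulator lanes. It assumes 16-bit short, as on ARM; all names are hypothetical, and the code only emulates the arithmetic, not NEON itself. Accumulator lane k receives both x[k] and x[k + 4], so feeding the halves in either order yields the same sums. That is consistent with the observation in the thread that widen_ssum<mode>3 itself has no endianness dependency.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One vaddw.s16-style step: acc[k] += sign-extended half[k], 4 lanes.  */
static void vaddw_s16_model(int32_t acc[4], const int16_t half[4])
{
    for (int k = 0; k < 4; k++)
        acc[k] += (int32_t) half[k];
}

/* widen_ssum on one 8 x int16 vector: the "lo" step takes lanes 0..3 and
   the "hi" step takes lanes 4..7.  swap_halves feeds them in the opposite
   order to show that the accumulated result does not change.  */
static void widen_ssum_v8hi_model(int32_t acc[4], const int16_t q[8],
                                  bool swap_halves)
{
    const int16_t *first = swap_halves ? q + 4 : q;
    const int16_t *second = swap_halves ? q : q + 4;
    vaddw_s16_model(acc, first);
    vaddw_s16_model(acc, second);
}

/* Vectorized-style reduction of the t6 loop.  LEN must be a multiple
   of 8 (t6 itself rounds it down to a multiple of 32).  */
static int32_t sum_widen_model(const int16_t *x, size_t len, bool swap_halves)
{
    assert(len % 8 == 0);
    int32_t acc[4] = {0, 0, 0, 0};
    for (size_t i = 0; i < len; i += 8)
        widen_ssum_v8hi_model(acc, x + i, swap_halves);
    return acc[0] + acc[1] + acc[2] + acc[3];   /* final horizontal add */
}

/* Scalar reference, matching the body of t6.  */
static int32_t sum_scalar(const int16_t *x, size_t len)
{
    int32_t result = 0;
    for (size_t i = 0; i < len; i++)
        result += x[i];
    return result;
}
```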