Message ID: 87tvxtz0xc.fsf@linaro.org
State: New
Series: Add support for bitwise reductions
Richard Sandiford <richard.sandiford@linaro.org> writes:
> This patch adds support for the SVE bitwise reduction instructions
> (ANDV, ORV and EORV).  It's a fairly mechanical extension of existing
> REDUC_* operators.
>
> Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu
> and powerpc64le-linux-gnu.

Here's an updated version that applies on top of the recent removal
of REDUC_*_EXPR.  Tested as before.

Thanks,
Richard


2017-11-22  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* optabs.def (reduc_and_scal_optab, reduc_ior_scal_optab)
	(reduc_xor_scal_optab): New optabs.
	* doc/md.texi (reduc_and_scal_@var{m}, reduc_ior_scal_@var{m})
	(reduc_xor_scal_@var{m}): Document.
	* doc/sourcebuild.texi (vect_logical_reduc): Likewise.
	* internal-fn.def (IFN_REDUC_AND, IFN_REDUC_IOR, IFN_REDUC_XOR): New
	internal functions.
	* fold-const-call.c (fold_const_call): Handle them.
	* tree-vect-loop.c (reduction_fn_for_scalar_code): Return the new
	internal functions for BIT_AND_EXPR, BIT_IOR_EXPR and BIT_XOR_EXPR.
	* config/aarch64/aarch64-sve.md (reduc_<bit_reduc>_scal_<mode>)
	(*reduc_<bit_reduc>_scal_<mode>): New patterns.
	* config/aarch64/iterators.md (UNSPEC_ANDV, UNSPEC_ORV)
	(UNSPEC_XORV): New unspecs.
	(optab): Add entries for them.
	(BITWISEV): New int iterator.
	(bit_reduc_op): New int attribute.

gcc/testsuite/
	* lib/target-supports.exp (check_effective_target_vect_logical_reduc):
	New proc.
	* gcc.dg/vect/vect-reduc-or_1.c: Also run for vect_logical_reduc
	and add an associated scan-dump test.  Prevent vectorization
	of the first two loops.
	* gcc.dg/vect/vect-reduc-or_2.c: Likewise.
	* gcc.target/aarch64/sve_reduc_1.c: Add AND, IOR and XOR reductions.
	* gcc.target/aarch64/sve_reduc_2.c: Likewise.
	* gcc.target/aarch64/sve_reduc_1_run.c: Likewise.
	(INIT_VECTOR): Tweak initial value so that some bits are always set.
	* gcc.target/aarch64/sve_reduc_2_run.c: Likewise.

Index: gcc/optabs.def
===================================================================
--- gcc/optabs.def	2017-11-22 18:05:58.624329338 +0000
+++ gcc/optabs.def	2017-11-22 18:06:54.516061226 +0000
@@ -292,6 +292,9 @@ OPTAB_D (reduc_smin_scal_optab, "reduc_s
 OPTAB_D (reduc_plus_scal_optab, "reduc_plus_scal_$a")
 OPTAB_D (reduc_umax_scal_optab, "reduc_umax_scal_$a")
 OPTAB_D (reduc_umin_scal_optab, "reduc_umin_scal_$a")
+OPTAB_D (reduc_and_scal_optab, "reduc_and_scal_$a")
+OPTAB_D (reduc_ior_scal_optab, "reduc_ior_scal_$a")
+OPTAB_D (reduc_xor_scal_optab, "reduc_xor_scal_$a")
 
 OPTAB_D (sdot_prod_optab, "sdot_prod$I$a")
 OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3")
Index: gcc/doc/md.texi
===================================================================
--- gcc/doc/md.texi	2017-11-22 18:05:58.620520950 +0000
+++ gcc/doc/md.texi	2017-11-22 18:06:54.515109580 +0000
@@ -5244,6 +5244,17 @@ Compute the sum of the elements of a vec
 operand 0 is the scalar result, with mode equal to the mode of the
 elements of the input vector.
 
+@cindex @code{reduc_and_scal_@var{m}} instruction pattern
+@item @samp{reduc_and_scal_@var{m}}
+@cindex @code{reduc_ior_scal_@var{m}} instruction pattern
+@itemx @samp{reduc_ior_scal_@var{m}}
+@cindex @code{reduc_xor_scal_@var{m}} instruction pattern
+@itemx @samp{reduc_xor_scal_@var{m}}
+Compute the bitwise @code{AND}/@code{IOR}/@code{XOR} reduction of the elements
+of a vector of mode @var{m}.  Operand 1 is the vector input and operand 0
+is the scalar result.  The mode of the scalar result is the same as one
+element of @var{m}.
+
 @cindex @code{sdot_prod@var{m}} instruction pattern
 @item @samp{sdot_prod@var{m}}
 @cindex @code{udot_prod@var{m}} instruction pattern
Index: gcc/doc/sourcebuild.texi
===================================================================
--- gcc/doc/sourcebuild.texi	2017-11-22 18:05:58.621473047 +0000
+++ gcc/doc/sourcebuild.texi	2017-11-22 18:06:54.515109580 +0000
@@ -1570,6 +1570,9 @@ Target supports 16- and 8-bytes vectors.
 
 @item vect_sizes_32B_16B
 Target supports 32- and 16-bytes vectors.
+
+@item vect_logical_reduc
+Target supports AND, IOR and XOR reduction on vectors.
 @end table
 
 @subsubsection Thread Local Storage attributes
Index: gcc/internal-fn.def
===================================================================
--- gcc/internal-fn.def	2017-11-22 18:05:51.545487816 +0000
+++ gcc/internal-fn.def	2017-11-22 18:06:54.516061226 +0000
@@ -137,6 +137,12 @@ DEF_INTERNAL_SIGNED_OPTAB_FN (REDUC_MAX,
			      reduc_smax_scal, reduc_umax_scal, unary)
 DEF_INTERNAL_SIGNED_OPTAB_FN (REDUC_MIN, ECF_CONST | ECF_NOTHROW, first,
			      reduc_smin_scal, reduc_umin_scal, unary)
+DEF_INTERNAL_OPTAB_FN (REDUC_AND, ECF_CONST | ECF_NOTHROW,
+		       reduc_and_scal, unary)
+DEF_INTERNAL_OPTAB_FN (REDUC_IOR, ECF_CONST | ECF_NOTHROW,
+		       reduc_ior_scal, unary)
+DEF_INTERNAL_OPTAB_FN (REDUC_XOR, ECF_CONST | ECF_NOTHROW,
+		       reduc_xor_scal, unary)
 
 /* Unary math functions.  */
 DEF_INTERNAL_FLT_FN (ACOS, ECF_CONST, acos, unary)
Index: gcc/fold-const-call.c
===================================================================
--- gcc/fold-const-call.c	2017-11-22 17:53:21.698058809 +0000
+++ gcc/fold-const-call.c	2017-11-22 18:06:54.516061226 +0000
@@ -1176,6 +1176,15 @@ fold_const_call (combined_fn fn, tree ty
     case CFN_REDUC_MIN:
       return fold_const_reduction (type, arg, MIN_EXPR);
 
+    case CFN_REDUC_AND:
+      return fold_const_reduction (type, arg, BIT_AND_EXPR);
+
+    case CFN_REDUC_IOR:
+      return fold_const_reduction (type, arg, BIT_IOR_EXPR);
+
+    case CFN_REDUC_XOR:
+      return fold_const_reduction (type, arg, BIT_XOR_EXPR);
+
     default:
       return fold_const_call_1 (fn, type, arg);
     }
Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c	2017-11-22 18:05:58.629089823 +0000
+++ gcc/tree-vect-loop.c	2017-11-22 18:06:54.517964518 +0000
@@ -2436,11 +2436,20 @@ reduction_fn_for_scalar_code (enum tree_
       *reduc_fn = IFN_REDUC_PLUS;
       return true;
 
-    case MULT_EXPR:
-    case MINUS_EXPR:
+    case BIT_AND_EXPR:
+      *reduc_fn = IFN_REDUC_AND;
+      return true;
+
     case BIT_IOR_EXPR:
+      *reduc_fn = IFN_REDUC_IOR;
+      return true;
+
     case BIT_XOR_EXPR:
-    case BIT_AND_EXPR:
+      *reduc_fn = IFN_REDUC_XOR;
+      return true;
+
+    case MULT_EXPR:
+    case MINUS_EXPR:
       *reduc_fn = IFN_LAST;
       return true;
Index: gcc/config/aarch64/aarch64-sve.md
===================================================================
--- gcc/config/aarch64/aarch64-sve.md	2017-11-22 18:05:58.618616756 +0000
+++ gcc/config/aarch64/aarch64-sve.md	2017-11-22 18:06:54.514157934 +0000
@@ -1513,6 +1513,26 @@ (define_insn "*reduc_<maxmin_uns>_scal_<
   "<maxmin_uns_op>v\t%<Vetype>0, %1, %2.<Vetype>"
 )
 
+(define_expand "reduc_<optab>_scal_<mode>"
+  [(set (match_operand:<VEL> 0 "register_operand")
+	(unspec:<VEL> [(match_dup 2)
+		       (match_operand:SVE_I 1 "register_operand")]
+		      BITWISEV))]
+  "TARGET_SVE"
+  {
+    operands[2] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+  }
+)
+
+(define_insn "*reduc_<optab>_scal_<mode>"
+  [(set (match_operand:<VEL> 0 "register_operand" "=w")
+	(unspec:<VEL> [(match_operand:<VPRED> 1 "register_operand" "Upl")
+		       (match_operand:SVE_I 2 "register_operand" "w")]
+		      BITWISEV))]
+  "TARGET_SVE"
+  "<bit_reduc_op>\t%<Vetype>0, %1, %2.<Vetype>"
+)
+
 ;; Unpredicated floating-point addition.
 (define_expand "add<mode>3"
   [(set (match_operand:SVE_F 0 "register_operand")
Index: gcc/config/aarch64/iterators.md
===================================================================
--- gcc/config/aarch64/iterators.md	2017-11-22 18:05:58.618616756 +0000
+++ gcc/config/aarch64/iterators.md	2017-11-22 18:06:54.514157934 +0000
@@ -409,6 +409,9 @@ (define_c_enum "unspec"
     UNSPEC_SDOT		; Used in aarch64-simd.md.
     UNSPEC_UDOT		; Used in aarch64-simd.md.
     UNSPEC_SEL		; Used in aarch64-sve.md.
+    UNSPEC_ANDV		; Used in aarch64-sve.md.
+    UNSPEC_IORV		; Used in aarch64-sve.md.
+    UNSPEC_XORV		; Used in aarch64-sve.md.
     UNSPEC_ANDF		; Used in aarch64-sve.md.
     UNSPEC_IORF		; Used in aarch64-sve.md.
     UNSPEC_XORF		; Used in aarch64-sve.md.
@@ -1318,6 +1321,8 @@ (define_int_iterator MAXMINV [UNSPEC_UMA
 (define_int_iterator FMAXMINV [UNSPEC_FMAXV UNSPEC_FMINV
			       UNSPEC_FMAXNMV UNSPEC_FMINNMV])
 
+(define_int_iterator BITWISEV [UNSPEC_ANDV UNSPEC_IORV UNSPEC_XORV])
+
 (define_int_iterator LOGICALF [UNSPEC_ANDF UNSPEC_IORF UNSPEC_XORF])
 
 (define_int_iterator HADDSUB [UNSPEC_SHADD UNSPEC_UHADD
@@ -1437,7 +1442,10 @@ (define_int_attr atomic_ldop
 ;; name for consistency with the integer patterns.
 (define_int_attr optab [(UNSPEC_ANDF "and")
			(UNSPEC_IORF "ior")
-			(UNSPEC_XORF "xor")])
+			(UNSPEC_XORF "xor")
+			(UNSPEC_ANDV "and")
+			(UNSPEC_IORV "ior")
+			(UNSPEC_XORV "xor")])
 
 (define_int_attr maxmin_uns [(UNSPEC_UMAXV "umax")
			     (UNSPEC_UMINV "umin")
@@ -1465,6 +1473,10 @@ (define_int_attr maxmin_uns_op [(UNSPEC
				(UNSPEC_FMAXNM "fmaxnm")
				(UNSPEC_FMINNM "fminnm")])
 
+(define_int_attr bit_reduc_op [(UNSPEC_ANDV "andv")
+			       (UNSPEC_IORV "orv")
+			       (UNSPEC_XORV "eorv")])
+
 ;; The SVE logical instruction that implements an unspec.
 (define_int_attr logicalf_op [(UNSPEC_ANDF "and")
			      (UNSPEC_IORF "orr")
Index: gcc/testsuite/lib/target-supports.exp
===================================================================
--- gcc/testsuite/lib/target-supports.exp	2017-11-22 18:05:58.626233532 +0000
+++ gcc/testsuite/lib/target-supports.exp	2017-11-22 18:06:54.517012872 +0000
@@ -7187,6 +7187,12 @@ proc check_effective_target_vect_call_ro
     return $et_vect_call_roundf_saved($et_index)
 }
 
+# Return 1 if the target supports AND, OR and XOR reduction.
+
+proc check_effective_target_vect_logical_reduc { } {
+    return [check_effective_target_aarch64_sve]
+}
+
 # Return 1 if the target supports section-anchors
 
 proc check_effective_target_section_anchors { } {
Index: gcc/testsuite/gcc.dg/vect/vect-reduc-or_1.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-reduc-or_1.c	2017-11-22 18:05:58.624329338 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-reduc-or_1.c	2017-11-22 18:06:54.516061226 +0000
@@ -1,4 +1,4 @@
-/* { dg-require-effective-target whole_vector_shift } */
+/* { dg-do run { target { whole_vector_shift || vect_logical_reduc } } } */
 
 /* Write a reduction loop to be reduced using vector shifts.  */
 
@@ -24,17 +24,17 @@ main (unsigned char argc, char **argv)
   check_vect ();
 
   for (i = 0; i < N; i++)
-    in[i] = (i + i + 1) & 0xfd;
+    {
+      in[i] = (i + i + 1) & 0xfd;
+      asm volatile ("" ::: "memory");
+    }
 
   for (i = 0; i < N; i++)
     {
       expected |= in[i];
-      asm volatile ("");
+      asm volatile ("" ::: "memory");
     }
 
-  /* Prevent constant propagation of the entire loop below.  */
-  asm volatile ("" : : : "memory");
-
   for (i = 0; i < N; i++)
     sum |= in[i];
 
@@ -47,5 +47,5 @@ main (unsigned char argc, char **argv)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump "Reduce using vector shifts" "vect" } } */
-
+/* { dg-final { scan-tree-dump "Reduce using vector shifts" "vect" { target { ! vect_logical_reduc } } } } */
+/* { dg-final { scan-tree-dump "Reduce using direct vector reduction" "vect" { target vect_logical_reduc } } } */
Index: gcc/testsuite/gcc.dg/vect/vect-reduc-or_2.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-reduc-or_2.c	2017-11-22 18:05:58.625281435 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-reduc-or_2.c	2017-11-22 18:06:54.516061226 +0000
@@ -1,4 +1,4 @@
-/* { dg-require-effective-target whole_vector_shift } */
+/* { dg-do run { target { whole_vector_shift || vect_logical_reduc } } } */
 
 /* Write a reduction loop to be reduced using vector shifts and folded.  */
 
@@ -23,12 +23,15 @@ main (unsigned char argc, char **argv)
   check_vect ();
 
   for (i = 0; i < N; i++)
-    in[i] = (i + i + 1) & 0xfd;
+    {
+      in[i] = (i + i + 1) & 0xfd;
+      asm volatile ("" ::: "memory");
+    }
 
   for (i = 0; i < N; i++)
     {
       expected |= in[i];
-      asm volatile ("");
+      asm volatile ("" ::: "memory");
     }
 
   for (i = 0; i < N; i++)
@@ -43,5 +46,5 @@ main (unsigned char argc, char **argv)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump "Reduce using vector shifts" "vect" } } */
-
+/* { dg-final { scan-tree-dump "Reduce using vector shifts" "vect" { target { ! vect_logical_reduc } } } } */
+/* { dg-final { scan-tree-dump "Reduce using direct vector reduction" "vect" { target vect_logical_reduc } } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_reduc_1.c
===================================================================
--- gcc/testsuite/gcc.target/aarch64/sve_reduc_1.c	2017-11-22 18:05:58.625281435 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_reduc_1.c	2017-11-22 18:06:54.516061226 +0000
@@ -65,6 +65,46 @@ #define TEST_MAXMIN(T) \
 
 TEST_MAXMIN (DEF_REDUC_MAXMIN)
 
+#define DEF_REDUC_BITWISE(TYPE, NAME, BIT_OP)	\
+TYPE __attribute__ ((noinline, noclone))	\
+reduc_##NAME##_##TYPE (TYPE *a, int n)		\
+{						\
+  TYPE r = 13;					\
+  for (int i = 0; i < n; ++i)			\
+    r BIT_OP a[i];				\
+  return r;					\
+}
+
+#define TEST_BITWISE(T)		\
+  T (int8_t, and, &=)		\
+  T (int16_t, and, &=)		\
+  T (int32_t, and, &=)		\
+  T (int64_t, and, &=)		\
+  T (uint8_t, and, &=)		\
+  T (uint16_t, and, &=)		\
+  T (uint32_t, and, &=)		\
+  T (uint64_t, and, &=)		\
+				\
+  T (int8_t, ior, |=)		\
+  T (int16_t, ior, |=)		\
+  T (int32_t, ior, |=)		\
+  T (int64_t, ior, |=)		\
+  T (uint8_t, ior, |=)		\
+  T (uint16_t, ior, |=)		\
+  T (uint32_t, ior, |=)		\
+  T (uint64_t, ior, |=)		\
+				\
+  T (int8_t, xor, ^=)		\
+  T (int16_t, xor, ^=)		\
+  T (int32_t, xor, ^=)		\
+  T (int64_t, xor, ^=)		\
+  T (uint8_t, xor, ^=)		\
+  T (uint16_t, xor, ^=)		\
+  T (uint32_t, xor, ^=)		\
+  T (uint64_t, xor, ^=)
+
+TEST_BITWISE (DEF_REDUC_BITWISE)
+
 /* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.b, z[0-9]+\.b, z[0-9]+\.b\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.h, z[0-9]+\.h, z[0-9]+\.h\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 2 } } */
@@ -102,6 +142,12 @@ TEST_MAXMIN (DEF_REDUC_MAXMIN)
 /* { dg-final { scan-assembler-times {\tfminnm\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tfminnm\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, z[0-9]+\.d\n} 1 } } */
 
+/* { dg-final { scan-assembler-times {\tand\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 8 } } */
+
+/* { dg-final { scan-assembler-times {\torr\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 8 } } */
+
+/* { dg-final { scan-assembler-times {\teor\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 8 } } */
+
 /* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.b\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.h\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.s\n} 2 } } */
@@ -133,3 +179,18 @@ TEST_MAXMIN (DEF_REDUC_MAXMIN)
 /* { dg-final { scan-assembler-times {\tfminnmv\th[0-9]+, p[0-7], z[0-9]+\.h\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tfminnmv\ts[0-9]+, p[0-7], z[0-9]+\.s\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tfminnmv\td[0-9]+, p[0-7], z[0-9]+\.d\n} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tandv\tb[0-9]+, p[0-7], z[0-9]+\.b} 2 } } */
+/* { dg-final { scan-assembler-times {\tandv\th[0-9]+, p[0-7], z[0-9]+\.h} 2 } } */
+/* { dg-final { scan-assembler-times {\tandv\ts[0-9]+, p[0-7], z[0-9]+\.s} 2 } } */
+/* { dg-final { scan-assembler-times {\tandv\td[0-9]+, p[0-7], z[0-9]+\.d} 2 } } */
+
+/* { dg-final { scan-assembler-times {\torv\tb[0-9]+, p[0-7], z[0-9]+\.b} 2 } } */
+/* { dg-final { scan-assembler-times {\torv\th[0-9]+, p[0-7], z[0-9]+\.h} 2 } } */
+/* { dg-final { scan-assembler-times {\torv\ts[0-9]+, p[0-7], z[0-9]+\.s} 2 } } */
+/* { dg-final { scan-assembler-times {\torv\td[0-9]+, p[0-7], z[0-9]+\.d} 2 } } */
+
+/* { dg-final { scan-assembler-times {\teorv\tb[0-9]+, p[0-7], z[0-9]+\.b} 2 } } */
+/* { dg-final { scan-assembler-times {\teorv\th[0-9]+, p[0-7], z[0-9]+\.h} 2 } } */
+/* { dg-final { scan-assembler-times {\teorv\ts[0-9]+, p[0-7], z[0-9]+\.s} 2 } } */
+/* { dg-final { scan-assembler-times {\teorv\td[0-9]+, p[0-7], z[0-9]+\.d} 2 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_reduc_2.c
===================================================================
--- gcc/testsuite/gcc.target/aarch64/sve_reduc_2.c	2017-11-22 18:05:58.625281435 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_reduc_2.c	2017-11-22 18:06:54.517012872 +0000
@@ -73,6 +73,49 @@ #define TEST_MAXMIN(T) \
 
 TEST_MAXMIN (DEF_REDUC_MAXMIN)
 
+#define DEF_REDUC_BITWISE(TYPE,NAME,BIT_OP)			\
+void __attribute__ ((noinline, noclone))			\
+reduc_##NAME##TYPE (TYPE (*restrict a)[NUM_ELEMS(TYPE)],	\
+		    TYPE *restrict r, int n)			\
+{								\
+  for (int i = 0; i < n; i++)					\
+    {								\
+      r[i] = a[i][0];						\
+      for (int j = 0; j < NUM_ELEMS(TYPE); j++)			\
+	r[i] BIT_OP a[i][j];					\
+    }								\
+}
+
+#define TEST_BITWISE(T)		\
+  T (int8_t, and, &=)		\
+  T (int16_t, and, &=)		\
+  T (int32_t, and, &=)		\
+  T (int64_t, and, &=)		\
+  T (uint8_t, and, &=)		\
+  T (uint16_t, and, &=)		\
+  T (uint32_t, and, &=)		\
+  T (uint64_t, and, &=)		\
+				\
+  T (int8_t, ior, |=)		\
+  T (int16_t, ior, |=)		\
+  T (int32_t, ior, |=)		\
+  T (int64_t, ior, |=)		\
+  T (uint8_t, ior, |=)		\
+  T (uint16_t, ior, |=)		\
+  T (uint32_t, ior, |=)		\
+  T (uint64_t, ior, |=)		\
+				\
+  T (int8_t, xor, ^=)		\
+  T (int16_t, xor, ^=)		\
+  T (int32_t, xor, ^=)		\
+  T (int64_t, xor, ^=)		\
+  T (uint8_t, xor, ^=)		\
+  T (uint16_t, xor, ^=)		\
+  T (uint32_t, xor, ^=)		\
+  T (uint64_t, xor, ^=)
+
+TEST_BITWISE (DEF_REDUC_BITWISE)
+
 /* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.b\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.h\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.s\n} 2 } } */
@@ -104,3 +147,18 @@ TEST_MAXMIN (DEF_REDUC_MAXMIN)
 /* { dg-final { scan-assembler-times {\tfminnmv\th[0-9]+, p[0-7], z[0-9]+\.h\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tfminnmv\ts[0-9]+, p[0-7], z[0-9]+\.s\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tfminnmv\td[0-9]+, p[0-7], z[0-9]+\.d\n} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tandv\tb[0-9]+, p[0-7], z[0-9]+\.b\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tandv\th[0-9]+, p[0-7], z[0-9]+\.h\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tandv\ts[0-9]+, p[0-7], z[0-9]+\.s\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tandv\td[0-9]+, p[0-7], z[0-9]+\.d\n} 2 } } */
+
+/* { dg-final { scan-assembler-times {\torv\tb[0-9]+, p[0-7], z[0-9]+\.b\n} 2 } } */
+/* { dg-final { scan-assembler-times {\torv\th[0-9]+, p[0-7], z[0-9]+\.h\n} 2 } } */
+/* { dg-final { scan-assembler-times {\torv\ts[0-9]+, p[0-7], z[0-9]+\.s\n} 2 } } */
+/* { dg-final { scan-assembler-times {\torv\td[0-9]+, p[0-7], z[0-9]+\.d\n} 2 } } */
+
+/* { dg-final { scan-assembler-times {\teorv\tb[0-9]+, p[0-7], z[0-9]+\.b\n} 2 } } */
+/* { dg-final { scan-assembler-times {\teorv\th[0-9]+, p[0-7], z[0-9]+\.h\n} 2 } } */
+/* { dg-final { scan-assembler-times {\teorv\ts[0-9]+, p[0-7], z[0-9]+\.s\n} 2 } } */
+/* { dg-final { scan-assembler-times {\teorv\td[0-9]+, p[0-7], z[0-9]+\.d\n} 2 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_reduc_1_run.c
===================================================================
--- gcc/testsuite/gcc.target/aarch64/sve_reduc_1_run.c	2017-11-22 18:05:58.625281435 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_reduc_1_run.c	2017-11-22 18:06:54.516061226 +0000
@@ -9,7 +9,7 @@ #define INIT_VECTOR(TYPE)
   TYPE a[NUM_ELEMS (TYPE) + 1];				\
   for (int i = 0; i < NUM_ELEMS (TYPE) + 1; i++)	\
     {							\
-      a[i] = (i * 2) * (i & 1 ? 1 : -1);		\
+      a[i] = ((i * 2) * (i & 1 ? 1 : -1) | 3);		\
       asm volatile ("" ::: "memory");			\
     }
 
@@ -35,10 +35,22 @@ #define TEST_REDUC_MAXMIN(TYPE, NAME, CM
	__builtin_abort ();				\
     }
 
+#define TEST_REDUC_BITWISE(TYPE, NAME, BIT_OP)		\
+  {							\
+    INIT_VECTOR (TYPE);					\
+    TYPE r1 = reduc_##NAME##_##TYPE (a, NUM_ELEMS (TYPE)); \
+    volatile TYPE r2 = 13;				\
+    for (int i = 0; i < NUM_ELEMS (TYPE); ++i)		\
+      r2 BIT_OP a[i];					\
+    if (r1 != r2)					\
+      __builtin_abort ();				\
+  }
+
 int main ()
 {
   TEST_PLUS (TEST_REDUC_PLUS)
   TEST_MAXMIN (TEST_REDUC_MAXMIN)
+  TEST_BITWISE (TEST_REDUC_BITWISE)
   return 0;
 }
Index: gcc/testsuite/gcc.target/aarch64/sve_reduc_2_run.c
===================================================================
--- gcc/testsuite/gcc.target/aarch64/sve_reduc_2_run.c	2017-11-22 18:05:58.625281435 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_reduc_2_run.c	2017-11-22 18:06:54.517012872 +0000
@@ -56,6 +56,20 @@ #define TEST_REDUC_MAXMIN(TYPE, NAME, CM
     }							\
   }
 
+#define TEST_REDUC_BITWISE(TYPE, NAME, BIT_OP)		\
+  {							\
+    INIT_MATRIX (TYPE);					\
+    reduc_##NAME##_##TYPE (mat, r, NROWS);		\
+    for (int i = 0; i < NROWS; i++)			\
+      {							\
+	volatile TYPE r2 = mat[i][0];			\
+	for (int j = 0; j < NUM_ELEMS (TYPE); ++j)	\
+	  r2 BIT_OP mat[i][j];				\
+	if (r[i] != r2)					\
+	  __builtin_abort ();				\
+      }							\
+  }
+
 int main ()
 {
   TEST_PLUS (TEST_REDUC_PLUS)
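[For readers skimming the thread, a minimal sketch of the kind of source
loop this series targets.  This is illustrative only, not code from the
patch; the function name is made up.  The final collapse of the vector
accumulator, which reduction_fn_for_scalar_code now expresses as
IFN_REDUC_IOR, maps to reduc_ior_scal_<mode> and hence to a single ORV
instruction on SVE:

#include <stdint.h>

/* Bitwise IOR reduction: the loop body builds a vector accumulator,
   and the epilogue reduces it to one scalar.  */
uint32_t
reduc_ior_example (const uint32_t *a, int n)
{
  uint32_t r = 0;
  for (int i = 0; i < n; ++i)
    r |= a[i];
  return r;
}

The AND and XOR cases are analogous, mapping to ANDV and EORV.]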
On 11/22/2017 11:12 AM, Richard Sandiford wrote:
> Richard Sandiford <richard.sandiford@linaro.org> writes:
>> This patch adds support for the SVE bitwise reduction instructions
>> (ANDV, ORV and EORV).  It's a fairly mechanical extension of existing
>> REDUC_* operators.
>>
>> Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu
>> and powerpc64le-linux-gnu.
>
> Here's an updated version that applies on top of the recent
> removal of REDUC_*_EXPR.  Tested as before.
>
> [ChangeLog snipped]
OK.
Jeff
On Thu, Dec 14, 2017 at 12:36:58AM +0000, Jeff Law wrote:
> On 11/22/2017 11:12 AM, Richard Sandiford wrote:
> > Richard Sandiford <richard.sandiford@linaro.org> writes:
> > > This patch adds support for the SVE bitwise reduction instructions
> > > (ANDV, ORV and EORV).  It's a fairly mechanical extension of existing
> > > REDUC_* operators.
> > [...]
> OK.
> Jeff

I'm also OK with the AArch64 parts.

James
Jeff Law <law@redhat.com> writes:
> On 11/22/2017 11:12 AM, Richard Sandiford wrote:
>> Richard Sandiford <richard.sandiford@linaro.org> writes:
>>> This patch adds support for the SVE bitwise reduction instructions
>>> (ANDV, ORV and EORV).  It's a fairly mechanical extension of existing
>>> REDUC_* operators.
>> [...]
> OK.
> Jeff

Two tests have regressed on sparc-sun-solaris2.*:

+FAIL: gcc.dg/vect/vect-reduc-or_1.c -flto -ffat-lto-objects scan-tree-dump vect "Reduce using vector shifts"
+FAIL: gcc.dg/vect/vect-reduc-or_1.c scan-tree-dump vect "Reduce using vector shifts"
+FAIL: gcc.dg/vect/vect-reduc-or_2.c -flto -ffat-lto-objects scan-tree-dump vect "Reduce using vector shifts"
+FAIL: gcc.dg/vect/vect-reduc-or_2.c scan-tree-dump vect "Reduce using vector shifts"

	Rainer

-- 
-----------------------------------------------------------------------------
Rainer Orth, Center for Biotechnology, Bielefeld University
Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE> writes:
> Jeff Law <law@redhat.com> writes:
>> [...]
>> OK.
>> Jeff
>
> Two tests have regressed on sparc-sun-solaris2.*:
>
> +FAIL: gcc.dg/vect/vect-reduc-or_1.c -flto -ffat-lto-objects scan-tree-dump vect "Reduce using vector shifts"
> +FAIL: gcc.dg/vect/vect-reduc-or_1.c scan-tree-dump vect "Reduce using vector shifts"
> +FAIL: gcc.dg/vect/vect-reduc-or_2.c -flto -ffat-lto-objects scan-tree-dump vect "Reduce using vector shifts"
> +FAIL: gcc.dg/vect/vect-reduc-or_2.c scan-tree-dump vect "Reduce using vector shifts"

Bah, I think I broke this yesterday in:

2018-01-24  Richard Sandiford  <richard.sandiford@linaro.org>

	PR testsuite/83889
	[...]
	* gcc.dg/vect/vect-reduc-or_1.c: Remove conditional dg-do run.
	* gcc.dg/vect/vect-reduc-or_2.c: Likewise.

(r257022), which removed:

/* { dg-do run { target { whole_vector_shift || vect_logical_reduc } } } */

I'd somehow thought that the dump lines in these two tests were already
guarded, but they weren't.

Tested on aarch64-linux-gnu and x86_64-linux-gnu and applied as obvious.
Sorry for the breakage.

Richard


2018-01-25  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/testsuite/
	* gcc.dg/vect/vect-reduc-or_1.c: Require whole_vector_shift for
	the shift dump line.
	* gcc.dg/vect/vect-reduc-or_2.c: Likewise.

Index: gcc/testsuite/gcc.dg/vect/vect-reduc-or_1.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-reduc-or_1.c	2018-01-24 16:22:31.724089913 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-reduc-or_1.c	2018-01-25 10:16:16.283500281 +0000
@@ -45,5 +45,5 @@ main (unsigned char argc, char **argv)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump "Reduce using vector shifts" "vect" { target { ! vect_logical_reduc } } } } */
+/* { dg-final { scan-tree-dump "Reduce using vector shifts" "vect" { target { whole_vector_shift && { ! vect_logical_reduc } } } } } */
 /* { dg-final { scan-tree-dump "Reduce using direct vector reduction" "vect" { target vect_logical_reduc } } } */
Index: gcc/testsuite/gcc.dg/vect/vect-reduc-or_2.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-reduc-or_2.c	2018-01-24 16:22:31.724089913 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-reduc-or_2.c	2018-01-25 10:16:16.284500239 +0000
@@ -44,5 +44,5 @@ main (unsigned char argc, char **argv)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump "Reduce using vector shifts" "vect" { target { ! vect_logical_reduc } } } } */
+/* { dg-final { scan-tree-dump "Reduce using vector shifts" "vect" { target { whole_vector_shift && { ! vect_logical_reduc } } } } } */
 /* { dg-final { scan-tree-dump "Reduce using direct vector reduction" "vect" { target vect_logical_reduc } } } */
On 25 January 2018 at 11:24, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE> writes:
>> Two tests have regressed on sparc-sun-solaris2.*:
>> [...]
>
> Bah, I think I broke this yesterday in [...] (r257022), which removed:
>
> /* { dg-do run { target { whole_vector_shift || vect_logical_reduc } } } */
>
> I'd somehow thought that the dump lines in these two tests were already
> guarded, but they weren't.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu and applied as obvious.
> Sorry for the breakage.

Hi Richard,

While this fixes the regression on armeb (same as on sparc), the effect
on arm-none-linux-gnueabi and arm-none-eabi is that the tests are now
skipped, while they used to pass.  Is this expected?  Or is the guard
you added too restrictive?

Thanks,

Christophe
Christophe Lyon <christophe.lyon@linaro.org> writes:
> [...]
> While this fixes the regression on armeb (same as on sparc), the effect
> on arm-none-linux-gnueabi and arm-none-eabi is that the tests are now
> skipped, while they used to pass.  Is this expected?  Or is the guard
> you added too restrictive?

I think that means that the tests went from UNSUPPORTED to PASS on
the last two targets with r257022.  Is that right?

It's expected in the sense that whole_vector_shift isn't true for
any arm*-*-* target, and historically this test was restricted to
whole_vector_shift (apart from the blip this week).

Thanks,
Richard
On 26 January 2018 at 10:33, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> [...]
> I think that means that the tests went from UNSUPPORTED to PASS on
> the last two targets with r257022.  Is that right?

Yes, that's what I meant.

> It's expected in the sense that whole_vector_shift isn't true for
> any arm*-*-* target, and historically this test was restricted to
> whole_vector_shift (apart from the blip this week).

OK, then.  Just surprising to see PASS disappear.

Thanks,

Christophe
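[For context on the "Reduce using vector shifts" strategy discussed
above, here is a rough sketch of that epilogue, illustrative only and
not code from the thread; the function name is made up and GCC's
__builtin_shuffle stands in for the whole-vector shifts the vectorizer
actually emits.  The accumulator is halved and combined log2(N) times,
which is why the dump line is guarded on whole_vector_shift, whereas
vect_logical_reduc targets replace the whole sequence with one direct
reduction instruction:

#include <stdint.h>

typedef uint32_t v4si __attribute__ ((vector_size (16)));

/* Shift-style epilogue for a 4-element IOR reduction: fold the high
   half into the low half, then the remaining pair, then read lane 0.  */
static uint32_t
reduc_ior_by_shifts (v4si v)
{
  v4si hi2 = { 2, 3, 2, 3 };	/* bring elements 2 and 3 down */
  v4si hi1 = { 1, 1, 1, 1 };	/* bring element 1 down */
  v |= __builtin_shuffle (v, hi2);
  v |= __builtin_shuffle (v, hi1);
  return v[0];
}

The original v1 of the patch, which used REDUC_*_EXPR tree codes
rather than internal functions, follows.]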
Index: gcc/tree.def =================================================================== --- gcc/tree.def 2017-11-17 09:40:43.533167007 +0000 +++ gcc/tree.def 2017-11-17 09:49:36.196354636 +0000 @@ -1298,6 +1298,9 @@ DEFTREECODE (TRANSACTION_EXPR, "transact DEFTREECODE (REDUC_MAX_EXPR, "reduc_max_expr", tcc_unary, 1) DEFTREECODE (REDUC_MIN_EXPR, "reduc_min_expr", tcc_unary, 1) DEFTREECODE (REDUC_PLUS_EXPR, "reduc_plus_expr", tcc_unary, 1) +DEFTREECODE (REDUC_AND_EXPR, "reduc_and_expr", tcc_unary, 1) +DEFTREECODE (REDUC_IOR_EXPR, "reduc_ior_expr", tcc_unary, 1) +DEFTREECODE (REDUC_XOR_EXPR, "reduc_xor_expr", tcc_unary, 1) /* Widening dot-product. The first two arguments are of type t1. Index: gcc/doc/md.texi =================================================================== --- gcc/doc/md.texi 2017-11-17 09:44:46.386606597 +0000 +++ gcc/doc/md.texi 2017-11-17 09:49:36.189354637 +0000 @@ -5244,6 +5244,17 @@ Compute the sum of the elements of a vec operand 0 is the scalar result, with mode equal to the mode of the elements of the input vector. +@cindex @code{reduc_and_scal_@var{m}} instruction pattern +@item @samp{reduc_and_scal_@var{m}} +@cindex @code{reduc_ior_scal_@var{m}} instruction pattern +@itemx @samp{reduc_ior_scal_@var{m}} +@cindex @code{reduc_xor_scal_@var{m}} instruction pattern +@itemx @samp{reduc_xor_scal_@var{m}} +Compute the bitwise @code{AND}/@code{IOR}/@code{XOR} reduction of the elements +of a vector of mode @var{m}. Operand 1 is the vector input and operand 0 +is the scalar result. The mode of the scalar result is the same as one +element of @var{m}. + @cindex @code{sdot_prod@var{m}} instruction pattern @item @samp{sdot_prod@var{m}} @cindex @code{udot_prod@var{m}} instruction pattern Index: gcc/doc/sourcebuild.texi =================================================================== --- gcc/doc/sourcebuild.texi 2017-11-09 15:19:05.427168565 +0000 +++ gcc/doc/sourcebuild.texi 2017-11-17 09:49:36.190354637 +0000 @@ -1570,6 +1570,9 @@ Target supports 16- and 8-bytes vectors. @item vect_sizes_32B_16B Target supports 32- and 16-bytes vectors. + +@item vect_logical_reduc +Target supports AND, IOR and XOR reduction on vectors. @end table @subsubsection Thread Local Storage attributes Index: gcc/doc/generic.texi =================================================================== --- gcc/doc/generic.texi 2017-11-17 09:40:43.510667010 +0000 +++ gcc/doc/generic.texi 2017-11-17 09:49:36.188354638 +0000 @@ -1740,6 +1740,12 @@ a value from @code{enum annot_expr_kind} @tindex VEC_PACK_FIX_TRUNC_EXPR @tindex VEC_COND_EXPR @tindex SAD_EXPR +@tindex REDUC_MAX_EXPR +@tindex REDUC_MIN_EXPR +@tindex REDUC_PLUS_EXPR +@tindex REDUC_AND_EXPR +@tindex REDUC_IOR_EXPR +@tindex REDUC_XOR_EXPR @table @code @item VEC_DUPLICATE_EXPR @@ -1841,6 +1847,20 @@ operand must be at lease twice of the si first and second one. The SAD is calculated between the first and second operands, added to the third operand, and returned. +@item REDUC_MAX_EXPR +@itemx REDUC_MIN_EXPR +@itemx REDUC_PLUS_EXPR +@itemx REDUC_AND_EXPR +@itemx REDUC_IOR_EXPR +@itemx REDUC_XOR_EXPR +These nodes represent operations that take a vector input and repeatedly +apply a binary operator on pairs of elements until only one scalar remains. +For example, @samp{REDUC_PLUS_EXPR <@var{x}>} returns the sum of +the elements in @var{x} and @samp{REDUC_MAX_EXPR <@var{x}>} returns +the maximum element in @var{x}. 
The associativity of the operation +is unspecified; for example, @samp{REDUC_PLUS_EXPR <@var{x}>} could +sum floating-point @var{x} in forward order, in reverse order, +using a tree, or in some other way. @end table Index: gcc/optabs.def =================================================================== --- gcc/optabs.def 2017-11-17 09:44:46.386606597 +0000 +++ gcc/optabs.def 2017-11-17 09:49:36.192354637 +0000 @@ -292,6 +292,9 @@ OPTAB_D (reduc_smin_scal_optab, "reduc_s OPTAB_D (reduc_plus_scal_optab, "reduc_plus_scal_$a") OPTAB_D (reduc_umax_scal_optab, "reduc_umax_scal_$a") OPTAB_D (reduc_umin_scal_optab, "reduc_umin_scal_$a") +OPTAB_D (reduc_and_scal_optab, "reduc_and_scal_$a") +OPTAB_D (reduc_ior_scal_optab, "reduc_ior_scal_$a") +OPTAB_D (reduc_xor_scal_optab, "reduc_xor_scal_$a") OPTAB_D (sdot_prod_optab, "sdot_prod$I$a") OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3") Index: gcc/cfgexpand.c =================================================================== --- gcc/cfgexpand.c 2017-11-17 09:40:43.509767010 +0000 +++ gcc/cfgexpand.c 2017-11-17 09:49:36.187354638 +0000 @@ -5069,6 +5069,9 @@ expand_debug_expr (tree exp) case REDUC_MAX_EXPR: case REDUC_MIN_EXPR: case REDUC_PLUS_EXPR: + case REDUC_AND_EXPR: + case REDUC_IOR_EXPR: + case REDUC_XOR_EXPR: case VEC_COND_EXPR: case VEC_PACK_FIX_TRUNC_EXPR: case VEC_PACK_SAT_EXPR: Index: gcc/expr.c =================================================================== --- gcc/expr.c 2017-11-17 09:06:05.552470755 +0000 +++ gcc/expr.c 2017-11-17 09:49:36.191354637 +0000 @@ -9438,6 +9438,9 @@ #define REDUCE_BIT_FIELD(expr) (reduce_b case REDUC_MAX_EXPR: case REDUC_MIN_EXPR: case REDUC_PLUS_EXPR: + case REDUC_AND_EXPR: + case REDUC_IOR_EXPR: + case REDUC_XOR_EXPR: { op0 = expand_normal (treeop0); this_optab = optab_for_tree_code (code, type, optab_default); Index: gcc/fold-const.c =================================================================== --- gcc/fold-const.c 2017-11-17 09:06:23.404260252 +0000 +++ gcc/fold-const.c 2017-11-17 09:49:36.192354637 +0000 @@ -1869,6 +1869,9 @@ const_unop (enum tree_code code, tree ty case REDUC_MIN_EXPR: case REDUC_MAX_EXPR: case REDUC_PLUS_EXPR: + case REDUC_AND_EXPR: + case REDUC_IOR_EXPR: + case REDUC_XOR_EXPR: { unsigned int nelts, i; enum tree_code subcode; @@ -1882,6 +1885,9 @@ const_unop (enum tree_code code, tree ty case REDUC_MIN_EXPR: subcode = MIN_EXPR; break; case REDUC_MAX_EXPR: subcode = MAX_EXPR; break; case REDUC_PLUS_EXPR: subcode = PLUS_EXPR; break; + case REDUC_AND_EXPR: subcode = BIT_AND_EXPR; break; + case REDUC_IOR_EXPR: subcode = BIT_IOR_EXPR; break; + case REDUC_XOR_EXPR: subcode = BIT_XOR_EXPR; break; default: gcc_unreachable (); } Index: gcc/optabs-tree.c =================================================================== --- gcc/optabs-tree.c 2017-11-17 09:40:43.523267008 +0000 +++ gcc/optabs-tree.c 2017-11-17 09:49:36.192354637 +0000 @@ -157,6 +157,15 @@ optab_for_tree_code (enum tree_code code case REDUC_PLUS_EXPR: return reduc_plus_scal_optab; + case REDUC_AND_EXPR: + return reduc_and_scal_optab; + + case REDUC_IOR_EXPR: + return reduc_ior_scal_optab; + + case REDUC_XOR_EXPR: + return reduc_xor_scal_optab; + case VEC_WIDEN_MULT_HI_EXPR: return TYPE_UNSIGNED (type) ? 
vec_widen_umult_hi_optab : vec_widen_smult_hi_optab; Index: gcc/tree-cfg.c =================================================================== --- gcc/tree-cfg.c 2017-11-17 09:05:59.899390175 +0000 +++ gcc/tree-cfg.c 2017-11-17 09:49:36.194354636 +0000 @@ -3773,6 +3773,9 @@ verify_gimple_assign_unary (gassign *stm case REDUC_MAX_EXPR: case REDUC_MIN_EXPR: case REDUC_PLUS_EXPR: + case REDUC_AND_EXPR: + case REDUC_IOR_EXPR: + case REDUC_XOR_EXPR: if (!VECTOR_TYPE_P (rhs1_type) || !useless_type_conversion_p (lhs_type, TREE_TYPE (rhs1_type))) { Index: gcc/tree-inline.c =================================================================== --- gcc/tree-inline.c 2017-11-17 09:40:43.527767008 +0000 +++ gcc/tree-inline.c 2017-11-17 09:49:36.195354636 +0000 @@ -3878,6 +3878,9 @@ estimate_operator_cost (enum tree_code c case REDUC_MAX_EXPR: case REDUC_MIN_EXPR: case REDUC_PLUS_EXPR: + case REDUC_AND_EXPR: + case REDUC_IOR_EXPR: + case REDUC_XOR_EXPR: case WIDEN_SUM_EXPR: case WIDEN_MULT_EXPR: case DOT_PROD_EXPR: Index: gcc/tree-pretty-print.c =================================================================== --- gcc/tree-pretty-print.c 2017-11-17 08:57:27.159529444 +0000 +++ gcc/tree-pretty-print.c 2017-11-17 09:49:36.195354636 +0000 @@ -3231,24 +3231,6 @@ dump_generic_node (pretty_printer *pp, t is_expr = false; break; - case REDUC_MAX_EXPR: - pp_string (pp, " REDUC_MAX_EXPR < "); - dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false); - pp_string (pp, " > "); - break; - - case REDUC_MIN_EXPR: - pp_string (pp, " REDUC_MIN_EXPR < "); - dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false); - pp_string (pp, " > "); - break; - - case REDUC_PLUS_EXPR: - pp_string (pp, " REDUC_PLUS_EXPR < "); - dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false); - pp_string (pp, " > "); - break; - case VEC_SERIES_EXPR: case VEC_WIDEN_MULT_HI_EXPR: case VEC_WIDEN_MULT_LO_EXPR: @@ -3267,6 +3249,12 @@ dump_generic_node (pretty_printer *pp, t break; case VEC_DUPLICATE_EXPR: + case REDUC_MAX_EXPR: + case REDUC_MIN_EXPR: + case REDUC_PLUS_EXPR: + case REDUC_AND_EXPR: + case REDUC_IOR_EXPR: + case REDUC_XOR_EXPR: pp_space (pp); for (str = get_tree_code_name (code); *str; str++) pp_character (pp, TOUPPER (*str)); Index: gcc/tree-vect-loop.c =================================================================== --- gcc/tree-vect-loop.c 2017-11-17 09:44:46.389306597 +0000 +++ gcc/tree-vect-loop.c 2017-11-17 09:49:36.196354636 +0000 @@ -2437,11 +2437,20 @@ reduction_code_for_scalar_code (enum tre *reduc_code = REDUC_PLUS_EXPR; return true; - case MULT_EXPR: - case MINUS_EXPR: + case BIT_AND_EXPR: + *reduc_code = REDUC_AND_EXPR; + return true; + case BIT_IOR_EXPR: + *reduc_code = REDUC_IOR_EXPR; + return true; + case BIT_XOR_EXPR: - case BIT_AND_EXPR: + *reduc_code = REDUC_XOR_EXPR; + return true; + + case MULT_EXPR: + case MINUS_EXPR: *reduc_code = ERROR_MARK; return true; Index: gcc/config/aarch64/aarch64-sve.md =================================================================== --- gcc/config/aarch64/aarch64-sve.md 2017-11-17 09:44:46.385706597 +0000 +++ gcc/config/aarch64/aarch64-sve.md 2017-11-17 09:49:36.188354638 +0000 @@ -1513,6 +1513,26 @@ (define_insn "*reduc_<maxmin_uns>_scal_< "<maxmin_uns_op>v\t%<Vetype>0, %1, %2.<Vetype>" ) +(define_expand "reduc_<optab>_scal_<mode>" + [(set (match_operand:<VEL> 0 "register_operand") + (unspec:<VEL> [(match_dup 2) + (match_operand:SVE_I 1 "register_operand")] + BITWISEV))] + "TARGET_SVE" + { + operands[2] = force_reg (<VPRED>mode, CONSTM1_RTX 
(<VPRED>mode)); + } +) + +(define_insn "*reduc_<optab>_scal_<mode>" + [(set (match_operand:<VEL> 0 "register_operand" "=w") + (unspec:<VEL> [(match_operand:<VPRED> 1 "register_operand" "Upl") + (match_operand:SVE_I 2 "register_operand" "w")] + BITWISEV))] + "TARGET_SVE" + "<bit_reduc_op>\t%<Vetype>0, %1, %2.<Vetype>" +) + ;; Unpredicated floating-point addition. (define_expand "add<mode>3" [(set (match_operand:SVE_F 0 "register_operand") Index: gcc/config/aarch64/iterators.md =================================================================== --- gcc/config/aarch64/iterators.md 2017-11-17 09:40:36.505067706 +0000 +++ gcc/config/aarch64/iterators.md 2017-11-17 09:49:36.188354638 +0000 @@ -405,6 +405,9 @@ (define_c_enum "unspec" UNSPEC_SDOT ; Used in aarch64-simd.md. UNSPEC_UDOT ; Used in aarch64-simd.md. UNSPEC_SEL ; Used in aarch64-sve.md. + UNSPEC_ANDV ; Used in aarch64-sve.md. + UNSPEC_IORV ; Used in aarch64-sve.md. + UNSPEC_XORV ; Used in aarch64-sve.md. UNSPEC_ANDF ; Used in aarch64-sve.md. UNSPEC_IORF ; Used in aarch64-sve.md. UNSPEC_XORF ; Used in aarch64-sve.md. @@ -1298,6 +1301,8 @@ (define_int_iterator MAXMINV [UNSPEC_UMA (define_int_iterator FMAXMINV [UNSPEC_FMAXV UNSPEC_FMINV UNSPEC_FMAXNMV UNSPEC_FMINNMV]) +(define_int_iterator BITWISEV [UNSPEC_ANDV UNSPEC_IORV UNSPEC_XORV]) + (define_int_iterator LOGICALF [UNSPEC_ANDF UNSPEC_IORF UNSPEC_XORF]) (define_int_iterator HADDSUB [UNSPEC_SHADD UNSPEC_UHADD @@ -1417,7 +1422,10 @@ (define_int_attr atomic_ldop ;; name for consistency with the integer patterns. (define_int_attr optab [(UNSPEC_ANDF "and") (UNSPEC_IORF "ior") - (UNSPEC_XORF "xor")]) + (UNSPEC_XORF "xor") + (UNSPEC_ANDV "and") + (UNSPEC_IORV "ior") + (UNSPEC_XORV "xor")]) (define_int_attr maxmin_uns [(UNSPEC_UMAXV "umax") (UNSPEC_UMINV "umin") @@ -1445,6 +1453,10 @@ (define_int_attr maxmin_uns_op [(UNSPEC (UNSPEC_FMAXNM "fmaxnm") (UNSPEC_FMINNM "fminnm")]) +(define_int_attr bit_reduc_op [(UNSPEC_ANDV "andv") + (UNSPEC_IORV "orv") + (UNSPEC_XORV "eorv")]) + ;; The SVE logical instruction that implements an unspec. (define_int_attr logicalf_op [(UNSPEC_ANDF "and") (UNSPEC_IORF "orr") Index: gcc/testsuite/lib/target-supports.exp =================================================================== --- gcc/testsuite/lib/target-supports.exp 2017-11-17 09:06:28.516102419 +0000 +++ gcc/testsuite/lib/target-supports.exp 2017-11-17 09:49:36.194354636 +0000 @@ -7162,6 +7162,12 @@ proc check_effective_target_vect_call_ro return $et_vect_call_roundf_saved($et_index) } +# Return 1 if the target supports AND, OR and XOR reduction. + +proc check_effective_target_vect_logical_reduc { } { + return [check_effective_target_aarch64_sve] +} + # Return 1 if the target supports section-anchors proc check_effective_target_section_anchors { } { Index: gcc/testsuite/gcc.dg/vect/vect-reduc-or_1.c =================================================================== --- gcc/testsuite/gcc.dg/vect/vect-reduc-or_1.c 2017-11-09 15:15:28.900668540 +0000 +++ gcc/testsuite/gcc.dg/vect/vect-reduc-or_1.c 2017-11-17 09:49:36.192354637 +0000 @@ -1,4 +1,4 @@ -/* { dg-require-effective-target whole_vector_shift } */ +/* { dg-do run { target { whole_vector_shift || vect_logical_reduc } } } */ /* Write a reduction loop to be reduced using vector shifts. */
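Without a direct reduction optab, the vectorizer falls back to the log2 shift strategy that the "Reduce using vector shifts" scan strings below refer to. A rough C model of that fallback (an illustrative sketch, not the vectorizer's actual code):

    #include <stdint.h>
    #include <string.h>

    /* "Reduce using vector shifts": fold the high half of the vector
       onto the low half, halving the active width each step, until the
       result sits in element 0.  */
    static uint8_t
    or_reduce_by_halves (const uint8_t *v, int n)  /* n: power of two, <= 16 */
    {
      uint8_t tmp[16];
      memcpy (tmp, v, n);
      for (int half = n / 2; half >= 1; half /= 2)
        for (int i = 0; i < half; i++)
          tmp[i] |= tmp[i + half];
      return tmp[0];
    }

On vect_logical_reduc targets the same loop should instead become a single ORV, which is what the new "Reduce using direct vector reduction" scan checks for.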
@@ -24,17 +24,17 @@ main (unsigned char argc, char **argv) check_vect (); for (i = 0; i < N; i++) - in[i] = (i + i + 1) & 0xfd; + { + in[i] = (i + i + 1) & 0xfd; + asm volatile ("" ::: "memory"); + } for (i = 0; i < N; i++) { expected |= in[i]; - asm volatile (""); + asm volatile ("" ::: "memory"); } - /* Prevent constant propagation of the entire loop below. */ - asm volatile ("" : : : "memory"); - for (i = 0; i < N; i++) sum |= in[i]; @@ -47,5 +47,5 @@ main (unsigned char argc, char **argv) return 0; } -/* { dg-final { scan-tree-dump "Reduce using vector shifts" "vect" } } */ - +/* { dg-final { scan-tree-dump "Reduce using vector shifts" "vect" { target { ! vect_logical_reduc } } } } */ +/* { dg-final { scan-tree-dump "Reduce using direct vector reduction" "vect" { target vect_logical_reduc } } } */ Index: gcc/testsuite/gcc.dg/vect/vect-reduc-or_2.c =================================================================== --- gcc/testsuite/gcc.dg/vect/vect-reduc-or_2.c 2017-11-09 15:15:28.900668540 +0000 +++ gcc/testsuite/gcc.dg/vect/vect-reduc-or_2.c 2017-11-17 09:49:36.192354637 +0000 @@ -1,4 +1,4 @@ -/* { dg-require-effective-target whole_vector_shift } */ +/* { dg-do run { target { whole_vector_shift || vect_logical_reduc } } } */ /* Write a reduction loop to be reduced using vector shifts and folded. */ @@ -23,12 +23,15 @@ main (unsigned char argc, char **argv) check_vect (); for (i = 0; i < N; i++) - in[i] = (i + i + 1) & 0xfd; + { + in[i] = (i + i + 1) & 0xfd; + asm volatile ("" ::: "memory"); + } for (i = 0; i < N; i++) { expected |= in[i]; - asm volatile (""); + asm volatile ("" ::: "memory"); } for (i = 0; i < N; i++) @@ -43,5 +46,5 @@ main (unsigned char argc, char **argv) return 0; } -/* { dg-final { scan-tree-dump "Reduce using vector shifts" "vect" } } */ - +/* { dg-final { scan-tree-dump "Reduce using vector shifts" "vect" { target { !
vect_logical_reduc } } } } */ +/* { dg-final { scan-tree-dump "Reduce using direct vector reduction" "vect" { target vect_logical_reduc } } } */ Index: gcc/testsuite/gcc.target/aarch64/sve_reduc_1.c =================================================================== --- gcc/testsuite/gcc.target/aarch64/sve_reduc_1.c 2017-11-17 09:06:21.395260303 +0000 +++ gcc/testsuite/gcc.target/aarch64/sve_reduc_1.c 2017-11-17 09:49:36.192354637 +0000 @@ -65,6 +65,46 @@ #define TEST_MAXMIN(T) \ TEST_MAXMIN (DEF_REDUC_MAXMIN) +#define DEF_REDUC_BITWISE(TYPE, NAME, BIT_OP) \ +TYPE __attribute__ ((noinline, noclone)) \ +reduc_##NAME##_##TYPE (TYPE *a, int n) \ +{ \ + TYPE r = 13; \ + for (int i = 0; i < n; ++i) \ + r BIT_OP a[i]; \ + return r; \ +} + +#define TEST_BITWISE(T) \ + T (int8_t, and, &=) \ + T (int16_t, and, &=) \ + T (int32_t, and, &=) \ + T (int64_t, and, &=) \ + T (uint8_t, and, &=) \ + T (uint16_t, and, &=) \ + T (uint32_t, and, &=) \ + T (uint64_t, and, &=) \ + \ + T (int8_t, ior, |=) \ + T (int16_t, ior, |=) \ + T (int32_t, ior, |=) \ + T (int64_t, ior, |=) \ + T (uint8_t, ior, |=) \ + T (uint16_t, ior, |=) \ + T (uint32_t, ior, |=) \ + T (uint64_t, ior, |=) \ + \ + T (int8_t, xor, ^=) \ + T (int16_t, xor, ^=) \ + T (int32_t, xor, ^=) \ + T (int64_t, xor, ^=) \ + T (uint8_t, xor, ^=) \ + T (uint16_t, xor, ^=) \ + T (uint32_t, xor, ^=) \ + T (uint64_t, xor, ^=) + +TEST_BITWISE (DEF_REDUC_BITWISE) + /* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.b, z[0-9]+\.b, z[0-9]+\.b\n} 1 } } */ /* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.h, z[0-9]+\.h, z[0-9]+\.h\n} 1 } } */ /* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 2 } } */ @@ -102,6 +142,12 @@ TEST_MAXMIN (DEF_REDUC_MAXMIN) /* { dg-final { scan-assembler-times {\tfminnm\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */ /* { dg-final { scan-assembler-times {\tfminnm\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, z[0-9]+\.d\n} 1 } } */ +/* { dg-final { scan-assembler-times {\tand\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 8 } } */ + +/* { dg-final { scan-assembler-times {\torr\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 8 } } */ + +/* { dg-final { scan-assembler-times {\teor\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 8 } } */ + /* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.b\n} 1 } } */ /* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.h\n} 1 } } */ /* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.s\n} 2 } } */ @@ -133,3 +179,18 @@ TEST_MAXMIN (DEF_REDUC_MAXMIN) /* { dg-final { scan-assembler-times {\tfminnmv\th[0-9]+, p[0-7], z[0-9]+\.h\n} 1 } } */ /* { dg-final { scan-assembler-times {\tfminnmv\ts[0-9]+, p[0-7], z[0-9]+\.s\n} 1 } } */ /* { dg-final { scan-assembler-times {\tfminnmv\td[0-9]+, p[0-7], z[0-9]+\.d\n} 1 } } */ + +/* { dg-final { scan-assembler-times {\tandv\tb[0-9]+, p[0-7], z[0-9]+\.b} 2 } } */ +/* { dg-final { scan-assembler-times {\tandv\th[0-9]+, p[0-7], z[0-9]+\.h} 2 } } */ +/* { dg-final { scan-assembler-times {\tandv\ts[0-9]+, p[0-7], z[0-9]+\.s} 2 } } */ +/* { dg-final { scan-assembler-times {\tandv\td[0-9]+, p[0-7], z[0-9]+\.d} 2 } } */ + +/* { dg-final { scan-assembler-times {\torv\tb[0-9]+, p[0-7], z[0-9]+\.b} 2 } } */ +/* { dg-final { scan-assembler-times {\torv\th[0-9]+, p[0-7], z[0-9]+\.h} 2 } } */ +/* { dg-final { scan-assembler-times {\torv\ts[0-9]+, p[0-7], z[0-9]+\.s} 2 } } */ +/* { dg-final { scan-assembler-times {\torv\td[0-9]+, p[0-7], z[0-9]+\.d} 2 } } */ + +/* { dg-final { scan-assembler-times 
{\teorv\tb[0-9]+, p[0-7], z[0-9]+\.b} 2 } } */ +/* { dg-final { scan-assembler-times {\teorv\th[0-9]+, p[0-7], z[0-9]+\.h} 2 } } */ +/* { dg-final { scan-assembler-times {\teorv\ts[0-9]+, p[0-7], z[0-9]+\.s} 2 } } */ +/* { dg-final { scan-assembler-times {\teorv\td[0-9]+, p[0-7], z[0-9]+\.d} 2 } } */ Index: gcc/testsuite/gcc.target/aarch64/sve_reduc_2.c =================================================================== --- gcc/testsuite/gcc.target/aarch64/sve_reduc_2.c 2017-11-17 09:06:21.395260303 +0000 +++ gcc/testsuite/gcc.target/aarch64/sve_reduc_2.c 2017-11-17 09:49:36.193354637 +0000 @@ -73,6 +73,49 @@ #define TEST_MAXMIN(T) \ TEST_MAXMIN (DEF_REDUC_MAXMIN) +#define DEF_REDUC_BITWISE(TYPE, NAME, BIT_OP) \ +void __attribute__ ((noinline, noclone)) \ +reduc_##NAME##_##TYPE (TYPE (*restrict a)[NUM_ELEMS(TYPE)], \ + TYPE *restrict r, int n) \ +{ \ + for (int i = 0; i < n; i++) \ + { \ + r[i] = a[i][0]; \ + for (int j = 0; j < NUM_ELEMS(TYPE); j++) \ + r[i] BIT_OP a[i][j]; \ + } \ +} + +#define TEST_BITWISE(T) \ + T (int8_t, and, &=) \ + T (int16_t, and, &=) \ + T (int32_t, and, &=) \ + T (int64_t, and, &=) \ + T (uint8_t, and, &=) \ + T (uint16_t, and, &=) \ + T (uint32_t, and, &=) \ + T (uint64_t, and, &=) \ + \ + T (int8_t, ior, |=) \ + T (int16_t, ior, |=) \ + T (int32_t, ior, |=) \ + T (int64_t, ior, |=) \ + T (uint8_t, ior, |=) \ + T (uint16_t, ior, |=) \ + T (uint32_t, ior, |=) \ + T (uint64_t, ior, |=) \ + \ + T (int8_t, xor, ^=) \ + T (int16_t, xor, ^=) \ + T (int32_t, xor, ^=) \ + T (int64_t, xor, ^=) \ + T (uint8_t, xor, ^=) \ + T (uint16_t, xor, ^=) \ + T (uint32_t, xor, ^=) \ + T (uint64_t, xor, ^=) + +TEST_BITWISE (DEF_REDUC_BITWISE) + /* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.b\n} 1 } } */ /* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.h\n} 1 } } */ /* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.s\n} 2 } } */ @@ -104,3 +147,18 @@ TEST_MAXMIN (DEF_REDUC_MAXMIN) /* { dg-final { scan-assembler-times {\tfminnmv\th[0-9]+, p[0-7], z[0-9]+\.h\n} 1 } } */ /* { dg-final { scan-assembler-times {\tfminnmv\ts[0-9]+, p[0-7], z[0-9]+\.s\n} 1 } } */ /* { dg-final { scan-assembler-times {\tfminnmv\td[0-9]+, p[0-7], z[0-9]+\.d\n} 1 } } */ + +/* { dg-final { scan-assembler-times {\tandv\tb[0-9]+, p[0-7], z[0-9]+\.b\n} 2 } } */ +/* { dg-final { scan-assembler-times {\tandv\th[0-9]+, p[0-7], z[0-9]+\.h\n} 2 } } */ +/* { dg-final { scan-assembler-times {\tandv\ts[0-9]+, p[0-7], z[0-9]+\.s\n} 2 } } */ +/* { dg-final { scan-assembler-times {\tandv\td[0-9]+, p[0-7], z[0-9]+\.d\n} 2 } } */ + +/* { dg-final { scan-assembler-times {\torv\tb[0-9]+, p[0-7], z[0-9]+\.b\n} 2 } } */ +/* { dg-final { scan-assembler-times {\torv\th[0-9]+, p[0-7], z[0-9]+\.h\n} 2 } } */ +/* { dg-final { scan-assembler-times {\torv\ts[0-9]+, p[0-7], z[0-9]+\.s\n} 2 } } */ +/* { dg-final { scan-assembler-times {\torv\td[0-9]+, p[0-7], z[0-9]+\.d\n} 2 } } */ + +/* { dg-final { scan-assembler-times {\teorv\tb[0-9]+, p[0-7], z[0-9]+\.b\n} 2 } } */ +/* { dg-final { scan-assembler-times {\teorv\th[0-9]+, p[0-7], z[0-9]+\.h\n} 2 } } */ +/* { dg-final { scan-assembler-times {\teorv\ts[0-9]+, p[0-7], z[0-9]+\.s\n} 2 } } */ +/* { dg-final { scan-assembler-times {\teorv\td[0-9]+, p[0-7], z[0-9]+\.d\n} 2 } } */ Index: gcc/testsuite/gcc.target/aarch64/sve_reduc_1_run.c =================================================================== --- gcc/testsuite/gcc.target/aarch64/sve_reduc_1_run.c 2017-11-17 09:06:21.395260303 +0000 +++
gcc/testsuite/gcc.target/aarch64/sve_reduc_1_run.c 2017-11-17 09:49:36.193354637 +0000 @@ -9,7 +9,7 @@ #define INIT_VECTOR(TYPE) \ TYPE a[NUM_ELEMS (TYPE) + 1]; \ for (int i = 0; i < NUM_ELEMS (TYPE) + 1; i++) \ { \ - a[i] = (i * 2) * (i & 1 ? 1 : -1); \ + a[i] = ((i * 2) * (i & 1 ? 1 : -1) | 3); \ asm volatile ("" ::: "memory"); \ } @@ -35,10 +35,22 @@ #define TEST_REDUC_MAXMIN(TYPE, NAME, CM __builtin_abort (); \ } +#define TEST_REDUC_BITWISE(TYPE, NAME, BIT_OP) \ + { \ + INIT_VECTOR (TYPE); \ + TYPE r1 = reduc_##NAME##_##TYPE (a, NUM_ELEMS (TYPE)); \ + volatile TYPE r2 = 13; \ + for (int i = 0; i < NUM_ELEMS (TYPE); ++i) \ + r2 BIT_OP a[i]; \ + if (r1 != r2) \ + __builtin_abort (); \ + } + int main () { TEST_PLUS (TEST_REDUC_PLUS) TEST_MAXMIN (TEST_REDUC_MAXMIN) + TEST_BITWISE (TEST_REDUC_BITWISE) return 0; } Index: gcc/testsuite/gcc.target/aarch64/sve_reduc_2_run.c =================================================================== --- gcc/testsuite/gcc.target/aarch64/sve_reduc_2_run.c 2017-11-17 09:06:21.395260303 +0000 +++ gcc/testsuite/gcc.target/aarch64/sve_reduc_2_run.c 2017-11-17 09:49:36.193354637 +0000 @@ -56,6 +56,20 @@ #define TEST_REDUC_MAXMIN(TYPE, NAME, CM } \ } +#define TEST_REDUC_BITWISE(TYPE, NAME, BIT_OP) \ + { \ + INIT_MATRIX (TYPE); \ + reduc_##NAME##_##TYPE (mat, r, NROWS); \ + for (int i = 0; i < NROWS; i++) \ + { \ + volatile TYPE r2 = mat[i][0]; \ + for (int j = 0; j < NUM_ELEMS (TYPE); ++j) \ + r2 BIT_OP mat[i][j]; \ + if (r[i] != r2) \ + __builtin_abort (); \ + } \ + } + int main () { TEST_PLUS (TEST_REDUC_PLUS)
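The "| 3" tweak to INIT_VECTOR above matters for the bitwise tests: with the old initializer, element 0 is zero, so every AND reduction would collapse to zero at the first element and the test would prove very little. A quick standalone check of the new initializer's effect (a hypothetical harness mirroring the reference loops above, not part of the patch):

    #include <stdint.h>

    /* Bits 0 and 1 are now set in every element, so an AND reduction
       seeded with 13 ends up as 1 rather than collapsing to 0.  */
    int
    main (void)
    {
      int32_t r = 13;
      for (int i = 0; i < 8; i++)
        r &= ((i * 2) * (i & 1 ? 1 : -1)) | 3;
      return r == 1 ? 0 : 1;
    }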