Add support for bitwise reductions

Message ID: 87tvxtz0xc.fsf@linaro.org
State: New
Series: Add support for bitwise reductions

Commit Message

Richard Sandiford Nov. 17, 2017, 9:53 a.m. UTC
This patch adds support for the SVE bitwise reduction instructions
(ANDV, ORV and EORV).  It's a fairly mechanical extension of existing
REDUC_* operators.

Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu
and powerpc64le-linux-gnu.

Richard


2017-11-17  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* tree.def (REDUC_AND_EXPR, REDUC_IOR_EXPR, REDUC_XOR_EXPR): New
	tree codes.
	* doc/md.texi (reduc_and_scal_@var{m}, reduc_ior_scal_@var{m})
	(reduc_xor_scal_@var{m}): Document.
	* doc/sourcebuild.texi (vect_logical_reduc): Likewise.
	* doc/generic.texi (REDUC_MAX_EXPR, REDUC_MIN_EXPR, REDUC_PLUS_EXPR)
	(REDUC_AND_EXPR, REDUC_IOR_EXPR, REDUC_XOR_EXPR): Likewise.
	* optabs.def (reduc_and_scal_optab, reduc_ior_scal_optab)
	(reduc_xor_scal_optab): New optabs.
	* cfgexpand.c (expand_debug_expr): Handle the new tree codes.
	* expr.c (expand_expr_real_2): Likewise.
	* fold-const.c (const_unop): Likewise.
	* optabs-tree.c (optab_for_tree_code): Likewise.
	* tree-cfg.c (verify_gimple_assign_unary): Likewise.
	* tree-inline.c (estimate_operator_cost): Likewise.
	* tree-pretty-print.c (dump_generic_node): Likewise.  Reuse
	generic unary code for REDUC_MAX_EXPR, REDUC_MIN_EXPR and
	REDUC_PLUS_EXPR.
	* tree-vect-loop.c (reduction_code_for_scalar_code): Return the
	new reduction codes for BIT_AND_EXPR, BIT_IOR_EXPR and BIT_XOR_EXPR.
	* config/aarch64/aarch64-sve.md (reduc_<bit_reduc>_scal_<mode>)
	(*reduc_<bit_reduc>_scal_<mode>): New patterns.
	* config/aarch64/iterators.md (UNSPEC_ANDV, UNSPEC_IORV)
	(UNSPEC_XORV): New unspecs.
	(optab): Add entries for them.
	(BITWISEV): New int iterator.
	(bit_reduc_op): New int attribute.

gcc/testsuite/
	* lib/target-supports.exp (check_effective_target_vect_logical_reduc):
	New proc.
	* gcc.dg/vect/vect-reduc-or_1.c: Also run for vect_logical_reduc
	and add an associated scan-dump test.  Prevent vectorization
	of the first two loops.
	* gcc.dg/vect/vect-reduc-or_2.c: Likewise.
	* gcc.target/aarch64/sve_reduc_1.c: Add AND, IOR and XOR reductions.
	* gcc.target/aarch64/sve_reduc_2.c: Likewise.
	* gcc.target/aarch64/sve_reduc_1_run.c: Likewise.
	(INIT_VECTOR): Tweak initial value so that some bits are always set.
	* gcc.target/aarch64/sve_reduc_2_run.c: Likewise.

Comments

Richard Sandiford Nov. 22, 2017, 6:12 p.m. UTC | #1
Richard Sandiford <richard.sandiford@linaro.org> writes:
> This patch adds support for the SVE bitwise reduction instructions
> (ANDV, ORV and EORV).  It's a fairly mechanical extension of existing
> REDUC_* operators.
>
> Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu
> and powerpc64le-linux-gnu.

Here's an updated version that applies on top of the recent
removal of REDUC_*_EXPR.  Tested as before.

Thanks,
Richard


2017-11-22  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* optabs.def (reduc_and_scal_optab, reduc_ior_scal_optab)
	(reduc_xor_scal_optab): New optabs.
	* doc/md.texi (reduc_and_scal_@var{m}, reduc_ior_scal_@var{m})
	(reduc_xor_scal_@var{m}): Document.
	* doc/sourcebuild.texi (vect_logical_reduc): Likewise.
	* internal-fn.def (IFN_REDUC_AND, IFN_REDUC_IOR, IFN_REDUC_XOR): New
	internal functions.
	* fold-const-call.c (fold_const_call): Handle them.
	* tree-vect-loop.c (reduction_fn_for_scalar_code): Return the new
	internal functions for BIT_AND_EXPR, BIT_IOR_EXPR and BIT_XOR_EXPR.
	* config/aarch64/aarch64-sve.md (reduc_<optab>_scal_<mode>)
	(*reduc_<optab>_scal_<mode>): New patterns.
	* config/aarch64/iterators.md (UNSPEC_ANDV, UNSPEC_IORV)
	(UNSPEC_XORV): New unspecs.
	(optab): Add entries for them.
	(BITWISEV): New int iterator.
	(bit_reduc_op): New int attribute.

gcc/testsuite/
	* lib/target-supports.exp (check_effective_target_vect_logical_reduc):
	New proc.
	* gcc.dg/vect/vect-reduc-or_1.c: Also run for vect_logical_reduc
	and add an associated scan-dump test.  Prevent vectorization
	of the first two loops.
	* gcc.dg/vect/vect-reduc-or_2.c: Likewise.
	* gcc.target/aarch64/sve_reduc_1.c: Add AND, IOR and XOR reductions.
	* gcc.target/aarch64/sve_reduc_2.c: Likewise.
	* gcc.target/aarch64/sve_reduc_1_run.c: Likewise.
	(INIT_VECTOR): Tweak initial value so that some bits are always set.
	* gcc.target/aarch64/sve_reduc_2_run.c: Likewise.

Index: gcc/optabs.def
===================================================================
--- gcc/optabs.def	2017-11-22 18:05:58.624329338 +0000
+++ gcc/optabs.def	2017-11-22 18:06:54.516061226 +0000
@@ -292,6 +292,9 @@ OPTAB_D (reduc_smin_scal_optab, "reduc_s
 OPTAB_D (reduc_plus_scal_optab, "reduc_plus_scal_$a")
 OPTAB_D (reduc_umax_scal_optab, "reduc_umax_scal_$a")
 OPTAB_D (reduc_umin_scal_optab, "reduc_umin_scal_$a")
+OPTAB_D (reduc_and_scal_optab,  "reduc_and_scal_$a")
+OPTAB_D (reduc_ior_scal_optab,  "reduc_ior_scal_$a")
+OPTAB_D (reduc_xor_scal_optab,  "reduc_xor_scal_$a")
 
 OPTAB_D (sdot_prod_optab, "sdot_prod$I$a")
 OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3")
Index: gcc/doc/md.texi
===================================================================
--- gcc/doc/md.texi	2017-11-22 18:05:58.620520950 +0000
+++ gcc/doc/md.texi	2017-11-22 18:06:54.515109580 +0000
@@ -5244,6 +5244,17 @@ Compute the sum of the elements of a vec
 operand 0 is the scalar result, with mode equal to the mode of the elements of
 the input vector.
 
+@cindex @code{reduc_and_scal_@var{m}} instruction pattern
+@item @samp{reduc_and_scal_@var{m}}
+@cindex @code{reduc_ior_scal_@var{m}} instruction pattern
+@itemx @samp{reduc_ior_scal_@var{m}}
+@cindex @code{reduc_xor_scal_@var{m}} instruction pattern
+@itemx @samp{reduc_xor_scal_@var{m}}
+Compute the bitwise @code{AND}/@code{IOR}/@code{XOR} reduction of the elements
+of a vector of mode @var{m}.  Operand 1 is the vector input and operand 0
+is the scalar result.  The mode of the scalar result is the same as one
+element of @var{m}.
+
 @cindex @code{sdot_prod@var{m}} instruction pattern
 @item @samp{sdot_prod@var{m}}
 @cindex @code{udot_prod@var{m}} instruction pattern
Index: gcc/doc/sourcebuild.texi
===================================================================
--- gcc/doc/sourcebuild.texi	2017-11-22 18:05:58.621473047 +0000
+++ gcc/doc/sourcebuild.texi	2017-11-22 18:06:54.515109580 +0000
@@ -1570,6 +1570,9 @@ Target supports 16- and 8-bytes vectors.
 
 @item vect_sizes_32B_16B
 Target supports 32- and 16-bytes vectors.
+
+@item vect_logical_reduc
+Target supports AND, IOR and XOR reduction on vectors.
 @end table
 
 @subsubsection Thread Local Storage attributes
Index: gcc/internal-fn.def
===================================================================
--- gcc/internal-fn.def	2017-11-22 18:05:51.545487816 +0000
+++ gcc/internal-fn.def	2017-11-22 18:06:54.516061226 +0000
@@ -137,6 +137,12 @@ DEF_INTERNAL_SIGNED_OPTAB_FN (REDUC_MAX,
 			      reduc_smax_scal, reduc_umax_scal, unary)
 DEF_INTERNAL_SIGNED_OPTAB_FN (REDUC_MIN, ECF_CONST | ECF_NOTHROW, first,
 			      reduc_smin_scal, reduc_umin_scal, unary)
+DEF_INTERNAL_OPTAB_FN (REDUC_AND, ECF_CONST | ECF_NOTHROW,
+		       reduc_and_scal, unary)
+DEF_INTERNAL_OPTAB_FN (REDUC_IOR, ECF_CONST | ECF_NOTHROW,
+		       reduc_ior_scal, unary)
+DEF_INTERNAL_OPTAB_FN (REDUC_XOR, ECF_CONST | ECF_NOTHROW,
+		       reduc_xor_scal, unary)
 
 /* Unary math functions.  */
 DEF_INTERNAL_FLT_FN (ACOS, ECF_CONST, acos, unary)
Index: gcc/fold-const-call.c
===================================================================
--- gcc/fold-const-call.c	2017-11-22 17:53:21.698058809 +0000
+++ gcc/fold-const-call.c	2017-11-22 18:06:54.516061226 +0000
@@ -1176,6 +1176,15 @@ fold_const_call (combined_fn fn, tree ty
     case CFN_REDUC_MIN:
       return fold_const_reduction (type, arg, MIN_EXPR);
 
+    case CFN_REDUC_AND:
+      return fold_const_reduction (type, arg, BIT_AND_EXPR);
+
+    case CFN_REDUC_IOR:
+      return fold_const_reduction (type, arg, BIT_IOR_EXPR);
+
+    case CFN_REDUC_XOR:
+      return fold_const_reduction (type, arg, BIT_XOR_EXPR);
+
     default:
       return fold_const_call_1 (fn, type, arg);
     }
Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c	2017-11-22 18:05:58.629089823 +0000
+++ gcc/tree-vect-loop.c	2017-11-22 18:06:54.517964518 +0000
@@ -2436,11 +2436,20 @@ reduction_fn_for_scalar_code (enum tree_
         *reduc_fn = IFN_REDUC_PLUS;
         return true;
 
-      case MULT_EXPR:
-      case MINUS_EXPR:
+      case BIT_AND_EXPR:
+	*reduc_fn = IFN_REDUC_AND;
+	return true;
+
       case BIT_IOR_EXPR:
+	*reduc_fn = IFN_REDUC_IOR;
+	return true;
+
       case BIT_XOR_EXPR:
-      case BIT_AND_EXPR:
+	*reduc_fn = IFN_REDUC_XOR;
+	return true;
+
+      case MULT_EXPR:
+      case MINUS_EXPR:
         *reduc_fn = IFN_LAST;
         return true;
 
Index: gcc/config/aarch64/aarch64-sve.md
===================================================================
--- gcc/config/aarch64/aarch64-sve.md	2017-11-22 18:05:58.618616756 +0000
+++ gcc/config/aarch64/aarch64-sve.md	2017-11-22 18:06:54.514157934 +0000
@@ -1513,6 +1513,26 @@ (define_insn "*reduc_<maxmin_uns>_scal_<
   "<maxmin_uns_op>v\t%<Vetype>0, %1, %2.<Vetype>"
 )
 
+(define_expand "reduc_<optab>_scal_<mode>"
+  [(set (match_operand:<VEL> 0 "register_operand")
+	(unspec:<VEL> [(match_dup 2)
+		       (match_operand:SVE_I 1 "register_operand")]
+		      BITWISEV))]
+  "TARGET_SVE"
+  {
+    operands[2] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+  }
+)
+
+(define_insn "*reduc_<optab>_scal_<mode>"
+  [(set (match_operand:<VEL> 0 "register_operand" "=w")
+	(unspec:<VEL> [(match_operand:<VPRED> 1 "register_operand" "Upl")
+		       (match_operand:SVE_I 2 "register_operand" "w")]
+		      BITWISEV))]
+  "TARGET_SVE"
+  "<bit_reduc_op>\t%<Vetype>0, %1, %2.<Vetype>"
+)
+
 ;; Unpredicated floating-point addition.
 (define_expand "add<mode>3"
   [(set (match_operand:SVE_F 0 "register_operand")
Index: gcc/config/aarch64/iterators.md
===================================================================
--- gcc/config/aarch64/iterators.md	2017-11-22 18:05:58.618616756 +0000
+++ gcc/config/aarch64/iterators.md	2017-11-22 18:06:54.514157934 +0000
@@ -409,6 +409,9 @@ (define_c_enum "unspec"
     UNSPEC_SDOT		; Used in aarch64-simd.md.
     UNSPEC_UDOT		; Used in aarch64-simd.md.
     UNSPEC_SEL		; Used in aarch64-sve.md.
+    UNSPEC_ANDV		; Used in aarch64-sve.md.
+    UNSPEC_IORV		; Used in aarch64-sve.md.
+    UNSPEC_XORV		; Used in aarch64-sve.md.
     UNSPEC_ANDF		; Used in aarch64-sve.md.
     UNSPEC_IORF		; Used in aarch64-sve.md.
     UNSPEC_XORF		; Used in aarch64-sve.md.
@@ -1318,6 +1321,8 @@ (define_int_iterator MAXMINV [UNSPEC_UMA
 (define_int_iterator FMAXMINV [UNSPEC_FMAXV UNSPEC_FMINV
 			       UNSPEC_FMAXNMV UNSPEC_FMINNMV])
 
+(define_int_iterator BITWISEV [UNSPEC_ANDV UNSPEC_IORV UNSPEC_XORV])
+
 (define_int_iterator LOGICALF [UNSPEC_ANDF UNSPEC_IORF UNSPEC_XORF])
 
 (define_int_iterator HADDSUB [UNSPEC_SHADD UNSPEC_UHADD
@@ -1437,7 +1442,10 @@ (define_int_attr atomic_ldop
 ;; name for consistency with the integer patterns.
 (define_int_attr optab [(UNSPEC_ANDF "and")
 			(UNSPEC_IORF "ior")
-			(UNSPEC_XORF "xor")])
+			(UNSPEC_XORF "xor")
+			(UNSPEC_ANDV "and")
+			(UNSPEC_IORV "ior")
+			(UNSPEC_XORV "xor")])
 
 (define_int_attr  maxmin_uns [(UNSPEC_UMAXV "umax")
 			      (UNSPEC_UMINV "umin")
@@ -1465,6 +1473,10 @@ (define_int_attr  maxmin_uns_op [(UNSPEC
 				 (UNSPEC_FMAXNM "fmaxnm")
 				 (UNSPEC_FMINNM "fminnm")])
 
+(define_int_attr bit_reduc_op [(UNSPEC_ANDV "andv")
+			       (UNSPEC_IORV "orv")
+			       (UNSPEC_XORV "eorv")])
+
 ;; The SVE logical instruction that implements an unspec.
 (define_int_attr logicalf_op [(UNSPEC_ANDF "and")
 		 	      (UNSPEC_IORF "orr")
Index: gcc/testsuite/lib/target-supports.exp
===================================================================
--- gcc/testsuite/lib/target-supports.exp	2017-11-22 18:05:58.626233532 +0000
+++ gcc/testsuite/lib/target-supports.exp	2017-11-22 18:06:54.517012872 +0000
@@ -7187,6 +7187,12 @@ proc check_effective_target_vect_call_ro
     return $et_vect_call_roundf_saved($et_index)
 }
 
+# Return 1 if the target supports AND, OR and XOR reduction.
+
+proc check_effective_target_vect_logical_reduc { } {
+    return [check_effective_target_aarch64_sve]
+}
+
 # Return 1 if the target supports section-anchors
 
 proc check_effective_target_section_anchors { } {
Index: gcc/testsuite/gcc.dg/vect/vect-reduc-or_1.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-reduc-or_1.c	2017-11-22 18:05:58.624329338 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-reduc-or_1.c	2017-11-22 18:06:54.516061226 +0000
@@ -1,4 +1,4 @@
-/* { dg-require-effective-target whole_vector_shift } */
+/* { dg-do run { target { whole_vector_shift || vect_logical_reduc } } } */
 
 /* Write a reduction loop to be reduced using vector shifts.  */
 
@@ -24,17 +24,17 @@ main (unsigned char argc, char **argv)
   check_vect ();
 
   for (i = 0; i < N; i++)
-    in[i] = (i + i + 1) & 0xfd;
+    {
+      in[i] = (i + i + 1) & 0xfd;
+      asm volatile ("" ::: "memory");
+    }
 
   for (i = 0; i < N; i++)
     {
       expected |= in[i];
-      asm volatile ("");
+      asm volatile ("" ::: "memory");
     }
 
-  /* Prevent constant propagation of the entire loop below.  */
-  asm volatile ("" : : : "memory");
-
   for (i = 0; i < N; i++)
     sum |= in[i];
 
@@ -47,5 +47,5 @@ main (unsigned char argc, char **argv)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump "Reduce using vector shifts" "vect" } } */
-
+/* { dg-final { scan-tree-dump "Reduce using vector shifts" "vect" { target { ! vect_logical_reduc } } } } */
+/* { dg-final { scan-tree-dump "Reduce using direct vector reduction" "vect" { target vect_logical_reduc } } } */
Index: gcc/testsuite/gcc.dg/vect/vect-reduc-or_2.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-reduc-or_2.c	2017-11-22 18:05:58.625281435 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-reduc-or_2.c	2017-11-22 18:06:54.516061226 +0000
@@ -1,4 +1,4 @@
-/* { dg-require-effective-target whole_vector_shift } */
+/* { dg-do run { target { whole_vector_shift || vect_logical_reduc } } } */
 
 /* Write a reduction loop to be reduced using vector shifts and folded.  */
 
@@ -23,12 +23,15 @@ main (unsigned char argc, char **argv)
   check_vect ();
 
   for (i = 0; i < N; i++)
-    in[i] = (i + i + 1) & 0xfd;
+    {
+      in[i] = (i + i + 1) & 0xfd;
+      asm volatile ("" ::: "memory");
+    }
 
   for (i = 0; i < N; i++)
     {
       expected |= in[i];
-      asm volatile ("");
+      asm volatile ("" ::: "memory");
     }
 
   for (i = 0; i < N; i++)
@@ -43,5 +46,5 @@ main (unsigned char argc, char **argv)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump "Reduce using vector shifts" "vect" } } */
-
+/* { dg-final { scan-tree-dump "Reduce using vector shifts" "vect" { target { ! vect_logical_reduc } } } } */
+/* { dg-final { scan-tree-dump "Reduce using direct vector reduction" "vect" { target vect_logical_reduc } } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_reduc_1.c
===================================================================
--- gcc/testsuite/gcc.target/aarch64/sve_reduc_1.c	2017-11-22 18:05:58.625281435 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_reduc_1.c	2017-11-22 18:06:54.516061226 +0000
@@ -65,6 +65,46 @@ #define TEST_MAXMIN(T)				\
 
 TEST_MAXMIN (DEF_REDUC_MAXMIN)
 
+#define DEF_REDUC_BITWISE(TYPE, NAME, BIT_OP)	\
+TYPE __attribute__ ((noinline, noclone))	\
+reduc_##NAME##_##TYPE (TYPE *a, int n)		\
+{						\
+  TYPE r = 13;					\
+  for (int i = 0; i < n; ++i)			\
+    r BIT_OP a[i];				\
+  return r;					\
+}
+
+#define TEST_BITWISE(T)				\
+  T (int8_t, and, &=)				\
+  T (int16_t, and, &=)				\
+  T (int32_t, and, &=)				\
+  T (int64_t, and, &=)				\
+  T (uint8_t, and, &=)				\
+  T (uint16_t, and, &=)				\
+  T (uint32_t, and, &=)				\
+  T (uint64_t, and, &=)				\
+						\
+  T (int8_t, ior, |=)				\
+  T (int16_t, ior, |=)				\
+  T (int32_t, ior, |=)				\
+  T (int64_t, ior, |=)				\
+  T (uint8_t, ior, |=)				\
+  T (uint16_t, ior, |=)				\
+  T (uint32_t, ior, |=)				\
+  T (uint64_t, ior, |=)				\
+						\
+  T (int8_t, xor, ^=)				\
+  T (int16_t, xor, ^=)				\
+  T (int32_t, xor, ^=)				\
+  T (int64_t, xor, ^=)				\
+  T (uint8_t, xor, ^=)				\
+  T (uint16_t, xor, ^=)				\
+  T (uint32_t, xor, ^=)				\
+  T (uint64_t, xor, ^=)
+
+TEST_BITWISE (DEF_REDUC_BITWISE)
+
 /* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.b, z[0-9]+\.b, z[0-9]+\.b\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.h, z[0-9]+\.h, z[0-9]+\.h\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 2 } } */
@@ -102,6 +142,12 @@ TEST_MAXMIN (DEF_REDUC_MAXMIN)
 /* { dg-final { scan-assembler-times {\tfminnm\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tfminnm\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, z[0-9]+\.d\n} 1 } } */
 
+/* { dg-final { scan-assembler-times {\tand\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 8 } } */
+
+/* { dg-final { scan-assembler-times {\torr\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 8 } } */
+
+/* { dg-final { scan-assembler-times {\teor\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 8 } } */
+
 /* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.b\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.h\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.s\n} 2 } } */
@@ -133,3 +179,18 @@ TEST_MAXMIN (DEF_REDUC_MAXMIN)
 /* { dg-final { scan-assembler-times {\tfminnmv\th[0-9]+, p[0-7], z[0-9]+\.h\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tfminnmv\ts[0-9]+, p[0-7], z[0-9]+\.s\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tfminnmv\td[0-9]+, p[0-7], z[0-9]+\.d\n} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tandv\tb[0-9]+, p[0-7], z[0-9]+\.b} 2 } } */
+/* { dg-final { scan-assembler-times {\tandv\th[0-9]+, p[0-7], z[0-9]+\.h} 2 } } */
+/* { dg-final { scan-assembler-times {\tandv\ts[0-9]+, p[0-7], z[0-9]+\.s} 2 } } */
+/* { dg-final { scan-assembler-times {\tandv\td[0-9]+, p[0-7], z[0-9]+\.d} 2 } } */
+
+/* { dg-final { scan-assembler-times {\torv\tb[0-9]+, p[0-7], z[0-9]+\.b} 2 } } */
+/* { dg-final { scan-assembler-times {\torv\th[0-9]+, p[0-7], z[0-9]+\.h} 2 } } */
+/* { dg-final { scan-assembler-times {\torv\ts[0-9]+, p[0-7], z[0-9]+\.s} 2 } } */
+/* { dg-final { scan-assembler-times {\torv\td[0-9]+, p[0-7], z[0-9]+\.d} 2 } } */
+
+/* { dg-final { scan-assembler-times {\teorv\tb[0-9]+, p[0-7], z[0-9]+\.b} 2 } } */
+/* { dg-final { scan-assembler-times {\teorv\th[0-9]+, p[0-7], z[0-9]+\.h} 2 } } */
+/* { dg-final { scan-assembler-times {\teorv\ts[0-9]+, p[0-7], z[0-9]+\.s} 2 } } */
+/* { dg-final { scan-assembler-times {\teorv\td[0-9]+, p[0-7], z[0-9]+\.d} 2 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_reduc_2.c
===================================================================
--- gcc/testsuite/gcc.target/aarch64/sve_reduc_2.c	2017-11-22 18:05:58.625281435 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_reduc_2.c	2017-11-22 18:06:54.517012872 +0000
@@ -73,6 +73,49 @@ #define TEST_MAXMIN(T)				\
 
 TEST_MAXMIN (DEF_REDUC_MAXMIN)
 
+#define DEF_REDUC_BITWISE(TYPE,NAME,BIT_OP)			\
+void __attribute__ ((noinline, noclone))			\
+reduc_##NAME##TYPE (TYPE (*restrict a)[NUM_ELEMS(TYPE)],	\
+		    TYPE *restrict r, int n)			\
+{								\
+  for (int i = 0; i < n; i++)					\
+    {								\
+      r[i] = a[i][0];						\
+      for (int j = 0; j < NUM_ELEMS(TYPE); j++)			\
+        r[i] BIT_OP a[i][j];					\
+    }								\
+}
+
+#define TEST_BITWISE(T)				\
+  T (int8_t, and, &=)				\
+  T (int16_t, and, &=)				\
+  T (int32_t, and, &=)				\
+  T (int64_t, and, &=)				\
+  T (uint8_t, and, &=)				\
+  T (uint16_t, and, &=)				\
+  T (uint32_t, and, &=)				\
+  T (uint64_t, and, &=)				\
+						\
+  T (int8_t, ior, |=)				\
+  T (int16_t, ior, |=)				\
+  T (int32_t, ior, |=)				\
+  T (int64_t, ior, |=)				\
+  T (uint8_t, ior, |=)				\
+  T (uint16_t, ior, |=)				\
+  T (uint32_t, ior, |=)				\
+  T (uint64_t, ior, |=)				\
+						\
+  T (int8_t, xor, ^=)				\
+  T (int16_t, xor, ^=)				\
+  T (int32_t, xor, ^=)				\
+  T (int64_t, xor, ^=)				\
+  T (uint8_t, xor, ^=)				\
+  T (uint16_t, xor, ^=)				\
+  T (uint32_t, xor, ^=)				\
+  T (uint64_t, xor, ^=)
+
+TEST_BITWISE (DEF_REDUC_BITWISE)
+
 /* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.b\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.h\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.s\n} 2 } } */
@@ -104,3 +147,18 @@ TEST_MAXMIN (DEF_REDUC_MAXMIN)
 /* { dg-final { scan-assembler-times {\tfminnmv\th[0-9]+, p[0-7], z[0-9]+\.h\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tfminnmv\ts[0-9]+, p[0-7], z[0-9]+\.s\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tfminnmv\td[0-9]+, p[0-7], z[0-9]+\.d\n} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tandv\tb[0-9]+, p[0-7], z[0-9]+\.b\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tandv\th[0-9]+, p[0-7], z[0-9]+\.h\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tandv\ts[0-9]+, p[0-7], z[0-9]+\.s\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tandv\td[0-9]+, p[0-7], z[0-9]+\.d\n} 2 } } */
+
+/* { dg-final { scan-assembler-times {\torv\tb[0-9]+, p[0-7], z[0-9]+\.b\n} 2 } } */
+/* { dg-final { scan-assembler-times {\torv\th[0-9]+, p[0-7], z[0-9]+\.h\n} 2 } } */
+/* { dg-final { scan-assembler-times {\torv\ts[0-9]+, p[0-7], z[0-9]+\.s\n} 2 } } */
+/* { dg-final { scan-assembler-times {\torv\td[0-9]+, p[0-7], z[0-9]+\.d\n} 2 } } */
+
+/* { dg-final { scan-assembler-times {\teorv\tb[0-9]+, p[0-7], z[0-9]+\.b\n} 2 } } */
+/* { dg-final { scan-assembler-times {\teorv\th[0-9]+, p[0-7], z[0-9]+\.h\n} 2 } } */
+/* { dg-final { scan-assembler-times {\teorv\ts[0-9]+, p[0-7], z[0-9]+\.s\n} 2 } } */
+/* { dg-final { scan-assembler-times {\teorv\td[0-9]+, p[0-7], z[0-9]+\.d\n} 2 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_reduc_1_run.c
===================================================================
--- gcc/testsuite/gcc.target/aarch64/sve_reduc_1_run.c	2017-11-22 18:05:58.625281435 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_reduc_1_run.c	2017-11-22 18:06:54.516061226 +0000
@@ -9,7 +9,7 @@ #define INIT_VECTOR(TYPE)				\
   TYPE a[NUM_ELEMS (TYPE) + 1];				\
   for (int i = 0; i < NUM_ELEMS (TYPE) + 1; i++)	\
     {							\
-      a[i] = (i * 2) * (i & 1 ? 1 : -1);		\
+      a[i] = ((i * 2) * (i & 1 ? 1 : -1) | 3);		\
       asm volatile ("" ::: "memory");			\
     }
 
@@ -35,10 +35,22 @@ #define TEST_REDUC_MAXMIN(TYPE, NAME, CM
       __builtin_abort ();					\
   }
 
+#define TEST_REDUC_BITWISE(TYPE, NAME, BIT_OP)			\
+  {								\
+    INIT_VECTOR (TYPE);						\
+    TYPE r1 = reduc_##NAME##_##TYPE (a, NUM_ELEMS (TYPE));	\
+    volatile TYPE r2 = 13;					\
+    for (int i = 0; i < NUM_ELEMS (TYPE); ++i)			\
+      r2 BIT_OP a[i];						\
+    if (r1 != r2)						\
+      __builtin_abort ();					\
+  }
+
 int main ()
 {
   TEST_PLUS (TEST_REDUC_PLUS)
   TEST_MAXMIN (TEST_REDUC_MAXMIN)
+  TEST_BITWISE (TEST_REDUC_BITWISE)
 
   return 0;
 }
Index: gcc/testsuite/gcc.target/aarch64/sve_reduc_2_run.c
===================================================================
--- gcc/testsuite/gcc.target/aarch64/sve_reduc_2_run.c	2017-11-22 18:05:58.625281435 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_reduc_2_run.c	2017-11-22 18:06:54.517012872 +0000
@@ -56,6 +56,20 @@ #define TEST_REDUC_MAXMIN(TYPE, NAME, CM
       }							\
     }
 
+#define TEST_REDUC_BITWISE(TYPE, NAME, BIT_OP)		\
+  {							\
+    INIT_MATRIX (TYPE);					\
+    reduc_##NAME##_##TYPE (mat, r, NROWS);		\
+    for (int i = 0; i < NROWS; i++)			\
+      {							\
+	volatile TYPE r2 = mat[i][0];			\
+	for (int j = 0; j < NUM_ELEMS (TYPE); ++j)	\
+	  r2 BIT_OP mat[i][j];				\
+	if (r[i] != r2)					\
+	  __builtin_abort ();				\
+      }							\
+    }
+
 int main ()
 {
   TEST_PLUS (TEST_REDUC_PLUS)
Jeff Law Dec. 14, 2017, 12:36 a.m. UTC | #2
On 11/22/2017 11:12 AM, Richard Sandiford wrote:
> Richard Sandiford <richard.sandiford@linaro.org> writes:
>> This patch adds support for the SVE bitwise reduction instructions
>> (ANDV, ORV and EORV).  It's a fairly mechanical extension of existing
>> REDUC_* operators.
>>
>> Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu
>> and powerpc64le-linux-gnu.
>
> Here's an updated version that applies on top of the recent
> removal of REDUC_*_EXPR.  Tested as before.
> [...]

OK.
Jeff
James Greenhalgh Jan. 7, 2018, 5:03 p.m. UTC | #3
On Thu, Dec 14, 2017 at 12:36:58AM +0000, Jeff Law wrote:
> On 11/22/2017 11:12 AM, Richard Sandiford wrote:
> > Richard Sandiford <richard.sandiford@linaro.org> writes:
> >> This patch adds support for the SVE bitwise reduction instructions
> >> (ANDV, ORV and EORV).  It's a fairly mechanical extension of existing
> >> REDUC_* operators.
> >>
> >> Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu
> >> and powerpc64le-linux-gnu.
> >
> > Here's an updated version that applies on top of the recent
> > removal of REDUC_*_EXPR.  Tested as before.
> > [...]
> OK.
> Jeff


I'm also OK with the AArch64 parts.

James
Rainer Orth Jan. 24, 2018, 8:29 p.m. UTC | #4
Jeff Law <law@redhat.com> writes:

> On 11/22/2017 11:12 AM, Richard Sandiford wrote:
>> Richard Sandiford <richard.sandiford@linaro.org> writes:
>>> This patch adds support for the SVE bitwise reduction instructions
>>> (ANDV, ORV and EORV).  It's a fairly mechanical extension of existing
>>> REDUC_* operators.
>>>
>>> Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu
>>> and powerpc64le-linux-gnu.
>>
>> Here's an updated version that applies on top of the recent
>> removal of REDUC_*_EXPR.  Tested as before.
>> [...]
> OK.
> Jeff


Two tests have regressed on sparc-sun-solaris2.*:

+FAIL: gcc.dg/vect/vect-reduc-or_1.c -flto -ffat-lto-objects  scan-tree-dump vect "Reduce using vector shifts"
+FAIL: gcc.dg/vect/vect-reduc-or_1.c scan-tree-dump vect "Reduce using vector shifts"
+FAIL: gcc.dg/vect/vect-reduc-or_2.c -flto -ffat-lto-objects  scan-tree-dump vect "Reduce using vector shifts"
+FAIL: gcc.dg/vect/vect-reduc-or_2.c scan-tree-dump vect "Reduce using vector shifts"

	Rainer

-- 
-----------------------------------------------------------------------------
Rainer Orth, Center for Biotechnology, Bielefeld University
Richard Sandiford Jan. 25, 2018, 10:24 a.m. UTC | #5
Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE> writes:
> Jeff Law <law@redhat.com> writes:
> [...]
>
> Two tests have regressed on sparc-sun-solaris2.*:
>
> +FAIL: gcc.dg/vect/vect-reduc-or_1.c -flto -ffat-lto-objects
> scan-tree-dump vect "Reduce using vector shifts"
> +FAIL: gcc.dg/vect/vect-reduc-or_1.c scan-tree-dump vect "Reduce using
> vector shifts"
> +FAIL: gcc.dg/vect/vect-reduc-or_2.c -flto -ffat-lto-objects
> scan-tree-dump vect "Reduce using vector shifts"
> +FAIL: gcc.dg/vect/vect-reduc-or_2.c scan-tree-dump vect "Reduce using
> vector shifts"

Bah, I think I broke this yesterday in:

2018-01-24  Richard Sandiford  <richard.sandiford@linaro.org>

	PR testsuite/83889
        [...]
	* gcc.dg/vect/vect-reduc-or_1.c: Remove conditional dg-do run.
	* gcc.dg/vect/vect-reduc-or_2.c: Likewise.

(r257022), which removed:

  /* { dg-do run { target { whole_vector_shift || vect_logical_reduc } } } */

I'd somehow thought that the dump lines in these two tests were already
guarded, but they weren't.

Tested on aarch64-linux-gnu and x86_64-linux-gnu and applied as obvious.
Sorry for the breakage.

Richard


2018-01-25  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/testsuite/
	* gcc.dg/vect/vect-reduc-or_1.c: Require whole_vector_shift for
	the shift dump line.
	* gcc.dg/vect/vect-reduc-or_2.c: Likewise.

Index: gcc/testsuite/gcc.dg/vect/vect-reduc-or_1.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-reduc-or_1.c	2018-01-24 16:22:31.724089913 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-reduc-or_1.c	2018-01-25 10:16:16.283500281 +0000
@@ -45,5 +45,5 @@ main (unsigned char argc, char **argv)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump "Reduce using vector shifts" "vect" { target { ! vect_logical_reduc } } } } */
+/* { dg-final { scan-tree-dump "Reduce using vector shifts" "vect" { target { whole_vector_shift && { ! vect_logical_reduc } } } } } */
 /* { dg-final { scan-tree-dump "Reduce using direct vector reduction" "vect" { target vect_logical_reduc } } } */
Index: gcc/testsuite/gcc.dg/vect/vect-reduc-or_2.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-reduc-or_2.c	2018-01-24 16:22:31.724089913 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-reduc-or_2.c	2018-01-25 10:16:16.284500239 +0000
@@ -44,5 +44,5 @@ main (unsigned char argc, char **argv)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump "Reduce using vector shifts" "vect" { target { ! vect_logical_reduc } } } } */
+/* { dg-final { scan-tree-dump "Reduce using vector shifts" "vect" { target { whole_vector_shift && { ! vect_logical_reduc } } } } } */
 /* { dg-final { scan-tree-dump "Reduce using direct vector reduction" "vect" { target vect_logical_reduc } } } */
Christophe Lyon Jan. 26, 2018, 9:09 a.m. UTC | #6
On 25 January 2018 at 11:24, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE> writes:
> [...]

Hi Richard,

While this fixes the regression on armeb (same as on sparc), the
effect on arm-none-linux-gnueabi and arm-none-eabi
is that the tests are now skipped, while they used to pass.
Is this expected? Or is the guard you added too restrictive?

Thanks,
Christophe

> [...]
Richard Sandiford Jan. 26, 2018, 9:33 a.m. UTC | #7
Christophe Lyon <christophe.lyon@linaro.org> writes:
> On 25 January 2018 at 11:24, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
> [...]
> Hi Richard,
>
> While this fixes the regression on armeb (same as on sparc), the
> effect on arm-none-linux-gnueabi and arm-none-eabi
> is that the tests are now skipped, while they used to pass.
> Is this expected? Or is the guard you added too restrictive?


I think that means that the tests went from UNSUPPORTED to PASS on
the last two targets with r257022.  Is that right?

It's expected in the sense that whole_vector_shift isn't true for
any arm*-*-* target, and historically this test was restricted to
whole_vector_shift (apart from the blip this week).

Thanks,
Richard

> [...]
Christophe Lyon Jan. 26, 2018, 9:40 a.m. UTC | #8
On 26 January 2018 at 10:33, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Christophe Lyon <christophe.lyon@linaro.org> writes:
> [...]
> I think that means that the tests went from UNSUPPORTED to PASS on
> the last two targets with r257022.  Is that right?

Yes, that's what I meant.

> It's expected in the sense that whole_vector_shift isn't true for
> any arm*-*-* target, and historically this test was restricted to
> whole_vector_shift (apart from the blip this week).

OK, then. Just surprising to see PASS disappear.

Thanks,
Christophe

> [...]
Patch

Index: gcc/tree.def
===================================================================
--- gcc/tree.def	2017-11-17 09:40:43.533167007 +0000
+++ gcc/tree.def	2017-11-17 09:49:36.196354636 +0000
@@ -1298,6 +1298,9 @@  DEFTREECODE (TRANSACTION_EXPR, "transact
 DEFTREECODE (REDUC_MAX_EXPR, "reduc_max_expr", tcc_unary, 1)
 DEFTREECODE (REDUC_MIN_EXPR, "reduc_min_expr", tcc_unary, 1)
 DEFTREECODE (REDUC_PLUS_EXPR, "reduc_plus_expr", tcc_unary, 1)
+DEFTREECODE (REDUC_AND_EXPR, "reduc_and_expr", tcc_unary, 1)
+DEFTREECODE (REDUC_IOR_EXPR, "reduc_ior_expr", tcc_unary, 1)
+DEFTREECODE (REDUC_XOR_EXPR, "reduc_xor_expr", tcc_unary, 1)
 
 /* Widening dot-product.
    The first two arguments are of type t1.
Index: gcc/doc/md.texi
===================================================================
--- gcc/doc/md.texi	2017-11-17 09:44:46.386606597 +0000
+++ gcc/doc/md.texi	2017-11-17 09:49:36.189354637 +0000
@@ -5244,6 +5244,17 @@  Compute the sum of the elements of a vec
 operand 0 is the scalar result, with mode equal to the mode of the elements of
 the input vector.
 
+@cindex @code{reduc_and_scal_@var{m}} instruction pattern
+@item @samp{reduc_and_scal_@var{m}}
+@cindex @code{reduc_ior_scal_@var{m}} instruction pattern
+@itemx @samp{reduc_ior_scal_@var{m}}
+@cindex @code{reduc_xor_scal_@var{m}} instruction pattern
+@itemx @samp{reduc_xor_scal_@var{m}}
+Compute the bitwise @code{AND}/@code{IOR}/@code{XOR} reduction of the elements
+of a vector of mode @var{m}.  Operand 1 is the vector input and operand 0
+is the scalar result.  The mode of the scalar result is the same as one
+element of @var{m}.
+
 @cindex @code{sdot_prod@var{m}} instruction pattern
 @item @samp{sdot_prod@var{m}}
 @cindex @code{udot_prod@var{m}} instruction pattern
Index: gcc/doc/sourcebuild.texi
===================================================================
--- gcc/doc/sourcebuild.texi	2017-11-09 15:19:05.427168565 +0000
+++ gcc/doc/sourcebuild.texi	2017-11-17 09:49:36.190354637 +0000
@@ -1570,6 +1570,9 @@  Target supports 16- and 8-bytes vectors.
 
 @item vect_sizes_32B_16B
 Target supports 32- and 16-bytes vectors.
+
+@item vect_logical_reduc
+Target supports AND, IOR and XOR reduction on vectors.
 @end table
 
 @subsubsection Thread Local Storage attributes
Index: gcc/doc/generic.texi
===================================================================
--- gcc/doc/generic.texi	2017-11-17 09:40:43.510667010 +0000
+++ gcc/doc/generic.texi	2017-11-17 09:49:36.188354638 +0000
@@ -1740,6 +1740,12 @@  a value from @code{enum annot_expr_kind}
 @tindex VEC_PACK_FIX_TRUNC_EXPR
 @tindex VEC_COND_EXPR
 @tindex SAD_EXPR
+@tindex REDUC_MAX_EXPR
+@tindex REDUC_MIN_EXPR
+@tindex REDUC_PLUS_EXPR
+@tindex REDUC_AND_EXPR
+@tindex REDUC_IOR_EXPR
+@tindex REDUC_XOR_EXPR
 
 @table @code
 @item VEC_DUPLICATE_EXPR
@@ -1841,6 +1847,20 @@  operand must be at lease twice of the si
 first and second one.  The SAD is calculated between the first and second
 operands, added to the third operand, and returned.
 
+@item REDUC_MAX_EXPR
+@itemx REDUC_MIN_EXPR
+@itemx REDUC_PLUS_EXPR
+@itemx REDUC_AND_EXPR
+@itemx REDUC_IOR_EXPR
+@itemx REDUC_XOR_EXPR
+These nodes represent operations that take a vector input and repeatedly
+apply a binary operator on pairs of elements until only one scalar remains.
+For example, @samp{REDUC_PLUS_EXPR <@var{x}>} returns the sum of
+the elements in @var{x} and @samp{REDUC_MAX_EXPR <@var{x}>} returns
+the maximum element in @var{x}.  The associativity of the operation
+is unspecified; for example, @samp{REDUC_PLUS_EXPR <@var{x}>} could
+sum floating-point @var{x} in forward order, in reverse order,
+using a tree, or in some other way.
 @end table
 
 
Index: gcc/optabs.def
===================================================================
--- gcc/optabs.def	2017-11-17 09:44:46.386606597 +0000
+++ gcc/optabs.def	2017-11-17 09:49:36.192354637 +0000
@@ -292,6 +292,9 @@  OPTAB_D (reduc_smin_scal_optab, "reduc_s
 OPTAB_D (reduc_plus_scal_optab, "reduc_plus_scal_$a")
 OPTAB_D (reduc_umax_scal_optab, "reduc_umax_scal_$a")
 OPTAB_D (reduc_umin_scal_optab, "reduc_umin_scal_$a")
+OPTAB_D (reduc_and_scal_optab,  "reduc_and_scal_$a")
+OPTAB_D (reduc_ior_scal_optab,  "reduc_ior_scal_$a")
+OPTAB_D (reduc_xor_scal_optab,  "reduc_xor_scal_$a")
 
 OPTAB_D (sdot_prod_optab, "sdot_prod$I$a")
 OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3")
Index: gcc/cfgexpand.c
===================================================================
--- gcc/cfgexpand.c	2017-11-17 09:40:43.509767010 +0000
+++ gcc/cfgexpand.c	2017-11-17 09:49:36.187354638 +0000
@@ -5069,6 +5069,9 @@  expand_debug_expr (tree exp)
     case REDUC_MAX_EXPR:
     case REDUC_MIN_EXPR:
     case REDUC_PLUS_EXPR:
+    case REDUC_AND_EXPR:
+    case REDUC_IOR_EXPR:
+    case REDUC_XOR_EXPR:
     case VEC_COND_EXPR:
     case VEC_PACK_FIX_TRUNC_EXPR:
     case VEC_PACK_SAT_EXPR:
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	2017-11-17 09:06:05.552470755 +0000
+++ gcc/expr.c	2017-11-17 09:49:36.191354637 +0000
@@ -9438,6 +9438,9 @@  #define REDUCE_BIT_FIELD(expr)	(reduce_b
     case REDUC_MAX_EXPR:
     case REDUC_MIN_EXPR:
     case REDUC_PLUS_EXPR:
+    case REDUC_AND_EXPR:
+    case REDUC_IOR_EXPR:
+    case REDUC_XOR_EXPR:
       {
         op0 = expand_normal (treeop0);
         this_optab = optab_for_tree_code (code, type, optab_default);
Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c	2017-11-17 09:06:23.404260252 +0000
+++ gcc/fold-const.c	2017-11-17 09:49:36.192354637 +0000
@@ -1869,6 +1869,9 @@  const_unop (enum tree_code code, tree ty
     case REDUC_MIN_EXPR:
     case REDUC_MAX_EXPR:
     case REDUC_PLUS_EXPR:
+    case REDUC_AND_EXPR:
+    case REDUC_IOR_EXPR:
+    case REDUC_XOR_EXPR:
       {
 	unsigned int nelts, i;
 	enum tree_code subcode;
@@ -1882,6 +1885,9 @@  const_unop (enum tree_code code, tree ty
 	  case REDUC_MIN_EXPR: subcode = MIN_EXPR; break;
 	  case REDUC_MAX_EXPR: subcode = MAX_EXPR; break;
 	  case REDUC_PLUS_EXPR: subcode = PLUS_EXPR; break;
+	  case REDUC_AND_EXPR: subcode = BIT_AND_EXPR; break;
+	  case REDUC_IOR_EXPR: subcode = BIT_IOR_EXPR; break;
+	  case REDUC_XOR_EXPR: subcode = BIT_XOR_EXPR; break;
 	  default: gcc_unreachable ();
 	  }
 
Index: gcc/optabs-tree.c
===================================================================
--- gcc/optabs-tree.c	2017-11-17 09:40:43.523267008 +0000
+++ gcc/optabs-tree.c	2017-11-17 09:49:36.192354637 +0000
@@ -157,6 +157,15 @@  optab_for_tree_code (enum tree_code code
     case REDUC_PLUS_EXPR:
       return reduc_plus_scal_optab;
 
+    case REDUC_AND_EXPR:
+      return reduc_and_scal_optab;
+
+    case REDUC_IOR_EXPR:
+      return reduc_ior_scal_optab;
+
+    case REDUC_XOR_EXPR:
+      return reduc_xor_scal_optab;
+
     case VEC_WIDEN_MULT_HI_EXPR:
       return TYPE_UNSIGNED (type) ?
 	vec_widen_umult_hi_optab : vec_widen_smult_hi_optab;
Index: gcc/tree-cfg.c
===================================================================
--- gcc/tree-cfg.c	2017-11-17 09:05:59.899390175 +0000
+++ gcc/tree-cfg.c	2017-11-17 09:49:36.194354636 +0000
@@ -3773,6 +3773,9 @@  verify_gimple_assign_unary (gassign *stm
     case REDUC_MAX_EXPR:
     case REDUC_MIN_EXPR:
     case REDUC_PLUS_EXPR:
+    case REDUC_AND_EXPR:
+    case REDUC_IOR_EXPR:
+    case REDUC_XOR_EXPR:
       if (!VECTOR_TYPE_P (rhs1_type)
 	  || !useless_type_conversion_p (lhs_type, TREE_TYPE (rhs1_type)))
         {
Index: gcc/tree-inline.c
===================================================================
--- gcc/tree-inline.c	2017-11-17 09:40:43.527767008 +0000
+++ gcc/tree-inline.c	2017-11-17 09:49:36.195354636 +0000
@@ -3878,6 +3878,9 @@  estimate_operator_cost (enum tree_code c
     case REDUC_MAX_EXPR:
     case REDUC_MIN_EXPR:
     case REDUC_PLUS_EXPR:
+    case REDUC_AND_EXPR:
+    case REDUC_IOR_EXPR:
+    case REDUC_XOR_EXPR:
     case WIDEN_SUM_EXPR:
     case WIDEN_MULT_EXPR:
     case DOT_PROD_EXPR:
Index: gcc/tree-pretty-print.c
===================================================================
--- gcc/tree-pretty-print.c	2017-11-17 08:57:27.159529444 +0000
+++ gcc/tree-pretty-print.c	2017-11-17 09:49:36.195354636 +0000
@@ -3231,24 +3231,6 @@  dump_generic_node (pretty_printer *pp, t
       is_expr = false;
       break;
 
-    case REDUC_MAX_EXPR:
-      pp_string (pp, " REDUC_MAX_EXPR < ");
-      dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false);
-      pp_string (pp, " > ");
-      break;
-
-    case REDUC_MIN_EXPR:
-      pp_string (pp, " REDUC_MIN_EXPR < ");
-      dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false);
-      pp_string (pp, " > ");
-      break;
-
-    case REDUC_PLUS_EXPR:
-      pp_string (pp, " REDUC_PLUS_EXPR < ");
-      dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false);
-      pp_string (pp, " > ");
-      break;
-
     case VEC_SERIES_EXPR:
     case VEC_WIDEN_MULT_HI_EXPR:
     case VEC_WIDEN_MULT_LO_EXPR:
@@ -3267,6 +3249,12 @@  dump_generic_node (pretty_printer *pp, t
       break;
 
     case VEC_DUPLICATE_EXPR:
+    case REDUC_MAX_EXPR:
+    case REDUC_MIN_EXPR:
+    case REDUC_PLUS_EXPR:
+    case REDUC_AND_EXPR:
+    case REDUC_IOR_EXPR:
+    case REDUC_XOR_EXPR:
       pp_space (pp);
       for (str = get_tree_code_name (code); *str; str++)
 	pp_character (pp, TOUPPER (*str));
Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c	2017-11-17 09:44:46.389306597 +0000
+++ gcc/tree-vect-loop.c	2017-11-17 09:49:36.196354636 +0000
@@ -2437,11 +2437,20 @@  reduction_code_for_scalar_code (enum tre
         *reduc_code = REDUC_PLUS_EXPR;
         return true;
 
-      case MULT_EXPR:
-      case MINUS_EXPR:
+      case BIT_AND_EXPR:
+	*reduc_code = REDUC_AND_EXPR;
+	return true;
+
       case BIT_IOR_EXPR:
+	*reduc_code = REDUC_IOR_EXPR;
+	return true;
+
       case BIT_XOR_EXPR:
-      case BIT_AND_EXPR:
+	*reduc_code = REDUC_XOR_EXPR;
+	return true;
+
+      case MULT_EXPR:
+      case MINUS_EXPR:
         *reduc_code = ERROR_MARK;
         return true;
 
Index: gcc/config/aarch64/aarch64-sve.md
===================================================================
--- gcc/config/aarch64/aarch64-sve.md	2017-11-17 09:44:46.385706597 +0000
+++ gcc/config/aarch64/aarch64-sve.md	2017-11-17 09:49:36.188354638 +0000
@@ -1513,6 +1513,26 @@  (define_insn "*reduc_<maxmin_uns>_scal_<
   "<maxmin_uns_op>v\t%<Vetype>0, %1, %2.<Vetype>"
 )
 
+(define_expand "reduc_<optab>_scal_<mode>"
+  [(set (match_operand:<VEL> 0 "register_operand")
+	(unspec:<VEL> [(match_dup 2)
+		       (match_operand:SVE_I 1 "register_operand")]
+		      BITWISEV))]
+  "TARGET_SVE"
+  {
+    operands[2] = force_reg (<VPRED>mode, CONSTM1_RTX (<VPRED>mode));
+  }
+)
+
+(define_insn "*reduc_<optab>_scal_<mode>"
+  [(set (match_operand:<VEL> 0 "register_operand" "=w")
+	(unspec:<VEL> [(match_operand:<VPRED> 1 "register_operand" "Upl")
+		       (match_operand:SVE_I 2 "register_operand" "w")]
+		      BITWISEV))]
+  "TARGET_SVE"
+  "<bit_reduc_op>\t%<Vetype>0, %1, %2.<Vetype>"
+)
+
 ;; Unpredicated floating-point addition.
 (define_expand "add<mode>3"
   [(set (match_operand:SVE_F 0 "register_operand")
Index: gcc/config/aarch64/iterators.md
===================================================================
--- gcc/config/aarch64/iterators.md	2017-11-17 09:40:36.505067706 +0000
+++ gcc/config/aarch64/iterators.md	2017-11-17 09:49:36.188354638 +0000
@@ -405,6 +405,9 @@  (define_c_enum "unspec"
     UNSPEC_SDOT		; Used in aarch64-simd.md.
     UNSPEC_UDOT		; Used in aarch64-simd.md.
     UNSPEC_SEL		; Used in aarch64-sve.md.
+    UNSPEC_ANDV		; Used in aarch64-sve.md.
+    UNSPEC_IORV		; Used in aarch64-sve.md.
+    UNSPEC_XORV		; Used in aarch64-sve.md.
     UNSPEC_ANDF		; Used in aarch64-sve.md.
     UNSPEC_IORF		; Used in aarch64-sve.md.
     UNSPEC_XORF		; Used in aarch64-sve.md.
@@ -1298,6 +1301,8 @@  (define_int_iterator MAXMINV [UNSPEC_UMA
 (define_int_iterator FMAXMINV [UNSPEC_FMAXV UNSPEC_FMINV
 			       UNSPEC_FMAXNMV UNSPEC_FMINNMV])
 
+(define_int_iterator BITWISEV [UNSPEC_ANDV UNSPEC_IORV UNSPEC_XORV])
+
 (define_int_iterator LOGICALF [UNSPEC_ANDF UNSPEC_IORF UNSPEC_XORF])
 
 (define_int_iterator HADDSUB [UNSPEC_SHADD UNSPEC_UHADD
@@ -1417,7 +1422,10 @@  (define_int_attr atomic_ldop
 ;; name for consistency with the integer patterns.
 (define_int_attr optab [(UNSPEC_ANDF "and")
 			(UNSPEC_IORF "ior")
-			(UNSPEC_XORF "xor")])
+			(UNSPEC_XORF "xor")
+			(UNSPEC_ANDV "and")
+			(UNSPEC_IORV "ior")
+			(UNSPEC_XORV "xor")])
 
 (define_int_attr  maxmin_uns [(UNSPEC_UMAXV "umax")
 			      (UNSPEC_UMINV "umin")
@@ -1445,6 +1453,10 @@  (define_int_attr  maxmin_uns_op [(UNSPEC
 				 (UNSPEC_FMAXNM "fmaxnm")
 				 (UNSPEC_FMINNM "fminnm")])
 
+(define_int_attr bit_reduc_op [(UNSPEC_ANDV "andv")
+			       (UNSPEC_IORV "orv")
+			       (UNSPEC_XORV "eorv")])
+
 ;; The SVE logical instruction that implements an unspec.
 (define_int_attr logicalf_op [(UNSPEC_ANDF "and")
 		 	      (UNSPEC_IORF "orr")
Index: gcc/testsuite/lib/target-supports.exp
===================================================================
--- gcc/testsuite/lib/target-supports.exp	2017-11-17 09:06:28.516102419 +0000
+++ gcc/testsuite/lib/target-supports.exp	2017-11-17 09:49:36.194354636 +0000
@@ -7162,6 +7162,12 @@  proc check_effective_target_vect_call_ro
     return $et_vect_call_roundf_saved($et_index)
 }
 
+# Return 1 if the target supports AND, OR and XOR reduction.
+
+proc check_effective_target_vect_logical_reduc { } {
+    return [check_effective_target_aarch64_sve]
+}
+
 # Return 1 if the target supports section-anchors
 
 proc check_effective_target_section_anchors { } {
Index: gcc/testsuite/gcc.dg/vect/vect-reduc-or_1.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-reduc-or_1.c	2017-11-09 15:15:28.900668540 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-reduc-or_1.c	2017-11-17 09:49:36.192354637 +0000
@@ -1,4 +1,4 @@ 
-/* { dg-require-effective-target whole_vector_shift } */
+/* { dg-do run { target { whole_vector_shift || vect_logical_reduc } } } */
 
 /* Write a reduction loop to be reduced using vector shifts.  */
 
@@ -24,17 +24,17 @@  main (unsigned char argc, char **argv)
   check_vect ();
 
   for (i = 0; i < N; i++)
-    in[i] = (i + i + 1) & 0xfd;
+    {
+      in[i] = (i + i + 1) & 0xfd;
+      asm volatile ("" ::: "memory");
+    }
 
   for (i = 0; i < N; i++)
     {
       expected |= in[i];
-      asm volatile ("");
+      asm volatile ("" ::: "memory");
     }
 
-  /* Prevent constant propagation of the entire loop below.  */
-  asm volatile ("" : : : "memory");
-
   for (i = 0; i < N; i++)
     sum |= in[i];
 
@@ -47,5 +47,5 @@  main (unsigned char argc, char **argv)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump "Reduce using vector shifts" "vect" } } */
-
+/* { dg-final { scan-tree-dump "Reduce using vector shifts" "vect" { target { ! vect_logical_reduc } } } } */
+/* { dg-final { scan-tree-dump "Reduce using direct vector reduction" "vect" { target vect_logical_reduc } } } */
Index: gcc/testsuite/gcc.dg/vect/vect-reduc-or_2.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-reduc-or_2.c	2017-11-09 15:15:28.900668540 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-reduc-or_2.c	2017-11-17 09:49:36.192354637 +0000
@@ -1,4 +1,4 @@ 
-/* { dg-require-effective-target whole_vector_shift } */
+/* { dg-do run { target { whole_vector_shift || vect_logical_reduc } } } */
 
 /* Write a reduction loop to be reduced using vector shifts and folded.  */
 
@@ -23,12 +23,15 @@  main (unsigned char argc, char **argv)
   check_vect ();
 
   for (i = 0; i < N; i++)
-    in[i] = (i + i + 1) & 0xfd;
+    {
+      in[i] = (i + i + 1) & 0xfd;
+      asm volatile ("" ::: "memory");
+    }
 
   for (i = 0; i < N; i++)
     {
       expected |= in[i];
-      asm volatile ("");
+      asm volatile ("" ::: "memory");
     }
 
   for (i = 0; i < N; i++)
@@ -43,5 +46,5 @@  main (unsigned char argc, char **argv)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump "Reduce using vector shifts" "vect" } } */
-
+/* { dg-final { scan-tree-dump "Reduce using vector shifts" "vect" { target { ! vect_logical_reduc } } } } */
+/* { dg-final { scan-tree-dump "Reduce using direct vector reduction" "vect" { target vect_logical_reduc } } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_reduc_1.c
===================================================================
--- gcc/testsuite/gcc.target/aarch64/sve_reduc_1.c	2017-11-17 09:06:21.395260303 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_reduc_1.c	2017-11-17 09:49:36.192354637 +0000
@@ -65,6 +65,46 @@  #define TEST_MAXMIN(T)				\
 
 TEST_MAXMIN (DEF_REDUC_MAXMIN)
 
+#define DEF_REDUC_BITWISE(TYPE, NAME, BIT_OP)	\
+TYPE __attribute__ ((noinline, noclone))	\
+reduc_##NAME##_##TYPE (TYPE *a, int n)		\
+{						\
+  TYPE r = 13;					\
+  for (int i = 0; i < n; ++i)			\
+    r BIT_OP a[i];				\
+  return r;					\
+}
+
+#define TEST_BITWISE(T)				\
+  T (int8_t, and, &=)				\
+  T (int16_t, and, &=)				\
+  T (int32_t, and, &=)				\
+  T (int64_t, and, &=)				\
+  T (uint8_t, and, &=)				\
+  T (uint16_t, and, &=)				\
+  T (uint32_t, and, &=)				\
+  T (uint64_t, and, &=)				\
+						\
+  T (int8_t, ior, |=)				\
+  T (int16_t, ior, |=)				\
+  T (int32_t, ior, |=)				\
+  T (int64_t, ior, |=)				\
+  T (uint8_t, ior, |=)				\
+  T (uint16_t, ior, |=)				\
+  T (uint32_t, ior, |=)				\
+  T (uint64_t, ior, |=)				\
+						\
+  T (int8_t, xor, ^=)				\
+  T (int16_t, xor, ^=)				\
+  T (int32_t, xor, ^=)				\
+  T (int64_t, xor, ^=)				\
+  T (uint8_t, xor, ^=)				\
+  T (uint16_t, xor, ^=)				\
+  T (uint32_t, xor, ^=)				\
+  T (uint64_t, xor, ^=)
+
+TEST_BITWISE (DEF_REDUC_BITWISE)
+
 /* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.b, z[0-9]+\.b, z[0-9]+\.b\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.h, z[0-9]+\.h, z[0-9]+\.h\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 2 } } */
@@ -102,6 +142,12 @@  TEST_MAXMIN (DEF_REDUC_MAXMIN)
 /* { dg-final { scan-assembler-times {\tfminnm\tz[0-9]+\.s, p[0-7]/m, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tfminnm\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.d, z[0-9]+\.d\n} 1 } } */
 
+/* { dg-final { scan-assembler-times {\tand\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 8 } } */
+
+/* { dg-final { scan-assembler-times {\torr\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 8 } } */
+
+/* { dg-final { scan-assembler-times {\teor\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 8 } } */
+
 /* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.b\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.h\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.s\n} 2 } } */
@@ -133,3 +179,18 @@  TEST_MAXMIN (DEF_REDUC_MAXMIN)
 /* { dg-final { scan-assembler-times {\tfminnmv\th[0-9]+, p[0-7], z[0-9]+\.h\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tfminnmv\ts[0-9]+, p[0-7], z[0-9]+\.s\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tfminnmv\td[0-9]+, p[0-7], z[0-9]+\.d\n} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tandv\tb[0-9]+, p[0-7], z[0-9]+\.b} 2 } } */
+/* { dg-final { scan-assembler-times {\tandv\th[0-9]+, p[0-7], z[0-9]+\.h} 2 } } */
+/* { dg-final { scan-assembler-times {\tandv\ts[0-9]+, p[0-7], z[0-9]+\.s} 2 } } */
+/* { dg-final { scan-assembler-times {\tandv\td[0-9]+, p[0-7], z[0-9]+\.d} 2 } } */
+
+/* { dg-final { scan-assembler-times {\torv\tb[0-9]+, p[0-7], z[0-9]+\.b} 2 } } */
+/* { dg-final { scan-assembler-times {\torv\th[0-9]+, p[0-7], z[0-9]+\.h} 2 } } */
+/* { dg-final { scan-assembler-times {\torv\ts[0-9]+, p[0-7], z[0-9]+\.s} 2 } } */
+/* { dg-final { scan-assembler-times {\torv\td[0-9]+, p[0-7], z[0-9]+\.d} 2 } } */
+
+/* { dg-final { scan-assembler-times {\teorv\tb[0-9]+, p[0-7], z[0-9]+\.b} 2 } } */
+/* { dg-final { scan-assembler-times {\teorv\th[0-9]+, p[0-7], z[0-9]+\.h} 2 } } */
+/* { dg-final { scan-assembler-times {\teorv\ts[0-9]+, p[0-7], z[0-9]+\.s} 2 } } */
+/* { dg-final { scan-assembler-times {\teorv\td[0-9]+, p[0-7], z[0-9]+\.d} 2 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_reduc_2.c
===================================================================
--- gcc/testsuite/gcc.target/aarch64/sve_reduc_2.c	2017-11-17 09:06:21.395260303 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_reduc_2.c	2017-11-17 09:49:36.193354637 +0000
@@ -73,6 +73,49 @@  #define TEST_MAXMIN(T)				\
 
 TEST_MAXMIN (DEF_REDUC_MAXMIN)
 
+#define DEF_REDUC_BITWISE(TYPE, NAME, BIT_OP)			\
+void __attribute__ ((noinline, noclone))			\
+reduc_##NAME##_##TYPE (TYPE (*restrict a)[NUM_ELEMS(TYPE)],	\
+		    TYPE *restrict r, int n)			\
+{								\
+  for (int i = 0; i < n; i++)					\
+    {								\
+      r[i] = a[i][0];						\
+      for (int j = 0; j < NUM_ELEMS(TYPE); j++)			\
+        r[i] BIT_OP a[i][j];					\
+    }								\
+}
+
+#define TEST_BITWISE(T)				\
+  T (int8_t, and, &=)				\
+  T (int16_t, and, &=)				\
+  T (int32_t, and, &=)				\
+  T (int64_t, and, &=)				\
+  T (uint8_t, and, &=)				\
+  T (uint16_t, and, &=)				\
+  T (uint32_t, and, &=)				\
+  T (uint64_t, and, &=)				\
+						\
+  T (int8_t, ior, |=)				\
+  T (int16_t, ior, |=)				\
+  T (int32_t, ior, |=)				\
+  T (int64_t, ior, |=)				\
+  T (uint8_t, ior, |=)				\
+  T (uint16_t, ior, |=)				\
+  T (uint32_t, ior, |=)				\
+  T (uint64_t, ior, |=)				\
+						\
+  T (int8_t, xor, ^=)				\
+  T (int16_t, xor, ^=)				\
+  T (int32_t, xor, ^=)				\
+  T (int64_t, xor, ^=)				\
+  T (uint8_t, xor, ^=)				\
+  T (uint16_t, xor, ^=)				\
+  T (uint32_t, xor, ^=)				\
+  T (uint64_t, xor, ^=)
+
+TEST_BITWISE (DEF_REDUC_BITWISE)
+
 /* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.b\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.h\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.s\n} 2 } } */
@@ -104,3 +147,18 @@  TEST_MAXMIN (DEF_REDUC_MAXMIN)
 /* { dg-final { scan-assembler-times {\tfminnmv\th[0-9]+, p[0-7], z[0-9]+\.h\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tfminnmv\ts[0-9]+, p[0-7], z[0-9]+\.s\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tfminnmv\td[0-9]+, p[0-7], z[0-9]+\.d\n} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tandv\tb[0-9]+, p[0-7], z[0-9]+\.b\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tandv\th[0-9]+, p[0-7], z[0-9]+\.h\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tandv\ts[0-9]+, p[0-7], z[0-9]+\.s\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tandv\td[0-9]+, p[0-7], z[0-9]+\.d\n} 2 } } */
+
+/* { dg-final { scan-assembler-times {\torv\tb[0-9]+, p[0-7], z[0-9]+\.b\n} 2 } } */
+/* { dg-final { scan-assembler-times {\torv\th[0-9]+, p[0-7], z[0-9]+\.h\n} 2 } } */
+/* { dg-final { scan-assembler-times {\torv\ts[0-9]+, p[0-7], z[0-9]+\.s\n} 2 } } */
+/* { dg-final { scan-assembler-times {\torv\td[0-9]+, p[0-7], z[0-9]+\.d\n} 2 } } */
+
+/* { dg-final { scan-assembler-times {\teorv\tb[0-9]+, p[0-7], z[0-9]+\.b\n} 2 } } */
+/* { dg-final { scan-assembler-times {\teorv\th[0-9]+, p[0-7], z[0-9]+\.h\n} 2 } } */
+/* { dg-final { scan-assembler-times {\teorv\ts[0-9]+, p[0-7], z[0-9]+\.s\n} 2 } } */
+/* { dg-final { scan-assembler-times {\teorv\td[0-9]+, p[0-7], z[0-9]+\.d\n} 2 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_reduc_1_run.c
===================================================================
--- gcc/testsuite/gcc.target/aarch64/sve_reduc_1_run.c	2017-11-17 09:06:21.395260303 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_reduc_1_run.c	2017-11-17 09:49:36.193354637 +0000
@@ -9,7 +9,7 @@  #define INIT_VECTOR(TYPE)				\
   TYPE a[NUM_ELEMS (TYPE) + 1];				\
   for (int i = 0; i < NUM_ELEMS (TYPE) + 1; i++)	\
     {							\
-      a[i] = (i * 2) * (i & 1 ? 1 : -1);		\
+      a[i] = ((i * 2) * (i & 1 ? 1 : -1) | 3);		\
       asm volatile ("" ::: "memory");			\
     }
 
@@ -35,10 +35,22 @@  #define TEST_REDUC_MAXMIN(TYPE, NAME, CM
       __builtin_abort ();					\
   }
 
+#define TEST_REDUC_BITWISE(TYPE, NAME, BIT_OP)			\
+  {								\
+    INIT_VECTOR (TYPE);						\
+    TYPE r1 = reduc_##NAME##_##TYPE (a, NUM_ELEMS (TYPE));	\
+    volatile TYPE r2 = 13;					\
+    for (int i = 0; i < NUM_ELEMS (TYPE); ++i)			\
+      r2 BIT_OP a[i];						\
+    if (r1 != r2)						\
+      __builtin_abort ();					\
+  }
+
 int main ()
 {
   TEST_PLUS (TEST_REDUC_PLUS)
   TEST_MAXMIN (TEST_REDUC_MAXMIN)
+  TEST_BITWISE (TEST_REDUC_BITWISE)
 
   return 0;
 }
Index: gcc/testsuite/gcc.target/aarch64/sve_reduc_2_run.c
===================================================================
--- gcc/testsuite/gcc.target/aarch64/sve_reduc_2_run.c	2017-11-17 09:06:21.395260303 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_reduc_2_run.c	2017-11-17 09:49:36.193354637 +0000
@@ -56,6 +56,20 @@  #define TEST_REDUC_MAXMIN(TYPE, NAME, CM
       }							\
     }
 
+#define TEST_REDUC_BITWISE(TYPE, NAME, BIT_OP)		\
+  {							\
+    INIT_MATRIX (TYPE);					\
+    reduc_##NAME##_##TYPE (mat, r, NROWS);		\
+    for (int i = 0; i < NROWS; i++)			\
+      {							\
+	volatile TYPE r2 = mat[i][0];			\
+	for (int j = 0; j < NUM_ELEMS (TYPE); ++j)	\
+	  r2 BIT_OP mat[i][j];				\
+	if (r[i] != r2)					\
+	  __builtin_abort ();				\
+      }							\
+    }
+
 int main ()
 {
   TEST_PLUS (TEST_REDUC_PLUS)