From patchwork Thu Nov  9 13:24:40 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Sandiford <richard.sandiford@linaro.org>
X-Patchwork-Id: 118445
Delivered-To: patch@linaro.org
Received: by 10.140.22.164 with SMTP id 33csp6589920qgn;
 Thu, 9 Nov 2017 05:25:15 -0800 (PST)
X-Google-Smtp-Source: ABhQp+RSynF7MWngR0BiZaPNV+fNDdfEBfxmljORo5sLvRTH1Q5+LL/ZxZGZesmivTe6p1Aw+EEm
X-Received: by 10.98.31.215 with SMTP id l84mr530286pfj.36.1510233915842;
 Thu, 09 Nov 2017 05:25:15 -0800 (PST)
ARC-Seal: i=1; a=rsa-sha256; t=1510233915; cv=none;
 d=google.com; s=arc-20160816;
 b=Fp6dszBsTSBmkvzLg7xxCIas6EtrG5Wjok6QxZOQIL/JYACcBPvUs6kzX8AC7FzGu/
 4yG45eB5Ir75g3Q8jnOx3ao3jMJbeW3qzoYeM0NM+IN2jiOtGvOasjnRo9k4hkm8IDbU
 JQyV+l/31ZAu16nCiI5pR/aULXo3qX0YeB7PhmBenGoFMDT+KWk8/8EWLjAkXfm9Lhd/
 1KFFmN9xLn1PhHIAZ8PoorA9ORmg1v6qxJsTpG/SfXda1hOO8NfSwPxdKTvPnI9in0hm
 Si8y5LaaAUOXYh/UQeXP6rnQFQs/s6XXEHYnbEtYFa5YGusCLtbpO++Qpg9/Y4Fsqk7O
 qGHA==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
 s=arc-20160816; 
 h=mime-version:user-agent:message-id:date:subject:cc:mail-followup-to
 :to:from:delivered-to:sender:list-help:list-post:list-archive
 :list-unsubscribe:list-id:precedence:mailing-list:dkim-signature
 :domainkey-signature:arc-authentication-results;
 bh=H/05W2Zfkz/cZfGNNz4nuGRvQWlUWhWWDXSp0VeBNUs=;
 b=vhxo/nB+GDK4FSSfRfNOZHnwJNz7mkqb6eWDJ3l2GVIbAANggBqmFBkH+2wN/crOs+
 RelqIzPWz1CK5cQ1UUvZj8rQQmuG8SEvJJjEE56R8vddK934GKlptFWXybyuwqjdQWVY
 MZwYgsFYUpRA4+xA+CuSVCNJbK88+4SuQouL38/X2gdGUJVd92SHegNSsC5OEM5PJbpi
 WajtX1W3G+DCMGDQ+Ys1HLn5NsUfUOGyMt62jHvxiAvkpxaGO17S3ASbMzsXN7z8+1ur
 o07dwpl6rNk0Lmo4yAe0U44FJyZpinzf/XUBv18nmNpH+UvZnBCLueNxm0sUxeeaz5ie
 wMIg==
ARC-Authentication-Results: i=1; mx.google.com;
 dkim=pass header.i=@gcc.gnu.org header.s=default header.b=cGCCTY4i;
 spf=pass (google.com: domain of
 gcc-patches-return-466406-patch=linaro.org@gcc.gnu.org
 designates 209.132.180.131 as permitted sender)
 smtp.mailfrom=gcc-patches-return-466406-patch=linaro.org@gcc.gnu.org;
 dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org
Return-Path: <gcc-patches-return-466406-patch=linaro.org@gcc.gnu.org>
Received: from sourceware.org (server1.sourceware.org. [209.132.180.131])
 by mx.google.com with ESMTPS id
 w19si6757299pfa.59.2017.11.09.05.25.15 for <patch@linaro.org>
 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
 Thu, 09 Nov 2017 05:25:15 -0800 (PST)
Received-SPF: pass (google.com: domain of
 gcc-patches-return-466406-patch=linaro.org@gcc.gnu.org
 designates 209.132.180.131 as permitted sender)
 client-ip=209.132.180.131; 
Authentication-Results: mx.google.com;
 dkim=pass header.i=@gcc.gnu.org header.s=default header.b=cGCCTY4i;
 spf=pass (google.com: domain of
 gcc-patches-return-466406-patch=linaro.org@gcc.gnu.org
 designates 209.132.180.131 as permitted sender)
 smtp.mailfrom=gcc-patches-return-466406-patch=linaro.org@gcc.gnu.org;
 dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org
DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id
 :list-unsubscribe:list-archive:list-post:list-help:sender:from
 :to:cc:subject:date:message-id:mime-version:content-type; q=dns;
 s=default; b=VBp228OmyCxPeKFKI4dd43gsfUjnKRMG2oP/kJRE4KcbPfb3kj
 VLsGp67B/ZDXkU/Vr9bF/KhjusqxoBwta2W/7RE/zFSuOCa8+NRwuatkzx/2he4i
 MdEp8HDv2ksbHkubyS1PB2MKDobMDsrJFEBNGYfgBdDlenw8FZnmOi4gQ=
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id
 :list-unsubscribe:list-archive:list-post:list-help:sender:from
 :to:cc:subject:date:message-id:mime-version:content-type; s=
 default; bh=e9iQSmSc1gkS+2e9DLQykvCuVqk=; b=cGCCTY4iwhHZUHvhJlVz
 z05Q1gqVY5sqlvGPU5Wjzk0QNApaRo9o5VtJxG0d00fLp3PYeQnuCBIK5qpW72Xd
 ERB1gCQmvaWRIusqMMFS1z8gDM08dChyNo3h2CAOY5jvnYZ4KGFqdI4DxwJv4pmW
 0nZhMRhjNGs6JxtkEAydUDE=
Received: (qmail 102984 invoked by alias); 9 Nov 2017 13:24:55 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <mailto:gcc-patches-unsubscribe-patch=linaro.org@gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Received: (qmail 102145 invoked by uid 89); 9 Nov 2017 13:24:54 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-15.5 required=5.0 tests=AWL, BAYES_00,
 GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, KAM_ASCII_DIVIDERS,
 RCVD_IN_DNSWL_NONE,
 SPF_PASS autolearn=ham version=3.3.2 spammy=powers
X-HELO: mail-wm0-f65.google.com
Received: from mail-wm0-f65.google.com (HELO mail-wm0-f65.google.com)
 (74.125.82.65) by sourceware.org
 (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP;
 Thu, 09 Nov 2017 13:24:49 +0000
Received: by mail-wm0-f65.google.com with SMTP id b14so1844387wme.2 for
 <gcc-patches@gcc.gnu.org>; Thu, 09 Nov 2017 05:24:48 -0800 (PST)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net;
 s=20161025;
 h=x-gm-message-state:from:to:mail-followup-to:cc:subject:date
 :message-id:user-agent:mime-version;
 bh=H/05W2Zfkz/cZfGNNz4nuGRvQWlUWhWWDXSp0VeBNUs=;
 b=njS7xGsKxK6/cPWCp6pzQ/4KvZeufJecNisVNBgID+sNqXD9pV5SfbVp6MpbxIH2yr
 HqLUV92K9AXNnK08of+oD+LONxzTKm0iC72nrBXi3H9YJIqsaxFADs2y+20Yurkzdikc
 tY9cddBcQLb1jlJkG1CEyVQWtNcTKqEp+dq6FK4Ww3YDcgoHvkz7jhJxMIPDDuy5GW2f
 4hytf/xN1imf377iFXRl+jRPWK86wxfvnp6uxd7OckrFEfxJmSiGaZ38zKQgj15pORZ2
 XqHG8rRcfvvKLYfZvRbQzX19UoGAqRTYtmoV1L3CXyenOhkN78K4YURe0xh7GRQoMota
 lUdw==
X-Gm-Message-State: AJaThX4uCFBqQTrpa1F6hFfTloX05WXP2A/xQSWSG1IpQu9Q8M6U8/bQ	Dbp4F2cuIK5B9Z7UYJOQzOfZ+A==
X-Received: by 10.28.55.197 with SMTP id e188mr410530wma.60.1510233886039;
 Thu, 09 Nov 2017 05:24:46 -0800 (PST)
Received: from localhost (94.197.120.38.threembb.co.uk. [94.197.120.38]) by
 smtp.gmail.com with ESMTPSA id
 r14sm13954503wrb.43.2017.11.09.05.24.44 (version=TLS1_2
 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256);
 Thu, 09 Nov 2017 05:24:45 -0800 (PST)
From: Richard Sandiford <richard.sandiford@linaro.org>
To: gcc-patches@gcc.gnu.org
Mail-Followup-To: gcc-patches@gcc.gnu.org, richard.earnshaw@arm.com,
 james.greenhalgh@arm.com, marcus.shawcroft@arm.com,
 richard.sandiford@linaro.org
Cc: richard.earnshaw@arm.com, james.greenhalgh@arm.com,
 marcus.shawcroft@arm.com
Subject: Add optabs for common types of permutation
Date: Thu, 09 Nov 2017 13:24:40 +0000
Message-ID: <87shdnwpnr.fsf@linaro.org>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.2 (gnu/linux)
MIME-Version: 1.0

...so that we can use them for variable-length vectors.  For now
constant-length vectors continue to use VEC_PERM_EXPR and the
vec_perm_const optab even for cases that the new optabs could
handle.

The vector optabs are inconsistent about whether there should be
an underscore before the mode part of the name, but the other lo/hi
optabs have one.

Doing this means that we're able to optimise some SLP tests using
non-SLP (for now) on targets with variable-length vectors, so the
patch needs to add a few XFAILs.  Most of these go away with later
patches.

Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu
and powerpc64le-linus-gnu.  OK to install?

Richard


2017-11-09  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* doc/md.texi (vec_reverse, vec_interleave_lo, vec_interleave_hi)
	(vec_extract_even, vec_extract_odd): Document new optabs.
	* internal-fn.def (VEC_INTERLEAVE_LO, VEC_INTERLEAVE_HI)
	(VEC_EXTRACT_EVEN, VEC_EXTRACT_ODD, VEC_REVERSE): New internal
	functions.
	* optabs.def (vec_interleave_lo_optab, vec_interleave_hi_optab)
	(vec_extract_even_optab, vec_extract_odd_optab, vec_reverse_optab):
	New optabs.
	* tree-vect-data-refs.c: Include internal-fn.h.
	(vect_grouped_store_supported): Try using IFN_VEC_INTERLEAVE_{LO,HI}.
	(vect_permute_store_chain): Use them here too.
	(vect_grouped_load_supported): Try using IFN_VEC_EXTRACT_{EVEN,ODD}.
	(vect_permute_load_chain): Use them here too.
	* tree-vect-stmts.c (can_reverse_vector_p): New function.
	(get_negative_load_store_type): Use it.
	(reverse_vector): New function.
	(vectorizable_store, vectorizable_load): Use it.
	* config/aarch64/iterators.md (perm_optab): New iterator.
	* config/aarch64/aarch64-sve.md (<perm_optab>_<mode>): New expander.
	(vec_reverse_<mode>): Likewise.

gcc/testsuite/
	* gcc.dg/vect/no-vfa-vect-depend-2.c: Remove XFAIL.
	* gcc.dg/vect/no-vfa-vect-depend-3.c: Likewise.
	* gcc.dg/vect/pr33953.c: XFAIL for vect_variable_length.
	* gcc.dg/vect/pr68445.c: Likewise.
	* gcc.dg/vect/slp-12a.c: Likewise.
	* gcc.dg/vect/slp-13-big-array.c: Likewise.
	* gcc.dg/vect/slp-13.c: Likewise.
	* gcc.dg/vect/slp-14.c: Likewise.
	* gcc.dg/vect/slp-15.c: Likewise.
	* gcc.dg/vect/slp-42.c: Likewise.
	* gcc.dg/vect/slp-multitypes-2.c: Likewise.
	* gcc.dg/vect/slp-multitypes-4.c: Likewise.
	* gcc.dg/vect/slp-multitypes-5.c: Likewise.
	* gcc.dg/vect/slp-reduc-4.c: Likewise.
	* gcc.dg/vect/slp-reduc-7.c: Likewise.
	* gcc.target/aarch64/sve_vec_perm_2.c: New test.
	* gcc.target/aarch64/sve_vec_perm_2_run.c: Likewise.
	* gcc.target/aarch64/sve_vec_perm_3.c: New test.
	* gcc.target/aarch64/sve_vec_perm_3_run.c: Likewise.
	* gcc.target/aarch64/sve_vec_perm_4.c: New test.
	* gcc.target/aarch64/sve_vec_perm_4_run.c: Likewise.

Index: gcc/doc/md.texi
===================================================================
--- gcc/doc/md.texi	2017-11-09 13:21:01.989917982 +0000
+++ gcc/doc/md.texi	2017-11-09 13:21:02.323463345 +0000
@@ -5017,6 +5017,46 @@ There is no need for a target to supply
 and @samp{vec_perm_const@var{m}} if the former can trivially implement
 the operation with, say, the vector constant loaded into a register.
 
+@cindex @code{vec_reverse_@var{m}} instruction pattern
+@item @samp{vec_reverse_@var{m}}
+Reverse the order of the elements in vector input operand 1 and store
+the result in vector output operand 0.  Both operands have mode @var{m}.
+
+This pattern is provided mainly for targets with variable-length vectors.
+Targets with fixed-length vectors can instead handle any reverse-specific
+optimizations in @samp{vec_perm_const@var{m}}.
+
+@cindex @code{vec_interleave_lo_@var{m}} instruction pattern
+@item @samp{vec_interleave_lo_@var{m}}
+Take the lowest-indexed halves of vector input operands 1 and 2 and
+interleave the elements, so that element @var{x} of operand 1 is followed by
+element @var{x} of operand 2.  Store the result in vector output operand 0.
+All three operands have mode @var{m}.
+
+This pattern is provided mainly for targets with variable-length
+vectors.  Targets with fixed-length vectors can instead handle any
+interleave-specific optimizations in @samp{vec_perm_const@var{m}}.
+
+@cindex @code{vec_interleave_hi_@var{m}} instruction pattern
+@item @samp{vec_interleave_hi_@var{m}}
+Like @samp{vec_interleave_lo_@var{m}}, but operate on the highest-indexed
+halves instead of the lowest-indexed halves.
+
+@cindex @code{vec_extract_even_@var{m}} instruction pattern
+@item @samp{vec_extract_even_@var{m}}
+Concatenate vector input operands 1 and 2, extract the elements with
+even-numbered indices, and store the result in vector output operand 0.
+All three operands have mode @var{m}.
+
+This pattern is provided mainly for targets with variable-length vectors.
+Targets with fixed-length vectors can instead handle any
+extract-specific optimizations in @samp{vec_perm_const@var{m}}.
+
+@cindex @code{vec_extract_odd_@var{m}} instruction pattern
+@item @samp{vec_extract_odd_@var{m}}
+Like @samp{vec_extract_even_@var{m}}, but extract the elements with
+odd-numbered indices.
+
 @cindex @code{push@var{m}1} instruction pattern
 @item @samp{push@var{m}1}
 Output a push instruction.  Operand 0 is value to push.  Used only when
Index: gcc/internal-fn.def
===================================================================
--- gcc/internal-fn.def	2017-11-09 13:21:01.989917982 +0000
+++ gcc/internal-fn.def	2017-11-09 13:21:02.323463345 +0000
@@ -102,6 +102,17 @@ DEF_INTERNAL_OPTAB_FN (STORE_LANES, ECF_
 DEF_INTERNAL_OPTAB_FN (MASK_STORE_LANES, 0,
 		       vec_mask_store_lanes, mask_store_lanes)
 
+DEF_INTERNAL_OPTAB_FN (VEC_INTERLEAVE_LO, ECF_CONST | ECF_NOTHROW,
+		       vec_interleave_lo, binary)
+DEF_INTERNAL_OPTAB_FN (VEC_INTERLEAVE_HI, ECF_CONST | ECF_NOTHROW,
+		       vec_interleave_hi, binary)
+DEF_INTERNAL_OPTAB_FN (VEC_EXTRACT_EVEN, ECF_CONST | ECF_NOTHROW,
+		       vec_extract_even, binary)
+DEF_INTERNAL_OPTAB_FN (VEC_EXTRACT_ODD, ECF_CONST | ECF_NOTHROW,
+		       vec_extract_odd, binary)
+DEF_INTERNAL_OPTAB_FN (VEC_REVERSE, ECF_CONST | ECF_NOTHROW,
+		       vec_reverse, unary)
+
 DEF_INTERNAL_OPTAB_FN (RSQRT, ECF_CONST, rsqrt, unary)
 
 /* Unary math functions.  */
Index: gcc/optabs.def
===================================================================
--- gcc/optabs.def	2017-11-09 13:21:01.989917982 +0000
+++ gcc/optabs.def	2017-11-09 13:21:02.323463345 +0000
@@ -309,6 +309,11 @@ OPTAB_D (vec_perm_optab, "vec_perm$a")
 OPTAB_D (vec_realign_load_optab, "vec_realign_load_$a")
 OPTAB_D (vec_set_optab, "vec_set$a")
 OPTAB_D (vec_shr_optab, "vec_shr_$a")
+OPTAB_D (vec_interleave_lo_optab, "vec_interleave_lo_$a")
+OPTAB_D (vec_interleave_hi_optab, "vec_interleave_hi_$a")
+OPTAB_D (vec_extract_even_optab, "vec_extract_even_$a")
+OPTAB_D (vec_extract_odd_optab, "vec_extract_odd_$a")
+OPTAB_D (vec_reverse_optab, "vec_reverse_$a")
 OPTAB_D (vec_unpacks_float_hi_optab, "vec_unpacks_float_hi_$a")
 OPTAB_D (vec_unpacks_float_lo_optab, "vec_unpacks_float_lo_$a")
 OPTAB_D (vec_unpacks_hi_optab, "vec_unpacks_hi_$a")
Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c	2017-11-09 13:21:01.989917982 +0000
+++ gcc/tree-vect-data-refs.c	2017-11-09 13:21:02.326167766 +0000
@@ -52,6 +52,7 @@ Software Foundation; either version 3, o
 #include "params.h"
 #include "tree-cfg.h"
 #include "tree-hash-traits.h"
+#include "internal-fn.h"
 
 /* Return true if load- or store-lanes optab OPTAB is implemented for
    COUNT vectors of type VECTYPE.  NAME is the name of OPTAB.  */
@@ -4636,7 +4637,16 @@ vect_grouped_store_supported (tree vecty
       return false;
     }
 
-  /* Check that the permutation is supported.  */
+  /* Powers of 2 use a tree of interleaving operations.  See whether the
+     target supports them directly.  */
+  if (count != 3
+      && direct_internal_fn_supported_p (IFN_VEC_INTERLEAVE_LO, vectype,
+					 OPTIMIZE_FOR_SPEED)
+      && direct_internal_fn_supported_p (IFN_VEC_INTERLEAVE_HI, vectype,
+					 OPTIMIZE_FOR_SPEED))
+    return true;
+
+  /* Otherwise check for support in the form of general permutations.  */
   unsigned int nelt;
   if (VECTOR_MODE_P (mode) && GET_MODE_NUNITS (mode).is_constant (&nelt))
     {
@@ -4881,50 +4891,78 @@ vect_permute_store_chain (vec<tree> dr_c
       /* If length is not equal to 3 then only power of 2 is supported.  */
       gcc_assert (pow2p_hwi (length));
 
-      /* vect_grouped_store_supported ensures that this is constant.  */
-      unsigned int nelt = TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
-      auto_vec_perm_indices sel (nelt);
-      sel.quick_grow (nelt);
-      for (i = 0, n = nelt / 2; i < n; i++)
+      if (direct_internal_fn_supported_p (IFN_VEC_INTERLEAVE_LO, vectype,
+					  OPTIMIZE_FOR_SPEED)
+	  && direct_internal_fn_supported_p (IFN_VEC_INTERLEAVE_HI, vectype,
+					     OPTIMIZE_FOR_SPEED))
+	{
+	  /* We could support the case where only one of the optabs is
+	     implemented, but that seems unlikely.  */
+	  perm_mask_low = NULL_TREE;
+	  perm_mask_high = NULL_TREE;
+	}
+      else
 	{
-	  sel[i * 2] = i;
-	  sel[i * 2 + 1] = i + nelt;
+	  /* vect_grouped_store_supported ensures that this is constant.  */
+	  unsigned int nelt = TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
+	  auto_vec_perm_indices sel (nelt);
+	  sel.quick_grow (nelt);
+	  for (i = 0, n = nelt / 2; i < n; i++)
+	    {
+	      sel[i * 2] = i;
+	      sel[i * 2 + 1] = i + nelt;
+	    }
+	  perm_mask_low = vect_gen_perm_mask_checked (vectype, sel);
+
+	  for (i = 0; i < nelt; i++)
+	    sel[i] += nelt / 2;
+	  perm_mask_high = vect_gen_perm_mask_checked (vectype, sel);
 	}
-	perm_mask_high = vect_gen_perm_mask_checked (vectype, sel);
 
-	for (i = 0; i < nelt; i++)
-	  sel[i] += nelt / 2;
-	perm_mask_low = vect_gen_perm_mask_checked (vectype, sel);
+      for (i = 0, n = log_length; i < n; i++)
+	{
+	  for (j = 0; j < length / 2; j++)
+	    {
+	      vect1 = dr_chain[j];
+	      vect2 = dr_chain[j + length / 2];
 
-	for (i = 0, n = log_length; i < n; i++)
-	  {
-	    for (j = 0; j < length/2; j++)
-	      {
-		vect1 = dr_chain[j];
-		vect2 = dr_chain[j+length/2];
+	      /* Create interleaving stmt:
+		 high = VEC_PERM_EXPR <vect1, vect2,
+				       {0, nelt, 1, nelt + 1, ...}>  */
+	      low = make_temp_ssa_name (vectype, NULL, "vect_inter_low");
+	      if (perm_mask_low)
+		perm_stmt = gimple_build_assign (low, VEC_PERM_EXPR, vect1,
+						 vect2, perm_mask_low);
+	      else
+		{
+		  perm_stmt = gimple_build_call_internal
+		    (IFN_VEC_INTERLEAVE_LO, 2, vect1, vect2);
+		  gimple_set_lhs (perm_stmt, low);
+		}
+	      vect_finish_stmt_generation (stmt, perm_stmt, gsi);
+	      (*result_chain)[2 * j] = low;
 
-		/* Create interleaving stmt:
-		   high = VEC_PERM_EXPR <vect1, vect2, {0, nelt, 1, nelt+1,
-							...}>  */
-		high = make_temp_ssa_name (vectype, NULL, "vect_inter_high");
+	      /* Create interleaving stmt:
+		 high = VEC_PERM_EXPR <vect1, vect2,
+				      {nelt / 2, nelt * 3 / 2,
+				       nelt / 2 + 1, nelt * 3 / 2 + 1,
+				       ...}>  */
+	      high = make_temp_ssa_name (vectype, NULL, "vect_inter_high");
+	      if (perm_mask_high)
 		perm_stmt = gimple_build_assign (high, VEC_PERM_EXPR, vect1,
 						 vect2, perm_mask_high);
-		vect_finish_stmt_generation (stmt, perm_stmt, gsi);
-		(*result_chain)[2*j] = high;
-
-		/* Create interleaving stmt:
-		   low = VEC_PERM_EXPR <vect1, vect2,
-					{nelt/2, nelt*3/2, nelt/2+1, nelt*3/2+1,
-					 ...}>  */
-		low = make_temp_ssa_name (vectype, NULL, "vect_inter_low");
-		perm_stmt = gimple_build_assign (low, VEC_PERM_EXPR, vect1,
-						 vect2, perm_mask_low);
-		vect_finish_stmt_generation (stmt, perm_stmt, gsi);
-		(*result_chain)[2*j+1] = low;
-	      }
-	    memcpy (dr_chain.address (), result_chain->address (),
-		    length * sizeof (tree));
-	  }
+	      else
+		{
+		  perm_stmt = gimple_build_call_internal
+		    (IFN_VEC_INTERLEAVE_HI, 2, vect1, vect2);
+		  gimple_set_lhs (perm_stmt, high);
+		}
+	      vect_finish_stmt_generation (stmt, perm_stmt, gsi);
+	      (*result_chain)[2 * j + 1] = high;
+	    }
+	  memcpy (dr_chain.address (), result_chain->address (),
+		  length * sizeof (tree));
+	}
     }
 }
 
@@ -5235,7 +5273,16 @@ vect_grouped_load_supported (tree vectyp
       return false;
     }
 
-  /* Check that the permutation is supported.  */
+  /* Powers of 2 use a tree of extract operations.  See whether the
+     target supports them directly.  */
+  if (count != 3
+      && direct_internal_fn_supported_p (IFN_VEC_EXTRACT_EVEN, vectype,
+					 OPTIMIZE_FOR_SPEED)
+      && direct_internal_fn_supported_p (IFN_VEC_EXTRACT_ODD, vectype,
+					 OPTIMIZE_FOR_SPEED))
+    return true;
+
+  /* Otherwise check for support in the form of general permutations.  */
   unsigned int nelt;
   if (VECTOR_MODE_P (mode) && GET_MODE_NUNITS (mode).is_constant (&nelt))
     {
@@ -5464,17 +5511,30 @@ vect_permute_load_chain (vec<tree> dr_ch
       /* If length is not equal to 3 then only power of 2 is supported.  */
       gcc_assert (pow2p_hwi (length));
 
-      /* vect_grouped_load_supported ensures that this is constant.  */
-      unsigned nelt = TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
-      auto_vec_perm_indices sel (nelt);
-      sel.quick_grow (nelt);
-      for (i = 0; i < nelt; ++i)
-	sel[i] = i * 2;
-      perm_mask_even = vect_gen_perm_mask_checked (vectype, sel);
-
-      for (i = 0; i < nelt; ++i)
-	sel[i] = i * 2 + 1;
-      perm_mask_odd = vect_gen_perm_mask_checked (vectype, sel);
+      if (direct_internal_fn_supported_p (IFN_VEC_EXTRACT_EVEN, vectype,
+					  OPTIMIZE_FOR_SPEED)
+	  && direct_internal_fn_supported_p (IFN_VEC_EXTRACT_ODD, vectype,
+					     OPTIMIZE_FOR_SPEED))
+	{
+	  /* We could support the case where only one of the optabs is
+	     implemented, but that seems unlikely.  */
+	  perm_mask_even = NULL_TREE;
+	  perm_mask_odd = NULL_TREE;
+	}
+      else
+	{
+	  /* vect_grouped_load_supported ensures that this is constant.  */
+	  unsigned nelt = TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
+	  auto_vec_perm_indices sel (nelt);
+	  sel.quick_grow (nelt);
+	  for (i = 0; i < nelt; ++i)
+	    sel[i] = i * 2;
+	  perm_mask_even = vect_gen_perm_mask_checked (vectype, sel);
+
+	  for (i = 0; i < nelt; ++i)
+	    sel[i] = i * 2 + 1;
+	  perm_mask_odd = vect_gen_perm_mask_checked (vectype, sel);
+	}
 
       for (i = 0; i < log_length; i++)
 	{
@@ -5485,19 +5545,33 @@ vect_permute_load_chain (vec<tree> dr_ch
 
 	      /* data_ref = permute_even (first_data_ref, second_data_ref);  */
 	      data_ref = make_temp_ssa_name (vectype, NULL, "vect_perm_even");
-	      perm_stmt = gimple_build_assign (data_ref, VEC_PERM_EXPR,
-					       first_vect, second_vect,
-					       perm_mask_even);
+	      if (perm_mask_even)
+		perm_stmt = gimple_build_assign (data_ref, VEC_PERM_EXPR,
+						 first_vect, second_vect,
+						 perm_mask_even);
+	      else
+		{
+		  perm_stmt = gimple_build_call_internal
+		    (IFN_VEC_EXTRACT_EVEN, 2, first_vect, second_vect);
+		  gimple_set_lhs (perm_stmt, data_ref);
+		}
 	      vect_finish_stmt_generation (stmt, perm_stmt, gsi);
-	      (*result_chain)[j/2] = data_ref;
+	      (*result_chain)[j / 2] = data_ref;
 
 	      /* data_ref = permute_odd (first_data_ref, second_data_ref);  */
 	      data_ref = make_temp_ssa_name (vectype, NULL, "vect_perm_odd");
-	      perm_stmt = gimple_build_assign (data_ref, VEC_PERM_EXPR,
-					       first_vect, second_vect,
-					       perm_mask_odd);
+	      if (perm_mask_odd)
+		perm_stmt = gimple_build_assign (data_ref, VEC_PERM_EXPR,
+						 first_vect, second_vect,
+						 perm_mask_odd);
+	      else
+		{
+		  perm_stmt = gimple_build_call_internal
+		    (IFN_VEC_EXTRACT_ODD, 2, first_vect, second_vect);
+		  gimple_set_lhs (perm_stmt, data_ref);
+		}
 	      vect_finish_stmt_generation (stmt, perm_stmt, gsi);
-	      (*result_chain)[j/2+length/2] = data_ref;
+	      (*result_chain)[j / 2 + length / 2] = data_ref;
 	    }
 	  memcpy (dr_chain.address (), result_chain->address (),
 		  length * sizeof (tree));
Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	2017-11-09 13:21:01.989917982 +0000
+++ gcc/tree-vect-stmts.c	2017-11-09 13:21:02.327069240 +0000
@@ -1796,6 +1796,46 @@ perm_mask_for_reverse (tree vectype)
   return vect_gen_perm_mask_checked (vectype, sel);
 }
 
+/* Return true if the target can reverse the elements in a vector of
+   type VECTOR_TYPE.  */
+
+static bool
+can_reverse_vector_p (tree vector_type)
+{
+  return (direct_internal_fn_supported_p (IFN_VEC_REVERSE, vector_type,
+					  OPTIMIZE_FOR_SPEED)
+	  || perm_mask_for_reverse (vector_type));
+}
+
+/* Generate a statement to reverse the elements in vector INPUT and
+   return the SSA name that holds the result.  GSI is a statement iterator
+   pointing to STMT, which is the scalar statement we're vectorizing.
+   VEC_DEST is the destination variable with which new SSA names
+   should be associated.  */
+
+static tree
+reverse_vector (tree vec_dest, tree input, gimple *stmt,
+		gimple_stmt_iterator *gsi)
+{
+  tree new_temp = make_ssa_name (vec_dest);
+  tree vector_type = TREE_TYPE (input);
+  gimple *perm_stmt;
+  if (direct_internal_fn_supported_p (IFN_VEC_REVERSE, vector_type,
+				      OPTIMIZE_FOR_SPEED))
+    {
+      perm_stmt = gimple_build_call_internal (IFN_VEC_REVERSE, 1, input);
+      gimple_set_lhs (perm_stmt, new_temp);
+    }
+  else
+    {
+      tree perm_mask = perm_mask_for_reverse (vector_type);
+      perm_stmt = gimple_build_assign (new_temp, VEC_PERM_EXPR,
+				       input, input, perm_mask);
+    }
+  vect_finish_stmt_generation (stmt, perm_stmt, gsi);
+  return new_temp;
+}
+
 /* A subroutine of get_load_store_type, with a subset of the same
    arguments.  Handle the case where STMT is part of a grouped load
    or store.
@@ -1999,7 +2039,7 @@ get_negative_load_store_type (gimple *st
       return VMAT_CONTIGUOUS_DOWN;
     }
 
-  if (!perm_mask_for_reverse (vectype))
+  if (!can_reverse_vector_p (vectype))
     {
       if (dump_enabled_p ())
 	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -6760,20 +6800,10 @@ vectorizable_store (gimple *stmt, gimple
 
 	      if (memory_access_type == VMAT_CONTIGUOUS_REVERSE)
 		{
-		  tree perm_mask = perm_mask_for_reverse (vectype);
 		  tree perm_dest 
 		    = vect_create_destination_var (gimple_assign_rhs1 (stmt),
 						   vectype);
-		  tree new_temp = make_ssa_name (perm_dest);
-
-		  /* Generate the permute statement.  */
-		  gimple *perm_stmt 
-		    = gimple_build_assign (new_temp, VEC_PERM_EXPR, vec_oprnd,
-					   vec_oprnd, perm_mask);
-		  vect_finish_stmt_generation (stmt, perm_stmt, gsi);
-
-		  perm_stmt = SSA_NAME_DEF_STMT (new_temp);
-		  vec_oprnd = new_temp;
+		  vec_oprnd = reverse_vector (perm_dest, vec_oprnd, stmt, gsi);
 		}
 
 	      /* Arguments are ready.  Create the new vector stmt.  */
@@ -7998,9 +8028,7 @@ vectorizable_load (gimple *stmt, gimple_
 
 	      if (memory_access_type == VMAT_CONTIGUOUS_REVERSE)
 		{
-		  tree perm_mask = perm_mask_for_reverse (vectype);
-		  new_temp = permute_vec_elements (new_temp, new_temp,
-						   perm_mask, stmt, gsi);
+		  new_temp = reverse_vector (vec_dest, new_temp, stmt, gsi);
 		  new_stmt = SSA_NAME_DEF_STMT (new_temp);
 		}
 
Index: gcc/config/aarch64/iterators.md
===================================================================
--- gcc/config/aarch64/iterators.md	2017-11-09 13:21:01.989917982 +0000
+++ gcc/config/aarch64/iterators.md	2017-11-09 13:21:02.322561871 +0000
@@ -1556,6 +1556,11 @@ (define_int_attr pauth_hint_num_a [(UNSP
 				    (UNSPEC_PACI1716 "8")
 				    (UNSPEC_AUTI1716 "12")])
 
+(define_int_attr perm_optab [(UNSPEC_ZIP1 "vec_interleave_lo")
+			     (UNSPEC_ZIP2 "vec_interleave_hi")
+			     (UNSPEC_UZP1 "vec_extract_even")
+			     (UNSPEC_UZP2 "vec_extract_odd")])
+
 (define_int_attr perm_insn [(UNSPEC_ZIP1 "zip") (UNSPEC_ZIP2 "zip")
 			    (UNSPEC_TRN1 "trn") (UNSPEC_TRN2 "trn")
 			    (UNSPEC_UZP1 "uzp") (UNSPEC_UZP2 "uzp")])
Index: gcc/config/aarch64/aarch64-sve.md
===================================================================
--- gcc/config/aarch64/aarch64-sve.md	2017-11-09 13:21:01.989917982 +0000
+++ gcc/config/aarch64/aarch64-sve.md	2017-11-09 13:21:02.320758923 +0000
@@ -630,6 +630,19 @@ (define_expand "vec_perm<mode>"
   }
 )
 
+(define_expand "<perm_optab>_<mode>"
+  [(set (match_operand:SVE_ALL 0 "register_operand")
+	(unspec:SVE_ALL [(match_operand:SVE_ALL 1 "register_operand")
+			 (match_operand:SVE_ALL 2 "register_operand")]
+			OPTAB_PERMUTE))]
+  "TARGET_SVE && !GET_MODE_NUNITS (<MODE>mode).is_constant ()")
+
+(define_expand "vec_reverse_<mode>"
+  [(set (match_operand:SVE_ALL 0 "register_operand")
+	(unspec:SVE_ALL [(match_operand:SVE_ALL 1 "register_operand")]
+			UNSPEC_REV))]
+  "TARGET_SVE && !GET_MODE_NUNITS (<MODE>mode).is_constant ()")
+
 (define_insn "*aarch64_sve_tbl<mode>"
   [(set (match_operand:SVE_ALL 0 "register_operand" "=w")
 	(unspec:SVE_ALL
Index: gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-2.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-2.c	2017-11-09 13:21:01.989917982 +0000
+++ gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-2.c	2017-11-09 13:21:02.323463345 +0000
@@ -51,7 +51,4 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" {xfail { vect_no_align && { ! vect_hw_misalign } } } } } */
-/* Requires reverse for variable-length SVE, which is implemented for
-   by a later patch.  Until then we report it twice, once for SVE and
-   once for 128-bit Advanced SIMD.  */
-/* { dg-final { scan-tree-dump-times "dependence distance negative" 1 "vect" { xfail { aarch64_sve && vect_variable_length } } } } */
+/* { dg-final { scan-tree-dump-times "dependence distance negative" 1 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-3.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-3.c	2017-11-09 13:21:01.989917982 +0000
+++ gcc/testsuite/gcc.dg/vect/no-vfa-vect-depend-3.c	2017-11-09 13:21:02.323463345 +0000
@@ -183,7 +183,4 @@ int main ()
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 4 "vect" {xfail { vect_no_align && { ! vect_hw_misalign } } } } } */
-/* f4 requires reverse for SVE, which is implemented by a later patch.
-   Until then we report it twice, once for SVE and once for 128-bit
-   Advanced SIMD.  */
-/* { dg-final { scan-tree-dump-times "dependence distance negative" 4 "vect" { xfail { aarch64_sve && vect_variable_length } } } } */
+/* { dg-final { scan-tree-dump-times "dependence distance negative" 4 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/pr33953.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/pr33953.c	2017-11-09 13:21:01.989917982 +0000
+++ gcc/testsuite/gcc.dg/vect/pr33953.c	2017-11-09 13:21:02.323463345 +0000
@@ -29,6 +29,6 @@ void blockmove_NtoN_blend_noremap32 (con
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail { vect_no_align && { ! vect_hw_misalign } } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { xfail { vect_no_align && { ! vect_hw_misalign } } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { xfail { { vect_no_align && { ! vect_hw_misalign } } || vect_variable_length } } } } */
 
 
Index: gcc/testsuite/gcc.dg/vect/pr68445.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/pr68445.c	2017-11-09 13:21:01.989917982 +0000
+++ gcc/testsuite/gcc.dg/vect/pr68445.c	2017-11-09 13:21:02.323463345 +0000
@@ -16,4 +16,4 @@ void IMB_double_fast_x (int *destf, int
     }
 }
 
-/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" } } */
+/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" { xfail vect_variable_length } } } */
Index: gcc/testsuite/gcc.dg/vect/slp-12a.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-12a.c	2017-11-09 13:21:01.989917982 +0000
+++ gcc/testsuite/gcc.dg/vect/slp-12a.c	2017-11-09 13:21:02.323463345 +0000
@@ -75,5 +75,5 @@ int main (void)
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { vect_strided8 && vect_int_mult } } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! { vect_strided8 && vect_int_mult } } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { vect_strided8 && vect_int_mult } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { vect_strided8 && vect_int_mult } xfail vect_variable_length } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! { vect_strided8 && vect_int_mult } } } } } */
Index: gcc/testsuite/gcc.dg/vect/slp-13-big-array.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-13-big-array.c	2017-11-09 13:21:01.989917982 +0000
+++ gcc/testsuite/gcc.dg/vect/slp-13-big-array.c	2017-11-09 13:21:02.324364818 +0000
@@ -134,4 +134,4 @@ int main (void)
 /* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target { { vect_interleave && vect_extract_even_odd } && { ! vect_pack_trunc } } } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target { ! vect_pack_trunc } } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { target { { vect_interleave && vect_extract_even_odd } && vect_pack_trunc } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { target vect_pack_trunc } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { target vect_pack_trunc xfail vect_variable_length } } } */
Index: gcc/testsuite/gcc.dg/vect/slp-13.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-13.c	2017-11-09 13:21:01.989917982 +0000
+++ gcc/testsuite/gcc.dg/vect/slp-13.c	2017-11-09 13:21:02.324364818 +0000
@@ -128,4 +128,4 @@ int main (void)
 /* { dg-final { scan-tree-dump-times "vectorized 2 loops" 1 "vect" { target { { vect_interleave && vect_extract_even_odd } && { ! vect_pack_trunc } } } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target { ! vect_pack_trunc } } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 3 loops" 1 "vect" { target { { vect_interleave && vect_extract_even_odd } && vect_pack_trunc } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { target vect_pack_trunc } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { target vect_pack_trunc xfail vect_variable_length } } } */
Index: gcc/testsuite/gcc.dg/vect/slp-14.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-14.c	2017-11-09 13:21:01.989917982 +0000
+++ gcc/testsuite/gcc.dg/vect/slp-14.c	2017-11-09 13:21:02.324364818 +0000
@@ -111,5 +111,5 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target vect_int_mult } } }  */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target vect_int_mult } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target vect_int_mult xfail vect_variable_length } } } */
   
Index: gcc/testsuite/gcc.dg/vect/slp-15.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-15.c	2017-11-09 13:21:01.989917982 +0000
+++ gcc/testsuite/gcc.dg/vect/slp-15.c	2017-11-09 13:21:02.324364818 +0000
@@ -112,6 +112,6 @@ int main (void)
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  {target vect_int_mult } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect"  {target  { ! { vect_int_mult } } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" {target vect_int_mult } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target vect_int_mult xfail vect_variable_length } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" {target { ! { vect_int_mult } } } } } */
   
Index: gcc/testsuite/gcc.dg/vect/slp-42.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-42.c	2017-11-09 13:21:01.989917982 +0000
+++ gcc/testsuite/gcc.dg/vect/slp-42.c	2017-11-09 13:21:02.324364818 +0000
@@ -15,5 +15,5 @@ void foo (int n)
     }
 }
 
-/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" } } */
+/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" { xfail vect_variable_length } } } */
 /* { dg-final { scan-tree-dump "vectorized 1 loops" "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/slp-multitypes-2.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-multitypes-2.c	2017-11-09 13:21:01.989917982 +0000
+++ gcc/testsuite/gcc.dg/vect/slp-multitypes-2.c	2017-11-09 13:21:02.324364818 +0000
@@ -77,5 +77,5 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect"  } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { xfail vect_variable_length } } } */
   
Index: gcc/testsuite/gcc.dg/vect/slp-multitypes-4.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-multitypes-4.c	2017-11-09 13:21:01.989917982 +0000
+++ gcc/testsuite/gcc.dg/vect/slp-multitypes-4.c	2017-11-09 13:21:02.324364818 +0000
@@ -52,5 +52,5 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { target vect_unpack } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect"  { target vect_unpack } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_unpack xfail vect_variable_length } } } */
   
Index: gcc/testsuite/gcc.dg/vect/slp-multitypes-5.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-multitypes-5.c	2017-11-09 13:21:01.989917982 +0000
+++ gcc/testsuite/gcc.dg/vect/slp-multitypes-5.c	2017-11-09 13:21:02.324364818 +0000
@@ -52,5 +52,5 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_pack_trunc } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_pack_trunc } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target vect_pack_trunc xfail vect_variable_length } } } */
   
Index: gcc/testsuite/gcc.dg/vect/slp-reduc-4.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-reduc-4.c	2017-11-09 13:21:01.989917982 +0000
+++ gcc/testsuite/gcc.dg/vect/slp-reduc-4.c	2017-11-09 13:21:02.325266292 +0000
@@ -57,5 +57,5 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail vect_no_int_min_max } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { xfail vect_no_int_min_max } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { xfail { vect_no_int_min_max || vect_variable_length } } } } */
 
Index: gcc/testsuite/gcc.dg/vect/slp-reduc-7.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/slp-reduc-7.c	2017-11-09 13:21:01.989917982 +0000
+++ gcc/testsuite/gcc.dg/vect/slp-reduc-7.c	2017-11-09 13:21:02.325266292 +0000
@@ -55,5 +55,5 @@ int main (void)
 }
 
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail vect_no_int_add } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { xfail vect_no_int_add } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { xfail { vect_no_int_add || vect_variable_length } } } } */
 
Index: gcc/testsuite/gcc.target/aarch64/sve_vec_perm_2.c
===================================================================
--- /dev/null	2017-11-09 12:47:20.377612760 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_vec_perm_2.c	2017-11-09 13:21:02.325266292 +0000
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */
+
+#include <stdint.h>
+
+#define VEC_PERM(TYPE)						\
+TYPE __attribute__ ((noinline, noclone))			\
+vec_reverse_##TYPE (TYPE *restrict a, TYPE *restrict b, int n)	\
+{								\
+  for (int i = 0; i < n; ++i)					\
+    a[i] = b[n - i - 1];					\
+}
+
+#define TEST_ALL(T)				\
+  T (int8_t)					\
+  T (uint8_t)					\
+  T (int16_t)					\
+  T (uint16_t)					\
+  T (int32_t)					\
+  T (uint32_t)					\
+  T (int64_t)					\
+  T (uint64_t)					\
+  T (float)					\
+  T (double)
+
+TEST_ALL (VEC_PERM)
+
+/* { dg-final { scan-assembler-times {\trev\tz[0-9]+\.b, z[0-9]+\.b\n} 2 } } */
+/* { dg-final { scan-assembler-times {\trev\tz[0-9]+\.h, z[0-9]+\.h\n} 2 } } */
+/* { dg-final { scan-assembler-times {\trev\tz[0-9]+\.s, z[0-9]+\.s\n} 3 } } */
+/* { dg-final { scan-assembler-times {\trev\tz[0-9]+\.d, z[0-9]+\.d\n} 3 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_vec_perm_2_run.c
===================================================================
--- /dev/null	2017-11-09 12:47:20.377612760 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_vec_perm_2_run.c	2017-11-09 13:21:02.325266292 +0000
@@ -0,0 +1,29 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */
+
+#include "sve_vec_perm_2.c"
+
+#define N 153
+
+#define HARNESS(TYPE)						\
+  {								\
+    TYPE a[N], b[N];						\
+    for (unsigned int i = 0; i < N; ++i)			\
+      {								\
+	b[i] = i * 2 + i % 5;					\
+	asm volatile ("" ::: "memory");				\
+      }								\
+    vec_reverse_##TYPE (a, b, N);				\
+    for (unsigned int i = 0; i < N; ++i)			\
+      {								\
+	TYPE expected = (N - i - 1) * 2 + (N - i - 1) % 5;	\
+	if (a[i] != expected)					\
+	  __builtin_abort ();					\
+      }								\
+  }
+
+int __attribute__ ((optimize (1)))
+main (void)
+{
+  TEST_ALL (HARNESS)
+}
Index: gcc/testsuite/gcc.target/aarch64/sve_vec_perm_3.c
===================================================================
--- /dev/null	2017-11-09 12:47:20.377612760 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_vec_perm_3.c	2017-11-09 13:21:02.325266292 +0000
@@ -0,0 +1,46 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve -msve-vector-bits=scalable" } */
+
+#include <stdint.h>
+
+#define VEC_PERM(TYPE)					\
+TYPE __attribute__ ((noinline, noclone))		\
+vec_zip_##TYPE (TYPE *restrict a, TYPE *restrict b,	\
+		TYPE *restrict c, long n)		\
+{							\
+  for (long i = 0; i < n; ++i)				\
+    {							\
+      a[i * 8] = c[i * 4];				\
+      a[i * 8 + 1] = b[i * 4];				\
+      a[i * 8 + 2] = c[i * 4 + 1];			\
+      a[i * 8 + 3] = b[i * 4 + 1];			\
+      a[i * 8 + 4] = c[i * 4 + 2];			\
+      a[i * 8 + 5] = b[i * 4 + 2];			\
+      a[i * 8 + 6] = c[i * 4 + 3];			\
+      a[i * 8 + 7] = b[i * 4 + 3];			\
+    }							\
+}
+
+#define TEST_ALL(T)				\
+  T (int8_t)					\
+  T (uint8_t)					\
+  T (int16_t)					\
+  T (uint16_t)					\
+  T (int32_t)					\
+  T (uint32_t)					\
+  T (int64_t)					\
+  T (uint64_t)					\
+  T (float)					\
+  T (double)
+
+TEST_ALL (VEC_PERM)
+
+/* { dg-final { scan-assembler-times {\tzip1\tz[0-9]+\.b, z[0-9]+\.b, z[0-9]+\.b\n} 24 } } */
+/* { dg-final { scan-assembler-times {\tzip2\tz[0-9]+\.b, z[0-9]+\.b, z[0-9]+\.b\n} 24 } } */
+/* { dg-final { scan-assembler-times {\tzip1\tz[0-9]+\.h, z[0-9]+\.h, z[0-9]+\.h\n} 24 } } */
+/* { dg-final { scan-assembler-times {\tzip2\tz[0-9]+\.h, z[0-9]+\.h, z[0-9]+\.h\n} 24 } } */
+/* Currently we can't use SLP for groups bigger than 128 bits.  */
+/* { dg-final { scan-assembler-times {\tzip1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 36 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\tzip2\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 36 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\tzip1\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 36 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\tzip2\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 36 { xfail *-*-* } } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_vec_perm_3_run.c
===================================================================
--- /dev/null	2017-11-09 12:47:20.377612760 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_vec_perm_3_run.c	2017-11-09 13:21:02.325266292 +0000
@@ -0,0 +1,31 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */
+
+#include "sve_vec_perm_3.c"
+
+#define N (43 * 8)
+
+#define HARNESS(TYPE)						\
+  {								\
+    TYPE a[N], b[N], c[N];					\
+    for (unsigned int i = 0; i < N; ++i)			\
+      {								\
+	b[i] = i * 2 + i % 5;					\
+	c[i] = i * 3;						\
+	asm volatile ("" ::: "memory");				\
+      }								\
+    vec_zip_##TYPE (a, b, c, N / 8);				\
+    for (unsigned int i = 0; i < N / 2; ++i)			\
+      {								\
+	TYPE expected1 = i * 3;					\
+	TYPE expected2 = i * 2 + i % 5;				\
+	if (a[i * 2] != expected1 || a[i * 2 + 1] != expected2)	\
+	  __builtin_abort ();					\
+      }								\
+  }
+
+int __attribute__ ((optimize (1)))
+main (void)
+{
+  TEST_ALL (HARNESS)
+}
Index: gcc/testsuite/gcc.target/aarch64/sve_vec_perm_4.c
===================================================================
--- /dev/null	2017-11-09 12:47:20.377612760 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_vec_perm_4.c	2017-11-09 13:21:02.325266292 +0000
@@ -0,0 +1,52 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve -msve-vector-bits=scalable" } */
+
+#include <stdint.h>
+
+#define VEC_PERM(TYPE)					\
+TYPE __attribute__ ((noinline, noclone))		\
+vec_uzp_##TYPE (TYPE *restrict a, TYPE *restrict b,	\
+		 TYPE *restrict c, long n)		\
+{							\
+  for (long i = 0; i < n; ++i)				\
+    {							\
+      a[i * 4] = c[i * 8];				\
+      b[i * 4] = c[i * 8 + 1];				\
+      a[i * 4 + 1] = c[i * 8 + 2];			\
+      b[i * 4 + 1] = c[i * 8 + 3];			\
+      a[i * 4 + 2] = c[i * 8 + 4];			\
+      b[i * 4 + 2] = c[i * 8 + 5];			\
+      a[i * 4 + 3] = c[i * 8 + 6];			\
+      b[i * 4 + 3] = c[i * 8 + 7];			\
+    }							\
+}
+
+#define TEST_ALL(T)				\
+  T (int8_t)					\
+  T (uint8_t)					\
+  T (int16_t)					\
+  T (uint16_t)					\
+  T (int32_t)					\
+  T (uint32_t)					\
+  T (int64_t)					\
+  T (uint64_t)					\
+  T (float)					\
+  T (double)
+
+TEST_ALL (VEC_PERM)
+
+/* We could use a single uzp1 and uzp2 per function by implementing
+   SLP load permutation for variable width.  XFAIL until then.  */
+/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.b, z[0-9]+\.b, z[0-9]+\.b\n} 2 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\tuzp2\tz[0-9]+\.b, z[0-9]+\.b, z[0-9]+\.b\n} 2 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.h, z[0-9]+\.h, z[0-9]+\.h\n} 2 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\tuzp2\tz[0-9]+\.h, z[0-9]+\.h, z[0-9]+\.h\n} 2 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 3 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\tuzp2\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 3 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 3 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\tuzp2\tz[0-9]+\.d, z[0-9]+\.d, z[0-9]+\.d\n} 3 { xfail *-*-* } } } */
+/* Delete these if the tests above start passing instead.  */
+/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.b, z[0-9]+\.b, z[0-9]+\.b\n} 24 } } */
+/* { dg-final { scan-assembler-times {\tuzp2\tz[0-9]+\.b, z[0-9]+\.b, z[0-9]+\.b\n} 24 } } */
+/* { dg-final { scan-assembler-times {\tuzp1\tz[0-9]+\.h, z[0-9]+\.h, z[0-9]+\.h\n} 24 } } */
+/* { dg-final { scan-assembler-times {\tuzp2\tz[0-9]+\.h, z[0-9]+\.h, z[0-9]+\.h\n} 24 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_vec_perm_4_run.c
===================================================================
--- /dev/null	2017-11-09 12:47:20.377612760 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_vec_perm_4_run.c	2017-11-09 13:21:02.325266292 +0000
@@ -0,0 +1,29 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */
+
+#include "sve_vec_perm_4.c"
+
+#define N (43 * 8)
+
+#define HARNESS(TYPE)					\
+  {							\
+    TYPE a[N], b[N], c[N];				\
+    for (unsigned int i = 0; i < N; ++i)		\
+      {							\
+	c[i] = i * 2 + i % 5;				\
+	asm volatile ("" ::: "memory");			\
+      }							\
+    vec_uzp_##TYPE (a, b, c, N / 8);			\
+    for (unsigned int i = 0; i < N; ++i)		\
+      {							\
+	TYPE expected = i * 2 + i % 5;			\
+	if ((i & 1 ? b[i / 2] : a[i / 2]) != expected)	\
+	  __builtin_abort ();				\
+      }							\
+  }
+
+int __attribute__ ((optimize (1)))
+main (void)
+{
+  TEST_ALL (HARNESS)
+}