From patchwork Mon May 27 09:51:06 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
X-Patchwork-Id: 165201
Delivered-To: patch@linaro.org
From: Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org>
Date: Mon, 27 May 2019 15:21:06 +0530
Message-ID: <CAAgBjM=-CJO=OkkRArPUrDy5JpVAQyALLC0Dq909QJVL6GGDGQ@mail.gmail.com>
Subject: [AArch64] [SVE] PR88837 - Poor vector construction code in
 VL-specific mode
To: gcc Patches <gcc-patches@gcc.gnu.org>,
 Richard Sandiford <richard.sandiford@arm.com>
X-IsSubscribed: yes

Hi,
The attached patch tries to improve initialization of fixed-length
SVE vectors. Its algorithm, worked out with help from Richard
Sandiford, is described in the comments for
aarch64_sve_expand_vector_init() in the patch. I verified that the
tests added in the patch pass with qemu, and I am currently running
bootstrap+test on the patch in qemu.
Does the patch look OK?
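
For a quick view of the case selection before reading the patch, here
is a rough standalone model of the four top-level special cases and
their thresholds (trailing constants: at least half of the elements;
trailing same element: at least 3/4). It is only a sketch, not code
from the patch: Elt, C, V, Kind and classify are names made up for
illustration, variables are modelled as numbered placeholders, and the
even/odd recursion is left out.

```cpp
#include <cassert>
#include <vector>

// Hypothetical element model: a compile-time constant or a variable.
struct Elt
{
  bool is_const;
  int id;  // Constant value, or an arbitrary variable number.
  bool operator== (const Elt &o) const
  { return is_const == o.is_const && id == o.id; }
};

Elt C (int value) { return {true, value}; }   // Constant element.
Elt V (int number) { return {false, number}; } // Variable element.

enum class Kind
{ trailing_constants, leading_constants, trailing_same, leading_same, none };

// Number of constants at the end of V.
int count_trailing_constants (const std::vector<Elt> &v)
{
  int n = 0;
  for (int i = (int) v.size () - 1; i >= 0 && v[i].is_const; i--)
    n++;
  return n;
}

// Number of elements at the end of V equal to the last one (itself included).
int count_trailing_dups (const std::vector<Elt> &v)
{
  int n = 0;
  for (int i = (int) v.size () - 1; i >= 0 && v[i] == v.back (); i--)
    n++;
  return n;
}

// Pick the first matching special case, in the same order as the patch.
// Intended for vectors of at least 4 elements.
Kind classify (const std::vector<Elt> &v)
{
  if (v.empty ())
    return Kind::none;
  int n = (int) v.size ();
  std::vector<Elt> rev (v.rbegin (), v.rend ());

  if (count_trailing_constants (v) >= n / 2)
    return Kind::trailing_constants;
  if (count_trailing_constants (rev) >= n / 2)
    return Kind::leading_constants;
  if (count_trailing_dups (v) >= (3 * n) / 4)
    return Kind::trailing_same;
  if (count_trailing_dups (rev) >= (3 * n) / 4)
    return Kind::leading_same;
  return Kind::none;
}
```

With this model, { a, b, 1, 2, 3, 4, 5, 6 } classifies as
trailing_constants and { c, c, c, c, c, c, b, a } as leading_same,
which is what init_1.c and init_6.c exercise.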

Thanks,
Prathamesh
2019-05-27  Prathamesh Kulkarni  <prathamesh.kulkarni@linaro.org>
	    Richard Sandiford  <richard.sandiford@arm.com>

	* vector-builder.h (vector_builder::count_dups): New method.
	* config/aarch64/aarch64-protos.h (aarch64_sve_expand_vector_init):
	Declare prototype.
	* config/aarch64/aarch64-sve.md (aarch64_sve_rev<mode>): Use @.
	(vec_init<mode><Vel>): New pattern.
	* config/aarch64/aarch64.c (emit_insr): New function.
	(aarch64_sve_expand_vector_init_handle_trailing_constants): Likewise.
	(aarch64_sve_expand_vector_init_insert_elems): Likewise.
	(aarch64_sve_expand_vector_init_handle_trailing_same_elem): Likewise.
	(aarch64_sve_expand_vector_init): Define two overloaded functions.

testsuite/
	* gcc.target/aarch64/sve/init_1.c: New test.
	* gcc.target/aarch64/sve/init_1_run.c: Likewise.
	* gcc.target/aarch64/sve/init_2.c: Likewise.
	* gcc.target/aarch64/sve/init_2_run.c: Likewise.
	* gcc.target/aarch64/sve/init_3.c: Likewise.
	* gcc.target/aarch64/sve/init_3_run.c: Likewise.
	* gcc.target/aarch64/sve/init_4.c: Likewise.
	* gcc.target/aarch64/sve/init_4_run.c: Likewise.
	* gcc.target/aarch64/sve/init_5.c: Likewise.
	* gcc.target/aarch64/sve/init_5_run.c: Likewise.
	* gcc.target/aarch64/sve/init_6.c: Likewise.
	* gcc.target/aarch64/sve/init_6_run.c: Likewise.
	* gcc.target/aarch64/sve/init_7.c: Likewise.
	* gcc.target/aarch64/sve/init_7_run.c: Likewise.
	* gcc.target/aarch64/sve/init_8.c: Likewise.
	* gcc.target/aarch64/sve/init_8_run.c: Likewise.
	* gcc.target/aarch64/sve/init_9.c: Likewise.
	* gcc.target/aarch64/sve/init_9_run.c: Likewise.
	* gcc.target/aarch64/sve/init_10.c: Likewise.
	* gcc.target/aarch64/sve/init_10_run.c: Likewise.
	* gcc.target/aarch64/sve/init_11.c: Likewise.
	* gcc.target/aarch64/sve/init_11_run.c: Likewise.
	* gcc.target/aarch64/sve/init_12.c: Likewise.
	* gcc.target/aarch64/sve/init_12_run.c: Likewise.
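
The instruction sequences expected by the init_*.c tests can be
sanity-checked with a small host-side model. The sketch below is not
part of the patch; Vec, sve_dup, sve_index, sve_insr, sve_rev and
sve_zip1 are hypothetical helper names that model, on plain int
vectors, my understanding of DUP (broadcast), INDEX (base + i * step),
INSR (shift up by one element and write the scalar to element 0), REV
and ZIP1 (interleave the low halves).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<int>;

// DUP: broadcast X to all N elements.
Vec sve_dup (int n, int x)
{
  return Vec (n, x);
}

// INDEX: element i is BASE + i * STEP.
Vec sve_index (int n, int base, int step)
{
  Vec r;
  for (int i = 0; i < n; i++)
    r.push_back (base + i * step);
  return r;
}

// INSR: move every element up by one (the last one is dropped) and
// place X in element 0.
Vec sve_insr (Vec v, int x)
{
  if (v.empty ())
    return v;
  v.pop_back ();
  v.insert (v.begin (), x);
  return v;
}

// REV: reverse the order of the elements.
Vec sve_rev (const Vec &v)
{
  return Vec (v.rbegin (), v.rend ());
}

// ZIP1: interleave the low halves of A and B.
Vec sve_zip1 (const Vec &a, const Vec &b)
{
  assert (a.size () == b.size ());
  Vec r;
  for (std::size_t i = 0; i < a.size () / 2; i++)
    {
      r.push_back (a[i]);
      r.push_back (b[i]);
    }
  return r;
}
```

For example, building tmp1 as sve_dup (8, d) followed by INSR of c, b
and a, and zipping it with sve_index (8, 1, 1), gives
{ a, 1, b, 2, c, 3, d, 4 }, the worked example from the comment for
aarch64_sve_expand_vector_init.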

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index b6c0d0a8eb6..f82728ed2d3 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -515,6 +515,7 @@ bool aarch64_maybe_expand_sve_subreg_move (rtx, rtx);
 void aarch64_split_sve_subreg_move (rtx, rtx, rtx);
 void aarch64_expand_prologue (void);
 void aarch64_expand_vector_init (rtx, rtx);
+void aarch64_sve_expand_vector_init (rtx, rtx);
 void aarch64_init_cumulative_args (CUMULATIVE_ARGS *, const_tree, rtx,
 				   const_tree, unsigned);
 void aarch64_init_expanders (void);
diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index b9cb1fae98c..a4e0014eb3d 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -863,7 +863,7 @@
   "revb\t%0.h, %1/m, %2.h"
 )
 
-(define_insn "*aarch64_sve_rev<mode>"
+(define_insn "@aarch64_sve_rev<mode>"
   [(set (match_operand:SVE_ALL 0 "register_operand" "=w")
 	(unspec:SVE_ALL [(match_operand:SVE_ALL 1 "register_operand" "w")]
 			UNSPEC_REV))]
@@ -3207,3 +3207,15 @@
     DONE;
   }
 )
+
+;; Standard pattern name vec_init<mode><Vel>.
+
+(define_expand "vec_init<mode><Vel>"
+  [(match_operand:SVE_ALL 0 "register_operand" "")
+    (match_operand 1 "" "")]
+  "TARGET_SVE"
+  {
+    aarch64_sve_expand_vector_init (operands[0], operands[1]);
+    DONE;
+  }
+)
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 83453d03095..8967e02524e 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -15244,6 +15244,261 @@ aarch64_expand_vector_init (rtx target, rtx vals)
     }
 }
 
+/* Emit RTL corresponding to:
+   insr TARGET, ELEM.  */
+
+static void
+emit_insr (rtx target, rtx elem)
+{
+  machine_mode mode = GET_MODE (target);
+  scalar_mode elem_mode = GET_MODE_INNER (mode);
+  elem = force_reg (elem_mode, elem);
+
+  insn_code icode = optab_handler (vec_shl_insert_optab, mode);
+  gcc_assert (icode != CODE_FOR_nothing);
+  emit_insn (GEN_FCN (icode) (target, target, elem));
+}
+
+/* Subroutine of aarch64_sve_expand_vector_init for handling
+   trailing constants.
+   This function works as follows:
+   (a) Create a new vector consisting of trailing constants.
+   (b) Initialize TARGET with the constant vector using emit_move_insn.
+   (c) Insert remaining elements in TARGET using insr.
+   NELTS is the total number of elements in the original vector, while
+   NELTS_REQD is the number of elements that are actually significant.
+   ??? The heuristic used is to do above only if number of constants
+   is at least half the total number of elements.  May need fine tuning.  */
+
+static bool
+aarch64_sve_expand_vector_init_handle_trailing_constants
+ (rtx target, const rtx_vector_builder &builder, int nelts, int nelts_reqd)
+{
+  machine_mode mode = GET_MODE (target);
+  scalar_mode elem_mode = GET_MODE_INNER (mode);
+  int n_trailing_constants = 0;
+
+  for (int i = nelts_reqd - 1;
+       i >= 0 && aarch64_legitimate_constant_p (elem_mode, builder.elt (i));
+       i--)
+    n_trailing_constants++;
+
+  if (n_trailing_constants >= nelts_reqd / 2)
+    {
+      rtx_vector_builder v (mode, 1, nelts);
+      for (int i = 0; i < nelts; i++)
+	v.quick_push (builder.elt (i + nelts_reqd - n_trailing_constants));
+      rtx const_vec = v.build ();
+      emit_move_insn (target, const_vec);
+
+      for (int i = nelts_reqd - n_trailing_constants - 1; i >= 0; i--)
+	emit_insr (target, builder.elt (i));
+
+      return true;
+    }
+
+  return false;
+}
+
+/* Subroutine of aarch64_sve_expand_vector_init.
+   Works as follows:
+   (a) Initialize TARGET by broadcasting element NELTS_REQD - 1 of BUILDER.
+   (b) Skip trailing elements from BUILDER, which are same as
+       element NELTS_REQD - 1.
+   (c) Insert earlier elements in reverse order in TARGET using insr.  */
+
+static void
+aarch64_sve_expand_vector_init_insert_elems (rtx target,
+					     const rtx_vector_builder &builder,
+					     int nelts_reqd)
+{
+  machine_mode mode = GET_MODE (target);
+  scalar_mode elem_mode = GET_MODE_INNER (mode);
+
+  struct expand_operand ops[2];
+  enum insn_code icode = optab_handler (vec_duplicate_optab, mode);
+  gcc_assert (icode != CODE_FOR_nothing);
+
+  create_output_operand (&ops[0], target, mode);
+  create_input_operand (&ops[1], builder.elt (nelts_reqd - 1), elem_mode);
+  expand_insn (icode, 2, ops);
+
+  int ndups = builder.count_dups (nelts_reqd - 1, -1, -1);
+  for (int i = nelts_reqd - ndups - 1; i >= 0; i--)
+    emit_insr (target, builder.elt (i));
+}
+
+/* Subroutine of aarch64_sve_expand_vector_init to handle case
+   when all trailing elements of builder are same.
+   This works as follows:
+   (a) Use the expand_insn interface to broadcast the last element to TARGET.
+   (b) Insert remaining elements in TARGET using insr.
+
+   ??? The heuristic used is to do above if number of same trailing elements
+   is at least 3/4 of total number of elements, loosely based on
+   heuristic from mostly_zeros_p. May need fine-tuning.  */
+
+static bool
+aarch64_sve_expand_vector_init_handle_trailing_same_elem
+ (rtx target, const rtx_vector_builder &builder, int nelts_reqd)
+{
+  int ndups = builder.count_dups (nelts_reqd - 1, -1, -1);
+  if (ndups >= (3 * nelts_reqd) / 4)
+    {
+      aarch64_sve_expand_vector_init_insert_elems (target, builder,
+						   nelts_reqd - ndups + 1);
+      return true;
+    }
+
+  return false;
+}
+
+/* Initialize register TARGET from BUILDER. NELTS is the constant number
+   of elements in BUILDER.
+
+   The function tries to initialize TARGET from BUILDER if it fits one
+   of the special cases outlined below.
+
+   Failing that, the function divides BUILDER into two sub-vectors:
+   v_even = even elements of BUILDER;
+   v_odd = odd elements of BUILDER;
+
+   and recursively calls itself with v_even and v_odd.
+
+   if (recursive call succeeded for v_even or v_odd)
+     TARGET = zip (v_even, v_odd)
+
+   The function returns true if it managed to build TARGET from BUILDER
+   with one of the special cases, false otherwise.
+
+   Example: {a, 1, b, 2, c, 3, d, 4}
+
+   The vector gets divided into:
+   v_even = {a, b, c, d}
+   v_odd = {1, 2, 3, 4}
+
+   aarch64_sve_expand_vector_init(v_odd) hits case 1 and
+   initializes tmp2 from constant vector v_odd using emit_move_insn.
+
+   aarch64_sve_expand_vector_init(v_even) fails since v_even contains
+   4 elements, so we construct tmp1 from v_even using insr:
+   tmp1 = dup(d)
+   insr tmp1, c
+   insr tmp1, b
+   insr tmp1, a
+
+   And finally:
+   TARGET = zip (tmp1, tmp2)
+   which sets TARGET to {a, 1, b, 2, c, 3, d, 4}.  */
+
+static bool
+aarch64_sve_expand_vector_init (rtx target, const rtx_vector_builder &builder,
+				int nelts, int nelts_reqd)
+{
+  machine_mode mode = GET_MODE (target);
+
+  /* Case 1: Vector contains trailing constants.  */
+
+  if (aarch64_sve_expand_vector_init_handle_trailing_constants
+       (target, builder, nelts, nelts_reqd))
+    return true;
+
+  /* Case 2: Vector contains leading constants.  */
+
+  rtx_vector_builder rev_builder (mode, 1, nelts_reqd);
+  for (int i = 0; i < nelts_reqd; i++)
+    rev_builder.quick_push (builder.elt (nelts_reqd - i - 1));
+  rev_builder.finalize ();
+
+  if (aarch64_sve_expand_vector_init_handle_trailing_constants
+       (target, rev_builder, nelts, nelts_reqd))
+    {
+      emit_insn (gen_aarch64_sve_rev (mode, target, target));
+      return true;
+    }
+
+  /* Case 3: Vector contains trailing same element.  */
+
+  if (aarch64_sve_expand_vector_init_handle_trailing_same_elem
+       (target, builder, nelts_reqd))
+    return true;
+
+  /* Case 4: Vector contains leading same element.  */
+
+  if (aarch64_sve_expand_vector_init_handle_trailing_same_elem
+       (target, rev_builder, nelts_reqd) && nelts_reqd == nelts)
+    {
+      emit_insn (gen_aarch64_sve_rev (mode, target, target));
+      return true;
+    }
+
+  /* Avoid recursing below 4-elements.
+     ??? The threshold 4 may need fine-tuning.  */
+
+  if (nelts_reqd <= 4)
+    return false;
+
+  rtx_vector_builder v_even (mode, 1, nelts);
+  rtx_vector_builder v_odd (mode, 1, nelts);
+
+  for (int i = 0; i < nelts * 2; i += 2)
+    {
+      v_even.quick_push (builder.elt (i));
+      v_odd.quick_push (builder.elt (i + 1));
+    }
+
+  v_even.finalize ();
+  v_odd.finalize ();
+
+  rtx tmp1 = gen_reg_rtx (mode);
+  bool did_even_p = aarch64_sve_expand_vector_init (tmp1, v_even,
+						    nelts, nelts_reqd / 2);
+
+  rtx tmp2 = gen_reg_rtx (mode);
+  bool did_odd_p = aarch64_sve_expand_vector_init (tmp2, v_odd,
+						   nelts, nelts_reqd / 2);
+
+  if (!did_even_p && !did_odd_p)
+    return false;
+
+  /* Initialize v_even and v_odd using INSR if it didn't match any of the
+     special cases and zip v_even, v_odd.  */
+
+  if (!did_even_p)
+    aarch64_sve_expand_vector_init_insert_elems (tmp1, v_even, nelts_reqd / 2);
+
+  if (!did_odd_p)
+    aarch64_sve_expand_vector_init_insert_elems (tmp2, v_odd, nelts_reqd / 2);
+
+  rtvec v = gen_rtvec (2, tmp1, tmp2);
+  emit_set_insn (target, gen_rtx_UNSPEC (mode, v, UNSPEC_ZIP1));
+  return true;
+}
+
+/* Initialize register TARGET from the elements in PARALLEL rtx VALS.  */
+
+void
+aarch64_sve_expand_vector_init (rtx target, rtx vals)
+{
+  machine_mode mode = GET_MODE (target);
+  int nelts = XVECLEN (vals, 0);
+
+  rtx_vector_builder v (mode, 1, nelts);
+  for (int i = 0; i < nelts; i++)
+    v.quick_push (XVECEXP (vals, 0, i));
+  v.finalize ();
+
+  /* If neither sub-vector of v could be initialized specially,
+     then use INSR to insert all elements from v into TARGET.
+     ??? This might not be optimal for vectors with large
+     initializers like 16-element or above.
+     For nelts < 4, it probably isn't useful to handle specially.  */
+
+  if (nelts < 4
+      || !aarch64_sve_expand_vector_init (target, v, nelts, nelts))
+    aarch64_sve_expand_vector_init_insert_elems (target, v, nelts);
+}
+
 static unsigned HOST_WIDE_INT
 aarch64_shift_truncation_mask (machine_mode mode)
 {
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/init_1.c b/gcc/testsuite/gcc.target/aarch64/sve/init_1.c
new file mode 100644
index 00000000000..c51876947fb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/init_1.c
@@ -0,0 +1,27 @@
+/* { dg-do compile { target aarch64_asm_sve_ok } } */
+/* { dg-options "-O2 -ftree-vectorize -fno-schedule-insns -msve-vector-bits=256 --save-temps" } */
+
+/* Case 1.1: Trailing constants with stepped sequence.  */
+
+#include <stdint.h>
+
+typedef int32_t vnx4si __attribute__((vector_size (32)));
+
+__attribute__((noipa))
+vnx4si foo(int a, int b)
+{
+  return (vnx4si) { a, b, 1, 2, 3, 4, 5, 6 };
+}
+
+/*
+foo:
+.LFB0:
+        .cfi_startproc
+        ptrue   p0.s, vl8
+        index   z0.s, #1, #1
+        insr    z0.s, w1
+        insr    z0.s, w0
+        ret
+*/
+
+/* { dg-final { scan-assembler {\tindex\t(z[0-9]+\.s), #1, #1\n\tinsr\t\1, w1\n\tinsr\t\1, w0} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/init_10.c b/gcc/testsuite/gcc.target/aarch64/sve/init_10.c
new file mode 100644
index 00000000000..7bca3f0ecc9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/init_10.c
@@ -0,0 +1,29 @@
+/* { dg-do compile { target aarch64_asm_sve_ok } } */
+/* { dg-options "-O2 -ftree-vectorize -fno-schedule-insns -msve-vector-bits=256 --save-temps" } */
+
+/* Case 5.4: Interleaved repeating elements and non-repeating elements.  */
+
+#include <stdint.h>
+
+typedef int32_t vnx4si __attribute__((vector_size (32)));
+
+__attribute__((noipa))
+vnx4si foo(int a, int b, int c, int f)
+{
+  return (vnx4si) { a, f, b, f, c, f, c, f };
+}
+
+/*
+foo:
+.LFB0:
+        .cfi_startproc
+        mov     z0.s, w2
+        mov     z1.s, w3
+        insr    z0.s, w1
+        ptrue   p0.s, vl8
+        insr    z0.s, w0
+        zip1    z0.s, z0.s, z1.s
+        ret
+*/
+
+/* { dg-final { scan-assembler {\tmov\t(z[0-9]+\.s), w3\n\tmov\t(z[0-9]+\.s), w2\n.*\n\tinsr\t\2, w1\n\tinsr\t\2, w0\n\tzip1\t\2, \2, \1} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/init_10_run.c b/gcc/testsuite/gcc.target/aarch64/sve/init_10_run.c
new file mode 100644
index 00000000000..d9640e42ddd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/init_10_run.c
@@ -0,0 +1,21 @@
+/* { dg-do run { target aarch64_sve256_hw } } */
+/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=256 --save-temps" } */
+
+#include "init_10.c"
+
+int main()
+{
+  int a = 10;
+  int b = 11;
+  int c = 12;
+  int f = 13;
+
+  vnx4si v = foo (a, b, c, f);
+  int expected[] = { a, f, b, f, c, f, c, f };
+
+  for (int i = 0; i < 8; i++)
+    if (v[i] != expected[i])
+      __builtin_abort ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/init_11.c b/gcc/testsuite/gcc.target/aarch64/sve/init_11.c
new file mode 100644
index 00000000000..b90895df436
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/init_11.c
@@ -0,0 +1,27 @@
+/* { dg-do compile { target aarch64_asm_sve_ok } } */
+/* { dg-options "-O2 -ftree-vectorize -fno-schedule-insns -msve-vector-bits=256 --save-temps" } */
+
+/* Case 5.5: Interleaved repeating elements and trailing same elements.  */
+
+#include <stdint.h>
+
+typedef int32_t vnx4si __attribute__((vector_size (32)));
+
+vnx4si foo(int a, int b, int f) 
+{
+  return (vnx4si) { a, f, b, f, b, f, b, f };
+}
+
+/*
+foo:
+.LFB0:
+        .cfi_startproc
+        mov     z0.s, w1
+        mov     z1.s, w2
+        insr    z0.s, w0
+        ptrue   p0.s, vl8
+        zip1    z0.s, z0.s, z1.s
+        ret
+*/
+
+/* { dg-final { scan-assembler {\tmov\t(z[0-9]+\.s), w1\n\tmov\t(z[0-9]+\.s), w2\n\tinsr\t\1, w0\n.*\tzip1\t\1, \1, \2} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/init_11_run.c b/gcc/testsuite/gcc.target/aarch64/sve/init_11_run.c
new file mode 100644
index 00000000000..8a99da45433
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/init_11_run.c
@@ -0,0 +1,20 @@
+/* { dg-do run { target aarch64_sve256_hw } } */
+/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=256 --save-temps" } */
+
+#include "init_11.c"
+
+int main()
+{
+  int a = 10;
+  int b = 11;
+  int f = 12;
+
+  vnx4si v = foo (a, b, f);
+  int expected[] = { a, f, b, f, b, f, b, f };
+
+  for (int i = 0; i < 8; i++)
+    if (v[i] != expected[i])
+      __builtin_abort ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/init_12.c b/gcc/testsuite/gcc.target/aarch64/sve/init_12.c
new file mode 100644
index 00000000000..b36967d6d59
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/init_12.c
@@ -0,0 +1,30 @@
+/* { dg-do compile { target aarch64_asm_sve_ok } } */
+/* { dg-options "-O2 -ftree-vectorize -fno-schedule-insns -msve-vector-bits=256 --save-temps" } */
+
+/* Case 5.5: Interleaved repeating elements and leading same elements.  */
+
+#include <stdint.h>
+
+typedef int32_t vnx4si __attribute__((vector_size (32)));
+
+__attribute__((noipa))
+vnx4si foo(int a, int b, int f) 
+{
+  return (vnx4si) { b, f, b, f, b, f, a, f };
+}
+
+/*
+foo:
+.LFB0:
+        .cfi_startproc
+        mov     z0.s, w0
+        mov     z1.s, w2
+        insr    z0.s, w1
+        ptrue   p0.s, vl8
+        insr    z0.s, w1
+        insr    z0.s, w1
+        zip1    z0.s, z0.s, z1.s
+        ret
+*/
+
+/* { dg-final { scan-assembler {\tmov\t(z[0-9]+\.s), w2\n\tmov\t(z[0-9]+\.s), w0\n.*\n\tinsr\t\2, w1\n\tinsr\t\2, w1\n\tinsr\t\2, w1\n\tzip1\t\2, \2, \1} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/init_12_run.c b/gcc/testsuite/gcc.target/aarch64/sve/init_12_run.c
new file mode 100644
index 00000000000..b77464c6b3c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/init_12_run.c
@@ -0,0 +1,20 @@
+/* { dg-do run { target aarch64_sve256_hw } } */
+/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=256 --save-temps" } */
+
+#include "init_12.c"
+
+int main()
+{
+  int a = 10;
+  int b = 11;
+  int f = 12;
+
+  vnx4si v = foo (a, b, f);
+  int expected[] = { b, f, b, f, b, f, a, f };
+
+  for (int i = 0; i < 8; i++)
+    if (v[i] != expected[i])
+      __builtin_abort ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/init_1_run.c b/gcc/testsuite/gcc.target/aarch64/sve/init_1_run.c
new file mode 100644
index 00000000000..c0cc5235da4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/init_1_run.c
@@ -0,0 +1,19 @@
+/* { dg-do run { target aarch64_sve256_hw } } */
+/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=256 --save-temps" } */
+
+#include "init_1.c"
+
+int main()
+{
+  int a = 10;
+  int b = 11;
+
+  vnx4si v = foo (a, b);
+  int expected[] = { a, b, 1, 2, 3, 4, 5, 6 };
+
+  for (int i = 0; i < 8; i++)
+    if (v[i] != expected[i])
+      __builtin_abort ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/init_2.c b/gcc/testsuite/gcc.target/aarch64/sve/init_2.c
new file mode 100644
index 00000000000..1ab7c4300e6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/init_2.c
@@ -0,0 +1,29 @@
+/* { dg-do compile { target aarch64_asm_sve_ok } } */
+/* { dg-options "-O2 -ftree-vectorize -fno-schedule-insns -msve-vector-bits=256 --save-temps" } */
+
+/* Case 1.2: Trailing constants with repeating sequence.  */
+
+#include <stdint.h>
+
+typedef int32_t vnx4si __attribute__((vector_size (32)));
+
+__attribute__((noipa))
+vnx4si foo(int a, int b)
+{
+  return (vnx4si) { a, b, 2, 3, 2, 3, 2, 3 };
+}
+
+/*
+foo:
+.LFB0:
+        .cfi_startproc
+        ptrue   p0.s, vl8
+        adrp    x2, .LANCHOR0
+        add     x2, x2, :lo12:.LANCHOR0
+        ld1w    z0.s, p0/z, [x2]
+        insr    z0.s, w1
+        insr    z0.s, w0
+        ret
+*/
+
+/* { dg-final { scan-assembler {\tld1w\t(z[0-9]+\.s), p[0-9]+/z, \[x[0-9]+\]\n\tinsr\t\1, w1\n\tinsr\t\1, w0} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/init_2_run.c b/gcc/testsuite/gcc.target/aarch64/sve/init_2_run.c
new file mode 100644
index 00000000000..0f3705d145b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/init_2_run.c
@@ -0,0 +1,19 @@
+/* { dg-do run { target aarch64_sve256_hw } } */
+/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=256 --save-temps" } */
+
+#include "init_2.c"
+
+int main()
+{
+  int a = 10;
+  int b = 11;
+
+  vnx4si v = foo (a, b);
+  int expected[] = { a, b, 2, 3, 2, 3, 2, 3 };
+
+  for (int i = 0; i < 8; i++)
+    if (v[i] != expected[i])
+      __builtin_abort ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/init_3.c b/gcc/testsuite/gcc.target/aarch64/sve/init_3.c
new file mode 100644
index 00000000000..ccf3fa85292
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/init_3.c
@@ -0,0 +1,28 @@
+/* { dg-do compile { target aarch64_asm_sve_ok } } */
+/* { dg-options "-O2 -ftree-vectorize -fno-schedule-insns -msve-vector-bits=256 --save-temps" } */
+
+/* Case 2.1: Leading constants with stepped sequence.  */
+
+#include <stdint.h>
+
+typedef int32_t vnx4si __attribute__((vector_size (32)));
+
+__attribute__((noipa))
+vnx4si foo(int a, int b)
+{
+  return (vnx4si) { 1, 2, 3, 4, 5, 6, a, b };
+}
+
+/*
+foo:
+.LFB0:
+        .cfi_startproc
+        ptrue   p0.s, vl8
+        index   z0.s, #6, #-1
+        insr    z0.s, w0
+        insr    z0.s, w1
+        rev     z0.s, z0.s
+        ret
+*/
+
+/* { dg-final { scan-assembler {\tindex\t(z[0-9]+\.s), #6, #-1\n\tinsr\t\1, w0\n\tinsr\t\1, w1\n\trev\t\1, \1} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/init_3_run.c b/gcc/testsuite/gcc.target/aarch64/sve/init_3_run.c
new file mode 100644
index 00000000000..5df711dfc79
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/init_3_run.c
@@ -0,0 +1,19 @@
+/* { dg-do run { target aarch64_sve256_hw } } */
+/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=256 --save-temps" } */
+
+#include "init_3.c"
+
+int main()
+{
+  int a = 10;
+  int b = 11;
+
+  vnx4si v = foo (a, b);
+  int expected[] = { 1, 2, 3, 4, 5, 6, a, b };
+
+  for (int i = 0; i < 8; i++)
+    if (v[i] != expected[i])
+      __builtin_abort ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/init_4.c b/gcc/testsuite/gcc.target/aarch64/sve/init_4.c
new file mode 100644
index 00000000000..b817dc5d9f7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/init_4.c
@@ -0,0 +1,30 @@
+/* { dg-do compile { target aarch64_asm_sve_ok } } */
+/* { dg-options "-O2 -ftree-vectorize -fno-schedule-insns -msve-vector-bits=256 --save-temps" } */
+
+/* Case 2.2: Leading constants with repeating sequence.  */
+
+#include <stdint.h>
+
+typedef int32_t vnx4si __attribute__((vector_size (32)));
+
+__attribute__((noipa))
+vnx4si foo(int a, int b)
+{
+  return (vnx4si) { 3, 2, 3, 2, 3, 2, b, a };
+}
+
+/*
+foo:
+.LFB0:
+        .cfi_startproc
+        ptrue   p0.s, vl8
+        adrp    x2, .LANCHOR0
+        add     x2, x2, :lo12:.LANCHOR0
+        ld1w    z0.s, p0/z, [x2]
+        insr    z0.s, w1
+        insr    z0.s, w0
+        rev     z0.s, z0.s
+        ret
+*/
+
+/* { dg-final { scan-assembler {\tld1w\t(z[0-9]+\.s), p[0-9]+/z, \[x[0-9]+\]\n\tinsr\t\1, w1\n\tinsr\t\1, w0\n\trev\t\1, \1} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/init_4_run.c b/gcc/testsuite/gcc.target/aarch64/sve/init_4_run.c
new file mode 100644
index 00000000000..563353fe673
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/init_4_run.c
@@ -0,0 +1,19 @@
+/* { dg-do run { target aarch64_sve256_hw } } */
+/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=256 --save-temps" } */
+
+#include "init_4.c"
+
+int main()
+{
+  int a = 10;
+  int b = 11;
+
+  vnx4si v = foo (a, b);
+  int expected[] = { 3, 2, 3, 2, 3, 2, b, a };
+
+  for (int i = 0; i < 8; i++)
+    if (v[i] != expected[i])
+      __builtin_abort ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/init_5.c b/gcc/testsuite/gcc.target/aarch64/sve/init_5.c
new file mode 100644
index 00000000000..d662dfba8b5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/init_5.c
@@ -0,0 +1,27 @@
+/* { dg-do compile { target aarch64_asm_sve_ok } } */
+/* { dg-options "-O2 -ftree-vectorize -fno-schedule-insns -msve-vector-bits=256 --save-temps" } */
+
+/* Case 3: Trailing same element.  */ 
+
+#include <stdint.h>
+
+typedef int32_t vnx4si __attribute__((vector_size (32)));
+
+__attribute__((noipa))
+vnx4si foo(int a, int b, int c)
+{
+  return (vnx4si) { a, b, c, c, c, c, c, c };
+}
+
+/*
+foo:
+.LFB0:
+        .cfi_startproc
+        mov     z0.s, w2
+        ptrue   p0.s, vl8
+        insr    z0.s, w1
+        insr    z0.s, w0
+        ret
+*/
+
+/* { dg-final { scan-assembler {\tmov\t(z[0-9]+\.s), w2\n.*\tinsr\t\1, w1\n\tinsr\t\1, w0} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/init_5_run.c b/gcc/testsuite/gcc.target/aarch64/sve/init_5_run.c
new file mode 100644
index 00000000000..ae444a17688
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/init_5_run.c
@@ -0,0 +1,20 @@
+/* { dg-do run { target aarch64_sve256_hw } } */
+/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=256 --save-temps" } */
+
+#include "init_5.c"
+
+int main()
+{
+  int a = 10;
+  int b = 11;
+  int c = 12;
+
+  vnx4si v = foo (a, b, c);
+  int expected[] = { a, b, c, c, c, c, c, c };
+
+  for (int i = 0; i < 8; i++)
+    if (v[i] != expected[i])
+      __builtin_abort ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/init_6.c b/gcc/testsuite/gcc.target/aarch64/sve/init_6.c
new file mode 100644
index 00000000000..fd0e21dcb85
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/init_6.c
@@ -0,0 +1,28 @@
+/* { dg-do compile { target aarch64_asm_sve_ok } } */
+/* { dg-options "-O2 -ftree-vectorize -fno-schedule-insns -msve-vector-bits=256 --save-temps" } */
+
+/* Case 4: Leading same element.  */
+
+#include <stdint.h>
+
+typedef int32_t vnx4si __attribute__((vector_size (32)));
+
+__attribute__((noipa))
+vnx4si foo(int a, int b, int c)
+{
+  return (vnx4si) { c, c, c, c, c, c, b, a };
+}
+
+/*
+foo:
+.LFB0:
+        .cfi_startproc
+        mov     z0.s, w2
+        ptrue   p0.s, vl8
+        insr    z0.s, w1
+        insr    z0.s, w0
+        rev     z0.s, z0.s
+        ret
+*/
+
+/* { dg-final { scan-assembler {\tmov\t(z[0-9]+\.s), w2\n.*\tinsr\t\1, w1\n\tinsr\t\1, w0\n\trev\t\1, \1} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/init_6_run.c b/gcc/testsuite/gcc.target/aarch64/sve/init_6_run.c
new file mode 100644
index 00000000000..d919f0ce0ba
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/init_6_run.c
@@ -0,0 +1,20 @@
+/* { dg-do run { target aarch64_sve256_hw } } */
+/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=256 --save-temps" } */
+
+#include "init_6.c"
+
+int main()
+{
+  int a = 10;
+  int b = 11;
+  int c = 12;
+
+  vnx4si v = foo (a, b, c);
+  int expected[] = { c, c, c, c, c, c, b, a };
+
+  for (int i = 0; i < 8; i++)
+    if (v[i] != expected[i])
+      __builtin_abort ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/init_7.c b/gcc/testsuite/gcc.target/aarch64/sve/init_7.c
new file mode 100644
index 00000000000..5f3d82242d7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/init_7.c
@@ -0,0 +1,32 @@
+/* { dg-do compile { target aarch64_asm_sve_ok } } */
+/* { dg-options "-O2 -ftree-vectorize -fno-schedule-insns -msve-vector-bits=256 --save-temps" } */
+
+/* Case 5.1: All elements.  */
+
+#include <stdint.h>
+
+typedef int32_t vnx4si __attribute__((vector_size (32)));
+
+__attribute__((noipa))
+vnx4si foo(int a, int b, int c, int d, int e, int f, int g, int h)
+{
+  return (vnx4si) { a, b, c, d, e, f, g, h };
+}
+
+/*
+foo:
+.LFB0:
+        .cfi_startproc
+        mov     z0.s, w7
+        ptrue   p0.s, vl8
+        insr    z0.s, w6
+        insr    z0.s, w5
+        insr    z0.s, w4
+        insr    z0.s, w3
+        insr    z0.s, w2
+        insr    z0.s, w1
+        insr    z0.s, w0
+        ret
+*/
+
+/* { dg-final { scan-assembler {\tmov\t(z[0-9]+\.s), w7\n.*\tinsr\t\1, w6\n\tinsr\t\1, w5\n\tinsr\t\1, w4\n\tinsr\t\1, w3\n\tinsr\t\1, w2\n\tinsr\t\1, w1\n\tinsr\t\1, w0} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/init_7_run.c b/gcc/testsuite/gcc.target/aarch64/sve/init_7_run.c
new file mode 100644
index 00000000000..c9f040c6d4d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/init_7_run.c
@@ -0,0 +1,25 @@
+/* { dg-do run { target aarch64_sve256_hw } } */
+/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=256 --save-temps" } */
+
+#include "init_7.c"
+
+int main()
+{
+  int a = 10;
+  int b = 11;
+  int c = 12;
+  int d = 13;
+  int e = 14;
+  int f = 15;
+  int g = 16;
+  int h = 17;
+
+  vnx4si v = foo (a, b, c, d, e, f, g, h);
+  int expected[] = { a, b, c, d, e, f, g, h };
+
+  for (int i = 0; i < 8; i++)
+    if (v[i] != expected[i])
+      __builtin_abort ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/init_8.c b/gcc/testsuite/gcc.target/aarch64/sve/init_8.c
new file mode 100644
index 00000000000..9a1869a2765
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/init_8.c
@@ -0,0 +1,32 @@
+/* { dg-do compile { target aarch64_asm_sve_ok } } */
+/* { dg-options "-O2 -ftree-vectorize -fno-schedule-insns -msve-vector-bits=256 --save-temps" } */
+
+/* Case 5.2: Interleaved elements and constants.  */
+
+#include <stdint.h>
+
+typedef int32_t vnx4si __attribute__((vector_size (32)));
+
+__attribute__((noipa))
+vnx4si foo(int a, int b, int c, int d)
+{
+  return (vnx4si) { a, 1, b, 2, c, 3, d, 4 };
+}
+
+/*
+foo:
+.LFB0:
+        .cfi_startproc
+        ptrue   p0.s, vl8
+        mov     z0.s, w3
+        adrp    x3, .LANCHOR0
+        insr    z0.s, w2
+        add     x3, x3, :lo12:.LANCHOR0
+        insr    z0.s, w1
+        ld1w    z1.s, p0/z, [x3]
+        insr    z0.s, w0
+        zip1    z0.s, z0.s, z1.s
+        ret
+*/
+
+/* { dg-final { scan-assembler {\tmov\t(z[0-9]+\.s), w3\n\tadrp\t(x[0-9]+), \.LANCHOR0\n\tinsr\t\1, w2\n\tadd\t\2, \2, :lo12:\.LANCHOR0\n\tinsr\t\1, w1\n\tld1w\t(z[0-9]+\.s), p[0-9]+/z, \[\2\]\n\tinsr\t\1, w0\n\tzip1\t\1, \1, \3} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/init_8_run.c b/gcc/testsuite/gcc.target/aarch64/sve/init_8_run.c
new file mode 100644
index 00000000000..14a8ad44145
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/init_8_run.c
@@ -0,0 +1,21 @@
+/* { dg-do run { target aarch64_sve256_hw } } */
+/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=256 --save-temps" } */
+
+#include "init_8.c"
+
+int main()
+{
+  int a = 10;
+  int b = 11;
+  int c = 12;
+  int d = 13;
+
+  vnx4si v = foo (a, b, c, d);
+  int expected[] = { a, 1, b, 2, c, 3, d, 4 };
+
+  for (int i = 0; i < 8; i++)
+    if (v[i] != expected[i])
+      __builtin_abort ();
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/init_9.c b/gcc/testsuite/gcc.target/aarch64/sve/init_9.c
new file mode 100644
index 00000000000..0ecbce848ef
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/init_9.c
@@ -0,0 +1,27 @@
+/* { dg-do compile { target aarch64_asm_sve_ok } } */
+/* { dg-options "-O2 -ftree-vectorize -fno-schedule-insns -msve-vector-bits=256 --save-temps" } */
+
+/* Case 5.3: Repeated elements.  */
+
+#include <stdint.h>
+
+typedef int32_t vnx4si __attribute__((vector_size (32)));
+
+__attribute__((noipa))
+vnx4si foo(int a, int b)
+{
+  return (vnx4si) { a, b, a, b, a, b, a, b };
+}
+
+/*
+foo:
+.LFB0:
+        .cfi_startproc
+        mov     z0.s, w0
+        mov     z1.s, w1
+        ptrue   p0.s, vl8
+        zip1    z0.s, z0.s, z1.s
+        ret
+*/
+
+/* { dg-final { scan-assembler {\tmov\t(z[0-9]+\.s), w0\n\tmov\t(z[0-9]+\.s), w1\n.*\tzip1\t\1, \1, \2} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/init_9_run.c b/gcc/testsuite/gcc.target/aarch64/sve/init_9_run.c
new file mode 100644
index 00000000000..6c67025c585
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/init_9_run.c
@@ -0,0 +1,19 @@
+/* { dg-do run { target aarch64_sve256_hw } } */
+/* { dg-options "-O2 -ftree-vectorize -msve-vector-bits=256 --save-temps" } */
+
+#include "init_9.c"
+
+int main()
+{
+  int a = 10;
+  int b = 11;
+
+  vnx4si v = foo (a, b);
+  int expected[] = { a, b, a, b, a, b, a, b };
+
+  for (int i = 0; i < 8; i++)
+    if (v[i] != expected[i])
+      __builtin_abort ();
+
+  return 0;
+}
diff --git a/gcc/vector-builder.h b/gcc/vector-builder.h
index 9967daa6e4c..9f95b01bc3b 100644
--- a/gcc/vector-builder.h
+++ b/gcc/vector-builder.h
@@ -96,6 +96,7 @@ public:
   unsigned int encoded_nelts () const;
   bool encoded_full_vector_p () const;
   T elt (unsigned int) const;
+  unsigned int count_dups (int, int, int) const;
 
   bool operator == (const Derived &) const;
   bool operator != (const Derived &x) const { return !operator == (x); }
@@ -223,6 +224,23 @@ vector_builder<T, Derived>::elt (unsigned int i) const
 				 derived ()->step (prev, final));
 }
 
+/* Return the number of leading duplicate elements in the range
+   [START:END:STEP].  The value is always at least 1.  */
+
+template<typename T, typename Derived>
+unsigned int
+vector_builder<T, Derived>::count_dups (int start, int end, int step) const
+{
+  gcc_assert ((end - start) % step == 0);
+
+  unsigned int ndups = 1;
+  for (int i = start + step;
+       i != end && derived ()->equal_p (elt (i), elt (start));
+       i += step)
+    ndups++;
+  return ndups;
+}
+
 /* Change the encoding to NPATTERNS patterns of NELTS_PER_PATTERN each,
    but without changing the underlying vector.  */