
Add clobbers around IFN_LOAD/STORE_LANES

Message ID 87vabyqnxm.fsf@linaro.org
State New
Series Add clobbers around IFN_LOAD/STORE_LANES

Commit Message

Richard Sandiford May 8, 2018, 1:25 p.m. UTC
We build up the input to IFN_STORE_LANES one vector at a time.
In RTL, each of these vector assignments becomes a write to
subregs of the form (subreg:VEC (reg:AGGR R)), where R is the
eventual input to the store lanes instruction.  The problem is
that RTL isn't very good at tracking liveness when things are
initialised piecemeal by subregs, so R tends to end up being
live on all paths from the entry block to the store.  This in
turn leads to unnecessary spilling around calls, as well as to
excess register pressure in vector loops.
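
As a rough illustration, the RTL for a two-vector group looks like the
following (the register numbers and modes here are invented for the
sake of example):

  ;; Each assignment writes part of the aggregate pseudo through a
  ;; subreg, so no instruction ever sets (reg:OI 100) as a whole and
  ;; dataflow treats it as live from the entry block onwards.
  (set (subreg:V4SI (reg:OI 100) 0) (reg:V4SI 101))
  (set (subreg:V4SI (reg:OI 100) 16) (reg:V4SI 102))
  ;; ...the store-lanes instruction then reads (reg:OI 100)...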

This patch adds gimple clobbers to indicate the liveness of the
IFN_STORE_LANES variable and makes sure that gimple clobbers are
expanded to rtl clobbers where useful.  For consistency it also
uses clobbers to mark the point at which an IFN_LOAD_LANES
variable is no longer needed.
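
For example, the vectorized GIMPLE for a two-vector group becomes
roughly the following (a sketch only; the names are invented):

  vect_array ={v} {CLOBBER};    /* start of vect_array's live range */
  vect_array[0] = vect__1;
  vect_array[1] = vect__2;
  MEM[(int[8] *)ptr] = .STORE_LANES (vect_array);
  vect_array ={v} {CLOBBER};    /* vect_array is now dead */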

Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf
and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


2018-05-08  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	* cfgexpand.c (expand_clobber): New function.
	(expand_gimple_stmt_1): Use it.
	* tree-vect-stmts.c (vect_clobber_variable): New function,
	split out from...
	(vectorizable_simd_clone_call): ...here.
	(vectorizable_store): Emit a clobber either side of an
	IFN_STORE_LANES sequence.
	(vectorizable_load): Emit a clobber after an IFN_LOAD_LANES sequence.

gcc/testsuite/
	* gcc.target/aarch64/store_lane_spill_1.c: New test.
	* gcc.target/aarch64/sve/store_lane_spill_1.c: Likewise.

Comments

Richard Biener May 8, 2018, 1:56 p.m. UTC | #1
On Tue, May 8, 2018 at 3:25 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> We build up the input to IFN_STORE_LANES one vector at a time.
> In RTL, each of these vector assignments becomes a write to
> subregs of the form (subreg:VEC (reg:AGGR R)), where R is the
> eventual input to the store lanes instruction.  The problem is
> that RTL isn't very good at tracking liveness when things are
> initialised piecemeal by subregs, so R tends to end up being
> live on all paths from the entry block to the store.  This in
> turn leads to unnecessary spilling around calls, as well as to
> excess register pressure in vector loops.
>
> This patch adds gimple clobbers to indicate the liveness of the
> IFN_STORE_LANES variable and makes sure that gimple clobbers are
> expanded to rtl clobbers where useful.  For consistency it also
> uses clobbers to mark the point at which an IFN_LOAD_LANES
> variable is no longer needed.
>
> Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf
> and x86_64-linux-gnu.  OK to install?

Minor comment inline.

Also it looks like clobbers are at the moment all thrown away during
RTL expansion?  Do the clobbers we generate with this patch eventually
get collected somehow if they turn out to be no longer necessary?
How many of them do we generate?  I expect not many decls get
expanded to registers, and if they are, most of them are likely
not constructed piecemeal - thus, maybe we should restrict ourselves
to non-scalar typed lhs?  So, change it to

  if (DECL_P (lhs)
      && (AGGREGATE_TYPE_P (TREE_TYPE (lhs)) // but what about
single-element aggregates?
             || VECTOR_TYPE_P (TREE_TYPE (lhs))
             || COMPLEX_TYPE_P (TREE_TYPE (lhs))))

?

The vectorizer part is ok with the minor adjustment pointed out below.  Maybe
you want to split this patch while we discuss the RTL bits.

Thanks,
Richard.

> Thanks,
> Richard
>
>
> 2018-05-08  Richard Sandiford  <richard.sandiford@linaro.org>
>
> gcc/
>         * cfgexpand.c (expand_clobber): New function.
>         (expand_gimple_stmt_1): Use it.
>         * tree-vect-stmts.c (vect_clobber_variable): New function,
>         split out from...
>         (vectorizable_simd_clone_call): ...here.
>         (vectorizable_store): Emit a clobber either side of an
>         IFN_STORE_LANES sequence.
>         (vectorizable_load): Emit a clobber after an IFN_LOAD_LANES sequence.
>
> gcc/testsuite/
>         * gcc.target/aarch64/store_lane_spill_1.c: New test.
>         * gcc.target/aarch64/sve/store_lane_spill_1.c: Likewise.
>

> Index: gcc/cfgexpand.c
> ===================================================================
> --- gcc/cfgexpand.c     2018-05-08 09:42:02.974668379 +0100
> +++ gcc/cfgexpand.c     2018-05-08 14:23:25.039856499 +0100
> @@ -3582,6 +3582,20 @@ expand_return (tree retval, tree bounds)
>      }
>  }
>
> +/* Expand a clobber of LHS.  If LHS is stored in a register, tell
> +   the rtl optimizers that its value is no longer needed.  */
> +
> +static void
> +expand_clobber (tree lhs)
> +{
> +  if (DECL_P (lhs))
> +    {
> +      rtx decl_rtl = DECL_RTL_IF_SET (lhs);
> +      if (decl_rtl && REG_P (decl_rtl))
> +       emit_clobber (decl_rtl);
> +    }
> +}
> +
>  /* A subroutine of expand_gimple_stmt, expanding one gimple statement
>     STMT that doesn't require special handling for outgoing edges.  That
>     is no tailcalls and no GIMPLE_COND.  */
> @@ -3687,7 +3701,7 @@ expand_gimple_stmt_1 (gimple *stmt)
>             if (TREE_CLOBBER_P (rhs))
>               /* This is a clobber to mark the going out of scope for
>                  this LHS.  */
> -             ;
> +             expand_clobber (lhs);
>             else
>               expand_assignment (lhs, rhs,
>                                  gimple_assign_nontemporal_move_p (
> Index: gcc/tree-vect-stmts.c
> ===================================================================
> --- gcc/tree-vect-stmts.c       2018-05-08 09:42:03.335655127 +0100
> +++ gcc/tree-vect-stmts.c       2018-05-08 14:23:25.040856464 +0100
> @@ -182,6 +182,18 @@ create_array_ref (tree type, tree ptr, t
>    return mem_ref;
>  }
>
> +/* Add a clobber of variable VAR to the vectorization of STMT.
> +   Emit the clobber before *GSI.  */
> +
> +static void
> +vect_clobber_variable (gimple *stmt, gimple_stmt_iterator *gsi, tree var)
> +{
> +  tree clobber = build_constructor (TREE_TYPE (var), NULL);
> +  TREE_THIS_VOLATILE (clobber) = 1;

There's now a helper for this - build_clobber.
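
For reference, a minimal sketch of the helper rewritten to use
build_clobber (assuming its single-argument form, which returns an
empty CONSTRUCTOR with TREE_THIS_VOLATILE already set):

  static void
  vect_clobber_variable (gimple *stmt, gimple_stmt_iterator *gsi, tree var)
  {
    tree clobber = build_clobber (TREE_TYPE (var));
    gimple *new_stmt = gimple_build_assign (var, clobber);
    vect_finish_stmt_generation (stmt, new_stmt, gsi);
  }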

> +  gimple *new_stmt = gimple_build_assign (var, clobber);
> +  vect_finish_stmt_generation (stmt, new_stmt, gsi);
> +}
> +
>  /* Utility functions used by vect_mark_stmts_to_be_vectorized.  */
>
>  /* Function vect_mark_relevant.
> @@ -4128,12 +4140,7 @@ vectorizable_simd_clone_call (gimple *st
>                 }
>
>               if (ratype)
> -               {
> -                 tree clobber = build_constructor (ratype, NULL);
> -                 TREE_THIS_VOLATILE (clobber) = 1;
> -                 new_stmt = gimple_build_assign (new_temp, clobber);
> -                 vect_finish_stmt_generation (stmt, new_stmt, gsi);
> -               }
> +               vect_clobber_variable (stmt, gsi, new_temp);
>               continue;
>             }
>           else if (simd_clone_subparts (vectype) > nunits)
> @@ -4156,10 +4163,7 @@ vectorizable_simd_clone_call (gimple *st
>                       CONSTRUCTOR_APPEND_ELT (ret_ctor_elts, NULL_TREE,
>                                               gimple_assign_lhs (new_stmt));
>                     }
> -                 tree clobber = build_constructor (ratype, NULL);
> -                 TREE_THIS_VOLATILE (clobber) = 1;
> -                 new_stmt = gimple_build_assign (new_temp, clobber);
> -                 vect_finish_stmt_generation (stmt, new_stmt, gsi);
> +                 vect_clobber_variable (stmt, gsi, new_temp);
>                 }
>               else
>                 CONSTRUCTOR_APPEND_ELT (ret_ctor_elts, NULL_TREE, new_temp);
> @@ -4186,11 +4190,7 @@ vectorizable_simd_clone_call (gimple *st
>               new_stmt
>                 = gimple_build_assign (make_ssa_name (vec_dest), t);
>               vect_finish_stmt_generation (stmt, new_stmt, gsi);
> -             tree clobber = build_constructor (ratype, NULL);
> -             TREE_THIS_VOLATILE (clobber) = 1;
> -             vect_finish_stmt_generation (stmt,
> -                                          gimple_build_assign (new_temp,
> -                                                               clobber), gsi);
> +             vect_clobber_variable (stmt, gsi, new_temp);
>             }
>         }
>
> @@ -6913,8 +6913,15 @@ vectorizable_store (gimple *stmt, gimple
>         {
>           tree vec_array;
>
> -         /* Combine all the vectors into an array.  */
> +         /* Get an array into which we can store the individual vectors.  */
>           vec_array = create_vector_array (vectype, vec_num);
> +
> +         /* Invalidate the current contents of VEC_ARRAY.  This should
> +            become an RTL clobber too, which prevents the vector registers
> +            from being upward-exposed.  */
> +         vect_clobber_variable (stmt, gsi, vec_array);
> +
> +         /* Store the individual vectors into the array.  */
>           for (i = 0; i < vec_num; i++)
>             {
>               vec_oprnd = dr_chain[i];
> @@ -6953,6 +6960,9 @@ vectorizable_store (gimple *stmt, gimple
>           gimple_call_set_nothrow (call, true);
>           new_stmt = call;
>           vect_finish_stmt_generation (stmt, new_stmt, gsi);
> +
> +         /* Record that VEC_ARRAY is now dead.  */
> +         vect_clobber_variable (stmt, gsi, vec_array);
>         }
>        else
>         {
> @@ -8105,6 +8115,9 @@ vectorizable_load (gimple *stmt, gimple_
>
>           /* Record the mapping between SSA_NAMEs and statements.  */
>           vect_record_grouped_load_vectors (stmt, dr_chain);
> +
> +         /* Record that VEC_ARRAY is now dead.  */
> +         vect_clobber_variable (stmt, gsi, vec_array);
>         }
>        else
>         {
> Index: gcc/testsuite/gcc.target/aarch64/store_lane_spill_1.c
> ===================================================================
> --- /dev/null   2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.target/aarch64/store_lane_spill_1.c       2018-05-08 14:23:25.039856499 +0100
> @@ -0,0 +1,21 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ftree-vectorize" } */
> +
> +#pragma GCC target "+nosve"
> +
> +int cont (void);
> +
> +void
> +f (int (*x)[3], int *a, int *b, int *c, int n)
> +{
> +  do
> +    for (int i = 0; i < n; ++i)
> +      {
> +       x[i][0] = a[i] + 1;
> +       x[i][1] = b[i] + 2;
> +       x[i][2] = c[i] + 3;
> +      }
> +  while (cont ());
> +}
> +
> +/* { dg-final { scan-assembler-not {\tst1\t} } } */
> Index: gcc/testsuite/gcc.target/aarch64/sve/store_lane_spill_1.c
> ===================================================================
> --- /dev/null   2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.target/aarch64/sve/store_lane_spill_1.c   2018-05-08 14:23:25.040856464 +0100
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ftree-vectorize" } */
> +
> +int cont (void);
> +
> +void
> +f (int (*x)[3], int *a, int *b, int *c, int n)
> +{
> +  do
> +    for (int i = 0; i < n; ++i)
> +      {
> +       x[i][0] = a[i] + 1;
> +       x[i][1] = b[i] + 2;
> +       x[i][2] = c[i] + 3;
> +      }
> +  while (cont ());
> +}
> +
> +/* { dg-final { scan-assembler-not {\tstr\tz[0-9]} } } */

Richard Sandiford May 8, 2018, 3:56 p.m. UTC | #2
Richard Biener <richard.guenther@gmail.com> writes:
> On Tue, May 8, 2018 at 3:25 PM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> We build up the input to IFN_STORE_LANES one vector at a time.
>> In RTL, each of these vector assignments becomes a write to
>> subregs of the form (subreg:VEC (reg:AGGR R)), where R is the
>> eventual input to the store lanes instruction.  The problem is
>> that RTL isn't very good at tracking liveness when things are
>> initialised piecemeal by subregs, so R tends to end up being
>> live on all paths from the entry block to the store.  This in
>> turn leads to unnecessary spilling around calls, as well as to
>> excess register pressure in vector loops.
>>
>> This patch adds gimple clobbers to indicate the liveness of the
>> IFN_STORE_LANES variable and makes sure that gimple clobbers are
>> expanded to rtl clobbers where useful.  For consistency it also
>> uses clobbers to mark the point at which an IFN_LOAD_LANES
>> variable is no longer needed.
>>
>> Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf
>> and x86_64-linux-gnu.  OK to install?
>
> Minor comment inline.

Thanks, fixed.

> Also it looks like clobbers are at the moment all thrown away during
> RTL expansion?  Do the clobbers we generate with this patch eventually
> get collected somehow if they turn out to be no longer necessary?
> How many of them do we generate?  I expect not many decls get
> expanded to registers, and if they are, most of them are likely
> not constructed piecemeal - thus, maybe we should restrict ourselves
> to non-scalar typed lhs?  So, change it to
>
>   if (DECL_P (lhs)
>       && (AGGREGATE_TYPE_P (TREE_TYPE (lhs)) // but what about
> single-element aggregates?
>              || VECTOR_TYPE_P (TREE_TYPE (lhs))
>              || COMPLEX_TYPE_P (TREE_TYPE (lhs))))

How about instead deciding based on whether the pseudo register spans a
single hard register or multiple hard registers, as per the patch below?
The clobber is only useful if the pseudo register can be partially
modified via subregs.

This could potentially also help with any large single-element
aggregates that get broken down into word-sized subreg ops.
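
(As a worked example, with illustrative sizes that assume the target's
REGMODE_NATURAL_SIZE for these modes is the 8-byte word size: a
two-vector store-lanes aggregate has a GET_MODE_SIZE of 32 bytes, and
maybe_gt (32, 8) holds, so its clobber survives to RTL; a DImode
scalar has size 8, maybe_gt (8, 8) is false, and no RTL clobber is
emitted for it.)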

> The vectorizer part is ok with the minor adjustment pointed out below.  Maybe
> you want to split this patch while we discuss the RTL bits.

OK, thanks.  I'll keep it as one patch for applying purposes,
but snipped the approved bits below.

Richard


2018-05-08  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	* cfgexpand.c (expand_clobber): New function.
	(expand_gimple_stmt_1): Use it.

Index: gcc/cfgexpand.c
===================================================================
--- gcc/cfgexpand.c	2018-05-08 16:50:31.815501502 +0100
+++ gcc/cfgexpand.c	2018-05-08 16:50:31.997495804 +0100
@@ -3582,6 +3582,26 @@ expand_return (tree retval, tree bounds)
     }
 }
 
+/* Expand a clobber of LHS.  If LHS is stored in a multi-part
+   register, tell the rtl optimizers that its value is no longer
+   needed.  */
+
+static void
+expand_clobber (tree lhs)
+{
+  if (DECL_P (lhs))
+    {
+      rtx decl_rtl = DECL_RTL_IF_SET (lhs);
+      if (decl_rtl && REG_P (decl_rtl))
+	{
+	  machine_mode decl_mode = GET_MODE (decl_rtl);
+	  if (maybe_gt (GET_MODE_SIZE (decl_mode),
+			REGMODE_NATURAL_SIZE (decl_mode)))
+	    emit_clobber (decl_rtl);
+	}
+    }
+}
+
 /* A subroutine of expand_gimple_stmt, expanding one gimple statement
    STMT that doesn't require special handling for outgoing edges.  That
    is no tailcalls and no GIMPLE_COND.  */
@@ -3687,7 +3707,7 @@ expand_gimple_stmt_1 (gimple *stmt)
 	    if (TREE_CLOBBER_P (rhs))
 	      /* This is a clobber to mark the going out of scope for
 		 this LHS.  */
-	      ;
+	      expand_clobber (lhs);
 	    else
 	      expand_assignment (lhs, rhs,
 				 gimple_assign_nontemporal_move_p (
Richard Biener May 9, 2018, 9:50 a.m. UTC | #3
On Tue, May 8, 2018 at 5:56 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Richard Biener <richard.guenther@gmail.com> writes:
>> On Tue, May 8, 2018 at 3:25 PM, Richard Sandiford
>> <richard.sandiford@linaro.org> wrote:
>>> We build up the input to IFN_STORE_LANES one vector at a time.
>>> In RTL, each of these vector assignments becomes a write to
>>> subregs of the form (subreg:VEC (reg:AGGR R)), where R is the
>>> eventual input to the store lanes instruction.  The problem is
>>> that RTL isn't very good at tracking liveness when things are
>>> initialised piecemeal by subregs, so R tends to end up being
>>> live on all paths from the entry block to the store.  This in
>>> turn leads to unnecessary spilling around calls, as well as to
>>> excess register pressure in vector loops.
>>>
>>> This patch adds gimple clobbers to indicate the liveness of the
>>> IFN_STORE_LANES variable and makes sure that gimple clobbers are
>>> expanded to rtl clobbers where useful.  For consistency it also
>>> uses clobbers to mark the point at which an IFN_LOAD_LANES
>>> variable is no longer needed.
>>>
>>> Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf
>>> and x86_64-linux-gnu.  OK to install?
>>
>> Minor comment inline.
>
> Thanks, fixed.
>
>> Also it looks like clobbers are at the moment all thrown away during
>> RTL expansion?  Do the clobbers we generate with this patch eventually
>> get collected somehow if they turn out to be no longer necessary?
>> How many of them do we generate?  I expect not many decls get
>> expanded to registers, and if they are, most of them are likely
>> not constructed piecemeal - thus, maybe we should restrict ourselves
>> to non-scalar typed lhs?  So, change it to
>>
>>   if (DECL_P (lhs)
>>       && (AGGREGATE_TYPE_P (TREE_TYPE (lhs)) // but what about
>> single-element aggregates?
>>              || VECTOR_TYPE_P (TREE_TYPE (lhs))
>>              || COMPLEX_TYPE_P (TREE_TYPE (lhs))))
>
> How about instead deciding based on whether the pseudo register spans a
> single hard register or multiple hard registers, as per the patch below?
> The clobber is only useful if the pseudo register can be partially
> modified via subregs.
>
> This could potentially also help with any large single-element
> aggregates that get broken down into word-sized subreg ops.

Yeah, that looks even better.

>> The vectorizer part is ok with the minor adjustment pointed out below.  Maybe
>> you want to split this patch while we discuss the RTL bits.
>
> OK, thanks.  I'll keep it as one patch for applying purposes,
> but snipped the approved bits below.

This version is ok.

Thanks,
Richard.

> Richard
>
>
> 2018-05-08  Richard Sandiford  <richard.sandiford@linaro.org>
>
> gcc/
>         * cfgexpand.c (expand_clobber): New function.
>         (expand_gimple_stmt_1): Use it.
>
> Index: gcc/cfgexpand.c
> ===================================================================
> --- gcc/cfgexpand.c     2018-05-08 16:50:31.815501502 +0100
> +++ gcc/cfgexpand.c     2018-05-08 16:50:31.997495804 +0100
> @@ -3582,6 +3582,26 @@ expand_return (tree retval, tree bounds)
>      }
>  }
>
> +/* Expand a clobber of LHS.  If LHS is stored in a multi-part
> +   register, tell the rtl optimizers that its value is no longer
> +   needed.  */
> +
> +static void
> +expand_clobber (tree lhs)
> +{
> +  if (DECL_P (lhs))
> +    {
> +      rtx decl_rtl = DECL_RTL_IF_SET (lhs);
> +      if (decl_rtl && REG_P (decl_rtl))
> +       {
> +         machine_mode decl_mode = GET_MODE (decl_rtl);
> +         if (maybe_gt (GET_MODE_SIZE (decl_mode),
> +                       REGMODE_NATURAL_SIZE (decl_mode)))
> +           emit_clobber (decl_rtl);
> +       }
> +    }
> +}
> +
>  /* A subroutine of expand_gimple_stmt, expanding one gimple statement
>     STMT that doesn't require special handling for outgoing edges.  That
>     is no tailcalls and no GIMPLE_COND.  */
> @@ -3687,7 +3707,7 @@ expand_gimple_stmt_1 (gimple *stmt)
>             if (TREE_CLOBBER_P (rhs))
>               /* This is a clobber to mark the going out of scope for
>                  this LHS.  */
> -             ;
> +             expand_clobber (lhs);
>             else
>               expand_assignment (lhs, rhs,
>                                  gimple_assign_nontemporal_move_p (

Patch

Index: gcc/cfgexpand.c
===================================================================
--- gcc/cfgexpand.c	2018-05-08 09:42:02.974668379 +0100
+++ gcc/cfgexpand.c	2018-05-08 14:23:25.039856499 +0100
@@ -3582,6 +3582,20 @@  expand_return (tree retval, tree bounds)
     }
 }
 
+/* Expand a clobber of LHS.  If LHS is stored in a register, tell
+   the rtl optimizers that its value is no longer needed.  */
+
+static void
+expand_clobber (tree lhs)
+{
+  if (DECL_P (lhs))
+    {
+      rtx decl_rtl = DECL_RTL_IF_SET (lhs);
+      if (decl_rtl && REG_P (decl_rtl))
+	emit_clobber (decl_rtl);
+    }
+}
+
 /* A subroutine of expand_gimple_stmt, expanding one gimple statement
    STMT that doesn't require special handling for outgoing edges.  That
    is no tailcalls and no GIMPLE_COND.  */
@@ -3687,7 +3701,7 @@  expand_gimple_stmt_1 (gimple *stmt)
 	    if (TREE_CLOBBER_P (rhs))
 	      /* This is a clobber to mark the going out of scope for
 		 this LHS.  */
-	      ;
+	      expand_clobber (lhs);
 	    else
 	      expand_assignment (lhs, rhs,
 				 gimple_assign_nontemporal_move_p (
Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	2018-05-08 09:42:03.335655127 +0100
+++ gcc/tree-vect-stmts.c	2018-05-08 14:23:25.040856464 +0100
@@ -182,6 +182,18 @@  create_array_ref (tree type, tree ptr, t
   return mem_ref;
 }
 
+/* Add a clobber of variable VAR to the vectorization of STMT.
+   Emit the clobber before *GSI.  */
+
+static void
+vect_clobber_variable (gimple *stmt, gimple_stmt_iterator *gsi, tree var)
+{
+  tree clobber = build_constructor (TREE_TYPE (var), NULL);
+  TREE_THIS_VOLATILE (clobber) = 1;
+  gimple *new_stmt = gimple_build_assign (var, clobber);
+  vect_finish_stmt_generation (stmt, new_stmt, gsi);
+}
+
 /* Utility functions used by vect_mark_stmts_to_be_vectorized.  */
 
 /* Function vect_mark_relevant.
@@ -4128,12 +4140,7 @@  vectorizable_simd_clone_call (gimple *st
 		}
 
 	      if (ratype)
-		{
-		  tree clobber = build_constructor (ratype, NULL);
-		  TREE_THIS_VOLATILE (clobber) = 1;
-		  new_stmt = gimple_build_assign (new_temp, clobber);
-		  vect_finish_stmt_generation (stmt, new_stmt, gsi);
-		}
+		vect_clobber_variable (stmt, gsi, new_temp);
 	      continue;
 	    }
 	  else if (simd_clone_subparts (vectype) > nunits)
@@ -4156,10 +4163,7 @@  vectorizable_simd_clone_call (gimple *st
 		      CONSTRUCTOR_APPEND_ELT (ret_ctor_elts, NULL_TREE,
 					      gimple_assign_lhs (new_stmt));
 		    }
-		  tree clobber = build_constructor (ratype, NULL);
-		  TREE_THIS_VOLATILE (clobber) = 1;
-		  new_stmt = gimple_build_assign (new_temp, clobber);
-		  vect_finish_stmt_generation (stmt, new_stmt, gsi);
+		  vect_clobber_variable (stmt, gsi, new_temp);
 		}
 	      else
 		CONSTRUCTOR_APPEND_ELT (ret_ctor_elts, NULL_TREE, new_temp);
@@ -4186,11 +4190,7 @@  vectorizable_simd_clone_call (gimple *st
 	      new_stmt
 		= gimple_build_assign (make_ssa_name (vec_dest), t);
 	      vect_finish_stmt_generation (stmt, new_stmt, gsi);
-	      tree clobber = build_constructor (ratype, NULL);
-	      TREE_THIS_VOLATILE (clobber) = 1;
-	      vect_finish_stmt_generation (stmt,
-					   gimple_build_assign (new_temp,
-								clobber), gsi);
+	      vect_clobber_variable (stmt, gsi, new_temp);
 	    }
 	}
 
@@ -6913,8 +6913,15 @@  vectorizable_store (gimple *stmt, gimple
 	{
 	  tree vec_array;
 
-	  /* Combine all the vectors into an array.  */
+	  /* Get an array into which we can store the individual vectors.  */
 	  vec_array = create_vector_array (vectype, vec_num);
+
+	  /* Invalidate the current contents of VEC_ARRAY.  This should
+	     become an RTL clobber too, which prevents the vector registers
+	     from being upward-exposed.  */
+	  vect_clobber_variable (stmt, gsi, vec_array);
+
+	  /* Store the individual vectors into the array.  */
 	  for (i = 0; i < vec_num; i++)
 	    {
 	      vec_oprnd = dr_chain[i];
@@ -6953,6 +6960,9 @@  vectorizable_store (gimple *stmt, gimple
 	  gimple_call_set_nothrow (call, true);
 	  new_stmt = call;
 	  vect_finish_stmt_generation (stmt, new_stmt, gsi);
+
+	  /* Record that VEC_ARRAY is now dead.  */
+	  vect_clobber_variable (stmt, gsi, vec_array);
 	}
       else
 	{
@@ -8105,6 +8115,9 @@  vectorizable_load (gimple *stmt, gimple_
 
 	  /* Record the mapping between SSA_NAMEs and statements.  */
 	  vect_record_grouped_load_vectors (stmt, dr_chain);
+
+	  /* Record that VEC_ARRAY is now dead.  */
+	  vect_clobber_variable (stmt, gsi, vec_array);
 	}
       else
 	{
Index: gcc/testsuite/gcc.target/aarch64/store_lane_spill_1.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.target/aarch64/store_lane_spill_1.c	2018-05-08 14:23:25.039856499 +0100
@@ -0,0 +1,21 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#pragma GCC target "+nosve"
+
+int cont (void);
+
+void
+f (int (*x)[3], int *a, int *b, int *c, int n)
+{
+  do
+    for (int i = 0; i < n; ++i)
+      {
+	x[i][0] = a[i] + 1;
+	x[i][1] = b[i] + 2;
+	x[i][2] = c[i] + 3;
+      }
+  while (cont ());
+}
+
+/* { dg-final { scan-assembler-not {\tst1\t} } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/store_lane_spill_1.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/store_lane_spill_1.c	2018-05-08 14:23:25.040856464 +0100
@@ -0,0 +1,19 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+int cont (void);
+
+void
+f (int (*x)[3], int *a, int *b, int *c, int n)
+{
+  do
+    for (int i = 0; i < n; ++i)
+      {
+	x[i][0] = a[i] + 1;
+	x[i][1] = b[i] + 2;
+	x[i][2] = c[i] + 3;
+      }
+  while (cont ());
+}
+
+/* { dg-final { scan-assembler-not {\tstr\tz[0-9]} } } */