[SLP/AArch64] Fix unpack handling for big-endian SVE

Message ID 87h8r8og4o.fsf@linaro.org
State New
Headers show
Series
  • [SLP/AArch64] Fix unpack handling for big-endian SVE
Related show

Commit Message

Richard Sandiford Jan. 26, 2018, 2:21 p.m.
I hadn't realised that on big-endian targets, VEC_UNPACK*HI_EXPR unpacks
the low-numbered lanes and VEC_UNPACK*LO_EXPR unpacks the high-numbered
lanes.  This meant that both the SVE patterns and the handling of
fully-masked loops were wrong.

The patch deals with that by making sure that all vec_unpack* optabs
are define_expands, using BYTES_BIG_ENDIAN to choose the appropriate
define_insn.  This in turn meant that we can get rid of the duplication
between the signed and unsigned patterns for predicates.  (We provide
implementations of both the signed and unsigned optabs because the sign
doesn't matter for predicates: every element contains only one
significant bit.)

Also, the float unpacks need to unpack one half of the input vector,
but the unpacked upper bits are "don't care".  There are two obvious
ways of handling that: use an unpack (filling with zeros) or use a ZIP
(filling with a duplicate of the low bits).  The code previously used
unpacks, but the sequence involved a subreg that is semantically an
element reverse on big-endian targets.  Using the ZIP patterns avoids
that, and at the moment there's no reason to prefer one over the other
for performance reasons, so the patch switches to ZIP unconditionally.
As the comment says, it would be easy to optimise this later if UUNPK
turns out to be better for some implementations.

Tested on aarch64_be-elf, aarch64-linux-gnu and x86_64-linux-gnu.
I think the tree-vect-loop-manip.c part is obvious, but OK for the
AArch64 bits?

Richard


2018-01-26  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	* tree-vect-loop-manip.c (vect_maybe_permute_loop_masks):
	Reverse the choice between VEC_UNPACK_LO_EXPR and VEC_UNPACK_HI_EXPR
	for big-endian.
	* config/aarch64/iterators.md (hi_lanes_optab): New int attribute.
	* config/aarch64/aarch64-sve.md
	(*aarch64_sve_<perm_insn><perm_hilo><mode>): Rename to...
	(aarch64_sve_<perm_insn><perm_hilo><mode>): ...this.
	(*extend<mode><Vwide>2): Rename to...
	(aarch64_sve_extend<mode><Vwide>2): ...this.
	(vec_unpack<su>_<perm_hilo>_<mode>): Turn into a define_expand,
	renaming the old pattern to...
	(aarch64_sve_punpk<perm_hilo>_<mode>): ...this.  Only define
	unsigned packs.
	(vec_unpack<su>_<perm_hilo>_<SVE_BHSI:mode>): Turn into a
	define_expand, renaming the old pattern to...
	(aarch64_sve_<su>unpk<perm_hilo>_<SVE_BHSI:mode>): ...this.
	(*vec_unpacku_<perm_hilo>_<mode>_no_convert): Delete.
	(vec_unpacks_<perm_hilo>_<mode>): Take BYTES_BIG_ENDIAN into
	account when deciding which SVE instruction the optab should use.
	(vec_unpack<su_optab>_float_<perm_hilo>_vnx4si): Likewise.

gcc/testsuite/
	* gcc.target/aarch64/sve/unpack_fcvt_signed_1.c: Expect zips rather
	than unpacks.
	* gcc.target/aarch64/sve/unpack_fcvt_unsigned_1.c: Likewise.
	* gcc.target/aarch64/sve/unpack_float_1.c: Likewise.

Comments

Richard Biener Jan. 26, 2018, 2:58 p.m. | #1
On Fri, Jan 26, 2018 at 3:21 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> I hadn't realised that on big-endian targets, VEC_UNPACK*HI_EXPR unpacks

> the low-numbered lanes and VEC_UNPACK*LO_EXPR unpacks the high-numbered

> lanes.  This meant that both the SVE patterns and the handling of

> fully-masked loops were wrong.

>

> The patch deals with that by making sure that all vec_unpack* optabs

> are define_expands, using BYTES_BIG_ENDIAN to choose the appropriate

> define_insn.  This in turn meant that we can get rid of the duplication

> between the signed and unsigned patterns for predicates.  (We provide

> implementations of both the signed and unsigned optabs because the sign

> doesn't matter for predicates: every element contains only one

> significant bit.)

>

> Also, the float unpacks need to unpack one half of the input vector,

> but the unpacked upper bits are "don't care".  There are two obvious

> ways of handling that: use an unpack (filling with zeros) or use a ZIP

> (filling with a duplicate of the low bits).  The code previously used

> unpacks, but the sequence involved a subreg that is semantically an

> element reverse on big-endian targets.  Using the ZIP patterns avoids

> that, and at the moment there's no reason to prefer one over the other

> for performance reasons, so the patch switches to ZIP unconditionally.

> As the comment says, it would be easy to optimise this later if UUNPK

> turns out to be better for some implementations.

>

> Tested on aarch64_be-elf, aarch64-linux-gnu and x86_64-linux-gnu.

> I think the tree-vect-loop-manip.c part is obvious, but OK for the

> AArch64 bits?

>

> Richard

>

>

> 2018-01-26  Richard Sandiford  <richard.sandiford@linaro.org>

>

> gcc/

>         * tree-vect-loop-manip.c (vect_maybe_permute_loop_masks):

>         Reverse the choice between VEC_UNPACK_LO_EXPR and VEC_UNPACK_HI_EXPR

>         for big-endian.

>         * config/aarch64/iterators.md (hi_lanes_optab): New int attribute.

>         * config/aarch64/aarch64-sve.md

>         (*aarch64_sve_<perm_insn><perm_hilo><mode>): Rename to...

>         (aarch64_sve_<perm_insn><perm_hilo><mode>): ...this.

>         (*extend<mode><Vwide>2): Rename to...

>         (aarch64_sve_extend<mode><Vwide>2): ...this.

>         (vec_unpack<su>_<perm_hilo>_<mode>): Turn into a define_expand,

>         renaming the old pattern to...

>         (aarch64_sve_punpk<perm_hilo>_<mode>): ...this.  Only define

>         unsigned packs.

>         (vec_unpack<su>_<perm_hilo>_<SVE_BHSI:mode>): Turn into a

>         define_expand, renaming the old pattern to...

>         (aarch64_sve_<su>unpk<perm_hilo>_<SVE_BHSI:mode>): ...this.

>         (*vec_unpacku_<perm_hilo>_<mode>_no_convert): Delete.

>         (vec_unpacks_<perm_hilo>_<mode>): Take BYTES_BIG_ENDIAN into

>         account when deciding which SVE instruction the optab should use.

>         (vec_unpack<su_optab>_float_<perm_hilo>_vnx4si): Likewise.

>

> gcc/testsuite/

>         * gcc.target/aarch64/sve/unpack_fcvt_signed_1.c: Expect zips rather

>         than unpacks.

>         * gcc.target/aarch64/sve/unpack_fcvt_unsigned_1.c: Likewise.

>         * gcc.target/aarch64/sve/unpack_float_1.c: Likewise.

>

> Index: gcc/tree-vect-loop-manip.c

> ===================================================================

> --- gcc/tree-vect-loop-manip.c  2018-01-13 18:02:00.948360274 +0000

> +++ gcc/tree-vect-loop-manip.c  2018-01-26 14:00:15.717916957 +0000

> @@ -334,7 +334,8 @@ vect_maybe_permute_loop_masks (gimple_se

>         {

>           tree src = src_rgm->masks[i / 2];

>           tree dest = dest_rgm->masks[i];

> -         tree_code code = (i & 1 ? VEC_UNPACK_HI_EXPR

> +         tree_code code = ((i & 1) == (BYTES_BIG_ENDIAN ? 0 : 1)

> +                           ? VEC_UNPACK_HI_EXPR

>                             : VEC_UNPACK_LO_EXPR);


This looks broken anyways -- the element oder on GIMPLE is the same
for little and big-endian.  I see two other BYTES_BIG_ENDIAN uses in
tree-vect-*, not sure when they crept in but they are all broken.

Richard.

>           gassign *stmt;

>           if (dest_masktype == unpack_masktype)

> Index: gcc/config/aarch64/iterators.md

> ===================================================================

> --- gcc/config/aarch64/iterators.md     2018-01-13 18:01:26.106685901 +0000

> +++ gcc/config/aarch64/iterators.md     2018-01-26 14:00:15.716916999 +0000

> @@ -1674,6 +1674,15 @@ (define_int_attr perm_hilo [(UNSPEC_ZIP1

>                             (UNSPEC_UNPACKSHI "hi") (UNSPEC_UNPACKUHI "hi")

>                             (UNSPEC_UNPACKSLO "lo") (UNSPEC_UNPACKULO "lo")])

>

> +;; Return true if the associated optab refers to the high-numbered lanes,

> +;; false if it refers to the low-numbered lanes.  The convention is for

> +;; "hi" to refer to the low-numbered lanes (the first ones in memory)

> +;; for big-endian.

> +(define_int_attr hi_lanes_optab [(UNSPEC_UNPACKSHI "!BYTES_BIG_ENDIAN")

> +                                (UNSPEC_UNPACKUHI "!BYTES_BIG_ENDIAN")

> +                                (UNSPEC_UNPACKSLO "BYTES_BIG_ENDIAN")

> +                                (UNSPEC_UNPACKULO "BYTES_BIG_ENDIAN")])

> +

>  (define_int_attr frecp_suffix  [(UNSPEC_FRECPE "e") (UNSPEC_FRECPX "x")])

>

>  (define_int_attr crc_variant [(UNSPEC_CRC32B "crc32b") (UNSPEC_CRC32H "crc32h")

> Index: gcc/config/aarch64/aarch64-sve.md

> ===================================================================

> --- gcc/config/aarch64/aarch64-sve.md   2018-01-26 13:55:12.277219211 +0000

> +++ gcc/config/aarch64/aarch64-sve.md   2018-01-26 14:00:15.716916999 +0000

> @@ -817,7 +817,7 @@ (define_insn "*aarch64_sve_<perm_insn><p

>    "<perm_insn><perm_hilo>\t%0.<Vetype>, %1.<Vetype>, %2.<Vetype>"

>  )

>

> -(define_insn "*aarch64_sve_<perm_insn><perm_hilo><mode>"

> +(define_insn "aarch64_sve_<perm_insn><perm_hilo><mode>"

>    [(set (match_operand:SVE_ALL 0 "register_operand" "=w")

>         (unspec:SVE_ALL [(match_operand:SVE_ALL 1 "register_operand" "w")

>                          (match_operand:SVE_ALL 2 "register_operand" "w")]

> @@ -2156,7 +2156,7 @@ (define_insn "*<optab><mode>vnx4sf2"

>  )

>

>  ;; Conversion of DI or SI to DF, predicated with a PTRUE.

> -(define_insn "*<optab><mode>vnx2df2"

> +(define_insn "aarch64_sve_<optab><mode>vnx2df2"

>    [(set (match_operand:VNx2DF 0 "register_operand" "=w")

>         (unspec:VNx2DF

>           [(match_operand:VNx2BI 1 "register_operand" "Upl")

> @@ -2183,7 +2183,7 @@ (define_insn "*trunc<Vwide><mode>2"

>

>  ;; Conversion of SFs to the same number of DFs, or HFs to the same number

>  ;; of SFs.

> -(define_insn "*extend<mode><Vwide>2"

> +(define_insn "aarch64_sve_extend<mode><Vwide>2"

>    [(set (match_operand:<VWIDE> 0 "register_operand" "=w")

>         (unspec:<VWIDE>

>           [(match_operand:<VWIDE_PRED> 1 "register_operand" "Upl")

> @@ -2195,17 +2195,50 @@ (define_insn "*extend<mode><Vwide>2"

>    "fcvt\t%0.<Vewtype>, %1/m, %2.<Vetype>"

>  )

>

> +;; Unpack the low or high half of a predicate, where "high" refers to

> +;; the low-numbered lanes for big-endian and the high-numbered lanes

> +;; for little-endian.

> +(define_expand "vec_unpack<su>_<perm_hilo>_<mode>"

> +  [(match_operand:<VWIDE> 0 "register_operand")

> +   (unspec:<VWIDE> [(match_operand:PRED_BHS 1 "register_operand")]

> +                  UNPACK)]

> +  "TARGET_SVE"

> +  {

> +    emit_insn ((<hi_lanes_optab>

> +               ? gen_aarch64_sve_punpkhi_<PRED_BHS:mode>

> +               : gen_aarch64_sve_punpklo_<PRED_BHS:mode>)

> +              (operands[0], operands[1]));

> +    DONE;

> +  }

> +)

> +

>  ;; PUNPKHI and PUNPKLO.

> -(define_insn "vec_unpack<su>_<perm_hilo>_<mode>"

> +(define_insn "aarch64_sve_punpk<perm_hilo>_<mode>"

>    [(set (match_operand:<VWIDE> 0 "register_operand" "=Upa")

>         (unspec:<VWIDE> [(match_operand:PRED_BHS 1 "register_operand" "Upa")]

> -                       UNPACK))]

> +                       UNPACK_UNSIGNED))]

>    "TARGET_SVE"

>    "punpk<perm_hilo>\t%0.h, %1.b"

>  )

>

> +;; Unpack the low or high half of a vector, where "high" refers to

> +;; the low-numbered lanes for big-endian and the high-numbered lanes

> +;; for little-endian.

> +(define_expand "vec_unpack<su>_<perm_hilo>_<SVE_BHSI:mode>"

> +  [(match_operand:<VWIDE> 0 "register_operand")

> +   (unspec:<VWIDE> [(match_operand:SVE_BHSI 1 "register_operand")] UNPACK)]

> +  "TARGET_SVE"

> +  {

> +    emit_insn ((<hi_lanes_optab>

> +               ? gen_aarch64_sve_<su>unpkhi_<SVE_BHSI:mode>

> +               : gen_aarch64_sve_<su>unpklo_<SVE_BHSI:mode>)

> +              (operands[0], operands[1]));

> +    DONE;

> +  }

> +)

> +

>  ;; SUNPKHI, UUNPKHI, SUNPKLO and UUNPKLO.

> -(define_insn "vec_unpack<su>_<perm_hilo>_<SVE_BHSI:mode>"

> +(define_insn "aarch64_sve_<su>unpk<perm_hilo>_<SVE_BHSI:mode>"

>    [(set (match_operand:<VWIDE> 0 "register_operand" "=w")

>         (unspec:<VWIDE> [(match_operand:SVE_BHSI 1 "register_operand" "w")]

>                         UNPACK))]

> @@ -2213,32 +2246,28 @@ (define_insn "vec_unpack<su>_<perm_hilo>

>    "<su>unpk<perm_hilo>\t%0.<Vewtype>, %1.<Vetype>"

>  )

>

> -;; Used by the vec_unpacks_<perm_hilo>_<mode> expander to unpack the bit

> -;; representation of a VNx4SF or VNx8HF without conversion.  The choice

> -;; between signed and unsigned isn't significant.

> -(define_insn "*vec_unpacku_<perm_hilo>_<mode>_no_convert"

> -  [(set (match_operand:SVE_HSF 0 "register_operand" "=w")

> -       (unspec:SVE_HSF [(match_operand:SVE_HSF 1 "register_operand" "w")]

> -                       UNPACK_UNSIGNED))]

> -  "TARGET_SVE"

> -  "uunpk<perm_hilo>\t%0.<Vewtype>, %1.<Vetype>"

> -)

> -

>  ;; Unpack one half of a VNx4SF to VNx2DF, or one half of a VNx8HF to VNx4SF.

>  ;; First unpack the source without conversion, then float-convert the

>  ;; unpacked source.

>  (define_expand "vec_unpacks_<perm_hilo>_<mode>"

> -  [(set (match_dup 2)

> -       (unspec:SVE_HSF [(match_operand:SVE_HSF 1 "register_operand")]

> -                       UNPACK_UNSIGNED))

> -   (set (match_operand:<VWIDE> 0 "register_operand")

> -       (unspec:<VWIDE> [(match_dup 3)

> -                        (unspec:<VWIDE> [(match_dup 2)] UNSPEC_FLOAT_CONVERT)]

> -                       UNSPEC_MERGE_PTRUE))]

> -  "TARGET_SVE"

> -  {

> -    operands[2] = gen_reg_rtx (<MODE>mode);

> -    operands[3] = force_reg (<VWIDE_PRED>mode, CONSTM1_RTX (<VWIDE_PRED>mode));

> +  [(match_operand:<VWIDE> 0 "register_operand")

> +   (unspec:SVE_HSF [(match_operand:SVE_HSF 1 "register_operand")]

> +                  UNPACK_UNSIGNED)]

> +  "TARGET_SVE"

> +  {

> +    /* Use ZIP to do the unpack, since we don't care about the upper halves

> +       and since it has the nice property of not needing any subregs.

> +       If using UUNPK* turns out to be preferable, we could model it as

> +       a ZIP whose first operand is zero.  */

> +    rtx temp = gen_reg_rtx (<MODE>mode);

> +    emit_insn ((<hi_lanes_optab>

> +               ? gen_aarch64_sve_zip2<mode>

> +               : gen_aarch64_sve_zip1<mode>)

> +               (temp, operands[1], operands[1]));

> +    rtx ptrue = force_reg (<VWIDE_PRED>mode, CONSTM1_RTX (<VWIDE_PRED>mode));

> +    emit_insn (gen_aarch64_sve_extend<mode><Vwide>2 (operands[0],

> +                                                    ptrue, temp));

> +    DONE;

>    }

>  )

>

> @@ -2246,18 +2275,25 @@ (define_expand "vec_unpacks_<perm_hilo>_

>  ;; to VNx2DI, reinterpret the VNx2DI as a VNx4SI, then convert the

>  ;; unpacked VNx4SI to VNx2DF.

>  (define_expand "vec_unpack<su_optab>_float_<perm_hilo>_vnx4si"

> -  [(set (match_dup 2)

> -       (unspec:VNx2DI [(match_operand:VNx4SI 1 "register_operand")]

> -                      UNPACK_UNSIGNED))

> -   (set (match_operand:VNx2DF 0 "register_operand")

> -       (unspec:VNx2DF [(match_dup 3)

> -                       (FLOATUORS:VNx2DF (match_dup 4))]

> -                      UNSPEC_MERGE_PTRUE))]

> -  "TARGET_SVE"

> -  {

> -    operands[2] = gen_reg_rtx (VNx2DImode);

> -    operands[3] = force_reg (VNx2BImode, CONSTM1_RTX (VNx2BImode));

> -    operands[4] = gen_rtx_SUBREG (VNx4SImode, operands[2], 0);

> +  [(match_operand:VNx2DF 0 "register_operand")

> +   (FLOATUORS:VNx2DF

> +     (unspec:VNx2DI [(match_operand:VNx4SI 1 "register_operand")]

> +                   UNPACK_UNSIGNED))]

> +  "TARGET_SVE"

> +  {

> +    /* Use ZIP to do the unpack, since we don't care about the upper halves

> +       and since it has the nice property of not needing any subregs.

> +       If using UUNPK* turns out to be preferable, we could model it as

> +       a ZIP whose first operand is zero.  */

> +    rtx temp = gen_reg_rtx (VNx4SImode);

> +    emit_insn ((<hi_lanes_optab>

> +               ? gen_aarch64_sve_zip2vnx4si

> +               : gen_aarch64_sve_zip1vnx4si)

> +              (temp, operands[1], operands[1]));

> +    rtx ptrue = force_reg (VNx2BImode, CONSTM1_RTX (VNx2BImode));

> +    emit_insn (gen_aarch64_sve_<FLOATUORS:optab>vnx4sivnx2df2 (operands[0],

> +                                                              ptrue, temp));

> +    DONE;

>    }

>  )

>

> Index: gcc/testsuite/gcc.target/aarch64/sve/unpack_fcvt_signed_1.c

> ===================================================================

> --- gcc/testsuite/gcc.target/aarch64/sve/unpack_fcvt_signed_1.c 2018-01-13 17:54:52.934269783 +0000

> +++ gcc/testsuite/gcc.target/aarch64/sve/unpack_fcvt_signed_1.c 2018-01-26 14:00:15.716916999 +0000

> @@ -10,6 +10,6 @@ unpack_double_int_plus8 (double *d, int3

>      d[i] = s[i] + 8;

>  }

>

> -/* { dg-final { scan-assembler-times {\tuunpklo\tz[0-9]+\.d, z[0-9]+\.s\n} 1 } } */

> -/* { dg-final { scan-assembler-times {\tuunpkhi\tz[0-9]+\.d, z[0-9]+\.s\n} 1 } } */

> +/* { dg-final { scan-assembler-times {\tzip1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */

> +/* { dg-final { scan-assembler-times {\tzip2\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */

>  /* { dg-final { scan-assembler-times {\tscvtf\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.s\n} 2 } } */

> Index: gcc/testsuite/gcc.target/aarch64/sve/unpack_fcvt_unsigned_1.c

> ===================================================================

> --- gcc/testsuite/gcc.target/aarch64/sve/unpack_fcvt_unsigned_1.c       2018-01-13 17:54:52.934269783 +0000

> +++ gcc/testsuite/gcc.target/aarch64/sve/unpack_fcvt_unsigned_1.c       2018-01-26 14:00:15.717916957 +0000

> @@ -10,6 +10,6 @@ unpack_double_int_plus9 (double *d, uint

>      d[i] = (double) (s[i] + 9);

>  }

>

> -/* { dg-final { scan-assembler-times {\tuunpklo\tz[0-9]+\.d, z[0-9]+\.s\n} 1 } } */

> -/* { dg-final { scan-assembler-times {\tuunpkhi\tz[0-9]+\.d, z[0-9]+\.s\n} 1 } } */

> +/* { dg-final { scan-assembler-times {\tzip1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */

> +/* { dg-final { scan-assembler-times {\tzip2\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */

>  /* { dg-final { scan-assembler-times {\tucvtf\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.s\n} 2 } } */

> Index: gcc/testsuite/gcc.target/aarch64/sve/unpack_float_1.c

> ===================================================================

> --- gcc/testsuite/gcc.target/aarch64/sve/unpack_float_1.c       2018-01-13 17:54:52.935269745 +0000

> +++ gcc/testsuite/gcc.target/aarch64/sve/unpack_float_1.c       2018-01-26 14:00:15.717916957 +0000

> @@ -8,6 +8,6 @@ unpack_float_plus_7point9 (double *d, fl

>      d[i] = s[i] + 7.9;

>  }

>

> -/* { dg-final { scan-assembler-times {\tuunpklo\tz[0-9]+\.d, z[0-9]+\.s\n} 1 } } */

> -/* { dg-final { scan-assembler-times {\tuunpkhi\tz[0-9]+\.d, z[0-9]+\.s\n} 1 } } */

> +/* { dg-final { scan-assembler-times {\tzip1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */

> +/* { dg-final { scan-assembler-times {\tzip2\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */

>  /* { dg-final { scan-assembler-times {\tfcvt\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.s\n} 2 } } */
Richard Sandiford Jan. 26, 2018, 3:04 p.m. | #2
Richard Biener <richard.guenther@gmail.com> writes:
> On Fri, Jan 26, 2018 at 3:21 PM, Richard Sandiford

> <richard.sandiford@linaro.org> wrote:

>> I hadn't realised that on big-endian targets, VEC_UNPACK*HI_EXPR unpacks

>> the low-numbered lanes and VEC_UNPACK*LO_EXPR unpacks the high-numbered

>> lanes.  This meant that both the SVE patterns and the handling of

>> fully-masked loops were wrong.

>>

>> The patch deals with that by making sure that all vec_unpack* optabs

>> are define_expands, using BYTES_BIG_ENDIAN to choose the appropriate

>> define_insn.  This in turn meant that we can get rid of the duplication

>> between the signed and unsigned patterns for predicates.  (We provide

>> implementations of both the signed and unsigned optabs because the sign

>> doesn't matter for predicates: every element contains only one

>> significant bit.)

>>

>> Also, the float unpacks need to unpack one half of the input vector,

>> but the unpacked upper bits are "don't care".  There are two obvious

>> ways of handling that: use an unpack (filling with zeros) or use a ZIP

>> (filling with a duplicate of the low bits).  The code previously used

>> unpacks, but the sequence involved a subreg that is semantically an

>> element reverse on big-endian targets.  Using the ZIP patterns avoids

>> that, and at the moment there's no reason to prefer one over the other

>> for performance reasons, so the patch switches to ZIP unconditionally.

>> As the comment says, it would be easy to optimise this later if UUNPK

>> turns out to be better for some implementations.

>>

>> Tested on aarch64_be-elf, aarch64-linux-gnu and x86_64-linux-gnu.

>> I think the tree-vect-loop-manip.c part is obvious, but OK for the

>> AArch64 bits?

>>

>> Richard

>>

>>

>> 2018-01-26  Richard Sandiford  <richard.sandiford@linaro.org>

>>

>> gcc/

>>         * tree-vect-loop-manip.c (vect_maybe_permute_loop_masks):

>>         Reverse the choice between VEC_UNPACK_LO_EXPR and VEC_UNPACK_HI_EXPR

>>         for big-endian.

>>         * config/aarch64/iterators.md (hi_lanes_optab): New int attribute.

>>         * config/aarch64/aarch64-sve.md

>>         (*aarch64_sve_<perm_insn><perm_hilo><mode>): Rename to...

>>         (aarch64_sve_<perm_insn><perm_hilo><mode>): ...this.

>>         (*extend<mode><Vwide>2): Rename to...

>>         (aarch64_sve_extend<mode><Vwide>2): ...this.

>>         (vec_unpack<su>_<perm_hilo>_<mode>): Turn into a define_expand,

>>         renaming the old pattern to...

>>         (aarch64_sve_punpk<perm_hilo>_<mode>): ...this.  Only define

>>         unsigned packs.

>>         (vec_unpack<su>_<perm_hilo>_<SVE_BHSI:mode>): Turn into a

>>         define_expand, renaming the old pattern to...

>>         (aarch64_sve_<su>unpk<perm_hilo>_<SVE_BHSI:mode>): ...this.

>>         (*vec_unpacku_<perm_hilo>_<mode>_no_convert): Delete.

>>         (vec_unpacks_<perm_hilo>_<mode>): Take BYTES_BIG_ENDIAN into

>>         account when deciding which SVE instruction the optab should use.

>>         (vec_unpack<su_optab>_float_<perm_hilo>_vnx4si): Likewise.

>>

>> gcc/testsuite/

>>         * gcc.target/aarch64/sve/unpack_fcvt_signed_1.c: Expect zips rather

>>         than unpacks.

>>         * gcc.target/aarch64/sve/unpack_fcvt_unsigned_1.c: Likewise.

>>         * gcc.target/aarch64/sve/unpack_float_1.c: Likewise.

>>

>> Index: gcc/tree-vect-loop-manip.c

>> ===================================================================

>> --- gcc/tree-vect-loop-manip.c  2018-01-13 18:02:00.948360274 +0000

>> +++ gcc/tree-vect-loop-manip.c  2018-01-26 14:00:15.717916957 +0000

>> @@ -334,7 +334,8 @@ vect_maybe_permute_loop_masks (gimple_se

>>         {

>>           tree src = src_rgm->masks[i / 2];

>>           tree dest = dest_rgm->masks[i];

>> -         tree_code code = (i & 1 ? VEC_UNPACK_HI_EXPR

>> +         tree_code code = ((i & 1) == (BYTES_BIG_ENDIAN ? 0 : 1)

>> +                           ? VEC_UNPACK_HI_EXPR

>>                             : VEC_UNPACK_LO_EXPR);

>

> This looks broken anyways -- the element oder on GIMPLE is the same

> for little and big-endian.  I see two other BYTES_BIG_ENDIAN uses in

> tree-vect-*, not sure when they crept in but they are all broken.


Are you sure?  Everything apart from this new code seems consistently
to treat UNPACK_HI as "high part of the register" rather than
"high memory address/lane number".  I think changing it now would
break powerpc64 and mips.

Thanks,
Richard
Richard Biener Jan. 26, 2018, 3:18 p.m. | #3
On Fri, Jan 26, 2018 at 4:04 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Richard Biener <richard.guenther@gmail.com> writes:

>> On Fri, Jan 26, 2018 at 3:21 PM, Richard Sandiford

>> <richard.sandiford@linaro.org> wrote:

>>> I hadn't realised that on big-endian targets, VEC_UNPACK*HI_EXPR unpacks

>>> the low-numbered lanes and VEC_UNPACK*LO_EXPR unpacks the high-numbered

>>> lanes.  This meant that both the SVE patterns and the handling of

>>> fully-masked loops were wrong.

>>>

>>> The patch deals with that by making sure that all vec_unpack* optabs

>>> are define_expands, using BYTES_BIG_ENDIAN to choose the appropriate

>>> define_insn.  This in turn meant that we can get rid of the duplication

>>> between the signed and unsigned patterns for predicates.  (We provide

>>> implementations of both the signed and unsigned optabs because the sign

>>> doesn't matter for predicates: every element contains only one

>>> significant bit.)

>>>

>>> Also, the float unpacks need to unpack one half of the input vector,

>>> but the unpacked upper bits are "don't care".  There are two obvious

>>> ways of handling that: use an unpack (filling with zeros) or use a ZIP

>>> (filling with a duplicate of the low bits).  The code previously used

>>> unpacks, but the sequence involved a subreg that is semantically an

>>> element reverse on big-endian targets.  Using the ZIP patterns avoids

>>> that, and at the moment there's no reason to prefer one over the other

>>> for performance reasons, so the patch switches to ZIP unconditionally.

>>> As the comment says, it would be easy to optimise this later if UUNPK

>>> turns out to be better for some implementations.

>>>

>>> Tested on aarch64_be-elf, aarch64-linux-gnu and x86_64-linux-gnu.

>>> I think the tree-vect-loop-manip.c part is obvious, but OK for the

>>> AArch64 bits?

>>>

>>> Richard

>>>

>>>

>>> 2018-01-26  Richard Sandiford  <richard.sandiford@linaro.org>

>>>

>>> gcc/

>>>         * tree-vect-loop-manip.c (vect_maybe_permute_loop_masks):

>>>         Reverse the choice between VEC_UNPACK_LO_EXPR and VEC_UNPACK_HI_EXPR

>>>         for big-endian.

>>>         * config/aarch64/iterators.md (hi_lanes_optab): New int attribute.

>>>         * config/aarch64/aarch64-sve.md

>>>         (*aarch64_sve_<perm_insn><perm_hilo><mode>): Rename to...

>>>         (aarch64_sve_<perm_insn><perm_hilo><mode>): ...this.

>>>         (*extend<mode><Vwide>2): Rename to...

>>>         (aarch64_sve_extend<mode><Vwide>2): ...this.

>>>         (vec_unpack<su>_<perm_hilo>_<mode>): Turn into a define_expand,

>>>         renaming the old pattern to...

>>>         (aarch64_sve_punpk<perm_hilo>_<mode>): ...this.  Only define

>>>         unsigned packs.

>>>         (vec_unpack<su>_<perm_hilo>_<SVE_BHSI:mode>): Turn into a

>>>         define_expand, renaming the old pattern to...

>>>         (aarch64_sve_<su>unpk<perm_hilo>_<SVE_BHSI:mode>): ...this.

>>>         (*vec_unpacku_<perm_hilo>_<mode>_no_convert): Delete.

>>>         (vec_unpacks_<perm_hilo>_<mode>): Take BYTES_BIG_ENDIAN into

>>>         account when deciding which SVE instruction the optab should use.

>>>         (vec_unpack<su_optab>_float_<perm_hilo>_vnx4si): Likewise.

>>>

>>> gcc/testsuite/

>>>         * gcc.target/aarch64/sve/unpack_fcvt_signed_1.c: Expect zips rather

>>>         than unpacks.

>>>         * gcc.target/aarch64/sve/unpack_fcvt_unsigned_1.c: Likewise.

>>>         * gcc.target/aarch64/sve/unpack_float_1.c: Likewise.

>>>

>>> Index: gcc/tree-vect-loop-manip.c

>>> ===================================================================

>>> --- gcc/tree-vect-loop-manip.c  2018-01-13 18:02:00.948360274 +0000

>>> +++ gcc/tree-vect-loop-manip.c  2018-01-26 14:00:15.717916957 +0000

>>> @@ -334,7 +334,8 @@ vect_maybe_permute_loop_masks (gimple_se

>>>         {

>>>           tree src = src_rgm->masks[i / 2];

>>>           tree dest = dest_rgm->masks[i];

>>> -         tree_code code = (i & 1 ? VEC_UNPACK_HI_EXPR

>>> +         tree_code code = ((i & 1) == (BYTES_BIG_ENDIAN ? 0 : 1)

>>> +                           ? VEC_UNPACK_HI_EXPR

>>>                             : VEC_UNPACK_LO_EXPR);

>>

>> This looks broken anyways -- the element oder on GIMPLE is the same

>> for little and big-endian.  I see two other BYTES_BIG_ENDIAN uses in

>> tree-vect-*, not sure when they crept in but they are all broken.

>

> Are you sure?  Everything apart from this new code seems consistently

> to treat UNPACK_HI as "high part of the register" rather than

> "high memory address/lane number".  I think changing it now would

> break powerpc64 and mips.


Looks like so.  But we agreed on changing this, looks like this is a left-over.
(I'm always looking at constant folding for how it's supposed to work).
Looks like VEC_WIDEN_LSHIFT_HI/O_EXPR doesn't have constant folding.

Richard.

> Thanks,

> Richard
James Greenhalgh Feb. 6, 2018, 12:29 p.m. | #4
On Fri, Jan 26, 2018 at 02:21:11PM +0000, Richard Sandiford wrote:
> I hadn't realised that on big-endian targets, VEC_UNPACK*HI_EXPR unpacks

> the low-numbered lanes and VEC_UNPACK*LO_EXPR unpacks the high-numbered

> lanes.  This meant that both the SVE patterns and the handling of

> fully-masked loops were wrong.

> 

> The patch deals with that by making sure that all vec_unpack* optabs

> are define_expands, using BYTES_BIG_ENDIAN to choose the appropriate

> define_insn.  This in turn meant that we can get rid of the duplication

> between the signed and unsigned patterns for predicates.  (We provide

> implementations of both the signed and unsigned optabs because the sign

> doesn't matter for predicates: every element contains only one

> significant bit.)

> 

> Also, the float unpacks need to unpack one half of the input vector,

> but the unpacked upper bits are "don't care".  There are two obvious

> ways of handling that: use an unpack (filling with zeros) or use a ZIP

> (filling with a duplicate of the low bits).  The code previously used

> unpacks, but the sequence involved a subreg that is semantically an

> element reverse on big-endian targets.  Using the ZIP patterns avoids

> that, and at the moment there's no reason to prefer one over the other

> for performance reasons, so the patch switches to ZIP unconditionally.

> As the comment says, it would be easy to optimise this later if UUNPK

> turns out to be better for some implementations.

> 

> Tested on aarch64_be-elf, aarch64-linux-gnu and x86_64-linux-gnu.

> I think the tree-vect-loop-manip.c part is obvious, but OK for the

> AArch64 bits?


The AArch64 bits are OK by me, if this direction is appropriate for
the tree-vect-loop-manip.c parts.

Thanks,
James

> 

> Richard

> 

> 

> 2018-01-26  Richard Sandiford  <richard.sandiford@linaro.org>

> 

> gcc/

> 	* tree-vect-loop-manip.c (vect_maybe_permute_loop_masks):

> 	Reverse the choice between VEC_UNPACK_LO_EXPR and VEC_UNPACK_HI_EXPR

> 	for big-endian.

> 	* config/aarch64/iterators.md (hi_lanes_optab): New int attribute.

> 	* config/aarch64/aarch64-sve.md

> 	(*aarch64_sve_<perm_insn><perm_hilo><mode>): Rename to...

> 	(aarch64_sve_<perm_insn><perm_hilo><mode>): ...this.

> 	(*extend<mode><Vwide>2): Rename to...

> 	(aarch64_sve_extend<mode><Vwide>2): ...this.

> 	(vec_unpack<su>_<perm_hilo>_<mode>): Turn into a define_expand,

> 	renaming the old pattern to...

> 	(aarch64_sve_punpk<perm_hilo>_<mode>): ...this.  Only define

> 	unsigned packs.

> 	(vec_unpack<su>_<perm_hilo>_<SVE_BHSI:mode>): Turn into a

> 	define_expand, renaming the old pattern to...

> 	(aarch64_sve_<su>unpk<perm_hilo>_<SVE_BHSI:mode>): ...this.

> 	(*vec_unpacku_<perm_hilo>_<mode>_no_convert): Delete.

> 	(vec_unpacks_<perm_hilo>_<mode>): Take BYTES_BIG_ENDIAN into

> 	account when deciding which SVE instruction the optab should use.

> 	(vec_unpack<su_optab>_float_<perm_hilo>_vnx4si): Likewise.

> 

> gcc/testsuite/

> 	* gcc.target/aarch64/sve/unpack_fcvt_signed_1.c: Expect zips rather

> 	than unpacks.

> 	* gcc.target/aarch64/sve/unpack_fcvt_unsigned_1.c: Likewise.

> 	* gcc.target/aarch64/sve/unpack_float_1.c: Likewise.

Patch

Index: gcc/tree-vect-loop-manip.c
===================================================================
--- gcc/tree-vect-loop-manip.c	2018-01-13 18:02:00.948360274 +0000
+++ gcc/tree-vect-loop-manip.c	2018-01-26 14:00:15.717916957 +0000
@@ -334,7 +334,8 @@  vect_maybe_permute_loop_masks (gimple_se
 	{
 	  tree src = src_rgm->masks[i / 2];
 	  tree dest = dest_rgm->masks[i];
-	  tree_code code = (i & 1 ? VEC_UNPACK_HI_EXPR
+	  tree_code code = ((i & 1) == (BYTES_BIG_ENDIAN ? 0 : 1)
+			    ? VEC_UNPACK_HI_EXPR
 			    : VEC_UNPACK_LO_EXPR);
 	  gassign *stmt;
 	  if (dest_masktype == unpack_masktype)
Index: gcc/config/aarch64/iterators.md
===================================================================
--- gcc/config/aarch64/iterators.md	2018-01-13 18:01:26.106685901 +0000
+++ gcc/config/aarch64/iterators.md	2018-01-26 14:00:15.716916999 +0000
@@ -1674,6 +1674,15 @@  (define_int_attr perm_hilo [(UNSPEC_ZIP1
 			    (UNSPEC_UNPACKSHI "hi") (UNSPEC_UNPACKUHI "hi")
 			    (UNSPEC_UNPACKSLO "lo") (UNSPEC_UNPACKULO "lo")])
 
+;; Return true if the associated optab refers to the high-numbered lanes,
+;; false if it refers to the low-numbered lanes.  The convention is for
+;; "hi" to refer to the low-numbered lanes (the first ones in memory)
+;; for big-endian.
+(define_int_attr hi_lanes_optab [(UNSPEC_UNPACKSHI "!BYTES_BIG_ENDIAN")
+				 (UNSPEC_UNPACKUHI "!BYTES_BIG_ENDIAN")
+				 (UNSPEC_UNPACKSLO "BYTES_BIG_ENDIAN")
+				 (UNSPEC_UNPACKULO "BYTES_BIG_ENDIAN")])
+
 (define_int_attr frecp_suffix  [(UNSPEC_FRECPE "e") (UNSPEC_FRECPX "x")])
 
 (define_int_attr crc_variant [(UNSPEC_CRC32B "crc32b") (UNSPEC_CRC32H "crc32h")
Index: gcc/config/aarch64/aarch64-sve.md
===================================================================
--- gcc/config/aarch64/aarch64-sve.md	2018-01-26 13:55:12.277219211 +0000
+++ gcc/config/aarch64/aarch64-sve.md	2018-01-26 14:00:15.716916999 +0000
@@ -817,7 +817,7 @@  (define_insn "*aarch64_sve_<perm_insn><p
   "<perm_insn><perm_hilo>\t%0.<Vetype>, %1.<Vetype>, %2.<Vetype>"
 )
 
-(define_insn "*aarch64_sve_<perm_insn><perm_hilo><mode>"
+(define_insn "aarch64_sve_<perm_insn><perm_hilo><mode>"
   [(set (match_operand:SVE_ALL 0 "register_operand" "=w")
 	(unspec:SVE_ALL [(match_operand:SVE_ALL 1 "register_operand" "w")
 			 (match_operand:SVE_ALL 2 "register_operand" "w")]
@@ -2156,7 +2156,7 @@  (define_insn "*<optab><mode>vnx4sf2"
 )
 
 ;; Conversion of DI or SI to DF, predicated with a PTRUE.
-(define_insn "*<optab><mode>vnx2df2"
+(define_insn "aarch64_sve_<optab><mode>vnx2df2"
   [(set (match_operand:VNx2DF 0 "register_operand" "=w")
 	(unspec:VNx2DF
 	  [(match_operand:VNx2BI 1 "register_operand" "Upl")
@@ -2183,7 +2183,7 @@  (define_insn "*trunc<Vwide><mode>2"
 
 ;; Conversion of SFs to the same number of DFs, or HFs to the same number
 ;; of SFs.
-(define_insn "*extend<mode><Vwide>2"
+(define_insn "aarch64_sve_extend<mode><Vwide>2"
   [(set (match_operand:<VWIDE> 0 "register_operand" "=w")
 	(unspec:<VWIDE>
 	  [(match_operand:<VWIDE_PRED> 1 "register_operand" "Upl")
@@ -2195,17 +2195,50 @@  (define_insn "*extend<mode><Vwide>2"
   "fcvt\t%0.<Vewtype>, %1/m, %2.<Vetype>"
 )
 
+;; Unpack the low or high half of a predicate, where "high" refers to
+;; the low-numbered lanes for big-endian and the high-numbered lanes
+;; for little-endian.
+(define_expand "vec_unpack<su>_<perm_hilo>_<mode>"
+  [(match_operand:<VWIDE> 0 "register_operand")
+   (unspec:<VWIDE> [(match_operand:PRED_BHS 1 "register_operand")]
+		   UNPACK)]
+  "TARGET_SVE"
+  {
+    emit_insn ((<hi_lanes_optab>
+		? gen_aarch64_sve_punpkhi_<PRED_BHS:mode>
+		: gen_aarch64_sve_punpklo_<PRED_BHS:mode>)
+	       (operands[0], operands[1]));
+    DONE;
+  }
+)
+
 ;; PUNPKHI and PUNPKLO.
-(define_insn "vec_unpack<su>_<perm_hilo>_<mode>"
+(define_insn "aarch64_sve_punpk<perm_hilo>_<mode>"
   [(set (match_operand:<VWIDE> 0 "register_operand" "=Upa")
 	(unspec:<VWIDE> [(match_operand:PRED_BHS 1 "register_operand" "Upa")]
-			UNPACK))]
+			UNPACK_UNSIGNED))]
   "TARGET_SVE"
   "punpk<perm_hilo>\t%0.h, %1.b"
 )
 
+;; Unpack the low or high half of a vector, where "high" refers to
+;; the low-numbered lanes for big-endian and the high-numbered lanes
+;; for little-endian.
+(define_expand "vec_unpack<su>_<perm_hilo>_<SVE_BHSI:mode>"
+  [(match_operand:<VWIDE> 0 "register_operand")
+   (unspec:<VWIDE> [(match_operand:SVE_BHSI 1 "register_operand")] UNPACK)]
+  "TARGET_SVE"
+  {
+    emit_insn ((<hi_lanes_optab>
+		? gen_aarch64_sve_<su>unpkhi_<SVE_BHSI:mode>
+		: gen_aarch64_sve_<su>unpklo_<SVE_BHSI:mode>)
+	       (operands[0], operands[1]));
+    DONE;
+  }
+)
+
 ;; SUNPKHI, UUNPKHI, SUNPKLO and UUNPKLO.
-(define_insn "vec_unpack<su>_<perm_hilo>_<SVE_BHSI:mode>"
+(define_insn "aarch64_sve_<su>unpk<perm_hilo>_<SVE_BHSI:mode>"
   [(set (match_operand:<VWIDE> 0 "register_operand" "=w")
 	(unspec:<VWIDE> [(match_operand:SVE_BHSI 1 "register_operand" "w")]
 			UNPACK))]
@@ -2213,32 +2246,28 @@  (define_insn "vec_unpack<su>_<perm_hilo>
   "<su>unpk<perm_hilo>\t%0.<Vewtype>, %1.<Vetype>"
 )
 
-;; Used by the vec_unpacks_<perm_hilo>_<mode> expander to unpack the bit
-;; representation of a VNx4SF or VNx8HF without conversion.  The choice
-;; between signed and unsigned isn't significant.
-(define_insn "*vec_unpacku_<perm_hilo>_<mode>_no_convert"
-  [(set (match_operand:SVE_HSF 0 "register_operand" "=w")
-	(unspec:SVE_HSF [(match_operand:SVE_HSF 1 "register_operand" "w")]
-			UNPACK_UNSIGNED))]
-  "TARGET_SVE"
-  "uunpk<perm_hilo>\t%0.<Vewtype>, %1.<Vetype>"
-)
-
 ;; Unpack one half of a VNx4SF to VNx2DF, or one half of a VNx8HF to VNx4SF.
 ;; First unpack the source without conversion, then float-convert the
 ;; unpacked source.
 (define_expand "vec_unpacks_<perm_hilo>_<mode>"
-  [(set (match_dup 2)
-	(unspec:SVE_HSF [(match_operand:SVE_HSF 1 "register_operand")]
-			UNPACK_UNSIGNED))
-   (set (match_operand:<VWIDE> 0 "register_operand")
-	(unspec:<VWIDE> [(match_dup 3)
-			 (unspec:<VWIDE> [(match_dup 2)] UNSPEC_FLOAT_CONVERT)]
-			UNSPEC_MERGE_PTRUE))]
-  "TARGET_SVE"
-  {
-    operands[2] = gen_reg_rtx (<MODE>mode);
-    operands[3] = force_reg (<VWIDE_PRED>mode, CONSTM1_RTX (<VWIDE_PRED>mode));
+  [(match_operand:<VWIDE> 0 "register_operand")
+   (unspec:SVE_HSF [(match_operand:SVE_HSF 1 "register_operand")]
+		   UNPACK_UNSIGNED)]
+  "TARGET_SVE"
+  {
+    /* Use ZIP to do the unpack, since we don't care about the upper halves
+       and since it has the nice property of not needing any subregs.
+       If using UUNPK* turns out to be preferable, we could model it as
+       a ZIP whose first operand is zero.  */
+    rtx temp = gen_reg_rtx (<MODE>mode);
+    emit_insn ((<hi_lanes_optab>
+		? gen_aarch64_sve_zip2<mode>
+		: gen_aarch64_sve_zip1<mode>)
+		(temp, operands[1], operands[1]));
+    rtx ptrue = force_reg (<VWIDE_PRED>mode, CONSTM1_RTX (<VWIDE_PRED>mode));
+    emit_insn (gen_aarch64_sve_extend<mode><Vwide>2 (operands[0],
+						     ptrue, temp));
+    DONE;
   }
 )
 
@@ -2246,18 +2275,25 @@  (define_expand "vec_unpacks_<perm_hilo>_
 ;; to VNx2DI, reinterpret the VNx2DI as a VNx4SI, then convert the
 ;; unpacked VNx4SI to VNx2DF.
 (define_expand "vec_unpack<su_optab>_float_<perm_hilo>_vnx4si"
-  [(set (match_dup 2)
-	(unspec:VNx2DI [(match_operand:VNx4SI 1 "register_operand")]
-		       UNPACK_UNSIGNED))
-   (set (match_operand:VNx2DF 0 "register_operand")
-	(unspec:VNx2DF [(match_dup 3)
-			(FLOATUORS:VNx2DF (match_dup 4))]
-		       UNSPEC_MERGE_PTRUE))]
-  "TARGET_SVE"
-  {
-    operands[2] = gen_reg_rtx (VNx2DImode);
-    operands[3] = force_reg (VNx2BImode, CONSTM1_RTX (VNx2BImode));
-    operands[4] = gen_rtx_SUBREG (VNx4SImode, operands[2], 0);
+  [(match_operand:VNx2DF 0 "register_operand")
+   (FLOATUORS:VNx2DF
+     (unspec:VNx2DI [(match_operand:VNx4SI 1 "register_operand")]
+		    UNPACK_UNSIGNED))]
+  "TARGET_SVE"
+  {
+    /* Use ZIP to do the unpack, since we don't care about the upper halves
+       and since it has the nice property of not needing any subregs.
+       If using UUNPK* turns out to be preferable, we could model it as
+       a ZIP whose first operand is zero.  */
+    rtx temp = gen_reg_rtx (VNx4SImode);
+    emit_insn ((<hi_lanes_optab>
+	        ? gen_aarch64_sve_zip2vnx4si
+	        : gen_aarch64_sve_zip1vnx4si)
+	       (temp, operands[1], operands[1]));
+    rtx ptrue = force_reg (VNx2BImode, CONSTM1_RTX (VNx2BImode));
+    emit_insn (gen_aarch64_sve_<FLOATUORS:optab>vnx4sivnx2df2 (operands[0],
+							       ptrue, temp));
+    DONE;
   }
 )
 
Index: gcc/testsuite/gcc.target/aarch64/sve/unpack_fcvt_signed_1.c
===================================================================
--- gcc/testsuite/gcc.target/aarch64/sve/unpack_fcvt_signed_1.c	2018-01-13 17:54:52.934269783 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve/unpack_fcvt_signed_1.c	2018-01-26 14:00:15.716916999 +0000
@@ -10,6 +10,6 @@  unpack_double_int_plus8 (double *d, int3
     d[i] = s[i] + 8;
 }
 
-/* { dg-final { scan-assembler-times {\tuunpklo\tz[0-9]+\.d, z[0-9]+\.s\n} 1 } } */
-/* { dg-final { scan-assembler-times {\tuunpkhi\tz[0-9]+\.d, z[0-9]+\.s\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tzip1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tzip2\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tscvtf\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.s\n} 2 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/unpack_fcvt_unsigned_1.c
===================================================================
--- gcc/testsuite/gcc.target/aarch64/sve/unpack_fcvt_unsigned_1.c	2018-01-13 17:54:52.934269783 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve/unpack_fcvt_unsigned_1.c	2018-01-26 14:00:15.717916957 +0000
@@ -10,6 +10,6 @@  unpack_double_int_plus9 (double *d, uint
     d[i] = (double) (s[i] + 9);
 }
 
-/* { dg-final { scan-assembler-times {\tuunpklo\tz[0-9]+\.d, z[0-9]+\.s\n} 1 } } */
-/* { dg-final { scan-assembler-times {\tuunpkhi\tz[0-9]+\.d, z[0-9]+\.s\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tzip1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tzip2\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tucvtf\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.s\n} 2 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/unpack_float_1.c
===================================================================
--- gcc/testsuite/gcc.target/aarch64/sve/unpack_float_1.c	2018-01-13 17:54:52.935269745 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve/unpack_float_1.c	2018-01-26 14:00:15.717916957 +0000
@@ -8,6 +8,6 @@  unpack_float_plus_7point9 (double *d, fl
     d[i] = s[i] + 7.9;
 }
 
-/* { dg-final { scan-assembler-times {\tuunpklo\tz[0-9]+\.d, z[0-9]+\.s\n} 1 } } */
-/* { dg-final { scan-assembler-times {\tuunpkhi\tz[0-9]+\.d, z[0-9]+\.s\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tzip1\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tzip2\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tfcvt\tz[0-9]+\.d, p[0-7]/m, z[0-9]+\.s\n} 2 } } */