Handle vector boolean types when calculating the SLP unroll factor

Message ID 87efilqfrn.fsf@linaro.org
State New
Headers show
Series
  • Handle vector boolean types when calculating the SLP unroll factor
Related show

Commit Message

Richard Sandiford May 9, 2018, 10:34 a.m.
The SLP unrolling factor is calculated by finding the smallest
scalar type for each SLP statement and taking the number of required
lanes from the vector versions of those scalar types.  E.g. for an
int32->int64 conversion, it's the vector of int32s rather than the
vector of int64s that determines the unroll factor.

We rely on tree-vect-patterns.c to replace boolean operations like:

   bool a, b, c;
   a = b & c;

with integer operations of whatever the best size is in context.
E.g. if b and c are fed by comparisons of ints, a, b and c will become
the appropriate size for an int comparison.  For most targets this means
that a, b and c will end up as int-sized themselves, but on targets like
SVE and AVX512 with packed vector booleans, they'll instead become a
small bitfield like :1, padded to a byte for memory purposes.
The SLP code would then take these scalar types and try to calculate
the vector type for them, causing the unroll factor to be much higher
than necessary.

This patch makes SLP use the cached vector boolean type if that's
appropriate.  Tested on aarch64-linux-gnu (with and without SVE),
aarch64_be-none-elf and x86_64-linux-gnu.  OK to install?

Richard


2018-05-09  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	* tree-vect-slp.c (get_vectype_for_smallest_scalar_type): New function.
	(vect_build_slp_tree_1): Use it when calculating the unroll factor.

gcc/testsuite/
	* gcc.target/aarch64/sve/vcond_10.c: New test.
	* gcc.target/aarch64/sve/vcond_10_run.c: Likewise.
	* gcc.target/aarch64/sve/vcond_11.c: Likewise.
	* gcc.target/aarch64/sve/vcond_11_run.c: Likewise.

Comments

Richard Biener May 9, 2018, 10:55 a.m. | #1
On Wed, May 9, 2018 at 12:34 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> The SLP unrolling factor is calculated by finding the smallest

> scalar type for each SLP statement and taking the number of required

> lanes from the vector versions of those scalar types.  E.g. for an

> int32->int64 conversion, it's the vector of int32s rather than the

> vector of int64s that determines the unroll factor.

>

> We rely on tree-vect-patterns.c to replace boolean operations like:

>

>    bool a, b, c;

>    a = b & c;

>

> with integer operations of whatever the best size is in context.

> E.g. if b and c are fed by comparisons of ints, a, b and c will become

> the appropriate size for an int comparison.  For most targets this means

> that a, b and c will end up as int-sized themselves, but on targets like

> SVE and AVX512 with packed vector booleans, they'll instead become a

> small bitfield like :1, padded to a byte for memory purposes.

> The SLP code would then take these scalar types and try to calculate

> the vector type for them, causing the unroll factor to be much higher

> than necessary.

>

> This patch makes SLP use the cached vector boolean type if that's

> appropriate.  Tested on aarch64-linux-gnu (with and without SVE),

> aarch64_be-none-elf and x86_64-linux-gnu.  OK to install?

>

> Richard

>

>

> 2018-05-09  Richard Sandiford  <richard.sandiford@linaro.org>

>

> gcc/

>         * tree-vect-slp.c (get_vectype_for_smallest_scalar_type): New function.

>         (vect_build_slp_tree_1): Use it when calculating the unroll factor.

>

> gcc/testsuite/

>         * gcc.target/aarch64/sve/vcond_10.c: New test.

>         * gcc.target/aarch64/sve/vcond_10_run.c: Likewise.

>         * gcc.target/aarch64/sve/vcond_11.c: Likewise.

>         * gcc.target/aarch64/sve/vcond_11_run.c: Likewise.

>

> Index: gcc/tree-vect-slp.c

> ===================================================================

> --- gcc/tree-vect-slp.c 2018-05-08 09:42:03.526648115 +0100

> +++ gcc/tree-vect-slp.c 2018-05-09 11:30:41.061096063 +0100

> @@ -608,6 +608,41 @@ vect_record_max_nunits (vec_info *vinfo,

>    return true;

>  }

>

> +/* Return the vector type associated with the smallest scalar type in STMT.  */

> +

> +static tree

> +get_vectype_for_smallest_scalar_type (gimple *stmt)

> +{

> +  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);

> +  tree vectype = STMT_VINFO_VECTYPE (stmt_info);

> +  if (vectype != NULL_TREE

> +      && VECTOR_BOOLEAN_TYPE_P (vectype))


Hum.  At this point you can't really rely on vector types being set...

> +    {

> +      /* The result of a vector boolean operation has the smallest scalar

> +        type unless the statement is extending an even narrower boolean.  */

> +      if (!gimple_assign_cast_p (stmt))

> +       return vectype;

> +

> +      tree src = gimple_assign_rhs1 (stmt);

> +      gimple *def_stmt;

> +      enum vect_def_type dt;

> +      tree src_vectype = NULL_TREE;

> +      if (vect_is_simple_use (src, stmt_info->vinfo, &def_stmt, &dt,

> +                             &src_vectype)

> +         && src_vectype

> +         && VECTOR_BOOLEAN_TYPE_P (src_vectype))

> +       {

> +         if (TYPE_PRECISION (TREE_TYPE (src_vectype))

> +             < TYPE_PRECISION (TREE_TYPE (vectype)))

> +           return src_vectype;

> +         return vectype;

> +       }

> +    }

> +  HOST_WIDE_INT dummy;

> +  tree scalar_type = vect_get_smallest_scalar_type (stmt, &dummy, &dummy);

> +  return get_vectype_for_scalar_type (scalar_type);

> +}

> +

>  /* Verify if the scalar stmts STMTS are isomorphic, require data

>     permutation or are of unsupported types of operation.  Return

>     true if they are, otherwise return false and indicate in *MATCHES

> @@ -636,12 +671,11 @@ vect_build_slp_tree_1 (vec_info *vinfo,

>    enum tree_code first_cond_code = ERROR_MARK;

>    tree lhs;

>    bool need_same_oprnds = false;

> -  tree vectype = NULL_TREE, scalar_type, first_op1 = NULL_TREE;

> +  tree vectype = NULL_TREE, first_op1 = NULL_TREE;

>    optab optab;

>    int icode;

>    machine_mode optab_op2_mode;

>    machine_mode vec_mode;

> -  HOST_WIDE_INT dummy;

>    gimple *first_load = NULL, *prev_first_load = NULL;

>

>    /* For every stmt in NODE find its def stmt/s.  */

> @@ -685,15 +719,14 @@ vect_build_slp_tree_1 (vec_info *vinfo,

>           return false;

>         }

>

> -      scalar_type = vect_get_smallest_scalar_type (stmt, &dummy, &dummy);


... so I wonder how this goes wrong here.

I suppose we want to ignore vector booleans for the purpose of max_nunits
computation.  So isn't a better fix to simply "ignore" those in
vect_get_smallest_scalar_type instead?  I see that for intermediate
full-boolean operations like

  a = x[i] < 0;
  b = y[i] > 0;
  tem = a & b;

we want to ignore 'tem = a & b' fully here for the purpose of
vect_record_max_nunits.  So if scalar_type is a bitfield type
then skip it?

Richard.

> -      vectype = get_vectype_for_scalar_type (scalar_type);

> +      vectype = get_vectype_for_smallest_scalar_type (stmt);

>        if (!vect_record_max_nunits (vinfo, stmt, group_size, vectype,

>                                    max_nunits))

>         {

>           /* Fatal mismatch.  */

>           matches[0] = false;

> -          return false;

> -        }

> +         return false;

> +       }

>

>        if (gcall *call_stmt = dyn_cast <gcall *> (stmt))

>         {

> Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_10.c

> ===================================================================

> --- /dev/null   2018-04-20 16:19:46.369131350 +0100

> +++ gcc/testsuite/gcc.target/aarch64/sve/vcond_10.c     2018-05-09 11:30:41.057096221 +0100

> @@ -0,0 +1,36 @@

> +/* { dg-do compile } */

> +/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */

> +

> +#include <stdint.h>

> +

> +#define DEF_LOOP(TYPE)                                                 \

> +  void __attribute__ ((noinline, noclone))                             \

> +  test_##TYPE (TYPE *a, TYPE a1, TYPE a2, TYPE a3, TYPE a4, int n)     \

> +  {                                                                    \

> +    for (int i = 0; i < n; i += 2)                                     \

> +      {                                                                        \

> +       a[i] = a[i] >= 1 && a[i] != 3 ? a1 : a2;                        \

> +       a[i + 1] = a[i + 1] >= 1 && a[i + 1] != 3 ? a3 : a4;            \

> +      }                                                                        \

> +  }

> +

> +#define FOR_EACH_TYPE(T) \

> +  T (int8_t) \

> +  T (uint8_t) \

> +  T (int16_t) \

> +  T (uint16_t) \

> +  T (int32_t) \

> +  T (uint32_t) \

> +  T (int64_t) \

> +  T (uint64_t) \

> +  T (_Float16) \

> +  T (float) \

> +  T (double)

> +

> +FOR_EACH_TYPE (DEF_LOOP)

> +

> +/* { dg-final { scan-assembler-times {\tld1b\t} 2 } } */

> +/* { dg-final { scan-assembler-times {\tld1h\t} 3 } } */

> +/* { dg-final { scan-assembler-times {\tld1w\t} 3 } } */

> +/* { dg-final { scan-assembler-times {\tld1d\t} 3 } } */

> +/* { dg-final { scan-assembler-times {\tsel\tz[0-9]} 11 } } */

> Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_10_run.c

> ===================================================================

> --- /dev/null   2018-04-20 16:19:46.369131350 +0100

> +++ gcc/testsuite/gcc.target/aarch64/sve/vcond_10_run.c 2018-05-09 11:30:41.057096221 +0100

> @@ -0,0 +1,24 @@

> +/* { dg-do run { target aarch64_sve_hw } } */

> +/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */

> +

> +#include "vcond_10.c"

> +

> +#define N 133

> +

> +#define TEST_LOOP(TYPE)                                                        \

> +  {                                                                    \

> +    TYPE a[N];                                                         \

> +    for (int i = 0; i < N; ++i)                                                \

> +      a[i] = i % 7;                                                    \

> +    test_##TYPE (a, 10, 11, 12, 13, N);                                        \

> +    for (int i = 0; i < N; ++i)                                                \

> +      if (a[i] != 10 + (i & 1) * 2 + (i % 7 == 0 || i % 7 == 3))       \

> +       __builtin_abort ();                                             \

> +  }

> +

> +int

> +main (void)

> +{

> +  FOR_EACH_TYPE (TEST_LOOP);

> +  return 0;

> +}

> Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_11.c

> ===================================================================

> --- /dev/null   2018-04-20 16:19:46.369131350 +0100

> +++ gcc/testsuite/gcc.target/aarch64/sve/vcond_11.c     2018-05-09 11:30:41.057096221 +0100

> @@ -0,0 +1,36 @@

> +/* { dg-do compile } */

> +/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */

> +

> +#include <stdint.h>

> +

> +#define DEF_LOOP(TYPE)                                                 \

> +  void __attribute__ ((noinline, noclone))                             \

> +  test_##TYPE (int *restrict a, TYPE *restrict b, int a1, int a2,      \

> +              int a3, int a4, int n)                                   \

> +  {                                                                    \

> +    for (int i = 0; i < n; i += 2)                                     \

> +      {                                                                        \

> +       a[i] = a[i] >= 1 & b[i] != 3 ? a1 : a2;                         \

> +       a[i + 1] = a[i + 1] >= 1 & b[i + 1] != 3 ? a3 : a4;             \

> +      }                                                                        \

> +  }

> +

> +#define FOR_EACH_TYPE(T) \

> +  T (int8_t) \

> +  T (uint8_t) \

> +  T (int16_t) \

> +  T (uint16_t) \

> +  T (int64_t) \

> +  T (uint64_t) \

> +  T (double)

> +

> +FOR_EACH_TYPE (DEF_LOOP)

> +

> +/* { dg-final { scan-assembler-times {\tld1b\t} 2 } } */

> +/* { dg-final { scan-assembler-times {\tld1h\t} 2 } } */

> +/* 4 for each 8-bit function, 2 for each 16-bit function, 1 for

> +   each 64-bit function.  */

> +/* { dg-final { scan-assembler-times {\tld1w\t} 15 } } */

> +/* 3 64-bit functions * 2 64-bit vectors per 32-bit vector.  */

> +/* { dg-final { scan-assembler-times {\tld1d\t} 6 } } */

> +/* { dg-final { scan-assembler-times {\tsel\tz[0-9]} 15 } } */

> Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_11_run.c

> ===================================================================

> --- /dev/null   2018-04-20 16:19:46.369131350 +0100

> +++ gcc/testsuite/gcc.target/aarch64/sve/vcond_11_run.c 2018-05-09 11:30:41.059096142 +0100

> @@ -0,0 +1,28 @@

> +/* { dg-do run { target aarch64_sve_hw } } */

> +/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */

> +

> +#include "vcond_11.c"

> +

> +#define N 133

> +

> +#define TEST_LOOP(TYPE)                                                        \

> +  {                                                                    \

> +    int a[N];                                                          \

> +    TYPE b[N];                                                         \

> +    for (int i = 0; i < N; ++i)                                                \

> +      {                                                                        \

> +       a[i] = i % 5;                                                   \

> +       b[i] = i % 7;                                                   \

> +      }                                                                        \

> +    test_##TYPE (a, b, 10, 11, 12, 13, N);                             \

> +    for (int i = 0; i < N; ++i)                                                \

> +      if (a[i] != 10 + (i & 1) * 2 + (i % 5 == 0 || i % 7 == 3))       \

> +       __builtin_abort ();                                             \

> +  }

> +

> +int

> +main (void)

> +{

> +  FOR_EACH_TYPE (TEST_LOOP);

> +  return 0;

> +}
Richard Sandiford May 9, 2018, 11:29 a.m. | #2
Richard Biener <richard.guenther@gmail.com> writes:
> On Wed, May 9, 2018 at 12:34 PM, Richard Sandiford

> <richard.sandiford@linaro.org> wrote:

>> The SLP unrolling factor is calculated by finding the smallest

>> scalar type for each SLP statement and taking the number of required

>> lanes from the vector versions of those scalar types.  E.g. for an

>> int32->int64 conversion, it's the vector of int32s rather than the

>> vector of int64s that determines the unroll factor.

>>

>> We rely on tree-vect-patterns.c to replace boolean operations like:

>>

>>    bool a, b, c;

>>    a = b & c;

>>

>> with integer operations of whatever the best size is in context.

>> E.g. if b and c are fed by comparisons of ints, a, b and c will become

>> the appropriate size for an int comparison.  For most targets this means

>> that a, b and c will end up as int-sized themselves, but on targets like

>> SVE and AVX512 with packed vector booleans, they'll instead become a

>> small bitfield like :1, padded to a byte for memory purposes.

>> The SLP code would then take these scalar types and try to calculate

>> the vector type for them, causing the unroll factor to be much higher

>> than necessary.

>>

>> This patch makes SLP use the cached vector boolean type if that's

>> appropriate.  Tested on aarch64-linux-gnu (with and without SVE),

>> aarch64_be-none-elf and x86_64-linux-gnu.  OK to install?

>>

>> Richard

>>

>>

>> 2018-05-09  Richard Sandiford  <richard.sandiford@linaro.org>

>>

>> gcc/

>>         * tree-vect-slp.c (get_vectype_for_smallest_scalar_type): New function.

>>         (vect_build_slp_tree_1): Use it when calculating the unroll factor.

>>

>> gcc/testsuite/

>>         * gcc.target/aarch64/sve/vcond_10.c: New test.

>>         * gcc.target/aarch64/sve/vcond_10_run.c: Likewise.

>>         * gcc.target/aarch64/sve/vcond_11.c: Likewise.

>>         * gcc.target/aarch64/sve/vcond_11_run.c: Likewise.

>>

>> Index: gcc/tree-vect-slp.c

>> ===================================================================

>> --- gcc/tree-vect-slp.c 2018-05-08 09:42:03.526648115 +0100

>> +++ gcc/tree-vect-slp.c 2018-05-09 11:30:41.061096063 +0100

>> @@ -608,6 +608,41 @@ vect_record_max_nunits (vec_info *vinfo,

>>    return true;

>>  }

>>

>> +/* Return the vector type associated with the smallest scalar type in STMT.  */

>> +

>> +static tree

>> +get_vectype_for_smallest_scalar_type (gimple *stmt)

>> +{

>> +  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);

>> +  tree vectype = STMT_VINFO_VECTYPE (stmt_info);

>> +  if (vectype != NULL_TREE

>> +      && VECTOR_BOOLEAN_TYPE_P (vectype))

>

> Hum.  At this point you can't really rely on vector types being set...


Not for everything, but here we only care about the result of the
pattern replacements, and pattern replacements do set the vector type
up-front.  vect_determine_vectorization_factor (which runs earlier
for loop vectorisation) also relies on this.

>> +    {

>> +      /* The result of a vector boolean operation has the smallest scalar

>> +        type unless the statement is extending an even narrower boolean.  */

>> +      if (!gimple_assign_cast_p (stmt))

>> +       return vectype;

>> +

>> +      tree src = gimple_assign_rhs1 (stmt);

>> +      gimple *def_stmt;

>> +      enum vect_def_type dt;

>> +      tree src_vectype = NULL_TREE;

>> +      if (vect_is_simple_use (src, stmt_info->vinfo, &def_stmt, &dt,

>> +                             &src_vectype)

>> +         && src_vectype

>> +         && VECTOR_BOOLEAN_TYPE_P (src_vectype))

>> +       {

>> +         if (TYPE_PRECISION (TREE_TYPE (src_vectype))

>> +             < TYPE_PRECISION (TREE_TYPE (vectype)))

>> +           return src_vectype;

>> +         return vectype;

>> +       }

>> +    }

>> +  HOST_WIDE_INT dummy;

>> +  tree scalar_type = vect_get_smallest_scalar_type (stmt, &dummy, &dummy);

>> +  return get_vectype_for_scalar_type (scalar_type);

>> +}

>> +

>>  /* Verify if the scalar stmts STMTS are isomorphic, require data

>>     permutation or are of unsupported types of operation.  Return

>>     true if they are, otherwise return false and indicate in *MATCHES

>> @@ -636,12 +671,11 @@ vect_build_slp_tree_1 (vec_info *vinfo,

>>    enum tree_code first_cond_code = ERROR_MARK;

>>    tree lhs;

>>    bool need_same_oprnds = false;

>> -  tree vectype = NULL_TREE, scalar_type, first_op1 = NULL_TREE;

>> +  tree vectype = NULL_TREE, first_op1 = NULL_TREE;

>>    optab optab;

>>    int icode;

>>    machine_mode optab_op2_mode;

>>    machine_mode vec_mode;

>> -  HOST_WIDE_INT dummy;

>>    gimple *first_load = NULL, *prev_first_load = NULL;

>>

>>    /* For every stmt in NODE find its def stmt/s.  */

>> @@ -685,15 +719,14 @@ vect_build_slp_tree_1 (vec_info *vinfo,

>>           return false;

>>         }

>>

>> -      scalar_type = vect_get_smallest_scalar_type (stmt, &dummy, &dummy);

>

> ... so I wonder how this goes wrong here.


It picks the right scalar type, but then we go on to use
get_vectype_for_scalar_type when get_mask_type_for_scalar_type
is what we actually want.  The easiest fix for that seemed to use
the vectype that had already been calculated (also as for
vect_determine_vectorization_factor).

> I suppose we want to ignore vector booleans for the purpose of max_nunits

> computation.  So isn't a better fix to simply "ignore" those in

> vect_get_smallest_scalar_type instead?  I see that for intermediate

> full-boolean operations like

>

>   a = x[i] < 0;

>   b = y[i] > 0;

>   tem = a & b;

>

> we want to ignore 'tem = a & b' fully here for the purpose of

> vect_record_max_nunits.  So if scalar_type is a bitfield type

> then skip it?


Bitfield types will always be the smallest scalar type if they're
present, so I think in pathological cases this could make us
incorrectly ignore source operands to a compare.

If we're confident that compares and casts of VECT_SCALAR_BOOLEAN_TYPE_Ps
never affect the VF or UF then we should probably skip them based on
that rather than whether the scalar type is a bitfield, so that the
behaviour is the same for all targets.  It seems a bit dangerous though...

Thanks,
Richard

>

> Richard.

>

>> -      vectype = get_vectype_for_scalar_type (scalar_type);

>> +      vectype = get_vectype_for_smallest_scalar_type (stmt);

>>        if (!vect_record_max_nunits (vinfo, stmt, group_size, vectype,

>>                                    max_nunits))

>>         {

>>           /* Fatal mismatch.  */

>>           matches[0] = false;

>> -          return false;

>> -        }

>> +         return false;

>> +       }

>>

>>        if (gcall *call_stmt = dyn_cast <gcall *> (stmt))

>>         {

>> Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_10.c

>> ===================================================================

>> --- /dev/null   2018-04-20 16:19:46.369131350 +0100

>> +++ gcc/testsuite/gcc.target/aarch64/sve/vcond_10.c     2018-05-09 11:30:41.057096221 +0100

>> @@ -0,0 +1,36 @@

>> +/* { dg-do compile } */

>> +/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */

>> +

>> +#include <stdint.h>

>> +

>> +#define DEF_LOOP(TYPE)                                                 \

>> +  void __attribute__ ((noinline, noclone))                             \

>> +  test_##TYPE (TYPE *a, TYPE a1, TYPE a2, TYPE a3, TYPE a4, int n)     \

>> +  {                                                                    \

>> +    for (int i = 0; i < n; i += 2)                                     \

>> +      {                                                                        \

>> +       a[i] = a[i] >= 1 && a[i] != 3 ? a1 : a2;                        \

>> +       a[i + 1] = a[i + 1] >= 1 && a[i + 1] != 3 ? a3 : a4;            \

>> +      }                                                                        \

>> +  }

>> +

>> +#define FOR_EACH_TYPE(T) \

>> +  T (int8_t) \

>> +  T (uint8_t) \

>> +  T (int16_t) \

>> +  T (uint16_t) \

>> +  T (int32_t) \

>> +  T (uint32_t) \

>> +  T (int64_t) \

>> +  T (uint64_t) \

>> +  T (_Float16) \

>> +  T (float) \

>> +  T (double)

>> +

>> +FOR_EACH_TYPE (DEF_LOOP)

>> +

>> +/* { dg-final { scan-assembler-times {\tld1b\t} 2 } } */

>> +/* { dg-final { scan-assembler-times {\tld1h\t} 3 } } */

>> +/* { dg-final { scan-assembler-times {\tld1w\t} 3 } } */

>> +/* { dg-final { scan-assembler-times {\tld1d\t} 3 } } */

>> +/* { dg-final { scan-assembler-times {\tsel\tz[0-9]} 11 } } */

>> Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_10_run.c

>> ===================================================================

>> --- /dev/null   2018-04-20 16:19:46.369131350 +0100

>> +++ gcc/testsuite/gcc.target/aarch64/sve/vcond_10_run.c 2018-05-09 11:30:41.057096221 +0100

>> @@ -0,0 +1,24 @@

>> +/* { dg-do run { target aarch64_sve_hw } } */

>> +/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */

>> +

>> +#include "vcond_10.c"

>> +

>> +#define N 133

>> +

>> +#define TEST_LOOP(TYPE)                                                        \

>> +  {                                                                    \

>> +    TYPE a[N];                                                         \

>> +    for (int i = 0; i < N; ++i)                                                \

>> +      a[i] = i % 7;                                                    \

>> +    test_##TYPE (a, 10, 11, 12, 13, N);                                        \

>> +    for (int i = 0; i < N; ++i)                                                \

>> +      if (a[i] != 10 + (i & 1) * 2 + (i % 7 == 0 || i % 7 == 3))       \

>> +       __builtin_abort ();                                             \

>> +  }

>> +

>> +int

>> +main (void)

>> +{

>> +  FOR_EACH_TYPE (TEST_LOOP);

>> +  return 0;

>> +}

>> Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_11.c

>> ===================================================================

>> --- /dev/null   2018-04-20 16:19:46.369131350 +0100

>> +++ gcc/testsuite/gcc.target/aarch64/sve/vcond_11.c     2018-05-09 11:30:41.057096221 +0100

>> @@ -0,0 +1,36 @@

>> +/* { dg-do compile } */

>> +/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */

>> +

>> +#include <stdint.h>

>> +

>> +#define DEF_LOOP(TYPE)                                                 \

>> +  void __attribute__ ((noinline, noclone))                             \

>> +  test_##TYPE (int *restrict a, TYPE *restrict b, int a1, int a2,      \

>> +              int a3, int a4, int n)                                   \

>> +  {                                                                    \

>> +    for (int i = 0; i < n; i += 2)                                     \

>> +      {                                                                        \

>> +       a[i] = a[i] >= 1 & b[i] != 3 ? a1 : a2;                         \

>> +       a[i + 1] = a[i + 1] >= 1 & b[i + 1] != 3 ? a3 : a4;             \

>> +      }                                                                        \

>> +  }

>> +

>> +#define FOR_EACH_TYPE(T) \

>> +  T (int8_t) \

>> +  T (uint8_t) \

>> +  T (int16_t) \

>> +  T (uint16_t) \

>> +  T (int64_t) \

>> +  T (uint64_t) \

>> +  T (double)

>> +

>> +FOR_EACH_TYPE (DEF_LOOP)

>> +

>> +/* { dg-final { scan-assembler-times {\tld1b\t} 2 } } */

>> +/* { dg-final { scan-assembler-times {\tld1h\t} 2 } } */

>> +/* 4 for each 8-bit function, 2 for each 16-bit function, 1 for

>> +   each 64-bit function.  */

>> +/* { dg-final { scan-assembler-times {\tld1w\t} 15 } } */

>> +/* 3 64-bit functions * 2 64-bit vectors per 32-bit vector.  */

>> +/* { dg-final { scan-assembler-times {\tld1d\t} 6 } } */

>> +/* { dg-final { scan-assembler-times {\tsel\tz[0-9]} 15 } } */

>> Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_11_run.c

>> ===================================================================

>> --- /dev/null   2018-04-20 16:19:46.369131350 +0100

>> +++ gcc/testsuite/gcc.target/aarch64/sve/vcond_11_run.c 2018-05-09 11:30:41.059096142 +0100

>> @@ -0,0 +1,28 @@

>> +/* { dg-do run { target aarch64_sve_hw } } */

>> +/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */

>> +

>> +#include "vcond_11.c"

>> +

>> +#define N 133

>> +

>> +#define TEST_LOOP(TYPE)                                                        \

>> +  {                                                                    \

>> +    int a[N];                                                          \

>> +    TYPE b[N];                                                         \

>> +    for (int i = 0; i < N; ++i)                                                \

>> +      {                                                                        \

>> +       a[i] = i % 5;                                                   \

>> +       b[i] = i % 7;                                                   \

>> +      }                                                                        \

>> +    test_##TYPE (a, b, 10, 11, 12, 13, N);                             \

>> +    for (int i = 0; i < N; ++i)                                                \

>> +      if (a[i] != 10 + (i & 1) * 2 + (i % 5 == 0 || i % 7 == 3))       \

>> +       __builtin_abort ();                                             \

>> +  }

>> +

>> +int

>> +main (void)

>> +{

>> +  FOR_EACH_TYPE (TEST_LOOP);

>> +  return 0;

>> +}
Richard Biener May 9, 2018, 12:11 p.m. | #3
On Wed, May 9, 2018 at 1:29 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Richard Biener <richard.guenther@gmail.com> writes:

>> On Wed, May 9, 2018 at 12:34 PM, Richard Sandiford

>> <richard.sandiford@linaro.org> wrote:

>>> The SLP unrolling factor is calculated by finding the smallest

>>> scalar type for each SLP statement and taking the number of required

>>> lanes from the vector versions of those scalar types.  E.g. for an

>>> int32->int64 conversion, it's the vector of int32s rather than the

>>> vector of int64s that determines the unroll factor.

>>>

>>> We rely on tree-vect-patterns.c to replace boolean operations like:

>>>

>>>    bool a, b, c;

>>>    a = b & c;

>>>

>>> with integer operations of whatever the best size is in context.

>>> E.g. if b and c are fed by comparisons of ints, a, b and c will become

>>> the appropriate size for an int comparison.  For most targets this means

>>> that a, b and c will end up as int-sized themselves, but on targets like

>>> SVE and AVX512 with packed vector booleans, they'll instead become a

>>> small bitfield like :1, padded to a byte for memory purposes.

>>> The SLP code would then take these scalar types and try to calculate

>>> the vector type for them, causing the unroll factor to be much higher

>>> than necessary.

>>>

>>> This patch makes SLP use the cached vector boolean type if that's

>>> appropriate.  Tested on aarch64-linux-gnu (with and without SVE),

>>> aarch64_be-none-elf and x86_64-linux-gnu.  OK to install?

>>>

>>> Richard

>>>

>>>

>>> 2018-05-09  Richard Sandiford  <richard.sandiford@linaro.org>

>>>

>>> gcc/

>>>         * tree-vect-slp.c (get_vectype_for_smallest_scalar_type): New function.

>>>         (vect_build_slp_tree_1): Use it when calculating the unroll factor.

>>>

>>> gcc/testsuite/

>>>         * gcc.target/aarch64/sve/vcond_10.c: New test.

>>>         * gcc.target/aarch64/sve/vcond_10_run.c: Likewise.

>>>         * gcc.target/aarch64/sve/vcond_11.c: Likewise.

>>>         * gcc.target/aarch64/sve/vcond_11_run.c: Likewise.

>>>

>>> Index: gcc/tree-vect-slp.c

>>> ===================================================================

>>> --- gcc/tree-vect-slp.c 2018-05-08 09:42:03.526648115 +0100

>>> +++ gcc/tree-vect-slp.c 2018-05-09 11:30:41.061096063 +0100

>>> @@ -608,6 +608,41 @@ vect_record_max_nunits (vec_info *vinfo,

>>>    return true;

>>>  }

>>>

>>> +/* Return the vector type associated with the smallest scalar type in STMT.  */

>>> +

>>> +static tree

>>> +get_vectype_for_smallest_scalar_type (gimple *stmt)

>>> +{

>>> +  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);

>>> +  tree vectype = STMT_VINFO_VECTYPE (stmt_info);

>>> +  if (vectype != NULL_TREE

>>> +      && VECTOR_BOOLEAN_TYPE_P (vectype))

>>

>> Hum.  At this point you can't really rely on vector types being set...

>

> Not for everything, but here we only care about the result of the

> pattern replacements, and pattern replacements do set the vector type

> up-front.  vect_determine_vectorization_factor (which runs earlier

> for loop vectorisation) also relies on this.

>

>>> +    {

>>> +      /* The result of a vector boolean operation has the smallest scalar

>>> +        type unless the statement is extending an even narrower boolean.  */

>>> +      if (!gimple_assign_cast_p (stmt))

>>> +       return vectype;

>>> +

>>> +      tree src = gimple_assign_rhs1 (stmt);

>>> +      gimple *def_stmt;

>>> +      enum vect_def_type dt;

>>> +      tree src_vectype = NULL_TREE;

>>> +      if (vect_is_simple_use (src, stmt_info->vinfo, &def_stmt, &dt,

>>> +                             &src_vectype)

>>> +         && src_vectype

>>> +         && VECTOR_BOOLEAN_TYPE_P (src_vectype))

>>> +       {

>>> +         if (TYPE_PRECISION (TREE_TYPE (src_vectype))

>>> +             < TYPE_PRECISION (TREE_TYPE (vectype)))

>>> +           return src_vectype;

>>> +         return vectype;

>>> +       }

>>> +    }

>>> +  HOST_WIDE_INT dummy;

>>> +  tree scalar_type = vect_get_smallest_scalar_type (stmt, &dummy, &dummy);

>>> +  return get_vectype_for_scalar_type (scalar_type);

>>> +}

>>> +

>>>  /* Verify if the scalar stmts STMTS are isomorphic, require data

>>>     permutation or are of unsupported types of operation.  Return

>>>     true if they are, otherwise return false and indicate in *MATCHES

>>> @@ -636,12 +671,11 @@ vect_build_slp_tree_1 (vec_info *vinfo,

>>>    enum tree_code first_cond_code = ERROR_MARK;

>>>    tree lhs;

>>>    bool need_same_oprnds = false;

>>> -  tree vectype = NULL_TREE, scalar_type, first_op1 = NULL_TREE;

>>> +  tree vectype = NULL_TREE, first_op1 = NULL_TREE;

>>>    optab optab;

>>>    int icode;

>>>    machine_mode optab_op2_mode;

>>>    machine_mode vec_mode;

>>> -  HOST_WIDE_INT dummy;

>>>    gimple *first_load = NULL, *prev_first_load = NULL;

>>>

>>>    /* For every stmt in NODE find its def stmt/s.  */

>>> @@ -685,15 +719,14 @@ vect_build_slp_tree_1 (vec_info *vinfo,

>>>           return false;

>>>         }

>>>

>>> -      scalar_type = vect_get_smallest_scalar_type (stmt, &dummy, &dummy);

>>

>> ... so I wonder how this goes wrong here.

>

> It picks the right scalar type, but then we go on to use

> get_vectype_for_scalar_type when get_mask_type_for_scalar_type

> is what we actually want.  The easiest fix for that seemed to use

> the vectype that had already been calculated (also as for

> vect_determine_vectorization_factor).

>

>> I suppose we want to ignore vector booleans for the purpose of max_nunits

>> computation.  So isn't a better fix to simply "ignore" those in

>> vect_get_smallest_scalar_type instead?  I see that for intermediate

>> full-boolean operations like

>>

>>   a = x[i] < 0;

>>   b = y[i] > 0;

>>   tem = a & b;

>>

>> we want to ignore 'tem = a & b' fully here for the purpose of

>> vect_record_max_nunits.  So if scalar_type is a bitfield type

>> then skip it?

>

> Bitfield types will always be the smallest scalar type if they're

> present, so I think in pathological cases this could make us

> incorrectly ignore source operands to a compare.

>

> If we're confident that compares and casts of VECT_SCALAR_BOOLEAN_TYPE_Ps

> never affect the VF or UF then we should probably skip them based on

> that rather than whether the scalar type is a bitfield, so that the

> behaviour is the same for all targets.  It seems a bit dangerous though...


Well, all stmts that have no inherent promotion / demotion have no
effect on the VF
if you also have loads / stores.

One reason I dislike the current way of computing vector types and vectorization
factor is that it tries to do that ad-hoc from looking at stmts
locally instead of
somehow propagating things from sources to sinks -- which would be a requirement
if we ever drop the requirement of same-sized vector types throughout
vectorization...

In fact I wonder if we can get away with recording max_nunits here and delay
SLP_INSTANCE_UNROLLING_FACTOR computation until we compute the actual vector
types.  I think the code is most useful for BB vectorization where we
need to terminate
the SLP when we get to stmts we cannot handle without "unrolling"
(given the vector
size constraint).

Anyhow - I probably dislike your patch most because you add another
get_vectype_for_smallest_scalar_type helper which looks like a hack to me...

How is this issue solved for the non-SLP case?  I do remember that function
computing the VF and/or vector types is quite a mess with vector booleans...

Richard.

> Thanks,

> Richard

>

>>

>> Richard.

>>

>>> -      vectype = get_vectype_for_scalar_type (scalar_type);

>>> +      vectype = get_vectype_for_smallest_scalar_type (stmt);

>>>        if (!vect_record_max_nunits (vinfo, stmt, group_size, vectype,

>>>                                    max_nunits))

>>>         {

>>>           /* Fatal mismatch.  */

>>>           matches[0] = false;

>>> -          return false;

>>> -        }

>>> +         return false;

>>> +       }

>>>

>>>        if (gcall *call_stmt = dyn_cast <gcall *> (stmt))

>>>         {

>>> Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_10.c

>>> ===================================================================

>>> --- /dev/null   2018-04-20 16:19:46.369131350 +0100

>>> +++ gcc/testsuite/gcc.target/aarch64/sve/vcond_10.c     2018-05-09 11:30:41.057096221 +0100

>>> @@ -0,0 +1,36 @@

>>> +/* { dg-do compile } */

>>> +/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */

>>> +

>>> +#include <stdint.h>

>>> +

>>> +#define DEF_LOOP(TYPE)                                                 \

>>> +  void __attribute__ ((noinline, noclone))                             \

>>> +  test_##TYPE (TYPE *a, TYPE a1, TYPE a2, TYPE a3, TYPE a4, int n)     \

>>> +  {                                                                    \

>>> +    for (int i = 0; i < n; i += 2)                                     \

>>> +      {                                                                        \

>>> +       a[i] = a[i] >= 1 && a[i] != 3 ? a1 : a2;                        \

>>> +       a[i + 1] = a[i + 1] >= 1 && a[i + 1] != 3 ? a3 : a4;            \

>>> +      }                                                                        \

>>> +  }

>>> +

>>> +#define FOR_EACH_TYPE(T) \

>>> +  T (int8_t) \

>>> +  T (uint8_t) \

>>> +  T (int16_t) \

>>> +  T (uint16_t) \

>>> +  T (int32_t) \

>>> +  T (uint32_t) \

>>> +  T (int64_t) \

>>> +  T (uint64_t) \

>>> +  T (_Float16) \

>>> +  T (float) \

>>> +  T (double)

>>> +

>>> +FOR_EACH_TYPE (DEF_LOOP)

>>> +

>>> +/* { dg-final { scan-assembler-times {\tld1b\t} 2 } } */

>>> +/* { dg-final { scan-assembler-times {\tld1h\t} 3 } } */

>>> +/* { dg-final { scan-assembler-times {\tld1w\t} 3 } } */

>>> +/* { dg-final { scan-assembler-times {\tld1d\t} 3 } } */

>>> +/* { dg-final { scan-assembler-times {\tsel\tz[0-9]} 11 } } */

>>> Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_10_run.c

>>> ===================================================================

>>> --- /dev/null   2018-04-20 16:19:46.369131350 +0100

>>> +++ gcc/testsuite/gcc.target/aarch64/sve/vcond_10_run.c 2018-05-09 11:30:41.057096221 +0100

>>> @@ -0,0 +1,24 @@

>>> +/* { dg-do run { target aarch64_sve_hw } } */

>>> +/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */

>>> +

>>> +#include "vcond_10.c"

>>> +

>>> +#define N 133

>>> +

>>> +#define TEST_LOOP(TYPE)                                                        \

>>> +  {                                                                    \

>>> +    TYPE a[N];                                                         \

>>> +    for (int i = 0; i < N; ++i)                                                \

>>> +      a[i] = i % 7;                                                    \

>>> +    test_##TYPE (a, 10, 11, 12, 13, N);                                        \

>>> +    for (int i = 0; i < N; ++i)                                                \

>>> +      if (a[i] != 10 + (i & 1) * 2 + (i % 7 == 0 || i % 7 == 3))       \

>>> +       __builtin_abort ();                                             \

>>> +  }

>>> +

>>> +int

>>> +main (void)

>>> +{

>>> +  FOR_EACH_TYPE (TEST_LOOP);

>>> +  return 0;

>>> +}

>>> Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_11.c

>>> ===================================================================

>>> --- /dev/null   2018-04-20 16:19:46.369131350 +0100

>>> +++ gcc/testsuite/gcc.target/aarch64/sve/vcond_11.c     2018-05-09 11:30:41.057096221 +0100

>>> @@ -0,0 +1,36 @@

>>> +/* { dg-do compile } */

>>> +/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */

>>> +

>>> +#include <stdint.h>

>>> +

>>> +#define DEF_LOOP(TYPE)                                                 \

>>> +  void __attribute__ ((noinline, noclone))                             \

>>> +  test_##TYPE (int *restrict a, TYPE *restrict b, int a1, int a2,      \

>>> +              int a3, int a4, int n)                                   \

>>> +  {                                                                    \

>>> +    for (int i = 0; i < n; i += 2)                                     \

>>> +      {                                                                        \

>>> +       a[i] = a[i] >= 1 & b[i] != 3 ? a1 : a2;                         \

>>> +       a[i + 1] = a[i + 1] >= 1 & b[i + 1] != 3 ? a3 : a4;             \

>>> +      }                                                                        \

>>> +  }

>>> +

>>> +#define FOR_EACH_TYPE(T) \

>>> +  T (int8_t) \

>>> +  T (uint8_t) \

>>> +  T (int16_t) \

>>> +  T (uint16_t) \

>>> +  T (int64_t) \

>>> +  T (uint64_t) \

>>> +  T (double)

>>> +

>>> +FOR_EACH_TYPE (DEF_LOOP)

>>> +

>>> +/* { dg-final { scan-assembler-times {\tld1b\t} 2 } } */

>>> +/* { dg-final { scan-assembler-times {\tld1h\t} 2 } } */

>>> +/* 4 for each 8-bit function, 2 for each 16-bit function, 1 for

>>> +   each 64-bit function.  */

>>> +/* { dg-final { scan-assembler-times {\tld1w\t} 15 } } */

>>> +/* 3 64-bit functions * 2 64-bit vectors per 32-bit vector.  */

>>> +/* { dg-final { scan-assembler-times {\tld1d\t} 6 } } */

>>> +/* { dg-final { scan-assembler-times {\tsel\tz[0-9]} 15 } } */

>>> Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_11_run.c

>>> ===================================================================

>>> --- /dev/null   2018-04-20 16:19:46.369131350 +0100

>>> +++ gcc/testsuite/gcc.target/aarch64/sve/vcond_11_run.c 2018-05-09 11:30:41.059096142 +0100

>>> @@ -0,0 +1,28 @@

>>> +/* { dg-do run { target aarch64_sve_hw } } */

>>> +/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */

>>> +

>>> +#include "vcond_11.c"

>>> +

>>> +#define N 133

>>> +

>>> +#define TEST_LOOP(TYPE)                                                        \

>>> +  {                                                                    \

>>> +    int a[N];                                                          \

>>> +    TYPE b[N];                                                         \

>>> +    for (int i = 0; i < N; ++i)                                                \

>>> +      {                                                                        \

>>> +       a[i] = i % 5;                                                   \

>>> +       b[i] = i % 7;                                                   \

>>> +      }                                                                        \

>>> +    test_##TYPE (a, b, 10, 11, 12, 13, N);                             \

>>> +    for (int i = 0; i < N; ++i)                                                \

>>> +      if (a[i] != 10 + (i & 1) * 2 + (i % 5 == 0 || i % 7 == 3))       \

>>> +       __builtin_abort ();                                             \

>>> +  }

>>> +

>>> +int

>>> +main (void)

>>> +{

>>> +  FOR_EACH_TYPE (TEST_LOOP);

>>> +  return 0;

>>> +}
Richard Sandiford May 10, 2018, 6:31 a.m. | #4
Richard Biener <richard.guenther@gmail.com> writes:
> On Wed, May 9, 2018 at 1:29 PM, Richard Sandiford

> <richard.sandiford@linaro.org> wrote:

>> Richard Biener <richard.guenther@gmail.com> writes:

>>> On Wed, May 9, 2018 at 12:34 PM, Richard Sandiford

>>> <richard.sandiford@linaro.org> wrote:

>>>> The SLP unrolling factor is calculated by finding the smallest

>>>> scalar type for each SLP statement and taking the number of required

>>>> lanes from the vector versions of those scalar types.  E.g. for an

>>>> int32->int64 conversion, it's the vector of int32s rather than the

>>>> vector of int64s that determines the unroll factor.

>>>>

>>>> We rely on tree-vect-patterns.c to replace boolean operations like:

>>>>

>>>>    bool a, b, c;

>>>>    a = b & c;

>>>>

>>>> with integer operations of whatever the best size is in context.

>>>> E.g. if b and c are fed by comparisons of ints, a, b and c will become

>>>> the appropriate size for an int comparison.  For most targets this means

>>>> that a, b and c will end up as int-sized themselves, but on targets like

>>>> SVE and AVX512 with packed vector booleans, they'll instead become a

>>>> small bitfield like :1, padded to a byte for memory purposes.

>>>> The SLP code would then take these scalar types and try to calculate

>>>> the vector type for them, causing the unroll factor to be much higher

>>>> than necessary.

>>>>

>>>> This patch makes SLP use the cached vector boolean type if that's

>>>> appropriate.  Tested on aarch64-linux-gnu (with and without SVE),

>>>> aarch64_be-none-elf and x86_64-linux-gnu.  OK to install?

>>>>

>>>> Richard

>>>>

>>>>

>>>> 2018-05-09  Richard Sandiford  <richard.sandiford@linaro.org>

>>>>

>>>> gcc/

>>>>         * tree-vect-slp.c (get_vectype_for_smallest_scalar_type): New function.

>>>>         (vect_build_slp_tree_1): Use it when calculating the unroll factor.

>>>>

>>>> gcc/testsuite/

>>>>         * gcc.target/aarch64/sve/vcond_10.c: New test.

>>>>         * gcc.target/aarch64/sve/vcond_10_run.c: Likewise.

>>>>         * gcc.target/aarch64/sve/vcond_11.c: Likewise.

>>>>         * gcc.target/aarch64/sve/vcond_11_run.c: Likewise.

>>>>

>>>> Index: gcc/tree-vect-slp.c

>>>> ===================================================================

>>>> --- gcc/tree-vect-slp.c 2018-05-08 09:42:03.526648115 +0100

>>>> +++ gcc/tree-vect-slp.c 2018-05-09 11:30:41.061096063 +0100

>>>> @@ -608,6 +608,41 @@ vect_record_max_nunits (vec_info *vinfo,

>>>>    return true;

>>>>  }

>>>>

>>>> +/* Return the vector type associated with the smallest scalar type in STMT.  */

>>>> +

>>>> +static tree

>>>> +get_vectype_for_smallest_scalar_type (gimple *stmt)

>>>> +{

>>>> +  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);

>>>> +  tree vectype = STMT_VINFO_VECTYPE (stmt_info);

>>>> +  if (vectype != NULL_TREE

>>>> +      && VECTOR_BOOLEAN_TYPE_P (vectype))

>>>

>>> Hum.  At this point you can't really rely on vector types being set...

>>

>> Not for everything, but here we only care about the result of the

>> pattern replacements, and pattern replacements do set the vector type

>> up-front.  vect_determine_vectorization_factor (which runs earlier

>> for loop vectorisation) also relies on this.

>>

>>>> +    {

>>>> +      /* The result of a vector boolean operation has the smallest scalar

>>>> +        type unless the statement is extending an even narrower boolean.  */

>>>> +      if (!gimple_assign_cast_p (stmt))

>>>> +       return vectype;

>>>> +

>>>> +      tree src = gimple_assign_rhs1 (stmt);

>>>> +      gimple *def_stmt;

>>>> +      enum vect_def_type dt;

>>>> +      tree src_vectype = NULL_TREE;

>>>> +      if (vect_is_simple_use (src, stmt_info->vinfo, &def_stmt, &dt,

>>>> +                             &src_vectype)

>>>> +         && src_vectype

>>>> +         && VECTOR_BOOLEAN_TYPE_P (src_vectype))

>>>> +       {

>>>> +         if (TYPE_PRECISION (TREE_TYPE (src_vectype))

>>>> +             < TYPE_PRECISION (TREE_TYPE (vectype)))

>>>> +           return src_vectype;

>>>> +         return vectype;

>>>> +       }

>>>> +    }

>>>> +  HOST_WIDE_INT dummy;

>>>> +  tree scalar_type = vect_get_smallest_scalar_type (stmt, &dummy, &dummy);

>>>> +  return get_vectype_for_scalar_type (scalar_type);

>>>> +}

>>>> +

>>>>  /* Verify if the scalar stmts STMTS are isomorphic, require data

>>>>     permutation or are of unsupported types of operation.  Return

>>>>     true if they are, otherwise return false and indicate in *MATCHES

>>>> @@ -636,12 +671,11 @@ vect_build_slp_tree_1 (vec_info *vinfo,

>>>>    enum tree_code first_cond_code = ERROR_MARK;

>>>>    tree lhs;

>>>>    bool need_same_oprnds = false;

>>>> -  tree vectype = NULL_TREE, scalar_type, first_op1 = NULL_TREE;

>>>> +  tree vectype = NULL_TREE, first_op1 = NULL_TREE;

>>>>    optab optab;

>>>>    int icode;

>>>>    machine_mode optab_op2_mode;

>>>>    machine_mode vec_mode;

>>>> -  HOST_WIDE_INT dummy;

>>>>    gimple *first_load = NULL, *prev_first_load = NULL;

>>>>

>>>>    /* For every stmt in NODE find its def stmt/s.  */

>>>> @@ -685,15 +719,14 @@ vect_build_slp_tree_1 (vec_info *vinfo,

>>>>           return false;

>>>>         }

>>>>

>>>> -      scalar_type = vect_get_smallest_scalar_type (stmt, &dummy, &dummy);

>>>

>>> ... so I wonder how this goes wrong here.

>>

>> It picks the right scalar type, but then we go on to use

>> get_vectype_for_scalar_type when get_mask_type_for_scalar_type

>> is what we actually want.  The easiest fix for that seemed to use

>> the vectype that had already been calculated (also as for

>> vect_determine_vectorization_factor).

>>

>>> I suppose we want to ignore vector booleans for the purpose of max_nunits

>>> computation.  So isn't a better fix to simply "ignore" those in

>>> vect_get_smallest_scalar_type instead?  I see that for intermediate

>>> full-boolean operations like

>>>

>>>   a = x[i] < 0;

>>>   b = y[i] > 0;

>>>   tem = a & b;

>>>

>>> we want to ignore 'tem = a & b' fully here for the purpose of

>>> vect_record_max_nunits.  So if scalar_type is a bitfield type

>>> then skip it?

>>

>> Bitfield types will always be the smallest scalar type if they're

>> present, so I think in pathological cases this could make us

>> incorrectly ignore source operands to a compare.

>>

>> If we're confident that compares and casts of VECT_SCALAR_BOOLEAN_TYPE_Ps

>> never affect the VF or UF then we should probably skip them based on

>> that rather than whether the scalar type is a bitfield, so that the

>> behaviour is the same for all targets.  It seems a bit dangerous though...

>

> Well, all stmts that have no inherent promotion / demotion have no

> effect on the VF

> if you also have loads / stores.

>

> One reason I dislike the current way of computing vector types and vectorization

> factor is that it tries to do that ad-hoc from looking at stmts

> locally instead of

> somehow propagating things from sources to sinks -- which would be a requirement

> if we ever drop the requirement of same-sized vector types throughout

> vectorization...


Yeah.  This patch was just supposed to be a point improvement rather
than perfection.

> In fact I wonder if we can get away with recording max_nunits here and delay

> SLP_INSTANCE_UNROLLING_FACTOR computation until we compute the actual vector

> types.  I think the code is most useful for BB vectorization where we

> need to terminate

> the SLP when we get to stmts we cannot handle without "unrolling"

> (given the vector

> size constraint).


Part of the problem is that vect_build_slp_tree_1 also uses the vector
type to choose between shifts by vectors and shifts by scalars, and to
test whether two-operand permutes are valid.  So as things stand I think
we do need to know the vector type at some level here, even though those
two cases aren't interesting for booleans.

> Anyhow - I probably dislike your patch most because you add another

> get_vectype_for_smallest_scalar_type helper which looks like a hack to me...

>

> How is this issue solved for the non-SLP case?  I do remember that function

> computing the VF and/or vector types is quite a mess with vector booleans...


OK, for the purposes of fixing this bug, would it be OK to split out
the code in vect_determine_vectorization_factor that computes the
vector types and reuse it in SLP, even though I don't think either
of us like the way it's done?  At least that way there's only one
place to change in future.

This patch does that.  I tweaked a couple of the comments and
added a couple more dump lines, but otherwise the code in
vect_get_vector_types_for_stmt and vect_get_mask_type_for_stmt
is the same as the original.

Tested as before.

Thanks,
Richard


2018-05-10  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	* tree-vectorizer.h (vect_get_vector_types_for_stmt): Declare.
	(vect_get_mask_type_for_stmt): Likewise.
	* tree-vect-slp.c (vect_two_operations_perm_ok_p): New function,
	split out from...
	(vect_build_slp_tree_1): ...here.  Use vect_get_vector_types_for_stmt
	to determine the statement's vector type and the vector type that
	should be used for calculating nunits.  Deal with cases in which
	the type has to be deferred.
	(vect_slp_analyze_node_operations): Use vect_get_vector_types_for_stmt
	and vect_get_mask_type_for_stmt to calculate STMT_VINFO_VECTYPE.
	* tree-vect-loop.c (vect_determine_vf_for_stmt_1)
	(vect_determine_vf_for_stmt): New functions, split out from...
	(vect_determine_vectorization_factor): ...here.
	* tree-vect-stmts.c (vect_get_vector_types_for_stmt)
	(vect_get_mask_type_for_stmt): New functions, split out from
	vect_determine_vectorization_factor.

gcc/testsuite/
	* gcc.target/aarch64/sve/vcond_10.c: New test.
	* gcc.target/aarch64/sve/vcond_10_run.c: Likewise.
	* gcc.target/aarch64/sve/vcond_11.c: Likewise.
	* gcc.target/aarch64/sve/vcond_11_run.c: Likewise.

Index: gcc/tree-vectorizer.h
===================================================================
--- gcc/tree-vectorizer.h	2018-05-10 07:18:12.104514856 +0100
+++ gcc/tree-vectorizer.h	2018-05-10 07:18:12.322505512 +0100
@@ -1467,6 +1467,8 @@ extern tree vect_gen_perm_mask_checked (
 extern void optimize_mask_stores (struct loop*);
 extern gcall *vect_gen_while (tree, tree, tree);
 extern tree vect_gen_while_not (gimple_seq *, tree, tree, tree);
+extern bool vect_get_vector_types_for_stmt (stmt_vec_info, tree *, tree *);
+extern tree vect_get_mask_type_for_stmt (stmt_vec_info);
 
 /* In tree-vect-data-refs.c.  */
 extern bool vect_can_force_dr_alignment_p (const_tree, unsigned int);
Index: gcc/tree-vect-slp.c
===================================================================
--- gcc/tree-vect-slp.c	2018-05-10 07:18:12.104514856 +0100
+++ gcc/tree-vect-slp.c	2018-05-10 07:18:12.321505555 +0100
@@ -608,6 +608,33 @@ vect_record_max_nunits (vec_info *vinfo,
   return true;
 }
 
+/* STMTS is a group of GROUP_SIZE SLP statements in which some
+   statements do the same operation as the first statement and in which
+   the others do ALT_STMT_CODE.  Return true if we can take one vector
+   of the first operation and one vector of the second and permute them
+   to get the required result.  VECTYPE is the type of the vector that
+   would be permuted.  */
+
+static bool
+vect_two_operations_perm_ok_p (vec<gimple *> stmts, unsigned int group_size,
+			       tree vectype, tree_code alt_stmt_code)
+{
+  unsigned HOST_WIDE_INT count;
+  if (!TYPE_VECTOR_SUBPARTS (vectype).is_constant (&count))
+    return false;
+
+  vec_perm_builder sel (count, count, 1);
+  for (unsigned int i = 0; i < count; ++i)
+    {
+      unsigned int elt = i;
+      if (gimple_assign_rhs_code (stmts[i % group_size]) == alt_stmt_code)
+	elt += count;
+      sel.quick_push (elt);
+    }
+  vec_perm_indices indices (sel, 2, count);
+  return can_vec_perm_const_p (TYPE_MODE (vectype), indices);
+}
+
 /* Verify if the scalar stmts STMTS are isomorphic, require data
    permutation or are of unsupported types of operation.  Return
    true if they are, otherwise return false and indicate in *MATCHES
@@ -636,17 +663,17 @@ vect_build_slp_tree_1 (vec_info *vinfo,
   enum tree_code first_cond_code = ERROR_MARK;
   tree lhs;
   bool need_same_oprnds = false;
-  tree vectype = NULL_TREE, scalar_type, first_op1 = NULL_TREE;
+  tree vectype = NULL_TREE, first_op1 = NULL_TREE;
   optab optab;
   int icode;
   machine_mode optab_op2_mode;
   machine_mode vec_mode;
-  HOST_WIDE_INT dummy;
   gimple *first_load = NULL, *prev_first_load = NULL;
 
   /* For every stmt in NODE find its def stmt/s.  */
   FOR_EACH_VEC_ELT (stmts, i, stmt)
     {
+      stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
       swap[i] = 0;
       matches[i] = false;
 
@@ -685,15 +712,19 @@ vect_build_slp_tree_1 (vec_info *vinfo,
 	  return false;
 	}
 
-      scalar_type = vect_get_smallest_scalar_type (stmt, &dummy, &dummy);
-      vectype = get_vectype_for_scalar_type (scalar_type);
-      if (!vect_record_max_nunits (vinfo, stmt, group_size, vectype,
-				   max_nunits))
+      tree nunits_vectype;
+      if (!vect_get_vector_types_for_stmt (stmt_info, &vectype,
+					   &nunits_vectype)
+	  || (nunits_vectype
+	      && !vect_record_max_nunits (vinfo, stmt, group_size,
+					  nunits_vectype, max_nunits)))
 	{
 	  /* Fatal mismatch.  */
 	  matches[0] = false;
-          return false;
-        }
+	  return false;
+	}
+
+      gcc_assert (vectype);
 
       if (gcall *call_stmt = dyn_cast <gcall *> (stmt))
 	{
@@ -730,6 +761,17 @@ vect_build_slp_tree_1 (vec_info *vinfo,
 	      || rhs_code == LROTATE_EXPR
 	      || rhs_code == RROTATE_EXPR)
 	    {
+	      if (vectype == boolean_type_node)
+		{
+		  if (dump_enabled_p ())
+		    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				     "Build SLP failed: shift of a"
+				     " boolean.\n");
+		  /* Fatal mismatch.  */
+		  matches[0] = false;
+		  return false;
+		}
+
 	      vec_mode = TYPE_MODE (vectype);
 
 	      /* First see if we have a vector/vector shift.  */
@@ -973,29 +1015,12 @@ vect_build_slp_tree_1 (vec_info *vinfo,
 
   /* If we allowed a two-operation SLP node verify the target can cope
      with the permute we are going to use.  */
-  poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
   if (alt_stmt_code != ERROR_MARK
       && TREE_CODE_CLASS (alt_stmt_code) != tcc_reference)
     {
-      unsigned HOST_WIDE_INT count;
-      if (!nunits.is_constant (&count))
-	{
-	  if (dump_enabled_p ())
-	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-			     "Build SLP failed: different operations "
-			     "not allowed with variable-length SLP.\n");
-	  return false;
-	}
-      vec_perm_builder sel (count, count, 1);
-      for (i = 0; i < count; ++i)
-	{
-	  unsigned int elt = i;
-	  if (gimple_assign_rhs_code (stmts[i % group_size]) == alt_stmt_code)
-	    elt += count;
-	  sel.quick_push (elt);
-	}
-      vec_perm_indices indices (sel, 2, count);
-      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
+      if (vectype == boolean_type_node
+	  || !vect_two_operations_perm_ok_p (stmts, group_size,
+					     vectype, alt_stmt_code))
 	{
 	  for (i = 0; i < group_size; ++i)
 	    if (gimple_assign_rhs_code (stmts[i]) == alt_stmt_code)
@@ -2759,36 +2784,18 @@ vect_slp_analyze_node_operations (vec_in
   if (bb_vinfo
       && ! STMT_VINFO_DATA_REF (stmt_info))
     {
-      gcc_assert (PURE_SLP_STMT (stmt_info));
-
-      tree scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
-      if (dump_enabled_p ())
-	{
-	  dump_printf_loc (MSG_NOTE, vect_location,
-			   "get vectype for scalar type:  ");
-	  dump_generic_expr (MSG_NOTE, TDF_SLIM, scalar_type);
-	  dump_printf (MSG_NOTE, "\n");
-	}
-
-      tree vectype = get_vectype_for_scalar_type (scalar_type);
-      if (!vectype)
-	{
-	  if (dump_enabled_p ())
-	    {
-	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-			       "not SLPed: unsupported data-type ");
-	      dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
-				 scalar_type);
-	      dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
-	    }
-	  return false;
-	}
-
-      if (dump_enabled_p ())
-	{
-	  dump_printf_loc (MSG_NOTE, vect_location, "vectype:  ");
-	  dump_generic_expr (MSG_NOTE, TDF_SLIM, vectype);
-	  dump_printf (MSG_NOTE, "\n");
+      tree vectype, nunits_vectype;
+      if (!vect_get_vector_types_for_stmt (stmt_info, &vectype,
+					   &nunits_vectype))
+	/* We checked this when building the node.  */
+	gcc_unreachable ();
+      if (vectype == boolean_type_node)
+	{
+	  vectype = vect_get_mask_type_for_stmt (stmt_info);
+	  if (!vectype)
+	    /* vect_get_mask_type_for_stmt has already explained the
+	       failure.  */
+	    return false;
 	}
 
       gimple *sstmt;
Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c	2018-05-10 07:18:12.104514856 +0100
+++ gcc/tree-vect-loop.c	2018-05-10 07:18:12.320505598 +0100
@@ -155,6 +155,108 @@ Software Foundation; either version 3, o
 
 static void vect_estimate_min_profitable_iters (loop_vec_info, int *, int *);
 
+/* Subroutine of vect_determine_vf_for_stmt that handles only one
+   statement.  VECTYPE_MAYBE_SET_P is true if STMT_VINFO_VECTYPE
+   may already be set for general statements (not just data refs).  */
+
+static bool
+vect_determine_vf_for_stmt_1 (stmt_vec_info stmt_info,
+			      bool vectype_maybe_set_p,
+			      poly_uint64 *vf,
+			      vec<stmt_vec_info > *mask_producers)
+{
+  gimple *stmt = stmt_info->stmt;
+
+  if ((!STMT_VINFO_RELEVANT_P (stmt_info)
+       && !STMT_VINFO_LIVE_P (stmt_info))
+      || gimple_clobber_p (stmt))
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, vect_location, "skip.\n");
+      return true;
+    }
+
+  tree stmt_vectype, nunits_vectype;
+  if (!vect_get_vector_types_for_stmt (stmt_info, &stmt_vectype,
+				       &nunits_vectype))
+    return false;
+
+  if (stmt_vectype)
+    {
+      if (STMT_VINFO_VECTYPE (stmt_info))
+	/* The only case when a vectype had been already set is for stmts
+	   that contain a data ref, or for "pattern-stmts" (stmts generated
+	   by the vectorizer to represent/replace a certain idiom).  */
+	gcc_assert ((STMT_VINFO_DATA_REF (stmt_info)
+		     || vectype_maybe_set_p)
+		    && STMT_VINFO_VECTYPE (stmt_info) == stmt_vectype);
+      else if (stmt_vectype == boolean_type_node)
+	mask_producers->safe_push (stmt_info);
+      else
+	STMT_VINFO_VECTYPE (stmt_info) = stmt_vectype;
+    }
+
+  if (nunits_vectype)
+    vect_update_max_nunits (vf, nunits_vectype);
+
+  return true;
+}
+
+/* Subroutine of vect_determine_vectorization_factor.  Set the vector
+   types of STMT_INFO and all attached pattern statements and update
+   the vectorization factor VF accordingly.  If some of the statements
+   produce a mask result whose vector type can only be calculated later,
+   add them to MASK_PRODUCERS.  Return true on success or false if
+   something prevented vectorization.  */
+
+static bool
+vect_determine_vf_for_stmt (stmt_vec_info stmt_info, poly_uint64 *vf,
+			    vec<stmt_vec_info > *mask_producers)
+{
+  if (dump_enabled_p ())
+    {
+      dump_printf_loc (MSG_NOTE, vect_location, "==> examining statement: ");
+      dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt_info->stmt, 0);
+    }
+  if (!vect_determine_vf_for_stmt_1 (stmt_info, false, vf, mask_producers))
+    return false;
+
+  if (STMT_VINFO_IN_PATTERN_P (stmt_info)
+      && STMT_VINFO_RELATED_STMT (stmt_info))
+    {
+      stmt_info = vinfo_for_stmt (STMT_VINFO_RELATED_STMT (stmt_info));
+
+      /* If a pattern statement has def stmts, analyze them too.  */
+      gimple *pattern_def_seq = STMT_VINFO_PATTERN_DEF_SEQ (stmt_info);
+      for (gimple_stmt_iterator si = gsi_start (pattern_def_seq);
+	   !gsi_end_p (si); gsi_next (&si))
+	{
+	  stmt_vec_info def_stmt_info = vinfo_for_stmt (gsi_stmt (si));
+	  if (dump_enabled_p ())
+	    {
+	      dump_printf_loc (MSG_NOTE, vect_location,
+			       "==> examining pattern def stmt: ");
+	      dump_gimple_stmt (MSG_NOTE, TDF_SLIM,
+				def_stmt_info->stmt, 0);
+	    }
+	  if (!vect_determine_vf_for_stmt_1 (def_stmt_info, true,
+					     vf, mask_producers))
+	    return false;
+	}
+
+      if (dump_enabled_p ())
+	{
+	  dump_printf_loc (MSG_NOTE, vect_location,
+			   "==> examining pattern statement: ");
+	  dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt_info->stmt, 0);
+	}
+      if (!vect_determine_vf_for_stmt_1 (stmt_info, true, vf, mask_producers))
+	return false;
+    }
+
+  return true;
+}
+
 /* Function vect_determine_vectorization_factor
 
    Determine the vectorization factor (VF).  VF is the number of data elements
@@ -192,12 +294,6 @@ vect_determine_vectorization_factor (loo
   tree vectype;
   stmt_vec_info stmt_info;
   unsigned i;
-  HOST_WIDE_INT dummy;
-  gimple *stmt, *pattern_stmt = NULL;
-  gimple_seq pattern_def_seq = NULL;
-  gimple_stmt_iterator pattern_def_si = gsi_none ();
-  bool analyze_pattern_stmt = false;
-  bool bool_result;
   auto_vec<stmt_vec_info> mask_producers;
 
   if (dump_enabled_p ())
@@ -269,304 +365,13 @@ vect_determine_vectorization_factor (loo
 	    }
 	}
 
-      for (gimple_stmt_iterator si = gsi_start_bb (bb);
-	   !gsi_end_p (si) || analyze_pattern_stmt;)
-        {
-          tree vf_vectype;
-
-          if (analyze_pattern_stmt)
-	    stmt = pattern_stmt;
-          else
-            stmt = gsi_stmt (si);
-
-          stmt_info = vinfo_for_stmt (stmt);
-
-	  if (dump_enabled_p ())
-	    {
-	      dump_printf_loc (MSG_NOTE, vect_location,
-                               "==> examining statement: ");
-	      dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0);
-	    }
-
-	  gcc_assert (stmt_info);
-
-	  /* Skip stmts which do not need to be vectorized.  */
-	  if ((!STMT_VINFO_RELEVANT_P (stmt_info)
-	       && !STMT_VINFO_LIVE_P (stmt_info))
-	      || gimple_clobber_p (stmt))
-            {
-              if (STMT_VINFO_IN_PATTERN_P (stmt_info)
-                  && (pattern_stmt = STMT_VINFO_RELATED_STMT (stmt_info))
-                  && (STMT_VINFO_RELEVANT_P (vinfo_for_stmt (pattern_stmt))
-                      || STMT_VINFO_LIVE_P (vinfo_for_stmt (pattern_stmt))))
-                {
-                  stmt = pattern_stmt;
-                  stmt_info = vinfo_for_stmt (pattern_stmt);
-                  if (dump_enabled_p ())
-                    {
-                      dump_printf_loc (MSG_NOTE, vect_location,
-                                       "==> examining pattern statement: ");
-                      dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0);
-                    }
-                }
-              else
-	        {
-	          if (dump_enabled_p ())
-	            dump_printf_loc (MSG_NOTE, vect_location, "skip.\n");
-                  gsi_next (&si);
-	          continue;
-                }
-	    }
-          else if (STMT_VINFO_IN_PATTERN_P (stmt_info)
-                   && (pattern_stmt = STMT_VINFO_RELATED_STMT (stmt_info))
-                   && (STMT_VINFO_RELEVANT_P (vinfo_for_stmt (pattern_stmt))
-                       || STMT_VINFO_LIVE_P (vinfo_for_stmt (pattern_stmt))))
-            analyze_pattern_stmt = true;
-
-	  /* If a pattern statement has def stmts, analyze them too.  */
-	  if (is_pattern_stmt_p (stmt_info))
-	    {
-	      if (pattern_def_seq == NULL)
-		{
-		  pattern_def_seq = STMT_VINFO_PATTERN_DEF_SEQ (stmt_info);
-		  pattern_def_si = gsi_start (pattern_def_seq);
-		}
-	      else if (!gsi_end_p (pattern_def_si))
-		gsi_next (&pattern_def_si);
-	      if (pattern_def_seq != NULL)
-		{
-		  gimple *pattern_def_stmt = NULL;
-		  stmt_vec_info pattern_def_stmt_info = NULL;
-
-		  while (!gsi_end_p (pattern_def_si))
-		    {
-		      pattern_def_stmt = gsi_stmt (pattern_def_si);
-		      pattern_def_stmt_info
-			= vinfo_for_stmt (pattern_def_stmt);
-		      if (STMT_VINFO_RELEVANT_P (pattern_def_stmt_info)
-			  || STMT_VINFO_LIVE_P (pattern_def_stmt_info))
-			break;
-		      gsi_next (&pattern_def_si);
-		    }
-
-		  if (!gsi_end_p (pattern_def_si))
-		    {
-		      if (dump_enabled_p ())
-			{
-			  dump_printf_loc (MSG_NOTE, vect_location,
-                                           "==> examining pattern def stmt: ");
-			  dump_gimple_stmt (MSG_NOTE, TDF_SLIM,
-                                            pattern_def_stmt, 0);
-			}
-
-		      stmt = pattern_def_stmt;
-		      stmt_info = pattern_def_stmt_info;
-		    }
-		  else
-		    {
-		      pattern_def_si = gsi_none ();
-		      analyze_pattern_stmt = false;
-		    }
-		}
-	      else
-		analyze_pattern_stmt = false;
-	    }
-
-	  if (gimple_get_lhs (stmt) == NULL_TREE
-	      /* MASK_STORE has no lhs, but is ok.  */
-	      && (!is_gimple_call (stmt)
-		  || !gimple_call_internal_p (stmt)
-		  || gimple_call_internal_fn (stmt) != IFN_MASK_STORE))
-	    {
-	      if (is_gimple_call (stmt))
-		{
-		  /* Ignore calls with no lhs.  These must be calls to
-		     #pragma omp simd functions, and what vectorization factor
-		     it really needs can't be determined until
-		     vectorizable_simd_clone_call.  */
-		  if (!analyze_pattern_stmt && gsi_end_p (pattern_def_si))
-		    {
-		      pattern_def_seq = NULL;
-		      gsi_next (&si);
-		    }
-		  continue;
-		}
-	      if (dump_enabled_p ())
-		{
-	          dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-                                   "not vectorized: irregular stmt.");
-		  dump_gimple_stmt (MSG_MISSED_OPTIMIZATION,  TDF_SLIM, stmt,
-                                    0);
-		}
-	      return false;
-	    }
-
-	  if (VECTOR_MODE_P (TYPE_MODE (gimple_expr_type (stmt))))
-	    {
-	      if (dump_enabled_p ())
-	        {
-	          dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-                                   "not vectorized: vector stmt in loop:");
-	          dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, stmt, 0);
-	        }
-	      return false;
-	    }
-
-	  bool_result = false;
-
-	  if (STMT_VINFO_VECTYPE (stmt_info))
-	    {
-	      /* The only case when a vectype had been already set is for stmts
-	         that contain a dataref, or for "pattern-stmts" (stmts
-		 generated by the vectorizer to represent/replace a certain
-		 idiom).  */
-	      gcc_assert (STMT_VINFO_DATA_REF (stmt_info)
-			  || is_pattern_stmt_p (stmt_info)
-			  || !gsi_end_p (pattern_def_si));
-	      vectype = STMT_VINFO_VECTYPE (stmt_info);
-	    }
-	  else
-	    {
-	      gcc_assert (!STMT_VINFO_DATA_REF (stmt_info));
-	      if (gimple_call_internal_p (stmt, IFN_MASK_STORE))
-		scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
-	      else
-		scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
-
-	      /* Bool ops don't participate in vectorization factor
-		 computation.  For comparison use compared types to
-		 compute a factor.  */
-	      if (VECT_SCALAR_BOOLEAN_TYPE_P (scalar_type)
-		  && is_gimple_assign (stmt)
-		  && gimple_assign_rhs_code (stmt) != COND_EXPR)
-		{
-		  if (STMT_VINFO_RELEVANT_P (stmt_info)
-		      || STMT_VINFO_LIVE_P (stmt_info))
-		    mask_producers.safe_push (stmt_info);
-		  bool_result = true;
-
-		  if (TREE_CODE_CLASS (gimple_assign_rhs_code (stmt))
-		      == tcc_comparison
-		      && !VECT_SCALAR_BOOLEAN_TYPE_P
-			    (TREE_TYPE (gimple_assign_rhs1 (stmt))))
-		    scalar_type = TREE_TYPE (gimple_assign_rhs1 (stmt));
-		  else
-		    {
-		      if (!analyze_pattern_stmt && gsi_end_p (pattern_def_si))
-			{
-			  pattern_def_seq = NULL;
-			  gsi_next (&si);
-			}
-		      continue;
-		    }
-		}
-
-	      if (dump_enabled_p ())
-		{
-		  dump_printf_loc (MSG_NOTE, vect_location,
-                                   "get vectype for scalar type:  ");
-		  dump_generic_expr (MSG_NOTE, TDF_SLIM, scalar_type);
-                  dump_printf (MSG_NOTE, "\n");
-		}
-	      vectype = get_vectype_for_scalar_type (scalar_type);
-	      if (!vectype)
-		{
-		  if (dump_enabled_p ())
-		    {
-		      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-                                       "not vectorized: unsupported "
-                                       "data-type ");
-		      dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
-                                         scalar_type);
-                      dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
-		    }
-		  return false;
-		}
-
-	      if (!bool_result)
-		STMT_VINFO_VECTYPE (stmt_info) = vectype;
-
-	      if (dump_enabled_p ())
-		{
-		  dump_printf_loc (MSG_NOTE, vect_location, "vectype: ");
-		  dump_generic_expr (MSG_NOTE, TDF_SLIM, vectype);
-                  dump_printf (MSG_NOTE, "\n");
-		}
-            }
-
-	  /* Don't try to compute VF out scalar types if we stmt
-	     produces boolean vector.  Use result vectype instead.  */
-	  if (VECTOR_BOOLEAN_TYPE_P (vectype))
-	    vf_vectype = vectype;
-	  else
-	    {
-	      /* The vectorization factor is according to the smallest
-		 scalar type (or the largest vector size, but we only
-		 support one vector size per loop).  */
-	      if (!bool_result)
-		scalar_type = vect_get_smallest_scalar_type (stmt, &dummy,
-							     &dummy);
-	      if (dump_enabled_p ())
-		{
-		  dump_printf_loc (MSG_NOTE, vect_location,
-				   "get vectype for scalar type:  ");
-		  dump_generic_expr (MSG_NOTE, TDF_SLIM, scalar_type);
-		  dump_printf (MSG_NOTE, "\n");
-		}
-	      vf_vectype = get_vectype_for_scalar_type (scalar_type);
-	    }
-	  if (!vf_vectype)
-	    {
-	      if (dump_enabled_p ())
-		{
-		  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-                                   "not vectorized: unsupported data-type ");
-		  dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
-                                     scalar_type);
-                  dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
-		}
-	      return false;
-	    }
-
-	  if (maybe_ne (GET_MODE_SIZE (TYPE_MODE (vectype)),
-			GET_MODE_SIZE (TYPE_MODE (vf_vectype))))
-	    {
-	      if (dump_enabled_p ())
-		{
-		  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-                                   "not vectorized: different sized vector "
-                                   "types in statement, ");
-		  dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
-                                     vectype);
-		  dump_printf (MSG_MISSED_OPTIMIZATION, " and ");
-		  dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
-                                     vf_vectype);
-                  dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
-		}
-	      return false;
-	    }
-
-	  if (dump_enabled_p ())
-	    {
-	      dump_printf_loc (MSG_NOTE, vect_location, "vectype: ");
-	      dump_generic_expr (MSG_NOTE, TDF_SLIM, vf_vectype);
-              dump_printf (MSG_NOTE, "\n");
-	    }
-
-	  if (dump_enabled_p ())
-	    {
-	      dump_printf_loc (MSG_NOTE, vect_location, "nunits = ");
-	      dump_dec (MSG_NOTE, TYPE_VECTOR_SUBPARTS (vf_vectype));
-	      dump_printf (MSG_NOTE, "\n");
-	    }
-
-	  vect_update_max_nunits (&vectorization_factor, vf_vectype);
-
-	  if (!analyze_pattern_stmt && gsi_end_p (pattern_def_si))
-	    {
-	      pattern_def_seq = NULL;
-	      gsi_next (&si);
-	    }
+      for (gimple_stmt_iterator si = gsi_start_bb (bb); !gsi_end_p (si);
+	   gsi_next (&si))
+	{
+	  stmt_info = vinfo_for_stmt (gsi_stmt (si));
+	  if (!vect_determine_vf_for_stmt (stmt_info, &vectorization_factor,
+					   &mask_producers))
+	    return false;
         }
     }
 
@@ -589,119 +394,11 @@ vect_determine_vectorization_factor (loo
 
   for (i = 0; i < mask_producers.length (); i++)
     {
-      tree mask_type = NULL;
-
-      stmt = STMT_VINFO_STMT (mask_producers[i]);
-
-      if (is_gimple_assign (stmt)
-	  && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == tcc_comparison
-	  && !VECT_SCALAR_BOOLEAN_TYPE_P
-				      (TREE_TYPE (gimple_assign_rhs1 (stmt))))
-	{
-	  scalar_type = TREE_TYPE (gimple_assign_rhs1 (stmt));
-	  mask_type = get_mask_type_for_scalar_type (scalar_type);
-
-	  if (!mask_type)
-	    {
-	      if (dump_enabled_p ())
-		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-				 "not vectorized: unsupported mask\n");
-	      return false;
-	    }
-	}
-      else
-	{
-	  tree rhs;
-	  ssa_op_iter iter;
-	  gimple *def_stmt;
-	  enum vect_def_type dt;
-
-	  FOR_EACH_SSA_TREE_OPERAND (rhs, stmt, iter, SSA_OP_USE)
-	    {
-	      if (!vect_is_simple_use (rhs, mask_producers[i]->vinfo,
-				       &def_stmt, &dt, &vectype))
-		{
-		  if (dump_enabled_p ())
-		    {
-		      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-				       "not vectorized: can't compute mask type "
-				       "for statement, ");
-		      dump_gimple_stmt (MSG_MISSED_OPTIMIZATION,  TDF_SLIM, stmt,
-					0);
-		    }
-		  return false;
-		}
-
-	      /* No vectype probably means external definition.
-		 Allow it in case there is another operand which
-		 allows to determine mask type.  */
-	      if (!vectype)
-		continue;
-
-	      if (!mask_type)
-		mask_type = vectype;
-	      else if (maybe_ne (TYPE_VECTOR_SUBPARTS (mask_type),
-				 TYPE_VECTOR_SUBPARTS (vectype)))
-		{
-		  if (dump_enabled_p ())
-		    {
-		      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-				       "not vectorized: different sized masks "
-				       "types in statement, ");
-		      dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
-					 mask_type);
-		      dump_printf (MSG_MISSED_OPTIMIZATION, " and ");
-		      dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
-					 vectype);
-		      dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
-		    }
-		  return false;
-		}
-	      else if (VECTOR_BOOLEAN_TYPE_P (mask_type)
-		       != VECTOR_BOOLEAN_TYPE_P (vectype))
-		{
-		  if (dump_enabled_p ())
-		    {
-		      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-				       "not vectorized: mixed mask and "
-				       "nonmask vector types in statement, ");
-		      dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
-					 mask_type);
-		      dump_printf (MSG_MISSED_OPTIMIZATION, " and ");
-		      dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
-					 vectype);
-		      dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
-		    }
-		  return false;
-		}
-	    }
-
-	  /* We may compare boolean value loaded as vector of integers.
-	     Fix mask_type in such case.  */
-	  if (mask_type
-	      && !VECTOR_BOOLEAN_TYPE_P (mask_type)
-	      && gimple_code (stmt) == GIMPLE_ASSIGN
-	      && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == tcc_comparison)
-	    mask_type = build_same_sized_truth_vector_type (mask_type);
-	}
-
-      /* No mask_type should mean loop invariant predicate.
-	 This is probably a subject for optimization in
-	 if-conversion.  */
+      stmt_info = mask_producers[i];
+      tree mask_type = vect_get_mask_type_for_stmt (stmt_info);
       if (!mask_type)
-	{
-	  if (dump_enabled_p ())
-	    {
-	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-			       "not vectorized: can't compute mask type "
-			       "for statement, ");
-	      dump_gimple_stmt (MSG_MISSED_OPTIMIZATION,  TDF_SLIM, stmt,
-				0);
-	    }
-	  return false;
-	}
-
-      STMT_VINFO_VECTYPE (mask_producers[i]) = mask_type;
+	return false;
+      STMT_VINFO_VECTYPE (stmt_info) = mask_type;
     }
 
   return true;
Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	2018-05-10 07:18:12.104514856 +0100
+++ gcc/tree-vect-stmts.c	2018-05-10 07:18:12.322505512 +0100
@@ -10520,3 +10520,311 @@ vect_gen_while_not (gimple_seq *seq, tre
   gimple_seq_add_stmt (seq, call);
   return gimple_build (seq, BIT_NOT_EXPR, mask_type, tmp);
 }
+
+/* Try to compute the vector types required to vectorize STMT_INFO,
+   returning true on success and false if vectorization isn't possible.
+
+   On success:
+
+   - Set *STMT_VECTYPE_OUT to:
+     - NULL_TREE if the statement doesn't need to be vectorized;
+     - boolean_type_node if the statement is a boolean operation whose
+       vector type can only be determined once all the other vector types
+       are known; and
+     - the equivalent of STMT_VINFO_VECTYPE otherwise.
+
+   - Set *NUNITS_VECTYPE_OUT to the vector type that contains the maximum
+     number of units needed to vectorize STMT_INFO, or NULL_TREE if the
+     statement does not help to determine the overall number of units.  */
+
+bool
+vect_get_vector_types_for_stmt (stmt_vec_info stmt_info,
+				tree *stmt_vectype_out,
+				tree *nunits_vectype_out)
+{
+  gimple *stmt = stmt_info->stmt;
+
+  *stmt_vectype_out = NULL_TREE;
+  *nunits_vectype_out = NULL_TREE;
+
+  if (gimple_get_lhs (stmt) == NULL_TREE
+      /* MASK_STORE has no lhs, but is ok.  */
+      && !gimple_call_internal_p (stmt, IFN_MASK_STORE))
+    {
+      if (is_a <gcall *> (stmt))
+	{
+	  /* Ignore calls with no lhs.  These must be calls to
+	     #pragma omp simd functions, and what vectorization factor
+	     it really needs can't be determined until
+	     vectorizable_simd_clone_call.  */
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "defer to SIMD clone analysis.\n");
+	  return true;
+	}
+
+      if (dump_enabled_p ())
+	{
+	  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			   "not vectorized: irregular stmt.");
+	  dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, stmt, 0);
+	}
+      return false;
+    }
+
+  if (VECTOR_MODE_P (TYPE_MODE (gimple_expr_type (stmt))))
+    {
+      if (dump_enabled_p ())
+	{
+	  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			   "not vectorized: vector stmt in loop:");
+	  dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, stmt, 0);
+	}
+      return false;
+    }
+
+  tree vectype;
+  tree scalar_type = NULL_TREE;
+  if (STMT_VINFO_VECTYPE (stmt_info))
+    *stmt_vectype_out = vectype = STMT_VINFO_VECTYPE (stmt_info);
+  else
+    {
+      gcc_assert (!STMT_VINFO_DATA_REF (stmt_info));
+      if (gimple_call_internal_p (stmt, IFN_MASK_STORE))
+	scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));
+      else
+	scalar_type = TREE_TYPE (gimple_get_lhs (stmt));
+
+      /* Pure bool ops don't participate in number-of-units computation.
+	 For comparisons use the types being compared.  */
+      if (VECT_SCALAR_BOOLEAN_TYPE_P (scalar_type)
+	  && is_gimple_assign (stmt)
+	  && gimple_assign_rhs_code (stmt) != COND_EXPR)
+	{
+	  *stmt_vectype_out = boolean_type_node;
+
+	  tree rhs1 = gimple_assign_rhs1 (stmt);
+	  if (TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == tcc_comparison
+	      && !VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (rhs1)))
+	    scalar_type = TREE_TYPE (rhs1);
+	  else
+	    {
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_NOTE, vect_location,
+				 "pure bool operation.\n");
+	      return true;
+	    }
+	}
+
+      if (dump_enabled_p ())
+	{
+	  dump_printf_loc (MSG_NOTE, vect_location,
+			   "get vectype for scalar type:  ");
+	  dump_generic_expr (MSG_NOTE, TDF_SLIM, scalar_type);
+	  dump_printf (MSG_NOTE, "\n");
+	}
+      vectype = get_vectype_for_scalar_type (scalar_type);
+      if (!vectype)
+	{
+	  if (dump_enabled_p ())
+	    {
+	      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			       "not vectorized: unsupported data-type ");
+	      dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
+				 scalar_type);
+	      dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
+	    }
+	  return false;
+	}
+
+      if (!*stmt_vectype_out)
+	*stmt_vectype_out = vectype;
+
+      if (dump_enabled_p ())
+	{
+	  dump_printf_loc (MSG_NOTE, vect_location, "vectype: ");
+	  dump_generic_expr (MSG_NOTE, TDF_SLIM, vectype);
+	  dump_printf (MSG_NOTE, "\n");
+	}
+    }
+
+  /* Don't try to compute scalar types if the stmt produces a boolean
+     vector; use the existing vector type instead.  */
+  tree nunits_vectype;
+  if (VECTOR_BOOLEAN_TYPE_P (vectype))
+    nunits_vectype = vectype;
+  else
+    {
+      /* The number of units is set according to the smallest scalar
+	 type (or the largest vector size, but we only support one
+	 vector size per vectorization).  */
+      if (*stmt_vectype_out != boolean_type_node)
+	{
+	  HOST_WIDE_INT dummy;
+	  scalar_type = vect_get_smallest_scalar_type (stmt, &dummy, &dummy);
+	}
+      if (dump_enabled_p ())
+	{
+	  dump_printf_loc (MSG_NOTE, vect_location,
+			   "get vectype for scalar type:  ");
+	  dump_generic_expr (MSG_NOTE, TDF_SLIM, scalar_type);
+	  dump_printf (MSG_NOTE, "\n");
+	}
+      nunits_vectype = get_vectype_for_scalar_type (scalar_type);
+    }
+  if (!nunits_vectype)
+    {
+      if (dump_enabled_p ())
+	{
+	  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			   "not vectorized: unsupported data-type ");
+	  dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM, scalar_type);
+	  dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
+	}
+      return false;
+    }
+
+  if (maybe_ne (GET_MODE_SIZE (TYPE_MODE (vectype)),
+		GET_MODE_SIZE (TYPE_MODE (nunits_vectype))))
+    {
+      if (dump_enabled_p ())
+	{
+	  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			   "not vectorized: different sized vector "
+			   "types in statement, ");
+	  dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM, vectype);
+	  dump_printf (MSG_MISSED_OPTIMIZATION, " and ");
+	  dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM, nunits_vectype);
+	  dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
+	}
+      return false;
+    }
+
+  if (dump_enabled_p ())
+    {
+      dump_printf_loc (MSG_NOTE, vect_location, "vectype: ");
+      dump_generic_expr (MSG_NOTE, TDF_SLIM, nunits_vectype);
+      dump_printf (MSG_NOTE, "\n");
+
+      dump_printf_loc (MSG_NOTE, vect_location, "nunits = ");
+      dump_dec (MSG_NOTE, TYPE_VECTOR_SUBPARTS (nunits_vectype));
+      dump_printf (MSG_NOTE, "\n");
+    }
+
+  *nunits_vectype_out = nunits_vectype;
+  return true;
+}
+
+/* Try to determine the correct vector type for STMT_INFO, which is a
+   statement that produces a scalar boolean result.  Return the vector
+   type on success, otherwise return NULL_TREE.  */
+
+tree
+vect_get_mask_type_for_stmt (stmt_vec_info stmt_info)
+{
+  gimple *stmt = stmt_info->stmt;
+  tree mask_type = NULL;
+  tree vectype, scalar_type;
+
+  if (is_gimple_assign (stmt)
+      && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == tcc_comparison
+      && !VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (stmt))))
+    {
+      scalar_type = TREE_TYPE (gimple_assign_rhs1 (stmt));
+      mask_type = get_mask_type_for_scalar_type (scalar_type);
+
+      if (!mask_type)
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			     "not vectorized: unsupported mask\n");
+	  return NULL_TREE;
+	}
+    }
+  else
+    {
+      tree rhs;
+      ssa_op_iter iter;
+      gimple *def_stmt;
+      enum vect_def_type dt;
+
+      FOR_EACH_SSA_TREE_OPERAND (rhs, stmt, iter, SSA_OP_USE)
+	{
+	  if (!vect_is_simple_use (rhs, stmt_info->vinfo,
+				   &def_stmt, &dt, &vectype))
+	    {
+	      if (dump_enabled_p ())
+		{
+		  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				   "not vectorized: can't compute mask type "
+				   "for statement, ");
+		  dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, stmt,
+				    0);
+		}
+	      return NULL_TREE;
+	    }
+
+	  /* No vectype probably means external definition.
+	     Allow it in case there is another operand which
+	     allows to determine mask type.  */
+	  if (!vectype)
+	    continue;
+
+	  if (!mask_type)
+	    mask_type = vectype;
+	  else if (maybe_ne (TYPE_VECTOR_SUBPARTS (mask_type),
+			     TYPE_VECTOR_SUBPARTS (vectype)))
+	    {
+	      if (dump_enabled_p ())
+		{
+		  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				   "not vectorized: different sized masks "
+				   "types in statement, ");
+		  dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
+				     mask_type);
+		  dump_printf (MSG_MISSED_OPTIMIZATION, " and ");
+		  dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
+				     vectype);
+		  dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
+		}
+	      return NULL_TREE;
+	    }
+	  else if (VECTOR_BOOLEAN_TYPE_P (mask_type)
+		   != VECTOR_BOOLEAN_TYPE_P (vectype))
+	    {
+	      if (dump_enabled_p ())
+		{
+		  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				   "not vectorized: mixed mask and "
+				   "nonmask vector types in statement, ");
+		  dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
+				     mask_type);
+		  dump_printf (MSG_MISSED_OPTIMIZATION, " and ");
+		  dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,
+				     vectype);
+		  dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
+		}
+	      return NULL_TREE;
+	    }
+	}
+
+      /* We may compare boolean value loaded as vector of integers.
+	 Fix mask_type in such case.  */
+      if (mask_type
+	  && !VECTOR_BOOLEAN_TYPE_P (mask_type)
+	  && gimple_code (stmt) == GIMPLE_ASSIGN
+	  && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == tcc_comparison)
+	mask_type = build_same_sized_truth_vector_type (mask_type);
+    }
+
+  /* No mask_type should mean loop invariant predicate.
+     This is probably a subject for optimization in if-conversion.  */
+  if (!mask_type && dump_enabled_p ())
+    {
+      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+		       "not vectorized: can't compute mask type "
+		       "for statement, ");
+      dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, stmt, 0);
+    }
+  return mask_type;
+}
Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_10.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/vcond_10.c	2018-05-10 07:18:12.317505726 +0100
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */
+
+#include <stdint.h>
+
+#define DEF_LOOP(TYPE)							\
+  void __attribute__ ((noinline, noclone))				\
+  test_##TYPE (TYPE *a, TYPE a1, TYPE a2, TYPE a3, TYPE a4, int n)	\
+  {									\
+    for (int i = 0; i < n; i += 2)					\
+      {									\
+	a[i] = a[i] >= 1 && a[i] != 3 ? a1 : a2;			\
+	a[i + 1] = a[i + 1] >= 1 && a[i + 1] != 3 ? a3 : a4;		\
+      }									\
+  }
+
+#define FOR_EACH_TYPE(T) \
+  T (int8_t) \
+  T (uint8_t) \
+  T (int16_t) \
+  T (uint16_t) \
+  T (int32_t) \
+  T (uint32_t) \
+  T (int64_t) \
+  T (uint64_t) \
+  T (_Float16) \
+  T (float) \
+  T (double)
+
+FOR_EACH_TYPE (DEF_LOOP)
+
+/* { dg-final { scan-assembler-times {\tld1b\t} 2 } } */
+/* { dg-final { scan-assembler-times {\tld1h\t} 3 } } */
+/* { dg-final { scan-assembler-times {\tld1w\t} 3 } } */
+/* { dg-final { scan-assembler-times {\tld1d\t} 3 } } */
+/* { dg-final { scan-assembler-times {\tsel\tz[0-9]} 11 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_10_run.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/vcond_10_run.c	2018-05-10 07:18:12.317505726 +0100
@@ -0,0 +1,24 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */
+
+#include "vcond_10.c"
+
+#define N 133
+
+#define TEST_LOOP(TYPE)							\
+  {									\
+    TYPE a[N];								\
+    for (int i = 0; i < N; ++i)						\
+      a[i] = i % 7;							\
+    test_##TYPE (a, 10, 11, 12, 13, N);					\
+    for (int i = 0; i < N; ++i)						\
+      if (a[i] != 10 + (i & 1) * 2 + (i % 7 == 0 || i % 7 == 3))	\
+	__builtin_abort ();						\
+  }
+
+int
+main (void)
+{
+  FOR_EACH_TYPE (TEST_LOOP);
+  return 0;
+}
Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_11.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/vcond_11.c	2018-05-10 07:18:12.317505726 +0100
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */
+
+#include <stdint.h>
+
+#define DEF_LOOP(TYPE)							\
+  void __attribute__ ((noinline, noclone))				\
+  test_##TYPE (int *restrict a, TYPE *restrict b, int a1, int a2,	\
+	       int a3, int a4, int n)					\
+  {									\
+    for (int i = 0; i < n; i += 2)					\
+      {									\
+	a[i] = a[i] >= 1 & b[i] != 3 ? a1 : a2;				\
+	a[i + 1] = a[i + 1] >= 1 & b[i + 1] != 3 ? a3 : a4;		\
+      }									\
+  }
+
+#define FOR_EACH_TYPE(T) \
+  T (int8_t) \
+  T (uint8_t) \
+  T (int16_t) \
+  T (uint16_t) \
+  T (int64_t) \
+  T (uint64_t) \
+  T (double)
+
+FOR_EACH_TYPE (DEF_LOOP)
+
+/* { dg-final { scan-assembler-times {\tld1b\t} 2 } } */
+/* { dg-final { scan-assembler-times {\tld1h\t} 2 } } */
+/* 4 for each 8-bit function, 2 for each 16-bit function, 1 for
+   each 64-bit function.  */
+/* { dg-final { scan-assembler-times {\tld1w\t} 15 } } */
+/* 3 64-bit functions * 2 64-bit vectors per 32-bit vector.  */
+/* { dg-final { scan-assembler-times {\tld1d\t} 6 } } */
+/* { dg-final { scan-assembler-times {\tsel\tz[0-9]} 15 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_11_run.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/vcond_11_run.c	2018-05-10 07:18:12.317505726 +0100
@@ -0,0 +1,28 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */
+
+#include "vcond_11.c"
+
+#define N 133
+
+#define TEST_LOOP(TYPE)							\
+  {									\
+    int a[N];								\
+    TYPE b[N];								\
+    for (int i = 0; i < N; ++i)						\
+      {									\
+	a[i] = i % 5;							\
+	b[i] = i % 7;							\
+      }									\
+    test_##TYPE (a, b, 10, 11, 12, 13, N);				\
+    for (int i = 0; i < N; ++i)						\
+      if (a[i] != 10 + (i & 1) * 2 + (i % 5 == 0 || i % 7 == 3))	\
+	__builtin_abort ();						\
+  }
+
+int
+main (void)
+{
+  FOR_EACH_TYPE (TEST_LOOP);
+  return 0;
+}
Richard Biener May 15, 2018, 10:03 a.m. | #5
On Thu, May 10, 2018 at 8:31 AM Richard Sandiford <
richard.sandiford@linaro.org> wrote:

> Richard Biener <richard.guenther@gmail.com> writes:

> > On Wed, May 9, 2018 at 1:29 PM, Richard Sandiford

> > <richard.sandiford@linaro.org> wrote:

> >> Richard Biener <richard.guenther@gmail.com> writes:

> >>> On Wed, May 9, 2018 at 12:34 PM, Richard Sandiford

> >>> <richard.sandiford@linaro.org> wrote:

> >>>> The SLP unrolling factor is calculated by finding the smallest

> >>>> scalar type for each SLP statement and taking the number of required

> >>>> lanes from the vector versions of those scalar types.  E.g. for an

> >>>> int32->int64 conversion, it's the vector of int32s rather than the

> >>>> vector of int64s that determines the unroll factor.

> >>>>

> >>>> We rely on tree-vect-patterns.c to replace boolean operations like:

> >>>>

> >>>>    bool a, b, c;

> >>>>    a = b & c;

> >>>>

> >>>> with integer operations of whatever the best size is in context.

> >>>> E.g. if b and c are fed by comparisons of ints, a, b and c will

become
> >>>> the appropriate size for an int comparison.  For most targets this

means
> >>>> that a, b and c will end up as int-sized themselves, but on targets

like
> >>>> SVE and AVX512 with packed vector booleans, they'll instead become a

> >>>> small bitfield like :1, padded to a byte for memory purposes.

> >>>> The SLP code would then take these scalar types and try to calculate

> >>>> the vector type for them, causing the unroll factor to be much higher

> >>>> than necessary.

> >>>>

> >>>> This patch makes SLP use the cached vector boolean type if that's

> >>>> appropriate.  Tested on aarch64-linux-gnu (with and without SVE),

> >>>> aarch64_be-none-elf and x86_64-linux-gnu.  OK to install?

> >>>>

> >>>> Richard

> >>>>

> >>>>

> >>>> 2018-05-09  Richard Sandiford  <richard.sandiford@linaro.org>

> >>>>

> >>>> gcc/

> >>>>         * tree-vect-slp.c (get_vectype_for_smallest_scalar_type):

New function.
> >>>>         (vect_build_slp_tree_1): Use it when calculating the unroll

factor.
> >>>>

> >>>> gcc/testsuite/

> >>>>         * gcc.target/aarch64/sve/vcond_10.c: New test.

> >>>>         * gcc.target/aarch64/sve/vcond_10_run.c: Likewise.

> >>>>         * gcc.target/aarch64/sve/vcond_11.c: Likewise.

> >>>>         * gcc.target/aarch64/sve/vcond_11_run.c: Likewise.

> >>>>

> >>>> Index: gcc/tree-vect-slp.c

> >>>> ===================================================================

> >>>> --- gcc/tree-vect-slp.c 2018-05-08 09:42:03.526648115 +0100

> >>>> +++ gcc/tree-vect-slp.c 2018-05-09 11:30:41.061096063 +0100

> >>>> @@ -608,6 +608,41 @@ vect_record_max_nunits (vec_info *vinfo,

> >>>>    return true;

> >>>>  }

> >>>>

> >>>> +/* Return the vector type associated with the smallest scalar type

in STMT.  */
> >>>> +

> >>>> +static tree

> >>>> +get_vectype_for_smallest_scalar_type (gimple *stmt)

> >>>> +{

> >>>> +  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);

> >>>> +  tree vectype = STMT_VINFO_VECTYPE (stmt_info);

> >>>> +  if (vectype != NULL_TREE

> >>>> +      && VECTOR_BOOLEAN_TYPE_P (vectype))

> >>>

> >>> Hum.  At this point you can't really rely on vector types being set...

> >>

> >> Not for everything, but here we only care about the result of the

> >> pattern replacements, and pattern replacements do set the vector type

> >> up-front.  vect_determine_vectorization_factor (which runs earlier

> >> for loop vectorisation) also relies on this.

> >>

> >>>> +    {

> >>>> +      /* The result of a vector boolean operation has the smallest

scalar
> >>>> +        type unless the statement is extending an even narrower

boolean.  */
> >>>> +      if (!gimple_assign_cast_p (stmt))

> >>>> +       return vectype;

> >>>> +

> >>>> +      tree src = gimple_assign_rhs1 (stmt);

> >>>> +      gimple *def_stmt;

> >>>> +      enum vect_def_type dt;

> >>>> +      tree src_vectype = NULL_TREE;

> >>>> +      if (vect_is_simple_use (src, stmt_info->vinfo, &def_stmt, &dt,

> >>>> +                             &src_vectype)

> >>>> +         && src_vectype

> >>>> +         && VECTOR_BOOLEAN_TYPE_P (src_vectype))

> >>>> +       {

> >>>> +         if (TYPE_PRECISION (TREE_TYPE (src_vectype))

> >>>> +             < TYPE_PRECISION (TREE_TYPE (vectype)))

> >>>> +           return src_vectype;

> >>>> +         return vectype;

> >>>> +       }

> >>>> +    }

> >>>> +  HOST_WIDE_INT dummy;

> >>>> +  tree scalar_type = vect_get_smallest_scalar_type (stmt, &dummy,

&dummy);
> >>>> +  return get_vectype_for_scalar_type (scalar_type);

> >>>> +}

> >>>> +

> >>>>  /* Verify if the scalar stmts STMTS are isomorphic, require data

> >>>>     permutation or are of unsupported types of operation.  Return

> >>>>     true if they are, otherwise return false and indicate in *MATCHES

> >>>> @@ -636,12 +671,11 @@ vect_build_slp_tree_1 (vec_info *vinfo,

> >>>>    enum tree_code first_cond_code = ERROR_MARK;

> >>>>    tree lhs;

> >>>>    bool need_same_oprnds = false;

> >>>> -  tree vectype = NULL_TREE, scalar_type, first_op1 = NULL_TREE;

> >>>> +  tree vectype = NULL_TREE, first_op1 = NULL_TREE;

> >>>>    optab optab;

> >>>>    int icode;

> >>>>    machine_mode optab_op2_mode;

> >>>>    machine_mode vec_mode;

> >>>> -  HOST_WIDE_INT dummy;

> >>>>    gimple *first_load = NULL, *prev_first_load = NULL;

> >>>>

> >>>>    /* For every stmt in NODE find its def stmt/s.  */

> >>>> @@ -685,15 +719,14 @@ vect_build_slp_tree_1 (vec_info *vinfo,

> >>>>           return false;

> >>>>         }

> >>>>

> >>>> -      scalar_type = vect_get_smallest_scalar_type (stmt, &dummy,

&dummy);
> >>>

> >>> ... so I wonder how this goes wrong here.

> >>

> >> It picks the right scalar type, but then we go on to use

> >> get_vectype_for_scalar_type when get_mask_type_for_scalar_type

> >> is what we actually want.  The easiest fix for that seemed to use

> >> the vectype that had already been calculated (also as for

> >> vect_determine_vectorization_factor).

> >>

> >>> I suppose we want to ignore vector booleans for the purpose of

max_nunits
> >>> computation.  So isn't a better fix to simply "ignore" those in

> >>> vect_get_smallest_scalar_type instead?  I see that for intermediate

> >>> full-boolean operations like

> >>>

> >>>   a = x[i] < 0;

> >>>   b = y[i] > 0;

> >>>   tem = a & b;

> >>>

> >>> we want to ignore 'tem = a & b' fully here for the purpose of

> >>> vect_record_max_nunits.  So if scalar_type is a bitfield type

> >>> then skip it?

> >>

> >> Bitfield types will always be the smallest scalar type if they're

> >> present, so I think in pathological cases this could make us

> >> incorrectly ignore source operands to a compare.

> >>

> >> If we're confident that compares and casts of

VECT_SCALAR_BOOLEAN_TYPE_Ps
> >> never affect the VF or UF then we should probably skip them based on

> >> that rather than whether the scalar type is a bitfield, so that the

> >> behaviour is the same for all targets.  It seems a bit dangerous

though...
> >

> > Well, all stmts that have no inherent promotion / demotion have no

> > effect on the VF

> > if you also have loads / stores.

> >

> > One reason I dislike the current way of computing vector types and

vectorization
> > factor is that it tries to do that ad-hoc from looking at stmts

> > locally instead of

> > somehow propagating things from sources to sinks -- which would be a

requirement
> > if we ever drop the requirement of same-sized vector types throughout

> > vectorization...


> Yeah.  This patch was just supposed to be a point improvement rather

> than perfection.


> > In fact I wonder if we can get away with recording max_nunits here and

delay
> > SLP_INSTANCE_UNROLLING_FACTOR computation until we compute the actual

vector
> > types.  I think the code is most useful for BB vectorization where we

> > need to terminate

> > the SLP when we get to stmts we cannot handle without "unrolling"

> > (given the vector

> > size constraint).


> Part of the problem is that vect_build_slp_tree_1 also uses the vector

> type to choose between shifts by vectors and shifts by scalars, and to

> test whether two-operand permutes are valid.  So as things stand I think

> we do need to know the vector type at some level here, even though those

> two cases aren't interesting for booleans.


> > Anyhow - I probably dislike your patch most because you add another

> > get_vectype_for_smallest_scalar_type helper which looks like a hack to

me...
> >

> > How is this issue solved for the non-SLP case?  I do remember that

function
> > computing the VF and/or vector types is quite a mess with vector

booleans...

> OK, for the purposes of fixing this bug, would it be OK to split out

> the code in vect_determine_vectorization_factor that computes the

> vector types and reuse it in SLP, even though I don't think either

> of us like the way it's done?  At least that way there's only one

> place to change in future.


> This patch does that.  I tweaked a couple of the comments and

> added a couple more dump lines, but otherwise the code in

> vect_get_vector_types_for_stmt and vect_get_mask_type_for_stmt

> is the same as the original.


> Tested as before.


Much better - thanks for doing it.  OK for trunk and sorry again for the
delay...

Richard.

> Thanks,

> Richard



> 2018-05-10  Richard Sandiford  <richard.sandiford@linaro.org>


> gcc/

>          * tree-vectorizer.h (vect_get_vector_types_for_stmt): Declare.

>          (vect_get_mask_type_for_stmt): Likewise.

>          * tree-vect-slp.c (vect_two_operations_perm_ok_p): New function,

>          split out from...

>          (vect_build_slp_tree_1): ...here.  Use

vect_get_vector_types_for_stmt
>          to determine the statement's vector type and the vector type that

>          should be used for calculating nunits.  Deal with cases in which

>          the type has to be deferred.

>          (vect_slp_analyze_node_operations): Use

vect_get_vector_types_for_stmt
>          and vect_get_mask_type_for_stmt to calculate STMT_VINFO_VECTYPE.

>          * tree-vect-loop.c (vect_determine_vf_for_stmt_1)

>          (vect_determine_vf_for_stmt): New functions, split out from...

>          (vect_determine_vectorization_factor): ...here.

>          * tree-vect-stmts.c (vect_get_vector_types_for_stmt)

>          (vect_get_mask_type_for_stmt): New functions, split out from

>          vect_determine_vectorization_factor.


> gcc/testsuite/

>          * gcc.target/aarch64/sve/vcond_10.c: New test.

>          * gcc.target/aarch64/sve/vcond_10_run.c: Likewise.

>          * gcc.target/aarch64/sve/vcond_11.c: Likewise.

>          * gcc.target/aarch64/sve/vcond_11_run.c: Likewise.


> Index: gcc/tree-vectorizer.h

> ===================================================================

> --- gcc/tree-vectorizer.h       2018-05-10 07:18:12.104514856 +0100

> +++ gcc/tree-vectorizer.h       2018-05-10 07:18:12.322505512 +0100

> @@ -1467,6 +1467,8 @@ extern tree vect_gen_perm_mask_checked (

>   extern void optimize_mask_stores (struct loop*);

>   extern gcall *vect_gen_while (tree, tree, tree);

>   extern tree vect_gen_while_not (gimple_seq *, tree, tree, tree);

> +extern bool vect_get_vector_types_for_stmt (stmt_vec_info, tree *, tree

*);
> +extern tree vect_get_mask_type_for_stmt (stmt_vec_info);


>   /* In tree-vect-data-refs.c.  */

>   extern bool vect_can_force_dr_alignment_p (const_tree, unsigned int);

> Index: gcc/tree-vect-slp.c

> ===================================================================

> --- gcc/tree-vect-slp.c 2018-05-10 07:18:12.104514856 +0100

> +++ gcc/tree-vect-slp.c 2018-05-10 07:18:12.321505555 +0100

> @@ -608,6 +608,33 @@ vect_record_max_nunits (vec_info *vinfo,

>     return true;

>   }


> +/* STMTS is a group of GROUP_SIZE SLP statements in which some

> +   statements do the same operation as the first statement and in which

> +   the others do ALT_STMT_CODE.  Return true if we can take one vector

> +   of the first operation and one vector of the second and permute them

> +   to get the required result.  VECTYPE is the type of the vector that

> +   would be permuted.  */

> +

> +static bool

> +vect_two_operations_perm_ok_p (vec<gimple *> stmts, unsigned int

group_size,
> +                              tree vectype, tree_code alt_stmt_code)

> +{

> +  unsigned HOST_WIDE_INT count;

> +  if (!TYPE_VECTOR_SUBPARTS (vectype).is_constant (&count))

> +    return false;

> +

> +  vec_perm_builder sel (count, count, 1);

> +  for (unsigned int i = 0; i < count; ++i)

> +    {

> +      unsigned int elt = i;

> +      if (gimple_assign_rhs_code (stmts[i % group_size]) ==

alt_stmt_code)
> +       elt += count;

> +      sel.quick_push (elt);

> +    }

> +  vec_perm_indices indices (sel, 2, count);

> +  return can_vec_perm_const_p (TYPE_MODE (vectype), indices);

> +}

> +

>   /* Verify if the scalar stmts STMTS are isomorphic, require data

>      permutation or are of unsupported types of operation.  Return

>      true if they are, otherwise return false and indicate in *MATCHES

> @@ -636,17 +663,17 @@ vect_build_slp_tree_1 (vec_info *vinfo,

>     enum tree_code first_cond_code = ERROR_MARK;

>     tree lhs;

>     bool need_same_oprnds = false;

> -  tree vectype = NULL_TREE, scalar_type, first_op1 = NULL_TREE;

> +  tree vectype = NULL_TREE, first_op1 = NULL_TREE;

>     optab optab;

>     int icode;

>     machine_mode optab_op2_mode;

>     machine_mode vec_mode;

> -  HOST_WIDE_INT dummy;

>     gimple *first_load = NULL, *prev_first_load = NULL;


>     /* For every stmt in NODE find its def stmt/s.  */

>     FOR_EACH_VEC_ELT (stmts, i, stmt)

>       {

> +      stmt_vec_info stmt_info = vinfo_for_stmt (stmt);

>         swap[i] = 0;

>         matches[i] = false;


> @@ -685,15 +712,19 @@ vect_build_slp_tree_1 (vec_info *vinfo,

>            return false;

>          }


> -      scalar_type = vect_get_smallest_scalar_type (stmt, &dummy, &dummy);

> -      vectype = get_vectype_for_scalar_type (scalar_type);

> -      if (!vect_record_max_nunits (vinfo, stmt, group_size, vectype,

> -                                  max_nunits))

> +      tree nunits_vectype;

> +      if (!vect_get_vector_types_for_stmt (stmt_info, &vectype,

> +                                          &nunits_vectype)

> +         || (nunits_vectype

> +             && !vect_record_max_nunits (vinfo, stmt, group_size,

> +                                         nunits_vectype, max_nunits)))

>          {

>            /* Fatal mismatch.  */

>            matches[0] = false;

> -          return false;

> -        }

> +         return false;

> +       }

> +

> +      gcc_assert (vectype);


>         if (gcall *call_stmt = dyn_cast <gcall *> (stmt))

>          {

> @@ -730,6 +761,17 @@ vect_build_slp_tree_1 (vec_info *vinfo,

>                || rhs_code == LROTATE_EXPR

>                || rhs_code == RROTATE_EXPR)

>              {

> +             if (vectype == boolean_type_node)

> +               {

> +                 if (dump_enabled_p ())

> +                   dump_printf_loc (MSG_MISSED_OPTIMIZATION,

vect_location,
> +                                    "Build SLP failed: shift of a"

> +                                    " boolean.\n");

> +                 /* Fatal mismatch.  */

> +                 matches[0] = false;

> +                 return false;

> +               }

> +

>                vec_mode = TYPE_MODE (vectype);


>                /* First see if we have a vector/vector shift.  */

> @@ -973,29 +1015,12 @@ vect_build_slp_tree_1 (vec_info *vinfo,


>     /* If we allowed a two-operation SLP node verify the target can cope

>        with the permute we are going to use.  */

> -  poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);

>     if (alt_stmt_code != ERROR_MARK

>         && TREE_CODE_CLASS (alt_stmt_code) != tcc_reference)

>       {

> -      unsigned HOST_WIDE_INT count;

> -      if (!nunits.is_constant (&count))

> -       {

> -         if (dump_enabled_p ())

> -           dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,

> -                            "Build SLP failed: different operations "

> -                            "not allowed with variable-length SLP.\n");

> -         return false;

> -       }

> -      vec_perm_builder sel (count, count, 1);

> -      for (i = 0; i < count; ++i)

> -       {

> -         unsigned int elt = i;

> -         if (gimple_assign_rhs_code (stmts[i % group_size]) ==

alt_stmt_code)
> -           elt += count;

> -         sel.quick_push (elt);

> -       }

> -      vec_perm_indices indices (sel, 2, count);

> -      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))

> +      if (vectype == boolean_type_node

> +         || !vect_two_operations_perm_ok_p (stmts, group_size,

> +                                            vectype, alt_stmt_code))

>          {

>            for (i = 0; i < group_size; ++i)

>              if (gimple_assign_rhs_code (stmts[i]) == alt_stmt_code)

> @@ -2759,36 +2784,18 @@ vect_slp_analyze_node_operations (vec_in

>     if (bb_vinfo

>         && ! STMT_VINFO_DATA_REF (stmt_info))

>       {

> -      gcc_assert (PURE_SLP_STMT (stmt_info));

> -

> -      tree scalar_type = TREE_TYPE (gimple_get_lhs (stmt));

> -      if (dump_enabled_p ())

> -       {

> -         dump_printf_loc (MSG_NOTE, vect_location,

> -                          "get vectype for scalar type:  ");

> -         dump_generic_expr (MSG_NOTE, TDF_SLIM, scalar_type);

> -         dump_printf (MSG_NOTE, "\n");

> -       }

> -

> -      tree vectype = get_vectype_for_scalar_type (scalar_type);

> -      if (!vectype)

> -       {

> -         if (dump_enabled_p ())

> -           {

> -             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,

> -                              "not SLPed: unsupported data-type ");

> -             dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,

> -                                scalar_type);

> -             dump_printf (MSG_MISSED_OPTIMIZATION, "\n");

> -           }

> -         return false;

> -       }

> -

> -      if (dump_enabled_p ())

> -       {

> -         dump_printf_loc (MSG_NOTE, vect_location, "vectype:  ");

> -         dump_generic_expr (MSG_NOTE, TDF_SLIM, vectype);

> -         dump_printf (MSG_NOTE, "\n");

> +      tree vectype, nunits_vectype;

> +      if (!vect_get_vector_types_for_stmt (stmt_info, &vectype,

> +                                          &nunits_vectype))

> +       /* We checked this when building the node.  */

> +       gcc_unreachable ();

> +      if (vectype == boolean_type_node)

> +       {

> +         vectype = vect_get_mask_type_for_stmt (stmt_info);

> +         if (!vectype)

> +           /* vect_get_mask_type_for_stmt has already explained the

> +              failure.  */

> +           return false;

>          }


>         gimple *sstmt;

> Index: gcc/tree-vect-loop.c

> ===================================================================

> --- gcc/tree-vect-loop.c        2018-05-10 07:18:12.104514856 +0100

> +++ gcc/tree-vect-loop.c        2018-05-10 07:18:12.320505598 +0100

> @@ -155,6 +155,108 @@ Software Foundation; either version 3, o


>   static void vect_estimate_min_profitable_iters (loop_vec_info, int *,

int *);

> +/* Subroutine of vect_determine_vf_for_stmt that handles only one

> +   statement.  VECTYPE_MAYBE_SET_P is true if STMT_VINFO_VECTYPE

> +   may already be set for general statements (not just data refs).  */

> +

> +static bool

> +vect_determine_vf_for_stmt_1 (stmt_vec_info stmt_info,

> +                             bool vectype_maybe_set_p,

> +                             poly_uint64 *vf,

> +                             vec<stmt_vec_info > *mask_producers)

> +{

> +  gimple *stmt = stmt_info->stmt;

> +

> +  if ((!STMT_VINFO_RELEVANT_P (stmt_info)

> +       && !STMT_VINFO_LIVE_P (stmt_info))

> +      || gimple_clobber_p (stmt))

> +    {

> +      if (dump_enabled_p ())

> +       dump_printf_loc (MSG_NOTE, vect_location, "skip.\n");

> +      return true;

> +    }

> +

> +  tree stmt_vectype, nunits_vectype;

> +  if (!vect_get_vector_types_for_stmt (stmt_info, &stmt_vectype,

> +                                      &nunits_vectype))

> +    return false;

> +

> +  if (stmt_vectype)

> +    {

> +      if (STMT_VINFO_VECTYPE (stmt_info))

> +       /* The only case when a vectype had been already set is for stmts

> +          that contain a data ref, or for "pattern-stmts" (stmts

generated
> +          by the vectorizer to represent/replace a certain idiom).  */

> +       gcc_assert ((STMT_VINFO_DATA_REF (stmt_info)

> +                    || vectype_maybe_set_p)

> +                   && STMT_VINFO_VECTYPE (stmt_info) == stmt_vectype);

> +      else if (stmt_vectype == boolean_type_node)

> +       mask_producers->safe_push (stmt_info);

> +      else

> +       STMT_VINFO_VECTYPE (stmt_info) = stmt_vectype;

> +    }

> +

> +  if (nunits_vectype)

> +    vect_update_max_nunits (vf, nunits_vectype);

> +

> +  return true;

> +}

> +

> +/* Subroutine of vect_determine_vectorization_factor.  Set the vector

> +   types of STMT_INFO and all attached pattern statements and update

> +   the vectorization factor VF accordingly.  If some of the statements

> +   produce a mask result whose vector type can only be calculated later,

> +   add them to MASK_PRODUCERS.  Return true on success or false if

> +   something prevented vectorization.  */

> +

> +static bool

> +vect_determine_vf_for_stmt (stmt_vec_info stmt_info, poly_uint64 *vf,

> +                           vec<stmt_vec_info > *mask_producers)

> +{

> +  if (dump_enabled_p ())

> +    {

> +      dump_printf_loc (MSG_NOTE, vect_location, "==> examining

statement: ");
> +      dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt_info->stmt, 0);

> +    }

> +  if (!vect_determine_vf_for_stmt_1 (stmt_info, false, vf,

mask_producers))
> +    return false;

> +

> +  if (STMT_VINFO_IN_PATTERN_P (stmt_info)

> +      && STMT_VINFO_RELATED_STMT (stmt_info))

> +    {

> +      stmt_info = vinfo_for_stmt (STMT_VINFO_RELATED_STMT (stmt_info));

> +

> +      /* If a pattern statement has def stmts, analyze them too.  */

> +      gimple *pattern_def_seq = STMT_VINFO_PATTERN_DEF_SEQ (stmt_info);

> +      for (gimple_stmt_iterator si = gsi_start (pattern_def_seq);

> +          !gsi_end_p (si); gsi_next (&si))

> +       {

> +         stmt_vec_info def_stmt_info = vinfo_for_stmt (gsi_stmt (si));

> +         if (dump_enabled_p ())

> +           {

> +             dump_printf_loc (MSG_NOTE, vect_location,

> +                              "==> examining pattern def stmt: ");

> +             dump_gimple_stmt (MSG_NOTE, TDF_SLIM,

> +                               def_stmt_info->stmt, 0);

> +           }

> +         if (!vect_determine_vf_for_stmt_1 (def_stmt_info, true,

> +                                            vf, mask_producers))

> +           return false;

> +       }

> +

> +      if (dump_enabled_p ())

> +       {

> +         dump_printf_loc (MSG_NOTE, vect_location,

> +                          "==> examining pattern statement: ");

> +         dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt_info->stmt, 0);

> +       }

> +      if (!vect_determine_vf_for_stmt_1 (stmt_info, true, vf,

mask_producers))
> +       return false;

> +    }

> +

> +  return true;

> +}

> +

>   /* Function vect_determine_vectorization_factor


>      Determine the vectorization factor (VF).  VF is the number of data

elements
> @@ -192,12 +294,6 @@ vect_determine_vectorization_factor (loo

>     tree vectype;

>     stmt_vec_info stmt_info;

>     unsigned i;

> -  HOST_WIDE_INT dummy;

> -  gimple *stmt, *pattern_stmt = NULL;

> -  gimple_seq pattern_def_seq = NULL;

> -  gimple_stmt_iterator pattern_def_si = gsi_none ();

> -  bool analyze_pattern_stmt = false;

> -  bool bool_result;

>     auto_vec<stmt_vec_info> mask_producers;


>     if (dump_enabled_p ())

> @@ -269,304 +365,13 @@ vect_determine_vectorization_factor (loo

>              }

>          }


> -      for (gimple_stmt_iterator si = gsi_start_bb (bb);

> -          !gsi_end_p (si) || analyze_pattern_stmt;)

> -        {

> -          tree vf_vectype;

> -

> -          if (analyze_pattern_stmt)

> -           stmt = pattern_stmt;

> -          else

> -            stmt = gsi_stmt (si);

> -

> -          stmt_info = vinfo_for_stmt (stmt);

> -

> -         if (dump_enabled_p ())

> -           {

> -             dump_printf_loc (MSG_NOTE, vect_location,

> -                               "==> examining statement: ");

> -             dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0);

> -           }

> -

> -         gcc_assert (stmt_info);

> -

> -         /* Skip stmts which do not need to be vectorized.  */

> -         if ((!STMT_VINFO_RELEVANT_P (stmt_info)

> -              && !STMT_VINFO_LIVE_P (stmt_info))

> -             || gimple_clobber_p (stmt))

> -            {

> -              if (STMT_VINFO_IN_PATTERN_P (stmt_info)

> -                  && (pattern_stmt = STMT_VINFO_RELATED_STMT (stmt_info))

> -                  && (STMT_VINFO_RELEVANT_P (vinfo_for_stmt

(pattern_stmt))
> -                      || STMT_VINFO_LIVE_P (vinfo_for_stmt

(pattern_stmt))))
> -                {

> -                  stmt = pattern_stmt;

> -                  stmt_info = vinfo_for_stmt (pattern_stmt);

> -                  if (dump_enabled_p ())

> -                    {

> -                      dump_printf_loc (MSG_NOTE, vect_location,

> -                                       "==> examining pattern statement:

");
> -                      dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0);

> -                    }

> -                }

> -              else

> -               {

> -                 if (dump_enabled_p ())

> -                   dump_printf_loc (MSG_NOTE, vect_location, "skip.\n");

> -                  gsi_next (&si);

> -                 continue;

> -                }

> -           }

> -          else if (STMT_VINFO_IN_PATTERN_P (stmt_info)

> -                   && (pattern_stmt = STMT_VINFO_RELATED_STMT

(stmt_info))
> -                   && (STMT_VINFO_RELEVANT_P (vinfo_for_stmt

(pattern_stmt))
> -                       || STMT_VINFO_LIVE_P (vinfo_for_stmt

(pattern_stmt))))
> -            analyze_pattern_stmt = true;

> -

> -         /* If a pattern statement has def stmts, analyze them too.  */

> -         if (is_pattern_stmt_p (stmt_info))

> -           {

> -             if (pattern_def_seq == NULL)

> -               {

> -                 pattern_def_seq = STMT_VINFO_PATTERN_DEF_SEQ

(stmt_info);
> -                 pattern_def_si = gsi_start (pattern_def_seq);

> -               }

> -             else if (!gsi_end_p (pattern_def_si))

> -               gsi_next (&pattern_def_si);

> -             if (pattern_def_seq != NULL)

> -               {

> -                 gimple *pattern_def_stmt = NULL;

> -                 stmt_vec_info pattern_def_stmt_info = NULL;

> -

> -                 while (!gsi_end_p (pattern_def_si))

> -                   {

> -                     pattern_def_stmt = gsi_stmt (pattern_def_si);

> -                     pattern_def_stmt_info

> -                       = vinfo_for_stmt (pattern_def_stmt);

> -                     if (STMT_VINFO_RELEVANT_P (pattern_def_stmt_info)

> -                         || STMT_VINFO_LIVE_P (pattern_def_stmt_info))

> -                       break;

> -                     gsi_next (&pattern_def_si);

> -                   }

> -

> -                 if (!gsi_end_p (pattern_def_si))

> -                   {

> -                     if (dump_enabled_p ())

> -                       {

> -                         dump_printf_loc (MSG_NOTE, vect_location,

> -                                           "==> examining pattern def

stmt: ");
> -                         dump_gimple_stmt (MSG_NOTE, TDF_SLIM,

> -                                            pattern_def_stmt, 0);

> -                       }

> -

> -                     stmt = pattern_def_stmt;

> -                     stmt_info = pattern_def_stmt_info;

> -                   }

> -                 else

> -                   {

> -                     pattern_def_si = gsi_none ();

> -                     analyze_pattern_stmt = false;

> -                   }

> -               }

> -             else

> -               analyze_pattern_stmt = false;

> -           }

> -

> -         if (gimple_get_lhs (stmt) == NULL_TREE

> -             /* MASK_STORE has no lhs, but is ok.  */

> -             && (!is_gimple_call (stmt)

> -                 || !gimple_call_internal_p (stmt)

> -                 || gimple_call_internal_fn (stmt) != IFN_MASK_STORE))

> -           {

> -             if (is_gimple_call (stmt))

> -               {

> -                 /* Ignore calls with no lhs.  These must be calls to

> -                    #pragma omp simd functions, and what vectorization

factor
> -                    it really needs can't be determined until

> -                    vectorizable_simd_clone_call.  */

> -                 if (!analyze_pattern_stmt && gsi_end_p (pattern_def_si))

> -                   {

> -                     pattern_def_seq = NULL;

> -                     gsi_next (&si);

> -                   }

> -                 continue;

> -               }

> -             if (dump_enabled_p ())

> -               {

> -                 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,

> -                                   "not vectorized: irregular stmt.");

> -                 dump_gimple_stmt (MSG_MISSED_OPTIMIZATION,  TDF_SLIM,

stmt,
> -                                    0);

> -               }

> -             return false;

> -           }

> -

> -         if (VECTOR_MODE_P (TYPE_MODE (gimple_expr_type (stmt))))

> -           {

> -             if (dump_enabled_p ())

> -               {

> -                 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,

> -                                   "not vectorized: vector stmt in

loop:");
> -                 dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM,

stmt, 0);
> -               }

> -             return false;

> -           }

> -

> -         bool_result = false;

> -

> -         if (STMT_VINFO_VECTYPE (stmt_info))

> -           {

> -             /* The only case when a vectype had been already set is for

stmts
> -                that contain a dataref, or for "pattern-stmts" (stmts

> -                generated by the vectorizer to represent/replace a

certain
> -                idiom).  */

> -             gcc_assert (STMT_VINFO_DATA_REF (stmt_info)

> -                         || is_pattern_stmt_p (stmt_info)

> -                         || !gsi_end_p (pattern_def_si));

> -             vectype = STMT_VINFO_VECTYPE (stmt_info);

> -           }

> -         else

> -           {

> -             gcc_assert (!STMT_VINFO_DATA_REF (stmt_info));

> -             if (gimple_call_internal_p (stmt, IFN_MASK_STORE))

> -               scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));

> -             else

> -               scalar_type = TREE_TYPE (gimple_get_lhs (stmt));

> -

> -             /* Bool ops don't participate in vectorization factor

> -                computation.  For comparison use compared types to

> -                compute a factor.  */

> -             if (VECT_SCALAR_BOOLEAN_TYPE_P (scalar_type)

> -                 && is_gimple_assign (stmt)

> -                 && gimple_assign_rhs_code (stmt) != COND_EXPR)

> -               {

> -                 if (STMT_VINFO_RELEVANT_P (stmt_info)

> -                     || STMT_VINFO_LIVE_P (stmt_info))

> -                   mask_producers.safe_push (stmt_info);

> -                 bool_result = true;

> -

> -                 if (TREE_CODE_CLASS (gimple_assign_rhs_code (stmt))

> -                     == tcc_comparison

> -                     && !VECT_SCALAR_BOOLEAN_TYPE_P

> -                           (TREE_TYPE (gimple_assign_rhs1 (stmt))))

> -                   scalar_type = TREE_TYPE (gimple_assign_rhs1 (stmt));

> -                 else

> -                   {

> -                     if (!analyze_pattern_stmt && gsi_end_p

(pattern_def_si))
> -                       {

> -                         pattern_def_seq = NULL;

> -                         gsi_next (&si);

> -                       }

> -                     continue;

> -                   }

> -               }

> -

> -             if (dump_enabled_p ())

> -               {

> -                 dump_printf_loc (MSG_NOTE, vect_location,

> -                                   "get vectype for scalar type:  ");

> -                 dump_generic_expr (MSG_NOTE, TDF_SLIM, scalar_type);

> -                  dump_printf (MSG_NOTE, "\n");

> -               }

> -             vectype = get_vectype_for_scalar_type (scalar_type);

> -             if (!vectype)

> -               {

> -                 if (dump_enabled_p ())

> -                   {

> -                     dump_printf_loc (MSG_MISSED_OPTIMIZATION,

vect_location,
> -                                       "not vectorized: unsupported "

> -                                       "data-type ");

> -                     dump_generic_expr (MSG_MISSED_OPTIMIZATION,

TDF_SLIM,
> -                                         scalar_type);

> -                      dump_printf (MSG_MISSED_OPTIMIZATION, "\n");

> -                   }

> -                 return false;

> -               }

> -

> -             if (!bool_result)

> -               STMT_VINFO_VECTYPE (stmt_info) = vectype;

> -

> -             if (dump_enabled_p ())

> -               {

> -                 dump_printf_loc (MSG_NOTE, vect_location, "vectype: ");

> -                 dump_generic_expr (MSG_NOTE, TDF_SLIM, vectype);

> -                  dump_printf (MSG_NOTE, "\n");

> -               }

> -            }

> -

> -         /* Don't try to compute VF out scalar types if we stmt

> -            produces boolean vector.  Use result vectype instead.  */

> -         if (VECTOR_BOOLEAN_TYPE_P (vectype))

> -           vf_vectype = vectype;

> -         else

> -           {

> -             /* The vectorization factor is according to the smallest

> -                scalar type (or the largest vector size, but we only

> -                support one vector size per loop).  */

> -             if (!bool_result)

> -               scalar_type = vect_get_smallest_scalar_type (stmt, &dummy,

> -                                                            &dummy);

> -             if (dump_enabled_p ())

> -               {

> -                 dump_printf_loc (MSG_NOTE, vect_location,

> -                                  "get vectype for scalar type:  ");

> -                 dump_generic_expr (MSG_NOTE, TDF_SLIM, scalar_type);

> -                 dump_printf (MSG_NOTE, "\n");

> -               }

> -             vf_vectype = get_vectype_for_scalar_type (scalar_type);

> -           }

> -         if (!vf_vectype)

> -           {

> -             if (dump_enabled_p ())

> -               {

> -                 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,

> -                                   "not vectorized: unsupported

data-type ");
> -                 dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,

> -                                     scalar_type);

> -                  dump_printf (MSG_MISSED_OPTIMIZATION, "\n");

> -               }

> -             return false;

> -           }

> -

> -         if (maybe_ne (GET_MODE_SIZE (TYPE_MODE (vectype)),

> -                       GET_MODE_SIZE (TYPE_MODE (vf_vectype))))

> -           {

> -             if (dump_enabled_p ())

> -               {

> -                 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,

> -                                   "not vectorized: different sized

vector "
> -                                   "types in statement, ");

> -                 dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,

> -                                     vectype);

> -                 dump_printf (MSG_MISSED_OPTIMIZATION, " and ");

> -                 dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,

> -                                     vf_vectype);

> -                  dump_printf (MSG_MISSED_OPTIMIZATION, "\n");

> -               }

> -             return false;

> -           }

> -

> -         if (dump_enabled_p ())

> -           {

> -             dump_printf_loc (MSG_NOTE, vect_location, "vectype: ");

> -             dump_generic_expr (MSG_NOTE, TDF_SLIM, vf_vectype);

> -              dump_printf (MSG_NOTE, "\n");

> -           }

> -

> -         if (dump_enabled_p ())

> -           {

> -             dump_printf_loc (MSG_NOTE, vect_location, "nunits = ");

> -             dump_dec (MSG_NOTE, TYPE_VECTOR_SUBPARTS (vf_vectype));

> -             dump_printf (MSG_NOTE, "\n");

> -           }

> -

> -         vect_update_max_nunits (&vectorization_factor, vf_vectype);

> -

> -         if (!analyze_pattern_stmt && gsi_end_p (pattern_def_si))

> -           {

> -             pattern_def_seq = NULL;

> -             gsi_next (&si);

> -           }

> +      for (gimple_stmt_iterator si = gsi_start_bb (bb); !gsi_end_p (si);

> +          gsi_next (&si))

> +       {

> +         stmt_info = vinfo_for_stmt (gsi_stmt (si));

> +         if (!vect_determine_vf_for_stmt (stmt_info,

&vectorization_factor,
> +                                          &mask_producers))

> +           return false;

>           }

>       }


> @@ -589,119 +394,11 @@ vect_determine_vectorization_factor (loo


>     for (i = 0; i < mask_producers.length (); i++)

>       {

> -      tree mask_type = NULL;

> -

> -      stmt = STMT_VINFO_STMT (mask_producers[i]);

> -

> -      if (is_gimple_assign (stmt)

> -         && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) ==

tcc_comparison
> -         && !VECT_SCALAR_BOOLEAN_TYPE_P

> -                                     (TREE_TYPE (gimple_assign_rhs1

(stmt))))
> -       {

> -         scalar_type = TREE_TYPE (gimple_assign_rhs1 (stmt));

> -         mask_type = get_mask_type_for_scalar_type (scalar_type);

> -

> -         if (!mask_type)

> -           {

> -             if (dump_enabled_p ())

> -               dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,

> -                                "not vectorized: unsupported mask\n");

> -             return false;

> -           }

> -       }

> -      else

> -       {

> -         tree rhs;

> -         ssa_op_iter iter;

> -         gimple *def_stmt;

> -         enum vect_def_type dt;

> -

> -         FOR_EACH_SSA_TREE_OPERAND (rhs, stmt, iter, SSA_OP_USE)

> -           {

> -             if (!vect_is_simple_use (rhs, mask_producers[i]->vinfo,

> -                                      &def_stmt, &dt, &vectype))

> -               {

> -                 if (dump_enabled_p ())

> -                   {

> -                     dump_printf_loc (MSG_MISSED_OPTIMIZATION,

vect_location,
> -                                      "not vectorized: can't compute

mask type "
> -                                      "for statement, ");

> -                     dump_gimple_stmt (MSG_MISSED_OPTIMIZATION,

  TDF_SLIM, stmt,
> -                                       0);

> -                   }

> -                 return false;

> -               }

> -

> -             /* No vectype probably means external definition.

> -                Allow it in case there is another operand which

> -                allows to determine mask type.  */

> -             if (!vectype)

> -               continue;

> -

> -             if (!mask_type)

> -               mask_type = vectype;

> -             else if (maybe_ne (TYPE_VECTOR_SUBPARTS (mask_type),

> -                                TYPE_VECTOR_SUBPARTS (vectype)))

> -               {

> -                 if (dump_enabled_p ())

> -                   {

> -                     dump_printf_loc (MSG_MISSED_OPTIMIZATION,

vect_location,
> -                                      "not vectorized: different sized

masks "
> -                                      "types in statement, ");

> -                     dump_generic_expr (MSG_MISSED_OPTIMIZATION,

TDF_SLIM,
> -                                        mask_type);

> -                     dump_printf (MSG_MISSED_OPTIMIZATION, " and ");

> -                     dump_generic_expr (MSG_MISSED_OPTIMIZATION,

TDF_SLIM,
> -                                        vectype);

> -                     dump_printf (MSG_MISSED_OPTIMIZATION, "\n");

> -                   }

> -                 return false;

> -               }

> -             else if (VECTOR_BOOLEAN_TYPE_P (mask_type)

> -                      != VECTOR_BOOLEAN_TYPE_P (vectype))

> -               {

> -                 if (dump_enabled_p ())

> -                   {

> -                     dump_printf_loc (MSG_MISSED_OPTIMIZATION,

vect_location,
> -                                      "not vectorized: mixed mask and "

> -                                      "nonmask vector types in

statement, ");
> -                     dump_generic_expr (MSG_MISSED_OPTIMIZATION,

TDF_SLIM,
> -                                        mask_type);

> -                     dump_printf (MSG_MISSED_OPTIMIZATION, " and ");

> -                     dump_generic_expr (MSG_MISSED_OPTIMIZATION,

TDF_SLIM,
> -                                        vectype);

> -                     dump_printf (MSG_MISSED_OPTIMIZATION, "\n");

> -                   }

> -                 return false;

> -               }

> -           }

> -

> -         /* We may compare boolean value loaded as vector of integers.

> -            Fix mask_type in such case.  */

> -         if (mask_type

> -             && !VECTOR_BOOLEAN_TYPE_P (mask_type)

> -             && gimple_code (stmt) == GIMPLE_ASSIGN

> -             && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) ==

tcc_comparison)
> -           mask_type = build_same_sized_truth_vector_type (mask_type);

> -       }

> -

> -      /* No mask_type should mean loop invariant predicate.

> -        This is probably a subject for optimization in

> -        if-conversion.  */

> +      stmt_info = mask_producers[i];

> +      tree mask_type = vect_get_mask_type_for_stmt (stmt_info);

>         if (!mask_type)

> -       {

> -         if (dump_enabled_p ())

> -           {

> -             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,

> -                              "not vectorized: can't compute mask type "

> -                              "for statement, ");

> -             dump_gimple_stmt (MSG_MISSED_OPTIMIZATION,  TDF_SLIM, stmt,

> -                               0);

> -           }

> -         return false;

> -       }

> -

> -      STMT_VINFO_VECTYPE (mask_producers[i]) = mask_type;

> +       return false;

> +      STMT_VINFO_VECTYPE (stmt_info) = mask_type;

>       }


>     return true;

> Index: gcc/tree-vect-stmts.c

> ===================================================================

> --- gcc/tree-vect-stmts.c       2018-05-10 07:18:12.104514856 +0100

> +++ gcc/tree-vect-stmts.c       2018-05-10 07:18:12.322505512 +0100

> @@ -10520,3 +10520,311 @@ vect_gen_while_not (gimple_seq *seq, tre

>     gimple_seq_add_stmt (seq, call);

>     return gimple_build (seq, BIT_NOT_EXPR, mask_type, tmp);

>   }

> +

> +/* Try to compute the vector types required to vectorize STMT_INFO,

> +   returning true on success and false if vectorization isn't possible.

> +

> +   On success:

> +

> +   - Set *STMT_VECTYPE_OUT to:

> +     - NULL_TREE if the statement doesn't need to be vectorized;

> +     - boolean_type_node if the statement is a boolean operation whose

> +       vector type can only be determined once all the other vector types

> +       are known; and

> +     - the equivalent of STMT_VINFO_VECTYPE otherwise.

> +

> +   - Set *NUNITS_VECTYPE_OUT to the vector type that contains the maximum

> +     number of units needed to vectorize STMT_INFO, or NULL_TREE if the

> +     statement does not help to determine the overall number of units.

  */
> +

> +bool

> +vect_get_vector_types_for_stmt (stmt_vec_info stmt_info,

> +                               tree *stmt_vectype_out,

> +                               tree *nunits_vectype_out)

> +{

> +  gimple *stmt = stmt_info->stmt;

> +

> +  *stmt_vectype_out = NULL_TREE;

> +  *nunits_vectype_out = NULL_TREE;

> +

> +  if (gimple_get_lhs (stmt) == NULL_TREE

> +      /* MASK_STORE has no lhs, but is ok.  */

> +      && !gimple_call_internal_p (stmt, IFN_MASK_STORE))

> +    {

> +      if (is_a <gcall *> (stmt))

> +       {

> +         /* Ignore calls with no lhs.  These must be calls to

> +            #pragma omp simd functions, and what vectorization factor

> +            it really needs can't be determined until

> +            vectorizable_simd_clone_call.  */

> +         if (dump_enabled_p ())

> +           dump_printf_loc (MSG_NOTE, vect_location,

> +                            "defer to SIMD clone analysis.\n");

> +         return true;

> +       }

> +

> +      if (dump_enabled_p ())

> +       {

> +         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,

> +                          "not vectorized: irregular stmt.");

> +         dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, stmt, 0);

> +       }

> +      return false;

> +    }

> +

> +  if (VECTOR_MODE_P (TYPE_MODE (gimple_expr_type (stmt))))

> +    {

> +      if (dump_enabled_p ())

> +       {

> +         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,

> +                          "not vectorized: vector stmt in loop:");

> +         dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, stmt, 0);

> +       }

> +      return false;

> +    }

> +

> +  tree vectype;

> +  tree scalar_type = NULL_TREE;

> +  if (STMT_VINFO_VECTYPE (stmt_info))

> +    *stmt_vectype_out = vectype = STMT_VINFO_VECTYPE (stmt_info);

> +  else

> +    {

> +      gcc_assert (!STMT_VINFO_DATA_REF (stmt_info));

> +      if (gimple_call_internal_p (stmt, IFN_MASK_STORE))

> +       scalar_type = TREE_TYPE (gimple_call_arg (stmt, 3));

> +      else

> +       scalar_type = TREE_TYPE (gimple_get_lhs (stmt));

> +

> +      /* Pure bool ops don't participate in number-of-units computation.

> +        For comparisons use the types being compared.  */

> +      if (VECT_SCALAR_BOOLEAN_TYPE_P (scalar_type)

> +         && is_gimple_assign (stmt)

> +         && gimple_assign_rhs_code (stmt) != COND_EXPR)

> +       {

> +         *stmt_vectype_out = boolean_type_node;

> +

> +         tree rhs1 = gimple_assign_rhs1 (stmt);

> +         if (TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) ==

tcc_comparison
> +             && !VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (rhs1)))

> +           scalar_type = TREE_TYPE (rhs1);

> +         else

> +           {

> +             if (dump_enabled_p ())

> +               dump_printf_loc (MSG_NOTE, vect_location,

> +                                "pure bool operation.\n");

> +             return true;

> +           }

> +       }

> +

> +      if (dump_enabled_p ())

> +       {

> +         dump_printf_loc (MSG_NOTE, vect_location,

> +                          "get vectype for scalar type:  ");

> +         dump_generic_expr (MSG_NOTE, TDF_SLIM, scalar_type);

> +         dump_printf (MSG_NOTE, "\n");

> +       }

> +      vectype = get_vectype_for_scalar_type (scalar_type);

> +      if (!vectype)

> +       {

> +         if (dump_enabled_p ())

> +           {

> +             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,

> +                              "not vectorized: unsupported data-type ");

> +             dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,

> +                                scalar_type);

> +             dump_printf (MSG_MISSED_OPTIMIZATION, "\n");

> +           }

> +         return false;

> +       }

> +

> +      if (!*stmt_vectype_out)

> +       *stmt_vectype_out = vectype;

> +

> +      if (dump_enabled_p ())

> +       {

> +         dump_printf_loc (MSG_NOTE, vect_location, "vectype: ");

> +         dump_generic_expr (MSG_NOTE, TDF_SLIM, vectype);

> +         dump_printf (MSG_NOTE, "\n");

> +       }

> +    }

> +

> +  /* Don't try to compute scalar types if the stmt produces a boolean

> +     vector; use the existing vector type instead.  */

> +  tree nunits_vectype;

> +  if (VECTOR_BOOLEAN_TYPE_P (vectype))

> +    nunits_vectype = vectype;

> +  else

> +    {

> +      /* The number of units is set according to the smallest scalar

> +        type (or the largest vector size, but we only support one

> +        vector size per vectorization).  */

> +      if (*stmt_vectype_out != boolean_type_node)

> +       {

> +         HOST_WIDE_INT dummy;

> +         scalar_type = vect_get_smallest_scalar_type (stmt, &dummy,

&dummy);
> +       }

> +      if (dump_enabled_p ())

> +       {

> +         dump_printf_loc (MSG_NOTE, vect_location,

> +                          "get vectype for scalar type:  ");

> +         dump_generic_expr (MSG_NOTE, TDF_SLIM, scalar_type);

> +         dump_printf (MSG_NOTE, "\n");

> +       }

> +      nunits_vectype = get_vectype_for_scalar_type (scalar_type);

> +    }

> +  if (!nunits_vectype)

> +    {

> +      if (dump_enabled_p ())

> +       {

> +         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,

> +                          "not vectorized: unsupported data-type ");

> +         dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,

scalar_type);
> +         dump_printf (MSG_MISSED_OPTIMIZATION, "\n");

> +       }

> +      return false;

> +    }

> +

> +  if (maybe_ne (GET_MODE_SIZE (TYPE_MODE (vectype)),

> +               GET_MODE_SIZE (TYPE_MODE (nunits_vectype))))

> +    {

> +      if (dump_enabled_p ())

> +       {

> +         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,

> +                          "not vectorized: different sized vector "

> +                          "types in statement, ");

> +         dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM, vectype);

> +         dump_printf (MSG_MISSED_OPTIMIZATION, " and ");

> +         dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,

nunits_vectype);
> +         dump_printf (MSG_MISSED_OPTIMIZATION, "\n");

> +       }

> +      return false;

> +    }

> +

> +  if (dump_enabled_p ())

> +    {

> +      dump_printf_loc (MSG_NOTE, vect_location, "vectype: ");

> +      dump_generic_expr (MSG_NOTE, TDF_SLIM, nunits_vectype);

> +      dump_printf (MSG_NOTE, "\n");

> +

> +      dump_printf_loc (MSG_NOTE, vect_location, "nunits = ");

> +      dump_dec (MSG_NOTE, TYPE_VECTOR_SUBPARTS (nunits_vectype));

> +      dump_printf (MSG_NOTE, "\n");

> +    }

> +

> +  *nunits_vectype_out = nunits_vectype;

> +  return true;

> +}

> +

> +/* Try to determine the correct vector type for STMT_INFO, which is a

> +   statement that produces a scalar boolean result.  Return the vector

> +   type on success, otherwise return NULL_TREE.  */

> +

> +tree

> +vect_get_mask_type_for_stmt (stmt_vec_info stmt_info)

> +{

> +  gimple *stmt = stmt_info->stmt;

> +  tree mask_type = NULL;

> +  tree vectype, scalar_type;

> +

> +  if (is_gimple_assign (stmt)

> +      && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) ==

tcc_comparison
> +      && !VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (gimple_assign_rhs1

(stmt))))
> +    {

> +      scalar_type = TREE_TYPE (gimple_assign_rhs1 (stmt));

> +      mask_type = get_mask_type_for_scalar_type (scalar_type);

> +

> +      if (!mask_type)

> +       {

> +         if (dump_enabled_p ())

> +           dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,

> +                            "not vectorized: unsupported mask\n");

> +         return NULL_TREE;

> +       }

> +    }

> +  else

> +    {

> +      tree rhs;

> +      ssa_op_iter iter;

> +      gimple *def_stmt;

> +      enum vect_def_type dt;

> +

> +      FOR_EACH_SSA_TREE_OPERAND (rhs, stmt, iter, SSA_OP_USE)

> +       {

> +         if (!vect_is_simple_use (rhs, stmt_info->vinfo,

> +                                  &def_stmt, &dt, &vectype))

> +           {

> +             if (dump_enabled_p ())

> +               {

> +                 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,

> +                                  "not vectorized: can't compute mask

type "
> +                                  "for statement, ");

> +                 dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM,

stmt,
> +                                   0);

> +               }

> +             return NULL_TREE;

> +           }

> +

> +         /* No vectype probably means external definition.

> +            Allow it in case there is another operand which

> +            allows to determine mask type.  */

> +         if (!vectype)

> +           continue;

> +

> +         if (!mask_type)

> +           mask_type = vectype;

> +         else if (maybe_ne (TYPE_VECTOR_SUBPARTS (mask_type),

> +                            TYPE_VECTOR_SUBPARTS (vectype)))

> +           {

> +             if (dump_enabled_p ())

> +               {

> +                 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,

> +                                  "not vectorized: different sized masks

"
> +                                  "types in statement, ");

> +                 dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,

> +                                    mask_type);

> +                 dump_printf (MSG_MISSED_OPTIMIZATION, " and ");

> +                 dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,

> +                                    vectype);

> +                 dump_printf (MSG_MISSED_OPTIMIZATION, "\n");

> +               }

> +             return NULL_TREE;

> +           }

> +         else if (VECTOR_BOOLEAN_TYPE_P (mask_type)

> +                  != VECTOR_BOOLEAN_TYPE_P (vectype))

> +           {

> +             if (dump_enabled_p ())

> +               {

> +                 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,

> +                                  "not vectorized: mixed mask and "

> +                                  "nonmask vector types in statement, ");

> +                 dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,

> +                                    mask_type);

> +                 dump_printf (MSG_MISSED_OPTIMIZATION, " and ");

> +                 dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM,

> +                                    vectype);

> +                 dump_printf (MSG_MISSED_OPTIMIZATION, "\n");

> +               }

> +             return NULL_TREE;

> +           }

> +       }

> +

> +      /* We may compare boolean value loaded as vector of integers.

> +        Fix mask_type in such case.  */

> +      if (mask_type

> +         && !VECTOR_BOOLEAN_TYPE_P (mask_type)

> +         && gimple_code (stmt) == GIMPLE_ASSIGN

> +         && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) ==

tcc_comparison)
> +       mask_type = build_same_sized_truth_vector_type (mask_type);

> +    }

> +

> +  /* No mask_type should mean loop invariant predicate.

> +     This is probably a subject for optimization in if-conversion.  */

> +  if (!mask_type && dump_enabled_p ())

> +    {

> +      dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,

> +                      "not vectorized: can't compute mask type "

> +                      "for statement, ");

> +      dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, stmt, 0);

> +    }

> +  return mask_type;

> +}

> Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_10.c

> ===================================================================

> --- /dev/null   2018-04-20 16:19:46.369131350 +0100

> +++ gcc/testsuite/gcc.target/aarch64/sve/vcond_10.c     2018-05-10

07:18:12.317505726 +0100
> @@ -0,0 +1,36 @@

> +/* { dg-do compile } */

> +/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */

> +

> +#include <stdint.h>

> +

> +#define DEF_LOOP(TYPE)                                                 \

> +  void __attribute__ ((noinline, noclone))                             \

> +  test_##TYPE (TYPE *a, TYPE a1, TYPE a2, TYPE a3, TYPE a4, int n)     \

> +  {                                                                    \

> +    for (int i = 0; i < n; i += 2)                                     \

> +      {

        \
> +       a[i] = a[i] >= 1 && a[i] != 3 ? a1 : a2;                        \

> +       a[i + 1] = a[i + 1] >= 1 && a[i + 1] != 3 ? a3 : a4;            \

> +      }

        \
> +  }

> +

> +#define FOR_EACH_TYPE(T) \

> +  T (int8_t) \

> +  T (uint8_t) \

> +  T (int16_t) \

> +  T (uint16_t) \

> +  T (int32_t) \

> +  T (uint32_t) \

> +  T (int64_t) \

> +  T (uint64_t) \

> +  T (_Float16) \

> +  T (float) \

> +  T (double)

> +

> +FOR_EACH_TYPE (DEF_LOOP)

> +

> +/* { dg-final { scan-assembler-times {\tld1b\t} 2 } } */

> +/* { dg-final { scan-assembler-times {\tld1h\t} 3 } } */

> +/* { dg-final { scan-assembler-times {\tld1w\t} 3 } } */

> +/* { dg-final { scan-assembler-times {\tld1d\t} 3 } } */

> +/* { dg-final { scan-assembler-times {\tsel\tz[0-9]} 11 } } */

> Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_10_run.c

> ===================================================================

> --- /dev/null   2018-04-20 16:19:46.369131350 +0100

> +++ gcc/testsuite/gcc.target/aarch64/sve/vcond_10_run.c 2018-05-10

07:18:12.317505726 +0100
> @@ -0,0 +1,24 @@

> +/* { dg-do run { target aarch64_sve_hw } } */

> +/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */

> +

> +#include "vcond_10.c"

> +

> +#define N 133

> +

> +#define TEST_LOOP(TYPE)

        \
> +  {                                                                    \

> +    TYPE a[N];                                                         \

> +    for (int i = 0; i < N; ++i)

        \
> +      a[i] = i % 7;                                                    \

> +    test_##TYPE (a, 10, 11, 12, 13, N);

        \
> +    for (int i = 0; i < N; ++i)

        \
> +      if (a[i] != 10 + (i & 1) * 2 + (i % 7 == 0 || i % 7 == 3))       \

> +       __builtin_abort ();                                             \

> +  }

> +

> +int

> +main (void)

> +{

> +  FOR_EACH_TYPE (TEST_LOOP);

> +  return 0;

> +}

> Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_11.c

> ===================================================================

> --- /dev/null   2018-04-20 16:19:46.369131350 +0100

> +++ gcc/testsuite/gcc.target/aarch64/sve/vcond_11.c     2018-05-10

07:18:12.317505726 +0100
> @@ -0,0 +1,36 @@

> +/* { dg-do compile } */

> +/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */

> +

> +#include <stdint.h>

> +

> +#define DEF_LOOP(TYPE)                                                 \

> +  void __att

Patch

Index: gcc/tree-vect-slp.c
===================================================================
--- gcc/tree-vect-slp.c	2018-05-08 09:42:03.526648115 +0100
+++ gcc/tree-vect-slp.c	2018-05-09 11:30:41.061096063 +0100
@@ -608,6 +608,41 @@  vect_record_max_nunits (vec_info *vinfo,
   return true;
 }
 
+/* Return the vector type associated with the smallest scalar type in STMT.  */
+
+static tree
+get_vectype_for_smallest_scalar_type (gimple *stmt)
+{
+  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
+  if (vectype != NULL_TREE
+      && VECTOR_BOOLEAN_TYPE_P (vectype))
+    {
+      /* The result of a vector boolean operation has the smallest scalar
+	 type unless the statement is extending an even narrower boolean.  */
+      if (!gimple_assign_cast_p (stmt))
+	return vectype;
+
+      tree src = gimple_assign_rhs1 (stmt);
+      gimple *def_stmt;
+      enum vect_def_type dt;
+      tree src_vectype = NULL_TREE;
+      if (vect_is_simple_use (src, stmt_info->vinfo, &def_stmt, &dt,
+			      &src_vectype)
+	  && src_vectype
+	  && VECTOR_BOOLEAN_TYPE_P (src_vectype))
+	{
+	  if (TYPE_PRECISION (TREE_TYPE (src_vectype))
+	      < TYPE_PRECISION (TREE_TYPE (vectype)))
+	    return src_vectype;
+	  return vectype;
+	}
+    }
+  HOST_WIDE_INT dummy;
+  tree scalar_type = vect_get_smallest_scalar_type (stmt, &dummy, &dummy);
+  return get_vectype_for_scalar_type (scalar_type);
+}
+
 /* Verify if the scalar stmts STMTS are isomorphic, require data
    permutation or are of unsupported types of operation.  Return
    true if they are, otherwise return false and indicate in *MATCHES
@@ -636,12 +671,11 @@  vect_build_slp_tree_1 (vec_info *vinfo,
   enum tree_code first_cond_code = ERROR_MARK;
   tree lhs;
   bool need_same_oprnds = false;
-  tree vectype = NULL_TREE, scalar_type, first_op1 = NULL_TREE;
+  tree vectype = NULL_TREE, first_op1 = NULL_TREE;
   optab optab;
   int icode;
   machine_mode optab_op2_mode;
   machine_mode vec_mode;
-  HOST_WIDE_INT dummy;
   gimple *first_load = NULL, *prev_first_load = NULL;
 
   /* For every stmt in NODE find its def stmt/s.  */
@@ -685,15 +719,14 @@  vect_build_slp_tree_1 (vec_info *vinfo,
 	  return false;
 	}
 
-      scalar_type = vect_get_smallest_scalar_type (stmt, &dummy, &dummy);
-      vectype = get_vectype_for_scalar_type (scalar_type);
+      vectype = get_vectype_for_smallest_scalar_type (stmt);
       if (!vect_record_max_nunits (vinfo, stmt, group_size, vectype,
 				   max_nunits))
 	{
 	  /* Fatal mismatch.  */
 	  matches[0] = false;
-          return false;
-        }
+	  return false;
+	}
 
       if (gcall *call_stmt = dyn_cast <gcall *> (stmt))
 	{
Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_10.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/vcond_10.c	2018-05-09 11:30:41.057096221 +0100
@@ -0,0 +1,36 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */
+
+#include <stdint.h>
+
+#define DEF_LOOP(TYPE)							\
+  void __attribute__ ((noinline, noclone))				\
+  test_##TYPE (TYPE *a, TYPE a1, TYPE a2, TYPE a3, TYPE a4, int n)	\
+  {									\
+    for (int i = 0; i < n; i += 2)					\
+      {									\
+	a[i] = a[i] >= 1 && a[i] != 3 ? a1 : a2;			\
+	a[i + 1] = a[i + 1] >= 1 && a[i + 1] != 3 ? a3 : a4;		\
+      }									\
+  }
+
+#define FOR_EACH_TYPE(T) \
+  T (int8_t) \
+  T (uint8_t) \
+  T (int16_t) \
+  T (uint16_t) \
+  T (int32_t) \
+  T (uint32_t) \
+  T (int64_t) \
+  T (uint64_t) \
+  T (_Float16) \
+  T (float) \
+  T (double)
+
+FOR_EACH_TYPE (DEF_LOOP)
+
+/* { dg-final { scan-assembler-times {\tld1b\t} 2 } } */
+/* { dg-final { scan-assembler-times {\tld1h\t} 3 } } */
+/* { dg-final { scan-assembler-times {\tld1w\t} 3 } } */
+/* { dg-final { scan-assembler-times {\tld1d\t} 3 } } */
+/* { dg-final { scan-assembler-times {\tsel\tz[0-9]} 11 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_10_run.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/vcond_10_run.c	2018-05-09 11:30:41.057096221 +0100
@@ -0,0 +1,24 @@ 
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */
+
+#include "vcond_10.c"
+
+#define N 133
+
+#define TEST_LOOP(TYPE)							\
+  {									\
+    TYPE a[N];								\
+    for (int i = 0; i < N; ++i)						\
+      a[i] = i % 7;							\
+    test_##TYPE (a, 10, 11, 12, 13, N);					\
+    for (int i = 0; i < N; ++i)						\
+      if (a[i] != 10 + (i & 1) * 2 + (i % 7 == 0 || i % 7 == 3))	\
+	__builtin_abort ();						\
+  }
+
+int
+main (void)
+{
+  FOR_EACH_TYPE (TEST_LOOP);
+  return 0;
+}
Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_11.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/vcond_11.c	2018-05-09 11:30:41.057096221 +0100
@@ -0,0 +1,36 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */
+
+#include <stdint.h>
+
+#define DEF_LOOP(TYPE)							\
+  void __attribute__ ((noinline, noclone))				\
+  test_##TYPE (int *restrict a, TYPE *restrict b, int a1, int a2,	\
+	       int a3, int a4, int n)					\
+  {									\
+    for (int i = 0; i < n; i += 2)					\
+      {									\
+	a[i] = a[i] >= 1 & b[i] != 3 ? a1 : a2;				\
+	a[i + 1] = a[i + 1] >= 1 & b[i + 1] != 3 ? a3 : a4;		\
+      }									\
+  }
+
+#define FOR_EACH_TYPE(T) \
+  T (int8_t) \
+  T (uint8_t) \
+  T (int16_t) \
+  T (uint16_t) \
+  T (int64_t) \
+  T (uint64_t) \
+  T (double)
+
+FOR_EACH_TYPE (DEF_LOOP)
+
+/* { dg-final { scan-assembler-times {\tld1b\t} 2 } } */
+/* { dg-final { scan-assembler-times {\tld1h\t} 2 } } */
+/* 4 for each 8-bit function, 2 for each 16-bit function, 1 for
+   each 64-bit function.  */
+/* { dg-final { scan-assembler-times {\tld1w\t} 15 } } */
+/* 3 64-bit functions * 2 64-bit vectors per 32-bit vector.  */
+/* { dg-final { scan-assembler-times {\tld1d\t} 6 } } */
+/* { dg-final { scan-assembler-times {\tsel\tz[0-9]} 15 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_11_run.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/vcond_11_run.c	2018-05-09 11:30:41.059096142 +0100
@@ -0,0 +1,28 @@ 
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */
+
+#include "vcond_11.c"
+
+#define N 133
+
+#define TEST_LOOP(TYPE)							\
+  {									\
+    int a[N];								\
+    TYPE b[N];								\
+    for (int i = 0; i < N; ++i)						\
+      {									\
+	a[i] = i % 5;							\
+	b[i] = i % 7;							\
+      }									\
+    test_##TYPE (a, b, 10, 11, 12, 13, N);				\
+    for (int i = 0; i < N; ++i)						\
+      if (a[i] != 10 + (i & 1) * 2 + (i % 5 == 0 || i % 7 == 3))	\
+	__builtin_abort ();						\
+  }
+
+int
+main (void)
+{
+  FOR_EACH_TYPE (TEST_LOOP);
+  return 0;
+}