Message ID | 87muwzoqd3.fsf@linaro.org
---|---
State | New
Series | Implement SLP of internal functions
On Wed, May 16, 2018 at 12:18 PM Richard Sandiford < richard.sandiford@linaro.org> wrote: > SLP of calls was previously restricted to built-in functions. > This patch extends it to internal functions. > Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf > and x86_64-linux-gnu. OK to install? > Richard > 2018-05-16 Richard Sandiford <richard.sandiford@linaro.org> > gcc/ > * internal-fn.h (vectorizable_internal_fn_p): New function. > * tree-vect-slp.c (compatible_calls_p): Likewise. > (vect_build_slp_tree_1): Remove nops argument. Handle calls > to internal functions. > (vect_build_slp_tree_2): Update call to vect_build_slp_tree_1. > gcc/testsuite/ > * gcc.target/aarch64/sve/cond_arith_4.c: New test. > * gcc.target/aarch64/sve/cond_arith_4_run.c: Likewise. > * gcc.target/aarch64/sve/cond_arith_5.c: Likewise. > * gcc.target/aarch64/sve/cond_arith_5_run.c: Likewise. > * gcc.target/aarch64/sve/slp_14.c: Likewise. > * gcc.target/aarch64/sve/slp_14_run.c: Likewise. > Index: gcc/internal-fn.h > =================================================================== > --- gcc/internal-fn.h 2018-05-16 11:06:14.513574219 +0100 > +++ gcc/internal-fn.h 2018-05-16 11:12:11.872116220 +0100 > @@ -158,6 +158,17 @@ direct_internal_fn_p (internal_fn fn) > return direct_internal_fn_array[fn].type0 >= -1; > } > +/* Return true if FN is a direct internal function that can be vectorized by > + converting the return type and all argument types to vectors of the same > + number of elements. E.g. we can vectorize an IFN_SQRT on floats as an > + IFN_SQRT on vectors of N floats. */ > + > +inline bool > +vectorizable_internal_fn_p (internal_fn fn) > +{ > + return direct_internal_fn_array[fn].vectorizable; > +} > + > /* Return optab information about internal function FN. Only meaningful > if direct_internal_fn_p (FN). */ > Index: gcc/tree-vect-slp.c > =================================================================== > --- gcc/tree-vect-slp.c 2018-05-16 11:02:46.262494712 +0100 > +++ gcc/tree-vect-slp.c 2018-05-16 11:12:11.873116180 +0100 > @@ -564,6 +564,41 @@ vect_get_and_check_slp_defs (vec_info *v > return 0; > } > +/* Return true if call statements CALL1 and CALL2 are similar enough > + to be combined into the same SLP group. */ > + > +static bool > +compatible_calls_p (gcall *call1, gcall *call2) > +{ > + unsigned int nargs = gimple_call_num_args (call1); > + if (nargs != gimple_call_num_args (call2)) > + return false; > + > + if (gimple_call_combined_fn (call1) != gimple_call_combined_fn (call2)) > + return false; > + > + if (gimple_call_internal_p (call1)) > + { > + if (TREE_TYPE (gimple_call_lhs (call1)) > + != TREE_TYPE (gimple_call_lhs (call2))) > + return false; > + for (unsigned int i = 0; i < nargs; ++i) > + if (TREE_TYPE (gimple_call_arg (call1, i)) > + != TREE_TYPE (gimple_call_arg (call2, i))) Please use types_compatible_p in these two type comparisons. Can you please add a generic vect_call_sqrtf to the main vectorizer testsuite? In fact I already see gcc.dg/vect/fast-math-bb-slp-call-1.c. Does that mean SQRT does never appear as internal function before vectorization? OK with that changes. Richard. 
> + return false; > + } > + else > + { > + if (!operand_equal_p (gimple_call_fn (call1), > + gimple_call_fn (call2), 0)) > + return false; > + > + if (gimple_call_fntype (call1) != gimple_call_fntype (call2)) > + return false; > + } > + return true; > +} > + > /* A subroutine of vect_build_slp_tree for checking VECTYPE, which is the > caller's attempt to find the vector type in STMT with the narrowest > element type. Return true if VECTYPE is nonnull and if it is valid > @@ -625,8 +660,8 @@ vect_record_max_nunits (vec_info *vinfo, > static bool > vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap, > vec<gimple *> stmts, unsigned int group_size, > - unsigned nops, poly_uint64 *max_nunits, > - bool *matches, bool *two_operators) > + poly_uint64 *max_nunits, bool *matches, > + bool *two_operators) > { > unsigned int i; > gimple *first_stmt = stmts[0], *stmt = stmts[0]; > @@ -698,7 +733,9 @@ vect_build_slp_tree_1 (vec_info *vinfo, > if (gcall *call_stmt = dyn_cast <gcall *> (stmt)) > { > rhs_code = CALL_EXPR; > - if (gimple_call_internal_p (call_stmt) > + if ((gimple_call_internal_p (call_stmt) > + && (!vectorizable_internal_fn_p > + (gimple_call_internal_fn (call_stmt)))) > || gimple_call_tail_p (call_stmt) > || gimple_call_noreturn_p (call_stmt) > || !gimple_call_nothrow_p (call_stmt) > @@ -833,11 +870,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, > if (rhs_code == CALL_EXPR) > { > gimple *first_stmt = stmts[0]; > - if (gimple_call_num_args (stmt) != nops > - || !operand_equal_p (gimple_call_fn (first_stmt), > - gimple_call_fn (stmt), 0) > - || gimple_call_fntype (first_stmt) > - != gimple_call_fntype (stmt)) > + if (!compatible_calls_p (as_a <gcall *> (first_stmt), > + as_a <gcall *> (stmt))) > { > if (dump_enabled_p ()) > { > @@ -1166,8 +1200,7 @@ vect_build_slp_tree_2 (vec_info *vinfo, > bool two_operators = false; > unsigned char *swap = XALLOCAVEC (unsigned char, group_size); > - if (!vect_build_slp_tree_1 (vinfo, swap, > - stmts, group_size, nops, > + if (!vect_build_slp_tree_1 (vinfo, swap, stmts, group_size, > &this_max_nunits, matches, &two_operators)) > return NULL; > Index: gcc/testsuite/gcc.target/aarch64/sve/cond_arith_4.c > =================================================================== > --- /dev/null 2018-04-20 16:19:46.369131350 +0100 > +++ gcc/testsuite/gcc.target/aarch64/sve/cond_arith_4.c 2018-05-16 11:12:11.872116220 +0100 > @@ -0,0 +1,62 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -ftree-vectorize" } */ > + > +#include <stdint.h> > + > +#define TEST(TYPE, NAME, OP) \ > + void __attribute__ ((noinline, noclone)) \ > + test_##TYPE##_##NAME (TYPE *__restrict x, \ > + TYPE *__restrict y, \ > + TYPE z1, TYPE z2, \ > + TYPE *__restrict pred, int n) \ > + { \ > + for (int i = 0; i < n; i += 2) \ > + { \ > + x[i] = (pred[i] != 1 ? y[i] OP z1 : y[i]); \ > + x[i + 1] = (pred[i + 1] != 1 ? 
y[i + 1] OP z2 : y[i + 1]); \ > + } \ > + } > + > +#define TEST_INT_TYPE(TYPE) \ > + TEST (TYPE, div, /) > + > +#define TEST_FP_TYPE(TYPE) \ > + TEST (TYPE, add, +) \ > + TEST (TYPE, sub, -) \ > + TEST (TYPE, mul, *) \ > + TEST (TYPE, div, /) > + > +#define TEST_ALL \ > + TEST_INT_TYPE (int32_t) \ > + TEST_INT_TYPE (uint32_t) \ > + TEST_INT_TYPE (int64_t) \ > + TEST_INT_TYPE (uint64_t) \ > + TEST_FP_TYPE (float) \ > + TEST_FP_TYPE (double) > + > +TEST_ALL > + > +/* { dg-final { scan-assembler-times {\tsdiv\tz[0-9]+\.s, p[0-7]/m,} 1 } } */ > +/* { dg-final { scan-assembler-times {\tudiv\tz[0-9]+\.s, p[0-7]/m,} 1 } } */ > +/* { dg-final { scan-assembler-times {\tsdiv\tz[0-9]+\.d, p[0-7]/m,} 1 } } */ > +/* { dg-final { scan-assembler-times {\tudiv\tz[0-9]+\.d, p[0-7]/m,} 1 } } */ > + > +/* { dg-final { scan-assembler-times {\tfadd\tz[0-9]+\.s, p[0-7]/m,} 1 } } */ > +/* { dg-final { scan-assembler-times {\tfadd\tz[0-9]+\.d, p[0-7]/m,} 1 } } */ > + > +/* { dg-final { scan-assembler-times {\tfsub\tz[0-9]+\.s, p[0-7]/m,} 1 } } */ > +/* { dg-final { scan-assembler-times {\tfsub\tz[0-9]+\.d, p[0-7]/m,} 1 } } */ > + > +/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.s, p[0-7]/m,} 1 } } */ > +/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.d, p[0-7]/m,} 1 } } */ > + > +/* { dg-final { scan-assembler-times {\tfdiv\tz[0-9]+\.s, p[0-7]/m,} 1 } } */ > +/* { dg-final { scan-assembler-times {\tfdiv\tz[0-9]+\.d, p[0-7]/m,} 1 } } */ > + > +/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]/z,} 12 } } */ > +/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s, p[0-7],} 6 } } */ > + > +/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z,} 12 } } */ > +/* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d, p[0-7],} 6 } } */ > + > +/* { dg-final { scan-assembler-not {\tsel\t} } } */ > Index: gcc/testsuite/gcc.target/aarch64/sve/cond_arith_4_run.c > =================================================================== > --- /dev/null 2018-04-20 16:19:46.369131350 +0100 > +++ gcc/testsuite/gcc.target/aarch64/sve/cond_arith_4_run.c 2018-05-16 11:12:11.872116220 +0100 > @@ -0,0 +1,32 @@ > +/* { dg-do run { target aarch64_sve_hw } } */ > +/* { dg-options "-O2 -ftree-vectorize" } */ > + > +#include "cond_arith_4.c" > + > +#define N 98 > + > +#undef TEST > +#define TEST(TYPE, NAME, OP) \ > + { \ > + TYPE x[N], y[N], pred[N], z[2] = { 5, 7 }; \ > + for (int i = 0; i < N; ++i) \ > + { \ > + y[i] = i * i; \ > + pred[i] = i % 3; \ > + } \ > + test_##TYPE##_##NAME (x, y, z[0], z[1], pred, N); \ > + for (int i = 0; i < N; ++i) \ > + { \ > + TYPE expected = i % 3 != 1 ? 
y[i] OP z[i & 1] : y[i]; \ > + if (x[i] != expected) \ > + __builtin_abort (); \ > + asm volatile ("" ::: "memory"); \ > + } \ > + } > + > +int > +main (void) > +{ > + TEST_ALL > + return 0; > +} > Index: gcc/testsuite/gcc.target/aarch64/sve/cond_arith_5.c > =================================================================== > --- /dev/null 2018-04-20 16:19:46.369131350 +0100 > +++ gcc/testsuite/gcc.target/aarch64/sve/cond_arith_5.c 2018-05-16 11:12:11.872116220 +0100 > @@ -0,0 +1,85 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -ftree-vectorize -fno-vect-cost-model" } */ > + > +#include <stdint.h> > + > +#define TEST(DATA_TYPE, OTHER_TYPE, NAME, OP) \ > + void __attribute__ ((noinline, noclone)) \ > + test_##DATA_TYPE##_##OTHER_TYPE##_##NAME (DATA_TYPE *__restrict x, \ > + DATA_TYPE *__restrict y, \ > + DATA_TYPE z1, DATA_TYPE z2, \ > + DATA_TYPE *__restrict pred, \ > + OTHER_TYPE *__restrict foo, \ > + int n) \ > + { \ > + for (int i = 0; i < n; i += 2) \ > + { \ > + x[i] = (pred[i] != 1 ? y[i] OP z1 : y[i]); \ > + x[i + 1] = (pred[i + 1] != 1 ? y[i + 1] OP z2 : y[i + 1]); \ > + foo[i] += 1; \ > + foo[i + 1] += 2; \ > + } \ > + } > + > +#define TEST_INT_TYPE(DATA_TYPE, OTHER_TYPE) \ > + TEST (DATA_TYPE, OTHER_TYPE, div, /) > + > +#define TEST_FP_TYPE(DATA_TYPE, OTHER_TYPE) \ > + TEST (DATA_TYPE, OTHER_TYPE, add, +) \ > + TEST (DATA_TYPE, OTHER_TYPE, sub, -) \ > + TEST (DATA_TYPE, OTHER_TYPE, mul, *) \ > + TEST (DATA_TYPE, OTHER_TYPE, div, /) > + > +#define TEST_ALL \ > + TEST_INT_TYPE (int32_t, int8_t) \ > + TEST_INT_TYPE (int32_t, int16_t) \ > + TEST_INT_TYPE (uint32_t, int8_t) \ > + TEST_INT_TYPE (uint32_t, int16_t) \ > + TEST_INT_TYPE (int64_t, int8_t) \ > + TEST_INT_TYPE (int64_t, int16_t) \ > + TEST_INT_TYPE (int64_t, int32_t) \ > + TEST_INT_TYPE (uint64_t, int8_t) \ > + TEST_INT_TYPE (uint64_t, int16_t) \ > + TEST_INT_TYPE (uint64_t, int32_t) \ > + TEST_FP_TYPE (float, int8_t) \ > + TEST_FP_TYPE (float, int16_t) \ > + TEST_FP_TYPE (double, int8_t) \ > + TEST_FP_TYPE (double, int16_t) \ > + TEST_FP_TYPE (double, int32_t) > + > +TEST_ALL > + > +/* { dg-final { scan-assembler-times {\tsdiv\tz[0-9]+\.s, p[0-7]/m,} 6 } } */ > +/* { dg-final { scan-assembler-times {\tudiv\tz[0-9]+\.s, p[0-7]/m,} 6 } } */ > +/* { dg-final { scan-assembler-times {\tsdiv\tz[0-9]+\.d, p[0-7]/m,} 14 } } */ > +/* { dg-final { scan-assembler-times {\tudiv\tz[0-9]+\.d, p[0-7]/m,} 14 } } */ > + > +/* { dg-final { scan-assembler-times {\tfadd\tz[0-9]+\.s, p[0-7]/m,} 6 } } */ > +/* { dg-final { scan-assembler-times {\tfadd\tz[0-9]+\.d, p[0-7]/m,} 14 } } */ > + > +/* { dg-final { scan-assembler-times {\tfsub\tz[0-9]+\.s, p[0-7]/m,} 6 } } */ > +/* { dg-final { scan-assembler-times {\tfsub\tz[0-9]+\.d, p[0-7]/m,} 14 } } */ > + > +/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.s, p[0-7]/m,} 6 } } */ > +/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.d, p[0-7]/m,} 14 } } */ > + > +/* { dg-final { scan-assembler-times {\tfdiv\tz[0-9]+\.s, p[0-7]/m,} 6 } } */ > +/* { dg-final { scan-assembler-times {\tfdiv\tz[0-9]+\.d, p[0-7]/m,} 14 } } */ > + > +/* The load XFAILs for fixed-length SVE account for extra loads from the > + constant pool. */ > +/* { dg-final { scan-assembler-times {\tld1b\tz[0-9]+\.b, p[0-7]/z,} 12 { xfail { aarch64_sve && { ! vect_variable_length } } } } } */ > +/* { dg-final { scan-assembler-times {\tst1b\tz[0-9]+\.b, p[0-7],} 12 } } */ > + > +/* { dg-final { scan-assembler-times {\tld1h\tz[0-9]+\.h, p[0-7]/z,} 12 { xfail { aarch64_sve && { ! 
vect_variable_length } } } } } */ > +/* { dg-final { scan-assembler-times {\tst1h\tz[0-9]+\.h, p[0-7],} 12 } } */ > + > +/* 72 for x operations, 6 for foo operations. */ > +/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]/z,} 78 { xfail { aarch64_sve && { ! vect_variable_length } } } } } */ > +/* 36 for x operations, 6 for foo operations. */ > +/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s, p[0-7],} 42 } } */ > + > +/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z,} 168 } } */ > +/* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d, p[0-7],} 84 } } */ > + > +/* { dg-final { scan-assembler-not {\tsel\t} } } */ > Index: gcc/testsuite/gcc.target/aarch64/sve/cond_arith_5_run.c > =================================================================== > --- /dev/null 2018-04-20 16:19:46.369131350 +0100 > +++ gcc/testsuite/gcc.target/aarch64/sve/cond_arith_5_run.c 2018-05-16 11:12:11.873116180 +0100 > @@ -0,0 +1,35 @@ > +/* { dg-do run { target aarch64_sve_hw } } */ > +/* { dg-options "-O2 -ftree-vectorize" } */ > + > +#include "cond_arith_5.c" > + > +#define N 98 > + > +#undef TEST > +#define TEST(DATA_TYPE, OTHER_TYPE, NAME, OP) \ > + { \ > + DATA_TYPE x[N], y[N], pred[N], z[2] = { 5, 7 }; \ > + OTHER_TYPE foo[N]; \ > + for (int i = 0; i < N; ++i) \ > + { \ > + y[i] = i * i; \ > + pred[i] = i % 3; \ > + foo[i] = i * 5; \ > + } \ > + test_##DATA_TYPE##_##OTHER_TYPE##_##NAME (x, y, z[0], z[1], \ > + pred, foo, N); \ > + for (int i = 0; i < N; ++i) \ > + { \ > + DATA_TYPE expected = i % 3 != 1 ? y[i] OP z[i & 1] : y[i]; \ > + if (x[i] != expected) \ > + __builtin_abort (); \ > + asm volatile ("" ::: "memory"); \ > + } \ > + } > + > +int > +main (void) > +{ > + TEST_ALL > + return 0; > +} > Index: gcc/testsuite/gcc.target/aarch64/sve/slp_14.c > =================================================================== > --- /dev/null 2018-04-20 16:19:46.369131350 +0100 > +++ gcc/testsuite/gcc.target/aarch64/sve/slp_14.c 2018-05-16 11:12:11.873116180 +0100 > @@ -0,0 +1,48 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -ftree-vectorize" } */ > + > +#include <stdint.h> > + > +#define VEC_PERM(TYPE) \ > +void __attribute__ ((weak)) \ > +vec_slp_##TYPE (TYPE *restrict a, TYPE *restrict b, int n) \ > +{ \ > + for (int i = 0; i < n; ++i) \ > + { \ > + TYPE a1 = a[i * 2]; \ > + TYPE a2 = a[i * 2 + 1]; \ > + TYPE b1 = b[i * 2]; \ > + TYPE b2 = b[i * 2 + 1]; \ > + a[i * 2] = b1 > 1 ? a1 / b1 : a1; \ > + a[i * 2 + 1] = b2 > 2 ? a2 / b2 : a2; \ > + } \ > +} > + > +#define TEST_ALL(T) \ > + T (int32_t) \ > + T (uint32_t) \ > + T (int64_t) \ > + T (uint64_t) \ > + T (float) \ > + T (double) > + > +TEST_ALL (VEC_PERM) > + > +/* The loop should be fully-masked. The load XFAILs for fixed-length > + SVE account for extra loads from the constant pool. */ > +/* { dg-final { scan-assembler-times {\tld1w\t} 6 { xfail { aarch64_sve && { ! vect_variable_length } } } } } */ > +/* { dg-final { scan-assembler-times {\tst1w\t} 3 } } */ > +/* { dg-final { scan-assembler-times {\tld1d\t} 6 { xfail { aarch64_sve && { ! 
vect_variable_length } } } } } */ > +/* { dg-final { scan-assembler-times {\tst1d\t} 3 } } */ > +/* { dg-final { scan-assembler-not {\tldr} } } */ > +/* { dg-final { scan-assembler-not {\tstr} } } */ > + > +/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.s} 6 } } */ > +/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.d} 6 } } */ > + > +/* { dg-final { scan-assembler-times {\tsdiv\tz[0-9]+\.s} 1 } } */ > +/* { dg-final { scan-assembler-times {\tudiv\tz[0-9]+\.s} 1 } } */ > +/* { dg-final { scan-assembler-times {\tfdiv\tz[0-9]+\.s} 1 } } */ > +/* { dg-final { scan-assembler-times {\tsdiv\tz[0-9]+\.d} 1 } } */ > +/* { dg-final { scan-assembler-times {\tudiv\tz[0-9]+\.d} 1 } } */ > +/* { dg-final { scan-assembler-times {\tfdiv\tz[0-9]+\.d} 1 } } */ > Index: gcc/testsuite/gcc.target/aarch64/sve/slp_14_run.c > =================================================================== > --- /dev/null 2018-04-20 16:19:46.369131350 +0100 > +++ gcc/testsuite/gcc.target/aarch64/sve/slp_14_run.c 2018-05-16 11:12:11.873116180 +0100 > @@ -0,0 +1,34 @@ > +/* { dg-do run { target aarch64_sve_hw } } */ > +/* { dg-options "-O2 -ftree-vectorize" } */ > + > +#include "slp_14.c" > + > +#define N1 (103 * 2) > +#define N2 (111 * 2) > + > +#define HARNESS(TYPE) \ > + { \ > + TYPE a[N2], b[N2]; \ > + for (unsigned int i = 0; i < N2; ++i) \ > + { \ > + a[i] = i * 2 + i % 5; \ > + b[i] = i % 11; \ > + } \ > + vec_slp_##TYPE (a, b, N1 / 2); \ > + for (unsigned int i = 0; i < N2; ++i) \ > + { \ > + TYPE orig_a = i * 2 + i % 5; \ > + TYPE orig_b = i % 11; \ > + TYPE expected_a = orig_a; \ > + if (i < N1 && orig_b > (i & 1 ? 2 : 1)) \ > + expected_a /= orig_b; \ > + if (a[i] != expected_a || b[i] != orig_b) \ > + __builtin_abort (); \ > + } \ > + } > + > +int > +main (void) > +{ > + TEST_ALL (HARNESS) > +}
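For context on the SQRT question above: as the follow-up below confirms, before vectorization the calls are ordinary scalar built-ins, and the internal function only appears once the vectorizer replaces the grouped scalar calls with a vector IFN_SQRT. The snippet here is a minimal sketch of the existing basic-block SLP case, along the lines of gcc.dg/vect/fast-math-bb-slp-call-1.c; the function name and exact flags are illustrative rather than taken from that test.

```c
/* Sketch (not from the testsuite) of basic-block SLP of sqrt: before
   vectorization these are four scalar __builtin_sqrtf calls; the SLP pass
   groups them and emits a single vector IFN_SQRT when the target has a
   vector sqrt pattern.  Something like -O2 -ftree-vectorize -ffast-math
   (or at least -fno-math-errno) is assumed so the calls have no errno
   side effects.  */
void
sqrt_block (float *__restrict out, const float *__restrict in)
{
  out[0] = __builtin_sqrtf (in[0]);
  out[1] = __builtin_sqrtf (in[1]);
  out[2] = __builtin_sqrtf (in[2]);
  out[3] = __builtin_sqrtf (in[3]);
}
```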
Richard Biener <richard.guenther@gmail.com> writes: >> Index: gcc/tree-vect-slp.c >> =================================================================== >> --- gcc/tree-vect-slp.c 2018-05-16 11:02:46.262494712 +0100 >> +++ gcc/tree-vect-slp.c 2018-05-16 11:12:11.873116180 +0100 >> @@ -564,6 +564,41 @@ vect_get_and_check_slp_defs (vec_info *v >> return 0; >> } > >> +/* Return true if call statements CALL1 and CALL2 are similar enough >> + to be combined into the same SLP group. */ >> + >> +static bool >> +compatible_calls_p (gcall *call1, gcall *call2) >> +{ >> + unsigned int nargs = gimple_call_num_args (call1); >> + if (nargs != gimple_call_num_args (call2)) >> + return false; >> + >> + if (gimple_call_combined_fn (call1) != gimple_call_combined_fn (call2)) >> + return false; >> + >> + if (gimple_call_internal_p (call1)) >> + { >> + if (TREE_TYPE (gimple_call_lhs (call1)) >> + != TREE_TYPE (gimple_call_lhs (call2))) >> + return false; >> + for (unsigned int i = 0; i < nargs; ++i) >> + if (TREE_TYPE (gimple_call_arg (call1, i)) >> + != TREE_TYPE (gimple_call_arg (call2, i))) > > Please use types_compatible_p in these two type comparisons. OK. > Can you please add a generic vect_call_sqrtf to the main vectorizer > testsuite? In fact I already see > gcc.dg/vect/fast-math-bb-slp-call-1.c. Does that mean SQRT does never > appear as internal function before vectorization? Yeah, sqrt vectorisation is scalar built-in -> vector internal function. But this patch adds a generic type keyed off vect_double_cond_arith. Would that be OK instead? Tested as before. Thanks, Richard 2018-05-25 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * internal-fn.h (vectorizable_internal_fn_p): New function. * tree-vect-slp.c (compatible_calls_p): Likewise. (vect_build_slp_tree_1): Remove nops argument. Handle calls to internal functions. (vect_build_slp_tree_2): Update call to vect_build_slp_tree_1. gcc/testsuite/ * gcc.dg/vect/vect-cond-arith-6.c: New test. * gcc.target/aarch64/sve/cond_arith_4.c: Likewise. * gcc.target/aarch64/sve/cond_arith_4_run.c: Likewise. * gcc.target/aarch64/sve/cond_arith_5.c: Likewise. * gcc.target/aarch64/sve/cond_arith_5_run.c: Likewise. * gcc.target/aarch64/sve/slp_14.c: Likewise. * gcc.target/aarch64/sve/slp_14_run.c: Likewise. Index: gcc/internal-fn.h =================================================================== --- gcc/internal-fn.h 2018-05-25 11:28:05.953287025 +0100 +++ gcc/internal-fn.h 2018-05-25 11:28:06.193277781 +0100 @@ -160,6 +160,17 @@ direct_internal_fn_p (internal_fn fn) return direct_internal_fn_array[fn].type0 >= -1; } +/* Return true if FN is a direct internal function that can be vectorized by + converting the return type and all argument types to vectors of the same + number of elements. E.g. we can vectorize an IFN_SQRT on floats as an + IFN_SQRT on vectors of N floats. */ + +inline bool +vectorizable_internal_fn_p (internal_fn fn) +{ + return direct_internal_fn_array[fn].vectorizable; +} + /* Return optab information about internal function FN. Only meaningful if direct_internal_fn_p (FN). */ Index: gcc/tree-vect-slp.c =================================================================== --- gcc/tree-vect-slp.c 2018-05-25 11:28:05.953287025 +0100 +++ gcc/tree-vect-slp.c 2018-05-25 11:28:06.195277704 +0100 @@ -565,6 +565,41 @@ vect_get_and_check_slp_defs (vec_info *v return 0; } +/* Return true if call statements CALL1 and CALL2 are similar enough + to be combined into the same SLP group. 
*/ + +static bool +compatible_calls_p (gcall *call1, gcall *call2) +{ + unsigned int nargs = gimple_call_num_args (call1); + if (nargs != gimple_call_num_args (call2)) + return false; + + if (gimple_call_combined_fn (call1) != gimple_call_combined_fn (call2)) + return false; + + if (gimple_call_internal_p (call1)) + { + if (!types_compatible_p (TREE_TYPE (gimple_call_lhs (call1)), + TREE_TYPE (gimple_call_lhs (call2)))) + return false; + for (unsigned int i = 0; i < nargs; ++i) + if (!types_compatible_p (TREE_TYPE (gimple_call_arg (call1, i)), + TREE_TYPE (gimple_call_arg (call2, i)))) + return false; + } + else + { + if (!operand_equal_p (gimple_call_fn (call1), + gimple_call_fn (call2), 0)) + return false; + + if (gimple_call_fntype (call1) != gimple_call_fntype (call2)) + return false; + } + return true; +} + /* A subroutine of vect_build_slp_tree for checking VECTYPE, which is the caller's attempt to find the vector type in STMT with the narrowest element type. Return true if VECTYPE is nonnull and if it is valid @@ -653,8 +688,8 @@ vect_two_operations_perm_ok_p (vec<gimpl static bool vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap, vec<gimple *> stmts, unsigned int group_size, - unsigned nops, poly_uint64 *max_nunits, - bool *matches, bool *two_operators) + poly_uint64 *max_nunits, bool *matches, + bool *two_operators) { unsigned int i; gimple *first_stmt = stmts[0], *stmt = stmts[0]; @@ -730,7 +765,9 @@ vect_build_slp_tree_1 (vec_info *vinfo, if (gcall *call_stmt = dyn_cast <gcall *> (stmt)) { rhs_code = CALL_EXPR; - if (gimple_call_internal_p (call_stmt) + if ((gimple_call_internal_p (call_stmt) + && (!vectorizable_internal_fn_p + (gimple_call_internal_fn (call_stmt)))) || gimple_call_tail_p (call_stmt) || gimple_call_noreturn_p (call_stmt) || !gimple_call_nothrow_p (call_stmt) @@ -876,11 +913,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, if (rhs_code == CALL_EXPR) { gimple *first_stmt = stmts[0]; - if (gimple_call_num_args (stmt) != nops - || !operand_equal_p (gimple_call_fn (first_stmt), - gimple_call_fn (stmt), 0) - || gimple_call_fntype (first_stmt) - != gimple_call_fntype (stmt)) + if (!compatible_calls_p (as_a <gcall *> (first_stmt), + as_a <gcall *> (stmt))) { if (dump_enabled_p ()) { @@ -1196,8 +1230,7 @@ vect_build_slp_tree_2 (vec_info *vinfo, bool two_operators = false; unsigned char *swap = XALLOCAVEC (unsigned char, group_size); - if (!vect_build_slp_tree_1 (vinfo, swap, - stmts, group_size, nops, + if (!vect_build_slp_tree_1 (vinfo, swap, stmts, group_size, &this_max_nunits, matches, &two_operators)) return NULL; Index: gcc/testsuite/gcc.dg/vect/vect-cond-arith-6.c =================================================================== --- /dev/null 2018-04-20 16:19:46.369131350 +0100 +++ gcc/testsuite/gcc.dg/vect/vect-cond-arith-6.c 2018-05-25 11:28:06.195277704 +0100 @@ -0,0 +1,62 @@ +/* { dg-additional-options "-fdump-tree-optimized" } */ + +#include "tree-vect.h" + +#define N (VECTOR_BITS * 11 / 64 + 4) + +#define add(A, B) ((A) + (B)) +#define sub(A, B) ((A) - (B)) +#define mul(A, B) ((A) * (B)) +#define div(A, B) ((A) / (B)) + +#define DEF(OP) \ + void __attribute__ ((noipa)) \ + f_##OP (double *restrict a, double *restrict b, double x) \ + { \ + for (int i = 0; i < N; i += 2) \ + { \ + a[i] = b[i] < 100 ? OP (b[i], x) : b[i]; \ + a[i + 1] = b[i + 1] < 70 ? 
OP (b[i + 1], x) : b[i + 1]; \ + } \ + } + +#define TEST(OP) \ + { \ + f_##OP (a, b, 10); \ + for (int i = 0; i < N; ++i) \ + { \ + int bval = (i % 17) * 10; \ + int truev = OP (bval, 10); \ + if (a[i] != (bval < (i & 1 ? 70 : 100) ? truev : bval)) \ + __builtin_abort (); \ + asm volatile ("" ::: "memory"); \ + } \ + } + +#define FOR_EACH_OP(T) \ + T (add) \ + T (sub) \ + T (mul) \ + T (div) + +FOR_EACH_OP (DEF) + +int +main (void) +{ + double a[N], b[N]; + for (int i = 0; i < N; ++i) + { + b[i] = (i % 17) * 10; + asm volatile ("" ::: "memory"); + } + FOR_EACH_OP (TEST) + return 0; +} + +/* { dg-final { scan-tree-dump-times {vectorizing stmts using SLP} 4 "vect" { target vect_double_cond_arith } } } */ +/* { dg-final { scan-tree-dump-times { = \.COND_ADD} 1 "optimized" { target vect_double_cond_arith } } } */ +/* { dg-final { scan-tree-dump-times { = \.COND_SUB} 1 "optimized" { target vect_double_cond_arith } } } */ +/* { dg-final { scan-tree-dump-times { = \.COND_MUL} 1 "optimized" { target vect_double_cond_arith } } } */ +/* { dg-final { scan-tree-dump-times { = \.COND_RDIV} 1 "optimized" { target vect_double_cond_arith } } } */ +/* { dg-final { scan-tree-dump-not {VEC_COND_EXPR} "optimized" { target vect_double_cond_arith } } } */ Index: gcc/testsuite/gcc.target/aarch64/sve/cond_arith_4.c =================================================================== --- /dev/null 2018-04-20 16:19:46.369131350 +0100 +++ gcc/testsuite/gcc.target/aarch64/sve/cond_arith_4.c 2018-05-25 11:28:06.195277704 +0100 @@ -0,0 +1,62 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ftree-vectorize" } */ + +#include <stdint.h> + +#define TEST(TYPE, NAME, OP) \ + void __attribute__ ((noinline, noclone)) \ + test_##TYPE##_##NAME (TYPE *__restrict x, \ + TYPE *__restrict y, \ + TYPE z1, TYPE z2, \ + TYPE *__restrict pred, int n) \ + { \ + for (int i = 0; i < n; i += 2) \ + { \ + x[i] = (pred[i] != 1 ? y[i] OP z1 : y[i]); \ + x[i + 1] = (pred[i + 1] != 1 ? 
y[i + 1] OP z2 : y[i + 1]); \ + } \ + } + +#define TEST_INT_TYPE(TYPE) \ + TEST (TYPE, div, /) + +#define TEST_FP_TYPE(TYPE) \ + TEST (TYPE, add, +) \ + TEST (TYPE, sub, -) \ + TEST (TYPE, mul, *) \ + TEST (TYPE, div, /) + +#define TEST_ALL \ + TEST_INT_TYPE (int32_t) \ + TEST_INT_TYPE (uint32_t) \ + TEST_INT_TYPE (int64_t) \ + TEST_INT_TYPE (uint64_t) \ + TEST_FP_TYPE (float) \ + TEST_FP_TYPE (double) + +TEST_ALL + +/* { dg-final { scan-assembler-times {\tsdiv\tz[0-9]+\.s, p[0-7]/m,} 1 } } */ +/* { dg-final { scan-assembler-times {\tudiv\tz[0-9]+\.s, p[0-7]/m,} 1 } } */ +/* { dg-final { scan-assembler-times {\tsdiv\tz[0-9]+\.d, p[0-7]/m,} 1 } } */ +/* { dg-final { scan-assembler-times {\tudiv\tz[0-9]+\.d, p[0-7]/m,} 1 } } */ + +/* { dg-final { scan-assembler-times {\tfadd\tz[0-9]+\.s, p[0-7]/m,} 1 } } */ +/* { dg-final { scan-assembler-times {\tfadd\tz[0-9]+\.d, p[0-7]/m,} 1 } } */ + +/* { dg-final { scan-assembler-times {\tfsub\tz[0-9]+\.s, p[0-7]/m,} 1 } } */ +/* { dg-final { scan-assembler-times {\tfsub\tz[0-9]+\.d, p[0-7]/m,} 1 } } */ + +/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.s, p[0-7]/m,} 1 } } */ +/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.d, p[0-7]/m,} 1 } } */ + +/* { dg-final { scan-assembler-times {\tfdiv\tz[0-9]+\.s, p[0-7]/m,} 1 } } */ +/* { dg-final { scan-assembler-times {\tfdiv\tz[0-9]+\.d, p[0-7]/m,} 1 } } */ + +/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]/z,} 12 } } */ +/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s, p[0-7],} 6 } } */ + +/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z,} 12 } } */ +/* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d, p[0-7],} 6 } } */ + +/* { dg-final { scan-assembler-not {\tsel\t} } } */ Index: gcc/testsuite/gcc.target/aarch64/sve/cond_arith_4_run.c =================================================================== --- /dev/null 2018-04-20 16:19:46.369131350 +0100 +++ gcc/testsuite/gcc.target/aarch64/sve/cond_arith_4_run.c 2018-05-25 11:28:06.195277704 +0100 @@ -0,0 +1,32 @@ +/* { dg-do run { target aarch64_sve_hw } } */ +/* { dg-options "-O2 -ftree-vectorize" } */ + +#include "cond_arith_4.c" + +#define N 98 + +#undef TEST +#define TEST(TYPE, NAME, OP) \ + { \ + TYPE x[N], y[N], pred[N], z[2] = { 5, 7 }; \ + for (int i = 0; i < N; ++i) \ + { \ + y[i] = i * i; \ + pred[i] = i % 3; \ + } \ + test_##TYPE##_##NAME (x, y, z[0], z[1], pred, N); \ + for (int i = 0; i < N; ++i) \ + { \ + TYPE expected = i % 3 != 1 ? y[i] OP z[i & 1] : y[i]; \ + if (x[i] != expected) \ + __builtin_abort (); \ + asm volatile ("" ::: "memory"); \ + } \ + } + +int +main (void) +{ + TEST_ALL + return 0; +} Index: gcc/testsuite/gcc.target/aarch64/sve/cond_arith_5.c =================================================================== --- /dev/null 2018-04-20 16:19:46.369131350 +0100 +++ gcc/testsuite/gcc.target/aarch64/sve/cond_arith_5.c 2018-05-25 11:28:06.195277704 +0100 @@ -0,0 +1,85 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ftree-vectorize -fno-vect-cost-model" } */ + +#include <stdint.h> + +#define TEST(DATA_TYPE, OTHER_TYPE, NAME, OP) \ + void __attribute__ ((noinline, noclone)) \ + test_##DATA_TYPE##_##OTHER_TYPE##_##NAME (DATA_TYPE *__restrict x, \ + DATA_TYPE *__restrict y, \ + DATA_TYPE z1, DATA_TYPE z2, \ + DATA_TYPE *__restrict pred, \ + OTHER_TYPE *__restrict foo, \ + int n) \ + { \ + for (int i = 0; i < n; i += 2) \ + { \ + x[i] = (pred[i] != 1 ? y[i] OP z1 : y[i]); \ + x[i + 1] = (pred[i + 1] != 1 ? 
y[i + 1] OP z2 : y[i + 1]); \ + foo[i] += 1; \ + foo[i + 1] += 2; \ + } \ + } + +#define TEST_INT_TYPE(DATA_TYPE, OTHER_TYPE) \ + TEST (DATA_TYPE, OTHER_TYPE, div, /) + +#define TEST_FP_TYPE(DATA_TYPE, OTHER_TYPE) \ + TEST (DATA_TYPE, OTHER_TYPE, add, +) \ + TEST (DATA_TYPE, OTHER_TYPE, sub, -) \ + TEST (DATA_TYPE, OTHER_TYPE, mul, *) \ + TEST (DATA_TYPE, OTHER_TYPE, div, /) + +#define TEST_ALL \ + TEST_INT_TYPE (int32_t, int8_t) \ + TEST_INT_TYPE (int32_t, int16_t) \ + TEST_INT_TYPE (uint32_t, int8_t) \ + TEST_INT_TYPE (uint32_t, int16_t) \ + TEST_INT_TYPE (int64_t, int8_t) \ + TEST_INT_TYPE (int64_t, int16_t) \ + TEST_INT_TYPE (int64_t, int32_t) \ + TEST_INT_TYPE (uint64_t, int8_t) \ + TEST_INT_TYPE (uint64_t, int16_t) \ + TEST_INT_TYPE (uint64_t, int32_t) \ + TEST_FP_TYPE (float, int8_t) \ + TEST_FP_TYPE (float, int16_t) \ + TEST_FP_TYPE (double, int8_t) \ + TEST_FP_TYPE (double, int16_t) \ + TEST_FP_TYPE (double, int32_t) + +TEST_ALL + +/* { dg-final { scan-assembler-times {\tsdiv\tz[0-9]+\.s, p[0-7]/m,} 6 } } */ +/* { dg-final { scan-assembler-times {\tudiv\tz[0-9]+\.s, p[0-7]/m,} 6 } } */ +/* { dg-final { scan-assembler-times {\tsdiv\tz[0-9]+\.d, p[0-7]/m,} 14 } } */ +/* { dg-final { scan-assembler-times {\tudiv\tz[0-9]+\.d, p[0-7]/m,} 14 } } */ + +/* { dg-final { scan-assembler-times {\tfadd\tz[0-9]+\.s, p[0-7]/m,} 6 } } */ +/* { dg-final { scan-assembler-times {\tfadd\tz[0-9]+\.d, p[0-7]/m,} 14 } } */ + +/* { dg-final { scan-assembler-times {\tfsub\tz[0-9]+\.s, p[0-7]/m,} 6 } } */ +/* { dg-final { scan-assembler-times {\tfsub\tz[0-9]+\.d, p[0-7]/m,} 14 } } */ + +/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.s, p[0-7]/m,} 6 } } */ +/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.d, p[0-7]/m,} 14 } } */ + +/* { dg-final { scan-assembler-times {\tfdiv\tz[0-9]+\.s, p[0-7]/m,} 6 } } */ +/* { dg-final { scan-assembler-times {\tfdiv\tz[0-9]+\.d, p[0-7]/m,} 14 } } */ + +/* The load XFAILs for fixed-length SVE account for extra loads from the + constant pool. */ +/* { dg-final { scan-assembler-times {\tld1b\tz[0-9]+\.b, p[0-7]/z,} 12 { xfail { aarch64_sve && { ! vect_variable_length } } } } } */ +/* { dg-final { scan-assembler-times {\tst1b\tz[0-9]+\.b, p[0-7],} 12 } } */ + +/* { dg-final { scan-assembler-times {\tld1h\tz[0-9]+\.h, p[0-7]/z,} 12 { xfail { aarch64_sve && { ! vect_variable_length } } } } } */ +/* { dg-final { scan-assembler-times {\tst1h\tz[0-9]+\.h, p[0-7],} 12 } } */ + +/* 72 for x operations, 6 for foo operations. */ +/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]/z,} 78 { xfail { aarch64_sve && { ! vect_variable_length } } } } } */ +/* 36 for x operations, 6 for foo operations. 
*/ +/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s, p[0-7],} 42 } } */ + +/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z,} 168 } } */ +/* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d, p[0-7],} 84 } } */ + +/* { dg-final { scan-assembler-not {\tsel\t} } } */ Index: gcc/testsuite/gcc.target/aarch64/sve/cond_arith_5_run.c =================================================================== --- /dev/null 2018-04-20 16:19:46.369131350 +0100 +++ gcc/testsuite/gcc.target/aarch64/sve/cond_arith_5_run.c 2018-05-25 11:28:06.195277704 +0100 @@ -0,0 +1,35 @@ +/* { dg-do run { target aarch64_sve_hw } } */ +/* { dg-options "-O2 -ftree-vectorize" } */ + +#include "cond_arith_5.c" + +#define N 98 + +#undef TEST +#define TEST(DATA_TYPE, OTHER_TYPE, NAME, OP) \ + { \ + DATA_TYPE x[N], y[N], pred[N], z[2] = { 5, 7 }; \ + OTHER_TYPE foo[N]; \ + for (int i = 0; i < N; ++i) \ + { \ + y[i] = i * i; \ + pred[i] = i % 3; \ + foo[i] = i * 5; \ + } \ + test_##DATA_TYPE##_##OTHER_TYPE##_##NAME (x, y, z[0], z[1], \ + pred, foo, N); \ + for (int i = 0; i < N; ++i) \ + { \ + DATA_TYPE expected = i % 3 != 1 ? y[i] OP z[i & 1] : y[i]; \ + if (x[i] != expected) \ + __builtin_abort (); \ + asm volatile ("" ::: "memory"); \ + } \ + } + +int +main (void) +{ + TEST_ALL + return 0; +} Index: gcc/testsuite/gcc.target/aarch64/sve/slp_14.c =================================================================== --- /dev/null 2018-04-20 16:19:46.369131350 +0100 +++ gcc/testsuite/gcc.target/aarch64/sve/slp_14.c 2018-05-25 11:28:06.195277704 +0100 @@ -0,0 +1,48 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ftree-vectorize" } */ + +#include <stdint.h> + +#define VEC_PERM(TYPE) \ +void __attribute__ ((weak)) \ +vec_slp_##TYPE (TYPE *restrict a, TYPE *restrict b, int n) \ +{ \ + for (int i = 0; i < n; ++i) \ + { \ + TYPE a1 = a[i * 2]; \ + TYPE a2 = a[i * 2 + 1]; \ + TYPE b1 = b[i * 2]; \ + TYPE b2 = b[i * 2 + 1]; \ + a[i * 2] = b1 > 1 ? a1 / b1 : a1; \ + a[i * 2 + 1] = b2 > 2 ? a2 / b2 : a2; \ + } \ +} + +#define TEST_ALL(T) \ + T (int32_t) \ + T (uint32_t) \ + T (int64_t) \ + T (uint64_t) \ + T (float) \ + T (double) + +TEST_ALL (VEC_PERM) + +/* The loop should be fully-masked. The load XFAILs for fixed-length + SVE account for extra loads from the constant pool. */ +/* { dg-final { scan-assembler-times {\tld1w\t} 6 { xfail { aarch64_sve && { ! vect_variable_length } } } } } */ +/* { dg-final { scan-assembler-times {\tst1w\t} 3 } } */ +/* { dg-final { scan-assembler-times {\tld1d\t} 6 { xfail { aarch64_sve && { ! 
vect_variable_length } } } } } */ +/* { dg-final { scan-assembler-times {\tst1d\t} 3 } } */ +/* { dg-final { scan-assembler-not {\tldr} } } */ +/* { dg-final { scan-assembler-not {\tstr} } } */ + +/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.s} 6 } } */ +/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.d} 6 } } */ + +/* { dg-final { scan-assembler-times {\tsdiv\tz[0-9]+\.s} 1 } } */ +/* { dg-final { scan-assembler-times {\tudiv\tz[0-9]+\.s} 1 } } */ +/* { dg-final { scan-assembler-times {\tfdiv\tz[0-9]+\.s} 1 } } */ +/* { dg-final { scan-assembler-times {\tsdiv\tz[0-9]+\.d} 1 } } */ +/* { dg-final { scan-assembler-times {\tudiv\tz[0-9]+\.d} 1 } } */ +/* { dg-final { scan-assembler-times {\tfdiv\tz[0-9]+\.d} 1 } } */ Index: gcc/testsuite/gcc.target/aarch64/sve/slp_14_run.c =================================================================== --- /dev/null 2018-04-20 16:19:46.369131350 +0100 +++ gcc/testsuite/gcc.target/aarch64/sve/slp_14_run.c 2018-05-25 11:28:06.195277704 +0100 @@ -0,0 +1,34 @@ +/* { dg-do run { target aarch64_sve_hw } } */ +/* { dg-options "-O2 -ftree-vectorize" } */ + +#include "slp_14.c" + +#define N1 (103 * 2) +#define N2 (111 * 2) + +#define HARNESS(TYPE) \ + { \ + TYPE a[N2], b[N2]; \ + for (unsigned int i = 0; i < N2; ++i) \ + { \ + a[i] = i * 2 + i % 5; \ + b[i] = i % 11; \ + } \ + vec_slp_##TYPE (a, b, N1 / 2); \ + for (unsigned int i = 0; i < N2; ++i) \ + { \ + TYPE orig_a = i * 2 + i % 5; \ + TYPE orig_b = i % 11; \ + TYPE expected_a = orig_a; \ + if (i < N1 && orig_b > (i & 1 ? 2 : 1)) \ + expected_a /= orig_b; \ + if (a[i] != expected_a || b[i] != orig_b) \ + __builtin_abort (); \ + } \ + } + +int +main (void) +{ + TEST_ALL (HARNESS) +}
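The generic test added above keys off vect_double_cond_arith because, unlike sqrt, the conditional-arithmetic pattern already appears as internal-function calls before vectorization: if-conversion turns each conditional statement into a call such as IFN_COND_ADD, and SLP then has to group two such calls per iteration. Below is a stripped-down sketch of that shape, mirroring cond_arith_4.c and vect-cond-arith-6.c from the patch; the function name is illustrative and it assumes a target with conditional vector arithmetic (e.g. SVE).

```c
/* Sketch (not part of the patch) of the two-lane conditional pattern:
   after if-conversion each statement becomes an IFN_COND_ADD call, and
   with this change the two calls per iteration can form one SLP group.
   Assumes -O2 -ftree-vectorize and a vect_double_cond_arith target.  */
void
cond_add_pairs (double *__restrict x, double *__restrict y,
                double *__restrict pred, double z1, double z2, int n)
{
  for (int i = 0; i < n; i += 2)
    {
      x[i] = pred[i] != 1 ? y[i] + z1 : y[i];
      x[i + 1] = pred[i + 1] != 1 ? y[i + 1] + z2 : y[i + 1];
    }
}
```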
On Fri, May 25, 2018 at 12:31 PM Richard Sandiford <richard.sandiford@linaro.org> wrote:
> Richard Biener <richard.guenther@gmail.com> writes:
> > Please use types_compatible_p in these two type comparisons.
> OK.
>
> > Can you please add a generic vect_call_sqrtf to the main vectorizer
> > testsuite?  In fact I already see
> > gcc.dg/vect/fast-math-bb-slp-call-1.c.  Does that mean SQRT does never
> > appear as internal function before vectorization?
> Yeah, sqrt vectorisation is scalar built-in -> vector internal function.
> But this patch adds a generic type keyed off vect_double_cond_arith.
> Would that be OK instead?

Yes, that works for me.

Thanks,
Richard.
Index: gcc/internal-fn.h
===================================================================
--- gcc/internal-fn.h	2018-05-16 11:06:14.513574219 +0100
+++ gcc/internal-fn.h	2018-05-16 11:12:11.872116220 +0100
@@ -158,6 +158,17 @@ direct_internal_fn_p (internal_fn fn)
   return direct_internal_fn_array[fn].type0 >= -1;
 }
 
+/* Return true if FN is a direct internal function that can be vectorized by
+   converting the return type and all argument types to vectors of the same
+   number of elements.  E.g. we can vectorize an IFN_SQRT on floats as an
+   IFN_SQRT on vectors of N floats.  */
+
+inline bool
+vectorizable_internal_fn_p (internal_fn fn)
+{
+  return direct_internal_fn_array[fn].vectorizable;
+}
+
 /* Return optab information about internal function FN.  Only meaningful
    if direct_internal_fn_p (FN).  */
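To make the intended use of the new predicate concrete, here is a minimal sketch (not part of the patch; the helper name slp_vectorizable_call_p is invented for illustration, and the usual GCC internal headers such as gimple.h and internal-fn.h are assumed).  It mirrors the check added to vect_build_slp_tree_1 below:

/* Hypothetical illustration only: reject internal-function calls that
   cannot be widened to vector form; other calls are screened as before.  */
static bool
slp_vectorizable_call_p (gcall *call_stmt)
{
  if (gimple_call_internal_p (call_stmt))
    return vectorizable_internal_fn_p (gimple_call_internal_fn (call_stmt));
  return true;
}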
Index: gcc/tree-vect-slp.c
===================================================================
--- gcc/tree-vect-slp.c	2018-05-16 11:02:46.262494712 +0100
+++ gcc/tree-vect-slp.c	2018-05-16 11:12:11.873116180 +0100
@@ -564,6 +564,41 @@ vect_get_and_check_slp_defs (vec_info *v
   return 0;
 }
 
+/* Return true if call statements CALL1 and CALL2 are similar enough
+   to be combined into the same SLP group.  */
+
+static bool
+compatible_calls_p (gcall *call1, gcall *call2)
+{
+  unsigned int nargs = gimple_call_num_args (call1);
+  if (nargs != gimple_call_num_args (call2))
+    return false;
+
+  if (gimple_call_combined_fn (call1) != gimple_call_combined_fn (call2))
+    return false;
+
+  if (gimple_call_internal_p (call1))
+    {
+      if (TREE_TYPE (gimple_call_lhs (call1))
+	  != TREE_TYPE (gimple_call_lhs (call2)))
+	return false;
+      for (unsigned int i = 0; i < nargs; ++i)
+	if (TREE_TYPE (gimple_call_arg (call1, i))
+	    != TREE_TYPE (gimple_call_arg (call2, i)))
+	  return false;
+    }
+  else
+    {
+      if (!operand_equal_p (gimple_call_fn (call1),
+			    gimple_call_fn (call2), 0))
+	return false;
+
+      if (gimple_call_fntype (call1) != gimple_call_fntype (call2))
+	return false;
+    }
+  return true;
+}
+
 /* A subroutine of vect_build_slp_tree for checking VECTYPE, which is the
    caller's attempt to find the vector type in STMT with the narrowest
    element type.  Return true if VECTYPE is nonnull and if it is valid
@@ -625,8 +660,8 @@ vect_record_max_nunits (vec_info *vinfo,
 static bool
 vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
 		       vec<gimple *> stmts, unsigned int group_size,
-		       unsigned nops, poly_uint64 *max_nunits,
-		       bool *matches, bool *two_operators)
+		       poly_uint64 *max_nunits, bool *matches,
+		       bool *two_operators)
 {
   unsigned int i;
   gimple *first_stmt = stmts[0], *stmt = stmts[0];
@@ -698,7 +733,9 @@ vect_build_slp_tree_1 (vec_info *vinfo,
       if (gcall *call_stmt = dyn_cast <gcall *> (stmt))
	{
	  rhs_code = CALL_EXPR;
-	  if (gimple_call_internal_p (call_stmt)
+	  if ((gimple_call_internal_p (call_stmt)
+	       && (!vectorizable_internal_fn_p
+		   (gimple_call_internal_fn (call_stmt))))
	      || gimple_call_tail_p (call_stmt)
	      || gimple_call_noreturn_p (call_stmt)
	      || !gimple_call_nothrow_p (call_stmt)
@@ -833,11 +870,8 @@ vect_build_slp_tree_1 (vec_info *vinfo,
	  if (rhs_code == CALL_EXPR)
	    {
	      gimple *first_stmt = stmts[0];
-	      if (gimple_call_num_args (stmt) != nops
-		  || !operand_equal_p (gimple_call_fn (first_stmt),
-				       gimple_call_fn (stmt), 0)
-		  || gimple_call_fntype (first_stmt)
-		     != gimple_call_fntype (stmt))
+	      if (!compatible_calls_p (as_a <gcall *> (first_stmt),
+				       as_a <gcall *> (stmt)))
		{
		  if (dump_enabled_p ())
		    {
@@ -1166,8 +1200,7 @@ vect_build_slp_tree_2 (vec_info *vinfo,
   bool two_operators = false;
   unsigned char *swap = XALLOCAVEC (unsigned char, group_size);
-  if (!vect_build_slp_tree_1 (vinfo, swap,
-			      stmts, group_size, nops,
+  if (!vect_build_slp_tree_1 (vinfo, swap, stmts, group_size,
			      &this_max_nunits, matches, &two_operators))
     return NULL;
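As a rough illustration of the scalar input this change targets (a sketch under assumptions, not taken from the patch): with -fno-math-errno the two sqrt calls below can be folded to the internal IFN_SQRT, and since both calls have the same combined function and the same argument and return types, compatible_calls_p allows basic-block SLP to put them in one group, assuming the target provides a vector sqrt.

#include <math.h>

/* Hypothetical example: two structurally identical calls in adjacent
   "lanes" of the same basic block.  */
void
sqrt_pair (double *restrict x, const double *restrict y)
{
  x[0] = sqrt (y[0]);
  x[1] = sqrt (y[1]);
}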
Index: gcc/testsuite/gcc.target/aarch64/sve/cond_arith_4.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/cond_arith_4.c	2018-05-16 11:12:11.872116220 +0100
@@ -0,0 +1,62 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#include <stdint.h>
+
+#define TEST(TYPE, NAME, OP) \
+  void __attribute__ ((noinline, noclone)) \
+  test_##TYPE##_##NAME (TYPE *__restrict x, \
+			TYPE *__restrict y, \
+			TYPE z1, TYPE z2, \
+			TYPE *__restrict pred, int n) \
+  { \
+    for (int i = 0; i < n; i += 2) \
+      { \
+	x[i] = (pred[i] != 1 ? y[i] OP z1 : y[i]); \
+	x[i + 1] = (pred[i + 1] != 1 ? y[i + 1] OP z2 : y[i + 1]); \
+      } \
+  }
+
+#define TEST_INT_TYPE(TYPE) \
+  TEST (TYPE, div, /)
+
+#define TEST_FP_TYPE(TYPE) \
+  TEST (TYPE, add, +) \
+  TEST (TYPE, sub, -) \
+  TEST (TYPE, mul, *) \
+  TEST (TYPE, div, /)
+
+#define TEST_ALL \
+  TEST_INT_TYPE (int32_t) \
+  TEST_INT_TYPE (uint32_t) \
+  TEST_INT_TYPE (int64_t) \
+  TEST_INT_TYPE (uint64_t) \
+  TEST_FP_TYPE (float) \
+  TEST_FP_TYPE (double)
+
+TEST_ALL
+
+/* { dg-final { scan-assembler-times {\tsdiv\tz[0-9]+\.s, p[0-7]/m,} 1 } } */
+/* { dg-final { scan-assembler-times {\tudiv\tz[0-9]+\.s, p[0-7]/m,} 1 } } */
+/* { dg-final { scan-assembler-times {\tsdiv\tz[0-9]+\.d, p[0-7]/m,} 1 } } */
+/* { dg-final { scan-assembler-times {\tudiv\tz[0-9]+\.d, p[0-7]/m,} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tfadd\tz[0-9]+\.s, p[0-7]/m,} 1 } } */
+/* { dg-final { scan-assembler-times {\tfadd\tz[0-9]+\.d, p[0-7]/m,} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tfsub\tz[0-9]+\.s, p[0-7]/m,} 1 } } */
+/* { dg-final { scan-assembler-times {\tfsub\tz[0-9]+\.d, p[0-7]/m,} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.s, p[0-7]/m,} 1 } } */
+/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.d, p[0-7]/m,} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tfdiv\tz[0-9]+\.s, p[0-7]/m,} 1 } } */
+/* { dg-final { scan-assembler-times {\tfdiv\tz[0-9]+\.d, p[0-7]/m,} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]/z,} 12 } } */
+/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s, p[0-7],} 6 } } */
+
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z,} 12 } } */
+/* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d, p[0-7],} 6 } } */
+
+/* { dg-final { scan-assembler-not {\tsel\t} } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/cond_arith_4_run.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/cond_arith_4_run.c	2018-05-16 11:12:11.872116220 +0100
@@ -0,0 +1,32 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#include "cond_arith_4.c"
+
+#define N 98
+
+#undef TEST
+#define TEST(TYPE, NAME, OP) \
+  { \
+    TYPE x[N], y[N], pred[N], z[2] = { 5, 7 }; \
+    for (int i = 0; i < N; ++i) \
+      { \
+	y[i] = i * i; \
+	pred[i] = i % 3; \
+      } \
+    test_##TYPE##_##NAME (x, y, z[0], z[1], pred, N); \
+    for (int i = 0; i < N; ++i) \
+      { \
+	TYPE expected = i % 3 != 1 ? y[i] OP z[i & 1] : y[i]; \
+	if (x[i] != expected) \
+	  __builtin_abort (); \
+	asm volatile ("" ::: "memory"); \
+      } \
+  }
+
+int
+main (void)
+{
+  TEST_ALL
+  return 0;
+}
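For readability, TEST (int32_t, div, /) from cond_arith_4.c expands to roughly the function below (expansion reproduced here for illustration only).  On SVE the two conditional divisions per iteration are if-converted to conditional internal-function calls, and it is this pair of calls that the SLP changes above must be able to group.

void __attribute__ ((noinline, noclone))
test_int32_t_div (int32_t *__restrict x, int32_t *__restrict y,
		  int32_t z1, int32_t z2,
		  int32_t *__restrict pred, int n)
{
  for (int i = 0; i < n; i += 2)
    {
      /* Lane 0 of the SLP group.  */
      x[i] = (pred[i] != 1 ? y[i] / z1 : y[i]);
      /* Lane 1 of the SLP group, same operation with a different operand.  */
      x[i + 1] = (pred[i + 1] != 1 ? y[i + 1] / z2 : y[i + 1]);
    }
}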
Index: gcc/testsuite/gcc.target/aarch64/sve/cond_arith_5.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/cond_arith_5.c	2018-05-16 11:12:11.872116220 +0100
@@ -0,0 +1,85 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -fno-vect-cost-model" } */
+
+#include <stdint.h>
+
+#define TEST(DATA_TYPE, OTHER_TYPE, NAME, OP) \
+  void __attribute__ ((noinline, noclone)) \
+  test_##DATA_TYPE##_##OTHER_TYPE##_##NAME (DATA_TYPE *__restrict x, \
+					    DATA_TYPE *__restrict y, \
+					    DATA_TYPE z1, DATA_TYPE z2, \
+					    DATA_TYPE *__restrict pred, \
+					    OTHER_TYPE *__restrict foo, \
+					    int n) \
+  { \
+    for (int i = 0; i < n; i += 2) \
+      { \
+	x[i] = (pred[i] != 1 ? y[i] OP z1 : y[i]); \
+	x[i + 1] = (pred[i + 1] != 1 ? y[i + 1] OP z2 : y[i + 1]); \
+	foo[i] += 1; \
+	foo[i + 1] += 2; \
+      } \
+  }
+
+#define TEST_INT_TYPE(DATA_TYPE, OTHER_TYPE) \
+  TEST (DATA_TYPE, OTHER_TYPE, div, /)
+
+#define TEST_FP_TYPE(DATA_TYPE, OTHER_TYPE) \
+  TEST (DATA_TYPE, OTHER_TYPE, add, +) \
+  TEST (DATA_TYPE, OTHER_TYPE, sub, -) \
+  TEST (DATA_TYPE, OTHER_TYPE, mul, *) \
+  TEST (DATA_TYPE, OTHER_TYPE, div, /)
+
+#define TEST_ALL \
+  TEST_INT_TYPE (int32_t, int8_t) \
+  TEST_INT_TYPE (int32_t, int16_t) \
+  TEST_INT_TYPE (uint32_t, int8_t) \
+  TEST_INT_TYPE (uint32_t, int16_t) \
+  TEST_INT_TYPE (int64_t, int8_t) \
+  TEST_INT_TYPE (int64_t, int16_t) \
+  TEST_INT_TYPE (int64_t, int32_t) \
+  TEST_INT_TYPE (uint64_t, int8_t) \
+  TEST_INT_TYPE (uint64_t, int16_t) \
+  TEST_INT_TYPE (uint64_t, int32_t) \
+  TEST_FP_TYPE (float, int8_t) \
+  TEST_FP_TYPE (float, int16_t) \
+  TEST_FP_TYPE (double, int8_t) \
+  TEST_FP_TYPE (double, int16_t) \
+  TEST_FP_TYPE (double, int32_t)
+
+TEST_ALL
+
+/* { dg-final { scan-assembler-times {\tsdiv\tz[0-9]+\.s, p[0-7]/m,} 6 } } */
+/* { dg-final { scan-assembler-times {\tudiv\tz[0-9]+\.s, p[0-7]/m,} 6 } } */
+/* { dg-final { scan-assembler-times {\tsdiv\tz[0-9]+\.d, p[0-7]/m,} 14 } } */
+/* { dg-final { scan-assembler-times {\tudiv\tz[0-9]+\.d, p[0-7]/m,} 14 } } */
+
+/* { dg-final { scan-assembler-times {\tfadd\tz[0-9]+\.s, p[0-7]/m,} 6 } } */
+/* { dg-final { scan-assembler-times {\tfadd\tz[0-9]+\.d, p[0-7]/m,} 14 } } */
+
+/* { dg-final { scan-assembler-times {\tfsub\tz[0-9]+\.s, p[0-7]/m,} 6 } } */
+/* { dg-final { scan-assembler-times {\tfsub\tz[0-9]+\.d, p[0-7]/m,} 14 } } */
+
+/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.s, p[0-7]/m,} 6 } } */
+/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.d, p[0-7]/m,} 14 } } */
+
+/* { dg-final { scan-assembler-times {\tfdiv\tz[0-9]+\.s, p[0-7]/m,} 6 } } */
+/* { dg-final { scan-assembler-times {\tfdiv\tz[0-9]+\.d, p[0-7]/m,} 14 } } */
+
+/* The load XFAILs for fixed-length SVE account for extra loads from the
+   constant pool.  */
+/* { dg-final { scan-assembler-times {\tld1b\tz[0-9]+\.b, p[0-7]/z,} 12 { xfail { aarch64_sve && { ! vect_variable_length } } } } } */
+/* { dg-final { scan-assembler-times {\tst1b\tz[0-9]+\.b, p[0-7],} 12 } } */
+
+/* { dg-final { scan-assembler-times {\tld1h\tz[0-9]+\.h, p[0-7]/z,} 12 { xfail { aarch64_sve && { ! vect_variable_length } } } } } */
+/* { dg-final { scan-assembler-times {\tst1h\tz[0-9]+\.h, p[0-7],} 12 } } */
+
+/* 72 for x operations, 6 for foo operations.  */
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]/z,} 78 { xfail { aarch64_sve && { ! vect_variable_length } } } } } */
+/* 36 for x operations, 6 for foo operations.  */
+/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s, p[0-7],} 42 } } */
+
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z,} 168 } } */
+/* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d, p[0-7],} 84 } } */
+
+/* { dg-final { scan-assembler-not {\tsel\t} } } */
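cond_arith_5.c follows the same pattern as cond_arith_4.c but adds an unconditional update of a second, narrower array, so each loop body mixes two element sizes; the "x operations" / "foo operations" split in the scan comments above reflects that extra stream.  For example, TEST (double, int32_t, add, +) expands to roughly (illustration only):

void __attribute__ ((noinline, noclone))
test_double_int32_t_add (double *__restrict x, double *__restrict y,
			 double z1, double z2,
			 double *__restrict pred,
			 int32_t *__restrict foo, int n)
{
  for (int i = 0; i < n; i += 2)
    {
      /* Conditional double lanes, as in cond_arith_4.c.  */
      x[i] = (pred[i] != 1 ? y[i] + z1 : y[i]);
      x[i + 1] = (pred[i + 1] != 1 ? y[i + 1] + z2 : y[i + 1]);
      /* Unconditional updates of the narrower OTHER_TYPE array.  */
      foo[i] += 1;
      foo[i + 1] += 2;
    }
}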
Index: gcc/testsuite/gcc.target/aarch64/sve/cond_arith_5_run.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/cond_arith_5_run.c	2018-05-16 11:12:11.873116180 +0100
@@ -0,0 +1,35 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#include "cond_arith_5.c"
+
+#define N 98
+
+#undef TEST
+#define TEST(DATA_TYPE, OTHER_TYPE, NAME, OP) \
+  { \
+    DATA_TYPE x[N], y[N], pred[N], z[2] = { 5, 7 }; \
+    OTHER_TYPE foo[N]; \
+    for (int i = 0; i < N; ++i) \
+      { \
+	y[i] = i * i; \
+	pred[i] = i % 3; \
+	foo[i] = i * 5; \
+      } \
+    test_##DATA_TYPE##_##OTHER_TYPE##_##NAME (x, y, z[0], z[1], \
+					      pred, foo, N); \
+    for (int i = 0; i < N; ++i) \
+      { \
+	DATA_TYPE expected = i % 3 != 1 ? y[i] OP z[i & 1] : y[i]; \
+	if (x[i] != expected) \
+	  __builtin_abort (); \
+	asm volatile ("" ::: "memory"); \
+      } \
+  }
+
+int
+main (void)
+{
+  TEST_ALL
+  return 0;
+}
Index: gcc/testsuite/gcc.target/aarch64/sve/slp_14.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/slp_14.c	2018-05-16 11:12:11.873116180 +0100
@@ -0,0 +1,48 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#include <stdint.h>
+
+#define VEC_PERM(TYPE) \
+void __attribute__ ((weak)) \
+vec_slp_##TYPE (TYPE *restrict a, TYPE *restrict b, int n) \
+{ \
+  for (int i = 0; i < n; ++i) \
+    { \
+      TYPE a1 = a[i * 2]; \
+      TYPE a2 = a[i * 2 + 1]; \
+      TYPE b1 = b[i * 2]; \
+      TYPE b2 = b[i * 2 + 1]; \
+      a[i * 2] = b1 > 1 ? a1 / b1 : a1; \
+      a[i * 2 + 1] = b2 > 2 ? a2 / b2 : a2; \
+    } \
+}
+
+#define TEST_ALL(T) \
+  T (int32_t) \
+  T (uint32_t) \
+  T (int64_t) \
+  T (uint64_t) \
+  T (float) \
+  T (double)
+
+TEST_ALL (VEC_PERM)
+
+/* The loop should be fully-masked.  The load XFAILs for fixed-length
+   SVE account for extra loads from the constant pool.  */
+/* { dg-final { scan-assembler-times {\tld1w\t} 6 { xfail { aarch64_sve && { ! vect_variable_length } } } } } */
+/* { dg-final { scan-assembler-times {\tst1w\t} 3 } } */
+/* { dg-final { scan-assembler-times {\tld1d\t} 6 { xfail { aarch64_sve && { ! vect_variable_length } } } } } */
+/* { dg-final { scan-assembler-times {\tst1d\t} 3 } } */
+/* { dg-final { scan-assembler-not {\tldr} } } */
+/* { dg-final { scan-assembler-not {\tstr} } } */
+
+/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.s} 6 } } */
+/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.d} 6 } } */
+
+/* { dg-final { scan-assembler-times {\tsdiv\tz[0-9]+\.s} 1 } } */
+/* { dg-final { scan-assembler-times {\tudiv\tz[0-9]+\.s} 1 } } */
+/* { dg-final { scan-assembler-times {\tfdiv\tz[0-9]+\.s} 1 } } */
+/* { dg-final { scan-assembler-times {\tsdiv\tz[0-9]+\.d} 1 } } */
+/* { dg-final { scan-assembler-times {\tudiv\tz[0-9]+\.d} 1 } } */
+/* { dg-final { scan-assembler-times {\tfdiv\tz[0-9]+\.d} 1 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/slp_14_run.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/slp_14_run.c	2018-05-16 11:12:11.873116180 +0100
@@ -0,0 +1,34 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#include "slp_14.c"
+
+#define N1 (103 * 2)
+#define N2 (111 * 2)
+
+#define HARNESS(TYPE) \
+  { \
+    TYPE a[N2], b[N2]; \
+    for (unsigned int i = 0; i < N2; ++i) \
+      { \
+	a[i] = i * 2 + i % 5; \
+	b[i] = i % 11; \
+      } \
+    vec_slp_##TYPE (a, b, N1 / 2); \
+    for (unsigned int i = 0; i < N2; ++i) \
+      { \
+	TYPE orig_a = i * 2 + i % 5; \
+	TYPE orig_b = i % 11; \
+	TYPE expected_a = orig_a; \
+	if (i < N1 && orig_b > (i & 1 ? 2 : 1)) \
+	  expected_a /= orig_b; \
+	if (a[i] != expected_a || b[i] != orig_b) \
+	  __builtin_abort (); \
+      } \
+  }
+
+int
+main (void)
+{
+  TEST_ALL (HARNESS)
+}
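For reference, VEC_PERM (int32_t) from slp_14.c expands to the function below (expansion shown for illustration only).  The two interleaved lanes with different guard conditions form the SLP group, and the guarded divisions are what the masked sdiv/udiv/fdiv scans in slp_14.c check for.

void __attribute__ ((weak))
vec_slp_int32_t (int32_t *restrict a, int32_t *restrict b, int n)
{
  for (int i = 0; i < n; ++i)
    {
      int32_t a1 = a[i * 2];
      int32_t a2 = a[i * 2 + 1];
      int32_t b1 = b[i * 2];
      int32_t b2 = b[i * 2 + 1];
      /* Lane 0: divide only when b1 > 1.  */
      a[i * 2] = b1 > 1 ? a1 / b1 : a1;
      /* Lane 1: divide only when b2 > 2.  */
      a[i * 2 + 1] = b2 > 2 ? a2 / b2 : a2;
    }
}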