
Add IFN_COND_FMA functions

Message ID 87r2m1cl44.fsf@linaro.org
State New
Series Add IFN_COND_FMA functions

Commit Message

Richard Sandiford May 24, 2018, 12:07 p.m. UTC
This patch adds conditional equivalents of the IFN_FMA built-in functions.
Most of it is just a mechanical extension of the binary stuff.

Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf
and x86_64-linux-gnu.  OK for the non-AArch64 bits?

Richard


2018-05-24  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	* doc/md.texi (cond_fma, cond_fms, cond_fnma, cond_fnms): Document.
	* optabs.def (cond_fma_optab, cond_fms_optab, cond_fnma_optab)
	(cond_fnms_optab): New optabs.
	* internal-fn.def (COND_FMA, COND_FMS, COND_FNMA, COND_FNMS): New
	internal functions.
	(FMA): Use DEF_INTERNAL_FLT_FN rather than DEF_INTERNAL_FLT_FLOATN_FN.
	* internal-fn.h (get_conditional_internal_fn): Declare.
	(get_unconditional_internal_fn): Likewise.
	* internal-fn.c (cond_ternary_direct): New macro.
	(expand_cond_ternary_optab_fn): Likewise.
	(direct_cond_ternary_optab_supported_p): Likewise.
	(FOR_EACH_COND_FN_PAIR): Likewise.
	(get_conditional_internal_fn): New function.
	(get_unconditional_internal_fn): Likewise.
	* gimple-match.h (gimple_match_op::MAX_NUM_OPS): Bump to 5.
	(gimple_match_op::gimple_match_op): Add a new overload for 5
	operands.
	(gimple_match_op::set_op): Likewise.
	(gimple_resimplify5): Declare.
	* genmatch.c (decision_tree::gen): Generate simplifications for
	5 operands.
	* gimple-match-head.c (gimple_simplify): Define an overload for
	5 operands.  Handle calls with 5 arguments in the top-level overload.
	(convert_conditional_op): Handle conversions from unconditional
	internal functions to conditional ones.
	(gimple_resimplify5): New function.
	(build_call_internal): Pass a fifth operand.
	(maybe_push_res_to_seq): Likewise.
	(try_conditional_simplification): Try converting conditional
	internal functions to unconditional internal functions.
	Handle 3-operand unconditional forms.
	* match.pd (UNCOND_TERNARY, COND_TERNARY): Operator lists.
	Define ternary equivalents of the current rules for binary conditional
	internal functions.
	* config/aarch64/aarch64.c (aarch64_preferred_else_value): Handle
	ternary operations.
	* config/aarch64/aarch64-sve.md (cond_<optab><mode>)
	(*cond_<optab><mode>, *cond_<optab><mode>_acc): New
	SVE_COND_FP_TERNARY patterns.
	* config/aarch64/iterators.md (UNSPEC_COND_FMLA, UNSPEC_COND_FMLS)
	(UNSPEC_COND_FNMLA, UNSPEC_COND_FNMLS): New unspecs.
	(optab): Handle them.
	(SVE_COND_FP_TERNARY): New int iterator.
	(sve_fmla_op, sve_fmad_op): New int attributes.

gcc/testsuite/
	* gcc.dg/vect/vect-cond-arith-3.c: New test.
	* gcc.target/aarch64/sve/vcond_13.c: Likewise.
	* gcc.target/aarch64/sve/vcond_13_run.c: Likewise.
	* gcc.target/aarch64/sve/vcond_14.c: Likewise.
	* gcc.target/aarch64/sve/vcond_14_run.c: Likewise.
	* gcc.target/aarch64/sve/vcond_15.c: Likewise.
	* gcc.target/aarch64/sve/vcond_15_run.c: Likewise.
	* gcc.target/aarch64/sve/vcond_16.c: Likewise.
	* gcc.target/aarch64/sve/vcond_16_run.c: Likewise.

Comments

Richard Biener May 25, 2018, 7:27 a.m. UTC | #1
On Thu, May 24, 2018 at 2:08 PM Richard Sandiford <richard.sandiford@linaro.org> wrote:

> This patch adds conditional equivalents of the IFN_FMA built-in functions.

> Most of it is just a mechanical extension of the binary stuff.


> Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf

> and x86_64-linux-gnu.  OK for the non-AArch64 bits?


OK.

Richard.


> Index: gcc/doc/md.texi

> ===================================================================

> --- gcc/doc/md.texi     2018-05-24 10:12:10.142352315 +0100

> +++ gcc/doc/md.texi     2018-05-24 13:05:46.047607587 +0100

> @@ -6386,6 +6386,23 @@ Operands 0, 2, 3 and 4 all have mode @va

>   integer if @var{m} is scalar, otherwise it has the mode returned by

>   @code{TARGET_VECTORIZE_GET_MASK_MODE}.


> +@cindex @code{cond_fma@var{mode}} instruction pattern

> +@cindex @code{cond_fms@var{mode}} instruction pattern

> +@cindex @code{cond_fnma@var{mode}} instruction pattern

> +@cindex @code{cond_fnms@var{mode}} instruction pattern

> +@item @samp{cond_fma@var{mode}}

> +@itemx @samp{cond_fms@var{mode}}

> +@itemx @samp{cond_fnma@var{mode}}

> +@itemx @samp{cond_fnms@var{mode}}

> +Like @samp{cond_add@var{m}}, except that the conditional operation

> +takes 3 operands rather than two.  For example, the vector form of

> +@samp{cond_fma@var{mode}} is equivalent to:

> +

> +@smallexample

> +for (i = 0; i < GET_MODE_NUNITS (@var{m}); i++)

> +  op0[i] = op1[i] ? fma (op2[i], op3[i], op4[i]) : op5[i];

> +@end smallexample

> +

>   @cindex @code{neg@var{mode}cc} instruction pattern

>   @item @samp{neg@var{mode}cc}

>   Similar to @samp{mov@var{mode}cc} but for conditional negation.  Conditionally
> Index: gcc/optabs.def

> ===================================================================

> --- gcc/optabs.def      2018-05-24 10:12:10.146352152 +0100

> +++ gcc/optabs.def      2018-05-24 13:05:46.049605128 +0100

> @@ -234,6 +234,10 @@ OPTAB_D (cond_smin_optab, "cond_smin$a")

>   OPTAB_D (cond_smax_optab, "cond_smax$a")

>   OPTAB_D (cond_umin_optab, "cond_umin$a")

>   OPTAB_D (cond_umax_optab, "cond_umax$a")

> +OPTAB_D (cond_fma_optab, "cond_fma$a")

> +OPTAB_D (cond_fms_optab, "cond_fms$a")

> +OPTAB_D (cond_fnma_optab, "cond_fnma$a")

> +OPTAB_D (cond_fnms_optab, "cond_fnms$a")

>   OPTAB_D (cmov_optab, "cmov$a6")

>   OPTAB_D (cstore_optab, "cstore$a4")

>   OPTAB_D (ctrap_optab, "ctrap$a4")

> Index: gcc/internal-fn.def

> ===================================================================

> --- gcc/internal-fn.def 2018-05-24 10:12:10.146352152 +0100

> +++ gcc/internal-fn.def 2018-05-24 13:05:46.048606357 +0100

> @@ -59,7 +59,8 @@ along with GCC; see the file COPYING3.

>      - binary: a normal binary optab, such as vec_interleave_lo_<mode>

>      - ternary: a normal ternary optab, such as fma<mode>4


> -   - cond_binary: a conditional binary optab, such as add<mode>cc

> +   - cond_binary: a conditional binary optab, such as cond_add<mode>

> +   - cond_ternary: a conditional ternary optab, such as cond_fma_rev<mode>

>      - fold_left: for scalar = FN (scalar, vector), keyed off the vector mode

> @@ -162,6 +163,11 @@ DEF_INTERNAL_OPTAB_FN (COND_IOR, ECF_CON

>   DEF_INTERNAL_OPTAB_FN (COND_XOR, ECF_CONST | ECF_NOTHROW,

>                         cond_xor, cond_binary)


> +DEF_INTERNAL_OPTAB_FN (COND_FMA, ECF_CONST, cond_fma, cond_ternary)

> +DEF_INTERNAL_OPTAB_FN (COND_FMS, ECF_CONST, cond_fms, cond_ternary)

> +DEF_INTERNAL_OPTAB_FN (COND_FNMA, ECF_CONST, cond_fnma, cond_ternary)

> +DEF_INTERNAL_OPTAB_FN (COND_FNMS, ECF_CONST, cond_fnms, cond_ternary)

> +

>   DEF_INTERNAL_OPTAB_FN (RSQRT, ECF_CONST, rsqrt, unary)


>   DEF_INTERNAL_OPTAB_FN (REDUC_PLUS, ECF_CONST | ECF_NOTHROW,

> @@ -230,7 +236,7 @@ DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONS

>   DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)


>   /* Ternary math functions.  */

> -DEF_INTERNAL_FLT_FN (FMA, ECF_CONST, fma, ternary)

> +DEF_INTERNAL_FLT_FLOATN_FN (FMA, ECF_CONST, fma, ternary)


>   /* Unary integer ops.  */

>   DEF_INTERNAL_INT_FN (CLRSB, ECF_CONST | ECF_NOTHROW, clrsb, unary)

> Index: gcc/internal-fn.h

> ===================================================================

> --- gcc/internal-fn.h   2018-05-24 10:33:30.870095164 +0100

> +++ gcc/internal-fn.h   2018-05-24 13:05:46.049605128 +0100

> @@ -193,7 +193,9 @@ direct_internal_fn_supported_p (internal

>   extern bool set_edom_supported_p (void);


>   extern internal_fn get_conditional_internal_fn (tree_code);

> +extern internal_fn get_conditional_internal_fn (internal_fn);

>   extern tree_code conditional_internal_fn_code (internal_fn);

> +extern internal_fn get_unconditional_internal_fn (internal_fn);


>   extern bool internal_load_fn_p (internal_fn);

>   extern bool internal_store_fn_p (internal_fn);

> Index: gcc/internal-fn.c

> ===================================================================

> --- gcc/internal-fn.c   2018-05-24 10:33:30.870095164 +0100

> +++ gcc/internal-fn.c   2018-05-24 13:05:46.048606357 +0100

> @@ -113,6 +113,7 @@ #define binary_direct { 0, 0, true }

>   #define ternary_direct { 0, 0, true }

>   #define cond_unary_direct { 1, 1, true }

>   #define cond_binary_direct { 1, 1, true }

> +#define cond_ternary_direct { 1, 1, true }

>   #define while_direct { 0, 2, false }

>   #define fold_extract_direct { 2, 2, false }

>   #define fold_left_direct { 1, 1, false }

> @@ -2993,6 +2994,9 @@ #define expand_cond_unary_optab_fn(FN, S

>   #define expand_cond_binary_optab_fn(FN, STMT, OPTAB) \

>     expand_direct_optab_fn (FN, STMT, OPTAB, 4)


> +#define expand_cond_ternary_optab_fn(FN, STMT, OPTAB) \

> +  expand_direct_optab_fn (FN, STMT, OPTAB, 5)

> +

>   #define expand_fold_extract_optab_fn(FN, STMT, OPTAB) \

>     expand_direct_optab_fn (FN, STMT, OPTAB, 3)


> @@ -3075,6 +3079,7 @@ #define direct_binary_optab_supported_p

>   #define direct_ternary_optab_supported_p direct_optab_supported_p

>   #define direct_cond_unary_optab_supported_p direct_optab_supported_p

>   #define direct_cond_binary_optab_supported_p direct_optab_supported_p

> +#define direct_cond_ternary_optab_supported_p direct_optab_supported_p

>   #define direct_mask_load_optab_supported_p direct_optab_supported_p

>   #define direct_load_lanes_optab_supported_p multi_vector_optab_supported_p
>   #define direct_mask_load_lanes_optab_supported_p multi_vector_optab_supported_p
> @@ -3277,6 +3282,57 @@ #define CASE(CODE, IFN) case IFN: return

>       }

>   }


> +/* Invoke T(IFN) for each internal function IFN that also has an

> +   IFN_COND_* form.  */

> +#define FOR_EACH_COND_FN_PAIR(T) \

> +  T (FMA) \

> +  T (FMS) \

> +  T (FNMA) \

> +  T (FNMS)

> +

> +/* Return a function that only performs internal function FN when a

> +   certain condition is met and that uses a given fallback value

otherwise.
> +   In other words, the returned function FN' is such that:

> +

> +     LHS = FN' (COND, A1, ... An, ELSE)

> +

> +   is equivalent to the C expression:

> +

> +     LHS = COND ? FN (A1, ..., An) : ELSE;

> +

> +   operating elementwise if the operands are vectors.

> +

> +   Return IFN_LAST if no such function exists.  */

> +

> +internal_fn

> +get_conditional_internal_fn (internal_fn fn)

> +{

> +  switch (fn)

> +    {

> +#define CASE(NAME) case IFN_##NAME: return IFN_COND_##NAME;

> +      FOR_EACH_COND_FN_PAIR(CASE)

> +#undef CASE

> +    default:

> +      return IFN_LAST;

> +    }

> +}

> +

> +/* If IFN implements the conditional form of an unconditional internal

> +   function, return that unconditional function, otherwise return IFN_LAST.  */
> +

> +internal_fn

> +get_unconditional_internal_fn (internal_fn ifn)

> +{

> +  switch (ifn)

> +    {

> +#define CASE(NAME) case IFN_COND_##NAME: return IFN_##NAME;

> +      FOR_EACH_COND_FN_PAIR(CASE)

> +#undef CASE

> +    default:

> +      return IFN_LAST;

> +    }

> +}

> +

>   /* Return true if IFN is some form of load from memory.  */


>   bool

> Index: gcc/gimple-match.h

> ===================================================================

> --- gcc/gimple-match.h  2018-05-24 10:33:30.870095164 +0100

> +++ gcc/gimple-match.h  2018-05-24 13:05:46.048606357 +0100

> @@ -91,18 +91,21 @@ struct gimple_match_op

>                     code_helper, tree, tree, tree, tree);

>     gimple_match_op (const gimple_match_cond &,

>                     code_helper, tree, tree, tree, tree, tree);

> +  gimple_match_op (const gimple_match_cond &,

> +                  code_helper, tree, tree, tree, tree, tree, tree);


>     void set_op (code_helper, tree, unsigned int);

>     void set_op (code_helper, tree, tree);

>     void set_op (code_helper, tree, tree, tree);

>     void set_op (code_helper, tree, tree, tree, tree);

>     void set_op (code_helper, tree, tree, tree, tree, tree);

> +  void set_op (code_helper, tree, tree, tree, tree, tree, tree);

>     void set_value (tree);


>     tree op_or_null (unsigned int) const;


>     /* The maximum value of NUM_OPS.  */

> -  static const unsigned int MAX_NUM_OPS = 4;

> +  static const unsigned int MAX_NUM_OPS = 5;


>     /* The conditions under which the operation is performed, and the value to
>        use as a fallback.  */

> @@ -182,6 +185,20 @@ gimple_match_op::gimple_match_op (const

>     ops[3] = op3;

>   }


> +inline

> +gimple_match_op::gimple_match_op (const gimple_match_cond &cond_in,

> +                                 code_helper code_in, tree type_in,

> +                                 tree op0, tree op1, tree op2, tree op3,

> +                                 tree op4)

> +  : cond (cond_in), code (code_in), type (type_in), num_ops (5)

> +{

> +  ops[0] = op0;

> +  ops[1] = op1;

> +  ops[2] = op2;

> +  ops[3] = op3;

> +  ops[4] = op4;

> +}

> +

>   /* Change the operation performed to CODE_IN, the type of the result to

>      TYPE_IN, and the number of operands to NUM_OPS_IN.  The caller needs

>      to set the operands itself.  */

> @@ -242,6 +259,20 @@ gimple_match_op::set_op (code_helper cod

>     ops[3] = op3;

>   }


> +inline void

> +gimple_match_op::set_op (code_helper code_in, tree type_in,

> +                        tree op0, tree op1, tree op2, tree op3, tree op4)

> +{

> +  code = code_in;

> +  type = type_in;

> +  num_ops = 5;

> +  ops[0] = op0;

> +  ops[1] = op1;

> +  ops[2] = op2;

> +  ops[3] = op3;

> +  ops[4] = op4;

> +}

> +

>   /* Set the "operation" to be the single value VALUE, such as a constant

>      or SSA_NAME.  */


> @@ -279,6 +310,7 @@ bool gimple_resimplify1 (gimple_seq *, g

>   bool gimple_resimplify2 (gimple_seq *, gimple_match_op *, tree (*)(tree));
>   bool gimple_resimplify3 (gimple_seq *, gimple_match_op *, tree (*)(tree));
>   bool gimple_resimplify4 (gimple_seq *, gimple_match_op *, tree (*)(tree));
> +bool gimple_resimplify5 (gimple_seq *, gimple_match_op *, tree (*)(tree));
>   tree maybe_push_res_to_seq (gimple_match_op *, gimple_seq *,

>                              tree res = NULL_TREE);

>   void maybe_build_generic_op (gimple_match_op *);

> Index: gcc/genmatch.c

> ===================================================================

> --- gcc/genmatch.c      2018-05-24 10:33:30.869095197 +0100

> +++ gcc/genmatch.c      2018-05-24 13:05:46.048606357 +0100

> @@ -3760,7 +3760,7 @@ decision_tree::gen (FILE *f, bool gimple

>       }

>     fprintf (stderr, "removed %u duplicate tails\n", rcnt);


> -  for (unsigned n = 1; n <= 4; ++n)

> +  for (unsigned n = 1; n <= 5; ++n)

>       {

>         /* First generate split-out functions.  */

>         for (unsigned i = 0; i < root->kids.length (); i++)

> Index: gcc/gimple-match-head.c

> ===================================================================

> --- gcc/gimple-match-head.c     2018-05-24 10:33:30.870095164 +0100

> +++ gcc/gimple-match-head.c     2018-05-24 13:05:46.048606357 +0100

> @@ -54,6 +54,8 @@ static bool gimple_simplify (gimple_matc

>                               code_helper, tree, tree, tree, tree);

>   static bool gimple_simplify (gimple_match_op *, gimple_seq *, tree (*)(tree),
>                               code_helper, tree, tree, tree, tree, tree);
> +static bool gimple_simplify (gimple_match_op *, gimple_seq *, tree (*)(tree),
> +                            code_helper, tree, tree, tree, tree, tree, tree);

>   const unsigned int gimple_match_op::MAX_NUM_OPS;


> @@ -80,7 +82,12 @@ convert_conditional_op (gimple_match_op

>     if (orig_op->code.is_tree_code ())

>       ifn = get_conditional_internal_fn ((tree_code) orig_op->code);

>     else

> -    return false;

> +    {

> +      combined_fn cfn = orig_op->code;

> +      if (!internal_fn_p (cfn))

> +       return false;

> +      ifn = get_conditional_internal_fn (as_internal_fn (cfn));

> +    }

>     if (ifn == IFN_LAST)

>       return false;

>     unsigned int num_ops = orig_op->num_ops;

> @@ -347,6 +354,34 @@ gimple_resimplify4 (gimple_seq *seq, gim

>     return false;

>   }


> +/* Helper that matches and simplifies the toplevel result from

> +   a gimple_simplify run (where we don't want to build

> +   a stmt in case it's used in in-place folding).  Replaces

> +   RES_OP with a simplified and/or canonicalized result and

> +   returns whether any change was made.  */

> +

> +bool

> +gimple_resimplify5 (gimple_seq *seq, gimple_match_op *res_op,

> +                   tree (*valueize)(tree))

> +{

> +  /* No constant folding is defined for five-operand functions.  */

> +

> +  gimple_match_op res_op2 (*res_op);

> +  if (gimple_simplify (&res_op2, seq, valueize,

> +                      res_op->code, res_op->type,

> +                      res_op->ops[0], res_op->ops[1], res_op->ops[2],

> +                      res_op->ops[3], res_op->ops[4]))

> +    {

> +      *res_op = res_op2;

> +      return true;

> +    }

> +

> +  if (maybe_resimplify_conditional_op (seq, res_op, valueize))

> +    return true;

> +

> +  return false;

> +}

> +

>   /* If in GIMPLE the operation described by RES_OP should be single-rhs,

>      build a GENERIC tree for that expression and update RES_OP

accordingly.  */

> @@ -388,7 +423,8 @@ build_call_internal (internal_fn fn, gim

>                                       res_op->op_or_null (0),

>                                       res_op->op_or_null (1),

>                                       res_op->op_or_null (2),

> -                                    res_op->op_or_null (3));

> +                                    res_op->op_or_null (3),

> +                                    res_op->op_or_null (4));

>   }


>   /* Push the exploded expression described by RES_OP as a statement to

> @@ -482,7 +518,8 @@ maybe_push_res_to_seq (gimple_match_op *

>                                          res_op->op_or_null (0),

>                                          res_op->op_or_null (1),

>                                          res_op->op_or_null (2),

> -                                       res_op->op_or_null (3));

> +                                       res_op->op_or_null (3),

> +                                       res_op->op_or_null (4));

>          }

>         if (!res)

>          {

> @@ -689,14 +726,22 @@ do_valueize (tree op, tree (*valueize)(t

>   try_conditional_simplification (internal_fn ifn, gimple_match_op *res_op,

>                                  gimple_seq *seq, tree (*valueize) (tree))

>   {

> +  code_helper op;

>     tree_code code = conditional_internal_fn_code (ifn);

> -  if (code == ERROR_MARK)

> -    return false;

> +  if (code != ERROR_MARK)

> +    op = code;

> +  else

> +    {

> +      ifn = get_unconditional_internal_fn (ifn);

> +      if (ifn == IFN_LAST)

> +       return false;

> +      op = as_combined_fn (ifn);

> +    }


>     unsigned int num_ops = res_op->num_ops;

>     gimple_match_op cond_op (gimple_match_cond (res_op->ops[0],

>                                                res_op->ops[num_ops - 1]),

> -                          code, res_op->type, num_ops - 2);

> +                          op, res_op->type, num_ops - 2);

>     for (unsigned int i = 1; i < num_ops - 1; ++i)

>       cond_op.ops[i - 1] = res_op->ops[i];

>     switch (num_ops - 2)

> @@ -705,6 +750,10 @@ try_conditional_simplification (internal

>         if (!gimple_resimplify2 (seq, &cond_op, valueize))

>          return false;

>         break;

> +    case 3:

> +      if (!gimple_resimplify3 (seq, &cond_op, valueize))

> +       return false;

> +      break;

>       default:

>         gcc_unreachable ();

>       }

> @@ -837,7 +886,7 @@ gimple_simplify (gimple *stmt, gimple_ma

>         /* ???  This way we can't simplify calls with side-effects.  */

>         if (gimple_call_lhs (stmt) != NULL_TREE

>            && gimple_call_num_args (stmt) >= 1

> -         && gimple_call_num_args (stmt) <= 4)

> +         && gimple_call_num_args (stmt) <= 5)

>          {

>            bool valueized = false;

>            combined_fn cfn;

> @@ -887,6 +936,9 @@ gimple_simplify (gimple *stmt, gimple_ma

>              case 4:

>                return (gimple_resimplify4 (seq, res_op, valueize)

>                        || valueized);

> +           case 5:

> +             return (gimple_resimplify5 (seq, res_op, valueize)

> +                     || valueized);

>              default:

>               gcc_unreachable ();

>              }

> Index: gcc/match.pd

> ===================================================================

> --- gcc/match.pd        2018-05-24 10:33:30.870095164 +0100

> +++ gcc/match.pd        2018-05-24 13:05:46.049605128 +0100

> @@ -86,6 +86,12 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)

>     IFN_COND_MUL IFN_COND_DIV IFN_COND_MOD IFN_COND_RDIV

>     IFN_COND_MIN IFN_COND_MAX

>     IFN_COND_AND IFN_COND_IOR IFN_COND_XOR)

> +

> +/* Same for ternary operations.  */

> +(define_operator_list UNCOND_TERNARY

> +  IFN_FMA IFN_FMS IFN_FNMA IFN_FNMS)

> +(define_operator_list COND_TERNARY

> +  IFN_COND_FMA IFN_COND_FMS IFN_COND_FNMA IFN_COND_FNMS)


>   /* As opposed to convert?, this still creates a single pattern, so

>      it is not a suitable replacement for convert? in all cases.  */

> @@ -4798,6 +4804,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)

>      (if (element_precision (type) == element_precision (op_type))

>      (view_convert (cond_op (bit_not @0) @2 @3 (view_convert:op_type @1)))))))

> +/* Same for ternary operations.  */

> +(for uncond_op (UNCOND_TERNARY)

> +     cond_op (COND_TERNARY)

> + (simplify

> +  (vec_cond @0 (view_convert? (uncond_op@5 @1 @2 @3)) @4)

> +  (with { tree op_type = TREE_TYPE (@5); }

> +   (if (element_precision (type) == element_precision (op_type))

> +    (view_convert (cond_op @0 @1 @2 @3 (view_convert:op_type @4))))))

> + (simplify

> +  (vec_cond @0 @1 (view_convert? (uncond_op@5 @2 @3 @4)))

> +  (with { tree op_type = TREE_TYPE (@5); }

> +   (if (element_precision (type) == element_precision (op_type))

> +    (view_convert (cond_op (bit_not @0) @2 @3 @4

> +                 (view_convert:op_type @1)))))))

> +

>   /* Detect cases in which a VEC_COND_EXPR effectively replaces the

>      "else" value of an IFN_COND_*.  */

>   (for cond_op (COND_BINARY)

> @@ -4806,3 +4827,11 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)

>     (with { tree op_type = TREE_TYPE (@3); }

>      (if (element_precision (type) == element_precision (op_type))

>       (view_convert (cond_op @0 @1 @2 (view_convert:op_type @4)))))))

> +

> +/* Same for ternary operations.  */

> +(for cond_op (COND_TERNARY)

> + (simplify

> +  (vec_cond @0 (view_convert? (cond_op @0 @1 @2 @3 @4)) @5)

> +  (with { tree op_type = TREE_TYPE (@4); }

> +   (if (element_precision (type) == element_precision (op_type))

> +    (view_convert (cond_op @0 @1 @2 @3 (view_convert:op_type @5)))))))

> Index: gcc/config/aarch64/aarch64.c

> ===================================================================

> --- gcc/config/aarch64/aarch64.c        2018-05-24 10:33:30.867095262 +0100
> +++ gcc/config/aarch64/aarch64.c        2018-05-24 13:05:46.046608817 +0100
> @@ -1292,14 +1292,18 @@ aarch64_get_mask_mode (poly_uint64 nunit

>     return default_get_mask_mode (nunits, nbytes);

>   }


> -/* Implement TARGET_PREFERRED_ELSE_VALUE.  Prefer to use the first

> -   arithmetic operand as the else value if the else value doesn't matter,

> -   since that exactly matches the SVE destructive merging form.  */

> +/* Implement TARGET_PREFERRED_ELSE_VALUE.  For binary operations,

> +   prefer to use the first arithmetic operand as the else value if

> +   the else value doesn't matter, since that exactly matches the SVE

> +   destructive merging form.  For ternary operations we could either

> +   pick the first operand and use FMAD-like instructions or the last

> +   operand and use FMLA-like instructions; the latter seems more

> +   natural.  */


>   static tree

> -aarch64_preferred_else_value (unsigned, tree, unsigned int, tree *ops)

> +aarch64_preferred_else_value (unsigned, tree, unsigned int nops, tree *ops)
>   {

> -  return ops[0];

> +  return nops == 3 ? ops[2] : ops[0];

>   }


>   /* Implement TARGET_HARD_REGNO_NREGS.  */

> Index: gcc/config/aarch64/aarch64-sve.md

> ===================================================================

> --- gcc/config/aarch64/aarch64-sve.md   2018-05-24 10:12:10.141352356 +0100
> +++ gcc/config/aarch64/aarch64-sve.md   2018-05-24 13:05:46.044611277 +0100
> @@ -2688,6 +2688,58 @@ (define_insn "*cond_<optab><mode>"

>     "<sve_fp_op>r\t%0.<Vetype>, %1/m, %0.<Vetype>, %2.<Vetype>"

>   )


> +;; Predicated floating-point ternary operations with select.

> +(define_expand "cond_<optab><mode>"

> +  [(set (match_operand:SVE_F 0 "register_operand")

> +       (unspec:SVE_F

> +         [(match_operand:<VPRED> 1 "register_operand")

> +          (unspec:SVE_F

> +            [(match_dup 1)

> +             (match_operand:SVE_F 2 "register_operand")

> +             (match_operand:SVE_F 3 "register_operand")

> +             (match_operand:SVE_F 4 "register_operand")]

> +            SVE_COND_FP_TERNARY)

> +          (match_operand:SVE_F 5 "register_operand")]

> +         UNSPEC_SEL))]

> +  "TARGET_SVE"

> +{

> +  aarch64_sve_prepare_conditional_op (operands, 6, true);

> +})

> +

> +;; Predicated floating-point ternary operations using the FMAD-like form.

> +(define_insn "*cond_<optab><mode>"

> +  [(set (match_operand:SVE_F 0 "register_operand" "=w")

> +       (unspec:SVE_F

> +         [(match_operand:<VPRED> 1 "register_operand" "Upl")

> +          (unspec:SVE_F

> +            [(match_dup 1)

> +             (match_operand:SVE_F 2 "register_operand" "0")

> +             (match_operand:SVE_F 3 "register_operand" "w")

> +             (match_operand:SVE_F 4 "register_operand" "w")]

> +            SVE_COND_FP_TERNARY)

> +          (match_dup 2)]

> +         UNSPEC_SEL))]

> +  "TARGET_SVE"

> +  "<sve_fmad_op>\t%0.<Vetype>, %1/m, %3.<Vetype>, %4.<Vetype>"

> +)

> +

> +;; Predicated floating-point ternary operations using the FMLA-like form.

> +(define_insn "*cond_<optab><mode>_acc"

> +  [(set (match_operand:SVE_F 0 "register_operand" "=w")

> +       (unspec:SVE_F

> +         [(match_operand:<VPRED> 1 "register_operand" "Upl")

> +          (unspec:SVE_F

> +            [(match_dup 1)

> +             (match_operand:SVE_F 2 "register_operand" "w")

> +             (match_operand:SVE_F 3 "register_operand" "w")

> +             (match_operand:SVE_F 4 "register_operand" "0")]

> +            SVE_COND_FP_TERNARY)

> +          (match_dup 4)]

> +         UNSPEC_SEL))]

> +  "TARGET_SVE"

> +  "<sve_fmla_op>\t%0.<Vetype>, %1/m, %2.<Vetype>, %3.<Vetype>"

> +)

> +

>   ;; Shift an SVE vector left and insert a scalar into element 0.

>   (define_insn "vec_shl_insert_<mode>"

>     [(set (match_operand:SVE_ALL 0 "register_operand" "=w, w")

> Index: gcc/config/aarch64/iterators.md

> ===================================================================

> --- gcc/config/aarch64/iterators.md     2018-05-24 10:12:10.142352315 +0100
> +++ gcc/config/aarch64/iterators.md     2018-05-24 13:05:46.046608817 +0100
> @@ -468,6 +468,10 @@ (define_c_enum "unspec"

>       UNSPEC_COND_DIV    ; Used in aarch64-sve.md.

>       UNSPEC_COND_MAX    ; Used in aarch64-sve.md.

>       UNSPEC_COND_MIN    ; Used in aarch64-sve.md.

> +    UNSPEC_COND_FMLA   ; Used in aarch64-sve.md.

> +    UNSPEC_COND_FMLS   ; Used in aarch64-sve.md.

> +    UNSPEC_COND_FNMLA  ; Used in aarch64-sve.md.

> +    UNSPEC_COND_FNMLS  ; Used in aarch64-sve.md.

>       UNSPEC_COND_LT     ; Used in aarch64-sve.md.

>       UNSPEC_COND_LE     ; Used in aarch64-sve.md.

>       UNSPEC_COND_EQ     ; Used in aarch64-sve.md.

> @@ -1549,6 +1553,11 @@ (define_int_iterator SVE_COND_FP_BINARY


>   (define_int_iterator SVE_COND_FP_BINARY_REV [UNSPEC_COND_SUB UNSPEC_COND_DIV])

> +(define_int_iterator SVE_COND_FP_TERNARY [UNSPEC_COND_FMLA

> +                                         UNSPEC_COND_FMLS

> +                                         UNSPEC_COND_FNMLA

> +                                         UNSPEC_COND_FNMLS])

> +

>   (define_int_iterator SVE_COND_FP_CMP [UNSPEC_COND_LT UNSPEC_COND_LE

>                                        UNSPEC_COND_EQ UNSPEC_COND_NE

>                                        UNSPEC_COND_GE UNSPEC_COND_GT])

> @@ -1581,7 +1590,11 @@ (define_int_attr optab [(UNSPEC_ANDF "an

>                          (UNSPEC_COND_MUL "mul")

>                          (UNSPEC_COND_DIV "div")

>                          (UNSPEC_COND_MAX "smax")

> -                       (UNSPEC_COND_MIN "smin")])

> +                       (UNSPEC_COND_MIN "smin")

> +                       (UNSPEC_COND_FMLA "fma")

> +                       (UNSPEC_COND_FMLS "fnma")

> +                       (UNSPEC_COND_FNMLA "fnms")

> +                       (UNSPEC_COND_FNMLS "fms")])


>   (define_int_attr  maxmin_uns [(UNSPEC_UMAXV "umax")

>                                (UNSPEC_UMINV "umin")

> @@ -1799,6 +1812,16 @@ (define_int_attr sve_fp_op [(UNSPEC_COND

>                              (UNSPEC_COND_MAX "fmaxnm")

>                              (UNSPEC_COND_MIN "fminnm")])


> +(define_int_attr sve_fmla_op [(UNSPEC_COND_FMLA "fmla")

> +                             (UNSPEC_COND_FMLS "fmls")

> +                             (UNSPEC_COND_FNMLA "fnmla")

> +                             (UNSPEC_COND_FNMLS "fnmls")])

> +

> +(define_int_attr sve_fmad_op [(UNSPEC_COND_FMLA "fmad")

> +                             (UNSPEC_COND_FMLS "fmsb")

> +                             (UNSPEC_COND_FNMLA "fnmad")

> +                             (UNSPEC_COND_FNMLS "fnmsb")])

> +

>   (define_int_attr commutative [(UNSPEC_COND_ADD "true")

>                                (UNSPEC_COND_SUB "false")

>                                (UNSPEC_COND_MUL "true")

> Index: gcc/testsuite/gcc.dg/vect/vect-cond-arith-3.c
> ===================================================================
> --- /dev/null   2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-cond-arith-3.c       2018-05-24 13:05:46.049605128 +0100
> @@ -0,0 +1,63 @@
> +/* { dg-require-effective-target scalar_all_fma } */
> +/* { dg-additional-options "-fdump-tree-optimized" } */
> +
> +#include "tree-vect.h"
> +
> +#define N (VECTOR_BITS * 11 / 64 + 3)
> +
> +#define DEF(INV)                                       \
> +  void __attribute__ ((noipa))                         \
> +  f_##INV (double *restrict a, double *restrict b,     \
> +          double *restrict c, double *restrict d)      \
> +  {                                                    \
> +    for (int i = 0; i < N; ++i)                                \
> +      {                                                        \
> +       double mb = (INV & 1 ? -b[i] : b[i]);           \
> +       double mc = c[i];                               \
> +       double md = (INV & 2 ? -d[i] : d[i]);           \
> +       double fma = __builtin_fma (mb, mc, md);        \
> +       double truev = (INV & 4 ? -fma : fma);          \
> +       a[i] = b[i] < 10 ? truev : 10.0;                \
> +      }                                                        \
> +  }
> +
> +#define TEST(INV)                                      \
> +  {                                                    \
> +    f_##INV (a, b, c, d);                              \
> +    for (int i = 0; i < N; ++i)                                \
> +      {                                                        \
> +       double mb = (INV & 1 ? -b[i] : b[i]);           \
> +       double mc = c[i];                               \
> +       double md = (INV & 2 ? -d[i] : d[i]);           \
> +       double fma = __builtin_fma (mb, mc, md);        \
> +       double truev = (INV & 4 ? -fma : fma);          \
> +       if (a[i] != (i % 17 < 10 ? truev : 10.0))       \
> +         __builtin_abort ();                           \
> +       asm volatile ("" ::: "memory");                 \
> +      }                                                        \
> +  }
> +
> +#define FOR_EACH_INV(T) \
> +  T (0) T (1) T (2) T (3) T (4) T (5) T (6) T (7)
> +
> +FOR_EACH_INV (DEF)
> +
> +int
> +main (void)
> +{
> +  double a[N], b[N], c[N], d[N];
> +  for (int i = 0; i < N; ++i)
> +    {
> +      b[i] = i % 17;
> +      c[i] = i % 9 + 11;
> +      d[i] = i % 13 + 14;
> +      asm volatile ("" ::: "memory");
> +    }
> +  FOR_EACH_INV (TEST)
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-times { = \.COND_FMA } 2 "optimized" { target vect_double_cond_arith } } } */
> +/* { dg-final { scan-tree-dump-times { = \.COND_FMS } 2 "optimized" { target vect_double_cond_arith } } } */
> +/* { dg-final { scan-tree-dump-times { = \.COND_FNMA } 2 "optimized" { target vect_double_cond_arith } } } */
> +/* { dg-final { scan-tree-dump-times { = \.COND_FNMS } 2 "optimized" { target vect_double_cond_arith } } } */
> Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_13.c
> ===================================================================
> --- /dev/null   2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.target/aarch64/sve/vcond_13.c     2018-05-24 13:05:46.049605128 +0100
> @@ -0,0 +1,58 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ftree-vectorize" } */
> +
> +#define N 119
> +
> +#define DEF_LOOP(INV, TYPE, CMPTYPE, SUFFIX)                   \
> +  void __attribute__ ((noipa))                                 \
> +  f_##INV##_##SUFFIX (TYPE *restrict a, TYPE *restrict b,      \
> +                     TYPE *restrict c, TYPE *restrict d,       \
> +                     CMPTYPE *restrict cond)                   \
> +  {                                                            \
> +    for (int i = 0; i < N; ++i)                                        \
> +      {                                                                \
> +       TYPE mb = (INV & 1 ? -b[i] : b[i]);                     \
> +       TYPE mc = c[i];                                         \
> +       TYPE md = (INV & 2 ? -d[i] : d[i]);                     \
> +       TYPE fma = __builtin_fma##SUFFIX (mb, mc, md);          \
> +       TYPE truev = (INV & 4 ? -fma : fma);                    \
> +       a[i] = cond[i] < 10 ? truev : b[i];                     \
> +      }                                                                \
> +  }
> +
> +#define FOR_EACH_TYPE(T, INV)                  \
> +  T (INV, _Float16, short, f16)                        \
> +  T (INV, float, float, f32)                   \
> +  T (INV, double, double, f64)
> +
> +#define FOR_EACH_INV(T)                                \
> +  FOR_EACH_TYPE (T, 0)                         \
> +  FOR_EACH_TYPE (T, 1)                         \
> +  FOR_EACH_TYPE (T, 2)                         \
> +  FOR_EACH_TYPE (T, 3)                         \
> +  FOR_EACH_TYPE (T, 4)                         \
> +  FOR_EACH_TYPE (T, 5)                         \
> +  FOR_EACH_TYPE (T, 6)                         \
> +  FOR_EACH_TYPE (T, 7)
> +
> +FOR_EACH_INV (DEF_LOOP)
> +
> +/* { dg-final { scan-assembler-not {\tsel\t} } } */
> +/* { dg-final { scan-assembler-not {\tmovprfx\t} } } */
> +/* { dg-final { scan-assembler-not {\tmov\tz[0-9]+\.., z[0-9]+} } } */
> +
> +/* { dg-final { scan-assembler-times {\tfmad\tz[0-9]+\.h,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tfmad\tz[0-9]+\.s,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tfmad\tz[0-9]+\.d,} 2 } } */
> +
> +/* { dg-final { scan-assembler-times {\tfmsb\tz[0-9]+\.h,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tfmsb\tz[0-9]+\.s,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tfmsb\tz[0-9]+\.d,} 2 } } */
> +
> +/* { dg-final { scan-assembler-times {\tfnmad\tz[0-9]+\.h,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tfnmad\tz[0-9]+\.s,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tfnmad\tz[0-9]+\.d,} 2 } } */
> +
> +/* { dg-final { scan-assembler-times {\tfnmsb\tz[0-9]+\.h,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tfnmsb\tz[0-9]+\.s,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tfnmsb\tz[0-9]+\.d,} 2 } } */

> Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_13_run.c
> ===================================================================
> --- /dev/null   2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.target/aarch64/sve/vcond_13_run.c 2018-05-24 13:05:46.050603898 +0100
> @@ -0,0 +1,37 @@
> +/* { dg-do run { target aarch64_sve_hw } } */
> +/* { dg-options "-O2 -ftree-vectorize" } */
> +
> +#include "vcond_13.c"
> +
> +#define TEST_LOOP(INV, TYPE, CMPTYPE, SUFFIX)          \
> +  {                                                    \
> +    TYPE a[N], b[N], c[N], d[N];                       \
> +    CMPTYPE cond[N];                                   \
> +    for (int i = 0; i < N; ++i)                                \
> +      {                                                        \
> +       b[i] = i % 15;                                  \
> +       c[i] = i % 9 + 11;                              \
> +       d[i] = i % 13 + 14;                             \
> +       cond[i] = i % 17;                               \
> +       asm volatile ("" ::: "memory");                 \
> +      }                                                        \
> +    f_##INV##_##SUFFIX (a, b, c, d, cond);             \
> +    for (int i = 0; i < N; ++i)                                \
> +      {                                                        \
> +       double mb = (INV & 1 ? -b[i] : b[i]);           \
> +       double mc = c[i];                               \
> +       double md = (INV & 2 ? -d[i] : d[i]);           \
> +       double fma = __builtin_fma (mb, mc, md);        \
> +       double truev = (INV & 4 ? -fma : fma);          \
> +       if (a[i] != (i % 17 < 10 ? truev : b[i]))       \
> +         __builtin_abort ();                           \
> +       asm volatile ("" ::: "memory");                 \
> +      }                                                        \
> +  }
> +
> +int
> +main (void)
> +{
> +  FOR_EACH_INV (TEST_LOOP)
> +  return 0;
> +}

> Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_14.c
> ===================================================================
> --- /dev/null   2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.target/aarch64/sve/vcond_14.c     2018-05-24 13:05:46.050603898 +0100
> @@ -0,0 +1,58 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ftree-vectorize" } */
> +
> +#define N 119
> +
> +#define DEF_LOOP(INV, TYPE, CMPTYPE, SUFFIX)                   \
> +  void __attribute__ ((noipa))                                 \
> +  f_##INV##_##SUFFIX (TYPE *restrict a, TYPE *restrict b,      \
> +                     TYPE *restrict c, TYPE *restrict d,       \
> +                     CMPTYPE *restrict cond)                   \
> +  {                                                            \
> +    for (int i = 0; i < N; ++i)                                        \
> +      {                                                                \
> +       TYPE mb = (INV & 1 ? -b[i] : b[i]);                     \
> +       TYPE mc = c[i];                                         \
> +       TYPE md = (INV & 2 ? -d[i] : d[i]);                     \
> +       TYPE fma = __builtin_fma##SUFFIX (mb, mc, md);          \
> +       TYPE truev = (INV & 4 ? -fma : fma);                    \
> +       a[i] = cond[i] < 10 ? truev : c[i];                     \
> +      }                                                                \
> +  }
> +
> +#define FOR_EACH_TYPE(T, INV)                  \
> +  T (INV, _Float16, short, f16)                        \
> +  T (INV, float, float, f32)                   \
> +  T (INV, double, double, f64)
> +
> +#define FOR_EACH_INV(T)                                \
> +  FOR_EACH_TYPE (T, 0)                         \
> +  FOR_EACH_TYPE (T, 1)                         \
> +  FOR_EACH_TYPE (T, 2)                         \
> +  FOR_EACH_TYPE (T, 3)                         \
> +  FOR_EACH_TYPE (T, 4)                         \
> +  FOR_EACH_TYPE (T, 5)                         \
> +  FOR_EACH_TYPE (T, 6)                         \
> +  FOR_EACH_TYPE (T, 7)
> +
> +FOR_EACH_INV (DEF_LOOP)
> +
> +/* { dg-final { scan-assembler-not {\tsel\t} } } */
> +/* { dg-final { scan-assembler-not {\tmovprfx\t} } } */
> +/* { dg-final { scan-assembler-not {\tmov\tz[0-9]+\.., z[0-9]+} } } */
> +
> +/* { dg-final { scan-assembler-times {\tfmad\tz[0-9]+\.h,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tfmad\tz[0-9]+\.s,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tfmad\tz[0-9]+\.d,} 2 } } */
> +
> +/* { dg-final { scan-assembler-times {\tfmsb\tz[0-9]+\.h,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tfmsb\tz[0-9]+\.s,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tfmsb\tz[0-9]+\.d,} 2 } } */
> +
> +/* { dg-final { scan-assembler-times {\tfnmad\tz[0-9]+\.h,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tfnmad\tz[0-9]+\.s,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tfnmad\tz[0-9]+\.d,} 2 } } */
> +
> +/* { dg-final { scan-assembler-times {\tfnmsb\tz[0-9]+\.h,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tfnmsb\tz[0-9]+\.s,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tfnmsb\tz[0-9]+\.d,} 2 } } */

> Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_14_run.c
> ===================================================================
> --- /dev/null   2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.target/aarch64/sve/vcond_14_run.c 2018-05-24 13:05:46.050603898 +0100
> @@ -0,0 +1,37 @@
> +/* { dg-do run { target aarch64_sve_hw } } */
> +/* { dg-options "-O2 -ftree-vectorize" } */
> +
> +#include "vcond_14.c"
> +
> +#define TEST_LOOP(INV, TYPE, CMPTYPE, SUFFIX)          \
> +  {                                                    \
> +    TYPE a[N], b[N], c[N], d[N];                       \
> +    CMPTYPE cond[N];                                   \
> +    for (int i = 0; i < N; ++i)                                \
> +      {                                                        \
> +       b[i] = i % 15;                                  \
> +       c[i] = i % 9 + 11;                              \
> +       d[i] = i % 13 + 14;                             \
> +       cond[i] = i % 17;                               \
> +       asm volatile ("" ::: "memory");                 \
> +      }                                                        \
> +    f_##INV##_##SUFFIX (a, b, c, d, cond);             \
> +    for (int i = 0; i < N; ++i)                                \
> +      {                                                        \
> +       double mb = (INV & 1 ? -b[i] : b[i]);           \
> +       double mc = c[i];                               \
> +       double md = (INV & 2 ? -d[i] : d[i]);           \
> +       double fma = __builtin_fma (mb, mc, md);        \
> +       double truev = (INV & 4 ? -fma : fma);          \
> +       if (a[i] != (i % 17 < 10 ? truev : c[i]))       \
> +         __builtin_abort ();                           \
> +       asm volatile ("" ::: "memory");                 \
> +      }                                                        \
> +  }
> +
> +int
> +main (void)
> +{
> +  FOR_EACH_INV (TEST_LOOP)
> +  return 0;
> +}

> Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_15.c
> ===================================================================
> --- /dev/null   2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.target/aarch64/sve/vcond_15.c     2018-05-24 13:05:46.050603898 +0100
> @@ -0,0 +1,58 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ftree-vectorize" } */
> +
> +#define N 119
> +
> +#define DEF_LOOP(INV, TYPE, CMPTYPE, SUFFIX)                   \
> +  void __attribute__ ((noipa))                                 \
> +  f_##INV##_##SUFFIX (TYPE *restrict a, TYPE *restrict b,      \
> +                     TYPE *restrict c, TYPE *restrict d,       \
> +                     CMPTYPE *restrict cond)                   \
> +  {                                                            \
> +    for (int i = 0; i < N; ++i)                                        \
> +      {                                                                \
> +       TYPE mb = (INV & 1 ? -b[i] : b[i]);                     \
> +       TYPE mc = c[i];                                         \
> +       TYPE md = (INV & 2 ? -d[i] : d[i]);                     \
> +       TYPE fma = __builtin_fma##SUFFIX (mb, mc, md);          \
> +       TYPE truev = (INV & 4 ? -fma : fma);                    \
> +       a[i] = cond[i] < 10 ? truev : d[i];                     \
> +      }                                                                \
> +  }
> +
> +#define FOR_EACH_TYPE(T, INV)                  \
> +  T (INV, _Float16, short, f16)                        \
> +  T (INV, float, float, f32)                   \
> +  T (INV, double, double, f64)
> +
> +#define FOR_EACH_INV(T)                                \
> +  FOR_EACH_TYPE (T, 0)                         \
> +  FOR_EACH_TYPE (T, 1)                         \
> +  FOR_EACH_TYPE (T, 2)                         \
> +  FOR_EACH_TYPE (T, 3)                         \
> +  FOR_EACH_TYPE (T, 4)                         \
> +  FOR_EACH_TYPE (T, 5)                         \
> +  FOR_EACH_TYPE (T, 6)                         \
> +  FOR_EACH_TYPE (T, 7)
> +
> +FOR_EACH_INV (DEF_LOOP)
> +
> +/* { dg-final { scan-assembler-not {\tsel\t} } } */
> +/* { dg-final { scan-assembler-not {\tmovprfx\t} } } */
> +/* { dg-final { scan-assembler-not {\tmov\tz[0-9]+\.., z[0-9]+} } } */
> +
> +/* { dg-final { scan-assembler-times {\tfmla\tz[0-9]+\.h,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tfmla\tz[0-9]+\.s,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tfmla\tz[0-9]+\.d,} 2 } } */
> +
> +/* { dg-final { scan-assembler-times {\tfmls\tz[0-9]+\.h,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tfmls\tz[0-9]+\.s,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tfmls\tz[0-9]+\.d,} 2 } } */
> +
> +/* { dg-final { scan-assembler-times {\tfnmla\tz[0-9]+\.h,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tfnmla\tz[0-9]+\.s,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tfnmla\tz[0-9]+\.d,} 2 } } */
> +
> +/* { dg-final { scan-assembler-times {\tfnmls\tz[0-9]+\.h,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tfnmls\tz[0-9]+\.s,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tfnmls\tz[0-9]+\.d,} 2 } } */

> Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_15_run.c
> ===================================================================
> --- /dev/null   2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.target/aarch64/sve/vcond_15_run.c 2018-05-24 13:05:46.050603898 +0100
> @@ -0,0 +1,37 @@
> +/* { dg-do run { target aarch64_sve_hw } } */
> +/* { dg-options "-O2 -ftree-vectorize" } */
> +
> +#include "vcond_15.c"
> +
> +#define TEST_LOOP(INV, TYPE, CMPTYPE, SUFFIX)          \
> +  {                                                    \
> +    TYPE a[N], b[N], c[N], d[N];                       \
> +    CMPTYPE cond[N];                                   \
> +    for (int i = 0; i < N; ++i)                                \
> +      {                                                        \
> +       b[i] = i % 15;                                  \
> +       c[i] = i % 9 + 11;                              \
> +       d[i] = i % 13 + 14;                             \
> +       cond[i] = i % 17;                               \
> +       asm volatile ("" ::: "memory");                 \
> +      }                                                        \
> +    f_##INV##_##SUFFIX (a, b, c, d, cond);             \
> +    for (int i = 0; i < N; ++i)                                \
> +      {                                                        \
> +       double mb = (INV & 1 ? -b[i] : b[i]);           \
> +       double mc = c[i];                               \
> +       double md = (INV & 2 ? -d[i] : d[i]);           \
> +       double fma = __builtin_fma (mb, mc, md);        \
> +       double truev = (INV & 4 ? -fma : fma);          \
> +       if (a[i] != (i % 17 < 10 ? truev : d[i]))       \
> +         __builtin_abort ();                           \
> +       asm volatile ("" ::: "memory");                 \
> +      }                                                        \
> +  }
> +
> +int
> +main (void)
> +{
> +  FOR_EACH_INV (TEST_LOOP)
> +  return 0;
> +}

> Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_16.c
> ===================================================================
> --- /dev/null   2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.target/aarch64/sve/vcond_16.c     2018-05-24 13:05:46.050603898 +0100
> @@ -0,0 +1,58 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ftree-vectorize" } */
> +
> +#define N 119
> +
> +#define DEF_LOOP(INV, TYPE, CMPTYPE, SUFFIX)                   \
> +  void __attribute__ ((noipa))                                 \
> +  f_##INV##_##SUFFIX (TYPE *restrict a, TYPE *restrict b,      \
> +                     TYPE *restrict c, TYPE *restrict d,       \
> +                     CMPTYPE *restrict cond)                   \
> +  {                                                            \
> +    for (int i = 0; i < N; ++i)                                        \
> +      {                                                                \
> +       TYPE mb = (INV & 1 ? -b[i] : b[i]);                     \
> +       TYPE mc = c[i];                                         \
> +       TYPE md = (INV & 2 ? -d[i] : d[i]);                     \
> +       TYPE fma = __builtin_fma##SUFFIX (mb, mc, md);          \
> +       TYPE truev = (INV & 4 ? -fma : fma);                    \
> +       a[i] = cond[i] < 10 ? truev : 10;                       \
> +      }                                                                \
> +  }
> +
> +#define FOR_EACH_TYPE(T, INV)                  \
> +  T (INV, _Float16, short, f16)                        \
> +  T (INV, float, float, f32)                   \
> +  T (INV, double, double, f64)
> +
> +#define FOR_EACH_INV(T)                                \
> +  FOR_EACH_TYPE (T, 0)                         \
> +  FOR_EACH_TYPE (T, 1)                         \
> +  FOR_EACH_TYPE (T, 2)                         \
> +  FOR_EACH_TYPE (T, 3)                         \
> +  FOR_EACH_TYPE (T, 4)                         \
> +  FOR_EACH_TYPE (T, 5)                         \
> +  FOR_EACH_TYPE (T, 6)                         \
> +  FOR_EACH_TYPE (T, 7)
> +
> +FOR_EACH_INV (DEF_LOOP)
> +
> +/* { dg-final { scan-assembler-times {\tsel\t} 24 } } */
> +/* { dg-final { scan-assembler-not {\tmovprfx\t} } } */
> +/* { dg-final { scan-assembler-not {\tmov\tz[0-9]+\.., z[0-9]+} } } */
> +
> +/* { dg-final { scan-assembler-times {\tfmad\tz[0-9]+\.h,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tfmad\tz[0-9]+\.s,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tfmad\tz[0-9]+\.d,} 2 } } */
> +
> +/* { dg-final { scan-assembler-times {\tfmsb\tz[0-9]+\.h,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tfmsb\tz[0-9]+\.s,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tfmsb\tz[0-9]+\.d,} 2 } } */
> +
> +/* { dg-final { scan-assembler-times {\tfnmad\tz[0-9]+\.h,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tfnmad\tz[0-9]+\.s,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tfnmad\tz[0-9]+\.d,} 2 } } */
> +
> +/* { dg-final { scan-assembler-times {\tfnmsb\tz[0-9]+\.h,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tfnmsb\tz[0-9]+\.s,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tfnmsb\tz[0-9]+\.d,} 2 } } */

> Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_16_run.c
> ===================================================================
> --- /dev/null   2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.target/aarch64/sve/vcond_16_run.c 2018-05-24 13:05:46.050603898 +0100
> @@ -0,0 +1,37 @@
> +/* { dg-do run { target aarch64_sve_hw } } */
> +/* { dg-options "-O2 -ftree-vectorize" } */
> +
> +#include "vcond_16.c"
> +
> +#define TEST_LOOP(INV, TYPE, CMPTYPE, SUFFIX)          \
> +  {                                                    \
> +    TYPE a[N], b[N], c[N], d[N];                       \
> +    CMPTYPE cond[N];                                   \
> +    for (int i = 0; i < N; ++i)                                \
> +      {                                                        \
> +       b[i] = i % 15;                                  \
> +       c[i] = i % 9 + 11;                              \
> +       d[i] = i % 13 + 14;                             \
> +       cond[i] = i % 17;                               \
> +       asm volatile ("" ::: "memory");                 \
> +      }                                                        \
> +    f_##INV##_##SUFFIX (a, b, c, d, cond);             \
> +    for (int i = 0; i < N; ++i)                                \
> +      {                                                        \
> +       double mb = (INV & 1 ? -b[i] : b[i]);           \
> +       double mc = c[i];                               \
> +       double md = (INV & 2 ? -d[i] : d[i]);           \
> +       double fma = __builtin_fma (mb, mc, md);        \
> +       double truev = (INV & 4 ? -fma : fma);          \
> +       if (a[i] != (i % 17 < 10 ? truev : 10))         \
> +         __builtin_abort ();                           \
> +       asm volatile ("" ::: "memory");                 \
> +      }                                                        \
> +  }
> +
> +int
> +main (void)
> +{
> +  FOR_EACH_INV (TEST_LOOP)
> +  return 0;
> +}
Richard Sandiford July 12, 2018, 1:04 p.m. UTC | #2
Richard Biener <richard.guenther@gmail.com> writes:
> On Thu, May 24, 2018 at 2:08 PM Richard Sandiford <richard.sandiford@linaro.org> wrote:
>
>> This patch adds conditional equivalents of the IFN_FMA built-in functions.
>> Most of it is just a mechanical extension of the binary stuff.
>>
>> Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf
>> and x86_64-linux-gnu.  OK for the non-AArch64 bits?
>
> OK.


Thanks.  For the record, here's what I installed after updating
the SVE patterns in line with rth's recent MOVPRFX changes.

Richard

2018-07-12  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	* doc/md.texi (cond_fma, cond_fms, cond_fnma, cond_fnms): Document.
	* optabs.def (cond_fma_optab, cond_fms_optab, cond_fnma_optab)
	(cond_fnms_optab): New optabs.
	* internal-fn.def (COND_FMA, COND_FMS, COND_FNMA, COND_FNMS): New
	internal functions.
	(FMA): Use DEF_INTERNAL_FLT_FN rather than DEF_INTERNAL_FLT_FLOATN_FN.
	* internal-fn.h (get_conditional_internal_fn): Declare.
	(get_unconditional_internal_fn): Likewise.
	* internal-fn.c (cond_ternary_direct): New macro.
	(expand_cond_ternary_optab_fn): Likewise.
	(direct_cond_ternary_optab_supported_p): Likewise.
	(FOR_EACH_COND_FN_PAIR): Likewise.
	(get_conditional_internal_fn): New function.
	(get_unconditional_internal_fn): Likewise.
	* gimple-match.h (gimple_match_op::MAX_NUM_OPS): Bump to 5.
	(gimple_match_op::gimple_match_op): Add a new overload for 5
	operands.
	(gimple_match_op::set_op): Likewise.
	(gimple_resimplify5): Declare.
	* genmatch.c (decision_tree::gen): Generate simplifications for
	5 operands.
	* gimple-match-head.c (gimple_simplify): Define an overload for
	5 operands.  Handle calls with 5 arguments in the top-level overload.
	(convert_conditional_op): Handle conversions from unconditional
	internal functions to conditional ones.
	(gimple_resimplify5): New function.
	(build_call_internal): Pass a fifth operand.
	(maybe_push_res_to_seq): Likewise.
	(try_conditional_simplification): Try converting conditional
	internal functions to unconditional internal functions.
	Handle 3-operand unconditional forms.
	* match.pd (UNCOND_TERNARY, COND_TERNARY): Operator lists.
	Define ternary equivalents of the current rules for binary conditional
	internal functions.
	* config/aarch64/aarch64.c (aarch64_preferred_else_value): Handle
	ternary operations.
	* config/aarch64/iterators.md (UNSPEC_COND_FMLA, UNSPEC_COND_FMLS)
	(UNSPEC_COND_FNMLA, UNSPEC_COND_FNMLS): New unspecs.
	(optab): Handle them.
	(SVE_COND_FP_TERNARY): New int iterator.
	(sve_fmla_op, sve_fmad_op): New int attributes.
	* config/aarch64/aarch64-sve.md (cond_<optab><mode>)
	(*cond_<optab><mode>_2, *cond_<optab><mode>_4)
	(*cond_<optab><mode>_any): New SVE_COND_FP_TERNARY patterns.

gcc/testsuite/
	* gcc.dg/vect/vect-cond-arith-3.c: New test.
	* gcc.target/aarch64/sve/vcond_13.c: Likewise.
	* gcc.target/aarch64/sve/vcond_13_run.c: Likewise.
	* gcc.target/aarch64/sve/vcond_14.c: Likewise.
	* gcc.target/aarch64/sve/vcond_14_run.c: Likewise.
	* gcc.target/aarch64/sve/vcond_15.c: Likewise.
	* gcc.target/aarch64/sve/vcond_15_run.c: Likewise.
	* gcc.target/aarch64/sve/vcond_16.c: Likewise.
	* gcc.target/aarch64/sve/vcond_16_run.c: Likewise.

Index: gcc/doc/md.texi
===================================================================
--- gcc/doc/md.texi	2018-07-12 12:39:27.789323671 +0100
+++ gcc/doc/md.texi	2018-07-12 12:42:44.366933190 +0100
@@ -6438,6 +6438,23 @@ Operands 0, 2, 3 and 4 all have mode @va
 integer if @var{m} is scalar, otherwise it has the mode returned by
 @code{TARGET_VECTORIZE_GET_MASK_MODE}.
 
+@cindex @code{cond_fma@var{mode}} instruction pattern
+@cindex @code{cond_fms@var{mode}} instruction pattern
+@cindex @code{cond_fnma@var{mode}} instruction pattern
+@cindex @code{cond_fnms@var{mode}} instruction pattern
+@item @samp{cond_fma@var{mode}}
+@itemx @samp{cond_fms@var{mode}}
+@itemx @samp{cond_fnma@var{mode}}
+@itemx @samp{cond_fnms@var{mode}}
+Like @samp{cond_add@var{m}}, except that the conditional operation
+takes 3 operands rather than two.  For example, the vector form of
+@samp{cond_fma@var{mode}} is equivalent to:
+
+@smallexample
+for (i = 0; i < GET_MODE_NUNITS (@var{m}); i++)
+  op0[i] = op1[i] ? fma (op2[i], op3[i], op4[i]) : op5[i];
+@end smallexample
+
 @cindex @code{neg@var{mode}cc} instruction pattern
 @item @samp{neg@var{mode}cc}
 Similar to @samp{mov@var{mode}cc} but for conditional negation.  Conditionally
Index: gcc/optabs.def
===================================================================
--- gcc/optabs.def	2018-07-12 12:39:27.976869878 +0100
+++ gcc/optabs.def	2018-07-12 12:42:44.368856626 +0100
@@ -234,6 +234,10 @@ OPTAB_D (cond_smin_optab, "cond_smin$a")
 OPTAB_D (cond_smax_optab, "cond_smax$a")
 OPTAB_D (cond_umin_optab, "cond_umin$a")
 OPTAB_D (cond_umax_optab, "cond_umax$a")
+OPTAB_D (cond_fma_optab, "cond_fma$a")
+OPTAB_D (cond_fms_optab, "cond_fms$a")
+OPTAB_D (cond_fnma_optab, "cond_fnma$a")
+OPTAB_D (cond_fnms_optab, "cond_fnms$a")
 OPTAB_D (cmov_optab, "cmov$a6")
 OPTAB_D (cstore_optab, "cstore$a4")
 OPTAB_D (ctrap_optab, "ctrap$a4")
Index: gcc/internal-fn.def
===================================================================
--- gcc/internal-fn.def	2018-07-12 12:39:28.919588848 +0100
+++ gcc/internal-fn.def	2018-07-12 12:42:44.367894908 +0100
@@ -59,7 +59,8 @@ along with GCC; see the file COPYING3.
    - binary: a normal binary optab, such as vec_interleave_lo_<mode>
    - ternary: a normal ternary optab, such as fma<mode>4
 
-   - cond_binary: a conditional binary optab, such as add<mode>cc
+   - cond_binary: a conditional binary optab, such as cond_add<mode>
+   - cond_ternary: a conditional ternary optab, such as cond_fma_rev<mode>
 
    - fold_left: for scalar = FN (scalar, vector), keyed off the vector mode
 
@@ -167,6 +168,11 @@ DEF_INTERNAL_OPTAB_FN (COND_IOR, ECF_CON
 DEF_INTERNAL_OPTAB_FN (COND_XOR, ECF_CONST | ECF_NOTHROW,
 		       cond_xor, cond_binary)
 
+DEF_INTERNAL_OPTAB_FN (COND_FMA, ECF_CONST, cond_fma, cond_ternary)
+DEF_INTERNAL_OPTAB_FN (COND_FMS, ECF_CONST, cond_fms, cond_ternary)
+DEF_INTERNAL_OPTAB_FN (COND_FNMA, ECF_CONST, cond_fnma, cond_ternary)
+DEF_INTERNAL_OPTAB_FN (COND_FNMS, ECF_CONST, cond_fnms, cond_ternary)
+
 DEF_INTERNAL_OPTAB_FN (RSQRT, ECF_CONST, rsqrt, unary)
 
 DEF_INTERNAL_OPTAB_FN (REDUC_PLUS, ECF_CONST | ECF_NOTHROW,
@@ -235,7 +241,7 @@ DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONS
 DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
 
 /* Ternary math functions.  */
-DEF_INTERNAL_FLT_FN (FMA, ECF_CONST, fma, ternary)
+DEF_INTERNAL_FLT_FLOATN_FN (FMA, ECF_CONST, fma, ternary)
 
 /* Unary integer ops.  */
 DEF_INTERNAL_INT_FN (CLRSB, ECF_CONST | ECF_NOTHROW, clrsb, unary)
Index: gcc/internal-fn.h
===================================================================
--- gcc/internal-fn.h	2018-07-12 12:41:44.919389631 +0100
+++ gcc/internal-fn.h	2018-07-12 12:42:44.367894908 +0100
@@ -193,7 +193,9 @@ direct_internal_fn_supported_p (internal
 extern bool set_edom_supported_p (void);
 
 extern internal_fn get_conditional_internal_fn (tree_code);
+extern internal_fn get_conditional_internal_fn (internal_fn);
 extern tree_code conditional_internal_fn_code (internal_fn);
+extern internal_fn get_unconditional_internal_fn (internal_fn);
 
 extern bool internal_load_fn_p (internal_fn);
 extern bool internal_store_fn_p (internal_fn);
Index: gcc/internal-fn.c
===================================================================
--- gcc/internal-fn.c	2018-07-12 12:41:44.919389631 +0100
+++ gcc/internal-fn.c	2018-07-12 12:42:44.367894908 +0100
@@ -113,6 +113,7 @@ #define binary_direct { 0, 0, true }
 #define ternary_direct { 0, 0, true }
 #define cond_unary_direct { 1, 1, true }
 #define cond_binary_direct { 1, 1, true }
+#define cond_ternary_direct { 1, 1, true }
 #define while_direct { 0, 2, false }
 #define fold_extract_direct { 2, 2, false }
 #define fold_left_direct { 1, 1, false }
@@ -2993,6 +2994,9 @@ #define expand_cond_unary_optab_fn(FN, S
 #define expand_cond_binary_optab_fn(FN, STMT, OPTAB) \
   expand_direct_optab_fn (FN, STMT, OPTAB, 4)
 
+#define expand_cond_ternary_optab_fn(FN, STMT, OPTAB) \
+  expand_direct_optab_fn (FN, STMT, OPTAB, 5)
+
 #define expand_fold_extract_optab_fn(FN, STMT, OPTAB) \
   expand_direct_optab_fn (FN, STMT, OPTAB, 3)
 
@@ -3075,6 +3079,7 @@ #define direct_binary_optab_supported_p
 #define direct_ternary_optab_supported_p direct_optab_supported_p
 #define direct_cond_unary_optab_supported_p direct_optab_supported_p
 #define direct_cond_binary_optab_supported_p direct_optab_supported_p
+#define direct_cond_ternary_optab_supported_p direct_optab_supported_p
 #define direct_mask_load_optab_supported_p direct_optab_supported_p
 #define direct_load_lanes_optab_supported_p multi_vector_optab_supported_p
 #define direct_mask_load_lanes_optab_supported_p multi_vector_optab_supported_p
@@ -3277,6 +3282,57 @@ #define CASE(CODE, IFN) case IFN: return
     }
 }
 
+/* Invoke T(IFN) for each internal function IFN that also has an
+   IFN_COND_* form.  */
+#define FOR_EACH_COND_FN_PAIR(T) \
+  T (FMA) \
+  T (FMS) \
+  T (FNMA) \
+  T (FNMS)
+
+/* Return a function that only performs internal function FN when a
+   certain condition is met and that uses a given fallback value otherwise.
+   In other words, the returned function FN' is such that:
+
+     LHS = FN' (COND, A1, ... An, ELSE)
+
+   is equivalent to the C expression:
+
+     LHS = COND ? FN (A1, ..., An) : ELSE;
+
+   operating elementwise if the operands are vectors.
+
+   Return IFN_LAST if no such function exists.  */
+
+internal_fn
+get_conditional_internal_fn (internal_fn fn)
+{
+  switch (fn)
+    {
+#define CASE(NAME) case IFN_##NAME: return IFN_COND_##NAME;
+      FOR_EACH_COND_FN_PAIR(CASE)
+#undef CASE
+    default:
+      return IFN_LAST;
+    }
+}
+
+/* If IFN implements the conditional form of an unconditional internal
+   function, return that unconditional function, otherwise return IFN_LAST.  */
+
+internal_fn
+get_unconditional_internal_fn (internal_fn ifn)
+{
+  switch (ifn)
+    {
+#define CASE(NAME) case IFN_COND_##NAME: return IFN_##NAME;
+      FOR_EACH_COND_FN_PAIR(CASE)
+#undef CASE
+    default:
+      return IFN_LAST;
+    }
+}
+
 /* Return true if IFN is some form of load from memory.  */
 
 bool
Index: gcc/gimple-match.h
===================================================================
--- gcc/gimple-match.h	2018-07-12 12:41:44.919389631 +0100
+++ gcc/gimple-match.h	2018-07-12 12:42:44.367894908 +0100
@@ -91,18 +91,21 @@ struct gimple_match_op
 		   code_helper, tree, tree, tree, tree);
   gimple_match_op (const gimple_match_cond &,
 		   code_helper, tree, tree, tree, tree, tree);
+  gimple_match_op (const gimple_match_cond &,
+		   code_helper, tree, tree, tree, tree, tree, tree);
 
   void set_op (code_helper, tree, unsigned int);
   void set_op (code_helper, tree, tree);
   void set_op (code_helper, tree, tree, tree);
   void set_op (code_helper, tree, tree, tree, tree);
   void set_op (code_helper, tree, tree, tree, tree, tree);
+  void set_op (code_helper, tree, tree, tree, tree, tree, tree);
   void set_value (tree);
 
   tree op_or_null (unsigned int) const;
 
   /* The maximum value of NUM_OPS.  */
-  static const unsigned int MAX_NUM_OPS = 4;
+  static const unsigned int MAX_NUM_OPS = 5;
 
   /* The conditions under which the operation is performed, and the value to
      use as a fallback.  */
@@ -182,6 +185,20 @@ gimple_match_op::gimple_match_op (const
   ops[3] = op3;
 }
 
+inline
+gimple_match_op::gimple_match_op (const gimple_match_cond &cond_in,
+				  code_helper code_in, tree type_in,
+				  tree op0, tree op1, tree op2, tree op3,
+				  tree op4)
+  : cond (cond_in), code (code_in), type (type_in), num_ops (5)
+{
+  ops[0] = op0;
+  ops[1] = op1;
+  ops[2] = op2;
+  ops[3] = op3;
+  ops[4] = op4;
+}
+
 /* Change the operation performed to CODE_IN, the type of the result to
    TYPE_IN, and the number of operands to NUM_OPS_IN.  The caller needs
    to set the operands itself.  */
@@ -242,6 +259,20 @@ gimple_match_op::set_op (code_helper cod
   ops[3] = op3;
 }
 
+inline void
+gimple_match_op::set_op (code_helper code_in, tree type_in,
+			 tree op0, tree op1, tree op2, tree op3, tree op4)
+{
+  code = code_in;
+  type = type_in;
+  num_ops = 5;
+  ops[0] = op0;
+  ops[1] = op1;
+  ops[2] = op2;
+  ops[3] = op3;
+  ops[4] = op4;
+}
+
 /* Set the "operation" to be the single value VALUE, such as a constant
    or SSA_NAME.  */
 
@@ -279,6 +310,7 @@ bool gimple_resimplify1 (gimple_seq *, g
 bool gimple_resimplify2 (gimple_seq *, gimple_match_op *, tree (*)(tree));
 bool gimple_resimplify3 (gimple_seq *, gimple_match_op *, tree (*)(tree));
 bool gimple_resimplify4 (gimple_seq *, gimple_match_op *, tree (*)(tree));
+bool gimple_resimplify5 (gimple_seq *, gimple_match_op *, tree (*)(tree));
 tree maybe_push_res_to_seq (gimple_match_op *, gimple_seq *,
 			    tree res = NULL_TREE);
 void maybe_build_generic_op (gimple_match_op *);
Index: gcc/genmatch.c
===================================================================
--- gcc/genmatch.c	2018-07-12 12:41:44.918413713 +0100
+++ gcc/genmatch.c	2018-07-12 12:42:44.366933190 +0100
@@ -3750,7 +3750,7 @@ decision_tree::gen (FILE *f, bool gimple
     }
   fprintf (stderr, "removed %u duplicate tails\n", rcnt);
 
-  for (unsigned n = 1; n <= 4; ++n)
+  for (unsigned n = 1; n <= 5; ++n)
     {
       /* First generate split-out functions.  */
       for (unsigned i = 0; i < root->kids.length (); i++)
Index: gcc/gimple-match-head.c
===================================================================
--- gcc/gimple-match-head.c	2018-07-12 12:41:44.919389631 +0100
+++ gcc/gimple-match-head.c	2018-07-12 12:42:44.366933190 +0100
@@ -54,6 +54,8 @@ static bool gimple_simplify (gimple_matc
 			     code_helper, tree, tree, tree, tree);
 static bool gimple_simplify (gimple_match_op *, gimple_seq *, tree (*)(tree),
 			     code_helper, tree, tree, tree, tree, tree);
+static bool gimple_simplify (gimple_match_op *, gimple_seq *, tree (*)(tree),
+			     code_helper, tree, tree, tree, tree, tree, tree);
 
 const unsigned int gimple_match_op::MAX_NUM_OPS;
 
@@ -80,7 +82,12 @@ convert_conditional_op (gimple_match_op
   if (orig_op->code.is_tree_code ())
     ifn = get_conditional_internal_fn ((tree_code) orig_op->code);
   else
-    return false;
+    {
+      combined_fn cfn = orig_op->code;
+      if (!internal_fn_p (cfn))
+	return false;
+      ifn = get_conditional_internal_fn (as_internal_fn (cfn));
+    }
   if (ifn == IFN_LAST)
     return false;
   unsigned int num_ops = orig_op->num_ops;
@@ -403,6 +410,34 @@ gimple_resimplify4 (gimple_seq *seq, gim
   return false;
 }
 
+/* Helper that matches and simplifies the toplevel result from
+   a gimple_simplify run (where we don't want to build
+   a stmt in case it's used in in-place folding).  Replaces
+   RES_OP with a simplified and/or canonicalized result and
+   returns whether any change was made.  */
+
+bool
+gimple_resimplify5 (gimple_seq *seq, gimple_match_op *res_op,
+		    tree (*valueize)(tree))
+{
+  /* No constant folding is defined for five-operand functions.  */
+
+  gimple_match_op res_op2 (*res_op);
+  if (gimple_simplify (&res_op2, seq, valueize,
+		       res_op->code, res_op->type,
+		       res_op->ops[0], res_op->ops[1], res_op->ops[2],
+		       res_op->ops[3], res_op->ops[4]))
+    {
+      *res_op = res_op2;
+      return true;
+    }
+
+  if (maybe_resimplify_conditional_op (seq, res_op, valueize))
+    return true;
+
+  return false;
+}
+
 /* If in GIMPLE the operation described by RES_OP should be single-rhs,
    build a GENERIC tree for that expression and update RES_OP accordingly.  */
 
@@ -444,7 +479,8 @@ build_call_internal (internal_fn fn, gim
 				     res_op->op_or_null (0),
 				     res_op->op_or_null (1),
 				     res_op->op_or_null (2),
-				     res_op->op_or_null (3));
+				     res_op->op_or_null (3),
+				     res_op->op_or_null (4));
 }
 
 /* Push the exploded expression described by RES_OP as a statement to
@@ -538,7 +574,8 @@ maybe_push_res_to_seq (gimple_match_op *
 					res_op->op_or_null (0),
 					res_op->op_or_null (1),
 					res_op->op_or_null (2),
-					res_op->op_or_null (3));
+					res_op->op_or_null (3),
+					res_op->op_or_null (4));
 	}
       if (!res)
 	{
@@ -745,14 +782,22 @@ do_valueize (tree op, tree (*valueize)(t
 try_conditional_simplification (internal_fn ifn, gimple_match_op *res_op,
 				gimple_seq *seq, tree (*valueize) (tree))
 {
+  code_helper op;
   tree_code code = conditional_internal_fn_code (ifn);
-  if (code == ERROR_MARK)
-    return false;
+  if (code != ERROR_MARK)
+    op = code;
+  else
+    {
+      ifn = get_unconditional_internal_fn (ifn);
+      if (ifn == IFN_LAST)
+	return false;
+      op = as_combined_fn (ifn);
+    }
 
   unsigned int num_ops = res_op->num_ops;
   gimple_match_op cond_op (gimple_match_cond (res_op->ops[0],
 					      res_op->ops[num_ops - 1]),
-			   code, res_op->type, num_ops - 2);
+			   op, res_op->type, num_ops - 2);
   for (unsigned int i = 1; i < num_ops - 1; ++i)
     cond_op.ops[i - 1] = res_op->ops[i];
   switch (num_ops - 2)
@@ -761,6 +806,10 @@ try_conditional_simplification (internal
       if (!gimple_resimplify2 (seq, &cond_op, valueize))
 	return false;
       break;
+    case 3:
+      if (!gimple_resimplify3 (seq, &cond_op, valueize))
+	return false;
+      break;
     default:
       gcc_unreachable ();
     }
@@ -893,7 +942,7 @@ gimple_simplify (gimple *stmt, gimple_ma
       /* ???  This way we can't simplify calls with side-effects.  */
       if (gimple_call_lhs (stmt) != NULL_TREE
 	  && gimple_call_num_args (stmt) >= 1
-	  && gimple_call_num_args (stmt) <= 4)
+	  && gimple_call_num_args (stmt) <= 5)
 	{
 	  bool valueized = false;
 	  combined_fn cfn;
@@ -943,6 +992,9 @@ gimple_simplify (gimple *stmt, gimple_ma
 	    case 4:
 	      return (gimple_resimplify4 (seq, res_op, valueize)
 		      || valueized);
+	    case 5:
+	      return (gimple_resimplify5 (seq, res_op, valueize)
+		      || valueized);
 	    default:
 	     gcc_unreachable ();
 	    }
Index: gcc/match.pd
===================================================================
--- gcc/match.pd	2018-07-12 12:41:44.920365549 +0100
+++ gcc/match.pd	2018-07-12 12:42:44.368856626 +0100
@@ -86,6 +86,12 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   IFN_COND_MUL IFN_COND_DIV IFN_COND_MOD IFN_COND_RDIV
   IFN_COND_MIN IFN_COND_MAX
   IFN_COND_AND IFN_COND_IOR IFN_COND_XOR)
+
+/* Same for ternary operations.  */
+(define_operator_list UNCOND_TERNARY
+  IFN_FMA IFN_FMS IFN_FNMA IFN_FNMS)
+(define_operator_list COND_TERNARY
+  IFN_COND_FMA IFN_COND_FMS IFN_COND_FNMA IFN_COND_FNMS)
     
 /* As opposed to convert?, this still creates a single pattern, so
    it is not a suitable replacement for convert? in all cases.  */
@@ -4885,6 +4891,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
    (if (element_precision (type) == element_precision (op_type))
     (view_convert (cond_op (bit_not @0) @2 @3 (view_convert:op_type @1)))))))
 
+/* Same for ternary operations.  */
+(for uncond_op (UNCOND_TERNARY)
+     cond_op (COND_TERNARY)
+ (simplify
+  (vec_cond @0 (view_convert? (uncond_op@5 @1 @2 @3)) @4)
+  (with { tree op_type = TREE_TYPE (@5); }
+   (if (element_precision (type) == element_precision (op_type))
+    (view_convert (cond_op @0 @1 @2 @3 (view_convert:op_type @4))))))
+ (simplify
+  (vec_cond @0 @1 (view_convert? (uncond_op@5 @2 @3 @4)))
+  (with { tree op_type = TREE_TYPE (@5); }
+   (if (element_precision (type) == element_precision (op_type))
+    (view_convert (cond_op (bit_not @0) @2 @3 @4
+		  (view_convert:op_type @1)))))))
+
 /* Detect cases in which a VEC_COND_EXPR effectively replaces the
    "else" value of an IFN_COND_*.  */
 (for cond_op (COND_BINARY)
@@ -4893,3 +4914,11 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (with { tree op_type = TREE_TYPE (@3); }
    (if (element_precision (type) == element_precision (op_type))
     (view_convert (cond_op @0 @1 @2 (view_convert:op_type @4)))))))
+
+/* Same for ternary operations.  */
+(for cond_op (COND_TERNARY)
+ (simplify
+  (vec_cond @0 (view_convert? (cond_op @0 @1 @2 @3 @4)) @5)
+  (with { tree op_type = TREE_TYPE (@4); }
+   (if (element_precision (type) == element_precision (op_type))
+    (view_convert (cond_op @0 @1 @2 @3 (view_convert:op_type @5)))))))
Index: gcc/config/aarch64/aarch64.c
===================================================================
--- gcc/config/aarch64/aarch64.c	2018-07-12 12:41:44.915485958 +0100
+++ gcc/config/aarch64/aarch64.c	2018-07-12 12:42:44.365009753 +0100
@@ -1320,14 +1320,18 @@ aarch64_get_mask_mode (poly_uint64 nunit
   return default_get_mask_mode (nunits, nbytes);
 }
 
-/* Implement TARGET_PREFERRED_ELSE_VALUE.  Prefer to use the first
-   arithmetic operand as the else value if the else value doesn't matter,
-   since that exactly matches the SVE destructive merging form.  */
+/* Implement TARGET_PREFERRED_ELSE_VALUE.  For binary operations,
+   prefer to use the first arithmetic operand as the else value if
+   the else value doesn't matter, since that exactly matches the SVE
+   destructive merging form.  For ternary operations we could either
+   pick the first operand and use FMAD-like instructions or the last
+   operand and use FMLA-like instructions; the latter seems more
+   natural.  */
 
 static tree
-aarch64_preferred_else_value (unsigned, tree, unsigned int, tree *ops)
+aarch64_preferred_else_value (unsigned, tree, unsigned int nops, tree *ops)
 {
-  return ops[0];
+  return nops == 3 ? ops[2] : ops[0];
 }
 
 /* Implement TARGET_HARD_REGNO_NREGS.  */
Index: gcc/config/aarch64/iterators.md
===================================================================
--- gcc/config/aarch64/iterators.md	2018-07-12 12:39:29.421374713 +0100
+++ gcc/config/aarch64/iterators.md	2018-07-12 12:42:44.365009753 +0100
@@ -471,6 +471,10 @@ (define_c_enum "unspec"
     UNSPEC_COND_DIV	; Used in aarch64-sve.md.
     UNSPEC_COND_MAX	; Used in aarch64-sve.md.
     UNSPEC_COND_MIN	; Used in aarch64-sve.md.
+    UNSPEC_COND_FMLA	; Used in aarch64-sve.md.
+    UNSPEC_COND_FMLS	; Used in aarch64-sve.md.
+    UNSPEC_COND_FNMLA	; Used in aarch64-sve.md.
+    UNSPEC_COND_FNMLS	; Used in aarch64-sve.md.
     UNSPEC_COND_LT	; Used in aarch64-sve.md.
     UNSPEC_COND_LE	; Used in aarch64-sve.md.
     UNSPEC_COND_EQ	; Used in aarch64-sve.md.
@@ -1567,6 +1571,11 @@ (define_int_iterator SVE_COND_FP_BINARY
 					 UNSPEC_COND_MUL UNSPEC_COND_DIV
 					 UNSPEC_COND_MAX UNSPEC_COND_MIN])
 
+(define_int_iterator SVE_COND_FP_TERNARY [UNSPEC_COND_FMLA
+					  UNSPEC_COND_FMLS
+					  UNSPEC_COND_FNMLA
+					  UNSPEC_COND_FNMLS])
+
 (define_int_iterator SVE_COND_FP_CMP [UNSPEC_COND_LT UNSPEC_COND_LE
 				      UNSPEC_COND_EQ UNSPEC_COND_NE
 				      UNSPEC_COND_GE UNSPEC_COND_GT])
@@ -1599,7 +1608,11 @@ (define_int_attr optab [(UNSPEC_ANDF "an
 			(UNSPEC_COND_MUL "mul")
 			(UNSPEC_COND_DIV "div")
 			(UNSPEC_COND_MAX "smax")
-			(UNSPEC_COND_MIN "smin")])
+			(UNSPEC_COND_MIN "smin")
+			(UNSPEC_COND_FMLA "fma")
+			(UNSPEC_COND_FMLS "fnma")
+			(UNSPEC_COND_FNMLA "fnms")
+			(UNSPEC_COND_FNMLS "fms")])
 
 (define_int_attr  maxmin_uns [(UNSPEC_UMAXV "umax")
 			      (UNSPEC_UMINV "umin")
@@ -1826,6 +1839,16 @@ (define_int_attr sve_fp_op_rev [(UNSPEC_
 			        (UNSPEC_COND_MAX "fmaxnm")
 			        (UNSPEC_COND_MIN "fminnm")])
 
+(define_int_attr sve_fmla_op [(UNSPEC_COND_FMLA "fmla")
+			      (UNSPEC_COND_FMLS "fmls")
+			      (UNSPEC_COND_FNMLA "fnmla")
+			      (UNSPEC_COND_FNMLS "fnmls")])
+
+(define_int_attr sve_fmad_op [(UNSPEC_COND_FMLA "fmad")
+			      (UNSPEC_COND_FMLS "fmsb")
+			      (UNSPEC_COND_FNMLA "fnmad")
+			      (UNSPEC_COND_FNMLS "fnmsb")])
+
 (define_int_attr commutative [(UNSPEC_COND_ADD "true")
 			      (UNSPEC_COND_SUB "false")
 			      (UNSPEC_COND_MUL "true")
Index: gcc/config/aarch64/aarch64-sve.md
===================================================================
--- gcc/config/aarch64/aarch64-sve.md	2018-07-12 12:39:29.423369885 +0100
+++ gcc/config/aarch64/aarch64-sve.md	2018-07-12 12:42:44.360201163 +0100
@@ -2906,6 +2906,101 @@ (define_insn_and_split "*cond_<optab><mo
 	  UNSPEC_SEL))]
 )
 
+;; Predicated floating-point ternary operations with select.
+(define_expand "cond_<optab><mode>"
+  [(set (match_operand:SVE_F 0 "register_operand")
+	(unspec:SVE_F
+	  [(match_operand:<VPRED> 1 "register_operand")
+	   (unspec:SVE_F
+	     [(match_operand:SVE_F 2 "register_operand")
+	      (match_operand:SVE_F 3 "register_operand")
+	      (match_operand:SVE_F 4 "register_operand")]
+	     SVE_COND_FP_TERNARY)
+	   (match_operand:SVE_F 5 "aarch64_simd_reg_or_zero")]
+	  UNSPEC_SEL))]
+  "TARGET_SVE"
+{
+  /* Swap the multiplication operands if the fallback value is the
+     second of the two.  */
+  if (rtx_equal_p (operands[3], operands[5]))
+    std::swap (operands[2], operands[3]);
+})
+
+;; Predicated floating-point ternary operations using the FMAD-like form.
+(define_insn "*cond_<optab><mode>_2"
+  [(set (match_operand:SVE_F 0 "register_operand" "=w, ?&w")
+	(unspec:SVE_F
+	  [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
+	   (unspec:SVE_F
+	     [(match_operand:SVE_F 2 "register_operand" "0, w")
+	      (match_operand:SVE_F 3 "register_operand" "w, w")
+	      (match_operand:SVE_F 4 "register_operand" "w, w")]
+	     SVE_COND_FP_TERNARY)
+	   (match_dup 2)]
+	  UNSPEC_SEL))]
+  "TARGET_SVE"
+  "@
+   <sve_fmad_op>\t%0.<Vetype>, %1/m, %3.<Vetype>, %4.<Vetype>
+   movprfx\t%0, %2\;<sve_fmad_op>\t%0.<Vetype>, %1/m, %3.<Vetype>, %4.<Vetype>"
+  [(set_attr "movprfx" "*,yes")]
+)
+
+;; Predicated floating-point ternary operations using the FMLA-like form.
+(define_insn "*cond_<optab><mode>_4"
+  [(set (match_operand:SVE_F 0 "register_operand" "=w, ?&w")
+	(unspec:SVE_F
+	  [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl")
+	   (unspec:SVE_F
+	     [(match_operand:SVE_F 2 "register_operand" "w, w")
+	      (match_operand:SVE_F 3 "register_operand" "w, w")
+	      (match_operand:SVE_F 4 "register_operand" "0, w")]
+	     SVE_COND_FP_TERNARY)
+	   (match_dup 4)]
+	  UNSPEC_SEL))]
+  "TARGET_SVE"
+  "@
+   <sve_fmla_op>\t%0.<Vetype>, %1/m, %2.<Vetype>, %3.<Vetype>
+   movprfx\t%0, %4\;<sve_fmla_op>\t%0.<Vetype>, %1/m, %2.<Vetype>, %3.<Vetype>"
+  [(set_attr "movprfx" "*,yes")]
+)
+
+;; Predicated floating-point ternary operations in which the value for
+;; inactive lanes is distinct from the other inputs.
+(define_insn_and_split "*cond_<optab><mode>_any"
+  [(set (match_operand:SVE_F 0 "register_operand" "=&w, &w, ?&w")
+	(unspec:SVE_F
+	  [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl, Upl")
+	   (unspec:SVE_F
+	     [(match_operand:SVE_F 2 "register_operand" "w, w, w")
+	      (match_operand:SVE_F 3 "register_operand" "w, w, w")
+	      (match_operand:SVE_F 4 "register_operand" "w, w, w")]
+	     SVE_COND_FP_TERNARY)
+	   (match_operand:SVE_F 5 "aarch64_simd_reg_or_zero" "Dz, 0, w")]
+	  UNSPEC_SEL))]
+  "TARGET_SVE
+   && !rtx_equal_p (operands[2], operands[5])
+   && !rtx_equal_p (operands[3], operands[5])
+   && !rtx_equal_p (operands[4], operands[5])"
+  "@
+   movprfx\t%0.<Vetype>, %1/z, %4.<Vetype>\;<sve_fmla_op>\t%0.<Vetype>, %1/m, %2.<Vetype>, %3.<Vetype>
+   movprfx\t%0.<Vetype>, %1/m, %4.<Vetype>\;<sve_fmla_op>\t%0.<Vetype>, %1/m, %2.<Vetype>, %3.<Vetype>
+   #"
+  "&& reload_completed
+   && !CONSTANT_P (operands[5])
+   && !rtx_equal_p (operands[0], operands[5])"
+  [(set (match_dup 0)
+	(unspec:SVE_F [(match_dup 1) (match_dup 4) (match_dup 5)] UNSPEC_SEL))
+   (set (match_dup 0)
+	(unspec:SVE_F
+	  [(match_dup 1)
+	   (unspec:SVE_F [(match_dup 2) (match_dup 3) (match_dup 0)]
+			 SVE_COND_FP_TERNARY)
+           (match_dup 0)]
+	  UNSPEC_SEL))]
+  ""
+  [(set_attr "movprfx" "yes")]
+)
+
 ;; Shift an SVE vector left and insert a scalar into element 0.
 (define_insn "vec_shl_insert_<mode>"
   [(set (match_operand:SVE_ALL 0 "register_operand" "=w, w")
Index: gcc/testsuite/gcc.dg/vect/vect-cond-arith-3.c
===================================================================
--- /dev/null	2018-06-15 06:21:49.150735301 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-cond-arith-3.c	2018-07-12 12:42:44.368856626 +0100
@@ -0,0 +1,63 @@
+/* { dg-require-effective-target scalar_all_fma } */
+/* { dg-additional-options "-fdump-tree-optimized" } */
+
+#include "tree-vect.h"
+
+#define N (VECTOR_BITS * 11 / 64 + 3)
+
+#define DEF(INV)					\
+  void __attribute__ ((noipa))				\
+  f_##INV (double *restrict a, double *restrict b,	\
+	   double *restrict c, double *restrict d)	\
+  {							\
+    for (int i = 0; i < N; ++i)				\
+      {							\
+	double mb = (INV & 1 ? -b[i] : b[i]);		\
+	double mc = c[i];				\
+	double md = (INV & 2 ? -d[i] : d[i]);		\
+	double fma = __builtin_fma (mb, mc, md);	\
+	double truev = (INV & 4 ? -fma : fma);		\
+	a[i] = b[i] < 10 ? truev : 10.0;		\
+      }							\
+  }
+
+#define TEST(INV)					\
+  {							\
+    f_##INV (a, b, c, d);				\
+    for (int i = 0; i < N; ++i)				\
+      {							\
+	double mb = (INV & 1 ? -b[i] : b[i]);		\
+	double mc = c[i];				\
+	double md = (INV & 2 ? -d[i] : d[i]);		\
+	double fma = __builtin_fma (mb, mc, md);	\
+	double truev = (INV & 4 ? -fma : fma);		\
+	if (a[i] != (i % 17 < 10 ? truev : 10.0))	\
+	  __builtin_abort ();				\
+	asm volatile ("" ::: "memory");			\
+      }							\
+  }
+
+#define FOR_EACH_INV(T) \
+  T (0) T (1) T (2) T (3) T (4) T (5) T (6) T (7)
+
+FOR_EACH_INV (DEF)
+
+int
+main (void)
+{
+  double a[N], b[N], c[N], d[N];
+  for (int i = 0; i < N; ++i)
+    {
+      b[i] = i % 17;
+      c[i] = i % 9 + 11;
+      d[i] = i % 13 + 14;
+      asm volatile ("" ::: "memory");
+    }
+  FOR_EACH_INV (TEST)
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times { = \.COND_FMA } 2 "optimized" { target vect_double_cond_arith } } } */
+/* { dg-final { scan-tree-dump-times { = \.COND_FMS } 2 "optimized" { target vect_double_cond_arith } } } */
+/* { dg-final { scan-tree-dump-times { = \.COND_FNMA } 2 "optimized" { target vect_double_cond_arith } } } */
+/* { dg-final { scan-tree-dump-times { = \.COND_FNMS } 2 "optimized" { target vect_double_cond_arith } } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_13.c
===================================================================
--- /dev/null	2018-06-15 06:21:49.150735301 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/vcond_13.c	2018-07-12 12:42:44.368856626 +0100
@@ -0,0 +1,58 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#define N 119
+
+#define DEF_LOOP(INV, TYPE, CMPTYPE, SUFFIX)			\
+  void __attribute__ ((noipa))					\
+  f_##INV##_##SUFFIX (TYPE *restrict a, TYPE *restrict b,	\
+		      TYPE *restrict c, TYPE *restrict d,	\
+		      CMPTYPE *restrict cond)			\
+  {								\
+    for (int i = 0; i < N; ++i)					\
+      {								\
+	TYPE mb = (INV & 1 ? -b[i] : b[i]);			\
+	TYPE mc = c[i];						\
+	TYPE md = (INV & 2 ? -d[i] : d[i]);			\
+	TYPE fma = __builtin_fma##SUFFIX (mb, mc, md);		\
+	TYPE truev = (INV & 4 ? -fma : fma);			\
+	a[i] = cond[i] < 10 ? truev : b[i];			\
+      }								\
+  }
+
+#define FOR_EACH_TYPE(T, INV)			\
+  T (INV, _Float16, short, f16)			\
+  T (INV, float, float, f32)			\
+  T (INV, double, double, f64)
+
+#define FOR_EACH_INV(T)				\
+  FOR_EACH_TYPE (T, 0)				\
+  FOR_EACH_TYPE (T, 1)				\
+  FOR_EACH_TYPE (T, 2)				\
+  FOR_EACH_TYPE (T, 3)				\
+  FOR_EACH_TYPE (T, 4)				\
+  FOR_EACH_TYPE (T, 5)				\
+  FOR_EACH_TYPE (T, 6)				\
+  FOR_EACH_TYPE (T, 7)
+
+FOR_EACH_INV (DEF_LOOP)
+
+/* { dg-final { scan-assembler-not {\tsel\t} } } */
+/* { dg-final { scan-assembler-not {\tmovprfx\t} } } */
+/* { dg-final { scan-assembler-not {\tmov\tz[0-9]+\.., z[0-9]+} } } */
+
+/* { dg-final { scan-assembler-times {\tfmad\tz[0-9]+\.h,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfmad\tz[0-9]+\.s,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfmad\tz[0-9]+\.d,} 2 } } */
+
+/* { dg-final { scan-assembler-times {\tfmsb\tz[0-9]+\.h,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfmsb\tz[0-9]+\.s,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfmsb\tz[0-9]+\.d,} 2 } } */
+
+/* { dg-final { scan-assembler-times {\tfnmad\tz[0-9]+\.h,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfnmad\tz[0-9]+\.s,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfnmad\tz[0-9]+\.d,} 2 } } */
+
+/* { dg-final { scan-assembler-times {\tfnmsb\tz[0-9]+\.h,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfnmsb\tz[0-9]+\.s,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfnmsb\tz[0-9]+\.d,} 2 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_13_run.c
===================================================================
--- /dev/null	2018-06-15 06:21:49.150735301 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/vcond_13_run.c	2018-07-12 12:42:44.368856626 +0100
@@ -0,0 +1,37 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#include "vcond_13.c"
+
+#define TEST_LOOP(INV, TYPE, CMPTYPE, SUFFIX)		\
+  {							\
+    TYPE a[N], b[N], c[N], d[N];			\
+    CMPTYPE cond[N];					\
+    for (int i = 0; i < N; ++i)				\
+      {							\
+	b[i] = i % 15;					\
+	c[i] = i % 9 + 11;				\
+	d[i] = i % 13 + 14;				\
+	cond[i] = i % 17;				\
+	asm volatile ("" ::: "memory");			\
+      }							\
+    f_##INV##_##SUFFIX (a, b, c, d, cond);		\
+    for (int i = 0; i < N; ++i)				\
+      {							\
+	double mb = (INV & 1 ? -b[i] : b[i]);		\
+	double mc = c[i];				\
+	double md = (INV & 2 ? -d[i] : d[i]);		\
+	double fma = __builtin_fma (mb, mc, md);	\
+	double truev = (INV & 4 ? -fma : fma);		\
+	if (a[i] != (i % 17 < 10 ? truev : b[i]))	\
+	  __builtin_abort ();				\
+	asm volatile ("" ::: "memory");			\
+      }							\
+  }
+
+int
+main (void)
+{
+  FOR_EACH_INV (TEST_LOOP)
+  return 0;
+}
Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_14.c
===================================================================
--- /dev/null	2018-06-15 06:21:49.150735301 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/vcond_14.c	2018-07-12 12:42:44.368856626 +0100
@@ -0,0 +1,58 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#define N 119
+
+#define DEF_LOOP(INV, TYPE, CMPTYPE, SUFFIX)			\
+  void __attribute__ ((noipa))					\
+  f_##INV##_##SUFFIX (TYPE *restrict a, TYPE *restrict b,	\
+		      TYPE *restrict c, TYPE *restrict d,	\
+		      CMPTYPE *restrict cond)			\
+  {								\
+    for (int i = 0; i < N; ++i)					\
+      {								\
+	TYPE mb = (INV & 1 ? -b[i] : b[i]);			\
+	TYPE mc = c[i];						\
+	TYPE md = (INV & 2 ? -d[i] : d[i]);			\
+	TYPE fma = __builtin_fma##SUFFIX (mb, mc, md);		\
+	TYPE truev = (INV & 4 ? -fma : fma);			\
+	a[i] = cond[i] < 10 ? truev : c[i];			\
+      }								\
+  }
+
+#define FOR_EACH_TYPE(T, INV)			\
+  T (INV, _Float16, short, f16)			\
+  T (INV, float, float, f32)			\
+  T (INV, double, double, f64)
+
+#define FOR_EACH_INV(T)				\
+  FOR_EACH_TYPE (T, 0)				\
+  FOR_EACH_TYPE (T, 1)				\
+  FOR_EACH_TYPE (T, 2)				\
+  FOR_EACH_TYPE (T, 3)				\
+  FOR_EACH_TYPE (T, 4)				\
+  FOR_EACH_TYPE (T, 5)				\
+  FOR_EACH_TYPE (T, 6)				\
+  FOR_EACH_TYPE (T, 7)
+
+FOR_EACH_INV (DEF_LOOP)
+
+/* { dg-final { scan-assembler-not {\tsel\t} } } */
+/* { dg-final { scan-assembler-not {\tmovprfx\t} } } */
+/* { dg-final { scan-assembler-not {\tmov\tz[0-9]+\.., z[0-9]+} } } */
+
+/* { dg-final { scan-assembler-times {\tfmad\tz[0-9]+\.h,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfmad\tz[0-9]+\.s,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfmad\tz[0-9]+\.d,} 2 } } */
+
+/* { dg-final { scan-assembler-times {\tfmsb\tz[0-9]+\.h,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfmsb\tz[0-9]+\.s,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfmsb\tz[0-9]+\.d,} 2 } } */
+
+/* { dg-final { scan-assembler-times {\tfnmad\tz[0-9]+\.h,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfnmad\tz[0-9]+\.s,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfnmad\tz[0-9]+\.d,} 2 } } */
+
+/* { dg-final { scan-assembler-times {\tfnmsb\tz[0-9]+\.h,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfnmsb\tz[0-9]+\.s,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfnmsb\tz[0-9]+\.d,} 2 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_14_run.c
===================================================================
--- /dev/null	2018-06-15 06:21:49.150735301 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/vcond_14_run.c	2018-07-12 12:42:44.368856626 +0100
@@ -0,0 +1,37 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#include "vcond_14.c"
+
+#define TEST_LOOP(INV, TYPE, CMPTYPE, SUFFIX)		\
+  {							\
+    TYPE a[N], b[N], c[N], d[N];			\
+    CMPTYPE cond[N];					\
+    for (int i = 0; i < N; ++i)				\
+      {							\
+	b[i] = i % 15;					\
+	c[i] = i % 9 + 11;				\
+	d[i] = i % 13 + 14;				\
+	cond[i] = i % 17;				\
+	asm volatile ("" ::: "memory");			\
+      }							\
+    f_##INV##_##SUFFIX (a, b, c, d, cond);		\
+    for (int i = 0; i < N; ++i)				\
+      {							\
+	double mb = (INV & 1 ? -b[i] : b[i]);		\
+	double mc = c[i];				\
+	double md = (INV & 2 ? -d[i] : d[i]);		\
+	double fma = __builtin_fma (mb, mc, md);	\
+	double truev = (INV & 4 ? -fma : fma);		\
+	if (a[i] != (i % 17 < 10 ? truev : c[i]))	\
+	  __builtin_abort ();				\
+	asm volatile ("" ::: "memory");			\
+      }							\
+  }
+
+int
+main (void)
+{
+  FOR_EACH_INV (TEST_LOOP)
+  return 0;
+}
Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_15.c
===================================================================
--- /dev/null	2018-06-15 06:21:49.150735301 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/vcond_15.c	2018-07-12 12:42:44.369818344 +0100
@@ -0,0 +1,58 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#define N 119
+
+#define DEF_LOOP(INV, TYPE, CMPTYPE, SUFFIX)			\
+  void __attribute__ ((noipa))					\
+  f_##INV##_##SUFFIX (TYPE *restrict a, TYPE *restrict b,	\
+		      TYPE *restrict c, TYPE *restrict d,	\
+		      CMPTYPE *restrict cond)			\
+  {								\
+    for (int i = 0; i < N; ++i)					\
+      {								\
+	TYPE mb = (INV & 1 ? -b[i] : b[i]);			\
+	TYPE mc = c[i];						\
+	TYPE md = (INV & 2 ? -d[i] : d[i]);			\
+	TYPE fma = __builtin_fma##SUFFIX (mb, mc, md);		\
+	TYPE truev = (INV & 4 ? -fma : fma);			\
+	a[i] = cond[i] < 10 ? truev : d[i];			\
+      }								\
+  }
+
+#define FOR_EACH_TYPE(T, INV)			\
+  T (INV, _Float16, short, f16)			\
+  T (INV, float, float, f32)			\
+  T (INV, double, double, f64)
+
+#define FOR_EACH_INV(T)				\
+  FOR_EACH_TYPE (T, 0)				\
+  FOR_EACH_TYPE (T, 1)				\
+  FOR_EACH_TYPE (T, 2)				\
+  FOR_EACH_TYPE (T, 3)				\
+  FOR_EACH_TYPE (T, 4)				\
+  FOR_EACH_TYPE (T, 5)				\
+  FOR_EACH_TYPE (T, 6)				\
+  FOR_EACH_TYPE (T, 7)
+
+FOR_EACH_INV (DEF_LOOP)
+
+/* { dg-final { scan-assembler-not {\tsel\t} } } */
+/* { dg-final { scan-assembler-not {\tmovprfx\t} } } */
+/* { dg-final { scan-assembler-not {\tmov\tz[0-9]+\.., z[0-9]+} } } */
+
+/* { dg-final { scan-assembler-times {\tfmla\tz[0-9]+\.h,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfmla\tz[0-9]+\.s,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfmla\tz[0-9]+\.d,} 2 } } */
+
+/* { dg-final { scan-assembler-times {\tfmls\tz[0-9]+\.h,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfmls\tz[0-9]+\.s,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfmls\tz[0-9]+\.d,} 2 } } */
+
+/* { dg-final { scan-assembler-times {\tfnmla\tz[0-9]+\.h,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfnmla\tz[0-9]+\.s,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfnmla\tz[0-9]+\.d,} 2 } } */
+
+/* { dg-final { scan-assembler-times {\tfnmls\tz[0-9]+\.h,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfnmls\tz[0-9]+\.s,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfnmls\tz[0-9]+\.d,} 2 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_15_run.c
===================================================================
--- /dev/null	2018-06-15 06:21:49.150735301 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/vcond_15_run.c	2018-07-12 12:42:44.369818344 +0100
@@ -0,0 +1,37 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#include "vcond_15.c"
+
+#define TEST_LOOP(INV, TYPE, CMPTYPE, SUFFIX)		\
+  {							\
+    TYPE a[N], b[N], c[N], d[N];			\
+    CMPTYPE cond[N];					\
+    for (int i = 0; i < N; ++i)				\
+      {							\
+	b[i] = i % 15;					\
+	c[i] = i % 9 + 11;				\
+	d[i] = i % 13 + 14;				\
+	cond[i] = i % 17;				\
+	asm volatile ("" ::: "memory");			\
+      }							\
+    f_##INV##_##SUFFIX (a, b, c, d, cond);		\
+    for (int i = 0; i < N; ++i)				\
+      {							\
+	double mb = (INV & 1 ? -b[i] : b[i]);		\
+	double mc = c[i];				\
+	double md = (INV & 2 ? -d[i] : d[i]);		\
+	double fma = __builtin_fma (mb, mc, md);	\
+	double truev = (INV & 4 ? -fma : fma);		\
+	if (a[i] != (i % 17 < 10 ? truev : d[i]))	\
+	  __builtin_abort ();				\
+	asm volatile ("" ::: "memory");			\
+      }							\
+  }
+
+int
+main (void)
+{
+  FOR_EACH_INV (TEST_LOOP)
+  return 0;
+}
Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_16.c
===================================================================
--- /dev/null	2018-06-15 06:21:49.150735301 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/vcond_16.c	2018-07-12 12:42:44.369818344 +0100
@@ -0,0 +1,58 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#define N 119
+
+#define DEF_LOOP(INV, TYPE, CMPTYPE, SUFFIX)			\
+  void __attribute__ ((noipa))					\
+  f_##INV##_##SUFFIX (TYPE *restrict a, TYPE *restrict b,	\
+		      TYPE *restrict c, TYPE *restrict d,	\
+		      CMPTYPE *restrict cond)			\
+  {								\
+    for (int i = 0; i < N; ++i)					\
+      {								\
+	TYPE mb = (INV & 1 ? -b[i] : b[i]);			\
+	TYPE mc = c[i];						\
+	TYPE md = (INV & 2 ? -d[i] : d[i]);			\
+	TYPE fma = __builtin_fma##SUFFIX (mb, mc, md);		\
+	TYPE truev = (INV & 4 ? -fma : fma);			\
+	a[i] = cond[i] < 10 ? truev : 10;			\
+      }								\
+  }
+
+#define FOR_EACH_TYPE(T, INV)			\
+  T (INV, _Float16, short, f16)			\
+  T (INV, float, float, f32)			\
+  T (INV, double, double, f64)
+
+#define FOR_EACH_INV(T)				\
+  FOR_EACH_TYPE (T, 0)				\
+  FOR_EACH_TYPE (T, 1)				\
+  FOR_EACH_TYPE (T, 2)				\
+  FOR_EACH_TYPE (T, 3)				\
+  FOR_EACH_TYPE (T, 4)				\
+  FOR_EACH_TYPE (T, 5)				\
+  FOR_EACH_TYPE (T, 6)				\
+  FOR_EACH_TYPE (T, 7)
+
+FOR_EACH_INV (DEF_LOOP)
+
+/* { dg-final { scan-assembler-times {\tsel\t} 24 } } */
+/* { dg-final { scan-assembler-not {\tmovprfx\t} } } */
+/* { dg-final { scan-assembler-not {\tmov\tz[0-9]+\.., z[0-9]+} } } */
+
+/* { dg-final { scan-assembler-times {\tfmla\tz[0-9]+\.h,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfmla\tz[0-9]+\.s,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfmla\tz[0-9]+\.d,} 2 } } */
+
+/* { dg-final { scan-assembler-times {\tfmls\tz[0-9]+\.h,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfmls\tz[0-9]+\.s,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfmls\tz[0-9]+\.d,} 2 } } */
+
+/* { dg-final { scan-assembler-times {\tfnmla\tz[0-9]+\.h,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfnmla\tz[0-9]+\.s,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfnmla\tz[0-9]+\.d,} 2 } } */
+
+/* { dg-final { scan-assembler-times {\tfnmls\tz[0-9]+\.h,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfnmls\tz[0-9]+\.s,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfnmls\tz[0-9]+\.d,} 2 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_16_run.c
===================================================================
--- /dev/null	2018-06-15 06:21:49.150735301 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/vcond_16_run.c	2018-07-12 12:42:44.369818344 +0100
@@ -0,0 +1,37 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#include "vcond_16.c"
+
+#define TEST_LOOP(INV, TYPE, CMPTYPE, SUFFIX)		\
+  {							\
+    TYPE a[N], b[N], c[N], d[N];			\
+    CMPTYPE cond[N];					\
+    for (int i = 0; i < N; ++i)				\
+      {							\
+	b[i] = i % 15;					\
+	c[i] = i % 9 + 11;				\
+	d[i] = i % 13 + 14;				\
+	cond[i] = i % 17;				\
+	asm volatile ("" ::: "memory");			\
+      }							\
+    f_##INV##_##SUFFIX (a, b, c, d, cond);		\
+    for (int i = 0; i < N; ++i)				\
+      {							\
+	double mb = (INV & 1 ? -b[i] : b[i]);		\
+	double mc = c[i];				\
+	double md = (INV & 2 ? -d[i] : d[i]);		\
+	double fma = __builtin_fma (mb, mc, md);	\
+	double truev = (INV & 4 ? -fma : fma);		\
+	if (a[i] != (i % 17 < 10 ? truev : 10))		\
+	  __builtin_abort ();				\
+	asm volatile ("" ::: "memory");			\
+      }							\
+  }
+
+int
+main (void)
+{
+  FOR_EACH_INV (TEST_LOOP)
+  return 0;
+}

Patch

Index: gcc/doc/md.texi
===================================================================
--- gcc/doc/md.texi	2018-05-24 10:12:10.142352315 +0100
+++ gcc/doc/md.texi	2018-05-24 13:05:46.047607587 +0100
@@ -6386,6 +6386,23 @@  Operands 0, 2, 3 and 4 all have mode @va
 integer if @var{m} is scalar, otherwise it has the mode returned by
 @code{TARGET_VECTORIZE_GET_MASK_MODE}.
 
+@cindex @code{cond_fma@var{mode}} instruction pattern
+@cindex @code{cond_fms@var{mode}} instruction pattern
+@cindex @code{cond_fnma@var{mode}} instruction pattern
+@cindex @code{cond_fnms@var{mode}} instruction pattern
+@item @samp{cond_fma@var{mode}}
+@itemx @samp{cond_fms@var{mode}}
+@itemx @samp{cond_fnma@var{mode}}
+@itemx @samp{cond_fnms@var{mode}}
+Like @samp{cond_add@var{m}}, except that the conditional operation
+takes three operands rather than two.  For example, the vector form of
+@samp{cond_fma@var{mode}} is equivalent to:
+
+@smallexample
+for (i = 0; i < GET_MODE_NUNITS (@var{m}); i++)
+  op0[i] = op1[i] ? fma (op2[i], op3[i], op4[i]) : op5[i];
+@end smallexample
+
 @cindex @code{neg@var{mode}cc} instruction pattern
 @item @samp{neg@var{mode}cc}
 Similar to @samp{mov@var{mode}cc} but for conditional negation.  Conditionally
Index: gcc/optabs.def
===================================================================
--- gcc/optabs.def	2018-05-24 10:12:10.146352152 +0100
+++ gcc/optabs.def	2018-05-24 13:05:46.049605128 +0100
@@ -234,6 +234,10 @@  OPTAB_D (cond_smin_optab, "cond_smin$a")
 OPTAB_D (cond_smax_optab, "cond_smax$a")
 OPTAB_D (cond_umin_optab, "cond_umin$a")
 OPTAB_D (cond_umax_optab, "cond_umax$a")
+OPTAB_D (cond_fma_optab, "cond_fma$a")
+OPTAB_D (cond_fms_optab, "cond_fms$a")
+OPTAB_D (cond_fnma_optab, "cond_fnma$a")
+OPTAB_D (cond_fnms_optab, "cond_fnms$a")
 OPTAB_D (cmov_optab, "cmov$a6")
 OPTAB_D (cstore_optab, "cstore$a4")
 OPTAB_D (ctrap_optab, "ctrap$a4")
Index: gcc/internal-fn.def
===================================================================
--- gcc/internal-fn.def	2018-05-24 10:12:10.146352152 +0100
+++ gcc/internal-fn.def	2018-05-24 13:05:46.048606357 +0100
@@ -59,7 +59,8 @@  along with GCC; see the file COPYING3.
    - binary: a normal binary optab, such as vec_interleave_lo_<mode>
    - ternary: a normal ternary optab, such as fma<mode>4
 
-   - cond_binary: a conditional binary optab, such as add<mode>cc
+   - cond_binary: a conditional binary optab, such as cond_add<mode>
+   - cond_ternary: a conditional ternary optab, such as cond_fma<mode>
 
    - fold_left: for scalar = FN (scalar, vector), keyed off the vector mode
 
@@ -162,6 +163,11 @@  DEF_INTERNAL_OPTAB_FN (COND_IOR, ECF_CON
 DEF_INTERNAL_OPTAB_FN (COND_XOR, ECF_CONST | ECF_NOTHROW,
 		       cond_xor, cond_binary)
 
+DEF_INTERNAL_OPTAB_FN (COND_FMA, ECF_CONST, cond_fma, cond_ternary)
+DEF_INTERNAL_OPTAB_FN (COND_FMS, ECF_CONST, cond_fms, cond_ternary)
+DEF_INTERNAL_OPTAB_FN (COND_FNMA, ECF_CONST, cond_fnma, cond_ternary)
+DEF_INTERNAL_OPTAB_FN (COND_FNMS, ECF_CONST, cond_fnms, cond_ternary)
+
 DEF_INTERNAL_OPTAB_FN (RSQRT, ECF_CONST, rsqrt, unary)
 
 DEF_INTERNAL_OPTAB_FN (REDUC_PLUS, ECF_CONST | ECF_NOTHROW,
@@ -230,7 +236,7 @@  DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONS
 DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
 
 /* Ternary math functions.  */
-DEF_INTERNAL_FLT_FN (FMA, ECF_CONST, fma, ternary)
+DEF_INTERNAL_FLT_FLOATN_FN (FMA, ECF_CONST, fma, ternary)
 
 /* Unary integer ops.  */
 DEF_INTERNAL_INT_FN (CLRSB, ECF_CONST | ECF_NOTHROW, clrsb, unary)
Index: gcc/internal-fn.h
===================================================================
--- gcc/internal-fn.h	2018-05-24 10:33:30.870095164 +0100
+++ gcc/internal-fn.h	2018-05-24 13:05:46.049605128 +0100
@@ -193,7 +193,9 @@  direct_internal_fn_supported_p (internal
 extern bool set_edom_supported_p (void);
 
 extern internal_fn get_conditional_internal_fn (tree_code);
+extern internal_fn get_conditional_internal_fn (internal_fn);
 extern tree_code conditional_internal_fn_code (internal_fn);
+extern internal_fn get_unconditional_internal_fn (internal_fn);
 
 extern bool internal_load_fn_p (internal_fn);
 extern bool internal_store_fn_p (internal_fn);
Index: gcc/internal-fn.c
===================================================================
--- gcc/internal-fn.c	2018-05-24 10:33:30.870095164 +0100
+++ gcc/internal-fn.c	2018-05-24 13:05:46.048606357 +0100
@@ -113,6 +113,7 @@  #define binary_direct { 0, 0, true }
 #define ternary_direct { 0, 0, true }
 #define cond_unary_direct { 1, 1, true }
 #define cond_binary_direct { 1, 1, true }
+#define cond_ternary_direct { 1, 1, true }
 #define while_direct { 0, 2, false }
 #define fold_extract_direct { 2, 2, false }
 #define fold_left_direct { 1, 1, false }
@@ -2993,6 +2994,9 @@  #define expand_cond_unary_optab_fn(FN, S
 #define expand_cond_binary_optab_fn(FN, STMT, OPTAB) \
   expand_direct_optab_fn (FN, STMT, OPTAB, 4)
 
+#define expand_cond_ternary_optab_fn(FN, STMT, OPTAB) \
+  expand_direct_optab_fn (FN, STMT, OPTAB, 5)
+
 #define expand_fold_extract_optab_fn(FN, STMT, OPTAB) \
   expand_direct_optab_fn (FN, STMT, OPTAB, 3)
 
@@ -3075,6 +3079,7 @@  #define direct_binary_optab_supported_p
 #define direct_ternary_optab_supported_p direct_optab_supported_p
 #define direct_cond_unary_optab_supported_p direct_optab_supported_p
 #define direct_cond_binary_optab_supported_p direct_optab_supported_p
+#define direct_cond_ternary_optab_supported_p direct_optab_supported_p
 #define direct_mask_load_optab_supported_p direct_optab_supported_p
 #define direct_load_lanes_optab_supported_p multi_vector_optab_supported_p
 #define direct_mask_load_lanes_optab_supported_p multi_vector_optab_supported_p
@@ -3277,6 +3282,57 @@  #define CASE(CODE, IFN) case IFN: return
     }
 }
 
+/* Invoke T(IFN) for each internal function IFN that also has an
+   IFN_COND_* form.  */
+#define FOR_EACH_COND_FN_PAIR(T) \
+  T (FMA) \
+  T (FMS) \
+  T (FNMA) \
+  T (FNMS)
+
+/* Return a function that only performs internal function FN when a
+   certain condition is met and that uses a given fallback value otherwise.
+   In other words, the returned function FN' is such that:
+
+     LHS = FN' (COND, A1, ... An, ELSE)
+
+   is equivalent to the C expression:
+
+     LHS = COND ? FN (A1, ..., An) : ELSE;
+
+   operating elementwise if the operands are vectors.
+
+   Return IFN_LAST if no such function exists.  */
+
+internal_fn
+get_conditional_internal_fn (internal_fn fn)
+{
+  switch (fn)
+    {
+#define CASE(NAME) case IFN_##NAME: return IFN_COND_##NAME;
+      FOR_EACH_COND_FN_PAIR(CASE)
+#undef CASE
+    default:
+      return IFN_LAST;
+    }
+}
+
+/* If IFN implements the conditional form of an unconditional internal
+   function, return that unconditional function, otherwise return IFN_LAST.  */
+
+internal_fn
+get_unconditional_internal_fn (internal_fn ifn)
+{
+  switch (ifn)
+    {
+#define CASE(NAME) case IFN_COND_##NAME: return IFN_##NAME;
+      FOR_EACH_COND_FN_PAIR(CASE)
+#undef CASE
+    default:
+      return IFN_LAST;
+    }
+}
+
 /* Return true if IFN is some form of load from memory.  */
 
 bool
Index: gcc/gimple-match.h
===================================================================
--- gcc/gimple-match.h	2018-05-24 10:33:30.870095164 +0100
+++ gcc/gimple-match.h	2018-05-24 13:05:46.048606357 +0100
@@ -91,18 +91,21 @@  struct gimple_match_op
 		   code_helper, tree, tree, tree, tree);
   gimple_match_op (const gimple_match_cond &,
 		   code_helper, tree, tree, tree, tree, tree);
+  gimple_match_op (const gimple_match_cond &,
+		   code_helper, tree, tree, tree, tree, tree, tree);
 
   void set_op (code_helper, tree, unsigned int);
   void set_op (code_helper, tree, tree);
   void set_op (code_helper, tree, tree, tree);
   void set_op (code_helper, tree, tree, tree, tree);
   void set_op (code_helper, tree, tree, tree, tree, tree);
+  void set_op (code_helper, tree, tree, tree, tree, tree, tree);
   void set_value (tree);
 
   tree op_or_null (unsigned int) const;
 
   /* The maximum value of NUM_OPS.  */
-  static const unsigned int MAX_NUM_OPS = 4;
+  static const unsigned int MAX_NUM_OPS = 5;
 
   /* The conditions under which the operation is performed, and the value to
      use as a fallback.  */
@@ -182,6 +185,20 @@  gimple_match_op::gimple_match_op (const
   ops[3] = op3;
 }
 
+inline
+gimple_match_op::gimple_match_op (const gimple_match_cond &cond_in,
+				  code_helper code_in, tree type_in,
+				  tree op0, tree op1, tree op2, tree op3,
+				  tree op4)
+  : cond (cond_in), code (code_in), type (type_in), num_ops (5)
+{
+  ops[0] = op0;
+  ops[1] = op1;
+  ops[2] = op2;
+  ops[3] = op3;
+  ops[4] = op4;
+}
+
 /* Change the operation performed to CODE_IN, the type of the result to
    TYPE_IN, and the number of operands to NUM_OPS_IN.  The caller needs
    to set the operands itself.  */
@@ -242,6 +259,20 @@  gimple_match_op::set_op (code_helper cod
   ops[3] = op3;
 }
 
+inline void
+gimple_match_op::set_op (code_helper code_in, tree type_in,
+			 tree op0, tree op1, tree op2, tree op3, tree op4)
+{
+  code = code_in;
+  type = type_in;
+  num_ops = 5;
+  ops[0] = op0;
+  ops[1] = op1;
+  ops[2] = op2;
+  ops[3] = op3;
+  ops[4] = op4;
+}
+
 /* Set the "operation" to be the single value VALUE, such as a constant
    or SSA_NAME.  */
 
@@ -279,6 +310,7 @@  bool gimple_resimplify1 (gimple_seq *, g
 bool gimple_resimplify2 (gimple_seq *, gimple_match_op *, tree (*)(tree));
 bool gimple_resimplify3 (gimple_seq *, gimple_match_op *, tree (*)(tree));
 bool gimple_resimplify4 (gimple_seq *, gimple_match_op *, tree (*)(tree));
+bool gimple_resimplify5 (gimple_seq *, gimple_match_op *, tree (*)(tree));
 tree maybe_push_res_to_seq (gimple_match_op *, gimple_seq *,
 			    tree res = NULL_TREE);
 void maybe_build_generic_op (gimple_match_op *);
Index: gcc/genmatch.c
===================================================================
--- gcc/genmatch.c	2018-05-24 10:33:30.869095197 +0100
+++ gcc/genmatch.c	2018-05-24 13:05:46.048606357 +0100
@@ -3760,7 +3760,7 @@  decision_tree::gen (FILE *f, bool gimple
     }
   fprintf (stderr, "removed %u duplicate tails\n", rcnt);
 
-  for (unsigned n = 1; n <= 4; ++n)
+  for (unsigned n = 1; n <= 5; ++n)
     {
       /* First generate split-out functions.  */
       for (unsigned i = 0; i < root->kids.length (); i++)
Index: gcc/gimple-match-head.c
===================================================================
--- gcc/gimple-match-head.c	2018-05-24 10:33:30.870095164 +0100
+++ gcc/gimple-match-head.c	2018-05-24 13:05:46.048606357 +0100
@@ -54,6 +54,8 @@  static bool gimple_simplify (gimple_matc
 			     code_helper, tree, tree, tree, tree);
 static bool gimple_simplify (gimple_match_op *, gimple_seq *, tree (*)(tree),
 			     code_helper, tree, tree, tree, tree, tree);
+static bool gimple_simplify (gimple_match_op *, gimple_seq *, tree (*)(tree),
+			     code_helper, tree, tree, tree, tree, tree, tree);
 
 const unsigned int gimple_match_op::MAX_NUM_OPS;
 
@@ -80,7 +82,12 @@  convert_conditional_op (gimple_match_op
   if (orig_op->code.is_tree_code ())
     ifn = get_conditional_internal_fn ((tree_code) orig_op->code);
   else
-    return false;
+    {
+      combined_fn cfn = orig_op->code;
+      if (!internal_fn_p (cfn))
+	return false;
+      ifn = get_conditional_internal_fn (as_internal_fn (cfn));
+    }
   if (ifn == IFN_LAST)
     return false;
   unsigned int num_ops = orig_op->num_ops;
@@ -347,6 +354,34 @@  gimple_resimplify4 (gimple_seq *seq, gim
   return false;
 }
 
+/* Helper that matches and simplifies the toplevel result from
+   a gimple_simplify run (where we don't want to build
+   a stmt in case it's used in in-place folding).  Replaces
+   RES_OP with a simplified and/or canonicalized result and
+   returns whether any change was made.  */
+
+bool
+gimple_resimplify5 (gimple_seq *seq, gimple_match_op *res_op,
+		    tree (*valueize)(tree))
+{
+  /* No constant folding is defined for five-operand functions.  */
+
+  gimple_match_op res_op2 (*res_op);
+  if (gimple_simplify (&res_op2, seq, valueize,
+		       res_op->code, res_op->type,
+		       res_op->ops[0], res_op->ops[1], res_op->ops[2],
+		       res_op->ops[3], res_op->ops[4]))
+    {
+      *res_op = res_op2;
+      return true;
+    }
+
+  if (maybe_resimplify_conditional_op (seq, res_op, valueize))
+    return true;
+
+  return false;
+}
+
 /* If in GIMPLE the operation described by RES_OP should be single-rhs,
    build a GENERIC tree for that expression and update RES_OP accordingly.  */
 
@@ -388,7 +423,8 @@  build_call_internal (internal_fn fn, gim
 				     res_op->op_or_null (0),
 				     res_op->op_or_null (1),
 				     res_op->op_or_null (2),
-				     res_op->op_or_null (3));
+				     res_op->op_or_null (3),
+				     res_op->op_or_null (4));
 }
 
 /* Push the exploded expression described by RES_OP as a statement to
@@ -482,7 +518,8 @@  maybe_push_res_to_seq (gimple_match_op *
 					res_op->op_or_null (0),
 					res_op->op_or_null (1),
 					res_op->op_or_null (2),
-					res_op->op_or_null (3));
+					res_op->op_or_null (3),
+					res_op->op_or_null (4));
 	}
       if (!res)
 	{
@@ -689,14 +726,22 @@  do_valueize (tree op, tree (*valueize)(t
 try_conditional_simplification (internal_fn ifn, gimple_match_op *res_op,
 				gimple_seq *seq, tree (*valueize) (tree))
 {
+  code_helper op;
   tree_code code = conditional_internal_fn_code (ifn);
-  if (code == ERROR_MARK)
-    return false;
+  if (code != ERROR_MARK)
+    op = code;
+  else
+    {
+      ifn = get_unconditional_internal_fn (ifn);
+      if (ifn == IFN_LAST)
+	return false;
+      op = as_combined_fn (ifn);
+    }
 
   unsigned int num_ops = res_op->num_ops;
   gimple_match_op cond_op (gimple_match_cond (res_op->ops[0],
 					      res_op->ops[num_ops - 1]),
-			   code, res_op->type, num_ops - 2);
+			   op, res_op->type, num_ops - 2);
   for (unsigned int i = 1; i < num_ops - 1; ++i)
     cond_op.ops[i - 1] = res_op->ops[i];
   switch (num_ops - 2)
@@ -705,6 +750,10 @@  try_conditional_simplification (internal
       if (!gimple_resimplify2 (seq, &cond_op, valueize))
 	return false;
       break;
+    case 3:
+      if (!gimple_resimplify3 (seq, &cond_op, valueize))
+	return false;
+      break;
     default:
       gcc_unreachable ();
     }
@@ -837,7 +886,7 @@  gimple_simplify (gimple *stmt, gimple_ma
       /* ???  This way we can't simplify calls with side-effects.  */
       if (gimple_call_lhs (stmt) != NULL_TREE
 	  && gimple_call_num_args (stmt) >= 1
-	  && gimple_call_num_args (stmt) <= 4)
+	  && gimple_call_num_args (stmt) <= 5)
 	{
 	  bool valueized = false;
 	  combined_fn cfn;
@@ -887,6 +936,9 @@  gimple_simplify (gimple *stmt, gimple_ma
 	    case 4:
 	      return (gimple_resimplify4 (seq, res_op, valueize)
 		      || valueized);
+	    case 5:
+	      return (gimple_resimplify5 (seq, res_op, valueize)
+		      || valueized);
 	    default:
 	     gcc_unreachable ();
 	    }
Index: gcc/match.pd
===================================================================
--- gcc/match.pd	2018-05-24 10:33:30.870095164 +0100
+++ gcc/match.pd	2018-05-24 13:05:46.049605128 +0100
@@ -86,6 +86,12 @@  DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   IFN_COND_MUL IFN_COND_DIV IFN_COND_MOD IFN_COND_RDIV
   IFN_COND_MIN IFN_COND_MAX
   IFN_COND_AND IFN_COND_IOR IFN_COND_XOR)
+
+/* Same for ternary operations.  */
+(define_operator_list UNCOND_TERNARY
+  IFN_FMA IFN_FMS IFN_FNMA IFN_FNMS)
+(define_operator_list COND_TERNARY
+  IFN_COND_FMA IFN_COND_FMS IFN_COND_FNMA IFN_COND_FNMS)
     
 /* As opposed to convert?, this still creates a single pattern, so
    it is not a suitable replacement for convert? in all cases.  */
@@ -4798,6 +4804,21 @@  DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
    (if (element_precision (type) == element_precision (op_type))
     (view_convert (cond_op (bit_not @0) @2 @3 (view_convert:op_type @1)))))))
 
+/* Same for ternary operations.  */
+(for uncond_op (UNCOND_TERNARY)
+     cond_op (COND_TERNARY)
+ (simplify
+  (vec_cond @0 (view_convert? (uncond_op@5 @1 @2 @3)) @4)
+  (with { tree op_type = TREE_TYPE (@5); }
+   (if (element_precision (type) == element_precision (op_type))
+    (view_convert (cond_op @0 @1 @2 @3 (view_convert:op_type @4))))))
+ (simplify
+  (vec_cond @0 @1 (view_convert? (uncond_op@5 @2 @3 @4)))
+  (with { tree op_type = TREE_TYPE (@5); }
+   (if (element_precision (type) == element_precision (op_type))
+    (view_convert (cond_op (bit_not @0) @2 @3 @4
+		  (view_convert:op_type @1)))))))
+
 /* Detect cases in which a VEC_COND_EXPR effectively replaces the
    "else" value of an IFN_COND_*.  */
 (for cond_op (COND_BINARY)
@@ -4806,3 +4827,11 @@  DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (with { tree op_type = TREE_TYPE (@3); }
    (if (element_precision (type) == element_precision (op_type))
     (view_convert (cond_op @0 @1 @2 (view_convert:op_type @4)))))))
+
+/* Same for ternary operations.  */
+(for cond_op (COND_TERNARY)
+ (simplify
+  (vec_cond @0 (view_convert? (cond_op @0 @1 @2 @3 @4)) @5)
+  (with { tree op_type = TREE_TYPE (@4); }
+   (if (element_precision (type) == element_precision (op_type))
+    (view_convert (cond_op @0 @1 @2 @3 (view_convert:op_type @5)))))))
Index: gcc/config/aarch64/aarch64.c
===================================================================
--- gcc/config/aarch64/aarch64.c	2018-05-24 10:33:30.867095262 +0100
+++ gcc/config/aarch64/aarch64.c	2018-05-24 13:05:46.046608817 +0100
@@ -1292,14 +1292,18 @@  aarch64_get_mask_mode (poly_uint64 nunit
   return default_get_mask_mode (nunits, nbytes);
 }
 
-/* Implement TARGET_PREFERRED_ELSE_VALUE.  Prefer to use the first
-   arithmetic operand as the else value if the else value doesn't matter,
-   since that exactly matches the SVE destructive merging form.  */
+/* Implement TARGET_PREFERRED_ELSE_VALUE.  For binary operations,
+   prefer to use the first arithmetic operand as the else value if
+   the else value doesn't matter, since that exactly matches the SVE
+   destructive merging form.  For ternary operations we could either
+   pick the first operand and use FMAD-like instructions or the last
+   operand and use FMLA-like instructions; the latter seems more
+   natural.  */
 
 static tree
-aarch64_preferred_else_value (unsigned, tree, unsigned int, tree *ops)
+aarch64_preferred_else_value (unsigned, tree, unsigned int nops, tree *ops)
 {
-  return ops[0];
+  return nops == 3 ? ops[2] : ops[0];
 }
 
 /* Implement TARGET_HARD_REGNO_NREGS.  */
Index: gcc/config/aarch64/aarch64-sve.md
===================================================================
--- gcc/config/aarch64/aarch64-sve.md	2018-05-24 10:12:10.141352356 +0100
+++ gcc/config/aarch64/aarch64-sve.md	2018-05-24 13:05:46.044611277 +0100
@@ -2688,6 +2688,58 @@  (define_insn "*cond_<optab><mode>"
   "<sve_fp_op>r\t%0.<Vetype>, %1/m, %0.<Vetype>, %2.<Vetype>"
 )
 
+;; Predicated floating-point ternary operations with select.
+(define_expand "cond_<optab><mode>"
+  [(set (match_operand:SVE_F 0 "register_operand")
+	(unspec:SVE_F
+	  [(match_operand:<VPRED> 1 "register_operand")
+	   (unspec:SVE_F
+	     [(match_dup 1)
+	      (match_operand:SVE_F 2 "register_operand")
+	      (match_operand:SVE_F 3 "register_operand")
+	      (match_operand:SVE_F 4 "register_operand")]
+	     SVE_COND_FP_TERNARY)
+	   (match_operand:SVE_F 5 "register_operand")]
+	  UNSPEC_SEL))]
+  "TARGET_SVE"
+{
+  aarch64_sve_prepare_conditional_op (operands, 6, true);
+})
+
+;; Predicated floating-point ternary operations using the FMAD-like form.
+(define_insn "*cond_<optab><mode>"
+  [(set (match_operand:SVE_F 0 "register_operand" "=w")
+	(unspec:SVE_F
+	  [(match_operand:<VPRED> 1 "register_operand" "Upl")
+	   (unspec:SVE_F
+	     [(match_dup 1)
+	      (match_operand:SVE_F 2 "register_operand" "0")
+	      (match_operand:SVE_F 3 "register_operand" "w")
+	      (match_operand:SVE_F 4 "register_operand" "w")]
+	     SVE_COND_FP_TERNARY)
+	   (match_dup 2)]
+	  UNSPEC_SEL))]
+  "TARGET_SVE"
+  "<sve_fmad_op>\t%0.<Vetype>, %1/m, %3.<Vetype>, %4.<Vetype>"
+)
+
+;; Predicated floating-point ternary operations using the FMLA-like form.
+(define_insn "*cond_<optab><mode>_acc"
+  [(set (match_operand:SVE_F 0 "register_operand" "=w")
+	(unspec:SVE_F
+	  [(match_operand:<VPRED> 1 "register_operand" "Upl")
+	   (unspec:SVE_F
+	     [(match_dup 1)
+	      (match_operand:SVE_F 2 "register_operand" "w")
+	      (match_operand:SVE_F 3 "register_operand" "w")
+	      (match_operand:SVE_F 4 "register_operand" "0")]
+	     SVE_COND_FP_TERNARY)
+	   (match_dup 4)]
+	  UNSPEC_SEL))]
+  "TARGET_SVE"
+  "<sve_fmla_op>\t%0.<Vetype>, %1/m, %2.<Vetype>, %3.<Vetype>"
+)
+
 ;; Shift an SVE vector left and insert a scalar into element 0.
 (define_insn "vec_shl_insert_<mode>"
   [(set (match_operand:SVE_ALL 0 "register_operand" "=w, w")
Index: gcc/config/aarch64/iterators.md
===================================================================
--- gcc/config/aarch64/iterators.md	2018-05-24 10:12:10.142352315 +0100
+++ gcc/config/aarch64/iterators.md	2018-05-24 13:05:46.046608817 +0100
@@ -468,6 +468,10 @@  (define_c_enum "unspec"
     UNSPEC_COND_DIV	; Used in aarch64-sve.md.
     UNSPEC_COND_MAX	; Used in aarch64-sve.md.
     UNSPEC_COND_MIN	; Used in aarch64-sve.md.
+    UNSPEC_COND_FMLA	; Used in aarch64-sve.md.
+    UNSPEC_COND_FMLS	; Used in aarch64-sve.md.
+    UNSPEC_COND_FNMLA	; Used in aarch64-sve.md.
+    UNSPEC_COND_FNMLS	; Used in aarch64-sve.md.
     UNSPEC_COND_LT	; Used in aarch64-sve.md.
     UNSPEC_COND_LE	; Used in aarch64-sve.md.
     UNSPEC_COND_EQ	; Used in aarch64-sve.md.
@@ -1549,6 +1553,11 @@  (define_int_iterator SVE_COND_FP_BINARY
 
 (define_int_iterator SVE_COND_FP_BINARY_REV [UNSPEC_COND_SUB UNSPEC_COND_DIV])
 
+(define_int_iterator SVE_COND_FP_TERNARY [UNSPEC_COND_FMLA
+					  UNSPEC_COND_FMLS
+					  UNSPEC_COND_FNMLA
+					  UNSPEC_COND_FNMLS])
+
 (define_int_iterator SVE_COND_FP_CMP [UNSPEC_COND_LT UNSPEC_COND_LE
 				      UNSPEC_COND_EQ UNSPEC_COND_NE
 				      UNSPEC_COND_GE UNSPEC_COND_GT])
@@ -1581,7 +1590,11 @@  (define_int_attr optab [(UNSPEC_ANDF "an
 			(UNSPEC_COND_MUL "mul")
 			(UNSPEC_COND_DIV "div")
 			(UNSPEC_COND_MAX "smax")
-			(UNSPEC_COND_MIN "smin")])
+			(UNSPEC_COND_MIN "smin")
+			(UNSPEC_COND_FMLA "fma")
+			(UNSPEC_COND_FMLS "fnma")
+			(UNSPEC_COND_FNMLA "fnms")
+			(UNSPEC_COND_FNMLS "fms")])
 
 (define_int_attr  maxmin_uns [(UNSPEC_UMAXV "umax")
 			      (UNSPEC_UMINV "umin")
@@ -1799,6 +1812,16 @@  (define_int_attr sve_fp_op [(UNSPEC_COND
 			    (UNSPEC_COND_MAX "fmaxnm")
 			    (UNSPEC_COND_MIN "fminnm")])
 
+(define_int_attr sve_fmla_op [(UNSPEC_COND_FMLA "fmla")
+			      (UNSPEC_COND_FMLS "fmls")
+			      (UNSPEC_COND_FNMLA "fnmla")
+			      (UNSPEC_COND_FNMLS "fnmls")])
+
+(define_int_attr sve_fmad_op [(UNSPEC_COND_FMLA "fmad")
+			      (UNSPEC_COND_FMLS "fmsb")
+			      (UNSPEC_COND_FNMLA "fnmad")
+			      (UNSPEC_COND_FNMLS "fnmsb")])
+
 (define_int_attr commutative [(UNSPEC_COND_ADD "true")
 			      (UNSPEC_COND_SUB "false")
 			      (UNSPEC_COND_MUL "true")
Index: gcc/testsuite/gcc.dg/vect/vect-cond-arith-3.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-cond-arith-3.c	2018-05-24 13:05:46.049605128 +0100
@@ -0,0 +1,63 @@ 
+/* { dg-require-effective-target scalar_all_fma } */
+/* { dg-additional-options "-fdump-tree-optimized" } */
+
+#include "tree-vect.h"
+
+#define N (VECTOR_BITS * 11 / 64 + 3)
+
+#define DEF(INV)					\
+  void __attribute__ ((noipa))				\
+  f_##INV (double *restrict a, double *restrict b,	\
+	   double *restrict c, double *restrict d)	\
+  {							\
+    for (int i = 0; i < N; ++i)				\
+      {							\
+	double mb = (INV & 1 ? -b[i] : b[i]);		\
+	double mc = c[i];				\
+	double md = (INV & 2 ? -d[i] : d[i]);		\
+	double fma = __builtin_fma (mb, mc, md);	\
+	double truev = (INV & 4 ? -fma : fma);		\
+	a[i] = b[i] < 10 ? truev : 10.0;		\
+      }							\
+  }
+
+#define TEST(INV)					\
+  {							\
+    f_##INV (a, b, c, d);				\
+    for (int i = 0; i < N; ++i)				\
+      {							\
+	double mb = (INV & 1 ? -b[i] : b[i]);		\
+	double mc = c[i];				\
+	double md = (INV & 2 ? -d[i] : d[i]);		\
+	double fma = __builtin_fma (mb, mc, md);	\
+	double truev = (INV & 4 ? -fma : fma);		\
+	if (a[i] != (i % 17 < 10 ? truev : 10.0))	\
+	  __builtin_abort ();				\
+	asm volatile ("" ::: "memory");			\
+      }							\
+  }
+
+#define FOR_EACH_INV(T) \
+  T (0) T (1) T (2) T (3) T (4) T (5) T (6) T (7)
+
+FOR_EACH_INV (DEF)
+
+int
+main (void)
+{
+  double a[N], b[N], c[N], d[N];
+  for (int i = 0; i < N; ++i)
+    {
+      b[i] = i % 17;
+      c[i] = i % 9 + 11;
+      d[i] = i % 13 + 14;
+      asm volatile ("" ::: "memory");
+    }
+  FOR_EACH_INV (TEST)
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times { = \.COND_FMA } 2 "optimized" { target vect_double_cond_arith } } } */
+/* { dg-final { scan-tree-dump-times { = \.COND_FMS } 2 "optimized" { target vect_double_cond_arith } } } */
+/* { dg-final { scan-tree-dump-times { = \.COND_FNMA } 2 "optimized" { target vect_double_cond_arith } } } */
+/* { dg-final { scan-tree-dump-times { = \.COND_FNMS } 2 "optimized" { target vect_double_cond_arith } } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_13.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/vcond_13.c	2018-05-24 13:05:46.049605128 +0100
@@ -0,0 +1,58 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#define N 119
+
+#define DEF_LOOP(INV, TYPE, CMPTYPE, SUFFIX)			\
+  void __attribute__ ((noipa))					\
+  f_##INV##_##SUFFIX (TYPE *restrict a, TYPE *restrict b,	\
+		      TYPE *restrict c, TYPE *restrict d,	\
+		      CMPTYPE *restrict cond)			\
+  {								\
+    for (int i = 0; i < N; ++i)					\
+      {								\
+	TYPE mb = (INV & 1 ? -b[i] : b[i]);			\
+	TYPE mc = c[i];						\
+	TYPE md = (INV & 2 ? -d[i] : d[i]);			\
+	TYPE fma = __builtin_fma##SUFFIX (mb, mc, md);		\
+	TYPE truev = (INV & 4 ? -fma : fma);			\
+	a[i] = cond[i] < 10 ? truev : b[i];			\
+      }								\
+  }
+
+#define FOR_EACH_TYPE(T, INV)			\
+  T (INV, _Float16, short, f16)			\
+  T (INV, float, float, f32)			\
+  T (INV, double, double, f64)
+
+#define FOR_EACH_INV(T)				\
+  FOR_EACH_TYPE (T, 0)				\
+  FOR_EACH_TYPE (T, 1)				\
+  FOR_EACH_TYPE (T, 2)				\
+  FOR_EACH_TYPE (T, 3)				\
+  FOR_EACH_TYPE (T, 4)				\
+  FOR_EACH_TYPE (T, 5)				\
+  FOR_EACH_TYPE (T, 6)				\
+  FOR_EACH_TYPE (T, 7)
+
+FOR_EACH_INV (DEF_LOOP)
+
+/* { dg-final { scan-assembler-not {\tsel\t} } } */
+/* { dg-final { scan-assembler-not {\tmovprfx\t} } } */
+/* { dg-final { scan-assembler-not {\tmov\tz[0-9]+\.., z[0-9]+} } } */
+
+/* { dg-final { scan-assembler-times {\tfmad\tz[0-9]+\.h,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfmad\tz[0-9]+\.s,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfmad\tz[0-9]+\.d,} 2 } } */
+
+/* { dg-final { scan-assembler-times {\tfmsb\tz[0-9]+\.h,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfmsb\tz[0-9]+\.s,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfmsb\tz[0-9]+\.d,} 2 } } */
+
+/* { dg-final { scan-assembler-times {\tfnmad\tz[0-9]+\.h,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfnmad\tz[0-9]+\.s,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfnmad\tz[0-9]+\.d,} 2 } } */
+
+/* { dg-final { scan-assembler-times {\tfnmsb\tz[0-9]+\.h,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfnmsb\tz[0-9]+\.s,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfnmsb\tz[0-9]+\.d,} 2 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_13_run.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/vcond_13_run.c	2018-05-24 13:05:46.050603898 +0100
@@ -0,0 +1,37 @@ 
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#include "vcond_13.c"
+
+#define TEST_LOOP(INV, TYPE, CMPTYPE, SUFFIX)		\
+  {							\
+    TYPE a[N], b[N], c[N], d[N];			\
+    CMPTYPE cond[N];					\
+    for (int i = 0; i < N; ++i)				\
+      {							\
+	b[i] = i % 15;					\
+	c[i] = i % 9 + 11;				\
+	d[i] = i % 13 + 14;				\
+	cond[i] = i % 17;				\
+	asm volatile ("" ::: "memory");			\
+      }							\
+    f_##INV##_##SUFFIX (a, b, c, d, cond);		\
+    for (int i = 0; i < N; ++i)				\
+      {							\
+	double mb = (INV & 1 ? -b[i] : b[i]);		\
+	double mc = c[i];				\
+	double md = (INV & 2 ? -d[i] : d[i]);		\
+	double fma = __builtin_fma (mb, mc, md);	\
+	double truev = (INV & 4 ? -fma : fma);		\
+	if (a[i] != (i % 17 < 10 ? truev : b[i]))	\
+	  __builtin_abort ();				\
+	asm volatile ("" ::: "memory");			\
+      }							\
+  }
+
+int
+main (void)
+{
+  FOR_EACH_INV (TEST_LOOP)
+  return 0;
+}
Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_14.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/vcond_14.c	2018-05-24 13:05:46.050603898 +0100
@@ -0,0 +1,58 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#define N 119
+
+#define DEF_LOOP(INV, TYPE, CMPTYPE, SUFFIX)			\
+  void __attribute__ ((noipa))					\
+  f_##INV##_##SUFFIX (TYPE *restrict a, TYPE *restrict b,	\
+		      TYPE *restrict c, TYPE *restrict d,	\
+		      CMPTYPE *restrict cond)			\
+  {								\
+    for (int i = 0; i < N; ++i)					\
+      {								\
+	TYPE mb = (INV & 1 ? -b[i] : b[i]);			\
+	TYPE mc = c[i];						\
+	TYPE md = (INV & 2 ? -d[i] : d[i]);			\
+	TYPE fma = __builtin_fma##SUFFIX (mb, mc, md);		\
+	TYPE truev = (INV & 4 ? -fma : fma);			\
+	a[i] = cond[i] < 10 ? truev : c[i];			\
+      }								\
+  }
+
+#define FOR_EACH_TYPE(T, INV)			\
+  T (INV, _Float16, short, f16)			\
+  T (INV, float, float, f32)			\
+  T (INV, double, double, f64)
+
+#define FOR_EACH_INV(T)				\
+  FOR_EACH_TYPE (T, 0)				\
+  FOR_EACH_TYPE (T, 1)				\
+  FOR_EACH_TYPE (T, 2)				\
+  FOR_EACH_TYPE (T, 3)				\
+  FOR_EACH_TYPE (T, 4)				\
+  FOR_EACH_TYPE (T, 5)				\
+  FOR_EACH_TYPE (T, 6)				\
+  FOR_EACH_TYPE (T, 7)
+
+FOR_EACH_INV (DEF_LOOP)
+
+/* { dg-final { scan-assembler-not {\tsel\t} } } */
+/* { dg-final { scan-assembler-not {\tmovprfx\t} } } */
+/* { dg-final { scan-assembler-not {\tmov\tz[0-9]+\.., z[0-9]+} } } */
+
+/* { dg-final { scan-assembler-times {\tfmad\tz[0-9]+\.h,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfmad\tz[0-9]+\.s,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfmad\tz[0-9]+\.d,} 2 } } */
+
+/* { dg-final { scan-assembler-times {\tfmsb\tz[0-9]+\.h,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfmsb\tz[0-9]+\.s,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfmsb\tz[0-9]+\.d,} 2 } } */
+
+/* { dg-final { scan-assembler-times {\tfnmad\tz[0-9]+\.h,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfnmad\tz[0-9]+\.s,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfnmad\tz[0-9]+\.d,} 2 } } */
+
+/* { dg-final { scan-assembler-times {\tfnmsb\tz[0-9]+\.h,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfnmsb\tz[0-9]+\.s,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfnmsb\tz[0-9]+\.d,} 2 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_14_run.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/vcond_14_run.c	2018-05-24 13:05:46.050603898 +0100
@@ -0,0 +1,37 @@ 
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#include "vcond_14.c"
+
+#define TEST_LOOP(INV, TYPE, CMPTYPE, SUFFIX)		\
+  {							\
+    TYPE a[N], b[N], c[N], d[N];			\
+    CMPTYPE cond[N];					\
+    for (int i = 0; i < N; ++i)				\
+      {							\
+	b[i] = i % 15;					\
+	c[i] = i % 9 + 11;				\
+	d[i] = i % 13 + 14;				\
+	cond[i] = i % 17;				\
+	asm volatile ("" ::: "memory");			\
+      }							\
+    f_##INV##_##SUFFIX (a, b, c, d, cond);		\
+    for (int i = 0; i < N; ++i)				\
+      {							\
+	double mb = (INV & 1 ? -b[i] : b[i]);		\
+	double mc = c[i];				\
+	double md = (INV & 2 ? -d[i] : d[i]);		\
+	double fma = __builtin_fma (mb, mc, md);	\
+	double truev = (INV & 4 ? -fma : fma);		\
+	if (a[i] != (i % 17 < 10 ? truev : c[i]))	\
+	  __builtin_abort ();				\
+	asm volatile ("" ::: "memory");			\
+      }							\
+  }
+
+int
+main (void)
+{
+  FOR_EACH_INV (TEST_LOOP)
+  return 0;
+}
Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_15.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/vcond_15.c	2018-05-24 13:05:46.050603898 +0100
@@ -0,0 +1,58 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#define N 119
+
+#define DEF_LOOP(INV, TYPE, CMPTYPE, SUFFIX)			\
+  void __attribute__ ((noipa))					\
+  f_##INV##_##SUFFIX (TYPE *restrict a, TYPE *restrict b,	\
+		      TYPE *restrict c, TYPE *restrict d,	\
+		      CMPTYPE *restrict cond)			\
+  {								\
+    for (int i = 0; i < N; ++i)					\
+      {								\
+	TYPE mb = (INV & 1 ? -b[i] : b[i]);			\
+	TYPE mc = c[i];						\
+	TYPE md = (INV & 2 ? -d[i] : d[i]);			\
+	TYPE fma = __builtin_fma##SUFFIX (mb, mc, md);		\
+	TYPE truev = (INV & 4 ? -fma : fma);			\
+	a[i] = cond[i] < 10 ? truev : d[i];			\
+      }								\
+  }
+
+#define FOR_EACH_TYPE(T, INV)			\
+  T (INV, _Float16, short, f16)			\
+  T (INV, float, float, f32)			\
+  T (INV, double, double, f64)
+
+#define FOR_EACH_INV(T)				\
+  FOR_EACH_TYPE (T, 0)				\
+  FOR_EACH_TYPE (T, 1)				\
+  FOR_EACH_TYPE (T, 2)				\
+  FOR_EACH_TYPE (T, 3)				\
+  FOR_EACH_TYPE (T, 4)				\
+  FOR_EACH_TYPE (T, 5)				\
+  FOR_EACH_TYPE (T, 6)				\
+  FOR_EACH_TYPE (T, 7)
+
+FOR_EACH_INV (DEF_LOOP)
+
+/* { dg-final { scan-assembler-not {\tsel\t} } } */
+/* { dg-final { scan-assembler-not {\tmovprfx\t} } } */
+/* { dg-final { scan-assembler-not {\tmov\tz[0-9]+\.., z[0-9]+} } } */
+
+/* { dg-final { scan-assembler-times {\tfmla\tz[0-9]+\.h,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfmla\tz[0-9]+\.s,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfmla\tz[0-9]+\.d,} 2 } } */
+
+/* { dg-final { scan-assembler-times {\tfmls\tz[0-9]+\.h,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfmls\tz[0-9]+\.s,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfmls\tz[0-9]+\.d,} 2 } } */
+
+/* { dg-final { scan-assembler-times {\tfnmla\tz[0-9]+\.h,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfnmla\tz[0-9]+\.s,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfnmla\tz[0-9]+\.d,} 2 } } */
+
+/* { dg-final { scan-assembler-times {\tfnmls\tz[0-9]+\.h,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfnmls\tz[0-9]+\.s,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfnmls\tz[0-9]+\.d,} 2 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_15_run.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/vcond_15_run.c	2018-05-24 13:05:46.050603898 +0100
@@ -0,0 +1,37 @@ 
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#include "vcond_15.c"
+
+#define TEST_LOOP(INV, TYPE, CMPTYPE, SUFFIX)		\
+  {							\
+    TYPE a[N], b[N], c[N], d[N];			\
+    CMPTYPE cond[N];					\
+    for (int i = 0; i < N; ++i)				\
+      {							\
+	b[i] = i % 15;					\
+	c[i] = i % 9 + 11;				\
+	d[i] = i % 13 + 14;				\
+	cond[i] = i % 17;				\
+	asm volatile ("" ::: "memory");			\
+      }							\
+    f_##INV##_##SUFFIX (a, b, c, d, cond);		\
+    for (int i = 0; i < N; ++i)				\
+      {							\
+	double mb = (INV & 1 ? -b[i] : b[i]);		\
+	double mc = c[i];				\
+	double md = (INV & 2 ? -d[i] : d[i]);		\
+	double fma = __builtin_fma (mb, mc, md);	\
+	double truev = (INV & 4 ? -fma : fma);		\
+	if (a[i] != (i % 17 < 10 ? truev : d[i]))	\
+	  __builtin_abort ();				\
+	asm volatile ("" ::: "memory");			\
+      }							\
+  }
+
+int
+main (void)
+{
+  FOR_EACH_INV (TEST_LOOP)
+  return 0;
+}
Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_16.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/vcond_16.c	2018-05-24 13:05:46.050603898 +0100
@@ -0,0 +1,58 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#define N 119
+
+#define DEF_LOOP(INV, TYPE, CMPTYPE, SUFFIX)			\
+  void __attribute__ ((noipa))					\
+  f_##INV##_##SUFFIX (TYPE *restrict a, TYPE *restrict b,	\
+		      TYPE *restrict c, TYPE *restrict d,	\
+		      CMPTYPE *restrict cond)			\
+  {								\
+    for (int i = 0; i < N; ++i)					\
+      {								\
+	TYPE mb = (INV & 1 ? -b[i] : b[i]);			\
+	TYPE mc = c[i];						\
+	TYPE md = (INV & 2 ? -d[i] : d[i]);			\
+	TYPE fma = __builtin_fma##SUFFIX (mb, mc, md);		\
+	TYPE truev = (INV & 4 ? -fma : fma);			\
+	a[i] = cond[i] < 10 ? truev : 10;			\
+      }								\
+  }
+
+#define FOR_EACH_TYPE(T, INV)			\
+  T (INV, _Float16, short, f16)			\
+  T (INV, float, float, f32)			\
+  T (INV, double, double, f64)
+
+#define FOR_EACH_INV(T)				\
+  FOR_EACH_TYPE (T, 0)				\
+  FOR_EACH_TYPE (T, 1)				\
+  FOR_EACH_TYPE (T, 2)				\
+  FOR_EACH_TYPE (T, 3)				\
+  FOR_EACH_TYPE (T, 4)				\
+  FOR_EACH_TYPE (T, 5)				\
+  FOR_EACH_TYPE (T, 6)				\
+  FOR_EACH_TYPE (T, 7)
+
+FOR_EACH_INV (DEF_LOOP)
+
+/* { dg-final { scan-assembler-times {\tsel\t} 24 } } */
+/* { dg-final { scan-assembler-not {\tmovprfx\t} } } */
+/* { dg-final { scan-assembler-not {\tmov\tz[0-9]+\.., z[0-9]+} } } */
+
+/* { dg-final { scan-assembler-times {\tfmad\tz[0-9]+\.h,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfmad\tz[0-9]+\.s,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfmad\tz[0-9]+\.d,} 2 } } */
+
+/* { dg-final { scan-assembler-times {\tfmsb\tz[0-9]+\.h,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfmsb\tz[0-9]+\.s,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfmsb\tz[0-9]+\.d,} 2 } } */
+
+/* { dg-final { scan-assembler-times {\tfnmad\tz[0-9]+\.h,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfnmad\tz[0-9]+\.s,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfnmad\tz[0-9]+\.d,} 2 } } */
+
+/* { dg-final { scan-assembler-times {\tfnmsb\tz[0-9]+\.h,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfnmsb\tz[0-9]+\.s,} 2 } } */
+/* { dg-final { scan-assembler-times {\tfnmsb\tz[0-9]+\.d,} 2 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_16_run.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/vcond_16_run.c	2018-05-24 13:05:46.050603898 +0100
@@ -0,0 +1,37 @@ 
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#include "vcond_16.c"
+
+#define TEST_LOOP(INV, TYPE, CMPTYPE, SUFFIX)		\
+  {							\
+    TYPE a[N], b[N], c[N], d[N];			\
+    CMPTYPE cond[N];					\
+    for (int i = 0; i < N; ++i)				\
+      {							\
+	b[i] = i % 15;					\
+	c[i] = i % 9 + 11;				\
+	d[i] = i % 13 + 14;				\
+	cond[i] = i % 17;				\
+	asm volatile ("" ::: "memory");			\
+      }							\
+    f_##INV##_##SUFFIX (a, b, c, d, cond);		\
+    for (int i = 0; i < N; ++i)				\
+      {							\
+	double mb = (INV & 1 ? -b[i] : b[i]);		\
+	double mc = c[i];				\
+	double md = (INV & 2 ? -d[i] : d[i]);		\
+	double fma = __builtin_fma (mb, mc, md);	\
+	double truev = (INV & 4 ? -fma : fma);		\
+	if (a[i] != (i % 17 < 10 ? truev : 10))		\
+	  __builtin_abort ();				\
+	asm volatile ("" ::: "memory");			\
+      }							\
+  }
+
+int
+main (void)
+{
+  FOR_EACH_INV (TEST_LOOP)
+  return 0;
+}