Message ID | 87r2m1cl44.fsf@linaro.org |
---|---|
State | New |
Series | Add IFN_COND_FMA functions |
On Thu, May 24, 2018 at 2:08 PM Richard Sandiford <richard.sandiford@linaro.org> wrote:

> This patch adds conditional equivalents of the IFN_FMA built-in functions.
> Most of it is just a mechanical extension of the binary stuff.
>
> Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf
> and x86_64-linux-gnu.  OK for the non-AArch64 bits?

OK.

Richard.

> Richard
>
> 2018-05-24  Richard Sandiford  <richard.sandiford@linaro.org>
>
> gcc/
>         * doc/md.texi (cond_fma, cond_fms, cond_fnma, cond_fnms): Document.
>         * optabs.def (cond_fma_optab, cond_fms_optab, cond_fnma_optab)
>         (cond_fnms_optab): New optabs.
>         * internal-fn.def (COND_FMA, COND_FMS, COND_FNMA, COND_FNMS): New
>         internal functions.
>         (FMA): Use DEF_INTERNAL_FLT_FLOATN_FN rather than DEF_INTERNAL_FLT_FN.
>         * internal-fn.h (get_conditional_internal_fn): Declare.
>         (get_unconditional_internal_fn): Likewise.
>         * internal-fn.c (cond_ternary_direct): New macro.
>         (expand_cond_ternary_optab_fn): Likewise.
>         (direct_cond_ternary_optab_supported_p): Likewise.
>         (FOR_EACH_COND_FN_PAIR): Likewise.
>         (get_conditional_internal_fn): New function.
>         (get_unconditional_internal_fn): Likewise.
>         * gimple-match.h (gimple_match_op::MAX_NUM_OPS): Bump to 5.
>         (gimple_match_op::gimple_match_op): Add a new overload for 5
>         operands.
>         (gimple_match_op::set_op): Likewise.
>         (gimple_resimplify5): Declare.
>         * genmatch.c (decision_tree::gen): Generate simplifications for
>         5 operands.
>         * gimple-match-head.c (gimple_simplify): Define an overload for
>         5 operands.  Handle calls with 5 arguments in the top-level overload.
>         (convert_conditional_op): Handle conversions from unconditional
>         internal functions to conditional ones.
>         (gimple_resimplify5): New function.
>         (build_call_internal): Pass a fifth operand.
>         (maybe_push_res_to_seq): Likewise.
>         (try_conditional_simplification): Try converting conditional
>         internal functions to unconditional internal functions.
>         Handle 3-operand unconditional forms.
>         * match.pd (UNCOND_TERNARY, COND_TERNARY): Operator lists.
>         Define ternary equivalents of the current rules for binary conditional
>         internal functions.
>         * config/aarch64/aarch64.c (aarch64_preferred_else_value): Handle
>         ternary operations.
>         * config/aarch64/aarch64-sve.md (cond_<optab><mode>)
>         (*cond_<optab><mode>, *cond_<optab><mode>_acc): New
>         SVE_COND_FP_TERNARY patterns.
>         * config/aarch64/iterators.md (UNSPEC_COND_FMLA, UNSPEC_COND_FMLS)
>         (UNSPEC_COND_FNMLA, UNSPEC_COND_FNMLS): New unspecs.
>         (optab): Handle them.
>         (SVE_COND_FP_TERNARY): New int iterator.
>         (sve_fmla_op, sve_fmad_op): New int attributes.
>
> gcc/testsuite/
>         * gcc.dg/vect/vect-cond-arith-3.c: New test.
>         * gcc.target/aarch64/sve/vcond_13.c: Likewise.
>         * gcc.target/aarch64/sve/vcond_13_run.c: Likewise.
>         * gcc.target/aarch64/sve/vcond_14.c: Likewise.
>         * gcc.target/aarch64/sve/vcond_14_run.c: Likewise.
>         * gcc.target/aarch64/sve/vcond_15.c: Likewise.
>         * gcc.target/aarch64/sve/vcond_15_run.c: Likewise.
>         * gcc.target/aarch64/sve/vcond_16.c: Likewise.
>         * gcc.target/aarch64/sve/vcond_16_run.c: Likewise.
>
> Index: gcc/doc/md.texi
> ===================================================================
> --- gcc/doc/md.texi 2018-05-24 10:12:10.142352315 +0100
> +++ gcc/doc/md.texi 2018-05-24 13:05:46.047607587 +0100
> @@ -6386,6 +6386,23 @@ Operands 0, 2, 3 and 4 all have mode @va
>  integer if @var{m} is scalar, otherwise it has the mode returned by
>  @code{TARGET_VECTORIZE_GET_MASK_MODE}.
>
> +@cindex @code{cond_fma@var{mode}} instruction pattern
> +@cindex @code{cond_fms@var{mode}} instruction pattern
> +@cindex @code{cond_fnma@var{mode}} instruction pattern
> +@cindex @code{cond_fnms@var{mode}} instruction pattern
> +@item @samp{cond_fma@var{mode}}
> +@itemx @samp{cond_fms@var{mode}}
> +@itemx @samp{cond_fnma@var{mode}}
> +@itemx @samp{cond_fnms@var{mode}}
> +Like @samp{cond_add@var{m}}, except that the conditional operation
> +takes 3 operands rather than two.  For example, the vector form of
> +@samp{cond_fma@var{mode}} is equivalent to:
> +
> +@smallexample
> +for (i = 0; i < GET_MODE_NUNITS (@var{m}); i++)
> +  op0[i] = op1[i] ? fma (op2[i], op3[i], op4[i]) : op5[i];
> +@end smallexample
> +
>  @cindex @code{neg@var{mode}cc} instruction pattern
>  @item @samp{neg@var{mode}cc}
>  Similar to @samp{mov@var{mode}cc} but for conditional negation.  Conditionally
> Index: gcc/optabs.def
> ===================================================================
> --- gcc/optabs.def 2018-05-24 10:12:10.146352152 +0100
> +++ gcc/optabs.def 2018-05-24 13:05:46.049605128 +0100
> @@ -234,6 +234,10 @@ OPTAB_D (cond_smin_optab, "cond_smin$a")
>  OPTAB_D (cond_smax_optab, "cond_smax$a")
>  OPTAB_D (cond_umin_optab, "cond_umin$a")
>  OPTAB_D (cond_umax_optab, "cond_umax$a")
> +OPTAB_D (cond_fma_optab, "cond_fma$a")
> +OPTAB_D (cond_fms_optab, "cond_fms$a")
> +OPTAB_D (cond_fnma_optab, "cond_fnma$a")
> +OPTAB_D (cond_fnms_optab, "cond_fnms$a")
>  OPTAB_D (cmov_optab, "cmov$a6")
>  OPTAB_D (cstore_optab, "cstore$a4")
>  OPTAB_D (ctrap_optab, "ctrap$a4")
> Index: gcc/internal-fn.def
> ===================================================================
> --- gcc/internal-fn.def 2018-05-24 10:12:10.146352152 +0100
> +++ gcc/internal-fn.def 2018-05-24 13:05:46.048606357 +0100
> @@ -59,7 +59,8 @@ along with GCC; see the file COPYING3.
>     - binary: a normal binary optab, such as vec_interleave_lo_<mode>
>     - ternary: a normal ternary optab, such as fma<mode>4
>
> -   - cond_binary: a conditional binary optab, such as add<mode>cc
> +   - cond_binary: a conditional binary optab, such as cond_add<mode>
> +   - cond_ternary: a conditional ternary optab, such as cond_fma_rev<mode>
>
>     - fold_left: for scalar = FN (scalar, vector), keyed off the vector mode
> @@ -162,6 +163,11 @@ DEF_INTERNAL_OPTAB_FN (COND_IOR, ECF_CON
>  DEF_INTERNAL_OPTAB_FN (COND_XOR, ECF_CONST | ECF_NOTHROW,
>                         cond_xor, cond_binary)
>
> +DEF_INTERNAL_OPTAB_FN (COND_FMA, ECF_CONST, cond_fma, cond_ternary)
> +DEF_INTERNAL_OPTAB_FN (COND_FMS, ECF_CONST, cond_fms, cond_ternary)
> +DEF_INTERNAL_OPTAB_FN (COND_FNMA, ECF_CONST, cond_fnma, cond_ternary)
> +DEF_INTERNAL_OPTAB_FN (COND_FNMS, ECF_CONST, cond_fnms, cond_ternary)
> +
>  DEF_INTERNAL_OPTAB_FN (RSQRT, ECF_CONST, rsqrt, unary)
>
>  DEF_INTERNAL_OPTAB_FN (REDUC_PLUS, ECF_CONST | ECF_NOTHROW,
> @@ -230,7 +236,7 @@ DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONS
>  DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary)
>
>  /* Ternary math functions.  */
> -DEF_INTERNAL_FLT_FN (FMA, ECF_CONST, fma, ternary)
> +DEF_INTERNAL_FLT_FLOATN_FN (FMA, ECF_CONST, fma, ternary)
>
>  /* Unary integer ops.  */
>  DEF_INTERNAL_INT_FN (CLRSB, ECF_CONST | ECF_NOTHROW, clrsb, unary)
> Index: gcc/internal-fn.h
> ===================================================================
> --- gcc/internal-fn.h 2018-05-24 10:33:30.870095164 +0100
> +++ gcc/internal-fn.h 2018-05-24 13:05:46.049605128 +0100
> @@ -193,7 +193,9 @@ direct_internal_fn_supported_p (internal
>  extern bool set_edom_supported_p (void);
>
>  extern internal_fn get_conditional_internal_fn (tree_code);
> +extern internal_fn get_conditional_internal_fn (internal_fn);
>  extern tree_code conditional_internal_fn_code (internal_fn);
> +extern internal_fn get_unconditional_internal_fn (internal_fn);
>
>  extern bool internal_load_fn_p (internal_fn);
>  extern bool internal_store_fn_p (internal_fn);
> Index: gcc/internal-fn.c
> ===================================================================
> --- gcc/internal-fn.c 2018-05-24 10:33:30.870095164 +0100
> +++ gcc/internal-fn.c 2018-05-24 13:05:46.048606357 +0100
> @@ -113,6 +113,7 @@ #define binary_direct { 0, 0, true }
>  #define ternary_direct { 0, 0, true }
>  #define cond_unary_direct { 1, 1, true }
>  #define cond_binary_direct { 1, 1, true }
> +#define cond_ternary_direct { 1, 1, true }
>  #define while_direct { 0, 2, false }
>  #define fold_extract_direct { 2, 2, false }
>  #define fold_left_direct { 1, 1, false }
> @@ -2993,6 +2994,9 @@ #define expand_cond_unary_optab_fn(FN, S
>  #define expand_cond_binary_optab_fn(FN, STMT, OPTAB) \
>    expand_direct_optab_fn (FN, STMT, OPTAB, 4)
>
> +#define expand_cond_ternary_optab_fn(FN, STMT, OPTAB) \
> +  expand_direct_optab_fn (FN, STMT, OPTAB, 5)
> +
>  #define expand_fold_extract_optab_fn(FN, STMT, OPTAB) \
>    expand_direct_optab_fn (FN, STMT, OPTAB, 3)
>
> @@ -3075,6 +3079,7 @@ #define direct_binary_optab_supported_p
>  #define direct_ternary_optab_supported_p direct_optab_supported_p
>  #define direct_cond_unary_optab_supported_p direct_optab_supported_p
>  #define direct_cond_binary_optab_supported_p direct_optab_supported_p
> +#define direct_cond_ternary_optab_supported_p direct_optab_supported_p
>  #define direct_mask_load_optab_supported_p direct_optab_supported_p
>  #define direct_load_lanes_optab_supported_p multi_vector_optab_supported_p
>  #define direct_mask_load_lanes_optab_supported_p multi_vector_optab_supported_p
> @@ -3277,6 +3282,57 @@ #define CASE(CODE, IFN) case IFN: return
>      }
>  }
>
> +/* Invoke T(IFN) for each internal function IFN that also has an
> +   IFN_COND_* form.  */
> +#define FOR_EACH_COND_FN_PAIR(T) \
> +  T (FMA) \
> +  T (FMS) \
> +  T (FNMA) \
> +  T (FNMS)
> +
> +/* Return a function that only performs internal function FN when a
> +   certain condition is met and that uses a given fallback value otherwise.
> +   In other words, the returned function FN' is such that:
> +
> +     LHS = FN' (COND, A1, ... An, ELSE)
> +
> +   is equivalent to the C expression:
> +
> +     LHS = COND ? FN (A1, ..., An) : ELSE;
> +
> +   operating elementwise if the operands are vectors.
> +
> +   Return IFN_LAST if no such function exists.  */
> +
> +internal_fn
> +get_conditional_internal_fn (internal_fn fn)
> +{
> +  switch (fn)
> +    {
> +#define CASE(NAME) case IFN_##NAME: return IFN_COND_##NAME;
> +      FOR_EACH_COND_FN_PAIR(CASE)
> +#undef CASE
> +    default:
> +      return IFN_LAST;
> +    }
> +}
> +
> +/* If IFN implements the conditional form of an unconditional internal
> +   function, return that unconditional function, otherwise return IFN_LAST.  */
> +
> +internal_fn
> +get_unconditional_internal_fn (internal_fn ifn)
> +{
> +  switch (ifn)
> +    {
> +#define CASE(NAME) case IFN_COND_##NAME: return IFN_##NAME;
> +      FOR_EACH_COND_FN_PAIR(CASE)
> +#undef CASE
> +    default:
> +      return IFN_LAST;
> +    }
> +}
> +
>  /* Return true if IFN is some form of load from memory.  */
>
>  bool
> Index: gcc/gimple-match.h
> ===================================================================
> --- gcc/gimple-match.h 2018-05-24 10:33:30.870095164 +0100
> +++ gcc/gimple-match.h 2018-05-24 13:05:46.048606357 +0100
> @@ -91,18 +91,21 @@ struct gimple_match_op
>                     code_helper, tree, tree, tree, tree);
>    gimple_match_op (const gimple_match_cond &,
>                     code_helper, tree, tree, tree, tree, tree);
> +  gimple_match_op (const gimple_match_cond &,
> +                   code_helper, tree, tree, tree, tree, tree, tree);
>
>    void set_op (code_helper, tree, unsigned int);
>    void set_op (code_helper, tree, tree);
>    void set_op (code_helper, tree, tree, tree);
>    void set_op (code_helper, tree, tree, tree, tree);
>    void set_op (code_helper, tree, tree, tree, tree, tree);
> +  void set_op (code_helper, tree, tree, tree, tree, tree, tree);
>    void set_value (tree);
>
>    tree op_or_null (unsigned int) const;
>
>    /* The maximum value of NUM_OPS.  */
> -  static const unsigned int MAX_NUM_OPS = 4;
> +  static const unsigned int MAX_NUM_OPS = 5;
>
>    /* The conditions under which the operation is performed, and the value to
>       use as a fallback.  */
> @@ -182,6 +185,20 @@ gimple_match_op::gimple_match_op (const
>    ops[3] = op3;
>  }
>
> +inline
> +gimple_match_op::gimple_match_op (const gimple_match_cond &cond_in,
> +                                  code_helper code_in, tree type_in,
> +                                  tree op0, tree op1, tree op2, tree op3,
> +                                  tree op4)
> +  : cond (cond_in), code (code_in), type (type_in), num_ops (5)
> +{
> +  ops[0] = op0;
> +  ops[1] = op1;
> +  ops[2] = op2;
> +  ops[3] = op3;
> +  ops[4] = op4;
> +}
> +
>  /* Change the operation performed to CODE_IN, the type of the result to
>     TYPE_IN, and the number of operands to NUM_OPS_IN.  The caller needs
>     to set the operands itself.  */
> @@ -242,6 +259,20 @@ gimple_match_op::set_op (code_helper cod
>    ops[3] = op3;
>  }
>
> +inline void
> +gimple_match_op::set_op (code_helper code_in, tree type_in,
> +                         tree op0, tree op1, tree op2, tree op3, tree op4)
> +{
> +  code = code_in;
> +  type = type_in;
> +  num_ops = 5;
> +  ops[0] = op0;
> +  ops[1] = op1;
> +  ops[2] = op2;
> +  ops[3] = op3;
> +  ops[4] = op4;
> +}
> +
>  /* Set the "operation" to be the single value VALUE, such as a constant
>     or SSA_NAME.  */
>
> @@ -279,6 +310,7 @@ bool gimple_resimplify1 (gimple_seq *, g
>  bool gimple_resimplify2 (gimple_seq *, gimple_match_op *, tree (*)(tree));
>  bool gimple_resimplify3 (gimple_seq *, gimple_match_op *, tree (*)(tree));
>  bool gimple_resimplify4 (gimple_seq *, gimple_match_op *, tree (*)(tree));
> +bool gimple_resimplify5 (gimple_seq *, gimple_match_op *, tree (*)(tree));
>  tree maybe_push_res_to_seq (gimple_match_op *, gimple_seq *,
>                              tree res = NULL_TREE);
>  void maybe_build_generic_op (gimple_match_op *);
> Index: gcc/genmatch.c
> ===================================================================
> --- gcc/genmatch.c 2018-05-24 10:33:30.869095197 +0100
> +++ gcc/genmatch.c 2018-05-24 13:05:46.048606357 +0100
> @@ -3760,7 +3760,7 @@ decision_tree::gen (FILE *f, bool gimple
>      }
>    fprintf (stderr, "removed %u duplicate tails\n", rcnt);
>
> -  for (unsigned n = 1; n <= 4; ++n)
> +  for (unsigned n = 1; n <= 5; ++n)
>      {
>        /* First generate split-out functions.  */
>        for (unsigned i = 0; i < root->kids.length (); i++)
> Index: gcc/gimple-match-head.c
> ===================================================================
> --- gcc/gimple-match-head.c 2018-05-24 10:33:30.870095164 +0100
> +++ gcc/gimple-match-head.c 2018-05-24 13:05:46.048606357 +0100
> @@ -54,6 +54,8 @@ static bool gimple_simplify (gimple_matc
>                               code_helper, tree, tree, tree, tree);
>  static bool gimple_simplify (gimple_match_op *, gimple_seq *, tree (*)(tree),
>                               code_helper, tree, tree, tree, tree, tree);
> +static bool gimple_simplify (gimple_match_op *, gimple_seq *, tree (*)(tree),
> +                             code_helper, tree, tree, tree, tree, tree, tree);
>
>  const unsigned int gimple_match_op::MAX_NUM_OPS;
>
> @@ -80,7 +82,12 @@ convert_conditional_op (gimple_match_op
>    if (orig_op->code.is_tree_code ())
>      ifn = get_conditional_internal_fn ((tree_code) orig_op->code);
>    else
> -    return false;
> +    {
> +      combined_fn cfn = orig_op->code;
> +      if (!internal_fn_p (cfn))
> +        return false;
> +      ifn = get_conditional_internal_fn (as_internal_fn (cfn));
> +    }
>    if (ifn == IFN_LAST)
>      return false;
>    unsigned int num_ops = orig_op->num_ops;
> @@ -347,6 +354,34 @@ gimple_resimplify4 (gimple_seq *seq, gim
>    return false;
>  }
>
> +/* Helper that matches and simplifies the toplevel result from
> +   a gimple_simplify run (where we don't want to build
> +   a stmt in case it's used in in-place folding).  Replaces
> +   RES_OP with a simplified and/or canonicalized result and
> +   returns whether any change was made.  */
> +
> +bool
> +gimple_resimplify5 (gimple_seq *seq, gimple_match_op *res_op,
> +                    tree (*valueize)(tree))
> +{
> +  /* No constant folding is defined for five-operand functions.  */
> +
> +  gimple_match_op res_op2 (*res_op);
> +  if (gimple_simplify (&res_op2, seq, valueize,
> +                       res_op->code, res_op->type,
> +                       res_op->ops[0], res_op->ops[1], res_op->ops[2],
> +                       res_op->ops[3], res_op->ops[4]))
> +    {
> +      *res_op = res_op2;
> +      return true;
> +    }
> +
> +  if (maybe_resimplify_conditional_op (seq, res_op, valueize))
> +    return true;
> +
> +  return false;
> +}
> +
>  /* If in GIMPLE the operation described by RES_OP should be single-rhs,
>     build a GENERIC tree for that expression and update RES_OP accordingly.  */
>
> @@ -388,7 +423,8 @@ build_call_internal (internal_fn fn, gim
>                                       res_op->op_or_null (0),
>                                       res_op->op_or_null (1),
>                                       res_op->op_or_null (2),
> -                                     res_op->op_or_null (3));
> +                                     res_op->op_or_null (3),
> +                                     res_op->op_or_null (4));
>  }
>
>  /* Push the exploded expression described by RES_OP as a statement to
> @@ -482,7 +518,8 @@ maybe_push_res_to_seq (gimple_match_op *
>                                          res_op->op_or_null (0),
>                                          res_op->op_or_null (1),
>                                          res_op->op_or_null (2),
> -                                        res_op->op_or_null (3));
> +                                        res_op->op_or_null (3),
> +                                        res_op->op_or_null (4));
>          }
>        if (!res)
>          {
> @@ -689,14 +726,22 @@ do_valueize (tree op, tree (*valueize)(t
>  try_conditional_simplification (internal_fn ifn, gimple_match_op *res_op,
>                                  gimple_seq *seq, tree (*valueize) (tree))
>  {
> +  code_helper op;
>    tree_code code = conditional_internal_fn_code (ifn);
> -  if (code == ERROR_MARK)
> -    return false;
> +  if (code != ERROR_MARK)
> +    op = code;
> +  else
> +    {
> +      ifn = get_unconditional_internal_fn (ifn);
> +      if (ifn == IFN_LAST)
> +        return false;
> +      op = as_combined_fn (ifn);
> +    }
>
>    unsigned int num_ops = res_op->num_ops;
>    gimple_match_op cond_op (gimple_match_cond (res_op->ops[0],
>                                                res_op->ops[num_ops - 1]),
> -                           code, res_op->type, num_ops - 2);
> +                           op, res_op->type, num_ops - 2);
>    for (unsigned int i = 1; i < num_ops - 1; ++i)
>      cond_op.ops[i - 1] = res_op->ops[i];
>    switch (num_ops - 2)
> @@ -705,6 +750,10 @@ try_conditional_simplification (internal
>        if (!gimple_resimplify2 (seq, &cond_op, valueize))
>          return false;
>        break;
> +    case 3:
> +      if (!gimple_resimplify3 (seq, &cond_op, valueize))
> +        return false;
> +      break;
>      default:
>        gcc_unreachable ();
>      }
> @@ -837,7 +886,7 @@ gimple_simplify (gimple *stmt, gimple_ma
>        /* ??? This way we can't simplify calls with side-effects.  */
>        if (gimple_call_lhs (stmt) != NULL_TREE
>            && gimple_call_num_args (stmt) >= 1
> -          && gimple_call_num_args (stmt) <= 4)
> +          && gimple_call_num_args (stmt) <= 5)
>          {
>            bool valueized = false;
>            combined_fn cfn;
> @@ -887,6 +936,9 @@ gimple_simplify (gimple *stmt, gimple_ma
>              case 4:
>                return (gimple_resimplify4 (seq, res_op, valueize)
>                        || valueized);
> +            case 5:
> +              return (gimple_resimplify5 (seq, res_op, valueize)
> +                      || valueized);
>              default:
>                gcc_unreachable ();
>              }
> Index: gcc/match.pd
> ===================================================================
> --- gcc/match.pd 2018-05-24 10:33:30.870095164 +0100
> +++ gcc/match.pd 2018-05-24 13:05:46.049605128 +0100
> @@ -86,6 +86,12 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>    IFN_COND_MUL IFN_COND_DIV IFN_COND_MOD IFN_COND_RDIV
>    IFN_COND_MIN IFN_COND_MAX
>    IFN_COND_AND IFN_COND_IOR IFN_COND_XOR)
> +
> +/* Same for ternary operations.  */
> +(define_operator_list UNCOND_TERNARY
> +  IFN_FMA IFN_FMS IFN_FNMA IFN_FNMS)
> +(define_operator_list COND_TERNARY
> +  IFN_COND_FMA IFN_COND_FMS IFN_COND_FNMA IFN_COND_FNMS)
>
>  /* As opposed to convert?, this still creates a single pattern, so
>     it is not a suitable replacement for convert? in all cases.  */
> @@ -4798,6 +4804,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>     (if (element_precision (type) == element_precision (op_type))
>      (view_convert (cond_op (bit_not @0) @2 @3 (view_convert:op_type @1)))))))
>
> +/* Same for ternary operations.  */
> +(for uncond_op (UNCOND_TERNARY)
> +     cond_op (COND_TERNARY)
> + (simplify
> +  (vec_cond @0 (view_convert? (uncond_op@5 @1 @2 @3)) @4)
> +  (with { tree op_type = TREE_TYPE (@5); }
> +   (if (element_precision (type) == element_precision (op_type))
> +    (view_convert (cond_op @0 @1 @2 @3 (view_convert:op_type @4))))))
> + (simplify
> +  (vec_cond @0 @1 (view_convert? (uncond_op@5 @2 @3 @4)))
> +  (with { tree op_type = TREE_TYPE (@5); }
> +   (if (element_precision (type) == element_precision (op_type))
> +    (view_convert (cond_op (bit_not @0) @2 @3 @4
> +                   (view_convert:op_type @1)))))))
> +
>  /* Detect cases in which a VEC_COND_EXPR effectively replaces the
>     "else" value of an IFN_COND_*.  */
>  (for cond_op (COND_BINARY)
> @@ -4806,3 +4827,11 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>    (with { tree op_type = TREE_TYPE (@3); }
>     (if (element_precision (type) == element_precision (op_type))
>      (view_convert (cond_op @0 @1 @2 (view_convert:op_type @4)))))))
> +
> +/* Same for ternary operations.  */
> +(for cond_op (COND_TERNARY)
> + (simplify
> +  (vec_cond @0 (view_convert? (cond_op @0 @1 @2 @3 @4)) @5)
> +  (with { tree op_type = TREE_TYPE (@4); }
> +   (if (element_precision (type) == element_precision (op_type))
> +    (view_convert (cond_op @0 @1 @2 @3 (view_convert:op_type @5)))))))
> Index: gcc/config/aarch64/aarch64.c
> ===================================================================
> --- gcc/config/aarch64/aarch64.c 2018-05-24 10:33:30.867095262 +0100
> +++ gcc/config/aarch64/aarch64.c 2018-05-24 13:05:46.046608817 +0100
> @@ -1292,14 +1292,18 @@ aarch64_get_mask_mode (poly_uint64 nunit
>    return default_get_mask_mode (nunits, nbytes);
>  }
>
> -/* Implement TARGET_PREFERRED_ELSE_VALUE.  Prefer to use the first
> -   arithmetic operand as the else value if the else value doesn't matter,
> -   since that exactly matches the SVE destructive merging form.  */
> +/* Implement TARGET_PREFERRED_ELSE_VALUE.  For binary operations,
> +   prefer to use the first arithmetic operand as the else value if
> +   the else value doesn't matter, since that exactly matches the SVE
> +   destructive merging form.  For ternary operations we could either
> +   pick the first operand and use FMAD-like instructions or the last
> +   operand and use FMLA-like instructions; the latter seems more
> +   natural.  */
>
>  static tree
> -aarch64_preferred_else_value (unsigned, tree, unsigned int, tree *ops)
> +aarch64_preferred_else_value (unsigned, tree, unsigned int nops, tree *ops)
>  {
> -  return ops[0];
> +  return nops == 3 ? ops[2] : ops[0];
>  }
>
>  /* Implement TARGET_HARD_REGNO_NREGS.  */
> Index: gcc/config/aarch64/aarch64-sve.md
> ===================================================================
> --- gcc/config/aarch64/aarch64-sve.md 2018-05-24 10:12:10.141352356 +0100
> +++ gcc/config/aarch64/aarch64-sve.md 2018-05-24 13:05:46.044611277 +0100
> @@ -2688,6 +2688,58 @@ (define_insn "*cond_<optab><mode>"
>    "<sve_fp_op>r\t%0.<Vetype>, %1/m, %0.<Vetype>, %2.<Vetype>"
>  )
>
> +;; Predicated floating-point ternary operations with select.
> +(define_expand "cond_<optab><mode>"
> +  [(set (match_operand:SVE_F 0 "register_operand")
> +        (unspec:SVE_F
> +          [(match_operand:<VPRED> 1 "register_operand")
> +           (unspec:SVE_F
> +             [(match_dup 1)
> +              (match_operand:SVE_F 2 "register_operand")
> +              (match_operand:SVE_F 3 "register_operand")
> +              (match_operand:SVE_F 4 "register_operand")]
> +             SVE_COND_FP_TERNARY)
> +           (match_operand:SVE_F 5 "register_operand")]
> +          UNSPEC_SEL))]
> +  "TARGET_SVE"
> +{
> +  aarch64_sve_prepare_conditional_op (operands, 6, true);
> +})
> +
> +;; Predicated floating-point ternary operations using the FMAD-like form.
> +(define_insn "*cond_<optab><mode>"
> +  [(set (match_operand:SVE_F 0 "register_operand" "=w")
> +        (unspec:SVE_F
> +          [(match_operand:<VPRED> 1 "register_operand" "Upl")
> +           (unspec:SVE_F
> +             [(match_dup 1)
> +              (match_operand:SVE_F 2 "register_operand" "0")
> +              (match_operand:SVE_F 3 "register_operand" "w")
> +              (match_operand:SVE_F 4 "register_operand" "w")]
> +             SVE_COND_FP_TERNARY)
> +           (match_dup 2)]
> +          UNSPEC_SEL))]
> +  "TARGET_SVE"
> +  "<sve_fmad_op>\t%0.<Vetype>, %1/m, %3.<Vetype>, %4.<Vetype>"
> +)
> +
> +;; Predicated floating-point ternary operations using the FMLA-like form.
> +(define_insn "*cond_<optab><mode>_acc"
> +  [(set (match_operand:SVE_F 0 "register_operand" "=w")
> +        (unspec:SVE_F
> +          [(match_operand:<VPRED> 1 "register_operand" "Upl")
> +           (unspec:SVE_F
> +             [(match_dup 1)
> +              (match_operand:SVE_F 2 "register_operand" "w")
> +              (match_operand:SVE_F 3 "register_operand" "w")
> +              (match_operand:SVE_F 4 "register_operand" "0")]
> +             SVE_COND_FP_TERNARY)
> +           (match_dup 4)]
> +          UNSPEC_SEL))]
> +  "TARGET_SVE"
> +  "<sve_fmla_op>\t%0.<Vetype>, %1/m, %2.<Vetype>, %3.<Vetype>"
> +)
> +
>  ;; Shift an SVE vector left and insert a scalar into element 0.
>  (define_insn "vec_shl_insert_<mode>"
>    [(set (match_operand:SVE_ALL 0 "register_operand" "=w, w")
> Index: gcc/config/aarch64/iterators.md
> ===================================================================
> --- gcc/config/aarch64/iterators.md 2018-05-24 10:12:10.142352315 +0100
> +++ gcc/config/aarch64/iterators.md 2018-05-24 13:05:46.046608817 +0100
> @@ -468,6 +468,10 @@ (define_c_enum "unspec"
>      UNSPEC_COND_DIV ; Used in aarch64-sve.md.
>      UNSPEC_COND_MAX ; Used in aarch64-sve.md.
>      UNSPEC_COND_MIN ; Used in aarch64-sve.md.
> +    UNSPEC_COND_FMLA ; Used in aarch64-sve.md.
> +    UNSPEC_COND_FMLS ; Used in aarch64-sve.md.
> +    UNSPEC_COND_FNMLA ; Used in aarch64-sve.md.
> +    UNSPEC_COND_FNMLS ; Used in aarch64-sve.md.
>      UNSPEC_COND_LT ; Used in aarch64-sve.md.
>      UNSPEC_COND_LE ; Used in aarch64-sve.md.
>      UNSPEC_COND_EQ ; Used in aarch64-sve.md.
> @@ -1549,6 +1553,11 @@ (define_int_iterator SVE_COND_FP_BINARY
>
>  (define_int_iterator SVE_COND_FP_BINARY_REV [UNSPEC_COND_SUB UNSPEC_COND_DIV])
>
> +(define_int_iterator SVE_COND_FP_TERNARY [UNSPEC_COND_FMLA
> +                                          UNSPEC_COND_FMLS
> +                                          UNSPEC_COND_FNMLA
> +                                          UNSPEC_COND_FNMLS])
> +
>  (define_int_iterator SVE_COND_FP_CMP [UNSPEC_COND_LT UNSPEC_COND_LE
>                                        UNSPEC_COND_EQ UNSPEC_COND_NE
>                                        UNSPEC_COND_GE UNSPEC_COND_GT])
> @@ -1581,7 +1590,11 @@ (define_int_attr optab [(UNSPEC_ANDF "an
>                          (UNSPEC_COND_MUL "mul")
>                          (UNSPEC_COND_DIV "div")
>                          (UNSPEC_COND_MAX "smax")
> -                        (UNSPEC_COND_MIN "smin")])
> +                        (UNSPEC_COND_MIN "smin")
> +                        (UNSPEC_COND_FMLA "fma")
> +                        (UNSPEC_COND_FMLS "fnma")
> +                        (UNSPEC_COND_FNMLA "fnms")
> +                        (UNSPEC_COND_FNMLS "fms")])
>
>  (define_int_attr maxmin_uns [(UNSPEC_UMAXV "umax")
>                               (UNSPEC_UMINV "umin")
> @@ -1799,6 +1812,16 @@ (define_int_attr sve_fp_op [(UNSPEC_COND
>                              (UNSPEC_COND_MAX "fmaxnm")
>                              (UNSPEC_COND_MIN "fminnm")])
>
> +(define_int_attr sve_fmla_op [(UNSPEC_COND_FMLA "fmla")
> +                              (UNSPEC_COND_FMLS "fmls")
> +                              (UNSPEC_COND_FNMLA "fnmla")
> +                              (UNSPEC_COND_FNMLS "fnmls")])
> +
> +(define_int_attr sve_fmad_op [(UNSPEC_COND_FMLA "fmad")
> +                              (UNSPEC_COND_FMLS "fmsb")
> +                              (UNSPEC_COND_FNMLA "fnmad")
> +                              (UNSPEC_COND_FNMLS "fnmsb")])
> +
>  (define_int_attr commutative [(UNSPEC_COND_ADD "true")
>                                (UNSPEC_COND_SUB "false")
>                                (UNSPEC_COND_MUL "true")
> Index: gcc/testsuite/gcc.dg/vect/vect-cond-arith-3.c
> ===================================================================
> --- /dev/null 2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-cond-arith-3.c 2018-05-24 13:05:46.049605128 +0100
> @@ -0,0 +1,63 @@
> +/* { dg-require-effective-target scalar_all_fma } */
> +/* { dg-additional-options "-fdump-tree-optimized" } */
> +
> +#include "tree-vect.h"
> +
> +#define N (VECTOR_BITS * 11 / 64 + 3)
> +
> +#define DEF(INV) \
> +  void __attribute__ ((noipa)) \
> +  f_##INV (double *restrict a, double *restrict b, \
> +           double *restrict c, double *restrict d) \
> +  { \
> +    for (int i = 0; i < N; ++i) \
> +      { \
> +        double mb = (INV & 1 ? -b[i] : b[i]); \
> +        double mc = c[i]; \
> +        double md = (INV & 2 ? -d[i] : d[i]); \
> +        double fma = __builtin_fma (mb, mc, md); \
> +        double truev = (INV & 4 ? -fma : fma); \
> +        a[i] = b[i] < 10 ? truev : 10.0; \
> +      } \
> +  }
> +
> +#define TEST(INV) \
> +  { \
> +    f_##INV (a, b, c, d); \
> +    for (int i = 0; i < N; ++i) \
> +      { \
> +        double mb = (INV & 1 ? -b[i] : b[i]); \
> +        double mc = c[i]; \
> +        double md = (INV & 2 ? -d[i] : d[i]); \
> +        double fma = __builtin_fma (mb, mc, md); \
> +        double truev = (INV & 4 ? -fma : fma); \
> +        if (a[i] != (i % 17 < 10 ? truev : 10.0)) \
> +          __builtin_abort (); \
> +        asm volatile ("" ::: "memory"); \
> +      } \
> +  }
> +
> +#define FOR_EACH_INV(T) \
> +  T (0) T (1) T (2) T (3) T (4) T (5) T (6) T (7)
> +
> +FOR_EACH_INV (DEF)
> +
> +int
> +main (void)
> +{
> +  double a[N], b[N], c[N], d[N];
> +  for (int i = 0; i < N; ++i)
> +    {
> +      b[i] = i % 17;
> +      c[i] = i % 9 + 11;
> +      d[i] = i % 13 + 14;
> +      asm volatile ("" ::: "memory");
> +    }
> +  FOR_EACH_INV (TEST)
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-times { = \.COND_FMA } 2 "optimized" { target vect_double_cond_arith } } } */
> +/* { dg-final { scan-tree-dump-times { = \.COND_FMS } 2 "optimized" { target vect_double_cond_arith } } } */
> +/* { dg-final { scan-tree-dump-times { = \.COND_FNMA } 2 "optimized" { target vect_double_cond_arith } } } */
> +/* { dg-final { scan-tree-dump-times { = \.COND_FNMS } 2 "optimized" { target vect_double_cond_arith } } } */
> Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_13.c
> ===================================================================
> --- /dev/null 2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.target/aarch64/sve/vcond_13.c 2018-05-24 13:05:46.049605128 +0100
> @@ -0,0 +1,58 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ftree-vectorize" } */
> +
> +#define N 119
> +
> +#define DEF_LOOP(INV, TYPE, CMPTYPE, SUFFIX) \
> +  void __attribute__ ((noipa)) \
> +  f_##INV##_##SUFFIX (TYPE *restrict a, TYPE *restrict b, \
> +                      TYPE *restrict c, TYPE *restrict d, \
> +                      CMPTYPE *restrict cond) \
> +  { \
> +    for (int i = 0; i < N; ++i) \
> +      { \
> +        TYPE mb = (INV & 1 ? -b[i] : b[i]); \
> +        TYPE mc = c[i]; \
> +        TYPE md = (INV & 2 ? -d[i] : d[i]); \
> +        TYPE fma = __builtin_fma##SUFFIX (mb, mc, md); \
> +        TYPE truev = (INV & 4 ? -fma : fma); \
> +        a[i] = cond[i] < 10 ? truev : b[i]; \
> +      } \
> +  }
> +
> +#define FOR_EACH_TYPE(T, INV) \
> +  T (INV, _Float16, short, f16) \
> +  T (INV, float, float, f32) \
> +  T (INV, double, double, f64)
> +
> +#define FOR_EACH_INV(T) \
> +  FOR_EACH_TYPE (T, 0) \
> +  FOR_EACH_TYPE (T, 1) \
> +  FOR_EACH_TYPE (T, 2) \
> +  FOR_EACH_TYPE (T, 3) \
> +  FOR_EACH_TYPE (T, 4) \
> +  FOR_EACH_TYPE (T, 5) \
> +  FOR_EACH_TYPE (T, 6) \
> +  FOR_EACH_TYPE (T, 7)
> +
> +FOR_EACH_INV (DEF_LOOP)
> +
> +/* { dg-final { scan-assembler-not {\tsel\t} } } */
> +/* { dg-final { scan-assembler-not {\tmovprfx\t} } } */
> +/* { dg-final { scan-assembler-not {\tmov\tz[0-9]+\.., z[0-9]+} } } */
> +
> +/* { dg-final { scan-assembler-times {\tfmad\tz[0-9]+\.h,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tfmad\tz[0-9]+\.s,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tfmad\tz[0-9]+\.d,} 2 } } */
> +
> +/* { dg-final { scan-assembler-times {\tfmsb\tz[0-9]+\.h,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tfmsb\tz[0-9]+\.s,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tfmsb\tz[0-9]+\.d,} 2 } } */
> +
> +/* { dg-final { scan-assembler-times {\tfnmad\tz[0-9]+\.h,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tfnmad\tz[0-9]+\.s,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tfnmad\tz[0-9]+\.d,} 2 } } */
> +
> +/* { dg-final { scan-assembler-times {\tfnmsb\tz[0-9]+\.h,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tfnmsb\tz[0-9]+\.s,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tfnmsb\tz[0-9]+\.d,} 2 } } */
> Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_13_run.c
> ===================================================================
> --- /dev/null 2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.target/aarch64/sve/vcond_13_run.c 2018-05-24 13:05:46.050603898 +0100
> @@ -0,0 +1,37 @@
> +/* { dg-do run { target aarch64_sve_hw } } */
> +/* { dg-options "-O2 -ftree-vectorize" } */
> +
> +#include "vcond_13.c"
> +
> +#define TEST_LOOP(INV, TYPE, CMPTYPE, SUFFIX) \
> +  { \
> +    TYPE a[N], b[N], c[N], d[N]; \
> +    CMPTYPE cond[N]; \
> +    for (int i = 0; i < N; ++i) \
> +      { \
> +        b[i] = i % 15; \
> +        c[i] = i % 9 + 11; \
> +        d[i] = i % 13 + 14; \
> +        cond[i] = i % 17; \
> +        asm volatile ("" ::: "memory"); \
> +      } \
> +    f_##INV##_##SUFFIX (a, b, c, d, cond); \
> +    for (int i = 0; i < N; ++i) \
> +      { \
> +        double mb = (INV & 1 ? -b[i] : b[i]); \
> +        double mc = c[i]; \
> +        double md = (INV & 2 ? -d[i] : d[i]); \
> +        double fma = __builtin_fma (mb, mc, md); \
> +        double truev = (INV & 4 ? -fma : fma); \
> +        if (a[i] != (i % 17 < 10 ? truev : b[i])) \
> +          __builtin_abort (); \
> +        asm volatile ("" ::: "memory"); \
> +      } \
> +  }
> +
> +int
> +main (void)
> +{
> +  FOR_EACH_INV (TEST_LOOP)
> +  return 0;
> +}
> Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_14.c
> ===================================================================
> --- /dev/null 2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.target/aarch64/sve/vcond_14.c 2018-05-24 13:05:46.050603898 +0100
> @@ -0,0 +1,58 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -ftree-vectorize" } */
> +
> +#define N 119
> +
> +#define DEF_LOOP(INV, TYPE, CMPTYPE, SUFFIX) \
> +  void __attribute__ ((noipa)) \
> +  f_##INV##_##SUFFIX (TYPE *restrict a, TYPE *restrict b, \
> +                      TYPE *restrict c, TYPE *restrict d, \
> +                      CMPTYPE *restrict cond) \
> +  { \
> +    for (int i = 0; i < N; ++i) \
> +      { \
> +        TYPE mb = (INV & 1 ? -b[i] : b[i]); \
> +        TYPE mc = c[i]; \
> +        TYPE md = (INV & 2 ? -d[i] : d[i]); \
> +        TYPE fma = __builtin_fma##SUFFIX (mb, mc, md); \
> +        TYPE truev = (INV & 4 ? -fma : fma); \
> +        a[i] = cond[i] < 10 ? truev : c[i]; \
> +      } \
> +  }
> +
> +#define FOR_EACH_TYPE(T, INV) \
> +  T (INV, _Float16, short, f16) \
> +  T (INV, float, float, f32) \
> +  T (INV, double, double, f64)
> +
> +#define FOR_EACH_INV(T) \
> +  FOR_EACH_TYPE (T, 0) \
> +  FOR_EACH_TYPE (T, 1) \
> +  FOR_EACH_TYPE (T, 2) \
> +  FOR_EACH_TYPE (T, 3) \
> +  FOR_EACH_TYPE (T, 4) \
> +  FOR_EACH_TYPE (T, 5) \
> +  FOR_EACH_TYPE (T, 6) \
> +  FOR_EACH_TYPE (T, 7)
> +
> +FOR_EACH_INV (DEF_LOOP)
> +
> +/* { dg-final { scan-assembler-not {\tsel\t} } } */
> +/* { dg-final { scan-assembler-not {\tmovprfx\t} } } */
> +/* { dg-final { scan-assembler-not {\tmov\tz[0-9]+\.., z[0-9]+} } } */
> +
> +/* { dg-final { scan-assembler-times {\tfmad\tz[0-9]+\.h,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tfmad\tz[0-9]+\.s,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tfmad\tz[0-9]+\.d,} 2 } } */
> +
> +/* { dg-final { scan-assembler-times {\tfmsb\tz[0-9]+\.h,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tfmsb\tz[0-9]+\.s,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tfmsb\tz[0-9]+\.d,} 2 } } */
> +
> +/* { dg-final { scan-assembler-times {\tfnmad\tz[0-9]+\.h,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tfnmad\tz[0-9]+\.s,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tfnmad\tz[0-9]+\.d,} 2 } } */
> +
> +/* { dg-final { scan-assembler-times {\tfnmsb\tz[0-9]+\.h,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tfnmsb\tz[0-9]+\.s,} 2 } } */
> +/* { dg-final { scan-assembler-times {\tfnmsb\tz[0-9]+\.d,} 2 } } */
> Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_14_run.c
> ===================================================================
> --- /dev/null 2018-04-20 16:19:46.369131350 +0100
> +++ gcc/testsuite/gcc.target/aarch64/sve/vcond_14_run.c 2018-05-24 13:05:46.050603898 +0100
> @@ -0,0 +1,37 @@
> +/* { dg-do run { target aarch64_sve_hw } } */
> +/* { dg-options "-O2 -ftree-vectorize" } */ > + > +#include "vcond_14.c" > + > +#define TEST_LOOP(INV, TYPE, CMPTYPE, SUFFIX) \ > + { \ > + TYPE a[N], b[N], c[N], d[N]; \ > + CMPTYPE cond[N]; \ > + for (int i = 0; i < N; ++i) \ > + { \ > + b[i] = i % 15; \ > + c[i] = i % 9 + 11; \ > + d[i] = i % 13 + 14; \ > + cond[i] = i % 17; \ > + asm volatile ("" ::: "memory"); \ > + } \ > + f_##INV##_##SUFFIX (a, b, c, d, cond); \ > + for (int i = 0; i < N; ++i) \ > + { \ > + double mb = (INV & 1 ? -b[i] : b[i]); \ > + double mc = c[i]; \ > + double md = (INV & 2 ? -d[i] : d[i]); \ > + double fma = __builtin_fma (mb, mc, md); \ > + double truev = (INV & 4 ? -fma : fma); \ > + if (a[i] != (i % 17 < 10 ? truev : c[i])) \ > + __builtin_abort (); \ > + asm volatile ("" ::: "memory"); \ > + } \ > + } > + > +int > +main (void) > +{ > + FOR_EACH_INV (TEST_LOOP) > + return 0; > +} > Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_15.c > =================================================================== > --- /dev/null 2018-04-20 16:19:46.369131350 +0100 > +++ gcc/testsuite/gcc.target/aarch64/sve/vcond_15.c 2018-05-24 13:05:46.050603898 +0100 > @@ -0,0 +1,58 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -ftree-vectorize" } */ > + > +#define N 119 > + > +#define DEF_LOOP(INV, TYPE, CMPTYPE, SUFFIX) \ > + void __attribute__ ((noipa)) \ > + f_##INV##_##SUFFIX (TYPE *restrict a, TYPE *restrict b, \ > + TYPE *restrict c, TYPE *restrict d, \ > + CMPTYPE *restrict cond) \ > + { \ > + for (int i = 0; i < N; ++i) \ > + { \ > + TYPE mb = (INV & 1 ? -b[i] : b[i]); \ > + TYPE mc = c[i]; \ > + TYPE md = (INV & 2 ? -d[i] : d[i]); \ > + TYPE fma = __builtin_fma##SUFFIX (mb, mc, md); \ > + TYPE truev = (INV & 4 ? -fma : fma); \ > + a[i] = cond[i] < 10 ? 
truev : d[i]; \ > + } \ > + } > + > +#define FOR_EACH_TYPE(T, INV) \ > + T (INV, _Float16, short, f16) \ > + T (INV, float, float, f32) \ > + T (INV, double, double, f64) > + > +#define FOR_EACH_INV(T) \ > + FOR_EACH_TYPE (T, 0) \ > + FOR_EACH_TYPE (T, 1) \ > + FOR_EACH_TYPE (T, 2) \ > + FOR_EACH_TYPE (T, 3) \ > + FOR_EACH_TYPE (T, 4) \ > + FOR_EACH_TYPE (T, 5) \ > + FOR_EACH_TYPE (T, 6) \ > + FOR_EACH_TYPE (T, 7) > + > +FOR_EACH_INV (DEF_LOOP) > + > +/* { dg-final { scan-assembler-not {\tsel\t} } } */ > +/* { dg-final { scan-assembler-not {\tmovprfx\t} } } */ > +/* { dg-final { scan-assembler-not {\tmov\tz[0-9]+\.., z[0-9]+} } } */ > + > +/* { dg-final { scan-assembler-times {\tfmla\tz[0-9]+\.h,} 2 } } */ > +/* { dg-final { scan-assembler-times {\tfmla\tz[0-9]+\.s,} 2 } } */ > +/* { dg-final { scan-assembler-times {\tfmla\tz[0-9]+\.d,} 2 } } */ > + > +/* { dg-final { scan-assembler-times {\tfmls\tz[0-9]+\.h,} 2 } } */ > +/* { dg-final { scan-assembler-times {\tfmls\tz[0-9]+\.s,} 2 } } */ > +/* { dg-final { scan-assembler-times {\tfmls\tz[0-9]+\.d,} 2 } } */ > + > +/* { dg-final { scan-assembler-times {\tfnmla\tz[0-9]+\.h,} 2 } } */ > +/* { dg-final { scan-assembler-times {\tfnmla\tz[0-9]+\.s,} 2 } } */ > +/* { dg-final { scan-assembler-times {\tfnmla\tz[0-9]+\.d,} 2 } } */ > + > +/* { dg-final { scan-assembler-times {\tfnmls\tz[0-9]+\.h,} 2 } } */ > +/* { dg-final { scan-assembler-times {\tfnmls\tz[0-9]+\.s,} 2 } } */ > +/* { dg-final { scan-assembler-times {\tfnmls\tz[0-9]+\.d,} 2 } } */ > Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_15_run.c > =================================================================== > --- /dev/null 2018-04-20 16:19:46.369131350 +0100 > +++ gcc/testsuite/gcc.target/aarch64/sve/vcond_15_run.c 2018-05-24 13:05:46.050603898 +0100 > @@ -0,0 +1,37 @@ > +/* { dg-do run { target aarch64_sve_hw } } */ > +/* { dg-options "-O2 -ftree-vectorize" } */ > + > +#include "vcond_15.c" > + > +#define TEST_LOOP(INV, TYPE, CMPTYPE, SUFFIX) \ > + { \ 
> + TYPE a[N], b[N], c[N], d[N]; \ > + CMPTYPE cond[N]; \ > + for (int i = 0; i < N; ++i) \ > + { \ > + b[i] = i % 15; \ > + c[i] = i % 9 + 11; \ > + d[i] = i % 13 + 14; \ > + cond[i] = i % 17; \ > + asm volatile ("" ::: "memory"); \ > + } \ > + f_##INV##_##SUFFIX (a, b, c, d, cond); \ > + for (int i = 0; i < N; ++i) \ > + { \ > + double mb = (INV & 1 ? -b[i] : b[i]); \ > + double mc = c[i]; \ > + double md = (INV & 2 ? -d[i] : d[i]); \ > + double fma = __builtin_fma (mb, mc, md); \ > + double truev = (INV & 4 ? -fma : fma); \ > + if (a[i] != (i % 17 < 10 ? truev : d[i])) \ > + __builtin_abort (); \ > + asm volatile ("" ::: "memory"); \ > + } \ > + } > + > +int > +main (void) > +{ > + FOR_EACH_INV (TEST_LOOP) > + return 0; > +} > Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_16.c > =================================================================== > --- /dev/null 2018-04-20 16:19:46.369131350 +0100 > +++ gcc/testsuite/gcc.target/aarch64/sve/vcond_16.c 2018-05-24 13:05:46.050603898 +0100 > @@ -0,0 +1,58 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -ftree-vectorize" } */ > + > +#define N 119 > + > +#define DEF_LOOP(INV, TYPE, CMPTYPE, SUFFIX) \ > + void __attribute__ ((noipa)) \ > + f_##INV##_##SUFFIX (TYPE *restrict a, TYPE *restrict b, \ > + TYPE *restrict c, TYPE *restrict d, \ > + CMPTYPE *restrict cond) \ > + { \ > + for (int i = 0; i < N; ++i) \ > + { \ > + TYPE mb = (INV & 1 ? -b[i] : b[i]); \ > + TYPE mc = c[i]; \ > + TYPE md = (INV & 2 ? -d[i] : d[i]); \ > + TYPE fma = __builtin_fma##SUFFIX (mb, mc, md); \ > + TYPE truev = (INV & 4 ? -fma : fma); \ > + a[i] = cond[i] < 10 ? 
truev : 10; \ > + } \ > + } > + > +#define FOR_EACH_TYPE(T, INV) \ > + T (INV, _Float16, short, f16) \ > + T (INV, float, float, f32) \ > + T (INV, double, double, f64) > + > +#define FOR_EACH_INV(T) \ > + FOR_EACH_TYPE (T, 0) \ > + FOR_EACH_TYPE (T, 1) \ > + FOR_EACH_TYPE (T, 2) \ > + FOR_EACH_TYPE (T, 3) \ > + FOR_EACH_TYPE (T, 4) \ > + FOR_EACH_TYPE (T, 5) \ > + FOR_EACH_TYPE (T, 6) \ > + FOR_EACH_TYPE (T, 7) > + > +FOR_EACH_INV (DEF_LOOP) > + > +/* { dg-final { scan-assembler-times {\tsel\t} 24 } } */ > +/* { dg-final { scan-assembler-not {\tmovprfx\t} } } */ > +/* { dg-final { scan-assembler-not {\tmov\tz[0-9]+\.., z[0-9]+} } } */ > + > +/* { dg-final { scan-assembler-times {\tfmad\tz[0-9]+\.h,} 2 } } */ > +/* { dg-final { scan-assembler-times {\tfmad\tz[0-9]+\.s,} 2 } } */ > +/* { dg-final { scan-assembler-times {\tfmad\tz[0-9]+\.d,} 2 } } */ > + > +/* { dg-final { scan-assembler-times {\tfmsb\tz[0-9]+\.h,} 2 } } */ > +/* { dg-final { scan-assembler-times {\tfmsb\tz[0-9]+\.s,} 2 } } */ > +/* { dg-final { scan-assembler-times {\tfmsb\tz[0-9]+\.d,} 2 } } */ > + > +/* { dg-final { scan-assembler-times {\tfnmad\tz[0-9]+\.h,} 2 } } */ > +/* { dg-final { scan-assembler-times {\tfnmad\tz[0-9]+\.s,} 2 } } */ > +/* { dg-final { scan-assembler-times {\tfnmad\tz[0-9]+\.d,} 2 } } */ > + > +/* { dg-final { scan-assembler-times {\tfnmsb\tz[0-9]+\.h,} 2 } } */ > +/* { dg-final { scan-assembler-times {\tfnmsb\tz[0-9]+\.s,} 2 } } */ > +/* { dg-final { scan-assembler-times {\tfnmsb\tz[0-9]+\.d,} 2 } } */ > Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_16_run.c > =================================================================== > --- /dev/null 2018-04-20 16:19:46.369131350 +0100 > +++ gcc/testsuite/gcc.target/aarch64/sve/vcond_16_run.c 2018-05-24 13:05:46.050603898 +0100 > @@ -0,0 +1,37 @@ > +/* { dg-do run { target aarch64_sve_hw } } */ > +/* { dg-options "-O2 -ftree-vectorize" } */ > + > +#include "vcond_16.c" > + > +#define TEST_LOOP(INV, TYPE, CMPTYPE, SUFFIX) \ > + { 
\ > + TYPE a[N], b[N], c[N], d[N]; \ > + CMPTYPE cond[N]; \ > + for (int i = 0; i < N; ++i) \ > + { \ > + b[i] = i % 15; \ > + c[i] = i % 9 + 11; \ > + d[i] = i % 13 + 14; \ > + cond[i] = i % 17; \ > + asm volatile ("" ::: "memory"); \ > + } \ > + f_##INV##_##SUFFIX (a, b, c, d, cond); \ > + for (int i = 0; i < N; ++i) \ > + { \ > + double mb = (INV & 1 ? -b[i] : b[i]); \ > + double mc = c[i]; \ > + double md = (INV & 2 ? -d[i] : d[i]); \ > + double fma = __builtin_fma (mb, mc, md); \ > + double truev = (INV & 4 ? -fma : fma); \ > + if (a[i] != (i % 17 < 10 ? truev : 10)) \ > + __builtin_abort (); \ > + asm volatile ("" ::: "memory"); \ > + } \ > + } > + > +int > +main (void) > +{ > + FOR_EACH_INV (TEST_LOOP) > + return 0; > +}
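A note on the count of 2 in each scan-assembler-times directive quoted above: the eight INV values appear to collapse onto the four fused operations, two INV values per operation, once the negations in DEF_LOOP are folded into the multiply-add. The following is only a scalar sketch of that reasoning, not code from the patch. The names ref_fma, ref_fms, ref_fnma, ref_fnms, cond_op, inv_to_op and check_inv_mapping are hypothetical, and the INV-to-operation table was derived by hand from the DEF_LOOP body.

```c
#include <assert.h>

#define N 119

/* Hypothetical scalar models of the four operations, named after the
   optabs.  Written with a plain multiply and add to keep the sketch free
   of library dependencies; that is exact for the small integer inputs
   used below, where no rounding occurs, but it is not a substitute for
   fma () in general.  */
static double ref_fma (double a, double b, double c) { return a * b + c; }
static double ref_fms (double a, double b, double c) { return a * b - c; }
static double ref_fnma (double a, double b, double c) { return -(a * b) + c; }
static double ref_fnms (double a, double b, double c) { return -(a * b) - c; }

typedef double (*ref_fn) (double, double, double);

/* Operation that each INV value reduces to (an assumption of this
   sketch, derived by hand).  Each operation appears exactly twice.  */
static const ref_fn inv_to_op[8] = {
  ref_fma, ref_fnma, ref_fms, ref_fnms,
  ref_fnms, ref_fms, ref_fnma, ref_fma
};

/* Scalar model of LHS = COND ? OP (A, B, C) : ELSE_VALUE.  */
static double
cond_op (int cond, ref_fn op, double a, double b, double c,
	 double else_value)
{
  return cond ? op (a, b, c) : else_value;
}

/* The "truev" computation from the tests, for one element.  */
static double
test_truev (int inv, double b, double c, double d)
{
  double mb = (inv & 1 ? -b : b);
  double md = (inv & 2 ? -d : d);
  double r = mb * c + md;
  return (inv & 4 ? -r : r);
}

/* Check the table against the tests' formula on the tests' input data,
   using d[i] as the else value.  Return the number of cases checked.  */
int
check_inv_mapping (void)
{
  int checked = 0;
  for (int inv = 0; inv < 8; ++inv)
    for (int i = 0; i < N; ++i)
      {
	double b = i % 15, c = i % 9 + 11, d = i % 13 + 14;
	int cond = i % 17 < 10;
	double expected = cond ? test_truev (inv, b, c, d) : d;
	assert (cond_op (cond, inv_to_op[inv], b, c, d, d) == expected);
	++checked;
      }
  return checked;
}
```

Calling check_inv_mapping () from a small main is enough to exercise the table. The point of the sketch is only that two INV values land on each operation and there are three element types, which is consistent with each per-suffix directive expecting 2 matches.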
Richard Biener <richard.guenther@gmail.com> writes: > On Thu, May 24, 2018 at 2:08 PM Richard Sandiford < > richard.sandiford@linaro.org> wrote: > >> This patch adds conditional equivalents of the IFN_FMA built-in functions. >> Most of it is just a mechanical extension of the binary stuff. > >> Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf >> and x86_64-linux-gnu. OK for the non-AArch64 bits? > > OK. Thanks. For the record, here's what I installed after updating the SVE patterns in line with rth's recent MOVPRFX changes. Richard 2018-07-12 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * doc/md.texi (cond_fma, cond_fms, cond_fnma, cond_fnms): Document. * optabs.def (cond_fma_optab, cond_fms_optab, cond_fnma_optab) (cond_fnms_optab): New optabs. * internal-fn.def (COND_FMA, COND_FMS, COND_FNMA, COND_FNMS): New internal functions. (FMA): Use DEF_INTERNAL_FLT_FLOATN_FN rather than DEF_INTERNAL_FLT_FN. * internal-fn.h (get_conditional_internal_fn): Declare. (get_unconditional_internal_fn): Likewise. * internal-fn.c (cond_ternary_direct): New macro. (expand_cond_ternary_optab_fn): Likewise. (direct_cond_ternary_optab_supported_p): Likewise. (FOR_EACH_COND_FN_PAIR): Likewise. (get_conditional_internal_fn): New function. (get_unconditional_internal_fn): Likewise. * gimple-match.h (gimple_match_op::MAX_NUM_OPS): Bump to 5. (gimple_match_op::gimple_match_op): Add a new overload for 5 operands. (gimple_match_op::set_op): Likewise. (gimple_resimplify5): Declare. * genmatch.c (decision_tree::gen): Generate simplifications for 5 operands. * gimple-match-head.c (gimple_simplify): Define an overload for 5 operands. Handle calls with 5 arguments in the top-level overload. (convert_conditional_op): Handle conversions from unconditional internal functions to conditional ones. (gimple_resimplify5): New function. (build_call_internal): Pass a fifth operand. (maybe_push_res_to_seq): Likewise. 
(try_conditional_simplification): Try converting conditional internal functions to unconditional internal functions. Handle 3-operand unconditional forms. * match.pd (UNCOND_TERNARY, COND_TERNARY): Operator lists. Define ternary equivalents of the current rules for binary conditional internal functions. * config/aarch64/aarch64.c (aarch64_preferred_else_value): Handle ternary operations. * config/aarch64/iterators.md (UNSPEC_COND_FMLA, UNSPEC_COND_FMLS) (UNSPEC_COND_FNMLA, UNSPEC_COND_FNMLS): New unspecs. (optab): Handle them. (SVE_COND_FP_TERNARY): New int iterator. (sve_fmla_op, sve_fmad_op): New int attributes. * config/aarch64/aarch64-sve.md (cond_<optab><mode>) (*cond_<optab><mode>_2, *cond_<optab><mode>_4) (*cond_<optab><mode>_any): New SVE_COND_FP_TERNARY patterns. gcc/testsuite/ * gcc.dg/vect/vect-cond-arith-3.c: New test. * gcc.target/aarch64/sve/vcond_13.c: Likewise. * gcc.target/aarch64/sve/vcond_13_run.c: Likewise. * gcc.target/aarch64/sve/vcond_14.c: Likewise. * gcc.target/aarch64/sve/vcond_14_run.c: Likewise. * gcc.target/aarch64/sve/vcond_15.c: Likewise. * gcc.target/aarch64/sve/vcond_15_run.c: Likewise. * gcc.target/aarch64/sve/vcond_16.c: Likewise. * gcc.target/aarch64/sve/vcond_16_run.c: Likewise. Index: gcc/doc/md.texi =================================================================== --- gcc/doc/md.texi 2018-07-12 12:39:27.789323671 +0100 +++ gcc/doc/md.texi 2018-07-12 12:42:44.366933190 +0100 @@ -6438,6 +6438,23 @@ Operands 0, 2, 3 and 4 all have mode @va integer if @var{m} is scalar, otherwise it has the mode returned by @code{TARGET_VECTORIZE_GET_MASK_MODE}. 
+@cindex @code{cond_fma@var{mode}} instruction pattern +@cindex @code{cond_fms@var{mode}} instruction pattern +@cindex @code{cond_fnma@var{mode}} instruction pattern +@cindex @code{cond_fnms@var{mode}} instruction pattern +@item @samp{cond_fma@var{mode}} +@itemx @samp{cond_fms@var{mode}} +@itemx @samp{cond_fnma@var{mode}} +@itemx @samp{cond_fnms@var{mode}} +Like @samp{cond_add@var{m}}, except that the conditional operation +takes 3 operands rather than two. For example, the vector form of +@samp{cond_fma@var{mode}} is equivalent to: + +@smallexample +for (i = 0; i < GET_MODE_NUNITS (@var{m}); i++) + op0[i] = op1[i] ? fma (op2[i], op3[i], op4[i]) : op5[i]; +@end smallexample + @cindex @code{neg@var{mode}cc} instruction pattern @item @samp{neg@var{mode}cc} Similar to @samp{mov@var{mode}cc} but for conditional negation. Conditionally Index: gcc/optabs.def =================================================================== --- gcc/optabs.def 2018-07-12 12:39:27.976869878 +0100 +++ gcc/optabs.def 2018-07-12 12:42:44.368856626 +0100 @@ -234,6 +234,10 @@ OPTAB_D (cond_smin_optab, "cond_smin$a") OPTAB_D (cond_smax_optab, "cond_smax$a") OPTAB_D (cond_umin_optab, "cond_umin$a") OPTAB_D (cond_umax_optab, "cond_umax$a") +OPTAB_D (cond_fma_optab, "cond_fma$a") +OPTAB_D (cond_fms_optab, "cond_fms$a") +OPTAB_D (cond_fnma_optab, "cond_fnma$a") +OPTAB_D (cond_fnms_optab, "cond_fnms$a") OPTAB_D (cmov_optab, "cmov$a6") OPTAB_D (cstore_optab, "cstore$a4") OPTAB_D (ctrap_optab, "ctrap$a4") Index: gcc/internal-fn.def =================================================================== --- gcc/internal-fn.def 2018-07-12 12:39:28.919588848 +0100 +++ gcc/internal-fn.def 2018-07-12 12:42:44.367894908 +0100 @@ -59,7 +59,8 @@ along with GCC; see the file COPYING3. 
- binary: a normal binary optab, such as vec_interleave_lo_<mode> - ternary: a normal ternary optab, such as fma<mode>4 - - cond_binary: a conditional binary optab, such as add<mode>cc + - cond_binary: a conditional binary optab, such as cond_add<mode> + - cond_ternary: a conditional ternary optab, such as cond_fma_rev<mode> - fold_left: for scalar = FN (scalar, vector), keyed off the vector mode @@ -167,6 +168,11 @@ DEF_INTERNAL_OPTAB_FN (COND_IOR, ECF_CON DEF_INTERNAL_OPTAB_FN (COND_XOR, ECF_CONST | ECF_NOTHROW, cond_xor, cond_binary) +DEF_INTERNAL_OPTAB_FN (COND_FMA, ECF_CONST, cond_fma, cond_ternary) +DEF_INTERNAL_OPTAB_FN (COND_FMS, ECF_CONST, cond_fms, cond_ternary) +DEF_INTERNAL_OPTAB_FN (COND_FNMA, ECF_CONST, cond_fnma, cond_ternary) +DEF_INTERNAL_OPTAB_FN (COND_FNMS, ECF_CONST, cond_fnms, cond_ternary) + DEF_INTERNAL_OPTAB_FN (RSQRT, ECF_CONST, rsqrt, unary) DEF_INTERNAL_OPTAB_FN (REDUC_PLUS, ECF_CONST | ECF_NOTHROW, @@ -235,7 +241,7 @@ DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONS DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary) /* Ternary math functions. */ -DEF_INTERNAL_FLT_FN (FMA, ECF_CONST, fma, ternary) +DEF_INTERNAL_FLT_FLOATN_FN (FMA, ECF_CONST, fma, ternary) /* Unary integer ops. 
*/ DEF_INTERNAL_INT_FN (CLRSB, ECF_CONST | ECF_NOTHROW, clrsb, unary) Index: gcc/internal-fn.h =================================================================== --- gcc/internal-fn.h 2018-07-12 12:41:44.919389631 +0100 +++ gcc/internal-fn.h 2018-07-12 12:42:44.367894908 +0100 @@ -193,7 +193,9 @@ direct_internal_fn_supported_p (internal extern bool set_edom_supported_p (void); extern internal_fn get_conditional_internal_fn (tree_code); +extern internal_fn get_conditional_internal_fn (internal_fn); extern tree_code conditional_internal_fn_code (internal_fn); +extern internal_fn get_unconditional_internal_fn (internal_fn); extern bool internal_load_fn_p (internal_fn); extern bool internal_store_fn_p (internal_fn); Index: gcc/internal-fn.c =================================================================== --- gcc/internal-fn.c 2018-07-12 12:41:44.919389631 +0100 +++ gcc/internal-fn.c 2018-07-12 12:42:44.367894908 +0100 @@ -113,6 +113,7 @@ #define binary_direct { 0, 0, true } #define ternary_direct { 0, 0, true } #define cond_unary_direct { 1, 1, true } #define cond_binary_direct { 1, 1, true } +#define cond_ternary_direct { 1, 1, true } #define while_direct { 0, 2, false } #define fold_extract_direct { 2, 2, false } #define fold_left_direct { 1, 1, false } @@ -2993,6 +2994,9 @@ #define expand_cond_unary_optab_fn(FN, S #define expand_cond_binary_optab_fn(FN, STMT, OPTAB) \ expand_direct_optab_fn (FN, STMT, OPTAB, 4) +#define expand_cond_ternary_optab_fn(FN, STMT, OPTAB) \ + expand_direct_optab_fn (FN, STMT, OPTAB, 5) + #define expand_fold_extract_optab_fn(FN, STMT, OPTAB) \ expand_direct_optab_fn (FN, STMT, OPTAB, 3) @@ -3075,6 +3079,7 @@ #define direct_binary_optab_supported_p #define direct_ternary_optab_supported_p direct_optab_supported_p #define direct_cond_unary_optab_supported_p direct_optab_supported_p #define direct_cond_binary_optab_supported_p direct_optab_supported_p +#define direct_cond_ternary_optab_supported_p direct_optab_supported_p #define 
direct_mask_load_optab_supported_p direct_optab_supported_p #define direct_load_lanes_optab_supported_p multi_vector_optab_supported_p #define direct_mask_load_lanes_optab_supported_p multi_vector_optab_supported_p @@ -3277,6 +3282,57 @@ #define CASE(CODE, IFN) case IFN: return } } +/* Invoke T(IFN) for each internal function IFN that also has an + IFN_COND_* form. */ +#define FOR_EACH_COND_FN_PAIR(T) \ + T (FMA) \ + T (FMS) \ + T (FNMA) \ + T (FNMS) + +/* Return a function that only performs internal function FN when a + certain condition is met and that uses a given fallback value otherwise. + In other words, the returned function FN' is such that: + + LHS = FN' (COND, A1, ... An, ELSE) + + is equivalent to the C expression: + + LHS = COND ? FN (A1, ..., An) : ELSE; + + operating elementwise if the operands are vectors. + + Return IFN_LAST if no such function exists. */ + +internal_fn +get_conditional_internal_fn (internal_fn fn) +{ + switch (fn) + { +#define CASE(NAME) case IFN_##NAME: return IFN_COND_##NAME; + FOR_EACH_COND_FN_PAIR(CASE) +#undef CASE + default: + return IFN_LAST; + } +} + +/* If IFN implements the conditional form of an unconditional internal + function, return that unconditional function, otherwise return IFN_LAST. */ + +internal_fn +get_unconditional_internal_fn (internal_fn ifn) +{ + switch (ifn) + { +#define CASE(NAME) case IFN_COND_##NAME: return IFN_##NAME; + FOR_EACH_COND_FN_PAIR(CASE) +#undef CASE + default: + return IFN_LAST; + } +} + /* Return true if IFN is some form of load from memory. 
*/ bool Index: gcc/gimple-match.h =================================================================== --- gcc/gimple-match.h 2018-07-12 12:41:44.919389631 +0100 +++ gcc/gimple-match.h 2018-07-12 12:42:44.367894908 +0100 @@ -91,18 +91,21 @@ struct gimple_match_op code_helper, tree, tree, tree, tree); gimple_match_op (const gimple_match_cond &, code_helper, tree, tree, tree, tree, tree); + gimple_match_op (const gimple_match_cond &, + code_helper, tree, tree, tree, tree, tree, tree); void set_op (code_helper, tree, unsigned int); void set_op (code_helper, tree, tree); void set_op (code_helper, tree, tree, tree); void set_op (code_helper, tree, tree, tree, tree); void set_op (code_helper, tree, tree, tree, tree, tree); + void set_op (code_helper, tree, tree, tree, tree, tree, tree); void set_value (tree); tree op_or_null (unsigned int) const; /* The maximum value of NUM_OPS. */ - static const unsigned int MAX_NUM_OPS = 4; + static const unsigned int MAX_NUM_OPS = 5; /* The conditions under which the operation is performed, and the value to use as a fallback. */ @@ -182,6 +185,20 @@ gimple_match_op::gimple_match_op (const ops[3] = op3; } +inline +gimple_match_op::gimple_match_op (const gimple_match_cond &cond_in, + code_helper code_in, tree type_in, + tree op0, tree op1, tree op2, tree op3, + tree op4) + : cond (cond_in), code (code_in), type (type_in), num_ops (5) +{ + ops[0] = op0; + ops[1] = op1; + ops[2] = op2; + ops[3] = op3; + ops[4] = op4; +} + /* Change the operation performed to CODE_IN, the type of the result to TYPE_IN, and the number of operands to NUM_OPS_IN. The caller needs to set the operands itself. 
*/ @@ -242,6 +259,20 @@ gimple_match_op::set_op (code_helper cod ops[3] = op3; } +inline void +gimple_match_op::set_op (code_helper code_in, tree type_in, + tree op0, tree op1, tree op2, tree op3, tree op4) +{ + code = code_in; + type = type_in; + num_ops = 5; + ops[0] = op0; + ops[1] = op1; + ops[2] = op2; + ops[3] = op3; + ops[4] = op4; +} + /* Set the "operation" to be the single value VALUE, such as a constant or SSA_NAME. */ @@ -279,6 +310,7 @@ bool gimple_resimplify1 (gimple_seq *, g bool gimple_resimplify2 (gimple_seq *, gimple_match_op *, tree (*)(tree)); bool gimple_resimplify3 (gimple_seq *, gimple_match_op *, tree (*)(tree)); bool gimple_resimplify4 (gimple_seq *, gimple_match_op *, tree (*)(tree)); +bool gimple_resimplify5 (gimple_seq *, gimple_match_op *, tree (*)(tree)); tree maybe_push_res_to_seq (gimple_match_op *, gimple_seq *, tree res = NULL_TREE); void maybe_build_generic_op (gimple_match_op *); Index: gcc/genmatch.c =================================================================== --- gcc/genmatch.c 2018-07-12 12:41:44.918413713 +0100 +++ gcc/genmatch.c 2018-07-12 12:42:44.366933190 +0100 @@ -3750,7 +3750,7 @@ decision_tree::gen (FILE *f, bool gimple } fprintf (stderr, "removed %u duplicate tails\n", rcnt); - for (unsigned n = 1; n <= 4; ++n) + for (unsigned n = 1; n <= 5; ++n) { /* First generate split-out functions. 
*/ for (unsigned i = 0; i < root->kids.length (); i++) Index: gcc/gimple-match-head.c =================================================================== --- gcc/gimple-match-head.c 2018-07-12 12:41:44.919389631 +0100 +++ gcc/gimple-match-head.c 2018-07-12 12:42:44.366933190 +0100 @@ -54,6 +54,8 @@ static bool gimple_simplify (gimple_matc code_helper, tree, tree, tree, tree); static bool gimple_simplify (gimple_match_op *, gimple_seq *, tree (*)(tree), code_helper, tree, tree, tree, tree, tree); +static bool gimple_simplify (gimple_match_op *, gimple_seq *, tree (*)(tree), + code_helper, tree, tree, tree, tree, tree, tree); const unsigned int gimple_match_op::MAX_NUM_OPS; @@ -80,7 +82,12 @@ convert_conditional_op (gimple_match_op if (orig_op->code.is_tree_code ()) ifn = get_conditional_internal_fn ((tree_code) orig_op->code); else - return false; + { + combined_fn cfn = orig_op->code; + if (!internal_fn_p (cfn)) + return false; + ifn = get_conditional_internal_fn (as_internal_fn (cfn)); + } if (ifn == IFN_LAST) return false; unsigned int num_ops = orig_op->num_ops; @@ -403,6 +410,34 @@ gimple_resimplify4 (gimple_seq *seq, gim return false; } +/* Helper that matches and simplifies the toplevel result from + a gimple_simplify run (where we don't want to build + a stmt in case it's used in in-place folding). Replaces + RES_OP with a simplified and/or canonicalized result and + returns whether any change was made. */ + +bool +gimple_resimplify5 (gimple_seq *seq, gimple_match_op *res_op, + tree (*valueize)(tree)) +{ + /* No constant folding is defined for five-operand functions. 
*/ + + gimple_match_op res_op2 (*res_op); + if (gimple_simplify (&res_op2, seq, valueize, + res_op->code, res_op->type, + res_op->ops[0], res_op->ops[1], res_op->ops[2], + res_op->ops[3], res_op->ops[4])) + { + *res_op = res_op2; + return true; + } + + if (maybe_resimplify_conditional_op (seq, res_op, valueize)) + return true; + + return false; +} + /* If in GIMPLE the operation described by RES_OP should be single-rhs, build a GENERIC tree for that expression and update RES_OP accordingly. */ @@ -444,7 +479,8 @@ build_call_internal (internal_fn fn, gim res_op->op_or_null (0), res_op->op_or_null (1), res_op->op_or_null (2), - res_op->op_or_null (3)); + res_op->op_or_null (3), + res_op->op_or_null (4)); } /* Push the exploded expression described by RES_OP as a statement to @@ -538,7 +574,8 @@ maybe_push_res_to_seq (gimple_match_op * res_op->op_or_null (0), res_op->op_or_null (1), res_op->op_or_null (2), - res_op->op_or_null (3)); + res_op->op_or_null (3), + res_op->op_or_null (4)); } if (!res) { @@ -745,14 +782,22 @@ do_valueize (tree op, tree (*valueize)(t try_conditional_simplification (internal_fn ifn, gimple_match_op *res_op, gimple_seq *seq, tree (*valueize) (tree)) { + code_helper op; tree_code code = conditional_internal_fn_code (ifn); - if (code == ERROR_MARK) - return false; + if (code != ERROR_MARK) + op = code; + else + { + ifn = get_unconditional_internal_fn (ifn); + if (ifn == IFN_LAST) + return false; + op = as_combined_fn (ifn); + } unsigned int num_ops = res_op->num_ops; gimple_match_op cond_op (gimple_match_cond (res_op->ops[0], res_op->ops[num_ops - 1]), - code, res_op->type, num_ops - 2); + op, res_op->type, num_ops - 2); for (unsigned int i = 1; i < num_ops - 1; ++i) cond_op.ops[i - 1] = res_op->ops[i]; switch (num_ops - 2) @@ -761,6 +806,10 @@ try_conditional_simplification (internal if (!gimple_resimplify2 (seq, &cond_op, valueize)) return false; break; + case 3: + if (!gimple_resimplify3 (seq, &cond_op, valueize)) + return false; + break; 
default: gcc_unreachable (); } @@ -893,7 +942,7 @@ gimple_simplify (gimple *stmt, gimple_ma /* ??? This way we can't simplify calls with side-effects. */ if (gimple_call_lhs (stmt) != NULL_TREE && gimple_call_num_args (stmt) >= 1 - && gimple_call_num_args (stmt) <= 4) + && gimple_call_num_args (stmt) <= 5) { bool valueized = false; combined_fn cfn; @@ -943,6 +992,9 @@ gimple_simplify (gimple *stmt, gimple_ma case 4: return (gimple_resimplify4 (seq, res_op, valueize) || valueized); + case 5: + return (gimple_resimplify5 (seq, res_op, valueize) + || valueized); default: gcc_unreachable (); } Index: gcc/match.pd =================================================================== --- gcc/match.pd 2018-07-12 12:41:44.920365549 +0100 +++ gcc/match.pd 2018-07-12 12:42:44.368856626 +0100 @@ -86,6 +86,12 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) IFN_COND_MUL IFN_COND_DIV IFN_COND_MOD IFN_COND_RDIV IFN_COND_MIN IFN_COND_MAX IFN_COND_AND IFN_COND_IOR IFN_COND_XOR) + +/* Same for ternary operations. */ +(define_operator_list UNCOND_TERNARY + IFN_FMA IFN_FMS IFN_FNMA IFN_FNMS) +(define_operator_list COND_TERNARY + IFN_COND_FMA IFN_COND_FMS IFN_COND_FNMA IFN_COND_FNMS) /* As opposed to convert?, this still creates a single pattern, so it is not a suitable replacement for convert? in all cases. */ @@ -4885,6 +4891,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (if (element_precision (type) == element_precision (op_type)) (view_convert (cond_op (bit_not @0) @2 @3 (view_convert:op_type @1))))))) +/* Same for ternary operations. */ +(for uncond_op (UNCOND_TERNARY) + cond_op (COND_TERNARY) + (simplify + (vec_cond @0 (view_convert? (uncond_op@5 @1 @2 @3)) @4) + (with { tree op_type = TREE_TYPE (@5); } + (if (element_precision (type) == element_precision (op_type)) + (view_convert (cond_op @0 @1 @2 @3 (view_convert:op_type @4)))))) + (simplify + (vec_cond @0 @1 (view_convert? 
(uncond_op@5 @2 @3 @4))) + (with { tree op_type = TREE_TYPE (@5); } + (if (element_precision (type) == element_precision (op_type)) + (view_convert (cond_op (bit_not @0) @2 @3 @4 + (view_convert:op_type @1))))))) + /* Detect cases in which a VEC_COND_EXPR effectively replaces the "else" value of an IFN_COND_*. */ (for cond_op (COND_BINARY) @@ -4893,3 +4914,11 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) (with { tree op_type = TREE_TYPE (@3); } (if (element_precision (type) == element_precision (op_type)) (view_convert (cond_op @0 @1 @2 (view_convert:op_type @4))))))) + +/* Same for ternary operations. */ +(for cond_op (COND_TERNARY) + (simplify + (vec_cond @0 (view_convert? (cond_op @0 @1 @2 @3 @4)) @5) + (with { tree op_type = TREE_TYPE (@4); } + (if (element_precision (type) == element_precision (op_type)) + (view_convert (cond_op @0 @1 @2 @3 (view_convert:op_type @5))))))) Index: gcc/config/aarch64/aarch64.c =================================================================== --- gcc/config/aarch64/aarch64.c 2018-07-12 12:41:44.915485958 +0100 +++ gcc/config/aarch64/aarch64.c 2018-07-12 12:42:44.365009753 +0100 @@ -1320,14 +1320,18 @@ aarch64_get_mask_mode (poly_uint64 nunit return default_get_mask_mode (nunits, nbytes); } -/* Implement TARGET_PREFERRED_ELSE_VALUE. Prefer to use the first - arithmetic operand as the else value if the else value doesn't matter, - since that exactly matches the SVE destructive merging form. */ +/* Implement TARGET_PREFERRED_ELSE_VALUE. For binary operations, + prefer to use the first arithmetic operand as the else value if + the else value doesn't matter, since that exactly matches the SVE + destructive merging form. For ternary operations we could either + pick the first operand and use FMAD-like instructions or the last + operand and use FMLA-like instructions; the latter seems more + natural. 
*/ static tree -aarch64_preferred_else_value (unsigned, tree, unsigned int, tree *ops) +aarch64_preferred_else_value (unsigned, tree, unsigned int nops, tree *ops) { - return ops[0]; + return nops == 3 ? ops[2] : ops[0]; } /* Implement TARGET_HARD_REGNO_NREGS. */ Index: gcc/config/aarch64/iterators.md =================================================================== --- gcc/config/aarch64/iterators.md 2018-07-12 12:39:29.421374713 +0100 +++ gcc/config/aarch64/iterators.md 2018-07-12 12:42:44.365009753 +0100 @@ -471,6 +471,10 @@ (define_c_enum "unspec" UNSPEC_COND_DIV ; Used in aarch64-sve.md. UNSPEC_COND_MAX ; Used in aarch64-sve.md. UNSPEC_COND_MIN ; Used in aarch64-sve.md. + UNSPEC_COND_FMLA ; Used in aarch64-sve.md. + UNSPEC_COND_FMLS ; Used in aarch64-sve.md. + UNSPEC_COND_FNMLA ; Used in aarch64-sve.md. + UNSPEC_COND_FNMLS ; Used in aarch64-sve.md. UNSPEC_COND_LT ; Used in aarch64-sve.md. UNSPEC_COND_LE ; Used in aarch64-sve.md. UNSPEC_COND_EQ ; Used in aarch64-sve.md. @@ -1567,6 +1571,11 @@ (define_int_iterator SVE_COND_FP_BINARY UNSPEC_COND_MUL UNSPEC_COND_DIV UNSPEC_COND_MAX UNSPEC_COND_MIN]) +(define_int_iterator SVE_COND_FP_TERNARY [UNSPEC_COND_FMLA + UNSPEC_COND_FMLS + UNSPEC_COND_FNMLA + UNSPEC_COND_FNMLS]) + (define_int_iterator SVE_COND_FP_CMP [UNSPEC_COND_LT UNSPEC_COND_LE UNSPEC_COND_EQ UNSPEC_COND_NE UNSPEC_COND_GE UNSPEC_COND_GT]) @@ -1599,7 +1608,11 @@ (define_int_attr optab [(UNSPEC_ANDF "an (UNSPEC_COND_MUL "mul") (UNSPEC_COND_DIV "div") (UNSPEC_COND_MAX "smax") - (UNSPEC_COND_MIN "smin")]) + (UNSPEC_COND_MIN "smin") + (UNSPEC_COND_FMLA "fma") + (UNSPEC_COND_FMLS "fnma") + (UNSPEC_COND_FNMLA "fnms") + (UNSPEC_COND_FNMLS "fms")]) (define_int_attr maxmin_uns [(UNSPEC_UMAXV "umax") (UNSPEC_UMINV "umin") @@ -1826,6 +1839,16 @@ (define_int_attr sve_fp_op_rev [(UNSPEC_ (UNSPEC_COND_MAX "fmaxnm") (UNSPEC_COND_MIN "fminnm")]) +(define_int_attr sve_fmla_op [(UNSPEC_COND_FMLA "fmla") + (UNSPEC_COND_FMLS "fmls") + (UNSPEC_COND_FNMLA "fnmla") + 
(UNSPEC_COND_FNMLS "fnmls")]) + +(define_int_attr sve_fmad_op [(UNSPEC_COND_FMLA "fmad") + (UNSPEC_COND_FMLS "fmsb") + (UNSPEC_COND_FNMLA "fnmad") + (UNSPEC_COND_FNMLS "fnmsb")]) + (define_int_attr commutative [(UNSPEC_COND_ADD "true") (UNSPEC_COND_SUB "false") (UNSPEC_COND_MUL "true") Index: gcc/config/aarch64/aarch64-sve.md =================================================================== --- gcc/config/aarch64/aarch64-sve.md 2018-07-12 12:39:29.423369885 +0100 +++ gcc/config/aarch64/aarch64-sve.md 2018-07-12 12:42:44.360201163 +0100 @@ -2906,6 +2906,101 @@ (define_insn_and_split "*cond_<optab><mo UNSPEC_SEL))] ) +;; Predicated floating-point ternary operations with select. +(define_expand "cond_<optab><mode>" + [(set (match_operand:SVE_F 0 "register_operand") + (unspec:SVE_F + [(match_operand:<VPRED> 1 "register_operand") + (unspec:SVE_F + [(match_operand:SVE_F 2 "register_operand") + (match_operand:SVE_F 3 "register_operand") + (match_operand:SVE_F 4 "register_operand")] + SVE_COND_FP_TERNARY) + (match_operand:SVE_F 5 "aarch64_simd_reg_or_zero")] + UNSPEC_SEL))] + "TARGET_SVE" +{ + /* Swap the multiplication operands if the fallback value is the + second of the two. */ + if (rtx_equal_p (operands[3], operands[5])) + std::swap (operands[2], operands[3]); +}) + +;; Predicated floating-point ternary operations using the FMAD-like form. 
+(define_insn "*cond_<optab><mode>_2" + [(set (match_operand:SVE_F 0 "register_operand" "=w, ?&w") + (unspec:SVE_F + [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl") + (unspec:SVE_F + [(match_operand:SVE_F 2 "register_operand" "0, w") + (match_operand:SVE_F 3 "register_operand" "w, w") + (match_operand:SVE_F 4 "register_operand" "w, w")] + SVE_COND_FP_TERNARY) + (match_dup 2)] + UNSPEC_SEL))] + "TARGET_SVE" + "@ + <sve_fmad_op>\t%0.<Vetype>, %1/m, %3.<Vetype>, %4.<Vetype> + movprfx\t%0, %2\;<sve_fmad_op>\t%0.<Vetype>, %1/m, %3.<Vetype>, %4.<Vetype>" + [(set_attr "movprfx" "*,yes")] +) + +;; Predicated floating-point ternary operations using the FMLA-like form. +(define_insn "*cond_<optab><mode>_4" + [(set (match_operand:SVE_F 0 "register_operand" "=w, ?&w") + (unspec:SVE_F + [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl") + (unspec:SVE_F + [(match_operand:SVE_F 2 "register_operand" "w, w") + (match_operand:SVE_F 3 "register_operand" "w, w") + (match_operand:SVE_F 4 "register_operand" "0, w")] + SVE_COND_FP_TERNARY) + (match_dup 4)] + UNSPEC_SEL))] + "TARGET_SVE" + "@ + <sve_fmla_op>\t%0.<Vetype>, %1/m, %2.<Vetype>, %3.<Vetype> + movprfx\t%0, %4\;<sve_fmla_op>\t%0.<Vetype>, %1/m, %2.<Vetype>, %3.<Vetype>" + [(set_attr "movprfx" "*,yes")] +) + +;; Predicated floating-point ternary operations in which the value for +;; inactive lanes is distinct from the other inputs.
+(define_insn_and_split "*cond_<optab><mode>_any" + [(set (match_operand:SVE_F 0 "register_operand" "=&w, &w, ?&w") + (unspec:SVE_F + [(match_operand:<VPRED> 1 "register_operand" "Upl, Upl, Upl") + (unspec:SVE_F + [(match_operand:SVE_F 2 "register_operand" "w, w, w") + (match_operand:SVE_F 3 "register_operand" "w, w, w") + (match_operand:SVE_F 4 "register_operand" "w, w, w")] + SVE_COND_FP_TERNARY) + (match_operand:SVE_F 5 "aarch64_simd_reg_or_zero" "Dz, 0, w")] + UNSPEC_SEL))] + "TARGET_SVE + && !rtx_equal_p (operands[2], operands[5]) + && !rtx_equal_p (operands[3], operands[5]) + && !rtx_equal_p (operands[4], operands[5])" + "@ + movprfx\t%0.<Vetype>, %1/z, %4.<Vetype>\;<sve_fmla_op>\t%0.<Vetype>, %1/m, %2.<Vetype>, %3.<Vetype> + movprfx\t%0.<Vetype>, %1/m, %4.<Vetype>\;<sve_fmla_op>\t%0.<Vetype>, %1/m, %2.<Vetype>, %3.<Vetype> + #" + "&& reload_completed + && !CONSTANT_P (operands[5]) + && !rtx_equal_p (operands[0], operands[5])" + [(set (match_dup 0) + (unspec:SVE_F [(match_dup 1) (match_dup 4) (match_dup 5)] UNSPEC_SEL)) + (set (match_dup 0) + (unspec:SVE_F + [(match_dup 1) + (unspec:SVE_F [(match_dup 2) (match_dup 3) (match_dup 0)] + SVE_COND_FP_TERNARY) + (match_dup 0)] + UNSPEC_SEL))] + "" + [(set_attr "movprfx" "yes")] +) + ;; Shift an SVE vector left and insert a scalar into element 0. 
(define_insn "vec_shl_insert_<mode>" [(set (match_operand:SVE_ALL 0 "register_operand" "=w, w") Index: gcc/testsuite/gcc.dg/vect/vect-cond-arith-3.c =================================================================== --- /dev/null 2018-06-15 06:21:49.150735301 +0100 +++ gcc/testsuite/gcc.dg/vect/vect-cond-arith-3.c 2018-07-12 12:42:44.368856626 +0100 @@ -0,0 +1,63 @@ +/* { dg-require-effective-target scalar_all_fma } */ +/* { dg-additional-options "-fdump-tree-optimized" } */ + +#include "tree-vect.h" + +#define N (VECTOR_BITS * 11 / 64 + 3) + +#define DEF(INV) \ + void __attribute__ ((noipa)) \ + f_##INV (double *restrict a, double *restrict b, \ + double *restrict c, double *restrict d) \ + { \ + for (int i = 0; i < N; ++i) \ + { \ + double mb = (INV & 1 ? -b[i] : b[i]); \ + double mc = c[i]; \ + double md = (INV & 2 ? -d[i] : d[i]); \ + double fma = __builtin_fma (mb, mc, md); \ + double truev = (INV & 4 ? -fma : fma); \ + a[i] = b[i] < 10 ? truev : 10.0; \ + } \ + } + +#define TEST(INV) \ + { \ + f_##INV (a, b, c, d); \ + for (int i = 0; i < N; ++i) \ + { \ + double mb = (INV & 1 ? -b[i] : b[i]); \ + double mc = c[i]; \ + double md = (INV & 2 ? -d[i] : d[i]); \ + double fma = __builtin_fma (mb, mc, md); \ + double truev = (INV & 4 ? -fma : fma); \ + if (a[i] != (i % 17 < 10 ? 
truev : 10.0)) \ + __builtin_abort (); \ + asm volatile ("" ::: "memory"); \ + } \ + } + +#define FOR_EACH_INV(T) \ + T (0) T (1) T (2) T (3) T (4) T (5) T (6) T (7) + +FOR_EACH_INV (DEF) + +int +main (void) +{ + double a[N], b[N], c[N], d[N]; + for (int i = 0; i < N; ++i) + { + b[i] = i % 17; + c[i] = i % 9 + 11; + d[i] = i % 13 + 14; + asm volatile ("" ::: "memory"); + } + FOR_EACH_INV (TEST) + return 0; +} + +/* { dg-final { scan-tree-dump-times { = \.COND_FMA } 2 "optimized" { target vect_double_cond_arith } } } */ +/* { dg-final { scan-tree-dump-times { = \.COND_FMS } 2 "optimized" { target vect_double_cond_arith } } } */ +/* { dg-final { scan-tree-dump-times { = \.COND_FNMA } 2 "optimized" { target vect_double_cond_arith } } } */ +/* { dg-final { scan-tree-dump-times { = \.COND_FNMS } 2 "optimized" { target vect_double_cond_arith } } } */ Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_13.c =================================================================== --- /dev/null 2018-06-15 06:21:49.150735301 +0100 +++ gcc/testsuite/gcc.target/aarch64/sve/vcond_13.c 2018-07-12 12:42:44.368856626 +0100 @@ -0,0 +1,58 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ftree-vectorize" } */ + +#define N 119 + +#define DEF_LOOP(INV, TYPE, CMPTYPE, SUFFIX) \ + void __attribute__ ((noipa)) \ + f_##INV##_##SUFFIX (TYPE *restrict a, TYPE *restrict b, \ + TYPE *restrict c, TYPE *restrict d, \ + CMPTYPE *restrict cond) \ + { \ + for (int i = 0; i < N; ++i) \ + { \ + TYPE mb = (INV & 1 ? -b[i] : b[i]); \ + TYPE mc = c[i]; \ + TYPE md = (INV & 2 ? -d[i] : d[i]); \ + TYPE fma = __builtin_fma##SUFFIX (mb, mc, md); \ + TYPE truev = (INV & 4 ? -fma : fma); \ + a[i] = cond[i] < 10 ? 
truev : b[i]; \ + } \ + } + +#define FOR_EACH_TYPE(T, INV) \ + T (INV, _Float16, short, f16) \ + T (INV, float, float, f32) \ + T (INV, double, double, f64) + +#define FOR_EACH_INV(T) \ + FOR_EACH_TYPE (T, 0) \ + FOR_EACH_TYPE (T, 1) \ + FOR_EACH_TYPE (T, 2) \ + FOR_EACH_TYPE (T, 3) \ + FOR_EACH_TYPE (T, 4) \ + FOR_EACH_TYPE (T, 5) \ + FOR_EACH_TYPE (T, 6) \ + FOR_EACH_TYPE (T, 7) + +FOR_EACH_INV (DEF_LOOP) + +/* { dg-final { scan-assembler-not {\tsel\t} } } */ +/* { dg-final { scan-assembler-not {\tmovprfx\t} } } */ +/* { dg-final { scan-assembler-not {\tmov\tz[0-9]+\.., z[0-9]+} } } */ + +/* { dg-final { scan-assembler-times {\tfmad\tz[0-9]+\.h,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfmad\tz[0-9]+\.s,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfmad\tz[0-9]+\.d,} 2 } } */ + +/* { dg-final { scan-assembler-times {\tfmsb\tz[0-9]+\.h,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfmsb\tz[0-9]+\.s,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfmsb\tz[0-9]+\.d,} 2 } } */ + +/* { dg-final { scan-assembler-times {\tfnmad\tz[0-9]+\.h,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfnmad\tz[0-9]+\.s,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfnmad\tz[0-9]+\.d,} 2 } } */ + +/* { dg-final { scan-assembler-times {\tfnmsb\tz[0-9]+\.h,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfnmsb\tz[0-9]+\.s,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfnmsb\tz[0-9]+\.d,} 2 } } */ Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_13_run.c =================================================================== --- /dev/null 2018-06-15 06:21:49.150735301 +0100 +++ gcc/testsuite/gcc.target/aarch64/sve/vcond_13_run.c 2018-07-12 12:42:44.368856626 +0100 @@ -0,0 +1,37 @@ +/* { dg-do run { target aarch64_sve_hw } } */ +/* { dg-options "-O2 -ftree-vectorize" } */ + +#include "vcond_13.c" + +#define TEST_LOOP(INV, TYPE, CMPTYPE, SUFFIX) \ + { \ + TYPE a[N], b[N], c[N], d[N]; \ + CMPTYPE cond[N]; \ + for (int i = 0; i < N; ++i) \ + { \ + b[i] = i 
% 15; \ + c[i] = i % 9 + 11; \ + d[i] = i % 13 + 14; \ + cond[i] = i % 17; \ + asm volatile ("" ::: "memory"); \ + } \ + f_##INV##_##SUFFIX (a, b, c, d, cond); \ + for (int i = 0; i < N; ++i) \ + { \ + double mb = (INV & 1 ? -b[i] : b[i]); \ + double mc = c[i]; \ + double md = (INV & 2 ? -d[i] : d[i]); \ + double fma = __builtin_fma (mb, mc, md); \ + double truev = (INV & 4 ? -fma : fma); \ + if (a[i] != (i % 17 < 10 ? truev : b[i])) \ + __builtin_abort (); \ + asm volatile ("" ::: "memory"); \ + } \ + } + +int +main (void) +{ + FOR_EACH_INV (TEST_LOOP) + return 0; +} Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_14.c =================================================================== --- /dev/null 2018-06-15 06:21:49.150735301 +0100 +++ gcc/testsuite/gcc.target/aarch64/sve/vcond_14.c 2018-07-12 12:42:44.368856626 +0100 @@ -0,0 +1,58 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ftree-vectorize" } */ + +#define N 119 + +#define DEF_LOOP(INV, TYPE, CMPTYPE, SUFFIX) \ + void __attribute__ ((noipa)) \ + f_##INV##_##SUFFIX (TYPE *restrict a, TYPE *restrict b, \ + TYPE *restrict c, TYPE *restrict d, \ + CMPTYPE *restrict cond) \ + { \ + for (int i = 0; i < N; ++i) \ + { \ + TYPE mb = (INV & 1 ? -b[i] : b[i]); \ + TYPE mc = c[i]; \ + TYPE md = (INV & 2 ? -d[i] : d[i]); \ + TYPE fma = __builtin_fma##SUFFIX (mb, mc, md); \ + TYPE truev = (INV & 4 ? -fma : fma); \ + a[i] = cond[i] < 10 ? 
truev : c[i]; \ + } \ + } + +#define FOR_EACH_TYPE(T, INV) \ + T (INV, _Float16, short, f16) \ + T (INV, float, float, f32) \ + T (INV, double, double, f64) + +#define FOR_EACH_INV(T) \ + FOR_EACH_TYPE (T, 0) \ + FOR_EACH_TYPE (T, 1) \ + FOR_EACH_TYPE (T, 2) \ + FOR_EACH_TYPE (T, 3) \ + FOR_EACH_TYPE (T, 4) \ + FOR_EACH_TYPE (T, 5) \ + FOR_EACH_TYPE (T, 6) \ + FOR_EACH_TYPE (T, 7) + +FOR_EACH_INV (DEF_LOOP) + +/* { dg-final { scan-assembler-not {\tsel\t} } } */ +/* { dg-final { scan-assembler-not {\tmovprfx\t} } } */ +/* { dg-final { scan-assembler-not {\tmov\tz[0-9]+\.., z[0-9]+} } } */ + +/* { dg-final { scan-assembler-times {\tfmad\tz[0-9]+\.h,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfmad\tz[0-9]+\.s,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfmad\tz[0-9]+\.d,} 2 } } */ + +/* { dg-final { scan-assembler-times {\tfmsb\tz[0-9]+\.h,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfmsb\tz[0-9]+\.s,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfmsb\tz[0-9]+\.d,} 2 } } */ + +/* { dg-final { scan-assembler-times {\tfnmad\tz[0-9]+\.h,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfnmad\tz[0-9]+\.s,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfnmad\tz[0-9]+\.d,} 2 } } */ + +/* { dg-final { scan-assembler-times {\tfnmsb\tz[0-9]+\.h,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfnmsb\tz[0-9]+\.s,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfnmsb\tz[0-9]+\.d,} 2 } } */ Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_14_run.c =================================================================== --- /dev/null 2018-06-15 06:21:49.150735301 +0100 +++ gcc/testsuite/gcc.target/aarch64/sve/vcond_14_run.c 2018-07-12 12:42:44.368856626 +0100 @@ -0,0 +1,37 @@ +/* { dg-do run { target aarch64_sve_hw } } */ +/* { dg-options "-O2 -ftree-vectorize" } */ + +#include "vcond_14.c" + +#define TEST_LOOP(INV, TYPE, CMPTYPE, SUFFIX) \ + { \ + TYPE a[N], b[N], c[N], d[N]; \ + CMPTYPE cond[N]; \ + for (int i = 0; i < N; ++i) \ + { \ + b[i] = i 
% 15; \ + c[i] = i % 9 + 11; \ + d[i] = i % 13 + 14; \ + cond[i] = i % 17; \ + asm volatile ("" ::: "memory"); \ + } \ + f_##INV##_##SUFFIX (a, b, c, d, cond); \ + for (int i = 0; i < N; ++i) \ + { \ + double mb = (INV & 1 ? -b[i] : b[i]); \ + double mc = c[i]; \ + double md = (INV & 2 ? -d[i] : d[i]); \ + double fma = __builtin_fma (mb, mc, md); \ + double truev = (INV & 4 ? -fma : fma); \ + if (a[i] != (i % 17 < 10 ? truev : c[i])) \ + __builtin_abort (); \ + asm volatile ("" ::: "memory"); \ + } \ + } + +int +main (void) +{ + FOR_EACH_INV (TEST_LOOP) + return 0; +} Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_15.c =================================================================== --- /dev/null 2018-06-15 06:21:49.150735301 +0100 +++ gcc/testsuite/gcc.target/aarch64/sve/vcond_15.c 2018-07-12 12:42:44.369818344 +0100 @@ -0,0 +1,58 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ftree-vectorize" } */ + +#define N 119 + +#define DEF_LOOP(INV, TYPE, CMPTYPE, SUFFIX) \ + void __attribute__ ((noipa)) \ + f_##INV##_##SUFFIX (TYPE *restrict a, TYPE *restrict b, \ + TYPE *restrict c, TYPE *restrict d, \ + CMPTYPE *restrict cond) \ + { \ + for (int i = 0; i < N; ++i) \ + { \ + TYPE mb = (INV & 1 ? -b[i] : b[i]); \ + TYPE mc = c[i]; \ + TYPE md = (INV & 2 ? -d[i] : d[i]); \ + TYPE fma = __builtin_fma##SUFFIX (mb, mc, md); \ + TYPE truev = (INV & 4 ? -fma : fma); \ + a[i] = cond[i] < 10 ? 
truev : d[i]; \ + } \ + } + +#define FOR_EACH_TYPE(T, INV) \ + T (INV, _Float16, short, f16) \ + T (INV, float, float, f32) \ + T (INV, double, double, f64) + +#define FOR_EACH_INV(T) \ + FOR_EACH_TYPE (T, 0) \ + FOR_EACH_TYPE (T, 1) \ + FOR_EACH_TYPE (T, 2) \ + FOR_EACH_TYPE (T, 3) \ + FOR_EACH_TYPE (T, 4) \ + FOR_EACH_TYPE (T, 5) \ + FOR_EACH_TYPE (T, 6) \ + FOR_EACH_TYPE (T, 7) + +FOR_EACH_INV (DEF_LOOP) + +/* { dg-final { scan-assembler-not {\tsel\t} } } */ +/* { dg-final { scan-assembler-not {\tmovprfx\t} } } */ +/* { dg-final { scan-assembler-not {\tmov\tz[0-9]+\.., z[0-9]+} } } */ + +/* { dg-final { scan-assembler-times {\tfmla\tz[0-9]+\.h,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfmla\tz[0-9]+\.s,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfmla\tz[0-9]+\.d,} 2 } } */ + +/* { dg-final { scan-assembler-times {\tfmls\tz[0-9]+\.h,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfmls\tz[0-9]+\.s,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfmls\tz[0-9]+\.d,} 2 } } */ + +/* { dg-final { scan-assembler-times {\tfnmla\tz[0-9]+\.h,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfnmla\tz[0-9]+\.s,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfnmla\tz[0-9]+\.d,} 2 } } */ + +/* { dg-final { scan-assembler-times {\tfnmls\tz[0-9]+\.h,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfnmls\tz[0-9]+\.s,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfnmls\tz[0-9]+\.d,} 2 } } */ Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_15_run.c =================================================================== --- /dev/null 2018-06-15 06:21:49.150735301 +0100 +++ gcc/testsuite/gcc.target/aarch64/sve/vcond_15_run.c 2018-07-12 12:42:44.369818344 +0100 @@ -0,0 +1,37 @@ +/* { dg-do run { target aarch64_sve_hw } } */ +/* { dg-options "-O2 -ftree-vectorize" } */ + +#include "vcond_15.c" + +#define TEST_LOOP(INV, TYPE, CMPTYPE, SUFFIX) \ + { \ + TYPE a[N], b[N], c[N], d[N]; \ + CMPTYPE cond[N]; \ + for (int i = 0; i < N; ++i) \ + { \ + b[i] = i 
% 15; \ + c[i] = i % 9 + 11; \ + d[i] = i % 13 + 14; \ + cond[i] = i % 17; \ + asm volatile ("" ::: "memory"); \ + } \ + f_##INV##_##SUFFIX (a, b, c, d, cond); \ + for (int i = 0; i < N; ++i) \ + { \ + double mb = (INV & 1 ? -b[i] : b[i]); \ + double mc = c[i]; \ + double md = (INV & 2 ? -d[i] : d[i]); \ + double fma = __builtin_fma (mb, mc, md); \ + double truev = (INV & 4 ? -fma : fma); \ + if (a[i] != (i % 17 < 10 ? truev : d[i])) \ + __builtin_abort (); \ + asm volatile ("" ::: "memory"); \ + } \ + } + +int +main (void) +{ + FOR_EACH_INV (TEST_LOOP) + return 0; +} Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_16.c =================================================================== --- /dev/null 2018-06-15 06:21:49.150735301 +0100 +++ gcc/testsuite/gcc.target/aarch64/sve/vcond_16.c 2018-07-12 12:42:44.369818344 +0100 @@ -0,0 +1,58 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ftree-vectorize" } */ + +#define N 119 + +#define DEF_LOOP(INV, TYPE, CMPTYPE, SUFFIX) \ + void __attribute__ ((noipa)) \ + f_##INV##_##SUFFIX (TYPE *restrict a, TYPE *restrict b, \ + TYPE *restrict c, TYPE *restrict d, \ + CMPTYPE *restrict cond) \ + { \ + for (int i = 0; i < N; ++i) \ + { \ + TYPE mb = (INV & 1 ? -b[i] : b[i]); \ + TYPE mc = c[i]; \ + TYPE md = (INV & 2 ? -d[i] : d[i]); \ + TYPE fma = __builtin_fma##SUFFIX (mb, mc, md); \ + TYPE truev = (INV & 4 ? -fma : fma); \ + a[i] = cond[i] < 10 ? 
truev : 10; \ + } \ + } + +#define FOR_EACH_TYPE(T, INV) \ + T (INV, _Float16, short, f16) \ + T (INV, float, float, f32) \ + T (INV, double, double, f64) + +#define FOR_EACH_INV(T) \ + FOR_EACH_TYPE (T, 0) \ + FOR_EACH_TYPE (T, 1) \ + FOR_EACH_TYPE (T, 2) \ + FOR_EACH_TYPE (T, 3) \ + FOR_EACH_TYPE (T, 4) \ + FOR_EACH_TYPE (T, 5) \ + FOR_EACH_TYPE (T, 6) \ + FOR_EACH_TYPE (T, 7) + +FOR_EACH_INV (DEF_LOOP) + +/* { dg-final { scan-assembler-times {\tsel\t} 24 } } */ +/* { dg-final { scan-assembler-not {\tmovprfx\t} } } */ +/* { dg-final { scan-assembler-not {\tmov\tz[0-9]+\.., z[0-9]+} } } */ + +/* { dg-final { scan-assembler-times {\tfmla\tz[0-9]+\.h,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfmla\tz[0-9]+\.s,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfmla\tz[0-9]+\.d,} 2 } } */ + +/* { dg-final { scan-assembler-times {\tfmls\tz[0-9]+\.h,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfmls\tz[0-9]+\.s,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfmls\tz[0-9]+\.d,} 2 } } */ + +/* { dg-final { scan-assembler-times {\tfnmla\tz[0-9]+\.h,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfnmla\tz[0-9]+\.s,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfnmla\tz[0-9]+\.d,} 2 } } */ + +/* { dg-final { scan-assembler-times {\tfnmls\tz[0-9]+\.h,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfnmls\tz[0-9]+\.s,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfnmls\tz[0-9]+\.d,} 2 } } */ Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_16_run.c =================================================================== --- /dev/null 2018-06-15 06:21:49.150735301 +0100 +++ gcc/testsuite/gcc.target/aarch64/sve/vcond_16_run.c 2018-07-12 12:42:44.369818344 +0100 @@ -0,0 +1,37 @@ +/* { dg-do run { target aarch64_sve_hw } } */ +/* { dg-options "-O2 -ftree-vectorize" } */ + +#include "vcond_16.c" + +#define TEST_LOOP(INV, TYPE, CMPTYPE, SUFFIX) \ + { \ + TYPE a[N], b[N], c[N], d[N]; \ + CMPTYPE cond[N]; \ + for (int i = 0; i < N; ++i) \ + { \ + b[i] 
= i % 15; \ + c[i] = i % 9 + 11; \ + d[i] = i % 13 + 14; \ + cond[i] = i % 17; \ + asm volatile ("" ::: "memory"); \ + } \ + f_##INV##_##SUFFIX (a, b, c, d, cond); \ + for (int i = 0; i < N; ++i) \ + { \ + double mb = (INV & 1 ? -b[i] : b[i]); \ + double mc = c[i]; \ + double md = (INV & 2 ? -d[i] : d[i]); \ + double fma = __builtin_fma (mb, mc, md); \ + double truev = (INV & 4 ? -fma : fma); \ + if (a[i] != (i % 17 < 10 ? truev : 10)) \ + __builtin_abort (); \ + asm volatile ("" ::: "memory"); \ + } \ + } + +int +main (void) +{ + FOR_EACH_INV (TEST_LOOP) + return 0; +}
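For readers following the tests above, here is a rough scalar model of what the four conditional functions compute per lane, and of how the eight INV variants in the tests pair up with them. It is only a sketch under stated assumptions: the names cond_kind, ternary_lane, cond_ternary_ref, test_truev and kind_for_inv are hypothetical and do not appear in the patch, and plain multiply/add is used in place of a fused operation, so only signs and operand order are modelled, not rounding.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for this sketch only; they mirror the IFN_COND_*
   functions but are not GCC identifiers.  */
enum cond_kind { K_FMA, K_FMS, K_FNMA, K_FNMS };

/* One lane of the unconditional operation.  Plain arithmetic stands in
   for a fused multiply-add, so the single rounding is not modelled.  */
double
ternary_lane (enum cond_kind kind, double a, double b, double c)
{
  switch (kind)
    {
    case K_FMA:  return a * b + c;
    case K_FMS:  return a * b - c;
    case K_FNMA: return -(a * b) + c;
    default:     return -(a * b) - c;	/* K_FNMS */
    }
}

/* Elementwise form, following the md.texi wording:
   op0[i] = op1[i] ? OP (op2[i], op3[i], op4[i]) : op5[i].  */
void
cond_ternary_ref (enum cond_kind kind, size_t n, double *op0,
		  const bool *op1, const double *op2, const double *op3,
		  const double *op4, const double *op5)
{
  for (size_t i = 0; i < n; i++)
    op0[i] = op1[i] ? ternary_lane (kind, op2[i], op3[i], op4[i]) : op5[i];
}

/* The "truev" value that the test loops build for a given INV:
   bit 0 negates b, bit 1 negates d, bit 2 negates the result.  */
double
test_truev (int inv, double b, double c, double d)
{
  double mb = (inv & 1) ? -b : b;
  double md = (inv & 2) ? -d : d;
  double r = mb * c + md;
  return (inv & 4) ? -r : r;
}

/* Which of the four operations each INV value is algebraically
   equal to.  Each operation appears exactly twice.  */
enum cond_kind
kind_for_inv (int inv)
{
  static const enum cond_kind map[8] = {
    K_FMA, K_FNMA, K_FMS, K_FNMS,	/* INV 0..3 */
    K_FNMS, K_FMS, K_FNMA, K_FMA	/* INV 4..7 */
  };
  return map[inv & 7];
}
```

Under this model each operation is reached by two INV values, which is consistent with the count of 2 in each scan-tree-dump-times and scan-assembler-times directive. Whether the compiler actually performs each fold depends on the match.pd rules and the floating-point options in effect, which this sketch does not try to capture.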
Index: gcc/doc/md.texi =================================================================== --- gcc/doc/md.texi 2018-05-24 10:12:10.142352315 +0100 +++ gcc/doc/md.texi 2018-05-24 13:05:46.047607587 +0100 @@ -6386,6 +6386,23 @@ Operands 0, 2, 3 and 4 all have mode @va integer if @var{m} is scalar, otherwise it has the mode returned by @code{TARGET_VECTORIZE_GET_MASK_MODE}. +@cindex @code{cond_fma@var{mode}} instruction pattern +@cindex @code{cond_fms@var{mode}} instruction pattern +@cindex @code{cond_fnma@var{mode}} instruction pattern +@cindex @code{cond_fnms@var{mode}} instruction pattern +@item @samp{cond_fma@var{mode}} +@itemx @samp{cond_fms@var{mode}} +@itemx @samp{cond_fnma@var{mode}} +@itemx @samp{cond_fnms@var{mode}} +Like @samp{cond_add@var{m}}, except that the conditional operation +takes 3 operands rather than two. For example, the vector form of +@samp{cond_fma@var{mode}} is equivalent to: + +@smallexample +for (i = 0; i < GET_MODE_NUNITS (@var{m}); i++) + op0[i] = op1[i] ? fma (op2[i], op3[i], op4[i]) : op5[i]; +@end smallexample + @cindex @code{neg@var{mode}cc} instruction pattern @item @samp{neg@var{mode}cc} Similar to @samp{mov@var{mode}cc} but for conditional negation. 
Conditionally Index: gcc/optabs.def =================================================================== --- gcc/optabs.def 2018-05-24 10:12:10.146352152 +0100 +++ gcc/optabs.def 2018-05-24 13:05:46.049605128 +0100 @@ -234,6 +234,10 @@ OPTAB_D (cond_smin_optab, "cond_smin$a") OPTAB_D (cond_smax_optab, "cond_smax$a") OPTAB_D (cond_umin_optab, "cond_umin$a") OPTAB_D (cond_umax_optab, "cond_umax$a") +OPTAB_D (cond_fma_optab, "cond_fma$a") +OPTAB_D (cond_fms_optab, "cond_fms$a") +OPTAB_D (cond_fnma_optab, "cond_fnma$a") +OPTAB_D (cond_fnms_optab, "cond_fnms$a") OPTAB_D (cmov_optab, "cmov$a6") OPTAB_D (cstore_optab, "cstore$a4") OPTAB_D (ctrap_optab, "ctrap$a4") Index: gcc/internal-fn.def =================================================================== --- gcc/internal-fn.def 2018-05-24 10:12:10.146352152 +0100 +++ gcc/internal-fn.def 2018-05-24 13:05:46.048606357 +0100 @@ -59,7 +59,8 @@ along with GCC; see the file COPYING3. - binary: a normal binary optab, such as vec_interleave_lo_<mode> - ternary: a normal ternary optab, such as fma<mode>4 - - cond_binary: a conditional binary optab, such as add<mode>cc + - cond_binary: a conditional binary optab, such as cond_add<mode> + - cond_ternary: a conditional ternary optab, such as cond_fma_rev<mode> - fold_left: for scalar = FN (scalar, vector), keyed off the vector mode @@ -162,6 +163,11 @@ DEF_INTERNAL_OPTAB_FN (COND_IOR, ECF_CON DEF_INTERNAL_OPTAB_FN (COND_XOR, ECF_CONST | ECF_NOTHROW, cond_xor, cond_binary) +DEF_INTERNAL_OPTAB_FN (COND_FMA, ECF_CONST, cond_fma, cond_ternary) +DEF_INTERNAL_OPTAB_FN (COND_FMS, ECF_CONST, cond_fms, cond_ternary) +DEF_INTERNAL_OPTAB_FN (COND_FNMA, ECF_CONST, cond_fnma, cond_ternary) +DEF_INTERNAL_OPTAB_FN (COND_FNMS, ECF_CONST, cond_fnms, cond_ternary) + DEF_INTERNAL_OPTAB_FN (RSQRT, ECF_CONST, rsqrt, unary) DEF_INTERNAL_OPTAB_FN (REDUC_PLUS, ECF_CONST | ECF_NOTHROW, @@ -230,7 +236,7 @@ DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONS DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary) /* 
Ternary math functions. */ -DEF_INTERNAL_FLT_FN (FMA, ECF_CONST, fma, ternary) +DEF_INTERNAL_FLT_FLOATN_FN (FMA, ECF_CONST, fma, ternary) /* Unary integer ops. */ DEF_INTERNAL_INT_FN (CLRSB, ECF_CONST | ECF_NOTHROW, clrsb, unary) Index: gcc/internal-fn.h =================================================================== --- gcc/internal-fn.h 2018-05-24 10:33:30.870095164 +0100 +++ gcc/internal-fn.h 2018-05-24 13:05:46.049605128 +0100 @@ -193,7 +193,9 @@ direct_internal_fn_supported_p (internal extern bool set_edom_supported_p (void); extern internal_fn get_conditional_internal_fn (tree_code); +extern internal_fn get_conditional_internal_fn (internal_fn); extern tree_code conditional_internal_fn_code (internal_fn); +extern internal_fn get_unconditional_internal_fn (internal_fn); extern bool internal_load_fn_p (internal_fn); extern bool internal_store_fn_p (internal_fn); Index: gcc/internal-fn.c =================================================================== --- gcc/internal-fn.c 2018-05-24 10:33:30.870095164 +0100 +++ gcc/internal-fn.c 2018-05-24 13:05:46.048606357 +0100 @@ -113,6 +113,7 @@ #define binary_direct { 0, 0, true } #define ternary_direct { 0, 0, true } #define cond_unary_direct { 1, 1, true } #define cond_binary_direct { 1, 1, true } +#define cond_ternary_direct { 1, 1, true } #define while_direct { 0, 2, false } #define fold_extract_direct { 2, 2, false } #define fold_left_direct { 1, 1, false } @@ -2993,6 +2994,9 @@ #define expand_cond_unary_optab_fn(FN, S #define expand_cond_binary_optab_fn(FN, STMT, OPTAB) \ expand_direct_optab_fn (FN, STMT, OPTAB, 4) +#define expand_cond_ternary_optab_fn(FN, STMT, OPTAB) \ + expand_direct_optab_fn (FN, STMT, OPTAB, 5) + #define expand_fold_extract_optab_fn(FN, STMT, OPTAB) \ expand_direct_optab_fn (FN, STMT, OPTAB, 3) @@ -3075,6 +3079,7 @@ #define direct_binary_optab_supported_p #define direct_ternary_optab_supported_p direct_optab_supported_p #define direct_cond_unary_optab_supported_p direct_optab_supported_p 
#define direct_cond_binary_optab_supported_p direct_optab_supported_p +#define direct_cond_ternary_optab_supported_p direct_optab_supported_p #define direct_mask_load_optab_supported_p direct_optab_supported_p #define direct_load_lanes_optab_supported_p multi_vector_optab_supported_p #define direct_mask_load_lanes_optab_supported_p multi_vector_optab_supported_p @@ -3277,6 +3282,57 @@ #define CASE(CODE, IFN) case IFN: return } } +/* Invoke T(IFN) for each internal function IFN that also has an + IFN_COND_* form. */ +#define FOR_EACH_COND_FN_PAIR(T) \ + T (FMA) \ + T (FMS) \ + T (FNMA) \ + T (FNMS) + +/* Return a function that only performs internal function FN when a + certain condition is met and that uses a given fallback value otherwise. + In other words, the returned function FN' is such that: + + LHS = FN' (COND, A1, ... An, ELSE) + + is equivalent to the C expression: + + LHS = COND ? FN (A1, ..., An) : ELSE; + + operating elementwise if the operands are vectors. + + Return IFN_LAST if no such function exists. */ + +internal_fn +get_conditional_internal_fn (internal_fn fn) +{ + switch (fn) + { +#define CASE(NAME) case IFN_##NAME: return IFN_COND_##NAME; + FOR_EACH_COND_FN_PAIR(CASE) +#undef CASE + default: + return IFN_LAST; + } +} + +/* If IFN implements the conditional form of an unconditional internal + function, return that unconditional function, otherwise return IFN_LAST. */ + +internal_fn +get_unconditional_internal_fn (internal_fn ifn) +{ + switch (ifn) + { +#define CASE(NAME) case IFN_COND_##NAME: return IFN_##NAME; + FOR_EACH_COND_FN_PAIR(CASE) +#undef CASE + default: + return IFN_LAST; + } +} + /* Return true if IFN is some form of load from memory. 
*/ bool Index: gcc/gimple-match.h =================================================================== --- gcc/gimple-match.h 2018-05-24 10:33:30.870095164 +0100 +++ gcc/gimple-match.h 2018-05-24 13:05:46.048606357 +0100 @@ -91,18 +91,21 @@ struct gimple_match_op code_helper, tree, tree, tree, tree); gimple_match_op (const gimple_match_cond &, code_helper, tree, tree, tree, tree, tree); + gimple_match_op (const gimple_match_cond &, + code_helper, tree, tree, tree, tree, tree, tree); void set_op (code_helper, tree, unsigned int); void set_op (code_helper, tree, tree); void set_op (code_helper, tree, tree, tree); void set_op (code_helper, tree, tree, tree, tree); void set_op (code_helper, tree, tree, tree, tree, tree); + void set_op (code_helper, tree, tree, tree, tree, tree, tree); void set_value (tree); tree op_or_null (unsigned int) const; /* The maximum value of NUM_OPS. */ - static const unsigned int MAX_NUM_OPS = 4; + static const unsigned int MAX_NUM_OPS = 5; /* The conditions under which the operation is performed, and the value to use as a fallback. */ @@ -182,6 +185,20 @@ gimple_match_op::gimple_match_op (const ops[3] = op3; } +inline +gimple_match_op::gimple_match_op (const gimple_match_cond &cond_in, + code_helper code_in, tree type_in, + tree op0, tree op1, tree op2, tree op3, + tree op4) + : cond (cond_in), code (code_in), type (type_in), num_ops (5) +{ + ops[0] = op0; + ops[1] = op1; + ops[2] = op2; + ops[3] = op3; + ops[4] = op4; +} + /* Change the operation performed to CODE_IN, the type of the result to TYPE_IN, and the number of operands to NUM_OPS_IN. The caller needs to set the operands itself. 
*/ @@ -242,6 +259,20 @@ gimple_match_op::set_op (code_helper cod ops[3] = op3; } +inline void +gimple_match_op::set_op (code_helper code_in, tree type_in, + tree op0, tree op1, tree op2, tree op3, tree op4) +{ + code = code_in; + type = type_in; + num_ops = 5; + ops[0] = op0; + ops[1] = op1; + ops[2] = op2; + ops[3] = op3; + ops[4] = op4; +} + /* Set the "operation" to be the single value VALUE, such as a constant or SSA_NAME. */ @@ -279,6 +310,7 @@ bool gimple_resimplify1 (gimple_seq *, g bool gimple_resimplify2 (gimple_seq *, gimple_match_op *, tree (*)(tree)); bool gimple_resimplify3 (gimple_seq *, gimple_match_op *, tree (*)(tree)); bool gimple_resimplify4 (gimple_seq *, gimple_match_op *, tree (*)(tree)); +bool gimple_resimplify5 (gimple_seq *, gimple_match_op *, tree (*)(tree)); tree maybe_push_res_to_seq (gimple_match_op *, gimple_seq *, tree res = NULL_TREE); void maybe_build_generic_op (gimple_match_op *); Index: gcc/genmatch.c =================================================================== --- gcc/genmatch.c 2018-05-24 10:33:30.869095197 +0100 +++ gcc/genmatch.c 2018-05-24 13:05:46.048606357 +0100 @@ -3760,7 +3760,7 @@ decision_tree::gen (FILE *f, bool gimple } fprintf (stderr, "removed %u duplicate tails\n", rcnt); - for (unsigned n = 1; n <= 4; ++n) + for (unsigned n = 1; n <= 5; ++n) { /* First generate split-out functions. 
*/ for (unsigned i = 0; i < root->kids.length (); i++) Index: gcc/gimple-match-head.c =================================================================== --- gcc/gimple-match-head.c 2018-05-24 10:33:30.870095164 +0100 +++ gcc/gimple-match-head.c 2018-05-24 13:05:46.048606357 +0100 @@ -54,6 +54,8 @@ static bool gimple_simplify (gimple_matc code_helper, tree, tree, tree, tree); static bool gimple_simplify (gimple_match_op *, gimple_seq *, tree (*)(tree), code_helper, tree, tree, tree, tree, tree); +static bool gimple_simplify (gimple_match_op *, gimple_seq *, tree (*)(tree), + code_helper, tree, tree, tree, tree, tree, tree); const unsigned int gimple_match_op::MAX_NUM_OPS; @@ -80,7 +82,12 @@ convert_conditional_op (gimple_match_op if (orig_op->code.is_tree_code ()) ifn = get_conditional_internal_fn ((tree_code) orig_op->code); else - return false; + { + combined_fn cfn = orig_op->code; + if (!internal_fn_p (cfn)) + return false; + ifn = get_conditional_internal_fn (as_internal_fn (cfn)); + } if (ifn == IFN_LAST) return false; unsigned int num_ops = orig_op->num_ops; @@ -347,6 +354,34 @@ gimple_resimplify4 (gimple_seq *seq, gim return false; } +/* Helper that matches and simplifies the toplevel result from + a gimple_simplify run (where we don't want to build + a stmt in case it's used in in-place folding). Replaces + RES_OP with a simplified and/or canonicalized result and + returns whether any change was made. */ + +bool +gimple_resimplify5 (gimple_seq *seq, gimple_match_op *res_op, + tree (*valueize)(tree)) +{ + /* No constant folding is defined for five-operand functions. 
*/ + + gimple_match_op res_op2 (*res_op); + if (gimple_simplify (&res_op2, seq, valueize, + res_op->code, res_op->type, + res_op->ops[0], res_op->ops[1], res_op->ops[2], + res_op->ops[3], res_op->ops[4])) + { + *res_op = res_op2; + return true; + } + + if (maybe_resimplify_conditional_op (seq, res_op, valueize)) + return true; + + return false; +} + /* If in GIMPLE the operation described by RES_OP should be single-rhs, build a GENERIC tree for that expression and update RES_OP accordingly. */ @@ -388,7 +423,8 @@ build_call_internal (internal_fn fn, gim res_op->op_or_null (0), res_op->op_or_null (1), res_op->op_or_null (2), - res_op->op_or_null (3)); + res_op->op_or_null (3), + res_op->op_or_null (4)); } /* Push the exploded expression described by RES_OP as a statement to @@ -482,7 +518,8 @@ maybe_push_res_to_seq (gimple_match_op * res_op->op_or_null (0), res_op->op_or_null (1), res_op->op_or_null (2), - res_op->op_or_null (3)); + res_op->op_or_null (3), + res_op->op_or_null (4)); } if (!res) { @@ -689,14 +726,22 @@ do_valueize (tree op, tree (*valueize)(t try_conditional_simplification (internal_fn ifn, gimple_match_op *res_op, gimple_seq *seq, tree (*valueize) (tree)) { + code_helper op; tree_code code = conditional_internal_fn_code (ifn); - if (code == ERROR_MARK) - return false; + if (code != ERROR_MARK) + op = code; + else + { + ifn = get_unconditional_internal_fn (ifn); + if (ifn == IFN_LAST) + return false; + op = as_combined_fn (ifn); + } unsigned int num_ops = res_op->num_ops; gimple_match_op cond_op (gimple_match_cond (res_op->ops[0], res_op->ops[num_ops - 1]), - code, res_op->type, num_ops - 2); + op, res_op->type, num_ops - 2); for (unsigned int i = 1; i < num_ops - 1; ++i) cond_op.ops[i - 1] = res_op->ops[i]; switch (num_ops - 2) @@ -705,6 +750,10 @@ try_conditional_simplification (internal if (!gimple_resimplify2 (seq, &cond_op, valueize)) return false; break; + case 3: + if (!gimple_resimplify3 (seq, &cond_op, valueize)) + return false; + break; 
     default:
       gcc_unreachable ();
     }
@@ -837,7 +886,7 @@ gimple_simplify (gimple *stmt, gimple_ma
       /* ??? This way we can't simplify calls with side-effects. */
       if (gimple_call_lhs (stmt) != NULL_TREE
           && gimple_call_num_args (stmt) >= 1
-          && gimple_call_num_args (stmt) <= 4)
+          && gimple_call_num_args (stmt) <= 5)
         {
           bool valueized = false;
           combined_fn cfn;
@@ -887,6 +936,9 @@ gimple_simplify (gimple *stmt, gimple_ma
             case 4:
               return (gimple_resimplify4 (seq, res_op, valueize)
                       || valueized);
+            case 5:
+              return (gimple_resimplify5 (seq, res_op, valueize)
+                      || valueized);
             default:
               gcc_unreachable ();
             }
Index: gcc/match.pd
===================================================================
--- gcc/match.pd 2018-05-24 10:33:30.870095164 +0100
+++ gcc/match.pd 2018-05-24 13:05:46.049605128 +0100
@@ -86,6 +86,12 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   IFN_COND_MUL IFN_COND_DIV IFN_COND_MOD IFN_COND_RDIV
   IFN_COND_MIN IFN_COND_MAX
   IFN_COND_AND IFN_COND_IOR IFN_COND_XOR)
+
+/* Same for ternary operations. */
+(define_operator_list UNCOND_TERNARY
+  IFN_FMA IFN_FMS IFN_FNMA IFN_FNMS)
+(define_operator_list COND_TERNARY
+  IFN_COND_FMA IFN_COND_FMS IFN_COND_FNMA IFN_COND_FNMS)

 /* As opposed to convert?, this still creates a single pattern, so
    it is not a suitable replacement for convert? in all cases. */
@@ -4798,6 +4804,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
    (if (element_precision (type) == element_precision (op_type))
     (view_convert (cond_op (bit_not @0) @2 @3 (view_convert:op_type @1)))))))

+/* Same for ternary operations. */
+(for uncond_op (UNCOND_TERNARY)
+     cond_op (COND_TERNARY)
+ (simplify
+  (vec_cond @0 (view_convert? (uncond_op@5 @1 @2 @3)) @4)
+  (with { tree op_type = TREE_TYPE (@5); }
+   (if (element_precision (type) == element_precision (op_type))
+    (view_convert (cond_op @0 @1 @2 @3 (view_convert:op_type @4))))))
+ (simplify
+  (vec_cond @0 @1 (view_convert? (uncond_op@5 @2 @3 @4)))
+  (with { tree op_type = TREE_TYPE (@5); }
+   (if (element_precision (type) == element_precision (op_type))
+    (view_convert (cond_op (bit_not @0) @2 @3 @4
+                   (view_convert:op_type @1)))))))
+
 /* Detect cases in which a VEC_COND_EXPR effectively replaces the
    "else" value of an IFN_COND_*. */
 (for cond_op (COND_BINARY)
@@ -4806,3 +4827,11 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (with { tree op_type = TREE_TYPE (@3); }
    (if (element_precision (type) == element_precision (op_type))
     (view_convert (cond_op @0 @1 @2 (view_convert:op_type @4)))))))
+
+/* Same for ternary operations. */
+(for cond_op (COND_TERNARY)
+ (simplify
+  (vec_cond @0 (view_convert? (cond_op @0 @1 @2 @3 @4)) @5)
+  (with { tree op_type = TREE_TYPE (@4); }
+   (if (element_precision (type) == element_precision (op_type))
+    (view_convert (cond_op @0 @1 @2 @3 (view_convert:op_type @5)))))))
Index: gcc/config/aarch64/aarch64.c
===================================================================
--- gcc/config/aarch64/aarch64.c 2018-05-24 10:33:30.867095262 +0100
+++ gcc/config/aarch64/aarch64.c 2018-05-24 13:05:46.046608817 +0100
@@ -1292,14 +1292,18 @@ aarch64_get_mask_mode (poly_uint64 nunit
   return default_get_mask_mode (nunits, nbytes);
 }

-/* Implement TARGET_PREFERRED_ELSE_VALUE. Prefer to use the first
-   arithmetic operand as the else value if the else value doesn't matter,
-   since that exactly matches the SVE destructive merging form. */
+/* Implement TARGET_PREFERRED_ELSE_VALUE. For binary operations,
+   prefer to use the first arithmetic operand as the else value if
+   the else value doesn't matter, since that exactly matches the SVE
+   destructive merging form. For ternary operations we could either
+   pick the first operand and use FMAD-like instructions or the last
+   operand and use FMLA-like instructions; the latter seems more
+   natural. */

 static tree
-aarch64_preferred_else_value (unsigned, tree, unsigned int, tree *ops)
+aarch64_preferred_else_value (unsigned, tree, unsigned int nops, tree *ops)
 {
-  return ops[0];
+  return nops == 3 ? ops[2] : ops[0];
 }

 /* Implement TARGET_HARD_REGNO_NREGS. */

Index: gcc/config/aarch64/aarch64-sve.md
===================================================================
--- gcc/config/aarch64/aarch64-sve.md 2018-05-24 10:12:10.141352356 +0100
+++ gcc/config/aarch64/aarch64-sve.md 2018-05-24 13:05:46.044611277 +0100
@@ -2688,6 +2688,58 @@ (define_insn "*cond_<optab><mode>"
   "<sve_fp_op>r\t%0.<Vetype>, %1/m, %0.<Vetype>, %2.<Vetype>"
 )

+;; Predicated floating-point ternary operations with select.
+(define_expand "cond_<optab><mode>"
+  [(set (match_operand:SVE_F 0 "register_operand")
+        (unspec:SVE_F
+          [(match_operand:<VPRED> 1 "register_operand")
+           (unspec:SVE_F
+             [(match_dup 1)
+              (match_operand:SVE_F 2 "register_operand")
+              (match_operand:SVE_F 3 "register_operand")
+              (match_operand:SVE_F 4 "register_operand")]
+             SVE_COND_FP_TERNARY)
+           (match_operand:SVE_F 5 "register_operand")]
+          UNSPEC_SEL))]
+  "TARGET_SVE"
+{
+  aarch64_sve_prepare_conditional_op (operands, 6, true);
+})
+
+;; Predicated floating-point ternary operations using the FMAD-like form.
+(define_insn "*cond_<optab><mode>"
+  [(set (match_operand:SVE_F 0 "register_operand" "=w")
+        (unspec:SVE_F
+          [(match_operand:<VPRED> 1 "register_operand" "Upl")
+           (unspec:SVE_F
+             [(match_dup 1)
+              (match_operand:SVE_F 2 "register_operand" "0")
+              (match_operand:SVE_F 3 "register_operand" "w")
+              (match_operand:SVE_F 4 "register_operand" "w")]
+             SVE_COND_FP_TERNARY)
+           (match_dup 2)]
+          UNSPEC_SEL))]
+  "TARGET_SVE"
+  "<sve_fmad_op>\t%0.<Vetype>, %1/m, %3.<Vetype>, %4.<Vetype>"
+)
+
+;; Predicated floating-point ternary operations using the FMLA-like form.
+(define_insn "*cond_<optab><mode>_acc"
+  [(set (match_operand:SVE_F 0 "register_operand" "=w")
+        (unspec:SVE_F
+          [(match_operand:<VPRED> 1 "register_operand" "Upl")
+           (unspec:SVE_F
+             [(match_dup 1)
+              (match_operand:SVE_F 2 "register_operand" "w")
+              (match_operand:SVE_F 3 "register_operand" "w")
+              (match_operand:SVE_F 4 "register_operand" "0")]
+             SVE_COND_FP_TERNARY)
+           (match_dup 4)]
+          UNSPEC_SEL))]
+  "TARGET_SVE"
+  "<sve_fmla_op>\t%0.<Vetype>, %1/m, %2.<Vetype>, %3.<Vetype>"
+)
+
 ;; Shift an SVE vector left and insert a scalar into element 0.
 (define_insn "vec_shl_insert_<mode>"
   [(set (match_operand:SVE_ALL 0 "register_operand" "=w, w")
Index: gcc/config/aarch64/iterators.md
===================================================================
--- gcc/config/aarch64/iterators.md 2018-05-24 10:12:10.142352315 +0100
+++ gcc/config/aarch64/iterators.md 2018-05-24 13:05:46.046608817 +0100
@@ -468,6 +468,10 @@ (define_c_enum "unspec"
     UNSPEC_COND_DIV ; Used in aarch64-sve.md.
     UNSPEC_COND_MAX ; Used in aarch64-sve.md.
     UNSPEC_COND_MIN ; Used in aarch64-sve.md.
+    UNSPEC_COND_FMLA ; Used in aarch64-sve.md.
+    UNSPEC_COND_FMLS ; Used in aarch64-sve.md.
+    UNSPEC_COND_FNMLA ; Used in aarch64-sve.md.
+    UNSPEC_COND_FNMLS ; Used in aarch64-sve.md.
     UNSPEC_COND_LT ; Used in aarch64-sve.md.
     UNSPEC_COND_LE ; Used in aarch64-sve.md.
     UNSPEC_COND_EQ ; Used in aarch64-sve.md.
@@ -1549,6 +1553,11 @@ (define_int_iterator SVE_COND_FP_BINARY (define_int_iterator SVE_COND_FP_BINARY_REV [UNSPEC_COND_SUB UNSPEC_COND_DIV]) +(define_int_iterator SVE_COND_FP_TERNARY [UNSPEC_COND_FMLA + UNSPEC_COND_FMLS + UNSPEC_COND_FNMLA + UNSPEC_COND_FNMLS]) + (define_int_iterator SVE_COND_FP_CMP [UNSPEC_COND_LT UNSPEC_COND_LE UNSPEC_COND_EQ UNSPEC_COND_NE UNSPEC_COND_GE UNSPEC_COND_GT]) @@ -1581,7 +1590,11 @@ (define_int_attr optab [(UNSPEC_ANDF "an (UNSPEC_COND_MUL "mul") (UNSPEC_COND_DIV "div") (UNSPEC_COND_MAX "smax") - (UNSPEC_COND_MIN "smin")]) + (UNSPEC_COND_MIN "smin") + (UNSPEC_COND_FMLA "fma") + (UNSPEC_COND_FMLS "fnma") + (UNSPEC_COND_FNMLA "fnms") + (UNSPEC_COND_FNMLS "fms")]) (define_int_attr maxmin_uns [(UNSPEC_UMAXV "umax") (UNSPEC_UMINV "umin") @@ -1799,6 +1812,16 @@ (define_int_attr sve_fp_op [(UNSPEC_COND (UNSPEC_COND_MAX "fmaxnm") (UNSPEC_COND_MIN "fminnm")]) +(define_int_attr sve_fmla_op [(UNSPEC_COND_FMLA "fmla") + (UNSPEC_COND_FMLS "fmls") + (UNSPEC_COND_FNMLA "fnmla") + (UNSPEC_COND_FNMLS "fnmls")]) + +(define_int_attr sve_fmad_op [(UNSPEC_COND_FMLA "fmad") + (UNSPEC_COND_FMLS "fmsb") + (UNSPEC_COND_FNMLA "fnmad") + (UNSPEC_COND_FNMLS "fnmsb")]) + (define_int_attr commutative [(UNSPEC_COND_ADD "true") (UNSPEC_COND_SUB "false") (UNSPEC_COND_MUL "true") Index: gcc/testsuite/gcc.dg/vect/vect-cond-arith-3.c =================================================================== --- /dev/null 2018-04-20 16:19:46.369131350 +0100 +++ gcc/testsuite/gcc.dg/vect/vect-cond-arith-3.c 2018-05-24 13:05:46.049605128 +0100 @@ -0,0 +1,63 @@ +/* { dg-require-effective-target scalar_all_fma } */ +/* { dg-additional-options "-fdump-tree-optimized" } */ + +#include "tree-vect.h" + +#define N (VECTOR_BITS * 11 / 64 + 3) + +#define DEF(INV) \ + void __attribute__ ((noipa)) \ + f_##INV (double *restrict a, double *restrict b, \ + double *restrict c, double *restrict d) \ + { \ + for (int i = 0; i < N; ++i) \ + { \ + double mb = (INV & 1 ? 
-b[i] : b[i]); \ + double mc = c[i]; \ + double md = (INV & 2 ? -d[i] : d[i]); \ + double fma = __builtin_fma (mb, mc, md); \ + double truev = (INV & 4 ? -fma : fma); \ + a[i] = b[i] < 10 ? truev : 10.0; \ + } \ + } + +#define TEST(INV) \ + { \ + f_##INV (a, b, c, d); \ + for (int i = 0; i < N; ++i) \ + { \ + double mb = (INV & 1 ? -b[i] : b[i]); \ + double mc = c[i]; \ + double md = (INV & 2 ? -d[i] : d[i]); \ + double fma = __builtin_fma (mb, mc, md); \ + double truev = (INV & 4 ? -fma : fma); \ + if (a[i] != (i % 17 < 10 ? truev : 10.0)) \ + __builtin_abort (); \ + asm volatile ("" ::: "memory"); \ + } \ + } + +#define FOR_EACH_INV(T) \ + T (0) T (1) T (2) T (3) T (4) T (5) T (6) T (7) + +FOR_EACH_INV (DEF) + +int +main (void) +{ + double a[N], b[N], c[N], d[N]; + for (int i = 0; i < N; ++i) + { + b[i] = i % 17; + c[i] = i % 9 + 11; + d[i] = i % 13 + 14; + asm volatile ("" ::: "memory"); + } + FOR_EACH_INV (TEST) + return 0; +} + +/* { dg-final { scan-tree-dump-times { = \.COND_FMA } 2 "optimized" { target vect_double_cond_arith } } } */ +/* { dg-final { scan-tree-dump-times { = \.COND_FMS } 2 "optimized" { target vect_double_cond_arith } } } */ +/* { dg-final { scan-tree-dump-times { = \.COND_FNMA } 2 "optimized" { target vect_double_cond_arith } } } */ +/* { dg-final { scan-tree-dump-times { = \.COND_FNMS } 2 "optimized" { target vect_double_cond_arith } } } */ Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_13.c =================================================================== --- /dev/null 2018-04-20 16:19:46.369131350 +0100 +++ gcc/testsuite/gcc.target/aarch64/sve/vcond_13.c 2018-05-24 13:05:46.049605128 +0100 @@ -0,0 +1,58 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ftree-vectorize" } */ + +#define N 119 + +#define DEF_LOOP(INV, TYPE, CMPTYPE, SUFFIX) \ + void __attribute__ ((noipa)) \ + f_##INV##_##SUFFIX (TYPE *restrict a, TYPE *restrict b, \ + TYPE *restrict c, TYPE *restrict d, \ + CMPTYPE *restrict cond) \ + { \ + for (int i = 0; i < N; 
++i) \ + { \ + TYPE mb = (INV & 1 ? -b[i] : b[i]); \ + TYPE mc = c[i]; \ + TYPE md = (INV & 2 ? -d[i] : d[i]); \ + TYPE fma = __builtin_fma##SUFFIX (mb, mc, md); \ + TYPE truev = (INV & 4 ? -fma : fma); \ + a[i] = cond[i] < 10 ? truev : b[i]; \ + } \ + } + +#define FOR_EACH_TYPE(T, INV) \ + T (INV, _Float16, short, f16) \ + T (INV, float, float, f32) \ + T (INV, double, double, f64) + +#define FOR_EACH_INV(T) \ + FOR_EACH_TYPE (T, 0) \ + FOR_EACH_TYPE (T, 1) \ + FOR_EACH_TYPE (T, 2) \ + FOR_EACH_TYPE (T, 3) \ + FOR_EACH_TYPE (T, 4) \ + FOR_EACH_TYPE (T, 5) \ + FOR_EACH_TYPE (T, 6) \ + FOR_EACH_TYPE (T, 7) + +FOR_EACH_INV (DEF_LOOP) + +/* { dg-final { scan-assembler-not {\tsel\t} } } */ +/* { dg-final { scan-assembler-not {\tmovprfx\t} } } */ +/* { dg-final { scan-assembler-not {\tmov\tz[0-9]+\.., z[0-9]+} } } */ + +/* { dg-final { scan-assembler-times {\tfmad\tz[0-9]+\.h,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfmad\tz[0-9]+\.s,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfmad\tz[0-9]+\.d,} 2 } } */ + +/* { dg-final { scan-assembler-times {\tfmsb\tz[0-9]+\.h,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfmsb\tz[0-9]+\.s,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfmsb\tz[0-9]+\.d,} 2 } } */ + +/* { dg-final { scan-assembler-times {\tfnmad\tz[0-9]+\.h,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfnmad\tz[0-9]+\.s,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfnmad\tz[0-9]+\.d,} 2 } } */ + +/* { dg-final { scan-assembler-times {\tfnmsb\tz[0-9]+\.h,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfnmsb\tz[0-9]+\.s,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfnmsb\tz[0-9]+\.d,} 2 } } */ Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_13_run.c =================================================================== --- /dev/null 2018-04-20 16:19:46.369131350 +0100 +++ gcc/testsuite/gcc.target/aarch64/sve/vcond_13_run.c 2018-05-24 13:05:46.050603898 +0100 @@ -0,0 +1,37 @@ +/* { dg-do run { target aarch64_sve_hw } } */ 
+/* { dg-options "-O2 -ftree-vectorize" } */ + +#include "vcond_13.c" + +#define TEST_LOOP(INV, TYPE, CMPTYPE, SUFFIX) \ + { \ + TYPE a[N], b[N], c[N], d[N]; \ + CMPTYPE cond[N]; \ + for (int i = 0; i < N; ++i) \ + { \ + b[i] = i % 15; \ + c[i] = i % 9 + 11; \ + d[i] = i % 13 + 14; \ + cond[i] = i % 17; \ + asm volatile ("" ::: "memory"); \ + } \ + f_##INV##_##SUFFIX (a, b, c, d, cond); \ + for (int i = 0; i < N; ++i) \ + { \ + double mb = (INV & 1 ? -b[i] : b[i]); \ + double mc = c[i]; \ + double md = (INV & 2 ? -d[i] : d[i]); \ + double fma = __builtin_fma (mb, mc, md); \ + double truev = (INV & 4 ? -fma : fma); \ + if (a[i] != (i % 17 < 10 ? truev : b[i])) \ + __builtin_abort (); \ + asm volatile ("" ::: "memory"); \ + } \ + } + +int +main (void) +{ + FOR_EACH_INV (TEST_LOOP) + return 0; +} Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_14.c =================================================================== --- /dev/null 2018-04-20 16:19:46.369131350 +0100 +++ gcc/testsuite/gcc.target/aarch64/sve/vcond_14.c 2018-05-24 13:05:46.050603898 +0100 @@ -0,0 +1,58 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ftree-vectorize" } */ + +#define N 119 + +#define DEF_LOOP(INV, TYPE, CMPTYPE, SUFFIX) \ + void __attribute__ ((noipa)) \ + f_##INV##_##SUFFIX (TYPE *restrict a, TYPE *restrict b, \ + TYPE *restrict c, TYPE *restrict d, \ + CMPTYPE *restrict cond) \ + { \ + for (int i = 0; i < N; ++i) \ + { \ + TYPE mb = (INV & 1 ? -b[i] : b[i]); \ + TYPE mc = c[i]; \ + TYPE md = (INV & 2 ? -d[i] : d[i]); \ + TYPE fma = __builtin_fma##SUFFIX (mb, mc, md); \ + TYPE truev = (INV & 4 ? -fma : fma); \ + a[i] = cond[i] < 10 ? 
truev : c[i]; \ + } \ + } + +#define FOR_EACH_TYPE(T, INV) \ + T (INV, _Float16, short, f16) \ + T (INV, float, float, f32) \ + T (INV, double, double, f64) + +#define FOR_EACH_INV(T) \ + FOR_EACH_TYPE (T, 0) \ + FOR_EACH_TYPE (T, 1) \ + FOR_EACH_TYPE (T, 2) \ + FOR_EACH_TYPE (T, 3) \ + FOR_EACH_TYPE (T, 4) \ + FOR_EACH_TYPE (T, 5) \ + FOR_EACH_TYPE (T, 6) \ + FOR_EACH_TYPE (T, 7) + +FOR_EACH_INV (DEF_LOOP) + +/* { dg-final { scan-assembler-not {\tsel\t} } } */ +/* { dg-final { scan-assembler-not {\tmovprfx\t} } } */ +/* { dg-final { scan-assembler-not {\tmov\tz[0-9]+\.., z[0-9]+} } } */ + +/* { dg-final { scan-assembler-times {\tfmad\tz[0-9]+\.h,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfmad\tz[0-9]+\.s,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfmad\tz[0-9]+\.d,} 2 } } */ + +/* { dg-final { scan-assembler-times {\tfmsb\tz[0-9]+\.h,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfmsb\tz[0-9]+\.s,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfmsb\tz[0-9]+\.d,} 2 } } */ + +/* { dg-final { scan-assembler-times {\tfnmad\tz[0-9]+\.h,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfnmad\tz[0-9]+\.s,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfnmad\tz[0-9]+\.d,} 2 } } */ + +/* { dg-final { scan-assembler-times {\tfnmsb\tz[0-9]+\.h,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfnmsb\tz[0-9]+\.s,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfnmsb\tz[0-9]+\.d,} 2 } } */ Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_14_run.c =================================================================== --- /dev/null 2018-04-20 16:19:46.369131350 +0100 +++ gcc/testsuite/gcc.target/aarch64/sve/vcond_14_run.c 2018-05-24 13:05:46.050603898 +0100 @@ -0,0 +1,37 @@ +/* { dg-do run { target aarch64_sve_hw } } */ +/* { dg-options "-O2 -ftree-vectorize" } */ + +#include "vcond_14.c" + +#define TEST_LOOP(INV, TYPE, CMPTYPE, SUFFIX) \ + { \ + TYPE a[N], b[N], c[N], d[N]; \ + CMPTYPE cond[N]; \ + for (int i = 0; i < N; ++i) \ + { \ + b[i] = i 
% 15; \ + c[i] = i % 9 + 11; \ + d[i] = i % 13 + 14; \ + cond[i] = i % 17; \ + asm volatile ("" ::: "memory"); \ + } \ + f_##INV##_##SUFFIX (a, b, c, d, cond); \ + for (int i = 0; i < N; ++i) \ + { \ + double mb = (INV & 1 ? -b[i] : b[i]); \ + double mc = c[i]; \ + double md = (INV & 2 ? -d[i] : d[i]); \ + double fma = __builtin_fma (mb, mc, md); \ + double truev = (INV & 4 ? -fma : fma); \ + if (a[i] != (i % 17 < 10 ? truev : c[i])) \ + __builtin_abort (); \ + asm volatile ("" ::: "memory"); \ + } \ + } + +int +main (void) +{ + FOR_EACH_INV (TEST_LOOP) + return 0; +} Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_15.c =================================================================== --- /dev/null 2018-04-20 16:19:46.369131350 +0100 +++ gcc/testsuite/gcc.target/aarch64/sve/vcond_15.c 2018-05-24 13:05:46.050603898 +0100 @@ -0,0 +1,58 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ftree-vectorize" } */ + +#define N 119 + +#define DEF_LOOP(INV, TYPE, CMPTYPE, SUFFIX) \ + void __attribute__ ((noipa)) \ + f_##INV##_##SUFFIX (TYPE *restrict a, TYPE *restrict b, \ + TYPE *restrict c, TYPE *restrict d, \ + CMPTYPE *restrict cond) \ + { \ + for (int i = 0; i < N; ++i) \ + { \ + TYPE mb = (INV & 1 ? -b[i] : b[i]); \ + TYPE mc = c[i]; \ + TYPE md = (INV & 2 ? -d[i] : d[i]); \ + TYPE fma = __builtin_fma##SUFFIX (mb, mc, md); \ + TYPE truev = (INV & 4 ? -fma : fma); \ + a[i] = cond[i] < 10 ? 
truev : d[i]; \ + } \ + } + +#define FOR_EACH_TYPE(T, INV) \ + T (INV, _Float16, short, f16) \ + T (INV, float, float, f32) \ + T (INV, double, double, f64) + +#define FOR_EACH_INV(T) \ + FOR_EACH_TYPE (T, 0) \ + FOR_EACH_TYPE (T, 1) \ + FOR_EACH_TYPE (T, 2) \ + FOR_EACH_TYPE (T, 3) \ + FOR_EACH_TYPE (T, 4) \ + FOR_EACH_TYPE (T, 5) \ + FOR_EACH_TYPE (T, 6) \ + FOR_EACH_TYPE (T, 7) + +FOR_EACH_INV (DEF_LOOP) + +/* { dg-final { scan-assembler-not {\tsel\t} } } */ +/* { dg-final { scan-assembler-not {\tmovprfx\t} } } */ +/* { dg-final { scan-assembler-not {\tmov\tz[0-9]+\.., z[0-9]+} } } */ + +/* { dg-final { scan-assembler-times {\tfmla\tz[0-9]+\.h,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfmla\tz[0-9]+\.s,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfmla\tz[0-9]+\.d,} 2 } } */ + +/* { dg-final { scan-assembler-times {\tfmls\tz[0-9]+\.h,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfmls\tz[0-9]+\.s,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfmls\tz[0-9]+\.d,} 2 } } */ + +/* { dg-final { scan-assembler-times {\tfnmla\tz[0-9]+\.h,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfnmla\tz[0-9]+\.s,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfnmla\tz[0-9]+\.d,} 2 } } */ + +/* { dg-final { scan-assembler-times {\tfnmls\tz[0-9]+\.h,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfnmls\tz[0-9]+\.s,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfnmls\tz[0-9]+\.d,} 2 } } */ Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_15_run.c =================================================================== --- /dev/null 2018-04-20 16:19:46.369131350 +0100 +++ gcc/testsuite/gcc.target/aarch64/sve/vcond_15_run.c 2018-05-24 13:05:46.050603898 +0100 @@ -0,0 +1,37 @@ +/* { dg-do run { target aarch64_sve_hw } } */ +/* { dg-options "-O2 -ftree-vectorize" } */ + +#include "vcond_15.c" + +#define TEST_LOOP(INV, TYPE, CMPTYPE, SUFFIX) \ + { \ + TYPE a[N], b[N], c[N], d[N]; \ + CMPTYPE cond[N]; \ + for (int i = 0; i < N; ++i) \ + { \ + b[i] = i 
% 15; \ + c[i] = i % 9 + 11; \ + d[i] = i % 13 + 14; \ + cond[i] = i % 17; \ + asm volatile ("" ::: "memory"); \ + } \ + f_##INV##_##SUFFIX (a, b, c, d, cond); \ + for (int i = 0; i < N; ++i) \ + { \ + double mb = (INV & 1 ? -b[i] : b[i]); \ + double mc = c[i]; \ + double md = (INV & 2 ? -d[i] : d[i]); \ + double fma = __builtin_fma (mb, mc, md); \ + double truev = (INV & 4 ? -fma : fma); \ + if (a[i] != (i % 17 < 10 ? truev : d[i])) \ + __builtin_abort (); \ + asm volatile ("" ::: "memory"); \ + } \ + } + +int +main (void) +{ + FOR_EACH_INV (TEST_LOOP) + return 0; +} Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_16.c =================================================================== --- /dev/null 2018-04-20 16:19:46.369131350 +0100 +++ gcc/testsuite/gcc.target/aarch64/sve/vcond_16.c 2018-05-24 13:05:46.050603898 +0100 @@ -0,0 +1,58 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -ftree-vectorize" } */ + +#define N 119 + +#define DEF_LOOP(INV, TYPE, CMPTYPE, SUFFIX) \ + void __attribute__ ((noipa)) \ + f_##INV##_##SUFFIX (TYPE *restrict a, TYPE *restrict b, \ + TYPE *restrict c, TYPE *restrict d, \ + CMPTYPE *restrict cond) \ + { \ + for (int i = 0; i < N; ++i) \ + { \ + TYPE mb = (INV & 1 ? -b[i] : b[i]); \ + TYPE mc = c[i]; \ + TYPE md = (INV & 2 ? -d[i] : d[i]); \ + TYPE fma = __builtin_fma##SUFFIX (mb, mc, md); \ + TYPE truev = (INV & 4 ? -fma : fma); \ + a[i] = cond[i] < 10 ? 
truev : 10; \ + } \ + } + +#define FOR_EACH_TYPE(T, INV) \ + T (INV, _Float16, short, f16) \ + T (INV, float, float, f32) \ + T (INV, double, double, f64) + +#define FOR_EACH_INV(T) \ + FOR_EACH_TYPE (T, 0) \ + FOR_EACH_TYPE (T, 1) \ + FOR_EACH_TYPE (T, 2) \ + FOR_EACH_TYPE (T, 3) \ + FOR_EACH_TYPE (T, 4) \ + FOR_EACH_TYPE (T, 5) \ + FOR_EACH_TYPE (T, 6) \ + FOR_EACH_TYPE (T, 7) + +FOR_EACH_INV (DEF_LOOP) + +/* { dg-final { scan-assembler-times {\tsel\t} 24 } } */ +/* { dg-final { scan-assembler-not {\tmovprfx\t} } } */ +/* { dg-final { scan-assembler-not {\tmov\tz[0-9]+\.., z[0-9]+} } } */ + +/* { dg-final { scan-assembler-times {\tfmad\tz[0-9]+\.h,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfmad\tz[0-9]+\.s,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfmad\tz[0-9]+\.d,} 2 } } */ + +/* { dg-final { scan-assembler-times {\tfmsb\tz[0-9]+\.h,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfmsb\tz[0-9]+\.s,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfmsb\tz[0-9]+\.d,} 2 } } */ + +/* { dg-final { scan-assembler-times {\tfnmad\tz[0-9]+\.h,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfnmad\tz[0-9]+\.s,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfnmad\tz[0-9]+\.d,} 2 } } */ + +/* { dg-final { scan-assembler-times {\tfnmsb\tz[0-9]+\.h,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfnmsb\tz[0-9]+\.s,} 2 } } */ +/* { dg-final { scan-assembler-times {\tfnmsb\tz[0-9]+\.d,} 2 } } */ Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_16_run.c =================================================================== --- /dev/null 2018-04-20 16:19:46.369131350 +0100 +++ gcc/testsuite/gcc.target/aarch64/sve/vcond_16_run.c 2018-05-24 13:05:46.050603898 +0100 @@ -0,0 +1,37 @@ +/* { dg-do run { target aarch64_sve_hw } } */ +/* { dg-options "-O2 -ftree-vectorize" } */ + +#include "vcond_16.c" + +#define TEST_LOOP(INV, TYPE, CMPTYPE, SUFFIX) \ + { \ + TYPE a[N], b[N], c[N], d[N]; \ + CMPTYPE cond[N]; \ + for (int i = 0; i < N; ++i) \ + { \ + b[i] 
= i % 15; \ + c[i] = i % 9 + 11; \ + d[i] = i % 13 + 14; \ + cond[i] = i % 17; \ + asm volatile ("" ::: "memory"); \ + } \ + f_##INV##_##SUFFIX (a, b, c, d, cond); \ + for (int i = 0; i < N; ++i) \ + { \ + double mb = (INV & 1 ? -b[i] : b[i]); \ + double mc = c[i]; \ + double md = (INV & 2 ? -d[i] : d[i]); \ + double fma = __builtin_fma (mb, mc, md); \ + double truev = (INV & 4 ? -fma : fma); \ + if (a[i] != (i % 17 < 10 ? truev : 10)) \ + __builtin_abort (); \ + asm volatile ("" ::: "memory"); \ + } \ + } + +int +main (void) +{ + FOR_EACH_INV (TEST_LOOP) + return 0; +}
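
For readers following the tests above, here is a rough scalar model of what the INV parameter selects. It is only a sketch and not part of the patch: the enum and function names are made up for illustration, and plain multiply-add stands in for the fused operation (the two agree for the small integer inputs the tests use). Bit 0 of INV negates the first multiplicand, bit 1 negates the addend and bit 2 negates the result, so the eight INV values fold onto the four operations with each one hit twice. That is consistent with every scan-tree-dump-times directive in vect-cond-arith-3.c expecting a count of 2.

```c
#include <assert.h>

/* Hypothetical labels for illustration only; the internal functions in
   the patch are IFN_COND_FMA, IFN_COND_FMS, IFN_COND_FNMA, IFN_COND_FNMS.  */
enum fma_kind { KIND_FMA, KIND_FMS, KIND_FNMA, KIND_FNMS };

/* One lane of the unconditional operation.  Plain arithmetic is used
   here as a stand-in for a fused multiply-add (an assumption that only
   holds when the product is exactly representable).  */
static double
apply_kind (enum fma_kind kind, double a, double b, double c)
{
  switch (kind)
    {
    case KIND_FMA:
      return a * b + c;
    case KIND_FMS:
      return a * b - c;
    case KIND_FNMA:
      return -(a * b) + c;
    case KIND_FNMS:
      return -(a * b) - c;
    }
  return 0.0;
}

/* One lane of the conditional form: COND ? KIND (a, b, c) : ELSE_VALUE.  */
static double
cond_apply (int cond, enum fma_kind kind, double a, double b, double c,
            double else_value)
{
  return cond ? apply_kind (kind, a, b, c) : else_value;
}

/* Map the tests' INV bits onto one of the four operations.  */
static enum fma_kind
kind_for_inv (unsigned int inv)
{
  int neg_product = (inv & 1) != 0;
  int neg_addend = (inv & 2) != 0;
  if (inv & 4)
    {
      /* Negating the result flips the sign of both terms.  */
      neg_product = !neg_product;
      neg_addend = !neg_addend;
    }
  if (neg_product)
    return neg_addend ? KIND_FNMS : KIND_FNMA;
  return neg_addend ? KIND_FMS : KIND_FMA;
}

/* The "truev" expression as written in the test macros.  */
static double
test_truev (unsigned int inv, double b, double c, double d)
{
  double mb = (inv & 1 ? -b : b);
  double md = (inv & 2 ? -d : d);
  double f = mb * c + md;
  return (inv & 4 ? -f : f);
}

/* Count how many INV values in 0..7 select KIND, checking on the way
   that the mapping agrees with the test expression for one sample.  */
static int
count_inv_for_kind (enum fma_kind kind)
{
  int count = 0;
  for (unsigned int inv = 0; inv < 8; ++inv)
    {
      assert (apply_kind (kind_for_inv (inv), 3.0, 11.0, 14.0)
              == test_truev (inv, 3.0, 11.0, 14.0));
      if (kind_for_inv (inv) == kind)
        ++count;
    }
  return count;
}
```

Calling count_inv_for_kind for each of the four kinds from a small driver should give the same count every time, and cond_apply with a false condition simply hands back the else value, which is the operand that TARGET_PREFERRED_ELSE_VALUE gets to choose when the else value does not matter.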