Message ID | 87po22p10d.fsf@linaro.org |
---|---|
State | New |
Series | Replace FMA_EXPR with one internal fn per optab |
On Fri, May 11, 2018 at 10:15 AM, Richard Sandiford <richard.sandiford@linaro.org> wrote: > There are four optabs for various forms of fused multiply-add: > fma, fms, fnma and fnms. Of these, only fma had a direct gimple > representation. For the other three we relied on special pattern- > matching during expand, although tree-ssa-math-opts.c did have > some code to try to second-guess what expand would do. > > This patch removes the old FMA_EXPR representation of fma and > introduces four new internal functions, one for each optab. > IFN_FMA is tied to BUILT_IN_FMA* while the other three are > independent directly-mapped internal functions. It's then > possible to do the pattern-matching in match.pd and > tree-ssa-math-opts.c (via folding) can select the exact > FMA-based operation. > > The patch removes the gimple FE support for __FMA rather than mapping > it to the internal function. There's no reason now to treat it > differently from other internal functions (although the FE doesn't > handle those yet). > > The BRIG & HSA parts are a best guess, but seem relatively simple. > > The genmatch.c changes are structured to allow ternary ops in which > the second two rather than the first two operands are commutative. > A later patch makes use of this. > > Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf, > x86_64-linux-gnu and powerpc64le-linux-gnu. OK to install? I think there might be an issue with long double and __float128 support here (for both PowerPC and x86_64). Please add testcases for those to show they are not problematic. What about half float on the aarch64 case? Is that handled correctly? I did not see a testcase for that case either. Thanks, Andrew > > Richard > > > 2018-05-11 Richard Sandiford <richard.sandiford@linaro.org> > > gcc/ > * doc/sourcebuild.texi (all_scalar_fma): Document. > * tree.def (FMA_EXPR): Delete. > * internal-fn.def (FMA, FMS, FNMA, FNMS): New internal functions. > * internal-fn.c (ternary_direct): New macro. > (expand_ternary_optab_fn): Likewise. > (direct_ternary_optab_supported_p): Likewise. > * Makefile.in (build/genmatch.o): Depend on case-cfn-macros.h. > * builtins.c (fold_builtin_fma): Delete. > (fold_builtin_3): Don't call it. > * cfgexpand.c (expand_debug_expr): Remove FMA_EXPR handling. > * expr.c (expand_expr_real_2): Likewise. > * fold-const.c (operand_equal_p): Likewise. > (fold_ternary_loc): Likewise. > * gimple-pretty-print.c (dump_ternary_rhs): Likewise. > * gimple.c (DEFTREECODE): Likewise. > * gimplify.c (gimplify_expr): Likewise. > * optabs-tree.c (optab_for_tree_code): Likewise. > * tree-cfg.c (verify_gimple_assign_ternary): Likewise. > * tree-eh.c (operation_could_trap_p): Likewise. > (stmt_could_throw_1_p): Likewise. > * tree-inline.c (estimate_operator_cost): Likewise. > * tree-pretty-print.c (dump_generic_node): Likewise. > (op_code_prio): Likewise. > * tree-ssa-loop-im.c (stmt_cost): Likewise. > * tree-ssa-operands.c (get_expr_operands): Likewise. > * tree.c (commutative_ternary_tree_code, add_expr): Likewise. > * fold-const-call.h (fold_fma): Delete. > * fold-const-call.c (fold_const_call_ssss): Handle CFN_FMS, > CFN_FNMA and CFN_FNMS. > (fold_fma): Delete. > * genmatch.c (combined_fn): New enum. > (commutative_ternary_tree_code): Remove FMA_EXPR handling. > (commutative_op): New function. > (commutate): Use it. Handle more than 2 operands. > (dt_operand::gen_gimple_expr): Use commutative_op. > (parser::parse_expr): Allow :c to be used with non-binary > operators if the commutative operand is known. 
> * gimple-ssa-backprop.c (backprop::process_builtin_call_use): Handle > CFN_FMS, CFN_FNMA and CFN_FNMS. > (backprop::process_assign_use): Remove FMA_EXPR handling. > * hsa-gen.c (gen_hsa_insns_for_operation_assignment): Likewise. > (gen_hsa_fma): New function. > (gen_hsa_insn_for_internal_fn_call): Use it for IFN_FMA, IFN_FMS, > IFN_FNMA and IFN_FNMS. > * match.pd: Add folds for IFN_FMS, IFN_FNMA and IFN_FNMS. > * tree-ssa-math-opts.c (aggressive_valueize): New function. > (convert_mult_to_fma_1): Use the gimple_build interface and use > aggressive_valueize to fold the result. > (convert_mult_to_fma): Use direct_internal_fn_supported_p > instead of checking for optabs directly. > * config/i386/i386.c (ix86_add_stmt_cost): Recognize FMAs as calls > rather than FMA_EXPRs. > * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Create a > call to IFN_FMA instead of an FMA_EXPR. > > gcc/brig/ > * brigfrontend/brig-function.cc > (brig_function::get_builtin_for_hsa_opcode): Use BUILT_IN_FMA > for BRIG_OPCODE_FMA. > (brig_function::get_tree_code_for_hsa_opcode): Treat BUILT_IN_FMA > as a call. > > gcc/c/ > * gimple-parser.c (c_parser_gimple_postfix_expression): Remove > __FMA_EXPR handling. > > gcc/cp/ > * constexpr.c (cxx_eval_constant_expression): Remove FMA_EXPR handling. > (potential_constant_expression_1): Likewise. > > gcc/testsuite/ > * lib/target-supports.exp (check_effective_target_all_scalar_fma): > New proc. > * gcc.dg/fma-1.c: New test. > * gcc.dg/fma-2.c: Likewise. > * gcc.dg/fma-3.c: Likewise. > * gcc.dg/fma-4.c: Likewise. > * gcc.dg/fma-5.c: Likewise. > * gcc.dg/fma-6.c: Likewise. > * gcc.dg/fma-7.c: Likewise. > * gcc.dg/gimplefe-26.c: Remove. > * gfortran.dg/reassoc_7.f: Pass -ffp-contract=off. > * gfortran.dg/reassoc_8.f: Likewise. > * gfortran.dg/reassoc_9.f: Likewise. > * gfortran.dg/reassoc_10.f: Likewise. > > Index: gcc/doc/sourcebuild.texi > =================================================================== > --- gcc/doc/sourcebuild.texi 2018-05-08 09:42:01.479723260 +0100 > +++ gcc/doc/sourcebuild.texi 2018-05-11 18:08:24.833947453 +0100 > @@ -2170,6 +2170,11 @@ Target supports wide characters. > @subsubsection Other attributes > > @table @code > +@item all_scalar_fma > +Target supports all four fused multiply-add optabs for both @code{float} > +and @code{double}. These optabs are: @code{fma_optab}, @code{fms_optab}, > +@code{fnma_optab} and @code{fnms_optab}. > + > @item automatic_stack_alignment > Target supports automatic stack alignment. > > Index: gcc/tree.def > =================================================================== > --- gcc/tree.def 2018-01-03 11:12:58.606649676 +0000 > +++ gcc/tree.def 2018-05-11 18:08:24.852946671 +0100 > @@ -1345,12 +1345,6 @@ DEFTREECODE (WIDEN_MULT_MINUS_EXPR, "wid > by the second argument. */ > DEFTREECODE (WIDEN_LSHIFT_EXPR, "widen_lshift_expr", tcc_binary, 2) > > -/* Fused multiply-add. > - All operands and the result are of the same type. No intermediate > - rounding is performed after multiplying operand one with operand two > - before adding operand three. */ > -DEFTREECODE (FMA_EXPR, "fma_expr", tcc_expression, 3) > - > /* Widening vector multiplication. > The two operands are vectors with N elements of size S. Multiplying the > elements of the two vectors will result in N products of size 2*S. 
> Index: gcc/internal-fn.def > =================================================================== > --- gcc/internal-fn.def 2018-02-20 09:40:41.809452604 +0000 > +++ gcc/internal-fn.def 2018-05-11 18:08:24.842947083 +0100 > @@ -57,6 +57,7 @@ along with GCC; see the file COPYING3. > > - unary: a normal unary optab, such as vec_reverse_<mode> > - binary: a normal binary optab, such as vec_interleave_lo_<mode> > + - ternary: a normal ternary optab, such as fma<mode>4 > > - cond_binary: a conditional binary optab, such as add<mode>cc > > @@ -138,6 +139,10 @@ DEF_INTERNAL_OPTAB_FN (WHILE_ULT, ECF_CO > DEF_INTERNAL_OPTAB_FN (VEC_SHL_INSERT, ECF_CONST | ECF_NOTHROW, > vec_shl_insert, binary) > > +DEF_INTERNAL_OPTAB_FN (FMS, ECF_CONST, fms, ternary) > +DEF_INTERNAL_OPTAB_FN (FNMA, ECF_CONST, fnma, ternary) > +DEF_INTERNAL_OPTAB_FN (FNMS, ECF_CONST, fnms, ternary) > + > DEF_INTERNAL_OPTAB_FN (COND_ADD, ECF_CONST, cond_add, cond_binary) > DEF_INTERNAL_OPTAB_FN (COND_SUB, ECF_CONST, cond_sub, cond_binary) > DEF_INTERNAL_SIGNED_OPTAB_FN (COND_MIN, ECF_CONST, first, > @@ -218,6 +223,9 @@ DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONS > /* FP scales. */ > DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary) > > +/* Ternary math functions. */ > +DEF_INTERNAL_FLT_FN (FMA, ECF_CONST, fma, ternary) > + > /* Unary integer ops. */ > DEF_INTERNAL_INT_FN (CLRSB, ECF_CONST | ECF_NOTHROW, clrsb, unary) > DEF_INTERNAL_INT_FN (CLZ, ECF_CONST | ECF_NOTHROW, clz, unary) > Index: gcc/internal-fn.c > =================================================================== > --- gcc/internal-fn.c 2018-03-02 09:44:45.456595753 +0000 > +++ gcc/internal-fn.c 2018-05-11 18:08:24.841947124 +0100 > @@ -90,6 +90,7 @@ #define mask_store_lanes_direct { 0, 0, > #define scatter_store_direct { 3, 3, false } > #define unary_direct { 0, 0, true } > #define binary_direct { 0, 0, true } > +#define ternary_direct { 0, 0, true } > #define cond_unary_direct { 1, 1, true } > #define cond_binary_direct { 1, 1, true } > #define while_direct { 0, 2, false } > @@ -2962,6 +2963,9 @@ #define expand_unary_optab_fn(FN, STMT, > #define expand_binary_optab_fn(FN, STMT, OPTAB) \ > expand_direct_optab_fn (FN, STMT, OPTAB, 2) > > +#define expand_ternary_optab_fn(FN, STMT, OPTAB) \ > + expand_direct_optab_fn (FN, STMT, OPTAB, 3) > + > #define expand_cond_unary_optab_fn(FN, STMT, OPTAB) \ > expand_direct_optab_fn (FN, STMT, OPTAB, 2) > > @@ -3047,6 +3051,7 @@ multi_vector_optab_supported_p (convert_ > > #define direct_unary_optab_supported_p direct_optab_supported_p > #define direct_binary_optab_supported_p direct_optab_supported_p > +#define direct_ternary_optab_supported_p direct_optab_supported_p > #define direct_cond_unary_optab_supported_p direct_optab_supported_p > #define direct_cond_binary_optab_supported_p direct_optab_supported_p > #define direct_mask_load_optab_supported_p direct_optab_supported_p > Index: gcc/Makefile.in > =================================================================== > --- gcc/Makefile.in 2018-03-13 15:06:01.749426280 +0000 > +++ gcc/Makefile.in 2018-05-11 18:08:24.816948152 +0100 > @@ -2786,7 +2786,7 @@ build/genmddump.o : genmddump.c $(RTL_BA > $(CORETYPES_H) $(GTM_H) errors.h $(READ_MD_H) $(GENSUPPORT_H) > build/genmatch.o : genmatch.c $(BCONFIG_H) $(SYSTEM_H) \ > $(CORETYPES_H) errors.h $(HASH_TABLE_H) hash-map.h $(GGC_H) is-a.h \ > - tree.def builtins.def internal-fn.def > + tree.def builtins.def internal-fn.def case-cfn-macros.h > build/gencfn-macros.o : gencfn-macros.c $(BCONFIG_H) $(SYSTEM_H) \ > $(CORETYPES_H) errors.h 
$(HASH_TABLE_H) hash-set.h builtins.def \ > internal-fn.def > Index: gcc/builtins.c > =================================================================== > --- gcc/builtins.c 2018-05-08 09:42:01.633717606 +0100 > +++ gcc/builtins.c 2018-05-11 18:08:24.817948111 +0100 > @@ -8340,21 +8340,6 @@ fold_builtin_abs (location_t loc, tree a > return fold_build1_loc (loc, ABS_EXPR, type, arg); > } > > -/* Fold a call to fma, fmaf, or fmal with arguments ARG[012]. */ > - > -static tree > -fold_builtin_fma (location_t loc, tree arg0, tree arg1, tree arg2, tree type) > -{ > - /* ??? Only expand to FMA_EXPR if it's directly supported. */ > - if (validate_arg (arg0, REAL_TYPE) > - && validate_arg (arg1, REAL_TYPE) > - && validate_arg (arg2, REAL_TYPE) > - && optab_handler (fma_optab, TYPE_MODE (type)) != CODE_FOR_nothing) > - return fold_build3_loc (loc, FMA_EXPR, type, arg0, arg1, arg2); > - > - return NULL_TREE; > -} > - > /* Fold a call to builtin carg(a+bi) -> atan2(b,a). */ > > static tree > @@ -9260,10 +9245,6 @@ fold_builtin_3 (location_t loc, tree fnd > CASE_FLT_FN (BUILT_IN_SINCOS): > return fold_builtin_sincos (loc, arg0, arg1, arg2); > > - CASE_FLT_FN (BUILT_IN_FMA): > - CASE_FLT_FN_FLOATN_NX (BUILT_IN_FMA): > - return fold_builtin_fma (loc, arg0, arg1, arg2, type); > - > CASE_FLT_FN (BUILT_IN_REMQUO): > if (validate_arg (arg0, REAL_TYPE) > && validate_arg (arg1, REAL_TYPE) > Index: gcc/cfgexpand.c > =================================================================== > --- gcc/cfgexpand.c 2018-05-09 11:34:47.190553782 +0100 > +++ gcc/cfgexpand.c 2018-05-11 18:08:24.818948070 +0100 > @@ -4202,7 +4202,6 @@ expand_debug_expr (tree exp) > case SAD_EXPR: > case WIDEN_MULT_PLUS_EXPR: > case WIDEN_MULT_MINUS_EXPR: > - case FMA_EXPR: > goto ternary; > > case TRUTH_ANDIF_EXPR: > @@ -5190,9 +5189,6 @@ expand_debug_expr (tree exp) > } > return NULL; > > - case FMA_EXPR: > - return simplify_gen_ternary (FMA, mode, inner_mode, op0, op1, op2); > - > default: > flag_unsupported: > if (flag_checking) > Index: gcc/expr.c > =================================================================== > --- gcc/expr.c 2018-05-01 19:30:32.099548805 +0100 > +++ gcc/expr.c 2018-05-11 18:08:24.833947453 +0100 > @@ -8853,67 +8853,6 @@ #define REDUCE_BIT_FIELD(expr) (reduce_b > expand_operands (treeop0, treeop1, subtarget, &op0, &op1, EXPAND_NORMAL); > return REDUCE_BIT_FIELD (expand_mult (mode, op0, op1, target, unsignedp)); > > - case FMA_EXPR: > - { > - optab opt = fma_optab; > - gimple *def0, *def2; > - > - /* If there is no insn for FMA, emit it as __builtin_fma{,f,l} > - call. */ > - if (optab_handler (fma_optab, mode) == CODE_FOR_nothing) > - { > - tree fn = mathfn_built_in (TREE_TYPE (treeop0), BUILT_IN_FMA); > - tree call_expr; > - > - gcc_assert (fn != NULL_TREE); > - call_expr = build_call_expr (fn, 3, treeop0, treeop1, treeop2); > - return expand_builtin (call_expr, target, subtarget, mode, false); > - } > - > - def0 = get_def_for_expr (treeop0, NEGATE_EXPR); > - /* The multiplication is commutative - look at its 2nd operand > - if the first isn't fed by a negate. */ > - if (!def0) > - { > - def0 = get_def_for_expr (treeop1, NEGATE_EXPR); > - /* Swap operands if the 2nd operand is fed by a negate. 
*/ > - if (def0) > - std::swap (treeop0, treeop1); > - } > - def2 = get_def_for_expr (treeop2, NEGATE_EXPR); > - > - op0 = op2 = NULL; > - > - if (def0 && def2 > - && optab_handler (fnms_optab, mode) != CODE_FOR_nothing) > - { > - opt = fnms_optab; > - op0 = expand_normal (gimple_assign_rhs1 (def0)); > - op2 = expand_normal (gimple_assign_rhs1 (def2)); > - } > - else if (def0 > - && optab_handler (fnma_optab, mode) != CODE_FOR_nothing) > - { > - opt = fnma_optab; > - op0 = expand_normal (gimple_assign_rhs1 (def0)); > - } > - else if (def2 > - && optab_handler (fms_optab, mode) != CODE_FOR_nothing) > - { > - opt = fms_optab; > - op2 = expand_normal (gimple_assign_rhs1 (def2)); > - } > - > - if (op0 == NULL) > - op0 = expand_expr (treeop0, subtarget, VOIDmode, EXPAND_NORMAL); > - if (op2 == NULL) > - op2 = expand_normal (treeop2); > - op1 = expand_normal (treeop1); > - > - return expand_ternary_op (TYPE_MODE (type), opt, > - op0, op1, op2, target, 0); > - } > - > case MULT_EXPR: > /* If this is a fixed-point operation, then we cannot use the code > below because "expand_mult" doesn't support sat/no-sat fixed-point > Index: gcc/fold-const.c > =================================================================== > --- gcc/fold-const.c 2018-05-08 09:42:01.637717459 +0100 > +++ gcc/fold-const.c 2018-05-11 18:08:24.835947371 +0100 > @@ -3297,7 +3297,6 @@ #define OP_SAME_WITH_NULL(N) \ > case TRUTH_ORIF_EXPR: > return OP_SAME (0) && OP_SAME (1); > > - case FMA_EXPR: > case WIDEN_MULT_PLUS_EXPR: > case WIDEN_MULT_MINUS_EXPR: > if (!OP_SAME (2)) > @@ -11708,17 +11707,6 @@ fold_ternary_loc (location_t loc, enum t > > return NULL_TREE; > > - case FMA_EXPR: > - /* For integers we can decompose the FMA if possible. */ > - if (TREE_CODE (arg0) == INTEGER_CST > - && TREE_CODE (arg1) == INTEGER_CST) > - return fold_build2_loc (loc, PLUS_EXPR, type, > - const_binop (MULT_EXPR, arg0, arg1), arg2); > - if (integer_zerop (arg2)) > - return fold_build2_loc (loc, MULT_EXPR, type, arg0, arg1); > - > - return fold_fma (loc, type, arg0, arg1, arg2); > - > case VEC_PERM_EXPR: > if (TREE_CODE (arg2) == VECTOR_CST) > { > Index: gcc/gimple-pretty-print.c > =================================================================== > --- gcc/gimple-pretty-print.c 2018-02-13 10:28:33.325853088 +0000 > +++ gcc/gimple-pretty-print.c 2018-05-11 18:08:24.836947330 +0100 > @@ -490,27 +490,6 @@ dump_ternary_rhs (pretty_printer *buffer > pp_greater (buffer); > break; > > - case FMA_EXPR: > - if (flags & TDF_GIMPLE) > - { > - pp_string (buffer, "__FMA ("); > - dump_generic_node (buffer, gimple_assign_rhs1 (gs), spc, flags, false); > - pp_comma (buffer); > - dump_generic_node (buffer, gimple_assign_rhs2 (gs), spc, flags, false); > - pp_comma (buffer); > - dump_generic_node (buffer, gimple_assign_rhs3 (gs), spc, flags, false); > - pp_right_paren (buffer); > - } > - else > - { > - dump_generic_node (buffer, gimple_assign_rhs1 (gs), spc, flags, false); > - pp_string (buffer, " * "); > - dump_generic_node (buffer, gimple_assign_rhs2 (gs), spc, flags, false); > - pp_string (buffer, " + "); > - dump_generic_node (buffer, gimple_assign_rhs3 (gs), spc, flags, false); > - } > - break; > - > case DOT_PROD_EXPR: > pp_string (buffer, "DOT_PROD_EXPR <"); > dump_generic_node (buffer, gimple_assign_rhs1 (gs), spc, flags, false); > Index: gcc/gimple.c > =================================================================== > --- gcc/gimple.c 2018-02-13 10:28:33.415861313 +0000 > +++ gcc/gimple.c 2018-05-11 18:08:24.838947247 +0100 > @@ -2143,8 +2143,7 @@ 
#define DEFTREECODE(SYM, STRING, TYPE, N > || (SYM) == REALIGN_LOAD_EXPR \ > || (SYM) == VEC_COND_EXPR \ > || (SYM) == VEC_PERM_EXPR \ > - || (SYM) == BIT_INSERT_EXPR \ > - || (SYM) == FMA_EXPR) ? GIMPLE_TERNARY_RHS \ > + || (SYM) == BIT_INSERT_EXPR) ? GIMPLE_TERNARY_RHS \ > : ((SYM) == CONSTRUCTOR \ > || (SYM) == OBJ_TYPE_REF \ > || (SYM) == ASSERT_EXPR \ > Index: gcc/gimplify.c > =================================================================== > --- gcc/gimplify.c 2018-05-08 09:42:02.972668452 +0100 > +++ gcc/gimplify.c 2018-05-11 18:08:24.840947165 +0100 > @@ -12086,7 +12086,6 @@ gimplify_expr (tree *expr_p, gimple_seq > } > break; > > - case FMA_EXPR: > case VEC_PERM_EXPR: > /* Classified as tcc_expression. */ > goto expr_3; > Index: gcc/optabs-tree.c > =================================================================== > --- gcc/optabs-tree.c 2018-01-13 18:01:26.108685820 +0000 > +++ gcc/optabs-tree.c 2018-05-11 18:08:24.842947083 +0100 > @@ -143,9 +143,6 @@ optab_for_tree_code (enum tree_code code > : (TYPE_SATURATING (type) > ? ssmsub_widen_optab : smsub_widen_optab)); > > - case FMA_EXPR: > - return fma_optab; > - > case VEC_WIDEN_MULT_HI_EXPR: > return TYPE_UNSIGNED (type) ? > vec_widen_umult_hi_optab : vec_widen_smult_hi_optab; > Index: gcc/tree-cfg.c > =================================================================== > --- gcc/tree-cfg.c 2018-05-01 19:31:03.079312535 +0100 > +++ gcc/tree-cfg.c 2018-05-11 18:08:24.848946836 +0100 > @@ -4109,20 +4109,6 @@ verify_gimple_assign_ternary (gassign *s > } > break; > > - case FMA_EXPR: > - if (!useless_type_conversion_p (lhs_type, rhs1_type) > - || !useless_type_conversion_p (lhs_type, rhs2_type) > - || !useless_type_conversion_p (lhs_type, rhs3_type)) > - { > - error ("type mismatch in fused multiply-add expression"); > - debug_generic_expr (lhs_type); > - debug_generic_expr (rhs1_type); > - debug_generic_expr (rhs2_type); > - debug_generic_expr (rhs3_type); > - return true; > - } > - break; > - > case VEC_COND_EXPR: > if (!VECTOR_BOOLEAN_TYPE_P (rhs1_type) > || maybe_ne (TYPE_VECTOR_SUBPARTS (rhs1_type), > Index: gcc/tree-eh.c > =================================================================== > --- gcc/tree-eh.c 2018-02-08 13:34:20.321293427 +0000 > +++ gcc/tree-eh.c 2018-05-11 18:08:24.849946795 +0100 > @@ -2512,8 +2512,7 @@ operation_could_trap_p (enum tree_code o > > if (TREE_CODE_CLASS (op) != tcc_comparison > && TREE_CODE_CLASS (op) != tcc_unary > - && TREE_CODE_CLASS (op) != tcc_binary > - && op != FMA_EXPR) > + && TREE_CODE_CLASS (op) != tcc_binary) > return false; > > return operation_could_trap_helper_p (op, fp_operation, honor_trapv, > @@ -2825,8 +2824,7 @@ stmt_could_throw_1_p (gassign *stmt) > > if (TREE_CODE_CLASS (code) == tcc_comparison > || TREE_CODE_CLASS (code) == tcc_unary > - || TREE_CODE_CLASS (code) == tcc_binary > - || code == FMA_EXPR) > + || TREE_CODE_CLASS (code) == tcc_binary) > { > if (TREE_CODE_CLASS (code) == tcc_comparison) > t = TREE_TYPE (gimple_assign_rhs1 (stmt)); > Index: gcc/tree-inline.c > =================================================================== > --- gcc/tree-inline.c 2018-03-13 15:06:01.738427167 +0000 > +++ gcc/tree-inline.c 2018-05-11 18:08:24.849946795 +0100 > @@ -3855,7 +3855,6 @@ estimate_operator_cost (enum tree_code c > case MINUS_EXPR: > case MULT_EXPR: > case MULT_HIGHPART_EXPR: > - case FMA_EXPR: > > case ADDR_SPACE_CONVERT_EXPR: > case FIXED_CONVERT_EXPR: > Index: gcc/tree-pretty-print.c > =================================================================== > --- 
gcc/tree-pretty-print.c 2018-03-13 15:06:00.523525161 +0000 > +++ gcc/tree-pretty-print.c 2018-05-11 18:08:24.850946754 +0100 > @@ -2901,16 +2901,6 @@ dump_generic_node (pretty_printer *pp, t > pp_string (pp, " > "); > break; > > - case FMA_EXPR: > - pp_string (pp, " FMA_EXPR < "); > - dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false); > - pp_string (pp, ", "); > - dump_generic_node (pp, TREE_OPERAND (node, 1), spc, flags, false); > - pp_string (pp, ", "); > - dump_generic_node (pp, TREE_OPERAND (node, 2), spc, flags, false); > - pp_string (pp, " > "); > - break; > - > case OACC_PARALLEL: > pp_string (pp, "#pragma acc parallel"); > goto dump_omp_clauses_body; > @@ -3549,7 +3539,6 @@ op_code_prio (enum tree_code code) > case CEIL_MOD_EXPR: > case FLOOR_MOD_EXPR: > case ROUND_MOD_EXPR: > - case FMA_EXPR: > return 13; > > case TRUTH_NOT_EXPR: > Index: gcc/tree-ssa-loop-im.c > =================================================================== > --- gcc/tree-ssa-loop-im.c 2018-03-13 15:06:00.521525322 +0000 > +++ gcc/tree-ssa-loop-im.c 2018-05-11 18:08:24.850946754 +0100 > @@ -493,7 +493,6 @@ stmt_cost (gimple *stmt) > case WIDEN_MULT_PLUS_EXPR: > case WIDEN_MULT_MINUS_EXPR: > case DOT_PROD_EXPR: > - case FMA_EXPR: > case TRUNC_DIV_EXPR: > case CEIL_DIV_EXPR: > case FLOOR_DIV_EXPR: > Index: gcc/tree-ssa-operands.c > =================================================================== > --- gcc/tree-ssa-operands.c 2018-01-03 11:12:59.133628905 +0000 > +++ gcc/tree-ssa-operands.c 2018-05-11 18:08:24.851946713 +0100 > @@ -849,7 +849,6 @@ get_expr_operands (struct function *fn, > case REALIGN_LOAD_EXPR: > case WIDEN_MULT_PLUS_EXPR: > case WIDEN_MULT_MINUS_EXPR: > - case FMA_EXPR: > { > get_expr_operands (fn, stmt, &TREE_OPERAND (expr, 0), flags); > get_expr_operands (fn, stmt, &TREE_OPERAND (expr, 1), flags); > Index: gcc/tree.c > =================================================================== > --- gcc/tree.c 2018-05-08 09:42:01.631717680 +0100 > +++ gcc/tree.c 2018-05-11 18:08:24.852946671 +0100 > @@ -7171,7 +7171,6 @@ commutative_ternary_tree_code (enum tree > case WIDEN_MULT_PLUS_EXPR: > case WIDEN_MULT_MINUS_EXPR: > case DOT_PROD_EXPR: > - case FMA_EXPR: > return true; > > default: > @@ -7457,7 +7456,6 @@ add_expr (const_tree t, inchash::hash &h > flags &= ~OEP_ADDRESS_OF; > break; > > - case FMA_EXPR: > case WIDEN_MULT_PLUS_EXPR: > case WIDEN_MULT_MINUS_EXPR: > { > Index: gcc/fold-const-call.h > =================================================================== > --- gcc/fold-const-call.h 2018-01-03 11:12:58.150667646 +0000 > +++ gcc/fold-const-call.h 2018-05-11 18:08:24.833947453 +0100 > @@ -23,7 +23,6 @@ #define GCC_FOLD_CONST_CALL_H > tree fold_const_call (combined_fn, tree, tree); > tree fold_const_call (combined_fn, tree, tree, tree); > tree fold_const_call (combined_fn, tree, tree, tree, tree); > -tree fold_fma (location_t, tree, tree, tree, tree); > tree build_cmp_result (tree type, int res); > > #endif > Index: gcc/fold-const-call.c > =================================================================== > --- gcc/fold-const-call.c 2018-05-08 09:42:01.844709861 +0100 > +++ gcc/fold-const-call.c 2018-05-11 18:08:24.833947453 +0100 > @@ -1606,6 +1606,26 @@ fold_const_call_ssss (real_value *result > CASE_CFN_FMA_FN: > return do_mpfr_arg3 (result, mpfr_fma, arg0, arg1, arg2, format); > > + case CFN_FMS: > + { > + real_value new_arg2 = real_value_negate (arg2); > + return do_mpfr_arg3 (result, mpfr_fma, arg0, arg1, &new_arg2, format); > + } > + > + case CFN_FNMA: > + { > 
+ real_value new_arg0 = real_value_negate (arg0); > + return do_mpfr_arg3 (result, mpfr_fma, &new_arg0, arg1, arg2, format); > + } > + > + case CFN_FNMS: > + { > + real_value new_arg0 = real_value_negate (arg0); > + real_value new_arg2 = real_value_negate (arg2); > + return do_mpfr_arg3 (result, mpfr_fma, &new_arg0, arg1, > + &new_arg2, format); > + } > + > default: > return false; > } > @@ -1719,20 +1739,3 @@ fold_const_call (combined_fn fn, tree ty > return fold_const_call_1 (fn, type, arg0, arg1, arg2); > } > } > - > -/* Fold a fma operation with arguments ARG[012]. */ > - > -tree > -fold_fma (location_t, tree type, tree arg0, tree arg1, tree arg2) > -{ > - REAL_VALUE_TYPE result; > - if (real_cst_p (arg0) > - && real_cst_p (arg1) > - && real_cst_p (arg2) > - && do_mpfr_arg3 (&result, mpfr_fma, TREE_REAL_CST_PTR (arg0), > - TREE_REAL_CST_PTR (arg1), TREE_REAL_CST_PTR (arg2), > - REAL_MODE_FORMAT (TYPE_MODE (type)))) > - return build_real (type, result); > - > - return NULL_TREE; > -} > Index: gcc/genmatch.c > =================================================================== > --- gcc/genmatch.c 2018-03-01 08:20:43.846526310 +0000 > +++ gcc/genmatch.c 2018-05-11 18:08:24.835947371 +0100 > @@ -241,6 +241,20 @@ enum internal_fn { > IFN_LAST > }; > > +enum combined_fn { > +#define DEF_BUILTIN(ENUM, N, C, T, LT, B, F, NA, AT, IM, COND) \ > + CFN_##ENUM = int (ENUM), > +#include "builtins.def" > + > +#define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) \ > + CFN_##CODE = int (END_BUILTINS) + int (IFN_##CODE), > +#include "internal-fn.def" > + > + CFN_LAST > +}; > + > +#include "case-cfn-macros.h" > + > /* Return true if CODE represents a commutative tree code. Otherwise > return false. */ > bool > @@ -288,7 +302,6 @@ commutative_ternary_tree_code (enum tree > case WIDEN_MULT_PLUS_EXPR: > case WIDEN_MULT_MINUS_EXPR: > case DOT_PROD_EXPR: > - case FMA_EXPR: > return true; > > default: > @@ -450,6 +463,44 @@ is_a_helper <user_id *>::test (id_base * > return id->kind == id_base::USER; > } > > +/* If ID has a pair of consecutive, commutative operands, return the > + index of the first, otherwise return -1. */ > + > +static int > +commutative_op (id_base *id) > +{ > + if (operator_id *code = dyn_cast <operator_id *> (id)) > + { > + if (commutative_tree_code (code->code) > + || commutative_ternary_tree_code (code->code)) > + return 0; > + return -1; > + } > + if (fn_id *fn = dyn_cast <fn_id *> (id)) > + switch (fn->fn) > + { > + CASE_CFN_FMA: > + case CFN_FMS: > + case CFN_FNMA: > + case CFN_FNMS: > + return 0; > + > + default: > + return -1; > + } > + if (user_id *uid = dyn_cast<user_id *> (id)) > + { > + int res = commutative_op (uid->substitutes[0]); > + if (res < 0) > + return 0; > + for (unsigned i = 1; i < uid->substitutes.length (); ++i) > + if (res != commutative_op (uid->substitutes[i])) > + return -1; > + return res; > + } > + return -1; > +} > + > /* Add a predicate identifier to the hash. */ > > static predicate_id * > @@ -946,6 +997,9 @@ commutate (operand *op, vec<vec<user_id > if (!e->is_commutative) > return ret; > > + /* The operation is always binary if it isn't inherently commutative. */ > + int natural_opno = commutative_op (e->operation); > + unsigned int opno = natural_opno >= 0 ? 
natural_opno : 0; > for (unsigned i = 0; i < result.length (); ++i) > { > expr *ne = new expr (e); > @@ -994,9 +1048,11 @@ commutate (operand *op, vec<vec<user_id > } > } > ne->is_commutative = false; > - // result[i].length () is 2 since e->operation is binary > - for (unsigned j = result[i].length (); j; --j) > - ne->append_op (result[i][j-1]); > + for (unsigned j = 0; j < result[i].length (); ++j) > + { > + int old_j = (j == opno ? opno + 1 : j == opno + 1 ? opno : j); > + ne->append_op (result[i][old_j]); > + } > ret.safe_push (ne); > } > > @@ -2759,24 +2815,18 @@ dt_operand::gen_gimple_expr (FILE *f, in > /* While the toplevel operands are canonicalized by the caller > after valueizing operands of sub-expressions we have to > re-canonicalize operand order. */ > - if (operator_id *code = dyn_cast <operator_id *> (id)) > + int opno = commutative_op (id); > + if (opno >= 0) > { > - /* ??? We can't canonicalize tcc_comparison operands here > - because that requires changing the comparison code which > - we already matched... */ > - if (commutative_tree_code (code->code) > - || commutative_ternary_tree_code (code->code)) > - { > - char child_opname0[20], child_opname1[20]; > - gen_opname (child_opname0, 0); > - gen_opname (child_opname1, 1); > - fprintf_indent (f, indent, > - "if (tree_swap_operands_p (%s, %s))\n", > - child_opname0, child_opname1); > - fprintf_indent (f, indent, > - " std::swap (%s, %s);\n", > - child_opname0, child_opname1); > - } > + char child_opname0[20], child_opname1[20]; > + gen_opname (child_opname0, opno); > + gen_opname (child_opname1, opno + 1); > + fprintf_indent (f, indent, > + "if (tree_swap_operands_p (%s, %s))\n", > + child_opname0, child_opname1); > + fprintf_indent (f, indent, > + " std::swap (%s, %s);\n", > + child_opname0, child_opname1); > } > > return n_braces; > @@ -4217,11 +4267,14 @@ parser::parse_expr () > e->operation->id, e->operation->nargs, e->ops.length ()); > if (is_commutative) > { > - if (e->ops.length () == 2) > + if (e->ops.length () == 2 > + || commutative_op (e->operation) >= 0) > e->is_commutative = true; > else > - fatal_at (token, "only binary operators or function with " > - "two arguments can be marked commutative"); > + fatal_at (token, "only binary operators or functions with " > + "two arguments can be marked commutative, " > + "unless the operation is known to be inherently " > + "commutative"); > } > e->expr_type = expr_type; > return op; > Index: gcc/gimple-ssa-backprop.c > =================================================================== > --- gcc/gimple-ssa-backprop.c 2018-01-03 11:12:59.113629694 +0000 > +++ gcc/gimple-ssa-backprop.c 2018-05-11 18:08:24.836947330 +0100 > @@ -375,6 +375,9 @@ backprop::process_builtin_call_use (gcal > > CASE_CFN_FMA: > CASE_CFN_FMA_FN: > + case CFN_FMS: > + case CFN_FNMA: > + case CFN_FNMS: > /* In X * X + Y, where Y is distinct from X, the sign of X doesn't > matter. */ > if (gimple_call_arg (call, 0) == rhs > @@ -420,15 +423,6 @@ backprop::process_assign_use (gassign *a > } > break; > > - case FMA_EXPR: > - /* In X * X + Y, where Y is distinct from X, the sign of X doesn't > - matter. */ > - if (gimple_assign_rhs1 (assign) == rhs > - && gimple_assign_rhs2 (assign) == rhs > - && gimple_assign_rhs3 (assign) != rhs) > - info->flags.ignore_sign = true; > - break; > - > case MULT_EXPR: > /* In X * X, the sign of X doesn't matter. 
*/ > if (gimple_assign_rhs1 (assign) == rhs > Index: gcc/hsa-gen.c > =================================================================== > --- gcc/hsa-gen.c 2018-03-17 08:30:21.230924973 +0000 > +++ gcc/hsa-gen.c 2018-05-11 18:08:24.841947124 +0100 > @@ -3178,23 +3178,6 @@ gen_hsa_insns_for_operation_assignment ( > case NEGATE_EXPR: > opcode = BRIG_OPCODE_NEG; > break; > - case FMA_EXPR: > - /* There is a native HSA instruction for scalar FMAs but not for vector > - ones. */ > - if (TREE_CODE (TREE_TYPE (lhs)) == VECTOR_TYPE) > - { > - hsa_op_reg *dest > - = hsa_cfun->reg_for_gimple_ssa (gimple_assign_lhs (assign)); > - hsa_op_with_type *op1 = hsa_reg_or_immed_for_gimple_op (rhs1, hbb); > - hsa_op_with_type *op2 = hsa_reg_or_immed_for_gimple_op (rhs2, hbb); > - hsa_op_with_type *op3 = hsa_reg_or_immed_for_gimple_op (rhs3, hbb); > - hsa_op_reg *tmp = new hsa_op_reg (dest->m_type); > - gen_hsa_binary_operation (BRIG_OPCODE_MUL, tmp, op1, op2, hbb); > - gen_hsa_binary_operation (BRIG_OPCODE_ADD, dest, tmp, op3, hbb); > - return; > - } > - opcode = BRIG_OPCODE_MAD; > - break; > case MIN_EXPR: > opcode = BRIG_OPCODE_MIN; > break; > @@ -4490,6 +4473,57 @@ gen_hsa_divmod (gcall *call, hsa_bb *hbb > insn->set_output_in_type (dest, 0, hbb); > } > > +/* Emit instructions that implement FMA, FMS, FNMA or FNMS call STMT. > + Instructions are appended to basic block HBB. NEGATE1 is true for > + FNMA and FNMS. NEGATE3 is true for FMS and FNMS. */ > + > +static void > +gen_hsa_fma (gcall *call, hsa_bb *hbb, bool negate1, bool negate3) > +{ > + tree lhs = gimple_call_lhs (call); > + if (lhs == NULL_TREE) > + return; > + > + tree rhs1 = gimple_call_arg (call, 0); > + tree rhs2 = gimple_call_arg (call, 1); > + tree rhs3 = gimple_call_arg (call, 2); > + > + hsa_op_reg *dest = hsa_cfun->reg_for_gimple_ssa (lhs); > + hsa_op_with_type *op1 = hsa_reg_or_immed_for_gimple_op (rhs1, hbb); > + hsa_op_with_type *op2 = hsa_reg_or_immed_for_gimple_op (rhs2, hbb); > + hsa_op_with_type *op3 = hsa_reg_or_immed_for_gimple_op (rhs3, hbb); > + > + if (negate1) > + { > + hsa_op_reg *tmp = new hsa_op_reg (dest->m_type); > + gen_hsa_unary_operation (BRIG_OPCODE_NEG, tmp, op1, hbb); > + op1 = tmp; > + } > + > + /* There is a native HSA instruction for scalar FMAs but not for vector > + ones. */ > + if (TREE_CODE (TREE_TYPE (lhs)) == VECTOR_TYPE) > + { > + hsa_op_reg *tmp = new hsa_op_reg (dest->m_type); > + gen_hsa_binary_operation (BRIG_OPCODE_MUL, tmp, op1, op2, hbb); > + gen_hsa_binary_operation (negate3 ? BRIG_OPCODE_SUB : BRIG_OPCODE_ADD, > + dest, tmp, op3, hbb); > + } > + else > + { > + if (negate3) > + { > + hsa_op_reg *tmp = new hsa_op_reg (dest->m_type); > + gen_hsa_unary_operation (BRIG_OPCODE_NEG, tmp, op3, hbb); > + op3 = tmp; > + } > + hsa_insn_basic *insn = new hsa_insn_basic (4, BRIG_OPCODE_MAD, > + dest->m_type, dest, > + op1, op2, op3); > + hbb->append_insn (insn); > + } > +} > + > /* Set VALUE to a shadow kernel debug argument and append a new instruction > to HBB basic block. 
*/ > > @@ -5224,6 +5258,22 @@ gen_hsa_insn_for_internal_fn_call (gcall > gen_hsa_insns_for_call_of_internal_fn (stmt, hbb); > break; > > + case IFN_FMA: > + gen_hsa_fma (stmt, hbb, false, false); > + break; > + > + case IFN_FMS: > + gen_hsa_fma (stmt, hbb, false, true); > + break; > + > + case IFN_FNMA: > + gen_hsa_fma (stmt, hbb, true, false); > + break; > + > + case IFN_FNMS: > + gen_hsa_fma (stmt, hbb, true, true); > + break; > + > default: > HSA_SORRY_ATV (gimple_location (stmt), > "support for HSA does not implement internal function: %s", > Index: gcc/match.pd > =================================================================== > --- gcc/match.pd 2018-05-02 08:39:32.882737077 +0100 > +++ gcc/match.pd 2018-05-11 18:08:24.842947083 +0100 > @@ -4702,3 +4702,60 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > || wi::geu_p (wi::to_wide (@rpos), > wi::to_wide (@ipos) + isize)) > (BIT_FIELD_REF @0 @rsize @rpos))))) > + > +(for fmas (FMA) > + (simplify > + (fmas:c (negate @0) @1 @2) > + (IFN_FNMA @0 @1 @2)) > + (simplify > + (fmas @0 @1 (negate @2)) > + (IFN_FMS @0 @1 @2)) > + (simplify > + (fmas:c (negate @0) @1 (negate @2)) > + (IFN_FNMS @0 @1 @2)) > + (simplify > + (negate (fmas@3 @0 @1 @2)) > + (if (single_use (@3)) > + (IFN_FNMS @0 @1 @2)))) > + > +(simplify > + (IFN_FMS:c (negate @0) @1 @2) > + (IFN_FNMS @0 @1 @2)) > +(simplify > + (IFN_FMS @0 @1 (negate @2)) > + (IFN_FMA @0 @1 @2)) > +(simplify > + (IFN_FMS:c (negate @0) @1 (negate @2)) > + (IFN_FNMA @0 @1 @2)) > +(simplify > + (negate (IFN_FMS@3 @0 @1 @2)) > + (if (single_use (@3)) > + (IFN_FNMA @0 @1 @2))) > + > +(simplify > + (IFN_FNMA:c (negate @0) @1 @2) > + (IFN_FMA @0 @1 @2)) > +(simplify > + (IFN_FNMA @0 @1 (negate @2)) > + (IFN_FNMS @0 @1 @2)) > +(simplify > + (IFN_FNMA:c (negate @0) @1 (negate @2)) > + (IFN_FMS @0 @1 @2)) > +(simplify > + (negate (IFN_FNMA@3 @0 @1 @2)) > + (if (single_use (@3)) > + (IFN_FMS @0 @1 @2))) > + > +(simplify > + (IFN_FNMS:c (negate @0) @1 @2) > + (IFN_FMS @0 @1 @2)) > +(simplify > + (IFN_FNMS @0 @1 (negate @2)) > + (IFN_FNMA @0 @1 @2)) > +(simplify > + (IFN_FNMS:c (negate @0) @1 (negate @2)) > + (IFN_FMA @0 @1 @2)) > +(simplify > + (negate (IFN_FNMS@3 @0 @1 @2)) > + (if (single_use (@3)) > + (IFN_FMA @0 @1 @2))) > Index: gcc/tree-ssa-math-opts.c > =================================================================== > --- gcc/tree-ssa-math-opts.c 2018-01-12 14:45:51.037434575 +0000 > +++ gcc/tree-ssa-math-opts.c 2018-05-11 18:08:24.850946754 +0100 > @@ -2640,6 +2640,14 @@ convert_plusminus_to_widen (gimple_stmt_ > return true; > } > > +/* gimple_fold callback that "valueizes" everything. */ > + > +static tree > +aggressive_valueize (tree val) > +{ > + return val; > +} > + > /* Given a result MUL_RESULT which is a result of a multiplication of OP1 and > OP2 and which we know is used in statements that can be, together with the > multiplication, converted to FMAs, perform the transformation. 
*/ > @@ -2650,7 +2658,7 @@ convert_mult_to_fma_1 (tree mul_result, > tree type = TREE_TYPE (mul_result); > gimple *use_stmt; > imm_use_iterator imm_iter; > - gassign *fma_stmt; > + gcall *fma_stmt; > > FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, mul_result) > { > @@ -2658,6 +2666,7 @@ convert_mult_to_fma_1 (tree mul_result, > enum tree_code use_code; > tree addop, mulop1 = op1, result = mul_result; > bool negate_p = false; > + gimple_seq seq = NULL; > > if (is_gimple_debug (use_stmt)) > continue; > @@ -2683,11 +2692,7 @@ convert_mult_to_fma_1 (tree mul_result, > addop = gimple_assign_rhs2 (use_stmt); > /* a * b - c -> a * b + (-c) */ > if (gimple_assign_rhs_code (use_stmt) == MINUS_EXPR) > - addop = force_gimple_operand_gsi (&gsi, > - build1 (NEGATE_EXPR, > - type, addop), > - true, NULL_TREE, true, > - GSI_SAME_STMT); > + addop = gimple_build (&seq, NEGATE_EXPR, type, addop); > } > else > { > @@ -2698,23 +2703,26 @@ convert_mult_to_fma_1 (tree mul_result, > } > > if (negate_p) > - mulop1 = force_gimple_operand_gsi (&gsi, > - build1 (NEGATE_EXPR, > - type, mulop1), > - true, NULL_TREE, true, > - GSI_SAME_STMT); > + mulop1 = gimple_build (&seq, NEGATE_EXPR, type, mulop1); > > - fma_stmt = gimple_build_assign (gimple_assign_lhs (use_stmt), > - FMA_EXPR, mulop1, op2, addop); > + if (seq) > + gsi_insert_seq_before (&gsi, seq, GSI_SAME_STMT); > + fma_stmt = gimple_build_call_internal (IFN_FMA, 3, mulop1, op2, addop); > + gimple_call_set_lhs (fma_stmt, gimple_assign_lhs (use_stmt)); > + gimple_call_set_nothrow (fma_stmt, !stmt_can_throw_internal (use_stmt)); > + gsi_replace (&gsi, fma_stmt, true); > + /* Valueize aggressively so that we generate FMS, FNMA and FNMS > + regardless of where the negation occurs. */ > + if (fold_stmt (&gsi, aggressive_valueize)) > + update_stmt (gsi_stmt (gsi)); > > if (dump_file && (dump_flags & TDF_DETAILS)) > { > fprintf (dump_file, "Generated FMA "); > - print_gimple_stmt (dump_file, fma_stmt, 0, 0); > + print_gimple_stmt (dump_file, gsi_stmt (gsi), 0, 0); > fprintf (dump_file, "\n"); > } > > - gsi_replace (&gsi, fma_stmt, true); > widen_mul_stats.fmas_inserted++; > } > } > @@ -2862,7 +2870,8 @@ convert_mult_to_fma (gimple *mul_stmt, t > > /* If the target doesn't support it, don't generate it. We assume that > if fma isn't available then fms, fnma or fnms are not either. */ > - if (optab_handler (fma_optab, TYPE_MODE (type)) == CODE_FOR_nothing) > + optimization_type opt_type = bb_optimization_type (gimple_bb (mul_stmt)); > + if (!direct_internal_fn_supported_p (IFN_FMA, type, opt_type)) > return false; > > /* If the multiplication has zero uses, it is kept around probably because > @@ -2958,8 +2967,8 @@ convert_mult_to_fma (gimple *mul_stmt, t > that a mult / subtract pair. */ > if (use_code == MINUS_EXPR && !negate_p > && gimple_assign_rhs1 (use_stmt) == result > - && optab_handler (fms_optab, TYPE_MODE (type)) == CODE_FOR_nothing > - && optab_handler (fnma_optab, TYPE_MODE (type)) != CODE_FOR_nothing) > + && !direct_internal_fn_supported_p (IFN_FMS, type, opt_type) > + && direct_internal_fn_supported_p (IFN_FNMA, type, opt_type)) > { > tree rhs2 = gimple_assign_rhs2 (use_stmt); > > Index: gcc/config/i386/i386.c > =================================================================== > --- gcc/config/i386/i386.c 2018-05-09 11:34:41.204789431 +0100 > +++ gcc/config/i386/i386.c 2018-05-11 18:08:24.825947782 +0100 > @@ -50516,19 +50516,20 @@ ix86_add_stmt_cost (void *data, int coun > tree vectype = stmt_info ? 
stmt_vectype (stmt_info) : NULL_TREE; > int stmt_cost = - 1; > > + bool fp = false; > + machine_mode mode = TImode; > + > + if (vectype != NULL) > + { > + fp = FLOAT_TYPE_P (vectype); > + mode = TYPE_MODE (vectype); > + } > + > if ((kind == vector_stmt || kind == scalar_stmt) > && stmt_info > && stmt_info->stmt && gimple_code (stmt_info->stmt) == GIMPLE_ASSIGN) > { > tree_code subcode = gimple_assign_rhs_code (stmt_info->stmt); > - bool fp = false; > - machine_mode mode = TImode; > - > - if (vectype != NULL) > - { > - fp = FLOAT_TYPE_P (vectype); > - mode = TYPE_MODE (vectype); > - } > /*machine_mode inner_mode = mode; > if (VECTOR_MODE_P (mode)) > inner_mode = GET_MODE_INNER (mode);*/ > @@ -50559,12 +50560,6 @@ ix86_add_stmt_cost (void *data, int coun > case MULT_HIGHPART_EXPR: > stmt_cost = ix86_multiplication_cost (ix86_cost, mode); > break; > - case FMA_EXPR: > - stmt_cost = ix86_vec_cost (mode, > - mode == SFmode ? ix86_cost->fmass > - : ix86_cost->fmasd, > - true); > - break; > case NEGATE_EXPR: > if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH) > stmt_cost = ix86_cost->sse_op; > @@ -50627,6 +50622,24 @@ ix86_add_stmt_cost (void *data, int coun > break; > } > } > + > + combined_fn cfn; > + if ((kind == vector_stmt || kind == scalar_stmt) > + && stmt_info > + && stmt_info->stmt > + && (cfn = gimple_call_combined_fn (stmt_info->stmt)) != CFN_LAST) > + switch (cfn) > + { > + case CFN_FMA: > + stmt_cost = ix86_vec_cost (mode, > + mode == SFmode ? ix86_cost->fmass > + : ix86_cost->fmasd, > + true); > + break; > + default: > + break; > + } > + > /* If we do elementwise loads into a vector then we are bound by > latency and execution resources for the many scalar loads > (AGU and load ports). Try to account for this by scaling the > Index: gcc/config/rs6000/rs6000.c > =================================================================== > --- gcc/config/rs6000/rs6000.c 2018-05-08 09:42:03.510648702 +0100 > +++ gcc/config/rs6000/rs6000.c 2018-05-11 18:08:24.830947576 +0100 > @@ -16025,7 +16025,9 @@ rs6000_gimple_fold_builtin (gimple_stmt_ > arg1 = gimple_call_arg (stmt, 1); > tree arg2 = gimple_call_arg (stmt, 2); > lhs = gimple_call_lhs (stmt); > - gimple *g = gimple_build_assign (lhs, FMA_EXPR, arg0, arg1, arg2); > + gcall *g = gimple_build_call_internal (IFN_FMA, 3, arg0, arg1, arg2); > + gimple_call_set_lhs (g, lhs); > + gimple_call_set_nothrow (g, true); > gimple_set_location (g, gimple_location (stmt)); > gsi_replace (gsi, g, true); > return true; > Index: gcc/brig/brigfrontend/brig-function.cc > =================================================================== > --- gcc/brig/brigfrontend/brig-function.cc 2018-05-08 09:42:01.419725462 +0100 > +++ gcc/brig/brigfrontend/brig-function.cc 2018-05-11 18:08:24.817948111 +0100 > @@ -1218,6 +1218,7 @@ brig_function::get_builtin_for_hsa_opcod > case BRIG_OPCODE_NEXP2: > builtin = mathfn_built_in (builtin_type, BUILT_IN_EXP2); > break; > + case BRIG_OPCODE_FMA: > case BRIG_OPCODE_NFMA: > builtin = mathfn_built_in (builtin_type, BUILT_IN_FMA); > break; > @@ -1460,8 +1461,6 @@ brig_function::get_tree_code_for_hsa_opc > return CALL_EXPR; > else > return MAX_EXPR; > - case BRIG_OPCODE_FMA: > - return FMA_EXPR; > case BRIG_OPCODE_ABS: > return ABS_EXPR; > case BRIG_OPCODE_SHL: > @@ -1496,6 +1495,7 @@ brig_function::get_tree_code_for_hsa_opc > /* Implement as 1/f (x). gcc should pattern detect that and > use a native instruction, if available, for it. 
*/ > return TREE_LIST; > + case BRIG_OPCODE_FMA: > case BRIG_OPCODE_FLOOR: > case BRIG_OPCODE_CEIL: > case BRIG_OPCODE_SQRT: > Index: gcc/c/gimple-parser.c > =================================================================== > --- gcc/c/gimple-parser.c 2018-01-03 11:12:56.269741723 +0000 > +++ gcc/c/gimple-parser.c 2018-05-11 18:08:24.817948111 +0100 > @@ -903,27 +903,6 @@ c_parser_gimple_postfix_expression (c_pa > expr.value = fold_convert (type, val); > return expr; > } > - else if (strcmp (IDENTIFIER_POINTER (id), "__FMA") == 0) > - { > - c_parser_consume_token (parser); > - auto_vec<tree> args; > - > - if (c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>")) > - { > - c_parser_gimple_expr_list (parser, &args); > - c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, > - "expected %<)%>"); > - } > - if (args.length () != 3) > - { > - error_at (loc, "invalid number of operands to __FMA"); > - expr.value = error_mark_node; > - return expr; > - } > - expr.value = build3_loc (loc, FMA_EXPR, TREE_TYPE (args[0]), > - args[0], args[1], args[2]); > - return expr; > - } > > /* SSA name. */ > unsigned version, ver_offset; > Index: gcc/cp/constexpr.c > =================================================================== > --- gcc/cp/constexpr.c 2018-05-01 19:30:30.973597315 +0100 > +++ gcc/cp/constexpr.c 2018-05-11 18:08:24.830947576 +0100 > @@ -4573,7 +4573,6 @@ cxx_eval_constant_expression (const cons > non_constant_p, overflow_p); > break; > > - case FMA_EXPR: > case VEC_PERM_EXPR: > r = cxx_eval_trinary_expression (ctx, t, lval, > non_constant_p, overflow_p); > @@ -5998,7 +5997,6 @@ #define RECUR(T,RV) \ > return false; > return true; > > - case FMA_EXPR: > case VEC_PERM_EXPR: > for (i = 0; i < 3; ++i) > if (!RECUR (TREE_OPERAND (t, i), true)) > Index: gcc/testsuite/lib/target-supports.exp > =================================================================== > --- gcc/testsuite/lib/target-supports.exp 2018-05-08 09:42:01.256731446 +0100 > +++ gcc/testsuite/lib/target-supports.exp 2018-05-11 18:08:24.848946836 +0100 > @@ -2879,6 +2879,13 @@ proc check_effective_target_base_quadflo > return 1 > } > > +# Return 1 if the target supports all four forms of fused multiply-add > +# (fma, fms, fnma, and fnms) for both float and double. > + > +proc check_effective_target_all_scalar_fma { } { > + return [istarget aarch64*-*-*] > +} > + > # Return 1 if the target supports compiling fixed-point, > # 0 otherwise. 
> > Index: gcc/testsuite/gcc.dg/fma-1.c > =================================================================== > --- /dev/null 2018-04-20 16:19:46.369131350 +0100 > +++ gcc/testsuite/gcc.dg/fma-1.c 2018-05-11 18:08:24.844947000 +0100 > @@ -0,0 +1,15 @@ > +/* { dg-options "-O2 -fdump-tree-widening_mul" } */ > + > +float > +f1 (float a, float b, float c) > +{ > + return a * b + c; > +} > + > +double > +f2 (double a, double b, double c) > +{ > + return a * b + c; > +} > + > +/* { dg-final { scan-tree-dump-times { = FMA \(} 2 "widening_mul" { target all_scalar_fma } } } */ > Index: gcc/testsuite/gcc.dg/fma-2.c > =================================================================== > --- /dev/null 2018-04-20 16:19:46.369131350 +0100 > +++ gcc/testsuite/gcc.dg/fma-2.c 2018-05-11 18:08:24.844947000 +0100 > @@ -0,0 +1,15 @@ > +/* { dg-options "-O2 -fdump-tree-widening_mul" } */ > + > +float > +f1 (float a, float b, float c) > +{ > + return a * b - c; > +} > + > +double > +f2 (double a, double b, double c) > +{ > + return a * b - c; > +} > + > +/* { dg-final { scan-tree-dump-times { = FMS \(} 2 "widening_mul" { target all_scalar_fma } } } */ > Index: gcc/testsuite/gcc.dg/fma-3.c > =================================================================== > --- /dev/null 2018-04-20 16:19:46.369131350 +0100 > +++ gcc/testsuite/gcc.dg/fma-3.c 2018-05-11 18:08:24.845946959 +0100 > @@ -0,0 +1,15 @@ > +/* { dg-options "-O2 -fdump-tree-widening_mul" } */ > + > +float > +f1 (float a, float b, float c) > +{ > + return c - a * b; > +} > + > +double > +f2 (double a, double b, double c) > +{ > + return c - a * b; > +} > + > +/* { dg-final { scan-tree-dump-times { = FNMA \(} 2 "widening_mul" { target all_scalar_fma } } } */ > Index: gcc/testsuite/gcc.dg/fma-4.c > =================================================================== > --- /dev/null 2018-04-20 16:19:46.369131350 +0100 > +++ gcc/testsuite/gcc.dg/fma-4.c 2018-05-11 18:08:24.845946959 +0100 > @@ -0,0 +1,15 @@ > +/* { dg-options "-O2 -fdump-tree-widening_mul" } */ > + > +float > +f1 (float a, float b, float c) > +{ > + return -(a * b) - c; > +} > + > +double > +f2 (double a, double b, double c) > +{ > + return -(a * b) - c; > +} > + > +/* { dg-final { scan-tree-dump-times { = FNMS \(} 2 "widening_mul" { target all_scalar_fma } } } */ > Index: gcc/testsuite/gcc.dg/fma-5.c > =================================================================== > --- /dev/null 2018-04-20 16:19:46.369131350 +0100 > +++ gcc/testsuite/gcc.dg/fma-5.c 2018-05-11 18:08:24.845946959 +0100 > @@ -0,0 +1,53 @@ > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > + > +float > +f1 (float a, float b, float c) > +{ > + return __builtin_fmaf (a, b, -c); > +} > + > +double > +f2 (double a, double b, double c) > +{ > + return __builtin_fma (a, b, -c); > +} > + > +void > +f3 (float a, float b, float c, float d, float e, float *res) > +{ > + res[0] = __builtin_fmaf (a, b, -e); > + res[1] = __builtin_fmaf (c, d, -e); > +} > + > +void > +f4 (double a, double b, double c, double d, double e, double *res) > +{ > + res[0] = __builtin_fma (a, b, -e); > + res[1] = __builtin_fma (c, d, -e); > +} > + > +float > +f5 (float a, float b, float c) > +{ > + return -__builtin_fmaf (-a, b, c); > +} > + > +double > +f6 (double a, double b, double c) > +{ > + return -__builtin_fma (-a, b, c); > +} > + > +float > +f7 (float a, float b, float c) > +{ > + return -__builtin_fmaf (a, -b, c); > +} > + > +double > +f8 (double a, double b, double c) > +{ > + return -__builtin_fma (a, -b, c); > +} > + > +/* { dg-final { 
scan-tree-dump-times { = FMS \(} 10 "optimized" { target all_scalar_fma } } } */ > Index: gcc/testsuite/gcc.dg/fma-6.c > =================================================================== > --- /dev/null 2018-04-20 16:19:46.369131350 +0100 > +++ gcc/testsuite/gcc.dg/fma-6.c 2018-05-11 18:08:24.845946959 +0100 > @@ -0,0 +1,67 @@ > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > + > +float > +f1 (float a, float b, float c) > +{ > + return __builtin_fmaf (-a, b, c); > +} > + > +double > +f2 (double a, double b, double c) > +{ > + return __builtin_fma (-a, b, c); > +} > + > +float > +f3 (float a, float b, float c) > +{ > + return __builtin_fmaf (a, -b, c); > +} > + > +double > +f4 (double a, double b, double c) > +{ > + return __builtin_fma (a, -b, c); > +} > + > +void > +f5 (float a, float b, float c, float d, float e, float *res) > +{ > + res[0] = __builtin_fmaf (-a, b, c); > + res[1] = __builtin_fmaf (-a, d, e); > +} > + > +void > +f6 (double a, double b, double c, double d, double e, double *res) > +{ > + res[0] = __builtin_fma (-a, b, c); > + res[1] = __builtin_fma (-a, d, e); > +} > + > +void > +f7 (float a, float b, float c, float d, float e, float *res) > +{ > + res[0] = __builtin_fmaf (a, -b, c); > + res[1] = __builtin_fmaf (d, -b, e); > +} > + > +void > +f8 (double a, double b, double c, double d, double e, double *res) > +{ > + res[0] = __builtin_fma (a, -b, c); > + res[1] = __builtin_fma (d, -b, e); > +} > + > +float > +f9 (float a, float b, float c) > +{ > + return -__builtin_fmaf (a, b, -c); > +} > + > +double > +f10 (double a, double b, double c) > +{ > + return -__builtin_fma (a, b, -c); > +} > + > +/* { dg-final { scan-tree-dump-times { = FNMA \(} 14 "optimized" { target all_scalar_fma } } } */ > Index: gcc/testsuite/gcc.dg/fma-7.c > =================================================================== > --- /dev/null 2018-04-20 16:19:46.369131350 +0100 > +++ gcc/testsuite/gcc.dg/fma-7.c 2018-05-11 18:08:24.845946959 +0100 > @@ -0,0 +1,67 @@ > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > + > +float > +f1 (float a, float b, float c) > +{ > + return __builtin_fmaf (-a, b, -c); > +} > + > +double > +f2 (double a, double b, double c) > +{ > + return __builtin_fma (-a, b, -c); > +} > + > +float > +f3 (float a, float b, float c) > +{ > + return __builtin_fmaf (a, -b, -c); > +} > + > +double > +f4 (double a, double b, double c) > +{ > + return __builtin_fma (a, -b, -c); > +} > + > +void > +f5 (float a, float b, float c, float d, float *res) > +{ > + res[0] = __builtin_fmaf (-a, b, -c); > + res[1] = __builtin_fmaf (-a, d, -c); > +} > + > +void > +f6 (double a, double b, double c, double d, double *res) > +{ > + res[0] = __builtin_fma (-a, b, -c); > + res[1] = __builtin_fma (-a, d, -c); > +} > + > +void > +f7 (float a, float b, float c, float d, float *res) > +{ > + res[0] = __builtin_fmaf (a, -b, -c); > + res[1] = __builtin_fmaf (d, -b, -c); > +} > + > +void > +f8 (double a, double b, double c, double d, double *res) > +{ > + res[0] = __builtin_fma (a, -b, -c); > + res[1] = __builtin_fma (d, -b, -c); > +} > + > +float > +f9 (float a, float b, float c) > +{ > + return -__builtin_fmaf (a, b, c); > +} > + > +double > +f10 (double a, double b, double c) > +{ > + return -__builtin_fma (a, b, c); > +} > + > +/* { dg-final { scan-tree-dump-times { = FNMS \(} 14 "optimized" { target all_scalar_fma } } } */ > Index: gcc/testsuite/gcc.dg/gimplefe-26.c > =================================================================== > --- gcc/testsuite/gcc.dg/gimplefe-26.c 2017-02-23 
19:54:08.000000000 +0000 > +++ /dev/null 2018-04-20 16:19:46.369131350 +0100 > @@ -1,16 +0,0 @@ > -/* { dg-do compile { target c99_runtime } } */ > -/* { dg-options "-O -fgimple -fdump-tree-ssa-gimple" } */ > - > -#define foo(type, num) \ > -type __GIMPLE () foo_##num (type a, type b, type c) \ > -{ \ > - type t0; \ > - t0_1 = __FMA (a, b, c); \ > - return t0_1; \ > -} > - > -foo(float, 1) > -foo(double, 2) > -foo(long double, 3) > - > -/* { dg-final { scan-tree-dump-times "__FMA" 3 "ssa" } } */ > Index: gcc/testsuite/gfortran.dg/reassoc_7.f > =================================================================== > --- gcc/testsuite/gfortran.dg/reassoc_7.f 2015-06-02 23:52:46.000000000 +0100 > +++ gcc/testsuite/gfortran.dg/reassoc_7.f 2018-05-11 18:08:24.846946918 +0100 > @@ -1,5 +1,5 @@ > ! { dg-do compile } > -! { dg-options "-O3 -ffast-math -fdump-tree-optimized" } > +! { dg-options "-O3 -ffast-math -ffp-contract=off -fdump-tree-optimized" } > > SUBROUTINE S55199(P,Dvdph) > implicit none > Index: gcc/testsuite/gfortran.dg/reassoc_8.f > =================================================================== > --- gcc/testsuite/gfortran.dg/reassoc_8.f 2015-06-02 23:52:46.000000000 +0100 > +++ gcc/testsuite/gfortran.dg/reassoc_8.f 2018-05-11 18:08:24.846946918 +0100 > @@ -1,5 +1,5 @@ > ! { dg-do compile } > -! { dg-options "-O3 -ffast-math -fdump-tree-optimized" } > +! { dg-options "-O3 -ffast-math -ffp-contract=off -fdump-tree-optimized" } > > SUBROUTINE S55199(P,Dvdph) > implicit none > Index: gcc/testsuite/gfortran.dg/reassoc_9.f > =================================================================== > --- gcc/testsuite/gfortran.dg/reassoc_9.f 2015-06-02 23:52:48.000000000 +0100 > +++ gcc/testsuite/gfortran.dg/reassoc_9.f 2018-05-11 18:08:24.847946877 +0100 > @@ -1,5 +1,5 @@ > ! { dg-do compile } > -! { dg-options "-O3 -ffast-math -fdump-tree-optimized" } > +! { dg-options "-O3 -ffast-math -ffp-contract=off -fdump-tree-optimized" } > > SUBROUTINE S55199(P,Dvdph) > implicit none > Index: gcc/testsuite/gfortran.dg/reassoc_10.f > =================================================================== > --- gcc/testsuite/gfortran.dg/reassoc_10.f 2015-06-02 23:52:44.000000000 +0100 > +++ gcc/testsuite/gfortran.dg/reassoc_10.f 2018-05-11 18:08:24.846946918 +0100 > @@ -1,5 +1,5 @@ > ! { dg-do compile } > -! { dg-options "-O3 -ffast-math -fdump-tree-optimized" } > +! { dg-options "-O3 -ffast-math -ffp-contract=off -fdump-tree-optimized" } > > SUBROUTINE S55199(P,Q,Dvdph) > implicit none
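[Editorial note: the mapping that the patch description sets out, and that the new gcc.dg/fma-1.c through fma-4.c tests exercise one case each, can be condensed into a single illustrative sketch. This is not part of the patch; it assumes a target providing all four scalar optabs (the new all_scalar_fma effective target), in which case the widening_mul pass at -O2 would fold each body into the internal function named in the comment.]

/* Illustrative sketch only, condensing gcc.dg/fma-1.c..fma-4.c;
   assumes an all_scalar_fma target.  */
/* { dg-options "-O2 -fdump-tree-widening_mul" } */

double f_fma  (double a, double b, double c) { return a * b + c; }     /* becomes IFN_FMA  */
double f_fms  (double a, double b, double c) { return a * b - c; }     /* becomes IFN_FMS  */
double f_fnma (double a, double b, double c) { return c - a * b; }     /* becomes IFN_FNMA */
double f_fnms (double a, double b, double c) { return -(a * b) - c; }  /* becomes IFN_FNMS */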
Andrew Pinski <pinskia@gmail.com> writes: > On Fri, May 11, 2018 at 10:15 AM, Richard Sandiford > <richard.sandiford@linaro.org> wrote: >> There are four optabs for various forms of fused multiply-add: >> fma, fms, fnma and fnms. Of these, only fma had a direct gimple >> representation. For the other three we relied on special pattern- >> matching during expand, although tree-ssa-math-opts.c did have >> some code to try to second-guess what expand would do. >> >> This patch removes the old FMA_EXPR representation of fma and >> introduces four new internal functions, one for each optab. >> IFN_FMA is tied to BUILT_IN_FMA* while the other three are >> independent directly-mapped internal functions. It's then >> possible to do the pattern-matching in match.pd and >> tree-ssa-math-opts.c (via folding) can select the exact >> FMA-based operation. >> >> The patch removes the gimple FE support for __FMA rather than mapping >> it to the internal function. There's no reason now to treat it >> differently from other internal functions (although the FE doesn't >> handle those yet). >> >> The BRIG & HSA parts are a best guess, but seem relatively simple. >> >> The genmatch.c changes are structured to allow ternary ops in which >> the second two rather than the first two operands are commutative. >> A later patch makes use of this. >> >> Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf, >> x86_64-linux-gnu and powerpc64le-linux-gnu. OK to install? > > > I think there might be an issue with long double and __float128 > support here (for both PowerPC and x86_64). Please add testcases for > those to show they are not problematic. > What about half float on the aarch64 case? Is that handle correctly? > I did not see a testcase for that case either. What specific kind of problem are you thinking of? The patch is type-generic and the internal functions are only used if the target advertises the appropriate optab. Targets already check (or should check) that the optabs are used under the appropriate conditions for that target. E.g. gcc.target/powerpc/float128-fma1.c checks for the four cases of __float128 for PowerPC. Thanks, Richard
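For reference, a __float128 analogue of the new fma-1.c..fma-4.c tests might look like the sketch below. This is illustrative only (it is not the existing gcc.target/powerpc/float128-fma1.c), and whether FMA, FMS, FNMA and FNMS calls actually appear in the widening_mul dump depends on the target providing the corresponding optabs for its __float128 mode.

/* { dg-options "-O2 -fdump-tree-widening_mul" } */

__float128
q1 (__float128 a, __float128 b, __float128 c)
{
  return a * b + c;	/* candidate for FMA */
}

__float128
q2 (__float128 a, __float128 b, __float128 c)
{
  return a * b - c;	/* candidate for FMS */
}

__float128
q3 (__float128 a, __float128 b, __float128 c)
{
  return c - a * b;	/* candidate for FNMA */
}

__float128
q4 (__float128 a, __float128 b, __float128 c)
{
  return -(a * b) - c;	/* candidate for FNMS */
}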
On Fri, May 11, 2018 at 11:04 AM, Richard Sandiford <richard.sandiford@linaro.org> wrote: > Andrew Pinski <pinskia@gmail.com> writes: >> On Fri, May 11, 2018 at 10:15 AM, Richard Sandiford >> <richard.sandiford@linaro.org> wrote: >>> There are four optabs for various forms of fused multiply-add: >>> fma, fms, fnma and fnms. Of these, only fma had a direct gimple >>> representation. For the other three we relied on special pattern- >>> matching during expand, although tree-ssa-math-opts.c did have >>> some code to try to second-guess what expand would do. >>> >>> This patch removes the old FMA_EXPR representation of fma and >>> introduces four new internal functions, one for each optab. >>> IFN_FMA is tied to BUILT_IN_FMA* while the other three are >>> independent directly-mapped internal functions. It's then >>> possible to do the pattern-matching in match.pd and >>> tree-ssa-math-opts.c (via folding) can select the exact >>> FMA-based operation. >>> >>> The patch removes the gimple FE support for __FMA rather than mapping >>> it to the internal function. There's no reason now to treat it >>> differently from other internal functions (although the FE doesn't >>> handle those yet). >>> >>> The BRIG & HSA parts are a best guess, but seem relatively simple. >>> >>> The genmatch.c changes are structured to allow ternary ops in which >>> the second two rather than the first two operands are commutative. >>> A later patch makes use of this. >>> >>> Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf, >>> x86_64-linux-gnu and powerpc64le-linux-gnu. OK to install? >> >> >> I think there might be an issue with long double and __float128 >> support here (for both PowerPC and x86_64). Please add testcases for >> those to show they are not problematic. >> What about half float on the aarch64 case? Is that handle correctly? >> I did not see a testcase for that case either. > > What specific kind of problem are you thinking of? The patch is > type-generic and the internal functions are only used if the target > advertises the appropriate optab. Targets already check (or should > check) that the optabs are used under the appropriate conditions for > that target. E.g. gcc.target/powerpc/float128-fma1.c checks for the > four cases of __float128 for PowerPC. It was more of a reference to the documentation addition you made: "+Target supports all four fused multiply-add optabs for both @code{float} +and @code{double}." Also, as a side note: while I was working on improving the use of integer madd instructions on aarch64, I found that if I changed the "madd<mode>" pattern name to "fma<mode>", I could get madd used more often (and add with shifts too). Does the FMA internal function still support integer types? Thanks, Andrew > > Thanks, > Richard
Andrew Pinski <pinskia@gmail.com> writes: > On Fri, May 11, 2018 at 11:04 AM, Richard Sandiford > <richard.sandiford@linaro.org> wrote: >> Andrew Pinski <pinskia@gmail.com> writes: >>> On Fri, May 11, 2018 at 10:15 AM, Richard Sandiford >>> <richard.sandiford@linaro.org> wrote: >>>> There are four optabs for various forms of fused multiply-add: >>>> fma, fms, fnma and fnms. Of these, only fma had a direct gimple >>>> representation. For the other three we relied on special pattern- >>>> matching during expand, although tree-ssa-math-opts.c did have >>>> some code to try to second-guess what expand would do. >>>> >>>> This patch removes the old FMA_EXPR representation of fma and >>>> introduces four new internal functions, one for each optab. >>>> IFN_FMA is tied to BUILT_IN_FMA* while the other three are >>>> independent directly-mapped internal functions. It's then >>>> possible to do the pattern-matching in match.pd and >>>> tree-ssa-math-opts.c (via folding) can select the exact >>>> FMA-based operation. >>>> >>>> The patch removes the gimple FE support for __FMA rather than mapping >>>> it to the internal function. There's no reason now to treat it >>>> differently from other internal functions (although the FE doesn't >>>> handle those yet). >>>> >>>> The BRIG & HSA parts are a best guess, but seem relatively simple. >>>> >>>> The genmatch.c changes are structured to allow ternary ops in which >>>> the second two rather than the first two operands are commutative. >>>> A later patch makes use of this. >>>> >>>> Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf, >>>> x86_64-linux-gnu and powerpc64le-linux-gnu. OK to install? >>> >>> >>> I think there might be an issue with long double and __float128 >>> support here (for both PowerPC and x86_64). Please add testcases for >>> those to show they are not problematic. >>> What about half float on the aarch64 case? Is that handle correctly? >>> I did not see a testcase for that case either. >> >> What specific kind of problem are you thinking of? The patch is >> type-generic and the internal functions are only used if the target >> advertises the appropriate optab. Targets already check (or should >> check) that the optabs are used under the appropriate conditions for >> that target. E.g. gcc.target/powerpc/float128-fma1.c checks for the >> four cases of __float128 for PowerPC. > > It was more of reference to the documentation addition you did: > "+Target supports all four fused multiply-add optabs for both @code{float} > +and @code{double}." Ah, OK. That was just for the new testsuite target selector. It had to be defined relative to *some* types, and I thought in practice any target that implements these optabs will implement at least float and double, so those were the best types to use in the new tests. The tests weren't meant to cover every possible type, and I guess there's an argument that the float tests are redundant given the double ones. At least having both does prove some level of type genericity though. > Also a side note, while I was working improving the use of integer > madd instructions on aarch64, I found if I changed "madd<mode>" > pattern name to "fma<mode>" I could get more madd used (and add with > shifts too). Does the FMA internal function still support integer > types? Yeah, if the optab is defined for the corresponding integer modes. Not sure whether that's a good or a bad thing :-), but at least it's consistent with how FMA_EXPR worked. Thanks, Richard
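As a sketch of the integer case discussed above: assuming a target defines an fma<mode>4 pattern for an integer mode (for example by renaming aarch64's madd<mode>, as Andrew describes; this is an assumption, not something the patch does), a source expression of this shape is the kind of thing that could then be recognized as a single multiply-add:

long
imadd (long a, long b, long c)
{
  /* With an integer fma optab available, the multiply-add below could be
     combined into one madd-style instruction; otherwise it stays as a
     separate multiply and add.  */
  return a * b + c;
}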
On Fri, May 11, 2018 at 7:15 PM Richard Sandiford < richard.sandiford@linaro.org> wrote: > There are four optabs for various forms of fused multiply-add: > fma, fms, fnma and fnms. Of these, only fma had a direct gimple > representation. For the other three we relied on special pattern- > matching during expand, although tree-ssa-math-opts.c did have > some code to try to second-guess what expand would do. > This patch removes the old FMA_EXPR representation of fma and > introduces four new internal functions, one for each optab. > IFN_FMA is tied to BUILT_IN_FMA* while the other three are > independent directly-mapped internal functions. It's then > possible to do the pattern-matching in match.pd and > tree-ssa-math-opts.c (via folding) can select the exact > FMA-based operation. > The patch removes the gimple FE support for __FMA rather than mapping > it to the internal function. There's no reason now to treat it > differently from other internal functions (although the FE doesn't > handle those yet). > The BRIG & HSA parts are a best guess, but seem relatively simple. > The genmatch.c changes are structured to allow ternary ops in which > the second two rather than the first two operands are commutative. > A later patch makes use of this. > Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf, > x86_64-linux-gnu and powerpc64le-linux-gnu. OK to install? Comment below > Richard > 2018-05-11 Richard Sandiford <richard.sandiford@linaro.org> > gcc/ > * doc/sourcebuild.texi (all_scalar_fma): Document. > * tree.def (FMA_EXPR): Delete. > * internal-fn.def (FMA, FMS, FNMA, FNMS): New internal functions. > * internal-fn.c (ternary_direct): New macro. > (expand_ternary_optab_fn): Likewise. > (direct_ternary_optab_supported_p): Likewise. > * Makefile.in (build/genmatch.o): Depend on case-fn-macros.h. > * builtins.c (fold_builtin_fma): Delete. > (fold_builtin_3): Don't call it. > * cfgexpand.c (expand_debug_expr): Remove FMA_EXPR handling. > * expr.c (expand_expr_real_2): Likewise. > * fold-const.c (operand_equal_p): Likewise. > (fold_ternary_loc): Likewise. > * gimple-pretty-print.c (dump_ternary_rhs): Likewise. > * gimple.c (DEFTREECODE): Likewise. > * gimplify.c (gimplify_expr): Likewise. > * optabs-tree.c (optab_for_tree_code): Likewise. > * tree-cfg.c (verify_gimple_assign_ternary): Likewise. > * tree-eh.c (operation_could_trap_p): Likewise. > (stmt_could_throw_1_p): Likewise. > * tree-inline.c (estimate_operator_cost): Likewise. > * tree-pretty-print.c (dump_generic_node): Likewise. > (op_code_prio): Likewise. > * tree-ssa-loop-im.c (stmt_cost): Likewise. > * tree-ssa-operands.c (get_expr_operands): Likewise. > * tree.c (commutative_ternary_tree_code, add_expr): Likewise. > * fold-const-call.h (fold_fma): Delete. > * fold-const-call.c (fold_const_call_ssss): Handle CFN_FMS, > CFN_FNMA and CFN_FNMS. > (fold_fma): Delete. > * genmatch.c (combined_fn): New enum. > (commutative_ternary_tree_code): Remove FMA_EXPR handling. > (commutative_op): New function. > (commutate): Use it. Handle more than 2 operands. > (dt_operand::gen_gimple_expr): Use commutative_op. > (parser::parse_expr): Allow :c to be used with non-binary > operators if the commutative operand is known. > * gimple-ssa-backprop.c (backprop::process_builtin_call_use): Handle > CFN_FMS, CFN_FNMA and CFN_FNMS. > (backprop::process_assign_use): Remove FMA_EXPR handling. > * hsa-gen.c (gen_hsa_insns_for_operation_assignment): Likewise. > (gen_hsa_fma): New function. 
> (gen_hsa_insn_for_internal_fn_call): Use it for IFN_FMA, IFN_FMS, > IFN_FNMA and IFN_FNMS. > * match.pd: Add folds for IFN_FMS, IFN_FNMA and IFN_FNMS. > * tree-ssa-math-opts.c (aggressive_valueize): New function. > (convert_mult_to_fma_1): Use the gimple_build interface and use > aggerssive_valueize to fold the result. > (convert_mult_to_fma): Use direct_internal_fn_suppoerted_p > instead of checking for optabs directly. > * config/i386/i386.c (ix86_add_stmt_cost): Recognize FMAs as calls > rather than FMA_EXPRs. > * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Create a > call to IFN_FMA instead of an FMA_EXPR. > gcc/brig/ > * brigfrontend/brig-function.cc > (brig_function::get_builtin_for_hsa_opcode): Use BUILT_IN_FMA > for BRIG_OPCODE_FMA. > (brig_function::get_tree_code_for_hsa_opcode): Treat BUILT_IN_FMA > as a call. > gcc/c/ > * gimple-parser.c (c_parser_gimple_postfix_expression): Remove > __FMA_EXPR handlng. > gcc/cp/ > * constexpr.c (cxx_eval_constant_expression): Remove FMA_EXPR handling. > (potential_constant_expression_1): Likewise. > gcc/testsuite/ > * lib/target-supports.exp (check_effective_target_all_scalar_fma): > New proc. > * gcc.dg/fma-1.c: New test. > * gcc.dg/fma-2.c: Likewise. > * gcc.dg/fma-3.c: Likewise. > * gcc.dg/fma-4.c: Likewise. > * gcc.dg/fma-5.c: Likewise. > * gcc.dg/fma-6.c: Likewise. > * gcc.dg/fma-7.c: Likewise. > * gcc.dg/gimplefe-26.c: Remove. > * gfortran.dg/reassoc_7.f: Pass -ffp-contract=off. > * gfortran.dg/reassoc_8.f: Likewise. > * gfortran.dg/reassoc_9.f: Likewise. > * gfortran.dg/reassoc_10.f: Likewise. > Index: gcc/doc/sourcebuild.texi > =================================================================== > --- gcc/doc/sourcebuild.texi 2018-05-08 09:42:01.479723260 +0100 > +++ gcc/doc/sourcebuild.texi 2018-05-11 18:08:24.833947453 +0100 > @@ -2170,6 +2170,11 @@ Target supports wide characters. > @subsubsection Other attributes > @table @code > +@item all_scalar_fma > +Target supports all four fused multiply-add optabs for both @code{float} > +and @code{double}. These optabs are: @code{fma_optab}, @code{fms_optab}, > +@code{fnma_optab} and @code{fnms_optab}. > + > @item automatic_stack_alignment > Target supports automatic stack alignment. > Index: gcc/tree.def > =================================================================== > --- gcc/tree.def 2018-01-03 11:12:58.606649676 +0000 > +++ gcc/tree.def 2018-05-11 18:08:24.852946671 +0100 > @@ -1345,12 +1345,6 @@ DEFTREECODE (WIDEN_MULT_MINUS_EXPR, "wid > by the second argument. */ > DEFTREECODE (WIDEN_LSHIFT_EXPR, "widen_lshift_expr", tcc_binary, 2) > -/* Fused multiply-add. > - All operands and the result are of the same type. No intermediate > - rounding is performed after multiplying operand one with operand two > - before adding operand three. */ > -DEFTREECODE (FMA_EXPR, "fma_expr", tcc_expression, 3) > - > /* Widening vector multiplication. > The two operands are vectors with N elements of size S. Multiplying the > elements of the two vectors will result in N products of size 2*S. > Index: gcc/internal-fn.def > =================================================================== > --- gcc/internal-fn.def 2018-02-20 09:40:41.809452604 +0000 > +++ gcc/internal-fn.def 2018-05-11 18:08:24.842947083 +0100 > @@ -57,6 +57,7 @@ along with GCC; see the file COPYING3. 
> - unary: a normal unary optab, such as vec_reverse_<mode> > - binary: a normal binary optab, such as vec_interleave_lo_<mode> > + - ternary: a normal ternary optab, such as fma<mode>4 > - cond_binary: a conditional binary optab, such as add<mode>cc > @@ -138,6 +139,10 @@ DEF_INTERNAL_OPTAB_FN (WHILE_ULT, ECF_CO > DEF_INTERNAL_OPTAB_FN (VEC_SHL_INSERT, ECF_CONST | ECF_NOTHROW, > vec_shl_insert, binary) > +DEF_INTERNAL_OPTAB_FN (FMS, ECF_CONST, fms, ternary) > +DEF_INTERNAL_OPTAB_FN (FNMA, ECF_CONST, fnma, ternary) > +DEF_INTERNAL_OPTAB_FN (FNMS, ECF_CONST, fnms, ternary) > + > DEF_INTERNAL_OPTAB_FN (COND_ADD, ECF_CONST, cond_add, cond_binary) > DEF_INTERNAL_OPTAB_FN (COND_SUB, ECF_CONST, cond_sub, cond_binary) > DEF_INTERNAL_SIGNED_OPTAB_FN (COND_MIN, ECF_CONST, first, > @@ -218,6 +223,9 @@ DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONS > /* FP scales. */ > DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary) > +/* Ternary math functions. */ > +DEF_INTERNAL_FLT_FN (FMA, ECF_CONST, fma, ternary) > + > /* Unary integer ops. */ > DEF_INTERNAL_INT_FN (CLRSB, ECF_CONST | ECF_NOTHROW, clrsb, unary) > DEF_INTERNAL_INT_FN (CLZ, ECF_CONST | ECF_NOTHROW, clz, unary) > Index: gcc/internal-fn.c > =================================================================== > --- gcc/internal-fn.c 2018-03-02 09:44:45.456595753 +0000 > +++ gcc/internal-fn.c 2018-05-11 18:08:24.841947124 +0100 > @@ -90,6 +90,7 @@ #define mask_store_lanes_direct { 0, 0, > #define scatter_store_direct { 3, 3, false } > #define unary_direct { 0, 0, true } > #define binary_direct { 0, 0, true } > +#define ternary_direct { 0, 0, true } > #define cond_unary_direct { 1, 1, true } > #define cond_binary_direct { 1, 1, true } > #define while_direct { 0, 2, false } > @@ -2962,6 +2963,9 @@ #define expand_unary_optab_fn(FN, STMT, > #define expand_binary_optab_fn(FN, STMT, OPTAB) \ > expand_direct_optab_fn (FN, STMT, OPTAB, 2) > +#define expand_ternary_optab_fn(FN, STMT, OPTAB) \ > + expand_direct_optab_fn (FN, STMT, OPTAB, 3) > + > #define expand_cond_unary_optab_fn(FN, STMT, OPTAB) \ > expand_direct_optab_fn (FN, STMT, OPTAB, 2) > @@ -3047,6 +3051,7 @@ multi_vector_optab_supported_p (convert_ > #define direct_unary_optab_supported_p direct_optab_supported_p > #define direct_binary_optab_supported_p direct_optab_supported_p > +#define direct_ternary_optab_supported_p direct_optab_supported_p > #define direct_cond_unary_optab_supported_p direct_optab_supported_p > #define direct_cond_binary_optab_supported_p direct_optab_supported_p > #define direct_mask_load_optab_supported_p direct_optab_supported_p > Index: gcc/Makefile.in > =================================================================== > --- gcc/Makefile.in 2018-03-13 15:06:01.749426280 +0000 > +++ gcc/Makefile.in 2018-05-11 18:08:24.816948152 +0100 > @@ -2786,7 +2786,7 @@ build/genmddump.o : genmddump.c $(RTL_BA > $(CORETYPES_H) $(GTM_H) errors.h $(READ_MD_H) $(GENSUPPORT_H) > build/genmatch.o : genmatch.c $(BCONFIG_H) $(SYSTEM_H) \ > $(CORETYPES_H) errors.h $(HASH_TABLE_H) hash-map.h $(GGC_H) is-a.h \ > - tree.def builtins.def internal-fn.def > + tree.def builtins.def internal-fn.def case-cfn-macros.h > build/gencfn-macros.o : gencfn-macros.c $(BCONFIG_H) $(SYSTEM_H) \ > $(CORETYPES_H) errors.h $(HASH_TABLE_H) hash-set.h builtins.def \ > internal-fn.def > Index: gcc/builtins.c > =================================================================== > --- gcc/builtins.c 2018-05-08 09:42:01.633717606 +0100 > +++ gcc/builtins.c 2018-05-11 18:08:24.817948111 +0100 > @@ -8340,21 +8340,6 @@ 
fold_builtin_abs (location_t loc, tree a > return fold_build1_loc (loc, ABS_EXPR, type, arg); > } > -/* Fold a call to fma, fmaf, or fmal with arguments ARG[012]. */ > - > -static tree > -fold_builtin_fma (location_t loc, tree arg0, tree arg1, tree arg2, tree type) > -{ > - /* ??? Only expand to FMA_EXPR if it's directly supported. */ > - if (validate_arg (arg0, REAL_TYPE) > - && validate_arg (arg1, REAL_TYPE) > - && validate_arg (arg2, REAL_TYPE) > - && optab_handler (fma_optab, TYPE_MODE (type)) != CODE_FOR_nothing) > - return fold_build3_loc (loc, FMA_EXPR, type, arg0, arg1, arg2); > - > - return NULL_TREE; > -} > - > /* Fold a call to builtin carg(a+bi) -> atan2(b,a). */ > static tree > @@ -9260,10 +9245,6 @@ fold_builtin_3 (location_t loc, tree fnd > CASE_FLT_FN (BUILT_IN_SINCOS): > return fold_builtin_sincos (loc, arg0, arg1, arg2); > - CASE_FLT_FN (BUILT_IN_FMA): > - CASE_FLT_FN_FLOATN_NX (BUILT_IN_FMA): > - return fold_builtin_fma (loc, arg0, arg1, arg2, type); > - > CASE_FLT_FN (BUILT_IN_REMQUO): > if (validate_arg (arg0, REAL_TYPE) > && validate_arg (arg1, REAL_TYPE) > Index: gcc/cfgexpand.c > =================================================================== > --- gcc/cfgexpand.c 2018-05-09 11:34:47.190553782 +0100 > +++ gcc/cfgexpand.c 2018-05-11 18:08:24.818948070 +0100 > @@ -4202,7 +4202,6 @@ expand_debug_expr (tree exp) > case SAD_EXPR: > case WIDEN_MULT_PLUS_EXPR: > case WIDEN_MULT_MINUS_EXPR: > - case FMA_EXPR: > goto ternary; > case TRUTH_ANDIF_EXPR: > @@ -5190,9 +5189,6 @@ expand_debug_expr (tree exp) > } > return NULL; > - case FMA_EXPR: > - return simplify_gen_ternary (FMA, mode, inner_mode, op0, op1, op2); > - > default: > flag_unsupported: > if (flag_checking) > Index: gcc/expr.c > =================================================================== > --- gcc/expr.c 2018-05-01 19:30:32.099548805 +0100 > +++ gcc/expr.c 2018-05-11 18:08:24.833947453 +0100 > @@ -8853,67 +8853,6 @@ #define REDUCE_BIT_FIELD(expr) (reduce_b > expand_operands (treeop0, treeop1, subtarget, &op0, &op1, EXPAND_NORMAL); > return REDUCE_BIT_FIELD (expand_mult (mode, op0, op1, target, unsignedp)); > - case FMA_EXPR: > - { > - optab opt = fma_optab; > - gimple *def0, *def2; > - > - /* If there is no insn for FMA, emit it as __builtin_fma{,f,l} > - call. */ > - if (optab_handler (fma_optab, mode) == CODE_FOR_nothing) > - { > - tree fn = mathfn_built_in (TREE_TYPE (treeop0), BUILT_IN_FMA); > - tree call_expr; > - > - gcc_assert (fn != NULL_TREE); > - call_expr = build_call_expr (fn, 3, treeop0, treeop1, treeop2); > - return expand_builtin (call_expr, target, subtarget, mode, false); > - } > - > - def0 = get_def_for_expr (treeop0, NEGATE_EXPR); > - /* The multiplication is commutative - look at its 2nd operand > - if the first isn't fed by a negate. */ > - if (!def0) > - { > - def0 = get_def_for_expr (treeop1, NEGATE_EXPR); > - /* Swap operands if the 2nd operand is fed by a negate. 
*/ > - if (def0) > - std::swap (treeop0, treeop1); > - } > - def2 = get_def_for_expr (treeop2, NEGATE_EXPR); > - > - op0 = op2 = NULL; > - > - if (def0 && def2 > - && optab_handler (fnms_optab, mode) != CODE_FOR_nothing) > - { > - opt = fnms_optab; > - op0 = expand_normal (gimple_assign_rhs1 (def0)); > - op2 = expand_normal (gimple_assign_rhs1 (def2)); > - } > - else if (def0 > - && optab_handler (fnma_optab, mode) != CODE_FOR_nothing) > - { > - opt = fnma_optab; > - op0 = expand_normal (gimple_assign_rhs1 (def0)); > - } > - else if (def2 > - && optab_handler (fms_optab, mode) != CODE_FOR_nothing) > - { > - opt = fms_optab; > - op2 = expand_normal (gimple_assign_rhs1 (def2)); > - } > - > - if (op0 == NULL) > - op0 = expand_expr (treeop0, subtarget, VOIDmode, EXPAND_NORMAL); > - if (op2 == NULL) > - op2 = expand_normal (treeop2); > - op1 = expand_normal (treeop1); > - > - return expand_ternary_op (TYPE_MODE (type), opt, > - op0, op1, op2, target, 0); > - } > - > case MULT_EXPR: > /* If this is a fixed-point operation, then we cannot use the code > below because "expand_mult" doesn't support sat/no-sat fixed-point > Index: gcc/fold-const.c > =================================================================== > --- gcc/fold-const.c 2018-05-08 09:42:01.637717459 +0100 > +++ gcc/fold-const.c 2018-05-11 18:08:24.835947371 +0100 > @@ -3297,7 +3297,6 @@ #define OP_SAME_WITH_NULL(N) \ > case TRUTH_ORIF_EXPR: > return OP_SAME (0) && OP_SAME (1); > - case FMA_EXPR: > case WIDEN_MULT_PLUS_EXPR: > case WIDEN_MULT_MINUS_EXPR: > if (!OP_SAME (2)) > @@ -11708,17 +11707,6 @@ fold_ternary_loc (location_t loc, enum t > return NULL_TREE; > - case FMA_EXPR: > - /* For integers we can decompose the FMA if possible. */ > - if (TREE_CODE (arg0) == INTEGER_CST > - && TREE_CODE (arg1) == INTEGER_CST) > - return fold_build2_loc (loc, PLUS_EXPR, type, > - const_binop (MULT_EXPR, arg0, arg1), arg2); > - if (integer_zerop (arg2)) > - return fold_build2_loc (loc, MULT_EXPR, type, arg0, arg1); > - > - return fold_fma (loc, type, arg0, arg1, arg2); > - > case VEC_PERM_EXPR: > if (TREE_CODE (arg2) == VECTOR_CST) > { > Index: gcc/gimple-pretty-print.c > =================================================================== > --- gcc/gimple-pretty-print.c 2018-02-13 10:28:33.325853088 +0000 > +++ gcc/gimple-pretty-print.c 2018-05-11 18:08:24.836947330 +0100 > @@ -490,27 +490,6 @@ dump_ternary_rhs (pretty_printer *buffer > pp_greater (buffer); > break; > - case FMA_EXPR: > - if (flags & TDF_GIMPLE) > - { > - pp_string (buffer, "__FMA ("); > - dump_generic_node (buffer, gimple_assign_rhs1 (gs), spc, flags, false); > - pp_comma (buffer); > - dump_generic_node (buffer, gimple_assign_rhs2 (gs), spc, flags, false); > - pp_comma (buffer); > - dump_generic_node (buffer, gimple_assign_rhs3 (gs), spc, flags, false); > - pp_right_paren (buffer); > - } > - else > - { > - dump_generic_node (buffer, gimple_assign_rhs1 (gs), spc, flags, false); > - pp_string (buffer, " * "); > - dump_generic_node (buffer, gimple_assign_rhs2 (gs), spc, flags, false); > - pp_string (buffer, " + "); > - dump_generic_node (buffer, gimple_assign_rhs3 (gs), spc, flags, false); > - } > - break; > - > case DOT_PROD_EXPR: > pp_string (buffer, "DOT_PROD_EXPR <"); > dump_generic_node (buffer, gimple_assign_rhs1 (gs), spc, flags, false); > Index: gcc/gimple.c > =================================================================== > --- gcc/gimple.c 2018-02-13 10:28:33.415861313 +0000 > +++ gcc/gimple.c 2018-05-11 18:08:24.838947247 +0100 > @@ -2143,8 +2143,7 @@ #define 
DEFTREECODE(SYM, STRING, TYPE, N > || (SYM) == REALIGN_LOAD_EXPR \ > || (SYM) == VEC_COND_EXPR \ > || (SYM) == VEC_PERM_EXPR \ > - || (SYM) == BIT_INSERT_EXPR \ > - || (SYM) == FMA_EXPR) ? GIMPLE_TERNARY_RHS \ > + || (SYM) == BIT_INSERT_EXPR) ? GIMPLE_TERNARY_RHS \ > : ((SYM) == CONSTRUCTOR \ > || (SYM) == OBJ_TYPE_REF \ > || (SYM) == ASSERT_EXPR \ > Index: gcc/gimplify.c > =================================================================== > --- gcc/gimplify.c 2018-05-08 09:42:02.972668452 +0100 > +++ gcc/gimplify.c 2018-05-11 18:08:24.840947165 +0100 > @@ -12086,7 +12086,6 @@ gimplify_expr (tree *expr_p, gimple_seq > } > break; > - case FMA_EXPR: > case VEC_PERM_EXPR: > /* Classified as tcc_expression. */ > goto expr_3; > Index: gcc/optabs-tree.c > =================================================================== > --- gcc/optabs-tree.c 2018-01-13 18:01:26.108685820 +0000 > +++ gcc/optabs-tree.c 2018-05-11 18:08:24.842947083 +0100 > @@ -143,9 +143,6 @@ optab_for_tree_code (enum tree_code code > : (TYPE_SATURATING (type) > ? ssmsub_widen_optab : smsub_widen_optab)); > - case FMA_EXPR: > - return fma_optab; > - > case VEC_WIDEN_MULT_HI_EXPR: > return TYPE_UNSIGNED (type) ? > vec_widen_umult_hi_optab : vec_widen_smult_hi_optab; > Index: gcc/tree-cfg.c > =================================================================== > --- gcc/tree-cfg.c 2018-05-01 19:31:03.079312535 +0100 > +++ gcc/tree-cfg.c 2018-05-11 18:08:24.848946836 +0100 > @@ -4109,20 +4109,6 @@ verify_gimple_assign_ternary (gassign *s > } > break; > - case FMA_EXPR: > - if (!useless_type_conversion_p (lhs_type, rhs1_type) > - || !useless_type_conversion_p (lhs_type, rhs2_type) > - || !useless_type_conversion_p (lhs_type, rhs3_type)) > - { > - error ("type mismatch in fused multiply-add expression"); > - debug_generic_expr (lhs_type); > - debug_generic_expr (rhs1_type); > - debug_generic_expr (rhs2_type); > - debug_generic_expr (rhs3_type); > - return true; > - } > - break; > - > case VEC_COND_EXPR: > if (!VECTOR_BOOLEAN_TYPE_P (rhs1_type) > || maybe_ne (TYPE_VECTOR_SUBPARTS (rhs1_type), > Index: gcc/tree-eh.c > =================================================================== > --- gcc/tree-eh.c 2018-02-08 13:34:20.321293427 +0000 > +++ gcc/tree-eh.c 2018-05-11 18:08:24.849946795 +0100 > @@ -2512,8 +2512,7 @@ operation_could_trap_p (enum tree_code o > if (TREE_CODE_CLASS (op) != tcc_comparison > && TREE_CODE_CLASS (op) != tcc_unary > - && TREE_CODE_CLASS (op) != tcc_binary > - && op != FMA_EXPR) > + && TREE_CODE_CLASS (op) != tcc_binary) > return false; > return operation_could_trap_helper_p (op, fp_operation, honor_trapv, > @@ -2825,8 +2824,7 @@ stmt_could_throw_1_p (gassign *stmt) > if (TREE_CODE_CLASS (code) == tcc_comparison > || TREE_CODE_CLASS (code) == tcc_unary > - || TREE_CODE_CLASS (code) == tcc_binary > - || code == FMA_EXPR) > + || TREE_CODE_CLASS (code) == tcc_binary) > { > if (TREE_CODE_CLASS (code) == tcc_comparison) > t = TREE_TYPE (gimple_assign_rhs1 (stmt)); > Index: gcc/tree-inline.c > =================================================================== > --- gcc/tree-inline.c 2018-03-13 15:06:01.738427167 +0000 > +++ gcc/tree-inline.c 2018-05-11 18:08:24.849946795 +0100 > @@ -3855,7 +3855,6 @@ estimate_operator_cost (enum tree_code c > case MINUS_EXPR: > case MULT_EXPR: > case MULT_HIGHPART_EXPR: > - case FMA_EXPR: > case ADDR_SPACE_CONVERT_EXPR: > case FIXED_CONVERT_EXPR: > Index: gcc/tree-pretty-print.c > =================================================================== > --- gcc/tree-pretty-print.c 
2018-03-13 15:06:00.523525161 +0000 > +++ gcc/tree-pretty-print.c 2018-05-11 18:08:24.850946754 +0100 > @@ -2901,16 +2901,6 @@ dump_generic_node (pretty_printer *pp, t > pp_string (pp, " > "); > break; > - case FMA_EXPR: > - pp_string (pp, " FMA_EXPR < "); > - dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false); > - pp_string (pp, ", "); > - dump_generic_node (pp, TREE_OPERAND (node, 1), spc, flags, false); > - pp_string (pp, ", "); > - dump_generic_node (pp, TREE_OPERAND (node, 2), spc, flags, false); > - pp_string (pp, " > "); > - break; > - > case OACC_PARALLEL: > pp_string (pp, "#pragma acc parallel"); > goto dump_omp_clauses_body; > @@ -3549,7 +3539,6 @@ op_code_prio (enum tree_code code) > case CEIL_MOD_EXPR: > case FLOOR_MOD_EXPR: > case ROUND_MOD_EXPR: > - case FMA_EXPR: > return 13; > case TRUTH_NOT_EXPR: > Index: gcc/tree-ssa-loop-im.c > =================================================================== > --- gcc/tree-ssa-loop-im.c 2018-03-13 15:06:00.521525322 +0000 > +++ gcc/tree-ssa-loop-im.c 2018-05-11 18:08:24.850946754 +0100 > @@ -493,7 +493,6 @@ stmt_cost (gimple *stmt) > case WIDEN_MULT_PLUS_EXPR: > case WIDEN_MULT_MINUS_EXPR: > case DOT_PROD_EXPR: > - case FMA_EXPR: > case TRUNC_DIV_EXPR: > case CEIL_DIV_EXPR: > case FLOOR_DIV_EXPR: > Index: gcc/tree-ssa-operands.c > =================================================================== > --- gcc/tree-ssa-operands.c 2018-01-03 11:12:59.133628905 +0000 > +++ gcc/tree-ssa-operands.c 2018-05-11 18:08:24.851946713 +0100 > @@ -849,7 +849,6 @@ get_expr_operands (struct function *fn, > case REALIGN_LOAD_EXPR: > case WIDEN_MULT_PLUS_EXPR: > case WIDEN_MULT_MINUS_EXPR: > - case FMA_EXPR: > { > get_expr_operands (fn, stmt, &TREE_OPERAND (expr, 0), flags); > get_expr_operands (fn, stmt, &TREE_OPERAND (expr, 1), flags); > Index: gcc/tree.c > =================================================================== > --- gcc/tree.c 2018-05-08 09:42:01.631717680 +0100 > +++ gcc/tree.c 2018-05-11 18:08:24.852946671 +0100 > @@ -7171,7 +7171,6 @@ commutative_ternary_tree_code (enum tree > case WIDEN_MULT_PLUS_EXPR: > case WIDEN_MULT_MINUS_EXPR: > case DOT_PROD_EXPR: > - case FMA_EXPR: > return true; > default: > @@ -7457,7 +7456,6 @@ add_expr (const_tree t, inchash::hash &h > flags &= ~OEP_ADDRESS_OF; > break; > - case FMA_EXPR: > case WIDEN_MULT_PLUS_EXPR: > case WIDEN_MULT_MINUS_EXPR: > { > Index: gcc/fold-const-call.h > =================================================================== > --- gcc/fold-const-call.h 2018-01-03 11:12:58.150667646 +0000 > +++ gcc/fold-const-call.h 2018-05-11 18:08:24.833947453 +0100 > @@ -23,7 +23,6 @@ #define GCC_FOLD_CONST_CALL_H > tree fold_const_call (combined_fn, tree, tree); > tree fold_const_call (combined_fn, tree, tree, tree); > tree fold_const_call (combined_fn, tree, tree, tree, tree); > -tree fold_fma (location_t, tree, tree, tree, tree); > tree build_cmp_result (tree type, int res); > #endif > Index: gcc/fold-const-call.c > =================================================================== > --- gcc/fold-const-call.c 2018-05-08 09:42:01.844709861 +0100 > +++ gcc/fold-const-call.c 2018-05-11 18:08:24.833947453 +0100 > @@ -1606,6 +1606,26 @@ fold_const_call_ssss (real_value *result > CASE_CFN_FMA_FN: > return do_mpfr_arg3 (result, mpfr_fma, arg0, arg1, arg2, format); > + case CFN_FMS: > + { > + real_value new_arg2 = real_value_negate (arg2); > + return do_mpfr_arg3 (result, mpfr_fma, arg0, arg1, &new_arg2, format); > + } > + > + case CFN_FNMA: > + { > + real_value new_arg0 = 
real_value_negate (arg0); > + return do_mpfr_arg3 (result, mpfr_fma, &new_arg0, arg1, arg2, format); > + } > + > + case CFN_FNMS: > + { > + real_value new_arg0 = real_value_negate (arg0); > + real_value new_arg2 = real_value_negate (arg2); > + return do_mpfr_arg3 (result, mpfr_fma, &new_arg0, arg1, > + &new_arg2, format); > + } > + > default: > return false; > } > @@ -1719,20 +1739,3 @@ fold_const_call (combined_fn fn, tree ty > return fold_const_call_1 (fn, type, arg0, arg1, arg2); > } > } > - > -/* Fold a fma operation with arguments ARG[012]. */ > - > -tree > -fold_fma (location_t, tree type, tree arg0, tree arg1, tree arg2) > -{ > - REAL_VALUE_TYPE result; > - if (real_cst_p (arg0) > - && real_cst_p (arg1) > - && real_cst_p (arg2) > - && do_mpfr_arg3 (&result, mpfr_fma, TREE_REAL_CST_PTR (arg0), > - TREE_REAL_CST_PTR (arg1), TREE_REAL_CST_PTR (arg2), > - REAL_MODE_FORMAT (TYPE_MODE (type)))) > - return build_real (type, result); > - > - return NULL_TREE; > -} > Index: gcc/genmatch.c > =================================================================== > --- gcc/genmatch.c 2018-03-01 08:20:43.846526310 +0000 > +++ gcc/genmatch.c 2018-05-11 18:08:24.835947371 +0100 > @@ -241,6 +241,20 @@ enum internal_fn { > IFN_LAST > }; > +enum combined_fn { > +#define DEF_BUILTIN(ENUM, N, C, T, LT, B, F, NA, AT, IM, COND) \ > + CFN_##ENUM = int (ENUM), > +#include "builtins.def" > + > +#define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) \ > + CFN_##CODE = int (END_BUILTINS) + int (IFN_##CODE), > +#include "internal-fn.def" > + > + CFN_LAST > +}; > + > +#include "case-cfn-macros.h" > + > /* Return true if CODE represents a commutative tree code. Otherwise > return false. */ > bool > @@ -288,7 +302,6 @@ commutative_ternary_tree_code (enum tree > case WIDEN_MULT_PLUS_EXPR: > case WIDEN_MULT_MINUS_EXPR: > case DOT_PROD_EXPR: > - case FMA_EXPR: > return true; > default: > @@ -450,6 +463,44 @@ is_a_helper <user_id *>::test (id_base * > return id->kind == id_base::USER; > } > +/* If ID has a pair of consecutive, commutative operands, return the > + index of the first, otherwise return -1. */ > + > +static int > +commutative_op (id_base *id) > +{ > + if (operator_id *code = dyn_cast <operator_id *> (id)) > + { > + if (commutative_tree_code (code->code) > + || commutative_ternary_tree_code (code->code)) > + return 0; > + return -1; > + } > + if (fn_id *fn = dyn_cast <fn_id *> (id)) > + switch (fn->fn) > + { > + CASE_CFN_FMA: > + case CFN_FMS: > + case CFN_FNMA: > + case CFN_FNMS: > + return 0; > + > + default: > + return -1; > + } > + if (user_id *uid = dyn_cast<user_id *> (id)) > + { > + int res = commutative_op (uid->substitutes[0]); > + if (res < 0) > + return 0; > + for (unsigned i = 1; i < uid->substitutes.length (); ++i) > + if (res != commutative_op (uid->substitutes[i])) > + return -1; > + return res; > + } > + return -1; > +} > + > /* Add a predicate identifier to the hash. */ > static predicate_id * > @@ -946,6 +997,9 @@ commutate (operand *op, vec<vec<user_id > if (!e->is_commutative) > return ret; > + /* The operation is always binary if it isn't inherently commutative. */ > + int natural_opno = commutative_op (e->operation); > + unsigned int opno = natural_opno >= 0 ? 
natural_opno : 0; > for (unsigned i = 0; i < result.length (); ++i) > { > expr *ne = new expr (e); > @@ -994,9 +1048,11 @@ commutate (operand *op, vec<vec<user_id > } > } > ne->is_commutative = false; > - // result[i].length () is 2 since e->operation is binary > - for (unsigned j = result[i].length (); j; --j) > - ne->append_op (result[i][j-1]); > + for (unsigned j = 0; j < result[i].length (); ++j) > + { > + int old_j = (j == opno ? opno + 1 : j == opno + 1 ? opno : j); > + ne->append_op (result[i][old_j]); > + } > ret.safe_push (ne); > } > @@ -2759,24 +2815,18 @@ dt_operand::gen_gimple_expr (FILE *f, in > /* While the toplevel operands are canonicalized by the caller > after valueizing operands of sub-expressions we have to > re-canonicalize operand order. */ > - if (operator_id *code = dyn_cast <operator_id *> (id)) > + int opno = commutative_op (id); > + if (opno >= 0) > { > - /* ??? We can't canonicalize tcc_comparison operands here > - because that requires changing the comparison code which > - we already matched... */ > - if (commutative_tree_code (code->code) > - || commutative_ternary_tree_code (code->code)) > - { > - char child_opname0[20], child_opname1[20]; > - gen_opname (child_opname0, 0); > - gen_opname (child_opname1, 1); > - fprintf_indent (f, indent, > - "if (tree_swap_operands_p (%s, %s))\n", > - child_opname0, child_opname1); > - fprintf_indent (f, indent, > - " std::swap (%s, %s);\n", > - child_opname0, child_opname1); > - } > + char child_opname0[20], child_opname1[20]; > + gen_opname (child_opname0, opno); > + gen_opname (child_opname1, opno + 1); > + fprintf_indent (f, indent, > + "if (tree_swap_operands_p (%s, %s))\n", > + child_opname0, child_opname1); > + fprintf_indent (f, indent, > + " std::swap (%s, %s);\n", > + child_opname0, child_opname1); > } > return n_braces; > @@ -4217,11 +4267,14 @@ parser::parse_expr () > e->operation->id, e->operation->nargs, e->ops.length ()); > if (is_commutative) > { > - if (e->ops.length () == 2) > + if (e->ops.length () == 2 > + || commutative_op (e->operation) >= 0) > e->is_commutative = true; > else > - fatal_at (token, "only binary operators or function with " > - "two arguments can be marked commutative"); > + fatal_at (token, "only binary operators or functions with " > + "two arguments can be marked commutative, " > + "unless the operation is known to be inherently " > + "commutative"); > } > e->expr_type = expr_type; > return op; > Index: gcc/gimple-ssa-backprop.c > =================================================================== > --- gcc/gimple-ssa-backprop.c 2018-01-03 11:12:59.113629694 +0000 > +++ gcc/gimple-ssa-backprop.c 2018-05-11 18:08:24.836947330 +0100 > @@ -375,6 +375,9 @@ backprop::process_builtin_call_use (gcal > CASE_CFN_FMA: > CASE_CFN_FMA_FN: > + case CFN_FMS: > + case CFN_FNMA: > + case CFN_FNMS: > /* In X * X + Y, where Y is distinct from X, the sign of X doesn't > matter. */ > if (gimple_call_arg (call, 0) == rhs > @@ -420,15 +423,6 @@ backprop::process_assign_use (gassign *a > } > break; > - case FMA_EXPR: > - /* In X * X + Y, where Y is distinct from X, the sign of X doesn't > - matter. */ > - if (gimple_assign_rhs1 (assign) == rhs > - && gimple_assign_rhs2 (assign) == rhs > - && gimple_assign_rhs3 (assign) != rhs) > - info->flags.ignore_sign = true; > - break; > - > case MULT_EXPR: > /* In X * X, the sign of X doesn't matter. 
*/ > if (gimple_assign_rhs1 (assign) == rhs > Index: gcc/hsa-gen.c > =================================================================== > --- gcc/hsa-gen.c 2018-03-17 08:30:21.230924973 +0000 > +++ gcc/hsa-gen.c 2018-05-11 18:08:24.841947124 +0100 > @@ -3178,23 +3178,6 @@ gen_hsa_insns_for_operation_assignment ( > case NEGATE_EXPR: > opcode = BRIG_OPCODE_NEG; > break; > - case FMA_EXPR: > - /* There is a native HSA instruction for scalar FMAs but not for vector > - ones. */ > - if (TREE_CODE (TREE_TYPE (lhs)) == VECTOR_TYPE) > - { > - hsa_op_reg *dest > - = hsa_cfun->reg_for_gimple_ssa (gimple_assign_lhs (assign)); > - hsa_op_with_type *op1 = hsa_reg_or_immed_for_gimple_op (rhs1, hbb); > - hsa_op_with_type *op2 = hsa_reg_or_immed_for_gimple_op (rhs2, hbb); > - hsa_op_with_type *op3 = hsa_reg_or_immed_for_gimple_op (rhs3, hbb); > - hsa_op_reg *tmp = new hsa_op_reg (dest->m_type); > - gen_hsa_binary_operation (BRIG_OPCODE_MUL, tmp, op1, op2, hbb); > - gen_hsa_binary_operation (BRIG_OPCODE_ADD, dest, tmp, op3, hbb); > - return; > - } > - opcode = BRIG_OPCODE_MAD; > - break; > case MIN_EXPR: > opcode = BRIG_OPCODE_MIN; > break; > @@ -4490,6 +4473,57 @@ gen_hsa_divmod (gcall *call, hsa_bb *hbb > insn->set_output_in_type (dest, 0, hbb); > } > +/* Emit instructions that implement FMA, FMS, FNMA or FNMS call STMT. > + Instructions are appended to basic block HBB. NEGATE1 is true for > + FNMA and FNMS. NEGATE3 is true for FMS and FNMS. */ > + > +static void > +gen_hsa_fma (gcall *call, hsa_bb *hbb, bool negate1, bool negate3) > +{ > + tree lhs = gimple_call_lhs (call); > + if (lhs == NULL_TREE) > + return; > + > + tree rhs1 = gimple_call_arg (call, 0); > + tree rhs2 = gimple_call_arg (call, 1); > + tree rhs3 = gimple_call_arg (call, 2); > + > + hsa_op_reg *dest = hsa_cfun->reg_for_gimple_ssa (lhs); > + hsa_op_with_type *op1 = hsa_reg_or_immed_for_gimple_op (rhs1, hbb); > + hsa_op_with_type *op2 = hsa_reg_or_immed_for_gimple_op (rhs2, hbb); > + hsa_op_with_type *op3 = hsa_reg_or_immed_for_gimple_op (rhs3, hbb); > + > + if (negate1) > + { > + hsa_op_reg *tmp = new hsa_op_reg (dest->m_type); > + gen_hsa_unary_operation (BRIG_OPCODE_NEG, tmp, op1, hbb); > + op1 = tmp; > + } > + > + /* There is a native HSA instruction for scalar FMAs but not for vector > + ones. */ > + if (TREE_CODE (TREE_TYPE (lhs)) == VECTOR_TYPE) > + { > + hsa_op_reg *tmp = new hsa_op_reg (dest->m_type); > + gen_hsa_binary_operation (BRIG_OPCODE_MUL, tmp, op1, op2, hbb); > + gen_hsa_binary_operation (negate3 ? BRIG_OPCODE_SUB : BRIG_OPCODE_ADD, > + dest, tmp, op3, hbb); > + } > + else > + { > + if (negate3) > + { > + hsa_op_reg *tmp = new hsa_op_reg (dest->m_type); > + gen_hsa_unary_operation (BRIG_OPCODE_NEG, tmp, op3, hbb); > + op3 = tmp; > + } > + hsa_insn_basic *insn = new hsa_insn_basic (4, BRIG_OPCODE_MAD, > + dest->m_type, dest, > + op1, op2, op3); > + hbb->append_insn (insn); > + } > +} > + > /* Set VALUE to a shadow kernel debug argument and append a new instruction > to HBB basic block. 
*/ > @@ -5224,6 +5258,22 @@ gen_hsa_insn_for_internal_fn_call (gcall > gen_hsa_insns_for_call_of_internal_fn (stmt, hbb); > break; > + case IFN_FMA: > + gen_hsa_fma (stmt, hbb, false, false); > + break; > + > + case IFN_FMS: > + gen_hsa_fma (stmt, hbb, false, true); > + break; > + > + case IFN_FNMA: > + gen_hsa_fma (stmt, hbb, true, false); > + break; > + > + case IFN_FNMS: > + gen_hsa_fma (stmt, hbb, true, true); > + break; > + > default: > HSA_SORRY_ATV (gimple_location (stmt), > "support for HSA does not implement internal function: %s", > Index: gcc/match.pd > =================================================================== > --- gcc/match.pd 2018-05-02 08:39:32.882737077 +0100 > +++ gcc/match.pd 2018-05-11 18:08:24.842947083 +0100 > @@ -4702,3 +4702,60 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > || wi::geu_p (wi::to_wide (@rpos), > wi::to_wide (@ipos) + isize)) > (BIT_FIELD_REF @0 @rsize @rpos))))) > + > +(for fmas (FMA) > + (simplify > + (fmas:c (negate @0) @1 @2) > + (IFN_FNMA @0 @1 @2)) > + (simplify > + (fmas @0 @1 (negate @2)) > + (IFN_FMS @0 @1 @2)) > + (simplify > + (fmas:c (negate @0) @1 (negate @2)) > + (IFN_FNMS @0 @1 @2)) > + (simplify > + (negate (fmas@3 @0 @1 @2)) > + (if (single_use (@3)) > + (IFN_FNMS @0 @1 @2)))) > + > +(simplify > + (IFN_FMS:c (negate @0) @1 @2) > + (IFN_FNMS @0 @1 @2)) > +(simplify > + (IFN_FMS @0 @1 (negate @2)) > + (IFN_FMA @0 @1 @2)) > +(simplify > + (IFN_FMS:c (negate @0) @1 (negate @2)) > + (IFN_FNMA @0 @1 @2)) > +(simplify > + (negate (IFN_FMS@3 @0 @1 @2)) > + (if (single_use (@3)) > + (IFN_FNMA @0 @1 @2))) > + > +(simplify > + (IFN_FNMA:c (negate @0) @1 @2) > + (IFN_FMA @0 @1 @2)) > +(simplify > + (IFN_FNMA @0 @1 (negate @2)) > + (IFN_FNMS @0 @1 @2)) > +(simplify > + (IFN_FNMA:c (negate @0) @1 (negate @2)) > + (IFN_FMS @0 @1 @2)) > +(simplify > + (negate (IFN_FNMA@3 @0 @1 @2)) > + (if (single_use (@3)) > + (IFN_FMS @0 @1 @2))) > + > +(simplify > + (IFN_FNMS:c (negate @0) @1 @2) > + (IFN_FMS @0 @1 @2)) > +(simplify > + (IFN_FNMS @0 @1 (negate @2)) > + (IFN_FNMA @0 @1 @2)) > +(simplify > + (IFN_FNMS:c (negate @0) @1 (negate @2)) > + (IFN_FMA @0 @1 @2)) > +(simplify > + (negate (IFN_FNMS@3 @0 @1 @2)) > + (if (single_use (@3)) > + (IFN_FMA @0 @1 @2))) > Index: gcc/tree-ssa-math-opts.c > =================================================================== > --- gcc/tree-ssa-math-opts.c 2018-01-12 14:45:51.037434575 +0000 > +++ gcc/tree-ssa-math-opts.c 2018-05-11 18:08:24.850946754 +0100 > @@ -2640,6 +2640,14 @@ convert_plusminus_to_widen (gimple_stmt_ > return true; > } > +/* gimple_fold callback that "valueizes" everything. */ > + > +static tree > +aggressive_valueize (tree val) > +{ > + return val; > +} > + > /* Given a result MUL_RESULT which is a result of a multiplication of OP1 and > OP2 and which we know is used in statements that can be, together with the > multiplication, converted to FMAs, perform the transformation. 
*/ > @@ -2650,7 +2658,7 @@ convert_mult_to_fma_1 (tree mul_result, > tree type = TREE_TYPE (mul_result); > gimple *use_stmt; > imm_use_iterator imm_iter; > - gassign *fma_stmt; > + gcall *fma_stmt; > FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, mul_result) > { > @@ -2658,6 +2666,7 @@ convert_mult_to_fma_1 (tree mul_result, > enum tree_code use_code; > tree addop, mulop1 = op1, result = mul_result; > bool negate_p = false; > + gimple_seq seq = NULL; > if (is_gimple_debug (use_stmt)) > continue; > @@ -2683,11 +2692,7 @@ convert_mult_to_fma_1 (tree mul_result, > addop = gimple_assign_rhs2 (use_stmt); > /* a * b - c -> a * b + (-c) */ > if (gimple_assign_rhs_code (use_stmt) == MINUS_EXPR) > - addop = force_gimple_operand_gsi (&gsi, > - build1 (NEGATE_EXPR, > - type, addop), > - true, NULL_TREE, true, > - GSI_SAME_STMT); > + addop = gimple_build (&seq, NEGATE_EXPR, type, addop); > } > else > { > @@ -2698,23 +2703,26 @@ convert_mult_to_fma_1 (tree mul_result, > } > if (negate_p) > - mulop1 = force_gimple_operand_gsi (&gsi, > - build1 (NEGATE_EXPR, > - type, mulop1), > - true, NULL_TREE, true, > - GSI_SAME_STMT); > + mulop1 = gimple_build (&seq, NEGATE_EXPR, type, mulop1); > - fma_stmt = gimple_build_assign (gimple_assign_lhs (use_stmt), > - FMA_EXPR, mulop1, op2, addop); > + if (seq) > + gsi_insert_seq_before (&gsi, seq, GSI_SAME_STMT); > + fma_stmt = gimple_build_call_internal (IFN_FMA, 3, mulop1, op2, addop); > + gimple_call_set_lhs (fma_stmt, gimple_assign_lhs (use_stmt)); > + gimple_call_set_nothrow (fma_stmt, !stmt_can_throw_internal (use_stmt)); > + gsi_replace (&gsi, fma_stmt, true); > + /* Valueize aggressively so that we generate FMS, FNMA and FNMS > + regardless of where the negation occurs. */ > + if (fold_stmt (&gsi, aggressive_valueize)) > + update_stmt (gsi_stmt (gsi)); I think it would be nice to be able to use gimple_build () with IFNs so you can gimple_build () the IFN and then use gsi_replace_with_seq () on it. You only need to fold with generated negates, not with negates already in the IL? The folding implied by gimple_build will take care of it. Otherwise can you please move aggressive_valueize to gimple-fold.[ch] alongside no_follow_ssa_edges / follow_single_use_edges and maybe rename it as follow_all_ssa_edges? Otherwise looks good to me. I guess I cannot convince you to add IFN parsing support to the GIMPLE FE? ;) Thanks, Richard. > if (dump_file && (dump_flags & TDF_DETAILS)) > { > fprintf (dump_file, "Generated FMA "); > - print_gimple_stmt (dump_file, fma_stmt, 0, 0); > + print_gimple_stmt (dump_file, gsi_stmt (gsi), 0, 0); > fprintf (dump_file, "\n"); > } > - gsi_replace (&gsi, fma_stmt, true); > widen_mul_stats.fmas_inserted++; > } > } > @@ -2862,7 +2870,8 @@ convert_mult_to_fma (gimple *mul_stmt, t > /* If the target doesn't support it, don't generate it. We assume that > if fma isn't available then fms, fnma or fnms are not either. */ > - if (optab_handler (fma_optab, TYPE_MODE (type)) == CODE_FOR_nothing) > + optimization_type opt_type = bb_optimization_type (gimple_bb (mul_stmt)); > + if (!direct_internal_fn_supported_p (IFN_FMA, type, opt_type)) > return false; > /* If the multiplication has zero uses, it is kept around probably because > @@ -2958,8 +2967,8 @@ convert_mult_to_fma (gimple *mul_stmt, t > that a mult / subtract pair. 
*/ > if (use_code == MINUS_EXPR && !negate_p > && gimple_assign_rhs1 (use_stmt) == result > - && optab_handler (fms_optab, TYPE_MODE (type)) == CODE_FOR_nothing > - && optab_handler (fnma_optab, TYPE_MODE (type)) != CODE_FOR_nothing) > + && !direct_internal_fn_supported_p (IFN_FMS, type, opt_type) > + && direct_internal_fn_supported_p (IFN_FNMA, type, opt_type)) > { > tree rhs2 = gimple_assign_rhs2 (use_stmt); > Index: gcc/config/i386/i386.c > =================================================================== > --- gcc/config/i386/i386.c 2018-05-09 11:34:41.204789431 +0100 > +++ gcc/config/i386/i386.c 2018-05-11 18:08:24.825947782 +0100 > @@ -50516,19 +50516,20 @@ ix86_add_stmt_cost (void *data, int coun > tree vectype = stmt_info ? stmt_vectype (stmt_info) : NULL_TREE; > int stmt_cost = - 1; > + bool fp = false; > + machine_mode mode = TImode; > + > + if (vectype != NULL) > + { > + fp = FLOAT_TYPE_P (vectype); > + mode = TYPE_MODE (vectype); > + } > + > if ((kind == vector_stmt || kind == scalar_stmt) > && stmt_info > && stmt_info->stmt && gimple_code (stmt_info->stmt) == GIMPLE_ASSIGN) > { > tree_code subcode = gimple_assign_rhs_code (stmt_info->stmt); > - bool fp = false; > - machine_mode mode = TImode; > - > - if (vectype != NULL) > - { > - fp = FLOAT_TYPE_P (vectype); > - mode = TYPE_MODE (vectype); > - } > /*machine_mode inner_mode = mode; > if (VECTOR_MODE_P (mode)) > inner_mode = GET_MODE_INNER (mode);*/ > @@ -50559,12 +50560,6 @@ ix86_add_stmt_cost (void *data, int coun > case MULT_HIGHPART_EXPR: > stmt_cost = ix86_multiplication_cost (ix86_cost, mode); > break; > - case FMA_EXPR: > - stmt_cost = ix86_vec_cost (mode, > - mode == SFmode ? ix86_cost->fmass > - : ix86_cost->fmasd, > - true); > - break; > case NEGATE_EXPR: > if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH) > stmt_cost = ix86_cost->sse_op; > @@ -50627,6 +50622,24 @@ ix86_add_stmt_cost (void *data, int coun > break; > } > } > + > + combined_fn cfn; > + if ((kind == vector_stmt || kind == scalar_stmt) > + && stmt_info > + && stmt_info->stmt > + && (cfn = gimple_call_combined_fn (stmt_info->stmt)) != CFN_LAST) > + switch (cfn) > + { > + case CFN_FMA: > + stmt_cost = ix86_vec_cost (mode, > + mode == SFmode ? ix86_cost->fmass > + : ix86_cost->fmasd, > + true); > + break; > + default: > + break; > + } > + > /* If we do elementwise loads into a vector then we are bound by > latency and execution resources for the many scalar loads > (AGU and load ports). 
Try to account for this by scaling the > Index: gcc/config/rs6000/rs6000.c > =================================================================== > --- gcc/config/rs6000/rs6000.c 2018-05-08 09:42:03.510648702 +0100 > +++ gcc/config/rs6000/rs6000.c 2018-05-11 18:08:24.830947576 +0100 > @@ -16025,7 +16025,9 @@ rs6000_gimple_fold_builtin (gimple_stmt_ > arg1 = gimple_call_arg (stmt, 1); > tree arg2 = gimple_call_arg (stmt, 2); > lhs = gimple_call_lhs (stmt); > - gimple *g = gimple_build_assign (lhs, FMA_EXPR, arg0, arg1, arg2); > + gcall *g = gimple_build_call_internal (IFN_FMA, 3, arg0, arg1, arg2); > + gimple_call_set_lhs (g, lhs); > + gimple_call_set_nothrow (g, true); > gimple_set_location (g, gimple_location (stmt)); > gsi_replace (gsi, g, true); > return true; > Index: gcc/brig/brigfrontend/brig-function.cc > =================================================================== > --- gcc/brig/brigfrontend/brig-function.cc 2018-05-08 09:42:01.419725462 +0100 > +++ gcc/brig/brigfrontend/brig-function.cc 2018-05-11 18:08:24.817948111 +0100 > @@ -1218,6 +1218,7 @@ brig_function::get_builtin_for_hsa_opcod > case BRIG_OPCODE_NEXP2: > builtin = mathfn_built_in (builtin_type, BUILT_IN_EXP2); > break; > + case BRIG_OPCODE_FMA: > case BRIG_OPCODE_NFMA: > builtin = mathfn_built_in (builtin_type, BUILT_IN_FMA); > break; > @@ -1460,8 +1461,6 @@ brig_function::get_tree_code_for_hsa_opc > return CALL_EXPR; > else > return MAX_EXPR; > - case BRIG_OPCODE_FMA: > - return FMA_EXPR; > case BRIG_OPCODE_ABS: > return ABS_EXPR; > case BRIG_OPCODE_SHL: > @@ -1496,6 +1495,7 @@ brig_function::get_tree_code_for_hsa_opc > /* Implement as 1/f (x). gcc should pattern detect that and > use a native instruction, if available, for it. */ > return TREE_LIST; > + case BRIG_OPCODE_FMA: > case BRIG_OPCODE_FLOOR: > case BRIG_OPCODE_CEIL: > case BRIG_OPCODE_SQRT: > Index: gcc/c/gimple-parser.c > =================================================================== > --- gcc/c/gimple-parser.c 2018-01-03 11:12:56.269741723 +0000 > +++ gcc/c/gimple-parser.c 2018-05-11 18:08:24.817948111 +0100 > @@ -903,27 +903,6 @@ c_parser_gimple_postfix_expression (c_pa > expr.value = fold_convert (type, val); > return expr; > } > - else if (strcmp (IDENTIFIER_POINTER (id), "__FMA") == 0) > - { > - c_parser_consume_token (parser); > - auto_vec<tree> args; > - > - if (c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>")) > - { > - c_parser_gimple_expr_list (parser, &args); > - c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, > - "expected %<)%>"); > - } > - if (args.length () != 3) > - { > - error_at (loc, "invalid number of operands to __FMA"); > - expr.value = error_mark_node; > - return expr; > - } > - expr.value = build3_loc (loc, FMA_EXPR, TREE_TYPE (args[0]), > - args[0], args[1], args[2]); > - return expr; > - } > /* SSA name. 
*/ > unsigned version, ver_offset; > Index: gcc/cp/constexpr.c > =================================================================== > --- gcc/cp/constexpr.c 2018-05-01 19:30:30.973597315 +0100 > +++ gcc/cp/constexpr.c 2018-05-11 18:08:24.830947576 +0100 > @@ -4573,7 +4573,6 @@ cxx_eval_constant_expression (const cons > non_constant_p, overflow_p); > break; > - case FMA_EXPR: > case VEC_PERM_EXPR: > r = cxx_eval_trinary_expression (ctx, t, lval, > non_constant_p, overflow_p); > @@ -5998,7 +5997,6 @@ #define RECUR(T,RV) \ > return false; > return true; > - case FMA_EXPR: > case VEC_PERM_EXPR: > for (i = 0; i < 3; ++i) > if (!RECUR (TREE_OPERAND (t, i), true)) > Index: gcc/testsuite/lib/target-supports.exp > =================================================================== > --- gcc/testsuite/lib/target-supports.exp 2018-05-08 09:42:01.256731446 +0100 > +++ gcc/testsuite/lib/target-supports.exp 2018-05-11 18:08:24.848946836 +0100 > @@ -2879,6 +2879,13 @@ proc check_effective_target_base_quadflo > return 1 > } > +# Return 1 if the target supports all four forms of fused multiply-add > +# (fma, fms, fnma, and fnms) for both float and double. > + > +proc check_effective_target_all_scalar_fma { } { > + return [istarget aarch64*-*-*] > +} > + > # Return 1 if the target supports compiling fixed-point, > # 0 otherwise. > Index: gcc/testsuite/gcc.dg/fma-1.c > =================================================================== > --- /dev/null 2018-04-20 16:19:46.369131350 +0100 > +++ gcc/testsuite/gcc.dg/fma-1.c 2018-05-11 18:08:24.844947000 +0100 > @@ -0,0 +1,15 @@ > +/* { dg-options "-O2 -fdump-tree-widening_mul" } */ > + > +float > +f1 (float a, float b, float c) > +{ > + return a * b + c; > +} > + > +double > +f2 (double a, double b, double c) > +{ > + return a * b + c; > +} > + > +/* { dg-final { scan-tree-dump-times { = FMA \(} 2 "widening_mul" { target all_scalar_fma } } } */ > Index: gcc/testsuite/gcc.dg/fma-2.c > =================================================================== > --- /dev/null 2018-04-20 16:19:46.369131350 +0100 > +++ gcc/testsuite/gcc.dg/fma-2.c 2018-05-11 18:08:24.844947000 +0100 > @@ -0,0 +1,15 @@ > +/* { dg-options "-O2 -fdump-tree-widening_mul" } */ > + > +float > +f1 (float a, float b, float c) > +{ > + return a * b - c; > +} > + > +double > +f2 (double a, double b, double c) > +{ > + return a * b - c; > +} > + > +/* { dg-final { scan-tree-dump-times { = FMS \(} 2 "widening_mul" { target all_scalar_fma } } } */ > Index: gcc/testsuite/gcc.dg/fma-3.c > =================================================================== > --- /dev/null 2018-04-20 16:19:46.369131350 +0100 > +++ gcc/testsuite/gcc.dg/fma-3.c 2018-05-11 18:08:24.845946959 +0100 > @@ -0,0 +1,15 @@ > +/* { dg-options "-O2 -fdump-tree-widening_mul" } */ > + > +float > +f1 (float a, float b, float c) > +{ > + return c - a * b; > +} > + > +double > +f2 (double a, double b, double c) > +{ > + return c - a * b; > +} > + > +/* { dg-final { scan-tree-dump-times { = FNMA \(} 2 "widening_mul" { target all_scalar_fma } } } */ > Index: gcc/testsuite/gcc.dg/fma-4.c > =================================================================== > --- /dev/null 2018-04-20 16:19:46.369131350 +0100 > +++ gcc/testsuite/gcc.dg/fma-4.c 2018-05-11 18:08:24.845946959 +0100 > @@ -0,0 +1,15 @@ > +/* { dg-options "-O2 -fdump-tree-widening_mul" } */ > + > +float > +f1 (float a, float b, float c) > +{ > + return -(a * b) - c; > +} > + > +double > +f2 (double a, double b, double c) > +{ > + return -(a * b) - c; > +} > + > +/* { dg-final { 
scan-tree-dump-times { = FNMS \(} 2 "widening_mul" { target all_scalar_fma } } } */ > Index: gcc/testsuite/gcc.dg/fma-5.c > =================================================================== > --- /dev/null 2018-04-20 16:19:46.369131350 +0100 > +++ gcc/testsuite/gcc.dg/fma-5.c 2018-05-11 18:08:24.845946959 +0100 > @@ -0,0 +1,53 @@ > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > + > +float > +f1 (float a, float b, float c) > +{ > + return __builtin_fmaf (a, b, -c); > +} > + > +double > +f2 (double a, double b, double c) > +{ > + return __builtin_fma (a, b, -c); > +} > + > +void > +f3 (float a, float b, float c, float d, float e, float *res) > +{ > + res[0] = __builtin_fmaf (a, b, -e); > + res[1] = __builtin_fmaf (c, d, -e); > +} > + > +void > +f4 (double a, double b, double c, double d, double e, double *res) > +{ > + res[0] = __builtin_fma (a, b, -e); > + res[1] = __builtin_fma (c, d, -e); > +} > + > +float > +f5 (float a, float b, float c) > +{ > + return -__builtin_fmaf (-a, b, c); > +} > + > +double > +f6 (double a, double b, double c) > +{ > + return -__builtin_fma (-a, b, c); > +} > + > +float > +f7 (float a, float b, float c) > +{ > + return -__builtin_fmaf (a, -b, c); > +} > + > +double > +f8 (double a, double b, double c) > +{ > + return -__builtin_fma (a, -b, c); > +} > + > +/* { dg-final { scan-tree-dump-times { = FMS \(} 10 "optimized" { target all_scalar_fma } } } */ > Index: gcc/testsuite/gcc.dg/fma-6.c > =================================================================== > --- /dev/null 2018-04-20 16:19:46.369131350 +0100 > +++ gcc/testsuite/gcc.dg/fma-6.c 2018-05-11 18:08:24.845946959 +0100 > @@ -0,0 +1,67 @@ > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > + > +float > +f1 (float a, float b, float c) > +{ > + return __builtin_fmaf (-a, b, c); > +} > + > +double > +f2 (double a, double b, double c) > +{ > + return __builtin_fma (-a, b, c); > +} > + > +float > +f3 (float a, float b, float c) > +{ > + return __builtin_fmaf (a, -b, c); > +} > + > +double > +f4 (double a, double b, double c) > +{ > + return __builtin_fma (a, -b, c); > +} > + > +void > +f5 (float a, float b, float c, float d, float e, float *res) > +{ > + res[0] = __builtin_fmaf (-a, b, c); > + res[1] = __builtin_fmaf (-a, d, e); > +} > + > +void > +f6 (double a, double b, double c, double d, double e, double *res) > +{ > + res[0] = __builtin_fma (-a, b, c); > + res[1] = __builtin_fma (-a, d, e); > +} > + > +void > +f7 (float a, float b, float c, float d, float e, float *res) > +{ > + res[0] = __builtin_fmaf (a, -b, c); > + res[1] = __builtin_fmaf (d, -b, e); > +} > + > +void > +f8 (double a, double b, double c, double d, double e, double *res) > +{ > + res[0] = __builtin_fma (a, -b, c); > + res[1] = __builtin_fma (d, -b, e); > +} > + > +float > +f9 (float a, float b, float c) > +{ > + return -__builtin_fmaf (a, b, -c); > +} > + > +double > +f10 (double a, double b, double c) > +{ > + return -__builtin_fma (a, b, -c); > +} > + > +/* { dg-final { scan-tree-dump-times { = FNMA \(} 14 "optimized" { target all_scalar_fma } } } */ > Index: gcc/testsuite/gcc.dg/fma-7.c > =================================================================== > --- /dev/null 2018-04-20 16:19:46.369131350 +0100 > +++ gcc/testsuite/gcc.dg/fma-7.c 2018-05-11 18:08:24.845946959 +0100 > @@ -0,0 +1,67 @@ > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > + > +float > +f1 (float a, float b, float c) > +{ > + return __builtin_fmaf (-a, b, -c); > +} > + > +double > +f2 (double a, double b, double c) > +{ > + return __builtin_fma (-a, 
b, -c); > +} > + > +float > +f3 (float a, float b, float c) > +{ > + return __builtin_fmaf (a, -b, -c); > +} > + > +double > +f4 (double a, double b, double c) > +{ > + return __builtin_fma (a, -b, -c); > +} > + > +void > +f5 (float a, float b, float c, float d, float *res) > +{ > + res[0] = __builtin_fmaf (-a, b, -c); > + res[1] = __builtin_fmaf (-a, d, -c); > +} > + > +void > +f6 (double a, double b, double c, double d, double *res) > +{ > + res[0] = __builtin_fma (-a, b, -c); > + res[1] = __builtin_fma (-a, d, -c); > +} > + > +void > +f7 (float a, float b, float c, float d, float *res) > +{ > + res[0] = __builtin_fmaf (a, -b, -c); > + res[1] = __builtin_fmaf (d, -b, -c); > +} > + > +void > +f8 (double a, double b, double c, double d, double *res) > +{ > + res[0] = __builtin_fma (a, -b, -c); > + res[1] = __builtin_fma (d, -b, -c); > +} > + > +float > +f9 (float a, float b, float c) > +{ > + return -__builtin_fmaf (a, b, c); > +} > + > +double > +f10 (double a, double b, double c) > +{ > + return -__builtin_fma (a, b, c); > +} > + > +/* { dg-final { scan-tree-dump-times { = FNMS \(} 14 "optimized" { target all_scalar_fma } } } */ > Index: gcc/testsuite/gcc.dg/gimplefe-26.c > =================================================================== > --- gcc/testsuite/gcc.dg/gimplefe-26.c 2017-02-23 19:54:08.000000000 +0000 > +++ /dev/null 2018-04-20 16:19:46.369131350 +0100 > @@ -1,16 +0,0 @@ > -/* { dg-do compile { target c99_runtime } } */ > -/* { dg-options "-O -fgimple -fdump-tree-ssa-gimple" } */ > - > -#define foo(type, num) \ > -type __GIMPLE () foo_##num (type a, type b, type c) \ > -{ \ > - type t0; \ > - t0_1 = __FMA (a, b, c); \ > - return t0_1; \ > -} > - > -foo(float, 1) > -foo(double, 2) > -foo(long double, 3) > - > -/* { dg-final { scan-tree-dump-times "__FMA" 3 "ssa" } } */ > Index: gcc/testsuite/gfortran.dg/reassoc_7.f > =================================================================== > --- gcc/testsuite/gfortran.dg/reassoc_7.f 2015-06-02 23:52:46.000000000 +0100 > +++ gcc/testsuite/gfortran.dg/reassoc_7.f 2018-05-11 18:08:24.846946918 +0100 > @@ -1,5 +1,5 @@ > ! { dg-do compile } > -! { dg-options "-O3 -ffast-math -fdump-tree-optimized" } > +! { dg-options "-O3 -ffast-math -ffp-contract=off -fdump-tree-optimized" } > SUBROUTINE S55199(P,Dvdph) > implicit none > Index: gcc/testsuite/gfortran.dg/reassoc_8.f > =================================================================== > --- gcc/testsuite/gfortran.dg/reassoc_8.f 2015-06-02 23:52:46.000000000 +0100 > +++ gcc/testsuite/gfortran.dg/reassoc_8.f 2018-05-11 18:08:24.846946918 +0100 > @@ -1,5 +1,5 @@ > ! { dg-do compile } > -! { dg-options "-O3 -ffast-math -fdump-tree-optimized" } > +! { dg-options "-O3 -ffast-math -ffp-contract=off -fdump-tree-optimized" } > SUBROUTINE S55199(P,Dvdph) > implicit none > Index: gcc/testsuite/gfortran.dg/reassoc_9.f > =================================================================== > --- gcc/testsuite/gfortran.dg/reassoc_9.f 2015-06-02 23:52:48.000000000 +0100 > +++ gcc/testsuite/gfortran.dg/reassoc_9.f 2018-05-11 18:08:24.847946877 +0100 > @@ -1,5 +1,5 @@ > ! { dg-do compile } > -! { dg-options "-O3 -ffast-math -fdump-tree-optimized" } > +! 
{ dg-options "-O3 -ffast-math -ffp-contract=off -fdump-tree-optimized" } > SUBROUTINE S55199(P,Dvdph) > implicit none > Index: gcc/testsuite/gfortran.dg/reassoc_10.f > =================================================================== > --- gcc/testsuite/gfortran.dg/reassoc_10.f 2015-06-02 23:52:44.000000000 +0100 > +++ gcc/testsuite/gfortran.dg/reassoc_10.f 2018-05-11 18:08:24.846946918 +0100 > @@ -1,5 +1,5 @@ > ! { dg-do compile } > -! { dg-options "-O3 -ffast-math -fdump-tree-optimized" } > +! { dg-options "-O3 -ffast-math -ffp-contract=off -fdump-tree-optimized" } > SUBROUTINE S55199(P,Q,Dvdph) > implicit none
Hi, On Fri, May 11 2018, Richard Sandiford wrote: > There are four optabs for various forms of fused multiply-add: > fma, fms, fnma and fnms. Of these, only fma had a direct gimple > representation. For the other three we relied on special pattern- > matching during expand, although tree-ssa-math-opts.c did have > some code to try to second-guess what expand would do. > > This patch removes the old FMA_EXPR representation of fma and > introduces four new internal functions, one for each optab. > IFN_FMA is tied to BUILT_IN_FMA* while the other three are > independent directly-mapped internal functions. It's then > possible to do the pattern-matching in match.pd and > tree-ssa-math-opts.c (via folding) can select the exact > FMA-based operation. > > The patch removes the gimple FE support for __FMA rather than mapping > it to the internal function. There's no reason now to treat it > differently from other internal functions (although the FE doesn't > handle those yet). > > The BRIG & HSA parts are a best guess, but seem relatively simple. Both parts are OK. Thanks, Martin > > The genmatch.c changes are structured to allow ternary ops in which > the second two rather than the first two operands are commutative. > A later patch makes use of this. > > Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf, > x86_64-linux-gnu and powerpc64le-linux-gnu. OK to install? > > Richard > > > 2018-05-11 Richard Sandiford <richard.sandiford@linaro.org> > > gcc/ > * doc/sourcebuild.texi (all_scalar_fma): Document. > * tree.def (FMA_EXPR): Delete. > * internal-fn.def (FMA, FMS, FNMA, FNMS): New internal functions. > * internal-fn.c (ternary_direct): New macro. > (expand_ternary_optab_fn): Likewise. > (direct_ternary_optab_supported_p): Likewise. > * Makefile.in (build/genmatch.o): Depend on case-fn-macros.h. > * builtins.c (fold_builtin_fma): Delete. > (fold_builtin_3): Don't call it. > * cfgexpand.c (expand_debug_expr): Remove FMA_EXPR handling. > * expr.c (expand_expr_real_2): Likewise. > * fold-const.c (operand_equal_p): Likewise. > (fold_ternary_loc): Likewise. > * gimple-pretty-print.c (dump_ternary_rhs): Likewise. > * gimple.c (DEFTREECODE): Likewise. > * gimplify.c (gimplify_expr): Likewise. > * optabs-tree.c (optab_for_tree_code): Likewise. > * tree-cfg.c (verify_gimple_assign_ternary): Likewise. > * tree-eh.c (operation_could_trap_p): Likewise. > (stmt_could_throw_1_p): Likewise. > * tree-inline.c (estimate_operator_cost): Likewise. > * tree-pretty-print.c (dump_generic_node): Likewise. > (op_code_prio): Likewise. > * tree-ssa-loop-im.c (stmt_cost): Likewise. > * tree-ssa-operands.c (get_expr_operands): Likewise. > * tree.c (commutative_ternary_tree_code, add_expr): Likewise. > * fold-const-call.h (fold_fma): Delete. > * fold-const-call.c (fold_const_call_ssss): Handle CFN_FMS, > CFN_FNMA and CFN_FNMS. > (fold_fma): Delete. > * genmatch.c (combined_fn): New enum. > (commutative_ternary_tree_code): Remove FMA_EXPR handling. > (commutative_op): New function. > (commutate): Use it. Handle more than 2 operands. > (dt_operand::gen_gimple_expr): Use commutative_op. > (parser::parse_expr): Allow :c to be used with non-binary > operators if the commutative operand is known. > * gimple-ssa-backprop.c (backprop::process_builtin_call_use): Handle > CFN_FMS, CFN_FNMA and CFN_FNMS. > (backprop::process_assign_use): Remove FMA_EXPR handling. > * hsa-gen.c (gen_hsa_insns_for_operation_assignment): Likewise. > (gen_hsa_fma): New function. 
> (gen_hsa_insn_for_internal_fn_call): Use it for IFN_FMA, IFN_FMS, > IFN_FNMA and IFN_FNMS. > * match.pd: Add folds for IFN_FMS, IFN_FNMA and IFN_FNMS. > * tree-ssa-math-opts.c (aggressive_valueize): New function. > (convert_mult_to_fma_1): Use the gimple_build interface and use > aggerssive_valueize to fold the result. > (convert_mult_to_fma): Use direct_internal_fn_suppoerted_p > instead of checking for optabs directly. > * config/i386/i386.c (ix86_add_stmt_cost): Recognize FMAs as calls > rather than FMA_EXPRs. > * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Create a > call to IFN_FMA instead of an FMA_EXPR. > > gcc/brig/ > * brigfrontend/brig-function.cc > (brig_function::get_builtin_for_hsa_opcode): Use BUILT_IN_FMA > for BRIG_OPCODE_FMA. > (brig_function::get_tree_code_for_hsa_opcode): Treat BUILT_IN_FMA > as a call. > > gcc/c/ > * gimple-parser.c (c_parser_gimple_postfix_expression): Remove > __FMA_EXPR handlng. > > gcc/cp/ > * constexpr.c (cxx_eval_constant_expression): Remove FMA_EXPR handling. > (potential_constant_expression_1): Likewise. > > gcc/testsuite/ > * lib/target-supports.exp (check_effective_target_all_scalar_fma): > New proc. > * gcc.dg/fma-1.c: New test. > * gcc.dg/fma-2.c: Likewise. > * gcc.dg/fma-3.c: Likewise. > * gcc.dg/fma-4.c: Likewise. > * gcc.dg/fma-5.c: Likewise. > * gcc.dg/fma-6.c: Likewise. > * gcc.dg/fma-7.c: Likewise. > * gcc.dg/gimplefe-26.c: Remove. > * gfortran.dg/reassoc_7.f: Pass -ffp-contract=off. > * gfortran.dg/reassoc_8.f: Likewise. > * gfortran.dg/reassoc_9.f: Likewise. > * gfortran.dg/reassoc_10.f: Likewise. >
Richard Biener <richard.guenther@gmail.com> writes: >> @@ -2698,23 +2703,26 @@ convert_mult_to_fma_1 (tree mul_result, >> } > >> if (negate_p) >> - mulop1 = force_gimple_operand_gsi (&gsi, >> - build1 (NEGATE_EXPR, >> - type, mulop1), >> - true, NULL_TREE, true, >> - GSI_SAME_STMT); >> + mulop1 = gimple_build (&seq, NEGATE_EXPR, type, mulop1); > >> - fma_stmt = gimple_build_assign (gimple_assign_lhs (use_stmt), >> - FMA_EXPR, mulop1, op2, addop); >> + if (seq) >> + gsi_insert_seq_before (&gsi, seq, GSI_SAME_STMT); >> + fma_stmt = gimple_build_call_internal (IFN_FMA, 3, mulop1, op2, > addop); >> + gimple_call_set_lhs (fma_stmt, gimple_assign_lhs (use_stmt)); >> + gimple_call_set_nothrow (fma_stmt, !stmt_can_throw_internal > (use_stmt)); >> + gsi_replace (&gsi, fma_stmt, true); >> + /* Valueize aggressively so that we generate FMS, FNMA and FNMS >> + regardless of where the negation occurs. */ >> + if (fold_stmt (&gsi, aggressive_valueize)) >> + update_stmt (gsi_stmt (gsi)); > > I think it would be nice to be able to use gimple_build () with IFNs so you > can > gimple_build () the IFN and then use gsi_replace_with_seq () on it. You > only need to fold with generated negates, not with negates already in the > IL? > The the folding implied with gimple_build will take care of it. The idea was to pick up existing negates that feed the multiplication as well as any added by the pass itself. On IRC yesterday we talked about how this should handle the ECF_NOTHROW flag, and whether things like IFN_SQRT and IFN_FMA should always be nothrow (like the built-in functions are). But in the end I thought it'd be better to keep things as they are. We already handle -fnon-call-exceptions for unfused a * b + c and before the patch also handled it for FMA_EXPR. It'd seem like a step backwards if the new internal functions didn't handle it too. If anything it seems like the built-in functions should change to be closer to the tree_code and internal_fn way of doing things, if we want to support -fnon-call-exceptions properly. This also surprised me when doing the if-conversion patch I sent yesterday. We're happy to vectorise: for (int i = 0; i < 100; ++i) x[i] = ... ? sqrt (x[i]) : 0; by doing the sqrt unconditionally and selecting on the result, even with the default maths flags, but refuse to vectorise the simpler: for (int i = 0; i < 100; ++i) x[i] = ... ? x[i] + 1 : 0; in the same way. > Otherwise can you please move aggressive_valueize to gimple-fold.[ch] > alongside no_follow_ssa_edges / follow_single_use_edges and maybe > rename it as follow_all_ssa_edges? Ah, yeah, that's definitely a better name. I also renamed all_scalar_fma to scalar_all_fma, since I realised after Andrew's reply that the old name made it sound like it was "all scalars", whereas it meant to mean "all fmas". Tested as before. Thanks, Richard 2018-05-17 Richard Sandiford <richard.sandiford@linaro.org> gcc/ * doc/sourcebuild.texi (scalar_all_fma): Document. * tree.def (FMA_EXPR): Delete. * internal-fn.def (FMA, FMS, FNMA, FNMS): New internal functions. * internal-fn.c (ternary_direct): New macro. (expand_ternary_optab_fn): Likewise. (direct_ternary_optab_supported_p): Likewise. * Makefile.in (build/genmatch.o): Depend on case-fn-macros.h. * builtins.c (fold_builtin_fma): Delete. (fold_builtin_3): Don't call it. * cfgexpand.c (expand_debug_expr): Remove FMA_EXPR handling. * expr.c (expand_expr_real_2): Likewise. * fold-const.c (operand_equal_p): Likewise. (fold_ternary_loc): Likewise. * gimple-pretty-print.c (dump_ternary_rhs): Likewise. 
* gimple.c (DEFTREECODE): Likewise. * gimplify.c (gimplify_expr): Likewise. * optabs-tree.c (optab_for_tree_code): Likewise. * tree-cfg.c (verify_gimple_assign_ternary): Likewise. * tree-eh.c (operation_could_trap_p): Likewise. (stmt_could_throw_1_p): Likewise. * tree-inline.c (estimate_operator_cost): Likewise. * tree-pretty-print.c (dump_generic_node): Likewise. (op_code_prio): Likewise. * tree-ssa-loop-im.c (stmt_cost): Likewise. * tree-ssa-operands.c (get_expr_operands): Likewise. * tree.c (commutative_ternary_tree_code, add_expr): Likewise. * fold-const-call.h (fold_fma): Delete. * fold-const-call.c (fold_const_call_ssss): Handle CFN_FMS, CFN_FNMA and CFN_FNMS. (fold_fma): Delete. * genmatch.c (combined_fn): New enum. (commutative_ternary_tree_code): Remove FMA_EXPR handling. (commutative_op): New function. (commutate): Use it. Handle more than 2 operands. (dt_operand::gen_gimple_expr): Use commutative_op. (parser::parse_expr): Allow :c to be used with non-binary operators if the commutative operand is known. * gimple-ssa-backprop.c (backprop::process_builtin_call_use): Handle CFN_FMS, CFN_FNMA and CFN_FNMS. (backprop::process_assign_use): Remove FMA_EXPR handling. * hsa-gen.c (gen_hsa_insns_for_operation_assignment): Likewise. (gen_hsa_fma): New function. (gen_hsa_insn_for_internal_fn_call): Use it for IFN_FMA, IFN_FMS, IFN_FNMA and IFN_FNMS. * match.pd: Add folds for IFN_FMS, IFN_FNMA and IFN_FNMS. * gimple-fold.h (follow_all_ssa_edges): Declare. * gimple-fold.c (follow_all_ssa_edges): New function. * tree-ssa-math-opts.c (convert_mult_to_fma_1): Use the gimple_build interface and use follow_all_ssa_edges to fold the result. (convert_mult_to_fma): Use direct_internal_fn_suppoerted_p instead of checking for optabs directly. * config/i386/i386.c (ix86_add_stmt_cost): Recognize FMAs as calls rather than FMA_EXPRs. * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Create a call to IFN_FMA instead of an FMA_EXPR. gcc/brig/ * brigfrontend/brig-function.cc (brig_function::get_builtin_for_hsa_opcode): Use BUILT_IN_FMA for BRIG_OPCODE_FMA. (brig_function::get_tree_code_for_hsa_opcode): Treat BUILT_IN_FMA as a call. gcc/c/ * gimple-parser.c (c_parser_gimple_postfix_expression): Remove __FMA_EXPR handlng. gcc/cp/ * constexpr.c (cxx_eval_constant_expression): Remove FMA_EXPR handling. (potential_constant_expression_1): Likewise. gcc/testsuite/ * lib/target-supports.exp (check_effective_target_scalar_all_fma): New proc. * gcc.dg/fma-1.c: New test. * gcc.dg/fma-2.c: Likewise. * gcc.dg/fma-3.c: Likewise. * gcc.dg/fma-4.c: Likewise. * gcc.dg/fma-5.c: Likewise. * gcc.dg/fma-6.c: Likewise. * gcc.dg/fma-7.c: Likewise. * gcc.dg/gimplefe-26.c: Use .FMA instead of __FMA and require scalar_all_fma. * gfortran.dg/reassoc_7.f: Pass -ffp-contract=off. * gfortran.dg/reassoc_8.f: Likewise. * gfortran.dg/reassoc_9.f: Likewise. * gfortran.dg/reassoc_10.f: Likewise. Index: gcc/doc/sourcebuild.texi =================================================================== --- gcc/doc/sourcebuild.texi 2018-05-16 12:48:59.410941892 +0100 +++ gcc/doc/sourcebuild.texi 2018-05-17 09:18:19.954942948 +0100 @@ -2251,6 +2251,11 @@ Target supports @option{-pie}, @option{- @item rdynamic Target supports @option{-rdynamic}. +@item scalar_all_fma +Target supports all four fused multiply-add optabs for both @code{float} +and @code{double}. These optabs are: @code{fma_optab}, @code{fms_optab}, +@code{fnma_optab} and @code{fnms_optab}. + @item section_anchors Target supports section anchors. 
Index: gcc/tree.def =================================================================== --- gcc/tree.def 2018-05-16 12:48:59.410941892 +0100 +++ gcc/tree.def 2018-05-17 09:18:19.987942174 +0100 @@ -1345,12 +1345,6 @@ DEFTREECODE (WIDEN_MULT_MINUS_EXPR, "wid by the second argument. */ DEFTREECODE (WIDEN_LSHIFT_EXPR, "widen_lshift_expr", tcc_binary, 2) -/* Fused multiply-add. - All operands and the result are of the same type. No intermediate - rounding is performed after multiplying operand one with operand two - before adding operand three. */ -DEFTREECODE (FMA_EXPR, "fma_expr", tcc_expression, 3) - /* Widening vector multiplication. The two operands are vectors with N elements of size S. Multiplying the elements of the two vectors will result in N products of size 2*S. Index: gcc/internal-fn.def =================================================================== --- gcc/internal-fn.def 2018-05-16 12:48:59.410941892 +0100 +++ gcc/internal-fn.def 2018-05-17 09:18:19.974942479 +0100 @@ -57,6 +57,7 @@ along with GCC; see the file COPYING3. - unary: a normal unary optab, such as vec_reverse_<mode> - binary: a normal binary optab, such as vec_interleave_lo_<mode> + - ternary: a normal ternary optab, such as fma<mode>4 - cond_binary: a conditional binary optab, such as add<mode>cc @@ -138,6 +139,10 @@ DEF_INTERNAL_OPTAB_FN (WHILE_ULT, ECF_CO DEF_INTERNAL_OPTAB_FN (VEC_SHL_INSERT, ECF_CONST | ECF_NOTHROW, vec_shl_insert, binary) +DEF_INTERNAL_OPTAB_FN (FMS, ECF_CONST, fms, ternary) +DEF_INTERNAL_OPTAB_FN (FNMA, ECF_CONST, fnma, ternary) +DEF_INTERNAL_OPTAB_FN (FNMS, ECF_CONST, fnms, ternary) + DEF_INTERNAL_OPTAB_FN (COND_ADD, ECF_CONST, cond_add, cond_binary) DEF_INTERNAL_OPTAB_FN (COND_SUB, ECF_CONST, cond_sub, cond_binary) DEF_INTERNAL_SIGNED_OPTAB_FN (COND_MIN, ECF_CONST, first, @@ -218,6 +223,9 @@ DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONS /* FP scales. */ DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary) +/* Ternary math functions. */ +DEF_INTERNAL_FLT_FN (FMA, ECF_CONST, fma, ternary) + /* Unary integer ops. 
*/ DEF_INTERNAL_INT_FN (CLRSB, ECF_CONST | ECF_NOTHROW, clrsb, unary) DEF_INTERNAL_INT_FN (CLZ, ECF_CONST | ECF_NOTHROW, clz, unary) Index: gcc/internal-fn.c =================================================================== --- gcc/internal-fn.c 2018-05-17 09:17:58.757608747 +0100 +++ gcc/internal-fn.c 2018-05-17 09:18:19.974942479 +0100 @@ -110,6 +110,7 @@ #define mask_store_lanes_direct { 0, 0, #define scatter_store_direct { 3, 3, false } #define unary_direct { 0, 0, true } #define binary_direct { 0, 0, true } +#define ternary_direct { 0, 0, true } #define cond_unary_direct { 1, 1, true } #define cond_binary_direct { 1, 1, true } #define while_direct { 0, 2, false } @@ -2982,6 +2983,9 @@ #define expand_unary_optab_fn(FN, STMT, #define expand_binary_optab_fn(FN, STMT, OPTAB) \ expand_direct_optab_fn (FN, STMT, OPTAB, 2) +#define expand_ternary_optab_fn(FN, STMT, OPTAB) \ + expand_direct_optab_fn (FN, STMT, OPTAB, 3) + #define expand_cond_unary_optab_fn(FN, STMT, OPTAB) \ expand_direct_optab_fn (FN, STMT, OPTAB, 2) @@ -3067,6 +3071,7 @@ multi_vector_optab_supported_p (convert_ #define direct_unary_optab_supported_p direct_optab_supported_p #define direct_binary_optab_supported_p direct_optab_supported_p +#define direct_ternary_optab_supported_p direct_optab_supported_p #define direct_cond_unary_optab_supported_p direct_optab_supported_p #define direct_cond_binary_optab_supported_p direct_optab_supported_p #define direct_mask_load_optab_supported_p direct_optab_supported_p Index: gcc/Makefile.in =================================================================== --- gcc/Makefile.in 2018-05-16 12:49:10.495216143 +0100 +++ gcc/Makefile.in 2018-05-17 09:18:19.936943371 +0100 @@ -2787,7 +2787,7 @@ build/genmddump.o : genmddump.c $(RTL_BA $(CORETYPES_H) $(GTM_H) errors.h $(READ_MD_H) $(GENSUPPORT_H) build/genmatch.o : genmatch.c $(BCONFIG_H) $(SYSTEM_H) \ $(CORETYPES_H) errors.h $(HASH_TABLE_H) hash-map.h $(GGC_H) is-a.h \ - tree.def builtins.def internal-fn.def + tree.def builtins.def internal-fn.def case-cfn-macros.h build/gencfn-macros.o : gencfn-macros.c $(BCONFIG_H) $(SYSTEM_H) \ $(CORETYPES_H) errors.h $(HASH_TABLE_H) hash-set.h builtins.def \ internal-fn.def Index: gcc/builtins.c =================================================================== --- gcc/builtins.c 2018-05-16 12:48:59.410941892 +0100 +++ gcc/builtins.c 2018-05-17 09:18:19.939943300 +0100 @@ -8340,21 +8340,6 @@ fold_builtin_abs (location_t loc, tree a return fold_build1_loc (loc, ABS_EXPR, type, arg); } -/* Fold a call to fma, fmaf, or fmal with arguments ARG[012]. */ - -static tree -fold_builtin_fma (location_t loc, tree arg0, tree arg1, tree arg2, tree type) -{ - /* ??? Only expand to FMA_EXPR if it's directly supported. */ - if (validate_arg (arg0, REAL_TYPE) - && validate_arg (arg1, REAL_TYPE) - && validate_arg (arg2, REAL_TYPE) - && optab_handler (fma_optab, TYPE_MODE (type)) != CODE_FOR_nothing) - return fold_build3_loc (loc, FMA_EXPR, type, arg0, arg1, arg2); - - return NULL_TREE; -} - /* Fold a call to builtin carg(a+bi) -> atan2(b,a). 
*/ static tree @@ -9260,10 +9245,6 @@ fold_builtin_3 (location_t loc, tree fnd CASE_FLT_FN (BUILT_IN_SINCOS): return fold_builtin_sincos (loc, arg0, arg1, arg2); - CASE_FLT_FN (BUILT_IN_FMA): - CASE_FLT_FN_FLOATN_NX (BUILT_IN_FMA): - return fold_builtin_fma (loc, arg0, arg1, arg2, type); - CASE_FLT_FN (BUILT_IN_REMQUO): if (validate_arg (arg0, REAL_TYPE) && validate_arg (arg1, REAL_TYPE) Index: gcc/cfgexpand.c =================================================================== --- gcc/cfgexpand.c 2018-05-16 12:48:59.410941892 +0100 +++ gcc/cfgexpand.c 2018-05-17 09:18:19.941943253 +0100 @@ -4202,7 +4202,6 @@ expand_debug_expr (tree exp) case SAD_EXPR: case WIDEN_MULT_PLUS_EXPR: case WIDEN_MULT_MINUS_EXPR: - case FMA_EXPR: goto ternary; case TRUTH_ANDIF_EXPR: @@ -5190,9 +5189,6 @@ expand_debug_expr (tree exp) } return NULL; - case FMA_EXPR: - return simplify_gen_ternary (FMA, mode, inner_mode, op0, op1, op2); - default: flag_unsupported: if (flag_checking) Index: gcc/expr.c =================================================================== --- gcc/expr.c 2018-05-16 12:48:59.410941892 +0100 +++ gcc/expr.c 2018-05-17 09:18:19.957942878 +0100 @@ -8853,67 +8853,6 @@ #define REDUCE_BIT_FIELD(expr) (reduce_b expand_operands (treeop0, treeop1, subtarget, &op0, &op1, EXPAND_NORMAL); return REDUCE_BIT_FIELD (expand_mult (mode, op0, op1, target, unsignedp)); - case FMA_EXPR: - { - optab opt = fma_optab; - gimple *def0, *def2; - - /* If there is no insn for FMA, emit it as __builtin_fma{,f,l} - call. */ - if (optab_handler (fma_optab, mode) == CODE_FOR_nothing) - { - tree fn = mathfn_built_in (TREE_TYPE (treeop0), BUILT_IN_FMA); - tree call_expr; - - gcc_assert (fn != NULL_TREE); - call_expr = build_call_expr (fn, 3, treeop0, treeop1, treeop2); - return expand_builtin (call_expr, target, subtarget, mode, false); - } - - def0 = get_def_for_expr (treeop0, NEGATE_EXPR); - /* The multiplication is commutative - look at its 2nd operand - if the first isn't fed by a negate. */ - if (!def0) - { - def0 = get_def_for_expr (treeop1, NEGATE_EXPR); - /* Swap operands if the 2nd operand is fed by a negate. 
*/ - if (def0) - std::swap (treeop0, treeop1); - } - def2 = get_def_for_expr (treeop2, NEGATE_EXPR); - - op0 = op2 = NULL; - - if (def0 && def2 - && optab_handler (fnms_optab, mode) != CODE_FOR_nothing) - { - opt = fnms_optab; - op0 = expand_normal (gimple_assign_rhs1 (def0)); - op2 = expand_normal (gimple_assign_rhs1 (def2)); - } - else if (def0 - && optab_handler (fnma_optab, mode) != CODE_FOR_nothing) - { - opt = fnma_optab; - op0 = expand_normal (gimple_assign_rhs1 (def0)); - } - else if (def2 - && optab_handler (fms_optab, mode) != CODE_FOR_nothing) - { - opt = fms_optab; - op2 = expand_normal (gimple_assign_rhs1 (def2)); - } - - if (op0 == NULL) - op0 = expand_expr (treeop0, subtarget, VOIDmode, EXPAND_NORMAL); - if (op2 == NULL) - op2 = expand_normal (treeop2); - op1 = expand_normal (treeop1); - - return expand_ternary_op (TYPE_MODE (type), opt, - op0, op1, op2, target, 0); - } - case MULT_EXPR: /* If this is a fixed-point operation, then we cannot use the code below because "expand_mult" doesn't support sat/no-sat fixed-point Index: gcc/fold-const.c =================================================================== --- gcc/fold-const.c 2018-05-16 12:48:59.410941892 +0100 +++ gcc/fold-const.c 2018-05-17 09:18:19.959942831 +0100 @@ -3297,7 +3297,6 @@ #define OP_SAME_WITH_NULL(N) \ case TRUTH_ORIF_EXPR: return OP_SAME (0) && OP_SAME (1); - case FMA_EXPR: case WIDEN_MULT_PLUS_EXPR: case WIDEN_MULT_MINUS_EXPR: if (!OP_SAME (2)) @@ -11708,17 +11707,6 @@ fold_ternary_loc (location_t loc, enum t return NULL_TREE; - case FMA_EXPR: - /* For integers we can decompose the FMA if possible. */ - if (TREE_CODE (arg0) == INTEGER_CST - && TREE_CODE (arg1) == INTEGER_CST) - return fold_build2_loc (loc, PLUS_EXPR, type, - const_binop (MULT_EXPR, arg0, arg1), arg2); - if (integer_zerop (arg2)) - return fold_build2_loc (loc, MULT_EXPR, type, arg0, arg1); - - return fold_fma (loc, type, arg0, arg1, arg2); - case VEC_PERM_EXPR: if (TREE_CODE (arg2) == VECTOR_CST) { Index: gcc/gimple-pretty-print.c =================================================================== --- gcc/gimple-pretty-print.c 2018-05-17 09:17:58.756608780 +0100 +++ gcc/gimple-pretty-print.c 2018-05-17 09:18:19.963942737 +0100 @@ -490,27 +490,6 @@ dump_ternary_rhs (pretty_printer *buffer pp_greater (buffer); break; - case FMA_EXPR: - if (flags & TDF_GIMPLE) - { - pp_string (buffer, "__FMA ("); - dump_generic_node (buffer, gimple_assign_rhs1 (gs), spc, flags, false); - pp_comma (buffer); - dump_generic_node (buffer, gimple_assign_rhs2 (gs), spc, flags, false); - pp_comma (buffer); - dump_generic_node (buffer, gimple_assign_rhs3 (gs), spc, flags, false); - pp_right_paren (buffer); - } - else - { - dump_generic_node (buffer, gimple_assign_rhs1 (gs), spc, flags, false); - pp_string (buffer, " * "); - dump_generic_node (buffer, gimple_assign_rhs2 (gs), spc, flags, false); - pp_string (buffer, " + "); - dump_generic_node (buffer, gimple_assign_rhs3 (gs), spc, flags, false); - } - break; - case DOT_PROD_EXPR: pp_string (buffer, "DOT_PROD_EXPR <"); dump_generic_node (buffer, gimple_assign_rhs1 (gs), spc, flags, false); Index: gcc/gimple.c =================================================================== --- gcc/gimple.c 2018-05-17 09:17:58.756608780 +0100 +++ gcc/gimple.c 2018-05-17 09:18:19.966942667 +0100 @@ -2150,8 +2150,7 @@ #define DEFTREECODE(SYM, STRING, TYPE, N || (SYM) == REALIGN_LOAD_EXPR \ || (SYM) == VEC_COND_EXPR \ || (SYM) == VEC_PERM_EXPR \ - || (SYM) == BIT_INSERT_EXPR \ - || (SYM) == FMA_EXPR) ? 
GIMPLE_TERNARY_RHS \ + || (SYM) == BIT_INSERT_EXPR) ? GIMPLE_TERNARY_RHS \ : ((SYM) == CONSTRUCTOR \ || (SYM) == OBJ_TYPE_REF \ || (SYM) == ASSERT_EXPR \ Index: gcc/gimplify.c =================================================================== --- gcc/gimplify.c 2018-05-16 12:48:59.410941892 +0100 +++ gcc/gimplify.c 2018-05-17 09:18:19.972942526 +0100 @@ -12086,7 +12086,6 @@ gimplify_expr (tree *expr_p, gimple_seq } break; - case FMA_EXPR: case VEC_PERM_EXPR: /* Classified as tcc_expression. */ goto expr_3; Index: gcc/optabs-tree.c =================================================================== --- gcc/optabs-tree.c 2018-05-16 12:48:59.410941892 +0100 +++ gcc/optabs-tree.c 2018-05-17 09:18:19.975942456 +0100 @@ -143,9 +143,6 @@ optab_for_tree_code (enum tree_code code : (TYPE_SATURATING (type) ? ssmsub_widen_optab : smsub_widen_optab)); - case FMA_EXPR: - return fma_optab; - case VEC_WIDEN_MULT_HI_EXPR: return TYPE_UNSIGNED (type) ? vec_widen_umult_hi_optab : vec_widen_smult_hi_optab; Index: gcc/tree-cfg.c =================================================================== --- gcc/tree-cfg.c 2018-05-16 12:49:10.497382733 +0100 +++ gcc/tree-cfg.c 2018-05-17 09:18:19.979942362 +0100 @@ -4109,20 +4109,6 @@ verify_gimple_assign_ternary (gassign *s } break; - case FMA_EXPR: - if (!useless_type_conversion_p (lhs_type, rhs1_type) - || !useless_type_conversion_p (lhs_type, rhs2_type) - || !useless_type_conversion_p (lhs_type, rhs3_type)) - { - error ("type mismatch in fused multiply-add expression"); - debug_generic_expr (lhs_type); - debug_generic_expr (rhs1_type); - debug_generic_expr (rhs2_type); - debug_generic_expr (rhs3_type); - return true; - } - break; - case VEC_COND_EXPR: if (!VECTOR_BOOLEAN_TYPE_P (rhs1_type) || maybe_ne (TYPE_VECTOR_SUBPARTS (rhs1_type), Index: gcc/tree-eh.c =================================================================== --- gcc/tree-eh.c 2018-05-16 12:48:59.410941892 +0100 +++ gcc/tree-eh.c 2018-05-17 09:18:19.980942338 +0100 @@ -2512,8 +2512,7 @@ operation_could_trap_p (enum tree_code o if (TREE_CODE_CLASS (op) != tcc_comparison && TREE_CODE_CLASS (op) != tcc_unary - && TREE_CODE_CLASS (op) != tcc_binary - && op != FMA_EXPR) + && TREE_CODE_CLASS (op) != tcc_binary) return false; return operation_could_trap_helper_p (op, fp_operation, honor_trapv, @@ -2825,8 +2824,7 @@ stmt_could_throw_1_p (gassign *stmt) if (TREE_CODE_CLASS (code) == tcc_comparison || TREE_CODE_CLASS (code) == tcc_unary - || TREE_CODE_CLASS (code) == tcc_binary - || code == FMA_EXPR) + || TREE_CODE_CLASS (code) == tcc_binary) { if (TREE_CODE_CLASS (code) == tcc_comparison) t = TREE_TYPE (gimple_assign_rhs1 (stmt)); Index: gcc/tree-inline.c =================================================================== --- gcc/tree-inline.c 2018-05-16 12:48:59.410941892 +0100 +++ gcc/tree-inline.c 2018-05-17 09:18:19.980942338 +0100 @@ -3855,7 +3855,6 @@ estimate_operator_cost (enum tree_code c case MINUS_EXPR: case MULT_EXPR: case MULT_HIGHPART_EXPR: - case FMA_EXPR: case ADDR_SPACE_CONVERT_EXPR: case FIXED_CONVERT_EXPR: Index: gcc/tree-pretty-print.c =================================================================== --- gcc/tree-pretty-print.c 2018-05-17 09:17:58.758608713 +0100 +++ gcc/tree-pretty-print.c 2018-05-17 09:18:19.981942315 +0100 @@ -2904,16 +2904,6 @@ dump_generic_node (pretty_printer *pp, t pp_string (pp, " > "); break; - case FMA_EXPR: - pp_string (pp, " FMA_EXPR < "); - dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false); - pp_string (pp, ", "); - dump_generic_node (pp, 
TREE_OPERAND (node, 1), spc, flags, false); - pp_string (pp, ", "); - dump_generic_node (pp, TREE_OPERAND (node, 2), spc, flags, false); - pp_string (pp, " > "); - break; - case OACC_PARALLEL: pp_string (pp, "#pragma acc parallel"); goto dump_omp_clauses_body; @@ -3552,7 +3542,6 @@ op_code_prio (enum tree_code code) case CEIL_MOD_EXPR: case FLOOR_MOD_EXPR: case ROUND_MOD_EXPR: - case FMA_EXPR: return 13; case TRUTH_NOT_EXPR: Index: gcc/tree-ssa-loop-im.c =================================================================== --- gcc/tree-ssa-loop-im.c 2018-05-16 12:48:59.410941892 +0100 +++ gcc/tree-ssa-loop-im.c 2018-05-17 09:18:19.981942315 +0100 @@ -493,7 +493,6 @@ stmt_cost (gimple *stmt) case WIDEN_MULT_PLUS_EXPR: case WIDEN_MULT_MINUS_EXPR: case DOT_PROD_EXPR: - case FMA_EXPR: case TRUNC_DIV_EXPR: case CEIL_DIV_EXPR: case FLOOR_DIV_EXPR: Index: gcc/tree-ssa-operands.c =================================================================== --- gcc/tree-ssa-operands.c 2018-05-16 12:48:59.410941892 +0100 +++ gcc/tree-ssa-operands.c 2018-05-17 09:18:19.982942291 +0100 @@ -849,7 +849,6 @@ get_expr_operands (struct function *fn, case REALIGN_LOAD_EXPR: case WIDEN_MULT_PLUS_EXPR: case WIDEN_MULT_MINUS_EXPR: - case FMA_EXPR: { get_expr_operands (fn, stmt, &TREE_OPERAND (expr, 0), flags); get_expr_operands (fn, stmt, &TREE_OPERAND (expr, 1), flags); Index: gcc/tree.c =================================================================== --- gcc/tree.c 2018-05-16 12:48:59.410941892 +0100 +++ gcc/tree.c 2018-05-17 09:18:19.986942197 +0100 @@ -7171,7 +7171,6 @@ commutative_ternary_tree_code (enum tree case WIDEN_MULT_PLUS_EXPR: case WIDEN_MULT_MINUS_EXPR: case DOT_PROD_EXPR: - case FMA_EXPR: return true; default: @@ -7457,7 +7456,6 @@ add_expr (const_tree t, inchash::hash &h flags &= ~OEP_ADDRESS_OF; break; - case FMA_EXPR: case WIDEN_MULT_PLUS_EXPR: case WIDEN_MULT_MINUS_EXPR: { Index: gcc/fold-const-call.h =================================================================== --- gcc/fold-const-call.h 2018-05-16 12:48:59.410941892 +0100 +++ gcc/fold-const-call.h 2018-05-17 09:18:19.957942878 +0100 @@ -23,7 +23,6 @@ #define GCC_FOLD_CONST_CALL_H tree fold_const_call (combined_fn, tree, tree); tree fold_const_call (combined_fn, tree, tree, tree); tree fold_const_call (combined_fn, tree, tree, tree, tree); -tree fold_fma (location_t, tree, tree, tree, tree); tree build_cmp_result (tree type, int res); #endif Index: gcc/fold-const-call.c =================================================================== --- gcc/fold-const-call.c 2018-05-16 12:48:59.410941892 +0100 +++ gcc/fold-const-call.c 2018-05-17 09:18:19.957942878 +0100 @@ -1606,6 +1606,26 @@ fold_const_call_ssss (real_value *result CASE_CFN_FMA_FN: return do_mpfr_arg3 (result, mpfr_fma, arg0, arg1, arg2, format); + case CFN_FMS: + { + real_value new_arg2 = real_value_negate (arg2); + return do_mpfr_arg3 (result, mpfr_fma, arg0, arg1, &new_arg2, format); + } + + case CFN_FNMA: + { + real_value new_arg0 = real_value_negate (arg0); + return do_mpfr_arg3 (result, mpfr_fma, &new_arg0, arg1, arg2, format); + } + + case CFN_FNMS: + { + real_value new_arg0 = real_value_negate (arg0); + real_value new_arg2 = real_value_negate (arg2); + return do_mpfr_arg3 (result, mpfr_fma, &new_arg0, arg1, + &new_arg2, format); + } + default: return false; } @@ -1719,20 +1739,3 @@ fold_const_call (combined_fn fn, tree ty return fold_const_call_1 (fn, type, arg0, arg1, arg2); } } - -/* Fold a fma operation with arguments ARG[012]. 
*/ - -tree -fold_fma (location_t, tree type, tree arg0, tree arg1, tree arg2) -{ - REAL_VALUE_TYPE result; - if (real_cst_p (arg0) - && real_cst_p (arg1) - && real_cst_p (arg2) - && do_mpfr_arg3 (&result, mpfr_fma, TREE_REAL_CST_PTR (arg0), - TREE_REAL_CST_PTR (arg1), TREE_REAL_CST_PTR (arg2), - REAL_MODE_FORMAT (TYPE_MODE (type)))) - return build_real (type, result); - - return NULL_TREE; -} Index: gcc/genmatch.c =================================================================== --- gcc/genmatch.c 2018-05-16 12:48:59.410941892 +0100 +++ gcc/genmatch.c 2018-05-17 09:18:19.960942808 +0100 @@ -241,6 +241,20 @@ enum internal_fn { IFN_LAST }; +enum combined_fn { +#define DEF_BUILTIN(ENUM, N, C, T, LT, B, F, NA, AT, IM, COND) \ + CFN_##ENUM = int (ENUM), +#include "builtins.def" + +#define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) \ + CFN_##CODE = int (END_BUILTINS) + int (IFN_##CODE), +#include "internal-fn.def" + + CFN_LAST +}; + +#include "case-cfn-macros.h" + /* Return true if CODE represents a commutative tree code. Otherwise return false. */ bool @@ -288,7 +302,6 @@ commutative_ternary_tree_code (enum tree case WIDEN_MULT_PLUS_EXPR: case WIDEN_MULT_MINUS_EXPR: case DOT_PROD_EXPR: - case FMA_EXPR: return true; default: @@ -450,6 +463,44 @@ is_a_helper <user_id *>::test (id_base * return id->kind == id_base::USER; } +/* If ID has a pair of consecutive, commutative operands, return the + index of the first, otherwise return -1. */ + +static int +commutative_op (id_base *id) +{ + if (operator_id *code = dyn_cast <operator_id *> (id)) + { + if (commutative_tree_code (code->code) + || commutative_ternary_tree_code (code->code)) + return 0; + return -1; + } + if (fn_id *fn = dyn_cast <fn_id *> (id)) + switch (fn->fn) + { + CASE_CFN_FMA: + case CFN_FMS: + case CFN_FNMA: + case CFN_FNMS: + return 0; + + default: + return -1; + } + if (user_id *uid = dyn_cast<user_id *> (id)) + { + int res = commutative_op (uid->substitutes[0]); + if (res < 0) + return 0; + for (unsigned i = 1; i < uid->substitutes.length (); ++i) + if (res != commutative_op (uid->substitutes[i])) + return -1; + return res; + } + return -1; +} + /* Add a predicate identifier to the hash. */ static predicate_id * @@ -946,6 +997,9 @@ commutate (operand *op, vec<vec<user_id if (!e->is_commutative) return ret; + /* The operation is always binary if it isn't inherently commutative. */ + int natural_opno = commutative_op (e->operation); + unsigned int opno = natural_opno >= 0 ? natural_opno : 0; for (unsigned i = 0; i < result.length (); ++i) { expr *ne = new expr (e); @@ -994,9 +1048,11 @@ commutate (operand *op, vec<vec<user_id } } ne->is_commutative = false; - // result[i].length () is 2 since e->operation is binary - for (unsigned j = result[i].length (); j; --j) - ne->append_op (result[i][j-1]); + for (unsigned j = 0; j < result[i].length (); ++j) + { + int old_j = (j == opno ? opno + 1 : j == opno + 1 ? opno : j); + ne->append_op (result[i][old_j]); + } ret.safe_push (ne); } @@ -2759,24 +2815,18 @@ dt_operand::gen_gimple_expr (FILE *f, in /* While the toplevel operands are canonicalized by the caller after valueizing operands of sub-expressions we have to re-canonicalize operand order. */ - if (operator_id *code = dyn_cast <operator_id *> (id)) + int opno = commutative_op (id); + if (opno >= 0) { - /* ??? We can't canonicalize tcc_comparison operands here - because that requires changing the comparison code which - we already matched... 
*/ - if (commutative_tree_code (code->code) - || commutative_ternary_tree_code (code->code)) - { - char child_opname0[20], child_opname1[20]; - gen_opname (child_opname0, 0); - gen_opname (child_opname1, 1); - fprintf_indent (f, indent, - "if (tree_swap_operands_p (%s, %s))\n", - child_opname0, child_opname1); - fprintf_indent (f, indent, - " std::swap (%s, %s);\n", - child_opname0, child_opname1); - } + char child_opname0[20], child_opname1[20]; + gen_opname (child_opname0, opno); + gen_opname (child_opname1, opno + 1); + fprintf_indent (f, indent, + "if (tree_swap_operands_p (%s, %s))\n", + child_opname0, child_opname1); + fprintf_indent (f, indent, + " std::swap (%s, %s);\n", + child_opname0, child_opname1); } return n_braces; @@ -4217,11 +4267,14 @@ parser::parse_expr () e->operation->id, e->operation->nargs, e->ops.length ()); if (is_commutative) { - if (e->ops.length () == 2) + if (e->ops.length () == 2 + || commutative_op (e->operation) >= 0) e->is_commutative = true; else - fatal_at (token, "only binary operators or function with " - "two arguments can be marked commutative"); + fatal_at (token, "only binary operators or functions with " + "two arguments can be marked commutative, " + "unless the operation is known to be inherently " + "commutative"); } e->expr_type = expr_type; return op; Index: gcc/gimple-ssa-backprop.c =================================================================== --- gcc/gimple-ssa-backprop.c 2018-05-16 12:48:59.410941892 +0100 +++ gcc/gimple-ssa-backprop.c 2018-05-17 09:18:19.963942737 +0100 @@ -375,6 +375,9 @@ backprop::process_builtin_call_use (gcal CASE_CFN_FMA: CASE_CFN_FMA_FN: + case CFN_FMS: + case CFN_FNMA: + case CFN_FNMS: /* In X * X + Y, where Y is distinct from X, the sign of X doesn't matter. */ if (gimple_call_arg (call, 0) == rhs @@ -420,15 +423,6 @@ backprop::process_assign_use (gassign *a } break; - case FMA_EXPR: - /* In X * X + Y, where Y is distinct from X, the sign of X doesn't - matter. */ - if (gimple_assign_rhs1 (assign) == rhs - && gimple_assign_rhs2 (assign) == rhs - && gimple_assign_rhs3 (assign) != rhs) - info->flags.ignore_sign = true; - break; - case MULT_EXPR: /* In X * X, the sign of X doesn't matter. */ if (gimple_assign_rhs1 (assign) == rhs Index: gcc/hsa-gen.c =================================================================== --- gcc/hsa-gen.c 2018-05-16 12:48:59.410941892 +0100 +++ gcc/hsa-gen.c 2018-05-17 09:18:19.973942503 +0100 @@ -3178,23 +3178,6 @@ gen_hsa_insns_for_operation_assignment ( case NEGATE_EXPR: opcode = BRIG_OPCODE_NEG; break; - case FMA_EXPR: - /* There is a native HSA instruction for scalar FMAs but not for vector - ones. */ - if (TREE_CODE (TREE_TYPE (lhs)) == VECTOR_TYPE) - { - hsa_op_reg *dest - = hsa_cfun->reg_for_gimple_ssa (gimple_assign_lhs (assign)); - hsa_op_with_type *op1 = hsa_reg_or_immed_for_gimple_op (rhs1, hbb); - hsa_op_with_type *op2 = hsa_reg_or_immed_for_gimple_op (rhs2, hbb); - hsa_op_with_type *op3 = hsa_reg_or_immed_for_gimple_op (rhs3, hbb); - hsa_op_reg *tmp = new hsa_op_reg (dest->m_type); - gen_hsa_binary_operation (BRIG_OPCODE_MUL, tmp, op1, op2, hbb); - gen_hsa_binary_operation (BRIG_OPCODE_ADD, dest, tmp, op3, hbb); - return; - } - opcode = BRIG_OPCODE_MAD; - break; case MIN_EXPR: opcode = BRIG_OPCODE_MIN; break; @@ -4490,6 +4473,57 @@ gen_hsa_divmod (gcall *call, hsa_bb *hbb insn->set_output_in_type (dest, 0, hbb); } +/* Emit instructions that implement FMA, FMS, FNMA or FNMS call STMT. + Instructions are appended to basic block HBB. NEGATE1 is true for + FNMA and FNMS. 
NEGATE3 is true for FMS and FNMS. */ + +static void +gen_hsa_fma (gcall *call, hsa_bb *hbb, bool negate1, bool negate3) +{ + tree lhs = gimple_call_lhs (call); + if (lhs == NULL_TREE) + return; + + tree rhs1 = gimple_call_arg (call, 0); + tree rhs2 = gimple_call_arg (call, 1); + tree rhs3 = gimple_call_arg (call, 2); + + hsa_op_reg *dest = hsa_cfun->reg_for_gimple_ssa (lhs); + hsa_op_with_type *op1 = hsa_reg_or_immed_for_gimple_op (rhs1, hbb); + hsa_op_with_type *op2 = hsa_reg_or_immed_for_gimple_op (rhs2, hbb); + hsa_op_with_type *op3 = hsa_reg_or_immed_for_gimple_op (rhs3, hbb); + + if (negate1) + { + hsa_op_reg *tmp = new hsa_op_reg (dest->m_type); + gen_hsa_unary_operation (BRIG_OPCODE_NEG, tmp, op1, hbb); + op1 = tmp; + } + + /* There is a native HSA instruction for scalar FMAs but not for vector + ones. */ + if (TREE_CODE (TREE_TYPE (lhs)) == VECTOR_TYPE) + { + hsa_op_reg *tmp = new hsa_op_reg (dest->m_type); + gen_hsa_binary_operation (BRIG_OPCODE_MUL, tmp, op1, op2, hbb); + gen_hsa_binary_operation (negate3 ? BRIG_OPCODE_SUB : BRIG_OPCODE_ADD, + dest, tmp, op3, hbb); + } + else + { + if (negate3) + { + hsa_op_reg *tmp = new hsa_op_reg (dest->m_type); + gen_hsa_unary_operation (BRIG_OPCODE_NEG, tmp, op3, hbb); + op3 = tmp; + } + hsa_insn_basic *insn = new hsa_insn_basic (4, BRIG_OPCODE_MAD, + dest->m_type, dest, + op1, op2, op3); + hbb->append_insn (insn); + } +} + /* Set VALUE to a shadow kernel debug argument and append a new instruction to HBB basic block. */ @@ -5224,6 +5258,22 @@ gen_hsa_insn_for_internal_fn_call (gcall gen_hsa_insns_for_call_of_internal_fn (stmt, hbb); break; + case IFN_FMA: + gen_hsa_fma (stmt, hbb, false, false); + break; + + case IFN_FMS: + gen_hsa_fma (stmt, hbb, false, true); + break; + + case IFN_FNMA: + gen_hsa_fma (stmt, hbb, true, false); + break; + + case IFN_FNMS: + gen_hsa_fma (stmt, hbb, true, true); + break; + default: HSA_SORRY_ATV (gimple_location (stmt), "support for HSA does not implement internal function: %s", Index: gcc/match.pd =================================================================== --- gcc/match.pd 2018-05-16 12:48:59.410941892 +0100 +++ gcc/match.pd 2018-05-17 09:18:19.974942479 +0100 @@ -4702,3 +4702,60 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) || wi::geu_p (wi::to_wide (@rpos), wi::to_wide (@ipos) + isize)) (BIT_FIELD_REF @0 @rsize @rpos))))) + +(for fmas (FMA) + (simplify + (fmas:c (negate @0) @1 @2) + (IFN_FNMA @0 @1 @2)) + (simplify + (fmas @0 @1 (negate @2)) + (IFN_FMS @0 @1 @2)) + (simplify + (fmas:c (negate @0) @1 (negate @2)) + (IFN_FNMS @0 @1 @2)) + (simplify + (negate (fmas@3 @0 @1 @2)) + (if (single_use (@3)) + (IFN_FNMS @0 @1 @2)))) + +(simplify + (IFN_FMS:c (negate @0) @1 @2) + (IFN_FNMS @0 @1 @2)) +(simplify + (IFN_FMS @0 @1 (negate @2)) + (IFN_FMA @0 @1 @2)) +(simplify + (IFN_FMS:c (negate @0) @1 (negate @2)) + (IFN_FNMA @0 @1 @2)) +(simplify + (negate (IFN_FMS@3 @0 @1 @2)) + (if (single_use (@3)) + (IFN_FNMA @0 @1 @2))) + +(simplify + (IFN_FNMA:c (negate @0) @1 @2) + (IFN_FMA @0 @1 @2)) +(simplify + (IFN_FNMA @0 @1 (negate @2)) + (IFN_FNMS @0 @1 @2)) +(simplify + (IFN_FNMA:c (negate @0) @1 (negate @2)) + (IFN_FMS @0 @1 @2)) +(simplify + (negate (IFN_FNMA@3 @0 @1 @2)) + (if (single_use (@3)) + (IFN_FMS @0 @1 @2))) + +(simplify + (IFN_FNMS:c (negate @0) @1 @2) + (IFN_FMS @0 @1 @2)) +(simplify + (IFN_FNMS @0 @1 (negate @2)) + (IFN_FNMA @0 @1 @2)) +(simplify + (IFN_FNMS:c (negate @0) @1 (negate @2)) + (IFN_FMA @0 @1 @2)) +(simplify + (negate (IFN_FNMS@3 @0 @1 @2)) + (if (single_use (@3)) + (IFN_FMA @0 @1 @2))) 
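
A note on what the match.pd rules just above buy (a sketch, not taken from the patch; the function names are only illustrative): the rules fold __builtin_fma calls with negated operands directly, and the same rules are what convert_mult_to_fma_1 relies on when it folds its newly built .FMA call through follow_all_ssa_edges further down. On a scalar_all_fma target at -O2 the expected canonicalisations are, for example:

  float g1 (float a, float b, float c) { return __builtin_fmaf (a, b, -c); }   /* folds to .FMS  (a, b, c) */
  float g2 (float a, float b, float c) { return __builtin_fmaf (-a, b, c); }   /* folds to .FNMA (a, b, c) */
  float g3 (float a, float b, float c) { return __builtin_fmaf (-a, b, -c); }  /* folds to .FNMS (a, b, c) */
  float g4 (float a, float b, float c) { return -__builtin_fmaf (a, b, c); }   /* folds to .FNMS (a, b, c), given the FMA result has a single use */

These are the same shapes that fma-5.c to fma-7.c below check for in the optimized dump.
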
Index: gcc/gimple-fold.h =================================================================== --- gcc/gimple-fold.h 2018-05-17 09:17:32.876478942 +0100 +++ gcc/gimple-fold.h 2018-05-17 09:18:19.962942761 +0100 @@ -41,6 +41,7 @@ extern bool arith_overflowed_p (enum tre const_tree); extern tree no_follow_ssa_edges (tree); extern tree follow_single_use_edges (tree); +extern tree follow_all_ssa_edges (tree); extern tree gimple_fold_stmt_to_constant_1 (gimple *, tree (*) (tree), tree (*) (tree) = no_follow_ssa_edges); extern tree gimple_fold_stmt_to_constant (gimple *, tree (*) (tree)); Index: gcc/gimple-fold.c =================================================================== --- gcc/gimple-fold.c 2018-05-17 09:17:32.876478942 +0100 +++ gcc/gimple-fold.c 2018-05-17 09:18:19.961942784 +0100 @@ -4967,6 +4967,14 @@ follow_single_use_edges (tree val) return val; } +/* Valueization callback that follows all SSA edges. */ + +tree +follow_all_ssa_edges (tree val) +{ + return val; +} + /* Fold the statement pointed to by GSI. In some cases, this function may replace the whole statement with a new one. Returns true iff folding makes any changes. Index: gcc/tree-ssa-math-opts.c =================================================================== --- gcc/tree-ssa-math-opts.c 2018-05-16 12:48:59.410941892 +0100 +++ gcc/tree-ssa-math-opts.c 2018-05-17 09:18:19.982942291 +0100 @@ -2650,7 +2650,7 @@ convert_mult_to_fma_1 (tree mul_result, tree type = TREE_TYPE (mul_result); gimple *use_stmt; imm_use_iterator imm_iter; - gassign *fma_stmt; + gcall *fma_stmt; FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, mul_result) { @@ -2658,6 +2658,7 @@ convert_mult_to_fma_1 (tree mul_result, enum tree_code use_code; tree addop, mulop1 = op1, result = mul_result; bool negate_p = false; + gimple_seq seq = NULL; if (is_gimple_debug (use_stmt)) continue; @@ -2683,11 +2684,7 @@ convert_mult_to_fma_1 (tree mul_result, addop = gimple_assign_rhs2 (use_stmt); /* a * b - c -> a * b + (-c) */ if (gimple_assign_rhs_code (use_stmt) == MINUS_EXPR) - addop = force_gimple_operand_gsi (&gsi, - build1 (NEGATE_EXPR, - type, addop), - true, NULL_TREE, true, - GSI_SAME_STMT); + addop = gimple_build (&seq, NEGATE_EXPR, type, addop); } else { @@ -2698,23 +2695,26 @@ convert_mult_to_fma_1 (tree mul_result, } if (negate_p) - mulop1 = force_gimple_operand_gsi (&gsi, - build1 (NEGATE_EXPR, - type, mulop1), - true, NULL_TREE, true, - GSI_SAME_STMT); + mulop1 = gimple_build (&seq, NEGATE_EXPR, type, mulop1); - fma_stmt = gimple_build_assign (gimple_assign_lhs (use_stmt), - FMA_EXPR, mulop1, op2, addop); + if (seq) + gsi_insert_seq_before (&gsi, seq, GSI_SAME_STMT); + fma_stmt = gimple_build_call_internal (IFN_FMA, 3, mulop1, op2, addop); + gimple_call_set_lhs (fma_stmt, gimple_assign_lhs (use_stmt)); + gimple_call_set_nothrow (fma_stmt, !stmt_can_throw_internal (use_stmt)); + gsi_replace (&gsi, fma_stmt, true); + /* Follow all SSA edges so that we generate FMS, FNMA and FNMS + regardless of where the negation occurs. */ + if (fold_stmt (&gsi, follow_all_ssa_edges)) + update_stmt (gsi_stmt (gsi)); if (dump_file && (dump_flags & TDF_DETAILS)) { fprintf (dump_file, "Generated FMA "); - print_gimple_stmt (dump_file, fma_stmt, 0, 0); + print_gimple_stmt (dump_file, gsi_stmt (gsi), 0, 0); fprintf (dump_file, "\n"); } - gsi_replace (&gsi, fma_stmt, true); widen_mul_stats.fmas_inserted++; } } @@ -2862,7 +2862,8 @@ convert_mult_to_fma (gimple *mul_stmt, t /* If the target doesn't support it, don't generate it. 
We assume that if fma isn't available then fms, fnma or fnms are not either. */ - if (optab_handler (fma_optab, TYPE_MODE (type)) == CODE_FOR_nothing) + optimization_type opt_type = bb_optimization_type (gimple_bb (mul_stmt)); + if (!direct_internal_fn_supported_p (IFN_FMA, type, opt_type)) return false; /* If the multiplication has zero uses, it is kept around probably because @@ -2958,8 +2959,8 @@ convert_mult_to_fma (gimple *mul_stmt, t that a mult / subtract pair. */ if (use_code == MINUS_EXPR && !negate_p && gimple_assign_rhs1 (use_stmt) == result - && optab_handler (fms_optab, TYPE_MODE (type)) == CODE_FOR_nothing - && optab_handler (fnma_optab, TYPE_MODE (type)) != CODE_FOR_nothing) + && !direct_internal_fn_supported_p (IFN_FMS, type, opt_type) + && direct_internal_fn_supported_p (IFN_FNMA, type, opt_type)) { tree rhs2 = gimple_assign_rhs2 (use_stmt); Index: gcc/config/i386/i386.c =================================================================== --- gcc/config/i386/i386.c 2018-05-16 12:49:10.501715913 +0100 +++ gcc/config/i386/i386.c 2018-05-17 09:18:19.949943066 +0100 @@ -50614,19 +50614,20 @@ ix86_add_stmt_cost (void *data, int coun tree vectype = stmt_info ? stmt_vectype (stmt_info) : NULL_TREE; int stmt_cost = - 1; + bool fp = false; + machine_mode mode = TImode; + + if (vectype != NULL) + { + fp = FLOAT_TYPE_P (vectype); + mode = TYPE_MODE (vectype); + } + if ((kind == vector_stmt || kind == scalar_stmt) && stmt_info && stmt_info->stmt && gimple_code (stmt_info->stmt) == GIMPLE_ASSIGN) { tree_code subcode = gimple_assign_rhs_code (stmt_info->stmt); - bool fp = false; - machine_mode mode = TImode; - - if (vectype != NULL) - { - fp = FLOAT_TYPE_P (vectype); - mode = TYPE_MODE (vectype); - } /*machine_mode inner_mode = mode; if (VECTOR_MODE_P (mode)) inner_mode = GET_MODE_INNER (mode);*/ @@ -50657,12 +50658,6 @@ ix86_add_stmt_cost (void *data, int coun case MULT_HIGHPART_EXPR: stmt_cost = ix86_multiplication_cost (ix86_cost, mode); break; - case FMA_EXPR: - stmt_cost = ix86_vec_cost (mode, - mode == SFmode ? ix86_cost->fmass - : ix86_cost->fmasd, - true); - break; case NEGATE_EXPR: if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH) stmt_cost = ix86_cost->sse_op; @@ -50725,6 +50720,24 @@ ix86_add_stmt_cost (void *data, int coun break; } } + + combined_fn cfn; + if ((kind == vector_stmt || kind == scalar_stmt) + && stmt_info + && stmt_info->stmt + && (cfn = gimple_call_combined_fn (stmt_info->stmt)) != CFN_LAST) + switch (cfn) + { + case CFN_FMA: + stmt_cost = ix86_vec_cost (mode, + mode == SFmode ? ix86_cost->fmass + : ix86_cost->fmasd, + true); + break; + default: + break; + } + /* If we do elementwise loads into a vector then we are bound by latency and execution resources for the many scalar loads (AGU and load ports). 
Try to account for this by scaling the Index: gcc/config/rs6000/rs6000.c =================================================================== --- gcc/config/rs6000/rs6000.c 2018-05-16 12:48:59.410941892 +0100 +++ gcc/config/rs6000/rs6000.c 2018-05-17 09:18:19.953942972 +0100 @@ -15880,7 +15880,9 @@ rs6000_gimple_fold_builtin (gimple_stmt_ arg1 = gimple_call_arg (stmt, 1); tree arg2 = gimple_call_arg (stmt, 2); lhs = gimple_call_lhs (stmt); - gimple *g = gimple_build_assign (lhs, FMA_EXPR, arg0, arg1, arg2); + gcall *g = gimple_build_call_internal (IFN_FMA, 3, arg0, arg1, arg2); + gimple_call_set_lhs (g, lhs); + gimple_call_set_nothrow (g, true); gimple_set_location (g, gimple_location (stmt)); gsi_replace (gsi, g, true); return true; Index: gcc/brig/brigfrontend/brig-function.cc =================================================================== --- gcc/brig/brigfrontend/brig-function.cc 2018-05-16 12:48:59.410941892 +0100 +++ gcc/brig/brigfrontend/brig-function.cc 2018-05-17 09:18:19.936943371 +0100 @@ -1218,6 +1218,7 @@ brig_function::get_builtin_for_hsa_opcod case BRIG_OPCODE_NEXP2: builtin = mathfn_built_in (builtin_type, BUILT_IN_EXP2); break; + case BRIG_OPCODE_FMA: case BRIG_OPCODE_NFMA: builtin = mathfn_built_in (builtin_type, BUILT_IN_FMA); break; @@ -1460,8 +1461,6 @@ brig_function::get_tree_code_for_hsa_opc return CALL_EXPR; else return MAX_EXPR; - case BRIG_OPCODE_FMA: - return FMA_EXPR; case BRIG_OPCODE_ABS: return ABS_EXPR; case BRIG_OPCODE_SHL: @@ -1496,6 +1495,7 @@ brig_function::get_tree_code_for_hsa_opc /* Implement as 1/f (x). gcc should pattern detect that and use a native instruction, if available, for it. */ return TREE_LIST; + case BRIG_OPCODE_FMA: case BRIG_OPCODE_FLOOR: case BRIG_OPCODE_CEIL: case BRIG_OPCODE_SQRT: Index: gcc/c/gimple-parser.c =================================================================== --- gcc/c/gimple-parser.c 2018-05-17 09:17:58.756608780 +0100 +++ gcc/c/gimple-parser.c 2018-05-17 09:18:19.939943300 +0100 @@ -952,27 +952,6 @@ c_parser_gimple_postfix_expression (c_pa expr.value = fold_convert (type, val); return expr; } - else if (strcmp (IDENTIFIER_POINTER (id), "__FMA") == 0) - { - c_parser_consume_token (parser); - auto_vec<tree> args; - - if (c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>")) - { - c_parser_gimple_expr_list (parser, &args); - c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, - "expected %<)%>"); - } - if (args.length () != 3) - { - error_at (loc, "invalid number of operands to __FMA"); - expr.value = error_mark_node; - return expr; - } - expr.value = build3_loc (loc, FMA_EXPR, TREE_TYPE (args[0]), - args[0], args[1], args[2]); - return expr; - } /* SSA name. 
*/ unsigned version, ver_offset; Index: gcc/cp/constexpr.c =================================================================== --- gcc/cp/constexpr.c 2018-05-16 12:49:10.203809793 +0100 +++ gcc/cp/constexpr.c 2018-05-17 09:18:19.954942948 +0100 @@ -4574,7 +4574,6 @@ cxx_eval_constant_expression (const cons non_constant_p, overflow_p); break; - case FMA_EXPR: case VEC_PERM_EXPR: r = cxx_eval_trinary_expression (ctx, t, lval, non_constant_p, overflow_p); @@ -5999,7 +5998,6 @@ #define RECUR(T,RV) \ return false; return true; - case FMA_EXPR: case VEC_PERM_EXPR: for (i = 0; i < 3; ++i) if (!RECUR (TREE_OPERAND (t, i), true)) Index: gcc/testsuite/lib/target-supports.exp =================================================================== --- gcc/testsuite/lib/target-supports.exp 2018-05-16 12:48:59.410941892 +0100 +++ gcc/testsuite/lib/target-supports.exp 2018-05-17 09:18:19.978942385 +0100 @@ -2879,6 +2879,13 @@ proc check_effective_target_base_quadflo return 1 } +# Return 1 if the target supports all four forms of fused multiply-add +# (fma, fms, fnma, and fnms) for both float and double. + +proc check_effective_target_scalar_all_fma { } { + return [istarget aarch64*-*-*] +} + # Return 1 if the target supports compiling fixed-point, # 0 otherwise. Index: gcc/testsuite/gcc.dg/fma-1.c =================================================================== --- /dev/null 2018-04-20 16:19:46.369131350 +0100 +++ gcc/testsuite/gcc.dg/fma-1.c 2018-05-17 09:18:19.975942456 +0100 @@ -0,0 +1,15 @@ +/* { dg-options "-O2 -fdump-tree-widening_mul" } */ + +float +f1 (float a, float b, float c) +{ + return a * b + c; +} + +double +f2 (double a, double b, double c) +{ + return a * b + c; +} + +/* { dg-final { scan-tree-dump-times { = \.FMA \(} 2 "widening_mul" { target scalar_all_fma } } } */ Index: gcc/testsuite/gcc.dg/fma-2.c =================================================================== --- /dev/null 2018-04-20 16:19:46.369131350 +0100 +++ gcc/testsuite/gcc.dg/fma-2.c 2018-05-17 09:18:19.975942456 +0100 @@ -0,0 +1,15 @@ +/* { dg-options "-O2 -fdump-tree-widening_mul" } */ + +float +f1 (float a, float b, float c) +{ + return a * b - c; +} + +double +f2 (double a, double b, double c) +{ + return a * b - c; +} + +/* { dg-final { scan-tree-dump-times { = \.FMS \(} 2 "widening_mul" { target scalar_all_fma } } } */ Index: gcc/testsuite/gcc.dg/fma-3.c =================================================================== --- /dev/null 2018-04-20 16:19:46.369131350 +0100 +++ gcc/testsuite/gcc.dg/fma-3.c 2018-05-17 09:18:19.975942456 +0100 @@ -0,0 +1,15 @@ +/* { dg-options "-O2 -fdump-tree-widening_mul" } */ + +float +f1 (float a, float b, float c) +{ + return c - a * b; +} + +double +f2 (double a, double b, double c) +{ + return c - a * b; +} + +/* { dg-final { scan-tree-dump-times { = \.FNMA \(} 2 "widening_mul" { target scalar_all_fma } } } */ Index: gcc/testsuite/gcc.dg/fma-4.c =================================================================== --- /dev/null 2018-04-20 16:19:46.369131350 +0100 +++ gcc/testsuite/gcc.dg/fma-4.c 2018-05-17 09:18:19.975942456 +0100 @@ -0,0 +1,15 @@ +/* { dg-options "-O2 -fdump-tree-widening_mul" } */ + +float +f1 (float a, float b, float c) +{ + return -(a * b) - c; +} + +double +f2 (double a, double b, double c) +{ + return -(a * b) - c; +} + +/* { dg-final { scan-tree-dump-times { = \.FNMS \(} 2 "widening_mul" { target scalar_all_fma } } } */ Index: gcc/testsuite/gcc.dg/fma-5.c =================================================================== --- /dev/null 2018-04-20 
16:19:46.369131350 +0100 +++ gcc/testsuite/gcc.dg/fma-5.c 2018-05-17 09:18:19.976942432 +0100 @@ -0,0 +1,53 @@ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +float +f1 (float a, float b, float c) +{ + return __builtin_fmaf (a, b, -c); +} + +double +f2 (double a, double b, double c) +{ + return __builtin_fma (a, b, -c); +} + +void +f3 (float a, float b, float c, float d, float e, float *res) +{ + res[0] = __builtin_fmaf (a, b, -e); + res[1] = __builtin_fmaf (c, d, -e); +} + +void +f4 (double a, double b, double c, double d, double e, double *res) +{ + res[0] = __builtin_fma (a, b, -e); + res[1] = __builtin_fma (c, d, -e); +} + +float +f5 (float a, float b, float c) +{ + return -__builtin_fmaf (-a, b, c); +} + +double +f6 (double a, double b, double c) +{ + return -__builtin_fma (-a, b, c); +} + +float +f7 (float a, float b, float c) +{ + return -__builtin_fmaf (a, -b, c); +} + +double +f8 (double a, double b, double c) +{ + return -__builtin_fma (a, -b, c); +} + +/* { dg-final { scan-tree-dump-times { = \.FMS \(} 10 "optimized" { target scalar_all_fma } } } */ Index: gcc/testsuite/gcc.dg/fma-6.c =================================================================== --- /dev/null 2018-04-20 16:19:46.369131350 +0100 +++ gcc/testsuite/gcc.dg/fma-6.c 2018-05-17 09:18:19.976942432 +0100 @@ -0,0 +1,67 @@ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +float +f1 (float a, float b, float c) +{ + return __builtin_fmaf (-a, b, c); +} + +double +f2 (double a, double b, double c) +{ + return __builtin_fma (-a, b, c); +} + +float +f3 (float a, float b, float c) +{ + return __builtin_fmaf (a, -b, c); +} + +double +f4 (double a, double b, double c) +{ + return __builtin_fma (a, -b, c); +} + +void +f5 (float a, float b, float c, float d, float e, float *res) +{ + res[0] = __builtin_fmaf (-a, b, c); + res[1] = __builtin_fmaf (-a, d, e); +} + +void +f6 (double a, double b, double c, double d, double e, double *res) +{ + res[0] = __builtin_fma (-a, b, c); + res[1] = __builtin_fma (-a, d, e); +} + +void +f7 (float a, float b, float c, float d, float e, float *res) +{ + res[0] = __builtin_fmaf (a, -b, c); + res[1] = __builtin_fmaf (d, -b, e); +} + +void +f8 (double a, double b, double c, double d, double e, double *res) +{ + res[0] = __builtin_fma (a, -b, c); + res[1] = __builtin_fma (d, -b, e); +} + +float +f9 (float a, float b, float c) +{ + return -__builtin_fmaf (a, b, -c); +} + +double +f10 (double a, double b, double c) +{ + return -__builtin_fma (a, b, -c); +} + +/* { dg-final { scan-tree-dump-times { = \.FNMA \(} 14 "optimized" { target scalar_all_fma } } } */ Index: gcc/testsuite/gcc.dg/fma-7.c =================================================================== --- /dev/null 2018-04-20 16:19:46.369131350 +0100 +++ gcc/testsuite/gcc.dg/fma-7.c 2018-05-17 09:18:19.976942432 +0100 @@ -0,0 +1,67 @@ +/* { dg-options "-O2 -fdump-tree-optimized" } */ + +float +f1 (float a, float b, float c) +{ + return __builtin_fmaf (-a, b, -c); +} + +double +f2 (double a, double b, double c) +{ + return __builtin_fma (-a, b, -c); +} + +float +f3 (float a, float b, float c) +{ + return __builtin_fmaf (a, -b, -c); +} + +double +f4 (double a, double b, double c) +{ + return __builtin_fma (a, -b, -c); +} + +void +f5 (float a, float b, float c, float d, float *res) +{ + res[0] = __builtin_fmaf (-a, b, -c); + res[1] = __builtin_fmaf (-a, d, -c); +} + +void +f6 (double a, double b, double c, double d, double *res) +{ + res[0] = __builtin_fma (-a, b, -c); + res[1] = __builtin_fma (-a, d, -c); +} + +void +f7 (float a, float 
b, float c, float d, float *res) +{ + res[0] = __builtin_fmaf (a, -b, -c); + res[1] = __builtin_fmaf (d, -b, -c); +} + +void +f8 (double a, double b, double c, double d, double *res) +{ + res[0] = __builtin_fma (a, -b, -c); + res[1] = __builtin_fma (d, -b, -c); +} + +float +f9 (float a, float b, float c) +{ + return -__builtin_fmaf (a, b, c); +} + +double +f10 (double a, double b, double c) +{ + return -__builtin_fma (a, b, c); +} + +/* { dg-final { scan-tree-dump-times { = \.FNMS \(} 14 "optimized" { target scalar_all_fma } } } */ Index: gcc/testsuite/gcc.dg/gimplefe-26.c =================================================================== --- gcc/testsuite/gcc.dg/gimplefe-26.c 2018-05-16 12:48:59.410941892 +0100 +++ gcc/testsuite/gcc.dg/gimplefe-26.c 2018-05-17 09:18:19.976942432 +0100 @@ -1,16 +1,15 @@ -/* { dg-do compile { target c99_runtime } } */ +/* { dg-do compile { target scalar_all_fma } } */ /* { dg-options "-O -fgimple -fdump-tree-ssa-gimple" } */ #define foo(type, num) \ type __GIMPLE () foo_##num (type a, type b, type c) \ { \ type t0; \ - t0_1 = __FMA (a, b, c); \ + t0_1 = .FMA (a, b, c); \ return t0_1; \ } foo(float, 1) foo(double, 2) -foo(long double, 3) -/* { dg-final { scan-tree-dump-times "__FMA" 3 "ssa" } } */ +/* { dg-final { scan-tree-dump-times {\.FMA} 2 "ssa" } } */ Index: gcc/testsuite/gfortran.dg/reassoc_7.f =================================================================== --- gcc/testsuite/gfortran.dg/reassoc_7.f 2018-05-16 12:48:59.410941892 +0100 +++ gcc/testsuite/gfortran.dg/reassoc_7.f 2018-05-17 09:18:19.976942432 +0100 @@ -1,5 +1,5 @@ ! { dg-do compile } -! { dg-options "-O3 -ffast-math -fdump-tree-optimized" } +! { dg-options "-O3 -ffast-math -ffp-contract=off -fdump-tree-optimized" } SUBROUTINE S55199(P,Dvdph) implicit none Index: gcc/testsuite/gfortran.dg/reassoc_8.f =================================================================== --- gcc/testsuite/gfortran.dg/reassoc_8.f 2018-05-16 12:48:59.410941892 +0100 +++ gcc/testsuite/gfortran.dg/reassoc_8.f 2018-05-17 09:18:19.977942409 +0100 @@ -1,5 +1,5 @@ ! { dg-do compile } -! { dg-options "-O3 -ffast-math -fdump-tree-optimized" } +! { dg-options "-O3 -ffast-math -ffp-contract=off -fdump-tree-optimized" } SUBROUTINE S55199(P,Dvdph) implicit none Index: gcc/testsuite/gfortran.dg/reassoc_9.f =================================================================== --- gcc/testsuite/gfortran.dg/reassoc_9.f 2018-05-16 12:48:59.410941892 +0100 +++ gcc/testsuite/gfortran.dg/reassoc_9.f 2018-05-17 09:18:19.977942409 +0100 @@ -1,5 +1,5 @@ ! { dg-do compile } -! { dg-options "-O3 -ffast-math -fdump-tree-optimized" } +! { dg-options "-O3 -ffast-math -ffp-contract=off -fdump-tree-optimized" } SUBROUTINE S55199(P,Dvdph) implicit none Index: gcc/testsuite/gfortran.dg/reassoc_10.f =================================================================== --- gcc/testsuite/gfortran.dg/reassoc_10.f 2018-05-16 12:48:59.410941892 +0100 +++ gcc/testsuite/gfortran.dg/reassoc_10.f 2018-05-17 09:18:19.976942432 +0100 @@ -1,5 +1,5 @@ ! { dg-do compile } -! { dg-options "-O3 -ffast-math -fdump-tree-optimized" } +! { dg-options "-O3 -ffast-math -ffp-contract=off -fdump-tree-optimized" } SUBROUTINE S55199(P,Q,Dvdph) implicit none
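For readers skimming the new tests above, here is a small C sketch, not taken from the patch and with purely illustrative function names, of the source patterns that gcc.dg/fma-1.c through fma-4.c expect the widening_mul pass to rewrite into the four new internal functions at -O2 on a target that provides all four optabs (the new scalar_all_fma effective target):

/* Not part of the patch: illustrative-only functions showing which source
   form maps to which internal function on a scalar_all_fma target.  */
double sketch_fma  (double a, double b, double c) { return a * b + c;    }  /* -> .FMA  (a, b, c) */
double sketch_fms  (double a, double b, double c) { return a * b - c;    }  /* -> .FMS  (a, b, c) */
double sketch_fnma (double a, double b, double c) { return c - a * b;    }  /* -> .FNMA (a, b, c) */
double sketch_fnms (double a, double b, double c) { return -(a * b) - c; }  /* -> .FNMS (a, b, c) */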
On Thu, May 17, 2018 at 10:56 AM Richard Sandiford < richard.sandiford@linaro.org> wrote: > Richard Biener <richard.guenther@gmail.com> writes: > >> @@ -2698,23 +2703,26 @@ convert_mult_to_fma_1 (tree mul_result, > >> } > > > >> if (negate_p) > >> - mulop1 = force_gimple_operand_gsi (&gsi, > >> - build1 (NEGATE_EXPR, > >> - type, mulop1), > >> - true, NULL_TREE, true, > >> - GSI_SAME_STMT); > >> + mulop1 = gimple_build (&seq, NEGATE_EXPR, type, mulop1); > > > >> - fma_stmt = gimple_build_assign (gimple_assign_lhs (use_stmt), > >> - FMA_EXPR, mulop1, op2, addop); > >> + if (seq) > >> + gsi_insert_seq_before (&gsi, seq, GSI_SAME_STMT); > >> + fma_stmt = gimple_build_call_internal (IFN_FMA, 3, mulop1, op2, > > addop); > >> + gimple_call_set_lhs (fma_stmt, gimple_assign_lhs (use_stmt)); > >> + gimple_call_set_nothrow (fma_stmt, !stmt_can_throw_internal > > (use_stmt)); > >> + gsi_replace (&gsi, fma_stmt, true); > >> + /* Valueize aggressively so that we generate FMS, FNMA and FNMS > >> + regardless of where the negation occurs. */ > >> + if (fold_stmt (&gsi, aggressive_valueize)) > >> + update_stmt (gsi_stmt (gsi)); > > > > I think it would be nice to be able to use gimple_build () with IFNs so you > > can > > gimple_build () the IFN and then use gsi_replace_with_seq () on it. You > > only need to fold with generated negates, not with negates already in the > > IL? > > The the folding implied with gimple_build will take care of it. > The idea was to pick up existing negates that feed the multiplication > as well as any added by the pass itself. > On IRC yesterday we talked about how this should handle the ECF_NOTHROW > flag, and whether things like IFN_SQRT and IFN_FMA should always be > nothrow (like the built-in functions are). But in the end I thought > it'd be better to keep things as they are. We already handle > -fnon-call-exceptions for unfused a * b + c and before the patch also > handled it for FMA_EXPR. It'd seem like a step backwards if the new > internal functions didn't handle it too. If anything it seems like the > built-in functions should change to be closer to the tree_code and > internal_fn way of doing things, if we want to support -fnon-call-exceptions > properly. Right. -fnon-call-exceptions isn't very well tested outside of Ada which must have its own builtin declarations. > This also surprised me when doing the if-conversion patch I sent yesterday. > We're happy to vectorise: > for (int i = 0; i < 100; ++i) > x[i] = ... ? sqrt (x[i]) : 0; > by doing the sqrt unconditionally and selecting on the result, even with > the default maths flags, but refuse to vectorise the simpler: > for (int i = 0; i < 100; ++i) > x[i] = ... ? x[i] + 1 : 0; > in the same way. Heh. > > Otherwise can you please move aggressive_valueize to gimple-fold.[ch] > > alongside no_follow_ssa_edges / follow_single_use_edges and maybe > > rename it as follow_all_ssa_edges? > Ah, yeah, that's definitely a better name. > I also renamed all_scalar_fma to scalar_all_fma, since I realised > after Andrew's reply that the old name made it sound like it was > "all scalars", whereas it meant to mean "all fmas". > Tested as before. OK. Thanks, Richard. > Thanks, > Richard > 2018-05-17 Richard Sandiford <richard.sandiford@linaro.org> > gcc/ > * doc/sourcebuild.texi (scalar_all_fma): Document. > * tree.def (FMA_EXPR): Delete. > * internal-fn.def (FMA, FMS, FNMA, FNMS): New internal functions. > * internal-fn.c (ternary_direct): New macro. > (expand_ternary_optab_fn): Likewise. 
> (direct_ternary_optab_supported_p): Likewise. > * Makefile.in (build/genmatch.o): Depend on case-fn-macros.h. > * builtins.c (fold_builtin_fma): Delete. > (fold_builtin_3): Don't call it. > * cfgexpand.c (expand_debug_expr): Remove FMA_EXPR handling. > * expr.c (expand_expr_real_2): Likewise. > * fold-const.c (operand_equal_p): Likewise. > (fold_ternary_loc): Likewise. > * gimple-pretty-print.c (dump_ternary_rhs): Likewise. > * gimple.c (DEFTREECODE): Likewise. > * gimplify.c (gimplify_expr): Likewise. > * optabs-tree.c (optab_for_tree_code): Likewise. > * tree-cfg.c (verify_gimple_assign_ternary): Likewise. > * tree-eh.c (operation_could_trap_p): Likewise. > (stmt_could_throw_1_p): Likewise. > * tree-inline.c (estimate_operator_cost): Likewise. > * tree-pretty-print.c (dump_generic_node): Likewise. > (op_code_prio): Likewise. > * tree-ssa-loop-im.c (stmt_cost): Likewise. > * tree-ssa-operands.c (get_expr_operands): Likewise. > * tree.c (commutative_ternary_tree_code, add_expr): Likewise. > * fold-const-call.h (fold_fma): Delete. > * fold-const-call.c (fold_const_call_ssss): Handle CFN_FMS, > CFN_FNMA and CFN_FNMS. > (fold_fma): Delete. > * genmatch.c (combined_fn): New enum. > (commutative_ternary_tree_code): Remove FMA_EXPR handling. > (commutative_op): New function. > (commutate): Use it. Handle more than 2 operands. > (dt_operand::gen_gimple_expr): Use commutative_op. > (parser::parse_expr): Allow :c to be used with non-binary > operators if the commutative operand is known. > * gimple-ssa-backprop.c (backprop::process_builtin_call_use): Handle > CFN_FMS, CFN_FNMA and CFN_FNMS. > (backprop::process_assign_use): Remove FMA_EXPR handling. > * hsa-gen.c (gen_hsa_insns_for_operation_assignment): Likewise. > (gen_hsa_fma): New function. > (gen_hsa_insn_for_internal_fn_call): Use it for IFN_FMA, IFN_FMS, > IFN_FNMA and IFN_FNMS. > * match.pd: Add folds for IFN_FMS, IFN_FNMA and IFN_FNMS. > * gimple-fold.h (follow_all_ssa_edges): Declare. > * gimple-fold.c (follow_all_ssa_edges): New function. > * tree-ssa-math-opts.c (convert_mult_to_fma_1): Use the > gimple_build interface and use follow_all_ssa_edges to fold the result. > (convert_mult_to_fma): Use direct_internal_fn_supported_p > instead of checking for optabs directly. > * config/i386/i386.c (ix86_add_stmt_cost): Recognize FMAs as calls > rather than FMA_EXPRs. > * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Create a > call to IFN_FMA instead of an FMA_EXPR. > gcc/brig/ > * brigfrontend/brig-function.cc > (brig_function::get_builtin_for_hsa_opcode): Use BUILT_IN_FMA > for BRIG_OPCODE_FMA. > (brig_function::get_tree_code_for_hsa_opcode): Treat BUILT_IN_FMA > as a call. > gcc/c/ > * gimple-parser.c (c_parser_gimple_postfix_expression): Remove > __FMA_EXPR handling. > gcc/cp/ > * constexpr.c (cxx_eval_constant_expression): Remove FMA_EXPR handling. > (potential_constant_expression_1): Likewise. > gcc/testsuite/ > * lib/target-supports.exp (check_effective_target_scalar_all_fma): > New proc. > * gcc.dg/fma-1.c: New test. > * gcc.dg/fma-2.c: Likewise. > * gcc.dg/fma-3.c: Likewise. > * gcc.dg/fma-4.c: Likewise. > * gcc.dg/fma-5.c: Likewise. > * gcc.dg/fma-6.c: Likewise. > * gcc.dg/fma-7.c: Likewise. > * gcc.dg/gimplefe-26.c: Use .FMA instead of __FMA and require > scalar_all_fma. > * gfortran.dg/reassoc_7.f: Pass -ffp-contract=off. > * gfortran.dg/reassoc_8.f: Likewise. > * gfortran.dg/reassoc_9.f: Likewise. > * gfortran.dg/reassoc_10.f: Likewise. 
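As a companion to the match.pd entry in the ChangeLog above, the following sketch, which is illustrative only and not part of the patch (the function names are hypothetical), restates the new negation folds in source form; on a scalar_all_fma target, gcc.dg/fma-5.c through fma-7.c check for the corresponding internal calls in the optimized dump:

/* Not part of the patch: illustrative-only functions showing the
   negation folds added to match.pd for the FMA family.  */
double sketch_neg_mul    (double a, double b, double c) { return __builtin_fma (-a, b, c);  }  /* -> .FNMA (a, b, c), i.e. c - a * b    */
double sketch_neg_add    (double a, double b, double c) { return __builtin_fma (a, b, -c);  }  /* -> .FMS  (a, b, c), i.e. a * b - c    */
double sketch_neg_both   (double a, double b, double c) { return __builtin_fma (-a, b, -c); }  /* -> .FNMS (a, b, c), i.e. -(a * b) - c */
double sketch_neg_result (double a, double b, double c) { return -__builtin_fma (a, b, c);  }  /* -> .FNMS (a, b, c), single use only   */

Because the same table of folds exists for IFN_FMS, IFN_FNMA and IFN_FNMS, folding a freshly built IFN_FMA call with follow_all_ssa_edges in convert_mult_to_fma_1 can reach any of the four forms, regardless of where the negation occurs, which is the point discussed earlier in the thread.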
> Index: gcc/doc/sourcebuild.texi > =================================================================== > --- gcc/doc/sourcebuild.texi 2018-05-16 12:48:59.410941892 +0100 > +++ gcc/doc/sourcebuild.texi 2018-05-17 09:18:19.954942948 +0100 > @@ -2251,6 +2251,11 @@ Target supports @option{-pie}, @option{- > @item rdynamic > Target supports @option{-rdynamic}. > +@item scalar_all_fma > +Target supports all four fused multiply-add optabs for both @code{float} > +and @code{double}. These optabs are: @code{fma_optab}, @code{fms_optab}, > +@code{fnma_optab} and @code{fnms_optab}. > + > @item section_anchors > Target supports section anchors. > Index: gcc/tree.def > =================================================================== > --- gcc/tree.def 2018-05-16 12:48:59.410941892 +0100 > +++ gcc/tree.def 2018-05-17 09:18:19.987942174 +0100 > @@ -1345,12 +1345,6 @@ DEFTREECODE (WIDEN_MULT_MINUS_EXPR, "wid > by the second argument. */ > DEFTREECODE (WIDEN_LSHIFT_EXPR, "widen_lshift_expr", tcc_binary, 2) > -/* Fused multiply-add. > - All operands and the result are of the same type. No intermediate > - rounding is performed after multiplying operand one with operand two > - before adding operand three. */ > -DEFTREECODE (FMA_EXPR, "fma_expr", tcc_expression, 3) > - > /* Widening vector multiplication. > The two operands are vectors with N elements of size S. Multiplying the > elements of the two vectors will result in N products of size 2*S. > Index: gcc/internal-fn.def > =================================================================== > --- gcc/internal-fn.def 2018-05-16 12:48:59.410941892 +0100 > +++ gcc/internal-fn.def 2018-05-17 09:18:19.974942479 +0100 > @@ -57,6 +57,7 @@ along with GCC; see the file COPYING3. > - unary: a normal unary optab, such as vec_reverse_<mode> > - binary: a normal binary optab, such as vec_interleave_lo_<mode> > + - ternary: a normal ternary optab, such as fma<mode>4 > - cond_binary: a conditional binary optab, such as add<mode>cc > @@ -138,6 +139,10 @@ DEF_INTERNAL_OPTAB_FN (WHILE_ULT, ECF_CO > DEF_INTERNAL_OPTAB_FN (VEC_SHL_INSERT, ECF_CONST | ECF_NOTHROW, > vec_shl_insert, binary) > +DEF_INTERNAL_OPTAB_FN (FMS, ECF_CONST, fms, ternary) > +DEF_INTERNAL_OPTAB_FN (FNMA, ECF_CONST, fnma, ternary) > +DEF_INTERNAL_OPTAB_FN (FNMS, ECF_CONST, fnms, ternary) > + > DEF_INTERNAL_OPTAB_FN (COND_ADD, ECF_CONST, cond_add, cond_binary) > DEF_INTERNAL_OPTAB_FN (COND_SUB, ECF_CONST, cond_sub, cond_binary) > DEF_INTERNAL_SIGNED_OPTAB_FN (COND_MIN, ECF_CONST, first, > @@ -218,6 +223,9 @@ DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONS > /* FP scales. */ > DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary) > +/* Ternary math functions. */ > +DEF_INTERNAL_FLT_FN (FMA, ECF_CONST, fma, ternary) > + > /* Unary integer ops. 
*/ > DEF_INTERNAL_INT_FN (CLRSB, ECF_CONST | ECF_NOTHROW, clrsb, unary) > DEF_INTERNAL_INT_FN (CLZ, ECF_CONST | ECF_NOTHROW, clz, unary) > Index: gcc/internal-fn.c > =================================================================== > --- gcc/internal-fn.c 2018-05-17 09:17:58.757608747 +0100 > +++ gcc/internal-fn.c 2018-05-17 09:18:19.974942479 +0100 > @@ -110,6 +110,7 @@ #define mask_store_lanes_direct { 0, 0, > #define scatter_store_direct { 3, 3, false } > #define unary_direct { 0, 0, true } > #define binary_direct { 0, 0, true } > +#define ternary_direct { 0, 0, true } > #define cond_unary_direct { 1, 1, true } > #define cond_binary_direct { 1, 1, true } > #define while_direct { 0, 2, false } > @@ -2982,6 +2983,9 @@ #define expand_unary_optab_fn(FN, STMT, > #define expand_binary_optab_fn(FN, STMT, OPTAB) \ > expand_direct_optab_fn (FN, STMT, OPTAB, 2) > +#define expand_ternary_optab_fn(FN, STMT, OPTAB) \ > + expand_direct_optab_fn (FN, STMT, OPTAB, 3) > + > #define expand_cond_unary_optab_fn(FN, STMT, OPTAB) \ > expand_direct_optab_fn (FN, STMT, OPTAB, 2) > @@ -3067,6 +3071,7 @@ multi_vector_optab_supported_p (convert_ > #define direct_unary_optab_supported_p direct_optab_supported_p > #define direct_binary_optab_supported_p direct_optab_supported_p > +#define direct_ternary_optab_supported_p direct_optab_supported_p > #define direct_cond_unary_optab_supported_p direct_optab_supported_p > #define direct_cond_binary_optab_supported_p direct_optab_supported_p > #define direct_mask_load_optab_supported_p direct_optab_supported_p > Index: gcc/Makefile.in > =================================================================== > --- gcc/Makefile.in 2018-05-16 12:49:10.495216143 +0100 > +++ gcc/Makefile.in 2018-05-17 09:18:19.936943371 +0100 > @@ -2787,7 +2787,7 @@ build/genmddump.o : genmddump.c $(RTL_BA > $(CORETYPES_H) $(GTM_H) errors.h $(READ_MD_H) $(GENSUPPORT_H) > build/genmatch.o : genmatch.c $(BCONFIG_H) $(SYSTEM_H) \ > $(CORETYPES_H) errors.h $(HASH_TABLE_H) hash-map.h $(GGC_H) is-a.h \ > - tree.def builtins.def internal-fn.def > + tree.def builtins.def internal-fn.def case-cfn-macros.h > build/gencfn-macros.o : gencfn-macros.c $(BCONFIG_H) $(SYSTEM_H) \ > $(CORETYPES_H) errors.h $(HASH_TABLE_H) hash-set.h builtins.def \ > internal-fn.def > Index: gcc/builtins.c > =================================================================== > --- gcc/builtins.c 2018-05-16 12:48:59.410941892 +0100 > +++ gcc/builtins.c 2018-05-17 09:18:19.939943300 +0100 > @@ -8340,21 +8340,6 @@ fold_builtin_abs (location_t loc, tree a > return fold_build1_loc (loc, ABS_EXPR, type, arg); > } > -/* Fold a call to fma, fmaf, or fmal with arguments ARG[012]. */ > - > -static tree > -fold_builtin_fma (location_t loc, tree arg0, tree arg1, tree arg2, tree type) > -{ > - /* ??? Only expand to FMA_EXPR if it's directly supported. */ > - if (validate_arg (arg0, REAL_TYPE) > - && validate_arg (arg1, REAL_TYPE) > - && validate_arg (arg2, REAL_TYPE) > - && optab_handler (fma_optab, TYPE_MODE (type)) != CODE_FOR_nothing) > - return fold_build3_loc (loc, FMA_EXPR, type, arg0, arg1, arg2); > - > - return NULL_TREE; > -} > - > /* Fold a call to builtin carg(a+bi) -> atan2(b,a). 
*/ > static tree > @@ -9260,10 +9245,6 @@ fold_builtin_3 (location_t loc, tree fnd > CASE_FLT_FN (BUILT_IN_SINCOS): > return fold_builtin_sincos (loc, arg0, arg1, arg2); > - CASE_FLT_FN (BUILT_IN_FMA): > - CASE_FLT_FN_FLOATN_NX (BUILT_IN_FMA): > - return fold_builtin_fma (loc, arg0, arg1, arg2, type); > - > CASE_FLT_FN (BUILT_IN_REMQUO): > if (validate_arg (arg0, REAL_TYPE) > && validate_arg (arg1, REAL_TYPE) > Index: gcc/cfgexpand.c > =================================================================== > --- gcc/cfgexpand.c 2018-05-16 12:48:59.410941892 +0100 > +++ gcc/cfgexpand.c 2018-05-17 09:18:19.941943253 +0100 > @@ -4202,7 +4202,6 @@ expand_debug_expr (tree exp) > case SAD_EXPR: > case WIDEN_MULT_PLUS_EXPR: > case WIDEN_MULT_MINUS_EXPR: > - case FMA_EXPR: > goto ternary; > case TRUTH_ANDIF_EXPR: > @@ -5190,9 +5189,6 @@ expand_debug_expr (tree exp) > } > return NULL; > - case FMA_EXPR: > - return simplify_gen_ternary (FMA, mode, inner_mode, op0, op1, op2); > - > default: > flag_unsupported: > if (flag_checking) > Index: gcc/expr.c > =================================================================== > --- gcc/expr.c 2018-05-16 12:48:59.410941892 +0100 > +++ gcc/expr.c 2018-05-17 09:18:19.957942878 +0100 > @@ -8853,67 +8853,6 @@ #define REDUCE_BIT_FIELD(expr) (reduce_b > expand_operands (treeop0, treeop1, subtarget, &op0, &op1, EXPAND_NORMAL); > return REDUCE_BIT_FIELD (expand_mult (mode, op0, op1, target, unsignedp)); > - case FMA_EXPR: > - { > - optab opt = fma_optab; > - gimple *def0, *def2; > - > - /* If there is no insn for FMA, emit it as __builtin_fma{,f,l} > - call. */ > - if (optab_handler (fma_optab, mode) == CODE_FOR_nothing) > - { > - tree fn = mathfn_built_in (TREE_TYPE (treeop0), BUILT_IN_FMA); > - tree call_expr; > - > - gcc_assert (fn != NULL_TREE); > - call_expr = build_call_expr (fn, 3, treeop0, treeop1, treeop2); > - return expand_builtin (call_expr, target, subtarget, mode, false); > - } > - > - def0 = get_def_for_expr (treeop0, NEGATE_EXPR); > - /* The multiplication is commutative - look at its 2nd operand > - if the first isn't fed by a negate. */ > - if (!def0) > - { > - def0 = get_def_for_expr (treeop1, NEGATE_EXPR); > - /* Swap operands if the 2nd operand is fed by a negate. 
*/ > - if (def0) > - std::swap (treeop0, treeop1); > - } > - def2 = get_def_for_expr (treeop2, NEGATE_EXPR); > - > - op0 = op2 = NULL; > - > - if (def0 && def2 > - && optab_handler (fnms_optab, mode) != CODE_FOR_nothing) > - { > - opt = fnms_optab; > - op0 = expand_normal (gimple_assign_rhs1 (def0)); > - op2 = expand_normal (gimple_assign_rhs1 (def2)); > - } > - else if (def0 > - && optab_handler (fnma_optab, mode) != CODE_FOR_nothing) > - { > - opt = fnma_optab; > - op0 = expand_normal (gimple_assign_rhs1 (def0)); > - } > - else if (def2 > - && optab_handler (fms_optab, mode) != CODE_FOR_nothing) > - { > - opt = fms_optab; > - op2 = expand_normal (gimple_assign_rhs1 (def2)); > - } > - > - if (op0 == NULL) > - op0 = expand_expr (treeop0, subtarget, VOIDmode, EXPAND_NORMAL); > - if (op2 == NULL) > - op2 = expand_normal (treeop2); > - op1 = expand_normal (treeop1); > - > - return expand_ternary_op (TYPE_MODE (type), opt, > - op0, op1, op2, target, 0); > - } > - > case MULT_EXPR: > /* If this is a fixed-point operation, then we cannot use the code > below because "expand_mult" doesn't support sat/no-sat fixed-point > Index: gcc/fold-const.c > =================================================================== > --- gcc/fold-const.c 2018-05-16 12:48:59.410941892 +0100 > +++ gcc/fold-const.c 2018-05-17 09:18:19.959942831 +0100 > @@ -3297,7 +3297,6 @@ #define OP_SAME_WITH_NULL(N) \ > case TRUTH_ORIF_EXPR: > return OP_SAME (0) && OP_SAME (1); > - case FMA_EXPR: > case WIDEN_MULT_PLUS_EXPR: > case WIDEN_MULT_MINUS_EXPR: > if (!OP_SAME (2)) > @@ -11708,17 +11707,6 @@ fold_ternary_loc (location_t loc, enum t > return NULL_TREE; > - case FMA_EXPR: > - /* For integers we can decompose the FMA if possible. */ > - if (TREE_CODE (arg0) == INTEGER_CST > - && TREE_CODE (arg1) == INTEGER_CST) > - return fold_build2_loc (loc, PLUS_EXPR, type, > - const_binop (MULT_EXPR, arg0, arg1), arg2); > - if (integer_zerop (arg2)) > - return fold_build2_loc (loc, MULT_EXPR, type, arg0, arg1); > - > - return fold_fma (loc, type, arg0, arg1, arg2); > - > case VEC_PERM_EXPR: > if (TREE_CODE (arg2) == VECTOR_CST) > { > Index: gcc/gimple-pretty-print.c > =================================================================== > --- gcc/gimple-pretty-print.c 2018-05-17 09:17:58.756608780 +0100 > +++ gcc/gimple-pretty-print.c 2018-05-17 09:18:19.963942737 +0100 > @@ -490,27 +490,6 @@ dump_ternary_rhs (pretty_printer *buffer > pp_greater (buffer); > break; > - case FMA_EXPR: > - if (flags & TDF_GIMPLE) > - { > - pp_string (buffer, "__FMA ("); > - dump_generic_node (buffer, gimple_assign_rhs1 (gs), spc, flags, false); > - pp_comma (buffer); > - dump_generic_node (buffer, gimple_assign_rhs2 (gs), spc, flags, false); > - pp_comma (buffer); > - dump_generic_node (buffer, gimple_assign_rhs3 (gs), spc, flags, false); > - pp_right_paren (buffer); > - } > - else > - { > - dump_generic_node (buffer, gimple_assign_rhs1 (gs), spc, flags, false); > - pp_string (buffer, " * "); > - dump_generic_node (buffer, gimple_assign_rhs2 (gs), spc, flags, false); > - pp_string (buffer, " + "); > - dump_generic_node (buffer, gimple_assign_rhs3 (gs), spc, flags, false); > - } > - break; > - > case DOT_PROD_EXPR: > pp_string (buffer, "DOT_PROD_EXPR <"); > dump_generic_node (buffer, gimple_assign_rhs1 (gs), spc, flags, false); > Index: gcc/gimple.c > =================================================================== > --- gcc/gimple.c 2018-05-17 09:17:58.756608780 +0100 > +++ gcc/gimple.c 2018-05-17 09:18:19.966942667 +0100 > @@ -2150,8 +2150,7 @@ #define 
DEFTREECODE(SYM, STRING, TYPE, N > || (SYM) == REALIGN_LOAD_EXPR \ > || (SYM) == VEC_COND_EXPR \ > || (SYM) == VEC_PERM_EXPR \ > - || (SYM) == BIT_INSERT_EXPR \ > - || (SYM) == FMA_EXPR) ? GIMPLE_TERNARY_RHS \ > + || (SYM) == BIT_INSERT_EXPR) ? GIMPLE_TERNARY_RHS \ > : ((SYM) == CONSTRUCTOR \ > || (SYM) == OBJ_TYPE_REF \ > || (SYM) == ASSERT_EXPR \ > Index: gcc/gimplify.c > =================================================================== > --- gcc/gimplify.c 2018-05-16 12:48:59.410941892 +0100 > +++ gcc/gimplify.c 2018-05-17 09:18:19.972942526 +0100 > @@ -12086,7 +12086,6 @@ gimplify_expr (tree *expr_p, gimple_seq > } > break; > - case FMA_EXPR: > case VEC_PERM_EXPR: > /* Classified as tcc_expression. */ > goto expr_3; > Index: gcc/optabs-tree.c > =================================================================== > --- gcc/optabs-tree.c 2018-05-16 12:48:59.410941892 +0100 > +++ gcc/optabs-tree.c 2018-05-17 09:18:19.975942456 +0100 > @@ -143,9 +143,6 @@ optab_for_tree_code (enum tree_code code > : (TYPE_SATURATING (type) > ? ssmsub_widen_optab : smsub_widen_optab)); > - case FMA_EXPR: > - return fma_optab; > - > case VEC_WIDEN_MULT_HI_EXPR: > return TYPE_UNSIGNED (type) ? > vec_widen_umult_hi_optab : vec_widen_smult_hi_optab; > Index: gcc/tree-cfg.c > =================================================================== > --- gcc/tree-cfg.c 2018-05-16 12:49:10.497382733 +0100 > +++ gcc/tree-cfg.c 2018-05-17 09:18:19.979942362 +0100 > @@ -4109,20 +4109,6 @@ verify_gimple_assign_ternary (gassign *s > } > break; > - case FMA_EXPR: > - if (!useless_type_conversion_p (lhs_type, rhs1_type) > - || !useless_type_conversion_p (lhs_type, rhs2_type) > - || !useless_type_conversion_p (lhs_type, rhs3_type)) > - { > - error ("type mismatch in fused multiply-add expression"); > - debug_generic_expr (lhs_type); > - debug_generic_expr (rhs1_type); > - debug_generic_expr (rhs2_type); > - debug_generic_expr (rhs3_type); > - return true; > - } > - break; > - > case VEC_COND_EXPR: > if (!VECTOR_BOOLEAN_TYPE_P (rhs1_type) > || maybe_ne (TYPE_VECTOR_SUBPARTS (rhs1_type), > Index: gcc/tree-eh.c > =================================================================== > --- gcc/tree-eh.c 2018-05-16 12:48:59.410941892 +0100 > +++ gcc/tree-eh.c 2018-05-17 09:18:19.980942338 +0100 > @@ -2512,8 +2512,7 @@ operation_could_trap_p (enum tree_code o > if (TREE_CODE_CLASS (op) != tcc_comparison > && TREE_CODE_CLASS (op) != tcc_unary > - && TREE_CODE_CLASS (op) != tcc_binary > - && op != FMA_EXPR) > + && TREE_CODE_CLASS (op) != tcc_binary) > return false; > return operation_could_trap_helper_p (op, fp_operation, honor_trapv, > @@ -2825,8 +2824,7 @@ stmt_could_throw_1_p (gassign *stmt) > if (TREE_CODE_CLASS (code) == tcc_comparison > || TREE_CODE_CLASS (code) == tcc_unary > - || TREE_CODE_CLASS (code) == tcc_binary > - || code == FMA_EXPR) > + || TREE_CODE_CLASS (code) == tcc_binary) > { > if (TREE_CODE_CLASS (code) == tcc_comparison) > t = TREE_TYPE (gimple_assign_rhs1 (stmt)); > Index: gcc/tree-inline.c > =================================================================== > --- gcc/tree-inline.c 2018-05-16 12:48:59.410941892 +0100 > +++ gcc/tree-inline.c 2018-05-17 09:18:19.980942338 +0100 > @@ -3855,7 +3855,6 @@ estimate_operator_cost (enum tree_code c > case MINUS_EXPR: > case MULT_EXPR: > case MULT_HIGHPART_EXPR: > - case FMA_EXPR: > case ADDR_SPACE_CONVERT_EXPR: > case FIXED_CONVERT_EXPR: > Index: gcc/tree-pretty-print.c > =================================================================== > --- gcc/tree-pretty-print.c 
2018-05-17 09:17:58.758608713 +0100 > +++ gcc/tree-pretty-print.c 2018-05-17 09:18:19.981942315 +0100 > @@ -2904,16 +2904,6 @@ dump_generic_node (pretty_printer *pp, t > pp_string (pp, " > "); > break; > - case FMA_EXPR: > - pp_string (pp, " FMA_EXPR < "); > - dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false); > - pp_string (pp, ", "); > - dump_generic_node (pp, TREE_OPERAND (node, 1), spc, flags, false); > - pp_string (pp, ", "); > - dump_generic_node (pp, TREE_OPERAND (node, 2), spc, flags, false); > - pp_string (pp, " > "); > - break; > - > case OACC_PARALLEL: > pp_string (pp, "#pragma acc parallel"); > goto dump_omp_clauses_body; > @@ -3552,7 +3542,6 @@ op_code_prio (enum tree_code code) > case CEIL_MOD_EXPR: > case FLOOR_MOD_EXPR: > case ROUND_MOD_EXPR: > - case FMA_EXPR: > return 13; > case TRUTH_NOT_EXPR: > Index: gcc/tree-ssa-loop-im.c > =================================================================== > --- gcc/tree-ssa-loop-im.c 2018-05-16 12:48:59.410941892 +0100 > +++ gcc/tree-ssa-loop-im.c 2018-05-17 09:18:19.981942315 +0100 > @@ -493,7 +493,6 @@ stmt_cost (gimple *stmt) > case WIDEN_MULT_PLUS_EXPR: > case WIDEN_MULT_MINUS_EXPR: > case DOT_PROD_EXPR: > - case FMA_EXPR: > case TRUNC_DIV_EXPR: > case CEIL_DIV_EXPR: > case FLOOR_DIV_EXPR: > Index: gcc/tree-ssa-operands.c > =================================================================== > --- gcc/tree-ssa-operands.c 2018-05-16 12:48:59.410941892 +0100 > +++ gcc/tree-ssa-operands.c 2018-05-17 09:18:19.982942291 +0100 > @@ -849,7 +849,6 @@ get_expr_operands (struct function *fn, > case REALIGN_LOAD_EXPR: > case WIDEN_MULT_PLUS_EXPR: > case WIDEN_MULT_MINUS_EXPR: > - case FMA_EXPR: > { > get_expr_operands (fn, stmt, &TREE_OPERAND (expr, 0), flags); > get_expr_operands (fn, stmt, &TREE_OPERAND (expr, 1), flags); > Index: gcc/tree.c > =================================================================== > --- gcc/tree.c 2018-05-16 12:48:59.410941892 +0100 > +++ gcc/tree.c 2018-05-17 09:18:19.986942197 +0100 > @@ -7171,7 +7171,6 @@ commutative_ternary_tree_code (enum tree > case WIDEN_MULT_PLUS_EXPR: > case WIDEN_MULT_MINUS_EXPR: > case DOT_PROD_EXPR: > - case FMA_EXPR: > return true; > default: > @@ -7457,7 +7456,6 @@ add_expr (const_tree t, inchash::hash &h > flags &= ~OEP_ADDRESS_OF; > break; > - case FMA_EXPR: > case WIDEN_MULT_PLUS_EXPR: > case WIDEN_MULT_MINUS_EXPR: > { > Index: gcc/fold-const-call.h > =================================================================== > --- gcc/fold-const-call.h 2018-05-16 12:48:59.410941892 +0100 > +++ gcc/fold-const-call.h 2018-05-17 09:18:19.957942878 +0100 > @@ -23,7 +23,6 @@ #define GCC_FOLD_CONST_CALL_H > tree fold_const_call (combined_fn, tree, tree); > tree fold_const_call (combined_fn, tree, tree, tree); > tree fold_const_call (combined_fn, tree, tree, tree, tree); > -tree fold_fma (location_t, tree, tree, tree, tree); > tree build_cmp_result (tree type, int res); > #endif > Index: gcc/fold-const-call.c > =================================================================== > --- gcc/fold-const-call.c 2018-05-16 12:48:59.410941892 +0100 > +++ gcc/fold-const-call.c 2018-05-17 09:18:19.957942878 +0100 > @@ -1606,6 +1606,26 @@ fold_const_call_ssss (real_value *result > CASE_CFN_FMA_FN: > return do_mpfr_arg3 (result, mpfr_fma, arg0, arg1, arg2, format); > + case CFN_FMS: > + { > + real_value new_arg2 = real_value_negate (arg2); > + return do_mpfr_arg3 (result, mpfr_fma, arg0, arg1, &new_arg2, format); > + } > + > + case CFN_FNMA: > + { > + real_value new_arg0 = 
real_value_negate (arg0); > + return do_mpfr_arg3 (result, mpfr_fma, &new_arg0, arg1, arg2, format); > + } > + > + case CFN_FNMS: > + { > + real_value new_arg0 = real_value_negate (arg0); > + real_value new_arg2 = real_value_negate (arg2); > + return do_mpfr_arg3 (result, mpfr_fma, &new_arg0, arg1, > + &new_arg2, format); > + } > + > default: > return false; > } > @@ -1719,20 +1739,3 @@ fold_const_call (combined_fn fn, tree ty > return fold_const_call_1 (fn, type, arg0, arg1, arg2); > } > } > - > -/* Fold a fma operation with arguments ARG[012]. */ > - > -tree > -fold_fma (location_t, tree type, tree arg0, tree arg1, tree arg2) > -{ > - REAL_VALUE_TYPE result; > - if (real_cst_p (arg0) > - && real_cst_p (arg1) > - && real_cst_p (arg2) > - && do_mpfr_arg3 (&result, mpfr_fma, TREE_REAL_CST_PTR (arg0), > - TREE_REAL_CST_PTR (arg1), TREE_REAL_CST_PTR (arg2), > - REAL_MODE_FORMAT (TYPE_MODE (type)))) > - return build_real (type, result); > - > - return NULL_TREE; > -} > Index: gcc/genmatch.c > =================================================================== > --- gcc/genmatch.c 2018-05-16 12:48:59.410941892 +0100 > +++ gcc/genmatch.c 2018-05-17 09:18:19.960942808 +0100 > @@ -241,6 +241,20 @@ enum internal_fn { > IFN_LAST > }; > +enum combined_fn { > +#define DEF_BUILTIN(ENUM, N, C, T, LT, B, F, NA, AT, IM, COND) \ > + CFN_##ENUM = int (ENUM), > +#include "builtins.def" > + > +#define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) \ > + CFN_##CODE = int (END_BUILTINS) + int (IFN_##CODE), > +#include "internal-fn.def" > + > + CFN_LAST > +}; > + > +#include "case-cfn-macros.h" > + > /* Return true if CODE represents a commutative tree code. Otherwise > return false. */ > bool > @@ -288,7 +302,6 @@ commutative_ternary_tree_code (enum tree > case WIDEN_MULT_PLUS_EXPR: > case WIDEN_MULT_MINUS_EXPR: > case DOT_PROD_EXPR: > - case FMA_EXPR: > return true; > default: > @@ -450,6 +463,44 @@ is_a_helper <user_id *>::test (id_base * > return id->kind == id_base::USER; > } > +/* If ID has a pair of consecutive, commutative operands, return the > + index of the first, otherwise return -1. */ > + > +static int > +commutative_op (id_base *id) > +{ > + if (operator_id *code = dyn_cast <operator_id *> (id)) > + { > + if (commutative_tree_code (code->code) > + || commutative_ternary_tree_code (code->code)) > + return 0; > + return -1; > + } > + if (fn_id *fn = dyn_cast <fn_id *> (id)) > + switch (fn->fn) > + { > + CASE_CFN_FMA: > + case CFN_FMS: > + case CFN_FNMA: > + case CFN_FNMS: > + return 0; > + > + default: > + return -1; > + } > + if (user_id *uid = dyn_cast<user_id *> (id)) > + { > + int res = commutative_op (uid->substitutes[0]); > + if (res < 0) > + return 0; > + for (unsigned i = 1; i < uid->substitutes.length (); ++i) > + if (res != commutative_op (uid->substitutes[i])) > + return -1; > + return res; > + } > + return -1; > +} > + > /* Add a predicate identifier to the hash. */ > static predicate_id * > @@ -946,6 +997,9 @@ commutate (operand *op, vec<vec<user_id > if (!e->is_commutative) > return ret; > + /* The operation is always binary if it isn't inherently commutative. */ > + int natural_opno = commutative_op (e->operation); > + unsigned int opno = natural_opno >= 0 ? 
natural_opno : 0; > for (unsigned i = 0; i < result.length (); ++i) > { > expr *ne = new expr (e); > @@ -994,9 +1048,11 @@ commutate (operand *op, vec<vec<user_id > } > } > ne->is_commutative = false; > - // result[i].length () is 2 since e->operation is binary > - for (unsigned j = result[i].length (); j; --j) > - ne->append_op (result[i][j-1]); > + for (unsigned j = 0; j < result[i].length (); ++j) > + { > + int old_j = (j == opno ? opno + 1 : j == opno + 1 ? opno : j); > + ne->append_op (result[i][old_j]); > + } > ret.safe_push (ne); > } > @@ -2759,24 +2815,18 @@ dt_operand::gen_gimple_expr (FILE *f, in > /* While the toplevel operands are canonicalized by the caller > after valueizing operands of sub-expressions we have to > re-canonicalize operand order. */ > - if (operator_id *code = dyn_cast <operator_id *> (id)) > + int opno = commutative_op (id); > + if (opno >= 0) > { > - /* ??? We can't canonicalize tcc_comparison operands here > - because that requires changing the comparison code which > - we already matched... */ > - if (commutative_tree_code (code->code) > - || commutative_ternary_tree_code (code->code)) > - { > - char child_opname0[20], child_opname1[20]; > - gen_opname (child_opname0, 0); > - gen_opname (child_opname1, 1); > - fprintf_indent (f, indent, > - "if (tree_swap_operands_p (%s, %s))\n", > - child_opname0, child_opname1); > - fprintf_indent (f, indent, > - " std::swap (%s, %s);\n", > - child_opname0, child_opname1); > - } > + char child_opname0[20], child_opname1[20]; > + gen_opname (child_opname0, opno); > + gen_opname (child_opname1, opno + 1); > + fprintf_indent (f, indent, > + "if (tree_swap_operands_p (%s, %s))\n", > + child_opname0, child_opname1); > + fprintf_indent (f, indent, > + " std::swap (%s, %s);\n", > + child_opname0, child_opname1); > } > return n_braces; > @@ -4217,11 +4267,14 @@ parser::parse_expr () > e->operation->id, e->operation->nargs, e->ops.length ()); > if (is_commutative) > { > - if (e->ops.length () == 2) > + if (e->ops.length () == 2 > + || commutative_op (e->operation) >= 0) > e->is_commutative = true; > else > - fatal_at (token, "only binary operators or function with " > - "two arguments can be marked commutative"); > + fatal_at (token, "only binary operators or functions with " > + "two arguments can be marked commutative, " > + "unless the operation is known to be inherently " > + "commutative"); > } > e->expr_type = expr_type; > return op; > Index: gcc/gimple-ssa-backprop.c > =================================================================== > --- gcc/gimple-ssa-backprop.c 2018-05-16 12:48:59.410941892 +0100 > +++ gcc/gimple-ssa-backprop.c 2018-05-17 09:18:19.963942737 +0100 > @@ -375,6 +375,9 @@ backprop::process_builtin_call_use (gcal > CASE_CFN_FMA: > CASE_CFN_FMA_FN: > + case CFN_FMS: > + case CFN_FNMA: > + case CFN_FNMS: > /* In X * X + Y, where Y is distinct from X, the sign of X doesn't > matter. */ > if (gimple_call_arg (call, 0) == rhs > @@ -420,15 +423,6 @@ backprop::process_assign_use (gassign *a > } > break; > - case FMA_EXPR: > - /* In X * X + Y, where Y is distinct from X, the sign of X doesn't > - matter. */ > - if (gimple_assign_rhs1 (assign) == rhs > - && gimple_assign_rhs2 (assign) == rhs > - && gimple_assign_rhs3 (assign) != rhs) > - info->flags.ignore_sign = true; > - break; > - > case MULT_EXPR: > /* In X * X, the sign of X doesn't matter. 
*/ > if (gimple_assign_rhs1 (assign) == rhs > Index: gcc/hsa-gen.c > =================================================================== > --- gcc/hsa-gen.c 2018-05-16 12:48:59.410941892 +0100 > +++ gcc/hsa-gen.c 2018-05-17 09:18:19.973942503 +0100 > @@ -3178,23 +3178,6 @@ gen_hsa_insns_for_operation_assignment ( > case NEGATE_EXPR: > opcode = BRIG_OPCODE_NEG; > break; > - case FMA_EXPR: > - /* There is a native HSA instruction for scalar FMAs but not for vector > - ones. */ > - if (TREE_CODE (TREE_TYPE (lhs)) == VECTOR_TYPE) > - { > - hsa_op_reg *dest > - = hsa_cfun->reg_for_gimple_ssa (gimple_assign_lhs (assign)); > - hsa_op_with_type *op1 = hsa_reg_or_immed_for_gimple_op (rhs1, hbb); > - hsa_op_with_type *op2 = hsa_reg_or_immed_for_gimple_op (rhs2, hbb); > - hsa_op_with_type *op3 = hsa_reg_or_immed_for_gimple_op (rhs3, hbb); > - hsa_op_reg *tmp = new hsa_op_reg (dest->m_type); > - gen_hsa_binary_operation (BRIG_OPCODE_MUL, tmp, op1, op2, hbb); > - gen_hsa_binary_operation (BRIG_OPCODE_ADD, dest, tmp, op3, hbb); > - return; > - } > - opcode = BRIG_OPCODE_MAD; > - break; > case MIN_EXPR: > opcode = BRIG_OPCODE_MIN; > break; > @@ -4490,6 +4473,57 @@ gen_hsa_divmod (gcall *call, hsa_bb *hbb > insn->set_output_in_type (dest, 0, hbb); > } > +/* Emit instructions that implement FMA, FMS, FNMA or FNMS call STMT. > + Instructions are appended to basic block HBB. NEGATE1 is true for > + FNMA and FNMS. NEGATE3 is true for FMS and FNMS. */ > + > +static void > +gen_hsa_fma (gcall *call, hsa_bb *hbb, bool negate1, bool negate3) > +{ > + tree lhs = gimple_call_lhs (call); > + if (lhs == NULL_TREE) > + return; > + > + tree rhs1 = gimple_call_arg (call, 0); > + tree rhs2 = gimple_call_arg (call, 1); > + tree rhs3 = gimple_call_arg (call, 2); > + > + hsa_op_reg *dest = hsa_cfun->reg_for_gimple_ssa (lhs); > + hsa_op_with_type *op1 = hsa_reg_or_immed_for_gimple_op (rhs1, hbb); > + hsa_op_with_type *op2 = hsa_reg_or_immed_for_gimple_op (rhs2, hbb); > + hsa_op_with_type *op3 = hsa_reg_or_immed_for_gimple_op (rhs3, hbb); > + > + if (negate1) > + { > + hsa_op_reg *tmp = new hsa_op_reg (dest->m_type); > + gen_hsa_unary_operation (BRIG_OPCODE_NEG, tmp, op1, hbb); > + op1 = tmp; > + } > + > + /* There is a native HSA instruction for scalar FMAs but not for vector > + ones. */ > + if (TREE_CODE (TREE_TYPE (lhs)) == VECTOR_TYPE) > + { > + hsa_op_reg *tmp = new hsa_op_reg (dest->m_type); > + gen_hsa_binary_operation (BRIG_OPCODE_MUL, tmp, op1, op2, hbb); > + gen_hsa_binary_operation (negate3 ? BRIG_OPCODE_SUB : BRIG_OPCODE_ADD, > + dest, tmp, op3, hbb); > + } > + else > + { > + if (negate3) > + { > + hsa_op_reg *tmp = new hsa_op_reg (dest->m_type); > + gen_hsa_unary_operation (BRIG_OPCODE_NEG, tmp, op3, hbb); > + op3 = tmp; > + } > + hsa_insn_basic *insn = new hsa_insn_basic (4, BRIG_OPCODE_MAD, > + dest->m_type, dest, > + op1, op2, op3); > + hbb->append_insn (insn); > + } > +} > + > /* Set VALUE to a shadow kernel debug argument and append a new instruction > to HBB basic block. 
*/ > @@ -5224,6 +5258,22 @@ gen_hsa_insn_for_internal_fn_call (gcall > gen_hsa_insns_for_call_of_internal_fn (stmt, hbb); > break; > + case IFN_FMA: > + gen_hsa_fma (stmt, hbb, false, false); > + break; > + > + case IFN_FMS: > + gen_hsa_fma (stmt, hbb, false, true); > + break; > + > + case IFN_FNMA: > + gen_hsa_fma (stmt, hbb, true, false); > + break; > + > + case IFN_FNMS: > + gen_hsa_fma (stmt, hbb, true, true); > + break; > + > default: > HSA_SORRY_ATV (gimple_location (stmt), > "support for HSA does not implement internal function: %s", > Index: gcc/match.pd > =================================================================== > --- gcc/match.pd 2018-05-16 12:48:59.410941892 +0100 > +++ gcc/match.pd 2018-05-17 09:18:19.974942479 +0100 > @@ -4702,3 +4702,60 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > || wi::geu_p (wi::to_wide (@rpos), > wi::to_wide (@ipos) + isize)) > (BIT_FIELD_REF @0 @rsize @rpos))))) > + > +(for fmas (FMA) > + (simplify > + (fmas:c (negate @0) @1 @2) > + (IFN_FNMA @0 @1 @2)) > + (simplify > + (fmas @0 @1 (negate @2)) > + (IFN_FMS @0 @1 @2)) > + (simplify > + (fmas:c (negate @0) @1 (negate @2)) > + (IFN_FNMS @0 @1 @2)) > + (simplify > + (negate (fmas@3 @0 @1 @2)) > + (if (single_use (@3)) > + (IFN_FNMS @0 @1 @2)))) > + > +(simplify > + (IFN_FMS:c (negate @0) @1 @2) > + (IFN_FNMS @0 @1 @2)) > +(simplify > + (IFN_FMS @0 @1 (negate @2)) > + (IFN_FMA @0 @1 @2)) > +(simplify > + (IFN_FMS:c (negate @0) @1 (negate @2)) > + (IFN_FNMA @0 @1 @2)) > +(simplify > + (negate (IFN_FMS@3 @0 @1 @2)) > + (if (single_use (@3)) > + (IFN_FNMA @0 @1 @2))) > + > +(simplify > + (IFN_FNMA:c (negate @0) @1 @2) > + (IFN_FMA @0 @1 @2)) > +(simplify > + (IFN_FNMA @0 @1 (negate @2)) > + (IFN_FNMS @0 @1 @2)) > +(simplify > + (IFN_FNMA:c (negate @0) @1 (negate @2)) > + (IFN_FMS @0 @1 @2)) > +(simplify > + (negate (IFN_FNMA@3 @0 @1 @2)) > + (if (single_use (@3)) > + (IFN_FMS @0 @1 @2))) > + > +(simplify > + (IFN_FNMS:c (negate @0) @1 @2) > + (IFN_FMS @0 @1 @2)) > +(simplify > + (IFN_FNMS @0 @1 (negate @2)) > + (IFN_FNMA @0 @1 @2)) > +(simplify > + (IFN_FNMS:c (negate @0) @1 (negate @2)) > + (IFN_FMA @0 @1 @2)) > +(simplify > + (negate (IFN_FNMS@3 @0 @1 @2)) > + (if (single_use (@3)) > + (IFN_FMA @0 @1 @2))) > Index: gcc/gimple-fold.h > =================================================================== > --- gcc/gimple-fold.h 2018-05-17 09:17:32.876478942 +0100 > +++ gcc/gimple-fold.h 2018-05-17 09:18:19.962942761 +0100 > @@ -41,6 +41,7 @@ extern bool arith_overflowed_p (enum tre > const_tree); > extern tree no_follow_ssa_edges (tree); > extern tree follow_single_use_edges (tree); > +extern tree follow_all_ssa_edges (tree); > extern tree gimple_fold_stmt_to_constant_1 (gimple *, tree (*) (tree), > tree (*) (tree) = no_follow_ssa_edges); > extern tree gimple_fold_stmt_to_constant (gimple *, tree (*) (tree)); > Index: gcc/gimple-fold.c > =================================================================== > --- gcc/gimple-fold.c 2018-05-17 09:17:32.876478942 +0100 > +++ gcc/gimple-fold.c 2018-05-17 09:18:19.961942784 +0100 > @@ -4967,6 +4967,14 @@ follow_single_use_edges (tree val) > return val; > } > +/* Valueization callback that follows all SSA edges. */ > + > +tree > +follow_all_ssa_edges (tree val) > +{ > + return val; > +} > + > /* Fold the statement pointed to by GSI. In some cases, this function may > replace the whole statement with a new one. Returns true iff folding > makes any changes. 
> Index: gcc/tree-ssa-math-opts.c > =================================================================== > --- gcc/tree-ssa-math-opts.c 2018-05-16 12:48:59.410941892 +0100 > +++ gcc/tree-ssa-math-opts.c 2018-05-17 09:18:19.982942291 +0100 > @@ -2650,7 +2650,7 @@ convert_mult_to_fma_1 (tree mul_result, > tree type = TREE_TYPE (mul_result); > gimple *use_stmt; > imm_use_iterator imm_iter; > - gassign *fma_stmt; > + gcall *fma_stmt; > FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, mul_result) > { > @@ -2658,6 +2658,7 @@ convert_mult_to_fma_1 (tree mul_result, > enum tree_code use_code; > tree addop, mulop1 = op1, result = mul_result; > bool negate_p = false; > + gimple_seq seq = NULL; > if (is_gimple_debug (use_stmt)) > continue; > @@ -2683,11 +2684,7 @@ convert_mult_to_fma_1 (tree mul_result, > addop = gimple_assign_rhs2 (use_stmt); > /* a * b - c -> a * b + (-c) */ > if (gimple_assign_rhs_code (use_stmt) == MINUS_EXPR) > - addop = force_gimple_operand_gsi (&gsi, > - build1 (NEGATE_EXPR, > - type, addop), > - true, NULL_TREE, true, > - GSI_SAME_STMT); > + addop = gimple_build (&seq, NEGATE_EXPR, type, addop); > } > else > { > @@ -2698,23 +2695,26 @@ convert_mult_to_fma_1 (tree mul_result, > } > if (negate_p) > - mulop1 = force_gimple_operand_gsi (&gsi, > - build1 (NEGATE_EXPR, > - type, mulop1), > - true, NULL_TREE, true, > - GSI_SAME_STMT); > + mulop1 = gimple_build (&seq, NEGATE_EXPR, type, mulop1); > - fma_stmt = gimple_build_assign (gimple_assign_lhs (use_stmt), > - FMA_EXPR, mulop1, op2, addop); > + if (seq) > + gsi_insert_seq_before (&gsi, seq, GSI_SAME_STMT); > + fma_stmt = gimple_build_call_internal (IFN_FMA, 3, mulop1, op2, addop); > + gimple_call_set_lhs (fma_stmt, gimple_assign_lhs (use_stmt)); > + gimple_call_set_nothrow (fma_stmt, !stmt_can_throw_internal (use_stmt)); > + gsi_replace (&gsi, fma_stmt, true); > + /* Follow all SSA edges so that we generate FMS, FNMA and FNMS > + regardless of where the negation occurs. */ > + if (fold_stmt (&gsi, follow_all_ssa_edges)) > + update_stmt (gsi_stmt (gsi)); > if (dump_file && (dump_flags & TDF_DETAILS)) > { > fprintf (dump_file, "Generated FMA "); > - print_gimple_stmt (dump_file, fma_stmt, 0, 0); > + print_gimple_stmt (dump_file, gsi_stmt (gsi), 0, 0); > fprintf (dump_file, "\n"); > } > - gsi_replace (&gsi, fma_stmt, true); > widen_mul_stats.fmas_inserted++; > } > } > @@ -2862,7 +2862,8 @@ convert_mult_to_fma (gimple *mul_stmt, t > /* If the target doesn't support it, don't generate it. We assume that > if fma isn't available then fms, fnma or fnms are not either. */ > - if (optab_handler (fma_optab, TYPE_MODE (type)) == CODE_FOR_nothing) > + optimization_type opt_type = bb_optimization_type (gimple_bb (mul_stmt)); > + if (!direct_internal_fn_supported_p (IFN_FMA, type, opt_type)) > return false; > /* If the multiplication has zero uses, it is kept around probably because > @@ -2958,8 +2959,8 @@ convert_mult_to_fma (gimple *mul_stmt, t > that a mult / subtract pair. 
*/ > if (use_code == MINUS_EXPR && !negate_p > && gimple_assign_rhs1 (use_stmt) == result > - && optab_handler (fms_optab, TYPE_MODE (type)) == CODE_FOR_nothing > - && optab_handler (fnma_optab, TYPE_MODE (type)) != CODE_FOR_nothing) > + && !direct_internal_fn_supported_p (IFN_FMS, type, opt_type) > + && direct_internal_fn_supported_p (IFN_FNMA, type, opt_type)) > { > tree rhs2 = gimple_assign_rhs2 (use_stmt); > Index: gcc/config/i386/i386.c > =================================================================== > --- gcc/config/i386/i386.c 2018-05-16 12:49:10.501715913 +0100 > +++ gcc/config/i386/i386.c 2018-05-17 09:18:19.949943066 +0100 > @@ -50614,19 +50614,20 @@ ix86_add_stmt_cost (void *data, int coun > tree vectype = stmt_info ? stmt_vectype (stmt_info) : NULL_TREE; > int stmt_cost = - 1; > + bool fp = false; > + machine_mode mode = TImode; > + > + if (vectype != NULL) > + { > + fp = FLOAT_TYPE_P (vectype); > + mode = TYPE_MODE (vectype); > + } > + > if ((kind == vector_stmt || kind == scalar_stmt) > && stmt_info > && stmt_info->stmt && gimple_code (stmt_info->stmt) == GIMPLE_ASSIGN) > { > tree_code subcode = gimple_assign_rhs_code (stmt_info->stmt); > - bool fp = false; > - machine_mode mode = TImode; > - > - if (vectype != NULL) > - { > - fp = FLOAT_TYPE_P (vectype); > - mode = TYPE_MODE (vectype); > - } > /*machine_mode inner_mode = mode; > if (VECTOR_MODE_P (mode)) > inner_mode = GET_MODE_INNER (mode);*/ > @@ -50657,12 +50658,6 @@ ix86_add_stmt_cost (void *data, int coun > case MULT_HIGHPART_EXPR: > stmt_cost = ix86_multiplication_cost (ix86_cost, mode); > break; > - case FMA_EXPR: > - stmt_cost = ix86_vec_cost (mode, > - mode == SFmode ? ix86_cost->fmass > - : ix86_cost->fmasd, > - true); > - break; > case NEGATE_EXPR: > if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH) > stmt_cost = ix86_cost->sse_op; > @@ -50725,6 +50720,24 @@ ix86_add_stmt_cost (void *data, int coun > break; > } > } > + > + combined_fn cfn; > + if ((kind == vector_stmt || kind == scalar_stmt) > + && stmt_info > + && stmt_info->stmt > + && (cfn = gimple_call_combined_fn (stmt_info->stmt)) != CFN_LAST) > + switch (cfn) > + { > + case CFN_FMA: > + stmt_cost = ix86_vec_cost (mode, > + mode == SFmode ? ix86_cost->fmass > + : ix86_cost->fmasd, > + true); > + break; > + default: > + break; > + } > + > /* If we do elementwise loads into a vector then we are bound by > latency and execution resources for the many scalar loads > (AGU and load ports). 
Try to account for this by scaling the > Index: gcc/config/rs6000/rs6000.c > =================================================================== > --- gcc/config/rs6000/rs6000.c 2018-05-16 12:48:59.410941892 +0100 > +++ gcc/config/rs6000/rs6000.c 2018-05-17 09:18:19.953942972 +0100 > @@ -15880,7 +15880,9 @@ rs6000_gimple_fold_builtin (gimple_stmt_ > arg1 = gimple_call_arg (stmt, 1); > tree arg2 = gimple_call_arg (stmt, 2); > lhs = gimple_call_lhs (stmt); > - gimple *g = gimple_build_assign (lhs, FMA_EXPR, arg0, arg1, arg2); > + gcall *g = gimple_build_call_internal (IFN_FMA, 3, arg0, arg1, arg2); > + gimple_call_set_lhs (g, lhs); > + gimple_call_set_nothrow (g, true); > gimple_set_location (g, gimple_location (stmt)); > gsi_replace (gsi, g, true); > return true; > Index: gcc/brig/brigfrontend/brig-function.cc > =================================================================== > --- gcc/brig/brigfrontend/brig-function.cc 2018-05-16 12:48:59.410941892 +0100 > +++ gcc/brig/brigfrontend/brig-function.cc 2018-05-17 09:18:19.936943371 +0100 > @@ -1218,6 +1218,7 @@ brig_function::get_builtin_for_hsa_opcod > case BRIG_OPCODE_NEXP2: > builtin = mathfn_built_in (builtin_type, BUILT_IN_EXP2); > break; > + case BRIG_OPCODE_FMA: > case BRIG_OPCODE_NFMA: > builtin = mathfn_built_in (builtin_type, BUILT_IN_FMA); > break; > @@ -1460,8 +1461,6 @@ brig_function::get_tree_code_for_hsa_opc > return CALL_EXPR; > else > return MAX_EXPR; > - case BRIG_OPCODE_FMA: > - return FMA_EXPR; > case BRIG_OPCODE_ABS: > return ABS_EXPR; > case BRIG_OPCODE_SHL: > @@ -1496,6 +1495,7 @@ brig_function::get_tree_code_for_hsa_opc > /* Implement as 1/f (x). gcc should pattern detect that and > use a native instruction, if available, for it. */ > return TREE_LIST; > + case BRIG_OPCODE_FMA: > case BRIG_OPCODE_FLOOR: > case BRIG_OPCODE_CEIL: > case BRIG_OPCODE_SQRT: > Index: gcc/c/gimple-parser.c > =================================================================== > --- gcc/c/gimple-parser.c 2018-05-17 09:17:58.756608780 +0100 > +++ gcc/c/gimple-parser.c 2018-05-17 09:18:19.939943300 +0100 > @@ -952,27 +952,6 @@ c_parser_gimple_postfix_expression (c_pa > expr.value = fold_convert (type, val); > return expr; > } > - else if (strcmp (IDENTIFIER_POINTER (id), "__FMA") == 0) > - { > - c_parser_consume_token (parser); > - auto_vec<tree> args; > - > - if (c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>")) > - { > - c_parser_gimple_expr_list (parser, &args); > - c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, > - "expected %<)%>"); > - } > - if (args.length () != 3) > - { > - error_at (loc, "invalid number of operands to __FMA"); > - expr.value = error_mark_node; > - return expr; > - } > - expr.value = build3_loc (loc, FMA_EXPR, TREE_TYPE (args[0]), > - args[0], args[1], args[2]); > - return expr; > - } > /* SSA name. 
*/ > unsigned version, ver_offset; > Index: gcc/cp/constexpr.c > =================================================================== > --- gcc/cp/constexpr.c 2018-05-16 12:49:10.203809793 +0100 > +++ gcc/cp/constexpr.c 2018-05-17 09:18:19.954942948 +0100 > @@ -4574,7 +4574,6 @@ cxx_eval_constant_expression (const cons > non_constant_p, overflow_p); > break; > - case FMA_EXPR: > case VEC_PERM_EXPR: > r = cxx_eval_trinary_expression (ctx, t, lval, > non_constant_p, overflow_p); > @@ -5999,7 +5998,6 @@ #define RECUR(T,RV) \ > return false; > return true; > - case FMA_EXPR: > case VEC_PERM_EXPR: > for (i = 0; i < 3; ++i) > if (!RECUR (TREE_OPERAND (t, i), true)) > Index: gcc/testsuite/lib/target-supports.exp > =================================================================== > --- gcc/testsuite/lib/target-supports.exp 2018-05-16 12:48:59.410941892 +0100 > +++ gcc/testsuite/lib/target-supports.exp 2018-05-17 09:18:19.978942385 +0100 > @@ -2879,6 +2879,13 @@ proc check_effective_target_base_quadflo > return 1 > } > +# Return 1 if the target supports all four forms of fused multiply-add > +# (fma, fms, fnma, and fnms) for both float and double. > + > +proc check_effective_target_scalar_all_fma { } { > + return [istarget aarch64*-*-*] > +} > + > # Return 1 if the target supports compiling fixed-point, > # 0 otherwise. > Index: gcc/testsuite/gcc.dg/fma-1.c > =================================================================== > --- /dev/null 2018-04-20 16:19:46.369131350 +0100 > +++ gcc/testsuite/gcc.dg/fma-1.c 2018-05-17 09:18:19.975942456 +0100 > @@ -0,0 +1,15 @@ > +/* { dg-options "-O2 -fdump-tree-widening_mul" } */ > + > +float > +f1 (float a, float b, float c) > +{ > + return a * b + c; > +} > + > +double > +f2 (double a, double b, double c) > +{ > + return a * b + c; > +} > + > +/* { dg-final { scan-tree-dump-times { = \.FMA \(} 2 "widening_mul" { target scalar_all_fma } } } */ > Index: gcc/testsuite/gcc.dg/fma-2.c > =================================================================== > --- /dev/null 2018-04-20 16:19:46.369131350 +0100 > +++ gcc/testsuite/gcc.dg/fma-2.c 2018-05-17 09:18:19.975942456 +0100 > @@ -0,0 +1,15 @@ > +/* { dg-options "-O2 -fdump-tree-widening_mul" } */ > + > +float > +f1 (float a, float b, float c) > +{ > + return a * b - c; > +} > + > +double > +f2 (double a, double b, double c) > +{ > + return a * b - c; > +} > + > +/* { dg-final { scan-tree-dump-times { = \.FMS \(} 2 "widening_mul" { target scalar_all_fma } } } */ > Index: gcc/testsuite/gcc.dg/fma-3.c > =================================================================== > --- /dev/null 2018-04-20 16:19:46.369131350 +0100 > +++ gcc/testsuite/gcc.dg/fma-3.c 2018-05-17 09:18:19.975942456 +0100 > @@ -0,0 +1,15 @@ > +/* { dg-options "-O2 -fdump-tree-widening_mul" } */ > + > +float > +f1 (float a, float b, float c) > +{ > + return c - a * b; > +} > + > +double > +f2 (double a, double b, double c) > +{ > + return c - a * b; > +} > + > +/* { dg-final { scan-tree-dump-times { = \.FNMA \(} 2 "widening_mul" { target scalar_all_fma } } } */ > Index: gcc/testsuite/gcc.dg/fma-4.c > =================================================================== > --- /dev/null 2018-04-20 16:19:46.369131350 +0100 > +++ gcc/testsuite/gcc.dg/fma-4.c 2018-05-17 09:18:19.975942456 +0100 > @@ -0,0 +1,15 @@ > +/* { dg-options "-O2 -fdump-tree-widening_mul" } */ > + > +float > +f1 (float a, float b, float c) > +{ > + return -(a * b) - c; > +} > + > +double > +f2 (double a, double b, double c) > +{ > + return -(a * b) - c; > +} > + > +/* { dg-final 
{ scan-tree-dump-times { = \.FNMS \(} 2 "widening_mul" { target scalar_all_fma } } } */ > Index: gcc/testsuite/gcc.dg/fma-5.c > =================================================================== > --- /dev/null 2018-04-20 16:19:46.369131350 +0100 > +++ gcc/testsuite/gcc.dg/fma-5.c 2018-05-17 09:18:19.976942432 +0100 > @@ -0,0 +1,53 @@ > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > + > +float > +f1 (float a, float b, float c) > +{ > + return __builtin_fmaf (a, b, -c); > +} > + > +double > +f2 (double a, double b, double c) > +{ > + return __builtin_fma (a, b, -c); > +} > + > +void > +f3 (float a, float b, float c, float d, float e, float *res) > +{ > + res[0] = __builtin_fmaf (a, b, -e); > + res[1] = __builtin_fmaf (c, d, -e); > +} > + > +void > +f4 (double a, double b, double c, double d, double e, double *res) > +{ > + res[0] = __builtin_fma (a, b, -e); > + res[1] = __builtin_fma (c, d, -e); > +} > + > +float > +f5 (float a, float b, float c) > +{ > + return -__builtin_fmaf (-a, b, c); > +} > + > +double > +f6 (double a, double b, double c) > +{ > + return -__builtin_fma (-a, b, c); > +} > + > +float > +f7 (float a, float b, float c) > +{ > + return -__builtin_fmaf (a, -b, c); > +} > + > +double > +f8 (double a, double b, double c) > +{ > + return -__builtin_fma (a, -b, c); > +} > + > +/* { dg-final { scan-tree-dump-times { = \.FMS \(} 10 "optimized" { target scalar_all_fma } } } */ > Index: gcc/testsuite/gcc.dg/fma-6.c > =================================================================== > --- /dev/null 2018-04-20 16:19:46.369131350 +0100 > +++ gcc/testsuite/gcc.dg/fma-6.c 2018-05-17 09:18:19.976942432 +0100 > @@ -0,0 +1,67 @@ > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > + > +float > +f1 (float a, float b, float c) > +{ > + return __builtin_fmaf (-a, b, c); > +} > + > +double > +f2 (double a, double b, double c) > +{ > + return __builtin_fma (-a, b, c); > +} > + > +float > +f3 (float a, float b, float c) > +{ > + return __builtin_fmaf (a, -b, c); > +} > + > +double > +f4 (double a, double b, double c) > +{ > + return __builtin_fma (a, -b, c); > +} > + > +void > +f5 (float a, float b, float c, float d, float e, float *res) > +{ > + res[0] = __builtin_fmaf (-a, b, c); > + res[1] = __builtin_fmaf (-a, d, e); > +} > + > +void > +f6 (double a, double b, double c, double d, double e, double *res) > +{ > + res[0] = __builtin_fma (-a, b, c); > + res[1] = __builtin_fma (-a, d, e); > +} > + > +void > +f7 (float a, float b, float c, float d, float e, float *res) > +{ > + res[0] = __builtin_fmaf (a, -b, c); > + res[1] = __builtin_fmaf (d, -b, e); > +} > + > +void > +f8 (double a, double b, double c, double d, double e, double *res) > +{ > + res[0] = __builtin_fma (a, -b, c); > + res[1] = __builtin_fma (d, -b, e); > +} > + > +float > +f9 (float a, float b, float c) > +{ > + return -__builtin_fmaf (a, b, -c); > +} > + > +double > +f10 (double a, double b, double c) > +{ > + return -__builtin_fma (a, b, -c); > +} > + > +/* { dg-final { scan-tree-dump-times { = \.FNMA \(} 14 "optimized" { target scalar_all_fma } } } */ > Index: gcc/testsuite/gcc.dg/fma-7.c > =================================================================== > --- /dev/null 2018-04-20 16:19:46.369131350 +0100 > +++ gcc/testsuite/gcc.dg/fma-7.c 2018-05-17 09:18:19.976942432 +0100 > @@ -0,0 +1,67 @@ > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > + > +float > +f1 (float a, float b, float c) > +{ > + return __builtin_fmaf (-a, b, -c); > +} > + > +double > +f2 (double a, double b, double c) > +{ > + return 
__builtin_fma (-a, b, -c); > +} > + > +float > +f3 (float a, float b, float c) > +{ > + return __builtin_fmaf (a, -b, -c); > +} > + > +double > +f4 (double a, double b, double c) > +{ > + return __builtin_fma (a, -b, -c); > +} > + > +void > +f5 (float a, float b, float c, float d, float *res) > +{ > + res[0] = __builtin_fmaf (-a, b, -c); > + res[1] = __builtin_fmaf (-a, d, -c); > +} > + > +void > +f6 (double a, double b, double c, double d, double *res) > +{ > + res[0] = __builtin_fma (-a, b, -c); > + res[1] = __builtin_fma (-a, d, -c); > +} > + > +void > +f7 (float a, float b, float c, float d, float *res) > +{ > + res[0] = __builtin_fmaf (a, -b, -c); > + res[1] = __builtin_fmaf (d, -b, -c); > +} > + > +void > +f8 (double a, double b, double c, double d, double *res) > +{ > + res[0] = __builtin_fma (a, -b, -c); > + res[1] = __builtin_fma (d, -b, -c); > +} > + > +float > +f9 (float a, float b, float c) > +{ > + return -__builtin_fmaf (a, b, c); > +} > + > +double > +f10 (double a, double b, double c) > +{ > + return -__builtin_fma (a, b, c); > +} > + > +/* { dg-final { scan-tree-dump-times { = \.FNMS \(} 14 "optimized" { target scalar_all_fma } } } */ > Index: gcc/testsuite/gcc.dg/gimplefe-26.c > =================================================================== > --- gcc/testsuite/gcc.dg/gimplefe-26.c 2018-05-16 12:48:59.410941892 +0100 > +++ gcc/testsuite/gcc.dg/gimplefe-26.c 2018-05-17 09:18:19.976942432 +0100 > @@ -1,16 +1,15 @@ > -/* { dg-do compile { target c99_runtime } } */ > +/* { dg-do compile { target scalar_all_fma } } */ > /* { dg-options "-O -fgimple -fdump-tree-ssa-gimple" } */ > #define foo(type, num) \ > type __GIMPLE () foo_##num (type a, type b, type c) \ > { \ > type t0; \ > - t0_1 = __FMA (a, b, c); \ > + t0_1 = .FMA (a, b, c); \ > return t0_1; \ > } > foo(float, 1) > foo(double, 2) > -foo(long double, 3) > -/* { dg-final { scan-tree-dump-times "__FMA" 3 "ssa" } } */ > +/* { dg-final { scan-tree-dump-times {\.FMA} 2 "ssa" } } */ > Index: gcc/testsuite/gfortran.dg/reassoc_7.f > =================================================================== > --- gcc/testsuite/gfortran.dg/reassoc_7.f 2018-05-16 12:48:59.410941892 +0100 > +++ gcc/testsuite/gfortran.dg/reassoc_7.f 2018-05-17 09:18:19.976942432 +0100 > @@ -1,5 +1,5 @@ > ! { dg-do compile } > -! { dg-options "-O3 -ffast-math -fdump-tree-optimized" } > +! { dg-options "-O3 -ffast-math -ffp-contract=off -fdump-tree-optimized" } > SUBROUTINE S55199(P,Dvdph) > implicit none > Index: gcc/testsuite/gfortran.dg/reassoc_8.f > =================================================================== > --- gcc/testsuite/gfortran.dg/reassoc_8.f 2018-05-16 12:48:59.410941892 +0100 > +++ gcc/testsuite/gfortran.dg/reassoc_8.f 2018-05-17 09:18:19.977942409 +0100 > @@ -1,5 +1,5 @@ > ! { dg-do compile } > -! { dg-options "-O3 -ffast-math -fdump-tree-optimized" } > +! { dg-options "-O3 -ffast-math -ffp-contract=off -fdump-tree-optimized" } > SUBROUTINE S55199(P,Dvdph) > implicit none > Index: gcc/testsuite/gfortran.dg/reassoc_9.f > =================================================================== > --- gcc/testsuite/gfortran.dg/reassoc_9.f 2018-05-16 12:48:59.410941892 +0100 > +++ gcc/testsuite/gfortran.dg/reassoc_9.f 2018-05-17 09:18:19.977942409 +0100 > @@ -1,5 +1,5 @@ > ! { dg-do compile } > -! { dg-options "-O3 -ffast-math -fdump-tree-optimized" } > +! 
{ dg-options "-O3 -ffast-math -ffp-contract=off -fdump-tree-optimized" } > SUBROUTINE S55199(P,Dvdph) > implicit none > Index: gcc/testsuite/gfortran.dg/reassoc_10.f > =================================================================== > --- gcc/testsuite/gfortran.dg/reassoc_10.f 2018-05-16 12:48:59.410941892 +0100 > +++ gcc/testsuite/gfortran.dg/reassoc_10.f 2018-05-17 09:18:19.976942432 +0100 > @@ -1,5 +1,5 @@ > ! { dg-do compile } > -! { dg-options "-O3 -ffast-math -fdump-tree-optimized" } > +! { dg-options "-O3 -ffast-math -ffp-contract=off -fdump-tree-optimized" } > SUBROUTINE S55199(P,Q,Dvdph) > implicit none
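As a compact illustration of what the gcc.dg/fma-*.c tests quoted above check (a sketch with hypothetical function names, assuming -O2 and a target satisfying the new scalar_all_fma effective target), the intended mapping from source pattern to internal function is:

/* { dg-options "-O2 -fdump-tree-widening_mul" } */

double f_fma  (double a, double b, double c) { return  a * b + c;   }  /* .FMA  (a, b, c) */
double f_fms  (double a, double b, double c) { return  a * b - c;   }  /* .FMS  (a, b, c) */
double f_fnma (double a, double b, double c) { return  c - a * b;   }  /* .FNMA (a, b, c) */
double f_fnms (double a, double b, double c) { return -(a * b) - c; }  /* .FNMS (a, b, c) */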
On Thu, May 17, 2018 at 1:56 AM, Richard Sandiford <richard.sandiford@linaro.org> wrote: > Richard Biener <richard.guenther@gmail.com> writes: >>> @@ -2698,23 +2703,26 @@ convert_mult_to_fma_1 (tree mul_result, >>> } >> >>> if (negate_p) >>> - mulop1 = force_gimple_operand_gsi (&gsi, >>> - build1 (NEGATE_EXPR, >>> - type, mulop1), >>> - true, NULL_TREE, true, >>> - GSI_SAME_STMT); >>> + mulop1 = gimple_build (&seq, NEGATE_EXPR, type, mulop1); >> >>> - fma_stmt = gimple_build_assign (gimple_assign_lhs (use_stmt), >>> - FMA_EXPR, mulop1, op2, addop); >>> + if (seq) >>> + gsi_insert_seq_before (&gsi, seq, GSI_SAME_STMT); >>> + fma_stmt = gimple_build_call_internal (IFN_FMA, 3, mulop1, op2, >> addop); >>> + gimple_call_set_lhs (fma_stmt, gimple_assign_lhs (use_stmt)); >>> + gimple_call_set_nothrow (fma_stmt, !stmt_can_throw_internal >> (use_stmt)); >>> + gsi_replace (&gsi, fma_stmt, true); >>> + /* Valueize aggressively so that we generate FMS, FNMA and FNMS >>> + regardless of where the negation occurs. */ >>> + if (fold_stmt (&gsi, aggressive_valueize)) >>> + update_stmt (gsi_stmt (gsi)); >> >> I think it would be nice to be able to use gimple_build () with IFNs so you >> can >> gimple_build () the IFN and then use gsi_replace_with_seq () on it. You >> only need to fold with generated negates, not with negates already in the >> IL? >> The the folding implied with gimple_build will take care of it. > > The idea was to pick up existing negates that feed the multiplication > as well as any added by the pass itself. > > On IRC yesterday we talked about how this should handle the ECF_NOTHROW > flag, and whether things like IFN_SQRT and IFN_FMA should always be > nothrow (like the built-in functions are). But in the end I thought > it'd be better to keep things as they are. We already handle > -fnon-call-exceptions for unfused a * b + c and before the patch also > handled it for FMA_EXPR. It'd seem like a step backwards if the new > internal functions didn't handle it too. If anything it seems like the > built-in functions should change to be closer to the tree_code and > internal_fn way of doing things, if we want to support -fnon-call-exceptions > properly. > > This also surprised me when doing the if-conversion patch I sent yesterday. > We're happy to vectorise: > > for (int i = 0; i < 100; ++i) > x[i] = ... ? sqrt (x[i]) : 0; > > by doing the sqrt unconditionally and selecting on the result, even with > the default maths flags, but refuse to vectorise the simpler: > > for (int i = 0; i < 100; ++i) > x[i] = ... ? x[i] + 1 : 0; > > in the same way. > >> Otherwise can you please move aggressive_valueize to gimple-fold.[ch] >> alongside no_follow_ssa_edges / follow_single_use_edges and maybe >> rename it as follow_all_ssa_edges? > > Ah, yeah, that's definitely a better name. > > I also renamed all_scalar_fma to scalar_all_fma, since I realised > after Andrew's reply that the old name made it sound like it was > "all scalars", whereas it meant to mean "all fmas". > > Tested as before. > > Thanks, > Richard > > 2018-05-17 Richard Sandiford <richard.sandiford@linaro.org> > > gcc/ > * doc/sourcebuild.texi (scalar_all_fma): Document. > * tree.def (FMA_EXPR): Delete. > * internal-fn.def (FMA, FMS, FNMA, FNMS): New internal functions. > * internal-fn.c (ternary_direct): New macro. > (expand_ternary_optab_fn): Likewise. > (direct_ternary_optab_supported_p): Likewise. > * Makefile.in (build/genmatch.o): Depend on case-fn-macros.h. > * builtins.c (fold_builtin_fma): Delete. 
> (fold_builtin_3): Don't call it. > * cfgexpand.c (expand_debug_expr): Remove FMA_EXPR handling. > * expr.c (expand_expr_real_2): Likewise. > * fold-const.c (operand_equal_p): Likewise. > (fold_ternary_loc): Likewise. > * gimple-pretty-print.c (dump_ternary_rhs): Likewise. > * gimple.c (DEFTREECODE): Likewise. > * gimplify.c (gimplify_expr): Likewise. > * optabs-tree.c (optab_for_tree_code): Likewise. > * tree-cfg.c (verify_gimple_assign_ternary): Likewise. > * tree-eh.c (operation_could_trap_p): Likewise. > (stmt_could_throw_1_p): Likewise. > * tree-inline.c (estimate_operator_cost): Likewise. > * tree-pretty-print.c (dump_generic_node): Likewise. > (op_code_prio): Likewise. > * tree-ssa-loop-im.c (stmt_cost): Likewise. > * tree-ssa-operands.c (get_expr_operands): Likewise. > * tree.c (commutative_ternary_tree_code, add_expr): Likewise. > * fold-const-call.h (fold_fma): Delete. > * fold-const-call.c (fold_const_call_ssss): Handle CFN_FMS, > CFN_FNMA and CFN_FNMS. > (fold_fma): Delete. > * genmatch.c (combined_fn): New enum. > (commutative_ternary_tree_code): Remove FMA_EXPR handling. > (commutative_op): New function. > (commutate): Use it. Handle more than 2 operands. > (dt_operand::gen_gimple_expr): Use commutative_op. > (parser::parse_expr): Allow :c to be used with non-binary > operators if the commutative operand is known. > * gimple-ssa-backprop.c (backprop::process_builtin_call_use): Handle > CFN_FMS, CFN_FNMA and CFN_FNMS. > (backprop::process_assign_use): Remove FMA_EXPR handling. > * hsa-gen.c (gen_hsa_insns_for_operation_assignment): Likewise. > (gen_hsa_fma): New function. > (gen_hsa_insn_for_internal_fn_call): Use it for IFN_FMA, IFN_FMS, > IFN_FNMA and IFN_FNMS. > * match.pd: Add folds for IFN_FMS, IFN_FNMA and IFN_FNMS. > * gimple-fold.h (follow_all_ssa_edges): Declare. > * gimple-fold.c (follow_all_ssa_edges): New function. > * tree-ssa-math-opts.c (convert_mult_to_fma_1): Use the > gimple_build interface and use follow_all_ssa_edges to fold the result. > (convert_mult_to_fma): Use direct_internal_fn_suppoerted_p > instead of checking for optabs directly. > * config/i386/i386.c (ix86_add_stmt_cost): Recognize FMAs as calls > rather than FMA_EXPRs. > * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Create a > call to IFN_FMA instead of an FMA_EXPR. > > gcc/brig/ > * brigfrontend/brig-function.cc > (brig_function::get_builtin_for_hsa_opcode): Use BUILT_IN_FMA > for BRIG_OPCODE_FMA. > (brig_function::get_tree_code_for_hsa_opcode): Treat BUILT_IN_FMA > as a call. > > gcc/c/ > * gimple-parser.c (c_parser_gimple_postfix_expression): Remove > __FMA_EXPR handlng. > > gcc/cp/ > * constexpr.c (cxx_eval_constant_expression): Remove FMA_EXPR handling. > (potential_constant_expression_1): Likewise. > > gcc/testsuite/ > * lib/target-supports.exp (check_effective_target_scalar_all_fma): > New proc. > * gcc.dg/fma-1.c: New test. > * gcc.dg/fma-2.c: Likewise. > * gcc.dg/fma-3.c: Likewise. > * gcc.dg/fma-4.c: Likewise. > * gcc.dg/fma-5.c: Likewise. > * gcc.dg/fma-6.c: Likewise. > * gcc.dg/fma-7.c: Likewise. > * gcc.dg/gimplefe-26.c: Use .FMA instead of __FMA and require > scalar_all_fma. > * gfortran.dg/reassoc_7.f: Pass -ffp-contract=off. > * gfortran.dg/reassoc_8.f: Likewise. > * gfortran.dg/reassoc_9.f: Likewise. > * gfortran.dg/reassoc_10.f: Likewise. > This caused: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85881 H.J.
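To illustrate the "valueize aggressively" point in the discussion quoted above, here is a minimal sketch (hypothetical functions, assuming -O2 on a scalar_all_fma target): in g1 the negation already exists in the IL and feeds the multiplication, while in g2 the pass itself introduces a negation for the subtracted addend. In both cases the IFN_FMA call built by convert_mult_to_fma_1 is expected to be re-folded through the new match.pd rules, so the widening_mul dump should show .FNMA and .FMS rather than a plain .FMA fed by negates.

double
g1 (double a, double b, double c)
{
  double neg_a = -a;
  return neg_a * b + c;   /* expected to become .FNMA (a, b, c) */
}

double
g2 (double a, double b, double c)
{
  return a * b - c;       /* expected to become .FMS (a, b, c) */
}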
"H.J. Lu" <hjl.tools@gmail.com> writes: > On Thu, May 17, 2018 at 1:56 AM, Richard Sandiford > <richard.sandiford@linaro.org> wrote: >> Richard Biener <richard.guenther@gmail.com> writes: >>>> @@ -2698,23 +2703,26 @@ convert_mult_to_fma_1 (tree mul_result, >>>> } >>> >>>> if (negate_p) >>>> - mulop1 = force_gimple_operand_gsi (&gsi, >>>> - build1 (NEGATE_EXPR, >>>> - type, mulop1), >>>> - true, NULL_TREE, true, >>>> - GSI_SAME_STMT); >>>> + mulop1 = gimple_build (&seq, NEGATE_EXPR, type, mulop1); >>> >>>> - fma_stmt = gimple_build_assign (gimple_assign_lhs (use_stmt), >>>> - FMA_EXPR, mulop1, op2, addop); >>>> + if (seq) >>>> + gsi_insert_seq_before (&gsi, seq, GSI_SAME_STMT); >>>> + fma_stmt = gimple_build_call_internal (IFN_FMA, 3, mulop1, op2, >>> addop); >>>> + gimple_call_set_lhs (fma_stmt, gimple_assign_lhs (use_stmt)); >>>> + gimple_call_set_nothrow (fma_stmt, !stmt_can_throw_internal >>> (use_stmt)); >>>> + gsi_replace (&gsi, fma_stmt, true); >>>> + /* Valueize aggressively so that we generate FMS, FNMA and FNMS >>>> + regardless of where the negation occurs. */ >>>> + if (fold_stmt (&gsi, aggressive_valueize)) >>>> + update_stmt (gsi_stmt (gsi)); >>> >>> I think it would be nice to be able to use gimple_build () with IFNs so you >>> can >>> gimple_build () the IFN and then use gsi_replace_with_seq () on it. You >>> only need to fold with generated negates, not with negates already in the >>> IL? >>> The the folding implied with gimple_build will take care of it. >> >> The idea was to pick up existing negates that feed the multiplication >> as well as any added by the pass itself. >> >> On IRC yesterday we talked about how this should handle the ECF_NOTHROW >> flag, and whether things like IFN_SQRT and IFN_FMA should always be >> nothrow (like the built-in functions are). But in the end I thought >> it'd be better to keep things as they are. We already handle >> -fnon-call-exceptions for unfused a * b + c and before the patch also >> handled it for FMA_EXPR. It'd seem like a step backwards if the new >> internal functions didn't handle it too. If anything it seems like the >> built-in functions should change to be closer to the tree_code and >> internal_fn way of doing things, if we want to support -fnon-call-exceptions >> properly. >> >> This also surprised me when doing the if-conversion patch I sent yesterday. >> We're happy to vectorise: >> >> for (int i = 0; i < 100; ++i) >> x[i] = ... ? sqrt (x[i]) : 0; >> >> by doing the sqrt unconditionally and selecting on the result, even with >> the default maths flags, but refuse to vectorise the simpler: >> >> for (int i = 0; i < 100; ++i) >> x[i] = ... ? x[i] + 1 : 0; >> >> in the same way. >> >>> Otherwise can you please move aggressive_valueize to gimple-fold.[ch] >>> alongside no_follow_ssa_edges / follow_single_use_edges and maybe >>> rename it as follow_all_ssa_edges? >> >> Ah, yeah, that's definitely a better name. >> >> I also renamed all_scalar_fma to scalar_all_fma, since I realised >> after Andrew's reply that the old name made it sound like it was >> "all scalars", whereas it meant to mean "all fmas". >> >> Tested as before. >> >> Thanks, >> Richard >> >> 2018-05-17 Richard Sandiford <richard.sandiford@linaro.org> >> >> gcc/ >> * doc/sourcebuild.texi (scalar_all_fma): Document. >> * tree.def (FMA_EXPR): Delete. >> * internal-fn.def (FMA, FMS, FNMA, FNMS): New internal functions. >> * internal-fn.c (ternary_direct): New macro. >> (expand_ternary_optab_fn): Likewise. >> (direct_ternary_optab_supported_p): Likewise. 
>> * Makefile.in (build/genmatch.o): Depend on case-fn-macros.h. >> * builtins.c (fold_builtin_fma): Delete. >> (fold_builtin_3): Don't call it. >> * cfgexpand.c (expand_debug_expr): Remove FMA_EXPR handling. >> * expr.c (expand_expr_real_2): Likewise. >> * fold-const.c (operand_equal_p): Likewise. >> (fold_ternary_loc): Likewise. >> * gimple-pretty-print.c (dump_ternary_rhs): Likewise. >> * gimple.c (DEFTREECODE): Likewise. >> * gimplify.c (gimplify_expr): Likewise. >> * optabs-tree.c (optab_for_tree_code): Likewise. >> * tree-cfg.c (verify_gimple_assign_ternary): Likewise. >> * tree-eh.c (operation_could_trap_p): Likewise. >> (stmt_could_throw_1_p): Likewise. >> * tree-inline.c (estimate_operator_cost): Likewise. >> * tree-pretty-print.c (dump_generic_node): Likewise. >> (op_code_prio): Likewise. >> * tree-ssa-loop-im.c (stmt_cost): Likewise. >> * tree-ssa-operands.c (get_expr_operands): Likewise. >> * tree.c (commutative_ternary_tree_code, add_expr): Likewise. >> * fold-const-call.h (fold_fma): Delete. >> * fold-const-call.c (fold_const_call_ssss): Handle CFN_FMS, >> CFN_FNMA and CFN_FNMS. >> (fold_fma): Delete. >> * genmatch.c (combined_fn): New enum. >> (commutative_ternary_tree_code): Remove FMA_EXPR handling. >> (commutative_op): New function. >> (commutate): Use it. Handle more than 2 operands. >> (dt_operand::gen_gimple_expr): Use commutative_op. >> (parser::parse_expr): Allow :c to be used with non-binary >> operators if the commutative operand is known. >> * gimple-ssa-backprop.c (backprop::process_builtin_call_use): Handle >> CFN_FMS, CFN_FNMA and CFN_FNMS. >> (backprop::process_assign_use): Remove FMA_EXPR handling. >> * hsa-gen.c (gen_hsa_insns_for_operation_assignment): Likewise. >> (gen_hsa_fma): New function. >> (gen_hsa_insn_for_internal_fn_call): Use it for IFN_FMA, IFN_FMS, >> IFN_FNMA and IFN_FNMS. >> * match.pd: Add folds for IFN_FMS, IFN_FNMA and IFN_FNMS. >> * gimple-fold.h (follow_all_ssa_edges): Declare. >> * gimple-fold.c (follow_all_ssa_edges): New function. >> * tree-ssa-math-opts.c (convert_mult_to_fma_1): Use the >> gimple_build interface and use follow_all_ssa_edges to fold >> the result. >> (convert_mult_to_fma): Use direct_internal_fn_suppoerted_p >> instead of checking for optabs directly. >> * config/i386/i386.c (ix86_add_stmt_cost): Recognize FMAs as calls >> rather than FMA_EXPRs. >> * config/rs6000/rs6000.c (rs6000_gimple_fold_builtin): Create a >> call to IFN_FMA instead of an FMA_EXPR. >> >> gcc/brig/ >> * brigfrontend/brig-function.cc >> (brig_function::get_builtin_for_hsa_opcode): Use BUILT_IN_FMA >> for BRIG_OPCODE_FMA. >> (brig_function::get_tree_code_for_hsa_opcode): Treat BUILT_IN_FMA >> as a call. >> >> gcc/c/ >> * gimple-parser.c (c_parser_gimple_postfix_expression): Remove >> __FMA_EXPR handlng. >> >> gcc/cp/ >> * constexpr.c (cxx_eval_constant_expression): Remove FMA_EXPR >> handling. >> (potential_constant_expression_1): Likewise. >> >> gcc/testsuite/ >> * lib/target-supports.exp (check_effective_target_scalar_all_fma): >> New proc. >> * gcc.dg/fma-1.c: New test. >> * gcc.dg/fma-2.c: Likewise. >> * gcc.dg/fma-3.c: Likewise. >> * gcc.dg/fma-4.c: Likewise. >> * gcc.dg/fma-5.c: Likewise. >> * gcc.dg/fma-6.c: Likewise. >> * gcc.dg/fma-7.c: Likewise. >> * gcc.dg/gimplefe-26.c: Use .FMA instead of __FMA and require >> scalar_all_fma. >> * gfortran.dg/reassoc_7.f: Pass -ffp-contract=off. >> * gfortran.dg/reassoc_8.f: Likewise. >> * gfortran.dg/reassoc_9.f: Likewise. >> * gfortran.dg/reassoc_10.f: Likewise. 
>>
>
> This caused:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85881

Sorry about that. Looking at it, I think it's another instance of the
original RA problem in PR 80481. The expand code after this patch is the
same as before apart from the numbering of the pseudo registers: after the
patch there are fewer unused pseudo registers than before, which in itself
should be a good thing. However, it seems to be enough to defeat the fix
for PR80481, and in fact reverting that fix makes pr80481.C pass again.

Thanks,
Richard
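As a small illustration of the -fnon-call-exceptions point discussed above (my example, not from the patch): compiled with -O2 -fnon-call-exceptions, the fused operation formed from the statement below may still need an EH edge, which is why the new internal functions are declared ECF_CONST rather than ECF_NOTHROW and why convert_mult_to_fma_1 copies the throwing-ness of the statement it replaces via gimple_call_set_nothrow.

double
fma_may_throw (double a, double b, double c)
{
  /* With -fnon-call-exceptions the contracted multiply-add can remain a
     potentially-throwing statement, matching the behaviour of the old
     FMA_EXPR and of the unfused a * b + c.  */
  return a * b + c;
}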
Index: gcc/doc/sourcebuild.texi =================================================================== --- gcc/doc/sourcebuild.texi 2018-05-08 09:42:01.479723260 +0100 +++ gcc/doc/sourcebuild.texi 2018-05-11 18:08:24.833947453 +0100 @@ -2170,6 +2170,11 @@ Target supports wide characters. @subsubsection Other attributes @table @code +@item all_scalar_fma +Target supports all four fused multiply-add optabs for both @code{float} +and @code{double}. These optabs are: @code{fma_optab}, @code{fms_optab}, +@code{fnma_optab} and @code{fnms_optab}. + @item automatic_stack_alignment Target supports automatic stack alignment. Index: gcc/tree.def =================================================================== --- gcc/tree.def 2018-01-03 11:12:58.606649676 +0000 +++ gcc/tree.def 2018-05-11 18:08:24.852946671 +0100 @@ -1345,12 +1345,6 @@ DEFTREECODE (WIDEN_MULT_MINUS_EXPR, "wid by the second argument. */ DEFTREECODE (WIDEN_LSHIFT_EXPR, "widen_lshift_expr", tcc_binary, 2) -/* Fused multiply-add. - All operands and the result are of the same type. No intermediate - rounding is performed after multiplying operand one with operand two - before adding operand three. */ -DEFTREECODE (FMA_EXPR, "fma_expr", tcc_expression, 3) - /* Widening vector multiplication. The two operands are vectors with N elements of size S. Multiplying the elements of the two vectors will result in N products of size 2*S. Index: gcc/internal-fn.def =================================================================== --- gcc/internal-fn.def 2018-02-20 09:40:41.809452604 +0000 +++ gcc/internal-fn.def 2018-05-11 18:08:24.842947083 +0100 @@ -57,6 +57,7 @@ along with GCC; see the file COPYING3. - unary: a normal unary optab, such as vec_reverse_<mode> - binary: a normal binary optab, such as vec_interleave_lo_<mode> + - ternary: a normal ternary optab, such as fma<mode>4 - cond_binary: a conditional binary optab, such as add<mode>cc @@ -138,6 +139,10 @@ DEF_INTERNAL_OPTAB_FN (WHILE_ULT, ECF_CO DEF_INTERNAL_OPTAB_FN (VEC_SHL_INSERT, ECF_CONST | ECF_NOTHROW, vec_shl_insert, binary) +DEF_INTERNAL_OPTAB_FN (FMS, ECF_CONST, fms, ternary) +DEF_INTERNAL_OPTAB_FN (FNMA, ECF_CONST, fnma, ternary) +DEF_INTERNAL_OPTAB_FN (FNMS, ECF_CONST, fnms, ternary) + DEF_INTERNAL_OPTAB_FN (COND_ADD, ECF_CONST, cond_add, cond_binary) DEF_INTERNAL_OPTAB_FN (COND_SUB, ECF_CONST, cond_sub, cond_binary) DEF_INTERNAL_SIGNED_OPTAB_FN (COND_MIN, ECF_CONST, first, @@ -218,6 +223,9 @@ DEF_INTERNAL_OPTAB_FN (XORSIGN, ECF_CONS /* FP scales. */ DEF_INTERNAL_FLT_FN (LDEXP, ECF_CONST, ldexp, binary) +/* Ternary math functions. */ +DEF_INTERNAL_FLT_FN (FMA, ECF_CONST, fma, ternary) + /* Unary integer ops. 
*/ DEF_INTERNAL_INT_FN (CLRSB, ECF_CONST | ECF_NOTHROW, clrsb, unary) DEF_INTERNAL_INT_FN (CLZ, ECF_CONST | ECF_NOTHROW, clz, unary) Index: gcc/internal-fn.c =================================================================== --- gcc/internal-fn.c 2018-03-02 09:44:45.456595753 +0000 +++ gcc/internal-fn.c 2018-05-11 18:08:24.841947124 +0100 @@ -90,6 +90,7 @@ #define mask_store_lanes_direct { 0, 0, #define scatter_store_direct { 3, 3, false } #define unary_direct { 0, 0, true } #define binary_direct { 0, 0, true } +#define ternary_direct { 0, 0, true } #define cond_unary_direct { 1, 1, true } #define cond_binary_direct { 1, 1, true } #define while_direct { 0, 2, false } @@ -2962,6 +2963,9 @@ #define expand_unary_optab_fn(FN, STMT, #define expand_binary_optab_fn(FN, STMT, OPTAB) \ expand_direct_optab_fn (FN, STMT, OPTAB, 2) +#define expand_ternary_optab_fn(FN, STMT, OPTAB) \ + expand_direct_optab_fn (FN, STMT, OPTAB, 3) + #define expand_cond_unary_optab_fn(FN, STMT, OPTAB) \ expand_direct_optab_fn (FN, STMT, OPTAB, 2) @@ -3047,6 +3051,7 @@ multi_vector_optab_supported_p (convert_ #define direct_unary_optab_supported_p direct_optab_supported_p #define direct_binary_optab_supported_p direct_optab_supported_p +#define direct_ternary_optab_supported_p direct_optab_supported_p #define direct_cond_unary_optab_supported_p direct_optab_supported_p #define direct_cond_binary_optab_supported_p direct_optab_supported_p #define direct_mask_load_optab_supported_p direct_optab_supported_p Index: gcc/Makefile.in =================================================================== --- gcc/Makefile.in 2018-03-13 15:06:01.749426280 +0000 +++ gcc/Makefile.in 2018-05-11 18:08:24.816948152 +0100 @@ -2786,7 +2786,7 @@ build/genmddump.o : genmddump.c $(RTL_BA $(CORETYPES_H) $(GTM_H) errors.h $(READ_MD_H) $(GENSUPPORT_H) build/genmatch.o : genmatch.c $(BCONFIG_H) $(SYSTEM_H) \ $(CORETYPES_H) errors.h $(HASH_TABLE_H) hash-map.h $(GGC_H) is-a.h \ - tree.def builtins.def internal-fn.def + tree.def builtins.def internal-fn.def case-cfn-macros.h build/gencfn-macros.o : gencfn-macros.c $(BCONFIG_H) $(SYSTEM_H) \ $(CORETYPES_H) errors.h $(HASH_TABLE_H) hash-set.h builtins.def \ internal-fn.def Index: gcc/builtins.c =================================================================== --- gcc/builtins.c 2018-05-08 09:42:01.633717606 +0100 +++ gcc/builtins.c 2018-05-11 18:08:24.817948111 +0100 @@ -8340,21 +8340,6 @@ fold_builtin_abs (location_t loc, tree a return fold_build1_loc (loc, ABS_EXPR, type, arg); } -/* Fold a call to fma, fmaf, or fmal with arguments ARG[012]. */ - -static tree -fold_builtin_fma (location_t loc, tree arg0, tree arg1, tree arg2, tree type) -{ - /* ??? Only expand to FMA_EXPR if it's directly supported. */ - if (validate_arg (arg0, REAL_TYPE) - && validate_arg (arg1, REAL_TYPE) - && validate_arg (arg2, REAL_TYPE) - && optab_handler (fma_optab, TYPE_MODE (type)) != CODE_FOR_nothing) - return fold_build3_loc (loc, FMA_EXPR, type, arg0, arg1, arg2); - - return NULL_TREE; -} - /* Fold a call to builtin carg(a+bi) -> atan2(b,a). 
*/ static tree @@ -9260,10 +9245,6 @@ fold_builtin_3 (location_t loc, tree fnd CASE_FLT_FN (BUILT_IN_SINCOS): return fold_builtin_sincos (loc, arg0, arg1, arg2); - CASE_FLT_FN (BUILT_IN_FMA): - CASE_FLT_FN_FLOATN_NX (BUILT_IN_FMA): - return fold_builtin_fma (loc, arg0, arg1, arg2, type); - CASE_FLT_FN (BUILT_IN_REMQUO): if (validate_arg (arg0, REAL_TYPE) && validate_arg (arg1, REAL_TYPE) Index: gcc/cfgexpand.c =================================================================== --- gcc/cfgexpand.c 2018-05-09 11:34:47.190553782 +0100 +++ gcc/cfgexpand.c 2018-05-11 18:08:24.818948070 +0100 @@ -4202,7 +4202,6 @@ expand_debug_expr (tree exp) case SAD_EXPR: case WIDEN_MULT_PLUS_EXPR: case WIDEN_MULT_MINUS_EXPR: - case FMA_EXPR: goto ternary; case TRUTH_ANDIF_EXPR: @@ -5190,9 +5189,6 @@ expand_debug_expr (tree exp) } return NULL; - case FMA_EXPR: - return simplify_gen_ternary (FMA, mode, inner_mode, op0, op1, op2); - default: flag_unsupported: if (flag_checking) Index: gcc/expr.c =================================================================== --- gcc/expr.c 2018-05-01 19:30:32.099548805 +0100 +++ gcc/expr.c 2018-05-11 18:08:24.833947453 +0100 @@ -8853,67 +8853,6 @@ #define REDUCE_BIT_FIELD(expr) (reduce_b expand_operands (treeop0, treeop1, subtarget, &op0, &op1, EXPAND_NORMAL); return REDUCE_BIT_FIELD (expand_mult (mode, op0, op1, target, unsignedp)); - case FMA_EXPR: - { - optab opt = fma_optab; - gimple *def0, *def2; - - /* If there is no insn for FMA, emit it as __builtin_fma{,f,l} - call. */ - if (optab_handler (fma_optab, mode) == CODE_FOR_nothing) - { - tree fn = mathfn_built_in (TREE_TYPE (treeop0), BUILT_IN_FMA); - tree call_expr; - - gcc_assert (fn != NULL_TREE); - call_expr = build_call_expr (fn, 3, treeop0, treeop1, treeop2); - return expand_builtin (call_expr, target, subtarget, mode, false); - } - - def0 = get_def_for_expr (treeop0, NEGATE_EXPR); - /* The multiplication is commutative - look at its 2nd operand - if the first isn't fed by a negate. */ - if (!def0) - { - def0 = get_def_for_expr (treeop1, NEGATE_EXPR); - /* Swap operands if the 2nd operand is fed by a negate. 
*/ - if (def0) - std::swap (treeop0, treeop1); - } - def2 = get_def_for_expr (treeop2, NEGATE_EXPR); - - op0 = op2 = NULL; - - if (def0 && def2 - && optab_handler (fnms_optab, mode) != CODE_FOR_nothing) - { - opt = fnms_optab; - op0 = expand_normal (gimple_assign_rhs1 (def0)); - op2 = expand_normal (gimple_assign_rhs1 (def2)); - } - else if (def0 - && optab_handler (fnma_optab, mode) != CODE_FOR_nothing) - { - opt = fnma_optab; - op0 = expand_normal (gimple_assign_rhs1 (def0)); - } - else if (def2 - && optab_handler (fms_optab, mode) != CODE_FOR_nothing) - { - opt = fms_optab; - op2 = expand_normal (gimple_assign_rhs1 (def2)); - } - - if (op0 == NULL) - op0 = expand_expr (treeop0, subtarget, VOIDmode, EXPAND_NORMAL); - if (op2 == NULL) - op2 = expand_normal (treeop2); - op1 = expand_normal (treeop1); - - return expand_ternary_op (TYPE_MODE (type), opt, - op0, op1, op2, target, 0); - } - case MULT_EXPR: /* If this is a fixed-point operation, then we cannot use the code below because "expand_mult" doesn't support sat/no-sat fixed-point Index: gcc/fold-const.c =================================================================== --- gcc/fold-const.c 2018-05-08 09:42:01.637717459 +0100 +++ gcc/fold-const.c 2018-05-11 18:08:24.835947371 +0100 @@ -3297,7 +3297,6 @@ #define OP_SAME_WITH_NULL(N) \ case TRUTH_ORIF_EXPR: return OP_SAME (0) && OP_SAME (1); - case FMA_EXPR: case WIDEN_MULT_PLUS_EXPR: case WIDEN_MULT_MINUS_EXPR: if (!OP_SAME (2)) @@ -11708,17 +11707,6 @@ fold_ternary_loc (location_t loc, enum t return NULL_TREE; - case FMA_EXPR: - /* For integers we can decompose the FMA if possible. */ - if (TREE_CODE (arg0) == INTEGER_CST - && TREE_CODE (arg1) == INTEGER_CST) - return fold_build2_loc (loc, PLUS_EXPR, type, - const_binop (MULT_EXPR, arg0, arg1), arg2); - if (integer_zerop (arg2)) - return fold_build2_loc (loc, MULT_EXPR, type, arg0, arg1); - - return fold_fma (loc, type, arg0, arg1, arg2); - case VEC_PERM_EXPR: if (TREE_CODE (arg2) == VECTOR_CST) { Index: gcc/gimple-pretty-print.c =================================================================== --- gcc/gimple-pretty-print.c 2018-02-13 10:28:33.325853088 +0000 +++ gcc/gimple-pretty-print.c 2018-05-11 18:08:24.836947330 +0100 @@ -490,27 +490,6 @@ dump_ternary_rhs (pretty_printer *buffer pp_greater (buffer); break; - case FMA_EXPR: - if (flags & TDF_GIMPLE) - { - pp_string (buffer, "__FMA ("); - dump_generic_node (buffer, gimple_assign_rhs1 (gs), spc, flags, false); - pp_comma (buffer); - dump_generic_node (buffer, gimple_assign_rhs2 (gs), spc, flags, false); - pp_comma (buffer); - dump_generic_node (buffer, gimple_assign_rhs3 (gs), spc, flags, false); - pp_right_paren (buffer); - } - else - { - dump_generic_node (buffer, gimple_assign_rhs1 (gs), spc, flags, false); - pp_string (buffer, " * "); - dump_generic_node (buffer, gimple_assign_rhs2 (gs), spc, flags, false); - pp_string (buffer, " + "); - dump_generic_node (buffer, gimple_assign_rhs3 (gs), spc, flags, false); - } - break; - case DOT_PROD_EXPR: pp_string (buffer, "DOT_PROD_EXPR <"); dump_generic_node (buffer, gimple_assign_rhs1 (gs), spc, flags, false); Index: gcc/gimple.c =================================================================== --- gcc/gimple.c 2018-02-13 10:28:33.415861313 +0000 +++ gcc/gimple.c 2018-05-11 18:08:24.838947247 +0100 @@ -2143,8 +2143,7 @@ #define DEFTREECODE(SYM, STRING, TYPE, N || (SYM) == REALIGN_LOAD_EXPR \ || (SYM) == VEC_COND_EXPR \ || (SYM) == VEC_PERM_EXPR \ - || (SYM) == BIT_INSERT_EXPR \ - || (SYM) == FMA_EXPR) ? 
GIMPLE_TERNARY_RHS \ + || (SYM) == BIT_INSERT_EXPR) ? GIMPLE_TERNARY_RHS \ : ((SYM) == CONSTRUCTOR \ || (SYM) == OBJ_TYPE_REF \ || (SYM) == ASSERT_EXPR \ Index: gcc/gimplify.c =================================================================== --- gcc/gimplify.c 2018-05-08 09:42:02.972668452 +0100 +++ gcc/gimplify.c 2018-05-11 18:08:24.840947165 +0100 @@ -12086,7 +12086,6 @@ gimplify_expr (tree *expr_p, gimple_seq } break; - case FMA_EXPR: case VEC_PERM_EXPR: /* Classified as tcc_expression. */ goto expr_3; Index: gcc/optabs-tree.c =================================================================== --- gcc/optabs-tree.c 2018-01-13 18:01:26.108685820 +0000 +++ gcc/optabs-tree.c 2018-05-11 18:08:24.842947083 +0100 @@ -143,9 +143,6 @@ optab_for_tree_code (enum tree_code code : (TYPE_SATURATING (type) ? ssmsub_widen_optab : smsub_widen_optab)); - case FMA_EXPR: - return fma_optab; - case VEC_WIDEN_MULT_HI_EXPR: return TYPE_UNSIGNED (type) ? vec_widen_umult_hi_optab : vec_widen_smult_hi_optab; Index: gcc/tree-cfg.c =================================================================== --- gcc/tree-cfg.c 2018-05-01 19:31:03.079312535 +0100 +++ gcc/tree-cfg.c 2018-05-11 18:08:24.848946836 +0100 @@ -4109,20 +4109,6 @@ verify_gimple_assign_ternary (gassign *s } break; - case FMA_EXPR: - if (!useless_type_conversion_p (lhs_type, rhs1_type) - || !useless_type_conversion_p (lhs_type, rhs2_type) - || !useless_type_conversion_p (lhs_type, rhs3_type)) - { - error ("type mismatch in fused multiply-add expression"); - debug_generic_expr (lhs_type); - debug_generic_expr (rhs1_type); - debug_generic_expr (rhs2_type); - debug_generic_expr (rhs3_type); - return true; - } - break; - case VEC_COND_EXPR: if (!VECTOR_BOOLEAN_TYPE_P (rhs1_type) || maybe_ne (TYPE_VECTOR_SUBPARTS (rhs1_type), Index: gcc/tree-eh.c =================================================================== --- gcc/tree-eh.c 2018-02-08 13:34:20.321293427 +0000 +++ gcc/tree-eh.c 2018-05-11 18:08:24.849946795 +0100 @@ -2512,8 +2512,7 @@ operation_could_trap_p (enum tree_code o if (TREE_CODE_CLASS (op) != tcc_comparison && TREE_CODE_CLASS (op) != tcc_unary - && TREE_CODE_CLASS (op) != tcc_binary - && op != FMA_EXPR) + && TREE_CODE_CLASS (op) != tcc_binary) return false; return operation_could_trap_helper_p (op, fp_operation, honor_trapv, @@ -2825,8 +2824,7 @@ stmt_could_throw_1_p (gassign *stmt) if (TREE_CODE_CLASS (code) == tcc_comparison || TREE_CODE_CLASS (code) == tcc_unary - || TREE_CODE_CLASS (code) == tcc_binary - || code == FMA_EXPR) + || TREE_CODE_CLASS (code) == tcc_binary) { if (TREE_CODE_CLASS (code) == tcc_comparison) t = TREE_TYPE (gimple_assign_rhs1 (stmt)); Index: gcc/tree-inline.c =================================================================== --- gcc/tree-inline.c 2018-03-13 15:06:01.738427167 +0000 +++ gcc/tree-inline.c 2018-05-11 18:08:24.849946795 +0100 @@ -3855,7 +3855,6 @@ estimate_operator_cost (enum tree_code c case MINUS_EXPR: case MULT_EXPR: case MULT_HIGHPART_EXPR: - case FMA_EXPR: case ADDR_SPACE_CONVERT_EXPR: case FIXED_CONVERT_EXPR: Index: gcc/tree-pretty-print.c =================================================================== --- gcc/tree-pretty-print.c 2018-03-13 15:06:00.523525161 +0000 +++ gcc/tree-pretty-print.c 2018-05-11 18:08:24.850946754 +0100 @@ -2901,16 +2901,6 @@ dump_generic_node (pretty_printer *pp, t pp_string (pp, " > "); break; - case FMA_EXPR: - pp_string (pp, " FMA_EXPR < "); - dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false); - pp_string (pp, ", "); - dump_generic_node (pp, 
TREE_OPERAND (node, 1), spc, flags, false); - pp_string (pp, ", "); - dump_generic_node (pp, TREE_OPERAND (node, 2), spc, flags, false); - pp_string (pp, " > "); - break; - case OACC_PARALLEL: pp_string (pp, "#pragma acc parallel"); goto dump_omp_clauses_body; @@ -3549,7 +3539,6 @@ op_code_prio (enum tree_code code) case CEIL_MOD_EXPR: case FLOOR_MOD_EXPR: case ROUND_MOD_EXPR: - case FMA_EXPR: return 13; case TRUTH_NOT_EXPR: Index: gcc/tree-ssa-loop-im.c =================================================================== --- gcc/tree-ssa-loop-im.c 2018-03-13 15:06:00.521525322 +0000 +++ gcc/tree-ssa-loop-im.c 2018-05-11 18:08:24.850946754 +0100 @@ -493,7 +493,6 @@ stmt_cost (gimple *stmt) case WIDEN_MULT_PLUS_EXPR: case WIDEN_MULT_MINUS_EXPR: case DOT_PROD_EXPR: - case FMA_EXPR: case TRUNC_DIV_EXPR: case CEIL_DIV_EXPR: case FLOOR_DIV_EXPR: Index: gcc/tree-ssa-operands.c =================================================================== --- gcc/tree-ssa-operands.c 2018-01-03 11:12:59.133628905 +0000 +++ gcc/tree-ssa-operands.c 2018-05-11 18:08:24.851946713 +0100 @@ -849,7 +849,6 @@ get_expr_operands (struct function *fn, case REALIGN_LOAD_EXPR: case WIDEN_MULT_PLUS_EXPR: case WIDEN_MULT_MINUS_EXPR: - case FMA_EXPR: { get_expr_operands (fn, stmt, &TREE_OPERAND (expr, 0), flags); get_expr_operands (fn, stmt, &TREE_OPERAND (expr, 1), flags); Index: gcc/tree.c =================================================================== --- gcc/tree.c 2018-05-08 09:42:01.631717680 +0100 +++ gcc/tree.c 2018-05-11 18:08:24.852946671 +0100 @@ -7171,7 +7171,6 @@ commutative_ternary_tree_code (enum tree case WIDEN_MULT_PLUS_EXPR: case WIDEN_MULT_MINUS_EXPR: case DOT_PROD_EXPR: - case FMA_EXPR: return true; default: @@ -7457,7 +7456,6 @@ add_expr (const_tree t, inchash::hash &h flags &= ~OEP_ADDRESS_OF; break; - case FMA_EXPR: case WIDEN_MULT_PLUS_EXPR: case WIDEN_MULT_MINUS_EXPR: { Index: gcc/fold-const-call.h =================================================================== --- gcc/fold-const-call.h 2018-01-03 11:12:58.150667646 +0000 +++ gcc/fold-const-call.h 2018-05-11 18:08:24.833947453 +0100 @@ -23,7 +23,6 @@ #define GCC_FOLD_CONST_CALL_H tree fold_const_call (combined_fn, tree, tree); tree fold_const_call (combined_fn, tree, tree, tree); tree fold_const_call (combined_fn, tree, tree, tree, tree); -tree fold_fma (location_t, tree, tree, tree, tree); tree build_cmp_result (tree type, int res); #endif Index: gcc/fold-const-call.c =================================================================== --- gcc/fold-const-call.c 2018-05-08 09:42:01.844709861 +0100 +++ gcc/fold-const-call.c 2018-05-11 18:08:24.833947453 +0100 @@ -1606,6 +1606,26 @@ fold_const_call_ssss (real_value *result CASE_CFN_FMA_FN: return do_mpfr_arg3 (result, mpfr_fma, arg0, arg1, arg2, format); + case CFN_FMS: + { + real_value new_arg2 = real_value_negate (arg2); + return do_mpfr_arg3 (result, mpfr_fma, arg0, arg1, &new_arg2, format); + } + + case CFN_FNMA: + { + real_value new_arg0 = real_value_negate (arg0); + return do_mpfr_arg3 (result, mpfr_fma, &new_arg0, arg1, arg2, format); + } + + case CFN_FNMS: + { + real_value new_arg0 = real_value_negate (arg0); + real_value new_arg2 = real_value_negate (arg2); + return do_mpfr_arg3 (result, mpfr_fma, &new_arg0, arg1, + &new_arg2, format); + } + default: return false; } @@ -1719,20 +1739,3 @@ fold_const_call (combined_fn fn, tree ty return fold_const_call_1 (fn, type, arg0, arg1, arg2); } } - -/* Fold a fma operation with arguments ARG[012]. 
*/ - -tree -fold_fma (location_t, tree type, tree arg0, tree arg1, tree arg2) -{ - REAL_VALUE_TYPE result; - if (real_cst_p (arg0) - && real_cst_p (arg1) - && real_cst_p (arg2) - && do_mpfr_arg3 (&result, mpfr_fma, TREE_REAL_CST_PTR (arg0), - TREE_REAL_CST_PTR (arg1), TREE_REAL_CST_PTR (arg2), - REAL_MODE_FORMAT (TYPE_MODE (type)))) - return build_real (type, result); - - return NULL_TREE; -} Index: gcc/genmatch.c =================================================================== --- gcc/genmatch.c 2018-03-01 08:20:43.846526310 +0000 +++ gcc/genmatch.c 2018-05-11 18:08:24.835947371 +0100 @@ -241,6 +241,20 @@ enum internal_fn { IFN_LAST }; +enum combined_fn { +#define DEF_BUILTIN(ENUM, N, C, T, LT, B, F, NA, AT, IM, COND) \ + CFN_##ENUM = int (ENUM), +#include "builtins.def" + +#define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) \ + CFN_##CODE = int (END_BUILTINS) + int (IFN_##CODE), +#include "internal-fn.def" + + CFN_LAST +}; + +#include "case-cfn-macros.h" + /* Return true if CODE represents a commutative tree code. Otherwise return false. */ bool @@ -288,7 +302,6 @@ commutative_ternary_tree_code (enum tree case WIDEN_MULT_PLUS_EXPR: case WIDEN_MULT_MINUS_EXPR: case DOT_PROD_EXPR: - case FMA_EXPR: return true; default: @@ -450,6 +463,44 @@ is_a_helper <user_id *>::test (id_base * return id->kind == id_base::USER; } +/* If ID has a pair of consecutive, commutative operands, return the + index of the first, otherwise return -1. */ + +static int +commutative_op (id_base *id) +{ + if (operator_id *code = dyn_cast <operator_id *> (id)) + { + if (commutative_tree_code (code->code) + || commutative_ternary_tree_code (code->code)) + return 0; + return -1; + } + if (fn_id *fn = dyn_cast <fn_id *> (id)) + switch (fn->fn) + { + CASE_CFN_FMA: + case CFN_FMS: + case CFN_FNMA: + case CFN_FNMS: + return 0; + + default: + return -1; + } + if (user_id *uid = dyn_cast<user_id *> (id)) + { + int res = commutative_op (uid->substitutes[0]); + if (res < 0) + return 0; + for (unsigned i = 1; i < uid->substitutes.length (); ++i) + if (res != commutative_op (uid->substitutes[i])) + return -1; + return res; + } + return -1; +} + /* Add a predicate identifier to the hash. */ static predicate_id * @@ -946,6 +997,9 @@ commutate (operand *op, vec<vec<user_id if (!e->is_commutative) return ret; + /* The operation is always binary if it isn't inherently commutative. */ + int natural_opno = commutative_op (e->operation); + unsigned int opno = natural_opno >= 0 ? natural_opno : 0; for (unsigned i = 0; i < result.length (); ++i) { expr *ne = new expr (e); @@ -994,9 +1048,11 @@ commutate (operand *op, vec<vec<user_id } } ne->is_commutative = false; - // result[i].length () is 2 since e->operation is binary - for (unsigned j = result[i].length (); j; --j) - ne->append_op (result[i][j-1]); + for (unsigned j = 0; j < result[i].length (); ++j) + { + int old_j = (j == opno ? opno + 1 : j == opno + 1 ? opno : j); + ne->append_op (result[i][old_j]); + } ret.safe_push (ne); } @@ -2759,24 +2815,18 @@ dt_operand::gen_gimple_expr (FILE *f, in /* While the toplevel operands are canonicalized by the caller after valueizing operands of sub-expressions we have to re-canonicalize operand order. */ - if (operator_id *code = dyn_cast <operator_id *> (id)) + int opno = commutative_op (id); + if (opno >= 0) { - /* ??? We can't canonicalize tcc_comparison operands here - because that requires changing the comparison code which - we already matched... 
*/ - if (commutative_tree_code (code->code) - || commutative_ternary_tree_code (code->code)) - { - char child_opname0[20], child_opname1[20]; - gen_opname (child_opname0, 0); - gen_opname (child_opname1, 1); - fprintf_indent (f, indent, - "if (tree_swap_operands_p (%s, %s))\n", - child_opname0, child_opname1); - fprintf_indent (f, indent, - " std::swap (%s, %s);\n", - child_opname0, child_opname1); - } + char child_opname0[20], child_opname1[20]; + gen_opname (child_opname0, opno); + gen_opname (child_opname1, opno + 1); + fprintf_indent (f, indent, + "if (tree_swap_operands_p (%s, %s))\n", + child_opname0, child_opname1); + fprintf_indent (f, indent, + " std::swap (%s, %s);\n", + child_opname0, child_opname1); } return n_braces; @@ -4217,11 +4267,14 @@ parser::parse_expr () e->operation->id, e->operation->nargs, e->ops.length ()); if (is_commutative) { - if (e->ops.length () == 2) + if (e->ops.length () == 2 + || commutative_op (e->operation) >= 0) e->is_commutative = true; else - fatal_at (token, "only binary operators or function with " - "two arguments can be marked commutative"); + fatal_at (token, "only binary operators or functions with " + "two arguments can be marked commutative, " + "unless the operation is known to be inherently " + "commutative"); } e->expr_type = expr_type; return op; Index: gcc/gimple-ssa-backprop.c =================================================================== --- gcc/gimple-ssa-backprop.c 2018-01-03 11:12:59.113629694 +0000 +++ gcc/gimple-ssa-backprop.c 2018-05-11 18:08:24.836947330 +0100 @@ -375,6 +375,9 @@ backprop::process_builtin_call_use (gcal CASE_CFN_FMA: CASE_CFN_FMA_FN: + case CFN_FMS: + case CFN_FNMA: + case CFN_FNMS: /* In X * X + Y, where Y is distinct from X, the sign of X doesn't matter. */ if (gimple_call_arg (call, 0) == rhs @@ -420,15 +423,6 @@ backprop::process_assign_use (gassign *a } break; - case FMA_EXPR: - /* In X * X + Y, where Y is distinct from X, the sign of X doesn't - matter. */ - if (gimple_assign_rhs1 (assign) == rhs - && gimple_assign_rhs2 (assign) == rhs - && gimple_assign_rhs3 (assign) != rhs) - info->flags.ignore_sign = true; - break; - case MULT_EXPR: /* In X * X, the sign of X doesn't matter. */ if (gimple_assign_rhs1 (assign) == rhs Index: gcc/hsa-gen.c =================================================================== --- gcc/hsa-gen.c 2018-03-17 08:30:21.230924973 +0000 +++ gcc/hsa-gen.c 2018-05-11 18:08:24.841947124 +0100 @@ -3178,23 +3178,6 @@ gen_hsa_insns_for_operation_assignment ( case NEGATE_EXPR: opcode = BRIG_OPCODE_NEG; break; - case FMA_EXPR: - /* There is a native HSA instruction for scalar FMAs but not for vector - ones. */ - if (TREE_CODE (TREE_TYPE (lhs)) == VECTOR_TYPE) - { - hsa_op_reg *dest - = hsa_cfun->reg_for_gimple_ssa (gimple_assign_lhs (assign)); - hsa_op_with_type *op1 = hsa_reg_or_immed_for_gimple_op (rhs1, hbb); - hsa_op_with_type *op2 = hsa_reg_or_immed_for_gimple_op (rhs2, hbb); - hsa_op_with_type *op3 = hsa_reg_or_immed_for_gimple_op (rhs3, hbb); - hsa_op_reg *tmp = new hsa_op_reg (dest->m_type); - gen_hsa_binary_operation (BRIG_OPCODE_MUL, tmp, op1, op2, hbb); - gen_hsa_binary_operation (BRIG_OPCODE_ADD, dest, tmp, op3, hbb); - return; - } - opcode = BRIG_OPCODE_MAD; - break; case MIN_EXPR: opcode = BRIG_OPCODE_MIN; break; @@ -4490,6 +4473,57 @@ gen_hsa_divmod (gcall *call, hsa_bb *hbb insn->set_output_in_type (dest, 0, hbb); } +/* Emit instructions that implement FMA, FMS, FNMA or FNMS call STMT. + Instructions are appended to basic block HBB. NEGATE1 is true for + FNMA and FNMS. 
NEGATE3 is true for FMS and FNMS. */ + +static void +gen_hsa_fma (gcall *call, hsa_bb *hbb, bool negate1, bool negate3) +{ + tree lhs = gimple_call_lhs (call); + if (lhs == NULL_TREE) + return; + + tree rhs1 = gimple_call_arg (call, 0); + tree rhs2 = gimple_call_arg (call, 1); + tree rhs3 = gimple_call_arg (call, 2); + + hsa_op_reg *dest = hsa_cfun->reg_for_gimple_ssa (lhs); + hsa_op_with_type *op1 = hsa_reg_or_immed_for_gimple_op (rhs1, hbb); + hsa_op_with_type *op2 = hsa_reg_or_immed_for_gimple_op (rhs2, hbb); + hsa_op_with_type *op3 = hsa_reg_or_immed_for_gimple_op (rhs3, hbb); + + if (negate1) + { + hsa_op_reg *tmp = new hsa_op_reg (dest->m_type); + gen_hsa_unary_operation (BRIG_OPCODE_NEG, tmp, op1, hbb); + op1 = tmp; + } + + /* There is a native HSA instruction for scalar FMAs but not for vector + ones. */ + if (TREE_CODE (TREE_TYPE (lhs)) == VECTOR_TYPE) + { + hsa_op_reg *tmp = new hsa_op_reg (dest->m_type); + gen_hsa_binary_operation (BRIG_OPCODE_MUL, tmp, op1, op2, hbb); + gen_hsa_binary_operation (negate3 ? BRIG_OPCODE_SUB : BRIG_OPCODE_ADD, + dest, tmp, op3, hbb); + } + else + { + if (negate3) + { + hsa_op_reg *tmp = new hsa_op_reg (dest->m_type); + gen_hsa_unary_operation (BRIG_OPCODE_NEG, tmp, op3, hbb); + op3 = tmp; + } + hsa_insn_basic *insn = new hsa_insn_basic (4, BRIG_OPCODE_MAD, + dest->m_type, dest, + op1, op2, op3); + hbb->append_insn (insn); + } +} + /* Set VALUE to a shadow kernel debug argument and append a new instruction to HBB basic block. */ @@ -5224,6 +5258,22 @@ gen_hsa_insn_for_internal_fn_call (gcall gen_hsa_insns_for_call_of_internal_fn (stmt, hbb); break; + case IFN_FMA: + gen_hsa_fma (stmt, hbb, false, false); + break; + + case IFN_FMS: + gen_hsa_fma (stmt, hbb, false, true); + break; + + case IFN_FNMA: + gen_hsa_fma (stmt, hbb, true, false); + break; + + case IFN_FNMS: + gen_hsa_fma (stmt, hbb, true, true); + break; + default: HSA_SORRY_ATV (gimple_location (stmt), "support for HSA does not implement internal function: %s", Index: gcc/match.pd =================================================================== --- gcc/match.pd 2018-05-02 08:39:32.882737077 +0100 +++ gcc/match.pd 2018-05-11 18:08:24.842947083 +0100 @@ -4702,3 +4702,60 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) || wi::geu_p (wi::to_wide (@rpos), wi::to_wide (@ipos) + isize)) (BIT_FIELD_REF @0 @rsize @rpos))))) + +(for fmas (FMA) + (simplify + (fmas:c (negate @0) @1 @2) + (IFN_FNMA @0 @1 @2)) + (simplify + (fmas @0 @1 (negate @2)) + (IFN_FMS @0 @1 @2)) + (simplify + (fmas:c (negate @0) @1 (negate @2)) + (IFN_FNMS @0 @1 @2)) + (simplify + (negate (fmas@3 @0 @1 @2)) + (if (single_use (@3)) + (IFN_FNMS @0 @1 @2)))) + +(simplify + (IFN_FMS:c (negate @0) @1 @2) + (IFN_FNMS @0 @1 @2)) +(simplify + (IFN_FMS @0 @1 (negate @2)) + (IFN_FMA @0 @1 @2)) +(simplify + (IFN_FMS:c (negate @0) @1 (negate @2)) + (IFN_FNMA @0 @1 @2)) +(simplify + (negate (IFN_FMS@3 @0 @1 @2)) + (if (single_use (@3)) + (IFN_FNMA @0 @1 @2))) + +(simplify + (IFN_FNMA:c (negate @0) @1 @2) + (IFN_FMA @0 @1 @2)) +(simplify + (IFN_FNMA @0 @1 (negate @2)) + (IFN_FNMS @0 @1 @2)) +(simplify + (IFN_FNMA:c (negate @0) @1 (negate @2)) + (IFN_FMS @0 @1 @2)) +(simplify + (negate (IFN_FNMA@3 @0 @1 @2)) + (if (single_use (@3)) + (IFN_FMS @0 @1 @2))) + +(simplify + (IFN_FNMS:c (negate @0) @1 @2) + (IFN_FMS @0 @1 @2)) +(simplify + (IFN_FNMS @0 @1 (negate @2)) + (IFN_FNMA @0 @1 @2)) +(simplify + (IFN_FNMS:c (negate @0) @1 (negate @2)) + (IFN_FMA @0 @1 @2)) +(simplify + (negate (IFN_FNMS@3 @0 @1 @2)) + (if (single_use (@3)) + (IFN_FMA @0 @1 @2))) 
Index: gcc/tree-ssa-math-opts.c =================================================================== --- gcc/tree-ssa-math-opts.c 2018-01-12 14:45:51.037434575 +0000 +++ gcc/tree-ssa-math-opts.c 2018-05-11 18:08:24.850946754 +0100 @@ -2640,6 +2640,14 @@ convert_plusminus_to_widen (gimple_stmt_ return true; } +/* gimple_fold callback that "valueizes" everything. */ + +static tree +aggressive_valueize (tree val) +{ + return val; +} + /* Given a result MUL_RESULT which is a result of a multiplication of OP1 and OP2 and which we know is used in statements that can be, together with the multiplication, converted to FMAs, perform the transformation. */ @@ -2650,7 +2658,7 @@ convert_mult_to_fma_1 (tree mul_result, tree type = TREE_TYPE (mul_result); gimple *use_stmt; imm_use_iterator imm_iter; - gassign *fma_stmt; + gcall *fma_stmt; FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, mul_result) { @@ -2658,6 +2666,7 @@ convert_mult_to_fma_1 (tree mul_result, enum tree_code use_code; tree addop, mulop1 = op1, result = mul_result; bool negate_p = false; + gimple_seq seq = NULL; if (is_gimple_debug (use_stmt)) continue; @@ -2683,11 +2692,7 @@ convert_mult_to_fma_1 (tree mul_result, addop = gimple_assign_rhs2 (use_stmt); /* a * b - c -> a * b + (-c) */ if (gimple_assign_rhs_code (use_stmt) == MINUS_EXPR) - addop = force_gimple_operand_gsi (&gsi, - build1 (NEGATE_EXPR, - type, addop), - true, NULL_TREE, true, - GSI_SAME_STMT); + addop = gimple_build (&seq, NEGATE_EXPR, type, addop); } else { @@ -2698,23 +2703,26 @@ convert_mult_to_fma_1 (tree mul_result, } if (negate_p) - mulop1 = force_gimple_operand_gsi (&gsi, - build1 (NEGATE_EXPR, - type, mulop1), - true, NULL_TREE, true, - GSI_SAME_STMT); + mulop1 = gimple_build (&seq, NEGATE_EXPR, type, mulop1); - fma_stmt = gimple_build_assign (gimple_assign_lhs (use_stmt), - FMA_EXPR, mulop1, op2, addop); + if (seq) + gsi_insert_seq_before (&gsi, seq, GSI_SAME_STMT); + fma_stmt = gimple_build_call_internal (IFN_FMA, 3, mulop1, op2, addop); + gimple_call_set_lhs (fma_stmt, gimple_assign_lhs (use_stmt)); + gimple_call_set_nothrow (fma_stmt, !stmt_can_throw_internal (use_stmt)); + gsi_replace (&gsi, fma_stmt, true); + /* Valueize aggressively so that we generate FMS, FNMA and FNMS + regardless of where the negation occurs. */ + if (fold_stmt (&gsi, aggressive_valueize)) + update_stmt (gsi_stmt (gsi)); if (dump_file && (dump_flags & TDF_DETAILS)) { fprintf (dump_file, "Generated FMA "); - print_gimple_stmt (dump_file, fma_stmt, 0, 0); + print_gimple_stmt (dump_file, gsi_stmt (gsi), 0, 0); fprintf (dump_file, "\n"); } - gsi_replace (&gsi, fma_stmt, true); widen_mul_stats.fmas_inserted++; } } @@ -2862,7 +2870,8 @@ convert_mult_to_fma (gimple *mul_stmt, t /* If the target doesn't support it, don't generate it. We assume that if fma isn't available then fms, fnma or fnms are not either. */ - if (optab_handler (fma_optab, TYPE_MODE (type)) == CODE_FOR_nothing) + optimization_type opt_type = bb_optimization_type (gimple_bb (mul_stmt)); + if (!direct_internal_fn_supported_p (IFN_FMA, type, opt_type)) return false; /* If the multiplication has zero uses, it is kept around probably because @@ -2958,8 +2967,8 @@ convert_mult_to_fma (gimple *mul_stmt, t that a mult / subtract pair. 
*/ if (use_code == MINUS_EXPR && !negate_p && gimple_assign_rhs1 (use_stmt) == result - && optab_handler (fms_optab, TYPE_MODE (type)) == CODE_FOR_nothing - && optab_handler (fnma_optab, TYPE_MODE (type)) != CODE_FOR_nothing) + && !direct_internal_fn_supported_p (IFN_FMS, type, opt_type) + && direct_internal_fn_supported_p (IFN_FNMA, type, opt_type)) { tree rhs2 = gimple_assign_rhs2 (use_stmt); Index: gcc/config/i386/i386.c =================================================================== --- gcc/config/i386/i386.c 2018-05-09 11:34:41.204789431 +0100 +++ gcc/config/i386/i386.c 2018-05-11 18:08:24.825947782 +0100 @@ -50516,19 +50516,20 @@ ix86_add_stmt_cost (void *data, int coun tree vectype = stmt_info ? stmt_vectype (stmt_info) : NULL_TREE; int stmt_cost = - 1; + bool fp = false; + machine_mode mode = TImode; + + if (vectype != NULL) + { + fp = FLOAT_TYPE_P (vectype); + mode = TYPE_MODE (vectype); + } + if ((kind == vector_stmt || kind == scalar_stmt) && stmt_info && stmt_info->stmt && gimple_code (stmt_info->stmt) == GIMPLE_ASSIGN) { tree_code subcode = gimple_assign_rhs_code (stmt_info->stmt); - bool fp = false; - machine_mode mode = TImode; - - if (vectype != NULL) - { - fp = FLOAT_TYPE_P (vectype); - mode = TYPE_MODE (vectype); - } /*machine_mode inner_mode = mode; if (VECTOR_MODE_P (mode)) inner_mode = GET_MODE_INNER (mode);*/ @@ -50559,12 +50560,6 @@ ix86_add_stmt_cost (void *data, int coun case MULT_HIGHPART_EXPR: stmt_cost = ix86_multiplication_cost (ix86_cost, mode); break; - case FMA_EXPR: - stmt_cost = ix86_vec_cost (mode, - mode == SFmode ? ix86_cost->fmass - : ix86_cost->fmasd, - true); - break; case NEGATE_EXPR: if (SSE_FLOAT_MODE_P (mode) && TARGET_SSE_MATH) stmt_cost = ix86_cost->sse_op; @@ -50627,6 +50622,24 @@ ix86_add_stmt_cost (void *data, int coun break; } } + + combined_fn cfn; + if ((kind == vector_stmt || kind == scalar_stmt) + && stmt_info + && stmt_info->stmt + && (cfn = gimple_call_combined_fn (stmt_info->stmt)) != CFN_LAST) + switch (cfn) + { + case CFN_FMA: + stmt_cost = ix86_vec_cost (mode, + mode == SFmode ? ix86_cost->fmass + : ix86_cost->fmasd, + true); + break; + default: + break; + } + /* If we do elementwise loads into a vector then we are bound by latency and execution resources for the many scalar loads (AGU and load ports). 
Try to account for this by scaling the Index: gcc/config/rs6000/rs6000.c =================================================================== --- gcc/config/rs6000/rs6000.c 2018-05-08 09:42:03.510648702 +0100 +++ gcc/config/rs6000/rs6000.c 2018-05-11 18:08:24.830947576 +0100 @@ -16025,7 +16025,9 @@ rs6000_gimple_fold_builtin (gimple_stmt_ arg1 = gimple_call_arg (stmt, 1); tree arg2 = gimple_call_arg (stmt, 2); lhs = gimple_call_lhs (stmt); - gimple *g = gimple_build_assign (lhs, FMA_EXPR, arg0, arg1, arg2); + gcall *g = gimple_build_call_internal (IFN_FMA, 3, arg0, arg1, arg2); + gimple_call_set_lhs (g, lhs); + gimple_call_set_nothrow (g, true); gimple_set_location (g, gimple_location (stmt)); gsi_replace (gsi, g, true); return true; Index: gcc/brig/brigfrontend/brig-function.cc =================================================================== --- gcc/brig/brigfrontend/brig-function.cc 2018-05-08 09:42:01.419725462 +0100 +++ gcc/brig/brigfrontend/brig-function.cc 2018-05-11 18:08:24.817948111 +0100 @@ -1218,6 +1218,7 @@ brig_function::get_builtin_for_hsa_opcod case BRIG_OPCODE_NEXP2: builtin = mathfn_built_in (builtin_type, BUILT_IN_EXP2); break; + case BRIG_OPCODE_FMA: case BRIG_OPCODE_NFMA: builtin = mathfn_built_in (builtin_type, BUILT_IN_FMA); break; @@ -1460,8 +1461,6 @@ brig_function::get_tree_code_for_hsa_opc return CALL_EXPR; else return MAX_EXPR; - case BRIG_OPCODE_FMA: - return FMA_EXPR; case BRIG_OPCODE_ABS: return ABS_EXPR; case BRIG_OPCODE_SHL: @@ -1496,6 +1495,7 @@ brig_function::get_tree_code_for_hsa_opc /* Implement as 1/f (x). gcc should pattern detect that and use a native instruction, if available, for it. */ return TREE_LIST; + case BRIG_OPCODE_FMA: case BRIG_OPCODE_FLOOR: case BRIG_OPCODE_CEIL: case BRIG_OPCODE_SQRT: Index: gcc/c/gimple-parser.c =================================================================== --- gcc/c/gimple-parser.c 2018-01-03 11:12:56.269741723 +0000 +++ gcc/c/gimple-parser.c 2018-05-11 18:08:24.817948111 +0100 @@ -903,27 +903,6 @@ c_parser_gimple_postfix_expression (c_pa expr.value = fold_convert (type, val); return expr; } - else if (strcmp (IDENTIFIER_POINTER (id), "__FMA") == 0) - { - c_parser_consume_token (parser); - auto_vec<tree> args; - - if (c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>")) - { - c_parser_gimple_expr_list (parser, &args); - c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, - "expected %<)%>"); - } - if (args.length () != 3) - { - error_at (loc, "invalid number of operands to __FMA"); - expr.value = error_mark_node; - return expr; - } - expr.value = build3_loc (loc, FMA_EXPR, TREE_TYPE (args[0]), - args[0], args[1], args[2]); - return expr; - } /* SSA name. 
Index: gcc/cp/constexpr.c
===================================================================
--- gcc/cp/constexpr.c	2018-05-01 19:30:30.973597315 +0100
+++ gcc/cp/constexpr.c	2018-05-11 18:08:24.830947576 +0100
@@ -4573,7 +4573,6 @@ cxx_eval_constant_expression (const cons
					   non_constant_p, overflow_p);
       break;

-    case FMA_EXPR:
     case VEC_PERM_EXPR:
       r = cxx_eval_trinary_expression (ctx, t, lval,
				       non_constant_p, overflow_p);
@@ -5998,7 +5997,6 @@ #define RECUR(T,RV) \
	  return false;
       return true;

-    case FMA_EXPR:
     case VEC_PERM_EXPR:
       for (i = 0; i < 3; ++i)
	if (!RECUR (TREE_OPERAND (t, i), true))
Index: gcc/testsuite/lib/target-supports.exp
===================================================================
--- gcc/testsuite/lib/target-supports.exp	2018-05-08 09:42:01.256731446 +0100
+++ gcc/testsuite/lib/target-supports.exp	2018-05-11 18:08:24.848946836 +0100
@@ -2879,6 +2879,13 @@ proc check_effective_target_base_quadflo
     return 1
 }

+# Return 1 if the target supports all four forms of fused multiply-add
+# (fma, fms, fnma, and fnms) for both float and double.
+
+proc check_effective_target_all_scalar_fma { } {
+    return [istarget aarch64*-*-*]
+}
+
 # Return 1 if the target supports compiling fixed-point,
 # 0 otherwise.
Index: gcc/testsuite/gcc.dg/fma-1.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.dg/fma-1.c	2018-05-11 18:08:24.844947000 +0100
@@ -0,0 +1,15 @@
+/* { dg-options "-O2 -fdump-tree-widening_mul" } */
+
+float
+f1 (float a, float b, float c)
+{
+  return a * b + c;
+}
+
+double
+f2 (double a, double b, double c)
+{
+  return a * b + c;
+}
+
+/* { dg-final { scan-tree-dump-times { = FMA \(} 2 "widening_mul" { target all_scalar_fma } } } */
Index: gcc/testsuite/gcc.dg/fma-2.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.dg/fma-2.c	2018-05-11 18:08:24.844947000 +0100
@@ -0,0 +1,15 @@
+/* { dg-options "-O2 -fdump-tree-widening_mul" } */
+
+float
+f1 (float a, float b, float c)
+{
+  return a * b - c;
+}
+
+double
+f2 (double a, double b, double c)
+{
+  return a * b - c;
+}
+
+/* { dg-final { scan-tree-dump-times { = FMS \(} 2 "widening_mul" { target all_scalar_fma } } } */
Index: gcc/testsuite/gcc.dg/fma-3.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.dg/fma-3.c	2018-05-11 18:08:24.845946959 +0100
@@ -0,0 +1,15 @@
+/* { dg-options "-O2 -fdump-tree-widening_mul" } */
+
+float
+f1 (float a, float b, float c)
+{
+  return c - a * b;
+}
+
+double
+f2 (double a, double b, double c)
+{
+  return c - a * b;
+}
+
+/* { dg-final { scan-tree-dump-times { = FNMA \(} 2 "widening_mul" { target all_scalar_fma } } } */
Index: gcc/testsuite/gcc.dg/fma-4.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.dg/fma-4.c	2018-05-11 18:08:24.845946959 +0100
@@ -0,0 +1,15 @@
+/* { dg-options "-O2 -fdump-tree-widening_mul" } */
+
+float
+f1 (float a, float b, float c)
+{
+  return -(a * b) - c;
+}
+
+double
+f2 (double a, double b, double c)
+{
+  return -(a * b) - c;
+}
+
+/* { dg-final { scan-tree-dump-times { = FNMS \(} 2 "widening_mul" { target all_scalar_fma } } } */
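For reference (not part of the patch): the four widening_mul tests above each cover one of the new internal functions, and they rely on the usual mapping between source expressions and fused operations, assuming the target provides all four optabs (the new all_scalar_fma effective target).  A minimal sketch; fnms_example is a hypothetical name used only here:

/* a * b + c      ->  FMA  (a, b, c)
   a * b - c      ->  FMS  (a, b, c)
   c - a * b      ->  FNMA (a, b, c)
   -(a * b) - c   ->  FNMS (a, b, c)  */
double
fnms_example (double a, double b, double c)
{
  /* Expected to appear as " = FNMS (" in the widening_mul dump,
     matching the scan in fma-4.c above.  */
  return -(a * b) - c;
}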
Index: gcc/testsuite/gcc.dg/fma-5.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.dg/fma-5.c	2018-05-11 18:08:24.845946959 +0100
@@ -0,0 +1,53 @@
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+float
+f1 (float a, float b, float c)
+{
+  return __builtin_fmaf (a, b, -c);
+}
+
+double
+f2 (double a, double b, double c)
+{
+  return __builtin_fma (a, b, -c);
+}
+
+void
+f3 (float a, float b, float c, float d, float e, float *res)
+{
+  res[0] = __builtin_fmaf (a, b, -e);
+  res[1] = __builtin_fmaf (c, d, -e);
+}
+
+void
+f4 (double a, double b, double c, double d, double e, double *res)
+{
+  res[0] = __builtin_fma (a, b, -e);
+  res[1] = __builtin_fma (c, d, -e);
+}
+
+float
+f5 (float a, float b, float c)
+{
+  return -__builtin_fmaf (-a, b, c);
+}
+
+double
+f6 (double a, double b, double c)
+{
+  return -__builtin_fma (-a, b, c);
+}
+
+float
+f7 (float a, float b, float c)
+{
+  return -__builtin_fmaf (a, -b, c);
+}
+
+double
+f8 (double a, double b, double c)
+{
+  return -__builtin_fma (a, -b, c);
+}
+
+/* { dg-final { scan-tree-dump-times { = FMS \(} 10 "optimized" { target all_scalar_fma } } } */
Index: gcc/testsuite/gcc.dg/fma-6.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.dg/fma-6.c	2018-05-11 18:08:24.845946959 +0100
@@ -0,0 +1,67 @@
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+float
+f1 (float a, float b, float c)
+{
+  return __builtin_fmaf (-a, b, c);
+}
+
+double
+f2 (double a, double b, double c)
+{
+  return __builtin_fma (-a, b, c);
+}
+
+float
+f3 (float a, float b, float c)
+{
+  return __builtin_fmaf (a, -b, c);
+}
+
+double
+f4 (double a, double b, double c)
+{
+  return __builtin_fma (a, -b, c);
+}
+
+void
+f5 (float a, float b, float c, float d, float e, float *res)
+{
+  res[0] = __builtin_fmaf (-a, b, c);
+  res[1] = __builtin_fmaf (-a, d, e);
+}
+
+void
+f6 (double a, double b, double c, double d, double e, double *res)
+{
+  res[0] = __builtin_fma (-a, b, c);
+  res[1] = __builtin_fma (-a, d, e);
+}
+
+void
+f7 (float a, float b, float c, float d, float e, float *res)
+{
+  res[0] = __builtin_fmaf (a, -b, c);
+  res[1] = __builtin_fmaf (d, -b, e);
+}
+
+void
+f8 (double a, double b, double c, double d, double e, double *res)
+{
+  res[0] = __builtin_fma (a, -b, c);
+  res[1] = __builtin_fma (d, -b, e);
+}
+
+float
+f9 (float a, float b, float c)
+{
+  return -__builtin_fmaf (a, b, -c);
+}
+
+double
+f10 (double a, double b, double c)
+{
+  return -__builtin_fma (a, b, -c);
+}
+
+/* { dg-final { scan-tree-dump-times { = FNMA \(} 14 "optimized" { target all_scalar_fma } } } */
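A side note, not part of the patch: fma-5.c, fma-6.c, and fma-7.c below scan the "optimized" dump rather than widening_mul, so they check that folding absorbs negated operands and negated results into the new internal functions; the expected counts of 10 and 14 come from the two-store functions contributing two matches each.  A minimal sketch of the kind of fold being checked, assuming the negations are folded into the call; fnma_from_builtin is a hypothetical name used only here:

double
fnma_from_builtin (double a, double b, double c)
{
  /* -(a * b + (-c)) == c - a * b, so this should appear as " = FNMA ("
     in the optimized dump, like f9/f10 in fma-6.c above.  */
  return -__builtin_fma (a, b, -c);
}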
Index: gcc/testsuite/gcc.dg/fma-7.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.dg/fma-7.c	2018-05-11 18:08:24.845946959 +0100
@@ -0,0 +1,67 @@
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+float
+f1 (float a, float b, float c)
+{
+  return __builtin_fmaf (-a, b, -c);
+}
+
+double
+f2 (double a, double b, double c)
+{
+  return __builtin_fma (-a, b, -c);
+}
+
+float
+f3 (float a, float b, float c)
+{
+  return __builtin_fmaf (a, -b, -c);
+}
+
+double
+f4 (double a, double b, double c)
+{
+  return __builtin_fma (a, -b, -c);
+}
+
+void
+f5 (float a, float b, float c, float d, float *res)
+{
+  res[0] = __builtin_fmaf (-a, b, -c);
+  res[1] = __builtin_fmaf (-a, d, -c);
+}
+
+void
+f6 (double a, double b, double c, double d, double *res)
+{
+  res[0] = __builtin_fma (-a, b, -c);
+  res[1] = __builtin_fma (-a, d, -c);
+}
+
+void
+f7 (float a, float b, float c, float d, float *res)
+{
+  res[0] = __builtin_fmaf (a, -b, -c);
+  res[1] = __builtin_fmaf (d, -b, -c);
+}
+
+void
+f8 (double a, double b, double c, double d, double *res)
+{
+  res[0] = __builtin_fma (a, -b, -c);
+  res[1] = __builtin_fma (d, -b, -c);
+}
+
+float
+f9 (float a, float b, float c)
+{
+  return -__builtin_fmaf (a, b, c);
+}
+
+double
+f10 (double a, double b, double c)
+{
+  return -__builtin_fma (a, b, c);
+}
+
+/* { dg-final { scan-tree-dump-times { = FNMS \(} 14 "optimized" { target all_scalar_fma } } } */
Index: gcc/testsuite/gcc.dg/gimplefe-26.c
===================================================================
--- gcc/testsuite/gcc.dg/gimplefe-26.c	2017-02-23 19:54:08.000000000 +0000
+++ /dev/null	2018-04-20 16:19:46.369131350 +0100
@@ -1,16 +0,0 @@
-/* { dg-do compile { target c99_runtime } } */
-/* { dg-options "-O -fgimple -fdump-tree-ssa-gimple" } */
-
-#define foo(type, num) \
-type __GIMPLE () foo_##num (type a, type b, type c) \
-{ \
-  type t0; \
-  t0_1 = __FMA (a, b, c); \
-  return t0_1; \
-}
-
-foo(float, 1)
-foo(double, 2)
-foo(long double, 3)
-
-/* { dg-final { scan-tree-dump-times "__FMA" 3 "ssa" } } */
Index: gcc/testsuite/gfortran.dg/reassoc_7.f
===================================================================
--- gcc/testsuite/gfortran.dg/reassoc_7.f	2015-06-02 23:52:46.000000000 +0100
+++ gcc/testsuite/gfortran.dg/reassoc_7.f	2018-05-11 18:08:24.846946918 +0100
@@ -1,5 +1,5 @@
 ! { dg-do compile }
-! { dg-options "-O3 -ffast-math -fdump-tree-optimized" }
+! { dg-options "-O3 -ffast-math -ffp-contract=off -fdump-tree-optimized" }
       SUBROUTINE S55199(P,Dvdph)
       implicit none
Index: gcc/testsuite/gfortran.dg/reassoc_8.f
===================================================================
--- gcc/testsuite/gfortran.dg/reassoc_8.f	2015-06-02 23:52:46.000000000 +0100
+++ gcc/testsuite/gfortran.dg/reassoc_8.f	2018-05-11 18:08:24.846946918 +0100
@@ -1,5 +1,5 @@
 ! { dg-do compile }
-! { dg-options "-O3 -ffast-math -fdump-tree-optimized" }
+! { dg-options "-O3 -ffast-math -ffp-contract=off -fdump-tree-optimized" }
       SUBROUTINE S55199(P,Dvdph)
       implicit none
Index: gcc/testsuite/gfortran.dg/reassoc_9.f
===================================================================
--- gcc/testsuite/gfortran.dg/reassoc_9.f	2015-06-02 23:52:48.000000000 +0100
+++ gcc/testsuite/gfortran.dg/reassoc_9.f	2018-05-11 18:08:24.847946877 +0100
@@ -1,5 +1,5 @@
 ! { dg-do compile }
-! { dg-options "-O3 -ffast-math -fdump-tree-optimized" }
+! { dg-options "-O3 -ffast-math -ffp-contract=off -fdump-tree-optimized" }
       SUBROUTINE S55199(P,Dvdph)
       implicit none
Index: gcc/testsuite/gfortran.dg/reassoc_10.f
===================================================================
--- gcc/testsuite/gfortran.dg/reassoc_10.f	2015-06-02 23:52:44.000000000 +0100
+++ gcc/testsuite/gfortran.dg/reassoc_10.f	2018-05-11 18:08:24.846946918 +0100
@@ -1,5 +1,5 @@
 ! { dg-do compile }
-! { dg-options "-O3 -ffast-math -fdump-tree-optimized" }
+! { dg-options "-O3 -ffast-math -ffp-contract=off -fdump-tree-optimized" }
       SUBROUTINE S55199(P,Q,Dvdph)
      implicit none
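One closing observation, not part of the patch: the reassoc_*.f tests above scan the optimized dump (the scan directives sit outside the hunks shown), and adding -ffp-contract=off presumably keeps the multiply/add sequences from being contracted into fused calls, which would otherwise change what the dump contains.  A C sketch of the effect the flag guards against; contract_example is a hypothetical name and the exact dump output depends on the target:

double
contract_example (double x, double y, double z)
{
  /* With -O3 -ffast-math on an FMA-capable target this may be emitted
     as a single fused call in the optimized dump; with -ffp-contract=off
     the multiply and the add stay as separate statements.  */
  return x * y + z;
}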