From patchwork Fri Nov 17 15:29:57 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Sandiford
X-Patchwork-Id: 119166
From: Richard Sandiford
To: gcc-patches@gcc.gnu.org
Mail-Followup-To: gcc-patches@gcc.gnu.org, richard.sandiford@linaro.org
Subject: Add support for conditional reductions using SVE
CLASTB Date: Fri, 17 Nov 2017 15:29:57 +0000 Message-ID: <87shdcylca.fsf@linaro.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.3 (gnu/linux) MIME-Version: 1.0 This patch uses SVE CLASTB to optimise conditional reductions. It means that we no longer need to maintain a separate index vector to record the most recent valid value, and no longer need to worry about overflow cases. Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu and powerpc64le-linux-gnu. OK to install? Richard 2017-11-17 Richard Sandiford Alan Hayward David Sherwood gcc/ * doc/md.texi (fold_extract_last_@var{m}): Document. * doc/sourcebuild.texi (vect_fold_extract_last): Likewise. * optabs.def (fold_extract_last_optab): New optab. * internal-fn.def (FOLD_EXTRACT_LAST): New internal function. * internal-fn.c (fold_extract_direct): New macro. (expand_fold_extract_optab_fn): Likewise. (direct_fold_extract_optab_supported_p): Likewise. * tree-vectorizer.h (EXTRACT_LAST_REDUCTION): New vect_reduction_type. * tree-vect-loop.c (vect_model_reduction_cost): Handle EXTRACT_LAST_REDUCTION. (get_initial_def_for_reduction): Do not create an initial vector for EXTRACT_LAST_REDUCTION reductions. (vectorizable_reduction): Leave the scalar phi in place for EXTRACT_LAST_REDUCTIONs. Try using EXTRACT_LAST_REDUCTION ahead of INTEGER_INDUC_COND_REDUCTION. Do not check for an epilogue code for EXTRACT_LAST_REDUCTION and defer the transform phase to vectorizable_condition. * tree-vect-stmts.c (vect_finish_stmt_generation_1): New function, split out from... (vect_finish_stmt_generation): ...here. (vect_finish_replace_stmt): New function. (vectorizable_condition): Handle EXTRACT_LAST_REDUCTION. * config/aarch64/aarch64-sve.md (fold_extract_last_): New pattern. * config/aarch64/aarch64.md (UNSPEC_CLASTB): New unspec. gcc/testsuite/ * lib/target-supports.exp (check_effective_target_vect_fold_extract_last): New proc. * gcc.dg/vect/pr65947-1.c: Update dump messages. Add markup for fold_extract_last. 
* gcc.dg/vect/pr65947-2.c: Likewise. * gcc.dg/vect/pr65947-3.c: Likewise. * gcc.dg/vect/pr65947-4.c: Likewise. * gcc.dg/vect/pr65947-5.c: Likewise. * gcc.dg/vect/pr65947-6.c: Likewise. * gcc.dg/vect/pr65947-9.c: Likewise. * gcc.dg/vect/pr65947-10.c: Likewise. * gcc.dg/vect/pr65947-12.c: Likewise. * gcc.dg/vect/pr65947-13.c: Likewise. * gcc.dg/vect/pr65947-14.c: Likewise. * gcc.target/aarch64/sve_clastb_1.c: New test. * gcc.target/aarch64/sve_clastb_1_run.c: Likewise. * gcc.target/aarch64/sve_clastb_2.c: Likewise. * gcc.target/aarch64/sve_clastb_2_run.c: Likewise. * gcc.target/aarch64/sve_clastb_3.c: Likewise. * gcc.target/aarch64/sve_clastb_3_run.c: Likewise. * gcc.target/aarch64/sve_clastb_4.c: Likewise. * gcc.target/aarch64/sve_clastb_4_run.c: Likewise. * gcc.target/aarch64/sve_clastb_5.c: Likewise. * gcc.target/aarch64/sve_clastb_5_run.c: Likewise. * gcc.target/aarch64/sve_clastb_6.c: Likewise. * gcc.target/aarch64/sve_clastb_6_run.c: Likewise. * gcc.target/aarch64/sve_clastb_7.c: Likewise. * gcc.target/aarch64/sve_clastb_7_run.c: Likewise. Index: gcc/doc/md.texi =================================================================== --- gcc/doc/md.texi 2017-11-17 15:23:56.035829132 +0000 +++ gcc/doc/md.texi 2017-11-17 15:29:00.818351462 +0000 @@ -5276,6 +5276,15 @@ has vector mode @var{m} while operand 0 element of @var{m}. Operand 1 has the usual mask mode for vectors of mode @var{m}; see @code{TARGET_VECTORIZE_GET_MASK_MODE}. +@cindex @code{fold_extract_last_@var{m}} instruction pattern +@item @code{fold_extract_last_@var{m}} +If any bits of mask operand 2 are set, find the last set bit, extract +the associated element from vector operand 3, and store the result +in operand 0. Store operand 1 in operand 0 otherwise. Operand 3 +has mode @var{m} and operands 0 and 1 have the mode appropriate for +one element of @var{m}. Operand 2 has the usual mask mode for vectors +of mode @var{m}; see @code{TARGET_VECTORIZE_GET_MASK_MODE}. 
+ @cindex @code{sdot_prod@var{m}} instruction pattern @item @samp{sdot_prod@var{m}} @cindex @code{udot_prod@var{m}} instruction pattern Index: gcc/doc/sourcebuild.texi =================================================================== --- gcc/doc/sourcebuild.texi 2017-11-17 15:18:19.832557355 +0000 +++ gcc/doc/sourcebuild.texi 2017-11-17 15:29:00.818351462 +0000 @@ -1577,6 +1577,9 @@ Target supports 32- and 16-bytes vectors @item vect_logical_reduc Target supports AND, IOR and XOR reduction on vectors. + +@item vect_fold_extract_last +Target supports the @code{fold_extract_last} optab. @end table @subsubsection Thread Local Storage attributes Index: gcc/optabs.def =================================================================== --- gcc/optabs.def 2017-11-17 15:23:56.035829132 +0000 +++ gcc/optabs.def 2017-11-17 15:29:00.819351462 +0000 @@ -308,6 +308,7 @@ OPTAB_D (reduc_ior_scal_optab, "reduc_i OPTAB_D (reduc_xor_scal_optab, "reduc_xor_scal_$a") OPTAB_D (extract_last_optab, "extract_last_$a") +OPTAB_D (fold_extract_last_optab, "fold_extract_last_$a") OPTAB_D (sdot_prod_optab, "sdot_prod$I$a") OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3") Index: gcc/internal-fn.def =================================================================== --- gcc/internal-fn.def 2017-11-17 15:23:56.035829132 +0000 +++ gcc/internal-fn.def 2017-11-17 15:29:00.819351462 +0000 @@ -146,6 +146,11 @@ DEF_INTERNAL_OPTAB_FN (RSQRT, ECF_CONST, DEF_INTERNAL_OPTAB_FN (EXTRACT_LAST, ECF_CONST | ECF_NOTHROW, extract_last, cond_unary) +/* Same, but return the first argument if no elements are active. */ +DEF_INTERNAL_OPTAB_FN (FOLD_EXTRACT_LAST, ECF_CONST | ECF_NOTHROW, + fold_extract_last, fold_extract) + + /* Unary math functions. 
*/ DEF_INTERNAL_FLT_FN (ACOS, ECF_CONST, acos, unary) DEF_INTERNAL_FLT_FN (ASIN, ECF_CONST, asin, unary) Index: gcc/internal-fn.c =================================================================== --- gcc/internal-fn.c 2017-11-17 15:23:56.035829132 +0000 +++ gcc/internal-fn.c 2017-11-17 15:29:00.819351462 +0000 @@ -91,6 +91,7 @@ #define binary_direct { 0, 0, true } #define cond_unary_direct { 1, 1, true } #define cond_binary_direct { 1, 1, true } #define while_direct { 0, 2, false } +#define fold_extract_direct { 2, 2, false } const direct_internal_fn_info direct_internal_fn_array[IFN_LAST + 1] = { #define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) not_direct, @@ -2833,6 +2834,9 @@ #define expand_cond_unary_optab_fn(FN, S #define expand_cond_binary_optab_fn(FN, STMT, OPTAB) \ expand_direct_optab_fn (FN, STMT, OPTAB, 3) +#define expand_fold_extract_optab_fn(FN, STMT, OPTAB) \ + expand_direct_optab_fn (FN, STMT, OPTAB, 3) + /* RETURN_TYPE and ARGS are a return type and argument list that are in principle compatible with FN (which satisfies direct_internal_fn_p). Return the types that should be used to determine whether the @@ -2915,6 +2919,7 @@ #define direct_mask_store_optab_supporte #define direct_store_lanes_optab_supported_p multi_vector_optab_supported_p #define direct_mask_store_lanes_optab_supported_p multi_vector_optab_supported_p #define direct_while_optab_supported_p convert_optab_supported_p +#define direct_fold_extract_optab_supported_p direct_optab_supported_p /* Return true if FN is supported for the types in TYPES when the optimization type is OPT_TYPE. 
The types are those associated with Index: gcc/tree-vectorizer.h =================================================================== --- gcc/tree-vectorizer.h 2017-11-17 15:18:19.847107868 +0000 +++ gcc/tree-vectorizer.h 2017-11-17 15:29:00.825351462 +0000 @@ -67,7 +67,14 @@ enum vect_reduction_type { TREE_CODE_REDUCTION, COND_REDUCTION, INTEGER_INDUC_COND_REDUCTION, - CONST_COND_REDUCTION + CONST_COND_REDUCTION, + + /* Retain a scalar phi and use a FOLD_EXTRACT_LAST within the loop + to implement: + + for (int i = 0; i < VF; ++i) + res = cond[i] ? val[i] : res; */ + EXTRACT_LAST_REDUCTION }; #define VECTORIZABLE_CYCLE_DEF(D) (((D) == vect_reduction_def) \ Index: gcc/tree-vect-loop.c =================================================================== --- gcc/tree-vect-loop.c 2017-11-17 15:23:56.036742308 +0000 +++ gcc/tree-vect-loop.c 2017-11-17 15:29:00.824351462 +0000 @@ -4025,7 +4025,7 @@ have_whole_vector_shift (machine_mode mo vect_model_reduction_cost (stmt_vec_info stmt_info, enum tree_code reduc_code, int ncopies) { - int prologue_cost = 0, epilogue_cost = 0; + int prologue_cost = 0, epilogue_cost = 0, inside_cost; enum tree_code code; optab optab; tree vectype; @@ -4044,13 +4044,11 @@ vect_model_reduction_cost (stmt_vec_info target_cost_data = BB_VINFO_TARGET_COST_DATA (STMT_VINFO_BB_VINFO (stmt_info)); /* Condition reductions generate two reductions in the loop. */ - if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION) + vect_reduction_type reduction_type + = STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info); + if (reduction_type == COND_REDUCTION) ncopies *= 2; - /* Cost of reduction op inside loop. 
*/ - unsigned inside_cost = add_stmt_cost (target_cost_data, ncopies, vector_stmt, - stmt_info, 0, vect_body); - vectype = STMT_VINFO_VECTYPE (stmt_info); mode = TYPE_MODE (vectype); orig_stmt = STMT_VINFO_RELATED_STMT (stmt_info); @@ -4060,14 +4058,30 @@ vect_model_reduction_cost (stmt_vec_info code = gimple_assign_rhs_code (orig_stmt); - /* Add in cost for initial definition. - For cond reduction we have four vectors: initial index, step, initial - result of the data reduction, initial value of the index reduction. */ - int prologue_stmts = STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) - == COND_REDUCTION ? 4 : 1; - prologue_cost += add_stmt_cost (target_cost_data, prologue_stmts, - scalar_to_vec, stmt_info, 0, - vect_prologue); + if (reduction_type == EXTRACT_LAST_REDUCTION) + { + /* No extra instructions needed in the prologue. */ + prologue_cost = 0; + + /* Count NCOPIES FOLD_EXTRACT_LAST operations. */ + inside_cost = add_stmt_cost (target_cost_data, ncopies, vec_to_scalar, + stmt_info, 0, vect_body); + } + else + { + /* Add in cost for initial definition. + For cond reduction we have four vectors: initial index, step, + initial result of the data reduction, initial value of the index + reduction. */ + int prologue_stmts = reduction_type == COND_REDUCTION ? 4 : 1; + prologue_cost += add_stmt_cost (target_cost_data, prologue_stmts, + scalar_to_vec, stmt_info, 0, + vect_prologue); + + /* Cost of reduction op inside loop. */ + inside_cost = add_stmt_cost (target_cost_data, ncopies, vector_stmt, + stmt_info, 0, vect_body); + } /* Determine cost of epilogue code. @@ -4078,7 +4092,7 @@ vect_model_reduction_cost (stmt_vec_info { if (reduc_code != ERROR_MARK) { - if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION) + if (reduction_type == COND_REDUCTION) { /* An EQ stmt and an COND_EXPR stmt. 
*/ epilogue_cost += add_stmt_cost (target_cost_data, 2, @@ -4103,7 +4117,7 @@ vect_model_reduction_cost (stmt_vec_info vect_epilogue); } } - else if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION) + else if (reduction_type == COND_REDUCTION) { unsigned estimated_nunits = vect_nunits_for_cost (vectype); /* Extraction of scalar elements. */ @@ -4117,6 +4131,9 @@ vect_model_reduction_cost (stmt_vec_info scalar_stmt, stmt_info, 0, vect_epilogue); } + else if (reduction_type == EXTRACT_LAST_REDUCTION) + /* No extra instructions needed in the epilogue. */ + ; else { int vec_size_in_bits = tree_to_uhwi (TYPE_SIZE (vectype)); @@ -4284,6 +4301,9 @@ get_initial_def_for_reduction (gimple *s return vect_create_destination_var (init_val, vectype); } + vect_reduction_type reduction_type + = STMT_VINFO_VEC_REDUCTION_TYPE (stmt_vinfo); + /* In case of a nested reduction do not use an adjustment def as that case is not supported by the epilogue generation correctly if ncopies is not one. */ @@ -4357,7 +4377,8 @@ get_initial_def_for_reduction (gimple *s if (adjustment_def) { *adjustment_def = NULL_TREE; - if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_vinfo) != COND_REDUCTION) + if (reduction_type != COND_REDUCTION + && reduction_type != EXTRACT_LAST_REDUCTION) { init_def = vect_get_vec_def_for_operand (init_val, stmt); break; @@ -6039,6 +6060,11 @@ vectorizable_reduction (gimple *stmt, gi if (STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (reduc_stmt))) reduc_stmt = STMT_VINFO_RELATED_STMT (vinfo_for_stmt (reduc_stmt)); + if (STMT_VINFO_VEC_REDUCTION_TYPE (vinfo_for_stmt (reduc_stmt)) + == EXTRACT_LAST_REDUCTION) + /* Leave the scalar phi in place. */ + return true; + gcc_assert (is_gimple_assign (reduc_stmt)); for (unsigned k = 1; k < gimple_num_ops (reduc_stmt); ++k) { @@ -6290,16 +6316,6 @@ vectorizable_reduction (gimple *stmt, gi /* If we have a condition reduction, see if we can simplify it further. 
*/ if (v_reduc_type == COND_REDUCTION) { - if (cond_reduc_dt == vect_induction_def) - { - if (dump_enabled_p ()) - dump_printf_loc (MSG_NOTE, vect_location, - "condition expression based on " - "integer induction.\n"); - STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) - = INTEGER_INDUC_COND_REDUCTION; - } - /* Loop peeling modifies initial value of reduction PHI, which makes the reduction stmt to be transformed different to the original stmt analyzed. We need to record reduction code for @@ -6312,6 +6328,24 @@ vectorizable_reduction (gimple *stmt, gi gcc_assert (cond_reduc_dt == vect_constant_def); STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) = CONST_COND_REDUCTION; } + else if (direct_internal_fn_supported_p (IFN_FOLD_EXTRACT_LAST, + vectype_in, OPTIMIZE_FOR_SPEED)) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, + "optimizing condition reduction with" + " FOLD_EXTRACT_LAST.\n"); + STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) = EXTRACT_LAST_REDUCTION; + } + else if (cond_reduc_dt == vect_induction_def) + { + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, + "optimizing condition reduction based on " + "integer induction.\n"); + STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) + = INTEGER_INDUC_COND_REDUCTION; + } else if (cond_reduc_dt == vect_constant_def) { enum vect_def_type cond_initial_dt; @@ -6465,12 +6499,12 @@ vectorizable_reduction (gimple *stmt, gi (and also the same tree-code) when generating the epilog code and when generating the code inside the loop. */ - if (orig_stmt) + vect_reduction_type reduction_type + = STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info); + if (orig_stmt && reduction_type == TREE_CODE_REDUCTION) { /* This is a reduction pattern: get the vectype from the type of the reduction variable, and get the tree-code from orig_stmt. 
*/ - gcc_assert (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) - == TREE_CODE_REDUCTION); orig_code = gimple_assign_rhs_code (orig_stmt); gcc_assert (vectype_out); vec_mode = TYPE_MODE (vectype_out); @@ -6486,13 +6520,12 @@ vectorizable_reduction (gimple *stmt, gi /* For simple condition reductions, replace with the actual expression we want to base our reduction around. */ - if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == CONST_COND_REDUCTION) + if (reduction_type == CONST_COND_REDUCTION) { orig_code = STMT_VINFO_VEC_CONST_COND_REDUC_CODE (stmt_info); gcc_assert (orig_code == MAX_EXPR || orig_code == MIN_EXPR); } - else if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) - == INTEGER_INDUC_COND_REDUCTION) + else if (reduction_type == INTEGER_INDUC_COND_REDUCTION) orig_code = MAX_EXPR; } @@ -6514,7 +6547,9 @@ vectorizable_reduction (gimple *stmt, gi epilog_reduc_code = ERROR_MARK; - if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) != COND_REDUCTION) + if (reduction_type == TREE_CODE_REDUCTION + || reduction_type == INTEGER_INDUC_COND_REDUCTION + || reduction_type == CONST_COND_REDUCTION) { if (reduction_code_for_scalar_code (orig_code, &epilog_reduc_code)) { @@ -6549,7 +6584,7 @@ vectorizable_reduction (gimple *stmt, gi } } } - else + else if (reduction_type == COND_REDUCTION) { int scalar_precision = GET_MODE_PRECISION (SCALAR_TYPE_MODE (scalar_type)); @@ -6564,7 +6599,9 @@ vectorizable_reduction (gimple *stmt, gi epilog_reduc_code = REDUC_MAX_EXPR; } - if (epilog_reduc_code == ERROR_MARK && !nunits_out.is_constant ()) + if (reduction_type != EXTRACT_LAST_REDUCTION + && epilog_reduc_code == ERROR_MARK + && !nunits_out.is_constant ()) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, @@ -6573,8 +6610,7 @@ vectorizable_reduction (gimple *stmt, gi return false; } - if ((double_reduc - || STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) != TREE_CODE_REDUCTION) + if ((double_reduc || reduction_type != TREE_CODE_REDUCTION) && ncopies > 1) { if 
(dump_enabled_p ()) @@ -6664,7 +6700,7 @@ vectorizable_reduction (gimple *stmt, gi } } - if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION) + if (reduction_type == COND_REDUCTION) { widest_int ni; @@ -6801,6 +6837,13 @@ vectorizable_reduction (gimple *stmt, gi bool masked_loop_p = LOOP_VINFO_FULLY_MASKED_P (loop_vinfo); + if (reduction_type == EXTRACT_LAST_REDUCTION) + { + gcc_assert (!slp_node); + return vectorizable_condition (stmt, gsi, vec_stmt, + NULL, reduc_index, NULL); + } + /* Create the destination vector */ vec_dest = vect_create_destination_var (scalar_dest, vectype_out); Index: gcc/tree-vect-stmts.c =================================================================== --- gcc/tree-vect-stmts.c 2017-11-17 15:18:19.846198461 +0000 +++ gcc/tree-vect-stmts.c 2017-11-17 15:29:00.825351462 +0000 @@ -1601,6 +1601,47 @@ vect_get_vec_defs (tree op0, tree op1, g } } +/* Helper function called by vect_finish_replace_stmt and + vect_finish_stmt_generation. Set the location of the new + statement and create a stmt_vec_info for it. */ + +static void +vect_finish_stmt_generation_1 (gimple *stmt, gimple *vec_stmt) +{ + stmt_vec_info stmt_info = vinfo_for_stmt (stmt); + vec_info *vinfo = stmt_info->vinfo; + + set_vinfo_for_stmt (vec_stmt, new_stmt_vec_info (vec_stmt, vinfo)); + + if (dump_enabled_p ()) + { + dump_printf_loc (MSG_NOTE, vect_location, "add new stmt: "); + dump_gimple_stmt (MSG_NOTE, TDF_SLIM, vec_stmt, 0); + } + + gimple_set_location (vec_stmt, gimple_location (stmt)); + + /* While EH edges will generally prevent vectorization, stmt might + e.g. be in a must-not-throw region. Ensure newly created stmts + that could throw are part of the same region. */ + int lp_nr = lookup_stmt_eh_lp (stmt); + if (lp_nr != 0 && stmt_could_throw_p (vec_stmt)) + add_stmt_to_eh_lp (vec_stmt, lp_nr); +} + +/* Replace the scalar statement STMT with a new vector statement VEC_STMT, + which sets the same scalar result as STMT did. 
*/ + +void +vect_finish_replace_stmt (gimple *stmt, gimple *vec_stmt) +{ + gcc_assert (gimple_get_lhs (stmt) == gimple_get_lhs (vec_stmt)); + + gimple_stmt_iterator gsi = gsi_for_stmt (stmt); + gsi_replace (&gsi, vec_stmt, false); + + vect_finish_stmt_generation_1 (stmt, vec_stmt); +} /* Function vect_finish_stmt_generation. @@ -1610,9 +1651,6 @@ vect_get_vec_defs (tree op0, tree op1, g vect_finish_stmt_generation (gimple *stmt, gimple *vec_stmt, gimple_stmt_iterator *gsi) { - stmt_vec_info stmt_info = vinfo_for_stmt (stmt); - vec_info *vinfo = stmt_info->vinfo; - gcc_assert (gimple_code (stmt) != GIMPLE_LABEL); if (!gsi_end_p (*gsi) @@ -1642,23 +1680,7 @@ vect_finish_stmt_generation (gimple *stm } } gsi_insert_before (gsi, vec_stmt, GSI_SAME_STMT); - - set_vinfo_for_stmt (vec_stmt, new_stmt_vec_info (vec_stmt, vinfo)); - - if (dump_enabled_p ()) - { - dump_printf_loc (MSG_NOTE, vect_location, "add new stmt: "); - dump_gimple_stmt (MSG_NOTE, TDF_SLIM, vec_stmt, 0); - } - - gimple_set_location (vec_stmt, gimple_location (stmt)); - - /* While EH edges will generally prevent vectorization, stmt might - e.g. be in a must-not-throw region. Ensure newly created stmts - that could throw are part of the same region. */ - int lp_nr = lookup_stmt_eh_lp (stmt); - if (lp_nr != 0 && stmt_could_throw_p (vec_stmt)) - add_stmt_to_eh_lp (vec_stmt, lp_nr); + vect_finish_stmt_generation_1 (stmt, vec_stmt); } /* We want to vectorize a call to combined function CFN with function @@ -8091,7 +8113,9 @@ vectorizable_condition (gimple *stmt, gi if (reduc_index && STMT_SLP_TYPE (stmt_info)) return false; - if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == TREE_CODE_REDUCTION) + vect_reduction_type reduction_type + = STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info); + if (reduction_type == TREE_CODE_REDUCTION) { if (!STMT_VINFO_RELEVANT_P (stmt_info) && !bb_vinfo) return false; @@ -8250,12 +8274,13 @@ vectorizable_condition (gimple *stmt, gi /* Handle def. 
*/ scalar_dest = gimple_assign_lhs (stmt); - vec_dest = vect_create_destination_var (scalar_dest, vectype); + if (reduction_type != EXTRACT_LAST_REDUCTION) + vec_dest = vect_create_destination_var (scalar_dest, vectype); /* Handle cond expr. */ for (j = 0; j < ncopies; j++) { - gassign *new_stmt = NULL; + gimple *new_stmt = NULL; if (j == 0) { if (slp_node) @@ -8389,11 +8414,42 @@ vectorizable_condition (gimple *stmt, gi } } } - new_temp = make_ssa_name (vec_dest); - new_stmt = gimple_build_assign (new_temp, VEC_COND_EXPR, - vec_compare, vec_then_clause, - vec_else_clause); - vect_finish_stmt_generation (stmt, new_stmt, gsi); + if (reduction_type == EXTRACT_LAST_REDUCTION) + { + if (!is_gimple_val (vec_compare)) + { + tree vec_compare_name = make_ssa_name (vec_cmp_type); + new_stmt = gimple_build_assign (vec_compare_name, + vec_compare); + vect_finish_stmt_generation (stmt, new_stmt, gsi); + vec_compare = vec_compare_name; + } + gcc_assert (reduc_index == 2); + new_stmt = gimple_build_call_internal + (IFN_FOLD_EXTRACT_LAST, 3, else_clause, vec_compare, + vec_then_clause); + gimple_call_set_lhs (new_stmt, scalar_dest); + SSA_NAME_DEF_STMT (scalar_dest) = new_stmt; + if (stmt == gsi_stmt (*gsi)) + vect_finish_replace_stmt (stmt, new_stmt); + else + { + /* In this case we're moving the definition to later in the + block. That doesn't matter because the only uses of the + lhs are in phi statements. 
*/ + gimple_stmt_iterator old_gsi = gsi_for_stmt (stmt); + gsi_remove (&old_gsi, true); + vect_finish_stmt_generation (stmt, new_stmt, gsi); + } + } + else + { + new_temp = make_ssa_name (vec_dest); + new_stmt = gimple_build_assign (new_temp, VEC_COND_EXPR, + vec_compare, vec_then_clause, + vec_else_clause); + vect_finish_stmt_generation (stmt, new_stmt, gsi); + } if (slp_node) SLP_TREE_VEC_STMTS (slp_node).quick_push (new_stmt); } Index: gcc/config/aarch64/aarch64-sve.md =================================================================== --- gcc/config/aarch64/aarch64-sve.md 2017-11-17 15:23:56.034915957 +0000 +++ gcc/config/aarch64/aarch64-sve.md 2017-11-17 15:29:00.816351462 +0000 @@ -1451,6 +1451,21 @@ (define_insn "cond_" "\t%0., %1/m, %0., %3." ) +;; Set operand 0 to the last active element in operand 3, or to tied +;; operand 1 if no elements are active. +(define_insn "fold_extract_last_" + [(set (match_operand: 0 "register_operand" "=r, w") + (unspec: + [(match_operand: 1 "register_operand" "0, 0") + (match_operand: 2 "register_operand" "Upl, Upl") + (match_operand:SVE_ALL 3 "register_operand" "w, w")] + UNSPEC_CLASTB))] + "TARGET_SVE" + "@ + clastb\t%0, %2, %0, %3. + clastb\t%0, %2, %0, %3." +) + ;; Unpredicated integer add reduction. 
(define_expand "reduc_plus_scal_" [(set (match_operand: 0 "register_operand") Index: gcc/config/aarch64/aarch64.md =================================================================== --- gcc/config/aarch64/aarch64.md 2017-11-17 15:18:19.792543444 +0000 +++ gcc/config/aarch64/aarch64.md 2017-11-17 15:29:00.817351462 +0000 @@ -163,6 +163,7 @@ (define_c_enum "unspec" [ UNSPEC_LDN UNSPEC_STN UNSPEC_INSR + UNSPEC_CLASTB ]) (define_c_enum "unspecv" [ Index: gcc/testsuite/lib/target-supports.exp =================================================================== --- gcc/testsuite/lib/target-supports.exp 2017-11-17 15:18:19.834376169 +0000 +++ gcc/testsuite/lib/target-supports.exp 2017-11-17 15:29:00.823351462 +0000 @@ -7174,6 +7174,12 @@ proc check_effective_target_vect_logical return [check_effective_target_aarch64_sve] } +# Return 1 if the target supports the fold_extract_last optab. + +proc check_effective_target_vect_fold_extract_last { } { + return [check_effective_target_aarch64_sve] +} + # Return 1 if the target supports section-anchors proc check_effective_target_section_anchors { } { Index: gcc/testsuite/gcc.dg/vect/pr65947-1.c =================================================================== --- gcc/testsuite/gcc.dg/vect/pr65947-1.c 2017-06-30 12:50:37.703687557 +0100 +++ gcc/testsuite/gcc.dg/vect/pr65947-1.c 2017-11-17 15:29:00.819351462 +0000 @@ -41,4 +41,4 @@ main (void) } /* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" } } */ -/* { dg-final { scan-tree-dump-times "condition expression based on integer induction." 
4 "vect" } } */ +/* { dg-final { scan-tree-dump-times "optimizing condition reduction" 4 "vect" } } */ Index: gcc/testsuite/gcc.dg/vect/pr65947-2.c =================================================================== --- gcc/testsuite/gcc.dg/vect/pr65947-2.c 2017-06-30 12:50:37.704687511 +0100 +++ gcc/testsuite/gcc.dg/vect/pr65947-2.c 2017-11-17 15:29:00.820351462 +0000 @@ -42,4 +42,5 @@ main (void) } /* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" } } */ -/* { dg-final { scan-tree-dump-not "condition expression based on integer induction." "vect" } } */ +/* { dg-final { scan-tree-dump-times "optimizing condition reduction with FOLD_EXTRACT_LAST" 4 "vect" { target vect_fold_extract_last } } } */ +/* { dg-final { scan-tree-dump-not "optimizing condition reduction" "vect" { target { ! vect_fold_extract_last } } } } */ Index: gcc/testsuite/gcc.dg/vect/pr65947-3.c =================================================================== --- gcc/testsuite/gcc.dg/vect/pr65947-3.c 2017-06-30 12:50:37.704687511 +0100 +++ gcc/testsuite/gcc.dg/vect/pr65947-3.c 2017-11-17 15:29:00.820351462 +0000 @@ -52,4 +52,5 @@ main (void) } /* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" } } */ -/* { dg-final { scan-tree-dump-not "condition expression based on integer induction." "vect" } } */ +/* { dg-final { scan-tree-dump-times "optimizing condition reduction with FOLD_EXTRACT_LAST" 4 "vect" { target vect_fold_extract_last } } } */ +/* { dg-final { scan-tree-dump-not "optimizing condition reduction" "vect" { target { ! 
vect_fold_extract_last } } } } */ Index: gcc/testsuite/gcc.dg/vect/pr65947-4.c =================================================================== --- gcc/testsuite/gcc.dg/vect/pr65947-4.c 2017-06-30 12:50:37.704687511 +0100 +++ gcc/testsuite/gcc.dg/vect/pr65947-4.c 2017-11-17 15:29:00.820351462 +0000 @@ -41,5 +41,5 @@ main (void) } /* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" } } */ -/* { dg-final { scan-tree-dump-times "condition expression based on integer induction." 4 "vect" } } */ +/* { dg-final { scan-tree-dump-times "optimizing condition reduction" 4 "vect" } } */ Index: gcc/testsuite/gcc.dg/vect/pr65947-5.c =================================================================== --- gcc/testsuite/gcc.dg/vect/pr65947-5.c 2017-11-09 15:15:28.896643339 +0000 +++ gcc/testsuite/gcc.dg/vect/pr65947-5.c 2017-11-17 15:29:00.820351462 +0000 @@ -50,6 +50,8 @@ main (void) return 0; } -/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 1 "vect" } } */ -/* { dg-final { scan-tree-dump "loop size is greater than data size" "vect" } } */ -/* { dg-final { scan-tree-dump-not "condition expression based on integer induction." "vect" } } */ +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 1 "vect" { target { ! vect_fold_extract_last } } } } */ +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" { target vect_fold_extract_last } } } */ +/* { dg-final { scan-tree-dump "loop size is greater than data size" "vect" { xfail vect_fold_extract_last } } } */ +/* { dg-final { scan-tree-dump-times "optimizing condition reduction with FOLD_EXTRACT_LAST" 4 "vect" { target vect_fold_extract_last } } } */ +/* { dg-final { scan-tree-dump-not "optimizing condition reduction" "vect" { target { ! 
vect_fold_extract_last } } } } */ Index: gcc/testsuite/gcc.dg/vect/pr65947-6.c =================================================================== --- gcc/testsuite/gcc.dg/vect/pr65947-6.c 2017-06-30 12:50:37.704687511 +0100 +++ gcc/testsuite/gcc.dg/vect/pr65947-6.c 2017-11-17 15:29:00.820351462 +0000 @@ -41,4 +41,5 @@ main (void) } /* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" } } */ -/* { dg-final { scan-tree-dump-not "condition expression based on integer induction." "vect" } } */ +/* { dg-final { scan-tree-dump-times "optimizing condition reduction with FOLD_EXTRACT_LAST" 4 "vect" { target vect_fold_extract_last } } } */ +/* { dg-final { scan-tree-dump-not "optimizing condition reduction" "vect" { target { ! vect_fold_extract_last } } } } */ Index: gcc/testsuite/gcc.dg/vect/pr65947-9.c =================================================================== --- gcc/testsuite/gcc.dg/vect/pr65947-9.c 2017-10-02 09:10:57.330692940 +0100 +++ gcc/testsuite/gcc.dg/vect/pr65947-9.c 2017-11-17 15:29:00.821351462 +0000 @@ -45,5 +45,8 @@ main () return 0; } -/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" } } */ -/* { dg-final { scan-tree-dump "loop size is greater than data size" "vect" } } */ +/* { dg-final { scan-tree-dump-not "LOOP VECTORIZED" "vect" { target { ! vect_fold_extract_last } } } } */ +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 1 "vect" { target vect_fold_extract_last } } } */ +/* { dg-final { scan-tree-dump "loop size is greater than data size" "vect" { target { ! vect_fold_extract_last } } } } */ +/* { dg-final { scan-tree-dump-times "optimizing condition reduction with FOLD_EXTRACT_LAST" 2 "vect" { target vect_fold_extract_last } } } */ +/* { dg-final { scan-tree-dump-not "optimizing condition reduction" "vect" { target { ! 
vect_fold_extract_last } } } } */ Index: gcc/testsuite/gcc.dg/vect/pr65947-10.c =================================================================== --- gcc/testsuite/gcc.dg/vect/pr65947-10.c 2017-10-04 16:25:39.698051107 +0100 +++ gcc/testsuite/gcc.dg/vect/pr65947-10.c 2017-11-17 15:29:00.819351462 +0000 @@ -42,5 +42,6 @@ main (void) } /* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" } } */ -/* { dg-final { scan-tree-dump-not "condition expression based on integer induction." "vect" } } */ +/* { dg-final { scan-tree-dump-times "optimizing condition reduction with FOLD_EXTRACT_LAST" 4 "vect" { target vect_fold_extract_last } } } */ +/* { dg-final { scan-tree-dump-not "optimizing condition reduction" "vect" { target { ! vect_fold_extract_last } } } } */ Index: gcc/testsuite/gcc.dg/vect/pr65947-12.c =================================================================== --- gcc/testsuite/gcc.dg/vect/pr65947-12.c 2017-06-30 12:50:37.706687419 +0100 +++ gcc/testsuite/gcc.dg/vect/pr65947-12.c 2017-11-17 15:29:00.819351462 +0000 @@ -42,4 +42,5 @@ main (void) } /* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" } } */ -/* { dg-final { scan-tree-dump-not "condition expression based on integer induction." "vect" } } */ +/* { dg-final { scan-tree-dump-times "optimizing condition reduction with FOLD_EXTRACT_LAST" 4 "vect" { target vect_fold_extract_last } } } */ +/* { dg-final { scan-tree-dump-not "optimizing condition reduction" "vect" { target { ! vect_fold_extract_last } } } } */ Index: gcc/testsuite/gcc.dg/vect/pr65947-13.c =================================================================== --- gcc/testsuite/gcc.dg/vect/pr65947-13.c 2017-06-30 12:50:37.706687419 +0100 +++ gcc/testsuite/gcc.dg/vect/pr65947-13.c 2017-11-17 15:29:00.820351462 +0000 @@ -42,4 +42,5 @@ main (void) } /* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" } } */ -/* { dg-final { scan-tree-dump-not "condition expression based on integer induction." 
"vect" } } */ +/* { dg-final { scan-tree-dump-times "optimizing condition reduction with FOLD_EXTRACT_LAST" 4 "vect" { target vect_fold_extract_last } } } */ +/* { dg-final { scan-tree-dump-not "optimizing condition reduction" "vect" { target { ! vect_fold_extract_last } } } } */ Index: gcc/testsuite/gcc.dg/vect/pr65947-14.c =================================================================== --- gcc/testsuite/gcc.dg/vect/pr65947-14.c 2017-06-30 12:50:37.702687603 +0100 +++ gcc/testsuite/gcc.dg/vect/pr65947-14.c 2017-11-17 15:29:00.820351462 +0000 @@ -41,4 +41,5 @@ main (void) } /* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" } } */ -/* { dg-final { scan-tree-dump-times "condition expression based on integer induction." 4 "vect" } } */ +/* { dg-final { scan-tree-dump-times "optimizing condition reduction based on integer induction" 4 "vect" { target { ! vect_fold_extract_last } } } }*/ +/* { dg-final { scan-tree-dump-times "optimizing condition reduction with FOLD_EXTRACT_LAST" 4 "vect" { target vect_fold_extract_last } } } */ Index: gcc/testsuite/gcc.target/aarch64/sve_clastb_1.c =================================================================== --- /dev/null 2017-11-14 14:28:07.424493901 +0000 +++ gcc/testsuite/gcc.target/aarch64/sve_clastb_1.c 2017-11-17 15:29:00.821351462 +0000 @@ -0,0 +1,20 @@ +/* { dg-do assemble } */ +/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */ + +#define N 32 + +/* Simple condition reduction. */ + +int __attribute__ ((noinline, noclone)) +condition_reduction (int *a, int min_v) +{ + int last = 66; /* High start value. 
*/ + + for (int i = 0; i < N; i++) + if (a[i] < min_v) + last = i; + + return last; +} + +/* { dg-final { scan-assembler {\tclastb\tw[0-9]+, p[0-7], w[0-9]+, z[0-9]+\.s} } } */ Index: gcc/testsuite/gcc.target/aarch64/sve_clastb_1_run.c =================================================================== --- /dev/null 2017-11-14 14:28:07.424493901 +0000 +++ gcc/testsuite/gcc.target/aarch64/sve_clastb_1_run.c 2017-11-17 15:29:00.821351462 +0000 @@ -0,0 +1,22 @@ +/* { dg-do run { target aarch64_sve_hw } } */ +/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */ + +#include "sve_clastb_1.c" + +int __attribute__ ((optimize (1))) +main (void) +{ + int a[N] = { + 11, -12, 13, 14, 15, 16, 17, 18, 19, 20, + 1, 2, -3, 4, 5, 6, 7, -8, 9, 10, + 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, + 31, 32 + }; + + int ret = condition_reduction (a, 1); + + if (ret != 17) + __builtin_abort (); + + return 0; +} Index: gcc/testsuite/gcc.target/aarch64/sve_clastb_2.c =================================================================== --- /dev/null 2017-11-14 14:28:07.424493901 +0000 +++ gcc/testsuite/gcc.target/aarch64/sve_clastb_2.c 2017-11-17 15:29:00.821351462 +0000 @@ -0,0 +1,26 @@ +/* { dg-do assemble } */ +/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */ + +#include <stdint.h> + +#if !defined(TYPE) +#define TYPE uint32_t +#endif + +#define N 254 + +/* Non-simple condition reduction. 
*/ + +TYPE __attribute__ ((noinline, noclone)) +condition_reduction (TYPE *a, TYPE min_v) +{ + TYPE last = 65; + + for (TYPE i = 0; i < N; i++) + if (a[i] < min_v) + last = a[i]; + + return last; +} + +/* { dg-final { scan-assembler {\tclastb\tw[0-9]+, p[0-7]+, w[0-9]+, z[0-9]+\.s} } } */ Index: gcc/testsuite/gcc.target/aarch64/sve_clastb_2_run.c =================================================================== --- /dev/null 2017-11-14 14:28:07.424493901 +0000 +++ gcc/testsuite/gcc.target/aarch64/sve_clastb_2_run.c 2017-11-17 15:29:00.821351462 +0000 @@ -0,0 +1,23 @@ +/* { dg-do run { target aarch64_sve_hw } } */ +/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */ + +#include "sve_clastb_2.c" + +int __attribute__ ((optimize (1))) +main (void) +{ + unsigned int a[N] = { + 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, + 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, + 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, + 31, 32 + }; + __builtin_memset (a + 32, 43, (N - 32) * sizeof (int)); + + unsigned int ret = condition_reduction (a, 16); + + if (ret != 10) + __builtin_abort (); + + return 0; +} Index: gcc/testsuite/gcc.target/aarch64/sve_clastb_3.c =================================================================== --- /dev/null 2017-11-14 14:28:07.424493901 +0000 +++ gcc/testsuite/gcc.target/aarch64/sve_clastb_3.c 2017-11-17 15:29:00.821351462 +0000 @@ -0,0 +1,8 @@ +/* { dg-do assemble } */ +/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */ + +#define TYPE uint8_t + +#include "sve_clastb_2.c" + +/* { dg-final { scan-assembler {\tclastb\tw[0-9]+, p[0-7]+, w[0-9]+, z[0-9]+\.b} } } */ Index: gcc/testsuite/gcc.target/aarch64/sve_clastb_3_run.c =================================================================== --- /dev/null 2017-11-14 14:28:07.424493901 +0000 +++ gcc/testsuite/gcc.target/aarch64/sve_clastb_3_run.c 2017-11-17 15:29:00.821351462 +0000 @@ -0,0 +1,23 @@ +/* { dg-do run { target aarch64_sve_hw } } */ +/* { dg-options "-O2 -ftree-vectorize 
-march=armv8-a+sve" } */ + +#include "sve_clastb_3.c" + +int __attribute__ ((optimize (1))) +main (void) +{ + unsigned char a[N] = { + 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, + 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, + 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, + 31, 32 + }; + __builtin_memset (a + 32, 43, N - 32); + + unsigned char ret = condition_reduction (a, 16); + + if (ret != 10) + __builtin_abort (); + + return 0; +} Index: gcc/testsuite/gcc.target/aarch64/sve_clastb_4.c =================================================================== --- /dev/null 2017-11-14 14:28:07.424493901 +0000 +++ gcc/testsuite/gcc.target/aarch64/sve_clastb_4.c 2017-11-17 15:29:00.821351462 +0000 @@ -0,0 +1,8 @@ +/* { dg-do assemble } */ +/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */ + +#define TYPE int16_t + +#include "sve_clastb_2.c" + +/* { dg-final { scan-assembler {\tclastb\tw[0-9]+, p[0-7], w[0-9]+, z[0-9]+\.h} } } */ Index: gcc/testsuite/gcc.target/aarch64/sve_clastb_4_run.c =================================================================== --- /dev/null 2017-11-14 14:28:07.424493901 +0000 +++ gcc/testsuite/gcc.target/aarch64/sve_clastb_4_run.c 2017-11-17 15:29:00.822351462 +0000 @@ -0,0 +1,25 @@ +/* { dg-do run { target aarch64_sve_hw } } */ +/* { dg-options "-O2 -ftree-vectorize -fno-inline -march=armv8-a+sve" } */ + +#include "sve_clastb_4.c" + +extern void abort (void) __attribute__ ((noreturn)); + +int __attribute__ ((optimize (1))) +main (void) +{ + short a[N] = { + 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, + 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, + 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, + 31, 32 + }; + __builtin_memset (a+32, 43, (N-32)*sizeof (short)); + + short ret = condition_reduction (a, 16); + + if (ret != 10) + abort (); + + return 0; +} Index: gcc/testsuite/gcc.target/aarch64/sve_clastb_5.c =================================================================== --- /dev/null 2017-11-14 14:28:07.424493901 +0000 +++ 
gcc/testsuite/gcc.target/aarch64/sve_clastb_5.c 2017-11-17 15:29:00.822351462 +0000 @@ -0,0 +1,8 @@ +/* { dg-do assemble } */ +/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */ + +#define TYPE uint64_t + +#include "sve_clastb_2.c" + +/* { dg-final { scan-assembler {\tclastb\tx[0-9]+, p[0-7], x[0-9]+, z[0-9]+\.d} } } */ Index: gcc/testsuite/gcc.target/aarch64/sve_clastb_5_run.c =================================================================== --- /dev/null 2017-11-14 14:28:07.424493901 +0000 +++ gcc/testsuite/gcc.target/aarch64/sve_clastb_5_run.c 2017-11-17 15:29:00.822351462 +0000 @@ -0,0 +1,23 @@ +/* { dg-do run { target aarch64_sve_hw } } */ +/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */ + +#include "sve_clastb_5.c" + +int __attribute__ ((optimize (1))) +main (void) +{ + long a[N] = { + 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, + 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, + 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, + 31, 32 + }; + __builtin_memset (a + 32, 43, (N - 32) * sizeof (long)); + + long ret = condition_reduction (a, 16); + + if (ret != 10) + __builtin_abort (); + + return 0; +} Index: gcc/testsuite/gcc.target/aarch64/sve_clastb_6.c =================================================================== --- /dev/null 2017-11-14 14:28:07.424493901 +0000 +++ gcc/testsuite/gcc.target/aarch64/sve_clastb_6.c 2017-11-17 15:29:00.822351462 +0000 @@ -0,0 +1,24 @@ +/* { dg-do assemble } */ +/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */ + +#define N 32 + +#ifndef TYPE +#define TYPE float +#endif + +/* Non-integer data types. 
*/ + +TYPE __attribute__ ((noinline, noclone)) +condition_reduction (TYPE *a, TYPE min_v) +{ + TYPE last = 0; + + for (int i = 0; i < N; i++) + if (a[i] < min_v) + last = a[i]; + + return last; +} + +/* { dg-final { scan-assembler {\tclastb\ts[0-9]+, p[0-7], s[0-9]+, z[0-9]+\.s} } } */ Index: gcc/testsuite/gcc.target/aarch64/sve_clastb_6_run.c =================================================================== --- /dev/null 2017-11-14 14:28:07.424493901 +0000 +++ gcc/testsuite/gcc.target/aarch64/sve_clastb_6_run.c 2017-11-17 15:29:00.822351462 +0000 @@ -0,0 +1,22 @@ +/* { dg-do run { target aarch64_sve_hw } } */ +/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */ + +#include "sve_clastb_6.c" + +int __attribute__ ((optimize (1))) +main (void) +{ + float a[N] = { + 11.5, 12.2, 13.22, 14.1, 15.2, 16.3, 17, 18.7, 19, 20, + 1, 2, 3.3, 4.3333, 5.5, 6.23, 7, 8.63, 9, 10.6, + 21, 22.12, 23.55, 24.76, 25, 26, 27.34, 28.765, 29, 30, + 31.111, 32.322 + }; + + float ret = condition_reduction (a, 16.7); + + if (ret != (float) 10.6) + __builtin_abort (); + + return 0; +} Index: gcc/testsuite/gcc.target/aarch64/sve_clastb_7.c =================================================================== --- /dev/null 2017-11-14 14:28:07.424493901 +0000 +++ gcc/testsuite/gcc.target/aarch64/sve_clastb_7.c 2017-11-17 15:29:00.822351462 +0000 @@ -0,0 +1,7 @@ +/* { dg-do assemble } */ +/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */ + +#define TYPE double +#include "sve_clastb_6.c" + +/* { dg-final { scan-assembler {\tclastb\td[0-9]+, p[0-7], d[0-9]+, z[0-9]+\.d} } } */ Index: gcc/testsuite/gcc.target/aarch64/sve_clastb_7_run.c =================================================================== --- /dev/null 2017-11-14 14:28:07.424493901 +0000 +++ gcc/testsuite/gcc.target/aarch64/sve_clastb_7_run.c 2017-11-17 15:29:00.822351462 +0000 @@ -0,0 +1,22 @@ +/* { dg-do run { target aarch64_sve_hw } } */ +/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } 
*/ + +#include "sve_clastb_7.c" + +int __attribute__ ((optimize (1))) +main (void) +{ + double a[N] = { + 11.5, 12.2, 13.22, 14.1, 15.2, 16.3, 17, 18.7, 19, 20, + 1, 2, 3.3, 4.3333, 5.5, 6.23, 7, 8.63, 9, 10.6, + 21, 22.12, 23.55, 24.76, 25, 26, 27.34, 28.765, 29, 30, + 31.111, 32.322 + }; + + double ret = condition_reduction (a, 16.7); + + if (ret != 10.6) + __builtin_abort (); + + return 0; +}