From patchwork Fri Nov 17 22:10:32 2017
X-Patchwork-Submitter: Richard Sandiford
X-Patchwork-Id: 119232
From: Richard Sandiford
To: gcc-patches@gcc.gnu.org
Mail-Followup-To: gcc-patches@gcc.gnu.org, richard.sandiford@linaro.org
Subject: Add support for SVE scatter stores
Date: Fri, 17 Nov 2017 22:10:32 +0000
Message-ID:
<874lpswo87.fsf@linaro.org>

This is mostly a mechanical extension of the previous gather load
support to scatter stores.  The internal functions in this case are:

  IFN_SCATTER_STORE (base, offsets, scale, values)
  IFN_MASK_SCATTER_STORE (base, offsets, scale, values, mask)

However, one non-obvious change is to vect_analyze_data_ref_access.
If we're treating an access as a gather load or scatter store
(i.e. if STMT_VINFO_GATHER_SCATTER_P is true), the existing code
would create a dummy data_reference whose step is 0.  There's not
really much else it could do, since the whole point is that the step
isn't predictable from iteration to iteration.  We then went into
this code in vect_analyze_data_ref_access:

  /* Allow loads with zero step in inner-loop vectorization.  */
  if (loop_vinfo && integer_zerop (step))
    {
      GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt)) = NULL;
      if (!nested_in_vect_loop_p (loop, stmt))
	return DR_IS_READ (dr);

I.e. we'd take the step literally and assume that this is a load or
store to an invariant address.  Loads from invariant addresses are
supported but stores to them aren't.  The code therefore had the
effect of disabling all scatter stores.  AFAICT this is true of AVX
too: although tests like avx512f-scatter-1.c test for the correctness
of a scatter-like loop, they don't seem to check whether a scatter
instruction is actually used.

The patch therefore makes vect_analyze_data_ref_access return true
for scatters.  We do seem to handle the aliasing correctly; that's
tested by other functions, and is symmetrical to the already-working
gather case.

Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu
and powerpc64le-linux-gnu.  OK to install?

Richard


2017-11-17  Richard Sandiford
	    Alan Hayward
	    David Sherwood

gcc/
	* optabs.def (scatter_store_optab, mask_scatter_store_optab):
	New optabs.
	* doc/md.texi (scatter_store@var{m}, mask_scatter_store@var{m}):
	Document.
	* genopinit.c (main): Add supports_vec_scatter_store and
	supports_vec_scatter_store_cached to target_optabs.
	* gimple.h (gimple_expr_type): Handle IFN_SCATTER_STORE and
	IFN_MASK_SCATTER_STORE.
	* internal-fn.def (SCATTER_STORE, MASK_SCATTER_STORE): New
	internal functions.
	* internal-fn.h (internal_store_fn_p): Declare.
	(internal_fn_stored_value_index): Likewise.
	* internal-fn.c (scatter_store_direct): New macro.
	(expand_scatter_store_optab_fn): New function.
	(direct_scatter_store_optab_supported_p): New macro.
	(internal_store_fn_p): New function.
	(internal_gather_scatter_fn_p): Handle IFN_SCATTER_STORE and
	IFN_MASK_SCATTER_STORE.
	(internal_fn_mask_index): Likewise.
	(internal_fn_stored_value_index): New function.
	(internal_gather_scatter_fn_supported_p): Adjust operand numbers
	for scatter stores.
	* optabs-query.h (supports_vec_scatter_store_p): Declare.
	* optabs-query.c (supports_vec_scatter_store_p): New function.
	* tree-vectorizer.h (vect_get_store_rhs): Declare.
	* tree-vect-data-refs.c (vect_analyze_data_ref_access): Return true
	for scatter stores.
	(vect_gather_scatter_fn_p): Handle scatter stores too.
	(vect_check_gather_scatter): Consider using scatter stores if
	supports_vec_scatter_store_p.
	* tree-vect-patterns.c (vect_try_gather_scatter_pattern): Handle
	scatter stores too.
	* tree-vect-stmts.c (exist_non_indexing_operands_for_use_p): Use
	internal_fn_stored_value_index.
	(check_load_store_masking): Handle scatter stores too.
	(vect_get_store_rhs): Make public.
	(vectorizable_call): Use internal_store_fn_p.
	(vectorizable_store): Handle scatter store internal functions.
	(vect_transform_stmt): Compare GROUP_STORE_COUNT with GROUP_SIZE
	when deciding whether the end of the group has been reached.
	* config/aarch64/aarch64.md (UNSPEC_ST1_SCATTER): New unspec.
	* config/aarch64/aarch64-sve.md (scatter_store): New expander.
	(mask_scatter_store): New insns.

gcc/testsuite/
	* gcc.target/aarch64/sve_mask_scatter_store_1.c: New test.
	* gcc.target/aarch64/sve_mask_scatter_store_2.c: Likewise.
	* gcc.target/aarch64/sve_scatter_store_1.c: Likewise.
	* gcc.target/aarch64/sve_scatter_store_2.c: Likewise.
	* gcc.target/aarch64/sve_scatter_store_3.c: Likewise.
	* gcc.target/aarch64/sve_scatter_store_4.c: Likewise.
	* gcc.target/aarch64/sve_scatter_store_5.c: Likewise.
	* gcc.target/aarch64/sve_scatter_store_6.c: Likewise.
	* gcc.target/aarch64/sve_scatter_store_7.c: Likewise.
	* gcc.target/aarch64/sve_strided_store_1.c: Likewise.
	* gcc.target/aarch64/sve_strided_store_2.c: Likewise.
	* gcc.target/aarch64/sve_strided_store_3.c: Likewise.
	* gcc.target/aarch64/sve_strided_store_4.c: Likewise.
	* gcc.target/aarch64/sve_strided_store_5.c: Likewise.
	* gcc.target/aarch64/sve_strided_store_6.c: Likewise.
	* gcc.target/aarch64/sve_strided_store_7.c: Likewise.

Index: gcc/optabs.def
===================================================================
--- gcc/optabs.def	2017-11-17 21:57:43.917004022 +0000
+++ gcc/optabs.def	2017-11-17 22:07:58.222016439 +0000
@@ -392,6 +392,8 @@ OPTAB_D (set_thread_pointer_optab, "set_
 OPTAB_D (gather_load_optab, "gather_load$a")
 OPTAB_D (mask_gather_load_optab, "mask_gather_load$a")
+OPTAB_D (scatter_store_optab, "scatter_store$a")
+OPTAB_D (mask_scatter_store_optab, "mask_scatter_store$a")
 OPTAB_DC (vec_duplicate_optab, "vec_duplicate$a", VEC_DUPLICATE)
 OPTAB_DC (vec_series_optab, "vec_series$a", VEC_SERIES)

Index: gcc/doc/md.texi
===================================================================
--- gcc/doc/md.texi	2017-11-17 21:57:43.915004222 +0000
+++ gcc/doc/md.texi	2017-11-17 22:07:58.220016439 +0000
@@ -4934,6 +4934,35 @@ operand 5.  Bit @var{i} of the mask is s
 of the result should be loaded from memory and clear if element @var{i}
 of the result should be set to zero.
 
+@cindex @code{scatter_store@var{m}} instruction pattern
+@item @samp{scatter_store@var{m}}
+Store a vector of mode @var{m} into several distinct memory locations.
+Operand 0 is a scalar base address and operand 1 is a vector of offsets
+from that base.  Operand 4 is the vector of values that should be stored,
+which has the same number of elements as the offset.  For each element
+index @var{i}:
+
+@itemize @bullet
+@item
+extend the offset element @var{i} to address width, using zero
+extension if operand 2 is 1 and sign extension if operand 2 is zero;
+@item
+multiply the extended offset by operand 3;
+@item
+add the result to the base; and
+@item
+store element @var{i} of operand 4 to that address.
+@end itemize
+
+The value of operand 2 does not matter if the offsets are already
+address width.
+
+@cindex @code{mask_scatter_store@var{m}} instruction pattern
+@item @samp{mask_scatter_store@var{m}}
+Like @samp{scatter_store@var{m}}, but takes an extra mask operand as
+operand 5.  Bit @var{i} of the mask is set if element @var{i}
+of the result should be stored to memory.
+
 @cindex @code{vec_set@var{m}} instruction pattern
 @item @samp{vec_set@var{m}}
 Set given field in the vector value.  Operand 0 is the vector to modify,

Index: gcc/genopinit.c
===================================================================
--- gcc/genopinit.c	2017-11-17 21:57:43.915004222 +0000
+++ gcc/genopinit.c	2017-11-17 22:07:58.220016439 +0000
@@ -239,6 +239,8 @@ main (int argc, const char **argv)
	   "     mode.  */\n"
	   "  bool supports_vec_gather_load;\n"
	   "  bool supports_vec_gather_load_cached;\n"
+	   "  bool supports_vec_scatter_store;\n"
+	   "  bool supports_vec_scatter_store_cached;\n"
	   "};\n"
	   "extern void init_all_optabs (struct target_optabs *);\n"
	   "\n"

Index: gcc/gimple.h
===================================================================
--- gcc/gimple.h	2017-10-26 10:02:02.709481562 +0100
+++ gcc/gimple.h	2017-11-17 22:07:58.221016439 +0000
@@ -6319,11 +6319,18 @@ gimple_expr_type (const gimple *stmt)
   if (code == GIMPLE_CALL)
     {
       const gcall *call_stmt = as_a <const gcall *> (stmt);
-      if (gimple_call_internal_p (call_stmt)
-	  && gimple_call_internal_fn (call_stmt) == IFN_MASK_STORE)
-	return TREE_TYPE (gimple_call_arg (call_stmt, 3));
-      else
-	return gimple_call_return_type (call_stmt);
+      if (gimple_call_internal_p (call_stmt))
+	switch (gimple_call_internal_fn (call_stmt))
+	  {
+	  case IFN_MASK_STORE:
+	  case IFN_SCATTER_STORE:
+	    return TREE_TYPE (gimple_call_arg (call_stmt, 3));
+	  case IFN_MASK_SCATTER_STORE:
+	    return TREE_TYPE (gimple_call_arg (call_stmt, 4));
+	  default:
+	    break;
+	  }
+      return gimple_call_return_type (call_stmt);
     }
   else if (code == GIMPLE_ASSIGN)
     {

Index: gcc/internal-fn.def
===================================================================
--- gcc/internal-fn.def	2017-11-17 21:57:43.916004122 +0000
+++ gcc/internal-fn.def	2017-11-17 22:07:58.222016439 +0000
@@ -52,6 +52,7 @@ along with GCC; see the file COPYING3.
   - mask_store: currently just maskstore
   - store_lanes: currently just vec_store_lanes
   - mask_store_lanes: currently just vec_mask_store_lanes
+   - scatter_store: used for {mask_,}scatter_store
 
    - unary: a normal unary optab, such as vec_reverse_
    - binary: a normal binary optab, such as vec_interleave_lo_
@@ -115,6 +116,10 @@ DEF_INTERNAL_OPTAB_FN (GATHER_LOAD, ECF_
 DEF_INTERNAL_OPTAB_FN (MASK_GATHER_LOAD, ECF_PURE,
		       mask_gather_load, gather_load)
 
+DEF_INTERNAL_OPTAB_FN (SCATTER_STORE, 0, scatter_store, scatter_store)
+DEF_INTERNAL_OPTAB_FN (MASK_SCATTER_STORE, 0,
+		       mask_scatter_store, scatter_store)
+
 DEF_INTERNAL_OPTAB_FN (MASK_STORE, 0, maskstore, mask_store)
 DEF_INTERNAL_OPTAB_FN (STORE_LANES, ECF_CONST, vec_store_lanes, store_lanes)
 DEF_INTERNAL_OPTAB_FN (MASK_STORE_LANES, 0,

Index: gcc/internal-fn.h
===================================================================
--- gcc/internal-fn.h	2017-11-17 21:57:43.916004122 +0000
+++ gcc/internal-fn.h	2017-11-17 22:07:58.222016439 +0000
@@ -193,8 +193,10 @@ extern bool set_edom_supported_p (void);
 extern internal_fn get_conditional_internal_fn (tree_code, tree);
 
 extern bool internal_load_fn_p (internal_fn);
+extern bool internal_store_fn_p (internal_fn);
 extern bool internal_gather_scatter_fn_p (internal_fn);
 extern int internal_fn_mask_index (internal_fn);
+extern int internal_fn_stored_value_index (internal_fn);
 extern bool internal_gather_scatter_fn_supported_p (internal_fn, tree,
						    tree, signop, int);

Index: gcc/internal-fn.c
===================================================================
--- gcc/internal-fn.c	2017-11-17 21:57:43.916004122 +0000
+++ gcc/internal-fn.c	2017-11-17 22:07:58.221016439 +0000
@@ -87,6 +87,7 @@ #define gather_load_direct { -1, 1, fals
 #define mask_store_direct { 3, 2, false }
 #define store_lanes_direct { 0, 0, false }
 #define mask_store_lanes_direct { 0, 0, false }
+#define scatter_store_direct { 3, 1, false }
 #define unary_direct { 0, 0, true }
 #define binary_direct { 0, 0, true }
 #define cond_unary_direct { 1, 1, true }
@@ -2677,6 +2678,42 @@ expand_LAUNDER (internal_fn, gcall *call
   expand_assignment (lhs, gimple_call_arg (call, 0), false);
 }
 
+/* Expand {MASK_,}SCATTER_STORE{S,U} call CALL using optab OPTAB.  */
+
+static void
+expand_scatter_store_optab_fn (internal_fn, gcall *stmt, direct_optab optab)
+{
+  internal_fn ifn = gimple_call_internal_fn (stmt);
+  int rhs_index = internal_fn_stored_value_index (ifn);
+  int mask_index = internal_fn_mask_index (ifn);
+  tree base = gimple_call_arg (stmt, 0);
+  tree offset = gimple_call_arg (stmt, 1);
+  tree scale = gimple_call_arg (stmt, 2);
+  tree rhs = gimple_call_arg (stmt, rhs_index);
+
+  rtx base_rtx = expand_normal (base);
+  rtx offset_rtx = expand_normal (offset);
+  HOST_WIDE_INT scale_int = tree_to_shwi (scale);
+  rtx rhs_rtx = expand_normal (rhs);
+
+  struct expand_operand ops[6];
+  int i = 0;
+  create_address_operand (&ops[i++], base_rtx);
+  create_input_operand (&ops[i++], offset_rtx, TYPE_MODE (TREE_TYPE (offset)));
+  create_integer_operand (&ops[i++], TYPE_UNSIGNED (TREE_TYPE (offset)));
+  create_integer_operand (&ops[i++], scale_int);
+  create_input_operand (&ops[i++], rhs_rtx, TYPE_MODE (TREE_TYPE (rhs)));
+  if (mask_index >= 0)
+    {
+      tree mask = gimple_call_arg (stmt, mask_index);
+      rtx mask_rtx = expand_normal (mask);
+      create_input_operand (&ops[i++], mask_rtx, TYPE_MODE (TREE_TYPE (mask)));
+    }
+
+  insn_code icode = direct_optab_handler (optab, TYPE_MODE (TREE_TYPE (rhs)));
+  expand_insn (icode, i, ops);
+}
+
 /* Expand {MASK_,}GATHER_LOAD call CALL using optab OPTAB.  */
 
 static void
@@ -2952,6 +2989,7 @@ #define direct_gather_load_optab_support
 #define direct_mask_store_optab_supported_p direct_optab_supported_p
 #define direct_store_lanes_optab_supported_p multi_vector_optab_supported_p
 #define direct_mask_store_lanes_optab_supported_p multi_vector_optab_supported_p
+#define direct_scatter_store_optab_supported_p direct_optab_supported_p
 #define direct_while_optab_supported_p convert_optab_supported_p
 #define direct_fold_extract_optab_supported_p direct_optab_supported_p
@@ -3094,6 +3132,25 @@ internal_load_fn_p (internal_fn fn)
     }
 }
 
+/* Return true if IFN is some form of store to memory.  */
+
+bool
+internal_store_fn_p (internal_fn fn)
+{
+  switch (fn)
+    {
+    case IFN_MASK_STORE:
+    case IFN_STORE_LANES:
+    case IFN_MASK_STORE_LANES:
+    case IFN_SCATTER_STORE:
+    case IFN_MASK_SCATTER_STORE:
+      return true;
+
+    default:
+      return false;
+    }
+}
+
 /* Return true if IFN is some form of gather load or scatter store.  */
 
 bool
@@ -3103,6 +3160,8 @@ internal_gather_scatter_fn_p (internal_f
     {
     case IFN_GATHER_LOAD:
     case IFN_MASK_GATHER_LOAD:
+    case IFN_SCATTER_STORE:
+    case IFN_MASK_SCATTER_STORE:
       return true;
 
     default:
@@ -3127,6 +3186,27 @@ internal_fn_mask_index (internal_fn fn)
     case IFN_MASK_GATHER_LOAD:
       return 3;
 
+    case IFN_MASK_SCATTER_STORE:
+      return 4;
+
+    default:
+      return -1;
+    }
+}
+
+/* If FN takes a value that should be stored to memory, return the index
+   of that argument, otherwise return -1.  */
+
+int
+internal_fn_stored_value_index (internal_fn fn)
+{
+  switch (fn)
+    {
+    case IFN_MASK_STORE:
+    case IFN_SCATTER_STORE:
+    case IFN_MASK_SCATTER_STORE:
+      return 3;
+
     default:
       return -1;
     }
@@ -3151,9 +3231,12 @@ internal_gather_scatter_fn_supported_p (
     return false;
   optab optab = direct_internal_fn_optab (ifn);
   insn_code icode = direct_optab_handler (optab, TYPE_MODE (vector_type));
+  int output_ops = internal_load_fn_p (ifn) ? 1 : 0;
   return (icode != CODE_FOR_nothing
-	  && insn_operand_matches (icode, 3, GEN_INT (offset_sign == UNSIGNED))
-	  && insn_operand_matches (icode, 4, GEN_INT (scale)));
+	  && insn_operand_matches (icode, 2 + output_ops,
+				   GEN_INT (offset_sign == UNSIGNED))
+	  && insn_operand_matches (icode, 3 + output_ops,
+				   GEN_INT (scale)));
 }
 
 /* Expand STMT as though it were a call to internal function FN.  */

Index: gcc/optabs-query.h
===================================================================
--- gcc/optabs-query.h	2017-11-17 21:57:43.916004122 +0000
+++ gcc/optabs-query.h	2017-11-17 22:07:58.222016439 +0000
@@ -188,6 +188,7 @@ bool can_atomic_exchange_p (machine_mode
 bool can_atomic_load_p (machine_mode);
 bool lshift_cheap_p (bool);
 bool supports_vec_gather_load_p ();
+bool supports_vec_scatter_store_p ();
 
 /* Version of find_widening_optab_handler_and_mode that operates on
    specific mode types.  */

Index: gcc/optabs-query.c
===================================================================
--- gcc/optabs-query.c	2017-11-17 21:57:43.916004122 +0000
+++ gcc/optabs-query.c	2017-11-17 22:07:58.222016439 +0000
@@ -650,3 +650,21 @@ supports_vec_gather_load_p ()
 
   return this_fn_optabs->supports_vec_gather_load;
 }
+
+/* Return true if vec_scatter_store is available for at least one vector
+   mode.  */
+
+bool
+supports_vec_scatter_store_p ()
+{
+  if (this_fn_optabs->supports_vec_scatter_store_cached)
+    return this_fn_optabs->supports_vec_scatter_store;
+
+  this_fn_optabs->supports_vec_scatter_store_cached = true;
+
+  this_fn_optabs->supports_vec_scatter_store
+    = supports_at_least_one_mode_p (scatter_store_optab);
+
+  return this_fn_optabs->supports_vec_scatter_store;
+}
+

Index: gcc/tree-vectorizer.h
===================================================================
--- gcc/tree-vectorizer.h	2017-11-17 22:02:44.221485217 +0000
+++ gcc/tree-vectorizer.h	2017-11-17 22:07:58.227016438 +0000
@@ -1412,6 +1412,7 @@ extern void vect_finish_replace_stmt (gi
 extern void vect_finish_stmt_generation (gimple *, gimple *,
					 gimple_stmt_iterator *);
 extern bool vect_mark_stmts_to_be_vectorized (loop_vec_info);
+extern tree vect_get_store_rhs (gimple *);
 extern tree vect_get_vec_def_for_operand_1 (gimple *, enum vect_def_type);
 extern tree vect_get_vec_def_for_operand (tree, gimple *, tree = NULL);
 extern void vect_get_vec_defs (tree, tree, gimple *, vec *,

Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c	2017-11-17 22:02:44.220555940 +0000
+++ gcc/tree-vect-data-refs.c	2017-11-17 22:07:58.225016438 +0000
@@ -2659,6 +2659,9 @@ vect_analyze_data_ref_access (struct dat
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
   struct loop *loop = NULL;
 
+  if (STMT_VINFO_GATHER_SCATTER_P (stmt_info))
+    return true;
+
   if (loop_vinfo)
     loop = LOOP_VINFO_LOOP (loop_vinfo);
 
@@ -3331,7 +3334,7 @@ vect_gather_scatter_fn_p (bool read_p, b
   if (read_p)
     ifn = masked_p ? IFN_MASK_GATHER_LOAD : IFN_GATHER_LOAD;
   else
-    return false;
+    ifn = masked_p ? IFN_MASK_SCATTER_STORE : IFN_SCATTER_STORE;
 
   /* Test whether the target supports this combination.  */
   if (!internal_gather_scatter_fn_supported_p (ifn, vectype, memory_type,
@@ -3403,7 +3406,8 @@ vect_check_gather_scatter (gimple *stmt,
   /* True if we should aim to use internal functions rather than
      built-in functions.  */
   bool use_ifn_p = (DR_IS_READ (dr)
-		    && supports_vec_gather_load_p ());
+		    ? supports_vec_gather_load_p ()
+		    : supports_vec_scatter_store_p ());
 
   base = DR_REF (dr);
   /* For masked loads/stores, DR_REF (dr) is an artificial MEM_REF,
@@ -3716,7 +3720,8 @@ vect_analyze_data_refs (vec_info *vinfo,
       bool maybe_scatter
	= DR_IS_WRITE (dr)
	  && !TREE_THIS_VOLATILE (DR_REF (dr))
-	  && targetm.vectorize.builtin_scatter != NULL;
+	  && (targetm.vectorize.builtin_scatter != NULL
+	      || supports_vec_scatter_store_p ());
       bool maybe_simd_lane_access
	= is_a (vinfo) && loop->simduid;

Index: gcc/tree-vect-patterns.c
===================================================================
--- gcc/tree-vect-patterns.c	2017-11-17 21:57:43.919003822 +0000
+++ gcc/tree-vect-patterns.c	2017-11-17 22:07:58.225016438 +0000
@@ -4207,10 +4207,6 @@ vect_try_gather_scatter_pattern (gimple
   if (!dr || !STMT_VINFO_GATHER_SCATTER_P (stmt_info))
     return NULL;
 
-  /* Reject stores for now.  */
-  if (!DR_IS_READ (dr))
-    return NULL;
-
   /* Get the boolean that controls whether the load or store happens.
      This is null if the operation is unconditional.  */
   tree mask = vect_get_load_store_mask (stmt);
@@ -4249,8 +4245,16 @@ vect_try_gather_scatter_pattern (gimple
       gimple_call_set_lhs (pattern_stmt, load_lhs);
     }
   else
-    /* Not yet supported.  */
-    gcc_unreachable ();
+    {
+      tree rhs = vect_get_store_rhs (stmt);
+      if (mask != NULL)
+	pattern_stmt = gimple_build_call_internal (IFN_MASK_SCATTER_STORE, 5,
+						   base, offset, scale, rhs,
+						   mask);
+      else
+	pattern_stmt = gimple_build_call_internal (IFN_SCATTER_STORE, 4,
+						   base, offset, scale, rhs);
+    }
   gimple_call_set_nothrow (pattern_stmt, true);
 
   /* Copy across relevant vectorization info and associate DR with the

Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	2017-11-17 22:02:44.221485217 +0000
+++ gcc/tree-vect-stmts.c	2017-11-17 22:07:58.226016438 +0000
@@ -395,12 +395,13 @@ exist_non_indexing_operands_for_use_p (t
       if (mask_index >= 0
	  && use == gimple_call_arg (stmt, mask_index))
	return true;
+      int stored_value_index = internal_fn_stored_value_index (ifn);
+      if (stored_value_index >= 0
+	  && use == gimple_call_arg (stmt, stored_value_index))
+	return true;
       if (internal_gather_scatter_fn_p (ifn)
	  && use == gimple_call_arg (stmt, 1))
	return true;
-      if (ifn == IFN_MASK_STORE
-	  && use == gimple_call_arg (stmt, 3))
-	return true;
     }
   return false;
 }
@@ -1763,10 +1764,11 @@ check_load_store_masking (loop_vec_info
   if (memory_access_type == VMAT_GATHER_SCATTER)
     {
-      gcc_assert (is_load);
+      internal_fn ifn = (is_load
+			 ? IFN_MASK_GATHER_LOAD
+			 : IFN_MASK_SCATTER_STORE);
       tree offset_type = TREE_TYPE (gs_info->offset);
-      if (!internal_gather_scatter_fn_supported_p (IFN_MASK_GATHER_LOAD,
-						   vectype,
+      if (!internal_gather_scatter_fn_supported_p (ifn, vectype,
						   gs_info->memory_type,
						   TYPE_SIGN (offset_type),
						   gs_info->scale))
@@ -1775,7 +1777,7 @@ check_load_store_masking (loop_vec_info
	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
			     "can't use a fully-masked loop because the"
			     " target doesn't have an appropriate masked"
-			     " gather load instruction.\n");
+			     " gather load or scatter store instruction.\n");
	  LOOP_VINFO_CAN_FULLY_MASK_P (loop_vinfo) = false;
	  return;
	}
@@ -2059,7 +2061,7 @@ reverse_vector (tree vec_dest, tree inpu
 /* STMT is either a masked or unconditional store.  Return the value
    being stored.  */
 
-static tree
+tree
 vect_get_store_rhs (gimple *stmt)
 {
   if (gassign *assign = dyn_cast (stmt))
@@ -2070,8 +2072,9 @@ vect_get_store_rhs (gimple *stmt)
   if (gcall *call = dyn_cast (stmt))
     {
       internal_fn ifn = gimple_call_internal_fn (call);
-      gcc_assert (ifn == IFN_MASK_STORE);
-      return gimple_call_arg (stmt, 3);
+      int index = internal_fn_stored_value_index (ifn);
+      gcc_assert (index >= 0);
+      return gimple_call_arg (stmt, index);
     }
   gcc_unreachable ();
 }
@@ -3051,7 +3054,7 @@ vectorizable_call (gimple *gs, gimple_st
   if (gimple_call_internal_p (stmt)
       && (internal_load_fn_p (gimple_call_internal_fn (stmt))
-	  || gimple_call_internal_fn (stmt) == IFN_MASK_STORE))
+	  || internal_store_fn_p (gimple_call_internal_fn (stmt))))
     /* Handled by vectorizable_load and vectorizable_store.  */
     return false;
@@ -6122,7 +6125,11 @@ vectorizable_store (gimple *stmt, gimple
   else
     {
       gcall *call = dyn_cast (stmt);
-      if (!call || !gimple_call_internal_p (call, IFN_MASK_STORE))
+      if (!call || !gimple_call_internal_p (call))
+	return false;
+
+      internal_fn ifn = gimple_call_internal_fn (call);
+      if (!internal_store_fn_p (ifn))
	return false;
 
       if (slp_node != NULL)
@@ -6133,10 +6140,13 @@ vectorizable_store (gimple *stmt, gimple
	  return false;
	}
 
-      ref_type = TREE_TYPE (gimple_call_arg (call, 1));
-      mask = gimple_call_arg (call, 2);
-      if (!vect_check_load_store_mask (stmt, mask, &mask_vectype))
-	return false;
+      int mask_index = internal_fn_mask_index (ifn);
+      if (mask_index >= 0)
+	{
+	  mask = gimple_call_arg (call, mask_index);
+	  if (!vect_check_load_store_mask (stmt, mask, &mask_vectype))
+	    return false;
+	}
     }
 
   op = vect_get_store_rhs (stmt);
@@ -6198,7 +6208,8 @@ vectorizable_store (gimple *stmt, gimple
				  TYPE_MODE (mask_vectype), false))
	return false;
     }
-  else if (memory_access_type != VMAT_LOAD_STORE_LANES)
+  else if (memory_access_type != VMAT_LOAD_STORE_LANES
+	   && memory_access_type != VMAT_GATHER_SCATTER)
     {
       if (dump_enabled_p ())
	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -6214,7 +6225,8 @@ vectorizable_store (gimple *stmt, gimple
       return false;
     }
 
-  grouped_store = STMT_VINFO_GROUPED_ACCESS (stmt_info);
+  grouped_store = (STMT_VINFO_GROUPED_ACCESS (stmt_info)
+		   && memory_access_type != VMAT_GATHER_SCATTER);
   if (grouped_store)
     {
       first_stmt = GROUP_FIRST_ELEMENT (stmt_info);
@@ -6250,7 +6262,7 @@ vectorizable_store (gimple *stmt, gimple
 
   ensure_base_align (dr);
 
-  if (memory_access_type == VMAT_GATHER_SCATTER)
+  if (memory_access_type == VMAT_GATHER_SCATTER && gs_info.decl)
     {
       tree vec_oprnd0 = NULL_TREE, vec_oprnd1 = NULL_TREE, src;
       tree arglist = TYPE_ARG_TYPES (TREE_TYPE (gs_info.decl));
@@ -6397,10 +6409,14 @@ vectorizable_store (gimple *stmt, gimple
       return true;
     }
 
-  if (grouped_store)
+  if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
     {
-      GROUP_STORE_COUNT (vinfo_for_stmt (first_stmt))++;
+      gimple *group_stmt = GROUP_FIRST_ELEMENT (stmt_info);
+      GROUP_STORE_COUNT (vinfo_for_stmt (group_stmt))++;
+    }
 
+  if (grouped_store)
+    {
       /* FORNOW */
       gcc_assert (!loop || !nested_in_vect_loop_p (loop, stmt));
@@ -6700,10 +6716,27 @@ vectorizable_store (gimple *stmt, gimple
       || memory_access_type == VMAT_CONTIGUOUS_REVERSE)
     offset = size_int (-TYPE_VECTOR_SUBPARTS (vectype) + 1);
 
-  if (memory_access_type == VMAT_LOAD_STORE_LANES)
-    aggr_type = build_array_type_nelts (elem_type, vec_num * nunits);
+  tree bump;
+  tree vec_offset = NULL_TREE;
+  if (STMT_VINFO_GATHER_SCATTER_P (stmt_info))
+    {
+      aggr_type = NULL_TREE;
+      bump = NULL_TREE;
+    }
+  else if (memory_access_type == VMAT_GATHER_SCATTER)
+    {
+      aggr_type = elem_type;
+      vect_get_strided_load_store_ops (stmt, loop_vinfo, &gs_info,
+				       &bump, &vec_offset);
+    }
   else
-    aggr_type = vectype;
+    {
+      if (memory_access_type == VMAT_LOAD_STORE_LANES)
+	aggr_type = build_array_type_nelts (elem_type, vec_num * nunits);
+      else
+	aggr_type = vectype;
+      bump = vect_get_data_ptr_increment (dr, aggr_type, memory_access_type);
+    }
 
   if (mask)
     LOOP_VINFO_HAS_MASK_STORE (loop_vinfo) = true;
@@ -6808,12 +6841,16 @@ vectorizable_store (gimple *stmt, gimple
	      dataref_offset = build_int_cst (ref_type, 0);
	      inv_p = false;
	    }
+	  else if (STMT_VINFO_GATHER_SCATTER_P (stmt_info))
+	    vect_get_gather_scatter_ops (loop, stmt, &gs_info,
+					 &dataref_ptr, &vec_offset);
	  else
	    dataref_ptr
	      = vect_create_data_ref_ptr (first_stmt, aggr_type,
					  simd_lane_access_p ? loop : NULL,
					  offset, &dummy, gsi, &ptr_incr,
-					  simd_lane_access_p, &inv_p);
+					  simd_lane_access_p, &inv_p,
+					  NULL_TREE, bump);
	  gcc_assert (bb_vinfo || !inv_p);
	}
       else
@@ -6840,11 +6877,17 @@ vectorizable_store (gimple *stmt, gimple
	    }
	  if (dataref_offset)
	    dataref_offset
-	      = int_const_binop (PLUS_EXPR, dataref_offset,
-				 TYPE_SIZE_UNIT (aggr_type));
+	      = int_const_binop (PLUS_EXPR, dataref_offset, bump);
+	  else if (STMT_VINFO_GATHER_SCATTER_P (stmt_info))
+	    {
+	      gimple *def_stmt;
+	      vect_def_type dt;
+	      vect_is_simple_use (vec_offset, loop_vinfo, &def_stmt, &dt);
+	      vec_offset = vect_get_vec_def_for_stmt_copy (dt, vec_offset);
+	    }
	  else
	    dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, gsi, stmt,
-					   TYPE_SIZE_UNIT (aggr_type));
+					   bump);
	}
 
       if (memory_access_type == VMAT_LOAD_STORE_LANES)
@@ -6916,10 +6959,28 @@ vectorizable_store (gimple *stmt, gimple
	    final_mask = prepare_load_store_mask (mask_vectype, final_mask,
						  vec_mask, gsi);
 
+	  if (memory_access_type == VMAT_GATHER_SCATTER)
+	    {
+	      tree scale = size_int (gs_info.scale);
+	      gcall *call;
+	      if (masked_loop_p)
+		call = gimple_build_call_internal
+		  (IFN_MASK_SCATTER_STORE, 5, dataref_ptr, vec_offset,
+		   scale, vec_oprnd, final_mask);
+	      else
+		call = gimple_build_call_internal
+		  (IFN_SCATTER_STORE, 4, dataref_ptr, vec_offset,
+		   scale, vec_oprnd);
+	      gimple_call_set_nothrow (call, true);
+	      new_stmt = call;
+	      vect_finish_stmt_generation (stmt, new_stmt, gsi);
+	      break;
+	    }
+
	  if (i > 0)
	    /* Bump the vector pointer.  */
	    dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, gsi,
-					   stmt, NULL_TREE);
+					   stmt, bump);
 
	  if (slp)
	    vec_oprnd = vec_oprnds[i];
@@ -9394,9 +9455,11 @@ vect_transform_stmt (gimple *stmt, gimpl
	     one are skipped, and there vec_stmt_info shouldn't be freed
	     meanwhile.  */
	  *grouped_store = true;
-	  if (STMT_VINFO_VEC_STMT (stmt_info))
+	  stmt_vec_info group_info
+	    = vinfo_for_stmt (GROUP_FIRST_ELEMENT (stmt_info));
+	  if (GROUP_STORE_COUNT (group_info) == GROUP_SIZE (group_info))
	    is_store = true;
-	}
+	}
       else
	is_store = true;
       break;

Index: gcc/config/aarch64/aarch64.md
===================================================================
--- gcc/config/aarch64/aarch64.md	2017-11-17 21:57:43.914004322 +0000
+++ gcc/config/aarch64/aarch64.md	2017-11-17 22:07:58.219016439 +0000
@@ -152,6 +152,7 @@ (define_c_enum "unspec" [
     UNSPEC_LD1_SVE
     UNSPEC_ST1_SVE
     UNSPEC_LD1_GATHER
+    UNSPEC_ST1_SCATTER
     UNSPEC_MERGE_PTRUE
     UNSPEC_PTEST_PTRUE
     UNSPEC_UNPACKSHI

Index: gcc/config/aarch64/aarch64-sve.md
===================================================================
--- gcc/config/aarch64/aarch64-sve.md	2017-11-17 21:57:43.913004422 +0000
+++ gcc/config/aarch64/aarch64-sve.md	2017-11-17 22:07:58.219016439 +0000
@@ -246,6 +246,63 @@ (define_insn "mask_gather_load"
    ld1d\t%0.d, %5/z, [%1, %2.d, lsl %p4]"
 )
 
+;; Unpredicated scatter store.
+(define_expand "scatter_store"
+  [(set (mem:BLK (scratch))
+	(unspec:BLK
+	  [(match_dup 5)
+	   (match_operand:DI 0 "aarch64_reg_or_zero")
+	   (match_operand: 1 "register_operand")
+	   (match_operand:DI 2 "const_int_operand")
+	   (match_operand:DI 3 "aarch64_gather_scale_operand_")
+	   (match_operand:SVE_SD 4 "register_operand")]
+	  UNSPEC_ST1_SCATTER))]
+  "TARGET_SVE"
+  {
+    operands[5] = force_reg (mode, CONSTM1_RTX (mode));
+  }
+)
+
+;; Predicated scatter stores for 32-bit elements.  Operand 2 is true for
+;; unsigned extension and false for signed extension.
+(define_insn "mask_scatter_store"
+  [(set (mem:BLK (scratch))
+	(unspec:BLK
+	  [(match_operand: 5 "register_operand" "Upl, Upl, Upl, Upl, Upl")
+	   (match_operand:DI 0 "aarch64_reg_or_zero" "Z, rk, rk, rk, rk")
+	   (match_operand: 1 "register_operand" "w, w, w, w, w")
+	   (match_operand:DI 2 "const_int_operand" "i, Z, Ui1, Z, Ui1")
+	   (match_operand:DI 3 "aarch64_gather_scale_operand_w" "Ui1, Ui1, Ui1, i, i")
+	   (match_operand:SVE_S 4 "register_operand" "w, w, w, w, w")]
+	  UNSPEC_ST1_SCATTER))]
+  "TARGET_SVE"
+  "@
+   st1w\t%4.s, %5, [%1.s]
+   st1w\t%4.s, %5, [%0, %1.s, sxtw]
+   st1w\t%4.s, %5, [%0, %1.s, uxtw]
+   st1w\t%4.s, %5, [%0, %1.s, sxtw %p3]
+   st1w\t%4.s, %5, [%0, %1.s, uxtw %p3]"
+)
+
+;; Predicated scatter stores for 64-bit elements.  The value of operand 2
+;; doesn't matter in this case.
+(define_insn "mask_scatter_store"
+  [(set (mem:BLK (scratch))
+	(unspec:BLK
+	  [(match_operand: 5 "register_operand" "Upl, Upl, Upl")
+	   (match_operand:DI 0 "aarch64_reg_or_zero" "Z, rk, rk")
+	   (match_operand: 1 "register_operand" "w, w, w")
+	   (match_operand:DI 2 "const_int_operand")
+	   (match_operand:DI 3 "aarch64_gather_scale_operand_d" "Ui1, Ui1, i")
+	   (match_operand:SVE_D 4 "register_operand" "w, w, w")]
+	  UNSPEC_ST1_SCATTER))]
+  "TARGET_SVE"
+  "@
+   st1d\t%4.d, %5, [%1.d]
+   st1d\t%4.d, %5, [%0, %1.d]
+   st1d\t%4.d, %5, [%0, %1.d, lsl %p3]"
+)
+
 ;; SVE structure moves.
 (define_expand "mov"
   [(set (match_operand:SVE_STRUCT 0 "nonimmediate_operand")
Index: gcc/testsuite/gcc.target/aarch64/sve_mask_scatter_store_1.c
===================================================================
--- /dev/null	2017-11-14 14:28:07.424493901 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_mask_scatter_store_1.c	2017-11-17 22:07:58.222016439 +0000
@@ -0,0 +1,51 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -ftree-vectorize -ffast-math -march=armv8-a+sve --save-temps" } */
+
+#include <stdint.h>
+
+#ifndef INDEX32
+#define INDEX32 int32_t
+#define INDEX64 int64_t
+#endif
+
+#define TEST_LOOP(DATA_TYPE, CMP_TYPE, BITS)			\
+  void								\
+  f_##DATA_TYPE##_##CMP_TYPE					\
+  (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,		\
+   CMP_TYPE *restrict cmp1, CMP_TYPE *restrict cmp2,		\
+   INDEX##BITS *restrict indices, int n)			\
+  {								\
+    for (int i = 0; i < n; ++i)					\
+      if (cmp1[i] == cmp2[i])					\
+	dest[indices[i]] = src[i] + 1;				\
+  }
+
+#define TEST32(T, DATA_TYPE)		\
+  T (DATA_TYPE, int32_t, 32)		\
+  T (DATA_TYPE, uint32_t, 32)		\
+  T (DATA_TYPE, float, 32)
+
+#define TEST64(T, DATA_TYPE)		\
+  T (DATA_TYPE, int64_t, 64)		\
+  T (DATA_TYPE, uint64_t, 64)		\
+  T (DATA_TYPE, double, 64)
+
+#define TEST_ALL(T)			\
+  TEST32 (T, int32_t)			\
+  TEST32 (T, uint32_t)			\
+  TEST32 (T, float)			\
+  TEST64 (T, int64_t)			\
+  TEST64 (T, uint64_t)			\
+  TEST64 (T, double)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, x[0-9]+, lsl 2\]\n} 36 } } */
+/* { dg-final { scan-assembler-times {\tcmpeq\tp[0-7]\.s, p[0-7]/z, z[0-9]+\.s, z[0-9]+\.s\n} 6 } } */
+/* { dg-final { scan-assembler-times {\tfcmeq\tp[0-7]\.s, p[0-7]/z, z[0-9]+\.s, z[0-9]+\.s\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s, p[0-7], \[x[0-9]+, z[0-9]+\.s, sxtw 2\]\n} 9 } } */
+
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, x[0-9]+, lsl 3\]\n} 36 } } */
+/* { dg-final { scan-assembler-times {\tcmpeq\tp[0-7]\.d, p[0-7]/z, z[0-9]+\.d, z[0-9]+\.d\n} 6 } } */
+/* { dg-final { scan-assembler-times {\tfcmeq\tp[0-7]\.d, p[0-7]/z, z[0-9]+\.d, z[0-9]+\.d\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d, p[0-7], \[x[0-9]+, z[0-9]+\.d, lsl 3\]\n} 9 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_mask_scatter_store_2.c
===================================================================
--- /dev/null	2017-11-14 14:28:07.424493901 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_mask_scatter_store_2.c	2017-11-17 22:07:58.222016439 +0000
@@ -0,0 +1,17 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -ftree-vectorize -ffast-math -march=armv8-a+sve --save-temps" } */
+
+#define INDEX32 uint32_t
+#define INDEX64 uint64_t
+
+#include "sve_mask_scatter_store_1.c"
+
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, x[0-9]+, lsl 2\]\n} 36 } } */
+/* { dg-final { scan-assembler-times {\tcmpeq\tp[0-7]\.s, p[0-7]/z, z[0-9]+\.s, z[0-9]+\.s\n} 6 } } */
+/* { dg-final { scan-assembler-times {\tfcmeq\tp[0-7]\.s, p[0-7]/z, z[0-9]+\.s, z[0-9]+\.s\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s, p[0-7], \[x[0-9]+, z[0-9]+\.s, uxtw 2\]\n} 9 } } */
+
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, x[0-9]+, lsl 3\]\n} 36 } } */
+/* { dg-final { scan-assembler-times {\tcmpeq\tp[0-7]\.d, p[0-7]/z, z[0-9]+\.d, z[0-9]+\.d\n} 6 } } */
+/* { dg-final { scan-assembler-times {\tfcmeq\tp[0-7]\.d, p[0-7]/z, z[0-9]+\.d, z[0-9]+\.d\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d, p[0-7], \[x[0-9]+, z[0-9]+\.d, lsl 3\]\n} 9 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_scatter_store_1.c
===================================================================
--- /dev/null	2017-11-14 14:28:07.424493901 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_scatter_store_1.c	2017-11-17 22:07:58.223016439 +0000
@@ -0,0 +1,31 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#include <stdint.h>
+
+#ifndef INDEX32
+#define INDEX32 int32_t
+#define INDEX64 int64_t
+#endif
+
+#define TEST_LOOP(DATA_TYPE, BITS)					\
+  void __attribute__ ((noinline, noclone))				\
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,	\
+		 INDEX##BITS *indices, int n)				\
+  {									\
+    for (int i = 9; i < n; ++i)						\
+      dest[indices[i]] = src[i] + 1;					\
+  }
+
+#define TEST_ALL(T)			\
+  T (int32_t, 32)			\
+  T (uint32_t, 32)			\
+  T (float, 32)				\
+  T (int64_t, 64)			\
+  T (uint64_t, 64)			\
+  T (double, 64)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s, p[0-7], \[x[0-9]+, z[0-9]+.s, sxtw 2\]\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d, p[0-7], \[x[0-9]+, z[0-9]+.d, lsl 3\]\n} 3 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_scatter_store_2.c
===================================================================
--- /dev/null	2017-11-14 14:28:07.424493901 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_scatter_store_2.c	2017-11-17 22:07:58.223016439 +0000
@@ -0,0 +1,10 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#define INDEX32 uint32_t
+#define INDEX64 uint64_t
+
+#include "sve_scatter_store_1.c"
+
+/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s, p[0-7], \[x[0-9]+, z[0-9]+.s, uxtw 2\]\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d, p[0-7], \[x[0-9]+, z[0-9]+.d, lsl 3\]\n} 3 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_scatter_store_3.c
===================================================================
--- /dev/null	2017-11-14 14:28:07.424493901 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_scatter_store_3.c	2017-11-17 22:07:58.223016439 +0000
@@ -0,0 +1,32 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#include <stdint.h>
+
+#ifndef INDEX32
+#define INDEX32 int32_t
+#define INDEX64 int64_t
+#endif
+
+/* Invoked 18 times for each data size.  */
+#define TEST_LOOP(DATA_TYPE, BITS)					\
+  void __attribute__ ((noinline, noclone))				\
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,	\
+		 INDEX##BITS *indices, int n)				\
+  {									\
+    for (int i = 9; i < n; ++i)						\
+      *(DATA_TYPE *) ((char *) dest + indices[i]) = src[i] + 1;		\
+  }
+
+#define TEST_ALL(T)			\
+  T (int32_t, 32)			\
+  T (uint32_t, 32)			\
+  T (float, 32)				\
+  T (int64_t, 64)			\
+  T (uint64_t, 64)			\
+  T (double, 64)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s, p[0-7], \[x[0-9]+, z[0-9]+.s, sxtw\]\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d, p[0-7], \[x[0-9]+, z[0-9]+.d\]\n} 3 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_scatter_store_4.c
===================================================================
--- /dev/null	2017-11-14 14:28:07.424493901 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_scatter_store_4.c	2017-11-17 22:07:58.223016439 +0000
@@ -0,0 +1,10 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#define INDEX32 uint32_t
+#define INDEX64 uint64_t
+
+#include "sve_scatter_store_3.c"
+
+/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s, p[0-7], \[x[0-9]+, z[0-9]+.s, uxtw\]\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d, p[0-7], \[x[0-9]+, z[0-9]+.d\]\n} 3 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_scatter_store_5.c
===================================================================
--- /dev/null	2017-11-14 14:28:07.424493901 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_scatter_store_5.c	2017-11-17 22:07:58.223016439 +0000
@@ -0,0 +1,23 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#include <stdint.h>
+
+/* Invoked 18 times for each data size.  */
+#define TEST_LOOP(DATA_TYPE)						\
+  void __attribute__ ((noinline, noclone))				\
+  f_##DATA_TYPE (DATA_TYPE *restrict *dest, DATA_TYPE *restrict src,	\
+		 int n)							\
+  {									\
+    for (int i = 9; i < n; ++i)						\
+      *dest[i] = src[i] + 1;						\
+  }
+
+#define TEST_ALL(T)			\
+  T (int64_t)				\
+  T (uint64_t)				\
+  T (double)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d, p[0-7], \[z[0-9]+.d\]\n} 3 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_scatter_store_6.c
===================================================================
--- /dev/null	2017-11-14 14:28:07.424493901 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_scatter_store_6.c	2017-11-17 22:07:58.223016439 +0000
@@ -0,0 +1,36 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -ftree-vectorize -fwrapv -march=armv8-a+sve --save-temps" } */
+
+#include <stdint.h>
+
+#ifndef INDEX32
+#define INDEX16 int16_t
+#define INDEX32 int32_t
+#endif
+
+/* Invoked 18 times for each data size.  */
+#define TEST_LOOP(DATA_TYPE, BITS)					\
+  void __attribute__ ((noinline, noclone))				\
+  f_##DATA_TYPE (DATA_TYPE *restrict dest, DATA_TYPE *restrict src,	\
+		 INDEX##BITS *indices, INDEX##BITS mask, int n)		\
+  {									\
+    for (int i = 9; i < n; ++i)						\
+      dest[(INDEX##BITS) (indices[i] | mask)] = src[i] + 1;		\
+  }
+
+#define TEST_ALL(T)			\
+  T (int32_t, 16)			\
+  T (uint32_t, 16)			\
+  T (float, 16)				\
+  T (int64_t, 32)			\
+  T (uint64_t, 32)			\
+  T (double, 32)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-assembler-times {\tsunpkhi\tz[0-9]+\.s, z[0-9]+\.h\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tsunpklo\tz[0-9]+\.s, z[0-9]+\.h\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tsunpkhi\tz[0-9]+\.d, z[0-9]+\.s\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tsunpklo\tz[0-9]+\.d, z[0-9]+\.s\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s, p[0-7], \[x[0-9]+, z[0-9]+.s, sxtw 2\]\n} 6 } } */
+/* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d, p[0-7], \[x[0-9]+, z[0-9]+.d, lsl 3\]\n} 6 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_scatter_store_7.c
===================================================================
--- /dev/null	2017-11-14 14:28:07.424493901 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_scatter_store_7.c	2017-11-17 22:07:58.223016439 +0000
@@ -0,0 +1,15 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#define INDEX16 uint16_t
+#define INDEX32 uint32_t
+
+#include "sve_scatter_store_6.c"
+
+/* { dg-final { scan-assembler-times {\tuunpkhi\tz[0-9]+\.s, z[0-9]+\.h\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tuunpklo\tz[0-9]+\.s, z[0-9]+\.h\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tuunpkhi\tz[0-9]+\.d, z[0-9]+\.s\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tuunpklo\tz[0-9]+\.d, z[0-9]+\.s\n} 3 } } */
+/* Either extension type is OK here.  */
+/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s, p[0-7], \[x[0-9]+, z[0-9]+.s, [us]xtw 2\]\n} 6 } } */
+/* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d, p[0-7], \[x[0-9]+, z[0-9]+.d, lsl 3\]\n} 6 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_strided_store_1.c
===================================================================
--- /dev/null	2017-11-14 14:28:07.424493901 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_strided_store_1.c	2017-11-17 22:07:58.224016438 +0000
@@ -0,0 +1,40 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#include <stdint.h>
+
+#ifndef INDEX8
+#define INDEX8 int8_t
+#define INDEX16 int16_t
+#define INDEX32 int32_t
+#define INDEX64 int64_t
+#endif
+
+#define TEST_LOOP(DATA_TYPE, BITS)				\
+  void __attribute__ ((noinline, noclone))			\
+  f_##DATA_TYPE##_##BITS (DATA_TYPE *restrict dest,		\
+			  DATA_TYPE *restrict src,		\
+			  INDEX##BITS stride, INDEX##BITS n)	\
+  {								\
+    for (INDEX##BITS i = 0; i < n; ++i)				\
+      dest[i * stride] = src[i] + 1;				\
+  }
+
+#define TEST_TYPE(T, DATA_TYPE)		\
+  T (DATA_TYPE, 8)			\
+  T (DATA_TYPE, 16)			\
+  T (DATA_TYPE, 32)			\
+  T (DATA_TYPE, 64)
+
+#define TEST_ALL(T)			\
+  TEST_TYPE (T, int32_t)		\
+  TEST_TYPE (T, uint32_t)		\
+  TEST_TYPE (T, float)			\
+  TEST_TYPE (T, int64_t)		\
+  TEST_TYPE (T, uint64_t)		\
+  TEST_TYPE (T, double)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s, p[0-7], \[x[0-9]+, z[0-9]+.s, sxtw 2\]\n} 9 } } */
+/* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d, p[0-7], \[x[0-9]+, z[0-9]+.d, lsl 3\]\n} 12 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_strided_store_2.c
===================================================================
--- /dev/null	2017-11-14 14:28:07.424493901 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_strided_store_2.c	2017-11-17 22:07:58.224016438 +0000
@@ -0,0 +1,18 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#define INDEX8 uint8_t
+#define INDEX16 uint16_t
+#define INDEX32 uint32_t
+#define INDEX64 uint64_t
+
+#include "sve_strided_store_1.c"
+
+/* 8 and 16 bits are signed because the multiplication promotes to int.
+   Using uxtw for all 9 would be OK.  */
+/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s, p[0-7], \[x[0-9]+, z[0-9]+.s, sxtw 2\]\n} 6 } } */
+/* The 32-bit loop needs to honor the defined overflow in uint32_t,
+   so we vectorize the offset calculation.  This means that the
+   64-bit version needs two copies.  */
+/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s, p[0-7], \[x[0-9]+, z[0-9]+.s, uxtw 2\]\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d, p[0-7], \[x[0-9]+, z[0-9]+.d, lsl 3\]\n} 15 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_strided_store_3.c
===================================================================
--- /dev/null	2017-11-14 14:28:07.424493901 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_strided_store_3.c	2017-11-17 22:07:58.224016438 +0000
@@ -0,0 +1,33 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#include <stdint.h>
+
+#define TEST_LOOP(DATA_TYPE, OTHER_TYPE)			\
+  void __attribute__ ((noinline, noclone))			\
+  f_##DATA_TYPE##_##BITS (DATA_TYPE *restrict dest,		\
+			  DATA_TYPE *restrict src,		\
+			  OTHER_TYPE *restrict other,		\
+			  OTHER_TYPE mask,			\
+			  int stride, int n)			\
+  {								\
+    for (int i = 0; i < n; ++i)					\
+      dest[i * stride] = src[i] + (OTHER_TYPE) (other[i] | mask); \
+  }
+
+#define TEST_ALL(T)			\
+  T (int32_t, int16_t)			\
+  T (uint32_t, int16_t)			\
+  T (float, int16_t)			\
+  T (int64_t, int32_t)			\
+  T (uint64_t, int32_t)			\
+  T (double, int32_t)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-assembler-times {\tld1h\tz[0-9]+\.h, p[0-7]/z, \[x[0-9]+, x[0-9]+, lsl 1\]\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, x[0-9]+, lsl 2\]\n} 9 } } */
+/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s, p[0-7], \[x[0-9]+, z[0-9]+.s, sxtw 2\]\n} 6 } } */
+
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, x[0-9]+, lsl 3\]\n} 6 } } */
+/* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d, p[0-7], \[x[0-9]+, z[0-9]+.d, lsl 3\]\n} 6 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_strided_store_4.c
===================================================================
--- /dev/null	2017-11-14 14:28:07.424493901 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_strided_store_4.c	2017-11-17 22:07:58.224016438 +0000
@@ -0,0 +1,33 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#include <stdint.h>
+
+#define TEST_LOOP(DATA_TYPE, NAME, SCALE)			\
+  void __attribute__ ((noinline, noclone))			\
+  f_##DATA_TYPE##_##NAME (DATA_TYPE *restrict dest,		\
+			  DATA_TYPE *restrict src, int n)	\
+  {								\
+    for (int i = 0; i < n; ++i)					\
+      dest[i * SCALE] = src[i] + 1;				\
+  }
+
+#define TEST_TYPE(T, DATA_TYPE)		\
+  T (DATA_TYPE, 5, 5)			\
+  T (DATA_TYPE, 7, 7)			\
+  T (DATA_TYPE, 11, 11)			\
+  T (DATA_TYPE, 200, 200)		\
+  T (DATA_TYPE, m100, -100)
+
+#define TEST_ALL(T)			\
+  TEST_TYPE (T, int32_t)		\
+  TEST_TYPE (T, uint32_t)		\
+  TEST_TYPE (T, float)			\
+  TEST_TYPE (T, int64_t)		\
+  TEST_TYPE (T, uint64_t)		\
+  TEST_TYPE (T, double)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s, p[0-7], \[x[0-9]+, z[0-9]+.s, sxtw 2\]\n} 15 } } */
+/* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d, p[0-7], \[x[0-9]+, z[0-9]+.d, lsl 3\]\n} 15 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_strided_store_5.c
===================================================================
--- /dev/null	2017-11-14 14:28:07.424493901 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_strided_store_5.c	2017-11-17 22:07:58.224016438 +0000
@@ -0,0 +1,34 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve -msve-vector-bits=256 --save-temps" } */
+
+#include <stdint.h>
+
+#define TEST_LOOP(DATA_TYPE, NAME, SCALE)			\
+  void __attribute__ ((noinline, noclone))			\
+  f_##DATA_TYPE##_##NAME (DATA_TYPE *restrict dest,		\
+			  DATA_TYPE *restrict src, long n)	\
+  {								\
+    for (long i = 0; i < n; ++i)				\
+      dest[i * SCALE] = src[i] + 1;				\
+  }
+
+#define TEST_TYPE(T, DATA_TYPE)		\
+  T (DATA_TYPE, 5, 5)			\
+  T (DATA_TYPE, 7, 7)			\
+  T (DATA_TYPE, 11, 11)			\
+  T (DATA_TYPE, 200, 200)		\
+  T (DATA_TYPE, m100, -100)
+
+#define TEST_ALL(T)			\
+  TEST_TYPE (T, int32_t)		\
+  TEST_TYPE (T, uint32_t)		\
+  TEST_TYPE (T, float)			\
+  TEST_TYPE (T, int64_t)		\
+  TEST_TYPE (T, uint64_t)		\
+  TEST_TYPE (T, double)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s, p[0-7], \[x[0-9]+, z[0-9]+.s, uxtw\]\n} 12 } } */
+/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s, p[0-7], \[x[0-9]+, z[0-9]+.s, sxtw\]\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d, p[0-7], \[x[0-9]+, z[0-9]+.d\]\n} 15 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_strided_store_6.c
===================================================================
--- /dev/null	2017-11-14 14:28:07.424493901 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_strided_store_6.c	2017-11-17 22:07:58.224016438 +0000
@@ -0,0 +1,7 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve -msve-vector-bits=scalable --save-temps" } */
+
+#include "sve_strided_store_5.c"
+
+/* { dg-final { scan-assembler-not {\[x[0-9]+, z[0-9]+\.s} } } */
+/* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d, p[0-7], \[x[0-9]+, z[0-9]+.d\]\n} 15 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_strided_store_7.c
===================================================================
--- /dev/null	2017-11-14 14:28:07.424493901 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_strided_store_7.c	2017-11-17 22:07:58.224016438 +0000
@@ -0,0 +1,34 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#include <stdint.h>
+
+#define TEST_LOOP(DATA_TYPE, NAME, SCALE)			\
+  void __attribute__ ((noinline, noclone))			\
+  f_##DATA_TYPE##_##NAME (DATA_TYPE *restrict dest,		\
+			  DATA_TYPE *restrict src)		\
+  {								\
+    for (long i = 0; i < 1000; ++i)				\
+      dest[i * SCALE] = src[i] + 1;				\
+  }
+
+#define TEST_TYPE(T, DATA_TYPE)		\
+  T (DATA_TYPE, 5, 5)			\
+  T (DATA_TYPE, 7, 7)			\
+  T (DATA_TYPE, 11, 11)			\
+  T (DATA_TYPE, 200, 200)		\
+  T (DATA_TYPE, m100, -100)
+
+#define TEST_ALL(T)			\
+  TEST_TYPE (T, int32_t)		\
+  TEST_TYPE (T, uint32_t)		\
+  TEST_TYPE (T, float)			\
+  TEST_TYPE (T, int64_t)		\
+  TEST_TYPE (T, uint64_t)		\
+  TEST_TYPE (T, double)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s, p[0-7], \[x[0-9]+, z[0-9]+.s, uxtw\]\n} 12 } } */
+/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s, p[0-7], \[x[0-9]+, z[0-9]+.s, sxtw\]\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d, p[0-7], \[x[0-9]+, z[0-9]+.d\]\n} 15 } } */