From: Richard Sandiford <richard.sandiford@linaro.org>
To: gcc-patches@gcc.gnu.org
Mail-Followup-To: gcc-patches@gcc.gnu.org, richard.sandiford@linaro.org
Subject: Allow gather loads to be used for grouped accesses
Date: Fri, 17 Nov 2017 22:04:53 +0000
Message-ID: <87a7zkwohm.fsf@linaro.org>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.3 (gnu/linux)
MIME-Version: 1.0

Following on from the previous patch for strided accesses, this patch
allows gather loads to be used with grouped accesses, if we otherwise
would need to fall back to VMAT_ELEMENTWISE.  However, as the comment
says, this is restricted to single-element groups for now:

     ??? Although the code can handle all group sizes correctly,
     it probably isn't a win to use separate strided accesses based
     on nearby locations.  Or, even if it's a win over scalar code,
     it might not be a win over vectorizing at a lower VF, if that
     allows us to use contiguous accesses.

Single-element groups are an important special case though, and this
means that the code is less sensitive to GCC's classification of single
accesses with constant steps as "grouped" and ones with variable steps
as "strided".

2017-11-17  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* tree-vectorizer.h (vect_gather_scatter_fn_p): Declare.
	* tree-vect-data-refs.c (vect_gather_scatter_fn_p): Make public.
	* tree-vect-stmts.c (vect_truncate_gather_scatter_offset): New
	function.
	(vect_use_strided_gather_scatters_p): Take a masked_p argument.
	Use vect_truncate_gather_scatter_offset if we can't treat the
	operation as a normal gather load or scatter store.
	(get_group_load_store_type): Take the gather_scatter_info
	as argument.  Try using a gather load or scatter store for
	single-element groups.
	(get_load_store_type): Update calls to get_group_load_store_type
	and vect_use_strided_gather_scatters_p.

gcc/testsuite/
	* gcc.target/aarch64/sve_strided_load_4.c: New test.
	* gcc.target/aarch64/sve_strided_load_5.c: Likewise.
	* gcc.target/aarch64/sve_strided_load_6.c: Likewise.
	* gcc.target/aarch64/sve_strided_load_7.c: Likewise.
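To make the affected case concrete, here is the shape of loop involved
(a minimal sketch in the style of the new tests; the function name is
invented for illustration).  The single load of SRC has a constant step
of 5 elements, so GCC classifies it as a single-element group; before
this patch it fell back to VMAT_ELEMENTWISE, and with the patch it can
use a gather load with offset vector { 0, 5, 10, ... } scaled by the
element size:

    #include <stdint.h>

    /* A single-element grouped access: one load per iteration with a
       constant step of 5 elements (20 bytes for int32_t).  */
    void
    f (int32_t *restrict dest, int32_t *restrict src, int n)
    {
      for (int i = 0; i < n; ++i)
	dest[i] += src[i * 5];
    }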
Index: gcc/tree-vectorizer.h
===================================================================
--- gcc/tree-vectorizer.h	2017-11-17 21:59:27.828803892 +0000
+++ gcc/tree-vectorizer.h	2017-11-17 22:02:44.221485217 +0000
@@ -1454,6 +1454,8 @@ extern bool vect_verify_datarefs_alignme
 extern bool vect_slp_analyze_and_verify_instance_alignment (slp_instance);
 extern bool vect_analyze_data_ref_accesses (vec_info *);
 extern bool vect_prune_runtime_alias_test_list (loop_vec_info);
+extern bool vect_gather_scatter_fn_p (bool, bool, tree, tree, unsigned int,
+				      signop, int, internal_fn *, tree *);
 extern bool vect_check_gather_scatter (gimple *, loop_vec_info,
 				       gather_scatter_info *);
 extern bool vect_analyze_data_refs (vec_info *, poly_uint64 *);
Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c	2017-11-17 21:59:27.827803892 +0000
+++ gcc/tree-vect-data-refs.c	2017-11-17 22:02:44.220555940 +0000
@@ -3307,7 +3307,7 @@ vect_prune_runtime_alias_test_list (loop
    Return true if the function is supported, storing the function id
    in *IFN_OUT and the type of a vector element in *ELEMENT_TYPE_OUT.  */
 
-static bool
+bool
 vect_gather_scatter_fn_p (bool read_p, bool masked_p, tree vectype,
 			  tree memory_type, unsigned int offset_bits,
 			  signop offset_sign, int scale,
Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	2017-11-17 21:59:27.828803892 +0000
+++ gcc/tree-vect-stmts.c	2017-11-17 22:02:44.221485217 +0000
@@ -1847,17 +1847,116 @@ prepare_load_store_mask (tree mask_type,
   return and_res;
 }
 
+/* Determine whether we can use a gather load or scatter store to vectorize
+   strided load or store STMT by truncating the current offset to a smaller
+   width.  We need to be able to construct an offset vector:
+
+     { 0, X, X*2, X*3, ... }
+
+   without loss of precision, where X is STMT's DR_STEP.
+
+   Return true if this is possible, describing the gather load or scatter
+   store in GS_INFO.  MASKED_P is true if the load or store is
+   conditional.  */
+
+static bool
+vect_truncate_gather_scatter_offset (gimple *stmt, loop_vec_info loop_vinfo,
+				     bool masked_p,
+				     gather_scatter_info *gs_info)
+{
+  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+  data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
+  tree step = DR_STEP (dr);
+  if (TREE_CODE (step) != INTEGER_CST)
+    {
+      /* ??? Perhaps we could use range information here?  */
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, vect_location,
+			 "cannot truncate variable step.\n");
+      return false;
+    }
+
+  /* Get the number of bits in an element.  */
+  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
+  scalar_mode element_mode = SCALAR_TYPE_MODE (TREE_TYPE (vectype));
+  unsigned int element_bits = GET_MODE_BITSIZE (element_mode);
+
+  /* Set COUNT to the upper limit on the number of elements - 1.
+     Start with the maximum vectorization factor.  */
+  unsigned HOST_WIDE_INT count = vect_max_vf (loop_vinfo) - 1;
+
+  /* Try lowering COUNT to the number of scalar latch iterations.  */
+  struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+  widest_int max_iters;
+  if (max_loop_iterations (loop, &max_iters)
+      && max_iters < count)
+    count = max_iters.to_shwi ();
+
+  /* Try scales of 1 and the element size.  */
+  int scales[] = { 1, vect_get_scalar_dr_size (dr) };
+  bool overflow_p = false;
+  for (int i = 0; i < 2; ++i)
+    {
+      int scale = scales[i];
+      widest_int factor;
+      if (!wi::multiple_of_p (wi::to_widest (step), scale, SIGNED, &factor))
+	continue;
+
+      /* See whether we can calculate (COUNT - 1) * STEP / SCALE
+	 in OFFSET_BITS bits.  */
+      widest_int range = wi::mul (count, factor, SIGNED, &overflow_p);
+      if (overflow_p)
+	continue;
+      signop sign = range >= 0 ? UNSIGNED : SIGNED;
+      if (wi::min_precision (range, sign) > element_bits)
+	{
+	  overflow_p = true;
+	  continue;
+	}
+
+      /* See whether the target supports the operation.  */
+      tree memory_type = TREE_TYPE (DR_REF (dr));
+      if (!vect_gather_scatter_fn_p (DR_IS_READ (dr), masked_p, vectype,
+				     memory_type, element_bits, sign, scale,
+				     &gs_info->ifn, &gs_info->element_type))
+	continue;
+
+      tree offset_type = build_nonstandard_integer_type (element_bits,
+							 sign == UNSIGNED);
+
+      gs_info->decl = NULL_TREE;
+      /* Logically the sum of DR_BASE_ADDRESS, DR_INIT and DR_OFFSET,
+	 but we don't need to store that here.  */
+      gs_info->base = NULL_TREE;
+      gs_info->offset = fold_convert (offset_type, step);
+      gs_info->offset_dt = vect_unknown_def_type;
+      gs_info->offset_vectype = NULL_TREE;
+      gs_info->scale = scale;
+      gs_info->memory_type = memory_type;
+      return true;
+    }
+
+  if (overflow_p && dump_enabled_p ())
+    dump_printf_loc (MSG_NOTE, vect_location,
+		     "truncating gather/scatter offset to %d bits"
+		     " might change its value.\n", element_bits);
+
+  return false;
+}
+
 /* Return true if we can use gather/scatter internal functions to vectorize
    STMT, which is a grouped or strided load or store.
-   When returning true, fill in GS_INFO with the information required
-   to perform the operation.  */
+   MASKED_P is true if the load or store is conditional.  When returning
+   true, fill in GS_INFO with the information required to perform the
+   operation.  */
 
 static bool
 vect_use_strided_gather_scatters_p (gimple *stmt, loop_vec_info loop_vinfo,
+				    bool masked_p,
 				    gather_scatter_info *gs_info)
 {
   if (!vect_check_gather_scatter (stmt, loop_vinfo, gs_info))
-    return false;
+    return vect_truncate_gather_scatter_offset (stmt, loop_vinfo,
+						masked_p, gs_info);
 
   scalar_mode element_mode = SCALAR_TYPE_MODE (gs_info->element_type);
   unsigned int element_bits = GET_MODE_BITSIZE (element_mode);
@@ -1989,7 +2088,8 @@ vect_get_store_rhs (gimple *stmt)
 static bool
 get_group_load_store_type (gimple *stmt, tree vectype, bool slp,
 			   bool masked_p, vec_load_store_type vls_type,
-			   vect_memory_access_type *memory_access_type)
+			   vect_memory_access_type *memory_access_type,
+			   gather_scatter_info *gs_info)
 {
   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
   vec_info *vinfo = stmt_info->vinfo;
@@ -2104,6 +2204,20 @@ get_group_load_store_type (gimple *stmt,
 	      overrun_p = would_overrun_p;
 	    }
 	}
+
+      /* As a last resort, try using a gather load or scatter store.
+
+	 ??? Although the code can handle all group sizes correctly,
+	 it probably isn't a win to use separate strided accesses based
+	 on nearby locations.  Or, even if it's a win over scalar code,
+	 it might not be a win over vectorizing at a lower VF, if that
+	 allows us to use contiguous accesses.  */
+      if (*memory_access_type == VMAT_ELEMENTWISE
+	  && single_element_p
+	  && loop_vinfo
+	  && vect_use_strided_gather_scatters_p (stmt, loop_vinfo,
+						 masked_p, gs_info))
+	*memory_access_type = VMAT_GATHER_SCATTER;
     }
 
   if (vls_type != VLS_LOAD && first_stmt == stmt)
@@ -2231,14 +2345,15 @@ get_load_store_type (gimple *stmt, tree
   else if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
     {
       if (!get_group_load_store_type (stmt, vectype, slp, masked_p, vls_type,
-				      memory_access_type))
+				      memory_access_type, gs_info))
 	return false;
     }
   else if (STMT_VINFO_STRIDED_P (stmt_info))
     {
       gcc_assert (!slp);
       if (loop_vinfo
-	  && vect_use_strided_gather_scatters_p (stmt, loop_vinfo, gs_info))
+	  && vect_use_strided_gather_scatters_p (stmt, loop_vinfo,
+						 masked_p, gs_info))
 	*memory_access_type = VMAT_GATHER_SCATTER;
       else
 	*memory_access_type = VMAT_ELEMENTWISE;
Index: gcc/testsuite/gcc.target/aarch64/sve_strided_load_4.c
===================================================================
--- /dev/null	2017-11-14 14:28:07.424493901 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_strided_load_4.c	2017-11-17 22:02:44.219626663 +0000
@@ -0,0 +1,33 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#include <stdint.h>
+
+#define TEST_LOOP(DATA_TYPE, NAME, SCALE)			\
+  void __attribute__ ((noinline, noclone))			\
+  f_##DATA_TYPE##_##NAME (DATA_TYPE *restrict dest,		\
+			  DATA_TYPE *restrict src, int n)	\
+  {								\
+    for (int i = 0; i < n; ++i)					\
+      dest[i] += src[i * SCALE];				\
+  }
+
+#define TEST_TYPE(T, DATA_TYPE)		\
+  T (DATA_TYPE, 5, 5)			\
+  T (DATA_TYPE, 7, 7)			\
+  T (DATA_TYPE, 11, 11)			\
+  T (DATA_TYPE, 200, 200)		\
+  T (DATA_TYPE, m100, -100)
+
+#define TEST_ALL(T)			\
+  TEST_TYPE (T, int32_t)		\
+  TEST_TYPE (T, uint32_t)		\
+  TEST_TYPE (T, float)			\
+  TEST_TYPE (T, int64_t)		\
+  TEST_TYPE (T, uint64_t)		\
+  TEST_TYPE (T, double)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, z[0-9]+.s, sxtw 2\]\n} 15 } } */
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+.d, lsl 3\]\n} 15 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_strided_load_5.c
===================================================================
--- /dev/null	2017-11-14 14:28:07.424493901 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_strided_load_5.c	2017-11-17 22:02:44.219626663 +0000
@@ -0,0 +1,34 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve -msve-vector-bits=256 --save-temps" } */
+
+#include <stdint.h>
+
+#define TEST_LOOP(DATA_TYPE, NAME, SCALE)			\
+  void __attribute__ ((noinline, noclone))			\
+  f_##DATA_TYPE##_##NAME (DATA_TYPE *restrict dest,		\
+			  DATA_TYPE *restrict src, long n)	\
+  {								\
+    for (long i = 0; i < n; ++i)				\
+      dest[i] += src[i * SCALE];				\
+  }
+
+#define TEST_TYPE(T, DATA_TYPE)		\
+  T (DATA_TYPE, 5, 5)			\
+  T (DATA_TYPE, 7, 7)			\
+  T (DATA_TYPE, 11, 11)			\
+  T (DATA_TYPE, 200, 200)		\
+  T (DATA_TYPE, m100, -100)
+
+#define TEST_ALL(T)			\
+  TEST_TYPE (T, int32_t)		\
+  TEST_TYPE (T, uint32_t)		\
+  TEST_TYPE (T, float)			\
+  TEST_TYPE (T, int64_t)		\
+  TEST_TYPE (T, uint64_t)		\
+  TEST_TYPE (T, double)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, z[0-9]+.s, uxtw\]\n} 12 } } */
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, z[0-9]+.s, sxtw\]\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+.d\]\n} 15 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_strided_load_6.c
===================================================================
--- /dev/null	2017-11-14 14:28:07.424493901 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_strided_load_6.c	2017-11-17 22:02:44.219626663 +0000
@@ -0,0 +1,7 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve -msve-vector-bits=scalable --save-temps" } */
+
+#include "sve_strided_load_5.c"
+
+/* { dg-final { scan-assembler-not {\[x[0-9]+, z[0-9]+\.s} } } */
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+.d\]\n} 15 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve_strided_load_7.c
===================================================================
--- /dev/null	2017-11-14 14:28:07.424493901 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve_strided_load_7.c	2017-11-17 22:02:44.219626663 +0000
@@ -0,0 +1,34 @@
+/* { dg-do assemble } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve --save-temps" } */
+
+#include <stdint.h>
+
+#define TEST_LOOP(DATA_TYPE, NAME, SCALE)		\
+  void __attribute__ ((noinline, noclone))		\
+  f_##DATA_TYPE##_##NAME (DATA_TYPE *restrict dest,	\
+			  DATA_TYPE *restrict src)	\
+  {							\
+    for (long i = 0; i < 1000; ++i)			\
+      dest[i] += src[i * SCALE];			\
+  }
+
+#define TEST_TYPE(T, DATA_TYPE)		\
+  T (DATA_TYPE, 5, 5)			\
+  T (DATA_TYPE, 7, 7)			\
+  T (DATA_TYPE, 11, 11)			\
+  T (DATA_TYPE, 200, 200)		\
+  T (DATA_TYPE, m100, -100)
+
+#define TEST_ALL(T)			\
+  TEST_TYPE (T, int32_t)		\
+  TEST_TYPE (T, uint32_t)		\
+  TEST_TYPE (T, float)			\
+  TEST_TYPE (T, int64_t)		\
+  TEST_TYPE (T, uint64_t)		\
+  TEST_TYPE (T, double)
+
+TEST_ALL (TEST_LOOP)
+
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, z[0-9]+.s, uxtw\]\n} 12 } } */
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]/z, \[x[0-9]+, z[0-9]+.s, sxtw\]\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z, \[x[0-9]+, z[0-9]+.d\]\n} 15 } } */
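
For readers following the new overflow check in
vect_truncate_gather_scatter_offset, the sketch below models the same
test in plain C.  This is a hypothetical standalone model, not GCC
code: __int128 stands in for GCC's arbitrary-precision widest_int, so
it assumes COUNT and STEP/SCALE fit in 64 bits; offset_fits_p and its
parameters are invented names.

    #include <stdbool.h>
    #include <stdint.h>

    /* Can every offset in { 0, STEP/SCALE, 2*STEP/SCALE, ...,
       COUNT*STEP/SCALE } be represented in ELEMENT_BITS bits?
       Models the wi::multiple_of_p / wi::mul / wi::min_precision
       sequence in the patch.  */
    static bool
    offset_fits_p (int64_t step, int64_t scale, uint64_t count,
		   unsigned int element_bits)
    {
      /* STEP must be a multiple of SCALE (wi::multiple_of_p).  */
      if (scale == 0 || step % scale != 0)
	return false;

      /* The largest offset in the vector is COUNT * STEP / SCALE.
	 (Assumes the product fits in 128 bits; widest_int has no
	 such limit.)  */
      __int128 range = (__int128) count * (step / scale);

      /* Minimum precision needed: unsigned for non-negative ranges,
	 signed (one extra sign bit) for negative ones, mirroring the
	 SIGNED/UNSIGNED choice in the patch.  */
      unsigned int needed = 0;
      if (range >= 0)
	for (__int128 v = range; v > 0; v >>= 1)
	  needed++;
      else
	{
	  needed = 1;			/* Sign bit.  */
	  for (__int128 v = ~range; v > 0; v >>= 1)
	    needed++;
	}
      return needed <= element_bits;
    }

For example, offset_fits_p (20, 4, 255, 32) is true: an int32_t access
with a stride of 5 elements (DR_STEP 20) and at most 255 further
iterations only needs offsets up to 1275, which fits comfortably in a
32-bit offset element.  If the iteration count is large enough that
the maximum offset needs more than 32 bits, the check fails and the
vectorizer keeps using VMAT_ELEMENTWISE.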