From: Richard Sandiford
To: gcc-patches@gcc.gnu.org
Subject: Implement SLP of internal functions
Date: Wed, 16 May 2018 11:18:48 +0100
Message-ID: <87muwzoqd3.fsf@linaro.org>

SLP of calls was previously restricted to built-in functions.  This
patch extends it to internal functions.

Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf
and x86_64-linux-gnu.  OK to install?

Richard


2018-05-16  Richard Sandiford

gcc/
	* internal-fn.h (vectorizable_internal_fn_p): New function.
	* tree-vect-slp.c (compatible_calls_p): Likewise.
	(vect_build_slp_tree_1): Remove nops argument.  Handle calls
	to internal functions.
	(vect_build_slp_tree_2): Update call to vect_build_slp_tree_1.

gcc/testsuite/
	* gcc.target/aarch64/sve/cond_arith_4.c: New test.
	* gcc.target/aarch64/sve/cond_arith_4_run.c: Likewise.
	* gcc.target/aarch64/sve/cond_arith_5.c: Likewise.
	* gcc.target/aarch64/sve/cond_arith_5_run.c: Likewise.
	* gcc.target/aarch64/sve/slp_14.c: Likewise.
	* gcc.target/aarch64/sve/slp_14_run.c: Likewise.

Index: gcc/internal-fn.h
===================================================================
--- gcc/internal-fn.h	2018-05-16 11:06:14.513574219 +0100
+++ gcc/internal-fn.h	2018-05-16 11:12:11.872116220 +0100
@@ -158,6 +158,17 @@ direct_internal_fn_p (internal_fn fn)
   return direct_internal_fn_array[fn].type0 >= -1;
 }
 
+/* Return true if FN is a direct internal function that can be vectorized by
+   converting the return type and all argument types to vectors of the same
+   number of elements.  E.g. we can vectorize an IFN_SQRT on floats as an
+   IFN_SQRT on vectors of N floats.  */
+
+inline bool
+vectorizable_internal_fn_p (internal_fn fn)
+{
+  return direct_internal_fn_array[fn].vectorizable;
+}
+
 /* Return optab information about internal function FN.  Only meaningful
    if direct_internal_fn_p (FN).  */
 
Index: gcc/tree-vect-slp.c
===================================================================
--- gcc/tree-vect-slp.c	2018-05-16 11:02:46.262494712 +0100
+++ gcc/tree-vect-slp.c	2018-05-16 11:12:11.873116180 +0100
@@ -564,6 +564,41 @@ vect_get_and_check_slp_defs (vec_info *v
   return 0;
 }
 
+/* Return true if call statements CALL1 and CALL2 are similar enough
+   to be combined into the same SLP group.  */
+
+static bool
+compatible_calls_p (gcall *call1, gcall *call2)
+{
+  unsigned int nargs = gimple_call_num_args (call1);
+  if (nargs != gimple_call_num_args (call2))
+    return false;
+
+  if (gimple_call_combined_fn (call1) != gimple_call_combined_fn (call2))
+    return false;
+
+  if (gimple_call_internal_p (call1))
+    {
+      if (TREE_TYPE (gimple_call_lhs (call1))
+          != TREE_TYPE (gimple_call_lhs (call2)))
+        return false;
+      for (unsigned int i = 0; i < nargs; ++i)
+        if (TREE_TYPE (gimple_call_arg (call1, i))
+            != TREE_TYPE (gimple_call_arg (call2, i)))
+          return false;
+    }
+  else
+    {
+      if (!operand_equal_p (gimple_call_fn (call1),
+                            gimple_call_fn (call2), 0))
+        return false;
+
+      if (gimple_call_fntype (call1) != gimple_call_fntype (call2))
+        return false;
+    }
+  return true;
+}
+
 /* A subroutine of vect_build_slp_tree for checking VECTYPE, which is the
    caller's attempt to find the vector type in STMT with the narrowest
    element type.  Return true if VECTYPE is nonnull and if it is valid
@@ -625,8 +660,8 @@ vect_record_max_nunits (vec_info *vinfo,
 static bool
 vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
                        vec<gimple *> stmts, unsigned int group_size,
-                       unsigned nops, poly_uint64 *max_nunits,
-                       bool *matches, bool *two_operators)
+                       poly_uint64 *max_nunits, bool *matches,
+                       bool *two_operators)
 {
   unsigned int i;
   gimple *first_stmt = stmts[0], *stmt = stmts[0];
@@ -698,7 +733,9 @@ vect_build_slp_tree_1 (vec_info *vinfo,
       if (gcall *call_stmt = dyn_cast <gcall *> (stmt))
         {
           rhs_code = CALL_EXPR;
-          if (gimple_call_internal_p (call_stmt)
+          if ((gimple_call_internal_p (call_stmt)
+               && (!vectorizable_internal_fn_p
+                   (gimple_call_internal_fn (call_stmt))))
               || gimple_call_tail_p (call_stmt)
               || gimple_call_noreturn_p (call_stmt)
               || !gimple_call_nothrow_p (call_stmt)
@@ -833,11 +870,8 @@ vect_build_slp_tree_1 (vec_info *vinfo,
           if (rhs_code == CALL_EXPR)
             {
               gimple *first_stmt = stmts[0];
-              if (gimple_call_num_args (stmt) != nops
-                  || !operand_equal_p (gimple_call_fn (first_stmt),
-                                       gimple_call_fn (stmt), 0)
-                  || gimple_call_fntype (first_stmt)
-                     != gimple_call_fntype (stmt))
+              if (!compatible_calls_p (as_a <gcall *> (first_stmt),
+                                       as_a <gcall *> (stmt)))
                 {
                   if (dump_enabled_p ())
                     {
@@ -1166,8 +1200,7 @@ vect_build_slp_tree_2 (vec_info *vinfo,
 
   bool two_operators = false;
   unsigned char *swap = XALLOCAVEC (unsigned char, group_size);
-  if (!vect_build_slp_tree_1 (vinfo, swap,
-                              stmts, group_size, nops,
+  if (!vect_build_slp_tree_1 (vinfo, swap, stmts, group_size,
                               &this_max_nunits, matches, &two_operators))
     return NULL;
 
Index: gcc/testsuite/gcc.target/aarch64/sve/cond_arith_4.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/cond_arith_4.c	2018-05-16 11:12:11.872116220 +0100
@@ -0,0 +1,62 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#include <stdint.h>
+
+#define TEST(TYPE, NAME, OP) \
+  void __attribute__ ((noinline, noclone)) \
+  test_##TYPE##_##NAME (TYPE *__restrict x, \
+                        TYPE *__restrict y, \
+                        TYPE z1, TYPE z2, \
+                        TYPE *__restrict pred, int n) \
+  { \
+    for (int i = 0; i < n; i += 2) \
+      { \
+        x[i] = (pred[i] != 1 ? y[i] OP z1 : y[i]); \
+        x[i + 1] = (pred[i + 1] != 1 ? y[i + 1] OP z2 : y[i + 1]); \
+      } \
+  }
+
+#define TEST_INT_TYPE(TYPE) \
+  TEST (TYPE, div, /)
+
+#define TEST_FP_TYPE(TYPE) \
+  TEST (TYPE, add, +) \
+  TEST (TYPE, sub, -) \
+  TEST (TYPE, mul, *) \
+  TEST (TYPE, div, /)
+
+#define TEST_ALL \
+  TEST_INT_TYPE (int32_t) \
+  TEST_INT_TYPE (uint32_t) \
+  TEST_INT_TYPE (int64_t) \
+  TEST_INT_TYPE (uint64_t) \
+  TEST_FP_TYPE (float) \
+  TEST_FP_TYPE (double)
+
+TEST_ALL
+
+/* { dg-final { scan-assembler-times {\tsdiv\tz[0-9]+\.s, p[0-7]/m,} 1 } } */
+/* { dg-final { scan-assembler-times {\tudiv\tz[0-9]+\.s, p[0-7]/m,} 1 } } */
+/* { dg-final { scan-assembler-times {\tsdiv\tz[0-9]+\.d, p[0-7]/m,} 1 } } */
+/* { dg-final { scan-assembler-times {\tudiv\tz[0-9]+\.d, p[0-7]/m,} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tfadd\tz[0-9]+\.s, p[0-7]/m,} 1 } } */
+/* { dg-final { scan-assembler-times {\tfadd\tz[0-9]+\.d, p[0-7]/m,} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tfsub\tz[0-9]+\.s, p[0-7]/m,} 1 } } */
+/* { dg-final { scan-assembler-times {\tfsub\tz[0-9]+\.d, p[0-7]/m,} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.s, p[0-7]/m,} 1 } } */
+/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.d, p[0-7]/m,} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tfdiv\tz[0-9]+\.s, p[0-7]/m,} 1 } } */
+/* { dg-final { scan-assembler-times {\tfdiv\tz[0-9]+\.d, p[0-7]/m,} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]/z,} 12 } } */
+/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s, p[0-7],} 6 } } */
+
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z,} 12 } } */
+/* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d, p[0-7],} 6 } } */
+
+/* { dg-final { scan-assembler-not {\tsel\t} } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/cond_arith_4_run.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/cond_arith_4_run.c	2018-05-16 11:12:11.872116220 +0100
@@ -0,0 +1,32 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#include "cond_arith_4.c"
+
+#define N 98
+
+#undef TEST
+#define TEST(TYPE, NAME, OP) \
+  { \
+    TYPE x[N], y[N], pred[N], z[2] = { 5, 7 }; \
+    for (int i = 0; i < N; ++i) \
+      { \
+        y[i] = i * i; \
+        pred[i] = i % 3; \
+      } \
+    test_##TYPE##_##NAME (x, y, z[0], z[1], pred, N); \
+    for (int i = 0; i < N; ++i) \
+      { \
+        TYPE expected = i % 3 != 1 ? y[i] OP z[i & 1] : y[i]; \
+        if (x[i] != expected) \
+          __builtin_abort (); \
+        asm volatile ("" ::: "memory"); \
+      } \
+  }
+
+int
+main (void)
+{
+  TEST_ALL
+  return 0;
+}
Index: gcc/testsuite/gcc.target/aarch64/sve/cond_arith_5.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/cond_arith_5.c	2018-05-16 11:12:11.872116220 +0100
@@ -0,0 +1,85 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -fno-vect-cost-model" } */
+
+#include <stdint.h>
+
+#define TEST(DATA_TYPE, OTHER_TYPE, NAME, OP) \
+  void __attribute__ ((noinline, noclone)) \
+  test_##DATA_TYPE##_##OTHER_TYPE##_##NAME (DATA_TYPE *__restrict x, \
+                                            DATA_TYPE *__restrict y, \
+                                            DATA_TYPE z1, DATA_TYPE z2, \
+                                            DATA_TYPE *__restrict pred, \
+                                            OTHER_TYPE *__restrict foo, \
+                                            int n) \
+  { \
+    for (int i = 0; i < n; i += 2) \
+      { \
+        x[i] = (pred[i] != 1 ? y[i] OP z1 : y[i]); \
+        x[i + 1] = (pred[i + 1] != 1 ? y[i + 1] OP z2 : y[i + 1]); \
+        foo[i] += 1; \
+        foo[i + 1] += 2; \
+      } \
+  }
+
+#define TEST_INT_TYPE(DATA_TYPE, OTHER_TYPE) \
+  TEST (DATA_TYPE, OTHER_TYPE, div, /)
+
+#define TEST_FP_TYPE(DATA_TYPE, OTHER_TYPE) \
+  TEST (DATA_TYPE, OTHER_TYPE, add, +) \
+  TEST (DATA_TYPE, OTHER_TYPE, sub, -) \
+  TEST (DATA_TYPE, OTHER_TYPE, mul, *) \
+  TEST (DATA_TYPE, OTHER_TYPE, div, /)
+
+#define TEST_ALL \
+  TEST_INT_TYPE (int32_t, int8_t) \
+  TEST_INT_TYPE (int32_t, int16_t) \
+  TEST_INT_TYPE (uint32_t, int8_t) \
+  TEST_INT_TYPE (uint32_t, int16_t) \
+  TEST_INT_TYPE (int64_t, int8_t) \
+  TEST_INT_TYPE (int64_t, int16_t) \
+  TEST_INT_TYPE (int64_t, int32_t) \
+  TEST_INT_TYPE (uint64_t, int8_t) \
+  TEST_INT_TYPE (uint64_t, int16_t) \
+  TEST_INT_TYPE (uint64_t, int32_t) \
+  TEST_FP_TYPE (float, int8_t) \
+  TEST_FP_TYPE (float, int16_t) \
+  TEST_FP_TYPE (double, int8_t) \
+  TEST_FP_TYPE (double, int16_t) \
+  TEST_FP_TYPE (double, int32_t)
+
+TEST_ALL
+
+/* { dg-final { scan-assembler-times {\tsdiv\tz[0-9]+\.s, p[0-7]/m,} 6 } } */
+/* { dg-final { scan-assembler-times {\tudiv\tz[0-9]+\.s, p[0-7]/m,} 6 } } */
+/* { dg-final { scan-assembler-times {\tsdiv\tz[0-9]+\.d, p[0-7]/m,} 14 } } */
+/* { dg-final { scan-assembler-times {\tudiv\tz[0-9]+\.d, p[0-7]/m,} 14 } } */
+
+/* { dg-final { scan-assembler-times {\tfadd\tz[0-9]+\.s, p[0-7]/m,} 6 } } */
+/* { dg-final { scan-assembler-times {\tfadd\tz[0-9]+\.d, p[0-7]/m,} 14 } } */
+
+/* { dg-final { scan-assembler-times {\tfsub\tz[0-9]+\.s, p[0-7]/m,} 6 } } */
+/* { dg-final { scan-assembler-times {\tfsub\tz[0-9]+\.d, p[0-7]/m,} 14 } } */
+
+/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.s, p[0-7]/m,} 6 } } */
+/* { dg-final { scan-assembler-times {\tfmul\tz[0-9]+\.d, p[0-7]/m,} 14 } } */
+
+/* { dg-final { scan-assembler-times {\tfdiv\tz[0-9]+\.s, p[0-7]/m,} 6 } } */
+/* { dg-final { scan-assembler-times {\tfdiv\tz[0-9]+\.d, p[0-7]/m,} 14 } } */
+
+/* The load XFAILs for fixed-length SVE account for extra loads from the
+   constant pool.  */
+/* { dg-final { scan-assembler-times {\tld1b\tz[0-9]+\.b, p[0-7]/z,} 12 { xfail { aarch64_sve && { ! vect_variable_length } } } } } */
+/* { dg-final { scan-assembler-times {\tst1b\tz[0-9]+\.b, p[0-7],} 12 } } */
+
+/* { dg-final { scan-assembler-times {\tld1h\tz[0-9]+\.h, p[0-7]/z,} 12 { xfail { aarch64_sve && { ! vect_variable_length } } } } } */
+/* { dg-final { scan-assembler-times {\tst1h\tz[0-9]+\.h, p[0-7],} 12 } } */
+
+/* 72 for x operations, 6 for foo operations.  */
+/* { dg-final { scan-assembler-times {\tld1w\tz[0-9]+\.s, p[0-7]/z,} 78 { xfail { aarch64_sve && { ! vect_variable_length } } } } } */
+/* 36 for x operations, 6 for foo operations.  */
+/* { dg-final { scan-assembler-times {\tst1w\tz[0-9]+\.s, p[0-7],} 42 } } */
+
+/* { dg-final { scan-assembler-times {\tld1d\tz[0-9]+\.d, p[0-7]/z,} 168 } } */
+/* { dg-final { scan-assembler-times {\tst1d\tz[0-9]+\.d, p[0-7],} 84 } } */
+
+/* { dg-final { scan-assembler-not {\tsel\t} } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/cond_arith_5_run.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/cond_arith_5_run.c	2018-05-16 11:12:11.873116180 +0100
@@ -0,0 +1,35 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#include "cond_arith_5.c"
+
+#define N 98
+
+#undef TEST
+#define TEST(DATA_TYPE, OTHER_TYPE, NAME, OP) \
+  { \
+    DATA_TYPE x[N], y[N], pred[N], z[2] = { 5, 7 }; \
+    OTHER_TYPE foo[N]; \
+    for (int i = 0; i < N; ++i) \
+      { \
+        y[i] = i * i; \
+        pred[i] = i % 3; \
+        foo[i] = i * 5; \
+      } \
+    test_##DATA_TYPE##_##OTHER_TYPE##_##NAME (x, y, z[0], z[1], \
+                                              pred, foo, N); \
+    for (int i = 0; i < N; ++i) \
+      { \
+        DATA_TYPE expected = i % 3 != 1 ? y[i] OP z[i & 1] : y[i]; \
+        if (x[i] != expected) \
+          __builtin_abort (); \
+        asm volatile ("" ::: "memory"); \
+      } \
+  }
+
+int
+main (void)
+{
+  TEST_ALL
+  return 0;
+}
Index: gcc/testsuite/gcc.target/aarch64/sve/slp_14.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/slp_14.c	2018-05-16 11:12:11.873116180 +0100
@@ -0,0 +1,48 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#include <stdint.h>
+
+#define VEC_PERM(TYPE) \
+void __attribute__ ((weak)) \
+vec_slp_##TYPE (TYPE *restrict a, TYPE *restrict b, int n) \
+{ \
+  for (int i = 0; i < n; ++i) \
+    { \
+      TYPE a1 = a[i * 2]; \
+      TYPE a2 = a[i * 2 + 1]; \
+      TYPE b1 = b[i * 2]; \
+      TYPE b2 = b[i * 2 + 1]; \
+      a[i * 2] = b1 > 1 ? a1 / b1 : a1; \
+      a[i * 2 + 1] = b2 > 2 ? a2 / b2 : a2; \
+    } \
+}
+
+#define TEST_ALL(T) \
+  T (int32_t) \
+  T (uint32_t) \
+  T (int64_t) \
+  T (uint64_t) \
+  T (float) \
+  T (double)
+
+TEST_ALL (VEC_PERM)
+
+/* The loop should be fully-masked.  The load XFAILs for fixed-length
+   SVE account for extra loads from the constant pool.  */
+/* { dg-final { scan-assembler-times {\tld1w\t} 6 { xfail { aarch64_sve && { ! vect_variable_length } } } } } */
+/* { dg-final { scan-assembler-times {\tst1w\t} 3 } } */
+/* { dg-final { scan-assembler-times {\tld1d\t} 6 { xfail { aarch64_sve && { ! vect_variable_length } } } } } */
+/* { dg-final { scan-assembler-times {\tst1d\t} 3 } } */
+/* { dg-final { scan-assembler-not {\tldr} } } */
+/* { dg-final { scan-assembler-not {\tstr} } } */
+
+/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.s} 6 } } */
+/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.d} 6 } } */
+
+/* { dg-final { scan-assembler-times {\tsdiv\tz[0-9]+\.s} 1 } } */
+/* { dg-final { scan-assembler-times {\tudiv\tz[0-9]+\.s} 1 } } */
+/* { dg-final { scan-assembler-times {\tfdiv\tz[0-9]+\.s} 1 } } */
+/* { dg-final { scan-assembler-times {\tsdiv\tz[0-9]+\.d} 1 } } */
+/* { dg-final { scan-assembler-times {\tudiv\tz[0-9]+\.d} 1 } } */
+/* { dg-final { scan-assembler-times {\tfdiv\tz[0-9]+\.d} 1 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/slp_14_run.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/slp_14_run.c	2018-05-16 11:12:11.873116180 +0100
@@ -0,0 +1,34 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#include "slp_14.c"
+
+#define N1 (103 * 2)
+#define N2 (111 * 2)
+
+#define HARNESS(TYPE) \
+  { \
+    TYPE a[N2], b[N2]; \
+    for (unsigned int i = 0; i < N2; ++i) \
+      { \
+        a[i] = i * 2 + i % 5; \
+        b[i] = i % 11; \
+      } \
+    vec_slp_##TYPE (a, b, N1 / 2); \
+    for (unsigned int i = 0; i < N2; ++i) \
+      { \
+        TYPE orig_a = i * 2 + i % 5; \
+        TYPE orig_b = i % 11; \
+        TYPE expected_a = orig_a; \
+        if (i < N1 && orig_b > (i & 1 ? 2 : 1)) \
+          expected_a /= orig_b; \
+        if (a[i] != expected_a || b[i] != orig_b) \
+          __builtin_abort (); \
+      } \
+  }
+
+int
+main (void)
+{
+  TEST_ALL (HARNESS)
+}
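As a rough illustration of the grouping rule that compatible_calls_p applies, here is a small self-contained C model.  It is only a sketch under stated assumptions: struct call_model and compatible_calls_model_p are hypothetical stand-ins, not GCC types or functions, and tree types, combined_fn codes and callees are all modelled as plain integer ids (with combined_fn ids assumed unique across internal and built-in functions).  What it shows is the split in the patch: internal calls have no callee decl to compare, so they must instead agree on the result type and on every argument type, while other calls still compare the callee and its function type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a call statement.  Every "type",
   function code and callee is an integer id; this is NOT GCC's gcall.  */
#define MODEL_MAX_ARGS 4

struct call_model
{
  bool internal_p;               /* stands in for gimple_call_internal_p */
  int combined_fn;               /* stands in for gimple_call_combined_fn;
                                    assumed unique across internal and
                                    built-in functions */
  int lhs_type;                  /* type id of the call's result */
  unsigned int nargs;            /* argument count, <= MODEL_MAX_ARGS */
  int arg_types[MODEL_MAX_ARGS]; /* type id of each argument */
  int callee;                    /* stands in for gimple_call_fn */
  int fntype;                    /* stands in for gimple_call_fntype */
};

/* Return true if the modelled calls C1 and C2 could share an SLP group,
   checking the same properties, in the same order, as compatible_calls_p.  */
static bool
compatible_calls_model_p (const struct call_model *c1,
                          const struct call_model *c2)
{
  assert (c1 != NULL && c2 != NULL);
  assert (c1->nargs <= MODEL_MAX_ARGS);

  /* Same number of arguments and the same (combined) function.  */
  if (c1->nargs != c2->nargs || c1->combined_fn != c2->combined_fn)
    return false;

  if (c1->internal_p)
    {
      /* Internal functions have no callee to compare, so require
         identical result and argument types instead.  */
      if (c1->lhs_type != c2->lhs_type)
        return false;
      for (unsigned int i = 0; i < c1->nargs; ++i)
        if (c1->arg_types[i] != c2->arg_types[i])
          return false;
    }
  else
    {
      /* Other calls: same callee and same function type.  */
      if (c1->callee != c2->callee || c1->fntype != c2->fntype)
        return false;
    }
  return true;
}
```

With these definitions, a copy of an internal call is compatible with the original, while changing a single argument type, the argument count, or (for a non-internal call) the function type makes the pair incompatible, so the two statements would not be placed in the same SLP group.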