From patchwork Tue May  8 10:14:34 2018
X-Patchwork-Submitter: Richard Sandiford
X-Patchwork-Id: 135141
From: Richard Sandiford
To: gcc-patches@gcc.gnu.org
Mail-Followup-To: gcc-patches@gcc.gnu.org, richard.sandiford@linaro.org
Subject: [committed][AArch64] Predicated SVE comparison folds
Date: Tue, 08 May 2018 11:14:34 +0100
Message-ID: <87603ysbcl.fsf@linaro.org>

This patch adds SVE patterns that combine a PTRUE-predicated
comparison with a separate AND.  The main benefit is for optimising
ANDs with the loop predicate, as in the testcase.

However, one potential drawback is that it triggers even for cases
in which two naturally-parallel comparisons are ANDed together.
Whether that's a win or a loss will depend on the schedule, but it
has the potential to be a win more often than a loss.

The combine patterns are undeniably ugly.  One way of getting around
them would be to allow 1->1 "splits" when combining 2 instructions,
as well as 1->2 splits when combining more than 2 instructions
(although that wouldn't really be a split).  Another would be to
have a way of defining target-specific rtx simplifications.

branches/ARM/sve-branch has a prototype implementation of that, but
it would need some clean-up before being ready to submit.  It would
also be good to make it closer to the match.pd style.  Until then,
I think what the combine patterns are doing is the "correct"
implementation given the current infrastructure.

Tested on aarch64-linux-gnu (with and without SVE) and aarch64_be-elf.
Applied as r260031.

Richard


2018-05-08  Richard Sandiford
	    Alan Hayward
	    David Sherwood

gcc/
	* config/aarch64/aarch64-sve.md (*pred_cmp_combine)
	(*pred_cmp, *fcm_and_combine)
	(*fcmuo_and_combine, *fcm_and)
	(*fcmuo_and): New patterns.

gcc/testsuite/
	* gcc.target/aarch64/sve/vcond_6.c: Do not expect any ANDs.
	XFAIL the BIC test.
	* gcc.target/aarch64/sve/vcond_7.c: New test.
	* gcc.target/aarch64/sve/vcond_7_run.c: Likewise.

Index: gcc/config/aarch64/aarch64-sve.md
===================================================================
--- gcc/config/aarch64/aarch64-sve.md	2018-05-08 10:56:20.122789504 +0100
+++ gcc/config/aarch64/aarch64-sve.md	2018-05-08 11:12:30.156289597 +0100
@@ -1358,6 +1358,49 @@ (define_insn "*cmp_cc"
    cmp\t%0., %1/z, %2., %3."
 )
 
+;; Predicated integer comparisons, formed by combining a PTRUE-predicated
+;; comparison with an AND.  Split the instruction into its preferred form
+;; (below) at the earliest opportunity, in order to get rid of the
+;; redundant operand 1.
+(define_insn_and_split "*pred_cmp_combine"
+  [(set (match_operand: 0 "register_operand" "=Upa, Upa")
+        (and:
+          (unspec:
+            [(match_operand: 1)
+             (SVE_INT_CMP:
+               (match_operand:SVE_I 2 "register_operand" "w, w")
+               (match_operand:SVE_I 3 "aarch64_sve_cmp__operand" ", w"))]
+            UNSPEC_MERGE_PTRUE)
+          (match_operand: 4 "register_operand" "Upl, Upl")))
+   (clobber (reg:CC CC_REGNUM))]
+  "TARGET_SVE"
+  "#"
+  "&& 1"
+  [(parallel
+     [(set (match_dup 0)
+           (and:
+             (SVE_INT_CMP:
+               (match_dup 2)
+               (match_dup 3))
+             (match_dup 4)))
+      (clobber (reg:CC CC_REGNUM))])]
+)
+
+;; Predicated integer comparisons.
+(define_insn "*pred_cmp"
+  [(set (match_operand: 0 "register_operand" "=Upa, Upa")
+        (and:
+          (SVE_INT_CMP:
+            (match_operand:SVE_I 2 "register_operand" "w, w")
+            (match_operand:SVE_I 3 "aarch64_sve_cmp__operand" ", w"))
+          (match_operand: 1 "register_operand" "Upl, Upl")))
+   (clobber (reg:CC CC_REGNUM))]
+  "TARGET_SVE"
+  "@
+   cmp\t%0., %1/z, %2., #%3
+   cmp\t%0., %1/z, %2., %3."
+)
+
 ;; Floating-point comparisons predicated with a PTRUE.
 (define_insn "*fcm"
   [(set (match_operand: 0 "register_operand" "=Upa, Upa")
@@ -1384,6 +1427,83 @@ (define_insn "*fcmuo"
   "TARGET_SVE"
   "fcmuo\t%0., %1/z, %2., %3."
 )
+
+;; Floating-point comparisons predicated on a PTRUE, with the results ANDed
+;; with another predicate P.  This does not have the same trapping behavior
+;; as predicating the comparison itself on P, but it's a legitimate fold,
+;; since we can drop any potentially-trapping operations whose results
+;; are not needed.
+;;
+;; Split the instruction into its preferred form (below) at the earliest
+;; opportunity, in order to get rid of the redundant operand 1.
+(define_insn_and_split "*fcm_and_combine"
+  [(set (match_operand: 0 "register_operand" "=Upa, Upa")
+        (and:
+          (unspec:
+            [(match_operand: 1)
+             (SVE_FP_CMP
+               (match_operand:SVE_F 2 "register_operand" "w, w")
+               (match_operand:SVE_F 3 "aarch64_simd_reg_or_zero" "Dz, w"))]
+            UNSPEC_MERGE_PTRUE)
+          (match_operand: 4 "register_operand" "Upl, Upl")))]
+  "TARGET_SVE"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+        (and:
+          (SVE_FP_CMP:
+            (match_dup 2)
+            (match_dup 3))
+          (match_dup 4)))]
+)
+
+(define_insn_and_split "*fcmuo_and_combine"
+  [(set (match_operand: 0 "register_operand" "=Upa")
+        (and:
+          (unspec:
+            [(match_operand: 1)
+             (unordered
+               (match_operand:SVE_F 2 "register_operand" "w")
+               (match_operand:SVE_F 3 "register_operand" "w"))]
+            UNSPEC_MERGE_PTRUE)
+          (match_operand: 4 "register_operand" "Upl")))]
+  "TARGET_SVE"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+        (and:
+          (unordered:
+            (match_dup 2)
+            (match_dup 3))
+          (match_dup 4)))]
+)
+
+;; Unpredicated floating-point comparisons, with the results ANDed
+;; with another predicate.  This is a valid fold for the same reasons
+;; as above.
+(define_insn "*fcm_and"
+  [(set (match_operand: 0 "register_operand" "=Upa, Upa")
+        (and:
+          (SVE_FP_CMP:
+            (match_operand:SVE_F 2 "register_operand" "w, w")
+            (match_operand:SVE_F 3 "aarch64_simd_reg_or_zero" "Dz, w"))
+          (match_operand: 1 "register_operand" "Upl, Upl")))]
+  "TARGET_SVE"
+  "@
+   fcm\t%0., %1/z, %2., #0.0
+   fcm\t%0., %1/z, %2., %3."
+)
+
+(define_insn "*fcmuo_and"
+  [(set (match_operand: 0 "register_operand" "=Upa")
+        (and:
+          (unordered:
+            (match_operand:SVE_F 2 "register_operand" "w")
+            (match_operand:SVE_F 3 "register_operand" "w"))
+          (match_operand: 1 "register_operand" "Upl")))]
+  "TARGET_SVE"
+  "fcmuo\t%0., %1/z, %2., %3."
+)
 
 ;; Predicated floating-point comparisons.  We don't need a version
 ;; of this for unordered comparisons.
Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_6.c
===================================================================
--- gcc/testsuite/gcc.target/aarch64/sve/vcond_6.c	2018-05-08 10:35:31.102864315 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/vcond_6.c	2018-05-08 11:12:30.156289597 +0100
@@ -43,10 +43,16 @@ #define TEST_ALL(T) \
 
 TEST_ALL (LOOP)
 
-/* { dg-final { scan-assembler-times {\tand\tp[0-9]+\.b, p[0-9]+/z, p[0-9]+\.b, p[0-9]+\.b} 3 } } */
+/* ??? We predicate one of the comparisons on the result of the other,
+   but whether that's a win or a loss will depend on the schedule.  */
+/* { dg-final { scan-assembler-not {\tand\t} } } */
 /* { dg-final { scan-assembler-times {\torr\tp[0-9]+\.b, p[0-9]+/z, p[0-9]+\.b, p[0-9]+\.b} 3 } } */
 /* { dg-final { scan-assembler-times {\teor\tp[0-9]+\.b, p[0-9]+/z, p[0-9]+\.b, p[0-9]+\.b} 3 } } */
 /* { dg-final { scan-assembler-times {\tnand\tp[0-9]+\.b, p[0-9]+/z, p[0-9]+\.b, p[0-9]+\.b} 3 } } */
 /* { dg-final { scan-assembler-times {\tnor\tp[0-9]+\.b, p[0-9]+/z, p[0-9]+\.b, p[0-9]+\.b} 3 } } */
-/* { dg-final { scan-assembler-times {\tbic\tp[0-9]+\.b, p[0-9]+/z, p[0-9]+\.b, p[0-9]+\.b} 3 } } */
+/* Currently we predicate one of the comparisons on the result of the other
+   and then use NOT, but the original BIC sequence is better.  It's a fairly
+   niche failure though.  We'd handle most other types of comparison by
+   using the inverse operation instead of a separate NOT.  */
+/* { dg-final { scan-assembler-times {\tbic\tp[0-9]+\.b, p[0-9]+/z, p[0-9]+\.b, p[0-9]+\.b} 3 { xfail *-*-* } } */
 /* { dg-final { scan-assembler-times {\torn\tp[0-9]+\.b, p[0-9]+/z, p[0-9]+\.b, p[0-9]+\.b} 3 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_7.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/vcond_7.c	2018-05-08 11:12:30.156289597 +0100
@@ -0,0 +1,216 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#include <stdint.h>
+
+#define N 100
+
+#define eq(A, B) ((A) == (B))
+#define ne(A, B) ((A) != (B))
+#define lt(A, B) ((A) < (B))
+#define le(A, B) ((A) <= (B))
+#define ge(A, B) ((A) >= (B))
+#define gt(A, B) ((A) > (B))
+#define unordered(A, B) (__builtin_isunordered (A, B))
+
+#define DEF_CONST_LOOP(NAME, SUFFIX, TYPE, CONST) \
+  void __attribute__ ((noipa)) \
+  NAME##_##SUFFIX##_##TYPE (TYPE *restrict dst, TYPE *restrict src) \
+  { \
+    for (int i = 0; i < N; ++i) \
+      if (NAME (src[i], CONST)) \
+        dst[i] = 1; \
+  }
+
+#define DEF_LOOP(NAME, TYPE, CONST1, CONST2) \
+  void __attribute__ ((noipa)) \
+  NAME##_var_##TYPE (TYPE *restrict dst, TYPE *restrict src, TYPE x) \
+  { \
+    for (int i = 0; i < N; ++i) \
+      if (NAME (src[i], x)) \
+        dst[i] = x; \
+  } \
+  DEF_CONST_LOOP (NAME, const1, TYPE, CONST1) \
+  DEF_CONST_LOOP (NAME, const2, TYPE, CONST2)
+
+#define FOR_EACH_INT_OPERATOR(T, TYPE, CONST1, CONST2) \
+  T (eq, TYPE, CONST1, CONST2) \
+  T (ne, TYPE, CONST1, CONST2) \
+  T (le, TYPE, CONST1, CONST2) \
+  T (lt, TYPE, CONST1, CONST2) \
+  T (gt, TYPE, CONST1, CONST2) \
+  T (ge, TYPE, CONST1, CONST2)
+
+#define FOR_EACH_FLOAT_OPERATOR(T, TYPE, CONST1, CONST2) \
+  FOR_EACH_INT_OPERATOR(T, TYPE, CONST1, CONST2) \
+  T (unordered, TYPE, CONST1, CONST2)
+
+#define FOR_EACH_TYPE(T) \
+  FOR_EACH_INT_OPERATOR (T, int8_t, 2, 100) \
+  FOR_EACH_INT_OPERATOR (T, int16_t, 3, 1000) \
+  FOR_EACH_INT_OPERATOR (T, int32_t, 4, 2000) \
+  FOR_EACH_INT_OPERATOR (T, int64_t, 5, 3000) \
+  FOR_EACH_INT_OPERATOR (T, uint8_t, 2, 160) \
+  FOR_EACH_INT_OPERATOR (T, uint16_t, 3, 500) \
+  FOR_EACH_INT_OPERATOR (T, uint32_t, 4, 1500) \
+  FOR_EACH_INT_OPERATOR (T, uint64_t, 5, 2500) \
+  FOR_EACH_FLOAT_OPERATOR (T, _Float16, 0, 1) \
+  FOR_EACH_FLOAT_OPERATOR (T, float, 0, 1) \
+  FOR_EACH_FLOAT_OPERATOR (T, double, 0, 1)
+
+FOR_EACH_TYPE (DEF_LOOP)
+
+/* { dg-final { scan-assembler-not {\tand\t} } } */
+
+/* { dg-final { scan-assembler-times {\tcmpeq\tp[0-7]\.b, p[0-7]/z, z[0-9]+\.b, z[0-9]+\.b\n} 4 } } */
+/* { dg-final { scan-assembler-times {\tcmpeq\tp[0-7]\.h, p[0-7]/z, z[0-9]+\.h, z[0-9]+\.h\n} 4 } } */
+/* { dg-final { scan-assembler-times {\tcmpeq\tp[0-7]\.s, p[0-7]/z, z[0-9]+\.s, z[0-9]+\.s\n} 4 } } */
+/* { dg-final { scan-assembler-times {\tcmpeq\tp[0-7]\.d, p[0-7]/z, z[0-9]+\.d, z[0-9]+\.d\n} 4 } } */
+
+/* { dg-final { scan-assembler-times {\tcmpeq\tp[0-7]\.b, p[0-7]/z, z[0-9]+\.b, #2\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tcmpeq\tp[0-7]\.h, p[0-7]/z, z[0-9]+\.h, #3\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tcmpeq\tp[0-7]\.s, p[0-7]/z, z[0-9]+\.s, #4\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tcmpeq\tp[0-7]\.d, p[0-7]/z, z[0-9]+\.d, #5\n} 2 } } */
+
+/* { dg-final { scan-assembler-times {\tcmpne\tp[0-7]\.b, p[0-7]/z, z[0-9]+\.b, z[0-9]+\.b\n} 4 } } */
+/* { dg-final { scan-assembler-times {\tcmpne\tp[0-7]\.h, p[0-7]/z, z[0-9]+\.h, z[0-9]+\.h\n} 4 } } */
+/* { dg-final { scan-assembler-times {\tcmpne\tp[0-7]\.s, p[0-7]/z, z[0-9]+\.s, z[0-9]+\.s\n} 4 } } */
+/* { dg-final { scan-assembler-times {\tcmpne\tp[0-7]\.d, p[0-7]/z, z[0-9]+\.d, z[0-9]+\.d\n} 4 } } */
+
+/* { dg-final { scan-assembler-times {\tcmpne\tp[0-7]\.b, p[0-7]/z, z[0-9]+\.b, #2\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tcmpne\tp[0-7]\.h, p[0-7]/z, z[0-9]+\.h, #3\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tcmpne\tp[0-7]\.s, p[0-7]/z, z[0-9]+\.s, #4\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tcmpne\tp[0-7]\.d, p[0-7]/z, z[0-9]+\.d, #5\n} 2 } } */
+
+/* { dg-final { scan-assembler-times {\tcmplt\tp[0-7]\.b, p[0-7]/z, z[0-9]+\.b, z[0-9]+\.b\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tcmplt\tp[0-7]\.h, p[0-7]/z, z[0-9]+\.h, z[0-9]+\.h\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tcmplt\tp[0-7]\.s, p[0-7]/z, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tcmplt\tp[0-7]\.d, p[0-7]/z, z[0-9]+\.d, z[0-9]+\.d\n} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tcmple\tp[0-7]\.b, p[0-7]/z, z[0-9]+\.b, #1\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tcmple\tp[0-7]\.h, p[0-7]/z, z[0-9]+\.h, #2\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tcmple\tp[0-7]\.s, p[0-7]/z, z[0-9]+\.s, #3\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tcmple\tp[0-7]\.d, p[0-7]/z, z[0-9]+\.d, #4\n} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tcmple\tp[0-7]\.b, p[0-7]/z, z[0-9]+\.b, z[0-9]+\.b\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tcmple\tp[0-7]\.h, p[0-7]/z, z[0-9]+\.h, z[0-9]+\.h\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tcmple\tp[0-7]\.s, p[0-7]/z, z[0-9]+\.s, z[0-9]+\.s\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tcmple\tp[0-7]\.d, p[0-7]/z, z[0-9]+\.d, z[0-9]+\.d\n} 3 } } */
+
+/* { dg-final { scan-assembler-times {\tcmple\tp[0-7]\.b, p[0-7]/z, z[0-9]+\.b, #2\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tcmple\tp[0-7]\.h, p[0-7]/z, z[0-9]+\.h, #3\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tcmple\tp[0-7]\.s, p[0-7]/z, z[0-9]+\.s, #4\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tcmple\tp[0-7]\.d, p[0-7]/z, z[0-9]+\.d, #5\n} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tcmpgt\tp[0-7]\.b, p[0-7]/z, z[0-9]+\.b, z[0-9]+\.b\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tcmpgt\tp[0-7]\.h, p[0-7]/z, z[0-9]+\.h, z[0-9]+\.h\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tcmpgt\tp[0-7]\.s, p[0-7]/z, z[0-9]+\.s, z[0-9]+\.s\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tcmpgt\tp[0-7]\.d, p[0-7]/z, z[0-9]+\.d, z[0-9]+\.d\n} 3 } } */
+
+/* { dg-final { scan-assembler-times {\tcmpgt\tp[0-7]\.b, p[0-7]/z, z[0-9]+\.b, #2\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tcmpgt\tp[0-7]\.h, p[0-7]/z, z[0-9]+\.h, #3\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tcmpgt\tp[0-7]\.s, p[0-7]/z, z[0-9]+\.s, #4\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tcmpgt\tp[0-7]\.d, p[0-7]/z, z[0-9]+\.d, #5\n} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tcmpge\tp[0-7]\.b, p[0-7]/z, z[0-9]+\.b, z[0-9]+\.b\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tcmpge\tp[0-7]\.h, p[0-7]/z, z[0-9]+\.h, z[0-9]+\.h\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tcmpge\tp[0-7]\.s, p[0-7]/z, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tcmpge\tp[0-7]\.d, p[0-7]/z, z[0-9]+\.d, z[0-9]+\.d\n} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tcmpgt\tp[0-7]\.b, p[0-7]/z, z[0-9]+\.b, #1\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tcmpgt\tp[0-7]\.h, p[0-7]/z, z[0-9]+\.h, #2\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tcmpgt\tp[0-7]\.s, p[0-7]/z, z[0-9]+\.s, #3\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tcmpgt\tp[0-7]\.d, p[0-7]/z, z[0-9]+\.d, #4\n} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tcmplo\tp[0-7]\.b, p[0-7]/z, z[0-9]+\.b, z[0-9]+\.b\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tcmplo\tp[0-7]\.h, p[0-7]/z, z[0-9]+\.h, z[0-9]+\.h\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tcmplo\tp[0-7]\.s, p[0-7]/z, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tcmplo\tp[0-7]\.d, p[0-7]/z, z[0-9]+\.d, z[0-9]+\.d\n} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tcmpls\tp[0-7]\.b, p[0-7]/z, z[0-9]+\.b, #1\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tcmpls\tp[0-7]\.h, p[0-7]/z, z[0-9]+\.h, #2\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tcmpls\tp[0-7]\.s, p[0-7]/z, z[0-9]+\.s, #3\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tcmpls\tp[0-7]\.d, p[0-7]/z, z[0-9]+\.d, #4\n} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tcmpls\tp[0-7]\.b, p[0-7]/z, z[0-9]+\.b, z[0-9]+\.b\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tcmpls\tp[0-7]\.h, p[0-7]/z, z[0-9]+\.h, z[0-9]+\.h\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tcmpls\tp[0-7]\.s, p[0-7]/z, z[0-9]+\.s, z[0-9]+\.s\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tcmpls\tp[0-7]\.d, p[0-7]/z, z[0-9]+\.d, z[0-9]+\.d\n} 3 } } */
+
+/* { dg-final { scan-assembler-times {\tcmpls\tp[0-7]\.b, p[0-7]/z, z[0-9]+\.b, #2\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tcmpls\tp[0-7]\.h, p[0-7]/z, z[0-9]+\.h, #3\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tcmpls\tp[0-7]\.s, p[0-7]/z, z[0-9]+\.s, #4\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tcmpls\tp[0-7]\.d, p[0-7]/z, z[0-9]+\.d, #5\n} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tcmphi\tp[0-7]\.b, p[0-7]/z, z[0-9]+\.b, z[0-9]+\.b\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tcmphi\tp[0-7]\.h, p[0-7]/z, z[0-9]+\.h, z[0-9]+\.h\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tcmphi\tp[0-7]\.s, p[0-7]/z, z[0-9]+\.s, z[0-9]+\.s\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tcmphi\tp[0-7]\.d, p[0-7]/z, z[0-9]+\.d, z[0-9]+\.d\n} 3 } } */
+
+/* { dg-final { scan-assembler-times {\tcmphi\tp[0-7]\.b, p[0-7]/z, z[0-9]+\.b, #2\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tcmphi\tp[0-7]\.h, p[0-7]/z, z[0-9]+\.h, #3\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tcmphi\tp[0-7]\.s, p[0-7]/z, z[0-9]+\.s, #4\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tcmphi\tp[0-7]\.d, p[0-7]/z, z[0-9]+\.d, #5\n} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tcmphs\tp[0-7]\.b, p[0-7]/z, z[0-9]+\.b, z[0-9]+\.b\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tcmphs\tp[0-7]\.h, p[0-7]/z, z[0-9]+\.h, z[0-9]+\.h\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tcmphs\tp[0-7]\.s, p[0-7]/z, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tcmphs\tp[0-7]\.d, p[0-7]/z, z[0-9]+\.d, z[0-9]+\.d\n} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tcmphi\tp[0-7]\.b, p[0-7]/z, z[0-9]+\.b, #1\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tcmphi\tp[0-7]\.h, p[0-7]/z, z[0-9]+\.h, #2\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tcmphi\tp[0-7]\.s, p[0-7]/z, z[0-9]+\.s, #3\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tcmphi\tp[0-7]\.d, p[0-7]/z, z[0-9]+\.d, #4\n} 1 } } */
+
+
+/* { dg-final { scan-assembler-times {\tfcmeq\tp[0-7]\.h, p[0-7]/z, z[0-9]+\.h, z[0-9]+\.h\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tfcmeq\tp[0-7]\.s, p[0-7]/z, z[0-9]+\.s, z[0-9]+\.s\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tfcmeq\tp[0-7]\.d, p[0-7]/z, z[0-9]+\.d, z[0-9]+\.d\n} 2 } } */
+
+/* { dg-final { scan-assembler-times {\tfcmeq\tp[0-7]\.h, p[0-7]/z, z[0-9]+\.h, #0\.0\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tfcmeq\tp[0-7]\.s, p[0-7]/z, z[0-9]+\.s, #0\.0\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tfcmeq\tp[0-7]\.d, p[0-7]/z, z[0-9]+\.d, #0\.0\n} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tfcmne\tp[0-7]\.h, p[0-7]/z, z[0-9]+\.h, z[0-9]+\.h\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tfcmne\tp[0-7]\.s, p[0-7]/z, z[0-9]+\.s, z[0-9]+\.s\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tfcmne\tp[0-7]\.d, p[0-7]/z, z[0-9]+\.d, z[0-9]+\.d\n} 2 } } */
+
+/* { dg-final { scan-assembler-times {\tfcmne\tp[0-7]\.h, p[0-7]/z, z[0-9]+\.h, #0\.0\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tfcmne\tp[0-7]\.s, p[0-7]/z, z[0-9]+\.s, #0\.0\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tfcmne\tp[0-7]\.d, p[0-7]/z, z[0-9]+\.d, #0\.0\n} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tfcmle\tp[0-7]\.h, p[0-7]/z, z[0-9]+\.h, z[0-9]+\.h\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tfcmle\tp[0-7]\.s, p[0-7]/z, z[0-9]+\.s, z[0-9]+\.s\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tfcmle\tp[0-7]\.d, p[0-7]/z, z[0-9]+\.d, z[0-9]+\.d\n} 2 } } */
+
+/* { dg-final { scan-assembler-times {\tfcmle\tp[0-7]\.h, p[0-7]/z, z[0-9]+\.h, #0\.0\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tfcmle\tp[0-7]\.s, p[0-7]/z, z[0-9]+\.s, #0\.0\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tfcmle\tp[0-7]\.d, p[0-7]/z, z[0-9]+\.d, #0\.0\n} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tfcmlt\tp[0-7]\.h, p[0-7]/z, z[0-9]+\.h, z[0-9]+\.h\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tfcmlt\tp[0-7]\.s, p[0-7]/z, z[0-9]+\.s, z[0-9]+\.s\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tfcmlt\tp[0-7]\.d, p[0-7]/z, z[0-9]+\.d, z[0-9]+\.d\n} 2 } } */
+
+/* { dg-final { scan-assembler-times {\tfcmlt\tp[0-7]\.h, p[0-7]/z, z[0-9]+\.h, #0\.0\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tfcmlt\tp[0-7]\.s, p[0-7]/z, z[0-9]+\.s, #0\.0\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tfcmlt\tp[0-7]\.d, p[0-7]/z, z[0-9]+\.d, #0\.0\n} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tfcmge\tp[0-7]\.h, p[0-7]/z, z[0-9]+\.h, z[0-9]+\.h\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tfcmge\tp[0-7]\.s, p[0-7]/z, z[0-9]+\.s, z[0-9]+\.s\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tfcmge\tp[0-7]\.d, p[0-7]/z, z[0-9]+\.d, z[0-9]+\.d\n} 2 } } */
+
+/* { dg-final { scan-assembler-times {\tfcmge\tp[0-7]\.h, p[0-7]/z, z[0-9]+\.h, #0\.0\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tfcmge\tp[0-7]\.s, p[0-7]/z, z[0-9]+\.s, #0\.0\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tfcmge\tp[0-7]\.d, p[0-7]/z, z[0-9]+\.d, #0\.0\n} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tfcmgt\tp[0-7]\.h, p[0-7]/z, z[0-9]+\.h, z[0-9]+\.h\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tfcmgt\tp[0-7]\.s, p[0-7]/z, z[0-9]+\.s, z[0-9]+\.s\n} 2 } } */
+/* { dg-final { scan-assembler-times {\tfcmgt\tp[0-7]\.d, p[0-7]/z, z[0-9]+\.d, z[0-9]+\.d\n} 2 } } */
+
+/* { dg-final { scan-assembler-times {\tfcmgt\tp[0-7]\.h, p[0-7]/z, z[0-9]+\.h, #0\.0\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tfcmgt\tp[0-7]\.s, p[0-7]/z, z[0-9]+\.s, #0\.0\n} 1 } } */
+/* { dg-final { scan-assembler-times {\tfcmgt\tp[0-7]\.d, p[0-7]/z, z[0-9]+\.d, #0\.0\n} 1 } } */
+
+/* { dg-final { scan-assembler-times {\tfcmuo\tp[0-7]\.h, p[0-7]/z, z[0-9]+\.h, z[0-9]+\.h\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tfcmuo\tp[0-7]\.s, p[0-7]/z, z[0-9]+\.s, z[0-9]+\.s\n} 3 } } */
+/* { dg-final { scan-assembler-times {\tfcmuo\tp[0-7]\.d, p[0-7]/z, z[0-9]+\.d, z[0-9]+\.d\n} 3 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_7_run.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/vcond_7_run.c	2018-05-08 11:12:30.157289558 +0100
@@ -0,0 +1,40 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#include "vcond_7.c"
+
+#define TEST_CONST_LOOP(NAME, SUFFIX, TYPE, CONST) \
+  { \
+    for (int i = 0; i < N; ++i) \
+      { \
+        dst[i] = i * 3; \
+        src[i] = i % (CONST + 3); \
+      } \
+    NAME##_##SUFFIX##_##TYPE (dst, src); \
+    for (int i = 0; i < N; ++i) \
+      if (dst[i] != (NAME (src[i], CONST) ? (TYPE) 1 : (TYPE) (i * 3))) \
+        __builtin_abort (); \
+  }
+
+#define TEST_LOOPS(NAME, TYPE, CONST1, CONST2) \
+  { \
+    TYPE dst[N], src[N]; \
+    for (int i = 0; i < N; ++i) \
+      { \
+        dst[i] = i * 2; \
+        src[i] = i % 5; \
+      } \
+    NAME##_var_##TYPE (dst, src, 3); \
+    for (int i = 0; i < N; ++i) \
+      if (dst[i] != (NAME (src[i], 3) ? (TYPE) 3 : (TYPE) (i * 2))) \
+        __builtin_abort (); \
+    TEST_CONST_LOOP (NAME, const1, TYPE, CONST1) \
+    TEST_CONST_LOOP (NAME, const2, TYPE, CONST2) \
+  }
+
+int __attribute__ ((noipa))
+main (void)
+{
+  FOR_EACH_TYPE (TEST_LOOPS);
+  return 0;
+}
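
Illustrative note, not part of the applied patch: the fold above relies
on a zeroing-predicated comparison giving the same predicate as a
PTRUE-predicated comparison followed by an AND.  The scalar C model
below is only a sketch of that equivalence.  LANES, model_cmplt,
model_and and fold_is_equivalent are hypothetical names invented for
the illustration (real SVE is vector-length agnostic), and the model
covers result values only, not the floating-point trapping difference
mentioned in the aarch64-sve.md comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed lane count, chosen only for this model.  */
#define LANES 8

/* Model of a zeroing-predicated "less than" comparison: lanes that are
   inactive in PG produce false.  */
static void
model_cmplt (bool *out, const bool *pg, const int32_t *a, const int32_t *b)
{
  for (size_t i = 0; i < LANES; ++i)
    out[i] = pg[i] && a[i] < b[i];
}

/* Model of a lane-wise predicate AND.  */
static void
model_and (bool *out, const bool *p1, const bool *p2)
{
  for (size_t i = 0; i < LANES; ++i)
    out[i] = p1[i] && p2[i];
}

/* Return true if "compare under an all-true predicate, then AND with P"
   gives the same predicate as "compare under P" for these inputs.  */
static bool
fold_is_equivalent (const bool *p, const int32_t *a, const int32_t *b)
{
  bool ptrue[LANES], all[LANES], unfolded[LANES], folded[LANES];

  for (size_t i = 0; i < LANES; ++i)
    ptrue[i] = true;

  /* Unfolded form: two operations.  */
  model_cmplt (all, ptrue, a, b);
  model_and (unfolded, all, p);

  /* Folded form: one comparison predicated directly on P.  */
  model_cmplt (folded, p, a, b);

  for (size_t i = 0; i < LANES; ++i)
    if (unfolded[i] != folded[i])
      return false;
  return true;
}
```

Calling fold_is_equivalent with any predicate and any pair of input
vectors should return true in this model, since each lane reduces to
the same "p[i] && a[i] < b[i]" expression in both forms.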