From patchwork Wed May 22 02:04:37 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kugan Vivekanandarajah
X-Patchwork-Id: 164766
Delivered-To: mailing list gcc-patches@gcc.gnu.org
From: Kugan Vivekanandarajah
Date: Wed, 22 May 2019 12:04:37 +1000
Subject: [RFC][PR88838][SVE] Use 32-bit WHILELO in LP64 mode
To: GCC Patches

Hi,

The attached RFC patch attempts to use a 32-bit WHILELO in LP64 mode to fix
the PR. Bootstrap and regression testing are ongoing.

In earlier testing, I ran into an issue related to fwprop. I will tackle
that based on the feedback for the patch.

Thanks,
Kugan

>From 4e9837ff9c0c080923f342e83574a6fdba2b3d92 Mon Sep 17 00:00:00 2001
From: Kugan Vivekanandarajah
Date: Tue, 5 Mar 2019 10:01:45 +1100
Subject: [PATCH] pr88838[v2]

As mentioned in PR88838, this patch avoids the SXTW by using WHILELO on W
registers instead of X registers.

As mentioned in the PR, vect_verify_full_masking checks which IV widths are
supported for WHILELO but prefers to go to Pmode width.  This is because
using Pmode allows ivopts to reuse the IV for indices (as in the loads
and store in the PR).  However, it would be better to use a 32-bit
WHILELO with a truncated 64-bit IV if:

(a) the limit is extended from 32 bits.
(b) the detection loop in vect_verify_full_masking detects that using a
    32-bit IV would be correct.

gcc/ChangeLog:

2019-05-22  Kugan Vivekanandarajah

        * tree-vect-loop-manip.c (vect_set_loop_masks_directly): If
        compare_type is narrower than Pmode, create an IV of Pmode size
        and use it truncated (i.e. converted to the correct type).
        * tree-vect-loop.c (vect_verify_full_masking): Find which IV widths
        are supported for WHILELO.

gcc/testsuite/ChangeLog:

2019-05-22  Kugan Vivekanandarajah

        * gcc.target/aarch64/pr88838.c: New test.
        * gcc.target/aarch64/sve/while_1.c: Adjust.

Change-Id: Iff52946c28d468078f2cc0868d53edb05325b8ca
---
 gcc/fwprop.c                                   | 13 +++++++
 gcc/testsuite/gcc.target/aarch64/pr88838.c     | 11 ++++++
 gcc/testsuite/gcc.target/aarch64/sve/while_1.c | 16 ++++----
 gcc/tree-vect-loop-manip.c                     | 52 ++++++++++++++++++++++++--
 gcc/tree-vect-loop.c                           | 39 ++++++++++++++++++-
 5 files changed, 117 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr88838.c

diff --git a/gcc/fwprop.c b/gcc/fwprop.c
index cf2c9de..5275ad3 100644
--- a/gcc/fwprop.c
+++ b/gcc/fwprop.c
@@ -1358,6 +1358,19 @@ forward_propagate_and_simplify (df_ref use, rtx_insn *def_insn, rtx def_set)
   else
     mode = GET_MODE (*loc);
 
+  /* TODO.  */
+  if (GET_MODE_CLASS (mode) != GET_MODE_CLASS (GET_MODE (reg)))
+    return false;
+  /* TODO.  We can't get the mode for
+     (set (reg:VNx16BI 109)
+          (unspec:VNx16BI [
+            (reg:SI 131)
+            (reg:SI 106)
+           ] UNSPEC_WHILE_LO))
+     Thus, bail out when it is an UNSPEC and the modes are not compatible.  */
+  if (GET_MODE_CLASS (mode) != GET_MODE_CLASS (GET_MODE (reg))
+      && GET_CODE (SET_SRC (use_set)) == UNSPEC)
+    return false;
   new_rtx = propagate_rtx (*loc, mode, reg, src,
                            optimize_bb_for_speed_p (BLOCK_FOR_INSN (use_insn)));
 
diff --git a/gcc/testsuite/gcc.target/aarch64/pr88838.c b/gcc/testsuite/gcc.target/aarch64/pr88838.c
new file mode 100644
index 0000000..9d03c0a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr88838.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-S -O3 -march=armv8.2-a+sve" } */
+
+void
+f (int *restrict x, int *restrict y, int *restrict z, int n)
+{
+  for (int i = 0; i < n; i += 1)
+    x[i] = y[i] + z[i];
+}
+
+/* { dg-final { scan-assembler-not "sxtw" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/while_1.c b/gcc/testsuite/gcc.target/aarch64/sve/while_1.c
index a93a04b..05a4860 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/while_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/while_1.c
@@ -26,14 +26,14 @@ TEST_ALL (ADD_LOOP)
 
 /* { dg-final { scan-assembler-not {\tuqdec} } } */
-/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.b, xzr,} 2 } } */
-/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.b, x[0-9]+,} 2 } } */
-/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.h, xzr,} 2 } } */
-/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.h, x[0-9]+,} 2 } } */
-/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.s, xzr,} 3 } } */
-/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.s, x[0-9]+,} 3 } } */
-/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.d, xzr,} 3 } } */
-/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.d, x[0-9]+,} 3 } } */
+/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.b, wzr,} 2 } } */
+/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.b, w[0-9]+,} 2 } } */
+/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.h, wzr,} 2 } } */
+/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.h, w[0-9]+,} 2 } } */
+/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.s, wzr,} 3 } } */
+/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.s, w[0-9]+,} 3 } } */
+/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.d, wzr,} 3 } } */
+/* { dg-final { scan-assembler-times {\twhilelo\tp[0-7]\.d, w[0-9]+,} 3 } } */
 /* { dg-final { scan-assembler-times {\tld1b\tz[0-9]+\.b, p[0-7]/z, \[x0, x[0-9]+\]\n} 2 } } */
 /* { dg-final { scan-assembler-times {\tst1b\tz[0-9]+\.b, p[0-7], \[x0, x[0-9]+\]\n} 2 } } */
 /* { dg-final { scan-assembler-times {\tld1h\tz[0-9]+\.h, p[0-7]/z, \[x0, x[0-9]+, lsl 1\]\n} 2 } } */
diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
index 77d3dac..d6452a1 100644
--- a/gcc/tree-vect-loop-manip.c
+++ b/gcc/tree-vect-loop-manip.c
@@ -418,7 +418,20 @@ vect_set_loop_masks_directly (struct loop *loop, loop_vec_info loop_vinfo,
   tree mask_type = rgm->mask_type;
   unsigned int nscalars_per_iter = rgm->max_nscalars_per_iter;
   poly_uint64 nscalars_per_mask = TYPE_VECTOR_SUBPARTS (mask_type);
-
+  bool convert = false;
+  tree iv_type = NULL_TREE;
+
+  /* If compare_type is narrower than Pmode, create an IV of Pmode size
+     and use it truncated (i.e. converted to the correct type).
+     This is because using Pmode allows ivopts to reuse the IV for indices
+     (in the loads and store).  */
+  if (known_lt (GET_MODE_BITSIZE (TYPE_MODE (compare_type)),
+                GET_MODE_BITSIZE (Pmode)))
+    {
+      iv_type = build_nonstandard_integer_type (GET_MODE_BITSIZE (Pmode),
+                                                true);
+      convert = true;
+    }
   /* Calculate the maximum number of scalar values that the rgroup
      handles in total, the number that it handles for each iteration
      of the vector loop, and the number that it should skip during the
@@ -444,12 +457,43 @@ vect_set_loop_masks_directly (struct loop *loop, loop_vec_info loop_vinfo,
      processed.  */
   tree index_before_incr, index_after_incr;
   gimple_stmt_iterator incr_gsi;
+  gimple_stmt_iterator incr_gsi2;
   bool insert_after;
-  tree zero_index = build_int_cst (compare_type, 0);
+  tree zero_index;
   standard_iv_increment_position (loop, &incr_gsi, &insert_after);
-  create_iv (zero_index, nscalars_step, NULL_TREE, loop, &incr_gsi,
-             insert_after, &index_before_incr, &index_after_incr);
 
+  if (convert)
+    {
+      /* Create the IV in Pmode type and convert its uses.  */
+      zero_index = build_int_cst (iv_type, 0);
+      tree step = build_int_cst (iv_type,
+                                 LOOP_VINFO_VECT_FACTOR (loop_vinfo));
+      /* Create the IV of Pmode type.  */
+      create_iv (zero_index, step, NULL_TREE, loop, &incr_gsi,
+                 insert_after, &index_before_incr, &index_after_incr);
+      /* Create truncated copies of the index before and after the increment.  */
+      tree index_before_incr_trunc = make_ssa_name (compare_type);
+      tree index_after_incr_trunc = make_ssa_name (compare_type);
+      gimple *incr_before_stmt = gimple_build_assign (index_before_incr_trunc,
+                                                      NOP_EXPR,
+                                                      index_before_incr);
+      gimple *incr_after_stmt = gimple_build_assign (index_after_incr_trunc,
+                                                     NOP_EXPR,
+                                                     index_after_incr);
+      incr_gsi2 = incr_gsi;
+      gsi_insert_before (&incr_gsi2, incr_before_stmt, GSI_NEW_STMT);
+      gsi_insert_after (&incr_gsi, incr_after_stmt, GSI_NEW_STMT);
+      index_before_incr = index_before_incr_trunc;
+      index_after_incr = index_after_incr_trunc;
+      zero_index = build_int_cst (compare_type, 0);
+    }
+  else
+    {
+      /* If the IV already has the Pmode compare_type, no conversion is needed.  */
+      zero_index = build_int_cst (compare_type, 0);
+      create_iv (zero_index, nscalars_step, NULL_TREE, loop, &incr_gsi,
+                 insert_after, &index_before_incr, &index_after_incr);
+    }
   tree test_index, test_limit, first_limit;
   gimple_stmt_iterator *test_gsi;
   if (might_wrap_p)
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index bd81193..2769c86 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -1035,6 +1035,30 @@ vect_verify_full_masking (loop_vec_info loop_vinfo)
   /* Find a scalar mode for which WHILE_ULT is supported.  */
   opt_scalar_int_mode cmp_mode_iter;
   tree cmp_type = NULL_TREE;
+  tree niters_type = TREE_TYPE (LOOP_VINFO_NITERS (loop_vinfo));
+  tree niters_skip = LOOP_VINFO_MASK_SKIP_NITERS (loop_vinfo);
+  unsigned HOST_WIDE_INT max_vf = vect_max_vf (loop_vinfo);
+  widest_int iv_limit;
+  bool known_max_iters = max_loop_iterations (loop, &iv_limit);
+  if (known_max_iters)
+    {
+      if (niters_skip)
+        {
+          /* Add the maximum number of skipped iterations to the
+             maximum iteration count.  */
+          if (TREE_CODE (niters_skip) == INTEGER_CST)
+            iv_limit += wi::to_widest (niters_skip);
+          else
+            iv_limit += max_vf - 1;
+        }
+      /* IV_LIMIT is the maximum number of latch iterations, which is also
+         the maximum in-range IV value.  Round this value down to the previous
+         vector alignment boundary and then add an extra full iteration.  */
+      poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+      iv_limit = (iv_limit & -(int) known_alignment (vf)) + max_vf;
+    }
+
+  /* Get the vectorization factor in tree form.  */
   FOR_EACH_MODE_IN_CLASS (cmp_mode_iter, MODE_INT)
     {
       unsigned int cmp_bits = GET_MODE_BITSIZE (cmp_mode_iter.require ());
@@ -1045,12 +1069,23 @@ vect_verify_full_masking (loop_vec_info loop_vinfo)
           if (this_type
               && can_produce_all_loop_masks_p (loop_vinfo, this_type))
             {
+              /* See whether zero-based IV would ever generate all-false masks
+                 before wrapping around.  */
+              bool might_wrap_p
+                = (!known_max_iters
+                   || (wi::min_precision
+                       (iv_limit
+                        * vect_get_max_nscalars_per_iter (loop_vinfo),
+                        UNSIGNED) > cmp_bits));
               /* Although we could stop as soon as we find a valid mode,
                  it's often better to continue until we hit Pmode, since the
                  operands to the WHILE are more likely to be reusable in
-                 address calculations.  */
+                 address calculations.  Unless the limit is extended from
+                 this_type.  */
               cmp_type = this_type;
-              if (cmp_bits >= GET_MODE_BITSIZE (Pmode))
+              if (cmp_bits >= GET_MODE_BITSIZE (Pmode)
+                  || (!might_wrap_p
+                      && (cmp_bits == TYPE_PRECISION (niters_type))))
                 break;
             }
         }
-- 
2.7.4
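
The wrap check that the patch adds to vect_verify_full_masking can be illustrated outside GCC with fixed-width integers. The following is only a minimal sketch under stated assumptions, not the patch's code: min_precision_unsigned and iv_might_wrap_p are hypothetical helper names, uint64_t stands in for widest_int, and a product that would overflow 64 bits is conservatively treated as "might wrap".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: number of bits needed to hold VALUE as an
   unsigned integer (0 needs 0 bits).  It plays the role that
   wi::min_precision (x, UNSIGNED) plays in the patch.  */
static unsigned int
min_precision_unsigned (uint64_t value)
{
  unsigned int bits = 0;
  while (value != 0)
    {
      bits++;
      value >>= 1;
    }
  return bits;
}

/* Hypothetical helper: return true if a zero-based IV compared in a
   CMP_BITS-wide type might wrap before the loop ends.  IV_LIMIT is the
   largest IV value that can be reached and MAX_NSCALARS_PER_ITER the
   largest number of scalars handled per iteration.  */
static bool
iv_might_wrap_p (bool known_max_iters, uint64_t iv_limit,
                 uint64_t max_nscalars_per_iter, unsigned int cmp_bits)
{
  assert (cmp_bits <= 64);

  /* Without a known iteration bound, nothing can be proven.  */
  if (!known_max_iters)
    return true;

  /* Assumption of this sketch: treat a 64-bit overflow of the product
     as wrapping, since uint64_t is narrower than widest_int.  */
  if (max_nscalars_per_iter != 0
      && iv_limit > UINT64_MAX / max_nscalars_per_iter)
    return true;

  return min_precision_unsigned (iv_limit * max_nscalars_per_iter) > cmp_bits;
}
```

With an IV limit of 2147483711 and one scalar per iteration, the product needs 32 bits, so a 32-bit compare type cannot wrap; with two scalars per iteration the product needs 33 bits and the 32-bit type is rejected.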