From patchwork Fri Jan 26 13:59:40 2018
X-Patchwork-Submitter: Richard Sandiford
X-Patchwork-Id: 125985
From: Richard Sandiford <richard.sandiford@linaro.org>
To: gcc-patches@gcc.gnu.org
Cc: richard.earnshaw@arm.com, james.greenhalgh@arm.com,
	marcus.shawcroft@arm.com
Subject: [AArch64] Handle SVE subregs that are effectively REVs
Date: Fri, 26 Jan 2018 13:59:40 +0000
Message-ID: <87o9lgoh4j.fsf@linaro.org>

Subreg reads should be equivalent to storing the inner register to
memory and loading the appropriate memory bytes back, with subreg
writes doing the reverse.  For the reasons explained in the comments,
this isn't what happens for big-endian SVE if we simply reinterpret
one vector register as having a different element size, so the
conceptual store and load are needed in the general case.

However, that obviously produces poor code if we do it too often.
The patch therefore adds a pattern for handling the operation in
registers.  This copes with the important case of a VIEW_CONVERT
created by tree-vect-slp.c:duplicate_and_interleave.  It might make
sense to tighten the predicates in aarch64-sve.md so that such
subregs are not allowed as operands to most instructions, but that's
future work.

This fixes the sve/slp_*.c tests on aarch64_be.

Tested on aarch64_be-elf and aarch64-linux-gnu.  OK to install?

Richard


2018-01-26  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	* config/aarch64/aarch64-protos.h (aarch64_split_sve_subreg_move)
	(aarch64_maybe_expand_sve_subreg_move): Declare.
	* config/aarch64/aarch64.md (UNSPEC_REV_SUBREG): New unspec.
	* config/aarch64/predicates.md (aarch64_any_register_operand): New
	predicate.
	* config/aarch64/aarch64-sve.md (mov<mode>): Optimize subreg moves
	that are semantically a reverse operation.
	(*aarch64_sve_mov<mode>_subreg_be): New pattern.
	* config/aarch64/aarch64.c (aarch64_maybe_expand_sve_subreg_move)
	(aarch64_replace_reg_mode, aarch64_split_sve_subreg_move): New
	functions.
	(aarch64_can_change_mode_class): For big-endian, forbid changes
	between two SVE modes if they have different element sizes.
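
To make the memory-order argument above concrete, here is a stand-alone,
host-independent sketch (not part of the patch; the lane values and
variable names are made up).  It plays the role of the VNx8HI/VNx16QI
example in the comments: relabelling the register bytes as narrower
lanes gives them in the wrong order, and one byte reversal within each
16-bit container restores the order that the subreg semantics require.

/* Stand-alone illustration, not part of the patch: hypothetical lane
   values, plain ISO C.  (a) stores four 16-bit lanes to big-endian
   memory and reloads the bytes (the reference subreg semantics);
   (b) relabels the same register bytes, counted from the least
   significant end, as 8-bit lanes.  The two disagree until the bytes
   within each 16-bit container are reversed, which is what the
   patch's REV[BHW] (here a REVB on .H elements) does.  */

#include <assert.h>
#include <stdint.h>
#include <string.h>

int
main (void)
{
  uint16_t h[4] = { 0x0001, 0x0203, 0x0405, 0x0607 };  /* lane 0 first */
  uint8_t via_memory[8], in_register[8];

  for (int i = 0; i < 4; i++)
    {
      /* (a) Big-endian store: most significant byte of each lane at
	 the lower address, lane 0 at the lowest address.  */
      via_memory[2 * i] = h[i] >> 8;
      via_memory[2 * i + 1] = h[i] & 0xff;

      /* (b) Register relabelling: lane J of the byte view occupies
	 bits [8*J, 8*J+8) from the lsb, whatever the endianness.  */
      in_register[2 * i] = h[i] & 0xff;
      in_register[2 * i + 1] = h[i] >> 8;
    }

  /* The subreg is therefore not a no-op move on big-endian...  */
  assert (memcmp (via_memory, in_register, 8) != 0);

  /* ...but one byte reversal within each 16-bit container fixes it.  */
  for (int i = 0; i < 4; i++)
    {
      uint8_t tmp = in_register[2 * i];
      in_register[2 * i] = in_register[2 * i + 1];
      in_register[2 * i + 1] = tmp;
    }
  assert (memcmp (via_memory, in_register, 8) == 0);
  return 0;
}
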
Index: gcc/config/aarch64/aarch64-protos.h
===================================================================
--- gcc/config/aarch64/aarch64-protos.h	2018-01-19 11:57:11.135992201 +0000
+++ gcc/config/aarch64/aarch64-protos.h	2018-01-26 13:55:12.276219256 +0000
@@ -447,6 +447,8 @@ void aarch64_expand_epilogue (bool);
 void aarch64_expand_mov_immediate (rtx, rtx, rtx (*) (rtx, rtx) = 0);
 void aarch64_emit_sve_pred_move (rtx, rtx, rtx);
 void aarch64_expand_sve_mem_move (rtx, rtx, machine_mode);
+bool aarch64_maybe_expand_sve_subreg_move (rtx, rtx);
+void aarch64_split_sve_subreg_move (rtx, rtx, rtx);
 void aarch64_expand_prologue (void);
 void aarch64_expand_vector_init (rtx, rtx);
 void aarch64_init_cumulative_args (CUMULATIVE_ARGS *, const_tree, rtx,
Index: gcc/config/aarch64/aarch64.md
===================================================================
--- gcc/config/aarch64/aarch64.md	2018-01-19 11:57:11.134992236 +0000
+++ gcc/config/aarch64/aarch64.md	2018-01-26 13:55:12.279219121 +0000
@@ -168,6 +168,7 @@ (define_c_enum "unspec" [
     UNSPEC_INSR
     UNSPEC_CLASTB
     UNSPEC_FADDA
+    UNSPEC_REV_SUBREG
 ])
 
 (define_c_enum "unspecv" [
Index: gcc/config/aarch64/predicates.md
===================================================================
--- gcc/config/aarch64/predicates.md	2018-01-19 11:57:11.130992372 +0000
+++ gcc/config/aarch64/predicates.md	2018-01-26 13:55:12.279219121 +0000
@@ -617,3 +617,7 @@ (define_predicate "aarch64_gather_scale_
 (define_predicate "aarch64_gather_scale_operand_d"
   (and (match_code "const_int")
        (match_test "INTVAL (op) == 1 || INTVAL (op) == 8")))
+
+;; A special predicate that doesn't match a particular mode.
+(define_special_predicate "aarch64_any_register_operand"
+  (match_code "reg"))
Index: gcc/config/aarch64/aarch64-sve.md
===================================================================
--- gcc/config/aarch64/aarch64-sve.md	2018-01-26 13:49:00.069630245 +0000
+++ gcc/config/aarch64/aarch64-sve.md	2018-01-26 13:55:12.277219211 +0000
@@ -84,6 +84,32 @@ (define_expand "mov<mode>"
 				      gen_vec_duplicate<mode>);
 	DONE;
       }
+
+    /* Optimize subregs on big-endian targets: we can use REV[BHW]
+       instead of going through memory.  */
+    if (BYTES_BIG_ENDIAN
+	&& aarch64_maybe_expand_sve_subreg_move (operands[0], operands[1]))
+      DONE;
+  }
+)
+
+;; A pattern for optimizing SUBREGs that have a reinterpreting effect
+;; on big-endian targets; see aarch64_maybe_expand_sve_subreg_move
+;; for details.  We use a special predicate for operand 2 to reduce
+;; the number of patterns.
+(define_insn_and_split "*aarch64_sve_mov<mode>_subreg_be"
+  [(set (match_operand:SVE_ALL 0 "aarch64_sve_nonimmediate_operand" "=w")
+	(unspec:SVE_ALL
+	  [(match_operand:VNx16BI 1 "register_operand" "Upl")
+	   (match_operand 2 "aarch64_any_register_operand" "w")]
+	  UNSPEC_REV_SUBREG))]
+  "TARGET_SVE && BYTES_BIG_ENDIAN"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+  {
+    aarch64_split_sve_subreg_move (operands[0], operands[1], operands[2]);
+    DONE;
   }
 )
 
Index: gcc/config/aarch64/aarch64.c
===================================================================
--- gcc/config/aarch64/aarch64.c	2018-01-26 13:52:59.777644520 +0000
+++ gcc/config/aarch64/aarch64.c	2018-01-26 13:55:12.278219166 +0000
@@ -3074,6 +3074,120 @@ aarch64_expand_sve_mem_move (rtx dest, r
   aarch64_emit_sve_pred_move (dest, ptrue, src);
 }
 
+/* Called only on big-endian targets.  See whether an SVE vector move
+   from SRC to DEST is effectively a REV[BHW] instruction, because at
+   least one operand is a subreg of an SVE vector that has wider or
+   narrower elements.  Return true and emit the instruction if so.
+
+   For example:
+
+     (set (reg:VNx8HI R1) (subreg:VNx8HI (reg:VNx16QI R2) 0))
+
+   represents a VIEW_CONVERT between the following vectors, viewed
+   in memory order:
+
+     R2: { [0].high, [0].low,  [1].high, [1].low, ... }
+     R1: { [0],      [1],      [2],      [3],     ... }
+
+   The high part of lane X in R2 should therefore correspond to lane X*2
+   of R1, but the register representations are:
+
+         msb                                      lsb
+     R2: ...... [1].high  [1].low   [0].high  [0].low
+     R1: ...... [3]       [2]       [1]       [0]
+
+   where the low part of lane X in R2 corresponds to lane X*2 in R1.
+   We therefore need a reverse operation to swap the high and low values
+   around.
+
+   This is purely an optimization.  Without it we would spill the
+   subreg operand to the stack in one mode and reload it in the
+   other mode, which has the same effect as the REV.  */
+
+bool
+aarch64_maybe_expand_sve_subreg_move (rtx dest, rtx src)
+{
+  gcc_assert (BYTES_BIG_ENDIAN);
+  if (GET_CODE (dest) == SUBREG)
+    dest = SUBREG_REG (dest);
+  if (GET_CODE (src) == SUBREG)
+    src = SUBREG_REG (src);
+
+  /* The optimization handles two single SVE REGs with different element
+     sizes.  */
+  if (!REG_P (dest)
+      || !REG_P (src)
+      || aarch64_classify_vector_mode (GET_MODE (dest)) != VEC_SVE_DATA
+      || aarch64_classify_vector_mode (GET_MODE (src)) != VEC_SVE_DATA
+      || (GET_MODE_UNIT_SIZE (GET_MODE (dest))
+	  == GET_MODE_UNIT_SIZE (GET_MODE (src))))
+    return false;
+
+  /* Generate *aarch64_sve_mov<mode>_subreg_be.  */
+  rtx ptrue = force_reg (VNx16BImode, CONSTM1_RTX (VNx16BImode));
+  rtx unspec = gen_rtx_UNSPEC (GET_MODE (dest), gen_rtvec (2, ptrue, src),
+			       UNSPEC_REV_SUBREG);
+  emit_insn (gen_rtx_SET (dest, unspec));
+  return true;
+}
+
+/* Return a copy of X with mode MODE, without changing its other
+   attributes.  Unlike gen_lowpart, this doesn't care whether the
+   mode change is valid.  */
+
+static rtx
+aarch64_replace_reg_mode (rtx x, machine_mode mode)
+{
+  if (GET_MODE (x) == mode)
+    return x;
+
+  x = shallow_copy_rtx (x);
+  set_mode_and_regno (x, mode, REGNO (x));
+  return x;
+}
+
+/* Split a *aarch64_sve_mov<mode>_subreg_be pattern with the given
+   operands.  */
+
+void
+aarch64_split_sve_subreg_move (rtx dest, rtx ptrue, rtx src)
+{
+  /* Decide which REV operation we need.  The mode with narrower elements
+     determines the mode of the operands and the mode with the wider
+     elements determines the reverse width.  */
+  machine_mode mode_with_wider_elts = GET_MODE (dest);
+  machine_mode mode_with_narrower_elts = GET_MODE (src);
+  if (GET_MODE_UNIT_SIZE (mode_with_wider_elts)
+      < GET_MODE_UNIT_SIZE (mode_with_narrower_elts))
+    std::swap (mode_with_wider_elts, mode_with_narrower_elts);
+
+  unsigned int wider_bytes = GET_MODE_UNIT_SIZE (mode_with_wider_elts);
+  unsigned int unspec;
+  if (wider_bytes == 8)
+    unspec = UNSPEC_REV64;
+  else if (wider_bytes == 4)
+    unspec = UNSPEC_REV32;
+  else if (wider_bytes == 2)
+    unspec = UNSPEC_REV16;
+  else
+    gcc_unreachable ();
+  machine_mode pred_mode = aarch64_sve_pred_mode (wider_bytes).require ();
+
+  /* Emit:
+
+       (set DEST (unspec [PTRUE (unspec [SRC] UNSPEC_REV<SIZE>)]
+                  UNSPEC_MERGE_PTRUE))
+
+     with the appropriate modes.  */
+  ptrue = gen_lowpart (pred_mode, ptrue);
+  dest = aarch64_replace_reg_mode (dest, mode_with_narrower_elts);
+  src = aarch64_replace_reg_mode (src, mode_with_narrower_elts);
+  src = gen_rtx_UNSPEC (mode_with_narrower_elts, gen_rtvec (1, src), unspec);
+  src = gen_rtx_UNSPEC (mode_with_narrower_elts, gen_rtvec (2, ptrue, src),
+			UNSPEC_MERGE_PTRUE);
+  emit_insn (gen_rtx_SET (dest, src));
+}
+
 static bool
 aarch64_function_ok_for_sibcall (tree decl ATTRIBUTE_UNUSED,
 				 tree exp ATTRIBUTE_UNUSED)
@@ -17197,10 +17311,27 @@ aarch64_compute_pressure_classes (reg_cl
 aarch64_can_change_mode_class (machine_mode from,
			       machine_mode to, reg_class_t)
 {
-  /* See the comment at the head of aarch64-sve.md for details.  */
-  if (BYTES_BIG_ENDIAN
-      && (aarch64_sve_data_mode_p (from) != aarch64_sve_data_mode_p (to)))
-    return false;
+  if (BYTES_BIG_ENDIAN)
+    {
+      bool from_sve_p = aarch64_sve_data_mode_p (from);
+      bool to_sve_p = aarch64_sve_data_mode_p (to);
+
+      /* Don't allow changes between SVE data modes and non-SVE modes.
+	 See the comment at the head of aarch64-sve.md for details.  */
+      if (from_sve_p != to_sve_p)
+	return false;
+
+      /* Don't allow changes in element size: lane 0 of the new vector
+	 would not then be lane 0 of the old vector.  See the comment
+	 above aarch64_maybe_expand_sve_subreg_move for a more detailed
+	 description.
+
+	 In the worst case, this forces a register to be spilled in
+	 one mode and reloaded in the other, which handles the
+	 endianness correctly.  */
+      if (from_sve_p && GET_MODE_UNIT_SIZE (from) != GET_MODE_UNIT_SIZE (to))
+	return false;
+    }
   return true;
 }
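
For completeness, here is a stand-alone model of the property the split
relies on (again not part of the patch; REG_BYTES, the byte values and
the helper names are invented for the illustration).  It checks, for
every pair of SVE element sizes that can reach the new pattern, that
reversing the narrower elements within each wider-element container
matches the store-and-reload semantics of the subreg in both directions,
which is why aarch64_split_sve_subreg_move only needs the wider element
size to pick between REV16, REV32 and REV64.

/* Stand-alone check, not part of the patch: arbitrary register width
   and byte values, plain ISO C.  The "register" is modelled as bytes
   counted from the least significant end.  */

#include <assert.h>
#include <stdint.h>
#include <string.h>

#define REG_BYTES 16	/* one hypothetical 128-bit SVE register */

/* Reference semantics: store the register to big-endian memory as
   SRC_ELT-byte lanes, then reload it as DEST_ELT-byte lanes.  */
static void
store_and_reload (const uint8_t *reg, uint8_t *out,
		  unsigned int src_elt, unsigned int dest_elt)
{
  uint8_t mem[REG_BYTES];
  for (unsigned int i = 0; i < REG_BYTES / src_elt; i++)
    for (unsigned int b = 0; b < src_elt; b++)
      mem[i * src_elt + b] = reg[i * src_elt + (src_elt - 1 - b)];
  for (unsigned int i = 0; i < REG_BYTES / dest_elt; i++)
    for (unsigned int b = 0; b < dest_elt; b++)
      out[i * dest_elt + b] = mem[i * dest_elt + (dest_elt - 1 - b)];
}

/* What the split emits instead: reverse the order of the NARROW-byte
   elements within each WIDE-byte container, leaving the bytes of each
   narrow element alone.  */
static void
rev_within_container (const uint8_t *reg, uint8_t *out,
		      unsigned int narrow, unsigned int wide)
{
  for (unsigned int q = 0; q < REG_BYTES / wide; q++)
    for (unsigned int s = 0; s < wide / narrow; s++)
      memcpy (out + q * wide + s * narrow,
	      reg + q * wide + (wide / narrow - 1 - s) * narrow, narrow);
}

int
main (void)
{
  uint8_t reg[REG_BYTES];
  for (unsigned int j = 0; j < REG_BYTES; j++)
    reg[j] = j;			/* arbitrary, distinct byte values */

  static const unsigned int sizes[][2]
    = { { 1, 2 }, { 1, 4 }, { 2, 4 }, { 1, 8 }, { 2, 8 }, { 4, 8 } };
  for (unsigned int k = 0; k < sizeof (sizes) / sizeof (sizes[0]); k++)
    {
      unsigned int narrow = sizes[k][0], wide = sizes[k][1];
      uint8_t expect[REG_BYTES], got[REG_BYTES];

      rev_within_container (reg, got, narrow, wide);

      /* The same REV implements the subreg whichever mode is wider,
	 which is why the split doesn't care which operand that is.  */
      store_and_reload (reg, expect, narrow, wide);
      assert (memcmp (expect, got, REG_BYTES) == 0);
      store_and_reload (reg, expect, wide, narrow);
      assert (memcmp (expect, got, REG_BYTES) == 0);
    }
  return 0;
}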