From patchwork Sat Dec 9 23:20:01 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Sandiford X-Patchwork-Id: 121297 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp1164279qgn; Sat, 9 Dec 2017 15:20:35 -0800 (PST) X-Google-Smtp-Source: AGs4zMbTDc0YJIPjNWwa/8drG2vx7t5slme0BcZM0J3+2CxtwmC9nAtSiVGZT78XJ/jiADRYm8eq X-Received: by 10.98.7.149 with SMTP id 21mr7173807pfh.14.1512861635838; Sat, 09 Dec 2017 15:20:35 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1512861635; cv=none; d=google.com; s=arc-20160816; b=XDuOZ9C0O/f3NQlvvy5T6qebxuBHAEi9QfmmqlO5LjxWTHN3e/5QmNl2QbotkOLWja 2OWK+6iVPq9AXL5eTeSZgreYpf2Z7p3wEyxMHSBwp/k22JAJajWXeI9WeRIdCokWkuiD F81tgGSOKcZMgaEBATSDQFaZgNo7aOAeq28vuX6ew/qMcz1tsgkHZcsvmmbfz6wUAen4 hkSG6Klrj1tIfwZMN7yrMQuyJPbcRjfTQ06VGI1OsXieT4Y133N6+U4Ek8PlPjuDlE06 79E5lE6NLZIR4ttSNYJIo8IRVTY8EuBEbLD87KhdTACIDZeLYJAW+uCkP/rcADUxOswB 8CXA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=mime-version:user-agent:message-id:in-reply-to:date:references :subject:mail-followup-to:to:from:delivered-to:sender:list-help :list-post:list-archive:list-unsubscribe:list-id:precedence :mailing-list:dkim-signature:domainkey-signature :arc-authentication-results; bh=NZw+WHR4wan/YL+5UxEE0PY9fUcCdU/2Btk09/rt34o=; b=CVhnzFcNaYckE0wk/HpRe39ghZk80GiD4wkcVsuGj8v+gu8kKK6bmaBEZIfD/Rphl/ 0dlLuMLXMNXj5CAQEPKsHZue7aRuqr2xZIOdll14GS9PpN2TAUWX51PJ1Xi+u2JVDFLo CMjad8l1uUQvgRuoJCO7y20Knqs2AafqK3vqs//+SwCWkoZGYMkc8+PQH4joiP7kxIUt x+mfxN3cKYbJzYGrat7FtKz1kDrm0+bLbHoohm4Rp5p05RuBS/q3IsZZo64GjWqk8d3r S0BDbmH0CUjgh4Hd8zqxElAYUL6BoH0EFQJnH/zTOD6GB6Pe2Tt9b98aY5k8UYQy6+Jr 0KJw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gcc.gnu.org header.s=default header.b=GczFyBCd; spf=pass (google.com: domain of gcc-patches-return-468857-patch=linaro.org@gcc.gnu.org designates 209.132.180.131 as permitted sender) smtp.mailfrom=gcc-patches-return-468857-patch=linaro.org@gcc.gnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from sourceware.org (server1.sourceware.org. [209.132.180.131]) by mx.google.com with ESMTPS id f1si1061935plb.58.2017.12.09.15.20.35 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sat, 09 Dec 2017 15:20:35 -0800 (PST) Received-SPF: pass (google.com: domain of gcc-patches-return-468857-patch=linaro.org@gcc.gnu.org designates 209.132.180.131 as permitted sender) client-ip=209.132.180.131; Authentication-Results: mx.google.com; dkim=pass header.i=@gcc.gnu.org header.s=default header.b=GczFyBCd; spf=pass (google.com: domain of gcc-patches-return-468857-patch=linaro.org@gcc.gnu.org designates 209.132.180.131 as permitted sender) smtp.mailfrom=gcc-patches-return-468857-patch=linaro.org@gcc.gnu.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:subject:references:date:in-reply-to:message-id:mime-version :content-type; q=dns; s=default; b=UXiTrn0KK/3Z9S54lEIfVVwxO7Qfq PZx6r8fZqHxENyCrwDzmCYMBnSmh+UcJcyigvfcX1dg9NxSXR/eu1Nc0SWF5mAnl mWpBWkCauxuExSrpL1K6rhRvFly43iNLPrfts/f1WkkUMMEz59pE8kY2gI/CcKn8 lLwNwuGuxQz3Bk= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :to:subject:references:date:in-reply-to:message-id:mime-version :content-type; s=default; bh=Lcvy+FIHjhmhvy5IsGJbx92pYX0=; b=Gcz FyBCdqCfus1sNfa2hohKMK86PDi7kulT7mWTB9k/Nl+0Q1f8gxsTPVMPCgrggITx QNzmhHUMpPljYyPFRFuiY4BH4CiopgPG1k9nkF/QkCy6g2xJmRcaBRnh9eTd1zBq jKGBAZOWk0ETHpD+NiAyn1IJ5ckRSoyJjykFjpsU= Received: (qmail 100445 invoked by alias); 9 Dec 2017 23:20:16 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 100430 invoked by uid 89); 9 Dec 2017 23:20:13 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-15.5 required=5.0 tests=AWL, BAYES_00, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, KAM_ASCII_DIVIDERS, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_PASS autolearn=ham version=3.3.2 spammy=stepped X-HELO: mail-wm0-f51.google.com Received: from mail-wm0-f51.google.com (HELO mail-wm0-f51.google.com) (74.125.82.51) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Sat, 09 Dec 2017 23:20:07 +0000 Received: by mail-wm0-f51.google.com with SMTP id f9so8377438wmh.0 for ; Sat, 09 Dec 2017 15:20:06 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:mail-followup-to:subject:references:date :in-reply-to:message-id:user-agent:mime-version; bh=NZw+WHR4wan/YL+5UxEE0PY9fUcCdU/2Btk09/rt34o=; b=rY7MkaiXh/rnrwg2XTQDLugdU59OZ6dANoUGsC/cAQls3qiUsQDJaU78RvCifsLJWO rgsAMBLr4DDnpoBfJQwwfGepnJxz7QF8uGDrIoaFAjl5lo2mOlCXdmTsvxR3sT6PnzWD ebGHnopF2jTcFwniXtClPwhM6SxPZAEIBtOfaSYBfIeD6HLNb5x4mDmCJdMedNpSOTNE x/u5aUvghtYbUgC01/YyDVWScSNfIDVZ2GqscoXxrUgU1qZgN97bIDUCMEIAIf2GZ831 yq6ZlUOhUpKV6ebI2iB2rOEoWscLYGl7HNFvm2nMTD+yF1QKhwt9vd4lBMHQmD3HjvuI vbEg== X-Gm-Message-State: AKGB3mLfPwl0xLy5fr1xvtN3gwpVl17g5WozKGy/lFcGea2SERpG/xgB VQt16IktTgz4lBSDe1QisPymVVAMZXI= X-Received: by 10.28.225.197 with SMTP id y188mr6888879wmg.12.1512861603806; Sat, 09 Dec 2017 15:20:03 -0800 (PST) Received: from localhost ([2.25.234.120]) by smtp.gmail.com with ESMTPSA id h12sm10586237wre.52.2017.12.09.15.20.01 for (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Sat, 09 Dec 2017 15:20:03 -0800 (PST) From: Richard Sandiford To: gcc-patches@gcc.gnu.org Mail-Followup-To: gcc-patches@gcc.gnu.org, richard.sandiford@linaro.org Subject: [07/13] Make vec_perm_indices use new vector encoding References: <87indfmrgt.fsf@linaro.org> Date: Sat, 09 Dec 2017 23:20:01 +0000 In-Reply-To: <87indfmrgt.fsf@linaro.org> (Richard Sandiford's message of "Sat, 09 Dec 2017 23:06:26 +0000") Message-ID: <87lgiblc9q.fsf@linaro.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.3 (gnu/linux) MIME-Version: 1.0 This patch changes vec_perm_indices from a plain vec<> to a class that stores a canonicalised permutation, using the same encoding as for VECTOR_CSTs. This means that vec_perm_indices now carries information about the number of vectors being permuted (currently always 1 or 2) and the number of elements in each input vector. A new vec_perm_builder class is used to actually build up the vector, like tree_vector_builder does for trees. vec_perm_indices is the completed representation, a bit like VECTOR_CST is for trees. The patch just does a mechanical conversion of the code to vec_perm_builder: a later patch uses explicit encodings where possible. The point of all this is that it makes the representation suitable for variable-length vectors. It's no longer necessary for the underlying vec<>s to store every element explicitly. In int-vector-builder.h, "using the same encoding as tree and rtx constants" describes the endpoint -- adding the rtx encoding comes later. 2017-12-09 Richard Sandiford gcc/ * int-vector-builder.h: New file. * vec-perm-indices.h: Include int-vector-builder.h. (vec_perm_indices): Redefine as an int_vector_builder. (auto_vec_perm_indices): Delete. (vec_perm_builder): Redefine as a stand-alone class. (vec_perm_indices::vec_perm_indices): New function. (vec_perm_indices::clamp): Likewise. * vec-perm-indices.c: Include fold-const.h and tree-vector-builder.h. (vec_perm_indices::new_vector): New function. (vec_perm_indices::new_expanded_vector): Update for new vec_perm_indices class. (vec_perm_indices::rotate_inputs): New function. (vec_perm_indices::all_in_range_p): Operate directly on the encoded form, without computing elided elements. (tree_to_vec_perm_builder): Operate directly on the VECTOR_CST encoding. Update for new vec_perm_indices class. * optabs.c (expand_vec_perm_const): Create a vec_perm_indices for the given vec_perm_builder. (expand_vec_perm_var): Update vec_perm_builder constructor. (expand_mult_highpart): Use vec_perm_builder instead of auto_vec_perm_indices. * optabs-query.c (can_mult_highpart_p): Use vec_perm_builder and vec_perm_indices instead of auto_vec_perm_indices. Use a single or double series encoding as appropriate. * fold-const.c (fold_ternary_loc): Use vec_perm_builder and vec_perm_indices instead of auto_vec_perm_indices. * tree-ssa-forwprop.c (simplify_vector_constructor): Likewise. * tree-vect-data-refs.c (vect_grouped_store_supported): Likewise. (vect_permute_store_chain): Likewise. (vect_grouped_load_supported): Likewise. (vect_permute_load_chain): Likewise. (vect_shift_permute_load_chain): Likewise. * tree-vect-slp.c (vect_build_slp_tree_1): Likewise. (vect_transform_slp_perm_load): Likewise. (vect_schedule_slp_instance): Likewise. * tree-vect-stmts.c (perm_mask_for_reverse): Likewise. (vectorizable_mask_load_store): Likewise. (vectorizable_bswap): Likewise. (vectorizable_store): Likewise. (vectorizable_load): Likewise. * tree-vect-generic.c (lower_vec_perm): Use vec_perm_builder and vec_perm_indices instead of auto_vec_perm_indices. Use tree_to_vec_perm_builder to read the vector from a tree. * tree-vect-loop.c (calc_vec_perm_mask_for_shift): Take a vec_perm_builder instead of a vec_perm_indices. (have_whole_vector_shift): Use vec_perm_builder and vec_perm_indices instead of auto_vec_perm_indices. Leave the truncation to calc_vec_perm_mask_for_shift. (vect_create_epilog_for_reduction): Likewise. * config/aarch64/aarch64.c (expand_vec_perm_d::perm): Change from auto_vec_perm_indices to vec_perm_indices. (aarch64_expand_vec_perm_const_1): Use rotate_inputs on d.perm instead of changing individual elements. (aarch64_vectorize_vec_perm_const): Use new_vector to install the vector in d.perm. * config/arm/arm.c (expand_vec_perm_d::perm): Change from auto_vec_perm_indices to vec_perm_indices. (arm_expand_vec_perm_const_1): Use rotate_inputs on d.perm instead of changing individual elements. (arm_vectorize_vec_perm_const): Use new_vector to install the vector in d.perm. * config/powerpcspe/powerpcspe.c (rs6000_expand_extract_even): Update vec_perm_builder constructor. (rs6000_expand_interleave): Likewise. * config/rs6000/rs6000.c (rs6000_expand_extract_even): Likewise. (rs6000_expand_interleave): Likewise. Index: gcc/int-vector-builder.h =================================================================== --- /dev/null 2017-12-09 13:59:56.352713187 +0000 +++ gcc/int-vector-builder.h 2017-12-09 22:48:47.545825268 +0000 @@ -0,0 +1,90 @@ +/* A class for building vector integer constants. + Copyright (C) 2017 Free Software Foundation, Inc. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. + +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +. */ + +#ifndef GCC_INT_VECTOR_BUILDER_H +#define GCC_INT_VECTOR_BUILDER_H 1 + +#include "vector-builder.h" + +/* This class is used to build vectors of integer type T using the same + encoding as tree and rtx constants. See vector_builder for more + details. */ +template +class int_vector_builder : public vector_builder > +{ + typedef vector_builder parent; + friend class vector_builder; + +public: + int_vector_builder () {} + int_vector_builder (unsigned int, unsigned int, unsigned int); + + using parent::new_vector; + +private: + bool equal_p (T, T) const; + bool allow_steps_p () const { return true; } + bool integral_p (T) const { return true; } + T step (T, T) const; + T apply_step (T, unsigned int, T) const; + bool can_elide_p (T) const { return true; } + void note_representative (T *, T) {} +}; + +/* Create a new builder for a vector with FULL_NELTS elements. + Initially encode the value as NPATTERNS interleaved patterns with + NELTS_PER_PATTERN elements each. */ + +template +inline +int_vector_builder::int_vector_builder (unsigned int full_nelts, + unsigned int npatterns, + unsigned int nelts_per_pattern) +{ + new_vector (full_nelts, npatterns, nelts_per_pattern); +} + +/* Return true if elements ELT1 and ELT2 are equal. */ + +template +inline bool +int_vector_builder::equal_p (T elt1, T elt2) const +{ + return elt1 == elt2; +} + +/* Return the value of element ELT2 minus the value of element ELT1. */ + +template +inline T +int_vector_builder::step (T elt1, T elt2) const +{ + return elt2 - elt1; +} + +/* Return a vector element with the value BASE + FACTOR * STEP. */ + +template +inline T +int_vector_builder::apply_step (T base, unsigned int factor, T step) const +{ + return base + factor * step; +} + +#endif Index: gcc/vec-perm-indices.h =================================================================== --- gcc/vec-perm-indices.h 2017-12-09 22:47:27.885318101 +0000 +++ gcc/vec-perm-indices.h 2017-12-09 22:48:47.548825399 +0000 @@ -20,30 +20,102 @@ Software Foundation; either version 3, o #ifndef GCC_VEC_PERN_INDICES_H #define GCC_VEC_PERN_INDICES_H 1 +#include "int-vector-builder.h" + +/* A vector_builder for building constant permutation vectors. + The elements do not need to be clamped to a particular range + of input elements. */ +typedef int_vector_builder vec_perm_builder; + /* This class represents a constant permutation vector, such as that used - as the final operand to a VEC_PERM_EXPR. */ -class vec_perm_indices : public auto_vec + as the final operand to a VEC_PERM_EXPR. The vector is canonicalized + for a particular number of input vectors and for a particular number + of elements per input. The class copes with cases in which the + input and output vectors have different numbers of elements. */ +class vec_perm_indices { - typedef unsigned short element_type; - typedef auto_vec parent_type; + typedef HOST_WIDE_INT element_type; public: - vec_perm_indices () {} - vec_perm_indices (unsigned int nunits) : parent_type (nunits) {} + vec_perm_indices (); + vec_perm_indices (const vec_perm_builder &, unsigned int, unsigned int); + void new_vector (const vec_perm_builder &, unsigned int, unsigned int); void new_expanded_vector (const vec_perm_indices &, unsigned int); + void rotate_inputs (int delta); + + /* Return the underlying vector encoding. */ + const vec_perm_builder &encoding () const { return m_encoding; } + + /* Return the number of output elements. This is called length () + so that we present a more vec-like interface. */ + unsigned int length () const { return m_encoding.full_nelts (); } + + /* Return the number of input vectors being permuted. */ + unsigned int ninputs () const { return m_ninputs; } + /* Return the number of elements in each input vector. */ + unsigned int nelts_per_input () const { return m_nelts_per_input; } + + /* Return the total number of input elements. */ + unsigned int input_nelts () const { return m_ninputs * m_nelts_per_input; } + + element_type clamp (element_type) const; + element_type operator[] (unsigned int i) const; bool all_in_range_p (element_type, element_type) const; private: vec_perm_indices (const vec_perm_indices &); -}; -/* Temporary. */ -typedef vec_perm_indices vec_perm_builder; -typedef vec_perm_indices auto_vec_perm_indices; + vec_perm_builder m_encoding; + unsigned int m_ninputs; + unsigned int m_nelts_per_input; +}; bool tree_to_vec_perm_builder (vec_perm_builder *, tree); rtx vec_perm_indices_to_rtx (machine_mode, const vec_perm_indices &); +inline +vec_perm_indices::vec_perm_indices () + : m_ninputs (0), + m_nelts_per_input (0) +{ +} + +/* Construct a permutation vector that selects between NINPUTS vector + inputs that have NELTS_PER_INPUT elements each. Take the elements of + the new vector from ELEMENTS, clamping each one to be in range. */ + +inline +vec_perm_indices::vec_perm_indices (const vec_perm_builder &elements, + unsigned int ninputs, + unsigned int nelts_per_input) +{ + new_vector (elements, ninputs, nelts_per_input); +} + +/* Return the canonical value for permutation vector element ELT, + taking into account the current number of input elements. */ + +inline vec_perm_indices::element_type +vec_perm_indices::clamp (element_type elt) const +{ + element_type limit = input_nelts (); + elt %= limit; + /* Treat negative elements as counting from the end. This only matters + if the vector size is not a power of 2. */ + if (elt < 0) + elt += limit; + return elt; +} + +/* Return the value of vector element I, which might or might not be + explicitly encoded. */ + +inline vec_perm_indices::element_type +vec_perm_indices::operator[] (unsigned int i) const +{ + return clamp (m_encoding.elt (i)); +} + #endif Index: gcc/vec-perm-indices.c =================================================================== --- gcc/vec-perm-indices.c 2017-12-09 22:47:27.885318101 +0000 +++ gcc/vec-perm-indices.c 2017-12-09 22:48:47.548825399 +0000 @@ -22,11 +22,33 @@ Software Foundation; either version 3, o #include "coretypes.h" #include "vec-perm-indices.h" #include "tree.h" +#include "fold-const.h" +#include "tree-vector-builder.h" #include "backend.h" #include "rtl.h" #include "memmodel.h" #include "emit-rtl.h" +/* Switch to a new permutation vector that selects between NINPUTS vector + inputs that have NELTS_PER_INPUT elements each. Take the elements of the + new permutation vector from ELEMENTS, clamping each one to be in range. */ + +void +vec_perm_indices::new_vector (const vec_perm_builder &elements, + unsigned int ninputs, + unsigned int nelts_per_input) +{ + m_ninputs = ninputs; + m_nelts_per_input = nelts_per_input; + /* Expand the encoding and clamp each element. E.g. { 0, 2, 4, ... } + might wrap halfway if there is only one vector input. */ + unsigned int full_nelts = elements.full_nelts (); + m_encoding.new_vector (full_nelts, full_nelts, 1); + for (unsigned int i = 0; i < full_nelts; ++i) + m_encoding.quick_push (clamp (elements.elt (i))); + m_encoding.finalize (); +} + /* Switch to a new permutation vector that selects the same input elements as ORIG, but with each element split into FACTOR pieces. For example, if ORIG is { 1, 2, 0, 3 } and FACTOR is 2, the new permutation is @@ -36,14 +58,31 @@ Software Foundation; either version 3, o vec_perm_indices::new_expanded_vector (const vec_perm_indices &orig, unsigned int factor) { - truncate (0); - reserve (orig.length () * factor); - for (unsigned int i = 0; i < orig.length (); ++i) + m_ninputs = orig.m_ninputs; + m_nelts_per_input = orig.m_nelts_per_input * factor; + m_encoding.new_vector (orig.m_encoding.full_nelts () * factor, + orig.m_encoding.npatterns () * factor, + orig.m_encoding.nelts_per_pattern ()); + unsigned int encoded_nelts = orig.m_encoding.encoded_nelts (); + for (unsigned int i = 0; i < encoded_nelts; ++i) { - element_type base = orig[i] * factor; + element_type base = orig.m_encoding[i] * factor; for (unsigned int j = 0; j < factor; ++j) - quick_push (base + j); + m_encoding.quick_push (base + j); } + m_encoding.finalize (); +} + +/* Rotate the inputs of the permutation right by DELTA inputs. This changes + the values of the permutation vector but it doesn't change the way that + the elements are encoded. */ + +void +vec_perm_indices::rotate_inputs (int delta) +{ + element_type element_delta = delta * m_nelts_per_input; + for (unsigned int i = 0; i < m_encoding.length (); ++i) + m_encoding[i] = clamp (m_encoding[i] + element_delta); } /* Return true if all elements of the permutation vector are in the range @@ -52,9 +91,44 @@ vec_perm_indices::new_expanded_vector (c bool vec_perm_indices::all_in_range_p (element_type start, element_type size) const { - for (unsigned int i = 0; i < length (); ++i) - if ((*this)[i] < start || ((*this)[i] - start) >= size) + /* Check the first two elements of each pattern. */ + unsigned int npatterns = m_encoding.npatterns (); + unsigned int nelts_per_pattern = m_encoding.nelts_per_pattern (); + unsigned int base_nelts = npatterns * MIN (nelts_per_pattern, 2); + for (unsigned int i = 0; i < base_nelts; ++i) + if (m_encoding[i] < start || (m_encoding[i] - start) >= size) return false; + + /* For stepped encodings, check the full range of the series. */ + if (nelts_per_pattern == 3) + { + element_type limit = input_nelts (); + + /* The number of elements in each pattern beyond the first two + that we checked above. */ + unsigned int step_nelts = (m_encoding.full_nelts () / npatterns) - 2; + for (unsigned int i = 0; i < npatterns; ++i) + { + /* BASE1 has been checked but BASE2 hasn't. */ + element_type base1 = m_encoding[i + npatterns]; + element_type base2 = m_encoding[i + base_nelts]; + + /* The step to add to get from BASE1 to each subsequent value. */ + element_type step = clamp (base2 - base1); + + /* STEP has no inherent sign, so a value near LIMIT can + act as a negative step. The series is in range if it + is in range according to one of the two interpretations. + + Since we're dealing with clamped values, ELEMENT_TYPE is + wide enough for overflow not to be a problem. */ + element_type headroom_down = base1 - start; + element_type headroom_up = size - headroom_down - 1; + if (headroom_up < step * step_nelts + && headroom_down < (limit - step) * step_nelts) + return false; + } + } return true; } @@ -65,15 +139,16 @@ vec_perm_indices::all_in_range_p (elemen bool tree_to_vec_perm_builder (vec_perm_builder *builder, tree cst) { - unsigned int nelts = TYPE_VECTOR_SUBPARTS (TREE_TYPE (cst)); - for (unsigned int i = 0; i < nelts; ++i) - if (!tree_fits_shwi_p (vector_cst_elt (cst, i))) + unsigned int encoded_nelts = vector_cst_encoded_nelts (cst); + for (unsigned int i = 0; i < encoded_nelts; ++i) + if (!tree_fits_shwi_p (VECTOR_CST_ENCODED_ELT (cst, i))) return false; - builder->reserve (nelts); - for (unsigned int i = 0; i < nelts; ++i) - builder->quick_push (tree_to_shwi (vector_cst_elt (cst, i)) - & (2 * nelts - 1)); + builder->new_vector (TYPE_VECTOR_SUBPARTS (TREE_TYPE (cst)), + VECTOR_CST_NPATTERNS (cst), + VECTOR_CST_NELTS_PER_PATTERN (cst)); + for (unsigned int i = 0; i < encoded_nelts; ++i) + builder->quick_push (tree_to_shwi (VECTOR_CST_ENCODED_ELT (cst, i))); return true; } Index: gcc/optabs.c =================================================================== --- gcc/optabs.c 2017-12-09 22:47:27.881318099 +0000 +++ gcc/optabs.c 2017-12-09 22:48:47.546825312 +0000 @@ -5456,6 +5456,11 @@ expand_vec_perm_const (machine_mode mode rtx_insn *last = get_last_insn (); bool single_arg_p = rtx_equal_p (v0, v1); + /* Always specify two input vectors here and leave the target to handle + cases in which the inputs are equal. Not all backends can cope with + the single-input representation when testing for a double-input + target instruction. */ + vec_perm_indices indices (sel, 2, GET_MODE_NUNITS (mode)); /* See if this can be handled with a vec_shr. We only do this if the second vector is all zeroes. */ @@ -5468,7 +5473,7 @@ expand_vec_perm_const (machine_mode mode && (shift_code != CODE_FOR_nothing || shift_code_qi != CODE_FOR_nothing)) { - rtx shift_amt = shift_amt_for_vec_perm_mask (mode, sel); + rtx shift_amt = shift_amt_for_vec_perm_mask (mode, indices); if (shift_amt) { struct expand_operand ops[3]; @@ -5500,7 +5505,7 @@ expand_vec_perm_const (machine_mode mode else v1 = force_reg (mode, v1); - if (targetm.vectorize.vec_perm_const (mode, target, v0, v1, sel)) + if (targetm.vectorize.vec_perm_const (mode, target, v0, v1, indices)) return target; } @@ -5509,7 +5514,7 @@ expand_vec_perm_const (machine_mode mode rtx target_qi = NULL_RTX, v0_qi = NULL_RTX, v1_qi = NULL_RTX; if (qimode != VOIDmode) { - qimode_indices.new_expanded_vector (sel, GET_MODE_UNIT_SIZE (mode)); + qimode_indices.new_expanded_vector (indices, GET_MODE_UNIT_SIZE (mode)); target_qi = gen_reg_rtx (qimode); v0_qi = gen_lowpart (qimode, v0); v1_qi = gen_lowpart (qimode, v1); @@ -5536,7 +5541,7 @@ expand_vec_perm_const (machine_mode mode REQUIRED_SEL_MODE is OK. */ if (sel_mode != required_sel_mode) { - if (!selector_fits_mode_p (required_sel_mode, sel)) + if (!selector_fits_mode_p (required_sel_mode, indices)) { delete_insns_since (last); return NULL_RTX; @@ -5547,7 +5552,7 @@ expand_vec_perm_const (machine_mode mode insn_code icode = direct_optab_handler (vec_perm_optab, mode); if (icode != CODE_FOR_nothing) { - rtx sel_rtx = vec_perm_indices_to_rtx (sel_mode, sel); + rtx sel_rtx = vec_perm_indices_to_rtx (sel_mode, indices); rtx tmp = expand_vec_perm_1 (icode, target, v0, v1, sel_rtx); if (tmp) return tmp; @@ -5621,7 +5626,7 @@ expand_vec_perm_var (machine_mode mode, gcc_assert (sel != NULL); /* Broadcast the low byte each element into each of its bytes. */ - vec_perm_builder const_sel (w); + vec_perm_builder const_sel (w, w, 1); for (i = 0; i < w; ++i) { int this_e = i / u * u; @@ -5848,7 +5853,7 @@ expand_mult_highpart (machine_mode mode, expand_insn (optab_handler (tab2, mode), 3, eops); m2 = gen_lowpart (mode, eops[0].value); - auto_vec_perm_indices sel (nunits); + vec_perm_builder sel (nunits, nunits, 1); if (method == 2) { for (i = 0; i < nunits; ++i) Index: gcc/optabs-query.c =================================================================== --- gcc/optabs-query.c 2017-12-09 22:47:27.881318099 +0000 +++ gcc/optabs-query.c 2017-12-09 22:48:47.545825268 +0000 @@ -501,12 +501,13 @@ can_mult_highpart_p (machine_mode mode, op = uns_p ? vec_widen_umult_odd_optab : vec_widen_smult_odd_optab; if (optab_handler (op, mode) != CODE_FOR_nothing) { - auto_vec_perm_indices sel (nunits); + vec_perm_builder sel (nunits, nunits, 1); for (i = 0; i < nunits; ++i) sel.quick_push (!BYTES_BIG_ENDIAN + (i & ~1) + ((i & 1) ? nunits : 0)); - if (can_vec_perm_const_p (mode, sel)) + vec_perm_indices indices (sel, 2, nunits); + if (can_vec_perm_const_p (mode, indices)) return 2; } } @@ -517,10 +518,11 @@ can_mult_highpart_p (machine_mode mode, op = uns_p ? vec_widen_umult_lo_optab : vec_widen_smult_lo_optab; if (optab_handler (op, mode) != CODE_FOR_nothing) { - auto_vec_perm_indices sel (nunits); + vec_perm_builder sel (nunits, nunits, 1); for (i = 0; i < nunits; ++i) sel.quick_push (2 * i + (BYTES_BIG_ENDIAN ? 0 : 1)); - if (can_vec_perm_const_p (mode, sel)) + vec_perm_indices indices (sel, 2, nunits); + if (can_vec_perm_const_p (mode, indices)) return 3; } } Index: gcc/fold-const.c =================================================================== --- gcc/fold-const.c 2017-12-09 22:47:27.881318099 +0000 +++ gcc/fold-const.c 2017-12-09 22:48:47.545825268 +0000 @@ -11217,7 +11217,7 @@ fold_ternary_loc (location_t loc, enum t { unsigned int nelts = VECTOR_CST_NELTS (arg0), i; gcc_assert (nelts == TYPE_VECTOR_SUBPARTS (type)); - auto_vec_perm_indices sel (nelts); + vec_perm_builder sel (nelts, nelts, 1); for (i = 0; i < nelts; i++) { tree val = VECTOR_CST_ELT (arg0, i); @@ -11228,7 +11228,8 @@ fold_ternary_loc (location_t loc, enum t else /* Currently unreachable. */ return NULL_TREE; } - tree t = fold_vec_perm (type, arg1, arg2, sel); + tree t = fold_vec_perm (type, arg1, arg2, + vec_perm_indices (sel, 2, nelts)); if (t != NULL_TREE) return t; } @@ -11558,8 +11559,8 @@ fold_ternary_loc (location_t loc, enum t mask2 = 2 * nelts - 1; mask = single_arg ? (nelts - 1) : mask2; gcc_assert (nelts == TYPE_VECTOR_SUBPARTS (type)); - auto_vec_perm_indices sel (nelts); - auto_vec_perm_indices sel2 (nelts); + vec_perm_builder sel (nelts, nelts, 1); + vec_perm_builder sel2 (nelts, nelts, 1); for (i = 0; i < nelts; i++) { tree val = VECTOR_CST_ELT (arg2, i); @@ -11604,12 +11605,13 @@ fold_ternary_loc (location_t loc, enum t need_mask_canon = true; } + vec_perm_indices indices (sel, 2, nelts); if ((TREE_CODE (op0) == VECTOR_CST || TREE_CODE (op0) == CONSTRUCTOR) && (TREE_CODE (op1) == VECTOR_CST || TREE_CODE (op1) == CONSTRUCTOR)) { - tree t = fold_vec_perm (type, op0, op1, sel); + tree t = fold_vec_perm (type, op0, op1, indices); if (t != NULL_TREE) return t; } @@ -11621,11 +11623,14 @@ fold_ternary_loc (location_t loc, enum t argument permutation while still allowing an equivalent 2-argument version. */ if (need_mask_canon && arg2 == op2 - && !can_vec_perm_const_p (TYPE_MODE (type), sel, false) - && can_vec_perm_const_p (TYPE_MODE (type), sel2, false)) + && !can_vec_perm_const_p (TYPE_MODE (type), indices, false) + && can_vec_perm_const_p (TYPE_MODE (type), + vec_perm_indices (sel2, 2, nelts), + false)) { need_mask_canon = need_mask_canon2; - sel = sel2; + sel.truncate (0); + sel.splice (sel2); } if (need_mask_canon && arg2 == op2) Index: gcc/tree-ssa-forwprop.c =================================================================== --- gcc/tree-ssa-forwprop.c 2017-12-09 22:47:27.883318100 +0000 +++ gcc/tree-ssa-forwprop.c 2017-12-09 22:48:47.546825312 +0000 @@ -2019,7 +2019,7 @@ simplify_vector_constructor (gimple_stmt elem_type = TREE_TYPE (type); elem_size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type)); - auto_vec_perm_indices sel (nelts); + vec_perm_builder sel (nelts, nelts, 1); orig = NULL; conv_code = ERROR_MARK; maybe_ident = true; @@ -2109,7 +2109,8 @@ simplify_vector_constructor (gimple_stmt { tree mask_type; - if (!can_vec_perm_const_p (TYPE_MODE (type), sel)) + vec_perm_indices indices (sel, 1, nelts); + if (!can_vec_perm_const_p (TYPE_MODE (type), indices)) return false; mask_type = build_vector_type (build_nonstandard_integer_type (elem_size, 1), Index: gcc/tree-vect-data-refs.c =================================================================== --- gcc/tree-vect-data-refs.c 2017-12-09 22:47:27.883318100 +0000 +++ gcc/tree-vect-data-refs.c 2017-12-09 22:48:47.546825312 +0000 @@ -4566,7 +4566,7 @@ vect_grouped_store_supported (tree vecty if (VECTOR_MODE_P (mode)) { unsigned int i, nelt = GET_MODE_NUNITS (mode); - auto_vec_perm_indices sel (nelt); + vec_perm_builder sel (nelt, nelt, 1); sel.quick_grow (nelt); if (count == 3) @@ -4574,6 +4574,7 @@ vect_grouped_store_supported (tree vecty unsigned int j0 = 0, j1 = 0, j2 = 0; unsigned int i, j; + vec_perm_indices indices; for (j = 0; j < 3; j++) { int nelt0 = ((3 - j) * nelt) % 3; @@ -4588,7 +4589,8 @@ vect_grouped_store_supported (tree vecty if (3 * i + nelt2 < nelt) sel[3 * i + nelt2] = 0; } - if (!can_vec_perm_const_p (mode, sel)) + indices.new_vector (sel, 2, nelt); + if (!can_vec_perm_const_p (mode, indices)) { if (dump_enabled_p ()) dump_printf (MSG_MISSED_OPTIMIZATION, @@ -4605,7 +4607,8 @@ vect_grouped_store_supported (tree vecty if (3 * i + nelt2 < nelt) sel[3 * i + nelt2] = nelt + j2++; } - if (!can_vec_perm_const_p (mode, sel)) + indices.new_vector (sel, 2, nelt); + if (!can_vec_perm_const_p (mode, indices)) { if (dump_enabled_p ()) dump_printf (MSG_MISSED_OPTIMIZATION, @@ -4625,11 +4628,13 @@ vect_grouped_store_supported (tree vecty sel[i * 2] = i; sel[i * 2 + 1] = i + nelt; } - if (can_vec_perm_const_p (mode, sel)) + vec_perm_indices indices (sel, 2, nelt); + if (can_vec_perm_const_p (mode, indices)) { for (i = 0; i < nelt; i++) sel[i] += nelt / 2; - if (can_vec_perm_const_p (mode, sel)) + indices.new_vector (sel, 2, nelt); + if (can_vec_perm_const_p (mode, indices)) return true; } } @@ -4731,7 +4736,7 @@ vect_permute_store_chain (vec dr_c unsigned int i, n, log_length = exact_log2 (length); unsigned int j, nelt = TYPE_VECTOR_SUBPARTS (vectype); - auto_vec_perm_indices sel (nelt); + vec_perm_builder sel (nelt, nelt, 1); sel.quick_grow (nelt); result_chain->quick_grow (length); @@ -4742,6 +4747,7 @@ vect_permute_store_chain (vec dr_c { unsigned int j0 = 0, j1 = 0, j2 = 0; + vec_perm_indices indices; for (j = 0; j < 3; j++) { int nelt0 = ((3 - j) * nelt) % 3; @@ -4757,7 +4763,8 @@ vect_permute_store_chain (vec dr_c if (3 * i + nelt2 < nelt) sel[3 * i + nelt2] = 0; } - perm3_mask_low = vect_gen_perm_mask_checked (vectype, sel); + indices.new_vector (sel, 2, nelt); + perm3_mask_low = vect_gen_perm_mask_checked (vectype, indices); for (i = 0; i < nelt; i++) { @@ -4768,7 +4775,8 @@ vect_permute_store_chain (vec dr_c if (3 * i + nelt2 < nelt) sel[3 * i + nelt2] = nelt + j2++; } - perm3_mask_high = vect_gen_perm_mask_checked (vectype, sel); + indices.new_vector (sel, 2, nelt); + perm3_mask_high = vect_gen_perm_mask_checked (vectype, indices); vect1 = dr_chain[0]; vect2 = dr_chain[1]; @@ -4805,11 +4813,13 @@ vect_permute_store_chain (vec dr_c sel[i * 2] = i; sel[i * 2 + 1] = i + nelt; } - perm_mask_high = vect_gen_perm_mask_checked (vectype, sel); + vec_perm_indices indices (sel, 2, nelt); + perm_mask_high = vect_gen_perm_mask_checked (vectype, indices); for (i = 0; i < nelt; i++) sel[i] += nelt / 2; - perm_mask_low = vect_gen_perm_mask_checked (vectype, sel); + indices.new_vector (sel, 2, nelt); + perm_mask_low = vect_gen_perm_mask_checked (vectype, indices); for (i = 0, n = log_length; i < n; i++) { @@ -5154,11 +5164,12 @@ vect_grouped_load_supported (tree vectyp if (VECTOR_MODE_P (mode)) { unsigned int i, j, nelt = GET_MODE_NUNITS (mode); - auto_vec_perm_indices sel (nelt); + vec_perm_builder sel (nelt, nelt, 1); sel.quick_grow (nelt); if (count == 3) { + vec_perm_indices indices; unsigned int k; for (k = 0; k < 3; k++) { @@ -5167,7 +5178,8 @@ vect_grouped_load_supported (tree vectyp sel[i] = 3 * i + k; else sel[i] = 0; - if (!can_vec_perm_const_p (mode, sel)) + indices.new_vector (sel, 2, nelt); + if (!can_vec_perm_const_p (mode, indices)) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, @@ -5180,7 +5192,8 @@ vect_grouped_load_supported (tree vectyp sel[i] = i; else sel[i] = nelt + ((nelt + k) % 3) + 3 * (j++); - if (!can_vec_perm_const_p (mode, sel)) + indices.new_vector (sel, 2, nelt); + if (!can_vec_perm_const_p (mode, indices)) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, @@ -5195,13 +5208,16 @@ vect_grouped_load_supported (tree vectyp { /* If length is not equal to 3 then only power of 2 is supported. */ gcc_assert (pow2p_hwi (count)); + for (i = 0; i < nelt; i++) sel[i] = i * 2; - if (can_vec_perm_const_p (mode, sel)) + vec_perm_indices indices (sel, 2, nelt); + if (can_vec_perm_const_p (mode, indices)) { for (i = 0; i < nelt; i++) sel[i] = i * 2 + 1; - if (can_vec_perm_const_p (mode, sel)) + indices.new_vector (sel, 2, nelt); + if (can_vec_perm_const_p (mode, indices)) return true; } } @@ -5316,7 +5332,7 @@ vect_permute_load_chain (vec dr_ch unsigned int i, j, log_length = exact_log2 (length); unsigned nelt = TYPE_VECTOR_SUBPARTS (vectype); - auto_vec_perm_indices sel (nelt); + vec_perm_builder sel (nelt, nelt, 1); sel.quick_grow (nelt); result_chain->quick_grow (length); @@ -5327,6 +5343,7 @@ vect_permute_load_chain (vec dr_ch { unsigned int k; + vec_perm_indices indices; for (k = 0; k < 3; k++) { for (i = 0; i < nelt; i++) @@ -5334,15 +5351,16 @@ vect_permute_load_chain (vec dr_ch sel[i] = 3 * i + k; else sel[i] = 0; - perm3_mask_low = vect_gen_perm_mask_checked (vectype, sel); + indices.new_vector (sel, 2, nelt); + perm3_mask_low = vect_gen_perm_mask_checked (vectype, indices); for (i = 0, j = 0; i < nelt; i++) if (3 * i + k < 2 * nelt) sel[i] = i; else sel[i] = nelt + ((nelt + k) % 3) + 3 * (j++); - - perm3_mask_high = vect_gen_perm_mask_checked (vectype, sel); + indices.new_vector (sel, 2, nelt); + perm3_mask_high = vect_gen_perm_mask_checked (vectype, indices); first_vect = dr_chain[0]; second_vect = dr_chain[1]; @@ -5374,11 +5392,13 @@ vect_permute_load_chain (vec dr_ch for (i = 0; i < nelt; ++i) sel[i] = i * 2; - perm_mask_even = vect_gen_perm_mask_checked (vectype, sel); + vec_perm_indices indices (sel, 2, nelt); + perm_mask_even = vect_gen_perm_mask_checked (vectype, indices); for (i = 0; i < nelt; ++i) sel[i] = i * 2 + 1; - perm_mask_odd = vect_gen_perm_mask_checked (vectype, sel); + indices.new_vector (sel, 2, nelt); + perm_mask_odd = vect_gen_perm_mask_checked (vectype, indices); for (i = 0; i < log_length; i++) { @@ -5514,7 +5534,7 @@ vect_shift_permute_load_chain (vec stmt_vec_info stmt_info = vinfo_for_stmt (stmt); loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info); - auto_vec_perm_indices sel (nelt); + vec_perm_builder sel (nelt, nelt, 1); sel.quick_grow (nelt); result_chain->quick_grow (length); @@ -5528,7 +5548,8 @@ vect_shift_permute_load_chain (vec sel[i] = i * 2; for (i = 0; i < nelt / 2; ++i) sel[nelt / 2 + i] = i * 2 + 1; - if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel)) + vec_perm_indices indices (sel, 2, nelt); + if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices)) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, @@ -5536,13 +5557,14 @@ vect_shift_permute_load_chain (vec supported by target\n"); return false; } - perm2_mask1 = vect_gen_perm_mask_checked (vectype, sel); + perm2_mask1 = vect_gen_perm_mask_checked (vectype, indices); for (i = 0; i < nelt / 2; ++i) sel[i] = i * 2 + 1; for (i = 0; i < nelt / 2; ++i) sel[nelt / 2 + i] = i * 2; - if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel)) + indices.new_vector (sel, 2, nelt); + if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices)) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, @@ -5550,20 +5572,21 @@ vect_shift_permute_load_chain (vec supported by target\n"); return false; } - perm2_mask2 = vect_gen_perm_mask_checked (vectype, sel); + perm2_mask2 = vect_gen_perm_mask_checked (vectype, indices); /* Generating permutation constant to shift all elements. For vector length 8 it is {4 5 6 7 8 9 10 11}. */ for (i = 0; i < nelt; i++) sel[i] = nelt / 2 + i; - if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel)) + indices.new_vector (sel, 2, nelt); + if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices)) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, "shift permutation is not supported by target\n"); return false; } - shift1_mask = vect_gen_perm_mask_checked (vectype, sel); + shift1_mask = vect_gen_perm_mask_checked (vectype, indices); /* Generating permutation constant to select vector from 2. For vector length 8 it is {0 1 2 3 12 13 14 15}. */ @@ -5571,14 +5594,15 @@ vect_shift_permute_load_chain (vec sel[i] = i; for (i = nelt / 2; i < nelt; i++) sel[i] = nelt + i; - if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel)) + indices.new_vector (sel, 2, nelt); + if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices)) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, "select is not supported by target\n"); return false; } - select_mask = vect_gen_perm_mask_checked (vectype, sel); + select_mask = vect_gen_perm_mask_checked (vectype, indices); for (i = 0; i < log_length; i++) { @@ -5634,7 +5658,8 @@ vect_shift_permute_load_chain (vec sel[i] = 3 * k + (l % 3); k++; } - if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel)) + vec_perm_indices indices (sel, 2, nelt); + if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices)) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, @@ -5642,59 +5667,63 @@ vect_shift_permute_load_chain (vec supported by target\n"); return false; } - perm3_mask = vect_gen_perm_mask_checked (vectype, sel); + perm3_mask = vect_gen_perm_mask_checked (vectype, indices); /* Generating permutation constant to shift all elements. For vector length 8 it is {6 7 8 9 10 11 12 13}. */ for (i = 0; i < nelt; i++) sel[i] = 2 * (nelt / 3) + (nelt % 3) + i; - if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel)) + indices.new_vector (sel, 2, nelt); + if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices)) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, "shift permutation is not supported by target\n"); return false; } - shift1_mask = vect_gen_perm_mask_checked (vectype, sel); + shift1_mask = vect_gen_perm_mask_checked (vectype, indices); /* Generating permutation constant to shift all elements. For vector length 8 it is {5 6 7 8 9 10 11 12}. */ for (i = 0; i < nelt; i++) sel[i] = 2 * (nelt / 3) + 1 + i; - if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel)) + indices.new_vector (sel, 2, nelt); + if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices)) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, "shift permutation is not supported by target\n"); return false; } - shift2_mask = vect_gen_perm_mask_checked (vectype, sel); + shift2_mask = vect_gen_perm_mask_checked (vectype, indices); /* Generating permutation constant to shift all elements. For vector length 8 it is {3 4 5 6 7 8 9 10}. */ for (i = 0; i < nelt; i++) sel[i] = (nelt / 3) + (nelt % 3) / 2 + i; - if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel)) + indices.new_vector (sel, 2, nelt); + if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices)) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, "shift permutation is not supported by target\n"); return false; } - shift3_mask = vect_gen_perm_mask_checked (vectype, sel); + shift3_mask = vect_gen_perm_mask_checked (vectype, indices); /* Generating permutation constant to shift all elements. For vector length 8 it is {5 6 7 8 9 10 11 12}. */ for (i = 0; i < nelt; i++) sel[i] = 2 * (nelt / 3) + (nelt % 3) / 2 + i; - if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel)) + indices.new_vector (sel, 2, nelt); + if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices)) { if (dump_enabled_p ()) dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, "shift permutation is not supported by target\n"); return false; } - shift4_mask = vect_gen_perm_mask_checked (vectype, sel); + shift4_mask = vect_gen_perm_mask_checked (vectype, indices); for (k = 0; k < 3; k++) { Index: gcc/tree-vect-slp.c =================================================================== --- gcc/tree-vect-slp.c 2017-12-09 22:47:27.884318101 +0000 +++ gcc/tree-vect-slp.c 2017-12-09 22:48:47.547825355 +0000 @@ -894,7 +894,7 @@ vect_build_slp_tree_1 (vec_info *vinfo, && TREE_CODE_CLASS (alt_stmt_code) != tcc_reference) { unsigned int count = TYPE_VECTOR_SUBPARTS (vectype); - auto_vec_perm_indices sel (count); + vec_perm_builder sel (count, count, 1); for (i = 0; i < count; ++i) { unsigned int elt = i; @@ -902,7 +902,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, elt += count; sel.quick_push (elt); } - if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel)) + vec_perm_indices indices (sel, 2, count); + if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices)) { for (i = 0; i < group_size; ++i) if (gimple_assign_rhs_code (stmts[i]) == alt_stmt_code) @@ -3570,8 +3571,9 @@ vect_transform_slp_perm_load (slp_tree n (int_mode_for_mode (TYPE_MODE (TREE_TYPE (vectype))).require (), 1); mask_type = get_vectype_for_scalar_type (mask_element_type); nunits = TYPE_VECTOR_SUBPARTS (vectype); - auto_vec_perm_indices mask (nunits); + vec_perm_builder mask (nunits, nunits, 1); mask.quick_grow (nunits); + vec_perm_indices indices; /* Initialize the vect stmts of NODE to properly insert the generated stmts later. */ @@ -3644,10 +3646,10 @@ vect_transform_slp_perm_load (slp_tree n noop_p = false; mask[index++] = mask_element; - if (index == nunits) + if (index == nunits && !noop_p) { - if (! noop_p - && ! can_vec_perm_const_p (mode, mask)) + indices.new_vector (mask, 2, nunits); + if (!can_vec_perm_const_p (mode, indices)) { if (dump_enabled_p ()) { @@ -3655,16 +3657,19 @@ vect_transform_slp_perm_load (slp_tree n vect_location, "unsupported vect permute { "); for (i = 0; i < nunits; ++i) - dump_printf (MSG_MISSED_OPTIMIZATION, "%d ", mask[i]); + dump_printf (MSG_MISSED_OPTIMIZATION, + HOST_WIDE_INT_PRINT_DEC " ", mask[i]); dump_printf (MSG_MISSED_OPTIMIZATION, "}\n"); } gcc_assert (analyze_only); return false; } - if (! noop_p) - ++*n_perms; + ++*n_perms; + } + if (index == nunits) + { if (!analyze_only) { tree mask_vec = NULL_TREE; @@ -3797,7 +3802,7 @@ vect_schedule_slp_instance (slp_tree nod enum tree_code code0 = gimple_assign_rhs_code (stmt); enum tree_code ocode = ERROR_MARK; gimple *ostmt; - auto_vec_perm_indices mask (group_size); + vec_perm_builder mask (group_size, group_size, 1); FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, ostmt) if (gimple_assign_rhs_code (ostmt) != code0) { Index: gcc/tree-vect-stmts.c =================================================================== --- gcc/tree-vect-stmts.c 2017-12-09 22:47:27.885318101 +0000 +++ gcc/tree-vect-stmts.c 2017-12-09 22:48:47.548825399 +0000 @@ -1717,13 +1717,14 @@ perm_mask_for_reverse (tree vectype) nunits = TYPE_VECTOR_SUBPARTS (vectype); - auto_vec_perm_indices sel (nunits); + vec_perm_builder sel (nunits, nunits, 1); for (i = 0; i < nunits; ++i) sel.quick_push (nunits - 1 - i); - if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel)) + vec_perm_indices indices (sel, 1, nunits); + if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices)) return NULL_TREE; - return vect_gen_perm_mask_checked (vectype, sel); + return vect_gen_perm_mask_checked (vectype, indices); } /* A subroutine of get_load_store_type, with a subset of the same @@ -2185,27 +2186,32 @@ vectorizable_mask_load_store (gimple *st { modifier = WIDEN; - auto_vec_perm_indices sel (gather_off_nunits); + vec_perm_builder sel (gather_off_nunits, gather_off_nunits, 1); for (i = 0; i < gather_off_nunits; ++i) sel.quick_push (i | nunits); - perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype, sel); + vec_perm_indices indices (sel, 1, gather_off_nunits); + perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype, + indices); } else if (nunits == gather_off_nunits * 2) { modifier = NARROW; - auto_vec_perm_indices sel (nunits); + vec_perm_builder sel (nunits, nunits, 1); sel.quick_grow (nunits); for (i = 0; i < nunits; ++i) sel[i] = i < gather_off_nunits ? i : i + nunits - gather_off_nunits; + vec_perm_indices indices (sel, 2, nunits); + perm_mask = vect_gen_perm_mask_checked (vectype, indices); - perm_mask = vect_gen_perm_mask_checked (vectype, sel); ncopies *= 2; + for (i = 0; i < nunits; ++i) sel[i] = i | gather_off_nunits; - mask_perm_mask = vect_gen_perm_mask_checked (masktype, sel); + indices.new_vector (sel, 2, gather_off_nunits); + mask_perm_mask = vect_gen_perm_mask_checked (masktype, indices); } else gcc_unreachable (); @@ -2498,12 +2504,13 @@ vectorizable_bswap (gimple *stmt, gimple unsigned int num_bytes = TYPE_VECTOR_SUBPARTS (char_vectype); unsigned word_bytes = num_bytes / nunits; - auto_vec_perm_indices elts (num_bytes); + vec_perm_builder elts (num_bytes, num_bytes, 1); for (unsigned i = 0; i < nunits; ++i) for (unsigned j = 0; j < word_bytes; ++j) elts.quick_push ((i + 1) * word_bytes - j - 1); - if (!can_vec_perm_const_p (TYPE_MODE (char_vectype), elts)) + vec_perm_indices indices (elts, 1, num_bytes); + if (!can_vec_perm_const_p (TYPE_MODE (char_vectype), indices)) return false; if (! vec_stmt) @@ -5809,22 +5816,25 @@ vectorizable_store (gimple *stmt, gimple { modifier = WIDEN; - auto_vec_perm_indices sel (scatter_off_nunits); + vec_perm_builder sel (scatter_off_nunits, scatter_off_nunits, 1); for (i = 0; i < (unsigned int) scatter_off_nunits; ++i) sel.quick_push (i | nunits); - perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype, sel); + vec_perm_indices indices (sel, 1, scatter_off_nunits); + perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype, + indices); gcc_assert (perm_mask != NULL_TREE); } else if (nunits == (unsigned int) scatter_off_nunits * 2) { modifier = NARROW; - auto_vec_perm_indices sel (nunits); + vec_perm_builder sel (nunits, nunits, 1); for (i = 0; i < (unsigned int) nunits; ++i) sel.quick_push (i | scatter_off_nunits); - perm_mask = vect_gen_perm_mask_checked (vectype, sel); + vec_perm_indices indices (sel, 2, nunits); + perm_mask = vect_gen_perm_mask_checked (vectype, indices); gcc_assert (perm_mask != NULL_TREE); ncopies *= 2; } @@ -6845,22 +6855,25 @@ vectorizable_load (gimple *stmt, gimple_ { modifier = WIDEN; - auto_vec_perm_indices sel (gather_off_nunits); + vec_perm_builder sel (gather_off_nunits, gather_off_nunits, 1); for (i = 0; i < gather_off_nunits; ++i) sel.quick_push (i | nunits); - perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype, sel); + vec_perm_indices indices (sel, 1, gather_off_nunits); + perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype, + indices); } else if (nunits == gather_off_nunits * 2) { modifier = NARROW; - auto_vec_perm_indices sel (nunits); + vec_perm_builder sel (nunits, nunits, 1); for (i = 0; i < nunits; ++i) sel.quick_push (i < gather_off_nunits ? i : i + nunits - gather_off_nunits); - perm_mask = vect_gen_perm_mask_checked (vectype, sel); + vec_perm_indices indices (sel, 2, nunits); + perm_mask = vect_gen_perm_mask_checked (vectype, indices); ncopies *= 2; } else Index: gcc/tree-vect-generic.c =================================================================== --- gcc/tree-vect-generic.c 2017-12-09 22:47:27.883318100 +0000 +++ gcc/tree-vect-generic.c 2017-12-09 22:48:47.547825355 +0000 @@ -1299,15 +1299,13 @@ lower_vec_perm (gimple_stmt_iterator *gs mask = gimple_assign_rhs1 (def_stmt); } - if (TREE_CODE (mask) == VECTOR_CST) - { - auto_vec_perm_indices sel_int (elements); - - for (i = 0; i < elements; ++i) - sel_int.quick_push (TREE_INT_CST_LOW (VECTOR_CST_ELT (mask, i)) - & (2 * elements - 1)); + vec_perm_builder sel_int; - if (can_vec_perm_const_p (TYPE_MODE (vect_type), sel_int)) + if (TREE_CODE (mask) == VECTOR_CST + && tree_to_vec_perm_builder (&sel_int, mask)) + { + vec_perm_indices indices (sel_int, 2, elements); + if (can_vec_perm_const_p (TYPE_MODE (vect_type), indices)) { gimple_assign_set_rhs3 (stmt, mask); update_stmt (stmt); @@ -1319,14 +1317,14 @@ lower_vec_perm (gimple_stmt_iterator *gs != CODE_FOR_nothing && TREE_CODE (vec1) == VECTOR_CST && initializer_zerop (vec1) - && sel_int[0] - && sel_int[0] < elements) + && indices[0] + && indices[0] < elements) { for (i = 1; i < elements; ++i) { - unsigned int expected = i + sel_int[0]; + unsigned int expected = i + indices[0]; /* Indices into the second vector are all equivalent. */ - if (MIN (elements, (unsigned) sel_int[i]) + if (MIN (elements, (unsigned) indices[i]) != MIN (elements, expected)) break; } Index: gcc/tree-vect-loop.c =================================================================== --- gcc/tree-vect-loop.c 2017-12-09 22:47:27.884318101 +0000 +++ gcc/tree-vect-loop.c 2017-12-09 22:48:47.547825355 +0000 @@ -3714,12 +3714,11 @@ vect_estimate_min_profitable_iters (loop vector elements (not bits) for a vector with NELT elements. */ static void calc_vec_perm_mask_for_shift (unsigned int offset, unsigned int nelt, - vec_perm_indices *sel) + vec_perm_builder *sel) { - unsigned int i; - - for (i = 0; i < nelt; i++) - sel->quick_push ((i + offset) & (2 * nelt - 1)); + sel->new_vector (nelt, nelt, 1); + for (unsigned int i = 0; i < nelt; i++) + sel->quick_push (i + offset); } /* Checks whether the target supports whole-vector shifts for vectors of mode @@ -3732,13 +3731,13 @@ have_whole_vector_shift (machine_mode mo return true; unsigned int i, nelt = GET_MODE_NUNITS (mode); - auto_vec_perm_indices sel (nelt); - + vec_perm_builder sel; + vec_perm_indices indices; for (i = nelt/2; i >= 1; i/=2) { - sel.truncate (0); calc_vec_perm_mask_for_shift (i, nelt, &sel); - if (!can_vec_perm_const_p (mode, sel, false)) + indices.new_vector (sel, 2, nelt); + if (!can_vec_perm_const_p (mode, indices, false)) return false; } return true; @@ -5028,7 +5027,8 @@ vect_create_epilog_for_reduction (vec= 1; elt_offset /= 2) { - sel.truncate (0); calc_vec_perm_mask_for_shift (elt_offset, nelements, &sel); - tree mask = vect_gen_perm_mask_any (vectype, sel); + indices.new_vector (sel, 2, nelements); + tree mask = vect_gen_perm_mask_any (vectype, indices); epilog_stmt = gimple_build_assign (vec_dest, VEC_PERM_EXPR, new_temp, zero_vec, mask); new_name = make_ssa_name (vec_dest, epilog_stmt); Index: gcc/config/aarch64/aarch64.c =================================================================== --- gcc/config/aarch64/aarch64.c 2017-12-09 22:47:27.856318084 +0000 +++ gcc/config/aarch64/aarch64.c 2017-12-09 22:48:47.535824832 +0000 @@ -13208,7 +13208,7 @@ #define MAX_VECT_LEN 16 struct expand_vec_perm_d { rtx target, op0, op1; - auto_vec_perm_indices perm; + vec_perm_indices perm; machine_mode vmode; bool one_vector_p; bool testing_p; @@ -13598,10 +13598,7 @@ aarch64_expand_vec_perm_const_1 (struct unsigned int nelt = d->perm.length (); if (d->perm[0] >= nelt) { - gcc_assert (nelt == (nelt & -nelt)); - for (unsigned int i = 0; i < nelt; ++i) - d->perm[i] ^= nelt; /* Keep the same index, but in the other vector. */ - + d->perm.rotate_inputs (1); std::swap (d->op0, d->op1); } @@ -13641,12 +13638,10 @@ aarch64_vectorize_vec_perm_const (machin /* Calculate whether all elements are in one vector. */ unsigned int nelt = sel.length (); - d.perm.reserve (nelt); for (i = which = 0; i < nelt; ++i) { unsigned int ei = sel[i] & (2 * nelt - 1); which |= (ei < nelt ? 1 : 2); - d.perm.quick_push (ei); } switch (which) @@ -13665,8 +13660,6 @@ aarch64_vectorize_vec_perm_const (machin input vector. */ /* Fall Through. */ case 2: - for (i = 0; i < nelt; ++i) - d.perm[i] &= nelt - 1; d.op0 = op1; d.one_vector_p = true; break; @@ -13677,6 +13670,8 @@ aarch64_vectorize_vec_perm_const (machin break; } + d.perm.new_vector (sel.encoding (), d.one_vector_p ? 1 : 2, nelt); + if (!d.testing_p) return aarch64_expand_vec_perm_const_1 (&d); Index: gcc/config/arm/arm.c =================================================================== --- gcc/config/arm/arm.c 2017-12-09 22:47:27.858318085 +0000 +++ gcc/config/arm/arm.c 2017-12-09 22:48:47.538824963 +0000 @@ -28852,7 +28852,7 @@ #define MAX_VECT_LEN 16 struct expand_vec_perm_d { rtx target, op0, op1; - auto_vec_perm_indices perm; + vec_perm_indices perm; machine_mode vmode; bool one_vector_p; bool testing_p; @@ -29360,9 +29360,7 @@ arm_expand_vec_perm_const_1 (struct expa unsigned int nelt = d->perm.length (); if (d->perm[0] >= nelt) { - for (unsigned int i = 0; i < nelt; ++i) - d->perm[i] = (d->perm[i] + nelt) & (2 * nelt - 1); - + d->perm.rotate_inputs (1); std::swap (d->op0, d->op1); } @@ -29402,12 +29400,10 @@ arm_vectorize_vec_perm_const (machine_mo d.testing_p = !target; nelt = GET_MODE_NUNITS (d.vmode); - d.perm.reserve (nelt); for (i = which = 0; i < nelt; ++i) { int ei = sel[i] & (2 * nelt - 1); which |= (ei < nelt ? 1 : 2); - d.perm.quick_push (ei); } switch (which) @@ -29426,8 +29422,6 @@ arm_vectorize_vec_perm_const (machine_mo input vector. */ /* FALLTHRU */ case 2: - for (i = 0; i < nelt; ++i) - d.perm[i] &= nelt - 1; d.op0 = op1; d.one_vector_p = true; break; @@ -29438,6 +29432,8 @@ arm_vectorize_vec_perm_const (machine_mo break; } + d.perm.new_vector (sel.encoding (), d.one_vector_p ? 1 : 2, nelt); + if (d.testing_p) return arm_expand_vec_perm_const_1 (&d); Index: gcc/config/powerpcspe/powerpcspe.c =================================================================== --- gcc/config/powerpcspe/powerpcspe.c 2017-12-09 22:47:27.871318093 +0000 +++ gcc/config/powerpcspe/powerpcspe.c 2017-12-09 22:48:47.541825094 +0000 @@ -38780,7 +38780,7 @@ rs6000_expand_extract_even (rtx target, { machine_mode vmode = GET_MODE (target); unsigned i, nelt = GET_MODE_NUNITS (vmode); - vec_perm_builder perm (nelt); + vec_perm_builder perm (nelt, nelt, 1); for (i = 0; i < nelt; i++) perm.quick_push (i * 2); @@ -38795,7 +38795,7 @@ rs6000_expand_interleave (rtx target, rt { machine_mode vmode = GET_MODE (target); unsigned i, high, nelt = GET_MODE_NUNITS (vmode); - vec_perm_builder perm (nelt); + vec_perm_builder perm (nelt, nelt, 1); high = (highp ? 0 : nelt / 2); for (i = 0; i < nelt / 2; i++) Index: gcc/config/rs6000/rs6000.c =================================================================== --- gcc/config/rs6000/rs6000.c 2017-12-09 22:47:27.874318095 +0000 +++ gcc/config/rs6000/rs6000.c 2017-12-09 22:48:47.544825224 +0000 @@ -36017,7 +36017,7 @@ rs6000_expand_extract_even (rtx target, { machine_mode vmode = GET_MODE (target); unsigned i, nelt = GET_MODE_NUNITS (vmode); - vec_perm_builder perm (nelt); + vec_perm_builder perm (nelt, nelt, 1); for (i = 0; i < nelt; i++) perm.quick_push (i * 2); @@ -36032,7 +36032,7 @@ rs6000_expand_interleave (rtx target, rt { machine_mode vmode = GET_MODE (target); unsigned i, high, nelt = GET_MODE_NUNITS (vmode); - vec_perm_builder perm (nelt); + vec_perm_builder perm (nelt, nelt, 1); high = (highp ? 0 : nelt / 2); for (i = 0; i < nelt / 2; i++)