From patchwork Wed May 9 10:34:20 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Sandiford
X-Patchwork-Id: 135246
From: Richard Sandiford
To: gcc-patches@gcc.gnu.org
Mail-Followup-To: gcc-patches@gcc.gnu.org, richard.sandiford@linaro.org
Subject: Handle vector boolean types when calculating the SLP unroll factor
Date: Wed, 09 May 2018 11:34:20 +0100
Message-ID: <87efilqfrn.fsf@linaro.org>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.3 (gnu/linux)

The SLP unrolling factor is calculated by finding the smallest
scalar type for each SLP statement and taking the number of required
lanes from the vector versions of those scalar types.  E.g. for an
int32->int64 conversion, it's the vector of int32s rather than the
vector of int64s that determines the unroll factor.

We rely on tree-vect-patterns.c to replace boolean operations like:

   bool a, b, c;
   a = b & c;

with integer operations of whatever the best size is in context.
E.g. if b and c are fed by comparisons of ints, a, b and c will become
the appropriate size for an int comparison.  For most targets this means
that a, b and c will end up as int-sized themselves, but on targets like
SVE and AVX512 with packed vector booleans, they'll instead become a
small bitfield like :1, padded to a byte for memory purposes.
The SLP code would then take these scalar types and try to calculate
the vector type for them, causing the unroll factor to be much higher
than necessary.

This patch makes SLP use the cached vector boolean type if that's
appropriate.

Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-none-elf
and x86_64-linux-gnu.  OK to install?

Richard


2018-05-09  Richard Sandiford

gcc/
	* tree-vect-slp.c (get_vectype_for_smallest_scalar_type): New function.
	(vect_build_slp_tree_1): Use it when calculating the unroll factor.

gcc/testsuite/
	* gcc.target/aarch64/sve/vcond_10.c: New test.
	* gcc.target/aarch64/sve/vcond_10_run.c: Likewise.
	* gcc.target/aarch64/sve/vcond_11.c: Likewise.
	* gcc.target/aarch64/sve/vcond_11_run.c: Likewise.

Index: gcc/tree-vect-slp.c
===================================================================
--- gcc/tree-vect-slp.c	2018-05-08 09:42:03.526648115 +0100
+++ gcc/tree-vect-slp.c	2018-05-09 11:30:41.061096063 +0100
@@ -608,6 +608,41 @@ vect_record_max_nunits (vec_info *vinfo,
   return true;
 }
 
+/* Return the vector type associated with the smallest scalar type in STMT.  */
+
+static tree
+get_vectype_for_smallest_scalar_type (gimple *stmt)
+{
+  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
+  if (vectype != NULL_TREE
+      && VECTOR_BOOLEAN_TYPE_P (vectype))
+    {
+      /* The result of a vector boolean operation has the smallest scalar
+	 type unless the statement is extending an even narrower boolean.  */
+      if (!gimple_assign_cast_p (stmt))
+	return vectype;
+
+      tree src = gimple_assign_rhs1 (stmt);
+      gimple *def_stmt;
+      enum vect_def_type dt;
+      tree src_vectype = NULL_TREE;
+      if (vect_is_simple_use (src, stmt_info->vinfo, &def_stmt, &dt,
+			      &src_vectype)
+	  && src_vectype
+	  && VECTOR_BOOLEAN_TYPE_P (src_vectype))
+	{
+	  if (TYPE_PRECISION (TREE_TYPE (src_vectype))
+	      < TYPE_PRECISION (TREE_TYPE (vectype)))
+	    return src_vectype;
+	  return vectype;
+	}
+    }
+  HOST_WIDE_INT dummy;
+  tree scalar_type = vect_get_smallest_scalar_type (stmt, &dummy, &dummy);
+  return get_vectype_for_scalar_type (scalar_type);
+}
+
 /* Verify if the scalar stmts STMTS are isomorphic, require data
    permutation or are of unsupported types of operation.  Return
    true if they are, otherwise return false and indicate in *MATCHES
@@ -636,12 +671,11 @@ vect_build_slp_tree_1 (vec_info *vinfo,
   enum tree_code first_cond_code = ERROR_MARK;
   tree lhs;
   bool need_same_oprnds = false;
-  tree vectype = NULL_TREE, scalar_type, first_op1 = NULL_TREE;
+  tree vectype = NULL_TREE, first_op1 = NULL_TREE;
   optab optab;
   int icode;
   machine_mode optab_op2_mode;
   machine_mode vec_mode;
-  HOST_WIDE_INT dummy;
   gimple *first_load = NULL, *prev_first_load = NULL;
 
   /* For every stmt in NODE find its def stmt/s.  */
@@ -685,15 +719,14 @@ vect_build_slp_tree_1 (vec_info *vinfo,
 	  return false;
 	}
 
-      scalar_type = vect_get_smallest_scalar_type (stmt, &dummy, &dummy);
-      vectype = get_vectype_for_scalar_type (scalar_type);
+      vectype = get_vectype_for_smallest_scalar_type (stmt);
       if (!vect_record_max_nunits (vinfo, stmt, group_size, vectype,
 				   max_nunits))
 	{
 	  /* Fatal mismatch.  */
 	  matches[0] = false;
-          return false;
-        }
+	  return false;
+	}
 
       if (gcall *call_stmt = dyn_cast <gcall *> (stmt))
 	{
Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_10.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/vcond_10.c	2018-05-09 11:30:41.057096221 +0100
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */
+
+#include <stdint.h>
+
+#define DEF_LOOP(TYPE) \
+  void __attribute__ ((noinline, noclone)) \
+  test_##TYPE (TYPE *a, TYPE a1, TYPE a2, TYPE a3, TYPE a4, int n) \
+  { \
+    for (int i = 0; i < n; i += 2) \
+      { \
+	a[i] = a[i] >= 1 && a[i] != 3 ? a1 : a2; \
+	a[i + 1] = a[i + 1] >= 1 && a[i + 1] != 3 ? a3 : a4; \
+      } \
+  }
+
+#define FOR_EACH_TYPE(T) \
+  T (int8_t) \
+  T (uint8_t) \
+  T (int16_t) \
+  T (uint16_t) \
+  T (int32_t) \
+  T (uint32_t) \
+  T (int64_t) \
+  T (uint64_t) \
+  T (_Float16) \
+  T (float) \
+  T (double)
+
+FOR_EACH_TYPE (DEF_LOOP)
+
+/* { dg-final { scan-assembler-times {\tld1b\t} 2 } } */
+/* { dg-final { scan-assembler-times {\tld1h\t} 3 } } */
+/* { dg-final { scan-assembler-times {\tld1w\t} 3 } } */
+/* { dg-final { scan-assembler-times {\tld1d\t} 3 } } */
+/* { dg-final { scan-assembler-times {\tsel\tz[0-9]} 11 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_10_run.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/vcond_10_run.c	2018-05-09 11:30:41.057096221 +0100
@@ -0,0 +1,24 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */
+
+#include "vcond_10.c"
+
+#define N 133
+
+#define TEST_LOOP(TYPE) \
+  { \
+    TYPE a[N]; \
+    for (int i = 0; i < N; ++i) \
+      a[i] = i % 7; \
+    test_##TYPE (a, 10, 11, 12, 13, N); \
+    for (int i = 0; i < N; ++i) \
+      if (a[i] != 10 + (i & 1) * 2 + (i % 7 == 0 || i % 7 == 3)) \
+	__builtin_abort (); \
+  }
+
+int
+main (void)
+{
+  FOR_EACH_TYPE (TEST_LOOP);
+  return 0;
+}
Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_11.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/vcond_11.c	2018-05-09 11:30:41.057096221 +0100
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */
+
+#include <stdint.h>
+
+#define DEF_LOOP(TYPE) \
+  void __attribute__ ((noinline, noclone)) \
+  test_##TYPE (int *restrict a, TYPE *restrict b, int a1, int a2, \
+	       int a3, int a4, int n) \
+  { \
+    for (int i = 0; i < n; i += 2) \
+      { \
+	a[i] = a[i] >= 1 & b[i] != 3 ? a1 : a2; \
+	a[i + 1] = a[i + 1] >= 1 & b[i + 1] != 3 ? a3 : a4; \
+      } \
+  }
+
+#define FOR_EACH_TYPE(T) \
+  T (int8_t) \
+  T (uint8_t) \
+  T (int16_t) \
+  T (uint16_t) \
+  T (int64_t) \
+  T (uint64_t) \
+  T (double)
+
+FOR_EACH_TYPE (DEF_LOOP)
+
+/* { dg-final { scan-assembler-times {\tld1b\t} 2 } } */
+/* { dg-final { scan-assembler-times {\tld1h\t} 2 } } */
+/* 4 for each 8-bit function, 2 for each 16-bit function, 1 for
+   each 64-bit function.  */
+/* { dg-final { scan-assembler-times {\tld1w\t} 15 } } */
+/* 3 64-bit functions * 2 64-bit vectors per 32-bit vector.  */
+/* { dg-final { scan-assembler-times {\tld1d\t} 6 } } */
+/* { dg-final { scan-assembler-times {\tsel\tz[0-9]} 15 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/vcond_11_run.c
===================================================================
--- /dev/null	2018-04-20 16:19:46.369131350 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/vcond_11_run.c	2018-05-09 11:30:41.059096142 +0100
@@ -0,0 +1,28 @@
+/* { dg-do run { target aarch64_sve_hw } } */
+/* { dg-options "-O2 -ftree-vectorize -march=armv8-a+sve" } */
+
+#include "vcond_11.c"
+
+#define N 133
+
+#define TEST_LOOP(TYPE) \
+  { \
+    int a[N]; \
+    TYPE b[N]; \
+    for (int i = 0; i < N; ++i) \
+      { \
+	a[i] = i % 5; \
+	b[i] = i % 7; \
+      } \
+    test_##TYPE (a, b, 10, 11, 12, 13, N); \
+    for (int i = 0; i < N; ++i) \
+      if (a[i] != 10 + (i & 1) * 2 + (i % 5 == 0 || i % 7 == 3)) \
+	__builtin_abort (); \
+  }
+
+int
+main (void)
+{
+  FOR_EACH_TYPE (TEST_LOOP);
+  return 0;
+}