From patchwork Mon Dec 11 15:44:34 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Greenhalgh
X-Patchwork-Id: 121418
Delivered-To: patch@linaro.org
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org
From: James Greenhalgh
Subject: [patch AArch64] Do not perform a vector splat for vector initialisation if it is not useful
Date: Mon, 11 Dec 2017 15:44:34 +0000
Message-ID: <1513007074-18802-1-git-send-email-james.greenhalgh@arm.com>
In-Reply-To: <1513001933-17348-1-git-send-email-james.greenhalgh@arm.com>
References: <1513001933-17348-1-git-send-email-james.greenhalgh@arm.com>

Hi,

In the testcase in this patch we create an SLP vector with only two
elements. Our current vector initialisation code will first duplicate
the first element to both lanes, then overwrite the top lane with a new
value.

This duplication can be clunky and wasteful.

Better would be to simply use the fact that we will always be overwriting
the remaining bits, and simply move the first element to the correct place
(implicitly zeroing all other bits).

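To make the two initialisation strategies concrete, here is a minimal C sketch of them on a two-lane model. The `v2di` struct and the function names are hypothetical stand-ins for illustration only; none of this is GCC code, and the zeroed upper lane in the second function simply models the "implicitly zeroing" behaviour described above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical two-lane model of a 2x64-bit vector register.  */
typedef struct { uint64_t lane[2]; } v2di;

/* Current strategy: splat the first element to both lanes, then
   overwrite the top lane with the second element.  */
static v2di init_with_splat (uint64_t a, uint64_t b)
{
  v2di v = { { a, a } };  /* models the duplicate */
  v.lane[1] = b;          /* models the lane insert */
  return v;
}

/* Proposed strategy: move the first element into the low lane (the
   upper lane is modelled as zero), then overwrite the top lane.  */
static v2di init_with_low_move (uint64_t a, uint64_t b)
{
  v2di v = { { a, 0 } };  /* models the scalar move, upper bits zero */
  v.lane[1] = b;          /* models the lane insert */
  return v;
}

/* Returns 1 when both strategies produce the same vector.  */
static int strategies_agree (uint64_t a, uint64_t b)
{
  v2di s = init_with_splat (a, b);
  v2di m = init_with_low_move (a, b);
  return s.lane[0] == m.lane[0] && s.lane[1] == m.lane[1];
}
```

In this model the value written by the splat into the top lane is never observed, which is why skipping it should not change the result.
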
This reduces the code generation for this case, and can allow more
efficient addressing modes, and other second order benefits for AArch64
code which has been vectorized to V2DI mode.

Note that the change is generic enough to catch the case for any vector
mode, but is expected to be most useful for 2x64-bit vectorization.

Unfortunately, on its own, this would cause failures in
gcc.target/aarch64/load_v2vec_lanes_1.c and
gcc.target/aarch64/store_v2vec_lanes.c, which expect to see many more
vec_merge and vec_duplicate for their simplifications to apply. To fix this,
add a special case to the AArch64 code if we are loading from two memory
addresses, and use the load_pair_lanes patterns directly.

We also need a new pattern in simplify-rtx.c:simplify_ternary_operation, to
catch:

  (vec_merge:OUTER
     (vec_duplicate:OUTER x:INNER)
     (subreg:OUTER y:INNER 0)
     (const_int N))

And simplify it to:

  (vec_concat:OUTER x:INNER y:INNER) or (vec_concat y x)

This is similar to the existing patterns which are tested in this function,
without requiring the second operand to also be a vec_duplicate.

Bootstrapped and tested on aarch64-none-linux-gnu and tested on
aarch64-none-elf.

Note that this requires https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00614.html
if we don't want to ICE creating broken vector zero extends.

Are the non-AArch64 parts OK?

Thanks,
James

---
2017-12-11  James Greenhalgh

	* config/aarch64/aarch64.c (aarch64_expand_vector_init): Modify code
	generation for cases where splatting a value is not useful.
	* simplify-rtx.c (simplify_ternary_operation): Simplify vec_merge
	across a vec_duplicate and a paradoxical subreg forming a vector
	mode to a vec_concat.

2017-12-11  James Greenhalgh

	* gcc.target/aarch64/vect-slp-dup.c: New.

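The vec_merge to vec_concat rewrite described above can be illustrated with a small lane-level model in C. This is only a sketch: the `model_*` helpers are hypothetical names, not GCC functions, and the `junk` parameter is an assumption standing in for the undefined upper half of a paradoxical subreg. The selection rule follows the documented RTL meaning of vec_merge, where a set bit in the mask takes the lane from the first operand.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical two-lane model of a vector of two 64-bit elements.  */
typedef struct { uint64_t lane[2]; } v2di;

/* vec_merge: bit i of SEL set takes lane i from A, otherwise from B.  */
static v2di model_vec_merge (v2di a, v2di b, unsigned sel)
{
  v2di r;
  for (int i = 0; i < 2; i++)
    r.lane[i] = ((sel >> i) & 1) ? a.lane[i] : b.lane[i];
  return r;
}

/* vec_duplicate: every lane holds X.  */
static v2di model_vec_duplicate (uint64_t x)
{
  v2di r = { { x, x } };
  return r;
}

/* Paradoxical subreg at byte 0: lane 0 holds Y.  JUNK is an assumed
   stand-in for the undefined upper lane.  */
static v2di model_paradoxical_subreg (uint64_t y, uint64_t junk)
{
  v2di r = { { y, junk } };
  return r;
}

/* vec_concat: LO becomes lane 0, HI becomes lane 1.  */
static v2di model_vec_concat (uint64_t lo, uint64_t hi)
{
  v2di r = { { lo, hi } };
  return r;
}

/* Returns 1 when the N == 2 merge equals (vec_concat y x).  */
static int merge_n2_matches_concat (uint64_t x, uint64_t y, uint64_t junk)
{
  v2di merged = model_vec_merge (model_vec_duplicate (x),
                                 model_paradoxical_subreg (y, junk), 2);
  v2di concat = model_vec_concat (y, x);
  return merged.lane[0] == concat.lane[0]
         && merged.lane[1] == concat.lane[1];
}
```

For N == 2 the result does not depend on `junk`, since lane 1 is taken from the duplicate. For N == 1 the model's lane 1 comes from the stand-in for the undefined upper half, so the model neither confirms nor refutes that case.
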
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 83d8607..8abb8e4 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -12105,9 +12105,51 @@ aarch64_expand_vector_init (rtx target, rtx vals)
             maxv = matches[i][1];
           }
 
-      /* Create a duplicate of the most common element. */
-      rtx x = copy_to_mode_reg (inner_mode, XVECEXP (vals, 0, maxelement));
-      aarch64_emit_move (target, gen_vec_duplicate (mode, x));
+      /* Create a duplicate of the most common element, unless all elements
+         are equally useless to us, in which case just immediately set the
+         vector register using the first element. */
+
+      if (maxv == 1)
+        {
+          /* For vectors of two 64-bit elements, we can do even better. */
+          if (n_elts == 2
+              && (inner_mode == E_DImode
+                  || inner_mode == E_DFmode))
+
+            {
+              rtx x0 = XVECEXP (vals, 0, 0);
+              rtx x1 = XVECEXP (vals, 0, 1);
+              /* Combine can pick up this case, but handling it directly
+                 here leaves clearer RTL.
+
+                 This is load_pair_lanes, and also gives us a clean-up
+                 for store_pair_lanes. */
+              if (memory_operand (x0, inner_mode)
+                  && memory_operand (x1, inner_mode)
+                  && !STRICT_ALIGNMENT
+                  && rtx_equal_p (XEXP (x1, 0),
+                                  plus_constant (Pmode,
+                                                 XEXP (x0, 0),
+                                                 GET_MODE_SIZE (inner_mode))))
+                {
+                  rtx t;
+                  if (inner_mode == DFmode)
+                    t = gen_load_pair_lanesdf (target, x0, x1);
+                  else
+                    t = gen_load_pair_lanesdi (target, x0, x1);
+                  emit_insn (t);
+                  return;
+                }
+            }
+          rtx x = copy_to_mode_reg (inner_mode, XVECEXP (vals, 0, 0));
+          aarch64_emit_move (target, lowpart_subreg (mode, x, inner_mode));
+          maxelement = 0;
+        }
+      else
+        {
+          rtx x = copy_to_mode_reg (inner_mode, XVECEXP (vals, 0, maxelement));
+          aarch64_emit_move (target, gen_vec_duplicate (mode, x));
+        }
 
       /* Insert the rest. */
       for (int i = 0; i < n_elts; i++)
diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 806c309..ed16f70 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -5785,6 +5785,36 @@ simplify_ternary_operation (enum rtx_code code, machine_mode mode,
               return simplify_gen_binary (VEC_CONCAT, mode, newop0, newop1);
             }
 
+          /* Replace:
+
+              (vec_merge:outer (vec_duplicate:outer x:inner)
+                               (subreg:outer y:inner 0)
+                               (const_int N))
+
+             with (vec_concat:outer x:inner y:inner) if N == 1,
+             or (vec_concat:outer y:inner x:inner) if N == 2.
+
+             Implicitly, this means we have a paradoxical subreg, but such
+             a check is cheap, so make it anyway.
+
+             Only applies for vectors of two elements. */
+          if (GET_CODE (op0) == VEC_DUPLICATE
+              && GET_CODE (op1) == SUBREG
+              && GET_MODE (op1) == GET_MODE (op0)
+              && GET_MODE (SUBREG_REG (op1)) == GET_MODE (XEXP (op0, 0))
+              && paradoxical_subreg_p (op1)
+              && SUBREG_BYTE (op1) == 0
+              && GET_MODE_NUNITS (GET_MODE (op0)) == 2
+              && GET_MODE_NUNITS (GET_MODE (op1)) == 2
+              && IN_RANGE (sel, 1, 2))
+            {
+              rtx newop0 = XEXP (op0, 0);
+              rtx newop1 = SUBREG_REG (op1);
+              if (sel == 2)
+                std::swap (newop0, newop1);
+              return simplify_gen_binary (VEC_CONCAT, mode, newop0, newop1);
+            }
+
           /* Replace (vec_merge (vec_duplicate x) (vec_duplicate y)
                                  (const_int n))
              with (vec_concat x y) or (vec_concat y x) depending on value
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-slp-dup.c b/gcc/testsuite/gcc.target/aarch64/vect-slp-dup.c
new file mode 100644
index 0000000..0541e48
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-slp-dup.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+
+/* { dg-options "-O3 -ftree-vectorize -fno-vect-cost-model" } */
+
+void bar (double);
+
+void
+foo (double *restrict in, double *restrict in2,
+     double *restrict out1, double *restrict out2)
+{
+  for (int i = 0; i < 1024; i++)
+    {
+      out1[i] = in[i] + 2.0 * in[i+128];
+      out1[i+1] = in[i+1] + 2.0 * in2[i];
+      bar (in[i]);
+    }
+}
+
+/* { dg-final { scan-assembler-not "dup\tv\[0-9\]+.2d, v\[0-9\]+" } } */
+