From patchwork Mon Oct 24 14:27:10 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kyrill Tkachov
X-Patchwork-Id: 78970
Message-ID: <580E1A3E.6090108@foss.arm.com>
Date: Mon, 24 Oct 2016 15:27:10 +0100
From: Kyrill Tkachov
To: GCC Patches
CC: Marcus Shawcroft, Richard Earnshaw, James Greenhalgh
Subject: [PATCH][AArch64] Expand DImode constant stores to two SImode stores when profitable

Hi all,

When storing a 64-bit immediate that has equal bottom and top halves we
currently synthesize the repeating 32-bit pattern twice and perform a
single X-store. With this patch we synthesize the 32-bit pattern once
into a W register and store that twice using an STP. This reduces
codesize bloat from synthesising the same constant multiple times at the
expense of converting a store to a store-pair.
Normally it will only trigger if we can save two or more instructions.
When optimising for size any saving is accepted, so it will only
transform:

    mov x1, 49370
    movk x1, 0xc0da, lsl 32
    str x1, [x0]

into:

    mov w1, 49370
    stp w1, w1, [x0]

when optimising with -Os, whereas it will always transform a 4-insn
synthesis sequence into a two-insn sequence + STP (see comments in the
patch).

This patch triggers already but will trigger more with the store merging
pass that I'm working on, since that will generate more of these
repeating 64-bit constants. This helps improve codegen on 456.hmmer,
where store merging can sometimes create very complex repeating
constants and the target-specific expand needs to break them down.

Bootstrapped and tested on aarch64-none-linux-gnu.

Ok for trunk?

Thanks,
Kyrill

2016-10-24  Kyrylo Tkachov

    * config/aarch64/aarch64.md (mov<mode>): Call
    aarch64_split_dimode_const_store on DImode constant stores.
    * config/aarch64/aarch64-protos.h (aarch64_split_dimode_const_store):
    New prototype.
    * config/aarch64/aarch64.c (aarch64_split_dimode_const_store): New
    function.

2016-10-24  Kyrylo Tkachov

    * gcc.target/aarch64/store_repeating_constant_1.c: New test.
    * gcc.target/aarch64/store_repeating_constant_2.c: Likewise.
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 07a8cd0455d64a861cb919083a9d369bf23724b7..4c551ef143d3b32e94bd58989c85ebd3352cdd9b 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -337,6 +337,7 @@ bool aarch64_simd_scalar_immediate_valid_for_move (rtx, machine_mode);
 bool aarch64_simd_shift_imm_p (rtx, machine_mode, bool);
 bool aarch64_simd_valid_immediate (rtx, machine_mode, bool,
                                    struct simd_immediate_info *);
+bool aarch64_split_dimode_const_store (rtx, rtx);
 bool aarch64_symbolic_address_p (rtx);
 bool aarch64_uimm12_shift (HOST_WIDE_INT);
 bool aarch64_use_return_insn_p (void);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 3861e7d9ed82ebaffe93471fec08a72c70cfd133..0980befa42be65a760ad06e43decec77bff1b762 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -13141,6 +13141,61 @@ aarch64_expand_movmem (rtx *operands)
   return true;
 }
 
+/* Split a DImode store of a CONST_INT SRC to MEM DST as two
+   SImode stores.  Handle the case when the constant has identical
+   bottom and top halves.  This is beneficial when the two stores can be
+   merged into an STP and we avoid synthesising potentially expensive
+   immediates twice.  Return true if such a split is possible.  */
+
+bool
+aarch64_split_dimode_const_store (rtx dst, rtx src)
+{
+  rtx lo = gen_lowpart (SImode, src);
+  rtx hi = gen_highpart_mode (SImode, DImode, src);
+
+  bool size_p = optimize_function_for_size_p (cfun);
+
+  if (!rtx_equal_p (lo, hi))
+    return false;
+
+  unsigned int orig_cost
+    = aarch64_internal_mov_immediate (NULL_RTX, src, false, DImode);
+  unsigned int lo_cost
+    = aarch64_internal_mov_immediate (NULL_RTX, lo, false, SImode);
+
+  /* We want to transform:
+     MOV     x1, 49370
+     MOVK    x1, 0x140, lsl 16
+     MOVK    x1, 0xc0da, lsl 32
+     MOVK    x1, 0x140, lsl 48
+     STR     x1, [x0]
+   into:
+     MOV     w1, 49370
+     MOVK    w1, 0x140, lsl 16
+     STP     w1, w1, [x0]
+   So we want to perform this only when we save two instructions
+   or more.  When optimizing for size, however, accept any code size
+   savings we can.  */
+  if (size_p && orig_cost <= lo_cost)
+    return false;
+
+  if (!size_p
+      && (orig_cost <= lo_cost + 1))
+    return false;
+
+  rtx mem_lo = adjust_address (dst, SImode, 0);
+  if (!aarch64_mem_pair_operand (mem_lo, SImode))
+    return false;
+
+  rtx tmp_reg = gen_reg_rtx (SImode);
+  aarch64_expand_mov_immediate (tmp_reg, lo);
+  rtx mem_hi = aarch64_move_pointer (mem_lo, GET_MODE_SIZE (SImode));
+
+  emit_insn (gen_store_pairsi (mem_lo, tmp_reg, mem_hi, tmp_reg));
+
+  return true;
+}
+
 /* Implement the TARGET_ASAN_SHADOW_OFFSET hook.  */
 
 static unsigned HOST_WIDE_INT
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 8861ac18f4e33ada3bc6dde0e4667fd040b1c213..ec423eb4b048609daaf2f81b0ae874f2e4350f69 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1010,6 +1010,11 @@ (define_expand "mov<mode>"
         (match_operand:GPI 1 "general_operand" ""))]
   ""
   "
+    if (MEM_P (operands[0]) && CONST_INT_P (operands[1])
+        && <MODE>mode == DImode
+        && aarch64_split_dimode_const_store (operands[0], operands[1]))
+      DONE;
+
     if (GET_CODE (operands[0]) == MEM && operands[1] != const0_rtx)
       operands[1] = force_reg (<MODE>mode, operands[1]);
 
diff --git a/gcc/testsuite/gcc.target/aarch64/store_repeating_constant_1.c b/gcc/testsuite/gcc.target/aarch64/store_repeating_constant_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..fc964272eb45dc067fac40a390cf67121b51c180
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/store_repeating_constant_1.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+void
+foo (unsigned long long *a)
+{
+  a[0] = 0x0140c0da0140c0daULL;
+}
+
+/* { dg-final { scan-assembler-times "movk\\tw.*" 1 } } */
+/* { dg-final { scan-assembler-times "stp\tw\[0-9\]+, w\[0-9\]+.*" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/store_repeating_constant_2.c b/gcc/testsuite/gcc.target/aarch64/store_repeating_constant_2.c
new file mode 100644
index 0000000000000000000000000000000000000000..c421277989adcf446ad8a7b3ab9060602c03a7ea
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/store_repeating_constant_2.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-Os" } */
+
+/* Check that for -Os we synthesize only the bottom half and then
+   store it twice with an STP rather than synthesizing it twice in each
+   half of an X-reg.  */
+
+void
+foo (unsigned long long *a)
+{
+  a[0] = 0xc0da0000c0daULL;
+}
+
+/* { dg-final { scan-assembler-times "mov\\tw.*" 1 } } */
+/* { dg-final { scan-assembler-times "stp\tw\[0-9\]+, w\[0-9\]+.*" 1 } } */