From patchwork Wed Nov 30 14:07:58 2016
X-Patchwork-Submitter: Kyrill Tkachov
X-Patchwork-Id: 85873
Message-ID: <583EDD3E.2050108@foss.arm.com>
Date: Wed, 30 Nov 2016 14:07:58 +0000
From: Kyrill Tkachov
To: Segher Boessenkool, James Greenhalgh
CC: Andrew Pinski, GCC Patches, Marcus Shawcroft, Richard Earnshaw
Subject: Re: [PATCH][AArch64] Separate shrink wrapping hooks implementation
References: <5824836B.5030302@foss.arm.com> <20161110233943.GC17570@gate.crashing.org> <58259AD6.4040203@foss.arm.com> <5825E454.6070302@foss.arm.com> <5829C958.2010601@foss.arm.com> <20161129105732.GA35224@arm.com> <20161129202912.GA12889@gate.crashing.org>
In-Reply-To: <20161129202912.GA12889@gate.crashing.org>

On 29/11/16 20:29, Segher Boessenkool wrote:
> Hi James, Kyrill,
>
> On Tue, Nov 29, 2016 at 10:57:33AM +0000, James Greenhalgh
> wrote:
>>> +static sbitmap
>>> +aarch64_components_for_bb (basic_block bb)
>>> +{
>>> +  bitmap in = DF_LIVE_IN (bb);
>>> +  bitmap gen = &DF_LIVE_BB_INFO (bb)->gen;
>>> +  bitmap kill = &DF_LIVE_BB_INFO (bb)->kill;
>>> +
>>> +  sbitmap components = sbitmap_alloc (V31_REGNUM + 1);
>>> +  bitmap_clear (components);
>>> +
>>> +  /* GPRs are used in a bb if they are in the IN, GEN, or KILL sets.  */
>>> +  for (unsigned regno = R0_REGNUM; regno <= V31_REGNUM; regno++)
>> The use of R0_REGNUM and V31_REGNUM scare me a little bit, as we're hardcoding
>> where the end of the register file is (does this, for example, fall apart
>> with the SVE work that was recently posted). Something like a
>> LAST_HARDREG_NUM might work?
> Components and registers aren't the same thing (you can have components
> for things that aren't just a register save, e.g. the frame setup, stack
> alignment, save of some non-GPR via a GPR, PIC register setup, etc.)
> The loop here should really only cover the non-volatile registers, and
> there should be some translation from register number to component number
> (it of course is convenient to have a 1-1 translation for the GPRs and
> floating point registers).  For rs6000 many things in the backend already
> use non-symbolic numbers for the FPRs and GPRs, so that is easier there.

Anyway, here's the patch with James's comments implemented.
I've introduced LAST_SAVED_REGNUM which is used to delimit the registers
considered for shrink-wrapping.
aarch64_process_components is introduced and used to implement
the emit_prologue_components and emit_epilogue_components functions
in a single place.

Bootstrapped and tested on aarch64-none-linux-gnu.

Thanks,
Kyrill

2016-11-30  Kyrylo Tkachov

    * config/aarch64/aarch64.h (machine_function): Add
    reg_is_wrapped_separately field.
    * config/aarch64/aarch64.md (LAST_SAVED_REGNUM): Define new constant.
    * config/aarch64/aarch64.c (emit_set_insn): Change return type to
    rtx_insn *.
    (aarch64_save_callee_saves): Don't save registers that are wrapped
    separately.
    (aarch64_restore_callee_saves): Don't restore registers that are wrapped
    separately.
    (offset_9bit_signed_unscaled_p, offset_12bit_unsigned_scaled_p,
    aarch64_offset_7bit_signed_scaled_p): Move earlier in the file.
    (aarch64_get_separate_components): New function.
    (aarch64_get_next_set_bit): Likewise.
    (aarch64_components_for_bb): Likewise.
    (aarch64_disqualify_components): Likewise.
    (aarch64_emit_prologue_components): Likewise.
    (aarch64_emit_epilogue_components): Likewise.
    (aarch64_set_handled_components): Likewise.
    (aarch64_process_components): Likewise.
    (TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS,
    TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB,
    TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS,
    TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS,
    TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS,
    TARGET_SHRINK_WRAP_SET_HANDLED_COMPONENTS): Define.

>>> +static void
>>> +aarch64_disqualify_components (sbitmap, edge, sbitmap, bool)
>>> +{
>>> +}
>> Is there no default "do nothing" hook for this?
> I can make the shrink-wrap code do nothing here if this hook isn't
> defined, if you want?

I don't mind either way.
If you do it I'll then remove the empty implementation in aarch64.

>
> Segher

commit 194816281ec6da2620bb34c9278ed7edf8bcf0da
Author: Kyrylo Tkachov
Date:   Tue Oct 11 09:25:54 2016 +0100

    [AArch64] Separate shrink wrapping hooks implementation

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 82bfe14..48e6e2c 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1138,7 +1138,7 @@ aarch64_is_extend_from_extract (machine_mode mode, rtx mult_imm,
 
 /* Emit an insn that's a simple single-set.  Both the operands must be
    known to be valid.  */
-inline static rtx
+inline static rtx_insn *
 emit_set_insn (rtx x, rtx y)
 {
   return emit_insn (gen_rtx_SET (x, y));
@@ -3135,6 +3135,9 @@ aarch64_save_callee_saves (machine_mode mode, HOST_WIDE_INT start_offset,
               || regno == cfun->machine->frame.wb_candidate2))
         continue;
 
+      if (cfun->machine->reg_is_wrapped_separately[regno])
+        continue;
+
       reg = gen_rtx_REG (mode, regno);
       offset = start_offset + cfun->machine->frame.reg_offset[regno];
       mem = gen_mem_ref (mode, plus_constant (Pmode, stack_pointer_rtx,
@@ -3143,6 +3146,7 @@ aarch64_save_callee_saves (machine_mode mode, HOST_WIDE_INT start_offset,
       regno2 = aarch64_next_callee_save (regno + 1, limit);
 
       if (regno2 <= limit
+          && !cfun->machine->reg_is_wrapped_separately[regno2]
           && ((cfun->machine->frame.reg_offset[regno] + UNITS_PER_WORD)
               == cfun->machine->frame.reg_offset[regno2]))
 
@@ -3191,6 +3195,9 @@ aarch64_restore_callee_saves (machine_mode mode,
        regno <= limit;
        regno = aarch64_next_callee_save (regno + 1, limit))
     {
+      if (cfun->machine->reg_is_wrapped_separately[regno])
+        continue;
+
       rtx reg, mem;
 
       if (skip_wb
@@ -3205,6 +3212,7 @@ aarch64_restore_callee_saves (machine_mode mode,
       regno2 = aarch64_next_callee_save (regno + 1, limit);
 
       if (regno2 <= limit
+          && !cfun->machine->reg_is_wrapped_separately[regno2]
           && ((cfun->machine->frame.reg_offset[regno] + UNITS_PER_WORD)
               == cfun->machine->frame.reg_offset[regno2]))
         {
@@ -3224,6 +3232,245 @@ aarch64_restore_callee_saves (machine_mode mode,
     }
 }
 
+static inline bool
+offset_9bit_signed_unscaled_p (machine_mode mode ATTRIBUTE_UNUSED,
+                               HOST_WIDE_INT offset)
+{
+  return offset >= -256 && offset < 256;
+}
+
+static inline bool
+offset_12bit_unsigned_scaled_p (machine_mode mode, HOST_WIDE_INT offset)
+{
+  return (offset >= 0
+          && offset < 4096 * GET_MODE_SIZE (mode)
+          && offset % GET_MODE_SIZE (mode) == 0);
+}
+
+bool
+aarch64_offset_7bit_signed_scaled_p (machine_mode mode, HOST_WIDE_INT offset)
+{
+  return (offset >= -64 * GET_MODE_SIZE (mode)
+          && offset < 64 * GET_MODE_SIZE (mode)
+          && offset % GET_MODE_SIZE (mode) == 0);
+}
+
+/* Implement TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS.  */
+
+static sbitmap
+aarch64_get_separate_components (void)
+{
+  aarch64_layout_frame ();
+
+  sbitmap components = sbitmap_alloc (LAST_SAVED_REGNUM + 1);
+  bitmap_clear (components);
+
+  /* The registers we need saved to the frame.  */
+  for (unsigned regno = 0; regno <= LAST_SAVED_REGNUM; regno++)
+    if (aarch64_register_saved_on_entry (regno))
+      {
+        HOST_WIDE_INT offset = cfun->machine->frame.reg_offset[regno];
+        if (!frame_pointer_needed)
+          offset += cfun->machine->frame.frame_size
+                    - cfun->machine->frame.hard_fp_offset;
+        /* Check that we can access the stack slot of the register with one
+           direct load with no adjustments needed.  */
+        if (offset_12bit_unsigned_scaled_p (DImode, offset))
+          bitmap_set_bit (components, regno);
+      }
+
+  /* Don't mess with the hard frame pointer.  */
+  if (frame_pointer_needed)
+    bitmap_clear_bit (components, HARD_FRAME_POINTER_REGNUM);
+
+  unsigned reg1 = cfun->machine->frame.wb_candidate1;
+  unsigned reg2 = cfun->machine->frame.wb_candidate2;
+  /* If aarch64_layout_frame has chosen registers to store/restore with
+     writeback don't interfere with them to avoid having to output explicit
+     stack adjustment instructions.  */
+  if (reg2 != INVALID_REGNUM)
+    bitmap_clear_bit (components, reg2);
+  if (reg1 != INVALID_REGNUM)
+    bitmap_clear_bit (components, reg1);
+
+  bitmap_clear_bit (components, LR_REGNUM);
+  bitmap_clear_bit (components, SP_REGNUM);
+
+  return components;
+}
+
+/* Implement TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB.  */
+
+static sbitmap
+aarch64_components_for_bb (basic_block bb)
+{
+  bitmap in = DF_LIVE_IN (bb);
+  bitmap gen = &DF_LIVE_BB_INFO (bb)->gen;
+  bitmap kill = &DF_LIVE_BB_INFO (bb)->kill;
+
+  sbitmap components = sbitmap_alloc (LAST_SAVED_REGNUM + 1);
+  bitmap_clear (components);
+
+  /* GPRs are used in a bb if they are in the IN, GEN, or KILL sets.  */
+  for (unsigned regno = 0; regno <= LAST_SAVED_REGNUM; regno++)
+    if ((!call_used_regs[regno])
+        && (bitmap_bit_p (in, regno)
+            || bitmap_bit_p (gen, regno)
+            || bitmap_bit_p (kill, regno)))
+      bitmap_set_bit (components, regno);
+
+  return components;
+}
+
+/* Implement TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS.
+   Nothing to do for aarch64.  */
+
+static void
+aarch64_disqualify_components (sbitmap, edge, sbitmap, bool)
+{
+}
+
+/* Return the next set bit in BMP from START onwards.  Return the total number
+   of bits in BMP if no set bit is found at or after START.  */
+
+static unsigned int
+aarch64_get_next_set_bit (sbitmap bmp, unsigned int start)
+{
+  unsigned int nbits = SBITMAP_SIZE (bmp);
+  if (start == nbits)
+    return start;
+
+  gcc_assert (start < nbits);
+  for (unsigned int i = start; i < nbits; i++)
+    if (bitmap_bit_p (bmp, i))
+      return i;
+
+  return nbits;
+}
+
+/* Do the work for aarch64_emit_prologue_components and
+   aarch64_emit_epilogue_components.  COMPONENTS is the bitmap of registers
+   to save/restore, PROLOGUE_P indicates whether to emit the prologue sequence
+   for these components or the epilogue sequence.  That is, it determines
+   whether we should emit stores or loads and what kind of CFA notes to attach
+   to the insns.  Otherwise the logic for the two sequences is very
+   similar.  */
+
+static void
+aarch64_process_components (sbitmap components, bool prologue_p)
+{
+  rtx ptr_reg = gen_rtx_REG (Pmode, frame_pointer_needed
+                             ? HARD_FRAME_POINTER_REGNUM
+                             : STACK_POINTER_REGNUM);
+
+  unsigned last_regno = SBITMAP_SIZE (components);
+  unsigned regno = aarch64_get_next_set_bit (components, R0_REGNUM);
+  rtx_insn *insn = NULL;
+
+  while (regno != last_regno)
+    {
+      /* AAPCS64 section 5.1.2 requires only the bottom 64 bits to be saved
+         so DFmode for the vector registers is enough.  */
+      machine_mode mode = GP_REGNUM_P (regno) ? DImode : DFmode;
+      rtx reg = gen_rtx_REG (mode, regno);
+      HOST_WIDE_INT offset = cfun->machine->frame.reg_offset[regno];
+      if (!frame_pointer_needed)
+        offset += cfun->machine->frame.frame_size
+                  - cfun->machine->frame.hard_fp_offset;
+      rtx addr = plus_constant (Pmode, ptr_reg, offset);
+      rtx mem = gen_frame_mem (mode, addr);
+
+      rtx set = prologue_p ? gen_rtx_SET (mem, reg) : gen_rtx_SET (reg, mem);
+      unsigned regno2 = aarch64_get_next_set_bit (components, regno + 1);
+      /* No more registers to handle after REGNO.
+         Emit a single save/restore and exit.  */
+      if (regno2 == last_regno)
+        {
+          insn = emit_insn (set);
+          RTX_FRAME_RELATED_P (insn) = 1;
+          if (prologue_p)
+            add_reg_note (insn, REG_CFA_OFFSET, copy_rtx (set));
+          else
+            add_reg_note (insn, REG_CFA_RESTORE, reg);
+          break;
+        }
+
+      HOST_WIDE_INT offset2 = cfun->machine->frame.reg_offset[regno2];
+      /* The next register is not of the same class or its offset is not
+         mergeable with the current one into a pair.  */
+      if (!satisfies_constraint_Ump (mem)
+          || GP_REGNUM_P (regno) != GP_REGNUM_P (regno2)
+          || (offset2 - cfun->machine->frame.reg_offset[regno])
+             != GET_MODE_SIZE (mode))
+        {
+          insn = emit_insn (set);
+          RTX_FRAME_RELATED_P (insn) = 1;
+          if (prologue_p)
+            add_reg_note (insn, REG_CFA_OFFSET, copy_rtx (set));
+          else
+            add_reg_note (insn, REG_CFA_RESTORE, reg);
+
+          regno = regno2;
+          continue;
+        }
+
+      /* REGNO2 can be saved/restored in a pair with REGNO.  */
+      rtx reg2 = gen_rtx_REG (mode, regno2);
+      if (!frame_pointer_needed)
+        offset2 += cfun->machine->frame.frame_size
+                   - cfun->machine->frame.hard_fp_offset;
+      rtx addr2 = plus_constant (Pmode, ptr_reg, offset2);
+      rtx mem2 = gen_frame_mem (mode, addr2);
+      rtx set2 = prologue_p ? gen_rtx_SET (mem2, reg2)
+                            : gen_rtx_SET (reg2, mem2);
+
+      if (prologue_p)
+        insn = emit_insn (aarch64_gen_store_pair (mode, mem, reg, mem2, reg2));
+      else
+        insn = emit_insn (aarch64_gen_load_pair (mode, reg, mem, reg2, mem2));
+
+      RTX_FRAME_RELATED_P (insn) = 1;
+      if (prologue_p)
+        {
+          add_reg_note (insn, REG_CFA_OFFSET, set);
+          add_reg_note (insn, REG_CFA_OFFSET, set2);
+        }
+      else
+        {
+          add_reg_note (insn, REG_CFA_RESTORE, reg);
+          add_reg_note (insn, REG_CFA_RESTORE, reg2);
+        }
+
+      regno = aarch64_get_next_set_bit (components, regno2 + 1);
+    }
+}
+
+/* Implement TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS.  */
+
+static void
+aarch64_emit_prologue_components (sbitmap components)
+{
+  aarch64_process_components (components, true);
+}
+
+/* Implement TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS.  */
+
+static void
+aarch64_emit_epilogue_components (sbitmap components)
+{
+  aarch64_process_components (components, false);
+}
+
+/* Implement TARGET_SHRINK_WRAP_SET_HANDLED_COMPONENTS.  */
+
+static void
+aarch64_set_handled_components (sbitmap components)
+{
+  for (unsigned regno = 0; regno <= LAST_SAVED_REGNUM; regno++)
+    if (bitmap_bit_p (components, regno))
+      cfun->machine->reg_is_wrapped_separately[regno] = true;
+}
+
 /* AArch64 stack frames generated by this compiler look like:
 
         +-------------------------------+
@@ -3982,29 +4229,6 @@ aarch64_classify_index (struct aarch64_address_info *info, rtx x,
   return false;
 }
 
-bool
-aarch64_offset_7bit_signed_scaled_p (machine_mode mode, HOST_WIDE_INT offset)
-{
-  return (offset >= -64 * GET_MODE_SIZE (mode)
-          && offset < 64 * GET_MODE_SIZE (mode)
-          && offset % GET_MODE_SIZE (mode) == 0);
-}
-
-static inline bool
-offset_9bit_signed_unscaled_p (machine_mode mode ATTRIBUTE_UNUSED,
-                               HOST_WIDE_INT offset)
-{
-  return offset >= -256 && offset < 256;
-}
-
-static inline bool
-offset_12bit_unsigned_scaled_p (machine_mode mode, HOST_WIDE_INT offset)
-{
-  return (offset >= 0
-          && offset < 4096 * GET_MODE_SIZE (mode)
-          && offset % GET_MODE_SIZE (mode) == 0);
-}
-
 /* Return true if MODE is one of the modes for which we support
    LDP/STP operations.  */
 
@@ -14573,6 +14797,30 @@ aarch64_libgcc_floating_mode_supported_p
 #define TARGET_SCHED_FIRST_CYCLE_MULTIPASS_DFA_LOOKAHEAD_GUARD \
   aarch64_first_cycle_multipass_dfa_lookahead_guard
 
+#undef TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS
+#define TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS \
+  aarch64_get_separate_components
+
+#undef TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB
+#define TARGET_SHRINK_WRAP_COMPONENTS_FOR_BB \
+  aarch64_components_for_bb
+
+#undef TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS
+#define TARGET_SHRINK_WRAP_DISQUALIFY_COMPONENTS \
+  aarch64_disqualify_components
+
+#undef TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS
+#define TARGET_SHRINK_WRAP_EMIT_PROLOGUE_COMPONENTS \
+  aarch64_emit_prologue_components
+
+#undef TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS
+#define TARGET_SHRINK_WRAP_EMIT_EPILOGUE_COMPONENTS \
+  aarch64_emit_epilogue_components
+
+#undef TARGET_SHRINK_WRAP_SET_HANDLED_COMPONENTS
+#define TARGET_SHRINK_WRAP_SET_HANDLED_COMPONENTS \
+  aarch64_set_handled_components
+
 #undef TARGET_TRAMPOLINE_INIT
 #define TARGET_TRAMPOLINE_INIT aarch64_trampoline_init
 
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 584ff5c..c417569 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -591,6 +591,8 @@ struct GTY (()) aarch64_frame
 typedef struct GTY (()) machine_function
 {
   struct aarch64_frame frame;
+  /* One entry for each hard register.  */
+  bool reg_is_wrapped_separately[LAST_SAVED_REGNUM];
 } machine_function;
 #endif
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 3b67be0..6b4d0ba 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -59,6 +59,7 @@ (define_constants
     (V0_REGNUM          32)
     (V15_REGNUM         47)
     (V31_REGNUM         63)
+    (LAST_SAVED_REGNUM  63)
     (SFP_REGNUM         64)
     (AP_REGNUM          65)
     (CC_REGNUM          66)