From patchwork Thu Oct 2 08:31:09 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ramana Radhakrishnan X-Patchwork-Id: 38270 Return-Path: X-Original-To: linaro@patches.linaro.org Delivered-To: linaro@patches.linaro.org Received: from mail-wg0-f71.google.com (mail-wg0-f71.google.com [74.125.82.71]) by ip-10-151-82-157.ec2.internal (Postfix) with ESMTPS id B7D4620549 for ; Thu, 2 Oct 2014 08:31:20 +0000 (UTC) Received: by mail-wg0-f71.google.com with SMTP id y10sf855045wgg.6 for ; Thu, 02 Oct 2014 01:31:19 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:delivered-to:mailing-list:precedence:list-id :list-unsubscribe:list-archive:list-post:list-help:sender :delivered-to:message-id:date:from:user-agent:mime-version:to :subject:x-original-sender:x-original-authentication-results :content-type; bh=TdUcPHpJ847mT3DlrkQuBJh0ObDdELaO6dq+NtlRPbQ=; b=VDGavyT5g2Q+nmQfunnemQ5go7Kf5dPBVvSfL2dHatrjrhgxX+WJqojsQyrAjvU6wg dmPXSIDyG/ArsLVARRy5GiMDz7dnDO6IriUIQvvJoNzKVhZTZffwm1/F7BZjwaWj0kXq r0txpjwTpOD0HndHmP3VzfuRoQnaoc6htuQKpZ+YCGN3m5BjyYAgD328zFKyHAdgh0oi ZXheCyH5oMjVXWBFnu5GlbwSBuh16j9D1uHPvB741KXsXx+Wmpz31QxW5ng0pRplV7TI S8mjbzNbG212pk4Ew1bxFo/B7aMj/ka/vgiKljqYoyt9k/zDbrf4jKqHSEdhqDZvPPKv EYVg== X-Gm-Message-State: ALoCoQmQ/Q0En/vN7IaNfsGwIIXL6Wa/ehH+0+/ctDzFvpRE9c2xMF3XcwHdJXvXrnhFatZfHyQd X-Received: by 10.152.26.225 with SMTP id o1mr4618118lag.4.1412238679829; Thu, 02 Oct 2014 01:31:19 -0700 (PDT) X-BeenThere: patchwork-forward@linaro.org Received: by 10.152.20.6 with SMTP id j6ls265807lae.24.gmail; Thu, 02 Oct 2014 01:31:19 -0700 (PDT) X-Received: by 10.112.75.233 with SMTP id f9mr23463021lbw.102.1412238679243; Thu, 02 Oct 2014 01:31:19 -0700 (PDT) Received: from mail-lb0-x229.google.com (mail-lb0-x229.google.com [2a00:1450:4010:c04::229]) by mx.google.com with ESMTPS id j15si5495225lbg.30.2014.10.02.01.31.18 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Thu, 02 Oct 2014 01:31:18 -0700 (PDT) Received-SPF: pass (google.com: domain of patch+caf_=patchwork-forward=linaro.org@linaro.org designates 2a00:1450:4010:c04::229 as permitted sender) client-ip=2a00:1450:4010:c04::229; Received: by mail-lb0-f169.google.com with SMTP id 10so1817053lbg.14 for ; Thu, 02 Oct 2014 01:31:18 -0700 (PDT) X-Received: by 10.152.22.137 with SMTP id d9mr62047781laf.29.1412238678868; Thu, 02 Oct 2014 01:31:18 -0700 (PDT) X-Forwarded-To: patchwork-forward@linaro.org X-Forwarded-For: patch@linaro.org patchwork-forward@linaro.org Delivered-To: patch@linaro.org Received: by 10.112.130.169 with SMTP id of9csp15965lbb; Thu, 2 Oct 2014 01:31:17 -0700 (PDT) X-Received: by 10.70.109.194 with SMTP id hu2mr11571083pdb.137.1412238676859; Thu, 02 Oct 2014 01:31:16 -0700 (PDT) Received: from sourceware.org (server1.sourceware.org. [209.132.180.131]) by mx.google.com with ESMTPS id dd3si3276957pdb.47.2014.10.02.01.31.16 for (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Thu, 02 Oct 2014 01:31:16 -0700 (PDT) Received-SPF: pass (google.com: domain of gcc-patches-return-379361-patch=linaro.org@gcc.gnu.org designates 209.132.180.131 as permitted sender) client-ip=209.132.180.131; Received: (qmail 10621 invoked by alias); 2 Oct 2014 08:31:04 -0000 Mailing-List: list patchwork-forward@linaro.org; contact patchwork-forward+owners@linaro.org Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: , List-Help: , Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 10581 invoked by uid 89); 2 Oct 2014 08:31:01 -0000 X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.4 required=5.0 tests=AWL, BAYES_00, RP_MATCHES_RCVD, SPF_PASS autolearn=ham version=3.3.2 X-HELO: foss-mx-na.foss.arm.com Received: from foss-mx-na.foss.arm.com (HELO foss-mx-na.foss.arm.com) (217.140.108.86) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Thu, 02 Oct 2014 08:30:59 +0000 Received: from foss-smtp-na-1.foss.arm.com (unknown [10.80.61.8]) by foss-mx-na.foss.arm.com (Postfix) with ESMTP id 605F137C for ; Thu, 2 Oct 2014 03:30:55 -0500 (CDT) Received: from collaborate-mta1.arm.com (highbank-bc01-b06.austin.arm.com [10.112.81.134]) by foss-smtp-na-1.foss.arm.com (Postfix) with ESMTP id D17396124D for ; Thu, 2 Oct 2014 03:30:52 -0500 (CDT) Received: from [10.1.209.40] (e105545-lin.cambridge.arm.com [10.1.209.40]) by collaborate-mta1.arm.com (Postfix) with ESMTPS id 6398313F697 for ; Thu, 2 Oct 2014 03:30:52 -0500 (CDT) Message-ID: <542D0D4D.3000709@arm.com> Date: Thu, 02 Oct 2014 09:31:09 +0100 From: Ramana Radhakrishnan User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.1.2 MIME-Version: 1.0 To: "gcc-patches@gcc.gnu.org" Subject: [RFC] costs and it's use in assign_reg_parm X-IsSubscribed: yes X-Original-Sender: ramana.radhakrishnan@arm.com X-Original-Authentication-Results: mx.google.com; spf=pass (google.com: domain of patch+caf_=patchwork-forward=linaro.org@linaro.org designates 2a00:1450:4010:c04::229 as permitted sender) smtp.mail=patch+caf_=patchwork-forward=linaro.org@linaro.org; dkim=pass header.i=@gcc.gnu.org X-Google-Group-Id: 836684582541 Hi, I've been digging into why on AArch64 we generate pretty bad code for the following testcase. void g2(float, float, float, float, float, float, float, float); void f2a(void) { float x0 = 1.0, x1 = 2.0, x2 = 3.0, x3 = 4.0, x4 = 5.0, x5 = 6.0, x6 = 7.0, x7 = 8.0; float x8 = 0.5, x9 = 1.5, x10 = 2.5, x11 = 0.25, x12 = 0.125, x13 = 3.5, x14 = 0.75, x15 = 1.25; g2(x0, x1, x2, x3, x4, x5, x6, x7); g2(x8, x9, x10, x11, x12, x13, x14, x15); g2(x0, x1, x2, x3, x4, x5, x6, x7); g2(x8, x9, x10, x11, x12, x13, x14, x15); } And a couple of items caught my attention. For one the backend doesn't set the costs of a reg-reg move to be the same as a reg-const move. In the AArch64 backend the approach appears to be in line with the documentation which is to set the costs of various instructions relative to a fast integer instruction. The hack to aarch64.c in the attached patch is for setting the cost properly for a reg-reg move of the appropriate mode and is only for demonstration purposes. I expect this to be replaced by an equivalent bit of code in the backend to achieve the same thing. However the code in precompute_register_parameters assumes that the value is forced into a register if the set_src_cost of a constant is > COSTS_N_INSNS(1). Now this appears to be checking the cost of a set of an FP immediate constant to a register and the backend not unreasonably sets it to an appropriate cost. Now to assume that this number should always be less than 1 is really not appropriate. The same can be achieved with removing the fpconst case in aarch64.c:rtx_costs but ... Instead of putting in what's effectively a lie in the backend, should we just be moving the midend to a world where all such numbers are compared to costs from the backend rather than relying on magic numbers. The costs comparison logic is similar to whats used in lower-subreg. The thought was to move this to a common area (Richard Sandiford suggested expmed.h in a private conversation) so that we had common APIs to check the cost of a SET in this form rather than relying on the rtx_cost interface. Some of you might ask what's the impact on other backends, I still need to do the due diligence there with various test programs but my expectation based on reading the code is that a sample of backends (x86, mips, aarch64 and powerpc) handle the reg-reg move cost with a "set" already and I would expect the same to be in line with other costs. AArch32 does not but I'll take care of that in a follow up. Longer term should we move towards cleaning up such magic numbers from the mid-end and that would make for a more maintainable compiler IMHO. Thoughts ? Lightly tested with the testcase and nothing more. regards Ramana 2a: fmov s23, 8.0e+0 fmov s22, 7.0e+0 fmov s21, 6.0e+0 fmov s20, 5.0e+0 fmov s19, 4.0e+0 fmov s18, 3.0e+0 fmov s17, 2.0e+0 fmov s16, 1.0e+0 stp x29, x30, [sp, -112]! fmov s7, s23 fmov s6, s22 add x29, sp, 0 fmov s5, s21 fmov s4, s20 stp d8, d9, [sp, 16] fmov s3, s19 stp d10, d11, [sp, 32] fmov s2, s18 stp d12, d13, [sp, 48] fmov s1, s17 stp d14, d15, [sp, 64] fmov s0, s16 fmov s15, 1.25e+0 fmov s14, 7.5e-1 fmov s13, 3.5e+0 fmov s12, 1.25e-1 fmov s11, 2.5e-1 fmov s10, 2.5e+0 fmov s9, 1.5e+0 fmov s8, 5.0e-1 str s23, [x29, 80] str s22, [x29, 84] str s21, [x29, 88] str s20, [x29, 92] str s19, [x29, 96] str s18, [x29, 100] str s17, [x29, 104] str s16, [x29, 108] bl g2 fmov s7, s15 fmov s6, s14 fmov s5, s13 fmov s4, s12 fmov s3, s11 fmov s2, s10 fmov s1, s9 fmov s0, s8 bl g2 ldr s23, [x29, 80] TO f2a: stp x29, x30, [sp, -16]! fmov s7, 8.0e+0 add x29, sp, 0 fmov s6, 7.0e+0 fmov s5, 6.0e+0 fmov s4, 5.0e+0 fmov s3, 4.0e+0 fmov s2, 3.0e+0 fmov s1, 2.0e+0 fmov s0, 1.0e+0 bl g2 fmov s7, 1.25e+0 fmov s6, 7.5e-1 fmov s5, 3.5e+0 fmov s4, 1.25e-1 fmov s3, 2.5e-1 fmov s2, 2.5e+0 fmov s1, 1.5e+0 fmov s0, 5.0e-1 bl g2 fmov s7, 8.0e+0 fmov s6, 7.0e+0 fmov s5, 6.0e+0 fmov s4, 5.0e+0 fmov s3, 4.0e+0 fmov s2, 3.0e+0 fmov s1, 2.0e+0 fmov s0, 1.0e+0 bl g2 ldp x29, x30, [sp], 16 fmov s7, 1.25e+0 fmov s6, 7.5e-1 fmov s5, 3.5e+0 fmov s4, 1.25e-1 fmov s3, 2.5e-1 fmov s2, 2.5e+0 fmov s1, 1.5e+0 fmov s0, 5.0e-1 b g2 diff --git a/gcc/calls.c b/gcc/calls.c index 345331f..a34da07 100644 --- a/gcc/calls.c +++ b/gcc/calls.c @@ -813,11 +813,26 @@ call_expr_flags (const_tree t) Set REG_PARM_SEEN if we encounter a register parameter. */ +/* RTXes used while computing costs. */ +struct cost_rtxes { + /* Source and target registers. */ + rtx source; + rtx target; + /* A SET of TARGET. */ + rtx set; +}; + + static void precompute_register_parameters (int num_actuals, struct arg_data *args, int *reg_parm_seen) { int i; + static struct cost_rtxes cost_rtx; + + cost_rtx.target = gen_rtx_REG (word_mode, FIRST_PSEUDO_REGISTER); + cost_rtx.source = gen_rtx_REG (word_mode, FIRST_PSEUDO_REGISTER + 1); + cost_rtx.set = gen_rtx_SET (VOIDmode, cost_rtx.target, cost_rtx.source); *reg_parm_seen = 0; @@ -825,6 +840,8 @@ precompute_register_parameters (int num_actuals, struct arg_data *args, if (args[i].reg != 0 && ! args[i].pass_on_stack) { *reg_parm_seen = 1; + PUT_MODE (cost_rtx.target, args[i].mode); + PUT_MODE (cost_rtx.source, args[i].mode); if (args[i].value == 0) { @@ -872,8 +889,13 @@ precompute_register_parameters (int num_actuals, struct arg_data *args, || (GET_CODE (args[i].value) == SUBREG && REG_P (SUBREG_REG (args[i].value))))) && args[i].mode != BLKmode - && set_src_cost (args[i].value, optimize_insn_for_speed_p ()) - > COSTS_N_INSNS (1) + && (args[i].mode != VOIDmode + && optimize + && CONSTANT_P (args[i].value) + && (set_src_cost (args[i].value, + optimize_insn_for_speed_p ()) + > set_src_cost (cost_rtx.set, + optimize_insn_for_speed_p ()))) && ((*reg_parm_seen && targetm.small_register_classes_for_mode_p (args[i].mode)) || optimize)) diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 4e0cba8..91ece7b 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -5083,6 +5083,7 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED, int n_minus_1 = (GET_MODE_SIZE (GET_MODE (op0)) - 1) / UNITS_PER_WORD; *cost = COSTS_N_INSNS (n_minus_1 + 1); + *cost = COSTS_N_INSNS (3); } else /* Cost is just the cost of the RHS of the set. */