From patchwork Wed Jan 18 15:03:47 2012 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ramana Radhakrishnan X-Patchwork-Id: 6280 Return-Path: X-Original-To: patchwork@peony.canonical.com Delivered-To: patchwork@peony.canonical.com Received: from fiordland.canonical.com (fiordland.canonical.com [91.189.94.145]) by peony.canonical.com (Postfix) with ESMTP id 9E06B241A4 for ; Wed, 18 Jan 2012 15:03:51 +0000 (UTC) Received: from mail-bk0-f52.google.com (mail-bk0-f52.google.com [209.85.214.52]) by fiordland.canonical.com (Postfix) with ESMTP id 767C7A1802A for ; Wed, 18 Jan 2012 15:03:51 +0000 (UTC) Received: by bkbzt4 with SMTP id zt4so3098060bkb.11 for ; Wed, 18 Jan 2012 07:03:51 -0800 (PST) Received: by 10.205.141.72 with SMTP id jd8mr4080722bkc.135.1326899030989; Wed, 18 Jan 2012 07:03:50 -0800 (PST) X-Forwarded-To: linaro-patchwork@canonical.com X-Forwarded-For: patch@linaro.org linaro-patchwork@canonical.com Delivered-To: patches@linaro.org Received: by 10.205.82.144 with SMTP id ac16cs150670bkc; Wed, 18 Jan 2012 07:03:50 -0800 (PST) Received: by 10.236.182.167 with SMTP id o27mr31387685yhm.115.1326899028686; Wed, 18 Jan 2012 07:03:48 -0800 (PST) Received: from mail-tul01m020-f178.google.com (mail-tul01m020-f178.google.com [209.85.214.178]) by mx.google.com with ESMTPS id y3si14166383any.107.2012.01.18.07.03.48 (version=TLSv1/SSLv3 cipher=OTHER); Wed, 18 Jan 2012 07:03:48 -0800 (PST) Received-SPF: neutral (google.com: 209.85.214.178 is neither permitted nor denied by best guess record for domain of ramana.radhakrishnan@linaro.org) client-ip=209.85.214.178; Authentication-Results: mx.google.com; spf=neutral (google.com: 209.85.214.178 is neither permitted nor denied by best guess record for domain of ramana.radhakrishnan@linaro.org) smtp.mail=ramana.radhakrishnan@linaro.org Received: by obbtb2 with SMTP id tb2so9612730obb.37 for ; Wed, 18 Jan 2012 07:03:47 -0800 (PST) MIME-Version: 1.0 Received: by 10.182.226.6 with SMTP id ro6mr19487624obc.3.1326899027665; Wed, 18 Jan 2012 07:03:47 -0800 (PST) Received: by 10.182.88.7 with HTTP; Wed, 18 Jan 2012 07:03:47 -0800 (PST) Date: Wed, 18 Jan 2012 15:03:47 +0000 Message-ID: Subject: [Patch ARM] Fix PR50313 and handle PIC addresses properly. From: Ramana Radhakrishnan To: gcc-patches Cc: Patch Tracking Hi , PR50313 is a case where having the 2 patterns "pic_load_addr_*" and "pic_add_dot_plus_eight/four" from expand time becomes a problem as discussed by Jakub in his comment in the audit trail for PR48308. There is a separate problem in combine as explained by my comment in the audit trail for PR48308 and my RFC patch for the same sometime last week. What I've done in this case is to take the existing patterns merged them into a UNIFIED form which means that we can hoist the entire computation out with any of the loop optimizers rather than piece-meal hoping to have each one being hoisted correctly. I've had to adjust the cost of this to be equivalent to that of a memory load ( it's in fact the cost of a memory load + the cost of an add instruction but it's good enough for the moment). It's reasonably close enough for arm_size_rtx_costs, and thumb1_rtx_costs pushes this high enough that it will warrant a hoist anyway. I've also been conservative in adjusting cannot_copy_insn_p to disallow copying such insns to prevent any cases where you might have duplicate labels being generated. Once this happens the original bug report in PR50313 appears to be fixed and I've had this survive a bootstrap and regression test on an A9 board and this has successfully cross-built a version of eglibc with an ARM v7-a configuration (which is currently undergoing testing). There are a few ways by which we can get better code - one is to replace the load from the constant pool with a movw / movt pair of the actual differences in which case we'd be able to get performant code in libraries for this case, the other which as RichardE suggested in a conversation was to allow the splitter to actually generate the internal labels in the pic_load_addr and pic_add_dot_plus_eight which could then mean that the label number is really not reserved until this complex pattern has been split. I don't think these enhancements are worth doing so late in stage4 and I do want to backport this patch as it stands into the 4.6 branch. OK (and to backport to 4.6 branch ?) if no regressions ? cheers Ramana 2012-01-18 Ramana Radhakrishnan PR target/50313 * config/arm/arm.c (arm_load_pic_register): Use gen_pic_load_addr_unified. Delete calls to gen_pic_load_addr_32bit , gen_pic_add_dot_plus_eight and gen_pic_add_dot_plus_four. (arm_pic_static_addr): Likewise. (arm_rtx_costs_1): Adjust cost for UNSPEC_PIC_UNIFIED. (arm_note_pic_base): Handle UNSPEC_PIC_UNIFIED. * config/arm/arm.md (UNSPEC_PIC_UNIFIED): Define. (pic_load_addr_unified): New. diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 4c310d4..240434f 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -5578,11 +5578,7 @@ arm_load_pic_register (unsigned long saved_regs ATTRIBUTE_UNUSED) if (TARGET_32BIT) { - emit_insn (gen_pic_load_addr_32bit (pic_reg, pic_rtx)); - if (TARGET_ARM) - emit_insn (gen_pic_add_dot_plus_eight (pic_reg, pic_reg, labelno)); - else - emit_insn (gen_pic_add_dot_plus_four (pic_reg, pic_reg, labelno)); + emit_insn (gen_pic_load_addr_unified (pic_reg, pic_rtx, labelno)); } else /* TARGET_THUMB1 */ { @@ -5595,10 +5591,10 @@ arm_load_pic_register (unsigned long saved_regs ATTRIBUTE_UNUSED) thumb_find_work_register (saved_regs)); emit_insn (gen_pic_load_addr_thumb1 (pic_tmp, pic_rtx)); emit_insn (gen_movsi (pic_offset_table_rtx, pic_tmp)); + emit_insn (gen_pic_add_dot_plus_four (pic_reg, pic_reg, labelno)); } else - emit_insn (gen_pic_load_addr_thumb1 (pic_reg, pic_rtx)); - emit_insn (gen_pic_add_dot_plus_four (pic_reg, pic_reg, labelno)); + emit_insn (gen_pic_load_addr_unified (pic_reg, pic_rtx, labelno)); } } @@ -5628,20 +5624,7 @@ arm_pic_static_addr (rtx orig, rtx reg) UNSPEC_SYMBOL_OFFSET); offset_rtx = gen_rtx_CONST (Pmode, offset_rtx); - if (TARGET_32BIT) - { - emit_insn (gen_pic_load_addr_32bit (reg, offset_rtx)); - if (TARGET_ARM) - insn = emit_insn (gen_pic_add_dot_plus_eight (reg, reg, labelno)); - else - insn = emit_insn (gen_pic_add_dot_plus_four (reg, reg, labelno)); - } - else /* TARGET_THUMB1 */ - { - emit_insn (gen_pic_load_addr_thumb1 (reg, offset_rtx)); - insn = emit_insn (gen_pic_add_dot_plus_four (reg, reg, labelno)); - } - + insn = emit_insn (gen_pic_load_addr_unified (reg, offset_rtx, labelno)); return insn; } @@ -7648,6 +7631,15 @@ arm_rtx_costs_1 (rtx x, enum rtx_code outer, int* total, bool speed) case SET: return false; + + case UNSPEC: + /* We cost this as high as our memory costs to allow this to + be hoisted from loops. */ + if (XINT (x, 1) == UNSPEC_PIC_UNIFIED) + { + *total = COSTS_N_INSNS (2 + ARM_NUM_REGS (mode)); + } + return true; default: *total = COSTS_N_INSNS (4); @@ -10008,7 +10000,8 @@ static int arm_note_pic_base (rtx *x, void *date ATTRIBUTE_UNUSED) { if (GET_CODE (*x) == UNSPEC - && XINT (*x, 1) == UNSPEC_PIC_BASE) + && (XINT (*x, 1) == UNSPEC_PIC_BASE + || XINT (*x, 1) == UNSPEC_PIC_UNIFIED)) return 1; return 0; } diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index 5620d7d..8842846 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -116,6 +116,7 @@ ; unaligned locations, on architectures which support ; that. UNSPEC_UNALIGNED_STORE ; Same for str/strh. + UNSPEC_PIC_UNIFIED ; Create a common pic addressing form. ]) ;; UNSPEC_VOLATILE Usage: @@ -5601,7 +5602,7 @@ "flag_pic" ) -;; Split calculate_pic_address into pic_load_addr_* and a move. +;; Split calculate_pic_address into pic_load_addr_32bit/thumb1 and a move. (define_split [(set (match_operand:SI 0 "register_operand" "") (mem:SI (plus:SI (match_operand:SI 1 "register_operand" "") @@ -5613,6 +5614,29 @@ "operands[3] = can_create_pseudo_p () ? gen_reg_rtx (SImode) : operands[0];" ) +;; operand1 is the memory address to go into pic_load_addr_32bit. +;; operand2 is the PIC label to be emitted from pic_add_dot_plus_*. +;; We do this to allow hoisting of the entire c +(define_insn_and_split "pic_load_addr_unified" + [(set (match_operand:SI 0 "s_register_operand" "=r,r,l") + (unspec:SI [(match_operand:SI 1 "" "mX,mX,mX") + (match_operand:SI 2 "" "")] + UNSPEC_PIC_UNIFIED))] + "flag_pic" + "#" + "flag_pic" + [(set (match_dup 3) (unspec:SI [(match_dup 1)] UNSPEC_PIC_SYM)) + (set (match_dup 0) (unspec:SI [(match_dup 3) (match_dup 4) + (match_dup 2)] UNSPEC_PIC_BASE))] + "operands[3] = can_create_pseudo_p () ? gen_reg_rtx (SImode) : operands[0]; + operands[4] = TARGET_THUMB ? GEN_INT (4) : GEN_INT (8);" + [(set_attr "type" "load1,load1,load1") + (set_attr "pool_range" "4096,4096,1024") + (set_attr "neg_pool_range" "0,4084,0") + (set_attr "arch" "a,t2,t1") + (set_attr "length" "8,6,4")] +) + ;; The rather odd constraints on the following are to force reload to leave ;; the insn alone, and to force the minipool generation pass to then move ;; the GOT symbol to memory.