From patchwork Mon Dec 5 10:44:32 2011
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Ramana Radhakrishnan
X-Patchwork-Id: 5450
Return-Path:
X-Original-To: patchwork@peony.canonical.com
Delivered-To: patchwork@peony.canonical.com
Received: from fiordland.canonical.com (fiordland.canonical.com [91.189.94.145])
 by peony.canonical.com (Postfix) with ESMTP id F2C3223E0C
 for ; Mon, 5 Dec 2011 10:44:37 +0000 (UTC)
Received: from mail-lpp01m010-f52.google.com (mail-lpp01m010-f52.google.com [209.85.215.52])
 by fiordland.canonical.com (Postfix) with ESMTP id C27CEA181F0
 for ; Mon, 5 Dec 2011 10:44:37 +0000 (UTC)
Received: by lagm6 with SMTP id m6so75713lag.11
 for ; Mon, 05 Dec 2011 02:44:37 -0800 (PST)
Received: by 10.152.106.115 with SMTP id gt19mr5586198lab.27.1323081877453;
 Mon, 05 Dec 2011 02:44:37 -0800 (PST)
X-Forwarded-To: linaro-patchwork@canonical.com
X-Forwarded-For: patch@linaro.org linaro-patchwork@canonical.com
Delivered-To: patches@linaro.org
Received: by 10.152.41.198 with SMTP id h6cs250260lal;
 Mon, 5 Dec 2011 02:44:36 -0800 (PST)
Received: by 10.68.30.164 with SMTP id t4mr22385820pbh.63.1323081874590;
 Mon, 05 Dec 2011 02:44:34 -0800 (PST)
Received: from mail-pz0-f50.google.com (mail-pz0-f50.google.com [209.85.210.50])
 by mx.google.com with ESMTPS id 3si4427598pbr.104.2011.12.05.02.44.33
 (version=TLSv1/SSLv3 cipher=OTHER);
 Mon, 05 Dec 2011 02:44:34 -0800 (PST)
Received-SPF: neutral (google.com: 209.85.210.50 is neither permitted nor denied by best guess record for domain of ramana.radhakrishnan@linaro.org) client-ip=209.85.210.50;
Authentication-Results: mx.google.com; spf=neutral (google.com: 209.85.210.50 is neither permitted nor denied by best guess record for domain of ramana.radhakrishnan@linaro.org) smtp.mail=ramana.radhakrishnan@linaro.org
Received: by dadp14 with SMTP id p14so5191102dad.37
 for ; Mon, 05 Dec 2011 02:44:33 -0800 (PST)
MIME-Version: 1.0
Received: by 10.68.199.6 with SMTP id jg6mr22455265pbc.26.1323081872770;
 Mon, 05 Dec 2011 02:44:32 -0800 (PST)
Received: by 10.68.64.138 with HTTP; Mon, 5 Dec 2011 02:44:32 -0800 (PST)
In-Reply-To:
References:
Date: Mon, 5 Dec 2011 10:44:32 +0000
Message-ID:
Subject: [Patch ARM] Use vcvt.f32/64.s32 with immediate bits to do fixed to floating point conversions better.
From: Ramana Radhakrishnan
To: gcc-patches
Cc: Patch Tracking

The original RFC is here -
http://gcc.gnu.org/ml/gcc-patches/2011-10/msg01961.html

>
>        * config/arm/arm.c (vfp3_const_double_for_fract_bits): Define.
>        * config/arm/arm-protos.h (vfp3_const_double_for_fract_bits): Declare.
>        * config/arm/constraints.md ("Dt"): New constraint.
>        * config/arm/predicates.md (const_double_vcvt_power_of_two_reciprocal):
>        New.
>        * config/arm/vfp.md (*arm_combine_vcvt_f32_s32): New.
>        (*arm_combine_vcvt_f32_u32): New.

After testing this recently and having received no other feedback on the
RFC, I've now committed the attached patch.

Ramana

2011-12-05  Ramana Radhakrishnan

        * config/arm/arm.c (vfp3_const_double_for_fract_bits): Define.
        * config/arm/arm-protos.h (vfp3_const_double_for_fract_bits): Declare.
        * config/arm/constraints.md ("Dt"): New constraint.
        * config/arm/predicates.md (const_double_vcvt_power_of_two_reciprocal):
        New.
        * config/arm/vfp.md (*arm_combine_vcvt_f32_s32): New.
        (*arm_combine_vcvt_f32_u32): New.

Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c (revision 182004)
+++ gcc/config/arm/arm.c (working copy)
@@ -17671,6 +17671,11 @@
         }
       return;
 
+    case 'v':
+        gcc_assert (GET_CODE (x) == CONST_DOUBLE);
+        fprintf (stream, "#%d", vfp3_const_double_for_fract_bits (x));
+        return;
+
     /* Register specifier for vld1.16/vst1.16.  Translate the S register
        number into a D register number and element index.  */
     case 'z':
@@ -25038,4 +25043,27 @@
   return count;
 }
 
+int
+vfp3_const_double_for_fract_bits (rtx operand)
+{
+  REAL_VALUE_TYPE r0;
+
+  if (GET_CODE (operand) != CONST_DOUBLE)
+    return 0;
+
+  REAL_VALUE_FROM_CONST_DOUBLE (r0, operand);
+  if (exact_real_inverse (DFmode, &r0))
+    {
+      if (exact_real_truncate (DFmode, &r0))
+        {
+          HOST_WIDE_INT value = real_to_integer (&r0);
+          value = value & 0xffffffff;
+          if ((value != 0) && ( (value & (value - 1)) == 0))
+            return int_log2 (value);
+        }
+    }
+  return 0;
+}
+
 #include "gt-arm.h"
+
Index: gcc/config/arm/arm-protos.h
===================================================================
--- gcc/config/arm/arm-protos.h (revision 182004)
+++ gcc/config/arm/arm-protos.h (working copy)
@@ -241,6 +241,7 @@
 };
 
 extern const struct tune_params *current_tune;
+extern int vfp3_const_double_for_fract_bits (rtx);
 #endif /* RTX_CODE */
 
 #endif /* ! GCC_ARM_PROTOS_H */
Index: gcc/config/arm/vfp.md
===================================================================
--- gcc/config/arm/vfp.md (revision 182004)
+++ gcc/config/arm/vfp.md (working copy)
@@ -1144,9 +1144,40 @@
    (set_attr "type" "fcmpd")]
 )
 
+;; Fixed point to floating point conversions.
+(define_code_iterator FCVT [unsigned_float float])
+(define_code_attr FCVTI32typename [(unsigned_float "u32") (float "s32")])
+(define_insn "*combine_vcvt_f32_<FCVTI32typename>"
+  [(set (match_operand:SF 0 "s_register_operand" "=t")
+        (mult:SF (FCVT:SF (match_operand:SI 1 "s_register_operand" "0"))
+                 (match_operand 2
+                        "const_double_vcvt_power_of_two_reciprocal" "Dt")))]
+  "TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_VFP3 && !flag_rounding_math"
+  "vcvt.f32.<FCVTI32typename>\\t%0, %1, %v2"
+ [(set_attr "predicable" "no")
+  (set_attr "type" "f_cvt")]
+)
+
+;; Not the ideal way of implementing this. Ideally we would be able to split
+;; this into a move to a DP register and then a vcvt.f64.i32
+(define_insn "*combine_vcvt_f64_<FCVTI32typename>"
+  [(set (match_operand:DF 0 "s_register_operand" "=x,x,w")
+        (mult:DF (FCVT:DF (match_operand:SI 1 "s_register_operand" "r,t,r"))
+                 (match_operand 2
+                        "const_double_vcvt_power_of_two_reciprocal" "Dt,Dt,Dt")))]
+  "TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_VFP3 && !flag_rounding_math
+  && !TARGET_VFP_SINGLE"
+  "@
+  vmov.f32\\t%0, %1\;vcvt.f64.<FCVTI32typename>\\t%P0, %P0, %v2
+  vmov.f32\\t%0, %1\;vcvt.f64.<FCVTI32typename>\\t%P0, %P0, %v2
+  vmov.f64\\t%0, %1, %1\; vcvt.f64.<FCVTI32typename>\\t%P0, %P0, %v2"
+ [(set_attr "predicable" "no")
+  (set_attr "type" "f_cvt")
+  (set_attr "length" "8")]
+)
+
 
 ;; Store multiple insn used in function prologue.
-
 (define_insn "*push_multi_vfp"
   [(match_parallel 2 "multi_register_push"
     [(set (match_operand:BLK 0 "memory_operand" "=m")
Index: gcc/config/arm/constraints.md
===================================================================
--- gcc/config/arm/constraints.md (revision 182004)
+++ gcc/config/arm/constraints.md (working copy)
@@ -29,7 +29,7 @@
 ;; in Thumb-1 state: I, J, K, L, M, N, O
 
 ;; The following multi-letter normal constraints have been used:
-;; in ARM/Thumb-2 state: Da, Db, Dc, Dn, Dl, DL, Dv, Dy, Di, Dz
+;; in ARM/Thumb-2 state: Da, Db, Dc, Dn, Dl, DL, Dv, Dy, Di, Dt, Dz
 ;; in Thumb-1 state: Pa, Pb, Pc, Pd
 ;; in Thumb-2 state: Pj, PJ, Ps, Pt, Pu, Pv, Pw, Px, Py
 
@@ -291,6 +291,12 @@
  (and (match_code "const_double")
       (match_test "TARGET_32BIT && TARGET_VFP_DOUBLE && vfp3_const_double_rtx (op)")))
 
+(define_constraint "Dt"
+ "@internal
+  In ARM/ Thumb2 a const_double which can be used with a vcvt.f32.s32 with fract bits operation"
+  (and (match_code "const_double")
+       (match_test "TARGET_32BIT && TARGET_VFP && vfp3_const_double_for_fract_bits (op)")))
+
 (define_memory_constraint "Ut"
  "@internal
   In ARM/Thumb-2 state an address valid for loading/storing opaque structure
Index: gcc/config/arm/predicates.md
===================================================================
--- gcc/config/arm/predicates.md (revision 182004)
+++ gcc/config/arm/predicates.md (working copy)
@@ -754,6 +754,11 @@
   return true;
 })
 
+(define_predicate "const_double_vcvt_power_of_two_reciprocal"
+  (and (match_code "const_double")
+       (match_test "TARGET_32BIT && TARGET_VFP
+                   && vfp3_const_double_for_fract_bits (op)")))
+
 (define_predicate "neon_struct_operand"
   (and (match_code "mem")
        (match_test "TARGET_32BIT && neon_vector_mem_operand (op, 2)")))
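
For readers who want to see the constant test outside GCC, here is a minimal sketch of the check that vfp3_const_double_for_fract_bits performs: accept a multiplier only when it is exactly 1/2^n, and report n as the number of fraction bits. It uses plain C doubles rather than REAL_VALUE_TYPE, and the names fract_bits_for_reciprocal, q16_to_float and fract_bits_self_check are hypothetical and not part of the patch. Rejecting negative constants and limiting n to 1..31 are assumptions of this sketch. Whether a given compiler actually turns q16_to_float into a single vcvt depends on the target options (VFPv3, hard float, no -frounding-math), so treat that function as a candidate input, not a guaranteed result.

```c
#include <assert.h>

/* Hypothetical stand-alone analogue of the patch's constant test.
   Returns n in 1..31 when c is exactly 1 / 2^n, otherwise 0.
   Assumption of this sketch: negative constants are rejected.  */
int
fract_bits_for_reciprocal (double c)
{
  int n = 0;

  /* Doubling a binary floating-point value is exact, so c reaches
     exactly 1.0 after n steps if and only if c started as 2^-n.  */
  while (c < 1.0 && n < 32)
    {
      c *= 2.0;
      n++;
    }

  return (c == 1.0 && n >= 1 && n <= 31) ? n : 0;
}

/* Candidate source for the new SFmode pattern: a signed Q16.16
   fixed-point value scaled by the exact constant 1 / 2^16.  */
float
q16_to_float (int x)
{
  return (float) x * (1.0f / 65536.0f);
}

/* Small usage example covering accepted and rejected multipliers.  */
void
fract_bits_self_check (void)
{
  assert (fract_bits_for_reciprocal (0.0625) == 4);          /* 1 / 2^4 */
  assert (fract_bits_for_reciprocal (1.0 / 65536.0) == 16);  /* 1 / 2^16 */
  assert (fract_bits_for_reciprocal (1.0) == 0);             /* n would be 0 */
  assert (fract_bits_for_reciprocal (0.3) == 0);             /* not a power of two */
  assert (q16_to_float (65536) == 1.0f);
}
```

A return value of 0 doubles as "no match", which is why a multiplier of 1.0 is rejected here, the same convention the predicate and the "Dt" constraint rely on when they test the function's result for truth.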