From patchwork Mon Dec  5 10:44:32 2011
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Ramana Radhakrishnan <ramana.radhakrishnan@linaro.org>
X-Patchwork-Id: 5450
Return-Path: <patch+caf_=linaro-patchwork=canonical.com@linaro.org>
X-Original-To: patchwork@peony.canonical.com
Delivered-To: patchwork@peony.canonical.com
Received: from fiordland.canonical.com (fiordland.canonical.com
 [91.189.94.145])
 by peony.canonical.com (Postfix) with ESMTP id F2C3223E0C
 for <patchwork@peony.canonical.com>;
 Mon,  5 Dec 2011 10:44:37 +0000 (UTC)
Received: from mail-lpp01m010-f52.google.com (mail-lpp01m010-f52.google.com
 [209.85.215.52])
 by fiordland.canonical.com (Postfix) with ESMTP id C27CEA181F0
 for <linaro-patchwork@canonical.com>;
 Mon,  5 Dec 2011 10:44:37 +0000 (UTC)
Received: by lagm6 with SMTP id m6so75713lag.11
 for <linaro-patchwork@canonical.com>;
 Mon, 05 Dec 2011 02:44:37 -0800 (PST)
Received: by 10.152.106.115 with SMTP id gt19mr5586198lab.27.1323081877453; 
 Mon, 05 Dec 2011 02:44:37 -0800 (PST)
X-Forwarded-To: linaro-patchwork@canonical.com
X-Forwarded-For: patch@linaro.org linaro-patchwork@canonical.com
Delivered-To: patches@linaro.org
Received: by 10.152.41.198 with SMTP id h6cs250260lal;
 Mon, 5 Dec 2011 02:44:36 -0800 (PST)
Received: by 10.68.30.164 with SMTP id t4mr22385820pbh.63.1323081874590;
 Mon, 05 Dec 2011 02:44:34 -0800 (PST)
Received: from mail-pz0-f50.google.com (mail-pz0-f50.google.com
 [209.85.210.50])
 by mx.google.com with ESMTPS id 3si4427598pbr.104.2011.12.05.02.44.33
 (version=TLSv1/SSLv3 cipher=OTHER);
 Mon, 05 Dec 2011 02:44:34 -0800 (PST)
Received-SPF: neutral (google.com: 209.85.210.50 is neither permitted nor
 denied by best guess record for domain of
 ramana.radhakrishnan@linaro.org) client-ip=209.85.210.50; 
Authentication-Results: mx.google.com;
 spf=neutral (google.com: 209.85.210.50 is neither
 permitted nor denied by best guess record for domain of
 ramana.radhakrishnan@linaro.org)
 smtp.mail=ramana.radhakrishnan@linaro.org
Received: by dadp14 with SMTP id p14so5191102dad.37
 for <patches@linaro.org>; Mon, 05 Dec 2011 02:44:33 -0800 (PST)
MIME-Version: 1.0
Received: by 10.68.199.6 with SMTP id jg6mr22455265pbc.26.1323081872770; Mon,
 05 Dec 2011 02:44:32 -0800 (PST)
Received: by 10.68.64.138 with HTTP; Mon, 5 Dec 2011 02:44:32 -0800 (PST)
In-Reply-To: <CACUk7=UZitdqSar0+trKreYFdhgNO8aruLuq7i6RTROwrWQbuw@mail.gmail.com>
References: <CACUk7=UZitdqSar0+trKreYFdhgNO8aruLuq7i6RTROwrWQbuw@mail.gmail.com>
Date: Mon, 5 Dec 2011 10:44:32 +0000
Message-ID: <CACUk7=WEoK6gNNT=mO2HCHEMknpZFOxpiE4tdV6qPokk8dU1TQ@mail.gmail.com>
Subject: [Patch ARM] Use vcvt.f32/64.s32 with immediate bits to do fixed to
 floating point conversions better.
From: Ramana Radhakrishnan <ramana.radhakrishnan@linaro.org>
To: gcc-patches <gcc-patches@gcc.gnu.org>
Cc: Patch Tracking <patches@linaro.org>

The original RFC is here:
http://gcc.gnu.org/ml/gcc-patches/2011-10/msg01961.html
>
>        * config/arm/arm.c (vfp3_const_double_for_fract_bits): Define.
>        * config/arm/arm-protos.h (vfp3_const_double_for_fract_bits): Declare.
>        * config/arm/constraints.md ("Dt"): New constraint.
>        * config/arm/predicates.md (const_double_vcvt_power_of_two_reciprocal):
>        New.
>        * config/arm/vfp.md (*arm_combine_vcvt_f32_s32): New.
>        (*arm_combine_vcvt_f32_u32): New.

After testing this recently and having received no other feedback on
the RFC, I've now committed the attached patch.

Ramana


2011-12-05  Ramana Radhakrishnan  <ramana.radhakrishnan@linaro.org>

       * config/arm/arm.c (vfp3_const_double_for_fract_bits): Define.
       * config/arm/arm-protos.h (vfp3_const_double_for_fract_bits): Declare.
       * config/arm/constraints.md ("Dt"): New constraint.
       * config/arm/predicates.md
       (const_double_vcvt_power_of_two_reciprocal): New.
       * config/arm/vfp.md (*arm_combine_vcvt_f32_s32): New.
       (*arm_combine_vcvt_f32_u32): New.

Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	(revision 182004)
+++ gcc/config/arm/arm.c	(working copy)
@@ -17671,6 +17671,11 @@
       }
       return;
 
+    case 'v':
+	gcc_assert (GET_CODE (x) == CONST_DOUBLE);
+	fprintf (stream, "#%d", vfp3_const_double_for_fract_bits (x));
+	return;
+
     /* Register specifier for vld1.16/vst1.16.  Translate the S register
        number into a D register number and element index.  */
     case 'z':
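@@ -25038,4 +25043,30 @@
   return count;
 }
 
+/* If OPERAND is a CONST_DOUBLE whose exact reciprocal is a power of two
+   2^N representable in 32 bits, return N, the number of fraction bits
+   to use in a fixed-point vcvt.  Otherwise return 0.  */
+int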
+vfp3_const_double_for_fract_bits (rtx operand)
+{
+  REAL_VALUE_TYPE r0;
+  
+  if (GET_CODE (operand) != CONST_DOUBLE)
+    return 0;
+  
+  REAL_VALUE_FROM_CONST_DOUBLE (r0, operand);
+  if (exact_real_inverse (DFmode, &r0))
+    {
+      if (exact_real_truncate (DFmode, &r0))
+	{
+	  HOST_WIDE_INT value = real_to_integer (&r0);
+	  value = value & 0xffffffff;
+	  if ((value != 0) && ( (value & (value - 1)) == 0))
+	    return int_log2 (value);
+	}
+    }
+  return 0;
+}
+
 #include "gt-arm.h"
+
Index: gcc/config/arm/arm-protos.h
===================================================================
--- gcc/config/arm/arm-protos.h	(revision 182004)
+++ gcc/config/arm/arm-protos.h	(working copy)
@@ -241,6 +241,7 @@
 };
 
 extern const struct tune_params *current_tune;
+extern int vfp3_const_double_for_fract_bits (rtx);
 #endif /* RTX_CODE */
 
 #endif /* ! GCC_ARM_PROTOS_H */
Index: gcc/config/arm/vfp.md
===================================================================
--- gcc/config/arm/vfp.md	(revision 182004)
+++ gcc/config/arm/vfp.md	(working copy)
@@ -1144,9 +1144,40 @@
    (set_attr "type" "fcmpd")]
 )
 
+;; Fixed-point to floating-point conversions.
+(define_code_iterator FCVT [unsigned_float float])
+(define_code_attr FCVTI32typename [(unsigned_float "u32") (float "s32")])
 
+(define_insn "*combine_vcvt_f32_<FCVTI32typename>"
+  [(set (match_operand:SF 0 "s_register_operand" "=t")
+	(mult:SF (FCVT:SF (match_operand:SI 1 "s_register_operand" "0"))
+		 (match_operand 2 
+			"const_double_vcvt_power_of_two_reciprocal" "Dt")))]
+  "TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_VFP3 && !flag_rounding_math"
+  "vcvt.f32.<FCVTI32typename>\\t%0, %1, %v2"
+ [(set_attr "predicable" "no")
+  (set_attr "type" "f_cvt")]
+)
+
+;; This is not ideal.  Ideally we would be able to split this into a
+;; move to a DP register followed by a vcvt.f64.i32.
+(define_insn "*combine_vcvt_f64_<FCVTI32typename>"
+  [(set (match_operand:DF 0 "s_register_operand" "=x,x,w")
+	(mult:DF (FCVT:DF (match_operand:SI 1 "s_register_operand" "r,t,r"))
+		 (match_operand 2 
+		     "const_double_vcvt_power_of_two_reciprocal" "Dt,Dt,Dt")))]
+  "TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_VFP3 && !flag_rounding_math 
+  && !TARGET_VFP_SINGLE"
+  "@
+  vmov.f32\\t%0, %1\;vcvt.f64.<FCVTI32typename>\\t%P0, %P0, %v2
+  vmov.f32\\t%0, %1\;vcvt.f64.<FCVTI32typename>\\t%P0, %P0, %v2
+  vmov.f64\\t%0, %1, %1\; vcvt.f64.<FCVTI32typename>\\t%P0, %P0, %v2"
+ [(set_attr "predicable" "no")
+  (set_attr "type" "f_cvt")
+  (set_attr "length" "8")]
+)
+
 ;; Store multiple insn used in function prologue.
-
 (define_insn "*push_multi_vfp"
   [(match_parallel 2 "multi_register_push"
     [(set (match_operand:BLK 0 "memory_operand" "=m")
Index: gcc/config/arm/constraints.md
===================================================================
--- gcc/config/arm/constraints.md	(revision 182004)
+++ gcc/config/arm/constraints.md	(working copy)
@@ -29,7 +29,7 @@
 ;; in Thumb-1 state: I, J, K, L, M, N, O
 
 ;; The following multi-letter normal constraints have been used:
-;; in ARM/Thumb-2 state: Da, Db, Dc, Dn, Dl, DL, Dv, Dy, Di, Dz
+;; in ARM/Thumb-2 state: Da, Db, Dc, Dn, Dl, DL, Dv, Dy, Di, Dt, Dz
 ;; in Thumb-1 state: Pa, Pb, Pc, Pd
 ;; in Thumb-2 state: Pj, PJ, Ps, Pt, Pu, Pv, Pw, Px, Py
 
@@ -291,6 +291,12 @@
  (and (match_code "const_double")
       (match_test "TARGET_32BIT && TARGET_VFP_DOUBLE && vfp3_const_double_rtx (op)")))
 
+(define_constraint "Dt" 
+ "@internal
+  In ARM/Thumb-2 state a const_double which can be used with a vcvt.f32.s32 with fract bits operation."
+  (and (match_code "const_double")
+       (match_test "TARGET_32BIT && TARGET_VFP && vfp3_const_double_for_fract_bits (op)")))
+
 (define_memory_constraint "Ut"
  "@internal
   In ARM/Thumb-2 state an address valid for loading/storing opaque structure
Index: gcc/config/arm/predicates.md
===================================================================
--- gcc/config/arm/predicates.md	(revision 182004)
+++ gcc/config/arm/predicates.md	(working copy)
@@ -754,6 +754,11 @@
   return true; 
 })
 
+(define_predicate "const_double_vcvt_power_of_two_reciprocal"
+  (and (match_code "const_double")
+       (match_test "TARGET_32BIT && TARGET_VFP 
+       		   && vfp3_const_double_for_fract_bits (op)")))
+
 (define_predicate "neon_struct_operand"
   (and (match_code "mem")
        (match_test "TARGET_32BIT && neon_vector_mem_operand (op, 2)")))