[RFC,ARM] Use vcvt.f32/64.s32 with immediate bits to do fixed to floating point conversions better.

Hi,

Some time back Michael pointed out that the ARM backend doesn't generate
vcvt.f32.s<type> with an immediate fraction-bit count for a conversion from
fixed to floating point, as in the example below. It should also be possible
to generate the vector forms of this, which will be the subject of a
follow-up patch.
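
To make the fixed-point reading concrete, here is a small standalone C
sketch (not part of the patch; the helper names are made up for
illustration). It treats an int as a fixed-point value with fbits
fractional bits and computes its float value two ways: by dividing by
1 << fbits, as the testcases do, and by scaling the exponent with ldexpf.
It assumes fbits is between 1 and 31 so that the shift is well defined.

```c
#include <math.h>

/* Hypothetical helper: value of a signed 32-bit fixed-point number with
   fbits fractional bits, written the way the testcases write it.
   Assumes 1 <= fbits <= 31.  */
float
fixed_to_float_div (int raw, int fbits)
{
  return (float) raw / (float) (1u << fbits);
}

/* Hypothetical helper: the same value obtained by adjusting the exponent
   of the converted integer directly.  */
float
fixed_to_float_scale (int raw, int fbits)
{
  return ldexpf ((float) raw, -fbits);
}
```

Since the divisor is a power of two, the division only changes the
exponent, so both helpers should agree for inputs like these.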

I've chosen to implement this in the backend in the following manner,
using interfaces from real.c. I've chosen not to allow this
transformation when flag_rounding_math is true because this
instruction always rounds using round-to-nearest rather than
obeying what's in the FPSCR, and thus is not safe for programs that want
to set their rounding modes dynamically.
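
As a rough illustration of the kind of check involved (this is NOT the
real.c-based code from the patch, just a standalone sketch using frexp
from the C library), the helper below decides whether a multiplier is
exactly 1/2^n and, if so, returns n. The name fract_bits_for_reciprocal
and the accepted range of 1 to 32 fraction bits are my assumptions for
the 32-bit forms of the instruction.

```c
#include <math.h>

/* Hypothetical helper, not the function from the patch: return n when
   value is exactly 1/2^n with 1 <= n <= 32, otherwise return 0.  */
int
fract_bits_for_reciprocal (double value)
{
  int exp;
  double mant;
  int n;

  /* Reject zero, negative values, NaN and infinity.  */
  if (!(value > 0.0) || isinf (value))
    return 0;

  /* frexp gives value = mant * 2^exp with mant in [0.5, 1).
     An exact power of two has mant == 0.5.  */
  mant = frexp (value, &exp);
  if (mant != 0.5)
    return 0;

  /* value == 2^(exp - 1), so value == 1/2^n means n == 1 - exp.  */
  n = 1 - exp;
  return (n >= 1 && n <= 32) ? n : 0;
}
```

A caller would only consider the transformation when this returns a
non-zero count and rounding-mode-sensitive code generation is not
requested.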

I have chosen to use the unified assembler syntax for this patch and
have a set of follow-up patches that I've been working on that try to
replace all the old assembler mnemonics with the newer UAL ones. I
think gas has matured to a point where most of the new syntax for VFP
is now fully recognized and there's no reason why we shouldn't move
forward. What is the opinion in this regard?

The benefits are quite obvious: we eliminate a load from the
constant pool and a floating point multiply, shaving the latency of
a floating point multiply plus a load off these sequences. This
instruction can only write the output into the same register as the
input register, which is why I've modelled it as below by tying op1
to op0. Also, the i32 -> f64 cases were quite impossible to model with
insn_and_splits and subreg modes, which is what Richard and I tried
to cook up.

If someone has an idea as to how this might be achieved, I'm all ears,
compared to the current way in which it's all sort of tied together.

Also, if there's a simpler way of using the interfaces into real.c,
I'm all ears.

OK for trunk?

cheers
Ramana

	* config/arm/arm.c (vfp3_const_double_for_fract_bits): Define.
	* config/arm/arm-protos.h (vfp3_const_double_for_fract_bits): Declare.
	* config/arm/constraints.md ("Dt"): New constraint.
	* config/arm/predicates.md (const_double_vcvt_power_of_two_reciprocal):
	New.
	* config/arm/vfp.md (*arm_combine_vcvt_f32_s32): New.
	(*arm_combine_vcvt_f32_u32): New.

For the following testcases I see the code below with
-mfloat-abi=hard -mfpu=vfpv3 and -mcpu=cortex-a9:

float foo (int i)
{
 float v = (float)i / (1 << 11);
 return v;
}
float foa_unsigned (unsigned int i)
{
 float v = (float)i / (1 << 5);
 return v;
}

After the patch:

foo:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	fmsr	s0, r0	@ int
	vcvt.f32.s32	s0, s0, #11
	bx	lr
	.size	foo, .-foo
	.align	2
	.global	foa_unsigned
	.type	foa_unsigned, %function
foa_unsigned:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	fmsr	s0, r0	@ int
	vcvt.f32.u32	s0, s0, #5
	bx	lr
	.size	foa_unsigned, .-foa_unsigned
	.align	2
	.global	foo1
	.type	foo1, %function

rather than (before the patch):
	.type	foo, %function
foo:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	fmsr	s15, r0	@ int
	fsitos	s0, s15
	flds	s15, .L2
	fmuls 	s0, s0, s15
	bx	lr
.L3:
	.align	2
.L2:
	.word	973078528
	.size	foo, .-foo
	.align	2
	.global	foa_unsigned
	.type	foa_unsigned, %function
foa_unsigned:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	fmsr	s15, r0	@ int
	fuitos	 s0, s15
	flds	s15, .L5
	fmuls 	s0, s0, s15
	bx	lr
.L6:
	.align	2
.L5:
	.word	1023410176
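
For reference, the two literal-pool words in the old sequences are the
single-precision bit patterns of the reciprocals being multiplied by.
The short C sketch below (not part of the patch; word_to_float is a
made-up name) reinterprets such a word as a float. It assumes float is
IEEE 754 binary32 and that integers and floats share the same byte
order, which holds on the usual targets.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret a 32-bit literal-pool word as an
   IEEE 754 single-precision value.  memcpy avoids aliasing problems.  */
float
word_to_float (uint32_t word)
{
  float f;
  memcpy (&f, &word, sizeof f);
  return f;
}
```

With this, 973078528 (0x3A000000) decodes to 1/2048, matching the #11 in
foo, and 1023410176 (0x3D000000) decodes to 1/32, matching the #5 in
foa_unsigned.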
