[(3/7)] Widening multiply-and-accumulate pattern matching

Message ID 4E09B142.4020402@codesourcery.com
State New
Headers show

Commit Message

Andrew Stubbs June 28, 2011, 10:47 a.m.
On 24/06/11 16:47, Richard Guenther wrote:
>> I can certainly add checks to make sure that the skipped operations
>> >  actually don't make any important changes to the value, but do I need to?
> Yes.

OK, how about this patch?

I've added checks to make sure the value is not truncated at any point.

I've also changed the test cases to address Janis' comments.

Andrew

Comments

Richard Biener June 28, 2011, 12:03 p.m. | #1
On Tue, Jun 28, 2011 at 12:47 PM, Andrew Stubbs <andrew.stubbs@gmail.com> wrote:
> On 24/06/11 16:47, Richard Guenther wrote:
>>>
>>> I can certainly add checks to make sure that the skipped operations
>>> >  actually don't make any important changes to the value, but do I need
>>> > to?
>>
>> Yes.
>
> OK, how about this patch?

I'd name the predicate value_preserving_conversion_p which I think
is what you mean.  harmless isn't really descriptive.

Note that you include non-value-preserving conversions, namely
int -> unsigned int.  Don't dispatch to useless_type_conversion_p,
it's easy to enumerate which conversions are value-preserving.

Don't try to match the tree_ssa_useless_* set of functions, instead
put the value_preserving_conversion_p predicate in tree.[ch] and
a suitable function using it in tree-ssa-math-opts.c.

Thanks,
Richard.

> I've added checks to make sure the value is not truncated at any point.
>
> I've also changed the test cases to address Janis' comments.
>
> Andrew
>
Michael Matz June 28, 2011, 3:53 p.m. | #2
Hi,

On Tue, 28 Jun 2011, Richard Guenther wrote:

> I'd name the predicate value_preserving_conversion_p which I think is 
> what you mean.  harmless isn't really descriptive.
> 
> Note that you include non-value-preserving conversions, namely int -> 
> unsigned int.

It seems that Andrew really does want to accept them.  If so 
value_preserving_conversion_p would be the wrong name.  It seems to me he 
wants to accept those conversions that make it possible to retrieve the 
old value, i.e. when "T1 x; (T1)(T2)x == x", then T1->T2 has the 
to-be-named property.  bits_preserving?  Hmm.


Ciao,
Michael.
Andrew Stubbs June 28, 2011, 4:14 p.m. | #3
On 28/06/11 16:53, Michael Matz wrote:
> On Tue, 28 Jun 2011, Richard Guenther wrote:
>> I'd name the predicate value_preserving_conversion_p which I think is
>> what you mean.  harmless isn't really descriptive.
>>
>> Note that you include non-value-preserving conversions, namely int ->
>> unsigned int.
>
> It seems that Andrew really does want to accept them.  If so
> value_preserving_conversion_p would be the wrong name.  It seems to me he
> wants to accept those conversions that make it possible to retrieve the
> old value, i.e. when "T1 x; (T1)(T2)x == x", then T1->T2 has the
> to-be-named property.  bits_preserving?  Hmm.

What I want (and I'm not totally clear on what this actually means) is 
to be able to optimize all the cases where the end result will be the 
same as the compiler produces now (using multiple multiply, shift, and 
add operations).

Ok, so that's an obvious statement, but the point is that, right now, 
the compiler does nothing special when you cast from int -> unsigned 
int, or vice-versa, and I want to capture that somehow. There are some 
exceptions, I'm sure, but what are they?

What is clear is that I don't want to just assume that casting from one 
signedness to the other is a show-stopper.

For example:

   unsigned long long
   foo (unsigned long long a, unsigned char b, unsigned char c)
   {
     return a + b * c;
   }

This appears to be entirely unsigned maths with plenty of spare 
precision, and therefore a dead cert for any SI->DI 
multiply-and-accumulate instruction, but not so - it is represented 
internally as:

   signed int tmp = (signed int)a * (signed int)b;
   unsigned long long result = a + (unsigned long long)tmp;

Notice the unexpected signed int in the middle! I need to be able to get 
past that to optimize this properly.

I've tried various test cases in which I cast signedness and mode around 
a bit, and so far it appear to perform safely, but probably I'm not be 
cunning enough.

Andrew
Michael Matz June 28, 2011, 4:37 p.m. | #4
Hi,

On Tue, 28 Jun 2011, Andrew Stubbs wrote:

> What I want (and I'm not totally clear on what this actually means) is 
> to be able to optimize all the cases where the end result will be the 
> same as the compiler produces now (using multiple multiply, shift, and 
> add operations).

Okay, then you really want to look through value-preserving conversions.

> Ok, so that's an obvious statement, but the point is that, right now, 
> the compiler does nothing special when you cast from int -> unsigned 
> int, or vice-versa, and I want to capture that somehow. There are some 
> exceptions, I'm sure, but what are they?

Same-sized signed <-> unsigned conversions aren't value preserving:
  unsigned char c = 255; (signed char)c == -1; 255 != -1
unsigned -> larger sized signed is value preserving
  unsigned char c = 255; (signed short)c == 255;
signed -> unsigned never is value preserving

> multiply-and-accumulate instruction, but not so - it is represented 
> internally as:
> 
>   signed int tmp = (signed int)a * (signed int)b;
>   unsigned long long result = a + (unsigned long long)tmp;
> 
> Notice the unexpected signed int in the middle!

Yeah, the C standard requires this.

> I need to be able to get past that to optimize this properly.

Then you're lucky because unsigned char -> signed int is an embedding, 
hence value preserving.  I thought we had a predicate for such conversions 
already, but seems I was wrong.  So, create it as Richi said, but 
enumerate explicitely the cases you want to handle, and include only those 
that really are value preserving.


Ciao,
Michael.
Andrew Stubbs July 1, 2011, 11:58 a.m. | #5
On 28/06/11 17:37, Michael Matz wrote:
>> What I want (and I'm not totally clear on what this actually means) is
>> >  to be able to optimize all the cases where the end result will be the
>> >  same as the compiler produces now (using multiple multiply, shift, and
>> >  add operations).
> Okay, then you really want to look through value-preserving conversions.
>
>> >  Ok, so that's an obvious statement, but the point is that, right now,
>> >  the compiler does nothing special when you cast from int ->  unsigned
>> >  int, or vice-versa, and I want to capture that somehow. There are some
>> >  exceptions, I'm sure, but what are they?
> Same-sized signed<->  unsigned conversions aren't value preserving:
>    unsigned char c = 255; (signed char)c == -1; 255 != -1
> unsigned ->  larger sized signed is value preserving
>    unsigned char c = 255; (signed short)c == 255;
> signed ->  unsigned never is value preserving

OK, so I've tried implementing this, and I find I hit against a problem:

Given this test case:

   unsigned long long
   foo (unsigned long long a, signed char *b, signed char *c)
   {
     return a + *b * *c;
   }

Those rules say that it should not be suitable for optimization because 
there's an implicit cast from signed int to unsigned long long.

Without any widening multiplications allowed, GCC gives this code (for ARM):

   ldrsb   r2, [r2, #0]
   ldrsb   r3, [r3, #0]
   mul     r2, r2, r3
   adds    r0, r0, r2
   adc     r1, r1, r2, asr #31

This is exactly what a signed widening multiply-and-accumulate with 
smlalbb would have done!

OK, so the types in the testcase are a bit contrived, but my point is 
that I want to be able to use the widening-mult instructions everywhere 
that they would produce the same output and gcc would otherwise, and gcc 
just doesn't seem that interested in signed<->unsigned conversions.

So, I'm happy to put in checks to ensure that truncations are not 
ignore, but I'm really not sure what's the right thing to do with the 
extends and signedness switches.

Any suggestions?

Andrew
Richard Biener July 1, 2011, 12:25 p.m. | #6
On Fri, Jul 1, 2011 at 1:58 PM, Stubbs, Andrew <Andrew_Stubbs@mentor.com> wrote:
> On 28/06/11 17:37, Michael Matz wrote:
>>> What I want (and I'm not totally clear on what this actually means) is
>>> >  to be able to optimize all the cases where the end result will be the
>>> >  same as the compiler produces now (using multiple multiply, shift, and
>>> >  add operations).
>> Okay, then you really want to look through value-preserving conversions.
>>
>>> >  Ok, so that's an obvious statement, but the point is that, right now,
>>> >  the compiler does nothing special when you cast from int ->  unsigned
>>> >  int, or vice-versa, and I want to capture that somehow. There are some
>>> >  exceptions, I'm sure, but what are they?
>> Same-sized signed<->  unsigned conversions aren't value preserving:
>>    unsigned char c = 255; (signed char)c == -1; 255 != -1
>> unsigned ->  larger sized signed is value preserving
>>    unsigned char c = 255; (signed short)c == 255;
>> signed ->  unsigned never is value preserving
>
> OK, so I've tried implementing this, and I find I hit against a problem:
>
> Given this test case:
>
>   unsigned long long
>   foo (unsigned long long a, signed char *b, signed char *c)
>   {
>     return a + *b * *c;
>   }
>
> Those rules say that it should not be suitable for optimization because
> there's an implicit cast from signed int to unsigned long long.
>
> Without any widening multiplications allowed, GCC gives this code (for ARM):
>
>   ldrsb   r2, [r2, #0]
>   ldrsb   r3, [r3, #0]
>   mul     r2, r2, r3
>   adds    r0, r0, r2
>   adc     r1, r1, r2, asr #31
>
> This is exactly what a signed widening multiply-and-accumulate with
> smlalbb would have done!
>
> OK, so the types in the testcase are a bit contrived, but my point is
> that I want to be able to use the widening-mult instructions everywhere
> that they would produce the same output and gcc would otherwise, and gcc
> just doesn't seem that interested in signed<->unsigned conversions.
>
> So, I'm happy to put in checks to ensure that truncations are not
> ignore, but I'm really not sure what's the right thing to do with the
> extends and signedness switches.
>
> Any suggestions?

Well - some operations work the same on both signedness if you
just care about the twos-complement result.  This includes
multiplication (but not for example division).  For this special
case I suggest to not bother trying to invent a generic predicate
but do something local in tree-ssa-math-opts.c.

Richard.

> Andrew
>
Paolo Bonzini July 1, 2011, 12:33 p.m. | #7
On 07/01/2011 01:58 PM, Stubbs, Andrew wrote:
> Given this test case:
>
>     unsigned long long
>     foo (unsigned long long a, signed char *b, signed char *c)
>     {
>       return a + *b * *c;
>     }
>
> Those rules say that it should not be suitable for optimization because
> there's an implicit cast from signed int to unsigned long long.

Got it now!  Casts from signed to unsigned are not value-preserving, but 
they are "bit-preserving": s32->s64 obviously is, and s32->u64 has the 
same result bit-by-bit as the s64 result.  The fact that s64 has an 
implicit 1111... in front, while an u64 has an implicit 0000... does not 
matter.

Is this the meaning of the predicate you want?  I think so, based on the 
discussion, but it's hard to say without seeing the cases enumerated 
(i.e. a patch).

However, perhaps there is a catch.  We can do the following thought 
experiment.  What would happen if you had multiple widening multiplies? 
  Like 8-bit signed to 64-bit unsigned and then 64-bit unsigned to 
128-bit unsigned?  I believe in this case you couldn't optimize 8-bit 
signed to 128-bit unsigned.  Would your code do it?

Paolo
Andrew Stubbs July 1, 2011, 1:30 p.m. | #8
On 01/07/11 13:33, Paolo Bonzini wrote:
> Got it now! Casts from signed to unsigned are not value-preserving, but
> they are "bit-preserving": s32->s64 obviously is, and s32->u64 has the
> same result bit-by-bit as the s64 result. The fact that s64 has an
> implicit 1111... in front, while an u64 has an implicit 0000... does not
> matter.

But, the 1111... and 0000... are not implicit. They are very real, and 
if applied incorrectly will change the result, I think.

> Is this the meaning of the predicate you want? I think so, based on the
> discussion, but it's hard to say without seeing the cases enumerated
> (i.e. a patch).

The purpose of this predicate is to determine whether any type 
conversions that occur between the output of a widening multiply, and 
the input of an addition have any bearing on the end result.

We know what the effective output type of the multiply is (the size is 
2x the input type, and the signed if either one of the inputs in 
signed), and we know what the input type of the addition is, but any 
amount of junk can lie in between. The problem is determining if it *is* 
junk.

In an ideal world there would only be two cases to consider:

   1. No conversion needed.

   2. A single sign-extend or zero-extend (according to the type of the 
inputs) to match the input size of the addition.

Anything else would be unsuitable for optimization. Of course, it's 
never that simple, but it should still be possible to boil down a list 
of conversions to one of these cases, if it's valid.

The signedness of the input to the addition is not significant - the 
code would be the same either way. But, I is important not to try to 
zero-extend something that started out signed, and not to sign-extend 
something that started out unsigned.

> However, perhaps there is a catch. We can do the following thought
> experiment. What would happen if you had multiple widening multiplies?
> Like 8-bit signed to 64-bit unsigned and then 64-bit unsigned to 128-bit
> unsigned? I believe in this case you couldn't optimize 8-bit signed to
> 128-bit unsigned. Would your code do it?

My code does not attempt to combine multiple multiplies. In any case, if 
you have two multiplications, surely you have at least three input 
values, so they can't be combined?

It does attempt to combine a multiply and an addition, where a suitable 
madd* insn is available. (This is not new; I'm just trying to do it in 
more cases.)

I have considered the case where you have "(a * b) + (c * d)", but have 
not yet coded anything for it. At present, the code will simply choose 
whichever multiply happens to find itself the first input operand of the 
plus, and ignores the other, even if the first turns out not to be a 
suitable candidate.

Andrew
Paolo Bonzini July 1, 2011, 2:40 p.m. | #9
On 07/01/2011 03:30 PM, Stubbs, Andrew wrote:
>> >  However, perhaps there is a catch. We can do the following thought
>> >  experiment. What would happen if you had multiple widening multiplies?
>> >  Like 8-bit signed to 64-bit unsigned and then 64-bit unsigned to 128-bit
>> >  unsigned? I believe in this case you couldn't optimize 8-bit signed to
>> >  128-bit unsigned. Would your code do it?
> My code does not attempt to combine multiple multiplies. In any case, if
> you have two multiplications, surely you have at least three input
> values, so they can't be combined?

What about (u128)c + (u64)((s8)a * (s8)b)?  You cannot convert this to 
(u128)c + (u128)((s8)a * (s8)b).

Paolo
Andrew Stubbs July 1, 2011, 2:55 p.m. | #10
On 01/07/11 15:40, Paolo Bonzini wrote:
> On 07/01/2011 03:30 PM, Stubbs, Andrew wrote:
>>> > However, perhaps there is a catch. We can do the following thought
>>> > experiment. What would happen if you had multiple widening multiplies?
>>> > Like 8-bit signed to 64-bit unsigned and then 64-bit unsigned to
>>> 128-bit
>>> > unsigned? I believe in this case you couldn't optimize 8-bit signed to
>>> > 128-bit unsigned. Would your code do it?
>> My code does not attempt to combine multiple multiplies. In any case, if
>> you have two multiplications, surely you have at least three input
>> values, so they can't be combined?
>
> What about (u128)c + (u64)((s8)a * (s8)b)? You cannot convert this to
> (u128)c + (u128)((s8)a * (s8)b).

Oh I see, sorry. Yes, that's exactly what I'm trying to do here.

No, wait, I don't see. Where are these multiple widening multiplies 
you're talking about? I only see one multiply?

Andrew
Andrew Stubbs July 1, 2011, 3:09 p.m. | #11
On 01/07/11 14:30, Stubbs, Andrew wrote:
>> Got it now! Casts from signed to unsigned are not value-preserving, but
>> >  they are "bit-preserving": s32->s64 obviously is, and s32->u64 has the
>> >  same result bit-by-bit as the s64 result. The fact that s64 has an
>> >  implicit 1111... in front, while an u64 has an implicit 0000... does not
>> >  matter.
> But, the 1111... and 0000... are not implicit. They are very real, and
> if applied incorrectly will change the result, I think.

Wait, I'm clearly confused ....

When I try a s32->u64 conversion, the expand pass generates a 
sign_extend insn.

Clearly it's the source type that determines the extension type, not the 
destination type ... and I'm a dunce!

Thanks :)

Andrew
Paolo Bonzini July 1, 2011, 3:54 p.m. | #12
On 07/01/2011 04:55 PM, Stubbs, Andrew wrote:
>> >
>> >  What about (u128)c + (u64)((s8)a * (s8)b)? You cannot convert this to
>> >  (u128)c + (u128)((s8)a * (s8)b).
> Oh I see, sorry. Yes, that's exactly what I'm trying to do here.
>
> No, wait, I don't see. Where are these multiple widening multiplies
> you're talking about? I only see one multiply?

I meant one multiplication with multiple widening steps.  Not clear at 
all, sorry.

Paolo
Bernd Schmidt July 1, 2011, 4:40 p.m. | #13
On 06/28/11 18:14, Andrew Stubbs wrote:

>   unsigned long long
>   foo (unsigned long long a, unsigned char b, unsigned char c)
>   {
>     return a + b * c;
>   }
> 
> This appears to be entirely unsigned maths with plenty of spare
> precision, and therefore a dead cert for any SI->DI
> multiply-and-accumulate instruction, but not so - it is represented
> internally as:
> 
>   signed int tmp = (signed int)a * (signed int)b;
>   unsigned long long result = a + (unsigned long long)tmp;
> 
> Notice the unexpected signed int in the middle! I need to be able to get
> past that to optimize this properly.

Since both inputs are positive in a signed int (they must be, being cast
from a smaller unsigned value), you can infer that it does not matter
whether you treat the result of the multiplication as a signed or an
unsigned value. It is positive in any case.

So, I think the thing to test is: if the accumulate step requires
widening the result of the multiplication, either the cast must be value
preserving (widening unsigned to signed), or you must be able to prove
that the multiplication produces a positive result.

If the accumulate step just casts the multiplication result from signed
to unsigned, keeping the precision the same, you can ignore the cast
since the addition is unaffected by it.


Bernd
Andrew Stubbs July 1, 2011, 6:17 p.m. | #14
On 01/07/11 16:54, Paolo Bonzini wrote:
> On 07/01/2011 04:55 PM, Stubbs, Andrew wrote:
>>> >
>>> > What about (u128)c + (u64)((s8)a * (s8)b)? You cannot convert this to
>>> > (u128)c + (u128)((s8)a * (s8)b).
>> Oh I see, sorry. Yes, that's exactly what I'm trying to do here.
>>
>> No, wait, I don't see. Where are these multiple widening multiplies
>> you're talking about? I only see one multiply?
>
> I meant one multiplication with multiple widening steps. Not clear at
> all, sorry.

Yes, I see now, the whole purpose of my patch set is widening by more 
than one mode.

The case of the multiply-and-accumulate is the only way there can be 
more than one step though. Widening multiplies themselves are always 
handled as one unit.

Andrew

Patch

2011-06-28  Andrew Stubbs  <ams@codesourcery.com>

	gcc/
	* gimple.h (tree_ssa_harmless_type_conversion): New prototype.
	(tree_ssa_strip_harmless_type_conversions): New prototype.
	(harmless_type_conversion_p): New prototype.
	* tree-ssa-math-opts.c (convert_plusminus_to_widen): Look for
	multiply statement beyond no-op conversion statements.
	* tree-ssa.c (harmless_type_conversion_p): New function.
	(tree_ssa_harmless_type_conversion): New function.
	(tree_ssa_strip_harmless_type_conversions): New function.

	gcc/testsuite/
	* gcc.target/arm/wmul-5.c: New file.
	* gcc.target/arm/no-wmla-1.c: New file.

--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -1090,8 +1090,11 @@  extern bool validate_gimple_arglist (const_gimple, ...);
 
 /* In tree-ssa.c  */
 extern bool tree_ssa_useless_type_conversion (tree);
+extern bool tree_ssa_harmless_type_conversion (tree);
 extern tree tree_ssa_strip_useless_type_conversions (tree);
+extern tree tree_ssa_strip_harmless_type_conversions (tree);
 extern bool useless_type_conversion_p (tree, tree);
+extern bool harmless_type_conversion_p (tree, tree);
 extern bool types_compatible_p (tree, tree);
 
 /* Return the code for GIMPLE statement G.  */
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/no-wmla-1.c
@@ -0,0 +1,11 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+int
+foo (int a, short b, short c)
+{
+     int bc = b * c;
+        return a + (short)bc;
+}
+
+/* { dg-final { scan-assembler "mul" } } */
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/wmul-5.c
@@ -0,0 +1,10 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-a" } */
+
+long long
+foo (long long a, char *b, char *c)
+{
+  return a + *b * *c;
+}
+
+/* { dg-final { scan-assembler "umlal" } } */
--- a/gcc/tree-ssa-math-opts.c
+++ b/gcc/tree-ssa-math-opts.c
@@ -2117,23 +2117,19 @@  convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt,
   rhs1 = gimple_assign_rhs1 (stmt);
   rhs2 = gimple_assign_rhs2 (stmt);
 
-  if (TREE_CODE (rhs1) == SSA_NAME)
-    {
-      rhs1_stmt = SSA_NAME_DEF_STMT (rhs1);
-      if (is_gimple_assign (rhs1_stmt))
-	rhs1_code = gimple_assign_rhs_code (rhs1_stmt);
-    }
-  else
+  if (TREE_CODE (rhs1) != SSA_NAME
+      || TREE_CODE (rhs2) != SSA_NAME)
     return false;
 
-  if (TREE_CODE (rhs2) == SSA_NAME)
-    {
-      rhs2_stmt = SSA_NAME_DEF_STMT (rhs2);
-      if (is_gimple_assign (rhs2_stmt))
-	rhs2_code = gimple_assign_rhs_code (rhs2_stmt);
-    }
-  else
-    return false;
+  rhs1 = tree_ssa_strip_harmless_type_conversions (rhs1);
+  rhs1_stmt = SSA_NAME_DEF_STMT (rhs1);
+  if (is_gimple_assign (rhs1_stmt))
+    rhs1_code = gimple_assign_rhs_code (rhs1_stmt);
+
+  rhs2 = tree_ssa_strip_harmless_type_conversions(rhs2);
+  rhs2_stmt = SSA_NAME_DEF_STMT (rhs2);
+  if (is_gimple_assign (rhs2_stmt))
+    rhs2_code = gimple_assign_rhs_code (rhs2_stmt);
 
   if (code == PLUS_EXPR && rhs1_code == MULT_EXPR)
     {
--- a/gcc/tree-ssa.c
+++ b/gcc/tree-ssa.c
@@ -1484,6 +1484,33 @@  useless_type_conversion_p (tree outer_type, tree inner_type)
   return false;
 }
 
+/* Return true if the conversion from INNER_TYPE to OUTER_TYPE will
+   not alter the arithmetic meaning of a type, otherwise return false.
+
+   For example, widening an integer type leaves the value unchanged,
+   but narrowing an integer type can cause truncation.
+
+   Note that switching between signed and unsigned modes doesn't change
+   the underlying representation, and so is harmless.
+
+   This function is not yet a complete definition of what is harmless
+   but should reject everything that is not.  */
+
+bool
+harmless_type_conversion_p (tree outer_type, tree inner_type)
+{
+  /* If it's useless, it's also harmless.  */
+  if (useless_type_conversion_p (outer_type, inner_type))
+    return true;
+
+  if (INTEGRAL_TYPE_P (inner_type)
+      && INTEGRAL_TYPE_P (outer_type)
+      && TYPE_PRECISION (inner_type) <= TYPE_PRECISION (outer_type))
+    return true;
+
+  return false;
+}
+
 /* Return true if a conversion from either type of TYPE1 and TYPE2
    to the other is not required.  Otherwise return false.  */
 
@@ -1515,6 +1542,29 @@  tree_ssa_useless_type_conversion (tree expr)
   return false;
 }
 
+/* Return true if EXPR is a harmless type conversion, otherwise return
+   false.  */
+
+bool
+tree_ssa_harmless_type_conversion (tree expr)
+{
+  gimple stmt;
+
+  if (TREE_CODE (expr) != SSA_NAME)
+    return false;
+
+  stmt = SSA_NAME_DEF_STMT (expr);
+
+  if (!is_gimple_assign (stmt))
+    return false;
+
+  if (!CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (stmt)))
+    return false;
+
+  return harmless_type_conversion_p (TREE_TYPE (gimple_assign_lhs (stmt)),
+				     TREE_TYPE (gimple_assign_rhs1 (stmt)));
+}
+
 /* Strip conversions from EXP according to
    tree_ssa_useless_type_conversion and return the resulting
    expression.  */
@@ -1527,6 +1577,18 @@  tree_ssa_strip_useless_type_conversions (tree exp)
   return exp;
 }
 
+/* Strip conversions from EXP according to
+   tree_ssa_harmless_type_conversion and return the resulting
+   expression.  */
+
+tree
+tree_ssa_strip_harmless_type_conversions (tree exp)
+{
+  while (tree_ssa_harmless_type_conversion (exp))
+    exp = gimple_assign_rhs1 (SSA_NAME_DEF_STMT (exp));
+  return exp;
+}
+
 
 /* Internal helper for walk_use_def_chains.  VAR, FN and DATA are as
    described in walk_use_def_chains.