Message ID | 4E09B142.4020402@codesourcery.com |
---|---|
State | New |
Headers | show |
On Tue, Jun 28, 2011 at 12:47 PM, Andrew Stubbs <andrew.stubbs@gmail.com> wrote: > On 24/06/11 16:47, Richard Guenther wrote: >>> >>> I can certainly add checks to make sure that the skipped operations >>> > actually don't make any important changes to the value, but do I need >>> > to? >> >> Yes. > > OK, how about this patch? I'd name the predicate value_preserving_conversion_p which I think is what you mean. harmless isn't really descriptive. Note that you include non-value-preserving conversions, namely int -> unsigned int. Don't dispatch to useless_type_conversion_p, it's easy to enumerate which conversions are value-preserving. Don't try to match the tree_ssa_useless_* set of functions, instead put the value_preserving_conversion_p predicate in tree.[ch] and a suitable function using it in tree-ssa-math-opts.c. Thanks, Richard. > I've added checks to make sure the value is not truncated at any point. > > I've also changed the test cases to address Janis' comments. > > Andrew >
Hi, On Tue, 28 Jun 2011, Richard Guenther wrote: > I'd name the predicate value_preserving_conversion_p which I think is > what you mean. harmless isn't really descriptive. > > Note that you include non-value-preserving conversions, namely int -> > unsigned int. It seems that Andrew really does want to accept them. If so value_preserving_conversion_p would be the wrong name. It seems to me he wants to accept those conversions that make it possible to retrieve the old value, i.e. when "T1 x; (T1)(T2)x == x", then T1->T2 has the to-be-named property. bits_preserving? Hmm. Ciao, Michael.
On 28/06/11 16:53, Michael Matz wrote: > On Tue, 28 Jun 2011, Richard Guenther wrote: >> I'd name the predicate value_preserving_conversion_p which I think is >> what you mean. harmless isn't really descriptive. >> >> Note that you include non-value-preserving conversions, namely int -> >> unsigned int. > > It seems that Andrew really does want to accept them. If so > value_preserving_conversion_p would be the wrong name. It seems to me he > wants to accept those conversions that make it possible to retrieve the > old value, i.e. when "T1 x; (T1)(T2)x == x", then T1->T2 has the > to-be-named property. bits_preserving? Hmm. What I want (and I'm not totally clear on what this actually means) is to be able to optimize all the cases where the end result will be the same as the compiler produces now (using multiple multiply, shift, and add operations). Ok, so that's an obvious statement, but the point is that, right now, the compiler does nothing special when you cast from int -> unsigned int, or vice-versa, and I want to capture that somehow. There are some exceptions, I'm sure, but what are they? What is clear is that I don't want to just assume that casting from one signedness to the other is a show-stopper. For example: unsigned long long foo (unsigned long long a, unsigned char b, unsigned char c) { return a + b * c; } This appears to be entirely unsigned maths with plenty of spare precision, and therefore a dead cert for any SI->DI multiply-and-accumulate instruction, but not so - it is represented internally as: signed int tmp = (signed int)a * (signed int)b; unsigned long long result = a + (unsigned long long)tmp; Notice the unexpected signed int in the middle! I need to be able to get past that to optimize this properly. I've tried various test cases in which I cast signedness and mode around a bit, and so far it appear to perform safely, but probably I'm not be cunning enough. Andrew
Hi, On Tue, 28 Jun 2011, Andrew Stubbs wrote: > What I want (and I'm not totally clear on what this actually means) is > to be able to optimize all the cases where the end result will be the > same as the compiler produces now (using multiple multiply, shift, and > add operations). Okay, then you really want to look through value-preserving conversions. > Ok, so that's an obvious statement, but the point is that, right now, > the compiler does nothing special when you cast from int -> unsigned > int, or vice-versa, and I want to capture that somehow. There are some > exceptions, I'm sure, but what are they? Same-sized signed <-> unsigned conversions aren't value preserving: unsigned char c = 255; (signed char)c == -1; 255 != -1 unsigned -> larger sized signed is value preserving unsigned char c = 255; (signed short)c == 255; signed -> unsigned never is value preserving > multiply-and-accumulate instruction, but not so - it is represented > internally as: > > signed int tmp = (signed int)a * (signed int)b; > unsigned long long result = a + (unsigned long long)tmp; > > Notice the unexpected signed int in the middle! Yeah, the C standard requires this. > I need to be able to get past that to optimize this properly. Then you're lucky because unsigned char -> signed int is an embedding, hence value preserving. I thought we had a predicate for such conversions already, but seems I was wrong. So, create it as Richi said, but enumerate explicitely the cases you want to handle, and include only those that really are value preserving. Ciao, Michael.
On 28/06/11 17:37, Michael Matz wrote: >> What I want (and I'm not totally clear on what this actually means) is >> > to be able to optimize all the cases where the end result will be the >> > same as the compiler produces now (using multiple multiply, shift, and >> > add operations). > Okay, then you really want to look through value-preserving conversions. > >> > Ok, so that's an obvious statement, but the point is that, right now, >> > the compiler does nothing special when you cast from int -> unsigned >> > int, or vice-versa, and I want to capture that somehow. There are some >> > exceptions, I'm sure, but what are they? > Same-sized signed<-> unsigned conversions aren't value preserving: > unsigned char c = 255; (signed char)c == -1; 255 != -1 > unsigned -> larger sized signed is value preserving > unsigned char c = 255; (signed short)c == 255; > signed -> unsigned never is value preserving OK, so I've tried implementing this, and I find I hit against a problem: Given this test case: unsigned long long foo (unsigned long long a, signed char *b, signed char *c) { return a + *b * *c; } Those rules say that it should not be suitable for optimization because there's an implicit cast from signed int to unsigned long long. Without any widening multiplications allowed, GCC gives this code (for ARM): ldrsb r2, [r2, #0] ldrsb r3, [r3, #0] mul r2, r2, r3 adds r0, r0, r2 adc r1, r1, r2, asr #31 This is exactly what a signed widening multiply-and-accumulate with smlalbb would have done! OK, so the types in the testcase are a bit contrived, but my point is that I want to be able to use the widening-mult instructions everywhere that they would produce the same output and gcc would otherwise, and gcc just doesn't seem that interested in signed<->unsigned conversions. So, I'm happy to put in checks to ensure that truncations are not ignore, but I'm really not sure what's the right thing to do with the extends and signedness switches. Any suggestions? Andrew
On Fri, Jul 1, 2011 at 1:58 PM, Stubbs, Andrew <Andrew_Stubbs@mentor.com> wrote: > On 28/06/11 17:37, Michael Matz wrote: >>> What I want (and I'm not totally clear on what this actually means) is >>> > to be able to optimize all the cases where the end result will be the >>> > same as the compiler produces now (using multiple multiply, shift, and >>> > add operations). >> Okay, then you really want to look through value-preserving conversions. >> >>> > Ok, so that's an obvious statement, but the point is that, right now, >>> > the compiler does nothing special when you cast from int -> unsigned >>> > int, or vice-versa, and I want to capture that somehow. There are some >>> > exceptions, I'm sure, but what are they? >> Same-sized signed<-> unsigned conversions aren't value preserving: >> unsigned char c = 255; (signed char)c == -1; 255 != -1 >> unsigned -> larger sized signed is value preserving >> unsigned char c = 255; (signed short)c == 255; >> signed -> unsigned never is value preserving > > OK, so I've tried implementing this, and I find I hit against a problem: > > Given this test case: > > unsigned long long > foo (unsigned long long a, signed char *b, signed char *c) > { > return a + *b * *c; > } > > Those rules say that it should not be suitable for optimization because > there's an implicit cast from signed int to unsigned long long. > > Without any widening multiplications allowed, GCC gives this code (for ARM): > > ldrsb r2, [r2, #0] > ldrsb r3, [r3, #0] > mul r2, r2, r3 > adds r0, r0, r2 > adc r1, r1, r2, asr #31 > > This is exactly what a signed widening multiply-and-accumulate with > smlalbb would have done! > > OK, so the types in the testcase are a bit contrived, but my point is > that I want to be able to use the widening-mult instructions everywhere > that they would produce the same output and gcc would otherwise, and gcc > just doesn't seem that interested in signed<->unsigned conversions. > > So, I'm happy to put in checks to ensure that truncations are not > ignore, but I'm really not sure what's the right thing to do with the > extends and signedness switches. > > Any suggestions? Well - some operations work the same on both signedness if you just care about the twos-complement result. This includes multiplication (but not for example division). For this special case I suggest to not bother trying to invent a generic predicate but do something local in tree-ssa-math-opts.c. Richard. > Andrew >
On 07/01/2011 01:58 PM, Stubbs, Andrew wrote: > Given this test case: > > unsigned long long > foo (unsigned long long a, signed char *b, signed char *c) > { > return a + *b * *c; > } > > Those rules say that it should not be suitable for optimization because > there's an implicit cast from signed int to unsigned long long. Got it now! Casts from signed to unsigned are not value-preserving, but they are "bit-preserving": s32->s64 obviously is, and s32->u64 has the same result bit-by-bit as the s64 result. The fact that s64 has an implicit 1111... in front, while an u64 has an implicit 0000... does not matter. Is this the meaning of the predicate you want? I think so, based on the discussion, but it's hard to say without seeing the cases enumerated (i.e. a patch). However, perhaps there is a catch. We can do the following thought experiment. What would happen if you had multiple widening multiplies? Like 8-bit signed to 64-bit unsigned and then 64-bit unsigned to 128-bit unsigned? I believe in this case you couldn't optimize 8-bit signed to 128-bit unsigned. Would your code do it? Paolo
On 01/07/11 13:33, Paolo Bonzini wrote: > Got it now! Casts from signed to unsigned are not value-preserving, but > they are "bit-preserving": s32->s64 obviously is, and s32->u64 has the > same result bit-by-bit as the s64 result. The fact that s64 has an > implicit 1111... in front, while an u64 has an implicit 0000... does not > matter. But, the 1111... and 0000... are not implicit. They are very real, and if applied incorrectly will change the result, I think. > Is this the meaning of the predicate you want? I think so, based on the > discussion, but it's hard to say without seeing the cases enumerated > (i.e. a patch). The purpose of this predicate is to determine whether any type conversions that occur between the output of a widening multiply, and the input of an addition have any bearing on the end result. We know what the effective output type of the multiply is (the size is 2x the input type, and the signed if either one of the inputs in signed), and we know what the input type of the addition is, but any amount of junk can lie in between. The problem is determining if it *is* junk. In an ideal world there would only be two cases to consider: 1. No conversion needed. 2. A single sign-extend or zero-extend (according to the type of the inputs) to match the input size of the addition. Anything else would be unsuitable for optimization. Of course, it's never that simple, but it should still be possible to boil down a list of conversions to one of these cases, if it's valid. The signedness of the input to the addition is not significant - the code would be the same either way. But, I is important not to try to zero-extend something that started out signed, and not to sign-extend something that started out unsigned. > However, perhaps there is a catch. We can do the following thought > experiment. What would happen if you had multiple widening multiplies? > Like 8-bit signed to 64-bit unsigned and then 64-bit unsigned to 128-bit > unsigned? I believe in this case you couldn't optimize 8-bit signed to > 128-bit unsigned. Would your code do it? My code does not attempt to combine multiple multiplies. In any case, if you have two multiplications, surely you have at least three input values, so they can't be combined? It does attempt to combine a multiply and an addition, where a suitable madd* insn is available. (This is not new; I'm just trying to do it in more cases.) I have considered the case where you have "(a * b) + (c * d)", but have not yet coded anything for it. At present, the code will simply choose whichever multiply happens to find itself the first input operand of the plus, and ignores the other, even if the first turns out not to be a suitable candidate. Andrew
On 07/01/2011 03:30 PM, Stubbs, Andrew wrote: >> > However, perhaps there is a catch. We can do the following thought >> > experiment. What would happen if you had multiple widening multiplies? >> > Like 8-bit signed to 64-bit unsigned and then 64-bit unsigned to 128-bit >> > unsigned? I believe in this case you couldn't optimize 8-bit signed to >> > 128-bit unsigned. Would your code do it? > My code does not attempt to combine multiple multiplies. In any case, if > you have two multiplications, surely you have at least three input > values, so they can't be combined? What about (u128)c + (u64)((s8)a * (s8)b)? You cannot convert this to (u128)c + (u128)((s8)a * (s8)b). Paolo
On 01/07/11 15:40, Paolo Bonzini wrote: > On 07/01/2011 03:30 PM, Stubbs, Andrew wrote: >>> > However, perhaps there is a catch. We can do the following thought >>> > experiment. What would happen if you had multiple widening multiplies? >>> > Like 8-bit signed to 64-bit unsigned and then 64-bit unsigned to >>> 128-bit >>> > unsigned? I believe in this case you couldn't optimize 8-bit signed to >>> > 128-bit unsigned. Would your code do it? >> My code does not attempt to combine multiple multiplies. In any case, if >> you have two multiplications, surely you have at least three input >> values, so they can't be combined? > > What about (u128)c + (u64)((s8)a * (s8)b)? You cannot convert this to > (u128)c + (u128)((s8)a * (s8)b). Oh I see, sorry. Yes, that's exactly what I'm trying to do here. No, wait, I don't see. Where are these multiple widening multiplies you're talking about? I only see one multiply? Andrew
On 01/07/11 14:30, Stubbs, Andrew wrote: >> Got it now! Casts from signed to unsigned are not value-preserving, but >> > they are "bit-preserving": s32->s64 obviously is, and s32->u64 has the >> > same result bit-by-bit as the s64 result. The fact that s64 has an >> > implicit 1111... in front, while an u64 has an implicit 0000... does not >> > matter. > But, the 1111... and 0000... are not implicit. They are very real, and > if applied incorrectly will change the result, I think. Wait, I'm clearly confused .... When I try a s32->u64 conversion, the expand pass generates a sign_extend insn. Clearly it's the source type that determines the extension type, not the destination type ... and I'm a dunce! Thanks :) Andrew
On 07/01/2011 04:55 PM, Stubbs, Andrew wrote: >> > >> > What about (u128)c + (u64)((s8)a * (s8)b)? You cannot convert this to >> > (u128)c + (u128)((s8)a * (s8)b). > Oh I see, sorry. Yes, that's exactly what I'm trying to do here. > > No, wait, I don't see. Where are these multiple widening multiplies > you're talking about? I only see one multiply? I meant one multiplication with multiple widening steps. Not clear at all, sorry. Paolo
On 06/28/11 18:14, Andrew Stubbs wrote: > unsigned long long > foo (unsigned long long a, unsigned char b, unsigned char c) > { > return a + b * c; > } > > This appears to be entirely unsigned maths with plenty of spare > precision, and therefore a dead cert for any SI->DI > multiply-and-accumulate instruction, but not so - it is represented > internally as: > > signed int tmp = (signed int)a * (signed int)b; > unsigned long long result = a + (unsigned long long)tmp; > > Notice the unexpected signed int in the middle! I need to be able to get > past that to optimize this properly. Since both inputs are positive in a signed int (they must be, being cast from a smaller unsigned value), you can infer that it does not matter whether you treat the result of the multiplication as a signed or an unsigned value. It is positive in any case. So, I think the thing to test is: if the accumulate step requires widening the result of the multiplication, either the cast must be value preserving (widening unsigned to signed), or you must be able to prove that the multiplication produces a positive result. If the accumulate step just casts the multiplication result from signed to unsigned, keeping the precision the same, you can ignore the cast since the addition is unaffected by it. Bernd
On 01/07/11 16:54, Paolo Bonzini wrote: > On 07/01/2011 04:55 PM, Stubbs, Andrew wrote: >>> > >>> > What about (u128)c + (u64)((s8)a * (s8)b)? You cannot convert this to >>> > (u128)c + (u128)((s8)a * (s8)b). >> Oh I see, sorry. Yes, that's exactly what I'm trying to do here. >> >> No, wait, I don't see. Where are these multiple widening multiplies >> you're talking about? I only see one multiply? > > I meant one multiplication with multiple widening steps. Not clear at > all, sorry. Yes, I see now, the whole purpose of my patch set is widening by more than one mode. The case of the multiply-and-accumulate is the only way there can be more than one step though. Widening multiplies themselves are always handled as one unit. Andrew
2011-06-28 Andrew Stubbs <ams@codesourcery.com> gcc/ * gimple.h (tree_ssa_harmless_type_conversion): New prototype. (tree_ssa_strip_harmless_type_conversions): New prototype. (harmless_type_conversion_p): New prototype. * tree-ssa-math-opts.c (convert_plusminus_to_widen): Look for multiply statement beyond no-op conversion statements. * tree-ssa.c (harmless_type_conversion_p): New function. (tree_ssa_harmless_type_conversion): New function. (tree_ssa_strip_harmless_type_conversions): New function. gcc/testsuite/ * gcc.target/arm/wmul-5.c: New file. * gcc.target/arm/no-wmla-1.c: New file. --- a/gcc/gimple.h +++ b/gcc/gimple.h @@ -1090,8 +1090,11 @@ extern bool validate_gimple_arglist (const_gimple, ...); /* In tree-ssa.c */ extern bool tree_ssa_useless_type_conversion (tree); +extern bool tree_ssa_harmless_type_conversion (tree); extern tree tree_ssa_strip_useless_type_conversions (tree); +extern tree tree_ssa_strip_harmless_type_conversions (tree); extern bool useless_type_conversion_p (tree, tree); +extern bool harmless_type_conversion_p (tree, tree); extern bool types_compatible_p (tree, tree); /* Return the code for GIMPLE statement G. */ --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/no-wmla-1.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=armv7-a" } */ + +int +foo (int a, short b, short c) +{ + int bc = b * c; + return a + (short)bc; +} + +/* { dg-final { scan-assembler "mul" } } */ --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/wmul-5.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -march=armv7-a" } */ + +long long +foo (long long a, char *b, char *c) +{ + return a + *b * *c; +} + +/* { dg-final { scan-assembler "umlal" } } */ --- a/gcc/tree-ssa-math-opts.c +++ b/gcc/tree-ssa-math-opts.c @@ -2117,23 +2117,19 @@ convert_plusminus_to_widen (gimple_stmt_iterator *gsi, gimple stmt, rhs1 = gimple_assign_rhs1 (stmt); rhs2 = gimple_assign_rhs2 (stmt); - if (TREE_CODE (rhs1) == SSA_NAME) - { - rhs1_stmt = SSA_NAME_DEF_STMT (rhs1); - if (is_gimple_assign (rhs1_stmt)) - rhs1_code = gimple_assign_rhs_code (rhs1_stmt); - } - else + if (TREE_CODE (rhs1) != SSA_NAME + || TREE_CODE (rhs2) != SSA_NAME) return false; - if (TREE_CODE (rhs2) == SSA_NAME) - { - rhs2_stmt = SSA_NAME_DEF_STMT (rhs2); - if (is_gimple_assign (rhs2_stmt)) - rhs2_code = gimple_assign_rhs_code (rhs2_stmt); - } - else - return false; + rhs1 = tree_ssa_strip_harmless_type_conversions (rhs1); + rhs1_stmt = SSA_NAME_DEF_STMT (rhs1); + if (is_gimple_assign (rhs1_stmt)) + rhs1_code = gimple_assign_rhs_code (rhs1_stmt); + + rhs2 = tree_ssa_strip_harmless_type_conversions(rhs2); + rhs2_stmt = SSA_NAME_DEF_STMT (rhs2); + if (is_gimple_assign (rhs2_stmt)) + rhs2_code = gimple_assign_rhs_code (rhs2_stmt); if (code == PLUS_EXPR && rhs1_code == MULT_EXPR) { --- a/gcc/tree-ssa.c +++ b/gcc/tree-ssa.c @@ -1484,6 +1484,33 @@ useless_type_conversion_p (tree outer_type, tree inner_type) return false; } +/* Return true if the conversion from INNER_TYPE to OUTER_TYPE will + not alter the arithmetic meaning of a type, otherwise return false. + + For example, widening an integer type leaves the value unchanged, + but narrowing an integer type can cause truncation. + + Note that switching between signed and unsigned modes doesn't change + the underlying representation, and so is harmless. + + This function is not yet a complete definition of what is harmless + but should reject everything that is not. */ + +bool +harmless_type_conversion_p (tree outer_type, tree inner_type) +{ + /* If it's useless, it's also harmless. */ + if (useless_type_conversion_p (outer_type, inner_type)) + return true; + + if (INTEGRAL_TYPE_P (inner_type) + && INTEGRAL_TYPE_P (outer_type) + && TYPE_PRECISION (inner_type) <= TYPE_PRECISION (outer_type)) + return true; + + return false; +} + /* Return true if a conversion from either type of TYPE1 and TYPE2 to the other is not required. Otherwise return false. */ @@ -1515,6 +1542,29 @@ tree_ssa_useless_type_conversion (tree expr) return false; } +/* Return true if EXPR is a harmless type conversion, otherwise return + false. */ + +bool +tree_ssa_harmless_type_conversion (tree expr) +{ + gimple stmt; + + if (TREE_CODE (expr) != SSA_NAME) + return false; + + stmt = SSA_NAME_DEF_STMT (expr); + + if (!is_gimple_assign (stmt)) + return false; + + if (!CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (stmt))) + return false; + + return harmless_type_conversion_p (TREE_TYPE (gimple_assign_lhs (stmt)), + TREE_TYPE (gimple_assign_rhs1 (stmt))); +} + /* Strip conversions from EXP according to tree_ssa_useless_type_conversion and return the resulting expression. */ @@ -1527,6 +1577,18 @@ tree_ssa_strip_useless_type_conversions (tree exp) return exp; } +/* Strip conversions from EXP according to + tree_ssa_harmless_type_conversion and return the resulting + expression. */ + +tree +tree_ssa_strip_harmless_type_conversions (tree exp) +{ + while (tree_ssa_harmless_type_conversion (exp)) + exp = gimple_assign_rhs1 (SSA_NAME_DEF_STMT (exp)); + return exp; +} + /* Internal helper for walk_use_def_chains. VAR, FN and DATA are as described in walk_use_def_chains.