
Tree-level fix for PR 69526

Message ID d6e24bfd-fc3e-4160-fe27-db3eda5b857c@linux.vnet.ibm.com
State New

Commit Message

Robin Dapp Dec. 7, 2016, 4:14 p.m. UTC
> So we have (uint64_t)(uint32 + -1U) + 1 and using TYPE_SIGN (inner_type)

> produces (uint64_t)uint32 + -1U + 1.  This simply means that we cannot ignore

> overflow of the inner operation and for some reason your change

> to extract_range_from_binary_expr didn't catch this.  That is _8 + 4294967295

> overflows but we ignored that.


In this case the range of _8 was [1,1] so no overflow.
I think the problem is rather about the interpretation of the inner
constant. I tried discerning two cases now, a range split (i.e. when the
single range of a binop variable becomes an anti range or two ranges
after combining it with the binop's other range) and an overflow of the
range's min and max.
If the range didn't split, we can perform the simplification. If there
was a min and max overflow, we have to interpret the inner constant as
signed and perform a sign extension before converting it to the outer
type. Without overflow we can use TYPE_SIGN (inner_type).
Does this make sense?
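
To make the two situations concrete, here is a small self-contained check
(not part of the patch; UINT_MAX plays the role of the -1U above, and a
32-bit unsigned int is assumed):

  #include <assert.h>
  #include <limits.h>

  int main (void)
  {
    /* Range [1,1]: min and max of the inner add both wrap around zero,
       the result range [0,0] is still a single range -> no split.  */
    unsigned int a = 1;
    assert (a + UINT_MAX == 0u);

    /* Range [0,1]: only the max wraps (0 + UINT_MAX stays UINT_MAX,
       1 + UINT_MAX wraps to 0), so the result becomes an anti range
       -> split, no simplification.  */
    unsigned int b = 0;
    assert (b + UINT_MAX == UINT_MAX);
    assert (b + 1u + UINT_MAX == 0u);
    return 0;
  }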

Included the remarks and attached the new version.

Regards
 Robin

Comments

Richard Biener Dec. 13, 2016, 2:13 p.m. UTC | #1
On Wed, Dec 7, 2016 at 5:14 PM, Robin Dapp <rdapp@linux.vnet.ibm.com> wrote:
>> So we have (uint64_t)(uint32 + -1U) + 1 and using TYPE_SIGN (inner_type)

>> produces (uint64_t)uint32 + -1U + 1.  This simply means that we cannot ignore

>> overflow of the inner operation and for some reason your change

>> to extract_range_from_binary_expr didn't catch this.  That is _8 + 4294967295

>> overflows but we ignored that.

>

> In this case the range of _8 was [1,1] so no overflow.

> I think the problem is rather about the interpretation of the inner

> constant. I tried discerning two cases now, a range split (i.e. when the

> single range of a binop variable becomes an anti range or two ranges

> after combining it with the binop's other range) and an overflow of the

> range's min and max.

> If the range didn't split, we can perform the simplification. If there

> was a min and max overflow, we have to interpret the inner constant as

> signed and perform a sign extension before converting it to the outer

> type. Without overflow we can use TYPE_SIGN (inner_type).

> Does this make sense?


I'm not sure there is anything to "interpret" -- the operation is unsigned
and overflow is when the operation may wrap around zero.  There might
be clever ways of re-writing the expression to
(uint64_t)((uint32_t)((int32_t)uint32 + -1) + 1)
avoiding the overflow and thus allowing the transform but I'm not sure that's
good.

A related thing would be canonicalizing unsigned X plus CST to
unsigned X minus CST'
if CST' has a smaller absolute value than CST.  I think currently we
simply canonicalize
to 'plus CST' but also only in fold-const.c, not in match.pd (ah we
do, but only in a simplified manner).

That said, can we leave that "trick" out of the patch?  I think your
more complicated
"overflows" result from extract_range_from_binary_expr_1 doesn't apply to all
ops (like MULT_EXPR where more complicated cases can arise).

Richard.


> Included the remarks and attached the new version.

>

> Regards

>  Robin
Robin Dapp Jan. 10, 2017, 1:32 p.m. UTC | #2
Perhaps I'm still missing how some cases are handled or not handled,
sorry for the noise.

> I'm not sure there is anything to "interpret" -- the operation is unsigned

> and overflow is when the operation may wrap around zero.  There might

> be clever ways of re-writing the expression to

> (uint64_t)((uint32_t)((int32_t)uint32 + -1) + 1)

> avoiding the overflow and thus allowing the transform but I'm not sure that's

> good.


The extra work I introduced was to discern between

 (uint64_t)(a + UINT_MAX) + 1  -> (uint64_t)(a),
 (uint64_t)(a + UINT_MAX) + 1  -> (uint64_t)(a) + (uint64_t)(UINT_MAX + 1),

For a's range of [1,1] there is an overflow in both cases.
We still want to simplify the first case by combining UINT_MAX + 1 -> 0.
If "interpreting" UINT_MAX as -1 is not the correct thing to do, perhaps
(uint64_t)((uint32_t)(UINT_MAX + 1)) is? This fails, however, if the
outer constant is larger than UINT_MAX. What else can we do here?
Do we see cases like the second one at all? If it's not needed, the
extra work is likely not needed.
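
For a's range of [1,1], spelled out as a self-contained check (assuming
32-bit unsigned int and 64-bit uint64_t), the first form is the one that
matches:

  #include <assert.h>
  #include <limits.h>
  #include <stdint.h>

  int main (void)
  {
    uint32_t a = 1;                                /* range [1,1] */
    uint64_t orig = (uint64_t)(a + UINT_MAX) + 1;  /* inner add wraps to 0, result is 1 */
    /* First form matches, second form is off by 2^32.  */
    assert (orig == (uint64_t)a);
    assert (orig != (uint64_t)a + ((uint64_t)UINT_MAX + 1));
    return 0;
  }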

> A related thing would be canonicalizing unsigned X plus CST to

> unsigned X minus CST'

> if CST' has a smaller absolute value than CST.  I think currently we

> simply canonicalize

> to 'plus CST' but also only in fold-const.c, not in match.pd (ah we

> do, but only in a simplified manner).


I can imagine this could simplify the treatment of some cases, yet I'm
already a bit lost with the current cases :)

> That said, can we leave that "trick" out of the patch?  I think your

> more complicated

> "overflows" result from extract_range_from_binary_expr_1 doesn't apply to all

> ops (like MULT_EXPR where more complicated cases can arise).


There is certainly additional work to be done for MULT_EXPR, I
disregarded it so far. For this patch, I'd rather conservatively assume
overflow.

Regards
 Robin
Robin Dapp Jan. 17, 2017, 7:34 a.m. UTC | #3
Ping.

To put it shortly, I'm not sure how to differentiate between:

example range of a: [3,3]
(ulong)(a + UINT_MAX) + 1 --> (ulong)(a) + (ulong)(-1 + 1), sign extend

example range of a: [0,0]
(ulong)(a + UINT_MAX) + 1 --> (ulong)(a) + (ulong)(UINT_MAX + 1), no
sign extend

In this case, there is an "ok" overflow in the first example's range and
no overflow in the second, but I don't think this suffices as criterion.
As said, your proposed canonicalization ("unsigned X plus CST to
unsigned X minus CST' if CST' has a smaller absolute value than CST")
might help here, but I'm unsure how to do it currently.
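
Spelling the two examples out with concrete values (self-contained check;
assumes 32-bit unsigned int and 64-bit unsigned long):

  #include <assert.h>
  #include <limits.h>

  int main (void)
  {
    /* a in [3,3]: the inner add wraps, combine with sign extension.  */
    unsigned int a = 3;
    assert ((unsigned long)(a + UINT_MAX) + 1
            == (unsigned long)a + (unsigned long)(-1 + 1));

    /* a in [0,0]: no wrap, combine with zero extension.  */
    unsigned int b = 0;
    assert ((unsigned long)(b + UINT_MAX) + 1
            == (unsigned long)b + ((unsigned long)UINT_MAX + 1));
    return 0;
  }
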
Richard Biener Jan. 17, 2017, 9:48 a.m. UTC | #4
On Tue, Jan 10, 2017 at 2:32 PM, Robin Dapp <rdapp@linux.vnet.ibm.com> wrote:
> Perhaps I'm still missing how some cases are handled or not handled,

> sorry for the noise.

>

>> I'm not sure there is anything to "interpret" -- the operation is unsigned

>> and overflow is when the operation may wrap around zero.  There might

>> be clever ways of re-writing the expression to

>> (uint64_t)((uint32_t)((int32_t)uint32 + -1) + 1)

>> avoiding the overflow and thus allowing the transform but I'm not sure that's

>> good.

>

> The extra work I introduced was to discern between

>

>  (uint64_t)(a + UINT_MAX) + 1  -> (uint64_t)(a),

>  (uint64_t)(a + UINT_MAX) + 1  -> (uint64_t)(a) + (uint64_t)(UINT_MAX + 1),

>

> For a's range of [1,1] there is an overflow in both cases.

> We still want to simplify the first case by combining UINT_MAX + 1 -> 0.


So there's the case where a == 0 where (uint64_t)(0 + UINT_MAX) + 1 is not zero.

I think we're still searching for the proper condition on when it is allowed to
re-write

 (uint64_t)(uint32_t + uint32_t-CST) + uint64_t-CST

to

 (uint64_t)(uint32_t) + (uint64_t)uint32_t-CST + uint64_t-CST

or to

 (uint64_t)(uint32_t) + (uint64_t)(uint32_t-CST + (uint32_t)uint64_t-CST)

> If "interpreting" UINT_MAX as -1 is not the correct thing to do, perhaps

> (uint64_t)((uint32_t)(UINT_MAX + 1)) is? This fails, however, if the

> outer constant is larger than UINT_MAX. What else can we do here?

> Do we see cases like the second one at all? If it's not needed, the

> extra work is likely not needed.


We do have the need to associate and simplify across conversions but of
course we have to do it conservatively correct.

>> A related thing would be canonicalizing unsigned X plus CST to

>> unsigned X minus CST'

>> if CST' has a smaller absolute value than CST.  I think currently we

>> simply canonicalize

>> to 'plus CST' but also only in fold-const.c, not in match.pd (ah we

>> do, but only in a simplified manner).

>

> I can imagine this could simplify the treatment of some cases, yet I'm

> already a bit lost with the current cases :)


Yes, but I am lost a bit as well.  I don't know the correct conditions to test
offhand -- I know we may not introduce new undefined overflow ops and
of course we need to not compute wrong numbers either.

>> That said, can we leave that "trick" out of the patch?  I think your

>> more complicated

>> "overflows" result from extract_range_from_binary_expr_1 doesn't apply to all

>> ops (like MULT_EXPR where more complicated cases can arise).

>

> There is certainly additional work to be done for MULT_EXPR, I

> disregarded it so far. For this patch, I'd rather conservatively assume

> overflow.


Ok...

Richard.

> Regards

>  Robin

>
Robin Dapp Feb. 2, 2017, 9:27 a.m. UTC | #5
I skimmed through the code to see where transformations like
(a - 1) -> (a + UINT_MAX) are performed. It seems there are only two
places, match.pd (/* A - B -> A + (-B) if B is easily negatable.  */)
and fold-const.c.

In order to be able to reliably know whether to zero-extend or to
sign-extend the constant (1) in

 (ulong)(a - 1) + 1 -> (ulong)(a) + 0
                 or -> (ulong)(a) + (ulong)(UINT_MAX + 1)

we'd have to know if the constant was converted by a transformation like
above.

I first tried removing the two transformations altogether but this
causes around 20 new regressions on s390x and I didn't go through all of
them to see whether they can be fixed. Is there a rationale for applying
the transformations other than canonicalization? I saw some
optimizations that only apply to PLUS_EXPRs but that can easily be
changed to also include MINUS_EXPR.

My other attempt entails an additional flag TREE_WAS_SIGNED for the lack
of a better naming idea. It is set when the above transformation takes
place. I used (NODE)->base.deprecated_flag but there may be better
choices. Tentative/example patch attached for reference.

Using this, the type extension in my original patch can be changed to
w1 = w1.from (w1, TYPE_PRECISION (type), TREE_WAS_SIGNED (@1)
	      ? SIGNED : TYPE_SIGN (inner_type));
which works for me and does not introduce regressions on s390x.

Is this a viable approach?

Regards
 Robin

diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index c649e54..cbb7469 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -9648,9 +9648,15 @@ fold_binary_loc (location_t loc,
 	       && (TREE_CODE (op1) != REAL_CST
 		   || REAL_VALUE_NEGATIVE (TREE_REAL_CST (op1))))
 	      || INTEGRAL_TYPE_P (type)))
-	return fold_build2_loc (loc, PLUS_EXPR, type,
-				fold_convert_loc (loc, type, arg0),
-				negate_expr (op1));
+	{
+	  tree negtem = negate_expr (op1);
+	  if (CONSTANT_CLASS_P (negtem))
+	    TREE_WAS_SIGNED (negtem) = 1;
+	  tree tem = fold_build2_loc (loc, PLUS_EXPR, type,
+				      fold_convert_loc (loc, type, arg0),
+				      negtem);
+	  return tem;
+	}
 
       /* Fold &a[i] - &a[j] to i-j.  */
       if (TREE_CODE (arg0) == ADDR_EXPR
@@ -13620,6 +13626,7 @@ fold_negate_const (tree arg0, tree type)
 	t = force_fit_type (type, val, 1,
 			    (overflow | TREE_OVERFLOW (arg0))
 			    && !TYPE_UNSIGNED (type));
+	TREE_WAS_SIGNED (t) = TREE_WAS_SIGNED (arg0);
 	break;
       }
 
diff --git a/gcc/match.pd b/gcc/match.pd
index b3f2063..e791942 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -870,7 +870,11 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (simplify
  (minus @0 negate_expr_p@1)
  (if (!FIXED_POINT_TYPE_P (type))
- (plus @0 (negate @1))))
+  (with {
+   if (CONSTANT_CLASS_P (@1))
+     TREE_WAS_SIGNED (@1) = 1;
+   }
+ (plus @0 (negate @1)))))
 
 /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST))
    when profitable.
@@ -1223,8 +1227,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 
 		/* Extend @1 to TYPE.  Perform sign extension if the range
 		   overflowed but did not split.  */
-		w1 = w1.from (w1, TYPE_PRECISION (type), ovf.ovf ? SIGNED :
-		    TYPE_SIGN (inner_type));
+		w1 = w1.from (w1, TYPE_PRECISION (type), TREE_WAS_SIGNED (@1)
+			      ? SIGNED : TYPE_SIGN (inner_type));
 		/*
 		w1 = w1.from (w1, TYPE_PRECISION (type), TYPE_SIGN (inner_type));
 		    */
diff --git a/gcc/tree.h b/gcc/tree.h
index 62cd7bb..1e7efd9 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -735,11 +735,18 @@ extern void omp_clause_range_check_failed (const_tree, const char *, int,
 
 #define TREE_OVERFLOW(NODE) (CST_CHECK (NODE)->base.public_flag)
 
+/* Foo.  */
+
+#define TREE_WAS_SIGNED(NODE) (CST_CHECK (NODE)->base.deprecated_flag)
+
 /* TREE_OVERFLOW can only be true for EXPR of CONSTANT_CLASS_P.  */
 
 #define TREE_OVERFLOW_P(EXPR) \
  (CONSTANT_CLASS_P (EXPR) && TREE_OVERFLOW (EXPR))
 
+#define TREE_WAS_SIGNED_P(EXPR) \
+ (CONSTANT_CLASS_P (EXPR) && TREE_WAS_SIGNED (EXPR))
+
 /* In a VAR_DECL, FUNCTION_DECL, NAMESPACE_DECL or TYPE_DECL,
    nonzero means name is to be accessible from outside this translation unit.
    In an IDENTIFIER_NODE, nonzero means an external declaration

Robin Dapp May 9, 2017, 7:13 a.m. UTC | #6
ping.
Bin.Cheng May 11, 2017, 3:06 p.m. UTC | #7
On Tue, Jan 17, 2017 at 9:48 AM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Tue, Jan 10, 2017 at 2:32 PM, Robin Dapp <rdapp@linux.vnet.ibm.com> wrote:

>> Perhaps I'm still missing how some cases are handled or not handled,

>> sorry for the noise.

>>

>>> I'm not sure there is anything to "interpret" -- the operation is unsigned

>>> and overflow is when the operation may wrap around zero.  There might

>>> be clever ways of re-writing the expression to

>>> (uint64_t)((uint32_t)((int32_t)uint32 + -1) + 1)

>>> avoiding the overflow and thus allowing the transform but I'm not sure that's

>>> good.

>>

>> The extra work I introduced was to discern between

>>

>>  (uint64_t)(a + UINT_MAX) + 1  -> (uint64_t)(a),

>>  (uint64_t)(a + UINT_MAX) + 1  -> (uint64_t)(a) + (uint64_t)(UINT_MAX + 1),

>>

>> For a's range of [1,1] there is an overflow in both cases.

>> We still want to simplify the first case by combining UINT_MAX + 1 -> 0.

>

> So there's the case where a == 0 where (uint64_t)(0 + UINT_MAX) + 1 is not zero.

>

> I think we're still searching for the proper condition on when it is allowed to

> re-write

>

>  (uint64_t)(uint32_t + uint32_t-CST) + uint64_t-CST

>

> to

>

>  (uint64_t)(uint32_t) + (uint64_t)uint32_t-CST + uint64_t-CST

Hmm, won't (uint32_t + uint32_t-CST) not overflowing be a sufficient
condition for such transformation?
>

> or to

>

>  (uint64_t)(uint32_t) + (uint64_t)(uint32_t-CST + (uint32_t)uint64_t-CST)

We don't actually need to prove this?  What complicates this is when
it's safe to transform:
 (uint64_t)(uint32_t + uint32_t-CST) + uint64_t-CST
into
 (uint64_t)(uint32_t + uint32_t-CSTx)
where
 uint32_t-CSTx = uint32_t-CST + (uint32_t)uint64_t-CST
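
For instance, with the constants from the running example (CST == UINT_MAX,
outer constant 1, so CSTx == 0), the narrow-type form only matches while the
result stays within 32 bits -- a self-contained check assuming 32-bit
unsigned int:

  #include <assert.h>
  #include <limits.h>
  #include <stdint.h>

  static uint64_t orig (uint32_t a)
  { return (uint64_t)(a + UINT_MAX) + 1; }

  static uint64_t folded (uint32_t a)            /* CSTx == UINT_MAX + 1 == 0 */
  { return (uint64_t)(a + (UINT_MAX + 1u)); }

  int main (void)
  {
    assert (orig (3) == folded (3));  /* both 3 */
    assert (orig (0) != folded (0));  /* 0x100000000 vs. 0: the carry into the
                                         upper half is lost in the narrow type */
    return 0;
  }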

Am I misunderstanding the whole thing?

Thanks,
bin
>

>> If "interpreting" UINT_MAX as -1 is not the correct thing to do, perhaps

>> (uint64_t)((uint32_t)(UINT_MAX + 1)) is? This fails, however, if the

>> outer constant is larger than UINT_MAX. What else can we do here?

>> Do we see cases like the second one at all? If it's not needed, the

>> extra work is likely not needed.

>

> We do have the need to associate and simplify across conversions but of

> course we have to do it conservatively correct.

>

>>> A related thing would be canonicalizing unsigned X plus CST to

>>> unsigned X minus CST'

>>> if CST' has a smaller absolute value than CST.  I think currently we

>>> simply canonicalize

>>> to 'plus CST' but also only in fold-const.c, not in match.pd (ah we

>>> do, but only in a simplified manner).

>>

>> I can imagine this could simplify the treatment of some cases, yet I'm

>> already a bit lost with the current cases :)

>

> Yes, but I am lost a bit as well.  I don't know the correct conditions to test

> offhand -- I know we may not introduce new undefined overflow ops and

> of course we need to not compute wrong numbers either.

>

>>> That said, can we leave that "trick" out of the patch?  I think your

>>> more complicated

>>> "overflows" result from extract_range_from_binary_expr_1 doesn't apply to all

>>> ops (like MULT_EXPR where more complicated cases can arise).

>>

>> There is certainly additional work to be done for MULT_EXPR, I

>> disregarded it so far. For this patch, I'd rather conservatively assume

>> overflow.

>

> Ok...

>

> Richard.

>

>> Regards

>>  Robin

>>
Robin Dapp May 18, 2017, 2:45 p.m. UTC | #8
> Hmm, won't (uint32_t + uint32_t-CST) not overflowing be a sufficient

> condition for such transformation?


Yes, in principle this should suffice.  What we're actually looking for
is something like a "proper" (or no) overflow, i.e. an overflow in both
min and max of the value range.  In

 (a + cst1) + cst2

an overflow of (a + cst1) will still produce a valid range as long as
overflow(a_min + cst1) == overflow(a_max + cst1).  When max overflows
but min does not we must not simplify.  Currently, I'm checking if the
range of (a + cst1) is still a VR_RANGE, for now disregarding
ANTI_RANGEs which could most likely be dealt with as well.

A major discussion point in my last try was that I wrongly always used
sign extension when extending cst1 to the outer type.  This is now changed
to use sign extension when (a + cst1) "properly" overflows, as in

 ASSUME (a > 0);
 (unsigned long)(a + UINT_MAX) + 1;

resulting in (unsigned long)(a) + (unsigned long)0, or use the type sign
of the constant like in

 ASSUME (a < 2);
 (unsigned long)(a + 4294967294u) + 10;

resulting in (unsigned long)(a) + (unsigned long)((unsigned
long)4294967294 + (unsigned long)10).  The additional flag from the last
patch is not necessary.
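
A quick self-contained check of the second example (assumes 32-bit unsigned
int and 64-bit unsigned long):

  #include <assert.h>

  int main (void)
  {
    unsigned int a = 1;   /* ASSUME (a < 2) */
    assert ((unsigned long)(a + 4294967294u) + 10
            == (unsigned long)a + ((unsigned long)4294967294 + (unsigned long)10));
    return 0;
  }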

Test suite is clean on s390x and x86, bootstraps successful.

Regards
 Robin
Bin.Cheng May 18, 2017, 3:36 p.m. UTC | #9
On Thu, May 18, 2017 at 3:47 PM, Robin Dapp <rdapp@linux.vnet.ibm.com> wrote:
> match.pd part of the patch.

>

> gcc/ChangeLog:

>

> 2017-05-18  Robin Dapp  <rdapp@linux.vnet.ibm.com>

>

>         * match.pd: Simplify wrapped binary operations.

>         * tree-vrp.c (extract_range_from_binary_expr_1): Add overflow

>         parameter.

>         (extract_range_from_binary_expr): Likewise.

>         * tree-vrp.h: Export.


Hi,
I didn't follow this issue from the beginning, so I might be asking stupid questions.

> diff --git a/gcc/match.pd b/gcc/match.pd

> index 80a17ba..3fa18b9 100644

> --- a/gcc/match.pd

> +++ b/gcc/match.pd

> @@ -1290,6 +1290,85 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)

>      (if (cst && !TREE_OVERFLOW (cst))

>       (plus { cst; } @0))))

>

> +/* ((T)(A +- CST)) +- CST -> (T)(A) +- CST)  */

> +#if GIMPLE

> +   (for outer_op (plus minus)

> +     (for inner_op (plus minus)

> +       (simplify

> +     (outer_op (convert (inner_op@3 @0 INTEGER_CST@1)) INTEGER_CST@2)

> +       (if (TREE_CODE (type) == INTEGER_TYPE

> +        && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@3)))

> +        (with

> +        {

> +          bool ovf = true;

> +

> +          tree cst = NULL_TREE;

> +          tree inner_type = TREE_TYPE (@3);

> +          value_range vr = VR_INITIALIZER;

> +

> +          /* Convert combined constant to tree of outer type if

> +         there was no overflow in the original operation.  */

> +          wide_int minv, maxv;

> +          if (TYPE_OVERFLOW_UNDEFINED (inner_type)

> +          || (extract_range_from_binary_expr (&vr, inner_op,

> +            inner_type, @0, @1, &ovf), vr.type == VR_RANGE))

Any reason to expose tree-vrp.c internal interface here?  The function
looks quite expensive.  Overflow check can be done by get_range_info
and simple wi::cmp calls.  Existing code like in
tree-ssa-loop-niters.c already does that.  Also could you avoid using
comma expressions in condition please? It only makes the code harder
to read.

Thanks,
bin
Robin Dapp May 18, 2017, 4:08 p.m. UTC | #10
> Any reason to expose tree-vrp.c internal interface here?  The function

> looks quite expensive.  Overflow check can be done by get_range_info

> and simple wi::cmp calls.  Existing code like in

> tree-ssa-loop-niters.c already does that.  Also could you avoid using

> comma expressions in condition please? It only makes the code harder

> to read.


I tried get_range_info () as well and got a FAIL in
gcc.c-torture/execute/bitfld-5.c,
where get_range_info () returns a VR_RANGE but extract...() gives
VR_VARYING.  The test case relies on not simplifying, i.e. would expect
a VR_VARYING here but I didn't look into it more.

Also, is there a possibility to know if there was an "ok" overflow or
not from get_range_info ()'s output? Would I have to compare the result
with the involved variable's range?

Regards
 Robin
Bin.Cheng May 18, 2017, 4:48 p.m. UTC | #11
On Thu, May 18, 2017 at 5:08 PM, Robin Dapp <rdapp@linux.vnet.ibm.com> wrote:
>> Any reason to expose tree-vrp.c internal interface here?  The function

>> looks quite expensive.  Overflow check can be done by get_range_info

>> and simple wi::cmp calls.  Existing code like in

>> tree-ssa-loop-niters.c already does that.  Also could you avoid using

>> comma expressions in condition please? It only makes the code harder

>> to read.

>

> I tried get_range_info () as well and got a FAIL in

> gcc.c-torture/execute/bitfld-5.c,

> where get_range_info () returns a VR_RANGE but extract...() gives

> VR_VARYING.  The test case relies on not simplifying, i.e. would expect

> a VR_VARYING here but I didn't look into it more.

I can guess what is happening here.  It's a 40-bit unsigned long long
field, (s.b-8) will be like:
_1 = s.b
_2 = _1 + 0xfffffffff8
Also get_range_info returns value range [0, 0xFFFFFFFFFF] for _1.
You'd need to check if _1(with range [0, 0xFFFFFFFFFF]) + 0xfffffffff8
overflows against precision of the bit-field which is 40 bits
precision.  The failure might be because overflowness is checked against
unsigned long long's precision which is 64 bits.
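
The shape of the code is roughly the following (a reduced sketch of that kind
of bit-field, not the literal testcase):

  /* Per the analysis above, the subtraction shows up in GIMPLE as
       _1 = s.b;  _2 = _1 + 0xfffffffff8;
     so wrapping has to be judged against the 40-bit precision of the
     field, not against the 64 bits of unsigned long long.  */
  struct S { unsigned long long b : 40; } s;

  unsigned long long f (void)
  {
    return s.b - 8;
  }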

>

> Also, is there a possibility to know if there was an "ok" overflow or

> not from get_range_info ()'s output? Would I have to compare the result

> with the involved variable's range?

I think you have to check it manually against max/min value of that
type precision.

Thanks,
bin
>

> Regards

>  Robin

>
Robin Dapp May 19, 2017, 10:09 a.m. UTC | #12
> I can guess what is happening here.  It's a 40-bit unsigned long long

> field, (s.b-8) will be like:

> _1 = s.b

> _2 = _1 + 0xfffffffff8

> Also get_range_info returns value range [0, 0xFFFFFFFFFF] for _1.

> You'd need to check if _1(with range [0, 0xFFFFFFFFFF]) + 0xfffffffff8

> overflows against precision of the bit-field which is 40 bits

> precision.  The failure might be because overflowness is checked against

> unsigned long long's precision which is 64 bits.


>> Also, is there a possibility to know if there was an "ok" overflow or

>> not from get_range_info ()'s output? Would I have to compare the result

>> with the involved variable's range?

> I think you have to check it manually against max/min value of that

> type precision.


Currently, extract_... () does all that for me; is it really too
expensive to call? I guess, using get_range_info first and calling
extract when get_range_info returns a VR_RANGE is not really a favorable
thing to do either? :)

Regards
 Robin
Bin.Cheng May 19, 2017, 10:13 a.m. UTC | #13
On Fri, May 19, 2017 at 11:09 AM, Robin Dapp <rdapp@linux.vnet.ibm.com> wrote:
>> I can guess what is happening here.  It's a 40-bit unsigned long long

>> field, (s.b-8) will be like:

>> _1 = s.b

>> _2 = _1 + 0xfffffffff8

>> Also get_range_info returns value range [0, 0xFFFFFFFFFF] for _1.

>> You'd need to check if _1(with range [0, 0xFFFFFFFFFF]) + 0xfffffffff8

>> overflows against precision of the bit-field which is 40 bits

>> precision.  The failure might be because overflowness is checked against

>> unsigned long long's precision which is 64 bits.

>

>>> Also, is there a possibility to know if there was an "ok" overflow or

>>> not from get_range_info ()'s output? Would I have to compare the result

>>> with the involved variable's range?

>> I think you have to check it manually against max/min value of that

>> type precision.

>

> Currently, extract_... () does all that for me; is it really too

> expensive to call? I guess, using get_range_info first and calling

> extract when get_range_info returns a VR_RANGE is not really a favorable

> thing to do either? :)

Not only the cost, we should avoid introducing more interfaces while
old ones can do the work.  Anyway, it's Richard's call here.

Thanks,
bin
>

> Regards

>  Robin

>
Richard Biener May 19, 2017, 10:22 a.m. UTC | #14
On Fri, May 19, 2017 at 12:13 PM, Bin.Cheng <amker.cheng@gmail.com> wrote:
> On Fri, May 19, 2017 at 11:09 AM, Robin Dapp <rdapp@linux.vnet.ibm.com> wrote:

>>> I can guess what is happening here.  It's a 40-bit unsigned long long

>>> field, (s.b-8) will be like:

>>> _1 = s.b

>>> _2 = _1 + 0xfffffffff8

>>> Also get_range_info returns value range [0, 0xFFFFFFFFFF] for _1.

>>> You'd need to check if _1(with range [0, 0xFFFFFFFFFF]) + 0xfffffffff8

>>> overflows against precision of the bit-field which is 40 bits

>>> precision.  The failure might be because overflowness is checked against

>>> unsigned long long's precision which is 64 bits.

>>

>>>> Also, is there a possibility to know if there was an "ok" overflow or

>>>> not from get_range_info ()'s output? Would I have to compare the result

>>>> with the involved variable's range?

>>> I think you have to check it manually against max/min value of that

>>> type precision.

>>

>> Currently, extract_... () does all that for me; is it really too

>> expensive to call? I guess, using get_range_info first and calling

>> extract when get_range_info returns a VR_RANGE is not really a favorable

>> thing to do either? :)

> Not only the cost, we should avoid introducing more interfaces while

> old ones can do the work.  Anyway, it's Richard's call here.


Using get_range_info and wi:: is preferred.  I didn't look into the issue you
are running into, but the wi:: routines do have proper bit-precision tracking.  Maybe
the overflow checks are not implemented correctly there though.

Richard.

> Thanks,

> bin

>>

>> Regards

>>  Robin

>>
Robin Dapp June 20, 2017, 1:08 p.m. UTC | #15
>> Currently, extract_... () does all that for me; is it really too

>> expensive to call? I guess, using get_range_info first and calling

>> extract when get_range_info returns a VR_RANGE is not really a favorable

>> thing to do either? :)

> Not only the cost, we should avoid introducing more interfaces while

> old ones can do the work.  Anyway, it's Richard's call here.


I rewrote the match.pd patterns to use get_range_info () now, keeping
track of an "ok" overflow (both min and max overflow) and one which does
not allow us to continue (min xor max overflow, split/anti range).  Test
suite on s390x has no regressions, bootstrap is ok, x86 running.

Regards
 Robin

--

gcc/ChangeLog:

2017-06-19  Robin Dapp  <rdapp@linux.vnet.ibm.com>

        * match.pd: Simplify wrapped binary operations.

diff --git a/gcc/match.pd b/gcc/match.pd
index 80a17ba..66c37f6 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1290,6 +1290,128 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
     (if (cst && !TREE_OVERFLOW (cst))
      (plus { cst; } @0))))
 
+/* ((T)(A +- CST)) +- CST -> (T)(A) +- CST)  */
+#if GIMPLE
+   (for outer_op (plus minus)
+     (for inner_op (plus minus)
+       (simplify
+	 (outer_op (convert (inner_op@3 @0 INTEGER_CST@1)) INTEGER_CST@2)
+	   (if (TREE_CODE (type) == INTEGER_TYPE
+		&& TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@3)))
+	    (with
+	    {
+	      tree cst = NULL_TREE;
+	      tree inner_type = TREE_TYPE (@3);
+	      wide_int wmin, wmax;
+	      wide_int wmin0, wmax0;
+
+	      bool ovf = true;
+	      bool ovf_undef = TYPE_OVERFLOW_UNDEFINED (inner_type);
+
+	      enum value_range_type vr_outer =
+		get_range_info (@3, &wmin, &wmax);
+	      enum value_range_type vr0 =
+		get_range_info (@0, &wmin0, &wmax0);
+
+	      /* Convert combined constant to tree of outer type if
+		 there was no overflow in the original operation.  */
+	      if (ovf_undef || vr_outer == VR_RANGE)
+	      {
+		wide_int w1 = @1;
+		wide_int w2 = @2;
+
+		if (ovf_undef || vr0 == VR_RANGE)
+		  {
+		    if (inner_op == MINUS_EXPR)
+		      w1 = wi::neg (w1);
+
+		    if (outer_op == MINUS_EXPR)
+		      w2 = wi::neg (w2);
+
+		    bool split_range = true;
+
+		    if (!ovf_undef && vr0 == VR_RANGE)
+		      {
+			int max_ovf = 0;
+			int min_ovf = 0;
+
+			signop sgn = TYPE_SIGN (inner_type);
+
+			wmin = wi::add (wmin0, w1);
+			min_ovf = wi::cmp (wmin, w1, sgn) < 0;
+
+			wmax = wi::add (wmax0, w1);
+			max_ovf = wi::cmp (wmax, w1, sgn) < 0;
+
+			ovf = min_ovf || max_ovf;
+
+			split_range = ((min_ovf && !max_ovf)
+				       || (!min_ovf && max_ovf));
+		      }
+
+		    if (ovf_undef || !split_range)
+		      {
+			/* Extend @1 to TYPE. */
+			w1 = w1.from (w1, TYPE_PRECISION (type),
+				      ovf ? SIGNED : TYPE_SIGN (TREE_TYPE (@1)));
+
+			/* Combine in outer, larger type.  */
+			wide_int combined_cst;
+			combined_cst = wi::add (w1, w2);
+
+			cst = wide_int_to_tree (type, combined_cst);
+		      }
+		  }
+	      }
+	    }
+(if (cst)
+	 (outer_op (convert @0) { cst; }))
+	)))))
+#endif
+
+/* ((T)(A)) +- CST -> (T)(A +- CST)  */
+#if GIMPLE
+   (for outer_op (plus minus)
+    (simplify
+     (outer_op (convert SSA_NAME@0) INTEGER_CST@2)
+      (if (TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
+	   && TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE
+	   && TREE_CODE (type) == INTEGER_TYPE)
+       /* Perform binary operation inside the cast if the constant fits
+	  and there is no overflow.  */
+       (with
+	{
+	  bool split_range = true;
+	  tree cst_inner = NULL_TREE;
+	  enum value_range_type vr = VR_VARYING;
+	  tree inner_type = TREE_TYPE (@0);
+
+	  if (int_fits_type_p (@2, inner_type))
+	  {
+	    cst_inner = fold_convert (inner_type, @2);
+
+	    wide_int wmin0, wmax0;
+	    wide_int w1 = cst_inner;
+	    signop sgn = TYPE_SIGN (inner_type);
+	    vr = get_range_info (@0, &wmin0, &wmax0);
+
+	    if (vr == VR_RANGE)
+	      {
+		wide_int wmin = wi::add (wmin0, w1);
+		bool min_ovf = wi::cmp (wmin, w1, sgn) < 0;
+
+		wide_int wmax = wi::add (wmax0, w1);
+		bool max_ovf = wi::cmp (wmax, w1, sgn) < 0;
+
+		split_range = (min_ovf && !max_ovf) || (!min_ovf && max_ovf);
+	      }
+	  }
+	}
+	(if (cst_inner && !split_range)
+	 (convert (outer_op @0 { cst_inner; })))
+	))))
+#endif
+
   /* ~A + A -> -1 */
   (simplify
    (plus:c (bit_not @0) @0)

Richard Biener June 20, 2017, 1:49 p.m. UTC | #16
On Tue, Jun 20, 2017 at 3:08 PM, Robin Dapp <rdapp@linux.vnet.ibm.com> wrote:
>>> Currently, extract_... () does all that for me; is it really too

>>> expensive to call? I guess, using get_range_info first and calling

>>> extract when get_range_info returns a VR_RANGE is not really a favorable

>>> thing to do either? :)

>> Not only the cost, we should avoid introducing more interfaces while

>> old ones can do the work.  Anyway, it's Richard's call here.

>

> I rewrote the match.pd patterns to use get_range_info () now, keeping

> track of an "ok" overflow (both min and max overflow) and one which does

> not allow us to continue (min xor max overflow, split/anti range).  Test

> suite on s390x has no regressions, bootstrap is ok, x86 running.


+          (if (TREE_CODE (type) == INTEGER_TYPE
+               && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@3)))
+           (with

use INTEGRAL_TYPE_P.

+             bool ovf_undef = TYPE_OVERFLOW_UNDEFINED (inner_type);
+

so this is overflow behavior of the inner op.

+             /* Convert combined constant to tree of outer type if
+                there was no overflow in the original operation.  */

"in the original inner operation."

you then go on and use ovf_undef also for the outer operation:

+             if (ovf_undef || vr_outer == VR_RANGE)
+             {

but you do not actually _use_ vr_outer.  Do you think that if
vr_outer is a VR_RANGE then the outer operation may not
possibly have wrapped?  That's a false conclusion.

But I don't see how overflow in the original outer operation matters
and the code lacks comments explaining that as well.

So if you have a vr0 then you can compute whether the inner
operation cannot overflow.  You do this here:

+                   if (!ovf_undef && vr0 == VR_RANGE)
+                     {
+                       int max_ovf = 0;
+                       int min_ovf = 0;
+
+                       signop sgn = TYPE_SIGN (inner_type);
+
+                       wmin = wi::add (wmin0, w1);
+                       min_ovf = wi::cmp (wmin, w1, sgn) < 0;
+
+                       wmax = wi::add (wmax0, w1);
+                       max_ovf = wi::cmp (wmax, w1, sgn) < 0;
+
+                       ovf = min_ovf || max_ovf;
+
+                       split_range = ((min_ovf && !max_ovf)
+                                      || (!min_ovf && max_ovf));

ah, here's the use of the outer value-range.  This lacks a comment
(and it looks fishy given the outer value-range is a conservative approximation
and thus could be [-INF, +INF]).  Why's this not using the
wi::add overload with the overflow flag?  ISTR you want to handle "negative"
unsigned constants somehow, but then I don't see how the above works.
I'd say if wmin/wmax interpreted as signed are positive and then using
a signed op to add w1 results in a still positive number you're fine
(you don't seem
to restrict the widening cast to either zero- or sign-extending).

+                   if (ovf_undef || !split_range)
+                     {
+                       /* Extend @1 to TYPE. */
+                       w1 = w1.from (w1, TYPE_PRECISION (type),
+                                     ovf ? SIGNED : TYPE_SIGN
(TREE_TYPE (@1)));

ideally you could always interpret w1 as signed?

+                       /* Combine in outer, larger type.  */
+                       wide_int combined_cst;
+                       combined_cst = wi::add (w1, w2);

+(if (cst)
+        (outer_op (convert @0) { cst; }))
+       )))))

bogus indent.

+/* ((T)(A)) +- CST -> (T)(A +- CST)  */
+#if GIMPLE
+   (for outer_op (plus minus)
+    (simplify
+     (outer_op (convert SSA_NAME@0) INTEGER_CST@2)
+      (if (TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
+          && TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE
+          && TREE_CODE (type) == INTEGER_TYPE)

INTEGRAL_TYPE_P and do that first before looking at TYPE_PRECISION.

+           if (vr == VR_RANGE)
+             {
+               wide_int wmin = wi::add (wmin0, w1);
+               bool min_ovf = wi::cmp (wmin, w1, sgn) < 0;
+
+               wide_int wmax = wi::add (wmax0, w1);
+               bool max_ovf = wi::cmp (wmax, w1, sgn) < 0;
+
+               split_range = (min_ovf && !max_ovf) || (!min_ovf && max_ovf);

similar why not use wi:add overload with the overflow flag?

Btw, I find

  (with
   {
     tree x = NULL;
      if (...) x = non-NULL;
   }
   (if (x)
    (....))))

ugly.  Use

   (with
    {
      ...
    }
    (if (...)
     (... { non-NULL } )

or sth like that which makes control flow more easily visible.

Richard.


> Regards

>  Robin

>

> --

>

> gcc/ChangeLog:

>

> 2017-06-19  Robin Dapp  <rdapp@linux.vnet.ibm.com>

>

>         * match.pd: Simplify wrapped binary operations.
Robin Dapp June 21, 2017, 11:44 a.m. UTC | #17
> use INTEGRAL_TYPE_P.


Done.

> but you do not actually _use_ vr_outer.  Do you think that if

> vr_outer is a VR_RANGE then the outer operation may not

> possibly have wrapped?  That's a false conclusion.


These were remains of a previous version.  vr_outer is indeed not needed
anymore; removed.

> wi::add overload with the overflow flag?  ISTR you want to handle "negative"

> unsigned constants somehow, but then I don't see how the above works.

> I'd say if wmin/wmax interpreted as signed are positive and then using

> a signed op to add w1 results in a still positive number you're fine

> (you don't seem

> to restrict the widening cast to either zero- or sign-extending).


Changed to using the wi::add overload now.

In essence, three cases are being handled:
 - wrapped_range --> do not simplify
 - !wrapped_range && ovf ("negative" unsigned) --> simplify and combine
with sign extension in the outer type
 - !wrapped_range && !ovf ("positive" unsigned) --> simplify and combine
with zero extension in the outer type.

Regards
 Robin

diff --git a/gcc/match.pd b/gcc/match.pd
index 80a17ba..ec1af69 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1290,6 +1290,116 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
     (if (cst && !TREE_OVERFLOW (cst))
      (plus { cst; } @0))))
 
+/* ((T)(A +- CST)) +- CST -> (T)(A) +- CST)  */
+#if GIMPLE
+   (for outer_op (plus minus)
+     (for inner_op (plus minus)
+       (simplify
+	 (outer_op (convert (inner_op@3 @0 INTEGER_CST@1)) INTEGER_CST@2)
+	   (if (INTEGRAL_TYPE_P (type)
+		&& TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@3)))
+	    (with
+	    {
+	      tree cst;
+	      tree inner_type = TREE_TYPE (@3);
+	      wide_int wmin0, wmax0;
+
+	      bool ovf = true;
+	      bool ovf_undef = TYPE_OVERFLOW_UNDEFINED (inner_type);
+
+	      enum value_range_type vr0 =
+		get_range_info (@0, &wmin0, &wmax0);
+
+	      bool wrapped_range = true;
+
+	      /* Convert combined constant to tree of outer type if
+		 there was no overflow in the original inner operation.  */
+	      if (ovf_undef || vr0 == VR_RANGE)
+	      {
+		wide_int w1 = @1;
+		wide_int w2 = @2;
+
+		if (inner_op == MINUS_EXPR)
+		  w1 = wi::neg (w1);
+
+		if (outer_op == MINUS_EXPR)
+		  w2 = wi::neg (w2);
+
+		bool ovf;
+
+		if (!ovf_undef && vr0 == VR_RANGE)
+		  {
+		    bool max_ovf;
+		    bool min_ovf;
+
+		    signop sgn = TYPE_SIGN (inner_type);
+		    wi::add (wmin0, w1, sgn, &min_ovf);
+		    wi::add (wmax0, w1, sgn, &max_ovf);
+
+		    ovf = min_ovf || max_ovf;
+		    wrapped_range = ((min_ovf && !max_ovf)
+				   || (!min_ovf && max_ovf));
+		  }
+
+		/* Extend @1 to TYPE. */
+		w1 = w1.from (w1, TYPE_PRECISION (type),
+			      ovf ? SIGNED : TYPE_SIGN (inner_type));
+
+		/* Combine in outer, larger type.  */
+		wide_int combined_cst;
+		combined_cst = wi::add (w1, w2);
+
+		cst = wide_int_to_tree (type, combined_cst);
+	      }
+	    }
+        (if (ovf_undef || !wrapped_range)
+	 (outer_op (convert @0) { cst; }))
+	)))))
+#endif
+
+/* ((T)(A)) +- CST -> (T)(A +- CST)  */
+#if GIMPLE
+   (for outer_op (plus minus)
+    (simplify
+     (outer_op (convert SSA_NAME@0) INTEGER_CST@2)
+      (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
+	   && INTEGRAL_TYPE_P (type)
+	   && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)))
+       /* Perform binary operation inside the cast if the constant fits
+	  and there is no overflow.  */
+       (with
+	{
+	  bool wrapped_range = true;
+	  tree cst_inner = NULL_TREE;
+	  enum value_range_type vr = VR_VARYING;
+	  tree inner_type = TREE_TYPE (@0);
+
+	  if (int_fits_type_p (@2, inner_type))
+	  {
+	    cst_inner = fold_convert (inner_type, @2);
+
+	    wide_int wmin0, wmax0;
+	    wide_int w1 = cst_inner;
+	    signop sgn = TYPE_SIGN (inner_type);
+	    vr = get_range_info (@0, &wmin0, &wmax0);
+
+	    if (vr == VR_RANGE)
+	      {
+		bool min_ovf;
+		wi::add (wmin0, w1, sgn, &min_ovf);
+
+		bool max_ovf;
+		wi::add (wmax0, w1, sgn, &max_ovf);
+
+		wrapped_range = (min_ovf && !max_ovf) || (!min_ovf && max_ovf);
+	      }
+	  }
+	}
+       (if (cst_inner && !wrapped_range)
+	(convert (outer_op @0 { cst_inner; })))
+       ))))
+#endif
+
   /* ~A + A -> -1 */
   (simplify
    (plus:c (bit_not @0) @0)

Robin Dapp June 27, 2017, 7:17 a.m. UTC | #18
Ping.
Richard Biener June 27, 2017, 12:14 p.m. UTC | #19
On Wed, Jun 21, 2017 at 1:44 PM, Robin Dapp <rdapp@linux.vnet.ibm.com> wrote:
>> use INTEGRAL_TYPE_P.

>

> Done.

>

>> but you do not actually _use_ vr_outer.  Do you think that if

>> vr_outer is a VR_RANGE then the outer operation may not

>> possibly have wrapped?  That's a false conclusion.

>

> These were remains of a previous version.  vr_outer is indeed not needed

> anymore; removed.

>

>> wi::add overload with the overflow flag?  ISTR you want to handle "negative"

>> unsigned constants somehow, but then I don't see how the above works.

>> I'd say if wmin/wmax interpreted as signed are positive and then using

>> a signed op to add w1 results in a still positive number you're fine

>> (you don't seem

>> to restrict the widening cast to either zero- or sign-extending).

>

> Changed to using the wi::add overload now.

>

> In essence, three cases are being handled:

>  - wrapped_range --> do not simplify

>  - !wrapped_range && ovf ("negative" unsigned) --> simplify and combine

> with sign extension in the outer type

>  - !wrapped_range && !ovf ("positive" unsigned) --> simplify and combine

> with zero extension in the outer type.


Let's split this and look at the simpler case:

+/* ((T)(A)) +- CST -> (T)(A +- CST)  */
+#if GIMPLE
+   (for outer_op (plus minus)
+    (simplify
+     (outer_op (convert SSA_NAME@0) INTEGER_CST@2)
+      (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
+          && INTEGRAL_TYPE_P (type)
+          && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)))
+       /* Perform binary operation inside the cast if the constant fits
+         and there is no overflow.  */
+       (with
+       {
+         bool wrapped_range = true;
+         tree cst_inner = NULL_TREE;
+         enum value_range_type vr = VR_VARYING;
+         tree inner_type = TREE_TYPE (@0);
+
+         if (int_fits_type_p (@2, inner_type))

do

      && get_range_info (...) == VR_RANGE)

here. That avoids vr and its initialization and you get all of the "work" when
you know it will eventually succeed.

+         {
+           cst_inner = fold_convert (inner_type, @2);

ideally you'd use a wide-int here and defer the tree allocation to the result

             wide_int wi = wi::from (@2, TYPE_PRECISION (inner_type),
                                              TYPE_SIGN (inner_type));

+           wide_int wmin0, wmax0;
+           wide_int w1 = cst_inner;
+           signop sgn = TYPE_SIGN (inner_type);
+           vr = get_range_info (@0, &wmin0, &wmax0);
+
+           if (vr == VR_RANGE)
+             {
+               bool min_ovf;
+               wi::add (wmin0, w1, sgn, &min_ovf);
+
+               bool max_ovf;
+               wi::add (wmax0, w1, sgn, &max_ovf);

So I guess we never run into the outer_op == minus case as the above is
clearly wrong for that?

The comment above says "if there is no overflow" but below you allow
min_ovf && max_ovf without any further explanation.

+               wrapped_range = (min_ovf && !max_ovf) || (!min_ovf && max_ovf);
+             }
+         }
+       }
+       (if (cst_inner && !wrapped_range)
+       (convert (outer_op @0 { cst_inner; })))

thus

        (if ((min_ovf && !max_ovf) || ....)
         (convert (outer_op @0 { wide_int_to_tree (inner_type, w1); } ))))

try to keep vertical spacing in patterns minimal -- I believe that patterns
should be small enough to fit in a terminal window (24 lines).

Richard.

+       ))))
+#endif


> Regards

>  Robin
Robin Dapp June 28, 2017, 2:34 p.m. UTC | #20
> ideally you'd use a wide-int here and defer the tree allocation to the result


Did that in the attached version.

> So I guess we never run into the outer_op == minus case as the above is

> clearly wrong for that?


Right, damn, not only was the treatment for this missing but it was
bogus in the other pattern as well.  Since we are mostly dealing with
PLUS_EXPR anyways it's probably better to defer the MINUS_EXPR case for
now.  This will also slim down the patterns a bit.

> try to keep vertical spacing in patterns minimal -- I believe that patterns

> should be small enough to fit in a terminal window (24 lines).


I find using the expanded wrapped_range condition in the simplification
somewhat cumbersome, especially because I need the condition to evaluate
to true by default making the initialization unintuitive.  Yet, I guess
setting wrapped_range = true was not terribly intuitive either...

Regards
 Robin

diff --git a/gcc/match.pd b/gcc/match.pd
index 80a17ba..ed497cf 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1290,6 +1290,79 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
     (if (cst && !TREE_OVERFLOW (cst))
      (plus { cst; } @0))))
 
+/* ((T)(A + CST1)) + CST2 -> (T)(A) + CST  */
+#if GIMPLE
+  (simplify
+    (plus (convert (plus@3 @0 INTEGER_CST@1)) INTEGER_CST@2)
+      (if (INTEGRAL_TYPE_P (type)
+           && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@3)))
+       /* Combine CST1 and CST2 to CST and convert to outer type if
+          (A + CST1)'s range does not wrap.  */
+       (with
+       {
+         tree inner_type = TREE_TYPE (@3);
+         wide_int wmin0, wmax0;
+         wide_int w1 = @1;
+         wide_int w2 = @2;
+         wide_int combined_cst;
+
+         bool ovf_undef = TYPE_OVERFLOW_UNDEFINED (inner_type);
+         bool min_ovf = true, max_ovf = false;
+
+         enum value_range_type vr0 =
+           get_range_info (@0, &wmin0, &wmax0);
+
+         if (ovf_undef || vr0 == VR_RANGE)
+           {
+             bool ovf = true;
+             if (!ovf_undef && vr0 == VR_RANGE)
+	       {
+		 wi::add (wmin0, w1, TYPE_SIGN (inner_type), &min_ovf);
+		 wi::add (wmax0, w1, TYPE_SIGN (inner_type), &max_ovf);
+		 ovf = min_ovf || max_ovf;
+	       }
+
+             /* Extend CST1 to TYPE. */
+             w1 = w1.from (w1, TYPE_PRECISION (type),
+			   ovf ? SIGNED : TYPE_SIGN (inner_type));
+           }
+       }
+   (if (ovf_undef || !((min_ovf && !max_ovf) || (!min_ovf && max_ovf)))
+    (plus (convert @0) { wide_int_to_tree (type, wi::add (w1, w2)); }
+     )))))
+#endif
+
+/* ((T)(A)) + CST -> (T)(A + CST)  */
+#if GIMPLE
+  (simplify
+   (plus (convert SSA_NAME@0) INTEGER_CST@1)
+    (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
+         && INTEGRAL_TYPE_P (type)
+         && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
+         && int_fits_type_p (@1, TREE_TYPE (@0)))
+     /* Perform binary operation inside the cast if the constant fits
+        and (A + CST)'s range does not wrap.  */
+     (with
+      {
+        bool min_ovf = true, max_ovf = false;
+        tree inner_type = TREE_TYPE (@0);
+
+        wide_int w1 = @1;
+        w1 = w1.from (w1, TYPE_PRECISION (inner_type), TYPE_SIGN
+      		(inner_type));
+
+        wide_int wmin0, wmax0;
+        if (get_range_info (@0, &wmin0, &wmax0) == VR_RANGE)
+          {
+            wi::add (wmin0, w1, TYPE_SIGN (inner_type), &min_ovf);
+            wi::add (wmax0, w1, TYPE_SIGN (inner_type), &max_ovf);
+          }
+      }
+     (if (!((min_ovf && !max_ovf) || (!min_ovf && max_ovf)) )
+      (convert (plus @0 { {wide_int_to_tree (TREE_TYPE (@0), w1)}; })))
+     )))
+#endif
+
   /* ~A + A -> -1 */
   (simplify
    (plus:c (bit_not @0) @0)

Richard Biener July 3, 2017, 1:10 p.m. UTC | #21
On Wed, Jun 28, 2017 at 4:34 PM, Robin Dapp <rdapp@linux.vnet.ibm.com> wrote
>> ideally you'd use a wide-int here and defer the tree allocation to the result

>

> Did that in the attached version.

>

>> So I guess we never run into the outer_op == minus case as the above is

>> clearly wrong for that?

>

> Right, damn, not only was the treatment for this missing but it was

> bogus in the other pattern as well.  Since we are mostly dealing with

> PLUS_EXPR anyways it's probably better to defer the MINUS_EXPR case for

> now.  This will also slim down the patterns a bit.

>

>> try to keep vertical spacing in patterns minimal -- I believe that patterns

>> should be small enough to fit in a terminal window (24 lines).

>

> I find using the expanded wrapped_range condition in the simplification

> somewhat cumbersome, especially because I need the condition to evaluate

> to true by default making the initialization unintuitive.  Yet, I guess

> setting wrapped_range = true was not terribly intuitive either...


+     /* Perform binary operation inside the cast if the constant fits
+        and (A + CST)'s range does not wrap.  */
+     (with
+      {
+        bool min_ovf = true, max_ovf = false;

While the initialization value doesn't matter (wi::add will overwrite it)
better initialize both to false ;)  Ah, you mean because we want to
transform only if get_range_info returned VR_RANGE.  Indeed somewhat
unintuitive (but still the best variant for now).

+        wide_int w1 = @1;
+        w1 = w1.from (w1, TYPE_PRECISION (inner_type), TYPE_SIGN
+               (inner_type));

I think wi::from (@1, ....) should work as well.

+     (if (!((min_ovf && !max_ovf) || (!min_ovf && max_ovf)) )
+      (convert (plus @0 { {wide_int_to_tree (TREE_TYPE (@0), w1)}; })))

so I'm still missing a comment on why min_ovf && max_ovf is ok.
The simple-minded would have written

   (if  (! min_ovf && ! max_ovf)
...

I'd like to see testcase(s) with this patch, preferably also exactly for the
case of min_ovf && max_ovf.  That said, consider (long)[0xfffffffe, 0xffffffff] + 2,
which has both min_ovf and max_ovf and results in [0x0, 0x1] in type
unsigned int but [0x100000000, 0x100000001] in type long.
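
Concretely, for the lower bound (self-contained check; assumes 32-bit
unsigned int and 64-bit long):

  #include <assert.h>

  int main (void)
  {
    unsigned int x = 0xfffffffeu;                  /* lower bound of the range */
    long orig = (long)x + 2;                       /* 0x100000000 */
    long narrowed = (long)(unsigned int)(x + 2u);  /* the add wraps to 0 */
    assert (orig != narrowed);
    return 0;
  }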

Richard.

> Regards

>  Robin
Robin Dapp July 5, 2017, 8:51 a.m. UTC | #22
> While the initialization value doesn't matter (wi::add will overwrite it)

> better initialize both to false ;)  Ah, you mean because we want to

> transform only if get_range_info returned VR_RANGE.  Indeed somewhat

> unintuitive (but still the best variant for now).


> so I'm still missing a comment on why min_ovf && max_ovf is ok.

> The simple-minded would have written [...]


I suppose it's more a matter of considering too many things at the same
time for me...  I was still thinking of including more cases than
necessary for the regression.  Guess the attached version will do as
well and should not contain any more surprises.  If needed, I'll add
additional cases some time.

Tests in a followup message.

Regards
 Robin

diff --git a/gcc/match.pd b/gcc/match.pd
index 80a17ba..3acf8be 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1290,6 +1290,70 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
     (if (cst && !TREE_OVERFLOW (cst))
      (plus { cst; } @0))))
 
+/* ((T)(A + CST1)) + CST2 -> (T)(A) + CST  */
+#if GIMPLE
+  (simplify
+    (plus (convert (plus@3 @0 INTEGER_CST@1)) INTEGER_CST@2)
+      (if (INTEGRAL_TYPE_P (type)
+           && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@3)))
+       /* Combine CST1 and CST2 to CST and convert to outer type if
+          (A + CST1)'s range does not overflow.  */
+       (with
+       {
+         tree inner_type = TREE_TYPE (@3);
+         wide_int wmin0, wmax0;
+         wide_int w1 = @1;
+
+         bool ovf_undef = TYPE_OVERFLOW_UNDEFINED (inner_type);
+         bool min_ovf = true, max_ovf = true;
+
+         enum value_range_type vr0 = get_range_info (@0, &wmin0, &wmax0);
+
+         if (ovf_undef || vr0 == VR_RANGE)
+           {
+             if (!ovf_undef && vr0 == VR_RANGE)
+	       {
+		 wi::add (wmin0, w1, TYPE_SIGN (inner_type), &min_ovf);
+		 wi::add (wmax0, w1, TYPE_SIGN (inner_type), &max_ovf);
+	       }
+	     w1 = w1.from (@1, TYPE_PRECISION (type), TYPE_SIGN (inner_type));
+           }
+       }
+   (if (ovf_undef || !(min_ovf || max_ovf))
+    (plus (convert @0) { wide_int_to_tree (type, wi::add (w1, @2)); }
+     )))))
+#endif
+
+/* ((T)(A)) + CST -> (T)(A + CST)  */
+#if GIMPLE
+  (simplify
+   (plus (convert SSA_NAME@0) INTEGER_CST@1)
+    (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
+         && INTEGRAL_TYPE_P (type)
+         && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
+         && int_fits_type_p (@1, TREE_TYPE (@0)))
+     /* Perform binary operation inside the cast if the constant fits
+        and (A + CST)'s range does not overflow.  */
+     (with
+      {
+        bool min_ovf = true, max_ovf = true;
+        tree inner_type = TREE_TYPE (@0);
+
+        wide_int w1 = w1.from (@1, TYPE_PRECISION (inner_type), TYPE_SIGN
+      		(inner_type));
+
+        wide_int wmin0, wmax0;
+        if (get_range_info (@0, &wmin0, &wmax0) == VR_RANGE)
+          {
+            wi::add (wmin0, w1, TYPE_SIGN (inner_type), &min_ovf);
+            wi::add (wmax0, w1, TYPE_SIGN (inner_type), &max_ovf);
+          }
+      }
+     (if (!min_ovf && !max_ovf)
+      (convert (plus @0 { {wide_int_to_tree (TREE_TYPE (@0), w1)}; })))
+     )))
+#endif
+
   /* ~A + A -> -1 */
   (simplify
    (plus:c (bit_not @0) @0)

Robin Dapp July 5, 2017, 8:54 a.m. UTC | #23
[3/3] Tests

--

gcc/testsuite/ChangeLog:

2017-07-05  Robin Dapp  <rdapp@linux.vnet.ibm.com>

        * gcc.dg/wrapped-binop-simplify-signed-1.c: New test.
        * gcc.dg/wrapped-binop-simplify-signed-2.c: New test.
        * gcc.dg/wrapped-binop-simplify-unsigned-1.c: New test.
        * gcc.dg/wrapped-binop-simplify-unsigned-2.c: New test.

diff --git a/gcc/testsuite/gcc.dg/wrapped-binop-simplify-signed-1.c b/gcc/testsuite/gcc.dg/wrapped-binop-simplify-signed-1.c
new file mode 100644
index 0000000..2571a07
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/wrapped-binop-simplify-signed-1.c
@@ -0,0 +1,65 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-ccp1-details" } */
+/* { dg-final { scan-tree-dump-times "gimple_simplified to" 12 "ccp1" } } */
+
+#include <limits.h>
+
+long foo(int a)
+{
+  return (long)(a - 2) + 1;
+}
+
+long bar(int a)
+{
+  return (long)(a + 3) - 1;
+}
+
+long baz(int a)
+{
+  return (long)(a - 1) + 2;
+}
+
+long baf(int a)
+{
+  return (long)(a + 1) - 2;
+}
+
+long bak(int a)
+{
+  return (long)(a + 1) + 3;
+}
+
+long bal(int a)
+{
+  return (long)(a - 7) - 4;
+}
+
+long bam(int a)
+{
+  return (long)(a - 1) - INT_MAX;
+}
+
+long bam2(int a)
+{
+  return (long)(a + 1) + INT_MAX;
+}
+
+long ban(int a)
+{
+  return (long)(a - 1) + INT_MIN;
+}
+
+long ban2(int a)
+{
+  return (long)(a + 1) - INT_MIN;
+}
+
+unsigned long baq(int a)
+{
+  return (unsigned long)(a + 1) - 1;
+}
+
+unsigned long baq2(int a)
+{
+  return (unsigned long)(a - 2) + 1;
+}
diff --git a/gcc/testsuite/gcc.dg/wrapped-binop-simplify-signed-2.c b/gcc/testsuite/gcc.dg/wrapped-binop-simplify-signed-2.c
new file mode 100644
index 0000000..5c897ba
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/wrapped-binop-simplify-signed-2.c
@@ -0,0 +1,39 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+#include <assert.h>
+#include <limits.h>
+
+int aa = -3;
+
+__attribute__((noinline))
+long foo (int a)
+{
+  return (long)(a - INT_MIN) + 1;
+}
+
+__attribute__((noinline))
+long foo2 (int a)
+{
+  if (a > -10 && a < 10)
+    return (long)(a + 2) - 1;
+}
+
+__attribute__((noinline))
+long foo3 (int a)
+{
+  if (a > -10 && a < 10)
+    return (long)(a) - 3;
+}
+
+int main()
+{
+  volatile long h = foo (aa);
+  assert (h == 2147483646);
+
+  volatile long i = foo2 (aa);
+  assert (i == -2);
+
+  volatile long j = foo3 (aa);
+  assert (j == -6);
+}
diff --git a/gcc/testsuite/gcc.dg/wrapped-binop-simplify-unsigned-1.c b/gcc/testsuite/gcc.dg/wrapped-binop-simplify-unsigned-1.c
new file mode 100644
index 0000000..04a7ca49
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/wrapped-binop-simplify-unsigned-1.c
@@ -0,0 +1,43 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-evrp-details -fdump-tree-ccp2-details -fdump-tree-vrp1-details" } */
+/* { dg-final { scan-tree-dump-times "gimple_simplified to" 2 "evrp" } } */
+/* { dg-final { scan-tree-dump-times "Match-and-simplified" 2 "ccp2" } } */
+/* { dg-final { scan-tree-dump-times "gimple_simplified to" 3 "vrp1" } } */
+
+#include <limits.h>
+
+unsigned long oof2(unsigned int a)
+{
+  if (a > 0)
+    return (unsigned long)(a - 1) + 1;
+}
+
+unsigned long bah (unsigned int a)
+{
+  if (a > 0)
+    return (unsigned long)(a - 1) - 1;
+}
+
+long baq3(unsigned int a)
+{
+  if (a > 0)
+    return (long)(a - 1) + 1;
+}
+
+unsigned long bap(unsigned int a)
+{
+  if (a < UINT_MAX)
+    return (unsigned long)(a + 1) + ULONG_MAX;
+}
+
+unsigned long bar3(unsigned int a)
+{
+  if (a < UINT_MAX)
+    return (unsigned long)(a + 1) - 5;
+}
+
+unsigned long bar4(unsigned int a)
+{
+  if (a < UINT_MAX)
+    return (unsigned long)(a + 1) - 6;
+}
diff --git a/gcc/testsuite/gcc.dg/wrapped-binop-simplify-unsigned-2.c b/gcc/testsuite/gcc.dg/wrapped-binop-simplify-unsigned-2.c
new file mode 100644
index 0000000..46290e7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/wrapped-binop-simplify-unsigned-2.c
@@ -0,0 +1,125 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+#include <assert.h>
+#include <limits.h>
+
+unsigned int a = 3;
+int aa = 3;
+int bb = 1;
+int cc = 4;
+unsigned int dd = 0;
+unsigned int ee = 4294967294u;
+
+__attribute__((noinline))
+unsigned long foo1 (unsigned int a)
+{
+  return (unsigned long)(UINT_MAX + 1) - 1;
+}
+
+__attribute__((noinline))
+unsigned long foo2 (unsigned int a)
+{
+  if (a < 4)
+    return (unsigned long)(a - 4) + 1;
+}
+
+__attribute__((noinline))
+unsigned long foo3 (unsigned int a)
+{
+  if (a > 2)
+    return (unsigned long)(a + UINT_MAX - 4) + 2;
+}
+
+__attribute__((noinline))
+unsigned long foo4 (unsigned int a)
+{
+  if (a > 2)
+    return (unsigned long)(a - UINT_MAX) + UINT_MAX;
+}
+
+__attribute__((noinline))
+unsigned long foo5 (unsigned int a)
+{
+  if (a > 2)
+    return (unsigned long)(a + UINT_MAX) - UINT_MAX;
+}
+
+__attribute__((noinline))
+long foo6 (unsigned int a)
+{
+  if (a > 2)
+    return (long)(a - 4) + 1;
+}
+
+__attribute__((noinline))
+long foo7 (unsigned int a)
+{
+  if (a > 2)
+    return (long)(a + UINT_MAX) + 1;
+}
+
+__attribute__((noinline))
+unsigned long foo8 (unsigned int a)
+{
+  if (a < 2)
+    return (unsigned long)(a + 4294967294u) + 5000000000;
+}
+
+__attribute__((noinline))
+unsigned long foo9 (unsigned int a)
+{
+  if (a > 2)
+    return (unsigned long)(a + 4294967294u) + 8000000000;
+}
+
+__attribute__((noinline))
+unsigned long foo10 (unsigned int a)
+{
+  if (a < 2)
+    return (unsigned long)(a + 4294967294u) + 2;
+}
+
+__attribute__((noinline))
+unsigned long foo11 (unsigned int a)
+{
+  if (a > 4294967293u)
+    return (unsigned long)(a + 2u) + 2;
+}
+
+
+int main()
+{
+  unsigned long b = foo1 (UINT_MAX);
+  assert (b == 18446744073709551615ul);
+
+  unsigned long c = foo2 (a);
+  assert (c == 4294967296u);
+
+  unsigned long d = foo3 (a);
+  assert (d == 4294967296ul);
+
+  unsigned long e = foo4 (a);
+  assert (e == 4294967299ul);
+
+  unsigned long f = foo5 (a);
+  assert (f == 18446744069414584323ul);
+
+  long g = foo6 (a);
+  assert (g == 4294967296ul);
+
+  long h = foo7 (aa);
+  assert (h == 3);
+
+  unsigned long i = foo8 (bb);
+  assert (i == 9294967295ul);
+
+  unsigned long j = foo9 (cc);
+  assert (j == 8000000002);
+
+  unsigned long k = foo10 (dd);
+  assert (k == 0x100000000);
+
+  unsigned long l = foo11 (ee);
+  assert (l == 2);
+}

Marc Glisse July 15, 2017, 9:58 a.m. UTC | #24
On Wed, 5 Jul 2017, Robin Dapp wrote:

>> While the initialization value doesn't matter (wi::add will overwrite it)

>> better initialize both to false ;)  Ah, you mean because we want to

>> transform only if get_range_info returned VR_RANGE.  Indeed somewhat

>> unintuitive (but still the best variant for now).

>

>> so I'm still missing a comment on why min_ovf && max_ovf is ok.

>> The simple-minded would have written [...]

>

> I suppose it's more a matter of considering too many things at the same

> time for me...  I was still thinking of including more cases than

> necessary for the regression.  Guess the attached version will do as

> well and should not contain any more surprises.  If needed, I'll add

> additional cases some time.


What happens for (long)(X+10)+LONG_MAX where X has type int and is in 
[-30, -20]? It looks like wi::add will overflow and you will generate 
X+negative which overflows at runtime.
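
Concretely (assumes 64-bit long; the folded form is only shown in a comment
since evaluating it would be undefined behaviour):

  #include <limits.h>

  /* X in [-30, -20]: the original expression is well defined, e.g. for
     X == -30 it evaluates to LONG_MAX - 20.  */
  long original (int x)
  {
    return (long)(x + 10) + LONG_MAX;
  }

  /* Combining the constants in the wider type first would compute
     10 + LONG_MAX, which wraps to a large negative constant, and
     (long)x plus that constant then overflows at run time.  */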

(It looks like you don't need to name @3, you could just use the type of 
@0 instead)

-- 
Marc Glisse

Patch

diff --git a/gcc/tree-ssa-propagate.c b/gcc/tree-ssa-propagate.c
index fbe7e13..110587d 100644
--- a/gcc/tree-ssa-propagate.c
+++ b/gcc/tree-ssa-propagate.c
@@ -1065,13 +1065,15 @@  substitute_and_fold_dom_walker::before_dom_children (basic_block bb)
 
       /* Replace real uses in the statement.  */
       did_replace |= replace_uses_in (stmt, get_value_fn);
+      if (did_replace)
+	gimple_set_modified (stmt, true);
 
       /* If we made a replacement, fold the statement.  */
-      if (did_replace)
+      if (fold_stmt (&i, follow_single_use_edges))
 	{
-	  fold_stmt (&i, follow_single_use_edges);
 	  stmt = gsi_stmt (i);
 	  gimple_set_modified (stmt, true);
+	  did_replace = true;
 	}
 
       /* Some statements may be simplified using propagator