diff mbox

[RTL-ree] PR rtl-optimization/68194: Restrict copy instruction in presence of conditional moves

Message ID 5649E333.4090904@arm.com
State Superseded
Headers show

Commit Message

Kyrylo Tkachov Nov. 16, 2015, 2:07 p.m. UTC
Hi all,

In this PR we encounter a wrong-code bug on x86_64. It started with an RTL if-conversion patch of mine
which I believe exposed a latent issue in the ree (redundant extension elimination) pass.
The different if-conversion behaviour enabled a new cse opportunity
which then produced the RTL that triggered the bad ree behaviour.
The relevant RTL insns before the ree pass look like:

Basic Block A:
(set (reg:HI 0 ax)
      (mem:HI (symbol_ref:DI ("f"))))
...
(set (reg:SI 3 bx)
      (if_then_else:SI (eq (reg:CCZ 17 flags)
                 (const_int 0))
             (reg:SI 0 ax)
             (reg:SI 3 bx)))

(set (reg:SI 4 si)
      (sign_extend:SI (reg:HI 3 bx)))
...
Basic block B (dominated by basic block A):
(set (reg:SI 4 si)
      (sign_extend:SI (reg:QI 0 ax))) /* ax contains symbol "f". */


ree changes that into the broken:
Basic block A:
(set (reg:SI 4 si)
      (sign_extend:SI (mem:HI (symbol_ref:DI ("f")))))

(set (reg:SI 3 bx)
      (reg:SI 4 si))

...
(set (reg:SI 3 bx)
      (if_then_else:SI (eq (reg:CCZ 17 flags)
                 (const_int 0 [0]))
             (reg:SI 0 ax)
             (reg:SI 3 bx)))
...
Basic block B:
(set (reg:SI 4 si)
      (sign_extend:SI (reg:QI 0 ax))) /* Insn unchanged by ree, but ax now undefined.  */


Note that after ree register ax is now used uninitialised in basic block A and
the insn that previously set it to symbol "f" has now been transformed into:
(set (reg:SI 4 si)
      (sign_extend:SI (mem:HI (symbol_ref:DI ("f")))))

I've explained in the comments in the patch what's going on but the short version is
trying to change the destination of a defining insn that feeds into an extend insn
is not valid if the defining insn doesn't feed directly into the extend insn.
In the ree pass the only way this can happen is if there is an intermediate conditional
move that the pass tries to handle in a special way.  An equivalent fix would have been
to check on that path (when copy_needed in combine_reaching_defs is true) that the
state->copies_list vector (that contains the conditional move insns feeding into the
extend insn) is empty.

This patch is the minimal fix that I could do for this PR and the cases it rejects are
cases where we're pretty much guaranteed to miscompile the code, so it doesn't reject
any legitimate extension elimination opportunities. I checked that the code generation
doesn't change on aarch64 for the whole of SPEC2006 (I did see codegen regressions with
previous versions of the patch that were less targeted).

Bootstrapped and tested on x86_64-unknown-linux-gnu, arm-none-linux-gnueabihf,
aarch64-none-linux-gnu.

I've marked some other PRs that are dups of this one in the ChangeLog. If anyone
has any other PRs that are dups of this one please let me know.

Ok for trunk?

Thanks,
Kyrill

2015-11-16  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     PR rtl-optimization/68194
     PR rtl-optimization/68328
     PR rtl-optimization/68185
     * ree.c (combine_reaching_defs): Reject copy_needed case if defining
     insn does not feed directly into candidate extension insn.

2015-11-16  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

     PR rtl-optimization/68194
     * gcc.dg/pr68194.c: New test.

Comments

Kyrylo Tkachov Nov. 17, 2015, 9:08 a.m. UTC | #1
Hi Bernd,

On 16/11/15 18:40, Bernd Schmidt wrote:
> On 11/16/2015 03:07 PM, Kyrill Tkachov wrote:

>

>> I've explained in the comments in the patch what's going on but the

>> short version is trying to change the destination of a defining insn

>> that feeds into an extend insn is not valid if the defining insn

>> doesn't feed directly into the extend insn. In the ree pass the only

>> way this can happen is if there is an intermediate conditional move

>> that the pass tries to handle in a special way. An equivalent fix

>> would have been to check on that path (when copy_needed in

>> combine_reaching_defs is true) that the state->copies_list vector

>> (that contains the conditional move insns feeding into the extend

>> insn) is empty.

>

> I ran this through gdb, and I think I see what's going on. For reference, here's a comment from the source:

>

>       /* Considering transformation of

>          (set (reg1) (expression))

>          ...

>          (set (reg2) (any_extend (reg1)))

>

>          into

>

>          (set (reg2) (any_extend (expression)))

>          (set (reg1) (reg2))

>          ...  */

>

> I was thinking that another possible fix would be to also check !reg_used_between_p for reg1 to ensure it's not used. I'm thinking this might be a little clearer - what is your opinion?


Yes, I had considered that as well. It should be equivalent. I didn't use !reg_used_between_p because I thought
it'd be more expensive than checking reg_overlap_mentioned_p since we must iterate over a number of instructions
and call reg_overlap_mentioned_p on each one. But I suppose this case is rare enough that it wouldn't make any
measurable difference.

Would you prefer to use !reg_used_between_p here?

>

> The added comment could lead to some confusion since it's placed in front of an existing if statement that also tests a different condition. Also, if we go with your fix,

>

>> +      || !reg_overlap_mentioned_p (tmp_reg, SET_SRC (PATTERN (cand->insn))))

>

> Shouldn't this really be !rtx_equal_p?

>


Maybe, will it behave the right way if the two regs have different modes or when subregs are involved?
(will we even hit such a case in this path?)

Thanks,
Kyrill
>

> Bernd

>
Kyrylo Tkachov Nov. 17, 2015, 9:49 a.m. UTC | #2
On 17/11/15 09:08, Kyrill Tkachov wrote:
> Hi Bernd,

>

> On 16/11/15 18:40, Bernd Schmidt wrote:

>> On 11/16/2015 03:07 PM, Kyrill Tkachov wrote:

>>

>>> I've explained in the comments in the patch what's going on but the

>>> short version is trying to change the destination of a defining insn

>>> that feeds into an extend insn is not valid if the defining insn

>>> doesn't feed directly into the extend insn. In the ree pass the only

>>> way this can happen is if there is an intermediate conditional move

>>> that the pass tries to handle in a special way. An equivalent fix

>>> would have been to check on that path (when copy_needed in

>>> combine_reaching_defs is true) that the state->copies_list vector

>>> (that contains the conditional move insns feeding into the extend

>>> insn) is empty.

>>

>> I ran this through gdb, and I think I see what's going on. For reference, here's a comment from the source:

>>

>>       /* Considering transformation of

>>          (set (reg1) (expression))

>>          ...

>>          (set (reg2) (any_extend (reg1)))

>>

>>          into

>>

>>          (set (reg2) (any_extend (expression)))

>>          (set (reg1) (reg2))

>>          ...  */

>>

>> I was thinking that another possible fix would be to also check !reg_used_between_p for reg1 to ensure it's not used. I'm thinking this might be a little clearer - what is your opinion?

>

> Yes, I had considered that as well. It should be equivalent. I didn't use !reg_used_between_p because I thought

> it'd be more expensive than checking reg_overlap_mentioned_p since we must iterate over a number of instructions

> and call reg_overlap_mentioned_p on each one. But I suppose this case is rare enough that it wouldn't make any

> measurable difference.


Actually, I tried it out. And while a check reg_used_between_p fixed the testcase, it caused code quality regressions
on aarch64. Seems it's too aggressive in restricting ree.

I'll have a closer look.

Kyrill

>

> Would you prefer to use !reg_used_between_p here?

>

>>

>> The added comment could lead to some confusion since it's placed in front of an existing if statement that also tests a different condition. Also, if we go with your fix,

>>

>>> +      || !reg_overlap_mentioned_p (tmp_reg, SET_SRC (PATTERN (cand->insn))))

>>

>> Shouldn't this really be !rtx_equal_p?

>>

>

> Maybe, will it behave the right way if the two regs have different modes or when subregs are involved?

> (will we even hit such a case in this path?)

>

> Thanks,

> Kyrill

>>

>> Bernd

>>

>
Kyrylo Tkachov Nov. 17, 2015, 10:17 a.m. UTC | #3
On 17/11/15 09:49, Kyrill Tkachov wrote:
>

> On 17/11/15 09:08, Kyrill Tkachov wrote:

>> Hi Bernd,

>>

>> On 16/11/15 18:40, Bernd Schmidt wrote:

>>> On 11/16/2015 03:07 PM, Kyrill Tkachov wrote:

>>>

>>>> I've explained in the comments in the patch what's going on but the

>>>> short version is trying to change the destination of a defining insn

>>>> that feeds into an extend insn is not valid if the defining insn

>>>> doesn't feed directly into the extend insn. In the ree pass the only

>>>> way this can happen is if there is an intermediate conditional move

>>>> that the pass tries to handle in a special way. An equivalent fix

>>>> would have been to check on that path (when copy_needed in

>>>> combine_reaching_defs is true) that the state->copies_list vector

>>>> (that contains the conditional move insns feeding into the extend

>>>> insn) is empty.

>>>

>>> I ran this through gdb, and I think I see what's going on. For reference, here's a comment from the source:

>>>

>>>       /* Considering transformation of

>>>          (set (reg1) (expression))

>>>          ...

>>>          (set (reg2) (any_extend (reg1)))

>>>

>>>          into

>>>

>>>          (set (reg2) (any_extend (expression)))

>>>          (set (reg1) (reg2))

>>>          ...  */

>>>

>>> I was thinking that another possible fix would be to also check !reg_used_between_p for reg1 to ensure it's not used. I'm thinking this might be a little clearer - what is your opinion?

>>

>> Yes, I had considered that as well. It should be equivalent. I didn't use !reg_used_between_p because I thought

>> it'd be more expensive than checking reg_overlap_mentioned_p since we must iterate over a number of instructions

>> and call reg_overlap_mentioned_p on each one. But I suppose this case is rare enough that it wouldn't make any

>> measurable difference.

>

> Actually, I tried it out. And while a check reg_used_between_p fixed the testcase, it caused code quality regressions

> on aarch64. Seems it's too aggressive in restricting ree.

>

> I'll have a closer look.


Ok, so the testcases that regress code-quality-wise on aarch64 look like this before ree:
(insn 48 57 49 7 (set (reg:SI 7 x7)
         (zero_extend:SI (mem:QI (plus:DI (reg/f:DI 1 x1)
                     (const_int 2 [0x2]))))))
(insn 49 48 52 7 (set (reg/v:SI 2 x2)
         (reg:SI 7 x7)))

(insn 52 49 53 7 (set (reg:DI 8 x8)
         (zero_extend:DI (reg:SI 7 x7))))


ree wants to transform this into:
(insn 48 57 296 7 (set (reg:DI 8 x8)
         (zero_extend:DI (mem:QI (plus:DI (reg/f:DI 1 x1)
                     (const_int 2 [0x2]))))))
(insn 296 48 49 7 (set (reg:DI 7 x7)
         (reg:DI 8 x8)))
(insn 49 296 53 7 (set (reg/v:SI 2 x2)
         (reg:SI 7 x7)))


which is valid, but we reject that with the reg_used_between_p check because x7 is used in
the intermediate insn 49. Note that no conditional move is present here.

So, I think that the crucial element here is that the destination of the def_insn should
feed directly into the extend, and that is what my original patch was testing for.
So, I'd like to keep my original proposed patch as is, although I think I'll add a couple of
testcases from the duplicate PRs to it for the testsuite.
What do you think?

Thanks,
Kyrill


>

> Kyrill

>

>>

>> Would you prefer to use !reg_used_between_p here?

>>

>>>

>>> The added comment could lead to some confusion since it's placed in front of an existing if statement that also tests a different condition. Also, if we go with your fix,

>>>

>>>> +      || !reg_overlap_mentioned_p (tmp_reg, SET_SRC (PATTERN (cand->insn))))

>>>

>>> Shouldn't this really be !rtx_equal_p?

>>>

>>

>> Maybe, will it behave the right way if the two regs have different modes or when subregs are involved?

>> (will we even hit such a case in this path?)

>>

>> Thanks,

>> Kyrill

>>>

>>> Bernd

>>>

>>

>
Kyrylo Tkachov Nov. 17, 2015, 1:03 p.m. UTC | #4
On 17/11/15 12:10, Bernd Schmidt wrote:
> On 11/17/2015 10:08 AM, Kyrill Tkachov wrote:

>> Yes, I had considered that as well. It should be equivalent. I didn't

>> use !reg_used_between_p because I thought

>> it'd be more expensive than checking reg_overlap_mentioned_p since we

>> must iterate over a number of instructions

>> and call reg_overlap_mentioned_p on each one. But I suppose this case is

>> rare enough that it wouldn't make any

>> measurable difference.

>>

>> Would you prefer to use !reg_used_between_p here?

>

> I would but apparently it doesn't work, so that's kind of neither here nor there.

>

>>> The added comment could lead to some confusion since it's placed in

>>> front of an existing if statement that also tests a different

>>> condition. Also, if we go with your fix,

>>>

>>>> +      || !reg_overlap_mentioned_p (tmp_reg, SET_SRC (PATTERN

>>>> (cand->insn))))

>>>

>>> Shouldn't this really be !rtx_equal_p?

>>>

>>

>> Maybe, will it behave the right way if the two regs have different modes

>> or when subregs are involved?

>

> It would return false, in which case we'll conservatively fail here. I think that's desirable?

>


Well, I think the statement we want to make is
"return false from this function if the two expressions contain the same register number".
if (!rtx_equal_p (..., ...))
   return false;

will only return false if the two expressions are the same REG with the same mode.
if (!reg_overlap_mentioned_p (..., ...))
   return false;

should return false even if the modes are different or one is a subreg, which is what we want.

I did not see any codegen regressions using reg_overlap_mentioned_p on aarch64, so I don't think
it will restrict any legitimate cases.

Thanks,
Kyrill

>

> Bernd

>
Kyrylo Tkachov Nov. 18, 2015, 9:11 a.m. UTC | #5
On 17/11/15 23:11, Bernd Schmidt wrote:
> On 11/17/2015 02:03 PM, Kyrill Tkachov wrote:

>> +      || !reg_overlap_mentioned_p (tmp_reg, SET_SRC (PATTERN (cand->insn))))

>>      return false;

>

>> Well, I think the statement we want to make is

>> "return false from this function if the two expressions contain the same

>> register number".

>

> I looked at it again and I think I'll need more time to really figure out what's going on in this pass.

>

> However, about the above... either I'm very confused, or the actual statement here is "return false _unless_ the two expressions contain the same register number". In the testcase, the regs in question are ax and bx, which is then 

> rejected if the patch has been applied.


I'm sorry, it is my mistake in explaining. I meant to say:
"return false from this function unless the two expressions contain the same
  register number"

>

> (gdb) p tmp_reg

> (reg:SI 0 ax)

> (gdb) p cand->insn

> (insn 59 117 60 7 (set (reg:SI 4 si [orig:107 D.1813 ] [107])

>         (sign_extend:SI (reg:HI 3 bx [orig:99 D.1813 ] [99])))

>

> And I think it would be better to strengthen that to "return false unless registers are identical". (From the above it's clear however that a plain rtx_equal_p wouldn't work since there's an extension in the operand).


So the three relevant instructions are:
I1: (set (reg:HI 0 ax)
      (mem:HI (symbol_ref:DI ("f"))))

I2: (set (reg:SI 3 bx)
      (if_then_else:SI (eq (reg:CCZ 17 flags)
                 (const_int 0))
             (reg:SI 0 ax)
             (reg:SI 3 bx)))

I3: (set (reg:SI 4 si)
      (sign_extend:SI (reg:HI 3 bx)))

I1 is def_insn, I3 is cand->insn. tmp_reg is 'ax'. What we want to do is reject this transformation
because the destination of def_insn (aka I1), that is 'ax', is not the operand of the extend operation
in cand->insn (aka I3). As you said, rtx_equal won't work on just SET_SRC (PATTERN (cand->insn)) because
it's an extend operation. So reg_overlap_mentioned should be appropriate.
Hope this helps.

>

> Also, I had another look at the testcase. It uses __builtin_printf and dg-output, which is at least unusual. Try to use the normal "if (cond) abort ()".

>


I did not try to remove the __builtin_printf because the use of 'f' there is needed to trigger this.
There are at least two other dups of this PR: 68328 and 68185 the testcases for which have an "if (cond) abort ();" form.
I'll use them instead.


>

> Bernd

>
diff mbox

Patch

commit 33131f774705b936afc1a26c145e1214b388771f
Author: Kyrylo Tkachov <kyrylo.tkachov@arm.com>
Date:   Fri Nov 13 15:01:47 2015 +0000

    [RTL-ree] PR rtl-optimization/68194: Restrict copy instruction in presence of conditional moves

diff --git a/gcc/ree.c b/gcc/ree.c
index b8436f2..e91d164 100644
--- a/gcc/ree.c
+++ b/gcc/ree.c
@@ -814,7 +814,30 @@  combine_reaching_defs (ext_cand *cand, const_rtx set_pat, ext_state *state)
 
       rtx tmp_reg = gen_rtx_REG (GET_MODE (SET_DEST (PATTERN (cand->insn))),
 				 REGNO (SET_DEST (*dest_sub_rtx)));
-      if (reg_overlap_mentioned_p (tmp_reg, SET_DEST (PATTERN (cand->insn))))
+
+      /* When transforming:
+	(set (reg1) (expression))
+	 ...
+	(set (reg2) (any_extend (reg1)))
+
+	into
+
+	(set (reg2) (any_extend (expression)))
+	(set (reg1) (reg2))
+	make sure that reg1 from the first set feeds directly into the extend.
+	This may not hold in a situation with an intermediate
+	conditional copy i.e.
+	I1: (set (reg3) (expression))
+	I2: (set (reg1) (cond ? reg3 : reg1))
+	I3: (set (reg2) (any_extend (reg1)))
+
+	where I3 is cand, I1 is def_insn and I2 is a conditional copy.
+	We want to avoid transforming that into:
+	(set (reg2) (any_extend (expression)))
+	(set (reg1) (reg2))
+	(set (reg1) (cond ? reg3 : reg1)).  */
+      if (reg_overlap_mentioned_p (tmp_reg, SET_DEST (PATTERN (cand->insn)))
+	  || !reg_overlap_mentioned_p (tmp_reg, SET_SRC (PATTERN (cand->insn))))
 	return false;
 
       /* The destination register of the extension insn must not be
diff --git a/gcc/testsuite/gcc.dg/pr68194.c b/gcc/testsuite/gcc.dg/pr68194.c
new file mode 100644
index 0000000..b4855ea
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr68194.c
@@ -0,0 +1,35 @@ 
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+int a, c, d, e, g, h;
+short f;
+
+short
+fn1 (void)
+{
+  int j[2];
+  for (; e; e++)
+    if (j[0])
+      for (;;)
+	;
+  if (!g)
+    return f;
+}
+
+int
+main (void)
+{
+  for (; a < 1; a++)
+    {
+      for (c = 0; c < 2; c++)
+	{
+	  d && (f = 0);
+	  h = fn1 ();
+	}
+      __builtin_printf ("%d\n", (char) f);
+   }
+
+ return 0;
+}
+
+/* { dg-output "0" } */