diff mbox series

[aarch64] PR111702 - ICE in insert_regs after interleave+zip1 vector initialization patch

Message ID CAAgBjMmxC+Lk1Fk7o4iGhy0G=svOB3ZoYfAdf835-PUvc5rMZw@mail.gmail.com
State New
Headers show
Series [aarch64] PR111702 - ICE in insert_regs after interleave+zip1 vector initialization patch | expand

Commit Message

Prathamesh Kulkarni Nov. 23, 2023, 11:36 a.m. UTC
Hi Richard,
For the test-case mentioned in PR111702, compiling with -O2
-frounding-math -fstack-protector-all results in following ICE during
cse2 pass:

test.c: In function 'foo':
test.c:119:1: internal compiler error: in insert_regs, at cse.cc:1120
  119 | }
      | ^
0xb7ebb0 insert_regs
        ../../gcc/gcc/cse.cc:1120
0x1f95134 merge_equiv_classes
        ../../gcc/gcc/cse.cc:1764
0x1f9b9ab cse_insn
        ../../gcc/gcc/cse.cc:4793
0x1f9fe30 cse_extended_basic_block
        ../../gcc/gcc/cse.cc:6577
0x1f9fe30 cse_main
        ../../gcc/gcc/cse.cc:6722
0x1fa0984 rest_of_handle_cse2
        ../../gcc/gcc/cse.cc:7620
0x1fa0984 execute
        ../../gcc/gcc/cse.cc:7675

This happens only with interleave+zip1 vector initialization with
-frounding-math -fstack-protector-all, while it compiles OK without
-fstack-protector-all. Also, it compiles OK with fallback sequence
code-gen (with or without -fstack-protector-all). Unfortunately, I
haven't been able to reduce the test-case further :/

From the test-case, it seems only the vector initializer for type J
uses interleave+zip1 approach, while rest of the vector initializers
use fallback sequence.

J is defined as:
typedef _Float16 __attribute__((__vector_size__ (16))) J;

and the initializer is:
(J) { 11654, 4801, 5535, 9743, 61680}

interleave+zip1 sequence for above initializer J:
mode = V8HF

vals: (parallel:V8HF [
        (reg:HF 642)
        (reg:HF 645)
        (reg:HF 648)
        (reg:HF 651)
        (reg:HF 654)
        (const_double:HF 0.0 [0x0.0p+0]) repeated x3
    ])

target: (reg:V8HF 641)
seq:
(insn 1058 0 1059 (set (reg:V4HF 657)
        (const_vector:V4HF [
                (const_double:HF 0.0 [0x0.0p+0]) repeated x4
            ])) "test.c":81:8 -1
     (nil))
(insn 1059 1058 1060 (set (reg:V4HF 657)
        (vec_merge:V4HF (vec_duplicate:V4HF (reg:HF 642))
            (reg:V4HF 657)
            (const_int 1 [0x1]))) "test.c":81:8 -1
     (nil))
(insn 1060 1059 1061 (set (reg:V4HF 657)
        (vec_merge:V4HF (vec_duplicate:V4HF (reg:HF 648))
            (reg:V4HF 657)
            (const_int 2 [0x2]))) "test.c":81:8 -1
     (nil))
(insn 1061 1060 1062 (set (reg:V4HF 657)
        (vec_merge:V4HF (vec_duplicate:V4HF (reg:HF 654))
            (reg:V4HF 657)
            (const_int 4 [0x4]))) "test.c":81:8 -1
     (nil))
(insn 1062 1061 1063 (set (reg:V4HF 658)
        (const_vector:V4HF [
                (const_double:HF 0.0 [0x0.0p+0]) repeated x4
            ])) "test.c":81:8 -1
     (nil))
(insn 1063 1062 1064 (set (reg:V4HF 658)
        (vec_merge:V4HF (vec_duplicate:V4HF (reg:HF 645))
            (reg:V4HF 658)
            (const_int 1 [0x1]))) "test.c":81:8 -1
     (nil))
(insn 1064 1063 1065 (set (reg:V4HF 658)
        (vec_merge:V4HF (vec_duplicate:V4HF (reg:HF 651))
            (reg:V4HF 658)
            (const_int 2 [0x2]))) "test.c":81:8 -1
     (nil))
(insn 1065 1064 0 (set (reg:V8HF 641)
        (unspec:V8HF [
                (subreg:V8HF (reg:V4HF 657) 0)
                (subreg:V8HF (reg:V4HF 658) 0)
            ] UNSPEC_ZIP1)) "test.c":81:8 -1
     (nil))

It seems to me that the above sequence correctly initializes the
vector into r641 ?
insns 1058-1061 construct r657 = { r642, r648, r654, 0 }
insns 1062-1064 construct r658 = { r645, r651, 0, 0 }
and zip1 will create r641 = { r642, r645, r648, r651, r654, 0, 0, 0 }

For the above test, it seems that with interleave+zip1 approach and
-fstack-protector-all,
in cse pass, there are two separate equivalence classes created for
(const_int 1), that need
to be merged in cse_insn:

       if (elt->first_same_value != src_eqv_elt->first_same_value)
            {
              /* The REG_EQUAL is indicating that two formerly distinct
                 classes are now equivalent.  So merge them.  */
              merge_equiv_classes (elt, src_eqv_elt);

elt equivalence chain:
Equivalence chain for (subreg:QI (reg:V16QI 671) 0):
(subreg:QI (reg:V16QI 671) 0)
(const_int 1 [0x1])

src_eqv_elt equivalence chain:
Equivalence chain for (const_int 1 [0x1]):
(reg:QI 34 v2)
(reg:QI 32 v0)
(reg:QI 34 v2)
(const_int 1 [0x1])
(vec_select:QI (reg:V16QI 671)
    (parallel [
            (const_int 1 [0x1])
        ]))
(vec_select:QI (reg:V16QI 32 v0)
    (parallel [
            (const_int 1 [0x1])
        ]))
(vec_select:QI (reg:V16QI 33 v1)
    (parallel [
            (const_int 2 [0x2])
        ]))
(vec_select:QI (reg:V16QI 33 v1)
    (parallel [
            (const_int 1 [0x1])
        ]))

The issue is that merge_equiv_classes doesn't seem to deal correctly with
multiple occurences of same register in class2 (src_eqv_elt), which
has two occurrences of
(reg:QI 34 v2)

In merge_equiv_classes, on first iteration, it will remove (reg:QI 34)
from reg_equiv_table
by calling delete_equiv_reg(34), and in insert_regs it will create an
entry for (reg:QI 34) in qty_table with new quantity number, and
create new equivalence in reg_eqv_table.

When we again come across (reg:QI 34) in class2, it will
unconditionally remove the register from reg_eqv_table, thus making
REG_QTY(34) = -35, even tho (reg:QI 34) is now present in class1
chain.

Then in insert_regs, we have:
x: (reg:QI 34 v2)

classp:
(subreg:QI (reg:V16QI 671) 0)
(reg:QI 34 v2)
(const_int 1 [0x1])

And while iterating over elements in classp, we end up with regno ==
c_regno == 34.
However, as mentioned above, merge_equiv_classes has deleted entry for
(reg:QI 34) from reg_eqv_table, so it's no longer valid, and thus end
up hitting the following assert:
gcc_assert (REGNO_QTY_VALID_P (c_regno));

I am not sure tho why this is triggered only with interleave+zip1 approach with
-fstack-protector-all.

The attached (untested) patch is a workaround for the above issue --
In merge_equiv_classes,
while iterating over elements in class2, it simply checks if element
is a reg, and already inserted in class1 with equivalent mode, and
avoids deleting it from reg_eqv_table in that case.

This avoids hitting the assert, and following is the result of merging
above two classes:
Equivalence chain for (subreg:QI (reg:V16QI 671) 0):
(subreg:QI (reg:V16QI 671) 0)
(reg:QI 34 v2)
(reg:QI 32 v0)
(reg:QI 34 v2)
(const_int 1 [0x1])
(const_int 1 [0x1])
(vec_select:QI (reg:V16QI 671)
    (parallel [
            (const_int 1 [0x1])
        ]))
(vec_select:QI (reg:V16QI 33 v1)
    (parallel [
            (const_int 1 [0x1])
        ]))
(vec_select:QI (reg:V16QI 33 v1)
    (parallel [
            (const_int 2 [0x2])
        ]))
(vec_select:QI (reg:V16QI 32 v0)
    (parallel [
            (const_int 1 [0x1])
        ]))

Which seems to be OK (?), but am not sure if this patch is in the
right direction,
and is also not efficient.
Could you please suggest how to proceed ?

Thanks,
Prathamesh

Comments

Prathamesh Kulkarni Dec. 4, 2023, 9:14 a.m. UTC | #1
On Thu, 23 Nov 2023 at 17:06, Prathamesh Kulkarni
<prathamesh.kulkarni@linaro.org> wrote:
>
> Hi Richard,
> For the test-case mentioned in PR111702, compiling with -O2
> -frounding-math -fstack-protector-all results in following ICE during
> cse2 pass:
>
> test.c: In function 'foo':
> test.c:119:1: internal compiler error: in insert_regs, at cse.cc:1120
>   119 | }
>       | ^
> 0xb7ebb0 insert_regs
>         ../../gcc/gcc/cse.cc:1120
> 0x1f95134 merge_equiv_classes
>         ../../gcc/gcc/cse.cc:1764
> 0x1f9b9ab cse_insn
>         ../../gcc/gcc/cse.cc:4793
> 0x1f9fe30 cse_extended_basic_block
>         ../../gcc/gcc/cse.cc:6577
> 0x1f9fe30 cse_main
>         ../../gcc/gcc/cse.cc:6722
> 0x1fa0984 rest_of_handle_cse2
>         ../../gcc/gcc/cse.cc:7620
> 0x1fa0984 execute
>         ../../gcc/gcc/cse.cc:7675
>
> This happens only with interleave+zip1 vector initialization with
> -frounding-math -fstack-protector-all, while it compiles OK without
> -fstack-protector-all. Also, it compiles OK with fallback sequence
> code-gen (with or without -fstack-protector-all). Unfortunately, I
> haven't been able to reduce the test-case further :/
>
> From the test-case, it seems only the vector initializer for type J
> uses interleave+zip1 approach, while rest of the vector initializers
> use fallback sequence.
>
> J is defined as:
> typedef _Float16 __attribute__((__vector_size__ (16))) J;
>
> and the initializer is:
> (J) { 11654, 4801, 5535, 9743, 61680}
>
> interleave+zip1 sequence for above initializer J:
> mode = V8HF
>
> vals: (parallel:V8HF [
>         (reg:HF 642)
>         (reg:HF 645)
>         (reg:HF 648)
>         (reg:HF 651)
>         (reg:HF 654)
>         (const_double:HF 0.0 [0x0.0p+0]) repeated x3
>     ])
>
> target: (reg:V8HF 641)
> seq:
> (insn 1058 0 1059 (set (reg:V4HF 657)
>         (const_vector:V4HF [
>                 (const_double:HF 0.0 [0x0.0p+0]) repeated x4
>             ])) "test.c":81:8 -1
>      (nil))
> (insn 1059 1058 1060 (set (reg:V4HF 657)
>         (vec_merge:V4HF (vec_duplicate:V4HF (reg:HF 642))
>             (reg:V4HF 657)
>             (const_int 1 [0x1]))) "test.c":81:8 -1
>      (nil))
> (insn 1060 1059 1061 (set (reg:V4HF 657)
>         (vec_merge:V4HF (vec_duplicate:V4HF (reg:HF 648))
>             (reg:V4HF 657)
>             (const_int 2 [0x2]))) "test.c":81:8 -1
>      (nil))
> (insn 1061 1060 1062 (set (reg:V4HF 657)
>         (vec_merge:V4HF (vec_duplicate:V4HF (reg:HF 654))
>             (reg:V4HF 657)
>             (const_int 4 [0x4]))) "test.c":81:8 -1
>      (nil))
> (insn 1062 1061 1063 (set (reg:V4HF 658)
>         (const_vector:V4HF [
>                 (const_double:HF 0.0 [0x0.0p+0]) repeated x4
>             ])) "test.c":81:8 -1
>      (nil))
> (insn 1063 1062 1064 (set (reg:V4HF 658)
>         (vec_merge:V4HF (vec_duplicate:V4HF (reg:HF 645))
>             (reg:V4HF 658)
>             (const_int 1 [0x1]))) "test.c":81:8 -1
>      (nil))
> (insn 1064 1063 1065 (set (reg:V4HF 658)
>         (vec_merge:V4HF (vec_duplicate:V4HF (reg:HF 651))
>             (reg:V4HF 658)
>             (const_int 2 [0x2]))) "test.c":81:8 -1
>      (nil))
> (insn 1065 1064 0 (set (reg:V8HF 641)
>         (unspec:V8HF [
>                 (subreg:V8HF (reg:V4HF 657) 0)
>                 (subreg:V8HF (reg:V4HF 658) 0)
>             ] UNSPEC_ZIP1)) "test.c":81:8 -1
>      (nil))
>
> It seems to me that the above sequence correctly initializes the
> vector into r641 ?
> insns 1058-1061 construct r657 = { r642, r648, r654, 0 }
> insns 1062-1064 construct r658 = { r645, r651, 0, 0 }
> and zip1 will create r641 = { r642, r645, r648, r651, r654, 0, 0, 0 }
>
> For the above test, it seems that with interleave+zip1 approach and
> -fstack-protector-all,
> in cse pass, there are two separate equivalence classes created for
> (const_int 1), that need
> to be merged in cse_insn:
>
>        if (elt->first_same_value != src_eqv_elt->first_same_value)
>             {
>               /* The REG_EQUAL is indicating that two formerly distinct
>                  classes are now equivalent.  So merge them.  */
>               merge_equiv_classes (elt, src_eqv_elt);
>
> elt equivalence chain:
> Equivalence chain for (subreg:QI (reg:V16QI 671) 0):
> (subreg:QI (reg:V16QI 671) 0)
> (const_int 1 [0x1])
>
> src_eqv_elt equivalence chain:
> Equivalence chain for (const_int 1 [0x1]):
> (reg:QI 34 v2)
> (reg:QI 32 v0)
> (reg:QI 34 v2)
> (const_int 1 [0x1])
> (vec_select:QI (reg:V16QI 671)
>     (parallel [
>             (const_int 1 [0x1])
>         ]))
> (vec_select:QI (reg:V16QI 32 v0)
>     (parallel [
>             (const_int 1 [0x1])
>         ]))
> (vec_select:QI (reg:V16QI 33 v1)
>     (parallel [
>             (const_int 2 [0x2])
>         ]))
> (vec_select:QI (reg:V16QI 33 v1)
>     (parallel [
>             (const_int 1 [0x1])
>         ]))
>
> The issue is that merge_equiv_classes doesn't seem to deal correctly with
> multiple occurences of same register in class2 (src_eqv_elt), which
> has two occurrences of
> (reg:QI 34 v2)
>
> In merge_equiv_classes, on first iteration, it will remove (reg:QI 34)
> from reg_equiv_table
> by calling delete_equiv_reg(34), and in insert_regs it will create an
> entry for (reg:QI 34) in qty_table with new quantity number, and
> create new equivalence in reg_eqv_table.
>
> When we again come across (reg:QI 34) in class2, it will
> unconditionally remove the register from reg_eqv_table, thus making
> REG_QTY(34) = -35, even tho (reg:QI 34) is now present in class1
> chain.
>
> Then in insert_regs, we have:
> x: (reg:QI 34 v2)
>
> classp:
> (subreg:QI (reg:V16QI 671) 0)
> (reg:QI 34 v2)
> (const_int 1 [0x1])
>
> And while iterating over elements in classp, we end up with regno ==
> c_regno == 34.
> However, as mentioned above, merge_equiv_classes has deleted entry for
> (reg:QI 34) from reg_eqv_table, so it's no longer valid, and thus end
> up hitting the following assert:
> gcc_assert (REGNO_QTY_VALID_P (c_regno));
>
> I am not sure tho why this is triggered only with interleave+zip1 approach with
> -fstack-protector-all.
>
> The attached (untested) patch is a workaround for the above issue --
> In merge_equiv_classes,
> while iterating over elements in class2, it simply checks if element
> is a reg, and already inserted in class1 with equivalent mode, and
> avoids deleting it from reg_eqv_table in that case.
>
> This avoids hitting the assert, and following is the result of merging
> above two classes:
> Equivalence chain for (subreg:QI (reg:V16QI 671) 0):
> (subreg:QI (reg:V16QI 671) 0)
> (reg:QI 34 v2)
> (reg:QI 32 v0)
> (reg:QI 34 v2)
> (const_int 1 [0x1])
> (const_int 1 [0x1])
> (vec_select:QI (reg:V16QI 671)
>     (parallel [
>             (const_int 1 [0x1])
>         ]))
> (vec_select:QI (reg:V16QI 33 v1)
>     (parallel [
>             (const_int 1 [0x1])
>         ]))
> (vec_select:QI (reg:V16QI 33 v1)
>     (parallel [
>             (const_int 2 [0x2])
>         ]))
> (vec_select:QI (reg:V16QI 32 v0)
>     (parallel [
>             (const_int 1 [0x1])
>         ]))
>
> Which seems to be OK (?), but am not sure if this patch is in the
> right direction,
> and is also not efficient.
> Could you please suggest how to proceed ?
Hi,
ping: https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637913.html

Thanks,
Prathamesh
>
> Thanks,
> Prathamesh
Prathamesh Kulkarni Dec. 20, 2023, 7:01 a.m. UTC | #2
On Mon, 4 Dec 2023 at 14:44, Prathamesh Kulkarni
<prathamesh.kulkarni@linaro.org> wrote:
>
> On Thu, 23 Nov 2023 at 17:06, Prathamesh Kulkarni
> <prathamesh.kulkarni@linaro.org> wrote:
> >
> > Hi Richard,
> > For the test-case mentioned in PR111702, compiling with -O2
> > -frounding-math -fstack-protector-all results in following ICE during
> > cse2 pass:
> >
> > test.c: In function 'foo':
> > test.c:119:1: internal compiler error: in insert_regs, at cse.cc:1120
> >   119 | }
> >       | ^
> > 0xb7ebb0 insert_regs
> >         ../../gcc/gcc/cse.cc:1120
> > 0x1f95134 merge_equiv_classes
> >         ../../gcc/gcc/cse.cc:1764
> > 0x1f9b9ab cse_insn
> >         ../../gcc/gcc/cse.cc:4793
> > 0x1f9fe30 cse_extended_basic_block
> >         ../../gcc/gcc/cse.cc:6577
> > 0x1f9fe30 cse_main
> >         ../../gcc/gcc/cse.cc:6722
> > 0x1fa0984 rest_of_handle_cse2
> >         ../../gcc/gcc/cse.cc:7620
> > 0x1fa0984 execute
> >         ../../gcc/gcc/cse.cc:7675
> >
> > This happens only with interleave+zip1 vector initialization with
> > -frounding-math -fstack-protector-all, while it compiles OK without
> > -fstack-protector-all. Also, it compiles OK with fallback sequence
> > code-gen (with or without -fstack-protector-all). Unfortunately, I
> > haven't been able to reduce the test-case further :/
> >
> > From the test-case, it seems only the vector initializer for type J
> > uses interleave+zip1 approach, while rest of the vector initializers
> > use fallback sequence.
> >
> > J is defined as:
> > typedef _Float16 __attribute__((__vector_size__ (16))) J;
> >
> > and the initializer is:
> > (J) { 11654, 4801, 5535, 9743, 61680}
> >
> > interleave+zip1 sequence for above initializer J:
> > mode = V8HF
> >
> > vals: (parallel:V8HF [
> >         (reg:HF 642)
> >         (reg:HF 645)
> >         (reg:HF 648)
> >         (reg:HF 651)
> >         (reg:HF 654)
> >         (const_double:HF 0.0 [0x0.0p+0]) repeated x3
> >     ])
> >
> > target: (reg:V8HF 641)
> > seq:
> > (insn 1058 0 1059 (set (reg:V4HF 657)
> >         (const_vector:V4HF [
> >                 (const_double:HF 0.0 [0x0.0p+0]) repeated x4
> >             ])) "test.c":81:8 -1
> >      (nil))
> > (insn 1059 1058 1060 (set (reg:V4HF 657)
> >         (vec_merge:V4HF (vec_duplicate:V4HF (reg:HF 642))
> >             (reg:V4HF 657)
> >             (const_int 1 [0x1]))) "test.c":81:8 -1
> >      (nil))
> > (insn 1060 1059 1061 (set (reg:V4HF 657)
> >         (vec_merge:V4HF (vec_duplicate:V4HF (reg:HF 648))
> >             (reg:V4HF 657)
> >             (const_int 2 [0x2]))) "test.c":81:8 -1
> >      (nil))
> > (insn 1061 1060 1062 (set (reg:V4HF 657)
> >         (vec_merge:V4HF (vec_duplicate:V4HF (reg:HF 654))
> >             (reg:V4HF 657)
> >             (const_int 4 [0x4]))) "test.c":81:8 -1
> >      (nil))
> > (insn 1062 1061 1063 (set (reg:V4HF 658)
> >         (const_vector:V4HF [
> >                 (const_double:HF 0.0 [0x0.0p+0]) repeated x4
> >             ])) "test.c":81:8 -1
> >      (nil))
> > (insn 1063 1062 1064 (set (reg:V4HF 658)
> >         (vec_merge:V4HF (vec_duplicate:V4HF (reg:HF 645))
> >             (reg:V4HF 658)
> >             (const_int 1 [0x1]))) "test.c":81:8 -1
> >      (nil))
> > (insn 1064 1063 1065 (set (reg:V4HF 658)
> >         (vec_merge:V4HF (vec_duplicate:V4HF (reg:HF 651))
> >             (reg:V4HF 658)
> >             (const_int 2 [0x2]))) "test.c":81:8 -1
> >      (nil))
> > (insn 1065 1064 0 (set (reg:V8HF 641)
> >         (unspec:V8HF [
> >                 (subreg:V8HF (reg:V4HF 657) 0)
> >                 (subreg:V8HF (reg:V4HF 658) 0)
> >             ] UNSPEC_ZIP1)) "test.c":81:8 -1
> >      (nil))
> >
> > It seems to me that the above sequence correctly initializes the
> > vector into r641 ?
> > insns 1058-1061 construct r657 = { r642, r648, r654, 0 }
> > insns 1062-1064 construct r658 = { r645, r651, 0, 0 }
> > and zip1 will create r641 = { r642, r645, r648, r651, r654, 0, 0, 0 }
> >
> > For the above test, it seems that with interleave+zip1 approach and
> > -fstack-protector-all,
> > in cse pass, there are two separate equivalence classes created for
> > (const_int 1), that need
> > to be merged in cse_insn:
> >
> >        if (elt->first_same_value != src_eqv_elt->first_same_value)
> >             {
> >               /* The REG_EQUAL is indicating that two formerly distinct
> >                  classes are now equivalent.  So merge them.  */
> >               merge_equiv_classes (elt, src_eqv_elt);
> >
> > elt equivalence chain:
> > Equivalence chain for (subreg:QI (reg:V16QI 671) 0):
> > (subreg:QI (reg:V16QI 671) 0)
> > (const_int 1 [0x1])
> >
> > src_eqv_elt equivalence chain:
> > Equivalence chain for (const_int 1 [0x1]):
> > (reg:QI 34 v2)
> > (reg:QI 32 v0)
> > (reg:QI 34 v2)
> > (const_int 1 [0x1])
> > (vec_select:QI (reg:V16QI 671)
> >     (parallel [
> >             (const_int 1 [0x1])
> >         ]))
> > (vec_select:QI (reg:V16QI 32 v0)
> >     (parallel [
> >             (const_int 1 [0x1])
> >         ]))
> > (vec_select:QI (reg:V16QI 33 v1)
> >     (parallel [
> >             (const_int 2 [0x2])
> >         ]))
> > (vec_select:QI (reg:V16QI 33 v1)
> >     (parallel [
> >             (const_int 1 [0x1])
> >         ]))
> >
> > The issue is that merge_equiv_classes doesn't seem to deal correctly with
> > multiple occurences of same register in class2 (src_eqv_elt), which
> > has two occurrences of
> > (reg:QI 34 v2)
> >
> > In merge_equiv_classes, on first iteration, it will remove (reg:QI 34)
> > from reg_equiv_table
> > by calling delete_equiv_reg(34), and in insert_regs it will create an
> > entry for (reg:QI 34) in qty_table with new quantity number, and
> > create new equivalence in reg_eqv_table.
> >
> > When we again come across (reg:QI 34) in class2, it will
> > unconditionally remove the register from reg_eqv_table, thus making
> > REG_QTY(34) = -35, even tho (reg:QI 34) is now present in class1
> > chain.
> >
> > Then in insert_regs, we have:
> > x: (reg:QI 34 v2)
> >
> > classp:
> > (subreg:QI (reg:V16QI 671) 0)
> > (reg:QI 34 v2)
> > (const_int 1 [0x1])
> >
> > And while iterating over elements in classp, we end up with regno ==
> > c_regno == 34.
> > However, as mentioned above, merge_equiv_classes has deleted entry for
> > (reg:QI 34) from reg_eqv_table, so it's no longer valid, and thus end
> > up hitting the following assert:
> > gcc_assert (REGNO_QTY_VALID_P (c_regno));
> >
> > I am not sure tho why this is triggered only with interleave+zip1 approach with
> > -fstack-protector-all.
> >
> > The attached (untested) patch is a workaround for the above issue --
> > In merge_equiv_classes,
> > while iterating over elements in class2, it simply checks if element
> > is a reg, and already inserted in class1 with equivalent mode, and
> > avoids deleting it from reg_eqv_table in that case.
> >
> > This avoids hitting the assert, and following is the result of merging
> > above two classes:
> > Equivalence chain for (subreg:QI (reg:V16QI 671) 0):
> > (subreg:QI (reg:V16QI 671) 0)
> > (reg:QI 34 v2)
> > (reg:QI 32 v0)
> > (reg:QI 34 v2)
> > (const_int 1 [0x1])
> > (const_int 1 [0x1])
> > (vec_select:QI (reg:V16QI 671)
> >     (parallel [
> >             (const_int 1 [0x1])
> >         ]))
> > (vec_select:QI (reg:V16QI 33 v1)
> >     (parallel [
> >             (const_int 1 [0x1])
> >         ]))
> > (vec_select:QI (reg:V16QI 33 v1)
> >     (parallel [
> >             (const_int 2 [0x2])
> >         ]))
> > (vec_select:QI (reg:V16QI 32 v0)
> >     (parallel [
> >             (const_int 1 [0x1])
> >         ]))
> >
> > Which seems to be OK (?), but am not sure if this patch is in the
> > right direction,
> > and is also not efficient.
> > Could you please suggest how to proceed ?
> Hi,
> ping: https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637913.html
Hi,
ping * 2: https://gcc.gnu.org/pipermail/gcc-patches/2023-November/637913.html

Thanks,
Prathamesh
>
> Thanks,
> Prathamesh
> >
> > Thanks,
> > Prathamesh
Richard Sandiford Dec. 20, 2023, 6:29 p.m. UTC | #3
Prathamesh Kulkarni <prathamesh.kulkarni@linaro.org> writes:
> Hi Richard,
> For the test-case mentioned in PR111702, compiling with -O2
> -frounding-math -fstack-protector-all results in following ICE during
> cse2 pass:
>
> test.c: In function 'foo':
> test.c:119:1: internal compiler error: in insert_regs, at cse.cc:1120
>   119 | }
>       | ^
> 0xb7ebb0 insert_regs
>         ../../gcc/gcc/cse.cc:1120
> 0x1f95134 merge_equiv_classes
>         ../../gcc/gcc/cse.cc:1764
> 0x1f9b9ab cse_insn
>         ../../gcc/gcc/cse.cc:4793
> 0x1f9fe30 cse_extended_basic_block
>         ../../gcc/gcc/cse.cc:6577
> 0x1f9fe30 cse_main
>         ../../gcc/gcc/cse.cc:6722
> 0x1fa0984 rest_of_handle_cse2
>         ../../gcc/gcc/cse.cc:7620
> 0x1fa0984 execute
>         ../../gcc/gcc/cse.cc:7675
>
> This happens only with interleave+zip1 vector initialization with
> -frounding-math -fstack-protector-all, while it compiles OK without
> -fstack-protector-all. Also, it compiles OK with fallback sequence
> code-gen (with or without -fstack-protector-all). Unfortunately, I
> haven't been able to reduce the test-case further :/
>
> From the test-case, it seems only the vector initializer for type J
> uses interleave+zip1 approach, while rest of the vector initializers
> use fallback sequence.
>
> J is defined as:
> typedef _Float16 __attribute__((__vector_size__ (16))) J;
>
> and the initializer is:
> (J) { 11654, 4801, 5535, 9743, 61680}
>
> interleave+zip1 sequence for above initializer J:
> mode = V8HF
>
> vals: (parallel:V8HF [
>         (reg:HF 642)
>         (reg:HF 645)
>         (reg:HF 648)
>         (reg:HF 651)
>         (reg:HF 654)
>         (const_double:HF 0.0 [0x0.0p+0]) repeated x3
>     ])
>
> target: (reg:V8HF 641)
> seq:
> (insn 1058 0 1059 (set (reg:V4HF 657)
>         (const_vector:V4HF [
>                 (const_double:HF 0.0 [0x0.0p+0]) repeated x4
>             ])) "test.c":81:8 -1
>      (nil))
> (insn 1059 1058 1060 (set (reg:V4HF 657)
>         (vec_merge:V4HF (vec_duplicate:V4HF (reg:HF 642))
>             (reg:V4HF 657)
>             (const_int 1 [0x1]))) "test.c":81:8 -1
>      (nil))
> (insn 1060 1059 1061 (set (reg:V4HF 657)
>         (vec_merge:V4HF (vec_duplicate:V4HF (reg:HF 648))
>             (reg:V4HF 657)
>             (const_int 2 [0x2]))) "test.c":81:8 -1
>      (nil))
> (insn 1061 1060 1062 (set (reg:V4HF 657)
>         (vec_merge:V4HF (vec_duplicate:V4HF (reg:HF 654))
>             (reg:V4HF 657)
>             (const_int 4 [0x4]))) "test.c":81:8 -1
>      (nil))
> (insn 1062 1061 1063 (set (reg:V4HF 658)
>         (const_vector:V4HF [
>                 (const_double:HF 0.0 [0x0.0p+0]) repeated x4
>             ])) "test.c":81:8 -1
>      (nil))
> (insn 1063 1062 1064 (set (reg:V4HF 658)
>         (vec_merge:V4HF (vec_duplicate:V4HF (reg:HF 645))
>             (reg:V4HF 658)
>             (const_int 1 [0x1]))) "test.c":81:8 -1
>      (nil))
> (insn 1064 1063 1065 (set (reg:V4HF 658)
>         (vec_merge:V4HF (vec_duplicate:V4HF (reg:HF 651))
>             (reg:V4HF 658)
>             (const_int 2 [0x2]))) "test.c":81:8 -1
>      (nil))
> (insn 1065 1064 0 (set (reg:V8HF 641)
>         (unspec:V8HF [
>                 (subreg:V8HF (reg:V4HF 657) 0)
>                 (subreg:V8HF (reg:V4HF 658) 0)
>             ] UNSPEC_ZIP1)) "test.c":81:8 -1
>      (nil))
>
> It seems to me that the above sequence correctly initializes the
> vector into r641 ?
> insns 1058-1061 construct r657 = { r642, r648, r654, 0 }
> insns 1062-1064 construct r658 = { r645, r651, 0, 0 }
> and zip1 will create r641 = { r642, r645, r648, r651, r654, 0, 0, 0 }
>
> For the above test, it seems that with interleave+zip1 approach and
> -fstack-protector-all,
> in cse pass, there are two separate equivalence classes created for
> (const_int 1), that need
> to be merged in cse_insn:
>
>        if (elt->first_same_value != src_eqv_elt->first_same_value)
>             {
>               /* The REG_EQUAL is indicating that two formerly distinct
>                  classes are now equivalent.  So merge them.  */
>               merge_equiv_classes (elt, src_eqv_elt);
>
> elt equivalence chain:
> Equivalence chain for (subreg:QI (reg:V16QI 671) 0):
> (subreg:QI (reg:V16QI 671) 0)
> (const_int 1 [0x1])
>
> src_eqv_elt equivalence chain:
> Equivalence chain for (const_int 1 [0x1]):
> (reg:QI 34 v2)
> (reg:QI 32 v0)
> (reg:QI 34 v2)
> (const_int 1 [0x1])
> (vec_select:QI (reg:V16QI 671)
>     (parallel [
>             (const_int 1 [0x1])
>         ]))
> (vec_select:QI (reg:V16QI 32 v0)
>     (parallel [
>             (const_int 1 [0x1])
>         ]))
> (vec_select:QI (reg:V16QI 33 v1)
>     (parallel [
>             (const_int 2 [0x2])
>         ]))
> (vec_select:QI (reg:V16QI 33 v1)
>     (parallel [
>             (const_int 1 [0x1])
>         ]))
>
> The issue is that merge_equiv_classes doesn't seem to deal correctly with
> multiple occurences of same register in class2 (src_eqv_elt), which
> has two occurrences of
> (reg:QI 34 v2)

I don't think that's supposed to occur.

> In merge_equiv_classes, on first iteration, it will remove (reg:QI 34)
> from reg_equiv_table
> by calling delete_equiv_reg(34), and in insert_regs it will create an
> entry for (reg:QI 34) in qty_table with new quantity number, and
> create new equivalence in reg_eqv_table.
>
> When we again come across (reg:QI 34) in class2, it will
> unconditionally remove the register from reg_eqv_table, thus making
> REG_QTY(34) = -35, even tho (reg:QI 34) is now present in class1
> chain.
>
> Then in insert_regs, we have:
> x: (reg:QI 34 v2)
>
> classp:
> (subreg:QI (reg:V16QI 671) 0)
> (reg:QI 34 v2)
> (const_int 1 [0x1])
>
> And while iterating over elements in classp, we end up with regno ==
> c_regno == 34.
> However, as mentioned above, merge_equiv_classes has deleted entry for
> (reg:QI 34) from reg_eqv_table, so it's no longer valid, and thus end
> up hitting the following assert:
> gcc_assert (REGNO_QTY_VALID_P (c_regno));
>
> I am not sure tho why this is triggered only with interleave+zip1 approach with
> -fstack-protector-all.
>
> The attached (untested) patch is a workaround for the above issue --
> In merge_equiv_classes,
> while iterating over elements in class2, it simply checks if element
> is a reg, and already inserted in class1 with equivalent mode, and
> avoids deleting it from reg_eqv_table in that case.
>
> This avoids hitting the assert, and following is the result of merging
> above two classes:
> Equivalence chain for (subreg:QI (reg:V16QI 671) 0):
> (subreg:QI (reg:V16QI 671) 0)
> (reg:QI 34 v2)
> (reg:QI 32 v0)
> (reg:QI 34 v2)
> (const_int 1 [0x1])
> (const_int 1 [0x1])
> (vec_select:QI (reg:V16QI 671)
>     (parallel [
>             (const_int 1 [0x1])
>         ]))
> (vec_select:QI (reg:V16QI 33 v1)
>     (parallel [
>             (const_int 1 [0x1])
>         ]))
> (vec_select:QI (reg:V16QI 33 v1)
>     (parallel [
>             (const_int 2 [0x2])
>         ]))
> (vec_select:QI (reg:V16QI 32 v0)
>     (parallel [
>             (const_int 1 [0x1])
>         ]))
>
> Which seems to be OK (?), but am not sure if this patch is in the
> right direction,
> and is also not efficient.
> Could you please suggest how to proceed ?

I think things went wrong when we added the same value twice.
Debugging how that happened suggested a further (and IMO more fundemantal)
problem with the handling of CONST_VECTORs.  I'll post a patch for that
in a sec.

Thanks,
Richard
diff mbox series

Patch

diff --git a/gcc/cse.cc b/gcc/cse.cc
index f9603fdfd43..1e20be457c4 100644
--- a/gcc/cse.cc
+++ b/gcc/cse.cc
@@ -1747,7 +1747,16 @@  merge_equiv_classes (struct table_elt *class1, struct table_elt *class2)
 	  if (REG_P (exp))
 	    {
 	      need_rehash = REGNO_QTY_VALID_P (REGNO (exp));
-	      delete_reg_equiv (REGNO (exp));
+
+	      /* If reg is already inserted into class1 and has a valid new
+		 quantity, avoid deleting it from reg_eqv_table.  */
+	      table_elt *e;
+	      for (e = class1->first_same_value; e; e = e->next_same_value)
+		if (REG_P (e->exp) && REGNO (e->exp) == REGNO (exp)
+		    && e->mode == mode)
+		  break;
+	      if (e == NULL)
+		delete_reg_equiv (REGNO (exp));
 	    }
 
 	  if (REG_P (exp) && REGNO (exp) >= FIRST_PSEUDO_REGISTER)