[PPC] Fix ICE using power9 with soft-float

Message ID 0065c16a-d7c3-1237-9deb-73400a9cd0c8@codesourcery.com
State New
Headers show

Commit Message

Andrew Stubbs Nov. 14, 2016, 4:57 p.m.
The testcase powerpc/fusion3.c causes an ICE when compiled with 
-msoft-float.

The key line in the testcase looks fairly harmless:

    void fusion_float_write (float *p, float f){ p[LARGE] = f; }

The error message look like this:

.../gcc.target/powerpc/fusion3.c: In function 'fusion_float_write':^M
.../gcc.target/powerpc/fusion3.c:12:1: error: unrecognizable insn:^M
(insn 18 4 14 2 (parallel [^M
             (set (mem:SF (plus:SI (plus:SI (reg:SI 3 3 [ p ])^M
                             (const_int 327680 [0x50000]))^M
                         (const_int -29420 [0xffffffffffff8d14])) [1 
MEM[(float *)p_1(D) + 298260B]+0 S4 A32])^M
                 (unspec:SF [^M
                         (reg:SF 4 4 [ f ])^M
                     ] UNSPEC_FUSION_P9))^M
             (clobber (reg/f:SI 3 3 [157]))^M
         ]) 
"/scratch/astubbs/fsf/src/gcc-mainline/gcc/testsuite/gcc.target/powerpc/fusion3.c":12 
-1^M
      (nil))^M

Basically, the problem is that the peephole optimization tries to create 
a Power9 Fusion instruction, but those do not support SF values in 
integer registers (AFAICT).

So, presumably, I need to adjust either the predicate or the condition 
of the peephole rules.

The predicate used is "toc_fusion_or_p9_reg_operand", and this might be 
the root cause, but I don't know the architecture well enough to be 
sure. The predicate code seems to suggest that "toc_fusion", whatever 
that is, should be able to do this, but the insn produced by the 
peephole uses only UNSPEC_FUSION_P9, which does not. Perhaps this 
predicate is inappropriate for the P9 Fusion peephole, or perhaps it 
needs to be taught about this corner case?

In any case, I don't want to change the predicate without being sure 
what it does (here and elsewhere), so the attached patch solves the 
problem by changing the condition.

Is this OK, or do I need to do something less blunt?

Thanks

Andrew

Comments

Segher Boessenkool Nov. 15, 2016, 12:29 p.m. | #1
Hi Andrew,

Thanks for the patch and looking into this.

On Mon, Nov 14, 2016 at 04:57:58PM +0000, Andrew Stubbs wrote:
> The testcase powerpc/fusion3.c causes an ICE when compiled with 

> -msoft-float.


> Basically, the problem is that the peephole optimization tries to create 

> a Power9 Fusion instruction, but those do not support SF values in 

> integer registers (AFAICT).


The peepholes do not support it, or maybe the define_insns do not either.
The machine of course will not care.

> So, presumably, I need to adjust either the predicate or the condition 

> of the peephole rules.


Yes.

> The predicate used is "toc_fusion_or_p9_reg_operand", and this might be 

> the root cause, but I don't know the architecture well enough to be 

> sure.


This fusion is quite simple really.  Offset addressing insns can only
access a 16-bit range, but instead of writing e.g.

	lwz 0,0x12345678(3)

you can do

	addis 4,3,0x1234
	lwz 0,0x5678(4)

and the processor will execute it faster, essentially like if it was
written as the first example.  See 2.1.1 in Power ISA 3.0.

> The predicate code seems to suggest that "toc_fusion", whatever 

> that is, should be able to do this, but the insn produced by the 

> peephole uses only UNSPEC_FUSION_P9, which does not. Perhaps this 

> predicate is inappropriate for the P9 Fusion peephole, or perhaps it 

> needs to be taught about this corner case?


One of those yes.

> In any case, I don't want to change the predicate without being sure 

> what it does (here and elsewhere), so the attached patch solves the 

> problem by changing the condition.

> 

> Is this OK, or do I need to do something less blunt?


We can have floats in GPRs even without TARGET_SOFT_FLOAT, so this
does not even work?  And yes this is blunt :-)


Segher
Andrew Stubbs Nov. 15, 2016, 4:52 p.m. | #2
On 15/11/16 12:29, Segher Boessenkool wrote:
> The peepholes do not support it, or maybe the define_insns do not either.

> The machine of course will not care.


Oh, OK, so probably the bug is not in the peephole at all, but in the 
define_insn, or lack thereof.

More investigation required.

Thanks

Andrew
Michael Meissner Nov. 15, 2016, 9:06 p.m. | #3
On Mon, Nov 14, 2016 at 04:57:58PM +0000, Andrew Stubbs wrote:
> The testcase powerpc/fusion3.c causes an ICE when compiled with

> -msoft-float.

> 

> The key line in the testcase looks fairly harmless:

> 

>    void fusion_float_write (float *p, float f){ p[LARGE] = f; }


LARGE is large enough that it won't fit in the offset field of a single
instruction.

> The error message look like this:

> 

> .../gcc.target/powerpc/fusion3.c: In function 'fusion_float_write':

> .../gcc.target/powerpc/fusion3.c:12:1: error: unrecognizable insn:

> (insn 18 4 14 2 (parallel [

>             (set (mem:SF (plus:SI (plus:SI (reg:SI 3 3 [ p ])

>                             (const_int 327680 [0x50000]))

>                         (const_int -29420 [0xffffffffffff8d14])) [1

> MEM[(float *)p_1(D) + 298260B]+0 S4 A32])

>                 (unspec:SF [

>                         (reg:SF 4 4 [ f ])

>                     ] UNSPEC_FUSION_P9))

>             (clobber (reg/f:SI 3 3 [157]))

>         ]) "/scratch/astubbs/fsf/src/gcc-mainline/gcc/testsuite/gcc.target/powerpc/fusion3.c":12

> -1

>      (nil))


When I wrote the basic fusion stuff, I was assuming nobody would do
-msoft-float -mcpu=power7.  By the time the code had been written, the
soft-float libraries were no longer being built on the 64-bit Linux systems,
due to the Linux distributions dropping support for them.

However, while we can make this particular failure go away by making
powerpc_p9vector_ok (and probably some of the other targets needing VSX
features) false if -msoft-float it is still a problem, since SFmode can go in
GPRs.

This is the same basic failure (PR 78101) that I saw in building the next
generation of the Spec benchmark suite, except that it is a DFmode instead of
SFmode.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78101

Both are trying to store a value from a GPR


> Basically, the problem is that the peephole optimization tries to

> create a Power9 Fusion instruction, but those do not support SF

> values in integer registers (AFAICT).

> 

> So, presumably, I need to adjust either the predicate or the

> condition of the peephole rules.


Now, that I have a little time, I can look into this, to at least make
predicate and peepholes match.  There is some other stuff (support for the new
load/store that were added to the compiler after that we should also tackle).

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Andrew Stubbs Nov. 15, 2016, 9:11 p.m. | #4
On 15/11/16 21:06, Michael Meissner wrote:
> Now, that I have a little time, I can look into this, to at least make

> predicate and peepholes match.  There is some other stuff (support for the new

> load/store that were added to the compiler after that we should also tackle).


I've been investigating this today, and I've found that the insn does 
not match because the "fusion_addis_mem_combo_store" predicate requires 
TARGET_SF_FPR is true, which in turn requires TARGET_HARD_FLOAT is true.

So basically the fusion stuff is disabled in soft-float mode regardless 
of where the value is stored.

Anyway, I'm at end-of-day now, so let me know if you come up with anything.

Thanks

Andrew
Michael Meissner Nov. 16, 2016, 1:10 p.m. | #5
On Tue, Nov 15, 2016 at 09:11:56PM +0000, Andrew Stubbs wrote:
> On 15/11/16 21:06, Michael Meissner wrote:

> >Now, that I have a little time, I can look into this, to at least make

> >predicate and peepholes match.  There is some other stuff (support for the new

> >load/store that were added to the compiler after that we should also tackle).

> 

> I've been investigating this today, and I've found that the insn

> does not match because the "fusion_addis_mem_combo_store" predicate

> requires TARGET_SF_FPR is true, which in turn requires

> TARGET_HARD_FLOAT is true.

> 

> So basically the fusion stuff is disabled in soft-float mode

> regardless of where the value is stored.

> 

> Anyway, I'm at end-of-day now, so let me know if you come up with anything.


Yeah, SFmode and DFmode should not have the TARGET_{S,D}F_FPR checks.

But a secondary problem is the early clobber in the match_scratch.

Before the peephole2 we have:

(insn 7 4 8 2 (set (reg/f:DI 3 3 [157])
        (plus:DI (reg:DI 3 3 [ p ])
            (const_int 327680 [0x50000]))) "fusion3.c":12 75 {*adddi3}
     (nil))

(insn 8 7 14 2 (set (mem:SF (plus:DI (reg/f:DI 3 3 [157])
                (const_int -29420 [0xffffffffffff8d14])) [1 MEM[(float *)p_1(D) + 298260B]+0 S4 A32])
        (reg:SF 4 4 [ f ])) "fusion3.c":12 484 {*movsf_softfloat}
     (expr_list:REG_DEAD (reg:SF 4 4 [ f ])
        (expr_list:REG_DEAD (reg/f:DI 3 3 [157])
            (nil))))

This gets combined into:

(insn 17 4 14 2 (parallel [
            (set (mem:SF (plus:DI (plus:DI (reg:DI 3 3 [ p ])
                            (const_int 327680 [0x50000]))
                        (const_int -29420 [0xffffffffffff8d14])) [1 MEM[(float *)p_1(D) + 298260B]+0 S4 A32])
                (unspec:SF [
                        (reg:SF 4 4 [ f ])
                    ] UNSPEC_FUSION_P9))
            (clobber (reg/f:DI 3 3 [157]))
        ]) "fusion3.c":12 -1
     (nil))

where (reg:DI 3) is the base register temporary.  Since it was used as part of
the original address, the early clobber check in constrain_operands will flag
it.  This is what is causing PR 78101.  Thus there are two issues here.

In the scope of the ISA 3.0/power9 work, there is also the issue that the new
d-form load and stores should be folded into this as well.  This is due to me
doing the original power9/ISA 3.0 fusion first, and then putting it on the
shelf, and then months later working on the new address forms, and never
getting back to fusion.

As it is currently written, there was an expectation that the fusion stuff
would eventually move from being a peephole2 to being part of the addressing
handled before register allocation.  Then the secondary reload hook would call
the appropriate fusion helper function, which is why we have int_ and gpr_
fusion insn functions, each based on a separate register class.  The int_
fusion form allows SFmode/DFmode, but as you point out the combo recognizer
doesn't allow them in GPRs.

I'm not as sure that we will have the will to move fused addresses early before
register allocation, and whether we need to keep around the separate int_ and
fpr_ forms.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Andrew Stubbs Nov. 16, 2016, 4:15 p.m. | #6
On 16/11/16 13:10, Michael Meissner wrote:
> Yeah, SFmode and DFmode should not have the TARGET_{S,D}F_FPR checks.


So, I can safely resolve my initial problem by simply removing them? And 
that wouldn't break the other use of that predicate?

> But a secondary problem is the early clobber in the match_scratch.


So, the FPR_FUSION insn works because operands 1 and 2 cannot conflict, 
which means the early-clobber is not necessary, but the GPR_FUSION insn 
cannot work because there's no way to ensure that operands 1 and 2 don't 
conflict without also specifying that operands 0 and 2 don't conflict, 
which they commonly do.

We could fix it, for now, by adding new patterns that fit both cases 
(given that the register numbers are known at peephole time).

Or, we could disable the peephole in the case where this would occur (as 
my original patch does, albeit bluntly).

Or, something else?

Andrew
Andrew Stubbs Nov. 23, 2016, 11:52 a.m. | #7
On 16/11/16 17:05, Michael Meissner wrote:
> I'm starting to test this patch right now (it's on LE power8 stage3 right now,

> and I need to build BE power8 and BE power7 versions when I get into the office

> shortly, and build spec 2017 with it for PR 78101):


Did the testing go OK?

Andrew
Michael Meissner Nov. 23, 2016, 8:23 p.m. | #8
On Wed, Nov 23, 2016 at 11:52:01AM +0000, Andrew Stubbs wrote:
> On 16/11/16 17:05, Michael Meissner wrote:

> >I'm starting to test this patch right now (it's on LE power8 stage3 right now,

> >and I need to build BE power8 and BE power7 versions when I get into the office

> >shortly, and build spec 2017 with it for PR 78101):

> 

> Did the testing go OK?


Yes, the testing worked fine, and I checked in the fix.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797

Patch

2016-11-11  Andrew Stubbs  <ams@codesourcery.com>

	gcc/
	* config/rs6000/rs6000.md: Disable P9-fusion peepholes when
	TARGET_SOFT_FLOAT is set.

	gcc/testsuite/
	* gcc.target/powerpc/fusion3.c: Skip on -msoft-float.

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index de959c9..28d8174 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -13024,7 +13024,8 @@ 
    (set (match_operand:SFDF 2 "toc_fusion_or_p9_reg_operand" "")
 	(match_operand:SFDF 3 "fusion_offsettable_mem_operand" ""))]
   "TARGET_P9_FUSION && peep2_reg_dead_p (2, operands[0])
-   && fusion_p9_p (operands[0], operands[1], operands[2], operands[3])"
+   && fusion_p9_p (operands[0], operands[1], operands[2], operands[3])
+   && !TARGET_SOFT_FLOAT"
   [(const_int 0)]
 {
   expand_fusion_p9_load (operands);
@@ -13037,7 +13038,8 @@ 
    (set (match_operand:SFDF 2 "offsettable_mem_operand" "")
 	(match_operand:SFDF 3 "toc_fusion_or_p9_reg_operand" ""))]
   "TARGET_P9_FUSION && peep2_reg_dead_p (2, operands[0])
-   && fusion_p9_p (operands[0], operands[1], operands[2], operands[3])"
+   && fusion_p9_p (operands[0], operands[1], operands[2], operands[3])
+   && !TARGET_SOFT_FLOAT"
   [(const_int 0)]
 {
   expand_fusion_p9_store (operands);
@@ -13050,7 +13052,8 @@ 
    (set (match_dup 0)
 	(ior:SDI (match_dup 0)
 		 (match_operand:SDI 2 "u_short_cint_operand" "")))]
-  "TARGET_P9_FUSION"
+  "TARGET_P9_FUSION
+   && !TARGET_SOFT_FLOAT"
   [(set (match_dup 0)
 	(unspec:SDI [(match_dup 1)
 		     (match_dup 2)] UNSPEC_FUSION_P9))])
@@ -13063,7 +13066,8 @@ 
 		 (match_operand:SDI 3 "u_short_cint_operand" "")))]
   "TARGET_P9_FUSION
    && !rtx_equal_p (operands[0], operands[2])
-   && peep2_reg_dead_p (2, operands[0])"
+   && peep2_reg_dead_p (2, operands[0])
+   && !TARGET_SOFT_FLOAT"
   [(set (match_dup 2)
 	(unspec:SDI [(match_dup 1)
 		     (match_dup 3)] UNSPEC_FUSION_P9))])
diff --git a/gcc/testsuite/gcc.target/powerpc/fusion3.c b/gcc/testsuite/gcc.target/powerpc/fusion3.c
index 8eca640..20992d0 100644
--- a/gcc/testsuite/gcc.target/powerpc/fusion3.c
+++ b/gcc/testsuite/gcc.target/powerpc/fusion3.c
@@ -2,6 +2,7 @@ 
 /* { dg-skip-if "" { powerpc*-*-darwin* } { "*" } { "" } } */
 /* { dg-require-effective-target powerpc_p9vector_ok } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power7" } } */
+/* { dg-skip-if "-mpower9-fusion and -msoft-float are incompatible" { powerpc*-*-* } { "-msoft-float" } {} } */
 /* { dg-options "-mcpu=power7 -mtune=power9 -O3" } */
 
 #define LARGE 0x12345