
[RFC,ARM]: Fix reload spill failure (PR 60617)

Message ID CAJK_mQ2UXGt+zrJP_NLbn4V3rE0cK+3kaC0FnWXTT=3xidC1HA@mail.gmail.com
State New

Commit Message

Venkataramanan Kumar June 16, 2014, 12:53 p.m. UTC
Hi Maintainers,

This patch fixes PR 60617, which occurs when we turn on the reload
pass in Thumb-2 mode.

It occurs for the pattern "*ior_scc_scc", which gets generated for the
third argument of the function call below.

JIT:emitStoreInt32(dst,regT0m, (op1 == dst || op2 == dst)));


(----snip---)
(insn 634 633 635 27 (parallel [
            (set (reg:SI 3 r3)
                (ior:SI (eq:SI (reg/v:SI 110 [ dst ]) <== r5 gets assigned to this operand
                        (reg/v:SI 112 [ op2 ]))
                    (eq:SI (reg/v:SI 110 [ dst ]) <== This operand
                        (reg/v:SI 111 [ op1 ]))))
            (clobber (reg:CC 100 cc))
        ]) ../Source/JavaScriptCore/jit/JITArithmetic32_64.cpp:179 300 {*ior_scc_scc
(----snip---)

The issue here is that the above pattern demands 5 registers (LO_REGS).

But when we are in reload, register r0 is used for the pointer to the
class, r1 and r2 for the first and second arguments, and r7 is used
as the frame pointer.

So we are left with r3, r4, r5, and r6, but the above pattern needs
five LO_REGS. Hence we get a spill failure when processing the last
register operand in that pattern.
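
For reference, here is a minimal C reduction of the kind of expression
involved (a hypothetical example, not the original JavaScriptCore
code):

/* Hypothetical reduction: an inclusive-or of two comparisons
   materialized into a register expands to the *ior_scc_scc parallel
   above, whose five register operands (the result plus the four
   comparison inputs) all want LO_REGS under Thumb-2.  */
int
ior_scc_scc_example (int x, int y, int z)
{
  return x == y || x == z;
}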

In the ARM port, TARGET_CLASS_LIKELY_SPILLED_P is defined for
Thumb-1, and for Thumb-2 mode the use of LO_REGS is mentioned in the
comment below.

"Care should be taken to avoid adding thumb-2 patterns that require
many low registers"

So the conservative fix is to not allow this pattern for Thumb-2 mode.

I allowed these patterns for Thumb-2 when we have constant operands
for the comparison. That makes the target tests
arm/thumb2-cond-cmp-1.c to thumb2-cond-cmp-4.c pass.
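
To illustrate, a comparison against immediates (a hypothetical test of
mine, not one of the testsuite files above) is what the
constant-operand form still covers:

/* Both comparisons have an immediate second operand
   (arm_addimm_operand), so with the patch they should still match
   the new Thumb-2 *_imm patterns below and need fewer low
   registers.  */
int
ior_scc_imm_example (int x, int y)
{
  return x == 3 || y >= 7;
}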

Regression tested with the gcc 4.9 branch, since on trunk this bug is
masked by revision 209897.

Please provide your suggestions on this patch.

regards,
Venkat.

Comments

Ajit Kumar Agarwal June 16, 2014, 3:50 p.m. UTC | #1
You can assign the same register to more than one operand based on liveness. It becomes tricky if, based on liveness, no available register is found. In that case you need to spill one of the operands and reuse the register assigned to another operand. This forms the basis of spilling one of the values so that the same register can be assigned to another operand.

This pattern is very specific to LO_REGS, and I am sure there will be low-register pressure for such operands. Is this pattern used with dedicated registers for the operands?

Thanks & Regards
Ajit

-----Original Message-----
From: gcc-patches-owner@gcc.gnu.org [mailto:gcc-patches-owner@gcc.gnu.org] On Behalf Of Venkataramanan Kumar

Sent: Monday, June 16, 2014 6:23 PM
To: gcc-patches@gcc.gnu.org; Richard Earnshaw; Ramana Radhakrishnan; Marcus Shawcroft; Patch Tracking; Maxim Kuvyrkov
Subject: [RFC][ARM]: Fix reload spill failure (PR 60617)

Hi Maintainers,

This patch fixes PR 60617, which occurs when we turn on the reload pass in Thumb-2 mode.

It occurs for the pattern "*ior_scc_scc", which gets generated for the third argument of the function call below.

JIT:emitStoreInt32(dst,regT0m, (op1 == dst || op2 == dst)));


(----snip---)
(insn 634 633 635 27 (parallel [
            (set (reg:SI 3 r3)
                (ior:SI (eq:SI (reg/v:SI 110 [ dst ]) <== r5 gets assigned to this operand
                        (reg/v:SI 112 [ op2 ]))
                    (eq:SI (reg/v:SI 110 [ dst ]) <== This operand
                        (reg/v:SI 111 [ op1 ]))))
            (clobber (reg:CC 100 cc))
        ]) ../Source/JavaScriptCore/jit/JITArithmetic32_64.cpp:179 300 {*ior_scc_scc
(----snip---)

The issue here is that the above pattern demands 5 registers (LO_REGS).

But when we are in reload, register r0 is used for the pointer to the class, r1 and r2 for the first and second arguments, and r7 is used as the frame pointer.

So we are left with r3, r4, r5, and r6, but the above pattern needs five LO_REGS. Hence we get a spill failure when processing the last register operand in that pattern.

In the ARM port, TARGET_CLASS_LIKELY_SPILLED_P is defined for Thumb-1, and for Thumb-2 mode the use of LO_REGS is mentioned in the comment below.

"Care should be taken to avoid adding thumb-2 patterns that require many low registers"

So the conservative fix is to not allow this pattern for Thumb-2 mode.

I allowed these patterns for Thumb-2 when we have constant operands for the comparison. That makes the target tests arm/thumb2-cond-cmp-1.c to thumb2-cond-cmp-4.c pass.

Regression tested with the gcc 4.9 branch, since on trunk this bug is masked by revision 209897.

Please provide your suggestions on this patch.

regards,
Venkat.
Ramana Radhakrishnan June 18, 2014, 9:59 a.m. UTC | #2
On Mon, Jun 16, 2014 at 1:53 PM, Venkataramanan Kumar
<venkataramanan.kumar@linaro.org> wrote:
> Hi Maintainers,
>
> This patch fixes PR 60617, which occurs when we turn on the reload
> pass in Thumb-2 mode.
>
> It occurs for the pattern "*ior_scc_scc", which gets generated for
> the third argument of the function call below.
>
> JIT:emitStoreInt32(dst,regT0m, (op1 == dst || op2 == dst)));
>
>
> (----snip---)
> (insn 634 633 635 27 (parallel [
>             (set (reg:SI 3 r3)
>                 (ior:SI (eq:SI (reg/v:SI 110 [ dst ]) <== r5 gets assigned to this operand
>                         (reg/v:SI 112 [ op2 ]))
>                     (eq:SI (reg/v:SI 110 [ dst ]) <== This operand
>                         (reg/v:SI 111 [ op1 ]))))
>             (clobber (reg:CC 100 cc))
>         ]) ../Source/JavaScriptCore/jit/JITArithmetic32_64.cpp:179 300
> {*ior_scc_scc
> (----snip---)
>
> The issue here is that the above pattern demands 5 registers (LO_REGS).
>
> But when we are in reload, register r0 is used for the pointer to
> the class, r1 and r2 for the first and second arguments, and r7 is
> used as the frame pointer.
>
> So we are left with r3, r4, r5, and r6, but the above pattern needs
> five LO_REGS. Hence we get a spill failure when processing the last
> register operand in that pattern.
>
> In the ARM port, TARGET_CLASS_LIKELY_SPILLED_P is defined for
> Thumb-1, and for Thumb-2 mode the use of LO_REGS is mentioned in the
> comment below.
>
> "Care should be taken to avoid adding thumb-2 patterns that require
> many low registers"
>
> So the conservative fix is to not allow this pattern for Thumb-2 mode.

I don't have an additional solution off the top of my head and
probably need to go do some digging.

It sounds like the conservative fix, but what's the impact of doing
so? Have you measured it in terms of performance or code size on a
range of benchmarks?

>
> I allowed these patterns for Thumb-2 when we have constant operands
> for the comparison. That makes the target tests
> arm/thumb2-cond-cmp-1.c to thumb2-cond-cmp-4.c pass.

That sounds fine and fair - no trouble there.

My concern is with removing the register alternatives and losing the
ability to trigger conditional compares on 4.9 and trunk for Thumb-2
until the "new" conditional compare work makes it in.

Ramana

>
> Regression tested with the gcc 4.9 branch, since on trunk this bug
> is masked by revision 209897.
>
> Please provide your suggestions on this patch.
>
> regards,
> Venkat.
Venkataramanan Kumar June 18, 2014, 10:35 a.m. UTC | #3
Hi Ramana,

On 18 June 2014 15:29, Ramana Radhakrishnan <ramana.gcc@googlemail.com> wrote:
> On Mon, Jun 16, 2014 at 1:53 PM, Venkataramanan Kumar
> <venkataramanan.kumar@linaro.org> wrote:
>> Hi Maintainers,
>>
>> This patch fixes PR 60617, which occurs when we turn on the reload
>> pass in Thumb-2 mode.
>>
>> It occurs for the pattern "*ior_scc_scc", which gets generated for
>> the third argument of the function call below.
>>
>> JIT:emitStoreInt32(dst,regT0m, (op1 == dst || op2 == dst)));
>>
>>
>> (----snip---)
>> (insn 634 633 635 27 (parallel [
>>             (set (reg:SI 3 r3)
>>                 (ior:SI (eq:SI (reg/v:SI 110 [ dst ]) <== r5 gets assigned to this operand
>>                         (reg/v:SI 112 [ op2 ]))
>>                     (eq:SI (reg/v:SI 110 [ dst ]) <== This operand
>>                         (reg/v:SI 111 [ op1 ]))))
>>             (clobber (reg:CC 100 cc))
>>         ]) ../Source/JavaScriptCore/jit/JITArithmetic32_64.cpp:179 300
>> {*ior_scc_scc
>> (----snip---)
>>
>> The issue here is that the above pattern demands 5 registers (LO_REGS).
>>
>> But when we are in reload, register r0 is used for the pointer to
>> the class, r1 and r2 for the first and second arguments, and r7 is
>> used as the frame pointer.
>>
>> So we are left with r3, r4, r5, and r6, but the above pattern needs
>> five LO_REGS. Hence we get a spill failure when processing the last
>> register operand in that pattern.
>>
>> In the ARM port, TARGET_CLASS_LIKELY_SPILLED_P is defined for
>> Thumb-1, and for Thumb-2 mode the use of LO_REGS is mentioned in
>> the comment below.
>>
>> "Care should be taken to avoid adding thumb-2 patterns that require
>> many low registers"
>>
>> So the conservative fix is to not allow this pattern for Thumb-2 mode.
>
> I don't have an additional solution off the top of my head and
> probably need to go do some digging.
>
> It sounds like the conservative fix, but what's the impact of doing
> so? Have you measured it in terms of performance or code size on a
> range of benchmarks?
>
>>

I haven't done any benchmark testing. I will try and run some
benchmarks with my patch.


>> I allowed these patterns for Thumb-2 when we have constant operands
>> for the comparison. That makes the target tests
>> arm/thumb2-cond-cmp-1.c to thumb2-cond-cmp-4.c pass.
>
> That sounds fine and fair - no trouble there.
>
> My concern is with removing the register alternatives and losing the
> ability to trigger conditional compares on 4.9 and trunk for Thumb-2
> until the "new" conditional compare work makes it in.
>
> Ramana

This bug does not occur when LRA is enabled. In 4.9 FSF and trunk,
the LRA pass is enabled by default now.
Maybe this is too conservative, but is there a way to enable this
pattern when we have the LRA pass and prevent it when we have the old
reload pass?
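
As a rough, untested sketch of what I mean (assuming the -mlra option
is backed by an arm_lra_flag variable in arm.opt, which would need
checking), the insn condition is plain C, so the pattern could in
principle be gated on the allocator:

/* Hypothetical insn condition for *ior_scc_scc: keep it for ARM mode
   unconditionally, and for Thumb-2 only under LRA, since only the
   old reload pass hits this spill failure.  */
"(TARGET_ARM || (TARGET_THUMB2 && arm_lra_flag))
 && (arm_select_dominance_cc_mode (operands[3], operands[6],
                                   DOM_CC_X_OR_Y) != CCmode)"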

regards,
Venkat.



>
>>
>> Regression tested with the gcc 4.9 branch, since on trunk this bug
>> is masked by revision 209897.
>>
>> Please provide your suggestions on this patch.
>>
>> regards,
>> Venkat.
Venkataramanan Kumar July 7, 2014, 11:48 a.m. UTC | #4
Hi Ramana/Maxim,


On 18 June 2014 16:05, Venkataramanan Kumar
<venkataramanan.kumar@linaro.org> wrote:
> Hi Ramana,
>
> On 18 June 2014 15:29, Ramana Radhakrishnan <ramana.gcc@googlemail.com> wrote:
>> On Mon, Jun 16, 2014 at 1:53 PM, Venkataramanan Kumar
>> <venkataramanan.kumar@linaro.org> wrote:
>>> Hi Maintainers,
>>>
>>> This patch fixes PR 60617, which occurs when we turn on the reload
>>> pass in Thumb-2 mode.
>>>
>>> It occurs for the pattern "*ior_scc_scc", which gets generated for
>>> the third argument of the function call below.
>>>
>>> JIT:emitStoreInt32(dst,regT0m, (op1 == dst || op2 == dst)));
>>>
>>>
>>> (----snip---)
>>> (insn 634 633 635 27 (parallel [
>>>             (set (reg:SI 3 r3)
>>>                 (ior:SI (eq:SI (reg/v:SI 110 [ dst ]) <== r5 gets assigned to this operand
>>>                         (reg/v:SI 112 [ op2 ]))
>>>                     (eq:SI (reg/v:SI 110 [ dst ]) <== This operand
>>>                         (reg/v:SI 111 [ op1 ]))))
>>>             (clobber (reg:CC 100 cc))
>>>         ]) ../Source/JavaScriptCore/jit/JITArithmetic32_64.cpp:179 300
>>> {*ior_scc_scc
>>> (----snip---)
>>>
>>> The issue here is that the above pattern demands 5 registers (LO_REGS).
>>>
>>> But when we are in reload, register r0 is used for the pointer to
>>> the class, r1 and r2 for the first and second arguments, and r7 is
>>> used as the frame pointer.
>>>
>>> So we are left with r3, r4, r5, and r6, but the above pattern
>>> needs five LO_REGS. Hence we get a spill failure when processing
>>> the last register operand in that pattern.
>>>
>>> In the ARM port, TARGET_CLASS_LIKELY_SPILLED_P is defined for
>>> Thumb-1, and for Thumb-2 mode the use of LO_REGS is mentioned in
>>> the comment below.
>>>
>>> "Care should be taken to avoid adding thumb-2 patterns that require
>>> many low registers"
>>>
>>> So the conservative fix is to not allow this pattern for Thumb-2 mode.
>>
>> I don't have an additional solution off the top of my head and
>> probably need to go do some digging.
>>
>> It sounds like the conservative fix, but what's the impact of doing
>> so? Have you measured it in terms of performance or code size on a
>> range of benchmarks?
>>
>>>
>
> I haven't done any benchmark testing. I will try and run some
> benchmarks with my patch.
>
>
>>> I allowed these patterns for Thumb-2 when we have constant
>>> operands for the comparison. That makes the target tests
>>> arm/thumb2-cond-cmp-1.c to thumb2-cond-cmp-4.c pass.
>>
>> That sounds fine and fair - no trouble there.
>>
>> My concern is with removing the register alternatives and losing
>> the ability to trigger conditional compares on 4.9 and trunk for
>> Thumb-2 until the "new" conditional compare work makes it in.
>>
>> Ramana

I tested this conservative fix with Coremark (ran it on a Chromebook)
and SPEC CPU2006 (cross-compiled and compared assembly differences).

With Coremark there are no performance issues. In fact there are no
assembly differences with the CPU flags for A15 and A9.

For SPEC CPU2006 I cross-compiled and compared assembly differences
with and without the patch (-O3 -fno-common).
I have not run these benchmarks.

There are major code differences, which are due to conditional
compare instructions not being generated, as you expected. This also
results in different physical register numbers being assigned in the
generated code, and there are code scheduling differences when
comparing it with the base.


I am showing a simple test case to demonstrate the conditional
compare difference I am seeing in the SPEC CPU2006 benchmarks.

char a,b;
int i;
int f( int j)
{
  if ( (i >= a) ? (j <= a) : 1)
    return 1;
  else
    return 0;
}

GCC FSF 4.9
-----------

        movw    r2, #:lower16:a
        movw    r3, #:lower16:i
        movt    r2, #:upper16:a
        movt    r3, #:upper16:i
        ldrb    r2, [r2]        @ zero_extendqisi2
        ldr     r3, [r3]
        cmp     r2, r3
        it      le
        cmple   r2, r0          <== conditional compare instruction generated.
        ite     ge
        movge   r0, #1
        movlt   r0, #0
        bx      lr


With patch
---------

        movw    r2, #:lower16:a
        movw    r3, #:lower16:i
        movt    r2, #:upper16:a
        movt    r3, #:upper16:i
        ldr     r3, [r3]
        ldrb    r2, [r2]        @ zero_extendqisi2
        cmp     r2, r3
        ite     le
        movle   r3, #0 <== Suboptimal moves generated.
        movgt   r3, #1 <==
        cmp     r2, r0
        ite     lt
        movlt   r0, r3
        orrge   r0, r3, #1<==
        bx      lr

The following benchmarks have the maximum number of conditional
compare pattern differences, and also code scheduling changes and
different physical register numbers in the generated code.
416.gamess/
434.zeusmp/
400.perlbench
403.gcc/
445.gobmk
483.xalancbmk/
401.bzip2
433.milc/
435.gromacs

Assembly file differences are seen in the benchmarks below, but they are not very large.

436.cactusADM/
447.dealII
454.calculix
456.hmmer
453.povray/
465.tonto/
471.omnetpp
459.GemsFDTD
437.leslie3d
464.h264ref


Based on the results, I am of the opinion that this patch would cause
some performance hit if we have it upstreamed on trunk or GCC 4.9.

Another thing to note is that this bug occurs only when we compile
with -fno-tree-dce and -mno-lra on GCC 4.9.

Also, with current FSF trunk and GCC 4.9 we have LRA as the default
for ARM, which does not have this bug.

My opinion is that the reporter can use this fix to build the package
that triggered the bug, if it has to be built with -fno-tree-dce.

I am not sure if we can do anything at the machine description level
to prevent this bug. I am also not sure if it is easy to fix in the
reload pass.

Do you have any suggestions?

regards,
Venkat.

>
> This bug does not occur when LRA is enabled. In 4.9 FSF and trunk,
> the LRA pass is enabled by default now.
> Maybe this is too conservative, but is there a way to enable this
> pattern when we have the LRA pass and prevent it when we have the
> old reload pass?
>
> regards,
> Venkat.
>
>
>
>>
>>>
>>> Regression tested with the gcc 4.9 branch, since on trunk this
>>> bug is masked by revision 209897.
>>>
>>> Please provide your suggestions on this patch.
>>>
>>> regards,
>>> Venkat.
Christophe Lyon Sept. 26, 2014, 2:26 p.m. UTC | #5
Ramana,

On 7 July 2014 13:48, Venkataramanan Kumar
<venkataramanan.kumar@linaro.org> wrote:
> Hi Ramana/Maxim,
>
>
> On 18 June 2014 16:05, Venkataramanan Kumar
> <venkataramanan.kumar@linaro.org> wrote:
>> Hi Ramana,
>>
>> On 18 June 2014 15:29, Ramana Radhakrishnan <ramana.gcc@googlemail.com> wrote:
>>> On Mon, Jun 16, 2014 at 1:53 PM, Venkataramanan Kumar
>>> <venkataramanan.kumar@linaro.org> wrote:
>>>> Hi Maintainers,
>>>>
>>>> This patch fixes PR 60617, which occurs when we turn on the
>>>> reload pass in Thumb-2 mode.
>>>>
>>>> It occurs for the pattern "*ior_scc_scc", which gets generated
>>>> for the third argument of the function call below.
>>>>
>>>> JIT:emitStoreInt32(dst,regT0m, (op1 == dst || op2 == dst)));
>>>>
>>>>
>>>> (----snip---)
>>>> (insn 634 633 635 27 (parallel [
>>>>             (set (reg:SI 3 r3)
>>>>                 (ior:SI (eq:SI (reg/v:SI 110 [ dst ]) <== r5 gets assigned to this operand
>>>>                         (reg/v:SI 112 [ op2 ]))
>>>>                     (eq:SI (reg/v:SI 110 [ dst ]) <== This operand
>>>>                         (reg/v:SI 111 [ op1 ]))))
>>>>             (clobber (reg:CC 100 cc))
>>>>         ]) ../Source/JavaScriptCore/jit/JITArithmetic32_64.cpp:179 300
>>>> {*ior_scc_scc
>>>> (----snip---)
>>>>
>>>> The issue here is that the above pattern demands 5 registers (LO_REGS).
>>>>
>>>> But when we are in reload, register r0 is used for the pointer
>>>> to the class, r1 and r2 for the first and second arguments, and
>>>> r7 is used as the frame pointer.
>>>>
>>>> So we are left with r3, r4, r5, and r6, but the above pattern
>>>> needs five LO_REGS. Hence we get a spill failure when processing
>>>> the last register operand in that pattern.
>>>>
>>>> In the ARM port, TARGET_CLASS_LIKELY_SPILLED_P is defined for
>>>> Thumb-1, and for Thumb-2 mode the use of LO_REGS is mentioned in
>>>> the comment below.
>>>>
>>>> "Care should be taken to avoid adding thumb-2 patterns that require
>>>> many low registers"
>>>>
>>>> So the conservative fix is to not allow this pattern for Thumb-2 mode.
>>>
>>> I don't have an additional solution off the top of my head and
>>> probably need to go do some digging.
>>>
>>> It sounds like the conservative fix, but what's the impact of
>>> doing so? Have you measured it in terms of performance or code
>>> size on a range of benchmarks?
>>>
>>>>
>>
>> I haven't done any benchmark testing. I will try and run some
>> benchmarks with my patch.
>>
>>
>>>> I allowed these patterns for Thumb-2 when we have constant
>>>> operands for the comparison. That makes the target tests
>>>> arm/thumb2-cond-cmp-1.c to thumb2-cond-cmp-4.c pass.
>>>
>>> That sounds fine and fair - no trouble there.
>>>
>>> My concern is with removing the register alternatives and losing
>>> the ability to trigger conditional compares on 4.9 and trunk for
>>> Thumb-2 until the "new" conditional compare work makes it in.
>>>
>>> Ramana
>
> I tested this conservative fix with Coremark (ran it on a
> Chromebook) and SPEC CPU2006 (cross-compiled and compared assembly
> differences).
>
> With Coremark there are no performance issues. In fact there are no
> assembly differences with the CPU flags for A15 and A9.
>
> For SPEC CPU2006 I cross-compiled and compared assembly differences
> with and without the patch (-O3 -fno-common).
> I have not run these benchmarks.
>
> There are major code differences, which are due to conditional
> compare instructions not being generated, as you expected. This
> also results in different physical register numbers being assigned
> in the generated code, and there are code scheduling differences
> when comparing it with the base.
>
>
> I am showing a simple test case to demonstrate the conditional
> compare difference I am seeing in the SPEC CPU2006 benchmarks.
>
> char a,b;
> int i;
> int f( int j)
> {
>   if ( (i >= a) ? (j <= a) : 1)
>     return 1;
>   else
>     return 0;
> }
>
> GCC FSF 4.9
> -----------
>
>         movw    r2, #:lower16:a
>         movw    r3, #:lower16:i
>         movt    r2, #:upper16:a
>         movt    r3, #:upper16:i
>         ldrb    r2, [r2]        @ zero_extendqisi2
>         ldr     r3, [r3]
>         cmp     r2, r3
>         it      le
>         cmple   r2, r0          <== conditional compare instruction generated.
>         ite     ge
>         movge   r0, #1
>         movlt   r0, #0
>         bx      lr
>
>
> With patch
> ---------
>
>         movw    r2, #:lower16:a
>         movw    r3, #:lower16:i
>         movt    r2, #:upper16:a
>         movt    r3, #:upper16:i
>         ldr     r3, [r3]
>         ldrb    r2, [r2]        @ zero_extendqisi2
>         cmp     r2, r3
>         ite     le
>         movle   r3, #0 <== Suboptimal moves generated.
>         movgt   r3, #1 <==
>         cmp     r2, r0
>         ite     lt
>         movlt   r0, r3
>         orrge   r0, r3, #1<==
>         bx      lr
>
> The following benchmarks have the maximum number of conditional
> compare pattern differences, and also code scheduling changes and
> different physical register numbers in the generated code.
> 416.gamess/
> 434.zeusmp/
> 400.perlbench
> 403.gcc/
> 445.gobmk
> 483.xalancbmk/
> 401.bzip2
> 433.milc/
> 435.gromacs
>
> Assembly file differences are seen in the benchmarks below, but they are not very large.
>
> 436.cactusADM/
> 447.dealII
> 454.calculix
> 456.hmmer
> 453.povray/
> 465.tonto/
> 471.omnetpp
> 459.GemsFDTD
> 437.leslie3d
> 464.h264ref
>
>
> Based on the results, I am of the opinion that this patch would
> cause some performance hit if we have it upstreamed on trunk or GCC
> 4.9.
>
> Another thing to note is that this bug occurs only when we compile
> with -fno-tree-dce and -mno-lra on GCC 4.9.
>
> Also, with current FSF trunk and GCC 4.9 we have LRA as the default
> for ARM, which does not have this bug.
>
> My opinion is that the reporter can use this fix to build the
> package that triggered the bug, if it has to be built with
> -fno-tree-dce.
>
> I am not sure if we can do anything at the machine description
> level to prevent this bug. I am also not sure if it is easy to fix
> in the reload pass.
>
> Do you have any suggestions?
>
So this is a 4.8-only bug, or a 4.9/trunk one when LRA is disabled.
Ramana, do you think it's worth fixing it in 4.8, at the price of a
likely performance degradation, or leaving it open as "known bug" in
that branch?

Thanks,

Christophe.


> regards,
> Venkat.
>
>>
>> This bug does not occur when LRA is enabled. In 4.9 FSF and trunk,
>> the LRA pass is enabled by default now.
>> Maybe this is too conservative, but is there a way to enable this
>> pattern when we have the LRA pass and prevent it when we have the
>> old reload pass?
>>
>> regards,
>> Venkat.
>>
>>
>>
>>>
>>>>
>>>> Regression tested with the gcc 4.9 branch, since on trunk this
>>>> bug is masked by revision 209897.
>>>>
>>>> Please provide your suggestions on this patch.
>>>>
>>>> regards,
>>>> Venkat.

Patch

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 0284f95..e8fbb11 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -10654,7 +10654,7 @@ 
 		 [(match_operand:SI 4 "s_register_operand" "r")
 		  (match_operand:SI 5 "arm_add_operand" "rIL")])))
    (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT
+  "TARGET_ARM
    && (arm_select_dominance_cc_mode (operands[3], operands[6], DOM_CC_X_OR_Y)
        != CCmode)"
   "#"
@@ -10675,6 +10675,36 @@ 
    (set_attr "type" "multiple")]
 )
 
+(define_insn_and_split "*ior_scc_scc_imm"
+  [(set (match_operand:SI 0 "s_register_operand" "=Ts")
+        (ior:SI (match_operator:SI 3 "arm_comparison_operator"
+                 [(match_operand:SI 1 "s_register_operand" "r")
+                  (match_operand:SI 2 "arm_addimm_operand" "IL")])
+                (match_operator:SI 6 "arm_comparison_operator"
+                 [(match_operand:SI 4 "s_register_operand" "r")
+                  (match_operand:SI 5 "arm_addimm_operand" "IL")])))
+   (clobber (reg:CC CC_REGNUM))]
+  "TARGET_THUMB2
+   && (arm_select_dominance_cc_mode (operands[3], operands[6], DOM_CC_X_OR_Y)
+       != CCmode)"
+  "#"
+  "TARGET_THUMB2 && reload_completed"
+  [(set (match_dup 7)
+        (compare
+         (ior:SI
+          (match_op_dup 3 [(match_dup 1) (match_dup 2)])
+          (match_op_dup 6 [(match_dup 4) (match_dup 5)]))
+         (const_int 0)))
+   (set (match_dup 0) (ne:SI (match_dup 7) (const_int 0)))]
+  "operands[7]
+     = gen_rtx_REG (arm_select_dominance_cc_mode (operands[3], operands[6],
+                                                  DOM_CC_X_OR_Y),
+                    CC_REGNUM);"
+  [(set_attr "conds" "clob")
+   (set_attr "length" "16")
+   (set_attr "type" "multiple")]
+)
+
 ; If the above pattern is followed by a CMP insn, then the compare is 
 ; redundant, since we can rework the conditional instruction that follows.
 (define_insn_and_split "*ior_scc_scc_cmp"
@@ -10714,7 +10744,7 @@ 
 		 [(match_operand:SI 4 "s_register_operand" "r")
 		  (match_operand:SI 5 "arm_add_operand" "rIL")])))
    (clobber (reg:CC CC_REGNUM))]
-  "TARGET_32BIT
+  "TARGET_ARM 
    && (arm_select_dominance_cc_mode (operands[3], operands[6], DOM_CC_X_AND_Y)
        != CCmode)"
   "#"
@@ -10737,6 +10767,38 @@ 
    (set_attr "type" "multiple")]
 )
 
+(define_insn_and_split "*and_scc_scc_imm"
+  [(set (match_operand:SI 0 "s_register_operand" "=Ts")
+        (and:SI (match_operator:SI 3 "arm_comparison_operator"
+                 [(match_operand:SI 1 "s_register_operand" "r")
+                  (match_operand:SI 2 "arm_addimm_operand" "IL")])
+                (match_operator:SI 6 "arm_comparison_operator"
+                 [(match_operand:SI 4 "s_register_operand" "r")
+                  (match_operand:SI 5 "arm_addimm_operand" "IL")])))
+   (clobber (reg:CC CC_REGNUM))]
+  "TARGET_THUMB2 
+   && (arm_select_dominance_cc_mode (operands[3], operands[6], DOM_CC_X_AND_Y)
+       != CCmode)"
+  "#"
+  "TARGET_THUMB2 && reload_completed
+   && (arm_select_dominance_cc_mode (operands[3], operands[6], DOM_CC_X_AND_Y)
+       != CCmode)"
+  [(set (match_dup 7)
+        (compare
+         (and:SI
+          (match_op_dup 3 [(match_dup 1) (match_dup 2)])
+          (match_op_dup 6 [(match_dup 4) (match_dup 5)]))
+         (const_int 0)))
+   (set (match_dup 0) (ne:SI (match_dup 7) (const_int 0)))]
+  "operands[7]
+     = gen_rtx_REG (arm_select_dominance_cc_mode (operands[3], operands[6],
+                                                  DOM_CC_X_AND_Y),
+                    CC_REGNUM);"
+  [(set_attr "conds" "clob")
+   (set_attr "length" "16")
+   (set_attr "type" "multiple")]
+)
+
 ; If the above pattern is followed by a CMP insn, then the compare is 
 ; redundant, since we can rework the conditional instruction that follows.
 (define_insn_and_split "*and_scc_scc_cmp"