diff mbox

[3/3,RTL,ifcvt] PR middle-end/37780: Conditional expression with __builtin_clz() should be optimized out

Message ID CAKdteOabZUzYqGDWCKuN_pU=pYTZa5eDcgOv6PReiEA4SHJHsg@mail.gmail.com
State New
Headers show

Commit Message

Christophe Lyon June 10, 2016, 1:39 p.m. UTC
On 10 June 2016 at 10:38, Kyrill Tkachov <kyrylo.tkachov@foss.arm.com> wrote:
>

> On 09/06/16 13:14, Christophe Lyon wrote:

>>

>> On 8 June 2016 at 18:40, Kyrill Tkachov <kyrylo.tkachov@foss.arm.com>

>> wrote:

>>>

>>> On 07/06/16 20:34, Christophe Lyon wrote:

>>>>

>>>> On 26 May 2016 at 11:53, Kyrill Tkachov <kyrylo.tkachov@foss.arm.com>

>>>> wrote:

>>>>>

>>>>> Hi all,

>>>>>

>>>>> In this PR we want to optimise:

>>>>> int foo (int i)

>>>>> {

>>>>>     return (i == 0) ? N : __builtin_clz (i);

>>>>> }

>>>>>

>>>>> on targets where CLZ is defined at zero to the constant 'N'.

>>>>> This is determined at the RTL level through the

>>>>> CLZ_DEFINED_VALUE_AT_ZERO

>>>>> macro.

>>>>> The obvious place to implement this would be in combine through

>>>>> simplify-rtx

>>>>> where we'd

>>>>> recognise an IF_THEN_ELSE of the form:

>>>>> (set (reg:SI r1)

>>>>>        (if_then_else:SI (ne (reg:SI r2)

>>>>>                             (const_int 0 [0]))

>>>>>          (clz:SI (reg:SI r2))

>>>>>          (const_int 32)))

>>>>>

>>>>> and if CLZ_DEFINED_VALUE_AT_ZERO is defined to 32 for SImode we'd

>>>>> simplify

>>>>> it into

>>>>> just (clz:SI (reg:SI r2)).

>>>>> However, I found this doesn't quite happen for a couple of reasons:

>>>>> 1) This depends on ifcvt or some other pass to have created a

>>>>> conditional

>>>>> move of the

>>>>> two branches that provide the IF_THEN_ELSE to propagate the const_int

>>>>> and

>>>>> clz operation into.

>>>>>

>>>>> 2) Combine will refuse to propagate r2 from the above example into both

>>>>> the

>>>>> condition and the

>>>>> CLZ at the same time, so the most we see is:

>>>>> (set (reg:SI r1)

>>>>>        (if_then_else:SI (ne (reg:CC cc)

>>>>>               (const_int 0))

>>>>>          (clz:SI (reg:SI r2))

>>>>>          (const_int 32)))

>>>>>

>>>>> which is not enough information to perform the simplification.

>>>>>

>>>>> This patch implements the optimisation in ce1 using the noce ifcvt

>>>>> framework.

>>>>> During ifcvt noce_process_if_block can see that we're trying to

>>>>> optimise

>>>>> something

>>>>> of the form (x == 0 ? const_int : CLZ (x)) and so it has visibility of

>>>>> all

>>>>> the information

>>>>> needed to perform the transformation.

>>>>>

>>>>> The transformation is performed by adding a new noce_try* function that

>>>>> tries to put the

>>>>> condition and the 'then' and 'else' arms into an IF_THEN_ELSE rtx and

>>>>> try

>>>>> to

>>>>> simplify that

>>>>> using the simplify-rtx machinery. That way, we can implement the

>>>>> simplification logic in

>>>>> simplify-rtx.c where it belongs.

>>>>>

>>>>> A similar transformation for CTZ is implemented as well.

>>>>> So for code:

>>>>> int foo (int i)

>>>>> {

>>>>>     return (i == 0) ? 32 : __builtin_clz (i);

>>>>> }

>>>>>

>>>>> On aarch64 we now emit:

>>>>> foo:

>>>>>           clz     w0, w0

>>>>>           ret

>>>>>

>>>>> instead of:

>>>>> foo:

>>>>>           mov     w1, 32

>>>>>           clz     w2, w0

>>>>>           cmp     w0, 0

>>>>>           csel    w0, w2, w1, ne

>>>>>           ret

>>>>>

>>>>> and for arm similarly we generate:

>>>>> foo:

>>>>>           clz     r0, r0

>>>>>           bx      lr

>>>>>

>>>>> instead of:

>>>>> foo:

>>>>>           cmp     r0, #0

>>>>>           clzne   r0, r0

>>>>>           moveq   r0, #32

>>>>>           bx      lr

>>>>>

>>>>>

>>>>> and for x86_64 with -O2 -mlzcnt we generate:

>>>>> foo:

>>>>>           xorl    %eax, %eax

>>>>>           lzcntl  %edi, %eax

>>>>>           ret

>>>>>

>>>>> instead of:

>>>>> foo:

>>>>>           xorl    %eax, %eax

>>>>>           movl    $32, %edx

>>>>>           lzcntl  %edi, %eax

>>>>>           testl   %edi, %edi

>>>>>           cmove   %edx, %eax

>>>>>           ret

>>>>>

>>>>>

>>>>> I tried getting this to work on other targets as well, but encountered

>>>>> difficulties.

>>>>> For example on powerpc the two arms of the condition seen during ifcvt

>>>>> are:

>>>>>

>>>>> (insn 4 22 11 4 (set (reg:DI 156 [ <retval> ])

>>>>>           (const_int 32 [0x20])) clz.c:3 434 {*movdi_internal64}

>>>>>        (nil))

>>>>> and

>>>>> (insn 10 9 23 3 (set (subreg/s/u:SI (reg:DI 156 [ <retval> ]) 0)

>>>>>           (clz:SI (subreg/u:SI (reg/v:DI 157 [ i ]) 0))) clz.c:3 132

>>>>> {clzsi2}

>>>>>        (expr_list:REG_DEAD (reg/v:DI 157 [ i ])

>>>>>           (nil)))

>>>>>

>>>>> So the setup code in noce_process_if_block sees that the set

>>>>> destination

>>>>> is

>>>>> not the same

>>>>> ((reg:DI 156 [ <retval> ]) and (subreg/s/u:SI (reg:DI 156 [ <retval> ])

>>>>> 0))

>>>>> so it bails out on the rtx_interchangeable_p (x, SET_DEST (set_b))

>>>>> check.

>>>>> I suppose that's a consequence of how SImode operations are represented

>>>>> in

>>>>> early RTL

>>>>> on powerpc, I don't know what to do there. Perhaps that part of ivcvt

>>>>> can

>>>>> be

>>>>> taught to handle

>>>>> destinations that are subregs of one another, but that would be a

>>>>> separate

>>>>> patch.

>>>>>

>>>>> Anyway, is this patch ok for trunk?

>>>>>

>>>>> Bootstrapped and tested on arm-none-linux-gnueabihf,

>>>>> aarch64-none-linux-gnu,

>>>>> x86_64-pc-linux-gnu.

>>>>>

>>>>> Thanks,

>>>>> Kyrill

>>>>>

>>>>> 2016-05-26  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

>>>>>

>>>>>       PR middle-end/37780

>>>>>       * ifcvt.c (noce_try_ifelse_collapse): New function.

>>>>>       Declare prototype.

>>>>>       (noce_process_if_block): Call noce_try_ifelse_collapse.

>>>>>       * simplify-rtx.c (simplify_cond_clz_ctz): New function.

>>>>>       (simplify_ternary_operation): Use the above to simplify

>>>>>       conditional CLZ/CTZ expressions.

>>>>>

>>>>> 2016-05-26  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

>>>>>

>>>>>       PR middle-end/37780

>>>>>       * gcc.c-torture/execute/pr37780.c: New test.

>>>>>       * gcc.target/aarch64/pr37780_1.c: Likewise.

>>>>>       * gcc.target/arm/pr37780_1.c: Likewise.

>>>>

>>>> Hi Kyrylo,

>>>

>>>

>>> Hi Christophe,

>>>

>>>> I've noticed that gcc.target/arm/pr37780_1.c fails on

>>>> arm if arch < v6.

>>>> I first tried to fix the effective-target guard (IIRC, the doc

>>>> says clz is available starting with v5t), but that isn't sufficient.

>>>>

>>>> When compiling for armv5-t, the scan-assembler directives

>>>> fail. It seems to work with v6t2, so I am wondering whether

>>>> it's just a matter of increasing the effective-target arch version,

>>>> or if you really intended to make the test pass on these old

>>>> architectures?

>>>

>>>

>>> I've dug into it a bit.

>>> I think the problem is that CLZ is available with ARMv5T but only in arm

>>> mode.

>>> In Thumb mode it's only available with ARMv6T2.

>>> So if your armv5t gcc compiles for Thumb by default you'll get the

>>> failure.

>>

>> Actually this gcc is configured with default cpu & mode,

>> target=arm-none-eabi.

>>

>> So I think it's arm7tdmi, arm mode.

>>

>>> I think just bumping the effective to armv6t2 here is appropriate as I

>>> intended

>>> the test to just check the ifcvt transformation when the instruction is

>>> available

>>> rather than test that the CLZ instruction is available at one

>>> architecture

>>> level

>>> or another.

>>>

>>> A patch to bump the effective target check is pre-approved if you want to

>>> do

>>> it.

>>> Otherwise I'll get to it in a few days.

>>>

>> I was a about to commit the arm_arch_v5_ok -> arm_arch_v6t2_ok, but

>> this would not be sufficient without adding

>> dg-add-options arm_arch_v6t2

>> but then if gcc is configured with a higher default architecture, we'll

>> end

>> up testing if works for v6t2 only.

>

>

> I'm okay with adding "dg-add-options arm_arch_v6t2".

> We're not testing the availability of the instruction here, just the ifcvt

> transformation when it *is* available, so I think we should just add

> whatever

> option guarantees that instruction is available.

>

> Kyrill

>


OK, I've committed the attached patch.

2016-06-10  Christophe Lyon  <christophe.lyon@linaro.org>

        * gcc.target/arm/pr37780_1.c: Use arm_arch_v6t2 effective target
        and options.

>

>>> Thanks for spotting this,

>>> Kyrill

>>>>

>>>> Thanks,

>>>>

>>>> Christophe.

>>>

>>>

>
diff mbox

Patch

diff --git a/gcc/testsuite/gcc.target/arm/pr37780_1.c b/gcc/testsuite/gcc.target/arm/pr37780_1.c
index b954fe5..8e06920 100644
--- a/gcc/testsuite/gcc.target/arm/pr37780_1.c
+++ b/gcc/testsuite/gcc.target/arm/pr37780_1.c
@@ -2,8 +2,9 @@ 
    being defined at zero.  */
 
 /* { dg-do compile } */
-/* { dg-require-effective-target arm_arch_v5_ok } */
+/* { dg-require-effective-target arm_arch_v6t2_ok } */
 /* { dg-options "-O2" } */
+/* { dg-add-options arm_arch_v6t2 } */
 
 int
 fooctz (int i)