From patchwork Fri Jun 10 13:39:01 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Christophe Lyon X-Patchwork-Id: 69781 Delivered-To: patch@linaro.org Received: by 10.140.106.246 with SMTP id e109csp300390qgf; Fri, 10 Jun 2016 06:39:27 -0700 (PDT) X-Received: by 10.36.200.5 with SMTP id w5mr31689884itf.33.1465565966988; Fri, 10 Jun 2016 06:39:26 -0700 (PDT) Return-Path: Received: from sourceware.org (server1.sourceware.org. [209.132.180.131]) by mx.google.com with ESMTPS id s207si13261942pfs.76.2016.06.10.06.39.26 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 10 Jun 2016 06:39:26 -0700 (PDT) Received-SPF: pass (google.com: domain of gcc-patches-return-429541-patch=linaro.org@gcc.gnu.org designates 209.132.180.131 as permitted sender) client-ip=209.132.180.131; Authentication-Results: mx.google.com; dkim=pass header.i=@gcc.gnu.org; spf=pass (google.com: domain of gcc-patches-return-429541-patch=linaro.org@gcc.gnu.org designates 209.132.180.131 as permitted sender) smtp.mailfrom=gcc-patches-return-429541-patch=linaro.org@gcc.gnu.org; dmarc=fail (p=NONE dis=NONE) header.from=linaro.org DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :mime-version:in-reply-to:references:from:date:message-id :subject:to:cc:content-type; q=dns; s=default; b=PL8YGJfS0BeJ/Go KfySnpc8/UZBkiflXNUH4JvIPLf2kD5/6G2/79rPz1CBkGjpHOEdxznVqhqPQHQL 0ffLsTX2uKe7pon9NpuC5+aLAAlAQCDK1kaHd6DQVKopC3hyCkKi9AcEPd1unIQ2 xZHdSIt5QYekE7sR6UA9mUNdkQ7Q= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :mime-version:in-reply-to:references:from:date:message-id :subject:to:cc:content-type; s=default; bh=Fl/bR+i31Sf6V+5Ib7mf3 WID4C8=; b=bJ8SXl0y/+L2IGmXRaQy8pZGlWbx/KHJsUUBv6q1XdVdEofsZAB+A 83Kn3qnln0j6JpuV2h88DaI8uyJdDkA/CKKncePt7pJPdw18nWWsK0GY+DKD+grq FTGguXCcQHIPMTgZOEjwL/kuG3q0M3Yah9r/JrOsuITpDjFsprQp/E= Received: (qmail 3528 invoked by alias); 10 Jun 2016 13:39:15 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 3518 invoked by uid 89); 10 Jun 2016 13:39:14 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-2.1 required=5.0 tests=AWL, BAYES_00, RCVD_IN_DNSWL_LOW, SPF_PASS autolearn=ham version=3.3.2 spammy= X-HELO: mail-qk0-f176.google.com Received: from mail-qk0-f176.google.com (HELO mail-qk0-f176.google.com) (209.85.220.176) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256 encrypted) ESMTPS; Fri, 10 Jun 2016 13:39:04 +0000 Received: by mail-qk0-f176.google.com with SMTP id s186so37212132qkc.1 for ; Fri, 10 Jun 2016 06:39:03 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=pEVWa24iThLbSRw+M35pWteuyRiARz4far9j1GJn5vM=; b=jyM44GTLGOPcZl3R0l0fF1tV3n33i121qPraAyPBkVZfNv3Q4igqXBA+VS6r8I0pnu paTKCbd/nW+KAWCTBV2OkZ7FjY5qgou9qPdM5FofyISgyqCgUcgbRoD0CSxip5rnrfV9 13OcDNQW3Wma3ZDjzjZJmsRPwwVbVu4Fn/tp0k4VQCE2gCzsOLroJ4BitM0FQBmMs+3/ Gt9ecJ389+l4J8cYwQJnynSJWXcwvVk3saabUv9/lA3wSLL596FU39szmQEpBIJLSiu4 qU8Eo7Q/g4yXECj6ycxx6n3fgCY4CTRal1UFxRqm0yb1/2VBmxNEumhS+cx6cgTjP5Fu UJBA== X-Gm-Message-State: ALyK8tLBAPKmPb3i3rZcLNmSQyTl92MEOKsihg0NK7lAh7LQJSdlZ0ZmNFcmNroLX2n7PhLzGhM9pbA0I0wFxNHA X-Received: by 10.55.126.7 with SMTP id z7mr1805638qkc.137.1465565941818; Fri, 10 Jun 2016 06:39:01 -0700 (PDT) MIME-Version: 1.0 Received: by 10.140.39.47 with HTTP; Fri, 10 Jun 2016 06:39:01 -0700 (PDT) In-Reply-To: <575A7C6E.9060709@foss.arm.com> References: <5746C78B.3030006@foss.arm.com> <57584A7E.5080600@foss.arm.com> <575A7C6E.9060709@foss.arm.com> From: Christophe Lyon Date: Fri, 10 Jun 2016 15:39:01 +0200 Message-ID: Subject: Re: [PATCH][3/3][RTL ifcvt] PR middle-end/37780: Conditional expression with __builtin_clz() should be optimized out To: Kyrill Tkachov Cc: GCC Patches X-IsSubscribed: yes On 10 June 2016 at 10:38, Kyrill Tkachov wrote: > > On 09/06/16 13:14, Christophe Lyon wrote: >> >> On 8 June 2016 at 18:40, Kyrill Tkachov >> wrote: >>> >>> On 07/06/16 20:34, Christophe Lyon wrote: >>>> >>>> On 26 May 2016 at 11:53, Kyrill Tkachov >>>> wrote: >>>>> >>>>> Hi all, >>>>> >>>>> In this PR we want to optimise: >>>>> int foo (int i) >>>>> { >>>>> return (i == 0) ? N : __builtin_clz (i); >>>>> } >>>>> >>>>> on targets where CLZ is defined at zero to the constant 'N'. >>>>> This is determined at the RTL level through the >>>>> CLZ_DEFINED_VALUE_AT_ZERO >>>>> macro. >>>>> The obvious place to implement this would be in combine through >>>>> simplify-rtx >>>>> where we'd >>>>> recognise an IF_THEN_ELSE of the form: >>>>> (set (reg:SI r1) >>>>> (if_then_else:SI (ne (reg:SI r2) >>>>> (const_int 0 [0])) >>>>> (clz:SI (reg:SI r2)) >>>>> (const_int 32))) >>>>> >>>>> and if CLZ_DEFINED_VALUE_AT_ZERO is defined to 32 for SImode we'd >>>>> simplify >>>>> it into >>>>> just (clz:SI (reg:SI r2)). >>>>> However, I found this doesn't quite happen for a couple of reasons: >>>>> 1) This depends on ifcvt or some other pass to have created a >>>>> conditional >>>>> move of the >>>>> two branches that provide the IF_THEN_ELSE to propagate the const_int >>>>> and >>>>> clz operation into. >>>>> >>>>> 2) Combine will refuse to propagate r2 from the above example into both >>>>> the >>>>> condition and the >>>>> CLZ at the same time, so the most we see is: >>>>> (set (reg:SI r1) >>>>> (if_then_else:SI (ne (reg:CC cc) >>>>> (const_int 0)) >>>>> (clz:SI (reg:SI r2)) >>>>> (const_int 32))) >>>>> >>>>> which is not enough information to perform the simplification. >>>>> >>>>> This patch implements the optimisation in ce1 using the noce ifcvt >>>>> framework. >>>>> During ifcvt noce_process_if_block can see that we're trying to >>>>> optimise >>>>> something >>>>> of the form (x == 0 ? const_int : CLZ (x)) and so it has visibility of >>>>> all >>>>> the information >>>>> needed to perform the transformation. >>>>> >>>>> The transformation is performed by adding a new noce_try* function that >>>>> tries to put the >>>>> condition and the 'then' and 'else' arms into an IF_THEN_ELSE rtx and >>>>> try >>>>> to >>>>> simplify that >>>>> using the simplify-rtx machinery. That way, we can implement the >>>>> simplification logic in >>>>> simplify-rtx.c where it belongs. >>>>> >>>>> A similar transformation for CTZ is implemented as well. >>>>> So for code: >>>>> int foo (int i) >>>>> { >>>>> return (i == 0) ? 32 : __builtin_clz (i); >>>>> } >>>>> >>>>> On aarch64 we now emit: >>>>> foo: >>>>> clz w0, w0 >>>>> ret >>>>> >>>>> instead of: >>>>> foo: >>>>> mov w1, 32 >>>>> clz w2, w0 >>>>> cmp w0, 0 >>>>> csel w0, w2, w1, ne >>>>> ret >>>>> >>>>> and for arm similarly we generate: >>>>> foo: >>>>> clz r0, r0 >>>>> bx lr >>>>> >>>>> instead of: >>>>> foo: >>>>> cmp r0, #0 >>>>> clzne r0, r0 >>>>> moveq r0, #32 >>>>> bx lr >>>>> >>>>> >>>>> and for x86_64 with -O2 -mlzcnt we generate: >>>>> foo: >>>>> xorl %eax, %eax >>>>> lzcntl %edi, %eax >>>>> ret >>>>> >>>>> instead of: >>>>> foo: >>>>> xorl %eax, %eax >>>>> movl $32, %edx >>>>> lzcntl %edi, %eax >>>>> testl %edi, %edi >>>>> cmove %edx, %eax >>>>> ret >>>>> >>>>> >>>>> I tried getting this to work on other targets as well, but encountered >>>>> difficulties. >>>>> For example on powerpc the two arms of the condition seen during ifcvt >>>>> are: >>>>> >>>>> (insn 4 22 11 4 (set (reg:DI 156 [ ]) >>>>> (const_int 32 [0x20])) clz.c:3 434 {*movdi_internal64} >>>>> (nil)) >>>>> and >>>>> (insn 10 9 23 3 (set (subreg/s/u:SI (reg:DI 156 [ ]) 0) >>>>> (clz:SI (subreg/u:SI (reg/v:DI 157 [ i ]) 0))) clz.c:3 132 >>>>> {clzsi2} >>>>> (expr_list:REG_DEAD (reg/v:DI 157 [ i ]) >>>>> (nil))) >>>>> >>>>> So the setup code in noce_process_if_block sees that the set >>>>> destination >>>>> is >>>>> not the same >>>>> ((reg:DI 156 [ ]) and (subreg/s/u:SI (reg:DI 156 [ ]) >>>>> 0)) >>>>> so it bails out on the rtx_interchangeable_p (x, SET_DEST (set_b)) >>>>> check. >>>>> I suppose that's a consequence of how SImode operations are represented >>>>> in >>>>> early RTL >>>>> on powerpc, I don't know what to do there. Perhaps that part of ivcvt >>>>> can >>>>> be >>>>> taught to handle >>>>> destinations that are subregs of one another, but that would be a >>>>> separate >>>>> patch. >>>>> >>>>> Anyway, is this patch ok for trunk? >>>>> >>>>> Bootstrapped and tested on arm-none-linux-gnueabihf, >>>>> aarch64-none-linux-gnu, >>>>> x86_64-pc-linux-gnu. >>>>> >>>>> Thanks, >>>>> Kyrill >>>>> >>>>> 2016-05-26 Kyrylo Tkachov >>>>> >>>>> PR middle-end/37780 >>>>> * ifcvt.c (noce_try_ifelse_collapse): New function. >>>>> Declare prototype. >>>>> (noce_process_if_block): Call noce_try_ifelse_collapse. >>>>> * simplify-rtx.c (simplify_cond_clz_ctz): New function. >>>>> (simplify_ternary_operation): Use the above to simplify >>>>> conditional CLZ/CTZ expressions. >>>>> >>>>> 2016-05-26 Kyrylo Tkachov >>>>> >>>>> PR middle-end/37780 >>>>> * gcc.c-torture/execute/pr37780.c: New test. >>>>> * gcc.target/aarch64/pr37780_1.c: Likewise. >>>>> * gcc.target/arm/pr37780_1.c: Likewise. >>>> >>>> Hi Kyrylo, >>> >>> >>> Hi Christophe, >>> >>>> I've noticed that gcc.target/arm/pr37780_1.c fails on >>>> arm if arch < v6. >>>> I first tried to fix the effective-target guard (IIRC, the doc >>>> says clz is available starting with v5t), but that isn't sufficient. >>>> >>>> When compiling for armv5-t, the scan-assembler directives >>>> fail. It seems to work with v6t2, so I am wondering whether >>>> it's just a matter of increasing the effective-target arch version, >>>> or if you really intended to make the test pass on these old >>>> architectures? >>> >>> >>> I've dug into it a bit. >>> I think the problem is that CLZ is available with ARMv5T but only in arm >>> mode. >>> In Thumb mode it's only available with ARMv6T2. >>> So if your armv5t gcc compiles for Thumb by default you'll get the >>> failure. >> >> Actually this gcc is configured with default cpu & mode, >> target=arm-none-eabi. >> >> So I think it's arm7tdmi, arm mode. >> >>> I think just bumping the effective to armv6t2 here is appropriate as I >>> intended >>> the test to just check the ifcvt transformation when the instruction is >>> available >>> rather than test that the CLZ instruction is available at one >>> architecture >>> level >>> or another. >>> >>> A patch to bump the effective target check is pre-approved if you want to >>> do >>> it. >>> Otherwise I'll get to it in a few days. >>> >> I was a about to commit the arm_arch_v5_ok -> arm_arch_v6t2_ok, but >> this would not be sufficient without adding >> dg-add-options arm_arch_v6t2 >> but then if gcc is configured with a higher default architecture, we'll >> end >> up testing if works for v6t2 only. > > > I'm okay with adding "dg-add-options arm_arch_v6t2". > We're not testing the availability of the instruction here, just the ifcvt > transformation when it *is* available, so I think we should just add > whatever > option guarantees that instruction is available. > > Kyrill > OK, I've committed the attached patch. 2016-06-10 Christophe Lyon * gcc.target/arm/pr37780_1.c: Use arm_arch_v6t2 effective target and options. > >>> Thanks for spotting this, >>> Kyrill >>>> >>>> Thanks, >>>> >>>> Christophe. >>> >>> > diff --git a/gcc/testsuite/gcc.target/arm/pr37780_1.c b/gcc/testsuite/gcc.target/arm/pr37780_1.c index b954fe5..8e06920 100644 --- a/gcc/testsuite/gcc.target/arm/pr37780_1.c +++ b/gcc/testsuite/gcc.target/arm/pr37780_1.c @@ -2,8 +2,9 @@ being defined at zero. */ /* { dg-do compile } */ -/* { dg-require-effective-target arm_arch_v5_ok } */ +/* { dg-require-effective-target arm_arch_v6t2_ok } */ /* { dg-options "-O2" } */ +/* { dg-add-options arm_arch_v6t2 } */ int fooctz (int i)